A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

Thomas Müller, Hinrich Schütze and Helmut Schmid, ACL, June 3-8, 2012. Reporter: Sitong Yang



Presentation Transcript


A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

ICT

Thomas Müller, Hinrich Schütze and Helmut Schmid

ACL, June 3-8, 2012

Reporter: Sitong Yang


Outline

Introduction

Modeling of morphology and shape

Experimental Setup

Results and Discussion

Conclusion



Introduction

  • Motivation

  • Main idea


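Motivation

  • [Figure: the continuations "large", "dangerous" and "serious" shown for two histories: "hypothetically" (rare history) and "potentially" (frequent history).]

  • How can what the language model knows about the frequent history be transferred to the rare one? Through morphology.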


Main idea

  • Goal

    • perplexity reduction (PD) for a large number of languages

  • Features

    • Morphology

    • Shape features

  • Parameters

    • frequency threshold θ

    • number of suffixes used φ

    • morphological segmentation algorithms
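The goal above is stated as a perplexity reduction. A minimal sketch of how such a figure could be computed, assuming it is the relative drop in perplexity against a baseline; the function names and toy probabilities are illustrative and not taken from the slides:

```python
import math


def perplexity(log_probs):
    """Perplexity from natural-log probabilities, one per token."""
    return math.exp(-sum(log_probs) / len(log_probs))


def perplexity_reduction(pp_baseline, pp_model):
    """Relative perplexity reduction of a model against a baseline."""
    return (pp_baseline - pp_model) / pp_baseline


# Toy example: the baseline gives every token probability 0.1,
# the hypothetical improved model gives every token 0.125.
baseline_pp = perplexity([math.log(0.1)] * 4)
model_pp = perplexity([math.log(0.125)] * 4)
reduction = perplexity_reduction(baseline_pp, model_pp)
```

With these toy numbers the baseline perplexity is 10, the model perplexity is 8, and the reduction is 20%.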




Modeling of morphology and shape

  • Morphology

  • Shape features

  • Similarity measure


Morphology

  • Automatic suffix identification algorithms: Reports, Morfessor and Frequency

  • Parameter: the φ most frequent suffixes
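The slides do not spell out how the Frequency algorithm works. As a rough sketch only, assuming it counts word-final character sequences over the vocabulary and keeps the φ most frequent ones (the helper names and the `max_len` limit are hypothetical):

```python
from collections import Counter


def most_frequent_suffixes(vocabulary, phi, max_len=4):
    """Return the phi most frequent word-final character sequences."""
    counts = Counter()
    for word in vocabulary:
        # Only count suffixes that leave at least one stem character.
        for n in range(1, min(max_len, len(word) - 1) + 1):
            counts[word[-n:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]


def longest_suffix(word, suffixes):
    """Longest listed suffix that the word ends with, or None."""
    matches = [s for s in suffixes if word.endswith(s) and len(s) < len(word)]
    return max(matches, key=len) if matches else None


vocab = ["walking", "talking", "singing", "cat"]
suffixes = most_frequent_suffixes(vocab, phi=3)
```

A word would then be described by the longest of the selected suffixes it ends with, so "walking" maps to "ing" here and "cat" has no suffix.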


Shape features

  • capitalization

  • special characters

  • word length
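The three shape features listed above could be extracted along these lines. This is an illustrative sketch; the exact feature definitions are in the prior work and the dictionary keys here are assumptions:

```python
def shape_features(word):
    """Simple surface features of a word: case, special characters, length."""
    return {
        "capitalized": word[:1].isupper(),
        "all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "has_special": any(not ch.isalnum() for ch in word),
        "length": len(word),
    }


features = shape_features("EU-27")
```

For "EU-27" the sketch reports a capitalized, all-caps word of length 5 that contains both a digit and a special character.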


Similarity measure

  • The similarity measure and the details of the shape features are described in prior work (Müller and Schütze, 2011).




Experimental Setup

  • Baseline

  • Morphological class language model

  • Distributional class language model

  • Corpus


Experimental Setup

  • Experiments:

    • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters

  • Baseline

    • modified KN model


Morphological class language model

Class-based language model:

Word emission probability:
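A minimal sketch of a class-based bigram model, assuming the standard factorization into a class transition probability and a word emission probability, with unsmoothed maximum-likelihood estimates. The function name and the toy classes are illustrative and are not the formulas from the slides:

```python
from collections import Counter


def train_class_bigram(tokens, word_to_class):
    """Return prob(word, prev_word) = P(class | prev class) * P(word | class)."""
    classes = [word_to_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    history_counts = Counter(classes[:-1])
    transition_counts = Counter(zip(classes, classes[1:]))

    def prob(word, prev_word):
        c_prev, c = word_to_class[prev_word], word_to_class[word]
        if history_counts[c_prev] == 0:
            return 0.0  # class never seen as a history
        transition = transition_counts[(c_prev, c)] / history_counts[c_prev]
        emission = word_counts[word] / class_counts[c]
        return transition * emission

    return prob


tokens = "potentially dangerous hypothetically serious potentially serious".split()
word_to_class = {
    "potentially": "ADV", "hypothetically": "ADV",
    "dangerous": "ADJ", "serious": "ADJ",
}
prob = train_class_bigram(tokens, word_to_class)
p_unseen = prob("dangerous", "hypothetically")
```

The bigram "hypothetically dangerous" never occurs in the toy data, yet it receives probability 1/3 because both histories share a class. That is the kind of transfer from a frequent to a rare history the motivation slide asks for.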


Morphological class language model

Final model PM interpolates PC with a modified KN model:

Unknown word estimation:
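A sketch of the interpolation step, assuming a plain linear mixture with a single weight λ chosen by grid search on validation data. The slides mention optimal interpolation parameters but not how they are obtained, so `best_lambda` and its grid are assumptions:

```python
import math


def interpolate(p_class, p_kn, lam):
    """Linear interpolation of a class model and a KN model probability."""
    return lam * p_class + (1.0 - lam) * p_kn


def best_lambda(validation_probs, grid=None):
    """Pick the weight with the lowest negative log-likelihood.

    validation_probs holds one (p_class, p_kn) pair per validation token.
    """
    grid = grid or [i / 10 for i in range(11)]

    def neg_log_likelihood(lam):
        total = 0.0
        for p_class, p_kn in validation_probs:
            p = interpolate(p_class, p_kn, lam)
            if p <= 0.0:
                return math.inf  # this weight assigns zero probability
            total -= math.log(p)
        return total

    return min(grid, key=neg_log_likelihood)


pairs = [(0.4, 0.1), (0.0, 0.2), (0.3, 0.1)]
lam = best_lambda(pairs)
```

Mixing with the KN model keeps the final probability above zero for tokens where the class model alone would assign none.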


Morphological class language model

Modified class model PC'


Distributional class language model

  • PD has the same form as PM

  • The difference is that the classes are morphological for PM and distributional for PD

  • Whole-context distributional vector space model


Corpus

  • training set (80%)

  • validation set (10%)

  • test set (10%)
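The 80/10/10 split could be produced as follows. The slides do not say whether the parts are contiguous or shuffled; this sketch assumes contiguous blocks in corpus order, and `split_corpus` is a hypothetical helper:

```python
def split_corpus(sentences, train=0.8, valid=0.1):
    """Split a list of sentences into training, validation and test parts."""
    n_train = int(len(sentences) * train)
    n_valid = int(len(sentences) * valid)
    training = sentences[:n_train]
    validation = sentences[n_train:n_train + n_valid]
    test = sentences[n_train + n_valid:]
    return training, validation, test


training, validation, test = split_corpus(list(range(10)))
```

The validation part is where interpolation weights and other parameters would be tuned, leaving the test part for the reported perplexities.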




Results and Discussion

  • Morphological model vs. Distributional model

  • Sensitivity analysis of parameters


Morphological model vs. Distributional model

  • MM: the more morphologically complex the language, the larger the perplexity reduction and the larger φ.

  • MM: yields considerable perplexity reductions of 3%-11%.

  • Frequency performs surprisingly well.

  • Only in 4 cases is DM better than MM.

  • DM: clustering restricted to less frequent words.




Sensitivity analysis of parameters

  • Best and worst values of each parameter and the difference in perplexity improvement between the two.

  • θ

    • strong influence on perplexity reduction

    • positively correlated with morphological complexity

  • φ and segmentation algorithms

    • negligible effect

    • Frequency performs best.
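To make the role of θ concrete, here is a sketch under a loud assumption: words occurring at least θ times keep a class of their own, while rarer words share a class defined by their suffix and capitalization. The slides only name θ as a frequency threshold, so this reading and the `assign_classes` helper are hypothetical:

```python
from collections import Counter


def assign_classes(tokens, theta, suffixes):
    """Map each word type to a class, using frequency threshold theta."""
    counts = Counter(tokens)
    classes = {}
    for word, freq in counts.items():
        if freq >= theta:
            # Assumption: frequent words form singleton classes.
            classes[word] = ("WORD", word)
        else:
            suffix = max(
                (s for s in suffixes if word.endswith(s) and len(s) < len(word)),
                key=len,
                default="",
            )
            classes[word] = ("RARE", suffix, word[:1].isupper())
    return classes


tokens = ["the"] * 5 + ["potentially", "hypothetically", "Union"]
classes = assign_classes(tokens, theta=2, suffixes=["ly", "ion"])
```

Under this reading, raising θ moves more word types into the shared feature-based classes, which is one way the threshold could affect perplexity more in morphologically rich languages.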






Conclusion

  • Features: morphology and shape features

  • Result: perplexity reductions of 3%-11%

  • Parameters:

    • θ: considerable influence

    • φ and segmentation algorithms: small effect


Future Work

  • A model that interpolates KN, morphological class model and distributional class model.


My thought

  • Minority language model



Q&A?



Thank you!

