Thomas Muller, Hinrich Schutze and Helmut Schmid ACL June 3-8, 2012 Reporter:Sitong Yang





A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

ICT

Thomas Muller, Hinrich Schutze and Helmut Schmid

ACL June 3-8, 2012

Reporter: Sitong Yang

Outline

Introduction

Modeling of morphology and shape

Experimental Setup

Results and Discussion

Conclusion

Introduction
  • Motivation
  • Main idea
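Motivation

[Figure: two histories with the same continuations. The rare history "hypothetically" and the frequent history "potentially" are each followed by "large", "dangerous" and "serious".]

  • How can a language model transfer what it knows about the frequent history to the rare one?
  • Answer: morphology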

Main idea
  • Goal
    • perplexity reduction (PD) for a large number of languages
  • Features
    • Morphology
    • Shape features
  • Parameters
    • frequency threshold θ
    • number of suffixes used φ
    • morphological segmentation algorithm
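
The goal above is stated in terms of perplexity reduction. A minimal sketch of how such a figure can be computed, assuming the reduction is measured relative to the baseline perplexity (the slides do not spell out the formula); the function names and the numbers are hypothetical:

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def perplexity_reduction(pp_baseline, pp_model):
    """Relative perplexity reduction of a model over a baseline (assumed definition)."""
    return (pp_baseline - pp_model) / pp_baseline

# Hypothetical numbers for illustration only, not results from the paper.
baseline_pp = 100.0
model_pp = 92.0
print(f"{perplexity_reduction(baseline_pp, model_pp):.1%}")  # prints 8.0%
```

A positive value means the model has lower perplexity than the baseline.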
Modeling of morphology and shape
  • Morphology
  • Shape features
  • Similarity measure
Morphology
  • Automatic suffix identification algorithms: Reports, Morfessor and Frequency
  • Parameter: the φ most frequent suffixes
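
To make the role of φ concrete, here is a rough sketch of a frequency-based suffix selector. It assumes that candidate suffixes are word-final character sequences counted over the vocabulary; the slides do not say how the Frequency method is defined, so `top_suffixes`, `max_len` and `min_stem` are illustrative choices rather than the authors' procedure:

```python
from collections import Counter

def top_suffixes(vocabulary, phi, max_len=4, min_stem=2):
    """Return the phi most frequent word-final character sequences.

    max_len and min_stem are hypothetical limits on suffix and stem length.
    """
    counts = Counter()
    for word in vocabulary:
        for n in range(1, max_len + 1):
            if len(word) - n >= min_stem:
                counts[word[-n:]] += 1
    return [suffix for suffix, _ in counts.most_common(phi)]

def suffix_feature(word, suffixes):
    """Longest selected suffix that the word ends in, or '' if none matches."""
    matches = [s for s in suffixes if word.endswith(s) and len(word) > len(s)]
    return max(matches, key=len, default="")

vocab = ["walking", "talking", "walked", "talked", "jumping"]
suffixes = top_suffixes(vocab, phi=3)
print(suffixes, suffix_feature("singing", suffixes))
```

A larger φ lets more word endings act as features, which is why it is treated as a tunable parameter.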
Shape features
  • capitalization
  • special characters
  • word length
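
The three shape features listed above could be extracted along these lines. This is only a sketch: the exact feature definitions are given in the prior work the deck cites, and the dictionary keys here are hypothetical names:

```python
def shape_features(word):
    """Simple surface features of a word form (illustrative definitions)."""
    return {
        # Capitalization: first letter upper case, or the whole word upper case.
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        # Special characters: anything that is neither a letter nor a digit.
        "has_special": any(not ch.isalnum() for ch in word),
        # Word length in characters.
        "length": len(word),
    }

print(shape_features("Europe-2012"))
```

Features like these are available for any word form, including words never seen in training.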
Similarity measure
  • The similarity measure and the details of the shape features are described in prior work (Müller and Schütze, 2011).
Experimental Setup
  • Baseline
  • Morphological class language model
  • Distributional class language model
  • Corpus
Experimental Setup
  • Experiments:
    • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters
  • Baseline
    • modified KN model
Morphological class language model

Class-based language model:

Word emission probability:
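
As an illustration of the two components named above, here is a minimal sketch of a class-based bigram model. It assumes the standard factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) · P(w_i | c_i) with one class per word and unsmoothed counts; the paper's exact formulation may differ, and the class labels below are hand-assigned for the example rather than derived from morphological clustering:

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Build an unsmoothed class-based bigram model from a token list."""
    classes = [word2class[w] for w in tokens]
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    history_counts = Counter(classes[:-1])
    word_counts = Counter(tokens)

    def prob(word, prev_word):
        c, c_prev = word2class[word], word2class[prev_word]
        if history_counts[c_prev] == 0:
            return 0.0
        transition = class_bigrams[(c_prev, c)] / history_counts[c_prev]
        emission = word_counts[word] / class_counts[c]  # P(word | class)
        return transition * emission

    return prob

# Toy data reusing the words from the motivation slide.
tokens = "potentially large potentially dangerous hypothetically serious".split()
word2class = {"potentially": "ADV", "hypothetically": "ADV",
              "large": "ADJ", "dangerous": "ADJ", "serious": "ADJ"}
prob = train_class_bigram(tokens, word2class)
print(prob("large", "hypothetically"))
```

The bigram "hypothetically large" never occurs in the toy data, yet it receives non-zero probability because the rare history shares a class with the frequent one.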

Final model PM interpolates PC with a modified KN model:

Unknown word estimation:
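
The interpolation of the class model with the KN model can be sketched as a linear mixture. This example assumes a single global weight `lam` chosen by grid search on validation data; the slides mention only "optimal interpolation parameters", so the form of the weights and the search procedure are assumptions:

```python
import math

def interpolate(p_class, p_kn, lam):
    """Linear interpolation of a class-model and a KN-model probability."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_class + (1.0 - lam) * p_kn

def best_lambda(class_probs, kn_probs, grid=None):
    """Pick the weight that minimizes negative log-likelihood on held-out tokens."""
    grid = grid or [i / 10 for i in range(11)]

    def neg_log_lik(lam):
        total = 0.0
        for pc, pk in zip(class_probs, kn_probs):
            p = interpolate(pc, pk, lam)
            if p <= 0.0:
                return math.inf
            total -= math.log(p)
        return total

    return min(grid, key=neg_log_lik)

# Hypothetical per-token probabilities from the two models.
print(best_lambda([0.2, 0.1], [0.1, 0.05]))
```

Minimizing negative log-likelihood on the validation set is equivalent to minimizing validation perplexity.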

Modified class model PC':

Distributional class language model
  • PD (the distributional class model) has the same form as PM
  • The difference is that the classes are morphological for PM and distributional for PD
  • Whole-context distributional vector space model
Corpus
  • training set (80%)
  • validation set (10%)
  • test set (10%)
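
An 80/10/10 split like the one above could be produced as follows. The sketch assumes a contiguous split over a list of sentences; the slides do not say how sentences were assigned to the three sets, and `split_corpus` is a hypothetical helper:

```python
def split_corpus(sentences, train=0.8, valid=0.1):
    """Split a list of sentences into training, validation and test parts."""
    n = len(sentences)
    n_train = int(n * train)
    n_valid = int(n * valid)
    return (sentences[:n_train],
            sentences[n_train:n_train + n_valid],
            sentences[n_train + n_valid:])

corpus = [f"sentence {i}" for i in range(100)]
train_set, valid_set, test_set = split_corpus(corpus)
print(len(train_set), len(valid_set), len(test_set))  # prints 80 10 10
```

The validation part is what would be used to tune parameters such as θ, φ and the interpolation weights, with the test part held out for the final perplexity.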
Results and Discussion
  • Morphological model vs. Distributional model
  • Sensitivity analysis of parameters
Morphological model vs. Distributional model
  • MM: the more morphologically complex the language, the greater the perplexity reduction and the larger φ
  • MM: yields considerable perplexity reductions of 3%-11%
  • Frequency performs surprisingly well
  • DM is better than MM in only 4 cases
  • DM: clustering is restricted to less frequent words
Sensitivity analysis of parameters
  • Compares the best and worst values of each parameter and the difference in perplexity improvement between the two.
  • θ
    • strong influence on perplexity reduction (PD)
    • positively correlated with morphological complexity
  • φ and segmentation algorithms
    • negligible effect
    • Frequency performs best
Conclusion
  • Features: morphology and shape features
  • Result: perplexity reductions of 3%-11%
  • Parameters:
    • θ: considerable influence
    • φ and segmentation algorithms: small effect
Future Work
  • A model that interpolates KN, morphological class model and distributional class model.
My thought
  • Minority language model
ICT

Q&A?

ICT

Thank you!
