A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

ICT

Thomas Müller, Hinrich Schütze and Helmut Schmid

NAACL-HLT, June 3-8, 2012

Reporter: Sitong Yang


Outline

Introduction

Modeling of morphology and shape

Experimental Setup

Results and Discussion

Conclusion


Introduction

  • Motivation

  • Main idea


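Motivation

  • Frequent history: "potentially" is followed by words such as "large", "dangerous" and "serious"

  • Rare history: "hypothetically" can precede the same words ("large", "dangerous", "serious"), but is seen too rarely for reliable estimates

  • How can a language model transfer evidence from the frequent history to the rare one?

  • Proposed answer: morphology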


Main idea

  • Goal

    • perplexity reduction (PD) for a large number of languages

  • Features

    • Morphology

    • Shape features

  • Parameters

    • frequency threshold θ

    • number of suffixes used, φ

    • morphological segmentation algorithm


Modeling of morphology and shape

  • Morphology

  • Shape features

  • Similarity measure


Morphology

  • Automatic suffix identification algorithms:

    Reports, Morfessor and Frequency

  • Parameter: the φ most frequent suffixes are used
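
As a rough illustration of the Frequency idea and the φ parameter, the sketch below counts word-final character sequences over a vocabulary and keeps the φ most frequent ones. It is an assumption that this resembles the slide's Frequency method; the names top_suffixes, suffix_class and max_len are hypothetical and do not come from the presentation.

```python
from collections import Counter


def top_suffixes(vocabulary, phi, max_len=4):
    """Return the phi most frequent word-final character sequences.

    Assumption: a "suffix" is any ending of 1..max_len characters that
    leaves at least one character of the word as a stem.
    """
    counts = Counter()
    for word in vocabulary:
        for k in range(1, min(max_len, len(word) - 1) + 1):
            counts[word[-k:]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [suffix for suffix, _ in ranked[:phi]]


def suffix_class(word, suffixes):
    """Return the longest known proper suffix of word, or '' if none matches."""
    matches = [s for s in suffixes if word.endswith(s) and len(s) < len(word)]
    return max(matches, key=len, default="")
```

With such a mapping, "potentially" and "hypothetically" would both fall into the "ly" group whenever "ly" is among the selected suffixes.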


Shape features

  • capitalization

  • special characters

  • word length
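
The three shape feature types listed above could be extracted along the following lines. This is only a sketch: the exact feature definitions are in the cited prior work, and the function name shape_features and the individual feature names are hypothetical.

```python
def shape_features(word):
    """Compute simple surface features of a word (illustrative definitions)."""
    return {
        # Capitalization
        "capitalized": word[:1].isupper(),
        "all_caps": word.isupper(),
        # Special characters: digits and anything that is not a letter or digit
        "has_digit": any(ch.isdigit() for ch in word),
        "has_special": any(not ch.isalnum() for ch in word),
        # Word length in characters
        "length": len(word),
    }
```

Words with identical feature values can then be grouped together, which is useful for rare and unknown words that have few or no training counts.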


similarity measure

  • The similarity measure and the details of the shape features follow prior work (Müller and Schütze, 2011).


Experimental Setup

  • Baseline

  • Morphological class language model

  • Distributional class language model

  • Corpus


Experimental Setup

  • Experiments:

    • SRILM, Kneser-Ney (KN), generic class implementation, optimal interpolation parameters

  • Baseline

    • modified KN model


Morphological class language model

Class-based language model:

Word emission probability:
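
The formulas on this slide did not survive in the transcript. For orientation only, the standard class-based bigram factorization and a maximum-likelihood emission probability are sketched below. It is an assumption that the slide used this form; g(w), the function mapping a word to its class, and c(w), the training count of w, are notation introduced here.

```latex
P_C(w_i \mid w_{i-1}) = P\bigl(g(w_i) \mid g(w_{i-1})\bigr)\, P\bigl(w_i \mid g(w_i)\bigr),
\qquad
P\bigl(w \mid g(w)\bigr) = \frac{c(w)}{\sum_{w' : g(w') = g(w)} c(w')}
```

The first factor predicts the class of the next word from the class of the history; the second picks the word within that class.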


Morphological class language model

Final model PM interpolates PC with a modified KN model:

Unknown word estimation:
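
The interpolation formula itself is missing from the transcript. A minimal sketch of linear interpolation with a single weight, tuned on validation data by grid search, might look as follows. The single global weight lam, the grid search and the names interpolate and best_lambda are assumptions made for illustration, not the slide's actual parameterization.

```python
import math


def interpolate(p_class, p_kn, lam):
    """Linear interpolation of a class-model and a KN-model probability."""
    return lam * p_class + (1.0 - lam) * p_kn


def best_lambda(pairs, grid=11):
    """Pick the weight that maximizes validation log-likelihood.

    pairs: one (p_class, p_kn) tuple per validation token.
    Returns None if every candidate weight yields a zero probability.
    """
    best = None
    for step in range(grid):
        lam = step / (grid - 1)
        try:
            log_lik = sum(math.log(interpolate(pc, pk, lam)) for pc, pk in pairs)
        except ValueError:
            # math.log(0.0) raises ValueError: skip weights giving zero probability.
            continue
        if best is None or log_lik > best[1]:
            best = (lam, log_lik)
    return None if best is None else best[0]
```

Maximizing validation log-likelihood is equivalent to minimizing validation perplexity, which is the quantity the experiments report.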


Morphological class language model

modified class model PC'


Distributional class language model

  • PD has the same form as PM

  • The difference is that the classes are morphological for PM and distributional for PD

  • Whole-context distributional vector space model


Corpus

  • training set (80%)

  • validation set (10%)

  • test set (10%)
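
An 80/10/10 split like the one above can be sketched as below. The slide does not say whether the split was contiguous or randomized, so the contiguous split here is an assumption, and split_corpus is a hypothetical helper name.

```python
def split_corpus(sentences, train_pct=80, valid_pct=10):
    """Split a list of sentences into contiguous train/validation/test parts.

    Integer arithmetic avoids floating-point rounding at the boundaries;
    whatever remains after train and validation becomes the test set.
    """
    n = len(sentences)
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100
    train = sentences[:n_train]
    valid = sentences[n_train:n_train + n_valid]
    test = sentences[n_train + n_valid:]
    return train, valid, test
```

The validation part would be used for tuning (for example interpolation weights), and the test part only for the final perplexity numbers.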


Results and Discussion

  • Morphological model vs. Distributional model

  • Sensitivity analysis of parameters


Morphological model vs. Distributional model

  • MM: morphologically richer languages show larger perplexity reductions and benefit from a larger φ

  • MM: considerable perplexity reductions of 3%-11%

  • The Frequency segmentation performs surprisingly well

  • Only in 4 cases does DM beat MM

  • DM: clustering is restricted to less frequent words
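
For reference, figures such as "3%-11%" are relative perplexity reductions against the baseline. The sketch below shows how perplexity and its relative reduction are conventionally computed; the function names are hypothetical and the numbers in the usage note are arbitrary examples, not results from the presentation.

```python
import math


def perplexity(token_probs):
    """Perplexity of a model given the probability it assigned to each test token."""
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


def relative_reduction(baseline_ppl, model_ppl):
    """Fraction by which the model lowers perplexity relative to the baseline."""
    return (baseline_ppl - model_ppl) / baseline_ppl
```

For instance, a model that lowers an arbitrary baseline perplexity of 100 to 90 achieves a relative reduction of 0.1, that is 10%.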


Sensitivity analysis of parameters

  • For each parameter: the best and worst values and the difference in perplexity improvement between the two

  • θ

    • strong influence on perplexity reduction (PD)

    • positively correlated with morphological complexity

  • φ and segmentation algorithms

    • negligible effect

    • Frequency performs best
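
One way to carry out a best-versus-worst comparison of this kind is sketched below. It assumes that each parameter value is scored by the best perplexity reduction reachable with that value while the other parameters vary freely; whether the original analysis worked this way is not stated on the slide, and the function name sensitivity is hypothetical.

```python
def sensitivity(results, names):
    """Report best value, worst value and their gap for every parameter.

    results: {(value_of_param_0, value_of_param_1, ...): perplexity_reduction}
    names:   parameter names, in the same order as the tuple positions.
    """
    report = {}
    for idx, name in enumerate(names):
        # Best reduction achievable for each value of this parameter.
        best_per_value = {}
        for setting, reduction in results.items():
            value = setting[idx]
            previous = best_per_value.get(value, reduction)
            best_per_value[value] = max(previous, reduction)
        best = max(best_per_value, key=best_per_value.get)
        worst = min(best_per_value, key=best_per_value.get)
        gap = best_per_value[best] - best_per_value[worst]
        report[name] = (best, worst, gap)
    return report
```

A large gap marks a parameter that matters (as reported for θ), while a gap near zero marks one with negligible effect (as reported for φ and the segmentation algorithm).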


Conclusion

  • Features: morphology and shape features

  • Result: perplexity reductions of 3%-11%

  • Parameters:

    • θ: considerable influence

    • φ and segmentation algorithms: small effect


Future Work

  • A model that interpolates KN, morphological class model and distributional class model.


My thoughts

  • Minority language model


Q&A?


Thank you!

