Japanese Dependency Structure Analysis Based on Maximum Entropy Models
Presentation Transcript

  1. Japanese Dependency Structure Analysis Based on Maximum Entropy Models • Kiyotaka Uchimoto†, Satoshi Sekine‡, Hitoshi Isahara† • †Kansai Advanced Research Center, Communications Research Laboratory • ‡New York University

  2. Outline • Background • Probability model for estimating dependency likelihood • Experiments and discussion • Conclusion

  3. Background • Japanese dependency structure analysis • Example: 太郎は赤いバラを買いました。 (Taro bought a red rose.) • [Figure: the sentence segmented into bunsetsus 太郎は (Taro_wa, Taro) / 赤い (aka_i, red) / バラを (bara_wo, rose) / 買いました。 (kai_mashita, bought), with dependency arcs 太郎は → 買いました。, 赤い → バラを, and バラを → 買いました。] • Preparing a dependency matrix • Finding an optimal set of dependencies for the entire sentence
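For concreteness, here is a minimal sketch of the two steps on the example sentence; the probabilities in the matrix are invented for illustration, not taken from the paper:

```python
# Bunsetsu segmentation of 太郎は赤いバラを買いました。 (Taro bought a red rose.)
bunsetsus = ["太郎は", "赤い", "バラを", "買いました。"]

# Step 1: a dependency matrix, where matrix[i][j] holds the likelihood that
# bunsetsu i depends on (modifies) bunsetsu j to its right. The values here
# are made up; the paper estimates them with a maximum entropy model.
matrix = {
    0: {1: 0.1, 2: 0.2, 3: 0.7},  # 太郎は most plausibly depends on 買いました。
    1: {2: 0.8, 3: 0.2},          # 赤い most plausibly depends on バラを
    2: {3: 1.0},                  # バラを depends on 買いました。
}

# Step 2: find an optimal set of dependencies for the entire sentence. Here we
# naively take the argmax head per bunsetsu; the paper instead searches for the
# globally most probable, non-crossing analysis (see the algorithm slide below).
heads = {i: max(row, key=row.get) for i, row in matrix.items()}
print(heads)  # {0: 3, 1: 2, 2: 3}
```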

  4. Background (2) • Approaches to preparing a dependency matrix • Rule-based approach • Several problems with handcrafted rules • Coverage and consistency • The rules have to be changed according to the target domain. • Corpus-based approach

  5. Background (3) • Corpus-based approach • Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998) • Probability estimation based on the maximum entropy models (Ratnaparkhi, 1997) • Maximum Entropy model • learns the weights of given features from a training corpus

  6. Probability model • Assigning one of two tags to each pair of bunsetsus: whether or not there is a dependency between the two bunsetsus • Probabilities of dependencies are estimated from the M. E. model • Overall dependencies in a sentence: the product of the probabilities of all the dependencies • Assumption: dependencies are independent of each other
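In symbols (the slide's own formula did not survive the transcript, so the notation below is supplied here as a sketch):

```latex
% D: a candidate dependency structure for a sentence S of n bunsetsus, where
% each bunsetsu b_i (i < n) depends on exactly one later bunsetsu b_{h(i)}.
% f_i: the features observed for the pair (b_i, b_{h(i)}).
% Under the independence assumption, the sentence-level score is the product
% of the pairwise dependency probabilities:
P(D \mid S) \approx \prod_{i=1}^{n-1} P\bigl(\text{dep} \mid f_i\bigr)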

  7. M. E. model
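The equation on this slide was an image and did not survive the transcript, but the model is the standard conditional maximum entropy form (Berger et al., 1996; Ratnaparkhi, 1997), reproduced here as a sketch:

```latex
% a: the tag (dependency or no dependency), b: the observed pair context,
% f_j: binary feature functions, lambda_j: weights learned from the corpus.
p_\lambda(a \mid b) = \frac{1}{Z_\lambda(b)}
    \exp\Bigl(\sum_j \lambda_j f_j(a, b)\Bigr),
\qquad
Z_\lambda(b) = \sum_{a'} \exp\Bigl(\sum_j \lambda_j f_j(a', b)\Bigr)
```

Training sets the weights so that the model's expectation of each feature matches its empirical expectation in the training corpus.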

  8. Feature sets • Basic features, expanded from the list of Haruno (1998) • Attributes of a bunsetsu itself • Character strings, parts of speech, and inflection types of the bunsetsu • Attributes between bunsetsus • Existence of punctuation, and the distance between bunsetsus • Combined features

  9. Feature sets • [Figure: a dependency between an anterior and a posterior bunsetsu in 太郎は 赤い バラを 買いました。; features a and b are the "Head" and "Type" of the anterior bunsetsu, c and d are the "Head" and "Type" of the posterior bunsetsu, and e covers the context between the two] • Basic features: a, b, c, d, e • Combined features • Twin: (b,c), Triplet: (b,c,e), Quadruplet: (a,b,c,d), Quintuplet: (a,b,c,d,e)
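A minimal sketch of how the basic and combined features could be generated; the attribute names and string encodings are assumptions for illustration, not the paper's actual feature inventory:

```python
def basic_features(anterior, posterior, between):
    """anterior/posterior: dicts holding the 'head' and 'type' attributes of
    the two bunsetsus; between: a string describing the context between them
    (distance, punctuation, brackets). All names here are assumptions."""
    return {
        "a": "ant_head=" + anterior["head"],
        "b": "ant_type=" + anterior["type"],
        "c": "post_head=" + posterior["head"],
        "d": "post_type=" + posterior["type"],
        "e": "between=" + between,
    }

def combined_features(f):
    """Conjoin basic features exactly as listed on the slide."""
    combos = [("b", "c"),                 # twin
              ("b", "c", "e"),            # triplet
              ("a", "b", "c", "d"),       # quadruplet
              ("a", "b", "c", "d", "e")]  # quintuplet
    return ["&".join(f[k] for k in keys) for keys in combos]

f = basic_features({"head": "バラ", "type": "を"},
                   {"head": "買い", "type": "ました"}, "dist=1")
print(combined_features(f)[0])  # ant_type=を&post_head=買い
```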

  10. Algorithm • Detect the dependencies in a sentence by analyzing it backwards (from right to left). • Characteristics of Japanese dependencies • Dependencies are directed from left to right • Dependencies do not cross • A bunsetsu, except for the rightmost one, depends on only one bunsetsu • In many cases, the left context is not necessary to determine a dependency • Beam search
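A compact sketch of the backward beam-search analysis; `dep_prob` stands in for the M. E. model and, like the data structures, is an assumption made here for illustration:

```python
def parse(n, dep_prob, beam_width=5):
    """Analyze a sentence of n bunsetsus backwards (right to left).
    heads[i] = j means bunsetsu i depends on bunsetsu j; the rightmost
    bunsetsu has no head. dep_prob(i, j, heads) is assumed to return the
    M. E. probability that bunsetsu i depends on bunsetsu j."""
    beam = [(1.0, {})]                    # (probability, partial analysis)
    for i in range(n - 2, -1, -1):        # backwards, skipping the last bunsetsu
        candidates = []
        for prob, heads in beam:
            for j in range(i + 1, n):     # dependencies are directed left to right
                if crosses(i, j, heads):  # dependencies must not cross
                    continue
                new_heads = dict(heads)
                new_heads[i] = j          # each bunsetsu depends on exactly one
                candidates.append((prob * dep_prob(i, j, new_heads), new_heads))
        beam = sorted(candidates, key=lambda c: -c[0])[:beam_width]
    return beam[0]                        # most probable complete analysis

def crosses(i, j, heads):
    """True if the arc i -> j would cross an existing arc k -> heads[k]."""
    return any(i < k < j < h for k, h in heads.items())
```

With beam_width = 1 this degenerates to greedy right-to-left parsing, which is viable precisely because, as the slide notes, the left context is rarely needed to determine a dependency.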

  11. Experiments • Using the Kyoto University text corpus (Kurohashi and Nagao, 1997), a tagged corpus of the Mainichi newspaper • Training: 7,958 sentences (Jan. 1st to 8th) • Testing: 1,246 sentences (Jan. 9th) • The input sentences were already morphologically analyzed, and their bunsetsus had been identified correctly (gold-standard preprocessing).

  12. Results of dependency analysis • When analyzing a sentence backwards, the context already analyzed to its right has almost no effect on the accuracy.

  13. Relationship between the number of bunsetsus and accuracy • [Graph: dependency accuracy plotted against sentence length in bunsetsus; overall accuracy 0.8714] • The accuracy does not significantly degrade with increasing sentence length.

  14. Features and accuracy • Experiments without each feature set (figures in parentheses are the drops in accuracy when the features are removed) • Useful basic features • Type of the anterior bunsetsu (-17.41%) and the part-of-speech tag of the head word of the posterior bunsetsu (-10.99%) • Distance between bunsetsus (-2.50%), the existence of punctuation in the bunsetsu (-2.52%), and the existence of brackets (-1.06%) • These drops suggest preferential rules with respect to the features

  15. Features and accuracy • Experiments without the feature sets • Combined features are useful (-18.31%). • Basic features are related to each other.

  16. Lexical features and accuracy • Experiment with the lexical features of the head word • Accuracy was better with them than without them (removing them: -0.84%) • Many idiomatic expressions had high dependency probabilities • "応じて (oujite, according to) --- 決める (kimeru, decide)" • "形で (katachi_de, in the form of) --- 行われる (okonawareru, be held)" • With more training data, we expect to collect more such expressions

  17. Amount of training data and accuracy • Accuracy of 81.84% even with only 250 training sentences • The M. E. framework has suitable characteristics for overcoming the data sparseness problem.

  18. Comparison with related works

  19. Comparison with related works (2) • Combining a parser based on a handmade CFG and a probabilistic dependency model (Shirai, 1998) • Using several corpora: the EDR corpus, RWC corpus, and Kyoto University corpus • Accuracy achieved by our model was about 3% higher than that of Shirai's model, while using a much smaller set of training data.

  20. Comparison with related works (3) • M. E. model (Ehara, 1998) • Uses a set of features similar to ours, but combines only pairs of features • Using TV news articles for training and testing • Average sentence length = 17.8 bunsetsus (cf. 10 in the Kyoto University corpus) • Difference in the combined features: we also use triplet, quadruplet, and quintuplet features (+5.86%) • Accuracy of our system was about 10% higher than that of Ehara's system.

  21. Comparison with related works (4) • Maximum Likelihood model (Fujio, 1998) • Decision tree models and a boosting method (Haruno, 1998) • Both use sets of features similar to ours • Using the EDR corpus for training and testing • The EDR corpus is ten times as large as our corpus • Their accuracy was around 85%, which is slightly worse than ours.

  22. Comparison with related works (5) • Experiments with Fujio's and Haruno's feature sets • These suggest that the important factor in the statistical approaches is feature selection.

  23. Future work • Feature selection • Automatic feature selection (Berger, 1996, 1998; Shirai, 1998) • Considering new features • How to deal with coordinate structures • Taking into account a wide range of information

  24. Conclusion • Japanese dependency structure analysis based on the M. E. model • Dependency accuracy of our system • 87.2% using the Kyoto University corpus • Experiments without feature sets • Some basic and combined features contribute strongly to improving the accuracy • Amount of training data and accuracy • Good accuracy even with a small set of training data • The M. E. framework has suitable characteristics for overcoming the data sparseness problem.