
A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing


Presentation Transcript


  1. A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing Remko Scha, Rens Bod, Khalil Sima’an Institute for Logic, Language and Computation University of Amsterdam

  2. Outline of the lecture • Introduction • Disambiguation • Data Oriented Parsing • DOP1 computational aspects and experiments • Memory Based Learning framework • Conclusions

  3. Introduction • Human language cognition: analogy-based processes operating on a store of past experiences • Modern linguistics: a set of rules • Language processing algorithms • A performance model of human language processing • Competence grammar as a broad framework for performance models • Memory- and analogy-based language processing

  4. The Problem of Ambiguity Resolution • Every input string has an unmanageably large number of analyses • Uncertain input – generate guesses and choose one • Syntactic disambiguation might be a side effect of semantic disambiguation

  5. The Problem of Ambiguity Resolution • Frequency of occurrence of lexical items and syntactic structures: • People register frequencies • People prefer analyses they have experienced before over constructing new ones • More frequent analyses are preferred to less frequent ones

  6. From Probabilistic Competence-Grammars to Data-Oriented Parsing • Probabilistic information derived from past experience • Characterization of the possible sentence-analyses of the language • Stochastic Grammar • Define: all sentences and all analyses of the language • Assign: a probability to each • Achieve: the preferences people display when they choose among sentences or analyses

  7. Stochastic Grammar • The predictions of such a grammar are limited • Platitudes and conventional phrases • Allow redundancy • Use a Tree Substitution Grammar

  8. Stochastic Tree Substitution Grammar • Set of elementary trees • Tree-rewriting process • Redundant model • Statistically relevant phrases • Memory-based processing model

  9. Memory-based processing model • Data oriented parsing approach: • Corpus of utterances – past experience • STSG to analyze new input • Describing a specific DOP model requires: • A formalism for representing utterance-analyses • An extraction function • Combination operations • A probability model (a minimal interface sketch follows)
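To make these four ingredients concrete, here is one way they could be typed in Python. This is a minimal sketch; all type and function names are illustrative assumptions, not part of the lecture:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass(frozen=True)
class Tree:
    label: str               # syntactic category, or a word at a leaf
    children: tuple = ()     # empty tuple => leaf node

# 1. Formalism for representing utterance-analyses: phrase-structure trees.
Corpus = List[Tree]

# 2. Extraction function: maps a corpus to a bag of fragments.
ExtractFn = Callable[[Corpus], Iterable[Tree]]

# 3. Combination operation: composes two fragments into a larger one.
ComposeFn = Callable[[Tree, Tree], Tree]

# 4. Probability model: assigns a probability to each fragment.
ProbFn = Callable[[Tree], float]
```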

  10. A Simple Data Oriented Parsing Model: DOP1 • Corpus: an imaginary corpus of two trees • A subtree t of a corpus tree T must satisfy: • t consists of more than one node • t is connected • except for the leaf nodes of t, each node in t has the same daughter nodes as the corresponding node in T • Stochastic Tree Substitution Grammar – the set of these subtrees • Generation process – composition: A ∘ B substitutes B at the leftmost nonterminal leaf node of A (both operations are sketched below)
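A minimal Python sketch of subtree extraction and leftmost substitution, under the assumption that nonterminal labels are uppercase; the Tree type is repeated so the sketch is self-contained, and all names are mine:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Tree:
    label: str
    children: tuple = ()                  # () => leaf node

def is_nonterminal(label):
    return label.isupper()                # assumption: NT labels are uppercase

def fragments(node):
    """All DOP1 subtrees rooted at `node`: the node keeps all its daughters,
    and below each daughter we either cut (leaving a substitution site)
    or keep growing."""
    options = []
    for c in node.children:
        opts = [Tree(c.label)]            # cut: daughter becomes a frontier node
        if c.children:
            opts += fragments(c)          # or keep the daughter's own daughters
        options.append(opts)
    return [Tree(node.label, combo) for combo in product(*options)]

def all_subtrees(T):
    """Every subtree of corpus tree T with more than one node."""
    out, stack = [], [T]
    while stack:
        n = stack.pop()
        if n.children:                    # single-node trees are excluded
            out.extend(fragments(n))
        stack.extend(n.children)
    return out

def compose(A, B):
    """A ∘ B: substitute B at the leftmost nonterminal leaf node of A."""
    replaced = [False]
    def walk(n):
        if not n.children and is_nonterminal(n.label) and not replaced[0]:
            replaced[0] = True
            return B if B.label == n.label else None
        kids = tuple(walk(c) for c in n.children)
        return None if any(k is None for k in kids) else Tree(n.label, kids)
    out = walk(A)
    return out if replaced[0] else None
```

Note that in DOP1 the composition A ∘ B is defined only when root(B) matches the leftmost nonterminal leaf of A, which is why compose returns None on a mismatch.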

  11. Example of sub trees

  12. DOP1 - Imaginary corpus of two trees

  13. Derivation and parse #1 She saw the dress with the telescope.

  14. Derivation and parse #2 She saw the dress with the telescope.

  15. Probability Computations: • Probability of substituting a sub tree t on a specific node • Probability of Derivation • Probability of Parse Tree
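The three probabilities on this slide follow the standard DOP1 definitions; they are written out here (with |t| denoting the corpus frequency of subtree t):

```latex
% Probability of substituting subtree t on a node labelled root(t):
P(t) = \frac{|t|}{\sum_{t' :\, \mathrm{root}(t') = \mathrm{root}(t)} |t'|}

% Probability of a derivation d = t_1 \circ t_2 \circ \cdots \circ t_n:
P(d) = \prod_{i=1}^{n} P(t_i)

% Probability of a parse tree T: sum over all its distinct derivations.
P(T) = \sum_{d\ \text{derives}\ T} P(d)
```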

  16. Computational Aspects of DOP1 • Parsing • Disambiguation • Most Probable Derivation • Most Probable Parse • Optimizations

  17. Parsing • Chart-like parse forest • Derivation forest • Elementary tree t as a context-free rule: root(t) —> yield(t) • Label each phrase with its syntactic category and its full elementary tree
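A minimal sketch of that reduction, reusing the Tree type from the earlier sketches: each elementary tree t becomes a context-free production root(t) —> yield(t), and t itself is kept as the rule's payload so chart entries can be labelled with the full elementary tree:

```python
def tree_yield(t):
    """Frontier of t: its leaf labels, left to right."""
    if not t.children:
        return (t.label,)
    out = ()
    for c in t.children:
        out += tree_yield(c)
    return out

def to_cfg_rules(elementary_trees):
    """Rule format: (root label, yield, originating elementary tree)."""
    return [(t.label, tree_yield(t), t) for t in elementary_trees]
```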

  18. Elementary trees of an example STSG (for parsing the string abcd over chart positions 0–4)

  19. Derivation forest for the string abcd

  20. Derivations and parse trees for the string abcd

  21. Derivations and parse trees for the string abcd

  22. Disambiguation • The derivation forest defines all derivations and parses • The most likely parse must be chosen • MPP in DOP1 • MPP vs. MPD

  23. Most Probable Derivation • Viterbi algorithm: • Eliminate low-probability sub-derivations in bottom-up fashion • Select the most probable sub-derivation at each chart entry and eliminate the other sub-derivations for that root node.

  24. Viterbi algorithm • Two derivations for abc • P(d1) > P(d2): eliminate d2 (the right-hand derivation in the figure)

  25. Algorithm 1 – Computing the probability of the most probable derivation • Input: STSG, S, R, P • Elementary trees in R are in CNF • A —>t H: elementary tree t with root A and a sequence of labels H as its frontier • <A, i, j> – nonterminal A in chart entry (i, j) after parsing the input W1,...,Wn • P_MPD – probability of the MPD of the input string W1,...,Wn

  26. Algorithm 1 – Computing the probability of most probable derivation
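The algorithm itself appeared as a figure on the original slide. Below is a CKY-style reconstruction under the slide's assumptions (rules A —>t H in CNF, so H is either a single terminal or two nonterminals); the data layout and function name are mine:

```python
from collections import defaultdict

def mpd_probability(words, lexical, binary, start_symbol='S'):
    """P_MPD of words[0..n-1].
    lexical: dict A -> list of (prob, word)   for rules A ->t w
    binary:  dict A -> list of (prob, B, C)   for rules A ->t B C
    best[(i, j)][A] = max probability of deriving words[i:j] from A."""
    n = len(words)
    best = defaultdict(dict)
    for i, w in enumerate(words):                       # spans of length 1
        for A, rules in lexical.items():
            for p, word in rules:
                if word == w and p > best[(i, i + 1)].get(A, 0.0):
                    best[(i, i + 1)][A] = p
    for length in range(2, n + 1):                      # longer spans, bottom up
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):                   # split point
                for A, rules in binary.items():
                    for p, B, C in rules:
                        pb = best[(i, k)].get(B, 0.0)
                        pc = best[(k, j)].get(C, 0.0)
                        cand = p * pb * pc
                        if cand > best[(i, j)].get(A, 0.0):
                            best[(i, j)][A] = cand      # keep only the best
    return best[(0, n)].get(start_symbol, 0.0)
```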

  27. The Most Probable Parse • Computing the MPP in an STSG is NP-hard • Monte Carlo method: • Sample derivations • Observe the most frequent parse tree • Estimate parse-tree probability • Random-first search • The algorithm • Law of Large Numbers

  28. Algorithm 2: Sampling a random derivation • for length := 1 to n do • for start := 0 to n - length do • for each root node X in chart-entry (start, start + length) do: 1. select at random a tree from the distribution of elementary trees with root node X 2. eliminate the other elementary trees with root node X from this chart-entry
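A minimal sketch of this loop, assuming the chart maps (start, end) spans to a dict from root label X to a list of (probability, elementary tree) candidates; random.choices draws one candidate in proportion to its probability:

```python
import random

def sample_random_derivation(chart, n):
    """Algorithm 2: visit chart entries bottom-up and keep exactly one
    elementary tree per root label, sampled from the trees' distribution.
    chart[(start, end)][X] is a list of (prob, tree) pairs."""
    for length in range(1, n + 1):
        for start in range(n - length + 1):
            entry = chart.get((start, start + length), {})
            for X, candidates in entry.items():
                weights = [p for p, _ in candidates]
                chosen = random.choices(candidates, weights=weights, k=1)[0]
                entry[X] = [chosen]          # eliminate the other trees
    return chart                             # now encodes one random derivation
```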

  29. Results of Algorithm 2 • A random derivation for the whole sentence • A first guess for the MPP • Compute the size of the sampling set • Probability of error • Upper bound, where 0 is the index of the MPP, i the index of parse i, and N the number of sampled derivations • No unique MPP – ambiguity

  30. Reminder

  31. Conclusions – lower bound for N • Pi is the probability of parse i • B – the probability estimated from frequencies among N samples • Var(B) = Pi*(1-Pi)/N • 0 <= Pi <= 1 -> Pi*(1-Pi) <= 1/4 -> Var(B) <= 1/(4*N) • s = sqrt(Var(B)) -> s <= 1/(2*sqrt(N)) • hence N >= 1/(4*s^2) • N >= 100 -> s <= 0.05 (the derivation is written out below)
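Written out, the bound on the slide follows because B is a relative frequency (binomially distributed):

```latex
\mathrm{Var}(B) = \frac{p_i(1 - p_i)}{N} \le \frac{1}{4N}
  \qquad \text{since } p_i(1 - p_i) \le \tfrac{1}{4} \text{ for } 0 \le p_i \le 1,

\sigma = \sqrt{\mathrm{Var}(B)} \le \frac{1}{2\sqrt{N}}
  \quad\Longrightarrow\quad N \ge \frac{1}{4\sigma^2},
  \qquad N \ge 100 \;\Rightarrow\; \sigma \le 0.05 .
```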

  32. Algorithm 3: Estimating the parse probabilities • Given a derivation forest of a sentence and a threshold sm for the standard error: • N := the smallest integer larger than 1/(4*sm^2) • repeat N times: • sample a random derivation from the derivation forest • store the parse generated by this derivation • for each parse i: • estimate the conditional probability given the sentence by pi := #(i) / N
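A direct Python rendering of Algorithm 3; `sample_parse` stands in for sampling one derivation from the forest and reading off its parse (e.g., via the chart routine sketched after slide 28):

```python
import math
from collections import Counter

def estimate_parse_probabilities(sample_parse, s_max):
    """Sample N derivations, with N chosen so the standard error of each
    estimated parse probability is at most s_max, then return the
    conditional parse probabilities p_i = #(i) / N."""
    N = math.floor(1.0 / (4.0 * s_max ** 2)) + 1  # smallest integer > 1/(4*s_max^2)
    counts = Counter(sample_parse() for _ in range(N))
    return {parse: c / N for parse, c in counts.items()}
```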

  33. Complexity of Algorithm 3 • Assumes a value for the maximum allowed standard error • Samples a number of derivations that is guaranteed to achieve that error • The number of needed samples grows quadratically in the inverse of the chosen error (N ∝ 1/s^2)

  34. Optimizations • Sima’an: MPD in time linear in the STSG size • Bod: estimate the MPP from a small random sample of subtrees • Sekine and Grishman: use only subtrees rooted in S or NP • Goodman: a different polynomial-time algorithm

  35. Experimental Properties of DOP1 • Experiments on the ATIS corpus • MPP vs. MPD • Impact of fragment size • Impact of fragment lexicalization • Impact of fragment frequency • Experiments on SRI-ATIS and OVIS • Impact of subtree depth

  36. Experiments on the ATIS corpus • ATIS = Air Travel Information System • 750 annotated sentence analyses • Annotated according to the Penn Treebank scheme • Purpose: compare the accuracy obtained with undiluted DOP1 to that obtained with restricted STSGs

  37. Experiments on the ATIS corpus • Divide into training and test sets • 90% = 675 in the training set • 10% = 75 in the test set • Convert the training set into fragments and enrich them with probabilities • Test-set sentences are parsed with subtrees from the training set • The MPP was estimated from 100 sampled derivations • Parse accuracy = % of MPPs identical to the test-set parses

  38. Results • On 10 random training / test splits of ATIS: • Average parse accuracy = 84.2% • Standard deviation = 2.9 %

  39. Impact of overlapping fragments: MPP vs. MPD • Can MPD achieve parse accuracies similar to MPP? • Can MPD do better than MPP? • Overlapping fragments • Accuracy of the MPD on the test set: 69% • Compared to the accuracy achieved with the MPP on the test set: 69% vs. 85% • Conclusion: overlapping fragments play an important role in predicting the appropriate analysis of a sentence

  40. The impact of fragment size • Large fragments capture more lexical/syntactic dependencies than small ones • The experiment: • Use DOP1 with a restricted maximum fragment depth • Max depth 1 -> DOP1 = SCFG • Compute the accuracies for both MPD and MPP at each maximum depth

  41. Impact of fragment size

  42. Impact of fragment lexicalization • A lexicalized fragment: • More words -> more lexical dependencies • Experiment: • Different versions of DOP1 • Restrict the maximum number of words per fragment • Check the accuracy for MPP and MPD

  43. Impact of fragment lexicalization

  44. Impact of fragment frequency • Frequent fragments contribute more • Large fragments are less frequent than small ones but might contribute more • Experiment: • Restrict fragments to a minimum number of occurrences • No other restrictions • Check the accuracy for MPP

  45. Impact of fragment frequency

  46. Experiments on SRI-ATIS and OVIS • Employ the MPD because the corpora are bigger • Tests performed on DOP1 and SDOP • Use a set of heuristic criteria for selecting the fragments: • Constraints on the form of subtrees: • d – upper bound on depth • n – number of substitution sites • l – number of terminals • L – number of consecutive terminals • Apply the constraints to all subtrees except those of depth 1 (a sketch of such a filter follows)
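A sketch of such a selection filter, assuming the Tree type from the earlier sketches and uppercase nonterminal labels; the depth-1 exemption follows the slide, with depth counted in levels of branching so that a root with only leaf daughters has depth 1:

```python
def depth(t):
    """Depth in levels of branching: a root with only leaf daughters has
    depth 1 (these minimal, CFG-rule-like subtrees are always kept)."""
    return 0 if not t.children else 1 + max(depth(c) for c in t.children)

def frontier(t):
    return [t.label] if not t.children else \
           [lab for c in t.children for lab in frontier(c)]

def keep_fragment(t, d=4, n=2, l=7, L=3):
    """Keep subtree t only if its depth is at most d, it has at most n
    substitution sites, at most l terminals, and at most L consecutive
    terminals; subtrees of depth 1 are always kept."""
    if depth(t) == 1:
        return True
    leaves = frontier(t)
    is_term = [not lab.isupper() for lab in leaves]   # terminals = words
    sites = is_term.count(False)                      # nonterminal leaves
    run = longest = 0
    for term in is_term:
        run = run + 1 if term else 0
        longest = max(longest, run)
    return (depth(t) <= d and sites <= n
            and sum(is_term) <= l and longest <= L)
```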

  47. Experiments on SRI-ATIS and OVIS • d4 n2 l7 L3 (depth <= 4, at most 2 substitution sites, 7 terminals, and 3 consecutive terminals) • DOP(i): DOP1 with subtree depth bounded by i • Evaluation metrics: • Recognized • Tree Language Coverage – TLC • Exact match • Labeled bracketing recall and precision

  48. Experiments on SRI-ATIS • 13,335 syntactically annotated utterances • The annotation scheme originates from the Core Language Engine system • Fixed parameters except the subtree depth bound: n2 l4 L3 • Training set – 12,335 trees • Test set – 1,000 trees • Experiment: • Train and test under different upper bounds on depth (takes more than 10 days for DOP(4)!)

  49. Impact of sub tree depth SRI-ATIS

  50. Experiments on the OVIS corpus • 10,000 syntactically and semantically annotated trees • Both annotations treated as one • More nonterminal symbols • Utterances are answers to questions in a dialog -> short utterances (avg. 3.43 words) • Sima’an’s results – sentences with at least 2 words, avg. 4.57 • n2 l7 L3
