1 / 17

Parsing & Language Acquisition: Parsing Child Language Data

Parsing & Language Acquisition: Parsing Child Language Data. CSMC 35100 Natural Language Processing February 7, 2006. Roadmap. Motivation Understanding language acquisition Child and child-directed Challenges Casual conversation; ungrammatical speech Robust parsing

mbarksdale
Download Presentation

Parsing & Language Acquisition: Parsing Child Language Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parsing & Language Acquisition:Parsing Child Language Data CSMC 35100 Natural Language Processing February 7, 2006

  2. Roadmap • Motivation • Understanding language acquisition • Child and child-directed • Challenges • Casual conversation; ungrammatical speech • Robust parsing • Dependency structure & dependency labels • Child language assessment • Contrast w/human & auto • Error analysis • Conclusions

  3. Motivation: Child-directed Speech • View child as learner • Child-directed speech • Known to differ significantly from adult-directed • E.g. acoustic-prosodic measures • Slower, increased duration, pause, pitch range, height • Children attend more carefully • Input to learner • Track vocabulary, morphology, phonology, syntax • Relate exposure to acquisition

  4. Motivation: Child Speech • Insight into language acquisition process • Assess linguistic development • Evaluate hypotheses for acquisition • E.g. Rizzi’s “truncated structure hypothesis” • Phonology, morphology, lexicon, syntax,etc

  5. Focus: Syntax • Prior research emphasizes: • Lexicon, phonology, morphology • (Relatively) Easy to extract from • Recordings, manual transcriptions • Syntactic development • Significant markers in linguistic development • Acquisition of inflection, agreement, aux-inversion,. • Rich source of information

  6. Challenges • Analyzing syntactic input & development • Requires parsing of child- and child-directed speech • Manual analysis prohibitively costly for lots • Resources for training: Treebanks • Newspaper, adult conversational speech • Current speech: • Child-directed: very conversational, ellipsis, • Vocatives, onomatopoeia • Child: fragments, possibly ungrammatical

  7. Resources • CHILDES • Corpus (100s MB) of child language data • Many languages • All manually transcribed • Marked for disfluency, repetition, retracing • Some manually morphologically analyzed, POS • Some audio/video available • Not syntactically analyzed • Specialized morphological analyzer, POS

  8. Examples • *CHI: more cookie. • %mor: qn|more n|cookie. • *MOT: how about another graham cracker? • %mor: adv:wh|how prep|about^adv|about det|another n|graham n|cracker?

  9. Parsing for Assessment • Syntactic analysis of child speech • Assign to particular developmental level • Based on presence, frequency of constructions • Measures: • MLU – Mean length of utterance • Reaches ceiling around age 3, uninformative • IPSyn – Explicitly measures syntactic structure • Scores 100 utts on 56 structures • NP,VP, questions, sentence structure • 0=absent; 1=found once; 2=found > once • Some identifiable from POS/morph and patterns • Others not: aux-inversion, conjunction, subord clauses, etc

  10. Syntactic Analysis Approach • Extract grammatical relations (GRs) • Analyze sentence to labeled dependencies • Aux, Neg, Det, Inf, (ECX)Subj, Objs, Preds, Mods • (CX)Jct, (X)Comp, etc • Decomposition: • Text processing: • Removes tagged disfluencies, reps, retracings • Morphological analysis, POS tagging (special) • Unlabeled dependency analysis • Dependency labeling

  11. Unlabeled Dependency Parsing • Identify unlabeled dependency • Parse with Charniak parser • Trained on Penn treebank • Convert to dependencies based on head table • Different domain: 90.1% vs 92% WSJ • Shorter! < 15 wds

  12. Dependency Labeling • Assign labels to dependency structure • Easier than finding dep structure itself • Labels more separable • 30 way classification: • Train on 5K words w/manual dependency labels • TiMBL, features include: • Head & dep words, POS; order, separation, label of subsumer • 91% accuracy • Parse+labels: ~87% • Some trivial: Det, INF: 98%; (X)Comp: ~60% • Competitive

  13. Automating Assessment • Prior work: Computerized Profiling (CP) • Exploits word/POS patterns • Limited for older children; more sophisticated • Generally used in semi-automatic approach • Syntactic analysis improves • Before, he told the man he was cold. • Before he told the story, he was cold • Same POS pattern, similar words; diff’t structure • GR with clausal type (e.g. COMP), dep left • Construct syntactically informed patterns

  14. Evaluation • Point difference: • Unsigned difference in scores (man vs auto) • Point-to-point accuracy: • Number of language structure decisions correct • Divided by number of decisions • Test data: • A) 20 trans: ages 2-3; manual; MLU 2.9 • B) 25 trans: ages 8-9; semi-auto CP; MLU 7.0

  15. Results • Contrast w/ human assessment; CP • Point difference: 3.3 total GR; 8.3 CP • CP worse on older children: 6.2 vs 10.2 • Less effective on more complex sentences • Point-to-point: 92% GR; ~85% CP • No pattern of miss vs false detect • GR automatic scoring: high agreement

  16. Error Analysis • 4 of 56 IPSyn structures -> 50% of errors • Propositional complements, rel. clauses, bitransitive predicates • Result of syntactic analysis errors • Esp. COMP – least accurate • Emphasis/Ellipsis • Bad search patterns • More reliable than POS/word, but still hand-crafted

  17. Conclusion • Automatic analysis of child language data • Syntax, beyond morphology & POS • Two-phrase dependency analysis • Unlabeled structure, followed by label assignment • Accurate even with out-of-domain training • Enables more nuanced assessment • Especially as learner syntax becomes complex

More Related