Automatic Measurement of Syntactic Development in Child Language. Kenji Sagae, Language Technologies Institute. Student Research Symposium, September 2005. Joint work with Alon Lavie and Brian MacWhinney.


Automatic Measurement of Syntactic Development in Child Language

Kenji Sagae

Language Technologies Institute

Student Research Symposium

September 2005

Joint work with

Alon Lavie and Brian MacWhinney

Using Natural Language Processing in Child Language Research
  • CHILDES Database (MacWhinney, 2000)
    • Several megabytes of child-parent dialog transcripts
    • Part-of-speech and morphology analysis
      • Tools available
    • Recently proposed syntactic annotation scheme (Sagae et al., 2004)
      • Grammatical Relations (GRs)
      • POS analysis not enough for many research questions
      • Very small amount of annotated data
  • Parsing
    • Can we use current NLP tools to analyze CHILDES GRs?
    • Allows, for example, automatic measurement of syntactic development
  • The CHILDES GR annotation scheme
  • Automatic GR analysis
  • Measurement of Syntactic Development
CHILDES GR Scheme (Sagae et al., 2004)
  • Addresses needs of child language researchers
  • Grammatical Relations (GRs)
    • Subject, object, adjunct, etc.
    • Labeled dependencies

(Figure: a dependency arc with the callout "Dependency Label")
Automatic Syntactic (GR) Analysis
  • Input: a sentence
  • Output: dependency structure (GRs)
  • Three steps
    • Text preprocessing
    • Unlabeled dependency identification
    • Dependency labeling
Step 1: Text Preprocessing Prepares Utterances for Parsing
  • CHAT transcription system
    • Explicitly marks certain extra-grammatical material: disfluencies, retracings, and repetitions
  • CLAN tools (MacWhinney, 2000)
    • Remove extra-grammatical material
    • Provide POS and Morphological analyses
  • CHAT and CLAN tools are publicly available
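As a rough illustration of this cleanup step, the sketch below strips a few CHAT-style markers with regular expressions. It is a simplified stand-in, not CLAN: the function name `strip_extragrammatical` is hypothetical, and it assumes only repetition and retracing codes (`[/]`, `[//]`, `[///]`, optionally scoped with angle brackets) and `&`-prefixed fillers need handling.

```python
import re

# Scoped material: <several words> followed by a repetition/retracing code.
SCOPED = re.compile(r"<[^<>]*>\s*\[/{1,3}\]")
# Unscoped material: the single word right before the code.
SINGLE = re.compile(r"\S+\s*\[/{1,3}\]")
# Fillers and fragments written with a leading ampersand, e.g. "&um".
FILLER = re.compile(r"&\S+")


def strip_extragrammatical(utterance: str) -> str:
    """Remove marked repetitions, retracings, and fillers (toy version)."""
    utterance = SCOPED.sub(" ", utterance)
    utterance = SINGLE.sub(" ", utterance)
    utterance = FILLER.sub(" ", utterance)
    return " ".join(utterance.split())  # normalize whitespace


print(strip_extragrammatical("<I want> [/] I want &um the cookie ."))
print(strip_extragrammatical("the [//] a dog ."))
```

In practice the CLAN tools cover many more transcription codes; the point here is only that the material is explicitly marked, so removing it is mechanical.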

Step 2: Unlabeled Dependency Identification
  • Why identify unlabeled dependencies as a separate step?
    • Large training corpus: Penn Treebank (Marcus et al., 1993)
      • Head-table converts constituents into dependencies
  • Use an existing parser (trained on the Penn Treebank)
    • Charniak (2000)
      • Convert output to dependencies
    • Alternatively, a dependency parser
      • For example: MaltParser (Nivre and Scholz, 2004) or the parser of Yamada and Matsumoto (2003)
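The head-table conversion mentioned above can be sketched as follows. This is a minimal illustration, assuming a tiny hand-written head table (`HEAD_RULES`) and a hand-built constituent tree; neither is the table or parser output used in the original work. Each constituent picks one child as its head, and the lexical heads of the other children become dependents of that head.

```python
# Illustrative head table (an assumption, far smaller than a real one):
# for each constituent label, the scan direction and the child labels
# that may serve as the head.
HEAD_RULES = {
    "S": ("left", {"VP"}),
    "VP": ("left", {"VBP", "VBZ", "VBD", "VB", "VP"}),
    "NP": ("right", {"NN", "NNS", "NNP", "PRP", "NP"}),
}


def lexical_head(tree, words, deps):
    """Return the word index of the tree's lexical head.

    Leaves are (tag, word); internal nodes are (label, [children]).
    Appends words in sentence order and (dependent, head) index pairs.
    """
    label, rest = tree
    if isinstance(rest, str):  # leaf: record the word, return its index
        words.append(rest)
        return len(words) - 1
    child_heads = [lexical_head(child, words, deps) for child in rest]
    direction, candidates = HEAD_RULES[label]
    order = list(range(len(rest)))
    if direction == "right":
        order.reverse()
    # First child (in scan order) with a candidate label; else the first scanned.
    head_pos = next((i for i in order if rest[i][0] in candidates), order[0])
    head = child_heads[head_pos]
    deps.extend((h, head) for i, h in enumerate(child_heads) if i != head_pos)
    return head


TREE = ("S", [
    ("NP", [("PRP", "We")]),
    ("VP", [
        ("VBP", "eat"),
        ("NP", [("DT", "the"), ("NN", "cheese"), ("NN", "sandwich")]),
    ]),
])

words, deps = [], []
root = lexical_head(TREE, words, deps)
for dep, head in deps:
    print(f"{words[dep]} -> {words[head]}")
print("root:", words[root])
```

The same conversion serves two purposes: it turns Penn Treebank trees into dependency training data, and it turns a constituent parser's output into unlabeled dependencies.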
Unlabeled Dependency Identification

Example sentence: We eat the cheese sandwich
Domain Issues
  • Parser training data is in a very different domain
    • Wall Street Journal (WSJ) text vs. parent-child dialogs
  • Domain specific training data would be better
    • But would have to be created (manually)
  • Performance is acceptable
    • Shorter, simpler sentences
    • Unlabeled dependency accuracy
      • WSJ test data: 92%
      • CHILDES data (2,000 words): 90%
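Unlabeled dependency accuracy, as reported above, is the fraction of words whose predicted head matches the gold-standard head. A minimal sketch, assuming each sentence is given as a list of head indices (with -1 for the root); the function name and the toy data are illustrative only.

```python
def unlabeled_accuracy(gold_heads, predicted_heads):
    """Fraction of words whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted analyses differ in length")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# "We eat the cheese sandwich": toy gold heads and a prediction with one
# attachment error ("cheese" attached to "eat" instead of "sandwich").
gold = [1, -1, 4, 4, 1]
predicted = [1, -1, 4, 1, 1]
print(unlabeled_accuracy(gold, predicted))
```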
Final Step: Dependency Labeling
  • Training data is required
  • Labeling dependencies is easier than finding unlabeled dependencies
    • Less training data is needed for labeling than for full labeled dependency parsing
  • Use a classifier
    • TiMBL (Daelemans et al., 2004)
    • Extract features from unlabeled dependency structure
    • GR labels are target classes
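To make the classification setup concrete, here is a toy memory-based classifier: it stores labeled instances and assigns a new instance the label of its nearest stored neighbor under a simple feature-overlap distance. This is a sketch of the general idea behind memory-based learning, not TiMBL's interface; the function names and the three stored instances are made up for illustration.

```python
def overlap_distance(a, b):
    """Number of feature positions where two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(instance, memory):
    """Return the label of the stored instance closest to `instance`."""
    _, label = min(memory, key=lambda ex: overlap_distance(instance, ex[0]))
    return label


# Toy training memory: (feature tuple, GR label). Values are hand-made.
MEMORY = [
    (("we", "pro", "eat", "v", "before", 1, "S"), "SUBJ"),
    (("the", "det", "sandwich", "n", "before", 2, "NP"), "DET"),
    (("sandwich", "n", "eat", "v", "after", 3, "VP"), "OBJ"),
]

print(classify(("they", "pro", "eat", "v", "before", 1, "S"), MEMORY))
```

Because each unlabeled dependency becomes one classification instance, even a few thousand annotated words yield a usable number of training examples.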
Features Used for GR Labeling
  • Head and dependent words
    • Also their POS tags
  • Whether the dependent comes before or after the head
  • How far the dependent is from the head
  • The label of the lowest node in the constituent tree that includes both the head and dependent
Features Used for GR Labeling

Consider the words “we” and “eat”

Features: we, pro, eat, v, before, 1, S

Class: SUBJ
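The feature vector in this example can be assembled as in the sketch below. The `Token` type and the `gr_features` function are hypothetical names, and the label of the lowest common constituent node is passed in directly, on the assumption that it has already been read off the parser's constituent tree.

```python
from typing import NamedTuple


class Token(NamedTuple):
    index: int  # position of the word in the utterance
    word: str
    pos: str


def gr_features(dependent, head, lowest_common_label):
    """Build the feature tuple for one unlabeled dependency."""
    direction = "before" if dependent.index < head.index else "after"
    distance = abs(head.index - dependent.index)
    return (dependent.word, dependent.pos, head.word, head.pos,
            direction, distance, lowest_common_label)


we = Token(0, "we", "pro")
eat = Token(1, "eat", "v")
print(gr_features(we, eat, "S"))
```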

Good GR Labeling Results with Small Training Set
  • 5,000 words for training
  • 2,000 words for testing
  • Accuracy of dependency labeling (on perfect dependencies): 91.4%
  • Overall accuracy (Charniak parser + dependency labeling): 86.9%
Some GRs Are Easier Than Others
  • Overall accuracy: 86.9%
  • Easily identifiable GRs
    • DET, POBJ, INF, NEG: Precision and recall above 98%
  • Difficult GRs
    • COMP, XCOMP: below 65%
    • These account for less than 4% of the GRs seen in the training and test sets
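Per-GR precision and recall can be computed as sketched here, assuming gold and predicted analyses are sets of (dependent, head, label) triples. The function name and the toy analyses are illustrative, not taken from the evaluation itself.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one GR label over (dep, head, label) triples."""
    gold_l = {t for t in gold if t[2] == label}
    pred_l = {t for t in predicted if t[2] == label}
    correct = len(gold_l & pred_l)
    precision = correct / len(pred_l) if pred_l else 0.0
    recall = correct / len(gold_l) if gold_l else 0.0
    return precision, recall


# Toy analyses of "We eat the cheese sandwich" with one labeling/attachment error.
GOLD = {(0, 1, "SUBJ"), (2, 4, "DET"), (3, 4, "MOD"), (4, 1, "OBJ")}
PRED = {(0, 1, "SUBJ"), (2, 4, "DET"), (3, 1, "OBJ"), (4, 1, "OBJ")}

for gr in ("DET", "OBJ", "MOD"):
    print(gr, precision_recall(GOLD, PRED, gr))
```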
Index of Productive Syntax (IPSyn) (Scarborough, 1990)
  • A measure of child language development
  • Assigns a numerical score for grammatical complexity (from 0 to 112 points)

  • Used in hundreds of studies
IPSyn Measures Syntactic Development
  • IPSyn: Designed for investigating differences in language acquisition
    • Differences in groups (for example: bilingual children)
    • Individual differences (for example: delayed language development)
    • Focus on syntax
  • Addresses weaknesses of Mean Length of Utterance (MLU)
    • MLU surprisingly useful until age 3, then reaches ceiling (or becomes unreliable)
  • IPSyn is very time-consuming to compute
Computing IPSyn (manually)
  • Corpus of 100 transcribed utterances
    • Consecutive, no repetitions
  • Identify 56 specific language structures (IPSyn Items)
    • Examples:
      • Presence of auxiliaries or modals
      • Inverted auxiliary in a wh-question
      • Conjoined clauses
      • Fronted or center-embedded subordinate clauses
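    • Count occurrences of each item (zero, one, two or more)
  • Add up the counts, with each item contributing at most two points (56 items, so at most 112 points)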
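The scoring arithmetic can be sketched as below, assuming each of the 56 items contributes its occurrence count capped at two, which is consistent with the 0 to 112 range. The item names in the example are placeholders, not the official IPSyn item codes.

```python
def ipsyn_score(item_counts):
    """Sum per-item points, where each item earns min(count, 2) points."""
    return sum(min(count, 2) for count in item_counts.values())


# Placeholder item names with made-up occurrence counts.
counts = {
    "auxiliary_or_modal": 5,        # two or more occurrences -> 2 points
    "inverted_aux_in_wh_question": 1,  # one occurrence -> 1 point
    "conjoined_clauses": 0,         # not found -> 0 points
}
print(ipsyn_score(counts))

# Upper bound: every one of 56 items found at least twice.
print(ipsyn_score({f"item_{i}": 3 for i in range(56)}))
```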
Automating IPSyn
  • Existing state of manual computation
    • Spreadsheets
    • Search each sentence for language structures
    • Use part-of-speech tagging to narrow down the number of sentences for certain structures
      • For example: Verb + Noun, Determiner + Adjective + Noun
  • Can’t we just use part-of-speech tagging?
    • Only one other automated implementation of IPSyn exists, and it uses only words and POS tags
Automating IPSyn without Syntactic Analysis
  • Use patterns of words and parts-of-speech to find language structures
    • Computerized Profiling, or CP (Long, Fey and Channell, 2004)
    • Works well for many IPSyn items
      • Det + Adjective + Noun sequence
    • But does not work very well for several important items
      • Fronted or center-embedded subordinate clauses
      • Inverted auxiliary in a wh-question
    • Cuts down manual work significantly (good)
    • Fully automatic IPSyn scores only somewhat accurate (not so good)
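A word-and-POS pattern of the kind described above can be as simple as a sliding-window match over the tag sequence. The sketch below is a generic illustration, not Computerized Profiling's implementation; the tag names (`det`, `adj`, `n`) and the function name are assumptions.

```python
def find_pos_sequence(tagged, pattern):
    """Return start positions where the POS tags match `pattern` exactly."""
    tags = [pos for _, pos in tagged]
    n = len(pattern)
    return [i for i in range(len(tags) - n + 1) if tags[i:i + n] == list(pattern)]


utterance = [("we", "pro"), ("eat", "v"), ("the", "det"),
             ("big", "adj"), ("sandwich", "n")]
print(find_pos_sequence(utterance, ("det", "adj", "n")))
```

Flat patterns like this work for local sequences such as Det + Adjective + Noun, but they have no way to express which clause a word belongs to, which is what items like fronted subordinate clauses depend on.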
Some IPSyn Items Require Syntactic Analysis for Reliable Recognition (and some don’t)
  • Determiner + Adjective + Noun
  • Auxiliary verb
  • Adverb modifying adjective or nominal
  • Subject + Verb + Object
  • Sentence with 3 clauses
  • Conjoined sentences
  • Wh-question with inverted auxiliary/modal/copula
  • Relative clauses
  • Propositional complements
  • Fronted subordinate clauses
  • Center-embedded clauses
Automating IPSyn with Grammatical Relation Analyses
  • Search for language structures using patterns that involve POS tags and GRs (labeled dependencies)
    • Still room for under- and over-generalization, but patterns are easier to write and more reliable
  • Examples
    • Wh-embedded clauses: search for wh-words whose head (or transitive head) is a dependent in a GR of types [XC]SUBJ, [XC]PRED, [XC]JCT, [XC]MOD, COMP or XCOMP
    • Relative clauses: search for a CMOD where the dependent is to the right of the head
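The two example patterns can be written directly over labeled dependencies, as in the sketch below. It assumes each token stores the index of its head and the GR label of that dependency; the `Tok` type, the wh-word list, and the two hand-built analyses are illustrative assumptions rather than output of the actual system.

```python
from typing import NamedTuple, Optional


class Tok(NamedTuple):
    word: str
    head: Optional[int]  # index of the head token, None for the root
    label: str           # GR label of the dependency to the head

# Expansion of [XC]SUBJ, [XC]PRED, [XC]JCT, [XC]MOD, plus COMP and XCOMP.
CLAUSAL = {"XSUBJ", "CSUBJ", "XPRED", "CPRED", "XJCT", "CJCT",
           "XMOD", "CMOD", "COMP", "XCOMP"}
WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}


def has_relative_clause(tokens):
    """A CMOD whose dependent is to the right of its head."""
    return any(t.label == "CMOD" and t.head is not None and i > t.head
               for i, t in enumerate(tokens))


def has_wh_embedded_clause(tokens):
    """A wh-word whose head, or a transitive head, bears a clausal GR."""
    for t in tokens:
        if t.word.lower() not in WH_WORDS:
            continue
        j, seen = t.head, set()
        while j is not None and j not in seen:  # walk up the head chain
            seen.add(j)
            if tokens[j].label in CLAUSAL:
                return True
            j = tokens[j].head
    return False


# Hand-built analyses (assumptions): "This is the car I saw" / "I know what he wants"
REL = [Tok("This", 1, "SUBJ"), Tok("is", None, "ROOT"), Tok("the", 3, "DET"),
       Tok("car", 1, "PRED"), Tok("I", 5, "SUBJ"), Tok("saw", 3, "CMOD")]
WH = [Tok("I", 1, "SUBJ"), Tok("know", None, "ROOT"), Tok("what", 4, "OBJ"),
      Tok("he", 4, "SUBJ"), Tok("wants", 1, "COMP")]

print(has_relative_clause(REL), has_wh_embedded_clause(REL))
print(has_relative_clause(WH), has_wh_embedded_clause(WH))
```

Each pattern is a few lines because the parser has already resolved clause structure; the pattern is only as reliable as the GR analysis it runs on.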
Evaluation Data
  • Two sets of transcripts with IPSyn scoring from two different child language research groups
  • Set A
    • Scored fully manually
    • 20 transcripts
    • Ages: about 3 yrs.
  • Set B
    • Scored with CP first, then manually corrected
    • 25 transcripts
    • Ages: about 8 yrs.

(Two transcripts in each set were held out for development and debugging)

Evaluation Metrics: Point Difference
  • Point difference
    • The absolute difference, in points, between the score produced by our system and the score computed manually
    • Simple, and shows how close the automatic scores are to the manual scores
    • Acceptable range
      • Smaller for older children
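Point difference is simple arithmetic; a minimal sketch follows, with a helper that averages it over a set of transcripts. The function names are hypothetical and the scores are made-up numbers, not results from the evaluation.

```python
def point_difference(system_score, manual_score):
    """Absolute difference between automatic and manual IPSyn scores."""
    return abs(system_score - manual_score)


def mean_point_difference(score_pairs):
    """Average point difference over (system, manual) score pairs."""
    return sum(point_difference(s, m) for s, m in score_pairs) / len(score_pairs)


# Made-up (system, manual) scores for three transcripts.
pairs = [(78, 80), (91, 88), (60, 60)]
print([point_difference(s, m) for s, m in pairs])
print(mean_point_difference(pairs))
```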
Evaluation Metrics: Point-to-Point Accuracy
  • Point-to-point accuracy
    • Reflects overall reliability over each scoring decision made in the computation of IPSyn scores
    • Scoring decisions: presence or absence of language structures in the transcript

$$\text{Point-to-Point Acc} = \frac{C(\text{Correct Decisions})}{C(\text{Total Decisions})}$$

    • Commonly used for assessing inter-rater reliability among human scorers (for IPSyn, about 94%).
  • IPSyn scores from
    • Our GR-based system (GR)
    • Manual scoring (HUMAN)
    • Computerized Profiling (CP)
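Point-to-point accuracy, as defined above, can be sketched like this, assuming each scoring decision is stored as a boolean (structure present or absent) keyed by a decision name. The names and the toy decisions are placeholders for illustration.

```python
def point_to_point_accuracy(system_decisions, manual_decisions):
    """Share of scoring decisions on which system and human scorer agree."""
    if system_decisions.keys() != manual_decisions.keys():
        raise ValueError("both scorings must cover the same decisions")
    correct = sum(system_decisions[k] == manual_decisions[k]
                  for k in manual_decisions)
    return correct / len(manual_decisions)


# Toy decisions: True = structure judged present in the transcript.
HUMAN = {"aux": True, "rel_clause": False, "wh_inversion": True, "conj": False}
GR = {"aux": True, "rel_clause": True, "wh_inversion": True, "conj": False}
print(point_to_point_accuracy(GR, HUMAN))
```

Unlike point difference, this metric cannot hide errors that cancel out: a missed structure and a spurious one both count against the system even if the total score happens to match.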
Error Analysis: Four Problematic Items Cause Half of the Errors
  • Four (of 56) IPSyn items account for about half of all mistakes made by our GR-based system
(a) Propositional complement: 16.9%

“I said you can go now.”

(b) Copula/Modal/Aux for emphasis or ellipsis: 12.3%

“I thought he ate his cake, but he didn’t.”

(c) Relative clause: 10.6%

“This is the car I saw.”

(d) Bitransitive predicate: 5.8%

“I gave her the book.”

(a), (c), (d): Incorrect GR analysis

(b): Imperfect search pattern

Conclusion and Future Work
  • We can annotate transcripts of child language with Grammatical Relations using current NLP tools and a small amount of manually annotated data
  • The reliability of an automated version of IPSyn that uses CHILDES GRs is close to that of human scoring
  • GR analysis still needs work
    • More training data
    • Other parsing techniques
  • Use of GR-based IPSyn by child language researchers should reveal additional problem areas

Charniak, E. 2000. A maximum-entropy-inspired parser. Proceedings of the First Annual Meeting of the North American Chapter of the Association for Computational Linguistics. Seattle, WA.

Daelemans, W., Zavrel, J., van der Sloot, K., and van den Bosch, A. 2004. TiMBL: Tilburg Memory Based Learner, version 5.1, Reference Guide. ILK Research Group Technical Report Series, no. 04-02.

Long, S. H., Fey, M. E., Channell, R. W. 2004. Computerized Profiling (version 9.6.0). Cleveland, OH: Case Western Reserve University.

MacWhinney, B. 2000. The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates.

Marcus, M. P., Santorini, B., Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19.

Nivre, J., Scholz, M. 2004. Deterministic parsing of English text. Proceedings of the International Conference on Computational Linguistics (pp. 64-70). Geneva, Switzerland.

Sagae, K., MacWhinney, B., Lavie, A. 2004. Adding syntactic annotations to transcripts of parent-child dialogs. Proceedings of the Fourth International Conference on Language Resources and Evaluation. Lisbon, Portugal.

Scarborough, H. S. 1990. Index of Productive Syntax. Applied Psycholinguistics, 11, 1-22.

Where POS Tagging Is Not Enough
  • Sentences with the same POS sequence may have different structures
    • Before [,] he told the man he was cold.
    • Before he told the story [,] he was cold.
  • Some syntactic structures are difficult to recognize using only POS tags and words
    • Search patterns may under- and over-generate
    • Using syntactic analysis is easier and more reliable
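The two example sentences can be used to show the problem concretely. In the sketch below the tags are hand-assigned, on the assumption that a tagger without syntactic context gives "before" the same tag in both sentences; the clause-break positions are read off the bracketed commas in the examples.

```python
def pos_sequence(tagged):
    """Project a tagged sentence onto its POS tags only."""
    return [pos for _, pos in tagged]


# Hand-assigned tags (an assumption for illustration).
SENT_A = [("before", "conj"), ("he", "pro"), ("told", "v"), ("the", "det"),
          ("man", "n"), ("he", "pro"), ("was", "v"), ("cold", "adj")]
SENT_B = [("before", "conj"), ("he", "pro"), ("told", "v"), ("the", "det"),
          ("story", "n"), ("he", "pro"), ("was", "v"), ("cold", "adj")]

# Number of words before the clause boundary marked by [,] in each example.
CLAUSE_BREAK_A = 1  # Before [,] he told the man he was cold.
CLAUSE_BREAK_B = 5  # Before he told the story [,] he was cold.

same_tags = pos_sequence(SENT_A) == pos_sequence(SENT_B)
print("identical POS sequences:", same_tags)
print("identical clause boundaries:", CLAUSE_BREAK_A == CLAUSE_BREAK_B)
```

Any search pattern that looks only at the tag sequence must treat the two sentences alike, so it either finds a fronted subordinate clause in both or in neither.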