1 / 25

The PDT Morphology and Surface Syntax

The PDT Morphology and Surface Syntax. Jan Haji č Institute of Formal and Applied Linguistics School of Computer Science Faculty of Mathematics and Physics Charles University, Prague Czech Republic. Morphology (m-layer). Prerequisites for the manual annotation process: Tokenized data

acton
Download Presentation

The PDT Morphology and Surface Syntax

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The PDTMorphology and Surface Syntax Jan Hajič Institute of Formal and Applied Linguistics School of Computer Science Faculty of Mathematics and Physics Charles University, Prague Czech Republic Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  2. Morphology (m-layer) • Prerequisites for the manual annotation process: • Tokenized data • Annotation guidelines • Annotation tool • Manual decision making support • Offline (or online) morphological analyzer • Quality checking tool • Process description • Results (manually annotated data) to be used for... • tagger training, linguistic research, basis for further annotation, ... Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  3. Morphological Attributes Ex.: nejnezajímavějším “(to) the most uninteresting” • Tag: 13 categories • Example: AAFP3----3N---- Adjective no poss. Gendernegated Regular no poss. Numberno voice Feminine no personreserve1 Pluralno tensereserve2 Dative superlativebase var. • Lemma: POS-unique identifier Books/verb -> book-1, went -> go, to/prep. -> to-1 Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  4. Morphological Tagset • 13 categories, 4452 plausible tags (combinations): Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  5. Morphological Analysis • Formally: MA: A+→ Pow(L x T) • MA(f) = { [ l,t ] }; • f  A+ (the token), • l  L (lemma), • t  T (tag) • tokens taken in isolation • no attempt to solve e.g. auxiliaries vs. full verbs • Ex.: MA(“má“) = { [mít,VB-S---3P-AA---],lit. “to have” lit. “has”,”my”[můj,PSFS1-S1------1], lit. “my” [můj,PSFS5-S1------1], [můj,PSNP1-S1------1], [můj,PSNP4-S1------1], [můj,PSNP5-S1------1] } Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  6. Morphological Analysis:Implementation • Dictionary-based • covers 800kW (lemmas), ~ 20 mil. forms (w/tag) • C code implementation • standard (regular) derivations on-the-fly; ex.: • spojit spojený spojený spojenost spojitelný spojitelný spojitelnost • irregular forms listed in dictionary (w/tags) • no phonological processing(concatenation only) • grammatical prefixes only: negation, superlative joinedly join joined joinedliness joinably joinable joinability Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  7. The Morphological Annotation Tool (LAW) Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  8. The Process ofMorphological Annotation • From tokenized to annotated text: tokenized text (auto, w-layer) (Auto) morphological analysis morphological dictionary Manual morphological disambiguation (DA) text w/morph. interpretations annotation guidelines text w/select. interpretation annotated text (m-layer) Manual adjudication Companions Semantic Representation and Dialog Interfacing Workshop - Morphology and Surface Syntax

  9. PDT – Syntactic Annotation • Surface syntax annotation • Dependency surface syntax • Comparable to Penn Treebank annotation • Convertible: dependency ↔ parse trees • Deep syntactic/semantic annotation • Dependency trees • Different topology • High level of generalization and formalization • Many node attributes

  10. governor dependent Analytical Syntax (a-layer) • Dependency + Analytical Function The influence of the Mexican crisis on Central and Eastern Europe has apparently been underestimated.

  11. Analytical Syntax: Functions • Main (for [main] semantic lexemes): • Pred, Sb, Obj, Adv, Atr, Atv(V), AuxV, Pnom • “Double” dependency: AtrAdv, AtrObj, AtrAtr • Special (function words, punctuation,...): • Reflefives, particles: AuxT, AuxR, AuxO, AuxZ, AuxY • Prepositions/Conjunctions: AuxP, AuxC • Punctuation, Graphics: AuxX, AuxS, AuxG, AuxK • Structural • Elipsis: ExD, Coordination etc.: Coord, Apos

  12. Example • All came from Cray Research.

  13. Surface Syntax Example • Complete sentence: Sb, Pred, Obj • Resistance needs courage.

  14. Surface Syntax Example • Analytical verb form: • he would be allowed to be enrolled

  15. Surface Syntax Example • Predicate with copula (state)‏ • you were fired

  16. Surface Syntax Example • Passive construction (action)‏ • (The) book has been translated [by Mr. X]

  17. Surface Syntax Example • Complement • she left crying

  18. Surface Syntax Example • Object • he gave Mary a book

  19. Surface Syntax Example • Object used for infinitive of analytical verb forms • he wants to learn

  20. Surface Syntax Example • Relative clause (embedded)‏ • the woman, who had a French accent, was very pretty

  21. Surface Syntax Example • Coordination • ... (to) magic, mysticism(,) etc.

  22. Surface Syntax Example • Apposition • cheap, i.e. under five dollars

  23. Surface Syntax Example • Incomplete phrases • Peter works well, but Paul badly

  24. Surface Syntax Example • Variants (equality)‏ • he bought shoes for his son

  25. XML Annotation Layers (English) • Strictly top-down links • w+m+a can be easily “knitted” • API for cross-layer access (programming)‏ • PML Schema / Relax NG • [With slight modification, can be used for spoken data (audio as layer “-1”)]

More Related