Prague Dependency Treebank 1.0

CD-ROM PRESENTATION

Dec 18, 2000

Functional Generative Description
  • theoretical framework based on the findings of European structural linguistics, esp. of the classical Prague School
  • methodological requirements of a formal description
  • levels:
    • tectogrammatical (underlying) representations (TRs) with dependency-based syntax
    • morphemics
    • phonemics and phonetics
  • TRs (see Sgall, Hajičová and Panevová 1986; formally specified by Petkevič, also in a declarative way)

Dependency tree

My younger brother arrived there yesterday.

Linearized form, in a one-to-one relation to the tree:

((I)Appurt (younger)Rstr brother)Act arrive.Pret.Indic (Dir there) (Temp yesterday)
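
The bracketing convention can be made concrete with a small sketch. It assumes a hypothetical Node class holding a label (lexical meaning plus grammatemes), a functor, and separate lists of left and right dependents; the functor is written at the parenthesis facing the head. The class and function names are illustrative, not part of any PDT tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Simplified tectogrammatical node (hypothetical representation)."""
    label: str                    # lexical meaning + grammatemes, e.g. "arrive.Pret.Indic"
    functor: str = ""             # e.g. "Act", "Dir"; empty for the root
    left: List["Node"] = field(default_factory=list)   # dependents left of the head
    right: List["Node"] = field(default_factory=list)  # dependents right of the head


def linearize(node: Node, side: str = "root") -> str:
    """Bracket a subtree, writing the functor at the parenthesis that faces the head."""
    parts = [linearize(d, "left") for d in node.left]
    parts.append(node.label)
    parts += [linearize(d, "right") for d in node.right]
    inner = " ".join(parts)
    if side == "root":
        return inner
    if side == "left":                  # head lies to the right: "(...)Functor"
        return f"({inner}){node.functor}"
    return f"({node.functor} {inner})"  # head lies to the left: "(Functor ...)"


brother = Node("brother", "Act", left=[Node("I", "Appurt"), Node("younger", "Rstr")])
root = Node("arrive.Pret.Indic", left=[brother],
            right=[Node("there", "Dir"), Node("yesterday", "Temp")])
print(linearize(root))
# ((I)Appurt (younger)Rstr brother)Act arrive.Pret.Indic (Dir there) (Temp yesterday)
```

Because the dependents keep their left/right position, the mapping back from the string to the tree is unambiguous, which is what makes the one-to-one relation possible.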

Dependency Tree
  • labels - lexical meanings (abstract symbols) with indices
    • functors
      • written as subscripts at the parenthesis oriented towards the head
    • grammatemes - values of morphological categories
      • Tense, Modality, Number, Definiteness, etc.
  • projectivity
  • valency
    • arguments (inner participants) and adjuncts (circumstantials or 'free modifications')
    • obligatory and optional with a given head
    • deletable or not

Dependency Tree
  • adjuncts
    • Locative, several Directional and Temporal modifications
    • Condition, Means, Manner, etc.
  • participants (arguments) of verbs
    • Actor/Bearer (underlying subject)
    • Objective (Patient, underlying direct object)
    • Addressee (underlying indirect object)
    • Effect ('second' object: to choose so. as sth.)
    • Origin (to make sth. out of sth.)

Dependency Tree
  • complementations dependent mainly on nouns
    • inner participants
      • Material (Partitive): two baskets of sth.
      • Identity: the river Danube; the notion of operator
    • free modifications
      • Possession (Appurtenance): my table; Jim's brother
      • Restrictive: rich man
      • Descriptive: the Swedes, who are a Scandinavian nation

Dependency Tree
  • syntactic grammatemes
    • Loc, Dir - in, on, under, between...
    • Regard - with, without
  • operational (testable) criteria
    • for distinguishing
      • arguments from adjuncts
      • individual arguments from each other
    • deletability (dialogue test)

Simplified valency frames
  • brother N Appurt
  • man N
  • glass N Material
  • full A Material
  • read V Act Addr Obj
  • change V Act Obj Orig Eff
  • give V Act Addr Obj

obligatory complementations in blue
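
A valency frame can be stored as a small lexicon entry and used to check a node's dependents. The sketch below assumes a hypothetical VALENCY dictionary built from the frames above; since the colour coding is not preserved here, the split into obligatory and optional slots is an illustrative assumption, not the slide's marking.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon: lemma -> (part of speech, obligatory functors, optional functors).
# The obligatory/optional split is illustrative only.
VALENCY: Dict[str, Tuple[str, Set[str], Set[str]]] = {
    "brother": ("N", set(), {"Appurt"}),
    "man": ("N", set(), set()),
    "glass": ("N", set(), {"Material"}),
    "full": ("A", {"Material"}, set()),
    "read": ("V", {"Act", "Obj"}, {"Addr"}),
    "change": ("V", {"Act", "Obj"}, {"Orig", "Eff"}),
    "give": ("V", {"Act", "Addr", "Obj"}, set()),
}

# Functors treated as inner participants; others (Temp, Loc, ...) are adjuncts,
# which may combine with any head.
PARTICIPANTS = {"Act", "Obj", "Addr", "Eff", "Orig", "Material"}


def check_frame(lemma: str, functors: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (obligatory slots not filled, participants the frame does not license)."""
    _, obligatory, optional = VALENCY[lemma]
    present = set(functors)
    missing = obligatory - present
    unlicensed = {f for f in present if f in PARTICIPANTS and f not in obligatory | optional}
    return missing, unlicensed


print(check_frame("give", ["Act", "Obj", "Temp"]))  # ({'Addr'}, set())
print(check_frame("read", ["Act", "Obj", "Eff"]))   # (set(), {'Eff'})
```

A missing obligatory slot in a surface sentence would correspond to a deletion that has to be restored in the TR.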

Topic-focus articulation
  • contextual boundness
    • main verb CB/NB (T/F)
    • dependents to the left/right
  • communicative dynamism
    • left-right (mother, sisters, transitive)
    • partial ordering
  • underlying word order
    • left-right
    • linear ordering

The left-to-right order of nodes, together with the index T or (prototypically) F, indicates the TFA of the sentence (of its TR).
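
A minimal sketch of the idea for a single clause level, assuming each item is a (label, T/F) pair: contextually bound dependents go to the left of the main verb, non-bound ones to its right, and the topic/focus split is read off that order. Deeper levels, the ordering by communicative dynamism within each side, and contrastive values are ignored; the function names and the T/F values in the example are hypothetical.

```python
from typing import List, Tuple

Item = Tuple[str, str]  # (label, "T" = contextually bound / "F" = non-bound)


def underlying_order(verb: Item, dependents: List[Item]) -> List[Item]:
    """Place T dependents left of the verb and F dependents to its right,
    keeping their relative order."""
    left = [d for d in dependents if d[1] == "T"]
    right = [d for d in dependents if d[1] == "F"]
    return left + [verb] + right


def topic_focus(verb: Item, dependents: List[Item]) -> Tuple[List[str], List[str]]:
    """Split a one-level clause into topic (T items) and focus (F items)."""
    order = underlying_order(verb, dependents)
    topic = [w for w, tfa in order if tfa == "T"]
    focus = [w for w, tfa in order if tfa == "F"]
    return topic, focus


# The T/F values below are chosen for illustration only.
print(topic_focus(("arrive", "F"), [("brother", "T"), ("yesterday", "F"), ("there", "T")]))
# (['brother', 'there'], ['arrive', 'yesterday'])
```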

Topic-focus articulation
  • TFA is one of the basic aspects of underlying structures

Complex sentence
  • a subordinate (dependent) clause, i.e. its main verb, depends on a word contained in its governing clause

My brother, whom you know, arrived there yesterday.

Complex sentence
  • function (synsemantic) words are viewed as function morphemes, syntactically fixed to certain lexical (autosemantic) words: prepositions and articles to nouns, conjunctions and auxiliaries to verbs

Martin came there late, since he had to accompany his sick mother.

Complex sentence

Martin arrived late to the session, since he had to accompany his sick mother.

Schematically (morphemes):

Martin arrive.ed late to the session since he have.ed to accompany he.s sick mother.

(the dot marks a close connection of morphemes, 'semes')

  • deleted items restored
    • order of items - difference between 'underlying' and surface (morphemic) word order
    • transductive components - Panevová, Oliva, Borota
  • coordination (multidimensional)
    • Jim and Mary, who have two children, went to Boston.
    • the linearized notation is adequate:
    • ((Jim Mary)Conj ((who)Act have (Pat (two)Rstr children)))Act went (Dir Boston)
  • structures close to Boolean ones, i.e. no complex 'innate properties' specific to natural language are needed

Prague Dependency Treebank - corpus annotation
  • an intermediate level - 'analytical' representations
    • dependency trees, not always projective
    • nodes for all word tokens, even for punctuation marks
  • tectogrammatical tree: coordinating conjunction as the head
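
Since analytical trees are not always projective, a projectivity test is a natural check. The sketch below uses the standard definition (every word between a head and its dependent must be dominated by that head) and assumes a hypothetical heads-array encoding with 0 as the technical root; it is not a PDT tool.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """heads[i - 1] is the head of word i (words are numbered 1..n; 0 is the root).

    An edge head-dependent is projective if every word lying between them in
    the sentence is (transitively) dominated by the head; a tree is projective
    if all of its edges are.
    """
    def dominated_by(h: int, w: int) -> bool:
        # Climb from w towards the root and report whether h is passed.
        while w != 0:
            w = heads[w - 1]
            if w == h:
                return True
        return False

    for d, h in enumerate(heads, start=1):
        for w in range(min(h, d) + 1, max(h, d)):
            if not dominated_by(h, w):
                return False
    return True


print(is_projective([2, 0, 2]))     # True
print(is_projective([3, 4, 0, 3]))  # False: word 3 lies between 4 and its dependent 2
```

The check is quadratic per edge, which is unproblematic for sentence-sized trees.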

Morphological Layer

ACKNOWLEDGEMENTS

ANNOTATED CORPORA

  • PDT version 1.0, 2000 (1996 - 2000)
  • Penn Treebank, release 3, 1999 (1989 - 1999)

TAG SETs

Czech - an inflective language with highly ambiguous word forms

nový, nového, novému, novém, novým, nová, nové, novou, nových, novým, novými, … novější, novějšího, novějšímu, novějším, …, nejnovější, nejnovějšího, nejnovějšímu, nejnovějším, …, nejnovějších, nejnovějším, …

English - a language with poor inflection

work, works, worked, working

TEXT SOURCES
  • PDT: Czech texts taken from the Czech National Corpus
    • Lidové noviny
    • Mladá Fronta Dnes
    • Vesmír
    • Českomoravský Profit
  • Penn Treebank: English texts
    • '88, '89 WSJ articles
    • Air Travel Information System transcripts
    • Brown Corpus
    • Switchboard transcripts


ANNOTATION STRATEGY - Penn Treebank

  • TEXT
  • automatic tagging: Ken Church's stochastic tagger, Eric Brill's transformation-based tagger
  • corrections by annotators (GNU Emacs Lisp based package)

ANNOTATION STRATEGY - PDT

  • Automatic Morphological Analyzer (AMA)
  • two independent annotators; Linux and Windows tools
  • differences resolved by a third annotator
  • comparison with the current AMA; manual resolution; Windows tools
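
The double-annotation step can be sketched as a merge of two aligned tag sequences, with disagreements handed to a third party. In the sketch, adjudicate and the third callable (a stand-in for the human third annotator) are hypothetical; the tags come from the PDT sample, and the second annotator's error is invented for illustration.

```python
from typing import Callable, List, Tuple


def adjudicate(tokens: List[str], tags_a: List[str], tags_b: List[str],
               third: Callable[[str, str, str], str]) -> Tuple[List[str], float]:
    """Merge two independent annotations; disagreements are passed to `third`.
    Returns the merged tags and the raw agreement between the two annotators."""
    if not (len(tokens) == len(tags_a) == len(tags_b)):
        raise ValueError("annotations must be aligned token by token")
    merged, disagreements = [], 0
    for tok, a, b in zip(tokens, tags_a, tags_b):
        if a == b:
            merged.append(a)
        else:
            disagreements += 1
            merged.append(third(tok, a, b))
    agreement = 1 - disagreements / len(tokens) if tokens else 1.0
    return merged, agreement


tokens = ["Pokus", "o", "zázrak", "."]
a = ["NNIS1-----A----", "RR--4----------", "NNIS4-----A----", "Z:-------------"]
b = ["NNIS1-----A----", "RR--4----------", "NNIS1-----A----", "Z:-------------"]
merged, agreement = adjudicate(tokens, a, b, third=lambda tok, x, y: x)
print(agreement)  # 0.75
```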

INTERNAL FORMAT

  • PDT: SGML coding, csts DTD
  • Penn Treebank: word/tag(|tag)*
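
The word/tag(|tag)* pattern can be read with a few lines of code: each whitespace-separated token is split at its last slash, and the tag part may list several tags separated by "|". The function name below is hypothetical.

```python
from typing import List, Tuple


def parse_tagged_line(line: str) -> List[Tuple[str, List[str]]]:
    """Parse whitespace-separated word/tag(|tag)* tokens into (word, [tags]).
    Splitting at the last '/' keeps any slash that is part of the word."""
    result = []
    for token in line.split():
        word, sep, tags = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        result.append((word, tags.split("|")))
    return result


print(parse_tagged_line("The/DT envelope/NN arrives/VBZ in/IN the/DT mail/NN ./."))
print(parse_tagged_line("that/IN|DT"))  # [('that', ['IN', 'DT'])]
```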


SAMPLES

PDT (csts):

<s id="ln95040:020-p1s1">
<f>Pokus<l>pokus<t>NNIS1-----A----
<f>o<l>o<t>RR--4----------
<f>zázrak<l>zázrak<t>NNIS4-----A----
<d>.<l>.<t>Z:-------------

("Pokus o zázrak." = "An attempt at a miracle.")

Penn Treebank:

The/DT envelope/NN arrives/VBZ in/IN the/DT mail/NN ./.
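
The PDT sample lines can be read with a regular expression and the positional tag split into named positions. This sketch assumes the 15-position tag layout (POS, SubPOS, gender, number, case, ...); check the position names against the PDT tag documentation. Real csts files carry more elements than this sample, and the function names are hypothetical.

```python
import re
from typing import Dict, List, NamedTuple

# Assumed names of the 15 positions of the PDT positional tag.
TAG_POSITIONS = ["POS", "SubPOS", "Gender", "Number", "Case", "PossGender", "PossNumber",
                 "Person", "Tense", "Grade", "Negation", "Voice", "Reserve1", "Reserve2", "Var"]

# One token line: <f> (word form) or <d> (punctuation), then <l> lemma and <t> tag.
TOKEN_RE = re.compile(r"<([fd])>([^<]*)<l>([^<]*)<t>(\S+)")


class Token(NamedTuple):
    form: str
    lemma: str
    tag: str
    is_punct: bool


def parse_csts(lines: List[str]) -> List[Token]:
    """Collect tokens, skipping lines that are not token lines (e.g. <s ...>)."""
    tokens = []
    for line in lines:
        m = TOKEN_RE.match(line.strip())
        if m:
            kind, form, lemma, tag = m.groups()
            tokens.append(Token(form, lemma, tag, kind == "d"))
    return tokens


def decode_tag(tag: str) -> Dict[str, str]:
    """Map each filled tag position ('-' means not applicable) to its name."""
    if len(tag) != len(TAG_POSITIONS):
        raise ValueError(f"expected {len(TAG_POSITIONS)} positions, got {len(tag)}")
    return {name: value for name, value in zip(TAG_POSITIONS, tag) if value != "-"}


sample = """<s id="ln95040:020-p1s1">
<f>Pokus<l>pokus<t>NNIS1-----A----
<f>o<l>o<t>RR--4----------
<f>zázrak<l>zázrak<t>NNIS4-----A----
<d>.<l>.<t>Z:-------------""".splitlines()

for tok in parse_csts(sample):
    print(tok.form, decode_tag(tok.tag))
```

Only the filled positions are kept, which makes it easy to see, for instance, that the preposition "o" is tagged for the case it governs (4, accusative) while "zázrak" carries the matching case.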

CONVERSION

  • SGML coding -> word/tag (pdt2wsj.pl)
  • SGML coding -> word/lemma/tag (pdt2wsjFLT.pl)
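
The Perl scripts pdt2wsj.pl and pdt2wsjFLT.pl are not reproduced here, so the Python function below is only an assumed re-implementation of the mapping the slide describes: one output line per <s> element, with tokens rendered as word/tag or word/lemma/tag. The function name is hypothetical, and slashes inside forms or lemmas are not escaped.

```python
import re
from typing import List

TOKEN_RE = re.compile(r"<[fd]>([^<]*)<l>([^<]*)<t>(\S+)")


def csts_to_slash(lines: List[str], with_lemma: bool = False) -> str:
    """Render csts token lines as word/tag (or word/lemma/tag), one sentence per line."""
    sentences: List[List[str]] = []
    for line in lines:
        line = line.strip()
        if line.startswith("<s"):
            sentences.append([])          # each <s> element starts a new output line
            continue
        m = TOKEN_RE.match(line)
        if m:
            form, lemma, tag = m.groups()
            if not sentences:
                sentences.append([])
            sentences[-1].append(f"{form}/{lemma}/{tag}" if with_lemma else f"{form}/{tag}")
    return "\n".join(" ".join(s) for s in sentences)


print(csts_to_slash(['<s id="a">', "<f>A<l>a<t>X", '<s id="b">', "<f>B<l>b<t>Y"]))
# A/X
# B/Y
```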

DATA SIZE

DATA SETs of MORPHOLOGICALLY ANNOTATED DATA

TOOLS

  • Automatic Morphological Analyser/Generator of Czech
    • HMAnalyze.pl, HMGenerate.pl
    • Dictionary: CZE_a
    • Remote Access
  • Czech Taggers
    • HMM
    • Exponential
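
To illustrate what an HMM tagger does with the analyzer's output, here is a bigram Viterbi sketch. The morphological analyzer is assumed to supply the candidate tags for each word, emissions are treated as uniform over those candidates (a simplification), and the shortened tags ("N1" = noun, nominative; "R4" = preposition with accusative) and all probabilities are invented for illustration. This is not the PDT tagger itself; real taggers also work with log probabilities to avoid underflow.

```python
from typing import Dict, List, Tuple


def viterbi(candidates: List[List[str]],
            start: Dict[str, float],
            trans: Dict[Tuple[str, str], float],
            floor: float = 1e-6) -> List[str]:
    """Most probable tag sequence under a bigram HMM.

    candidates[i] holds the tags the analyzer allows for word i; emissions are
    taken as uniform over those candidates, so only start and transition
    probabilities decide. Unseen events get probability `floor`.
    """
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {t: (start.get(t, floor), [t]) for t in candidates[0]}
    for tags in candidates[1:]:
        step = {}
        for t in tags:
            prob, path = max(
                ((p * trans.get((prev, t), floor), prev_path)
                 for prev, (p, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            step[t] = (prob, path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


cands = [["N1", "N4"], ["R4", "R6"], ["N1", "N4"]]   # pokus, o, zázrak
start = {"N1": 0.6, "N4": 0.4}
trans = {("N1", "R4"): 0.3, ("N1", "R6"): 0.2, ("N4", "R4"): 0.2, ("N4", "R6"): 0.2,
         ("R4", "N4"): 0.9, ("R4", "N1"): 0.05, ("R6", "N1"): 0.05, ("R6", "N4"): 0.05}
print(viterbi(cands, start, trans))  # ['N1', 'R4', 'N4']
```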

Analytical Layer in PDT

Introduction
  • Input: morphologically tagged sentences
  • Graph Editor: "user-friendly" software
  • Output: ATS structure
    • "surface" syntax tree structure
    • nodes labelled by the analytical functions

Two stages (chronologically)
  • (A) manual "analytic" annotation (ATS)
    • training data for (B)(a)
  • (B)
    • (a) semi-automatic procedure (Collins's parser)
    • (b) manual correction of the output of (B)(a)

Constraints and limitations
  • each string has a node of its own
    • word form, punctuation mark, etc.
    • AuxV, AuxP, AuxC, AuxX, AuxG…
  • coordination and apposition relations are reflected
    • the so-called third dimension of the graph is encoded in the plain tree by suffixes (X_Co, X_Ap, X_Pa, where X is an analytic function such as Sb, Obj, Adv, etc.)
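
The suffix encoding can be handled by splitting an analytic function label at its underscore. The sketch below treats only _Co, _Ap and _Pa as suffixes, so a label such as Ex_D is left intact; the function names and the example children are hypothetical.

```python
from typing import List, Optional, Tuple

# Suffixes encoding the "third dimension" on analytic functions.
SUFFIXES = {"Co": "member of coordination", "Ap": "member of apposition", "Pa": "parenthesis"}


def split_afun(afun: str) -> Tuple[str, Optional[str]]:
    """Split e.g. 'Sb_Co' into ('Sb', 'Co'); other labels are returned unchanged."""
    base, sep, suffix = afun.partition("_")
    if sep and suffix in SUFFIXES:
        return base, suffix
    return afun, None


def conjuncts(children: List[Tuple[str, str]]) -> List[str]:
    """Forms of the children (form, afun) that are members of a coordination."""
    return [form for form, afun in children if split_afun(afun)[1] == "Co"]


# Hypothetical children of the conjunction 'a' ('and') in "Jirka a Honza".
print(conjuncts([("Jirka", "Sb_Co"), ("Honza", "Sb_Co")]))  # ['Jirka', 'Honza']
```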

Constraints and limitations
  • no nodes missing on the surface can be added
    • instead, the analytic function Ex_D is used
  • relation between the semi-automatic and the manual procedure
    • 80% of edges are established correctly by the automatic procedure
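
The 80% figure is an attachment accuracy: the share of words whose automatically assigned head matches the manually corrected one. A sketch with a hypothetical function name and a toy five-word sentence (heads encoded as in a heads array, 0 = root):

```python
from typing import List, Tuple


def attachment_accuracy(predicted: List[int], gold: List[int]) -> Tuple[float, List[int]]:
    """Share of words whose predicted head equals the gold head, plus the
    (1-based) words whose edges an annotator would have to correct."""
    if len(predicted) != len(gold):
        raise ValueError("both trees must cover the same words")
    if not gold:
        return 1.0, []
    wrong = [i for i, (p, g) in enumerate(zip(predicted, gold), start=1) if p != g]
    return (len(gold) - len(wrong)) / len(gold), wrong


# Toy sentence: heads[i-1] is the head of word i (0 = root).
score, to_fix = attachment_accuracy(predicted=[2, 0, 2, 2, 2], gold=[2, 0, 2, 3, 2])
print(score, to_fix)  # 0.8 [4]
```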

Project organization
  • team consisting of 5-6 annotators
  • handbook for ATS structure annotation
  • 1999: 100,000 sentences with ATS annotation
  • tectogrammatical annotation follows


AuxT

Adv

První restituční zákon českého parlamentu se do sněmovních lavic může vrátit jako bumerang.

From the Analytical towards the Tectogrammatical Layer

Introduction
  • ATS annotation
    • nodes: word forms, punctuation, graphical symbols
    • edges: surface relations
  • TGTS annotation
    • nodes: autosemantic words, deletions
    • edges: deep layer functions


Annotation process
  • input Czech sentence
  • tokenization
  • morphological tagging and lexical disambiguation
  • ATS (PDT 1.0): syntactic parsing and analytic function assignment
  • TGTS: tree structure pruning, attribute assignments

Transition procedure
  • deterministic procedure operating on trees
  • macro language for Graph Editor (C++-like)
  • automatic changes and tools for annotators
  • Requirements
    • new attributes for the tectogrammatical layer
    • ATS is recoverable from TGTS
    • automated to the highest possible degree

New attributes
  • trlemma - lemma of the original node, or a lemma composed of joined nodes
  • morphological grammatemes
    • gender, number, degree of comparison, tense,
    • aspect, iterativeness, verbal modality, deontic modality, sentence modality
  • position of the node
    • functor, topic-focus articulation, syntactic grammateme,
    • type of relation (dependency, coordination, apposition),
    • phraseme, deletion, quoted word, direct speech,
    • coreference, antecedent

Tree Structure Pruning
  • U toho, kdo začíná opravdu od nuly, není daňový výnos pro stát podstatný.
  • For someone who is really starting from zero, the tax revenue for the state is not substantial.

Verbal Nodes
  • … podnikatelé by měli mít daně …
  • … entrepreneurs should have (their) taxes …

[Figure: tectogrammatical tree fragment; the verbal node (PRED) carries verbmod=CDN and deontmod=HRT]
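
Collapsing the verbal group into one node amounts to turning auxiliaries into grammatemes. In the sketch below, the roles of the words (conditional auxiliary, modal, lexical verb) are assumed to be identified already; only mít -> HRT and the conditional -> CDN are taken from the slide, while the other modal values and the defaults IND and DECL are assumptions. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

# Deontic modality per modal verb: only "mít" -> HRT comes from the slide;
# the remaining entries are assumptions for illustration.
DEONTMOD = {"mít": "HRT", "muset": "DEB", "chtít": "VOL", "moci": "POSS", "smět": "PERM"}


def verbal_node(chain: List[Tuple[str, str, str]]) -> Dict[str, str]:
    """Collapse a verbal group into one node. Each item is (form, lemma, role),
    where role is 'cond' (conditional auxiliary), 'modal', or 'main'."""
    node = {"trlemma": "", "verbmod": "IND", "deontmod": "DECL"}  # assumed defaults
    for _form, lemma, role in chain:
        if role == "cond":
            node["verbmod"] = "CDN"
        elif role == "modal":
            node["deontmod"] = DEONTMOD.get(lemma, "DECL")
        elif role == "main":
            node["trlemma"] = lemma
    return node


# "... by měli mít ...": conditional 'by', modal 'měli' (lemma mít), lexical 'mít'.
print(verbal_node([("by", "být", "cond"), ("měli", "mít", "modal"), ("mít", "mít", "main")]))
# {'trlemma': 'mít', 'verbmod': 'CDN', 'deontmod': 'HRT'}
```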

Attribute Assignments
  • prepositions stored as the fw attribute
  • quoted words
    • clause in quotes -> DSP
    • one pair of quotes in the sentence -> DSPP
    • string in quotes -> QUOT
  • gender, number, tense, degcmp, aspect
  • default values
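
Pruning and the fw attribute can be sketched together. A preposition node (AuxP) with a single dependent is merged into that dependent, which records the preposition in fw, and punctuation nodes are moved to a hidden list rather than deleted, so the ATS stays recoverable. This is a Python sketch under those assumptions, not the Graph Editor macro language; the ANode class and the analytic functions in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

PUNCT = {"AuxX", "AuxG", "AuxK"}  # comma, other graphic symbols, final punctuation


@dataclass
class ANode:
    """Hypothetical analytic-tree node extended with two tectogrammatical attributes."""
    form: str
    lemma: str
    afun: str
    children: List["ANode"] = field(default_factory=list)
    fw: Optional[str] = None                             # absorbed function word
    hidden: List["ANode"] = field(default_factory=list)  # removed nodes, kept for recovery


def prune(node: ANode) -> ANode:
    """Merge each AuxP node into its single dependent (storing the preposition
    in `fw`) and hide punctuation nodes instead of deleting them."""
    kept = []
    for child in node.children:
        child = prune(child)
        if child.afun == "AuxP" and len(child.children) == 1:
            dependent = child.children[0]
            dependent.fw = child.lemma
            dependent.hidden.append(ANode(child.form, child.lemma, child.afun))
            kept.append(dependent)
        elif child.afun in PUNCT:
            node.hidden.append(child)
        else:
            kept.append(child)
    node.children = kept
    return node


# "výnos pro stát" ('the revenue for the state') plus a comma; afuns are illustrative.
vynos = ANode("výnos", "výnos", "Sb", [
    ANode("pro", "pro", "AuxP", [ANode("stát", "stát", "Atr")]),
    ANode(",", ",", "AuxX"),
])
prune(vynos)
print([(c.form, c.fw) for c in vynos.children])  # [('stát', 'pro')]
```

Prepositions with more than one dependent are left untouched here; handling them would need further rules.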

Macros for Annotators
  • keyboard shortcuts (in Graph Editor)
    • structure changes
      • hide/recover nodes
      • merge nodes
    • add new nodes
    • functor assignments

Manual annotation
  • structure checking
  • functors
  • deletions of obligatory modifications
  • feedback for formulating the handbook for annotators

Tectogrammatical Layer

[Figure: tectogrammatical tree with topic-focus values T, F and C at its nodes]

Jirka se včera opil do němoty a Honza dneska.
  • Jirka got blind drunk yesterday, and Honza (did) today.

Attributes of Coreferential relations
  • only in MC
    • attributes and values:
      • coref - the lemma of the antecedent
      • corsnt - NIL (antecedent in the same sentence); PREV1 ... PREVi (position of the sentence which includes the antecedent)
    • grammatical coreference:
      • antec - the functor of the antecedent


Example

Honza slíbil přijít včas.

Honza promised to come in time.

[Figure: tree of the sentence; the coreferring node carries coref: Honza, corsnt: NIL, cornum: 1, antec: ACT]
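
The coref and corsnt attributes are enough to locate an antecedent by lemma. The sketch reads PREVi as "i sentences back", which is an interpretation of the slide; cornum is not explained there and is ignored. Sentences are simplified to lists of node lemmas, and the function names are hypothetical.

```python
from typing import List, Optional


def antecedent_sentence(current: int, corsnt: str) -> int:
    """Index of the sentence holding the antecedent (sentences numbered from 0).
    'NIL' = same sentence; 'PREVi' is read here as 'i sentences back'."""
    if corsnt == "NIL":
        return current
    suffix = corsnt[len("PREV"):]
    if corsnt.startswith("PREV") and suffix.isdigit() and 0 < int(suffix) <= current:
        return current - int(suffix)
    raise ValueError(f"unexpected corsnt value: {corsnt!r}")


def find_antecedent(sentences: List[List[str]], current: int,
                    coref: str, corsnt: str) -> Optional[int]:
    """Position of the first node with lemma `coref` in the antecedent sentence."""
    lemmas = sentences[antecedent_sentence(current, corsnt)]
    return lemmas.index(coref) if coref in lemmas else None


# "Honza slíbil přijít včas." as a list of node lemmas (simplified).
doc = [["Honza", "slíbit", "přijít", "včas"]]
print(find_antecedent(doc, 0, coref="Honza", corsnt="NIL"))  # 0
```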
