1 / 42

Grammar for Fun: IT-based Gmmar Teaching with VISL

Eckhard Bick. Grammar for Fun: IT-based Gmmar Teaching with VISL. Eckhard Bick, 2004. Talk outline. Teaching projects. CTU 1996-99: Internet based grammar teaching software (research and development) ELU1 1998-2000: VISL tools for Danish universities and teacher seminaries

vin
Download Presentation

Grammar for Fun: IT-based Gmmar Teaching with VISL

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Eckhard Bick Grammar for Fun:IT-based Gmmar Teaching with VISL Eckhard Bick, 2004

  2. Talk outline

  3. Teaching projects CTU1996-99: Internet based grammar teaching software (research and development) ELU1 1998-2000: VISL tools for Danish universities and teacher seminaries VISL-HHX 2001-03: VISL tools for Danish business schools VISL-GYM 2001-02: VISL tools for Danish gymnasiums PaNoLa, GREI 2002-2004: Major Nordic languages VISL-SEM 2004-05: VISL didactics for teacher training colleges URKAS 2004-05: “Almen sprogforståelse” (1.g)

  4. Unity in diversity:A unified approach for 22 languages

  5. VISL research languages

  6. The VISL teaching network

  7. kompleksitetsprogression

  8. Grammy i Klostermølleskoven 9 levels/topics Story-line about grammar Comments for teachers Interactive exercises Book = IT Explanations for students

  9. The Paintbox game

  10. ShootingGallery: Hit a noun!

  11. WordFall - Tetris for grammarians

  12. Labyrinth - a word class maze

  13. Post office - stamping syntactic function

  14. Syntris - syntax brick by brick

  15. SpaceRescue: Alien syntax

  16. Constituent trees

  17. Interactive syntactic trees

  18. Teaching corpora of analyzed sentences

  19. Function categories

  20. BuildTree: Drag & drop constituents

  21. LabelTree: Drag & drop syntactic function

  22. Cross-language problems:Infinitive marker

  23. Cross-language problems:participial clauses

  24. Cross-language problems:Discontinuity

  25. VISL lite vertical tree (non-graphical notation, filtered) VISL vertical tree (non-graphical notation, incl. morphology) UTT:cl S:prop VISL P:v er Cs:g =D:art et =H:n forskningsprojekt =D:cl ==S:pron der ==P:v involverer ==Od:g ===D:pron mange ===D:adj forskellige ===H:n sprog STA:fcl S:prop("VISL") VISL P:v-fin("være",pr,akt) er Cs:np =DN:art("en",neu,sg,idf) et =H:n("forskningsprojekt",neu,sg,idf,nom) forskningsprojekt =DN:fcl ==S:pron-rel("der",nG,nN,nom) der ==P:v-fin("involvere",pr,akt) involverer ==Od:np ===DN:pron-indef("mange",nG,pl,nom) mange ===DN:adj("forskellig",nG,pl,nD,nom) forskellige ===H:n("sprog",neu,pl,idf,nom) sprog VISL source notation

  26. CG source notation (function/dependency)

  27. Supported xml-formats TIGER-xml (constituents) TIGER-xml (dependency) MALT-xml VISL data file markers: pedagogical topic and chaptering attributes for dynamic html-layout

  28. Search interfaces for annotated corpora

  29. Menu-based searches

  30. Statistical tools

  31. Corpus annotation

  32. Annotated corpora Morphosyntactically tagged • Korpus90and Korpus2000, mixed genre, 56M words • DFK, mainly transscribed parliamentary discussions, 7M words • CETEMPúblico, European Portuguese, news text, 180M words • Folha de São Paulo, Brazilian news text, 90M words • CORDIAL-SIN, dialectal Portuguese, 30K words • NURC, transscribed Brazilian speech, 100K words • Tycho Brahe, historical Portuguese, 50K words Valency tagged • NILC corpus, Brazilian Portuguese, journalistic and essays, 39M words Treebanks • Floresta Sintá(c)tica, European Portuguese, 1M words (35K revised) • Arboretum, Danish, 50K words revised

  33. Integrating live NLPand language awareness teaching

  34. KillerFiller: Towards evaluation

  35. Performance statistics

  36. VISLhttp://visl.sdu.dkEckhard Bick, lineb@hum.au.dk **************

  37. The most common syntactic categories

  38. The DanGram system in current numbers Lexemes in morphological base lexicon: 146.342 (equals about 1.000.000 full forms), of these: proper names: 44839 (experimental) polylexicals: 460 (+ names and certain number expressions) Lexemes in the valency and semantic prototype lexicon: 95.308 Lexemes in the bilingual lexicon (Danish-Esperanto): 36.001 Danish CG-rules, in all: 6.233 morphological CG disambiguation rules: 2.678 syntactic mapping-rules: 1.701 syntactic CG disambiguation rules: 1.854 (plus 429 bilingual rules in separate MT grammars, and a smaller number of semantic case-role and proper name-rules in the semantics and name grammars) Danish PSG-rules: 490 (for generating syntactic tree structures) Performance: At full disambiguation (i.e., maximal precision), the system has an average correctness of 99% for word class (PoS), and about 96% for syntactic tags (depending, on how fine grained an annotation scheme is used) Speed: full CG-parse: ca. 400 words/sec for larger texts (start up time 3-6 sec) morphological analysis alone: ca. 1000 words/sec

  39. VISL parsing tools • Preprocessing: word- and sentence boundaries, polylexicals • Lexicon and rule based morphological analysis: Inflexion, derivation, composita recognition • Postprocessing: Valency and semantic potential • Morphological contextual disambiguation (CG) • Syntactic mapping og diambiguation (CG) • Names CG , feature propagation CG, Case role-CG • PSG-overbygning: Teaching, Arboretum, Floresta

  40. Research projects • SHF 1999-2001: CG, syntax & semantics (da,en,po) • AC/DC 1999-?: Portuguese CG-corpora • Floresta 2000-?: Portuguese treebank • DSL 2001-?: Korpus90/2000 (Danish CG-corpora) • Arboretum 2002-?: Danish treebank • PaNoLa 2002-2003: Integration of Nordic CG research • Nomen Nescio: Automatic named entity recognition

  41. Running CG-annotation Da[da] KS@SUBden [den] ART UTR S DEF @>Ngamle [gammel] ADJ nG S DEF NOM @>Nsælger [sælger] N UTR S IDF NOM @SUBJ>kørte [køre] <mv> V IMPF AKT @FS-ADVL>hjem [hjem] N NEU P IDF NOM @<ACCi [i] PRP@<ADVLsin [sin] <poss> <refl> DET UTR S @>Nbil [bil] N UTR S IDF NOM @P<, så [se] <mv> V IMPF AKT @FMVhan [han] PERS UTR 3S NOM @<SUBJmange [mange] <quant> DET nG P NOM @>Nsmå [lille] ADJ nG P nD NOM @>Ndyr [dyr] N NEU P IDF NOM &ACI-SUBJ @<ACCpå [på] PRP@<OAde [den] ART nG P DEF @>Nvåde [våd] ADJ nG P nD NOM @>Nveje [vej] N UTR P IDF NOM @P<

More Related