Tips and Tricks … with INTEX/NOOJ

Presentation Transcript

  1. Tips and Tricks … with INTEX/NOOJ • Tamás Váradi • Institute for Linguistics Research • Hungarian Academy of Sciences • varadi@nytud.hu • Max Silberztein • University of Franche-Comté • max.silberztein@univ-fcomte.fr

  2. Outline • Why should INTEX/NOOJ be a tool of choice? • raising language awareness • studying linguistics • lexical analysis • morphology • paradigms • word formation • automatic lexical acquisition • syntax • local grammars • semantic tagging

  3. List of useful features • instant lexical lookup • linguistically sophisticated lexicon • intuitive graphical interface • fast, robust, finite-state technology • corpus, lexicon, grammar handled uniformly • instant confirmation from corpus • can be used at different levels of competence • simple corpus query tool • grammar development environment • research tool for NLP projects

  4. Morphology I - Inflection • paradigms handled in the form of finite-state transducers (FSTs)

  5. Morphology I - Inflection • stem variants processed with operations on strings • L = move left, erasing a character
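
To make the string-operation idea on slide 5 concrete, here is a minimal Python sketch. It assumes a simplified command syntax invented for illustration: an uppercase `L` erases the last character, and any other character is appended. The function name `apply_ops` and the example commands are hypothetical, not INTEX/NooJ notation beyond the `L` operator the slide defines.

```python
def apply_ops(stem: str, command: str) -> str:
    """Apply a toy inflection command to a stem.

    Assumed syntax (illustration only): 'L' erases the last character,
    every other character is appended to the current string.
    """
    chars = list(stem)
    for symbol in command:
        if symbol == "L":
            if chars:          # nothing to erase on an empty string
                chars.pop()
        else:
            chars.append(symbol)
    return "".join(chars)


if __name__ == "__main__":
    # Hypothetical paradigm commands for a few English nouns.
    for stem, command in [("city", "Lies"), ("wolf", "Lves"), ("knife", "LLves")]:
        print(stem, "+", command, "->", apply_ops(stem, command))
```

The point of such commands is that one compact string can describe a stem variant for a whole class of words, rather than listing every inflected form.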

  6. Morphology II - Derivation • All the forms derived from the root ‘fran-’ • Ideal to learn and experiment with morphological segmentation

  7. Automatic lexical extraction • Store any sequence of letters that is followed by -ize or -ify in the variable $Root • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
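
The extraction rule on slide 7 can be approximated with a regular expression. The sketch below is not how INTEX/NooJ graphs are written; it only mirrors the logic, with the named groups `root` and `suf` standing in for `$Root` and `$Suf`. The function name `extract_entry` and the dictionary layout are assumptions made for the example.

```python
import re

# A sequence of letters (the root) followed by -ize or -ify (the suffix).
VERB_PATTERN = re.compile(r"(?P<root>[A-Za-z]+)(?P<suf>ize|ify)")


def extract_entry(word: str):
    """Return a toy lexical entry for words ending in -ize/-ify, else None."""
    match = VERB_PATTERN.fullmatch(word)
    if match is None:
        return None
    root, suf = match.group("root"), match.group("suf")
    return {
        "wordform": root + suf,   # $Root+$Suf
        "lemma": root,            # $Root, as on the slide
        "pos": "V",
        "synsem": "+V",
    }


if __name__ == "__main__":
    for word in ["americanize", "simplify", "size", "table"]:
        print(word, "->", extract_entry(word))
```

Note that a purely formal rule like this also accepts words such as "size" (root "s"), which is the kind of overgeneration a lexical check on the root is meant to filter out.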

  8. Lexical constraints • Check whether the string stored in $Root is in the lexicon as an A with the feature +Nation • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
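
A lexical constraint of the kind described on slide 8 can be sketched as a lookup on the extracted root. Everything here is illustrative: the tiny `LEXICON` dictionary, its entries, and the function `constrained_entry` are hypothetical stand-ins for a real INTEX/NooJ dictionary and graph.

```python
import re

VERB_PATTERN = re.compile(r"(?P<root>[A-Za-z]+)(?P<suf>ize|ify)")

# Hypothetical toy lexicon: word -> (part of speech, set of features).
LEXICON = {
    "american": ("A", {"Nation"}),
    "hungarian": ("A", {"Nation"}),
    "modern": ("A", set()),
}


def constrained_entry(word: str, lexicon=LEXICON):
    """Produce an entry only if the root is an adjective marked +Nation."""
    match = VERB_PATTERN.fullmatch(word)
    if match is None:
        return None
    root, suf = match.group("root"), match.group("suf")
    pos, features = lexicon.get(root, (None, set()))
    if pos != "A" or "Nation" not in features:
        return None            # the lexical constraint fails
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


if __name__ == "__main__":
    for word in ["americanize", "modernize", "size"]:
        print(word, "->", constrained_entry(word))
```

With the constraint in place, "americanize" yields an entry, while "modernize" (adjective without +Nation in this toy lexicon) and "size" (root not in the lexicon) are rejected.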

  9. Syntax • grammars defined in graphs relying on info stored in the lexicon (minimally lemma and POS)

  10. Instant feedback from corpus

  11. Labelled bracketing • hit strings may be tagged (merge mode) • [NP a soft, slow step NP] • or replaced with bracketing • [NP NP]
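
The two output modes on slide 11 can be imitated over a list of pre-tagged tokens. This is a rough sketch under stated assumptions: the tagset, the one-letter codes, the noun-phrase pattern, and the function `bracket` are all invented for illustration and do not reproduce an INTEX/NooJ grammar.

```python
import re

# Hypothetical one-letter codes for a toy tagset (one character per token).
CODES = {"DET": "D", "ADJ": "A", "COMMA": "C", "NOUN": "N"}
# Determiner, optional (possibly comma-separated) adjectives, then a noun.
NP_PATTERN = re.compile(r"D(?:A(?:C?A)*)?N")


def bracket(tokens, mode="merge"):
    """tokens: list of (word, tag) pairs. Returns the text with NP hits marked.

    mode="merge": keep the hit string inside the brackets.
    mode="replace": replace the hit string with the bare brackets.
    """
    codes = "".join(CODES.get(tag, "x") for _, tag in tokens)
    out, pos = [], 0
    for m in NP_PATTERN.finditer(codes):
        out.extend(word for word, _ in tokens[pos:m.start()])
        out.append("[NP")
        if mode == "merge":
            out.extend(word for word, _ in tokens[m.start():m.end()])
        out.append("NP]")
        pos = m.end()
    out.extend(word for word, _ in tokens[pos:])
    return " ".join(out).replace(" ,", ",")


if __name__ == "__main__":
    sentence = [("He", "PRON"), ("heard", "VERB"), ("a", "DET"), ("soft", "ADJ"),
                (",", "COMMA"), ("slow", "ADJ"), ("step", "NOUN")]
    print(bracket(sentence, "merge"))
    print(bracket(sentence, "replace"))
```

Because each token maps to exactly one character, match offsets in the code string are also token indices, which keeps the bracketing step simple.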

  12. Disambiguation • "very": adjective or adverb?

  13. Recursion – embedded graphs

  14. An exercise in semantic tagging • Expressions of time

  15. An exercise in semantic tagging • Expressions of time

  16. Finally, not for the faint-hearted … • the big picture

  17. Conclusions • Teaching linguistic analysis by doing it • INTEX/NooJ is [det THE] technology to use, honestly… • All welcome to have a go at it • Thank you for your attention!