Tips and Tricks … with INTEX/NOOJ

    Presentation Transcript
    1. Tips and Tricks … with INTEX/NOOJ • Tamás Váradi, Institute for Linguistics Research, Hungarian Academy of Sciences, varadi@nytud.hu • Max Silberztein, University of Franche-Comté, max.silberztein@univ-fcomte.fr

    2. Outline • Why should INTEX/NOOJ be a tool of choice? • raising language awareness • studying linguistics • lexical analysis • morphology • paradigms • word formation • automatic lexical acquisition • syntax • local grammars • semantic tagging

    3. List of useful features • instant lexical lookup • linguistically sophisticated lexicon • intuitive graphical interface • fast, robust, finite-state technology • corpus, lexicon, grammar handled uniformly • instant confirmation from corpus • can be used at different levels of competence • simple corpus query tool • grammar development environment • research tool for NLP projects

    4. Morphology I - Inflection • paradigms handled in the form of finite-state transducers (FSTs)

    5. Morphology I - Inflection • stem variants processed with operations on strings • L = move left, erasing a character
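
The string operations on slide 5 can be illustrated with a small Python sketch. It is not INTEX/NooJ syntax: the function name `apply_operations` and the code strings are hypothetical, and the only operator modelled is `L` as described on the slide (move left, erasing a character). Every other character in a code is assumed to be appended literally.

```python
def apply_operations(stem: str, code: str) -> str:
    """Apply a simplified inflection code to a stem.

    Assumption: 'L' erases the last character (as on the slide);
    any other character in the code is appended to the form.
    """
    form = stem
    for op in code:
        if op == "L":
            form = form[:-1]  # move left, erasing one character
        else:
            form += op        # append the literal character
    return form


if __name__ == "__main__":
    # Hypothetical codes for two English stem variants.
    print(apply_operations("city", "Lies"))
    print(apply_operations("wolf", "Lves"))
    print(apply_operations("table", "s"))
```

A single code such as `Lies` then describes a whole class of nouns, which is roughly what makes paradigms compact to state.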

    6. Morphology II - Derivation • All the forms derived from the root ‘fran-’ • Ideal to learn and experiment with morphological segmentation

    7. Automatic lexical extraction • Store any sequence of letters which is followed by –ize or –ify in variable $Root • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
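
The extraction rule on slide 7 can be approximated outside INTEX/NooJ with a regular expression. The following Python sketch is an illustration only: the named groups `Root` and `Suf` mirror the slide's variables, while the function name `extract_entries` and the dictionary layout are assumptions.

```python
import re

# Any sequence of letters followed by -ize or -ify.
# The named groups mirror the variables $Root and $Suf on the slide.
PATTERN = re.compile(r"\b(?P<Root>[a-z]+?)(?P<Suf>ize|ify)\b")


def extract_entries(text):
    """Return one candidate lexical entry per matching word form."""
    entries = []
    for match in PATTERN.finditer(text.lower()):
        root, suf = match.group("Root"), match.group("Suf")
        entries.append({
            "wordform": root + suf,  # $Root+$Suf
            "lemma": root,           # $Root
            "pos": "V",
            "synsem": "+V",
        })
    return entries


if __name__ == "__main__":
    for entry in extract_entries("They americanize and simplify."):
        print(entry)
```

Without any further check, the pattern also accepts words such as "size" (root "s"), so the candidates need filtering against a lexicon.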

    8. Lexical constraints • Check if the string stored in $Root is in the lexicon as an A, with feature +Nation • Produce the lexical entry: wordform: $Root+$Suf, lemma: $Root, part of speech: V, synsem: +V
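
The lexical constraint on slide 8 can be sketched in Python as a lookup before the entry is produced. The toy `LEXICON`, its contents, and the function name `derive_entry` are hypothetical; a real INTEX/NooJ grammar would consult its own dictionaries.

```python
import re

# Hypothetical toy lexicon: word -> (part of speech, set of features).
LEXICON = {
    "american": ("A", {"Nation"}),
    "french": ("A", {"Nation"}),
    "modern": ("A", set()),
}

# A whole word consisting of letters followed by -ize or -ify.
PATTERN = re.compile(r"^(?P<Root>[a-z]+)(?P<Suf>ize|ify)$")


def derive_entry(word):
    """Return a verb entry only if the root is an A with feature +Nation."""
    match = PATTERN.match(word.lower())
    if match is None:
        return None
    root, suf = match.group("Root"), match.group("Suf")
    pos, features = LEXICON.get(root, (None, set()))
    if pos != "A" or "Nation" not in features:
        return None  # constraint fails: no entry is produced
    return {"wordform": root + suf, "lemma": root, "pos": "V", "synsem": "+V"}


if __name__ == "__main__":
    for word in ["americanize", "modernize", "size"]:
        print(word, derive_entry(word))
```

Here "modernize" is rejected because the toy entry for "modern" lacks +Nation, and "size" is rejected because "s" is not in the lexicon at all.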

    9. Syntax • grammars defined in graphs relying on info stored in the lexicon (minimally lemma and POS)

    10. Instant feedback from corpus

    11. Labelled bracketing • hit strings may be tagged (merge mode) • [NP a soft, slow step NP] • or replaced with bracketing • [NP NP]
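
The two output modes on slide 11 can be mimicked with a few lines of Python. This is a sketch under stated assumptions: the function name `bracket`, the mode names `"merge"` and `"replace"`, and the use of character offsets for hits are illustrative choices, not INTEX/NooJ's own interface.

```python
def bracket(text, spans, label="NP", mode="merge"):
    """Bracket hit strings in text.

    spans: sorted, non-overlapping (start, end) character offsets of hits.
    mode "merge" keeps the hit string inside the brackets;
    mode "replace" substitutes the bare bracketing for the hit string.
    """
    if mode not in ("merge", "replace"):
        raise ValueError("mode must be 'merge' or 'replace'")
    pieces = []
    pos = 0
    for start, end in spans:
        pieces.append(text[pos:start])  # text before the hit
        if mode == "merge":
            pieces.append(f"[{label} {text[start:end]} {label}]")
        else:
            pieces.append(f"[{label} {label}]")
        pos = end
    pieces.append(text[pos:])  # text after the last hit
    return "".join(pieces)


if __name__ == "__main__":
    sentence = "She heard a soft, slow step."
    hit = "a soft, slow step"
    start = sentence.index(hit)
    span = [(start, start + len(hit))]
    print(bracket(sentence, span, mode="merge"))
    print(bracket(sentence, span, mode="replace"))
```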

    12. Disambiguation • "very": adjective or adverb

    13. Recursion – embedded graphs

    14. An exercise in semantic tagging • Expressions of time

    15. An exercise in semantic tagging • Expressions of time

    16. Finally, not for the faint-hearted … • the big picture

    17. Conclusions • Teaching linguistic analysis by doing it • INTEX/NooJ is [det THE] technology to use, honestly… • All welcome to have a go at it • Thank you for your attention!