Example-Based Treebank Querying

Example-Based Treebank Querying. Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde. CLARIN Sofia, 2012-10-28. NEDERBOOMS. Exploitation of Dutch treebanks for research in linguistics CLARIN-VL Project Centre for Computational Linguistics (CCL) Dutch Grammar and Language Use (NGTG)

Presentation Transcript


  1. Example-Based Treebank Querying Liesbeth Augustinus Vincent Vandeghinste Frank Van Eynde CLARIN Sofia, 2012-10-28

  2. NEDERBOOMS • Exploitation of Dutch treebanks for research in linguistics • CLARIN-VL Project • Centre for Computational Linguistics (CCL) • Dutch Grammar and Language Use (NGTG) • Goals: • User-friendly tools • Access to large data files

  3. NEDERBOOMS How can we combine the data-oriented approach of treebank mining with the knowledge-oriented method of theoretical and descriptive linguistics?

  4. QUERYING LASSY Existing search tools: dtsearch (Kloosterman 2007), Dact (de Kok 2010) • stand-alone tools • query language: XPath = standard query language for XML trees

  5. QUERYING LASSY Some examples: “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies” //node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]]

  6. QUERYING LASSY Some examples: “look for all NP nodes in which the head noun is modified by the adjective ‘politiek’, e.g. politieke discussies” //node[@cat="np" and node[@rel="mod" and @pos="adj" and @root="politiek"] and node[@rel="hd" and @pos="noun"]] “look for verb clusters in which a separable verb particle occurs between two verb forms, e.g. Hij zegt dat ze spoedig zullen kennis maken” //node[node[@rel="hd" and @pos="verb" and @begin < ../node[@rel="vc"]/node[@rel="svp" and @pos="part"]/@begin] and node[@rel="vc" and node[@rel="svp" and @begin < ../node[@rel="hd" and @pos="verb"]/@begin]]]
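
To make the first query more concrete, here is a minimal Python sketch of what it selects. The XML fragment is a hypothetical toy parse, not real LASSY data; it only borrows the attribute names that appear in the query (`cat`, `rel`, `pos`, `root`), and the `word` attribute is assumed for display. The standard library's ElementTree does not handle nested predicates like these, so the helper names `has_child` and `find_politiek_nps` are illustrative and spell out the same conditions by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy parse; only the attribute names are taken from the slide's query.
TOY_PARSE = """
<node cat="top">
  <node cat="np" rel="obj1">
    <node rel="mod" pos="adj" root="politiek" word="politieke"/>
    <node rel="hd" pos="noun" root="discussie" word="discussies"/>
  </node>
  <node cat="np" rel="su">
    <node rel="mod" pos="adj" root="lang" word="lange"/>
    <node rel="hd" pos="noun" root="dag" word="dagen"/>
  </node>
</node>
"""


def has_child(node, **attrs):
    """True if some direct child carries all the given attribute values."""
    return any(
        all(child.get(name) == value for name, value in attrs.items())
        for child in node
    )


def find_politiek_nps(tree):
    """Hand-written equivalent of the NP query on the slide."""
    return [
        node
        for node in tree.iter("node")
        if node.get("cat") == "np"
        and has_child(node, rel="mod", pos="adj", root="politiek")
        and has_child(node, rel="hd", pos="noun")
    ]


def words(node):
    """Collect the word forms under a node, in document order."""
    return [leaf.get("word") for leaf in node.iter("node") if leaf.get("word")]


matches = find_politiek_nps(ET.fromstring(TOY_PARSE))
for match in matches:
    print(" ".join(words(match)))
```

Only the first NP in the toy parse satisfies all three conditions; the second has a modifying adjective with a different root.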

  7. QUERYING LASSY XPath Not user-friendly Knowledge of Alpino grammar necessary = problematic for non-technical linguists Verify theory through data with corpus or treebank examples Time consuming, requires some effort

  8. QUERYING LASSY XPath Not user-friendly Knowledge of Alpino grammar necessary = problematic for non-technical linguists Verify theory through data with corpus or treebank examples Time consuming, requires some effort How to make interaction between computational linguistics and theoretical linguistics possible?

  9. GrETEL Greedy Extraction of Trees for Empirical Linguistics Search tool based on example sentences Input = natural language No explicit knowledge of formal query language nor Alpino grammar required Bridge gap between descriptive and computational linguistics Available online http://nederbooms.ccl.kuleuven.be (optimised for Mozilla Firefox)

  10. GrETEL - online

  11. GrETEL - online

  12. GrETEL - input Green versus red word order in Dutch • green: past participle – auxiliary: De NAVO stelt dat ze er alles aan gedaan heeft • red: auxiliary – past participle: De NAVO stelt dat ze er alles aan heeft gedaan • “NATO claims that it has done everything in its power” (deredactie.be)
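
The green/red distinction can be read off a parse by comparing the positions of the participle and the auxiliary. The sketch below assumes a simplified, hypothetical subordinate-clause fragment that uses the attribute names seen in the deck's XPath queries (`cat`, `rel`, `pos`, `root`, `begin`); the function name `word_order` and the token positions are illustrative, not taken from the treebank.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments for "... gedaan heeft" (green) and "... heeft gedaan" (red).
GREEN = """
<node cat="ssub">
  <node rel="vc" cat="ppart">
    <node rel="hd" pos="verb" word="gedaan" begin="5"/>
  </node>
  <node rel="hd" pos="verb" root="heb" word="heeft" begin="6"/>
</node>
"""
RED = """
<node cat="ssub">
  <node rel="vc" cat="ppart">
    <node rel="hd" pos="verb" word="gedaan" begin="6"/>
  </node>
  <node rel="hd" pos="verb" root="heb" word="heeft" begin="5"/>
</node>
"""


def first_child(node, **attrs):
    """Return the first direct child with all the given attribute values, or None."""
    for child in node:
        if all(child.get(name) == value for name, value in attrs.items()):
            return child
    return None


def word_order(ssub):
    """Return 'green' (participle first), 'red' (auxiliary first), or None."""
    auxiliary = first_child(ssub, rel="hd", pos="verb")
    complement = first_child(ssub, rel="vc", cat="ppart")
    if auxiliary is None or complement is None:
        return None
    participle = first_child(complement, rel="hd", pos="verb")
    if participle is None:
        return None
    if int(participle.get("begin")) < int(auxiliary.get("begin")):
        return "green"
    return "red"


print(word_order(ET.fromstring(GREEN)))
print(word_order(ET.fromstring(RED)))
```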

  13. GrETEL - input

  14. GrETEL - input >> parsed with Alpino

  15. GrETEL - annotation

  16. GrETEL - annotation >> info added to Alpino parse

  17. GrETEL – query info

  18. GrETEL – query info input example

  19. GrETEL – query info input example Alpino parse

  20. GrETEL – query info input example Alpino parse query tree

  21. GrETEL – query info input example Alpino parse query tree XPath query

  22. GrETEL – query info input example Alpino parse query tree XPath query treebanks

  23. GrETEL – query tree

  24. GrETEL – query tree >> subtree extraction

  25. GrETEL – query tree query tree

  26. GrETEL – query tree query tree >> XPath generator >>

  27. GrETEL – query tree query tree >> XPath generator >> XPath expression //node[@cat="ssub" and node[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]] and node[@rel="hd" and @pos="verb" and @root="heb"]]
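
The step from query tree to XPath expression can be sketched as a recursive walk. This is a minimal illustration under stated assumptions, not GrETEL's actual generator: the query tree is assumed to be an XML fragment whose attributes are exactly the ones the user chose to keep, and the function names `node_test` and `generate_xpath` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed query tree: the subtree kept after annotation, with only the retained attributes.
QUERY_TREE = """
<node cat="ssub">
  <node rel="vc" cat="ppart">
    <node rel="hd" pos="verb"/>
  </node>
  <node rel="hd" pos="verb" root="heb"/>
</node>
"""


def node_test(node):
    """Turn one query-tree node into a 'node[...]' step with nested predicates."""
    # Attribute conditions come first, in the order they appear on the node.
    conditions = [f'@{name}="{value}"' for name, value in node.attrib.items()]
    # Each child becomes a nested predicate on its parent.
    conditions += [node_test(child) for child in node]
    if not conditions:
        return "node"
    return "node[" + " and ".join(conditions) + "]"


def generate_xpath(query_tree):
    """Search for the pattern anywhere in a parse tree."""
    return "//" + node_test(query_tree)


print(generate_xpath(ET.fromstring(QUERY_TREE)))
```

For this query tree the sketch reproduces the expression shown on the slide. Note that the generated query does not constrain the order of the participle and the auxiliary, so it matches both green and red clauses.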

  28. GrETEL – treebanks

  29. GrETEL – treebanks LASSY Small in PostgreSQL database >> no local installation required

  30. GrETEL – results

  31. GrETEL – results >> (adapted) query

  32. GrETEL – results >> (adapted) query >> quantitative information

  33. GrETEL – results

  34. GrETEL – results >> tree viewer

  35. GrETEL – results >> tree viewer >> list of results

  36. liesbeth@ccl.kuleuven.be http://nederbooms.ccl.kuleuven.be Thanks for your attention! Questions?
