LING 581: Advanced Computational Linguistics

Presentation Transcript


  1. LING 581: Advanced Computational Linguistics Lecture Notes January 23rd

  2. Administrivia • Let us assume you have: • Installed Penn Treebank v3 • Downloaded and installed tregex • under MacOSX or Linux (possibly inside VirtualBox)

  3. Trees in the Penn Treebank • Notation: LISP S-expression • S-EXP = (LABEL S-EXP … S-EXP) or • S-EXP = (LABEL WORD)
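The two S-EXP rules above can be sketched as a tiny recursive reader. This is an illustrative toy, not part of tregex or the Treebank tools: the function names (`tokenize`, `parse`, `read_tree`) and the example sentence are made up here. Note also that the .mrg files wrap each tree in an extra unlabeled pair of parentheses, which this sketch does not handle.

```python
import re

def tokenize(text):
    # Split into "(", ")" and runs of non-space, non-parenthesis characters.
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens, pos=0):
    """Parse one S-EXP starting at tokens[pos]; return (tree, next_pos)."""
    assert tokens[pos] == "(", "an S-EXP must start with '('"
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            # Rule 1: S-EXP = (LABEL S-EXP ... S-EXP)
            child, pos = parse(tokens, pos)
        else:
            # Rule 2: S-EXP = (LABEL WORD)
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1  # skip the closing ")"

def read_tree(text):
    tree, _ = parse(tokenize(text))
    return tree

# Hypothetical example sentence, not taken from the WSJ corpus.
tree = read_tree("(S (NP-SBJ (NNP John)) (VP (VBZ sleeps)))")
print(tree)
```

Each node comes back as a `(label, children)` pair, with words as plain strings at the leaves.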

  4. tregex • tregex is a tgrep2-style utility for matching patterns in trees • run-tregex-gui.command (on MacOSX)

  5. tregex • tregex is a tgrep2-style utility for matching patterns in trees • select the PTB directory, e.g. TREEBANK_3/parsed/mrg/wsj/

  6. tregex • tregex is a tgrep2-style utility for matching patterns in trees • Browse Trees

  7. tregex • Search • NP-SBJ << NNP (NP-SBJ dominates an NNP at any depth) • NP-SBJ < NNP (NP-SBJ immediately dominates an NNP, i.e. NNP is a child)
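The difference between the two searches (`<<` dominates vs. `<` immediately dominates) can be illustrated on a hand-built tree. This is a rough sketch, not how tregex is implemented: nodes are `(label, children)` pairs, labels are compared by exact string match, and the tree and helper names are invented for the example.

```python
def label(node):
    return node[0] if isinstance(node, tuple) else node

def children(node):
    return node[1] if isinstance(node, tuple) else []

def subtrees(node):
    # Yield the node itself and then every node below it.
    yield node
    for child in children(node):
        yield from subtrees(child)

def immediately_dominates(node, target):
    # A < B : some child of A is labeled B
    return any(label(c) == target for c in children(node))

def dominates(node, target):
    # A << B : some descendant of A (at any depth) is labeled B
    return any(label(d) == target
               for c in children(node)
               for d in subtrees(c))

def count(tree, head, relation, target):
    # Number of nodes labeled `head` that stand in `relation` to a `target` node.
    return sum(1 for n in subtrees(tree)
               if label(n) == head and relation(n, target))

# Hypothetical tree: the NNPs sit one or more levels below NP-SBJ.
tree = ("S", [
    ("NP-SBJ", [
        ("NP", [("NNP", ["Mary"])]),
        ("PP", [("IN", ["from"]), ("NP", [("NNP", ["Boston"])])]),
    ]),
    ("VP", [("VBZ", ["sleeps"])]),
])

print(count(tree, "NP-SBJ", dominates, "NNP"))              # 1
print(count(tree, "NP-SBJ", immediately_dominates, "NNP"))  # 0
```

Here NP-SBJ << NNP matches but NP-SBJ < NNP does not, because no NNP is a direct child of NP-SBJ.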

  8-10. Penn Tagset Recap • Part-of-speech (POS) tags • http://www.americannationalcorpus.org/OANC/penn.html

  11-12. Penn Tagset Recap • Syntactic tagset: • (from The Penn Treebank: An overview, Taylor, Marcus & Santorini)

  13-14. tregex: relations • Help

  15. tregex: labels • Help • /regex/ (anchors: ^, $) • __ (matches any node) • @NP matches NP, NP-SBJ etc. (basic category) • S < NP < VP means S < VP AND S < NP • Note: node grouping () vs. relation grouping []
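To see why the anchors matter and what @NP buys you, here is a small sketch using Python's `re` module. Two assumptions are made loudly: that a /regex/ label is matched unanchored (found anywhere in the label), and that the basic category is obtained by cutting the label at the first `-` or `=`. The helper `basic_category` is a hypothetical approximation written for this example, not the function tregex uses.

```python
import re

def basic_category(label):
    # Assumption: approximate the basic category by dropping function tags
    # and indices, e.g. NP-SBJ-1 -> NP. Labels such as -NONE- that start
    # with a hyphen are left alone.
    if label.startswith("-"):
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]

labels = ["NP", "NP-SBJ", "NP-PRD", "NNP", "WHNP-1", "VP", "-NONE-"]

# /NP/ without anchors: also picks up NNP and WHNP-1.
unanchored = [l for l in labels if re.search(r"NP", l)]

# /^NP$/ with both anchors: only the bare label NP.
anchored = [l for l in labels if re.search(r"^NP$", l)]

# @NP: NP plus any function-tagged variant.
at_np = [l for l in labels if basic_category(l) == "NP"]

print(unanchored)  # ['NP', 'NP-SBJ', 'NP-PRD', 'NNP', 'WHNP-1']
print(anchored)    # ['NP']
print(at_np)       # ['NP', 'NP-SBJ', 'NP-PRD']
```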

  16. tregex: operators • Help • 1. NP < NN | < NNS • 2. NP > S & $++ VP (& redundant) • 3. NP [ < NN | < NNS ] & > S (Note: square brackets) • 4. VP < VV | < NP $ NP equiv. to VP [ < VV | [ < NP & $ NP ] ] • 5. NP !< NNP • 6. NP < !NNP|NNS

  17. tregex • Help • @NP matches NP, NP-SBJ, NP-PRD, NP-TMP etc. • Matches: 432,777 anywhere…

  18. tregex: names • Help • Similar to backreferences in Perl regexes • (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)

  19. tregex: names • Pattern: • (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma) • Both occurrences of =comma refer to the same node • Key: • <, first child • $+ immediate left sister • <- last child

  20. tregex: links • Help • ADJP=cat <, ~cat <- ~cat • (a link ~cat only has to match the label of the node named cat; it need not be the same node)

  21. tregex: variable groups • Help • @SBAR < /^WH.*-([0-9]+)$/#1%index << (__=empty < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))
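The `#1%index` notation binds capture group 1 of each regex to the variable `index`, so both labels must carry the same number. A minimal sketch of that check with Python's `re` module follows; the function name `coindexed` and the sample labels are invented for illustration, and the tree relations (`<`, `<<`) in the pattern are left out.

```python
import re

# The same two regexes as in the pattern above.
WH = re.compile(r"^WH.*-([0-9]+)$")
TRACE = re.compile(r"^\*T\*-([0-9]+)$")

def coindexed(wh_label, trace_word):
    """True if both strings match and their group 1 (the index) is identical."""
    wh = WH.search(wh_label)
    trace = TRACE.search(trace_word)
    return bool(wh and trace and wh.group(1) == trace.group(1))

print(coindexed("WHNP-1", "*T*-1"))    # True: same index
print(coindexed("WHNP-1", "*T*-2"))    # False: indices differ
print(coindexed("WHADVP-3", "*T*-3"))  # True: any WH category qualifies
print(coindexed("NP-1", "*T*-1"))      # False: not a WH label
```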

  22. tregex: variable groups

  23. tregex: variable groups • Different results from: • @SBAR < /^WH.*-([0-9]+)$/#1%index << (@NP < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))

  24. tregex: variable groups Example: WHADVP also possible (not just WHNP)

  25. Treebank Guides • Tagging Guide • Arpa94 paper • Parse Guide

  26. Treebank Guides • Parts-of-speech (POS) Tagging Guide: tagguid1.pdf (34 pages) • tagguid2.pdf: addendum, see POS tag ‘TO’

  27. Treebank Guides • Parsing guide 1: prsguid1.pdf (318 pages) • prsguid2.pdf: addendum for the Switchboard corpus

  28. Homework Exercise • Report your regex search expression and frequency counts for NPs that have various classes of relative clauses attached • (prsguid1.pdf, section 4.2.2, pg. 63) • Criteria: • Relative clauses are adjoined to the head noun phrase • The relative pronoun is • (1) given the appropriate WH-label, • (2) put inside the SBAR level, and • (3) coindexed with a *T* in the position of a gap

  29. Homework Exercise • Report and document your search string and frequency counts for different categories: • Tensed relatives: subject relative clauses vs. non-subject relative clauses; that-relatives vs. wh-word relatives vs. zero relatives • Infinitival relatives: subject relative clauses vs. non-subject relative clauses • Submit a snapshot of a tree from the WSJ for each of the categories

  30. Homework Exercise • Put everything in one PDF file • Submit by email before class next time • Be prepared to come up and explain your searches to the class
