
LING 581: Advanced Computational Linguistics



Lecture Notes, January 23rd


Administrivia

  • Let us assume

    • Installed Penn Treebank v3

    • Downloaded and installed tregex

      • under MacOSX or Linux (possibly inside VirtualBox)


Trees in the Penn Treebank

Notation: LISP S-expression

S-EXP =

(LABEL S-EXP … S-EXP)

or

S-EXP =

(LABEL WORD)
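
The recursive definition above maps directly onto a small recursive-descent reader. The following is a minimal sketch in Python, not part of the Penn Treebank tools: the names `tokenize` and `parse_sexp` and the nested-tuple representation are illustrative assumptions, and the extra unlabeled outer parentheses found in the .mrg files are not handled.

```python
import re

def tokenize(text):
    """Split a bracketed tree string into '(', ')' and atom tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse_sexp(text):
    """Parse one S-EXP. Internal nodes become (label, [children]);
    leaves become (label, word)."""
    tokens = tokenize(text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "an S-EXP must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        if tokens[pos] != "(":
            # Base case: (LABEL WORD)
            word = tokens[pos]
            pos += 1
            assert tokens[pos] == ")", "expected ')' after the word"
            pos += 1
            return (label, word)
        # Recursive case: (LABEL S-EXP ... S-EXP)
        children = []
        while tokens[pos] == "(":
            children.append(read())
        assert tokens[pos] == ")", "expected ')' after the last child"
        pos += 1
        return (label, children)

    return read()

# Hand-written toy tree, not taken from the treebank.
tree = parse_sexp("(S (NP (NNP John)) (VP (VBZ sleeps)))")
print(tree)
```

Each branch of `read` corresponds to one of the two S-EXP rules, so the structure of the code mirrors the grammar.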


tregex

  • tregex is a tgrep2-style utility for matching patterns in trees

    • run-tregex-gui.command (on MacOSX)



    • select the PTB directory, e.g. TREEBANK_3/parsed/mrg/wsj/



    • Browse Trees


  • Search, e.g.

    • NP-SBJ << NNP (NP-SBJ dominates an NNP)

    • NP-SBJ < NNP (NP-SBJ immediately dominates an NNP)
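
The difference between the two searches can be sketched in a few lines of Python. This is only an illustration of the two relations, not how tregex is implemented; the tree is a hand-built toy example, and the helper names are assumptions.

```python
# Trees as nested tuples: (label, [children]) or, for a leaf, (label, word).

def children(node):
    label, rest = node
    return rest if isinstance(rest, list) else []

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)

def immediately_dominates(node, label):
    """A < B: some child of the node carries the label."""
    return any(child[0] == label for child in children(node))

def dominates(node, label):
    """A << B: some descendant (child, grandchild, ...) carries the label."""
    return any(desc[0] == label
               for child in children(node)
               for desc in subtrees(child))

# Toy tree: the NNPs sit one level below NP-SBJ, inside an inner NP.
tree = ("S", [
    ("NP-SBJ", [("NP", [("NNP", "John"), ("NNP", "Smith")]),
                (",", ","),
                ("ADJP", [("JJ", "old")])]),
    ("VP", [("MD", "will"), ("VP", [("VB", "join")])]),
])

subj = tree[1][0]
print(dominates(subj, "NNP"))              # True
print(immediately_dominates(subj, "NNP"))  # False
```

Every match of `NP-SBJ < NNP` is also a match of `NP-SBJ << NNP`, but not the other way round, so the first search should report at least as many hits as the second.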


Penn Tagset Recap

  • Part-of-speech (POS) tags

    • http://www.americannationalcorpus.org/OANC/penn.html


  • Syntactic tagset:

    • (from The Penn Treebank: An overview, Taylor, Marcus & Santorini)



tregex: relations

Help



tregex: labels

  • Help

  • /regex/ matches a label by regular expression; anchors: ^, $

  • __ matches any node

  • @NP matches NP, NP-SBJ etc.

  • S < NP < VP means S < VP AND S < NP

  • Note: node grouping () vs. relation grouping []
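
The three ways of describing a label can be compared on a handful of label strings. The sketch below uses Python's `re` module and assumes that an unanchored /regex/ may match anywhere inside the label, as the mention of anchors suggests. The function `basic_category` is a hypothetical simplification of what @ does: it just cuts the label at the first `-` or `=`.

```python
import re

def basic_category(label):
    """Assumed simplification of '@': keep the label up to the first '-' or '='.
    Labels that start with '-' (e.g. -NONE-) are left alone."""
    if label.startswith("-"):
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]

labels = ["NP", "NP-SBJ", "NP-TMP", "NNP", "WHNP-1", "-NONE-"]

# /NP/ : unanchored, so it also hits NNP and WHNP-1
unanchored = [l for l in labels if re.search(r"NP", l)]

# /^NP$/ : anchored at both ends, only the bare label NP
anchored = [l for l in labels if re.search(r"^NP$", l)]

# @NP : compare basic categories
at_np = [l for l in labels if basic_category(l) == "NP"]

print(unanchored)  # ['NP', 'NP-SBJ', 'NP-TMP', 'NNP', 'WHNP-1']
print(anchored)    # ['NP']
print(at_np)       # ['NP', 'NP-SBJ', 'NP-TMP']
```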


tregex: operators

  • Help

1. NP < NN | < NNS

2. NP > S & $++ VP (& redundant)

3. NP [ < NN | < NNS ] & > S (Note: square brackets)

4. VP < VV | < NP $ NP, equiv. to VP [ < VV | [< NP & $ NP ]]

5. NP !< NNP

6. NP < !NNP|NNS
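
The grouping in example 4 says that & binds more tightly than |, the same precedence Python gives `and` over `or`. The sketch below spells out both possible groupings as Boolean functions to show that they can disagree. The `Node` record and the helper names are illustrative assumptions, not tregex internals.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    child_labels: list = field(default_factory=list)
    sister_labels: list = field(default_factory=list)

def has_child(node, label):
    """node < label"""
    return label in node.child_labels

def has_sister(node, label):
    """node $ label"""
    return label in node.sister_labels

def pattern_4(vp):
    """VP < VV | < NP $ NP, read as VP [ < VV | [< NP & $ NP ]]."""
    return has_child(vp, "VV") or (has_child(vp, "NP") and has_sister(vp, "NP"))

def other_grouping(vp):
    """The reading the slide rules out: VP [ [< VV | < NP ] & $ NP ]."""
    return (has_child(vp, "VV") or has_child(vp, "NP")) and has_sister(vp, "NP")

# A VP with a VV child and no NP sister separates the two readings.
vp = Node("VP", child_labels=["VV"], sister_labels=["ADVP"])
print(pattern_4(vp))       # True
print(other_grouping(vp))  # False
```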


tregex

  • Help

  • @NP matches NP, NP-SBJ, NP-PRD, NP-TMP etc. anywhere…

  • Matches: 432,777


tregex: names

  • Help

  • Similar to backreferences in Perl regexes

  • Pattern:

    • (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)


  • =comma names a node; the later =comma must be that same node

  • Key:

    • <, first child

    • $+ immediate left sister

    • <- last child
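
Read with the key above, the pattern fixes the first four children of the outer NP (NP, comma, NP, comma) and then requires the last child to be that same second comma. The sketch below hand-codes this one pattern in Python to show what the "same node" condition amounts to. It is not a general matcher; the tuple representation, the helper names, and the toy trees are assumptions.

```python
# Trees as nested tuples: (label, [children]) or, for a leaf, (label, word).

def basic(label):
    """Assumed simplification of '@': drop anything after the first '-'."""
    return label if label.startswith("-") else label.split("-", 1)[0]

def children(node):
    label, rest = node
    return rest if isinstance(rest, list) else []

def matches(np):
    """Hand-coded check of the pattern above for a single candidate node."""
    if basic(np[0]) != "NP":
        return False
    kids = children(np)
    if len(kids) < 4:
        return False
    # <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))): first four children in sequence
    first, comma1, second, named_comma = kids[:4]
    chain_ok = (basic(first[0]) == "NP" and "," in comma1[0]
                and basic(second[0]) == "NP" and "," in named_comma[0])
    # <- =comma: the last child must be the node named 'comma', i.e. the
    # child at position 3 must also be the final child.
    return chain_ok and len(kids) == 4

appositive = ("NP-SBJ", [("NP", [("NNP", "Smith")]), (",", ","),
                         ("NP", [("DT", "the"), ("NN", "chairman")]), (",", ",")])
longer = ("NP", children(appositive) + [("PP", [("IN", "of")]), (",", ",")])

print(matches(appositive))  # True
print(matches(longer))      # False: the last child is a comma, but a different node
```

Without the name, `<- /,/` would accept the second tree as well, because any final comma would do.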


tregex: links

  • Help

  • ADJP=cat <, ~cat <- ~cat

    • ~cat is a link to the node named cat: it matches a node with the same label, not necessarily the same node


tregex: variable groups

  • Help

  • Pattern:

    • @SBAR < /^WH.*-([0-9]+)$/#1%index << (__=empty < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))

  • #1%index stores capture group 1 in the variable index; both labels must capture the same index



  • Different results from:

    • @SBAR < /^WH.*-([0-9]+)$/#1%index << (@NP < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))


  • Example: WHADVP also possible (not just WHNP)
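
The coindexing test performed by the variable group can be reproduced on bare strings with Python's `re` module. The sketch below reuses the two regular expressions from the patterns above; the function name `coindexed` and the sample labels are illustrative assumptions, and the tree relations (<, <<) are left out.

```python
import re

WH = re.compile(r"^WH.*-([0-9]+)$")        # e.g. WHNP-1, WHADVP-2
TRACE = re.compile(r"^\*T\*-([0-9]+)$")    # e.g. *T*-1

def coindexed(wh_label, trace_word):
    """True if both regexes match and capture group 1 is the same string,
    which is the condition expressed by #1%index on both sides."""
    wh = WH.search(wh_label)
    tr = TRACE.search(trace_word)
    return bool(wh and tr and wh.group(1) == tr.group(1))

print(coindexed("WHNP-1", "*T*-1"))    # True
print(coindexed("WHADVP-2", "*T*-2"))  # True: WHADVP is covered by ^WH.*
print(coindexed("WHNP-1", "*T*-2"))    # False: different indices
print(coindexed("WHNP", "*T*-1"))      # False: the WH label has no index
```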


Treebank Guides

  • Tagging Guide

  • Arpa94 paper

  • Parse Guide


  • Parts-of-speech (POS) Tagging Guide, tagguid1.pdf (34 pages)

    • tagguid2.pdf: addendum, see POS tag ‘TO’


  • Parsing guide 1, prsguid1.pdf (318 pages)

    • prsguid2.pdf: addendum for the Switchboard corpus


Homework Exercise

  • Report your regex search expression and frequency counts for NPs that have various classes of relative clauses attached

    • (prsguid1.pdf, section 4.2.2, p. 63)

  • Criteria:

    • Relative clauses are adjoined to the head noun phrase

    • The relative pronoun is

      • (1) given the appropriate WH-label,

      • (2) put inside the SBAR level, and

      • (3) coindexed with a *T* in the position of a gap


  • Report and document your search string and frequency counts for different categories:

    • Tensed relatives

      • Subject relative clauses vs. non-subject relative clauses

      • That-relatives vs. wh-word relatives vs. zero relatives

    • Infinitival relatives

      • Subject relative clauses vs. non-subject relative clauses

  • Submit a snapshot of a tree from the WSJ for each of the categories


  • Put everything in one PDF file

  • Submit by email before class next time

  • Be prepared to come up and explain your searches to the class

