LING 581: Advanced Computational Linguistics

Lecture Notes, January 26th

Penn Treebank
  • Bracketing guidelines

Ungraded Homework Exercise
  • Search for NP trace relative clauses as defined below:

Be ready to compare search pattern and number found next time in class

Ungraded Homework Exercise

@NP < @NP < @SBAR

12038

Ungraded Homework Exercise

@NP < @NP < @SBAR

plus WH indices

10956 down from 12038

Ungraded Homework Exercise

@NP < @NP < (@SBAR < /^-NONE-/)

529

Note

-NONE- < *ICH*
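
To make the search pattern concrete, here is a minimal Python sketch of what `@NP < @NP < (@SBAR < /^-NONE-/)` asks for. In tregex, chained relations all attach to the first node, so the pattern reads: a node with basic category NP that immediately dominates an NP and also immediately dominates an SBAR that has a `-NONE-` child. The names `parse_trees`, `basic_category` and `count_matches` are hypothetical helpers written for this illustration, not part of tregex, and the sketch counts matching NP nodes, which may not be identical to the match count tregex reports.

```python
import re


def parse_trees(text):
    """Parse Penn Treebank bracketed text into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            # The outer wrapper in .mrg files has no label: "( (S ...) )".
            label = ""
            if tokens[pos] not in ("(", ")"):
                label = tokens[pos]
                pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return (word, [])  # a leaf (word or empty element)

    trees = []
    while pos < len(tokens):
        trees.append(read())
    return trees


def basic_category(label):
    """Strip function tags and indices: NP-SBJ-1 -> NP (rough analogue of tregex '@')."""
    if label.startswith("-"):  # -NONE-, -LRB-, -RRB- keep their full label
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]


def count_matches(tree):
    """Count NP nodes with an NP child and an SBAR child that has a -NONE- child."""
    label, children = tree
    kids = [c for c in children if c[1]]  # ignore leaf words
    hit = (
        basic_category(label) == "NP"
        and any(basic_category(c[0]) == "NP" for c in kids)
        and any(
            basic_category(c[0]) == "SBAR"
            and any(g[0].startswith("-NONE-") and g[1] for g in c[1])
            for c in kids
        )
    )
    return int(hit) + sum(count_matches(c) for c in children)


if __name__ == "__main__":
    sample = (
        "( (S (NP-SBJ (NP (DT the) (NN man)) "
        "(SBAR (-NONE- 0) (S (NP-SBJ (PRP I)) (VP (VBD saw) (NP (-NONE- *T*-1)))))) "
        "(VP (VBD left))) )"
    )
    print(sum(count_matches(t) for t in parse_trees(sample)))
```

The sample sentence is made up for the illustration; running the same functions over the WSJ .mrg files is the way to compare against the tregex counts on the slides.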

Ungraded Homework Exercise

Not all

@NP < @NP < (@SBAR < /^-NONE-/)

are relative clauses

Ungraded Homework Exercise

@NP < @NP < (@SBAR < /^-NONE-/)

plus *ICH*

count drops from 529 to 166

Ungraded Homework Exercise

@NP < @NP < (@SBAR < /^-NONE-/)

plus *ICH*

Is 166 too low?

How about other -NONE- nodes?

Homework Exercise

Use the bracketing guides and choose three “interesting” constructions

Find all occurrences in the WSJ PTB

Homework Exercise
  • 581 Homework rules
    • Due next lecture
    • Present your findings in class (slides)
Parsing

… from Treebank search to stochastic parsers trained on the WSJ Penn Treebank

Bikel Collins
  • Java re-implementation of Collins’ parser
  • Paper
    • Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511.
    • http://www.cis.upenn.edu/~dbikel/papers/collins-intricacies.pdf
  • Software
    • http://www.cis.upenn.edu/~dbikel/
Bikel Collins
  • Download and install Dan Bikel’s parser
  • File: install.sh
    • The parser itself is Java code
    • but the installer is a shell script (.sh), so at this point I think it won’t work on Windows
    • maybe it will after the files are extracted?
Bikel Collins
  • Download and install the POS tagger MXPOST

parser doesn’t actually need a separate tagger…

Bikel Collins
  • Training the parser with the WSJ PTB
  • See guide
    • http://www.cis.upenn.edu/~dbikel/download/dbparser/guide.pdf

directory: TREEBANK_3/parsed/mrg/wsj

sections 02-21: concatenate into one single .mrg file

events: wsj-02-21.obj.gz
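
Creating the single training file amounts to concatenating every .mrg file under directories 02 through 21. A minimal sketch, assuming the standard layout where each two-digit directory under `TREEBANK_3/parsed/mrg/wsj` holds that section's .mrg files; `merge_sections` is a hypothetical helper name, not part of Dan Bikel's distribution.

```python
from pathlib import Path


def merge_sections(wsj_dir, out_path, first=2, last=21):
    """Concatenate every .mrg file in sections first..last into one file.

    Returns the number of files written, as a quick sanity check.
    """
    wsj_dir = Path(wsj_dir)
    count = 0
    with open(out_path, "wb") as out:
        for section in range(first, last + 1):
            section_dir = wsj_dir / f"{section:02d}"  # 2 -> "02"
            for mrg in sorted(section_dir.glob("*.mrg")):
                data = mrg.read_bytes()
                out.write(data)
                if not data.endswith(b"\n"):
                    out.write(b"\n")  # keep trees from running together
                count += 1
    return count


if __name__ == "__main__":
    n = merge_sections("TREEBANK_3/parsed/mrg/wsj", "wsj-02-21.mrg")
    print(f"merged {n} files")
```

Files are copied as bytes so the treebank text is passed through unchanged, and sorting keeps the sentence order stable between runs.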

Bikel Collins
  • Settings:
Bikel Collins
  • Parsing
    • Command
  • Input file format (sentences)
Bikel Collins
  • Verify the trainer and parser work on your machine
Bikel Collins
  • File: bin/parse is a shell script that sets up program parameters and calls java
Bikel Collins
  • File: bin/train is another shell script
Bikel Collins
  • Relevant WSJ PTB files
Bikel Collins
  • I use a tcl/tk wrapper to call Dan Bikel’s code (requires tcl/tk to be installed)

makes it easy to work with the parser without memorizing the command line options

Bikel Collins
  • For tree viewing, you can use tregex

For demos, I use my own viewer

Bikel Collins
  • POS tagging (MXPOST, in directory jmx)
    • tagger_input
    • $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
  • Parsing
    • set ddf "wsj-02-21.obj.gz"
    • set properties "collins.properties"
    • parser_input
    • $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
  • Training
    • set mrg "wsj-02-21.mrg"
    • set properties "collins.properties"
    • $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
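
For anyone without tcl/tk, the same command lines can be assembled in a few lines of Python. This is only a sketch that mirrors the argument order of the wrapper calls above; `parse_command` and `train_command` are hypothetical names, and the leading 400 and 800 are passed through unchanged (they are assumed here to be Java heap sizes in megabytes, so check the guide before changing them).

```python
def parse_command(dbprefix, input_file, heap=400,
                  properties="collins.properties", ddf="wsj-02-21.obj.gz"):
    """Argument list for bin/parse, in the same order as the Tcl wrapper."""
    return [
        f"{dbprefix}/bin/parse",
        str(heap),                            # assumed: heap size in MB
        f"{dbprefix}/settings/{properties}",  # settings file
        f"{dbprefix}/bin/{ddf}",              # data file produced by training
        str(input_file),                      # sentences to parse
    ]


def train_command(dbprefix, heap=800,
                  properties="collins.properties", mrg="wsj-02-21.mrg"):
    """Argument list for bin/train, in the same order as the Tcl wrapper."""
    return [
        f"{dbprefix}/bin/train",
        str(heap),                            # assumed: heap size in MB
        f"{dbprefix}/settings/{properties}",  # settings file
        f"{dbprefix}/bin/{mrg}",              # merged training trees
    ]


if __name__ == "__main__":
    # "/opt/dbparser" is a placeholder install location.
    print(" ".join(parse_command("/opt/dbparser", "/tmp/test2.txt")))
    print(" ".join(train_command("/opt/dbparser")))
```

Returning a list rather than a single string means the command can be handed straight to `subprocess.run` without shell quoting problems.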

Unix file descriptors

  • 0 Standard input (stdin)
  • 1 Standard output (stdout)
  • 2 Standard error (stderr)
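
The `2> /tmp/err.txt` in the tagger command sends descriptor 2 to a file while descriptor 1 still carries the tagged output. A small sketch of the same separation using Python's `subprocess` module; `run_split` is a hypothetical helper, and the child process below is a stand-in for the tagger rather than MXPOST itself.

```python
import subprocess
import sys


def run_split(cmd, err_path):
    """Run cmd, return its stdout, and write its stderr to err_path (like `2> err_path`)."""
    with open(err_path, "w") as err:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=err, text=True)
    return result.stdout


if __name__ == "__main__":
    # Stand-in child process: one line on stdout, one on stderr.
    child = [sys.executable, "-c",
             "import sys; print('tagged output'); print('progress message', file=sys.stderr)"]
    print(run_split(child, "/tmp/err.txt"))
```

The Tcl form `2>@ stdout` used for the parser and trainer does the opposite: it merges descriptor 2 into the wrapper's standard output, so progress messages show up alongside the results.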

GUI components

frame .input
text .input.t -height 4 -yscrollcommand {.input.s set}
scrollbar .input.s -command {.input.t yview}

frame .tagged
text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
scrollbar .tagged.s -command {.tagged.t yview}

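Code

# Write the contents of the input text widget to the tagger's input file
proc tagger_input {} {
    set lines [.input.t get 1.0 end]
    set infile [open "/tmp/test.txt" w]
    # trim trailing whitespace; no final newline is written
    puts -nonewline $infile [string trimright $lines]
    close $infile
}

# Write the tagged text to the parser's input file
proc parser_input {} {
    set lines [.tagged.t get 1.0 end]
    set infile [open "/tmp/test2.txt" w]
    puts -nonewline $infile [string trimright $lines]
    close $infile
}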