
LING 581: Advanced Computational Linguistics



  1. LING 581: Advanced Computational Linguistics Lecture Notes January 19th

  2. Course • Webpage • http://dingo.sbs.arizona.edu/~sandiway/ling581-11/ • Enrollment

  3. Course Objectives • Gain meaningful project experience • dealing with natural language software packages • installation • input data formatting • operation • project exercises • useful “real-world” computational experience • write small programs • abilities gained will be of value to employers

  4. Computational Facilities • We advise using your own laptop/desktop • we can also make use of this computer lab • but you don’t have installation rights on these computers • Platforms • You need to run some variant of Unix… (your task #1 for this week), e.g. • Linux • de facto standard for advanced/research software • Cygwin on Windows • http://www.cygwin.com/ • Linux-like environment for Windows, making it possible to port software running on POSIX systems (such as Linux, BSD, and Unix systems) to Windows • MacOS X • not quite Linux; some porting issues, especially with C programs

  5. Theme • Language Understanding

  6. Project Topics • PTB (Penn Treebank) search/lookup software (tgrep2) • Part-of-speech taggers • The use and modification of statistical parsers trained on Treebanks (Bikel-Collins, and others) • Ontologies and Semantic Networks: WordNet etc. • Question-Answering (QA) • Sentence Parsing using contemporary linguistic theory: Minimalist Program

  7. Grading • Completion of all homework tasks will result in a satisfactory grade (A)

  8. In the News recently… www.ibmwatson.com

  9. You will be exposed to • Perl • Java • Lisp s-exps • Bikel-Collins Parser • You will need to review concepts from LING 538 • regexp use • Penn POS tags • Project 1: PTB
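As a small exercise combining regexp use with the Penn POS tags, the following Python sketch (Python is our choice for illustration; the slides name Perl, Java and Lisp) pulls (tag, word) pairs out of a bracketed tree with one regular expression and counts the tags. It is a quick approximation, not a tree parser: it assumes every preterminal looks like "(TAG word)" with no whitespace or brackets inside the tag or word. The names `PRETERMINAL` and `tagged_words` are our own.

```python
import re
from collections import Counter

# A preterminal "(TAG word)": tag and word contain no whitespace or parentheses.
# Phrasal nodes such as "(NP (NN chairman) )" never match, because the text
# after their label starts with "(".
PRETERMINAL = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def tagged_words(tree_text):
    """Return the (tag, word) pairs of a bracketed tree, left to right."""
    return PRETERMINAL.findall(tree_text)


SAMPLE = ("( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD (NP (NN chairman) ) "
          "(PP (IN of) (NP (NP (NNP Elsevier) (NNP N.V.) ) (, ,) (NP (DT the) (NNP Dutch) "
          "(VBG publishing) (NN group) ))))) (. .) ))")

pairs = tagged_words(SAMPLE)
print(pairs[:3])                                        # [('NNP', 'Mr.'), ('NNP', 'Vinken'), ('VBZ', 'is')]
print(Counter(tag for tag, _ in pairs).most_common(1))  # [('NNP', 5)]
```

Empty elements such as "(-NONE- *T*-1)" also match this pattern, so filter out the -NONE- tag if you only want overt words.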

  10. PTB • Availability • Linguistic Data Consortium (LDC) • U. of Arizona is a (fee-paying) member of this consortium • Resources are made available to the community through the main library • URL • http://sabio.library.arizona.edu/search/X

  11. PTB (V3) • Call Record

  12. Task 1 • Install Cygwin or Ubuntu • Install the PTB • Borrow it from the library • Or use the CD I’ve brought with me • Familiarize yourself with the organization and layout of the files • e.g. the difference between mrg and prd formats • As is standard in the literature, we’ll be using the WSJ (Wall Street Journal) section of the PTB
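To get a feel for the layout of the WSJ files, a minimal Python sketch can list the .mrg files section by section. It assumes only that each section is a two-digit subdirectory (00, 01, and so on) holding files named like wsj_0001.mrg, matching the 00/wsj_0001.mrg path used in these notes. The directory you pass in depends on where your copy of the PTB is installed. `mrg_files_by_section` and `demo` are hypothetical helpers, and the demo builds a throwaway fake layout with made-up file names so that it runs without the corpus.

```python
import tempfile
from pathlib import Path


def mrg_files_by_section(wsj_root):
    """Map each section directory name (e.g. "00") to its sorted wsj_*.mrg files."""
    sections = {}
    for section in sorted(p for p in Path(wsj_root).iterdir() if p.is_dir()):
        files = sorted(section.glob("wsj_*.mrg"))
        if files:  # skip directories with no .mrg files
            sections[section.name] = files
    return sections


def demo():
    """Run the sketch on a fake layout (hypothetical file names, empty files)."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("00/wsj_0001.mrg", "00/wsj_0002.mrg", "01/wsj_0100.mrg", "01/README"):
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
        # Build the summary before the temporary directory is deleted.
        return {name: [f.name for f in files]
                for name, files in mrg_files_by_section(tmp).items()}


print(demo())  # {'00': ['wsj_0001.mrg', 'wsj_0002.mrg'], '01': ['wsj_0100.mrg']}
```

Pointing `mrg_files_by_section` at the directory that holds the real section folders gives a quick per-section file count for your installation.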

  13. TreeBank Browsing • 00/wsj_0001.mrg:
      ( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))
      ( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD (NP (NN chairman) ) (PP (IN of) (NP (NP (NNP Elsevier) (NNP N.V.) ) (, ,) (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) ))))) (. .) ))
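The .mrg trees are Lisp-style s-expressions: each node is "(LABEL child ...)", preterminals are "(TAG word)", and each sentence is wrapped in an extra unlabeled pair of brackets. A minimal recursive-descent reader in Python, assuming one tree per string and no parentheses inside words (the PTB writes literal brackets as -LRB- and -RRB-). `parse_tree`, `leaves` and `pretty` are our own names; `pretty` prints one node per line, a rough text stand-in for a graphical tree browser.

```python
import re

# Tokens of a bracketed tree: "(", ")", or any run of non-space, non-paren characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_tree(text):
    """Parse one bracketed PTB tree into nested (label, children) tuples.

    Words are plain strings; the unlabeled outer wrapper gets the label "".
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def leaves(tree):
    """The words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


def pretty(tree, depth=0):
    """One node per line, indented by depth; preterminals shown as 'TAG word'."""
    if isinstance(tree, str):
        return ["  " * depth + tree]
    label, children = tree
    if not label:  # skip the unlabeled outer wrapper
        return [line for c in children for line in pretty(c, depth)]
    if len(children) == 1 and isinstance(children[0], str):
        return ["  " * depth + f"{label} {children[0]}"]
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(pretty(child, depth + 1))
    return lines


SAMPLE = ("( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) "
          "(NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) "
          "(NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) "
          "(NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))")

tree = parse_tree(SAMPLE)
print(" ".join(leaves(tree)))
# Pierre Vinken , 61 years old , will join the board as a nonexecutive director Nov. 29 .
print("\n".join(pretty(tree)[:4]))
```

Keeping the tree as plain tuples and strings makes it easy to write small recursive functions (leaf extraction, tag counts, depth) without any extra library.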

  14. TreeBank Browsing • My outdated tool (treebank viewer) • URL • http://dingo.sbs.arizona.edu/~sandiway/treebankviewer/

  15. PTB Search Tools • Looking ahead • Google and install: • tgrep2 • http://tedlab.mit.edu/~dr/Tgrep2/ • a fast command-line search tool for parse trees • C program (source, Makefile) • Tregex • http://nlp.stanford.edu/software/tregex.shtml • graphical Java version • Penn Treebank Online (tgrep interface) • http://www.ldc.upenn.edu/ldc/online/treebank/ • doesn’t seem to be working: tgrep search currently unavailable • tgrep • VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to)) • approximation to finding Verb Phrases headed by "believe" that have an infinitival complement with a non-null subject
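The tgrep pattern above is a conjunction of dominance tests: A < B means A immediately dominates B, A << B means A dominates B, and ! negates a relation. The following Python sketch hand-codes one literal reading of that pattern, so you can see what each piece checks. It is not tgrep2 and not its matching engine. Assumptions: words are leaf nodes (so /^believe/ can match a word), plain labels such as S and VP match exactly, and /.../ patterns are regular-expression searches. Check the tgrep2 manual for how the real tool handles functional tags. The two example trees are hand-built, not taken from the PTB.

```python
import re

# Tokens of a bracketed tree: "(", ")", or a run of non-space, non-paren chars.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


class Node:
    """A tree node. Words are Nodes with no children, so patterns can match them."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def parse(text):
    """Parse one bracketed PTB tree; the outer wrapper gets the label ""."""
    tokens = TOKEN.findall(text)
    i = 0

    def node():
        nonlocal i
        i += 1  # consume "("
        label = ""
        if tokens[i] not in ("(", ")"):
            label = tokens[i]
            i += 1
        kids = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                kids.append(node())
            else:
                kids.append(Node(tokens[i]))
                i += 1
        i += 1  # consume ")"
        return Node(label, kids)

    return node()


def dominated(n):
    """Every node that n properly dominates (the targets of <<)."""
    for child in n.children:
        yield child
        yield from dominated(child)


def non_null_np(n):
    # /^NP/ !<< /[*]/ !< (-NONE- < T)
    return (re.search(r"^NP", n.label) is not None
            and not any(re.search(r"[*]", d.label) for d in dominated(n))
            and not any(c.label == "-NONE-" and any(g.label == "T" for g in c.children)
                        for c in n.children))


def infinitival_s(n):
    # S < (non-null NP) < (VP|AUX << to)
    return (n.label == "S"
            and any(non_null_np(c) for c in n.children)
            and any(c.label in ("VP", "AUX") and any(d.label == "to" for d in dominated(c))
                    for c in n.children))


def believe_vp(n):
    # VP << /^believe/ < (S ...)
    return (n.label == "VP"
            and any(re.search(r"^believe", d.label) for d in dominated(n))
            and any(infinitival_s(c) for c in n.children))


def search(tree):
    """All nodes (root included) that satisfy the whole pattern."""
    return [n for n in [tree, *dominated(tree)] if believe_vp(n)]


# Hand-built example trees (not from the PTB).
OVERT = ("( (S (NP-SBJ (PRP They) ) (VP (VBP believe) (S (NP-SBJ (PRP him) ) "
         "(VP (TO to) (VP (VB be) (ADJP-PRD (JJ honest) ))))) (. .) ))")
NULL = ("( (S (NP-SBJ-1 (PRP They) ) (VP (VBP seem) (S (NP-SBJ (-NONE- *-1) ) "
        "(VP (TO to) (VP (VB believe) (NP (PRP it) ))))) (. .) ))")

print(len(search(parse(OVERT))))  # 1: the VP headed by "believe"
print(len(search(parse(NULL))))   # 0: the embedded subject is the null *-1
```

The second tree shows why the result is only an approximation: the outer VP does dominate "believe" (through the embedded VP), so it is the subject test, not the head test, that rules it out.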

  16. PTB Search Tools
