
LING/C SC 581: Advanced Computational Linguistics

LING/C SC 581: Advanced Computational Linguistics. Lecture 3, Jan 17th. 2019 HLT Lecture Series. Named Entity Recognition. In my other class, I'm doing a demo of the University of Illinois NER (https://cogcomp.org/page/demo_view/NERextended). Unfortunately, it has been down so far this week.

herbertm




Presentation Transcript


  1. LING/C SC 581: Advanced Computational Linguistics Lecture 3 Jan 17th

  2. 2019 HLT Lecture Series

  3. Named Entity Recognition • In my other class, I'm doing a demo of: • University of Illinois NER • https://cogcomp.org/page/demo_view/NERextended • Unfortunately, it has been down so far this week…

  4. Named Entity Recognition • Google Cloud Natural Language: • https://cloud.google.com/natural-language/ • also supplies sentiment/magnitude scores for the identified entities

  5. Named Entity Recognition

  6. Named Entity Recognition • Illinois Named Entity Recognizer example: Helicopters will patrol the temporary no-fly zone around New Jersey's MetLife Stadium Sunday, with F-16s based in Atlantic City ready to be scrambled if an unauthorized aircraft does enter the restricted airspace. Down below, bomb-sniffing dogs will patrol the trains and buses that are expected to take approximately 30,000 of the 80,000-plus spectators to Sunday's Super Bowl between the Denver Broncos and Seattle Seahawks. The Transportation Security Administration said it has added about two dozen dogs to monitor passengers coming in and out of the airport around the Super Bowl. On Saturday, TSA agents demonstrated how the dogs can sniff out many different types of explosives. Once they do, they're trained to sit rather than attack, so as not to raise suspicion or create a panic. TSA spokeswoman Lisa Farbstein said the dogs undergo 12 weeks of training, which costs about $200,000, factoring in food, vehicles and salaries for trainers. Dogs have been used in cargo areas for some time, but have just been introduced recently in passenger areas at Newark and JFK airports. JFK has one dog and Newark has a handful, Farbstein said.
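The Illinois recognizer marks names in a passage like this with types such as person, organization and location. Purely to make the task's input and output concrete (this is not how the Illinois system works; it uses trained models with context features), here is a toy gazetteer lookup in Python. The gazetteer entries and the CoNLL-style type labels (PER, ORG, LOC, MISC) are hand-picked assumptions for this sketch.

```python
import re

# Hypothetical, hand-picked gazetteer for the passage above (not from any real system)
GAZETTEER = {
    "New Jersey": "LOC",
    "MetLife Stadium": "LOC",
    "Atlantic City": "LOC",
    "Super Bowl": "MISC",
    "Denver Broncos": "ORG",
    "Seattle Seahawks": "ORG",
    "Transportation Security Administration": "ORG",
    "TSA": "ORG",
    "Lisa Farbstein": "PER",
    "Farbstein": "PER",
    "Newark": "LOC",
}


def tag_entities(text, gazetteer):
    """Return (start, end, surface, type) for every gazetteer name found in text."""
    # Longest names first, so "Lisa Farbstein" is preferred over "Farbstein"
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return [(m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
            for m in pattern.finditer(text)]


if __name__ == "__main__":
    sentence = "TSA spokeswoman Lisa Farbstein said the dogs undergo 12 weeks of training"
    for start, end, surface, label in tag_entities(sentence, GAZETTEER):
        print(start, end, surface, label)
```

A pure lookup misses any name that is not listed and cannot use context to decide, for example, whether "Newark" means the city or its airport; that is the gap learned taggers such as the Illinois one are built to close.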

  7. Dependency-Based Parsing

  8. Universal Dependencies (UD) http://universaldependencies.org/ • 100 treebanks in over 70 languages • Some relations involving dependent clauses: • ccomp: connects higher verb with verbal head of sentential complement with overt subject • xcomp: connects higher verb with verbal head of non-finite sentential complement without a subject • csubj: connects higher verb with verbal head of sentential subject • vmod ➤ advcl/acl: connects word to verbal head of a reduced non-finite verbal modifier (deprecated in UD; still emitted by SyntaxNet)
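UD treebanks are distributed in the tab-separated CoNLL-U format, with 10 columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The sketch below reads one sentence annotated by hand for this example (not taken from a UD treebank) and pulls out the clausal relations listed above, folding subtypes such as acl:relcl into their base relation. It is a minimal reader under those assumptions, not a full CoNLL-U validator.

```python
# One hand-annotated sentence in CoNLL-U (tab-separated, HEAD 0 = root)
CONLLU = (
    "# text = He says that you like to swim\n"
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsays\tsay\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tthat\tthat\tSCONJ\t_\t_\t5\tmark\t_\t_\n"
    "4\tyou\tyou\tPRON\t_\t_\t5\tnsubj\t_\t_\n"
    "5\tlike\tlike\tVERB\t_\t_\t2\tccomp\t_\t_\n"
    "6\tto\tto\tPART\t_\t_\t7\tmark\t_\t_\n"
    "7\tswim\tswim\tVERB\t_\t_\t5\txcomp\t_\t_\n"
)

CLAUSAL = ("ccomp", "xcomp", "csubj", "advcl", "acl")


def read_conllu(text):
    """Return one dict per syntactic word; skips comments, blank lines,
    multiword-token ranges (e.g. 3-4) and empty nodes (e.g. 5.1)."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue
        tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                       "head": int(cols[6]), "deprel": cols[7]})
    return tokens


def clausal_relations(tokens, rels=CLAUSAL):
    """List (relation, head word, dependent word) for clausal relations."""
    forms = {t["id"]: t["form"] for t in tokens}
    forms[0] = "ROOT"
    return [(t["deprel"], forms[t["head"]], t["form"])
            for t in tokens if t["deprel"].split(":")[0] in rels]


if __name__ == "__main__":
    for rel in clausal_relations(read_conllu(CONLLU)):
        print(rel)
```

Here "like" heads a complement clause with its own subject ("you"), so it is a ccomp of "says"; "swim" has no subject of its own, so it is an xcomp of "like".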

  9. Google Cloud Natural Language • RRS Sir David Attenborough ("Boaty McBoatface") • Parsey McParseface (Andor et al., 2016) • Free: DragNN (Kong et al., 2017), the follow-on to SyntaxNet (2016) • Free sampling at https://cloud.google.com/natural-language/ • The for-pay Google Cloud version is trained on additional proprietary corpora
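Parsey McParseface and the SyntaxNet/DragNN toolkits are transition-based dependency parsers: a neural network picks a sequence of shift and arc actions that builds the tree. The sketch below shows only the transition system, using the common arc-standard actions (an assumption for illustration; we do not reproduce Google's exact configuration or model) and a static oracle that recovers the action sequence for a known, projective gold tree. The head assignments are a hand-annotated example, not treebank data.

```python
def arc_standard_oracle(heads):
    """heads: dict token_id -> head_id (1-based ids, 0 = ROOT), assumed projective.
    Returns (transitions, arcs) where arcs maps dependent -> head."""
    n = len(heads)
    stack, buffer = [0], list(range(1, n + 1))
    arcs, transitions = {}, []

    def has_all_dependents(h):
        # A word may only be reduced once every gold dependent is attached
        return all(d in arcs for d, hd in heads.items() if hd == h)

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            if s1 != 0 and heads[s1] == s0:  # LEFT-ARC: s0 -> s1
                arcs[s1] = s0
                del stack[-2]
                transitions.append("LEFT-ARC")
                continue
            if heads[s0] == s1 and has_all_dependents(s0):  # RIGHT-ARC: s1 -> s0
                arcs[s0] = s1
                stack.pop()
                transitions.append("RIGHT-ARC")
                continue
        if not buffer:
            raise ValueError("no valid transition: tree is not projective")
        stack.append(buffer.pop(0))  # SHIFT next word onto the stack
        transitions.append("SHIFT")
    return transitions, arcs


if __name__ == "__main__":
    # "He says that you like to swim" (hand-annotated, UD-style heads)
    gold = {1: 2, 2: 0, 3: 5, 4: 5, 5: 2, 6: 7, 7: 5}
    actions, built = arc_standard_oracle(gold)
    print(actions)
    print(built == gold)
```

In a trained parser a classifier chooses each action from features of the current stack and buffer; an oracle like this one is what turns gold trees into training targets. Every arc-standard derivation of an n-word sentence takes exactly 2n actions.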

  10. Google Cloud Natural Language

  11. Google Cloud Natural Language

  12. Quick Homework 3 • Only part of the Penn Treebank ships as a corpus in NLTK Data (Sections 00 and 01: wsj_0001.mrg to wsj_0199.mrg) • from nltk.corpus import treebank • Methods: • .words() • .sents() • .parsed_sents() • .fileids() • .draw() (called on a Tree returned by .parsed_sents())
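NLTK's corpus reader does the parsing for you. The stdlib-only sketch below just makes the .mrg bracket notation concrete: it turns one bracketed tree into nested (label, children) tuples and collects the leaves, roughly what .parsed_sents() and .sents() return for one sentence (NLTK gives Tree objects and word lists instead). The example tree is made up, and the sketch ignores details such as the -NONE- empty elements that the real .mrg files contain.

```python
import re

# Brackets, or any run of characters that is neither whitespace nor a bracket
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_bracketed(s):
    """Parse one Penn Treebank-style tree into (label, children) tuples; leaves are strings."""
    tokens = TOKEN.findall(s)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = parse_node()
    # .mrg files wrap each sentence in an unlabeled outer bracket: "( (S ...) )"
    if tree[0] == "" and len(tree[1]) == 1:
        tree = tree[1][0]
    return tree


def leaves(tree):
    """Left-to-right list of the words under a tree."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1] for w in leaves(child)]


if __name__ == "__main__":
    t = parse_bracketed("( (S (NP-SBJ (DT The) (NN dog)) (VP (VBD barked)) (. .)) )")
    print(t[0], leaves(t))
```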

  13. Quick Homework 3 • Pick a random parse from treebank (see right) • Run it through the Google Cloud parser • Analyze and comment on how it compares to the gold-standard parse • Include the gold tree and the Google dependency parse • One PDF file • Due next Wednesday (by midnight) • >>> import random • >>> random.seed() • >>> random.randrange(0,3914) 1462 (varies from run to run) • >>> len(treebank.sents()) 3914
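The homework asks for a qualitative comparison, but a standard quantitative aid is unlabeled and labeled attachment score (UAS/LAS): the fraction of tokens whose head (UAS), or head and relation label (LAS), match the gold parse. A minimal sketch follows. It assumes both parses are already dependency parses over the same tokenization; the treebank gold tree is a constituency tree, so it would first need converting (for example with head-finding rules, not shown), and Google's label names would need mapping onto the gold label set before LAS is meaningful. The parses below are hypothetical.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two parses of the same tokens.

    gold, pred: lists of (head, label) pairs in token order;
    heads are 1-based token positions and 0 means ROOT.
    """
    if len(gold) != len(pred):
        raise ValueError("both parses must cover the same tokens")
    if not gold:
        raise ValueError("empty parse")
    n = len(gold)
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Labels compared case-insensitively; any label-set mapping must happen beforehand
    label_hits = sum(g[0] == p[0] and g[1].lower() == p[1].lower()
                     for g, p in zip(gold, pred))
    return head_hits / n, label_hits / n


if __name__ == "__main__":
    # Hypothetical gold and predicted parses of "He says that you like to swim"
    GOLD = [(2, "nsubj"), (0, "root"), (5, "mark"), (5, "nsubj"),
            (2, "ccomp"), (7, "mark"), (5, "xcomp")]
    PRED = [(2, "nsubj"), (0, "root"), (5, "mark"), (5, "nsubj"),
            (2, "xcomp"), (7, "mark"), (2, "xcomp")]
    uas, las = attachment_scores(GOLD, PRED)
    print(f"UAS = {uas:.3f}, LAS = {las:.3f}")  # UAS = 0.857, LAS = 0.714
```

In this made-up prediction, "swim" is attached to the wrong head (costing UAS and LAS), while "like" has the right head but the wrong label (costing LAS only).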
