
Natural Language Processing: Data, Algorithms, and Knowledge

BEARS 2011. Dan Klein, Computer Science Division, University of California, Berkeley.


Presentation Transcript


  1. Natural Language Processing: Data, Algorithms, and Knowledge. BEARS 2011. Dan Klein, Computer Science Division, University of California, Berkeley.

  2. Language Technologies • Goal: Deep Understanding • Reality: Shallow Matching • Requires robustness and scale • Amazing successes, but fundamental limitations • Requires context, linguistic structure, meanings…

  3. Large-Scale NLP: Watson

  4. Factoids and Limitations

  5. Text Data is Superficial An iceberg is a large piece of freshwater ice that has broken off from a snow-formed glacier or ice shelf and is floating in open water.

  6. … But Language is Complex • Semantic structures • References and entities • Discourse-level connectives • Meanings and implicatures • Contextual factors • Perceptual grounding • … An iceberg is a large piece of freshwater ice that has broken off from a snow-formed glacier or ice shelf and is floating in open water.

  7. More Data: Machine Translation
     SOURCE: Cela constituerait une solution transitoire qui permettrait de conduire à terme à une charte à valeur contraignante.
     HUMAN: That would be an interim solution which would make it possible to work towards a binding charter in the long term.
     1x DATA: [this] [constituerait] [assistance] [transitoire] [who] [permettrait] [licences] [to] [terme] [to] [a] [charter] [to] [value] [contraignante] [.]
     10x DATA: [it] [would] [a solution] [transitional] [which] [would] [of] [lead] [to] [term] [to a] [charter] [to] [value] [binding] [.]
     100x DATA: [this] [would be] [a transitional solution] [which would] [lead to] [a charter] [legally binding] [.]
     1000x DATA: [that would be] [a transitional solution] [which would] [eventually lead to] [a binding charter] [.]
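One way to make the trend on this slide concrete is to count the bracketed segments in each system output. The sketch below assumes that each pair of brackets marks a phrase the system translated as a unit; the slide does not say so explicitly. The function names `phrases` and `mean_phrase_length` are illustrative, not taken from the talk.

```python
import re

# System outputs copied from the slide, keyed by relative training-data size.
OUTPUTS = {
    "1x": "[this] [constituerait] [assistance] [transitoire] [who] [permettrait] "
          "[licences] [to] [terme] [to] [a] [charter] [to] [value] [contraignante] [.]",
    "10x": "[it] [would] [a solution] [transitional] [which] [would] [of] [lead] "
           "[to] [term] [to a] [charter] [to] [value] [binding] [.]",
    "100x": "[this] [would be] [a transitional solution] [which would] [lead to] "
            "[a charter] [legally binding] [.]",
    "1000x": "[that would be] [a transitional solution] [which would] "
             "[eventually lead to] [a binding charter] [.]",
}


def phrases(bracketed):
    """Return the text of every [bracketed] segment, in order."""
    return re.findall(r"\[([^\]]*)\]", bracketed)


def mean_phrase_length(bracketed):
    """Average number of whitespace-separated tokens per segment."""
    segments = phrases(bracketed)
    return sum(len(segment.split()) for segment in segments) / len(segments)


if __name__ == "__main__":
    for size, output in OUTPUTS.items():
        print(size, len(phrases(output)), round(mean_phrase_length(output), 3))
```

Under that assumption, fewer and longer segments at larger data sizes would be consistent with the system reusing longer stretches of previously seen translations.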

  8. Data By Itself Isn’t Enough!

  9. Analysis and Alignment [Burkett, Blitzer, and Klein 10]

  10. Data and Knowledge • Classic knowledge representation worry: How will a machine ever know that… • Ice is frozen water? • Beige looks like this: • Chairs are solid? • Answers: • 1980: write it all down • 2000: get by without it • 2020: learn it from data

  11. Deeper Linguistic Analysis Hurricane Emily howled toward Mexico 's Caribbean coast on Sunday packing 135 mph winds and torrential rain and causing panic in Cancun, where frightened tourists squeezed into musty shelters . Accuracy: 90+ [Petrov and Klein 09]

  12. Learning Hidden Syntax Personal Pronouns (PRP) Proper Nouns (NNP) Parsing Accuracy: 90.5+ [Petrov and Klein 09]

  13. Data and Knowledge: Parsing They considered running the ad during the Super Bowl. • running * during: 3k • considered * during: 2k • running it during: 239 • considered it during: 112 [Bansal and Klein 11]
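The counts on this slide suggest how surface statistics could inform an attachment decision: does "during the Super Bowl" modify "running" or "considered"? The sketch below is a deliberately simple, hypothetical decision rule that picks the candidate head with the larger count. It is not the method of the cited work, and the slide does not say how the counts are actually used. "3k" and "2k" are read as roughly 3000 and 2000.

```python
# Counts copied from the slide; "3k" and "2k" are approximated as 3000 and 2000.
NGRAM_COUNTS = {
    ("running", "*", "during"): 3000,
    ("considered", "*", "during"): 2000,
    ("running", "it", "during"): 239,
    ("considered", "it", "during"): 112,
}


def preferred_head(candidates, middle, preposition, counts):
    """Pick the candidate head whose (head, middle, preposition) pattern is most frequent.

    Hypothetical rule for illustration only; patterns not in the table count as zero.
    """
    return max(candidates, key=lambda head: counts.get((head, middle, preposition), 0))


heads = ["considered", "running"]
wildcard_choice = preferred_head(heads, "*", "during", NGRAM_COUNTS)
pronoun_choice = preferred_head(heads, "it", "during", NGRAM_COUNTS)

if __name__ == "__main__":
    print("wildcard pattern prefers:", wildcard_choice)
    print("pronoun pattern prefers:", pronoun_choice)
```

With the slide's numbers, both patterns favor attaching "during" to "running". A real parser would more plausibly treat such counts as one signal among many rather than as a hard rule.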

  14. Deeper Understanding: Reference

  15. Names vs. Entities

  16. Example Errors

  17. Discovering Knowledge

  18. Unsupervised Learning

  19. Coreference Systems

  20. Cross-Document Identity

  21. Cross-Document Summaries Lindsay Lohan pleaded not guilty Wednesday to felony grand theft of a $2,500 necklace, a case that could return the troubled starlet to jail rather than the big screen. Saying it appeared that Lohan had violated her probation in a 2007 drunken driving case, the judge set bail at $40,000 and warned that if Lohan was accused of breaking the law while free he would have her held without bail. The Mean Girls star is due back in court on Feb. 23, an important hearing in which Lohan could opt to end the case early. [Berg-Kirkpatrick, Gillick, and Klein 11]

  22. Grounded Language [Golland, Liang, and Klein 10]

  23. Grounding with Natural Data … on the beige loveseat.

  24. Predictions

  25. Conclusion • Simple algorithms and large data have gotten us amazingly far! • To go further, we need • Algorithms that work with deeper structure • Learning methods that turn data into knowledge • Systems that are contextualized

  26. Thank you!
