Information Communication Theory


  1. Information Communication Theory (情報伝達学) • Kentaro Inui (乾 健太郎) • Naoaki Okazaki (岡崎 直観)

  2. Course Plan • Part I (Okazaki) • 10/04: Introduction • 10/11: Classification • 10/18: Part-of-speech tagging • 10/25: Syntactic parsing • 11/01: Statistical parsing • Part II (Inui) • 11/08: Features and unification • 11/15: Representation of meaning • 11/22: Computational semantics • 11/29: Computational lexical semantics • 12/06: (no class) • Part III (Inui, Okazaki, TAs) • 12/13, 12/20, 2013/01/10, 01/17, 01/24 • Programming exercises and a project based on Natural Language Processing with Python (by Steven Bird et al.) • Lectures are given in the large computer exercise room (計算機大演習室), 1st floor of the New Student Laboratory Building for Information Engineering (情報新棟1階)

  3. Course Format • Text (optional) • Jurafsky, Daniel and Martin, James H. Speech and Language Processing. Prentice-Hall, 2009 (2nd Edition) • about ¥6,000 at amazon.co.jp • Bird, Steven et al. Natural Language Processing with Python. O'Reilly & Associates Inc., 2009 • Japanese translation: 萩原 正人, 中山 敬広, 水野 貴明 (trans.), 『入門 自然言語処理』 (Introduction to Natural Language Processing), O'Reilly Japan, 2010 • Grading • Exercises (given in lectures): 40% • Final report (programming project)

  4. Handouts • If necessary, please print out a handout and bring it to class yourself • Alternatively, view it on your laptop • Handouts will be available (before dawn) at: • http://www.cl.ecei.tohoku.ac.jp/index.php?InformationCommunicationTheory • Username: nlp2012 • Password: chukougishitsu

  5. Contact Information • Office hours: • Tue, 1:00-2:30pm or by appointment • Office: • Room 305 (Room 108 after November), Electrical Engineering and Applied Physics Research Building No. 3 (電気系3号館) • Contact: • inui@ecei.tohoku.ac.jp, @inuikentaro • okazaki@ecei.tohoku.ac.jp, @chokkanorg

  6. Introduction Naoaki Okazaki okazaki@ecei.tohoku.ac.jp http://www.chokkan.org/ http://twitter.com/#!/chokkanorg #nlptohoku http://www.chokkan.org/lectures/2012nlp/p/01.pdf

  7. Natural Language Processing (NLP) • Giving computers the ability to process human language • As old as the idea of computers themselves! • Implementations and implications of the exciting idea • The long-awaited dream (that has not come true yet) • Atom (Astro Boy), Doraemon, C-3PO (Star Wars)

  8. What needs to be done to understand language as humans do? Part I: Knowledge (disciplines)

  9. Lexical semantics (語彙意味論): meaning of words • How much Chinese silk was exported to Western Europe by the end of the 18th century?

  10. Compositional semantics (合成意味論): meaning of constituents • How much Chinese silk was exported to Western Europe by the end of the 18th century? • [Figure: timeline from 1700 to 1800 showing how "the end" composes with "of the 18th century"]
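The idea that the meaning of "the end of the 18th century" is computed from the meanings of its parts can be sketched as function composition. This is a toy under stated assumptions: a century denotes a year interval (drawn as 1700 to 1800, matching the slide's timeline, although the strict definition is 1701 to 1800), and "the end of" is a hypothetical function that keeps the last fifth of an interval. The names `century` and `end_of` and the 0.2 fraction are illustrative, not a standard semantic analysis.

```python
from typing import Tuple

# A time interval as (start_year, end_year): a toy stand-in for a denotation.
Interval = Tuple[float, float]


def century(ordinal: int) -> Interval:
    """Hypothetical meaning of 'the Nth century': a 100-year interval.

    Assumption: the Nth century spans 100*(N-1) to 100*N, as drawn on the
    slide's timeline (the 18th century as 1700-1800).
    """
    return (100 * (ordinal - 1), 100 * ordinal)


def end_of(interval: Interval, fraction: float = 0.2) -> Interval:
    """Hypothetical meaning of 'the end of X': the final `fraction` of X."""
    start, end = interval
    return (end - (end - start) * fraction, end)


# Compose bottom-up: [the end [of [the 18th century]]]
meaning = end_of(century(18))
print(meaning)
```

Because `end_of` only sees the meaning of its argument, replacing `century(18)` with `century(19)` changes the result without touching `end_of`; that independence is what makes the composition "compositional".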

  11. Compositional??? (with adjectives) • white wine vs. white towel • former girlfriend • black hole

  12. Morphology (形態論): study of word formation (breaking words down into morphemes) • Inflection (屈折) • is – was – being – been • export – exports – exporting – exported – exported • Derivation (派生) • China – Chinese • West – Western • How much Chinese silk was exported to Western Europe by the end of the 18th century?
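The inflection examples above can be undone by a small rule-based lemmatizer: irregular forms come from a lookup table, regular forms from suffix-stripping rules. A minimal sketch; the table, the suffix list, and the function name `lemmatize` are hypothetical and cover only the verbs on the slide, whereas real morphological analyzers need much larger lexicons.

```python
# Irregular inflections must be memorized: no suffix rule maps 'was' to 'be'.
IRREGULAR = {"is": "be", "was": "be", "being": "be", "been": "be"}

# Regular inflectional suffixes, tried longest first.
SUFFIXES = ["ing", "ed", "s"]


def lemmatize(word: str) -> str:
    """Map an inflected English verb form to its base form (toy rules)."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Require a stem of at least 3 letters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


for form in ["is", "was", "being", "been",
             "export", "exports", "exporting", "exported"]:
    print(form, "->", lemmatize(form))
```

Rules this naive over-strip words such as "class" (giving "clas"), and they say nothing about derivation (China to Chinese), which changes both the form and the part of speech.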

  13. Syntax (統語論, 文法): principles and rules for constructing phrases and sentences • Part-of-speech (POS): Lecture #3 • Categorization of words, e.g., nouns, verbs, adjectives, adverbs • Constituency: Lectures #4 and #5 • Grouping of words that may behave as a single unit or phrase • e.g., noun phrase, verb phrase, prepositional phrase • Grammatical relations: Lecture #5 • Relationships between words/constituents

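  14. Syntactic tagging and parsing • Assign a structure to an input sentence (example from Nivre and Kübler, 2006) • POS tagging: Economic/JJ news/NN had/VBD little/JJ effect/NN on/IN financial/JJ markets/NNS ./PU • Constituent parsing: (S (NP (JJ Economic) (NN news)) (VP (VBD had) (NP (NP (JJ little) (NN effect)) (PP (IN on) (NP (JJ financial) (NNS markets))))) (PU .)) • Dependency parsing: sbj(had, news), nmod(news, Economic), obj(had, effect), nmod(effect, little), nmod(effect, on), pc(on, markets), nmod(markets, financial), p(had, .)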
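A dependency parse such as the one for "Economic news had little effect on financial markets ." is commonly stored as one head index and one label per token, with 0 standing for an artificial root (the convention used by CoNLL-style formats). The sketch below encodes the example's arcs that way and checks two properties every dependency tree must have: exactly one root and no cycles. The `ROOT` label and the helper `is_tree` are our own placeholders.

```python
from typing import List

tokens = ["Economic", "news", "had", "little", "effect",
          "on", "financial", "markets", "."]
# heads[i] is the 1-based index of the head of token i+1; 0 marks the root.
heads = [2, 3, 0, 5, 3, 5, 8, 6, 3]
labels = ["nmod", "sbj", "ROOT", "nmod", "obj", "nmod", "nmod", "pc", "p"]


def is_tree(heads: List[int]) -> bool:
    """True if there is exactly one root and every token reaches it."""
    if heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:          # follow head links up towards the root
            if node in seen:      # revisiting a node means a cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


for tok, head, lab in zip(tokens, heads, labels):
    head_word = "ROOT" if head == 0 else tokens[head - 1]
    print(f"{lab}({head_word}, {tok})")

print("well-formed tree:", is_tree(heads))
```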

  15. Semantic role (意味役割) • How much Chinese silk was exported to Western Europe by the end of the 18th century? • "by the end of the 18th century": TEMPORAL • [Figure: timeline of the 18th century, 1700 to 1800] • How much Chinese silk was exported to Western Europe by southern merchants? • "by southern merchants": AGENT

  16. Coreference (共参照) U: Where is The Green Hornet playing in Mountain View? S: The Green Hornet is playing at the Century 16 theatre. U: When is it playing there? S: It’s playing at 2pm, 5pm, and 8pm. U: I’d like 1 adult and 2 children for the first show. How much would that cost? • What does “it” refer to? What does “the first show” refer to? What does “that” refer to? • We can guess these easily!

  17. Coreference (共参照) U: Where is The Green Hornet playing in Mountain View? S: The Green Hornet is playing at the Century 16 theatre. U: When is it playing there? S: It’s playing at 2pm, 5pm, and 8pm. U: I’d like 1 adult and 2 children for the first show. How much would that cost? • How words like “that” or pronouns like “it” refer to previous parts of the discourse
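A classic baseline for the dialogue above is a recency heuristic: resolve each referring expression to the most recent earlier mention whose semantic type fits. The sketch hand-annotates mention types (`movie`, `place`) and the type each anaphor expects; these annotations and the helper `resolve` are hypothetical, and real coreference resolvers rely on far richer features.

```python
from typing import List, Optional, Tuple

# Discourse so far as (text, semantic type), in order of mention.
# The type annotations are hand-made assumptions for this example.
Mention = Tuple[str, str]

discourse: List[Mention] = [
    ("The Green Hornet", "movie"),
    ("Mountain View", "place"),
    ("The Green Hornet", "movie"),
    ("the Century 16 theatre", "place"),
]

# Which antecedent type each referring expression expects (assumed).
EXPECTED_TYPE = {"it": "movie", "there": "place"}


def resolve(anaphor: str, history: List[Mention]) -> Optional[str]:
    """Return the most recent mention whose type matches the anaphor."""
    wanted = EXPECTED_TYPE.get(anaphor.lower())
    for text, mention_type in reversed(history):
        if mention_type == wanted:
            return text
    return None  # unknown anaphor or no compatible antecedent


# U: When is it playing there?
for word in ["it", "there"]:
    print(word, "->", resolve(word, discourse))
```

Recency alone cannot handle "the first show": among 2pm, 5pm, and 8pm the most recent is 8pm, while the ordinal "first" points to 2pm.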

  18. Pragmatics (語用論) • Bob: Are you coming to the party? • Jane: I’m afraid I can’t. • Bob: Are you coming to the party? • Jane: You know, I’m really busy. • Bob: Could you pass me the sugar? • Jane: Yes. Here you are. Actions that speakers intend by their use of text

  19. Discourse (談話) Coherent structured groups of text http://www.isi.edu/~marcu/discourse/tagging-ref-manual.pdf

  20. Various kinds of knowledge about language • Morphology (形態論): meaningful components within words • Syntax (文法): structural relationships between words • Semantics (意味論): meanings of words, phrases, and sentences • Discourse (談話): relationships across/beyond individual sentences or statements; contextual processing • Pragmatics (語用論): relationship of meaning to the goals and intentions of speakers; how we use language to communicate • World knowledge (世界知識): facts about the world; common sense

  21. What needs to be done to understand language as humans do? Part II: Ambiguity

  22. Ambiguity • We may build multiple, alternative linguistic structures and interpretations for a single input • I made her duck (see more examples later) • Disambiguation (or resolution): deciding which linguistic/semantic structure or interpretation is the most appropriate (in the context)

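  23. Part-of-speech tagging and ambiguity • Time flies like an arrow . • NN VBZ IN DT NN . (光陰矢のごとし: "Time passes as swiftly as an arrow") • VB NNS IN DT NN . (ハエの速度を矢のように測定せよ: "Measure the speed of flies as you would an arrow") • NN NNS VBP DT NN . (時蠅は矢を好む: "Time flies, a kind of fly, are fond of an arrow")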
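The number of candidate tag sequences multiplies with every ambiguous word. The sketch uses a tiny hypothetical lexicon, listing for each word only the tags that appear in the three readings above, and enumerates every combination; choosing among them is the disambiguation problem a POS tagger has to solve.

```python
from itertools import product

# Possible tags per word, collected from the three readings on the slide.
LEXICON = {
    "Time": ["NN", "VB"],
    "flies": ["VBZ", "NNS"],
    "like": ["IN", "VBP"],
    "an": ["DT"],
    "arrow": ["NN"],
    ".": ["."],
}

sentence = ["Time", "flies", "like", "an", "arrow", "."]

# Every way of choosing one tag per word: 2 * 2 * 2 * 1 * 1 * 1 = 8 sequences.
candidates = list(product(*(LEXICON[w] for w in sentence)))

print(len(candidates))
for tags in candidates:
    print(" ".join(tags))
```

Only three of the eight sequences correspond to the readings listed above; sequences such as VB VBZ IN DT NN . are not grammatical at all.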

  24. Attachment ambiguity (1/3) • I saw the girl on the hill with a telescope. • I saw the girl on the hill with a telescope.

  25. Attachment ambiguity (2/3) • I saw the girl on the hill with a telescope. • I saw the girl on the hill with a telescope.

  26. Attachment ambiguity (3/3) • I saw the girl on the hill with a telescope. • I saw the girl on the hill with a telescope.
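Each additional prepositional phrase multiplies the readings of "saw the girl [on the hill] [with a telescope]". Under the usual no-crossing (projectivity) assumption, a PP may attach to the verb or to any earlier noun, as long as its attachment arc does not cross an earlier one. The brute-force sketch below counts these attachments; the position encoding (each PP and its noun collapsed into one position) and the helper names are our own simplification. For this construction the counts coincide with the Catalan numbers 2, 5, 14, ...

```python
from itertools import product
from math import comb
from typing import List, Tuple

Arc = Tuple[int, int]  # (head position, dependent position), head < dependent


def crosses(a: Arc, b: Arc) -> bool:
    """Two arcs cross if one starts strictly inside the other and ends outside."""
    (h1, d1), (h2, d2) = a, b
    return h1 < h2 < d1 < d2 or h2 < h1 < d2 < d1


def count_attachments(num_pps: int) -> int:
    """Count non-crossing attachments for 'V NP PP_1 ... PP_n'.

    Position 0 is the verb, position 1 the object noun, and position k+1
    stands for PP_k (with its own noun); PP_k may attach to positions 0..k.
    """
    choices = [range(k + 1) for k in range(1, num_pps + 1)]
    total = 0
    for heads in product(*choices):
        arcs: List[Arc] = [(h, k + 2) for k, h in enumerate(heads)]
        if not any(crosses(x, y)
                   for i, x in enumerate(arcs) for y in arcs[i + 1:]):
            total += 1
    return total


def catalan(n: int) -> int:
    return comb(2 * n, n) // (n + 1)


# "I saw the girl [on the hill] [with a telescope]" has two PPs.
for n in range(1, 5):
    print(n, count_attachments(n), catalan(n + 1))
```

With two PPs there are five readings: "on the hill" attaches to "saw" or "girl", and "with a telescope" to "saw", "girl", or "hill", minus the one combination whose arcs would cross.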

  27. Coordination ambiguity • Put [[the insects in the box] and [the bowl on the table]] • Put the insects in [[the box] and [the bowl on the table]]

  28. Semantic ambiguity • Syntactic structure alone is insufficient to represent meaning • Distinction between syntax and semantics • Colorless green ideas sleep furiously (Chomsky, 1957) • Opposite perspectives on the same event • John bought a book from Mary vs. Mary sold a book to John • Lexical ambiguity • I went to the bank… (of the river) or (to get some money) • Quantifier scope • Every man loves a woman
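The quantifier example has two readings that first-order logic keeps apart through the order of the quantifiers. Written in standard notation (the predicate names are our own):

```latex
% Reading 1 (surface scope): each man loves some woman, possibly a different one
\forall x\,\bigl(\mathit{man}(x) \rightarrow \exists y\,(\mathit{woman}(y) \land \mathit{loves}(x, y))\bigr)

% Reading 2 (inverse scope): there is one woman whom every man loves
\exists y\,\bigl(\mathit{woman}(y) \land \forall x\,(\mathit{man}(x) \rightarrow \mathit{loves}(x, y))\bigr)
```

The second reading entails the first, but not the other way around, so the choice of scope changes what the sentence claims.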

  29. The state of the art in Natural Language Processing

  30. Commercial world • A lot of exciting stuff is going on…

  31. Machine translation (Google)

  32. Machine translation (Google)

  33. Watson (IBM) • Question answering system built on IBM’s DeepQA technology • On 14-16 February 2011, Watson beat two human champions on Jeopardy!: the biggest all-time money winner and the record holder for the longest championship streak • Hardware • 2,880 processor cores (3.5 GHz POWER7 eight-core processors) • 16 TB of RAM in total • Software • Written in Java and C++ • Uses the Apache Hadoop framework for distributed computing • Data • 200M pages (about 1M books) of structured and unstructured content • Occupies 4 TB of disk storage • Encyclopedias, dictionaries, thesauri, newswire articles, literary works http://en.wikipedia.org/wiki/Watson_(computer)

  34. Jeopardy! • American quiz show covering • history, literature, the arts, pop culture, science, sports, geography, wordplay, etc. • Six categories are announced, each with five trivia clues • A correct response adds the clue's dollar value to the contestant's score • An incorrect response, or a failure to respond within the five-second time limit, deducts that dollar value http://en.wikipedia.org/wiki/Jeopardy!

  35. Final Jeopardy! and the Future of Watson • Watch the video (08:58): • http://www.youtube.com/watch?v=Wq0XnBYC3nQ

  36. Science behind an answer • Watch the very nice video (06:42): • http://www.youtube.com/watch?v=DywO4zksfXw

  37. Science behind an answer • Step 1: Question analysis • What type of question is being asked? • What is the question asking for? • Step 2: Hypothesis generation • Search millions of documents for possible answers • Step 3: Hypothesis and evidence scoring • Collect positive and negative evidence for each answer • Score the evidence on everything from the reliability of the source material to whether times and locations appear correct • Score the evidence for each possible answer in parallel • Step 4: Final merging and ranking • Learn the importance of each kind of evidence by playing practice games • Produce the final ranking of possible answers • Decide whether Watson answers the question, based on its confidence
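The four steps above can be mirrored in a toy pipeline. This is a hypothetical sketch, not IBM's DeepQA: question analysis is keyword extraction, the "documents" are a hand-made dictionary, evidence scoring uses two crude features, and the weights and the 0.5 confidence threshold are fixed by hand where Watson learns them. `CORPUS`, `WEIGHTS`, `THRESHOLD`, and every function name are illustrative.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mini "document collection": candidate answer -> passages.
CORPUS: Dict[str, List[str]] = {
    "Tokyo": ["Tokyo is the capital of Japan",
              "Tokyo is the largest city in Japan"],
    "Kyoto": ["Kyoto was the capital of Japan for about a thousand years"],
    "Sendai": ["Sendai is a city in northern Japan"],
}

STOPWORDS = {"what", "which", "who", "is", "was", "the", "of", "a", "in"}

# Assumed evidence weights and answer threshold (Watson learns such values).
WEIGHTS = {"coverage": 0.7, "support": 0.3}
THRESHOLD = 0.5


def tokenize(text: str) -> Set[str]:
    return {w.strip("?.,!").lower() for w in text.split()} - STOPWORDS


def analyze_question(question: str) -> Set[str]:
    """Step 1: reduce the question to its content keywords."""
    return tokenize(question)


def generate_hypotheses(keywords: Set[str]) -> List[str]:
    """Step 2: every candidate with a passage sharing a keyword."""
    return [cand for cand, passages in CORPUS.items()
            if any(keywords & tokenize(p) for p in passages)]


def score_evidence(cand: str, keywords: Set[str]) -> Dict[str, float]:
    """Step 3: two toy evidence features, each in [0, 1]."""
    passages = [tokenize(p) for p in CORPUS[cand]]
    covered = keywords & set().union(*passages)
    matching = sum(1 for p in passages if keywords & p)
    return {"coverage": len(covered) / len(keywords),
            "support": min(matching, 2) / 2}


def answer(question: str) -> Optional[str]:
    """Step 4: merge evidence, rank candidates, answer only if confident."""
    keywords = analyze_question(question)
    if not keywords:
        return None
    ranked = sorted(
        ((sum(WEIGHTS[f] * v for f, v in score_evidence(c, keywords).items()), c)
         for c in generate_hypotheses(keywords)),
        reverse=True)
    if ranked and ranked[0][0] >= THRESHOLD:
        return ranked[0][1]
    return None  # not confident enough: do not buzz in


print(answer("Which city is the capital of Japan?"))
print(answer("Who wrote Hamlet?"))
```

Returning `None` when no candidate clears the threshold plays the role of Watson declining to buzz in.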

  38. A shame for NLP • The Japanese translation of the book “Einstein: His Life and Universe” was published on 23 June 2011 • Chapter 13 was translated by a computer, not by humans! • How this happened: http://www.amazon.co.jp/review/R29GQAF5DUOAEW/ref=cm_cr_rdp_perm • A very rare incident of a machine-translated book being published • A revised version was published on 17 Aug 2011

  39. Imagine the original sentence • ボルンの妻のヘートヴィヒに最大限にしてください。(そのヘートヴィヒは,彼の家族に関する彼の処理,今や説教された頃,彼が「自分がそのかなり不幸な回答に駆り立てられるのを許容していないべきでない」と自由に彼に叱った)。以上は,彼が目立つべきであり,彼女が言ったのを「科学の人里離れている寺」に尊敬します。 • Max Born's wife, Hedwig, who had freely scolded Einstein about his treatment of his family, now lectured, “[You should] not have allowed yourself to be goaded into that rather unfortunate reply.” He should show more respect, she said, for “the secluded temple of science.” (P286)

  40. Passing the University of Tokyo entrance exams

  41. Writing short science fiction stories

  42. Goal of this course • Give an overview of the issues and technologies in natural language understanding • What is possible/easy? What is impossible/difficult? • Why can or can't current technology achieve it? • Provide fundamental theories and techniques for natural language processing • Some techniques are also useful in other research fields • Practice programming on real NLP tasks • You will become an experienced engineer!

  43. Course plan • 4 Oct: Introduction • 11 Oct: Classification • Spam filtering, linear classifier, feature extraction, perceptron, logistic regression, evaluation (precision, recall, F1) • 18 Oct: Part-of-speech tagging • 25 Oct: Syntactic parsing • 1 Nov: Statistical parsing
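The evaluation measures in the 11 Oct plan have short definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean, 2PR / (P + R). A minimal sketch for a binary task such as spam filtering; the labels in the example are made up for illustration, and returning 0.0 when a denominator is zero is our own convention.

```python
from typing import List, Tuple


def precision_recall_f1(gold: List[bool],
                        pred: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (e.g. 'spam')."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)      # true positives
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)  # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data: True = spam. Made-up labels for illustration only.
gold = [True, True, True, False, False, False]
pred = [True, True, False, True, False, False]
print(precision_recall_f1(gold, pred))
```

Here the classifier finds 2 of the 3 spam messages (recall 2/3) and 2 of its 3 spam predictions are correct (precision 2/3), so F1 is also 2/3.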
