
Introduction to Natural Language Processing



Presentation Transcript


  1. Introduction to Natural Language Processing • Heshaam Faili • hfaili@ece.ut.ac.ir

  2. Session Agenda • Artificial Intelligence • Natural Language Processing • History of NLP • Statistical NLP • Applications of NLP

  3. AI Concepts and Definitions • Encompasses Many Definitions • AI Involves Studying Human Thought Processes • Representing Thought Processes on Machines

  4. Artificial Intelligence • Behavior by a machine that, if performed by a human being, would be considered intelligent • “…study of how to make computers do things at which, at the moment, people are better” (Rich and Knight [1991]) • Theory of how the human mind works (Mark Fox)

  5. AI Objectives • Make machines smarter (primary goal) • Understand what intelligence is • Make machines more useful (practical purpose) (Winston and Prendergast [1984])

  6. Turing Test for Intelligence • A computer can be considered smart only when a human interviewer, “conversing” with both an unseen human being and an unseen computer, cannot determine which is which

  7. Major AI Areas • Expert Systems • Natural Language Processing • Speech Understanding • Robotics and Sensory Systems • Computer Vision and Scene Recognition • Intelligent Computer-Aided Instruction • Neural Computing • Fuzzy Logic • Genetic Algorithms • Intelligent Software Agents

  8. What is NLP? • Natural language is one of the fundamental aspects of human behavior. • Natural language interaction is one of the ultimate aims of human-computer communication. • Provides easy interaction with computers • Enables computers to understand text

  9. Major Disciplines Studying Language

  10. Interaction Level • [Figure: the interaction level between Human and Computer; labels: Command-line, Graphical UI, NL UI] • The level at which computer and human interact. • NL is used to bring the interaction level closer to the human.

  11. Other Titles • The most common titles, apart from Natural Language Processing, include: • Automatic Language Processing • Computational Linguistics • Natural Language Understanding

  12. Computational Linguistics • This is the application of computers to the scientific study of human language. • This definition suggests that there are connections with Cognitive Science, that is to say, the study of how humans produce and understand language.

  13. Computational Linguistics • Historically, Computational Linguistics has been associated with work in Generative Linguistics and formerly included the study of formal languages (e.g., finite state automata) and programming languages.

  14. Natural Language Understanding • This title distinguishes a particular approach to Natural Language Processing. • The people using this title tend to place strong emphasis on the meaning of the language being processed, in particular on getting the computer to respond to the input in an apparently intelligent fashion.

  15. Natural Language Understanding • At one time, those who belonged to the Natural Language Understanding camp avoided the use of any syntactic processing, but textbooks that bear this title now include significant sections on syntactic processing, which suggests that the edge of the title has been rather blunted. (For instance, see Allen (1987; part 1).)

  16. Motivation for NLP • Understand language analysis & generation • Communication • Language is a window to the mind • Data is in linguistic form • Data can be structured (table form), semi-structured (XML form), or unstructured (sentence form).

  17. Language Processing • Level 1 – Speech sound (Phonetics & Phonology) • Level 2 – Words & their forms (Morphology, Lexicon) • Level 3 – Structure of sentences (Syntax, Parsing) • Level 4 – Meaning of sentences (Semantics) • Level 5 – Meaning in context & for a purpose (Pragmatics) • Level 6 – Connected sentence processing in a larger body of text (Discourse)

  18. Phonetics • Concerns processing or identifying: languages, accents, pauses, word boundaries, amplitude, tone • Also includes background noise elimination • E.g. “I got up late” and “I got a plate” sound similar

  19. Lexicon • Deals with the vocabulary of words • Uses dictionaries, WordNet, etc. • Dictionaries vary in richness, e.g. tense, senses, usage, etc. • Resources – Princeton WordNet, EuroWordNet, …

  20. Syntax • Involves parsing and understanding the grammatical structure of sentences • Challenges • Ungrammatical sentences • Word order – fixed, free • Word attachment and scope • e.g. Old men and women were rescued. • Are only the men old, or the women too? • Prepositional phrase attachment • e.g. I saw the boy with a telescope • Does “with a telescope” attach to “saw” (the telescope is the instrument) or to “the boy” (the boy has the telescope)?
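The prepositional phrase attachment ambiguity above can be made concrete by counting parse trees. The following is a minimal sketch, not a method from the slides: the toy grammar and lexicon are assumptions written for this one sentence, and the counting uses a CKY-style chart.

```python
# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
# Keys are (left child, right child); values are the possible parent labels.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],   # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],   # PP attaches to the noun phrase
    ("Det", "N"): ["NP"],
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"], "a": ["Det"],
    "boy": ["N"], "telescope": ["N"], "with": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees for `words` with a CKY chart."""
    n = len(words)
    chart = {}  # (start, end, label) -> number of trees
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1, tag)] = chart.get((i, i + 1, tag), 0) + 1
    for span in range(2, n + 1):
        for start in range(0, n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for (left, right), parents in BINARY_RULES.items():
                    trees = (chart.get((start, mid, left), 0)
                             * chart.get((mid, end, right), 0))
                    if trees:
                        for parent in parents:
                            key = (start, end, parent)
                            chart[key] = chart.get(key, 0) + trees
    return chart.get((0, n, root), 0)


ambiguous = count_parses("I saw the boy with a telescope".split())
unambiguous = count_parses("I saw the boy".split())
print(ambiguous, unambiguous)
```

Under this toy grammar the telescope sentence receives two trees, one per attachment site, while the shorter sentence receives one.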

  21. Semantics • Concerned with “meaning” • Creates a structure for a sentence • Main verb associated with agent, object, instrument, etc. • E.g. I ate rice with a spoon. • [Figure: eat, with agent: I; obj: rice; instrument: spoon] • Challenges • Representation • Domain (straddles into pragmatics) • Constructing the meaning of the whole from individual meanings

  22. Pragmatics • Use of the sentence in a situation • Understanding the user's intention • E.g. “Is that water?” calls for a different response at a dining table than in a chemistry lab • Applications: search engines tuned to user preferences

  23. Discourse • Processing of connected text • Co-reference – two expressions in the utterance refer to the same thing. • Examples • Pronoun to noun binding – John is sleeping. He is lazy (“He” refers to John) • In an article – George Bush, Mr. Bush, the President of the United States, the President • General to specific – Ferrari launched a new model. This car is much better than the previous one. (“This car” refers to the new model launched)

  24. NLP History (1) • The first recognizable NLP application was a dictionary look-up system developed at Birkbeck College, London in 1948.

  25. NLP History (2) • NLP from 1966-1980 • Augmented Transition Networks • Case Grammar • Semantic representations • Conceptual Dependency • Semantic network • Procedural semantics

  26. NLP History (3) • The key systems were: • LUNAR: A database interface system that used ATNs and Woods' Procedural Semantics. • LIFER/LADDER: One of the most impressive NLP systems. It was designed as a natural language interface to a database of information about US Navy ships. • NLP from 1980-1990 - Grammar Formalisms • NLP from 1990-2000 - Multilinguality and Multimodality • NLP from 2000-now - Statistical Approaches and Practical Uses

  27–31. Why NLP is Hard?

  32. Basics of statistical NLP • Consider NLP problems as sequence labeling tasks • Amenable to machine learning (training and generalization) • In classical NLP – rules are obtained from linguists • In statistical NLP – probabilities are learnt from data
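The last point above, that probabilities are learnt from data, can be sketched as relative-frequency estimation over a tagged corpus. This is a minimal illustration under stated assumptions: the three-sentence corpus is invented for the example, the function names are hypothetical, and the fallback tag for unseen words is an arbitrary choice.

```python
from collections import Counter, defaultdict

# Invented toy corpus of (word, tag) pairs; tag names follow the slides.
TOY_CORPUS = [
    [("the", "DT"), ("dog", "NN"), ("bit", "VBD"), ("the", "DT"), ("cat", "NN")],
    [("it", "PN"), ("bit", "VBD"), ("it", "PN")],
    [("a", "DT"), ("bit", "NN"), ("of", "PP"), ("food", "NN")],
]


def tag_probabilities(corpus):
    """Estimate P(tag | word) by relative frequency (maximum likelihood)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {
        word: {tag: c / sum(tag_counts.values()) for tag, c in tag_counts.items()}
        for word, tag_counts in counts.items()
    }


def most_likely_tag(probs, word, default="NN"):
    """Pick the most probable tag; unseen words get `default` (an assumption)."""
    dist = probs.get(word.lower())
    return max(dist, key=dist.get) if dist else default


probs = tag_probabilities(TOY_CORPUS)
print(probs["bit"])                      # "bit" is seen as both VBD and NN
print(most_likely_tag(probs, "bit"))
```

No linguist wrote a rule saying that "bit" is usually a verb; the preference falls out of the counts, which is the contrast the slide draws with classical NLP.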

  33. Noisy Channel Metaphor • [Figure: noisy channel diagram; labels: Text (“I want food.”, “It is cold today.”), Noisy channel, Speech Signal]
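The figure for this slide did not survive in the transcript. As a hedged reminder of how the noisy channel idea is usually written down (the symbols are assumptions, not taken from the slide): let S be the observed speech signal and T a candidate text. The decoder looks for the text that most probably produced the signal, and Bayes' rule splits this into a channel model P(S | T) and a language model P(T).

```latex
\hat{T} = \arg\max_{T} P(T \mid S)
        = \arg\max_{T} \frac{P(S \mid T)\, P(T)}{P(S)}
        = \arg\max_{T} P(S \mid T)\, P(T)
```

The denominator P(S) can be dropped because it does not depend on T.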

  34. Data-Driven Approach • The issues in this approach are: • Corpora collection (coherent pieces of text) • Corpora cleaning – spelling, grammar, removal of strange characters • Annotation • Named entity recognition • POS detection • Parsing • Meaning • Again: the biggest challenge is ambiguity.

  35. Sequence Labeling Tasks • In order of complexity: • Words – POS tagging, Named Entity Recognition (NER), sense disambiguation • Phrases – Chunking • Sentences – Bracketing • Paragraphs – Co-referencing

  36. Examples of Levels • Example sentence – The dog Bill went near cat Jack. It bit it • POS tagging – The/DT dog/NN Bill/NNP went/VBD near/PP cat/NN Jack/NNP . It/PN bit/VBD it/PN • NER – <person-name>Bill</person-name> • <person-name>Jack</person-name> • Sense – using WordNet • {dog, animal} – synset-id • A synset-id is assigned to each sense

  37. Chunking • Tags: (Beginning, Intermediate, End) • (The dog Bill) went near (the cat Jack) • The/B dog/I Bill/E went/BIE near/BIE the/B cat/I Jack/E • It bit it • It/BIE bit/BIE it/BIE
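The (Beginning, Intermediate, End) scheme above can be sketched as a small function that turns chunks into per-word tags. Following the slide, a one-word chunk receives the combined tag BIE. The function name is hypothetical, and tagging a two-word chunk as B then E is an assumption, since the slide shows no such case.

```python
def bie_tags(chunks):
    """Assign B/I/E tags to the words of each chunk.

    `chunks` is a list of word lists. A single-word chunk is tagged "BIE",
    as on the slide; a two-word chunk is tagged B, E (an assumption).
    """
    tagged = []
    for chunk in chunks:
        if len(chunk) == 1:
            tagged.append((chunk[0], "BIE"))
            continue
        for i, word in enumerate(chunk):
            if i == 0:
                tag = "B"
            elif i == len(chunk) - 1:
                tag = "E"
            else:
                tag = "I"
            tagged.append((word, tag))
    return tagged


sentence_chunks = [["The", "dog", "Bill"], ["went"], ["near"], ["the", "cat", "Jack"]]
for word, tag in bie_tags(sentence_chunks):
    print(f"{word}/{tag}", end=" ")
print()
```

Once every word carries a tag, chunking has the same shape as POS tagging: one label per word.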

  38. Parsing • [Figure: parse tree] [S [NP [DT the] [NP [N dog] [N Bill]]] [VP [V went] [PP [P near] [NP the cat Jack]]]]

  39. Higher Order Structures • Bracketing – [S [NP] [VP [V [PP [P [NP]]]]]] [S [NP] [VP [V [NP]]]] • Co-referencing – The(1) dog(2) Bill(3) went(4) near(5) the(6) cat(7) Jack(8). It(9) bit(10) it(11) • References – 2<-9, 7<-11, 2<-3, 7<-8
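The reference links listed above can be grouped into co-reference clusters by merging linked positions. The sketch below uses a small union-find structure; the function name and data layout are assumptions. The links and the 1-based word positions come from the slide, with punctuation left unnumbered.

```python
TOKENS = "The dog Bill went near the cat Jack It bit it".split()
# (antecedent, mention) pairs, 1-based, as listed on the slide.
LINKS = [(2, 9), (7, 11), (2, 3), (7, 8)]


def coreference_clusters(links):
    """Merge linked positions into sorted clusters using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for antecedent, mention in links:
        root_a, root_m = find(antecedent), find(mention)
        if root_a != root_m:
            # Keep the earliest position as the cluster representative.
            parent[max(root_a, root_m)] = min(root_a, root_m)

    clusters = {}
    for position in list(parent):
        clusters.setdefault(find(position), []).append(position)
    return sorted(sorted(members) for members in clusters.values())


clusters = coreference_clusters(LINKS)
cluster_words = [[TOKENS[p - 1] for p in members] for members in clusters]
print(clusters)
print(cluster_words)
```

Each resulting cluster collects every expression that refers to one entity: the dog in one cluster and the cat in the other.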

  40. Sequence labeling task is a classification task • Task: Classification • POS: word -> POS category {NN, VBD, ...} • NER: word -> name category {person, place} • Sense: word -> sense-id {001 ... N} • Chunking: word -> {B, I, E} • Bracketing: sentence -> {has_tree, no_tree}

  41. Learning Algorithm • Knowledge-based: rules, decision trees, decision lists • Statistical: graphical models (HMM), neural networks, support vector machines (SVM)
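Of the statistical learners listed above, the HMM is the classic choice for POS tagging. The sketch below shows Viterbi decoding over a hand-written toy HMM. All probabilities are invented for illustration rather than learnt from data, and the small floor value for unseen events is an assumption standing in for proper smoothing.

```python
import math

TAGS = ["DT", "NN", "VBD"]
# Invented toy parameters (not estimated from any corpus).
START_P = {"DT": 0.8, "NN": 0.1, "VBD": 0.1}
TRANS_P = {
    "DT": {"DT": 0.05, "NN": 0.9, "VBD": 0.05},
    "NN": {"DT": 0.1, "NN": 0.2, "VBD": 0.7},
    "VBD": {"DT": 0.6, "NN": 0.3, "VBD": 0.1},
}
EMIT_P = {
    "DT": {"the": 1.0},
    "NN": {"dog": 0.5, "cat": 0.4, "bit": 0.1},
    "VBD": {"bit": 0.6, "went": 0.4},
}


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under the HMM."""
    def logp(p):
        return math.log(p if p > 0 else floor)

    # Each column maps tag -> (best log-probability, previous tag).
    columns = [{
        t: (logp(start_p.get(t, 0)) + logp(emit_p[t].get(words[0], 0)), None)
        for t in tags
    }]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max(
                (columns[-1][p][0] + logp(trans_p[p].get(t, 0)), p) for p in tags
            )
            column[t] = (score + logp(emit_p[t].get(word, 0)), prev)
        columns.append(column)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: columns[-1][t][0])]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    return list(reversed(path))


best_path = viterbi("the dog bit the cat".split(), TAGS, START_P, TRANS_P, EMIT_P)
print(best_path)
```

With these toy numbers, "bit" is tagged VBD even though the model also allows NN, because a verb is the more probable continuation after a noun.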

  42. Applications • Machine Translation: different strategies • Systran: www.Systransoft.com • Google: Translate.google.com • Question Answering • MIT Q&A system (START): http://start.csail.mit.edu/ • Summarization • Information Extraction • Spell Checking • Microsoft Spell Checker • Call centre • MT for SMS

  43. NLP Laboratory • The first aim is to establish a virtual center for NLP-related research • Defining practical applications, especially for Persian • POS tagger, spell checker, n-gram model, machine translation, NER, document classification, search engine, summarization • Defining several research projects • Sharing different resources and experiences • Laying the foundation for an NLP suite • Like TINA: MIT NLP-SUITE • Contact me with any request in the NLP domain (hfaili@ece.ut.ac.ir)

  44. ?
