
Natural Language Processing (NLP)

Natural Language Processing (NLP). Overview and history of the field; knowledge of language; the role of ambiguity; models and algorithms; Eliza, Turing, and conversational agents; history of speech and language processing; regular expressions. Computer Speech and Language Processing.


Presentation Transcript


  1. Natural Language Processing (NLP) • Overview and history of the field • Knowledge of language • The role of ambiguity • Models and Algorithms • Eliza, Turing, and conversational agents • History of speech and language processing • Regular expressions

  2. Computer Speech and Language Processing • What is it? • Getting computers to perform useful tasks involving human languages whether for: • Enabling human-machine communication • Improving human-human communication • Doing stuff with language objects • Examples: • Question Answering • Machine Translation • Spoken Conversational Agents

  3. Knowledge needed to build a speaking computer • Speech recognition and synthesis • Dictionaries (how words are pronounced) • Phonetics (how to recognize/produce each sound of English) • Natural language understanding • Knowledge of the English words involved • What they mean • How they combine (what is a ‘pod bay door’?) • Knowledge of syntactic structure • Scrambled word order is not English: “I’m I do, Sorry that afraid Dave I’m can’t”

  4. Eliza: Weizenbaum (1966) • User: You are like my father in some ways. • ELIZA: What resemblance do you see • User: You are not very aggressive but I think you don’t want me to notice that. • ELIZA: What makes you think I am not aggressive • User: You don’t argue with me • ELIZA: Why do you think I don’t argue with you • User: You are afraid of me • ELIZA: Does it please you to believe I am afraid of you
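
The exchange above can be imitated with a handful of pattern-and-template rules. The following is only a minimal sketch, assuming Python's `re` module; the rule list, the `REFLECTIONS` table, and the function names `reflect` and `respond` are hypothetical illustrations and are not Weizenbaum's original script, so some replies differ from the transcript.

```python
import re

# Hypothetical first/second-person swaps applied to the captured text.
REFLECTIONS = {"me": "you", "my": "your", "i": "you", "you": "me", "your": "my"}

# Hypothetical (pattern, response template) rules, tried in order.
RULES = [
    (re.compile(r"\byou are like (.+)", re.I), "What resemblance do you see"),
    (re.compile(r"\byou are (.+)", re.I), "What makes you think I am {0}"),
    (re.compile(r"\byou don['’]t (.+)", re.I), "Why do you think I don't {0}"),
]


def reflect(fragment):
    """Drop final punctuation and swap pronouns word by word."""
    words = fragment.rstrip(".!?").split()
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in words)


def respond(utterance):
    """Return the template of the first rule whose pattern matches."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on"  # fallback when no rule applies


print(respond("You don't argue with me"))
print(respond("You are like my father in some ways."))
```

The sketch has no model of meaning at all: it only matches surface strings and echoes part of the input back, which is the point of the ELIZA example.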

  5. Ambiguity • Computational linguists are obsessed with ambiguity • Ambiguity is a fundamental problem of computational linguistics • Resolving ambiguity is a crucial goal

  6. Ambiguity • Find at least 5 meanings of this sentence: • I made her duck

  7. Ambiguity • Find at least 5 meanings of this sentence: • I made her duck • I cooked waterfowl for her benefit (to eat) • I cooked waterfowl belonging to her • I created the (plaster?) duck she owns • I caused her to quickly lower her head or body • I waved my magic wand and turned her into undifferentiated waterfowl

  8. Ambiguity is Pervasive • I caused her to quickly lower her head or body • Lexical category: “duck” can be a N or V • I cooked waterfowl belonging to her. • Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun • I made the (plaster) duck statue she owns • Lexical Semantics: “make” can mean “create” or “cook”

  9. Ambiguity is Pervasive • Grammar: Make can be: • Transitive: (verb has a noun direct object) • I cooked [waterfowl belonging to her] • Ditransitive: (verb has 2 noun objects) • I made [her] (into) [undifferentiated waterfowl] • Action-transitive (verb has a direct object and another verb) • I caused [her] [to move her body]

  10. Ambiguity is Pervasive • Phonetics! • I mate or duck • I’m eight or duck • Eye maid; her duck • Aye mate, her duck • I maid her duck • I’m aid her duck • I mate her duck • I’m ate her duck • I’m ate or duck • I mate or duck

  11. Models and Algorithms • Models: formalisms used to capture the various kinds of linguistic structure. • State machines (FSAs, transducers, Markov models) • Formal rule systems (context-free grammars, feature systems) • Logic (predicate calculus, inference) • Probabilistic versions of all of these + others (Gaussian mixture models, probabilistic relational models, etc.) • Algorithms: used to manipulate representations to create structure. • Search (A*, dynamic programming) • Supervised learning, etc.

  12. Language, Thought, Understanding • A Gedanken Experiment: Turing Test • Question “can a machine think” is not operational. • Operational version: • 2 people and a computer • Interrogator talks to contestant and computer via teletype • Task of machine is to convince interrogator it is human • Task of contestant is to convince interrogator she and not machine is human.

  13. History: foundational insights 1940s-1950s • Automaton: • Turing 1936 • McCulloch-Pitts neuron (1943) • http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/ • Kleene (1951/1956) • Shannon (1948) link between automata and Markov models • Chomsky (1956)/Backus (1959)/Naur(1960): CFG • Probabilistic/Information-theoretic models • Shannon (1948) • Bell Labs speech recognition (1952)

  14. History: the two camps: 1957-1970 • Symbolic • Zellig Harris 1958 TDAP first parser • Cascade of finite-state transducers • Chomsky • AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester) • Newell and Simon: Logic Theorist, General Problem Solver • Statistical • Bledsoe and Browning (1959): Bayesian OCR • Mosteller and Wallace (1964): Bayesian authorship attribution • Denes (1959): ASR combining grammar and acoustic probability

  15. Four paradigms: 1970-1983 • Stochastic • Hidden Markov Model 1972 • Independent application of Baker (CMU) and Jelinek/Bahl/Mercer lab (IBM) following work of Baum and colleagues at IDA • Logic-based • Colmerauer (1970, 1975) Q-systems • Definite Clause Grammars (Pereira and Warren 1980) • Kay (1979) functional grammar, Bresnan and Kaplan (1982) unification • Natural language understanding • Winograd (1972) SHRDLU • Schank and Abelson (1977) scripts, story understanding • Influence of case-role work of Fillmore (1968) via Simmons (1973), Schank. • Discourse Modeling • Grosz and colleagues: discourse structure and focus • Perrault and Allen (1980) BDI model

  16. Finite-State Approach: 1983-1993 • Finite-State Models • Kaplan and Kay (1981): Phonology/Morphology • Church (1980): Syntax • Return of Probabilistic Models: • Corpora created for language tasks • Early statistical versions of NLP applications (parsing, tagging, machine translation) • Increased focus on methodological rigor: • Can’t test your hypothesis on the data you used to build it! • Training sets and test sets
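
The methodological point about keeping training and test data apart can be sketched in a few lines. This is only an illustration using Python's standard library; the function name `train_test_split`, the 10% hold-out fraction, and the toy `sentences` list are assumptions made for the example, not anything prescribed by the slides.

```python
import random


def train_test_split(examples, test_fraction=0.1, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy corpus standing in for real annotated data.
sentences = [f"sentence {i}" for i in range(100)]
train, test = train_test_split(sentences)

# The model is built from `train` only; `test` is touched once, for evaluation.
print(len(train), len(test))
```

Holding the test portion out before any model building is what makes the reported score an estimate of performance on unseen text.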

  17. The field comes together: 1994-2007 • NLP has borrowed statistical modeling from speech recognition; it is now standard: • ACL conference: • 1990: 39 articles, 1 statistical • 2003: 62 articles, 48 statistical • Machine learning techniques key • NLP has borrowed focus on web and search and “bag of words” models from information retrieval • Unified field: • NLP, MT, ASR, TTS, Dialog, IR

  18. Regular expressions • A formal language for specifying text strings • How can we search for any of these? • woodchuck • woodchucks • Woodchuck • Woodchucks

  19. Regular Expressions • Basic regular expression patterns • Perl-based syntax (slightly different from other notations for regular expressions) • Disjunctions /[wW]oodchuck/
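
As a rough illustration of the bracket disjunction, the pattern can be tried in Python, whose `re` module uses the same bracket syntax as the Perl-style notation on the slide. The sample `text` is made up for this sketch.

```python
import re

# Hypothetical sample text containing all four spellings from the previous slide.
text = "A woodchuck met two Woodchucks near the woodchucks' den. Woodchuck!"

# [wW] matches either a lowercase or an uppercase first letter.
either_case = re.findall(r"[wW]oodchuck", text)

# Without the brackets only the lowercase spelling is found.
lower_only = re.findall(r"woodchuck", text)

print(either_case)
print(lower_only)
```

The plural forms are found as well, because a regular expression search matches substrings: "woodchucks" contains "woodchuck".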

  20. Regular Expressions • Ranges [A-Z] • Negations [^Ss]
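
A small sketch of ranges and negated classes, again assuming Python's `re` module; the sample strings are invented for the example.

```python
import re

text = "Drenched Blossoms, Chapter 1"

# A range inside brackets matches any single character between its endpoints.
capitals = re.findall(r"[A-Z]", text)
digits = re.findall(r"[0-9]", text)

# A caret as the FIRST character inside brackets negates the class:
# [^Ss] matches any single character that is neither "S" nor "s".
first_non_s = re.search(r"[^Ss]", "Sssnake").group()

print(capitals, digits, first_non_s)
```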

  21. Regular Expressions • Optional characters: ?, * and + • ? (0 or 1) • /colou?r/ → color or colour • * (0 or more) • /oo*h!/ → oh! or ooh! or ooooh! • + (1 or more) • /o+h!/ → oh! or ooh! or ooooh! • Wild card . • /beg.n/ → begin or began or begun
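
The counters and the wild card can be checked directly. The following sketch assumes Python's `re` module; the test strings (including the deliberately wrong forms "colouur", "h!" and "begn") are made up to show what each pattern rejects.

```python
import re

# ? makes the preceding character optional (0 or 1 occurrences).
colours = re.findall(r"colou?r", "color colour colouur")

# * allows 0 or more of the preceding character, + requires at least 1;
# /oo*h!/ and /o+h!/ therefore describe the same strings.
star = re.findall(r"oo*h!", "oh! ooh! ooooh! h!")
plus = re.findall(r"o+h!", "oh! ooh! ooooh! h!")

# . matches any single character (except a newline by default).
wild = re.findall(r"beg.n", "begin began begun beg3n begn")

print(colours, star, plus, wild)
```

Note that the wild card also accepts "beg3n": it matches any character, not only vowels.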

  22. Regular Expressions • Anchors ^ and $ • /^[A-Z]/ → “Ramallah, Palestine” • /^[^A-Z]/ → “¿verdad?” “really?” • /\.$/ → “It is over.” • /.$/ → ? • Boundaries \b and \B • /\bon\b/ → “on my way” (but not “Monday”) • /\Bon\b/ → “automaton” • Disjunction | • /yours|mine/ → “it is either yours or mine”
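
The anchor and boundary examples can be verified with a short script. This is a sketch assuming Python's `re` module, where `^`, `$`, `\b` and `\B` behave as in the Perl-style notation; the helper name `hit` is hypothetical.

```python
import re


def hit(pattern, s):
    """Return the matched substring, or None if the pattern does not match."""
    m = re.search(pattern, s)
    return m.group(0) if m else None


# ^ anchors at the start of the string; inside brackets ^ negates instead.
starts_upper = hit(r"^[A-Z]", "Ramallah, Palestine")
starts_other = [hit(r"^[^A-Z]", s) for s in ("¿verdad?", "really?")]

# \.$ is a literal period at the end; .$ is ANY final character.
final_period = hit(r"\.$", "It is over.")
final_any = hit(r".$", "It is over?")

# \b is a word boundary, \B is a position that is not a word boundary.
on_word = [hit(r"\bon\b", s) for s in ("on my way", "Monday")]
on_suffix = [hit(r"\Bon\b", s) for s in ("automaton", "on my way")]

# | is disjunction between whole alternatives.
both = re.findall(r"yours|mine", "it is either yours or mine")

print(starts_upper, starts_other, final_period, final_any)
print(on_word, on_suffix, both)
```

This also answers the question posed on the slide for /.$/: without the backslash, the pattern accepts whatever character ends the line.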

  23. Disjunction, Grouping, Precedence • Column 1 Column 2 Column 3 … • How do we express this? • /Column [0-9]+ */ • /(Column [0-9]+ +)*/ • Precedence (highest to lowest) • Parentheses () • Counters * + ? {} • Sequences and anchors: the ^my end$ • Disjunction |
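
The effect of grouping and of the precedence order can be seen by comparing full-string matches. This is a sketch assuming Python's `re` module; the sample `line` ends in a space because the grouped pattern on the slide requires at least one space after every column label, and the "guppy" strings are an extra illustration that is not on the slide.

```python
import re

line = "Column 1 Column 2 Column 3 "

# Without parentheses the star applies only to the space before it,
# so the pattern describes a single column label.
single = re.fullmatch(r"Column [0-9]+ *", line)

# Parentheses make the star repeat the whole "Column N " unit.
repeated = re.fullmatch(r"(Column [0-9]+ +)*", line)

# Counters bind tighter than sequences: /the*/ is "th" plus any number of "e".
counter_on_e = re.fullmatch(r"the*", "theee")
counter_not_word = re.fullmatch(r"the*", "thethe")
grouped_word = re.fullmatch(r"(the)*", "thethe")

# Disjunction binds loosest: /guppy|ies/ means "guppy" or "ies".
loose = re.fullmatch(r"guppy|ies", "guppies")
tight = re.fullmatch(r"gupp(y|ies)", "guppies")

print(bool(single), bool(repeated))
print(bool(counter_on_e), bool(counter_not_word), bool(grouped_word))
print(bool(loose), bool(tight))
```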

  24. Example • Find me all instances of the word “the” in a text. • /the/ • Misses capitalized examples • /[tT]he/ • Returns other or theology • /\b[tT]he\b/ • /[^a-zA-Z][tT]he[^a-zA-Z]/ • /(^|[^a-zA-Z])[tT]he[^a-zA-Z]/
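
The successive refinements can be compared by counting matches on one sample sentence. This is a sketch assuming Python's `re` module; the sentence in `text` and the helper `count` are made up for the illustration. The sentence contains the word "the" twice (once capitalized) plus "other" and "theology".

```python
import re

text = "The other theology the end"


def count(pattern):
    """Number of non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))


patterns = [
    r"the",                            # misses "The", hits "other", "theology"
    r"[tT]he",                         # finds "The" too, still hits substrings
    r"\b[tT]he\b",                     # whole words only
    r"[^a-zA-Z][tT]he[^a-zA-Z]",       # needs a non-letter on BOTH sides
    r"(^|[^a-zA-Z])[tT]he[^a-zA-Z]",   # also allows the start of the line
]
counts = [count(p) for p in patterns]
print(counts)
```

The fourth pattern misses the sentence-initial "The" because nothing precedes it, which is what the `(^|...)` alternative in the last pattern repairs.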

  25. Errors • The process we just went through was based on fixing two kinds of errors • Matching strings that we should not have matched (there, then, other) • False positives • Not matching things that we should have matched (The) • False negatives
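
The two error types can be counted mechanically once the correct answers are known. The sketch below assumes Python's `re` module; the word list, the `gold` set of correct answers, and the function name `errors` are hypothetical choices for this example.

```python
import re

# Candidate words, and the subset that really are the word "the".
words = ["The", "the", "there", "then", "other"]
gold = {"The", "the"}


def errors(pattern):
    """Return (false positives, false negatives) for a pattern."""
    matched = {w for w in words if re.search(pattern, w)}
    false_positives = sorted(matched - gold)   # matched but should not have
    false_negatives = sorted(gold - matched)   # should have matched but did not
    return false_positives, false_negatives


naive = errors(r"the")
any_case = errors(r"[tT]he")
whole_word = errors(r"\b[tT]he\b")
print(naive, any_case, whole_word)
```

Adding `[tT]` removes the false negative, and adding the word boundaries removes the false positives, mirroring the two kinds of fixes described on the slide.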

  26. More complex RE example • Regular expressions for prices • /$[0-9]+/ • Doesn’t deal with fractions of dollars • /$[0-9]+\.[0-9][0-9]/ • Doesn’t allow $199, not word-aligned • /\b$[0-9]+(\.[0-9][0-9])?\b/
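
A working variant of the price pattern can be sketched in Python, with two adjustments that are assumptions of this sketch rather than part of the slide. First, in Python's `re` module an unescaped `$` is the end-of-line anchor, so the literal dollar sign is written `\$`. Second, the leading `\b` is dropped, because `$` is not a word character and a boundary in front of it would require a letter or digit immediately before the dollar sign.

```python
import re

text = "It costs $199 or $24.99 today"

# Escaped dollar sign, digits, optional two-digit cents, then a word boundary.
price = re.compile(r"\$[0-9]+(\.[0-9][0-9])?\b")
prices = [m.group(0) for m in price.finditer(text)]

# An unescaped $ is an anchor in Python, so this finds nothing.
unescaped = re.findall(r"$[0-9]+", text)

# A \b before \$ fails here: a space and "$" are both non-word characters.
leading_boundary = re.search(r"\b\$[0-9]+", text)

print(prices, unescaped, leading_boundary)
```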

  27. Advanced operators
