Natural Language and Speech Processing

starbuck

Presentation Transcript


  1. Natural Language and Speech Processing
     • Creation of computational models of the understanding and generation of natural language.
     • Different fields come together, looking at speech and language processing from different perspectives:
       • Computational Linguistics (Linguistics)
       • Natural Language Processing (Computer Science)
       • Speech Recognition (Electrical Engineering)
       • Computational Psycholinguistics (Psychology)

  2. Different Levels of Speech and Language Processing
     • Phonetics and Phonology – The study of sounds in language
     • Morphology – The study of the components of words
     • Syntax – The study of structural relationships between words
     • Semantics – The study of meaning
     • Pragmatics – The study of the use of language for accomplishing goals
     • Discourse – The study of large linguistic units

  3. Ambiguity in Language
     Ambiguity is introduced at almost every level, and one of the main tasks in NLP is to resolve such ambiguities.
     "I made her duck" can mean:
     • I cooked waterfowl for her.
     • I cooked waterfowl belonging to her.
     • I created the (plastic?) duck she owns.
     • I caused her to quickly lower her body.
     • I waved my magic wand and turned her into a waterfowl.
     "Time flies like an arrow" vs. "Fruit flies like a banana"

  4. Models and Algorithms for NLP
     • Taken mainly from Computer Science, Mathematics and Linguistics
     • State Machines and Automata: Finite-state automata and transducers, weighted automata, Markov models…
     • Formal Rule Systems: Regular grammars, CFGs, Unification Grammars…
     • Logic: First-order Calculus, Predicate Logic…
     • Probability Theory: Statistical Processing, Machine Learning…
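As an illustration of the "State Machines and Automata" item, here is a minimal sketch of a deterministic finite-state automaton in Python. The language it accepts (strings of the form `baa!`, `baaa!`, and so on) and all names (`TRANSITIONS`, `START`, `ACCEPT`, `accepts`) are assumptions chosen for illustration; they do not come from the slides.

```python
# Transition table for a small DFA: (current state, input symbol) -> next state.
# It accepts "b", then at least two "a"s, then "!" (e.g. "baa!", "baaaa!").
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of further "a"s
    (3, "!"): 4,
}
START = 0
ACCEPT = {4}


def accepts(string):
    """Return True if the automaton ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: reject immediately
            return False
    return state in ACCEPT


if __name__ == "__main__":
    for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
        print(candidate, accepts(candidate))
```

Keeping the machine as a plain lookup table makes the automaton itself data: a transducer or a weighted automaton could, under the same assumptions, be sketched by storing an output symbol or a weight alongside each next state.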

  5. The Turing Test
     • Alan Turing (1950): Empirical test for Artificial Intelligence. A human interrogator puts questions to a human and to a machine through a teletype and tries to work out which is the human and which is the machine.
     Q: Please write me a sonnet on the topic of the Forth Bridge.
     A: Count me out on this one. I never could write poetry.
     Q: Add 34957 to 70764.
     A: (Pause for 30 seconds) 105621.

  6. ELIZA
     • Weizenbaum (1966): Program imitating the responses of a psychotherapist.
     User: You are like my father in some ways.
     ELIZA: What resemblance do you see?
     User: You are not very aggressive but I think you don’t want me to notice that.
     ELIZA: What makes you think I am not very aggressive?
     User: You don’t argue with me.
     ELIZA: Why do you think I don’t argue with you?
     • Used simple pattern matching, without any deeper knowledge of the world or of the conversation.
     • http://www-ai.ijs.si/cgi-bin/eliza/eliza_script
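The "simple pattern matching" idea can be sketched in a few lines of Python. This is not Weizenbaum's original script: the three rules, the pronoun table, the fallback reply, and the names `RULES`, `PRONOUNS`, `reflect`, and `respond` are all hypothetical, written only to reproduce the exchange shown on the slide.

```python
import re

# Hypothetical rules (pattern, response template), tried in order.
RULES = [
    (re.compile(r"you are like (.+)", re.I), "What resemblance do you see?"),
    # Capture what follows "you are", stopping before a trailing "but ..." clause.
    (re.compile(r"you are (.+?)(?: but .*)?$", re.I), "What makes you think I am {0}?"),
    (re.compile(r"you don['’]t (.+)", re.I), "Why do you think I don't {0}?"),
]

# Swap first and second person inside the captured fragment.
PRONOUNS = {"me": "you", "my": "your", "i": "you", "you": "me", "your": "my"}


def reflect(fragment):
    """Lower-case the fragment and swap pronouns word by word."""
    return " ".join(PRONOUNS.get(word, word) for word in fragment.lower().split())


def respond(utterance):
    """Return the response of the first matching rule, or a stock reply."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."


if __name__ == "__main__":
    print(respond("You don't argue with me."))
```

The sketch has no model of the conversation at all: each reply depends only on the surface form of the last utterance, which is the limitation the slide points out.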

  7. Foundational Insights: 1940s and 1950s
     • Automata
       • Based on Turing’s computational model.
       • Led to formal language theory (Chomsky).
     • Probabilistic and Information-Theoretic Models
       • Transmission of language and communication treated as a noisy channel and decoding problem.
     • First machine speech recognizers (1952).
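The noisy-channel view is usually written as a decoding problem. A sketch in standard notation, where the symbols $W$ (a candidate source word sequence) and $O$ (the observed, noisy signal) are names introduced here for illustration and do not appear in the slides:

```latex
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid O)
        = \operatorname*{arg\,max}_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \operatorname*{arg\,max}_{W} P(O \mid W)\, P(W)
```

The second step is Bayes' rule; the denominator $P(O)$ is dropped because it is the same for every candidate $W$. Decoding therefore combines a channel model $P(O \mid W)$ with a source (language) model $P(W)$.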

  8. Two Camps: 1957-1970
     • Symbolic vs. Stochastic Paradigm.
     • Symbolic
       • Formal language theory, generative syntax (Chomsky)
       • Implementation of first parsers
       • Artificial Intelligence
     • Stochastic
       • Bayesian Methods
       • Optical Character Recognition
       • Authorship Identification

  9. Four Paradigms: 1970-1983
     • Stochastic Paradigm
       • Speech Recognition Algorithms (Hidden Markov Models)
     • Logic-Based Paradigm
       • Work that led to Prolog, Functional Grammars and Unification
     • Natural Language Understanding
       • SHRDLU
       • Question-answering Systems
     • Discourse Modeling
       • Automatic Reference Resolution

  10. Empiricism and Finite-State Models: 1983-1993
      • Return of Empiricism and Finite-State Methods.
        • Not so popular in the previous decades.
      • Finite-state models:
        • Phonology and morphology
        • Syntax
      • Probabilistic models:
        • Speech recognition
        • Part-of-speech tagging
        • Probabilistic parsing

  11. The Field Comes Together: 1994-
      • Spread of probabilistic and data-driven methods to all kinds of problems.
      • Increase in computer speed led to commercial exploitation of speech and language technologies.
      • The web led to an emphasis on information retrieval and extraction.
      • Somewhat less emphasis on theoretical work.

  12. Practical Application Areas
      • Information-accessing Systems
        • Database queries
        • Information Retrieval
        • Information Extraction
      • Task-oriented Systems
        • Text editors
        • Robots
      • Educational Systems
        • Intelligent Tutoring
        • Student Modelling
      • Translation Systems
        • Machine Translation
        • Computer-aided translation

  13. Practical Application Areas
      System Modality
      • Text
      • Speech
      • Multi-modal applications
      System Initiatives
      • Analysis
      • Generation

  14. Theoretical Applications
      • Theory-specification tools
        • Transformational Grammar, ATNs, LFG, GPSG, HPSG, Systemic Grammar, Functional Unification Grammar…
      • Theoretical modeling
        • Processing models: Parsing, Semantics, Speech Recognition.
        • Acquisition models: Language Learning Models

  15. Current Research
      http://cslu.cse.ogi.edu/HLTsurvey/HLTsurvey.html
      • Spoken Language Input
      • Written Language Input
      • Language Analysis and Understanding
      • Language Generation
      • Spoken Output Technologies
      • Discourse and Dialogue
      • Document Processing
      • Multilinguality
      • Multimodality
      • Transmission and Storage
      • Mathematical Methods
      • Language Resources
      • Evaluation

  16. Course Topics
      • Computational Morphology
      • Regular Grammars, Finite-state Automata and Transducers
      • Corpus Linguistics
      • N-Grams, Part-of-speech Tagging
      • Parsing and Context-free Grammars
      • Unification Grammars
      • Lexical Semantics and WordNet
      • Word Sense Disambiguation and Information Retrieval
      • Machine Translation
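To give a flavour of the "N-Grams" topic, here is a minimal sketch of a bigram model with unsmoothed maximum-likelihood estimates, P(word | prev) = count(prev, word) / count(prev). The function name `bigram_model`, the sentence-boundary markers `<s>` and `</s>`, and the whitespace tokenization are assumptions made for illustration; the toy corpus reuses the two example sentences from the ambiguity slide.

```python
from collections import Counter


def bigram_model(sentences):
    """Build an unsmoothed bigram probability function from raw sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs

    def prob(prev, word):
        # Unseen contexts get probability 0.0 in this sketch (no smoothing).
        if contexts[prev] == 0:
            return 0.0
        return bigrams[(prev, word)] / contexts[prev]

    return prob


if __name__ == "__main__":
    prob = bigram_model(["Time flies like an arrow", "Fruit flies like a banana"])
    print(prob("flies", "like"), prob("like", "an"))
```

A practical model would add smoothing so that unseen bigrams do not receive zero probability; that step is omitted here to keep the estimate easy to check by hand.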
