
Introduction to computational linguistics



Presentation Transcript


  1. Introduction to computational linguistics Jay Munson (special thanks to Misty Azara) May 30, 2003

  2. Today’s Goals • I. Introduction to computational linguistics (CL) through the discussion of 7 CL core areas. • II. Identify Common CL applications • III. Identify the importance of theoretical linguistics in CL

  3. What is computational linguistics? • Essentially, CL is any task, model, algorithm, etc. that attempts to place any type of language processing (syntax, phonology, morphology, etc.) in a computational setting

  4. What is computational linguistics (CL)? • CL is interdisciplinary • Linguistics • Computer Science • Mathematics • Electrical Engineering • Speech and Hearing Science

  5. Seven Core Areas of CL • 1. Machine Translation • 2. Speech Recognition • 3. Text-to-Speech • 4. Natural Language Generation • 5. Human-Computer Dialogs • 6. Information Retrieval • 7. Computational Modeling

  6. 1.0 Machine Translation (MT) Using computers to automate some or all of translating from one language to another

  7. 1.1 MT (cont.) • Three general models or tasks: • Tasks for which a rough translation is adequate • Tasks where a human post-editor can be used to improve the output • Tasks limited to a small sublanguage

  8. 1.2 MT (cont.) • Linguistic knowledge is extremely useful in this area of CL • MT benefits from knowledge of language typology and language-specific linguistic information • Programs are typically “trained” using pre-translated documents/texts.
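
A minimal sketch of the "trained on pre-translated texts" idea, using an invented four-sentence English–Spanish corpus: simple co-occurrence counts already recover crude word translations, though the articles stay ambiguous. Real MT systems use far larger corpora and proper alignment models.

```python
from collections import Counter, defaultdict

# Tiny, invented sentence-aligned corpus (English -> Spanish).
parallel = [
    ("the house", "la casa"),
    ("a house", "una casa"),
    ("the book", "el libro"),
    ("a book", "un libro"),
]

# Count how often each English word co-occurs with each Spanish word.
cooc = defaultdict(Counter)
for en, es in parallel:
    for e in en.split():
        for s in es.split():
            cooc[e][s] += 1

# Take the most frequent co-occurring Spanish word as a crude translation.
# "house" and "book" come out right; the articles ("the", "a") stay
# ambiguous because the counts are too sparse to resolve them.
for e, counts in cooc.items():
    best, _ = counts.most_common(1)[0]
    print(f"{e} -> {best}")
```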

  9. 1.3 MT Example • KANT: Knowledge-based Machine Translation • The KANT project, Knowledge-based, Accurate Translation for technical documentation, was founded in 1989 for the research and development of large-scale, practical translation systems for technical documentation. KANT uses a controlled vocabulary and grammar for each source language, and explicit yet focused semantic models for each technical domain to achieve very high accuracy in translation. Designed for multilingual document production, KANT has been applied to the domains of electric power utility management and heavy equipment technical documentation. • http://www.lti.cs.cmu.edu/Research/cmt-projects.html

  10. 2.0 Speech Recognition (SR) Taking spoken language as input and outputting the corresponding text

  11. 2.1 SR - Architecture • SR takes the source speech and produces “guesses” as to which words could correspond to the source via some type of acoustic model • The word with the highest probability is selected as the optimal candidate • The recognition context is constrained to improve accuracy
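
A minimal sketch of the candidate-selection step described above, with invented acoustic scores and an invented bigram prior: the candidate word maximizing the combined score is chosen.

```python
import math

# Hypothetical acoustic-model scores: P(audio | word) for candidate words.
acoustic = {"wreck": 0.40, "recognize": 0.35, "beach": 0.25}

# Toy language-model prior: P(word | previous word). Invented numbers.
language_model = {
    ("to", "wreck"): 0.02,
    ("to", "recognize"): 0.30,
    ("to", "beach"): 0.01,
}

def best_word(prev_word, candidates):
    """Pick the candidate maximizing log P(audio|w) + log P(w|prev)."""
    def score(w):
        return math.log(acoustic[w]) + math.log(language_model[(prev_word, w)])
    return max(candidates, key=score)

print(best_word("to", ["wreck", "recognize", "beach"]))  # -> "recognize"
```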

  12. 2.2 Why use SR? • Allow for hands-free human-computer interaction • Assists in automated telephony

  13. 3.0 Text-to-Speech (TTS) Taking text as input and outputting the corresponding spoken language

  14. 3.1 Three types of TTS • 1. Articulatory- models the physiological characteristics of the vocal tract • 2. Concatenative- uses pre-recorded segments to construct the utterance(s) • ScanSoft: Jennifer and Susana • http://www.scansoft.com/realspeak/demo/ • Speechify: British Female • http://www.speechworks.com/demos/speechify.cfm

  15. 3.2 Three types of TTS (cont.) • 3. Parametric/Formant- models the formant transitions of speech • ETI-Eloquence: Reed • http://www.speechworks.com/demos/eti.cfm
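
A minimal sketch of the concatenative approach (type 2 above): stored units are looked up and glued together in order. The "recordings" here are synthetic sine tones standing in for pre-recorded waveform segments, and the unit names are made up.

```python
import numpy as np

SAMPLE_RATE = 16000

def tone(freq_hz, dur_s=0.15):
    """Stand-in for a pre-recorded unit: a short sine tone."""
    t = np.arange(int(SAMPLE_RATE * dur_s)) / SAMPLE_RATE
    return 0.3 * np.sin(2 * np.pi * freq_hz * t)

# Hypothetical unit inventory: one stored "recording" per unit name.
unit_inventory = {"h-e": tone(220), "e-l": tone(330), "l-oU": tone(440)}

def synthesize(unit_sequence):
    """Concatenate stored units in order to build the output waveform."""
    return np.concatenate([unit_inventory[u] for u in unit_sequence])

waveform = synthesize(["h-e", "e-l", "l-oU"])   # a toy rendering of "hello"
print(waveform.shape)  # total samples = 3 units x 0.15 s x 16 kHz
```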

  16. 3.3 Why is TTS so difficult? • Spelling • through, rough, though, thought • Homographs • PERmit (n) vs. perMIT (v) • Prosody (dependent on context) • Pitch, duration of segments, phrasing of segments, intonational tune, emotion: “I am so angry at you. I have never been more enraged in my life!!”
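
A minimal sketch of the homograph problem above: spelling alone cannot choose between PERmit and perMIT, so a hypothetical pronunciation lexicon is keyed on spelling plus part of speech. The entries and stress notation are invented for illustration.

```python
# Hypothetical pronunciation lexicon keyed on (spelling, part of speech).
# An apostrophe marks the stressed syllable.
lexicon = {
    ("permit", "NOUN"): "'per-mit",
    ("permit", "VERB"): "per-'mit",
    ("record", "NOUN"): "'re-cord",
    ("record", "VERB"): "re-'cord",
}

def pronounce(word, pos):
    """Look up a pronunciation; fall back to the spelling if unknown."""
    return lexicon.get((word, pos), word)

# The spelling is identical; the POS tag decides the stress pattern.
print(pronounce("permit", "NOUN"))  # -> 'per-mit
print(pronounce("permit", "VERB"))  # -> per-'mit
```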

  17. 3.4 Why use TTS? • Allows for text to be read aloud automatically • Extremely useful for the visually impaired • For a review of the history of TTS up to 1987, with sound files, go to: • http://www.ece.ogi.edu/~macon/ECE580/klatt/

  18. 4.0 Natural Language Generation (NLG) Constructing linguistic outputs from non-linguistic inputs; the NLG goal is to produce natural language from internal data/structure.

  19. 4.1 Natural language generation (cont.) • Maps meaning to text • The nature of the input varies greatly from one application to another (e.g., documenting the structure of a computer program) • The job of the NLG system is to extract the information needed to drive the generation process

  20. 4.2 NLG systems have to make choices: • Content selection- the system must choose the appropriate content for input, basing its decision on a pre-specified communicative goal • Lexical selection- the system must choose the lexical item most appropriate for expressing a concept

  21. 4.3 NLG (cont.) • Sentence Structure • Aggregation- the system must apportion the content into phrase-, clause-, and sentence-sized chunks • Referring expressions- the system must determine how to refer to the objects under discussion (not a trivial task).

  22. 4.4 NLG - Structures • Discourse structure- many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure

  23. 4.5 Sample NLG output
  To save a file:
  1. Choose Save from the File menu
  2. Choose the appropriate folder
  3. Type the file name
  4. Click the Save button
  The system will save the document. …
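
A minimal sketch of how a non-linguistic input could drive output like the sample above: an invented plan representation is passed through simple templates (lexical selection and surface realization) and aggregated into numbered steps.

```python
# Invented non-linguistic input: a goal plus a sequence of action records.
plan = {
    "goal": "save a file",
    "actions": [
        {"act": "choose", "object": "Save", "location": "the File menu"},
        {"act": "choose", "object": "the appropriate folder"},
        {"act": "type", "object": "the file name"},
        {"act": "click", "object": "the Save button"},
    ],
    "result": "The system will save the document.",
}

def realize(action):
    """Surface realization: map one action record to a clause."""
    clause = f"{action['act'].capitalize()} {action['object']}"
    if "location" in action:
        clause += f" from {action['location']}"
    return clause

def generate(plan):
    """Aggregation: package the clauses as numbered steps under a heading."""
    lines = [f"To {plan['goal']}:"]
    lines += [f"{i}. {realize(a)}" for i, a in enumerate(plan["actions"], 1)]
    lines.append(plan["result"])
    return "\n".join(lines)

print(generate(plan))  # reproduces the instruction text shown on the slide
```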

  24. 5.0 Human-Computer Dialogs Uses a mix of SR, TTS, and pre-recorded prompts to achieve some goal

  25. 5.1 Human-Computer Dialogs • Uses speech recognition, or a combination of SR and touch tone as input to the system • The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts

  26. 5.2 Human-Computer Dialogs • Dialog systems have specific tasks, which limit the domain of conversation • This makes the SR problem much easier, as the potential responses become very constrained

  27. 5.3 Sample dialog system for banking
  …
  Sys: Would you like information for checking or savings?
  User: Checking, please.
  Sys: Your current balance is $2,568.92. Would you like another transaction?
  User: Yes, has check #2431 cleared?
  …
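
A minimal sketch of why a constrained domain helps (slides 26–27): each dialog state expects only a handful of responses, so the recognized utterance only has to be matched against a tiny set. The states, prompts, and keywords below are invented.

```python
# Each state lists its prompt and the only responses it expects.
states = {
    "account": {
        "prompt": "Would you like information for checking or savings?",
        "expects": {"checking": "balance", "savings": "balance"},
    },
    "balance": {
        "prompt": "Your current balance is $2,568.92. Another transaction?",
        "expects": {"yes": "account", "no": "goodbye"},
    },
}

def next_state(state, recognized_text):
    """Match the recognized words against the small expected set."""
    for keyword, target in states[state]["expects"].items():
        if keyword in recognized_text.lower():
            return target
    return state  # nothing matched: stay in the same state and reprompt

print(states["account"]["prompt"])
print(next_state("account", "Checking, please."))  # -> "balance"
```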

  28. 5.4 Linguistic knowledge in dialog systems • Discourse structure- ensuring natural-flowing discourse interaction • Building appropriate vocabularies/lexicons for the tasks • Ensuring prosodic consistency (e.g., questions sound like questions and spliced prompts sound continuous)

  29. 5.5 Why use human-computer systems? • Automate simple tasks- no need for a teller to be on the other end of the line! • Allow access to system information from anywhere, via the telephone

  30. 6.0 Information Retrieval Storage, analysis, and retrieval of text documents

  31. 6.1 Information Retrieval (IR) • Most current IR systems are based on some interpretation of “compositional semantics” (i.e., the meaning of the whole is based on the meaning of its parts and their combination). • IR is the core of web-based searching, e.g., Google, AltaVista, etc.

  32. 6.2 IR - Architecture • User inputs a word or string of words • System processes the words and retrieves documents corresponding to the request

  33. 6.3 “Bag of Words” • The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only • Thus, “I see what I eat” and “I eat what I see” would mean exactly the same thing to the system!
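
A minimal sketch of the bag-of-words point: both sentences reduce to the same unordered word counts, so they are indistinguishable to the model; cosine similarity between such count vectors is a common retrieval score.

```python
from collections import Counter
import math

def bag_of_words(text):
    """Represent a text as unordered word counts, ignoring syntax."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

s1 = bag_of_words("I see what I eat")
s2 = bag_of_words("I eat what I see")
print(s1 == s2)        # True: word order is discarded entirely
print(cosine(s1, s2))  # ~1.0: the two sentences are identical to the system
```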

  34. 6.4 Linguistic Knowledge in IR • Semantics • Compositional • Lexical • Syntax (depending on the model used)

  35. 7.0 Computational Modeling Computational approaches to problem solving, modeling, and development of theories

  36. 7.1 How can we use computational modeling? • Develop working models of language evolution • Model speech perception, production, and processing • Almost any theoretical model can have a computational counterpart
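
A minimal sketch of what a computational counterpart of a theoretical model can look like: a toy prototype model of vowel perception that classifies an incoming token by its distance to stored formant prototypes. The formant values are rough averages used only for illustration.

```python
import math

# Rough prototype formant values (F1, F2) in Hz for three vowels.
prototypes = {"i": (270, 2290), "a": (730, 1090), "u": (300, 870)}

def perceive(f1, f2):
    """Classify an incoming token as the nearest vowel prototype."""
    return min(prototypes,
               key=lambda v: math.dist((f1, f2), prototypes[v]))

# The model makes an explicit, testable prediction: a token with formants
# near the /a/ prototype should be perceived as /a/.
print(perceive(700, 1150))  # -> "a"
```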

  37. 7.2 Why Use Computational Modeling? • Forces explicitness – no black boxes or behind-the-scenes “magic” • Allows us to test our formal theories against large amounts of data • Allows for enhancements in technology and benefits to society through the implementation of models.

  38. Conclusions • CL applications utilize linguistic knowledge from all of the major subfields of theoretical linguistics (i.e., theory is necessary!) • Computational modeling can aid/test linguists’ theories of language processing and structure

  39. Conclusions - Review of 7 core areas in CL • 1. Machine Translation • 2. Speech Recognition • 3. Text-to-Speech • 4. Natural Language Generation • 5. Human-Computer Dialogs • 6. Information Retrieval • 7. Computational Modeling

  40. Conclusions – Review of Today’s Goals • I. Introduction to computational linguistics (CL) through the discussion of 7 CL core areas. • II. Identify Common CL applications • III. Identify the importance of theoretical linguistics in CL

  41. The end.
