Introduction to Computational Linguistics - PowerPoint PPT Presentation

Presentation Transcript

  1. Introduction to Computational Linguistics Misty Azara

  2. Agenda • Introduction to Computational Linguistics (CL) • Common CL applications • Using CL in theoretical linguistics (computational modeling)

  3. What is Computational Linguistics? • CL is interdisciplinary • Linguistics • Computer Science • Mathematics • Electrical Engineering • Psychology • Speech and Hearing Science

  4. What is Computational Linguistics? • Computational Linguistics covers many areas • Essentially, CL is any task, model, algorithm, etc. that attempts to place any type of language processing (syntax, phonology, morphology, etc.) in a computational setting

  5. Core Areas of CL • Machine Translation • Speech Recognition • Text-to-Speech • Natural Language Generation • Human-Computer Dialogs • Information Retrieval • Computational Modeling …

  6. Machine Translation Using computers to automate some or all of translating from one language to another

  7. Three general models or tasks: • Tasks for which a rough translation is adequate • Tasks where a human post-editor can be used to improve the output • Tasks limited to a small sublanguage

  8. Machine Translation (cont.) • Linguistic knowledge is extremely useful in this area of CL • MT benefits from knowledge of language typology and language-specific linguistic information

  9. Speech Recognition Taking spoken language as input and outputting the corresponding text

  10. Architecture • SR takes the source speech and produces “guesses” as to which words could correspond to the source via some type of acoustic model • The word with the highest probability is selected as the optimal candidate
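
The selection step on this slide can be sketched in a few lines of Python. This is only an illustration: the candidate words and their probabilities are invented, and `pick_best_candidate` is a hypothetical helper, not part of any real speech recognition toolkit.

```python
def pick_best_candidate(candidates):
    """Return the candidate word with the highest probability.

    `candidates` maps each guessed word to a probability that an
    acoustic model is assumed to have produced already.
    """
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=candidates.get)


# Made-up scores for one stretch of speech (illustrative only).
scores = {"through": 0.62, "threw": 0.27, "true": 0.11}
print(pick_best_candidate(scores))
```

A real recognizer scores whole word sequences rather than isolated words, but the final step is the same idea: keep the hypothesis with the highest score.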

  11. Why use SR? • Allow for hands-free human-computer interaction

  12. Text-to-Speech Taking text as input and outputting the corresponding spoken language

  13. Three types of TTS • Articulatory- models the physiological characteristics of the vocal tract • Concatenative- uses pre-recorded segments to construct the utterance(s)

  14. Three types of TTS (cont.) • Parametric/Formant- models the formant transitions of speech [baj]

  15. Why is TTS so difficult? • Spelling • through, rough • Homographs • PERmit (n) vs. perMIT (v) • Prosody • Pitch, duration of segments, phrasing of segments, intonational tune, emotion “I am so angry at you. I have never been more enraged in my life!!”
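
To make the PERmit/perMIT problem concrete, here is a minimal sketch of how a TTS front end might choose a stress pattern once the part of speech is known. The table `STRESS_BY_POS` and the function `pronounce` are hypothetical, and the part-of-speech tag is assumed to come from some earlier tagging step that is not shown.

```python
# Hypothetical lookup table: (spelling, part of speech) -> stress pattern.
STRESS_BY_POS = {
    ("permit", "noun"): "PERmit",
    ("permit", "verb"): "perMIT",
}


def pronounce(word, pos):
    """Return the stress-marked form for a word, given its part of speech.

    Words that are not in the table are returned unchanged.
    """
    return STRESS_BY_POS.get((word.lower(), pos), word)


print(pronounce("permit", "noun"))
print(pronounce("permit", "verb"))
```

The spelling alone is not enough to pick a pronunciation; the system needs information about the surrounding sentence.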

  16. Why use TTS? • Allows for text to be read automatically • Extremely useful for the visually impaired

  17. Natural Language Generation Constructing linguistic outputs from non-linguistic inputs

  18. Natural Language Generation • Maps meaning to text • Nature of the input varies greatly from one application to another (e.g., documenting the structure of a computer program) • The job of the NLG system is to extract the necessary information to drive the generation process

  19. NLG systems have to make choices: • Content selection- the system must choose the appropriate content from the input, basing its decision on a pre-specified communicative goal • Lexical selection- the system must choose the lexical item most appropriate for expressing a concept

  20. Sentence Structure • Aggregation- the system must apportion the content into phrase, clause, and sentence-sized chunks • Referential expression- the system must determine how to refer to the objects under discussion (not a trivial task)

  21. Discourse structure- many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure

  22. Sample NLG output To save a file 1. Choose save from the file menu 2. Choose the appropriate folder 3. Type the file name 4. Click the save button The system will save the document. …
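
Output like the sample on this slide could be produced by a very simple template-based generator. The sketch below is an assumption about how such a system might look, not the method behind the slide: the plan format, the `TEMPLATES` table, and the function names are all invented for illustration.

```python
# Hypothetical templates that map an abstract action to English text.
TEMPLATES = {
    "choose_from": "Choose {item} from the {source}",
    "choose": "Choose {item}",
    "type": "Type {item}",
    "click": "Click {item}",
}


def realize_step(step):
    """Turn one non-linguistic step (a dict) into an English instruction."""
    template = TEMPLATES[step["action"]]
    # str.format ignores keyword arguments the template does not use.
    return template.format(**step)


def generate(goal, plan):
    """Produce a numbered list of instructions for a goal and its plan."""
    lines = [f"To {goal}"]
    for number, step in enumerate(plan, start=1):
        lines.append(f"{number}. {realize_step(step)}")
    return "\n".join(lines)


# A made-up, non-linguistic input describing how to save a file.
plan = [
    {"action": "choose_from", "item": "save", "source": "file menu"},
    {"action": "choose", "item": "the appropriate folder"},
    {"action": "type", "item": "the file name"},
    {"action": "click", "item": "the save button"},
]
print(generate("save a file", plan))
```

Templates sidestep most of the choices listed on the earlier slides (lexical selection, aggregation, referring expressions), which is why they only work for narrow, predictable domains.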

  23. Human-Computer Dialogs Uses a mix of SR, TTS, and pre-recorded prompts to achieve some goal

  24. Human-Computer Dialogs • Uses speech recognition, or a combination of SR and touch tone as input to the system • The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts

  25. Dialog systems have specific tasks, which limit the domain of conversation • This makes the SR problem much easier, as the potential responses become very constrained

  26. Sample dialog system for banking … Sys: would you like information for checking or savings? User: Checking, please. Sys: Your current balance is $2,568.92. Would you like another transaction? User: Yes, has check #2431 cleared? …
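
The benefit of a constrained domain can be sketched as follows. At the "checking or savings" prompt, the system only has to spot one of two keywords in whatever the recognizer returns. The names `ALLOWED_RESPONSES` and `interpret` are hypothetical, and the recognizer output is assumed to be plain text.

```python
# At this point in the dialog only two answers make sense.
ALLOWED_RESPONSES = {"checking", "savings"}


def interpret(utterance, allowed=ALLOWED_RESPONSES):
    """Return the first allowed keyword found in the utterance, or None."""
    for token in utterance.lower().split():
        word = token.strip(".,!?")
        if word in allowed:
            return word
    return None


print(interpret("Checking, please."))
print(interpret("What is the weather like?"))
```

When `interpret` returns `None`, a dialog system would typically repeat the prompt rather than guess.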

  27. Linguistic knowledge in dialog systems • Discourse structure- ensuring natural-flowing discourse interaction • Building appropriate vocabularies/lexicons for the tasks • Ensuring prosodic consistencies (e.g., questions sound like questions and spliced prompts sound continuous)

  28. Why use human-computer systems? • Automate simple tasks- no need for a teller to be on the other end of the line! • Allow access to system information from anywhere, via the telephone

  29. Information Retrieval Storage, analysis, and retrieval of text documents

  30. Information Retrieval • Most current IR systems are based on some interpretation of compositional semantics • IR is the core of web-based searching, e.g. Google, AltaVista, etc.

  31. Information Retrieval Architecture • User inputs a word or string of words • System processes the words and retrieves documents corresponding to the request
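
One common way to implement this input-and-retrieve loop is an inverted index, which maps each word to the documents that contain it. The sketch below assumes that approach; the slide does not say which data structure is used, and the document collection and function names are invented.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# A tiny made-up collection.
documents = {
    1: "machine translation of text",
    2: "speech recognition systems",
    3: "statistical machine translation systems",
}
index = build_index(documents)
print(sorted(search(index, "machine translation")))
```

Real search engines also rank the matching documents; this sketch only finds them.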

  32. “Bag of Words” • The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only • Thus, “I see what I eat” and “I eat what I see” would mean exactly the same thing to the system!
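
The claim on this slide is easy to check with Python's `collections.Counter`, which counts words and discards their order. The helper name `bag_of_words` is ours, and splitting on whitespace is a simplifying assumption.

```python
from collections import Counter


def bag_of_words(sentence):
    """Count each lowercase word in the sentence, ignoring word order."""
    return Counter(sentence.lower().split())


first = bag_of_words("I see what I eat")
second = bag_of_words("I eat what I see")
print(first == second)  # True: both sentences contain the same words
```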

  33. Linguistic Knowledge in IR • Semantics • Compositional • Lexical • Syntax (depending on the model used)

  34. Computational Modeling Computational approaches to problem solving, modeling, and development of theories

  35. How can we use computational modeling? • Test our theories of language change (synchronic or diachronic) • Develop working models of language evolution • Model speech perception, production, and processing • Almost any theoretical model can have a computational counterpart

  36. Why Use Computational Modeling? • Forces explicitness – no black boxes or behind the scenes “magic” • Allows for modeling that would otherwise be impossible • Allows for modeling that would otherwise be unethical

  37. Conclusions • CL applications utilize linguistic knowledge from all of the major subfields of theoretical linguistics • Computational modeling can aid linguists’ theories of language processing and structure