
Speech, Language and Human-Computer Interaction


Presentation Transcript


  1. Cambridge University
  Speech, Language and Human-Computer Interaction
  William Marslen-Wilson, Steve Young, Johanna Moore, Martin Pickering, Mark Steedman

  2. Contents
  • Background and motivation
  • State of the Art
  • Speech recognition and understanding
  • Cognitive neuroscience
  • Computational models of interaction
  • The Grand Challenge
  • Research Themes

  3. Introduction
  Spoken language and human interaction will be an essential feature of truly intelligent systems. Turing, for example, made it the basis of his famous test for answering the question "Can machines think?" (Computing Machinery and Intelligence, Mind, 1950). Spoken language is the natural mode of communication, and truly ubiquitous computing will rely on it.
  The vision: Apple's Knowledge Navigator. The reality: a currently deployed flight enquiry demo ... but we are not quite there yet!

  4. Current situation (diagram): engineering systems collect data from, and the cognitive sciences observe, the human language system.
  • Statistical speech processing: automatic speech recognition, synthesis and language understanding (e.g., via statistical modelling and machine learning)
  • Cognitive sciences: development of neuro-biologically and psycholinguistically plausible accounts of human language processes (comprehension and production)
  • Computational language use: symbolic and statistical models of human language processing (e.g., via parsing, semantics, generation, discourse analysis)

  5. State of the Art: Speech Recognition
  The goal is to convert an acoustic signal Y into words W (acoustics Y -> words "He bought it"). The solution is to use classic pattern classification: choose the word sequence most probable given the acoustics. This just leaves the problem of building these probability models.
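The classic pattern-classification formulation referred to on this slide is conventionally written via Bayes' rule, splitting the problem into the two probability models the slide mentions:

$$\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid Y) = \operatorname*{arg\,max}_{W} \; p(Y \mid W)\, P(W)$$

where $p(Y \mid W)$ is the acoustic model and $P(W)$ the language model; the denominator $p(Y)$ is constant over $W$ and can be dropped from the maximisation.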

  6. General Approach: Hierarchy of Markov Chains
  • Sentences -> Words: N-gram language model, e.g. "He bought it"
  • Words -> Phones: dictionary, e.g. bought -> b ao t (1.0); it -> ih t (0.8), ix t (0.2)
  • Phones -> States -> Features: phone HMMs
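A minimal sketch of the dictionary layer in this hierarchy: each word maps to one or more phone sequences with pronunciation probabilities (the entries below are the slide's own examples; the function name `expand` is illustrative).

```python
# Dictionary: word -> list of (phone sequence, pronunciation probability),
# using the example entries from the slide.
pron_dict = {
    "bought": [(("b", "ao", "t"), 1.0)],
    "it":     [(("ih", "t"), 0.8), (("ix", "t"), 0.2)],
}

def expand(words):
    """Expand a word sequence into all phone sequences, each with the
    product of its per-word pronunciation probabilities."""
    seqs = [((), 1.0)]
    for w in words:
        seqs = [(phones + p, prob * q)
                for phones, prob in seqs
                for p, q in pron_dict[w]]
    return seqs

for phones, prob in expand(["bought", "it"]):
    print(" ".join(phones), prob)
```

In a real recogniser these expansions are compiled into the state graph searched by the decoder rather than enumerated explicitly.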

  7. Model Building
  • Acoustic models: trained on about 100 hours of transcribed speech (e.g., "He said that ..." aligned to the phone sequence h iy b ...)
  • Language model: trained on about 500 million words of text (e.g., "Speaking from the White House, the president said today that the nation would stand firm against the ...."), yielding estimates such as: the president ate 0.003; the president said 0.01; the president told 0.02
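An illustrative sketch (not the slide's actual training pipeline) of where language-model numbers like those above come from: maximum-likelihood trigram estimates of P(w3 | w1 w2) from raw counts over a corpus. The toy corpus below is invented for the example.

```python
from collections import Counter

# Toy corpus standing in for the ~500M-word text collection.
corpus = ("speaking from the white house the president said today that "
          "the nation would stand firm the president told reporters").split()

# Count all trigrams and bigrams in the corpus.
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigrams = Counter(zip(corpus, corpus[1:]))

def p_trigram(w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1 w2)."""
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

print(p_trigram("the", "president", "said"))  # 0.5 on this toy corpus
```

Real systems additionally smooth these estimates so that unseen trigrams do not get zero probability.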

  8. Recognising: feature extractor -> decoder (using the dictionary, phone models, and N-gram grammar) -> "He bought ...". Dictionary entries e.g. HE -> h iy; HIS -> h ih s.
  Typical state-of-the-art system:
  • Language model: 1G words of training data, 100,000,000 parameters, 128k-word vocabulary
  • Acoustics: 250-1000 hours of acoustic training data, 10,000 states shared by 40 logical phone models, around 20,000,000 parameters
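The decoder in the pipeline above searches for the most likely state sequence through the composed HMM, typically with the Viterbi algorithm. A minimal sketch over a toy two-state HMM (all probabilities invented for illustration, not real acoustic scores):

```python
# Toy HMM: two states, two observation symbols.
states = ["s1", "s2"]
start = {"s1": 0.9, "s2": 0.1}
trans = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.1, "s2": 0.9}}
emit = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.2, "b": 0.8}}

def viterbi(obs):
    """Return the most probable state path for an observation sequence."""
    # v[s] = (probability of best path ending in s, that path)
    v = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        v = {s: max(((p * trans[r][s] * emit[s][o], path + [s])
                     for r, (p, path) in v.items()), key=lambda t: t[0])
             for s in states}
    return max(v.values(), key=lambda t: t[0])[1]

print(viterbi(["a", "a", "b"]))  # ['s1', 's1', 's2']
```

A production decoder works in the log domain and prunes the search, but the recursion is the same.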

  9. Progress in Automatic Speech Recognition [chart: word error rate over time, for tasks ranging from easy to hard]

  10. Current Research in Acoustic Modelling
  The hidden Markov model's quasi-stationary assumption is a major weakness; one response is the switching linear dynamical system. Plus significant effort in applying new ideas in machine learning:
  • dynamic Bayesian networks
  • support vector machines
  • parallel coupled models
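A sketch of the modelling contrast behind the slide's criticism, with invented parameters: within a state an HMM models the signal as statistically stationary (a fixed mean), whereas a linear dynamical system evolves a continuous hidden state x_t = a * x_{t-1} + noise, so its trajectory changes smoothly as real articulators do.

```python
import random

random.seed(0)

def hmm_trajectory(means, dwell=5):
    """Piecewise-constant modelled mean: jumps between state means."""
    return [m for m in means for _ in range(dwell)]

def lds_trajectory(a=0.9, steps=15, x0=5.0):
    """Smoothly evolving continuous state: x_t = a*x_{t-1} + Gaussian noise."""
    xs, x = [], x0
    for _ in range(steps):
        x = a * x + random.gauss(0.0, 0.1)
        xs.append(x)
    return xs

print(hmm_trajectory([1, 2, 3]))
print(lds_trajectory())
```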

  11. State of the Art: Cognitive Neuroscience of Speech and Language
  • Scientific understanding of human speech and language is in a state of rapid transformation and development
  • Rooted in cognitive/psycholinguistic accounts of the functional structure of the language system
  • Primary drivers now coming from neurobiology and new neuroscience techniques

  12. Neurobiology of homologous brain systems: primate neuroanatomy and neurophysiology (Rauschecker & Tian, PNAS, 2000)

  13. • Provides a potential template for investigating the human system
  • Illustrates the level of neural and functional specificity that is achievable
  • Points to an explanation in terms of multiple parallel processing pathways, hierarchically organised

  14. Speech and language processing in the human brain
  • Requires an interdisciplinary combination of neurobiology, psychoacoustics, acoustic-phonetics, neuroimaging, and psycholinguistics
  • Starting to deliver results with a high degree of functional and neural specificity
  • Ingredients for a future neuroscience of speech and language

  15. Hierarchical organisation of processes in primary auditory cortex (belt, parabelt) (from Patterson, Uppenkamp, Johnsrude & Griffiths, Neuron, 2002) [brain images: left vs right hemisphere; contrasts noise-silence, fixed-noise, pitch change-fixed]

  16. Hierarchical organisation of processing streams
  Activation as a function of intelligibility for sentences heard in different types of noise (Davis & Johnsrude, J. Neurosci., 2003). The colour scale plots intelligibility-responsive regions that were sensitive to the acoustic-phonetic properties of the speech distortion (orange to red), contrasted with regions (green to blue) whose response was independent of lower-level acoustic differences.

  17. Essential to image brain activity in time as well as in space
  • EEG and MEG offer excellent temporal resolution and improving spatial resolution
  • This allows dynamic tracking of the spatio-temporal properties of language processing in the brain
  • Demonstration (Pulvermüller et al.) using MEG to track cortical activity related to spoken word recognition

  18. [MEG image sequence: cortical activity at 700 ms to 800 ms after word onset, in 10-20 ms steps]

  19. Glimpse of future directions in cognitive neuroscience of language • Importance of understanding the functional properties of the domain • Neuroscience methods for revealing the spatio-temporal properties of the underlying systems in the brain

  20. State of the Art: Computational Language Systems
  Modelling interaction requires solutions for:
  • Parsing & interpretation
  • Generation & synthesis
  • Dialogue management
  • Integration of component theories and technologies

  21. Parsing and Interpretation
  • Goal is to convert a string of words into an interpretable structure, e.g. "Marks bought Brooks ..." ->
  (TOP (S (NP-SBJ Marks)
          (VP (VBD bought)
              (NP Brooks))
          (…)))
  • Translate a treebank into a grammar and statistical model
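A hedged sketch of "translate a treebank into a grammar": read a bracketed tree like the one above and extract its context-free rules; over a full treebank the rule counts become the statistical (PCFG) model. The function names are illustrative, and the `(…)` placeholder from the slide is omitted.

```python
def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def walk(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = walk(i)
            else:
                child, i = tokens[i], i + 1
            children.append(child)
        return (label, children), i + 1
    return walk(0)[0]

def rules(tree, out=None):
    """Collect lhs -> rhs rewrite rules from a parsed tree, top-down."""
    out = [] if out is None else out
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    out.append((label, rhs))
    for c in children:
        if isinstance(c, tuple):
            rules(c, out)
    return out

tree = parse_tree("(S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks)))")
for lhs, rhs in rules(tree):
    print(lhs, "->", " ".join(rhs))
```

Relative frequencies of these rules, conditioned on the left-hand side, give the rule probabilities of an unlexicalised PCFG; the lexicalised models cited on the next slide condition on head words as well.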

  22. Parser Performance
  Improvement in performance in recent years over an unlexicalised baseline of 80% on the PARSEVAL measures:
  • Magerman 1995: 84.3% LP, 84.0% LR
  • Collins 1997: 88.3% LP, 88.1% LR
  • Charniak 2000: 89.5% LP, 89.6% LR
  • Bod 2001: 89.7% LP, 89.7% LR
  Interpretation is beginning to follow (Collins 1999: 90.9% unlabelled dependency recovery). However, there are signs of an asymptote.
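A sketch of what the quoted LP/LR figures measure: labelled precision and recall compare the parser's labelled constituent spans against the gold treebank spans. The spans below are a toy example, not real parser output.

```python
def lp_lr(gold, predicted):
    """PARSEVAL-style scores; each constituent is a (label, start, end) span."""
    matched = len(set(gold) & set(predicted))
    lp = matched / len(predicted)   # labelled precision
    lr = matched / len(gold)        # labelled recall
    return lp, lr

gold = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}
pred = {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("PP", 2, 3)}
print(lp_lr(gold, pred))  # (0.75, 0.75)
```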

  23. Generation and Synthesis
  Spoken dialogue systems use:
  • Pre-recorded prompts: natural-sounding speech, but practically limited flexibility
  • Text-to-speech: provides more flexibility, but lacks adequate theories of how timing, intonation, etc. convey discourse information
  • Natural language (text) generation: discourse planners to select content from data and knowledge bases and organise it into semantic representations; broad-coverage grammars to realise semantic representations in language

  24. Spoken Dialogue Systems
  Implemented as hierarchical finite state machines or VoiceXML. Can:
  • Effectively handle simple tasks in real time: automated call routing; travel and entertainment information and booking
  • Be robust in the face of barge-in, e.g., "cancel" or "help"
  • Take action to get the dialogue back on track
  • Generate prompts sensitive to task context
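A minimal sketch of a finite-state dialogue manager of the kind described above; the states, prompts, and transitions are invented for illustration, with "help" and "cancel" as the global back-on-track commands.

```python
# Each state has a prompt and a successor state (None = dialogue over).
STATES = {
    "ask_city": {"prompt": "Which city are you flying to?", "next": "ask_date"},
    "ask_date": {"prompt": "What date do you want to travel?", "next": "confirm"},
    "confirm":  {"prompt": "Shall I book that?", "next": "done"},
    "done":     {"prompt": "Goodbye.", "next": None},
}

def step(state, user_input):
    """Advance the dialogue; global commands override the local transition."""
    if user_input == "cancel":   # abandon the task from any state
        return "done"
    if user_input == "help":     # stay put so the prompt is repeated
        return state
    return STATES[state]["next"]

s = "ask_city"
for utt in ["London", "help", "Tuesday", "yes"]:
    s = step(s, utt)
print(s)  # "done"
```

Everything the system can do has to be enumerated in tables like `STATES`, which is exactly the maintenance burden the next slide describes.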

  25. Limitations of Current Approaches
  Design and maintenance are labour-intensive, domain-specific, and error-prone:
  • Must specify all plausible dialogues and content
  • Mix task knowledge and dialogue knowledge
  Difficult to:
  • Generate responses sensitive to linguistic context
  • Handle user interruptions, user-initiated task switches or abandonment
  • Provide personalised advice or feedback
  • Build systems for new domains

  26. What Humans Do that Today's Systems Don't
  • Use context to interpret and respond to questions
  • Ask for clarification
  • Relate new information to what's already been said
  • Avoid repetition
  • Use linguistic and prosodic cues to convey meaning
  • Distinguish what's new or interesting
  • Signal misunderstanding, lack of agreement, rejection
  • Adapt to their conversational partners
  • Manage the conversational turn
  • Learn from experience

  27. Current Directions
  • 1M words of labelled data is not nearly enough: current emphasis is on lexical smoothing and semi-supervised methods for training parser models
  • Separation of dialogue management knowledge from domain knowledge
  • Integration of modern (reactive) planning technology with dialogue managers
  • Reinforcement learning of dialogue policies
  • Anytime algorithms for language generation
  • Stochastic generation of responses
  • Concept-to-speech synthesis
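A hedged sketch of "reinforcement learning of dialogue policies": tabular Q-learning over a toy slot-filling dialogue. The state is the number of filled slots; the actions, rewards, and environment dynamics are all invented for illustration.

```python
import random

random.seed(1)
ACTIONS = ["ask_slot", "confirm"]
N_SLOTS = 2

def simulate(state, action):
    """Toy environment: asking fills a slot; confirming too early fails."""
    if action == "ask_slot" and state < N_SLOTS:
        return state + 1, -1          # small cost per question asked
    if action == "confirm" and state == N_SLOTS:
        return state, 20              # successful task completion
    return state, -5                  # useless or premature action

Q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(2000):                 # simulated dialogues (episodes)
    s = 0
    for _ in range(10):               # cap episode length
        a = (random.choice(ACTIONS) if random.random() < epsilon
             else max(ACTIONS, key=lambda a: Q[(s, a)]))
        s2, r = simulate(s, a)
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        if a == "confirm" and s == N_SLOTS:
            break                     # dialogue completed
        s = s2

policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_SLOTS + 1)}
print(policy)
```

The learned policy asks until both slots are filled and only then confirms; research systems of this kind learn from (or simulate) real users rather than a hand-written reward table.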

  28. Grand Challenge
  To understand and emulate the human capability for robust communication and interaction. Goals:
  • To construct a neuro-biologically realistic, computationally specific account of human language processing
  • To construct functionally accurate models of human interaction based on and consistent with real-world data
  • To build human-computer interfaces which demonstrate human levels of robustness and flexibility

  29. Research Programme
  Three inter-related themes:
  • Exploration of language function in the human brain
  • Computational modelling of human language use
  • Analysis and modelling of human interaction
  The development of all three themes will aim at a strong integration of neuroscience and computational approaches.

  30. Theme 1: Exploration of language function in the human brain
  • Development of an integrated cognitive neuroscience account
    - precise neuro-functional mapping of the speech analysis system
    - identification/analysis of different cortical processing streams
    - improved (multi-modal) neuroimaging methods for capturing the spatio-temporal patterning of brain activity supporting language function
    - linkage to theories of language learning and brain plasticity
  • Expansion of neurophysiological/cross-species comparisons
    - research into homologous/analogous systems in primates and birds
    - development of cross-species neuroimaging to allow close integration with human data
  • Research across languages and modalities (speech and sign)
    - contrasts in language systems across different language families
    - cognitive and neural implications of spoken vs sign languages

  31. Theme 2: Computational modelling of human language function
  • Auditory modelling and human speech recognition
    - learn from the human auditory system, especially its use of time synchrony and vocal tract normalisation
    - move away from the quasi-stationary assumption and develop effective continuous state models
  • Data-driven language acquisition and learning
    - extend the successful speech recognition and parsing paradigm to semantics, generation and dialogue processing
    - apply the results as filters to improve speech and syntactic recognition beyond the current asymptote
    - develop methods for learning from large quantities of unannotated data
  • Neural networks for speech and language processing
    - develop kernel-based machine learning techniques such as SVMs to work in the continuous time domain
    - understand and learn from human neural processing

  32. Theme 3: Analysis and modelling of human interaction
  • Develop the psychology, linguistics, and neuroscience of interactive language
    - integrate psycholinguistic models with context to produce situated models
    - study biological mechanisms for interaction
  • Controlled scientific investigation of natural interaction using hybrid methods
    - integration of eye tracking with neuroimaging methods
  • Computational modelling
    - tractable computational models of situated interaction, e.g., Joint Action, interactive alignment, obligations, SharedPlans
  • Integration across levels
    - in interpretation: integrate planning, discourse obligations, and semantics into language models
    - in production: semantics of intonation; speech synthesizers that allow control of intonation and timing

  33. Summary of Benefits
  Grand Challenge: to understand and emulate the human capability for robust communication and interaction.
  • Greater scientific understanding of human cognition and communication
  • Significant advances in noise-robust speech recognition, understanding, and generation technology
  • Dialogue systems capable of adapting to their users and learning on-line
  • Improved treatment and rehabilitation of disorders in language function; novel language prostheses
