HUMAN LANGUAGE AND COMMUNICATION:
Why Is This Research Area Still an Important Challenge?

Joseph Picone, PhD
Intelligent Electronic Systems
Human and Systems Engineering
Department of Electrical and Computer Engineering

Abstract and Biography

ABSTRACT: Speech technology has quietly become a pervasive influence in our daily lives despite widespread concerns about research progress over the past 20 years. However, because language is so fundamental to our human existence, the expectations users have for human-computer collaboration have continually outpaced research advances. In this talk, we will review recent research on fundamentally new approaches to speech recognition with an emphasis on machine learning and discrimination. We will then project future research directions in this field based on a historical perspective of progress over the past 50 years. We will conclude with a discussion of why research in this area will have a fundamental impact on computational science far beyond speech or language.

BIOGRAPHY: Joseph Picone is currently a Professor in the Department of Electrical and Computer Engineering at Mississippi State University, where he also directs the Intelligent Electronic Systems program at the Center for Advanced Vehicular Systems. His principal research interests are the development of new statistical modeling techniques for speech recognition. He has previously been employed by Texas Instruments and AT&T Bell Laboratories. Dr. Picone received his Ph.D. in Electrical Engineering from Illinois Institute of Technology in 1983. He is a Senior Member of the IEEE.

Fundamental Challenges: Generalization and Risk
• Why research human language technology?
  “Language is the preeminent trait of the human species.”
  “I never met someone who wasn’t interested in language.”
  “I decided to work on language because it seemed to be the hardest problem to solve.”
• Fundamental challenge: diversity of data that often defies mathematical descriptions or physical constraints.
• Solution: Can we integrate multiple knowledge sources using principles of risk minimization?

Internet-Accessible Speech Recognition (CARE)
• Speech recognition
  • State of the art
  • Statistical (e.g., HMM)
  • Continuous speech
  • Large vocabulary
  • Speaker independent
• Goal: Accelerate research
  • Flexible, extensible, modular
  • Efficient (C++, parallel processing)
  • Easy to use (documentation)
  • Toolkits, GUIs
• Benefit: Technology
  • Standard benchmarks
  • Conversational speech

Integrating Speech and Natural Language Processing (ITR)
• Integrate speech recognition, prosody, and parsing on conversational speech
• Multi-university and multidisciplinary (medium ITR)
• Speech features are highly confusable
• Integration of knowledge (e.g., linguistic context) is crucial
• Pioneering the use of risk minimization in speech recognition and verification
• First LVCSR systems based on support and relevance vector machines
[Figure labels: Optimum, Testing, Training]
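The slide names risk minimization without giving a formula. As a hedged illustration only, one standard statement of the idea is Vapnik's bound on expected risk for a binary classifier; the symbols below ($\alpha$ for the model parameters, $h$ for the VC dimension, $N$ for the number of training samples, $\eta$ for the confidence level) are assumptions introduced here, not notation from the talk. With probability at least $1 - \eta$:

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}
```

The first term is the training (empirical) error, which typically falls as model capacity $h$ grows; the second term grows with $h$. Structural risk minimization picks the capacity that minimizes their sum, which is presumably the "Optimum" point between the training and testing curves labeled in the figure.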

Nonlinear Statistical Modeling of Speech (HLC)
“Though linear statistical models have dominated the literature for the past 100 years, they have yet to explain simple physical phenomena.”
• Motivated by a phase-locked loop analogy
• Application of principles of chaos and strange attractor theory to acoustic modeling in speech
• Baseline comparisons to other nonlinear methods
• Expected outcomes:
  • Reduced complexity of statistical models for speech (two orders of magnitude reduction)
  • High-performance, channel-independent, text-independent speaker verification/identification
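The slide does not say how attractor theory is applied to speech. A common first step in attractor analysis of any time series is time-delay embedding (phase space reconstruction), so the sketch below shows that step under stated assumptions: the function name `delay_embed`, the parameters `dim` and `tau`, and the sinusoidal test frame are all hypothetical and are not taken from the project.

```python
import math


def delay_embed(signal, dim, tau):
    """Build delay vectors [x[t], x[t + tau], ..., x[t + (dim - 1) * tau]].

    signal: sequence of samples (hypothetical stand-in for a speech frame)
    dim:    embedding dimension (assumed, chosen by the analyst)
    tau:    delay in samples (assumed, chosen by the analyst)
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # samples needed beyond the first coordinate
    return [
        [signal[t + k * tau] for k in range(dim)]
        for t in range(len(signal) - span)
    ]


# Toy stand-in for a voiced speech frame: 100 samples of a sinusoid.
frame = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]

# Each row is one point on the reconstructed trajectory in 3-D phase space.
vectors = delay_embed(frame, dim=3, tau=5)
print(len(vectors), len(vectors[0]))
```

For a pure sinusoid the reconstructed trajectory is a closed loop; a chaotic signal would instead trace a more complex structure, and it is features of that structure that a nonlinear acoustic model could try to capture.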

Applications in Advanced Vehicular Systems (Mississippi)
• Use of dialog to provide on-demand training for workers
• A dialog system must adapt to user stress, confusion, and learning style

An Algorithm Retrospective of Language Technology
[Figure labels: Expert Systems; Discriminative Methods; Statistical Methods (Generative); Knowledge Integration; Analog Systems; Open Loop Analysis]
• Observations:
  • Information theory preceded modern computing.
  • Early research focused on basic science.
  • Computing capacity has enabled engineering methods.
  • We are now “knowledge-challenged.”

A Historical Perspective of Prominent Disciplines
[Figure labels:
  Physical Sciences: Physics, Acoustics, Linguistics
  Engineering Sciences: EE, CPE, Human Factors
  Computing Sciences: Comp. Sci., Comp. Ling.
  Cognitive Sciences: Psychology, Neurophysiology]
• Observations:
  • The field is continually accumulating new expertise.
  • As obvious mathematical techniques have been exhausted (“low-hanging fruit”), there will be a return to basic science (e.g., fMRI brain activity imaging).

Evolution of Knowledge and Intelligence in HLT Systems
[Figure axes: Performance vs. Source of Knowledge]
• A priori expert knowledge created a generation of highly constrained systems (e.g., isolated word recognition, parsing of written text, fixed-font OCR).
• Statistical methods created a generation of data-driven approaches that supplanted expert systems (e.g., conversational speech to text, speech synthesis, machine translation from parallel text).
… but that isn’t the end of the story …
• A number of fundamental problems still remain (e.g., channel and noise robustness, less dense or less common languages).
• The solution will require approaches that use expert knowledge from related, more dense domains (e.g., similar languages) and the ability to learn from small amounts of target data (e.g., autonomic).

Historical Synergy Between IIS and HLC
• Speech recognition is now widely acknowledged to be a machine learning problem, but language modeling has not yet embraced advanced statistical models.
• Statistical methods are now dominant in most forms of HLC research where ample amounts of data exist.
• Information extraction (e.g., audio mining) is coming of age, but named entities remain a major challenge.
• The general perception is that machine translation is at least 5 years behind spoken language in terms of resources, evaluation-driven research, and performance (but catching up quickly).
• Many forms of HLC research remain underfunded (multimodal, multispeaker conferences).

Summary
• Machine learning approaches to human language technology are still in their infancy.
• A mathematical framework for integration of knowledge and metadata will be critical in the next 10 years.
• Information extraction in a multilingual environment will be an emerging market in the next 5 years.
• Mundane problems such as named entity extraction are still major barriers in information extraction.
• It is widely perceived that research progress in machine translation will begin a similar trajectory to speech recognition in the next 10 years.
• This is a time of great opportunity!

Recent Publications
• Recent relevant peer-reviewed publications:
  • J. Baca and J. Picone, “Effects of Navigational Displayless Interfaces on User Prosodics,” Speech Communication, vol. 45, no. 2, pp. 187-202, Feb. 2005.
  • A. Ganapathiraju, J. Hamaker and J. Picone, “Applications of Support Vector Machines to Speech Recognition,” IEEE Trans. on Signal Proc., vol. 52, no. 8, pp. 2348-2355, August 2004.
  • R. Sundaram and J. Picone, “Effects of Transcription Errors on Supervised Learning in Speech Recognition,” International Conference on Acoustics, Speech, and Signal Processing, pp. 169-172, Montreal, Quebec, Canada, May 2004.
  • I. Alphonso and J. Picone, “Network Training For Continuous Speech Recognition,” to be presented at the 12th European Signal Processing Conference, Vienna, Austria, September 7-10, 2004.
  • J. Baca, F. Zheng, H. Gao, and J. Picone, “Dialog Systems for Automotive Environments,” European Conference on Speech Communication and Technology, pp. 1929-1932, Geneva, Switzerland, September 2003.
  • J. Hamaker, J. Picone, and A. Ganapathiraju, “A Sparse Modeling Approach to Speech Recognition Based on Relevance Vector Machines,” Proceedings of the International Conference of Spoken Language Processing, pp. 1001-1004, Denver, Colorado, USA, September 2002.
• Relevant online resources:
  • “Projects,” http://www.isip.msstate.edu/projects/, Intelligent Electronic Systems, Center for Advanced Vehicular Systems, Mississippi State University, Mississippi State, Mississippi, USA, August 2004.
  • “Internet-Accessible Speech Recognition Technology,” http://www.isip.msstate.edu/projects/speech/index.html, August 2004.
  • “About our Software,” http://www.isip.msstate.edu/projects/speech/software/, January 2004.
  • “Nonlinear Statistical Modeling of Speech,” http://www.isip.msstate.edu/projects/nsf_nonlinear/, September 2004.
  • “Cognitive Assessment Using Voice Analysis,” http://www.isip.msstate.edu/projects/voice_analysis/, September 2004.
  • “Fundamentals of Speech Recognition: A Tutorial Based on a Public Domain C++ Toolkit,” http://www.isip.msstate.edu/projects/speech/software/tutorials/production/fundamentals/current/, Aug. 2003.
  • “Speech and Signal Processing Demonstrations,” http://www.isip.msstate.edu/projects/speech/software/demonstrations/index.html, September 2004.
  • “Fundamentals of Speech Recognition,” http://www.isip.msstate.edu/publications/courses/ece_8463/, September 2004.

Appendix: Relevant Resources
• Interactive Software: Java applets, GUIs, dialog systems, code generators, and more
• Speech Recognition Toolkits: compare SVMs and RVMs to standard approaches using a state-of-the-art ASR toolkit
• Foundation Classes: generic C++ implementations of many popular statistical modeling approaches
• Fun Stuff: have you seen our campus bus tracking system? Or our Home Shopping Channel commercial?
