Speech User Interface

Pervasive Information Access

PowerPoint Slideshow about 'Speech User Interface' - mieko


Presentation Transcript
slide3
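Motivation
  • As devices get smaller, their input and output methods become correspondingly constrained.
    • Input: physical keyboards are limited in size; virtual keyboards have the same problem and also lack tactile feedback.
    • Output: screen size is limited (at the time of writing, the largest-screen phone on the market is the Samsung Galaxy Note, at 5.3 inches).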
slide4
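Applications
  • Telephone voice systems (customer service hotlines)
  • Text input
  • In-car voice navigation
  • Voice search
  • Dialogue systems
  • Voice memos
  • Interfaces for visually impaired users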
slide5
Application example: voice search
  • For example: Google Voice Search
slide6
Application example: text input
  • Dragon Dictation

http://itunes.apple.com/us/app/dragon-dictation/id341446764?mt=8

slide7
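Application example: dialogue systems
  • Siri: a virtual personal assistant based on speech recognition, launched by Apple in October 2011 (official Apple video)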
slide9
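Advantages of speech interfaces
  • Input speed: a typical person can speak at up to about 100 words per minute (provided that recognition is accurate enough)
  • The command set is virtually unlimited in size
  • The rest of the body stays free for other activities: for example, chatting with passengers and listening to music while driving
  • Natural: speech is the primary means of communication between people (a result of evolution)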
slide10
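Limitations of speech interfaces
  • Speech recognition is still imperfect
    • Once the error rate exceeds 5%, detecting and correcting errors may take longer than typing on a keyboard
    • Recognition accuracy degrades easily in the presence of noise
  • A speech interface has no visible state
  • A speech interface is hard to learn
    • How does the user know which commands are available?
    • How does the user find out what the interface covers?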
slide11
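Architecture of a complete spoken dialogue system
  • Components shown in the diagram: Automatic Speech Recognition, Natural Language Understanding, Dialogue Management, Planning, Natural Language Generation, Text-to-speech
  • Data labels on the connections: signal, words, words, logical form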
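The components above can be read as one processing loop per dialogue turn. The sketch below assumes the conventional ordering (signal, then ASR, words, NLU, logical form, dialogue management, NLG, words, TTS); the diagram residue does not confirm the exact wiring. Every function and the `LogicalForm` type are hypothetical stubs for illustration, not real recognizers or synthesizers.

```python
from dataclasses import dataclass


@dataclass
class LogicalForm:
    """Hypothetical meaning representation: an intent plus its slots."""
    intent: str
    slots: dict


def recognize(signal: bytes) -> str:
    """ASR stub (signal -> words): pretends the audio bytes are UTF-8 text."""
    return signal.decode("utf-8")


def understand(words: str) -> LogicalForm:
    """NLU stub (words -> logical form) with one hard-coded pattern."""
    tokens = words.lower().split()
    if "weather" in tokens and "in" in tokens[:-1]:
        city = tokens[tokens.index("in") + 1]
        return LogicalForm("get_weather", {"city": city})
    return LogicalForm("unknown", {})


def manage(form: LogicalForm) -> LogicalForm:
    """Dialogue management / planning stub: decide what the system says next."""
    if form.intent == "get_weather":
        # The forecast is a made-up constant; a real system would call a service.
        return LogicalForm("inform_weather", {"city": form.slots["city"], "sky": "sunny"})
    return LogicalForm("ask_repeat", {})


def generate(form: LogicalForm) -> str:
    """NLG stub (logical form -> words)."""
    if form.intent == "inform_weather":
        return f"It is {form.slots['sky']} in {form.slots['city']}."
    return "Sorry, could you say that again?"


def synthesize(words: str) -> bytes:
    """TTS stub (words -> signal): returns the text as bytes instead of audio."""
    return words.encode("utf-8")


def dialogue_turn(signal: bytes) -> bytes:
    """Run one user turn through all five stages."""
    return synthesize(generate(manage(understand(recognize(signal)))))


print(dialogue_turn(b"what is the weather in taipei"))
```

Keeping each stage behind a plain function boundary mirrors the diagram: any stage could be swapped for a real engine without touching the others.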

slide12
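Main components
  • Speech recognition
    • The computer must recognize (understand) the user's spoken input
  • Speech synthesis (text-to-speech, TTS)
    • The computer must be able to convert text into speech in order to communicate with the user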
slide13
Types of speech recognition
  • Continuous vs. non-continuous speech
  • Speaker-dependent vs. speaker-independent
  • Spontaneous vs. read speech
  • Keyword spotting vs. full-sentence recognition (continuous recognition of spoken words)
  • Large vs. small vocabulary
slide14
Speech recognition technology
  • Hidden Markov Model (HMM)
  • Reference paper: L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, 1989
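One core HMM operation is decoding: finding the most likely hidden state sequence for a series of observations, usually with the Viterbi algorithm. The sketch below runs Viterbi on a toy two-state model; the state names, observation symbols, and all probabilities are invented for illustration and are far simpler than the acoustic models of a real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[s] = (probability, path) of the best path so far that ends in state s
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the predecessor that maximizes the probability of reaching s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])


# Toy model (all numbers are assumptions): is each audio frame silence or speech?
STATES = ("sil", "speech")
START_P = {"sil": 0.8, "speech": 0.2}
TRANS_P = {
    "sil": {"sil": 0.7, "speech": 0.3},
    "speech": {"sil": 0.2, "speech": 0.8},
}
EMIT_P = {
    "sil": {"quiet": 0.9, "loud": 0.1},
    "speech": {"quiet": 0.2, "loud": 0.8},
}

prob, path = viterbi(["quiet", "loud", "loud", "quiet"], STATES, START_P, TRANS_P, EMIT_P)
print(path, prob)
```

Real systems work with log probabilities to avoid numerical underflow on long utterances; plain products are used here only to keep the sketch short.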
slide15
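Evaluating speech recognition systems
  • Recognition performance is measured by the word error rate (WER):

WER = 100 * (Subs + Ins + Dels) / Nwords

where Subs, Ins, and Dels count the substituted, inserted, and deleted words, and Nwords is the number of words in the reference transcript.

REF:  I  WANT  TO   GO  HOME  ***
REC:  *  WANT  TWO  GO  HOME  NOW
SC:   D  C     S    C   C     I

(C = correct, S = substitution, I = insertion, D = deletion)

WER = 100 * (1 S + 1 I + 1 D) / 5 = 60%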
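The substitution, insertion, and deletion counts come from a minimum-edit-distance alignment between the reference and the recognized word sequence. A minimal sketch, assuming whitespace tokenization and equal cost for all three error types (the function name `word_error_rate` is ours, not from the slides):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent, from a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("I WANT TO GO HOME", "WANT TWO GO HOME NOW"))  # 60.0
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs many extra words.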

slide16
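Technical challenges in speech recognition
  • How can recognition accuracy be improved?
  • How can noise interference be overcome?
  • How should filler words, pauses, and interjections be handled?
  • How can recognition be made faster?
    • Speed is no longer much of a problem on desktop or laptop computers, but there is still room for improvement on smartphones; the usual approach is to upload the audio to a server for further processing and recognition.
  • Segmentation (silly versus sill lea)
  • Homophones (mail vs. male)
  • Moving from speech recognition to semantic understanding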
slide17
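Speech synthesis
  • Also known as text-to-speech (TTS)
  • The input text is first analyzed (for Chinese, this includes word segmentation) to determine the pronunciation and tone of each unit; the result is then passed to the waveform synthesis unit, which generates the speech.
  • In general, waveform synthesis works by joining together many pre-recorded speech units stored in a database. Systems differ in the size of the units they store; storing phones and diphones requires a large amount of storage space.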
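Word segmentation for Chinese can be sketched with forward maximum matching, a classic dictionary-based baseline. The slides do not say which method any particular TTS system uses, so this is only an illustration; the dictionary is a toy and `max_len` is an assumed limit on word length.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary (an assumption for this example only).
toy_dictionary = {"語音", "合成", "語音合成", "技術"}
print(forward_max_match("語音合成技術", toy_dictionary))
```

Greedy matching picks the wrong split when a long dictionary word overlaps the correct boundary, which is one reason segmentation remains a challenge for Chinese TTS.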
slide19
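Online demos of Chinese TTS
  • NTHU MIR Lab (National Tsing Hua University)
  • NTU CSIE (National Taiwan University)
  • GUTTS (National Taiwan University of Science and Technology)
  • ITRI Information and Communications Research Laboratories
  • iFLYTEK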
slide20
Online demos of English TTS
  • AT&T Natural Voices
  • Good evening, class. Today we are going to discuss an important type of human-computer interface: speech UI, also known as voice UI. We will demonstrate a TTS engine developed by AT & T, which, in my opinion, is the best TTS so far.
slide21
Speech synthesis technology
  • Input: raw text
  • Text Analysis
    • Text Normalization
    • Part-of-Speech tagging
    • Homonym Disambiguation
  • Phonetic Analysis
    • Dictionary Lookup
    • Grapheme-to-Phoneme (LTS)
  • Prosodic Analysis
    • Boundary placement
    • Pitch accent assignment
    • Duration computation
  • Waveform synthesis
  • Output: speech
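The first stage, text normalization, rewrites raw text into plain words that later stages can look up. A minimal sketch follows; the abbreviation table is a made-up toy, and reading numbers digit by digit is a simplification (a real front end would choose between "two two one" and "two hundred twenty-one" from context).

```python
import re

# Toy tables (assumptions for illustration only).
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def normalize(text: str) -> str:
    """Lowercase, expand known abbreviations, spell out digits, drop punctuation."""
    out = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            out.extend(DIGITS[int(ch)] for ch in token)
        else:
            # Keep only letters and apostrophes.
            out.append(re.sub(r"[^a-z']", "", token))
    return " ".join(word for word in out if word)


print(normalize("Dr. Lee lives at 221 Baker St."))
```

Note that "St." is itself ambiguous ("street" or "saint"); resolving cases like this is the job of the homonym disambiguation step listed above.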

slide22
Waveform synthesis methods
  • Concatenative synthesis: based on the concatenation (stringing together) of segments of recorded speech.
  • Formant synthesis: generated using additive synthesis and an acoustic model, with parameters such as fundamental frequency, voicing, and noise levels varied over time.
  • Articulatory synthesis: synthesizes speech based on models of the human vocal tract.
slide23
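Waveform synthesis: concatenative synthesis
  • At the time of writing, commercial speech synthesis systems use concatenative synthesis, which can be divided into three types: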
  • Diphone Synthesis
    • Units are diphones; middle of one phone to middle of next.
    • Why? Middle of phone is steady state.
    • Record 1 speaker saying each diphone
  • Unit Selection Synthesis
    • Larger units (Record 10 hours or more, so have multiple copies of each unit)
    • Use search to find best sequence of units
  • Domain-specific synthesis: concatenates prerecorded words and phrases to create complete utterances
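Diphone synthesis can be sketched in two steps: turn the phone sequence into diphone unit names, then look each unit up in a recorded inventory and join the samples. The unit naming scheme, the `sil` padding symbol, and the fake sample lists below are assumptions for illustration; real inventories hold recorded waveforms.

```python
def to_diphones(phones):
    """Turn a phone sequence into diphone unit names, padded with silence."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def synthesize(phones, inventory):
    """Concatenate the stored samples for each diphone, with no smoothing."""
    samples = []
    for unit in to_diphones(phones):
        samples.extend(inventory[unit])  # KeyError if the unit was never recorded
    return samples


# Fake inventory: three placeholder samples per unit instead of real audio.
cat_phones = ["k", "ae", "t"]
toy_inventory = {
    unit: [float(index)] * 3
    for index, unit in enumerate(to_diphones(cat_phones))
}

print(to_diphones(cat_phones))
print(len(synthesize(cat_phones, toy_inventory)))
```

With N phones there are up to N * N possible diphones, which is why the inventory grows much faster than a phone-only inventory would.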
slide24
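Technical challenges in speech synthesis
  • How can text be segmented into words correctly? (Chinese natural language processing)
  • How can the correct prosody be synthesized?
  • With concatenative synthesis, how can the joins between syllables be made smoother?
  • How can emotional expression be added to the voice?
  • How can a distinctive, easily recognizable voice be produced?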
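One simple way to soften the joins between concatenated units is a linear cross-fade over a short overlap. This is only one possible smoothing approach, chosen here for brevity; the slides do not name a specific method, and the function name and sample values are assumptions.

```python
def crossfade_concat(left, right, overlap):
    """Join two sample lists, blending the last `overlap` samples of `left`
    with the first `overlap` samples of `right` using linear weights."""
    if overlap < 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must fit inside both segments")
    if overlap == 0:
        return list(left) + list(right)
    start = len(left) - overlap
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # weight of `right`, rising toward 1
        blended.append((1 - w) * left[start + k] + w * right[k])
    return list(left[:start]) + blended + list(right[overlap:])


# A constant 1.0 segment fading into a constant 0.0 segment.
print(crossfade_concat([1.0] * 4, [0.0] * 4, overlap=3))
```

The output is shorter than plain concatenation by `overlap` samples, so unit durations need to account for the blended region.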
slide25
Spoken dialogue systems
  • Speech conversational system
  • Siri: based on the U.S. Department of Defense's Cognitive Assistant that Learns and Organizes (CALO) project
  • A speech-based virtual personal assistant
  • http://en.wikipedia.org/wiki/Siri_(software)
slide26
Demo video
  • A conversation with Siri on the iPhone 4S
slide27
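Key technologies
  • Conversational Interface: the speech recognition core is provided by Nuance.
  • Personal Context Awareness: technology from the CALO project.
  • Service Delegation: information search and service provision, with multiple companies participating.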
slide28
Data and service search
  • OpenTable, Gayot, CitySearch, BooRah, Yelp, Yahoo Local, ReserveTravel, Localeze for restaurant and business questions and actions;
  • Eventful, StubHub, and LiveKick for events and concert information;
  • MovieTickets, RottenTomatoes and the New York Times for movie information and reviews;
  • True Knowledge, Bing Answers, and Wolfram Alpha for factual question answering;
  • Bing, Yahoo and Google for web search.
chatterbot
ChatterBot
  • Chatbot
  • For questions it cannot understand, it responds in the manner of a dialogue generator such as ELIZA.
  • Siri meets ELIZA
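An ELIZA-style responder matches the utterance against a list of patterns and fills a reply template with the captured text, falling back to a generic prompt when nothing matches. The rules below are invented for illustration and are not the original ELIZA script; pronoun reflection ("my" to "your") is left out to keep the sketch short.

```python
import re

# Hypothetical rules: (pattern, reply template using the captured groups).
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE), "Is that the real reason?"),
]
FALLBACK = "Please tell me more."


def respond(utterance: str) -> str:
    """Return the reply for the first matching rule, else a generic fallback."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK


print(respond("I need a vacation."))
print(respond("What is the meaning of life?"))
```

The fallback path is the part relevant here: it lets a dialogue system keep the conversation going even when the request falls outside what it can understand.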
slide30
Speech interfaces: practical issues
  • Major problems:
    • modes (no feedback)
      • certain commands only work when in specific states
    • deep hierarchies (also known as voice mail hell)
  • Verbose feedback wastes time/patience
    • only confirm consequential things
    • use meaningful, short cues
  • Interruption
    • half-duplex communication (i.e., no barge-in support)
  • Too much speech on the part of the customer is tiring
  • Speech takes up space in working memory
    • can cause problems when problem solving
slide31
Standards for speech interface development
  • VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer.
  • Current version: VoiceXML 2.1
  • VoiceXML 3.0 (working draft)
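To give a feel for the format, the sketch below builds a minimal VoiceXML 2.1 document with Python's standard `xml.etree.ElementTree`. The dialogue itself (the form id, field name, and prompt wording) is a made-up example; only the element names (`vxml`, `form`, `field`, `prompt`, `filled`, `value`), the built-in `boolean` field type, and the namespace come from the VoiceXML specification. No voice browser is involved: the code only produces the XML text.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_dialog() -> str:
    """Build a one-question yes/no dialogue and return it as an XML string."""
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "weather"})
    # A field collects one piece of input; type="boolean" asks for yes/no.
    field = ET.SubElement(form, "field", {"name": "confirm", "type": "boolean"})
    ET.SubElement(field, "prompt").text = "Do you want to hear the weather?"
    # <filled> runs once the field has a value.
    filled = ET.SubElement(field, "filled")
    reply = ET.SubElement(filled, "prompt")
    reply.text = "You answered "
    ET.SubElement(reply, "value", {"expr": "confirm"})
    return ET.tostring(vxml, encoding="unicode")


document = build_dialog()
print(document)
```

The field plus `filled` pair is the basic VoiceXML unit: prompt the caller, listen against a grammar, then act on the recognized value.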
slide32
Tools for speech interface development
  • Speech recognition: CMUSphinx, an open source toolkit for speech recognition http://cmusphinx.sourceforge.net/
  • Speech synthesis: festvox http://festvox.org/index.html
  • Speech interface APIs: Microsoft Speech API (SAPI 5.3)
  • Java Speech API
slide33
References
  • X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development, 2001.
  • L. R. Rabiner and R. W. Schafer, Theory and Applications of Digital Speech Processing, 2010.
  • Why is Siri Important?