A Prototype Personal Dictation System

  1. A Prototype Personal Dictation System
     Adam Janin, janin@icsi.berkeley.edu

  2. Final Goal – A Portable Meeting Recorder
     • Record impromptu meetings in a natural environment.
     • Detect multiple speakers.
     • Allow correction and annotation.
     • Support indexing and searching.
     • Self-contained (using IRAM).

  3. Intermediate Goal – A Personal Dictation System
     • Record a single user dictating text.
     • Allow correction and editing.
     • Hosted system:
         • ASR runs on a workstation.
         • GUI runs on the Palm Pilot.
         • The two communicate via a wired network.
     • Close-talking microphone.
     • Limited domain (Broadcast News).

  4. Asides...
     • Why not Wizard of Oz?
         • The structure of the correction mechanism is recognizer-specific.
         • Develop infrastructure.
         • Produce a working demo.
     • Informal user study, mostly with speech researchers.

  5. Architecture
     • Palm Pilot: correct transcripts, edit transcripts, create new text.
     • Sun Workstation: audio frontend, speech recognizer, correction server.
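The slides name the components but not the messages exchanged between the Pilot GUI and the correction server. The sketch below assumes a simple line-oriented request/reply protocol; the command names `GET`, `ALTS` and `CORRECT` and the class `CorrectionServer` are hypothetical, and the network transport is omitted.

```python
class CorrectionServer:
    """Hypothetical correction-server state: the current transcript plus
    per-word alternatives supplied by the recognizer."""

    def __init__(self, words, alternatives):
        self.words = list(words)
        self.alternatives = dict(alternatives)  # word index -> alternative words

    def _text(self) -> str:
        return " ".join(["TEXT", *self.words])

    def handle(self, line: str) -> str:
        """Process one request line from the Pilot client; return one reply line."""
        parts = line.split()
        if not parts:
            return "ERR empty request"
        cmd, args = parts[0], parts[1:]
        try:
            if cmd == "GET" and not args:
                return self._text()
            if cmd == "ALTS" and len(args) == 1:
                # Alternatives for one word position, most likely first.
                return " ".join(["ALTS", *self.alternatives.get(int(args[0]), [])])
            if cmd == "CORRECT" and len(args) == 2:
                i = int(args[0])
                if not 0 <= i < len(self.words):
                    return "ERR bad index"
                # Simplification: only the chosen word changes here.
                self.words[i] = args[1]
                return self._text()
        except ValueError:
            return "ERR bad index"
        return "ERR unknown command"
```

For example, `CorrectionServer(["the", "records"], {1: ["rack", "record"]}).handle("ALTS 1")` returns `"ALTS rack record"`. A real correction server would consult the lattice on each correction, and rescoring may change neighboring words as well, which this sketch does not attempt.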

  6. Correcting and Editing
     • Correcting – informing the recognizer that it has made an error.
         • If the recognizer has a good idea of the alternatives, it may be faster to correct than to edit.
         • The recognizer can adapt to the user and vocabulary.
     • Editing – changing the output.
         • “That’s not what I meant to say.”
         • Text vs. speech input.

  7. Correction Methods: Background
     • A lattice contains the recognizer’s best guesses.
         • More compact than N-best lists.
         • Contains word order and timing.
     Example hypotheses:
       1) the records …
       2) a rack …
       3) the wreck or …
       4) a record …
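A minimal sketch of a word lattice, assuming nodes carry times and edges carry words with log-probability scores. The lattice, node times and scores are invented so that its paths reproduce the four fragments on the slide; `Edge`, `all_paths` and `describe` are illustrative names. Here 7 edges encode 4 hypotheses containing 9 word tokens, and expanding the lattice yields the equivalent N-best list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: int        # node where the word starts
    dst: int        # node where the word ends
    word: str
    logprob: float  # hypothetical combined acoustic + language-model score


# Node id -> time in seconds; node 0 is the lattice start and node 3 its end.
NODE_TIME = {0: 0.00, 1: 0.10, 2: 0.08, 3: 0.60, 4: 0.40}

EDGES = [
    Edge(0, 1, "the", -0.1),
    Edge(0, 2, "a", -0.7),
    Edge(1, 3, "records", -1.0),
    Edge(1, 4, "wreck", -2.0),
    Edge(4, 3, "or", -0.5),
    Edge(2, 3, "rack", -1.2),
    Edge(2, 3, "record", -2.0),
]


def all_paths(edges, start, end):
    """Expand the lattice into an N-best list: every start-to-end path
    (a list of edges), most likely first."""
    outgoing = defaultdict(list)
    for e in edges:
        outgoing[e.src].append(e)
    results = []

    def walk(node, path):
        if node == end:
            results.append(path)
            return
        for e in outgoing[node]:
            walk(e.dst, path + [e])

    walk(start, [])
    return sorted(results, key=lambda p: sum(e.logprob for e in p), reverse=True)


def describe(path):
    """Render a path as word[start-end] items, using the node times."""
    return " ".join(f"{e.word}[{NODE_TIME[e.src]:.2f}-{NODE_TIME[e.dst]:.2f}]" for e in path)


if __name__ == "__main__":
    for path in all_paths(EDGES, 0, 3):
        print(describe(path))
```

The best path renders as `the[0.00-0.10] records[0.10-0.60]`. Words shared between hypotheses are stored once, which is where the compactness comes from; the saving grows as more hypotheses share prefixes and suffixes.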

  8. Correction Methods: Selecting Hypotheses
     • User corrects “records”.
     • System picks all words that overlap it in time.
     • Presents them in order from most likely to least likely.
     • Note: full overlap is probably not optimal.
     Example hypotheses:
       1) the records …
       2) a rack …
       3) the wreck or …
       4) a record …
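The selection step above can be sketched as follows, assuming each lattice word has start and end times and a posterior probability (all values invented). `WordHyp`, `alternatives` and the `min_fraction` parameter are hypothetical; `min_fraction` is one possible response to the slide's note that full overlap is probably not optimal, requiring a candidate to cover a given fraction of the corrected word's duration.

```python
from typing import NamedTuple


class WordHyp(NamedTuple):
    word: str
    start: float      # seconds
    end: float        # seconds
    posterior: float  # hypothetical posterior probability of this word


LATTICE_WORDS = [
    WordHyp("the", 0.00, 0.10, 0.70),
    WordHyp("a", 0.00, 0.08, 0.30),
    WordHyp("records", 0.10, 0.60, 0.45),
    WordHyp("wreck", 0.10, 0.40, 0.18),
    WordHyp("or", 0.40, 0.60, 0.16),
    WordHyp("rack", 0.08, 0.60, 0.25),
    WordHyp("record", 0.08, 0.60, 0.12),
]


def overlap(a: WordHyp, b: WordHyp) -> float:
    """Seconds of time shared by a and b (0.0 if they are disjoint or only touch)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def alternatives(target: WordHyp, words, min_fraction: float = 0.0):
    """Distinct words overlapping `target` in time, most likely first.

    min_fraction=0.0 keeps any overlap; larger values require the candidate
    to cover that fraction of the target's duration.
    """
    best = {}
    for w in words:
        if w.word == target.word:
            continue  # skip the word being corrected
        shared = overlap(target, w)
        if shared <= 0.0 or shared < min_fraction * (target.end - target.start):
            continue
        if w.word not in best or w.posterior > best[w.word]:
            best[w.word] = w.posterior
    return sorted(best, key=best.get, reverse=True)
```

Correcting "records" (`LATTICE_WORDS[2]`) offers `rack`, `wreck`, `or`, `record` with any overlap; with `min_fraction=0.5` the short "or" drops out, since it covers only 0.2 s of the 0.5 s target.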

  9. Correction Methods: Rescoring
     • User corrects “records” to “record”.
     • Select only paths containing “record”.
     • Rescore the lattice.
     • Unexpected changes!
     Example hypotheses:
       1) the records …
       2) a rack …
       3) the wreck or …
       4) a record …
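A minimal sketch of constrained rescoring, assuming the same kind of invented lattice as in the earlier example and using a standard best-path (Viterbi-style) forward and backward pass. `rescore_with` and its helpers are hypothetical names. Forcing the path through "record" also changes "the" to "a", because in this lattice "record" only follows "a", which illustrates how the slide's "unexpected changes" can arise.

```python
import math
from collections import namedtuple

# Hypothetical lattice: edges carry a word and a log-probability score.
Edge = namedtuple("Edge", "src dst word logprob")
NODE_TIME = {0: 0.00, 1: 0.10, 2: 0.08, 3: 0.60, 4: 0.40}  # node id -> seconds
START, END = 0, 3
EDGES = [
    Edge(0, 1, "the", -0.1), Edge(0, 2, "a", -0.7),
    Edge(1, 3, "records", -1.0), Edge(1, 4, "wreck", -2.0),
    Edge(4, 3, "or", -0.5), Edge(2, 3, "rack", -1.2),
    Edge(2, 3, "record", -2.0),
]


def _best_prefix(edges, order, start):
    """Best score from `start` to every node, with back-pointers (forward pass)."""
    score = {n: -math.inf for n in order}
    score[start] = 0.0
    back = {}
    for n in order:  # sorting by time gives a topological order here
        for e in edges:
            if e.src == n and score[n] + e.logprob > score[e.dst]:
                score[e.dst] = score[n] + e.logprob
                back[e.dst] = e
    return score, back


def _best_suffix(edges, order, end):
    """Best score from every node to `end`, with forward pointers (backward pass)."""
    score = {n: -math.inf for n in order}
    score[end] = 0.0
    ahead = {}
    for n in reversed(order):
        for e in edges:
            if e.dst == n and e.logprob + score[n] > score[e.src]:
                score[e.src] = e.logprob + score[n]
                ahead[e.src] = e
    return score, ahead


def rescore_with(edges, node_time, start, end, required):
    """Best start-to-end word sequence that passes through a `required` edge."""
    order = sorted(node_time, key=node_time.get)
    pre, back = _best_prefix(edges, order, start)
    suf, ahead = _best_suffix(edges, order, end)
    candidates = [e for e in edges if e.word == required]
    if not candidates:
        return None  # word not in the lattice; needs a different fallback
    best = max(candidates, key=lambda e: pre[e.src] + e.logprob + suf[e.dst])
    if pre[best.src] + best.logprob + suf[best.dst] == -math.inf:
        return None  # required edge lies on no complete path
    left, n = [], best.src
    while n != start:  # follow back-pointers to the start node
        left.append(back[n].word)
        n = back[n].src
    right, n = [], best.dst
    while n != end:  # follow forward pointers to the end node
        right.append(ahead[n].word)
        n = ahead[n].dst
    return left[::-1] + [best.word] + right
```

With this lattice, `rescore_with(EDGES, NODE_TIME, START, END, "records")` gives `["the", "records"]`, while requiring `"record"` gives `["a", "record"]`. Returning `None` for a word outside the lattice leaves room for the out-of-lattice corrections mentioned under Details.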

  10. Editing
     • Allows the user to add or edit text arbitrarily.
     • Must synchronize with the correction server.
     • Edit vs. Correct is currently implemented modally, with on-screen push buttons.
     • A gestural interface for correcting and editing would be preferable.
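The slides do not say how edits are synchronized with the correction server (it is listed as future work). One possible approach, assumed here, is to diff the recognizer's transcript against the user's edited text at word level and send only the changed spans. The sketch uses Python's standard `difflib.SequenceMatcher`; `edit_ops` is a hypothetical name.

```python
import difflib


def edit_ops(old_words, new_words):
    """Word-level changes that turn the recognizer's transcript into the user's text.

    Returns (tag, old_start, old_end, replacement_words) tuples, where tag is
    'replace', 'delete' or 'insert' and old_start:old_end indexes old_words.
    """
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    return [
        (tag, i1, i2, new_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # unchanged spans need no message
    ]
```

For example, editing "the records are here" to "the records were here" yields `[("replace", 2, 3, ["were"])]`. A server receiving such operations could, for instance, mark the replaced span as user text so that later corrections no longer offer lattice alternatives for it.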

  11. Details...
     • Correction allows for words not in the lattice.
     • Tap-to-correct worked better than press-and-hold.
     • System updates the text when the user pauses.
     • Doesn’t handle punctuation, paragraphs, etc.
     • Correction is fast, but dictation is slow.
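The slides do not say how the system detects that the user has paused. A common simple technique, assumed here, is an energy-threshold endpointer: a pause is confirmed after a run of low-energy frames that follows speech. The function names, the threshold and the minimum run length are hypothetical and would need tuning for a real microphone.

```python
def frame_energies(samples, frame_len):
    """Mean-square energy of each complete, non-overlapping frame of samples."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_points(energies, threshold=0.1, min_silent_frames=3):
    """Frame indices at which a pause after speech is confirmed.

    A pause is `min_silent_frames` consecutive frames below `threshold`
    that follow at least one speech frame; each pause fires only once.
    """
    points = []
    silent_run = 0
    in_speech = False
    for i, energy in enumerate(energies):
        if energy >= threshold:
            in_speech = True
            silent_run = 0
        else:
            silent_run += 1
            if in_speech and silent_run == min_silent_frames:
                points.append(i)  # the displayed text would be updated here
                in_speech = False
    return points
```

For the frame energies `[0, 0, 0.5, 0.6, 0, 0, 0, 0, 0.7, 0, 0, 0]`, pauses are confirmed at frames 6 and 11; leading silence does not trigger an update because no speech has been seen yet.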

  12. Future Work
     • “Real” user studies.
     • Experiment more with correction mechanisms.
     • Implement editing synchronization.
     • Implement gestures.
     • Move to a wireless network and microphone.
     • Add punctuation, paragraphs, etc.
