MRCPv2 – the end of proprietary speech APIs?

Daniel C. Burnett

Presentation Transcript


  1. MRCPv2 – the end of proprietary speech APIs? Daniel C. Burnett SpeechTek West 2007

  2. Overview
     • What is MRCP?
     • Why MRCP?
     • Why MRCPv2?
     • Why an IETF protocol?/Status
     • Relationship to other standards
     • Features of MRCP
     • Sample call flow with ASR/TTS

  3. What is MRCP?
     • IETF protocol that lets a client control a server’s speech recognition (ASR), speech synthesis (TTS), recording, and speaker identification and verification (SIV) resources
     • A standard, programming-language-agnostic API for using ASR, TTS, and SIV resources

  4. Why MRCP?
     • Pre-MRCP
       • Every ASR and TTS vendor had a proprietary API
       • Some vendors supported Microsoft’s SAPI
       • Some vendors supported JSAPI
     • Today: every major ASR and TTS vendor supports MRCP

  5. Why MRCPv2?
     • MRCPv1
       • Was designed by Cisco, Nuance, and SpeechWorks
       • “Tunneled over” RTSP
       • Published as an Informational RFC, not an IETF standard (http://www.ietf.org/rfc/rfc4463.txt)
     • MRCPv2
       • Designed in a public forum by
         • Multiple ASR/TTS vendors
         • Multiple technology integrators
         • Multiple VoiceXML implementers
       • “Top-level” application protocol similar to HTTP
       • IETF standards-track document

  6. Why an IETF protocol?/Status
     • IETF protocols are
       • Independent of the implementation programming language
       • Public
       • Widely reviewed
       • Well-respected
     • Status
       • Developed in the SPEECHSC Working Group (Real-time Applications area)
       • Published for Working Group Last Call (http://www.ietf.org/internet-drafts/draft-ietf-speechsc-mrcpv2-11.txt)

  7. Relationship to other standards
     • TCP: carrier for MRCP messages
     • SIP: used to set up calls
     • RTP: carries MRCP-controlled media
     • VoiceXML: higher-level language for ASR/TTS that is often built on top of an MRCP client
     • IMS: framework that allows mobile phones to use MRCP-controlled resources
     • SRGS, SSML: ASR grammars and TTS controls that MRCP clients can use to configure ASR/TTS resources
     • TLS: secure alternative to TCP for carrying MRCP

  8. Features of MRCP
     • Control of
       • Synthesizer resource
       • Recognizer resource
       • Recorder resource
       • Speaker Identification and Verification resource
     • Optional control channel sharing among resources

  9. Synthesizer
     • Two resource types
       • “basicsynth”: concatenated audio clips only
       • “speechsynth”: full SSML support
     • Capabilities
       • Start/stop/pause/resume speaking
       • Optional stop on barge-in
       • Live notification of <mark> encounters

  10. Recognizer
      • Two resource types
        • “speechrecog”: full speech and DTMF recognition with user-enrolled phrases
        • “dtmfrecog”: DTMF digit string recognition only
      • Capabilities
        • Start/stop recognition
        • Support for SRGS grammars
        • Interpretation of text string
        • Hotword mode capability (listen until match)
        • Voice- (user-) enrolled phrases
        • Recording of recognized audio
        • Barge-in support

  11. Recorder
      • One resource type
        • “recorder”
      • Capabilities
        • Start/stop recording
        • Barge-in support
        • Optional speech activity detection
        • Optional automatic end trimming

  12. SIV
      • One resource type
        • “speakverify”
      • Capabilities
        • Verification and identification using one or multiple utterances
        • Simultaneous verification and recognition or recording
        • Verification using live or buffered utterances
        • Voiceprint creation, querying, and deletion

  13. NLSML
      • XML data format
      • Carries results from the MRCP server
      • Can store simultaneous recognition, enrollment, and verification results
      • W3C’s EMMA is a future replacement for this format

  14. Sample call flow with ASR/TTS
      • Setup
        • Client contacts server using SIP
        • Setup of synthesizer resource
        • Setup of recognizer resource
      • Play
        • Client issues SPEAK request
        • <mark> and SPEAK completion
      • Play & Recognize (with barge-in)
        • Client issues RECOGNIZE request
        • Client issues bargeable SPEAK request
        • Barge-in occurs
        • Server returns result
      • Teardown
        • Client closes session

  15. Setup: Client contacts server using SIP
      C->S:
        INVITE sip:mrcp@server.example.com SIP/2.0
        Max-Forwards:6
        To:MediaServer <sip:mrcp@server.example.com>
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314159 INVITE
        Contact:<sip:sarvi@example.com>
        Content-Type:application/sdp
        Content-Length:142

        v=0
        o=sarvi 2890844526 2890842807 IN IP4 126.16.64.4
        s=SDP Seminar
        i=A session for processing media
        c=IN IP4 224.2.17.12/127
      S->C:
        SIP/2.0 200 OK
        To:MediaServer <sip:mrcp@server.example.com>
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314159 INVITE
        Contact:<sip:sarvi@example.com>
        Content-Type:application/sdp
        Content-Length:131

        v=0
        o=sarvi 2890844526 2890842807 IN IP4 126.16.64.4
        s=SDP Seminar
        i=A session for processing media
        c=IN IP4 224.2.17.12/127
      C->S:
        ACK sip:mrcp@server.example.com SIP/2.0
        Max-Forwards:6
        To:MediaServer <sip:mrcp@server.example.com>;tag=a6c85cf
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314160 ACK
        Content-Length:0

  19. Setup: Setup of synthesizer resource
      C->S:
        INVITE sip:mrcp@server.example.com SIP/2.0
        Max-Forwards:6
        To:MediaServer <sip:mrcp@server.example.com>
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314161 INVITE
        Contact:<sip:sarvi@example.com>
        Content-Type:application/sdp
        Content-Length:142

        v=0
        o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
        s=SDP Seminar
        i=A session for processing media
        c=IN IP4 224.2.17.12/127
        m=application 9 TCP/MRCPv2
        a=setup:active
        a=connection:existing
        a=resource:speechsynth
        a=cmid:1
        m=audio 49170 RTP/AVP 0 96
        a=rtpmap:0 pcmu/8000
        a=recvonly
        a=mid:1
      S->C:
        SIP/2.0 200 OK
        To:MediaServer <sip:mrcp@server.example.com>
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314161 INVITE
        Contact:<sip:sarvi@example.com>
        Content-Type:application/sdp
        Content-Length:131

        v=0
        o=sarvi 2890844526 2890842808 IN IP4 126.16.64.4
        s=SDP Seminar
        i=A session for processing media
        c=IN IP4 224.2.17.12/127
        m=application 32416 TCP/MRCPv2
        a=setup:passive
        a=connection:existing
        a=channel:32AECB23433801@speechsynth
        a=cmid:1
        m=audio 48260 RTP/AVP 0
        a=rtpmap:0 pcmu/8000
        a=sendonly
        a=mid:1
      C->S:
        ACK sip:mrcp@server.example.com SIP/2.0
        Max-Forwards:6
        To:MediaServer <sip:mrcp@server.example.com>;tag=a6c85cf
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314162 ACK
        Content-Length:0
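
After the synthesizer setup exchange, the client needs two values from the server's SDP answer before it can send MRCP requests: the TCP port of the control channel and the channel identifier. The following is a minimal Python sketch of that extraction, assuming the answer is available as a plain string; the function name `mrcp_channels` and the returned dictionary layout are illustrative choices, not part of the presentation or of any library.

```python
def mrcp_channels(sdp):
    """Collect port, channel identifier and resource for each TCP/MRCPv2 m-line."""
    channels = []
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt ...>
            fields = line[2:].split()
            if len(fields) >= 3 and fields[0] == "application" and fields[2] == "TCP/MRCPv2":
                current = {"port": int(fields[1]), "channel": None, "resource": None}
                channels.append(current)
            else:
                current = None  # another media section, e.g. m=audio
        elif current is not None and line.startswith("a=channel:"):
            channel = line[len("a=channel:"):]
            current["channel"] = channel
            # The resource type follows the "@" in the channel identifier.
            current["resource"] = channel.rpartition("@")[2]
    return channels


# Media part of the 200 OK shown on the slide.
ANSWER = """\
m=application 32416 TCP/MRCPv2
a=setup:passive
a=connection:existing
a=channel:32AECB23433801@speechsynth
a=cmid:1
m=audio 48260 RTP/AVP 0
a=rtpmap:0 pcmu/8000
a=sendonly
a=mid:1
"""

if __name__ == "__main__":
    print(mrcp_channels(ANSWER))
```

The channel identifier returned here is the value that later appears in the Channel-Identifier header of every MRCP message on that channel.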

  23. Setup: Setup of recognizer resource
      C->S:
        INVITE sip:mrcp@server.example.com SIP/2.0
        Max-Forwards:6
        To:MediaServer <sip:mrcp@server.example.com>
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314163 INVITE
        Contact:<sip:sarvi@example.com>
        Content-Type:application/sdp
        Content-Length:142

        v=0
        o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
        s=SDP Seminar
        i=A session for processing media
        c=IN IP4 224.2.17.12/127
        m=application 9 TCP/MRCPv2
        a=setup:active
        a=connection:existing
        a=resource:speechsynth
        a=cmid:1
        m=audio 49170 RTP/AVP 0 96
        a=rtpmap:0 pcmu/8000
        a=recvonly
        a=mid:1
        m=application 9 TCP/MRCPv2
        a=setup:active
        a=connection:existing
        a=resource:speechrecog
        a=cmid:2
        m=audio 49180 RTP/AVP 0 96
        a=rtpmap:0 pcmu/8000
        a=rtpmap:96 telephone-event/8000
        a=fmtp:96 0-15
        a=sendonly
        a=mid:2
      S->C:
        SIP/2.0 200 OK
        To:MediaServer <sip:mrcp@server.example.com>
        From:sarvi <sip:sarvi@example.com>;tag=1928301774
        Call-ID:a84b4c76e66710
        CSeq:314163 INVITE
        Contact:<sip:sarvi@example.com>
        Content-Type:application/sdp
        Content-Length:131

        v=0
        o=sarvi 2890844526 2890842809 IN IP4 126.16.64.4
        s=SDP Seminar
        i=A session for processing media
        c=IN IP4 224.2.17.12/127
        m=application 32416 TCP/MRCPv2
        a=channel:32AECB23433801@speechsynth
        a=cmid:1
        m=audio 48260 RTP/AVP 0
        a=rtpmap:0 pcmu/8000
        a=sendonly
        a=mid:1
        m=application 32416 TCP/MRCPv2
        a=channel:32AECB23433801@speechrecog
        a=cmid:2
        m=audio 48260 RTP/AVP 0
        a=rtpmap:0 pcmu/8000
        a=rtpmap:96 telephone-event/8000
        a=fmtp:96 0-15
        a=recvonly
        a=mid:2
      Note: final C->S ACK not shown
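
With two resources in one session, each control section (`m=application`) is tied to its audio stream through matching `a=cmid` and `a=mid` values. The sketch below, in Python, pairs them up for the client's offer shown on the slide. The function name `pair_control_with_media` and the result shape are assumptions made for illustration, and the parsing is deliberately simplistic (it keeps only the first value of each attribute per section).

```python
DIRECTIONS = ("sendonly", "recvonly", "sendrecv", "inactive")


def pair_control_with_media(sdp):
    """Map each MRCP resource to (audio port, direction) via cmid/mid."""
    sections = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            media, port = line[2:].split()[:2]
            sections.append({"media": media, "port": int(port), "attrs": {}})
        elif line.startswith("a=") and sections:
            name, _, value = line[2:].partition(":")
            sections[-1]["attrs"].setdefault(name, value)

    audio_by_mid = {
        s["attrs"]["mid"]: s
        for s in sections
        if s["media"] == "audio" and "mid" in s["attrs"]
    }
    pairs = {}
    for s in sections:
        attrs = s["attrs"]
        audio = audio_by_mid.get(attrs.get("cmid"))
        if s["media"] != "application" or audio is None:
            continue
        # Offers carry a=resource; answers carry a=channel:<id>@<resource>.
        resource = attrs.get("resource") or attrs.get("channel", "").rpartition("@")[2]
        # sendrecv is the SDP default when no direction attribute is present.
        direction = next((d for d in DIRECTIONS if d in audio["attrs"]), "sendrecv")
        pairs[resource] = (audio["port"], direction)
    return pairs


# Media part of the client's INVITE shown on the slide.
OFFER = """\
m=application 9 TCP/MRCPv2
a=setup:active
a=connection:existing
a=resource:speechsynth
a=cmid:1
m=audio 49170 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=recvonly
a=mid:1
m=application 9 TCP/MRCPv2
a=setup:active
a=connection:existing
a=resource:speechrecog
a=cmid:2
m=audio 49180 RTP/AVP 0 96
a=rtpmap:0 pcmu/8000
a=rtpmap:96 telephone-event/8000
a=fmtp:96 0-15
a=sendonly
a=mid:2
"""

if __name__ == "__main__":
    print(pair_control_with_media(OFFER))
```

Read from the client's side, the result shows the synthesizer stream as receive-only (the client hears synthesized audio) and the recognizer stream as send-only (the client sends caller audio).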

  27. Play: Client issues SPEAK request
      C->S:
        MRCP/2.0 386 SPEAK 543257
        Channel-Identifier:32AECB23433801@speechsynth
        Kill-On-Barge-In:false
        Voice-gender:neutral
        Voice-age:25
        Prosody-volume:medium
        Content-Type:application/ssml+xml
        Content-Length:104

        <?xml version="1.0"?>
        <speak version="1.0"
               xmlns="http://www.w3.org/2001/10/synthesis"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                                   http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
               xml:lang="en-US">
          <p>
            <s>You have 4 new messages.</s>
            <s>The first is from Stephanie Williams
              <mark name="Stephanie"/>
              and arrived at <break/>
              <say-as interpret-as="vxml:time">0345p</say-as>.</s>
            <s>The subject is <prosody rate="-20%">ski trip</prosody></s>
          </p>
        </speak>
      S->C:
        MRCP/2.0 49 543257 200 IN-PROGRESS
        Channel-Identifier:32AECB23433801@speechsynth
        Speech-Marker:timestamp=857205015059
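
The second field of the MRCPv2 start line is the length of the whole message in bytes, start line included, so the field counts its own digits. A minimal Python sketch of one way to compute it follows. The helper name `build_mrcp_request` and its argument layout are assumptions for illustration; the loop simply retries until the declared length matches the actual length.

```python
def build_mrcp_request(method, request_id, headers, body=b""):
    """Serialize an MRCPv2 request whose message-length field is self-consistent."""
    header_lines = [f"{name}:{value}" for name, value in headers]
    if body:
        header_lines.append(f"Content-Length:{len(body)}")
    # Headers end with an empty line, then the optional body follows.
    rest = ("\r\n".join(header_lines) + "\r\n\r\n").encode("utf-8") + body

    # The length field is part of the message it measures, so iterate
    # until writing the length no longer changes the total size.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} {method} {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(rest)
        if total == length:
            return start_line + rest
        length = total


if __name__ == "__main__":
    ssml = b'<?xml version="1.0"?><speak version="1.0" xml:lang="en-US">Hello</speak>'
    message = build_mrcp_request(
        "SPEAK",
        543257,
        [
            ("Channel-Identifier", "32AECB23433801@speechsynth"),
            ("Kill-On-Barge-In", "false"),
            ("Content-Type", "application/ssml+xml"),
        ],
        ssml,
    )
    print(message.decode("utf-8"))
```

The loop terminates because the digit count of the length can only grow as the total grows, and it stabilizes after at most a few passes.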

  33. <mark> and SPEAK completion
      S->C: MRCP/2.0 46 SPEECH-MARKER 543257 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechsynth
      Speech-Marker:timestamp=857206027059;Stephanie
      S->C: MRCP/2.0 48 SPEAK-COMPLETE 543257 COMPLETE
      Channel-Identifier:32AECB23433801@speechsynth
      Speech-Marker:timestamp=857207685213;Stephanie
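The Speech-Marker values in these events carry a timestamp and, once a <mark> has been reached, the name of the last mark. A minimal sketch of splitting that header value follows; `SpeechMarker` and `parse_speech_marker` are hypothetical names, and the parser assumes only the `timestamp=<digits>[;<mark>]` shape visible on the slides.

```python
from typing import NamedTuple, Optional


class SpeechMarker(NamedTuple):
    timestamp: int        # timestamp value exactly as sent by the server
    mark: Optional[str]   # name of the last <mark> reached, if any


def parse_speech_marker(value: str) -> SpeechMarker:
    """Parse a Speech-Marker header value such as 'timestamp=123;Name'."""
    fields = value.strip().split(";", 1)
    key, _, raw = fields[0].partition("=")
    raw = raw.strip()
    if key.strip().lower() != "timestamp" or not raw.isdigit():
        raise ValueError(f"unexpected Speech-Marker value: {value!r}")
    mark = None
    if len(fields) > 1 and fields[1].strip():
        mark = fields[1].strip()
    return SpeechMarker(int(raw), mark)


# Usage with the values from the SPEECH-MARKER event and the earlier response.
at_mark = parse_speech_marker("timestamp=857206027059;Stephanie")
at_start = parse_speech_marker("timestamp=857205015059")
print(at_mark, at_start)
```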

  36. Client issues RECOGNIZE request
      C->S: MRCP/2.0 343 RECOGNIZE 543258
      Channel-Identifier:32AECB23433801@speechrecog
      Content-Type:application/srgs+xml
      Content-Length:104
      <?xml version="1.0"?>
      <!-- the default grammar language is US English -->
      <grammar xmlns="http://www.w3.org/2001/06/grammar"
               xml:lang="en-US" version="1.0" root="request">
        <!-- single language attachment to a rule expansion -->
        <rule id="request">
          Can I speak to
          <one-of xml:lang="fr-CA">
            <item>Michel Tremblay</item>
            <item>Andre Roy</item>
          </one-of>
        </rule>
      </grammar>
      S->C: MRCP/2.0 49 543258 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog
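The call flow uses three kinds of start line: requests (method then request-id), responses (request-id, status code, request state), and events (event name, request-id, request state). A minimal sketch of telling them apart is shown here; `StartLine` and `parse_start_line` are hypothetical names, and the classification relies only on the token patterns visible in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartLine:
    kind: str                     # "request", "response", or "event"
    length: int                   # message-length field
    request_id: int
    name: Optional[str] = None    # method or event name
    status: Optional[int] = None  # status code (responses only)
    state: Optional[str] = None   # request state (responses and events)


def parse_start_line(line: str) -> StartLine:
    """Classify an MRCPv2 start line by its token layout."""
    tokens = line.split()
    if len(tokens) not in (4, 5) or tokens[0] != "MRCP/2.0":
        raise ValueError(f"not an MRCPv2 start line: {line!r}")
    length = int(tokens[1])
    if tokens[2].isdigit():
        # Response: version, length, request-id, status-code, request-state.
        if len(tokens) != 5:
            raise ValueError(f"malformed response line: {line!r}")
        return StartLine("response", length, int(tokens[2]),
                         status=int(tokens[3]), state=tokens[4])
    if len(tokens) == 4:
        # Request: version, length, method-name, request-id.
        return StartLine("request", length, int(tokens[3]), name=tokens[2])
    # Event: version, length, event-name, request-id, request-state.
    return StartLine("event", length, int(tokens[3]),
                     name=tokens[2], state=tokens[4])


# Usage with start lines taken from the slides.
req = parse_start_line("MRCP/2.0 343 RECOGNIZE 543258")
resp = parse_start_line("MRCP/2.0 49 543258 200 IN-PROGRESS")
evt = parse_start_line("MRCP/2.0 49 START-OF-INPUT 543258 IN-PROGRESS")
print(req.kind, resp.kind, evt.kind)
```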

  39. Client issues bargeable SPEAK request
      C->S: MRCP/2.0 289 SPEAK 543259
      Channel-Identifier:32AECB23433801@speechsynth
      Kill-On-Barge-In:true
      Content-Type:application/ssml+xml
      Content-Length:104
      <?xml version="1.0"?>
      <speak version="1.0"
             xmlns="http://www.w3.org/2001/10/synthesis"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.w3.org/2001/10/synthesis
                                 http://www.w3.org/TR/speech-synthesis/synthesis.xsd"
             xml:lang="en-US">
        <p>
          <s>Welcome to ABC corporation.</s>
          <s>Who would you like to talk to?</s>
        </p>
      </speak>
      S->C: MRCP/2.0 52 543259 200 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechsynth
      Speech-Marker:timestamp=857207696314

  41. Barge-in occurs
      S->C: MRCP/2.0 49 START-OF-INPUT 543258 IN-PROGRESS
      Channel-Identifier:32AECB23433801@speechrecog
      Proxy-Sync-Id:987654321
      C->S: MRCP/2.0 69 BARGE-IN-OCCURRED 543259
      Channel-Identifier:32AECB23433801@speechsynth
      Proxy-Sync-Id:987654321
      S->C: MRCP/2.0 72 543259 200 COMPLETE
      Channel-Identifier:32AECB23433801@speechsynth
      Active-Request-Id-List:543258
      Speech-Marker:timestamp=857206096314
      S->C: MRCP/2.0 73 SPEAK-COMPLETE 543259 COMPLETE
      Channel-Identifier:32AECB23433801@speechsynth
      Completion-Cause:001 barge-in
      Speech-Marker:timestamp=857207685213
      Notes:
      • The recognizer (MRCP server) sends START-OF-INPUT to the client when input is detected.
      • The MRCP client notifies the synthesizer (MRCP server) that barge-in has occurred.
      • Because Kill-On-Barge-In was set to true, the synthesizer stops playing.
      • Combined ASR/TTS resources can sometimes terminate playback sooner, automatically.
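The client-side step in this exchange, relaying the recognizer's START-OF-INPUT to the synthesizer as BARGE-IN-OCCURRED with the same Proxy-Sync-Id, can be sketched as below. This is an illustration only: `barge_in_request` is a hypothetical function, the returned tuple is an invented in-memory shape rather than wire format, and the request-id is passed in by the caller because the slides do not say how the client chooses it.

```python
from typing import Dict, List, Optional, Tuple

Request = Tuple[str, int, List[Tuple[str, str]]]  # (method, request-id, headers)


def barge_in_request(
    event_name: str,
    event_headers: Dict[str, str],
    synth_channel: str,
    request_id: int,
) -> Optional[Request]:
    """Return the BARGE-IN-OCCURRED request to send, or None if not needed."""
    if event_name != "START-OF-INPUT":
        return None  # only start of caller input triggers barge-in handling
    headers = [("Channel-Identifier", synth_channel)]
    # Echo the recognizer's Proxy-Sync-Id so the server can correlate the
    # notification with the input event, as in the exchange on the slide.
    sync_id = event_headers.get("Proxy-Sync-Id")
    if sync_id is not None:
        headers.append(("Proxy-Sync-Id", sync_id))
    return ("BARGE-IN-OCCURRED", request_id, headers)


# Usage with the identifiers from the slide.
notification = barge_in_request(
    "START-OF-INPUT",
    {
        "Channel-Identifier": "32AECB23433801@speechrecog",
        "Proxy-Sync-Id": "987654321",
    },
    synth_channel="32AECB23433801@speechsynth",
    request_id=543259,
)
print(notification)
```

Whether playback actually stops is decided by the synthesizer, based on the Kill-On-Barge-In value given in the SPEAK request; the client only forwards the notification.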

  45. Server returns result
      S->C: MRCP/2.0 412 RECOGNITION-COMPLETE 543258 COMPLETE
      Channel-Identifier:32AECB23433801@speechrecog
      Completion-Cause:000 success
      Waveform-URI:<http://web.media.com/session123/audio.wav>;size=423523;duration=25432
      Content-Type:application/nlsml+xml
      Content-Length:104
      <?xml version="1.0"?>
      <result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
              xmlns:ex="http://www.example.com/example"
              grammar="session:request1@form-level.store">
        <interpretation>
          <instance name="Person">
            <ex:Person>
              <ex:Name> Andre Roy </ex:Name>
            </ex:Person>
          </instance>
          <input> Can I speak to Andre Roy </input>
        </interpretation>
      </result>
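A client typically pulls the recognized text and the semantic instance out of the NLSML body. A minimal sketch using Python's standard `xml.etree.ElementTree` follows. The `parse_nlsml` name is hypothetical, and the `ex:Person/ex:Name` path is specific to this slide's example; a real grammar defines its own instance structure.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes for XPath lookups; both URIs are taken from the slide.
NS = {
    "n": "http://www.ietf.org/xml/ns/mrcpv2",
    "ex": "http://www.example.com/example",
}

NLSML = """<?xml version="1.0"?>
<result xmlns="http://www.ietf.org/xml/ns/mrcpv2"
        xmlns:ex="http://www.example.com/example"
        grammar="session:request1@form-level.store">
  <interpretation>
    <instance name="Person">
      <ex:Person>
        <ex:Name> Andre Roy </ex:Name>
      </ex:Person>
    </instance>
    <input> Can I speak to Andre Roy </input>
  </interpretation>
</result>"""


def parse_nlsml(document: str) -> dict:
    """Extract the grammar URI plus each interpretation's input and name."""
    root = ET.fromstring(document)
    interpretations = []
    for interp in root.findall("n:interpretation", NS):
        spoken = interp.findtext("n:input", default="", namespaces=NS)
        name = interp.findtext(
            "n:instance/ex:Person/ex:Name", default="", namespaces=NS
        )
        interpretations.append({"input": spoken.strip(), "name": name.strip()})
    return {"grammar": root.get("grammar"), "interpretations": interpretations}


result = parse_nlsml(NLSML)
print(result)
```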

  46. Client closes session
      C->S: BYE sip:mrcp@server.example.com SIP/2.0
      Max-Forwards:6
      From:Sarvi <sip:sarvi@example.com>;tag=a6c85cf
      To:MediaServer <sip:mrcp@server.example.com>;tag=1928301774
      Call-ID:a84b4c76e66710
      CSeq:231 BYE
      Content-Length:0

  47. Dan Burnett • Daniel.Burnett@nuance.com
