An Array of Options: Driving Gameplay with Kinect Audio Input

An Array of Options: Driving Gameplay with Kinect Audio Input. Scott Selfon, Development Lead, Advanced Technology Group. Voice is intuitive: command and control, menu navigation. Voice is social: communicating with others (real or imaginary).

Presentation Transcript


  1. An Array of Options: Driving Gameplay with Kinect Audio Input • Scott Selfon, Development Lead, Advanced Technology Group

  2. Why a Microphone? • Voice is intuitive • Command and control • Menu navigation • Voice is social • Communicating with others (real or imaginary) • Voice can modify, be combined with, or replace gestures/controller interactions

  3. Kinect Audio Input Features • Microphone array is an open mic • Party, Private, and In-Game Voice Chat • Titles can use incoming audio data like any other audio stream • Speech recognition • Allows title to respond to spoken phrases • Both uses are supported throughout Kinect play space

  4. Going Inside of Kinect • Four-microphone array with hardware-based audio processing • Multichannel echo cancellation (MEC) • Beam steering (auto and manual) • Other digital signal processing (noise suppression and reduction)

  5. Multichannel Echo Cancellation (MEC) • Uses known audio output by the Xbox 360 to cancel it in the microphone input stream • Reduces game sound feeding back into voice capture • Multichannel = all five directional speakers addressed • Calibrated by user during initial Kinect setup or via Kinect Tuner • Title can check if calibration has not been run via NuiAudioIsCalibrationValid() • Returns 16-kHz PCM “clean” audio for title to use
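The idea behind echo cancellation (subtracting a known output signal from what the microphone hears) can be illustrated with a small adaptive filter. The sketch below is a generic single-channel NLMS canceller, not the Kinect MEC implementation; the function name, tap count, and step size are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative single-channel NLMS echo canceller (NOT the Kinect MEC).
// reference: the known signal sent to the speaker.
// mic:       the captured signal (voice plus an echo of the reference).
// Returns the residual: mic minus the filter's running estimate of the echo.
std::vector<float> CancelEcho(const std::vector<float>& reference,
                              const std::vector<float>& mic,
                              std::size_t taps = 32,    // assumed filter length
                              float stepSize = 0.5f)    // assumed adaptation rate
{
    assert(reference.size() == mic.size());
    assert(taps > 0);

    std::vector<float> weights(taps, 0.0f);
    std::vector<float> residual(mic.size(), 0.0f);
    const float epsilon = 1e-6f;  // avoids division by zero during silence

    for (std::size_t n = 0; n < mic.size(); ++n) {
        // Estimate the echo from the most recent reference samples.
        float estimate = 0.0f;
        float energy = epsilon;
        for (std::size_t k = 0; k < taps && k <= n; ++k) {
            const float x = reference[n - k];
            estimate += weights[k] * x;
            energy += x * x;
        }

        // What remains after removing the estimated echo.
        const float error = mic[n] - estimate;
        residual[n] = error;

        // Normalised LMS update: nudge the weights to shrink future error.
        const float scale = stepSize * error / energy;
        for (std::size_t k = 0; k < taps && k <= n; ++k) {
            weights[k] += scale * reference[n - k];
        }
    }
    return residual;
}
```

A multichannel canceller would run one such filter per speaker channel and subtract all of the estimates from the same microphone signal.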

  6. Kinect Audio Calibration demo

  7. Kinect Audio Calibration

  8. Demo: Multichannel Echo Cancellation • Input Stream (what the microphone array hears) • MEC • Post-MEC (what Xbox title libraries present)

  9. Sound Position Tracking (Beam Steering) • NuiAudio library exposes Kinect mic array beam former • Separate beams for chat and speech pipelines • By default, automatically steers to loudest talker in room • Register for PNUIAUDIO_CALLBACK to get angle, confidence • Title can steer beam: NuiAudioSetMicArrayBeamAngle() • Only available for speech pipeline • Can be set to a fixed angle, follow a skeleton, etc.
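For the "follow a skeleton" case, a title needs to turn a tracked position into a horizontal angle. The sketch below shows only that geometry, assuming a sensor-centred coordinate system with x to the side and z pointing away from the sensor. The `Position` struct and both function names are hypothetical, and the units and sign convention that NuiAudioSetMicArrayBeamAngle() expects are not stated in the slides, so the result would need converting to whatever the XDK documents.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical position in metres, sensor at the origin, z pointing into the room.
struct Position {
    float x;
    float y;
    float z;
};

// Horizontal angle (radians) from the sensor's forward axis to the position.
// Zero is straight ahead; positive is towards +x. The sign convention is assumed.
float BeamAngleTowards(const Position& p)
{
    assert(p.z > 0.0f);  // the target must be in front of the sensor
    return std::atan2(p.x, p.z);
}

// Keep a requested angle inside an assumed symmetric steering range.
float ClampBeamAngle(float angle, float maxAbsAngle)
{
    assert(maxAbsAngle >= 0.0f);
    return std::max(-maxAbsAngle, std::min(angle, maxAbsAngle));
}
```

A title could call `BeamAngleTowards` each frame with the tracked head or torso joint and only re-steer when the angle changes by more than a small tolerance.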

  10. Sound Position Tracking demo

  11. Using Sound Position Tracking

  12. Two Kinect Audio Input Pipelines • Voice chat pipeline (hardware = on Kinect device) • Chat-focused automatic gain control (AGC) applied • Characteristics may change with system updates • Speech pipeline (software) • Compiled with title, characteristics remain constant with title • Recommended for data-sensitive audio analysis (emotional analysis, speech recognition, voiceprint, etc.) • Software MEC requires 2 MB and 3.3 msec of one hardware thread (already allocated if NuiSpeech is used)

  13. Kinect Audio Routing

  14. Two Pipelines, Three Ways to Access

  15. Resource Costs (as of June 2011 XDK)

  16. Kinect Audio Data and XHV2 Voice Chat • Shares same code as other headsets • IXHV2Engine::GetLocalChatData • Can detect Kinect via IXHV2Engine::IsSharedMicPresent • Can leverage for other uses beyond chat, but… • Automatic gain control • User can mute the stream • Kinect-related system updates can change core performance (pipeline is not compiled per title)

  17. Kinect Audio Data and XHV2 Voice Chat

  18. Kinect Audio Data and Local Use via NuiAudio • Many uses for microphone data beyond speech and chat • Coarse input • Analysis (emotion, amplitude, pitch detection) • Recording • Middleware solutions • Can capture from voice chat (NUIAUDIO_CHAT_PIPELINE) or speech (NUIAUDIO_SPEECH_PIPELINE) pipelines
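As an example of the amplitude analysis and coarse input mentioned above, the sketch below measures the RMS level of one block of captured PCM. It assumes signed 16-bit samples (the slides state 16-kHz PCM but not the sample width), and `RmsLevel` and `IsLoud` are hypothetical helper names, not NuiAudio functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Root-mean-square level of a block of signed 16-bit PCM samples,
// normalised so that full scale is 1.0. The 16-bit format is an assumption.
double RmsLevel(const std::int16_t* samples, std::size_t count)
{
    assert(samples != nullptr || count == 0);
    if (count == 0) {
        return 0.0;
    }
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double s = static_cast<double>(samples[i]) / 32768.0;
        sumSquares += s * s;
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}

// Coarse input: true when the block is louder than a title-chosen threshold.
bool IsLoud(const std::int16_t* samples, std::size_t count, double threshold)
{
    return RmsLevel(samples, count) > threshold;
}
```

At 16 kHz, a block of 160 samples covers 10 ms, which is a reasonable granularity for a shout-to-trigger style of input.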

  19. Kinect Audio Data and Local Use via NuiAudio

  20. Kinect Audio Data and NuiSpeech • Create grammar XML files and compile (xbpg.exe) • Phrases to recognize organized into rules • Best practice guidelines for phrase selection, user feedback • Use NuiSpeech API to load grammar(s) and receive recognition events • Include acoustic model database(s) with delivered title • Data collection, testing for robustness of experience • Understand and appropriately act on returned confidence • Tuning and testing pipeline tools within XDK • Speech Lab as an ad hoc testing/iteration tool
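"Appropriately act on returned confidence" usually means more than a single accept/reject cutoff. The sketch below shows one possible three-way policy, assuming the confidence has been normalised to the range 0 to 1. The enum, struct, and function are hypothetical and are not part of the NuiSpeech API; the thresholds would come from a title's own data collection and tuning.

```cpp
#include <cassert>

// What the title does with a recognised phrase.
enum class RecognitionAction { Ignore, AskToConfirm, Execute };

// Hypothetical per-title thresholds on a 0..1 confidence scale.
struct ConfidenceThresholds {
    float confirm;  // at or above this: prompt the player to confirm
    float execute;  // at or above this: act immediately
};

RecognitionAction DecideAction(float confidence, const ConfidenceThresholds& t)
{
    assert(t.confirm <= t.execute);
    if (confidence >= t.execute) {
        return RecognitionAction::Execute;
    }
    if (confidence >= t.confirm) {
        return RecognitionAction::AskToConfirm;
    }
    return RecognitionAction::Ignore;
}
```

Destructive commands (quitting, deleting a save) might use a higher `execute` threshold than harmless ones such as menu navigation.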

  21. Kinect Audio Data and NuiSpeech

  22. XDK Tools Supporting Audio Input • Run from Xbox 360 Development Kit • Kinect Tuner/OOBE: MEC calibration • Speech Lab (speechlab.xex) • Run from Windows PC • Grammar Compiler (xbpg.exe) • Speech Lab Analysis Tool (xbspeechlab.exe) • Speech SDK tools (separate installation within XDK) • Samples • Simple Speech Recognition (multiple grammars, localized to all supported languages/locales) • Speech Retain Audio • Sound Location Tracking • Lobby Chat (and other XHV2 samples)

  23. Kinect Audio Resources • White papers included with XDK • Talk to Me (Kinect audio input overview) • Silver Tongue (one-page reference guide to speech) • Grammar School (NuiSpeech grammar XML reference) • Speech Design for Game Designers: An Overview • …and more • Other Gamefest 2011 sessions • “Xbox, Listen”: Using Speech Recognition with Kinect • The Modern Dr. Dolittle: Talking to the Kinectimals with NuiSpeech • Voice and "Other Sounds" Interaction: Beyond Simple Speech Recognition

  24. Questions?
