SEMANTIC HIFI: Browsing, listening, interacting, sharing on future HIFI systems

Music Technology Group

Universitat Pompeu Fabra (UPF)

Barcelona



Interaction & Performance

“…it becomes possible for more people to make more satisfying music, more enjoyably and easily, regardless of physical coordination or theoretical study, of keyboard skills or fluency with notation. This doesn’t imply a dilution of musical quality. On the contrary, it frees us to go further and raises the base-level at which music making begins.” (Laurie Spiegel)

“Let’s develop virtual instruments that do not just play back music for people, but become increasingly adept at making new and engaging music with people, at all levels of technical proficiency.” (Robert Rowe)


Interaction

Has to be:

  • natural & intuitive

  • easy

    And yet…

  • allow expression

  • enjoyable

  • rewarding


Input devices

  • Feel natural

  • Maximize bandwidth

  • Profit from users’ knowledge

    We propose the use of

  • Mouth: microphone + small video camera

  • Hands & arm: remote control used as a baton
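
As a rough illustration of how these two devices could feed the rest of the system, the Python sketch below models both as sources of time-stamped input events. All class and field names are hypothetical; they are not part of the project specification.

```python
from dataclasses import dataclass, field
import time

@dataclass
class InputEvent:
    """A single, time-stamped observation from any input device (illustrative)."""
    device: str            # e.g. "mouth" or "baton"
    kind: str              # e.g. "audio_frame", "video_frame", "motion"
    payload: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)

class MouthDevice:
    """Wraps the microphone + small video camera proposed for mouth input."""
    def poll(self) -> list[InputEvent]:
        # In a real system these would be frames grabbed from the mic and camera.
        return [
            InputEvent("mouth", "audio_frame", {"samples": []}),
            InputEvent("mouth", "video_frame", {"lip_opening": 0.0}),
        ]

class BatonDevice:
    """Wraps the remote control used as a baton (hands & arm input)."""
    def poll(self) -> list[InputEvent]:
        return [InputEvent("baton", "motion", {"accel": (0.0, 0.0, 9.8)})]

if __name__ == "__main__":
    for device in (MouthDevice(), BatonDevice()):
        for event in device.poll():
            print(event.device, event.kind, event.payload)
```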



Mouth control information will be reinforced by the two simultaneous input modes (sound + image)
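
One simple way to realise this reinforcement is to fuse the audio-based and the video-based estimate of the same control parameter, weighted by the confidence of each modality. The sketch below only illustrates the idea; the parameter names and weights are invented.

```python
def fuse_mouth_control(audio_value, audio_conf, video_value, video_conf):
    """Confidence-weighted fusion of one mouth-control parameter.

    Both modalities estimate the same parameter (e.g. mouth opening in [0, 1]);
    the estimate with the higher confidence dominates the fused result.
    """
    total = audio_conf + video_conf
    if total == 0.0:
        return 0.0  # neither modality is reliable right now
    return (audio_value * audio_conf + video_value * video_conf) / total

# Example: the camera sees a wide-open mouth but the microphone picks up silence.
print(fuse_mouth_control(audio_value=0.1, audio_conf=0.2,
                         video_value=0.8, video_conf=0.9))
```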


Mouth

  • Mouth interaction will not be limited to karaoke

  • The system will be able to detect at least 4 different mouth input modes:

    • Singing (karaoke)

    • Scat (instrumental solos)

    • Beat boxing (drums)

    • Silent mouth movements (filters & timbre changes)

  • Voice transformations include

    • Voice Excitation based Transformations (pitch change, hoarseness, whisper…)

    • Vocal Tract based Transformations (timbre…)
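
A minimal sketch of how such mode detection and dispatching might look, assuming hypothetical frame-level features (voicing, pitch stability, percussiveness, lip motion) coming from the audio and video analysis; the thresholds and mode-to-transformation table are illustrative, not the project's actual design:

```python
def detect_mouth_mode(is_voiced: bool, pitch_stability: float,
                      percussiveness: float, mouth_moving: bool) -> str:
    """Very rough rule-based classifier for the four mouth input modes.

    The features stand in for the real audio/video analysis; thresholds are invented.
    """
    if not is_voiced and percussiveness < 0.2 and mouth_moving:
        return "silent"        # silent mouth movements -> filters & timbre changes
    if percussiveness > 0.7:
        return "beatbox"       # beat boxing -> drums
    if is_voiced and pitch_stability > 0.8:
        return "singing"       # karaoke
    return "scat"              # scat -> instrumental solos

# Which family of transformations each mode would trigger (illustrative).
TRANSFORMATIONS = {
    "singing": "voice-excitation-based (pitch change, hoarseness, whisper...)",
    "scat": "map syllables and pitch to an instrumental solo voice",
    "beatbox": "map onsets to drum sounds",
    "silent": "vocal-tract-based (filters & timbre changes)",
}

mode = detect_mouth_mode(is_voiced=True, pitch_stability=0.9,
                         percussiveness=0.1, mouth_moving=True)
print(mode, "->", TRANSFORMATIONS[mode])
```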


Music Context

  • The results of each of these interaction modes will depend on the music being played

  • The use of metadata will provide increasingly rich information

  • Example: Scatting on different musical styles
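
For example, the same scat input could drive different instruments depending on the style reported by the metadata. The mapping table below is purely illustrative:

```python
# Hypothetical mapping from the style metadata of the current track to how a
# scatted solo would be rendered; the entries are invented for this sketch.
SCAT_TARGET_BY_STYLE = {
    "jazz":   {"instrument": "trumpet", "quantize": False},
    "techno": {"instrument": "synth lead", "quantize": True},
    "bossa":  {"instrument": "flute", "quantize": False},
}

def scat_target(style: str) -> dict:
    """Return how scat input should be rendered for the music being played."""
    return SCAT_TARGET_BY_STYLE.get(style, {"instrument": "generic synth",
                                            "quantize": False})

print(scat_target("techno"))
```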


Music Context

  • This would correspond to a simplified context

  • More information can be obtained:

    • From the type of voiced sound (the voice analysis should exploit all the timbre information, not mere pitch-to-MIDI)

    • From additional metadata
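
As a hint of what "all the timbre information" could mean in practice, the sketch below computes a few frame-level descriptors with numpy; the descriptor choice is an assumption, not the project's actual analysis front end:

```python
import numpy as np

def frame_descriptors(frame: np.ndarray, sample_rate: int = 44100) -> dict:
    """Compute a few timbre-related descriptors of one audio frame.

    Pitch alone (as in plain pitch-to-MIDI) discards this information; descriptors
    such as these help distinguish, e.g., a dark 'doo' from a bright 'dee' sung
    on the same note.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    energy = spectrum.sum() + 1e-12
    return {
        "spectral_centroid": float((freqs * spectrum).sum() / energy),
        "zero_crossing_rate": float(np.mean(np.abs(np.diff(np.sign(frame))) > 0)),
        "rms": float(np.sqrt(np.mean(frame ** 2))),
    }

# Example: a 1024-sample frame of a 220 Hz sine has a low centroid and low ZCR.
t = np.arange(1024) / 44100.0
print(frame_descriptors(np.sin(2 * np.pi * 220 * t)))
```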


Additional Metadata*

Time-stamped information:

Music

  • Composition parts (A, B, chorus…)

  • Harmonic & rhythmic details

  • Score

  • Program changes

  • ….

Audio Analysis

  • ….

    *Format and contents to be defined in WP1.2
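
A minimal sketch of what such time-stamped metadata could look like in code, assuming a simple layered event list; the layer names and values are illustrative, since the real format and contents remain to be defined in WP1.2:

```python
from dataclasses import dataclass

@dataclass
class MetadataEvent:
    """One time-stamped metadata entry attached to a track (illustrative)."""
    time: float      # seconds from the start of the track
    layer: str       # e.g. "structure", "harmony", "score", "program"
    value: str       # the payload, kept as text here for simplicity

track_metadata = [
    MetadataEvent(0.0,  "structure", "intro"),
    MetadataEvent(7.5,  "structure", "A"),
    MetadataEvent(7.5,  "harmony",   "Dm7"),
    MetadataEvent(15.0, "structure", "chorus"),
]

def active_value(events, layer, now):
    """Return the most recent value of a given layer at playback time `now`."""
    candidates = [e for e in events if e.layer == layer and e.time <= now]
    return max(candidates, key=lambda e: e.time).value if candidates else None

print(active_value(track_metadata, "structure", 10.0))  # -> "A"
```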


Editable Metadata

  • Advanced users will be able to edit and enrich the metadata offline (in non-real time), adding value through their contributions


Hand Movements

Will provide complementary information

  • e.g. crash cymbal on beat boxing

    Alternate functions

  • e.g. baton conducting

    • tempo changes

    • dynamic changes

    • groove & swing modification

    • ……

  • ……
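
A possible (purely illustrative) mapping from detected baton beats to playback parameters, assuming a beat interval and a gesture amplitude are already available from the tracking:

```python
def conducting_to_playback(beat_interval_s: float, gesture_amplitude: float,
                           recording_tempo_bpm: float = 120.0) -> dict:
    """Map detected baton beats to playback parameters (illustrative mapping).

    The interval between two beats sets the new tempo and the size of the
    gesture scales the dynamics; ranges and formulas are invented for this sketch.
    """
    raw_tempo = 60.0 / beat_interval_s if beat_interval_s > 0 else recording_tempo_bpm
    tempo = max(40.0, min(208.0, raw_tempo))           # clamp to a sane range
    amplitude = max(0.0, min(1.0, gesture_amplitude))
    return {
        "tempo_bpm": tempo,
        "gain": 0.5 + 0.5 * amplitude,                      # bigger gesture -> louder
        "time_stretch_ratio": recording_tempo_bpm / tempo,  # >1 means slower playback
    }

# Beats arriving every 0.5 s with a large gesture -> 120 BPM, near full gain.
print(conducting_to_playback(beat_interval_s=0.5, gesture_amplitude=0.9))
```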


Hand & Body tracking

  • A camera fixed to the system could be used

  • For better tracking resolution (spatial & temporal) an additional device seems necessary

  • We propose to use the same remote control, possibly fitted with accelerometers (and wireless communication with the system)
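
Assuming the accelerometer data arrives as a stream of 3-axis samples, beat gestures could be detected with something as simple as thresholding the acceleration magnitude. The sampling rate, threshold and refractory time below are illustrative values, not measured ones:

```python
import math

def detect_baton_beats(samples, rate_hz=100.0, threshold=12.0, refractory_s=0.25):
    """Detect 'beat' gestures in a stream of 3-axis accelerometer samples.

    A beat is counted when the acceleration magnitude crosses `threshold`
    (m/s^2) after a short refractory period.
    """
    beats, last_beat_t = [], -1e9
    for i, (ax, ay, az) in enumerate(samples):
        t = i / rate_hz
        magnitude = math.sqrt(ax * ax + ay * ay + az * az)
        if magnitude > threshold and (t - last_beat_t) > refractory_s:
            beats.append(t)
            last_beat_t = t
    return beats

# Two sharp flicks half a second apart in an otherwise still stream.
stream = [(0.0, 0.0, 9.8)] * 100
stream[25] = (15.0, 0.0, 9.8)
stream[75] = (0.0, 15.0, 9.8)
print(detect_baton_beats(stream))  # -> [0.25, 0.75]
```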


Score Following

IRCAM: Instrument Score follower

(for automatic performer accompaniment)

To be defined:

  • Options

    • MIDI (or synthetic) accompaniment

    • Time-stretched prerecorded audio

  • Data formats

    • data resulting from the audio analysis (UPF), sent to the score follower module (IRCAM) (voice2MIDI?)

    • position data from the score follower to the time-stretching module
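
The exchange between the three modules could boil down to two message types, sketched below; all field names are assumptions, since the actual data formats are still to be defined:

```python
from dataclasses import dataclass

@dataclass
class VoiceAnalysisFrame:
    """What the UPF voice analysis might send to the IRCAM score follower
    (a MIDI-like note estimate plus confidence; field names are assumptions)."""
    time: float          # seconds since the start of the performance
    midi_pitch: float    # estimated pitch as a fractional MIDI note number
    energy: float
    confidence: float

@dataclass
class ScorePosition:
    """What the score follower might send on to the time-stretching module."""
    score_time: float    # position in the score, in beats
    tempo_bpm: float     # locally estimated tempo of the performer

def time_stretch_ratio(position: ScorePosition, recording_tempo_bpm: float) -> float:
    """Ratio to apply to the prerecorded audio so it keeps up with the performer."""
    return recording_tempo_bpm / position.tempo_bpm

pos = ScorePosition(score_time=16.0, tempo_bpm=96.0)
print(time_stretch_ratio(pos, recording_tempo_bpm=120.0))  # -> 1.25
```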


Performing on a simple keyboard

In this part, Sony CSL will implement style and performance rules in a simple keyboard, able to follow and continue the user's playing according to simple style constraints.
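
This is in the spirit of a continuation-style interaction. The toy sketch below continues a phrase with a first-order Markov model filtered by a simple style constraint (an allowed pitch set); it is only an illustration, not Sony CSL's actual system:

```python
import random
from collections import defaultdict

def learn_transitions(notes):
    """Build a first-order Markov model of the notes the user just played."""
    model = defaultdict(list)
    for a, b in zip(notes, notes[1:]):
        model[a].append(b)
    return model

def continue_melody(notes, length=8, allowed=None, seed=0):
    """Continue the user's phrase; `allowed` is an optional style constraint
    (e.g. the pitches of the current scale) filtering the candidates."""
    random.seed(seed)
    model = learn_transitions(notes)
    current, output = notes[-1], []
    for _ in range(length):
        candidates = model.get(current, notes)
        if allowed:
            candidates = [n for n in candidates if n in allowed] or candidates
        current = random.choice(candidates)
        output.append(current)
    return output

# The user plays a short C-major phrase; the continuation stays in C major.
phrase = [60, 62, 64, 65, 67, 65, 64, 62]
c_major = {60, 62, 64, 65, 67, 69, 71, 72}
print(continue_melody(phrase, allowed=c_major))
```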


Deliverables


MTG Participants

  • Xavier Serra, local manager

  • Sergi Jordà, technical manager

  • Alex Loscos, voice processing

  • Martin Kaltenbrunner, interfaces

  • 1 additional programmer

