
Data Collection for the CHIL CLEAR 2007 Evaluation Campaign



  1. Data Collection for the CHIL CLEAR 2007 Evaluation Campaign • N. Moreau (1), D. Mostefa (1), R. Stiefelhagen (2), S. Burger (3), K. Choukri (1) • (1) ELDA, (2) UKA-ISL, (3) CMU • E-mails: {moreau;mostefa;choukri}@elda.org, stiefel@ira.uka.de, sburger@cs.cmu.edu • Evaluations and Language resources Distribution Agency (ELDA), www.elda.org

  2. Plan • CHIL project • Evaluation campaigns • Data recordings • Annotations • Evaluation package • Conclusion

  3. CHIL Project • CHIL: Computers in the Human Interaction Loop • Integrated project funded by the European Commission (FP6) • January 2004 – August 2007 • 15 partners, 9 countries (ELDA responsible for data collection and evaluations) • Multimodal and perceptual user interface technologies • Context: • Real-life meetings (small meeting rooms) • Activities and interactions of attendees

  4. CHIL evaluation campaigns • June 2004: Dry run • January 2005: Internal evaluation campaign • February 2006: CLEAR 2006 campaign • February 2007: CLEAR 2007 campaign • CLEAR = Classification of Events, Activities and Relationships • Open to external participants • Supported by CHIL and NIST (VACE Program) • Co-organized with the NIST RT (Rich Transcription) Evaluation

  5. CLEAR 2007 evaluation campaign • 9 technologies evaluated • Vision technologies • Face Detection and Tracking • Visual Person Tracking • Visual Person Identification • Head Pose Estimation • Acoustic technologies • Acoustic Person Tracking • Acoustic Speaker Identification • Acoustic Event Detection • Multimodal technologies • Multimodal Person Tracking • Multimodal Speaker Identification

  6. CHIL Scenarios • Non-interactive lectures • Interactive seminars

  7. CHIL Data Sets CLEAR 2007 Data Collection: • 25 highly interactive seminars • Attendees: between 3 and 7 • Events: several presenters, discussions, coffee breaks, people entering / leaving the room, ...

  8. Recording setup • 5 recording rooms • Sensors: • Audio • 64-channel microphone array • 4-channel T-shaped microphones • Table-top microphones • Close-talking microphones • Video • 4 fixed corner cameras • 1 ceiling wide-angle camera • Pan-tilt-zoom (PTZ) cameras

  9. Camera Views

  10. Quality Standards • Recording of 25 seminars in 2007 (5 per CHIL room) • Audio-visual clap at beginning and end • Cameras (JPEG files at 15, 25 or 30 fps) • Max. desynchronisation = 200 ms • Microphone array • Max. desynchronisation = 200 ms • Other microphones (T-shape, table-top) • Max. desynchronisation = 50 ms • If the desynchronisation exceeds the maximum, the recording must be remade
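The thresholds above translate directly into a simple check on the audio-visual clap timestamps. The following Python sketch is purely illustrative: the stream names, timestamp values and checking function are assumptions, not part of the CHIL tooling.

```python
# Sketch of a synchronisation check against the CLEAR 2007 thresholds.
# Assumes the clap timestamps (in seconds) have already been located in
# each stream; stream types and values here are hypothetical.

SYNC_LIMITS_MS = {
    "camera": 200,      # fixed/ceiling cameras: max. 200 ms offset
    "mic_array": 200,   # 64-channel microphone array: max. 200 ms offset
    "mic_other": 50,    # T-shaped and table-top microphones: max. 50 ms offset
}

def check_sync(reference_clap_s, stream_clap_s, stream_type):
    """Return True if the stream's clap is within the allowed offset."""
    offset_ms = abs(stream_clap_s - reference_clap_s) * 1000.0
    return offset_ms <= SYNC_LIMITS_MS[stream_type]

# Example: a corner camera 120 ms off the reference clap still passes,
# while a table-top microphone 80 ms off would require re-recording.
print(check_sync(10.000, 10.120, "camera"))     # True
print(check_sync(10.000, 10.080, "mic_other"))  # False -> remake recording
```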

  11. Annotations CLEAR 2007 Annotations: • Audio: transcriptions, acoustic events • Video: facial features, head pose

  12. Audio Annotations • Orthographic transcriptions • 2 channels • Based on near-field recordings (close-talking microphones) • Compared with one far-field recording • Speaker turns • Non-verbal events (laughs, pauses, ...) • See: S. Burger “The CHIL RT07 Evaluation Data” • Acoustic events • Based on one microphone array channel • 15 categories of sounds: • Speech, door slam, step, chair moving, cup jingle, applause, laugh, key jingle, cough, keyboard, phone, music, knock, paper wrapping, unknown
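As an illustration of how an acoustic event annotation could be represented, here is a minimal Python sketch. The (start, end, label) record and the validation step are assumptions for illustration only; the slide does not describe the actual CHIL annotation file format.

```python
# Minimal sketch of an acoustic event annotation record, assuming a simple
# (start, end, label) representation restricted to the 15 CLEAR categories.

from dataclasses import dataclass

ACOUSTIC_EVENT_LABELS = {
    "speech", "door slam", "step", "chair moving", "cup jingle", "applause",
    "laugh", "key jingle", "cough", "keyboard", "phone", "music", "knock",
    "paper wrapping", "unknown",
}

@dataclass
class AcousticEvent:
    start_s: float   # event start time in seconds
    end_s: float     # event end time in seconds
    label: str       # one of the 15 categories above

    def __post_init__(self):
        if self.label not in ACOUSTIC_EVENT_LABELS:
            raise ValueError(f"unknown acoustic event label: {self.label}")

# Example: a door slam annotated on the microphone-array channel.
event = AcousticEvent(start_s=12.4, end_s=12.9, label="door slam")
```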

  13. Video Annotations • Facial Features (Face detection, Person tracking) • annotations every 1 second • all attendees • 4 camera views • facial labels • head centroid • left and right eyes • nose bridge • face bounding box • 2D head centroids used to derive the 3D “ground truth” positions • Person Identification Database • 28 persons to identify • audio-visual excerpts for each person ID • video labels every 200 ms
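The step from per-view 2D head centroids to a 3D “ground truth” position can be sketched as a standard linear triangulation across the calibrated camera views. The code below is a hedged illustration assuming known 3x4 projection matrices; the slide does not state which triangulation method CHIL actually used.

```python
# Linear (DLT) triangulation of a 3D head position from several 2D
# centroids observed in calibrated camera views. Projection matrices are
# assumed to be known 3x4 numpy arrays.

import numpy as np

def triangulate(points_2d, projections):
    """points_2d: list of (u, v) centroids, one per camera view.
    projections: list of 3x4 camera projection matrices (same order).
    Returns the 3D point that best explains all observations."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])   # x-constraint for this view
        rows.append(v * P[2] - P[1])   # y-constraint for this view
    A = np.vstack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                         # homogeneous solution (4-vector)
    return X[:3] / X[3]                # dehomogenise to (x, y, z)
```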

  14. Video Annotations

  15. Head Pose Data Set • Persons captured with different head orientations • standing in the middle of a CHIL room (ISL) • captured by the 4 corner cameras • Annotations: • Head bounding box • Head orientation: Pan, Tilt, Roll • 10 persons for development • 5 persons for evaluation
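For readers who want to work with the pan/tilt/roll annotations, the sketch below turns them into a rotation matrix. The rotation order and axis convention are assumptions made for illustration; the convention actually used in the CHIL annotations is not given on this slide.

```python
# Sketch: head orientation (pan, tilt, roll, in degrees) to a rotation
# matrix, assuming a pan-tilt-roll (Z-Y-X) composition order.

import numpy as np

def head_rotation(pan_deg, tilt_deg, roll_deg):
    p, t, r = np.radians([pan_deg, tilt_deg, roll_deg])
    Rz = np.array([[np.cos(p), -np.sin(p), 0],
                   [np.sin(p),  np.cos(p), 0],
                   [0,          0,         1]])   # pan about the vertical axis
    Ry = np.array([[ np.cos(t), 0, np.sin(t)],
                   [ 0,         1, 0        ],
                   [-np.sin(t), 0, np.cos(t)]])   # tilt (looking up/down)
    Rx = np.array([[1, 0,          0         ],
                   [0, np.cos(r), -np.sin(r)],
                   [0, np.sin(r),  np.cos(r)]])   # roll about the view axis
    return Rz @ Ry @ Rx

# Example: head turned 30 degrees to the side and tilted slightly down.
R = head_rotation(pan_deg=30.0, tilt_deg=-10.0, roll_deg=0.0)
```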

  16. Head Pose Data Set

  17. Evaluation package • The CLEAR 2007 evaluation package is publicly available through the ELRA catalog • Enables external players to evaluate their systems offline • For each of the evaluated technologies: • Data sets (development/evaluation) • Evaluation and scoring tools • Results of the official campaign

  18. Conclusion • 9 technologies evaluated during the 3rd CHIL evaluation campaign • The CLEAR 2007 evaluation package is available through the ELRA catalog: http://catalog.elra.info/ • For more on the evaluations see: CLEAR 2007: http://www.clear-evaluation.org/ RT 2007: http://www.nist.gov/speech/tests/rt/
