Presenting in virtual worlds: An architecture for a 3D presenter • Herwin van Welbergen • Supervised by: Dr. Job Zwiers, Prof. Dr. Ir. Anton Nijholt, Ir. Dennis Reidsma
What we want to do • During a presentation, several modalities are used to convey information • Speech • Gesture • Sheets • Our goal is to create a realistic virtual human presenter that presents in a 3D meeting room, using several of those modalities Human Media Interaction
Process of presenting
Focus and approach • Focus on the presenter's behavior, not on what causes this behavior • Presentations are generated from a script, based on annotated ‘real’ presentations • Build an architecture that enables the presenter to express itself on multiple modalities • Selected modalities are implemented on the architecture in a theoretically sound way • Speech • Pointing gesture • Sheet changes • Posture and posture shifts • Evaluation/extensions
Architecture concerns • Timing/synchronization between modalities • Consistency • Extensibility • Interruptible => real-time behaviour generation • Make use of the existing HMI meeting room
Architecture: meeting room
Architecture: sheets
Architecture
Script language: goal • Describe action on different modalities • Describe the synchronization between the actions
MultiModalSync • Every modality has its own channel • Synchronization is achieved by defining a leading modality • The leading modality can change over time
MultiModalSync: example
Synchronization of gesture and speech • Phases of a gesture • Preparation (optional) • Stroke • Retraction (optional) • Phonological synchrony rule (McNeill): • The stroke precedes or ends at the phonological peak syllable of the speech • This means that the stroke has to be synchronized with the peak of the accompanying speech
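The synchrony rule can be read as a simple backward scheduling problem. The sketch below is only an illustration, not the presenter's planner: it assumes the peak time and the phase durations are already known, and it picks the latest timing the rule allows (stroke ending exactly at the peak). The function name `schedule_gesture` and its parameters are hypothetical.

```python
def schedule_gesture(peak_time, stroke_duration, preparation_duration=0.0):
    """Return (preparation_start, stroke_start, stroke_end) in seconds.

    Assumption: the stroke is planned to end exactly at the phonological
    peak, which is the latest moment the synchrony rule permits.
    """
    if stroke_duration < 0 or preparation_duration < 0:
        raise ValueError("durations must be non-negative")
    stroke_end = peak_time
    stroke_start = stroke_end - stroke_duration
    # The optional preparation phase is placed directly before the stroke.
    preparation_start = stroke_start - preparation_duration
    return preparation_start, stroke_start, stroke_end


# Peak syllable at 2.0 s, stroke of 0.5 s, preparation of 0.5 s.
print(schedule_gesture(2.0, 0.5, 0.5))
```

A negative `preparation_start` would signal that the gesture cannot be prepared in time for that peak.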
Speech • Loquendo TTS • Synchronization on a word-level • Lip synchronization • Amplitude of the speech => opening of the jaw
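One way to picture the amplitude-to-jaw mapping is sketched below. The slide only states that speech amplitude drives the jaw opening; the use of RMS over a short sample window, the linear mapping and the 20 degree maximum are assumptions made for this example, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a window of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def jaw_opening(samples, max_amplitude=1.0, max_angle_deg=20.0):
    """Map the loudness of a sample window to a jaw angle in degrees.

    Assumption: a linear mapping, clamped at max_angle_deg.
    """
    level = min(rms(samples) / max_amplitude, 1.0)
    return level * max_angle_deg


# A window at half of full scale opens the jaw halfway.
print(jaw_opening([0.5, -0.5, 0.5, -0.5]))
```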
Posture • Defines start and end position for limbs moved in gesture units • Posture shifts are implemented by interpolating between the start pose and the end pose
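A posture shift by interpolation could look roughly like the sketch below. It assumes a pose is a mapping from joint name to a single angle in degrees and uses plain linear interpolation; a real skeleton would more likely interpolate joint rotations (for example with quaternions). The name `interpolate_pose` and the joint names are hypothetical.

```python
def interpolate_pose(start, end, t):
    """Blend two poses; t=0 gives the start pose, t=1 the end pose.

    Assumption: both poses define the same joints, one angle per joint.
    """
    t = max(0.0, min(1.0, t))  # clamp so the shift never overshoots
    return {joint: (1.0 - t) * start[joint] + t * end[joint] for joint in start}


rest = {"l_shoulder": 0.0, "l_elbow": 10.0}
lean = {"l_shoulder": 30.0, "l_elbow": 50.0}
print(interpolate_pose(rest, lean, 0.5))
```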
Pointing gesture • Which modality (left hand, right hand, head)? • How long does the preparation take? • What are the end positions of hand and head? • How do the hand and the head move?
Pointing gesture: modality • Point to the left with the left hand, point to the right with the right hand • During pointing, the eyes fixate on the target (gaze anchoring) • If the hands are busy doing something else, point with just the head
Pointing gesture: Fitts’ law • Predicts how much time is needed to move from a start position to the target area • Depends on the distance to travel and the size of the target • Models quick, aimed pointing actions • Can be used to determine the minimum preparation time
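In its original formulation, Fitts' law predicts the movement time as MT = a + b · log2(2D / W), with D the distance to the target and W the target width. The sketch below evaluates that formula. The constants `a` and `b` are empirical and have to be fitted to measured pointing data; the default values here are placeholders, not values from the presentation.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.1):
    """Predicted movement time in seconds: MT = a + b * log2(2D / W).

    a and b are placeholder constants; fit them to observed data.
    """
    if distance <= 0 or width <= 0:
        raise ValueError("distance and width must be positive")
    index_of_difficulty = math.log2(2.0 * distance / width)
    return a + b * index_of_difficulty


# A farther or smaller target needs a longer minimum preparation time.
print(fitts_movement_time(distance=0.8, width=0.2))
print(fitts_movement_time(distance=1.6, width=0.2))
```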
Pointing gesture: Hand end position • The presenter only uses his shoulder, elbow and hand to point • The position of the wrist is known • To calculate: the rotation of elbow and shoulder • Can be found analytically • The solution has one degree of freedom • The elbow always points down
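The analytic solution can be sketched for a two-segment arm as follows. This is an illustration under stated assumptions, not the presenter's IK code: the arm is modelled as two rigid segments, "down" is taken to be the negative y axis, and all function names are hypothetical. The elbow flexion follows from the law of cosines; the remaining degree of freedom (the elbow swivelling around the shoulder-wrist axis) is fixed by choosing the lowest point on that circle.

```python
import math


def _sub(a, b): return tuple(x - y for x, y in zip(a, b))
def _add(a, b): return tuple(x + y for x, y in zip(a, b))
def _scale(a, s): return tuple(x * s for x in a)
def _dot(a, b): return sum(x * y for x, y in zip(a, b))
def _norm(a): return math.sqrt(_dot(a, a))


def elbow_flexion(upper_arm, forearm, reach):
    """Interior elbow angle in radians, from the law of cosines."""
    cos_angle = (upper_arm**2 + forearm**2 - reach**2) / (2 * upper_arm * forearm)
    return math.acos(max(-1.0, min(1.0, cos_angle)))


def elbow_position(shoulder, wrist, upper_arm, forearm, down=(0.0, -1.0, 0.0)):
    """Elbow position for a known wrist position, with the elbow pointing down."""
    axis = _sub(wrist, shoulder)
    reach = _norm(axis)
    if reach == 0 or reach > upper_arm + forearm or reach < abs(upper_arm - forearm):
        raise ValueError("wrist position is out of reach")
    n = _scale(axis, 1.0 / reach)
    # The elbow lies on a circle around the shoulder-wrist axis.
    along = (upper_arm**2 - forearm**2 + reach**2) / (2 * reach)
    radius = math.sqrt(max(upper_arm**2 - along**2, 0.0))
    # Resolve the free degree of freedom: take the circle point closest to "down".
    sideways = _sub(down, _scale(n, _dot(down, n)))
    length = _norm(sideways)
    if length < 1e-9:
        raise ValueError("arm axis is vertical, so 'down' does not fix the elbow")
    u = _scale(sideways, 1.0 / length)
    return _add(_add(shoulder, _scale(n, along)), _scale(u, radius))


print(elbow_position((0.0, 0.0, 0.0), (1.2, 0.0, 0.0), 1.0, 1.0))
```

The shoulder rotation then follows from the direction of the upper arm (shoulder to elbow), and the elbow rotation from `elbow_flexion`.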
Pointing gesture: head end position • The neck has 3 rotational degrees of freedom • Pointing the nose at a target is a 2-dimensional task • Donders’ law for the head: each gaze direction corresponds to one unique set of values for the 3 degrees of freedom of neck rotation
Pointing gesture: How does the hand move? • Velocity profile is bubble shaped • This bubble is not necessarily symmetrical • Adjustable: • Length of the acceleration phase • Maximum speed • Assumption • The elbow and hand travel along the shortest path toward the end position
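A bubble-shaped, possibly asymmetric velocity profile with the two adjustable parameters could be sketched as below. The slides do not give the actual curve; the squared-sine rise and squared-cosine fall used here are an assumption chosen because they are smooth and peak exactly at the end of the acceleration phase. The function name and parameters are hypothetical.

```python
import math


def speed(t, duration, accel_fraction, max_speed):
    """Speed at time t for a movement lasting `duration` seconds.

    accel_fraction is the share of the movement spent accelerating
    (0.5 gives a symmetric bubble); max_speed is the peak speed.
    """
    if not 0.0 < accel_fraction < 1.0:
        raise ValueError("accel_fraction must lie strictly between 0 and 1")
    if t <= 0.0 or t >= duration:
        return 0.0
    t_peak = accel_fraction * duration
    if t < t_peak:
        # Acceleration phase: rise smoothly from 0 to max_speed.
        return max_speed * math.sin(0.5 * math.pi * t / t_peak) ** 2
    # Deceleration phase: fall smoothly from max_speed back to 0.
    return max_speed * math.cos(0.5 * math.pi * (t - t_peak) / (duration - t_peak)) ** 2


# Short acceleration, long deceleration: an asymmetric bubble.
for i in range(0, 11):
    print(round(speed(i / 10, 1.0, 0.25, 2.0), 3))
```

With this particular curve the distance covered is `max_speed * duration / 2`, so the peak speed can be derived from the path length and the planned duration.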
Pointing gesture: How does the head move? • The rotation axis of the head is constant during gaze movement • The velocity profile is (again) bubble shaped
Pointing gesture: retraction (1) • Kendon: If a retraction phase occurs, then the movement in that retraction phase is symmetrical to the movement in the preparation phase • Tested using videos
Pointing gesture: retraction (2)
Involuntary movement • Even while standing still, our body moves in subtle ways • Eye blinking • Chest and shoulders move when we breathe in and out • Balancing • A virtual human that does not make this kind of movement will look stiff and unnatural • Simulated by putting small random movements on the joints of the presenter
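Putting small random movements on the joints might look like the sketch below. It assumes a pose is a mapping from joint name to an angle in degrees and draws an independent uniform offset per joint; the amplitude is a made-up value, and a real animation loop would probably smooth the offsets over time to avoid jitter. The name `add_idle_noise` is hypothetical.

```python
import random


def add_idle_noise(pose, amplitude_deg=0.5, rng=None):
    """Return a copy of the pose with a small random offset on every joint."""
    if rng is None:
        rng = random.Random()
    return {
        joint: angle + rng.uniform(-amplitude_deg, amplitude_deg)
        for joint, angle in pose.items()
    }


rest = {"neck": 0.0, "l_shoulder": 5.0, "r_shoulder": 5.0}
print(add_idle_noise(rest, rng=random.Random(42)))
```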
Demo
Evaluation: architecture • Timing • Timing on a word level is sufficient to satisfy the phonological synchrony rule • More variation in timing and tighter planning can be achieved by identifying the phonological peak in words • The model with a changing leading modality is more flexible than using speech as the leading modality
Evaluation: architecture (2) • Consistency • As predicted: consistency conflicts between implemented and unimplemented modalities • Extensibility • The architecture is used in other projects at HMI • Interruptible presenter (Jaak Vlasveld) • Virtual guide (Marco van Kessel)
Possible extensions (1) • Improve current features • Improve posture shift motions • Use more joints in the pointing gesture to reduce stiffness • Stroke animation for the pointing gesture • Synchronization at peak syllable level • Etc.
Possible extensions (2) • Broaden the presenter's ability to express itself • More gesture types • Beats • Iconics • Metaphorics
Possible extensions (3) • Raise the presenting process to a higher level • Now: the script determines what to express in speech and what in gesture • Next abstraction step: Implement a process that determines what to say and what gestures to make • Based on the content of the presenter's story • Can be guided by style and emotional state
Questions?
Easter eggs
Digital entertainment in a virtual museum • Presenter as virtual museum guide • Corpus of annotated paintings • General aspects • Properties of specific sub areas • Automatically generated presentations
Pointing gesture: retraction (3) • Rules • If a pointing gesture is directly followed by another gesture: skip the retraction phase and start the new gesture • Otherwise, move back to the resting position in a similar way as the movement in the preparation phase (but backward)
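The two retraction rules can be written down almost literally. The sketch below assumes the preparation movement is available as an ordered list of keyframes (any pose representation works); the function name `plan_retraction` and the keyframe labels are hypothetical.

```python
def plan_retraction(preparation_keyframes, followed_by_gesture):
    """Return the keyframes of the retraction phase.

    Rule 1: if another gesture follows directly, skip the retraction.
    Rule 2: otherwise replay the preparation movement backward.
    """
    if followed_by_gesture:
        return []
    return list(reversed(preparation_keyframes))


preparation = ["rest", "arm_raised", "pointing"]
print(plan_retraction(preparation, followed_by_gesture=False))
print(plan_retraction(preparation, followed_by_gesture=True))
```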
Evaluation: Separate modalities • Sheets • By identifying rectangular sheet areas, the pointing gesture can be adjusted to the shape of the target area • Speech • Posture • Poses are useful to define the start and end position of the body during gestures • Posture changes could be done in better ways
Evaluation: Involuntary movement • Goal: reduce the stiffness of the presenter • Evaluated with a user test • 17 subjects thought the involuntarily moving presenter moved in a more natural way • 1 subject did not see a difference • 2 subjects thought the involuntarily moving presenter moved in a more natural way • All subjects agreed that the involuntarily moving presenter was less stiff
Evaluation: Pointing gesture • Fitts’ law • 3 out of 4 pointing gestures in an example presentation could be modelled using Fitts’ law • Minimum preparation time is useful for gesture planning • Symmetry • Donders’ law • IK technique • Real-time • Looks somewhat stiff because only the shoulder and the elbow are used to move the hand
Drinks reception • Torenkamer, Bastille
Possible extensions • Speech • Synchronization at syllable level / phonological peak • Pointing gesture • Stroke animation • Decrease stiffness by moving more joints in the body • Posture change • Predefined animation • Posture change animation model
Why virtual humans? • Demonstrating and validating theories about human behavior or human movement • People respond to media in the same way they respond to people • Theory: making interaction with media more human-like makes it more pleasant and more efficient
Existing script languages (1) • Use of timestamps with fixed times (NITE-XML, CoGest, etc.) • Mainly used for annotation • Limits flexibility: the timing of all actions has to be determined in advance • SMIL-like approach (CML, STEP, VHML) • Uses par, seq and wait • Every possible form of synchronization can be expressed this way • The different modalities are not clearly separated • The whole script has to be read before execution can start
Existing script languages (2) • Define a leading modality that determines the timing of the other modalities • There is no single modality that determines the timing of all other modalities • If such a modality existed, it would have to be able to change
Possible extensions: gesture/speech selection • Which gestures the presenter uses and what it says currently come from a script • Next logical abstraction step: build the process that determines which gestures and which speech are selected • Existing work: • For pointing gestures (Krahmer) • For iconic gestures (Cassel)
Other possible extensions • Interruption • More advanced presenter • Use of fingers for gestures • Realistic models for e.g. breathing and eye blinking • Style and emotion
Possible extensions: more gesture types • More gesture types • Beat • Iconic • Metaphoric • Conflict resolution • Choose another modality • Combine gestures • Skip one of the gestures
Gesture: What is a gesture? • A movement of the body or limbs that expresses or emphasizes an idea • How does it differ from other body movement? • Gestures are symmetrical • Peak structure • Clear start and end
Gesture: structure • Gesture unit: several gestures performed directly after one another
Requirements for the presentation script • Synchronization must not depend on constant time values • It must be possible to change the synchronizing modality • The modalities must be clearly separated, so that the script is easy to read • It must be possible to start executing the script before it has been fully read and planned
MultiModalSync (2) • Channels are executed in parallel • Within a channel, expressions are executed sequentially • Synchronization points can be defined within channels or within expressions • A channel can be synchronized with other channels by waiting for a synchronization point • Expressions can use synchronization points for their timing
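The execution model described on this slide (parallel channels, sequential expressions, named synchronization points) can be imitated with threads and events. The sketch below is not MultiModalSync syntax and not the actual implementation: the tuple-based script format, the class `SyncPoints` and the functions `run_channel` and `run_script` are all hypothetical, chosen only to show how one channel waits for a synchronization point that another channel signals.

```python
import threading


class SyncPoints:
    """Named synchronization points shared by all channels."""

    def __init__(self):
        self._events = {}
        self._lock = threading.Lock()

    def _event(self, name):
        with self._lock:
            return self._events.setdefault(name, threading.Event())

    def signal(self, name):
        self._event(name).set()

    def wait(self, name, timeout=None):
        return self._event(name).wait(timeout)


def run_channel(steps, sync, log):
    """Execute the steps of one channel sequentially."""
    for kind, value in steps:
        if kind == "wait":
            sync.wait(value, timeout=5.0)  # block until another channel signals
        elif kind == "signal":
            sync.signal(value)
        else:  # "do": stand-in for performing an expression
            log.append(value)


def run_script(channels):
    """Run every channel in its own thread and return the execution log."""
    sync, log = SyncPoints(), []
    threads = [
        threading.Thread(target=run_channel, args=(steps, sync, log))
        for steps in channels.values()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


script = {
    "speech": [("do", "say: Look at"), ("signal", "peak"), ("do", "say: this chart")],
    "gesture": [("wait", "peak"), ("do", "point at chart")],
}
print(run_script(script))
```

Here speech acts as the leading modality: the gesture channel starts pointing only after the speech channel has reached the `peak` point. Letting another channel signal the point instead would change the leading modality without touching the rest of the script.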
Content • Goal • Focus and approach • Architecture • Script language • Presenting modalities • Demo • Evaluation • Possible extensions