
Video retrieval, user interaction, and digital rights management

Presentation Transcript


1. Video retrieval, user interaction, and digital rights management
   From Multimedia Retrieval, Springer, Blanken et al.

2. “Multimodal” is the keyword…
   • Based on a case study: video recordings of Formula race cars
   • Fusion of multimodal information
   • Sound
     • Audio signal analysis to detect interesting events – when the commentator gets excited
     • At the beginning of an event, there is an overview by the commentator
     • They capture the audio signal and screen out the non-voice frequency range (sketched after this slide)
     • They also look for specific words – not general speech recognition, but spotting only a handful of race-specific words
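
A minimal sketch of the “screen out the non-voice range” step, assuming the commentary track is available as a mono NumPy array; the 300–3400 Hz band edges and the helper name voice_band are illustrative assumptions, not the chapter’s implementation.

    import numpy as np
    from scipy.signal import butter, sosfilt

    def voice_band(samples, sample_rate=16000, low_hz=300.0, high_hz=3400.0):
        """Band-pass the commentary audio to the range where speech dominates."""
        sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
        return sosfilt(sos, np.asarray(samples, dtype=float))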

3. Fusion
   • Audio
   • Analysis of the image stream
     • To catch the start of the race and other events
     • Used to locate the time boundaries of isolatable events
   • Superimposed text
     • Projected onto the TV screen
     • Information on the driver: the driver’s place in the race, etc.

4. Audio processing
   • A mix of human speech, car noise, background noise, crowd cheering, and horns
   • Look for the human voice frequency range
   • Short-time energy (STE) – waveform based, used to remove noise (see the sketch after this slide)
   • Pitch – fundamental frequency (F0); the higher it is, the more excitement in the voice
   • Search for phonemes
   • Pause rate – to detect the quantity of speech
   • Keyword spotting – less semantics, but a lower error rate
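
Two of the listed features, short-time energy and pause rate, could be computed roughly as below; the frame length, hop size, and silence threshold are assumptions for illustration, not values from the book.

    import numpy as np

    def short_time_energy(samples, frame_len=400, hop=160):
        """STE per frame: sum of squared samples (400 samples = 25 ms at 16 kHz)."""
        samples = np.asarray(samples, dtype=float)
        starts = range(0, len(samples) - frame_len, hop)
        return np.array([np.sum(samples[i:i + frame_len] ** 2) for i in starts])

    def pause_rate(ste, silence_threshold):
        """Fraction of frames whose energy falls below the silence threshold."""
        return float(np.mean(ste < silence_threshold))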

5. Image stream
   • Searched at the places where the commentator raised his voice
   • Searched histograms, looking for certain colors and shapes
   • Tracked the changing of colors and shapes over a series of frames (sketched after this slide)
   • Focus on:
     • Start of the race
     • Passing
     • Fly-outs (sand and dust)
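
One way to track color change over a series of frames is to compare per-frame histograms and flag large jumps (for example, the dust cloud of a fly-out shifting the color distribution); the bin count, channel choice, and threshold below are illustrative assumptions.

    import numpy as np

    def colour_histogram(frame, bins=32):
        """Normalised histogram of one 8-bit channel of an HxWx3 frame."""
        hist, _ = np.histogram(frame[:, :, 0], bins=bins, range=(0, 256))
        return hist / max(hist.sum(), 1)

    def candidate_events(frames, threshold=0.25):
        """Frame indices where consecutive histograms differ by more than threshold."""
        hists = [colour_histogram(f) for f in frames]
        return [i for i in range(1, len(hists))
                if np.abs(hists[i] - hists[i - 1]).sum() > threshold]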

6. Text
   • Two classes:
     • Scene text
     • Superimposed text
   • The same text can span many frames, so they count on its position being fixed to limit processing time (see the sketch after this slide)
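
Because the superimposed text sits at a fixed screen position, one could crop only that region and re-run recognition when the pixels there actually change; the region coordinates, the sampling stride, and the recognise() callable below are hypothetical.

    import numpy as np

    def overlay_changes(frames, region, recognise, stride=25):
        """Yield (frame_index, text) each time the fixed caption region changes.

        region is (y0, y1, x0, x1); recognise maps a cropped image to text.
        """
        y0, y1, x0, x1 = region
        previous = None
        for i in range(0, len(frames), stride):
            crop = frames[i][y0:y1, x0:x1].astype(int)
            if previous is None or np.abs(crop - previous).mean() > 10:
                yield i, recognise(crop)
            previous = crop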

7. Interaction
   • Ways to pose queries
   • Ways to give feedback
   • Ways to explore

8. Interaction types
   • Retrieval
     • Query formulation: concept based or content based
   • Concept-based
     • Keywords in natural language
     • People use different words for the same thing
     • Metadata is often missing
     • Easy for the user, hard for the software
   • Content-based
     • Query-by-example paradigm: the user provides examples (sketched after this slide)
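
A minimal query-by-example sketch: items are represented by feature vectors (extraction not shown) and ranked by cosine similarity to the vector of the user’s example. All names here are illustrative, not the book’s system.

    import numpy as np

    def rank_by_example(example_vec, collection_vecs, top_k=10):
        """Indices of the top_k items most similar to the example vector."""
        X = np.asarray(collection_vecs, dtype=float)
        q = np.asarray(example_vec, dtype=float)
        sims = X @ q / (np.linalg.norm(X, axis=1) * np.linalg.norm(q) + 1e-12)
        return np.argsort(-sims)[:top_k]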

9. Dynamic query interaction
   • Sliders, buttons, etc.
   • Visual presentation is key – of the query and of the results
   • Example system on page 299
   • The interaction cycle is short

10. Browsing
    • Links, with a feel similar to using the web
    • Browsing model
      • To get an impression of the search space
      • To find something when you aren’t sure what it is
    • Browsing a collection of objects vs. browsing a single object
    • Browsing keywords or a namespace hierarchy
    • Example on page 301

11. User input and relevance feedback
    • Modalities
      • Visual, audio, tactile
      • Or: touch screen, electronic pen, camera, microphone, eye tracker, locality sensor, mouse, keyboard
      • No user guide needed
      • Speech alone is difficult to process
      • Multiple modalities at once, such as speech plus a map for location or distance
      • Use of ambient intelligence to collect information
    • Relevance feedback
      • Binary feedback
      • Weighted relevance feedback – image on page 305 (see the sketch after this slide)
    • Personalization
      • Similar to the 1-to-1 marketing concept
      • User profiles are used
      • Users are not excited about providing profile info, though
      • Users are grouped into content interest groups
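
Weighted relevance feedback can be illustrated with a Rocchio-style update (not necessarily the book’s exact formulation): the query vector moves toward examples the user marked relevant and away from the non-relevant ones. The alpha/beta/gamma weights below are conventional defaults chosen as assumptions.

    import numpy as np

    def feedback_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
        """Return a new query vector from binary relevance judgements."""
        new_query = alpha * np.asarray(query, dtype=float)
        if len(relevant):
            new_query += beta * np.mean(np.asarray(relevant, dtype=float), axis=0)
        if len(nonrelevant):
            new_query -= gamma * np.mean(np.asarray(nonrelevant, dtype=float), axis=0)
        return new_query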

12. Feedback
    • Passive feedback works well, like skipping songs on a feed
    • Making an offer that adds to a query works sometimes, like Amazon trying to sell you similar books
    • User profiles can be built automatically from a purchase history or a clickstream
    • Filtering techniques
      • Content based – built on (attribute, value, fit) triples, e.g. (title, “War and Peace”, 0) – sketched after this slide
      • Social based – putting people into groups to get larger user samples, and attaching profiles to the groups
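
The (attribute, value, fit) idea can be sketched as follows: a user profile is a list of triples, and an item is scored by summing the fit values of the triples it matches. The sample data reuses the slide’s “title – war and peace – 0” example; the rest is made up for illustration.

    def score_item(item, profile):
        """item: dict attribute -> value; profile: list of (attribute, value, fit)."""
        return sum(fit for attribute, value, fit in profile
                   if item.get(attribute) == value)

    profile = [("title", "war and peace", 0.0), ("genre", "motorsport", 0.9)]
    print(score_item({"title": "grand prix highlights", "genre": "motorsport"}, profile))
    # -> 0.9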

13. Presentation
    • Must provide metadata and data in an integrated way
    • Inherently multimedia in nature, in both query and response
    • Tree maps for complex metadata or data
    • Graphs to tie multimedia objects together into single conceptual objects
    • Starfield display
    • Breaking videos into segments to aid non-linear searching, with a sample frame for each segment
    • Images on pages 314, 315, and 316
    • Key factor in presenting multimedia data – content adaptation (sketched after this slide)
      • What capabilities the device has
      • Limits of the device – size, color, supported data formats
      • Must often change the format of the data to fit a device
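
Content adaptation can be as simple as picking the best pre-encoded variant that fits the device’s limits; the variant and device fields below are hypothetical, not from the book.

    def adapt(variants, device):
        """variants: dicts with 'format' and 'width'; device: dict of its limits."""
        usable = [v for v in variants
                  if v["format"] in device["formats"] and v["width"] <= device["max_width"]]
        return max(usable, key=lambda v: v["width"]) if usable else None

    phone = {"formats": {"h264"}, "max_width": 720}
    print(adapt([{"format": "h264", "width": 480}, {"format": "h264", "width": 1080}], phone))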

14. Digital rights
    • DRM (digital rights management)
    • Preventive approach
      • Encryption
      • Node locking
      • Dongle
    • Reactive approach
      • Embedding extra information in the product
      • Tracking behavior and looking for a violation – sometimes called forensic tracking
      • Looking for specific watermarks, often specific to a given user – makes it hard to pass content on (sketched after this slide)
    • Application domains
      • Legal – concept: Personal Entertainment Domain (PED)
      • To keep content secure, commercially and intelligence-wise
    • Diagrams on pages 325, 326, and 331
    • Sometimes the media is free and commercials are embedded
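
To illustrate the reactive, forensic-tracking idea, here is a toy least-significant-bit watermark that hides a user identifier in an image. This scheme is an assumption chosen for brevity (real forensic watermarks are far more robust) and is not the method described in the chapter.

    import numpy as np

    def embed_user_id(image, user_id, n_bits=32):
        """Write user_id into the LSBs of the first n_bits pixels of a uint8 image."""
        marked = image.copy().reshape(-1)
        bits = np.array([(user_id >> i) & 1 for i in range(n_bits)], dtype=np.uint8)
        marked[:n_bits] = (marked[:n_bits] & 0xFE) | bits
        return marked.reshape(image.shape)

    def extract_user_id(image, n_bits=32):
        """Recover the embedded identifier from the same pixel positions."""
        flat = image.reshape(-1)
        return sum(int(flat[i] & 1) << i for i in range(n_bits))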
