Video retrieval, user interaction, and digital rights management
From Multimedia Retrieval, Springer, Blanken et al.

“Multimodal” is the keyword…

  • Based on a case study

    • Formula race cars video recordings

  • Fusion of multimodal information

  • Sound

    • Audio signal analysis to detect interesting events – when the commentator gets excited

    • At the beginning of an event, there is an overview by the commentator

    • They capture the audio signal and screen out the non-voice range signal

    • They also look for specific words – not general voice recognition, but searching only for a handful of race-specific words


  • Audio

  • Analysis of image stream

    • To catch start of race and other events

    • Used to locate time boundaries of isolatable events

  • Superimposed text

    • Projected on TV screen

    • Information on the driver

    • Driver’s place in race, etc.
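The Sound bullets above describe searching for only a handful of race-specific words rather than doing general voice recognition. A minimal sketch of that idea, assuming some upstream recognizer already yields timed words; the keyword list and the function name `spot_keywords` are hypothetical, since the slides do not say which words were used:

```python
# Hypothetical keyword list; the slides do not name the actual words.
RACE_KEYWORDS = {"start", "overtake", "crash", "pit", "leader"}


def spot_keywords(timed_words, keywords=RACE_KEYWORDS):
    """Return (time, word) pairs whose word is in the small keyword set.

    timed_words: iterable of (time_in_seconds, word) pairs, assumed to
    come from an upstream recognizer that is out of scope here.
    """
    hits = []
    for time, word in timed_words:
        # Normalize case and strip trailing punctuation before matching.
        normalized = word.lower().strip(".,!?")
        if normalized in keywords:
            hits.append((time, normalized))
    return hits


hits = spot_keywords([(1.0, "What"), (1.8, "Crash!"), (5.0, "pit")])
```

Restricting the vocabulary this way gives up most of the meaning of the commentary, but each match is a simple set lookup.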

Audio processing

  • Mix of human language, car noise, background noise, crowd cheering, horns

  • Look for human voice frequency

  • Short time energy (STE)

    • To remove noise

    • Wave form based

  • Pitch – fundamental frequency (F0), the higher, the more excitement in the voice

  • Search for phonemes

  • Pause rate – to detect quantity of speech

  • Keyword spotting – less semantics, but lower error rate
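Two of the audio features listed above, short time energy and pause rate, can be sketched in a few lines. This is an illustration under assumptions, not the method from the book: STE is taken here as the mean squared amplitude of non-overlapping frames (definitions vary), and the frame length and threshold are hypothetical parameters.

```python
def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def pause_rate(energies, threshold):
    """Fraction of frames whose energy falls below the threshold."""
    if not energies:
        return 0.0
    quiet = sum(1 for e in energies if e < threshold)
    return quiet / len(energies)


# Silence, a loud burst, then a faint signal (made-up sample values).
samples = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.1] * 4
energies = short_time_energy(samples, frame_len=4)
rate = pause_rate(energies, threshold=0.05)
```

A low pause rate combined with high energy would then be one cue that the commentator is talking continuously and loudly.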

Image stream

  • Searched for places where commentator raised his voice

  • Searched histogram, looking for certain colors and shapes

  • Tracked the changing of colors and shapes over a series of frames

  • Focus on

    • Start of race

    • Passing

    • Fly-outs (sand and dust)
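Tracking how histograms change over a series of frames, as described above, can be illustrated with a simple frame-to-frame comparison. A minimal sketch, assuming each frame is a non-empty flat list of grayscale intensities in 0..255 (the slides mention colors and shapes; grayscale keeps the example short). The bin count, threshold, and function names are hypothetical.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a non-empty frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def histogram_distance(h1, h2):
    """Sum of absolute bin differences (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def find_changes(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous one."""
    if not frames:
        return []
    changes = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        if histogram_distance(previous, current) > threshold:
            changes.append(index)
        previous = current
    return changes


dark = [10] * 8
bright = [250] * 8
changes = find_changes([dark, dark, bright, bright])
```

A sudden jump of this kind could mark a candidate time boundary, for example when sand and dust from a fly-out change the dominant tones in the picture.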


  • Two classes

    • Scene text

    • Superimposed text

  • The same text can span many frames, and so they count on its position being fixed to limit processing time
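The point about fixed position can be sketched as follows: if superimposed text stays in the same screen region, recognition only needs to run again when that region changes. This is an assumption-laden illustration; `recognize` stands in for any text recognizer, the box coordinates are hypothetical, and real video would need a tolerance for noise rather than exact equality.

```python
def crop(frame, box):
    """Cut the (top, left, bottom, right) region out of a 2D frame."""
    top, left, bottom, right = box
    return tuple(tuple(row[left:right]) for row in frame[top:bottom])


def read_text_over_frames(frames, box, recognize):
    """Run recognize() only when the fixed text region changes."""
    results = []
    last_region = None
    last_text = None
    for frame in frames:
        region = crop(frame, box)
        if region != last_region:
            # Region content changed: pay for recognition once.
            last_text = recognize(region)
            last_region = region
        results.append(last_text)
    return results


calls = []


def fake_recognize(region):
    # Stand-in recognizer that just counts how often it is called.
    calls.append(region)
    return "text%d" % len(calls)


frame_a = [[0, 0, 0], [0, 5, 5], [0, 5, 5]]
frame_b = [[9, 0, 0], [0, 5, 5], [0, 5, 5]]  # differs only outside the box
frame_c = [[0, 0, 0], [0, 7, 5], [0, 5, 5]]  # differs inside the box
texts = read_text_over_frames([frame_a, frame_b, frame_c], (1, 1, 3, 3), fake_recognize)
```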


  • Ways to pose queries

  • Ways to give feedback

  • Ways to explore

Interaction types

  • Retrieval

    • Query formulation

      • Concept based

      • Content based

  • Concept-based

    • Key words in natural language

    • People use different words for the same thing

    • Metadata is often missing

    • Easy for user, hard for software

  • Content-based

    • Query by example paradigm

    • User provides examples
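Query by example can be illustrated as a nearest-neighbor search: the user's example is turned into a feature vector and compared with the vectors of stored items. A minimal sketch, assuming features have already been extracted; the clip names, vectors, and the choice of Euclidean distance are all hypothetical.

```python
import math


def query_by_example(example, collection, k=3):
    """Return the names of the k items closest to the example vector."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(collection.items(),
                    key=lambda item: distance(example, item[1]))
    return [name for name, _ in ranked[:k]]


# Made-up two-dimensional feature vectors for three clips.
collection = {
    "start": [0.9, 0.1],
    "pit": [0.1, 0.9],
    "crash": [0.8, 0.3],
}
nearest = query_by_example([1.0, 0.0], collection, k=2)
```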

Dynamic query interaction

  • Sliders, buttons, etc.

  • Visual is the key

    • Of the query

    • Of the results

  • Example system, page 299

  • Interaction cycle is short
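The short interaction cycle of dynamic queries can be sketched as a filter that is re-run every time a slider moves. This is an illustrative sketch only; the attribute names and clip data are hypothetical, and a real interface would redraw the visual result display after each call.

```python
def apply_sliders(items, sliders):
    """Keep items whose attributes fall inside every slider range.

    sliders: dict mapping attribute name -> (low, high), inclusive.
    """
    return [
        item for item in items
        if all(low <= item[attr] <= high
               for attr, (low, high) in sliders.items())
    ]


clips = [
    {"id": 1, "duration": 30, "excitement": 0.9},
    {"id": 2, "duration": 120, "excitement": 0.4},
    {"id": 3, "duration": 45, "excitement": 0.7},
]
sliders = {"duration": (0, 60), "excitement": (0.5, 1.0)}
visible = [clip["id"] for clip in apply_sliders(clips, sliders)]
```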


  • Links, with a feeling similar to using the web

  • Browsing model

    • To get impression of search space

    • To find something when you aren’t sure what it is

    • Browsing a collection of objects and browsing a single object

    • Browsing keywords or namespace hierarchy

    • Example on page 301

User input and relevance feedback

  • Modalities

    • Visual, audio, tactile

    • Or touch screen, electronic pen, camera, mic, eye tracker, locality sensor, mouse, keyboard

    • No user guide needed

    • If it is speech only, it is difficult to process

    • Multiple modalities at once

      • Such as speech and a map for location or distance

    • Use of ambient intelligence to collect information

  • Relevance feedback

    • Binary feedback

    • Weighted relevance feedback – image page 305

    • Personalization

      • Similar to 1-to-1 marketing concept

      • User profiles are used

        • Users not excited about providing profile info, though

      • Users are grouped into content interest groups
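The difference between binary and weighted relevance feedback can be made concrete with a Rocchio-style query update. Note that this technique is swapped in here for illustration; the slides do not name a specific algorithm. The parameters `alpha` and `beta` and the vector representation are assumptions.

```python
def update_query(query, feedback, alpha=1.0, beta=0.5):
    """Move the query vector toward liked items and away from disliked ones.

    feedback: list of (vector, weight) pairs with weight in [-1, 1].
    Binary feedback is the special case where every weight is +1 or -1.
    """
    total = sum(abs(weight) for _, weight in feedback)
    if total == 0:
        return list(query)
    new_query = []
    for i, q in enumerate(query):
        # Weighted average direction suggested by the judged items.
        shift = sum(weight * vec[i] for vec, weight in feedback) / total
        new_query.append(alpha * q + beta * shift)
    return new_query


# One item marked relevant, one marked not relevant.
updated = update_query([1.0, 0.0], [([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)])
```

With weighted feedback, a user who rates one result "somewhat relevant" (say 0.3) shifts the query less than one who rates it "highly relevant" (1.0).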


  • Passive works well, like skipping songs on a feed

  • Making an offer that adds to a query works sometimes, like Amazon trying to sell you similar books

  • User profiles can be built automatically from a history of purchases or a clickstream

  • Filtering techniques

    • Content based – based on triples

      • Attribute – value – fit

      • Title – War and Peace – 0

    • Social based – putting people and their profiles into groups, which gives larger user samples
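Content-based filtering with attribute, value, fit triples can be sketched as a scoring function over a user profile. How the fit value is interpreted is an assumption here (a number that is added to an item's score when the attribute and value match); the slides give only the triple layout and one example. All names and data are hypothetical.

```python
def score_item(item, profile):
    """Sum the fit of every profile triple that matches the item."""
    return sum(fit for attribute, value, fit in profile
               if item.get(attribute) == value)


def filter_items(items, profile, min_score=0.5):
    """Keep items whose score reaches the (assumed) threshold."""
    return [item for item in items if score_item(item, profile) >= min_score]


# Profile as (attribute, value, fit) triples; fit 0 means no interest.
profile = [
    ("title", "War and Peace", 0.0),
    ("genre", "racing", 1.0),
]
items = [
    {"title": "War and Peace", "genre": "novel"},
    {"title": "Season review", "genre": "racing"},
]
kept = filter_items(items, profile)
```

A profile like this could be filled in automatically from a purchase history or clickstream instead of asking the user for it.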


  • Must provide metadata and data in an integrated way

  • Inherently multimedia in nature in query and response

  • Tree maps for complex metadata or data

  • Graphs to put multimedia objects together into single conceptual objects

  • Starfield display

  • Breaking videos into segments to aid non-linear searching

    • Providing sample frame for each segment

  • Images on pages 314, 315, and 316

  • Key factors in presenting multimedia data – content adaptation

    • What capabilities the device has

    • Limits of device – like size, color, formats of data

    • Must often change formats of data to fit a device
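The content adaptation factors above can be sketched as a function that fits a media item to a device description. A minimal sketch under assumptions: the dictionary keys, format names, and the rule of falling back to the device's first supported format are all hypothetical.

```python
def adapt(media, device):
    """Choose a format, size, and color mode the device can handle."""
    # Keep the original format if supported, else fall back to the
    # device's first listed format (an assumed preference order).
    if media["format"] in device["formats"]:
        fmt = media["format"]
    else:
        fmt = device["formats"][0]
    # Shrink to fit the screen, never enlarge, keep the aspect ratio.
    scale = min(1.0,
                device["width"] / media["width"],
                device["height"] / media["height"])
    return {
        "format": fmt,
        "width": int(media["width"] * scale),
        "height": int(media["height"] * scale),
        "color": media["color"] and device["color"],
    }


media = {"format": "mpeg2", "width": 1920, "height": 1080, "color": True}
device = {"formats": ["h264", "mpeg4"], "width": 480, "height": 320, "color": False}
adapted = adapt(media, device)
```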

Digital rights

  • DRM (digital rights management)

  • Preventative approach

    • Encryption

    • Node locking

    • Dongle

  • Reactive approach

    • Embedding extra information in the product

    • Tracking behavior and looking for a violation

    • Sometimes called forensic tracking

      • Looking for specific watermarks, often specific to a given user

      • Makes it hard to pass content on

  • Application domains

    • Legal – concept: Personal Entertainment Domain (PED)

    • To keep content secure, commercially and intelligence-wise

    • Diagrams on pages 325, 326, and 331

  • Sometimes the media is free and commercials are embedded
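The reactive approach of embedding a user-specific watermark can be illustrated with a least-significant-bit scheme: the buyer's ID is hidden in the lowest bit of the first few samples and read back from a leaked copy. This is a toy sketch, not a scheme described in the slides; LSB marks are fragile and would not survive re-encoding, and the function names and 16-bit ID width are assumptions.

```python
def embed_user_id(samples, user_id, bits=16):
    """Hide user_id in the lowest bit of the first `bits` integer samples."""
    if len(samples) < bits:
        raise ValueError("not enough samples to hold the watermark")
    marked = list(samples)
    for i in range(bits):
        bit = (user_id >> i) & 1
        # Clear the lowest bit, then set it to the ID bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_user_id(samples, bits=16):
    """Read the ID back from the lowest bits."""
    return sum((samples[i] & 1) << i for i in range(bits))


original = list(range(100, 132))          # made-up integer samples
marked = embed_user_id(original, 0xBEEF)  # hypothetical 16-bit user ID
recovered = extract_user_id(marked)
```

Because each copy carries a different ID, a copy found in circulation points back to the user it was issued to, which is what makes passing content on risky.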
