Improving the utility of automated processing for digital video archives

Improving the Utility of Automated Processing for Digital Video Archives

Mike Christel

[email protected]

Entertainment Technology Center, Carnegie Mellon University

Penn State

March 1, 2010

Talk Outline

  • Automatically creating metadata for digital video

  • Informedia demonstrations (oral history collection, news video collection)

  • Types of search: beyond fact-finding

  • Exploratory search through multiple views

  • Evaluation hurdles

  • Discussion

    …now is an ideal opportunity to leverage user involvement for better video information-seeking experiences

User Involvement

  • User Correction: Corrective action on metadata errors (analogous to Harry Shum’s vision at Microsoft of human-assisted computer vision success)

  • User Control: Driving the interface to overcome metadata errors

  • User Context: More useful interfaces driven implicitly by context

CMU Informedia Digital Video Research

  • Details at:

  • Speech recognition and alignment

  • Image processing

  • Named entity tagging

  • Synchronized metadata for search and navigation

  • Fast, direct video access to oral histories, news, etc.

  • Demonstration oral history corpus: 913 hours of interviews from 400 individuals, 18,254 interview story segments (average story segment length of 3 minutes)

  • Demonstration news corpus: TRECVID 2006 test set (165 hours of U.S., Arabic, and Chinese news with 79,484 reference shots)

Speech Recognition Functions

  • Generates transcript (if one is not given) to enable text-based retrieval from spoken language documents

  • Improves text synchronization to audio/video in presence of scripts (align speech with text)

  • Supplies necessary information for library segmentation and multimedia abstractions (e.g., break stories apart at silence points rather than in the middle of sentences)
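The last function above, breaking stories apart at silence points, can be sketched in a few lines of Python. This is a minimal illustration, not the Informedia segmenter: it assumes the recognizer emits (word, start, end) timings in seconds, and the split_at_silences name and its 0.5 s min_gap threshold are hypothetical choices for the example.

```python
from typing import List, Tuple

# Assumed input format: (word, start_seconds, end_seconds) from a recognizer or aligner.
Word = Tuple[str, float, float]


def split_at_silences(words: List[Word], min_gap: float = 0.5) -> List[List[Word]]:
    """Group time-stamped words into segments, starting a new segment only where
    the pause between consecutive words is at least min_gap seconds."""
    segments: List[List[Word]] = []
    current: List[Word] = []
    prev_end = 0.0
    for word in words:
        _, start, end = word
        # Break at a silence, never between two closely spaced words.
        if current and start - prev_end >= min_gap:
            segments.append(current)
            current = []
        current.append(word)
        prev_end = end
    if current:
        segments.append(current)
    return segments


words = [("good", 0.0, 0.3), ("morning", 0.35, 0.8), ("today", 1.9, 2.2), ("we", 2.25, 2.4)]
print([[w for w, _, _ in seg] for seg in split_at_silences(words)])
# prints [['good', 'morning'], ['today', 'we']]
```

A real segmenter would weigh silence against other cues (topic shifts, shot changes); a single gap threshold only guarantees that cuts never fall mid-phrase.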

Speech Alignment Example
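Alignment transfers recognizer timings onto a given script. The sketch below is a text-level stand-in for acoustic forced alignment, assuming the recognizer output arrives as (word, start_seconds) pairs; it uses Python's difflib.SequenceMatcher to find matching word runs, and the align_script name is hypothetical. Script words the recognizer missed get None and would need to be interpolated from their neighbors.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def align_script(asr: List[Tuple[str, float]], script: List[str]) -> List[Optional[float]]:
    """Give each script word the start time of the recognized word it matches;
    script words with no recognized counterpart get None."""
    asr_words = [w.lower() for w, _ in asr]
    script_words = [w.lower() for w in script]
    times: List[Optional[float]] = [None] * len(script)
    matcher = SequenceMatcher(None, asr_words, script_words, autojunk=False)
    # Each block says asr_words[a:a+size] == script_words[b:b+size].
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.b + k] = asr[block.a + k][1]
    return times


asr = [("the", 0.0), ("cat", 0.4), ("sat", 0.9), ("on", 1.3), ("mat", 1.8)]
script = ["The", "cat", "sat", "on", "the", "mat"]
print(align_script(asr, script))  # prints [0.0, 0.4, 0.9, 1.3, None, 1.8]
```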

Image Understanding Functions

  • Scene segmentation

  • Similarity matching

  • Camera motion determination and object tracking

  • Optical Character Recognition (OCR) on video text and titles

  • Face detection and recognition

  • Ongoing research work in object identification and scene characterization, e.g., indoor/outdoor, road, building, etc.
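Scene (shot) segmentation is commonly approached by comparing intensity or color histograms of consecutive frames and cutting where they differ sharply. A minimal sketch, assuming frames arrive as flat lists of 0-255 gray values; the 16-bin histogram, the 0.5 threshold, and the function names are illustrative assumptions, not values from the talk.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a frame given as 0-255 gray values."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames: List[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Indices i where a new shot starts at frame i: the L1 distance between the
    histograms of frames i-1 and i (range 0..2) exceeds threshold."""
    if not frames:
        return []
    cuts = []
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        if sum(abs(p - c) for p, c in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts


dark, bright = [10] * 100, [200] * 100
print(shot_boundaries([dark, dark, bright, bright, dark]))  # prints [2, 4]
```

Histogram differences tolerate camera and object motion within a shot, which is why they are a common first pass; gradual transitions such as dissolves need more than a single-frame threshold.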

Images containing similar colors…

Image search with tropical rainforest image leads to…
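Color-based similarity search of this kind is often done with quantized color histograms compared by histogram intersection. The sketch assumes images are lists of (r, g, b) pixels and quantizes each channel into 4 levels; these choices, the function names, and the sample pixel values are hypothetical.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
Histogram = Dict[Tuple[int, int, int], float]


def color_histogram(pixels: List[RGB], levels: int = 4) -> Histogram:
    """Quantize each 0-255 channel into `levels` bins; return normalized bin frequencies."""
    hist: Histogram = {}
    for r, g, b in pixels:
        key = (r * levels // 256, g * levels // 256, b * levels // 256)
        hist[key] = hist.get(key, 0.0) + 1.0
    return {k: v / len(pixels) for k, v in hist.items()}


def intersection(h1: Histogram, h2: Histogram) -> float:
    """Histogram intersection: 1.0 for identical color distributions, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def rank_by_color(query: List[RGB], candidates: Dict[str, List[RGB]]) -> List[str]:
    """Candidate image names ordered from most to least similar in color to the query."""
    q = color_histogram(query)
    scores = {name: intersection(q, color_histogram(px)) for name, px in candidates.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


rainforest = [(20, 120, 30)] * 80 + [(90, 60, 20)] * 20
candidates = {
    "beach": [(240, 220, 160)] * 60 + [(40, 120, 220)] * 40,
    "jungle": [(25, 110, 35)] * 70 + [(100, 50, 25)] * 30,
}
print(rank_by_color(rainforest, candidates))  # prints ['jungle', 'beach']
```

Because only color distributions are compared, a query like the rainforest image can also return visually unrelated content that happens to share its greens and browns.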

Goal: Automatic Video Characterization

[Figure: example video annotated with automatically derived labels: Static, Adult Female, Two adults, Head Motion, Left Motion, "An Online First"]

Automated Video Processing

  • Produces descriptive metadata for video libraries

  • Metadata contains more errors than careful human annotation produces

  • Errors in metadata can be reduced:

    • By more computation-intensive algorithms

    • By taking advantage of video frame-to-frame redundancy

    • By folding in context, e.g., probable text sizes in video

    • By folding in extra sources of knowledge, e.g., a dictionary for cleaning up video OCR (VOCR) output, or labeled data revealing patterns for named entity detection

    • By human review and correction, which can generate additional labeled data for machine learning
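Two of the error-reduction ideas above, frame-to-frame redundancy and a dictionary for VOCR cleanup, can be combined as in this sketch. It assumes the OCR engine returns one text reading per sampled frame of the same caption; majority voting and difflib's similarity ratio (with a hypothetical 0.75 cutoff) are stand-ins chosen for illustration, not the method used in Informedia.

```python
from collections import Counter
from difflib import get_close_matches
from typing import List, Sequence


def vote_across_frames(readings: Sequence[str]) -> str:
    """Exploit frame-to-frame redundancy: a caption stays on screen for many frames,
    so take the OCR reading that occurs most often."""
    return Counter(readings).most_common(1)[0][0]


def dictionary_correct(text: str, vocabulary: List[str], cutoff: float = 0.75) -> str:
    """Replace each word with its closest dictionary entry when the difflib
    similarity ratio is at least `cutoff`; otherwise keep the word unchanged."""
    corrected = []
    for word in text.lower().split():
        match = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return " ".join(corrected)


readings = ["PRESIDEMT CLINTDN", "PRESIDENT CLINTOM", "PRESIDEMT CLINTDN"]
best = vote_across_frames(readings)
print(dictionary_correct(best, ["president", "clinton", "gore"]))
```

Voting alone keeps the most common error here; the dictionary pass then repairs the remaining character-level mistakes, which is why the two sources of knowledge complement each other.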

Camera and Motion Detection


Success using the Lucas-Kanade optical flow algorithm

Correctly detected as object motion to the right (not a camera pan to the left)
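Lucas-Kanade estimates one flow vector (u, v) per window by solving the brightness-constancy equation Ix·u + Iy·v + It = 0 in the least-squares sense over the window's pixels. The sketch implements that for a single window in pure Python, then applies a simple heuristic (an assumption, not the method shown in the talk) to tell a camera pan from object motion: if every sampled window moves, call it a pan; if only some do, call it object motion. Images are lists of rows of floats; the window coordinates, the 0.5 pixel min_speed, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # rows of gray values; image[y][x]
Window = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, at least 1 pixel from the border


def lk_flow(prev: Image, curr: Image, win: Window) -> Tuple[float, float]:
    """Single-window Lucas-Kanade: least-squares solution of Ix*u + Iy*v + It = 0
    over the window (central-difference gradients, frame difference in time)."""
    x0, y0, x1, y1 = win
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return 0.0, 0.0  # aperture problem: too little texture to determine flow
    # Solve the 2x2 normal equations [[sxx, sxy], [sxy, syy]] (u, v) = -(sxt, syt).
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def classify_motion(prev: Image, curr: Image, windows: List[Window], min_speed: float = 0.5) -> str:
    """Heuristic: all windows moving together suggests a camera pan; only some
    moving suggests object motion. Vertical motion (tilt) is ignored here."""
    flows = [lk_flow(prev, curr, w) for w in windows]
    moving = [(u, v) for u, v in flows if math.hypot(u, v) >= min_speed]
    if not moving:
        return "static"
    mean_u = sum(u for u, _ in moving) / len(moving)
    if len(moving) == len(flows):
        # Scene content drifting right across the whole frame means the camera panned left.
        return "pan left" if mean_u > 0 else "pan right"
    return "object moving right" if mean_u > 0 else "object moving left"
```

With a textured frame where only the left half shifts right by one pixel, the left window reports u close to 1 and the right window reports no flow, so the result is "object moving right"; shifting the whole frame yields "pan left".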

Text and Face Detection

Face Detection: A Success Story

  • Many deployments, from digital cameras (red-eye removal, improved focus) to interactive art (see the ETC hallways)

  • Henry Schneiderman, a Carnegie Mellon PhD who worked with the Informedia group at CMU

    • Founder of Pittsburgh Pattern Recognition (PittPatt)

    • Test out state of the art yourself at

Video OCR Block Diagram

[Diagram: Text Area Detection → Text Area Preprocessing → Commercial OCR]


Video Frames, Filtered Frames, AND-ed Frames

[Figure: video frames sampled at 1/2 s intervals, the corresponding filtered frames, and the result of AND-ing them]
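The AND-ing step keeps only pixels that look like text in every sampled frame: a static caption persists, while a moving background changes from frame to frame. This sketch assumes the unspecified "filter" step is a simple brightness threshold (hypothetical value 200) on 0-255 gray frames already sampled at 1/2 s intervals; the binarize and and_frames names are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # rows of 0-255 gray values
Mask = List[List[int]]   # 1 = candidate text pixel, 0 = other


def binarize(frame: Frame, threshold: int = 200) -> Mask:
    """Hypothetical filter step: keep very bright pixels, where caption text often lies."""
    return [[1 if v >= threshold else 0 for v in row] for row in frame]


def and_frames(frames: List[Frame], threshold: int = 200) -> Mask:
    """AND the filtered frames: a pixel survives only if it is text-like in every frame."""
    result = binarize(frames[0], threshold)
    for frame in frames[1:]:
        mask = binarize(frame, threshold)
        result = [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(result, mask)]
    return result


# Columns 0-1 of row 0 hold a static bright caption; other bright pixels come and go.
f1 = [[255, 255, 30, 220], [10, 10, 10, 10]]
f2 = [[250, 240, 210, 20], [10, 230, 10, 10]]
f3 = [[255, 255, 40, 40], [205, 10, 10, 10]]
print(and_frames([f1, f2, f3]))  # prints [[1, 1, 0, 0], [0, 0, 0, 0]]
```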

“Name-It” Face/Name Association

[Diagram: Face Extraction from the video and Name Extraction from the transcript (fragments: “…said President Clinton.”, “Al Gore presented his”, “stated…. In a gala affair,”, “Clinton addressed….”) feed Face/Name Association (co-occurrence evaluation), which answers questions such as “Who is Gore?”]
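The co-occurrence evaluation can be approximated by counting how often a face cluster and a transcript name appear close together in time, normalized by how often each appears overall. A minimal sketch, assuming face clusters and extracted names both carry timestamps in seconds; the 10 s window, the cosine-style normalization, and the function names are assumptions, not Name-It's published scoring.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]


def associate(faces: List[Tuple[str, float]], names: List[Tuple[str, float]],
              window: float = 10.0) -> Scores:
    """Score (face, name) pairs by how often they occur within `window` seconds of
    each other, divided by sqrt(face frequency * name frequency)."""
    face_counts = Counter(f for f, _ in faces)
    name_counts = Counter(n for n, _ in names)
    pair_counts: Counter = Counter()
    for face, tf in faces:
        for name, tn in names:
            if abs(tf - tn) <= window:
                pair_counts[(face, name)] += 1
    return {(f, n): c / sqrt(face_counts[f] * name_counts[n]) for (f, n), c in pair_counts.items()}


def who_is(name: str, scores: Scores) -> str:
    """Face cluster most strongly associated with the given name."""
    candidates = {f: s for (f, n), s in scores.items() if n == name}
    return max(candidates, key=lambda f: candidates[f])


faces = [("face_A", 5), ("face_A", 100), ("face_B", 52), ("face_B", 205)]
names = [("Clinton", 3), ("Clinton", 98), ("Gore", 50), ("Gore", 200), ("Clinton", 55)]
print(who_is("Gore", associate(faces, names)))  # prints face_B
```

Any single co-occurrence is noisy (face_B also appears near one mention of "Clinton"), but repeated co-occurrence across a large news collection separates the true pairings.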

Named Entity Extraction

F. Kubala, R. Schwartz, R. Stone, and R. Weischedel, “Named Entity Extraction from Speech”, Proc. DARPA Workshop on Broadcast News Understanding Systems, Lansdowne, VA, February 1998.

CNN national correspondent John Holliman is at Hartsfield International Airport in Atlanta. Good morning, John. …But there was one situation here at Hartsfield where one airplane flying from Atlanta to Newark, New Jersey yesterday had a mechanical problem and it caused a backup that spread throughout the whole system because even though there were a lot of planes flying to the New York area from the Atlanta area yesterday, ….

Key: Place, Time, Organization/Person
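For illustration only, the sketch below tags entities in the example transcript with a small hand-built gazetteer, preferring the longest phrase when matches overlap. This is not the statistical approach of the cited paper, which learns from labeled data; the GAZETTEER contents and the tag_entities function are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer built from the example transcript.
GAZETTEER: Dict[str, str] = {
    "CNN": "Organization",
    "John Holliman": "Person",
    "Hartsfield International Airport": "Place",
    "Hartsfield": "Place",
    "Atlanta": "Place",
    "Newark": "Place",
    "New Jersey": "Place",
    "New York": "Place",
    "yesterday": "Time",
}


def tag_entities(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (phrase, type) pairs in text order; longer phrases win over overlapping shorter ones."""
    found = []
    taken = [False] * len(text)
    for phrase in sorted(gazetteer, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            if not any(taken[m.start():m.end()]):
                found.append((m.start(), phrase, gazetteer[phrase]))
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
    return [(phrase, kind) for _, phrase, kind in sorted(found)]


text = "CNN national correspondent John Holliman is at Hartsfield International Airport in Atlanta."
print(tag_entities(text, GAZETTEER))
```

A fixed list cannot handle unseen names or speech recognition errors, which is what motivates learning the patterns from labeled data instead.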

Enhancing Library Utility via Better Metadata

[Diagram: Metadata Extractor and User Interface]

Improving the Interface via Usage Context

Example: query-based thumbnail selection
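Query-based thumbnail selection can be sketched as choosing, within a matching segment, the shot whose time-aligned transcript text contains the most query terms, falling back to the first shot when no shot matches. The per-shot text input and the pick_thumbnail name are assumptions for this example.

```python
from typing import List


def pick_thumbnail(shot_texts: List[str], query: str) -> int:
    """Index of the shot whose aligned transcript text contains the most query-term
    occurrences; ties and no-match cases fall back to the earliest shot."""
    terms = {t.lower() for t in query.split()}
    best, best_hits = 0, 0
    for i, text in enumerate(shot_texts):
        hits = sum(1 for w in text.lower().split() if w.strip(".,!?\"'") in terms)
        if hits > best_hits:
            best, best_hits = i, hits
    return best


shots = [
    "Good evening, here is the news.",
    "Flood waters rose across the delta.",
    "Rescue crews reached flood victims by boat.",
]
print(pick_thumbnail(shots, "flood rescue"))  # prints 2
```

The same segment can therefore show different thumbnails for different queries, which is the point of letting usage context drive the surrogate.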

Improving Utility through End-User Control

Example: filtering a storyboard by visual concepts, with the user controlling the trade-off between precision and recall
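The precision/recall control amounts to a user-adjustable threshold on visual-concept detector scores: raising it shows fewer but more reliable shots (higher precision), lowering it shows more of the relevant shots (higher recall). A sketch, assuming per-shot confidence scores in [0, 1]; the function names, shot IDs, scores, and relevance judgments are made up for the example.

```python
from typing import Dict, List, Set, Tuple


def filter_storyboard(scores: Dict[str, float], threshold: float) -> List[str]:
    """Shots whose detector confidence meets the user's threshold, most confident first."""
    kept = [shot for shot, conf in scores.items() if conf >= threshold]
    return sorted(kept, key=lambda shot: scores[shot], reverse=True)


def precision_recall(shown: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of the displayed shots (empty sets count as perfect)."""
    hits = len(set(shown) & relevant)
    precision = hits / len(shown) if shown else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall


scores = {"s1": 0.9, "s2": 0.8, "s3": 0.6, "s4": 0.4, "s5": 0.2}
relevant = {"s1", "s2", "s4"}
for t in (0.7, 0.3):
    print(t, precision_recall(filter_storyboard(scores, t), relevant))
```

At 0.7 every displayed shot is relevant but one relevant shot is hidden; at 0.3 all relevant shots appear along with one false positive.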

Improving the Metadata via User Interaction

  • Example: collecting positive and implicit negative sets of labeled shot data for visual concepts

  • Reference: Ming-yu Chen, et al., ACM Multimedia 2005
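One plausible rule for implicit negatives, assumed here rather than taken from the cited paper: shots the user marks are positives, and unmarked shots displayed before the last marked shot are treated as seen and passed over, hence negative. Shots after the last mark may never have been looked at, so they stay unlabeled. The collect_labels name is hypothetical.

```python
from typing import List, Set, Tuple


def collect_labels(shown: List[str], marked: Set[str]) -> Tuple[List[str], List[str]]:
    """Split displayed shots into explicit positives and implicit negatives.
    Assumed rule: unmarked shots before the last marked one count as negatives."""
    positions = [i for i, shot in enumerate(shown) if shot in marked]
    if not positions:
        return [], []  # no evidence the user inspected anything
    last = positions[-1]
    positives = [shot for shot in shown if shot in marked]
    negatives = [shot for shot in shown[:last] if shot not in marked]
    return positives, negatives


print(collect_labels(["a", "b", "c", "d", "e"], {"b", "d"}))  # prints (['b', 'd'], ['a', 'c'])
```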


Video Summaries (without User Context)

  • The BBC rushes video summarization task in TRECVID 2007 and TRECVID 2008 shows how difficult the task is

  • Video summary is “a condensed version of some information, such that various judgments about the full information can be made using only the summary and taking less time and effort than would be required using the full information source”

  • Maximum summary duration: 4% of the original video (2% in TRECVID 2008)

  • Benefits of this TRECVID task: provides a reasonably large video collection to be summarized, a uniform method of creating ground truth, and a uniform scoring mechanism
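The duration cap turns summarization into a budgeted selection problem: at 4%, a 25-minute (1,500 s) rushes video allows 60 s of summary, and at 2%, only 30 s. The greedy sketch below assumes each candidate segment already carries an importance score; how to score rushes footage is the hard part and is not shown, and the summarize name is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, importance score; scoring is assumed)


def summarize(segments: List[Segment], video_seconds: float, max_fraction: float = 0.04) -> List[Segment]:
    """Greedily keep the highest-scoring segments whose total length fits within
    max_fraction of the video's duration, then return them in temporal order."""
    budget = max_fraction * video_seconds
    chosen: List[Segment] = []
    used = 0.0
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        length = seg[1] - seg[0]
        if used + length <= budget:
            chosen.append(seg)
            used += length
    return sorted(chosen)


segs = [(0, 20, 0.9), (100, 130, 0.8), (300, 325, 0.7), (500, 505, 0.5)]
print(summarize(segs, 1500))        # 60 s budget: keeps the 20 s, 30 s, and 5 s segments
print(summarize(segs, 1500, 0.02))  # 30 s budget: keeps the 20 s and 5 s segments
```

Greedy filling is simple but not optimal; it can skip a high-value segment that just misses the remaining budget, as happens to the 25 s segment here.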

BBC Rushes

  • 42 test videos (plus development videos) from the BBC Archive

  • Test videos:

    • minimum duration 3.3 minutes

    • maximum duration 36.4 minutes

    • mean duration 25 minutes

  • Raw (unedited) rushes video with a great deal of redundancy (repeated takes), mixed-quality audio, and “junk” frames

Video Summaries (with/without User Context)

  • BBC rushes video offers no user context to build from

  • However, users often provide cues as to what is important, as will be seen shortly

Storyboards: TRECVID Search Success

  • For the shot-based directed search information retrieval task evaluated at TRECVID, storyboards have consistently and overwhelmingly produced the best performance (see references in paper, e.g., [Snoek et al. 2007])

  • Motivated users can navigate thousands of shot thumbnails in storyboards, doing even better than with “extreme video retrieval” interfaces: on average, 2,487 shots per 15-minute topic in TRECVID 2006 [Christel/Yan CIVR 2007]

  • Storyboard benefits: packed visual overview, trivial interactive control needed for “overview, zoom and filter, details on demand” – Shneiderman’s Visual Information-Seeking Mantra

Beyond Fact-Finding

  • CACM April 2006 special issue on this topic

  • G. Marchionini (“Exploratory Search: From Finding to Understanding,” CACM 49, April 2006) breaks down 3 types of search activities:

    • Lookup (fact-finding; solving stated/understood need)

    • Learn

    • Investigate

  • Computer scientists and information retrieval specialists emphasize evaluation of lookup activities (NIST TREC)

  • Real-world interest in learn/investigate: for an oral history collection, library science and humanities participants at a State University of New York at Buffalo workshop were quite interested in learn/investigate activities

Exploratory Search (Demonstrations)

  • Where storyboards are still useful: visual review, e.g., of disaster field footage

  • Where storyboards fail:

    • Showing other facets like time, space, co-occurrence, named entities (When did disasters occur? Which ones? Where?)

    • Providing collection understanding: a holistic view of what is in, say, hundreds of segments comprising thousands of matching shots

    • Providing a window into visually homogeneous results, e.g., results from a color search, a corpus of only lecture slides, or head-and-shoulders interview shots

  • Claim: Storyboards are not sufficient, but are part of a useful suite of tools/interfaces for interactive video search

Anecdotal Support for Claim

  • Collected 2006-2007 from:

    • Government analysts with news data

    • History students and faculty with oral history data

  • Views Tested:

    • Timeline

    • Visualization By Example (VIBE) Plot (query terms)

    • Map View

    • Named Entity view (people, places, organizations)

    • Text-dominant views:

      • Nested Lists (pre-defined clusters by contributor)

      • Common Text (on-the-fly grouping of common phrases)
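A VIBE-style plot places each query term as a point of interest on the display and positions each document at the weighted average of those points, weighted by how strongly the document matches each term, so a document drifts toward the terms it matches most. A minimal sketch; the vibe_position name, the coordinates, and the weights are hypothetical, and the term weighting itself (e.g., term frequency) is left to the caller.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def vibe_position(pois: Dict[str, Point], weights: Dict[str, float]) -> Optional[Point]:
    """Weighted average of point-of-interest positions; None if the document matches no term."""
    total = sum(weights.get(term, 0.0) for term in pois)
    if total == 0:
        return None
    x = sum(px * weights.get(term, 0.0) for term, (px, _) in pois.items()) / total
    y = sum(py * weights.get(term, 0.0) for term, (_, py) in pois.items()) / total
    return (x, y)


pois = {"flood": (0.0, 0.0), "rescue": (10.0, 0.0), "Chicago": (5.0, 10.0)}
print(vibe_position(pois, {"flood": 3, "rescue": 1}))  # prints (2.5, 0.0)
```

One known limitation of this placement is ambiguity: documents with different term mixes can land on the same spot, so such plots are usually paired with details on demand.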

Anecdotal Results

  • 38 HistoryMakers corpus users (mostly students, 15 female, average age 24), experienced web searchers, modest digital video experience

  • 6 intelligence analysts (1 female; 2 older than 40, 3 in their 30s, 1 in 20s), very experienced text searchers, experienced web searchers, novice video searchers

  • Little use of the views apart from Common Text

  • Text titling and text transcripts used frequently

  • Some evidence of collection understanding (e.g., differences in topics between New York and Chicago), but overall, users cautiously kept the default settings in their initial trial(s)

Evaluation Hurdles

  • How does one evaluate information visualization for promoting exploratory video search?

    • Low level simple tasks vs. complex real-world tasks

    • Even the traditional measures of effectiveness, efficiency, and satisfaction are problematic: is a “fast” interface for exploration good or bad?

  • Discount HCI usability techniques offer some support, but limited ecological validity may weaken the conclusions (e.g., HCII students judged Common Text well suited for History students)

  • Look to the field of Visual Analytics for help, e.g., Plaisant

  • “First hour with the system” studies and “developer as user” insights are too limiting. Instead, consider Multi-dimensional In-depth Long-term Case studies (MILC)

Concluding Points - 1

  • “Interactive” allows human direction to compensate for automation shortcomings and varying needs

    • Interactive fact-finding better than automated fact-finding in visual shot retrieval (TRECVID)

    • Interactive computer vision has successes (Harry Shum at Microsoft, Michael Brown et al. at NUS)

    • Interactive view/facet control == ??? (too early to tell)

  • Users need scaffolding/support to get started

  • Evaluations need to be longer-term and in-depth, using case studies to determine what is beneficial (Multi-dimensional In-depth Long-term Case studies, MILC)

Concluding Points - 2

  • Storyboards work well for visual overview

  • Video surrogates can be made more effective, efficient, and satisfying when tailored to user activity (leverage context)

  • The interface should provide easy tuning of precision vs. recall

  • As cheap storage and transmission produce a wealth of digital video, exploratory search of video repositories will gain emphasis

  • Augment automatically produced metadata with human-provided descriptors (take advantage of what users are willing to volunteer, and in fact solicit additional feedback from humans through motivating games that allow for human computation, a research focus of Luis von Ahn at Carnegie Mellon University)

Games with a Purpose

  • Spearheaded by von Ahn,

  • ESP Game showed success of “human computation” for tagging imagery

  • Ongoing projects, including a current one at CMU Entertainment Technology Center, Prometheus, to further explore crowd-sourced games (
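The ESP Game's core rule is simple: two players who cannot communicate see the same image, and a word becomes a label once both have typed it, with labels already collected for that image declared taboo. The sketch replays the two players' guesses alternately as a stand-in for real timing; scoring, time limits, and passing are omitted, and the esp_round name is hypothetical.

```python
from itertools import zip_longest
from typing import List, Optional, Set


def esp_round(guesses_a: List[str], guesses_b: List[str], taboo: Set[str]) -> Optional[str]:
    """Return the first non-taboo word both players have typed, or None if they never agree.
    Guesses are replayed alternately (a, b, a, b, ...) as a stand-in for real timing."""
    seen_a: Set[str] = set()
    seen_b: Set[str] = set()
    for a, b in zip_longest(guesses_a, guesses_b):
        for guess, mine, theirs in ((a, seen_a, seen_b), (b, seen_b, seen_a)):
            if guess is None:
                continue  # this player has run out of guesses
            word = guess.lower()
            if word in taboo:
                continue  # taboo words (labels already collected) are not accepted
            if word in theirs:
                return word  # agreement: the word becomes a label for the image
            mine.add(word)
    return None


print(esp_round(["tree", "forest", "green"], ["jungle", "green", "tree"], {"tree"}))  # prints green
```

Adding each agreed label to the taboo set for later pairs pushes players toward new, more specific descriptors of the same image.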

Credits

Many members of the Informedia Project, CMU research community, and The HistoryMakers contributed to this work, including:

Informedia Project Director: Howard Wactlar

The HistoryMakers Executive Director: Julieanna Richardson

HistoryMakers Beta Testers: Joe Trotter (CMU History Dept.), SUNY at Buffalo and all UB Workshop participants: Schomburg Center for Research in Black Culture, NY Public Library, Randforce Associates, University of Illinois (3 campuses)

Informedia User Interface: Ron Conescu, Neema Moraveji

Informedia Processing: Alex Hauptmann, Ming-yu Chen, Wei-Hao Lin, Rong Yan, Jun Yang

Informedia Library Essentials: Scott Stevens, Bob Baron, Bryan Maher

  • This work was supported by the National Science Foundation under Grant Nos. IIS-0205219 and IIS-0705491