Concept-based Image and Video Retrieval • Wei-Chen Chiu • Vision Lab, NCTU • 2009-05-11
Why Concept-based Retrieval? – User Expectation • Need for image/video search • “[User expects to] type in a few words at most. Then expect the engine to bring back the perfect results. More than 95 percent of us never use the advanced search features most engines include,…” - The Search, J. Battelle, 2003 • Keyword queries are the primary search method.
Why Concept-based Retrieval? – Limitations of Content-Based Retrieval • Content-based retrieval: query-by-example • Difficult to find appropriate examples to use as initial queries • Difficult to get detailed visual descriptions • Difficult to get user feedback for refining the search • Less efficient than text retrieval
Concept-Based Retrieval – For Images • A picture is worth a thousand words - Images → bag of “words” or “semantic concepts” - Retrieve images by matching queries against semantic concepts
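A minimal sketch of the bag-of-concepts idea, assuming each image has already been reduced to concept names with detector confidences. The image IDs, concepts, and scores are hypothetical, and ranking by the summed confidence of the query's concepts is just one simple scoring choice.

```python
# Sketch: rank images represented as "bags" of semantic concepts.
# Assumption: each image maps concept name -> detector confidence in [0, 1];
# all IDs, concepts and scores below are hypothetical.

def rank_images(query_concepts, image_concepts):
    """Rank images by the summed confidence of the concepts the query mentions."""
    scores = {}
    for image_id, concepts in image_concepts.items():
        # A concept absent from an image contributes 0.
        scores[image_id] = sum(concepts.get(c, 0.0) for c in query_concepts)
    # Highest score first; ties broken by image ID for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))


images = {
    "img1": {"beach": 0.9, "sky": 0.8, "people": 0.3},
    "img2": {"building": 0.7, "sky": 0.6},
    "img3": {"beach": 0.4, "people": 0.9},
}
print(rank_images({"beach", "people"}, images))  # → ['img3', 'img1', 'img2']
```

Once images are indexed this way, a query is answered by table lookups rather than by comparing low-level visual features.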
Concept-Based Retrieval – For Video • Semantic concepts can be extracted from video using more information - Multimodal content (audio, text, and visual) is available - Temporal relations between video shots • Example concepts: People: Kofi Annan; Scene: Studio; Event: UN Meeting; Object: Tank, Jet; …
What Are (Multimedia) Semantic Concepts? • An intermediate layer of multimedia descriptors that aim to bridge the gap between users' information needs and low-level multimedia content • Wide coverage - People (face, tourists, …) - Objects (building, animals, …) - Locations (indoor, studio, …) - Events (meeting, trip, …) - Genres (weather, sports, …) - …
Why Is Concept-based Retrieval Important? • Growing volume of multimedia content • Users need semantic access - Increasing expectations for the accessibility and searchability of media content
Concept Vocabulary Design • Concept vocabulary != text vocabulary • Dimensions for evaluating/designing a concept vocabulary: - Detectability: observable from the data (not abstract, like “happy”) - Utility: useful for retrieval, categorization, or other tasks - Generality: sufficiently frequent across the data - Specificity: not too frequent (i.e., not present in most of the data) - Clarity: no ambiguity in definition - Domains: applicable/adaptable to multiple data domains
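Of these dimensions, generality and specificity are the easiest to check automatically, since both are about how often a concept occurs. A minimal sketch, assuming a set of labeled images/shots is available; the thresholds and annotations are hypothetical, not values from the slides.

```python
# Sketch: screen candidate concepts for generality (frequent enough) and
# specificity (not present almost everywhere) by annotation frequency.
# Assumption: thresholds and data are hypothetical illustrations.

def screen_concepts(annotations, min_freq=0.05, max_freq=0.8):
    """Keep concepts whose fraction of annotated items lies in [min_freq, max_freq].

    annotations: list of sets, one set of concept labels per image/shot.
    """
    n = len(annotations)
    counts = {}
    for labels in annotations:
        for concept in labels:
            counts[concept] = counts.get(concept, 0) + 1
    kept = {}
    for concept, count in counts.items():
        freq = count / n
        if min_freq <= freq <= max_freq:
            kept[concept] = freq
    return kept


# 10 hypothetical items: "outdoor" is nearly everywhere (too unspecific),
# "studio"/"anchor" appear once (too rare for the 0.15 threshold used here).
data = ([{"outdoor", "car", "road"}] * 2 + [{"outdoor", "car"}]
        + [{"outdoor"}] * 6 + [{"studio", "anchor"}])
print(sorted(screen_concepts(data, min_freq=0.15).items()))
# → [('car', 0.3), ('road', 0.2)]
```

Detectability, utility, and clarity cannot be measured by counting; they still need detector experiments or human judgment.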
Example: Standardized Concept Lexicon in TRECVID • A. Program Category • B. Setting/Scene/Site • C. People • D. Objects • E. Activities • F. Events • G. Graphics
Example: MediaMill Challenge • Sample images from the MediaMill-101 lexicon
Example: Large Scale Concept Ontology for Multimedia Understanding (LSCOM) Up to 449 Concepts!
Semantic Concept Extraction – Manual • Tagging (widely used) • Browsing (specific domains) - Associate multiple images/videos with a single keyword
Limitations of Manual Approaches • Time-consuming and labor-intensive - Tagging: 5-6 seconds per keyword - Browsing: 1.5 seconds per relevant keyword, 0.2 seconds per irrelevant one • Subjective and inaccurate for social tagging
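The per-keyword timings above translate into a back-of-the-envelope cost estimate. A minimal sketch, assuming tagging is charged once per (item, keyword) tag at the 5.5 s midpoint, and browsing shows every item once per keyword; the collection size, vocabulary size, and relevance counts are hypothetical.

```python
# Sketch: estimate manual annotation time from the slide's timings.
# Assumptions: 5.5 s midpoint for tagging; browsing visits every item once
# per keyword; the collection below is hypothetical.

TAG_SECONDS = 5.5          # midpoint of 5-6 s per keyword (tagging)
BROWSE_RELEVANT = 1.5      # seconds per relevant item (browsing)
BROWSE_IRRELEVANT = 0.2    # seconds per irrelevant item (browsing)

def tagging_time(num_items, keywords_per_item):
    """Each item is tagged by typing its keywords one by one."""
    return num_items * keywords_per_item * TAG_SECONDS

def browsing_time(num_items, num_keywords, relevant_per_keyword):
    """For each keyword, the annotator scans all items and marks relevant ones."""
    irrelevant = num_items - relevant_per_keyword
    per_keyword = (relevant_per_keyword * BROWSE_RELEVANT
                   + irrelevant * BROWSE_IRRELEVANT)
    return num_keywords * per_keyword


# 1000 images, 20-keyword vocabulary, each image relevant to 2 keywords on
# average, so each keyword is relevant to 1000 * 2 / 20 = 100 images.
print(f"tagging:  {tagging_time(1000, 2) / 3600:.1f} h")         # → tagging:  3.1 h
print(f"browsing: {browsing_time(1000, 20, 100) / 3600:.1f} h")  # → browsing: 1.8 h
```

Browsing wins here only because most items are irrelevant to each keyword; as the vocabulary grows, its cost grows linearly with the number of keywords, while either approach remains expensive at scale.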
Semantic Concept Extraction – Automatic Concept Detection • Typical approach for large-scale semantic concept detection (for images) - Feature extraction, model learning, fusion
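The three stages can be sketched end to end on toy data. This is a minimal illustration under stated assumptions: images are lists of (r, g, b) pixels in 0-255, a nearest-centroid scorer stands in for the per-concept classifiers (often SVMs) used in practice, and fusion is an equal-weight average of per-feature scores (late fusion). All data are hypothetical.

```python
# Sketch of feature extraction -> per-concept model learning -> late fusion.
# Assumptions: nearest-centroid scoring replaces a real classifier; toy data.
import math

def color_histogram(pixels, bins=2):
    """Feature 1: normalized joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins * bins
               + (g * bins // 256) * bins + (b * bins // 256))
        hist[idx] += 1
    return [h / len(pixels) for h in hist]

def mean_color(pixels):
    """Feature 2: average R, G, B scaled to [0, 1]."""
    n = len(pixels)
    return [sum(p[c] for p in pixels) / (255 * n) for c in range(3)]

FEATURES = [color_histogram, mean_color]

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(images, labels):
    """Learn, per feature, a positive and a negative centroid for one concept."""
    models = []
    for extract in FEATURES:
        feats = [extract(img) for img in images]
        pos = centroid([f for f, y in zip(feats, labels) if y])
        neg = centroid([f for f, y in zip(feats, labels) if not y])
        models.append((extract, pos, neg))
    return models

def score(models, image):
    """Late fusion: average per-feature scores (> 0 leans toward the concept)."""
    per_feature = []
    for extract, pos, neg in models:
        f = extract(image)
        per_feature.append(math.dist(f, neg) - math.dist(f, pos))
    return sum(per_feature) / len(per_feature)


# Hypothetical training data for the concept "sky": bluish vs. greenish images.
sky1, sky2 = [(30, 80, 220)] * 10, [(60, 120, 250)] * 10
grass1, grass2 = [(40, 180, 50)] * 10, [(70, 200, 60)] * 10
models = train([sky1, sky2, grass1, grass2], [True, True, False, False])
print(score(models, [(50, 100, 230)] * 10) > 0)  # → True
print(score(models, [(50, 190, 40)] * 10) > 0)   # → False
```

Averaging scores after each feature has its own model is late fusion; concatenating the feature vectors before training a single model would be the early-fusion alternative.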
Semantic Concept Extraction – Multi-Concept Relational Modeling • Semantic concepts are not isolated, e.g., “car” & “road” • Jointly model the relationships across multiple concepts - Ontology-based learning [Wu et al., ICME’04] - Probabilistic graphical models [Yan et al., ICME’06] - Graph-based learning [Qi et al., MM’07] - Boosted conditional random fields [Jiang et al., ICASSP’07]
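To make the “car” & “road” intuition concrete, here is a deliberately simple heuristic: re-score each concept using its co-occurrence statistics with other concepts. This is plain co-occurrence smoothing, not any of the cited methods; the annotations, detector scores, and blending weight `alpha` are hypothetical.

```python
# Sketch: refine per-concept detector scores with concept co-occurrence.
# Assumption: a simple linear blend, not the cited ontology/graphical-model
# methods; all data and alpha are hypothetical.

def cooccurrence(annotations, concepts):
    """Estimate P(c | d): fraction of items labeled d that are also labeled c."""
    cond = {}
    for d in concepts:
        with_d = [labels for labels in annotations if d in labels]
        for c in concepts:
            if c != d:
                hits = sum(c in labels for labels in with_d)
                cond[(c, d)] = hits / len(with_d) if with_d else 0.0
    return cond

def refine(scores, cond, alpha=0.3):
    """Blend each raw score with the strongest supporting evidence from other concepts."""
    refined = {}
    for c, s in scores.items():
        # Evidence for c from d: P(c | d) times the detector's confidence in d.
        context = max((cond[(c, d)] * scores[d] for d in scores if d != c),
                      default=0.0)
        refined[c] = (1 - alpha) * s + alpha * context
    return refined


train_labels = [{"car", "road"}, {"car", "road"}, {"road"}, {"sky"},
                {"car", "road", "sky"}]
cond = cooccurrence(train_labels, ["car", "road", "sky"])
raw = {"car": 0.4, "road": 0.9, "sky": 0.2}   # hypothetical detector outputs
print(refine(raw, cond))  # "car" is pulled up by the confident "road" score
```

A weakness of this linear blend is that confident scores are also pulled toward their context (here "road" drops); the joint models listed on the slide learn such trade-offs from data instead of fixing `alpha` by hand.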
Semantic Concept Extraction – Spatial-Temporal Context Modeling • Temporal context: constraints based on temporal relationships (e.g., for events) - Model the evolution of concepts over time, e.g., “airplane landing”: sky → grass → runway • Spatial context: constraints based on spatial relationships (e.g., for images) - Assume nearby image blocks share similar concepts • Common modeling choice: probabilistic graphical models - Basic: HMM, MRF, CRF - Advanced: Hierarchical HMM, 2D-HMM, …
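Temporal context with an HMM can be illustrated on the “airplane landing” example. The sketch below runs Viterbi decoding over consecutive shots to find the most likely concept sequence. It assumes hypothetical start and transition probabilities that encode the sky → grass → runway order, and treats per-shot detector outputs as emission probabilities.

```python
# Sketch: HMM Viterbi decoding of scene concepts over a sequence of shots.
# Assumptions: all probabilities are hypothetical; per-shot detector outputs
# play the role of emission probabilities P(observation | state).
import math

STATES = ["sky", "grass", "runway"]
START = {"sky": 0.8, "grass": 0.1, "runway": 0.1}
# Mostly left-to-right transitions encode the landing order.
TRANS = {
    "sky":    {"sky": 0.6, "grass": 0.35, "runway": 0.05},
    "grass":  {"sky": 0.05, "grass": 0.55, "runway": 0.4},
    "runway": {"sky": 0.01, "grass": 0.04, "runway": 0.95},
}

def viterbi(emissions):
    """emissions: list over shots of {state: P(observation | state)}."""
    # Log space avoids numeric underflow on long shot sequences.
    best = {s: math.log(START[s]) + math.log(emissions[0][s]) for s in STATES}
    back = []
    for obs in emissions[1:]:
        prev, best, ptr = best, {}, {}
        for s in STATES:
            # Best predecessor r for state s at this shot.
            p, arg = max((prev[r] + math.log(TRANS[r][s]), r) for r in STATES)
            best[s] = p + math.log(obs[s])
            ptr[s] = arg
        back.append(ptr)
    # Trace back from the most likely final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


shots = [
    {"sky": 0.9, "grass": 0.05, "runway": 0.05},
    {"sky": 0.2, "grass": 0.7, "runway": 0.1},
    {"sky": 0.45, "grass": 0.4, "runway": 0.15},  # detector glitch: favors sky
    {"sky": 0.05, "grass": 0.15, "runway": 0.8},
]
print([max(o, key=o.get) for o in shots])  # → ['sky', 'grass', 'sky', 'runway']
print(viterbi(shots))                      # → ['sky', 'grass', 'grass', 'runway']
```

Per-shot decisions flip back to “sky” on the ambiguous third shot, while the temporal model keeps the sequence consistent with the landing order. The spatial case replaces this chain with a 2D neighborhood of image blocks (e.g., an MRF or 2D-HMM), where exact decoding is generally harder.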
Semantic Concept Extraction – Many Other Methods • Including: - Active learning - Cross-domain adaptation/transfer learning - Semi-supervised learning - Learning with side information - … • Still a challenging research problem!
Retrieval by Semantic Concepts • Match text queries with a fixed set of semantic concepts
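The matching step can be as simple as looking up query words in the concept vocabulary. A minimal sketch, assuming exact word matching plus a hand-built synonym table; the vocabulary and synonyms are hypothetical, and practical systems may use lexical resources or statistical query-concept similarity instead.

```python
# Sketch: map a free-text query onto a fixed concept vocabulary.
# Assumption: exact word match plus a hypothetical synonym table.
import re

VOCABULARY = {"car", "road", "sky", "building", "meeting", "studio"}
SYNONYMS = {"automobile": "car", "vehicle": "car", "street": "road",
            "conference": "meeting", "skyscraper": "building"}

def query_to_concepts(query):
    """Return the vocabulary concepts mentioned (directly or via a synonym)."""
    words = re.findall(r"[a-z]+", query.lower())
    concepts = set()
    for w in words:
        if w in VOCABULARY:
            concepts.add(w)
        elif w in SYNONYMS:
            concepts.add(SYNONYMS[w])
    return concepts


print(sorted(query_to_concepts("Find an automobile on a street")))  # → ['car', 'road']
```

Words outside the vocabulary and synonym table (including plurals such as “cars”, absent stemming) are silently dropped, which is the main limitation of matching against a fixed concept set. The matched concepts can then be used to rank images or shots by the corresponding detector scores.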