
Visual Information System

Visual Information System. visual information retrieval (VIR) Lilian Tang. Computational steps for visual retrieval systems. image processing (colour, texture etc) human perception and computer perception (computer vision) Sensory gap features definition, extraction


Presentation Transcript


  1. Visual Information System: Visual Information Retrieval (VIR). Lilian Tang

  2. Computational steps for visual retrieval systems • image processing (colour, texture, etc.) • human perception and computer perception (computer vision) • the sensory gap • feature definition and extraction • low-level and high-level features • content, semantics, and concepts • small scale and large scale • knowledge domain, knowledge elicitation, knowledge discovery and management • similarity measures, learning from feedback, and dynamic indexing • databases and system architecture • evaluation: not just system performance, but insights for the future

  3. VIR and Traditional Databases? • The basic element of a traditional SQL database is a data item in a relation, e.g.: select name from employee, project where employee.deptnumber = '25' AND project.number = '100' • Databases exploit known structures and relations • DBMS retrieval is not probabilistic • How is this different from the WWW? • And from traditional IR?

  4. VIR and Traditional IR systems? • IR systems can be considered the precursors to VIR • The basic unit of an IR system is a document, and the focus is on textual retrieval • exact matching: Boolean, text pattern searching • inexact matching: probabilistic, vector space, clustering • Visual information has its own characteristics that traditional IR cannot handle
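The exact (Boolean) matching mentioned on this slide can be sketched as follows. This is a minimal illustration, assuming whitespace tokenisation and an in-memory inverted index; the function names and the sample documents are hypothetical and not part of the original slides.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, terms):
    """Exact matching: return ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return set()
    return set.intersection(*postings)

# Hypothetical three-document collection, for illustration only.
docs = {
    1: "red sunset over the sea",
    2: "red car on the road",
    3: "sunset behind the mountains",
}
index = build_inverted_index(docs)
matches = boolean_and(index, ["red", "sunset"])
```

A document either satisfies the Boolean query or it does not; there is no notion of a partial or ranked match, which is what the inexact models (probabilistic, vector space) add.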

  5. Recap IR: What is IR? • Motivation • the larger the holdings of the archive, the more useful it is • however, the harder it is to find what you want • IR is all about finding what you want when what you want is buried in a mass of what you don’t want

  6. from Lesk, http://community.bellcore.com/lesk/columbia/session2/

  7. Simple IR Model [Figure: block diagram of a simple IR model. Labels recoverable from the slide: User; Query; Results; Boolean, Vector, Feedback; Ranking, Clustering, Weighting; Stemming, Thesaurus, Signature; Pre-Processing; Post-Processing; Searching (Boolean, Vector); Storage (Flat Files, Inverted Files, Signature Files, PAT Trees); Collection & Processing Stuff (Stemming, Stoplist)]

  8. Recap IR: Precision and Recall • Precision • “ratio of the number of relevant documents retrieved over the total number of documents retrieved” • how much extra stuff did you get? • Recall • “ratio of relevant documents retrieved for a given query over the number of relevant documents for that query in the database” • how much did you miss?
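The two ratios defined on this slide can be computed directly from sets of document ids. The sketch below is illustrative only; the function name and the example ids are assumptions, not taken from the slides.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: ids of the documents the system returned
    relevant:  ids of all documents in the database relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: 4 documents returned, 5 relevant in total, 2 in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
```

Here half of what was returned is relevant (precision 0.5), while only two of the five relevant documents were found (recall 0.4).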

  9. Recap IR: Text Retrieval • The most popular approach is to extract keywords from each text document in the database to form the indices of the document. • The keyword extraction process may be divided into three major steps: stopword removal, stemming, and word weighting • stopword removal: discards very common words such as “a”, “an” and “the”. • stemming: removes the suffix and prefix of each word, leaving its stem. • word weighting: estimates the weight (importance) of each word.
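The three keyword extraction steps can be sketched as below. Several choices here are assumptions made for illustration and are not specified on the slide: the tiny stopword list, the naive suffix-stripping rule (real stemmers such as Porter's are far more careful), and TF-IDF as the weighting scheme.

```python
import math
from collections import Counter

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "is", "are"}
# Assumed naive suffix list; only suffixes are stripped in this sketch.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_keywords(text):
    """Steps 1 and 2: stopword removal followed by stemming."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return [stem(w) for w in words if w and w not in STOPWORDS]

def tf_idf_index(docs):
    """Step 3: weight each keyword by term frequency times log(N / document frequency)."""
    keyword_lists = {doc_id: extract_keywords(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for keywords in keyword_lists.values():
        doc_freq.update(set(keywords))
    index = {}
    for doc_id, keywords in keyword_lists.items():
        term_freq = Counter(keywords)
        index[doc_id] = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
    return index

keywords = extract_keywords("The cats are chasing a ball")
index = tf_idf_index({1: "the cat sat", 2: "the cats played"})
```

With this weighting, a keyword that occurs in every document (here the stem "cat") gets weight zero, since it does not help to tell documents apart.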

  10. Recap IR: Text Retrieval • The query goes through the same procedure • Similarity matching: the similarity is calculated from the pre-computed weights of the matched keywords. • All documents with a similarity value higher than a certain threshold are considered relevant and returned to the user. • These relevant documents may be ranked according to their similarity values when presented to the user. (Most web search engines do this.)
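Thresholding and ranking by similarity can be sketched as follows. Cosine similarity over sparse term-weight vectors is one common choice and is an assumption here, as are the threshold value and the example vectors; the slide does not name a specific measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors (dicts)."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, threshold=0.2):
    """Keep documents scoring above the threshold, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    relevant = [(doc_id, score) for doc_id, score in scored if score > threshold]
    return sorted(relevant, key=lambda pair: pair[1], reverse=True)

# Hypothetical pre-computed keyword weights for three documents.
doc_vecs = {
    "d1": {"sunset": 1.0, "sea": 1.0},
    "d2": {"sunset": 1.0, "mountain": 2.0},
    "d3": {"car": 1.0, "road": 1.0},
}
results = retrieve({"sunset": 1.0, "sea": 0.5}, doc_vecs)
```

In this example "d3" shares no keyword with the query and falls below the threshold, while "d1" ranks above "d2" because it matches both query terms.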

  11. Multimedia Information Retrieval • MIR is considered to be of a totally different nature. Effective, efficient, intuitive and accurate retrieval is critical for a large archive in distributed systems and on the web. • Key issue: how to create the indices of the multimedia information • Indexing: describe (or index) the MM information. • Retrieval: measure the similarity between a user query and the indices of the MM information.

  12. Visual Information Retrieval: keyword approach • It is difficult for text to capture the perceptual saliency of some visual features • Pictures cannot speak, but they are stronger than words. • Text is not well suited for modelling perceptual similarity. • Text descriptions are subjective. “What is needed in these cases is the use of a more concrete description of visual content, one more closely related to human perception, and a new way of interaction that fully exploits human perception capabilities.”

  13. Visual Information Retrieval: content-based approach • Textual content: free text search • Image content: image features, shapes, colours, textures, spatial relationships • Video content: motion, image features, scene composition, video semantics, audio, etc.
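As an example of an image-content feature, a colour histogram can serve as a simple index that is compared across images. The sketch below assumes an image is given as a list of (r, g, b) tuples with values in 0..255 and uses histogram intersection as the similarity; both the representation and the measure are illustrative assumptions, not methods named on the slide.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of an image given as (r, g, b) tuples."""
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1, h2):
    """Similarity in 0..1: 1.0 for identical normalised histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical 4-pixel "images", for illustration only.
red = color_histogram([(255, 0, 0)] * 4)
also_red = color_histogram([(250, 10, 5)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
```

The histogram ignores where colours appear in the image, so it captures colour content but none of the shape or spatial relationships listed above.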

  14. Content-Based Image Retrieval • As happens during the maturation process of many a discipline, after early successes in a few applications, research is now concentrating on deeper problems, challenging the hard problems at the crossroads of the disciplines from which it was born: computer vision, databases, and information retrieval (Arnold 2000). • Deeper analysis is needed and semantics is more desirable: make use of domain knowledge

  15. Domain and Variability • A narrow domain has a limited and predictable variability in all relevant aspects of its appearance. • Semantics is well-defined, and unique. • A broad domain has an unlimited and unpredictable variability in its appearance even for the same semantic meaning • Semantics is more ambiguous, and partial • Need more contextual information

  16. Domain and Variability • The notions of broad and narrow domains are helpful in characterizing patterns of use, in selecting features, and in designing systems. • For narrow, specialized image domains, the gap between features and their semantic interpretation is usually smaller, so domain-specific models may help. • In a broad image domain, the gap between the feature description and the semantic interpretation is generally wide, and the required number of computational variables would be enormous. • Research issues raised...

  17. Research issues • How to handle variability? • Multiple processors and fusion process? • Inference engines?

  18. Domain Knowledge • Laws of syntactic (literal) equality and similarity define the relation between image pixels or image features regardless of their physical or perceptual causes. • Laws of perception describe the human perception of equality and similarity. • Physical laws describe equality and difference of images under differences in sensing and object surface properties. The physics of illumination, surface reflection, and image formation have a general effect on images. • Geometric and topological rules describe equality and differences of patterns in space. • Category-based rules encode the characteristics common to class z of the space of all notions Z. • Finally, man-made customs or man-related patterns introduce rules of culture-based equality and difference.

  19. Difficulties in VIS: the sensory gap • The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene. • disambiguation processing

  20. An infinite number of 3D drawings can give rise to the same image (C1)

  21. Difficulties in VIS: the semantic gap • The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation. (Arnold, 2000)

  22. The Semantic Gap • A linguistic description is almost always contextual, whereas an image may live by itself. • Associate higher-level semantics with data-driven observables • Labelling is seldom complete, is context sensitive, and, in any case, there is a significant fraction of requests whose semantics can't be captured by labelling alone. Both methods will cover the semantic gap only in isolated cases. • This works well in a narrow domain such as I-Browse, though it is not a perfect solution

  23. From broad domain to narrow domain • The challenge for image search engines on a broad domain is to tailor the engine to the narrow domain the user has in mind via specification, examples, and interaction.

  24. Bridging the Gap • New challenges in content-based retrieval are the huge number of objects to search among, the incomplete query specification, the incomplete image description, and the variability of sensing conditions and object states. • The aim of content-based retrieval systems must be to provide maximum support in bridging the semantic gap between the simplicity of available visual features and the richness of the user semantics. • The broader the domain, the more browsing or search by association can be the right solution. The narrower the domain, the more likely an application of domain knowledge will succeed.

  25. Video Retrieval • There are three major processes to prepare a video for retrieval: video segmentation, index extraction and keyframe extraction. • From another perspective, video retrieval could be considered simpler than image retrieval, since video reveals its objects more easily: the points corresponding to one object move together. • In addition, video has a linear timeline, as important to the narrative structure of video as it is in text.

  26. Video Retrieval • Video segmentation divides the video into a number of segments by detecting the camera breaks. • Index extraction: manual indexing, image analysis, computer vision, and object recognition • Keyframe extraction selects representative image frames from each video segment to represent the segment. These keyframes may be used for browsing and for presentation.
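Segmentation by camera-break detection and keyframe selection can be sketched as below. The slide does not say how breaks are detected or which frame is chosen, so the following are assumptions for illustration: each frame is a non-empty flat list of grey levels in 0..255, a break is declared when the histogram difference between consecutive frames exceeds a threshold, and the middle frame of each segment is taken as its keyframe.

```python
def grey_histogram(frame, bins=8):
    """Normalised grey-level histogram of a non-empty frame (flat list of 0..255 values)."""
    hist = [0.0] * bins
    for value in frame:
        hist[value * bins // 256] += 1.0
    return [h / len(frame) for h in hist]

def detect_camera_breaks(frames, threshold=0.5):
    """Return every index i at which frame i starts a new segment."""
    hists = [grey_histogram(frame) for frame in frames]
    breaks = []
    for i in range(1, len(hists)):
        # Half the L1 distance between normalised histograms lies in 0..1.
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i])) / 2.0
        if diff > threshold:
            breaks.append(i)
    return breaks

def segments_and_keyframes(frames, threshold=0.5):
    """Return [((start, end), keyframe_index), ...] with end exclusive."""
    if not frames:
        return []
    starts = [0] + detect_camera_breaks(frames, threshold)
    ends = starts[1:] + [len(frames)]
    # Assumed keyframe rule: the middle frame of each segment.
    return [((s, e), (s + e - 1) // 2) for s, e in zip(starts, ends)]

# Hypothetical clip: three dark frames followed by two bright frames.
dark, bright = [10] * 16, [240] * 16
result = segments_and_keyframes([dark, dark, dark, bright, bright])
```

A plain histogram difference reacts to abrupt cuts only; gradual transitions such as fades or dissolves would need a different detection rule.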
