    1. Multimedia Information Retrieval Joemon M Jose Information Retrieval Group Department of Computing Science University of Glasgow

    2. 2 Parts • Part 1- Information Retrieval Concepts & Multimedia Retrieval • Part 2- Interactive Retrieval of Multimedia documents (focusing on the searcher) Multimedia Retrieval

    3. Part 1- Outline • Information Retrieval – Basic Concepts • Retrieval Models/weighting • Evaluation • Multimedia Retrieval • Evaluation • Low-level features • Retrieval Models

    4. Scientific Paradigm of IR • Top-down approach: theory → experiments → real-life applications • Bottom-up approach: from real-life applications back to theory • Basic research in IR is concerned with the design of better systems

    5. Text Retrieval System [Diagram: documents → indexing → indexed documents; queries and indexed documents → similarity computation → retrieved documents] • Relevance – the relation between query and document representation

    6. Major Components • Architecture of the System • Retrieval Models • Document representation • Similarity Scheme • Query representation • Query Interface

    7. IR System Architecture • Query and documents pass through the same pipeline: tokenisation, stop-word removal, stemming, indexing • The resulting query features are matched against the indexing features held in storage (an inverted index) • Effectiveness aspect: each term maps to the documents containing it (e.g. Term 1 → di, dj, dk) and matching produces scores s1 > s2 > s3 > … • Efficiency aspect: the inverted index makes this matching fast
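The pipeline on this slide can be sketched in a few lines of Python. This is a toy illustration: the document texts and stop-word list are invented, and stemming is omitted.

```python
from collections import defaultdict

def tokenize(text):
    # Simplified pipeline: lowercase and drop a few stop words.
    # A real system would also stem terms (e.g. with the Porter stemmer).
    stop_words = {"the", "a", "of", "in", "and", "is"}
    return [t for t in text.lower().split() if t not in stop_words]

def build_inverted_index(docs):
    # Inverted index: each term maps to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

docs = {
    "d1": "multimedia information retrieval",
    "d2": "retrieval of video documents",
    "d3": "image indexing and retrieval",
}
index = build_inverted_index(docs)
print(sorted(index["retrieval"]))  # ['d1', 'd2', 'd3']
```

At query time, only the postings lists of the query terms need to be touched, which is the efficiency point the slide makes.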

    8. Retrieval Models An IR model defines: • a model for document representation • a model for query representation • a mechanism for estimating the relevance of a document for a given query

    9. Zipf’s Distribution • Associate with each word w its frequency F(w), the number of times it occurs anywhere in the corpus • Imagine that we’ve sorted the vocabulary according to frequency • George Kingsley Zipf became famous for noticing that this distribution is the same for any large sample of natural language we might consider
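The rank/frequency sort described above is easy to reproduce on a toy corpus (the sentence below is invented for illustration):

```python
from collections import Counter

def rank_frequency(tokens):
    # Sort the vocabulary by descending frequency. Zipf's law says the
    # frequency is roughly proportional to 1 / rank in large corpora.
    return Counter(tokens).most_common()

text = "the cat sat on the mat the cat ran"
ranked = rank_frequency(text.split())
print(ranked[0])  # ('the', 3) — the top-ranked word
```

On a genuinely large sample, plotting frequency against rank on log-log axes gives an approximately straight line.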

    10. Plotting Word Frequency by Rank • Main idea: Count (Frequency) • How many times tokens occur in the text • Over all texts in the collection • Now rank these according to how often they occur.

    11. The Corresponding Curve (rank: frequency, stemmed term): 1: 37 system; 2: 32 knowledg; 3: 24 base; 4: 20 problem; 5: 18 abstract; 6: 15 model; 7: 15 languag; 8: 15 implem; 9: 13 reason; 10: 13 inform; 11: 11 expert; 12: 11 analysi; 13: 10 rule; 14: 10 program; 15: 10 oper; 16: 10 evalu; 17: 10 comput; 18: 10 case; 19: 9 gener; 20: 9 form

    12. Word Frequency vs. Resolving Power (from van Rijsbergen 79) The most frequent words are not the most descriptive.

    13. Resolving Power • Why do some words occur more frequently, and how can such statistics be exploited when building an index automatically? • … the frequency of a word occurrence in an article furnishes a useful measurement of word significance [Luhn 1957] • Two critical factors • Word frequency within a document (TF) • Collection frequency (inverse document frequency)
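The two factors combine into the classic TF-IDF weight. This is a minimal sketch using the plain logarithmic formulation; the exact variant differs between systems, and the example numbers are invented:

```python
import math

def tf_idf(term_freq, n_docs, docs_with_term):
    # Weight = term frequency within the document (TF) times the
    # inverse document frequency across the collection (IDF).
    idf = math.log(n_docs / docs_with_term)
    return term_freq * idf

# A term occurring 5 times in a document, appearing in 10 of 1000 documents:
weight = tf_idf(5, 1000, 10)  # 5 * ln(100) ≈ 23.03
```

A term that occurs in every document gets IDF = ln(1) = 0, capturing the point above that the most frequent words are not the most descriptive.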

    14. Information Retrieval Models • Examples • Boolean Model • Vector-space model • Probabilistic models (OKAPI BM25) • Probabilistic Ranking Principle • Logic Models • More expressive power • Language Models • Divergence from Randomness model (DFR)
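As one concrete example, the OKAPI BM25 score of a document is a sum of per-term contributions of roughly this shape. This is a sketch: k1 and b are the usual tuning parameters, implementations vary in the IDF variant used, and the example numbers are invented.

```python
import math

def bm25_term_score(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # One term's contribution to a document's BM25 score.
    # tf: term frequency in the document; df: document frequency in the collection.
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Term-frequency saturation with document-length normalisation:
    norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm

score = bm25_term_score(tf=3, df=50, n_docs=10000, doc_len=120, avg_doc_len=100)
```

The saturation term means the score grows sublinearly in tf, unlike raw TF weighting.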

    15. Concepts Visited • Architecture of the System • Efficiency Issues • Effectiveness Issues • Relevance • Retrieval Model • Document Representation • Query Representation • Similarity Computation • Concept of feature weighting

    16. Advances & Applications • Applications: web search, desktop search, email search • Advances: ranking schemes (BM-25); retrieval models (language models, divergence from randomness) • Croft et al., Search Engines: Information Retrieval in Practice, Addison Wesley, 2009 • Manning et al., Introduction to Information Retrieval, Cambridge University Press, 2008

    17. Evaluation Methodology in IR

    18. Why Evaluate? • To prove that your ideas/approach are better than someone else's • To decide between alternate methods • To tune/train/optimize your system • To discover points of failure • Types of Evaluation • System Evaluation • User Evaluation • Evaluation in operational setting

    19. Evaluation – “Classical approach” • Cranfield Paradigm/TREC • Collection of documents • Queries and Ground truth • Measures • Measurement of performance • Precision, recall, F-measure, MAP • Systems

    20. Test Collection/Corpora • Collections of documents that also have associated with them • a set of queries for which relevance assessments are available • Experts in the domain provide such judgements • Documents? What are they? • Genuine documents like email messages, journal articles, memos etc., the kind of material in which we tend to look for information • Queries? What are they? • The kind of questions users of such a collection would ask! How do we capture them? Through the co-operation of the user population. • Until 1992, smaller textual collections were used

    21. TREC: Text Retrieval Conference • Started in 1992, organised by NIST, USA and funded by the US government. • Introduced a new standard for retrieval system evaluation • millions of documents, GBs of data • USA government reports, emails, scientific abstracts, news wires • avoid exhaustive assessment of documents using the pooling method.

    22. Pooling Method • Basic idea is to use different search engines independently and pool their results to form a set of documents that have at least a recommendation of potential relevance • The top-ranked 100 documents from each search engine are grouped together and manually assessed for relevance • Assessors were retired security analysts (from the CIA) • Un-assessed documents are assumed to be irrelevant
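The pooling step can be sketched as follows; the run contents are invented and the depth is a parameter (TREC used the top 100):

```python
def pool_results(runs, depth=100):
    # Union of the top-`depth` documents from each system's ranked list.
    # Only pooled documents are judged; the rest are assumed non-relevant.
    pool = set()
    for ranked_list in runs:
        pool.update(ranked_list[:depth])
    return pool

runs = [["d1", "d2", "d3"], ["d2", "d4"], ["d5", "d1"]]
print(sorted(pool_results(runs, depth=2)))  # ['d1', 'd2', 'd4', 'd5']
```

The pool is far smaller than the collection, which is what makes manual assessment feasible at TREC scale.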

    23. The TREC Objectives • Provide a common ground for comparing different IR techniques • Same set of documents and queries, and same evaluation method • Sharing of resources and experiences in developing the benchmark • With major sponsorship from government to develop large benchmark collections • Encourage participation from industry and academia • Development of new evaluation techniques, particularly for new applications • Retrieval, routing/filtering, non-English collection, web-based collection, question answering, video collections

    24. IR Functionality? • Given a query, the IR system provides a ranked list after searching the underlying collection of documents • The assumption is • A better system provides a better ranked list • A better ranked list satisfies the users better overall • So? • How do we compare ranked lists of documents? • How do we verify whether one system is more effective than another?

    25. Effectiveness measure? • The function of an IR system is to: • retrieve all relevant documents • measured by recall • retrieve no non-relevant documents • measured by precision • Effectiveness Measures • Recall & Precision Van Rijsbergen, Information Retrieval, Chapter 7 on Evaluation

    26. Relevant vs. Retrieved [Venn diagram: the retrieved set and the relevant set within the set of all docs]

    27. Precision vs. Recall [Venn diagram: precision and recall measured on the overlap of the retrieved and relevant sets]
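From the retrieved and relevant sets, the two measures are simple ratios (the document ids here are hypothetical):

```python
def precision_recall(retrieved, relevant):
    # precision = fraction of retrieved documents that are relevant;
    # recall    = fraction of relevant documents that were retrieved.
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)  # 0.5 and 2/3: two of four retrieved are relevant, two of three relevant found
```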

    28. Why Precision and Recall? Get as much good stuff while at the same time getting as little junk as possible.

    29. Trade-off between Recall and Precision [Plot: precision (0 to 1) vs. recall (0 to 1)] • High precision, low recall: returns relevant documents but misses many useful ones too • High recall, low precision: returns most relevant documents but includes lots of junk • The ideal: high precision and high recall


    31. Recall/Precision Curve • Interpolate a precision value for each standard recall level: • rj ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0} • r0 = 0.0, r1 = 0.1, …, r10 = 1.0 • The interpolated precision at the j-th standard recall level is the maximum known precision at any recall level between the j-th and (j+1)-th level: P(rj) = max { P(r) : rj ≤ r ≤ rj+1 }
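A sketch of the interpolation, using the common formulation "maximum precision at any recall level at or above rj"; the observed (recall, precision) points below are invented:

```python
def interpolated_precision(points, recall_level):
    # points: observed (recall, precision) pairs for one query.
    # Interpolated precision at r = max precision at any recall >= r.
    return max((p for r, p in points if r >= recall_level), default=0.0)

observed = [(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.44), (1.0, 0.5)]
levels = [i / 10 for i in range(11)]  # the 11 standard recall levels
curve = [interpolated_precision(observed, r) for r in levels]
print(curve[0], curve[10])  # 1.0 0.5
```

Note how the interpolation smooths out the dip at recall 0.8, since a higher precision (0.5) is known at recall 1.0.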

    32. Interpolating a Recall/Precision Curve: An Example [Plot: interpolated precision vs. recall, both axes from 0.0 to 1.0]

    33. Average Recall/Precision Curve • Typically average performance over a large set of queries. • Query characteristics vary? • Length, term distribution etc • Document characteristics vary? • Compute average precision at each standard recall level across all queries. • Plot average precision/recall curves to evaluate overall system performance on a document/query corpus.

    34. Precision/Recall Curves

    35. Mean Average Precision (MAP) • Average of the precision values obtained after each relevant document is retrieved • If a relevant document is not retrieved, its precision counts as 0 • NOT the average of precision at the standard 11 recall points • Mean Average Precision (MAP) • The mean of average precision across all queries
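The definition above translates directly into code (the ranked lists and relevance judgements are invented):

```python
def average_precision(ranked, relevant):
    # Average of the precision values at the rank of each relevant document;
    # relevant documents never retrieved contribute a precision of 0
    # (we divide by the total number of relevant documents).
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(runs):
    # MAP: mean of per-query average precision over all queries.
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ap = average_precision(["d1", "d2", "d3", "d4"], ["d1", "d3"])
print(ap)  # (1/1 + 2/3) / 2 ≈ 0.833
```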

    36. F-Measure • One measure of performance that takes into account both recall and precision • Harmonic mean of recall and precision: F = 2PR / (P + R) • Compared to the arithmetic mean, both need to be high for the harmonic mean to be high
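A sketch of the measure; the example values are invented to show why the harmonic mean punishes imbalance:

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall: dominated by the smaller
    # value, so both must be high for F to be high.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.5, 0.5))  # 0.5
print(f_measure(0.9, 0.1))  # ≈ 0.18, while the arithmetic mean would be 0.5
```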

    37. trec-eval • trec-eval is a publicly available program, developed by Chris Buckley, used extensively by TREC and other evaluation campaigns, which computes many useful metric values based on standardised file input formats • It’s available, multi-platform, and easy to use, so use it!

    38. Concepts Visited • Effectiveness measures • Precision, Recall • Precision-Recall curves • Single Measures • MAP, F-Measure • Portable test collections • Pooling method to assemble test collections

    39. Multimedia Evaluation • Evaluation Challenge • Comparability • Unlike text, lots of dependency on content • Domain • Features extracted • Corel data set • Some categories are easy! • Effect on retrieval • Image annotation evaluation: using a normalised collection, it was found that SVMs with global features performed better than state-of-the-art image annotation algorithms! • Konstantinos et al., A Framework For Evaluating Automatic Image Annotation Algorithms, in ECIR 2010 (LNCS 5993)

    40. TRECVid: Video IR Evaluation • In 2001, “video retrieval” started as a TREC track • Usual TREC mode of operation (data, topics, search submissions, pooling, evaluation by metrics, workshop) • In 2003 TRECVid separated from TREC because it was sufficiently different and had enough participation, though the TREC and TRECVid workshops are co-located • 2003-2004: US broadcast news (CNN, ABC World News) • 2005-2006: international broadcast news • 2007-2009: Dutch Sound & Vision data set (infotainment)

    41. Major responsibilities • NIST: Organize data, tasks, and other resources of interest with input from sponsors and participating researchers • Select and secure available data for training and test • Define user and system tasks, submission formats, etc. • LDC: Collect, secure IPR, prepare, distribute data • NIST: Define and develop evaluation infrastructure • Create shot boundary ground truth • Create and support interactive judging software for features and search • Create and support the scoring software for all tasks • Researchers: create common resources & share • Researchers: Develop systems • Researchers: Run systems on the test data and submit results to NIST • NIST: Evaluate submissions • Run automatic shot boundary scoring software • Manage the manual judging by contractors viewing a sample of system output (~76,000 shots for features, ~78,000 shots for search) • NIST, Researchers: Analyze and present results • NIST: Organize and run annual workshop in mid-November at NIST Slides from Alan Smeaton

    42. TRECVid 2010 Data • A new set of videos characterized by a high degree of diversity in creator, content, style, production qualities, original collection device/encoding, language, etc - as is common in much "web video". • The collection also has associated keywords and descriptions provided by the video donor. The videos are available under Creative Commons licenses from the Internet Archive. • TREC VID 2010 • Known-item search task (interactive, manual, automatic) • Semantic indexing • Content-based multimedia copy detection • Event detection in airport surveillance video • Instance search • A number of datasets are available for these tasks • Details at

    43. Multimedia Information Retrieval


    45. Image Retrieval System: Architecture • Archival side: input image → transformation → intermediate image representation → indexing → image database • Retrieval side: query image → transformation → intermediate image representation → matching against the database → retrieved images • General architecture of a typical image archival and retrieval system

    46. Luminance Histogram • Represents the relative frequency of occurrence of the various gray levels in the image • For each gray level, count the # of pixels having that level • Can group nearby levels to form a big bin & count the # of pixels in it (From Matlab Image Toolbox Guide, Fig. 10-4)
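The binned counting described above can be sketched directly; the tiny "image" below is an invented list of 8-bit gray levels:

```python
def luminance_histogram(pixels, n_bins=8, max_level=255):
    # Group nearby gray levels into n_bins equal-width bins over
    # 0..max_level and count the pixels falling into each bin.
    bin_width = (max_level + 1) / n_bins
    hist = [0] * n_bins
    for p in pixels:
        hist[min(int(p / bin_width), n_bins - 1)] += 1
    return hist

# An 8-pixel "image" with gray levels in 0..255, binned into 4 bins:
print(luminance_histogram([0, 10, 40, 100, 128, 200, 250, 255], n_bins=4))
# [3, 1, 1, 3]
```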

    47. Colour Histogram [Figure: colour histogram of an 8-bit image]

    48. Histogram intersection [Plot: colour histogram A, frequency per colour]

    49. Histogram intersection [Plot: colour histograms A and B overlaid]

    50. Histogram intersection [Plot: the intersection of A and B, the bin-wise minimum of the two histograms]
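The intersection similarity sums the bin-wise minima of the two histograms; normalising by the mass of one histogram, as in Swain and Ballard's formulation, gives a value in [0, 1]. The example histograms are invented:

```python
def histogram_intersection(a, b):
    # Sum of bin-wise minima, normalised by the total mass of the
    # second histogram (the model/query histogram).
    return sum(min(x, y) for x, y in zip(a, b)) / sum(b)

a = [4, 2, 1, 3]  # histogram of a database image
b = [2, 2, 2, 2]  # histogram of the query image
print(histogram_intersection(a, b))  # (2 + 2 + 1 + 2) / 8 = 0.875
```

Identical histograms score 1.0; histograms with no colours in common score 0.0.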