Techniques for Information Searching and Retrieval of Multimedia Digital Library

Presented by: Vincent Cheung

Supervised by: Prof. Michael Lyu

Prof. K. W. Ng

18 December, 1999

Abstract
  • Digital libraries are becoming increasingly popular because of their strength in searching and retrieving information.
  • More and more multimedia information, rather than pure text, needs to be stored.
  • The nature of multimedia information is very different from that of pure text, so it poses new challenges for information searching and retrieval techniques.
Presentation Outline
  • General Information Retrieval Methods
  • Multimedia & Their Retrieval Techniques
  • Retrieval Techniques in Other Information Searching Applications
  • An Indexing Tool Implemented
  • Conclusion and Q&A Session
Overview: Information Searching and Retrieval Procedures
  • Assign indexes to the existing information
  • Store the information in a well-organized way
  • Accept user queries
  • Search the information
  • Evaluate the importance of all query results
  • Present the results to the users
  • Process user feedback
Flowchart of Retrieval Processes
  • User queries
  • Extract the keywords of the user query for further searching
  • Formulate the keywords with logical operations (e.g. AND, OR)
  • Search by comparing the keywords of the documents in the indexed database with those of the search request, separating matching items from unmatched items
  • Perform a logical combination of terms to obtain the answers that satisfy the logical restrictions
  • Evaluate the rankings of the retrieved answers and construct the output
  • Display to users
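The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation. It assumes a tiny in-memory indexed database that maps each keyword to the set of document numbers containing it, and a query made of keywords joined by a single AND or OR. The names `INDEX`, `extract_keywords` and `boolean_search` are hypothetical.

```python
# Hypothetical indexed database: keyword -> set of document numbers.
INDEX = {
    "video": {1, 2, 4},
    "retrieval": {2, 3, 4},
    "subtitle": {4, 5},
}


def extract_keywords(query: str) -> tuple[list[str], str]:
    """Split a query like 'video AND retrieval' into keywords and one operator."""
    tokens = query.split()
    operator = "OR" if "OR" in tokens else "AND"  # AND is the default
    keywords = [t.lower() for t in tokens if t not in ("AND", "OR")]
    return keywords, operator


def boolean_search(query: str) -> list[int]:
    """Search the index, combine the answers logically, then rank them."""
    keywords, operator = extract_keywords(query)
    # Search step: a keyword missing from the index matches no items.
    postings = [INDEX.get(k, set()) for k in keywords]
    if not postings:
        return []
    # Logical combination of the per-keyword answers.
    if operator == "AND":
        answers = set.intersection(*postings)
    else:
        answers = set.union(*postings)
    # Ranking: documents containing more query keywords come first.
    return sorted(answers, key=lambda d: (-sum(d in p for p in postings), d))
```

For example, `boolean_search("video OR subtitle")` gives `[4, 1, 2, 5]`. Document 4 contains both keywords, so it ranks first.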

Indexing

Aim: to summarize each document and label it with a few keywords

  • Manual indexing
  • Using the whole passage
  • “Content words” counting
  • Natural language processing
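To illustrate "content words" counting, the sketch below drops function words and keeps the most frequent remaining words as the document's keywords. The stop-word list, the function name `index_keywords` and the cut-off `k` are assumptions made for this example. The presentation does not give them.

```python
from collections import Counter

# Assumed (tiny) list of function words that carry no content.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "in", "to", "is", "are", "for", "by"}


def index_keywords(text: str, k: int = 3) -> list[str]:
    """Label a document with its k most frequent content words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    content = [w for w in words if w and w not in STOP_WORDS]
    counts = Counter(content)
    # Sort by descending count, breaking ties alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Real systems would also merge word variants (e.g. "video" and "videos") before counting. This sketch leaves that out.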
Query Modification

Aim: to modify the query so that it yields as many relevant results as possible

Linguistic problems:

  • Some words carry out only syntactic functions
  • Different words can supply the same or related meanings
  • A word can be used in different senses, depending on context
  • Different structures can represent the same idea
Solving Linguistic Problems
  • Use of Dictionaries:
    • Negative Dictionary (a list of stop words to be ignored)
    • Thesaurus (or Synonym Dictionary)
    • Phrase Dictionary
  • Use of Fuzzy Logic for matching synonyms:
    • Construct a set of fuzzy relations, represented by fuzzy graphs, which are obtained from statistics on the occurrence and co-occurrence of keywords.
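The following is a rough sketch of how these dictionaries and a fuzzy relation might be applied to a query. The dictionary contents are invented. The fuzzy relation uses one common co-occurrence measure, mu(a, b) = n(a, b) / n(a). Here n(a) counts the documents containing keyword a, and n(a, b) counts the documents containing both a and b. The presentation does not say which formula was used, so treat this measure, the 0.5 threshold and the names `fuzzy_relation` and `modify_query` as assumptions.

```python
from itertools import combinations

NEGATIVE_DICTIONARY = {"the", "of", "a", "in"}                  # assumed stop words
THESAURUS = {"film": {"movie", "video"}, "picture": {"image"}}  # assumed synonyms


def fuzzy_relation(docs: list[set[str]]) -> dict[tuple[str, str], float]:
    """Degree to which keyword a implies keyword b: n(a, b) / n(a)."""
    occ: dict[str, int] = {}
    co: dict[tuple[str, str], int] = {}
    for doc in docs:
        for w in doc:
            occ[w] = occ.get(w, 0) + 1
        for a, b in combinations(sorted(doc), 2):
            co[(a, b)] = co.get((a, b), 0) + 1
            co[(b, a)] = co.get((b, a), 0) + 1
    return {(a, b): n / occ[a] for (a, b), n in co.items()}


def modify_query(terms: list[str], relation: dict[tuple[str, str], float],
                 threshold: float = 0.5) -> set[str]:
    """Drop stop words, then add thesaurus synonyms and strongly related keywords."""
    kept = [t for t in terms if t not in NEGATIVE_DICTIONARY]
    expanded = set(kept)
    for t in kept:
        expanded |= THESAURUS.get(t, set())
        expanded |= {b for (a, b), mu in relation.items() if a == t and mu >= threshold}
    return expanded
```

The relation is asymmetric. In a collection where "actor" always appears with "film" but "film" often appears alone, "actor" implies "film" more strongly than the reverse.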
Searching and Storage

Aim: good organization in storage gives good performance in searching.

  • Two main principles of file organization: direct and inverted systems
  • Direct system: files are stored in order of document number, and items are retrieved by a sequential scan of the complete files.
  • Advantage of the direct system: several searches can be performed at the same time, during one scan of the files.
Searching and Storage (cont’d)
  • Inverted system: the files are arranged in order of a set of keywords or index terms. Each item is normally listed as many times as it has assigned keywords.
  • Advantage of the inverted system: only the sections of the files that correspond to the index terms used in the query need to be read
  • Other methods are variations of these two principles
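The two organizations can be compared in a small sketch. It assumes that each document is simply a set of assigned keywords. `direct_search` scans every stored item in document-number order and answers a batch of queries in that single pass. `build_inverted` lists each document once under every keyword assigned to it, so `inverted_search` reads only the matching section. All names and the sample collection are hypothetical.

```python
# Hypothetical direct file: documents stored in order of document number.
DOCS = {
    1: {"video", "news"},
    2: {"image", "histogram"},
    3: {"video", "subtitle"},
}


def direct_search(queries: list[str]) -> dict[str, list[int]]:
    """One sequential scan of the whole file answers several single-term queries."""
    results: dict[str, list[int]] = {q: [] for q in queries}
    for doc_no in sorted(DOCS):          # sequential scan in document-number order
        for q in queries:
            if q in DOCS[doc_no]:
                results[q].append(doc_no)
    return results


def build_inverted(docs: dict[int, set[str]]) -> dict[str, list[int]]:
    """Inverted file: each document is listed once per assigned keyword."""
    inverted: dict[str, list[int]] = {}
    for doc_no in sorted(docs):
        for term in docs[doc_no]:
            inverted.setdefault(term, []).append(doc_no)
    return inverted


def inverted_search(inverted: dict[str, list[int]], term: str) -> list[int]:
    """Only the section of the file for this index term is read."""
    return inverted.get(term, [])
```

The inverted file costs extra storage, because a document appears once per keyword. In exchange, a single-term lookup no longer touches unrelated documents.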
Evaluation of Search Results
  • Aim: to rank the list of answers from the search by using a ranking function
  • Different ranking functions can be used to calculate the weights of the returned answers
  • One simple and popular function: counting the occurrences of query keywords
  • Not very fair: longer passages have a higher chance of containing more keywords
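The length bias shows up when the raw keyword count is compared with the count divided by passage length. Dividing by the number of words is only one simple normalization, chosen here for illustration. The presentation does not name a particular ranking function, and the function names are hypothetical.

```python
from collections.abc import Callable


def raw_score(passage: str, keywords: set[str]) -> int:
    """Count occurrences of query keywords in the passage."""
    return sum(1 for w in passage.lower().split() if w in keywords)


def normalized_score(passage: str, keywords: set[str]) -> float:
    """Keyword occurrences per word, so long passages gain no automatic advantage."""
    words = passage.lower().split()
    return raw_score(passage, keywords) / len(words) if words else 0.0


def rank(passages: list[str], keywords: set[str],
         score: Callable[[str, set[str]], float] = normalized_score) -> list[int]:
    """Return passage indices ordered from best to worst score."""
    return sorted(range(len(passages)), key=lambda i: -score(passages[i], keywords))
```

Consider a three-word passage with two keyword hits and a sixteen-word passage with six hits. The raw count ranks the long passage first (6 > 2). The normalized score prefers the short one (2/3 > 6/16).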

Relevance Feedback

Aim: to let users refine the query statements to obtain more relevant results

  • Users are asked to give feedback on the query results, because queries may be unclear, user interests may change, etc.
  • The query statements may then be modified, and the system performs further searching. The relevant items should correlate more highly with the modified query than with the original.
Flowchart of Feedback
  • User input
  • Read the maximum number of documents to be examined by the user in successive iterations, then do the search
  • Decision: does the user wish to terminate the search, or has the maximum permitted number of iterations been reached?
    • If not: modify the query using the relevance judgements for the first n_i documents of the previous iteration, search the document collection with the newly constructed query, produce user output, and return to the decision
    • If so: proceed with the evaluation of successive iterations and print the results
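The "modify query" step in the loop can be sketched with Rocchio's formula. Rocchio is a standard relevance-feedback method, but the presentation does not name it, so it is swapped in here as an assumption. The new query is q' = alpha*q + beta*(mean of the relevant document vectors) - gamma*(mean of the non-relevant ones), and negative weights are dropped. Queries and documents are assumed to be term-weight dictionaries. The weights alpha, beta and gamma are illustrative values, and `centroid` and `rocchio` are hypothetical names.

```python
from collections import defaultdict

Vector = dict[str, float]  # term -> weight (assumed representation)


def centroid(vectors: list[Vector]) -> Vector:
    """Average of several term-weight vectors."""
    total: Vector = defaultdict(float)
    for v in vectors:
        for term, w in v.items():
            total[term] += w
    return {t: w / len(vectors) for t, w in total.items()} if vectors else {}


def rocchio(query: Vector, relevant: list[Vector], non_relevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the query toward judged-relevant documents and away from the others."""
    rel, non = centroid(relevant), centroid(non_relevant)
    new_query: Vector = {}
    for t in set(query) | set(rel) | set(non):
        w = alpha * query.get(t, 0.0) + beta * rel.get(t, 0.0) - gamma * non.get(t, 0.0)
        if w > 0:  # terms whose weight drops to zero or below are removed
            new_query[t] = w
    return new_query
```

In each iteration of the flowchart, `relevant` and `non_relevant` would hold the user's judgements on the first n_i documents. The returned query is then used for the next search.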
Concept Based Query
  • An object-oriented method for indexing
  • Conceptual indexes (classes) are used, and the classes form a decision-tree hierarchy.
  • Users make queries in the usual way
  • Instead of answering documents, a list of concepts is returned first.
  • Users then narrow their search by indicating the desired classes or concepts
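A concept hierarchy can be modeled as a tree of classes, each holding the documents indexed under it. The sketch below assumes a hand-built tree and simple keyword matching against each class's keyword set. Both are hypothetical, as are the names `Concept`, `matching_concepts` and `narrow`. A query first returns the matching concept names. Once the user picks one, only documents in that class and its subclasses are returned.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    keywords: set[str]
    documents: list[str] = field(default_factory=list)
    children: list["Concept"] = field(default_factory=list)

    def walk(self):
        """Yield this concept and every concept below it (depth first)."""
        yield self
        for child in self.children:
            yield from child.walk()


def matching_concepts(root: Concept, query: set[str]) -> list[str]:
    """First step: return concept names instead of documents."""
    return [c.name for c in root.walk() if c.keywords & query]


def narrow(root: Concept, chosen: str) -> list[str]:
    """Second step: documents in the chosen class and all its subclasses."""
    for c in root.walk():
        if c.name == chosen:
            return [doc for sub in c.walk() for doc in sub.documents]
    return []
```

For example, a query for "video" could return both a general "Video" class and a more specific "Sports video" subclass. The user then chooses how broad the answer set should be.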
Characteristics of Multimedia
  • Large in file size
  • May be dynamic in nature (e.g. audio or video) instead of static (e.g. text, image)
  • No simple methods for indexing or describing the contents of the files
  • Various file formats (e.g. JPEG, GIF, TIFF for images; MOV, MPEG for video)
Existing Multimedia Digital Library - Informedia
  • Converting multimedia to text: speech recognition and optical character recognition turn the audio and on-screen text into text, so indexing and searching can be done by traditional methods
  • Face recognition: a non-text-based technique for matching people's faces in videos
  • Presenting results: poster frames, filmstrips, and skims give users a faster review of the query answers when choosing the desired video
Internet Search Engines
  • The Internet is similar to a digital library:
    • a huge database
    • heterogeneous information
    • dynamic
    • decentralized
  • Common Internet search engines use a centralized index database
  • Disadvantages:
    • heavy server workload
    • inefficient use of bandwidth
    • poor quality of results
Distributed Search Engine
  • Local proxy servers can be enhanced to perform web searching, so that a network of search engines can be established
  • Response time is faster and network traffic can be reduced
  • Better results should be given
Video-on-Demand Systems
  • VoD systems deliver videos to clients upon their requests
  • A VoD system is similar to a digital library:
    • it delivers videos, which are large in size, upon user request
  • Efficient retrieval is needed, and it can be achieved only with an efficient storage method.
How Data Are Stored in VoD
  • The primary design goal is to maximize the ratio of the number of concurrent streams to system cost while guaranteeing glitch-free operation
  • An array of magnetic hard disks and a large RAM buffer are used.
  • RAM has higher I/O rates than hard disks, so popular videos are kept in RAM
  • A popular video should not be stored together with other popular videos, for a better balance of workload.
  • RAID is used, and I/O is performed by the whole array of disks at the same time.
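The placement rules above can be sketched as follows. The most popular videos are kept in RAM. The rest are dealt round-robin over the disks in order of popularity, so the next most popular videos land on different disks. The RAM capacity (counted in whole videos), the number of disks, the round-robin policy and the name `place_videos` are assumptions for illustration. RAID striping inside the array is not modeled.

```python
def place_videos(popularity: dict[str, int], ram_slots: int,
                 n_disks: int) -> tuple[list[str], dict[int, list[str]]]:
    """Return (videos kept in RAM, mapping of disk index -> videos on that disk)."""
    # Most popular first; ties broken by name so the result is deterministic.
    by_popularity = sorted(popularity, key=lambda v: (-popularity[v], v))
    in_ram = by_popularity[:ram_slots]
    disks: dict[int, list[str]] = {d: [] for d in range(n_disks)}
    # Round-robin: consecutive (similarly popular) videos go to different disks.
    for i, video in enumerate(by_popularity[ram_slots:]):
        disks[i % n_disks].append(video)
    return in_ram, disks
```

With six videos, one RAM slot and two disks, the most popular video stays in RAM. The next two most popular videos go to different disks, which spreads their request load.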
Image Databases
  • Images are not indexed by verbal description, as words may not describe their contents well.
  • Other representations are used instead, e.g. histograms, shape chains, etc.
  • Similar to a digital library:
    • they store multimedia information.
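One way to compare images by histogram is sketched below under several assumptions. Pixels are (r, g, b) tuples with 0-255 channels. Each channel is quantized into 4 levels, giving 64 colour bins. Two images are compared by histogram intersection, a common similarity measure that the slide does not name explicitly. The function names are hypothetical.

```python
def color_histogram(pixels: list[tuple[int, int, int]], levels: int = 4) -> list[float]:
    """Normalized histogram over levels**3 quantized colours."""
    step = 256 // levels
    hist = [0.0] * levels ** 3

    def quantize(c: int) -> int:
        # Clamp so that levels which do not divide 256 still stay in range.
        return min(c // step, levels - 1)

    for r, g, b in pixels:
        hist[(quantize(r) * levels + quantize(g)) * levels + quantize(b)] += 1
    n = len(pixels)
    return [h / n for h in hist] if n else hist


def intersection(h1: list[float], h2: list[float]) -> float:
    """1.0 for identical colour distributions, 0.0 for completely different ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

A histogram ignores where the colours are, so two different pictures with the same colour mix look identical to it. This is one reason other features such as shape chains are also used.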
Motion Databases
  • Implemented by Deng (1997); closer to a digital library.
  • Videos are indexed by three primary features:
    • color (color histogram)
    • texture (Gabor texture features)
    • motion (motion histogram)
  • Good for sports or movie data
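The slide does not define the motion histogram, so the sketch below assumes one plausible form. Motion vectors (for example, block motion vectors from an MPEG stream) are binned by direction into eight sectors. An extra bin holds near-zero motion, and the counts are normalized. The binning, the stillness threshold and the name `motion_histogram` are assumptions, not Deng's (1997) definition.

```python
import math


def motion_histogram(vectors: list[tuple[float, float]], n_dirs: int = 8,
                     still_threshold: float = 0.5) -> list[float]:
    """Bin 0 holds (near-)static blocks; bins 1..n_dirs hold motion directions."""
    hist = [0.0] * (n_dirs + 1)
    for dx, dy in vectors:
        if math.hypot(dx, dy) < still_threshold:
            hist[0] += 1
            continue
        angle = math.atan2(dy, dx) % (2 * math.pi)           # map to [0, 2*pi)
        sector = int(angle / (2 * math.pi / n_dirs)) % n_dirs
        hist[1 + sector] += 1
    total = len(vectors)
    return [h / total for h in hist] if total else hist
```

A sports clip with fast panning would put most of its weight in one or two direction bins. A static interview shot would concentrate in bin 0.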
Chinese Search Engines
  • Methods similar to those for English can be used
  • However, Chinese is very different from English because it is less structured (e.g. 吃了小明的狗 can mean either "ate Xiaoming's dog" or "the dog that ate Xiaoming"), so sentences cannot simply be parsed according to grammar rules
  • It is difficult to extract the ideas in documents and identify the keywords for indexing
  • Subject-verb-object (SVO) analysis can be used to identify the syntactic components
An Indexing Tool: Chinese Subtitles Extraction in Video
  • Chinese has many dialects, but Chinese characters are common to all of them
  • Many video programs have Chinese subtitles nowadays
  • Extracting text from digital video programs can help indexing, searching and retrieval
Features of Subtitles
  • Characters are in foreground
  • They are monochrome
  • They are rigid from frame to frame
  • They are upright
  • They have size restrictions
  • They contrast with the background
  • They appear in clusters at a limited distance aligned to a horizontal line
  • Two main challenges:
    • to segment the character areas
    • to recognize the characters
  • Four phases:
    • extract the subtitle block from the background
    • extract each character from subtitle block
    • recognize the Chinese Characters
    • process the whole video
Sample Frame
  • An ATV news video in MPEG format about the Airport Authority
  • First, extract one frame from the video
Edge Filtering
  • Apply edge filtering to the frame using a Sobel filter.
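Here is a pure-Python sketch of Sobel edge filtering on a grayscale frame stored as a list of rows with 0-255 values. It uses the standard 3x3 Sobel kernels. Two simplifications are choices made here: the edge strength is |Gx| + |Gy| instead of the Euclidean magnitude, and border pixels are left at 0. The name `sobel` is hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def sobel(gray: list[list[int]]) -> list[list[int]]:
    """Return an edge-strength map |Gx| + |Gy|; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = gray[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            edges[y][x] = abs(gx) + abs(gy)
    return edges
```

Subtitle strokes sharply contrast with the background, so they produce many strong responses in this map.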
Subtitle Block Extraction

A high density of edges indicates a subtitle block
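This idea can be sketched in a few steps. First, binarize the edge map with a threshold. Next, compute the fraction of edge pixels in each row. Finally, report runs of consecutive rows whose density exceeds a second threshold as candidate subtitle bands. Both thresholds, the minimum band height and the name `subtitle_rows` are hypothetical and would need tuning on real frames.

```python
def subtitle_rows(edges: list[list[int]], edge_threshold: int = 200,
                  density_threshold: float = 0.3,
                  min_height: int = 2) -> list[tuple[int, int]]:
    """Return (first_row, last_row) bands with a high density of edge pixels."""
    dense = [sum(v > edge_threshold for v in row) / len(row) > density_threshold
             for row in edges]
    bands, start = [], None
    for y, is_dense in enumerate(dense + [False]):   # sentinel closes a final band
        if is_dense and start is None:
            start = y
        elif not is_dense and start is not None:
            if y - start >= min_height:               # ignore thin, isolated lines
                bands.append((start, y - 1))
            start = None
    return bands
```

The minimum height filters out single rows of strong edges, such as a horizon line, which are not tall enough to be a line of characters.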

Character Extraction
  • Filter out the background area and keep the subtitle block
  • Use the same method to segment the characters
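Applying the same projection idea column by column, the sketch below splits the subtitle band at runs of columns that contain no character pixels. It assumes a binary mask of the band (1 = character pixel), and the name `segment_characters` is hypothetical. Characters made of separate left and right parts would be split in two by this rule and would need an extra merging step, which is not shown.

```python
def segment_characters(mask: list[list[int]], min_width: int = 1) -> list[tuple[int, int]]:
    """Split a binary subtitle band into (first_col, last_col) character spans."""
    width = len(mask[0])
    column_has_ink = [any(row[x] for row in mask) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(column_has_ink + [False]):   # sentinel closes a final span
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            if x - start >= min_width:
                spans.append((start, x - 1))
            start = None
    return spans
```

Since subtitle characters are upright, monochrome and laid out along a horizontal line, the gaps between them usually show up as empty columns.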
Results of Recognition
  • A Chinese Character Image Library is built for recognition
  • 5401 frequently used Chinese characters
  • Simple subtraction is used for recognition
  • (Figures: characters segmented; characters recognized)
  • The success rate of character segmentation is quite high (~90% in general)
  • The success rate of character recognition is low (~15% in general)
  • Better character recognition algorithms will be tried
  • Can be used for indexing video clips for digital library
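"Simple subtraction" can be read as template matching. The segmented character image is subtracted from each template in the library pixel by pixel, and the template with the smallest total absolute difference wins. The sketch assumes binary images already scaled to the same size and a library stored as a dict from character to template. The names `difference` and `recognize` and the two-entry library in the usage example are hypothetical. The real library described above holds 5401 characters.

```python
Bitmap = list[list[int]]  # assumed: equally sized binary images


def difference(a: Bitmap, b: Bitmap) -> int:
    """Sum of absolute pixel differences between two equally sized images."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def recognize(char_image: Bitmap, library: dict[str, Bitmap]) -> str:
    """Return the library character whose template differs least from the image."""
    return min(library, key=lambda ch: difference(char_image, library[ch]))
```

Because the comparison is strictly pixel by pixel, even a one-pixel shift or a slight difference in stroke thickness between the image and the template can make the difference large.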
Conclusion
  • Information retrieval relates to many different fields: linguistics, image processing, data organization, hardware utilization, etc.
  • Information retrieval involves many procedures: indexing, searching, organizing data, etc.
  • One specific area will be chosen to work on in the coming semester.