Techniques for information searching and retrieval of multimedia digital library
Techniques for Information Searching and Retrieval of Multimedia Digital Library. Presented by: Vincent Cheung. Supervised by: Prof. Michael Lyu and Prof. K. W. Ng. 18 December, 1999.


Techniques for Information Searching and Retrieval of Multimedia Digital Library

Presented by: Vincent Cheung

Supervised by: Prof. Michael Lyu

Prof. K. W. Ng

18 December, 1999


Abstract

  • Digital libraries are becoming increasingly popular because of their strength in searching and retrieving information.

  • There is a trend toward storing more multimedia information rather than pure text.

  • Because multimedia information is very different in nature from pure text, new challenges arise for information searching and retrieval techniques.


Presentation Outline

  • General Information Retrieval Methods

  • Multimedia & Their Retrieval Techniques

  • Retrieval Techniques in Other Information Searching Applications

  • An Indexing Tool Implemented

  • Conclusion and Q&A Session


Overview: Information Searching and Retrieval Procedures

  • Index the existing information

  • Store the information in a well-organized way

  • Get the user queries

  • Search the information

  • Evaluate the importance of all query results

  • Present the results to the users

  • Process the feedback of the users


Flowchart of Retrieval Processes

  • User Queries

  • Extract the keywords of the user query for further searching (Dictionaries)

  • Formulate the keywords with logical operations (e.g. AND, OR, etc.)

  • Search operations by comparing keywords for documents and search requests (Indexed Database)

  • Matching Items / Unmatched Items

  • Perform logical combination of terms to obtain answers which satisfy the logical restrictions

  • Start operation for retrieved answers by evaluating their rankings and construct the output

  • Display to Users
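The logical-combination step of the flowchart can be sketched as follows. This is only a minimal illustration: the tiny index, the document numbers, and the function name boolean_search are assumptions made for the example, not part of the presented system.

```python
# Hypothetical index: keyword -> set of document numbers containing it.
INDEX = {"video": {1, 3}, "library": {1, 2}, "image": {2}}
ALL_DOCS = {1, 2, 3}

def boolean_search(index, all_docs, terms, operator="AND"):
    """Combine the posting sets of the query terms with AND or OR.

    Returns (matching items, unmatched items) as sorted lists.
    """
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        matched = set()
    elif operator == "AND":
        matched = set.intersection(*postings)
    else:  # treat anything else as OR
        matched = set.union(*postings)
    return sorted(matched), sorted(all_docs - matched)

matching, unmatched = boolean_search(INDEX, ALL_DOCS, ["video", "library"], "AND")
print(matching, unmatched)
```

The matching items would then go on to ranking, while the unmatched items are discarded.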


Indexing

Aim: to give an abstract of the document and label it with a few keywords

  • Manual indexing

  • Using the whole passage

  • “Content words” counting

  • Natural language processing
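As a rough illustration of "content words" counting, the sketch below counts every word that is not a function word and keeps the most frequent ones as index terms. The word list FUNCTION_WORDS and the function name index_terms are hypothetical; a real system would use a far larger list.

```python
import re
from collections import Counter

# Hypothetical list of function words that carry no content.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to", "for"}

def index_terms(text, k=3):
    """Return the k most frequent content words of a document."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in FUNCTION_WORDS)
    return [term for term, _ in counts.most_common(k)]

sample = "The library stores video. The video library indexes video."
print(index_terms(sample, k=2))
```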


Query Modification

Aim: to modify the query so that it yields the largest number of relevant results

Linguistic problems:

  • Some words carry out only syntactic functions

  • Different words can supply the same or related meanings

  • A word can be used in different senses, depending on the context

  • Different structures can represent the same idea


Solving Linguistic Problems

  • Use of Dictionaries:

    • Negative Dictionary

    • Thesaurus (or Synonym Dictionary)

    • Phrase Dictionary

  • Use of Fuzzy Logic for matching synonyms:

    • Construct a set of fuzzy relations, represented by fuzzy graphs, that are obtained from statistics on the occurrence and co-occurrence of keywords.
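The dictionary-based part of this slide can be sketched as a small query-expansion step: words in the negative dictionary are dropped, and each remaining word is widened to its thesaurus group. Both dictionaries and the function name expand_query are made up for illustration; the fuzzy-relation approach is not shown.

```python
# Hypothetical dictionaries for illustration only.
NEGATIVE_DICTIONARY = {"the", "of", "a", "for", "in"}
THESAURUS = {
    "film": {"movie", "video"},
    "picture": {"image", "photo"},
}

def expand_query(query):
    """Drop negative-dictionary words and expand the rest with synonyms.

    The result is a list of synonym groups: a document should match at
    least one word from every group (an AND of ORs).
    """
    terms = [t for t in query.lower().split() if t not in NEGATIVE_DICTIONARY]
    return [{t} | THESAURUS.get(t, set()) for t in terms]

print(expand_query("the film of a picture"))
```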


Searching and Storage

Aim: good organization in storage gives good performance in searching.

  • Two main principles of file organization: direct and inverted systems

  • Direct system: files are stored in order by document number, and items are retrieved by a sequential scan of the complete files.

  • Advantage of the direct system: it allows several searches to be performed at the same time.


Searching and Storage (cont’d)

  • Inverted system: files are arranged in order by a set of keywords or index terms. Each item is normally listed as many times as it has assigned keywords.

  • Advantage of the inverted system: only the sections of the files that correspond to the index terms used in the queries need to be read

  • Other methods: variations of these two principles
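The difference between the two file organizations can be shown with a toy collection. The documents and function names below are assumptions for the example; the point is only that the direct system scans every document, while the inverted system looks up one keyword entry.

```python
from collections import defaultdict

# Hypothetical collection: document number -> assigned keywords.
documents = {
    1: ["digital", "library", "video"],
    2: ["image", "retrieval"],
    3: ["video", "retrieval", "library"],
}

def direct_search(docs, term):
    """Direct system: sequential scan of all documents in number order."""
    return [doc_id for doc_id in sorted(docs) if term in docs[doc_id]]

def build_inverted(docs):
    """Inverted system: each document is listed once per assigned keyword."""
    inverted = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            inverted[term].append(doc_id)
    return inverted

inverted = build_inverted(documents)
print(direct_search(documents, "video"))  # scans all three documents
print(inverted["video"])                  # reads a single index entry
```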


Evaluation of Search Results

  • Aim: to rank the list of answers from the search by using a ranking function

  • Different ranking functions can be used to calculate the weight of the returned answers

  • One simple and popular function: counting the occurrences of query keywords

  • Not very fair: longer passages have a greater chance of containing more keywords
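The bias toward long passages is easy to see in a small example. The first function below is the simple occurrence count from the slide; the second divides by document length, which is one common remedy and is an assumption here, not something the presentation proposes.

```python
def count_score(doc_words, query_terms):
    """Simple ranking: total occurrences of the query keywords."""
    return sum(doc_words.count(term) for term in query_terms)

def normalized_score(doc_words, query_terms):
    """Assumed remedy: occurrences divided by document length."""
    if not doc_words:
        return 0.0
    return count_score(doc_words, query_terms) / len(doc_words)

short_doc = ["video", "library"]
long_doc = ["video"] * 3 + ["filler"] * 27

# The long document wins on raw counts but loses once length is considered.
print(count_score(short_doc, ["video"]), count_score(long_doc, ["video"]))
print(normalized_score(short_doc, ["video"]), normalized_score(long_doc, ["video"]))
```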


Feedback

Aim: to let users redefine the query statements for more responsive results

  • Users are asked to give feedback on the query results, because of unclear queries, changes in user interest, etc.

  • Query statements may be modified, and the system should then perform further searching. The relevant items should correlate more highly with the modified query than with the original.


Flowchart of Feedback

  • User input

  • Read the max no. of documents to be examined by users for successive iterations. Then do the searching.

  • Proceed with evaluation of successive iterations and print results

  • Does the user want to terminate the search, or has the maximum permitted no. of iterations been reached?

    • Yes: Exit

    • No: Modify query using relevance judgements for the first n_i documents of previous iteration

  • Search document collection with newly constructed modified query and produce user output
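The feedback loop can be sketched in a few lines. The query-modification rule used here (adding the keywords of documents judged relevant) is a deliberately simple assumption, and search, feedback_search, and the judge callback are hypothetical names; the presentation does not specify the actual modification formula.

```python
def search(docs, query):
    """Rank documents by the number of query terms they contain."""
    scored = [(len(query & set(words)), doc_id) for doc_id, words in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

def feedback_search(docs, query, judge, n=2, max_iterations=3):
    """Iterate: judge the first n results, modify the query, search again."""
    query = set(query)
    results = search(docs, query)
    for _ in range(max_iterations):
        relevant = [d for d in results[:n] if judge(d)]
        # Assumed rule: add the keywords of the relevant documents.
        new_query = query.union(*(docs[d] for d in relevant))
        if new_query == query:  # nothing left to learn, stop early
            break
        query = new_query
        results = search(docs, query)
    return results

docs = {
    1: ["video", "news"],
    2: ["video", "sports"],
    3: ["news", "airport"],
    4: ["music"],
}
print(feedback_search(docs, {"video"}, judge=lambda d: d in {1, 3}, n=3))
```

In this toy run, document 3 does not match the original query at all but is pulled in, and then promoted, once the user marks document 1 as relevant.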


Concept Based Query

  • An object-oriented method for indexing

  • Conceptual indexes (classes) are used, and those classes form a decision tree hierarchy.

  • Users make queries in the same way as before

  • Instead of answer documents, a list of concepts is returned first.

  • Users then narrow their search by indicating the desired classes or concepts
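A concept hierarchy of this kind can be modelled as a tree of classes with documents attached to the nodes. Everything in the sketch below (the concepts, the document names, and the two helper functions) is invented for illustration.

```python
# Hypothetical concept hierarchy: concept -> sub-concepts.
CONCEPT_TREE = {
    "media": ["image", "video"],
    "video": ["news", "sports"],
}
# Hypothetical documents attached directly to each concept.
CONCEPT_DOCS = {
    "image": ["img-01"],
    "news": ["clip-07"],
    "sports": ["clip-12", "clip-31"],
}

def subconcepts(concept):
    """First response to a query: the concepts below the matched class."""
    return CONCEPT_TREE.get(concept, [])

def documents_under(concept):
    """After the user narrows the search: all documents in that subtree."""
    docs = list(CONCEPT_DOCS.get(concept, []))
    for child in subconcepts(concept):
        docs.extend(documents_under(child))
    return docs

print(subconcepts("media"))      # the user is shown concepts first
print(documents_under("video"))  # then picks "video" to narrow the search
```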


Characteristics of Multimedia

  • Large in file size

  • May be dynamic in nature (e.g. audio or video) instead of static (e.g. text, image)

  • No simple methods for indexing or describing the contents of the files

  • Various kinds of file formats (e.g. JPEG, GIF, TIFF for images; MOV, MPEG for video)


Existing Multimedia Digital Library - Informedia

  • Convert multimedia to text - Speech Recognition and Optical Character Recognition, so that indexing and searching can be done by traditional methods

  • Face Recognition - a non-text-based technique for matching the faces of persons in videos

  • Presenting Results - Poster frame, Filmstrip, and skimming. These give users a faster review of the query answers when choosing the desired video


Internet Search Engines

  • The Internet is similar to a digital library

    • a huge database

    • heterogeneous information

    • dynamic

    • decentralized

  • Common Internet search engines use a centralized index database

  • Disadvantages:

    • heavy server workload

    • inefficient use of bandwidth

    • poor quality of results


Distributed Search Engine

  • Local proxy servers can be enhanced to perform web searching, so that a network of search engines can be established

  • Response time is faster and network traffic is reduced

  • Better results should be obtained


Video-on-Demand Systems

  • VoD systems deliver videos to clients upon their requests

  • A VoD system is similar to a digital library

    • both deliver content that is large in size upon user request

  • Efficient retrieval is needed, and it can be achieved only if there is an efficient storage method.


How Data Is Stored in VoD

  • The primary design goal is to maximize the ratio of the number of concurrent streams to system cost while guaranteeing glitch-free operation

  • An array of magnetic hard disks and a large RAM buffer are used.

  • RAM has faster I/O rates than hard disks, so popular videos are put in RAM

  • A popular video should not be stored with other popular videos, which gives a better balance of workload.

  • RAID is used, and I/O is done by the whole array of disks at the same time.
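The placement rules on this slide (most popular videos in RAM, remaining popular videos spread over different disks) can be approximated with a greedy heuristic. The heuristic, the popularity figures, and the function name place_videos are assumptions for illustration; the sketch does not model RAID striping.

```python
def place_videos(popularity, n_disks, ram_slots):
    """Put the most popular videos in RAM and balance the rest over disks.

    popularity maps a video name to its expected request rate.
    """
    ranked = sorted(popularity, key=lambda v: (-popularity[v], v))
    in_ram = ranked[:ram_slots]
    disks = [[] for _ in range(n_disks)]
    load = [0] * n_disks
    for video in ranked[ram_slots:]:
        target = load.index(min(load))  # least-loaded disk so far
        disks[target].append(video)
        load[target] += popularity[video]
    return in_ram, disks

popularity = {"a": 50, "b": 30, "c": 20, "d": 10, "e": 5}
print(place_videos(popularity, n_disks=2, ram_slots=1))
```

Because videos are assigned in order of decreasing popularity to the least-loaded disk, the two most requested disk-resident videos ("b" and "c") end up on different disks.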


Image Databases

  • Documents are not indexed by verbal description, as it may not be able to describe the contents well.

  • Other means are used, e.g. histogram representation, shape chains, etc.

  • Similar to a digital library:

    • Both store multimedia information.
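A histogram representation can be sketched for grey-level images as follows. The bin count, the use of histogram intersection as the similarity measure, and the function names are assumptions chosen for the example; the slide only states that histograms are one possible index.

```python
def gray_histogram(pixels, bins=4, max_value=256):
    """Normalized histogram of grey levels (pixels must be non-empty)."""
    hist = [0] * bins
    for p in pixels:
        hist[p * bins // max_value] += 1
    return [count / len(pixels) for count in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

query_image = gray_histogram([0, 10, 200, 255])
stored_image = gray_histogram([0, 100, 150, 255])
print(intersection(query_image, stored_image))
```

An image database would compute the histogram once per stored image and compare it against the histogram of the query image at search time.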


Motion Databases

  • Implemented by Deng (1997). Closer to a digital library.

  • Index the video by three primary features:

    • color (color histogram)

    • texture (Gabor texture features)

    • motion (motion histogram)

  • Good for sports or movie data


Chinese Search Engines

  • Methods similar to those for English can be used

  • Chinese is very different from English as it is less structured (e.g. 吃了小明的狗). A sentence cannot be parsed according to the grammar alone

  • It is difficult to extract the ideas in documents and identify the keywords for indexing

  • Subject-verb-object (SVO) structure can be used to identify the syntactic components


An Indexing Tool: Chinese Subtitles Extraction in Video

  • Chinese has many dialects, but Chinese characters are common everywhere

  • Many video programs have Chinese subtitles nowadays

  • Extracting text from digital video programs can help with indexing, searching and retrieval


Features of Subtitles

  • Characters are in foreground

  • They are monochrome

  • They are rigid, from frame to frame

  • They are upright

  • They have size restrictions

  • They contrast with the background

  • They appear in clusters at a limited distance aligned to a horizontal line


Implementation

  • Two main challenges:

    • to segment the character areas

    • to recognize the characters

  • Four phases:

    • extract the subtitle block from the background

    • extract each character from subtitle block

    • recognize the Chinese Characters

    • process the whole video


Sample Frame

  • ATV video news in MPEG format about Airport Authority

  • First, extract one frame from the video


Edge Filtering

  • Apply edge filtering to the frame using a Sobel filter.
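For reference, a Sobel filter on a grey-level frame can be written in plain Python as below. The frame is assumed to be a list of rows of integers, and the gradient magnitude is approximated by |gx| + |gy|; both choices are assumptions for this sketch rather than details of the presented tool.

```python
def sobel_magnitude(image):
    """Approximate edge strength |gx| + |gy| with the 3x3 Sobel kernels.

    Border pixels are left at 0 because the kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Horizontal gradient: right column minus left column.
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out

# A frame with a sharp vertical edge between dark and bright halves.
frame = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel_magnitude(frame)
print(edges[1])
```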


Subtitle Block Extraction

A high density of edges indicates that there is a subtitle block
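One way to turn edge density into a subtitle block is to mark each row whose share of strong edge pixels exceeds a threshold and keep the longest run of marked rows. The thresholds and function names here are assumed values for illustration, not figures from the presentation.

```python
def dense_rows(edges, edge_threshold=128, density_threshold=0.3):
    """Flag each row whose fraction of strong edge pixels is high enough."""
    width = len(edges[0])
    return [sum(1 for v in row if v >= edge_threshold) / width >= density_threshold
            for row in edges]

def longest_run(flags):
    """Return (first, last) index of the longest run of True, or None."""
    best, start = None, None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes the last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if best is None or i - start > best[1] - best[0] + 1:
                best = (start, i - 1)
            start = None
    return best

def subtitle_block(edges):
    """Row range of the most likely subtitle block in an edge map."""
    return longest_run(dense_rows(edges))

edge_map = [
    [0, 0, 0, 0],
    [255, 0, 0, 0],
    [255, 255, 0, 255],
    [255, 255, 255, 0],
    [0, 0, 0, 0],
]
print(subtitle_block(edge_map))
```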


Character Extraction

  • Filter out the background area and keep the subtitle block

  • Using the same method, segment the characters
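Applied along the horizontal axis, the same density idea splits a subtitle block into characters wherever a gap of empty columns appears. The sketch below assumes the block is a list of pixel rows and uses a hypothetical threshold; segment_characters is an illustrative name.

```python
def segment_characters(block, threshold=128):
    """Return (first_column, last_column) spans separated by empty columns."""
    width = len(block[0])
    has_ink = [any(row[x] >= threshold for row in block) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink + [False]):  # sentinel closes the last span
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            spans.append((start, x - 1))
            start = None
    return spans

block = [
    [0, 255, 255, 0, 0, 255, 0],
    [0, 255, 0, 0, 0, 255, 0],
]
print(segment_characters(block))
```

A pure gap-based split can cut a character with separate left and right components into two pieces, so a practical version would probably also use the expected character width.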


Results of Recognition

  • A Chinese Character Image Library is built for recognition

  • 5401 frequently used Chinese characters

  • Simple subtraction is used for recognition

  • Characters segmented

  • Characters recognized
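Recognition by simple subtraction can be read as: subtract the segmented glyph from every image in the character library and pick the character with the smallest total difference. The 3x3 bitmaps and the two-entry library below are toy assumptions; the real library described above holds 5401 characters.

```python
def difference(a, b):
    """Sum of absolute pixel differences between two same-sized bitmaps."""
    return sum(abs(p - q)
               for row_a, row_b in zip(a, b)
               for p, q in zip(row_a, row_b))

def recognize(glyph, library):
    """Return the label of the library image closest to the glyph."""
    return min(library, key=lambda label: difference(glyph, library[label]))

# Toy character image library (hypothetical 3x3 bitmaps).
library = {
    "一": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    "十": [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
}
noisy_glyph = [[0, 1, 0], [1, 1, 0], [0, 1, 0]]  # one pixel missing
print(recognize(noisy_glyph, library))
```

Plain subtraction is sensitive to small shifts, scaling, and font differences between the glyph and the library image.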


Evaluation

  • The success rate of segmenting the characters is quite high (~90% in general)

  • The success rate of character recognition is low (~15% in general)

  • Better algorithms for character recognition will be tried

  • The tool can be used for indexing video clips for a digital library


Conclusion

  • Information Retrieval relates to many different fields: linguistics, image processing, data organization, hardware utilization, etc.

  • There are many procedures in Information Retrieval: indexing, searching, organizing data, etc.

  • One specific area will be chosen to work on in the coming semester.


Q & A Session