Techniques for Information Searching and Retrieval of Web-based Multimedia Digital Library

Presented by: Vincent Cheung

Supervisors: Prof. Michael Lyu, Prof. K.W. Ng

Markers: Prof. K. H. Lee, Prof. Y. S. Moon

3 May 2000


Abstract

  • Digital libraries are becoming increasingly popular because of their strength in searching and retrieving information.

  • A web-based environment provides a better medium for information sharing.

  • There is a growing need to store multimedia information rather than pure text.

  • This work researches techniques for searching and retrieving multimedia information in a web-based digital library.


Presentation Outline

  • XML overview

  • Data structures for multimedia news archives

    • for video clips

    • using graph structures of XML

    • giving annotation

  • Architecture and agents of digital library

  • Research plan and conclusion


Overview of XML

  • XML - eXtensible Markup Language

  • Proposed by the World Wide Web Consortium (W3C) in 1998

  • Aims to define a complete, platform-independent and system-independent environment for the authoring and delivery of information resources across the web.

  • Semistructured


How XML differs from HTML

  • Extensibility - new tags may be defined at will

  • Structure - XML structures can be nested to arbitrary depth

  • Validation - An XML document can contain an optional description of its grammar


XML Documents

[Figure: document tree. The database element contains news elements; each news element contains date, title, reporter and content.]

  • Use elements and attributes to describe your document:

    <database>
      <news>
        <date year="2000" month="4" day="15"/>
        <title>Press warning appropriate, says Beijing</title>
        <reporter>Kong Lai-fan</reporter>
        <reporter>Greg Torode</reporter>
        <content>
          Beijing yesterday defended remarks made by senior
          SAR-based official Wang Fengchao that local media
          should avoid reporting separatist views.
        </content>
      </news>
      <news>
        . . .
      </news>
    </database>
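
As a rough illustration of how such a document can be consumed, the sketch below parses the sample with Python's standard `xml.etree.ElementTree`. The function name `summarize` and the dictionary layout are assumptions made for this example, not part of the presented system.

```python
import xml.etree.ElementTree as ET

# The sample document from the slide, with straight quotes so it parses.
NEWS_XML = """
<database>
  <news>
    <date year="2000" month="4" day="15"/>
    <title>Press warning appropriate, says Beijing</title>
    <reporter>Kong Lai-fan</reporter>
    <reporter>Greg Torode</reporter>
    <content>
      Beijing yesterday defended remarks made by senior
      SAR-based official Wang Fengchao that local media
      should avoid reporting separatist views.
    </content>
  </news>
</database>
"""


def summarize(xml_text):
    """Return one dict per <news> element: date, title, reporters."""
    root = ET.fromstring(xml_text)
    entries = []
    for news in root.findall("news"):
        date = news.find("date")
        entries.append({
            # Attributes carry the date parts; child elements carry the text.
            "date": (int(date.get("year")), int(date.get("month")), int(date.get("day"))),
            "title": news.findtext("title"),
            "reporters": [r.text for r in news.findall("reporter")],
        })
    return entries


entries = summarize(NEWS_XML)
print(entries[0]["title"], entries[0]["reporters"])
```

Repeated elements such as reporter come back as a list, while the date is read from attributes.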


    Document Type Definition

    • Provides the definition of a document type for member documents to follow:

      <!DOCTYPE database [
        <!ELEMENT database (news*)>
        <!ELEMENT news (date,title,reporter*,content)>
        <!ELEMENT date EMPTY>
        <!ATTLIST date year  CDATA #REQUIRED
                       month CDATA #REQUIRED
                       day   CDATA #REQUIRED>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT reporter (#PCDATA)>
        <!ELEMENT content (#PCDATA)>
      ]>
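
To give a feel for what validation against such a grammar means, here is a minimal sketch that checks only the child order declared for database and news. It is a hand-rolled illustration, not a DTD validator: Python's standard library does not validate DTDs, and the helper names `model_to_regex` and `check` are assumptions. It handles plain comma sequences only, with no nested groups or choices.

```python
import re
import xml.etree.ElementTree as ET

# Content models copied from the DTD, keyed by element name.
CONTENT_MODELS = {
    "database": "(news*)",
    "news": "(date,title,reporter*,content)",
}


def model_to_regex(model):
    """Turn a simple sequence such as (a,b*,c) into a regex over 'a,b,c,' strings."""
    parts = []
    for item in model.strip("()").split(","):
        name = item.rstrip("*+?")
        suffix = item[len(name):]  # '', '*', '+' or '?'
        parts.append("(?:%s,)%s" % (re.escape(name), suffix))
    return re.compile("".join(parts))


def check(element):
    """Return the tags of all elements whose children violate their content model."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + "," for child in element)
        if not model_to_regex(model).fullmatch(children):
            errors.append(element.tag)
    for child in element:
        errors.extend(check(child))
    return errors


good = ET.fromstring(
    "<database><news><date/><title>t</title><reporter>r</reporter>"
    "<content>c</content></news></database>")
bad = ET.fromstring(
    "<database><news><title>t</title><date/><content>c</content></news></database>")
print(check(good), check(bad))
```

The second document is rejected because title appears before date, which the news content model does not allow.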


    Data Structure for News Videos

    • Multimedia presentation

    • Graph structure property

      • keyword directory

      • thesaurus / classification directory

      • person / place directory

      • Chinese-English dictionary

    • Semistructure property

      • annotation


    Indexing a Video

    • Segment the video hierarchically into scenes. (A video is composed of one or more related scenes.)

    • Describe the complete news video using bibliographic information (title, source, reporters, abstract, etc.) plus format, duration, etc.

    • Describe each scene – id, start frame (time), end frame (time), keyframe, and scripts.

    • An OCR tool was implemented last semester for indexing the videos.


    Indexing a Video

    For a news clip:

    id = 1234

    title = N. T. swamped after torrential downpour

    date = 1999-9-9

    source = Hong Kong ATV

    reporter = Chan Tai Man

    abstract = Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR.

    duration = 2:34:56

    has_scene = 1234.1, 1234.2, 1234.3

    format = MPEG

    language = Cantonese

    identifier = http://www.cse.cuhk.edu.hk/1.mpg


    Indexing a Video

    For a scene:

    id = 1234.1

    belong_to = 1234

    next_scene = 1234.2

    prev_scene = null

    start_time = 0:0:00

    end_time = 0:30:45

    keyframe = 1238

    transcript = . . .
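
The clip and scene records above could be modelled roughly as follows. This is only a sketch: the class names are assumptions, the field names follow the slides, and the times and keyframes of scenes 1234.2 and 1234.3 are made up for illustration because the slides show only scene 1234.1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scene:
    id: str
    belong_to: str
    start_time: str
    end_time: str
    keyframe: int
    prev_scene: Optional[str] = None
    next_scene: Optional[str] = None
    transcript: str = ""


@dataclass
class NewsClip:
    id: str
    title: str
    has_scene: List[str] = field(default_factory=list)


def scenes_in_order(clip, scenes):
    """Follow the next_scene links starting from the clip's first scene."""
    ordered = []
    current = clip.has_scene[0] if clip.has_scene else None
    while current is not None:
        scene = scenes[current]
        ordered.append(scene)
        current = scene.next_scene
    return ordered


clip = NewsClip("1234", "N. T. swamped after torrential downpour",
                ["1234.1", "1234.2", "1234.3"])
scenes = {
    "1234.1": Scene("1234.1", "1234", "0:0:00", "0:30:45", 1238,
                    prev_scene=None, next_scene="1234.2"),
    # Hypothetical values: only scene 1234.1 is given in the slides.
    "1234.2": Scene("1234.2", "1234", "0:30:45", "1:15:00", 2001,
                    prev_scene="1234.1", next_scene="1234.3"),
    "1234.3": Scene("1234.3", "1234", "1:15:00", "2:34:56", 3001,
                    prev_scene="1234.2", next_scene=None),
}
print([s.id for s in scenes_in_order(clip, scenes)])
```

Storing scene ids rather than object references mirrors the id-based links (has_scene, next_scene, prev_scene) used in the records.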


    Sample News Entry

    In NewsDatabase.XML:

    <database>
      <news>
        <date><year>2000</year><month>4</month><day>15</day></date>
        <title>N.T. swamped after torrential downpour</title>
        <content>
          Large areas of the northwest New Territories were under
          water yesterday as torrential rain swept across the SAR.
        </content>
      </news>
      . . .
    </database>


    Keyword Directory

    • Each news entry has its own keyword elements

    • Build a keyword directory containing all keywords

    • Every keyword points to the news entries that contain that keyword


    Keyword Directory

    [Figure: a news entry (ID = 0010) with title "N. T. swamped after torrential downpour", date 15 April 2000, keyword "flood" and reporter Clifford Lo; a keyword directory with the keywords flood, France, fuel and gun, where "flood" holds the IDs 0010, 0017 and 0137; and the news database with entries 0010, 0015, 0017 and 0043.]

    • The news database is a tree structure.

    • The keyword directory is pointed to by news entries and also points back to news entries.

    • Because keywords point back into the news database, the whole forms a graph structure.


    Keyword Directory

    In NewsDatabase.XML:

    <database>
      <news ID="0010">
        <date><year>2000</year><month>4</month><day>15</day></date>
        <title>N.T. swamped after torrential downpour</title>
        <keyword>flood</keyword>
        <keyword>storm</keyword>
        <content>
          Large areas of the northwest New Territories were under
          water yesterday as torrential rain swept across the SAR.
        </content>
      </news>
      . . .
    </database>


    Keyword Directory

    In KeywordDirectory.XML:

    <keyworddirectory>
      . . .
      <keyword word="flood">
        <newsid>0010</newsid>
        <newsid>0017</newsid>
        <newsid>0137</newsid>
        . . .
      </keyword>
      . . .
    </keyworddirectory>
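
One way the keyword directory could be derived from the news database is to invert the news-to-keyword relation. The sketch below assumes the element and attribute names shown on the slides; the function name `build_keyword_directory` and the second news entry are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NEWS_XML = """
<database>
  <news ID="0010">
    <title>N.T. swamped after torrential downpour</title>
    <keyword>flood</keyword>
    <keyword>storm</keyword>
  </news>
  <news ID="0017">
    <title>(hypothetical second entry)</title>
    <keyword>flood</keyword>
  </news>
</database>
"""


def build_keyword_directory(news_xml):
    """Invert news -> keyword into a <keyworddirectory> of keyword -> news IDs."""
    index = defaultdict(list)
    for news in ET.fromstring(news_xml).findall("news"):
        for kw in news.findall("keyword"):
            index[kw.text].append(news.get("ID"))

    directory = ET.Element("keyworddirectory")
    for word in sorted(index):
        entry = ET.SubElement(directory, "keyword", word=word)
        for news_id in index[word]:
            ET.SubElement(entry, "newsid").text = news_id
    return directory


directory = build_keyword_directory(NEWS_XML)
print(ET.tostring(directory, encoding="unicode"))
```

A keyword search then becomes a lookup in the directory followed by fetching the listed news entries, which is the graph link described on the slides.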


    Thesaurus/Classification Directory

    • To search for terms with similar meaning to the keyword

      <thesaurus>
        <item term="organisation">
          <spelling>organization</spelling>
          <similar>association</similar>
        </item>
        <item term="World Trade Organization">
          <spelling>World Trade Organisation</spelling>
          <abbreviation>WTO</abbreviation>
        </item>
        . . .
      </thesaurus>


    Thesaurus/Classification Directory

    • To search for narrower (subset) terms of the given keyword

      <thesaurus>
        <item term="organisation">
          <spelling>organization</spelling>
          <similar>association</similar>
        </item>
        <item term="disaster">
          <contains>flood</contains>
          <contains>earthquake</contains>
          <contains>fire</contains>
          <contains>storm</contains>
        </item>
        <item term="flood">
          <belongs>disaster</belongs>
        </item>
        . . .
      </thesaurus>
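
The thesaurus lends itself to query expansion: before searching, a keyword is widened to its alternative spellings, similar terms, abbreviations and contained terms. The following is a minimal sketch under that assumption. The function name `expand` and the decision not to follow belongs links (so a search for "flood" is not broadened to every disaster) are choices made for this example.

```python
import xml.etree.ElementTree as ET

THESAURUS_XML = """
<thesaurus>
  <item term="organisation">
    <spelling>organization</spelling>
    <similar>association</similar>
  </item>
  <item term="disaster">
    <contains>flood</contains>
    <contains>earthquake</contains>
    <contains>fire</contains>
    <contains>storm</contains>
  </item>
  <item term="flood">
    <belongs>disaster</belongs>
  </item>
</thesaurus>
"""

# Child elements treated as "also search for this term".
EXPANDING_TAGS = ("spelling", "similar", "abbreviation", "contains")


def expand(term, thesaurus):
    """Return the term plus every term reachable through EXPANDING_TAGS."""
    items = {item.get("term"): item for item in thesaurus.findall("item")}
    seen, queue = [], [term]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        item = items.get(current)
        if item is not None:
            for child in item:
                if child.tag in EXPANDING_TAGS:
                    queue.append(child.text.strip())
    return seen


thesaurus = ET.fromstring(THESAURUS_XML)
print(expand("disaster", thesaurus))
print(expand("organisation", thesaurus))
```

Each expanded term can then be looked up in the keyword directory in the usual way.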


    Web Search Engine


    Person / Place Directory

    • Person Directory (person ID, name, newsid, …)

      <person_directory>
        <person id="wangfengchao">
          <name><first>Fengchao</first><last>Wang</last></name>
          <nationality>Chinese</nationality>
          <organization>The central Government’s Liaison Office</organization>
          <position>deputy director</position>
          <newsid>0123</newsid> <newsid>0245</newsid> ...
        </person>
        . . .
      </person_directory>


    Person / Place Directory

    • In news database:

      <newsdatabase>
        <news id="0123">
          <date year="2000" month="4" day="15"/>
          <title>Press warning appropriate, says Beijing</title>
          <reporter>Kong Lai-fan</reporter>
          <content>
            Beijing yesterday defended remarks made by senior
            SAR-based official <person id="wangfengchao">
            Wang Fengchao</person> that local media should
            avoid reporting separatist views.
          </content>
        </news>
        . . .
      </newsdatabase>
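
Because the person tag sits inside running text (mixed content), a reader of this format needs both the tagged people and the untouched prose. A small sketch with the standard library follows; the function names `people_mentioned` and `plain_text` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NEWS_XML = """
<news id="0123">
  <title>Press warning appropriate, says Beijing</title>
  <content>
    Beijing yesterday defended remarks made by senior
    SAR-based official <person id="wangfengchao">
    Wang Fengchao</person> that local media should
    avoid reporting separatist views.
  </content>
</news>
"""


def people_mentioned(news):
    """Return (person id, displayed name) pairs found anywhere in the content."""
    content = news.find("content")
    return [(p.get("id"), p.text.strip()) for p in content.iter("person")]


def plain_text(news):
    """Flatten the mixed content back to plain text, dropping the tags."""
    return " ".join("".join(news.find("content").itertext()).split())


news = ET.fromstring(NEWS_XML)
print(people_mentioned(news))
print(plain_text(news))
```

The ids returned by `people_mentioned` are the keys into the person directory, which in turn lists the other news entries about the same person.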


    Person / Place Directory

    [Figure: a news entry (ID = 0123) with title "Press warning appropriate, says Beijing", date 15 April 2000, keyword "media" and, inside its content, the person Wang Fengchao; a person directory with the entries Wang Fengchao, John, Tom and Robert, where one entry holds the IDs 0123, 0246 and 0369; and the news database with entries 0123, 0155, 0246 and 0258.]

    • The person directory is pointed to by news entries and also points back to news entries.

    • Because person entries point back into the news database, the whole forms a graph structure.


    Person / Place Directory

    Place Directory: category structure

    <place_directory>
      <place id="china" class="country">
        <name>China</name>
        <newsid>5839</newsid>
        . . .
        <have_places>
          <place id="hongkong" class="SAR">
            <name>Hong Kong</name>
            <have_places>
              <place id="NT" class="district">
                <name>New Territories</name>
              </place>
              . . .
            </have_places>
            <newsid>0010</newsid>
            . . .
          </place>
          . . .
        </have_places>
      </place>
    </place_directory>
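
The nesting of places makes a hierarchical search possible: asking for news about a country can include news filed under the regions inside it. The sketch below assumes well-formed place elements with an id attribute; the function name `news_for_place` is hypothetical.

```python
import xml.etree.ElementTree as ET

PLACES_XML = """
<place_directory>
  <place id="china" class="country">
    <name>China</name>
    <newsid>5839</newsid>
    <have_places>
      <place id="hongkong" class="SAR">
        <name>Hong Kong</name>
        <have_places>
          <place id="NT" class="district">
            <name>New Territories</name>
          </place>
        </have_places>
        <newsid>0010</newsid>
      </place>
    </have_places>
  </place>
</place_directory>
"""


def news_for_place(directory, place_id):
    """Collect news IDs filed under a place and under every place nested inside it."""
    for place in directory.iter("place"):
        if place.get("id") == place_id:
            # iter() walks the whole subtree, so nested places are included.
            return [n.text for n in place.iter("newsid")]
    return []


places = ET.fromstring(PLACES_XML)
print(news_for_place(places, "china"))
print(news_for_place(places, "hongkong"))
```

Searching at the country level returns the Hong Kong entry as well, while searching at the SAR level does not return the country-level entry.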


    Person / Place Directory

    • In news database:

      <newsdatabase>
        <news id="0010" place="hongkong">
          <date year="2000" month="4" day="15"/>
          <title>N.T. swamped after torrential downpour</title>
          <reporter>Clifford Lo</reporter>
          <content>
            Large areas of the northwest <place id="NT">
            New Territories</place> were under water yesterday as torrential rain swept across the
            <place id="hongkong"> SAR </place>.
          </content>
        </news>
        . . .
      </newsdatabase>


    Chinese-English Dictionary

    • Translate the keywords for searching

    • We can have an English-to-Chinese dictionary, nested one character per level:

      <e2cdict>
        <english char="f">
          <english char="l">
            <english char="o">
              <english char="o">
                <english char="d">
                  <chinese>氾濫</chinese>
                  <chinese>水災</chinese>
                  <chinese>洪水</chinese>
                  . . .
                </english>
              </english>
              . . .
            </english>
          </english>
        </english>
      </e2cdict>
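
Nesting one character per element makes the dictionary behave like a trie: a lookup descends one level per letter. A minimal sketch of such a lookup follows, assuming every level is properly closed; the function name `translate` is hypothetical.

```python
import xml.etree.ElementTree as ET

E2C_XML = """
<e2cdict>
  <english char="f">
    <english char="l">
      <english char="o">
        <english char="o">
          <english char="d">
            <chinese>氾濫</chinese>
            <chinese>水災</chinese>
            <chinese>洪水</chinese>
          </english>
        </english>
      </english>
    </english>
  </english>
</e2cdict>
"""


def translate(dictionary, word):
    """Descend one <english char=...> level per letter and return the <chinese> entries there."""
    node = dictionary
    for letter in word:
        node = next((child for child in node.findall("english")
                     if child.get("char") == letter), None)
        if node is None:
            return []  # no such path in the trie
    return [c.text for c in node.findall("chinese")]


e2c = ET.fromstring(E2C_XML)
print(translate(e2c, "flood"))
```

Words sharing a prefix share the upper levels of the tree, so a prefix such as "flo" reaches a node but yields no translations of its own.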


    Chinese-English Dictionary

    • We can have a Chinese-to-English dictionary:

      <c2edict>
        <chinese term="世">
          <chinese term="貿">
            <english>WTO</english>
            <english>World Trade Organization</english>
          </chinese>
          . . .
        </chinese>
        . . .
      </c2edict>


    Annotation

    • XML is semistructured!

    • More flexibility in adding tags to contents.

    • We add our own tags to annotate strings in the content and give them “meaning”.

    • Hence, more expressive queries can be supported.


    Annotation: example

    <content>

    Radioactive coolant water leaked at a nuclear reactor in western Japan yesterday, but the accident had no impact on the environment, the plant director said. "Today when the plant was operating with its usual output, a worker found a small leak of primary coolant water from a pipe of the No 2 reactor," said Katsuhiko Takahashi.

    </content>

    • We understand… but the system doesn’t…


    Annotation: example

    <content>

    <disaster nature="radioactive" death="0" injured="0">Radioactive coolant water leaked at a nuclear reactor</disaster> in western <place id="japan">Japan</place> yesterday, but the accident had no impact on the environment, the plant director said. "<speech speaker="Katsuhiko Takahashi">Today when the plant was operating with its usual output, a worker found a small leak of primary coolant water from a pipe of the No 2 reactor</speech>," said <person id="Katsuhiko Takahashi">Katsuhiko Takahashi</person>.

    </content>


    Usage of Annotation

    • So, we can have queries like:

      • All speeches by Zhu Rongji in the last month

      • All storms that killed more than 200 people

    • We can also add links that give more details about people, places, etc.
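
Queries of this kind can be sketched as simple filters over the annotation tags. The example below assumes speech elements with a speaker attribute and disaster elements with nature and death attributes, as in the annotated sample; the news ids, the second entry and the function names are made up for illustration, and the date restriction ("in the last month") is left out.

```python
import xml.etree.ElementTree as ET

NEWS_XML = """
<database>
  <news id="n1">
    <content><disaster nature="radioactive" death="0">Radioactive coolant water
      leaked at a nuclear reactor</disaster> in western <place id="japan">Japan</place>.
      <speech speaker="Katsuhiko Takahashi">A worker found a small leak of primary coolant
      water</speech>, said the plant director.</content>
  </news>
  <news id="n2">
    <content><disaster nature="storm" death="250">A storm hit the coast</disaster>.</content>
  </news>
</database>
"""


def speeches_by(root, speaker):
    """All speech texts attributed to the given speaker, whitespace-normalised."""
    return [" ".join(s.text.split()) for s in root.iter("speech")
            if s.get("speaker") == speaker]


def disasters(root, nature, min_deaths):
    """IDs of news entries reporting a disaster of this nature with more than min_deaths deaths."""
    return [news.get("id") for news in root.findall("news")
            if any(d.get("nature") == nature and int(d.get("death", "0")) > min_deaths
                   for d in news.iter("disaster"))]


root = ET.fromstring(NEWS_XML)
print(speeches_by(root, "Katsuhiko Takahashi"))
print(disasters(root, "storm", 200))
```

Neither query would be possible on the untagged text without guessing at sentence structure, which is the point of the annotation.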


    Architecture of Digital Library

    • Designing stores and query processors for semistructured data.

    • Traditional database systems use a client/server architecture.

    • The distributed environment has given rise to two new architectures: data warehouses and mediators.

    • Video servers will also be integrated into our system to provide video streaming.


    Data Warehouse

    [Figure: three clients send queries to the warehouse and receive answers. The warehouse keeps its own copy of the data, which is refreshed by updates from three servers, each holding its own data.]


    Mediator

    [Figure: three clients send queries to the mediator and receive answers. The mediator forwards the queries to three servers, each holding its own data, and collects their answers.]
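
The difference between the two architectures can be sketched in a few lines. All class names and the keyword-to-IDs data layout below are assumptions for illustration, not the presented system's design: the warehouse answers from a local copy that must be updated, while the mediator stores nothing and asks the servers at query time.

```python
class Server:
    """A data source holding its own entries (keyword -> list of news IDs)."""

    def __init__(self, data):
        self.data = data

    def query(self, keyword):
        return list(self.data.get(keyword, []))


class Warehouse:
    """Keeps a local copy of the servers' data; queries never reach the servers."""

    def __init__(self):
        self.copy = {}

    def update(self, server):
        # Refresh the local copy with whatever the server currently holds.
        for keyword, ids in server.data.items():
            merged = self.copy.setdefault(keyword, [])
            for news_id in ids:
                if news_id not in merged:
                    merged.append(news_id)

    def query(self, keyword):
        return list(self.copy.get(keyword, []))


class Mediator:
    """Stores nothing; forwards each query to the servers and merges the answers."""

    def __init__(self, servers):
        self.servers = servers

    def query(self, keyword):
        answer = []
        for server in self.servers:
            for news_id in server.query(keyword):
                if news_id not in answer:
                    answer.append(news_id)
        return answer


a = Server({"flood": ["0010", "0017"]})
b = Server({"flood": ["0137"]})
warehouse = Warehouse()
warehouse.update(a)
warehouse.update(b)
mediator = Mediator([a, b])

a.data["flood"].append("0200")    # new entry arrives at server a
print(mediator.query("flood"))    # sees the new entry immediately
print(warehouse.query("flood"))   # stale until the next update
```

The trade-off shown is freshness against query cost: the mediator is always current but contacts every server per query, whereas the warehouse answers locally but depends on updates.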


    Agents Using Structured Data

    • There is a growing demand for data that is more structured than loosely structured HTML.

    • Semistructured XML data can provide a very good environment for Web agents.

    • Our main aim in implementing an agent is to illustrate that our semistructured XML data provides a better environment for an agent to work in.


    Research Plan & Conclusion

    • Design of the data structure in semistructured XML format

      • to support multimedia data, multilingual data, and various kinds of retrieval.

    • Architecture of the system that allows multiple sources of data.

    • Implementation of an agent to illustrate that our semistructured data provides a better environment for an agent to work in.


    Q & A Session

