Creation of Heterogeneous XML Document Collections based on the Internet Movie Database

presented by Ivelina Stavreva

Content
  • Goal
  • What is IMDB?
  • Possible Sources of Heterogeneity
  • Example diagram for Heterogeneous XML Documents
  • Program run
Goal
  • Motivation: there is a lack of large heterogeneous collections of XML data
  • Until now: DBLP or INEX, large collections with a homogeneous structure
  • Problem: DBLP and INEX are inadequate for evaluating similarity search
  • Goal: create heterogeneous collections of XML documents
What is IMDB?
  • a rich source of information about movies and people involved in the movie business (actors, directors, editors, producers, etc.)
  • contains:

- factual data (like birthday)
- textual data (like biography)

Possible sources of heterogeneity in XML docs for IMDB
  • Applying information from IMDB

- replace person (movie) names (titles) by their alternative names (titles). Example:

Original document:

    <movie id="195067">
      <title>Matrix Resolution, The</title>
      <alt_title>Matrix 3, The</alt_title>
    </movie>

Variant with title and alternative title swapped:

    <movie id="195067">
      <title>Matrix 3, The</title>
      <alt_title>Matrix resolution, The</alt_title>
    </movie>

- replace tag <movie> by tags derived from genres (<thriller>, <drama>, etc.)
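The two transformations on this slide (swapping a title with its alternative title, and renaming the <movie> tag after a genre) could be sketched roughly as follows. This is only an illustration, not the presenter's program: the function names `swap_titles` and `retag_by_genre` are hypothetical, and the genre is passed in as a plain argument because the slides do not show where it is stored.

```python
import copy
import xml.etree.ElementTree as ET


def swap_titles(movie):
    """Return a copy of `movie` with <title> and <alt_title> texts exchanged."""
    variant = copy.deepcopy(movie)  # leave the source document untouched
    title = variant.find("title")
    alt = variant.find("alt_title")
    if title is not None and alt is not None:
        title.text, alt.text = alt.text, title.text
    return variant


def retag_by_genre(movie, genre):
    """Return a copy of `movie` whose root tag is the genre name.

    Assumption: `genre` is already a valid XML tag name such as "thriller".
    """
    variant = copy.deepcopy(movie)
    variant.tag = genre
    return variant


source = ET.fromstring(
    '<movie id="195067">'
    "<title>Matrix Resolution, The</title>"
    "<alt_title>Matrix 3, The</alt_title>"
    "</movie>"
)
swapped = swap_titles(source)
thriller = retag_by_genre(source, "thriller")
print(ET.tostring(swapped, encoding="unicode"))
print(ET.tostring(thriller, encoding="unicode"))
```

Working on deep copies keeps the original document available, so several heterogeneous variants can be derived from the same source entry.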
Possible sources of heterogeneity in XML docs for IMDB
  • Using different languages

- replace tags by their counterparts (e.g. <movie> by <film>)

Original document:

    <movie id="195067">
      <title>Matrix Resolution, The</title>
      <alt_title>Matrix 3, The</alt_title>
    </movie>

Variant with translated tags:

    <film id="195067">
      <titel>Matrix resolution, The</titel>
      <alt_titel>Matrix 3, The</alt_titel>
    </film>
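A tag translation of this kind might be driven by a simple lookup table. The sketch below is an assumption about how it could be done, not the presenter's code: the names `TAG_MAP_DE` and `translate_tags` are hypothetical, and the table contains only the three tag pairs that appear in the example.

```python
import copy
import xml.etree.ElementTree as ET

# Tag pairs taken from the slide example; a real table would be larger.
TAG_MAP_DE = {"movie": "film", "title": "titel", "alt_title": "alt_titel"}


def translate_tags(root, tag_map):
    """Return a copy of `root` with every tag found in `tag_map` renamed."""
    variant = copy.deepcopy(root)
    for element in variant.iter():  # visits the root and all descendants
        element.tag = tag_map.get(element.tag, element.tag)
    return variant


movie = ET.fromstring(
    '<movie id="195067">'
    "<title>Matrix Resolution, The</title>"
    "<alt_title>Matrix 3, The</alt_title>"
    "</movie>"
)
film = translate_tags(movie, TAG_MAP_DE)
print(ET.tostring(film, encoding="unicode"))
```

Tags missing from the table are left unchanged, so a partial table yields documents that mix both languages, which is itself a form of heterogeneity.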

Possible sources of heterogeneity in XML docs for IMDB
  • Different granularities

- One XML document per year (location), listing some of the movies filmed then (there)

One document per movie:

    <movie id="195067">
      <title>Matrix Resolution, The</title>
      <prod_year>2000</prod_year>
    </movie>

    <movie id="195068">
      <title>abc</title>
      <prod_year>2000</prod_year>
    </movie>

One document per year:

    <year2000>
      <title>Matrix resolution, The</title>
      <title>abc</title>
    </year2000>
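Regrouping per-movie documents into one document per production year could look roughly like the sketch below. The function name `group_by_year` is hypothetical, and the `year<YYYY>` root tag simply mirrors the slide example; the slides do not describe how the actual program builds these documents.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def group_by_year(movies):
    """Map each production year to a <yearYYYY> element listing movie titles."""
    by_year = defaultdict(list)
    for movie in movies:
        year = movie.findtext("prod_year")
        if year:  # skip movies without a production year
            by_year[year.strip()].append(movie)

    documents = {}
    for year, group in by_year.items():
        root = ET.Element(f"year{year}")
        for movie in group:
            ET.SubElement(root, "title").text = movie.findtext("title")
        documents[year] = root
    return documents


movies = [
    ET.fromstring(
        '<movie id="195067"><title>Matrix Resolution, The</title>'
        "<prod_year>2000</prod_year></movie>"
    ),
    ET.fromstring(
        '<movie id="195068"><title>abc</title>'
        "<prod_year>2000</prod_year></movie>"
    ),
]
year_docs = group_by_year(movies)
print(ET.tostring(year_docs["2000"], encoding="unicode"))
```

The same pattern would apply to grouping by location, with the grouping key read from a different child element.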

Possible sources of heterogeneity in XML docs for IMDB
  • Different granularities

- One XML document for all movies with the same director

One document per movie:

    <movie id="195067">
      <title>xyz</title>
      <director>X</director>
    </movie>

    <movie id="195068">
      <title>abc</title>
      <director>X</director>
    </movie>

One document per director:

    <personX>
      <title>xyz</title>
      <title>abc</title>
    </personX>
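Grouping by director raises one extra issue: a person's name has to be turned into a valid XML tag. The following sketch shows one possible approach. The helper names `tag_for_person` and `group_by_director` are hypothetical, and replacing every character outside `A-Za-z0-9_` with an underscore is an assumption made here for illustration; the slides only show the placeholder tag <personX>.

```python
import re
from collections import defaultdict
import xml.etree.ElementTree as ET


def tag_for_person(name):
    """Build an XML-safe tag such as 'personX' from a director name."""
    safe = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    return f"person{safe}"


def group_by_director(movies):
    """Map each director to a <person...> element listing that director's titles."""
    titles = defaultdict(list)
    for movie in movies:
        director = movie.findtext("director")
        if director:  # skip movies without a director entry
            titles[director.strip()].append(movie.findtext("title"))

    documents = {}
    for director, names in titles.items():
        root = ET.Element(tag_for_person(director))
        for name in names:
            ET.SubElement(root, "title").text = name
        documents[director] = root
    return documents


movies = [
    ET.fromstring(
        '<movie id="195067"><title>xyz</title><director>X</director></movie>'
    ),
    ET.fromstring(
        '<movie id="195068"><title>abc</title><director>X</director></movie>'
    ),
]
director_docs = group_by_director(movies)
print(ET.tostring(director_docs["X"], encoding="unicode"))
```

The fixed `person` prefix guarantees that the tag starts with a letter even when the name begins with a digit or punctuation.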

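Example diagram for Heterogeneous XML Documents

[Diagram: starting from "all movies" and "all persons", documents are produced as <persons> lists, <movie> lists, <location> lists and <year> lists. The edges are labelled with percentages (60%, 10%, 20%, 30%, 40%, 10%, 20%, 30%); which percentage belongs to which edge cannot be recovered from the transcript.]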