1 / 9

Creation of Heterogeneous XML Document Collections based on the Internet Movie Database

Creation of Heterogeneous XML Document Collections based on the Internet Movie Database presented by Ivelina Stavreva Content Goalrepresentation What is IMDB? Possible Sources of Heterogenity Examplediagram for Heterogeneous XML Documents Program run Goal

jacob
Download Presentation

Creation of Heterogeneous XML Document Collections based on the Internet Movie Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Creation of Heterogeneous XML Document Collections based on the Internet Movie Database presented by Ivelina Stavreva

  2. Content • Goalrepresentation • What is IMDB? • Possible Sources of Heterogenity • Examplediagram for Heterogeneous XML Documents • Program run

  3. Goal • Motivation: Lack of large heterogeneous collections of XML data • Until now: DBLP or INEX, large collections with homogeneous structure • Problem: DBLP or INEX inadequate for similarity search • Goal: heterogeneous collection of XML Document Collections

  4. What is IMDB? • a rich source of information about movies and people involved in the movie business (actors, directors, editors, producers, etc.) • contains: - factual data (like birthday) - textual data (like biography)

  5.  Applying information from IMDB - replace person (movie) names (titles) by their alternative names (titles). Example: <movie id=„195067“> <movie id=„195067 “> <title>Matrix Resolution, The</title> <title>Matrix 3, The</title> <alt_title>Matrix 3, The</alt_title> <alt_title>Matrix resolution, The</alt_title> </movie> </movie> - replace tag <movie> by tags derived from genres (<thriller>, <drama>,etc.) Possible sources of heterogenity in XML docs for IMDB

  6. Possible sources of heterogenity in XML docs for IMDB • Using different languages - replace tags by their counterparts (e.g. <movie> by <film>) <movie id=„195067“> <film id=„195067 “> <title>Matrix Resolution, The</title> <titel>Matrix resolution, The</titel> <alt_title>Matrix 3, The</alt_title> <alt_titel>Matrix 3, The</alt_titel> </movie> </film>

  7. Possible sources of heterogenity in XML docs for IMDB • Different granularities - One XML document per year (location), listing some of the movies filmed then (there) -<movie id=„195067“> <year2000> <title>Matrix Resolution, The</title> <title>Matrix resolution, The</title> <prod_year>2000<prod_year> <title>abc</title> </movie> </year > -<movie id=„195068“> <title>abc</title> <prod_year>2000</prod_year> </movie>

  8. Possible sources of heterogenity in XML docs for IMDB • Different granularities - One XML document for all movies with the same director -<movie id=„195067“> <personX> <title>xyz</title> <title>xyz</title> <director>X<director> <title>abc</title> </movie> </personX> -<movie id=„195068“> <title>abc</title> <director>X</director> </movie>

  9. Examplediagram for Heterogeneous XML Documents all movies <persons> List 60% 10% <movie> List 20% 30% <location> List <year> List 40% 10% 20% 30% all persons

More Related