Merging source query interfaces on web databases


Merging Source Query Interfaces on Web Databases

Eduard C. Dragut (speaker), University of Illinois at Chicago

Wensheng Wu, University of Illinois at Urbana-Champaign

Prasad Sistla, University of Illinois at Chicago

Clement Yu, University of Illinois at Chicago

Weiyi Meng, SUNY at Binghamton

ICDE 2006, Atlanta, USA



A Motivating Scenario

  • Looking for a ticket

    • Chicago – Atlanta, April 3rd – April 9th

  • A user looking for the “best” price for a ticket:

    • Has to explore multiple sources

    • It is tedious, frustrating and time-consuming

[Figure: query interfaces of orbitz.com, delta.com and aa.com]

E. Dragut et al -

Merging Source Query Interfaces on Web Databases



The Goal

  • Provide a unified way to query multiple sources in the same domain

[Figure: the user formulates the query on a unified query interface (Airfare.com) over sources on the Web: priceline.com, united.com, delta.com, nwa.com]


Overview: Integrating Query Interfaces

  • Integration pipeline over the (Deep) Web:

    • Extract query interfaces [He05, Zhang04]

    • Cluster query interfaces by domain, e.g. Airfare, Auto, Car Rental, Books [Peng04]

    • Match query interfaces [B.He03, Dhamankar04, Doan02, Madhavan05, Wu04]

    • Merge query interfaces [H.He03]: the topic of this presentation

  • Various formats, e.g. ASCII files


Merge Algorithm

  • The input

    • A set of query interfaces in the same domain

      • E.g. Airline domain: Delta, AA, NWA, Orbitz, Travelocity

      • Each query interface is represented hierarchically [Wu04]

    • And a mapping, globally characterizing the semantic correspondences between the fields of the query interfaces.

      • Organized in clusters (e.g. [Wu04 et al, B.He03 et al])

[Figure: query interface of vacations.net]
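To make the input concrete, the sketch below models each query interface as a schema tree whose leaves are fields, and the mapping as a cluster id attached to every leaf. The `Node` class, its attributes and the helper names are hypothetical, chosen for illustration; they are not the data structures of [Wu04] or of the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node of a hierarchical query-interface schema tree (hypothetical model)."""
    label: str
    children: list = field(default_factory=list)
    cluster: Optional[str] = None  # set only on leaves, i.e. on fields


def leaves(node: Node) -> list:
    """Return the leaves (fields) of a schema tree in left-to-right order."""
    if not node.children:
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def clusters_of(interfaces: list) -> dict:
    """Invert the per-leaf cluster ids: cluster -> [(interface label, field label), ...]."""
    mapping = {}
    for root in interfaces:
        for leaf in leaves(root):
            if leaf.cluster is not None:
                mapping.setdefault(leaf.cluster, []).append((root.label, leaf.label))
    return mapping


# Two hypothetical interface fragments whose date fields fall into shared clusters.
iface_a = Node("A", [Node("Date", [Node("DepDay", cluster="c_DepDay"),
                                   Node("DepMonth", cluster="c_DepMonth")])])
iface_b = Node("B", [Node("Departing", [Node("Departure month", cluster="c_DepMonth"),
                                        Node("Departure day", cluster="c_DepDay")])])
print(clusters_of([iface_a, iface_b])["c_DepDay"])
# [('A', 'DepDay'), ('B', 'Departure day')]
```

Inverting the per-leaf cluster ids this way gives the cluster view of the mapping: each cluster lists the semantically equivalent fields across interfaces, and the root label stands in for the interface name.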


An Example

  • Three fragments of query interfaces represented hierarchically

  • The mapping between them, i.e. the set of clusters


Merge Algorithm

  • The output

    • A unified query interface that

      • consists of all the fields of the individual interfaces, i.e. it has a field for each of the clusters in the mapping definition

      • preserves all the constraints enforced by the interfaces being merged

    • The constraints to be satisfied by the global interface are:

      • the grouping constraints (to be described) and

      • the ancestor-descendant relationships among the elements within individual interfaces.
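The first output requirement (one field per cluster of the mapping) is easy to check mechanically. A minimal sketch, assuming the merged interface is given as the list of cluster ids of its fields; the function name is hypothetical, and the grouping and ancestor-descendant constraints are not checked here.

```python
from collections import Counter


def covers_each_cluster_once(merged_fields: list, clusters: set) -> bool:
    """True if the merged interface has exactly one field per cluster of the mapping.

    `merged_fields` lists the cluster id of every field (leaf) of the merged
    interface; `clusters` is the set of cluster ids in the mapping.
    """
    counts = Counter(merged_fields)
    return set(counts) == set(clusters) and all(n == 1 for n in counts.values())


clusters = {"c_DepCity", "c_DestCity", "c_DepDay"}
print(covers_each_cluster_once(["c_DepCity", "c_DestCity", "c_DepDay"], clusters))  # True
print(covers_each_cluster_once(["c_DepCity", "c_DepCity", "c_DepDay"], clusters))   # False
```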


Grouping

  • Within a domain of discourse (e.g. Airfare) we observe:

    • A spatial locality property among the fields of query interfaces

      • Designers tend to place related fields close to each other

  • Hence, in the integrated interface these fields should be placed in adjacent positions, too


Grouping Problem

  • The goal (requirement)

    • Groups of fields that occur together in the source query interfaces should appear together in the integrated interface

    • The actual order of elements is immaterial

  • The problem

    • Find a partition over the set of fields of a given domain characterizing the way fields are grouped in the integrated interface.


Capturing Grouping Constraints

  • Introduce the notion of potential groups

    • Informally, it is a maximal set of adjacent sibling leaves whose parent is not the root

    • Capture the way fields are organized within source query interfaces

    • Reflect the designer’s intent that these fields belong together, so that users can easily understand what is required and fill in the desired information.

  • Example

    • [Figure: the set of all potential groups induced by query interface Travel]
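A minimal sketch of collecting potential groups from one schema tree, following the informal definition above. It assumes a hypothetical encoding in which a leaf (field) is a string and an internal node is the list of its children in left-to-right order; the example tree is made up for illustration and is not the Travel interface.

```python
def potential_groups(root: list) -> list:
    """Return the potential groups of a schema tree.

    A potential group is a maximal run of adjacent sibling leaves whose parent
    is not the root. Hypothetical encoding: a leaf (field) is a str and an
    internal node is the list of its children, in left-to-right order.
    """
    groups = []

    def visit(node: list, is_root: bool) -> None:
        run = []  # current run of adjacent leaf siblings
        for child in node:
            if isinstance(child, str):
                run.append(child)
                continue
            # An internal child ends the current run of leaves.
            if run and not is_root:
                groups.append(run)
            run = []
            visit(child, is_root=False)
        if run and not is_root:
            groups.append(run)

    visit(root, is_root=True)
    return groups


# Hypothetical fragment: two fields directly under the root, two subtrees below it.
iface = [
    "c_DepCity", "c_DestCity",
    ["c_DepDay", "c_DepMonth", "c_DepTime"],
    ["c_Adults", "c_Children", ["c_Infants"]],
]
print(potential_groups(iface))
# [['c_DepDay', 'c_DepMonth', 'c_DepTime'], ['c_Adults', 'c_Children'], ['c_Infants']]
```

Leaves hanging directly from the root (c_DepCity and c_DestCity here) are skipped, as the definition requires. A singleton run such as [c_Infants] is kept to match the definition, although it imposes no adjacency constraint.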


Constructing Groups

  • Use the structural information collected from multiple source interfaces to infer how fields are organized in the integrated interface

  • Introduce the notion of a group of fields

    • Informally, it is a sequence of fields that preserves the adjacency constraints within related potential groups

      • Two potential groups are related if their intersection is nonempty.

    • A group represents the desired organization of the fields in an integrated interface

  • An example:

    • Set of related potential groups:

      • {DepDay, DepMonth, DepTime}, {Departure month, Departure day, Departure Year}, {depDay, depMonth}

    • The resulting group:

      • [DepTime, Departure day, Departure month, Departure Year]
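Related potential groups can be collected with a union-find pass over the fields. This sketch assumes that relatedness is closed transitively, so that all potential groups linked by a chain of nonempty intersections are processed together; the slides do not state this explicitly, and the function name is hypothetical. The first three sets are the cluster-level form of the related groups above; the fourth is a made-up unrelated group.

```python
def related_components(groups: list) -> list:
    """Partition potential groups (sets of fields) into classes of related
    groups, i.e. groups connected by chains of nonempty intersections."""
    parent = list(range(len(groups)))

    def find(i: int) -> int:
        # Follow parent links to the representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # field -> index of the first group that contains it
    for i, group in enumerate(groups):
        for f in group:
            if f in first_owner:
                parent[find(i)] = find(first_owner[f])  # shared field: union
            else:
                first_owner[f] = i

    classes = {}
    for i, group in enumerate(groups):
        classes.setdefault(find(i), []).append(group)
    return list(classes.values())


B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
    {"c_Adults", "c_Children"},  # hypothetical, shares no field with the others
]
for cls in related_components(B):
    print(len(cls), sorted(set().union(*cls)))
```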


Grouping Problem as C1P

  • The grouping problem can be cast into the Consecutive Ones Property (C1P) problem [Booth76 et al, Fulkerson65 et al].

    • For a universal set U and a subset B of the power set of U, we want a permutation π of the elements of U such that the elements of each set in B appear as a consecutive sequence in π.

    • In our grouping problem

      • Potential groups correspond to the set B

      • U is the union of the fields in the potential groups

      • π is the desired permutation of the fields

  • Several algorithms can be used to obtain the groups in the integrated schema

    • E.g. PQ-tree algorithm [Meidanis98 et al]

      • Used in our implementation
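For very small inputs the C1P question can be answered by brute force, which is handy for checking the PQ-tree example that uses the same U and B. This is a swapped-in exhaustive search, not the PQ-tree algorithm used in the implementation: it tries all |U|! orderings, whereas the PQ-tree algorithm of Booth and Lueker runs in linear time. Function names are hypothetical.

```python
from itertools import permutations


def is_consecutive(order: tuple, group: set) -> bool:
    """True if the (nonempty) `group` occupies a contiguous block of `order`."""
    positions = sorted(order.index(x) for x in group)
    return positions[-1] - positions[0] + 1 == len(positions)


def c1p_permutation(universe: set, groups: list):
    """Exhaustive C1P search: the first permutation of `universe` (in sorted
    lexicographic order) in which every set of `groups` is consecutive, or None."""
    for order in permutations(sorted(universe)):
        if all(is_consecutive(order, g) for g in groups):
            return list(order)
    return None


U = {"c_DepDay", "c_DepMonth", "c_DepYear", "c_DepTime"}
B = [
    {"c_DepDay", "c_DepMonth", "c_DepTime"},
    {"c_DepMonth", "c_DepDay", "c_DepYear"},
    {"c_DepDay", "c_DepMonth"},
]
print(c1p_permutation(U, B))
# ['c_DepTime', 'c_DepDay', 'c_DepMonth', 'c_DepYear']
```

The first permutation found is the group [c_DepTime, c_DepDay, c_DepMonth, c_DepYear] reported on these slides; other orderings, e.g. its reverse, satisfy all three sets as well.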


Grouping Problem as C1P

  • An example of applying the PQ-tree algorithm

    • Set of related potential groups:

      • B = {{c_DepDay, c_DepMonth, c_DepTime}, {c_DepMonth, c_DepDay, c_DepYear}, {c_DepDay, c_DepMonth}}

      • U = {c_DepDay, c_DepMonth, c_DepYear, c_DepTime}

[Figure: the resulting PQ-tree; its frontier, i.e. the left-to-right sequence of its leaves, gives the group]

  • A permutation satisfying all related potential groups cannot always be derived

    • Minimize the number of violations
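When no permutation satisfies every related potential group, a small brute-force search can return an ordering that violates as few of them as possible. The sketch assumes that "violations" means the number of potential groups that are not consecutive; the slides do not define the measure. The three pairwise groups below are a hypothetical infeasible input: every ordering of three fields leaves one pair apart. This exhaustive search is only for tiny inputs and is not the authors' method.

```python
from itertools import permutations


def is_consecutive(order: tuple, group: set) -> bool:
    """True if the (nonempty) `group` occupies a contiguous block of `order`."""
    positions = sorted(order.index(x) for x in group)
    return positions[-1] - positions[0] + 1 == len(positions)


def min_violation_order(universe: set, groups: list):
    """Exhaustive search for an ordering of `universe` that leaves the fewest
    sets of `groups` non-consecutive. Returns (ordering, violated groups)."""
    best_order, best_violated = None, None
    for order in permutations(sorted(universe)):
        violated = [g for g in groups if not is_consecutive(order, g)]
        if best_violated is None or len(violated) < len(best_violated):
            best_order, best_violated = list(order), violated
            if not violated:  # cannot do better than zero violations
                break
    return best_order, best_violated


# Hypothetical infeasible input: three fields, every pair asked to be adjacent.
U = {"c_Adults", "c_Children", "c_Infants"}
B = [{"c_Adults", "c_Children"}, {"c_Children", "c_Infants"}, {"c_Adults", "c_Infants"}]
order, violated = min_violation_order(U, B)
print(order, len(violated))  # one group is necessarily violated
```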


Constructing Groups

  • On the running example

    • The set of all groups

      • [c_DepCity, c_DestCity]

      • [c_DepTime, c_DepDay, c_DepMonth, c_DepYear]

      • [c_Seniors, c_Adults, c_Children, c_Infants]


  • Fields that are children of the root were not considered (they belong to no potential group)


Pairwise merge

  • For a set of query interfaces:

    • Iteratively merge two at a time

      • Traversing the schema trees bottom-up

      • Placing the elements of each group

      • Preserving ancestor-descendant relationships in the source schemas

  • On the running example

    • First iteration
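The control flow of the pairwise merge can be sketched as a left fold over the interfaces, together with a bottom-up (post-order) traversal of a schema tree. `toy_merge_pair` is a hypothetical placeholder that merges flat lists of cluster ids and ignores hierarchy and grouping; it is not the merge step of the paper, which works on the schema trees and the groups computed earlier. All names and the list-based tree encoding are assumptions.

```python
from functools import reduce


def postorder(tree, path=()):
    """Yield (path, node) pairs bottom-up: each child before its parent.

    Hypothetical encoding: a leaf is a str (cluster id), an internal node a list.
    `path` is the tuple of child indices from the root to the node.
    """
    if isinstance(tree, list):
        for i, child in enumerate(tree):
            yield from postorder(child, path + (i,))
    yield path, tree


def toy_merge_pair(left: list, right: list) -> list:
    """Hypothetical placeholder for merging two interfaces, on flat lists of
    cluster ids: keep `left`, then append unseen clusters of `right` in order.
    It ignores hierarchy and grouping, unlike the merge on these slides."""
    return left + [c for c in right if c not in left]


def merge_all(interfaces: list, merge_pair) -> list:
    """Merge a list of interfaces two at a time, left to right."""
    return reduce(merge_pair, interfaces)


print([p for p, _ in postorder(["c_DepCity", ["c_DepDay", "c_DepMonth"]])])
# [(0,), (1, 0), (1, 1), (1,), ()]
print(merge_all([["c_DepCity", "c_DestCity"], ["c_DestCity", "c_DepDay"],
                 ["c_DepDay", "c_Adults"]], toy_merge_pair))
# ['c_DepCity', 'c_DestCity', 'c_DepDay', 'c_Adults']
```

The fold makes the "two at a time" iteration explicit: with n interfaces there are n - 1 pairwise merges, each taking the result of the previous one as its left input.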


Pairwise merge

  • Second iteration

  • Note that the fields are placed naturally in the merged interface


Experiment

  • Setup

    • Five real-world domains

    • Mapping consists of clusters [Wu04 et al]


Experiment

  • The characteristics of the integrated interfaces.

  • All group constraints are satisfied with the exception of two potential groups in the airline domain

    • [Seniors, Adults, Children, Infants] and [Airline, Class, NonStop].


Example Integrated Interfaces

  • Airfare domain integrated interface

  • Note that fields are placed naturally


Example Integrated Interfaces

  • Auto domain integrated interface

  • Note that fields are placed naturally


End

  • Please visit the project web site

    • http://www.cs.uic.edu/~edragut/QIProject.html

Thank you for your time and patience!
