Data exchange with data metadata translations
This presentation is the property of its rightful owner.
Sponsored Links
1 / 40

Data Exchange with  Data-Metadata Translations PowerPoint PPT Presentation


  • 84 Views
  • Uploaded on
  • Presentation posted in: General

Data Exchange with  Data-Metadata Translations. MAD Algorithm. Paolo  Papotti. Mauricio A.  Hernández. Wang-Chiew Tan. Data Exchange. “ Scientia potentia est ” What is Data Exchange?:

Download Presentation

Data Exchange with  Data-Metadata Translations

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Data exchange with data metadata translations

Data Exchange with Data-Metadata Translations

MAD Algorithm

Paolo 

Papotti

Mauricio A. 

Hernández

Wang-Chiew

Tan


Data exchange with data metadata translations

Data Exchange

  • “Scientiapotentiaest”

  • What is Data Exchange?:

    • The process of taking data built under a source schema and transforming it into data built under a target schema

    • Data Exchange is the restructuring of data


Data exchange with data metadata translations

Data Exchange – why?

Today when companies merge they also merge information sources.


Data exchange with data metadata translations

Data Exchange – why?

2. When several institutions are working on a joint venture – a combined database is


Data exchange with data metadata translations

Data Exchange – why?

3. Refreshing and updating data base scheme


Data exchange with data metadata translations

Few problems with data exchange

  • The labels in the Source Schema and the values Target Schema could be very different

  • Data could be kept in a plethora of ways

    • For instance: Car price could be stored in Shekels and in U.S dollars

  • Data could be lost in the exchange process if the Source Schema and Target Schema don’t correspond well


Data exchange with data metadata translations

Data Exchange

  • In the past Data Exchange was done manually, taking many resources such as time and money.

  • Many researchers struggle with ways of improving data exchange


Data exchange with data metadata translations

Antique Car Dealership

Schema Buy-A-Wreck

cars

Car AGENTS

Schema Clunkers –R-Us

Clunker table


Data exchange with data metadata translations

Matching Examples

Schema Buy-A-Wreck

Schema Clunkers –R-Us


Data exchange with data metadata translations

Matching Examples

Schema Buy-A-Wreck

Schema Clunkers –R-Us


Data exchange with data metadata translations

Matching Examples

Schema Buy-A-Wreck

cars

Car AGENTS

Schema Clunkers –R-Us


Data exchange with data metadata translations

How do we match?

  • Creating mappings:

    • schema matching: find matches

    • create query expressions: for automated data translation or exchange

Create Query expressions

Schema

Matching


Data exchange with data metadata translations

Data Exchange

There may be no way to transform an instance given all of our constraints.

There may be numerous ways to transform the instance (possibly infinitely many).

We must identify and justify a best suited choice of solutions for our need.


Data exchange with data metadata translations

Data Exchange - Summery

To conclude:

Data exchange is exchanging data from a Source Schema to a Target Schema

It is a greatly dealt problem in the computerized world

Some Data exchange scenarios deal with Metadata

Source schema S

Target schema

T

S

T


Data exchange with data metadata translations

What is Metadata?

  • Metadata: Data on Data.

  • Metadata can come as:

    • Video

    • Audio

    • Image

    • Text


Data exchange with data metadata translations

Why Do we need Meta – Data?

Meta-Data helps us to understand data

Can anyone tell what these numbers mean?

Jan 120 223 89

Feb 83 168 56


Data exchange with data metadata translations

Why Do we need Meta – Data?

After adding Meta-Data…

Umbrella Sales

Month USA UK Italy

Jan 120 223 89

Feb 83 168 56


Data exchange with data metadata translations

Why Do we need Meta – Data?

We all know this picture…


Data exchange with data metadata translations

Why Do we need Meta – Data?

What is this picture all about?


Data exchange with data metadata translations

Why Do we need Meta – Data?

Sir Edward Carson signing the Ulster Covenant


Data exchange with data metadata translations

Why Do we need Meta – Data?


Data exchange with data metadata translations

Why Do we need Meta – Data?

Wall Street, New York City, New York.


Data exchange with data metadata translations

Data exchange scenarios may involve metadata transformations.

Data-Metadata Translations

  • Transforming the data in the Stock Ticker table to metadata in the Stock Quotes table is vital in the stock exchange world.

23


Data exchange with data metadata translations

Data-Metadata Translations

  • Mapping systems support Data-to-Data transformations with fixed schemas (Clio).

  • Goal: Extend mapping systems to support Data-Metadata Translations.


Data exchange with data metadata translations

Data Exchange Clio

  • One software developed for simple graphic data exchange is “Clio”

  • Clio corresponded values between the source scheme and the target scheme

  • However, the Clio solution did not provide answers for possible data exchange scenarios that involve Metadata

  • the solution involving Metadatais based on Clio


Data exchange with data metadata translations

Clio interface


Data exchange with data metadata translations

m1: for $s in Source.Sales

exists $t in Target.Sales

where $t.month = $s.month and

$t.country = “USA” and

$t.units = $s.USA

Metadata-to-Data

How can we transform the following source data into the corresponding target?

Target.Sales

month country units

Jan USA 120

Jan UK 223

Jan Italy 89

Feb USA 83

Feb UK 168

Feb Italy 56

Source.Sales

month USA UK Italy

Jan 120 223 89

Feb 83 168 56

Schema mapping m1

Source: Rcd

Sales: SetOf Rcd

month

USA

UK

Italy

Target: Rcd

Sales: SetOf Rcd

month

country

units

“USA”

27


Data exchange with data metadata translations

m2: for $s in Source.Sales

exists $t in Target.Sales

where $t.month = $s.month and

$t.country = “UK” and

$t.units = $s.UK

Metadata-to-Data

How can we transform the following source data into the corresponding target?

Target.Sales

month country units

Jan USA 120

Jan UK 223

Jan Italy 89

Feb USA 83

Feb UK 168

Feb Italy 56

Source.Sales

month USA UK Italy

Jan 120 223 89

Feb 83 168 56

m1: for $s in Source.Sales

exists $t in Target.Sales

where $t.month = $s.month and

$t.country = “USA” and

$t.units = $s.USA

Schema mapping m2

Source: Rcd

Sales: SetOf Rcd

month

USA

UK

Italy

Target: Rcd

Sales: SetOf Rcd

month

country

units

“UK”

28


Data exchange with data metadata translations

m2: for $s in Source.Sales

exists $t in Target.Sales

where $t.month = $s.month and

$t.country = “UK” and

$t.units = $s.UK

m3: for $s in Source.Sales

exists $t in Target.Sales

where $t.month = $s.month and

$t.country = “Italy” and

$t.units = $s.Italy

Metadata-to-Data

How can we transform the following source data into the corresponding target?

Target.Sales

month country units

Jan USA 120

Jan UK 223

Jan Italy 89

Feb USA 83

Feb UK 168

Feb Italy 56

Source.Sales

month USA UK Italy

Jan 120 223 89

Feb 83 168 56

m1: for $s in Source.Sales

exists $t in Target.Sales

where $t.month = $s.month and

$t.country = “USA” and

$t.units = $s.USA

Schema mapping m3

Source: Rcd

Sales: SetOf Rcd

month

USA

UK

Italy

Target: Rcd

Sales: SetOf Rcd

month

country

units

“Italy”

29


Data exchange with data metadata translations

Metadata-to-Data: Our solution

MetadatA-Data (MAD) mapping:

Source: Rcd

Sales: SetOf Rcd

month

USA

UK

Italy

Target: Rcd

Sales: SetOf Rcd

month

country

units

Target.Sales

Jan USA 120

Jan UK 223

Jan Italy 89

Feb USA 83

Feb UK 168

Feb Italy 56

Source.Sales

Jan 120 223 89

Feb 83 168 56

countries

label

value

Select the elements to group

Placeholder

Copy elements’ labels

Copy elements’ values

for $s in Source.Sales,

$c in{“USA”, “UK”, “Italy”}

exists $t in Target.Sales

where $t.month = $s.month

and$t.country = $c

and$t.units = $s.($c)

Set of labels (strings)

Is a label value

Dynamic selection of the source element

30


Data exchange with data metadata translations

Data-to-Metadata

Now we want to support the opposite operation

The target schema depends on the source data

We define a target template: Nested Dynamic Output Schemas (ndos)

Source: Rcd

StockTicker: SetOf Rcd

time

symbol

price

Target: Rcd

Stockquotes: SetOf Rcd

time

symbols

label

value

Dynamic element

Run-time: The dynamic element defines the target instance and the target schema.

31


Data exchange with data metadata translations

Source: Rcd

StockTicker: SetOf Rcd

time

symbol

price

Target: Rcd

Stockquotes: SetOf Rcd

time

symbols

label

value

Data-to-Metadata: Heterogeneous records

Consider this mapping and this source instance:

Source Instance

There are two possible interpretations for the target ndos:

StockTicker (time: 0900, Symbol : MSFT, Price: 27.20 )StockTicker (time: 0900, Symbol : IBM, Price: 120.00 )StockTicker (time: 0905, Symbol : MSFT, Price: 27.30 )

First alternative:

Heterogeneous target records

Computed Target Schema

Computed Target Instance

Target: Rcd

Stockquotes: SetOf Rcd

time

symbols:Choice

MSFT

IBM

Stockquotes (time: 0900, MSFT: 27.20 )Stockquotes (time: 0900, IBM: 120.00 )Stockquotes (time: 0905, MSFT: 27.30 )


Data exchange with data metadata translations

Source: Rcd

StockTicker: SetOf Rcd

time

symbol

price

Target: Rcd

Stockquotes: SetOf Rcd

time

symbols

label

value

Data-to-Metadata: Homogenous records

Consider this mapping and this source instance:

Source Instance

There are two possible interpretations for the target ndos:

StockTicker (time: 0900, Symbol : MSFT Price: 27.20 )StockTicker (time: 0900, Symbol : IBM Price: 120.00 )StockTicker (time: 0905, Symbol : MSFT Price: 27.30 )

Second alternative:

Homogeneous target records

Computed Target Schema

Computed Target Instance

Target: Rcd

Stockquotes: SetOf Rcd

time

MSFT

IBM

Stockquotes (time: 0900, MSFT: 27.20, IBM: null )Stockquotes (time: 0900, MSFT: null , IBM: 120.00 )Stockquotes (time: 0905, MSFT: 27.30, IBM: null )


Data exchange with data metadata translations

Source: Rcd

StockTicker: SetOf Rcd

time

symbol

price

Target: Rcd

Stockquotes: SetOf Rcd

time

symbols

label

value

Data-to-Metadata: Homogenous records

Natural solution for semi-structured data models (XSD, DTD, JSON)

Stockquotes(time: 0900, MSFT : 27.20 )

Stockquotes(time: 0900, IBM : 120.00 )

Stockquotes(time: 0905, MSFT : 27.30 )

Homogeneity Constraint:

“For every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2”

Stockquotes(time: 0900, MSFT : 27.20, IBM: null )Stockquotes(time: 0900, MSFT : null , IBM: 120.00)Stockquotes(time: 0905, MSFT : 27.30, IBM: null )

The Homogenous approach is a MAD improvemnet

34


Data exchange with data metadata translations

MAD Mapping

MetadatA-Data(MAD) mapping three steps:

  • Preliminary mapping

    • How do we map the Source schema to the Target schema

  • Preliminary mapping for <<D>> includes the metadata label and the value label of <<D>>.


Data exchange with data metadata translations

Source.Sales

month country units

Jan USA 120

Jan UK 223

Jan Italy 89

Feb USA 83

Feb UK 168

Feb Italy 56

Target.Sales

month USA UK Italy

Jan 120 223 89

Feb 83 168 56

36

{ $x1  Source.SalesByCountries,

$x2<<countries>>; $x3=$x1.($x2) }

PreliminaryMapping

Source: Rcd

SalesByCountries: SetOf Rcd

month

USA

UK

Italy

Target: Rcd

Sales: SetOf Rcd

month

country

units

Label Value Transfer

countries

label

value


Data exchange with data metadata translations

MAD Mapping

  • Skeletons:

    • n x m matrix of skeletons is constructed for the set of source preliminary mapping and the set of target preliminary mapping while each entry(i,j) can be potential mapping.

  • Creating MAD Mapping:

    • At this stage, the value correspondences need to be matched against the preliminary mapping in order to factor them into the appropriate skeletons.

Source.Sales.country Target.CountrySales.country

Matched against one or more source mappings

Matched against one or more target mappings

37


Data exchange with data metadata translations

MAD Mapping Generation Example

Source : { $x1  Source.SalesByCountry, $x2<<countries>>; $x3:=$x1.($x2) }

Target : { $y1  Target.Sales}

Source.SalesByCountries.<<countries>>  Target.Sales.country

Source.SalesByCountries.&<<countries>>  Target.Sales.units

Source: Rcd

SalesByCountry: SetOf Rcd

month

USA

UK

Italy

Target: Rcd

Sales: SetOf Rcd

month

country

units

countries

label

value


Data exchange with data metadata translations

MAD vs Clio

Data exchange with data-metadata support: Data to Data is a special case

  • New construct to iterate over elements’ labels: placeholder

  • Target schema can be incomplete: nested dynamic output schema (ndos)

GUI

Source schema S

Target schema T

Declarative (internal) representation

  • New mapping & query generation algorithms

Executable code (XSLT, XQuery, Java)


Data exchange with data metadata translations

Fin.

40


  • Login