Accommodating Data Heterogeneity in ULS Systems

Christopher Scaffidi

Mary Shaw

Carnegie Mellon University


Problem: Data heterogeneity among software elements in ULS systems

  • Software elements:

    • Created by autonomous stakeholders

    • Differing data formats

    • May switch to new formats without prior notice

  • End-user programmers:

    • Create particularly unreliable software elements

    • “Mash up” (integrate) software elements

problem approach  proof-of-concept


Example: Exchanging person names

Similar issues for data from users, external datasets, or the web.

"John Smith" (today)

"Smith, John" (tomorrow): unexpected format! Unanticipated need for “glue code” to reformat.

"Lincolnshire MCC" (tomorrow): questionable! Need to validate the data, maybe trigger fail-over.
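
The kind of "glue code" this slide alludes to might look like the following minimal sketch. The two regular expressions and the function name `to_first_last` are assumptions made for illustration; the slides do not define how a person name is recognized, and real names are far more varied than these patterns allow.

```python
import re

# Hypothetical patterns for illustration only; not taken from the slides.
FIRST_LAST = re.compile(r"^([A-Z][a-z]+) ([A-Z][a-z]+)$")   # e.g. John Smith
LAST_FIRST = re.compile(r"^([A-Z][a-z]+), ([A-Z][a-z]+)$")  # e.g. Smith, John


def to_first_last(name: str) -> str:
    """Return the name as 'First Last'; raise ValueError if it looks questionable."""
    if FIRST_LAST.match(name):
        return name  # already in the expected format
    match = LAST_FIRST.match(name)
    if match:
        last, first = match.groups()
        return f"{first} {last}"  # reformat the unexpected format
    # Neither format matched: the caller can reject the value or fail over.
    raise ValueError(f"questionable person name: {name!r}")
```

With these assumed patterns, "Smith, John" is reformatted to "John Smith", while "Lincolnshire MCC" raises an error that a caller could use to trigger fail-over.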

problem approach  proof-of-concept


Other examples of data format heterogeneity

  • Room Numbers

    • NSH 3103 vs Newell Simon Hall 3103

  • Stocks

    • GOOG vs Google vs Google Corporation

  • Address Lines

    • 101 Main St. vs 101 MAIN STREET vs 101 Main Str.

  • Phone Numbers

    • 888-800-2030 vs +1 888 800 2030 vs (888) 800-2030

  • State Names

    • California vs CA vs Calif.
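
As one illustration of the list above, the three phone number formats can be treated as the same value by reducing each to its digits and then rendering the digits in whichever format is wanted. This is a sketch that assumes ten-digit North American numbers with an optional leading country code 1; the function names and the `STYLES` table are hypothetical.

```python
import re

# Hypothetical output styles, named here for illustration.
STYLES = {
    "dashes": "{0}-{1}-{2}",
    "international": "+1 {0} {1} {2}",
    "parentheses": "({0}) {1}-{2}",
}


def phone_digits(text: str) -> str:
    """Return the ten national digits, or raise ValueError if there are not ten."""
    digits = re.sub(r"\D", "", text)  # drop everything that is not a digit
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        raise ValueError(f"not a recognizable phone number: {text!r}")
    return digits


def format_phone(text: str, style: str) -> str:
    """Reformat a phone number written in any of the accepted formats."""
    d = phone_digits(text)
    return STYLES[style].format(d[:3], d[3:6], d[6:])
```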

problem approach  proof-of-concept


Insight: Exchange kinds of data (rather than particular formats)

John Smith
303-202-3030
101 Main St.
Pittsburgh, PA

RAY TILL
(404) 555-1203
2 PITT ST
PGH, Penna.

Doe, Jane
+1 717 292 3030
88 Brooke Lane
PITTSBURGH
Pennsylvania

MR. ART COR
282.303.4040
15 RED RUN RD.
pittsburgh PA

JOHN SMITH
(303) 202-3030
101 MAIN ST
Pittsburgh, PA

problem  approach  proof-of-concept


Insight: Exchange kinds of data (rather than particular formats)

  • Needed: Metadata indicating a reusable abstraction for validating and reformatting each kind of string-like data.

    • “I am sending you a string that I call a ‘phone number’, and here’s the code to validate it and reformat it”
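
One way to picture such a reusable abstraction is a small object that bundles a name for the kind of data with code to validate and reformat it. The sketch below is an assumption about shape only: the `Tope` class, its fields, and the tiny state table are invented for illustration and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Tope:
    """A named kind of data bundled with code to validate and reformat it."""
    name: str
    parse: Callable[[str], str]  # accepted format -> canonical value; raises ValueError
    formatters: Dict[str, Callable[[str], str]]  # format name -> renderer

    def is_valid(self, text: str) -> bool:
        try:
            self.parse(text)
            return True
        except ValueError:
            return False

    def reformat(self, text: str, target: str) -> str:
        return self.formatters[target](self.parse(text))


# Illustrative data: canonical code -> (full name, traditional abbreviation).
_STATES = {"CA": ("California", "Calif."), "PA": ("Pennsylvania", "Penna.")}


def _parse_state(text: str) -> str:
    key = text.strip().lower()
    for code, (full, abbrev) in _STATES.items():
        if key in (code.lower(), full.lower(), abbrev.lower()):
            return code
    raise ValueError(f"not a known state name: {text!r}")


STATE = Tope(
    name="state name",
    parse=_parse_state,
    formatters={
        "code": lambda code: code,
        "full": lambda code: _STATES[code][0],
        "abbrev": lambda code: _STATES[code][1],
    },
)
```

A sender could then ship the string "Calif." together with a reference to `STATE`, and a receiver could call `STATE.reformat("Calif.", "code")` to obtain the format it prefers without either side agreeing on one format in advance.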

problem  approach  proof-of-concept


Proof of concept: Exchanging XML and HTML

  • Data providers label XML/HTML nodes with a “tope”

    • “This node is what I call a ‘phone number’, and here’s where you can find code to validate and reformat it.”

  • Each tope’s implementation is stored at a published URL

  • On receiving data, a system

    • Downloads the tope implementation

    • Executes it to validate and put data into desired format

problem  approach  proof-of-concept


Sample code

XML

    <!-- topesheet = http://softwaresurvey.cs.cmu.edu/topes.txt -->
    <mydoc><whatever>
      <tel>233-222-3040</tel><date>11-Jan-96</date>
      <tel>(203)484-2030</tel><date>12/30/2007</date>
    </whatever></mydoc>

TopeSheet

    xpath:/mydoc/whatever/date{tope:url(http://www.w3c.org/topes/date_EN.xml);}
    xpath:/mydoc/whatever/tel{tope:url(http://myserver.com/custom_tel.xml);}

Client Code

    ItemLoader loader = ItemLoader.FromXml(xml);
    ItemSet items = loader.Load("xpath:/*/tel");
    List<String> values = items.FormatAs("+1 404 505 6060");
    // overloaded methods let you override the topes and/or validate the data
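
To make the TopeSheet idea concrete, here is a rough Python sketch of how a receiver might read rules in the syntax shown above and attach a tope URL to each matching node. It is not the `ItemLoader` API from the slide. The regular expression is inferred from the two sample rules only, the function names are hypothetical, and because Python's `xml.etree.ElementTree` supports only a limited XPath subset, the sketch handles just simple absolute paths such as `/mydoc/whatever/tel`.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Rule syntax inferred from the two sample lines; the real grammar may differ.
RULE = re.compile(r"xpath:(?P<path>[^{\s]+)\s*\{\s*tope:url\((?P<url>[^)]+)\);\s*\}")


def parse_topesheet(text: str) -> Dict[str, str]:
    """Map each path in the TopeSheet to the URL of its tope."""
    return {m.group("path"): m.group("url") for m in RULE.finditer(text)}


def label_nodes(xml_text: str, sheet_text: str) -> List[Tuple[str, str]]:
    """Return (node text, tope URL) pairs for every node a rule selects."""
    root = ET.fromstring(xml_text)
    labeled = []
    for path, url in parse_topesheet(sheet_text).items():
        # ElementTree paths are relative to the root element, so split it off.
        first, _, rest = path.strip("/").partition("/")
        if first != root.tag:
            continue
        nodes = root.findall("./" + rest) if rest else [root]
        labeled.extend((node.text or "", url) for node in nodes)
    return labeled


SAMPLE_XML = (
    "<mydoc><whatever>"
    "<tel>233-222-3040</tel><date>11-Jan-96</date>"
    "<tel>(203)484-2030</tel><date>12/30/2007</date>"
    "</whatever></mydoc>"
)
SAMPLE_SHEET = """
xpath:/mydoc/whatever/date{tope:url(http://www.w3c.org/topes/date_EN.xml);}
xpath:/mydoc/whatever/tel{tope:url(http://myserver.com/custom_tel.xml);}
"""
```

Calling `label_nodes(SAMPLE_XML, SAMPLE_SHEET)` pairs each date and telephone string with the URL where its tope is published; validating and reformatting would then be delegated to whatever implementation is found at that URL.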

problem  approach  proof-of-concept


Benefits of labeling strings with topes

  • Systems can detect invalid inputs

  • Software elements can use varying formats

    • No explicit references to format identifiers

    • No need for ontology consensus

  • Topes are reusable for data in…

    • XML nodes

    • HTML tags

    • Spreadsheet cells

    • Database tuples

    • Webform fields

    • …and more

problem  approach  proof-of-concept


Thank You…

  • To Jeff Magee, Betty Cheng, Barbara Ryder, Margaret Burnett, and others at ICSE 2007 for early feedback

  • To NSF for funding

  • To ULSSIS for this opportunity to participate

