Query-driven Data Completeness Management




Simon Razniewski

Supervised by Werner Nutt



Area: Data Quality/Decision Support

  • Data Quality research investigates how good data is

  • Dimensions of Data Quality are:

    • Correctness

    • Timeliness

    • Completeness


Example Scenario: School Data Management in South Tyrol

[Diagram: Central school database (notoriously incomplete) → ?? → Statistical reports (completeness important)]


Example: Final Grades

  • Vocational schools enter final grades, many others don't

  • Query: How many pupils in class 12 have 'A' in Math?

  • Answer: 3400

  • Can we trust this? No!

    • Pupils from high schools could be missing in the result


Example: Final Grades (2)

  • Vocational schools enter final grades, many others don't

  • Query: How many pupils at vocational schools in class 12 have 'A' in Math?

  • Answer: 1700

  • Can we trust this? Yes!

    • All grades from vocational schools are in the database


Research Questions

  • How can completeness information be stored in a database?

  • How can one find out whether query answers are complete (and correct)?

  • Where can completeness information come from?


Where Else is Incompleteness a Problem?

  • … when many users contribute to a database

    OpenStreetMap, Wikipedia (?)

  • … when data submission is optional

    Data from surveys

  • … when data from different sources is integrated

    Biological databases

  • … when the real world changes without informing the database

    Address data


Abstract Problem

A database in which data may be missing in some parts (grey)

A query to the database

  • Can we trust the query answer?

  • Does the query only touch the complete (green) parts of the database?

  • If not, where does it touch grey parts?

  • How could we modify the query to touch only green parts?


Approach

[Diagram: a database with speech bubbles around it]

  • Completeness statements: "This data is complete", "…and this", "And this!"

  • Question: "I query this part of the database. Can I trust the answer?"

  • Replies: "You cannot, because…", "But you could.."

1. Formalism for statements about completeness

2. Reasoning procedures

3. Implementation techniques

4. Derivation of statements from business process analysis


Approach: Describe Complete Parts by Queries

pupil(name, class, school_name, school_type)

grade(name, subject, value)

  • Complete: All grades of pupils from vocational schools:

    QCgrades(n,s,v) :- grade(n,s,v), pupil(n,c,sn,'vocational')

  • Complete: All pupils

    QCpupils(n,c,sn,st) :- pupil(n,c,sn,st)
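
The two statements can be tried out on a toy dataset. The sketch below is not from the presentation. It assumes one common reading of such a statement: every record of the ideal (real-world) data that the statement's query selects must also be present in the available data. All names and rows are invented for illustration, and the helper functions `qc_grades`, `statement_holds` and `good_math` are hypothetical.

```python
# Toy data, invented for illustration. Tuples follow the slide's schema:
#   pupil(name, class, school_name, school_type)
#   grade(name, subject, value)
pupil = {  # assumed complete, as stated by QCpupils
    ("Anna", 12, "School A", "vocational"),
    ("Ben", 12, "School B", "high school"),
}
ideal_grade = {  # what holds in the real world
    ("Anna", "Math", "A"),
    ("Ben", "Math", "A"),
}
available_grade = {  # what was actually entered into the database
    ("Anna", "Math", "A"),
}


def qc_grades(grade_rows, pupil_rows):
    """QCgrades(n,s,v) :- grade(n,s,v), pupil(n,c,sn,'vocational')."""
    vocational = {n for (n, _c, _sn, st) in pupil_rows if st == "vocational"}
    return {(n, s, v) for (n, s, v) in grade_rows if n in vocational}


def statement_holds(ideal_rows, available_rows, pupil_rows):
    """Assumed reading: every ideal row selected by QCgrades is also available."""
    return qc_grades(ideal_rows, pupil_rows) <= available_rows


def good_math(grade_rows, pupil_rows, school_type=None):
    """Names of class-12 pupils with 'A' in Math, optionally for one school type."""
    in_class = {
        n
        for (n, c, _sn, st) in pupil_rows
        if c == 12 and (school_type is None or st == school_type)
    }
    return {n for (n, s, v) in grade_rows if s == "Math" and v == "A" and n in in_class}


print(statement_holds(ideal_grade, available_grade, pupil))
print(good_math(available_grade, pupil, "vocational"))
print(good_math(available_grade, pupil))
```

On this data the statement holds, the vocational-only query returns the same answer over the available and the ideal grades, and the unrestricted query misses a high-school pupil.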


Approach: Compare Queries with Complete Queries

Query: All pupils in class 12 with 'A' in Math

Qgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st)

QCgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st), pupil(n,c,sn,'vocational')

Reasoning: Are Qgoodmath and QCgoodmath equivalent?
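
One way to decide such an equivalence for conjunctive queries is the classical homomorphism test: Q1 is contained in Q2 if Q2 can be mapped onto the "frozen" body of Q1 with the heads matching. The sketch below is a minimal version of that textbook test under set semantics. It is an assumption that this matches the spirit of the reasoning meant here, and the procedures in the cited papers may differ. The encoding (variables as strings starting with `?`) and the query `q_voc`, a formalization of the vocational-schools example, are made up for illustration.

```python
def is_var(term):
    """Variables are strings starting with '?'; everything else is a constant."""
    return isinstance(term, str) and term.startswith("?")


def freeze(term):
    """Turn a variable into a fresh constant so a query body becomes a tiny database."""
    return ("frozen", term) if is_var(term) else term


def bind(pattern, values, subst):
    """Extend subst so that pattern matches values, or return None on a clash."""
    if len(pattern) != len(values):
        return None
    new = dict(subst)
    for p, v in zip(pattern, values):
        if is_var(p):
            if new.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return new


def has_match(atoms, facts, subst):
    """True if all atoms can be mapped onto facts consistently with subst."""
    if not atoms:
        return True
    (rel, terms), rest = atoms[0], atoms[1:]
    for fact_rel, fact_terms in facts:
        if fact_rel == rel:
            new = bind(terms, fact_terms, subst)
            if new is not None and has_match(rest, facts, new):
                return True
    return False


def contained_in(q1, q2):
    """q1 is contained in q2 iff q2 maps onto the frozen q1, head to head."""
    (head1, body1), (head2, body2) = q1, q2
    facts = [(rel, tuple(freeze(t) for t in terms)) for rel, terms in body1]
    start = bind(head2, tuple(freeze(t) for t in head1), {})
    return start is not None and has_match(body2, facts, start)


def equivalent(q1, q2):
    return contained_in(q1, q2) and contained_in(q2, q1)


# Queries as (head, body); the extra atom VOC comes from the completeness statement.
GRADE_A = ("grade", ("?n", "Math", "A"))
VOC = ("pupil", ("?n", "?c", "?sn", "vocational"))

q_goodmath = (("?n",), [GRADE_A, ("pupil", ("?n", 12, "?sn", "?st"))])
qc_goodmath = (("?n",), q_goodmath[1] + [VOC])
q_voc = (("?n",), [GRADE_A, ("pupil", ("?n", 12, "?sn", "vocational"))])
qc_voc = (("?n",), q_voc[1] + [VOC])

print(equivalent(q_goodmath, qc_goodmath))  # False
print(equivalent(q_voc, qc_voc))            # True
```

The results agree with the two Final Grades examples: the unrestricted query is not equivalent to its complete version, while the query restricted to vocational schools is.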


Done so Far

  • Problem 1: Statements about database completeness and query completeness [LID2011]

  • Problem 2: Reasoning tasks + algorithms + complexity [VLDB2011]

  • Problem 3: Map reasoning tasks to satisfiability modulo theories (SMT)


Challenge

  • Identify core problems and doable steps

    • Schema constraints

    • Nulls

    • Business Processes

    • Probabilistic Reasoning

    • XML-databases


Publications

  • Checking Query Completeness over Incomplete Databases, Simon Razniewski and Werner Nutt, Workshop on Logic in Databases, 2011

  • Completeness of Queries over Incomplete Databases, Simon Razniewski and Werner Nutt, International Conference on Very Large Databases, 2011

  • Submitted: Incomplete Databases: Missing Records and Missing Values, Werner Nutt, Simon Razniewski and Gil Vegliach, Workshop on Data Quality in Data Integration Systems, 2012


Thank you.


Related Work

  • Motro 1989: Introduced important concepts for describing database completeness

  • Levy 1996: Introduced a central reasoning problem about data completeness

  • Fan, Geerts 2009: Worked on a similar problem of data completeness for master data
