Query-driven Data Completeness Management

Simon Razniewski

Supervised by Werner Nutt

Area: Data Quality/Decision Support

  • Data Quality research investigates how good data is

  • Dimensions of Data Quality include:

    • Correctness

    • Timeliness

    • Completeness

Example Scenario: School Data Management in South Tyrol

  • Central school database: notoriously incomplete

  • Statistical reports: completeness important

Example: Final Grades

  • Vocational schools enter final grades, many others don't

  • Query: How many pupils in class 12 have 'A' in Math?

  • Answer: 3400

  • Can we trust this?

    • Not necessarily: pupils from high schools could be missing from the result


Example: Final Grades (2)

  • Vocational schools enter final grades, many others don't

  • Query: How many pupils at vocational schools in class 12 have 'A' in Math?

  • Answer: 1700

  • Can we trust this?

    • Yes: all grades from vocational schools are in the database
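The contrast between the two Final Grades examples can be made concrete with a small sketch. It assumes, for illustration only, that an incomplete database is modelled as a pair: an ideal database describing the real world and an available database holding what was actually entered. The toy pupils, school names and the school type value "high" are hypothetical, and the sketch simplifies the scenario to "only vocational schools enter grades"; `count_a_in_math` is a hypothetical helper.

```python
# Hypothetical toy data. Pupils: (name, class, school_name, school_type);
# grades: (name, subject, value), following the schema used later in the slides.
IDEAL_PUPILS = {
    ("anna", 12, "school_v1", "vocational"),
    ("ben", 12, "school_v1", "vocational"),
    ("carla", 12, "school_h1", "high"),
    ("dario", 12, "school_h1", "high"),
}
IDEAL_GRADES = {
    ("anna", "Math", "A"),
    ("ben", "Math", "B"),
    ("carla", "Math", "A"),
    ("dario", "Math", "A"),
}

# Assumption: every pupil is recorded, but only vocational schools enter final grades.
VOCATIONAL = {name for (name, _c, _sn, st) in IDEAL_PUPILS if st == "vocational"}
AVAILABLE_PUPILS = set(IDEAL_PUPILS)
AVAILABLE_GRADES = {g for g in IDEAL_GRADES if g[0] in VOCATIONAL}


def count_a_in_math(pupils, grades, school_type=None):
    """Count class-12 pupils with 'A' in Math, optionally for one school type only."""
    names = {
        name for (name, cls, _sn, st) in pupils
        if cls == 12 and (school_type is None or st == school_type)
    }
    return len({n for (n, subj, val) in grades if subj == "Math" and val == "A" and n in names})


for school_type in (None, "vocational"):
    ideal = count_a_in_math(IDEAL_PUPILS, IDEAL_GRADES, school_type)
    available = count_a_in_math(AVAILABLE_PUPILS, AVAILABLE_GRADES, school_type)
    print(school_type or "all schools", "ideal:", ideal, "available:", available)
```

On this toy data the all-schools count over the available database (1) falls short of the real-world count (3), while the vocational-school count agrees (1 and 1), mirroring why the first answer cannot be trusted and the second can.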


Research Questions

  • How can completeness information be stored in a database?

  • How can one find out whether query answers are complete (and correct)?

  • Where can completeness information come from?

Where Else is Incompleteness a Problem?

  • … when many users contribute to a database

    OpenStreetMap, Wikipedia (?)

  • … when data submission is optional

    Data from surveys

  • … when data from different sources is integrated

    Biological databases

  • … when the real world changes without informing the database


Abstract Problem

A database in which data may be missing from some parts (grey)

A query to the database

  • Can we trust the query answer?

  • Does the query only touch the complete (green) parts of the database?

  • If not, where does it touch grey parts?

  • How could we modify the query to touch only green parts?
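The four questions above can be illustrated with a deliberately coarse sketch. It assumes, purely for illustration, that the database is split into named parts (the part names are hypothetical) and that a query is described only by the set of parts it reads; the approach in these slides instead reasons over queries and completeness statements. `check_query` and `restrict_to_green` are hypothetical helpers.

```python
# Hypothetical part names; "green" parts are those known to be complete.
COMPLETE_PARTS = frozenset({"grades/vocational", "pupils/all"})


def check_query(touched_parts):
    """Return (trustworthy, grey_parts) for a query that reads touched_parts."""
    grey = set(touched_parts) - COMPLETE_PARTS
    return not grey, grey


def restrict_to_green(touched_parts):
    """Naive repair: keep only the parts of the query that read complete data."""
    return set(touched_parts) & COMPLETE_PARTS


query_parts = {"grades/vocational", "grades/high_school", "pupils/all"}
trusted, grey = check_query(query_parts)
print(trusted, sorted(grey))                   # False ['grades/high_school']
print(sorted(restrict_to_green(query_parts)))  # ['grades/vocational', 'pupils/all']
```

Restricting a query to green parts changes the question being asked, as in the move from the first Final Grades query to the one about vocational schools only.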

This data is complete

I query this part of the database. Can I trust the answer?

You cannot, because…

But you could…

Derivation of statements from business process analysis

Reasoning procedures

Implementation techniques

Approach: Describe Complete Parts by Queries

pupil(name, class, school_name, school_type)

grade(name, subject, value)

  • Complete: All grades of pupils from vocational schools:

    QCgrades(n,s,v) :- grade(n,s,v), pupil(n,c,sn,'vocational')

  • Complete: All pupils

    QCpupils(n,c,sn,st) :- pupil(n,c,sn,st)
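Reading a completeness statement as a query means that evaluating it over the available data returns exactly the tuples that fall inside the part declared complete. The sketch below encodes QCgrades and QCpupils as conjunctive queries and evaluates them with a naive nested-loop join. The `Var` class, the `evaluate` and `match` helpers, and the toy database are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A query variable; any other term is treated as a constant."""
    name: str


def match(terms, row, binding):
    """Extend binding so that terms match row; return None on a clash."""
    new = dict(binding)
    for term, value in zip(terms, row):
        if isinstance(term, Var):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(head, body, db):
    """Naively evaluate head :- body over db, a dict from relation name to a set of tuples."""
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[t] if isinstance(t, Var) else t for t in head))
            return
        rel, terms = body[i]
        for row in db.get(rel, ()):
            new = match(terms, row, binding)
            if new is not None:
                extend(i + 1, new)

    extend(0, {})
    return results


n, s, v, c, sn, st = (Var(x) for x in ("n", "s", "v", "c", "sn", "st"))

# QCgrades(n,s,v) :- grade(n,s,v), pupil(n,c,sn,'vocational')
QC_GRADES = ((n, s, v), [("grade", (n, s, v)), ("pupil", (n, c, sn, "vocational"))])
# QCpupils(n,c,sn,st) :- pupil(n,c,sn,st)
QC_PUPILS = ((n, c, sn, st), [("pupil", (n, c, sn, st))])

# Hypothetical toy database following the slide's schema.
DB = {
    "pupil": {("anna", 12, "school_v1", "vocational"), ("carla", 12, "school_h1", "high")},
    "grade": {("anna", "Math", "A"), ("anna", "German", "B"), ("carla", "Math", "A")},
}

print(sorted(evaluate(*QC_GRADES, DB)))  # [('anna', 'German', 'B'), ('anna', 'Math', 'A')]
```

Carla's grade is stored, but it lies outside the part that QCgrades declares complete, so an answer that depends on it is not covered by the statement.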

Approach: Compare Queries with Complete Queries

Query: All pupils in class 12 with 'A' in Math

Qgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st)

QCgoodmath(n) :- grade(n,'Math','A'), pupil(n,12,sn,st)


Reasoning: Are Qgoodmath and QCgoodmath equivalent?
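The equivalence check can be sketched with the classical canonical-database test for conjunctive query containment (Chandra and Merlin): Q ⊆ Q' holds iff evaluating Q' over Q's body, with Q's variables frozen into constants, returns Q's frozen head. Beyond what the slide states, the sketch assumes that the complete query Q_C is built by adding to Q, for each atom, the condition of the completeness statement for that atom's relation (at most one statement per relation, statement-local variables renamed apart). Because this Q_C only adds atoms, Q_C ⊆ Q always holds, so equivalence reduces to Q ⊆ Q_C. `Var`, `Frozen`, `complete_version`, `is_complete` and the vocational-school variant of the query are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Frozen:
    """A query variable turned into a constant for the canonical database."""
    name: str


def match(terms, row, binding):
    new = dict(binding)
    for term, value in zip(terms, row):
        if isinstance(term, Var):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(head, body, db):
    results = set()

    def extend(i, binding):
        if i == len(body):
            results.add(tuple(binding[t] if isinstance(t, Var) else t for t in head))
            return
        rel, terms = body[i]
        for row in db.get(rel, ()):
            new = match(terms, row, binding)
            if new is not None:
                extend(i + 1, new)

    extend(0, {})
    return results


def complete_version(query, statements):
    """Assumed construction of Q_C: Q's body plus each atom's statement condition."""
    head, body = query
    extra = []
    for i, (rel, terms) in enumerate(body):
        if rel not in statements:
            return None  # no statement for this relation: completeness cannot be derived
        stmt_head, condition = statements[rel]
        subst = dict(zip(stmt_head, terms))  # statement head variables -> query terms
        for cond_rel, cond_terms in condition:
            # Non-head statement variables get fresh names per atom (assumed not to clash).
            renamed = tuple(
                subst.get(t, Var(f"_{t.name}{i}")) if isinstance(t, Var) else t
                for t in cond_terms
            )
            extra.append((cond_rel, renamed))
    return head, list(body) + extra


def is_complete(query, statements):
    """Under this sketch's assumptions, Q is complete iff Q is contained in Q_C."""
    qc = complete_version(query, statements)
    if qc is None:
        return False
    head, body = query

    def freeze(term):
        return Frozen(term.name) if isinstance(term, Var) else term

    canonical_db = {}
    for rel, terms in body:
        canonical_db.setdefault(rel, set()).add(tuple(freeze(t) for t in terms))
    return tuple(freeze(t) for t in head) in evaluate(*qc, canonical_db)


n, s, v, c, sn, st = (Var(x) for x in ("n", "s", "v", "c", "sn", "st"))
STATEMENTS = {
    "grade": ((n, s, v), [("pupil", (n, c, sn, "vocational"))]),
    "pupil": ((n, c, sn, st), []),
}
Q_GOODMATH = ((n,), [("grade", (n, "Math", "A")), ("pupil", (n, 12, sn, st))])
Q_GOODMATH_VOC = ((n,), [("grade", (n, "Math", "A")), ("pupil", (n, 12, sn, "vocational"))])

print(is_complete(Q_GOODMATH, STATEMENTS))      # False
print(is_complete(Q_GOODMATH_VOC, STATEMENTS))  # True
```

Under these assumptions Qgoodmath is not complete, because grades are only guaranteed for pupils of vocational schools, whereas restricting the pupil atom to 'vocational', as in the second Final Grades query, makes the query complete.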

Done so Far

  • Problem 1: Statements about database completeness and query completeness [LID2011]

  • Problem 2: Reasoning tasks + algorithms + complexity

  • Problem 3: Map reasoning tasks to satisfiability modulo theories (SMT)



  • Identify core problems and doable steps

    • Schema constraints

    • Nulls

    • Business Processes

    • Probabilistic Reasoning

    • XML databases



  • Checking Query Completeness over Incomplete Databases, Simon Razniewski and Werner Nutt, Workshop on Logic in Databases, 2011

  • Completeness of Queries over Incomplete Databases, Simon Razniewski and Werner Nutt, International Conference on Very Large Data Bases, 2011

  • Submitted: Incomplete Databases: Missing Records and Missing Values, Werner Nutt, Simon Razniewski and Gil Vegliach, Workshop on Data Quality in Data Integration Systems, 2012


Thank you.

Related Work

  • Motro 1989: Introduced important concepts for describing database completeness

  • Levy 1996: Introduced a central reasoning problem about data completeness

  • Fan and Geerts 2009: Studied a related data completeness problem for master data
