DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)

Gunter, Cinzia, Vipul, Felix, plus one other who did not come and two others from the Scientific DB who chose to ignore us


Assessment vs. Measurement

  • Measurement

    • More objective

    • From inside the data

    • Uses a metric

  • Assessment

    • May have more subjective parts

    • From the outside

    • Entire process, of which measurement is one part

    • Measurement is part of the assessment-output


IQ-Assessment is difficult

  • IQ-criteria are often of subjective nature

  • Sources do not publish useful IQ-metadata.

  • Sources take measures to hinder IQ-assessment.

  • Enormous amount of data: assessment has to rely on sampling

  • Subject to changes in content and quality
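
The sampling point in the list above can be illustrated with a minimal sketch: when a source is too large to check every value, accuracy can be estimated from a random sample. The function name `estimate_accuracy`, the `is_correct` oracle, the made-up data, and the normal-approximation interval are all assumptions for illustration; none of them comes from the presentation.

```python
import math
import random

def estimate_accuracy(values, is_correct, sample_size, seed=0):
    """Estimate the share of correct values from a simple random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation).
    """
    n = min(sample_size, len(values))
    if n == 0:
        raise ValueError("cannot sample from an empty source")
    rng = random.Random(seed)          # fixed seed keeps the sketch repeatable
    sample = rng.sample(values, n)     # sampling without replacement
    correct = sum(1 for v in sample if is_correct(v))
    p = correct / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, margin

# Hypothetical source: 10,000 values, of which every tenth one is wrong.
values = ["bad" if i % 10 == 0 else "ok" for i in range(10_000)]
estimate, margin = estimate_accuracy(values, lambda v: v == "ok", sample_size=500)
```

Because sources are subject to change, such an estimate would have to be repeated over time; a single sample only describes the source at the moment it was drawn.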


Architectural levels relevant to assessment

  • Sources

  • Wrapper

  • Mediated Schema

  • Mappings

  • Query decomposition

  • Result composition (process)

  • Integrated result at user/app

Assumption of soundness and completeness


The big picture

  • (Diagram; component labels only)

  • Sources: Source 1, Source 2, Source 3

  • Wrappers: Wrapper 1, Wrapper 2, Wrapper 3

  • Data Acquisition / Workflow

  • DQ Assessment (objective)

  • Mediator + DQ vectors

  • DQ Interpretation (also subjective)

    • DQ requirements

    • User profiles

    • DQ feedback

  • Private DQ Interpretation

  • Application 1, User 2

Assessment independence

  • In analogy to data independence

  • Different apps have different interpretations of DQ

  • Separation of

    • application-independent assessment

    • User/App-dependent interpretation
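
A minimal sketch of the separation described above: the assessment is computed once and stored as a DQ vector, and each application interprets that same vector with its own weights. The class `DQVector`, the function `interpret`, the weighted-average scheme, and all numbers are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DQVector:
    """Application-independent assessment result for one source."""
    completeness: float
    accuracy: float
    timeliness: float

def interpret(vector, weights):
    """Application-dependent interpretation: weighted average of the scores."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(getattr(vector, name) * w for name, w in weights.items()) / total

# One assessment, computed once (values are made up).
source_a = DQVector(completeness=0.9, accuracy=0.6, timeliness=0.8)

# Two hypothetical applications weight the same vector differently.
archive_app = {"completeness": 3, "accuracy": 1, "timeliness": 0}
analysis_app = {"completeness": 1, "accuracy": 3, "timeliness": 0}

archive_score = interpret(source_a, archive_app)    # (0.9*3 + 0.6*1) / 4
analysis_score = interpret(source_a, analysis_app)  # (0.9*1 + 0.6*3) / 4
```

The assessment side never sees the weights, which is the analogy to data independence: an application can change its interpretation without triggering a new assessment.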


DQ assessment

  • Data source

    • Past results (Traces)

    • Includes data & metadata

    • The data itself

  • Granularity

    • Subsets / partitions

  • Integration result

    • Mapping

    • Transformation

    • Aggregation

  • Data-oriented (objective in nature?)

  • Offline

  • Defaults

    • We need some

    • Ground estimations

    • Which criteria need them?

      • Size of the world for completeness

    • Refinement
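
The need for defaults can be made concrete with completeness, which cannot be computed without some estimate of the size of the world. The sketch below starts from a ground estimation and allows a later refinement. The table `DEFAULT_WORLD_SIZE`, the domain name, and all numbers are made up for illustration.

```python
# Ground estimations used until something better is known (made-up numbers).
DEFAULT_WORLD_SIZE = {"publications": 1_000_000}

def completeness(item_count, domain, world_size=None):
    """Quotient of items held and real-world items, capped at 1.0.

    Falls back to the default estimate for the domain when no refined
    world size is supplied.
    """
    size = world_size if world_size is not None else DEFAULT_WORLD_SIZE[domain]
    if size <= 0:
        raise ValueError("world size must be positive")
    return min(1.0, item_count / size)

# Bootstrapping: only the default estimate is available.
bootstrap = completeness(250_000, "publications")

# Refinement: a better estimate of the world size replaces the default.
refined = completeness(250_000, "publications", world_size=500_000)
```

The same count of items yields 0.25 under the default and 0.5 after refinement, which shows how strongly the result depends on the assumed world size.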


DQ Interpretation

  • User or Application

    • Feedback

    • User profiles

  • Quality-requirements

    • Hard: Company-specification; ISO

    • Soft: User requirements

  • Query / Usage

    • Quality-requirements

    • Does it select certain subsets of sources?

  • Online

  • User/App-oriented (subjective in nature?)

  • Defaults

    • We definitely need them as a guide and for initialization (bootstrapping)

    • Refinement
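
One possible reading of hard versus soft requirements, sketched under assumptions: hard requirements act as minimum thresholds that exclude a source, while soft requirements only influence the order of the remaining sources. The function `select_sources`, the dictionary layout, and the values are hypothetical.

```python
def select_sources(sources, hard, soft_weights):
    """Drop sources that miss any hard minimum, then rank the rest.

    sources:      {source name: {criterion: value in [0, 1]}}
    hard:         {criterion: minimum value that must be reached}
    soft_weights: {criterion: weight used only for ranking}
    """
    admitted = {
        name: dq
        for name, dq in sources.items()
        if all(dq.get(criterion, 0.0) >= minimum
               for criterion, minimum in hard.items())
    }

    def soft_score(dq):
        return sum(dq.get(c, 0.0) * w for c, w in soft_weights.items())

    return sorted(admitted, key=lambda name: soft_score(admitted[name]),
                  reverse=True)

# Made-up DQ values for three sources.
sources = {
    "source_1": {"accuracy": 0.95, "completeness": 0.40},
    "source_2": {"accuracy": 0.70, "completeness": 0.90},
    "source_3": {"accuracy": 0.92, "completeness": 0.75},
}
hard = {"accuracy": 0.9}        # e.g. a company specification
soft = {"completeness": 1.0}    # e.g. a user preference
ranking = select_sources(sources, hard, soft)
```

Here `source_2` is excluded despite being the most complete, because it misses the hard accuracy minimum; the query then effectively selects a subset of the sources.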


Private DQ Interpretation

  • Performed by the user

  • Results in certain actions

    • Exclude a source

    • Invest more Money/Time

    • Rewrite a query

    • Give up

    • Change parameters

    • Search for new sources


OLD SLIDE: An assessment-oriented classification

  • Subject-criteria

    • believability

    • concise representation

    • interpretability

    • relevancy

    • reputation

    • understandability

    • value-added

  • Object-criteria

    • accuracy

    • completeness

    • customer support

    • documentation

    • objectivity

    • price

    • reliability

    • security

    • timeliness

    • verifiability

  • Process-criteria

    • amount

    • availability

    • consistent representation

    • latency

    • response time


OLD SLIDE: How does it fit? Not really…

  • Subject-criteria (online; user-centric; subjective)

    • believability

    • concise representation

    • interpretability

    • relevancy

    • reputation

    • understandability

    • value-added

  • Object-criteria (offline; data-centric; objective)

    • accuracy

    • completeness

    • customer support

    • documentation

    • objectivity

    • price

    • reliability

    • security

    • timeliness

    • verifiability

  • Process-criteria (offline; data-centric; objective)

    • amount

    • availability

    • consistent representation

    • latency

    • response time


Output of Assessment and Interpretation

  • Numbers (DQ values)

    • in Vectors

    • Single Values

    • Rankings

    • Units / precision

  • Categories

    • Good/bad, etc.

  • Explanations

    • Trace
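
The output forms listed above can be sketched together: a DQ vector per source is reduced to a single value, mapped to a category, ranked, and accompanied by a short trace explaining how the value was obtained. The unweighted mean, the thresholds in `categorize`, and the report layout are assumptions made for illustration.

```python
def categorize(score, good_at=0.8, bad_below=0.5):
    """Map a single DQ value to a coarse category (thresholds are made up)."""
    if score >= good_at:
        return "good"
    if score < bad_below:
        return "bad"
    return "fair"

def summarize(vectors):
    """Turn {source: {criterion: value}} into ranked report rows."""
    report = []
    for name, vector in vectors.items():
        score = sum(vector.values()) / len(vector)   # single value: plain mean
        trace = ", ".join(f"{c}={v:.2f}" for c, v in sorted(vector.items()))
        report.append({
            "source": name,
            "score": round(score, 2),
            "category": categorize(score),
            "trace": f"mean of {trace}",             # explanation of the value
        })
    report.sort(key=lambda row: row["score"], reverse=True)
    for rank, row in enumerate(report, start=1):     # ranking: 1 is best
        row["rank"] = rank
    return report

report = summarize({
    "source_1": {"accuracy": 0.9, "completeness": 0.8},
    "source_2": {"accuracy": 0.4, "completeness": 0.5},
})
```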


Doubts

  • Can everything app-specific be done during interpretation?

  • In other words: Is a single assessment enough?

  • If not: Is a simple parameterization enough?

  • Is MS really doing a job in automatically naming files?


Comparison: GeneDB (not the same as GeneDB.org) vs. DBLP

  • Dimensions of Comparison

    • Input to DQ Assessment

    • DQ Criteria

    • DQ Interpretation

      • Requirements

        • User

        • App


Input to DQ Assessment (Comparison)

  • Willingness and ability of giving input

    • GeneDB: Ability is often lacking

    • E.g. Schema Evolvability

  • Accuracy

    • Noisy data intrinsic to GeneDB

  • Up-to-Dateness

    • Scientific DB: Announcements

    • DBLP: Unknown (less in summer)

  • Trust and Reputation

    • Already here?

  • Completeness

    • Mostly willing

    • Identification of domain hinders assessment

  • Duplicates

    • GeneDB unable to assess (or define)

    • DBLP: Mentions scripts

  • Obvious

    • Ontology

    • Availability

    • Response Time


DQ criteria

  • Criteria for GeneDB

    • Reputation/Trust/Believability

    • Schema evolvability

    • Ontologies

    • Up-to-Dateness (1 week)

    • Lineage

  • Criteria for DBLP

    • Response Time

    • Understandability

    • Completeness

    • Schema and data stability

  • Criteria for neither

    • Up-to-Dateness (1 day, 1 sec)

    • Availability

  • Criteria for both

    • Completeness

    • Accuracy

    • Duplicates


DQ Interpretation (Comparison)

  • GeneDB

    • Trust is important (Oops)

    • Usage as source

    • Usage within a workflow

    • Hard requirements

      • Costs money

      • Costs lives

    • Default DQ requirements

  • DBLP

    • Usage as a tool, not as a source

    • Thus, hardly any DQ requirements

      • Costs rejection of paper

    • Default DQ requirements / assumptions


GeneFlow.com™

High Quality Integrated Genomic Data into your face!


IQ-Criteria

  • Availability: Percentage of time an information source is “up”.

  • Accuracy: Quotient of the number of correct values in the source and the overall number of values in the source.

  • Amount of data: Size of result.

  • Believability: Degree to which the information is accepted as correct.

  • Completeness: Quotient of the number of response items and the number of real world items.

  • Concise representation: Degree to which the structure of the information matches the information itself.

  • Consistent representation: Degree to which the structure of the information conforms to that of other sources.

  • Customer support: Amount and usefulness of online support through text, email, phone etc.

  • Documentation: Amount and usefulness of documents with meta information.

  • Interpretability: Degree to which the information conforms to technical ability of the consumer.

  • Latency: Amount of time until first information reaches user.
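
Two of the definitions above are plain quotients and can be written down directly. In this sketch, availability is approximated from evenly spaced up/down probes, and accuracy is measured against a set of reference values assumed to be correct. Both function names, the probe log, and the reference data are hypothetical.

```python
def availability(probe_results):
    """Share of probes in which the source was up.

    probe_results is a list of booleans from evenly spaced checks, so the
    share of successful probes approximates the share of time "up".
    """
    if not probe_results:
        raise ValueError("at least one probe is required")
    return sum(probe_results) / len(probe_results)

def accuracy(source_values, reference_values):
    """Quotient of correct values and all values in the source.

    A value counts as correct only if the reference holds the same key
    with an equal value.
    """
    if not source_values:
        raise ValueError("the source holds no values")
    correct = sum(
        1 for key, value in source_values.items()
        if key in reference_values and reference_values[key] == value
    )
    return correct / len(source_values)

# Made-up probe log and values.
uptime = availability([True, True, False, True])
share_correct = accuracy(
    {"a": 1, "b": 2, "c": 3, "d": 4},
    {"a": 1, "b": 2, "c": 0, "d": 4},
)
```

The reference set is the weak point in practice: accuracy measured this way is only as trustworthy as the values it is compared against.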


IQ-Criteria (continued)

  • Objectivity: Degree to which information is unbiased and impartial.

  • Price: Monetary charge per query.

  • Relevancy: Degree to which information satisfies the user's need.

  • Reliability: Degree to which the user can trust the information.

  • Reputation: Degree to which the information or its source is in high standing.

  • Response time: Amount of time until complete response reaches the user.

  • Security: Degree to which information is passed privately from user to information source and back.

  • Timeliness: Age of information.

  • Understandability: Degree to which the information can be comprehended by the user.

  • Value-added: Amount of benefit the use of the information provides.

  • Verifiability: Degree and ease with which the information can be checked for correctness.
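
Latency and response time, as defined in these criteria lists, differ only in when the clock stops: at the first item or at the last. A minimal sketch that measures both while consuming a result stream follows; `time_response` and the simulated `slow_source` are hypothetical names, and the sleep merely stands in for a real source.

```python
import time

def time_response(items):
    """Consume an iterable and time it.

    Returns (latency, response_time, results) in seconds, where latency is
    the time until the first item arrives (None if there are no items) and
    response_time is the time until the stream is exhausted.
    """
    start = time.perf_counter()
    latency = None
    results = []
    for item in items:
        if latency is None:
            latency = time.perf_counter() - start
        results.append(item)
    response_time = time.perf_counter() - start
    return latency, response_time, results

def slow_source():
    """Simulated source that delivers three items with a small delay each."""
    for i in range(3):
        time.sleep(0.01)
        yield i

latency, response_time, results = time_response(slow_source())
```

Both values are process criteria: they describe the act of querying rather than the data itself, so they can only be measured while a query is running.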

