DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)

Gunter, Cinzia, Vipul, Felix, one other who did not come, and two others from the Scientific DB who chose to ignore us


Assessment vs. Measurement

  • Measurement

    • More objective

    • From inside the data

    • Uses a metric

  • Assessment

    • May have more subjective parts

    • From the outside

    • Entire process, of which measurement is one part

    • Measurement is part of the assessment-output


IQ-Assessment is difficult

  • IQ-criteria are often subjective in nature.

  • Sources do not publish useful IQ-metadata.

  • Sources take measures to hinder IQ-assessment.

  • Enormous amount of data: assessment requires sampling.

  • Sources are subject to changes in content and quality.
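
The sampling point can be illustrated with a minimal sketch. It assumes a correctness check is available for individual records, which the slide does not claim; `estimate_accuracy`, `is_correct`, and the toy data are hypothetical names chosen for illustration only.

```python
import random

def estimate_accuracy(records, is_correct, sample_size, seed=0):
    """Estimate the share of correct records from a uniform random sample."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    if sample_size >= len(records):
        sample = list(records)  # small source: check every record
    else:
        sample = rng.sample(records, sample_size)
    if not sample:
        raise ValueError("cannot estimate the quality of an empty source")
    return sum(1 for r in sample if is_correct(r)) / len(sample)

# Hypothetical source in which every tenth record is wrong.
records = list(range(1000))
estimate = estimate_accuracy(records, lambda r: r % 10 != 0, sample_size=100)
print(f"estimated accuracy: {estimate:.2f}")
```

Because sources change in content and quality, such an estimate would have to be recomputed periodically rather than cached indefinitely.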


Architectural levels relevant to assessment

  • Sources

  • Wrapper

  • Mediated Schema

  • Mappings

  • Query decomposition

  • Result composition (process)

  • Integrated result at user/app

Assumption of soundness and completeness


The big picture

[Architecture diagram; only its labels survive in this transcript.]

  • Application 1, User 2

  • User profiles, DQ requirements

  • Private DQ Interpretation

  • DQ Interpretation (also subjective)

  • Mediator + DQ vectors, DQ feedback

  • Data Acquisition / Workflow

  • DQ Assessment (objective)

  • Wrapper 1, Wrapper 2, Wrapper 3

  • Source 1, Source 2, Source 3


Assessment independence

  • In analogy to data independence

  • Different apps have different interpretations of DQ

  • Separation of

    • application-independent assessment

    • User/App-dependent interpretation
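
A minimal sketch of this separation, under assumptions not made on the slide: the assessment yields one vector of DQ values per source, and each consumer interprets it with its own weights. The source names, values, weights, and the weighted-average rule are all hypothetical.

```python
# Application-independent assessment: one vector of DQ values per source.
# Source names and values are made up for illustration.
ASSESSED = {
    "source_1": {"completeness": 0.9, "accuracy": 0.6},
    "source_2": {"completeness": 0.5, "accuracy": 0.95},
}

def interpret(dq_vector, weights):
    """User/app-dependent interpretation: weighted average of the DQ values."""
    total = sum(weights.values())
    return sum(dq_vector[c] * w for c, w in weights.items()) / total

def best_source(weights):
    """Pick the source that scores highest under the given weights."""
    return max(ASSESSED, key=lambda name: interpret(ASSESSED[name], weights))

# Two hypothetical consumers read the same assessment differently.
curation_app = {"completeness": 1.0, "accuracy": 4.0}
survey_user = {"completeness": 4.0, "accuracy": 1.0}

print(best_source(curation_app))
print(best_source(survey_user))
```

The assessment data is computed once; only `interpret` and its weights change per consumer, which is the analogy to data independence.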


DQ assessment

  • Data source

    • Past results (Traces)

    • Includes data & metadata

  • The data itself

    • Granularity

    • Subsets / partitions

  • Integration result

    • Mapping

    • Transformation

    • Aggregation

  • Data-oriented (objective in nature?)

  • Offline

  • Defaults

    • We need some ground estimations

    • Which criteria need them?

      • Size of the world for completeness

    • Refinement


DQ Interpretation

  • User or Application

    • Feedback

    • User profiles

  • Quality-requirements

    • Hard: Company-specification; ISO

    • Soft: User requirements

  • Query / Usage

    • Quality-requirements

    • Does it select certain subsets of sources?

  • Online

  • User/App-oriented (subjective in nature?)

  • Defaults

    • We definitely need them as a guide and for initialization (bootstrapping)

    • Refinement
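
One possible reading of hard versus soft requirements, sketched under assumptions: hard requirements act as minimum thresholds that exclude a source, soft requirements only influence the ranking of the remaining sources. The thresholds, source names, and the function `select_sources` are hypothetical.

```python
def meets(dq_vector, minimums):
    """True if every required criterion reaches its minimum value."""
    return all(dq_vector.get(c, 0.0) >= m for c, m in minimums.items())

def select_sources(assessed, hard, soft):
    """Drop sources that violate a hard requirement, then rank the rest
    by the number of soft requirements they satisfy (ties broken by name)."""
    admissible = [name for name, vec in assessed.items() if meets(vec, hard)]

    def soft_score(name):
        return sum(1 for c, m in soft.items() if assessed[name].get(c, 0.0) >= m)

    return sorted(admissible, key=lambda name: (-soft_score(name), name))

# Made-up assessment results and requirements.
assessed = {
    "source_1": {"accuracy": 0.99, "completeness": 0.4},
    "source_2": {"accuracy": 0.97, "completeness": 0.8},
    "source_3": {"accuracy": 0.70, "completeness": 0.9},
}
hard = {"accuracy": 0.95}      # e.g. a company specification
soft = {"completeness": 0.75}  # e.g. a user preference

ranking = select_sources(assessed, hard, soft)
print(ranking)
```

Default requirements could be supplied as initial values for `hard` and `soft` and refined later from feedback.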


Private DQ Interpretation

  • Performed by the user

  • Results in certain actions

    • Exclude a source

    • Invest more money/time

    • Rewrite a query

    • Give up

    • Change parameters

    • Search for new sources


OLD SLIDE: An assessment-oriented classification

  • Subject-criteria

    • believability

    • concise representation

    • interpretability

    • relevancy

    • reputation

    • understandability

    • value-added

  • Object-criteria

    • accuracy

    • completeness

    • customer support

    • documentation

    • objectivity

    • price

    • reliability

    • security

    • timeliness

    • verifiability

  • Process-criteria

    • amount

    • availability

    • consistent representation

    • latency

    • response time


OLD SLIDE: How does it fit? Not really…

  • Subject-criteria (online; user-centric; subjective)

    • believability

    • concise representation

    • interpretability

    • relevancy

    • reputation

    • understandability

    • value-added

  • Object-criteria (offline; data-centric; objective)

    • accuracy

    • completeness

    • customer support

    • documentation

    • objectivity

    • price

    • reliability

    • security

    • timeliness

    • verifiability

  • Process-criteria (offline; data-centric; objective)

    • amount

    • availability

    • consistent representation

    • latency

    • response time


Output of Assessment and Interpretation

  • Numbers (DQ values)

    • in Vectors

    • Single Values

    • Rankings

    • Units / precision

  • Categories

    • Good/bad, etc.

  • Explanations

    • Trace
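
The output forms listed above can be derived from one another, as in the following sketch. The unweighted mean, the 0.7 threshold, and the sample vectors are assumptions made for illustration; the slide does not prescribe any of them.

```python
def single_value(vector):
    """Collapse a DQ vector into one number (here: the unweighted mean)."""
    return sum(vector.values()) / len(vector)

def category(value, threshold=0.7):
    """Map a DQ value to a coarse category."""
    return "good" if value >= threshold else "bad"

def ranking(vectors):
    """Order source names from best to worst single value."""
    return sorted(vectors, key=lambda n: single_value(vectors[n]), reverse=True)

def explain(name, vector):
    """A small trace: show which values produced the single value."""
    parts = ", ".join(f"{c}={v:.2f}" for c, v in sorted(vector.items()))
    return f"{name}: mean of ({parts}) = {single_value(vector):.2f}"

# Made-up DQ vectors for two sources.
vectors = {
    "source_1": {"accuracy": 0.9, "completeness": 0.7},
    "source_2": {"accuracy": 0.6, "completeness": 0.5},
}

for name in ranking(vectors):
    print(explain(name, vectors[name]), "->", category(single_value(vectors[name])))
```

Each step loses information (vector, then number, then category), so keeping the trace alongside the category lets a user see why a source was labelled as it was.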


Doubts

  • Can everything app-specific be done during interpretation?

  • In other words: Is a single assessment enough?

  • If not: Is a simple parameterization enough?

  • Is MS really doing a good job in automatically naming files?


Comparison: GeneDB (not the same as GeneDB.org) vs. DBLP

  • Dimensions of Comparison

    • Input to DQ Assessment

    • DQ Criteria

    • DQ Interpretation

      • Requirements

        • User

        • App


Input to DQ Assessment (Comparison)

  • Willingness and ability to give input

    • GeneDB: Ability is often lacking

      • E.g. Schema Evolvability

  • Accuracy

    • Noisy data intrinsic to GeneDB

  • Up-to-Dateness

    • Scientific DB: Announcements

    • DBLP: Unknown (less in summer)

  • Trust and Reputation

    • Already here?

  • Completeness

    • Mostly willing

    • Identification of domain hinders assessment

  • Duplicates

    • GeneDB unable to assess (or define)

    • DBLP: Mentions scripts

  • Obvious

    • Ontology

    • Availability

    • Response Time


DQ criteria

  • Criteria for GeneDB

    • Reputation/Trust/Believability

    • Schema evolvability

    • Ontologies

    • Up-to-Dateness (1 week)

    • Lineage

  • Criteria for DBLP

    • Response Time

    • Understandability

    • Completeness

    • Schema and data stability

  • Criteria for neither

    • Up-to-Dateness (1 day, 1 sec)

    • Availability

  • Criteria for both

    • Completeness

    • Accuracy

    • Duplicates


DQ Interpretation (Comparison)

  • GeneDB

    • Trust is important (Oops)

    • Usage as source

    • Usage within a workflow

    • Hard requirements

      • Costs money

      • Costs lives

    • Default DQ requirements

  • DBLP

    • Usage as a tool, not as a source

    • Thus, hardly any DQ requirements

    • Costs rejection of paper

    • Default DQ requirements / assumptions


GeneFlow.com™

High Quality Integrated Genomic Data into your face!


IQ-Criteria

Availability: Percentage of time an information source is “up”.

Accuracy: Quotient of the number of correct values in the source and the overall number of values in the source.

Amount of data: Size of result.

Believability: Degree to which the information is accepted as correct.

Completeness: Quotient of the number of response items and the number of real world items.

Concise representation: Degree to which the structure of the information matches the information itself.

Consistent representation: Degree to which the structure of the information conforms to that of other sources.

Customer support: Amount and usefulness of online support through text, email, phone etc.

Documentation: Amount and usefulness of documents with meta information.

Interpretability: Degree to which the information conforms to the technical ability of the consumer.

Latency: Amount of time until first information reaches user.

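
The three criteria above that are defined as a percentage or a quotient translate directly into arithmetic. The sketch below follows those definitions; the function and parameter names are hypothetical, and the number of real-world items needed for completeness is typically only an estimate.

```python
def availability(up_seconds, observed_seconds):
    """Percentage of the observed time during which the source was up."""
    return 100.0 * up_seconds / observed_seconds

def accuracy(correct_values, total_values):
    """Quotient of correct values and all values in the source."""
    return correct_values / total_values

def completeness(response_items, real_world_items):
    """Quotient of response items and real-world items.
    real_world_items is usually an estimate of the 'size of the world'."""
    return response_items / real_world_items

# Made-up numbers for one source.
print(availability(up_seconds=90, observed_seconds=100))
print(accuracy(correct_values=3, total_values=4))
print(completeness(response_items=50, real_world_items=200))
```

All three functions raise `ZeroDivisionError` for an empty observation window, an empty source, or an empty world; a production version would need to decide what value those cases should get.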


IQ-Criteria (continued)

Objectivity: Degree to which information is unbiased and impartial.

Price: Monetary charge per query.

Relevancy: Degree to which information satisfies the user's need.

Reliability: Degree to which the user can trust the information.

Reputation: Degree to which the information or its source is in high standing.

Response time: Amount of time until complete response reaches the user.

Security: Degree to which information is passed privately from user to information source and back.

Timeliness: Age of information.

Understandability: Degree to which the information can be comprehended by the user.

Value-added: Amount of benefit the use of the information provides.

Verifiability: Degree and ease with which the information can be checked for correctness.

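
Latency and response time, as defined in the criteria lists, differ only in when the clock is stopped: at the first item or at the last. A minimal sketch, assuming the source can be modelled as a Python generator; `measure` and `slow_source` are hypothetical names.

```python
import time

def measure(query):
    """Run a callable that yields result items.
    Returns (latency, response_time, items); latency is None if nothing arrives."""
    start = time.perf_counter()
    latency = None
    items = []
    for item in query():
        if latency is None:
            latency = time.perf_counter() - start  # first item reached the user
        items.append(item)
    response_time = time.perf_counter() - start    # complete response received
    return latency, response_time, items

def slow_source():
    """Stand-in for a remote source that delivers items one by one."""
    for i in range(3):
        time.sleep(0.01)
        yield i

latency, response_time, items = measure(slow_source)
print(f"latency {latency:.3f}s, response time {response_time:.3f}s, {len(items)} items")
```

Both values depend on load and network conditions at query time, so a single measurement is only a sample.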

