DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)


Gunter, Cinzia, Vipul, Felix, and one other that did not come and two others from the Scientific DB that chose to ignore us

Assessment vs. Measurement
  • Measurement
    • More objective
    • From inside the data
    • Uses a metric
  • Assessment
    • May have more subjective parts
    • From the outside
    • Entire process, of which measurement is one part
    • Measurement is part of the assessment-output
IQ-Assessment is difficult
  • IQ-criteria are often of subjective nature
  • Sources do not publish useful IQ-metadata.
  • Sources take measures to hinder IQ-assessment.
  • Enormous amount of data - Sampling
  • Subject to changes in content and quality
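The sampling point can be illustrated with a rough sketch. This is not from the slides: it assumes a hypothetical per-value check `is_correct` exists, and it estimates accuracy from a random sample instead of scanning the whole source. Because content and quality change over time, such an estimate would have to be repeated periodically.

```python
import random

def estimate_accuracy(values, is_correct, sample_size, seed=0):
    """Estimate the fraction of correct values from a random sample.

    `is_correct` is a hypothetical per-value check (an assumption for
    illustration; the slides do not define one).
    """
    rng = random.Random(seed)
    n = min(sample_size, len(values))
    if n == 0:
        raise ValueError("cannot sample from an empty source")
    sample = rng.sample(values, n)
    return sum(1 for v in sample if is_correct(v)) / n

# Toy source: every tenth value is treated as wrong.
source = list(range(1000))
estimate = estimate_accuracy(source, lambda v: v % 10 != 0, sample_size=200)
```

The fixed seed only makes the sketch reproducible; a real assessment would presumably draw fresh samples.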
Architectural levels relevant to assessment
  • Sources
  • Wrapper
  • Mediated Schema
  • Mappings
  • Query decomposition
  • Result composition (process)
  • Integrated result at user/app

Assumption of soundness and completeness

The big picture
  • [Figure: architecture diagram with the following labels]
    • Source 1, Source 2, Source 3
    • Wrapper 1, Wrapper 2, Wrapper 3
    • Mediator + DQ vectors
    • DQ Assessment (objective)
    • Data Acquisition / Workflow
    • DQ feedback
    • DQ Interpretation (also subjective)
    • DQ requirement
    • Private DQ Interpretation
    • User profiles
    • Application 1, User 2

Assessment independence
  • In analogy to data independence
  • Different apps have different interpretations of DQ
  • Separation of
    • application-independent assessment
    • User/App-dependent interpretation
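One way to picture this separation is the sketch below. It is an illustration under assumptions, not the authors' design: the criterion names, the scores, and the weighted-average interpretation are all made up. The assessment produces one DQ vector per source, and each application applies its own weights to the same vectors.

```python
# Application-independent assessment: one DQ vector per source.
# Criterion names and scores are made-up illustration values in [0, 1].
dq_vectors = {
    "source_1": {"completeness": 0.9, "accuracy": 0.6},
    "source_2": {"completeness": 0.5, "accuracy": 0.95},
}

def interpret(dq_vector, weights):
    """User/app-dependent interpretation: weighted average of the scores."""
    total = sum(weights.values())
    return sum(dq_vector[c] * w for c, w in weights.items()) / total

# Two hypothetical applications read the same assessment differently.
survey_app = {"completeness": 3, "accuracy": 1}
billing_app = {"completeness": 1, "accuracy": 3}

def best_source(weights):
    """Pick the source whose DQ vector scores highest under these weights."""
    return max(dq_vectors, key=lambda s: interpret(dq_vectors[s], weights))
```

Here `best_source(survey_app)` and `best_source(billing_app)` disagree even though the DQ vectors were computed only once, which is the point of keeping interpretation separate.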
DQ Assessment
  • Data source
    • Past results (Traces)
    • Includes data & metadata
    • The data itself
  • Granularity
    • Subsets / partitions
  • Integration result
    • Mapping
    • Transformation
    • Aggregation
  • Data-oriented (objective in nature?)
  • Offline
  • Defaults
    • We need some
    • Ground estimations
      • Which criteria need them?
      • Size of the world for completeness
    • Refinement
DQ Interpretation
  • User or Application
    • Feedback
    • User profiles
  • Quality-requirements
    • Hard: Company-specification; ISO
    • Soft: User requirements
  • Query / Usage
    • Quality-requirements
    • Does it select certain subsets of sources?
  • Online
  • User/App-oriented (subjective in nature?)
  • Defaults
    • We definitely need them as a guide and for initialization (bootstrapping)
    • Refinement
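The distinction between hard and soft requirements, together with defaults for bootstrapping, could look roughly like the following. This is a sketch under assumptions: the threshold and weight values, the criterion names, and the helper names `acceptable` and `select_sources` are hypothetical. Hard requirements filter sources out; soft requirements only order the remaining ones.

```python
# Assumed defaults, used when a user or application supplies nothing yet.
DEFAULT_HARD = {"accuracy": 0.5}       # minimum acceptable score per criterion
DEFAULT_SOFT = {"completeness": 1.0}   # preference weights per criterion

def acceptable(dq_vector, hard=None):
    """A source passes only if it meets every hard requirement."""
    hard = DEFAULT_HARD if hard is None else hard
    return all(dq_vector.get(c, 0.0) >= minimum for c, minimum in hard.items())

def select_sources(dq_vectors, hard=None, soft=None):
    """Drop sources that fail hard requirements, then order by soft ones."""
    soft = DEFAULT_SOFT if soft is None else soft
    candidates = {s: v for s, v in dq_vectors.items() if acceptable(v, hard)}

    def score(source):
        return sum(candidates[source].get(c, 0.0) * w for c, w in soft.items())

    return sorted(candidates, key=score, reverse=True)

vectors = {
    "a": {"accuracy": 0.9, "completeness": 0.4},
    "b": {"accuracy": 0.3, "completeness": 0.99},
    "c": {"accuracy": 0.7, "completeness": 0.8},
}
```

With the defaults, `select_sources(vectors)` excludes `b` despite its high completeness; refinement would then replace the defaults with values learned from feedback or user profiles.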
Private DQ Interpretation
  • Performed by the user
  • Results in certain actions
    • Exclude a source
    • Invest more Money/Time
    • Rewrite a query
    • Give up
    • Change parameters
    • Search for new sources
OLD SLIDE: An assessment-oriented classification
  • Subject-criteria
    • believability
    • concise representation
    • interpretability
    • relevancy
    • reputation
    • understandability
    • value-added
  • Object-criteria
    • accuracy
    • completeness
    • customer support
    • documentation
    • objectivity
    • price
    • reliability
    • security
    • timeliness
    • verifiability
  • Process-criteria
    • amount
    • availability
    • consistent representation
    • latency
    • response time
OLD SLIDE: How does it fit? Not really…
  • Subject-criteria (online; user-centric; subjective)
    • believability
    • concise representation
    • interpretability
    • relevancy
    • reputation
    • understandability
    • value-added
  • Object-criteria (offline; data-centric; objective)
    • accuracy
    • completeness
    • customer support
    • documentation
    • objectivity
    • price
    • reliability
    • security
    • timeliness
    • verifiability
  • Process-criteria (offline; data-centric; objective)
    • amount
    • availability
    • consistent representation
    • latency
    • response time
Output of Assessment and Interpretation
  • Numbers (DQ values)
    • in Vectors
    • Single Values
    • Rankings
    • Units / precision
  • Categories
    • Good/bad, etc.
  • Explanations
    • Trace
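The output forms listed above can be combined in one small structure. The sketch below is only an assumption about how they might fit together: the mean as single value, the 0.7 threshold, the two-decimal precision, and the names `DQOutput`, `summarize`, and `rank` are illustrative choices, not taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class DQOutput:
    vector: dict                 # DQ values per criterion
    value: float                 # single value derived from the vector
    category: str                # coarse label such as "good" or "bad"
    trace: list = field(default_factory=list)  # explanation of the result

def summarize(vector, good_threshold=0.7):
    """Collapse a DQ vector into a single value, a category, and a trace."""
    value = round(sum(vector.values()) / len(vector), 2)  # fixed precision
    category = "good" if value >= good_threshold else "bad"
    trace = [f"{name}={score}" for name, score in sorted(vector.items())]
    trace.append(f"mean={value}, threshold={good_threshold}")
    return DQOutput(vector, value, category, trace)

def rank(outputs):
    """Order source names by their single value, best first."""
    return sorted(outputs, key=lambda name: outputs[name].value, reverse=True)

outputs = {
    "a": summarize({"accuracy": 0.9, "completeness": 0.7}),
    "b": summarize({"accuracy": 0.4, "completeness": 0.5}),
}
```

The trace keeps the inputs and the threshold next to the verdict, so a "bad" label can be explained rather than just reported.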
Doubts
  • Can everything app-specific be done during interpretation?
  • In other words: Is a single assessment enough?
  • If not: Is a simple parameterization enough?
  • Is MS really doing a job in automatically naming files?
Comparison: GeneDB (not the same as GeneDB.org) vs. DBLP
  • Dimensions of Comparison
    • Input to DQ Assessment
    • DQ Criteria
    • DQ Interpretation
      • Requirements
        • User
        • App
Input to DQ Assessment (Comparison)
  • Willingness and ability of giving input
    • GeneDB: Ability is often lacking
    • E.g. Schema Evolvability
  • Accuracy
    • Noisy data intrinsic to GeneDB
  • Up-to-Dateness
    • Scientific DB: Announcements
    • DBLP: Unknown (less in summer)
  • Trust and Reputation
    • Already here?
  • Completeness
    • Mostly willing
    • Identification of domain hinders assessment
  • Duplicates
    • GeneDB unable to assess (or define)
    • DBLP: Mentions scripts
  • Obvious
    • Ontology
    • Availability
    • Response Time
DQ criteria
  • Criteria for GeneDB
    • Reputation/Trust/Believability
    • Schema evolvability
    • Ontologies
    • Up-to-Dateness (1 week)
    • Lineage
  • Criteria for DBLP
    • Response Time
    • Understandability
    • Completeness
    • Schema and data stability
  • Criteria for neither
    • Up-to-Dateness (1 day, 1 sec)
    • Availability
  • Criteria for both
    • Completeness
    • Accuracy
    • Duplicates
DQ Interpretation (Comparison)
  • GeneDB
    • Trust is important (Oops)
    • Usage as source
    • Usage within a workflow
    • Hard requirements
      • Costs money
      • Costs lives
    • Default DQ requirements
  • DBLP
    • Usage as a tool, not as a source
    • Thus, hardly any DQ requirements
    • Costs rejection of paper
    • Default DQ requirements / assumptions
GeneFlow.com™

High Quality Integrated Genomic Data into your face!

IQ-Criteria
  • Availability: Percentage of time an information source is “up”.
  • Accuracy: Quotient of the number of correct values in the source and the overall number of values in the source.
  • Amount of data: Size of result.
  • Believability: Degree to which the information is accepted as correct.
  • Completeness: Quotient of the number of response items and the number of real world items.
  • Concise representation: Degree to which the structure of the information matches the information itself.
  • Consistent representation: Degree to which the structure of the information conforms to that of other sources.
  • Customer support: Amount and usefulness of online support through text, email, phone etc.
  • Documentation: Amount and usefulness of documents with meta information.
  • Interpretability: Degree to which the information conforms to technical ability of the consumer.
  • Latency: Amount of time until first information reaches user.
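The three quotient-style definitions above (availability, accuracy, completeness) translate directly into arithmetic. The sketch below assumes the counts are already known; the function and parameter names are illustrative. In practice the number of real world items is usually an estimate, not a measured count.

```python
def accuracy(correct_values, total_values):
    """Correct values in the source divided by all values in the source."""
    return correct_values / total_values

def completeness(response_items, real_world_items):
    """Response items divided by the (usually estimated) real world items."""
    return response_items / real_world_items

def availability(seconds_up, seconds_observed):
    """Percentage of the observed time during which the source was up."""
    return 100.0 * seconds_up / seconds_observed
```

For example, a source with 90 correct values out of 100 has an accuracy of 0.9, and a source that was up for 45 of 60 observed seconds has an availability of 75 percent.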
IQ-Criteria (continued)
  • Objectivity: Degree to which information is unbiased and impartial.
  • Price: Monetary charge per query.
  • Relevancy: Degree to which information satisfies the user's need.
  • Reliability: Degree to which the user can trust the information.
  • Reputation: Degree to which the information or its source is in high standing.
  • Response time: Amount of time until complete response reaches the user.
  • Security: Degree to which information is passed privately from user to information source and back.
  • Timeliness: Age of information.
  • Understandability: Degree to which the information can be comprehended by the user.
  • Value-added: Amount of benefit the use of the information provides.
  • Verifiability: Degree and ease with which the information can be checked for correctness.
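Latency and response time differ only in when the clock stops: at the first item versus at the complete response. A minimal sketch of measuring both, assuming the source delivers its result as an iterable stream (`slow_source` is a made-up stand-in for a real source):

```python
import time

def measure(result_stream):
    """Return (latency, response_time, items) for one streamed result.

    latency: seconds until the first item arrives (None if nothing arrives)
    response_time: seconds until the complete response has arrived
    """
    start = time.perf_counter()
    latency = None
    items = []
    for item in result_stream:
        if latency is None:
            latency = time.perf_counter() - start
        items.append(item)
    response_time = time.perf_counter() - start
    return latency, response_time, items

def slow_source(n, delay=0.01):
    """Hypothetical source that yields n items with a delay before each."""
    for i in range(n):
        time.sleep(delay)
        yield i

latency, response_time, items = measure(slow_source(3))
```

The length of `items` corresponds to the amount of data criterion, so one measurement run covers three of the process-oriented criteria.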