DQ Assessment and Measurement (and not a trace of trust (nor trust in traces) and no truth either)
Gunter, Cinzia, Vipul, Felix, and one other that did not come and two others from the Scientific DB that chose to ignore us
Assessment vs. Measurement
Measurement: more objective
Assumption of soundness and completeness
Private DQ Interpretation
+ DQ vectors
Data Acquisition / Workflow
Past results (Traces)
Includes data & metadata
The data itself
Subsets / partitions
Data-oriented (objective in nature?)
We need some
Which criteria need them?
Size of the world for completeness
Online; user-centric; subjective
Offline; data-centric; objective
Willingness and ability of giving input
GeneDB: Ability is often lacking
E.g. Schema Evolvability
Noisy data intrinsic to GeneDB
Scientific DB: Announcements
DBLP: Unknown (less in summer)
Trust and Reputation
Identification of domain hinders assessment
GeneDB unable to assess (or define)
DBLP: Mentions scripts
Criteria for GeneDB
Criteria for DBLP
Schema and data stability
Trust is important (Oops)
Usage as source
Usage within a workflow
Default DQ requirements
Usage as a tool, not as a source
Thus, hardly DQ requirements
Cost: rejection of paper
Default DQ requirements / assumptions
High Quality Integrated Genomic Data into your face!
Availability: Percentage of time an information source is “up”.
Accuracy: Quotient of the number of correct values in the source and the overall number of values in the source.
Amount of data: Size of result.
Believability: Degree to which the information is accepted as correct.
Completeness: Quotient of the number of response items and the number of real world items.
Concise representation: Degree to which the structure of the information matches the information itself.
Consistent representation: Degree to which the structure of the information conforms to that of other sources.
Customer support: Amount and usefulness of online support through text, email, phone etc.
Documentation: Amount and usefulness of documents with meta information.
Interpretability: Degree to which the information conforms to the technical ability of the consumer.
Latency: Amount of time until first information reaches user.
Objectivity: Degree to which information is unbiased and impartial.
Price: Monetary charge per query.
Relevancy: Degree to which information satisfies the user's need.
Reliability: Degree to which the user can trust the information.
Reputation: Degree to which the information or its source is in high standing.
Response time: Amount of time until complete response reaches the user.
Security: Degree to which information is passed privately from user to information source and back.
Timeliness: Age of information.
Understandability: Degree to which the information can be comprehended by the user.
Value-added: Amount of benefit the use of the information provides.
Verifiability: Degree and ease with which the information can be checked for correctness.
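Three of the criteria above (Availability, Accuracy, Completeness) are defined as plain ratios, so they can be measured offline once the needed metadata is supplied. The following is a minimal sketch in Python, not a reference implementation. The function names, the dictionary-based record layout, and the idea of a trusted reference set for checking correctness are all assumptions made for illustration. Note that completeness needs the size of the world as an explicit input, which is exactly the metadata the slides say is required.

```python
def availability(up_seconds, total_seconds):
    """Percentage of the observation window during which the source was up."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return 100.0 * up_seconds / total_seconds


def accuracy(source_values, reference_values):
    """Number of correct values in the source divided by all values in the source.

    Assumption: both arguments are dicts keyed by the same identifiers, and
    reference_values is a trusted gold standard. A source value counts as
    correct only if the reference has the same key with an equal value.
    """
    if not source_values:
        raise ValueError("source_values must not be empty")
    correct = sum(
        1
        for key, value in source_values.items()
        if key in reference_values and reference_values[key] == value
    )
    return correct / len(source_values)


def completeness(response_count, world_size):
    """Number of response items divided by the number of real-world items.

    Assumption: world_size is known or estimated from metadata; it cannot be
    derived from the data itself.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    return response_count / world_size


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    source = {"a": 1, "b": 2, "c": 3, "d": 4}
    reference = {"a": 1, "b": 2, "c": 0, "d": 4}
    print(availability(90, 100))        # share of time the source was up
    print(accuracy(source, reference))  # share of source values that are correct
    print(completeness(30, 120))        # share of the world that was returned
```

Criteria such as Believability, Relevancy, or Understandability are defined as degrees rather than quotients, so they would need user input instead of a formula like these.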