Problem Statement and Objectives

Kai-Uwe Sattler, Michael Gertz, Vipul Kashyap, Cai Ziegler, Cinzia Cappiello, Susanne Boll. Dagstuhl Seminar “Data Quality on the Web”. Problem Statement and Objectives: What is the relationship between trust and data quality? What is the meaning of trust in the context of data quality?

Presentation Transcript


  1. Kai-Uwe Sattler, Michael Gertz, Vipul Kashyap, Cai Ziegler, Cinzia Cappiello, Susanne Boll. Dagstuhl Seminar “Data Quality on the Web”

  2. Problem Statement and Objectives What is the relationship between trust and data quality? • What is the meaning of trust in the context of data quality? • What are the dimensions of data quality in different settings? • How do we characterize these dimensions? • Can we determine any metrics for assessment? How can data quality measurements establish trust?

  3. Data, Fact, Belief, and Trust [Diagram: verifiable data is Fact, non-verifiable data is Belief; Belief is either evidence-based or non-evidence-based; evidence comes from experience (atomic) or reputation (indirect); both lead to atomic or indirect Trust]

  4. Notions we established in this work • If you can verify it -> Fact • If you cannot verify it -> Belief • Belief (Webster): “a state or habit of mind in which trust or confidence is placed in some person or thing” • Two different views of belief • Evidence-based belief • Non-evidence-based belief • Reputation is the memory and summary of behavior from past transactions • Trust is a subjective expectation an agent has about another's future behavior • Two different variants • Atomic trust • Indirect trust • Reputation and trust are built over time (feedback)
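The idea that reputation summarizes past transactions and that trust is an expectation about future behavior, built up through feedback, can be illustrated with a minimal sketch. It swaps in a beta-reputation style counter (binary feedback per transaction, expected value of a Beta distribution) as one plausible model; the class `Reputation` and the multiplicative `indirect_trust` rule are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    """Memory and summary of past transactions with one agent.

    Assumption: each transaction yields binary feedback (satisfied or not).
    """
    positive: int = 0
    negative: int = 0

    def record(self, satisfied: bool) -> None:
        # Feedback loop: every transaction updates the reputation over time.
        if satisfied:
            self.positive += 1
        else:
            self.negative += 1

    def expected_trust(self) -> float:
        # Atomic trust as the expected probability of good future behavior
        # under Beta(positive + 1, negative + 1); 0.5 when there is no history.
        return (self.positive + 1) / (self.positive + self.negative + 2)


def indirect_trust(trust_in_recommender: float, recommended_trust: float) -> float:
    """Hypothetical rule for indirect trust: discount a recommender's
    opinion by how much we trust the recommender."""
    return trust_in_recommender * recommended_trust
```

For example, after three satisfying transactions and one unsatisfying one, `expected_trust()` returns 4/6, and an agent with no history starts at the neutral value 0.5.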

  5. Working Model • Consumers (query); providers (M) • Distinguish trusted sources M1 from untrusted sources M2 • Query result r1 from M1, query result r2 from M2 • Question: What relationship between r1 and r2 can be used to estimate the quality of the result?

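  6. Data Management Settings (structured vs. unstructured / semistructured data × exact vs. imprecise queries) • Setting 1: structured data, exact queries (traditional databases) • Setting 2: unstructured / semistructured data, exact queries • Setting 3: unstructured / semistructured data, imprecise queries • Setting 4: structured data, imprecise queries (information retrieval over databases) • Examples of unstructured / semistructured data: Web data, document collections, information retrieval, multimedia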

  7. The three DQ dimensions for Setting 1 • Completeness: “degree to which the expected values are included in a data collection” • In the presence of trusted data sources • In the absence of trusted data sources
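Completeness as “degree to which the expected values are included” can be sketched as a set ratio. This is a minimal sketch under two assumptions not stated on the slide: with a trusted source, its result r1 is taken as the set of expected values; without one, the union of all untrusted results is used as a stand-in reference. The function names are hypothetical.

```python
def completeness(result: set, reference: set) -> float:
    """Fraction of the expected values (reference) that appear in result."""
    if not reference:
        return 1.0  # assumption: nothing expected, so nothing is missing
    return len(result & reference) / len(reference)


def completeness_with_trusted(result: set, trusted_result: set) -> float:
    # Trusted source M1 available: its result r1 serves as the expected values.
    return completeness(result, trusted_result)


def completeness_without_trusted(result: set, all_results: list[set]) -> float:
    # No trusted source: approximate the expected values by the union of
    # everything the (untrusted) sources returned (hypothetical heuristic).
    reference = set().union(*all_results)
    return completeness(result, reference)
```

The union heuristic can only overestimate completeness, since values that no source returns are invisible to it.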

  8. The three DQ dimensions for Setting 1 • Timeliness: relationship between the validity of the data item and the time referred to in the query • Timeliness is verifiable: if we can assume that validity intervals are trustworthy, there is no need to distinguish between trusted and untrusted sources
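One way to make the relationship between validity and query time concrete is a sketch assuming each data item carries an explicit validity interval [valid_from, valid_to), with an open end for items that are still valid. The `DataItem` type and the ratio-based `timeliness` score are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DataItem:
    value: str
    valid_from: date
    valid_to: Optional[date]  # None means still valid (open-ended interval)


def is_timely(item: DataItem, query_time: date) -> bool:
    # The item is timely if the time referred to in the query falls
    # inside its validity interval [valid_from, valid_to).
    if query_time < item.valid_from:
        return False
    return item.valid_to is None or query_time < item.valid_to


def timeliness(items: list[DataItem], query_time: date) -> float:
    """Fraction of returned items that are valid at the query time."""
    if not items:
        return 1.0
    return sum(is_timely(i, query_time) for i in items) / len(items)
```

Because the check only compares dates, it yields the same answer regardless of which source delivered the item, which matches the point that timeliness needs no trusted/untrusted distinction once the intervals themselves are trusted.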

  9. The three DQ dimensions for Setting 1 • Correctness: like completeness, a ratio, here between the correct values and the total values; we distinguish between trusted and untrusted sources • In the presence of trusted data sources • In the absence of trusted data sources
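The correctness ratio (correct values over total values) can be sketched for keyed results. The assumptions here are not from the slides: with a trusted source, its values are treated as correct; without one, a majority vote across sources stands in for the correct value. Both functions are hypothetical illustrations.

```python
from collections import Counter


def correctness(result: dict, reference: dict) -> float:
    """Ratio of correct values to total values in result.

    A value counts as correct if the reference has the same key with an
    equal value; keys the reference does not cover count as incorrect
    (an assumption; they could also be excluded from the total).
    """
    if not result:
        return 1.0
    correct = sum(1 for k, v in result.items() if k in reference and reference[k] == v)
    return correct / len(result)


def majority_reference(results: list[dict]) -> dict:
    # Without a trusted source: take the most frequent value per key across
    # all sources as a stand-in for the correct value (hypothetical heuristic).
    votes: dict = {}
    for r in results:
        for k, v in r.items():
            votes.setdefault(k, Counter())[v] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}
```

With a trusted source, call `correctness(r2, r1)`; without one, call `correctness(r2, majority_reference([r1, r2, ...]))`. Majority voting is only a proxy: several sources can agree on the same wrong value.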

  10. Setting 2 – exact queries / unstructured data • Completeness • The number of sources is large • It is harder to establish a notion of completeness than in Setting 1 • Timeliness • Same as Setting 1 • Under the assumption that validity intervals are explicit • Correctness • Same as Setting 1 (close to the DB scenario)

  11. Setting 3 – imprecise queries / unstructured data • The quality of metadata has a major impact on completeness and correctness • Completeness • Same as above • Timeliness • Same as above • Correctness • The difference here is the ranking
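For imprecise queries, correctness depends on the ranking of results. One standard way to score a ranking, swapped in here as an illustration rather than as the seminar's measure, is precision at k: the fraction of the top-k answers that are correct. The `relevant` set is assumed to be known, for example from a trusted source.

```python
def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Fraction of the top-k ranked answers that are correct (relevant)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked[:k]
    if not top:
        return 0.0
    # Design choice: divide by the number of answers actually returned,
    # not by k, when fewer than k answers exist.
    return sum(1 for doc in top if doc in relevant) / len(top)
```

Unlike the plain correctness ratio, this score changes when the same answers are reordered, which is the distinguishing feature of Setting 3.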

  12. Setting 4 – imprecise queries / structured data • Completeness • Timeliness • Correctness

  13. So … • DQ is a composite of different DQ dimensions • For the DQ dimensions, there are measurements for different settings • DQ – TRUST • DQ values need to be fed into the trust values • Trust values need to be fed back into DQ values
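The summary slide describes DQ as a composite of dimensions and a two-way feedback between DQ and trust. The following is a minimal sketch under assumptions of our own: the composite is a weighted average of per-dimension scores in [0, 1], DQ feeds into trust via exponential smoothing, and trust feeds back into DQ by discounting a measured value. The weights, `alpha`, and all function names are hypothetical.

```python
def composite_dq(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension DQ scores in [0, 1]."""
    total = sum(weights[d] for d in scores)
    if total == 0:
        raise ValueError("weights for the given dimensions must not all be zero")
    return sum(scores[d] * weights[d] for d in scores) / total


def update_trust(trust: float, dq: float, alpha: float = 0.2) -> float:
    # DQ -> trust: move the current trust value toward the observed DQ.
    return (1 - alpha) * trust + alpha * dq


def trust_adjusted_dq(dq: float, trust: float) -> float:
    # Trust -> DQ: discount a measured DQ value by trust in its source.
    return dq * trust
```

For instance, completeness 0.5, timeliness 1.0 and correctness 0.75 with correctness weighted twice as heavily give a composite of 0.75; feeding that into `update_trust` raises a prior trust of 0.5 toward it. How to weigh and combine such values is exactly the open issue listed on the next slide.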

  14. Open issues • Scalability • Metadata quality • Further dimensions • How to feedback • Models for combining quality values
