
WP3: Data Provenance and Access Control


Presentation Transcript


  1. WP3: Data Provenance and Access Control. Giorgos Flouris, Irini Fundulaki, Vassilis Papakonstantinou (FORTH). September 9-10, 2013, Heraklion

  2. Presentation Outline
     • WP3 status and outline
     • Research achievements
     • D3.2 status
     • Review comments
     • Health use case description
     • Demo
     • Next steps (on demo)

  3. WP3: Work Plan View [Gantt chart over project months 0 to 42]
     • Task 3.1 Provenance Management (FORTH)
     • Task 3.2 Privacy, DRM and Access Control (FORTH, KIT)
     • Task 3.3 Trust Management (EPFL)
     • Deliverables
       – D3.1 Access control specification language, reasoning and enforcement mechanisms
       – D3.2 Provenance management and propagation through SPARQL query and update languages
       – D3.3 Access control system and privacy-aware language
       – D3.4 Trust management and inference system

  4. Research So Far (Outline)
     • Abstract models for access control (FORTH)
     • Abstract models for provenance (FORTH)
       – Provenance for SPARQL query
       – Provenance for SPARQL update
     • Privacy (KIT)
       – Privacy in smart grids (not integrated)
       – Some integration in the demo
       – Problems (non-critical): to be discussed
     • Trust (EPFL)

  5. Access Control
     • The selective exposure of information to different users/roles
     • Useful for applications involving sensitive information
     • In the context of LOD:
       – Encourages publication of data that may include sensitive information
     • Standard approach:
       – Data are annotated with specific tags that determine whether they are accessible to specific users/roles
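
The standard approach in the last bullet can be pictured with a minimal Python sketch, assuming each triple carries a fixed set of role tags and a role sees a triple only if its name is among those tags. The `Triple` type, the `visible_triples` function, the `ex:` identifiers and the role names are all hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


# Hypothetical annotated dataset: each triple maps to the roles allowed to see it.
annotated = {
    Triple("ex:patient1", "foaf:name", '"Patient One"'): {"doctor", "nurse"},
    Triple("ex:patient1", "ex:diagnosis", "ex:conditionA"): {"doctor"},
    Triple("ex:patient1", "ex:age", '"42"'): {"doctor", "nurse", "insurer"},
}


def visible_triples(dataset, role):
    """Return the triples whose tag set contains the given role."""
    return {t for t, roles in dataset.items() if role in roles}


print(len(visible_triples(annotated, "doctor")))   # 3
print(len(visible_triples(annotated, "insurer")))  # 1
```

Because the tags are fixed values, granting or revoking access for a role means rewriting the annotations of every affected triple; this is one motivation for the abstract labels on the next slide.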

  6. Abstract Labels
     • Triples are associated with abstract labels
       – A set of abstract tokens (a1, a2, …)
       – Explicit triples are associated with such tokens via authorizations
     • Abstract operators (⊙, ⊕, ⊗)
       – a1 ⊙ a2: the triple was obtained via inference from triples with labels a1, a2
       – ⊗(a1): the triple was obtained via propagation from a triple with label a1
       – a1 ⊕ a2: the triple was obtained in two different ways, one via a1 and one via a2 (e.g., two different authorizations)
       – a1 ⊕ (a2 ⊙ ⊗(a3)): …
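
A minimal Python sketch of how such labels can be built up during inference, assuming RDFS subclass transitivity is the only inference rule and a single round of inference over the explicit triples. The symbols ⊕ (alternative derivations) and ⊗ (propagation) stand in for the two operator symbols lost in the extracted slide text; the class names, `infer_once` and the `ex:` data are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Token:  # abstract token assigned via an authorization, e.g. a1
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Infer:  # a ⊙ b: obtained by inference from triples labelled a and b
    left: "Expr"
    right: "Expr"

    def __str__(self):
        return f"({self.left} ⊙ {self.right})"


@dataclass(frozen=True)
class Propagate:  # ⊗(a): obtained by propagation from a triple labelled a
    arg: "Expr"

    def __str__(self):
        return f"⊗({self.arg})"


@dataclass(frozen=True)
class Alt:  # a ⊕ b: the same triple obtained in two different ways
    left: "Expr"
    right: "Expr"

    def __str__(self):
        return f"({self.left} ⊕ {self.right})"


Expr = Union[Token, Infer, Propagate, Alt]
SC = "rdfs:subClassOf"


def infer_once(explicit):
    """One round of subClassOf transitivity over the explicit triples.

    A derived triple is labelled left ⊙ right; if it already has a label
    (explicit, or derived along another path), the labels are combined with ⊕.
    """
    labels = dict(explicit)
    for (a, p1, b), l1 in explicit.items():
        for (c, p2, d), l2 in explicit.items():
            if p1 == SC and p2 == SC and b == c and a != d:
                t = (a, SC, d)
                new = Infer(l1, l2)
                labels[t] = Alt(labels[t], new) if t in labels else new
    return labels


explicit = {
    ("ex:A", SC, "ex:B"): Token("a1"),
    ("ex:B", SC, "ex:C"): Token("a2"),
    ("ex:A", SC, "ex:C"): Token("a3"),
}
labels = infer_once(explicit)
print(labels[("ex:A", SC, "ex:C")])  # (a3 ⊕ (a1 ⊙ a2))
print(Propagate(Token("a1")))        # ⊗(a1)
```

The triple `ex:A rdfs:subClassOf ex:C` is both explicit (a3) and inferred (a1 ⊙ a2), so its label records both derivations instead of a single yes/no decision.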

  7. Determining Accessibility
     • Concrete policy
       – Associate tokens with concrete values
       – Associate operators with concrete operations
       – Determine whether the final value corresponds to an accessible triple (access function)
     • Example
       – a1 = 1, a2 = 2, a3 = 3
       – ⊙ = min, ⊕ = max, ⊗ = identity function
       – Accessible iff result > 1
       – a1 ⊕ (a2 ⊙ ⊗(a3)) = max(1, min(2, 3)) = 2, i.e., the triple is accessible
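
The example above can be checked with a short Python sketch. Abstract expressions are encoded as nested tuples (a hypothetical encoding), and the concrete policy supplies token values, one function per operator, and the access function; ⊕ and ⊗ are stand-in names for the two operators whose symbols were lost in the extracted slides.

```python
# Hypothetical encoding of abstract expressions:
#   "a1"              abstract token
#   ("⊙", e1, e2)     inference
#   ("⊕", e1, e2)     alternative derivations
#   ("⊗", e)          propagation
def evaluate(expr, policy):
    """Evaluate an abstract expression under a concrete policy."""
    if isinstance(expr, str):
        return policy["tokens"][expr]
    op, *args = expr
    return policy["ops"][op](*(evaluate(a, policy) for a in args))


# The concrete policy from the slide.
policy = {
    "tokens": {"a1": 1, "a2": 2, "a3": 3},
    "ops": {"⊙": min, "⊕": max, "⊗": lambda x: x},
    "accessible": lambda v: v > 1,
}

expr = ("⊕", "a1", ("⊙", "a2", ("⊗", "a3")))  # a1 ⊕ (a2 ⊙ ⊗(a3))
value = evaluate(expr, policy)
print(value, policy["accessible"](value))  # 2 True
```

Changing only the access function, e.g. to `lambda v: v > 2`, would make the same triple inaccessible without touching its expression.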

  8. SPARQL Query Provenance
     • What is the provenance of the result of a complex SPARQL query?
     • Adapting relational solutions
       – Positive fragment (semirings): works fine
       – Non-monotonic fragment (m-semirings): problems with OPTIONAL and DIFFERENCE, which have different semantics than in SQL
     • Two alternative approaches
       – m-semirings: translation to SQL
       – spm-semirings: a new operation (and the corresponding properties) to capture the provenance of OPTIONAL and DIFFERENCE
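
For the positive fragment, the relational idea carries over directly: each triple gets a token, joining two solutions multiplies their annotations, and merging identical solutions (e.g., under projection) adds them, giving a provenance polynomial. The Python sketch below evaluates a two-pattern query this way. The data, the `ex:`/`snomed:` prefixes and all function names are hypothetical, and OPTIONAL and DIFFERENCE, which need the m-semiring or spm-semiring machinery, are deliberately not covered.

```python
from collections import Counter
from itertools import product


# A provenance polynomial is a Counter: monomial (sorted tuple of tokens) -> coefficient.
def token(x):
    return Counter({(x,): 1})


def mul(p, q):
    """Polynomial product: used when two solutions are joined."""
    r = Counter()
    for (m1, c1), (m2, c2) in product(p.items(), q.items()):
        r[tuple(sorted(m1 + m2))] += c1 * c2
    return r


def show(p):
    return " + ".join(("" if c == 1 else f"{c}*") + "*".join(m) for m, c in sorted(p.items()))


# Hypothetical annotated triples: (s, p, o) -> token
DATA = {
    ("ex:sally", "ex:hasObs", "ex:obs1"): "t1",
    ("ex:obs1", "ex:code", "snomed:408643008"): "t2",
    ("ex:sally", "ex:hasObs", "ex:obs2"): "t3",
    ("ex:obs2", "ex:code", "snomed:408643008"): "t4",
}


def add_to(omega, mapping, poly):
    """Identical solutions are merged by adding their polynomials."""
    omega[mapping] = omega.get(mapping, Counter()) + poly


def match(pattern):
    """Evaluate one triple pattern (variables start with '?')."""
    omega = {}
    for triple, tok in DATA.items():
        mu, ok = {}, True
        for x, v in zip(pattern, triple):
            if x.startswith("?"):
                ok = ok and mu.setdefault(x, v) == v
            else:
                ok = ok and x == v
        if ok:
            add_to(omega, frozenset(mu.items()), token(tok))
    return omega


def join(o1, o2):
    omega = {}
    for (m1, p1), (m2, p2) in product(o1.items(), o2.items()):
        d1, d2 = dict(m1), dict(m2)
        if all(d2.get(k, v) == v for k, v in d1.items()):  # compatible mappings
            add_to(omega, frozenset({**d1, **d2}.items()), mul(p1, p2))
    return omega


def project(omega, variables):
    out = {}
    for m, p in omega.items():
        add_to(out, frozenset((k, v) for k, v in m if k in variables), p)
    return out


# SELECT ?p WHERE { ?p ex:hasObs ?o . ?o ex:code snomed:408643008 }
q = project(join(match(("?p", "ex:hasObs", "?o")),
                 match(("?o", "ex:code", "snomed:408643008"))), {"?p"})
for m, p in q.items():
    print(dict(m), show(p))  # {'?p': 'ex:sally'} t1*t2 + t3*t4
```

The result records that `ex:sally` is returned either because of t1 and t2 together or because of t3 and t4 together.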

  9. SPARQL Update Provenance
     • What is the provenance of a new triple inserted via a complex SPARQL Update?
     • Similar to CONSTRUCT (query), but still different
       – CONSTRUCT creates new triples but does not modify the dataset
       – Updates explicitly specify the named graph into which the new triple(s) are put
     • Triples with different provenance may be put in the same named graph
       – Hence named graphs alone are not sufficient for capturing the provenance of updates
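
The last point can be illustrated with a small Python sketch of an INSERT ... WHERE that records, for each inserted triple, a provenance expression built from the triples that matched (here simply the product of their tokens, written as a string). Two triples with different provenance end up in the same named graph, so the graph name alone cannot tell them apart. The quad store, the `ex:`/`snomed:` names and `insert_where` are hypothetical; the WHERE clause is assumed to match across all graphs, and merging with copies of a triple already in the store is left out.

```python
# Hypothetical quad store: (s, p, o, graph) -> provenance token
store = {
    ("ex:sally", "ex:hasObs", "ex:obs1", "ex:hospitalA"): "t1",
    ("ex:obs1", "ex:code", "snomed:408643008", "ex:hospitalA"): "t2",
    ("ex:bob", "ex:hasObs", "ex:obs2", "ex:hospitalB"): "t3",
    ("ex:obs2", "ex:code", "snomed:408643008", "ex:hospitalB"): "t4",
}


def insert_where(quads, target_graph):
    """Sketch of
        INSERT { GRAPH <target_graph> { ?p ex:hasCondition snomed:408643008 } }
        WHERE  { ?p ex:hasObs ?o . ?o ex:code snomed:408643008 }
    Each inserted quad is annotated with the provenance of the matches that
    produced it; if several matches produce the same quad, they are summed."""
    new = {}
    for (p, pred1, o, _g1), prov1 in quads.items():
        if pred1 != "ex:hasObs":
            continue
        for (s2, pred2, code, _g2), prov2 in quads.items():
            if s2 == o and pred2 == "ex:code" and code == "snomed:408643008":
                quad = (p, "ex:hasCondition", code, target_graph)
                expr = f"{prov1}*{prov2}"
                new[quad] = f"{new[quad]} + {expr}" if quad in new else expr
    quads.update(new)
    return new


inserted = insert_where(store, "ex:derived")
for quad, prov in inserted.items():
    print(quad, prov)
# ('ex:sally', 'ex:hasCondition', 'snomed:408643008', 'ex:derived') t1*t2
# ('ex:bob', 'ex:hasCondition', 'snomed:408643008', 'ex:derived') t3*t4
```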

  10. D3.2 Status
     • Contents of D3.2
       – Abstract models for provenance (very similar to the abstract models for access control)
       – Provenance for SPARQL query results
       – Provenance for SPARQL update (inserted triples)
     • Review version uploaded on the wiki on 05/09/13
       – http://wiki.planet-data.eu/web/D3.2
     • Only one reviewer at the moment (Oscar)
     • Volunteers?

  11. Review Comments
     • Generally happy (“impressed by D3.1”)
     • Applicability
     • Usefulness: convince industry to look into that
     • Focus on a real-world use case to demonstrate value
     • In a nutshell: some implementation to show value
     • Solution: demo (use case)
       – Health use case
       – Also suitable to show synergy

  12. Health Use Case
     • A use case to show applicability and usefulness
       – In collaboration with the Computational Medicine Laboratory (CML) of FORTH
     • Health-related data are sensitive
     • Proposed by the reviewers (Anders Tornquist)
       – Insurance companies need controlled access to sensitive medical data to determine premiums, insurance policies, contract terms, etc.
     • Relevant to access control/privacy challenges
     • But also related to streaming, data quality and trust

  13. Personal Health Record
     • Personal Health Record (PHR)
       – Collection of data regarding a patient
       – Diseases, personal information, medications, clinical observations and findings, measurements, …
     • Properties
       – Sensitive
       – Dynamic, sometimes streaming
       – Not always of good quality

  14. Relation to Other WPs
     • Relation to WP1
       – Part of the PHR data may be streaming in nature
       – E.g., vital-sign measurements of hospitalized patients
     • Relation to WP2
       – Data are often of poor quality
       – Up to 26.9% of the data can be erroneous
       – Causes include patient-provided data, faulty readings, sensors, etc.
     • Suggestion (for the review)
       – Outline how the technologies developed in WP1 and WP2 could (potentially) be used to address these issues
       – Specific and concrete, but no implementation needed

  15. Access Control and Privacy
     • PHR (normally) accessible only by the patient
       – Sensitive data
     • Doctors, nurses, hospitals, insurance companies and public services may require access
     • Informed consent
       – The patient allows access to (parts of) their PHR to specific entities, for a specific purpose, within a specific timeframe, etc.
       – Via consent forms (formal, legal documents)
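
A minimal Python sketch of how a consent form could be checked at access time, assuming a form records the patient, the entity granted access, the purpose, the covered parts of the PHR and a validity period. The `ConsentForm` fields, `may_access`, the purpose string, the section names and the dates are hypothetical; “Sally Berry” and BCAF are taken from the demo scenario.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConsentForm:
    patient: str
    entity: str          # who is granted access
    purpose: str         # why access is granted
    sections: frozenset  # which parts of the PHR are covered
    valid_from: date
    valid_until: date


def may_access(forms, patient, entity, purpose, section, on):
    """The patient always sees their own PHR; anyone else needs a matching, valid consent form."""
    if entity == patient:
        return True
    return any(
        f.patient == patient
        and f.entity == entity
        and f.purpose == purpose
        and section in f.sections
        and f.valid_from <= on <= f.valid_until
        for f in forms
    )


forms = [
    ConsentForm("Sally Berry", "BCAF", "benefit assessment",
                frozenset({"diagnoses"}), date(2013, 9, 1), date(2013, 12, 31)),
]

print(may_access(forms, "Sally Berry", "BCAF", "benefit assessment", "diagnoses", date(2013, 10, 1)))    # True
print(may_access(forms, "Sally Berry", "BCAF", "benefit assessment", "medications", date(2013, 10, 1)))  # False
```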

  16. Objectives
     • Use this use case to demonstrate the benefits of our approach
       – Different entities have access to the same data without accessing sensitive information
       – Unless the owner of the data has explicitly allowed it (via the consent form)
       – Without replication

  17. Health Use Case Setting
     [Figure: the dataset (a collection of PHRs)]

  18. Architecture (Data Access)
     [Figure: architecture diagram; recoverable labels below]
     • AUTH API, AUTH Module, AUTH DB
       – User credentials for authentication
       – User interface: authentication, queries
     • CPRP API, CPRP Module, CPRP DB
       – Purpose and role hierarchy
       – Assignment of concrete policies to accessing entities
     • PACEM API and AAC API, with the SPARQL to SQL Translation Module, Annotation Module, Evaluation Module and Update Module
     • DB: MonetDB, storing the abstract expressions
     • Flow: user request (accessing entity, SPARQL query); the CPRP Module returns the concrete policy for the accessing entity; the SPARQL query is translated to SQL; SQL and concrete policy are evaluated; the result (triples) is returned to the user
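
The request flow in the diagram can be sketched in Python with in-memory stand-ins for the AUTH DB, the CPRP DB and the stored abstract expressions: authenticate the accessing entity, fetch its concrete policy, then keep only the query results whose abstract expression evaluates to an accessible value. Everything here is hypothetical, including the store contents, the function names and the use of PBKDF2 for password hashing; the SPARQL-to-SQL translation and the MonetDB evaluation are not modelled, so the query results are passed in directly.

```python
import hashlib
import hmac
import os


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


_salt = os.urandom(16)
AUTH_DB = {"bcaf": (_salt, hash_password("bcaf-pass", _salt))}  # entity -> (salt, hash)
CPRP_DB = {"bcaf": {"a1": 0, "a2": 2}}                          # entity -> token values
# Stored abstract expressions; each is a list of tokens combined with ⊙ (= min here).
ABSTRACT = {
    ("ex:sally", "foaf:name", '"Sally Berry"'): ["a2"],
    ("ex:sally", "ex:hasObs", "ex:obs1"): ["a1", "a2"],
}


def authenticate(entity: str, password: str) -> bool:
    """AUTH module stand-in: check the credentials against the AUTH DB."""
    record = AUTH_DB.get(entity)
    if record is None:
        return False
    salt, stored = record
    return hmac.compare_digest(stored, hash_password(password, salt))


def handle_request(entity: str, password: str, query_results):
    """Authenticate, look up the concrete policy (CPRP stand-in), filter the results."""
    if not authenticate(entity, password):
        raise PermissionError("authentication failed")
    values = CPRP_DB[entity]
    return [t for t in query_results
            if min(values[tok] for tok in ABSTRACT[t]) > 1]  # access function: value > 1


print(handle_request("bcaf", "bcaf-pass", list(ABSTRACT)))
# [('ex:sally', 'foaf:name', '"Sally Berry"')]
```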

  19. Dataset
     • Advanced Patient Data Generator (APDG)
       – Synthetic but realistic data
       – Developed in the context of EURECA (FP7 IP)
     • Data associated with large medical schemas
       – HL7-RIM, SNOMED-CT
     • 10K patients
     • 750K instance triples

  20. Data on HL7-RIM (1/2)

  21. Data on HL7-RIM (2/2)
     [Figure: HL7-RIM classes Role, Participation, Entity and Observation; the entity http://kandel…./entityno/BC_ZSH2012A1000000 with foaf:name “Sally Berry”, shown together with the observation http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3 …]

  22. Data on SNOMED-CT (1/2)
     [Figure: the observation http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3 and the SNOMED-CT concept http://purl.bioontology…./408643008, whose skos:prefLabel is “Infiltrating duct carcinoma of breast”]
     • Observation indicating that the patient has “infiltrating duct carcinoma of breast”

  23. Data on SNOMED-CT (2/2)
     [Figure: fragment of the SNOMED-CT hierarchy: Neoplasm of breast, Carcinoma in situ of breast, Malignant tumor of breast, Carcinoma of breast, Lobular carcinoma in situ of breast, Intraductal carcinoma in situ of breast, Infiltrating duct carcinoma of breast, Infiltrating lobular carcinoma of breast]
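
Hierarchies like this are where inference matters: an observation coded with a specific concept is implicitly also an observation of every concept above it, and such inferred facts are what the ⊙ labels annotate. A minimal Python sketch of that closure follows. The is-a edges are an assumption (the extracted text lists the concepts but not the edges between them) and should be checked against an actual SNOMED-CT release; `IS_A` and `ancestors` are hypothetical names.

```python
# Assumed is-a edges between the concepts on the slide (child -> parents).
IS_A = {
    "Carcinoma in situ of breast": ["Neoplasm of breast"],
    "Malignant tumor of breast": ["Neoplasm of breast"],
    "Carcinoma of breast": ["Malignant tumor of breast"],
    "Lobular carcinoma in situ of breast": ["Carcinoma in situ of breast"],
    "Intraductal carcinoma in situ of breast": ["Carcinoma in situ of breast"],
    "Infiltrating duct carcinoma of breast": ["Carcinoma of breast"],
    "Infiltrating lobular carcinoma of breast": ["Carcinoma of breast"],
}


def ancestors(concept):
    """All concepts subsuming `concept` (transitive closure of is-a), excluding itself."""
    seen, stack = set(), list(IS_A.get(concept, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(IS_A.get(c, []))
    return seen


print(sorted(ancestors("Infiltrating duct carcinoma of breast")))
# ['Carcinoma of breast', 'Malignant tumor of breast', 'Neoplasm of breast']
```

Under these assumed edges, a request that only needs to know whether a patient has a malignant tumor of the breast could be answered from the inferred triple, without exposing the more specific diagnosis.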

  24. HL7-RIM and SNOMED-CT
     [Figure: the two previous diagrams combined: the entity http://kandel…./entityno/BC_ZSH2012A1000000 (foaf:name “Sally Berry”) and the observation http://kandel.…/obsno/5bf7d7bc-a1e8-11e2-bb58-6d82cec8d2c3, together with the SNOMED-CT breast-neoplasm hierarchy (Neoplasm of breast, Carcinoma in situ of breast, Malignant tumor of breast, Carcinoma of breast, Lobular carcinoma in situ of breast, Intraductal carcinoma in situ of breast, Infiltrating duct carcinoma of breast, Infiltrating lobular carcinoma of breast) …]

  25. Demo Scenario
     • The Breast Cancer Action Fund (BCAF) provides benefits for cancer patients
       – It requires information on patients’ status to grant the benefit
       – Sally Berry wants to apply for the benefit
     • Alternative: an insurance company wants access to (part of) the data to determine the insurance premium and the contract terms
     • Demo: http://daphne.ics.forth.gr:8084/pd-demo/login.jsp

  26. Next Steps
     • Make the benefit of abstract models more explicit
       – Efficient updates (no recomputation required)
       – Efficient change of policies (no recomputation required)
     • Try more scenarios
       – Purpose and role hierarchies
     • More functionality
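
The first bullet can be made concrete with a small Python sketch, assuming the abstract expressions are computed once and stored per triple: a policy change then only re-evaluates the stored expressions, without re-running inference or re-annotating the data. The triple identifiers, the tuple encoding and both policies are hypothetical; ⊕ and ⊗ stand in for the operator symbols lost in the extracted slides.

```python
# Stored once, e.g. at annotation time (hypothetical triple ids and expressions).
STORED = {
    "t_name": "a1",
    "t_diag": ("⊙", "a2", ("⊗", "a3")),
    "t_inferred": ("⊕", "a1", ("⊙", "a2", "a3")),
}
OPS = {"⊙": min, "⊕": max, "⊗": lambda x: x}


def evaluate(expr, tokens):
    if isinstance(expr, str):
        return tokens[expr]
    op, *args = expr
    return OPS[op](*(evaluate(a, tokens) for a in args))


def accessible(tokens, threshold=1):
    """Re-evaluate the stored expressions under new token values; nothing else is recomputed."""
    return {t for t, e in STORED.items() if evaluate(e, tokens) > threshold}


before = accessible({"a1": 1, "a2": 2, "a3": 3})
after = accessible({"a1": 2, "a2": 2, "a3": 0})  # policy change: a3 revoked, a1 raised
print(sorted(before))  # ['t_diag', 't_inferred']
print(sorted(after))   # ['t_inferred', 't_name']
```

`t_inferred` stays accessible after the change because one of its two derivations (via a1) is still allowed, which is exactly what ⊕ records.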
