
WP3: Data Provenance and Access Control

WP3: Data Provenance and Access Control. Irini Fundulaki, FORTH December 11-12, 2012, Luxembourg. Extended example scenario. GADM-RDF, Geospecies. Provenance and access control. SSN. ontologies. registry. C-SPARQL/ SPARQL-STR /HTTP. Quality control. relational DBMS. stream DB.


Presentation Transcript


  1. WP3: Data Provenance and Access Control. Irini Fundulaki, FORTH. December 11-12, 2012, Luxembourg

  2. Extended example scenario. Diagram labels: GADM-RDF, Geospecies; provenance and access control; SSN; ontologies; registry; C-SPARQL/SPARQL-STR/HTTP; quality control; relational DBMS; stream DB; RDF (stream) DB; CSV; Twitter

  3. First Year Review Comments • Collaboration • Provenance and Data Quality (with UPM, FUB for WP2) • Collaboration with KIT for a privacy-aware access control framework • Real versus synthetic data • PlanetData datasets (GADM-RDF¹, GeoSpecies² enhanced with links from DBPedia using LDSpider), GO³ and CIDOC⁴ ontologies • Scalability • Real datasets with up to 11M triples
     ¹ GADM-RDF: http://gadm.geovocab.org/
     ² GeoSpecies: http://lod.geospecies.org/
     ³ Gene Ontology: http://www.geneontology.org/
     ⁴ CIDOC-CRM: http://www.cidoc-crm.org/

  4. Access Control • Access Control: • Refers to the ability to allow or deny the use of a particular resource by a particular entity • Crucial for sensitive content since it ensures the selective exposure of information to different classes of users • Access Control Enforcement techniques for RDF data in the presence of RDFS inference: • Materialization (forward chaining) • Query Rewriting (backward chaining) • Static Analysis

  5. Access Control Models • Standard access control models associate a concrete label with each triple to indicate whether the triple can be accessed • Built-in rule: an implied RDF triple can be accessed if and only if all its implying triples can be accessed
     Explicit triples (s, p, o, label):
       &a       type  Student  yes
       Student  sc    Person   yes
     Implied triple (s, p, o, label):
       &a       type  Person   yes

  6. Access Control Models • In the case of any kind of update, the implied triples and their labels must be recomputed • An implied RDF triple can be accessed if and only if all its implying triples can be accessed • The overhead can be substantial in large datasets when updates occur frequently
     Explicit triples (s, p, o, label):
       &a       type  Student  yes
       Student  sc    Person   yes -> no
     Implied triple (s, p, o, label):
       &a       type  Person   yes -> no
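The recomputation cost described on this slide can be illustrated with a minimal Python sketch. The dictionary layout and the helper name `implied_labels` are assumptions made for illustration only; the triples are the ones shown on the slide.

```python
# Concrete labels of the explicit triples (True = "yes", False = "no").
explicit = {
    ("&a", "type", "Student"): True,
    ("Student", "sc", "Person"): True,
}

# For each implied triple, the explicit triples that imply it.
implied_from = {
    ("&a", "type", "Person"): [
        ("&a", "type", "Student"),
        ("Student", "sc", "Person"),
    ],
}


def implied_labels(explicit, implied_from):
    """An implied triple is accessible iff all its implying triples are."""
    return {
        triple: all(explicit[src] for src in sources)
        for triple, sources in implied_from.items()
    }


before = implied_labels(explicit, implied_from)

# An update flips one explicit label, so every implied label must be recomputed.
explicit[("Student", "sc", "Person")] = False
after = implied_labels(explicit, implied_from)
```

With concrete labels stored directly, each such update triggers a full pass over the implied triples, which is the overhead the slide points out.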

  7. Our Approach • Fine-grained Abstract Access Control Model comprising abstract tokens and operators • focus on the RDF triple level • focus on permissions for read-only queries • labeled triples are encoded as quadruples • supports RDFS inference and propagation of labels across the RDFS hierarchies • encodes how the access label of an implied quadruple is computed • concrete policies are used to assign specific values to the abstract tokens and operators • Implementation of a fine-grained, repository-independent access control framework, portable across platforms, on top of the MonetDB column store

  8. Annotating Triples with Access Tokens • l1, l2: abstract tokens • ⊙: entailment operator, records the triples used in the implication • prop(l): propagation operator applied to label l • default: default access token
     Quadruples (s, p, o, label):
       q1: &a       type   Student  l1
       q2: Student  sc     Person   l2
       q3: Person   type   class    l3
       q4: &a       type   Person   l1 ⊙ l2
       q5: Student  type   class    l2 ⊙ l3
       q6: &a       type   Person   prop(l3)
       q7: &a       fname  Alice    default

  9. Annotating Triples with Access Tokens • Authorization rules are pairs (query, abstract token)
     Authorizations:
       A1: (construct {?x lname ?y} where {?x type Student}, l1)
       A2: (construct {?x sc ?y}, l2)
       A3: (construct {?x type Student}, l3)
       A4: (construct {?x type class}, l4)
       A5: (construct {Student sc ?y}, l5)
     RDF quadruples (s, p, o, l):
       q1: Student  sc     Person   l2
       q2: Person   sc     Agent    l2
       q3: &a       type   Student  l3
       q4: &a       fname  Alice    default
       q5: Agent    type   class    l4
       q6: Student  sc     Person   l5
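How authorizations turn triples into quadruples can be sketched in Python as follows. This is an illustrative sketch, not the framework's implementation: the tuple encoding of the authorizations, the helper names `bind` and `annotate`, and the string `"default"` for the default access token are all assumptions.

```python
DEFAULT = "default"  # stand-in name for the default access token

TRIPLES = [
    ("Student", "sc", "Person"),
    ("Person", "sc", "Agent"),
    ("&a", "type", "Student"),
    ("&a", "fname", "Alice"),
    ("Agent", "type", "class"),
]

# (construct pattern, optional where pattern, abstract token), after A1..A5.
AUTHORIZATIONS = [
    (("?x", "lname", "?y"), ("?x", "type", "Student"), "l1"),
    (("?x", "sc", "?y"), None, "l2"),
    (("?x", "type", "Student"), None, "l3"),
    (("?x", "type", "class"), None, "l4"),
    (("Student", "sc", "?y"), None, "l5"),
]


def bind(pattern, triple, env=None):
    """Return variable bindings if the pattern matches the triple, else None."""
    env = dict(env or {})
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was first bound to.
            if env.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return env


def annotate(triples, authorizations):
    """Emit one quadruple per matching authorization, or one default quadruple."""
    quads = []
    for triple in triples:
        tokens = []
        for construct, where, token in authorizations:
            env = bind(construct, triple)
            if env is None:
                continue
            # The where pattern must hold for some triple under the same bindings.
            if where is None or any(bind(where, other, env) is not None
                                    for other in triples):
                tokens.append(token)
        for token in tokens or [DEFAULT]:
            quads.append(triple + (token,))
    return quads


quads = annotate(TRIPLES, AUTHORIZATIONS)
```

The triple (Student, sc, Person) matches both A2 and A5, so it yields two quadruples (q1 and q6 on the slide), while (&a, fname, Alice) matches no authorization and receives the default token.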

  10. Computing the labels of implied triples • RDFS inference generates new knowledge (subClassOf, typeOf)
      Inference rules:
        (A1, sc, A2, l1), (A2, sc, A3, l2)    ⇒  (A1, sc, A3, l1 ⊙ l2)
        (&r1, type, A1, l1), (A1, sc, A2, l2) ⇒  (&r1, type, A2, l1 ⊙ l2)
      Explicit RDF quadruples (s, p, o, l):
        q1: Student  sc    Person   l2
        q2: Person   sc    Agent    l2
        q3: &a       type  Student  l3
        q5: Agent    type  class    l4
        q6: Student  sc    Person   l5
      Implied RDF quadruples (s, p, o, l):
        q8:  Student  sc    Agent   l2 ⊙ l2
        q9:  Student  sc    Agent   l5 ⊙ l2
        q10: &a       type  Person  l3 ⊙ l2
        q11: &a       type  Agent   (l3 ⊙ l2) ⊙ l2
        q12: &a       type  Agent   (l5 ⊙ l2) ⊙ l2
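The two inference rules can be sketched as a small fixpoint computation in Python. This is a hedged illustration: the tuple encoding `("entail", l1, l2)` for the abstract expression l1 ⊙ l2 and the helper names are assumptions, and the loop assumes an acyclic class hierarchy so that it terminates. Because it applies both rules exhaustively, it can derive more quadruples than the slide lists.

```python
EXPLICIT = [
    ("Student", "sc", "Person", "l2"),   # q1
    ("Person", "sc", "Agent", "l2"),     # q2
    ("&a", "type", "Student", "l3"),     # q3
    ("Agent", "type", "class", "l4"),    # q5
    ("Student", "sc", "Person", "l5"),   # q6
]


def entail(l1, l2):
    """Abstract entailment operator: keep both labels unevaluated."""
    return ("entail", l1, l2)


def infer(quads):
    """Apply the subClassOf and typeOf rules until no new quadruple appears."""
    known = set(quads)
    while True:
        new = set()
        for (s1, p1, o1, l1) in known:
            if p1 not in ("sc", "type"):
                continue
            for (s2, p2, o2, l2) in known:
                # Both rules chain a quadruple with an sc quadruple on o1 == s2.
                if p2 == "sc" and s2 == o1:
                    quad = (s1, p1, o2, entail(l1, l2))
                    if quad not in known:
                        new.add(quad)
        if not new:
            return known - set(quads)
        known |= new


def show(label):
    """Render an abstract label expression with the ⊙ symbol."""
    if isinstance(label, str):
        return label
    _, left, right = label
    parts = [show(x) if isinstance(x, str) else f"({show(x)})"
             for x in (left, right)]
    return " ⊙ ".join(parts)


implied = infer(EXPLICIT)
```

Calling `show` on the label of the derived quadruple for (&a, type, Agent) that goes through (&a, type, Person) renders it in the slide's notation, `(l3 ⊙ l2) ⊙ l2`.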

  11. Computing the labels of implied triples (continued) • Same inference rules and RDF quadruples as on slide 10

  12. Computing the propagated labels • Propagating labels along the RDFS hierarchies: no new knowledge is created
      Propagation rule (prop(l) is the propagation operator applied to label l):
        (A1, type, class, l1), (A2, type, A1, l2) ⇒ (A2, type, A1, prop(l1))
      Explicit RDF quadruples (s, p, o, l):
        q1: Student  sc    Person   l2
        q2: Person   sc    Agent    l2
        q3: &a       type  Student  l3
        q5: Agent    type  class    l4
        q6: Student  sc    Person   l5
      Implied and propagated RDF quadruples (s, p, o, l):
        q8:  Student  sc    Agent   l2 ⊙ l2
        q9:  Student  sc    Agent   l5 ⊙ l2
        q10: &a       type  Person  l3 ⊙ l2
        q11: &a       type  Agent   (l3 ⊙ l2) ⊙ l2
        q12: &a       type  Agent   (l5 ⊙ l2) ⊙ l2
        q13: &a       type  Agent   prop(l4)
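A minimal Python sketch of the propagation rule follows. The encoding `("prop", l)` for the propagation operator applied to label l, the function name `propagate`, and the placeholder label `"l_any"` are assumptions for illustration.

```python
def propagate(quads):
    """For each class A1 labeled l1, give every instance of A1 the label prop(l1)."""
    class_labels = [(s, l) for (s, p, o, l) in quads
                    if p == "type" and o == "class"]
    propagated = set()
    for cls, l1 in class_labels:
        for (s, p, o, _l2) in quads:
            # (A2, type, A1, l2) yields (A2, type, A1, prop(l1)); l2 is not used.
            if p == "type" and o == cls:
                propagated.add((s, p, o, ("prop", l1)))
    return propagated


QUADS = [
    ("Agent", "type", "class", "l4"),    # q5: class quadruple carrying l4
    ("&a", "type", "Agent", "l_any"),    # an instance-of quadruple for &a
    ("&a", "type", "Student", "l3"),     # Student has no class quadruple here
]

propagated = propagate(QUADS)
```

Only the triple (&a, type, Agent) receives a propagated quadruple, with the class label l4 wrapped in the propagation operator, matching q13 on the slide. No new triple is created; the same (s, p, o) gets an additional label.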

  13. From Abstract Models to Concrete Policies • Concrete Policies implement Abstract Model: • Set of Concrete Tokens • Mapping from abstract to concrete tokens • Set of concrete operators that implement the abstract ones • Conflict resolution operator to resolve cases where the same triple is assigned different labels • Access Function to decide when a triple is accessible

  14. Concrete Policy C1
      • Concrete tokens: LP = {true, false}
      • Concrete operators:
        • Entailment operator ⊙:
            l1 ⊙ l2 = l1 ∧ l2        if l1 and l2 are different from the default token
            l1 ⊙ l2 = l1             if l2 is the default token
            l1 ⊙ l2 = default token  if l1 and l2 are both equal to the default token
        • Propagation operator: identity
        • Conflict resolution operator over a set S of labels:
            false          if the value false is in S
            true           if false does not belong to S
            default token  otherwise
      • Access function: triples with label true are accessible; all others are inaccessible

  15. Concrete Policy C1
      Mapping: l2 → true, l3 → true, l4 → true, l5 → false
      Quadruples (s, p, o, abstract label, concrete label):
        q1:  Student  sc    Person   l2        true
        q2:  Person   sc    Agent    l2        true
        q3:  &a       type  Student  l3        true
        q5:  Agent    type  class    l4        true
        q6:  Student  sc    Person   l5        false
        q8:  Student  sc    Agent    l2 ⊙ l2   true
        q9:  Student  sc    Agent    l5 ⊙ l2   false
        q13: &a       type  Agent    l4        true
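Concrete policy C1 and its mapping can be sketched in Python as below. Several points are assumptions rather than part of the slides: `None` stands in for the default token, abstract expressions are encoded as `("entail", a, b)` and `("prop", a)`, the case where only l1 is the default token is handled symmetrically, and `resolve` returns true only when true actually occurs in S.

```python
DEFAULT = None  # stand-in for the default access token


def entail(l1, l2):
    """Concrete entailment operator of C1: conjunction, ignoring default tokens."""
    if l1 is DEFAULT and l2 is DEFAULT:
        return DEFAULT
    if l2 is DEFAULT:
        return l1
    if l1 is DEFAULT:
        return l2  # assumption: this case is not stated on the slide
    return l1 and l2


def propagate(label):
    """Concrete propagation operator of C1: identity."""
    return label


def resolve(labels):
    """Conflict resolution: false wins, then true, otherwise default."""
    if False in labels:
        return False
    if True in labels:
        return True
    return DEFAULT


def accessible(label):
    """Access function: only the label true grants access."""
    return label is True


def evaluate(expr, mapping):
    """Evaluate an abstract label expression under a token mapping."""
    if isinstance(expr, str):
        return mapping.get(expr, DEFAULT)
    op, *args = expr
    values = [evaluate(arg, mapping) for arg in args]
    return entail(*values) if op == "entail" else propagate(values[0])


MAPPING = {"l2": True, "l3": True, "l4": True, "l5": False}
ABSTRACT = {
    "q1": "l2", "q2": "l2", "q3": "l3", "q5": "l4", "q6": "l5",
    "q8": ("entail", "l2", "l2"),
    "q9": ("entail", "l5", "l2"),
    "q13": ("prop", "l4"),
}
concrete = {qid: evaluate(expr, MAPPING) for qid, expr in ABSTRACT.items()}

# q8 and q9 label the same triple (Student, sc, Agent), so they are resolved.
student_agent = resolve([concrete["q8"], concrete["q9"]])
```

Under this sketch the triple (Student, sc, Agent) resolves to false and is therefore inaccessible, because one of its two derivations uses the token l5.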

  16. Evaluation • Implementation: • Relational Schema: • Authorization(auth_id, query, access_token) • ExplicitTriples(qid, s, p, o, auth_id): stores explicit triples • InferopTriples(qid, s, p, o): stores implicit quadruples • PropOp(qid, s, p, o, l): stores the propagated quadruples • LabelStore(qid, qid_uses): stores, for each implied and propagated quadruple, the explicit quadruples that are used in its implication • Query Engine: CWI's MonetDB Open Source Column Store • Stored Procedures are used for the computation of the access labels
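The relational schema above can be tried out with a short Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in for MonetDB; the column types, the sample rows, and the helper `tokens_used` are assumptions for illustration and not part of the described implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Authorization" (auth_id TEXT PRIMARY KEY, query TEXT, access_token TEXT);
CREATE TABLE ExplicitTriples (qid TEXT PRIMARY KEY, s TEXT, p TEXT, o TEXT, auth_id TEXT);
CREATE TABLE InferopTriples (qid TEXT PRIMARY KEY, s TEXT, p TEXT, o TEXT);
CREATE TABLE PropOp (qid TEXT PRIMARY KEY, s TEXT, p TEXT, o TEXT, l TEXT);
CREATE TABLE LabelStore (qid TEXT, qid_uses TEXT);
""")

# Sample rows taken from the running example (A2, q1, q2 and the implied q8).
conn.execute('INSERT INTO "Authorization" VALUES (?, ?, ?)',
             ("A2", "construct {?x sc ?y}", "l2"))
conn.executemany("INSERT INTO ExplicitTriples VALUES (?, ?, ?, ?, ?)", [
    ("q1", "Student", "sc", "Person", "A2"),
    ("q2", "Person", "sc", "Agent", "A2"),
])
conn.execute("INSERT INTO InferopTriples VALUES (?, ?, ?, ?)",
             ("q8", "Student", "sc", "Agent"))
conn.executemany("INSERT INTO LabelStore VALUES (?, ?)",
                 [("q8", "q1"), ("q8", "q2")])


def tokens_used(conn, qid):
    """Abstract tokens of the explicit quadruples used to derive quadruple qid."""
    rows = conn.execute("""
        SELECT a.access_token
        FROM LabelStore ls
        JOIN ExplicitTriples e ON e.qid = ls.qid_uses
        JOIN "Authorization" a ON a.auth_id = e.auth_id
        WHERE ls.qid = ?
        ORDER BY ls.qid_uses
    """, (qid,)).fetchall()
    return [row[0] for row in rows]


q8_tokens = tokens_used(conn, "q8")
```

Storing only the derivation links in LabelStore means the abstract expression for an implied quadruple can be rebuilt from the explicit quadruples when a concrete policy is applied.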

  17. Benchmark • Datasets: • Synthetic Datasets: produced by PowerGen [1] • Real Datasets: CIDOC, GO, GeoSpecies and GADM Datasets • Authorizations: • 15 authorizations • Concrete Policy: • Access Tokens: {low, medium, high} • low < medium < high • Mapping: • the same triple is assigned different labels and the same label is assigned to different triples • 5 authorizations assign the low label to 35% of the triples • 5 authorizations assign the medium label to 55% of the triples • 5 authorizations assign the high label to 45% of the triples • Operators: • Entailment, Conflict Resolution: min(), Propagation: identity
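The benchmark's concrete policy can be sketched in Python as follows. The `IntEnum` encoding, the expression encoding `("entail", a, b)` / `("prop", a)`, and the sample mapping of abstract tokens to levels are hypothetical; the sketch also assumes every abstract token is mapped, since the slide does not say how the default token is treated under min().

```python
from enum import IntEnum


class Token(IntEnum):
    """Ordered concrete access tokens: low < medium < high."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def evaluate(expr, mapping):
    """Evaluate an abstract expression: entailment is min(), propagation is identity."""
    if isinstance(expr, str):
        return mapping[expr]
    op, *args = expr
    values = [evaluate(arg, mapping) for arg in args]
    if op == "entail":
        return min(values)
    if op == "prop":
        return values[0]
    raise ValueError(f"unknown operator: {op}")


def resolve(labels):
    """Conflict resolution: min() over all labels assigned to the same triple."""
    return min(labels)


# Hypothetical mapping from abstract tokens to concrete tokens.
mapping = {"l2": Token.HIGH, "l3": Token.MEDIUM, "l5": Token.LOW}

label_q10 = evaluate(("entail", "l3", "l2"), mapping)
label_student_agent = resolve([
    evaluate(("entail", "l2", "l2"), mapping),
    evaluate(("entail", "l5", "l2"), mapping),
])
```

Only `mapping` and the two operator definitions change between concrete policies; the abstract expressions stay the same.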

  18. Experiments • Experiment 1: Annotation Time • the time required to compute the implied quadruples and the propagated labels • Experiment 2: Evaluation Time • the time needed to compute, for a concrete policy, the concrete access label of a percentage of the RDF triples

  19. Real Datasets • Characteristics • Experiment 1: Annotation Time

  20. Real Datasets: Experiment 2

  21. Synthetic Datasets: Experiment 1 • Annotation time does not depend on the number of explicit triples (i.e., the initial triples in the dataset) but on the number of implied quadruples produced

  22. Synthetic Datasets: Experiment 2 • Characteristics • Observations • Dataset D2 is more complex than D1 (due to its larger depth), leading to more complex abstract expressions

  23. Pros & Cons of Abstract Access Control Models • Pros: • The same application can experiment with different concrete policies over the same dataset • e.g., liberal vs conservative policies for different classes of users • Different applications can experiment with different concrete policies for the same data • In the case of updates there is no need to recompute the access labels of the inferred triples • Cons: • overhead in the required storage space • expressions that describe how a label is computed can become quite complex depending on the structure of the dataset

  24. WP3: Work Plan View (timeline in months: 0, 6, 12, 18, 24, 30, 36, 42) • Task 3.1 Provenance Management (FORTH) • D 3.2 Provenance management and propagation through SPARQL query and update languages • Task 3.2 Privacy, DRM and Access Control (FORTH) • D 3.1 Access control specification language, reasoning and enforcement mechanisms • D 3.3 Access control system and privacy-aware language • Task 3.3 Trust management (EPFL) • D 3.4 Trust management and inference system

  25. BackUp Slides
