
Real Security for Real Provenance is Really Hard




  1. Real Security for Real Provenance is Really Hard Dr. Adriane Chapman

  2. Data Security 101 • Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC) • Access to the data is specified by role/privilege predicates • Access is administered by data owners • If data cannot be released, some systems allow surrogates to stand in for the unreleasable information • e.g. a Public version of a Classified document • Cryptographic techniques (secure hash functions, etc.) maintain data integrity

  3. Data Security 101 • Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC) • Access to the data is specified by role/privilege predicates • Access is administered by data owners • If data cannot be released, some systems allow surrogates to stand in for the unreleasable information • e.g. a Public version of a Classified document • To heighten protection, data can be cryptographically protected (e.g. hashed or signed) so that tampering can be detected • Reference: Scalable Access Controls for Lineage. Arnon Rosenthal, Len Seligman, Adriane Chapman and Barbara Blaustein. Theory and Practice of Provenance, 2009.

  4. Who are the users? The Public. Lineage query result: [figure]

  5. Who are the users? The Hospital. Lineage query result: [figure]

  6. Who are the users? Congress. Lineage query result: [figure]

  7. Data Security 101 • Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC) • Access to the data is specified by role/privilege predicates • Access is administered by data owners • If data cannot be released, some systems allow surrogates to stand in for the unreleasable information • e.g. a Public version of a Classified document • Cryptographic techniques (secure hash functions, etc.) maintain data integrity • Reference: Scalable Access Controls for Lineage. Arnon Rosenthal, Len Seligman, Adriane Chapman and Barbara Blaustein. Theory and Practice of Provenance, 2009.

  8. How do you specify access? • RBAC: model the world with roles • Form aggregates of users (groups) and privileges (roles) • Admins authorize groups to use roles • Not expressive enough • Only the user (group) is tested • A policy such as allowing hospitals access to more information when threats are high cannot be expressed • Multi-factor policies • Every policy will create an explosion in the number of roles, e.g., • Group 1: Director of Surgery at Hospital 123 where status=“emergency” • Group 2: Director of Surgery at Hospital 123 where status=“normal” • ABAC: predicates on attributes are used to describe access • Instead of explicitly assigning users, decide based on user (U), resource (R), and environment (E) attributes.
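The contrast on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the class names, attribute names, and the `rbac_allows` / `abac_allows` functions are hypothetical and do not come from the presentation or from any particular access-control product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    title: str
    hospital: str


@dataclass(frozen=True)
class Environment:
    status: str  # hypothetical values: "normal" or "emergency"


def rbac_allows(user_groups: set, authorized_groups: set) -> bool:
    """RBAC: only group membership is tested; the environment is invisible."""
    return bool(user_groups & authorized_groups)


def abac_allows(user: User, resource: dict, env: Environment) -> bool:
    """ABAC: one predicate over user (U), resource (R), environment (E)."""
    return (
        user.title == "Director of Surgery"
        and user.hospital == resource["hospital"]
        and env.status == "emergency"
    )
```

With RBAC, expressing the same rule would require one group per combination of title, hospital, and status, which is the role explosion the slide describes. With the ABAC predicate, the status check is a single clause.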

  9. Even Classic ABAC doesn’t cut it • Animal_Testing_Access(user, resource, environment) ≔ [User.Division = Intelligence ∧ User.AssignedProject.Type = Epidemiology ∧ Request.SourceDomain is in {.gov, .mil} ∧ Experiment.ReleaseMarking = Intel ∧ (ExperSubject.Type = inanimate ∨ (ExperSubject.Type = animal ∧ experimenterName.pseudonym = true) ∨ (ExperSubject.Type = human ∧ releaseOnFile(ExperSubject))) ∧ [Request.HasApproval.Level ≥ 4 ∨ (Request.HasApproval.Level ≥ 2 ∧ threat.Status = Red)] ∧ [ … • Congress just passed a new Disclosure Act. Which parts need to change to address this concern?

  10. The Solution – Extend ABAC • Stakeholder Concerns traceable and editable • HIPAA wants to protect patient privacy, how does the role “doctor” protect patient privacy? • Named Concerns • Link directly to access predicates that embody these concerns
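One way to picture named concerns is as a dictionary that maps each concern's name to the predicate that embodies it. The sketch below is an assumption-laden illustration, not the framework from the paper: the concern names, the request fields, and the `access_allowed` / `failed_concerns` helpers are all hypothetical, and the predicates only loosely echo clauses from the previous slide.

```python
from typing import Callable, Dict, List

Request = dict
Concern = Callable[[Request], bool]

# Hypothetical named concerns; each stakeholder concern owns one predicate.
concerns: Dict[str, Concern] = {
    "source_domain": lambda r: r["source_domain"] in {".gov", ".mil"},
    "approval": lambda r: (
        r["approval_level"] >= 4
        or (r["approval_level"] >= 2 and r["threat_status"] == "Red")
    ),
}


def access_allowed(request: Request, named: Dict[str, Concern]) -> bool:
    """Access is granted only if every named concern is satisfied."""
    return all(predicate(request) for predicate in named.values())


def failed_concerns(request: Request, named: Dict[str, Concern]) -> List[str]:
    """Trace a denial back to the concerns that caused it."""
    return [name for name, predicate in named.items() if not predicate(request)]
```

When a new law changes one concern, only the entry under that name needs editing, and a denial can be reported in terms of the concern that produced it.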

  11. Data Security 101 • Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC) • Access to the data is specified by role/privilege predicates • Access is administered by data owners • If data cannot be released, some systems allow surrogates to stand in for the unreleasable information • e.g. a Public version of a Classified document • Cryptographic techniques (secure hash functions, etc.) maintain data integrity • Reference: Scalable Access Controls for Lineage. Arnon Rosenthal, Len Seligman, Adriane Chapman and Barbara Blaustein. Theory and Practice of Provenance, 2009.

  12. Who decides what to secure? • “Tell no one I ran this code.” • “Tell everyone my code was used!”

  13. The Solution • Stakeholders can have differing opinions! • Represent these explicitly • Represent their combination • Let them be edited independently • Make Administration Manageable! • Sharing the Power • Unacceptable approaches: • A single administrator, or a global conflict-resolution rule • A totally separate formalism for conflict resolution • Share power by attribute ownership, derivation • Combine as derived attribute; delegate right to define derivation rule (see paper)

  14. Our Framework • Stakeholder: Analyst Smith: “Tell no one I ran this code.” • Stakeholder: Prof Jones: “Tell everyone my code was used!” • Who says how this should be combined? • Combiner: VETO
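A VETO combiner can be sketched as a function over the stakeholders' individual decisions. This is a minimal illustration under assumptions: the function names, the policy signatures, and the alternative `any_combiner` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Iterable

# A stakeholder policy maps a requester to "may this fact be released?".
Policy = Callable[[str], bool]


def veto_combiner(decisions: Iterable[bool]) -> bool:
    """VETO: release only if no stakeholder objects."""
    return all(decisions)


def any_combiner(decisions: Iterable[bool]) -> bool:
    """Hypothetical alternative: release if at least one stakeholder agrees."""
    return any(decisions)


stakeholder_policies: Dict[str, Policy] = {
    "Analyst Smith": lambda requester: False,  # "Tell no one I ran this code."
    "Prof Jones": lambda requester: True,      # "Tell everyone my code was used!"
}


def may_release(requester: str,
                policies: Dict[str, Policy],
                combiner: Callable[[Iterable[bool]], bool] = veto_combiner) -> bool:
    """Each policy is evaluated independently, then the combiner decides."""
    return combiner([policy(requester) for policy in policies.values()])
```

Keeping the combiner as a separate, replaceable function mirrors the slide's question of who gets to say how the opinions are combined: the stakeholder policies can be edited independently of the combination rule.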

  15. Data Security 101 • Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC) • Access to the data is specified by role/privilege predicates • Access is administered by data owners • If data cannot be released, some systems allow surrogates to stand in for the unreleasable information • e.g. a Public version of a Classified document • Cryptographic techniques (secure hash functions, etc.) maintain data integrity • Reference: Surrogate Parenthood: Protected and Informative Lineage Graphs. Barbara Blaustein, Adriane Chapman, Arnon Rosenthal, Len Seligman, M. David Allen, Michael Morse. In preparation.

  16. Provenance Surrogates • Replace nodes with less sensitive information • Obscure edge information • [Figure: lineage graph as seen by The Public. Node labels: CDC Historical Disease Data; EPO Epidemic Warning Reports; Hospital Admissions Data; Invoker: Analyst Smith; Author: Prof. Jones; TrakTek, Inc. Disease Spread Monitor; EPO Epidemic Forecast; Epidemic Projector, v3; Author: Agent 009; Animal Tests; Bio-Threat Intelligence; Pharmacy Prescription Data; Laboratory Results]
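The two surrogate operations can be sketched on a toy lineage graph. Everything here is an assumption made for illustration: the `Node` class, the `public_view` function, the surrogate label text, and the edges, since the structure of the slide's figure is not recoverable from the transcript. Dropping edges that touch a hidden node is one simple way to obscure edge information, not necessarily the approach taken in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    node_id: str
    label: str
    public: bool
    surrogate_label: Optional[str] = None  # less sensitive stand-in, if any


def public_view(nodes: List[Node],
                edges: List[Tuple[str, str]]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Return the labels and edges that a public user would see."""
    visible: Dict[str, str] = {}
    for node in nodes:
        if node.public:
            visible[node.node_id] = node.label
        elif node.surrogate_label is not None:
            visible[node.node_id] = node.surrogate_label  # replace the node
        # Nodes with no surrogate are omitted entirely.
    # Keep only edges whose endpoints both survive.
    kept = [(src, dst) for src, dst in edges if src in visible and dst in visible]
    return visible, kept


# Hypothetical graph; labels borrowed from the slide, edges invented.
nodes = [
    Node("n1", "Hospital Admissions Data", public=True),
    Node("n2", "Author: Agent 009", public=False, surrogate_label="Author: withheld"),
    Node("n3", "Bio-Threat Intelligence", public=False),
    Node("n4", "EPO Epidemic Forecast", public=True),
]
edges = [("n1", "n4"), ("n2", "n4"), ("n3", "n4")]
labels, visible_edges = public_view(nodes, edges)
```

Note that the surviving graph can still leak information, which is the inference problem raised on the next slide.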

  17. But not straightforward for Provenance – Inference Threats • Inferring edges from rather strong clues, such as • Parameter labels (role labels) • Results of non-graph queries • Inferring node information via edges from other nodes • e.g., ResultSize(N3) may reveal ResultSize(N1) • TimeReceived may reveal TimeProcessed at predecessor • Policy specifies which surrogates are releasable, i.e., what threats are “acceptable” (see the “Who owns it” point).

  18. Data Security 101 • Users are given roles (RBAC) or satisfy a set of privilege predicates (ABAC) • Access to the data is specified by role/privilege predicates • Access is administered by data owners • If data cannot be released, some systems allow surrogates to stand in for the unreleasable information • e.g. a Public version of a Classified document • Cryptographic techniques (secure hash functions, etc.) maintain data integrity • Reference: Do you know where your data’s been? Fine-Grained Tamper-Evident Data Provenance. Jing Zhang, Adriane Chapman and Kristen LeFevre. In submission.

  19. What does provenance “integrity” mean? • Data “integrity” – cryptographic hashing (e.g. SHA-1, MD5, etc.) • w.r.t. query answers – provide enough extra information to prove that query results are correct (usually Merkle hash trees) • Provenance “integrity” • Allow users of the data to verify that the provenance has not been tampered with • AND that it accurately represents the state of the data
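For readers unfamiliar with Merkle hash trees, the following is a minimal sketch of computing a Merkle root over a list of records. It is a generic illustration, not the scheme from the paper. It uses SHA-256 from Python's standard `hashlib` rather than the SHA-1 or MD5 named on the slide, and the prefix bytes and the duplicate-last-node rule for odd levels are choices made for this sketch.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Hash leaves pairwise, level by level, until one root digest remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    # Distinct prefixes keep leaf hashes and interior hashes from colliding.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

A data owner who publishes the root commits to every record at once: changing any single record changes the root, so a verifier who trusts the root can detect tampering.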

  20. Why is this difficult? • Objects are compound • Patient records contain several attributes which were obtained via different methods and have different provenance • Non-linear sequence of information • Provenance is a DAG not a chain • [Figure: provenance graph. Labels: Drug Efficacy Report; White Blood Cell Count; Dataset 1; Collected by Good Stewards Labs; Dataset 2; Patient Ages And Weights; TrustUsRx Aggregator; Collected by PCP Paul; Dataset 3; Interim Dataset; Endocrine Activity; Pamela Updated 1 patient record; Collected by Perfect Saints Clinic]

  21. Solution Sketch • A participant may alter data via insert, delete, update and aggregate • A provenance record consists of a sequenceID, participant, and the input/output values of the object • Developed an extended signature scheme • Create a checksum that verifies the integrity of provenance and data
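To make the record structure concrete, here is a simplified sketch of a tamper-evident provenance chain. It is not the extended signature scheme from the paper: it uses a keyed hash (HMAC-SHA-256 from Python's standard library) as a stand-in for digital signatures, it handles only a linear chain rather than a DAG, and the field names, helper names, and per-participant keys are all assumptions.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProvenanceRecord:
    sequence_id: int
    participant: str
    operation: str      # e.g. "insert", "delete", "update", "aggregate"
    input_value: str
    output_value: str
    checksum: str       # covers this record and the previous checksum


def _checksum(key: bytes, prev: str, seq: int, participant: str,
              operation: str, inp: str, out: str) -> str:
    payload = json.dumps([prev, seq, participant, operation, inp, out]).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_record(chain: List[ProvenanceRecord], keys: Dict[str, bytes],
                  participant: str, operation: str,
                  inp: str, out: str) -> List[ProvenanceRecord]:
    """Return a new chain with one more record linked to its predecessor."""
    prev = chain[-1].checksum if chain else ""
    seq = len(chain) + 1
    checksum = _checksum(keys[participant], prev, seq, participant, operation, inp, out)
    return chain + [ProvenanceRecord(seq, participant, operation, inp, out, checksum)]


def verify_chain(chain: List[ProvenanceRecord], keys: Dict[str, bytes]) -> bool:
    """Recompute every checksum in order; any edit or gap breaks the chain."""
    prev = ""
    for expected_seq, rec in enumerate(chain, start=1):
        key = keys.get(rec.participant)
        if key is None or rec.sequence_id != expected_seq:
            return False
        expected = _checksum(key, prev, rec.sequence_id, rec.participant,
                             rec.operation, rec.input_value, rec.output_value)
        if not hmac.compare_digest(expected, rec.checksum):
            return False
        prev = rec.checksum
    return True
```

Because each checksum covers both the data values and the preceding checksum, altering a value, reordering records, or dropping the first record causes verification to fail. A real deployment would use public-key signatures so that verifiers do not need the participants' secret keys.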

  22. Conclusions • Provenance is a DAG and a node. • There are unique security inference problems! • Who gets to control what is released is not straightforward • Using standard access control methods doesn’t work • Provenance integrity is necessary to assure the veracity of the information

  23. Acknowledgements • MITRE: Arnon Rosenthal, Barbara Blaustein, Len Seligman, David Allen, Michael Morse • University of Michigan: Kristen LeFevre, H.V. Jagadish, Jing Zhang
