
Overview of Today’s Talks

Overview of Today’s Talks: Provenance Data Structures; Recording and Querying Provenance; Break (30 minutes); Distribution and Scalability; Security; Methodology. Distribution and Scalability, by Paul Groth (pg03r@ecs.soton.ac.uk). Applications are distributed. Applications require scalability.


Presentation Transcript


  1. Overview of Today’s Talks • Provenance Data Structures • Recording and Querying Provenance • Break (30 minutes) • Distribution and Scalability • Security • Methodology

  2. Distribution and Scalability by Paul Groth (pg03r@ecs.soton.ac.uk)

  3. Applications are distributed

  4. Applications require scalability • Applications may have millions of interactions • These interactions may be simultaneous • Applications may have large amounts of data

  5. These Issues & Provenance • Because applications are distributed and need scalability, a provenance system must support these requirements • Provenance Systems have their own requirements in these areas • Large numbers of p-assertions. • Scalability in terms of querying and recording

  6. Provenance Store Distribution • Recording Patterns (bandwidth, access control) • Storage • Legal • Multiple physical Provenance Stores per site [Diagram: six provenance stores, each labelled PS]

  7. Logical Distribution • Provenance Stores are both physical and logical entities • Single physical store could have multiple logical stores • Logical Provenance Stores provide bounds to process documentation • Could be organisational, experimental, or individual

  8. Logical Provenance Stores [Diagram: one Physical Provenance Store containing the logical stores Hospital Payroll Store, Paul’s Store, Donor Data Collector Store, and SurgeryWard Store]

  9. Provenance Store Usage • Combinations of logical and physical Provenance Stores can be adopted depending on the application’s needs • In terms of: • Scalability • Regulatory / Legal • Information partitioning

  10. Distributed Query? • Process Documentation is in multiple stores • How do we get the provenance of a data item in this case? • Solution: connections embedded in process documentation • Shared Context • Links

  11. Shared Context revisited [Diagram: the Querier sends “Query PAs with IK 1” to both PS 1 and PS 2; each store holds a p-assertion with IK 1]
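
The shared-context lookup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation presented in the talk: `PAssertion`, `ProvenanceStore`, and `query_shared_context` are hypothetical names introduced here, and the interaction key (IK) is modelled as a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAssertion:
    interaction_key: str  # shared context, e.g. "IK1"
    view_kind: str        # "sender" or "receiver"
    content: str


class ProvenanceStore:
    """Hypothetical in-memory provenance store."""

    def __init__(self, name):
        self.name = name
        self._assertions = []

    def record(self, p_assertion):
        self._assertions.append(p_assertion)

    def query_by_interaction_key(self, interaction_key):
        return [pa for pa in self._assertions
                if pa.interaction_key == interaction_key]


def query_shared_context(stores, interaction_key):
    """Ask every store for p-assertions that carry the same interaction key."""
    results = []
    for store in stores:
        results.extend(store.query_by_interaction_key(interaction_key))
    return results
```

Because both views of an interaction carry the same key, the querier needs no other coordination between the stores to reassemble them.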

  12. Links • Links are unidirectional pointers to provenance stores • Links connect provenance stores • Links are recorded by actors as part of p-assertions • Links are transferred between actors using interaction contexts • There are two kinds of links • View Links • Object Links

  13. Views revisited • A view is the set of assertions made by one actor about one interaction • A view contains: • An actor identity • A set of p-assertions • A view is one of two kinds: sender or receiver [Diagram labels: User Interface, Donor Data Collector]

  14. View Links • A view link points to the provenance store containing the opposite view of the interaction • View Links are transferred in p-headers or interaction contexts [Diagram labels: Sender, Receiver, PS 1, PS 2; Inform of PS 1 usage; Inform of PS 2 usage; Record Link to PS 1; Record Link to PS 2; Will Record P-Assertions]
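
A rough sketch of the view-link exchange follows. It assumes each actor simply tells the other which store it uses and then records a link to the opposite store; `ViewStore` and `exchange_view_links` are hypothetical names, and the transfer through a p-header or interaction context is reduced to a function call.

```python
from dataclasses import dataclass, field


@dataclass
class ViewStore:
    """Hypothetical provenance store that keeps view links."""
    address: str
    # interaction key -> address of the store holding the opposite view
    view_links: dict = field(default_factory=dict)


def exchange_view_links(interaction_key, sender_store, receiver_store):
    """Record, on each side, a link to the store holding the opposite view."""
    sender_store.view_links[interaction_key] = receiver_store.address
    receiver_store.view_links[interaction_key] = sender_store.address
```

Each link remains unidirectional; connectivity in both directions comes from recording one link on each side.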

  15. Object Links • A pointer to the provenance store where the object of a relationship is stored • This allows for distributed provenance queries [Diagram: PS 1, PS 2, PS 3]

  16. Implementing Distributed Queries • Querying actor centric (thick client) • The querying actor follows links • Provenance Store centric (thin client) • Provenance Stores follow links

  17. Querying Actor Centric [Diagram: the Querying Actor issues a query to PS 1, receives the result, processes the results for links, then issues a query to PS 2 and receives the result]
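
The querying-actor-centric (thick client) approach can be sketched as a breadth-first traversal driven by the client. This is only an illustration: the `stores` dictionary layout and the function name `query_following_links` are assumptions made here, with each store answering a query with a pair of (results, links to other stores).

```python
from collections import deque


def query_following_links(stores, start, key):
    """Client-side traversal: query a store, then follow the links it returns.

    stores maps a store name to a dict of key -> (results, linked store names).
    """
    seen = {start}
    pending = deque([start])
    collected = []
    while pending:
        name = pending.popleft()
        results, links = stores[name].get(key, ([], []))
        collected.extend(results)
        for target in links:
            if target not in seen:  # avoid querying a store twice
                seen.add(target)
                pending.append(target)
    return collected
```

The `seen` set matters because links may form cycles, for example when two stores hold opposite views of the same interaction.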

  18. Provenance Store centric [Diagram: the Querying Actor issues a query to PS 1 and receives results; PS 1 processes its internal results for links, issues queries to PS 2 and PS 3, receives their results, and collates the results]
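
In the provenance-store-centric (thin client) approach, the store itself follows the links and collates the answers. The sketch below assumes stores can call one another directly and pass along a set of already visited stores; `LinkedStore` and its `query` signature are hypothetical.

```python
class LinkedStore:
    """Hypothetical store that forwards queries along its own links."""

    def __init__(self, name):
        self.name = name
        self.assertions = {}  # key -> list of local results
        self.links = []       # other LinkedStore objects this store points to

    def query(self, key, visited=None):
        visited = set() if visited is None else visited
        if self.name in visited:  # already answered during this query
            return []
        visited.add(self.name)
        results = list(self.assertions.get(key, []))
        for other in self.links:
            results.extend(other.query(key, visited))  # collate remote results
        return results
```

The querying actor issues a single call, `ps1.query(key)`, and receives the collated result.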

  19. Analysis of Links • Links are unidirectional, like links on the Web • This approach should be fairly scalable • It maintains the autonomy of application actors • There is no need for synchronization between actors • As on the Web, queriers must traverse the link structure to find content of interest • There are two mechanisms for implementing distributed queries using links

  20. Supporting Large Data • Depending on the size of the data involved, the provenance store may not be able to: • Store the data immediately (solution: asynchronous recording) • Store the data at all (solution: references) • References point to the data instead of containing the data itself • Support for three kinds of reference • Application • Internal • External

  21. Application References • The application already transfers references in its application messages • Nothing to do: record p-assertions as is • Inform querying actors of how to resolve these application-specific references • Example reference: http://datastore/pr#1234

  22. External References • The application transfers a large message • It stores all or part of the message in some data repository • The p-assertion holds a reference to this external data repository • The burden is placed on the data repository to maintain the data for as long as the process documentation exists

  23. External References cont. [Diagram: a Large Patient Record is held in a Data Repository; the PS records a p-assertion with DocStyle: Reference pointing to http://DataRepository/#LPR1]

  24. External References cont. <soap:envelope> <soap:header>…</soap:header> <soap:body> <echrs:store> <echrs:patientRecord> <pid>1</pid> <xray>j8ladfhaufjalkdjkfaslalkfdjaljfafjaljajfdlja adfhaldfjhaslfjdasldfjaslfj…. </xray> </echrs:patientRecord> </echrs:store> </soap:body> </soap:envelope>

  25. Styled Reference P-Assertion <ps:interactionPAssertion> <ps:localPAssertionId>1</ps:localPAssertionId> <ps:documentationStyle> http://www.pasoa.org/.../styles#Reference </ps:documentationStyle> <ps:content> <soap:envelope> <soap:header>…</soap:header> <soap:body> <echrs:store> <echrs:ref> http://DataRepository/#LPR1 </echrs:ref> </echrs:store> </soap:body> </soap:envelope> </ps:content> </ps:interactionPAssertion>
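
The external-reference idea from the preceding slides can be sketched as follows. This is an assumption-laden illustration: `DataRepository`, `make_p_assertion`, the size threshold, and the dictionary layout are all hypothetical, and the short style label `"Reference"` stands in for the documentation-style URI, which the slide elides.

```python
class DataRepository:
    """Hypothetical external repository that stores large payloads by name."""

    def __init__(self, base_url):
        self.base_url = base_url
        self._blobs = {}

    def put(self, name, data):
        """Store the data and return a reference URL for it."""
        self._blobs[name] = data
        return f"{self.base_url}/#{name}"

    def get(self, url):
        """Resolve a reference URL back to the stored data."""
        return self._blobs[url.rsplit("#", 1)[1]]


def make_p_assertion(local_id, payload, repo, name, max_inline_bytes=1024):
    """Inline small payloads; replace large ones with a styled reference."""
    if len(payload) <= max_inline_bytes:
        return {"localPAssertionId": local_id, "style": None, "content": payload}
    url = repo.put(name, payload)
    return {"localPAssertionId": local_id, "style": "Reference", "content": url}
```

A querier that encounters the `"Reference"` style knows the content is a pointer and must be fetched from the repository, which is why the repository has to keep the data for as long as the process documentation is needed.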

  26. Internal References • Same as External References • However, the reference is to data already stored inside the provenance store • This is made possible by the unique addressability of p-assertions • Useful for the case of large actor state p-assertions that are recorded several times • Example: System Configuration Information

  27. Internal References cont. [Diagram: within the PS, Actor State P-Assertions 1 to 4 each reference a single actor state p-assertion holding lots of configuration information]
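
One way to picture internal references is a store that keeps a large actor state once and lets later p-assertions point at it. The sketch below is hypothetical: `DedupStore` and its methods are invented for illustration, and detecting repeats by content hash is an assumption made here, not something the slides specify.

```python
import hashlib


class DedupStore:
    """Hypothetical store that replaces repeated actor state with references."""

    def __init__(self):
        self._records = {}    # p-assertion id -> {"content": ...} or {"ref": id}
        self._by_digest = {}  # content digest -> id of the first copy
        self._next_id = 1

    def record_actor_state(self, state):
        digest = hashlib.sha256(state.encode("utf-8")).hexdigest()
        pid = self._next_id
        self._next_id += 1
        if digest in self._by_digest:
            # Already stored: keep only a reference to the first p-assertion.
            self._records[pid] = {"ref": self._by_digest[digest]}
        else:
            self._by_digest[digest] = pid
            self._records[pid] = {"content": state}
        return pid

    def is_reference(self, pid):
        return "ref" in self._records[pid]

    def resolve(self, pid):
        record = self._records[pid]
        if "content" in record:
            return record["content"]
        return self.resolve(record["ref"])
```

This relies on the property noted on the previous slide: every p-assertion is uniquely addressable, so a reference can name its target unambiguously.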

  28. Provenance Query Results Scalability • Provenance query result sets are scalable • Return pointers to p-assertions, not the assertions themselves

  29. <ps:interactionPAssertion> <ps:localPAssertionId>1</ps:localPAssertionId> <ps:documentationStyle> http://www.pasoa.org/.../styles#Reference </ps:documentationStyle> <ps:content> <soap:envelope> <soap:header>…</soap:header> <soap:body> <echrs:store> <echrs:ref> http://DataRepository/#LPR1</echrs:ref> </echrs:store> </soap:body> </soap:envelope> </ps:content> </ps:interactionPAssertion> <ps:interactionPAssertion> <ps:localPAssertionId>2</ps:localPAssertionId> <ps:documentationStyle> http://www.pasoa.org/.../styles#Reference </ps:documentationStyle> …

  30. <psdid> <interactionKey> <sender>donerdatacollector</sender> <receiver>echr</receiver> <id>12233</id> </interactionKey> <viewkind>sender</viewkind> <localPAssertionId>1</localPAssertionId> </psdid> <psdid> <interactionKey> <sender>donerdatacollector</sender> <receiver>echr</receiver> <id>1224</id> </interactionKey> <viewkind>sender</viewkind> <localPAssertionId>5</localPAssertionId> </psdid> <psdid> …
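
Returning pointers rather than full p-assertions can be sketched as below. The field names mirror the `<psdid>` elements shown on the slide; the `PAssertionPointer` type, the dictionary-backed store, and both functions are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class PAssertionPointer(NamedTuple):
    """Pointer to one p-assertion, mirroring the <psdid> fields."""
    sender: str
    receiver: str
    interaction_id: str
    view_kind: str
    local_p_assertion_id: str


def query_pointers(store, predicate):
    """Return pointers to matching p-assertions, not their content."""
    return [ptr for ptr, content in store.items() if predicate(content)]


def dereference(store, pointer):
    """Fetch the full p-assertion only when the querier asks for it."""
    return store[pointer]
```

The result set stays small regardless of how large the matching p-assertions are, since the querier dereferences only the pointers it actually needs.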

  31. Provenance Query Results Scalability • Provenance query results are scalable • Return pointers to p-assertions, not the assertions themselves • Scoping limits provenance query results to what the querier needs

  32. Iterative Query Results • Return iterators over results from process documentation or provenance query results [Diagram: the Querying Actor issues a query to the PS and receives a Results Iterator exposing getNextRes() and getNextXRes(int x)]

  33. Iterative Query Results • Return iterators over results from process documentation or provenance query results • This functionality is planned for future implementations • The planned implementation makes use of • OGSA-DAI • WSRF
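
The iterator interface named on the previous slide (`getNextRes()`, `getNextXRes(int x)`) might look roughly like this in plain Python. This is a local sketch only: it does not use OGSA-DAI or WSRF, the snake_case method names are adaptations, and returning `None` at exhaustion is an assumption.

```python
import itertools


class ResultsIterator:
    """Hypothetical iterator handed back in place of a full result set."""

    def __init__(self, results):
        self._it = iter(results)

    def get_next_res(self):
        """Return the next result, or None when the results are exhausted."""
        return next(self._it, None)

    def get_next_x_res(self, x):
        """Return up to x further results as a list (fewer near the end)."""
        return list(itertools.islice(self._it, x))
```

The querier pulls results in batches of its own choosing, so neither side has to hold the whole result set at once.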

  34. Summary • Discussed both Distribution and Scalability • Introduced links for connecting distributed provenance stores • Two ways of implementing distributed queries • Large data support through asynchronous recording and references • Query Scalability • Provenance Query Results • Iterative Query Results

  35. Questions? Paul Groth pg03r@ecs.soton.ac.uk
