Overview of Today’s Talks
• Provenance Data Structures
• Recording and Querying Provenance
• Break (30 minutes)
• Distribution and Scalability
• Security
• Methodology
Distribution and Scalability by Paul Groth (pg03r@ecs.soton.ac.uk)
Applications require scalability
• Applications may have millions of interactions
• These interactions may be simultaneous
• Applications may have large amounts of data
These Issues & Provenance
• Because applications are distributed and need scalability, a provenance system must support these requirements
• Provenance systems also have their own requirements in these areas:
  - Large numbers of p-assertions
  - Scalability of both recording and querying
Provenance Store Distribution
• Recording patterns
  - Bandwidth
  - Access control
• Storage
• Legal
• Multiple physical Provenance Stores per site
[Figure: several physical provenance stores (PS) distributed across sites]
Logical Distribution
• Provenance Stores are both physical and logical entities
• A single physical store can host multiple logical stores
• Logical Provenance Stores provide bounds for process documentation
• These bounds could be organisational, experimental, or individual
[Figure: Logical Provenance Stores. A single physical Provenance Store hosts several logical stores: Hospital Payroll Store, Paul’s Store, Donor Data Collector Store, and Surgery Ward Store.]
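The idea that one physical store hosts several logical stores can be sketched as follows. This is a minimal Python illustration, assuming a logical store is simply a named partition inside one physical store; `PhysicalProvenanceStore`, its methods, and the host name are hypothetical and not taken from any provenance store implementation.

```python
from collections import defaultdict
from typing import Dict, List


class PhysicalProvenanceStore:
    """Hypothetical physical store hosting several logical stores.

    Each logical store is a named partition, so process documentation
    recorded in one logical store is kept apart from the others.
    """

    def __init__(self, host: str) -> None:
        self.host = host
        self._logical: Dict[str, List[dict]] = defaultdict(list)

    def record(self, logical_store: str, p_assertion: dict) -> None:
        # Record a p-assertion into one named logical store.
        self._logical[logical_store].append(p_assertion)

    def query(self, logical_store: str) -> List[dict]:
        # A query only sees the logical store it targets.
        return list(self._logical.get(logical_store, []))

    def logical_stores(self) -> List[str]:
        return sorted(self._logical)


ps = PhysicalProvenanceStore("ps.hospital.example")  # hypothetical host
ps.record("Hospital Payroll Store", {"interaction": "pay-1"})
ps.record("Surgery Ward Store", {"interaction": "op-7"})
ps.record("Surgery Ward Store", {"interaction": "op-8"})
print(ps.logical_stores())
print(len(ps.query("Surgery Ward Store")))
```

Passing the logical store name on every call is what bounds the documentation: a query against one logical store never sees p-assertions recorded in another, even though both live in the same physical store.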
Provenance Store Usage
• Combinations of logical and physical Provenance Stores can be adopted depending on the application’s needs in terms of:
  - Scalability
  - Regulatory / legal requirements
  - Information partitioning
Distributed Query?
• Process documentation is spread across multiple stores
• How do we get the provenance of a data item in this case?
• Solution: connections embedded in the process documentation
  - Shared context
  - Links
Shared Context revisited
[Figure: p-assertions with interaction key IK 1 are recorded in both PS 1 and PS 2; the querier queries each store for p-assertions (PAs) with IK 1.]
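The shared-context approach can be sketched as a querier asking every store it knows about for p-assertions that carry the same interaction key. The classes and the `query_by_interaction_key` method below are hypothetical in-memory stand-ins for remote provenance stores.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PAssertion:
    interaction_key: str  # the shared context, e.g. "IK1"
    view_kind: str        # "sender" or "receiver"
    content: str


class ProvenanceStore:
    """Hypothetical in-memory store indexed by interaction key."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._by_key: Dict[str, List[PAssertion]] = {}

    def record(self, pa: PAssertion) -> None:
        self._by_key.setdefault(pa.interaction_key, []).append(pa)

    def query_by_interaction_key(self, key: str) -> List[PAssertion]:
        return list(self._by_key.get(key, []))


def query_shared_context(stores: List[ProvenanceStore], key: str) -> List[PAssertion]:
    """Querier side: ask every known store for p-assertions with the same key."""
    results: List[PAssertion] = []
    for store in stores:
        results.extend(store.query_by_interaction_key(key))
    return results


ps1, ps2 = ProvenanceStore("PS 1"), ProvenanceStore("PS 2")
ps1.record(PAssertion("IK1", "sender", "request sent"))
ps2.record(PAssertion("IK1", "receiver", "request received"))
ps2.record(PAssertion("IK2", "sender", "unrelated"))

for pa in query_shared_context([ps1, ps2], "IK1"):
    print(pa.view_kind, pa.content)
```

This only works because the querier is handed both stores up front; links remove that assumption by letting the process documentation itself say where to look next.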
Links
• Links are unidirectional pointers to provenance stores
• Links connect provenance stores
• Links are recorded by actors as part of p-assertions
• Links are transferred between actors using interaction contexts
• There are two kinds of links:
  - View links
  - Object links
Views revisited
• A view is the set of assertions made by one actor about one interaction
• A view contains:
  - An actor identity
  - A set of p-assertions
• Each view is one of two kinds: sender or receiver
[Figure: an interaction between the User Interface and the Donor Data Collector, each holding its own view]
View Links
• A view link points to the provenance store containing the opposite view of the interaction
• View links are transferred in p-headers or interaction contexts
[Figure: the sender and receiver inform each other which store (PS 1 or PS 2) they will record p-assertions in; each records its p-assertions together with a link to the other's store.]
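How view links might be exchanged and recorded can be sketched as below. This assumes each actor records into its own store, that the sender's store is announced in a p-header on the request and the receiver's store in the reply, and that a p-header is a plain dictionary; `Actor`, `send`, and the store URLs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Actor:
    name: str
    store_url: str  # hypothetical URL of the store this actor records into
    recorded: List[dict] = field(default_factory=list)

    def record_view(self, interaction_key: str, view_kind: str, opposite_store: str) -> None:
        # Record this actor's view plus a view link to the store holding the opposite view.
        self.recorded.append({
            "interaction_key": interaction_key,
            "view_kind": view_kind,
            "view_link": opposite_store,
        })


def send(sender: Actor, receiver: Actor, interaction_key: str) -> None:
    # The sender announces its store in a (hypothetical) p-header on the message.
    p_header = {"interaction_key": interaction_key, "sender_store": sender.store_url}
    # The receiver records its view with a link to the sender's store ...
    receiver.record_view(p_header["interaction_key"], "receiver", p_header["sender_store"])
    # ... and informs the sender of its own store, e.g. in the reply's p-header.
    reply_header = {"receiver_store": receiver.store_url}
    sender.record_view(interaction_key, "sender", reply_header["receiver_store"])


ui = Actor("User Interface", "http://ps1.example/store")
ddc = Actor("Donor Data Collector", "http://ps2.example/store")
send(ui, ddc, "IK1")
print(ui.recorded[0]["view_link"])
print(ddc.recorded[0]["view_link"])
```

Because each side only stores a pointer to the other's store, neither actor has to write into a store it does not use, which keeps the two actors autonomous.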
Object Links
• An object link points to the provenance store where the object of a relationship is stored
• This allows for distributed provenance queries
[Figure: object links connecting PS 1, PS 2, and PS 3]
Implementing Distributed Queries
• Querying-actor centric (thick client)
  - The querying actor follows links
• Provenance-store centric (thin client)
  - Provenance stores follow links
Querying Actor Centric
[Figure: the querying actor issues a query to PS 1, receives the result, processes it for links, then issues a query to PS 2 and receives its result.]
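The querying-actor-centric (thick client) strategy amounts to a graph traversal driven by the client. The sketch below assumes object links are plain store names and that each store query returns every p-assertion the store holds; `STORES`, `query_store`, and `thick_client_query` are hypothetical. A visited set is included because links between stores may form cycles.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical network of stores: store name -> p-assertions it holds.
# Each p-assertion may carry object links naming other stores.
STORES: Dict[str, List[dict]] = {
    "PS1": [{"id": "pa1", "object_links": ["PS2"]}],
    "PS2": [{"id": "pa2", "object_links": ["PS3"]}],
    "PS3": [{"id": "pa3", "object_links": ["PS1"]}],  # links back: a cycle
}


def query_store(store: str) -> List[dict]:
    """Stand-in for a remote query to one provenance store."""
    return STORES.get(store, [])


def thick_client_query(start_store: str) -> List[dict]:
    """Querying-actor-centric traversal: the client itself follows links."""
    results: List[dict] = []
    visited: Set[str] = set()
    pending = deque([start_store])
    while pending:
        store = pending.popleft()
        if store in visited:
            continue  # query each store at most once
        visited.add(store)
        for pa in query_store(store):           # issue query, receive result
            results.append(pa)
            pending.extend(pa["object_links"])  # process results for links
    return results


print([pa["id"] for pa in thick_client_query("PS1")])
```

All network traffic originates at the querying actor here, so the stores stay simple, at the cost of one round trip per store the client visits.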
Provenance Store Centric
[Figure: the querying actor issues a query to PS 1; PS 1 processes its internal results for links, issues queries to PS 2 and PS 3, receives their results, collates them, and returns the combined results to the querying actor.]
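By contrast, in the provenance-store-centric (thin client) strategy the first store does the link following and collation itself. This sketch assumes stores can call each other directly through a hypothetical `peers` table; `LinkingStore` is likewise hypothetical. The querying actor issues a single `query()` call to PS 1.

```python
from typing import Dict, List, Optional, Set


class LinkingStore:
    """Hypothetical store that resolves object links itself (thin-client model)."""

    def __init__(self, name: str, p_assertions: List[dict]) -> None:
        self.name = name
        self.p_assertions = p_assertions
        self.peers: Dict[str, "LinkingStore"] = {}

    def query(self, visited: Optional[Set[str]] = None) -> List[dict]:
        visited = set() if visited is None else visited
        if self.name in visited:
            return []  # this store already answered; avoids cycles
        visited.add(self.name)
        collated = list(self.p_assertions)  # internal results
        for pa in self.p_assertions:
            for link in pa.get("object_links", []):  # process internal results for links
                peer = self.peers.get(link)
                if peer is not None:
                    collated.extend(peer.query(visited))  # issue query, receive results
        return collated  # collated results go back to the querying actor


ps3 = LinkingStore("PS3", [{"id": "pa3"}])
ps2 = LinkingStore("PS2", [{"id": "pa2"}])
ps1 = LinkingStore("PS1", [{"id": "pa1", "object_links": ["PS2", "PS3"]}])
ps1.peers = {"PS2": ps2, "PS3": ps3}

print([pa["id"] for pa in ps1.query()])
```

The client stays thin, but the stores must now trust and reach one another, which is the main design trade-off against the querying-actor-centric approach.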
Analysis of Links
• Links are unidirectional, like links on the Web
• This approach should be fairly scalable:
  - It maintains the autonomy of application actors
  - There is no need for synchronization between actors
• Like the Web, queriers must traverse the link structure to find content of interest
• There are two mechanisms for implementing distributed queries using links
Supporting Large Data
• Depending on the size of the data involved, the provenance store may not be able to:
  - Store the data immediately (solution: asynchronous recording)
  - Store the data at all (solution: references)
• References point to the data instead of containing the data itself
• Three kinds of references are supported:
  - Application
  - Internal
  - External
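Asynchronous recording can be sketched with the standard library's `queue` and `threading` modules. This is a minimal illustration assuming a single background worker; `AsyncRecorder` and its methods are hypothetical names, and a real recorder would submit p-assertions to a remote provenance store rather than append them to a list.

```python
import queue
import threading
from typing import List


class AsyncRecorder:
    """Hypothetical client-side recorder that decouples an actor from the store.

    The actor enqueues p-assertions and carries on; a background thread
    submits them to the (here, in-memory) provenance store later.
    """

    def __init__(self) -> None:
        self.store: List[dict] = []  # stand-in for the remote provenance store
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, p_assertion: dict) -> None:
        # Returns immediately: recording no longer blocks the application.
        self._queue.put(p_assertion)

    def _drain(self) -> None:
        while True:
            pa = self._queue.get()
            try:
                self.store.append(pa)  # a real recorder would send this over the network
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        # Block until everything queued so far has been recorded.
        self._queue.join()


recorder = AsyncRecorder()
for i in range(3):
    recorder.record({"localPAssertionId": i})
recorder.flush()
print(len(recorder.store))
```

The trade-off is that a querier may not yet see p-assertions that are still queued, which is why the sketch exposes a `flush()` step.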
Application References
• The application already transfers references in its application messages
• Nothing extra to do: record the p-assertions as they are
• Inform querying actors of how to resolve these application-specific references
• Example: http://datastore/pr#1234
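Informing querying actors of how to resolve application-specific references could take the form of a resolver registry on the querier's side. In this sketch, the registry keyed by host name, the use of the URL fragment as the lookup key, and the in-memory `DATASTORE` are all assumptions made for illustration.

```python
from typing import Callable, Dict
from urllib.parse import urlsplit

# Hypothetical registry telling a querying actor how to resolve
# application-specific references, keyed by the host that issued them.
Resolver = Callable[[str], bytes]
RESOLVERS: Dict[str, Resolver] = {}


def register_resolver(host: str, resolver: Resolver) -> None:
    RESOLVERS[host] = resolver


def resolve(reference: str) -> bytes:
    parts = urlsplit(reference)
    resolver = RESOLVERS.get(parts.netloc)
    if resolver is None:
        raise LookupError(f"no resolver registered for {parts.netloc!r}")
    # Assumption: the fragment (e.g. "1234") identifies the item.
    return resolver(parts.fragment)


# Stand-in for the application's own data store.
DATASTORE = {"1234": b"patient record 1234"}
register_resolver("datastore", lambda key: DATASTORE[key])

print(resolve("http://datastore/pr#1234"))
```

The provenance store itself never interprets these references; only the querier needs to know how to follow them.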
External References
• The application transfers a large message
• All or part of the message is stored in some data repository
• The p-assertion records a reference to that external data repository
• The burden is placed on the data repository to keep the data for as long as the process documentation is kept
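External References cont.
[Figure: a large patient record is stored in a data repository; the provenance store (PS) records a p-assertion with documentation style Reference pointing to http://DataRepository/#LPR1.]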
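The external reference pattern can be sketched as a size check made before recording: small content is documented verbatim, while large content is placed in a data repository and only its URL is recorded. The 1024-byte threshold, the hash-based keys, the `Verbatim` style label, and the `DataRepository` class are hypothetical choices for this illustration.

```python
import hashlib
from typing import Dict


class DataRepository:
    """Hypothetical external repository; it must keep data as long as the
    process documentation that refers to it."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self._items: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()[:12]  # hypothetical key scheme
        self._items[key] = data
        return f"{self.base_url}#{key}"

    def get(self, url: str) -> bytes:
        return self._items[url.rsplit("#", 1)[1]]


def make_p_assertion(content: bytes, repo: DataRepository, threshold: int = 1024) -> dict:
    """Record small content verbatim; externalise large content and keep a reference."""
    if len(content) <= threshold:
        return {"documentationStyle": "Verbatim", "content": content}
    return {"documentationStyle": "Reference", "content": repo.put(content)}


repo = DataRepository("http://DataRepository/")
xray = bytes(5000)  # stand-in for a large patient record
pa = make_p_assertion(xray, repo)
print(pa["documentationStyle"], pa["content"].startswith("http://DataRepository/#"))
print(repo.get(pa["content"]) == xray)
```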
External References cont.
```xml
<soap:envelope>
  <soap:header>…</soap:header>
  <soap:body>
    <echrs:store>
      <echrs:patientRecord>
        <pid>1</pid>
        <xray>j8ladfhaufjalkdjkfaslalkfdjaljfafjaljajfdlja adfhaldfjhaslfjdasldfjaslfj….</xray>
      </echrs:patientRecord>
    </echrs:store>
  </soap:body>
</soap:envelope>
```
Styled Reference P-Assertion
```xml
<ps:interactionPAssertion>
  <ps:localPAssertionId>1</ps:localPAssertionId>
  <ps:documentationStyle>
    http://www.pasoa.org/.../styles#Reference
  </ps:documentationStyle>
  <ps:content>
    <soap:envelope>
      <soap:header>…</soap:header>
      <soap:body>
        <echrs:store>
          <echrs:ref>http://DataRepository/#LPR1</echrs:ref>
        </echrs:store>
      </soap:body>
    </soap:envelope>
  </ps:content>
</ps:interactionPAssertion>
```
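A querier that receives such a styled p-assertion can check its documentation style and extract the reference with Python's standard `xml.etree.ElementTree`. The slide omits namespace declarations, so the `urn:example:` namespace URIs below are placeholders, and `external_reference` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Placeholder namespace URIs: the slide does not show the real declarations.
NS = {"ps": "urn:example:ps", "echrs": "urn:example:echrs", "soap": "urn:example:soap"}

P_ASSERTION = """
<ps:interactionPAssertion xmlns:ps="urn:example:ps"
    xmlns:echrs="urn:example:echrs" xmlns:soap="urn:example:soap">
  <ps:localPAssertionId>1</ps:localPAssertionId>
  <ps:documentationStyle>http://www.pasoa.org/.../styles#Reference</ps:documentationStyle>
  <ps:content>
    <soap:envelope>
      <soap:body>
        <echrs:store>
          <echrs:ref>http://DataRepository/#LPR1</echrs:ref>
        </echrs:store>
      </soap:body>
    </soap:envelope>
  </ps:content>
</ps:interactionPAssertion>
"""


def external_reference(xml_text: str) -> Optional[str]:
    """Return the referenced URL if the p-assertion uses the Reference style."""
    root = ET.fromstring(xml_text.strip())
    style = (root.findtext("ps:documentationStyle", namespaces=NS) or "").strip()
    if not style.endswith("#Reference"):
        return None  # content was documented some other way
    ref = root.find(".//echrs:ref", NS)
    return ref.text.strip() if ref is not None and ref.text else None


print(external_reference(P_ASSERTION))
```

Dispatching on the documentation style, rather than guessing from the content, lets the querier know when it must go to an external repository to recover the original message.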
Internal References
• Similar to external references
• However, the reference points to data already stored inside the provenance store
• This is possible because every p-assertion is uniquely addressable
• Useful for large actor state p-assertions that are recorded several times
  - Example: system configuration information
Internal References cont.
[Figure: the provenance store (PS) holds one actor state p-assertion containing lots of configuration information; actor state p-assertions 1 to 4 refer to it.]
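Internal references can be sketched as follows. Here the store detects repeated actor state by hashing it, which is an assumption made for the example (an actor could equally decide to record the reference itself); `DedupStore` and the style labels are hypothetical. The list index plays the role of the p-assertion's unique address.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Hypothetical store that keeps one copy of repeated actor state.

    The first occurrence is stored in full; later identical ones become
    internal references to the address of that first p-assertion.
    """

    def __init__(self) -> None:
        self.p_assertions: List[dict] = []
        self._address_by_digest: Dict[str, int] = {}

    def record_actor_state(self, state: str) -> int:
        digest = hashlib.sha256(state.encode()).hexdigest()
        address = len(self.p_assertions)  # unique address of the new p-assertion
        if digest in self._address_by_digest:
            entry = {"style": "InternalReference", "ref": self._address_by_digest[digest]}
        else:
            entry = {"style": "Verbatim", "content": state}
            self._address_by_digest[digest] = address
        self.p_assertions.append(entry)
        return address

    def resolve(self, address: int) -> str:
        entry = self.p_assertions[address]
        if entry["style"] == "InternalReference":
            return self.p_assertions[entry["ref"]]["content"]
        return entry["content"]


store = DedupStore()
config = "os=linux; threads=8; " * 1000  # stand-in for lots of configuration information
ids = [store.record_actor_state(config) for _ in range(4)]
print([store.p_assertions[i]["style"] for i in ids])
print(all(store.resolve(i) == config for i in ids))
```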
Provenance Query Results Scalability
• Provenance query result sets are scalable
• Return pointers to p-assertions, not the assertions themselves
```xml
<ps:interactionPAssertion>
  <ps:localPAssertionId>1</ps:localPAssertionId>
  <ps:documentationStyle>
    http://www.pasoa.org/.../styles#Reference
  </ps:documentationStyle>
  <ps:content>
    <soap:envelope>
      <soap:header>…</soap:header>
      <soap:body>
        <echrs:store>
          <echrs:ref>http://DataRepository/#LPR1</echrs:ref>
        </echrs:store>
      </soap:body>
    </soap:envelope>
  </ps:content>
</ps:interactionPAssertion>
<ps:interactionPAssertion>
  <ps:localPAssertionId>2</ps:localPAssertionId>
  <ps:documentationStyle>
    http://www.pasoa.org/.../styles#Reference
  </ps:documentationStyle>
  …
```
```xml
<psdid>
  <interactionKey>
    <sender>donordatacollector</sender>
    <receiver>echr</receiver>
    <id>12233</id>
  </interactionKey>
  <viewkind>sender</viewkind>
  <localPAssertionId>1</localPAssertionId>
</psdid>
<psdid>
  <interactionKey>
    <sender>donordatacollector</sender>
    <receiver>echr</receiver>
    <id>1224</id>
  </interactionKey>
  <viewkind>sender</viewkind>
  <localPAssertionId>5</localPAssertionId>
</psdid>
…
```
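The pointer-based result set can be modelled with two small immutable records mirroring the `psdid` and `interactionKey` elements above. The in-memory `STORE`, `provenance_query`, and `dereference` are hypothetical; the point is that the result set carries only pointers, and full content is fetched on demand.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InteractionKey:
    sender: str
    receiver: str
    id: str


@dataclass(frozen=True)
class PSDID:
    """Pointer to one p-assertion, mirroring the psdid element."""
    interaction_key: InteractionKey
    view_kind: str
    local_p_assertion_id: int


# Hypothetical store contents: pointer -> full (possibly large) p-assertion.
key = InteractionKey("donordatacollector", "echr", "12233")
STORE: Dict[PSDID, str] = {
    PSDID(key, "sender", 1): "<ps:interactionPAssertion>... large content ...",
}


def provenance_query() -> List[PSDID]:
    # The result set holds only small pointers, not the assertions themselves.
    return list(STORE)


def dereference(pointer: PSDID) -> str:
    # The querier fetches full content only for the pointers it cares about.
    return STORE[pointer]


results = provenance_query()
print(results[0].local_p_assertion_id, results[0].view_kind)
print(dereference(results[0]).startswith("<ps:interactionPAssertion>"))
```

Because the pointers are frozen dataclasses they are hashable, so a querier can also use them directly as keys when caching content it has already fetched.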
Provenance Query Results Scalability
• Provenance query results are scalable
• Return pointers to p-assertions, not the assertions themselves
• Scoping means provenance query results contain only what the querier needs
Iterative Query Results
• Return iterators over results from process documentation or provenance query results
[Figure: the querying actor issues a query to the PS and receives a results iterator, from which it retrieves results with getNextRes() or getNextXRes(int x).]
Iterative Query Results
• Return iterators over results from process documentation or provenance query results
• This functionality is planned for future implementations
• The planned implementation makes use of:
  - OGSA-DAI
  - WSRF
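Since the iterator interface is only planned, the following is a client-side sketch of its shape rather than an implementation. `ResultsIterator`, `get_next_res`, and `get_next_x_res` are hypothetical Python counterparts of `getNextRes()` and `getNextXRes(int x)`, and the sketch does not use OGSA-DAI or WSRF.

```python
import itertools
from typing import Iterator, List, Optional


class ResultsIterator:
    """Hypothetical client-side handle on an iterator over query results.

    Results come from a local Python iterator here; a real implementation
    would fetch them from the provenance store batch by batch.
    """

    def __init__(self, results: Iterator[str]) -> None:
        self._results = results

    def get_next_res(self) -> Optional[str]:
        # Next single result, or None once the results are exhausted.
        return next(self._results, None)

    def get_next_x_res(self, x: int) -> List[str]:
        # Up to x further results; fewer (possibly none) near the end.
        return list(itertools.islice(self._results, x))


it = ResultsIterator(f"pa{i}" for i in range(5))
print(it.get_next_res())
print(it.get_next_x_res(3))
print(it.get_next_x_res(3))
print(it.get_next_res())
```

Because the results are produced lazily, a querier that only needs the first few results never forces the whole result set to be materialised.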
Summary
• Discussed both distribution and scalability
• Introduced links for connecting distributed provenance stores
• Two ways of implementing distributed queries
• Large data support through asynchronous recording and references
• Query scalability
  - Provenance query results
  - Iterative query results
Questions? Paul Groth pg03r@ecs.soton.ac.uk