
Data Grid Automation

Or What is SRB Matrix? Data Grid Automation. Arun Jagatheesan et al., San Diego Supercomputer Center, University of California, San Diego. VLDB Workshop on Data Management in Grids, Trondheim, Norway, 2-3 September 2005.


Presentation Transcript


  1. Or What is SRB Matrix? Data Grid Automation. Arun Jagatheesan et al., San Diego Supercomputer Center, University of California, San Diego. VLDB Workshop on Data Management in Grids, Trondheim, Norway, 2-3 September 2005

  2. Talk Outline • Data Grid Landscape • Long-run data management processes • Data Grid ILM • Data Grid Triggers • Dataflow Pipelines • Execution Logic – Data Grid Language • End-to-End Infrastructure Deployment • API • User GUI • Service-oriented *Infrastructure*

  3. Data Grid Landscape

  4. The “Grid” Vision

  5. Data Grid Resource Providers • Grid Resource Providers (GRP) providing content and/or storage [Figure labels: GRP, /txt3.txt]

  6. Data Grid Administrative Domain • Administrative domain with one or more GFS Resource Providers • Could include their data centers [Figure labels: Research Lab, GRP, /txt3.txt]

  7. Data Grid Administrative Domains • University: data + storage (10) • Storage-R-Us Resource Providers: data + storage (50) • Research lab, Taiwan: data + storage (40) [Figure labels: GRP, /txt3.txt, /…/text1.txt, /…//text2.txt]

  8. Data Grid (Enterprise Utility) • Physical resources managed by autonomous administrative domains of the same enterprise (ABCZ.com) [Figure labels: 3rd Party, IT Department US, IT Department Asia, ABCZ.com US, Data center, ABCZ.com Asia]

  9. Data Grid (Enterprise Utility) • Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department [Figure labels: Project 1, Project 2, 3rd Party, IT Department US, IT Department Asia, ABCZ.com US, Data center, ABCZ.com Asia]

  10. Data Grid (Enterprise Utility) [Figure labels: Project1, Project2, Project3, Project4, 3rd Party, IT Department US, IT Department Asia, ABCZ.com US, Data center, ABCZ.com Asia]

  11. Long-run Processes in Data Grid • Data Grid ILM • Data Grid Triggers • Data Gridflows

  12. Data Grid ILM

  13. Change is Constant • Changes in access patterns • Based on the number of users accessing the data • Domains that want to access the data • Data Value • The value of a data set (collections?) for a particular domain, based on its business model and its users’ access patterns • Each domain will have a different value based on its users and its role in a data grid

  14. “Data Value” based on users • When more users access a project’s data, its data value increases, so move that data to a faster storage type [Figure labels: Project1, Project2, Project3, Project4, 3rd Party, IT Department US, IT Department Asia, ABCZ.com US, Data center, ABCZ.com Asia]

  15. “Data Value” based on domain • When more users from the same domain access the data, the value of that data in that domain increases, so replicate the data to resources in that domain (the converse is also true) [Figure labels: Project1, Project2, Project3, Project4, 3rd Party, IT Department US, IT Department Asia, ABCZ.com US, Data center, ABCZ.com Asia]

  16. “Data Value” based on role • The 3rd party data center has no users who use the data, but is interested in keeping a replica of any data (or deleted data) for long-term preservation [Figure labels: Project1, Project2, Project3, Project4, 3rd Party, IT Department US, IT Department Asia, ABCZ.com US, Data center, ABCZ.com Asia]
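The three notions of data value on slides 14 to 16 can be read as placement rules. The following is a minimal Python sketch of that reading, not the SRB Matrix implementation: the function name `plan_ilm_actions`, the thresholds, and the action strings are all assumptions made for illustration, since the slides give no concrete numbers.

```python
from collections import Counter

# Hypothetical thresholds; the slides give no concrete numbers.
FAST_TIER_USERS = 3   # distinct users before data moves to faster storage
REPLICATE_USERS = 2   # distinct users in one domain before replicating there


def plan_ilm_actions(accesses, archive_domain=None):
    """Plan placement actions for one data set.

    accesses: list of (user, domain) pairs, one per recorded access.
    archive_domain: optional domain whose role is long-term preservation.
    """
    users = {user for user, _ in accesses}
    users_per_domain = Counter()
    for _, domain in set(accesses):       # count each user once per domain
        users_per_domain[domain] += 1

    actions = []
    # Value based on users: many users overall -> faster storage.
    if len(users) >= FAST_TIER_USERS:
        actions.append("move to faster storage")
    # Value based on domain: many users in one domain -> replicate there.
    for domain, count in sorted(users_per_domain.items()):
        if count >= REPLICATE_USERS:
            actions.append(f"replicate to {domain}")
    # Value based on role: a preservation domain wants a replica regardless of use.
    if archive_domain is not None:
        actions.append(f"archive replica at {archive_domain}")
    return actions


log = [("ann", "US"), ("bob", "US"), ("chen", "Asia"), ("ann", "US")]
print(plan_ilm_actions(log, archive_domain="3rd Party"))
```

In a real data grid each autonomous domain would apply its own thresholds and rules, which is the difficulty the next slide points out.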

  17. Data Grid ILM • ILM = Information Lifecycle Management • Dynamic re-orientation of data placement and data retention policies (rules) • Based on “business value of data” and storage cost • HSM = Hierarchical Storage Management, based on “data freshness”; ILM goes one step further • Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules

  18. Data Grid Triggers

  19. Data Grid Triggers • Similar to triggers in databases • Based on ECA concepts • Event • Condition • Action • Example • Event = Insert new file in collection (“/ourProject/data”) • Condition = (color == “blue” && galaxy == “Andromeda”) • Action = Run (selectiveDataReplicator.dgl)
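The Event-Condition-Action example above can be sketched in a few lines of Python. This is only an illustration of the ECA idea under assumed names: the `Trigger` class and the `fire` function are hypothetical and are not part of SRB Matrix or DGL, and the action is represented simply by the name of the DGL document to run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    event: str                                   # e.g. "insert"
    collection: str                              # collection being watched
    condition: Callable[[Dict[str, str]], bool]  # test on the file's metadata
    action: str                                  # name of the DGL document to run


def fire(triggers: List[Trigger], event: str, collection: str,
         metadata: Dict[str, str]) -> List[str]:
    """Return the actions whose event, collection and condition all match."""
    return [t.action for t in triggers
            if t.event == event
            and t.collection == collection
            and t.condition(metadata)]


# The trigger from the slide, expressed with the hypothetical Trigger class.
replicate_blue = Trigger(
    event="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action="selectiveDataReplicator.dgl",
)

new_file = {"color": "blue", "galaxy": "Andromeda"}
print(fire([replicate_blue], "insert", "/ourProject/data", new_file))
```

Keeping the condition as a function of the file's metadata mirrors database triggers, where the condition is evaluated against the affected row.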

  20. Data → Discovery • New data (digital entities) updates relationships among data in collections (meta-data) • Services invoked to analyze new relationships • DGMS applications get notified of state updates [Figure labels: Digital entities, Meta-data, Services, State]

  21. Data Gridflows

  22. Gridflow in SCEC (data → information pipeline) • Pipeline could be triggered by input at data source or by a data request from user • Metadata derivation • Ingest data • Ingest metadata • Determine analysis pipeline • Initiate automated analysis • Use the optimal set of resources based on the task, on demand • Organize result data into distributed data grid collections • All gridflow activities stored for data flow provenance
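The last point, storing every gridflow activity for provenance, can be illustrated with a small Python sketch. It is not the SCEC pipeline: the function `run_gridflow`, the step names, and the `/results` collection are hypothetical, and each step is a plain function applied to the previous step's output.

```python
def run_gridflow(data, steps):
    """Apply each (name, function) step in order and record provenance."""
    provenance = []
    for name, step in steps:
        result = step(data)
        # Record what went in and what came out of every activity.
        provenance.append({"step": name, "input": data, "output": result})
        data = result
    return data, provenance


steps = [
    ("ingest data", lambda d: d.strip()),
    ("derive metadata", lambda d: {"data": d, "length": len(d)}),
    ("organize result", lambda m: {"collection": "/results", **m}),
]
result, provenance = run_gridflow("  seismogram  ", steps)
print(result)
print([entry["step"] for entry in provenance])
```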

  23. Data Grid Language (DGL)

  24. Data Grid Language • Requirement • Data Grid ILM process • The long-run process that has to be run is described in DGL • Data Grid Triggers • Action part of the ECA (Event-Condition-Action) logic • Data Gridflows • Step-by-step execution of a long-run process on the Data Grid • Analogous to SQL in relational databases • Long-run process procedures stored and executed in the Data Grid itself • Captures the “Infrastructure Execution Logic”

  25. DGL Request • Annotations about the Data Grid Request • Can be either a Flow or a Status Query

  26. DGL Requests (2 types) • Data Grid Flow • An XML structure that describes the execution logic, associated procedural rules and DGL variables. Can be a synchronous or an asynchronous flow • Status Query • An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity. Status queries can be made for both synchronous and asynchronous flows
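The split between flow requests and status queries can be sketched with Python's standard `concurrent.futures` module. This is a toy in-memory stand-in written for illustration, not the Matrix server: the class `FlowServer` and its methods are hypothetical, and a flow is reduced to a list of callables run in order.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor, wait


class FlowServer:
    """Toy in-memory stand-in for a gridflow server (hypothetical)."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._flows = {}                    # flow id -> Future

    def submit_flow(self, steps, asynchronous=True):
        """Run the callables in `steps` in order and return a flow id."""
        flow_id = str(uuid.uuid4())
        future = self._pool.submit(lambda: [step() for step in steps])
        self._flows[flow_id] = future
        if not asynchronous:
            wait([future])                  # synchronous: block until done
        return flow_id

    def status(self, flow_id):
        """Status query; valid for synchronous and asynchronous flows."""
        future = self._flows[flow_id]
        if not future.done():
            return "running"
        return "failed" if future.exception() else "completed"


server = FlowServer()
flow_id = server.submit_flow([lambda: "ingest", lambda: "replicate"],
                             asynchronous=False)
print(server.status(flow_id))
```

An asynchronous caller would keep the returned flow id and poll `status` later, which is the pattern the slides describe for long-run processes.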

  27. Flow • Scoped variables that can control the flow • Logic used by the sub-members • Sub-members that are the real execution statements

  28. Flow Logic (How a flow executes)

  29. <userDefinedRule name="beforeEntry">
        <condition>
          <simpleQuery>$numVar == 1</simpleQuery>
        </condition>
        <action name="true">
          <actionString>SET var1 = 1</actionString>
        </action>
        <action name="true">
          <actionString>SET var2 = "foo"</actionString>
        </action>
        <action name="false">
          <actionString>SET var1 = 0</actionString>
        </action>
      </userDefinedRule>
      …
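One way to read the rule above is that the condition selects which named actions run. The Python sketch below parses the same XML with the standard library and applies that reading. The semantics are an assumption, not confirmed by the slides: the condition is taken to have the form `$name == literal`, and actions named "true" or "false" are chosen by its outcome. The function `evaluate_rule` is hypothetical and is not part of DGL.

```python
import xml.etree.ElementTree as ET

RULE_XML = """
<userDefinedRule name="beforeEntry">
  <condition><simpleQuery>$numVar == 1</simpleQuery></condition>
  <action name="true"><actionString>SET var1 = 1</actionString></action>
  <action name="true"><actionString>SET var2 = "foo"</actionString></action>
  <action name="false"><actionString>SET var1 = 0</actionString></action>
</userDefinedRule>
"""


def evaluate_rule(rule_xml, variables):
    """Return the action strings selected by the rule's condition.

    Assumed semantics: the query is `$name == literal`; actions whose name
    matches the outcome ("true" or "false") are returned in document order.
    """
    rule = ET.fromstring(rule_xml)
    query = rule.findtext("condition/simpleQuery")
    left, right = (part.strip() for part in query.split("=="))
    outcome = str(variables[left.lstrip("$")]) == right.strip('"')
    wanted = "true" if outcome else "false"
    return [action.findtext("actionString")
            for action in rule.findall("action")
            if action.get("name") == wanted]


print(evaluate_rule(RULE_XML, {"numVar": 1}))
print(evaluate_rule(RULE_XML, {"numVar": 2}))
```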

  30. What is SRB Matrix? • Matrix provides the SRB as a Web Service • Web Service based on Data Grid Language • SOA for Data Grid or Digital Library • Service oriented *infrastructure* • Asynchronous end-user facing applications • Long run operations presented to users as portlets • Data Grid Automation and ILM • File Triggers on unstructured data • Automated movement or management of data

  31. Matrix Gridflow Server Architecture [Figure components: Event Publish Subscribe, Notification • JMS Messaging Interface • JAXM Wrapper • WSDL Description • SOAP Service for Matrix Clients • Matrix Data Grid Request Processor • Sangam P2P Gridflow Broker and Protocols • Transaction Handler • Workflow Query Processor • Status Query Handler • Flow Handler and Execution Manager • XQuery Processor • Gridflow Meta data Manager • ECA rules Handler • Persistence (Store) Abstraction • Matrix Agent Abstraction • SDSC SRB Agents • Other SDSC Data Services • Agents for java, WSDL and other grid executables • JDBC • In Memory Store]

  32. Conclusion • Data Grids are evolving • Data Grid Automation of long-run processes is essential • Need a language for Data Grid Automation • Data Grid Language is one such effort, as part of the SRB Matrix Project • Open source project for anyone to use (or join) • talk2matrix@sdsc.edu (or arun@sdsc.edu)
