
A Data Storage Language for the Requirements of Rebels and Misfits

Or A talk on Data Grids and DGL. A Data Storage Language for the Requirements of Rebels and Misfits. Arun Jagatheesan San Diego Supercomputer Center University of California, San Diego. HPTS Workshop Asilomar, California, 25-28 September 2005.


Presentation Transcript


  1. Or: A talk on Data Grids and DGL • A Data Storage Language for the Requirements of Rebels and Misfits • Arun Jagatheesan, San Diego Supercomputer Center, University of California, San Diego • HPTS Workshop, Asilomar, California, 25-28 September 2005

  2. He has 44 slides and 20 minutes. No infotainment slides either – Boring! Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision

  3. Disclaimer and Warning • My own opinion or thoughts • Arun says so… (can be wrong?) • Based on my current knowledge and understanding • As of September 2005 – current knowledge and level of understanding (can change?) • My belief system • I believe in Data Grids for Inter/Intra/Multi-Organizational Unstructured Data Management (biased?) • My belief might not be in sync with your belief, but it can co-exist with your favorite technology

  4. Meet my friends – Rebels and Misfits • Esoteric Requirements from “High-end” users • To keep them alive, they need more… more of everything • Requirements not broadly felt or required in industry • They push the existing technology to the limits • From the existing technology’s perspective… • These folks are nuts! • The existing technology was not designed for these requirements • My friends become rebels or misfits from the existing technology’s perspective

  5. Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision

  6. Mapping physical data to logical view Hierarchical view, independent of network, disk, sector, track, fragments Rule : Storage Abstraction – Hide storage resources

  7. Mapping physical data to logical view Relational view (assume it's a database), independent of network, disk, sector, track, fragments Thanks to rebels and misfits in the airline industry who wanted transactional capabilities

  8. Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision

  9. NIH BIRN SRB Data Grid • Biomedical Informatics Research Network • Access and analyze biomedical image data • Data resources distributed throughout the country • Medical schools and research centers across the US • Stable high performance grid based environment • Coordinate data sharing • Federate collections • Support data mining and analysis

  10. Mapping distributed data & storage to logical view 25 Universities or Research Hospitals, Multiple heterogeneous storage resources

  11. Approach we have taken in Data Grids • Logical Schema (view) is independent of physical schema • Just like databases or even file systems • Physical Resources are provided in the form of logical resources in the logical view • This is very different from databases (maybe similar to tablespaces) • A database is used for mapping • Data path, network, access permissions, metadata, storage type, logical storage resource, physical storage resources • Used for digital libraries, persistent archives and data grids
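The mapping described on this slide can be sketched as a small catalog that resolves a logical path to the physical replicas behind it. This is only an illustrative sketch, not SRB's actual catalog schema: the class names, resource names, host names and physical paths below are all hypothetical, and only the logical paths come from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Replica:
    logical_resource: str   # name shown in the logical view (hypothetical)
    physical_resource: str  # actual storage system behind it (hypothetical)
    physical_path: str      # location on that system (hypothetical)


@dataclass
class Catalog:
    # logical path -> replicas; stands in for the mapping database
    entries: Dict[str, List[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str) -> List[Replica]:
        """Return every physical copy behind one logical name."""
        return list(self.entries.get(logical_path, []))

    def list_collection(self, collection: str) -> List[str]:
        """List logical names under a collection, hiding physical layout."""
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self.entries if p.startswith(prefix))


catalog = Catalog()
catalog.register(
    "/home/arun.sdsc/exp1/text1.txt",
    Replica("univ-disk", "storage.university.example", "/vol7/a1/text1.txt"),
)
catalog.register(
    "/home/arun.sdsc/exp1/text1.txt",
    Replica("lab-tape", "archive.lab.example", "/tape/0042/text1.txt"),
)
catalog.register(
    "/home/arun.sdsc/exp1/text2.txt",
    Replica("univ-disk", "storage.university.example", "/vol7/a1/text2.txt"),
)

print(catalog.list_collection("/home/arun.sdsc/exp1"))
print([r.logical_resource for r in catalog.resolve("/home/arun.sdsc/exp1/text1.txt")])
```

Users work only with the logical paths and logical resource names; the physical host and path can change without the logical view changing.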

  12. The “Grid” Vision

  13. Data Grid Resource Providers • Grid Resource Providers (GRP) providing content and/or storage [Diagram: a GRP holding /txt3.txt]

  14. Data Grid Administrative Domain • Administrative domain with one or more Grid Resource Providers • Could include their data centers [Diagram: Research Lab domain with a GRP holding /txt3.txt]

  15. Data Grid Administrative domains • University: data + storage (10) • Storage-R-Us Resource Providers: data + storage (50) • Research lab: data + storage (40) [Diagram: GRPs across the three domains holding /txt3.txt, /…/text1.txt, /…//text2.txt]

  16. Data Grid: Logical view of data & resources • Logical Namespace (need not be the same as the physical view of resources): /home/arun.sdsc/exp1, /home/arun.sdsc/exp1/text1.txt, /home/arun.sdsc/exp1/text2.txt, /home/arun.sdsc/exp1/text3.txt; data + storage (100) • University: data + storage (10) • Storage-R-Us Resource Providers: data + storage (50) • Research Lab: data + storage (40) [Diagram: GRPs holding /txt3.txt, /…/text1.txt, /…//text2.txt]

  17. BIRN: Inter-organizational Data

  18. SDSC SRB User Community (Major US) • National Science Digital Library (NSDL) • National Optical Astronomy Observatory (NOAO) • ROADNet • Purdue University • SCCOOS, USA • Scientific Rich Media Archive • Salk Institute • Strand Map Service, USA • UC Berkeley Library • UCSD Library • University of Houston • Persistent Archives Test bed • University of Wisconsin, Madison • WebBase, Stanford University • Yale University Library • BaBar, Stanford Linear Accelerator Center (SLAC) • California Digital Library (CDL) • Center for Integrated Space Weather Modeling (CISM) • CVC, Visualization Portal • LDC Data Storage • NIH Bio Informatics Research Network (BIRN) • NSF Southern California Earthquake Center (SCEC) • National Archives and Records Administration (NARA) • National Aeronautics and Space Administration Centers (NASA) • National Virtual Observatory (NVO) • Npackage, NSF Middleware Initiative (NMI)

  19. SDSC SRB User Community • Academia Sinica, Taiwan • Australian National University • Bio-Lab, University of Genoa, Italy • Council for the Central Laboratory of the Research Councils (CCLRC), UK • CC-IN2P3, France • Distributed Framework, Singapore • Distributed Aircraft Maintenance Environment (DAME), UK • eMinerals Project, UK • eScience, Belfast Center • Fraunhofer ITWM, Germany • High Energy Accelerator Organization, KEK, Japan • K* Grid Computing, Korea • KEK Computing Center, Japan • Lyon, France • NorGrid, Norway • Nanyang Data Grid, Singapore • NCHC, Taiwan • Queensland University of Technology (QUT), Australia • Rutherford Appleton Laboratory (RAL), UK • T-Systems, Germany • UK eScience Project, UK • UniGrid, Poland • UMK, Poland • Virtual Laboratory for eScience, Netherlands

  20. Total data brokered by SDSC SRB 358 TB 324 TB 682 TB

  21. Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision

  22. Mapping distributed data, storage and processes to logical view

  23. Long-run Processes in Data Grid • Data Grid ILM • Data Grid Triggers • Data Gridflows

  24. Data Grid (Enterprise Utility) Physical Resources managed by autonomous administrative domains of the same enterprise (ABCZ.com) 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia

  25. Data Grid (Enterprise Utility) Each project has a data grid instance consisting of Logical Resources with different SLAs offered by IT department Project 1 Project 2 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia

  26. Change is Constant • Changes in access patterns • Based on number of users accessing the data • Domains which want to access data • Data Value • The value of a data set (collection?) for a particular domain, based on its business model and users’ access patterns • Each domain will have a different value based on its users and its role in a data grid

  27. “Data Value” based on users When more users access a project’s data, its data value increases; move that data to a faster storage type Project1 Project2 Project3 Project4 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia

  28. “Data Value” based on domain When more users from the same domain access the data, the data value for that particular data in that particular domain increases, so replicate the data to resources in that domain. (converse is also true) Project1 Project2 Project3 Project4 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia

  29. “Data Value” based on role The 3rd party data center has no users who use the data, but it is interested in holding a replica of any data (or deleted data) for long-term preservation Project1 Project2 Project3 Project4 3rd Party IT Department US IT Department Asia ABCZ.com US Data center ABCZ.com Asia
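The three "Data Value" rules on the preceding slides (by users, by domain, by role) can be sketched as one placement planner. This is a hedged illustration only: the function name, the thresholds `hot_users` and `domain_users`, the domain labels and the action strings are all assumptions, not anything defined by SRB or DGL.

```python
from collections import Counter


def plan_placement(accesses, replica_domains, archive_domain,
                   hot_users=3, domain_users=2):
    """Suggest placement actions for one data set.

    accesses: list of (user, domain) pairs, one per access.
    replica_domains: set of domains that already hold a replica.
    archive_domain: domain whose role is long-term preservation.
    Thresholds are hypothetical tuning knobs.
    """
    actions = []

    # Rule by users: many distinct users -> move to faster storage.
    distinct_users = {user for user, _ in accesses}
    if len(distinct_users) >= hot_users:
        actions.append("move to faster storage")

    # Rule by domain: enough distinct users in one domain -> replicate there.
    users_per_domain = Counter(domain for _, domain in set(accesses))
    for domain, count in sorted(users_per_domain.items()):
        if count >= domain_users and domain not in replica_domains:
            actions.append(f"replicate to {domain}")

    # Rule by role: the preservation domain wants a copy regardless of usage.
    if archive_domain not in replica_domains:
        actions.append(f"replicate to {archive_domain} for preservation")

    return actions


accesses = [("ann", "asia"), ("bo", "asia"), ("cy", "us"), ("ann", "asia")]
plan = plan_placement(accesses, replica_domains={"us"}, archive_domain="third-party")
print(plan)
```

Because each autonomous domain has its own business rules, a real deployment would presumably evaluate a different planner (or different thresholds) per domain rather than one global function.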

  30. Data Grid ILM • ILM = Information Lifecycle Management (Sales Jargon) • Dynamic re-orientation of data placement and data retention policies (rules) • Based on “business value of data” and storage cost • HSM = Hierarchical Storage Management, based on “data freshness”. ILM goes one step further • Applying this concept to a Data Grid is very tricky, as different autonomous domains have different business rules

  31. Data Grid Triggers • Similar to triggers in databases • Based on ECA concepts • Event • Condition • Action • Example • Event = Insert new file in collection (“/ourProject/data”) • Condition = (color = “blue” && galaxy = “Andromeda”) • Action = Run (selectiveDataReplicator.dgl)
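The Event-Condition-Action example on this slide can be sketched as follows. The `Trigger` and `TriggerRegistry` classes are hypothetical stand-ins for whatever the data grid actually provides, and the action here only records that selectiveDataReplicator.dgl would be run instead of executing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    event: str                                   # e.g. "insert"
    collection: str                              # logical collection to watch
    condition: Callable[[Dict[str, str]], bool]  # test on the file's metadata
    action: Callable[[str], None]                # what to run when it matches


class TriggerRegistry:
    def __init__(self) -> None:
        self.triggers: List[Trigger] = []

    def add(self, trigger: Trigger) -> None:
        self.triggers.append(trigger)

    def fire(self, event: str, logical_path: str, metadata: Dict[str, str]) -> int:
        """Run the action of every trigger whose event, collection and
        condition all match; return how many actions ran."""
        fired = 0
        for t in self.triggers:
            in_collection = logical_path.startswith(t.collection.rstrip("/") + "/")
            if t.event == event and in_collection and t.condition(metadata):
                t.action(logical_path)
                fired += 1
        return fired


launched = []  # records (flow document, file) pairs instead of running DGL
registry = TriggerRegistry()
registry.add(Trigger(
    event="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action=lambda path: launched.append(("selectiveDataReplicator.dgl", path)),
))

hits = registry.fire("insert", "/ourProject/data/img1.fits",
                     {"color": "blue", "galaxy": "Andromeda"})
misses = registry.fire("insert", "/ourProject/data/img2.fits",
                       {"color": "red", "galaxy": "Andromeda"})
print(hits, misses, launched)
```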

  32. Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision

  33. Data Grid Language • Requirement • Data Grid ILM process • The long-run process that has to be run is described in DGL • Data Grid Triggers • Action part of the ECA (Event-Condition-Action) logic • Data Gridflows • Step-by-step execution of a long-run process on the Data Grid • Analogy of SQL in relational databases • Long-run procedures stored and executed in the Data Grid itself • Captures the “Infrastructure Execution Logic”

  34. DGL Request Annotations about the Data Grid Request Can be either a Flow or a Status Query

  35. DGL Requests (2 types) • Data Grid Flow • An XML Structure that describes the execution logic, associated procedural rules and DGL variables. Can be a synchronous or asynchronous flow • Status Query • An XML Structure used to query the execution status of any gridflow or sub-flow at any level of granularity. Status Queries can be made for both synchronous and asynchronous flows

  36. Flow Scoped Variables that can control the flow Logic used by the sub-members Sub-members that are the real execution statements
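The flow structure on this slide (scoped variables, sub-members that do the real work, nested sub-flows) can be sketched in a few lines. This is not the DGL XML schema: the `Flow` and `Step` classes are hypothetical, only sequential execution is modeled, and the status dictionary merely stands in for what a Status Query might return.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Step:
    name: str
    run: Callable[[Dict[str, object]], None]  # receives the visible variables


@dataclass
class Flow:
    name: str
    variables: Dict[str, object] = field(default_factory=dict)
    members: List[Union["Flow", Step]] = field(default_factory=list)

    def execute(self, outer_scope=None, prefix="", status=None):
        """Run members in order; return status keyed by flow/step path."""
        status = {} if status is None else status
        path = prefix + self.name
        # A sub-flow sees its parent's variables plus its own (own wins).
        scope = dict(outer_scope or {})
        scope.update(self.variables)
        status[path] = "running"
        for member in self.members:
            if isinstance(member, Flow):
                member.execute(scope, path + "/", status)
            else:
                member.run(scope)
                status[path + "/" + member.name] = "done"
        status[path] = "done"
        return status


seen = []
ingest = Flow("ingest", {"target": "archive"}, [
    Step("copy", lambda s: seen.append((s["project"], s["target"]))),
])
nightly = Flow("nightly", {"project": "exp1"}, [
    Step("list", lambda s: seen.append(s["project"])),
    ingest,
])

status = nightly.execute()
print(status["nightly/ingest/copy"], seen)
```

Keying the status by path is one way to answer a query about any flow, sub-flow or step individually; an asynchronous variant would return immediately and update the same dictionary in the background.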

  37. Flow Logic (How a flow executes)

  38. DGL-Response Responses can be synchronous or asynchronous

  39. Talk Outline • “Next Hype in Grids” • My belief system before we begin • Meet my friends – Rebels and Misfits • File Systems, Databases, Datagrids • Mapping physical data to logical view • Mapping physical data and storage to logical view • SRB Statistics • Mapping physical data, storage and processes to logical view • Data Grid Language • Conclusion • What Now = work and sacrifices; What Next = Vision

  40. Conclusion • Data Grids are for real – they manage Inter/Intra/Multi-organizational unstructured data (files, streams, …) • Data Grids extend the database concepts and internally use a database • A language like Data Grid Language mentioned here is necessary for the proliferation and automation of Data Grid Management Systems (DGMS) • Reference: Paper in VLDB Workshop on Data Management in Grids

  41. We are SDSC SRB • Arun is here! (Shameless self-promotion) • Not in picture: Many students

  42. Additional Thanks (Ignorance is bliss) • My Advisor: “You already graduated, and have a job at a research firm. Now why are you writing to MS Research? Whom did you write to?” • Me: “I wrote to two people. The first person works on social communities; we can use service brokering for them. I have not got any response from him. But there is another person who did respond. His last name is of the color “Gray” and his web page is very cheesy with music in the background. I guess he does not do much computer science – he works with astronomers.”

  43. Contact Info Arun Jagatheesan arun@sdsc.edu Or srb@sdsc.edu http://www.sdsc.edu/srb/
