This presentation explores the future of data grid systems, focusing on self-organizing and smart namespaces. We will discuss the state of the art, current research, and exciting innovations in distributed data management. Key topics include the Large Synoptic Survey Telescope (LSST) use case, opportunities for collaboration across organizations like OGF and SNIA, and the importance of user-defined policies and workflows. Attendees will gain insights into creating user-friendly data environments that facilitate seamless interaction and understanding within diverse system architectures.
Self-organizing Smart Namespaces: Next Generation Data Grid Systems Arun Jagatheesan iRODS.org
Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF?? • Standard for distributed data management • Risks, rewards
State of the art - where we are now (Shameless self promotion or fact!) • Estimated 2 petabytes of data brokerage • Multiple agencies - DoD, NARA, NSF, NIH, … • Multiple countries - US, UK, Japan, France, …, Antarctica • Spun off a private company … • We don’t live in the past anyway…
Concepts and Lessons (Current understanding - looking back) • Don’t hide distributed computing • Allows users to “enjoy” a distributed namespace rather than cheating them with a “location opaque” namespace (unlike traditional file systems) • Human readable or enjoyable (no URLs, UUIDs etc.) • Logical mappings to physical heterogeneities • Data (files), storage resources, metadata, user groups, policies, and even file systems become logical entities in data grids • Hide everything behind logical, human-friendly names • Keep it simple and scalable (it’s the data model & design) • Not one layer on top of another. A finished product, not LEGO blocks. • Hybrid approach - neither too much P2P nor too much centralization. Just the right level of distributed computing with some TLC for users
Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • A use case - LSST • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF?? • Standard for distributed data management • Risks, rewards
Motivational Use Case • LSST = Large Synoptic Survey Telescope • 150+ Petabytes • Multiple countries, multiple data centers • Multiple heterogeneous file systems (high performance, high distribution, interoperability, P2P, …) • Multiple heterogeneous hardware
Yesterday’s research • Data Grid Workflows and policies • Some concepts prototyped in SRB Matrix • Event, Condition, Action (ECA) based “data grid flows” • If, for, for-each, if-else, switch-case • Server-side workflows on data grids • Use a separate language, the Data Grid Language, to capture the recipe of the workflow and execute it as an action • Let the flow be with you (Flow data type was introduced)
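To make the "data grid flow" idea concrete, here is a minimal sketch of a server-side interpreter that runs a workflow recipe built from if / for-each / action steps. It is not the Data Grid Language or SRB Matrix syntax; the step layout (`kind`, `test`, `body`, and so on) and the action names are assumptions made purely for illustration.

```python
from typing import Any, Dict, List


def run(step: Dict[str, Any], env: Dict[str, Any], log: List[str]) -> None:
    """Execute one step of a (hypothetical) flow recipe, recording actions in log."""
    kind = step["kind"]
    if kind == "action":
        # A real action would call a data grid service; here we only record it.
        log.append(step["name"].format(**env))
    elif kind == "if":
        branch = step["then"] if step["test"](env) else step.get("else", [])
        for child in branch:
            run(child, env, log)
    elif kind == "for-each":
        for item in step["items"]:
            # Bind the loop variable in a copy so sibling iterations stay isolated.
            scoped = dict(env, **{step["var"]: item})
            for child in step["body"]:
                run(child, scoped, log)
    else:
        raise ValueError(f"unknown step kind: {kind}")


# The recipe is plain data, so it can be shipped to the server and run there.
flow = {
    "kind": "for-each",
    "var": "file",
    "items": ["a.fits", "b.txt"],
    "body": [{
        "kind": "if",
        "test": lambda env: env["file"].endswith(".fits"),
        "then": [{"kind": "action", "name": "checksum {file}"}],
        "else": [{"kind": "action", "name": "skip {file}"}],
    }],
}

log: List[str] = []
run(flow, {}, log)
print(log)
```

Keeping the recipe separate from the interpreter is the point of the sketch: the same flow description can be stored, inspected, and executed as a single action on the server side.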
Today’s research = future • Now = Lessons learnt + yesterday’s research • Allow logical namespace to reflect local namespace (local file system logically mounted on global namespace) • Allow users to define their own policies and workflows (Services, rules) • iRODS.org - Open source platform - world’s first open source Data Grid Management System (DGMS).
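One way to picture a local file system being logically mounted on the global namespace is a mount table that maps a logical prefix to a local directory. The sketch below assumes such a table; the paths, the `mounts` dictionary, and the `to_local` helper are hypothetical and do not describe how iRODS implements mounting.

```python
import posixpath
from typing import Dict, Optional

# Hypothetical mount table: logical prefix in the grid -> local directory.
mounts: Dict[str, str] = {
    "/grid/home/arun/laptop": "/home/arun/data",
}


def to_local(logical: str) -> Optional[str]:
    """Translate a logical grid path to a local path, or None if not mounted."""
    # Try the longest mount point first so nested mounts resolve correctly.
    for mount in sorted(mounts, key=len, reverse=True):
        if logical == mount or logical.startswith(mount + "/"):
            rest = logical[len(mount):].lstrip("/")
            return posixpath.join(mounts[mount], rest) if rest else mounts[mount]
    return None


print(to_local("/grid/home/arun/laptop/run1/a.fits"))
print(to_local("/grid/elsewhere/a.fits"))
```

Because the translation is a pure lookup, users keep seeing the familiar local directory layout under a logical name in the grid.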
iRODS.org • It’s all about the namespace and how users or applications interact with it • What if we made this namespace “smart”? • ECA Rules + Machine Learning or bootstrapped learning • Event: (anything, as simple as a file upload) • Condition: based on system or user metadata • Action: Any system-defined or user-defined service
iRODS • Namespace #1 (data) • Human readable data names to data (or virtual data) • Namespace #2 (resource) • Human readable resource names to storage resource (allows distributed computing) • Namespace #3 (policies) • Human readable policy namespace of how data needs to be managed • Again, everything can be accessed and controlled by end users (not just system administrators)
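The Event / Condition / Action split can be sketched as a tiny rule engine. This is an illustration only, not the iRODS rule language or API: the `Rule` and `RuleEngine` classes, the `replicate` service, and the metadata keys are all assumed names, and the learning part of the slide is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Metadata = Dict[str, str]


@dataclass
class Rule:
    event: str                               # e.g. "upload"
    condition: Callable[[Metadata], bool]    # test on system or user metadata
    action: Callable[[str, Metadata], str]   # system- or user-defined service


@dataclass
class RuleEngine:
    rules: List[Rule] = field(default_factory=list)

    def register(self, rule: Rule) -> None:
        self.rules.append(rule)

    def fire(self, event: str, name: str, meta: Metadata) -> List[str]:
        """Run every action whose event matches and whose condition holds."""
        return [r.action(name, meta) for r in self.rules
                if r.event == event and r.condition(meta)]


def replicate(name: str, meta: Metadata) -> str:
    # Hypothetical user-defined service; a real one would copy the data.
    return f"replicate {name} to archive"


engine = RuleEngine()
engine.register(Rule("upload", lambda m: m.get("project") == "LSST", replicate))

print(engine.fire("upload", "/grid/lsst/image.fits", {"project": "LSST"}))
print(engine.fire("upload", "/grid/home/notes.txt", {"project": "other"}))
```

Since conditions and actions are ordinary callables, end users can register their own without changing the engine, which mirrors the user-defined policies the slides call for.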
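The three namespaces can be pictured as three lookup tables that resolve human-readable names to physical details. The sketch below uses plain dictionaries; every name, path, and policy string in it is a made-up example, and a real data grid would keep these mappings in a metadata catalog rather than in memory.

```python
from typing import Dict, List, Tuple

# Namespace #2 (resource): logical resource name -> physical storage location.
resources: Dict[str, str] = {
    "sdsc-disk": "/mnt/sdsc/vault",
    "uk-tape": "/mnt/uk/tape",
}

# Namespace #1 (data): logical data name -> replicas as (resource, physical file).
data: Dict[str, List[Tuple[str, str]]] = {
    "/grid/lsst/night1/image.fits": [("sdsc-disk", "a1b2.dat"), ("uk-tape", "77.dat")],
}

# Namespace #3 (policies): policy name -> how the data needs to be managed.
policies: Dict[str, str] = {
    "keep-two-copies": "every file is stored on at least two resources",
}


def resolve(logical_name: str) -> List[str]:
    """Return the physical paths of all replicas of a logical data name."""
    return [f"{resources[res]}/{phys}" for res, phys in data[logical_name]]


def satisfies_two_copies(logical_name: str) -> bool:
    """Check the (hypothetical) keep-two-copies policy for one data name."""
    return len({res for res, _ in data[logical_name]}) >= 2


print(resolve("/grid/lsst/night1/image.fits"))
print(satisfies_two_copies("/grid/lsst/night1/image.fits"))
```

Users only ever handle the logical names on the left; the physical paths on the right can change without breaking anything built on those names.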
Content Outline • State of the art • Where we stand • Concepts • What is next, new, hot and exciting? • A use case - LSST • Yesterday’s research - now • Today’s research - future? • What could be done from OGF, SNIA, IETF?? • Standard for distributed data management • Risks, rewards
OGF, SNIA and iRODS.org • Collaborative data management • FAN / Data grid??? - but still distributed data management • Still needs a simple standard API • Data grid namespace on XAM resources • Standardize a simple API (Java, C/C++) to provide data grid concepts on top of existing SNIA XAM or products • Open source data grid software • Involve engineers from different participating member organizations • Multi-institutional participation • Multiple countries, multiple companies, academic and commercial participants
Enthusiasm is contagious http://www.iRODS.org