
Beyond Workflows - DOE Cloud Computing Paradigm and the SDM Role and Future

Mladen A. Vouk, Nagiza Samatova, Paul Breimyer, Pierre Mouallem, Mei Nagappan, and the whole SPA team (list available separately). Scientific Data Management Center – Scientific Process Automation Group.


Presentation Transcript


  1. Beyond Workflows - DOE Cloud Computing Paradigm and the SDM Role and Future. Mladen A. Vouk, Nagiza Samatova, Paul Breimyer, Pierre Mouallem, Mei Nagappan, and the whole SPA team (list available separately). Scientific Data Management Center – Scientific Process Automation Group, NC State University, Raleigh, NC 27695

  2. Overview
     • Scientific workflow technology
       – A success story from the past 7 years in the SDM center (a technology used, in production or otherwise, by application people)
       – Developed components: Workflows, Provenance, “Dashboard”, other
     • DOE SDM “Cloud”: vision for the future of the SDM center
       – Integration of components: Intelligent Analytics and Social Networks, Component-based “cloud”, Integrated Services (service-oriented architecture)
     • Sustainable science: long-term approach for the survival of SDM center technology (beyond SciDAC and longer)
       – Integration of Research, Engineering, Transfer-of-Technology, Partnerships, Results (ROI, TOC)

  3. Scientific Process Automation
     • A key differentiating element of a successful information technology (IT) is its ability to become a true, valuable, and economical contributor to cyberinfrastructure.
     • An IT-assisted workflow represents a series of structured activities and computations that arise in information-assisted problem solving.
     • Scientific process automation principles, as well as production-level pilots, are SDM’s key contribution over the last 7 years (Smokey Mountains retreat).
     • From NC State: numerous publications; 3 graduated PhD students and 4 MS students with thesis, several more in progress; several generations of software.

  4. Environment
     [Diagram labels: Analytics; Computations; Control Panels (Dashboard) & Display; Networking, Local/Remote … “Cloud” Services; Orchestration (Kepler); Data, Databases, Provenance, … Storage]

  5. Workflow Framework
     [Diagram labels: Control Plane (light data flows); Provenance, Tracking & Meta-Data (DBs and Portals); Kepler; Execution Plane (“heavy lifting” computations and flows); Synchronous or Asynchronous]
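
The split between a light-weight control plane and a heavy-lifting execution plane can be illustrated with a minimal Python sketch. This is not Kepler code and not the SDM implementation: the names `heavy_task`, `run_workflow`, `record`, and the in-memory `provenance` list are hypothetical stand-ins, and the "heavy" work is simulated with a sleep.

```python
import asyncio
import time

# Hypothetical stand-in for the provenance / meta-data store.
provenance = []


def record(step, status):
    """Log a light-weight tracking event (control-plane traffic)."""
    provenance.append({"step": step, "status": status, "time": time.time()})


async def heavy_task(name, seconds):
    """Execution plane: placeholder for a long-running computation."""
    await asyncio.sleep(seconds)
    return f"{name}:done"


async def run_workflow(steps, asynchronous=True):
    """Control plane: dispatch steps and record tracking events.

    steps is a list of (name, seconds) pairs. In asynchronous mode all
    steps are submitted at once; in synchronous mode each step must
    finish before the next one is submitted.
    """
    if asynchronous:
        for name, _ in steps:
            record(name, "submitted")
        results = await asyncio.gather(*(heavy_task(n, s) for n, s in steps))
        for name, _ in steps:
            record(name, "finished")
        return list(results)

    results = []
    for name, seconds in steps:
        record(name, "submitted")
        results.append(await heavy_task(name, seconds))
        record(name, "finished")
    return results


if __name__ == "__main__":
    print(asyncio.run(run_workflow([("simulate", 0.01), ("analyze", 0.01)])))
```

Only the small tracking records pass through the control plane here; whatever data the heavy tasks produce stays on the execution side, which is the point of the separation.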

  6. Actor/Process in a Broader Sense
     [Diagram labels: In; Out; Network/“Cloud”]
     Example of an actor that submits a batch job:
         bsub < code_run
     where code_run is a script:
         #! /bin/csh
         #BSUB -W 5
         #BSUB -n 100
         #BSUB -o /share/vouk/WFLOW/code.out.%J
         #BSUB -e /share/vouk/WFLOW/code.err.%J
         #BSUB -J codevouk
         source /usr/local/lsf/conf/cshrc.lsf
         mpiexec ./code
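
A workflow actor that wraps this submission step could look roughly like the Python sketch below. It is an illustration under assumptions, not the SPA team's actor: `submit` and `parse_job_id` are hypothetical helper names, and the parser assumes that `bsub` reports the new job in the form `Job <id> is submitted ...`, which should be checked against the local LSF installation.

```python
import re
import subprocess


def parse_job_id(bsub_output):
    """Extract the numeric job id from bsub's confirmation text.

    Assumes the message contains "Job <12345>"; returns None otherwise.
    """
    match = re.search(r"Job <(\d+)>", bsub_output)
    return int(match.group(1)) if match else None


def submit(script_path, run=subprocess.run):
    """Equivalent of `bsub < script_path`: feed the script to bsub on stdin.

    The run parameter exists only so that a stand-in can replace
    subprocess.run when no LSF cluster is available.
    """
    with open(script_path, "rb") as script:
        proc = run(["bsub"], stdin=script, capture_output=True, check=True)
    return parse_job_id(proc.stdout.decode())


if __name__ == "__main__":
    # Requires a working LSF installation and a code_run script in the
    # current directory.
    print(submit("code_run"))
```

Returning the job id gives the orchestrator a handle for later status polling, which is how an asynchronous actor would typically track the job.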

  7. Modular Framework
     [Diagram labels: Auth; Trust; Storage; Supercomputers + Analytics Nodes; Kepler; Data Store; Access; Rec API; Disp API; Dash; Management API; Orchestration; Meta-Data about: Processes, Data, Workflows, System, Apps & Environment]

  8. Read More …
     • Singh M.P. and M.A. Vouk, "Network Computing," in John G. Webster (editor), Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, New York, Vol. 14, pp. 114-132, 1999
     • S Klasky, M Beck, V Bhat, E Feibush, B Ludäscher, M Parashar, A Shoshani, D Silver and M Vouk, "Data management on the fusion computational pipeline," SciDAC 2005, Journal of Physics: Conference Series 16 (2005), 510-520, doi:10.1088/1742-6596/16/1/070
     • Ilkay Altintas, Oscar Barney, Zhengang Cheng, Terence Critchlow, Bertram Ludaescher, Steve Parker, Arie Shoshani and Mladen Vouk, "Accelerating the scientific exploration process with scientific workflows," SciDAC 2006, Journal of Physics: Conference Series 46 (2006), 468-478, doi:10.1088/1742-6596/46/1/065
     • M. A. Vouk, I. Altintas, R. Barreto, J. Blondin, Z. Cheng, T. Critchlow, A. Khan, S. Klasky, J. Ligon, B. Ludaescher, P. A. Mouallem, S. Parker, N. Podhorszki, A. Shoshani, C. Silva, "Automation of Network-Based Scientific Workflows," Proc. of the IFIP WoCo 9 on Grid-based Problem Solving Environments: Implications for Development and Deployment of Numerical Software, IFIP WG 2.5 on Numerical Software, Prescott, AZ, 2006, printed in IFIP, Vol 239, "Grid-Based Problem Solving Environments," eds. Gaffney PW and Pool JCT (Boston: Springer), pp. 35-61, 2007
     • Klasky, S.; Barreto, R.; Kahn, A.; Parashar, M.; Podhorszki, N.; Parker, S.; Silver, D.; Vouk, M.A. "Collaborative visualization spaces for petascale simulations," Proceedings of the CTS 2008 - International Symposium on Collaborative Technologies and Systems, pp 203-211, Digital Object Identifier 10.1109/CTS.2008.4543933, 10-23 May 2008
     • More… http://sdm.ncsu.edu

  9. DOE Cloud
     • “Cloud” computing builds on decades of research in virtualization, distributed computing, utility computing, grids, and more recently networking, web and software services.
     • It implies a seamless service-oriented and component-based architecture: delivery of an integrated and orchestrated suite of on-demand functions to an end-user through composition of both loosely and tightly coupled functions, or services, that are often network-based. It also implies reduced information technology overhead for the end-user, service orchestration, virtualization of resources, great flexibility, reduced total cost of ownership, and different “flavors”.
     • Intelligent Analytics and Knowledge-Creating Social Networks, Component-based “Clouds”, Seamless/Integrated Services
     • Necessary in the context of peta- and exa-scale sciences, data, etc.

  10. “Analytics Cloud”
     [Diagram labels: Knowledge creation & Integration, Social Networking, Provenance, Tracking & Meta-Data (DBs and Portals); Workflow control plane; Concept-driven Analytics; W/F Engine; W/F Generation Wizard; Synchronous & Asynchronous Services; Run-time Manager and Scheduler; Execution Plane: “heavy duty” in-cloud Computations, Flows, Services; Analytics Enabled Resources: Supercomputers, Clusters, Active Storage, Other “cloud” devices]

  11. Components
     • Reusability (elements can be re-used in other workflows)
     • Substitutability (alternative implementations are easy to insert, very precisely specified interfaces are available, run-time component replacement mechanisms exist, there is ability to verify and validate substitutions, etc.)
     • Extensibility and scalability (ability to readily extend the system component pool and to scale it, increase capabilities of individual components, have an extensible and scalable architecture that can automatically discover new functionalities and resources, etc.)
     • Customizability (ability to customize generic features to the needs of a particular scientific domain and problem)
     • Composability (easy construction of more complex functional solutions using basic components, reasoning about such compositions, etc.)
     There are other characteristics that are also very important:
     • Reliability and availability of the components and services
     • Cost: the cost of the services, total cost of ownership, economy of scale
     • Security and privacy, and so on.
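
Several of these properties can be made concrete with a small Python sketch. It is only an illustration under assumptions: the shared interface (a function from a list of floats to a list of floats) and the component names `compose`, `scale`, `drop_negative`, and `clamp_negative` are hypothetical and do not come from the SDM software.

```python
from typing import Callable, List

# The precisely specified interface that every component must satisfy.
Component = Callable[[List[float]], List[float]]


def compose(*components: Component) -> Component:
    """Composability: chain basic components into a larger one."""
    def pipeline(data: List[float]) -> List[float]:
        for component in components:
            data = component(data)
        return data
    return pipeline


def scale(factor: float) -> Component:
    """Customizability: a generic component configured per problem."""
    return lambda data: [x * factor for x in data]


def drop_negative(data: List[float]) -> List[float]:
    """One way of handling negative readings: discard them."""
    return [x for x in data if x >= 0]


def clamp_negative(data: List[float]) -> List[float]:
    """Substitutable alternative: replace negative readings with zero."""
    return [max(x, 0.0) for x in data]


# Reusability: scale(2.0) is used unchanged in both workflows.
# Substitutability: only the cleaning component differs.
workflow_a = compose(drop_negative, scale(2.0))
workflow_b = compose(clamp_negative, scale(2.0))

if __name__ == "__main__":
    samples = [1.0, -2.0, 3.0]
    print(workflow_a(samples))  # [2.0, 6.0]
    print(workflow_b(samples))  # [2.0, 0.0, 6.0]
```

Because every component honors the same interface, swapping `drop_negative` for `clamp_negative` requires no change to the rest of the workflow, and the result of `compose` is itself a component that can be composed again.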

  12. Example: Meta-Data Framework
     [Diagram labels: Auth; DB; Rec API; Disp API; Storage; Supercomputers + Analytics; Kepler? Other…; Dash; Custom Web; Orchestration]

  13. Fault-Tolerance – Clouds of Clouds
     [Diagram label: Master DB (replicated)]

  14. User Categories
     • Developers (10)
     • Service Authors (100 to 1,000)
     • Service Integrators (100 to 10,000)
     • End-users (1,000 to ?)

  15. Read More …
     • Sam Averitt, Michael Bugaev, Aaron Peeler, Henry Shaffer, Eric Sills, Sarah Stein, Josh Thompson, Mladen Vouk, “Virtual Computing Laboratory (VCL),” in the Proceedings of the International Conference on Virtual Computing Initiative, May 7-8, 2007, IBM Corp., Research Triangle Park, NC, pp. 1-16.
     • Mladen Vouk, Sam Averitt, Michael Bugaev, Andy Kurth, Aaron Peeler, Andy Rindos*, Henry Shaffer, Eric Sills, Sarah Stein, Josh Thompson, “‘Powered by VCL’ - Using Virtual Computing Laboratory (VCL) Technology to Power Cloud Computing,” published in the Prelim. Proceedings of the 2nd International Conference on Virtual Computing Initiative, 15-16 May 2008, RTP, NC, pp. 1-10, final version to be available through the ACM Digital Library
     • Mladen A. Vouk, “Cloud Computing – Issues, Research and Implementations,” ITI08, to appear in IEEE Digital Library
     • Google for “cloud computing” …
     • Other …

  16. Sustainable Science
     • A long-term approach for the survival of SDM center technology (beyond SciDAC and longer)
     • Research
     • Engineering
     • Transfer-of-Technology
     • Partnerships with scientists
     • Operational open-source tools
     • Visible results (agreed-upon ROI, and an accounting of TOC)
