
Microsoft Azure Research Engagement

Microsoft Azure Research Engagement. Dennis Gannon, Roger Barga, Jeff Mendenhall. Outline. Part 1. Context setting Microsoft’s goals for this project Defining the cloud and differentiating it from supercomputers Part 2. Engagement strategy

Presentation Transcript


  1. Microsoft Azure Research Engagement Dennis Gannon, Roger Barga, Jeff Mendenhall

  2. Outline • Part 1. Context setting • Microsoft’s goals for this project • Defining the cloud and differentiating it from supercomputers • Part 2. Engagement strategy • Tutorials, workshops, sample applications and consulting • Part 3. Windows Azure • Architecture • Use cases and programming models

  3. Microsoft’s Goals for this Project • Demonstrate that a client+cloud model can revolutionize research and learning • Illustrate that cloud computing is a cost-effective and easy-to-use way to outsource select components of research infrastructure • Provide feedback from research community to our product groups • Establish the Microsoft Cloud Computing platform as leader and trendsetter for basic research

  4. The Cloud • A model of computation and data storage based on “pay as you go” access to “unlimited” remote data center capabilities • A cloud infrastructure provides a framework to manage scalable, reliable, on-demand access to applications • A cloud is the “invisible” backend to many of our mobile applications • Historical roots in today’s Internet apps • Search, email, social networks • File storage (Live Mesh, MobileMe, Flickr, …)

  5. The Physical Architecture of Clouds

  6. Clouds are built on Data Centers • Range in size from “edge” facilities to megascale • Economies of scale • Approximate costs for a small center (1,000 servers) and a larger, 100K-server center • Each data center is 11.5 times the size of a football field

  7. Advances in DC deployment • Conquering complexity. • Building racks of servers & complex cooling systems all separately is not efficient. • Package and deploy into bigger units

  8. Data Center vs Supercomputers • Scale • Blue Waters = 40K 8-core “servers” • Roadrunner = 13K Cell + 6K AMD servers • MS Chicago Data Center = 50 containers = 100K 8-core servers • Network Architecture • Supercomputers: Clos “fat tree” InfiniBand • Low-latency, high-bandwidth protocols • Data Center: IP based • Optimized for Internet access • Data Storage • Supercomputers: separate data farm • GPFS or other parallel file system • DCs: use disk on node + memcache + databases [Figure: a supercomputer fat-tree network beside a standard data center network]

  9. Common HPC Programming Paradigm • Domain decomposition • Spreads vital data across all nodes • Each spatial cell exists in one memory • Except possibly for ghost or halo cells • Single node failure • Stalls the entire simulation • Data is lost and must be recovered • Checkpointing is the de facto HPC solution • Periodically write all data to secondary storage • Given the failure rate, one can compute an optimal checkpoint interval
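The slide does not say how the optimal interval is computed. One widely used answer is Young's first-order approximation, T_opt ≈ sqrt(2 · δ · M), where δ is the time to write one checkpoint and M is the mean time between failures (MTBF) of the whole machine. The Python sketch below assumes independent node failures, so the system MTBF is the node MTBF divided by the node count; the function names and example numbers are illustrative and do not come from the talk.

```python
import math


def system_mtbf(node_mtbf_s: float, nodes: int) -> float:
    """Mean time between failures of the whole machine.

    Assumes independent, exponentially distributed node failures, so the
    system fails `nodes` times as often as a single node.
    """
    return node_mtbf_s / nodes


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the optimal checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint (seconds)
    mtbf_s: system mean time between failures (seconds)
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical numbers: 40,000 nodes, each failing once per 40,000 hours,
    # and a checkpoint that takes 5 minutes to write.
    m = system_mtbf(node_mtbf_s=40_000 * 3600, nodes=40_000)
    t = young_checkpoint_interval(checkpoint_cost_s=300, mtbf_s=m)
    print(f"system MTBF: {m:.0f} s, checkpoint every {t / 60:.1f} min")
```

With these made-up figures the machine fails about once an hour, and a 5-minute checkpoint should be written roughly every 24.5 minutes, which illustrates why a single node failure is so costly at scale.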

  10. Cloud Data Architectures • A close integration of data with computation • “Move the computation to the data” – Jim Gray • Data is stored on server disks • Optimized more for reads than writes • Data replication • 3 to 5 copies of each data object • Copies are distributed • Unstructured data • “Blob” storage: basic metadata + binary object • Streaming data from instruments • Structured data • Tables – billions of rows and columns • Tables are partitioned into blocks of rows, and the blocks are distributed and replicated • Databases – replicated relational databases
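To make the structured-data point concrete, here is a minimal Python sketch that splits a table into fixed-size blocks of rows and places three copies of each block on distinct servers. The block size, the rotating round-robin placement and the server names are assumptions for illustration only; production systems typically also take failure domains and load into account.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    first_row: int        # inclusive
    last_row: int         # inclusive
    replicas: list[str]   # names of the servers holding a copy


def partition_table(num_rows, rows_per_block, servers, copies=3):
    """Split rows 0..num_rows-1 into blocks and assign `copies` distinct servers to each."""
    if copies > len(servers):
        raise ValueError("need at least as many servers as copies")
    blocks = []
    for block_id, start in enumerate(range(0, num_rows, rows_per_block)):
        end = min(start + rows_per_block, num_rows) - 1
        # Rotate the starting server per block so copies spread across servers;
        # consecutive offsets modulo len(servers) are distinct while copies <= len(servers).
        replicas = [servers[(block_id + k) % len(servers)] for k in range(copies)]
        blocks.append(Block(block_id, start, end, replicas))
    return blocks


def locate(row, rows_per_block, blocks):
    """Find the block (and hence the replicas) that hold a given row."""
    return blocks[row // rows_per_block]


if __name__ == "__main__":
    for b in partition_table(10, 4, ["s0", "s1", "s2", "s3"]):
        print(b)
```

Because every block lives on three different servers, losing any one server still leaves two readable copies of each block.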

  11. Data Center vs Supercomputer Apps • Supercomputer • Highly parallel, tightly synchronized MPI simulations • Supercomputer or Cloud • Large-scale, loosely coupled data analysis • Cloud • Scalable, parallel, resilient web services [Figure labels: HPC supercomputer; data-center-based cloud; Internet; MapReduce; data parallel; MPI communication]

  12. Cloud Software Models

  13. The Cloud Landscape • Infrastructure as a Service (IaaS) • Provide a data center and a way to host client VMs and data. • Platform as a Service (PaaS) • Provide a programming environment to build a cloud application • The cloud deploys and manages the app for the client • Software as a Service (SaaS) • Delivery of software from the cloud to the desktop

  14. Platform as a Service • An application development, deployment and management fabric • The user programs the web service front end and the computational and data services • The framework manages deployment and scale-out • No need to manage VM images • Examples: Microsoft Azure, Google App Engine, RightScale, SalesForce, Rollbase, Bungee, Cloudera [Figure: app users reach the web access layer over the Internet; the app developer works through the PaaS dev/deploy fabric; a fabric controller manages the VMs of the data & compute layer on servers 1 through n]

  15. The Cloud Landscape [Figure: Infrastructure as a Service, Platform as a Service, Software as a Service]

  16. The Future: an Explosion of Data • Sources: experiments, simulations, archives, literature, instruments • Petabytes, doubling every 2 years • The Challenge: Enable Discovery. Deliver the capability to mine, search and analyze this data in near real time. • Enhance our Lives. Participate in our own health care. Augment experience with deeper understanding.

  17. Changing Nature of Discovery • Complex models • Multidisciplinary interactions • Wide temporal and spatial scales • Large multidisciplinary data • Real-time streams • Structured and unstructured • Distributed communities • Virtual organizations • Socialization and management • Diverse expectations • Client-centric and infrastructure-centric http://research.microsoft.com/en-us/collaboration/fourthparadigm/

  18. Changing the way we do research • The Branscomb Pyramid [Figure: supercomputer users at the top, then those who have their own small cluster or servers, then the rest of us] • The Rest of Us • Use laptops • Our data collections are not as big as we wished • Our tools are limited • Paradigm shifts for research • Google, Yahoo and MS proved the power of the cloud with search • The game changer: the ability to query anything, anytime, anywhere • The cloud is also designed to support very large numbers of users or communities • Data collections are the first step • The second step: build the apps that run on client devices and the cloud that can exploit these collections

  19. The Clients+Cloud Platform • At one time the “client” was a PC + browser. Now • The Phone • The laptop/tablet • The TV/Surface/Media wall • And the future • The instrumented room • Aware and active surfaces • Voice and gesture recognition • Knowledge of where we are • Knowledge of our health

  20. The Cloud as an extension of your desktop and other client devices • Today • Cloud storage for your data files, synchronized across all your machines (MobileMe, Live Mesh, Flickr, etc.) • Your collaboration space (Sakai, SharePoint) • Cloud-enabled apps (Google Apps, Office Live) • Tomorrow (or even sooner) • The lens that magnifies the power of the desktop • Operate on a table with a billion rows in Excel • MATLAB analysis of a thousand images in parallel

  21. Our Metrics of Success • Projects that advance scientific discovery through novel uses of cloud technology • New ways to expose and explore community data collections • Advances in client + cloud tools and programming models • Finding cloud applications that reach beyond the “traditional” e-Science community • With NSF, we build a model for and examples of SUSTAINABLE cloud services, tools and communities

  22. Things we need from NSF • Help with a timeline for the expected progress of the program • How can we help NSF formulate a CFP? • Agreement on the nature of the program and our shared goals

  23. Questions?

  24. Windows Azure: Building Community around Cloud Computing for Research

  25. CCF Academic Research Engagement: Resources to Build Community around Cloud Computing for Research • PowerPoint tutorial for a general overview of Windows Azure • Whitepaper that presents a technical overview and best practices for developing and deploying research services on Windows Azure • Benchmark suite as a guide to application architects and developers • Host reference data sets for research, based on research value/interest • Kickoff Workshop and Annual All Hands Meeting (AHM) at MSR • Technical engagement team, accessible via ccfengage@microsoft.com (tbc) • Community website, regularly updated with technical content, blogs, community-supplied content, Q&A, etc.

  26. Azure Tutorial(s) • Extended version of the SuperComputing’09 tutorial with deep dives on Azure storage, including Blobs, Tables, XDrives, and new Azure features (85 slides) • Available January 11th, 2010

  27. CCF Academic Research Engagement: Readily Available Online Content

  28. Whitepaper: Resource for Decision Makers and Developers • Presents an overview of Azure; how we built select research applications; best practices for developing applications and deploying research services; and links to source code intended to accelerate development • Introduces benchmarks and outlines results to inform application development • Available February 1st, 2010

  29. Reference Data Initiative • Thematic focus: the goal is to have the top two or three research collections on Azure in each thematic area • Health & Bio • Energy & Environment • Computer Science • Tool Ecosystem for Managing Data Collections • Sustainability and Egress Guarantees

  30. Azure Benchmark: A Resource for Programmers and Architects to Understand Azure • "There are lies, damn lies and then there are performance measures." J. Gray • Storage throughput, networking, and role tests • Guide for decision makers (when to use) and developers (how to use)

  31. Azure Benchmark: A Resource for Programmers and Architects to Understand Azure • Extensible test harness • Suite of tests, able to select and schedule repeated runs and catalog results • Guide for decision makers (when to use) and developers (how to use) • Microbenchmarks • Storage throughput, networking, and role tests • End-to-end algorithm benchmarks • Spectrum of distributed algorithms, from tightly coupled to totally decoupled • Illustrates scalability for pleasingly parallel algorithms and the overheads (limits) of the current network and I/O architecture (coordination through queues, latency to the storage fabric) • Targeted benchmarks on unique Azure features • Failure recovery (inject a fault, measure the time to automatically restart the worker)

  32. CCF Academic Research Engagement: Supporting a Community of Researchers [Figure: community website mock-up] • Search; Examples, Data Source and Application menus • Community: colleagues, colleagues’ projects • Whitepapers: Azure Ocean, BLAST on Azure, Azure benchmarks • Projects: current, recent, archived • Getting Started: sandbox to experiment with research services • Resources: tutorials, whitepapers, hands-on labs, code samples, services for research applications • Quick Links: need help, account management, my account

  33. Questions?

  34. Windows Azure: Cloud Computing for Research Services

  35. Windows Azure in a Nutshell • Provide a brief overview of Windows Azure • Additional information in the Technical Tutorial, tentatively scheduled for January • Examples of Research Services on Windows Azure • Illustration of Research Services

  36. A bunch of machines in a data center • The highly available Fabric Controller (FC) owns this hardware

  37. FC Installs An Optimized Hypervisor

  38. FC Installs A Host Virtual Machine (VM)

  39. FC then Installs the Guest VM

  40. Up to 7 of Them to be Exact

  41. Each VM Has… • At Minimum • CPU: 1.5-1.7 GHz x64 • Memory: 1.7GB • Network: 100+ Mbps • Local Storage: 500GB • Up to • CPU: 8 Cores • Memory: 14.2 GB • Local Storage: 2+ TB

  42. FC Then Installs the Azure Platform: Compute and Storage

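  43. Windows Azure Compute Service: A closer look [Figure: HTTP requests pass through a load balancer to Web Role instances (IIS hosting ASP.NET, WCF, etc.); Worker Role instances run a main() { … } loop; each role instance runs in a VM alongside a fabric agent]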

  44. Suggested Application Model: Using queues for reliable messaging • 1) Web Role (ASP.NET, WCF, etc.) receives work • 2) Web Role puts work in the queue • 3) Worker Role (main() { … }) gets work from the queue • 4) Worker Role does the work • To scale, add more of either role
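The four steps can be sketched in Python with an in-process queue standing in for the storage queue and threads standing in for role instances. This is only a model of the pattern under those assumptions: `queue.Queue` is not Windows Azure Queue storage, and the "work" (upper-casing a string) is a placeholder.

```python
import queue
import threading


def web_role(requests, work_queue):
    """Steps 1-2: receive work (e.g. from HTTP requests) and put it in the queue."""
    for req in requests:
        work_queue.put(req)


def worker_role(work_queue, results, lock):
    """Steps 3-4: get work from the queue and do it, until told to stop."""
    while True:
        item = work_queue.get()
        if item is None:  # sentinel meaning "no more work"
            return
        result = item.upper()  # placeholder for the real computation
        with lock:
            results.append(result)


def run(requests, num_workers=3):
    """Scale out by changing num_workers; the web role code is unaffected."""
    work_queue = queue.Queue()
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=worker_role, args=(work_queue, results, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    web_role(requests, work_queue)
    for _ in workers:
        work_queue.put(None)  # one stop signal per worker
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(sorted(run(["job-a", "job-b", "job-c"], num_workers=2)))
```

The queue is the only thing the two roles share, so either side can be scaled independently; results arrive in whatever order the workers finish.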

  45. Scalable, Fault Tolerant Applications • Queues are the application glue • Decouple parts of application, easier to scale independently; • Resource allocation, different priority queues and backend servers • Mask faults in worker roles (reliable messaging). • Use Inter-role communication for performance (PDC’09) • TCP communication between role instances • Define your ports in the service models
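Reliable messaging masks worker faults because a dequeued message is hidden rather than removed: if the worker fails before explicitly deleting it, the message becomes visible again after a timeout and another worker picks it up. Windows Azure queues behave this way; the `ReliableQueue` class below is a hypothetical in-memory model of that behaviour (not the Azure SDK), with an explicit clock argument so the example is deterministic.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    body: str
    visible_at: float = 0.0   # hidden from get() until this time
    dequeue_count: int = 0


class ReliableQueue:
    """Hypothetical in-memory queue with a visibility timeout (not the Azure SDK)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}
        self._ids = itertools.count()

    def put(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = Message(msg_id, body)

    def get(self, now):
        """Return the oldest visible message and hide it for visibility_timeout."""
        for msg_id in sorted(self._messages):
            msg = self._messages[msg_id]
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout
                msg.dequeue_count += 1
                return msg
        return None

    def delete(self, msg):
        """Called by a worker only after it has finished the work."""
        self._messages.pop(msg.msg_id, None)

    def __len__(self):
        return len(self._messages)


if __name__ == "__main__":
    q = ReliableQueue(visibility_timeout=30)
    q.put("render tile 17")
    m = q.get(now=0)      # worker A takes the message, then crashes without deleting it
    m = q.get(now=31)     # the message has reappeared; worker B takes it
    print(m.body, m.dequeue_count)
    q.delete(m)           # B finishes the work and deletes the message
    print(len(q))
```

Because a message can be delivered more than once, the work done per message should be idempotent. The real service also ties each delete to the most recent dequeue through a pop receipt, which this sketch omits.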

  46. Storage [Figure: Blob, Queue and Table services exposed through a REST API behind a load balancer]

  47. Azure Storage Service: A closer look [Figure: an application running on the compute fabric reaches Blobs, Drives, Tables and Queues in storage over HTTP]

  48. Windows Azure Storage: Points of interest • Storage types • Blobs: simple interface for storing named files along with metadata for each file • Drives: durable NTFS volumes • Tables: entity-based storage. Not relational: entities contain a set of properties • Queues: reliable message-based communication • Access • Data is exposed via .NET and RESTful interfaces • Data can be accessed by: • Windows Azure apps • Other on-premises or cloud applications
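"Not relational" means there is no fixed schema: each entity is a bag of properties, and two entities in the same table may carry different properties. In Windows Azure Tables every entity has a PartitionKey and a RowKey that together identify it. The `EntityTable` class below is a hypothetical in-memory model of that idea, not the storage client API; the sensor data in the example is made up.

```python
from collections import defaultdict


class EntityTable:
    """Hypothetical in-memory model of entity-based (non-relational) table storage.

    Entities are dicts of properties; PartitionKey + RowKey identify an entity.
    There is no schema, so entities may carry different properties.
    """

    def __init__(self):
        # PartitionKey -> {RowKey: entity}
        self._partitions = defaultdict(dict)

    def upsert(self, entity):
        pk, rk = entity["PartitionKey"], entity["RowKey"]
        self._partitions[pk][rk] = dict(entity)  # store a copy

    def get(self, pk, rk):
        return self._partitions.get(pk, {}).get(rk)

    def query_partition(self, pk, predicate=lambda e: True):
        """Entities in one partition, in RowKey order, filtered by predicate."""
        rows = self._partitions.get(pk, {})
        return [rows[rk] for rk in sorted(rows) if predicate(rows[rk])]


if __name__ == "__main__":
    t = EntityTable()
    t.upsert({"PartitionKey": "buoy-42", "RowKey": "2009-12-01T00:00", "TempC": 11.2})
    t.upsert({"PartitionKey": "buoy-42", "RowKey": "2009-12-01T01:00", "TempC": 11.0,
              "Salinity": 35.1})  # extra property, no schema change needed
    warm = t.query_partition("buoy-42", lambda e: e["TempC"] > 11.1)
    print([e["RowKey"] for e in warm])
```

Grouping related entities under one PartitionKey keeps queries such as "all readings from one instrument" within a single partition.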

  49. Windows Azure Drives • Provides a durable NTFS volume for Windows Azure applications to use: a VHD of up to 1 TB, 4 drives per VM • Enables the following scenarios • Gives applications NTFS semantics to manage state • Helps migrate existing NTFS applications to the cloud • Durability and survival of data on VM failover • A Windows Azure Drive is really a blob • Mount a Page Blob as D:\ • All writes to the drive are made durable to the Page Blob • The drive is made durable through standard page blob replication • The drive persists as a page blob even when not mounted

  50. How Windows Azure Drives Work • A VM mounts the blob as a drive via a lease mechanism • Writes are committed to the blob store before returning • Reads can be served from the local cache or from the blob store (on a cache miss) • WA Drive commands: Create/Format Drive, Mount/Unmount Drive, Snapshot Drive, Copy Drive [Figure: inside the VM, the application and OS see drive X:, backed by a local cache and a leased page blob (MyBlob) in the Windows Azure Blob Store]
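The mount, write and read paths on this slide can be modelled in a few lines of Python. `BlobStore` and `MountedDrive` are hypothetical stand-ins, not the Windows Azure Drive API: in this sketch the lease lets only one VM mount the blob at a time, writes go through to the store before returning, and reads check the local cache first.

```python
class BlobStore:
    """Hypothetical stand-in for durable page-blob storage with leases."""

    def __init__(self):
        self.pages = {}    # (blob name, page number) -> bytes
        self.leases = {}   # blob name -> lease holder
        self.reads = 0     # how many reads reached the store

    def acquire_lease(self, blob, holder):
        current = self.leases.get(blob)
        if current is not None and current != holder:
            raise RuntimeError(f"{blob} is already leased by {current}")
        self.leases[blob] = holder

    def release_lease(self, blob, holder):
        if self.leases.get(blob) == holder:
            del self.leases[blob]

    def write_page(self, blob, page_no, data):
        self.pages[(blob, page_no)] = data

    def read_page(self, blob, page_no):
        self.reads += 1
        return self.pages.get((blob, page_no), b"")


class MountedDrive:
    """Hypothetical drive mounted on one VM, backed by a leased blob."""

    def __init__(self, store, blob, vm):
        self.store, self.blob, self.vm = store, blob, vm
        self.cache = {}
        store.acquire_lease(blob, vm)  # fails if another VM holds the lease

    def write(self, page_no, data):
        # Write-through: the page is committed to the store before returning.
        self.store.write_page(self.blob, page_no, data)
        self.cache[page_no] = data

    def read(self, page_no):
        if page_no not in self.cache:  # cache miss: fetch from the store
            self.cache[page_no] = self.store.read_page(self.blob, page_no)
        return self.cache[page_no]

    def unmount(self):
        self.cache.clear()
        self.store.release_lease(self.blob, self.vm)


if __name__ == "__main__":
    store = BlobStore()
    drive = MountedDrive(store, "MyBlob", vm="vm1")
    drive.write(0, b"results")
    drive.unmount()                       # e.g. vm1 is taken out of service
    drive = MountedDrive(store, "MyBlob", vm="vm2")
    print(drive.read(0), store.reads)     # data survived; one read reached the store
```

Since every write lands in the blob store before `write` returns, losing the VM and its local cache loses no committed data; the next VM to take the lease simply starts with a cold cache.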
