1 / 28

LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server WSV205

LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server WSV205. Saptak Sen Senior Product Manager Microsoft Technical Computing. Session Objectives and Takeaways. Session Objective(s): Understand Microsoft solution for Big Data

haruko
Download Presentation

LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server WSV205

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LINQ to HPC: Developing “Big Data” Applications on Windows HPC ServerWSV205 Saptak Sen Senior Product Manager Microsoft Technical Computing

  2. Session Objectives and Takeaways • Session Objective(s): • Understand Microsoft solution for Big Data • How to use and develop LINQ to HPC applications • Demo of LINQ to HPC/DSC on HPC, Microsoft’s solutions for unstructured Big Data • Key Takeaways: • LINQ to HPC/DSC provide a highly productive stack for writing big data applications. • Demos • Data management with DSC • Application development in LINQ to HPC • Application management of LINQ to HPC applications

  3. Characteristics of Big Data Large Data Volume Non-Traditional data Types New Questions & New Insights • 100s of TBs to 10s of PBs • Unstructured • Weak relational schema • Text, Images, Videos, Logs • How popular is my product? • What is the best ad to serve? • Is this a fraudulent transaction? New Technologies New Data Sources • Distributed Parallel Processing Frameworks • Easy to Scale on commodity hardware • MapReduce-style programming models • Sensors • Devices • Traditional applications • Web Servers • Public data New Economics • Large scale processing and analytics at unprecedented low cost (hardware and software) 4

  4. Example: Traditional e-commerce data flow

  5. New exploratory e-commerce data flow

  6. Introduction to LINQ to HPC Developing Big Data applications for HPC Server

  7. Example: find web pages from many log files LINQ query transformed into computation graph varlogentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access inlogentries whereaccess.user.EndsWith(@"\sen") select access; var accesses = from access in user group access byaccess.pageinto pages select new UserPageCount(“sen", pages.Key, pages.Count()); varhtmAccesses = from access in accesses whereaccess.page.EndsWith(".htm") orderbyaccess.countdescending select access; Input 1 Compute 2 3 Compute and resort Compute and resort 4 5 Output

  8. LINQ to HPC Job Directed Acyclic Graph (DAG) of vertices Outputs Processing vertices Edges (files) Inputs

  9. Executes DAGs by mapping vertices to Distributed Vertex Hosts Outputs Processing vertices Edges (files) Free Compute Resources Inputs

  10. HPC + LINQ to HPC Job Overview Graph Manager Application that calls LINQ to HPC APIs 2a 3a 1 3b DSC 2b HPC Head Node Vertex Host 2a 2b 3a 3b The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH A LINQ to HPC job starts 1 basic task assigning a node as the DGM 1 Graph Manager starts/stops Vertices LINQ to HPC Vertices read and write files Submit LINQ to HPC Job HPC Compute Nodes

  11. HPC + LINQ to HPC Job Overview • Graph manager starts vertices on Vertex Hosts • Preferentially schedules vertices near input files • When input is already on cluster, can make local IO the common case Graph Manager 3a 3b Vertex Host 3a 3b Graph Manager starts/stops Dryad Vertices Vertices read and write files HPC Compute Nodes Vertices in logical computation graph

  12. More on HPC + LINQ to HPC mechanics 3a DGM reads XML description of graph from share, calls DSC to locate files referenced in XML LINQ to HPC Graph Manager Application that calls LINQ to HPC APIs 2a 3a 1 3b DSC 2b HPC Head Node LINQ to HPC Vertex Host 2a 2b 1 The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH A LINQ to HPC job starts 1 basic task assigning a node as the DGM Publish to share: 1. binaries for LINQ to HPC job 2. XML description of LINQ to HPC graph HPC Compute Nodes 3b DVH loads binaries for this LINQ to HPC job from share, executes them according to commands from DGM

  13. Deployment Steps • DSC NODE ADD sen-cn1 /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:sen-hn

  14. Demo adding a new Node Using the HPC Management Tool demo

  15. LINQ to HPC Object Model

  16. Hello World! • using System; • using System.Linq; • using Microsoft.Hpc.Linq; • namespace MyProgram { • class Program { • static void Main(string[] args) { • varconfig = new HpcLinqConfiguration(“MyHpcClusterHeadNode”); • var context = new HpcLinqContext(config); • var lengths = context.FromDsc<LineRecord>("MyTextData") • .Select(r => r.Line.Length); • Console.WriteLine("The maximum line length is {0}", lengths.Max()); • } • } • }

  17. Analyzing data using LINQ to HPC demo

  18. Managing data and HPC cluster • HPC Server administration basics: • Managing the job queue • How to identify the user that submitted jobs • Canceling a runaway job • Data Storage Catalog specific tasks: • Monitor disk usage tracked by DSC on each node • View how the DSC file set maps to NTFS across nodes • Identify the nodes where files are replicated

  19. Quick overview of the software components that made this possible. NEW LINQ to HPC Programming models MPI SOA LINQ to HPC runtime Distributed runtimes Cluster and cloud services HPC provisioning, management, etc. DSC (Distributed Storage Catalog) Windows Server Azure* Platform Bind individual NTFS shares together to support the LINQ to HPC distributed runtime * Future support planned

  20. How LINQ to HPC and Parallel Data Warehouse complement each other • Customer needs for Big Data lie on a spectrum • One extreme is analytics targeting a traditional data warehouse. The analyst knows the cube he or she wants to build, and the analyst knows the data sources. • Another extreme is analyzing raw unstructured data. The analyst does not know exactly what the data contains, nor what cube would be justified. The analyst needs to do ad-hoc analyses that may never be run again. • HPC Server targets the raw unstructured data extreme.

  21. Microsoft already has great data platform assets • PowerPivot, SQL Server Integration Services (SSIS), Parallel Data Warehouse (PDW), … • HPC+LINQ to HPC’s focus on raw unstructured data analytics enables new solutions that incorporate multiple assets • E.g., analyze raw unstructured data using HPC+LINQ to HPC then pipe it to SSIS and apply rest of BI stack

  22. Microsoft Big Data End-to-End HPC Server Sensors Devices Data Marts Apps Interactive Reports Bots S S RS Integration Services Crawlers Performance Scorecard SSAS PowerPivot Integration Services ERP CRM LOB Hadoop Data & Compute Intensive HPC App SQL EDW Embedded BI Apps

  23. For more information • Download HPC Server 2008 R2 Evaluation Copy Today – microsoft.com/hpc • Download Service Pack 2 Beta - connect.microsoft.com • HPC Server Hands-on Labs – microsoft.com/hpc -> Technical Resources • Product Demo Station – in the Server and Cloud Section • HPC Server Certification Exam - microsoft.com/learning/en/us/exam.aspx?ID=70-690 • Find Me Later At… twitter: @saptak

  24. Resources • Connect. Share. Discuss. http://northamerica.msteched.com Learning • Sessions On-Demand & Community • Microsoft Certification & Training Resources www.microsoft.com/teched www.microsoft.com/learning • Resources for IT Professionals • Resources for Developers • http://microsoft.com/technet • http://microsoft.com/msdn

  25. Required Slide Complete an evaluation on CommNet and enter to win!

  26. © 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

More Related