LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server WSV205

Saptak Sen

Senior Product Manager

Microsoft Technical Computing

Session Objectives and Takeaways
  • Session Objective(s):
    • Understand Microsoft's solution for Big Data
    • How to use and develop LINQ to HPC applications
    • Demo of LINQ to HPC/DSC on HPC, Microsoft’s solutions for unstructured Big Data
  • Key Takeaways:
    • LINQ to HPC/DSC provide a highly productive stack for writing big data applications.
    • Demos
      • Data management with DSC
      • Application development in LINQ to HPC
      • Application management of LINQ to HPC applications
Characteristics of Big Data

  • Large Data Volume
    • 100s of TBs to 10s of PBs
  • Non-Traditional Data Types
    • Unstructured
    • Weak relational schema
    • Text, Images, Videos, Logs
  • New Questions & New Insights
    • How popular is my product?
    • What is the best ad to serve?
    • Is this a fraudulent transaction?
  • New Technologies
    • Distributed Parallel Processing Frameworks
    • Easy to scale on commodity hardware
    • MapReduce-style programming models
  • New Data Sources
    • Sensors
    • Devices
    • Traditional applications
    • Web Servers
    • Public data
  • New Economics
    • Large scale processing and analytics at unprecedented low cost (hardware and software)

Introduction to LINQ to HPC

Developing Big Data applications for HPC Server

Example: find web pages from many log files

LINQ query transformed into computation graph

var logentries =
    from line in logs
    where !line.StartsWith("#")
    select new LogEntry(line);

var user =
    from access in logentries
    where access.user.EndsWith(@"\sen")
    select access;

var accesses =
    from access in user
    group access by access.page into pages
    select new UserPageCount("sen", pages.Key, pages.Count());

var htmAccesses =
    from access in accesses
    where access.page.EndsWith(".htm")
    orderby access.count descending
    select access;

[Diagram: the computation graph for the query, with stages numbered 1 to 5. Labels: Input, Compute, Compute and resort (twice), Output.]
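To try the query shape without a cluster, the same four query expressions can be run with ordinary LINQ to Objects. This is only a sketch: the LogEntry and UserPageCount definitions and the "<user> <page>" line format below are assumptions made for illustration, since the slide does not show them, and nothing here is distributed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record types; the slide does not show their definitions.
public class LogEntry
{
    public string user;
    public string page;

    // Assumes a space-separated line of the form "<user> <page>".
    public LogEntry(string line)
    {
        var fields = line.Split(' ');
        user = fields[0];
        page = fields[1];
    }
}

public class UserPageCount
{
    public string user;
    public string page;
    public int count;

    public UserPageCount(string user, string page, int count)
    {
        this.user = user;
        this.page = page;
        this.count = count;
    }
}

public static class LogQueryDemo
{
    // Runs the four queries from the slide over an in-memory sequence of lines.
    public static List<UserPageCount> HtmAccesses(IEnumerable<string> logs)
    {
        var logentries =
            from line in logs
            where !line.StartsWith("#")
            select new LogEntry(line);

        var user =
            from access in logentries
            where access.user.EndsWith(@"\sen")
            select access;

        var accesses =
            from access in user
            group access by access.page into pages
            select new UserPageCount("sen", pages.Key, pages.Count());

        var htmAccesses =
            from access in accesses
            where access.page.EndsWith(".htm")
            orderby access.count descending
            select access;

        return htmAccesses.ToList();
    }

    public static void Main()
    {
        var logs = new[]
        {
            "# comment line",
            @"CORP\sen /index.htm",
            @"CORP\sen /index.htm",
            @"CORP\sen /about.htm",
            @"CORP\sen /logo.png",
            @"CORP\other /index.htm",
        };

        foreach (var a in HtmAccesses(logs))
            Console.WriteLine("{0} {1} {2}", a.user, a.page, a.count);
    }
}
```

With the sample lines in Main, the program prints "sen /index.htm 2" followed by "sen /about.htm 1": the comment line, the .png request, and the other user's request are filtered out before grouping.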

LINQ to HPC Job Directed Acyclic Graph (DAG) of vertices

[Diagram: a directed acyclic graph in which inputs flow through processing vertices, connected by edges (files), to outputs.]

Executes DAGs by mapping vertices to Distributed Vertex Hosts

[Diagram: the same graph of inputs, processing vertices, edges (files), and outputs, with vertices mapped onto free compute resources.]
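The idea of running a DAG by handing runnable vertices to hosts can be sketched as follows. This is an illustration only, not the LINQ to HPC scheduler: it uses Kahn's topological ordering to decide when a vertex is runnable and a simple round-robin as a stand-in for "pick a free compute resource". All vertex and host names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DagScheduleSketch
{
    // Returns (vertex, host) pairs in an order that respects the edges.
    // Each edge (Key, Value) means vertex Value consumes the output of vertex Key.
    public static List<KeyValuePair<string, string>> Schedule(
        IList<string> vertices,
        IList<KeyValuePair<string, string>> edges,
        IList<string> hosts)
    {
        // Count the unfinished inputs of every vertex.
        var pending = vertices.ToDictionary(v => v, v => 0);
        foreach (var e in edges) pending[e.Value]++;

        var plan = new List<KeyValuePair<string, string>>();
        var ready = new Queue<string>(vertices.Where(v => pending[v] == 0));
        int next = 0;

        while (ready.Count > 0)
        {
            var v = ready.Dequeue();
            // Round-robin stands in for choosing a free host.
            plan.Add(new KeyValuePair<string, string>(v, hosts[next % hosts.Count]));
            next++;

            // Completing v may make its consumers runnable.
            foreach (var e in edges.Where(x => x.Key == v))
                if (--pending[e.Value] == 0) ready.Enqueue(e.Value);
        }

        if (plan.Count != vertices.Count)
            throw new InvalidOperationException("Graph contains a cycle.");
        return plan;
    }

    public static void Main()
    {
        var vertices = new List<string> { "read1", "read2", "filter1", "filter2", "sort", "write" };
        var edges = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("read1", "filter1"),
            new KeyValuePair<string, string>("read2", "filter2"),
            new KeyValuePair<string, string>("filter1", "sort"),
            new KeyValuePair<string, string>("filter2", "sort"),
            new KeyValuePair<string, string>("sort", "write"),
        };
        var hosts = new List<string> { "node1", "node2" };

        foreach (var step in Schedule(vertices, edges, hosts))
            Console.WriteLine("{0} -> {1}", step.Key, step.Value);
    }
}
```

A real graph manager would also track which hosts are actually free and run independent vertices concurrently; the sketch only shows the ordering constraint that the edges impose.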

HPC + LINQ to HPC Job Overview

[Diagram: an application that calls LINQ to HPC APIs, the HPC head node, DSC, and the HPC compute nodes hosting a Graph Manager and Vertex Hosts. Arrows are labeled 1, 2a, 2b, 3a and 3b.]

  • Submit LINQ to HPC job
  • A LINQ to HPC job starts 1 basic task, assigning a node as the DGM
  • The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH
  • Graph Manager starts/stops vertices
  • LINQ to HPC vertices read and write files

HPC + LINQ to HPC Job Overview
  • Graph manager starts vertices on Vertex Hosts
  • Preferentially schedules vertices near input files
  • When input is already on cluster, can make local IO the common case

[Diagram: the Graph Manager and Vertex Hosts on the HPC compute nodes, shown next to the vertices in the logical computation graph. Arrows are labeled 3a and 3b.]

  • Graph Manager starts/stops Dryad vertices
  • Vertices read and write files
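The preference for scheduling a vertex near its input can be illustrated with a small placement function. This is a sketch under assumptions, not the Graph Manager's actual policy: the replica map, host names, and the fallback rule (any free host when no replica holder is free) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LocalityPlacementSketch
{
    // Chooses a host for a vertex that reads inputFile.
    // Prefers a free host that holds a replica of the file (local IO);
    // otherwise falls back to any free host (remote IO).
    // Returns null when no host is free.
    public static string ChooseHost(
        string inputFile,
        IDictionary<string, List<string>> replicas,
        ICollection<string> freeHosts)
    {
        List<string> holders;
        if (replicas.TryGetValue(inputFile, out holders))
        {
            var local = holders.FirstOrDefault(h => freeHosts.Contains(h));
            if (local != null) return local;
        }
        return freeHosts.FirstOrDefault();
    }

    public static void Main()
    {
        // Hypothetical replica locations for two input files.
        var replicas = new Dictionary<string, List<string>>
        {
            { "part0", new List<string> { "cn1", "cn2" } },
            { "part1", new List<string> { "cn3" } },
        };
        var freeHosts = new List<string> { "cn2", "cn4" };

        Console.WriteLine(ChooseHost("part0", replicas, freeHosts)); // cn2: holds a replica
        Console.WriteLine(ChooseHost("part1", replicas, freeHosts)); // cn4: no replica holder is free
    }
}
```

When the input already lives on the cluster nodes, most vertices take the first branch, which is what makes local IO the common case.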

More on HPC + LINQ to HPC mechanics

[Diagram: an application that calls LINQ to HPC APIs, the HPC head node, DSC, and the HPC compute nodes hosting the LINQ to HPC Graph Manager and LINQ to HPC Vertex Hosts. Arrows are labeled 1, 2a, 2b, 3a and 3b.]

  • Publish to share:
    1. binaries for LINQ to HPC job
    2. XML description of LINQ to HPC graph
  • A LINQ to HPC job starts 1 basic task, assigning a node as the DGM
  • The LINQ to HPC job also starts a set of parametric sweep tasks across the rest of the nodes as DVH
  • DGM reads the XML description of the graph from the share and calls DSC to locate the files referenced in the XML
  • DVH loads the binaries for this LINQ to HPC job from the share and executes them according to commands from the DGM

Deployment Steps
  • DSC NODE ADD sen-cn1 /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:sen-hn
Demo adding a new Node

Using the HPC Management Tool

demo

Hello World!
using System;
using System.Linq;
using Microsoft.Hpc.Linq;

namespace MyProgram {
    class Program {
        static void Main(string[] args) {
            // Configure the context with the name of the cluster head node.
            var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
            var context = new HpcLinqContext(config);
            // Read the DSC file set as line records and project each line to its length.
            var lengths = context.FromDsc<LineRecord>("MyTextData")
                                 .Select(r => r.Line.Length);
            Console.WriteLine("The maximum line length is {0}", lengths.Max());
        }
    }
}
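For comparison, the same Select and Max computation can be run on in-memory data with plain LINQ to Objects. This sketch needs no cluster and does not use Microsoft.Hpc.Linq; the sample lines are made up for illustration.

```csharp
using System;
using System.Linq;

public static class MaxLineLengthLocal
{
    // Projects each line to its length and takes the maximum,
    // mirroring the Select(...).Max() shape of the cluster query.
    public static int MaxLineLength(string[] lines)
    {
        return lines.Select(line => line.Length).Max();
    }

    public static void Main()
    {
        var lines = new[] { "alpha", "a much longer line", "" };
        Console.WriteLine("The maximum line length is {0}", MaxLineLength(lines));
    }
}
```

The appeal of the LINQ to HPC version is that the query text stays essentially the same while the data source changes from an array to a DSC file set.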
Managing data and HPC cluster
  • HPC Server administration basics:
    • Managing the job queue
    • How to identify the user that submitted jobs
    • Canceling a runaway job
  • Distributed Storage Catalog (DSC) specific tasks:
    • Monitor disk usage tracked by DSC on each node
    • View how the DSC file set maps to NTFS across nodes
    • Identify the nodes where files are replicated
Quick overview of the software components that made this possible.

  • Programming models: LINQ to HPC (new), MPI, SOA
  • Distributed runtimes: LINQ to HPC runtime
  • Cluster and cloud services: HPC provisioning, management, etc.; DSC (Distributed Storage Catalog)
    • DSC: bind individual NTFS shares together to support the LINQ to HPC distributed runtime
  • Platform: Windows Server, Azure*

* Future support planned

How LINQ to HPC and Parallel Data Warehouse complement each other
  • Customer needs for Big Data lie on a spectrum
    • One extreme is analytics targeting a traditional data warehouse. The analyst knows the cube he or she wants to build, and the analyst knows the data sources.
    • Another extreme is analyzing raw unstructured data. The analyst does not know exactly what the data contains, nor what cube would be justified. The analyst needs to do ad-hoc analyses that may never be run again.
  • HPC Server targets the raw unstructured data extreme.
Microsoft already has great data platform assets
  • PowerPivot, SQL Server Integration Services (SSIS), Parallel Data Warehouse (PDW), …
  • HPC+LINQ to HPC’s focus on raw unstructured data analytics enables new solutions that incorporate multiple assets
    • E.g., analyze raw unstructured data using HPC+LINQ to HPC, then pipe the results to SSIS and apply the rest of the BI stack
Microsoft Big Data End-to-End

[Diagram: end-to-end Big Data pipeline. Labels on the slide: HPC Server, Sensors, Devices, Data Marts, Apps, Interactive Reports, Bots, SSRS, Integration Services (twice), Crawlers, Performance Scorecard, SSAS, PowerPivot, ERP, CRM, LOB, Hadoop, Data & Compute Intensive HPC App, SQL EDW, Embedded BI Apps.]

For more information
  • Download HPC Server 2008 R2 Evaluation Copy Today – microsoft.com/hpc
  • Download Service Pack 2 Beta - connect.microsoft.com
  • HPC Server Hands-on Labs – microsoft.com/hpc -> Technical Resources
  • Product Demo Station – in the Server and Cloud Section
  • HPC Server Certification Exam - microsoft.com/learning/en/us/exam.aspx?ID=70-690
  • Find Me Later At… twitter: @saptak
Resources
  • Connect. Share. Discuss.: http://northamerica.msteched.com
  • Learning
    • Sessions On-Demand & Community: www.microsoft.com/teched
    • Microsoft Certification & Training Resources: www.microsoft.com/learning
  • Resources for IT Professionals: http://microsoft.com/technet
  • Resources for Developers: http://microsoft.com/msdn
Complete an evaluation on CommNet and enter to win!

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.

The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.