High Productivity Computing

Large-scale Knowledge Discovery:

Co-evolving Algorithms and Mechanisms

Steve Reinhardt

Principal Architect

Microsoft

Prof. John Gilbert, UCSB

Dr. Viral Shah, UCSB



Context for Knowledge Discovery

(Figure from Debbie Gracio and Ian Gorton, PNNL Data Intensive Computing Initiative)


Knowledge Discovery (KD) Definition

  • Data-intensive computing: when the acquisition and movement of input data is a primary limitation on feasibility or performance

  • Simple data mining: searching for exceptional values on elemental measures (e.g., heat, #transactions)

  • Knowledge discovery: searching for exceptional values on associative/social measures (e.g., the vertices with the highest betweenness centrality, or those belonging to the greatest number of valuable reactions)
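To make the distinction between elemental and associative measures concrete, here is a minimal Python sketch on a hypothetical seven-vertex graph. The `transactions` counts and the graph are invented for illustration, and betweenness is computed with Brandes' algorithm for unweighted graphs, one standard way to score the "most between" vertices; it is not necessarily the method the authors had in mind.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality of an unweighted, undirected graph (Brandes' algorithm).

    adj maps each vertex to a list of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each vertex's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating each vertex's
        # share of the shortest paths that start at s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair {s, t} was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}

# Hypothetical data: two tightly knit groups joined through vertex "x".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

# Elemental measure: a per-vertex value, here invented transaction counts.
transactions = {"a": 120, "b": 95, "c": 40, "x": 3, "d": 50, "e": 88, "f": 110}

bc = betweenness(adj)
print(max(transactions, key=transactions.get))  # simple data mining picks "a"
print(max(bc, key=bc.get))                      # knowledge discovery picks "x"
```

The vertex with the fewest transactions lies on every shortest path between the two groups (betweenness 9, versus 8 for `c` and `d`), which is the kind of associative signal a scan over elemental values cannot see.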


Today’s Biggest Obstacle in the KD Field

  • Lack of fast feedback between domain experts and infrastructure/tool developers about good, usable, scalable KD software platforms

  • Need to accelerate the rate of learning about both good KD algorithms and good KD infrastructure

  • Domain experts want:

    • Good infrastructure that works

    • … and scales well and runs fast

    • Flexibility to develop/tweak algorithms to suit their needs

    • Algorithms with a strong math basis

  • But domain experts don't know:

    • The best approach or algorithms

  • Infrastructure developers want:

    • A clear audience for what they develop

    • An architecture that copes with client, cluster, cloud, GPU, and huge data

  • But infrastructure developers don't know:

    • The best approach

Need to get good (not perfect) scalable platforms into use so that algorithms and infrastructure can co-evolve toward the best approaches



KDT Layers: Enable overloading with various technologies

Layers, from top to bottom:

  • Community Detection; Elementary Mode Analysis

  • kdt.: Betweenness Centrality; All Pairs Shortest Path; Barycentric Clustering; All Pairs Shortest Path (Cray XMT)

  • Parallel/distributed operations (constructors, SpGEMM, SpMV, SpAdd, SpGEMM on semi-rings, I/O), either in-memory (Star-P) or out-of-memory (DryadLINQ-based)

  • scipy.: Local constructors; Local SpGEMM; Local SpRef/SpAsgn; Local SpMV; Local SpAdd; Local SpGEMM on semi-rings; Local I/O; Local SpGEMM (GPU)
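The stack builds graph algorithms such as All Pairs Shortest Path on sparse-matrix primitives like SpGEMM over semi-rings. The sketch below illustrates that idea in plain, serial Python: a generic sparse matrix product parameterized by the semiring's add and multiply, then APSP by repeated squaring in the (min, +) semiring. The dict-of-dicts storage and the names `semiring_spgemm` and `apsp_min_plus` are hypothetical, not KDT's API, and the graph is assumed to have no negative-weight cycles.

```python
def semiring_spgemm(A, B, add, mul):
    """Sparse product C = A (x) B over the semiring (add, mul).

    Matrices are dict-of-dicts {row: {col: value}}; absent entries stand for
    the semiring's additive identity (infinity for min-plus), so they are
    never stored or visited.
    """
    C = {}
    for i, row in A.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                prod = mul(a_ik, b_kj)
                acc[j] = add(acc[j], prod) if j in acc else prod
        if acc:
            C[i] = acc
    return C

def apsp_min_plus(W, n):
    """All-pairs shortest path lengths for vertices 0..n-1 by min-plus squaring."""
    # A zero diagonal keeps shorter paths alive as the path length doubles.
    D = {i: dict(W.get(i, {})) for i in range(n)}
    for i in range(n):
        D[i][i] = 0
    hops = 1  # D currently covers paths of at most `hops` edges
    while hops < n - 1:
        D = semiring_spgemm(D, D, add=min, mul=lambda x, y: x + y)
        hops *= 2
    return D

# Hypothetical directed graph: W[u][v] is the weight of edge u -> v.
W = {0: {1: 5, 2: 1}, 2: {1: 2}, 1: {3: 1}}
D = apsp_min_plus(W, 4)
print(D[0])  # shortest distances from vertex 0 to every reachable vertex
```

Each squaring doubles the maximum number of edges a path may use, so about ceil(log2(n - 1)) products suffice. Swapping `add` and `mul` (for example `max` and `min` for widest paths) reuses the same product unchanged, which is the appeal of putting semiring SpGEMM in the lower layers.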


DryadLINQ: Query + Plan + Parallel Execution

  • Dryad

    • Distributed-memory coarse-grain run-time

    • Generalized MapReduce

    • Using computational vertices and communication channels to form a dataflow execution graph

  • LINQ (Language INtegrated Query)

    • A query-style language interface to Dryad

    • Typical relational operators (e.g., Select, Join, GroupBy)

  • Scaling for histogram example

    • Input data of 10.2TB processed on 1,800 cluster nodes: 43,171 execution-graph vertices spawning 11,072 processes, producing 33GB of output data in 11.5 minutes of execution
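As a rough reading of the histogram figures, the arithmetic below derives averages from the numbers on the slide alone. It assumes decimal units (1 TB = 10^12 bytes) and treats the 11.5 minutes as wall-clock time for the whole job, so the rates are end-to-end averages, not measured disk or network bandwidth.

```python
# Figures from the histogram example (decimal units assumed).
input_bytes = 10.2e12     # 10.2 TB of input
output_bytes = 33e9       # 33 GB of output
nodes = 1800
processes = 11072
seconds = 11.5 * 60       # 11.5 minutes = 690 s

aggregate_rate = input_bytes / seconds   # input consumed per second, whole cluster
per_node_rate = aggregate_rate / nodes   # same, averaged over nodes

print(f"aggregate input rate: {aggregate_rate / 1e9:.1f} GB/s")  # 14.8 GB/s
print(f"per-node input rate:  {per_node_rate / 1e6:.1f} MB/s")   # 8.2 MB/s
print(f"processes per node:   {processes / nodes:.1f}")          # 6.2
print(f"output/input ratio:   {output_bytes / input_bytes:.2%}") # 0.32%
```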

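(Figure: Dryad system organization. Control plane: job manager, name server (NS), and scheduler (sched). Data plane: vertices (V) running under process daemons (PD) across the cluster, communicating via files, TCP, FIFO, and network channels.)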
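The histogram job is the classic generalized-MapReduce pattern: count locally, route partial counts by key, then merge. Below is a single-process Python sketch of that two-stage dataflow, where each function stands in for a set of Dryad vertices and the lists passed between them stand in for channels. The function names and the hash-partitioning scheme are illustrative assumptions, not DryadLINQ's API; in DryadLINQ the job would be written as a LINQ GroupBy/Count query and compiled into an execution graph.

```python
from collections import Counter

def local_histogram(partition):
    # Stage 1: one vertex per input partition counts its own records.
    return Counter(partition)

def route(counts, num_reducers):
    # Split partial counts by hash of the key so that every partial count
    # for a given key is sent to the same stage-2 vertex.
    buckets = [Counter() for _ in range(num_reducers)]
    for key, n in counts.items():
        buckets[hash(key) % num_reducers][key] += n
    return buckets

def merge(partials):
    # Stage 2: one vertex per reducer sums the partial counts it receives.
    total = Counter()
    for p in partials:
        total.update(p)
    return total

def histogram(partitions, num_reducers=2):
    routed = [route(local_histogram(p), num_reducers) for p in partitions]
    # zip(*routed) gathers bucket r from every stage-1 vertex for reducer r.
    outputs = [merge(bucket_r) for bucket_r in zip(*routed)]
    result = {}
    for out in outputs:
        result.update(out)  # reducers hold disjoint key sets
    return result

parts = [["a", "b", "a"], ["b", "c"], ["a"]]
print(histogram(parts))  # counts per key; key order may vary between runs
```

Counting locally before routing means each channel carries at most one entry per distinct key rather than one per record, which matters when the input is terabytes but the output, as in the example above, is only gigabytes.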


Star-P Bridges Scientists to HPCs

Star-P enables domain experts to use parallel, big-memory systems via productivity languages (e.g., the M language of MATLAB)

Knowledge discovery scaling with Star-P:

  • Kernels scaled to 55B edges between 5B vertices on 128 cores (consuming 4TB of memory)

  • Compact applications scaled to 1B edges on 256 cores
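For a sense of scale, the kernel figures can be reduced to per-unit averages. This assumes decimal units (1 TB = 10^12 bytes) and an even spread of edges and memory across cores; the slide does not say how Star-P actually laid out the data, so these are back-of-envelope numbers, not measurements.

```python
# Kernel-scale figures from the slide (decimal units assumed).
edges = 55e9
vertices = 5e9
cores = 128
memory_bytes = 4e12

edges_per_vertex = edges / vertices      # average edges per vertex
edges_per_core = edges / cores           # if edges are spread evenly
memory_per_core = memory_bytes / cores
bytes_per_edge = memory_bytes / edges    # total footprint, not just edge storage

print(f"edges per vertex: {edges_per_vertex:.0f}")          # 11
print(f"edges per core:   {edges_per_core:,.0f}")           # 429,687,500
print(f"memory per core:  {memory_per_core / 1e9:.2f} GB")  # 31.25 GB
print(f"bytes per edge:   {bytes_per_edge:.1f}")            # 72.7
```

Because the 4TB figure is total memory consumed, the roughly 73 bytes per edge includes vertex data and kernel work space as well as the edges themselves.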


Next Steps

  • Get prototypes available for early experience and feedback

    • in-memory and out-of-memory targets of KDT

    • with graph layer

    • likely exposed via Python library interface



© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista, Windows 7, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

