Introduction to data center computing
This presentation is the property of its rightful owner.
Sponsored Links
1 / 17

Introduction to Data Center Computing PowerPoint PPT Presentation


  • 46 Views
  • Uploaded on
  • Presentation posted in: General

Introduction to Data Center Computing. Derek Murray. What we’ll cover. Techniques for handling “big data” Distributed storage Distributed computation Focus on recent papers describing real systems. Example: web search. Crawling. Indexing. Querying. WWW. A system architecture?.

Download Presentation

Introduction to Data Center Computing

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Introduction to data center computing

Introduction to Data Center Computing

Derek Murray


What we ll cover

What we’ll cover

  • Techniques for handling “big data”

    • Distributed storage

    • Distributed computation

  • Focus on recent papers describing real systems


Example web search

Example: web search

Crawling

Indexing

Querying

WWW


A system architecture

A system architecture?

Network

Computers

Storage


Data center architecture

Data Center architecture

Core switch

Rack switch

Rack switch

Rack switch

Server

Server

Server

Server

Server

Server

Server

Server

Server


Distributed storage

Distributed storage

  • High volume of data

  • High volume of read/writerequests

  • Fault tolerance


Brewer s cap theorem

Brewer’s CAP theorem


The google file system

The Google file system

GFS Master

Client

Chunk server

Chunk server

Chunk server


Dynamo

Dynamo

Client


Distributed computation

Distributed computation

  • Parallel distributed processing

  • Single Program, Multiple Data (SPMD)

  • Fault tolerance

  • Applications


Task farming

Task farming

Master

Worker

Worker

Worker

Storage


Mapreduce

MapReduce

Input

Map

Shuffle

Reduce

Output


Dryad

Dryad

  • Arbitrary task graph

  • Vertices and channels

  • Topological ordering


Dryadlinq

DryadLINQ

  • Language Integrated Query (LINQ)

    vartable = PartitionedTable.Get<int>(“…”);

    varresult = from x in table

    selectx * x;

    intsumSquares = result.Sum();


Scheduling issues

Scheduling issues

  • Heterogeneous performance

  • Sharing a cluster fairly

  • Data locality


References

References

  • Storage

    • Ghemawatet al., “The Google File System”, Proceedings of SOSP 2003

    • DeCandiaet al., “Dynamo: Amazon’s Highly-Available Key-value Store”, Proceedings of SOSP 2007

    • Burrows, “The Chubby lock service for loosely-coupled distributed systems”, Proceedings of OSDI 2006

  • Computation

    • Dean and Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Proceedings of OSDI 2004

    • Isardet al., “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks”, Proceedings of EuroSys 2007

    • Yu et al., “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language”, Proceedings of OSDI 2008

    • Olstonet al., “Pig Latin: A Not-So-Foreign Language for Data Processing”, Proceedings of SIGMOD 2008

  • Scheduling

    • Zahariaet al., “Improving MapReduce Performance in Heterogeneous Environments”, Proceedings of OSDI 2008

    • Isardet al., “Quincy: Fair Scheduling for Distributed Computing Clusters”, Proceedings of SOSP 2009

    • Zahariaet al., “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”, Proceedings of EuroSys 2010


Conclusions

Conclusions

  • Data centers achieve high performance with commodity parts

  • Efficient storage requires application-specific trade-offs

  • Data-parallelism simplifies distributed computation on the data


  • Login