Introduction to data center computing
This presentation is the property of its rightful owner.
Sponsored Links
1 / 17

Introduction to Data Center Computing PowerPoint PPT Presentation


  • 49 Views
  • Uploaded on
  • Presentation posted in: General

Introduction to Data Center Computing. Derek Murray. What we’ll cover. Techniques for handling “big data” Distributed storage Distributed computation Focus on recent papers describing real systems. Example: web search. Crawling. Indexing. Querying. WWW. A system architecture?.

Download Presentation

Introduction to Data Center Computing

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Introduction to Data Center Computing

Derek Murray


What we’ll cover

  • Techniques for handling “big data”

    • Distributed storage

    • Distributed computation

  • Focus on recent papers describing real systems


Example: web search

Crawling

Indexing

Querying

WWW


A system architecture?

Network

Computers

Storage


Data Center architecture

Core switch

Rack switch

Rack switch

Rack switch

Server

Server

Server

Server

Server

Server

Server

Server

Server


Distributed storage

  • High volume of data

  • High volume of read/writerequests

  • Fault tolerance


Brewer’s CAP theorem


The Google file system

GFS Master

Client

Chunk server

Chunk server

Chunk server


Dynamo

Client


Distributed computation

  • Parallel distributed processing

  • Single Program, Multiple Data (SPMD)

  • Fault tolerance

  • Applications


Task farming

Master

Worker

Worker

Worker

Storage


MapReduce

Input

Map

Shuffle

Reduce

Output


Dryad

  • Arbitrary task graph

  • Vertices and channels

  • Topological ordering


DryadLINQ

  • Language Integrated Query (LINQ)

    vartable = PartitionedTable.Get<int>(“…”);

    varresult = from x in table

    selectx * x;

    intsumSquares = result.Sum();


Scheduling issues

  • Heterogeneous performance

  • Sharing a cluster fairly

  • Data locality


References

  • Storage

    • Ghemawatet al., “The Google File System”, Proceedings of SOSP 2003

    • DeCandiaet al., “Dynamo: Amazon’s Highly-Available Key-value Store”, Proceedings of SOSP 2007

    • Burrows, “The Chubby lock service for loosely-coupled distributed systems”, Proceedings of OSDI 2006

  • Computation

    • Dean and Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Proceedings of OSDI 2004

    • Isardet al., “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks”, Proceedings of EuroSys 2007

    • Yu et al., “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language”, Proceedings of OSDI 2008

    • Olstonet al., “Pig Latin: A Not-So-Foreign Language for Data Processing”, Proceedings of SIGMOD 2008

  • Scheduling

    • Zahariaet al., “Improving MapReduce Performance in Heterogeneous Environments”, Proceedings of OSDI 2008

    • Isardet al., “Quincy: Fair Scheduling for Distributed Computing Clusters”, Proceedings of SOSP 2009

    • Zahariaet al., “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”, Proceedings of EuroSys 2010


Conclusions

  • Data centers achieve high performance with commodity parts

  • Efficient storage requires application-specific trade-offs

  • Data-parallelism simplifies distributed computation on the data


  • Login