Introduction to cloud computing

Jiaheng Lu

Department of Computer Science

Renmin University of China

www.jiahenglu.net



Yahoo! Cloud computing


Yahoo! Cloud Stack

EDGE: YCS, YCPI, Brooklyn

WEB: VM/OS, yApache, PHP

APP: VM/OS, Serving Grid, Data Highway

STORAGE: Sherpa, MOBStor

BATCH: Hadoop

Each layer is labeled Horizontal Cloud Services. Also shown: App Engine, Provisioning (Self-serve), and Monitoring/Metering/Security.


Web Data Management

Structured record storage (PNUTS/Sherpa)

  • CRUD

  • Point lookups and short scans

  • Index-organized tables and random I/Os

  • $ per latency

Large data analysis (Hadoop)

  • Scan-oriented workloads

  • Focus on sequential disk I/O

  • $ per CPU cycle

Blob storage (SAN/NAS)

  • Object retrieval and streaming

  • Scalable file storage

  • $ per GB



The World Has Changed

  • Web serving applications need:

    • Scalability!

      • Preferably elastic

    • Flexible schemas

    • Geographic distribution

    • High availability

    • Reliable storage

  • Web serving applications can do without:

    • Complicated queries

    • Strong transactions


PNUTS/SHERPA

To Help You Scale Your Mountains of Data



Yahoo! Serving Storage Problem

  • Small records – 100KB or less

  • Structured records – lots of fields, evolving

  • Extreme data scale - Tens of TB

  • Extreme request scale - Tens of thousands of requests/sec

  • Low latency globally - 20+ datacenters worldwide

  • High Availability - outages cost $millions

  • Variable usage patterns - as applications and users change


What is PNUTS/Sherpa?

[Figure: the rows of the Parts table, shown as three replicas of the same data:]

ID  StockNumber  Status
A   42342        E
B   42521        W
C   66354        W
D   12352        E
E   75656        C
F   15677        E

CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
)

Structured, flexible schema

Geographic replication

Parallel database

Hosted, managed infrastructure

11


What will it become l.jpg

What Will It Become?

[Figure: three replicas of the Parts table (rows A to F)]

Indexes and views


Design Goals

  • Consistency

    • Per-record guarantees

    • Timeline model

    • Option to relax if needed

  • Multiple access paths

    • Hash table, ordered table

    • Primary, secondary access

  • Hosted service

    • Applications plug and play

    • Share operational cost

  • Scalability

    • Thousands of machines

    • Easy to add capacity

    • Restrict query language to avoid costly queries

  • Geographic replication

    • Asynchronous replication around the globe

    • Low-latency local access

  • High availability and fault tolerance

    • Automatically recover from failures

    • Serve reads and writes despite failures


Technology Elements

Applications

Tabular API

PNUTS API

  • PNUTS

    • Query planning and execution

    • Index maintenance

  • Distributed infrastructure for tabular data

    • Data partitioning

    • Update consistency

    • Replication

  • YCA: Authorization

  • YDOT FS

    • Ordered tables

  • YDHT FS

    • Hash tables

  • Tribble

    • Pub/sub messaging

  • ZooKeeper

    • Consistency service


Data Manipulation

  • Per-record operations

    • Get

    • Set

    • Delete

  • Multi-record operations

    • Multiget

    • Scan

    • Getrange
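The six operations above can be pictured with a small in-memory table. This is a sketch, not the PNUTS API: the class name `OrderedTableSketch` and its method names are hypothetical, records are plain dicts (echoing the flexible schema), and `getrange(lo, hi)` is assumed to be half-open.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]


class OrderedTableSketch:
    """Hypothetical in-memory stand-in for a PNUTS-style table.

    Keys are kept sorted so that range operations work, as in an ordered table.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []          # sorted primary keys
        self._rows: Dict[str, Record] = {}  # primary key -> record

    # Per-record operations
    def get(self, key: str) -> Optional[Record]:
        return self._rows.get(key)

    def set(self, key: str, record: Record) -> None:
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = dict(record)  # store a copy of the record

    def delete(self, key: str) -> bool:
        if key not in self._rows:
            return False
        del self._rows[key]
        self._keys.pop(bisect_left(self._keys, key))
        return True

    # Multi-record operations
    def multiget(self, keys: Iterable[str]) -> Dict[str, Optional[Record]]:
        return {k: self.get(k) for k in keys}

    def scan(self) -> List[Tuple[str, Record]]:
        """Every record, in key order."""
        return [(k, self._rows[k]) for k in self._keys]

    def getrange(self, lo: str, hi: str) -> List[Tuple[str, Record]]:
        """Records with lo <= key < hi, in key order."""
        start, end = bisect_left(self._keys, lo), bisect_left(self._keys, hi)
        return [(k, self._rows[k]) for k in self._keys[start:end]]
```

For example, after `set("Apple", {"Price": 1})` and `set("Banana", {"Price": 2})`, `getrange("A", "H")` returns both records in key order.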



Tablets—Hash Table

[Records partitioned into tablets by the hash of the key; tablet boundaries 0x0000, 0x2AF3, 0x911F, 0xFFFF]

Name         Description                             Price

Tablet 0x0000 to 0x2AF3:
Grape        Grapes are good to eat                  $12
Lime         Limes are green                         $9
Apple        Apple is wisdom                         $1
Strawberry   Strawberry shortcake                    $900

Tablet 0x2AF3 to 0x911F:
Orange       Arrgh! Don’t get scurvy!                $2
Avocado      But at what price?                      $3
Lemon        How much did you pay for this lemon?    $1
Tomato       Is this a vegetable?                    $14

Tablet 0x911F to 0xFFFF:
Banana       The perfect fruit                       $2
Kiwi         New Zealand                             $8
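To locate a record in a hash-partitioned table, the key is hashed and the tablet whose hash interval contains the result is chosen. The sketch below uses the slide's boundaries. The 16-bit hash (first two bytes of MD5) is an assumption for illustration, so which fruit lands in which tablet will not match the slide.

```python
import hashlib
from bisect import bisect_right

# Tablet boundaries from the slide; tablet i covers [BOUNDARIES[i], BOUNDARIES[i + 1]).
BOUNDARIES = [0x0000, 0x2AF3, 0x911F, 0xFFFF]


def hash16(key: str) -> int:
    """Assumed 16-bit hash: the first two bytes of the key's MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Index of the tablet whose interval contains hash value h."""
    if not 0 <= h <= 0xFFFF:
        raise ValueError("hash out of range")
    # bisect_right returns the position of the first boundary greater than h;
    # the tablet containing h starts at the boundary just before it.
    i = bisect_right(BOUNDARIES, h) - 1
    # The last boundary (0xFFFF) is inclusive, so clamp it into the last tablet.
    return min(i, len(BOUNDARIES) - 2)


def tablet_for(key: str) -> int:
    return tablet_for_hash(hash16(key))
```

Because hashing scatters neighboring keys, this layout spreads load evenly but cannot serve range scans efficiently, which is what the ordered table addresses.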



Tablets—Ordered Table

[Records partitioned into tablets by key range; tablet boundaries A, H, Q, Z]

Name         Description                             Price

Tablet A to H:
Apple        Apple is wisdom                         $1
Avocado      But at what price?                      $3
Banana       The perfect fruit                       $2
Grape        Grapes are good to eat                  $12

Tablet H to Q:
Kiwi         New Zealand                             $8
Lemon        How much did you pay for this lemon?    $1
Lime         Limes are green                         $9
Orange       Arrgh! Don’t get scurvy!                $2

Tablet Q to Z:
Strawberry   Strawberry shortcake                    $900
Tomato       Is this a vegetable?                    $14
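In the ordered table, tablets hold contiguous key ranges, so finding the tablet for a key is a binary search over the boundaries. A minimal sketch using the slide's boundaries A, H, Q, Z; treating the last tablet as open-ended and comparing keys as plain case-sensitive strings are assumptions.

```python
from bisect import bisect_right

# Tablet boundaries from the slide: tablets [A, H), [H, Q), [Q, ...).
BOUNDARIES = ["A", "H", "Q", "Z"]


def tablet_for(key: str) -> int:
    """Index of the tablet whose key range contains key."""
    i = bisect_right(BOUNDARIES, key) - 1
    if i < 0:
        raise KeyError(f"{key!r} sorts before the first tablet boundary")
    # Assumption: keys at or beyond "Z" also belong to the last tablet.
    return min(i, len(BOUNDARIES) - 2)
```

With these boundaries, `tablet_for("Kiwi")` returns 1 (the H to Q tablet), matching the slide.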



Flexible Schema


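Detailed Architecture

[Figure: in the local region, clients call the routers through a REST API, and the routers forward requests to the storage units; a tablet controller manages tablet assignment, and Tribble links the local region to remote regions]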


Tablet Splitting and Balancing

[Figure: storage units, each holding several tablets]

Each storage unit has many tablets (horizontal partitions of the table)

A storage unit may become a hotspot

Tablets may grow over time

Overfull tablets split

Shed load by moving tablets to other servers
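The split and shed-load steps can be sketched as follows. The threshold `MAX_KEYS`, splitting at the median key, and measuring load as a record count are all assumptions for illustration; the slide does not say how split points or the tablet to move are chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_KEYS = 4  # hypothetical split threshold (records per tablet)


@dataclass
class Tablet:
    low: str  # inclusive lower bound of the tablet's key range
    keys: List[str] = field(default_factory=list)  # sorted keys stored in it


def split_if_overfull(t: Tablet) -> List[Tablet]:
    """Split an overfull tablet at its median key; otherwise return it unchanged."""
    if len(t.keys) <= MAX_KEYS:
        return [t]
    mid = len(t.keys) // 2
    return [Tablet(t.low, t.keys[:mid]), Tablet(t.keys[mid], t.keys[mid:])]


def load(tablets: List[Tablet]) -> int:
    """Load of a storage unit, measured here simply as its record count."""
    return sum(len(t.keys) for t in tablets)


def shed_load(units: Dict[str, List[Tablet]]) -> None:
    """Move one tablet from the most loaded storage unit to the least loaded one."""
    src = max(units, key=lambda u: load(units[u]))
    dst = min(units, key=lambda u: load(units[u]))
    if src != dst and len(units[src]) > 1:
        units[dst].append(units[src].pop())
```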



QUERY PROCESSING



Accessing Data

[Figure: a "Get key k" request is forwarded, in steps 1 to 4, to the storage unit (SU) that holds key k, and the record for key k is returned]


Bulk Read

[Figure: (1) the key set {k1, k2, … kn} is sent to a scatter/gather server; (2) the server issues Get k1, Get k2, Get k3, and so on, to the storage units (SUs)]
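A scatter/gather server can be approximated with a thread pool that issues the individual gets concurrently and reassembles the answers by key. Here `get` is a stand-in for whatever call reaches the right storage unit (an assumption), and `max_parallel` is a hypothetical tuning knob.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def scatter_gather(keys: Iterable[str],
                   get: Callable[[str], Optional[dict]],
                   max_parallel: int = 8) -> Dict[str, Optional[dict]]:
    """Send one get() per key concurrently (scatter), then collect the results (gather)."""
    key_list = list(keys)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map yields results in the same order as key_list.
        results = list(pool.map(get, key_list))
    return dict(zip(key_list, results))
```

With a dict standing in for the storage units, `scatter_gather(["k1", "k2"], store.get)` returns one entry per key, with `None` for keys that are missing.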


Range Queries in YDOT

  • Clustered, ordered retrieval of records

[Figure: the router maps key ranges to storage units: keys before Cantaloupe to storage unit 1, Cantaloupe to Lime to storage unit 3, Lime to Strawberry to storage unit 2, and Strawberry onward to storage unit 1. Storage unit 1 holds Apple, Avocado, Banana, Blueberry, Strawberry, Tomato, Watermelon; storage unit 2 holds Lime, Mango, Orange; storage unit 3 holds Cantaloupe, Grape, Kiwi, Lemon. The router splits the query Grapefruit…Pear? into Grapefruit…Lime? (storage unit 3) and Lime…Pear? (storage unit 2).]
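The router's job in a range query is to cut the requested interval at tablet boundaries and send each piece to the storage unit that owns it. A sketch using the tablet map from the figure; the `TABLET_MAP` structure and the half-open ranges are assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple

# Tablet map from the figure: (first key of the range, storage unit).
# The empty string stands for "keys before Cantaloupe".
TABLET_MAP = [("", "SU1"), ("Cantaloupe", "SU3"), ("Lime", "SU2"), ("Strawberry", "SU1")]


def split_range(lo: str, hi: str) -> List[Tuple[str, str, str]]:
    """Split the query range [lo, hi) into (sub_lo, sub_hi, storage unit) pieces."""
    if lo >= hi:
        return []
    starts = [start for start, _ in TABLET_MAP]
    i = bisect_right(starts, lo) - 1  # tablet containing lo
    pieces = []
    while i < len(TABLET_MAP) and starts[i] < hi:
        # End of this tablet's range, or the end of the query for the last tablet.
        end = starts[i + 1] if i + 1 < len(TABLET_MAP) else hi
        pieces.append((max(lo, starts[i]), min(hi, end), TABLET_MAP[i][1]))
        i += 1
    return pieces
```

`split_range("Grapefruit", "Pear")` yields `("Grapefruit", "Lime", "SU3")` and `("Lime", "Pear", "SU2")`, the same split the figure shows.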


Updates

[Figure: a "Write key k" request passes, in steps 1 to 8, from the client through the routers to the storage unit (SU) that holds key k and through the message brokers; the sequence # for key k is returned with SUCCESS]
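The role of the per-key sequence number can be sketched with two toy classes: the storage unit that masters key k numbers each write and publishes it to a log standing in for the message brokers, and a replica applies updates strictly in sequence. The class names and in-order delivery are assumptions; the sketch ignores routers, failures, and cross-region acknowledgements.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Update = Tuple[str, int, str]  # (key, sequence number, new value)


class MasterSU:
    """Hypothetical storage unit that masters some keys and orders their writes."""

    def __init__(self, broker_log: List[Update]) -> None:
        self.seq: Dict[str, int] = defaultdict(int)  # last sequence number per key
        self.data: Dict[str, str] = {}
        self.log = broker_log  # stand-in for the message brokers

    def write(self, key: str, value: str) -> int:
        self.seq[key] += 1  # next sequence number for this key
        self.data[key] = value
        self.log.append((key, self.seq[key], value))  # publish the update
        return self.seq[key]  # reported back to the client with SUCCESS


class ReplicaSU:
    """Applies published updates in per-key sequence order."""

    def __init__(self) -> None:
        self.applied: Dict[str, int] = defaultdict(int)
        self.data: Dict[str, str] = {}

    def apply(self, update: Update) -> bool:
        key, seq, value = update
        if seq != self.applied[key] + 1:
            return False  # duplicate or out of order; a real replica would buffer or drop it
        self.data[key] = value
        self.applied[key] = seq
        return True
```

Because every replica applies a key's updates in the same order, a replica can lag behind the master but never shows a version the master did not produce.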



ASYNCHRONOUS REPLICATION AND CONSISTENCY




Asynchronous Replication



Consistency Model

Goal: Make it easier for applications to reason about updates and cope with asynchrony

What happens to a record with primary key “Alice”?

[Figure: timeline of the record, Generation 1: the record is inserted (v. 1), updated through v. 2 to v. 8, then deleted]

As the record is updated, copies may get out of sync.


Example: Social Alice

[Figure: Alice's record replicated in the East and West regions, with a record timeline of values ___, Busy, Free]


Consistency Model

[Figure: record timeline, Generation 1, v. 1 through v. 8, with stale versions and the current version marked; a Read is answered with whatever version the local copy holds, possibly a stale one]

In general, reads are served using a local copy


Consistency Model

[Figure: record timeline, Generation 1, v. 1 through v. 8, with stale versions and the current version marked; a "Read up-to-date" request returns the current version]

But the application can request and get the current version


Consistency Model

[Figure: record timeline, Generation 1, v. 1 through v. 8, with stale versions and the current version marked; a "Read ≥ v.6" request returns v. 6 or a later version]

Or variations such as “read forward”: while copies may lag the master record, every copy goes through the same sequence of changes


Consistency Model

[Figure: record timeline, Generation 1, v. 1 through v. 8, with stale versions and the current version marked; a Write goes to the current version of the record]

Achieved via a per-record primary copy protocol

(To maximize availability, record mastership is automatically transferred if a site fails)

Can be selectively weakened to eventual consistency

(local writes that are reconciled using version vectors)


Consistency Model

[Figure: record timeline, Generation 1, v. 1 through v. 8, with stale versions and the current version marked; a "Write if = v.7" request returns ERROR because v. 7 is no longer the current version]

Test-and-set writes facilitate per-record transactions
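The read and write variants on these slides can be modeled against a single record whose local copy lags the master. The sketch is hypothetical: method names mirror the slide labels (Read, Read up-to-date, Read ≥ v.6, Write, Write if = v.7) rather than any real client library, and `read_up_to_date` simply returns the master's latest version.

```python
from typing import List, Optional, Tuple


class VersionError(Exception):
    """A test-and-set write named a version that is no longer current."""


class RecordCopies:
    """Hypothetical model of one record: the master's timeline plus a lagging local copy."""

    def __init__(self) -> None:
        self.timeline: List[str] = []  # timeline[i] holds the value of version i + 1
        self.local_version = 0         # latest version the local copy has received

    @property
    def current_version(self) -> int:
        return len(self.timeline)

    def write(self, value: str) -> int:
        """Write: append a new version at the master."""
        self.timeline.append(value)
        return self.current_version

    def write_if(self, expected_version: int, value: str) -> int:
        """Write if = v.N: succeed only if v.N is still the current version."""
        if expected_version != self.current_version:
            raise VersionError(
                f"expected v.{expected_version}, current is v.{self.current_version}")
        return self.write(value)

    def replicate(self, upto: int) -> None:
        """Asynchronous replication delivers versions up to `upto` to the local copy."""
        self.local_version = max(self.local_version, min(upto, self.current_version))

    def read(self) -> Tuple[int, Optional[str]]:
        """Read: serve from the local copy, which may be stale."""
        v = self.local_version
        return v, self.timeline[v - 1] if v else None

    def read_up_to_date(self) -> Tuple[int, Optional[str]]:
        """Read up-to-date: return the current version."""
        v = self.current_version
        return v, self.timeline[v - 1] if v else None

    def read_at_least(self, min_version: int) -> Tuple[int, Optional[str]]:
        """Read >= v.N: use the local copy if it is new enough, else the current version."""
        if self.local_version >= min_version:
            return self.read()
        return self.read_up_to_date()
```

With eight versions written and the local copy at v.5, `read_at_least(6)` falls through to v.8, and `write_if(7, ...)` raises `VersionError`, corresponding to the ERROR on the slide.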



Consistency Techniques

  • Per-record mastering

    • Each record is assigned a “master region”

      • May differ between records

    • Updates to the record forwarded to the master region

    • Ensures consistent ordering of updates

  • Tablet-level mastering

    • Each tablet is assigned a “master region”

    • Inserts and deletes of records forwarded to the master region

    • Master region decides tablet splits

  • These details are hidden from the application

    • Except for the latency impact!


Mastering

[Figure: three replicas of the Parts table (rows A to F), with a "Tablet master" label]


Bulk Insert/Update/Replace

  • Client feeds records to bulk manager

  • Bulk loader transfers records to SUs in batches

    • Bypass routers and message brokers

    • Efficient import into storage unit

[Figure: Source Data, Client, and Bulk manager, showing the bulk load path]



Bulk Load in YDOT

  • YDOT bulk inserts can cause performance hotspots

  • Solution: preallocate tablets
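One way to preallocate is to sample the keys about to be loaded and place tablet boundaries at evenly spaced quantiles, so the load is spread over many tablets from the start instead of piling into one (for instance when input arrives in key order). This sampling approach is an assumption made for illustration, not necessarily the method YDOT uses.

```python
from typing import List, Sequence


def preallocate_boundaries(sample_keys: Sequence[str], num_tablets: int) -> List[str]:
    """Pick num_tablets - 1 split points from a sample of the keys to be loaded,
    so that each preallocated tablet receives roughly the same number of records."""
    if num_tablets < 1:
        raise ValueError("num_tablets must be at least 1")
    keys = sorted(sample_keys)
    if not keys:
        return []
    # Split points at evenly spaced quantiles of the sorted sample.
    return [keys[(i * len(keys)) // num_tablets] for i in range(1, num_tablets)]
```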



Index Maintenance

  • How to have lots of interesting indexes and views, without killing performance?

  • Solution: Asynchrony!

    • Indexes/views updated asynchronously when base table updated
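Asynchronous index maintenance can be sketched as a base table whose writes only enqueue an index change, with a separate step draining the queue. The class name and the single-field index are hypothetical; the point is that the index may briefly lag the base table.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

Change = Tuple[str, Optional[object], Optional[object]]  # (key, old value, new value)


class TableWithAsyncIndex:
    """Hypothetical base table with one secondary index maintained asynchronously."""

    def __init__(self, field: str) -> None:
        self.field = field
        self.base: Dict[str, dict] = {}
        self.index: Dict[object, Set[str]] = {}  # indexed value -> primary keys
        self.pending: Deque[Change] = deque()

    def put(self, key: str, record: dict) -> None:
        """Update the base table; the index change is only queued, not applied."""
        old = self.base.get(key, {}).get(self.field)
        self.base[key] = record
        self.pending.append((key, old, record.get(self.field)))

    def apply_pending(self, limit: int = 100) -> int:
        """Background step: apply up to `limit` queued index changes, oldest first."""
        done = 0
        while self.pending and done < limit:
            key, old, new = self.pending.popleft()
            if old is not None:
                self.index.get(old, set()).discard(key)
            if new is not None:
                self.index.setdefault(new, set()).add(key)
            done += 1
        return done
```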


SHERPA IN CONTEXT


Types of Record Stores

  • Query expressiveness

    • From simple to feature rich: S3 (object retrieval), PNUTS (retrieval from a single table of objects/records), Oracle (SQL)


Types of Record Stores

  • Consistency model

    • From best effort to strong guarantees: S3 (eventual consistency), PNUTS (timeline consistency), Oracle (ACID)

    • S3 and PNUTS: object-centric consistency; Oracle: program-centric consistency


Types of Record Stores

  • Data model

    • From flexibility and schema evolution to optimized for fixed schemas: PNUTS, CouchDB, Oracle

    • PNUTS and CouchDB: object-centric consistency; Oracle: consistency spans objects


Types of Record Stores

  • Elasticity (ability to add resources on demand)

    • From inelastic to elastic: Oracle (limited, via data distribution); PNUTS and S3 (VLSD: Very Large Scale Distribution/Replication)


Data Stores Comparison

  • User-partitioned SQL stores: Microsoft Azure SDS, Amazon SimpleDB

    • Versus PNUTS: more expressive queries; users must control partitioning; limited elasticity

  • Multi-tenant application databases: Salesforce.com, Oracle on Demand

    • Versus PNUTS: highly optimized for complex workloads; limited flexibility to evolving applications; inherit limitations of underlying data management system

  • Mutable object stores: Amazon S3

    • Versus PNUTS: object storage versus record management


Application Design Space

[Figure: data stores placed along two axes, "Get a few things" to "Scan everything" and "Files" to "Records": Sherpa, MObStor, YMDB, MySQL, Oracle, Filer, BigTable, Hadoop, Everest]


Alternatives Matrix

[Matrix: Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, and Cassandra compared on consistency model, structured access, global low latency, SQL/ACID, availability, operability, updates, and elasticity]



Further Reading

Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan

PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni

Asynchronous View Maintenance for VLSD Databases (SIGMOD 2009, to appear)
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan

Cloud Storage Design in a PNUTShell (Beautiful Data, O’Reilly Media, 2009, to appear)
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava



QUESTIONS?




Hadoop



Problem

How do you scale up applications?

Run jobs that process hundreds of terabytes of data

Reading that much data takes 11 days on 1 computer

Need lots of cheap computers

This fixes the speed problem (15 minutes on 1000 computers), but…

Reliability problems

In large clusters, computers fail every day

Cluster size is not fixed

Need common infrastructure

Must be efficient and reliable
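The numbers on this slide are consistent with ideal linear speedup. Taking "hundreds of terabytes" as 100 TB (an assumption), the arithmetic works out as follows.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def parallel_minutes(days_on_one: float, machines: int) -> float:
    """Ideal run time in minutes when the work divides evenly across machines."""
    return days_on_one * SECONDS_PER_DAY / machines / 60


def read_rate_mb_per_s(terabytes: float, days: float) -> float:
    """Sequential read rate (decimal MB/s) needed to read `terabytes` in `days`."""
    return terabytes * 1e6 / (days * SECONDS_PER_DAY)


print(f"{parallel_minutes(11, 1000):.1f} minutes on 1000 machines")       # 15.8
print(f"{read_rate_mb_per_s(100, 11):.0f} MB/s implied for one machine")  # 105
```

So 11 days on one machine becomes roughly 16 minutes on 1000, and the one-machine figure implies reading at about 105 MB/s; the catch, as the slide notes, is that at this scale machines fail every day.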



Solution

Open Source Apache Project

Hadoop Core includes:

Distributed File System - distributes data

Map/Reduce - distributes application

Written in Java

Runs on Linux, Mac OS X, Windows, and Solaris

Runs on commodity hardware
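The Map/Reduce programming model itself fits in a few functions. The sketch below runs word count in one Python process to show the map, shuffle, and reduce phases; Hadoop's real API is Java, and Hadoop runs these phases in parallel across the cluster, with the shuffle moving data between machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values by key (the framework's job between the phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    grouped = shuffle(chain.from_iterable(map_phase(line) for line in lines))
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `word_count(["the cat", "The dog the end"])` returns `{"the": 3, "cat": 1, "dog": 1, "end": 1}`.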



Hardware Cluster of Hadoop

Typically a 2-level architecture

Nodes are commodity PCs

40 nodes/rack

Uplink from each rack is 8 gigabit

Each node has a 1 gigabit link within the rack
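With 40 nodes per rack, each on a 1 gigabit link, and an 8 gigabit uplink, traffic leaving a rack is oversubscribed. A small calculation, assuming every node sends off-rack at the same moment:

```python
def oversubscription(nodes_per_rack: int, node_gbps: float, uplink_gbps: float) -> float:
    """Total node bandwidth in a rack divided by the rack's uplink bandwidth."""
    return nodes_per_rack * node_gbps / uplink_gbps


def off_rack_share_gbps(nodes_per_rack: int, uplink_gbps: float) -> float:
    """Uplink bandwidth each node gets if all of them send off-rack at the same time."""
    return uplink_gbps / nodes_per_rack
```

Here `oversubscription(40, 1, 8)` is 5.0 and `off_rack_share_gbps(40, 8)` is 0.2, about 200 megabits per node, which suggests why keeping traffic inside a rack pays off.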



Distributed File System

Single namespace for entire cluster

Managed by a single namenode.

Files are single-writer and append-only.

Optimized for streaming reads of large files.

Files are broken into large blocks.

Typically 128 MB

Replicated to several datanodes, for reliability

Access from Java, C, or command line.
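The block layout is easy to compute. A sketch assuming the 128 MB block size above and the default of 3 replicas mentioned under Block Placement; as in HDFS, the final block may be shorter than a full block.

```python
from typing import List, Tuple

BLOCK_SIZE = 128 * 1024 * 1024  # "typically 128 MB"; configurable in practice


def block_ranges(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """(offset, length) of each block a file of file_size bytes is split into."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def raw_bytes(file_size: int, replicas: int = 3) -> int:
    """Total disk space for the file's data across all replicas (ignores checksums)."""
    return file_size * replicas
```

A 300 MiB file, for example, becomes two full blocks plus a 44 MiB block, and occupies 900 MiB of raw disk with 3 replicas.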



Block Placement

Default is 3 replicas, but settable

Blocks are placed (writes are pipelined):

On the writer's own node

On a node in a different rack

On another node in that same remote rack

Clients read from the closest replica

If the replication for a block drops below target, it is automatically re-replicated.
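The placement rule can be sketched as below. It is a simplification: real HDFS placement also weighs node load and free space and handles writers that are not datanodes. The `rack_of` map and the function name are hypothetical, and the chosen remote rack is assumed to have at least two nodes.

```python
import random
from typing import Dict, List, Optional


def place_replicas(writer: str, rack_of: Dict[str, str],
                   rng: Optional[random.Random] = None) -> List[str]:
    """Choose 3 replica locations following the slide's rule:
    1st on the writer's node, 2nd on a node in a different rack,
    3rd on another node in that same remote rack."""
    rng = rng or random.Random()
    remote_nodes = [n for n, rack in rack_of.items() if rack != rack_of[writer]]
    second = rng.choice(remote_nodes)
    # Assumes the chosen remote rack has at least one other node.
    same_rack = [n for n, rack in rack_of.items()
                 if rack == rack_of[second] and n != second]
    third = rng.choice(same_rack)
    return [writer, second, third]
```

Putting two of the three replicas in one remote rack keeps only one cross-rack transfer in the write pipeline while still surviving the loss of a whole rack.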



How is Yahoo using Hadoop?

Started with building better applications

Scale up web scale batch applications (search, ads, …)

Factor out common code from existing systems, so new applications will be easier to write

Manage the many clusters



Running Production WebMap

  • Search needs a graph of the “known” web

    • Invert edges, compute link text, whole graph heuristics

  • Periodic batch job using Map/Reduce

    • Uses a chain of ~100 map/reduce jobs

  • Scale

    • 1 trillion edges in graph

    • Largest shuffle is 450 TB

    • Final output is 300 TB compressed

    • Runs on 10,000 cores

    • Raw disk used 5 PB



Terabyte Sort Benchmark

  • Started by Jim Gray at Microsoft in 1998

  • Sorting 10 billion 100-byte records

  • Hadoop won the general category in 209 seconds

    • 910 nodes

    • 2 quad-core Xeons @ 2.0 GHz / node

    • 4 SATA disks / node

    • 8 GB RAM / node

    • 1 Gb Ethernet / node

    • 40 nodes / rack

    • 8 Gb Ethernet uplink / rack

  • The previous record was 297 seconds
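From the numbers on the slide, the sort's aggregate and per-node throughput can be worked out; these are derived figures, not ones reported on the slide.

```python
RECORDS = 10_000_000_000  # 10 billion records
RECORD_BYTES = 100
SECONDS = 209
NODES = 910
PREVIOUS_RECORD_SECONDS = 297

total_bytes = RECORDS * RECORD_BYTES  # 10**12 bytes, i.e. 1 TB in decimal units
aggregate_gb_per_s = total_bytes / SECONDS / 1e9
per_node_mb_per_s = total_bytes / SECONDS / NODES / 1e6
speedup = PREVIOUS_RECORD_SECONDS / SECONDS

print(f"{aggregate_gb_per_s:.2f} GB/s across the cluster")  # 4.78
print(f"{per_node_mb_per_s:.2f} MB/s per node")              # 5.26
print(f"{speedup:.2f}x faster than the previous record")     # 1.42
```

Each record is read, shuffled across the network, and written back, so the per-node figure is an end-to-end sort rate rather than raw disk bandwidth.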



Hadoop clusters

We have ~20,000 machines running Hadoop

Our largest clusters are currently 2000 nodes

Several petabytes of user data (compressed, unreplicated)

We run hundreds of thousands of jobs every month



Research Cluster Usage



Who Uses Hadoop?

Amazon/A9

AOL

Facebook

Fox Interactive Media

Google / IBM

New York Times

PowerSet (now Microsoft)

Quantcast

Rackspace/Mailtrust

Veoh

Yahoo!

More at http://wiki.apache.org/hadoop/PoweredBy



Q&A

For more information:

Website: http://hadoop.apache.org/core

Mailing lists:

[email protected]

[email protected]



QUESTIONS?
