An Overview of Cloud Computing @ Yahoo!



Raghu Ramakrishnan

Chief Scientist, Audience and Cloud Computing

Research Fellow, Yahoo! Research

Reflects many discussions with:

Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata

and joint work with the Sherpa team, in particular:

Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Research

Chuck Neerdaels, P.P.S. Narayan and many others in CCDI


Questions

  • What is cloud computing?

    • Horizontal and functional services

  • What’s it going to change?

    • Software business models, science, life

  • How many clouds will there be?

    • 1, 2, 3, infinity

  • What’s new in cloud computing?

    • HPC grids, ASPs, hosted services, Multics (!)

    • Emerging “cloud stack” to support a broad class of programs, including data intensive applications


SCENARIOS

Pie-in-the-sky


Living in the Clouds

  • We want to start a new website, FredsList.com

  • Our site will provide listings of items for sale, jobs, etc.

  • As time goes on, we’ll add more features

    • And illustrate how more cloud capabilities (and corresponding infrastructure components) are used as needed

      • List of capabilities/components is illustrative, not exhaustive

  • Our cloud provides a “dataset” abstraction

    • FredsList doesn’t worry about the underlying components


Step 1: Listings Scenario

FredsList wants to store listings as (key, category, description)

DECLARE DATASET Listings AS
( ID String PRIMARY KEY,
Category String,
Description Text )

Sample records:

5523442, childcare, Nanny available in San Jose

1234323, transportation, For sale: one bicycle, barely used

215534, wanted, Looking for issue 1 of Superman comic book

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS)]


Step 2: System Evolution

Fred belatedly realizes prices are useful information!

ALTER DATASET Listings
ADD (Price Float)

Sample records:

5523442, childcare, Nanny available in San Jose

215534, wanted, Looking for issue 1 of Superman comic book

32138, camera, Nikon D40, USD 300

1234323, transportation, For sale: one bicycle, barely used

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS)]

Schemas are flexible, and evolve: not every record in a dataset has values defined for all fields declared for the dataset.
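The flexible-schema behavior described in Steps 1 and 2 can be illustrated with a small in-memory sketch. This is not the PNUTS implementation; the `Dataset` class and its `add_field`, `put`, and `get` methods are hypothetical names chosen for illustration, and the only behavior taken from the slides is that a field can be added later and that older records simply have no value for it.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy dataset whose declared fields can grow over time (hypothetical helper)."""
    name: str
    fields: dict                                  # field name -> Python type
    records: dict = field(default_factory=dict)   # primary key -> stored values

    def add_field(self, field_name, field_type):
        # Analogue of ALTER DATASET ... ADD: existing records are left untouched.
        self.fields[field_name] = field_type

    def put(self, key, **values):
        # Only declared fields may be stored, but none of them is mandatory.
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"undeclared fields: {sorted(unknown)}")
        self.records[key] = dict(values)

    def get(self, key, field_name):
        # A declared field with no stored value reads as None.
        if field_name not in self.fields:
            raise KeyError(field_name)
        return self.records[key].get(field_name)


listings = Dataset("Listings", {"Category": str, "Description": str})
listings.put("5523442", Category="childcare",
             Description="Nanny available in San Jose")
listings.add_field("Price", float)   # Fred adds prices later
listings.put("32138", Category="camera", Description="Nikon D40", Price=300.0)
```

Reading `Price` for the older childcare listing yields `None` rather than an error, which is the point of the slide: the declaration evolves without rewriting existing records.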


Step 3: Search

FredsList’s customers quickly ask for keyword search

ALTER Listings
SET Description SEARCHABLE

Example queries: “dvd’s”, “bicycle”, “nanny”

Federation of systems offering different capabilities

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS), Search (Vespa), Messaging (Tribble)]


Step 4: Photos

FredsList decides to add photos/videos to listings

ALTER Listings
ADD Photo BLOB

Federation of systems offering different performance points

[Diagram: FredsList.com application → Simple Web Service APIs → Storage (MObStor), Database (PNUTS), Search (Vespa), Messaging (Tribble); foreign key photo → listing]


Step 5: Data Analysis

FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.

ALTER Listings
MAKE ANALYZABLE

  • Hadoop program to generate fancy pages for listings

  • Hadoop program to geocode data

  • Pig query to analyze categories

[Diagram: FredsList.com application → Simple Web Service APIs → Storage (MObStor), Compute (Grid), Database (PNUTS), Search (Vespa), Messaging (Tribble); foreign key photo → listing; batch export]


Step 6: Performance

FredsList wants to reduce its data access latency

ALTER Listings
MAKE CACHEABLE

And by now, Fred is global, and wants geo-replication!

[Diagram: FredsList.com application → Simple Web Service APIs → Storage (MObStor), Compute (Grid), Database (PNUTS), Caching (memcached), Search (Vespa), Messaging (Tribble); foreign key photo → listing; batch export]


Data Serving vs. Analysis

  • Very different workloads, requirements

  • Data from serving system is one of many kinds of data (click streams are another common kind, as are syndicated feeds) to be analyzed and integrated

  • The result of analysis often goes right back into serving system


EYES TO THE SKIES

Motherhood-and-Apple-Pie


Why Clouds?

  • On-demand infrastructure to create a fundamental shift in the OE curve:

    • Do things we can’t do

    • Build more robustly, more efficiently, more globally, more completely, more quickly, for a given budget

  • Cloud services should do the heavy lifting of scaling and high availability

    • Today, this is done at the app-level, which is not productive


Requirements for Cloud Services

  • Multitenant. A cloud service must support multiple, organizationally distant customers.

  • Elasticity. Tenants should be able to negotiate and receive resources/QoS on-demand.

  • Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient, e.g., due to spikes.

  • Horizontal scaling. It should be possible to add cloud capacity in small increments; this should be transparent to the tenants of the service.

  • Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.

  • Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.

  • Availability. A cloud service should be highly available.

  • Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service.


Types of Cloud Services

  • Two kinds of cloud services:

    • Horizontal (“Platform”) Cloud Services

      • Functionality enabling tenants to build applications or new services on top of the cloud

    • Functional Cloud Services

      • Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., Flickr, Groups, Mail, News, Shopping

      • Could be built on top of horizontal cloud services or from scratch

      • Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)


Opening Up Yahoo! Search

Phase 1: Giving site owners and developers control over the appearance of Yahoo! Search results.

Phase 2: BOSS takes Yahoo!’s open strategy to the next level by providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences.


BOSS Offerings

BOSS offers two options for companies and developers and has partnered with top technology universities to drive search experimentation, innovation and research into next generation search.

  • API: A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences.

  • CUSTOM: Working with 3rd parties to build a more relevant, brand/site specific web search experience. This option is jointly built by Yahoo! and select partners.

  • ACADEMIC: Working with the following universities to allow for wide-scale research in the search field:

    • University of Illinois Urbana Champaign

    • Carnegie Mellon University

    • Stanford University

    • Purdue University

    • MIT

    • Indian Institute of Technology Bombay

    • University of Massachusetts

(Slide courtesy Prabhakar Raghavan)



Horizontal Cloud Services

  • Horizontal cloud services are foundations on which tenants build applications or new services. They should be:

    • Semantics-free. Must be “generic infrastructure,” and not tied to specific app-logic.

      • May provide the ability to inject application logic through well-defined APIs

    • Broadly applicable. Must be broadly applicable (i.e., it can't be intended for just one or two properties).

    • Fault-tolerant over commodity hardware. Must be built using inexpensive commodity hardware, and should mask component failures.

  • While each cloud service provides value, the power of the cloud paradigm will depend on a collection of well-chosen, loosely coupled services that collectively make it easy to quickly develop and operate innovative web applications.


Yahoo! Cloud Stack

Horizontal cloud services at each layer:

  • EDGE: YCS, YCPI, Brooklyn

  • WEB: VM/OS, yApache, PHP, App Engine

  • APP: VM/OS, Serving Grid, Data Highway

  • STORAGE: Sherpa, MObStor

  • BATCH: Hadoop

Alongside the layers: Provisioning (Self-serve), Monitoring/Metering/Security


Yahoo! CCDI Thrust Areas

  • Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services

    • Multiple hosts may be multiplexed onto the same physical machine.

  • Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities

  • Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval

  • Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP

[Callout: Rest of today’s talk]


Web Data Management

  • Structured record storage (PNUTS/Sherpa)

    • CRUD

    • Point lookups and short scans

    • Index organized table and random I/Os

    • $ per latency

  • Large data analysis (Hadoop)

    • Scan oriented workloads

    • Focus on sequential disk I/O

    • $ per cpu cycle

  • Blob storage (SAN/NAS)

    • Object retrieval and streaming

    • Scalable file storage

    • $ per GB


Hadoop: Batch Storage/Analysis

Why is batch processing important?

  • Whether it’s

    • response-prediction for advertising,

    • machine-learned relevance for Search, or

    • content optimization for audience,

  • data-intensive computing is increasingly central to everything Yahoo! does

  • Hadoop is central to addressing this need

  • Hadoop is a case-study in our cloud vision

    • Processes enormous amounts of data

    • Provides horizontal scaling and fault-tolerance for our users

    • Allows those users to focus on their app logic

[Diagram: Hadoop stack, top to bottom: Workflow, High-level query layer (Pig), Map-Reduce, HDFS]


The World Has Changed

  • Web serving applications need:

    • Scalability!

      • Preferably elastic

    • Flexible schemas

    • Geographic distribution

    • High availability

    • Reliable storage

  • Web serving applications can do without:

    • Complicated queries

    • Strong transactions


MObStor

Yahoo!’s next-generation globally replicated, virtualized media object storage service

Better provisioning, easy migration, replication, better BCP, and performance

New features (Evergreen URLs, CDN integration, REST API, …)

The object metadata problem is addressed using Sherpa, though MObStor is focused on blob storage.



PNUTS / SHERPA

To Help You Scale Your Mountains of Data


CCDI / Research Collaboration

Yahoo! Research: Raghu Ramakrishnan, Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca

CCDI: Chuck Neerdaels, P.P.S. Narayan, Kevin Athey, Toby Negrin, plus Dev/QA teams


Yahoo! Serving Storage Problem

  • Small records – 100KB or less

  • Structured records – lots of fields, evolving

  • Extreme data scale - Tens of TB

  • Extreme request scale - Tens of thousands of requests/sec

  • Low latency globally - 20+ datacenters worldwide

  • High Availability - outages cost $millions

  • Variable usage patterns - as applications and users change



What is PNUTS/Sherpa?

CREATE TABLE Parts (
ID VARCHAR,
StockNumber INT,
Status VARCHAR
)

[Figure: several copies of the Parts table, each holding the records A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E, annotated with the properties below]

  • Structured, flexible schema

  • Geographic replication

  • Parallel database

  • Hosted, managed infrastructure


What Will It Become?

[Figure: three copies of the table holding the records A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E]

Indexes and views


Design Goals

  • Scalability

    • Thousands of machines

    • Easy to add capacity

    • Restrict query language to avoid costly queries

  • Geographic replication

    • Asynchronous replication around the globe

    • Low-latency local access

  • High availability and fault tolerance

    • Automatically recover from failures

    • Serve reads and writes despite failures

  • Consistency

    • Per-record guarantees

    • Timeline model

    • Option to relax if needed

  • Multiple access paths

    • Hash table, ordered table

    • Primary, secondary access

  • Hosted service

    • Applications plug and play

    • Share operational cost


Technology Elements

Applications

Tabular API / PNUTS API

  • PNUTS

    • Query planning and execution

    • Index maintenance

  • Distributed infrastructure for tabular data

    • Data partitioning

    • Update consistency

    • Replication

  • YCA: Authorization

  • YDOT FS

    • Ordered tables

  • YDHT FS

    • Hash tables

  • Tribble

    • Pub/sub messaging

  • Zookeeper

    • Consistency service


Data Manipulation

  • Per-record operations

    • Get

    • Set

    • Delete

  • Multi-record operations

    • Multiget

    • Scan

    • Getrange

  • Web service (RESTful) API
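To make the operation list concrete, here is a minimal in-memory sketch of a table offering the six operations named on the slide. The class name `ToyTable`, the method signatures, and the exact semantics (for example, that `getrange` is half-open and that `scan` returns records in key order) are assumptions for illustration; the slides do not specify them, and the real service exposes these operations through a RESTful web API rather than Python calls.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a record table (hypothetical, not the PNUTS API)."""

    def __init__(self):
        self._rows = {}   # primary key -> record

    # Per-record operations
    def get(self, key):
        return self._rows.get(key)

    def set(self, key, record):
        self._rows[key] = record

    def delete(self, key):
        self._rows.pop(key, None)

    # Multi-record operations
    def multiget(self, keys):
        # Missing keys are skipped (an assumption of this sketch).
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self):
        # Every record, in key order.
        return sorted(self._rows.items())

    def getrange(self, low, high):
        # Records with low <= key < high, in key order.
        keys = sorted(self._rows)
        start = bisect.bisect_left(keys, low)
        stop = bisect.bisect_left(keys, high)
        return [(k, self._rows[k]) for k in keys[start:stop]]


table = ToyTable()
for name, price in [("Apple", 1), ("Banana", 2), ("Grape", 12), ("Kiwi", 8)]:
    table.set(name, {"Price": price})
```

A range read such as `table.getrange("B", "H")` touches only Banana and Grape, which is the access pattern an ordered table is meant to serve.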



Tablets—Hash Table

Name        Description                             Price

[Tablet boundary: 0x0000]
Grape       Grapes are good to eat                  $12
Lime        Limes are green                         $9
Apple       Apple is wisdom                         $1
Strawberry  Strawberry shortcake                    $900
[Tablet boundary: 0x2AF3]
Orange      Arrgh! Don’t get scurvy!                $2
Avocado     But at what price?                      $3
Lemon       How much did you pay for this lemon?    $1
Tomato      Is this a vegetable?                    $14
[Tablet boundary: 0x911F]
Banana      The perfect fruit                       $2
Kiwi        New Zealand                             $8
[Tablet boundary: 0xFFFF]
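The hash-table layout above can be sketched as a lookup from a key's hash to the tablet whose interval contains it. The boundary values are the ones shown on the slide; the hash function is an assumption (a 16-bit slice of CRC32 is used here purely as a stand-in, since the slides do not say which hash PNUTS uses), and the function names are hypothetical.

```python
import bisect
import zlib

# Lower bounds of the three tablets shown on the slide; the hash space
# ends at 0xFFFF.
TABLET_LOWER_BOUNDS = [0x0000, 0x2AF3, 0x911F]


def key_hash(key):
    # Stand-in hash: low 16 bits of CRC32. The real hash function is not
    # given in the slides.
    return zlib.crc32(key.encode("utf-8")) & 0xFFFF


def tablet_for_hash(h):
    """Return the index of the tablet whose interval contains hash value h."""
    if not 0x0000 <= h <= 0xFFFF:
        raise ValueError("hash outside the 16-bit hash space")
    return bisect.bisect_right(TABLET_LOWER_BOUNDS, h) - 1


def tablet_for_key(key):
    return tablet_for_hash(key_hash(key))
```

Because placement depends only on the hash, neighbouring keys such as Lemon and Lime can land in different tablets, which spreads load evenly but makes range scans expensive.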



Tablets—Ordered Table

Name        Description                             Price

[Tablet boundary: A]
Apple       Apple is wisdom                         $1
Avocado     But at what price?                      $3
Banana      The perfect fruit                       $2
Grape       Grapes are good to eat                  $12
[Tablet boundary: H]
Kiwi        New Zealand                             $8
Lemon       How much did you pay for this lemon?    $1
Lime        Limes are green                         $9
Orange      Arrgh! Don’t get scurvy!                $2
[Tablet boundary: Q]
Strawberry  Strawberry shortcake                    $900
Tomato      Is this a vegetable?                    $14
[Tablet boundary: Z]
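For the ordered table, the same kind of lookup works directly on the key instead of its hash. The sketch below uses the boundaries A, H, and Q from the slide as tablet lower bounds; the function name and the choice to reject keys that sort below the first boundary are assumptions made for illustration.

```python
import bisect

# Lower bounds of the three tablets on the slide: [A, H), [H, Q), [Q, Z].
ORDERED_LOWER_BOUNDS = ["A", "H", "Q"]


def ordered_tablet_for_key(key):
    """Return the index of the tablet whose key range contains `key`."""
    index = bisect.bisect_right(ORDERED_LOWER_BOUNDS, key) - 1
    if index < 0:
        # This sketch has no tablet below the first boundary.
        raise KeyError(f"{key!r} sorts before the first tablet boundary")
    return index
```

Here Apple, Avocado, Banana, and Grape share tablet 0, so a scan over a contiguous key range reads a small number of tablets in order.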




Detailed Architecture

[Diagram: in the local region, Clients call the REST API, which reaches the Routers, the Tablet Controller, and the Storage units; Tribble connects the local region to the remote regions]


Tablet Splitting and Balancing

  • Each storage unit has many tablets (horizontal partitions of the table)

  • Tablets may grow over time

    • Overfull tablets split

  • Storage unit may become a hotspot

    • Shed load by moving tablets to other servers
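The two mechanisms on this slide, splitting an overfull tablet and moving tablets off a hot storage unit, can be sketched as follows. Everything here is a simplification under stated assumptions: tablets are plain sorted lists of keys, the split point is the median key, and "load" is measured by record count, whereas a real system would more likely track request rate. The function names are hypothetical.

```python
def split_if_overfull(tablet, max_records):
    """Return [tablet] if it fits, else two halves split at the median key."""
    if len(tablet) <= max_records:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def unit_load(tablets):
    # Assumption: load is approximated by the number of stored records.
    return sum(len(t) for t in tablets)


def shed_load(units):
    """Move one tablet from the most loaded storage unit to the least loaded."""
    hot = max(units, key=lambda name: unit_load(units[name]))
    cold = min(units, key=lambda name: unit_load(units[name]))
    if hot != cold and len(units[hot]) > 1:
        units[cold].append(units[hot].pop())
    return units


# One storage unit holds an overfull tablet; the other is empty.
halves = split_if_overfull(["a", "b", "c", "d", "e"], max_records=4)
units = shed_load({"su1": halves, "su2": []})
```

Splitting first matters: a single large tablet cannot be shared between servers, but its two halves can be placed independently.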



QUERY PROCESSING


Accessing Data

[Diagram: read path for key k in four numbered steps (1-4): “Get key k” travels from the client through a router to a storage unit (SU), and “Record for key k” travels back the same way]


Bulk Read

[Diagram: in step 1 the client sends the key set {k1, k2, … kn} to a scatter/gather server; in step 2 that server issues Get k1, Get k2, Get k3, … to the storage units (SU)]


Range Queries in YDOT

  • Clustered, ordered retrieval of records

Router’s map from key ranges to storage units (each range starts at the key shown):

  (start of table) → Storage unit 1
  Cantaloupe       → Storage unit 3
  Lime             → Storage unit 2
  Strawberry       → Storage unit 1

Storage unit contents:

  Storage unit 1: Apple, Avocado, Banana, Blueberry; Strawberry, Tomato, Watermelon
  Storage unit 2: Lime, Mango, Orange
  Storage unit 3: Cantaloupe, Grape, Kiwi, Lemon

Example: the router splits the range query “Grapefruit…Pear?” into “Grapefruit…Lime?” (storage unit 3) and “Lime…Pear?” (storage unit 2).
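The router's job in this example, splitting one key range into per-tablet sub-ranges, can be sketched with a sorted list of range lower bounds. The boundaries and storage-unit numbers below come from the slide's example; the function name `plan_range_query`, the half-open range convention, and the use of an empty string as the lowest possible key are assumptions of this sketch.

```python
import bisect

# Lower bound of each key range and the storage unit that holds it,
# as in the slide's example. "" stands for the start of the table.
LOWER_BOUNDS = ["", "Cantaloupe", "Lime", "Strawberry"]
STORAGE_UNITS = [1, 3, 2, 1]


def plan_range_query(low, high):
    """Split the range [low, high) into (storage_unit, sub_low, sub_high)."""
    if low >= high:
        return []
    plan = []
    # Start with the tablet whose range contains `low`.
    i = bisect.bisect_right(LOWER_BOUNDS, low) - 1
    while i < len(LOWER_BOUNDS) and LOWER_BOUNDS[i] < high:
        sub_low = max(low, LOWER_BOUNDS[i])
        is_last = i + 1 == len(LOWER_BOUNDS)
        sub_high = high if is_last else min(high, LOWER_BOUNDS[i + 1])
        plan.append((STORAGE_UNITS[i], sub_low, sub_high))
        i += 1
    return plan


plan = plan_range_query("Grapefruit", "Pear")
```

For the slide's query, `plan` holds two sub-ranges, Grapefruit to Lime on storage unit 3 and Lime to Pear on storage unit 2, which the router can issue in key order to preserve sorted results.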


Updates

[Diagram: write path for key k in eight numbered steps (1-8) among the client, routers, storage units (SU), and message brokers. Messages shown: “Write key k”, “SUCCESS”, and “Sequence # for key k”]




Consistency Model

Goal: Make it easier for applications to reason about updates and cope with asynchrony

What happens to a record with primary key “Alice”?

[Timeline figure: Generation 1 of the record over time: record inserted (v. 1), successive updates (v. 2 through v. 8), then delete]

As the record is updated, copies may get out of sync.


Example: Social Alice

[Figure: record timeline for Alice’s status as seen in the East and West regions; values shown: ___, Busy, Free, Free]


Consistency Model

[Timeline figure: Generation 1, v. 1 through v. 8 over time, with stale versions and the current version marked; operation shown: Read]

In general, reads are served using a local copy


Consistency Model

[Timeline figure: Generation 1, v. 1 through v. 8 over time, with stale versions and the current version marked; operation shown: Read up-to-date]

But application can request and get current version


Consistency Model

[Timeline figure: Generation 1, v. 1 through v. 8 over time, with stale versions and the current version marked; operation shown: Read ≥ v.6]

Or variations such as “read forward”: while copies may lag the master record, every copy goes through the same sequence of changes
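The three read variants shown on the preceding slides (plain read, read up-to-date, and read at or above a given version) can be sketched against a lagging local copy and a master copy. The function names mirror the slide labels rather than any real client API, and falling back to the master copy when the local copy is too old is an assumption of this sketch.

```python
class Replica:
    """One copy of a record; versions must be applied in timeline order."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Copies may lag, but they never skip or reorder versions.
        if version != self.version + 1:
            raise ValueError("out-of-order update")
        self.version, self.value = version, value


def read(local):
    # Plain read: served from the local copy, which may be stale.
    return local.version, local.value


def read_up_to_date(master):
    # Read up-to-date: served from the copy that holds the current version.
    return master.version, master.value


def read_at_least(local, master, required_version):
    # "Read >= v": the local copy is used only if it is fresh enough
    # (falling back to the master is an assumption of this sketch).
    source = local if local.version >= required_version else master
    return source.version, source.value


master, local = Replica(), Replica()
for v in range(1, 9):
    master.apply(v, f"state-{v}")      # master copy has reached v. 8
for v in range(1, 6):
    local.apply(v, f"state-{v}")       # local copy lags at v. 5
```

With the local copy at v. 5, a plain read is cheap but stale, while a read requiring at least v. 6 has to go to a fresher copy and pays the extra latency.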



Consistency Model

[Timeline figure: Generation 1, v. 1 through v. 8 over time, with stale versions and the current version marked; operation shown: Write]

Achieved via per-record primary copy protocol

(To maximize availability, record masterships are automatically transferred if a site fails)

Can be selectively weakened to eventual consistency

(local writes that are reconciled using version vectors)


Consistency Model

[Timeline figure: Generation 1, v. 1 through v. 8 over time, with stale versions and the current version marked; operation shown: Write if = v.7, which returns ERROR]

Test-and-set writes facilitate per-record transactions
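A test-and-set write succeeds only if the record is still at the version the writer last saw. The sketch below reproduces the slide's scenario, where the record is already at v. 8 and a write conditioned on v. 7 fails; the class and method names are hypothetical, not the PNUTS API.

```python
class VersionMismatch(Exception):
    """Raised when a conditional write finds an unexpected current version."""


class Record:
    def __init__(self, value):
        self.version = 1
        self.value = value

    def write(self, value):
        # Unconditional write: always creates the next version.
        self.version += 1
        self.value = value
        return self.version

    def test_and_set_write(self, expected_version, value):
        # Conditional write: applies only if nobody has written since
        # the caller read `expected_version`.
        if self.version != expected_version:
            raise VersionMismatch(
                f"expected v.{expected_version}, found v.{self.version}")
        return self.write(value)


record = Record("initial")
for n in range(2, 9):
    record.write(f"update-{n}")        # record is now at v. 8

try:
    record.test_and_set_write(7, "based on stale read")
    outcome = "applied"
except VersionMismatch:
    outcome = "ERROR"                  # as on the slide: write if = v.7 fails
```

A caller that gets the error re-reads the record and retries, which gives read-modify-write semantics on a single record without multi-record transactions.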



Consistency Techniques

  • Per-record mastering

    • Each record is assigned a “master region”

      • May differ between records

    • Updates to the record forwarded to the master region

    • Ensures consistent ordering of updates

  • Tablet-level mastering

    • Each tablet is assigned a “master region”

    • Inserts and deletes of records forwarded to the master region

    • Master region decides tablet splits

  • These details are hidden from the application

    • Except for the latency impact!
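Per-record mastering can be sketched as a small simulation: each record has a master region, every update is ordered there, and other regions receive the change later. The `Cluster` and `Region` classes, the pending-delivery list (standing in for the messaging layer), and the boolean returned by `update` are all assumptions made for illustration.

```python
class Region:
    def __init__(self, name):
        self.name = name
        self.store = {}          # key -> (sequence number, value)


class Cluster:
    def __init__(self, region_names):
        self.regions = {name: Region(name) for name in region_names}
        self.master_of = {}      # key -> name of the record's master region
        self.pending = []        # queued (target region, key, seq, value)

    def insert(self, key, value, master):
        self.master_of[key] = master
        self._commit(key, 1, value)

    def update(self, origin, key, value):
        """Apply an update issued in `origin`; True if it had to be forwarded."""
        master = self.master_of[key]
        seq = self.regions[master].store[key][0] + 1
        self._commit(key, seq, value)
        return origin != master   # a forwarded update pays cross-region latency

    def _commit(self, key, seq, value):
        # The master region orders the update, then queues it for the others.
        master = self.master_of[key]
        self.regions[master].store[key] = (seq, value)
        for name in self.regions:
            if name != master:
                self.pending.append((name, key, seq, value))

    def deliver_all(self):
        # Asynchronous propagation, applied in the master's order.
        for name, key, seq, value in self.pending:
            self.regions[name].store[key] = (seq, value)
        self.pending.clear()


cluster = Cluster(["E", "W", "C"])
cluster.insert("A", "42342", master="E")
forwarded = cluster.update("W", "A", "42343")   # issued in W, mastered in E
```

Because all updates to a record pass through one region, every copy sees the same sequence of versions; the cost is the forwarding hop when the writer is not in the master region.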


Mastering

[Figure: three regional copies of a tablet, each holding the records A 42342 E, B 42521 W, C 66354 W, D 12352 E, E 75656 C, F 15677 E; one copy is labeled “Tablet master”]


Bulk Insert/Update/Replace

  • Client feeds records to bulk manager

  • Bulk loader transfers records to SUs in batches

    • Bypass routers and message brokers

    • Efficient import into storage unit

[Diagram labels: Source Data, Client, Bulk manager]


Bulk Load in YDOT

  • YDOT bulk inserts can cause performance hotspots

  • Solution: preallocate tablets


Index Maintenance

  • How to have lots of interesting indexes and views, without killing performance?

  • Solution: Asynchrony!

    • Indexes/views updated asynchronously when base table updated
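Asynchronous index maintenance can be sketched by separating the base-table write from the index update: the write only enqueues the index work, and a later step drains the queue. The `IndexedTable` class, its queue, and the `apply_pending` method are hypothetical; in particular, the slides do not describe how the real system schedules or delivers these deferred updates.

```python
from collections import deque


class IndexedTable:
    """Base table with one secondary index that is updated lazily (toy model)."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.base = {}           # primary key -> record
        self.index = {}          # field value -> set of primary keys
        self._queue = deque()    # pending (key, old record, new record)

    def put(self, key, record):
        # The base-table write returns at once; index work is only enqueued.
        self._queue.append((key, self.base.get(key), record))
        self.base[key] = record

    def apply_pending(self):
        # Run later (for example by a background worker) to catch the index up.
        while self._queue:
            key, old, new = self._queue.popleft()
            if old is not None:
                old_value = old.get(self.indexed_field)
                self.index.get(old_value, set()).discard(key)
            new_value = new.get(self.indexed_field)
            self.index.setdefault(new_value, set()).add(key)

    def lookup(self, value):
        # May miss recent writes until apply_pending has run.
        return sorted(self.index.get(value, set()))


listings = IndexedTable("Category")
listings.put("32138", {"Category": "camera", "Description": "Nikon D40"})
before = listings.lookup("camera")    # index has not caught up yet
listings.apply_pending()
after = listings.lookup("camera")
```

The trade-off is visible in `before` versus `after`: writes stay fast because they never wait on the index, but index readers can briefly see stale results.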


SHERPA IN CONTEXT


Types of Record Stores

  • Query expressiveness (from simple to feature rich)

    • S3: Object retrieval

    • PNUTS: Retrieval from single table of objects/records

    • Oracle: SQL


Types of Record Stores

  • Consistency model (from best effort to strong guarantees)

    • S3: Eventual consistency

    • PNUTS: Timeline consistency

    • Oracle: ACID

[Figure labels: Object-centric consistency, Program centric consistency]


Types of Record Stores

  • Data model (from flexibility and schema evolution to optimized for fixed schemas)

    • Systems shown: PNUTS, CouchDB, Oracle

[Figure labels: Object-centric consistency, Consistency spans objects]


Types of Record Stores

  • Elasticity (ability to add resources on demand), from inelastic to elastic

    • Oracle: Limited (via data distribution)

    • PNUTS, S3: VLSD (Very Large Scale Distribution/Replication)


Data Stores Comparison

Versus PNUTS:

  • User-partitioned SQL stores (Microsoft Azure SDS, Amazon SimpleDB)

    • More expressive queries

    • Users must control partitioning

    • Limited elasticity

  • Multi-tenant application databases (Salesforce.com, Oracle on Demand)

    • Highly optimized for complex workloads

    • Limited flexibility to evolving applications

    • Inherit limitations of underlying data management system

  • Mutable object stores (Amazon S3)

    • Object storage versus record management


Application Design Space

[Figure: design space with one axis running from “Get a few things” to “Scan everything” and the other from “Records” to “Files”; systems plotted: Sherpa, MObStor, YMDB, MySQL, Oracle, Filer, BigTable, Hadoop, Everest]


Alternatives Matrix

[Table: the systems Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, and Cassandra compared on Elastic, Operability, Availability, Global low latency, Structured access, Updates, Consistency model, and SQL/ACID; the cell values are not present in the transcript]


Further Reading

  • Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan. Efficient Bulk Insertion into a Distributed Ordered Table. SIGMOD 2008.

  • Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni. PNUTS: Yahoo!'s Hosted Data Serving Platform. VLDB 2008.

  • Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan. Asynchronous View Maintenance for VLSD Databases. SIGMOD 2009 (to appear).

  • Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava. Cloud Storage Design in a PNUTShell. In Beautiful Data, O’Reilly Media, 2009 (to appear).


QUESTIONS?