APWeb’2011 Tutorial

Cloud Computing and Scalable Data Management Jiaheng Lu and Sai Wu Renmin Universtiy of China , National University of Singapore APWeb’2011 Tutorial

Outline • Cloud computing • Map/Reduce, Bigtable and PNUT • CAP Theorem and datalog • Data indexing in the clouds • Conclusion and open issues Part 1 Part 2 APWeb 2011

Cloud computing CLOUD COMPUTING

Why we use cloud computing?

Why we use cloud computing? Case 1: Write a file Save Computer down, file is lost Files are always stored in cloud, never lost

Why we use cloud computing? Case 2: Use IE --- download, install, use Use QQ --- download, install, use Use C++ --- download, install, use …… Get the serve from the cloud

What is cloud and cloud computing? Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a serve over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

Characteristics of cloud computing • Virtual. software, databases, Web servers, operating systems, storage and networking as virtual servers. • On demand. add and subtract processors, memory, network bandwidth, storage.

Types of cloud service SaaS Software as a Service PaaS Platform as a Service IaaS Infrastructure as a Service

Outline • Cloud computing • Map/Reduce, Bigtable and PNUT • CAP Theorem and datalog • Data indexing in the clouds • Conclusion and open issues Part 1 Part 2 APWeb 2011

Introduction to MapReduce

MapReduce Programming Model • Inspired from map and reduce operations commonly used in functional programming languages like Lisp. • Users implement interface of two primary methods: • 1. Map: (key1, val1) → (key2, val2) • 2. Reduce: (key2, [val2]) → [val3]

Map operation • Map, a pure function, written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs. • e.g. (doc—id, doc-content) • Draw an analogy to SQL, map can be visualized as group-by clause of an aggregate query.

Reduce operation On completion of map phase, all the intermediate values for a given output key are combined together into a list and given to a reducer. Can be visualized as aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.

Pseudo-code map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_value: EmitIntermediate(w, "1"); reduce(String output_key, Iterator intermediate_values): // output_key: a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(AsString(result));

MapReduce: Execution overview

MapReduce: Example

Research works • MapReduce is slower than Parallel Databases by a factor 3.1 to 6.5 [1][2] • By adopting an hybrid architecture, the performance of MapReduce can approach to Parallel Databases [3] • MapReduce is an efficient tool [4] • Numerous discussions in MapReduce community … [1] A comparison of approaches to large-scale data analysis. SIGMOD 2009 [2] MapReduce and Parallel DBMSs: friends or foes? CACM 2010 [3] HadoopDB: an architectural hybrid of MapReduce and DBMS techonologies for analytical workloads. VLDB 2009 [4] MapReduce: a flexible data processing tool. CACM 2010

An in-depth study (1) • A performance study of MapReduce (Hadoop) on a 100-node cluster of Amazon EC2 with various levels of parallelism. [5] [5] Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010) • Scheduling • I/O modes • Record Parsing • Indexing

An in-depth study (2) • By carefully tuning these factors, the overall performance of Hadoop can be comparable to that of parallel database systems. • Possible to build a cloud data processing system that is both elastically scalable and efficient [5] Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010)

Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden: Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. ICDE 2010:657-668

Problem proposed • Problem: Node failures on distributed database system • Faults may be common on large clusters of machines • Existing solution: Aborting and (possibly) restarting the query • A reasonable approach for short OLTP-style queries • But it’s time-wasting for analytical (OLAP) warehouse queries

MapReduce-style fault tolerance for a SQL database • Break up a SQL query (or program) into smaller, parallelizable subqueries • Adapt the load balancing strategy of greedy assignment of work

Osprey • A middleware implementation of MapReduce-style fault tolerance for a SQL database

SQL procedure Query Query Transformer Subquery Subquery Subquery Subquery Subquery Subquery Subquery Subquery Subquery PWQ A PWQ B PWQ :partition work queue PWQ C Scheduler Execution Result Result Merger

Mapreduce Online Evaluation Plateform

Construction School Of Information Renmin University Of China

Cloud data management

Four new principles in Cloud-based data management

New principle in cloud dta management（1） • Partition Everythingand key-value storage • 切分万物以治之 • 1st normal form cannot be satisfied

New principle in cloud dta management （2） • Embrace Inconsistency • 容不同乃成大同 • ACID properties are not satisfied

New principle in cloud dta management （3） • Backup everything with three copies • 狡兔三窟方高枕 • Guarantee 99.999999% safety

New principle in cloud dta management （4） • Scalable and high performance • 运筹沧海量兼容

Cloud data management • 切分万物以治之 • Partition Everything • 容不同乃成大同 • Embrace Inconsistency • 狡兔三窟方高枕 • Backup data with three copies • 运筹沧海量兼容 • Scalable and high performance

BigTable: A Distributed Storage System for Structured Data

Introduction BigTable is a distributed storage system for managing structured data. Designed to scale to a very large size Petabytes of data across thousands of servers Used for many Google projects Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, … Flexible, high-performance solution for all of Google’s products

Why not just use commercial DB? Scale is too large for most commercial databases Even if it weren’t, cost would be very high Building internally means system can be applied across many projects for low incremental cost Low-level storage optimizations help performance significantly Much harder to do when running on top of a database layer

Goals Want asynchronous processes to be continuously updating different pieces of data Want access to most current data at any time Need to support: Very high read/write rates (millions of ops per second) Efficient scans over all or interesting subsets of data Efficient joins of large one-to-one and one-to-many datasets Often want to examine data changes over time E.g. Contents of a web page over multiple crawls

BigTable Distributed multi-level map Fault-tolerant, persistent Scalable Thousands of servers Terabytes of in-memory data Petabyte of disk-based data Millions of reads/writes per second, efficient scans Self-managing Servers can be added/removed dynamically Servers adjust to load imbalance

Basic Data Model A BigTable is a sparse, distributed persistent multi-dimensional sorted map (row, column, timestamp) -> cell contents Good match for most Google applications

WebTable Example Want to keep copy of a large collection of web pages and related information Use URLs as row keys Various aspects of web page as column names Store contents of web pages in the contents: column under the timestamps when they were fetched.

Rows Name is an arbitrary string Access to data in a row is atomic Row creation is implicit upon storing data Rows ordered lexicographically Rows close together lexicographically usually on one or a small number of machines

Rows (cont.) Reads of short row ranges are efficient and typically require communication with a small number of machines. Can exploit this property by selecting row keys so they get good locality for data access. Example: math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu VS edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys

Columns Columns have two-level name structure: family:optional_qualifier Column family Unit of access control Has associated type information Qualifier gives unbounded columns Additional levels of indexing, if desired

Timestamps Used to store different versions of data in a cell New writes default to current time, but timestamps for writes can also be set explicitly by clients Lookup options: “Return most recent K values” “Return all values in timestamp range (or all values)” Column families can be marked w/ attributes: “Only retain most recent K values in a cell” “Keep values until they are older than K seconds”

Implementation – Three Major Components Library linked into every client One master server Responsible for: Assigning tablets to tablet servers Detecting addition and expiration of tablet servers Balancing tablet-server load Garbage collection Many tablet servers Tablet servers handle read and write requests to its table Splits tablets that have grown too large

Tablets Large tables broken into tablets at row boundaries Tablet holds contiguous range of rows Clients can often choose row keys to achieve locality Aim for ~100MB to 200MB of data per tablet Serving machine responsible for ~100 tablets Fast recovery: 100 machines each pick up 1 tablet for failed machine Fine-grained load balancing: Migrate tablets away from overloaded machine Master makes load-balancing decisions

APWeb’2011 Tutorial

APWeb’2011 Tutorial

Presentation Transcript

UCL Tutorial on: Deep Belief Nets (An updated and extended version of my 2007 NIPS tutorial)

Design and Analysis of Large Scale Log Studies A CHI 2011 course v11

Short M ATLAB Tutorial

Rec Trac Recreation Tracking Software

Scientific Computing (a tutorial)

EVERYTHING NONPUBLIC

Tutorial 3 OAI and OAI-PMH for absolute beginners a non-technical introduction

Clementine Tutorial

SPSS Tutorial

Introduction to the Semantic Web

Practical Online Retrieval Evaluation SIGIR 2011 Tutorial

Tutorial by:mozafar bag mohammadi

2011

Java GUI La librería Swing //java.sun/docs/books/tutorial/uiswing

BigSim Tutorial

CORBA Component Model Tutorial

Kerberos v5 Tutorial

Using this Tutorial

CORBA Component Model Tutorial

Linux-HA Release 2 Tutorial

Tutorial 3 OAI and OAI-PMH for absolute beginners a non-technical introduction

Managing Information Extraction SIGMOD 2006 Tutorial