APWeb’2011 Tutorial - PowerPoint PPT Presentation

slide1 n.
Skip this Video
Loading SlideShow in 5 Seconds..
APWeb’2011 Tutorial PowerPoint Presentation
Download Presentation
APWeb’2011 Tutorial

play fullscreen
1 / 99
Download Presentation
APWeb’2011 Tutorial
Download Presentation

APWeb’2011 Tutorial

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Cloud Computing and Scalable Data Management Jiaheng Lu and Sai Wu Renmin Universtiy of China , National University of Singapore APWeb’2011 Tutorial

  2. Outline • Cloud computing • Map/Reduce, Bigtable and PNUT • CAP Theorem and datalog • Data indexing in the clouds • Conclusion and open issues Part 1 Part 2 APWeb 2011

  3. Cloud computing CLOUD COMPUTING

  4. Why we use cloud computing?

  5. Why we use cloud computing? Case 1: Write a file Save Computer down, file is lost Files are always stored in cloud, never lost

  6. Why we use cloud computing? Case 2: Use IE --- download, install, use Use QQ --- download, install, use Use C++ --- download, install, use …… Get the serve from the cloud

  7. What is cloud and cloud computing? Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a serve over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

  8. Characteristics of cloud computing • Virtual. software, databases, Web servers, operating systems, storage and networking as virtual servers. • On demand. add and subtract processors, memory, network bandwidth, storage.

  9. Types of cloud service SaaS Software as a Service PaaS Platform as a Service IaaS Infrastructure as a Service

  10. Outline • Cloud computing • Map/Reduce, Bigtable and PNUT • CAP Theorem and datalog • Data indexing in the clouds • Conclusion and open issues Part 1 Part 2 APWeb 2011

  11. Introduction to MapReduce

  12. MapReduce Programming Model • Inspired from map and reduce operations commonly used in functional programming languages like Lisp. • Users implement interface of two primary methods: • 1. Map: (key1, val1) → (key2, val2) • 2. Reduce: (key2, [val2]) → [val3]

  13. Map operation • Map, a pure function, written by the user, takes an input key/value pair and produces a set of intermediate key/value pairs. • e.g. (doc—id, doc-content) • Draw an analogy to SQL, map can be visualized as group-by clause of an aggregate query.

  14. Reduce operation On completion of map phase, all the intermediate values for a given output key are combined together into a list and given to a reducer. Can be visualized as aggregate function (e.g., average) that is computed over all the rows with the same group-by attribute.

  15. Pseudo-code map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_value: EmitIntermediate(w, "1"); reduce(String output_key, Iterator intermediate_values): // output_key: a word // output_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(AsString(result));

  16. MapReduce: Execution overview

  17. MapReduce: Example

  18. Research works • MapReduce is slower than Parallel Databases by a factor 3.1 to 6.5 [1][2] • By adopting an hybrid architecture, the performance of MapReduce can approach to Parallel Databases [3] • MapReduce is an efficient tool [4] • Numerous discussions in MapReduce community … [1] A comparison of approaches to large-scale data analysis. SIGMOD 2009 [2] MapReduce and Parallel DBMSs: friends or foes? CACM 2010 [3] HadoopDB: an architectural hybrid of MapReduce and DBMS techonologies for analytical workloads. VLDB 2009 [4] MapReduce: a flexible data processing tool. CACM 2010

  19. An in-depth study (1) • A performance study of MapReduce (Hadoop) on a 100-node cluster of Amazon EC2 with various levels of parallelism. [5] [5] Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010) • Scheduling • I/O modes • Record Parsing • Indexing

  20. An in-depth study (2) • By carefully tuning these factors, the overall performance of Hadoop can be comparable to that of parallel database systems. • Possible to build a cloud data processing system that is both elastically scalable and efficient [5] Dawei Jiang, Beng Chin Ooi, Lei Shi, Sai Wu: The Performance of MapReduce: An In-depth Study. PVLDB 3(1): 472-483 (2010)

  21. Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden: Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. ICDE 2010:657-668

  22. Problem proposed • Problem: Node failures on distributed database system • Faults may be common on large clusters of machines • Existing solution: Aborting and (possibly) restarting the query • A reasonable approach for short OLTP-style queries • But it’s time-wasting for analytical (OLAP) warehouse queries

  23. MapReduce-style fault tolerance for a SQL database • Break up a SQL query (or program) into smaller, parallelizable subqueries • Adapt the load balancing strategy of greedy assignment of work

  24. Osprey • A middleware implementation of MapReduce-style fault tolerance for a SQL database

  25. SQL procedure Query Query Transformer Subquery Subquery Subquery Subquery Subquery Subquery Subquery Subquery Subquery PWQ A PWQ B PWQ :partition work queue PWQ C Scheduler Execution Result Result Merger

  26. Mapreduce Online Evaluation Plateform

  27. Construction School Of Information Renmin University Of China

  28. Cloud data management

  29. Four new principles in Cloud-based data management

  30. New principle in cloud dta management(1) • Partition Everythingand key-value storage • 切分万物以治之 • 1st normal form cannot be satisfied

  31. New principle in cloud dta management (2) • Embrace Inconsistency • 容不同乃成大同 • ACID properties are not satisfied

  32. New principle in cloud dta management (3) • Backup everything with three copies • 狡兔三窟方高枕 • Guarantee 99.999999% safety

  33. New principle in cloud dta management (4) • Scalable and high performance • 运筹沧海量兼容

  34. Cloud data management • 切分万物以治之 • Partition Everything • 容不同乃成大同 • Embrace Inconsistency • 狡兔三窟方高枕 • Backup data with three copies • 运筹沧海量兼容 • Scalable and high performance

  35. BigTable: A Distributed Storage System for Structured Data

  36. Introduction BigTable is a distributed storage system for managing structured data. Designed to scale to a very large size Petabytes of data across thousands of servers Used for many Google projects Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, … Flexible, high-performance solution for all of Google’s products

  37. Why not just use commercial DB? Scale is too large for most commercial databases Even if it weren’t, cost would be very high Building internally means system can be applied across many projects for low incremental cost Low-level storage optimizations help performance significantly Much harder to do when running on top of a database layer

  38. Goals Want asynchronous processes to be continuously updating different pieces of data Want access to most current data at any time Need to support: Very high read/write rates (millions of ops per second) Efficient scans over all or interesting subsets of data Efficient joins of large one-to-one and one-to-many datasets Often want to examine data changes over time E.g. Contents of a web page over multiple crawls

  39. BigTable Distributed multi-level map Fault-tolerant, persistent Scalable Thousands of servers Terabytes of in-memory data Petabyte of disk-based data Millions of reads/writes per second, efficient scans Self-managing Servers can be added/removed dynamically Servers adjust to load imbalance

  40. Basic Data Model A BigTable is a sparse, distributed persistent multi-dimensional sorted map (row, column, timestamp) -> cell contents Good match for most Google applications

  41. WebTable Example Want to keep copy of a large collection of web pages and related information Use URLs as row keys Various aspects of web page as column names Store contents of web pages in the contents: column under the timestamps when they were fetched.

  42. Rows Name is an arbitrary string Access to data in a row is atomic Row creation is implicit upon storing data Rows ordered lexicographically Rows close together lexicographically usually on one or a small number of machines

  43. Rows (cont.) Reads of short row ranges are efficient and typically require communication with a small number of machines. Can exploit this property by selecting row keys so they get good locality for data access. Example: math.gatech.edu, math.uga.edu, phys.gatech.edu, phys.uga.edu VS edu.gatech.math, edu.gatech.phys, edu.uga.math, edu.uga.phys

  44. Columns Columns have two-level name structure: family:optional_qualifier Column family Unit of access control Has associated type information Qualifier gives unbounded columns Additional levels of indexing, if desired

  45. Timestamps Used to store different versions of data in a cell New writes default to current time, but timestamps for writes can also be set explicitly by clients Lookup options: “Return most recent K values” “Return all values in timestamp range (or all values)” Column families can be marked w/ attributes: “Only retain most recent K values in a cell” “Keep values until they are older than K seconds”

  46. Implementation – Three Major Components Library linked into every client One master server Responsible for: Assigning tablets to tablet servers Detecting addition and expiration of tablet servers Balancing tablet-server load Garbage collection Many tablet servers Tablet servers handle read and write requests to its table Splits tablets that have grown too large

  47. Tablets Large tables broken into tablets at row boundaries Tablet holds contiguous range of rows Clients can often choose row keys to achieve locality Aim for ~100MB to 200MB of data per tablet Serving machine responsible for ~100 tablets Fast recovery: 100 machines each pick up 1 tablet for failed machine Fine-grained load balancing: Migrate tablets away from overloaded machine Master makes load-balancing decisions