
Introduction to cloud computing

Introduction to cloud computing. Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net. Advanced MapReduce Application Reference: Jimmy Lin http://www.umiacs.umd.edu/~jimmylin/cloud-2008-Fall/schedule.html. Managing Dependencies.


Presentation Transcript


  1. Introduction to cloud computing Jiaheng Lu Department of Computer Science Renmin University of China www.jiahenglu.net

  2. Advanced MapReduce Application • Reference: Jimmy Lin • http://www.umiacs.umd.edu/~jimmylin/cloud-2008-Fall/schedule.html

  3. Managing Dependencies • Remember: Mappers run in isolation • You have no idea in what order the mappers run • You have no idea on what node the mappers run • You have no idea when each mapper finishes • Tools for synchronization: • Ability to hold state in reducer across multiple key-value pairs • Sorting function for keys • Partitioner • Cleverly-constructed data structures

  4. Motivating Example • Term co-occurrence matrix for a text collection • M = N x N matrix (N = vocabulary size) • Mij: number of times i and j co-occur in some context (for concreteness, let’s say context = sentence) • Why? • Distributional profiles as a way of measuring semantic distance • Semantic distance useful for many language processing tasks e.g., Mohammad and Hirst (EMNLP, 2006)

  5. MapReduce: Large Counting Problems • Term co-occurrence matrix for a text collection = specific instance of a large counting problem • A large event space (number of terms) • A large number of events (the collection itself) • Goal: keep track of interesting statistics about the events • Basic approach • Mappers generate partial counts • Reducers aggregate partial counts

  6. First Try: “Pairs” • Each mapper takes a sentence: • Generate all co-occurring term pairs • For all pairs, emit (a, b) → count • Reducers sum up counts associated with these pairs • Use combiners!
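
The "pairs" approach can be sketched as a small in-memory simulation. This is plain Python rather than Hadoop code; the function names (`pairs_mapper`, `pairs_reducer`, `run_pairs`) are hypothetical, and it assumes that two distinct terms co-occur whenever they appear in the same sentence, counting ordered pairs.

```python
from collections import defaultdict
from itertools import permutations


def pairs_mapper(sentence):
    """Emit ((a, b), 1) for every ordered pair of distinct terms in one sentence."""
    terms = sentence.lower().split()
    for a, b in permutations(terms, 2):
        if a != b:  # skip a term paired with a repeat of itself
            yield (a, b), 1


def pairs_reducer(pair, counts):
    """Sum the partial counts collected for one pair."""
    return pair, sum(counts)


def run_pairs(sentences):
    """Simulate map, shuffle/sort, and reduce in a single process."""
    grouped = defaultdict(list)  # stands in for the shuffle-and-sort phase
    for sentence in sentences:
        for pair, count in pairs_mapper(sentence):
            grouped[pair].append(count)
    return dict(pairs_reducer(pair, counts) for pair, counts in sorted(grouped.items()))


counts = run_pairs(["a b c", "a b"])
```

Because the reducer is a plain sum, the same function could also serve as a combiner, which is why the slide recommends using one.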

  7. “Pairs” Analysis • Advantages • Easy to implement, easy to understand • Disadvantages • Lots of pairs to sort and shuffle around (upper bound?)

  8. Another Try: “Stripes” • Idea: group together pairs into an associative array • (a, b) → 1, (a, c) → 2, (a, d) → 5, (a, e) → 3, (a, f) → 2 becomes a → { b: 1, c: 2, d: 5, e: 3, f: 2 } • Each mapper takes a sentence: • Generate all co-occurring term pairs

  9. Another Try: “Stripes” • Reducers perform element-wise sum of associative arrays • a → { b: 1, d: 5, e: 3 } + a → { b: 1, c: 2, d: 2, f: 2 } = a → { b: 2, c: 2, d: 7, e: 3, f: 2 }
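
A minimal sketch of the "stripes" idea, again as plain Python rather than Hadoop code. The names `stripes_mapper` and `stripes_reducer` are hypothetical, and the mapper assumes sentence-level co-occurrence of distinct terms. The reducer is the element-wise sum shown on the slide.

```python
from collections import Counter


def stripes_mapper(sentence):
    """Emit one (term, stripe) per term occurrence; the stripe counts its neighbours."""
    terms = sentence.lower().split()
    for i, a in enumerate(terms):
        stripe = Counter(b for j, b in enumerate(terms) if j != i and b != a)
        if stripe:
            yield a, stripe


def stripes_reducer(term, stripes):
    """Element-wise sum of all associative arrays received for one term."""
    total = Counter()
    for stripe in stripes:
        total.update(stripe)  # Counter.update adds counts key by key
    return term, dict(total)


# The two partial stripes for term "a" from the slide.
term, merged = stripes_reducer(
    "a",
    [{"b": 1, "d": 5, "e": 3}, {"b": 1, "c": 2, "d": 2, "f": 2}],
)
```

Here `merged` is the stripe `{ b: 2, c: 2, d: 7, e: 3, f: 2 }` from the slide. Each key-value pair now carries a whole dictionary, which is the "more heavyweight" object the analysis slide refers to.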

  10. “Stripes” Analysis • Advantages • Far less sorting and shuffling of key-value pairs • Can make better use of combiners • Disadvantages • More difficult to implement • Underlying object is more heavyweight • Fundamental limitation in terms of size of event space

  11. Cluster size: 38 cores • Data source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)

  12. Conditional Probabilities • How do we compute conditional probabilities from counts? • Why do we want to do this? • How do we do this with MapReduce?

  13. P(B|A): “Pairs” • (a, *) → 32 (reducer holds this value in memory) • (a, b1) → 3, (a, b2) → 12, (a, b3) → 7, (a, b4) → 1, … become (a, b1) → 3 / 32, (a, b2) → 12 / 32, (a, b3) → 7 / 32, (a, b4) → 1 / 32, … • For this to work: • Must emit extra (a, *) for every bn in mapper • Must make sure all a’s get sent to same reducer (use Partitioner) • Must make sure (a, *) comes first (define sort order)
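
The three requirements above can be sketched in a single-process simulation. This is not Hadoop code: `partition`, `sort_key`, and `reducer` are hypothetical stand-ins for a Partitioner, a key sort order, and a stateful reducer. The example counts follow the slide; the `b5` entry is a made-up filler standing in for the elided "…" so that the marginal comes to 32.

```python
from collections import defaultdict
from itertools import groupby

MARGINAL = "*"  # assumed not to occur as a real term


def mapper(pair_counts):
    """For each (a, b) count, also emit an extra (a, *) record."""
    for (a, b), n in pair_counts:
        yield (a, MARGINAL), n
        yield (a, b), n


def partition(key, num_reducers):
    """Route on the left term only, so every (a, ...) key reaches the same reducer."""
    a, _ = key
    return sum(a.encode("utf-8")) % num_reducers


def sort_key(key):
    """Order keys so that (a, *) precedes every (a, b)."""
    a, b = key
    return (a, b != MARGINAL, b)  # False sorts before True


def reducer(records):
    """records: (key, value) pairs of one partition, already sorted with sort_key."""
    marginal = 0
    for (a, b), group in groupby(records, key=lambda kv: kv[0]):
        total = sum(n for _, n in group)
        if b == MARGINAL:
            marginal = total  # state held across the following keys
        else:
            yield (a, b), total / marginal


def run(pair_counts, num_reducers=2):
    partitions = defaultdict(list)
    for key, n in mapper(pair_counts):
        partitions[partition(key, num_reducers)].append((key, n))
    result = {}
    for records in partitions.values():
        records.sort(key=lambda kv: sort_key(kv[0]))
        result.update(reducer(records))
    return result


example = [(("a", "b1"), 3), (("a", "b2"), 12), (("a", "b3"), 7),
           (("a", "b4"), 1), (("a", "b5"), 9)]  # b5 is hypothetical filler
probabilities = run(example)
```

The reducer never buffers the individual (a, b) records; it only remembers the current marginal, which is the point of turning synchronization into an ordering problem.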

  14. P(B|A): “Stripes” • a → { b1: 3, b2: 12, b3: 7, b4: 1, … } • Easy! • One pass to compute (a, *) • Another pass to directly compute P(B|A)
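
With stripes, the two passes happen inside one reducer call, because the whole stripe for a term is already in hand. A minimal sketch in plain Python; the function name is hypothetical and the `rest` entry is a made-up filler for the elided "…":

```python
def conditional_probabilities(stripe):
    """Turn one stripe of co-occurrence counts into P(b | a) for each b."""
    total = sum(stripe.values())  # first pass: the (a, *) marginal
    return {b: n / total for b, n in stripe.items()}  # second pass: normalize


stripe_a = {"b1": 3, "b2": 12, "b3": 7, "b4": 1, "rest": 9}  # "rest" is hypothetical
p_given_a = conditional_probabilities(stripe_a)
```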

  15. Synchronization in Hadoop • Approach 1: turn synchronization into an ordering problem • Sort keys into correct order of computation • Partition key space so that each reducer gets the appropriate set of partial results • Hold state in reducer across multiple key-value pairs to perform computation • Approach 2: construct data structures that “bring the pieces together” • Each reducer receives all the data it needs to complete the computation

  16. Issues and Tradeoffs • Number of key-value pairs • Object creation overhead • Time for sorting and shuffling pairs across the network • Size of each key-value pair • De/serialization overhead • Combiners make a big difference! • RAM vs. disk and network • Arrange data to maximize opportunities to aggregate partial results

  17. Data Types in Hadoop • Writable: defines a de/serialization protocol. Every data type in Hadoop is a Writable. • WritableComparable: defines a sort order. All keys must be of this type (but not values). • Concrete classes for different data types: IntWritable, LongWritable, Text, …

  18. Complex Data Types in Hadoop • How do you implement complex data types? • The easiest way: • Encode it as Text, e.g., (a, b) = “a:b” • Use regular expressions to parse and extract data • The hard way: • Define a custom implementation of WritableComparable • Must implement: readFields, write, compareTo • Computationally efficient, but slow for rapid prototyping
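
The two options can be contrasted in a small sketch. This is a Python analogy only, not the Hadoop Java API: the `TermPair` class and its `write` / `read_fields` methods are hypothetical stand-ins that mirror the roles of `write`, `readFields`, and `compareTo`, and the length-prefixed byte layout is an assumption chosen for illustration.

```python
import io
import re
import struct
from functools import total_ordering


# Easy way: encode the pair as text and parse it back with a regular expression.
def encode_text(a, b):
    return f"{a}:{b}"


def decode_text(text):
    match = re.fullmatch(r"([^:]+):([^:]+)", text)
    if match is None:
        raise ValueError(f"not an encoded pair: {text!r}")
    return match.group(1), match.group(2)


# Hard way: a custom type with its own binary serialization and sort order.
@total_ordering
class TermPair:
    def __init__(self, left="", right=""):
        self.left, self.right = left, right

    def write(self, out):
        """Serialize as two length-prefixed UTF-8 strings."""
        for term in (self.left, self.right):
            data = term.encode("utf-8")
            out.write(struct.pack(">I", len(data)))
            out.write(data)

    def read_fields(self, inp):
        """Deserialize in place, reading the fields in the order write() produced them."""
        terms = []
        for _ in range(2):
            (length,) = struct.unpack(">I", inp.read(4))
            terms.append(inp.read(length).decode("utf-8"))
        self.left, self.right = terms

    def _key(self):
        return (self.left, self.right)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):  # plays the role of compareTo
        return self._key() < other._key()


buffer = io.BytesIO()
TermPair("a", "b").write(buffer)
buffer.seek(0)
restored = TermPair()
restored.read_fields(buffer)
```

The text encoding is quicker to write but pays for string parsing on every record; the binary form avoids that cost at the price of more code.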

  19. Yahoo! PNUTS and Hadoop

  20. Yahoo! Cloud Stack • EDGE (Horizontal Cloud Services): YCS, YCPI, Brooklyn, … • WEB (Horizontal Cloud Services): VM/OS, yApache, PHP, App Engine • APP (Horizontal Cloud Services): VM/OS, Serving Grid, … • STORAGE (Horizontal Cloud Services): Sherpa, MOBStor, … • BATCH (Horizontal Cloud Services): Hadoop, … • Across all layers: Provisioning (Self-serve), Monitoring/Metering/Security, Data Highway

  21. Yahoo! CCDI Thrust Areas • Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services • Multiple hosts may be multiplexed onto the same physical machine • Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities • Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval • Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP • (Slide annotation: “Rest of today’s talk”)

  22. Web Data Management • Structured record storage (PNUTS/Sherpa): CRUD; point lookups and short scans; index organized table and random I/Os; $ per latency • Large data analysis (Hadoop): scan oriented workloads; focus on sequential disk I/O; $ per cpu cycle • Blob storage (SAN/NAS): object retrieval and streaming; scalable file storage; $ per GB

  23. The World Has Changed • Web serving applications need: • Scalability! • Preferably elastic • Flexible schemas • Geographic distribution • High availability • Reliable storage • Web serving applications can do without: • Complicated queries • Strong transactions

  24. PNUTS / SHERPA To Help You Scale Your Mountains of Data

  25. Yahoo! Serving Storage Problem • Small records – 100KB or less • Structured records – lots of fields, evolving • Extreme data scale - Tens of TB • Extreme request scale - Tens of thousands of requests/sec • Low latency globally - 20+ datacenters worldwide • High Availability - outages cost $millions • Variable usage patterns - as applications and users change

  26. What is PNUTS/Sherpa? • Parallel database • Structured, flexible schema: CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … ) • Geographic replication • Hosted, managed infrastructure • Figure: example records, shown several times over in the diagram: A 42342 E; B 42521 W; C 66354 W; D 12352 E; E 75656 C; F 15677 E

  27. What Will It Become? • Indexes and views • Figure: the same example records, shown three times over: A 42342 E; B 42521 W; C 66354 W; D 12352 E; E 75656 C; F 15677 E

  28. Design Goals • Consistency: per-record guarantees; timeline model; option to relax if needed • Multiple access paths: hash table, ordered table; primary, secondary access • Hosted service: applications plug and play; share operational cost • Scalability: thousands of machines; easy to add capacity; restrict query language to avoid costly queries • Geographic replication: asynchronous replication around the globe; low-latency local access • High availability and fault tolerance: automatically recover from failures; serve reads and writes despite failures

  29. Technology Elements • Applications • Tabular API, PNUTS API • PNUTS: query planning and execution; index maintenance • Distributed infrastructure for tabular data: data partitioning; update consistency; replication • YCA: authorization • YDOT FS: ordered tables • YDHT FS: hash tables • Tribble: pub/sub messaging • Zookeeper: consistency service

  30. Data Manipulation • Per-record operations: Get, Set, Delete • Multi-record operations: Multiget, Scan, Getrange • Web service (RESTful) API

  31. Tablets: Hash Table • Columns: Name, Description, Price • Tablet 0x0000 to 0x2AF3: Grape, “Grapes are good to eat”, $12; Lime, “Limes are green”, $9; Apple, “Apple is wisdom”, $1; Strawberry, “Strawberry shortcake”, $900 • Tablet 0x2AF3 to 0x911F: Orange, “Arrgh! Don’t get scurvy!”, $2; Avocado, “But at what price?”, $3; Lemon, “How much did you pay for this lemon?”, $1; Tomato, “Is this a vegetable?”, $14 • Tablet 0x911F to 0xFFFF: Banana, “The perfect fruit”, $2; Kiwi, “New Zealand”, $8
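
A hedged sketch of how a key might be mapped to one of the three hash tablets in the figure. The tablet boundaries come from the slide; everything else is an assumption: the slides do not say which hash function is used (MD5 truncated to 16 bits is used here purely for illustration), nor whether a boundary value belongs to the lower or upper tablet.

```python
import bisect
import hashlib

# Upper bounds of the three tablets in the figure. This sketch assumes each
# tablet covers hash values up to and including its bound.
TABLET_UPPER_BOUNDS = [0x2AF3, 0x911F, 0xFFFF]


def key_hash(key):
    """Hypothetical 16-bit hash in the range 0x0000..0xFFFF."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(hash_value):
    """Index of the tablet whose interval contains hash_value."""
    return bisect.bisect_left(TABLET_UPPER_BOUNDS, hash_value)


def tablet_for_key(key):
    return tablet_for_hash(key_hash(key))


grape_tablet = tablet_for_key("Grape")
```

Because the hash function here is made up, the tablet chosen for a given name will generally not match the placement drawn on the slide; only the interval lookup is the point of the sketch.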

  32. Tablets: Ordered Table • Columns: Name, Description, Price • Tablet A to H: Apple, “Apple is wisdom”, $1; Avocado, “But at what price?”, $3; Banana, “The perfect fruit”, $2; Grape, “Grapes are good to eat”, $12 • Tablet H to Q: Kiwi, “New Zealand”, $8; Lemon, “How much did you pay for this lemon?”, $1; Lime, “Limes are green”, $9; Orange, “Arrgh! Don’t get scurvy!”, $2 • Tablet Q to Z: Strawberry, “Strawberry shortcake”, $900; Tomato, “Is this a vegetable?”, $14

  33. Flexible Schema

  34. Detailed Architecture • Figure: components of the local region (Clients, REST API, Routers, Tablet Controller, Storage units, Tribble) and the remote regions

  35. Tablet Splitting and Balancing • Each storage unit has many tablets (horizontal partitions of the table) • Storage unit may become a hotspot • Tablets may grow over time • Overfull tablets split • Shed load by moving tablets to other servers

  36. QUERY PROCESSING

  37. Accessing Data • Figure: in steps 1 to 4, a “Get key k” request is forwarded to one of the storage units (SU) and the record for key k is returned

  38. Bulk Read • Figure: in step 1, the key set {k1, k2, … kn} is sent to a scatter/gather server; in step 2, that server issues Get k1, Get k2, Get k3 to the storage units (SU)

  39. Range Queries in YDOT • Clustered, ordered retrieval of records • Router interval map: keys below Canteloupe → Storage unit 1; Canteloupe up to Lime → Storage unit 3; Lime up to Strawberry → Storage unit 2; Strawberry and above → Storage unit 1 • The query Grapefruit…Pear? is split into Grapefruit…Lime? (Storage unit 3) and Lime…Pear? (Storage unit 2) • Storage unit 1: Apple, Avocado, Banana, Blueberry, Strawberry, Tomato, Watermelon • Storage unit 2: Lime, Mango, Orange • Storage unit 3: Canteloupe, Grape, Kiwi, Lemon
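
The router's splitting of a range query can be sketched with a sorted list of split keys. The interval map is read off the slide (spelling as on the slide); the function name `route_range` and the half-open `[start, end)` convention are assumptions made for this sketch.

```python
import bisect

# Interval map from the slide: split keys and the storage unit serving each interval.
SPLIT_KEYS = ["Canteloupe", "Lime", "Strawberry"]
STORAGE_UNITS = [1, 3, 2, 1]  # one more entry than SPLIT_KEYS


def route_range(start, end):
    """Split [start, end) into (sub_start, sub_end, storage_unit) pieces."""
    pieces = []
    i = bisect.bisect_right(SPLIT_KEYS, start)  # interval that contains start
    low = start
    while low < end:
        upper = SPLIT_KEYS[i] if i < len(SPLIT_KEYS) else None
        piece_end = end if upper is None or end <= upper else upper
        pieces.append((low, piece_end, STORAGE_UNITS[i]))
        low = piece_end
        i += 1
    return pieces


pieces = route_range("Grapefruit", "Pear")
```

For the query on the slide, `pieces` holds the two sub-ranges Grapefruit to Lime on storage unit 3 and Lime to Pear on storage unit 2.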

  40. Updates • Figure: in steps 1 to 8, “Write key k” passes through the routers to a storage unit (SU) and on to the message brokers; “Sequence # for key k” and “SUCCESS” are returned

  41. ASYNCHRONOUS REPLICATION AND CONSISTENCY

  42. Asynchronous Replication

  43. Consistency Model • Goal: Make it easier for applications to reason about updates and cope with asynchrony • What happens to a record with primary key “Alice”? • Timeline (Generation 1): record inserted (v. 1), then a series of updates (v. 2 through v. 8), then delete • As the record is updated, copies may get out of sync.

  44. Example: Social Alice • Figure: record timeline for Alice in the West and East regions, with the values ___, Busy, Free, Free

  45. Consistency Model • Read: in general, reads are served using a local copy • Figure: timeline of v. 1 through v. 8 (Generation 1), with copies labeled stale version, current version, and stale version

  46. Consistency Model • Read up-to-date: but application can request and get current version • Figure: timeline of v. 1 through v. 8 (Generation 1), with copies labeled stale version, current version, and stale version

  47. Consistency Model • Read ≥ v.6: or variations such as “read forward” • While copies may lag the master record, every copy goes through the same sequence of changes • Figure: timeline of v. 1 through v. 8 (Generation 1), with copies labeled stale version, current version, and stale version
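
The three kinds of read on the last three slides can be sketched against two replicas of one record. This is an illustration only: the function names, the dictionary layout `{key: (version, value)}`, and the sample versions and values are all assumptions, not the PNUTS API.

```python
def read_local(local_replica, key):
    """Serve the read from the local copy, which may be stale."""
    return local_replica[key]


def read_at_least(replicas, key, min_version):
    """Return the first copy, in the given order, that is at version >= min_version."""
    for replica in replicas:
        version, value = replica[key]
        if version >= min_version:
            return version, value
    raise LookupError(f"no replica has {key!r} at version {min_version} or later")


def read_up_to_date(master_replica, key):
    """Serve the read from the record's master copy, which holds the current version."""
    return master_replica[key]


# Hypothetical state: the West copy lags the East (master) copy.
west = {"Alice": (5, "Busy")}
east = {"Alice": (8, "Free")}

stale = read_local(west, "Alice")
forward = read_at_least([west, east], "Alice", min_version=6)
current = read_up_to_date(east, "Alice")
```

Listing the local replica first in `read_at_least` keeps the read local whenever the local copy is already new enough, and falls back to a more current copy otherwise.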

  48. Consistency Model • Write • Achieved via per-record primary copy protocol (to maximize availability, record masterships are automatically transferred if a site fails) • Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors) • Figure: timeline of v. 1 through v. 8 (Generation 1), with copies labeled stale version, current version, and stale version

  49. Consistency Model • Write if = v.7 → ERROR • Test-and-set writes facilitate per-record transactions • Figure: timeline of v. 1 through v. 8 (Generation 1), with copies labeled stale version, current version, and stale version
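
A test-and-set write can be sketched as a version check performed before the update is applied. The `Record` class, the `VersionMismatch` exception, and the method names are hypothetical; the scenario follows the slide, where the record is at v. 8 and a write conditioned on v. 7 fails.

```python
class VersionMismatch(Exception):
    """Raised when the record is no longer at the version the writer expected."""


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 1  # version assigned when the record is inserted

    def write(self, value):
        """Unconditional write; returns the new version number."""
        self.version += 1
        self.value = value
        return self.version

    def test_and_set_write(self, expected_version, value):
        """Apply the write only if the record is still at expected_version."""
        if self.version != expected_version:
            raise VersionMismatch(
                f"expected v. {expected_version}, record is at v. {self.version}"
            )
        return self.write(value)


record = Record("Free")
for status in ["Busy", "Free", "Busy", "Free", "Busy", "Free", "Busy"]:
    record.write(status)  # seven updates bring the record to v. 8

try:
    record.test_and_set_write(7, "Free")  # the slide's "Write if = v.7"
    outcome = "OK"
except VersionMismatch:
    outcome = "ERROR"
```

A caller that reads a record at some version, computes a new value, and writes it back with that version gets a per-record read-modify-write that fails cleanly if another writer got there first.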

  50. Consistency Techniques • Per-record mastering • Each record is assigned a “master region” • May differ between records • Updates to the record forwarded to the master region • Ensures consistent ordering of updates • Tablet-level mastering • Each tablet is assigned a “master region” • Inserts and deletes of records forwarded to the master region • Master region decides tablet splits • These details are hidden from the application • Except for the latency impact!
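
Per-record mastering can be sketched as follows. This is a toy model under stated assumptions, not the PNUTS implementation: the `Cluster` and `Region` classes are hypothetical, the region that first writes a record is assumed to become its master, and a simple in-order queue stands in for the asynchronous replication channel.

```python
class Region:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (sequence number, value)


class Cluster:
    def __init__(self, region_names):
        self.regions = {name: Region(name) for name in region_names}
        self.master_of = {}  # key -> name of the record's master region
        self.pending = []    # replication messages not yet delivered

    def write(self, origin, key, value):
        """Forward the update to the record's master region, which orders it."""
        master_name = self.master_of.setdefault(key, origin)  # assumption: first writer
        master = self.regions[master_name]
        sequence = master.store.get(key, (0, None))[0] + 1
        master.store[key] = (sequence, value)
        for name in self.regions:
            if name != master_name:
                self.pending.append((name, key, sequence, value))
        return master_name, sequence

    def deliver_all(self):
        """Apply queued updates at the other regions, in the master's order."""
        for name, key, sequence, value in self.pending:
            self.regions[name].store[key] = (sequence, value)
        self.pending.clear()


cluster = Cluster(["east", "west"])
first = cluster.write("east", "Alice", "Busy")
second = cluster.write("west", "Alice", "Free")  # forwarded to the master in "east"
west_before = cluster.regions["west"].store.get("Alice")
cluster.deliver_all()
west_after = cluster.regions["west"].store["Alice"]
```

The write issued in the west still gets its sequence number from the east, which is the latency impact the last bullet mentions.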
