BigTable & MapReduce

BigTable & MapReduce. http://net.pku.edu.cn/~wbia 黄连恩 hle@net.pku.edu.cn School of Information Engineering, Peking University 1 2/16/2014. Google File System. One master, several chunkservers, and several clients; stores large files (GB-TB); a file consists of fixed-size chunks (64 MB each); each chunk is an ordinary Linux file with several replicas.


Presentation Transcript


  1. BigTable & MapReduce http://net.pku.edu.cn/~wbia 黄连恩 hle@net.pku.edu.cn School of Information Engineering, Peking University 12/16/2014

  2. Google File System

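  3. Google File System • One master, several chunkservers, several clients • Stores large files (GB-TB) • A file consists of a number of fixed-size chunks (64 MB each) • Each chunk is an ordinary Linux file and has several replicas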
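The fixed 64 MB chunk size means a byte offset maps to a chunk by integer division. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of GFS.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed chunk size, as stated on the slide


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that holds the given byte offset."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return byte_offset // CHUNK_SIZE


def chunk_count(file_size: int) -> int:
    """Return how many fixed-size chunks a file of file_size bytes occupies."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Ceiling division: a partially filled last chunk still counts as a chunk.
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE
```

Under these assumptions a 1 GB file occupies `chunk_count(1024 ** 3)` chunks, which is 16.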

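  4. Google File System • For each file chunk, 3 replicas are stored on different machines • As long as those 3 machines do not all go down at the same time, the chunk is recoverable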
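The recoverability rule can be written as a small check: a chunk survives while at least one of its three replica machines is still up. The sketch below assumes a random placement policy purely for illustration; the slides do not say how GFS picks machines.

```python
import random

REPLICAS = 3  # number of copies per chunk, as stated on the slide


def place_replicas(machines, rng=random):
    """Pick REPLICAS distinct machines for one chunk (hypothetical random policy)."""
    return set(rng.sample(sorted(machines), REPLICAS))


def is_recoverable(replica_machines, failed_machines):
    """A chunk is recoverable while at least one replica machine is still up."""
    return bool(set(replica_machines) - set(failed_machines))
```

Losing any two of the three machines leaves the chunk recoverable; only the simultaneous loss of all three does not.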

  5. BigTable

  6. Google’s Motivation – Scale! • Scale Problem • Lots of data • Millions of machines • Different projects/applications • Hundreds of millions of users • Storage for (semi-)structured data • No commercial system big enough • Couldn’t afford one if there were • Low-level storage optimization helps performance significantly • Much harder to do when running on top of a database layer

  7. Bigtable • Distributed multi-level map • Fault-tolerant, persistent • Scalable • Thousands of servers • Terabytes of in-memory data • Petabytes of disk-based data • Millions of reads/writes per second, efficient scans • Self-managing • Servers can be added/removed dynamically • Servers adjust to load imbalance

  8. Real Applications

  9. Data Model • a sparse, distributed persistent multi-dimensional sorted map (row, column, timestamp) -> cell contents

  10. Data Model • Rows • Arbitrary string • Access to data in a row is atomic • Ordered lexicographically

  11. Data Model • Column • Two-level name structure: • family:qualifier • Column family is the unit of access control

  12. Data Model • Timestamps • Store different versions of data in a cell • Lookup options • Return most recent K values • Return all values
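The data model on the preceding slides can be pictured as a nested, sorted map. The following is a toy in-memory sketch, not Bigtable's implementation; the class and method names are made up for illustration.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of (row, "family:qualifier", timestamp) -> cell contents."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), kept newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, k=None):
        """Return the most recent k versions, or all versions when k is None."""
        versions = self._rows.get(row, {}).get(column, [])
        return versions[:k] if k is not None else list(versions)

    def rows(self):
        """Return row keys in lexicographic order."""
        return sorted(self._rows)
```

A lookup such as `table.get("com.cnn.www", "contents:", k=2)` would then return the two newest versions of that cell; the row and column names here are only examples.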

  13. Data Model • The row range for a table is dynamically partitioned • Each row range is called a tablet • The tablet is the unit of distribution and load balancing
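As a rough illustration of row-range partitioning, the sketch below cuts a sorted list of row keys into contiguous ranges. It assumes a fixed number of rows per tablet for simplicity; a real system would more plausibly split by data size, and the function name is hypothetical.

```python
def split_into_tablets(sorted_rows, max_rows_per_tablet):
    """Partition sorted row keys into contiguous row ranges (toy tablets)."""
    if max_rows_per_tablet < 1:
        raise ValueError("max_rows_per_tablet must be at least 1")
    return [
        sorted_rows[i:i + max_rows_per_tablet]
        for i in range(0, len(sorted_rows), max_rows_per_tablet)
    ]
```

Because the rows are sorted, every tablet holds a contiguous key range, which is what makes range scans and locality by row key possible.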

  14. APIs • Metadata operations • Create/delete tables, column families, change metadata • Writes • Set(): write cells in a row • DeleteCells(): delete cells in a row • DeleteRow(): delete all cells in a row • Reads • Scanner: read arbitrary cells in a bigtable • Each row read is atomic • Can restrict returned rows to a particular range • Can ask for just data from 1 row, all rows, etc. • Can ask for all columns, just certain column families, or specific columns
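To make the write and read operations concrete, here is a toy Python analogue of Set(), DeleteCells(), DeleteRow() and a Scanner. The method names and signatures are assumptions for illustration, not the actual Bigtable client API, and timestamps are left out to keep the sketch short.

```python
class ToyTable:
    """In-memory stand-in for a table: row key -> {"family:qualifier": value}."""

    def __init__(self):
        self._rows = {}

    def set(self, row, cells):
        """Write cells in a row."""
        self._rows.setdefault(row, {}).update(cells)

    def delete_cells(self, row, columns):
        """Delete the named cells in a row, ignoring cells that do not exist."""
        cells = self._rows.get(row, {})
        for column in columns:
            cells.pop(column, None)

    def delete_row(self, row):
        """Delete all cells in a row."""
        self._rows.pop(row, None)

    def scan(self, start_row=None, end_row=None, families=None):
        """Yield (row, cells) in row order for start_row <= row < end_row."""
        for row in sorted(self._rows):
            if start_row is not None and row < start_row:
                continue
            if end_row is not None and row >= end_row:
                break
            cells = self._rows[row]
            if families is not None:
                # Keep only columns whose family part is requested.
                cells = {c: v for c, v in cells.items()
                         if c.split(":", 1)[0] in families}
            yield row, dict(cells)
```

The `scan` arguments mirror the restrictions listed on the slide: a row range, and optionally a subset of column families.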

  15. Typical Cluster Shared pool of machines that also run other distributed applications

  16. Building Blocks • Google File System (GFS) • stores persistent data (SSTable file format) • Scheduler • schedules jobs onto machines • Chubby • Lock service: distributed lock manager • master election, location bootstrapping • MapReduce (optional) • Data processing • Read/write Bigtable data

  17. Chubby • {lock/file/name} service • Coarse-grained locks • Each client has a session with Chubby. • The session expires if the client is unable to renew its session lease within the lease expiration time. • 5 replicas, need a majority vote to be active • Also an OSDI ’06 Paper
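The two rules on this slide, lease-based sessions and majority voting among 5 replicas, can be sketched as follows. This is a simplified model with explicit clock values passed in; the class and function names are hypothetical and do not reflect Chubby's real interface.

```python
class Session:
    """Toy client session that stays valid only while its lease is renewed."""

    def __init__(self, lease_seconds, now):
        self.lease_seconds = lease_seconds
        self.expires_at = now + lease_seconds

    def is_expired(self, now):
        """The session has expired once the lease deadline is reached."""
        return now >= self.expires_at

    def renew(self, now):
        """Extend the lease; return False if the session already expired."""
        if self.is_expired(now):
            return False
        self.expires_at = now + self.lease_seconds
        return True


def has_quorum(live_replicas, total_replicas=5):
    """The service is active only when a strict majority of replicas is up."""
    return live_replicas > total_replicas // 2
```

With 5 replicas, `has_quorum(3)` holds and `has_quorum(2)` does not, so the service tolerates two replica failures.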

  18. Implementation • Single-master distributed system • Three major components • Library that is linked into every client • One master server • Assigning tablets to tablet servers • Detecting addition and expiration of tablet servers • Balancing tablet-server load • Garbage collection • Metadata operations • Many tablet servers • Each tablet server handles read and write requests to its tablets • Splits tablets that have grown too large

  19. Implementation

  20. Tablets • Each tablet is assigned to one tablet server. • A tablet holds a contiguous range of rows • Clients can often choose row keys to achieve locality • Aim for ~100MB to 200MB of data per tablet • Each tablet server is responsible for ~100 tablets • Fast recovery: • 100 machines each pick up 1 tablet from a failed machine • Fine-grained load balancing: • Migrate tablets away from overloaded machines • Master makes load-balancing decisions
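The fast-recovery point (about 100 tablets spread over 100 machines) can be illustrated with a least-loaded assignment loop. The policy below is an assumption chosen for the sketch; the slides only say that the master makes the load-balancing decisions.

```python
import heapq


def reassign_tablets(orphaned_tablets, load):
    """Give each orphaned tablet to the currently least-loaded live server.

    load maps server name -> number of tablets it already serves.
    Returns a dict mapping tablet -> chosen server.
    """
    if not load:
        raise ValueError("no live servers available")
    heap = [(count, server) for server, count in load.items()]
    heapq.heapify(heap)
    assignment = {}
    for tablet in orphaned_tablets:
        count, server = heapq.heappop(heap)
        assignment[tablet] = server
        # The server now carries one more tablet, so push it back with count + 1.
        heapq.heappush(heap, (count + 1, server))
    return assignment
```

With 100 equally loaded servers and 100 orphaned tablets, every server picks up exactly one tablet, so recovery work is spread evenly rather than landing on a single machine.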

  21. How to locate a Tablet? • Given a row, how do clients find the location of the tablet whose row range covers the target row? • METADATA: Key: table id + end row, Data: location • Aggressive caching and prefetching on the client side
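Keying METADATA by table id plus end row means the tablet covering a row is the first entry whose end row is greater than or equal to that row, which a binary search finds directly. The sketch below models this with a sorted list and a small client-side cache; the class, its fields, and the use of "\uffff" as a stand-in for "end of table" are all assumptions made for illustration.

```python
import bisect


class TabletLocator:
    """Toy client-side lookup over METADATA entries keyed by (table_id, end_row)."""

    def __init__(self, metadata):
        # metadata: iterable of ((table_id, end_row), location)
        self._entries = sorted(metadata)
        self._keys = [key for key, _ in self._entries]
        self._cache = {}            # (table_id, prev_end_row, end_row) -> location
        self.metadata_lookups = 0   # counts lookups that missed the cache

    def locate(self, table_id, row):
        """Return the location of the tablet whose row range covers row."""
        for (tid, low, high), location in self._cache.items():
            if tid == table_id and (low is None or row > low) and row <= high:
                return location
        self.metadata_lookups += 1
        # First key >= (table_id, row) is the tablet whose end row covers the row.
        i = bisect.bisect_left(self._keys, (table_id, row))
        if i == len(self._keys) or self._keys[i][0] != table_id:
            raise KeyError((table_id, row))
        (_, end_row), location = self._entries[i]
        prev_same_table = i > 0 and self._keys[i - 1][0] == table_id
        low = self._keys[i - 1][1] if prev_same_table else None
        self._cache[(table_id, low, end_row)] = location
        return location
```

Once a tablet's range is cached, later rows in the same range are resolved without touching METADATA, which is the effect the caching bullet describes.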

  22. Tablet Assignment • Each tablet is assigned to one tablet server at a time. • The master keeps track of the set of live tablet servers and the current assignment of tablets to servers. • When a tablet is unassigned, the master assigns it to a tablet server with sufficient room. • The master uses Chubby to monitor the health of tablet servers and to restart/replace failed servers.

  23. Tablet Assignment • Chubby • Tablet server registers itself by getting a lock in a specific Chubby directory • Chubby gives a “lease” on the lock, which must be renewed periodically • Server loses the lock if it gets disconnected • Master monitors this directory to find which servers exist/are alive • If a server is not contactable or has lost its lock, the master grabs the lock and reassigns its tablets • GFS replicates data. Prefer to start the tablet server on a machine that already holds the data
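The lock-and-lease protocol above can be modelled as a master that scans a directory of lock expiry times and moves tablets off any server whose lease has lapsed. The data structures and the round-robin choice of replacement servers are assumptions for this sketch, not details given in the slides.

```python
def find_dead_servers(lock_expiry, now):
    """Return servers whose lock lease has lapsed (expiry time <= now)."""
    return sorted(s for s, expires_at in lock_expiry.items() if expires_at <= now)


def reassign_from_dead(assignments, lock_expiry, now):
    """Return a new tablet -> server map with tablets moved off dead servers.

    assignments maps tablet -> server; lock_expiry maps server -> lease expiry.
    A server missing from lock_expiry is treated as having lost its lock.
    """
    dead = set(find_dead_servers(lock_expiry, now))
    live = sorted(set(lock_expiry) - dead)
    if not live:
        raise RuntimeError("no live tablet servers")
    result = {}
    next_live = 0
    for tablet, server in sorted(assignments.items()):
        if server in live:
            result[tablet] = server
        else:
            # Hypothetical policy: hand orphaned tablets out round-robin.
            result[tablet] = live[next_live % len(live)]
            next_live += 1
    return result
```

Servers that keep renewing their lease are left untouched; only tablets on servers that lost the lock are reassigned.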

  24. Refinement – Locality groups & Compression • Locality Groups • Can group multiple column families into a locality group • A separate SSTable is created for each locality group in each tablet. • Segregating column families that are not typically accessed together enables more efficient reads. • In WebTable, page metadata can be in one group and the contents of the page in another group. • Compression • Many opportunities for compression • Similar values in the cell at different timestamps • Similar values in different columns • Similar values across adjacent rows
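A locality group is essentially a mapping from column families to separate storage files. The sketch below splits one row's cells by group, following the WebTable example; the family names `language` and `checksum` and the function name are illustrative assumptions.

```python
def split_by_locality_group(cells, groups, default_group="default"):
    """Split one row's cells into per-locality-group dictionaries.

    cells maps "family:qualifier" -> value.
    groups maps group name -> set of column family names.
    Families not listed in any group go to default_group.
    """
    family_to_group = {
        family: group for group, families in groups.items() for family in families
    }
    result = {}
    for column, value in cells.items():
        family = column.split(":", 1)[0]
        group = family_to_group.get(family, default_group)
        result.setdefault(group, {})[column] = value
    return result
```

A reader that needs only page metadata would then touch the metadata group's store and never read the much larger page contents.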

  25. Performance - Scaling Not Linear! WHY? • As the number of tablet servers is increased by a factor of 500: • Performance of random reads from memory increases by a factor of 300. • Performance of scans increases by a factor of 260.
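One way to read these figures, assuming the factors refer to aggregate throughput across all tablet servers, is to divide each speedup by the growth in server count, which gives the per-server efficiency relative to perfectly linear scaling:

```latex
\[
\frac{300}{500} = 0.60, \qquad \frac{260}{500} = 0.52
\]
```

Under that assumption, each server delivers about 60% (random reads from memory) and 52% (scans) of the throughput that linear scaling would give.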

  26. Not linearly? • Load imbalance • Competition with other processes • Network • CPU • Rebalancing algorithm does not work perfectly • Rebalancing is throttled to reduce the number of tablet movements • Load shifts around as the benchmark progresses

  27. MapReduce Application

  28. 1. Top 10 Problem • Lots of data • paper, author, contents • a million papers, a million authors, millions of possible terms (occurring in contents) • Problem • Top 10 terms for each author • Top 10 authors for each term • Use MapReduce to solve the problem
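One possible MapReduce formulation of the first half of the problem is sketched below in plain Python, run in a single process. It assumes each paper is a dict with one `author` and a whitespace-separated `contents` string; those field names and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def map_paper(paper):
    """Map step: emit ((author, term), 1) for every term occurrence in a paper."""
    for term in paper["contents"].lower().split():
        yield (paper["author"], term), 1


def reduce_counts(pairs):
    """Reduce step: sum the values for each (author, term) key."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def top_terms_per_author(papers, n=10):
    """Return author -> list of that author's n most frequent terms."""
    counts = reduce_counts(pair for paper in papers for pair in map_paper(paper))
    per_author = defaultdict(Counter)
    for (author, term), count in counts.items():
        per_author[author][term] = count
    return {
        author: [term for term, _ in terms.most_common(n)]
        for author, terms in per_author.items()
    }
```

The second half, top 10 authors for each term, is symmetric: group the same counts by term instead of by author.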

  29. 2. Naive Bayes • Lots of cases • Each case has many features, and a YES or NO label • Problem • Likelihoods of features
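Feature likelihoods reduce to counting: P(feature | label) is the number of cases with that feature and label divided by the number of cases with that label. The sketch below does the counting in a map/reduce style within one process; it assumes binary "feature present" semantics and applies no smoothing, and the function names are hypothetical.

```python
from collections import Counter


def map_case(features, label):
    """Map step: emit one count for the label and one per (feature, label)."""
    yield ("label", label), 1
    for feature in features:
        yield ("feature", feature, label), 1


def likelihoods(cases):
    """Return (feature, label) -> P(feature | label), estimated by counting.

    cases is an iterable of (set_of_features, label) pairs.
    """
    counts = Counter()
    for features, label in cases:
        for key, value in map_case(features, label):
            counts[key] += value  # reduce step: sum counts per key
    result = {}
    for key, count in counts.items():
        if key[0] == "feature":
            _, feature, label = key
            result[(feature, label)] = count / counts[("label", label)]
    return result
```

Feature and label combinations that never occur are simply absent from the result; a practical classifier would usually add smoothing so they do not get probability zero.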

  30. Thank You! Q&A
