
Bigtable: A Distributed Storage System for Structured Data, Google, Inc.



Presentation Transcript


  1. Bigtable: A Distributed Storage System for Structured Data. Google, Inc. 김 윤호

  2. 1. Introduction 2. Data Model 3. API 4. Building Blocks 5. Implementation 6. Refinements 7. Performance Evaluation 8. Real Application 9. Lessons 10. Conclusions

  3. What is Bigtable?

  4. Robert Therrien

  5. 1. Introduction • Very large data sets (petabytes) • Manages structured data • Distributed storage system • Used by Google Earth, Google Finance, and others • Bigtable resembles a database (NoSQL)

  6. 1. Introduction A Bigtable is a sparse, distributed, persistent, multidimensional sorted map.
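
To make the "sorted multidimensional map" idea concrete, here is a minimal single-process sketch. It assumes cells are keyed by (row, column, timestamp); the `TinyTable` class and its method names are hypothetical and are not part of Bigtable.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell key: (row key, column key, timestamp).
CellKey = Tuple[str, str, int]


class TinyTable:
    """Toy model of a sparse map: only cells that were written take space."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        self._cells[(row, column, ts)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the most recent version of a cell, or None if absent."""
        versions = [(ts, value) for (r, c, ts), value in self._cells.items()
                    if r == row and c == column]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def rows(self) -> List[str]:
        """Row keys in lexicographic order."""
        return sorted({r for (r, _, _) in self._cells})
```

A real system keeps the data sorted on disk and split across servers; this sketch only shows the shape of the key and the "latest version wins" read.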

  7. 2. Data Model • Rows • Column Family • Timestamp

  8. 2. Data Model – Rows • Lexicographic order • Tablet • Good Locality (Reversed URL)
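
The reversed-URL point can be illustrated with a short sketch. The helper name `reversed_host_key` is an assumption made for this example; it reverses the hostname components so that pages from the same domain sort next to each other.

```python
def reversed_host_key(url: str) -> str:
    """Turn 'maps.google.com/index.html' into 'com.google.maps/index.html'."""
    host, sep, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + sep + path


pages = ["maps.google.com/index.html", "www.cnn.com", "mail.google.com/inbox"]
# Sorting the row keys lexicographically groups the two google.com pages together.
row_keys = sorted(reversed_host_key(p) for p in pages)
```

Because rows are stored in lexicographic order, adjacent keys tend to land in the same tablet, which is what gives reads over one domain good locality.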

  9. 2. Data Model – Rows

  10. 2. Data Model – Rows

  11. 2. Data Model – Column Family • Unit of access control • A set of column keys • Data in a family is usually of the same type • A small number of column families per table • An unbounded number of columns

  12. 2. Data Model – Timestamp • Multiple versions of the same data • Assigned by Bigtable (real time) or by the client application • Versions stored in decreasing timestamp order (most recent first)

  13. 3. API • NoSQL • Functions for creating and deleting tables and column families • C++

  14. 3. API - Write • RowMutation

  15. 3. API - Read • Scanner
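
The write and read slides name `RowMutation` and `Scanner` without showing them. The sketch below is a hypothetical Python stand-in, not the real C++ API: it assumes a table is a dict of rows, that a mutation batches edits to a single row, and that a scan walks a row range restricted to one column family.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Row = Dict[str, str]      # column key -> value
Table = Dict[str, Row]    # row key -> row


class RowMutation:
    """Hypothetical stand-in: batches edits to one row and applies them together."""

    def __init__(self, row_key: str) -> None:
        self.row_key = row_key
        self._ops: List[Tuple[str, str, Optional[str]]] = []

    def set(self, column: str, value: str) -> None:
        self._ops.append(("set", column, value))

    def delete(self, column: str) -> None:
        self._ops.append(("delete", column, None))

    def apply(self, table: Table) -> None:
        # Build the new row on a copy, then swap it in as a single step.
        row = dict(table.get(self.row_key, {}))
        for op, column, value in self._ops:
            if op == "set" and value is not None:
                row[column] = value
            elif op == "delete":
                row.pop(column, None)
        table[self.row_key] = row


def scan(table: Table, start: str, end: str, family: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (row, column, value) for rows in [start, end) and one column family."""
    for row_key in sorted(table):
        if start <= row_key < end:
            for column in sorted(table[row_key]):
                if column.startswith(family + ":"):
                    yield row_key, column, table[row_key][column]
```

Applying all edits to a copy and swapping the row in at the end is one simple way to mimic a per-row atomic update in a single-threaded toy.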

  16. 4. Building Blocks • Google File System • Store log and data files • SSTable • Chubby

  17. 4. Building Blocks [Figure: Chubby, master, tablet servers, clients, GFS, and SSTables] Reference: 구글을 지탱하는 기술 (나시다 케이스케)

  18. 4. Building Blocks • SSTable • Used internally to store Bigtable data • Provides a persistent, ordered, immutable map • Contains a sequence of blocks (typically 64KB each) plus a block index used to locate them
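
A rough sketch of how an index over sorted blocks supports lookups, assuming the index stores the first key of each block (an assumption made for this example) and using tiny blocks instead of 64KB ones. `TinySSTable` is a hypothetical name.

```python
import bisect
from typing import Iterable, List, Optional, Tuple


class TinySSTable:
    """Immutable sorted map split into fixed-size blocks with an in-memory index."""

    def __init__(self, items: Iterable[Tuple[str, bytes]], block_size: int = 2) -> None:
        pairs = sorted(items)
        self.blocks: List[List[Tuple[str, bytes]]] = [
            pairs[i:i + block_size] for i in range(0, len(pairs), block_size)
        ]
        # Assumed index layout: the first key of every block.
        self.index: List[str] = [block[0][0] for block in self.blocks]

    def get(self, key: str) -> Optional[bytes]:
        # Binary-search the index to find the only block that can hold the key.
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None
        for k, value in self.blocks[i]:
            if k == key:
                return value
        return None
```

Only one block is examined per lookup, which mirrors the idea that a read needs the index plus a single block rather than the whole file.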

  19. 4. Building Blocks • Chubby • Small distributed file system • Distributed lock service • To ensure there is at most one active master • To store the bootstrap location of Bigtable data • To discover tablet servers and finalize tablet server deaths • To store Bigtable schema information • To store access control lists

  20. 5. Implementation • Tablet Location • Tablet Assignment • Tablet Serving • Compactions

  21. 5. Implementation - Tablet Location • Three-level hierarchy • METADATA • Row = 1KB • Tablet = 128MB • # of METADATA tablets = 128MB / 1KB = 2^17 • # of user tablets = 2^17 * 2^17 = 2^34 • Total capacity = 128MB * 2^34 = 2^61 bytes = 2EB
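
The capacity figure on this slide follows from the 1KB row and 128MB tablet sizes. The arithmetic can be checked with a few lines; the variable names are illustrative and binary units (1KB = 2^10 bytes) are assumed.

```python
KB, MB, EB = 2**10, 2**20, 2**60

row_size = 1 * KB        # one METADATA row describes one tablet
tablet_size = 128 * MB   # size limit of a METADATA tablet and of a user tablet

# Each METADATA tablet (including the root) holds this many location rows.
rows_per_tablet = tablet_size // row_size

metadata_tablets = rows_per_tablet                 # addressed by the root tablet
user_tablets = metadata_tablets * rows_per_tablet  # addressed by METADATA tablets
capacity_bytes = user_tablets * tablet_size
```

With these inputs the root tablet addresses 2^17 METADATA tablets, those address 2^34 user tablets, and the total is 2^61 bytes, which is 2EB.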

  22. 5. Implementation - Tablet Assignment • Each tablet is assigned to one tablet server at a time [Figure: master, tablet servers with tablet info, and tablets stored in GFS]

  23. 5. Implementation - Tablet Serving • Tablet recovery • Write operation • Read operation

  24. 5. Implementation - Tablet Serving • Memtable • Commit log

  25. Tablet Recovery • Read the tablet's metadata from the METADATA table • The metadata contains the list of SSTables that comprise the tablet and a set of redo points • Redo points are pointers into the commit logs that may contain data for the tablet • The server reconstructs the memtable by applying the updates committed since the redo points

  26. Write operation • Check for well-formedness and proper authorization • Write to the commit log • Insert into the memtable. Read operation • Check for well-formedness and proper authorization • Read from a merged view of the sequence of SSTables and the memtable • Or fail
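
The write path, read path, and recovery steps above can be sketched together. This is a simplified single-process model under stated assumptions: SSTables are plain dicts ordered oldest first, the redo point is an index into an in-memory log, and the well-formedness and authorization checks are omitted. `TinyTablet` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


class TinyTablet:
    def __init__(self, sstables: Optional[List[Dict[str, str]]] = None) -> None:
        self.commit_log: List[Tuple[str, str]] = []
        self.memtable: Dict[str, str] = {}
        self.sstables: List[Dict[str, str]] = list(sstables or [])  # oldest first

    def write(self, key: str, value: str) -> None:
        # Log first, then update the in-memory buffer.
        self.commit_log.append((key, value))
        self.memtable[key] = value

    def read(self, key: str) -> Optional[str]:
        # Merged view: the memtable shadows SSTables, newer SSTables shadow older ones.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def recover(self, redo_point: int = 0) -> None:
        # Rebuild the memtable by replaying log entries from the redo point onward.
        self.memtable = {}
        for key, value in self.commit_log[redo_point:]:
            self.memtable[key] = value
```

Writing to the log before the memtable is what lets `recover` rebuild any state that existed only in memory.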

  27. 5. Implementation - Compactions • Minor compaction • Merging compaction • Major compaction

  28. Minor compaction [Figure: tablet server, write op, memtable, and SSTables in GFS] • Shrinks the memory usage of the tablet server • Reduces the amount of data that has to be read from the commit log during recovery

  29. Compaction • Merging compaction • Major compaction [Figure: SSTables in GFS before and after compaction]

  30. 5. Implementation - Compactions • Reclaim resources used by deleted data
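
The compaction types can be sketched as operations on a memtable and a list of SSTables. Everything here is a simplification: SSTables are dicts ordered oldest first, `None` is an assumed deletion marker, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SSTable = Dict[str, Optional[str]]  # None marks a deleted key (assumed tombstone)


def minor_compaction(memtable: SSTable, sstables: List[SSTable]) -> Tuple[SSTable, List[SSTable]]:
    """Freeze the memtable into a new SSTable and start with an empty memtable."""
    new_sstable = dict(sorted(memtable.items()))
    return {}, sstables + [new_sstable]


def merge(sstables: List[SSTable], drop_deleted: bool) -> SSTable:
    """Merge SSTables (oldest first) so that newer entries overwrite older ones."""
    merged: SSTable = {}
    for sstable in sstables:
        merged.update(sstable)
    if drop_deleted:
        merged = {k: v for k, v in merged.items() if v is not None}
    return dict(sorted(merged.items()))


def major_compaction(sstables: List[SSTable]) -> List[SSTable]:
    """Rewrite everything into one SSTable with no deletion markers left."""
    return [merge(sstables, drop_deleted=True)]
```

Calling `merge` with `drop_deleted=False` models a merging compaction, which must keep deletion markers because older SSTables outside the merge may still hold the deleted data. Only the major compaction can discard them, which is how resources used by deleted data are reclaimed.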

  31. 6. Refinements • High performance, Availability, Reliability • Locality groups • Compression • Caching for read performance • Bloom filters • Commit-log implementation • Speeding up tablet recovery

  32. Locality groups • Group of multiple column families • A separate SSTable is generated for each locality group • More efficient reads. Compression • Two-pass custom compression scheme • First pass: Bentley and McIlroy's scheme • Second pass: a fast compression algorithm

  33. Caching for read performance • Two-level caching • Scan Cache • Block Cache. Bloom filters • Test whether an element is a member of a set • Reduce the number of disk accesses for read operations
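
A Bloom filter answers "definitely not present" or "possibly present", which is why it can save a disk access when an SSTable cannot contain the requested key. The sketch below is a generic textbook version, not Bigtable's implementation; the bit count, hash count, and use of SHA-256 are arbitrary choices for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits |= 1 << position

    def might_contain(self, item: str) -> bool:
        # False means the item was never added; True may be a false positive.
        return all((self.bits >> position) & 1 for position in self._positions(item))
```

A reader would check `might_contain` before touching an SSTable and skip the disk read whenever it returns False.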

  34. Commit-log implementation • A separate log file per tablet would mean a very large number of files written concurrently in GFS • Instead, a single commit log per tablet server. Speeding up tablet recovery • When the master moves a tablet from one tablet server to another • The source tablet server first performs a minor compaction

  35. 7. Performance Evaluation • Single tablet-server performance • Scaling

  36. 8. Real Application • Google Analytics • Google Earth • Personalized Search

  37. 8. Real Application - Google Analytics • Help webmasters analyze traffic patterns • # of Visitors, page views, site-tracking reports • Embed a JavaScript program • Two tables • Raw click table (~200TB) • Summary table (~20TB)

  38. 8. Real Application - Google Earth • Table to preprocess data • Set of tables for serving client data

  39. 8. Real Application - Personalized Search • Records user queries and clicks • Web search, images, news • Row – userid • Column family – type of user action • Timestamp – the time at which the user action occurred

  40. 9. Lessons • Large distributed systems are vulnerable to many types of failures • Important to delay adding new features until it is clear how they will be used • Proper system-level monitoring • The value of simple designs. 10. Conclusions • Resource-sharing issues within Bigtable itself

  41. THANK YOU Q & A
