Bigtable is a distributed storage system designed to manage petabytes of structured data across thousands of commodity servers. It aims for wide applicability, scalability, high performance, and high availability, and is used by products such as Google Analytics, Google Earth, and Personalized Search. Its data model indexes values by row key, column key, and timestamp, and it relies on Chubby, a highly available distributed lock service, for coordination. Because Google designed the data model and controls the implementation, the system stays flexible and bottlenecks can be removed as they appear.
Bigtable: A Distributed Storage System for Structured Data 0256803 高睿鴻
Introduction • Manages petabytes of data across thousands of commodity servers. • Goals: wide applicability, scalability, high performance, and high availability. • Products using it: Google Analytics, Google Earth, Personalized Search, ...
Data Model • A table is a sparse, sorted map indexed by row key, column key, and timestamp. • Each value is an uninterpreted array of bytes.
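To make the three-part index concrete, here is a minimal in-memory sketch of a map from (row key, column key, timestamp) to a value. The class `BigtableSketch`, its method names, and the example keys are hypothetical and chosen only for illustration; this is not Bigtable's client API, and it ignores distribution and persistence entirely.

```python
from collections import defaultdict


class BigtableSketch:
    """Toy (row key, column key, timestamp) -> value map. Hypothetical names."""

    def __init__(self):
        # cells[(row, column)] holds (timestamp, value) pairs, newest first.
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        for ts, value in self.cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column) keys in sorted order for start_row <= row < end_row."""
        for key in sorted(self.cells):
            if start_row <= key[0] < end_row:
                yield key


table = BigtableSketch()
table.put("com.example.www", "contents:", 3, b"<html>v3</html>")
table.put("com.example.www", "contents:", 5, b"<html>v5</html>")
latest = table.get("com.example.www", "contents:")
older = table.get("com.example.www", "contents:", timestamp=4)
```

Keeping several timestamped versions per cell is what lets a reader ask for either the latest value or the value as of an earlier time.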
Chubby • Highly available and persistent distributed lock service. • Bigtable uses it to: • Ensure that there is at most one active master at any time. • Discover tablet servers and finalize tablet server deaths. • Store Bigtable schema information. • Store access control lists.
Tablet • A table consists of a set of tablets. • Each tablet contains all data associated with a row range. • Each tablet is roughly 100 to 200 MB in size.
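Because each tablet owns a contiguous row range, finding the tablet for a row key reduces to a search over sorted range boundaries. The sketch below assumes tablets are described by a sorted list of split keys; the function name `locate_tablet` and the split keys are hypothetical, and the real system locates tablets through its own metadata rather than a local list.

```python
import bisect


def locate_tablet(split_keys, row_key):
    """Return the index of the tablet whose row range contains row_key.

    split_keys must be sorted. With n split keys there are n + 1 tablets:
    tablet 0 holds rows below split_keys[0], tablet i holds rows in
    [split_keys[i - 1], split_keys[i]), and the last tablet holds the rest.
    """
    return bisect.bisect_right(split_keys, row_key)


splits = ["g", "n", "t"]  # hypothetical boundaries for four tablets
tablet_for_apple = locate_tablet(splits, "apple")    # tablet 0
tablet_for_monkey = locate_tablet(splits, "monkey")  # tablet 1
tablet_for_zebra = locate_tablet(splits, "zebra")    # tablet 3
```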
SSTable • The SSTable file format is used internally to store Bigtable data. • Provides a persistent, ordered, immutable map from keys to values. • Disk vs. memory: a lookup needs a single disk seek, or the SSTable can be mapped entirely into memory.
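The "ordered immutable map" idea can be sketched in a few lines: sort the entries once at construction, never modify them afterwards, and answer lookups and range scans with binary search. `SSTableSketch` is a hypothetical in-memory stand-in; it does not model the on-disk block layout or index of the real file format.

```python
import bisect


class SSTableSketch:
    """In-memory stand-in for an ordered, immutable key -> value map."""

    def __init__(self, items):
        # Sort once; tuples keep the contents immutable after construction.
        pairs = sorted(dict(items).items())
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        """Binary-search for key; return its value or None if absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


sstable = SSTableSketch({b"b": b"2", b"a": b"1", b"c": b"3"})
value = sstable.get(b"b")
window = sstable.scan(b"a", b"c")
```

Immutability is the design point worth noticing: since a table never changes once written, readers need no locking, and updates are handled by writing new tables rather than editing old ones.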
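Compactions • The memtable grows as writes arrive. • Minor compaction: once the memtable reaches a size threshold, the old memtable is converted to an SSTable and written to GFS. • Merging compaction: old SSTables and the memtable are merged into a new SSTable.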
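The two compaction steps can be sketched with a dictionary as the memtable and sorted tuples as SSTables. Everything here is a simplification under stated assumptions: `TabletStoreSketch` is a hypothetical name, the threshold counts entries rather than bytes, and "writing to GFS" is replaced by appending to a Python list.

```python
class TabletStoreSketch:
    """Toy memtable plus SSTable list showing minor and merging compactions."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # entry count, standing in for bytes
        self.memtable = {}
        self.sstables = []  # oldest first; each is a tuple of sorted (key, value)

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Freeze the memtable into a new SSTable and start an empty memtable."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def merging_compaction(self, count=2):
        """Merge the `count` newest SSTables and the memtable into one SSTable."""
        if count < 1:
            raise ValueError("count must be at least 1")
        merged = {}
        for table in self.sstables[-count:]:  # older first, so newer values win
            merged.update(table)
        merged.update(self.memtable)          # memtable holds the newest data
        self.sstables = self.sstables[:-count]
        if merged:
            self.sstables.append(tuple(sorted(merged.items())))
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


store = TabletStoreSketch(memtable_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 9), ("c", 3), ("d", 4)]:
    store.write(k, v)
tables_before = len(store.sstables)
store.merging_compaction()
tables_after = len(store.sstables)
```

Merging keeps reads cheap: without it, every minor compaction adds another SSTable that a read may have to consult.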
Performance • In the random read benchmark, a single tablet server executes approximately 1200 reads per second. • Per-server throughput drops significantly when going from 1 to 50 tablet servers.
Performance • The drop is caused by load imbalance in multi-server configurations. • Other processes contend for CPU and network. • Random reads scale worst: aggregate throughput grows only 100-fold for a 500-fold increase in servers. • Each 1000-byte read transfers a 64 KB block over the network.
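The figures on the two performance slides can be checked with a little arithmetic. The sketch below uses only the numbers quoted above, and assumes 64 KB means 64 × 1024 bytes; the function names are hypothetical.

```python
BLOCK_BYTES = 64 * 1024   # assumed: 64 KB block fetched per read
VALUE_BYTES = 1000        # bytes the benchmark actually asks for
READS_PER_SECOND = 1200   # approximate single-server random read rate


def read_amplification(block_bytes=BLOCK_BYTES, value_bytes=VALUE_BYTES):
    """Bytes moved over the network per byte requested."""
    return block_bytes / value_bytes


def block_traffic_mib_per_second(reads=READS_PER_SECOND, block_bytes=BLOCK_BYTES):
    """Network traffic generated by block transfers, in MiB per second."""
    return reads * block_bytes / 2**20


def per_server_ratio(throughput_gain=100, server_gain=500):
    """Per-server throughput at scale relative to the single-server case."""
    return throughput_gain / server_gain


amplification = read_amplification()           # about 65.5x
traffic = block_traffic_mib_per_second()       # 75.0 MiB/s
relative_throughput = per_server_ratio()       # 0.2, i.e. one fifth
```

So each random read moves roughly 65 times more data than it needs, one server already generates about 75 MiB/s of block traffic, and at 500 servers each server delivers about one fifth of its standalone throughput.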
Real Applications • Google Analytics: JavaScript, raw click table, summary table • Google Earth: satellite imagery, imagery table • Personalized Search: web search, images, news
Conclusion • Designing their own data model for Bigtable gave Google a substantial amount of flexibility. • Controlling the implementation lets them remove bottlenecks and inefficiencies as they arise.