
Big Table: Distributed Storage System For Structured Data



  1. Big Table: Distributed Storage System For Structured Data
     Sergejs Melderis
     Dennis Kafura – CS5204 – Operating Systems

  2. BigTable: Unstructured Data vs. Structured Data
     • Unstructured data is computerized information that does not have a data model
       • e.g., plain text, audio
     • Structured data can be described by a data model:
       • flat
       • hierarchical
       • network
       • relational
       • dimensional
       • object-relational

  3. Relational Model and RDBMS
     • the most popular model for organizing structured data
     • based on first-order predicate logic
     • provides a declarative way to specify data and queries via SQL
     • data is organized in tables of records (rows) that share a fixed schema
     • many open-source and commercial implementations
     • provides ACID properties (atomicity, consistency, isolation, durability)

  4. NoSQL
     • non-relational databases
     • no fixed table schemas
     • no join operations
     • no SQL query language
     • a flexible data model, or none at all
     • usually do not provide ACID properties
     • scale horizontally

  5. BigTable
     • a distributed, high-performance, fault-tolerant NoSQL storage system built on top of the Google File System (GFS)
     • designed to scale to petabytes of data across thousands of low-cost commodity servers
     • designed by Google and used in many of its projects (e.g., web indexing, Google Earth, Google Finance)
     • the paper was published in 2006
     • related implementations:
       • HBase
       • Hypertable
       • Apache Cassandra
       • Neptune

  6. BigTable Data Model
     • a sparse, distributed, persistent, multi-dimensional sorted map
     • the map is indexed by a row key, a column key (column family plus column qualifier), and a timestamp; each value is an uninterpreted array of bytes
     • { row : { column_family : { column : { timestamp : value } } } }
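
A minimal Python sketch of the nested map on this slide, assuming plain in-memory dictionaries stand in for Bigtable's distributed, persistent storage. The helper names `put`, `get` and `rows_in_order` are hypothetical; only the nesting order (row, column family, column qualifier, timestamp) comes from the slide.

```python
from typing import Dict, Iterator, Optional, Tuple

# row -> column family -> column qualifier -> timestamp -> value (uninterpreted bytes)
Table = Dict[str, Dict[str, Dict[str, Dict[int, bytes]]]]


def put(table: Table, row: str, family: str, qualifier: str, ts: int, value: bytes) -> None:
    """Store one value at (row, family:qualifier, ts), creating inner maps as needed."""
    table.setdefault(row, {}).setdefault(family, {}).setdefault(qualifier, {})[ts] = value


def get(table: Table, row: str, family: str, qualifier: str, ts: int) -> Optional[bytes]:
    """Return the value at exactly this coordinate, or None if the cell is empty (sparse map)."""
    return table.get(row, {}).get(family, {}).get(qualifier, {}).get(ts)


def rows_in_order(table: Table) -> Iterator[Tuple[str, Dict[str, Dict[str, Dict[int, bytes]]]]]:
    """Yield rows sorted lexicographically by row key, mirroring Bigtable's row ordering."""
    for row in sorted(table):
        yield row, table[row]
```

Sorting at read time is only for illustration; Bigtable keeps its stored data ordered by row key instead of sorting on every scan.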

  7. Webtable
     (figure) Row “com.cnn.www” with columns “contents”, “anchor:cnnsi.com” and “anchor:my.look.ca”; timestamps shown: t6, t9, t9
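
In the Webtable example the row key “com.cnn.www” is the hostname www.cnn.com with its components reversed, which places pages from the same domain in adjacent rows. The sketch below shows that transformation, assuming bare hostnames without paths; `reverse_hostname` is a hypothetical helper.

```python
def reverse_hostname(host: str) -> str:
    """Reverse the dot-separated components of a hostname (www.cnn.com -> com.cnn.www)."""
    return ".".join(reversed(host.split(".")))


# Reversed keys for hosts in the same domain sort next to each other lexicographically.
keys = sorted(reverse_hostname(h) for h in ["www.cnn.com", "www.bbc.com", "money.cnn.com"])
print(keys)  # ['com.bbc.www', 'com.cnn.money', 'com.cnn.www']
```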

  8. Relational Data Model (figure)

  9. Student Table (figure labels: student_id, Row Key, Column Family, Column Qualifier)

  10. Course Table (figure labels: crn, Row Key, Column Family, Column Qualifier)

  11. Example
     • Student row “905514”: info:first_name, info:last_name, info:major, courses:96322, courses:96320
     • Course row “96322”: info:course, info:title, info:instructor_id, students:905514, students:905520

  12. Students Data View in JSON
     {
       905514 : {
         info : {
           first_name : { t1 : "Sergejs" },
           last_name : { t1 : "Melderis" },
           major : { t1 : "Comp Science" }
         },
         courses : {
           96322 : { t1 : "YES" },
           96320 : { t2 : "NO" }
         }
       }
     }
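
The same student row written as a Python dictionary, as a sketch that assumes t1 and t2 can be written as the integers 1 and 2. The helpers `latest` and `courses_of` are hypothetical; they show that a row's column qualifiers (here, course CRNs) can themselves carry data.

```python
from typing import Dict, List

# Student row from the slide; t1/t2 are written as the integers 1/2 (assumption).
students: Dict[str, Dict[str, Dict[str, Dict[int, str]]]] = {
    "905514": {
        "info": {
            "first_name": {1: "Sergejs"},
            "last_name": {1: "Melderis"},
            "major": {1: "Comp Science"},
        },
        "courses": {
            "96322": {1: "YES"},
            "96320": {2: "NO"},
        },
    }
}


def latest(versions: Dict[int, str]) -> str:
    """Return the most recent version, i.e. the one with the largest timestamp."""
    return versions[max(versions)]


def courses_of(table: Dict[str, Dict[str, Dict[str, Dict[int, str]]]], student_id: str) -> List[str]:
    """Column qualifiers in the "courses" family are course CRNs."""
    return sorted(table[student_id]["courses"])
```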

  13. Rows
     • row keys are arbitrary strings of up to 64 KB
     • every read or write of data under a single row key is atomic
     • rows are kept in lexicographic order by row key
     • the row range of a table is dynamically partitioned into contiguous ranges called tablets
     • tablets are the unit of distribution and load balancing
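
Because rows are sorted and each tablet covers a contiguous row range, finding the tablet that holds a given row is a binary search over tablet boundaries. A minimal sketch, assuming each tablet is identified by its start row key; the boundary values and the name `tablet_for` are hypothetical.

```python
import bisect
from typing import List

# Hypothetical boundaries: tablet i covers rows from tablet_starts[i] (inclusive)
# up to tablet_starts[i + 1] (exclusive). The first start key is "" so every
# possible row key falls into some tablet.
tablet_starts: List[str] = ["", "com.cnn", "org"]


def tablet_for(row_key: str, starts: List[str] = tablet_starts) -> int:
    """Return the index of the tablet whose row range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the tablet just before that position contains row_key.
    return bisect.bisect_right(starts, row_key) - 1
```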

  14. Columns
     • column keys are grouped into column families
     • a column family is the basic unit of access control
     • all data stored in a column family is usually of the same type
     • the number of column families should be small (hundreds at most)
     • the number of columns is unbounded
     • a column key is named using the syntax family:qualifier
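
A column key can be split at its first colon into family and qualifier, and in Bigtable a column family must be created before data is stored under any column key in it. A sketch of both checks; `split_column_key` and `check_family` are hypothetical helpers.

```python
from typing import Set, Tuple


def split_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected family:qualifier, got {column_key!r}")
    return family, qualifier


def check_family(column_key: str, declared_families: Set[str]) -> str:
    """Return the column key's family, rejecting families that were never created."""
    family, _ = split_column_key(column_key)
    if family not in declared_families:
        raise KeyError(f"column family {family!r} has not been created")
    return family
```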

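  15. Timestamps
     • Bigtable can store multiple versions of the same data, indexed by timestamp
     • timestamps are 64-bit integers, assigned either by Bigtable (real time in microseconds) or by the client
     • versions are stored in decreasing timestamp order, so the most recent version is read first
     • per column family, a client can ask Bigtable to keep only the last n versions (or only versions that are new enough)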
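
A sketch of one cell's version history under the "keep the last n versions" setting. `VersionedCell` and `max_versions` are hypothetical names, and this toy garbage-collects eagerly on every write.

```python
from typing import Dict, List, Optional, Tuple


class VersionedCell:
    """A single cell holding several timestamped versions of a value."""

    def __init__(self, max_versions: int) -> None:
        self.max_versions = max_versions  # hypothetical name for "keep the last n versions"
        self.versions: Dict[int, bytes] = {}

    def write(self, ts: int, value: bytes) -> None:
        self.versions[ts] = value
        # Drop the oldest versions beyond the configured limit.
        for old_ts in sorted(self.versions, reverse=True)[self.max_versions:]:
            del self.versions[old_ts]

    def read(self, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before ts (the newest overall if ts is None)."""
        candidates = [t for t in self.versions if ts is None or t <= ts]
        return self.versions[max(candidates)] if candidates else None

    def history(self) -> List[Tuple[int, bytes]]:
        """Versions in decreasing timestamp order, newest first."""
        return sorted(self.versions.items(), reverse=True)
```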

  16. Implementation
     • a client library
     • one master server
     • a distributed lock service called Chubby
     • many tablet servers, each managing a set of tablets
     • a tablet server:
       • handles read and write requests to its tablets
       • splits tablets that have grown too large (tablets are roughly 100-200 MB by default)
     • client data does not pass through the master: clients read and write directly to tablet servers
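
A sketch of splitting a tablet once it grows past a size limit. It assumes the limit is the upper end of the 100-200 MB range and that the split point is the middle row by count; the slides do not say how the split point is chosen, and `Tablet` and `maybe_split` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SPLIT_THRESHOLD = 200 * 1024 * 1024  # assumption: upper end of the 100-200 MB range


@dataclass
class Tablet:
    start_key: str
    row_sizes: Dict[str, int] = field(default_factory=dict)  # row key -> size in bytes

    def size(self) -> int:
        return sum(self.row_sizes.values())


def maybe_split(tablet: Tablet, threshold: int = SPLIT_THRESHOLD) -> List[Tablet]:
    """Split a tablet in two at a middle row key once it grows past the threshold."""
    if tablet.size() <= threshold or len(tablet.row_sizes) < 2:
        return [tablet]
    rows = sorted(tablet.row_sizes)
    mid = rows[len(rows) // 2]  # assumption: split point chosen by row count
    left = Tablet(tablet.start_key, {r: s for r, s in tablet.row_sizes.items() if r < mid})
    right = Tablet(mid, {r: s for r, s in tablet.row_sizes.items() if r >= mid})
    return [left, right]
```

Both halves still cover contiguous row ranges, so the sorted-row invariant from the Rows slide is preserved.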

  17. Tablet Location
     • a three-level hierarchy stores tablet locations
     • level 1: a file in the lock service (Chubby) holds the location of the root tablet
     • level 2: the root tablet holds the locations of all METADATA tablets
     • level 3: each METADATA tablet holds the locations of a set of user tablets
     (figure: Lock Service → Root tablet → METADATA tablets → UserTable1, UserTable2)
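
The three-level lookup can be sketched as three successive searches. The sketch assumes each level is a sorted list of (end key, location) pairs, that a METADATA key is formed as `table:row`, and that `~` sorts after every row key used; all names and data are hypothetical.

```python
import bisect
from typing import Dict, List, Tuple

# Each level: a sorted list of (end_key, location); a key belongs to the
# first entry whose end_key is >= the key.
Level = List[Tuple[str, str]]


def find(level: Level, key: str) -> str:
    ends = [end for end, _ in level]
    i = bisect.bisect_left(ends, key)
    if i == len(level):
        raise KeyError(key)
    return level[i][1]


def locate(lock_service: Dict[str, str], tablets: Dict[str, Level], table: str, row: str) -> str:
    meta_key = f"{table}:{row}"  # assumption: METADATA key = table name + row key
    root_location = lock_service["root_tablet"]             # level 1: file in the lock service
    meta_location = find(tablets[root_location], meta_key)  # level 2: root tablet -> METADATA tablet
    return find(tablets[meta_location], meta_key)           # level 3: METADATA tablet -> user tablet


lock = {"root_tablet": "server-root"}
tablets = {
    "server-root": [("UserTable1:~", "server-meta-1"), ("UserTable2:~", "server-meta-2")],
    "server-meta-1": [("UserTable1:m", "server-a"), ("UserTable1:~", "server-b")],
    "server-meta-2": [("UserTable2:~", "server-c")],
}
```

A real client library also caches tablet locations, so most lookups skip this walk.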

  18. Distribution of Data
     • one master server
     • the Chubby distributed lock service
     • hundreds or thousands of tablet servers
     • each tablet contains a contiguous range of rows
     • the master distributes tablets across tablet servers
     • each tablet server holds tablets covering different row ranges
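
The slides do not say how the master picks a tablet server for each tablet. As one hypothetical policy, the sketch below gives each tablet to the server currently holding the fewest tablets, using a heap keyed by tablet count; `assign_tablets` is a hypothetical name.

```python
import heapq
from typing import Dict, List


def assign_tablets(tablets: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Assign each tablet to the server that currently holds the fewest tablets."""
    if not servers:
        raise ValueError("no tablet servers available")
    heap = [(0, name) for name in servers]  # (tablets assigned so far, server name)
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {name: [] for name in servers}
    for tablet in tablets:
        count, name = heapq.heappop(heap)
        assignment[name].append(tablet)
        heapq.heappush(heap, (count + 1, name))
    return assignment
```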

  19. Tablet Representation
     (figure: a write operation goes to the tablet log in GFS and to the memtable in memory; a read operation consults the memtable and the SSTables stored in GFS)
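
A toy model of the read and write paths on this slide, assuming a Python list stands in for the tablet log in GFS and plain dictionaries stand in for the memtable and SSTables. Deletions and timestamps are left out; `TabletSketch` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


class TabletSketch:
    """Toy tablet: a commit log, an in-memory memtable, and immutable SSTables."""

    def __init__(self) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # stands in for the tablet log in GFS
        self.memtable: Dict[str, str] = {}
        self.sstables: List[Dict[str, str]] = []     # oldest first; never modified in place

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. record the mutation for recovery
        self.memtable[key] = value            # 2. then apply it to the memtable

    def read(self, key: str) -> Optional[str]:
        # Merged view: the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            if key in table:
                return table[key]
        return None
```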

  20. Compactions
     • compaction is the process of writing memtable contents out to SSTables and merging SSTables
     • minor compaction writes the memtable to a new SSTable
       • shrinks the memory usage of the tablet server
       • reduces the amount of commit log that must be read during recovery
     • merging compaction merges several SSTables (and the memtable) into one new SSTable
     • major compaction rewrites all SSTables into exactly one SSTable, removing deleted data
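
The three kinds of compaction can be sketched as operations on a list of SSTables ordered oldest to newest. The sketch assumes dictionaries for SSTables and `None` as a deletion marker; function names are hypothetical, and for brevity the merging compaction here merges SSTables only.

```python
from typing import Dict, List, Optional, Tuple

# An SSTable here is a dict from key to value; None marks a deleted key (assumption).
SSTable = Dict[str, Optional[str]]


def merge(tables: List[SSTable]) -> SSTable:
    """Merge tables given oldest to newest; newer entries overwrite older ones."""
    merged: SSTable = {}
    for table in tables:
        merged.update(table)
    return dict(sorted(merged.items()))  # keep keys sorted, as in an SSTable


def minor_compaction(memtable: SSTable, sstables: List[SSTable]) -> Tuple[SSTable, List[SSTable]]:
    """Write the memtable out as a new SSTable and start with an empty memtable."""
    return {}, sstables + [dict(sorted(memtable.items()))]


def merging_compaction(sstables: List[SSTable], k: int) -> List[SSTable]:
    """Merge the k newest SSTables into one. Deletion markers are kept, because
    older SSTables outside the merge may still hold the deleted values."""
    if k < 2:
        return list(sstables)
    return sstables[:-k] + [merge(sstables[-k:])]


def major_compaction(sstables: List[SSTable]) -> List[SSTable]:
    """Rewrite all SSTables into exactly one, dropping deleted entries entirely."""
    merged = merge(sstables)
    return [{key: value for key, value in merged.items() if value is not None}]
```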

  21. API
     • create and delete tables and column families
     • write or delete values
     • look up values from individual rows
     • scan over a subset of the data in a table
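
A hypothetical in-memory class covering the operations listed on this slide. It is not Google's client API (the paper presents a C++ API); the class and method names are illustrative, and timestamps are omitted for brevity.

```python
from typing import Dict, Iterator, Set, Tuple


class ToyBigtable:
    """Hypothetical in-memory stand-in for the slide's operations (not Google's API)."""

    def __init__(self) -> None:
        self.families: Dict[str, Set[str]] = {}               # table -> column families
        self.rows: Dict[str, Dict[str, Dict[str, str]]] = {}  # table -> row -> column -> value

    def create_table(self, table: str) -> None:
        self.families[table] = set()
        self.rows[table] = {}

    def delete_table(self, table: str) -> None:
        del self.families[table]
        del self.rows[table]

    def create_family(self, table: str, family: str) -> None:
        self.families[table].add(family)

    def delete_family(self, table: str, family: str) -> None:
        """Drop the family and every value stored under it."""
        self.families[table].discard(family)
        for columns in self.rows[table].values():
            for column in [c for c in columns if c.split(":", 1)[0] == family]:
                del columns[column]

    def write(self, table: str, row: str, column: str, value: str) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families[table]:
            raise KeyError(f"column family {family!r} does not exist")
        self.rows[table].setdefault(row, {})[column] = value

    def delete(self, table: str, row: str, column: str) -> None:
        self.rows[table].get(row, {}).pop(column, None)

    def lookup(self, table: str, row: str) -> Dict[str, str]:
        """Return a copy of all columns stored in one row."""
        return dict(self.rows[table].get(row, {}))

    def scan(self, table: str, start: str, end: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows with start <= row key < end, in row-key order."""
        for key in sorted(self.rows[table]):
            if start <= key < end:
                yield key, dict(self.rows[table][key])
```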

