A NOSQL Study: Apache Cassandra

Shujaat Hussain

Presentation Transcript


  1. A NOSQL Study: Apache Cassandra. Shujaat Hussain

  2. Data Model: A single column

  3. Data Model: A single row

  4. Data Model
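The data-model slides name only "a single column" and "a single row". As a rough illustration, a Cassandra column is commonly described as a name, a value, and a timestamp, and a row as a key with columns ordered by name. The sketch below models that in plain Python. The `Column` and `Row` classes are hypothetical names made up for this example and are not part of any Cassandra client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # supplied by the writer; the newest timestamp wins


class Row:
    """Toy row: one key mapping to columns that are read back sorted by name."""

    def __init__(self, key):
        self.key = key
        self._columns = {}

    def put(self, name, value, timestamp):
        # Keep the incoming column only if it is newer than the stored one.
        current = self._columns.get(name)
        if current is None or timestamp > current.timestamp:
            self._columns[name] = Column(name, value, timestamp)

    def columns(self):
        return [self._columns[name] for name in sorted(self._columns)]


row = Row("b")
row.put("cat", "meow", timestamp=2)
row.put("bar", "baz", timestamp=1)
row.put("cat", "stale", timestamp=1)  # older write, ignored
print([(c.name, c.value) for c in row.columns()])
# prints [('bar', 'baz'), ('cat', 'meow')]
```

Rows in this model need not share the same set of columns, which is the sparse, semi-structured shape the slides list as an advantage.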

  5. CAP Theorem
     • Consistency: the system is in a consistent state after an operation
     • Availability: the system is “always on”, no downtime
     • Partition tolerance: the system continues to function even when split into disconnected subsets (by a network disruption)

  6. Performance vs. MySQL w/ 50 GB
     • MySQL
       • 300 ms write
       • 350 ms read
     • Cassandra
       • 0.12 ms write
       • 15 ms read

  7. Querying: Overview
     • You need a key or keys:
       • Single: key='a'
       • Range: key='a' through 'f'
     • And columns to retrieve:
       • Slice: cols={bar through kite}
       • By name: key='b' cols={bar, cat, llama}
     • Nothing like SQL "WHERE col='faz'"
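The access patterns on the querying slide (single key, key range, column slice, columns by name) can be imitated with a small in-memory store. This is a sketch only: `ToyStore` and its method names are invented for illustration and are not the Cassandra client API. It also assumes keys are kept in sorted order, whereas in a real cluster the order of a key range depends on the configured partitioner.

```python
from bisect import bisect_left, bisect_right


class ToyStore:
    """Hypothetical in-memory stand-in: key -> {column name -> value}."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, column, value):
        self._rows.setdefault(key, {})[column] = value

    def get_by_name(self, key, names):
        # "By name": fetch only the listed columns of one row.
        row = self._rows.get(key, {})
        return {n: row[n] for n in names if n in row}

    def get_slice(self, key, first, last):
        # "Slice": every column whose name falls in [first, last].
        row = self._rows.get(key, {})
        cols = sorted(row)
        selected = cols[bisect_left(cols, first):bisect_right(cols, last)]
        return {c: row[c] for c in selected}

    def get_range(self, start_key, end_key, first, last):
        # "Range": the same column slice for every key in [start_key, end_key].
        keys = sorted(self._rows)
        wanted = keys[bisect_left(keys, start_key):bisect_right(keys, end_key)]
        return {k: self.get_slice(k, first, last) for k in wanted}


store = ToyStore()
for key in ("a", "b", "g"):
    for col in ("ant", "bar", "cat", "kite", "llama"):
        store.insert(key, col, f"{key}:{col}")

print(store.get_slice("a", "bar", "kite"))
print(store.get_by_name("b", ["bar", "cat", "llama"]))
print(sorted(store.get_range("a", "f", "bar", "kite")))
```

Every method starts from a row key. There is no way to ask for "all rows where some column equals a value" without scanning everything, which is the limitation the last bullet of the slide points at.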

  8. Digg is a social news site that allows people to discover and share content from anywhere on the Internet by submitting stories and links, and voting and commenting on submitted stories and links.

  9. Problems
     • Terabytes of data; high transaction rate (read-dominated)
     • Multiple clusters
     • Management nightmare (high effort, error-prone)
     • Unsatisfied availability requirements (geographic isolation)
     • Solution
       • Cassandra as primary data store
       • Datacenter- and rack-aware replication

  10. Twitter is a social networking and microblogging service that enables its users to send and read tweets, text-based posts of up to 140 characters.
     • Terabytes of data, ~1,000,000 ops/s

  11. Inbox Search
     • 100 TB
     • 160 nodes
     • 1/2 billion writes per day (2-year-old number?)

  12. Pros
     • Advantages
       • Massive scalability
       • High availability
       • Lower cost (than competing solutions at that scale)
       • (Usually) predictable elasticity
       • Schema flexibility; sparse and semi-structured data

  13. Cons
     • Disadvantages
       • Limited query capabilities (so far)
       • Eventual consistency is not intuitive to program for
       • Makes client applications more complicated
       • No standardization
       • Portability might be an issue
       • Insufficient access control
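To see why eventual consistency can be unintuitive for client code, consider a toy model with two replicas that reconcile by keeping the value with the newest timestamp. This is a simplified sketch, not Cassandra's actual replication protocol; `Replica` and `sync` are hypothetical names used only here.

```python
class Replica:
    """Toy replica storing key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, timestamp):
        # Last write wins: accept the value only if it is newer.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange entries so both replicas end up with the newest version."""
    for src, dst in ((a, b), (b, a)):
        for key, (ts, value) in list(src.data.items()):
            dst.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("user:1", "old", timestamp=1)
r2.write("user:1", "old", timestamp=1)

r1.write("user:1", "new", timestamp=2)  # the update has reached only r1 so far
stale = r2.read("user:1")               # a client reading r2 still sees "old"

sync(r1, r2)                            # replicas reconcile later
fresh = r2.read("user:1")               # now "new" on both replicas
print(stale, fresh)
# prints old new
```

Between the write and the reconciliation, a client that reads from the second replica gets the old value. Application code has to tolerate that window, which is part of what makes client applications more complicated.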
