
Cassandra Training



Presentation Transcript


  1. Cassandra Training Introduction & Data Modeling

  2. Aims • By the end of today you should know: • How Cassandra organises data • How to configure replicas • How to choose between consistency and availability • How to efficiently model data for both reads and writes • You need to consider Active-Active scenarios • Who to ask to help you & sign off on your data model • HINT: Ask Neil directly or email harch@expedia.com. Introduction to Cassandra

  3. Agenda – 100ft • Quick Introduction • Data Structures • Efficient Data Modeling • Data Modeling Examples

  4. Elevator Pitch • What? • Write-path optimised • Eventually consistent (typically within milliseconds) • Distributed hash table • Highly durable • Tunable consistency

  5. DHT 101 • Each physical node is assigned a token • Each node owns the range from the previous node's token (exclusive) up to its own token (inclusive)
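
A minimal sketch of the ownership rule in Java, assuming one token per node and small long values as tokens for readability (the real token space is far larger). TokenRing, addNode and ownerOf are hypothetical names, not Cassandra APIs.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy token ring (hypothetical, not Cassandra code): one token per node, and a
// node owns the range (previous node's token, its own token].
final class TokenRing {
    private final TreeMap<Long, String> tokenToNode = new TreeMap<>();

    void addNode(String node, long token) {
        tokenToNode.put(token, node);
    }

    // The owner is the first node whose token is >= the key's token;
    // beyond the highest token the ring wraps around to the lowest one.
    String ownerOf(long keyToken) {
        Map.Entry<Long, String> entry = tokenToNode.ceilingEntry(keyToken);
        if (entry == null) {
            entry = tokenToNode.firstEntry();
        }
        return entry.getValue();
    }
}
```

A sorted map fits because ownership is a "first token at or after the key's token" lookup, which ceilingEntry answers in O(log n); the fallback to firstEntry handles keys past the highest token.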

  6. Cassandra Write Path • The coordinator sends the update to each replica (two nodes in this example), starting at the node that owns the key and working clockwise around the ring
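
Extending the same idea to replica placement: a sketch, assuming one token per node and a SimpleStrategy-like policy that ignores racks and data centres, of collecting RF replicas by walking clockwise from the owning node. ReplicaPlacement and replicasFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical SimpleStrategy-style placement: the first replica is the node
// owning the key's token, further replicas are the next nodes clockwise.
final class ReplicaPlacement {
    static List<String> replicasFor(TreeMap<Long, String> ring, long keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        // Start at the owning node: first token >= key token, wrapping around.
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        // Never ask for more replicas than there are nodes.
        int limit = Math.min(rf, ring.size());
        while (replicas.size() < limit) {
            replicas.add(entry.getValue());
            // Step clockwise to the next token, wrapping at the end of the ring.
            entry = ring.higherEntry(entry.getKey());
            if (entry == null) {
                entry = ring.firstEntry();
            }
        }
        return replicas;
    }
}
```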

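  7. Cassandra Write Path • A 128-bit hash of the partition key gives its position (token) on the ring • Keys are therefore distributed randomly around the ring • If a replica is unavailable, the coordinator stores a hint and replays the write when the replica returns (hinted handoff)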
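
The token computation can be sketched as follows, assuming RandomPartitioner-style hashing (MD5 of the key bytes, read as a signed 128-bit integer and made non-negative). Md5Token is a hypothetical helper; later Cassandra versions default to Murmur3Partitioner, which uses a different, 64-bit hash.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of a RandomPartitioner-style token: the MD5 digest of the row key,
// interpreted as a signed 128-bit integer and made non-negative.
final class Md5Token {
    static BigInteger tokenFor(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should always be available in the JDK", e);
        }
    }
}
```

Because neighbouring keys such as "20121004" and "20121005" produce unrelated digests, their tokens land far apart on the ring, which is what spreads keys randomly across nodes.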

  8. Cassandra Write Path • SSTables are written sequentially and are immutable • Data for a row may reside across several SSTables • SSTables are periodically compacted together
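
A toy model of these bullets for one row on one node: writes land in a sorted memtable, flush() freezes it into an immutable sorted "SSTable", read() reconciles a column that lives in several SSTables, and compact() merges them. It assumes last-write-wins on timestamps and leaves out the commit log, tombstones and on-disk formats; ToyLsm and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of one row on one node (hypothetical, not Cassandra's code).
final class ToyLsm {
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Last-write-wins: keep the higher timestamp (ties go to the second argument).
    private static Cell newer(Cell a, Cell b) {
        return b.timestamp >= a.timestamp ? b : a;
    }

    private TreeMap<String, Cell> memtable = new TreeMap<>(); // sorted by column name
    private final List<NavigableMap<String, Cell>> sstables = new ArrayList<>();

    void write(String column, String value, long timestamp) {
        memtable.merge(column, new Cell(value, timestamp), ToyLsm::newer);
    }

    // Freeze the memtable into an immutable sorted "SSTable" and start a new one.
    void flush() {
        if (!memtable.isEmpty()) {
            sstables.add(Collections.unmodifiableNavigableMap(memtable));
            memtable = new TreeMap<>();
        }
    }

    // A column may be in the memtable and several SSTables; reconcile by timestamp.
    String read(String column) {
        Cell best = memtable.get(column);
        for (NavigableMap<String, Cell> sstable : sstables) {
            Cell c = sstable.get(column);
            if (c != null) {
                best = (best == null) ? c : newer(best, c);
            }
        }
        return best == null ? null : best.value;
    }

    // Merge every SSTable into one, keeping only the newest cell per column.
    void compact() {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (NavigableMap<String, Cell> sstable : sstables) {
            sstable.forEach((col, cell) -> merged.merge(col, cell, ToyLsm::newer));
        }
        sstables.clear();
        if (!merged.isEmpty()) {
            sstables.add(Collections.unmodifiableNavigableMap(merged));
        }
    }

    int sstableCount() { return sstables.size(); }
}
```

Reads get slower as a row spreads over more SSTables, which is why compaction (and, on the read path, minimising SSTables per read) matters.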

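  9. Cassandra Read Path • The data read command is sent to the closest replica, as determined by the snitch • Digest commands are sent to as many other replicas as the consistency level requires • Read repair chance (10%): on that fraction of reads, digests are requested from all replicas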

  10. Start & Interrogate C* • vagrant box add dse.box http://htraining.s3.amazonaws.com/dse.box • mkdir ~/vagrant • curl http://htraining.s3.amazonaws.com/vagrant-dse.tar.gz > ~/vagrant/dse.tar.gz • cd ~/vagrant && tar xzvf dse.tar.gz • cd dse && vagrant up • vagrant ssh node1 • nodetool ring

  11. Cassandra Read Path • Read mechanics • Find candidate SSTables using Bloom filters • Seek through the SSTables • Memory-mapped files • Check the memtable • Minimise the number of SSTables per read for best efficiency
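
A minimal Bloom filter sketch showing why the filters help: a "no" answer lets a read skip an SSTable without touching disk, while a "yes" may be a false positive. The hash functions (String.hashCode plus FNV-1a, combined by double hashing) are arbitrary choices for illustration, not what Cassandra uses; SimpleBloomFilter and sstablesToSeek are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

// Minimal Bloom filter: each key sets hashCount bits. mightContain() == false
// means "definitely not in this SSTable"; true means "maybe, go and look".
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Second hash: 32-bit FNV-1a over the key's UTF-8 bytes.
    private static int fnv1a(String key) {
        int h = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: position i = h1 + i * h2, reduced modulo the filter size.
    private int position(String key, int i) {
        return Math.floorMod(key.hashCode() + i * fnv1a(key), size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false; // one unset bit proves the key was never added
            }
        }
        return true;
    }

    // Read-path use: only SSTables whose filter says "maybe" need a disk seek.
    static int sstablesToSeek(List<SimpleBloomFilter> filters, String rowKey) {
        int count = 0;
        for (SimpleBloomFilter f : filters) {
            if (f.mightContain(rowKey)) {
                count++;
            }
        }
        return count;
    }
}
```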

  12. Deletion & Tombstones • Deleted data is marked as removed with a tombstone • This stops deleted data reappearing ("zombie" data) in a distributed system • Tombstones are garbage-collected after a configurable period (gc_grace_seconds, 10 days by default)
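
A sketch of why tombstones are needed, assuming last-write-wins reconciliation and, for simplicity, using the write timestamp (in seconds) for the grace-period check. A delete is just a newer cell with no value; if it were purged before every replica had seen it, a replica that missed the delete would reconcile back to the old live value, which is the zombie case. TombstoneDemo and its methods are hypothetical.

```java
// Sketch (hypothetical, not Cassandra's code): a delete is a newer cell with no value.
final class TombstoneDemo {
    static final class Cell {
        final String value;      // null marks a tombstone
        final long timestampSec; // write time, in seconds for simplicity
        Cell(String value, long timestampSec) { this.value = value; this.timestampSec = timestampSec; }
        boolean isTombstone() { return value == null; }
    }

    // Newest write wins, whether it is a value or a delete.
    static Cell reconcile(Cell a, Cell b) {
        return b.timestampSec >= a.timestampSec ? b : a;
    }

    // What a reader sees: a winning tombstone hides any older value.
    static String visible(Cell winner) {
        return winner.isTombstone() ? null : winner.value;
    }

    // Compaction may drop a tombstone only once it is older than the grace period
    // (gc_grace_seconds); until then it is kept so it can reach replicas that missed the delete.
    static boolean purgeable(Cell c, long nowSec, long gcGraceSeconds) {
        return c.isTombstone() && nowSec - c.timestampSec > gcGraceSeconds;
    }
}
```

This is also why repair should run on every node more often than the grace period: once a tombstone is gone, nothing can correct a replica that never saw the delete.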

  13. Brewer’s Theorem • A distributed data store can provide only two of these at a time: Consistency, Availability, Partition tolerance

  14. Brewer’s Theorem • CA: normal operation, no partition; consistency and availability are both provided

  15. Brewer’s Theorem • AP: a partition occurs; maintaining two mutable, disconnected copies of the state breaks consistency, but availability is preserved

  16. Brewer’s Theorem • CP: a partition occurs; to maintain consistency, one side must be taken offline, sacrificing availability

  17. Tuneable Consistency • Cassandra consistency level (CL) • Specify how many replicas must agree on each read/write • Choose consistency or availability: CL.LOCAL_QUORUM, CL.ONE • Eventual consistency brings both sides back into agreement once the partition heals
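
The arithmetic behind choosing a level, as a sketch: a quorum is a majority of the RF replicas, and a read is guaranteed to overlap the latest acknowledged write when the replicas read plus the replicas written exceed RF. LOCAL_QUORUM applies the same majority to the replicas in the local data centre only. ConsistencyMath is a hypothetical helper.

```java
// Consistency-level arithmetic (rf = replication factor). Hypothetical helper.
final class ConsistencyMath {
    // A quorum is a strict majority of the replicas.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // If the replica sets for reads (r) and writes (w) must overlap,
    // every read touches at least one replica holding the latest acknowledged write.
    static boolean readsSeeLatestWrite(int r, int w, int rf) {
        return r + w > rf;
    }

    // How many replicas may be down while an operation needing `required` acks still succeeds.
    static int toleratedFailures(int required, int rf) {
        return rf - required;
    }
}
```

With RF=3, QUORUM reads and writes (2 + 2 > 3) are consistent and survive one replica being down, while ONE for both (1 + 1 is not greater than 3) stays available with two replicas down but may return stale data.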

  18. Agenda – 100ft • Quick Introduction • Data Structures • Efficient Data Modeling • Data Modeling Examples

  19. Data Model • Keyspace • Analogous to a database/schema • Segregates applications • Replication is configured at this level

  20. Data Model • Column Family • Analogous to a table • Contains many rows • Caches are configurable at this level

  21. Data Model • Row • Each row has a partition key, which is hashed • Has many columns, up to 2 billion • Columns don’t have to be defined ahead of time • Rows in the same CF can have different columns • Rows are not sorted relative to each other, so model any ordering within a row
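
A conceptual sketch of this structure: a column family as an unordered map from row key to row, where each row keeps its columns sorted by name, so a column slice within a row is a cheap sorted-range lookup. The in-memory maps and the names ColumnFamilyModel, put and slice are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual model (hypothetical): rows are unordered (keyed by hash),
// columns inside a row are sorted by name, and rows need not share columns.
final class ColumnFamilyModel {
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Column slice: the columns of one row with names in [start, end), in name order.
    SortedMap<String, String> slice(String rowKey, String start, String end) {
        TreeMap<String, String> row = rows.getOrDefault(rowKey, new TreeMap<>());
        return row.subMap(start, end);
    }
}
```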

  22. Data Model • Columns • Sorted by name before being written to an SSTable • Name and value are typed • Values can be type-validated • Each column update is timestamped • Can have a TTL

  23. Data Model • Counter Columns • Distributed counters • Can produce false counts: an increment is not idempotent, so a retried write may be counted twice

  24. Data Model • Super Columns – Don’t Use • A blob of columns stored inside a single column • The whole blob must be read and written • Memory intensive • Conflicts are resolved for the whole blob (bad)

  25. Secondary Indices • You can define an index on a column • Cassandra will maintain an inverted index • Use sparingly • Low-cardinality columns only • It is often better to maintain your own view
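
A sketch of maintaining your own view instead of a secondary index, assuming a hypothetical users_by_email mapping (indexed value to user ids) updated alongside each user write. In Cassandra the row and the view would be separate writes, and removing the stale entry needs the old value, so the application has to tolerate an out-of-date entry if one of the writes fails. ManualIndex and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hand-maintained view (hypothetical): "users" rows plus a "users_by_email"
// mapping whose key is the indexed value and whose entries are user ids.
final class ManualIndex {
    private final Map<String, Map<String, String>> users = new HashMap<>();
    private final Map<String, Set<String>> usersByEmail = new HashMap<>();

    void saveUser(String userId, String email) {
        Map<String, String> row = users.computeIfAbsent(userId, k -> new HashMap<>());
        String oldEmail = row.put("email", email);
        // When the indexed value changes, remove the stale view entry.
        if (oldEmail != null && !oldEmail.equals(email)) {
            Set<String> ids = usersByEmail.get(oldEmail);
            if (ids != null) {
                ids.remove(userId);
                if (ids.isEmpty()) {
                    usersByEmail.remove(oldEmail);
                }
            }
        }
        usersByEmail.computeIfAbsent(email, k -> new TreeSet<>()).add(userId);
    }

    // One row lookup in the view instead of a scan or a secondary-index query.
    Set<String> findByEmail(String email) {
        return Collections.unmodifiableSet(usersByEmail.getOrDefault(email, Collections.emptySet()));
    }
}
```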

  26. Thrift vs CQL • Thrift: the original interface, hash-style syntax • CQL: SQL-like syntax, but highly limited • CQL is sent over Thrift, but there are plans for its own protocol

  27. Scaling Cassandra • Imagine RF=3, QUORUM, 6 nodes • Each query involves 2 replicas synchronously (a quorum of 3) • Each write still touches all 3 replicas, the third asynchronously • To scale writes, add more nodes • To scale reads, add more replicas

  28. Agenda – 100ft • Quick Introduction • Data Structures • Efficient Data Modeling • Data Modeling Examples

  29. Data Modelling - Concepts • Rows in the same CF will live on different nodes • High cost of multi-get • De-normalise your data into rows • Don’t put consistent load on a single row: it will heat up its replica nodes

  30. Data Modelling - Concepts • Writes to a single row are atomic & isolated • Columns are ordered • Column range slicing is efficient • Mutating data often needs compaction tuning

  31. Wide Rows • Efficient reads • Store data the way you want to fetch it • Fetching is most efficient over few rows • Store what you want to fetch in few rows

  32. Time Series • Use the timestamp as the column name, so columns are ordered by time • Range slicing is efficient • Limit row length by using a date as the partition key, e.g. 20121004
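
A sketch of day bucketing, assuming UTC days and zero-padded epoch milliseconds as a stand-in for a time-ordered column comparator (such as LongType or TimeUUIDType), so that string order matches time order. DailyBuckets, rowKeyFor and columnNameFor are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Day-bucketed time series (hypothetical helper): the row key is the UTC date,
// so a row never grows beyond one day's events; the column name is the event time.
final class DailyBuckets {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    static String rowKeyFor(Instant eventTime) {
        return DAY.format(eventTime);
    }

    // Zero-padded epoch millis: lexical order equals time order (for post-1970 times).
    static String columnNameFor(Instant eventTime) {
        return String.format("%013d", eventTime.toEpochMilli());
    }
}
```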

  33. Composite Columns • Composite column names, e.g. time1:log_class, time1:log_message, time2:log_class, time2:log_message
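
A sketch of how composite column names order, assuming a (time, field) pair compared component by component: all fields of one timestamp stay together, and timestamps stay in order. CompositeName and ORDER are hypothetical; Cassandra's CompositeType comparator applies the same component-wise idea to serialized bytes.

```java
import java.util.Comparator;

// Composite column name (hypothetical): compared by time first, then by field name.
final class CompositeName {
    final long time;
    final String field;

    CompositeName(long time, String field) {
        this.time = time;
        this.field = field;
    }

    static final Comparator<CompositeName> ORDER = (a, b) -> {
        int byTime = Long.compare(a.time, b.time);
        return byTime != 0 ? byTime : a.field.compareTo(b.field);
    };

    @Override
    public String toString() {
        return time + ":" + field;
    }
}
```

Sorting the names (2:log_message, 1:log_message, 2:log_class, 1:log_class) with ORDER yields 1:log_class, 1:log_message, 2:log_class, 2:log_message, matching the layout on the slide.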

  34. Time Series • Writing to a single row creates hotspots • Use round robin over rows, e.g. 20121004:1, 20121004:2, etc.
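
A sketch of round-robin row selection, assuming a fixed, hypothetical shard count per day and a per-client counter. Writes are spread across the day's rows; the cost moves to reads, which must fetch every shard row (a multi-get) and merge the results. ShardedTimeSeries and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Spreads one day's writes over `shards` rows ("20121004:1", "20121004:2", ...)
// so no single row, and hence no single replica set, takes all the load.
final class ShardedTimeSeries {
    private final int shards;
    private final AtomicLong counter = new AtomicLong(); // per-client round-robin state

    ShardedTimeSeries(int shards) {
        this.shards = shards;
    }

    // Round-robin choice of row for the next write (shards numbered from 1).
    String nextWriteKey(String day) {
        long n = counter.getAndIncrement();
        return day + ":" + (Math.floorMod(n, shards) + 1);
    }

    // Every row key a reader must fetch (multi-get) to see the whole day.
    List<String> readKeys(String day) {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i <= shards; i++) {
            keys.add(day + ":" + i);
        }
        return keys;
    }
}
```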

  35. Compound Keys • Compound key in CQL3 • The partition key is the row key • Compound key = partition key + composite key • e.g. partition key = 20121004, composite key = time1 • 20121004 => time1:name, time1:msg, time2:name, time2:msg

  36. Agenda – 100ft • Quick Introduction • Data Structures • Efficient Data Modeling • Data Modeling Examples

  37. Working with CQL • cqlsh -3 192.168.33.21 • CREATE KEYSPACE my_app_data WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 2; • DESCRIBE KEYSPACE my_app_data;

  38. Compound Keys
      USE my_app_data;
      CREATE COLUMNFAMILY logs (
        day text,           -- partition key
        log_id timeuuid,    -- clustering column
        log_class text,
        log_message text,
        primary key (day, log_id)
      );
      DESCRIBE columnfamilies;

  39. Compound Keys
      INSERT INTO logs (day, log_id, log_class, log_message)
        VALUES ('20130604', '2013-06-04 10:05:00', 'error', 'it broke')
        USING CONSISTENCY ONE;
      INSERT INTO logs (day, log_id, log_class, log_message)
        VALUES ('20130604', '2013-06-04 11:05:00', 'error', 'it broke again')
        USING CONSISTENCY QUORUM;

  40. Compound Keys
      SELECT * FROM logs USING CONSISTENCY ONE WHERE day = '20130604';
      SELECT * FROM logs USING CONSISTENCY QUORUM WHERE day = '20130604' AND log_id > '2013-06-04 11:00:00';
      Try again with CL.TWO after running: vagrant suspend node2
      Covers setting the CL, range querying columns, and losing consistency

  41. Compound Keys • cassandra-cli -h 192.168.33.21 • use my_app_data; • list logs; • See the raw Cassandra data

  42. Code Example - Clients • Hector • Solid Java client • In use in production • Round robin • Node discovery

  43. Code Example - Clients • Astyanax • Netflix open-source library • Simpler APIs

  44. Code Example • Example: Storing Payment Methods • https://github.com/neilbeveridge/example-compoundkeys

  45. Code Example • Requirements • Store 1-10 payment methods • Use a single row

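  46. Code Example • Non-CQL • Define a composite column class:
      // @Component is Astyanax's composite-component annotation; ordinal sets the sort order
      public static final class Composite {
          private @Component(ordinal = 0) String paymentUuid;
          private @Component(ordinal = 1) String field;
          // constructors and getters (e.g. getPaymentUuid()) omitted
      }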

  47. Code Example • Writing Data
      UUID paymentUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
      String sPaymentUUID = paymentUUID.toString();
      // one composite column per field; the third argument is the TTL (null = no expiry)
      batch.withRow(PAYMENTS_CF, userId)
          .putColumn(new Composite(sPaymentUUID, "pvtoken"), paymentInfo.pvToken, null)
          .putColumn(new Composite(sPaymentUUID, "name"), paymentInfo.name, null)
          .putColumn(new Composite(sPaymentUUID, "number"), paymentInfo.number, null);

  48. Code Example • Reading Data • Need some logic to handle record boundaries:
      // handle the payment info boundary
      if (lastSeen != null && !column.getName().getPaymentUuid().equals(lastSeen)) {
          payments.add(payment);
          payment = new PaymentInfo();
          payment.paymentUUID = UUID.fromString(column.getName().getPaymentUuid());
      }
      lastSeen = column.getName().getPaymentUuid();
      // after the loop, the last payment still has to be added to payments
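
The same boundary logic as a plain-Java sketch with no Astyanax types: columns arrive sorted by (paymentUuid, field), and a new record starts whenever the UUID changes. FlatColumn and Payment are hypothetical stand-ins for the Astyanax column and PaymentInfo classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Groups a wide row's flat composite columns back into payment records (hypothetical types).
final class PaymentGrouping {
    static final class FlatColumn {
        final String paymentUuid;
        final String field;
        final String value;
        FlatColumn(String paymentUuid, String field, String value) {
            this.paymentUuid = paymentUuid;
            this.field = field;
            this.value = value;
        }
    }

    static final class Payment {
        final String paymentUuid;
        final Map<String, String> fields = new TreeMap<>();
        Payment(String paymentUuid) { this.paymentUuid = paymentUuid; }
    }

    // Columns must already be sorted by (paymentUuid, field), as Cassandra returns them.
    static List<Payment> group(List<FlatColumn> columns) {
        List<Payment> payments = new ArrayList<>();
        Payment current = null;
        for (FlatColumn c : columns) {
            // Boundary: the first column, or a column belonging to a different payment.
            if (current == null || !current.paymentUuid.equals(c.paymentUuid)) {
                current = new Payment(c.paymentUuid);
                payments.add(current); // added on creation, so the last record is never lost
            }
            current.fields.put(c.field, c.value);
        }
        return payments;
    }
}
```

Adding each record to the list when it is created, rather than when the next one starts, avoids having to remember the final record after the loop.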

  49. Code Example • A Bit Messy

  50. Code Example • CQL3 • Need to define a schema • Cassandra needs it to split up the row for us
