
Cassandra DB


chuck

Presentation Transcript


  1. Cassandra DB Not Only SQL

  2. Table of Contents • Background and history • Applications • What is Cassandra? – Overview • Replication & Consistency • Writing, Reading, Querying and Sorting • APIs & Installation • World Database in Cassandra • Using Hector API • Administration tools

  3. Background • Influential Technologies: • Dynamo – Fully distributed design - infrastructure • BigTable – Sparse data model

  4. Other NoSQL databases (NoSQL and Big Data NoSQL) • MongoDB • Neo4J • HyperGra • Memcach • Tokyo Ca • Redis • CouchDB • Hypertable • Cassandra • Riak • Voldemort • HBase

  5. Bigtable / Dynamo • Bigtable: HBase, Hypertable • Dynamo: Riak, Voldemort • Cassandra: combination of both

  6. CAP Theorem • Consistency • Availability • Partition Tolerance

  7. Applications • Facebook • Google Code • Apache • Digg • Twitter • Rackspace • Others…

  8. What Is Cassandra? • O(1) node lookup • Key-value store • Column-based data store • Highly distributed – decentralized (no master/slave) • Elastic • Durable, fault-tolerant – replication • Sparse • NoSQL with per-key atomicity and tunable consistency (not full ACID)

  9. Overview – Data Model • Keyspace • Uppermost namespace • Typically one per application • Column • Basic unit of storage – Name, Value and timestamp • ColumnFamily • Associates records of a similar kind • Record-level Atomicity • Indexed • SuperColumn • Columns whose values are columns • Array of columns • SuperColumnFamily • ColumnFamily whose values are only SuperColumns
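
The data model on this slide can be pictured as nested sorted maps. The sketch below is illustrative only: it uses plain Java collections rather than the Cassandra or Hector API, and the names `DataModelSketch`, `Column`, `insert` and `insertSuper` are assumptions made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

class DataModelSketch {
    // A column is the basic unit of storage: name, value and timestamp.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // ColumnFamily: row key -> (column name -> column), columns kept sorted by name.
    final Map<String, TreeMap<String, Column>> columnFamily = new TreeMap<>();

    // SuperColumnFamily: row key -> (super column name -> (column name -> column)).
    final Map<String, TreeMap<String, TreeMap<String, Column>>> superColumnFamily = new TreeMap<>();

    void insert(String rowKey, Column column) {
        columnFamily.computeIfAbsent(rowKey, k -> new TreeMap<>())
                    .put(column.name, column);
    }

    void insertSuper(String rowKey, String superName, Column column) {
        superColumnFamily.computeIfAbsent(rowKey, k -> new TreeMap<>())
                         .computeIfAbsent(superName, k -> new TreeMap<>())
                         .put(column.name, column);
    }

    public static void main(String[] args) {
        DataModelSketch world = new DataModelSketch();
        world.insert("1", new Column("name", "ORANJESTAD", 1L));
        world.insertSuper("aa", "coordinates", new Column("latitude", "12.5", 1L));
        System.out.println(world.columnFamily.get("1").keySet());
        System.out.println(world.superColumnFamily.get("aa").keySet());
    }
}
```

A SuperColumn is simply one more level of nesting: its value is itself a sorted set of columns.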

  10. Examples • Column - City: ORANJESTAD {"id": 1, "name": "ORANJESTAD", "population": 33000, "capital": true} • SuperColumns – Country: Aruba {"id": "aa", "name": "Aruba", "fullName": "Aruba", "location": "Caribbean, island in the Caribbean Sea, north of Venezuela", "coordinates": { "latitudeType": "N", "latitude": 12.5, "longitudeType": "W", "longitude": 69.96667}, ….

  11. Replication & Consistency • Consistency Level is based on the Replication Factor (N), not the number of nodes in the system. • There are a few options to set how many replicas must respond to declare success • Query all replicas on every read • Every Column has a value and a timestamp – latest timestamp wins • Read repair – read one replica and check the checksum/timestamp of the others to verify • R (number of replicas to read from) + W (number of replicas to write to) > N (replication factor)
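
The R + W > N rule can be checked with simple arithmetic: if the read set and the write set together exceed the replication factor, they must share at least one replica, so a read sees the latest write. The sketch below is a minimal illustration; `QuorumCheck` and its method names are hypothetical and not part of any Cassandra API.

```java
class QuorumCheck {
    // A quorum is a majority of the N replicas: floor(N / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // R + W > N means every read set overlaps every write set in at least one replica.
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("quorum for N=3: " + q);
        System.out.println("quorum reads + quorum writes: " + readsSeeLatestWrite(q, q, n));
        System.out.println("R=1, W=1: " + readsSeeLatestWrite(1, 1, n));
    }
}
```

With N = 3, quorum reads and quorum writes (2 + 2 > 3) overlap, while R = 1 and W = 1 do not, which trades consistency for latency.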

  12. The Ring - Partitioning • Each NODE has a single, unique TOKEN • Each NODE claims the RANGE of tokens between its neighbor's TOKEN and its own in the ring • Partitioning – Map from Key Space to Token – Can be random or Order Preserving • Snitching – Map from Nodes to Physical Location
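
A token ring of this kind can be sketched with a sorted map from token to node. This is a simplified model under stated assumptions, not Cassandra's partitioner: the token space is shrunk to [0, 100), `tokenFor` is a hypothetical stand-in for a random partitioner, and the ring is assumed to contain at least one node.

```java
import java.util.Map;
import java.util.TreeMap;

class TokenRing {
    // token -> node name, kept sorted so the ring can be walked in token order
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String node) {
        ring.put(token, node);
    }

    // The owner is the first node whose token is >= the key's token;
    // past the highest token the lookup wraps around to the first node.
    String ownerOf(int keyToken) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // Hypothetical random-style partitioner: hash the key into [0, 100).
    static int tokenFor(String key) {
        return Math.floorMod(key.hashCode(), 100);
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode(25, "A");
        ring.addNode(50, "B");
        ring.addNode(75, "C");
        System.out.println("Aruba -> " + ring.ownerOf(tokenFor("Aruba")));
    }
}
```

An order-preserving partitioner would replace `tokenFor` with a mapping that keeps keys in their natural order, which allows key-range scans at the cost of uneven load.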

  13. Writing • No Locks • Append support without read ahead • Atomicity guarantee for a key (in a ColumnFamily) • Always Writable!!! • SSTables – Key/data – SSTable file for each column family • Fast

  14. Reading • Wait for R responses • Wait for N – R responses in the background and perform read repair • Read multiple SSTables • Slower than writes (but still fast)
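
The read path above relies on "latest timestamp wins" to reconcile replica responses. The sketch below shows only that reconciliation step, assuming each replica returns a value with its timestamp; `ReadRepairSketch`, `Versioned`, `resolve` and `staleReplicas` are hypothetical names, and the real read repair also involves digests and network writes that are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

class ReadRepairSketch {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Latest timestamp wins among the replica responses (list must be non-empty).
    static Versioned resolve(List<Versioned> responses) {
        Versioned newest = responses.get(0);
        for (Versioned v : responses) {
            if (v.timestamp > newest.timestamp) {
                newest = v;
            }
        }
        return newest;
    }

    // Indices of replicas that returned an older version; read repair
    // would send them the newest value in the background.
    static List<Integer> staleReplicas(List<Versioned> responses) {
        Versioned newest = resolve(responses);
        List<Integer> stale = new ArrayList<>();
        for (int i = 0; i < responses.size(); i++) {
            if (responses.get(i).timestamp < newest.timestamp) {
                stale.add(i);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Versioned> responses = new ArrayList<>();
        responses.add(new Versioned("33000", 10L));
        responses.add(new Versioned("34000", 20L));
        System.out.println(resolve(responses).value);
        System.out.println(staleReplicas(responses));
    }
}
```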

  15. Compare with MySQL (RDBMS) • Comparing a 50 GB database: • MySQL: ~300 ms write, ~350 ms read • Cassandra: ~0.12 ms write, ~15 ms read

  16. Queries • Single column • Slice • Set of names / range of names • Simple slice -> columns • Super slice -> supercolumns • Key range
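
The two slice forms listed here (a range of names, or an explicit set of names) can be illustrated on a single row held as a sorted map. This is a sketch with plain Java collections, not the Thrift or Hector query API; `SliceSketch` and its methods are hypothetical, and the range is treated as inclusive on both ends.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class SliceSketch {
    // Range slice: every column whose name lies in [start, end], in stored order.
    static SortedMap<String, String> sliceRange(TreeMap<String, String> row,
                                                String start, String end) {
        return row.subMap(start, true, end, true);
    }

    // Name slice: only the explicitly requested columns; missing names are skipped.
    static Map<String, String> sliceNames(TreeMap<String, String> row, List<String> names) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            if (row.containsKey(name)) {
                result.put(name, row.get(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<String, String> city = new TreeMap<>();
        city.put("id", "1");
        city.put("name", "ORANJESTAD");
        city.put("population", "33000");
        city.put("capital", "true");
        System.out.println(sliceRange(city, "id", "name"));
    }
}
```

Because columns are already stored in sorted order, a range slice is a contiguous scan rather than a filter over the whole row.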

  17. Sorting • Sorting is set on writing • Sorting is set by the type of the Column/Supercolumn keys • Sorting/keys Types • Bytes • UTF8 • Ascii • LexicalUUID • TimeUUID
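
"Sorting is set on writing" means the comparator is chosen once, when the column family is defined, and every read returns columns in that order. The sketch below mimics this with a `TreeMap` whose comparator is fixed at construction. `SortOrderSketch` is a hypothetical name, `UTF8_LIKE` is only a stand-in for a string comparator, and the numeric comparator is not one of the types on the slide; it is included purely for contrast.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

class SortOrderSketch {
    // Stand-in for a string (UTF8-style) comparator: plain lexical order.
    static final Comparator<String> UTF8_LIKE = Comparator.naturalOrder();

    // Contrast case: compare column names as numbers instead of text.
    static final Comparator<String> NUMERIC =
            Comparator.comparingLong((String s) -> Long.parseLong(s));

    // Writes column names into a row whose order was fixed up front,
    // then returns the names in the order a read would see them.
    static List<String> storedOrder(Comparator<String> comparator, String... names) {
        TreeMap<String, String> row = new TreeMap<>(comparator);
        for (String name : names) {
            row.put(name, "");
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        System.out.println(storedOrder(UTF8_LIKE, "2", "9", "10"));
        System.out.println(storedOrder(NUMERIC, "2", "9", "10"));
    }
}
```

The same three names come back in a different order depending on the comparator, and a query cannot ask for the other order afterwards.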

  18. Drawbacks • No joins (for speed) • Cannot sort at query time • No real SQL support (although some APIs support a very small subset of it)

  19. APIs • Many APIs for a large number of languages, including C++, Java, Python, PHP, Ruby, Erlang, Haskell, C#, JavaScript and more… • Thrift interface – driver-level interface – hard to use. • Hector – a Java Cassandra client – simple column-based client – does what Cassandra is intended to do. • Kundera – JPA-based Java client – tries to translate JPA classes and attributes to Cassandra – good on inserts, still hard and problematic with queries.

  20. Cassandra Installation • Install the prerequisite – basically the latest Java SE release • Extract the Cassandra zip files to your desired path • Run bin/cassandra.bat -f • The Cassandra node is up and running

  21. World database in Cassandra • World - Keyspace • Countries – SuperColumn Family • CountryDetails – SuperColumn • Border – SuperColumns • Coordinates – SuperColumn • GDP – SuperColumn • Language – SuperColumns • Cities – Column Family

  22. Using Hector API - definitions • Creating a Cassandra Cluster • Adding a keyspace • Adding a Column
      // Get a handle to the cluster (created on first use)
      Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");
      // Define a ColumnFamily and attach it to the keyspace by name
      BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
      columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
      columnFamilyDefinition.setName(CITY_CF); // ColumnFamily Name
      columnFamilyDefinition.addColumnDefinition(columnDefinition);

  23. Using Hector API - definitions • Adding a SuperColumn • Adding all definitions to the cluster
      // Define a SuperColumnFamily
      BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
      superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
      superCfDefinition.setName(COUNTRY_SUPER);
      superCfDefinition.setColumnType(ColumnType.SUPER);
      // Wrap both definitions and register the keyspace with the cluster
      ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
      ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
      KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(WORLD_KEYSPACE, "org.apache.cassandra.locator.SimpleStrategy", 1, Arrays.asList(cfDefStandard, cfDefSuper));
      cluster.addKeyspace(keyspaceDefinition);

  24. Using Hector API - inserting • Creating a Column Template • Adding a Row into a Column Family
      ColumnFamilyTemplate<String, String> template = new ThriftColumnFamilyTemplate<String, String>(keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer);
      ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
      updater.setString("key", "value");
      try {
          template.update(updater);
      } catch (HectorException e) {
          // do something ...
      }

  25. Using Hector API - inserting • Creating a Super Column Template • Adding a Row into a SuperColumnFamily
      SuperCfTemplate<String, String, String> template = new ThriftSuperCfTemplate<String, String, String>(keyspaceOperator, columnFamilyName, stringSerializer, stringSerializer, stringSerializer);
      SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
      HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
      superColumn.setString("column name", value);
      superColumn.update();
      try {
          template.update(updater);
      } catch (HectorException e) {
          // do something ...
      }

  26. Using Hector API - reading • Reading all Rows and their columns from a Column Family (using CQL) • Reading all columns from a Row in a SuperColumn Family
      // All rows of the City column family via CQL
      CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(factory.getKeyspaceOperator(), stringSerializer, stringSerializer, stringSerializer);
      cqlQuery.setQuery("select * from City");
      QueryResult<CqlRows<String, String, String>> result = cqlQuery.execute();
      // All super columns of one row in a SuperColumn Family
      SuperCfTemplate<String, String, String> superColumn = HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
      SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
      Collection<String> columnNames = superRes.getSuperColumns();

  27. Using Hector API - reading • Reading a SuperColumn from a Row in a SuperColumnFamily • Every query has options to fetch only part of the rows, by setting a start value and an end value (the rows are sorted on insert), and only part of the columns, by setting the column names explicitly
      SuperColumnQuery<String, String, String, String> query = HFactory.createSuperColumnQuery(keyspaceOperator, stringSerializer, stringSerializer, stringSerializer, stringSerializer);
      query.setColumnFamily("SuperColumnFamily");
      query.setKey("key");
      query.setSuperName("SuperColumnName");
      QueryResult<HSuperColumn<String, String, String>> result = query.execute();
      // Iterate over the columns inside the returned super column
      for (HColumn<String, String> col : result.get().getColumns()) {
          String name = col.getName();
          String value = col.getValue();
      }

  28. Administration tools • Cassandra – node activator • Nodetool – bootstrapping and monitoring • Cassandra-cli – Application Console • Sstable2json - Export • Json2sstable - Import
