Counters for real-time statistics Aug 2011

Presentation Transcript


  1. Counters for real-time statistics Aug 2011

  2. Quick Cassandra storage primer

  3. Standard columns
     • Idempotent writes: last client timestamp wins
     • Store byte[] (can have validators)
     • No internal locking
     • Not read before write
     • Example: set Users['ecapriolo']['fname']='ed';
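
The "last client timestamp wins" rule can be pictured with a toy in-memory store. This is only a sketch of the semantics, not Cassandra's implementation: the class `LwwColumnStore`, its methods, and the tie-breaking choice (an equal timestamp keeps the existing value) are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of standard columns: a byte[] value plus a client timestamp. */
class LwwColumnStore {
    private static final class Cell {
        final byte[] value;
        final long timestamp;
        Cell(byte[] value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    /** Blind write: the caller never reads first; the newer timestamp wins. */
    void set(String rowKey, String column, byte[] value, long clientTimestamp) {
        String key = rowKey + '\u0000' + column;
        // Assumption for this sketch: on equal timestamps the existing cell is kept.
        cells.merge(key, new Cell(value, clientTimestamp),
                (old, incoming) -> incoming.timestamp > old.timestamp ? incoming : old);
    }

    /** Returns the column value as a UTF-8 string, or null if the column is absent. */
    String get(String rowKey, String column) {
        Cell cell = cells.get(rowKey + '\u0000' + column);
        return cell == null ? null : new String(cell.value, StandardCharsets.UTF_8);
    }
}
```

Replaying the same `set` with the same timestamp leaves the store unchanged, which is what makes the write idempotent; a write carrying an older timestamp is simply ignored.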

  4. Counter columns
     • Store integral values only
     • Can be incremented or decremented with a single RPC
     • Local read before write
     • Merged on read
     • Example: incr followers['ecapriolo']['x'] by 30
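
"Local read before write" and "merged on read" can be illustrated with a simplified model in which each replica keeps its own partial total and a read sums the partials. This is a loose sketch under that assumption, not Cassandra's actual storage format; `ShardedCounter` and the replica ids are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy counter: one partial total per replica, summed on read. */
class ShardedCounter {
    private final Map<String, Long> partials = new HashMap<>();

    /** Increment on one replica; pass a negative delta to decrement. */
    void add(String replicaId, long delta) {
        // Local read before write: the replica reads only its own partial total.
        long current = partials.getOrDefault(replicaId, 0L);
        partials.put(replicaId, current + delta);
    }

    /** Merged on read: the visible value is the sum of every replica's partial. */
    long read() {
        long total = 0;
        for (long partial : partials.values()) {
            total += partial;
        }
        return total;
    }
}
```

Unlike the standard-column write, an increment is not idempotent: replaying `add("node-a", 30)` after a timeout would count the 30 twice.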

  5. Counters combine powers with:
     • composite keys: incr stats['user/date']['page'] by 1;
     • scale to distribute writes
     And you get:
     • A distributed system to record events
     • Pre-calculated real-time stats

  6. Other ways to collect and report
     • Store in files, process into reports
       Example: data -> hdfs -> hive queries -> reports
       Light work on front end
       Heavy on back end
     • Store into relational database
       Example: data -> rdbms (ind) -> rt queries & reports -> reports
       Divides work between front end and back end
       Indexes can become choke points

  7. Example data set
     url        | username | event_time           | time_to_serve_millis
     /page1.htm | edward   | 2011-01-02 :04:01:04 | 45
     /page1.htm | stacey   | 2011-01-02 :04:01:05 | 46
     /page1.htm | stacey   | 2011-01-02 :04:02:07 | 40
     /page2.htm | edward   | 2011-01-02 :04:02:45 | 22

  8. “Query” one: hit count bucket by minute
     page       | time              | count
     /page1.htm | 2011-01-02 :04:01 | 2
     /page1.htm | 2011-01-02 :04:02 | 1
     /page2.htm | 2011-01-02 :04:02 | 1

  9. “Query” two: resources consumed by user per hour
     user   | time           | total_time_to_serve
     edward | 2011-01-02 :04 | 67
     stacey | 2011-01-02 :04 | 86
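
To make the two "queries" concrete, here is a small batch computation over the four example rows using plain Java collections (no Cassandra involved). The class names `Hit` and `BatchQueries` are hypothetical, and the sketch assumes the events occurred during hour 04 on 2011-01-02.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical holder for one row of the example data set. */
class Hit {
    final String url;
    final String username;
    final LocalDateTime eventTime;
    final int timeToServeMillis;

    Hit(String url, String username, LocalDateTime eventTime, int timeToServeMillis) {
        this.url = url;
        this.username = username;
        this.eventTime = eventTime;
        this.timeToServeMillis = timeToServeMillis;
    }
}

class BatchQueries {
    static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");
    static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyy-MM-dd HH");

    /** The four rows from the example data set. */
    static List<Hit> sample() {
        return Arrays.asList(
                new Hit("/page1.htm", "edward", LocalDateTime.of(2011, 1, 2, 4, 1, 4), 45),
                new Hit("/page1.htm", "stacey", LocalDateTime.of(2011, 1, 2, 4, 1, 5), 46),
                new Hit("/page1.htm", "stacey", LocalDateTime.of(2011, 1, 2, 4, 2, 7), 40),
                new Hit("/page2.htm", "edward", LocalDateTime.of(2011, 1, 2, 4, 2, 45), 22));
    }

    /** "Query" one: number of hits per page per minute. */
    static Map<String, Long> hitsByPageAndMinute(List<Hit> hits) {
        return hits.stream().collect(Collectors.groupingBy(
                h -> h.url + " | " + h.eventTime.format(MINUTE),
                TreeMap::new,
                Collectors.counting()));
    }

    /** "Query" two: total serve time per user per hour. */
    static Map<String, Long> millisByUserAndHour(List<Hit> hits) {
        return hits.stream().collect(Collectors.groupingBy(
                h -> h.username + " | " + h.eventTime.format(HOUR),
                TreeMap::new,
                Collectors.summingLong(h -> h.timeToServeMillis)));
    }
}
```

This is the "heavy on back end" style of answer: every row is scanned at query time. The counter approach in the rest of the deck produces the same totals incrementally, at write time.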

  10. Turn a record line into a POJO
      class Record {
          String url, username;
          Date date;
          int timeToServe;
      }
      Use your imagination here:
      public static List<Record> readRecords(String file) throws Exception {
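
The slide leaves the body of `readRecords` to the reader. One possible line parser is sketched below; it is not the author's code. It assumes pipe-delimited lines and a timestamp written exactly as the transcript renders it (`2011-01-02 :04:01:04`). The POJO is named `HitRecord` here, a hypothetical name chosen to avoid clashing with `java.lang.Record` on newer JDKs, and `RecordParser` is likewise hypothetical.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Same fields as the slide's Record class, under a hypothetical name. */
class HitRecord {
    String url, username;
    Date date;
    int timeToServe;
}

class RecordParser {
    /** Parses one line of the form: url | username | event_time | time_to_serve_millis */
    static HitRecord parseLine(String line) {
        String[] fields = line.split("\\|");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        HitRecord r = new HitRecord();
        r.url = fields[0].trim();
        r.username = fields[1].trim();
        try {
            // Assumed layout, copied from how the transcript prints event_time.
            r.date = new SimpleDateFormat("yyyy-MM-dd :HH:mm:ss").parse(fields[2].trim());
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad event_time in: " + line, e);
        }
        r.timeToServe = Integer.parseInt(fields[3].trim());
        return r;
    }

    /** Parses many lines, skipping blank lines and the header row. */
    static List<HitRecord> parseLines(List<String> lines) {
        List<HitRecord> records = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("url")) {
                continue;
            }
            records.add(parseLine(line));
        }
        return records;
    }
}
```

A file-based `readRecords(String file)` could read the file into a list of lines and delegate to `parseLines`.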

  11. writeRecord() Method
      public static void writeRecord(Cassandra.Client c, Record r) throws Exception {
          DateFormat bucketByMinute = new SimpleDateFormat("yyyy-MM-dd HH:mm");
          DateFormat bucketByDay = new SimpleDateFormat("yyyy-MM-dd");
          DateFormat bucketByHour = new SimpleDateFormat("yyyy-MM-dd HH");

  12. “Query” 1 page counts by minute
          CounterColumn counter = new CounterColumn();
          ColumnParent cp = new ColumnParent("page_counts_by_minute");
          // Column name is the minute bucket; each hit adds 1.
          counter.setName(ByteBufferUtil.bytes(bucketByMinute.format(r.date)));
          counter.setValue(1);
          // Row key is "<day>-<url>".
          c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.url),
                cp, counter, ConsistencyLevel.ONE);

  13. “Query” 2 usage by users per hour
          CounterColumn counter2 = new CounterColumn();
          ColumnParent cp2 = new ColumnParent("user_usage_by_minute");
          // Column name is the hour bucket (despite the column family's name);
          // each hit adds its serve time.
          counter2.setName(ByteBufferUtil.bytes(bucketByHour.format(r.date)));
          counter2.setValue(r.timeToServe);
          // Row key is "<day>-<username>".
          c.add(ByteBufferUtil.bytes(bucketByDay.format(r.date) + "-" + r.username),
                cp2, counter2, ConsistencyLevel.ONE);

  14. How this works
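
One way to picture what the two `add()` calls do is an in-memory stand-in for the two counter column families: a map from row key to a sorted map of column name to running total. This is a sketch only. `CounterFamilySketch` and `WriteRecordSketch` are hypothetical names, nothing here talks to Cassandra, and only the key scheme (day plus "-" plus url or username as row key, time bucket as column name) is taken from the slides.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a counter column family: row key -> column name -> total. */
class CounterFamilySketch {
    // TreeMap keeps rows and columns sorted by name.
    final Map<String, TreeMap<String, Long>> rows = new TreeMap<>();

    /** Adds delta to the named counter, creating the row and column on first use. */
    void add(String rowKey, String columnName, long delta) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .merge(columnName, delta, Long::sum);
    }
}

class WriteRecordSketch {
    final CounterFamilySketch pageCountsByMinute = new CounterFamilySketch();
    final CounterFamilySketch userUsageByMinute = new CounterFamilySketch();

    /** Mirrors the bucketing of the slides' writeRecord(), against in-memory maps. */
    void writeRecord(String url, String username, Date date, int timeToServe) {
        DateFormat bucketByMinute = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        DateFormat bucketByDay = new SimpleDateFormat("yyyy-MM-dd");
        DateFormat bucketByHour = new SimpleDateFormat("yyyy-MM-dd HH");

        // "Query" 1: one hit in this page's minute bucket.
        pageCountsByMinute.add(bucketByDay.format(date) + "-" + url,
                bucketByMinute.format(date), 1);
        // "Query" 2: serve time added to this user's hour bucket.
        userUsageByMinute.add(bucketByDay.format(date) + "-" + username,
                bucketByHour.format(date), timeToServe);
    }
}
```

Each incoming record touches exactly two counters, so the totals are already in place when someone reads them.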

  15. Results
      [default@counttest] list user_usage_by_minute;
      -------------------
      RowKey: 2011-01-02- stacey
      => (counter=2011-01-02 04, value=86)
      -------------------
      RowKey: 2011-01-02- edward
      => (counter=2011-01-02 04, value=67)

  16. More Results
      [default@counttest] list page_counts_by_minute;
      -------------------
      RowKey: 2011-01-02-/page1.htm
      => (counter=2011-01-02 04:01, value=2)
      => (counter=2011-01-02 04:02, value=1)
      -------------------
      RowKey: 2011-01-02-/page2.htm
      => (counter=2011-01-02 04:02, value=1)

  17. Recap
      • Counters pushed work to the “front end”
      • Data is bucketed, sorted, and indexed on insert
      • Data is already “ready” on read
      • Designed around how you want to read data
      • Distributed writes across the cluster
      • Bucketed data by time, user, page, etc.
      • Different from a table/index contention point

  18. Questions? Full code at: http://www.jointhegrid.com/highperfcassandra/?cat=7
