
Apache Druid

This presentation gives an overview of the Apache Druid project. It covers areas like use cases, features, architecture and users.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

Music: "Little Planet", composed and performed by Bensound, from http://www.bensound.com/


Presentation Transcript


  1. What Is Druid?
     ● Real-Time Analytics Database
     ● Distributed Architecture
     ● Open Source
     ● Highly Performant
     ● Time Series Database
     ● Apache 2 License
     ● Written in Java

  2. Druid Use Cases
     ● User activity and behaviour
     ● Network flows
     ● Digital marketing
     ● Application performance management
     ● IoT and device metrics
     ● OLAP and business intelligence
     For real-time data ingestion, fast queries and high uptime.

  3. Druid Features
     ● Column-oriented storage
     ● Native search indexes
     ● Streaming and batch ingest
     ● Flexible schemas
     ● Time-optimized partitioning
     ● SQL support
     ● Horizontal scalability
     ● Easy operation

  4. Druid Users
     ● Airbnb
     ● Alibaba
     ● Booking.com
     ● Cisco
     ● eBay
     ● Hulu
     ● Lyft
     ● Outbrain
     ● PayPal
     ● Pinterest
     ● Slack
     ● Twitter
     ● Walmart
     ● Yahoo
     Some of the more famous users, among many others.

  5. Druid MetaStore
     ● Stores metadata about the system and the data stored
     ● Can use the following databases
       – Derby, MySQL, PostgreSQL
     ● Stores metadata such as
       – Segments, Rules, Config
       – Tasks, Audit

  6. Druid Deep Storage
     ● Deep storage persists Druid segment data
     ● Uses storage like
       – Local Mounts, AWS S3, HDFS
     ● Core extensions available from Druid committers
     ● Extension examples include
       – Azure, Cassandra, Cloudfiles

  7. Druid Architecture

  8. Druid Architecture 2

  9. Druid Processes
     ● Historical – stores and queries historical data
     ● MiddleManager – ingests new data
     ● Broker – processes client queries
     ● Coordinator – watches over Historical processes
     ● Overlord – watches over MiddleManager processes
     ● Router – optional – provides a unified API gateway
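
As a rough illustration of how a client talks to these processes, the sketch below builds (but does not send) an HTTP request for Druid's SQL endpoint, `/druid/v2/sql`. The host names and ports are assumptions based on a local single-machine setup, and `events` is a hypothetical datasource name; none of them come from the slides.

```python
import json
from urllib.request import Request

# Assumed local addresses; a real cluster will use its own hosts and ports.
ROUTER_URL = "http://localhost:8888"  # optional unified API gateway
BROKER_URL = "http://localhost:8082"  # processes client queries directly


def build_sql_request(base_url: str, sql: str) -> Request:
    """Build, without sending, a POST request for Druid's SQL HTTP API."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return Request(
        url=f"{base_url}/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "events" is a hypothetical datasource name used only for illustration.
request = build_sql_request(ROUTER_URL, "SELECT COUNT(*) AS row_count FROM events")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

The same request could be pointed at `BROKER_URL` instead; passing it to `urllib.request.urlopen` would actually send it to a running cluster.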

  10. Druid Query
      ● Druid supports JSON and SQL based queries
      ● The SQL GROUP BY clause supports GROUPING SETS, ROLLUP and CUBE
      ● GROUPING SETS computes several groupings in a single query, which avoids scanning the data once per grouping
      ● ROLLUP provides grouped data for each level of the grouping columns
      ● CUBE provides grouped data for each combination of the grouping columns
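
The difference between ROLLUP and CUBE is easiest to see by expanding each into the equivalent GROUPING SETS clause. The following is a minimal sketch of that expansion using standard SQL semantics; the helper functions and the column names `country` and `city` are hypothetical and are not part of Druid.

```python
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP: one grouping set per prefix of the column list, down to the grand total."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE: one grouping set per subset of the column list."""
    return [
        combo
        for size in range(len(columns), -1, -1)
        for combo in combinations(columns, size)
    ]


def to_grouping_sets_sql(sets):
    """Render a list of grouping sets as a GROUP BY GROUPING SETS clause."""
    rendered = ", ".join("(" + ", ".join(s) + ")" for s in sets)
    return f"GROUP BY GROUPING SETS ({rendered})"


cols = ["country", "city"]  # hypothetical column names
print(to_grouping_sets_sql(rollup_sets(cols)))
print(to_grouping_sets_sql(cube_sets(cols)))
```

With two columns, ROLLUP yields three grouping sets (both columns, the first column only, and the grand total), while CUBE adds the remaining single-column grouping for four in total.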

  11. Druid High Availability (HA)
      ● Use 3 or 5 ZooKeeper nodes on their own hardware
      ● For the MetaStore, use MySQL or PostgreSQL
        – With replication and failover
      ● Use multiple Coordinators and Overlords
        – Using the same MetaStore and ZooKeeper
      ● Scale Brokers horizontally
      ● Use a load balancer
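
The recommendation of 3 or 5 ZooKeeper nodes follows from ZooKeeper's majority quorum: the ensemble stays available only while more than half of its nodes are up. The small sketch below works through that arithmetic; the function name is hypothetical.

```python
def zookeeper_fault_tolerance(ensemble_size: int) -> int:
    """Number of node failures a majority-quorum ensemble can survive."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # strict majority of the ensemble
    return ensemble_size - quorum


for size in (3, 4, 5):
    print(size, "nodes tolerate", zookeeper_fault_tolerance(size), "failure(s)")
```

A 4-node ensemble tolerates no more failures than a 3-node one, which is why odd sizes are the usual choice.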

  12. Available Books
      ● See “Big Data Made Easy”, Apress, Jan 2015
      ● See “Mastering Apache Spark”, Packt, Oct 2015
      ● See “Complete Guide to Open Source Big Data Stack”, Apress, Jan 2018
      ● Find the author on Amazon
        – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
      ● Connect on LinkedIn
        – www.linkedin.com/in/mike-frampton-38563020

  13. Connect
      ● Feel free to connect on LinkedIn
        – www.linkedin.com/in/mike-frampton-38563020
      ● See my open source blog at
        – open-source-systems.blogspot.com/
      ● I am always interested in
        – New technology
        – Opportunities
        – Technology based issues
        – Big data integration
