
Apache Fluo

This presentation gives an overview of the Apache Fluo project. It explains Apache Fluo in terms of its architecture, functionality and transactions.

Links for further information and connecting:

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/

semtechs


Presentation Transcript


  1. What Is Apache Fluo?
     ● For incremental updates to large-scale data sets
     ● Open source, Apache 2.0 license
     ● Based upon Apache Accumulo
       – Uses Hadoop HDFS to store data
       – Uses ZooKeeper for configuration
       – Partitions tables into tablets
     ● It is a distributed system
     ● Supports cross-node transactions
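Accumulo keeps each table sorted by row and splits it into tablets, each holding a contiguous row range. The plain Java sketch below shows how a row can be mapped to the tablet that covers it. The split points and the class name are hypothetical, and this is not Accumulo's client API; it only follows Accumulo's convention that a tablet's end row is inclusive and the last tablet has no end row.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of range partitioning (not Accumulo's API):
// a sorted table is split at split points, and tablet i covers
// (previous split, split i]; the last tablet has no end row.
class TabletLookup {
    // Example split points, made up for this sketch.
    static final TreeSet<String> SPLITS = new TreeSet<>(List.of("g", "n", "t"));

    // Returns the index of the tablet that holds the given row.
    static int tabletFor(String row) {
        String end = SPLITS.ceiling(row);      // first split point >= row
        if (end == null) {
            return SPLITS.size();              // past every split: last tablet
        }
        return SPLITS.headSet(end).size();     // number of splits before it
    }

    public static void main(String[] args) {
        for (String row : List.of("apple", "g", "h", "zebra")) {
            System.out.println(row + " -> tablet " + tabletFor(row));
        }
    }
}
```

Because rows stay sorted, a scan over a contiguous row range only has to visit the tablets whose ranges overlap it.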

  2. What Is Apache Fluo?
     ● Allows monitoring of large data sets to
       – Identify small changes
       – Join those changes into the larger data set
       – Without reprocessing all of the data
     ● Transactions allow many concurrent changes
       – Without data corruption
     ● Fluo uses code-based observers, which
       – Act on table column changes
     ● Offers a Java-based Fluo API
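To make "join changes into the larger data set without reprocessing all of the data" concrete, here is a plain Java sketch (no Fluo API; class and method names are hypothetical) that keeps word counts over a set of documents up to date. When one document changes, only the difference between its old and new words is applied to the totals instead of recounting every document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of incremental processing: global word counts are
// maintained by applying per-document deltas, never by a full recount.
class IncrementalWordCount {
    private final Map<String, String> documents = new HashMap<>(); // docId -> text
    private final Map<String, Long> counts = new HashMap<>();      // word -> count

    // Adds or replaces a document; newText == null removes it.
    void update(String docId, String newText) {
        String oldText = newText == null ? documents.remove(docId)
                                         : documents.put(docId, newText);
        apply(oldText, -1);  // retract the old version's words
        apply(newText, +1);  // add the new version's words
    }

    private void apply(String text, long sign) {
        if (text == null || text.isBlank()) {
            return;
        }
        for (String word : text.trim().toLowerCase().split("\\s+")) {
            long updated = counts.getOrDefault(word, 0L) + sign;
            if (updated == 0) {
                counts.remove(word);
            } else {
                counts.put(word, updated);
            }
        }
    }

    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        IncrementalWordCount wc = new IncrementalWordCount();
        wc.update("d1", "fluo runs on accumulo");
        wc.update("d2", "accumulo stores data in hdfs");
        wc.update("d1", "fluo observers");   // only d1's delta is processed
        System.out.println(wc.count("accumulo") + " " + wc.count("fluo"));
    }
}
```

This sketch is single-threaded and has no transactions; in Fluo, deltas like these would be applied inside transactions so that many concurrent changes do not corrupt the totals.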

  3. What Is Apache Fluo?
     ● Use of Fluo is code based and low level
     ● Fluo uses Hadoop YARN to run its processes
     ● Fluo uses ZooKeeper to
       – Store its metadata
       – Store its state information
     ● Fluo data is stored in Fluo tables on Accumulo (HDFS)
       – Same structure as Accumulo, except
       – Keys have no user-visible timestamp
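The data model above can be pictured as a sorted map whose key is a row plus a column (family and qualifier), with no timestamp in the key. The plain Java sketch below models that with a `TreeMap` and shows a scan over one row. It is an illustration under those assumptions, not Fluo's storage code; the class name is hypothetical, and Fluo columns also carry a visibility, which the sketch leaves out.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a Fluo-style table: key = row + column, value = data.
// Unlike a raw Accumulo key there is no timestamp component, so a write
// simply replaces the previous value.
class RowColumnTable {
    // Composite key "row\0family\0qualifier": sorting keeps all columns of a
    // row together and rows in order (assumes names contain no '\0').
    private final TreeMap<String, String> cells = new TreeMap<>();

    private static String key(String row, String family, String qualifier) {
        return row + '\0' + family + '\0' + qualifier;
    }

    void set(String row, String family, String qualifier, String value) {
        cells.put(key(row, family, qualifier), value);
    }

    String get(String row, String family, String qualifier) {
        return cells.get(key(row, family, qualifier));
    }

    // Scans every cell of one row, in sorted column order.
    NavigableMap<String, String> scanRow(String row) {
        return cells.subMap(row + '\0', true, row + '\u0001', false);
    }

    public static void main(String[] args) {
        RowColumnTable t = new RowColumnTable();
        t.set("doc:1", "content", "text", "hello fluo");
        t.set("doc:1", "stat", "words", "2");
        t.set("doc:10", "content", "text", "another row");
        for (Map.Entry<String, String> e : t.scanRow("doc:1").entrySet()) {
            System.out.println(e.getKey().replace('\0', '|') + " = " + e.getValue());
        }
    }
}
```

The row scan excludes `doc:10` even though it shares the `doc:1` prefix, because the separator character sorts below every printable character.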

  4. Fluo Architecture

  5. Fluo Architecture
     ● Large-scale computation through small-scale transactions
     ● Clients access Fluo through its Java API
     ● Clients ingest data through the API
     ● Oracle processes hand out transaction timestamps
     ● Worker processes run user code
     ● User code (observers) monitors column changes
     ● Multiple workers can run the same observers
     ● Transactions change data; snapshots read data
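The oracle's role can be sketched as a single source of increasing timestamps: a transaction takes a start timestamp when it begins and a commit timestamp when it commits, and a snapshot read returns the newest value committed before the reader's start timestamp. The plain Java code below illustrates that idea under those assumptions. The class name is hypothetical, it is not Fluo's oracle, and it leaves out the locking and conflict checks a real commit needs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of timestamp-based snapshot reads (single-threaded demo).
class OracleSnapshotDemo {
    // Stand-in for the oracle: hands out unique, increasing timestamps.
    static final AtomicLong ORACLE = new AtomicLong();

    // Multi-versioned store: key -> (commit timestamp -> value).
    static final Map<String, TreeMap<Long, String>> VERSIONS = new HashMap<>();

    static long nextTimestamp() {
        return ORACLE.incrementAndGet();
    }

    // Commits one value at a fresh commit timestamp and returns that timestamp.
    static long commit(String key, String value) {
        long commitTs = nextTimestamp();
        VERSIONS.computeIfAbsent(key, k -> new TreeMap<>()).put(commitTs, value);
        return commitTs;
    }

    // Snapshot read: newest version committed before startTs, or null.
    static String read(String key, long startTs) {
        TreeMap<Long, String> versions = VERSIONS.get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> e = versions.lowerEntry(startTs);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        commit("row1:count", "1");
        long readerStart = nextTimestamp();   // the reader's snapshot point
        commit("row1:count", "2");            // committed after that snapshot
        System.out.println(read("row1:count", readerStart));     // still sees "1"
        System.out.println(read("row1:count", nextTimestamp())); // a newer snapshot sees "2"
    }
}
```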

  6. Fluo Architecture
     ● Fluo provides snapshot isolation
     ● A snapshot only sees transactions committed before it started
     ● Concurrent transactions can overlap and collide
     ● Write skew is possible when
       – Overlapping transactions concurrently update different keys
     ● Fluo supports scanners to read data ranges or spans
     ● Fluo has a transaction-based LoaderExecutor
       – To aid the loading of data
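The difference between a collision and write skew can be shown with a toy snapshot-isolation store. In the plain Java sketch below (hypothetical names, single-threaded, not Fluo's implementation), each transaction reads from its start snapshot and buffers its writes; at commit it fails if another transaction committed any key it wrote after it started (first committer wins). Two transactions writing the same key collide, while two that each read both keys but write different keys both commit: that is write skew.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy store with snapshot isolation and first-committer-wins.
class SnapshotIsolationDemo {
    private long clock = 0;                                          // shared timestamp source
    private final Map<String, TreeMap<Long, Integer>> store = new HashMap<>(); // key -> (commitTs -> value)

    class Tx {
        final long startTs = ++clock;
        final Map<String, Integer> writes = new HashMap<>();

        // Reads this transaction's own writes first, then its snapshot (missing = 0).
        int get(String key) {
            if (writes.containsKey(key)) return writes.get(key);
            TreeMap<Long, Integer> versions = store.get(key);
            if (versions == null) return 0;
            Map.Entry<Long, Integer> e = versions.lowerEntry(startTs);
            return e == null ? 0 : e.getValue();
        }

        void set(String key, int value) { writes.put(key, value); }

        // Aborts if any written key was committed by someone else after startTs.
        boolean commit() {
            for (String key : writes.keySet()) {
                TreeMap<Long, Integer> versions = store.get(key);
                if (versions != null && versions.lastKey() > startTs) return false;
            }
            long commitTs = ++clock;
            writes.forEach((k, v) -> store.computeIfAbsent(k, x -> new TreeMap<>()).put(commitTs, v));
            return true;
        }
    }

    // Two concurrent transactions increment the same key: the second one collides.
    static boolean[] collision() {
        SnapshotIsolationDemo db = new SnapshotIsolationDemo();
        Tx a = db.new Tx();
        Tx b = db.new Tx();
        a.set("x", a.get("x") + 1);
        b.set("x", b.get("x") + 1);
        return new boolean[] {a.commit(), b.commit()};
    }

    // Each transaction checks x + y >= 100, then takes 100 from a different key.
    // Both commit, so the invariant each one checked is broken: write skew.
    static int writeSkew() {
        SnapshotIsolationDemo db = new SnapshotIsolationDemo();
        Tx setup = db.new Tx();
        setup.set("x", 50);
        setup.set("y", 50);
        setup.commit();
        Tx a = db.new Tx();
        Tx b = db.new Tx();
        if (a.get("x") + a.get("y") >= 100) a.set("x", a.get("x") - 100);
        if (b.get("x") + b.get("y") >= 100) b.set("y", b.get("y") - 100);
        a.commit();
        b.commit();
        Tx check = db.new Tx();
        return check.get("x") + check.get("y");
    }

    public static void main(String[] args) {
        boolean[] r = collision();
        System.out.println("collision: first=" + r[0] + ", second=" + r[1]);
        System.out.println("write skew: x + y = " + writeSkew());
    }
}
```

One common way to rule out a specific write skew under snapshot isolation is to make both transactions also write a shared key, which turns the skew into an ordinary collision.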

  7. Fluo Architecture
     ● Fluo supports incremental processing via
     ● Notifications
       – Persistent markers set by a transaction that indicate
       – An observer should run later for a certain row+column
     ● Observers
       – User-provided code that is registered to
       – Process notifications for a certain column
     ● An observer receives the row/column that triggered it, plus a transaction
     ● Fluo worker processes running across a cluster
       – Execute observers
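The notification and observer loop described above can be modelled in a few lines of plain Java. In this hypothetical sketch (not the Fluo API, and without transactions), writing a column that has a registered observer records a notification for that row+column; a worker pass later removes each pending notification and calls the observer with the row and column that triggered it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Hypothetical model of notifications and observers; not Fluo's API.
class NotificationDemo {
    // Cell values keyed by (row, column).
    final Map<Map.Entry<String, String>, String> table = new HashMap<>();
    // User code registered per column; it receives the triggering row and column.
    final Map<String, BiConsumer<String, String>> observers = new HashMap<>();
    // Pending notifications: (row, column) pairs whose observer should run later.
    final Set<Map.Entry<String, String>> notifications = new LinkedHashSet<>();

    void registerObserver(String column, BiConsumer<String, String> observer) {
        observers.put(column, observer);
    }

    String get(String row, String column) {
        return table.get(Map.entry(row, column));
    }

    // Writing an observed column also records a notification for that row+column.
    void set(String row, String column, String value) {
        table.put(Map.entry(row, column), value);
        if (observers.containsKey(column)) {
            notifications.add(Map.entry(row, column));
        }
    }

    // One worker pass: drain the current notifications and run their observers.
    // Notifications set by observers during the pass wait for the next pass.
    int runWorkerOnce() {
        List<Map.Entry<String, String>> batch = new ArrayList<>(notifications);
        notifications.clear();
        for (Map.Entry<String, String> n : batch) {
            observers.get(n.getValue()).accept(n.getKey(), n.getValue());
        }
        return batch.size();
    }

    public static void main(String[] args) {
        NotificationDemo store = new NotificationDemo();
        // Observer: when a document's content changes, store its word count.
        store.registerObserver("content", (row, col) ->
            store.set(row, "wordCount",
                String.valueOf(store.get(row, col).trim().split("\\s+").length)));
        store.set("doc1", "content", "incremental updates with observers");
        System.out.println(store.get("doc1", "wordCount"));   // not computed yet
        store.runWorkerOnce();
        System.out.println(store.get("doc1", "wordCount"));
    }
}
```

The write itself returns immediately; the derived `wordCount` column only appears once a worker has processed the notification.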

  8. Fluo Architecture
     ● Fluo supports two types of notification
     ● Strong notifications
       – Guarantee an observer will run at most once
       – When a column is modified
       – Even if the row+column is modified several times before it runs
     ● Weak notifications
       – Cause an observer to run at least once
       – Observers may run multiple times and/or concurrently
       – Based on a single weak notification
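One way to picture the difference is a single notification marker per row+column. The plain Java sketch below is a hypothetical model, not Fluo's implementation: several updates before any worker runs all set the same marker, and several workers may pick that marker up. For a strong notification the run only takes effect for the worker that clears the marker (modelled as a compare-and-set), so it counts once; for a weak notification every pickup runs the observer. Workers are simulated one after another, not as real threads.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of one row+column notification (not Fluo's implementation).
class StrongVsWeakDemo {
    // updates: how many times the column is modified before any worker runs.
    // pickups: how many workers pick up the notification before it is cleared.
    // Returns how many observer runs take effect.
    static int effectiveRuns(int updates, int pickups, boolean strong) {
        if (updates == 0) {
            return 0;                        // no change, no notification, no run
        }
        AtomicBoolean pending = new AtomicBoolean(false);
        for (int i = 0; i < updates; i++) {
            pending.set(true);               // repeated updates set the same marker
        }
        int runs = 0;
        for (int w = 0; w < pickups; w++) {
            if (strong) {
                // Strong: only the worker that clears the marker keeps its run.
                if (pending.compareAndSet(true, false)) {
                    runs++;
                }
            } else {
                // Weak: every pickup runs the observer; clearing is best effort.
                runs++;
                pending.set(false);
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println("strong: " + effectiveRuns(3, 2, true));
        System.out.println("weak:   " + effectiveRuns(3, 2, false));
    }
}
```

Because weak-notification observers can run more than once, their work should be safe to repeat.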

  9. Fluo Row Locking

  10. Fluo Row Locking
     ● For cross-node transactions Fluo uses
       – Accumulo conditional mutations
     ● Conditional mutations lock entire rows
       – On the server side, when checking conditions
     ● Row locks can impact transaction performance
     ● This may be a problem if
       – Many transactions update separate columns in the same row
       – Those transactions are very likely to run concurrently
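To see why per-row locking can serialize transactions that touch different columns, here is a plain Java sketch of a conditional mutation: a per-row lock is held while the condition is checked and the write applied, so updates to different columns of the same row still wait for each other. The class and method names are hypothetical; this is not Accumulo's `ConditionalWriter` API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of row-level conditional mutations (not Accumulo's API).
class RowLockDemo {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    // Sets column = newValue only if it currently equals expected (null = absent).
    // The WHOLE row is locked meanwhile, so mutations to any column of it wait.
    boolean conditionalPut(String row, String column, String expected, String newValue) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            Map<String, String> cols = rows.computeIfAbsent(row, r -> new HashMap<>());
            if (!Objects.equals(cols.get(column), expected)) {
                return false;                  // condition failed, nothing written
            }
            cols.put(column, newValue);
            return true;
        } finally {
            lock.unlock();
        }
    }

    String get(String row, String column) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            Map<String, String> cols = rows.get(row);
            return cols == null ? null : cols.get(column);
        } finally {
            lock.unlock();
        }
    }

    // Increments a counter column with a read / conditional-write retry loop.
    void increment(String row, String column) {
        while (true) {
            String current = get(row, column);
            String next = String.valueOf(current == null ? 1 : Integer.parseInt(current) + 1);
            if (conditionalPut(row, column, current, next)) {
                return;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        RowLockDemo table = new RowLockDemo();
        // Different columns, same row: both threads contend for one row lock.
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) table.increment("r1", "colA"); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) table.increment("r1", "colB"); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(table.get("r1", "colA") + " " + table.get("r1", "colB"));
    }
}
```

When such contention matters, one option is to spread columns that are updated frequently and concurrently across different rows, so their mutations take different locks.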

  11. Available Books
     ● See “Big Data Made Easy”
       – Apress, Jan 2015
     ● See “Mastering Apache Spark”
       – Packt, Oct 2015
     ● See “Complete Guide to Open Source Big Data Stack”
       – Apress, Jan 2018
     ● Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     ● Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020

  12. Connect
     ● Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     ● See my open source blog at
       – open-source-systems.blogspot.com/
     ● I am always interested in
       – New technology
       – Opportunities
       – Technology-based issues
       – Big data integration
