This presentation gives an overview of the Apache AsterixDB project. It explains the AsterixDB database in terms of its functionality and capabilities.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/
What Is Apache AsterixDB?
● A Big Data Management System (BDMS)
● Open source / Apache 2.0 license
● Manages semi-structured data
● Has a NoSQL style data model (ADM)
● Has an expressive and declarative query language (AQL)
● Uses a runtime query execution engine, Apache Hyracks
● Support for querying and indexing external data (e.g. HDFS)
What Is Apache AsterixDB?
● Has two query languages (SQL++ and AQL)
● Scale-tested on up to 1000+ cores and 500+ disks
● Basic transactional (concurrency and recovery) capabilities
● Partitioned LSM-based data storage and indexing
● Supports efficient data ingestion
● Exploits internal data partitioning and indexes
– To avoid scanning data sets
– When processing queries
Asterix Data Model (ADM)
● Unusual extensions in red
Asterix Built In Functions
● Numeric Functions
● Object Functions
● String Functions
● Aggregate Functions
● Binary Functions
● Comparison Functions
● Spatial Functions
● Type Functions
● Similarity Functions
● Conditional Functions
● Tokenizing Functions
● Miscellaneous Functions
● Temporal Functions
AsterixDB HTTP API
● Examples of HTTP API queries using curl
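The curl examples from this slide are not part of the transcript. As a rough stand-in, the sketch below builds the same kind of request in Python. It assumes a local AsterixDB instance whose query service listens at http://localhost:19002/query/service and accepts a form-encoded statement parameter, as described in the AsterixDB HTTP API documentation for recent releases; older releases exposed different endpoints. The function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local instance with the query service on its default port.
QUERY_SERVICE_URL = "http://localhost:19002/query/service"


def build_query_request(statement, url=QUERY_SERVICE_URL):
    """Build (but do not send) a POST request carrying one statement."""
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def run_query(statement, url=QUERY_SERVICE_URL):
    """Send the statement and return the decoded JSON response.

    Needs a running AsterixDB instance at the given URL.
    """
    with urllib.request.urlopen(build_query_request(statement, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_query("SELECT VALUE 1;") against a running instance should return the service's JSON reply; build_query_request can be inspected without any server.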
AsterixDB CSV Load Example
● Create a dataverse / type and dataset
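The statements shown on this slide did not survive in the transcript. The sketch below generates one plausible version of them, assuming SQL++ syntax and the localfs adapter with the delimited-text format as described in the AsterixDB documentation. The helper name and every dataverse, type, dataset and path value are hypothetical.

```python
def csv_load_statements(dataverse, type_name, dataset, fields, primary_key, path):
    """Return SQL++ statements that create a dataverse, a closed type and a
    dataset, then bulk-load a CSV file through the localfs adapter.

    fields is a list of (name, type) pairs; path is an absolute file path
    on the node reachable at 127.0.0.1 (an assumption of this sketch).
    """
    field_list = ", ".join(f"{name}: {ftype}" for name, ftype in fields)
    return "\n".join([
        f"DROP DATAVERSE {dataverse} IF EXISTS;",
        f"CREATE DATAVERSE {dataverse};",
        f"USE {dataverse};",
        f"CREATE TYPE {type_name} AS CLOSED {{ {field_list} }};",
        f"CREATE DATASET {dataset}({type_name}) PRIMARY KEY {primary_key};",
        f"LOAD DATASET {dataset} USING localfs "
        f'(("path"="127.0.0.1://{path}"),("format"="delimited-text"));',
    ])


# Hypothetical names, for illustration only.
print(csv_load_statements(
    "csvdemo", "PersonType", "People",
    [("id", "int"), ("name", "string")], "id", "/tmp/people.csv",
))
```

The generated text can be pasted into the web interface or submitted through the HTTP API.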
AsterixDB Full Text Queries
● Searching for words in text rather than substrings
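To make the word-versus-substring distinction concrete, the sketch below generates a full-text index definition and a matching query. It assumes the SQL++ ftcontains function and the TYPE FULLTEXT index option from the AsterixDB full-text documentation. The dataset, field and index names are hypothetical, and the slide's own example is not in the transcript.

```python
import json


def fulltext_index_statement(dataset, field, index_name):
    """SQL++ statement creating a full-text index on one string field."""
    return f"CREATE INDEX {index_name} ON {dataset}({field}) TYPE FULLTEXT;"


def fulltext_query(dataset, field, words, mode="any"):
    """SQL++ query matching whole words in a field.

    mode "any" matches records containing at least one of the words,
    mode "all" requires every word to be present.
    """
    word_list = json.dumps(list(words))      # e.g. ["big", "data"]
    options = json.dumps({"mode": mode})     # e.g. {"mode": "any"}
    return (
        f"SELECT VALUE t FROM {dataset} t "
        f"WHERE ftcontains(t.{field}, {word_list}, {options});"
    )


# Hypothetical dataset and field names.
print(fulltext_index_statement("Tweets", "message", "tweet_message_ft"))
print(fulltext_query("Tweets", "message", ["big", "data"], mode="all"))
```

A substring predicate would also match "database" when searching for "data"; the full-text form compares tokenized words instead.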
AsterixDB External Data
● Built in adapters for external data sets
– localfs
– hdfs
– socket
– socket_client
– twitter_push
– twitter_pull
– rss
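As an illustration of how an adapter is named in a statement, the sketch below renders a CREATE EXTERNAL DATASET statement for the localfs adapter, assuming the SQL++ form given in the AsterixDB documentation. Only the localfs case is shown; property names for the other adapters differ and are not covered here. The helper name, dataset, type and path are hypothetical.

```python
def external_dataset_statement(dataset, type_name, adapter, properties):
    """Render a SQL++ CREATE EXTERNAL DATASET statement.

    properties maps adapter property names to values and is emitted in
    insertion order as ("name"="value") pairs.
    """
    props = ",".join(f'("{key}"="{value}")' for key, value in properties.items())
    return f"CREATE EXTERNAL DATASET {dataset}({type_name}) USING {adapter} ({props});"


# Hypothetical example: a pipe-delimited file read in place through localfs.
print(external_dataset_statement(
    "Lineitem",
    "LineitemType",
    "localfs",
    {
        "path": "127.0.0.1:///tmp/lineitem.tbl",
        "format": "delimited-text",
        "delimiter": "|",
    },
))
```

Unlike a bulk load, an external dataset leaves the data where it is and reads it at query time.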
AsterixDB User Defined Functions
● UDFs are written in Java and stored in libraries
● Use the managix command to
– Stop the Asterix instance
– Install the UDF library
– Start the Asterix instance
● UDFs in the library can then be executed
● See the simplified example on the next slide
● The example uses the testlib library against a tweet feed
AsterixDB User Defined Functions
use dataverse feeds;
drop feed ProcessedTwitterFeed if exists;
create secondary feed ProcessedTwitterFeed from feed TwitterFeed apply function testlib#addHashTags;
connect feed ProcessedTwitterFeed to dataset ProcessedTweets;

use dataverse feeds;
for $i in dataset ProcessedTweets limit 10 return $i;
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration