Oracle In-Database MapReduce : When Hadoop Meets Exadata

Oracle In-Database MapReduce : When Hadoop Meets Exadata. Kuassi Mensah Director Product Management.


Presentation Transcript


  1. Oracle In-Database MapReduce: When Hadoop Meets Exadata. Kuassi Mensah, Director Product Management

  2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

  3. Agenda • Big Data & In-Database MapReduce • SQL Map Reduce • In-Database Container for Hadoop • Oracle’s Big Data Solution

  4. Big Data Concept MapReduce Infrastructure RDBMS MapReduce (phase I) DataMining (phase II) Any Data MapReduce Convention: Process Data Locally

  5. Big Data In Real Life, Today MapReduce Infrastructure RDBMS Unstructured Data (HDFS, NoSQL, etc) MapReduce (phase I) DataMining (phase II) RDBMS Structured Data

  6. Problems with Big Data Today • Shipping Data from RDBMS to MapReduce Infrastructure • Too Big to Move • Operational Issues • Data Correctness/Loss • Lack of Enterprise Class Security on MapReduce Infrastructure • Breaking MapReduce Convention • Cost of MapReduce Infrastructure or Storage • Lack of MapReduce Development Skills • Lack of MapReduce Deployment Skills

  7. Big Data with In-Database MapReduce Hadoop Cluster RDBMS MapReduce DataMining Unstructured Data (HDFS, NoSQL, etc) In-Database MapReduce MapReduce Structured Data (RDBMS) DataMining

  8. In-Database MapReduce Trends • Hybrid Platforms: DBMS + MapReduce • Projects/Products/Initiatives • DataStax: Cassandra + Hadoop • Hadapt/HadoopDB: PostgreSQL + Hadoop • Greenplum HD • MongoDB MapReduce: JavaScript • Aster Data / Teradata • Limitations • Dependency on a Hadoop infrastructure in addition to the DBMS • Source compatibility: need to rewrite Hadoop jobs in a different language

  9. Oracle’s Big Data Strategy MapReduce APIs Across Data Infrastructure Hadoop, R, SQL RDBMS ( In-Database MapReduce) Big Data Appliance Weblogs Sales Records

  10. Oracle In-Database MapReduce: Integration with Oracle Big Data Solution

  11. Oracle In-Database MapReduce • Feature of Oracle Database 12c releases • In-Database Container for Hadoop (currently Beta) • SQL MapReduce (12.1.0.1)

  12. Agenda • Big Data & In-Database MapReduce • SQL Map Reduce • In-Database Container for Hadoop • Oracle’s Big Data Solution

  13. SQL MapReduce: Declarative MR Analytics. Collection of Existing and New Features • SQL analytic functions • User-defined aggregate functions • Parallel pipelined table functions • SQL Pattern Matching: MATCH_RECOGNIZE -- new!

  14. SQL Pattern Matching • Find 10-day periods where a stock price has “double-bottomed” • Find event A (“privilege revoked”) followed by 3 or more occurrences of event B (“attempted login”) within 1 minute • SQL Pattern Matching provides expressive syntax and fast execution for pattern matching • New SQL construct: MATCH_RECOGNIZE • Define patterns using regular expression syntax [Figure: stock price plotted against days]
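The second pattern on slide 14 (event A followed by 3 or more occurrences of event B within 1 minute) can be sketched procedurally to show what the declarative construct has to express. This is a minimal plain-Java illustration, not Oracle code. The class `RevokeThenLoginSketch`, the `Event` type, and the reading of "within 1 minute" as "every B occurs at most 60 seconds after the A" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: scan time-ordered events for "A followed by >= minB B's".
final class RevokeThenLoginSketch {

    // One row of the (assumed) event stream: a timestamp in seconds and an event type.
    static final class Event {
        final long epochSeconds;
        final String type;

        Event(long epochSeconds, String type) {
            this.epochSeconds = epochSeconds;
            this.type = type;
        }
    }

    // Returns the index of each A event that is directly followed by at least
    // minB consecutive B events, all no later than windowSeconds after the A.
    static List<Integer> findMatches(List<Event> events, int minB, long windowSeconds) {
        List<Integer> starts = new ArrayList<>();
        int i = 0;
        while (i < events.size()) {
            Event a = events.get(i);
            if (!"A".equals(a.type)) {
                i++;
                continue;
            }
            // Greedily consume B events that still fall inside the time window.
            int j = i + 1;
            while (j < events.size()
                    && "B".equals(events.get(j).type)
                    && events.get(j).epochSeconds - a.epochSeconds <= windowSeconds) {
                j++;
            }
            int bCount = j - i - 1;
            if (bCount >= minB) {
                starts.add(i);
                i = j; // resume after the last row of this match
            } else {
                i++;
            }
        }
        return starts;
    }
}
```

Calling `findMatches(events, 3, 60)` on one partition of events corresponds to the example on the slide. The point of the comparison is that the loop, the window check, and the restart logic are all implied by a short pattern definition in SQL.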

  15. SQL Pattern Matching: Sessionization SELECT user_id, session_id, start_time, no_of_events, duration FROM Events MATCH_RECOGNIZE ( PARTITION BY User_ID ORDER BY Time_Stamp MEASURES match_number() session_id, count(*) as no_of_events, first(time_stamp) start_time, last(time_stamp) - first(time_stamp) duration PATTERN (b s*) DEFINE s as (s.Time_Stamp - prev(Time_Stamp) <= 10) ) ORDER BY user_id, session_id;
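To make the sessionization query on slide 15 easier to follow, here is a rough procedural equivalent in plain Java. It is a sketch under assumptions, not what the database executes: the class `SessionizationSketch` and its `Event` and `Session` types are hypothetical, timestamps are treated as plain numbers, and `maxGap` stands for the constant 10 in the DEFINE clause.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of PATTERN (b s*) with s defined as "gap to previous row <= maxGap".
final class SessionizationSketch {

    static final class Event {
        final String userId;
        final long timeStamp;

        Event(String userId, long timeStamp) {
            this.userId = userId;
            this.timeStamp = timeStamp;
        }
    }

    // Mirrors the MEASURES clause: session_id, start_time, no_of_events, duration.
    static final class Session {
        final String userId;
        final int sessionId;
        final long startTime;
        final int noOfEvents;
        final long duration;

        Session(String userId, int sessionId, long startTime, int noOfEvents, long duration) {
            this.userId = userId;
            this.sessionId = sessionId;
            this.startTime = startTime;
            this.noOfEvents = noOfEvents;
            this.duration = duration;
        }
    }

    static List<Session> sessionize(List<Event> events, long maxGap) {
        // PARTITION BY user_id (TreeMap also gives the final ORDER BY user_id).
        Map<String, List<Long>> byUser = new TreeMap<>();
        for (Event e : events) {
            byUser.computeIfAbsent(e.userId, k -> new ArrayList<>()).add(e.timeStamp);
        }
        List<Session> out = new ArrayList<>();
        for (Map.Entry<String, List<Long>> entry : byUser.entrySet()) {
            List<Long> times = entry.getValue();
            times.sort(Comparator.naturalOrder()); // ORDER BY time_stamp
            int sessionId = 0; // plays the role of match_number()
            int start = 0;     // index of the row matched by "b"
            for (int i = 1; i <= times.size(); i++) {
                // A session ends at the last row, or when the gap exceeds maxGap.
                boolean boundary = i == times.size()
                        || times.get(i) - times.get(i - 1) > maxGap;
                if (boundary) {
                    sessionId++;
                    out.add(new Session(entry.getKey(), sessionId, times.get(start),
                            i - start, times.get(i - 1) - times.get(start)));
                    start = i;
                }
            }
        }
        return out;
    }
}
```

The row labelled `b` opens a session, and every following row stays in it as long as it arrives within `maxGap` of its predecessor, which is the behaviour the loop above reproduces.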

  16. DEMO

  17. Agenda • Big Data & In-Database MapReduce • SQL Map Reduce • In-Database Container for Hadoop • Oracle’s Big Data Solution

  18. Vanilla Hadoop Hadoop Cluster Physical partitions (DataNodes) Mappers Materialization of Intermediate data Reducers

  19. In-Database Container for Hadoop Components • Apache Hadoop • Task execution: In-Database JVM • Data partitioning & task scheduling: PQ engine • Data storage: Table, external table, object view. • Data type mapping: TableReader, TableWriter

  20. In-Database Container for Hadoop RDBMS Server Table partitions Mapper processes Pipelining Intermediate data Reducer processes Parallel DML

  21. In-DB Cont. 4 Hadoop vs Vanilla Hadoop RDBMS Server Hadoop Cluster Physical vs Logical data partitions Mappers Materialization vs Pipelining Intermediate data Reducers Parallel DML

  22. In-Database Container for Hadoop: Summary • A “Hadoop container” in the RDBMS engine: no Hadoop cluster required. • Data processing in-situ: no need to ship data to a separate infrastructure. • API and source compatibility: accepts Hadoop Mappers and Reducers as-is • Java interface: invoke Hadoop jobs à la vanilla Hadoop • SQL interface: Map & Reduce steps in SQL statements

  23. In-Database Container for Hadoop: SQL and Java interfaces • Java interface: public class WordCount { public static void main(String[] args) throws Exception { /* Setup the parameters and run the job */ …… job.init(); job.run(); } } • SQL interface: SELECT * FROM TABLE (HREDUCE_JP_WORDCOUNT(:ConfKey, CURSOR(SELECT * FROM TABLE (HMAP_JP_WORDCOUNT(:ConfKey, CURSOR(SELECT * from InTable))))))
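The nested call on slide 23, where the reduce function consumes the output of the map function, which in turn consumes the input table, can be illustrated with a self-contained word count. This sketch deliberately uses plain Java collections rather than the Hadoop API or the in-database container, so it runs anywhere. `WordCountSketch`, `map`, and `reduce` are hypothetical names chosen for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the map -> reduce nesting, without any Hadoop dependency.
final class WordCountSketch {

    // Map step: emit one (word, 1) pair per word, like a word-count Mapper would.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(new SimpleEntry<>(word, 1));
                }
            }
        }
        return pairs;
    }

    // Reduce step: sum the values of all pairs that share a key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same shape as the SQL on the slide: reduce(map(input)).
        System.out.println(reduce(map(List.of("to be or not", "to be"))));
    }
}
```

The expression `reduce(map(input))` has the same shape as `HREDUCE_JP_WORDCOUNT(..., CURSOR(... HMAP_JP_WORDCOUNT(..., CURSOR(SELECT * from InTable))))`: each step takes the previous step's rows as its input.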

  24. DEMO

  25. Pipelining Hadoop Jobs Through the SQL Interface: Pipelining Hadoop steps without intermediate materialization select * from table (HREDUCE_JP_JOB2 (:ConfKey2, .... (HMAP_JP_JOB2 (:ConfKey2, .... (HREDUCE_JP_JOB1 (:ConfKey1, .... (HMAP_JP_JOB1 (:ConfKey1, ...), ))));
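The idea of chaining two jobs so that rows flow from one step to the next, with no intermediate table written in between, can be sketched with Java streams. This is only an analogy under stated assumptions: `PipelinedJobsSketch` and its methods are hypothetical, job 1 is a word count, job 2 counts how many words share each total, and the reduce steps here simply hold their per-key totals in memory before emitting them.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical illustration: four steps chained as stream-in, stream-out functions.
final class PipelinedJobsSketch {

    static <K> Map.Entry<K, Integer> pair(K key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Job 1 map: line -> (word, 1)
    static Stream<Map.Entry<String, Integer>> mapJob1(Stream<String> lines) {
        return lines.flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(word -> pair(word, 1));
    }

    // Job 2 map: (word, total) -> (total, 1)
    static Stream<Map.Entry<Integer, Integer>> mapJob2(Stream<Map.Entry<String, Integer>> totals) {
        return totals.map(e -> pair(e.getValue(), 1));
    }

    // Shared reduce for both jobs: sum values per key, then emit the totals as a stream.
    static <K extends Comparable<? super K>> Stream<Map.Entry<K, Integer>> sumByKey(
            Stream<Map.Entry<K, Integer>> pairs) {
        Map<K, Integer> totals = new TreeMap<>();
        pairs.forEach(p -> totals.merge(p.getKey(), p.getValue(), Integer::sum));
        return totals.entrySet().stream();
    }

    // Same nesting as the SQL on the slide: reduce2(map2(reduce1(map1(input)))).
    static Map<Integer, Integer> run(Stream<String> lines) {
        Map<Integer, Integer> result = new TreeMap<>();
        sumByKey(mapJob2(sumByKey(mapJob1(lines))))
                .forEach(e -> result.put(e.getKey(), e.getValue()));
        return result;
    }
}
```

Each function takes a stream and returns a stream, so the output of job 1 feeds job 2 directly, which is the role the nested table functions and cursors play in the SQL statement.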

  26. In-Database Container for Hadoop: Projected Features • Reuse Mappers & Reducers (including R-generated) • Dynamic Data Partitioning • Apache Hadoop API 2.00 • Custom Writables Hadoop types • Serialized Data Formats • InputFormats: HDFS, HBase, Others • Java interface (similar to Vanilla Hadoop Driver) • SQL interface: Hadoop Job Steps in SQL queries • Mahout

  27. Develop/Deploy with In-Db Cont. 4 Hadoop • Reuse existing Mappers & Reducers • Develop Hadoop Mappers & Reducers from scratch • Create or update Hadoop Job Configuration file • Load all Java code in RDBMS and create Call Specs • Invoke Hadoop job via Java or SQL interfaces • Populate output table with parallel INSERT

  28. Agenda • Big Data & In-Database MapReduce • SQL Map Reduce • In-Database Container for Hadoop • Oracle’s Big Data Solution

  29. Oracle’s Big Data Solution Oracle Endeca Information Discovery Oracle Big Data Appliance Oracle Exadata Oracle Exalytics InfiniBand InfiniBand Oracle Real-Time Decisions Acquire Organize Analyze Decide

  30. Oracle In-Database MapReduce: Summary • Declarative Analytics (SQL MapReduce) • Programmatic Analytics (Complex Algorithms, Hadoop) • MapReduce job steps in SQL queries • Custom extensions (InputFormats) • RDBMS QoS (e.g., Enterprise Class Security) • Developer- and DBA-friendly • Seamless integration with Oracle’s Big Data solution
