
Oracle Big Data Connectors: High-Performance Integration for Hadoop and Oracle Database

Oracle Big Data Connectors: High-Performance Integration for Hadoop and Oracle Database. Marty Gubar Oracle Big Data Product Management. Session Goals. Introduce the Oracle Big Data Connectors


Presentation Transcript


  1. Oracle Big Data Connectors: High-Performance Integration for Hadoop and Oracle Database Marty Gubar Oracle Big Data Product Management

  2. Session Goals • Introduce the Oracle Big Data Connectors • Understand how they provide high-performance connectivity between Oracle Database & Oracle Big Data Appliance • See the Connectors in action!

  3. Oracle’s Big Data Platform • [Diagram labels: Visualize & Decide, Organize & Discover, Stream, Acquire, Analyze]

  4. Oracle’s Big Data Platform • [Diagram labels: Hadoop, Oracle Database, Oracle Big Data Connectors]

  5. Oracle Big Data Connectors Components • Oracle SQL Connector for HDFS • Oracle Loader for Hadoop • Oracle R Connector for Hadoop • Oracle Data Integrator Application Adapters for Hadoop

  6. What is HDFS? • Primary storage system underlying Hadoop • Fault tolerant, scalable, highly available • Designed to be well suited to distributed processing • Superficially structured like a UNIX file system • [Diagram labels: Big Data Appliance, HDFS]

  7. What is Hive? • Provides structure over files • Metadata describes tables/columns • HiveQL offers basic SQL access to data • Hive converts HiveQL queries into MapReduce jobs • [Diagram labels: Big Data Appliance, HDFS]

       CREATE EXTERNAL TABLE myTable ( movieId STRING, hits INT )
       ROW FORMAT DELIMITED…

       SELECT movieId, sum(hits)
       FROM myTable
       GROUP BY movieId
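To illustrate how a GROUP BY query like the one on this slide can be expressed as a MapReduce job, here is a minimal Python sketch. It is not Hive's actual execution plan; the function names and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict

def map_phase(rows):
    """Emit one (movieId, hits) pair per input row."""
    for movie_id, hits in rows:
        yield movie_id, hits

def shuffle(pairs):
    """Group values by key, standing in for the shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(groups):
    """Sum the hits collected for each movieId."""
    return {key: sum(values) for key, values in groups.items()}

def sum_hits_by_movie(rows):
    return reduce_phase(shuffle(map_phase(rows)))

# Hypothetical rows of myTable: (movieId, hits)
sample_rows = [("m1", 3), ("m2", 5), ("m1", 4)]
totals = sum_hits_by_movie(sample_rows)
print(totals)
```

In a real cluster the map and reduce functions run on many nodes at once, and the shuffle moves data between them over the network.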

  8. Oracle SQL Connector for HDFS • Access Hive tables and HDFS files using Oracle external tables • Set up access automatically • Combine data from two appliances • Access or load data in parallel • [Diagram labels: Hadoop, Oracle Database, SQL Query, External Table, OSCH, ODCH]

  9. Performance Comparison • Fuse-DFS • [Charts: load speed comparison, CPU usage comparison]

  10. Key Benefits • Uniquely enables access to HDFS data files from Oracle Database • Performance • 12 TB/hour from Oracle Big Data Appliance to Oracle Exadata • 5x – 20x faster than comparable third party products • Easy to use for Oracle DBAs and Hadoop developers • Developed and supported by Oracle
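As a rough sense of scale for the 12 TB/hour figure, the arithmetic below converts it to a per-second rate. It assumes decimal units (1 TB = 10^12 bytes) and a sustained rate; the 100 TB data set is a made-up example, not a number from the presentation.

```python
TB = 10**12  # assumption: decimal terabytes
GB = 10**9

bytes_per_hour = 12 * TB
gb_per_second = bytes_per_hour / 3600 / GB

# Hypothetical example: time to move 100 TB at that sustained rate
hours_for_100_tb = 100 / 12

print(f"{gb_per_second:.2f} GB/s, {hours_for_100_tb:.1f} hours for 100 TB")
```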

  11. Demonstration: Using Oracle SQL Connector for HDFS

  12. Oracle Loader for Hadoop • Offloads data pre-processing from the database server to Hadoop • Works with a range of input data formats • Handles skew in input data to maximize performance • Online and offline modes (offline: create Oracle Data Pump files on HDFS) • [Diagram: map tasks, shuffle/sort, reduce tasks. Steps shown: read target table metadata from the database; partition, sort, and convert into Oracle data types on Hadoop; connect to the database from reducer nodes and load into database partitions in parallel (JDBC or direct path)]
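The "partition, sort, and convert" step can be pictured with a small Python sketch. This is not Oracle Loader for Hadoop code; the input layout (movieId, day, hits), the monthly partitioning scheme, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime

def convert(line):
    """Parse one delimited input line into typed values."""
    movie_id, day, hits = line.split(",")
    return movie_id, datetime.strptime(day, "%Y-%m-%d").date(), int(hits)

def partition_key(row):
    """Hypothetical target-table partitioning: one partition per month."""
    return row[1].year, row[1].month

def prepare_for_load(lines):
    """Group converted rows by target partition and sort within each."""
    partitions = defaultdict(list)
    for line in lines:
        row = convert(line)
        partitions[partition_key(row)].append(row)
    return {key: sorted(rows) for key, rows in sorted(partitions.items())}

lines = ["m2,2013-02-10,5", "m1,2013-01-03,3", "m1,2013-02-01,4"]
prepared = prepare_for_load(lines)
for key, rows in prepared.items():
    print(key, rows)
```

Doing this work on the Hadoop side means each batch arrives already typed, grouped by target partition, and ordered, which is the pre-processing the slide describes as offloaded from the database server.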

  13. Automatically Handle Input Data Skew • Distribute load evenly across reduce tasks • All reducers do approximately the same amount of work • Avoids slowdown because of unbalanced reducer loads • Maximizes performance • Data is sampled to determine optimal partitioning of map output keys • Load Balancing across Reducers
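The idea of sampling data to balance reducer loads can be sketched in Python as follows. This is a simplified illustration, not the algorithm Oracle Loader for Hadoop uses: it estimates key frequencies from a sample and assigns the heaviest keys first to the least loaded reducer. The key distribution, sample size, and function names are all assumptions.

```python
import random
import zlib
from collections import Counter

def plan_partitions(sample_keys, num_reducers):
    """Assign each sampled key to a reducer, heaviest keys first,
    always picking the reducer with the smallest estimated load."""
    freq = Counter(sample_keys)
    loads = [0] * num_reducers
    assignment = {}
    for key, count in freq.most_common():
        target = loads.index(min(loads))
        assignment[key] = target
        loads[target] += count
    return assignment

def reducer_loads(keys, assignment, num_reducers):
    """Count how many records each reducer receives under the plan."""
    loads = [0] * num_reducers
    for key in keys:
        # Keys missing from the sample fall back to a stable hash
        fallback = zlib.crc32(key.encode()) % num_reducers
        loads[assignment.get(key, fallback)] += 1
    return loads

# Hypothetical skewed input: key "a" is four times as common as "f"
keys = ["a"] * 400 + ["b"] * 300 + ["c"] * 200 + ["d"] * 100 + ["e"] * 100 + ["f"] * 100
sample = random.Random(0).sample(keys, 300)

plan = plan_partitions(sample, num_reducers=3)
balanced = reducer_loads(keys, plan, num_reducers=3)
print(balanced)
```

A plain hash of the key would place records without regard to how often each key occurs, so one reducer could end up with far more work than the others; planning from a sample is one way to avoid that.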

  14. Performance Comparison • Third party products • [Charts: load speed comparison, CPU usage comparison]

  15. Key Benefits • Load directly from HDFS, Hive tables, … into Oracle Database without intermediate staging files • Performance • 10x faster than comparable third party products • Offload database server processing to Hadoop • Minimizes impact on performance SLAs of production applications • Easy to use for Oracle DBAs and Hadoop developers • Developed and supported by Oracle

  16. Leverage Both Connectors • Offline load: data is pre-processed and written in Oracle Data Pump format in HDFS (Oracle Loader for Hadoop) • Oracle Data Pump files in HDFS are queried (and loaded if necessary) with Oracle SQL Connector for HDFS • [Diagram: Oracle Loader for Hadoop map, shuffle/sort, and reduce tasks writing to HDFS; a SQL query in Oracle Database reading through an external table, HDFS client, and OSCH; labels also include ODCH]

  17. Demonstration: Using Oracle Loader for Hadoop

  18. For more information Search OTN for… • Big Data • Data Warehousing Blog • Oracle Big Data Interactive e-Book • Oracle Big Data YouTube videos
