
Building BI App on Cloud

Building BI App on Cloud. Rohit Chatter, Sr. Architect @ Yahoo!, rohitc@yahoo-inc.com. Yahoo is the most visited site on the Internet: 600M+ unique visitors per month, billions of page views per day, billions of searches per month, billions of emails per month, and terabytes of data per day.


Presentation Transcript


  1. Building BI App on Cloud. Rohit Chatter, Sr. Architect @ Yahoo!, rohitc@yahoo-inc.com

  2. Yahoo! BigData Scale
     Yahoo is the most visited site on the Internet:
     • 600M+ unique visitors per month
     • Billions of page views per day
     • Billions of searches per month
     • Billions of emails per month
     • Terabytes of data per day
     And we crawl the Web:
     • 100+ billion pages
     • 5+ trillion links
     • Petabytes of data
     Reading 100 terabytes could be overwhelming.

  3. Yahoo! Search Scale
     • Manages campaigns, creates ad listings, bids for keywords
     • Types in a search query on Yahoo or an affiliate site (aka the Publisher)
     • Passes the search query to the ad platform for servable ad listings
     • Ad serving returns relevant and available ads matching the search query
     • Shows ads returned by ad serving
     • Clicks on an ad

  4. Business Model: Performance, Credit Summary; Daily, Weekly, Monthly & Yearly; Performance, Budget Headroom, AM performance, competitive analysis; Daily, Weekly, Monthly & Yearly; Performance, Feature Adoption; Daily, Hourly, Weekly, Monthly & Yearly; Daily, Weekly, Monthly & Yearly; Competitive analysis, cross sell, upsell, performance; Daily, Hourly, Weekly, Monthly & Yearly

  5. Hour Glass Model – A Perspective: Home Grown App; What-if analysis and deep-dive data analysis; Excellence & Strategic; Home Grown App; Level 1 & 2 analysis; Improvement & Alignment; Business Performance monitoring; Tactical & Operational reporting; RDBMS; Facts; Granular aggregates; Most granular data: event-level model

  6. BI on Cloud [1000ft view]: Functional View / What is computed where
     • Apache Web Server: load-balanced web
     • BI Tool/Home Grown App Server (BI layer): derived metrics (CTR, Depth, RPM, Coverage)
     • Oracle RDBMS, BI Aggregates (H, D, W, M) (aggregates & metadata layer): rollups, Type 2 dimensions, alerts & messaging
     • Hadoop Grid + PIG: cloud metrics (impressions, revenue, clicks, conversions, quality score, top keywords); utility computing; build aggregates
     • Data Source: dimension & fact data, 100+ gigabytes/day
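
The split in slide 6 (additive metrics built on the grid, ratio metrics derived in the BI layer) can be sketched as follows. This is a minimal illustration, not the deck's implementation: the `Aggregate` fields `searches` and `covered_searches` and all four formulas are assumed definitions, since the slides name CTR, Depth, RPM, and Coverage without defining them.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """One pre-aggregated row (hour, day, week, or month) of additive metrics."""
    searches: int          # hypothetical field: search queries received
    covered_searches: int  # hypothetical field: queries that showed at least one ad
    impressions: int
    clicks: int
    revenue: float


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(agg: Aggregate) -> dict:
    """Compute ratio metrics from additive ones (assumed formulas)."""
    return {
        "ctr": safe_div(agg.clicks, agg.impressions),
        "depth": safe_div(agg.impressions, agg.covered_searches),
        "rpm": safe_div(1000.0 * agg.revenue, agg.searches),
        "coverage": safe_div(agg.covered_searches, agg.searches),
    }


def roll_up(rows: list) -> Aggregate:
    """Sum additive metrics across rows, e.g. days into a week."""
    return Aggregate(
        searches=sum(r.searches for r in rows),
        covered_searches=sum(r.covered_searches for r in rows),
        impressions=sum(r.impressions for r in rows),
        clicks=sum(r.clicks for r in rows),
        revenue=sum(r.revenue for r in rows),
    )
```

Ratios are recomputed after each rollup rather than averaged, which is one plausible reason to store only additive metrics in the aggregate tables and derive the rest at query time.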

  7. BI on Cloud – Screen Shots

  8. CUBE on Hadoop?

  9. Traditional: Home Grown Tools, I-CUBE, MicroStrategy, Oracle, ETL/Aggregation, ART, HADOOP, APOLLO, FEEDS

  10. Game Changer – HBase & Schema: Home Grown Tools, I-CUBE, BI Tool, Aggregation in HIVE, HBASE, JDBC/ODBC, Hiveserver, HADOOP, ART

  11. How do we do it?
      Number Game: Size – 360GB; Format – RCFile; Rows – 14.7 Billion; Mappers – 562; Reducers – 436; Elapsed Time <= 30 mins
      • HTable – schemaless
      • Use the HBase incrementor (incrementColumnValue) for Weekly & MTD
      • Hive windowing UDF to generate a flattened daily row
      • Carefully choose the rowkey
      • SCD – comes free
      • Performance – physical file (HFile) by table & column family
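
The counter approach on slide 11 (folding each flattened daily row into weekly and month-to-date cells) can be sketched as below. This is only an in-memory simulation: `CounterTable` stands in for an HBase table and mimics the add-and-return behaviour of `incrementColumnValue`. The rowkey layout, the `"m"` column family, and all names are hypothetical, not taken from the slides.

```python
from collections import defaultdict
from datetime import date, timedelta


class CounterTable:
    """In-memory stand-in for an HBase table holding counter cells."""

    def __init__(self):
        self._cells = defaultdict(int)

    def increment_column_value(self, row_key, family, qualifier, amount):
        """Add amount to a cell and return the new value."""
        cell = (row_key, family, qualifier)
        self._cells[cell] += amount
        return self._cells[cell]

    def get(self, row_key, family, qualifier):
        """Read a cell; missing cells read as 0."""
        return self._cells.get((row_key, family, qualifier), 0)


def week_start(day: date) -> date:
    """Return the Monday of the week containing day."""
    return day - timedelta(days=day.weekday())


def make_row_key(entity_id: str, period: str, start: date) -> str:
    """Hypothetical layout: entity first, so one entity's periods sort together."""
    return f"{entity_id}|{period}|{start.isoformat()}"


def apply_daily_row(table: CounterTable, entity_id: str, day: date, clicks: int) -> None:
    """Fold one daily row into the weekly and month-to-date counters."""
    weekly_key = make_row_key(entity_id, "W", week_start(day))
    mtd_key = make_row_key(entity_id, "MTD", day.replace(day=1))
    table.increment_column_value(weekly_key, "m", "clicks", clicks)
    table.increment_column_value(mtd_key, "m", "clicks", clicks)
```

For example, three daily rows dated 2012-01-30, 2012-01-31, and 2012-02-01 all land in the same weekly cell but split across two month-to-date cells. No separate weekly or monthly aggregation pass is needed in this sketch.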

  12. Challenge@Hand BIG DATA Hadoop/RDBMS SLA

  13. What users love? – Excel & Pivot • Features • Allows quick analysis of large data • Creates neat, informative summaries without writing complex functions • Excellent charting options.

  14. But “Hang” on a minute? – BIG DATA? What if I need to pivot a few million records, or maybe a billion records?

  15. Our Answer – Hadoop Pivot
      Number Game: Size – 360GB; Format – RCFile; Rows – 14.7 Billion; Mappers – 670; Reducers – 30; Elapsed Time – 251 secs [< 5 mins]
      Voila – Back to Excel
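
The slides do not show how the Hadoop pivot is implemented. One common way to express a pivot as a map and reduce pair is sketched below, as a single-process illustration under assumed names: the mapper keys each record by its (row, column) cell, the reducer sums per cell, and the small result is reshaped into a table that a spreadsheet could load.

```python
from collections import defaultdict


def map_phase(records, row_field, col_field, value_field):
    """Emit ((row, column), value) pairs, one per input record."""
    for rec in records:
        yield (rec[row_field], rec[col_field]), rec[value_field]


def reduce_phase(pairs):
    """Sum the values for each (row, column) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def to_pivot(totals):
    """Arrange reduced cells as {row: {column: total}}."""
    pivot = defaultdict(dict)
    for (row, col), total in totals.items():
        pivot[row][col] = total
    return dict(pivot)


def pivot_records(records, row_field, col_field, value_field):
    """Run map, reduce, and reshape in sequence."""
    pairs = map_phase(records, row_field, col_field, value_field)
    return to_pivot(reduce_phase(pairs))
```

Because the output has one cell per distinct (row, column) pair rather than one per input record, billions of rows can reduce to a table small enough for a desktop tool, which fits the "Back to Excel" remark.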

  16. Questions?

  17. Unified Web BI Portal: GRID; GRID Based Report Web Server; BI Web Server; Other Tools; TRADITIONAL; BI App Server; Web Services; Data Access Layer [ODBC/PL/SQL API]; App Server, Grid Launcher Box; Oracle RAC 8 Node 60TB; Scheduler; Metadata; Hive + PIG – Query Engine; Oracle ETL Server; Facts on HDFS [RCFile]; Dimensions HBase; Hadoop HDFS Grid – Daily Feeds & Aggregates; Hadoop HDFS – Hourly Feeds
