Building BI App on Cloud


Rohit Chatter

Sr. [email protected]!

[email protected]


Yahoo! BigData Scale

Yahoo is the most visited site on the Internet

  • 600M+ unique visitors per month

  • Billions of page views per day

  • Billions of searches per month

  • Billions of emails per month

  • Terabytes of data per day!

And we crawl the Web

  • 100+ billion pages

  • 5+ trillion links

  • Petabytes of data

Reading 100 terabytes could be overwhelming



Yahoo! Search Scale

Manages campaigns, creates ad listings, bids for keywords

Types in a search query on Yahoo or affiliate site (aka the Publisher)

Passes search query to the ad platform for servable ad listings

Ad serving returns relevant & available ads matching the search query

Shows ads returned by ad serving

Clicks on Ad



Business Model

Performance, Credit Summary

Daily, Weekly, Monthly & Yearly

Performance, Budget Headroom, AM performance, competitive analysis

Daily, Weekly, Monthly & Yearly

Performance, Feature Adoption

Daily, Hourly, Weekly, Monthly & Yearly

Daily, Weekly, Monthly & Yearly

Competitive analysis, cross sell, upsell, performance

Daily, Hourly, Weekly, Monthly & Yearly



Hour Glass Model – A Perspective

Home Grown App

What if analysis and deep dive data analysis

Excellence & Strategic

Home Grown App

Level 1 & 2 analysis

Improvement & Alignment

Business Performance monitoring

Tactical & Operational reporting

RDBMS Facts

Granular aggregates

Most granular data – event-level model



BI on Cloud [1000ft view]

Functional View

What is computed where

Apache Web Server

Load balanced web

BI Tool/Home Grown

App Server – BI layer

Derived Metrics – CTR, Depth, RPM, Coverage

Oracle RDBMS

BI Aggregates (H,D,W,M)

Aggregates & Metadata layer

Rollups, Type 2 Dimension, Alerts & Messaging

Hadoop Grid + PIG

Cloud

Metrics

Impressions, Revenue, Clicks,

Conversions, Quality Score, Top keywords

Utility Computing

Build Aggregates

Data Source

Dimension & Fact

Data – 100+ Gigabytes/Day
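
The split above, with additive base metrics rolled up on the grid and derived metrics computed in the BI layer, can be illustrated with a small sketch. The slides name the metrics but give no formulas, so the definitions used here are common search-advertising conventions and are assumptions: CTR as clicks per ad impression, depth as ad impressions per covered search, RPM as revenue per thousand searches, and coverage as the share of searches that showed at least one ad. The field names and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Additive base metrics for one time bucket (hypothetical field names)."""
    searches: int = 0
    covered_searches: int = 0  # searches that showed at least one ad
    impressions: int = 0       # ad impressions
    clicks: int = 0
    revenue: float = 0.0

    def __add__(self, other: "Aggregate") -> "Aggregate":
        # Base metrics are additive, so hourly buckets roll up by summation.
        return Aggregate(
            self.searches + other.searches,
            self.covered_searches + other.covered_searches,
            self.impressions + other.impressions,
            self.clicks + other.clicks,
            self.revenue + other.revenue,
        )


def ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 for an empty bucket instead of raising."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(agg: Aggregate) -> dict:
    """Compute ratio metrics from an already rolled-up aggregate."""
    return {
        "ctr": ratio(agg.clicks, agg.impressions),
        "depth": ratio(agg.impressions, agg.covered_searches),
        "rpm": 1000.0 * ratio(agg.revenue, agg.searches),
        "coverage": ratio(agg.covered_searches, agg.searches),
    }


# Two hypothetical hourly buckets rolled up into one daily aggregate.
hourly = [
    Aggregate(searches=1000, covered_searches=600, impressions=1800, clicks=90, revenue=45.0),
    Aggregate(searches=3000, covered_searches=1500, impressions=3000, clicks=60, revenue=30.0),
]
daily = sum(hourly, Aggregate())
metrics = derived_metrics(daily)
print(metrics)
```

Ratios are not additive, which is one plausible reason to derive them last: averaging the two hourly CTRs (0.05 and 0.02) gives 0.035, while the CTR of the rolled-up day is 150 / 4800 = 0.03125.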



BI on Cloud – Screen Shots



CUBE on Hadoop?



Tradition

Home Grown Tools

I-CUBE

MicroStrategy

Oracle

ETL/Aggregation

ART

HADOOP

APOLLO FEEDS



Game Changer – HBase & Schema

Home Grown Tools

I-CUBE

BI Tool

Aggregation in Hive

HBASE

JDBC/ODBC

Hiveserver

HADOOP

ART



How do we do it?

Number Game

Size – 360 GB

Format – RCFile

Rows – 14.7 billion

Mappers – 562

Reducers – 436

Elapsed Time <= 30 mins

  • HTable – schema-less

  • Use the HBase incrementor (incrementColumnValue) for weekly and MTD aggregates

  • Hive windowing UDF to generate a flattened daily row

  • Choose the row key carefully

  • SCD (slowly changing dimensions) – comes free

  • Performance – physical HFiles are kept per table and column family
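
The counter and row-key bullets can be sketched with an in-memory stand-in. This is not the HBase client: `CounterTable` is a hypothetical class that only imitates the add-to-a-cell behaviour of `incrementColumnValue`. The row-key layout (`advertiser|period|bucket`), the `m` column family and the sample data are assumptions chosen for illustration, since the slides do not show the actual schema.

```python
from collections import defaultdict
from datetime import date, timedelta


class CounterTable:
    """In-memory stand-in for an HBase table of counters (hypothetical)."""

    def __init__(self):
        # row key -> {"family:qualifier" -> counter value}
        self.rows = defaultdict(lambda: defaultdict(int))

    def increment_column_value(self, row, family, qualifier, amount):
        # Mimics incrementColumnValue: add to the cell, creating it on first
        # use, and return the new value.
        cell = f"{family}:{qualifier}"
        self.rows[row][cell] += amount
        return self.rows[row][cell]


def week_key(advertiser, day):
    """Row key for the week containing `day`, bucketed by its Monday."""
    monday = day - timedelta(days=day.weekday())
    return f"{advertiser}|W|{monday.isoformat()}"


def mtd_key(advertiser, day):
    """Row key for the month-to-date counter containing `day`."""
    return f"{advertiser}|MTD|{day.year:04d}-{day.month:02d}"


def apply_daily_row(table, advertiser, day, clicks):
    """Fold one daily row into the weekly and month-to-date counters."""
    table.increment_column_value(week_key(advertiser, day), "m", "clicks", clicks)
    table.increment_column_value(mtd_key(advertiser, day), "m", "clicks", clicks)


table = CounterTable()
apply_daily_row(table, "adv42", date(2012, 1, 30), 10)  # Monday
apply_daily_row(table, "adv42", date(2012, 1, 31), 5)   # Tuesday
apply_daily_row(table, "adv42", date(2012, 2, 1), 7)    # Wednesday, new month

weekly = table.rows["adv42|W|2012-01-30"]["m:clicks"]
january = table.rows["adv42|MTD|2012-01"]["m:clicks"]
february = table.rows["adv42|MTD|2012-02"]["m:clicks"]
print(weekly, january, february)
```

With counters, each daily row only adds to the weekly and MTD cells, so neither period has to be recomputed from scratch. HBase sorts rows by key, so in this assumed layout leading with the advertiser keeps all of one advertiser's periods adjacent; whether that is the right trade-off depends on the access pattern.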


Challenge@hand

BIG DATA

Hadoop/RDBMS

SLA



What users love? – Excel & Pivot

  • Features

  • Allows quick analysis of large data

  • Creates neat, informative summaries without writing complex functions

  • Excellent charting options.



But “Hang” on a minute? – BIG DATA?

What if I need to pivot a few million records?

Or maybe billions of records?



Our Answer – Hadoop Pivot

Number Game

Size – 360GB

Format – RCFile

Rows – 14.7 billion

Mappers – 670

Reducers – 30

Elapsed Time – 251 secs [< 5 mins]

Voila – Back to Excel
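
The slides do not describe how Hadoop Pivot is implemented. As a rough illustration of why a pivot parallelizes well, the sketch below expresses one as a map phase that emits `((row, column), value)` pairs and a reduce phase that sums per key. It runs in a single process, and the record fields and function names are hypothetical.

```python
from collections import defaultdict


def map_phase(records, row_field, col_field, value_field):
    """Emit one ((row, column), value) pair per input record."""
    for rec in records:
        yield (rec[row_field], rec[col_field]), rec[value_field]


def reduce_phase(pairs):
    """Sum the values that share a (row, column) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def to_grid(totals):
    """Lay the reduced cells out as a row-by-column grid, filling gaps with 0."""
    rows = sorted({r for r, _ in totals})
    cols = sorted({c for _, c in totals})
    return cols, {r: [totals.get((r, c), 0) for c in cols] for r in rows}


# Hypothetical event-level records.
records = [
    {"advertiser": "A", "day": "Mon", "clicks": 3},
    {"advertiser": "A", "day": "Tue", "clicks": 4},
    {"advertiser": "B", "day": "Mon", "clicks": 5},
    {"advertiser": "A", "day": "Mon", "clicks": 2},
]
columns, grid = to_grid(reduce_phase(map_phase(records, "advertiser", "day", "clicks")))
print(columns)
for row, cells in grid.items():
    print(row, cells)
```

Because summation is associative, partial sums can be computed close to the data before the final reduce, and the resulting grid is typically small enough to open in a spreadsheet.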



Questions?



Unified Web BI Portal

GRID

GRID Based Report

Web Server

BI Web Server

Other Tools

TRADITIONAL

BI App Server

Web Services Data Access Layer [ODBC/PL/SQL API]

App Server, Grid Launcher Box

Oracle RAC 8 Node

60TB

Scheduler

Metadata

Hive + PIG – Query Engine

Oracle ETL Server

Facts on HDFS [RCFile]

Dimensions

HBase

Hadoop HDFS Grid – Daily Feeds & Aggregates

Hadoop HDFS – Hourly Feeds

