Big Data and the BI Wild West

Presentation Transcript


  1. Big Data and the BI Wild West: Don’t Bring an Elephant to a Gun Fight! Paul Groom

  2. Tools, Processes, Objectives

  3. Why Business Intelligence? Community, Acquire, View, Learn, Action

  4. What is Business Intelligence? Numbers, tables, charts, indicators. Time, history, lag. Access: to view (portal), to data, to depth. Control/secure. Consumption: digestion …with ease and simplicity

  5. Business [Intelligence] Desires: more timely, lower latency, richer data model, more granularity, more user interactions, self service

  6. View and generate

  7. Got mobile? 200 million employees bring their own device to work. 50% of BYOD organizations have had a security breach. Nearly half of the workforce will be made up of millennials by 2020. 1/3 have broken or would break corporate policy on BYOD.

  8. Data flow

  9. Disruption: data discovery tools. Dynamic access, unlimited drill-down

  10. BI tools have plateaued …again. Decision support (reporting) in the late ’90s …led to data mining. Business intelligence of the ’00s …is leading to analytics and data science.

  11. More math …a lot more math

  12. The drive for deeper understanding: Campaign Management, Machine learning algorithms, Dynamic Simulation, Behaviour modelling, Clustering, Statistical Analysis, Fraud detection, Reporting & BPM (chart axes: Analytical Complexity, Dynamic Interaction, Technology/Automation)

  13. Behind the numbers
      create external script LM_PRODUCT_FORECAST environment rsint
      receives ( SALEDATE DATE, DOW INTEGER, ROW_ID INTEGER, PRODNO INTEGER, DAILYSALES INTEGER )
      partition by PRODNO order by PRODNO, ROW_ID
      sends ( R_OUTPUT varchar )
      isolate partitions
      script S'endofr(
      # Simple R script to run a linear fit on daily sales
      # Read this partition's rows from stdin; SALEDATE becomes the row names
      prod1<-read.csv(file=file("stdin"), header=FALSE, row.names=1)
      colnames(prod1)<-c("DOW","ID","PRODNO","DAILYSALES")
      dim1<-dim(prod1)
      # Day-of-week profile: median sales per DOW, scaled so the shares sum to 1
      daily1<-aggregate(prod1$DAILYSALES, list(DOW = prod1$DOW), median)
      daily1[,2]<-daily1[,2]/sum(daily1[,2])
      # Remove the weekly pattern: divide each day's sales by its DOW share
      # (the positional lookup assumes DOW runs 0-6 with every day present)
      basesales<-array(0,c(dim1[1],2))
      basesales[,1]<-prod1$ID
      basesales[,2]<-(prod1$DAILYSALES/daily1[prod1$DOW+1,2])
      colnames(basesales)<-c("ID","BASESALES")
      # Linear trend of the adjusted sales against row position
      fit1<-lm(BASESALES ~ ID, as.data.frame(basesales))
      # Output array with room for 28 rows beyond the history
      forecast<-array(0,c(dim1[1]+28,4))
      colnames(forecast)<-c("ID","ACTUAL","PREDICTED","RESIDUALS")
      # … (remainder of the script is not shown on the slide)
      )endofr';

      -- Dynamic aggregate computed from detail rows at query time
      select sum(sales) from sales_history where year = 2006 and month = 5 and region = 1;

      -- Simple fetch of a precomputed value
      select total_sales from summary where year = 2006 and month = 5 and region = 1;

      -- Per year: accounts by number of transactions, running account total,
      -- spend in thousands, and ranks by account count and by spend
      select Trans_Year, Num_Trans,
             count(distinct Account_ID) Num_Accts,
             sum(count(distinct Account_ID)) over (partition by Trans_Year order by Num_Trans) Total_Accts,
             cast(sum(total_spend)/1000 as int) Total_Spend,
             cast(sum(total_spend)/1000 as int) / count(distinct Account_ID) Avg_Yearly_Spend,
             rank() over (partition by Trans_Year order by count(distinct Account_ID) desc) Rank_by_Num_Accts,
             rank() over (partition by Trans_Year order by sum(total_spend) desc) Rank_by_Total_Spend
      from (
          select Account_ID,
                 extract(year from Effective_Date) Trans_Year,
                 count(Transaction_ID) Num_Trans,
                 sum(Transaction_Amount) Total_Spend,
                 avg(Transaction_Amount) Avg_Spend
          from Transaction_fact
          where extract(year from Effective_Date) < 2009
            and Trans_Type = 'D'
            and Account_ID <> 9025011
            and actionid in (select actionid from DEMO_FS.V_FIN_actions where actionoriginid = 1)
          group by Account_ID, extract(year from Effective_Date)
      ) Acc_Summary
      group by Trans_Year, Num_Trans
      order by Trans_Year desc, Num_Trans;

      -- Departments with May 2006 sales above 50,000 (ISO date literals)
      select dept, sum(sales) from sales_fact
      where period between date '2006-05-01' and date '2006-05-31'
      group by dept
      having sum(sales) > 50000;
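A minimal Python sketch of what the slide-13 R script computes, assuming rows arrive as (PRODNO, DOW, ROW_ID, DAILYSALES) tuples. Each product is fitted on its own, mirroring the script's partition by PRODNO clause. The function names are hypothetical, this is a stdlib port rather than Kognitio's implementation, and the forecast step, which is cut off on the slide, is not reproduced.

```python
from collections import defaultdict
from statistics import median


def dow_shares(rows):
    """rows: (dow, row_id, sales). Median sales per day of week, scaled to sum to 1."""
    by_dow = defaultdict(list)
    for dow, _row_id, sales in rows:
        by_dow[dow].append(sales)
    medians = {dow: median(values) for dow, values in by_dow.items()}
    total = sum(medians.values())
    return {dow: m / total for dow, m in medians.items()}


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x, the model lm(y ~ x) fits."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_partition(rows):
    """One product's rows, ordered by row_id -> (intercept, slope)."""
    shares = dow_shares(rows)
    ids = [row_id for _dow, row_id, _sales in rows]
    # Remove the weekly pattern: divide each day's sales by its DOW share.
    base = [sales / shares[dow] for dow, _row_id, sales in rows]
    return linear_fit(ids, base)


def fit_all(records):
    """records: (prodno, dow, row_id, daily_sales). Each PRODNO is fitted
    independently, like 'partition by PRODNO order by PRODNO, ROW_ID'."""
    partitions = defaultdict(list)
    for prodno, dow, row_id, sales in records:
        partitions[prodno].append((dow, row_id, sales))
    return {
        prodno: fit_partition(sorted(rows, key=lambda r: r[1]))
        for prodno, rows in partitions.items()
    }
```

Unlike the R version's positional lookup (daily1[prod1$DOW+1,2]), the dictionary keyed by day of week does not require all seven days to appear in a partition. Because no partition depends on another, the per-product fits could be spread across processors.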

  14. It’s all about getting work done. Tasks evolving: it used to be a simple fetch of a value; then it was computing a dynamic aggregate; now it is complex algorithms! Bottlenecks.
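To make this progression concrete, a small sketch using Python's built-in sqlite3 module with hypothetical in-memory tables named after the slide-13 queries. The summary table is filled ahead of time, so the first query is a simple fetch, while the second computes the aggregate from detail rows on every call. The sample rows are made up for illustration.

```python
import sqlite3


def build_demo():
    """Create hypothetical detail and summary tables in memory."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE sales_history (year INT, month INT, region INT, sales INT);
        CREATE TABLE summary (year INT, month INT, region INT, total_sales INT);
        """
    )
    detail = [(2006, 5, 1, 120), (2006, 5, 1, 80), (2006, 5, 2, 50), (2006, 6, 1, 10)]
    con.executemany("INSERT INTO sales_history VALUES (?, ?, ?, ?)", detail)
    # Precompute the summary ahead of time (the "simple fetch" era).
    con.execute(
        "INSERT INTO summary "
        "SELECT year, month, region, SUM(sales) FROM sales_history "
        "GROUP BY year, month, region"
    )
    return con


def fetch_precomputed(con, year, month, region):
    """Read a value that was computed before the question was asked."""
    cur = con.execute(
        "SELECT total_sales FROM summary WHERE year = ? AND month = ? AND region = ?",
        (year, month, region),
    )
    return cur.fetchone()[0]


def aggregate_on_demand(con, year, month, region):
    """Compute the aggregate from detail rows at query time."""
    cur = con.execute(
        "SELECT SUM(sales) FROM sales_history WHERE year = ? AND month = ? AND region = ?",
        (year, month, region),
    )
    return cur.fetchone()[0]
```

The trade-off: the precomputed value is cheap to read but exists only at the grain chosen in advance, whereas the on-demand aggregate answers any filter but has to scan the detail rows each time.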

  15. Time to influence. Reaction: what? Potential value. Action: opportunity, interaction. BI is becoming democratized.

  16. BI Wild West Data

  17. Business [Intelligence] Desires in relation to Big Data: more timely, lower latency, richer data model, more granularity, more user interactions, self service

  18. The Data Warehouse?

  19. Realities

  20. Reports against the DW are just plain dull, boring even!

  21. And then came…

  22. Hadoop ticks many, but not all, of the boxes

  23. Stomped on costs. Made economies of scale practical.

  24. New economics = new attitude: just grab and retain all data; the data science team will dig into it later. Talk to the BI team about plugging into Hadoop: should be simple? Call IT: why is SQL so limited? No need to triage before storage. No need to pre-process before storage, i.e. no need to align to storage.
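The "no need to pre-process before storage" point is usually called schema-on-read. A minimal Python sketch, assuming comma-separated raw lines and a hypothetical three-field sales schema: lines are landed exactly as received, and types are applied only when a query reads them, so rows that do not fit are skipped at read time instead of being rejected at load time. The list used as a store is a stand-in for raw files, not a Hadoop API.

```python
import csv
from datetime import date

# Hypothetical schema: (column name, converter applied at read time).
SALES_SCHEMA = [
    ("sale_date", date.fromisoformat),
    ("prodno", int),
    ("daily_sales", int),
]


def ingest(store, raw_line):
    """Land a record exactly as received: no triage, no parsing."""
    store.append(raw_line)


def read_with_schema(store, schema):
    """Apply the schema only when reading; skip lines that don't fit it."""
    for fields in csv.reader(store):
        if len(fields) != len(schema):
            continue  # wrong shape for this schema; another reader may accept it
        try:
            row = {name: conv(value) for (name, conv), value in zip(schema, fields)}
        except ValueError:
            continue  # e.g. a non-numeric sales figure or a malformed date
        yield row
```

The cost moves rather than disappears: every query pays for parsing and validation, which is part of why raw Hadoop storage alone is slow for interactive BI.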

  25. Early bridge building: early Hadoop integration tools

  26. Wanted Dead or Alive: The No SQL Posse. SQL, the new bounty hunters: Drill, Impala, Pivotal, Stinger

  27. …but Hadoop is still too slow for interactive BI …loss of train of thought

  28. For once, technology is on our side …oh, and BTW, RAM is cheap!

  29. Hadoop is… Lots of these (disks), not so many of these (CPUs). Hadoop is inherently disk-oriented, typically with a low ratio of CPU to disk.

  30. ‘Flash’ washing is not the solution

  31. Analytics needs low latency, no I/O wait

  32. Analytical Platform Reference Architecture. Application & Client Layer: all BI tools, all OLAP clients, Excel. Analytical Platform Layer. Near-line Storage (optional). Reporting. Persistence Layer: Kognitio Storage, Cloud Storage, Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems.

  33. Cognos SQL MDX

  34. Reach out, actively select and pull back to consume

  35. “No SQL” graduates to “not-only-SQL”. SQL remains the preferred data access language …for the business community. SQL can encapsulate other processing: in-line Python, R, Java, etc. MPP everything: get more work done.
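As a small analogy for SQL encapsulating in-line Python, Python's sqlite3 module can register a Python function with Connection.create_function and then call it from inside a SQL statement. This is not Kognitio's mechanism (slide 13 shows its external-script approach); the spend_band function, its 1000 threshold and the txn table are hypothetical.

```python
import sqlite3


def spend_band(total):
    """Hypothetical business rule written in Python rather than SQL."""
    if total is None:
        return None
    return "high" if total >= 1000 else "low"


def accounts_by_band(transactions):
    """transactions: iterable of (account_id, amount) pairs."""
    con = sqlite3.connect(":memory:")
    try:
        # Register the Python function so SQL can call it like a built-in.
        con.create_function("spend_band", 1, spend_band)
        con.execute("CREATE TABLE txn (account_id INTEGER, amount REAL)")
        con.executemany("INSERT INTO txn VALUES (?, ?)", transactions)
        rows = con.execute(
            """
            SELECT spend_band(total) AS band, COUNT(*) AS n_accounts
            FROM (SELECT account_id, SUM(amount) AS total
                  FROM txn GROUP BY account_id)
            GROUP BY spend_band(total)
            ORDER BY band
            """
        ).fetchall()
    finally:
        con.close()
    return rows
```

The business community keeps writing SQL, while the logic inside spend_band can be anything Python can express.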

  36. Discovery Production

  37. Big Data + Hadoop + in-memory for BI

  38. Wild West, 1865 to 1890. "The Significance of the Frontier in American History" (1893), a thesis by Frederick Jackson Turner. The West not as a particular geographic place, but as a frontier process: a series of Wests on a receding frontier line, the point where savagery meets civilization. For Turner, American history was largely a tale of people leaving settled areas for the frontier, and of their struggle to survive in new lands.

  39. Driving the golden spike for Hadoop and BI

  40. Connect / contact: kognitio.com. Paul Groom, Chief Innovation Officer, paul.groom@kognitio.com. kognitio.tel, kognitio.com/blog. Michael Hiskey, VP, Marketing & Business Development, michael.hiskey@kognitio.com. twitter.com/kognitio, linkedin.com/companies/kognitio. Steve Friedberg (press contact), MMI Communications, steve@mmicomm.com. tinyurl.com/kognitio, youtube.com/kognitio. Kognitio is a Platinum Sponsor of the Hadoop Summit: see us at booth #31, center!
