Harnessing the Power of Hadoop: Cloud Scale with Microsoft Azure HDInsight

Harnessing the Power of Hadoop: Cloud Scale with Microsoft Azure HDInsight. Lance Olson Partner Group Program Manager. BRK2557. Agenda. Big data and traditional data warehouse Big data in the cloud Cloud versus on-premises Patterns and case studies HDInsight workloads.

Presentation Transcript


  1. Harnessing the Power of Hadoop: Cloud Scale with Microsoft Azure HDInsight. Lance Olson, Partner Group Program Manager. BRK2557.

  2. Agenda: Big data and traditional data warehouse; Big data in the cloud; Cloud versus on-premises; Patterns and case studies; HDInsight workloads.

  3. Big Data vs Traditional DW

  4. Two Approaches to Information Management for Analytics: Top-Down + Bottom-Up. [Diagram: value plotted against difficulty, rising from information to optimization: Descriptive Analytics (What happened?), Diagnostic Analytics (Why did it happen?), Predictive Analytics (What will happen?), Prescriptive Analytics (How can we make it happen?). Top-Down (Deductive): Theory, Hypothesis, Observation, Confirmation. Bottom-Up (Inductive): Observation, Pattern, Hypothesis, Theory.]

  5. Data Warehousing Uses a Top-Down Approach. Steps: Understand Corporate Strategy; Gather Requirements (Business Requirements, Technical Requirements); Implement Data Warehouse (Reporting & Analytics Design, Reporting & Analytics Development; Dimension Modelling, Physical Design; ETL Design, ETL Development; Setup Infrastructure, Install and Tune). [Diagram: data sources (OLTP, ERP, CRM, LOB), ETL, data warehouse, BI and analytics (dashboards, reporting).]

  6. The “data lake” Uses a Bottom-Up Approach. Ingest all data regardless of requirements; store all data in native format without schema definition; do analysis using analytic engines like Hadoop. [Diagram: sources (web, social, sensors, devices, LOB applications, relational, video, clickstream) feeding batch queries, interactive queries, real-time analytics, machine learning, and the data warehouse.]
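The "store in native format, apply structure later" idea on slide 6 can be sketched in a few lines of Python. This is only an illustration of schema-on-read in general, not anything shown in the talk: the records, field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw events land in the "lake" exactly as produced; nothing is validated on write.
# (Hypothetical sample records, for illustration only.)
RAW_LAKE = [
    '{"source": "sensor", "device": "d1", "temp_c": 21.5}',
    '{"source": "clickstream", "user": "u7", "url": "/home"}',
    '{"source": "sensor", "device": "d2", "temp_c": 19.0, "humidity": 0.4}',
]


def read_with_schema(raw_lines, source, fields):
    """Project raw records onto a schema that is chosen at query time."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if record.get("source") != source:
            continue
        # Fields a record lacks become None rather than causing a rejection.
        rows.append({name: record.get(name) for name in fields})
    return rows


sensor_rows = read_with_schema(RAW_LAKE, "sensor", ["device", "temp_c", "humidity"])
print(sensor_rows)
```

The same raw lines could be read again later with a different field list, which is the practical difference from a warehouse table whose columns are fixed before loading.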

  7. Data Lake + Data Warehouse: Better Together. Questions: What happened? What is happening? Why did it happen? What are key relationships? What will happen? What if? How risky is it? What should happen? What is the best option? How can I optimize? [Diagram: data sources (OLTP, ERP, CRM, LOB), ETL, data warehouse, BI and analytics (dashboards, reporting); web, social, sensors, devices, LOB applications, relational, video, clickstream.]

  8. Big Data in the Cloud

  9. Why Cloud + Big Data? Data of all volume, variety, and velocity; massive compute and storage; deployment expertise; speed; economics; scale; always up, always on; time to value; open and flexible.

  10. Why Microsoft Azure? Services shown: ML, Search, Data Factory, Event Hubs, Database, HDInsight, Stream Analytics, DocumentDB, Azure Storage; also appliances, software, and on-premises servers. Azure facts: >4 trillion objects in Azure; 300,000-1M+ requests per second; double compute and storage every 6 months.

  11. Introducing Azure HDInsight: Microsoft’s cloud Hadoop offering. 100% open source Apache Hadoop; built on the latest releases across Hadoop (2.6); up and running in minutes with no hardware to deploy; harness existing .NET and Java skills; utilize familiar BI tools for analysis, including Microsoft Excel.

  12. Hadoop Is Being Run Everywhere in the World

  13. Cloud and On-Premises “vs” or “+”?

  14. Cloud + On-Premises Hybrid Scenarios: development, testing, and pilot; IoT applications; other Azure services such as BI / ML; on-premises.

  15. Use Cases: Let the data decide

  16. Use Cases: Patterns and Case Studies

  17. Rockwell Automation has partnered with one of the six oil and gas supermajors to build unmanned, internet-connected gas dispensers. Each dispenser emits real-time management metrics, which makes it possible to detect anomalies and predict when proactive maintenance is needed. Sensor data: stored every 5 minutes; temperature, pressure, vibration, etc.; tens of thousands of data points per second. [Architecture diagram: Azure HDInsight (Hive, Pig), Data Factory, Azure Blobs, Azure SQL DB, Power BI for O365, Mobile Notification Hub, real-time notification, mobile device.]

  18. JustGiving wanted to harness the power of their data by using network science to map people’s connections and relationships so that they could connect people with the causes they care about. Based on 15 years of data, the JustGiving GiveGraph is the world’s largest ecosystem of giving behavior. It contains more than 81 million person nodes, thousands of causes, and 285 million connections, and is the engine that drives JustGiving’s social platform, enabling levels of personalization and engagement that a traditional infrastructure would be unable to deliver. [Architecture diagram: Activity Feeds, Give Graph, Agent, Azure HDInsight, Azure Blobs, SQL Server, On-premises, Real-time Event, Service Bus, Azure Tables, Azure Cache, Website + Event store, Web API (serves results).]

  19. Common Hadoop Patterns. Single view of entity: customer, product, machine, etc. Predictive analytics: data scientists and analysts finding patterns and correlations; new models emerge to explain business performance; new predictions emerge based on previously disassociated data. Data discovery: large amounts of machine, sensor, clickstream, and geolocation data; new value emerges when correlated with data from product, customer, and inventory catalogs.

  20. HDInsight Workloads

  21. HDInsight Supports Hive: SQL-like queries on Hadoop data in HDInsight. HDInsight provides an easy-to-use graphical query interface for Hive. HiveQL is a SQL-like language (subset of SQL). Hive structures include well-understood database concepts such as tables, rows, columns, and partitions. Queries are compiled into MapReduce jobs that are executed on Hadoop. Dramatic performance gains with Stinger/Tez: Stinger is a Microsoft, Hortonworks, and OSS-driven initiative to bring interactive queries to Hive; it brings query execution engine technology from Microsoft SQL Server to Hive (a Microsoft contribution to Apache code); performance gains up to 100x. [Chart, sample query: 1400s, 44.3s (32x speedup), 35.1s (40x speedup), 15s (100x speedup); labels: Hive 10, HDP 1.3 / Hive 11, HDP 2.0 (Hadoop 2.0), HDP 2.1.]
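The speedup figures on slide 21 can be checked with simple arithmetic. The sketch below assumes the 1400s timing is the baseline and that the remaining timings pair with the release labels in the order they appear in the transcript; that pairing is an inference from the flattened slide text, not something the source states.

```python
# Timings copied from the slide. Which release each timing belongs to is an
# assumption based on the order of the labels in the transcript.
BASELINE_SECONDS = 1400.0
timings = {
    "HDP 1.3 / Hive 11": 44.3,
    "HDP 2.0": 35.1,
    "HDP 2.1": 15.0,
}


def speedup(baseline, seconds):
    """Return how many times faster a run is than the baseline run."""
    return baseline / seconds


speedups = {label: speedup(BASELINE_SECONDS, s) for label, s in timings.items()}
for label, factor in speedups.items():
    print(f"{label}: {factor:.1f}x")
```

Under that assumption the first two ratios round to the slide's 32x and 40x, while the last works out to roughly 93x, which the slide presents as 100x.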

  22. HDInsight Supports HBase: NoSQL database on data in HDInsight. Columnar NoSQL database; runs on top of the Azure Blob stores in HDInsight; provides flexibility in that new columns can be added to column families at any time. [Architecture diagram: HMaster, Coordination, Name Node, Job Tracker; four Region Servers, each alongside a Data Node and a Task Tracker.]
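Slide 22's point that new columns can be added to a column family at any time can be illustrated with a toy in-memory model. This is not the HBase client API; the `ColumnFamilyTable` class, the row key, and the column names are hypothetical and exist only to show the data layout (row key, then family, then column qualifier).

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy in-memory model of a column-family table (not the HBase API)."""

    def __init__(self, families):
        # Column families are fixed when the table is created.
        self.families = set(families)
        # Layout: row key -> family -> qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers (columns) are never declared in advance.
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key, family):
        return dict(self.rows[row_key][family])


table = ColumnFamilyTable(["metrics"])
table.put("pump-001", "metrics", "temperature", 71.2)
table.put("pump-001", "metrics", "vibration", 0.03)  # new column, no schema change
print(table.get("pump-001", "metrics"))
```

Writing to an undeclared family raises an error in this model, while writing a new qualifier inside an existing family simply works, which mirrors the flexibility the slide describes.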

  23. Storm for Azure HDInsight: stream analytics for near-real-time processing. Consumes millions of real-time events from a scalable event broker (e.g., Apache Kafka, Azure Event Hubs); performs time-sensitive computation; outputs to persistent stores, dashboards, or devices; customizable with Java + .NET; deeply integrated with Visual Studio. [Pipeline diagram. Event producers: applications, devices, sensors, web and social. Collection: cloud gateways (web APIs), field gateways. Event queuing system: Event Hubs, Kafka / RabbitMQ / ActiveMQ. Transformation: stream processing with Apache Storm on HDInsight, Azure Stream Analytics. Long-term storage: storage adapters, HDFS, HBase, Azure DBs, Azure storage. Presentation and action: web/thick client dashboards, live dashboards, search and query, data analytics (Excel), devices to take action.]
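The consume, compute, output flow on slide 23 can be sketched as a chain of Python generators. This is plain Python rather than the Storm API, and the event values, the threshold, and the function names are hypothetical; a real topology would read from a broker such as Kafka or Event Hubs and run continuously.

```python
from collections import Counter


def event_source():
    """Stands in for an event broker; yields (device_id, temperature) tuples."""
    yield from [("d1", 70.0), ("d2", 98.5), ("d1", 71.0), ("d2", 99.1), ("d3", 65.2)]


def flag_hot(events, threshold):
    """Transformation stage: pass along only readings above the threshold."""
    for device_id, temperature in events:
        if temperature > threshold:
            yield device_id, temperature


def count_by_device(events):
    """Aggregation stage: count flagged readings per device."""
    counts = Counter()
    for device_id, _ in events:
        counts[device_id] += 1
    return dict(counts)


alerts = count_by_device(flag_hot(event_source(), threshold=90.0))
print(alerts)
```

Each stage handles one event at a time and hands it on, so nothing waits for the full data set to arrive; that is the property the slide's "time-sensitive computation" relies on.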

  24. Azure HDInsight running Linux: choice of Windows or Linux clusters; managed and supported by Microsoft; re-use common tools, documentation, and samples from the Hadoop/Linux ecosystem; add Hadoop projects that were authored on Linux to HDInsight; easier transition from on-premises to cloud.

  25. Microsoft Makes Hadoop Easier: Deep Visual Studio Integration. Debug Hive jobs through YARN logs or troubleshoot Storm topologies; visualize Hadoop clusters, tables, and storage; submit Hive queries and Storm topologies (C# or Java spouts/bolts); IntelliSense for authoring Hive jobs and Storm business logic.

  26. Introducing Azure Data Lake: a hyper-scale repository for big data analytic workloads. Built for Hadoop; enterprise ready; hyper scale, massive throughput. Sign up: http://azure.com/datalake

  27. Please evaluate this session. Your feedback is important to us! Visit Myignite at http://myignite.microsoft.com or download and use the Ignite Mobile App with the QR code above.
