
An introduction to HDInsight

An introduction to HDInsight. Edinson Medina, SR PFE for Data and AI, Microsoft Services. Who Am I? Edinson Medina, SR PFE, Data and AI Domain, Microsoft Services UK, Venezuelan, @sqldixitox, https://www.linkedin.com/in/edinsonmedina/. Roles in the room? What is Big Data?


Presentation Transcript


  1. An introduction to HDInsight Edinson Medina SR PFE for Data and AI Microsoft Services

  2. Who Am I?
     • Edinson Medina, SR PFE, Data and AI Domain, Microsoft Services UK
     • Venezuelan
     • @sqldixitox
     • https://www.linkedin.com/in/edinsonmedina/

  3. Roles in the room?

  4. What is Big Data?
     • Data that is too large or complex for analysis in traditional relational databases
     • Typified by the “3 V’s”:
       • Volume: huge amounts of data to process, which could be TBs, PBs or EBs
       • Variety: a mixture of structured, semi-structured and unstructured data
       • Velocity: new data generated extremely frequently (stream processing, real time, batch)
     • Example scenarios: sensor and IoT processing, web server click-streams, social media sentiment analysis

  5. What is Hadoop?
     • Big Data is not the same as Hadoop
     • What is the MapReduce process?
     • What is HDFS?
     • MapReduce engine vs Tez engine
     [Diagram: a Hadoop cluster with a head node and worker nodes counts the words in "Map Reduce can Map and Reduce data", stored in HDFS; map tasks emit pairs such as can:1, Map:1, Reduce:1, and the reduce step outputs Map:2, Reduce:2, can:1, and:1, data:1]
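The word-count diagram on this slide can be sketched in plain Python. This is a single-process illustration, not Hadoop code: `map_phase`, `shuffle` and `reduce_phase` are hypothetical helpers standing in for the map tasks, the framework's shuffle step and the reduce tasks that Hadoop would spread across worker nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key, as Hadoop does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each input split would be mapped on a different worker node.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["Map Reduce can Map and Reduce data"]))
    # → {'Map': 2, 'Reduce': 2, 'can': 1, 'and': 1, 'data': 1}
```

On a real cluster the map tasks typically run on the nodes that hold the HDFS blocks, so only the grouped pairs need to travel over the network to the reducers.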

  6. set hive.execution.engine=mr;
     SELECT…
     set hive.execution.engine=tez;
     SELECT…
     [Diagram: the Map and Reduce stages produced for the same query under each execution engine]

  7. What is HDInsight?
     • Microsoft’s Hadoop distribution
     • Powered by the cloud
     • 100% Apache Hadoop
     • Immersive insights

  8. Hadoop ecosystem in HDInsight
     • Distributed storage (HDFS)
     • YARN
     • Distributed processing (MapReduce or Tez)
     • Query (Hive)
     • Scripting (Pig)
     • Metadata (HCatalog)
     • NoSQL database (HBase)
     • Streaming (Storm)
     • Spark
     • Graph (Pegasus)
     • Stats processing (RHadoop)
     • Machine learning (Mahout)
     • Pipeline / workflow (Oozie)
     • Data integration (ODBC / Sqoop / REST)
     • Log file aggregation (Flume)
     • Active Directory (Ranger)
     • Business intelligence (Excel, Power View, SSAS…)
     • System Center (Future)

  9. Hive: A metadata service that projects tabular schemas over folders
     • Enables the contents of folders to be queried as tables, using SQL-like query semantics
     • Queries are translated into jobs
     • Execution engine can be Tez or MapReduce
     SELECT…
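Schema on read, as Hive applies it, can be illustrated with a small Python sketch. Here a "table" is only a folder of comma-delimited files plus a list of column names and types; `scan_table` and the sample file names are hypothetical, and the sketch ignores Hive's metastore, SerDes and query compiler. Because the schema is applied only when rows are read, files can be dropped into the folder without any load step.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, Iterator, List, Tuple

# A table definition: ordered (column name, type converter) pairs.
Schema = List[Tuple[str, Callable[[str], object]]]


def scan_table(folder: str, schema: Schema, delimiter: str = ",") -> Iterator[Dict[str, object]]:
    """Read every file in the folder and apply the schema at read time (schema on read)."""
    for name in sorted(os.listdir(folder)):
        with open(os.path.join(folder, name), newline="") as f:
            for fields in csv.reader(f, delimiter=delimiter):
                # Values are typed only now, as each row is read.
                yield {col: cast(value) for (col, cast), value in zip(schema, fields)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Two files in the same folder form one logical table.
        with open(os.path.join(folder, "part-0.csv"), "w") as f:
            f.write("london,12\nparis,7\n")
        with open(os.path.join(folder, "part-1.csv"), "w") as f:
            f.write("london,5\n")
        schema: Schema = [("city", str), ("visits", int)]
        # Roughly: SELECT city, SUM(visits) FROM t GROUP BY city
        totals: Dict[str, int] = {}
        for row in scan_table(folder, schema):
            totals[row["city"]] = totals.get(row["city"], 0) + row["visits"]
        print(totals)  # → {'london': 17, 'paris': 7}
```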

  10. Pig performs a series of transformations on data relations based on Pig Latin statements
     • Relations are loaded using schema-on-read semantics to project a table structure at runtime
     • You can run Pig Latin statements interactively in the Grunt shell, or save them in a script file and run it as a batch
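The series of transformations on a relation can be mirrored step by step in Python. Each step below is commented with a roughly equivalent Pig Latin statement; the input format, field names (`user`, `bytes`) and function names are hypothetical, and real Pig compiles such statements into MapReduce or Tez jobs rather than running them in one process.

```python
from itertools import groupby
from typing import Callable, Dict, List, Tuple

Schema = List[Tuple[str, Callable[[str], object]]]


def load(lines: List[str], schema: Schema) -> List[Dict[str, object]]:
    """Project a schema onto raw comma-delimited lines at read time."""
    # logs = LOAD 'input' USING PigStorage(',') AS (user:chararray, bytes:int);
    return [
        {name: cast(value) for (name, cast), value in zip(schema, line.split(","))}
        for line in lines
    ]


def bytes_per_heavy_user(lines: List[str], threshold: int = 100) -> List[Tuple[str, int]]:
    logs = load(lines, [("user", str), ("bytes", int)])
    # big = FILTER logs BY bytes > 100;
    big = [t for t in logs if t["bytes"] > threshold]
    # by_user = GROUP big BY user;   (itertools.groupby needs input sorted by the key)
    big.sort(key=lambda t: t["user"])
    by_user = [(user, list(rows)) for user, rows in groupby(big, key=lambda t: t["user"])]
    # totals = FOREACH by_user GENERATE group, SUM(big.bytes);
    return [(user, sum(t["bytes"] for t in rows)) for user, rows in by_user]


if __name__ == "__main__":
    print(bytes_per_heavy_user(["alice,150", "bob,50", "alice,300", "bob,200"]))
    # → [('alice', 450), ('bob', 200)]
```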

  11. Oozie: A workflow engine for actions in a Hadoop cluster
     • MapReduce
     • Hive
     • Pig
     • Others
     • Supports parallel workstreams and conditional branching
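The control flow an Oozie workflow describes (action nodes, a fork/join for parallel work, and a decision node for conditional branching) can be sketched in Python. Oozie itself defines workflows in XML and runs the actions on the cluster; the action functions here are hypothetical placeholders that only record their names, and the `row_count` variable driving the decision is an assumed example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Action = Callable[[Context], None]


# Hypothetical actions: in Oozie these would be Sqoop, Hive or Pig actions.
def sqoop_import(ctx: Context) -> None:
    ctx["log"].append("sqoop-import")


def hive_clean(ctx: Context) -> None:
    ctx["log"].append("hive-clean")


def pig_enrich(ctx: Context) -> None:
    ctx["log"].append("pig-enrich")


def publish(ctx: Context) -> None:
    ctx["log"].append("publish")


def alert(ctx: Context) -> None:
    ctx["log"].append("alert")


def fork_join(actions: List[Action], ctx: Context) -> None:
    """Fork: start the actions in parallel. Join: wait until every branch has finished."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(action, ctx) for action in actions]
        for future in futures:
            future.result()  # re-raises the error if a branch failed


def run_workflow(ctx: Context) -> List[str]:
    sqoop_import(ctx)                         # action node
    fork_join([hive_clean, pig_enrich], ctx)  # fork ... join
    # Decision node: pick the next action from a workflow variable.
    if ctx.get("row_count", 0) > 0:
        publish(ctx)
    else:
        alert(ctx)
    return ctx["log"]


if __name__ == "__main__":
    print(run_workflow({"log": [], "row_count": 42}))
```

The two forked actions may finish in either order, so only their membership in the log is fixed, not their position.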

  12. Sqoop is a database integration service
     • Built on open source Hadoop technology
     • Enables bi-directional data transfer between Hadoop clusters and databases via JDBC
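A Sqoop import is typically parallelised by splitting the table on a key column and giving each map task one key range. The Python sketch below imitates that idea for the import direction only, with the standard-library `sqlite3` module standing in for a JDBC source and a dictionary standing in for HDFS output files; `import_table`, `split_ranges` and the `customers` table are hypothetical, and this is not Sqoop's own code.

```python
import sqlite3
from typing import Dict, List, Tuple


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into at most num_mappers contiguous slices."""
    size = -(-(hi - lo + 1) // num_mappers)  # ceiling division
    return [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]


def import_table(conn: sqlite3.Connection, table: str, key: str, num_mappers: int = 2) -> Dict[str, List[str]]:
    """Copy a table into comma-delimited 'part files', one per key range (one per mapper)."""
    # Table and column names are interpolated directly, so they must come from trusted config.
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    if lo is None:  # empty table: nothing to import
        return {}
    output: Dict[str, List[str]] = {}
    for i, (start, end) in enumerate(split_ranges(lo, hi, num_mappers)):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}", (start, end)
        ).fetchall()
        # Named like the map-task output files (part-m-00000, ...) that Sqoop writes to HDFS.
        output[f"part-m-{i:05d}"] = [",".join(str(v) for v in row) for row in rows]
    return output


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "ana"), (2, "ben"), (3, "cy"), (4, "dee"), (5, "eli")],
    )
    print(import_table(conn, "customers", "id"))
    # → {'part-m-00000': ['1,ana', '2,ben', '3,cy'], 'part-m-00001': ['4,dee', '5,eli']}
```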

  13. HBase: A low-latency NoSQL database built on Hadoop
     • Modeled on Google’s BigTable
     • HBase stores data in StoreFiles on HDFS
     [Diagram: HBase layered on top of HDFS]
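HBase inherits BigTable's logical data model: rows sorted by row key, columns grouped into column families, and multiple timestamped versions per cell. The toy Python class below illustrates only that model; `TinyTable` and the sample keys are hypothetical, and the sketch ignores how HBase actually persists data (in-memory stores flushed to StoreFiles on HDFS) and splits tables into regions.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyTable:
    """Toy BigTable/HBase-style table: row key -> 'family:qualifier' -> [(timestamp, value)]."""

    def __init__(self, families: List[str]):
        self.families = set(families)  # column families are declared up front
        self.rows: Dict[str, Dict[str, List[Tuple[int, str]]]] = {}
        self.sorted_keys: List[str] = []  # row keys kept in sorted order

    def put(self, row: str, column: str, value: str, ts: int) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            bisect.insort(self.sorted_keys, row)
            self.rows[row] = {}
        # Qualifiers are free-form: any row may add new columns within a family.
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(reverse=True)  # newest version first

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: str, stop: str) -> List[str]:
        """Return row keys in [start, stop), walking the sorted key space."""
        lo = bisect.bisect_left(self.sorted_keys, start)
        hi = bisect.bisect_left(self.sorted_keys, stop)
        return self.sorted_keys[lo:hi]


if __name__ == "__main__":
    t = TinyTable(["info"])
    t.put("user#2", "info:name", "Bo", ts=1)
    t.put("user#1", "info:name", "Ann", ts=1)
    t.put("user#1", "info:name", "Anna", ts=2)
    print(t.get("user#1", "info:name"))  # → Anna
    print(t.scan("user#", "user#9"))     # → ['user#1', 'user#2']
```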

  14. What is NoSQL?
     • A type of database
     • Doesn’t use the relational model
     • Good fit for distributed environments
     NoSQL has very little to do with SQL (Structured Query Language); it should have been called “Not Only Relational Databases”.
     • Schema-less / schema-free
     • Focuses on performance over consistency

  15. What is a stream of data?
     [Diagram: a continuous sequence of binary event records]
     • An unbounded sequence of event data
     • Stream processing is continuous
     • Aggregation is based on temporal windows
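Windowed aggregation over an unbounded stream can be sketched with a Python generator. This version uses tumbling (fixed, non-overlapping) windows and assumes integer timestamps arriving in order; `tumbling_windows` is a hypothetical name, and the sketch does not handle late or out-of-order events, which real stream processors must deal with.

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple


def tumbling_windows(events: Iterable[Tuple[int, str]], size: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, counts per key) each time a window closes."""
    current: Optional[int] = None
    counts: Counter = Counter()
    for ts, key in events:
        start = ts - ts % size  # start of the window this event falls into
        if current is not None and start != current:
            yield current, dict(counts)  # a newer event closes the previous window
            counts = Counter()
        current = start
        counts[key] += 1
    if current is not None:
        yield current, dict(counts)  # only reached if the input ever ends


if __name__ == "__main__":
    # An endless stream: one "tick" event per second; take the first three windows.
    ticks = ((t, "tick") for t in itertools.count())
    for window in itertools.islice(tumbling_windows(ticks, 10), 3):
        print(window)
```

Because the generator consumes events lazily, it works on an endless source: each result is emitted as soon as the first event of the next window arrives.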

  16. Storm: An event processor for data streams
     • Defines a streaming topology that consists of:
       • Spouts: consume data sources and emit streams that contain tuples
       • Bolts: operate on tuples in streams
     • Storm topologies run continuously on streams of data
       • Real-time monitoring
       • Event aggregation and logging
     [Diagram: a spout feeding a bolt]
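The spout/bolt structure of a Storm topology can be mimicked with chained Python generators. This single-process sketch is not the Storm API: a real topology is submitted to a Storm cluster, which runs spouts and bolts as parallel tasks. The spout's hard-coded sentences and all function names are hypothetical.

```python
import itertools
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[Tuple[str]]:
    """Spout: reads a (simulated) source and emits an endless stream of 1-field tuples."""
    source = ["the cow jumped", "the moon"]
    for sentence in itertools.cycle(source):
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: splits each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps a running count per word and emits (word, count) after each update."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def topology() -> Iterator[Tuple[str, int]]:
    # spout -> split bolt -> count bolt; runs until the consumer stops pulling tuples
    return count_bolt(split_bolt(sentence_spout()))


if __name__ == "__main__":
    for output in itertools.islice(topology(), 5):
        print(output)
```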

  17. Spark: A fast, general-purpose computation engine that supports in-memory operations
     • A unified stack for interactive, streaming, and predictive analysis
     • Can run in Hadoop clusters

  18. Kafka, R Server and Mahout
     • Kafka: An open-source platform used for building streaming data pipelines and applications. Kafka also provides message-queue functionality that allows you to publish and subscribe to data streams
     • R Server: A server for hosting and managing parallel, distributed R processes. It gives data scientists, statisticians, and R programmers on-demand access to scalable, distributed methods of analytics
     • Mahout: A library of machine learning algorithms to execute on data in HDFS
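Kafka's publish/subscribe model rests on an append-only log per topic, with each consumer group tracking its own read position (offset). The Python sketch below models only that idea for a single in-memory partition; `TinyBroker`, `publish` and `poll` are hypothetical names, not the Kafka client API, and there is no persistence, partitioning or replication.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TinyBroker:
    """Toy publish/subscribe log in the spirit of a Kafka topic (one partition, in memory)."""

    def __init__(self) -> None:
        self.logs: Dict[str, List[str]] = defaultdict(list)            # topic -> append-only log
        self.offsets: Dict[Tuple[str, str], int] = defaultdict(int)    # (group, topic) -> next offset

    def publish(self, topic: str, message: str) -> int:
        """Append a message and return its offset in the topic log."""
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1

    def poll(self, group: str, topic: str, max_messages: int = 10) -> List[str]:
        """Read from the group's current offset, then commit the new position."""
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = TinyBroker()
    for event in ["click:home", "click:cart", "click:checkout"]:
        broker.publish("clicks", event)
    print(broker.poll("analytics", "clicks", max_messages=2))  # first two events
    print(broker.poll("analytics", "clicks"))                  # resumes at offset 2
    print(broker.poll("audit", "clicks"))                      # separate group reads all three
```

Because reading does not remove messages, several independent subscribers can consume the same stream at their own pace.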

  19. So, do you need big data?
     • Are your data volumes truly “big”?
       • Often we limit how much data we save
       • Are you collecting enough?
       • Is it needed?
     • Do you need to constantly accommodate new data?
     • Is your business transactional only?
     • How will you benefit from it?
     • Are you ready for it?
       • You will need to filter through the noise
       • Skills and expertise

  20. Demo
     • Create a Hadoop cluster in Azure HDInsight
     • Process big data with Hive
     • Connect using Power BI Desktop

  21. Questions?

  22. Just like Jimi Hendrix… we love to get feedback! Please complete the session feedback forms.

  23. SQLBits - It's all about the community... Please visit the Community Corner. This year we are trying to get more people to learn about the SQL community, so we’d really appreciate it if you could stop by.
