
BIG DATA

BIG DATA. HADOOP. Background: the exponential growth of data – challenges for Google, Yahoo, Amazon & Microsoft in web search and indexing


Presentation Transcript


  1. BIG DATA HADOOP

  2. Background • The exponential growth of data: challenges for Google, Yahoo, Amazon & Microsoft in web search and indexing • The volume of data being made publicly available increases every year; future success will be dictated to a large extent by organizations’ ability to extract value from other organizations’ data. • Variety, Velocity and Volume of data – the three Vs • Data Storage & Analysis • The storage capacity of hard drives has increased, but access speeds have not kept up. A 1-terabyte disk is now the norm, but at a transfer speed of around 100 MB/s it takes more than two and a half hours to read all the data from a single disk – reading petabytes or zettabytes this way would take far too long. • The alternative: read from multiple disks in parallel
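The slide's arithmetic can be checked with a quick sketch (the 1 TB and 100 MB/s figures come from the slide; the function name and parallel-disk extension are illustrative):

```python
# Back-of-the-envelope check: time to read a full disk sequentially,
# and the speedup from spreading the data over many disks.

def read_time_hours(capacity_bytes, disks=1, speed_bytes_per_s=100 * 10**6):
    """Time to read `capacity_bytes` spread evenly over `disks` disks."""
    return capacity_bytes / disks / speed_bytes_per_s / 3600

print(round(read_time_hours(10**12), 2))             # one 1 TB disk: 2.78 hours
print(round(read_time_hours(10**12, disks=100), 4))  # 100 disks: 0.0278 hours (~100 s)
```

This is exactly the parallel-read idea the rest of the deck builds on: once data is spread across many disks, the problems become hardware failure and combining partial results.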

  3. Background • Data Storage & Analysis • Problems in reading from and writing to multiple disks • Multiple hardware pieces are prone to failure, so the probability of data loss is high • Solution for data loss: replication (RAID, for example, relies on redundant copies) • Data analysis needs to combine data from various elements, which brings further challenges • What is needed is a reliable shared storage and analysis system • Hello, Hadoop! • The NUTCH project by Doug Cutting • Google’s GFS & MapReduce: distributed data storage and processing • Continued as a development project at Yahoo • Doug Cutting’s Apache Hadoop open-source framework • “Hadoop” is a made-up name
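Why replication fixes the data-loss problem can be shown with a toy probability model (this is an illustrative sketch, not Hadoop code; the 1% failure probability is an assumed number, and 3 is HDFS's default replication factor):

```python
# Toy model: each disk holding a copy of a block fails independently
# with probability p over some period. A block is lost only if
# *every* replica's disk fails.

def loss_probability(p, replicas):
    return p ** replicas

p = 0.01  # assumed per-disk failure probability (illustrative)
single = loss_probability(p, 1)  # no replication: 1 in 100
triple = loss_probability(p, 3)  # 3 replicas: roughly 1 in a million
```

Each extra replica multiplies the loss probability by p, which is why a small replication factor buys large reliability gains on failure-prone commodity hardware.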

  4. Hadoop vs. Other Systems

  5. HADOOP ARCHITECTURE • HDFS sits on top of the existing file system • Designed for streaming data access patterns, very large files, and commodity hardware • Optimized for high throughput rather than low latency • Not well suited for: lots of small files, low-latency data access, multiple writers • MapReduce: 1) MAP 2) REDUCE 3) code for the MR job 4) automatic parallelization 5) fault tolerance • Jobs can be written in Java, Python, etc.; housekeeping is built in
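The MAP/REDUCE steps listed above can be sketched in miniature (an in-process word count for illustration; real Hadoop distributes the same three phases – map, shuffle, reduce – across a cluster):

```python
# Minimal sketch of the MapReduce programming model, run in one process.
from collections import defaultdict

def map_phase(record):             # MAP: emit (key, value) pairs per record
    for word in record.split():
        yield (word, 1)

def reduce_phase(key, values):     # REDUCE: combine all values for one key
    return (key, sum(values))

records = ["big data", "big hadoop"]

groups = defaultdict(list)         # shuffle: group emitted pairs by key
for rec in records:
    for k, v in map_phase(rec):
        groups[k].append(v)

result = dict(reduce_phase(k, vs) for k, vs in groups.items())
print(result)  # {'big': 2, 'data': 1, 'hadoop': 1}
```

The "automatic parallelization" point on the slide refers to the fact that the framework, not the programmer, decides how records are split among mappers and keys among reducers.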

  6. HDFS • HDFS block size is 64 MB–128 MB, far larger than the blocks of an ordinary disk file system • Why is it so large? To minimize the cost of seeks: with large blocks, the time spent transferring data dominates the time spent seeking to the start of each block • Architecture: a client talks to the Name Node (which holds the file-system metadata), assisted by a Secondary Name Node; many Data Nodes store the actual blocks and send heartbeats to the Name Node, which manages block replication and balancing
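The "why so large?" question has a simple quantitative answer, sketched below with assumed commodity-disk numbers (~10 ms seek, ~100 MB/s transfer; the function is illustrative, not part of Hadoop):

```python
# Fraction of each block's read time that is wasted on the initial seek.

def seek_overhead(block_mb, seek_ms=10, transfer_mb_per_s=100):
    transfer_ms = block_mb / transfer_mb_per_s * 1000
    return seek_ms / (seek_ms + transfer_ms)

print(round(seek_overhead(128), 4))  # 128 MB block: 0.0078 (under 1% overhead)
print(round(seek_overhead(1), 4))    # 1 MB block:   0.5    (seeks eat half the time)
```

With 128 MB blocks the disk spends almost all its time streaming data, which is what makes the "high throughput rather than low latency" design point on the previous slide achievable.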
