Big Data - PowerPoint PPT Presentation

Presentation Transcript

  1. Big Data
     Santi Apichairojkul, System Consultant

  2. Confidential

  3. • 35 ZB: By 2020, the digital universe will be 44 times larger than it was in 2009
     • 61% of executives surveyed want more information when making a decision
     • 5-6%: Productivity boost realized by companies that use data-directed decision-making
     • 83% of mid-market CIOs surveyed identified analytics as their top-priority investment area
     • 2X: IDC estimates that the digital universe will double every 18 months
     • #1: Ranking of Analytics & BI in the Gartner CIO survey of top technology priorities for 2012
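
The growth figures on slide 3 can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted on the slide (35 ZB, 44 times, 2009 to 2020); the function names are illustrative, and steady exponential growth over the whole period is an assumption, not something the slide states.

```python
import math


def implied_baseline_zb(final_size_zb: float, growth_factor: float) -> float:
    """Size at the start of the period, given the final size and total growth."""
    return final_size_zb / growth_factor


def implied_doubling_months(growth_factor: float, years: float) -> float:
    """Doubling time (in months) that yields growth_factor over the given years,
    assuming constant exponential growth (an assumption of this sketch)."""
    return 12.0 * years / math.log2(growth_factor)


# Figures taken from the slide: 35 ZB in 2020, 44 times the 2009 size.
baseline = implied_baseline_zb(35.0, 44.0)
doubling = implied_doubling_months(44.0, 2020 - 2009)
print(f"Implied 2009 size: {baseline:.2f} ZB")
print(f"Implied doubling time: {doubling:.1f} months")
```

Under these assumptions, 44-fold growth over eleven years works out to a doubling time of roughly two years, so the 44X figure and the 18-month doubling estimate on the same slide presumably describe different forecasts or periods.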

  4. What’s Big Data?

  5. BIG Data?
     • Hardware and software technologies for managing big volumes of data
     • Datasets whose size is beyond the ability of typical database software tools
     • Focus on Web 2.0 Technologies
     • Database Scale-out
     • Relational Data Analytics
     • Distributed Data Analytics
     • Distributed File Systems
     • Real Time Analytics

  6. What’s Big Data?
     • Velocity: The speed at which the data must be processed and a decision made
     • Variety: The range of data types and structures
     • Volume: A large amount of data, growing at large rates

  7. The ‘Big Data’ Phenomenon
     • Big Data Drivers
        • The proliferation of data capture and creation technologies
        • Increased “interconnectedness” drives consumption (creating more data)
        • Inexpensive storage makes it possible to keep more, longer
        • Innovative software and analysis tools turn data into information
     • Cycle: More Devices, More Content, More Consumption, New & Better Information
     • Every gigabyte of stored content can generate a petabyte or more of transient data*
     • The information about you is much greater than the information you create
     Big Data encompasses not only the content itself, but how it’s consumed
     *Source: IDC 2011

  8. Big Data Solution Requirements
     • Cost-effectively manage the volume, variety and velocity of data
     • Process and analyze large, complex data sets… quickly
     • Flexibly adapt to context changes and new data types

  9. • Big Data Retention Solutions
     • Big Data Analytics Solutions

  10. Dell Big Data Retention Solutions

  11. Big Data Retention Solution: The Dell Big Data Retention Solution
      The data sources and tools
      • Reduce
         • Size: Massive patented de-dupe and compression, typically 95-97% storage capacity savings
         • Hardware: Low-cost Dell servers and storage
         • Resources: Eliminates requirements for specialized skillsets, infrastructure, and services
      • Retain
         • Preserve: Maintains record volumes in original format
         • Immutable: Tamper-proof WORM & audit trails
         • Configurable: User-configurable retention policies
         • Massively Scalable: With no complexity
         • Longevity: Long-term optimized systems
      • Retrieve
         • Standards: SQL & BI tools via ODBC/JDBC
         • Performant: Fast queries for large complex datasets
         • Flexible: With schema evolution & point-in-time access

  12. RainStor Leads Industry with 40X Compression
      [Bar chart: compression ratio vs. raw data, axis 0 to 50. Bar values: 3X, 6X, 7X, 8X, 40X. Compared formats: Flat file Gzip, Hadoop LZO, Compressed Relational, Columnar.]
      Source: Ratios vs. Raw, RainStor Benchmarks using customer data (2011)
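
Compression ratios (slide 12) and storage savings percentages (slide 11) are two views of the same quantity. A minimal sketch of the conversion, assuming the ratio is defined as raw size divided by compressed size; the function names are illustrative:

```python
def storage_savings(ratio: float) -> float:
    """Fraction of raw capacity saved at a compression ratio of raw/compressed."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / ratio


def ratio_for_savings(savings: float) -> float:
    """Compression ratio needed to reach a given savings fraction (0 <= s < 1)."""
    if not 0.0 <= savings < 1.0:
        raise ValueError("savings must be in [0, 1)")
    return 1.0 / (1.0 - savings)


# Ratios shown on the chart.
for ratio in (3, 6, 7, 8, 40):
    print(f"{ratio}X -> {storage_savings(ratio):.1%} of raw capacity saved")

# Savings range quoted for the retention solution.
for savings in (0.95, 0.97):
    print(f"{savings:.0%} saved -> about {ratio_for_savings(savings):.0f}X")
```

By this definition a 40X ratio corresponds to 97.5% savings, and the 95-97% range corresponds to ratios of roughly 20X to 33X.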

  13. Big Data Analytics Solutions

  14. Confidential

  15. What is Apache Hadoop?
      Hadoop is a platform for data storage and processing that is…
      • Scalable
      • Fault tolerant
      • Open source
      CORE HADOOP COMPONENTS
      • Hadoop Distributed File System (HDFS): File sharing and data protection across physical servers
      • MapReduce: Distributed computing across physical servers
      Scales economically
      • Can be deployed on commodity hardware
      • Open source platform guards against vendor lock-in
      Excels at complex analysis
      • Scale-out architecture divides workloads across multiple nodes
      • Flexible file system eliminates ETL bottlenecks
      Consolidates everything
      • A single repository for storing and mining any type of data
      • Not bound by a single schema
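
To make the MapReduce component concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to a word count. It is plain Python, not the Hadoop API: in a real cluster the map and reduce calls would run in parallel on different servers, and all function names here are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data analytics", "data"]))
```

Because each map call sees only one line and each reduce call sees only one key, the work can be divided across nodes without shared state, which is what lets the scale-out architecture spread workloads across multiple nodes.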

  16. Distributed File System (DFS)
      Traditional
      • Black Box
      • Big Iron
      • Big Disk
      Distributed File System (DFS)
      • General-purpose, standards-based servers, storage, networking
      • Software that easily scales processing to 1000s of cores/systems

  17. DFS - Architecture
      MPP (Massively Parallel Processing) Shared-Nothing Architecture
      • Interfaces: SQL, MapReduce
      • Master Servers: Query planning & dispatch
      • Network Interconnect
      • Segment Servers: Query processing & data storage
      • External Sources: Loading, streaming, etc.
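
The division of labour on this slide (master servers plan and dispatch, segment servers store and process) can be sketched in a few lines. This is a toy in-memory model under stated assumptions: rows are distributed by a CRC32 hash of their key, the only supported query is a sum per key, and the class and method names are hypothetical rather than taken from any product.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (key, amount)


class Segment:
    """A segment server: owns its rows and shares nothing with its peers."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def store(self, row: Row) -> None:
        self.rows.append(row)

    def partial_sum(self) -> Counter:
        """Process the query locally, over this segment's rows only."""
        totals: Counter = Counter()
        for key, amount in self.rows:
            totals[key] += amount
        return totals


class Master:
    """A master server: routes rows on load, dispatches and merges queries."""

    def __init__(self, n_segments: int) -> None:
        self.segments = [Segment() for _ in range(n_segments)]

    def load(self, rows: Iterable[Row]) -> None:
        for row in rows:
            # Deterministic hash distribution on the row key (an assumption).
            index = zlib.crc32(row[0].encode("utf-8")) % len(self.segments)
            self.segments[index].store(row)

    def total_by_key(self) -> Dict[str, int]:
        merged: Counter = Counter()
        for segment in self.segments:  # dispatch to every segment
            merged.update(segment.partial_sum())  # gather and combine
        return dict(merged)


if __name__ == "__main__":
    master = Master(n_segments=3)
    master.load([("bkk", 10), ("cnx", 5), ("bkk", 7)])
    print(master.total_by_key())
```

In a real MPP system the partial results would travel over the network interconnect and the segments would run concurrently; the loop here stands in for that dispatch step.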

  18. MapReduce

  19. HDFS & MapReduce - Briefly
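
HDFS provides data protection by splitting each file into fixed-size blocks and keeping several replicas of every block on different servers. The sketch below models that idea only: the round-robin placement is a simplification chosen for illustration (real HDFS placement is rack-aware), the replication factor of 3 mirrors the usual HDFS default, and the function names and node names are hypothetical.

```python
import math
from typing import Dict, List, Set


def place_blocks(file_size: int, block_size: int, nodes: List[str],
                 replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block of a file to `replication` distinct nodes.

    Uses simple round-robin placement (an assumption of this sketch).
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    n_blocks = math.ceil(file_size / block_size)
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]],
                           failed: Set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(file_size=300, block_size=128, nodes=nodes)
    print(placement)
    print(readable_after_failure(placement, {"n1", "n2"}))
```

With three replicas per block, the file in this example stays readable after any two node failures, which is the sense in which the file system protects data across physical servers.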

  20. Hadoop in Production

  21. Dell Big Data Solutions in Thailand

  22. Dell Apache Hadoop Solution
      • Petabyte-scale data management: an open source distributed file system and a computational processing engine called MapReduce, for highly scalable data management
      • For: Financial and research institutions, retail, media & entertainment, telecom, government, and health and life sciences
      • Benefits:
         • Reliable, scalable, low-cost file storage
         • Rapid parallel processing of big data
         • Complements existing data management systems
      • Diagram labels: Cluster-optimized PowerEdge C (6248sw, C2100 × 4) + Joint Services & Support; Dell Cloud Solutions

  23. Revolution R Enterprise | Big Data Analysis

  24. What does big data mean to you?
      • How will you handle your big data?
      • How do you plan to use analytics in your business?
      • Are you considering adding analytics to the services you offer your customers?
      • Who are the decision makers and end users of your BD, BI, &/or analytics?
      • How are you storing your Big Data?

  25. The power to do more