
Big Data Distilled: Separating the hype from reality

Mike King

Technical Fellow

FedEx Services

November 8, 2012

Midsouth DAMA


What is Big Data?

  • Applying analytics to construct a model that predicts an outcome, where two or more of the dimensions (V’s or C) are present AND your existing solutions can’t solve the problem.

  • The dimensions: 4 V’s and 1 C

    • Volume

    • Velocity

    • Variety

    • Variability

    • Complexity
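The definition above boils down to a two-part test: at least two of the five dimensions are present, and the current tooling cannot cope. Here is a minimal sketch of that rule of thumb in Python. The helper name `is_big_data_problem` and its parameters are hypothetical and do not come from the deck.

```python
from __future__ import annotations

# The 4 V's and 1 C listed on the slide.
DIMENSIONS = {"volume", "velocity", "variety", "variability", "complexity"}


def is_big_data_problem(present: set[str], existing_solution_suffices: bool) -> bool:
    """Hypothetical helper: True when two or more dimensions apply
    AND the existing solutions can't solve the problem."""
    present = {d.lower() for d in present}  # accept "Volume" or "volume"
    unknown = present - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    return len(present) >= 2 and not existing_solution_suffices


print(is_big_data_problem({"Volume", "Velocity"}, existing_solution_suffices=False))
```

Take a high-volume, high-velocity feed that the current warehouse already handles. For that feed the function returns False, because by this definition it is not a big data problem.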


The Market

  • Growing fast

  • Lots of players

    • Small and nimble

    • Large

  • Changing fast

  • Hype

  • Contenders and pretenders

  • Commercials are deceiving



Why do we need it?

  • Competitive Intelligence

  • Joining dissimilar data

  • Linking data

  • Adding context to data

  • Discovery

    • Diapers

    • Pregnancy

  • To supplement our BI/DW

  • Table stakes
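"Joining dissimilar data" often means matching records from sources that share a key but not a schema. Below is a minimal in-memory hash-join sketch under that assumption. The CRM rows and social-media mentions are hypothetical sample data, and so is the case normalization of the key.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner-join two record sources that have different shapes.

    left_key and right_key are functions that pull a comparable join key
    out of each record, so the sources need not share a schema.
    """
    index = defaultdict(list)
    for rec in left:  # build phase: index the left side by key
        index[left_key(rec)].append(rec)
    for rec in right:  # probe phase: stream the right side
        for match in index.get(right_key(rec), []):
            yield match, rec


# Hypothetical sample data: CRM rows are dicts, social mentions are tuples.
crm = [{"cust_id": "c1", "name": "Acme"}, {"cust_id": "c2", "name": "Globex"}]
mentions = [("@acme", "C1", "late delivery again"), ("@nobody", "C9", "hello")]

joined = list(hash_join(crm, mentions,
                        left_key=lambda r: r["cust_id"].upper(),  # normalize case
                        right_key=lambda m: m[1]))
print(joined)
```

The smaller side is indexed and the larger side is streamed past it. This is the usual shape of a hash join when one source fits in memory.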


Use Cases?

  • Customer analysis

    • Sentiment

    • Defection

    • Cannibalization

    • Cross selling

  • Network analysis

    • M2M

  • Fraud detection

  • Risk management

  • Text analytics

  • Social media analytics

  • Log analysis
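Log analysis works on data that looks unstructured but has a structure you can impose before it can be consumed. Below is a minimal sketch that parses lines in the Apache HTTP Server "common log format" with a regular expression and tallies status codes. Choosing that format is an assumption, since your logs may differ.

```python
import re
from collections import Counter

# Common log format: host ident authuser [time] "method path protocol" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse(line):
    """Return the log line as a dict of named fields, or None if it doesn't match."""
    m = CLF.match(line)
    return m.groupdict() if m else None


def status_counts(lines):
    """Count HTTP status codes, tracking lines that failed to parse."""
    counts = Counter()
    for line in lines:
        rec = parse(line)
        counts["unparsed" if rec is None else rec["status"]] += 1
    return counts


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(status_counts([sample, "not a log line"]))
```

Counting unparsed lines instead of silently dropping them keeps data quality problems visible.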


Apache Hadoop

  • Batch

  • Open Source

  • Components

    • HDFS

    • DB

      • HBase

      • Cassandra

    • MapReduce

    • Hive

    • Pig

    • Mahout

    • Chukwa

    • Avro

    • ZooKeeper
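Hadoop's MapReduce runs three stages across a cluster: a map step, a shuffle that groups intermediate pairs by key, and a reduce step. The single-process word-count sketch below only illustrates that programming model. It does not use Hadoop's Java API or Hadoop Streaming, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Emit (word, 1) for every word in one input record."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key."""
    return key, sum(values)


def word_count(docs):
    pairs = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data big hype", "data reality"]))
```

On a real cluster the map and reduce calls run in parallel on different nodes, and the shuffle moves data over the network. That network step is why MapReduce is batch oriented.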


Solutions

  • Which stack/distribution?

    • Varying components

      • Apples & oranges

  • Types

    • Partial

    • Overlapping

    • Complementary

    • Substitute

  • Fast pace of change

  • Flux of partnerships


Dealing with vendors and choices

  • Decide what your requirements are

  • Don’t let them tell you what you need

  • Beware bait and switch

    • Extras

  • Some are looking to sell

    • Professional Services

    • Other Software

  • All solutions are incomplete

  • Many solutions are lacking

  • Multiple…is one enough?

  • Switching is possible

    • Low cost?

  • Beware

    • Proprietary components

    • Fixes for problems already solved in Apache Hadoop, e.g., the NameNode (NN)

  • Hammer and nail


My Big Data Vendors

  • MapR

  • Kaggle

  • Karmasphere

  • Hadapt

  • Datameer

  • Lucid Works

  • 1010data?

  • Splunk

  • SAS

  • IBM

  • Oracle

  • Hortonworks

  • Cloudera

  • EMC

  • Teradata

  • Amazon

  • Microsoft

  • HP


Not My Big Data Vendors

  • Pentaho

  • Palantir

  • Kalido

  • Composite

  • Couchbase

  • MarkLogic

  • StoredIQ

  • Syncsort

  • DataStax?

  • IBI

  • Informatica

  • SAP

  • 10gen

  • Talend

  • Denodo

  • Tableau

  • Tibco

  • ParAccel


What’s missing?

  • Collaboration

  • Directory, dictionary, metadata

  • Context

  • Relevance, value

  • Data quality (DQ)

  • Search

  • Security

  • Performance

  • Monitoring

  • Management tools

  • Governance

  • Backup
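Data quality (DQ) checks are one of the gaps listed above. Below is a minimal sketch of rule-based validation that separates clean records from rejects and records which fields failed. The shipment feed, its field names, and the rules (a 12-digit ID, a positive weight, a 5-digit ZIP) are all hypothetical.

```python
import re

# Hypothetical per-field rules for a hypothetical shipment feed.
RULES = {
    "tracking_id": lambda v: isinstance(v, str) and re.fullmatch(r"\d{12}", v) is not None,
    "weight_kg": lambda v: isinstance(v, (int, float)) and v > 0,
    "dest_zip": lambda v: isinstance(v, str) and re.fullmatch(r"\d{5}", v) is not None,
}


def dq_report(records):
    """Split records into (clean, rejects); each reject carries its failed fields."""
    clean, rejects = [], []
    for rec in records:
        # A missing field arrives as None and fails its rule.
        failed = [field for field, ok in RULES.items() if not ok(rec.get(field))]
        if failed:
            rejects.append((rec, failed))
        else:
            clean.append(rec)
    return clean, rejects


records = [
    {"tracking_id": "123456789012", "weight_kg": 2.5, "dest_zip": "38118"},
    {"tracking_id": "ABC", "weight_kg": -1, "dest_zip": "38118"},
]
clean, rejects = dq_report(records)
print(len(clean), rejects)
```

Keeping the rejects with their reasons, rather than dropping them, lets someone fix the source instead of losing the data.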


Counterintuitive & Anti-dogma Notions

  • Size matters

    • But not unitarily

  • Smaller is better

    • Sampling

  • Quality matters

    • GIGO (garbage in, garbage out)

  • All data must have structure to be consumed

    • There is no unstructured data!
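"Smaller is better" through sampling presumes you can draw a fair sample from a source too large, or too continuous, to load whole. The deck names no method. Reservoir sampling (Algorithm R) is one standard choice, sketched below: it keeps a uniform random sample of k items in a single pass with O(k) memory.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive bounds: item kept with prob k/(i+1)
            if j < k:
                sample[j] = item
    return sample


print(reservoir_sample(range(100_000), 5, seed=42))
```

Passing a seed makes the sample reproducible, which helps when a model built on it needs to be rerun.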


Myths

  • You don’t need a DBA

    • Schemaless

  • B.D. is just for unstructured data

    • Your unstructured data has lots of value

  • It’s separate from your other BI components, including:

    • OLAP

    • DW

    • Data marts

    • Analytics

  • NoSQL


Prerequisites

  • Many varied skill sets are needed

    • DBA

    • Sysadmin

    • BI analytics

    • Math (statistics)

    • Programming

  • Reading

  • Training

  • Scope


Training options

  • Read books

  • Add some blogs to your feeds

  • Follow some of the right people on Twitter

    • Search #bigdata #nosql #datascience …

  • Online training

    • Big Data University (free)

    • EMC, Hortonworks, Cloudera, Karmasphere

  • Tutorials

  • Conferences

  • Get a degree

    • NC State, Stanford, Northwestern, Syracuse, UCSD


Suggestions

  • Start small

  • Conduct triage on your possible sources

  • It should be integrated with the DW

    • Silos are bad… think spread marts

  • Grow your own Data Scientists

    • Move disparate LOB analysts into a single org

    • Train and cross train

  • Limit the BD user population

  • Design is still required

  • Mind and mine your structured data first

  • Get more training


Don’t

  • Make your NoSQL DB the system of record

  • Put all your data in Hadoop… to start

  • Ignore open source

  • Connect your garden-variety query tools to Hadoop

  • Open it up to everyone

  • Keep data indefinitely

  • Get heavy-handed on security
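The alternative to keeping data indefinitely is a retention window. Below is a minimal sketch, assuming data is stored in date-based partitions (a common layout, but an assumption here) and using a hypothetical 90-day window. It only identifies which partitions have expired. The actual deletion would go through your platform's own tools.

```python
from datetime import date, timedelta


def expired_partitions(partition_dates, today, retention_days):
    """Return date partitions older than the retention window, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


parts = [date(2012, 1, 1), date(2012, 10, 1), date(2012, 11, 1)]
print(expired_partitions(parts, today=date(2012, 11, 8), retention_days=90))
```

With a 90-day window measured from November 8, 2012, the cutoff is August 10, 2012, so only the January partition is returned.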


Other items

  • Cloud

  • SIEM

  • Tools to complement your solution(s)

  • Which DB(s) to use?

    • For what?

    • External tables

    • NoSQL DBs

  • Persist MapReduce results in your DB

  • Storage

  • Servers

    • x86 Linux

  • External data sources
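Persisting MapReduce results in your DB means loading a job's aggregated output, such as word counts, into a relational table where existing BI tools can query it. Below is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for your database. The table and column names are hypothetical. `INSERT OR REPLACE` is SQLite syntax, and other databases use `MERGE` or their own upsert forms.

```python
import sqlite3


def persist_counts(conn, counts):
    """Upsert key -> count results from a batch job into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        "word TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    # Re-running the job overwrites old totals instead of duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts (word, n) VALUES (?, ?)",
        counts.items(),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
persist_counts(conn, {"big": 2, "data": 2})
print(conn.execute("SELECT word, n FROM word_counts ORDER BY word").fetchall())
```

Keying the table on the result key keeps reloads idempotent, so a rerun of the batch job replaces earlier output instead of doubling it.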


Trends

  • March 2012 article by Munish Gupta

    • SaaS for analytics

    • Crowdsourcing

    • Data analysis libraries

    • NoSQL market shakeup

  • Additionally from the article

    • RDBMSs will not make a comeback

  • Other

    • More diverse sources

    • More data

    • More jobs

    • More choices, solutions, products, services, etc.

    • Query tools: yuck


Links of interest

  • http://wikibon.org/wiki/v/Enterprise_Big-data

  • My Diigo bookmarks on Big Data

    • http://www.diigo.com/user/morpheus/bigdata 266

  • Curt Monash’s Blog … http://www.dbms2.com

  • http://www.keithrozario.com/2012/07/opensource-gold-the-greatest-crowdsourcing-story-ever-told.html

  • http://www.analyticbridge.com/

  • http://gigaom.com/data/

  • This deck

    • http://92lobos.wikispaces.com/file/detail/Big+Data+Distilled.pptx

  • Future B.D. items

    • http://92lobos.wikispaces.com/bigdata


Contact

Feel free to drop me a note with any questions.

