Big Data … Big Opportunities? … Big Hype? (or just a Big Mess?)
Data challenges and IBM views
Dr. Matthew Ganis, IBM Senior Technical Staff Member, CIO Social Media Analytics Chief Architect, Member, IBM Academy of Technology
email@example.com, @mattganis (Twitter)
The term “Big Data” is pervasive, but it still provokes a bit of confusion. So what is it? Big Data has been used to convey all sorts of concepts, including huge quantities of data, social media analytics, next-generation data management capabilities, real-time data and much, much more.
That means we create about 1.8 zettabytes of information every two years.
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
Information is at the Center of a New Wave of Opportunity… And Organizations Need Deeper Insights
Volume, Variety, Velocity:
• 44x as much data and content over the coming decade: from 800,000 petabytes in 2009 to 35 zettabytes in 2020
• 80% of the world’s data is unstructured
Organizations need deeper insights:
• 1 in 3 business leaders frequently make decisions based on information they don’t trust, or don’t have
• 1 in 2 business leaders say they don’t have access to the information they need to do their jobs
• 83% of CIOs cited “Business intelligence and analytics” as part of their visionary plans to enhance competitiveness
• 60% of CEOs need to do a better job capturing and understanding information rapidly in order to make swift business decisions
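The growth figure on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, so that 1 zettabyte equals 1,000,000 petabytes; the variable names are illustrative only.

```python
# Decimal (SI) units are assumed: 1 ZB = 10**21 bytes, 1 PB = 10**15 bytes.
PB_PER_ZB = 10**21 // 10**15  # 1,000,000 petabytes per zettabyte

data_2009_pb = 800_000  # 2009 volume quoted on the slide, in petabytes
data_2020_zb = 35       # 2020 volume quoted on the slide, in zettabytes

# Convert the 2009 figure to zettabytes so both values share a unit.
data_2009_zb = data_2009_pb / PB_PER_ZB
growth_factor = data_2020_zb / data_2009_zb

print(f"2009 volume: {data_2009_zb} ZB")
print(f"Growth 2009 -> 2020: {growth_factor:.2f}x")
```

Under that assumption the ratio comes out just under 44, which is consistent with the "44x" headline.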
Structured vs. Unstructured
Structured data refers to information with a high degree of organization, such that inclusion in a relational database is seamless and the data is readily searchable by simple, straightforward search engine algorithms or other search operations. Unstructured data is essentially the opposite: its lack of structure makes compilation a time- and energy-consuming task.
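To make the contrast concrete, here is a small sketch that answers the same question ("how much did Alice spend?") from a relational table and from free-text notes. The table, the notes and the regular expression are all made up for illustration; real unstructured data would need far more robust extraction.

```python
import re
import sqlite3

# Structured: rows with a fixed schema go straight into a relational table
# and can be queried with one simple statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "laptop", 1200.0), ("bob", "phone", 650.0), ("alice", "phone", 700.0)],
)
structured_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]

# Unstructured: the same facts buried in free text need extraction work first.
notes = [
    "Alice called to say she paid $1200 for the laptop last week.",
    "bob: picked up a phone, think it was 650 dollars?",
    "Follow-up with Alice, she also bought a phone ($700).",
]
# Naive pattern: takes the first number in a note as the amount.
amount_pattern = re.compile(r"\$?(\d+(?:\.\d+)?)")

unstructured_total = 0.0
for note in notes:
    if "alice" in note.lower():
        match = amount_pattern.search(note)
        if match:
            unstructured_total += float(match.group(1))

print(structured_total, unstructured_total)
```

The SQL query relies on the schema. The text version relies on guesses about how people write names and amounts, which is where the extra time and effort goes.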
The Challenge: Bring Together a Large Volume and Variety of Data to Find New Insights
• Multi-channel customer sentiment and experience analysis
• Detect life-threatening conditions at hospitals in time to intervene
• Predict weather patterns to plan optimal wind turbine usage, and optimize capital expenditure on asset placement
• Make risk decisions based on real-time transactional data
• Identify criminals and threats from disparate video, audio, and data feeds
Merging the Traditional and Big Data Approaches: Structured vs. Exploratory
Traditional Approach: Structured & Repeatable Analysis
• Business users determine what question to ask
• IT structures the data to answer that question
• Examples: monthly sales reports, profitability analysis, customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business users explore what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization
The Internet of Things (IoT) is a scenario in which objects, animals or people are provided with unique identifiers and the ability to automatically transfer data over a network without requiring human-to-human or human-to-computer interaction.
What are we running? Who is talking about us? Male / Female / Student / Professional / Retired / Customers? What do they “feel”? Positive / Negative Sentiment / Angry / Annoyed? Where are they talking? Who are they influencing? Who’s listening to them?
When customers are talking about us or about our products we want to know where those conversations are happening so we can: • Interact with interested customers • Get in front of any issues
Numerous studies show that word-of-mouth and personal recommendations are seen as far more credible to consumers than newspaper and television advertisements. While such mass advertisements are still necessary because of their powerful reach, these findings show that companies need to increase their focus on more personalized approaches. Clearly, it is incredibly difficult, maybe even impossible, for most companies to deal directly with the countless number of potential consumers. This is where influencers come in…
What makes someone influential? The number of tweets they make? The number of times people mention them? The number of followers they have? How often they are retweeted?
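One way to see why no single count settles the question is to combine the four signals into one score and watch how the ranking depends on the weights. The sketch below is purely illustrative: the `Account` fields, the sample accounts and the `WEIGHTS` values are assumptions, not a method described in this deck.

```python
from dataclasses import dataclass


@dataclass
class Account:
    handle: str
    tweets: int
    mentions: int
    followers: int
    retweets: int


# Hypothetical weights: they only encode the guess that being retweeted or
# mentioned says more about influence than raw posting volume.
WEIGHTS = {"tweets": 0.1, "mentions": 0.3, "followers": 0.2, "retweets": 0.4}


def influence_scores(accounts):
    """Return {handle: score}, with each metric scaled by its maximum value."""
    scores = {a.handle: 0.0 for a in accounts}
    for metric, weight in WEIGHTS.items():
        top = max(getattr(a, metric) for a in accounts) or 1  # avoid dividing by zero
        for a in accounts:
            scores[a.handle] += weight * getattr(a, metric) / top
    return scores


accounts = [
    Account("@frequent_poster", tweets=5000, mentions=40, followers=800, retweets=30),
    Account("@quiet_expert", tweets=300, mentions=400, followers=12000, retweets=900),
]
scores = influence_scores(accounts)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these weights the account that tweets less but is mentioned and retweeted more ranks first. Shifting weight toward `tweets` would favor the other account, so the definition of influence is itself a modeling choice.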
We were asked to look at why a particular product launch wasn’t performing as expected. We pulled all the “chatter” about it and found:
Where is all this data coming from?
Vast amounts of data are and will be generated from financial transactions, medical records, mobile phones, social media and the Internet of Things, but there are questions that need to be asked to understand data’s meaningful use:
• How will data be managed?
• How will data be shared?
Some thoughts about “data as a service”:
• Establishment of standards, governance and guidelines (e.g., open architectures)
• Creation of industry-specific data exchanges (e.g., healthcare data exchanges, environmental data exchanges)
• Creation of cross-industry data exchanges (e.g., healthcare data exchanges seamlessly interacting with environmental data exchanges)
Enterprise Integration
• Trusted Information & Governance: companies need to govern what comes in, and the insights that come out
• Data Management: insights from Big Data must be incorporated into the warehouse
[Diagram labels: Big Data Platform, Data Warehouse, Enterprise Integration, Traditional Sources, New Sources]
• Poor data quality
• Dirty data
• Missing values
• Inadequate data size
• Poor representation in data sampling
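Several of these problems can be measured before any analysis starts. The following sketch shows three basic checks (missing values, exact duplicates, implausible values) on a handful of made-up records. The field names and the 0 to 120 age range are assumptions chosen for the example.

```python
# Made-up records with typical quality problems.
records = [
    {"id": 1, "age": 34, "city": "Boston"},
    {"id": 2, "age": None, "city": "Austin"},
    {"id": 3, "age": 210, "city": ""},
    {"id": 3, "age": 210, "city": ""},  # exact duplicate of the previous row
]


def missing_rate(rows, field):
    """Fraction of rows where the field is None or an empty string."""
    missing = sum(1 for r in rows if r.get(field) in (None, ""))
    return missing / len(rows)


def duplicate_count(rows):
    """Number of rows that repeat an earlier row exactly."""
    seen = set()
    dupes = 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            dupes += 1
        seen.add(key)
    return dupes


def out_of_range(rows, field, low, high):
    """Rows whose value for the field is present but outside [low, high]."""
    return [r for r in rows if r[field] is not None and not (low <= r[field] <= high)]


print("missing age:", missing_rate(records, "age"))
print("missing city:", missing_rate(records, "city"))
print("duplicates:", duplicate_count(records))
print("implausible ages:", len(out_of_range(records, "age", 0, 120)))
```

Checks like these do not fix the data, but they show how much of it can be trusted and whether the sample is large enough once the bad rows are set aside.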
Data variety: trying to accommodate data that comes from different sources and in a variety of different forms (images, geo data, text, social, numeric, etc.). How do we link them together? Is there a common taxonomy or way to organize it? Is there a “signal” in one source of data that points to another?
Dealing with huge datasets, or 'Big Data,' that require distributed approaches.
Who is influential? How do we define influence?
The Big Data Opportunity
Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible.
• Variety: Manage the complexity of multiple relational and non-relational data types and schemas
• Velocity: Streaming data and large-volume data movement
• Volume: Scale from terabytes to zettabytes (1B TBs)
Big Data: why is it possible now?
• Traditional approach: Data to Function
  • The application server and database server are separate
  • Data can be on multiple servers
  • The analysis program can run on multiple application servers
  • The network is still in the middle
  • Data has to go through the network
• Big Data approach: Function to Data
  • The analysis program runs where the data is: on the data node
  • Only the analysis program has to go through the network
  • The analysis program needs to be MapReduce-aware
  • Highly scalable: 1000s of nodes, petabytes and more
[Diagram labels, traditional approach: User request, Application server, Database server, Query Data, Return Data, Process Data, Send result]
[Diagram labels, Big Data approach: User request, Master node, Data nodes, Send Function to process on Data, Query & process Data, Send, Consolidate result]
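The "function to data" idea can be sketched in a few lines. The example below is a single-process simulation, not Hadoop or any real cluster API: each inner list stands in for the data held by one data node, `count_words` is the function shipped to every node, and `consolidate` is what the master node would run on the partial results. All names and sample data are illustrative.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the block of data held by one data node.
data_nodes = [
    ["big data is big", "data moves slowly"],
    ["send the function to the data"],
    ["big clusters hold the data"],
]


def count_words(lines):
    """The function shipped to a node: it runs against that node's local data."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def consolidate(left, right):
    """Run on the master node: merge two partial results."""
    return left + right


# Map step: every node processes its own data; only small results travel back.
partial_results = [count_words(node_data) for node_data in data_nodes]

# Reduce step: the master consolidates the partial results into one answer.
totals = reduce(consolidate, partial_results, Counter())
print(totals.most_common(3))
```

Only `count_words` and the small per-node counters would cross the network in a real deployment. The raw lines never leave their node, which is the point of the Big Data approach described above.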
What Big Data Is Not
• It is not a replacement for your database strategy
• It is not a replacement for your warehouse strategy
• It is not a solution by itself; it needs jobs/applications to drive value