Big Data – Big Changes

Business Analyst Professional Development Day, September 2013

Contents

What is Advanced Analytics & Big Data? Business Intelligence, Advanced Analytics and Big Data are often used synonymously, but they are different and build on each other from a maturity perspective.

Big Data & Analytics Continuum: Leveraging “Big Data” should be done on a stable foundation (with examples).

Skills of the Data Analyst / Scientist: New skills and levels of maturity, certifications and training.

What is Advanced Analytics?

Advanced Analytics comprises both Business Intelligence technologies and complex analytic practices that uncover relationships and patterns within large volumes of historical data. These patterns can then be used to predict future behavior and events or to improve operational results.

  • Standard Reports: What happened? When did it happen?
  • Ad hoc Reports: How many? How often? Where?
  • Query Drilldown: Where exactly is the problem? How do I find the answers?
  • Alerts: When should I react? What actions are needed now?
  • Statistical Analysis: Why is this happening? What opportunities am I missing?
  • Forecasting: What if these trends continue? How much is needed? When will it be needed?
  • Predictive Analytics: What will happen next? How will it affect my business?
  • Optimization: How can we get better? What is the best decision?
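To make the lower rungs concrete, here is a minimal Python sketch of three of them: a standard report, an alert and a trend-based forecast. It is illustrative only. The monthly claim counts are invented, the function names and alert threshold are hypothetical, and the forecast is a plain least-squares trend line standing in for real forecasting methods.

```python
# Hypothetical monthly claim counts, invented purely for illustration.
monthly_claims = {"2013-01": 120, "2013-02": 132, "2013-03": 141, "2013-04": 155}


def standard_report(counts):
    """Standard report (What happened?): counts per month in calendar order."""
    return sorted(counts.items())


def alert(counts, threshold):
    """Alert (When should I react?): months whose count exceeds a threshold."""
    return [month for month, n in sorted(counts.items()) if n > threshold]


def linear_forecast(values, steps_ahead=1):
    """Forecast (What if these trends continue?): fit an ordinary least-squares
    line to (0, v0), (1, v1), ... and extrapolate it steps_ahead periods past
    the last observation. Needs at least two values."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


if __name__ == "__main__":
    for month, n in standard_report(monthly_claims):
        print(month, n)
    print("Alert months:", alert(monthly_claims, threshold=140))
    history = [n for _, n in standard_report(monthly_claims)]
    print("Next month (trend):", round(linear_forecast(history), 1))
```

The higher rungs (predictive analytics, optimization) replace the simple trend line with fitted models and decision logic, which is where the techniques listed later in this deck come in.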
What is Big Data? Volume, Variety, Velocity (and sometimes Veracity and Value)

Definition: “Dealing with information management challenges that don’t natively fit with traditional approaches to handling the problem.” – Tom Deutsch (IBM)

Where has Nationwide been, and where can we go?

Comprehensive advanced analytics capabilities have been built around marketing, product and pricing, and other areas of the business. They are mostly disconnected, and some rely on rudimentary, inefficient technologies that focus mainly on moving data rather than getting value out of it.

Internal Data

captured or streamed today in systems and data warehouses (e.g., policy admin, claims)

NEW internal Data

not previously captured (e.g., emails, clickstream, mobile, telematics, unstructured notes from agents or claims adjusters)

Industry estimates suggest that 80% of enterprise data is in unmodeled/unstructured forms where it is nearly inaccessible and traditional modeling does not fit.

Text extraction techniques applied to large, varied volumes of data such as SEC filings can produce new structured metrics that, combined with traditional BI data, support analysis and exploration.

Text is also trapped in large description fields in our operational data stores like the Claims DW.
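A minimal sketch of the text-extraction idea, assuming simple regular-expression matching: each pattern becomes a numeric column that can be joined onto an existing structured record. The red-flag phrases, field names and sample filing text are hypothetical, and production text analytics would typically use much richer linguistic processing than keyword patterns.

```python
import re

# Hypothetical red-flag patterns; a real list would come from analysts and domain experts.
RED_FLAG_PATTERNS = {
    "going_concern": re.compile(r"\bgoing concern\b", re.IGNORECASE),
    "restatement": re.compile(r"\brestat(?:e|ed|ement)s?\b", re.IGNORECASE),
    "covenant_breach": re.compile(
        r"\bcovenant\b.{0,40}\b(?:breach|violation|waiver)\b", re.IGNORECASE
    ),
}


def extract_text_metrics(text):
    """Turn free text into structured counts, one column per pattern."""
    return {name: len(p.findall(text)) for name, p in RED_FLAG_PATTERNS.items()}


def enrich(record, text_field):
    """Copy a structured record and add the text-derived metric columns to it."""
    enriched = dict(record)
    enriched.update(extract_text_metrics(record.get(text_field) or ""))
    return enriched


if __name__ == "__main__":
    filing = {
        "company": "Example Corp",  # hypothetical record
        "filing_text": (
            "The auditor expressed substantial doubt about the company's ability "
            "to continue as a going concern. Prior-year results were restated."
        ),
    }
    print(enrich(filing, "filing_text"))
```

The same function could be pointed at a description field from an operational store; the output columns then sit alongside the existing structured measures for BI analysis.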

NEW External Data

from non-traditional sources (e.g., internet, social networks, demographic, local economy, price elasticity, mobile location stream, localized competitor intelligence)

Big Data & Analytics Continuum

Business value increases from the Information Layer at the base of the continuum up to Cognitive at the top:

  • Cognitive (Reasoning, Learning, Natural Language): What is the most likely answer? What is the right question?
  • Prescriptive (Optimization, Rules, Constraints): What’s the next best action? What will happen when and why?
  • Predictive (Machine Learning, Forecasting, Statistical Analysis): What could happen? What if these trends continue?
  • Descriptive (Alerts & Drill Down, Ad hoc Reports, Standard Reports): What has happened and why? How many, how often, who & where?
  • Information Layer (Big Data Platforms, Content Management, RDBMS and Integration): How do I integrate new data sources? How is data managed and stored?

When entering the Big Data space, be mindful of your foundational competencies. Information Management capabilities such as data integration, extensible data modeling, data quality and data governance become even more important when dealing with these new, uncertain, high-volume data sources. Additionally, achieving the full ROI requires a mature analytics methodology, appropriately skilled resources and technology.

Use Case – Machine Learning: Advanced Analytics, Structured

Open-source R was chosen to accelerate the intern's model development process. Several external R packages were added to complete the SVM capability in R as a desktop tool. Supplemental data preparation of the S&P financial data was handled with various scripts and spreadsheets.

The project will provide knowledge transfer to Freedom Specialty, which currently intends to implement the model in SAS.

Selected Results

  • Cross Business Interest
  • Freedom Specialty Insurance
  • Enterprise Applications Investments
  • NF opportunities just beginning to be explored

Accrual Score (Bankruptcy) Prediction

The machine learning technique called Support Vector Machine (SVM) was selected. This supervised learning technique takes a training set of labeled examples, each described by a set of factors, and constructs a model from them.
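For readers new to SVMs, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method in plain Python. It is not the project's implementation (which used R with unnamed external packages), it omits kernels, and the two-factor training data is invented; it only shows how labeled examples are turned into a model that separates the classes with a margin.

```python
import random

# Invented two-factor training data: +1 and -1 are the two outcome labels.
samples = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]


def train_linear_svm(xs, ys, lam=0.01, iterations=2000, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM.

    A constant 1.0 feature is appended to every example so the last weight
    acts as a bias term (in this simple sketch it is regularized like the rest).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        x, y = data[i], ys[i]
        eta = 1.0 / (lam * t)  # step size shrinks as 1/t
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Regularization step: shrink all weights toward zero.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Hinge-loss step: only examples that violate the margin move the weights.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Classify a new example by the sign of its score."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    weights = train_linear_svm(samples, labels)
    print("weights:", weights)
    print("new case (1.5, 2.5) ->", predict(weights, (1.5, 2.5)))
```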

Although Freedom’s project was a predictive modeling effort, the business is eager to analyze the “fine print” of unstructured text in filings and media reports, looking for red flags that help triage the analysts' workload.

Model Validation Results (Jan 2013)

  • Positive class: precision 0.81, recall 0.70, F1 score 0.75
  • Negative class: precision 0.74, recall 0.83, F1 score 0.78
  • Accuracy: 0.77

Further Optimization: Pending
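The reported F1 scores can be checked against the precision and recall figures, since F1 is the harmonic mean of the two. For the negative class, the roles of positive and negative cases are swapped in the definitions. Accuracy cannot be recomputed this way, because it also depends on how many positive and negative cases were in the validation set, which the slide does not give.

```latex
\begin{aligned}
P &= \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R} \\
F_1^{(+)} &= \frac{2(0.81)(0.70)}{0.81 + 0.70} = \frac{1.134}{1.51} \approx 0.75 \\
F_1^{(-)} &= \frac{2(0.74)(0.83)}{0.74 + 0.83} = \frac{1.2284}{1.57} \approx 0.78
\end{aligned}
```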

Principle: Start with solid advanced analytics capabilities and add “Big Data” for additional ROI

Use Case – Speech Analytics: Volume, Variety (Unstructured)

Hypothesis:

Determine whether certain words, used more prevalently during a first notice of loss call, would indicate a fraudulent claim.

  • Convert first notice of loss call history to text and store in big data platform.
  • Sort the call text into two categories: calls that resulted in fraud and calls that did not.
  • Mine the data for word patterns and determine whether word usage differs between fraudulent and non-fraudulent claims.
  • Build a model or rules to execute against calls in real time using streaming technology.
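The mining and real-time steps can be sketched in Python under simplifying assumptions: transcripts are already text, words are split on whitespace, and word usage is compared with an add-one-smoothed log ratio of relative frequencies (one common choice, not necessarily the one intended here). The transcripts are invented, and the generator `flag_stream` only stands in for a real streaming platform.

```python
import math
from collections import Counter

# Invented first-notice-of-loss transcripts, already converted to text.
fraud_calls = [
    "the car was stolen cash only",
    "stolen vehicle cash payment",
    "stolen no witnesses",
]
other_calls = [
    "minor accident police report filed",
    "rear ended at the light police came",
]


def word_counts(transcripts):
    """Count lower-cased, whitespace-separated words across transcripts."""
    counts = Counter()
    for text in transcripts:
        counts.update(text.lower().split())
    return counts


def log_odds_by_word(fraud_texts, other_texts):
    """Add-one-smoothed log ratio of each word's relative frequency in
    fraud-linked calls versus other calls (positive = more common in fraud)."""
    fraud, other = word_counts(fraud_texts), word_counts(other_texts)
    vocab = set(fraud) | set(other)
    fraud_total = sum(fraud.values()) + len(vocab)
    other_total = sum(other.values()) + len(vocab)
    return {
        w: math.log((fraud[w] + 1) / fraud_total) - math.log((other[w] + 1) / other_total)
        for w in vocab
    }


def flag_stream(words, flagged_terms):
    """Check words one at a time as they arrive and yield any flagged term."""
    for word in words:
        if word.lower() in flagged_terms:
            yield word.lower()


if __name__ == "__main__":
    scores = log_odds_by_word(fraud_calls, other_calls)
    ranked = sorted(scores, key=scores.get, reverse=True)
    flagged = set(ranked[:2])
    incoming = "my car was stolen and they want cash".split()
    print("flagged terms:", sorted(flagged))
    print("hits:", list(flag_stream(incoming, flagged)))
```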

This will result in false positives! The model should be combined with claims, billing and contact history data to improve its accuracy.

Principle: “Big data” does not replace your existing analytics using your structured data warehouse. Big Data is simply an additional data set which enhances an existing set of capabilities and should not be used out of context.

New Roles, New Skills
  • Data Analyst / Data Scientist
  • What is Data Analysis?
  • How do you recognize patterns in data?
  • What is the process for inspecting the data?
  • How do you identify data cleansing and transformation rules?
  • Why / How do you visualize your findings and information?
  • How do you manage, manipulate and query large, complex data on Hadoop as an analyst?
  • What statistical model is most appropriate for the problem scenario? What other type of model is appropriate?
  • Types of Tools Used
  • R
  • SPSS
  • Tableau
  • Data Mining tools such as Teradata Miner
  • Hadoop implementation specific tools such as BigSQL & BigSheets (IBM)
  • Other Considerations
  • Certifications: Certified Analytics Professional from INFORMS
  • Nationwide / IBM Client Center for Advanced Analytics
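Data profiling is one common way to start inspecting data and identifying cleansing and transformation rules. The sketch below uses a hypothetical ZIP code column and reports completeness, cardinality and character patterns (digits shown as 9, letters as A); rare patterns such as `9999` or `AA999` point to candidate cleansing rules. The function names and sample values are illustrative assumptions.

```python
import re
from collections import Counter

# Invented ZIP code column as it might arrive from a source system.
zip_codes = ["43215", "43215", "43215-1234", "4321", "", None, "OH432"]


def shape(value):
    """Reduce a value to its character pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", value))


def profile_column(values):
    """Completeness, cardinality and pattern frequencies for one column."""
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "patterns": Counter(shape(v) for v in present).most_common(),
    }


if __name__ == "__main__":
    for key, value in profile_column(zip_codes).items():
        print(key, value)
```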
More Terminology to Learn

With a wide range of advanced modeling techniques…

  • ARMA
  • CART
  • CIR++
  • Compression Nets
  • Decision Trees
  • Discrete Time Survival Analysis
  • D-Optimality
  • Ensemble Model
  • Gaussian Mixture Model
  • Genetic Algorithm
  • Gradient Boosted Trees
  • Hierarchical Clustering
  • Kalman Filter
  • K-Means
  • KNN
  • Linear Regression
  • Logistic Regression
  • Monte Carlo Simulation
  • Multinomial Logistic Regression
  • Neural Networks
  • Optimization: LP; IP; NLP
  • Poisson Mixture Model
  • Restricted Boltzmann Machine
  • Sensitivity Trees
  • SVD, A-SVD, SVD++
  • SVM
  • Projection on Latent Structures
  • Spectral Graph Theory

Classes of Advanced Analytics Problems

  • Regression
  • Classification
  • Clustering
  • Forecasting
  • Optimization
  • Simulation
  • Sparse Data Inference
  • Anomaly Detection
  • Natural Language Processing
  • Intelligent Data Design
Big Data Analytics – The Landscape

The technologies that deal with big data problems are broad and diverse; it is not just Hadoop.
