Business Analyst Professional Development Day, September 2013
Big Data – Big Changes
Contents
What is Advanced Analytics & Big Data? Business Intelligence, Advanced Analytics and Big Data seem to be used synonymously – they are different and build on each other from a maturity perspective
Big Data & Analytics Continuum: Leveraging “Big Data” should be done on a stable foundation - Examples
Skills of the Data Analyst / Scientist: New skills and levels of maturity, certifications and training
Advanced Analytics comprises both Business Intelligence technologies and complex analytic practices that are used to uncover relationships and patterns within large volumes of historical data that can be used to predict future behavior and events or improve operational results.
“Dealing with information management challenges that don’t natively fit with traditional approaches to handling the problem.” – Tom Deutsch (IBM)
Comprehensive advanced analytics have been built around marketing, product and pricing, and other areas of the business – mostly disconnected, some using rudimentary technologies that are inefficient and focused mainly on data movement and not getting value out of the data.
captured or streamed today in Systems and Data Warehouses (e.g., policy admin, claims)
NEW internal Data
not previously captured (e.g., emails, clickstream, mobile, telematics, unstructured notes from agents or claims adjusters)
Industry estimates suggest that 80% of enterprise data is in unmodeled/unstructured forms where it is nearly inaccessible and traditional modeling does not fit.
Text extraction techniques applied to varied, large volumes of data such as SEC filings can be combined with traditional BI data to create new structured metrics for analysis and exploration.
Text is also trapped in large description fields in our operational data stores like the Claims DW.
NEW External Data
from non-traditional sources (e.g., internet, social networks, demographic, local economy, price elasticity, mobile location stream, localized competitor intelligence)
What is the most likely answer?
What is the right question?
What’s the next best action?
What will happen when and why?
What could happen?
What if these trends continue?
What has happened and why?
How many, how often, who & where?
How do I integrate new data sources?
How is data managed and stored?
When entering the Big Data space, be cautious of your foundational competencies. Information Management capabilities such as data integration, extensible data modeling, data quality and data governance become even more important when dealing with these new, uncertain, high volume data sources. Additionally, to achieve the full ROI, you must have mature analytics methodology, appropriately skilled resources and technology.
Open Source R was chosen to accelerate the model development process for the intern. Several external R packages were added to complete the SVM capability in R as a desktop tool. Supplemental data preparation of the S&P financial data was handled with various scripts and spreadsheets.
The project will provide knowledge transfer to Freedom Specialty, where they currently intend to implement it in SAS.
Accrual Score (Bankruptcy) Prediction
The machine learning technique called Support Vector Machine (SVM) was selected. This supervised learning technique takes a training set of factors with labeled results and constructs a model that predicts the label for new cases.
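To make the idea of training on labeled factors concrete, here is a minimal sketch of a linear SVM fitted by subgradient descent on the hinge loss. It is not the project's implementation: the project used R packages that are not named here, and its kernel and settings are unknown. The function names, parameters and the eight training rows are hypothetical, invented only for illustration.

```python
def train_linear_svm(samples, labels, epochs=100, learning_rate=0.01, reg=0.01):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.

    labels must be +1 or -1. Returns (weights, bias).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * xi for w, xi in zip(weights, x)) + bias)
            # Regularization step: shrink the weights slightly.
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # Hinge-loss step: push the boundary away from this sample.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


# Hypothetical, standardized factors for eight companies (not real data).
# +1 and -1 stand for the two labeled outcomes in the training set.
training_factors = [[2, 2], [-2, -2], [3, 3], [-3, -3],
                    [2, 3], [-2, -3], [3, 2], [-3, -2]]
training_labels = [1, -1, 1, -1, 1, -1, 1, -1]

weights, bias = train_linear_svm(training_factors, training_labels)
print(predict(weights, bias, [2.5, 2.5]))
print(predict(weights, bias, [-2.5, -2.5]))
```

In practice a library implementation with kernel support and cross-validation would replace this hand-written loop; the sketch only shows how labeled factors turn into a decision boundary.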
Although Freedom’s project was a predictive modeling effort, the business is anxious to pursue analyzing the “fine print” of unstructured text in filings and media reports looking for red flags to help triage the workload for analysts.
Results (Jan 2013)
positive precision 0.81, positive recall 0.70, positive F1 score 0.75
negative precision 0.74, negative recall 0.83, negative F1 score 0.78
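The reported F1 scores can be cross-checked from the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the function and variable names are illustrative, and only the six figures come from the results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the results (Jan 2013).
reported = {
    "positive": {"precision": 0.81, "recall": 0.70, "f1": 0.75},
    "negative": {"precision": 0.74, "recall": 0.83, "f1": 0.78},
}

for label, m in reported.items():
    computed = f1_score(m["precision"], m["recall"])
    print(f"{label}: computed F1 = {computed:.2f}, reported F1 = {m['f1']:.2f}")
```

Both computed values round to the reported ones, so the three figures in each row are mutually consistent.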
Principle: Start with solid advanced analytics capabilities and add “Big Data” for added ROI
Determine whether certain words are used more often during a first notice of loss call in a way that would indicate a fraudulent claim.
This will result in false positives! It should be combined with claims, billing and contact history to enhance the accuracy of the model.
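One simple way to look for such words is to compare how often each word appears in calls tied to known fraudulent claims versus legitimate ones. The sketch below computes a smoothed ratio ("lift") per word. The transcripts are invented toy strings, not real call data, and the function names and the smoothing constant are assumptions made for illustration.

```python
from collections import Counter


def containing_counts(transcripts):
    """Count, for each word, how many transcripts contain it at least once."""
    counts = Counter()
    for text in transcripts:
        counts.update(set(text.lower().split()))
    return counts


def word_lift(fraud_calls, legit_calls, smoothing=0.5):
    """Smoothed share of fraud calls with the word / share of legit calls."""
    fraud = containing_counts(fraud_calls)
    legit = containing_counts(legit_calls)
    lift = {}
    for word in set(fraud) | set(legit):
        fraud_rate = (fraud[word] + smoothing) / (len(fraud_calls) + 2 * smoothing)
        legit_rate = (legit[word] + smoothing) / (len(legit_calls) + 2 * smoothing)
        lift[word] = fraud_rate / legit_rate
    return lift


# Toy transcripts, invented for illustration only.
fraud_calls = [
    "car stolen overnight need cash quickly",
    "stolen laptop need cash today",
]
legit_calls = [
    "hail damaged the car roof",
    "tree fell on the car overnight",
    "laptop stolen from the office",
]

lift = word_lift(fraud_calls, legit_calls)
for word in sorted(lift, key=lift.get, reverse=True)[:3]:
    print(word, round(lift[word], 2))

# Legitimate calls that a naive "stolen" flag would still catch.
false_positives = [c for c in legit_calls if "stolen" in c.split()]
print(len(false_positives))
```

In this toy set "stolen" scores above 1 yet also appears in a legitimate call, which is the false-positive problem noted above and the reason word signals should be combined with claims, billing and contact history.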
Principle: “Big data” does not replace your existing analytics using your structured data warehouse. Big Data is simply an additional data set which enhances an existing set of capabilities and should not be used out of context.
With a wide range of advanced modeling techniques…
Classes of Advanced
The technologies that deal with the big data
problems are broad and diverse, it is not