
DSCI 4520/5240

Data Mining

Some slide material taken from or inspired by: Groth, Han and Kamber, Cerrito, SAS



Introduction to DM

“It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

(Sir Arthur Conan Doyle: Sherlock Holmes, "A Scandal in Bohemia")



Nobel Laureate Calls Data Mining "A Must"

In a January 1999 interview with ComputerWorld, Dr. Penzias (winner of the 1978 Nobel Prize in Physics and vice president and chief scientist at Bell Laboratories) named large-scale data mining from very large databases as the key application for corporations in the next few years.

In response to ComputerWorld's age-old question, "What will be the killer applications in the corporation?", Dr. Penzias replied:

"Data mining." He then added: "Data mining will become much more important and companies will throw away nothing about their customers because it will be so valuable. If you're not doing this, you're out of business."



What Is Data Mining?

Data mining (knowledge discovery in databases):

  • A process of identifying hidden patterns and relationships within data (Groth)

Data mining:

  • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases



Motivation: “Necessity is the Mother of Invention”

Data explosion problem

  • Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories

We are drowning in data, but starving for knowledge!

Solution: Data warehousing and data mining

  • Data warehousing and on-line analytical processing

  • Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases



Data Deluge

  • hospital patient registries

  • electronic point-of-sale data

  • remote sensing images

  • tax returns

  • stock trades

  • OLTP

  • telephone calls

  • airline reservations

  • credit card charges

  • catalog orders

  • bank transactions



Data Mining, circa 1963

IBM 7090

600 cases

“Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”



Business Decision Support

  • Database Marketing

    • Target marketing

    • Customer relationship management

  • Credit Risk Management

    • Credit scoring

  • Fraud Detection

  • Healthcare Informatics

    • Clinical decision support



Required Expertise

  • Domain

  • Data

  • Analytical Methods



Multidisciplinary

[Figure: Data Mining shown together with its contributing fields: Statistics, Pattern Recognition, Neurocomputing, Machine Learning, AI, Databases, and KDD]



What Is Data Mining?

  • IT: Complicated database queries

  • ML: Inductive learning from examples

  • Stat: What we were taught not to do



Comparing Statistics to Data Mining (from Cerrito 2006)





Predictive Modeling

[Figure: a data table with one row per case; the columns are divided into Inputs and a Target]
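To make the cases-by-variables layout concrete, here is a minimal sketch in Python. The data values, the column meanings, and the choice of a 1-nearest-neighbour predictor are all assumptions made for illustration; the slides do not prescribe any particular model.

```python
# Hypothetical toy data: each row is one case. The first two columns are
# inputs (age, income in $1000s) and the last column is the target.
cases = [
    (25, 30, 0),
    (32, 45, 0),
    (47, 80, 1),
    (51, 95, 1),
]

inputs = [row[:-1] for row in cases]  # input columns, one tuple per case
target = [row[-1] for row in cases]   # target column, one value per case


def predict_1nn(x, inputs, target):
    """Predict the target of x by copying the target of its nearest case."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = min(range(len(inputs)), key=lambda i: sq_dist(x, inputs[i]))
    return target[nearest]


# Score a new case whose target is unknown.
print(predict_1nn((45, 85), inputs, target))
```

The point of the sketch is the split: a model is fitted on cases where both inputs and target are known, then applied to new cases where only the inputs are available.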



Types of Targets

  • Supervised Classification

    • Event/no event (binary target)

    • Class label (multiclass problem)

  • Regression

    • Continuous outcome

  • Survival Analysis

    • Time-to-event (possibly censored)
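As a rough illustration of how these target types might look in data, the sketch below encodes one hypothetical target of each kind for the same five customers. The variable names, values, and the `task_for` heuristic are assumptions for illustration only.

```python
# Hypothetical targets for the same five customers, one list per target type.
churned = [0, 1, 0, 0, 1]                               # binary: event / no event
segment = ["gold", "basic", "basic", "silver", "gold"]  # multiclass label
spend = [120.5, 80.0, 42.3, 99.9, 150.0]                # continuous outcome
# Time-to-event: (months observed, event_seen). event_seen=False means the
# case is censored: the event had not happened when observation ended.
tenure = [(12, False), (3, True), (12, False), (12, False), (7, True)]


def task_for(target):
    """Guess the modeling task from how the target is encoded (toy heuristic)."""
    first = target[0]
    if isinstance(first, tuple):
        return "survival analysis"
    if isinstance(first, float):
        return "regression"
    if set(target) <= {0, 1}:
        return "binary classification"
    return "multiclass classification"


for name, t in [("churned", churned), ("segment", segment),
                ("spend", spend), ("tenure", tenure)]:
    print(name, "->", task_for(t))
```

In practice the task is chosen by the analyst from the business question, not inferred from the encoding; the heuristic only makes the four cases easy to tell apart.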



Why Data Mining? — Potential Applications

Database analysis and decision support

  • Market analysis and management

    • target marketing, customer relationship management, market basket analysis, cross selling, market segmentation

  • Risk analysis and management

    • Forecasting, customer retention, improved underwriting, quality control, competitive analysis

  • Fraud detection and management

Other Applications

  • Text mining (newsgroups, email, documents) and Web analysis.

  • Intelligent query answering



Market Analysis and Management (1)

Where are the data sources for analysis?

  • Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

Target marketing

  • Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.

Cross-market analysis

  • Associations/correlations between product sales

  • Prediction based on the association information
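One common way to quantify associations between product sales is the support and confidence of a rule such as "bread implies milk". The sketch below computes both on a handful of made-up baskets; the data and function names are assumptions, and the slides do not name a specific measure.

```python
# Hypothetical transactions: each set holds the products bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent in basket | antecedent in basket)."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # rule never applies, so report zero confidence
    return support(set(antecedent) | set(consequent), baskets) / base


print("support(bread, milk):", support({"bread", "milk"}, baskets))
print("confidence(bread -> milk):", confidence({"bread"}, {"milk"}, baskets))
```

A rule with high confidence can then be used for prediction: a customer whose basket contains the antecedent is a candidate for an offer on the consequent.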



Market Analysis and Management (2)

Customer profiling

  • data mining can tell you what types of customers buy what products (clustering or classification)

Identifying customer requirements

  • identifying the best products for different customers

  • use prediction to find what factors will attract new customers



Corporate Analysis and Risk Management

Finance planning and asset evaluation

  • cash flow analysis and prediction

  • contingent claim analysis to evaluate assets

  • cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)

Resource planning:

  • summarize and compare the resources and spending

Competition:

  • monitor competitors and market directions

  • group customers into classes and apply a class-based pricing procedure

  • set pricing strategy in a highly competitive market



Fraud Detection and Management (1)

Applications

  • widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.

Approach

  • use historical data to build models of fraudulent behavior and use data mining to help identify similar instances

Examples

  • auto insurance: detect a group of people who stage accidents to collect on insurance

  • money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)

  • medical insurance: detect professional patients, rings of doctors, and rings of references



Fraud Detection and Management (2)

Detecting inappropriate medical treatment

  • The Australian Health Insurance Commission identified that in many cases blanket screening tests were requested (saving A$1 million per year).

Detecting telephone fraud

  • Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.

  • British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion-dollar fraud.

Retail

  • Analysts estimate that 38% of retail shrink is due to dishonest employees.
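The telephone call model above looks for patterns that deviate from an expected norm. A minimal sketch of that idea, assuming the norm is simply the mean of one caller's call durations and that "deviation" is measured in standard deviations (a z-score), could look like this. The data and the threshold of 2 are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical call durations in minutes for one caller.
durations = [3, 4, 5, 4, 3, 5, 4, 4, 3, 60]
print(flag_outliers(durations))
```

Because a large outlier inflates both the mean and the standard deviation, a production system would more likely use a robust norm (for example, median-based) and several call attributes at once; this sketch only shows the basic comparison.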



On the News:

Can Data Mining save America’s schools?


On the News: State Agency Nabs Abuse of Food Stamps

JUNE 21, 2004 (COMPUTERWORLD) - The state of Louisiana issues food stamp purchase cards to 600,000 people a year -- but the recipients don't always use them to buy food. So program administrators have started using BI tools to detect suspicious activity for follow-up investigations.

When swiped at the point of sale, the purchase card creates a transactional record that's forwarded to the Louisiana Department of Social Services in Baton Rouge. This information is offloaded to a SQL Server data warehouse, where it can be sliced and diced using Information Builders Inc.'s WebFocus query software. Investigators can scan the data by geography, purchase amount and other variables to detect "signatures of fraud," says Duane Fontenot, the department's IT director.

The system has been feeding data to 40 fraud investigators for the past eight months and has information about every participating store. For instance, agents using the digital map can see where certain transactions are taking place by parish, city or even larger areas. If a food stamp recipient frequently travels 60 miles to use the card at one store -- passing 30 other stores on the way -- that could indicate a scheme to sell the cards for cash, Fontenot says. In one instance, investigators uncovered a criminal network that was converting the stamps into currency that was then wired to overseas banks. When culprits are faced with such evidence, Fontenot says, "usually, they just confess."
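The "frequently travels 60 miles to one store" signature can be expressed as a simple rule over transaction records. The sketch below is one hypothetical way to write such a rule in Python; the record fields, the visit threshold, and the function name are assumptions, and it is not a description of the department's actual WebFocus queries.

```python
from collections import Counter

# Hypothetical swipe records; the field names are illustrative only.
transactions = [
    {"recipient": "r1", "store": "s9", "miles_from_home": 62},
    {"recipient": "r1", "store": "s9", "miles_from_home": 62},
    {"recipient": "r1", "store": "s9", "miles_from_home": 62},
    {"recipient": "r2", "store": "s1", "miles_from_home": 2},
    {"recipient": "r2", "store": "s9", "miles_from_home": 70},
]


def repeated_distant_use(transactions, min_miles=60, min_visits=3):
    """Return (recipient, store) pairs with at least `min_visits` swipes
    at a store at least `min_miles` from the recipient's home."""
    counts = Counter(
        (t["recipient"], t["store"])
        for t in transactions
        if t["miles_from_home"] >= min_miles
    )
    return sorted(pair for pair, n in counts.items() if n >= min_visits)


print(repeated_distant_use(transactions))
```

A flagged pair is only a lead for follow-up investigation, as in the article, not proof of fraud.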


On the News: Rexer Analytics Annual Data Mining Survey



Other Applications

Sports

  • IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for New York Knicks and Miami Heat

Astronomy

  • JPL and the Palomar Observatory discovered 22 quasars with the help of data mining

Internet Web Surf-Aid

  • IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.



Data Mining: A KDD Process

  • Data mining: the core of the knowledge discovery process.

[Figure: the KDD process, from bottom to top: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge]



Steps of a KDD Process

Learning the application domain:

  • relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing (may take 60% of effort!)

Data reduction and transformation:

  • Find useful features, dimensionality/variable reduction, invariant representation.

Choosing functions of data mining:

  • summarization, classification, regression, association, clustering.

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation:

  • visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge
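The steps of the KDD process can be sketched as a chain of small functions. Everything in this Python sketch is hypothetical: the records, the field names, the choice of summarization (mean spend per region) as the mining function, and the threshold used in pattern evaluation.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive from a source system.
raw = [
    {"customer": "a", "region": "north", "spend": "120", "phone": "555-0100"},
    {"customer": "b", "region": "north", "spend": "", "phone": "555-0101"},
    {"customer": "c", "region": "south", "spend": "80", "phone": "555-0102"},
    {"customer": "d", "region": "north", "spend": "200", "phone": "555-0103"},
    {"customer": "e", "region": "south", "spend": "95", "phone": "555-0104"},
]


def select(records, fields):
    """Create the target data set: keep only task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with a missing spend value."""
    return [r for r in records if r["spend"] != ""]


def transform(records):
    """Data transformation: convert spend from text to a number."""
    return [{**r, "spend": float(r["spend"])} for r in records]


def mine(records):
    """Data mining (summarization): mean spend per region."""
    totals, counts = defaultdict(float), defaultdict(int)
    for r in records:
        totals[r["region"]] += r["spend"]
        counts[r["region"]] += 1
    return {region: totals[region] / counts[region] for region in totals}


def evaluate(patterns, min_mean=100.0):
    """Pattern evaluation: keep only regions with a high mean spend."""
    return {region: m for region, m in patterns.items() if m >= min_mean}


patterns = mine(transform(clean(select(raw, ["region", "spend"]))))
print(evaluate(patterns))
```

Even in this toy version, most of the code is selection, cleaning, and transformation rather than mining, which is consistent with the remark that preprocessing can dominate the effort.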



Data Mining and Business Intelligence

[Figure: pyramid of layers; the potential to support business decisions increases toward the top]

Layers, from top to bottom:

  • Making Decisions

  • Data Presentation: Visualization Techniques

  • Data Mining: Information Discovery

  • Data Exploration: Statistical Analysis, Querying and Reporting

  • Data Warehouses / Data Marts: OLAP, MDA

  • Data Sources: Paper, Files, Information Providers, Database Systems, OLTP

Roles shown alongside the layers, from top to bottom: End User, Business Analyst, Data Analyst, DBA

