Some slide material taken from or inspired by: Groth, Han and Kamber, Cerrito, SAS

DSCI 4520/5240 (DATA MINING)

DSCI 4520/5240

Data Mining

Some slide material taken from or inspired by: Groth, Han and Kamber, Cerrito, SAS


Introduction to DM

“It is a capital mistake to theorize before one has data.

Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”

(Sir Arthur Conan Doyle: Sherlock Holmes, "A Scandal in Bohemia")


Nobel Laureate Calls Data Mining "A Must"

In an interview with ComputerWorld in January 1999, Dr. Penzias, who won the 1978 Nobel Prize in Physics and was vice president and chief scientist at Bell Laboratories, called large-scale data mining from very large databases the key application for corporations in the next few years.

In response to ComputerWorld's age-old question, "What will be the killer applications in the corporation?", Dr. Penzias replied:

"Data mining." He then added: "Data mining will become much more important and companies will throw away nothing about their customers because it will be so valuable. If you're not doing this, you're out of business."

What Is Data Mining?

Data mining (knowledge discovery in databases):

  • A process of identifying hidden patterns and relationships within data (Groth)

Data mining:

  • Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases

Motivation: “Necessity is the Mother of Invention”

Data explosion problem

  • Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories

We are drowning in data, but starving for knowledge!

Solution: Data warehousing and data mining

  • Data warehousing and on-line analytical processing
  • Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Data Deluge

  • hospital patient registries
  • electronic point-of-sale data
  • remote sensing images
  • tax returns
  • stock trades
  • OLTP
  • telephone calls
  • airline reservations
  • credit card charges
  • catalog orders
  • bank transactions

Data Mining, circa 1963

IBM 7090

600 cases

“Machine storage limitations restricted the total number of variables which could be considered at one time to 25.”

Business Decision Support
  • Database Marketing
    • Target marketing
    • Customer relationship management
  • Credit Risk Management
    • Credit scoring
  • Fraud Detection
  • Healthcare Informatics
    • Clinical decision support
Required Expertise
  • Domain
  • Data
  • Analytical Methods

[Figure: diagram relating the three areas of expertise to Data Mining]

What Is Data Mining?
  • IT: Complicated database queries
  • ML: Inductive learning from examples
  • Stat: What we were taught not to do
Predictive Modeling

Types of Targets
  • Supervised Classification
    • Event/no event (binary target)
    • Class label (multiclass problem)
  • Regression
    • Continuous outcome
  • Survival Analysis
    • Time-to-event (possibly censored)
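
The target types above can be illustrated with a small sketch. The `SurvivalOutcome` class, the `target_kind` helper, and the sample values are all hypothetical names and data invented for illustration; the type-based check is a rough heuristic, not a standard rule.

```python
from dataclasses import dataclass


@dataclass
class SurvivalOutcome:
    time: float   # observed follow-up time
    event: bool   # True if the event occurred, False if the case is censored


def target_kind(values):
    """Guess the modeling problem from a list of target values (heuristic)."""
    if all(isinstance(v, SurvivalOutcome) for v in values):
        return "survival analysis"
    if all(isinstance(v, float) for v in values):
        return "regression"
    if len(set(values)) == 2:
        return "binary classification"
    return "multiclass classification"


# Hypothetical targets, one per slide bullet.
responded = [1, 0, 0, 1, 0]                      # event / no event
segment = ["gold", "silver", "bronze", "silver"]  # class label
revenue = [120.5, 80.0, 310.25]                  # continuous outcome
churn = [SurvivalOutcome(14.0, True), SurvivalOutcome(36.0, False)]

for name, target in [("responded", responded), ("segment", segment),
                     ("revenue", revenue), ("churn", churn)]:
    print(name, "->", target_kind(target))
```

In practice the analyst declares the target type from domain knowledge rather than inferring it from the values; the sketch only shows how the four cases differ in shape.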
Why Data Mining? — Potential Applications

Database analysis and decision support

  • Market analysis and management
    • target marketing, customer relationship management, market basket analysis, cross selling, market segmentation
  • Risk analysis and management
    • Forecasting, customer retention, improved underwriting, quality control, competitive analysis
  • Fraud detection and management

Other Applications

  • Text mining (newsgroups, email, documents) and Web analysis
  • Intelligent query answering
Market Analysis and Management (1)

Where are the data sources for analysis?

  • Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies

Target marketing

  • Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.

Cross-market analysis

  • Associations/correlations between product sales
  • Prediction based on the association information
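
As a minimal sketch of what an association between product sales can look like, the code below computes support, confidence and lift for item pairs. The `pair_rules` function, the `min_support` threshold and the baskets are hypothetical; the slides do not prescribe a particular algorithm.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                  # share of baskets with both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)   # vs. baseline P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical point-of-sale baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]

rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two products sell together more often than their individual popularity would predict, which is the kind of information a prediction or cross-selling step could build on.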
Market Analysis and Management (2)

Customer profiling

  • data mining can reveal which types of customers buy which products (clustering or classification)

Identifying customer requirements

  • identifying the best products for different customers
  • use prediction to find what factors will attract new customers
Corporate Analysis and Risk Management

Finance planning and asset evaluation

  • cash flow analysis and prediction
  • contingent claim analysis to evaluate assets
  • cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)

Resource planning:

  • summarize and compare the resources and spending

Competition:

  • monitor competitors and market directions
  • group customers into classes and apply a class-based pricing procedure
  • set pricing strategy in a highly competitive market
Fraud Detection and Management (1)

Applications

  • widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.

Approach

  • use historical data to build models of fraudulent behavior and use data mining to help identify similar instances

Examples

  • auto insurance: detect a group of people who stage accidents to collect on insurance
  • money laundering: detect suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
  • medical insurance: detect professional patients, rings of doctors, and rings of referrals
Fraud Detection and Management (2)

Detecting inappropriate medical treatment

  • The Australian Health Insurance Commission found that in many cases blanket screening tests were requested (saving A$1m per year).

Detecting telephone fraud

  • Telephone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm.
  • British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
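
One simple way to look for calls that deviate from an expected norm is a z-score against a caller's own history. The sketch below assumes call duration is the only feature; the `flag_unusual_calls` function, the threshold of 3 standard deviations and the sample durations are hypothetical, and real systems would combine destination, duration and time of day.

```python
from statistics import mean, stdev


def flag_unusual_calls(history, new_calls, threshold=3.0):
    """Return (duration, z_score) for new calls far from the caller's norm."""
    mu = mean(history)
    sigma = stdev(history)          # sample standard deviation of past calls
    flagged = []
    for duration in new_calls:
        z = (duration - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > threshold:
            flagged.append((duration, round(z, 2)))
    return flagged


# Hypothetical call durations in minutes for one caller.
history = [4, 5, 6, 5, 4, 6, 5, 5]
new_calls = [5, 7, 45]

flagged = flag_unusual_calls(history, new_calls)
print(flagged)
```

Flagged calls are candidates for follow-up, not proof of fraud; the threshold trades missed cases against false alarms.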


Retail

  • Analysts estimate that 38% of retail shrink is due to dishonest employees.

On the News:

Can Data Mining save America’s schools?

On the News: State Agency Nabs Abuse of Food Stamps

JUNE 21, 2004 (COMPUTERWORLD) - The state of Louisiana issues food stamp purchase cards to 600,000 people a year -- but the recipients don't always use them to buy food. So program administrators have started using BI tools to detect suspicious activity for follow-up investigations.

When swiped at the point of sale, the purchase card creates a transactional record that's forwarded to the Louisiana Department of Social Services in Baton Rouge. This information is offloaded to a SQL Server data warehouse, where it can be sliced and diced using Information Builders Inc.'s WebFocus query software. Investigators can scan the data by geography, purchase amount and other variables to detect "signatures of fraud," says Duane Fontenot, the department's IT director.

The system has been feeding data to 40 fraud investigators for the past eight months and has information about every participating store. For instance, agents using the digital map can see where certain transactions are taking place by parish, city or even larger areas. If a food stamp recipient frequently travels 60 miles to use the card at one store -- passing 30 other stores on the way -- that could indicate a scheme to sell the cards for cash, Fontenot says. In one instance, investigators uncovered a criminal network that was converting the stamps into currency that was then wired to overseas banks. When culprits are faced with such evidence, Fontenot says, "usually, they just confess."
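
The travel pattern described in the article could be expressed as a simple rule. This is only a sketch: the record fields (`card`, `store`, `miles`, `nearer_stores`) and the `min_trips` cutoff for "frequently" are assumptions, while the 60-mile and 30-store figures are taken from the article's example. It does not represent the department's actual queries.

```python
from collections import Counter


def flag_long_distance_shoppers(transactions, min_miles=60,
                                min_nearer_stores=30, min_trips=3):
    """Return (card, store) pairs with repeated long trips past nearer stores."""
    trips = Counter()
    for t in transactions:
        if t["miles"] >= min_miles and t["nearer_stores"] >= min_nearer_stores:
            trips[(t["card"], t["store"])] += 1
    return sorted(pair for pair, count in trips.items() if count >= min_trips)


# Hypothetical purchase-card transactions.
transactions = [
    {"card": "A1", "store": "S9", "miles": 62, "nearer_stores": 31},
    {"card": "A1", "store": "S9", "miles": 62, "nearer_stores": 31},
    {"card": "A1", "store": "S9", "miles": 62, "nearer_stores": 31},
    {"card": "B2", "store": "S1", "miles": 2, "nearer_stores": 0},
    {"card": "C3", "store": "S9", "miles": 75, "nearer_stores": 40},  # one trip only
]

suspects = flag_long_distance_shoppers(transactions)
print(suspects)
```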

Other Applications


Sports

  • IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for the New York Knicks and Miami Heat

Astronomy

  • JPL and the Palomar Observatory discovered 22 quasars with the help of data mining

Internet Web Surf-Aid

  • IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preferences and behavior, analyze the effectiveness of Web marketing, improve Web site organization, etc.

Data Mining: A KDD Process


  • Data mining: the core of the knowledge discovery process.

Stages shown in the diagram, in order: Data Cleaning and Data Integration → Data Warehouse → Task-relevant Data → Data Mining → Pattern Evaluation

Steps of a KDD Process

Learning the application domain:

  • relevant prior knowledge and goals of application

Creating a target data set: data selection

Data cleaning and preprocessing (may take 60% of effort!)

Data reduction and transformation:

  • Find useful features, dimensionality/variable reduction, invariant representation.

Choosing functions of data mining

  • summarization, classification, regression, association, clustering.

Choosing the mining algorithm(s)

Data mining: search for patterns of interest

Pattern evaluation and knowledge presentation

  • visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge
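
The steps above can be sketched as a chain of small functions. Everything here is hypothetical: the records, the field names, the spend cutoff, and the choice of a simple frequency count (a form of summarization) as the mining step. It is meant only to show how the stages hand data to one another.

```python
from collections import Counter

# Hypothetical raw records; "spend" arrives as text and may be missing.
raw_records = [
    {"customer": "a", "region": "north", "spend": "120"},
    {"customer": "b", "region": "north", "spend": ""},
    {"customer": "c", "region": "south", "spend": "300"},
    {"customer": "d", "region": "south", "spend": "280"},
    {"customer": "e", "region": "north", "spend": "90"},
    {"customer": "f", "region": "south", "spend": "150"},
]


def select(records, fields):
    """Create the target data set: keep only task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    """Data cleaning: drop records with missing spend, convert text to float."""
    return [{**r, "spend": float(r["spend"])} for r in records if r["spend"] != ""]


def transform(records, cutoff=200.0):
    """Data reduction: replace the raw amount with a coarse spending tier."""
    return [(r["region"], "high" if r["spend"] >= cutoff else "low")
            for r in records]


def mine(features):
    """Data mining (summarization): count each (region, tier) combination."""
    return Counter(features)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}


target = select(raw_records, ["region", "spend"])
knowledge = evaluate(mine(transform(clean(target))))
print(knowledge)
```

Even in this toy version most of the code deals with selection, cleaning and transformation rather than the mining step itself, which is consistent with the slide's remark about where the effort goes.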

Data Mining and Business Intelligence

Potential to support business decisions increases toward the top of the pyramid (the End User). Layers, from top to bottom:

  • Data Presentation: Visualization Techniques
  • Data Mining: Information Discovery
  • Data Exploration: Statistical Analysis, Querying and Reporting
  • Data Warehouses / Data Marts
  • Data Sources: Paper, Files, Information Providers, Database Systems, OLTP