data mining l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Data Mining PowerPoint Presentation
Download Presentation
Data Mining

Loading in 2 Seconds...

play fullscreen
1 / 29

Data Mining - PowerPoint PPT Presentation


  • 121 Views
  • Uploaded on

Data Mining. Ryan Benton Center for Advanced Computer Studies University of Louisiana at Lafayette Lafayette, La., USA January 13, 2011. Important Note. This presentation was obtained from Dr. Vijay Raghavan Obtained on January 12, 2011.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Data Mining' - ledell


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
data mining

Data Mining

Ryan Benton

Center for Advanced Computer Studies

University of Louisiana at Lafayette

Lafayette, La., USA

January 13, 2011

important note
Important Note
  • This presentation was obtained from Dr. Vijay Raghavan
    • Obtained on January 12, 2011.
    • Dr. Raghavan is a member of the Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, La., USA
  • They are being used with his permission.
    • Some modifications have been made.
contents
CONTENTS
  • The Motivation
  • Knowledge Discovery in Databases (KDD)
  • Data Mining
    • Related Fields
    • Research Issues
    • Tasks
  • Association Mining Problem
  • Classification Mining Problem
  • Conclusions
the motivation
THE MOTIVATION

“We are drowning in information,

but starving for knowledge.”

John Naisbett

knowledge discovery in databases definition
KNOWLEDGE DISCOVERY IN DATABASES- Definition
  • A hot buzzword for a class of database applications that look for patterns or relationships in data that are:
    • Hidden,
    • Previously unknown and
    • Potentially useful
kdd definition
KDD: Definition
  • Extract (discover):
    • interesting and
    • previously unknown

knowledge from very large real world databases.

kdd definition7
KDD: Definition
  • More formally:
    • Valid,
    • Novel, Potentially useful or Desired
    • Ultimately understandable.
kdd process
KDD- PROCESS

Note, order of these steps

And what is include in

Each step may vary depending

On who you talk to.

And maybe

even this

kdd vs data mining
KDD vs. DATA MINING
  • Synonyms (?)
  • KDD
    • More than just finding pattern
    • Mining, dredging and fishing
kdd related fields
KDD- Related Fields
  • Data Warehousing
  • On-Line Analytical Processing (OLAP)
  • Database Marketing
  • Exploratory Data Analysis (EDA)
data warehousing
Data Warehousing
  • A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management’s decision making process.
olap and data warehousing

WEB

Relational

Data Marts

Query

Reporting

Tools

Portal

Data

Warehouse

OLAP

Tools

External

Source

MDD

Data Marts

GIS

Tools

File

Server

Geo-

Reference

Data

Meta Data

OLAP and Data Warehousing

User

data mining related areas

Visualization

Machine

Learning

Data

Mining

Statistics

Artificial

Intelligence

Expert

Systems

Pattern

Recognition

Data Mining: Related Areas

Database

Management

Systems

  • Other Areas:
    • Neural Networks
    • Evolutionary Methods
    • Information Retrieval
    • Etc.

The ‘Parent’

of Data Mining

Sometimes considered

an subarea of AI

Sometimes considered

an subarea of AI

database versus data mining
Database versus Data Mining
  • Query
    • DB: Well Defined & SQL
    • DM: Poorly Defined & Various Languages
  • Data
    • DB: Operational (and generally relational)
    • DM: Not Operational.
  • Output
    • DB: Precise, subset of the database.
    • DM: Varies.
examples
Examples
  • Database
    • Find all people with last name Raghavan.
    • Identify all customers who have bought more than 10,000 dollars
  • Data Mining
    • Find those who have poor credit
    • Find all those who like the same cars
    • Find all items that are often (frequently) purchased with milk.
    • Predict the value of the housing market.
statistics
Statistics
  • Simple descriptive models
  • Traditionally:
    • A model created from a sample of the data to the entire dataset.
  • Exploratory Data Analysis:
    • Data can actually drive the creation of the model
    • Opposite of traditional statistical view.
  • Presupposes a distribution
machine learning
Machine Learning
  • Machine Learning: area of AI that examines how to write programs that can learn.
  • Types of models
    • Classification
    • Prediction (Regression)
    • Clustering
  • Types of Learning:
    • Supervised
    • Unsupervised
  • Traditionally
    • Small Datasets
    • ‘Complete’ Data
      • Changed since about mid-1990’s
      • Examples on non-complete from earlier
      • Field isn’t static (ideas flow between)
data mining research issues
Data Mining: Research Issues
  • Ultra large data
  • Noisy data
  • Null values
  • Incomplete data
    • Note, somewhat related to NULL values
  • Redundant data
  • Dynamic aspects of data
data mining tasks
Data Mining: Tasks
  • Association
  • Classification
  • Clustering
  • Estimation
  • Data Visualization
  • Deviation Analysis
  • etc

Generally, implication that you are seeking

Relationships and/or can manipulate the data

interactively.

association mining problem
Deriving association rules from data:

Given a set of items I = {i1,i2, . . . , in} and a set of transactions S = {s1, s2, . . ., sm}, each transaction si S, such that si  I,

an association rule is defined as X  Y, where X  I, Y  I and X  Y = , describes the existence of a relationship between the two itemsets X and Y.

ASSOCIATION MINING PROBLEM
measurements
Measurements

Measures to define the strength

of the relationship between two

itemsets X and Y

applications of associations
Applications of Associations
  • I = Products, S = Baskets
  • I = Cited Articles, S = Technical Articles
  • I = Incoming Links, S = Web pages
  • I = Keywords, S = Documents
  • I = Term papers, S = Sentences
classification mining problem
Classification Mining Problem
  • Pattern Recognition and Machine Learning communities
  • Generally aimed at models of the data.
  • Often includes both
    • Categorization
    • Prediction (Regression)
  • Supervised.
clustering mining problem
Clustering Mining Problem
  • Assumption: Data, naturally, falls into groups.
    • Overlapping or Non-Overlapping
  • What are the groups?
    • And what data falls within each group.
  • Unsupervised.
measures
Measures
  • Error
    • Categorization
      • Number Bad Assignments/Total Assignments
    • Prediction
      • Mean Squared Error
  • In truth, a number of measures have been proposed.
note about data
Note about ‘Data’
  • Various types:
    • Text
    • Strings
    • Numeric
    • Sound
    • Image
    • Relations
    • Etc.
conclusions
CONCLUSIONS

KDD has interesting problems

  • It is an inter-disciplinary field
  • No matter your expertise, you can find an interesting niche
  • Many high-demand applications
    • Customer Relationship Management
    • Suggestive Sales
      • Amazon
    • Search
      • Ebay
    • Stock Prediction
      • Well, would be nice.