


PowerPoint Slideshow about ' Data Mining' - palani



Data Mining

Lecture 2


Course Syllabus

  • Course topics:

  • Introduction (Week1-Week2)

    • What is Data Mining?

    • Data Collection and Data Management Fundamentals

    • The Essentials of Learning

    • The Emerging Needs for Different Data Analysis Perspectives

  • Data Management and Data Collection Techniques for Data Mining Applications (Week3-Week4)

    • Data Warehouses: Gathering Raw Data from Relational Databases and Transforming It into Information

    • Information Extraction and Data Processing Techniques

    • Data Marts: The need for building highly specialized data storages for data mining applications


Week 2- Data vs. Knowledge

  • Data:

    • raw

    • atomic

    • (mostly!) operational

  • Information:

    • processed

    • re-organized

    • grouped

  • Knowledge:

    • patterns, models, findings ‘behind’ Information

  • Wisdom:

    • perfect orchestration of Knowledge

[Diagram labels: Data (Operation), Information (Analytic), Knowledge, Wisdom]

“Where is the wisdom we have lost in knowledge?

Where is the knowledge we have lost in information?”

T. S. Eliot
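A minimal sketch of the first step in this hierarchy, turning raw, atomic, operational records into grouped information. The transaction records, field names, and the `summarize` helper are invented for illustration and are not part of the lecture.

```python
from collections import defaultdict

# Hypothetical raw, atomic, operational records (one per sale).
transactions = [
    {"product": "pen", "region": "north", "amount": 2.0},
    {"product": "pen", "region": "south", "amount": 3.0},
    {"product": "ink", "region": "north", "amount": 7.5},
    {"product": "pen", "region": "north", "amount": 2.5},
]

def summarize(records, key):
    """Group raw records by `key` and total their amounts."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)

# "Information": the same data, processed, re-organized, and grouped.
sales_by_product = summarize(transactions, "product")
sales_by_region = summarize(transactions, "region")
```

Knowledge would be a further step: a pattern or model found behind such summaries, not the summaries themselves.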


Week 2- Evolution of Database and Information Systems

  • 1960s: (focus on efficient data collection)

    • Data collection, database creation, IMS and network DBMS

  • 1970s: (focus on structured data collection)

    • Relational data model, relational DBMS implementation

  • 1980s: (focus on information extraction)

    • RDBMS, advanced data models (extended-relational, OO, deductive, etc.), and application-oriented DBMS (spatial, scientific, engineering, etc.)

  • 1990s – 2000s: (focus on knowledge extraction and modeling)

    • Data Mining, Data Warehousing, Multidimensional Databases


Week 2- Data Collection and Data Management Fundamentals – What is Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process”

William H. Inmon

Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers.



Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
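The integration step can be sketched as follows. This is only an illustration under stated assumptions: the two source layouts, the field names, and the target encoding (`"M"` / `"F"`) are all hypothetical.

```python
# Two hypothetical sources that describe customers differently.
crm_rows = [{"cust_id": 1, "sex": "M"}, {"cust_id": 2, "sex": "F"}]
billing_rows = [{"CustomerNo": "3", "gender": "female"}]

# Assumed target encoding for the warehouse: "M" / "F".
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def from_crm(row):
    """Map a CRM row onto the warehouse naming convention and encoding."""
    return {"customer_id": int(row["cust_id"]),
            "gender": GENDER_CODES[row["sex"].lower()]}

def from_billing(row):
    """Map a billing row onto the same naming convention and encoding."""
    return {"customer_id": int(row["CustomerNo"]),
            "gender": GENDER_CODES[row["gender"].lower()]}

def integrate(crm, billing):
    """Combine both sources into one consistently named, consistently encoded list."""
    unified = [from_crm(r) for r in crm] + [from_billing(r) for r in billing]
    return sorted(unified, key=lambda r: r["customer_id"])

warehouse_customers = integrate(crm_rows, billing_rows)
```

Each source gets its own small adapter, so a change in one operational system's layout touches only one function.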



Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5–10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
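A toy sketch of these two properties, assuming a hypothetical `WarehouseTable` class that is not taken from any real product: the key carries an explicit element of time, and the only operations offered are loading and reading.

```python
from datetime import date

class WarehouseTable:
    """Toy append-only table: rows can be loaded and read, never updated."""

    def __init__(self):
        self._rows = {}

    def load(self, customer_id, snapshot_date, balance):
        key = (customer_id, snapshot_date)  # the key carries an element of time
        if key in self._rows:
            raise ValueError(f"snapshot {key} already loaded")
        self._rows[key] = balance

    def history(self, customer_id):
        """Return (date, balance) pairs for one customer, oldest first."""
        return sorted((d, b) for (c, d), b in self._rows.items() if c == customer_id)

table = WarehouseTable()
table.load(42, date(2023, 1, 31), 100.0)
table.load(42, date(2023, 2, 28), 80.0)
table.load(7, date(2023, 1, 31), 15.0)
```

Because nothing is updated in place, a changed balance arrives as a new snapshot, and the historical view is preserved.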



  • data cleaning

  • data integration

  • data consolidation


Week 2- Data Collection and Data Management Fundamentals – What is OLAP

  • object-oriented methodology comes in

  • entities (cubes)

  • attributes (dimensions)




  • Multidimensional Database Modeling

    • star schema

    • snowflake schema

    • fact constellation schema

  • fact vs dimension
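The fact vs dimension distinction can be sketched with a tiny star schema held in plain Python structures. All table names, keys, and values below are hypothetical.

```python
# Dimension tables: descriptive attributes, keyed by an id.
dim_product = {1: {"name": "pen", "category": "stationery"},
               2: {"name": "lamp", "category": "lighting"}}
dim_store = {10: {"region": "north"}, 11: {"region": "south"}}

# Fact table: foreign keys into each dimension plus numeric measures.
fact_sales = [
    {"product_id": 1, "store_id": 10, "units": 5, "revenue": 10.0},
    {"product_id": 2, "store_id": 10, "units": 1, "revenue": 30.0},
    {"product_id": 1, "store_id": 11, "units": 3, "revenue": 6.0},
]

def revenue_by(dimension, foreign_key, attribute):
    """Join the fact table to one dimension and total revenue per attribute value."""
    totals = {}
    for fact in fact_sales:
        value = dimension[fact[foreign_key]][attribute]
        totals[value] = totals.get(value, 0.0) + fact["revenue"]
    return totals

revenue_by_category = revenue_by(dim_product, "product_id", "category")
revenue_by_region = revenue_by(dim_store, "store_id", "region")
```

In a snowflake schema the dimension tables would themselves be normalized further (for example, `category` moved into its own table); a fact constellation would add more fact tables sharing these dimensions.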





Week 2- Data Collection and Data Management Fundamentals – OLAP Operations

  • roll-up

  • drill-down

  • slice

  • dice

  • pivot (rotation)

(figure taken from the textbook)
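The operations listed above can be sketched on a small cube stored as a dictionary. The cube contents, the city-to-country hierarchy, and all function names are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical cube cells: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Chicago", "phone"): 10,
    ("Q1", "New York", "phone"): 2,
    ("Q1", "Toronto", "phone"): 4,
    ("Q2", "Chicago", "laptop"): 3,
    ("Q2", "Toronto", "laptop"): 5,
}
DIMS = ("quarter", "city", "item")
# Assumed concept hierarchy for the city dimension (city -> country).
COUNTRY = {"Chicago": "USA", "New York": "USA", "Toronto": "Canada"}

def roll_up_city_to_country(cells):
    """Roll-up: climb the city -> country hierarchy, summing the measure."""
    out = defaultdict(int)
    for (quarter, city, item), units in cells.items():
        out[(quarter, COUNTRY[city], item)] += units
    return dict(out)

def slice_(cells, dim, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}

def dice(cells, **allowed):
    """Dice: keep cells whose coordinates fall in the allowed sets."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

def pivot(cells, order):
    """Pivot: reorder the axes without changing any value."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}

by_country = roll_up_city_to_country(cube)
q1_only = slice_(cube, "quarter", "Q1")
sub_cube = dice(cube, quarter={"Q1"}, city={"Chicago", "Toronto"})
rotated = pivot(cube, ("item", "city", "quarter"))
```

Drill-down is the reverse of roll-up: moving from `by_country` back to city level requires the more detailed cells, which here means going back to `cube`.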




Week 2- Data Collection and Data Management Fundamentals – What is Data Mart ?

  • data warehouse

    • information about subjects that span the entire organization; its scope is enterprise-wide.

    • which modeling schema? the fact constellation schema is commonly used, since it can model multiple, interrelated subjects.

  • data mart

    • a department subset of the data warehouse that focuses on selected subjects; its scope is department-wide.

    • which modeling schema? the star or snowflake schema is commonly used, since both are geared toward modeling single subjects.


Week 2- OLAP vs Data Mining

  • On-Line Analytical Processing

  • provides the ability to pose statistical and summary queries interactively (traditional On-Line Transaction Processing (OLTP) databases may take minutes or even hours to answer these queries)

  • Advantages relative to data mining

    • Can obtain a wider variety of results

    • Generally faster to obtain results

  • Disadvantages relative to data mining

    • User must “ask the right question”

    • Generally used to determine high-level statistical summaries, rather than specific relationships among instances


Week 2- Reporting vs Data Mining

  • Reporting

    • Last month's sales for each service type

    • Sales per service grouped by customer sex or age bracket

    • List of customers who lapsed their policy

  • Data Mining

    • What characteristics do customers that lapse their policy have in common and how do they differ from customers who renew their policy?

    • Which motor insurance policy holders would be potential customers for my House Content Insurance policy?
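A rough sketch of the difference, using invented policy records. The reporting function answers a question the user already framed; the last function is a deliberately crude stand-in for a mining step, searching the attributes for the one that best separates lapsed from renewed customers. Real data mining methods are far more sophisticated than this.

```python
from collections import Counter

# Hypothetical policy records; every field and value is invented.
customers = [
    {"age_band": "18-30", "pays_by": "cash", "lapsed": True},
    {"age_band": "18-30", "pays_by": "debit", "lapsed": False},
    {"age_band": "31-50", "pays_by": "cash", "lapsed": True},
    {"age_band": "31-50", "pays_by": "debit", "lapsed": False},
    {"age_band": "51+", "pays_by": "cash", "lapsed": True},
    {"age_band": "51+", "pays_by": "debit", "lapsed": False},
]

def report_lapsed_count(rows):
    """Reporting: answer a question the user already knows how to ask."""
    return sum(1 for r in rows if r["lapsed"])

def lapse_rates(rows, attribute):
    """Lapse rate for each value of one attribute."""
    total, lapsed = Counter(), Counter()
    for r in rows:
        total[r[attribute]] += 1
        lapsed[r[attribute]] += r["lapsed"]  # True counts as 1
    return {v: lapsed[v] / total[v] for v in total}

def most_discriminating(rows, attributes):
    """Crude mining step: pick the attribute whose lapse rates differ most."""
    def spread(attr):
        rates = lapse_rates(rows, attr).values()
        return max(rates) - min(rates)
    return max(attributes, key=spread)

lapsed_total = report_lapsed_count(customers)
best_attribute = most_discriminating(customers, ["age_band", "pays_by"])
```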


Week 2- Data to Knowledge Pyramid

[Figure: pyramid; potential to support business decisions increases toward the top. Layers from top to bottom:]

  • Making Decisions

  • Data Presentation: Visualization Techniques

  • Data Mining: Information Discovery

  • Data Exploration: Statistical Analysis, Querying and Reporting

  • Data Warehouses / Data Marts: OLAP, MDA

  • Data Sources: Paper, Files, Information Providers, Database Systems, OLTP

Roles shown beside the pyramid, from top to bottom: End User, Business Analyst, Data Analyst, DBA


Week 2- Data Mining Perspective to Knowledge Discovery

[Figure: the knowledge discovery process]

  • Data

  • Selection → Target Data

  • Preprocessing → Preprocessed Data

  • Data Mining → Patterns

  • Interpretation/Evaluation → Knowledge

adapted from:

U. Fayyad, et al. (1995), “From Knowledge Discovery to Data Mining: An Overview,” Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press
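The stages of this process (selection, preprocessing, data mining, interpretation/evaluation) can be sketched as a chain of small functions. The basket data, the pair-counting "mining" step, and the support threshold are assumptions chosen to keep the example short; they are not taken from the cited work.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: one shopping basket per record.
raw = [
    {"store": "A", "items": ["Bread", "milk ", "eggs"]},
    {"store": "A", "items": ["bread", "Milk"]},
    {"store": "B", "items": ["beer", "chips"]},
    {"store": "A", "items": ["milk", "bread", "jam"]},
]

def select(data, store):
    """Selection: keep only the records relevant to the goal (target data)."""
    return [r["items"] for r in data if r["store"] == store]

def preprocess(baskets):
    """Preprocessing: normalise case and whitespace, drop duplicate items."""
    return [sorted({item.strip().lower() for item in b}) for b in baskets]

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for b in baskets:
        counts.update(combinations(b, 2))
    return counts

def interpret(patterns, n_baskets, min_support):
    """Interpretation/evaluation: keep pairs that meet a support threshold."""
    return {pair: c / n_baskets for pair, c in patterns.items()
            if c / n_baskets >= min_support}

target = select(raw, "A")
clean = preprocess(target)
knowledge = interpret(mine(clean), len(clean), min_support=0.9)
```

Each stage consumes the output of the previous one, so a poor selection or preprocessing choice limits what the mining step can find.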


Week 2- Data Mining Process Flow

[Figure labels, in order of appearance:]

  • Visualization and Human Computer Interaction

  • Plan for Learning

  • Generate and Test Hypotheses

  • Discover Knowledge

  • Determine Knowledge Relevancy

  • Evolve Knowledge/Data

  • Goals for Learning

  • Knowledge Base

  • Database(s)

  • Background Knowledge

  • Discovery Algorithms

“In order to discover anything, you must be looking for something”

Laws of Serendipity


Week 2- Simplified View of Data Mining Process Flow

[Figure labels, in order of appearance:]

  • Graphical user interface

  • Pattern evaluation

  • Data mining engine

  • Knowledge-base

  • Database or data warehouse server

  • Filtering; data cleaning & data integration

  • Data Warehouse, Databases


Week 2- Extended Perspective on Data Mining Process Flow

[Figure: layered architecture, from the top layer down]

  • Layer 4, User Interface: mining query, mining result

  • User GUI API

  • Layer 3, OLAP/OLAM: OLAM Engine, OLAP Engine

  • Data Cube API

  • Layer 2, MDDB: MDDB, Meta Data

  • Database API, Filtering & Integration, Filtering

  • Layer 1, Data Repository: Databases, Data Warehouse, data cleaning, data integration


Week 2- Essentials of Learning

  • Learning ?

  • can we formalize it?

  • is it just a chemical activation?

  • is it memorization?

  • is it the continuous connecting/disconnecting of nodes in a dynamically changing brain network topology?



  • The Artificial Intelligence View:

    • Learning is central to human knowledge and intelligence, and essential for building intelligent machines.

    • Years of effort in AI have shown that trying to build intelligent computers by programming all the rules cannot be done; automatic learning is crucial. For example, we humans are not born with the ability to understand language; we learn it. It makes sense to try to have computers learn language instead of trying to program it all in.



  • The Software Engineering View:

    • Machine Learning allows us to program computers by example, which can be easier than writing code the traditional way.

  • The Stats View:

    • Machine Learning is the marriage of computer science and statistics: computational techniques are applied to statistical problems.

    • Machine Learning has been applied to a vast number of problems in many contexts, beyond the typical statistics problems. It is often designed with different considerations than statistics (e.g., speed is often more important than accuracy).
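To illustrate the Software Engineering View ("program computers by example"), here is a minimal 1-nearest-neighbour sketch. The features, labels, and examples are invented; nearest-neighbour is just one simple learning method chosen for brevity, not a method named in the lecture.

```python
# Hypothetical labelled examples: (exclamation marks, links) -> label.
examples = [
    ((0, 0), "ham"),
    ((1, 0), "ham"),
    ((5, 3), "spam"),
    ((4, 4), "spam"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(features, training=examples):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, label = min(training, key=lambda ex: squared_distance(ex[0], features))
    return label

quiet_message = classify((0, 1))
noisy_message = classify((5, 4))
```

No rule such as "more than three links means spam" is written by hand; changing the behaviour means changing the examples, not the code.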


Week 2- End

  • Please check the web site for Learning Theory and its Essentials:

    http://www.infed.org/biblio/b-learn.htm

  • read

    • Course textbook, Chapter 3

