## PowerPoint Slideshow about 'Learning & Data Mining' - Albert_Lan


Presentation Transcript

Learning

- A change in the contents and organization of a system's knowledge that enables it to improve its performance on a task (Simon)
- Acquire new knowledge from environment
- Organize its current knowledge
- Inductive Inference
- General conclusion from examples
- Infer association between input and output
- with some confidence
- Incremental vs Batch

General Model of Learning Agent

[Figure: learning agent. Components: performance standard, critic, sensors, effectors, environment, learning module, performance module, problem generator. Labeled links: feedback, changes, knowledge, learning goals.]

From Artificial Intelligence: A Modern Approach by Russell and Norvig

Classification of Inductive Learning

- Supervised Learning
- given training examples
- correct input-output pairs
- recover unknown function from data generated from the function
- generalization ability for unseen data
- classification : function is discrete
- concept learning : output is binary
- Unsupervised Learning
- No correct input-output pairs
- needs other source for determining correctness
- reinforcement learning : yes/no answer only
- example : chess playing
- Clustering : group into clusters of common characteristics
- Map Learning : explore unknown territory
- Discovery Learning : uncover new relationships

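Data Mining

- Definition of data mining: the task of extracting information from large volumes of real data, where the information is
- previously not well known
- implicit
- potentially useful

Cf. KDD (Knowledge Discovery in Databases)

The entire process of extracting knowledge from data

Data mining ⊂ KDD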

Data Mining Techniques (II)

- Primary data mining tasks
- Classification
- Clustering
- Characterization (summarization)
- Trend analysis
- Association rule discovery (market basket analysis)
- Pattern analysis
- Estimation
- Prediction

Data Mining Techniques (III)

- Application areas
- Marketing & Retail
- Banking
- Finance
- Insurance
- Medicine & Health (Genetics)
- Quality control
- Transportation
- Geospatial applications

Data Mining Tasks(1)

- Classification: assign objects to predefined classes (e.g. large, medium, small)
- Examples

News ⇒ [international] [domestic] [sports] [culture] …

Data Mining Tasks(2)

- Classification (continued)

Credit application ⇒ [high] [medium] [low]

Water sample ⇒ [grade 1 water] [grade 2 water] … [dirty water]

- Algorithms
- Decision trees, memory-based reasoning

Data Mining Tasks(3)

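- Estimation: maps the attributes of a record (attr1, attr2, attr3, …) to a continuous value

cf. classification maps to discrete categories

- Examples
- Age, sex, blood pressure, … ⇒ remaining life expectancy
- Age, sex, occupation, … ⇒ annual income
- Region, water volume, population ⇒ pollutant concentration
- Algorithm: neural network
- Estimating a future value is called prediction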

Data Mining Tasks(4)

- Association (market basket analysis)

- Determine which things go together

- Example
- Shopping list ⇒ cross-selling (supermarket shelf placement, catalogs, CF, …; home shopping, e-shopping, …)
- Association rules

Data Mining Tasks(5)

- Clustering: divide a heterogeneous population into homogeneous subgroups (clusters), e.g. G1, G2, G3, G4

cf. classification uses predefined categories; clustering finds new categories and explains them

Data Mining Tasks(6)

- Clustering (continued)
- Examples
- Symptom ⇒ disease
- Customer information ⇒ selective sales
- Soil (water quality) data

Note: clustering depends on the features used.

Example (playing cards): number, color, suit, …

Data Mining Tasks(7)

- Clustering (continued)
- Clustering is useful for finding exceptions
- Calling card fraud detection
- Credit card fraud, etc.
- Algorithm

K-means -> K clusters

Note: Directed vs. non-directed KDD

Data Mining Techniques (IV)

- Data mining methods
- Association rules
- k-nearest neighbor
- Decision trees
- Neural networks
- Genetic algorithms
- Statistical techniques

Market Basket Analysis (Associations) (1/10)

O: Orange Juice M: Milk

S: Soda W: Window Cleaner

D: Detergent

Market Basket Analysis (Associations) (2/10)

- Co-occurrence table

Market Basket Analysis (Associations) (3/10)

{S, O}: co-occurrence of 2

R1: if S then O

R2: if O then S

- Support: what percentage of all transactions contain the itemset?
- Confidence: of the transactions that contain the LHS, what percentage satisfy the rule?

e.g. Support of R1 = 2/5 = 40%

Confidence of R1 = 2/3

Confidence of R2 = 2/4

These measures determine how good the rule is.
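The support and confidence figures for R1 and R2 can be reproduced with a small sketch. The transaction table itself did not survive in this transcript, so the five baskets below are a hypothetical dataset chosen only because it yields the same counts (S in 3 baskets, O in 4, both in 2); the function names are likewise illustrative.

```python
# Hypothetical baskets (assumption): chosen to match the counts quoted above.
# O: orange juice, M: milk, S: soda, W: window cleaner, D: detergent
baskets = [
    {"O", "S"},
    {"M", "O", "W"},
    {"O", "D"},
    {"O", "D", "S"},
    {"W", "S"},
]

def support(itemset, baskets):
    """Fraction of all baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def confidence(lhs, rhs, baskets):
    """Of the baskets containing lhs, the fraction that also contain rhs."""
    return support(set(lhs) | set(rhs), baskets) / support(lhs, baskets)

print("support(R1)    =", support({"S", "O"}, baskets))       # 2 of 5 baskets
print("confidence(R1) =", confidence({"S"}, {"O"}, baskets))  # 2 of 3 soda baskets
print("confidence(R2) =", confidence({"O"}, {"S"}, baskets))  # 2 of 4 juice baskets
```

Note that support is symmetric in S and O, while confidence depends on which item is on the left-hand side.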

Market Basket Analysis (Associations) (4/10)

- Probability Table {A, B, C}

Market Basket Analysis (Associations) (5/10)

R1: If A ∧ B then C

R2: If A ∧ C then B

R3: If B ∧ C then A

- Confidence

Support =5

Market Basket Analysis (Associations) (6/10)

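- R3 has the best confidence (0.33), but is it good?

Note: R3 (if B ∧ C then A) has confidence 0.33, while P(A) alone is 0.45, so R3 predicts A less often than simply guessing A for every record.

Example: long-haired person ⇒ woman

- Improvement: how good is the rule compared to random guessing?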

Market Basket Analysis (Associations) (7/10)

$$\text{improvement} = \frac{P(\text{condition and result})}{P(\text{condition})\, P(\text{result})}$$

Criterion: improvement > 1
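A short sketch of the improvement measure, applied to rule R3 from the previous slide. Because P(condition and result) / P(condition) is the rule's confidence, improvement can also be computed as confidence / P(result); the values 0.33 and 0.45 come from the slides, and the function names are assumptions made for illustration.

```python
def improvement(p_condition_and_result, p_condition, p_result):
    """Ratio of the observed joint probability to the one expected under independence."""
    return p_condition_and_result / (p_condition * p_result)

def improvement_from_confidence(confidence, p_result):
    """Equivalent form: confidence = P(condition and result) / P(condition)."""
    return confidence / p_result

# R3: if B and C then A, with confidence 0.33 and P(A) = 0.45
r3 = improvement_from_confidence(0.33, 0.45)
print(f"improvement(R3) = {r3:.2f}")
print("better than random guessing" if r3 > 1 else "worse than random guessing")
```

An improvement below 1 means the rule does worse than predicting the result without looking at the condition at all.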

Market Basket Analysis (Associations) (8/10)

- Some issues
- Overall algorithm

Build co-occurrence matrices for 1 item, 2 items, 3 items, etc.

-> complex!

- Pruning

e.g. minimum support pruning

- Virtual items

Season, store, or geographic information combined with real items

e.g. If OJ ∧ Milk ∧ Friday then Beer
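Minimum support pruning can be sketched as a level-wise search: count 1-item sets, drop those below the threshold, and build larger candidates only from the survivors. This is an Apriori-style simplification swapped in for illustration, not necessarily the algorithm the slides had in mind, and the baskets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_count):
    """Return {itemset: count} for every itemset found in at least min_count baskets."""
    items = sorted({item for basket in baskets for item in basket})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many baskets contain each candidate itemset.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        # Minimum support pruning: keep only the candidates that are common enough.
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        size += 1
        # Next level: unions of two survivors that have exactly `size` items.
        # (Full Apriori would also require every subset to be frequent.)
        candidates = list({a | b for a, b in combinations(survivors, 2)
                           if len(a | b) == size})
    return frequent

# Hypothetical baskets (assumption), O: orange juice, M: milk, S: soda,
# W: window cleaner, D: detergent
baskets = [{"O", "S"}, {"M", "O", "W"}, {"O", "D"}, {"O", "D", "S"}, {"W", "S"}]
result = frequent_itemsets(baskets, min_count=2)
for itemset, count in result.items():
    print(sorted(itemset), count)
```

The rare item M never reaches the 2-item level, which illustrates both the saving from pruning and the risk that rare items disappear from the analysis.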

Market Basket Analysis (Associations) (9/10)

- Level of description

How specific? e.g. Drink → Soda → Coke

- Advantages

- Explainability

- Undirected data mining

- Variable-length data

- Simple computation

Market Basket Analysis (Associations) (10/10)

- Disadvantages

- Complexity grows with the data

- Limited data types (attributes)

- Difficult to determine the right number of items

- Rare items get pruned

Clustering Algorithm (1/2)

- k-means method (MacQueen '67)

- Many variations

- Algorithm steps

1. Choose k initial points (seeds)

2. Assign each record to its closest seed or centroid (the initial clusters on the first pass)

3. Find the centroid of each cluster

4. Go to step 2

Stop when the assignments no longer change

Clustering Algorithm (2/2)

Note: Finding neighbors

- Finding the centroid: for points (x1, y1), (x2, y2), …, (xn, yn),

$$\text{centroid} = \left( \frac{x_1 + \dots + x_n}{n},\ \frac{y_1 + \dots + y_n}{n} \right)$$
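The k-means steps (choose seeds, assign each point to the nearest center, recompute centroids, repeat until nothing changes) can be sketched in plain Python for 2-D points. The sample points, the seeds, and the `max_iter` safeguard are assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean of the x coordinates and mean of the y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def k_means(points, seeds, max_iter=100):
    """Return (centers, assignment); assignment[i] is the cluster index of points[i]."""
    centers = list(seeds)                      # step 1: initial k points
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:       # stop when nothing changes
            break
        assignment = new_assignment
        # Step 3: recompute the centroid of each cluster.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                        # an empty cluster keeps its old center
                centers[j] = centroid(members)
    return centers, assignment

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, assignment = k_means(points, seeds=[(1, 1), (8, 8)])
print(centers)
print(assignment)
```

Different seeds can lead to different final clusters, which is the sensitivity to initial parameters that the slides list as a weakness of clustering.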

Variation of k-means

1. Use probability density rather than simple distance

eg. Gaussian mixture Models

2. Weighted Distance

3. Agglomeration Method

- hierarchical cluster

Agglomerative Algorithm

1. Start with every single record as its own cluster (N clusters)

2. Select the two closest clusters and combine them (N-1 clusters)

3. Go to step 2

4. Stop at the right level (number of clusters)

What is "closest"?

Distance between clusters

- 3 measures

1. Single linkage: distance between the closest members

2. Complete linkage: distance between the most distant members

3. Centroids: distance between the cluster centroids
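The agglomerative procedure and the three cluster-distance measures can be sketched as follows. This is a deliberately naive version that rescans all cluster pairs at every merge; the stopping rule (a target number of clusters), the function names, and the sample points are assumptions made for illustration.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Distance between the closest members of two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    """Distance between the most distant members of two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def centroid_linkage(a, b):
    """Distance between the centroids of two clusters."""
    def center(cluster):
        return tuple(sum(values) / len(cluster) for values in zip(*cluster))
    return math.dist(center(a), center(b))

def agglomerate(points, target_clusters, linkage=single_linkage):
    """Merge the two closest clusters until target_clusters (>= 1) remain."""
    clusters = [[p] for p in points]           # every record starts as its own cluster
    while len(clusters) > target_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # combine the closest pair (i < j)
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for linkage in (single_linkage, complete_linkage, centroid_linkage):
    print(linkage.__name__, agglomerate(points, 3, linkage))
```

On well-separated data like this the three measures agree; they tend to differ on elongated or overlapping clusters.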

Clustering

- Strengths

1. Undirected knowledge discovery

2. Suitable for categorical, numeric, and textual data

3. Easy to apply

- Weaknesses

1. Can be difficult to choose the right distance measure and weights

2. Sensitive to initial parameters

3. Can be hard to interpret

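Decision Tree (contact lens)

[Figure: decision tree for the contact lens data. Branch values normal / reduced; astigmatism (yes / no); spectacle prescription (hypermetrope / myope); leaves: none, soft, hard.]

[Figure: in classification, a learned function maps an input to one of n classes; in concept learning, a learned concept maps an input to yes or no. Both can be represented as a decision tree.]

Concept learning, e.g. "red", "good customer"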

Decision Tree for weather (1/4)

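[Figure: decision tree for the weather data. Internal nodes: outlook (sunny / overcast / rainy), humidity (high / normal), windy (true / false); leaves: yes / no.]

If outlook = sunny and humidity = high then play = no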

Decision Tree for weather (2/4)

note: temp, humid can be numeric data

temp>30 (hot)

10<= temp <= 30 (normal)

temp<10 (cool)
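Turning a numeric temperature into the three categories is a matter of applying the thresholds in order. A minimal sketch, with a hypothetical function name:

```python
def temp_category(temp):
    """Map a numeric temperature to the categories used on this slide."""
    if temp > 30:
        return "hot"
    if temp >= 10:          # 10 <= temp <= 30
        return "normal"
    return "cool"           # temp < 10

for t in (35, 30, 10, 5):
    print(t, temp_category(t))
```

Both boundary values, 10 and 30, fall in the "normal" range because the slide's interval is closed on both ends.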

Decision Tree for weather (3/4)

- Attribute types
- nominal (categorical, discrete)
- ordinal (ordered values)
- interval, e.g. [10, 20]
- ratio: real numbers

Decision Tree for weather (4/4)

note: Leaf node doesn’t have to be yes/no

--> classification

[Figure: contact lens decision tree with nodes tear (normal / reduced) and astigmatism; leaves: none, hard, soft.]

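Prediction Using Decision Trees

[Figure: the data is divided into a training set, a test set, and an evaluation set. The training set is used to build candidate trees (A, B, C, ...), the test set to choose the best tree, and the evaluation set to predict the expected performance on real data.]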
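The division into a training set, a test set, and an evaluation set can be sketched as a shuffled three-way split. The 60/20/20 proportions, the fixed seed, and the function name are assumptions; the slides do not state how the data is divided.

```python
import random

def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle the records and cut them into training, test, and evaluation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)      # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]                 # used to build candidate trees
    test = shuffled[n_train:n_train + n_test]  # used to choose the best tree
    evaluation = shuffled[n_train + n_test:]   # used to estimate performance on unseen data
    return train, test, evaluation

train, test, evaluation = three_way_split(range(10))
print(len(train), len(test), len(evaluation))
```

Keeping the evaluation set out of both tree building and tree selection is what makes its error rate a fair estimate for unseen data.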

The effect of pruning

[Figure: error rate versus depth of tree, with curves for training data and unseen data.]

- Some issues
- Where to prune?

Too high -> unnecessarily complex

Too low -> lose information

- What to split (first)?

Error Rate

[Figure: a leaf labeled y in which 2 of 7 records are n.]

er = 2/7

- Adjusted error rate of a tree

$$AE(T) = E(T) + \alpha \cdot \text{leaf-count}(T)$$

- Find a subtree α1 of T such that AE(α1) <= AE(T), then prune all the branches that are not part of α1
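A small sketch of how the adjusted error rate trades accuracy against tree size. The candidate subtrees, their error rates, and the α values are hypothetical, and picking the candidate with the smallest adjusted error is a simplification of the pruning rule stated above.

```python
def adjusted_error(error_rate, leaf_count, alpha):
    """AE(T) = E(T) + alpha * leaf-count(T)."""
    return error_rate + alpha * leaf_count

def best_subtree(candidates, alpha):
    """Pick the candidate with the lowest adjusted error rate."""
    return min(candidates,
               key=lambda c: adjusted_error(c["error"], c["leaves"], alpha))

# Hypothetical candidates (assumption): the full tree and two pruned versions.
candidates = [
    {"name": "full tree", "error": 0.10, "leaves": 8},
    {"name": "pruned",    "error": 0.12, "leaves": 4},
    {"name": "stump",     "error": 0.30, "leaves": 1},
]

for alpha in (0.0, 0.01, 0.1):
    print(alpha, best_subtree(candidates, alpha)["name"])
```

With α = 0 the full tree always wins; as α grows, each extra leaf must buy a larger reduction in error to be kept.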

Possible sub trees for weather data (1/2)

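Which attribute should be split first?

[Figure: candidate first splits of the weather data, panels (a) and (b): outlook (sunny / overcast / rainy) and temp (hot / mild / cool), each branch listing the yes/no labels of its records.]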

Possible sub trees for weather data (2/2)

[Figure: candidate first splits of the weather data, panels (c) and (d): windy (true / false) and humidity (high / normal), each branch listing the yes/no labels of its records.]

Information Theory & Entropy

info([2,3]) = 0.971 bit

info([4,0]) = 0.0 bit

info([3,2]) = 0.971 bit

-> info ([2,3], [4,0], [3,2])

= (5/14) * 0.971 + (4/14) * 0 + (5/14) * 0.971

= 0.693 bit

gain(outlook) = info([9,5]) - info([2,3], [4,0],[3,2])

= 0.247 bits

gain(temp) = 0.029 bit

gain(humid) = 0.152 bit

gain(windy) = 0.048 bit
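The gain figures can be checked with a few lines of Python. Each branch is written as [#yes, #no]; the outlook counts come from the slide, while the counts for temp, humidity, and windy are assumed from the standard 14-record weather dataset and are included because they reproduce the listed gains.

```python
import math

def info(counts):
    """Entropy in bits of a class distribution given as counts, e.g. [2, 3]."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_info(branches):
    """Average entropy of the branches, weighted by branch size."""
    total = sum(sum(b) for b in branches)
    return sum(sum(b) / total * info(b) for b in branches)

def gain(branches):
    """Information gain: entropy before the split minus weighted entropy after it."""
    parent = [sum(column) for column in zip(*branches)]   # e.g. [9, 5]
    return info(parent) - split_info(branches)

splits = {
    "outlook":  [[2, 3], [4, 0], [3, 2]],   # from the slide
    "temp":     [[2, 2], [4, 2], [3, 1]],   # assumed: hot, mild, cool
    "humidity": [[3, 4], [6, 1]],           # assumed: high, normal
    "windy":    [[6, 2], [3, 3]],           # assumed: false, true
}
for name, branches in splits.items():
    print(f"gain({name}) = {gain(branches):.3f} bits")
```

Outlook has the largest gain, so it would be chosen as the first split.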

Calculating info(x) - entropy

- if either #yes or #no is 0

then info(x) = 0

- if #yes = #no then

info(x) is max.value

- can cover multi class situation

e.g. info([2,3,4]) = info([2,7]) + (7/9) × info([3,4])

$$\text{entropy}(p_1, p_2, \dots, p_n) = -p_1 \log p_1 - p_2 \log p_2 - \dots - p_n \log p_n$$

info([2,3,4]) = entropy(2/9, 3/9, 4/9)

= -(2/9) log(2/9) - (3/9) log(3/9) - (4/9) log(4/9)

= [-2 log 2 - 3 log 3 - 4 log 4 + 9 log 9] / 9
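The three ways of writing info([2,3,4]) can be checked numerically: the direct entropy, the two-stage decomposition into info([2,7]) plus 7/9 of info([3,4]), and the closed form with the common denominator 9. Base-2 logarithms are assumed, since the slides report values in bits.

```python
import math

def entropy(*probs):
    """-p1 log p1 - ... - pn log pn, in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info(counts):
    """Entropy of a class distribution given as counts."""
    total = sum(counts)
    return entropy(*(c / total for c in counts))

direct = info([2, 3, 4])
staged = info([2, 7]) + (7 / 9) * info([3, 4])
closed_form = (-2 * math.log2(2) - 3 * math.log2(3)
               - 4 * math.log2(4) + 9 * math.log2(9)) / 9

print(f"direct      = {direct:.4f}")
print(f"staged      = {staged:.4f}")
print(f"closed form = {closed_form:.4f}")
```

All three agree at about 1.53 bits, which is why a multi-class decision can be scored as a sequence of two-way decisions.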

Algorithms: CART, C4.5

- CART: binary trees only

Breiman '84

- C4.5

Quinlan; successor to ID3 (Quinlan '86)

- Clementine
- NCR
- CHAID: Hartigan '75
