KDD Group Research Seminar Fall, 2001 - Presentation 8 – 11

Incremental Learning

Friday, November 16

James Plummer

Reference

Mitchell, Tom M. “Machine Learning”. McGraw-Hill. 1997.

Outline

- Machine Learning
  - Extracting information from data
  - Forming concepts
- The Data
  - Arrangement of Data
    - Attributes, Labels, and Instances
  - Categorization of Data
  - Results
- MLJ (Machine Learning in Java)
  - Collection of Machine Learning algorithms
  - Current Inducers
- Incremental Learning
  - Description of technique
  - Nearest Neighbor Algorithm
  - Distance-Weighted Algorithm
- Advantages and Disadvantages
  - Gains and Losses

Machine Learning

- Sometimes called Data Mining
- The process of extracting useful information from data
  - Marketing databases, medical databases, weather databases
    - Finding consumer purchase patterns
- Used to form concepts
  - Predictions
  - Classifications
  - Numeric answers

The Data

- Arrangement of Data
  - A piece of data is a set of attributes a_i which make up an instance x_j
    - Attributes can be considered evidence
  - Each instance has a label or category f(x_j) (outcome value)
    x_j = a_1, a_2, a_3, ..., a_i; f(x_j)
  - A set of data is a set of instances
- Categorization
  - A set of instances is used as the control (training data) for new query instances x_q
  - Calculate f^(x_q) based on the training data
  - f^(x_q) is the predicted value of the actual f(x_q)
- Results
  - The number of correctly predicted values over the total number of query instances
  - Accuracy = (number of correct f^(x_q)) / (total number of x_q)
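
The arrangement and the results measure above can be sketched in a few lines of Python. This is only an illustration, not MLJ code: the `Instance` class, the `accuracy` function, the toy data, and the trivial `always_yes` predictor are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Instance:
    """One instance x_j: its attribute values plus its label f(x_j)."""
    attributes: Tuple[float, ...]
    label: str


def accuracy(predict: Callable[[Tuple[float, ...]], str],
             queries: Sequence[Instance]) -> float:
    """Fraction of query instances whose predicted label equals the actual label."""
    if not queries:
        raise ValueError("need at least one query instance")
    correct = sum(1 for x_q in queries if predict(x_q.attributes) == x_q.label)
    return correct / len(queries)


# Hypothetical toy data: two attributes per instance, labels "yes" / "no".
queries = [
    Instance((1.0, 2.0), "yes"),
    Instance((3.0, 1.0), "no"),
    Instance((0.5, 0.5), "yes"),
    Instance((4.0, 4.0), "no"),
]


def always_yes(attributes: Tuple[float, ...]) -> str:
    """Deliberately trivial stand-in for a learned f^: always predicts "yes"."""
    return "yes"


score = accuracy(always_yes, queries)  # 2 of the 4 labels are "yes"
```

Any real inducer would replace `always_yes`; the accuracy computation stays the same.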

MLJ (Machine Learning in Java)

- MLJ is a collection of learning algorithms
  - Inducers
    - Categorize data to learn concepts
- Currently in Development
  - ID3
    - Uses decision trees
  - Naïve Bayes
    - Uses probability calculations based on Bayes' theorem
  - C4.5
    - Uses decision trees with pruning techniques
- Incremental Learning
  - Uses comparison techniques
  - Soon to be added

Incremental Learning

- Instance Based Learning
- k-Nearest Neighbor
  - All instances correspond to points in an n-dimensional space
  - The distance between two instances is determined by:

    $$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_i) - a_r(x_j)\right)^2}$$

    where a_r(x) is the r-th attribute of instance x

  - Given a query instance x_q to be categorized, the k nearest neighbors are calculated
  - f^(x_q) is assigned the most frequent value of the nearest k f(x_j)
  - For k = 1, f^(x_q) will be assigned f(x_i) if x_i is the closest instance in the space
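
A minimal Python sketch of the k-Nearest Neighbor steps above, assuming the Euclidean distance over attribute values used in Mitchell's textbook. The function names and the toy training set are hypothetical and are not taken from MLJ or from the slides.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (attribute values, label f(x))


def distance(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """Euclidean distance between two instances' attribute values (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def knn_classify(training: List[Example], x_q: Sequence[float], k: int) -> str:
    """Return the most frequent label among the k training instances nearest to x_q."""
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training instances")
    nearest = sorted(training, key=lambda ex: distance(ex[0], x_q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-dimensional training data with labels "+" and "-".
training = [
    ((2.0, 2.5), "+"),
    ((3.0, 2.0), "-"),
    ((2.0, 3.2), "-"),
    ((0.5, 2.0), "-"),
    ((2.0, 0.0), "+"),
    ((5.0, 5.0), "+"),
]
x_q = (2.0, 2.0)

label_k1 = knn_classify(training, x_q, k=1)  # only the single closest instance votes
label_k5 = knn_classify(training, x_q, k=5)  # majority vote among the five closest
```

With this toy data the closest instance is labeled "+", while three of the five closest are labeled "-", so the answer depends on the choice of k.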

- Examine three cases for the 2-dimensional example space (figure not included in this transcript)
  - k = 1
  - k = 5
  - Weighted, k = 5

- Distance-Weighted Algorithm
  - Same as k-Nearest Neighbor
  - The effect of f(x_j) on f^(x_q) is based on d(x_q, x_j)
  - In the case x_q = x_i, f^(x_q) = f(x_i)
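
The distance-weighted variant can be sketched as follows. The slides do not give the weighting function, so this sketch assumes the inverse-square weight 1 / d(x_q, x_j)^2 from Mitchell's textbook. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (attribute values, label f(x))


def distance(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """Euclidean distance between two instances' attribute values (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def weighted_knn_classify(training: List[Example], x_q: Sequence[float], k: int) -> str:
    """Vote among the k nearest instances, each weighted by 1 / d^2 (assumed weight)."""
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the number of training instances")
    nearest = sorted(
        ((distance(x, x_q), label) for x, label in training),
        key=lambda pair: pair[0],
    )[:k]
    votes: Dict[str, float] = defaultdict(float)
    for d, label in nearest:
        if d == 0.0:
            # x_q coincides with a training instance x_i: return f(x_i) directly.
            return label
        votes[label] += 1.0 / d ** 2
    return max(votes, key=votes.get)


# Hypothetical 2-dimensional training data with labels "+" and "-".
training = [
    ((2.0, 2.5), "+"),
    ((3.0, 2.0), "-"),
    ((2.0, 3.2), "-"),
    ((0.5, 2.0), "-"),
    ((2.0, 0.0), "+"),
    ((5.0, 5.0), "+"),
]

label_weighted = weighted_knn_classify(training, (2.0, 2.0), k=5)
label_exact = weighted_knn_classify(training, (3.0, 2.0), k=5)  # equals a training point
```

In this toy data three of the five nearest instances are labeled "-", but the single very close "+" instance outweighs them, so the weighted vote differs from the plain majority vote.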

Advantages and Disadvantages

- Gains of using k-Nearest Neighbor
  - Individual attributes can be weighted differently
  - Change d(x_i, x_q) to allow the nearest x_i to have a stronger or weaker effect on f^(x_q)
  - Robust to noise in training data
  - Very effective when provided a large set of training data
  - Flexible: f^(x_q) can be calculated in many useful ways
  - Very small training time
- Losses
  - Not good when training data is insufficient
  - Not very effective if similar x_i have dissimilar f(x_i)
  - More computation time needed to categorize new instances
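
One way to read "individual attributes can be weighted differently" is to scale each attribute's contribution inside the distance function. The sketch below assumes a per-attribute weight applied to the squared difference. The `weighted_distance` name, the weights, and the sample points are hypothetical.

```python
import math
from typing import Sequence


def weighted_distance(x_i: Sequence[float], x_j: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Euclidean-style distance in which attribute r contributes weights[r] * diff^2."""
    if not (len(x_i) == len(x_j) == len(weights)):
        raise ValueError("instances and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


x_q = (0.0, 0.0)
x_1 = (1.0, 0.0)  # differs from x_q only in the first attribute
x_2 = (0.0, 2.0)  # differs from x_q only in the second attribute

equal = (1.0, 1.0)       # both attributes count equally
downweight = (1.0, 0.1)  # second attribute counts for much less

nearest_equal = min((x_1, x_2), key=lambda x: weighted_distance(x_q, x, equal))
nearest_down = min((x_1, x_2), key=lambda x: weighted_distance(x_q, x, downweight))
```

Changing the weights changes which training instance counts as nearest, which is how attribute weighting alters the resulting f^(x_q).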

References

- Mitchell, Tom M. “Machine Learning”. McGraw-Hill. 1997.
- Witten and Frank. “Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations”. Morgan Kaufmann. 2000.

* equation reduced for simplicity
