
KDD Group Research Seminar, Fall 2001 - Presentation 8–11

Incremental Learning

Friday, November 16

James Plummer

[email protected]

Reference

Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997.



Outline

  • Machine Learning

    • Extracting information from data

    • Forming concepts

  • The Data

    • Arrangement of Data

      • Attributes, Labels, and Instances

    • Categorization of Data

    • Results

  • MLJ (Machine Learning in Java)

    • Collection of Machine Learning algorithms

    • Current Inducers

  • Incremental Learning

    • Description of technique

    • Nearest Neighbor Algorithm

    • Distance-Weighted Algorithm

  • Advantages and Disadvantages

    • Gains and Losses



Machine Learning

  • Sometimes called Data Mining

  • The process of extracting useful information from data

    • Marketing databases, medical databases, weather databases

      • Finding Consumer purchase patterns

  • Used to form concepts

    • Predictions

    • Classifications

    • Numeric Answers



The Data

  • Arrangement of Data

    • A piece of data is a set of attributes a_i that make up an instance x_j

      • Attributes can be considered evidence

      • Each instance has a label or category f(x_j) (the outcome value)

      x_j = a_1, a_2, a_3, ..., a_n; f(x_j)

    • A set of data is a set of instances

  • Categorization

    • A set of instances is used as a training set against which new query instances x_q are compared

    • Calculate f^(x_q) for each query instance based on the training data

      • f^(x_q) is the predicted value of the actual f(x_q)

  • Results

    • The number of correctly predicted values over the total number of query instances

    • accuracy = f^(x_q)_correct / f(x_q)_total
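
To make this arrangement concrete, here is a minimal Java sketch of an instance as an attribute vector plus a label, together with the accuracy measure above. The Instance and Evaluation types are hypothetical, invented for illustration; they are not MLJ's actual classes.

```java
import java.util.List;

// Hypothetical representation of one instance x_j: attribute values
// a_1..a_n plus its label f(x_j). Not an actual MLJ class.
class Instance {
    final double[] attributes; // a_1, a_2, ..., a_n
    final String label;        // f(x_j), e.g. "Yes" or "No"

    Instance(double[] attributes, String label) {
        this.attributes = attributes;
        this.label = label;
    }
}

class Evaluation {
    // accuracy = correctly predicted labels / total query instances
    static double accuracy(List<Instance> queries, List<String> predicted) {
        int correct = 0;
        for (int i = 0; i < queries.size(); i++) {
            if (queries.get(i).label.equals(predicted.get(i))) correct++;
        }
        return (double) correct / queries.size();
    }
}
```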


Data Example

[Slide table of data examples 1 through 8 omitted; the answer labels visible on the slide read Yes, No, Yes.]

  • Predict the labels of examples 6, 7, and 8 given data examples 1 through 5



MLJ (Machine Learning in Java)

  • MLJ is a collection of learning algorithms

    • Inducers

      • Categorize data to learn concepts

  • Currently in Development

    • ID3

      • Uses trees

    • Naïve Bayes

      • Uses probabilistic calculations based on Bayes' theorem

    • C4.5

      • Uses trees with pruning techniques

  • Incremental Learning

    • Uses comparison techniques

    • Soon to be added



Incremental Learning

  • Instance Based Learning

    • k-Nearest Neighbor

    • All instances correspond to points in an n-dimensional space

      • The distance between two instances is the Euclidean distance:

        d(x_i, x_j) = sqrt( sum_{r=1}^{n} ( a_r(x_i) - a_r(x_j) )^2 )

        where a_r(x) is the r-th attribute of instance x

  • Given a query instance x_q to be categorized, the k nearest neighbors are calculated

  • f^(x_q) is assigned the most frequent value among the k nearest f(x_j)

  • For k = 1, f^(x_q) is assigned f(x_i), where x_i is the single closest instance in the space
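
As a sketch of the rule just described, using the Euclidean distance above and a majority vote over the k nearest neighbors. This reuses the hypothetical Instance class from the earlier example and is illustrative only; it is not MLJ's inducer.

```java
import java.util.*;

// Illustrative k-nearest-neighbor categorization.
class KNearestNeighbor {
    // d(x_i, x_j) = sqrt( sum_r (a_r(x_i) - a_r(x_j))^2 )
    static double distance(double[] xi, double[] xj) {
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double diff = xi[r] - xj[r];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // f^(x_q): the most frequent label among the k nearest training
    // instances. Assumes a non-empty training set.
    static String classify(List<Instance> training, double[] xq, int k) {
        // Order training instances by distance to the query point x_q.
        List<Instance> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(
                (Instance x) -> distance(x.attributes, xq)));

        // Majority vote over the k nearest neighbors.
        Map<String, Integer> votes = new HashMap<>();
        for (Instance neighbor : sorted.subList(0, Math.min(k, sorted.size()))) {
            votes.merge(neighbor.label, 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
}
```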


Distance-Weighted Nearest Neighbor

  • Same as k-Nearest Neighbor, except that the effect of each f(x_j) on f^(x_q) depends on the distance d(x_q, x_j)

    • A common choice weights each neighbor's vote by 1 / d(x_q, x_j)^2, so closer neighbors count more

    • In the case x_q = x_i, then f^(x_q) = f(x_i)

  • Examine three cases for the 2-dimensional space [figure omitted]:

    • k = 1

    • k = 5

    • Weighted, k = 5
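
A sketch of the distance-weighted variant, continuing the hypothetical Instance and KNearestNeighbor classes above. The inverse-square weighting is one common scheme, not necessarily the one the original slide had in mind.

```java
import java.util.*;

// Illustrative distance-weighted k-NN: each neighbor's vote is
// weighted by w = 1 / d(x_q, x_j)^2.
class DistanceWeightedKNN {
    static String classify(List<Instance> training, double[] xq, int k) {
        List<Instance> sorted = new ArrayList<>(training);
        sorted.sort(Comparator.comparingDouble(
                (Instance x) -> KNearestNeighbor.distance(x.attributes, xq)));

        Map<String, Double> votes = new HashMap<>();
        for (Instance neighbor : sorted.subList(0, Math.min(k, sorted.size()))) {
            double d = KNearestNeighbor.distance(neighbor.attributes, xq);
            if (d == 0.0) {
                // Exact match: x_q = x_i, so return f(x_i) directly.
                return neighbor.label;
            }
            votes.merge(neighbor.label, 1.0 / (d * d), Double::sum);
        }
        return Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
}
```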



Advantages and Disadvantages

  • Gains of using k-Nearest Neighbor

    • Individual attributes can be weighted differently

    • Change d(x_i, x_q) to allow the nearest x_i to have a stronger or weaker effect on f^(x_q)

    • Robust to noise in the training data (for k > 1)

    • Very Effective when provided a large set of training data

    • Flexible, f^(xq) can be calculated in many useful ways

    • Very small training time

  • Losses

    • Not good when training data is insufficient

    • Not very effective if similar x_i have dissimilar f(x_i)

    • More computation time is needed to categorize new instances



References

  • Mitchell, Tom M. Machine Learning. McGraw-Hill, 1997.

  • Witten, Ian H., and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.

    * equation reduced for simplicity

