
An Introduction to Machine Learning with Perl

February 3, 2003

O’Reilly Bioinformatics Conference

Ken Williams

ken@mathforum.org

Tutorial Overview
  • What is Machine Learning? (20’)
  • Why use Perl for ML? (15’)
  • Some theory (20’)
  • Some tools (30’)
  • Decision trees (20’)
  • SVMs (15’)
  • Categorization (40’)
References & Sources
  • Machine Learning, Tom Mitchell. McGraw-Hill, 414pp, 1997
  • Foundations of Statistical Natural Language Processing, Christopher D. Manning & Hinrich Schütze. MIT Press, 680 pp., 1999
  • Perl-AI list (perl-ai@perl.org)
What Is Machine Learning?
  • A subfield of Artificial Intelligence (but without the baggage)
  • Usually concerns some particular task, not the building of a sentient robot
  • Concerns the design of systems that improve (or at least change) as they acquire knowledge or experience
Typical ML Tasks
  • Clustering
  • Categorization
  • Recognition
  • Filtering
  • Game playing
  • Autonomous performance
Typical ML Tasks
  • Categorization
Typical ML Tasks
  • Recognition

Vincent Van Gogh, Michael Stipe, Muhammad Ali, Ken Williams, Burl Ives, Winston Churchill, Grover Cleveland

Typical ML Tasks
  • Recognition

“Little red corvette”, “The kids are all right”, “The rain in Spain”, “Bort bort bort”

Typical ML Tasks
  • Game playing
Typical ML Tasks
  • Autonomous performance
Typical ML Buzzwords
  • Data Mining
  • Knowledge Management (KM)
  • Information Retrieval (IR)
  • Expert Systems
  • Topic detection and tracking
Who does ML?
  • Two main groups: research and industry
  • These groups do listen to each other, at least somewhat
  • Not many reusable ML/KM components, outside of a few commercial systems
  • KM is seen as a key component of big business strategy - lots of KM consultants
  • ML is an extremely active research area with relatively low “cost of entry”
When is ML useful?
  • When you have lots of data
  • When you can’t hire enough people, or when people are too slow
  • When you can afford to be wrong sometimes
  • When you need to find patterns
  • When you have nothing to lose
An aside on your presenter
  • Academic background in math & music (not computer science or even statistics)
  • Several years as a Perl consultant
  • Two years as a math teacher
  • Currently studying document categorization at The University of Sydney
  • In other words, a typical ML student
Why use Perl for ML?
  • CPAN - the viral solution™
  • Perl has rapid reusability
  • Perl is widely deployed
  • Perl code can be written quickly
  • Embeds both ways
  • Human-oriented development
  • Leaves your options open
But what about all the data?
  • ML techniques tend to use lots of data in complicated ways
  • Perl is great at data in general, but tends to gobble memory or forgo strict checking
  • Two fine solutions exist:
    • Be as careful in Perl as you are in C (Params::Validate, Tie::SecureHash, etc.) - see the sketch below
    • Use PDL or Inline (more on these later)
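For the first approach, a minimal sketch of argument checking with Params::Validate (the sub and field names are invented for illustration):

use strict;
use Params::Validate qw(:all);

# Validate named arguments much as a C prototype would - a wrong type
# or a missing field dies immediately instead of corrupting data later.
sub add_instance {
    my %args = validate(@_, {
        attributes => { type => HASHREF },
        result     => { type => SCALAR },
        weight     => { type => SCALAR, default => 1 },
    });
    # ... $args{attributes}, $args{result}, $args{weight} are now safe to use ...
}

add_instance(attributes => { outlook => 'sunny' }, result => 'no');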
Interfaces vs. Implementations
  • In ML applications, we need both data integrity and the ability to “play with it”
  • Perl wrappers around C/C++ structures/objects are a nice balance
  • Keeps high-level interfaces in Perl, low-level implementations in C/C++
  • Can be prototyped in pure Perl, with C/C++ parts added later
Some ML Theory and Terminology
  • ML concerns learning a target function from a set of examples
  • The target function is often called a hypothesis
  • Example: with a neural network, a trained network is a hypothesis
  • The set of all possible target functions is called the hypothesis space
  • The training process can be considered a search through the hypothesis space
Some ML Theory and Terminology
  • Each ML technique will
    • probably exclude some hypotheses
    • prefer some hypotheses over others
  • A technique’s exclusion & preference rules are called its inductive bias
  • If it ain’t biased, it ain’t learnin’
    • No bias = rote learning
    • Bias = generalization
  • Example: kids learning multiplication (understanding vs. memorization)
Some ML Theory and Terminology
  • Ideally, a ML technique will
    • not exclude the “right” hypothesis, i.e. the hypothesis space will include the target hypothesis
    • prefer the target hypothesis over others
  • Measuring the degree to which these criteria are satisfied is important and sometimes complicated
Evaluating Hypotheses
  • We often want to know how good a hypothesis is
    • To know how it will perform in the real world
    • To improve the learning technique or tune its parameters
    • To let a learner automatically improve the hypothesis
  • Usually evaluate on test data
    • Test data must be kept separate from training data
    • Test data used for the third purpose (automatic improvement) is usually called validation or held-out data
    • Training, validation, and test data should not contaminate each other (a simple random split is sketched below)
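A minimal sketch of such a split (the 70/15/15 proportions are arbitrary, and @instances is assumed to hold the labeled examples):

use List::Util qw(shuffle);

my @all     = shuffle @instances;      # randomize before splitting
my $n_train = int(0.70 * @all);
my $n_valid = int(0.15 * @all);

my @train      = @all[0 .. $n_train - 1];
my @validation = @all[$n_train .. $n_train + $n_valid - 1];
my @test       = @all[$n_train + $n_valid .. $#all];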
Evaluating Hypotheses
  • Some standard statistical measures are useful
  • Error rate, accuracy, precision, recall, F1
  • Calculated using contingency tables
Evaluating Hypotheses
  • In the 2×2 contingency table for a category: a = correct assignments (true positives), b = incorrect assignments (false positives), c = missed assignments (false negatives), d = correct rejections (true negatives)
  • Error = (b+c)/(a+b+c+d)
  • Accuracy = (a+d)/(a+b+c+d)
  • Precision = p = a/(a+b)
  • Recall = r = a/(a+c)
  • F1 = 2pr/(p+r)

Precision is easy to maximize by assigning nothing

Recall is easy to maximize by assigning everything

F1 combines precision and recall equally
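These formulas are easy to compute directly; a plain-Perl sketch with a, b, c, d as defined above (the counts in the example call are made up):

# Compute the standard measures from one 2x2 contingency table:
# $a = true positives, $b = false positives,
# $c = false negatives, $d = true negatives.
sub prf {
    my ($a, $b, $c, $d) = @_;
    my $p   = $a / ($a + $b);
    my $r   = $a / ($a + $c);
    my $f1  = 2 * $p * $r / ($p + $r);
    my $err = ($b + $c) / ($a + $b + $c + $d);
    return ($p, $r, $f1, $err);
}

my ($p, $r, $f1, $err) = prf(40, 10, 20, 430);
printf "P=%.3f  R=%.3f  F1=%.3f  Err=%.3f\n", $p, $r, $f1, $err;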

Evaluating Hypotheses
  • Example (from categorization)
  • Note that precision is higher than recall - indicates a cautious categorizer

Precision = 0.851, Recall = 0.711, F1 = 0.775

These scores depend on the task - can’t compare scores across tasks

Often useful to compare categories separately, then average (macro-averaging)

Evaluating Hypotheses
  • The Statistics::Contingency module (on CPAN) helps calculate these figures:

use Statistics::Contingency;

my $s = new Statistics::Contingency;

while (...) {
    ... Do some categorization ...
    $s->add_result($assigned, $correct);
}

print "Micro F1: ", $s->micro_F1, "\n";
print $s->stats_table;

# Output:

Micro F1: 0.774803607797498
+-------------------------------------------------+
|   miR    miP   miF1    maR    maP   maF1    Err |
| 0.243  0.843  0.275  0.711  0.851  0.775  0.006 |
+-------------------------------------------------+

Useful Perl Data-Munging Tools
  • Storable - cheap persistence and cloning
  • PDL - helps performance and design
  • Inline::C - tight loops and interfaces
Storable
  • One of many persistence classes for Perl data (Data::Dumper, YAML, Data::Denter)
  • Allows saving structures to disk:

    use Storable qw(store retrieve dclone);

    store($x, $filename);
    $x = retrieve($filename);

  • Allows cloning of structures:

    $y = dclone($x);

  • Not terribly interesting, but handy
PDL
  • Perl Data Language
  • On CPAN, of course (PDL-2.3.4.tar.gz)
  • Turns Perl into a data-processing language similar to Matlab
  • Native C/Fortran numerical handling
  • Compact multi-dimensional arrays
  • Still Perl at the highest level (see the short example below)
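A small taste of the flavor (illustrative only):

use PDL;

my $x = sequence(5);             # pdl [0 1 2 3 4]
my $y = $x * $x + 1;             # elementwise arithmetic: [1 2 5 10 17]

print $y, "\n";
print "mean = ", $y->avg, "\n";  # built-in reductions like avg, sum, max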
PDL demo

PDL experimentation shell:

ken% perldl

perldl> demo pdl

Extending PDL
  • PDL has an extension language, PDL::PP
  • Lets you write C extensions to PDL
  • Handles many gory details (data types, loop indices, "threading")

Extending PDL
  • Example: $n = $pdl->sum_elements;

# Usage:
$pdl = PDL->random(7);
print "PDL: $pdl\n";

$sum = $pdl->sum_elements;
print "Sum: $sum\n";

# Output:
PDL: [0.513 0.175 0.308 0.534 0.947 0.171 0.702]
Sum: [3.35]

Extending PDL

pp_def('sum_elements',
    Pars => 'a(n); [o]b();',
    Code => <<'EOF',
  double tmp;
  tmp = 0;
  loop(n) %{
    tmp += $a();
  %}
  $b() = tmp;
EOF
);

Extending PDL

The same routine written type-generically with $GENERIC(), which expands to the C type of the piddle being processed, so one definition works for every PDL data type:

pp_def('sum_elements',
    Pars => 'a(n); [o]b();',
    Code => <<'EOF',
  $GENERIC() tmp;
  tmp = ($GENERIC()) 0;
  loop(n) %{
    tmp += $a();
  %}
  $b() = tmp;
EOF
);

Inline::C
  • Allows very easy embedding of C code in Perl modules
  • Also Inline::Java, Inline::Python, Inline::CPP, Inline::ASM, Inline::Tcl
  • Considered much easier than XS or SWIG
  • Developers are very enthusiastic and helpful
Inline::C basic syntax
  • A complete Perl script using Inline (taken from the Inline docs):

#!/usr/bin/perl

greet();

use Inline C => q{
    void greet() { printf("Hello, world\n"); }
};

Inline::C for writing functions
  • Find next prime number greater than $x

#!/usr/bin/perl

foreach (-2.7, 29, 30.33, 100_000) {
    print "$_: ", next_prime($_), "\n";
}

. . .

Inline::C for writing functions

use Inline C => q{
    int next_prime(double in) {
        // Implements a Sieve of Eratosthenes
        int *is_prime;
        int i, j;
        int candidate = ceil(in);

        if (in < 2.0) return 2;

        is_prime = malloc(2 * candidate * sizeof(int));
        for (i = 0; i < 2*candidate; i++) is_prime[i] = 1;

. . .

Inline::C for writing functions

        for (i = 2; i < 2*candidate; i++) {
            if (!is_prime[i]) continue;
            if (i >= candidate) { free(is_prime); return i; }
            for (j = i; j < 2*candidate; j += i) is_prime[j] = 0;
        }

        return 0;  // Should never get here
    }
};

Inline::C for wrapping libraries
  • We’ll create a wrapper for ‘libbow’, an IR package
  • Contains an implementation of the Porter word-stemming algorithm (e.g., the stem of 'trying' is 'try')

# A Perlish interface:
$stem = stem_porter($word);

# A C-like interface:
stem_porter_inplace($word);

Inline::C for wrapping libraries

package Bow::Inline;

use strict;
use Exporter;
use vars qw($VERSION @ISA @EXPORT_OK);

BEGIN {
    $VERSION = '0.01';
}

@ISA = qw(Exporter);
@EXPORT_OK = qw(stem_porter stem_porter_inplace);

. . .

Inline::C for wrapping libraries

use Inline (C => 'DATA',
            VERSION => $VERSION,
            NAME    => __PACKAGE__,
            LIBS    => '-L/tmp/bow/lib -lbow',
            INC     => '-I/tmp/bow/include',
            CCFLAGS => '-no-cpp-precomp',
           );

1;

__DATA__
__C__

. . .

Inline::C for wrapping libraries

// libbow includes bow_stem_porter()
#include "bow/libbow.h"

// The bare-bones C interface exposed
int stem_porter_inplace(SV* word) {
    int retval;
    char* ptr = SvPV_nolen(word);

    retval = bow_stem_porter(ptr);
    SvCUR_set(word, strlen(ptr));
    return retval;
}

. . .

Inline::C for wrapping libraries

// A Perlish interface
char* stem_porter (char* word) {
    if (!bow_stem_porter(word)) return &PL_sv_undef;
    return word;
}

// Don't know what the hell these are for in libbow,
// but it needs them.
const char *argp_program_version = "foo 1.0";
const char *program_invocation_short_name = "foofy";

When to use speed tools
  • A word of caution - don’t use C or PDL before you need to
  • Plain Perl is great for most tasks and usually pretty fast
  • Remember - external libraries (like libbow, pari-gp) both solve problems and create headaches
Decision Trees
  • Conceptually simple
  • Fast evaluation
  • Scrutable structures
  • Can be learned from training data
  • Can be difficult to build
  • Can “overfit” training data
  • Usually prefer simpler (i.e., smaller) trees - a toy example of such a tree is sketched below
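To show how scrutable and cheap to evaluate such a tree is, here is a toy tree for the weather data used below, written as a plain nested hash with a tiny recursive classifier (purely illustrative; this is not AI::DecisionTree's internal representation):

my $tree = {
    attribute => 'outlook',
    sunny     => { attribute => 'humidity', high   => 'no', normal => 'yes' },
    overcast  => 'yes',
    rain      => { attribute => 'wind',     strong => 'no', weak   => 'yes' },
};

sub classify {
    my ($node, $instance) = @_;
    return $node unless ref $node;                 # a leaf is just the result
    my $value = $instance->{ $node->{attribute} };
    return classify($node->{$value}, $instance);
}

print classify($tree, { outlook => 'sunny', humidity => 'normal' }), "\n";  # prints "yes"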
Decision Trees
  • Sample training data: weather attributes (Outlook, Temperature, Humidity, Wind) for a series of days, each labeled with a yes/no outcome - e.g. sunny/hot/high → no, overcast/hot/normal → yes (the same instances appear in the AI::DecisionTree example later)
Decision Trees
  • How do we build the tree from the training data?
  • We want to make the smallest possible trees
  • Which attribute (Outlook, Wind, etc.) is the best classifier?
  • We need a measurement of how much information a given attribute contributes toward the outcome.
  • We use information gain (IG), which is based on the entropy of the training instances.
  • The attribute with the highest IG is the “most helpful” classifier, and reduces entropy the most.
Decision Trees
  • Entropy comes from Information Theory, invented by Claude Shannon
  • Measures the uncertainty of a decision between alternative options
  • It is the probabilistically expected number of bits necessary to specify the value of an attribute:
    Entropy(S) = - Σ_i p_i log2(p_i)
  • i represents an attribute value, p_i represents the probability of seeing that value in S
Decision Trees

sub entropy {
    my %prob;
    $prob{$_}++ foreach @_;
    $_ /= @_ foreach values %prob;

    my $sum = 0;
    $sum += $_ * log($_) foreach values %prob;

    return -$sum / log(2);
}

Decision Trees
  • Gain(S, I) = Entropy(S) - Σ_i (|S_i| / |S|) · Entropy(S_i), where the S_i are the subsets of S having value i for attribute I
  • IG is the original entropy minus the entropy remaining after knowing attribute I
  • Find argmax_I Gain(S, I) at each splitting node
  • To maximize IG, we can just minimize the second term on the right, since Entropy(S) is constant
  • This is the ID3 algorithm (J. R. Quinlan, 1986); a small sketch building on entropy() follows
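A minimal sketch of the gain computation, building on the entropy() sub above; the instance format (a hashref with attributes and result keys, as in the AI::DecisionTree example later) is an assumption for illustration:

sub information_gain {
    my ($attr, @instances) = @_;
    my $before = entropy(map { $_->{result} } @instances);

    # Partition the instances by their value of $attr
    my %subset;
    push @{ $subset{ $_->{attributes}{$attr} } }, $_ for @instances;

    # Weighted entropy of the partition
    my $after = 0;
    for my $group (values %subset) {
        $after += (@$group / @instances) * entropy(map { $_->{result} } @$group);
    }

    return $before - $after;
}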
Decision Trees
  • Decision trees in Perl are available with AI::DecisionTree (on CPAN)
  • Very simple OO interface
  • Currently implements ID3
    • Handles either consistent or noisy input
    • Can post-prune trees using a Minimum Message Length criterion
    • Doesn’t do cross-validation
    • Doesn’t handle continuous data
  • More robust feature sets are needed - patches welcome!
Decision Trees - Example

use AI::DecisionTree;

my $dtree = new AI::DecisionTree;

# Add training instances
$dtree->add_instance
    (attributes => {outlook     => 'sunny',
                    temperature => 'hot',
                    humidity    => 'high'},
     result => 'no');

$dtree->add_instance
    (attributes => {outlook     => 'overcast',
                    temperature => 'hot',
                    humidity    => 'normal'},
     result => 'yes');

# ... repeat for several more instances

Decision Trees - Example

# ... continued ...

$dtree->train;

# Find results for unseen instances
my $result = $dtree->get_result
    (attributes => {outlook     => 'sunny',
                    temperature => 'hot',
                    humidity    => 'normal'});

print "Result: $result\n";

SVMs
  • Another ML technique
  • Measures features quantitatively, induces a vector space
  • Finds the optimal decision surface
SVMs
  • The data may not be perfectly separable
  • The same algorithms usually still work, finding the "best" available surface
  • Different surface shapes may be used
  • Usually scales well with number of features, poorly with number of examples
SVMs - Example

use Algorithm::SVM;
use Algorithm::SVM::DataSet;

# Collect & format the data:
my @data;
for (...) {
    push @data, Algorithm::SVM::DataSet->new
        ( Label => $foo,
          Data  => \@bar );
}

# Train the SVM:
my $svm = Algorithm::SVM->new(Kernel => 'linear');
$svm->train(@data);

... continued ...

SVMs - Example

my $test = Algorithm::SVM::DataSet->new
    ( Label => undef,
      Data  => \@baz );

my $result = $svm->predict($test);
print "Predicted: $result\n";

Text Categorization
  • Text categorization, and categorization in general, is an extremely powerful ML technique
  • Generalizes well to many areas
    • Document management
    • Information Retrieval
    • Gene/protein identification
    • Spam filtering
  • Fairly simple concept
  • Lots of technical challenges
Text Categorization
  • AI::Categorizer (sequel to AI::Categorize) on CPAN
  • Addresses lots of tasks in text categorization
    • Format of documents (XML, text, database, etc.)
    • Support for structured documents (title, body, etc.)
    • Tokenizing of data into words
    • Linguistic stemming
    • Feature selection (1-grams, n-grams, statistically chosen)
    • Vector space modeling (TF/IDF methods; a small sketch follows this list)
    • Machine learning algorithm (Naïve Bayes, SVM, DecisionTree, kNN, etc.)
    • Machine learning parameters (different in each algorithm)
    • Hypothesis behavior (best-category only, or all matching categories)
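As an illustration of the vector-space step, a minimal TF/IDF weighting sketch (not AI::Categorizer's actual code; the data structures are assumptions for illustration):

# $term_counts: { word => count of that word in this document }
# $doc_freq:    { word => number of training documents containing it }
# $num_docs:    total number of training documents
sub tf_idf {
    my ($term_counts, $doc_freq, $num_docs) = @_;
    my %weight;
    while (my ($term, $tf) = each %$term_counts) {
        my $df = $doc_freq->{$term} || 1;    # guard against unseen terms
        $weight{$term} = $tf * log($num_docs / $df);
    }
    return \%weight;                         # the document's feature vector
}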
AI::Categorizer Framework
  • KnowledgeSet embodies a set of documents and categories
AI::Categorizer Framework
  • Document is a (possibly structured) set of text data, belonging to 1 or more categories
AI::Categorizer Framework
  • Category is a named set containing 1 or more documents
AI::Categorizer Framework
  • Collection is a storage medium for document and category information (as text files, in DBI, XML files, etc.)
AI::Categorizer Framework
  • Feature Vector maps features (words) to weights (counts)
AI::Categorizer Framework
  • Learner is a ML algorithm class (Naïve Bayes, kNN, Decision Tree, etc.)
AI::Categorizer Framework
  • Hypothesis is the learner’s “best guess” about document categories
AI::Categorizer Framework
  • Experiment collects and analyzes hypotheses
Using AI::Categorizer
  • Highest-level interface

use AI::Categorizer;

my $c = new AI::Categorizer(...parameters...);

# Run a complete experiment - training on a
# corpus, testing on a test set, printing a
# summary of results to STDOUT
$c->run_experiment;

Using AI::Categorizer
  • More detailed:

use AI::Categorizer;

my $c = new AI::Categorizer(...parameters...);

# Run the separate parts of $c->run_experiment
$c->scan_features;
$c->read_training_set;
$c->train;
$c->evaluate_test_set;

print $c->stats_table;

Using AI::Categorizer
  • In an application:

# After training, use the learner for categorizing
my $l = $c->learner;

while (...) {
    my $d = ...create a document...
    my $h = $l->categorize($d);
    print "Best category: ", $h->best_category;
}

Using AI::Categorizer
  • Uses the Class::Container package, so all parameters can go to the top-level object constructor:

my $c = new AI::Categorizer
    (save_progress => 'my_progress',
     data_root     => 'my_data',
     features_kept => 10_000,
     threshold     => 0.1,
    );

(AI::Categorizer needn't know about these parameters itself; Class::Container routes each one transparently to the Categorizer, KnowledgeSet, or Learner object that actually uses it.)

Naïve Bayes Categorization
  • Simple, fast machine learning technique
  • Let c_1 … c_m represent all categories, and w_1 … w_n represent the words of a given document
  • We want the most probable category given the words: argmax_i p(c_i | w_1, …, w_n)

Estimating that term directly is computationally infeasible - the data is too sparse

Naïve Bayes Categorization
  • Apply Bayes’ Theorem:
    p(c_i | w_1, …, w_n) = p(w_1, …, w_n | c_i) · p(c_i) / p(w_1, …, w_n)
  • Then make the “naïve” assumption that words occur independently given the category:
    p(w_1, …, w_n | c_i) ≈ Π_j p(w_j | c_i)
Naïve Bayes Categorization
  • The quantities p(c_i) and p(w_j|c_i) can be calculated from the training set
  • p(c_i) is the fraction of the training set belonging to category c_i
  • p(w_j|c_i) is the fraction of the words in c_i that are w_j
  • Must deal with unseen words - we don't want any p(w_j|c_i) to be zero
  • Typically we pretend unseen words have been seen 0.5 times, or use some similar smoothing strategy (a small scoring sketch follows)
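A minimal sketch of that scoring rule, done in log space to avoid underflow (this is not AI::Categorizer's implementation; the count hashes are assumed to have been tallied from the training set):

use List::Util qw(sum);

# $word_count->{$cat}{$word} = times $word occurs in category $cat
# $doc_count->{$cat}         = number of training documents in $cat
sub best_category {
    my ($doc_words, $word_count, $doc_count) = @_;
    my $total_docs = sum values %$doc_count;
    my ($best, $best_score);

    for my $cat (keys %$doc_count) {
        my $cat_words = sum(values %{ $word_count->{$cat} }) || 1;
        my $score = log( $doc_count->{$cat} / $total_docs );    # log p(c_i)
        for my $w (@$doc_words) {
            my $count = $word_count->{$cat}{$w} || 0.5;          # unseen words "seen" 0.5 times
            $score += log( $count / $cat_words );                # log p(w_j|c_i)
        }
        ($best, $best_score) = ($cat, $score)
            if !defined($best_score) || $score > $best_score;
    }

    return $best;
}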
Naïve Bayes Sample Run

ken> perl eg/run_experiment.pl [options]

References
  • Ken Williams: ken@mathforum.org or kenw@ee.usyd.edu.au
  • Perl-AI list: perl-ai@perl.org
  • AI::Categorizer, AI::DecisionTree, Statistics::Contingency, Inline::C, PDL, Storable all on CPAN
  • libbow: http://www.cs.cmu.edu/~mccallum/bow
  • Machine Learning, Tom Mitchell. McGraw-Hill, 414pp, 1997
  • Foundations of Statistical Natural Language Processing, Christopher D. Manning & Hinrich Schütze. MIT Press, 680 pp., 1999
Extras, time permitting
  • AI::Categorizer parameters by class
  • AI::DecisionTree example
  • PDL::Sparse walkthrough
  • AI::NodeLib (incomplete implementation)