Pattern Analysis. Prof. Bennett Math Model of Learning and Discovery 2/14/05 Based on Chapter 1 of Shawe-Taylor and Cristianini. Outline. What is pattern analysis? Illustrate issues via example Pattern definitions Examples of practical tasks Pattern algorithms Summary .


Pattern Analysis

Prof. Bennett

Math Model of Learning and Discovery 2/14/05

Based on Chapter 1 of

Shawe-Taylor and Cristianini

Outline
  • What is pattern analysis?
  • Illustrate issues via example
  • Pattern definitions
  • Examples of practical tasks
  • Pattern algorithms
  • Summary
Pattern Analysis
  • The automatic detection of patterns in data from the same source.
  • Make predictions of new data coming from the same source.
  • Data may take many forms: images, text, records of commercial transactions, genome sequences, family trees.

Data Driven Analysis

Kepler Analyzed Brahe’s Planetary Motion Data

P = period, D = average distance from the Sun

Found “Regularities”
  • Observed $P^2 = D^3$ (with P in years and D in astronomical units).
  • Developed three laws of planetary motion.
  • Compressible: the data can be represented by one column.
  • Predictable: discovering hidden relations allows us to predict the other columns.
  • The Third Law is exact.
Data Representation I
  • Nonlinear model of D and P: $f(D, P) = P^2 - D^3 = 0$
  • Linear model of $\log D$ and $\log P$: $2 \log P - 3 \log D = 0$
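The change of representation can be checked numerically. The sketch below assumes the linearizing recoding is a logarithm and uses rounded textbook values for the six planets known in Kepler's time; these numbers are not taken from the slides. If the period squared equals the distance cubed, a least-squares line through (log D, log P) should have slope close to 3/2 and intercept close to 0.

```python
import numpy as np

# Approximate orbital periods (years) and mean distances from the Sun (AU)
# for Mercury, Venus, Earth, Mars, Jupiter, Saturn; rounded textbook values
# supplied for this illustration.
period = np.array([0.241, 0.615, 1.000, 1.881, 11.86, 29.46])
distance = np.array([0.387, 0.723, 1.000, 1.524, 5.203, 9.537])

# Nonlinear form: P^2 / D^3 should be close to 1 for every planet.
ratio = period**2 / distance**3

# Linear form: log P = slope * log D + intercept, fitted by least squares.
slope, intercept = np.polyfit(np.log(distance), np.log(period), deg=1)

print("P^2 / D^3:", np.round(ratio, 3))
print("fitted slope:", round(slope, 3))
print("fitted intercept:", round(intercept, 3))
```

The nonlinear relation needs a search over exponents, while the log-recoded version is found by an ordinary linear fit.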
Data Representation II
  • Assume we know plane of orbit, so we can represent positions as (x,y) pairs
  • Also know orbit is ellipse
Data Representation
  • Pattern is a nonlinear function of x, y: $a_1 x^2 + a_2 xy + a_3 y^2 + a_4 x + a_5 y + a_6 = 0$
  • Pattern is a linear function of the monomials $x^2, xy, y^2, x, y$
  • Linear relationships are easier to find.
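A minimal sketch of why the recoded problem is easier, assuming the recoding is the vector of monomials (x², xy, y², x, y, 1). The points are synthetic, generated on a known ellipse, not orbital measurements. Once each point is recoded, the ellipse becomes a linear relation whose coefficients can be read off a singular value decomposition.

```python
import numpy as np

# Synthetic points on an ellipse (assumed example data): centre (1, -0.5),
# semi-axes 3 and 2, rotated by 30 degrees.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
theta = np.pi / 6
u, v = 3.0 * np.cos(t), 2.0 * np.sin(t)
x = 1.0 + u * np.cos(theta) - v * np.sin(theta)
y = -0.5 + u * np.sin(theta) + v * np.cos(theta)

# Recode each (x, y) as the monomial vector (x^2, xy, y^2, x, y, 1).
features = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])

# The pattern is a coefficient vector a with features @ a = 0. The right
# singular vector for the smallest singular value gives it up to scale.
_, _, vt = np.linalg.svd(features)
coeffs = vt[-1]

residual = np.abs(features @ coeffs).max()
# For a conic a1 x^2 + a2 xy + a3 y^2 + ... = 0, a negative value of
# a2^2 - 4 a1 a3 indicates an ellipse.
discriminant = coeffs[1] ** 2 - 4.0 * coeffs[0] * coeffs[2]

print("max residual:", residual)
print("discriminant:", discriminant)
```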
Set of Hypotheses
  • Hypothesis Ellipse compute
  • Hypothesis Circle compute

UNDERFITS

Set of Hypotheses

Hypothesis any continuous function

OVERFITS!!!

Depends on size of hypothesis class

Use domain knowledge to limit hypotheses
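The effect of hypothesis class size can be sketched with polynomial fits standing in for the circle, ellipse, and arbitrary-function hypotheses. This substitution is an assumption made for illustration, as are the quadratic ground truth and the noise level. A class that is too small cannot match the training data, while a class large enough to interpolate every point drives the training error to zero without any guarantee on new data.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_curve(x):
    # Assumed ground truth for the illustration: a quadratic.
    return 1.0 - 2.0 * x + 3.0 * x**2

# Ten noisy training points and a dense set of noise-free held-out points.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_curve(x_train) + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(-0.95, 0.95, 200)
y_test = true_curve(x_test)

def errors(degree):
    """Mean squared error on training and held-out points for one hypothesis class."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# Degree 1 is too small a class, degree 2 matches the source,
# degree 9 can pass through all ten training points.
results = {degree: errors(degree) for degree in (1, 2, 9)}
for degree, (train, test) in results.items():
    print(f"degree {degree}: train MSE {train:.4f}, test MSE {test:.4f}")
```

Comparing the training and held-out columns shows why training error alone cannot be used to choose the hypothesis class.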

Typical Pattern Analysis
  • Approximate not exact.
  • Data has errors and omissions.
  • Cannot predict graduate school performance from GRE scores and grades alone.
  • Best Representation/Model unknown.
  • Make approximate predictions – need to address how accurate estimates are.
Definition: Exact Pattern
  • A general exact pattern, f, for data source S satisfies

$f(x) = 0$

for all data x from source S

Approximate Pattern
  • A general approximate pattern, f, for data source S satisfies

$f(x) \approx 0$

for all data x from source S

Statistical Pattern
  • A general statistical pattern, f, for data source S generated iid according to distribution D satisfies

$\mathbb{E}_{x \sim D}[f(x)] \approx 0$

where the expectation is over data x from source S

Two and Multiclass Classification
  • Example – Character Recognition

two class - is it an A or not?

multiclass – what letter is it ?

Regression
  • Example – Determine drug bioavailability through the intestine: estimate apparent permeability as assayed via an intestinal cell line.
Density Estimation
  • Estimate the probability that a particular event occurs, p(x). Use it to detect improbable events like fraud.
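As a rough sketch of this task, the code below fits a single Gaussian to synthetic transaction amounts and flags values whose estimated density is very low. The data, the Gaussian model, and the four-standard-deviation threshold are all assumptions chosen for illustration. Real fraud detection would need a richer density model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "normal" transaction amounts (assumed data for illustration only).
amounts = rng.normal(loc=50.0, scale=10.0, size=1000)

# Estimate p(x) with a single Gaussian fitted by sample mean and standard deviation.
mu, sigma = amounts.mean(), amounts.std()

def density(x):
    """Gaussian density estimate p(x) under the fitted parameters."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical threshold: the density at four standard deviations from the mean.
threshold = density(mu + 4.0 * sigma)

def is_improbable(x):
    """Flag an event whose estimated density falls below the threshold."""
    return bool(density(x) < threshold)

print(is_improbable(55.0), is_improbable(500.0))
```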
Principal Component Analysis
  • Find a projection of the data that captures the major variance in the data.

Eigenfaces - capture essential qualities of faces to help ID and reduce storage needs.
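The projection idea can be sketched on a small synthetic data set. This uses 2-D points stretched along one direction, not face images. The data and its dimensions are assumptions, and the principal directions are computed with a plain singular value decomposition of the centred data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 2-D data stretched along the direction (1, 1) (assumed example data).
latent = rng.normal(size=(500, 1)) * 5.0
noise = rng.normal(size=(500, 2)) * 0.5
data = latent @ np.array([[1.0, 1.0]]) / np.sqrt(2.0) + noise

# Centre the data, then take the SVD; rows of vt are the principal directions.
centered = data - data.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of total variance captured by each direction.
explained = singular_values**2 / np.sum(singular_values**2)
first_component = vt[0]

# Storing one number per point, not two, keeps most of the variance.
projection = centered @ first_component

print("first principal direction:", first_component)
print("fraction of variance captured:", explained[0])
```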

Pattern Analysis Algorithm
  • A Pattern Analysis Algorithm

input = finite set of data from source S

a.k.a. the training set

output = detector function f

or no patterns detected
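The input/output contract above can be sketched as a function that takes a training set and returns either a detector f or nothing. The function name `find_linear_pattern`, the restriction to linear patterns, and the tolerance are hypothetical choices for this illustration, not part of the original slides.

```python
from typing import Callable, Optional

import numpy as np

def find_linear_pattern(
    training_set: np.ndarray, tolerance: float = 1e-6
) -> Optional[Callable[[np.ndarray], float]]:
    """Look for a unit vector w with training_set @ w approximately 0.

    Returns a detector f(x) = |w . x| when such a w exists, else None.
    Assumes the training set has at least as many rows as columns.
    """
    _, singular_values, vt = np.linalg.svd(training_set, full_matrices=False)
    if singular_values[-1] > tolerance * singular_values[0]:
        return None  # no (approximately) exact linear pattern detected
    w = vt[-1]
    return lambda x: float(abs(np.dot(w, x)))

# Rows are (a, b, a + b), so the relation a + b - c = 0 holds for every row.
rng = np.random.default_rng(3)
ab = rng.normal(size=(20, 2))
with_pattern = np.column_stack([ab, ab.sum(axis=1)])
without_pattern = rng.normal(size=(20, 3))

f = find_linear_pattern(with_pattern)
g = find_linear_pattern(without_pattern)

print("pattern found:", f is not None, "| no pattern:", g is None)
```

The detector returns a value near zero on new data that follows the pattern and a larger value on data that does not.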

Pattern Algorithm Issues
  • Efficiency and Scalability – memory and CPU requirements, large data sets
  • Robustness – find approximate patterns on noisy data
  • Stability - discover genuine patterns, find the same patterns on different views of the dataset
Stability
  • Generalization –

Find pattern on future data

Pattern may exist by chance for finite sample

Provide a statistical guarantee that the pattern truly exists, with the caveat that with small probability the algorithm may have been misled.

Example
  • Observe that all 20 babies adopted through a state agency from country X in the last 10 years are girls.
  • Pattern: only girls are available for adoption from that country.
  • With probability $p = (0.5)^{20}$ we could observe this data even if girls and boys are equally likely.
  • So with chance p, we were misled.
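The size of this chance can be worked out directly, assuming the 20 adoptions are independent and each child is a girl with probability 0.5:

```python
# Probability that 20 independent adoptions are all girls when each child
# is equally likely to be a girl or a boy.
n_babies = 20
p_girl = 0.5
p_all_girls = p_girl ** n_babies

print(f"p = {p_all_girls:.3e}")
print(f"about 1 in {round(1 / p_all_girls):,}")
```

Under these assumptions p is 1 in 2^20, roughly one in a million, so the pattern is very unlikely to be a chance artifact, though not impossible.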
Statistical Learning Theory
  • Produce a pattern based on a finite sample. Provide bounds guaranteeing that, with high probability, the pattern approximately represents a true pattern.

Probably Approximately Correct

Recoding Strategy
  • With proper representation, the problem can become easier (linear model works).
  • Develop general purpose linear learning methods.
  • Change recoding using “kernel functions”
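A small sketch of what recoding through a kernel function means, using the homogeneous quadratic kernel on 2-D points as an assumed example. The kernel value computed in the original space equals the inner product of the explicitly recoded points, so a linear method that only needs inner products can work in the recoded space without building it.

```python
import numpy as np

def feature_map(x):
    """Explicit degree-2 recoding of a 2-D point: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Homogeneous quadratic kernel k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Inner product after explicit recoding versus the kernel evaluated directly.
explicit = float(np.dot(feature_map(x), feature_map(z)))
implicit = poly_kernel(x, z)

print(explicit, implicit)
```

Swapping `poly_kernel` for another kernel changes the recoding while the linear learning method stays the same.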
Key Ideas
  • Patterns are regularities in data from a specified source
  • Algorithm takes finite sample and computes pattern
    • Efficiency, robustness, and stability
  • Representation -- Kernels
  • Strategy = Generic Algorithms + Recoding
  • Many Learning Tasks in this framework