CS 657/790 Machine Learning and Data Mining Course Introduction


- Please hand in sheet of paper with:
- Your name and email address
- Your classification (e.g., 2nd-year computer science PhD student)
- Your experience with MATLAB (none, some or much)
- Your undergraduate degree (when, what, where)
- Your AI experience (courses at UWM or elsewhere)
- Your programming experience

- Course Instructor: Joe Bockhorst
- email: joebock@uwm.edu
- office: 1155 EMS
- Course webpage: http://www.uwm.edu/~joebock/790.html
- office hours: ???
- Possible times:
- before class on Monday (3:30-5:30)
- Monday morning
- Wednesday morning
- after class Monday (7:00-9:00)

- Machine Learning (Tom Mitchell)
- Bookstore in union, $140 new
- Amazon.com hard cover: $125 new, $80 used
- Amazon.com soft cover: < $30

- Read (posted on class web page)
- Preface
- Chapter 1
- Sections 6.1, 6.2, 6.9, 6.10
- Sections 8.1, 8.2

- Powerpoint encourages words over pictures (not good)
- But powerpoint can be saved, tweaked, easily shared, …
- Notes posted on course website following lecture

- Your thoughts?

- Slides are a combination of
- Jude Shavlik’s notes from UW-Madison machine learning course (Prof. I had)
- Textbook Slides (Google “machine learning textbook”)
- My notes

- Is there one?

- 1st half covers supervised learning
- Algorithms: support vector machines, neural networks, probabilistic models …
- Methodology

- 2nd half covers graphical probability models
- Powerful statistical models very useful for learning in complex and/or noisy settings

- Primarily algorithmic & experimental
- Some theory, both mathematical & conceptual (much on statistics)
- "Hands on" experience, interactive lectures/discussions
- Broad survey of many ML subfields
- "symbolic" (rules, decision trees)
- "connectionist" (neural nets)
- Support Vector Machines
- statistical ("Bayes rule")
- genetic algorithms (if time)

- to understand what a learning system should do
- to understand how (and how well) existing systems work

- Programming
- Data structures and algorithms
- CS 535

- Math
- Calculus (partial derivatives)
- Simple probability & statistics

- Why MATLAB?
- Fast prototyping
- Integrated plotting
- Widely used in academia (industry too?)
- Will save you time in the long run

- Why not MATLAB?
- Proprietary software
- Harder to work from home

- Optional Assignment: familiarize yourself with MATLAB, use MATLAB help system

- E256, E280, E285, E384, E270
- All have MATLAB installed under Windows XP

- Bi-weekly programming plus perhaps some “paper & pencil” homework
- "hands on" experience valuable
- HW0 – build a dataset
- HW1 & HW2 supervised learning algorithms
- HW3 & HW4 graphical probability models

- Midterm exam (after about 8-10 weeks)
- Final exam
- Find project of your choosing
- during last 4-5 weeks of class

- HW's: 25%
- Project: 20%
- Midterm: 20%
- Final: 30%
- Quality Discussion: 5%

- HW's due @ 4pm
- you have 5 late days to use over the semester
- (Fri 4pm → Mon 4pm is 1 late "day")

- SAVE UP late days!
- extensions only for extreme cases

- Penalty points after late days exhausted
- 10% per day

- Can't be more than one week late

- Machine Learning: computer algorithms that improve automatically through experience [Mitchell].
- Data Mining: Extracting knowledge from large amounts of data. [Han & Kamber] (synonym: knowledge discovery in databases (KDD))

[Diagram: overlapping regions labeled ML and DM, containing the topic groups below]

Supervised learning, decision trees, neural nets, Bayesian networks, k-nearest neighbor, genetic algorithms, unsupervised learning (clustering in DM jargon), …

reinforcement learning, learning theory, evaluating learning systems, using domain knowledge, inductive logic programming, …

Data Warehouse, OLAP, query languages, association rules, presentation, …

We’ll try to cover topics in red

- Learning = improving with experience
- Example: learn to play checkers

- Improve over task T,
- with respect to performance measure P,
- based on experience E

- T: Play Checkers
- P: % of games won
- E: games played against self

- T: find genes in DNA sequences
- ACGTGCATGTGTGAACGTGTGGGTCTGATGATGT…

- P: % of genes found
- E: experimentally verified genes

* Prediction of Complete Gene Structures in Human Genomic DNA, Burge & Karlin, J. Molecular Biology, 1997, 268:78-94

- T: drive vehicle
- P: reach destination
- E: machine observation of human driver

Stanford team won 2005 driverless vehicle race across Mojave Desert

“The robot's software system relied predominately on state-of-the-art AI technologies, such as machine learning and probabilistic reasoning.”

[Winning the DARPA Grand Challenge, Thrun et al., Journal of Field Robotics, 2006]

- Data is plentiful
- Retail, video, images, speech, text, DNA, bio-medical measurements, …

- Computational power is available
- Budding Industry
- ML has great applications
- ML still relatively immature

- Think about this
- will need to create it by week after next

- Google to find:
- UCI archive (or UCI KDD archive)
- UCI ML archive (UCI machine learning repository)

- Step 1: Choose a Boolean (true/false) concept
- Subjective Judgement
- Books I like/dislike
- Movies I like/dislike
- Web pages I like/dislike

- “Time will tell” concepts
- Stocks to buy
- Medical outcomes

- Sensory interpretation
- Face recognition (See text)
- Handwritten digit recognition
- Sound recognition

- Step 2: Choosing a feature space
- We will use fixed-length feature vectors
- Choose N features
- Each feature has V_i possible values
- Each example is represented by a vector of N feature values
(i.e., is a point in the feature space)
e.g.: <red, 50, round> for the features <color, weight, shape>

- Feature Types
- Boolean
- Nominal
- Ordered
- Hierarchical

- Step 3: Collect examples (“I/O” pairs)
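
A minimal sketch of Steps 2 and 3, assuming the three features from the slide's example (color, weight, shape). The allowed values, the `Example` type, the `is_valid` helper, and the labeled examples are all made up for illustration.

```python
from typing import NamedTuple

# Hypothetical feature space with N = 3 features.
# The value sets below are illustrative assumptions, not from the course.
FEATURE_VALUES = {
    "color": {"red", "blue", "green"},          # nominal
    "weight": (0, 500),                         # continuous range [0, 500]
    "shape": {"round", "square", "triangle"},   # treated as plain tokens here
}


class Example(NamedTuple):
    """One fixed-length feature vector: a point in the feature space."""
    color: str
    weight: float
    shape: str


def is_valid(ex: Example) -> bool:
    """Check that every feature value lies inside the declared feature space."""
    low, high = FEATURE_VALUES["weight"]
    return (
        ex.color in FEATURE_VALUES["color"]
        and low <= ex.weight <= high
        and ex.shape in FEATURE_VALUES["shape"]
    )


# Step 3: collect "I/O" pairs, i.e. (feature vector, Boolean label).
training_set = [
    (Example("red", 50, "round"), True),
    (Example("blue", 120, "square"), False),
]
```

Every example has the same length N, so a learner can treat the dataset as a table with one column per feature plus a label column.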

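(Slide annotations: the chosen features define a space. In HW0 we will use a subset of the feature types; see next slide.)

[Figure: ISA hierarchy for shape. closed → polygon, continuous; polygon → square, triangle; continuous → circle, ellipse]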

- Nominal
- No relationship among possible values
e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)

- Linear (or Ordered)
- Possible values of the feature are totally ordered
e.g., size ∈ {small, medium, large} ← discrete
weight ∈ [0…500] ← continuous

- Hierarchical
- Possible values are partially ordered in an ISA hierarchy
e.g., for shape →
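
A small sketch of a hierarchical feature, assuming the shape hierarchy from the slide's figure reads closed → {polygon, continuous}, polygon → {square, triangle}, continuous → {circle, ellipse}. The `ISA_PARENT` table and the helper names are hypothetical.

```python
# Child -> parent links of the assumed ISA hierarchy for the "shape" feature.
ISA_PARENT = {
    "polygon": "closed",
    "continuous": "closed",
    "square": "polygon",
    "triangle": "polygon",
    "circle": "continuous",
    "ellipse": "continuous",
}


def ancestors(value):
    """Return the chain of parents from `value` up to the root."""
    chain = []
    while value in ISA_PARENT:
        value = ISA_PARENT[value]
        chain.append(value)
    return chain


def is_a(value, category):
    """True if `value` equals `category` or lies below it in the hierarchy."""
    return value == category or category in ancestors(value)
```

The order is only partial: `is_a("square", "polygon")` holds, but square and circle are not comparable, since neither is an ancestor of the other.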

[Figure: product hierarchy. Product → Pet Foods, Tea, … (99 product classes) → Dried Cat Food, Canned Cat Food, … (2302 product subclasses) → Friskies Liver, 250g, … (~30k products)]

- Structure of one feature!
- “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.”
- From the editors' intro to the special issue (on applications) of the KDD journal*, Vol 15, 2001

* Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers

- Discrete
- tokens (char strings, w/o quote marks and spaces)

- Continuous
- numbers (int’s or float’s)
- If only a few possible values (e.g., 0 & 1) use discrete

- i.e., merge nominal and discrete-ordered
(or convert discrete-ordered into 1,2,…)

- We will ignore hierarchy info and
only use the leaf values (it is rare anyway)
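
One possible way to apply these rules to a column of raw values, as a sketch only. The cutoff for "a few possible values" (`few=3`), the function names, and the size ordering are assumptions; the slides do not fix them.

```python
def _is_number(token):
    """True if the string token parses as an int or float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def feature_kind(values, few=3):
    """Classify a column of string tokens as 'discrete' or 'continuous'.

    Numeric columns with only a few distinct values (e.g. 0 and 1) are
    treated as discrete. The `few` cutoff is an arbitrary assumption.
    """
    distinct = set(values)
    if all(_is_number(v) for v in distinct) and len(distinct) > few:
        return "continuous"
    return "discrete"


# Converting a discrete-ordered feature into 1, 2, ... (assumed ordering).
SIZE_ORDER = ["small", "medium", "large"]


def ordinal_code(value, order=SIZE_ORDER):
    """Map an ordered token to its 1-based position in the ordering."""
    return order.index(value) + 1
```

For example, `feature_kind(["0", "1", "1", "0"])` gives `"discrete"`, while a weight column with many distinct numbers comes out as `"continuous"`.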

- Creating a dataset of fixed-length feature vectors
- HW0 out on-line
- Due next Monday

[Figure: Digitized camera image → Learned Function → Steering Angle]

[Figure: Medical record (age = 13, sex = M, wgt = 18) → Learned Function → ill vs healthy]

- Car Steering (Pomerleau)
- Medical Diagnosis (Quinlan)
- DNA Categorization
- TV-pilot rating
- Chemical-plant control
- Backgammon playing
- WWW page scoring
- Credit application scoring

- Choose a dataset
- based on interest/familiarity
- meets basic requirements
- >1000 examples
- category (function) learned should be binary valued
- ~500 examples labeled class A,
other 500 labeled class B
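
A rough sanity check for these requirements, as a sketch. The function name, the reading of the size requirement as "at least about 1000", and the balance tolerance (`max_imbalance`) are assumptions.

```python
from collections import Counter


def check_dataset(labels, min_examples=1000, max_imbalance=0.1):
    """Return a list of problems found; an empty list means the checks pass.

    `max_imbalance` is how far the larger class may stray from a 50/50
    split. The value 0.1 is an arbitrary choice for illustration.
    """
    problems = []
    n = len(labels)
    counts = Counter(labels)

    if n < min_examples:
        problems.append(f"only {n} examples, need about {min_examples} or more")

    if len(counts) != 2:
        problems.append(f"target has {len(counts)} distinct values, need 2")
    else:
        majority_share = counts.most_common(1)[0][1] / n
        if majority_share - 0.5 > max_imbalance:
            problems.append(f"classes unbalanced: majority share {majority_share:.2f}")

    return problems
```

Calling `check_dataset(["A"] * 520 + ["B"] * 500)` returns an empty list, while a dataset with 900 A's and 200 B's is flagged as unbalanced.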

→ Internet Movie Database (IMD)

- IMD has a lot of data that are not discrete or continuous or binary-valued for target function (category)

[Figure: IMD schema]
- Studio: Name, Country, List of movies
- Actor: Name, Year of birth, Gender, Oscar nominations, List of movies
- Director/Producer: Name, Year of birth, List of movies
- Movie: Title, Genre, Year, Opening Wkend BO receipts, List of actors/actresses, Release season
- Relations: Made, Directed, Acted in, Produced

- Choose a boolean or binary-valued target function (category)
- Opening weekend box office receipts > $2 million
- Movie is drama? (action, sci-fi,…)
- Movies I like/dislike (e.g. Tivo)
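
A sketch of turning raw movie attributes into a Boolean target, using the $2 million threshold from the slide. The record layout, field names, and movie records are made up for illustration.

```python
THRESHOLD_USD = 2_000_000  # threshold from the slide


def big_opening(movie):
    """Target 1: opening weekend box office receipts > $2 million."""
    return movie["opening_weekend_usd"] > THRESHOLD_USD


def is_drama(movie):
    """Target 2: movie is a drama (vs. action, sci-fi, ...)."""
    return movie["genre"] == "drama"


# Made-up records; field names are hypothetical.
movies = [
    {"title": "Movie A", "genre": "drama", "opening_weekend_usd": 3_500_000},
    {"title": "Movie B", "genre": "action", "opening_weekend_usd": 900_000},
]

opening_labels = [big_opening(m) for m in movies]
drama_labels = [is_drama(m) for m in movies]
```

Either function maps every movie to true or false, which is what the HW0 requirement of a binary-valued category asks for.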

- How to transfer available attributes:
Other example attributes (select predictive features)

- Movie
- Average age of actors
- Number of producers
- Percent female actors

- Studio
- Number of movies made
- Average movie gross
- Percent movies released in US

- Director/Producer
- Years of experience
- Most prevalent genre
- Number of award winning movies
- Average movie gross

- Actor
- Gender
- Has previous Oscar award or nominations
- Most prevalent genre
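
Several of these attributes are aggregates over related records, which is one way to flatten relational data into a fixed-length vector. A sketch for the Movie attributes follows; the record layout, field names, and data are hypothetical.

```python
def movie_features(movie, actors):
    """Derive aggregate features for one movie from its related actor records."""
    cast = [actors[name] for name in movie["cast"]]
    ages = [movie["year"] - actor["born"] for actor in cast]
    n_female = sum(1 for actor in cast if actor["gender"] == "F")
    return {
        "avg_actor_age": sum(ages) / len(ages),
        "num_producers": len(movie["producers"]),
        "pct_female_actors": 100.0 * n_female / len(cast),
    }


# Made-up data for illustration only.
actors = {
    "Actor X": {"born": 1960, "gender": "M"},
    "Actor Y": {"born": 1980, "gender": "F"},
}
movie = {
    "title": "Movie A",
    "year": 2000,
    "cast": ["Actor X", "Actor Y"],
    "producers": ["Producer P"],
}

features = movie_features(movie, actors)
```

The variable-length cast list collapses into three numbers, so every movie ends up with a vector of the same length regardless of cast size.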

David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data:

- Opening weekend box office receipts > $2 million
- 25 attributes
- Accuracy = 83.3%
- Default accuracy = 56%

- Movie is drama?
- 12 attributes
- Accuracy = 71.9%
- Default accuracy = 51%

- http://kdl.cs.umass.edu/proximity/about.html
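
"Default accuracy" is read here as the accuracy of always predicting the most common class, which is the usual baseline. A sketch of that computation follows; the label list is made up to mirror the 56% figure and is not the UMass data.

```python
from collections import Counter


def default_accuracy(labels):
    """Accuracy obtained by always guessing the majority class."""
    majority_count = Counter(labels).most_common(1)[0][1]
    return majority_count / len(labels)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Made-up labels: 56 positives out of 100.
labels = [True] * 56 + [False] * 44
baseline = default_accuracy(labels)
```

A learned model is only interesting to the extent that its accuracy exceeds this baseline, which is why the slide reports both numbers.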

Learning denotes changes in the system that … enable the system to do the same task … more effectively the next time.

- Herbert Simon

Learning is making useful changes in our minds.

- Marvin Minsky

Not in Mitchell’s textbook (will spend 0-2 lectures on this – but also in CS776)

- Inducing Functions from I/O Pairs
- Decision trees (e.g., Quinlan’s C4.5 [1993])
- Connectionism / neural networks (e.g., backprop)
- Nearest-neighbor methods
- Genetic algorithms
- SVM’s
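
To make "inducing a function from I/O pairs" concrete, here is a minimal 1-nearest-neighbor sketch. It assumes purely numeric feature vectors and Euclidean distance; the training pairs are made up, and this is an illustration rather than the course's implementation.

```python
import math


def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to `query`.

    `train` is a list of (feature_vector, label) pairs with numeric features.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label


# Made-up I/O pairs.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 4.5), "B"),
]

prediction = nearest_neighbor_predict(train, (1.2, 0.8))
```

The "learned function" here is just the stored examples plus a distance measure; nominal features would need a different distance than the Euclidean one assumed above.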

- Learning without a Teacher
- Conceptual clustering
- Self-organizing systems
- Discovery systems

Will be covered briefly

- Improving a Multi-Step Problem Solver
- Explanation-based learning
- Reinforcement learning

- Using Preexisting Domain Knowledge Inductively
- Analogical learning
- Case-based reasoning
- Inductive/explanatory hybrids