
CS 657/790 Machine Learning and Data Mining: Course Introduction


Student Survey

  • Please hand in sheet of paper with:

    • Your name and email address

    • Your classification (e.g., 2nd-year computer science PhD student)

    • Your experience with MATLAB (none, some or much)

    • Your undergraduate degree (when, what, where)

    • Your AI experience (courses at UWM or elsewhere)

    • Your programming experience


Course Information

  • Course Instructor: Joe Bockhorst

    • email: joebock@uwm.edu

    • office: 1155 EMS

    • Course webpage: http://www.uwm.edu/~joebock/790.html

    • office hours: ???

      • Possible times:

        • before class on Monday (3:30-5:30)

        • Monday morning

        • Wednesday morning

        • after class Monday (7:00-9:00)


Textbook & Reading Assignment

  • Machine Learning (Tom Mitchell)

    • Bookstore in union, $140 new

    • Amazon.com hard cover: $125 new, $80 used

    • Amazon.com soft cover: < $30

  • Read (posted on class web page)

    • Preface

    • Chapter 1

    • Sections 6.1, 6.2, 6.9, 6.10

    • Sections 8.1, 8.2


PowerPoint vs. Whiteboard

  • PowerPoint encourages words over pictures (not good)

  • But PowerPoint can be saved, tweaked, easily shared, …

    • Notes posted on course website following lecture

  • Your thoughts?


Full Disclosure

  • Slides are a combination of

    • Jude Shavlik’s notes from the UW-Madison machine learning course (a professor I had)

    • Textbook Slides (Google “machine learning textbook”)

    • My notes


Class Email List

  • Is there one?


Course Outline

  • 1st half covers supervised learning

    • Algorithms: support vector machines, neural networks, probabilistic models …

    • Methodology

  • 2nd half covers graphical probability models

    • Powerful statistical models very useful for learning in complex and/or noisy settings


Course "Style"

  • Primarily algorithmic & experimental

  • Some theory, both mathematical & conceptual (much on statistics)

  • "Hands on" experience, interactive lectures/discussions

  • Broad survey of many ML subfields

    • "symbolic" (rules, decision trees)

    • "connectionist" (neural nets)

    • Support Vector Machines

    • statistical ("Bayes rule")

    • genetic algorithms (if time)


Two Major Goals

  • to understand what a learning system should do

  • to understand how (and how well) existing systems work


Background Assumed

  • Programming

    • Data structures and algorithms

      • CS 535

  • Math

    • Calculus (partial derivatives)

    • Simple probability & statistics


Programming Assignments in MATLAB

  • Why MATLAB?

    • Fast prototyping

    • Integrated plotting

    • Widely used in academia (industry too?)

    • Will save you time in the long run

  • Why not MATLAB?

    • Proprietary software

    • Harder to work from home

  • Optional Assignment: familiarize yourself with MATLAB using the MATLAB help system (a warm-up sketch follows below)
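A minimal warm-up along these lines (purely illustrative; not the actual assignment) exercises matrices, plotting, and the help system:

    % Minimal MATLAB warm-up (illustrative sketch, not the assignment itself).
    x = linspace(0, 2*pi, 100);   % 100 evenly spaced points in [0, 2*pi]
    y = sin(x);                   % elementwise sine
    plot(x, y);                   % integrated plotting
    xlabel('x'); ylabel('sin(x)');
    A = rand(3, 3);               % random 3x3 matrix
    b = A \ ones(3, 1);           % solve A*b = [1; 1; 1]
    help plot                     % explore the built-in help system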


Student Computer Labs

  • E256, E280, E285, E384, E270

  • All have MATLAB installed under Windows XP


Requirements

  • Bi-weekly programming plus perhaps some “paper & pencil” homework

    • "hands on" experience valuable

    • HW0 – build a dataset

    • HW1 & HW2 supervised learning algorithms

    • HW3 & HW4 graphical probability models

  • Midterm exam (after about 8-10 weeks)

  • Final exam

  • Find a project of your choosing

    • during last 4-5 weeks of class


Grading

HWs: 25%

Project: 20%

Midterm: 20%

Final: 30%

Quality discussion: 5%


Late HW's Policy

  • HW's due @ 4pm

  • you have 5 late days to use over the semester

    • (Fri 4pm → Mon 4pm is 1 late "day")

  • SAVE UP late days!

    • extensions only for extreme cases

  • Penalty points after late days exhausted

    • 10% per day

  • Can't be more than one week late


Machine Learning vs. Data Mining

  • Machine Learning: computer algorithms that improve automatically through experience [Mitchell].

  • Data Mining: Extracting knowledge from large amounts of data. [Han & Kamber] (synonym: knowledge discovery in databases (KDD))


What’s the difference? Topics in ML and DM texts (Mitchell vs. Han & Kamber)

  • ML only: reinforcement learning, learning theory, evaluating learning systems, using domain knowledge, inductive logic programming, …

  • Both ML and DM: supervised learning, decision trees, neural nets, Bayesian networks, k-nearest neighbor, genetic algorithms, unsupervised learning (clustering in DM jargon), …

  • DM only: data warehouses, OLAP, query languages, association rules, presentation, …

We’ll try to cover the topics highlighted in red on the original slide.


The learning problem

  • Learning = improving with experience

  • Example: learn to play checkers

  • Improve over task T, with respect to performance measure P, based on experience E

  • T: Play Checkers

  • P: % of games won

  • E: games played against self
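As a toy illustration (the game record here is simulated, not from the slides), the performance measure P is just the fraction of games won over the recorded experience:

    % Toy sketch: estimate P = % of games won from a simulated self-play record.
    results = rand(1, 1000) > 0.5;      % hypothetical win(1)/loss(0) outcomes
    P = 100 * mean(results);            % performance measure: % of games won
    fprintf('P = %.1f%% of games won\n', P);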


Famous Example: Discovering Genes

  • T: find genes in DNA sequences

    • ACGTGCATGTGTGAACGTGTGGGTCTGATGATGT…

  • P: % of genes found

  • E: experimentally verified genes

* Prediction of Complete Gene Structures in Human Genomic DNA, Burge & Karlin, J. Molecular Biology, 1997, 268:78-94


Famous Example 2: Autonomous Vehicles Driving

  • T: drive vehicle

  • P: reach destination

  • E: machine observation of human driver


ML key to winning DARPA Grand Challenge

Stanford team won 2005 driverless vehicle race

across Mojave Desert

“The robot's software system relied predominately on state-of-the-art AI technologies, such as machine learning and probabilistic reasoning.”

[Winning the DARPA Grand Challenge, Thrun et al., Journal of Field Robotics, 2006]


Why study machine learning (data mining)?

  • Data is plentiful

    • Retail, video, images, speech, text, DNA, bio-medical measurements, …

  • Computational power is available

  • Budding Industry

  • ML has great applications

  • ML still relatively immature


Next Time: HW0 – Create Your Own Dataset

  • Think about this

    • you will need to create it by the week after next

  • Google to find:

    • UCI archive (or UCI KDD archive)

    • UCI ML archive (UCI machine learning repository)


HW0 – Your “Personal Concept”

  • Step 1: Choose a Boolean (true/false) concept

    • Subjective Judgement

      • Books I like/dislike

      • Movies I like/dislike

      • Web pages I like/dislike

    • “Time will tell” concepts

      • Stocks to buy

      • Medical outcomes

    • Sensory interpretation

      • Face recognition (See text)

      • Handwritten digit recognition

      • Sound recognition


HW0 – Your “Personal Concept”

  • Step 2: Choose a feature space

    • We will use fixed-length feature vectors

      • Choose N features

      • Each feature i has V_i possible values

      • Each example is represented by a vector of N feature values (i.e., is a point in the feature space)

        e.g.: <red, 50, round> for the features (color, weight, shape)

    • Feature Types

      • Boolean

      • Nominal

      • Ordered

      • Hierarchical

  • Step 3: Collect examples (“I/O” pairs)

(Choosing the N features defines a space. In HW0 we will use a subset of these feature types; see the next slide and the sketch below.)
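A minimal sketch of this representation (feature names and values invented for illustration): with N = 3 features, each example is one row of a table.

    % Hypothetical fixed-length feature vectors: N = 3 features (color, weight, shape).
    color  = categorical({'red'; 'blue'; 'red'});       % nominal feature
    weight = [50; 120; 75];                             % continuous feature
    shape  = categorical({'round'; 'square'; 'round'}); % nominal feature
    examples = table(color, weight, shape);             % one row = one example
    disp(examples(1, :))                                % the point <red, 50, round>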


Standard Feature Types for representing training examples – source of “domain knowledge”

[Figure: ISA hierarchy for shape – closed → {polygon, continuous}; polygon → {square, triangle}; continuous → {circle, ellipse}]

  • Nominal

    • No relationship among possible values

      e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)

  • Linear (or Ordered)

    • Possible values of the feature are totally ordered

      e.g., size ∈ {small, medium, large} ← discrete

      weight ∈ [0…500] ← continuous

  • Hierarchical

    • Possible values are partially ordered in an ISA hierarchy

      e.g., for shape (see the hierarchy figure above; an encoding sketch follows below)
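One way these types might be encoded in MATLAB (a sketch with invented values, not the homework’s required format): nominal values carry no order, while linear discrete ones are declared ordinal.

    % Nominal: no relationship among possible values.
    color = categorical({'red'; 'blue'; 'green'});
    % Linear (ordered) discrete: totally ordered values.
    levels = {'small', 'medium', 'large'};
    sz = categorical({'small'; 'large'; 'medium'}, levels, levels, 'Ordinal', true);
    disp(sz(1) < sz(2))            % logical 1: small < large
    % Linear continuous: weight in [0, 500].
    weight = [12.5; 430.0];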


Example Hierarchy (KDD* Journal, Vol. 5, No. 1-2, 2001, page 17)

[Figure: product ISA hierarchy – Product → 99 product classes (e.g., Pet Foods, Tea) → 2302 product subclasses (e.g., Dried Cat Food, Canned Cat Food) → ~30k individual products (e.g., Friskies Liver, 250g)]

  • Structure of one feature!

  • “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.”

    - From the editors’ introduction to the special issue (on applications) of the KDD journal, Vol. 5, 2001

* Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers


Our Feature Types (for homeworks)

  • Discrete

    • tokens (char strings, w/o quote marks and spaces)

  • Continuous

    • numbers (int’s or float’s)

      • If only a few possible values (e.g., 0 & 1) use discrete

    • i.e., merge nominal and discrete-ordered

      (or convert discrete-ordered into 1,2,…)

    • We will ignore hierarchy info and only use the leaf values (it is rare anyway); a conversion sketch follows below
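For instance, converting a discrete-ordered feature into 1, 2, … could look like this sketch (the feature and its levels are invented):

    % Convert a discrete-ordered feature into integer codes 1, 2, ...
    levels = {'small', 'medium', 'large'};            % the total order
    raw    = {'medium'; 'small'; 'large'; 'small'};   % observed token values
    [~, codes] = ismember(raw, levels);               % small->1, medium->2, large->3
    disp(codes')                                      % 2  1  3  1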


Today’s Topics

  • Creating a dataset of fixed-length feature vectors

  • HW0 out on-line

    • Due next Monday


Some Famous Examples

[Diagrams: digitized camera image → learned function → steering angle; medical record (age = 13, sex = M, wgt = 18) → learned function → ill vs. healthy]

  • Car Steering (Pomerleau)

  • Medical Diagnosis (Quinlan)

  • DNA Categorization

  • TV-pilot rating

  • Chemical-plant control

  • Backgammon playing

  • WWW page scoring

  • Credit application scoring



HW0: Creating your dataset

  • Choose a dataset

    • based on interest/familiarity

    • meets basic requirements

      • >1000 examples

      • category (function) learned should be binary valued

      • ~500 examples labeled class A, the other ~500 labeled class B

        → we will use the Internet Movie Database (IMDb)
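A quick sanity check of these requirements might look like the following sketch (the label vector is simulated, not real IMDb data):

    % Check HW0 dataset requirements on a hypothetical binary label vector.
    labels = randi([0 1], 1200, 1);           % simulated class A (1) / class B (0)
    assert(numel(labels) > 1000, 'need more than 1000 examples');
    nA = sum(labels == 1);  nB = sum(labels == 0);
    fprintf('class A: %d, class B: %d (want a roughly even split)\n', nA, nB);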


HW0: Creating your dataset

  • IMDb has a lot of data that is not discrete or continuous (as our features must be) or binary-valued (as our target function/category must be)

[Entity-relationship diagram:

  • Studio – Name, Country, List of movies; relationship: Made

  • Actor – Name, Year of birth, Gender, Oscar nominations, List of movies; relationship: Acted in

  • Director/Producer – Name, Year of birth, List of movies; relationships: Directed, Produced

  • Movie – Title, Genre, Year, Opening weekend box-office receipts, List of actors/actresses, Release season]


HW0: Creating your dataset

  • Choose a boolean or binary-valued target function (category)

    • Opening weekend box office receipts > $2 million (see the sketch after this list)

    • Movie is drama? (action, sci-fi,…)

    • Movies I like/dislike (e.g., TiVo)
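Each of these reduces to a boolean function of raw attributes; for the box-office target the sketch is a simple threshold (the receipts vector is invented):

    % Hypothetical boolean target: opening weekend box office > $2 million.
    opening_receipts = [3.1e6; 0.4e6; 2.5e6];   % per-movie receipts in USD
    label = opening_receipts > 2e6;             % logical target vector
    disp(label')                                % 1  0  1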


HW0: Creating your dataset

  • How to turn the available attributes into example attributes (select predictive features):

    • Movie

      • Average age of actors (see the sketch after this list)

      • Number of producers

      • Percent female actors

    • Studio

      • Number of movies made

      • Average movie gross

      • Percent movies released in US


HW0: Creating your dataset

  • Director/Producer

    • Years of experience

    • Most prevalent genre

    • Number of award winning movies

    • Average movie gross

  • Actor

    • Gender

    • Has previous Oscar award or nominations

    • Most prevalent genre


HW0: Creating your dataset

David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data:

  • Opening weekend box office receipts > $2 million

    • 25 attributes

    • Accuracy = 83.3%

    • Default accuracy = 56%

  • Movie is drama?

    • 12 attributes

    • Accuracy = 71.9%

    • Default accuracy = 51%

  • http://kdl.cs.umass.edu/proximity/about.html
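“Default accuracy” is what you get by always predicting the majority class; a quick check with a simulated 56%/44% label split:

    % Default accuracy = accuracy of always predicting the majority class.
    labels = [ones(560, 1); zeros(440, 1)];   % simulated 56%/44% class split
    majority = mode(labels);                  % the most common class
    default_acc = mean(labels == majority);   % 0.56
    fprintf('default accuracy: %.1f%%\n', 100 * default_acc);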


What Do You Think Machine Learning Means?


What is Learning?

“Learning denotes changes in the system that … enable the system to do the same task … more effectively the next time.”

  - Herbert Simon

“Learning is making useful changes in our minds.”

  - Marvin Minsky


Not in Mitchell’s textbook (we will spend 0-2 lectures on this; it is also covered in CS 776)

Major Paradigms of Machine Learning

  • Inducing Functions from I/O Pairs

    • Decision trees (e.g., Quinlan’s C4.5 [1993])

    • Connectionism / neural networks (e.g., backprop)

    • Nearest-neighbor methods

    • Genetic algorithms

    • SVMs

  • Learning without a Teacher

    • Conceptual clustering

    • Self-organizing systems

    • Discovery systems


Will be covered briefly

Major Paradigms of Machine Learning

  • Improving a Multi-Step Problem Solver

    • Explanation-based learning

    • Reinforcement learning

  • Using Preexisting Domain Knowledge Inductively

    • Analogical learning

    • Case-based reasoning

    • Inductive/explanatory hybrids

