
CS 657/790 Machine Learning and Data Mining Course Introduction

Student Survey

  • Please hand in sheet of paper with:

    • Your name and email address

    • Your classification (e.g., 2nd year computer science PhD student)

    • Your experience with MATLAB (none, some or much)

    • Your undergraduate degree (when, what, where)

    • Your AI experience (courses at UWM or elsewhere)

    • Your programming experience

Course Information

  • Course Instructor: Joe Bockhorst

    • email: joebock@uwm.edu

    • office: 1155 EMS

    • Course webpage: http://www.uwm.edu/~joebock/790.html

    • office hours: ???

      • Possible times:

        • before class on Monday (3:30-5:30)

        • Monday morning

        • Wednesday morning

        • after class Monday (7:00-9:00)

Textbook & Reading Assignment

  • Machine Learning (Tom Mitchell)

    • Bookstore in union, $140 new

    • Amazon.com hard cover: $125 new, $80 used

    • Amazon.com soft cover: < $30

  • Read (posted on class web page)

    • Preface

    • Chapter 1

    • Sections 6.1, 6.2, 6.9, 6.10

    • Sections 8.1, 8.2

PowerPoint vs. Whiteboard

  • PowerPoint encourages words over pictures (not good)

  • But PowerPoint can be saved, tweaked, easily shared, …

    • Notes posted on course website following lecture

  • Your thoughts?

Full Disclosure

  • Slides are a combination of

    • Jude Shavlik’s notes from the UW-Madison machine learning course (a professor I had)

    • Textbook Slides (Google “machine learning textbook”)

    • My notes

Class Email List

  • Is there one?

Course Outline

  • 1st half covers supervised learning

    • Algorithms: support vector machines, neural networks, probabilistic models …

    • Methodology

  • 2nd half covers graphical probability models

    • Powerful statistical models very useful for learning in complex and/or noisy settings

Course "Style"

  • Primarily algorithmic & experimental

  • Some theory, both mathematical & conceptual (much on statistics)

  • "Hands on" experience, interactive lectures/discussions

  • Broad survey of many ML subfields

    • "symbolic" (rules, decision trees)

    • "connectionist" (neural nets)

    • Support Vector Machines

    • statistical ("Bayes rule")

    • genetic algorithms (if time)

Two Major Goals

  • to understand what a learning system should do

  • to understand how (and how well) existing systems work

Background Assumed

  • Programming

    • Data structures and algorithms

      • CS 535

  • Math

    • Calculus (partial derivatives)

    • Simple probability & statistics

Programming Assignments in MATLAB

  • Why MATLAB?

    • Fast prototyping

    • Integrated plotting

    • Widely used in academia (industry too?)

    • Will save you time in the long run

  • Why not MATLAB?

    • Proprietary software

    • Harder to work from home

  • Optional Assignment: familiarize yourself with MATLAB, use MATLAB help system

Student Computer Labs

  • E256, E280, E285, E384, E270

  • All have MATLAB installed under Windows XP


  • Bi-weekly programming plus perhaps some “paper & pencil” homework

    • "hands on" experience valuable

    • HW0 – build a dataset

    • HW1 & HW2 supervised learning algorithms

    • HW3 & HW4 graphical probability models

  • Midterm exam (after about 8-10 weeks)

  • Final exam

  • Find project of your choosing

    • during last 4-5 weeks of class


HW's 25%

Project 20%

Midterm 20%

Final 30%

Quality Discussion 5%

Late HW's Policy

  • HW's due @ 4pm

  • you have 5 late days to use over the semester

    • (Fri 4pm → Mon 4pm is 1 late "day")

  • SAVE UP late days!

    • extensions only for extreme cases

  • Penalty points after late days exhausted

    • 10% per day

  • Can't be more than one week late

Machine Learning vs. Data Mining

  • Machine Learning: computer algorithms that improve automatically through experience [Mitchell].

  • Data Mining: Extracting knowledge from large amounts of data. [Han & Kamber] (synonym: knowledge discovery in databases (KDD))

What’s the difference? Topics in ML and DM texts (Mitchell vs. Han & Kamber)

  • Both texts: supervised learning, decision trees, neural nets, Bayesian networks, k-nearest neighbor, genetic algorithms, unsupervised learning (clustering in DM jargon), …

  • Mitchell only: reinforcement learning, learning theory, evaluating learning systems, using domain knowledge, inductive logic programming, …

  • Han & Kamber only: data warehouse, OLAP, query languages, association rules, presentation, …



We’ll try to cover topics in red

The learning problem

  • Learning = improving with experience

  • Example: learn to play checkers

  • Improve over task T,

  • with respect to performance measure P,

  • based on experience E

  • T: Play Checkers

  • P: % of games won

  • E: games played against self

Famous Example: Discovering Genes

  • T: find genes in DNA sequences


  • P: % of genes found

  • E: experimentally verified genes

* Prediction of Complete Gene Structures in Human Genomic DNA, Burge & Karlin, J. Molecular Biology, 1997, 268:78-94

Famous Example 2: Autonomous Vehicles Driving

  • T: drive vehicle

  • P: reach destination

  • E: machine observation of human driver

ML key to winning DARPA Grand Challenge

Stanford team won the 2005 driverless vehicle race across the Mojave Desert

“The robot's software system relied predominately on state-of-the-art AI technologies, such as machine learning and probabilistic


[Winning the DARPA Grand Challenge, Thrun et al., Journal of Field Robotics, 2006]

Why study machine learning (data mining)?

  • Data is plentiful

    • Retail, video, images, speech, text, DNA, bio-medical measurements, …

  • Computational power is available

  • Budding Industry

  • ML has great applications

  • ML still relatively immature

Next Time: HW0 – Create Your Own Dataset

  • Think about this

    • will need to create it by week after next

  • Google to find:

    • UCI archive (or UCI KDD archive)

    • UCI ML archive (UCI machine learning repository)

HW0 – Your “Personal Concept”

  • Step 1: Choose a Boolean (true/false) concept

    • Subjective Judgement

      • Books I like/dislike

      • Movies I like/dislike

      • Web pages I like/dislike

    • “Time will tell” concepts

      • Stocks to buy

      • Medical outcomes

    • Sensory interpretation

      • Face recognition (See text)

      • Handwritten digit recognition

      • Sound recognition

HW0 – Your “Personal Concept”

  • Step 2: Choosing a feature Space

    • We will use fixed-length feature vectors

      • Choose N features

      • Each feature i has V_i possible values

      • Each example is represented by a vector of N feature values

        (i.e., is a point in the feature space)

        e.g.: <red, 50, round> for the features <color, weight, shape>

    • Feature Types

      • Boolean

      • Nominal

      • Ordered

      • Hierarchical

  • Step 3: Collect examples (“I/O” pairs)

Defines a space

In HW0 we will use a subset (see next slide)
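The fixed-length representation from Step 2 can be sketched in Python as follows. This is only an illustration: the `Example` type, its field names, and the two sample rows are assumptions made up here, not part of HW0.

```python
from typing import NamedTuple


class Example(NamedTuple):
    """One training example: N = 3 feature values plus a Boolean label."""
    color: str     # nominal feature
    weight: float  # continuous (ordered) feature
    shape: str     # hierarchical feature, stored here as a leaf value
    label: bool    # output of the Boolean concept (the "O" of an I/O pair)


# Hypothetical I/O pairs; the values are invented for illustration.
examples = [
    Example("red", 50, "round", True),
    Example("blue", 120, "square", False),
]

# Every example has the same length, so each one is a point
# in the same N-dimensional feature space.
feature_names = Example._fields[:-1]
vectors = [tuple(ex[:-1]) for ex in examples]
```

Keeping the label as the last field makes it easy to split inputs from outputs with a slice.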









Standard Feature Types for representing training examples – source of “domain knowledge”

  • Nominal

    • No relationship among possible values

      e.g., color ∈ {red, blue, green} (vs. color = 1000 Hertz)

  • Linear (or Ordered)

    • Possible values of the feature are totally ordered

      e.g., size ∈ {small, medium, large} ← discrete

      weight ∈ [0…500] ← continuous

  • Hierarchical

    • Possible values are partially ordered in an ISA hierarchy

      e.g. for shape->

Example Hierarchy (KDD* Journal, Vol 5, No. 1-2, 2001, page 17)

[Figure: product hierarchy; surviving node labels are "99 Product", "2302 Product", "Cat Food", "Cat Food", "Liver, 250g"]

  • Structure of one feature!

  • “the need to be able to incorporate hierarchical (knowledge about data types) is shown in every paper.”

  • From the editors' intro to the special issue (on applications) of the KDD journal, Vol 5, 2001

* Officially, “Data Mining and Knowledge Discovery”, Kluwer Publishers
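As a rough illustration of the three standard feature types, the sketch below shows which comparisons each type supports. The shape hierarchy and the helper names `size_less_than` and `is_a` are hypothetical, chosen only for this example.

```python
# Nominal: values can only be tested for equality or set membership.
COLORS = {"red", "blue", "green"}

# Ordered (discrete): position in the list defines the total order.
SIZES = ["small", "medium", "large"]


def size_less_than(a: str, b: str) -> bool:
    """Compare two discrete-ordered values by their rank."""
    return SIZES.index(a) < SIZES.index(b)


# Hierarchical: child -> parent links in an ISA hierarchy.
# These shape names are made up; they are not from the slides.
ISA_PARENT = {
    "circle": "round",
    "ellipse": "round",
    "square": "polygon",
    "triangle": "polygon",
    "round": "shape",
    "polygon": "shape",
}


def is_a(value: str, ancestor: str) -> bool:
    """True if `value` equals `ancestor` or lies below it in the hierarchy."""
    while value != ancestor:
        if value not in ISA_PARENT:
            return False  # reached the root without meeting `ancestor`
        value = ISA_PARENT[value]
    return True
```

The hierarchy is only a partial order: "circle" and "square" are both shapes, but neither is an ancestor of the other.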

Our Feature Types (for homeworks)

  • Discrete

    • tokens (char strings, w/o quote marks and spaces)

  • Continuous

    • numbers (int’s or float’s)

      • If only a few possible values (e.g., 0 & 1) use discrete

    • i.e., merge nominal and discrete-ordered

      (or convert discrete-ordered into 1,2,…)

    • We will ignore hierarchy info and only use the leaf values (it is rare anyway)
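One possible way to apply these homework conventions is sketched below. The threshold `FEW` and both function names are assumptions introduced here; the slides do not say how many values count as "a few".

```python
FEW = 3  # assumed cutoff for "only a few possible values"


def encode_ordered(value, ordered_values):
    """Convert a discrete-ordered value into 1, 2, ... following its rank."""
    return ordered_values.index(value) + 1


def homework_type(column):
    """Label a column of raw values as 'discrete' or 'continuous'."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in column
    )
    if is_numeric and len(set(column)) > FEW:
        return "continuous"
    # Tokens, and numbers with only a few distinct values, are discrete.
    return "discrete"
```

For example, a 0/1 column is treated as discrete even though its values are numbers.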

Today's Topics

  • Creating a dataset of fixed-length feature vectors

  • HW0 out on-line

    • Due next Monday

Some Famous Examples

[Figure: example inputs, including a camera image and a feature vector (age = 13, sex = M, wgt = 18)]

  • Car Steering (Pomerleau)

  • Medical Diagnosis (Quinlan)

  • DNA Categorization

  • TV-pilot rating

  • Chemical-plant control

  • Backgammon playing

  • WWW page scoring

  • Credit application scoring



HW0: Creating your dataset

  • Choose a dataset

    • based on interest/familiarity

    • meets basic requirements

      • >1000 examples

      • category (function) learned should be binary valued

      • ~500 examples labeled class A,

        other 500 labeled class B

        → Internet Movie Database (IMD)

HW0: Creating your dataset

  • IMD has a lot of data that are not discrete or continuous or binary-valued for target function (category)

[Figure: IMD data schema. Surviving labels: "List of movies", "Year of birth", "Oscar nominations", "Acted in"; movie attributes: Title, Genre, Year, Opening Wkend BO receipts, List of actors/actresses, Release season]

HW0: Creating your dataset

  • Choose a boolean or binary-valued target function (category)

    • Opening weekend box office receipts > $2 million

    • Movie is drama? (action, sci-fi,…)

    • Movies I like/dislike (e.g. Tivo)
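A Boolean target like the ones above can be derived from raw attributes with a simple predicate. In the sketch below the records, field names, and receipts are invented for illustration; only the $2 million threshold comes from the slide.

```python
from collections import Counter

# Hypothetical records; titles, genres, and receipts are made up.
movies = [
    {"title": "Movie A", "genre": "drama", "opening_weekend_usd": 3_500_000},
    {"title": "Movie B", "genre": "action", "opening_weekend_usd": 900_000},
    {"title": "Movie C", "genre": "drama", "opening_weekend_usd": 150_000},
    {"title": "Movie D", "genre": "sci-fi", "opening_weekend_usd": 12_000_000},
]

THRESHOLD_USD = 2_000_000  # threshold taken from the slide


def big_opening(movie):
    """Target 1: opening weekend box office receipts > $2 million."""
    return movie["opening_weekend_usd"] > THRESHOLD_USD


def is_drama(movie):
    """Target 2: movie is a drama?"""
    return movie["genre"] == "drama"


labels = [big_opening(m) for m in movies]

# Counting the labels shows how close the classes are to balanced.
class_balance = Counter(labels)
```

Checking `class_balance` early helps confirm that the chosen target gives roughly equal numbers of examples in each class.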

HW0: Creating your dataset

  • How to transfer available attributes:

    Other example attributes (select predictive features)

    • Movie

      • Average age of actors

      • Number of producers

      • Percent female actors

    • Studio

      • Number of movies made

      • Average movie gross

      • Percent movies released in US

HW0: Creating your dataset

  • Director/Producer

    • Years of experience

    • Most prevalent genre

    • Number of award winning movies

    • Average movie gross

  • Actor

    • Gender

    • Has previous Oscar award or nominations

    • Most prevalent genre
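Attributes such as "average age of actors" or "percent female actors" turn a variable-length cast list into fixed-length feature values. The sketch below assumes a hypothetical record layout (`birth_year`, `gender`) and a made-up cast.

```python
def movie_features(cast, release_year):
    """Aggregate a list of cast records into two numeric features."""
    ages = [release_year - person["birth_year"] for person in cast]
    n_female = sum(person["gender"] == "F" for person in cast)
    return {
        "avg_actor_age": sum(ages) / len(ages),
        "pct_female_actors": 100.0 * n_female / len(cast),
    }


# Hypothetical cast; names are omitted and the values are invented.
cast = [
    {"birth_year": 1960, "gender": "F"},
    {"birth_year": 1970, "gender": "M"},
    {"birth_year": 1980, "gender": "F"},
    {"birth_year": 1990, "gender": "M"},
]

features = movie_features(cast, release_year=2000)
```

However many actors a movie has, the result is always the same two numbers, which is what a fixed-length feature vector requires.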

HW0: Creating your dataset

David Jensen’s group at UMass used Naïve Bayes (NB) to predict the following based on attributes they selected and a novel way of sampling from the data:

  • Opening weekend box office receipts > $2 million

    • 25 attributes

    • Accuracy = 83.3%

    • Default accuracy = 56%

  • Movie is drama?

    • 12 attributes

    • Accuracy = 71.9%

    • Default accuracy = 51%

  • http://kdl.cs.umass.edu/proximity/about.html
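To make the comparison with "default accuracy" concrete, here is a minimal sketch. It assumes that default accuracy means the accuracy of always predicting the most frequent class, and it uses a plain Naïve Bayes over discrete features with add-one smoothing. It is not the UMass group's method (which also relied on their own sampling scheme), and the tiny dataset is invented.

```python
import math
from collections import Counter, defaultdict


def default_accuracy(labels):
    """Accuracy of always guessing the most frequent class (assumed meaning)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def train_nb(X, y):
    """Count classes and per-class feature values for discrete features."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    seen_values = defaultdict(set)       # feature index -> values seen in training
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            value_counts[(i, c)][v] += 1
            seen_values[i].add(v)
    return class_counts, value_counts, seen_values


def predict_nb(model, xs):
    """Return the class with the highest log-posterior score."""
    class_counts, value_counts, seen_values = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for i, v in enumerate(xs):
            # Add-one (Laplace) smoothing avoids zero probabilities.
            p = (value_counts[(i, c)][v] + 1) / (n_c + len(seen_values[i]))
            score += math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Invented data: (genre, release season) -> "big opening weekend?"
X = [("drama", "summer"), ("drama", "winter"), ("action", "summer"),
     ("action", "summer"), ("drama", "winter")]
y = [False, False, True, True, False]

model = train_nb(X, y)
baseline = default_accuracy(y)
```

A learned model is only interesting if its accuracy on held-out examples beats `baseline`, which is the kind of comparison the accuracy figures above invite.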

What Do You Think Machine Learning Means?

What is Learning?

Learning denotes changes in the system that … enable the system to do the same task … more effectively the next time.

- Herbert Simon

Learning is making useful changes in our minds.

- Marvin Minsky


Not in Mitchell’s textbook (will spend 0-2 lectures on this – but also in CS776)

Major Paradigms of Machine Learning

  • Inducing Functions from I/O Pairs

    • Decision trees (e.g., Quinlan’s C4.5 [1993])

    • Connectionism / neural networks (e.g., backprop)

    • Nearest-neighbor methods

    • Genetic algorithms

    • SVM’s

  • Learning without a Teacher

    • Conceptual clustering

    • Self-organizing systems

    • Discovery systems


(Will be covered briefly – but also in CS776)

Major Paradigms of Machine Learning

  • Improving a Multi-Step Problem Solver

    • Explanation-based learning

    • Reinforcement learning

  • Using Preexisting Domain Knowledge Inductively

    • Analogical learning

    • Case-based reasoning

    • Inductive/explanatory hybrids