Machine Learning

Usman Roshan

Dept. of Computer Science

NJIT

- “Machine learning is programming computers to optimize a performance criterion using example data or past experience.” (Introduction to Machine Learning, Alpaydin, 2010)
- Examples:
- Facial recognition
- Digit recognition
- Molecular classification

- 1946: ENIAC, often described as the first general-purpose electronic computer, performs numerical computations
- 1950: Alan Turing proposes the Turing test: can machines think?
- 1952: Arthur Samuel at IBM writes the first game-playing program for checkers. Knowledge-based systems such as ELIZA and MYCIN follow in later decades.
- 1957: Frank Rosenblatt develops the perceptron. Perceptrons can be combined to form a neural network.
- Early 1990s: Statistical learning theory emphasizes learning from data instead of rule-based inference.
- Current status: Widely used in industry. Various approaches are combined, but data-driven methods are prevalent.

- Problem: Recognize images representing digits 0 through 9
- Input: High dimensional vectors representing images
- Output: 0 through 9 indicating the digit the image represents
- Learning: Build a model from “training data”
- Prediction: Use the model to predict the digits in unseen “test data”
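The train-then-predict workflow above can be sketched in a few lines of Python. This is only an illustration: the 3x3 "images" are made-up toy data, only the digits 0 and 1 appear, and the 1-nearest-neighbor rule is an assumed stand-in for whatever model is actually built.

```python
# Toy 3x3 "images" flattened to 9-dimensional vectors (hypothetical data).
# 1 = ink, 0 = background. Real digit images have far more pixels.
train_vectors = [
    [0, 1, 0, 0, 1, 0, 0, 1, 0],  # vertical stroke
    [1, 1, 1, 1, 0, 1, 1, 1, 1],  # closed loop
    [0, 1, 0, 1, 1, 0, 0, 1, 0],  # stroke with a flag
    [1, 1, 1, 1, 0, 1, 1, 1, 0],  # loop with one missing pixel
]
train_labels = [1, 0, 1, 0]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(vector, vectors, labels):
    """Return the label of the closest training vector (1-nearest neighbor)."""
    best = min(range(len(vectors)),
               key=lambda i: squared_distance(vector, vectors[i]))
    return labels[best]


# Test data: noisy versions of the shapes, never seen during training.
test_vectors = [
    [0, 1, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 0, 1, 1],
]
test_labels = [1, 0]

predictions = [predict(v, train_vectors, train_labels) for v in test_vectors]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Here the "model" is simply the stored training set. Other classifiers replace `predict` with something learned from the training vectors, but the split between training data and test data stays the same.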

- We assume that the data is represented by a set of vectors each of fixed dimensionality.
- Vector: an ordered list of numbers
- We may refer to each vector as a datapoint and each dimension as a feature
- Example:
- A bank wishes to classify loan applicants as risky or safe
- Each applicant is a datapoint and is represented by a vector
- Features may be age, income, mortgage/rent, education, family, current loans, and so on
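As a rough illustration of the bank example, the sketch below turns each applicant into a fixed-length vector. The feature names, their order, their numeric encoding, and the applicant values are all assumptions made for this example.

```python
# Hypothetical feature order: every applicant is mapped to the same six
# dimensions so that all vectors have the same fixed dimensionality.
FEATURES = [
    "age",
    "income",
    "monthly_housing_cost",   # mortgage or rent
    "years_of_education",
    "family_size",
    "current_loans",
]


def to_vector(applicant):
    """Map a dict describing one applicant to a fixed-length list of floats."""
    return [float(applicant[name]) for name in FEATURES]


# Made-up applicants, for illustration only.
applicants = [
    {"age": 35, "income": 72000, "monthly_housing_cost": 1500,
     "years_of_education": 16, "family_size": 3, "current_loans": 1},
    {"age": 22, "income": 18000, "monthly_housing_cost": 900,
     "years_of_education": 12, "family_size": 1, "current_loans": 2},
]

data = [to_vector(a) for a in applicants]  # one datapoint per applicant
print(data)
```

Each row of `data` is a datapoint and each column is a feature. Non-numeric features such as education level need some encoding before they fit this representation; years of education is just one possible choice.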

- Data
- NIPS 2003 feature selection contest
- mldata.org
- UCI machine learning repository

- Contests
- Kaggle

- Software
- Python scikit-learn
- R
- Your own code

- Not required but highly recommended for beginners
- Introduction to Machine Learning by Ethem Alpaydin (2nd edition, 2010, MIT Press). Written by a computer scientist; the material is accessible with a basic background in probability and linear algebra.
- Applied Predictive Modeling by Kuhn and Johnson (2013, Springer). A more recent book that focuses on practical modeling.

- Combination of various methods
- Parameter tuning
- Trade-off between error and model complexity

- Data pre-processing
- Normalization
- Standardization
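The two pre-processing steps can be sketched as follows. Terminology varies between authors, so this sketch assumes that normalization means min-max scaling of a feature to the range [0, 1] and that standardization means rescaling a feature to zero mean and unit variance. The income values are made up.

```python
import math


def min_max_normalize(column):
    """Rescale a feature column linearly so its values lie in [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant feature: avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def standardize(column):
    """Rescale a feature column to zero mean and unit (population) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:                      # constant feature: avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


incomes = [20000.0, 40000.0, 60000.0, 80000.0]  # hypothetical feature values
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
print(normalized)
print(standardized)
```

Both functions work on one feature (one column) at a time. In practice the minimum, maximum, mean, and standard deviation are usually computed on the training data only and then reused to transform the test data.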

- Feature selection
- Discarding noisy features
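One simple way to discard noisy features is to score each feature against the labels and keep only the top-scoring ones. The sketch below assumes the absolute Pearson correlation as the score, which is just one of many possible filter criteria; the function names and the data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:            # a constant column carries no signal
        return 0.0
    return cov / (sx * sy)


def select_features(vectors, labels, keep):
    """Return the indices of the `keep` features most correlated with the labels."""
    scores = []
    for j in range(len(vectors[0])):
        column = [v[j] for v in vectors]
        scores.append((abs(pearson(column, labels)), j))
    top = sorted(scores, reverse=True)[:keep]
    return sorted(j for _, j in top)


# Made-up data: feature 0 tracks the label, feature 1 is constant,
# and feature 2 is mostly noise.
vectors = [[1.0, 5.0, 0.3], [2.0, 5.0, 0.9], [8.0, 5.0, 0.1], [9.0, 5.0, 0.8]]
labels = [0, 0, 1, 1]
selected = select_features(vectors, labels, keep=1)
print(selected)
```

A filter like this scores every feature independently, so it can miss features that are only useful in combination with others.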

- Basic linear algebra and probability
- Vectors
- Dot products
- Eigenvectors and eigenvalues
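For readers who want to refresh these concepts, here is a small pure-Python sketch. The dot product of u and v is the sum of the products of their components, and a nonzero vector v is an eigenvector of a matrix A with eigenvalue λ when Av = λv. Power iteration is used here only as one simple way to approximate the dominant eigenpair; the matrix and the starting vector are assumptions for the example.

```python
import math


def dot(u, v):
    """Dot product: sum of component-wise products."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(matrix, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in matrix]


def power_iteration(matrix, iterations=50):
    """Approximate the eigenvalue of largest magnitude and its eigenvector.

    Assumes the starting vector [1, 0, ..., 0] is not orthogonal to the
    dominant eigenvector and that the dominant eigenvalue is unique.
    """
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]           # keep the vector at unit length
    eigenvalue = dot(v, mat_vec(matrix, v))  # Rayleigh quotient for unit v
    return eigenvalue, v


A = [[2.0, 1.0],
     [1.0, 2.0]]                            # symmetric, eigenvalues 3 and 1
eigenvalue, eigenvector = power_iteration(A)
print(dot([1, 2, 3], [4, 5, 6]))
print(eigenvalue, eigenvector)
```

For this matrix the iteration settles on the direction [1, 1] (scaled to unit length), and multiplying `A` by that vector stretches it by the eigenvalue without changing its direction.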

- See the appendix of the textbook for probability background
- Mean
- Variance
- Gaussian/Normal distribution
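The three quantities above can be computed directly from their definitions. This is a minimal sketch that assumes the population variance (dividing by n rather than n - 1) and a one-dimensional Gaussian; the sample values are made up.

```python
import math


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: average squared deviation from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def gaussian_pdf(x, mu, sigma2):
    """Density of the normal distribution with mean mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample
mu = mean(values)
sigma2 = variance(values)
density_at_mean = gaussian_pdf(mu, mu, sigma2)
print(mu, sigma2, density_at_mean)
```

The density is highest at the mean and falls off symmetrically on both sides, at a rate controlled by the variance.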

- Implementation of basic classification algorithms with Perl and Python
- Nearest Means
- Naïve Bayes
- k-nearest neighbor
- Cross validation scripts
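To give a feel for what such an implementation might look like, here is a minimal sketch of a nearest means classifier together with a k-fold cross-validation loop. The function names, the way folds are formed, and the 2D toy data are all assumptions for this example and are not taken from the course materials.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_means(vectors, labels):
    """Compute the mean vector of each class."""
    sums, counts = {}, {}
    for v, y in zip(vectors, labels):
        if y not in sums:
            sums[y] = [0.0] * len(v)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], v)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict_nearest_means(means, v):
    """Assign v to the class whose mean vector is closest."""
    return min(means, key=lambda y: squared_distance(means[y], v))


def cross_validate(vectors, labels, k):
    """Return the average accuracy over k folds (every k-th point per fold)."""
    n = len(vectors)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [vectors[i] for i in range(n) if i not in test_idx]
        train_y = [labels[i] for i in range(n) if i not in test_idx]
        means = train_nearest_means(train_x, train_y)
        correct += sum(predict_nearest_means(means, vectors[i]) == labels[i]
                       for i in test_idx)
    return correct / n


# Made-up 2D data: two well-separated clusters.
vectors = [[1.0, 1.0], [8.0, 8.0], [1.5, 2.0], [9.0, 8.5],
           [2.0, 1.0], [8.5, 9.5], [1.0, 2.5], [9.5, 9.0]]
labels = ["safe", "risky", "safe", "risky", "safe", "risky", "safe", "risky"]
accuracy = cross_validate(vectors, labels, k=4)
print(accuracy)
```

Each point is used for testing exactly once and for training in every other fold, so the reported accuracy is measured only on data the model did not see. Real cross-validation scripts usually shuffle the data first; that step is left out here to keep the output deterministic.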

- Experiment with various algorithms on assigned datasets

- Some ideas:
- Experiment with Kaggle and NIPS 2003 feature selection datasets
- Experimental performance study of various machine learning techniques on a given dataset. For example, compare feature selection methods using a fixed classifier.

- One midterm exam
- Final exam
- What to expect on the exams:
- Basic conceptual understanding of machine learning techniques
- Be able to apply techniques to simple datasets
- Basic runtime and memory requirements
- Simple modifications

- Assignments and project worth 50%
- Exams worth 50%