1 / 15

# slides Lec 1 - PowerPoint PPT Presentation

Machine Learning ICS 178. Instructor: Max Welling . What is Expected?. Class Homework/Projects (40%) Midterm (20%) Final (40%). For the projects, students should make teams. This class needs your active participation: please ask questions and

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'slides Lec 1' - LeeJohn

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Machine LearningICS 178

Instructor: Max Welling

• Class

• Homework/Projects (40%)

• Midterm (20%)

• Final (40%)

• For the projects, students should make teams.

• participate in discussions (there is no such thing as a dumb question).

• 1: Introduction: overview, examples, goals, probability, conditional independence, matrices, eigenvalue decompositions

• 2: Optimization and Data Visualization: Stochastic gradient descent, coordinate descent, centering, sphering, histograms, scatter-plots.

• 3: Classification I: emprirical Risk Minimization, k-nearest neighbors, decision stumps, decision tree.

• 4: Classification II: random forests, boosting.

• 5: Neural networks: perceptron, logistic regression, multi-layer networks, back-propagation.

• 6: Regression: Least squares regression.

• 7: Clustering: k-means, single linkage, agglomorative clustering, MDL penalty.

• 8: Dimesionality reduction: principal components analysis, Fisher linear discriminant analysis.

• 9: Reinforcement learning: MDPs, TD- and Q-learning, value iteration.

• 10: Bayesian methods: Bayes rule, generative models, naive Bayes classifier.

Machine Learningaccording to

• The ability of a machine to improve its performance based on previous results.

• The process by which computer systems can be directed to improve their

• performance over time. Examples are neural networks and genetic algorithms.

• Subspecialty of artificial intelligence concerned with developing methods for software

• to learn from experience or extract knowledge from examples in a database.

• The ability of a program to learn from experience —

• that is, to modify its execution on the basis of newly acquired information.

• Machine learning is an area of artificial intelligence concerned with the

• development of techniques which allow computers to "learn".

• More specifically, machine learning is a method for creating computer

• programs by the analysis of data sets. Machine learning overlaps heavily

• with statistics, since both fields study the analysis of data, but unlike statistics,

• machine learning is concerned with the algorithmic complexity of computational implementations. ...

• ZIP code recognition

• Loan application classification

• Signature recognition

• Voice recognition over phone

• Credit card fraud detection

• Spam filter

• Suggesting other products at Amazone.com

• Marketing

• Stock market prediction

• Expert level chess and checkers systems

• biometric identification (fingerprints, DNA, iris scan, face)

• machine translation

• web-search

• document & information retrieval

• camera surveillance

• robosoccer

• and so on and so on...

• Chess Playing is a classic AI problem

• well-defined problem

• very complex: difficult for humans to play well

• Conclusion: YES: today’s computers can beat even the best human

Garry Kasparov (current World Champion)

Deep Blue

Deep Thought

Points Ratings

The Grand Challenge is an off-road robot competition devised by DARPA (Defense Advanced Research Projects Agency) to promote research in the area of autonomous vehicles. The challenge consists of building a robot capable of navigating 175 miles throughdesert terrain in less than 10 hours, with no human intervention.

http://www.grandchallenge.org/

http://www.darpa.mil/grandchallenge/overview.asp

• Netflix awards \$1M for the person who improves their system by 10%.

• The relevant machine learning problem goes under then name:

• “user recommendation system” or “collaborative filtering”.

• When you shop online at Amazon.com they recommend books based on

• what links you are clicking.

• For netflix the relevant problem is predicting movie-rating values for users.

total of +/- 400,000,000 nonzero entries

(99% sparse)

movies (+/- 17,770)

users (+/- 240,000)

http://www.netflixprize.com/community/viewtopic.php?id=103

Netflix Challenge

# movies

# movies with that mean

# ratings

mean movie rating value

# users with that mean

# users

# ratings

mean user rating value

• The user-movie matrix has many missing entries: Joe did not happen to rate “ET”.

• Netflix wants to recommend unseen movies to users based on movies he/she

• has seen (and rated!) in the past.

• To recommend movies we are being asked to fill in the missing entries for Joe

• with predicted ratings and pick the movies with the highest predicted ratings.

• Where does the information come from?

• Say we want to predict the rating for Joe and ET.

• I: Mary has rated all movies that Joe has seen in the past very similarly.

• She has also seen ET and rated it with a 5. What would you predict for Joe?

• II: StarTrek that has obtained very similar ratings as ET from all users.

• StarTrek was rated 4 by Joe. What would you predict for ET?

• You will team up with 1 or more partners and implement algorithms that we

• discuss in class on the netflix problem.

• Our goal is to get high up on the leaderboard

• This involves both trying out various learning techniques (machine learning)

• as well as dealing with the large size of the data (data mining).

• Towards the end we will combine all our algorithms to get a final score.

• Every class (starting next week) we will have a presentation by 1 team to report

• on their progress and to share experience.

• Text corpora are widely available in digital form these days (scanned journals,

• scanned newspapers, blogs,...).

• We can mine this text and discover interesting patterns: what topics are present

• in this article, what is the most similar/relevant article/webpage in the corpus.

• Here the data has a very similar format:

99% sparse

word-tokens (+/- 20,000)

documents (up to 1000,000)

• Each document is represented as a count vector for each of the words in the

• vocabulary: [20,5,3,0,1,0,2,0,0,0,5,0,...].

• So, in the article the word “president” appeared 5 times (can you guess a topic?).

• Now, we don’t want to fill in missing entries (sparse means “0”, not missing).

• Our task is to find for instance which documents are most similar

• (document retrieval).

• Many more data matrices have the same format: for instance gene-expression

• data is a matrix of genes vs. experiments where the values represent the

• “activity level” of the gene in that experiment. Can we identify diseases?

“president”

“the”

• Modern technologies generate data at an unprecedented scale.

• The amount of data doubles every year.

• “One petabyte is equivalent to the text in one billion books,

• yet many scientific instruments, including the Large Synoptic Survey Telescope,

• will soon be generating several petabytes annually”.

• (2020 Computing: Science in an exponential world:Nature Published online: 22 March 2006)

• Computers dominate our daily lives

• Science, industry, army, our social interactions etc.

• We can no longer “eyeball” the images captured by some satellite

• for interesting events, or check every webpage for some topic.

• We need to trust computers to do the work for us.