Machine Learning ICS 178

1 / 15

# Machine Learning ICS 178 - PowerPoint PPT Presentation

Machine Learning ICS 178. Instructor: Max Welling. What is Expected?. Class Homework/Projects (40%) Midterm (20%) Final (40%). For the projects, students should make teams. This class needs your active participation: please ask questions and

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' Machine Learning ICS 178' - arista

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Machine LearningICS 178

Instructor: Max Welling

What is Expected?
• Class
• Homework/Projects (40%)
• Midterm (20%)
• Final (40%)
• For the projects, students should make teams.
• participate in discussions (there is no such thing as a dumb question).
Syllabus
• 1: Introduction: overview, examples, goals, probability, conditional independence, matrices, eigenvalue decompositions
• 2: Optimization and Data Visualization: Stochastic gradient descent, coordinate descent, centering, sphering, histograms, scatter-plots.
• 3: Classification I: emprirical Risk Minimization, k-nearest neighbors, decision stumps, decision tree.
• 4: Classification II: random forests, boosting.
• 5: Neural networks: perceptron, logistic regression, multi-layer networks, back-propagation.
• 6: Regression: Least squares regression.
• 7: Clustering: k-means, single linkage, agglomorative clustering, MDL penalty.
• 8: Dimesionality reduction: principal components analysis, Fisher linear discriminant analysis.
• 9: Reinforcement learning: MDPs, TD- and Q-learning, value iteration.
• 10: Bayesian methods: Bayes rule, generative models, naive Bayes classifier.
Machine Learningaccording to
• The ability of a machine to improve its performance based on previous results.
• The process by which computer systems can be directed to improve their
• performance over time. Examples are neural networks and genetic algorithms.
• Subspecialty of artificial intelligence concerned with developing methods for software
• to learn from experience or extract knowledge from examples in a database.
• The ability of a program to learn from experience —
• that is, to modify its execution on the basis of newly acquired information.
• Machine learning is an area of artificial intelligence concerned with the
• development of techniques which allow computers to "learn".
• More specifically, machine learning is a method for creating computer
• programs by the analysis of data sets. Machine learning overlaps heavily
• with statistics, since both fields study the analysis of data, but unlike statistics,
• machine learning is concerned with the algorithmic complexity of computational implementations. ...
Some Examples
• ZIP code recognition
• Loan application classification
• Signature recognition
• Voice recognition over phone
• Credit card fraud detection
• Spam filter
• Suggesting other products at Amazone.com
• Marketing
• Stock market prediction
• Expert level chess and checkers systems
• biometric identification (fingerprints, DNA, iris scan, face)
• machine translation
• web-search
• document & information retrieval
• camera surveillance
• robosoccer
• and so on and so on...
Can Computers play Humans at Chess?
• Chess Playing is a classic AI problem
• well-defined problem
• very complex: difficult for humans to play well
• Conclusion: YES: today’s computers can beat even the best human

Garry Kasparov (current World Champion)

Deep Blue

Deep Thought

Points Ratings

2005 DARPA Grand Challenge

The Grand Challenge is an off-road robot competition devised by DARPA (Defense Advanced Research Projects Agency) to promote research in the area of autonomous vehicles. The challenge consists of building a robot capable of navigating 175 miles throughdesert terrain in less than 10 hours, with no human intervention.

http://www.grandchallenge.org/

2007 Darpa Challenge

http://www.darpa.mil/grandchallenge/overview.asp

Netflix Challenge

• Netflix awards \$1M for the person who improves their system by 10%.
• The relevant machine learning problem goes under then name:
• “user recommendation system” or “collaborative filtering”.
• When you shop online at Amazon.com they recommend books based on
• what links you are clicking.
• For netflix the relevant problem is predicting movie-rating values for users.

total of +/- 400,000,000 nonzero entries

(99% sparse)

movies (+/- 17,770)

users (+/- 240,000)

source:

http://www.netflixprize.com/community/viewtopic.php?id=103

Netflix Challenge

# movies

# movies with that mean

# ratings

mean movie rating value

# users with that mean

# users

# ratings

mean user rating value

• The user-movie matrix has many missing entries: Joe did not happen to rate “ET”.
• Netflix wants to recommend unseen movies to users based on movies he/she
• has seen (and rated!) in the past.
• To recommend movies we are being asked to fill in the missing entries for Joe
• with predicted ratings and pick the movies with the highest predicted ratings.
• Where does the information come from?
• Say we want to predict the rating for Joe and ET.
• I: Mary has rated all movies that Joe has seen in the past very similarly.
• She has also seen ET and rated it with a 5. What would you predict for Joe?
• II: StarTrek that has obtained very similar ratings as ET from all users.
• StarTrek was rated 4 by Joe. What would you predict for ET?
• You will team up with 1 or more partners and implement algorithms that we
• discuss in class on the netflix problem.
• Our goal is to get high up on the leaderboard
• This involves both trying out various learning techniques (machine learning)
• as well as dealing with the large size of the data (data mining).
• Towards the end we will combine all our algorithms to get a final score.
• Every class (starting next week) we will have a presentation by 1 team to report
• on their progress and to share experience.
Text Data
• Text corpora are widely available in digital form these days (scanned journals,
• scanned newspapers, blogs,...).
• We can mine this text and discover interesting patterns: what topics are present
• in this article, what is the most similar/relevant article/webpage in the corpus.
• Here the data has a very similar format:

99% sparse

word-tokens (+/- 20,000)

documents (up to 1000,000)

Text Data
• Each document is represented as a count vector for each of the words in the
• vocabulary: [20,5,3,0,1,0,2,0,0,0,5,0,...].
• So, in the article the word “president” appeared 5 times (can you guess a topic?).
• Now, we don’t want to fill in missing entries (sparse means “0”, not missing).
• Our task is to find for instance which documents are most similar
• (document retrieval).
• Many more data matrices have the same format: for instance gene-expression
• data is a matrix of genes vs. experiments where the values represent the
• “activity level” of the gene in that experiment. Can we identify diseases?

“president”

“the”

Why is this cool/important?
• Modern technologies generate data at an unprecedented scale.
• The amount of data doubles every year.
• “One petabyte is equivalent to the text in one billion books,
• yet many scientific instruments, including the Large Synoptic Survey Telescope,
• will soon be generating several petabytes annually”.
• (2020 Computing: Science in an exponential world:Nature Published online: 22 March 2006)
• Computers dominate our daily lives
• Science, industry, army, our social interactions etc.
• We can no longer “eyeball” the images captured by some satellite
• for interesting events, or check every webpage for some topic.
• We need to trust computers to do the work for us.