
Machine Learning meets the Real World: Successes and new research directions


Presentation Transcript


  1. Machine Learning meets the Real World: Successes and new research directions Andrea Pohoreckyj Danyluk Department of Computer Science Williams College, Williamstown, MA October 11, 2002

  2. Data, data everywhere... • Scientific: data collection routinely produces gigabytes of data per day • Telecommunications: AT&T produces 275 million call records per day • Web: Google handles 70 million searches per day • Retail: WalMart records 20 million sales transactions per day

  3. A wealth of information • Scientific data • Detection of oil spills from satellite images • Prediction of molecular bioactivity for drug design • Telecommunications • Fraud detection to distinguish between “bad” and normal usage of cell phones

  4. A wealth of information • Web mining • Characterize killer pages • Retail • Determine better product placement • Direct mail • Predict who is most likely to donate to a charity

  5. Machine learning success (Machine learning is ubiquitous) • Scientific discovery • Detection of oil spills from satellite images • Telecommunications • Diagnosis of problems in the local loop • Printing • Determine causes of banding (printing cylinder problems) • Control • Self-steering vehicles

  6. Why research in machine learning is so good today Research in machine learning benefits from • Abundant data • Interest in fielding new applications • Even more data • Push on limits of our understanding, technology, etc.

  7. Plan for this talk Original • Discuss success stories and failures • Failures help identify new areas of research New plan • One success story in detail • Lesson learned: can identify new areas of research even when we succeed

  8. Induction of decision trees • Not the only (or even the most “hot”) algorithms • Have been used in many contexts • Important for understanding our success story: local-loop network diagnosis

  9. Inductive learning Given a collection of observations of the form (x, f(x)), find g(x) that approximates f(x)

  10. Sample data

  11. Predictive model, i.e., g(x)

  12. Learning objectives • Learn a tree that is correct • Learn a tree that is compact • At every level in the tree, select a test that best differentiates examples of one class from another

  13. TDIDT • If all examples are from the same class • The tree is a leaf with that class name • Else • Pick a test to make • Construct one edge for each possible test outcome • Partition the examples by test outcome • Build subtrees recursively
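
The recursion on this slide translates into a few lines of code. Below is a minimal Python sketch, not the talk's implementation: the dict-based example representation and the choose_test hook are assumptions, with slides 15-19 supplying one concrete choose_test.

```python
from collections import Counter

def tdidt(examples, attributes, choose_test):
    """Top-down induction of a decision tree, as outlined on slide 13.

    Each example is assumed to be a dict of attribute -> value, plus a
    "class" key holding its label; choose_test picks the next attribute.
    """
    classes = [e["class"] for e in examples]
    if len(set(classes)) == 1:            # all examples same class -> leaf
        return classes[0]
    if not attributes:                    # no tests left -> majority-class leaf
        return Counter(classes).most_common(1)[0][0]
    attr = choose_test(examples, attributes)        # pick a test to make
    remaining = [a for a in attributes if a != attr]
    tree = {"test": attr, "branches": {}}
    for value in {e[attr] for e in examples}:       # one edge per outcome
        subset = [e for e in examples if e[attr] == value]  # partition
        tree["branches"][value] = tdidt(subset, remaining, choose_test)
    return tree
```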

  14. Which is better?

  15. The Gain Criterion • Measure the information of the collection • Measure the information of each possible split • Choose the split with greatest information gain

  16. Information (Entropy) • Let T be a set of examples • Let C1, C2, …, Cn be class labels • freq(Ci,T) = number of examples in T that belong to class Ci • |T| = number of examples in T • Select an example and announce its class: info = -log2( freq(Ci,T) / |T| )

  17. Information (Entropy) • Let T be a set of examples • Info(T) = - Σi ( freq(Ci,T) / |T| ) * log2( freq(Ci,T) / |T| ), summed over the n classes
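
As a sketch, the Info(T) formula translates directly into Python, reusing the dict-based example representation assumed in the TDIDT sketch above:

```python
import math
from collections import Counter

def info(examples):
    """Info(T): entropy of the class distribution of T, in bits."""
    total = len(examples)
    counts = Counter(e["class"] for e in examples)
    # -sum over classes of (freq/|T|) * log2(freq/|T|)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```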

  18. Entropy after a split • Let X be an attribute with n possible values • Let Tj be the examples that have the value j for attribute X • Average entropy that results from making a split on X: infoX(T) = Σj ( |Tj| / |T| ) * info(Tj), summed over the n possible values of X

  19. Information Gain • Compute infoX(T) for every attribute • Select attribute that maximizes info(T) – infoX(T)
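
Continuing the same sketch, infoX(T) and the gain criterion give a concrete choose_test for the TDIDT code above. The toy data at the end uses invented attribute names, purely for illustration:

```python
def info_x(examples, attr):
    """info_X(T): weighted average entropy after splitting T on attr."""
    total = len(examples)
    avg = 0.0
    for value in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == value]
        avg += (len(subset) / total) * info(subset)
    return avg

def gain(examples, attr):
    """Information gain of splitting T on attr: info(T) - info_X(T)."""
    return info(examples) - info_x(examples, attr)

def choose_test(examples, attributes):
    """Gain criterion: select the attribute with the greatest gain."""
    return max(attributes, key=lambda a: gain(examples, a))

# Toy usage (invented attribute names); branch order may vary:
toy = [
    {"line_test": "fail", "hour": "day",   "class": "dispatch"},
    {"line_test": "fail", "hour": "night", "class": "dispatch"},
    {"line_test": "pass", "hour": "day",   "class": "dont_dispatch"},
    {"line_test": "pass", "hour": "night", "class": "dont_dispatch"},
]
print(tdidt(toy, ["line_test", "hour"], choose_test))
# {'test': 'line_test', 'branches': {'fail': 'dispatch', 'pass': 'dont_dispatch'}}
```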

  20. Which is better?

  21. Scrubber (the success story) • Diagnoses problems in the local loop • Problem may be due to trouble in: • Customer premise equipment • Facilities connecting customer to cable • Cable • Central office • Millions of “troubles” reported annually

  22. MAX, 1990 • Acts as Maintenance Administrator (MA) • Sequence of actions: • Customer calls • Rep takes information; initiates tests • Trouble report sent to MA • MA puts trouble in dispatch queue for specific type of technician

  23. Scrubber 2 • Performed a task at a later point in the pipeline • Surveys dispatch queues to determine whether dispatch is appropriate • Dispatch not immediate • Many problems resolved exogenously

  24. Scrubber 3 • Scrubber 2 for new application platform • Centralized knowledge server • Covers a network twice as large

  25. Implementation difficulties • Original expert system shell no longer supported • Knowledge base had grown opaque • Many tweaks over a decade • Many knowledge engineers • Most not available to work on Scrubber 3

  26. Requirements • Level of performance at least as good as prior system • Overall accuracy • False positive and false negative rates within acceptable range • Comprehensible • For understanding and acceptance by experts

  27. Additional requirements (ours) • Improved performance • Improved extensibility

  28. Phase I: Modeling Scrubber 2 • Applied a decision tree learning algorithm • Input data: • Trouble reports • Scrubber 2 diagnoses

  29. Data 26,000 trouble reports • 40 attributes (1/2 continuous; 1/2 symbolic) • Two classes • Dispatch • Don't -- i.e., call customer to verify all is OK
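
For illustration, a rough modern analogue of Phase I is sketched below with scikit-learn; note that it uses CART with an entropy criterion as a stand-in for C4.5 (which has no standard Python implementation), and every column name and value is invented, not drawn from the actual trouble-report data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical trouble reports labeled with the Scrubber 2 diagnosis;
# all attribute names and values here are invented for illustration.
reports = pd.DataFrame({
    "loop_resistance": [120.0, 85.5, 300.2, 95.0],      # continuous attribute
    "line_test":       ["ok", "fault", "fault", "ok"],  # symbolic attribute
    "diagnosis":       ["dont_dispatch", "dispatch", "dispatch", "dont_dispatch"],
})

# One-hot encode the symbolic attributes, then fit a tree with an
# entropy (information gain) criterion.
X = pd.get_dummies(reports.drop(columns="diagnosis"))
clf = DecisionTreeClassifier(criterion="entropy").fit(X, reports["diagnosis"])

# Print the learned tree in readable form.
print(export_text(clf, feature_names=list(X.columns)))
```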

  30. Background knowledge • C4.5 selected • 17 of 40 attributes used

  31. Phase I results • Decision trees with predictive accuracy of 0.99, with as few as 10,000 examples • Less than two days of work (easy!)

  32. Phase II: Acceptance • Comprehensibility ≠ Readability • Need to observe rationality in learned knowledge • Original trees on order of 1,000 nodes • The simpler the model, the better it can be understood. Comprehensibility = Readability + Simplicity + Fidelity

  33. Trading off simplicity and correctness • Pruning nodes sacrifices correctness • Appropriate when comprehensibility is an issue • Langley and Schwabacher, 2001 • Note: not pruning to avoid overfitting
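
One way to realize this trade-off today is cost-complexity pruning: prune ever harder, and keep the smallest tree whose accuracy stays in the acceptable zone. A sketch, with an invented accuracy floor:

```python
from sklearn.tree import DecisionTreeClassifier

def smallest_acceptable_tree(X_train, y_train, X_val, y_val, floor=0.97):
    """Prune ever harder; return the most-pruned tree still above `floor`.

    Deliberately trades correctness for simplicity, as on the slide;
    the 0.97 accuracy floor is an invented threshold, not from the talk.
    """
    path = DecisionTreeClassifier().cost_complexity_pruning_path(X_train, y_train)
    best = None
    for alpha in path.ccp_alphas:          # increasing alpha = heavier pruning
        clf = DecisionTreeClassifier(ccp_alpha=alpha).fit(X_train, y_train)
        if clf.score(X_val, y_val) >= floor:
            best = clf                     # keep the simplest tree still in range
    return best
```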

  34. Phase II results • Used only two most prominent attributes • New decision trees created • Still fell into acceptable zone

  35. Phase III: Working toward extensibility • Hoped to gain flexibility for • Local modifiability • Additional attribute values • Moved toward probabilistic decision tree • Leaves labeled with probability estimates, not decisions • Stubby trees easy to represent in tabular form
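
The "stubby tree in tabular form" idea can be sketched directly: with only two symbolic attributes, a probabilistic tree collapses into a lookup table from attribute-value pairs to estimated dispatch probabilities. Attribute and class names here are again invented, and the example representation is the one assumed in the earlier sketches.

```python
from collections import Counter, defaultdict

def probability_table(examples, attr_a, attr_b, positive="dispatch"):
    """Collapse a two-attribute probabilistic 'stubby tree' into a table.

    Each (value_a, value_b) leaf gets an estimated P(dispatch) rather
    than a hard decision.
    """
    counts = defaultdict(Counter)
    for e in examples:
        counts[(e[attr_a], e[attr_b])][e["class"]] += 1
    return {leaf: c[positive] / sum(c.values()) for leaf, c in counts.items()}
```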

  36. Phase IIIb: More data • Focus on two attributes gave us access to an extensive data set • Many more trouble reports • Abridged (two-attribute) form had not been considered useful earlier

  37. Phase III results • Simple diagnostic model • Greater empirical confidence -- important due to the small disjunct problem • “Big” general rules cover approximately 50% of the data • Remaining 50% covered by small disjuncts

  38. Summarizing the success story • C4.5 applied to induce Scrubber 2 model • Pruned model for comprehensibility/simplicity • Converted new model into probabilistic one • Used newly gained data for additional tuning and confidence • Small(?), simple model in very short time

  39. Lessons can be learned from success Lesson 1: the importance of comprehensibility • Rationality • Readability • Simplicity

  40. Lessons can be learned from success Lesson 2: the need for algorithms to handle small data sets • Creative ways to engineer interesting features from scarce data • Openness to alternative sources of data • Algorithms specifically tuned to handle small data sets. Langley has noted this to be an issue for scientific data -- but it is true for industrial data as well

  41. Lessons can be learned from success Lesson 3: the need to think about systematic error • Locally systematic error only looks like noise when there is enough data • Clearly related to the problem of small data sets • How do our algorithms hold up?

  42. Lessons can be learned from success Lesson 4: the need to think about the future • Learning results put into practice will be modified and extended • Must new models be learned? • Can improvement be incremental?

  43. Lessons can be learned from success Lesson 5: creative uses of the technology • Learning for the purposes of re-engineering isn’t “standard” • New applications will serve to fuel new research

  44. Further reading and acknowledgements • Carla Brodley et al., American Scientist, Jan./Feb. 1999 • Pat Langley, various publications • Thanks to Foster Provost and many others at Nynex / Bell Atlantic
