Math Models for Learning and Discovery

Math Models for Learning and Discovery. Kristin P. Bennett, Mathematical Sciences Department, Rensselaer Polytechnic Institute. The Learning Problem.

Presentation Transcript


  1. Math Models for Learning and Discovery. Kristin P. Bennett, Mathematical Sciences Department, Rensselaer Polytechnic Institute

  2. The Learning Problem The problem of understanding intelligence is said to be the greatest problem in science today and “the” problem for this century – as deciphering the genetic code was for the second half of the last one … the problem of learning represents a gateway to understanding intelligence in man and machines. -- Tomaso Poggio and Steve Smale, 2003

  3. What do these problems have in common? • Design and Discovery of Pharmaceuticals • Target Marketing in Business • Diagnosis of Breast Cancer • Discovery of Novel Superconductors • Detection of Anthrax using TZ spectroscopy • Modeling and predicting global trade • RNA Transcription

  4. DRUG TRIVIA (old info, from 2000) • In the USA, $25B/yr for R&D of pharmaceuticals (33% clinicals) • Worth their weight in gold • 10-15 years from conception to market for a drug • Development cost $0.5B/drug • First-year sales > $1B/drug • 1 drug approved/5000 compounds tested • 1 out of 100 drugs succeeds to market • 19 Alzheimer’s drugs in development • 20,000,000 Americans with Alzheimer’s by 2050 DDASSL RENSSELAER

  5. Drugs: Worth Their Weight in GOLD

  6. TOWARDS TREATING THE HIV EPIDEMIC HIV Reverse-Transcriptase Inhibition modeling: • We have a few molecules that have been tested. • Can we predict whether a new molecule will inhibit HIV?

  7. What do we know? • The bioactivities of a small set of molecules • Many possible descriptors for each molecule: molecular weight, electrostatic potential, ionization potential • Can we predict a molecule’s bioactivity?

  8. Database Marketing • A bank has a $1.7 billion portfolio of home mortgages. • When a customer refinances, the bank may lose that customer. • Question: will a customer refinance? • If so, offer that customer a good deal on refinancing.

  9. What do we know? • For many customers, we know whether they refinanced or not. • We know attributes of each customer: • Income • Age • Residential Area • Payment History • Can we predict the behavior of future customers?

  10. Breast Cancer Diagnosis Fine needle aspirate of breast tumor. Is tumor benign or malignant?

  11. What do we know? • For patients in initial study, we know whether tumor was benign or malignant. • Have a digital image of tumor aspirate. • Know characteristics doctors look at: • Uniformity of cell shape • Uniformity of cell size • Cell Mitosis

  12. What do we know? • For patients in initial study, we know whether tumor was benign or malignant. • Have a digital image of tumor aspirate. • Know characteristics doctors look at: • Uniformity of cell shape • Uniformity of cell size • Cell Mitosis

  13. Superconductivity • Superconductivity is the ability of a material to conduct current with no resistance and extremely low loss. • A few high temperature superconductors have been found. • What other compounds are superconductors?

  14. Applications of Superconductivity: Magnetic Resonance Imaging

  15. Applications of Superconductivity • Maglev Trains

  16. Applications of Superconductivity • Very small and efficient motors • Better power transmission cables • Better cellular phone service Find a cheap high-temperature superconductor and you will get the NOBEL PRIZE.

  17. What do we know? • Many compounds have been tested to see if they are superconductors. • Many descriptors exist for these compounds based on molecular properties.

  18. What do all these problems have in common? Each problem • Can be posed as a “yes” or “no” question. • Has examples known to be of the “yes” type or the “no” type. • Has an associated set of descriptors for each example. Learn a classification function!

  19. Data Mining • Each problem has data. • Our job is to “mine” information from this data. • Information depends on the question asked. • In this case we must produce a predictive yes/no model (a.k.a. a classification model) based on the data.

  20. Mathematical Model • Have data • Construct predictive function f(x) → y • Solve mathematical model to find f • Want f to generalize well on future data
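
The four steps on the Mathematical Model slide can be illustrated with a very small sketch. Everything here is an assumption made for illustration rather than anything from the lecture: the toy data, the single descriptor per example, the -1/+1 label encoding, and the choice of a simple threshold rule for f. The "mathematical model" being solved is just "pick the threshold with the fewest training errors", and the held-out set stands in for future data.

```python
# Hypothetical toy data: one descriptor x per example, label y in {-1, +1}.
train = [(0.5, -1), (1.0, -1), (1.5, -1), (3.0, 1), (3.5, 1), (4.0, 1)]
held_out = [(0.8, -1), (3.2, 1), (1.2, -1), (3.8, 1)]  # stands in for future data


def fit_threshold(data):
    """Solve the model: choose t minimizing training errors of f(x) = +1 if x > t else -1."""
    xs = sorted(x for x, _ in data)
    # Candidate thresholds are midpoints between consecutive descriptor values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum(1 for x, y in data if (1 if x > t else -1) != y)

    return min(candidates, key=errors)


def make_f(t):
    """Build the predictive function f(x) -> y for a given threshold."""
    return lambda x: 1 if x > t else -1


def accuracy(f, data):
    """Fraction of examples on which f predicts the known label."""
    return sum(1 for x, y in data if f(x) == y) / len(data)


t = fit_threshold(train)
f = make_f(t)
train_acc = accuracy(f, train)
held_out_acc = accuracy(f, held_out)
print(t, train_acc, held_out_acc)
```

Comparing `train_acc` with `held_out_acc` is the simplest way to check whether f generalizes; a large gap between the two would suggest the model fits the training examples but not new ones.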

  21. Types of Learning Problems • Classification • Regression • Clustering • Ranking

  22. Data Mining • Classification = yes/no models • Start with examples of yes and no. • Associate a set of descriptors with each example. Descriptors must be appropriate for the question you are asking. • Construct a model to split the two sets • Use the model to predict new examples.
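
The recipe on this slide (yes/no examples, descriptors, a model that splits the two sets, prediction on new examples) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the course: the data are made up, each example has two hypothetical descriptors, and the splitting model is a nearest-centroid rule, which separates the two sets with the line halfway between the two class centroids.

```python
# Hypothetical examples: (descriptors, label), with label "yes" or "no".
examples = [
    ((1.0, 1.0), "no"), ((1.5, 0.5), "no"), ((0.5, 1.5), "no"),
    ((4.0, 4.0), "yes"), ((4.5, 3.5), "yes"), ((3.5, 4.5), "yes"),
]


def centroid(points):
    """Component-wise mean of a list of descriptor tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def fit(examples):
    """Construct the model: one centroid for the "yes" set, one for the "no" set."""
    yes = [x for x, label in examples if label == "yes"]
    no = [x for x, label in examples if label == "no"]
    return centroid(yes), centroid(no)


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptor tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict(model, x):
    """Predict "yes" if x is closer to the "yes" centroid, otherwise "no"."""
    c_yes, c_no = model
    return "yes" if sq_dist(x, c_yes) < sq_dist(x, c_no) else "no"


model = fit(examples)
print(predict(model, (3.0, 3.5)), predict(model, (1.2, 0.9)))
```

The descriptors matter more than the rule: if the two sets overlap heavily in descriptor space, no splitting model of this kind will predict well.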

  23. Learning Model • What kind of learning task is it? • What sort of f should we use? • Kernel function • What loss function to use? • What regularization function? • How can we solve this learning model? • How well will the model predict new points?
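
One common way to tie the questions on this slide together is the regularized learning problem below. It is offered as a sketch of a standard formulation, not as the specific model used in the course. The symbols are assumptions introduced here: m training examples (x_i, y_i), a loss function L, a kernel K with associated function space H_K, a regularization parameter λ > 0, and coefficients α_i.

```latex
\min_{f \in \mathcal{H}_K} \;
  \frac{1}{m} \sum_{i=1}^{m} L\bigl(y_i, f(x_i)\bigr)
  \;+\; \lambda \, \|f\|_K^2,
\qquad
f(x) = \sum_{i=1}^{m} \alpha_i \, K(x_i, x)
```

In this form, the choice of L answers "what loss function", the norm term answers "what regularization function", and K fixes "what sort of f". Solving the learning model then amounts to finding the coefficients α_i, and the remaining question of how well f predicts new points is the generalization question.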

  24. Class information • See course web page http://www.rpi.edu/~bennek/class/mmld/index.htm

  25. Assignment for Friday • Read and be prepared to discuss Chapter 1, Shawe-Taylor and Cristianini • Lecturer: Gautam Kunapuli
