Advanced Algorithms for Biological Data Analysis

Advanced Algorithms for Biological Data Analysis. Center for Bioinformation Technology (CBIT) & Biointelligence Laboratory School of Computer Science and Engineering Seoul National University http://bi.snu.ac.kr/ http://cbit.snu.ac.kr/. Lecture Schedule.

Presentation Transcript


  1. Advanced Algorithms for Biological Data Analysis Center for Bioinformation Technology (CBIT) & Biointelligence Laboratory School of Computer Science and Engineering Seoul National University http://bi.snu.ac.kr/ http://cbit.snu.ac.kr/

  2. Lecture Schedule • Day 1: Introduction to Machine Learning • Day 2: Neural Networks • Day 3: Hidden Markov Models • Day 4: Principal Component Analysis • Day 5: Clustering Analysis

  3. Introduction to Machine Learning Algorithms in Bioinformatics Byoung-Tak Zhang Center for Bioinformation Technology (CBIT) & Biointelligence Laboratory School of Computer Science and Engineering Seoul National University E-mail: btzhang@cse.snu.ac.kr http://bi.snu.ac.kr/ http://cbit.snu.ac.kr/

  4. Outline • Part I • Concept of Machine Learning (ML) • Machine Learning Algorithms and Applications • Applications in Bioinformatics • Part II • Version Space Learning • Decision Tree Learning

  5. What is Artificial Intelligence (AI)? • Design and study of computer programs that behave intelligently. • Designing computer programs to make computers smarter. • Study of how to make computers do things at which, at the moment, people are better. • (No satisfactory definition of AI)

  6. Research Areas and Approaches (diagram centered on Artificial Intelligence) • Research: Learning Algorithms; Inference Mechanisms; Knowledge Representation; Intelligent System Architecture • Application: Intelligent Agents; Information Retrieval; Electronic Commerce; Data Mining; Bioinformatics; Natural Language Processing; Expert Systems • Paradigm: Rationalism (Logical); Empiricism (Statistical); Connectionism (Neural); Evolutionary (Genetic); Biological (Molecular)

  7. Concept of Machine Learning

  8. Context (diagram placing Machine Learning among related fields) • Computer Science (AI) • Cognitive Science • Statistics • Information Theory

  9. Why Machine Learning? • Recent progress in algorithms and theory • Growing flood of online data • Computational power is available • Budding industry. Three niches for machine learning: • Data mining: using historical data to improve decisions • Medical records --> medical knowledge • Software applications we can’t program by hand • Autonomous driving • Speech recognition • Self-customizing programs • Newsreader that learns user interests

  10. Brief History of Machine Learning • 1950s: Samuel’s checkers player • 1960s: Neural networks, perceptron; pattern recognition; learning in the limit theory; Minsky & Papert. • 1970s: Symbolic concept induction; Winston’s arch learner; knowledge acquisition bottleneck; Quinlan’s ID3; Michalski’s AQ and soybean diagnosis results; scientific discovery with BACON; mathematical discovery with AM. • 1980s: Continued progress on decision-tree and rule learning; explanation-based learning; speedup learning; utility problem; analogy; resurgence of connectionism (PDP, ANN); Valiant’s PAC learning; experimental evaluation • 1990s: Data mining; adaptive software agents & IR; reinforcement learning; theory refinement; inductive logic programming; voting, bagging, boosting, and stacking; learning Bayesian networks.

  11. Learning: Definition • Learning is the improvement of performance in some environment through the acquisition of knowledge resulting from experience in that environment. • Alternatively: the improvement of behavior through acquisition of knowledge on some performance task, based on partial task experience.

  12. A Learning Problem: EnjoySport
      Sky    Temp  Humid   Wind    Water  Forecast  EnjoySport
      Sunny  Warm  Normal  Strong  Warm   Same      Yes
      Sunny  Warm  High    Strong  Warm   Same      Yes
      Rainy  Cold  High    Strong  Warm   Change    No
      Sunny  Warm  High    Strong  Cool   Change    Yes
      What is the general concept?
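One classic way to answer this question is Find-S, a simple algorithm from version space learning (Part II), which keeps the most specific conjunctive hypothesis consistent with the positive examples. The sketch below is an illustration rather than the lecture's own code; the tuple encoding of the examples and the "?" wildcard convention are assumptions.

```python
# EnjoySport examples: (Sky, Temp, Humid, Wind, Water, Forecast) -> EnjoySport
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]


def find_s(examples):
    """Most specific conjunctive hypothesis covering every positive example.

    "?" means "any value is acceptable" for that attribute.
    """
    hypothesis = None  # None plays the role of the empty, maximally specific hypothesis
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example: copy it exactly
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:
                hypothesis[i] = "?"  # minimally generalize the conflicting constraint
    return None if hypothesis is None else tuple(hypothesis)


print(find_s(EXAMPLES))  # prints ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

The result reads: EnjoySport is Yes when the sky is sunny, the temperature is warm, and the wind is strong, whatever the other attributes are. Find-S assumes the target concept is a conjunction of attribute constraints and that the data are noise-free.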

  13. Possible Uses of Machine Learning • Configuration and design • Planning and scheduling • Data mining and knowledge discovery • Diagnostic reasoning • Execution and control • Language understanding • Vision and speech

  14. Metaphors and Methods (metaphor: learning method) • Neurobiology: Connectionist learning • Biological evolution: Genetic learning • Heuristic search: Tree / rule induction • Statistical inference: Probabilistic induction • Memory and retrieval: Case-based learning

  15. Learning: Components • Components of a learning system • Performance: accuracy, efficiency, understandability • Environment: external setting to the learner • Knowledge: internal data structure • Experience: perception, action, mental traces • Improvement: desirable change in performance

  16. Learning System [Diagram: the Learning component gets data from the Environment and stores the acquired knowledge in Knowledge; the Performance component gets knowledge to turn a problem into a solution, so behavior improves.]

  17. What is the Learning Problem? • Learning = improving with experience at some task • Improve over task T, • With respect to performance measure P, • Based on experience E. E.g., Learn to play checkers • T: Play checkers • P: % of games won in world tournament • E: opportunity to play against self

  18. Machine Learning: Tasks • Supervised Learning • Estimate an unknown mapping from known input-output pairs • Learn f_w from a training set D = {(x, y)} such that f_w(x) = y • Classification: y is discrete • Regression: y is continuous • Unsupervised Learning • Only input values are provided • Learn f_w from D = {(x)}, for tasks such as: • Compression • Clustering • Reinforcement Learning

  19. Machine Learning: Strategies • Rote learning • Concept learning • Learning from examples • Learning by instruction • Inductive learning • Deductive learning • Explanation-based learning (EBL) • Learning by analogy • Learning by observation

  20. Supervised Learning • Given a sequence of input/output pairs of the form <x_i, y_i>, where x_i is a possible input and y_i is the output associated with x_i. • Learn a function f that accounts for the examples seen so far, f(x_i) = y_i for all i, and that makes a good guess for the outputs of inputs it has not seen.
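For a continuous output (regression), the simplest instance of this setup is fitting a line to the pairs and using it to guess outputs for unseen inputs. The sketch below assumes a linear hypothesis f(x) = w*x + b and the least-squares error criterion, neither of which the slide specifies; the training pairs are made up for illustration.

```python
def fit_line(pairs):
    """Least-squares estimates of w and b for f(x) = w*x + b."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    w = s_xy / s_xx  # slope; requires at least two distinct x values
    b = mean_y - w * mean_x  # the fitted line passes through the point of means
    return w, b


# Hypothetical training pairs <x_i, y_i> (they happen to lie on y = 2x + 1).
training = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit_line(training)


def predict(x):
    """Guess the output for an input that was not in the training set."""
    return w * x + b


print(w, b, predict(10.0))  # prints 2.0 1.0 21.0
```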

  21. Examples of Input-Output Pairs
      Task                   Inputs                                          Outputs
      Recognition            Descriptions of objects                         Classes that the objects belong to
      Action                 Descriptions of situations                      Actions or predictions
      Janitor robot problem  Descriptions of offices (floor, prof’s office)  Yes or No (indicating whether or not the office contains a recycling bin)

  22. Classification and Concept Learning • Classification • If the function is discrete valued, then the outputs are called classes • Concept learning • Learned function has only two possible outputs

  23. Unsupervised Learning • Clustering • A clustering algorithm partitions the inputs into a fixed number of subsets or clusters so that inputs in the same cluster are close to one another. • Discovery learning • The objective is to uncover new relations in the data. • Reinforcement learning • Uses a feedback signal (not the target output) that gives the learning program an indication of whether or not what it has learned is correct.
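k-means is one widely used algorithm that matches the clustering description above (clustering is covered in depth on Day 5). A minimal sketch, assuming Euclidean distance, a fixed number of clusters k, and initial centroids sampled from the data; the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Partition points into k clusters; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
# prints [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
```

The number of clusters is an input, as the slide says ("a fixed number of subsets"); choosing k well is a separate problem.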

  24. Online and Batch Learning • Batch methods • Process large sets of examples all at once. • Online (incremental) methods • Process examples one at a time.
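The difference is easiest to see with the simplest possible estimator, the sample mean (an illustrative choice, not an example from the slide). A batch method processes all examples at once; an online method updates its estimate after each example and never needs to store the data.

```python
def batch_mean(values):
    """Batch: all examples are available and processed together."""
    return sum(values) / len(values)


def online_mean(stream):
    """Online: refine the estimate after each example, then discard the example."""
    estimate, n = 0.0, 0
    for x in stream:
        n += 1
        estimate += (x - estimate) / n  # incremental update of the running mean
        yield estimate  # current estimate after seeing n examples


data = [2.0, 4.0, 6.0, 8.0]
print(batch_mean(data), list(online_mean(data)))  # prints 5.0 [2.0, 3.0, 4.0, 5.0]
```

After the last example both methods agree; the online version simply has an answer available at every step along the way.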

  25. Machine Learning Algorithms and Applications

  26. Machine Learning Algorithms (1/2) • Symbolic Learning (covered on Day 1) • Version Space Learning • Case-Based Learning • Neural Learning (covered on Day 2) • Multilayer Perceptrons (MLPs) • Self-Organizing Maps (SOMs) • Support Vector Machines (SVMs) • Evolutionary Learning (very briefly explained on Day 1) • Evolution Strategies • Evolutionary Programming • Genetic Algorithms • Genetic Programming

  27. Machine Learning Algorithms (2/2) • Probabilistic Learning (covered on Days 3 and 5) • Bayesian Networks (BNs) • Helmholtz Machines (HMs) • Latent Variable Models (LVMs) • Generative Topographic Mapping (GTM) • Other Machine Learning Methods (partially covered on Days 1 and 4) • Decision Trees (DTs) • Reinforcement Learning (RL) • Boosting Algorithms • Mixture of Experts (ME) • Independent Component Analysis (ICA)

  28. Example Applications of ML (1/2) • Banking & Investment • Credit card fraud • Delinquent accounts • Authorization of purchases • Predict stock market • Health Care • Disease diagnosis • Managing resources • Look for causal relationships between environment and disease • Marketing • Credit card applications • Use past buying habits to predict likelihood of customer purchasing some new product • Textual Data Mining

  29. Example Applications of ML (2/2) • Astronomy • Bioinformatics • Chemistry • Human resources: evaluating job performance • Insurance & Finance • Manufacturing: process control • Signal and image processing • Speech recognition • …

  30. Neural Nets for Handwritten Digit Recognition [Figure: pre-processed digit images (training and test sets) feed the input units, which connect through hidden units to ten output units for the digits 0 to 9.]

  31. ALVINN System: Neural Network Learning to Steer an Autonomous Vehicle

  32. Learning to Navigate a Vehicle by Observing a Human Expert (1/2) • Inputs • The images produced by a camera mounted on the vehicle • Outputs • The actions taken by the human driver to steer the vehicle or adjust its speed • Result of learning • A function mapping images to control actions

  33. Learning to Navigate a Vehicle by Observing a Human Expert (2/2)

  34. Data Recorrection by a Hopfield Network [Figure panels: original target data; corrupted input data; recorrected data after 10 iterations; recorrected data after 20 iterations; fully recorrected data after 35 iterations.]
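A Hopfield network stores patterns in symmetric connection weights and recalls one by repeatedly updating units until the state stops changing. The sketch below is a minimal toy version, not the network behind the figure: it assumes bipolar (+1/-1) units, the Hebbian outer-product learning rule, asynchronous (one unit at a time) updates, and a hypothetical 8-unit pattern.

```python
def train_hopfield(patterns):
    """Hebbian (outer-product) weights for bipolar patterns, zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_sweeps=35):
    """Update units one at a time until no unit changes (or max_sweeps is reached)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            activation = sum(w_ij * s_j for w_ij, s_j in zip(weights[i], state))
            new_value = 1 if activation >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break  # reached a stable state
    return state


pattern = [1, -1, 1, -1, 1, 1, -1, -1]    # hypothetical stored pattern
corrupted = [-1, -1, 1, 1, 1, 1, -1, -1]  # units 0 and 3 flipped
weights = train_hopfield([pattern])
print(recall(weights, corrupted) == pattern)  # prints True
```

With a single stored pattern, recall succeeds whenever fewer than half of the units are corrupted; storing many patterns reduces this tolerance.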

  35. Predicting the Sunspot Number with Neural Networks

  36. ANN for Face Recognition: A 960 × 3 × 4 network is trained on gray-level images of faces to predict whether a person is looking to their left, right, ahead, or up.
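The size of such a network follows directly from its layer sizes. A minimal sketch, assuming fully connected layers (960 input units, 3 hidden units, 4 output units) with one bias weight per hidden and output unit; the slide does not say whether biases are used.

```python
def count_weights(layer_sizes, bias=True):
    """Number of trainable weights in a fully connected feed-forward network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per connection between adjacent layers
        if bias:
            total += fan_out  # one bias weight per unit in the receiving layer
    return total


print(count_weights([960, 3, 4]))  # prints 2899 (2880 + 3 + 12 + 4)
```

Dropping the biases (`bias=False`) gives 2892. Almost all of the weights sit between the inputs and the small hidden layer.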

  37. Data Mining [Figure: the knowledge discovery process. Database/data warehouse -> (selection & sampling) -> target data -> (preprocessing & cleaning) -> cleaned data -> (transformation & reduction) -> transformed data -> (data mining) -> patterns/model -> (interpretation/evaluation) -> knowledge -> performance system.]

  38. Customer Relationship Management (CRM) • Increased Customer Lifetime Value • Increased Wallet Share • Improved Customer Retention • Segmentation of Customers by Profitability • Segmentation of Customers by Risk of Default • Integrating Data Mining into the Full Marketing Process

  39. Hot Water Flashing Nozzle with Evolutionary Algorithms: Hans-Paul Schwefel performed the original experiments. [Figure: nozzle shapes, beginning from the start design; labels: hot water entering; at throat: Mach 1 and onset of flashing; steam and droplets at exit.]
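Schwefel's experiments optimized a physical nozzle, but the core loop of an evolution strategy can be illustrated numerically. The sketch below is a (1+1)-ES: one parent, one Gaussian-mutated child, and the better of the two survives. The sphere test function, the starting point, and the step-size control (a version of Rechenberg's 1/5 success rule with factors 1.22 and 0.82) are assumptions for illustration, not details of the original experiments.

```python
import random


def sphere(x):
    """Toy objective to minimize; its optimum is 0 at the origin."""
    return sum(v * v for v in x)


def one_plus_one_es(f, x0, sigma=1.0, generations=500, seed=0):
    rng = random.Random(seed)
    parent, parent_fitness = list(x0), f(x0)
    successes = 0
    for g in range(1, generations + 1):
        # Mutation: add Gaussian noise with step size sigma to every coordinate.
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        child_fitness = f(child)
        # Plus-selection: the child replaces the parent only if it is no worse.
        if child_fitness <= parent_fitness:
            parent, parent_fitness = child, child_fitness
            successes += 1
        # 1/5 success rule, applied every 20 generations: widen the step if more
        # than 1/5 of the mutations succeeded, otherwise narrow it.
        if g % 20 == 0:
            sigma *= 1.22 if successes / 20 > 0.2 else 0.82
            successes = 0
    return parent, parent_fitness


start = [5.0, -3.0, 2.0]
best, best_fitness = one_plus_one_es(sphere, start)
print(sphere(start), best_fitness)
```

Because of plus-selection, the parent's fitness can never get worse from one generation to the next.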

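  40. Case-Based Reasoning (Aamodt & Plaza, 1994) [Figure: the CBR cycle. A new problem (input) is matched against the case base: 1. Retrieve similar cases; 2. Reuse the retrieved solution; 3. Revise it to produce the output; 4. Retain the learned case in the case base. General knowledge supports every step.]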
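The four steps of the cycle can be sketched in a few functions. This is a toy illustration under stated assumptions: cases are hypothetical dictionaries with numeric problem features, retrieval is nearest neighbor by Euclidean distance, and the `revise` callback (here the hypothetical `accept_as_is`) stands in for whatever domain check a real system would use.

```python
import math


def retrieve(case_base, problem):
    """1. Retrieve: the stored case whose problem features are closest to the new problem."""
    return min(case_base, key=lambda case: math.dist(case["problem"], problem))


def solve(case_base, problem, revise):
    retrieved = retrieve(case_base, problem)
    proposed = retrieved["solution"]       # 2. Reuse the retrieved solution
    confirmed = revise(problem, proposed)  # 3. Revise: domain-specific check or repair
    case_base.append({"problem": problem, "solution": confirmed})  # 4. Retain the learned case
    return confirmed


# Hypothetical case base: numeric problem features paired with a solution label.
case_base = [
    {"problem": (1.0, 1.0), "solution": "A"},
    {"problem": (5.0, 5.0), "solution": "B"},
]


def accept_as_is(problem, solution):
    return solution  # placeholder revise step that accepts the proposal unchanged


print(solve(case_base, (4.5, 5.2), accept_as_is), len(case_base))  # prints B 3
```

The retain step is what makes the system learn: each solved problem becomes a new case available for future retrieval.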

  41. Machine Learning Applications in Bioinformatics

  42. Bioinformatics • What is Bioinformatics? Bioinformatics is a new term referring to the discipline that employs computers to store, retrieve, analyze, and assist in understanding biological information. • The application of information technology and computer science to the study of biological systems. • The analysis of the massive (and constantly increasing) amount of genetic information. • Sophisticated computer technologies to enable discovery in all fields of life sciences.

  43. Problems in Bioinformatics • Sequence analysis: sequence alignment; structure and function prediction; gene finding • Structure analysis: protein structure comparison; protein structure prediction; RNA structure modeling • Expression analysis: gene expression analysis; gene clustering • Pathway analysis: metabolic pathways; regulatory networks
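Sequence alignment, the first problem listed, is classically solved by dynamic programming. The sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -1) are illustrative assumptions, and it returns only the score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # prints 2 (ACGT over A-GT)
```

Filling the table takes time proportional to len(a) × len(b); recovering the alignment itself requires a traceback through the same table.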

  44. Applications of Bioinformatics • Drug design • Identification of genetic risk factors • Gene therapy • Genetic modification of food crops and animals • Forensics • Biological warfare • Personalized Medicine • E-Doctor

  45. Machine Learning and Bioinformatics [Diagram: machine learning applied to biological databases (Bio DB) yields knowledge for drug development, pharmacology, ecology, and medical therapy research.]

  46. Machine Learning Techniques for Bio Data Mining • Sequence Alignment • Simulated Annealing • Genetic Algorithms • Structure and Function Prediction • Hidden Markov Models • Multilayer Perceptrons • Decision Trees • Molecular Clustering and Classification • Support Vector Machines • Nearest Neighbor Algorithms • Expression (DNA Chip Data) Analysis • Self-Organizing Maps • Bayesian Networks
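Hidden Markov models, listed above for structure and function prediction, are typically decoded with the Viterbi algorithm, which finds the most probable hidden-state path for an observed sequence. The two-state model below is hypothetical: the states H (GC-rich) and L (AT-rich) and all probabilities are made-up values chosen for illustration, and log-probabilities are used to avoid numerical underflow on long sequences.

```python
import math

STATES = ("H", "L")  # hypothetical states: GC-rich (H) and AT-rich (L) regions
START = {"H": 0.5, "L": 0.5}
TRANSITION = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMISSION = {
    "H": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},
    "L": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4},
}


def viterbi(sequence):
    """Most probable state path for the observed sequence, as a string of state labels."""
    # best[s]: log-probability of the best path ending in state s; paths[s]: that path
    best = {s: math.log(START[s]) + math.log(EMISSION[s][sequence[0]]) for s in STATES}
    paths = {s: [s] for s in STATES}
    for symbol in sequence[1:]:
        new_best, new_paths = {}, {}
        for s in STATES:
            # Best predecessor state r for moving into state s at this position.
            prev = max(STATES, key=lambda r: best[r] + math.log(TRANSITION[r][s]))
            new_best[s] = best[prev] + math.log(TRANSITION[prev][s]) + math.log(EMISSION[s][symbol])
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    final = max(STATES, key=lambda s: best[s])
    return "".join(paths[final])


print(viterbi("GGCCAATT"))  # prints HHHHLLLL
```

The same dynamic-programming idea underlies HMM-based gene finders, which use many more states and parameters estimated from data.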

  47. Structure and Function Prediction • Protein structure prediction • Protein modeling • Gene finding and gene prediction

  48. Effect and Applications of Biological Data Mining [Diagram: biological data mining (store, retrieve, analyze, and assist in understanding biological information) at the center, linked to: biocomputing; increase and improvement of farm products; renewable energy; diagnosis with chips; SNP (Single Nucleotide Polymorphism); customized drugs.]
