
Introduction to Programming for Biology

Introduction to Programming for Biology. 楊進木. Discussion & Grading. Project: select a direction (gene finding, DNA sequence alignment, clustering) and write a program for your problem. Schedule: 8th week, the direction and the first progress report; 18th week, final progress report.


Presentation Transcript


  1. Introduction to Programming for Biology 楊進木

  2. Discussion & Grading • Project • Select a direction: • Gene finding, DNA sequence alignment, clustering • Write a program for your problem • Schedule • 8th week: the direction and the first progress report • 18th week: final progress report

  3. The C Language • Currently, the most commonly-used language for embedded systems • “High-level assembly” • Very portable: compilers exist for virtually every processor • Easy-to-understand compilation • Produces efficient code • Fairly concise

  4. The Main Points • Like a high-level assembly language • Array-of-cells model of memory • Very efficient code generation follows from close semantic match • Language lets you do just about everything • Very easy to make mistakes
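The "array-of-cells" point can be illustrated with a minimal sketch. The function names `sum_cells` and `sum_indexed` are hypothetical, chosen only for this example. Both walk the same consecutive memory cells, once through a moving pointer and once through indexing, which C defines as the same operation.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n ints by moving a pointer across consecutive memory cells.
   Hypothetical helper for illustration only. */
long sum_cells(const int *cells, size_t n)
{
    assert(cells != NULL);
    long total = 0;
    for (const int *p = cells; p != cells + n; ++p) {
        total += *p;            /* *p reads the cell that p points at */
    }
    return total;
}

/* The same sum with indexing: cells[i] is defined as *(cells + i). */
long sum_indexed(const int *cells, size_t n)
{
    assert(cells != NULL);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += cells[i];
    }
    return total;
}
```

Neither function can check that `n` matches the real array length. Passing a count that is too large reads past the end of the array with no diagnostic, which is one concrete form of "very easy to make mistakes".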

  5. C History • Developed between 1969 and 1973 along with Unix • Due mostly to Dennis Ritchie • Designed for systems programming • Operating systems • Utility programs • Compilers • Filters

  6. The only way to learn a programming language is by writing programs in that language

  7. The C Programming System • Environment: Editor, Compiler, Library, Debugger • Editor (text edit): user-written programs in C language • Compiler (Microsoft C, Turbo C, Borland C): user-written programs in machine language • Library: library programs in machine language

  8. Hello World in C • Preprocessor used to share information among source files • Clumsy • + Cheaply implemented • + Very flexible #include <stdio.h> void main() { printf("Hello, world!\n"); }

  9. Hello World in C • Program mostly a collection of functions • "main" function special: the entry point • "void" qualifier indicates function does not return anything • I/O performed by a library function: not included in the language #include <stdio.h> void main() { printf("Hello, world!\n"); }
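A small sketch of "a program is mostly a collection of functions" and "I/O is performed by a library function". The helper name `greet` is hypothetical and not part of the slides. It delegates the actual output to `fprintf` from `<stdio.h>`.

```c
#include <assert.h>
#include <stdio.h>

/* Write "Hello, <name>!\n" to the given stream.
   Returns what fprintf returns: the number of characters written,
   or a negative value on an output error.
   Hypothetical helper for illustration only. */
int greet(FILE *out, const char *name)
{
    assert(out != NULL && name != NULL);
    return fprintf(out, "Hello, %s!\n", name);
}
```

A complete program would call `greet(stdout, "world")` from `main`. Standard C declares `main` as returning `int`, so `int main(void)` ending in `return 0;` is the portable form of the `void main()` shown on the slide.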

  10. Compiler • Microsoft Visual C • Turbo C • Borland C • …

  11. First HW • Write a hello.c • An editor • A compiler • Write a program

  12. Software Maintenance • Software requires maintenance • Bug repair • Feature enhancement • The cost of software maintenance constitutes between 80 and 90 percent of the total cost • Software engineering is the discipline of writing programs so that they can be understood and maintained by others • Good programming style requires developing an aesthetic sense

  13. Central Dogma of Molecular Biology: DNA -> RNA -> Protein -> Phenotype -> DNA • Molecules: Sequence, Structure, Function • Processes: Mechanism, Specificity, Regulation • Central Paradigm for Bioinformatics: Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Phenotype • Large Amounts of Information: Standardized, Statistical • What is the Information? Molecular Biology as an Information Science • Information transfer (mRNA) • Protein synthesis (tRNA/mRNA) • Some catalytic activity • Genetic material (idea from D Brutlag, Stanford, graphics from S Strobel)
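The DNA -> RNA step can be sketched as a string operation. This assumes the input is the coding strand written with the letters A, C, G, T, so that the mRNA has the same sequence with U in place of T. The function name `transcribe` and its error convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a DNA coding-strand sequence into rna, replacing T with U.
   cap is the size of the rna buffer, including room for '\0'.
   Returns 0 on success, -1 if the buffer is too small or the input
   contains a character other than A, C, G, T.
   Hypothetical helper for illustration only. */
int transcribe(const char *dna, char *rna, size_t cap)
{
    assert(dna != NULL && rna != NULL);
    size_t n = strlen(dna);
    if (n + 1 > cap) {
        return -1;
    }
    for (size_t i = 0; i < n; ++i) {
        switch (dna[i]) {
        case 'A': case 'C': case 'G':
            rna[i] = dna[i];
            break;
        case 'T':
            rna[i] = 'U';
            break;
        default:
            return -1;          /* not a DNA letter */
        }
    }
    rna[n] = '\0';
    return 0;
}
```

With a buffer `char buf[8];`, the call `transcribe("TCATG", buf, sizeof buf)` fills `buf` with `UCAUG`.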

  14. Automatic pipeline systems for structure identification (diagram; columns: Methods, Steps and techniques, Tools; output: Structures) • Genomic Sequence: Genome annotation, Gene prediction, Sequence alignment (tools: GENSCAN, BLAST) • Homology model (Comparative model): Sequence alignment, Hidden Markov model, Multiple string alignment (tools: BLAST, SAM, MACAW) • No homology: Fold Recognition (Threading): Local sequence alignment, Hidden Markov model, Motif identification, Align (super)/secondary Str., Cluster/Classification (tools: PSI-BLAST, HMMER, PHD (JPRED)) • No partial homology: ab initio Methods: Scoring functions (knowledge-base functions, physical-based function), Search methods, Combining threading techniques (tools: ROSETTA)

  15. Introduction - Amino Acid • Backbone vs. Side Chain (figure: atoms labeled N, Ca, Cb, Cg, Cd1, Cd2)

  16. Amino Acid: side chain (figure: Phe and Ile; backbone atoms N, Ca, C, O, H; side-chain atoms Cb, Cg, Cd)

  17. Levels of Protein Structure • Primary: LGINCRGSSQCGLSGGNLMVRIRDQACGNQGQTWCPGERRAKVCGTGNSISAYVQSTNNCISGTEACRHLTNLVNHGCRVCGSDPLYAGNDVSRGQLTVNYVNSC • Secondary • Tertiary • Quaternary

  18. Special Bonds - Hbond • Hbonds • Hbonds {<boolean>} or hbonds <value> • <boolean>: on , off • hbonds on • <value>: 0.1~0.99, 1~500 • hbonds 0.3 • Color hbonds white • Show 2nd structure • 1crn • Restrict backbone • Cartoons • Helix, sheet

  19. Protein-Ligand Docking dream.chun@msa.hinet.net 生語-T9010- EX 001

  20. Docking example –1 (1fe7-VIT)

  21. Bioinformatics Core • Steps: 1. DNA Seq. 2. Protein Seq. 3. Find Sim. Seq. & Structure 4. Identify Func. 5. Identify drug • Topics: Gene Finding, SNPs, DNA Chip, Sequence Alignment, Structure Alignment, Structure prediction • Algorithms: Neural Network, HMMs, SVMs, Dynamic Programming, Evolutionary Computation • Database: GenBank, PDB, SCOP, ACD • Tools: BLAST, MSA, DOCK, RASMOL
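The core diagram lists dynamic programming next to sequence alignment. As a hedged illustration of how the two connect, here is a minimal sketch of a global alignment score computed with the Needleman-Wunsch recurrence. That algorithm is not named in the slides and is used here as a standard example. The function names, the `MAX_LEN` bound, and the score parameters are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 64   /* assumed upper bound on sequence length */

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Best global alignment score of a and b under a linear model:
   match and mismatch are added per aligned pair, gap per gap column
   (mismatch and gap are normally negative).
   Only two rows of the table are kept, so only the score is
   returned, not the alignment itself. */
int best_global_score(const char *a, const char *b,
                      int match, int mismatch, int gap)
{
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    assert(n <= MAX_LEN && m <= MAX_LEN);

    int prev[MAX_LEN + 1], curr[MAX_LEN + 1];

    /* Row 0: aligning an empty prefix of a against j letters of b. */
    for (size_t j = 0; j <= m; ++j) {
        prev[j] = (int)j * gap;
    }
    for (size_t i = 1; i <= n; ++i) {
        curr[0] = (int)i * gap;
        for (size_t j = 1; j <= m; ++j) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            int up   = prev[j] + gap;       /* gap in b */
            int left = curr[j - 1] + gap;   /* gap in a */
            curr[j] = max3(diag, up, left);
        }
        memcpy(prev, curr, (m + 1) * sizeof prev[0]);
    }
    return prev[m];
}
```

With the assumed scores +1 for a match and -1 for a mismatch or a gap, `best_global_score("TCATG", "CATTG", 1, -1, -1)` evaluates to 2: four matches and two gap columns.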

  22. Sequence Alignment - Example: Which is best?

  23. Sequence Alignment - Example: Which is best? • Raw data: TCATG and CATTG (below, "-" marks a gap and "/" separates the two rows) • 2 matches, 0 gaps: TCATG / CATTG • 3 matches (2 end gaps): TCATG- / -CATTG • 4 matches, 1 insertion: TCA-TG / -CATTG • 4 matches, 1 insertion: TCAT-G / -CATTG • Modeling: Scoring Function
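A scoring function turns "which is best?" into a number that can be compared. The sketch below counts matches in a gapped alignment and applies a simple linear score. The function names and the score values passed in are assumptions for illustration, not the scheme used in the course. Both rows are expected to have the same length, with `-` marking a gap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count columns in which both rows hold the same letter.
   A gap ('-') never counts as a match. */
int count_matches(const char *top, const char *bottom)
{
    assert(top != NULL && bottom != NULL);
    assert(strlen(top) == strlen(bottom));
    int matches = 0;
    for (size_t i = 0; top[i] != '\0'; ++i) {
        if (top[i] != '-' && top[i] == bottom[i]) {
            matches++;
        }
    }
    return matches;
}

/* Linear score: add match for each matching column, mismatch for each
   mismatching pair, and gap for each column containing a gap.
   End gaps are penalized like any other gap in this sketch. */
int score_alignment(const char *top, const char *bottom,
                    int match, int mismatch, int gap)
{
    assert(top != NULL && bottom != NULL);
    assert(strlen(top) == strlen(bottom));
    int score = 0;
    for (size_t i = 0; top[i] != '\0'; ++i) {
        if (top[i] == '-' || bottom[i] == '-') {
            score += gap;
        } else if (top[i] == bottom[i]) {
            score += match;
        } else {
            score += mismatch;
        }
    }
    return score;
}
```

For the ungapped pair, `count_matches("TCATG", "CATTG")` gives 2. For the gapped pair `"TCA-TG"` over `"-CATTG"` it gives 4, and `score_alignment` with +1, -1, -1 gives 2 for that alignment versus -1 for the ungapped one. Changing the gap penalty changes which alignment wins, which is why the scoring model matters.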

  24. Internet Resource - Blast

  25. Multiple String Alignment

  26. Purposes

  27. DNA Chip • Advantages • Parallel • High-throughput • Large-scale • Genomic scale • Applications • Disease diagnosis • regulatory networks • Drug discovery

  28. DNA Chip Design Life cycle Cy5 (red) Cy3 (green)

  29. Clustering

  30. Classification

  31. Gene prediction • Software • Genemark.hmm (Arabidopsis) • Genscan+ (Arabidopsis) • GlimmerA

  32. Internet Resource- PDB

  33. Internet Resource-GenBank

  34. Protein Docking: Hepatitis C Virus (1A1V) Binding site identification Lead compounds identification Lead optimization New drug discovery

  35. Protein Structure Prediction: IBM Blue Gene • 1D Sequences Database: Swiss-Prot (example sequences: MADWVTGKVTKVQNWTDALFS LTVHAPVLPFTAGQFTKLGLEID GERVQRAYSYVNSPDNPDLEFY LVTVPDGKLSPRLAALKPGDEVQVV SEAAGFFVLDEVPHCETLWMLAT GTAIGPYLSILRLGKDLDRFKNLVL VHAARYAADLSYLPLMQELEKRYE GKLRIQTVVSRETAAGSLTGRIP ALIESGELESTIGLPMNKETSHVML CGNPQMVRDTQQLLKETRQ MTKHLRRRPGHMTAEHYW) • 3D Structure Database: SCOP, CATH, FSSP, Pfam • Multi-Strategies Approach: Neural Networks, Support Vector Machines, Hidden Markov Models, Evolutionary Computation, Data Mining Approaches, Statistic Models, Others… (quantum mechanism) • ~ 17000 • Which path is best? We assume that the best path depends on data characteristics.

  36. Protein Database • Gene, Protein, Function • Query/Navigation: 1D Seq., 2D Structure, 3D Structure, Functions • Sources: PDB, Swiss-Prot, CATH, SCOP, FSSP, DSSP, Pfam, PROSITE, Ligand, Pathway • Web User Interface • Processing Programs (PHP) • Integrating Protein database: 1D Sequences, Secondary structure, 3D structure, Binding information, Pathway • MySeq …
