Bioinformatics: Data to biological knowledge in a mouseclick
Rinku Saha
Biomedical Informatics Team, UAMS

Biology vs bioinformatics
Biology is a science in transition, changing rapidly from a data-poor to a data-rich science.
Biology: relies on time-consuming wet-lab data, and analysis is a repetitive process.
Bioinformatics: it is fast because it draws on resources from several datasets at once.
Fig: Bioinformatics vs. traditional biology. Referenced from "Bioinformatics: from data to biological knowledge" by Dena Leshkowitz, Ph.D., Bioinformatics Unit, Hebrew University.

Life is a library of sequences
Crick and Watson, working in Cambridge in the 1950s, discovered the structure of DNA. By 1955 Frederick Sanger had become the first to establish the order of amino acids in a protein, the hormone insulin. Progress in determining protein sequences was slow until the mid-1970s, when the same Frederick Sanger (amongst others) developed methods for the rapid sequencing of DNA. The ability to sequence DNA led rapidly to an increase in the number of protein sequences resolved.
Central databases were established in Europe, the USA and Japan to collect this sequence information from individual scientists and make it available to other researchers.
The Human Genome Project, an endeavor lasting over a decade, is paying off in the form of the gene sequences that emerge from it. It was followed by a series of projects, some still continuing, that have successfully sequenced the smaller genomes of some bacteria, yeast, invertebrates, rat, mouse, etc. The increased quantity of data will lead to a better understanding of the way genes and their protein products work, and thus help us develop better methods for dealing with the diseases that arise when the processes that control life go wrong.
Then came microarray technology, which allows simultaneous measurement of expression levels for up to tens of thousands of genes and so helps us examine complex biological interactions simultaneously.
Now protein arrays are rapidly becoming established as a powerful means to detect proteins, monitor their expression levels, and investigate protein interactions and functions. Protein arrays make possible the parallel multiplex screening of thousands of interactions, encompassing protein-antibody, protein-protein, protein-ligand or protein-drug, and enzyme-substrate screening, as well as multianalyte diagnostic assays in chip format.

Data Explosion
• Human Genome Project
• Microarray and Protein Array
The Human Genome Project announced its working-draft assembly in 2000.
The challenge: terabytes of data, and their annotation and representation.
Fig. ref.: "Bioinformatics: from data to biological knowledge", Dena Leshkowitz, Ph.D., Bioinformatics Unit, Hebrew University.

What do we do with such a huge volume of data?
Bioinformatics solutions:
• Develop fast applications to analyze the data.
• Develop databases and software to store the data, enter new data and query the data (NCBI, EMBL, etc.).
• Design data structures to represent this information.
Ref: "Bioinformatics: from data to biological knowledge", Dena Leshkowitz, Ph.D., Bioinformatics Unit, Hebrew University.
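As one possible illustration of "data structures to represent this information", the sketch below reads FASTA-formatted text into simple records. The `SequenceRecord` class, the `parse_fasta` function and the example sequences are hypothetical names and data chosen for this illustration; they do not come from the slides or from any particular database's software.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SequenceRecord:
    """A minimal, hypothetical container for one sequence entry."""
    identifier: str
    description: str
    sequence: str


def _make_record(header: str, chunks: List[str]) -> SequenceRecord:
    # The first word of the header is treated as the identifier.
    identifier, _, description = header.partition(" ")
    return SequenceRecord(identifier, description, "".join(chunks).upper())


def parse_fasta(lines: Iterable[str]) -> Iterator[SequenceRecord]:
    """Yield one record per '>' header line in FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)


# Made-up example input, for illustration only.
example = """>seq1 hypothetical fragment
ATGGCC
TTAA
>seq2
ggcc
""".splitlines()

records = list(parse_fasta(example))
for record in records:
    print(record.identifier, len(record.sequence))
```

Keeping the identifier, description and sequence as separate fields is a design choice that makes it easy to index or query records later; real databases store far richer annotation.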

Nucleotides and Bioinformatics
Question: What would you do when you discover an unknown DNA fragment in a gel and have had it sequenced?
• Answer: Bioinformatics tools for DNA sequence and genomic analysis
• Vector sequence check: Blast2Evec, NCBI VecScreen, etc.
• Restriction mapping: REBASE, ResMap, Restriction Analysis, etc.
• Design PCR primers: Oligo, Primer3, BioMath, PrimerStation, etc.
• Analyze DNA composition: RepeatMasker, EMBOSS tools such as chips, compseq, etc.
• Identify coding regions and translation: ORF Finder, Genie, Translate tool, Transeq, etc.
• Motif identification: SMART, ProfileScan
• Identification of signals associated with gene regulation: GrailEXP
• Similarity searching to identify a probable functional role: tblastn, megablast, psi-blast (NCBI)
• Genome search to identify similar regions in a wider range of organisms: GenomeScan, SNP (NCBI)
Important Links
• http://restools.sdsc.edu/biotools/biotools16.html
• http://www.humgen.nl/primer_design.html
• http://ccb.ucmerced.edu/app/?id=emboss
• http://www.123genomics.com/files/analysis.html
• http://www.dnalc.org/bioinformatics/dnalc_nucleotide_analyzer.htm
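To make a few of the steps above concrete, here is a minimal sketch of three of them: base composition (GC content), restriction-site lookup, and translation of a coding region. It is not how the listed tools work internally. The function names and the DNA fragment are made up for illustration, and `first_orf` is a deliberate simplification that only reads forward from the first ATG, whereas real ORF finders scan all six reading frames.

```python
from itertools import product

# Standard genetic code, built compactly: codons are enumerated in
# T, C, A, G order and paired with one-letter amino acids ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(seq, site):
    """0-based start positions of every (possibly overlapping) occurrence."""
    seq, site = seq.upper(), site.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def first_orf(seq):
    """Translate from the first ATG to the first in-frame stop codon.

    Returns None if there is no ATG or no in-frame stop codon.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        if aa == "*":
            return "".join(protein)
        protein.append(aa)
    return None


# Made-up fragment containing one EcoRI site (GAATTC) and a short ORF.
fragment = "CCGAATTCATGGCCAAATGATT"
gc = gc_content(fragment)
ecori_sites = find_sites(fragment, "GAATTC")
peptide = first_orf(fragment)
print(round(gc, 3), ecori_sites, peptide)
```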

RNA and Bioinformatics
Question: DNA to RNA
• Analyses you can perform using bioinformatics tools:
• Detect tRNA and tmRNA in a nucleotide sequence: ARAGORN
• RNA secondary structure prediction: RNAView Secondary Structure Viewer, RNAmine, RNA Fold Server
Important Links
• http://www.bioinfo.rpi.edu/applications/mfold/
• http://rnamine.ncrna.org/rnamine/
• http://phmmts.ncrna.org/phmmts/jsp/mainIndex.jsp?pageRef=phmmts
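The "DNA to RNA" step itself can be sketched in a few lines. The functions below are hypothetical helpers written for this illustration; they assume an unambiguous A/C/G/T sequence written 5' to 3' and ignore processing steps such as splicing.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA strand written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe_coding(dna):
    """mRNA for a coding (sense) strand: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def transcribe_template(dna):
    """mRNA for a template (antisense) strand given 5' to 3'."""
    return transcribe_coding(reverse_complement(dna))


# Made-up six-base example: both strands describe the same mRNA.
coding_strand = "ATGGCC"
template_strand = reverse_complement(coding_strand)
mrna_from_coding = transcribe_coding(coding_strand)
mrna_from_template = transcribe_template(template_strand)
print(template_strand, mrna_from_coding, mrna_from_template)
```

Transcribing either strand gives the same mRNA, which is a quick sanity check that the strand orientation has been handled consistently.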

Gene Expression
Facts:
• Every cell of the body contains a full set of chromosomes and identical genes.
• Only a fraction of these genes are turned on, however, and it is the subset that is "expressed" that confers unique properties on each cell type.
• During transcription, information contained within the DNA, the repository of genetic information, is copied into messenger RNA (mRNA) molecules.
• mRNA molecules are then translated into the proteins that perform most of the critical functions of cells.
• Scientists now study the kinds and amounts of mRNA produced by a cell to learn which genes are expressed (using microarrays), which in turn provides insights into how the cell responds to its changing needs.
Question: What would you do to manage huge datasets and analyze microarray images and data?
• Answer, the bioinformatics way: databases to store the data and applications to analyze it
• Management of microarray data: AMAD, BASE
• Image and data analysis: R, Bioconductor, dChip, Affymetrix, ScanAlyze, clustering, etc.
• Data annotation: NetAffx, DAVID, Onto-Express, GenMAPP
Important Links
• http://genome-www5.stanford.edu/
• http://staffa.wi.mit.edu/chipdb/public/
• http://info.med.yale.edu/microarray/data_analysis.htm#keckwks
• http://david.abcc.ncifcrf.gov/
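One common first look at expression data is the log2 fold change between two conditions. The sketch below is only an illustration of that idea, not the method used by any of the packages listed above. The gene names, the intensity values, the pseudocount and the two-fold threshold are all made up or assumed for the example, and real analyses add normalization and statistical testing.

```python
import math


def log2_fold_changes(control, treated, pseudocount=1.0):
    """log2(treated / control) per gene present in both conditions.

    A pseudocount is added to both values to avoid division by zero.
    """
    return {
        gene: math.log2(
            (treated[gene] + pseudocount) / (control[gene] + pseudocount)
        )
        for gene in control
        if gene in treated
    }


def differentially_expressed(fold_changes, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold."""
    return sorted(
        gene for gene, change in fold_changes.items()
        if abs(change) >= threshold
    )


# Made-up intensities for three hypothetical genes.
control = {"geneA": 99.0, "geneB": 49.0, "geneC": 199.0}
treated = {"geneA": 399.0, "geneB": 49.0, "geneC": 49.0}

changes = log2_fold_changes(control, treated)
hits = differentially_expressed(changes)
print(changes, hits)
```

Working on a log2 scale makes up- and down-regulation symmetric: a four-fold increase and a four-fold decrease become +2 and -2.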

Protein and Bioinformatics
Facts:
• Amino acids are strung together in particular sequences that fold up into a specific structure.
• Each protein is a nanomachine that can perform a particular task.
• By understanding how different proteins fold up and how they work, we can begin to understand how they work together to make up a cell or play a role in disease.
> mdlsavriqe vqnvlhamqk ilecpiclel ikepvstqcd hifckfcmlk llnqkkgpsq cplckneitk rslqgsarfs qlveellkii dafeldtgmq cangfsfskk knsssellne dasiiqsvgy rnrvkklqqi esgsatlkds lsvqlsnlgi vrsmkknrqt qpqnksvyia lesdsseerv napdgcsvrd qelfqiapgg agdegklnsa kkaacdfseg
• Question: What would you do when you discover an unknown protein band in a 2-D gel?
• Answer: Bioinformatics tools for protein and proteomic analysis
• Determine the mass and molecular weight: Compute pI/Mw, ProFound, SearchXlinks, Mascot, X-proteo, etc.
• Primary sequence analysis: blastp (NCBI), etc.
• Multiple alignment to identify the conserved regions shared by a set of sequences selected from the BLAST results: ClustalW, MultAlin, etc.
• Pattern search using the conserved regions to suggest a probable functional role: Prosite, Pfam, PRINTS, InterPro (ExPASy), etc.
• Post-translational modification prediction: ChloroP, LipoP, PATS, SignalP, etc.
• Secondary structure prediction/threading: PredictProtein, GOR, Jnet, Threader, GenTHREADER, etc.
• Homology modelling: DALI, Modeller, SCWRL, etc.
• Visualization: RasMol, Swiss-PDB Viewer, etc.
• Molecular dynamics and quantum mechanics of structure for real-life prediction: Amber, Gaussian
• Phylogenetic tree construction: Phylip
Important links
• http://expasy.org/
• http://www.compbio.dundee.ac.uk/
• http://www.ebi.ac.uk/
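The pattern-search step can be illustrated by converting a PROSITE-style pattern into a regular expression and scanning a sequence with it. This is a minimal sketch that covers only the basic pattern elements (single residues, `x`, `[...]`, `{...}`, repeat counts and the `<` / `>` anchors); it is not the PROSITE scanning software. The function names and the short test sequence are made up, and `N-{P}-[ST]-{P}` is used as the example because it is the well-known N-glycosylation site pattern.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden},
# optionally followed by a repeat count such as (2) or (2,4).
ELEMENT = re.compile(
    r"(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?"
)


def prosite_to_regex(pattern):
    """Convert a basic PROSITE-style pattern to a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        match = ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError("unsupported pattern element: " + element)
        core = match.group("core")
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        repeat = ""
        if match.group("lo"):
            repeat = "{" + match.group("lo")
            if match.group("hi"):
                repeat += "," + match.group("hi")
            repeat += "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(sequence, pattern):
    """(0-based start, matched text) for every hit, overlaps included."""
    # A lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Made-up peptide: one site matches, the second N is followed by P.
glyco_regex = prosite_to_regex("N-{P}-[ST]-{P}")
hits = find_motif("mkngsaanptw", "N-{P}-[ST]-{P}")
print(glyco_regex, hits)
```

The same `find_motif` call can be applied to the sequence shown on the slide once the spaces between its ten-residue blocks are removed.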

The underlying assumption used ...
• Mapping protein formation of a novel sequence to:
• infer cellular metabolism
• infer probable evolutionary trends, both past and future
• develop near-perfect disease-controlling and preventive therapeutics

A Short List of Bioinformatics Databases
Ref: http://www.biw.kuleuven.be/vakken/i287/bioinformatica.htm

The most amazing question: What is Bioinformatics?

Answer: Bioinformatics now occupies a fundamental role in modern biology, chemistry, genetics and systems biology, enabling and accelerating the path to biological discoveries and the understanding of systems. (Ref: http://bioinformatics.ubc.ca)
Or: using computational systems, software applications and database solutions.
Ref: http://bioinformatics.ubc.ca/research/talks/archive/LIBR534_061003_jfox.pdf

Important Links
• http://bioinformatics.ubc.ca/about/what_is_bioinformatics
• http://binf.gmu.edu/websites.html
• http://bioinformatics.unr.edu/seqbx/tutorials.htm
• http://www.sacs.ucsf.edu/Resources/biolinks.html
• http://www.biw.kuleuven.be/vakken/i287/bioinformatica.htm#Primary%20DB
