Bioinformatics at Promega Corporation
Bioinformatics at Promega Corporation. Intro to Bioinformatics Biotec May 4, 2006 Ethan Strauss Sr. Scientist R&D Bioinformatics, Promega, [email protected] http://q7.com/~ethan/molbio. My Background. Bachelor’s degree in biology PhD and work experience in Molecular Biology

Bioinformatics at Promega Corporation

Intro to Bioinformatics Biotec

May 4, 2006

Ethan Strauss

Sr. Scientist R&D Bioinformatics,

Promega,

[email protected]

http://q7.com/~ethan/molbio


My Background

  • Bachelor’s degree in biology

  • PhD and work experience in Molecular Biology

  • Eight years in Promega Technical Services

  • Almost a year in Bioinformatics (officially)

  • No formal computer training

  • No formal bioinformatics training


Bioinformatics at Promega Corporation

  • Bioinformatics did not exist as a separate function until 2001

    • One person, 2001-2005

    • Two people, 2005-?

  • Bioinformatics supports primarily R&D (~100 scientists)

    • Mentor and train R&D scientists

    • Provide expertise for projects (~120 requests per year)

    • Propose and evaluate new acquisitions

    • Liaison to IT department

    • Manage bioinformatics infrastructure (~15 tools)

    • Develop new tools and adapt existing tools in house


Bioinformatics Projects

  • Programming

    • Tools for internal and external Promega customers

      • Plexor™ Primer Design System

      • Biomath

      • siRNA Designer

      • Sequence analysis for Excel and Microsoft Word

      • Analysis of BLAST results

      • Automated data retrieval (Web services)

      • Database for tracking vector construction

      • Database for keeping track of plasmid features

      • Laboratory Information Management System (LIMS)

      • Chemical Database


Bioinformatics Projects

  • Biocomputing (use of computers in biological research)

  • Database searches

  • Data mining

  • Discovery research

  • Analysis & in silico design of nucleic acid and protein sequence

  • Molecular visualization

  • Modeling

  • Simulation (proteins, ligands)


Programming

  • Tools for Promega customers

    • Biomath (http://www.promega.com/biomath/)

      • Basic calculations (most can be done easily by hand)

      • Simple code (JavaScript)

      • Established theory

      • Universal (not Promega specific)

    • siRNA Designer (http://www.promega.com/siRNADesigner/)

      • Complex calculations

      • More complex code (VBScript)

      • Rapidly evolving theory

      • Partially Promega specific
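
To illustrate the kind of basic calculation the Biomath bullets describe, here is a minimal Python sketch that converts a mass of double-stranded DNA to picomoles. The function name and the 660 g/mol average mass per base pair (a commonly used approximation) are assumptions for illustration. This is not code from the Biomath site.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair; commonly used approximation (assumption)


def dsdna_ug_to_pmol(micrograms: float, length_bp: int) -> float:
    """Convert micrograms of double-stranded DNA to picomoles of molecules."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    # 1 ug = 1e6 pg, and a molecule of N bp weighs about N * 660 pg per pmol.
    return micrograms * 1e6 / (length_bp * AVG_BP_MASS)
```

For example, `dsdna_ug_to_pmol(1.0, 1000)` gives roughly 1.5 pmol, which is easy to confirm by hand.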


Programming

  • Tools for Promega customers

    • Plexor Primer Design (https://www.promega.com/techserv/tools/plexor)

      • Complex calculations

      • Complex code (C#.Net)

        • Separate user interface and main calculations

        • Multiple interacting modules

        • Database integration

        • Integration with GenBank (through a web service)

      • Proprietary improvements on established theory

      • Very Promega specific


Programming

  • Tools for internal use

    • BLAST analysis of Plexor Primers

      • Primer specificity is important

      • BLAST can determine specificity, but output is very complex.

      • Simplify

        • Combine all hits from the same “Gene”

        • Only show hits which could mis-prime

        • Group hits by species

        • Allow sorting by species
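
The simplification steps above could be sketched roughly as follows in Python. The `Hit` record, its field names, and the mis-priming rule (the alignment reaches the primer's 3' end with at most two mismatches) are all hypothetical choices made for illustration. The slides do not describe the criteria the in-house tool actually uses.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit for a primer (hypothetical, simplified record)."""
    gene: str        # gene label assigned to the hit
    species: str
    primer_len: int  # length of the query primer
    match_end: int   # 1-based last primer position covered by the alignment
    mismatches: int


def could_misprime(hit: Hit, max_mismatches: int = 2) -> bool:
    """Assumed rule: hit covers the primer's 3' end with few mismatches."""
    return hit.match_end == hit.primer_len and hit.mismatches <= max_mismatches


def summarize(hits: list[Hit]) -> dict[str, list[str]]:
    """Return {species: sorted unique genes} for hits that could mis-prime."""
    by_species: dict[str, set[str]] = defaultdict(set)
    for hit in hits:
        if could_misprime(hit):
            # A set merges all hits that belong to the same gene.
            by_species[hit.species].add(hit.gene)
    # Sort species alphabetically, and genes within each species.
    return {sp: sorted(genes) for sp, genes in sorted(by_species.items())}
```

Collapsing hits to one entry per gene and dropping hits that cannot prime is what turns roughly 30 pages of raw output into a short report.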


Programming

  • Tools for internal use

    • BLAST analysis of Plexor Primers

[Figure: initial BLAST results (1 page out of ~30)]

[Figure: analyzed BLAST results (complete!)]


Programming

  • Tools for internal use

    • Vector/Insert Database

      • Promega’s Flexi vector system has a very structured cloning procedure.

      • R&D has been making many different Flexi vector backbones with many inserts.

      • Keeping track has been a problem.

      • A database is in development


Programming

  • Tools for internal use


Programming

  • Internal Projects

    • Which restriction enzyme cuts least frequently in human ORFs?

      • Method:

        • Download human RefSeq database (ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/)

        • Load into local database

        • Scan each sequence for each RE site

          • The scan took 2-3 hours to complete

http://www.promega.com/pnotes/89/12416_11/12416_11.pdf
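
The scanning step in the method above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the three enzymes are arbitrary examples with palindromic sites, the ranking metric (number of ORFs cut at least once) is a guess at what "least frequently" means, and the sequences come from a plain dictionary rather than a local RefSeq database.

```python
from __future__ import annotations

# Recognition sequences (all palindromic, so scanning one strand is enough).
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
}


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of site in seq, including overlapping ones."""
    seq, site = seq.upper(), site.upper()
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def rank_enzymes(orfs: dict[str, str]) -> list[tuple[str, int]]:
    """Return (enzyme, number of ORFs cut at least once), rarest cutter first."""
    totals = {
        enzyme: sum(1 for seq in orfs.values() if count_sites(seq, site) > 0)
        for enzyme, site in SITES.items()
    }
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))
```

Non-palindromic sites would also need a scan of the reverse complement, which this sketch leaves out.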


Programming

  • Internal Projects

    • Which human genes in GenBank are the most “popular”?

      • Method:

        • Download “Gene” database (ftp://ftp.ncbi.nlm.nih.gov/gene/)

        • Download Gene Ontology information (http://www.geneontology.org/)

        • Use web services to get pathway information from KEGG (http://www.genome.jp/kegg/)

        • Use web services to get citation information from PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed)

        • Load all into local database

        • Rank genes by desired criteria

          • Size

          • Function

          • Localization

          • Pathways

          • Publications
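
The last two steps (load into a local database, then rank) might look something like this sketch, which uses Python's built-in `sqlite3` module. The table layout, column names, and ranking order are assumptions made for illustration. The slides do not say which database engine or schema was used.

```python
import sqlite3


def build_db(rows):
    """Load rows of (symbol, length_bp, n_pathways, n_publications)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene (symbol TEXT PRIMARY KEY, length_bp INTEGER, "
        "n_pathways INTEGER, n_publications INTEGER)"
    )
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", rows)
    return conn


def top_genes(conn, limit=10, max_length_bp=None):
    """Rank by publication count, then pathway count; optionally cap size."""
    query = "SELECT symbol FROM gene"
    params = []
    if max_length_bp is not None:
        query += " WHERE length_bp <= ?"
        params.append(max_length_bp)
    query += " ORDER BY n_publications DESC, n_pathways DESC, symbol LIMIT ?"
    params.append(limit)
    return [symbol for (symbol,) in conn.execute(query, params)]
```

Once everything sits in one table, changing the "desired criteria" only means changing the `WHERE` and `ORDER BY` clauses.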


Database searches and data mining

Question: Can you reformat this sequence for me?
Tool: ReadSeq (http://bimas.dcrt.nih.gov/molbio/readseq) and macros

Question: How many viral proteins start with MetHis?
Tool: Hits database and motif searches (http://hits.isb-sib.ch/)

Question: How many different bacterial two-domain proteins are known?
Tool: SCOP database (http://scop.berkeley.edu/)

Question: How do I design PCR primers selective for bacterial species X?
Tool: Ribosomal database 16S rRNA alignment (http://rdp.cme.msu.edu)
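
For small local datasets, a question like the MetHis one can also be answered with a few lines of code. The following Python sketch parses FASTA text and lists the records whose sequence starts with a given prefix. The function names are hypothetical, and this is not how the Hits database performs motif searches.

```python
from __future__ import annotations


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {header: sequence}."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            records[header].append(line)
    return {name: "".join(parts).upper() for name, parts in records.items()}


def starts_with(records: dict[str, str], prefix: str = "MH") -> list[str]:
    """Return sorted headers of sequences that begin with prefix."""
    return sorted(name for name, seq in records.items() if seq.startswith(prefix))
```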


In silico design – RNA sequences

Goal: Design RNA sequence that folds into specific structure

(specific structure provides desired function)

Tools: mfold (Michael Zuker) http://www.bioinfo.rpi.edu/~zukerm/

Vienna RNA Package http://www.tbi.univie.ac.at/
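
To give a feel for what a folding program computes, here is a minimal Python sketch of Nussinov base-pair maximization. This is a deliberately simpler, swapped-in technique: mfold and the Vienna RNA Package predict structures by minimizing free energy, which this sketch does not attempt. The function name and the default minimum hairpin loop of three bases are assumptions.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov dynamic programming: maximum number of nested base pairs.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count for the subsequence i..j.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):          # span = j - i
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]               # case 1: base j is unpaired
            for k in range(i, j - min_loop):     # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

A design loop could score candidate sequences this way as a crude first filter before handing the survivors to a thermodynamic tool.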


In silico design – DNA sequences

Goal: Express protein of interest in E. coli cells – fastest way

Steps: Obtain protein or DNA sequence from database

Optimize codon usage for expression in E. coli

Match restriction enzyme sites to expression vector

Send DNA sequence for synthesis (cost ~$1/base)

Tools: NCBI database http://www.ncbi.nlm.nih.gov

Codon usage database http://www.kazusa.or.jp/codon/

Restriction enzyme database http://rebase.neb.com/rebase/rebase.html

Sequence analysis software
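
The codon optimization and restriction site steps can be sketched as follows in Python. The one-codon-per-amino-acid table is an illustrative assumption and is not taken from the Codon Usage Database. A real design would use measured codon frequencies for the expression host, and the function names are hypothetical.

```python
from __future__ import annotations

# One preferred codon per amino acid. Illustrative picks only (assumption);
# take real frequencies from a codon usage table for the target organism.
PREFERRED_CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def back_translate(protein: str, stop: str = "TAA") -> str:
    """Back-translate a protein into DNA using one codon per amino acid."""
    protein = protein.upper()
    unknown = set(protein) - set(PREFERRED_CODON)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return "".join(PREFERRED_CODON[aa] for aa in protein) + stop


def internal_sites(dna: str, sites: dict[str, str]) -> list[str]:
    """Names of enzymes whose recognition site occurs inside the designed DNA."""
    dna = dna.upper()
    return sorted(name for name, site in sites.items() if site.upper() in dna)
```

Any enzyme reported by `internal_sites` would cut inside the gene, so it cannot be used for cloning into the vector unless a synonymous codon is swapped in to remove the site.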


In silico design – reporter gene

Goal: Design optimal DNA sequence coding for reporter protein

(maximize expression and minimize unintended regulation)


In silico design – reporter genes

Tools:

Optimize codon usage:

Codon Usage DB http://www.kazusa.or.jp/codon/

INCA http://www.bioinfo-hr.org/inca/

Identify & remove regulatory sites:

TRANSFAC DB http://www.biobase.de/

TESS http://www.cbil.upenn.edu/tess/

Genomatix tools http://www.genomatix.de

Others

hRluc

Expression: up 10x

Background: down 10x

Non-specific regulation: lower


Visualization – molecular system of interest

Goal: Visualize molecule of interest (blue) and interaction partners

Tools: World Index of Molecular Visualization Resources

http://molvis.sdsc.edu/visres/index.html


Modeling – protein fold

Goal: 3D structure model of enzyme => location of N/C termini => find active site => other

Tools: NCBI BLink http://www.ncbi.nlm.nih.gov/

Protein Data Bank http://www.rcsb.org/pdb

SwissModel http://swissmodel.expasy.org/

WHAT IF http://swift.cmbi.ru.nl/whatif/

InsightII Modeler http://www.accelrys.com/insight

Unknown 3D structure: Renilla luciferase

Homologue with known 3D structure: Hydrolase

sequence identity: 36%
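
A sequence identity figure like the one above is computed from a pairwise alignment. The Python sketch below shows one common convention, assumed here: count identical residues over the columns where both sequences have a residue. Tools differ in the denominator they use, and the slides do not say which convention produced the 36% value.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Compare only columns where both sequences have a residue (assumed convention).
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```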


Modeling – protein engineering

  • Goal: Alter catalytic activity of enzyme

    => predict structural effects of different point mutations

[Figure panels: mutation disrupts structure | mutation does not disrupt structure]

Tools: InsightII Modeler http://www.accelrys.com/insight/


Modeling – protein engineering

Goal: Improve substrate binding rate of enzyme

=> identify specific amino acids to mutate

[Figure panels: constricted binding tunnel | open binding tunnel (mutant)]

Tools: InsightII Modeler http://www.accelrys.com/insight/


Modeling – substrate engineering

Goal: Find better substrate for enzyme

=> analyze geometric constraints of substrate binding pocket

Tools: Hetero-compound Info Center http://alpha2.bmc.uu.se/hicup/

InsightII Modeler http://www.accelrys.com/insight/



LIMS – Laboratory Information Management System

  • Goal: Manage in-house DNA sequences and associated data

  • Eval: UW-Madison Center for Eukaryotic Structural Genomics

  • Sesame http://www.sesame.wisc.edu/

    • “…Sesame is designed to organize and record data relevant to complex scientific projects, to launch computer-controlled processes, and to help decide about subsequent steps on the basis of information available. The Sesame system is based on the multi-tier paradigm, and it consists of a framework and application modules that carry out specific tasks. Users interact with Sesame through a series of web-based Java applet-applications designed to organize data. It allows collaborators on a given project to enter, process, view, and extract relevant data, regardless of location, so long as web access is available. Data reside in an Oracle relational database. Sesame serves as a digital laboratory notebook and allows users to attach numerous files and images…”


Bioinformatics Advice

  • Be aware of bias in databases!

    • Search GenBank (nucleotide) for Human[Organism] apoptosis. How many hits?

    • Now try Orcinus[Organism] apoptosis. How many hits?

    • Can you conclude that Orcinus does not have apoptosis?
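
The two searches above can also be run programmatically to compare hit counts. The Python sketch below builds a query URL for NCBI's E-utilities ESearch service and reads the `Count` element from the XML reply. The helper names and the choice of `nuccore` as the nucleotide database name are assumptions, and `hit_count` needs network access. No actual counts are shown here because they change over time.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nuccore") -> str:
    """Build an ESearch URL that asks only for the hit count (retmax=0)."""
    query = urllib.parse.urlencode({"db": db, "term": term, "retmax": 0})
    return f"{ESEARCH}?{query}"


def parse_count(xml_text: str) -> int:
    """Extract the total hit count from an ESearch XML response."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def hit_count(term: str, db: str = "nuccore") -> int:
    """Fetch the live hit count (needs network access)."""
    with urllib.request.urlopen(esearch_url(term, db)) as response:
        return parse_count(response.read().decode("utf-8"))
```

Comparing `hit_count("Human[Organism] apoptosis")` with `hit_count("Orcinus[Organism] apoptosis")` shows how unevenly species are represented. A low count reflects how much a species has been studied, not its biology.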


Bioinformatics Advice

  • Bioinformatics is changing and advancing very rapidly.

    • Don’t forget to notice what is new.

      • NCBI now has ~20 different databases. They had only two 3-5 years ago.

    • If you want to do something that you know can’t be done, check again in two weeks!

      • My standard computer can process the entire human genome for restriction sites, ORFs, etc. in a few hours. Not long ago, the best computers couldn’t even hold that much data!

    • If old tools work, don’t feel you need to use the newest tools.

      • I still do much of my analysis with Microsoft Word…

