
Bioinformatics at Promega Corporation

Intro to Bioinformatics Biotec

May 4, 2006

Ethan Strauss

Sr. Scientist, R&D Bioinformatics, Promega

[email protected]

My Background

  • Bachelor’s degree in biology

  • PhD and work experience in Molecular Biology

  • Eight years in Promega Technical Services

  • Almost a year in Bioinformatics (officially)

  • No formal computer training

  • No formal bioinformatics training

Bioinformatics at Promega Corporation

  • Bioinformatics did not exist as a separate function until 2001

    • One person 2001-2005

    • Two people 2005-?

  • Bioinformatics supports primarily R&D (~100 scientists)

    • Mentor and train R&D scientists

    • Provide expertise for projects (~120 requests per year)

    • Propose and evaluate new acquisitions

    • Liaison to IT department

    • Manage bioinformatics infrastructure (~15 tools)

    • Develop new tools and adapt existing tools in house

Bioinformatics Projects

  • Programming

    • Tools for internal and external Promega customers

      • Plexor™ Primer Design System

      • Biomath

      • siRNA Designer

      • Sequence analysis for Excel and Microsoft Word

      • Analysis of BLAST results

      • Automated data retrieval (Web services)

      • Database for tracking vector construction

      • Database for keeping track of plasmid features

      • Laboratory Information Management System (LIMS)

      • Chemical Database

Bioinformatics Projects

  • Biocomputing (use of computers in biological research)

  • Database searches

  • Data mining

  • Discovery research

  • Analysis & in silico design of nucleic acid and protein sequence

  • Molecular visualization

  • Modeling

  • Simulation (proteins, ligands)


  • Tools for Promega customers

    • Biomath (

      • Basic calculations (Most can be done easily by hand)

      • Simple code (Javascript)

      • Established theory.

      • Universal (not Promega specific)

    • siRNA Designer( )

      • Complex calculations

      • More complex code (VBScript)

      • Rapidly evolving theory

      • Partially Promega specific
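As an illustration of the "basic calculations, established theory" category, here is a minimal Python sketch of one such calculation: converting a mass of double-stranded DNA to picomoles. It assumes the commonly used average of about 660 g/mol per base pair. The function name is hypothetical, and this is not the Biomath code itself (which the slide describes as JavaScript).

```python
# Approximate average molar mass of one base pair of dsDNA (g/mol).
# This is a widely used rule-of-thumb value, not an exact constant.
AVG_BP_MASS = 660.0


def dsdna_ug_to_pmol(micrograms: float, length_bp: int) -> float:
    """Convert micrograms of dsDNA of a given length to picomoles.

    pmol = ug * 1e6 pg/ug / (660 pg/pmol per bp * length in bp)
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be a positive integer")
    return micrograms * 1e6 / (AVG_BP_MASS * length_bp)


if __name__ == "__main__":
    # One microgram of a 1000 bp fragment
    print(round(dsdna_ug_to_pmol(1.0, 1000), 2))
```

The calculation is easy to do by hand, which matches the slide's point: the value of such a tool is convenience, not complexity.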


  • Tools for Promega customers

    • Plexor Primer Design (

      • Complex calculations

      • Complex code (C#.Net)

        • Separate user interface and main calculations

        • Multiple interacting modules

        • Database integration

        • Integration with Genbank (through a web service)

      • Proprietary improvements on established theory

      • Very Promega specific


  • Tools for internal use

    • BLAST analysis of Plexor Primers

      • Primer specificity is important

      • BLAST can determine specificity, but output is very complex.

      • Simplify

        • Combine all hits from the same “Gene”

        • Only show hits which could mis-prime

        • Group hits by species

        • Allow sorting by species
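The simplification steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Hit` record and its fields are hypothetical, and the mis-priming rule used here (the alignment reaches the primer's 3' end and has few mismatches) is an assumed stand-in, since the slides do not give the actual criterion.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit for a primer (hypothetical record layout)."""
    species: str
    gene: str
    primer_length: int
    align_end: int   # last primer position (1-based) covered by the alignment
    mismatches: int


def could_misprime(hit: Hit, max_mismatches: int = 2) -> bool:
    # Assumed rule: the 3' end of the primer must be aligned and the
    # number of mismatches must be small.
    return hit.align_end == hit.primer_length and hit.mismatches <= max_mismatches


def summarize(hits, max_mismatches: int = 2):
    """Collapse hits per (species, gene), keep possible mis-priming hits,
    and return them grouped and sorted by species."""
    best = {}
    for hit in hits:
        if not could_misprime(hit, max_mismatches):
            continue
        key = (hit.species, hit.gene)
        # Combine all hits from the same gene: keep the closest match.
        if key not in best or hit.mismatches < best[key].mismatches:
            best[key] = hit
    by_species = defaultdict(list)
    for (species, gene), hit in best.items():
        by_species[species].append((gene, hit.mismatches))
    return {sp: sorted(genes) for sp, genes in sorted(by_species.items())}


if __name__ == "__main__":
    demo = [
        Hit("Homo sapiens", "GAPDH", 20, 20, 1),
        Hit("Homo sapiens", "GAPDH", 20, 20, 0),
        Hit("Homo sapiens", "ACTB", 20, 15, 0),   # 3' end not aligned
        Hit("Mus musculus", "Gapdh", 20, 20, 2),
    ]
    for species, genes in summarize(demo).items():
        print(species, genes)
```

Collapsing to one line per gene and species is what turns a report of roughly 30 pages into something that fits on one.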


Initial BLAST results (1 page out of ~30)

  • Tools for internal use

    • BLAST analysis of Plexor Primers

Analyzed BLAST results (complete!)


  • Tools for internal use

    • Vector/Insert Database

      • Promega’s Flexi vector system has a very structured cloning procedure.

      • R&D has been making many different Flexi vector backbones with many inserts.

      • Keeping track has been a problem.

      • A database is in development


  • Tools for internal use


  • Internal Projects

    • Which Restriction enzyme cuts least frequently in human ORFs?

      • Method:

        • Download human Refseq database (

        • Load into local database

        • Scan each sequence for each RE site

          • The scan took 2-3 hours to complete
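The scanning step can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the project: it assumes the ORF sequences are already in memory as a dictionary (the slide loads them into a local database), and it checks only three well-known enzymes. Because all three recognition sites are palindromic, scanning one strand is enough.

```python
# Recognition sites for three common enzymes. All three are palindromic,
# so a site on the reverse strand appears as the same string on this strand.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "NotI": "GCGGCCGC",
}


def rank_enzymes(orfs, sites=SITES):
    """Count how many ORFs each enzyme cuts at least once.

    `orfs` maps an identifier to a DNA sequence. Returns (enzyme, count)
    pairs sorted from least to most frequent cutter.
    """
    orfs_cut = {name: 0 for name in sites}
    for seq in orfs.values():
        seq = seq.upper()
        for name, site in sites.items():
            if site in seq:
                orfs_cut[name] += 1
    return sorted(orfs_cut.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Toy sequences for illustration only, not real RefSeq entries.
    demo = {
        "orf_a": "ATGGAATTCTAA",
        "orf_b": "ATGGGATCCGAATTCTAA",
        "orf_c": "ATGAAATAA",
    }
    for enzyme, count in rank_enzymes(demo):
        print(enzyme, count)
```

Enzymes with longer recognition sites, such as NotI, would be expected to cut fewer ORFs simply because an 8-base site occurs by chance less often than a 6-base one.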


  • Internal Projects

    • Which human genes in Genbank are the most “popular”?

      • Method

        • Download “Gene” database (

        • Download Gene Ontology information (

        • Use web services to get pathway information from KEGG (

        • Use web services to get citation information from Pubmed (

        • Load all into local database

        • Rank genes by desired criteria

          • Size

          • Function

          • Localization

          • Pathways

          • Publications
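The last two steps, loading into a local database and ranking, might look like the following Python sketch using the standard-library `sqlite3` module. The table layout, column names, and gene symbols are hypothetical placeholders; the slides do not say which database engine or schema was used, and the toy rows are not real data.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory table of genes (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene ("
        " symbol TEXT PRIMARY KEY,"
        " orf_length INTEGER,"
        " pathways INTEGER,"
        " publications INTEGER)"
    )
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", rows)
    return conn


def rank_genes(conn, max_orf_length, limit=10):
    """Rank genes below a size cutoff by publication count, then pathways."""
    cursor = conn.execute(
        "SELECT symbol FROM gene"
        " WHERE orf_length <= ?"
        " ORDER BY publications DESC, pathways DESC, symbol"
        " LIMIT ?",
        (max_orf_length, limit),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    # Placeholder rows: (symbol, ORF length, pathway count, publication count)
    toy_rows = [
        ("GENE_A", 1200, 3, 50),
        ("GENE_B", 900, 1, 200),
        ("GENE_C", 5000, 4, 900),
    ]
    print(rank_genes(build_db(toy_rows), max_orf_length=3000))
```

Once everything sits in one table, changing the ranking criteria is a matter of editing the `WHERE` and `ORDER BY` clauses.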

Database searches and data mining

Question: Can you reformat this sequence for me?
Tool: ReadSeq & Macros

Question: How many viral proteins start with MetHis?
Tool: Hits database & motif searches

Question: How many different bacterial two-domain proteins are known?
Tool: SCOP database

Question: How do I design PCR primers selective for bacterial species X?
Tool: Ribosomal database 16S rRNA alignment:

In silico design – RNA sequences

Goal: Design RNA sequence that folds into specific structure

(specific structure provides desired function)

Tools: mfold (Michael Zuker)

Vienna RNA Package

In silico design – DNA sequences

Goal: Express protein of interest in E. coli cells – fastest way

Steps: Obtain protein or DNA sequence from database

Optimize codon usage for expression in E. coli

Match restriction enzyme sites to expression vector

Send DNA sequence for synthesis (cost ~$1/base)

Tools: NCBI database

Codon usage database

Restriction enzyme database

Sequence analysis software
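The codon-optimization and restriction-site steps can be sketched in Python as below. This is a simplified illustration under stated assumptions: it picks a single codon per amino acid, and the table is an illustrative set of codons often cited as frequent in E. coli, not values taken from the codon usage database named above. A real design would use measured frequencies and would also avoid repeats and unwanted secondary structure. The function names are hypothetical.

```python
# One codon per amino acid. Illustrative choices only; a real design would
# take codon frequencies from a codon usage database.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(protein: str, stop_codon: str = "TAA") -> str:
    """Turn a protein sequence into DNA using one preferred codon per residue."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper()) + stop_codon


def internal_sites(dna: str, sites: dict) -> dict:
    """Report 0-based positions of cloning sites that occur inside the insert.

    Any site found here would be cut during cloning, so the codons around it
    need to be changed before the sequence is sent for synthesis.
    """
    found = {}
    for name, site in sites.items():
        positions = [
            i for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)
        ]
        if positions:
            found[name] = positions
    return found


if __name__ == "__main__":
    insert = back_translate("MEF")
    print(insert)
    print(internal_sites(insert, {"EcoRI": "GAATTC", "BamHI": "GGATCC"}))
```

Checking for internal sites after back-translation matters because codon choices can create a recognition site that the original sequence did not contain.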

In silico design – reporter gene

Goal: Design optimal DNA sequence coding for reporter protein

(maximize expression and minimize unintended regulation)

In silico design – reporter genes


Optimize codon usage:

Codon Usage DB


Identify & remove regulatory sites:



Genomatix tools



Expression: up 10x

Background: down 10x

Non-specific regulation: lower

Visualization – molecular system of interest

Goal: Visualize molecule of interest (blue) and interaction partners

Tools: World Index of Molecular Visualization Resources

Modeling – protein fold

Goal: 3D structure model of enzyme => location of N/C termini => find active site => other

Tools: NCBI BLink

Protein Data Bank



InsightII Modeler

Unknown 3D structure: Renilla luciferase

Homologue with known 3D structure: Hydrolase

sequence identity: 36%

Modeling – protein engineering

  • Goal: Alter catalytic activity of enzyme

    => predict structural effects of different point mutations

Figure panels: mutation disrupts structure; mutation does not disrupt structure

Tools: InsightII Modeler

Modeling – protein engineering

Goal: Improve substrate binding rate of enzyme

=> identify specific amino acids to mutate

Figure panels: constricted binding tunnel; open binding tunnel (mutant)

Tools: InsightII Modeler

Modeling – substrate engineering

Goal: Find better substrate for enzyme

=> analyze geometric constraints of substrate binding pocket

Tools: Hetero-compound Info Center

InsightII Modeler

LIMS – Laboratory Information Management System

  • Goal: Manage in-house DNA sequences and associated data

  • Eval: UW-Madison Center for Eukaryotic Structural Genomics

  • Sesame

    • “…Sesame is designed to organize and record data relevant to complex scientific projects, to launch computer-controlled processes, and to help decide about subsequent steps on the basis of information available. The Sesame system is based on the multi-tier paradigm, and it consists of a framework and application modules that carry out specific tasks. Users interact with Sesame through a series of web-based Java applet-applications designed to organize data. It allows collaborators on a given project to enter, process, view, and extract relevant data, regardless of location, so long as web access is available. Data reside in an Oracle relational database. Sesame serves as a digital laboratory notebook and allows users to attach numerous files and images…”

Bioinformatics Advice

  • Be aware of bias in databases!

    • Search Genbank (nucleotide) for Human[Organism] apoptosis. How many hits?

    • Now try Orcinus[Organism] apoptosis. How many hits?

    • Can you conclude that Orcinus does not have apoptosis?
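The two searches can also be run programmatically. The sketch below uses the NCBI E-utilities `esearch` endpoint with only the Python standard library; the helper names are hypothetical, and the `nuccore` database name is an assumption about which Entrez database corresponds to "Genbank (nucleotide)" on the slide. No counts are shown here because they change as the databases grow.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nuccore") -> str:
    """Build an esearch URL that asks only for the number of matching records."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "rettype": "count"})


def parse_count(xml_data) -> int:
    """Extract the <Count> value from an esearch XML response (str or bytes)."""
    return int(ET.fromstring(xml_data).findtext("Count"))


def hit_count(term: str, db: str = "nuccore") -> int:
    """Query NCBI and return the hit count. Requires network access."""
    with urlopen(esearch_url(term, db)) as response:
        return parse_count(response.read())


if __name__ == "__main__":
    for query in ("Human[Organism] apoptosis", "Orcinus[Organism] apoptosis"):
        print(query, hit_count(query))
```

A large gap between the two counts says more about how much each organism has been sequenced and studied than about its biology, which is the bias the slide warns about.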

Bioinformatics Advice

  • Bioinformatics is changing and advancing very rapidly.

    • Don’t forget to notice what is new.

      • NCBI now has ~20 different databases. It had only two 3-5 years ago.

    • If you want to do something that you know can’t be done, check again in two weeks!

      • My standard computer can process the entire human genome for restriction sites, ORFs, etc. in a few hours. Not long ago, the best computers couldn’t even hold that much data!

    • If old tools work, don’t feel you need to use the newest tools.

      • I still do much of my analysis with Microsoft Word…