Pavel Morozov, March 3

Legionella Functional Genomics Project.


(Figure: Legionella pneumophila infection cycle: modulation of host-cell gene expression; adhesion, invasion; inhibition of lysosome fusion; evasion; recruitment of ER; replication.)

Legionella pneumophila

  • An intracellular pathogen that can invade and replicate inside human macrophages and causes Legionnaires' disease, a potentially fatal human infection.

  • Transmitted by inhaling mist droplets containing the bacteria.

  • Has an extraordinary ability to survive in many different ecological niches (axenic cultures, biofilms with other organisms, and intracellular vacuoles of amoebae, ciliates, and human cells).

  • To replicate, Legionella must be inside protozoa (amoebae such as Acanthamoeba), which are single-cell eukaryotes, or inside human lung macrophages or monocytes.



Complete genome of Legionella pneumophila (strain Philadelphia 1)

(Figure: Legionella pneumophila (strain Philadelphia 1) genome map with tracks for genes on the direct strand, genes on the reverse strand, C+G content, and GC skew; highlighted examples include region 3 (efflux genes) and region 7 (tra/trb region, F-plasmid).)

The highlighted regions stand out because their G+C content and GC skew differ from the genome average and their ORFs show a skewed strand preference. These computationally identified regions turn out to contain gene clusters that belong to specific categories (e.g., a ribosomal protein cluster), or that correspond to points of genome rearrangement or to segments acquired by horizontal transfer. Some examples are shown in more detail below.
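Both statistics behind the highlighted regions are easy to compute per window. Below is a minimal Python sketch, assuming GC content = (G + C) / window length and the usual GC skew definition (G - C) / (G + C). The window size, step, and z-score cutoff in the hypothetical `flag_atypical` helper are illustrative assumptions, not the parameters used for the map, and the ORF strand-preference criterion is not modeled.

```python
def gc_profile(seq, window=1000, step=500):
    """Return (start, gc_content, gc_skew) for each full sliding window.

    gc_content = (G + C) / window
    gc_skew    = (G - C) / (G + C), or 0.0 if the window has no G or C
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        content = (g + c) / window
        skew = (g - c) / (g + c) if g + c else 0.0
        profile.append((start, content, skew))
    return profile


def flag_atypical(profile, z=2.0):
    """Hypothetical helper: starts of windows whose GC content deviates from the
    mean over all windows by more than z standard deviations."""
    if not profile:
        return []
    contents = [content for _, content, _ in profile]
    mean = sum(contents) / len(contents)
    sd = (sum((x - mean) ** 2 for x in contents) / len(contents)) ** 0.5
    if sd == 0:
        return []
    return [start for start, content, _ in profile if abs(content - mean) > z * sd]
```

For example, `gc_profile("GGGGCCAAAT", window=5, step=5)` gives `[(0, 1.0, 0.6), (5, 0.2, -1.0)]`: the first window is G-rich (positive skew), the second contains a single C (skew -1).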


Project goals

  • Study molecular mechanisms (genetics and regulation) of

    • Legionella's ability to survive in different ecological niches.

    • Legionella infection.

  • Extended genome annotation of Legionella strains (Philadelphia, Paris, Lens).

  • Custom whole-genome microarrays.

  • Network reconstruction and modeling.


Microarray design: history of Legionella microarrays

  • September 2005: whole-genome array of 2,997 70-mer oligos; 3,005 genes in duplicate; 640 reference controls.

  • October 2003: 3,230 clones, 90% of the genome.

  • June 2001: 1,344 clones in triplicate, 40% of the genome.


Requirements for microarray probes

The goal was to design 70-mer probes covering all protein- and RNA-coding genes, plus control probes for testing background and array properties.

Requirements common to all probes:

  • should not contain overrepresented short nucleotide stretches;

  • should be free of secondary structure elements;

  • should have approximately the same melting temperature.

Requirements specific to gene probes:

  • 70-mers should be unique (occur once) in the experimental system (Legionella, human, E. coli).

Requirements specific to array control probes:

  • should not occur anywhere in the experimental system (Legionella, human, E. coli).
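The uniqueness, absence, and melting-temperature requirements can be checked mechanically. The Python sketch below assumes occurrences are counted on both strands of every genome, and it uses the common rough approximation Tm = 64.9 + 41 (G + C - 16.4) / N, which is not necessarily the formula used in this project. The Tm window and all function names are hypothetical, and the overrepresented-stretch and secondary-structure checks are left out.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(probe, genomes):
    """Overlapping occurrences of probe on both strands of every genome.
    (A reverse-palindromic probe is counted once per strand.)"""
    total = 0
    for genome in genomes:
        for strand in (genome, revcomp(genome)):
            pos = strand.find(probe)
            while pos != -1:
                total += 1
                pos = strand.find(probe, pos + 1)
    return total


def basic_tm(probe):
    """Rough melting temperature in deg C: 64.9 + 41 * (G + C - 16.4) / N."""
    gc = probe.count("G") + probe.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(probe)


def is_valid_gene_probe(probe, genomes, tm_window=(70.0, 80.0)):
    """Gene probe: occurs exactly once in the system and has Tm inside the window."""
    low, high = tm_window
    return count_occurrences(probe, genomes) == 1 and low <= basic_tm(probe) <= high


def is_valid_control_probe(probe, genomes):
    """Control probe: does not occur anywhere in the system."""
    return count_occurrences(probe, genomes) == 0
```

Counting by repeated `str.find` is only practical for spot checks; scanning whole genomes for every candidate calls for precomputed oligonucleotide tables instead.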


Microarray probe design using unique oligonucleotides of a particular length

(Figure: a CDS or genomic sequence, 5' to 3', scanned with 14-mer oligonucleotides; unique oligonucleotides and overrepresented 8-mers are marked, and a 70-mer microarray probe is selected.)

In simplified form, probe selection can be described as selecting regions with the maximum number of unique oligonucleotides (here, of length 14 bp) and the minimum number of overrepresented shorter oligonucleotides (here, of length 8 bp).

In the actual study we have to use oligonucleotides of several lengths and also check the probe melting temperature.

Using unique oligonucleotides to design probes automatically removes secondary-structure issues.
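The simplified selection rule can be sketched as a sliding-window scan in Python. Assumptions: uniqueness is counted on one strand over the target plus background sequences (so the target region must not also be contained in the background), `max_short` is a hypothetical threshold for calling a short oligonucleotide overrepresented, and ties in the number of unique long oligonucleotides are broken by preferring fewer overrepresented short ones. Melting temperature is not checked here.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Occurrences of every k-mer across all sequences (one strand)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def best_probe(target, background, probe_len=70, unique_k=14, short_k=8, max_short=50):
    """Return (start, window) of the window in `target` with the most unique
    unique_k-mers, breaking ties by the fewest overrepresented short_k-mers."""
    system = [target] + list(background)
    long_counts = kmer_counts(system, unique_k)
    short_counts = kmer_counts(system, short_k)
    best_key, best_start = None, None
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        n_unique = sum(long_counts[window[i:i + unique_k]] == 1
                       for i in range(probe_len - unique_k + 1))
        n_over = sum(short_counts[window[i:i + short_k]] > max_short
                     for i in range(probe_len - short_k + 1))
        key = (n_unique, -n_over)  # more unique first, then fewer overrepresented
        if best_key is None or key > best_key:
            best_key, best_start = key, start
    if best_start is None:
        raise ValueError("target is shorter than probe_len")
    return best_start, target[best_start:best_start + probe_len]
```

The default parameters mirror the 70-mer probes, 14-mers, and 8-mers named above; a toy run with small values (e.g. `probe_len=10, unique_k=4, short_k=2`) is easier to check by hand.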


Choosing the length of oligonucleotides

(Figure: a DNA or RNA (genomic or mRNA) sequence with oligonucleotides of lengths n, n+1, n+2, n+3, …, n+k starting at the same position, classified as ancestors and descendants.)


Distribution of ancestors and descendants of various lengths

For each position we can define the length L at which the oligonucleotide starting at this position becomes unique. All oligonucleotides starting at this position that are longer than L are also unique. There are two types of unique oligonucleotides: those that do not contain a shorter unique oligonucleotide and those that do; we call them ancestors and descendants, respectively. To have complete information about the distribution of unique oligonucleotides in a sequence region, it is enough to store, for each position, the length at which the oligonucleotide starting there first becomes unique.

Distributions of ancestral and descendant unique oligonucleotides by length. The solid line denotes the sum of the two distributions, the dotted line the distribution of ancestral oligonucleotides, and the dashed line the distribution of descendants. A) Results of a simulation for genomes of size 1 Mb. B) Real data for human chromosome X.
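The ancestor and descendant definitions can be made concrete with a brute-force Python sketch for a short toy sequence (single strand, uniqueness counted within that one sequence; real genomes need a far more memory-efficient counting scheme). It relies on two facts: an oligonucleotide that contains a unique oligonucleotide is itself unique, and every proper substring lies inside the oligonucleotide minus its first base or minus its last base, so checking those two substrings is enough to separate ancestors from descendants. Function names are illustrative.

```python
from collections import Counter


def substring_counts(seq, max_len):
    """Occurrences in seq of every substring of length 1..max_len."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def first_unique_length(seq, max_len):
    """Per position: length at which the oligonucleotide starting there becomes
    unique, or -1 if it is not unique up to max_len (or runs off the end)."""
    counts = substring_counts(seq, max_len)
    lengths = []
    for i in range(len(seq)):
        length = -1
        for k in range(1, min(max_len, len(seq) - i) + 1):
            if counts[seq[i:i + k]] == 1:
                length = k
                break
        lengths.append(length)
    return lengths


def ancestor_descendant_distribution(seq, max_len):
    """Map each length k to (ancestors, descendants) among unique k-mers."""
    counts = substring_counts(seq, max_len)

    def unique(s):
        return counts[s] == 1  # the empty string counts 0, so it is never unique

    dist = {}
    for k in range(1, max_len + 1):
        ancestors = descendants = 0
        for i in range(len(seq) - k + 1):
            s = seq[i:i + k]
            if not unique(s):
                continue
            # A unique s is a descendant iff it contains a shorter unique piece,
            # which must sit inside s[1:] or s[:-1].
            if unique(s[1:]) or unique(s[:-1]):
                descendants += 1
            else:
                ancestors += 1
        dist[k] = (ancestors, descendants)
    return dist
```

For the toy sequence AACA this gives per-position lengths `[2, 2, 1, -1]` and the distribution `{1: (1, 0), 2: (1, 2), 3: (0, 2), 4: (0, 1)}`: the single C and the dimer AA are ancestors, and every other unique oligonucleotide extends one of them.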


Design of probes using positional information on unique oligonucleotides

Sequence region and the ancestor length at each position (-1 if not known):

a t g c a c t a g c t a g c t a g t c g …

12,14,-1,-1,15,10,10,11,10,14,-1,-1,13,15,12,-1,-1,-1,12,16…

(Figure: candidate probe windows P1, P2, …, Pi placed along the sequence region.)

For each potential probe Pi we can define a vector Vi holding the number of unique oligonucleotides of each length (both ancestors and descendants), e.g. Vi={0,0,0,0,0,0,0,2,3,4,2,3,5,6,7}.

A gold-standard vector can be defined as G={0,0,0,0,0,0,n1,n1-1,n1-2,n1-3,…}.

The Euclidean distance is a reliable choice of measure for the distance between Vi and G:

$$D(V_i, G) = \sqrt{\sum_{j \in L} \bigl(V_i(j) - G(j)\bigr)^2}$$

where L is the set of oligonucleotide lengths used.

The probes with the minimal distance to the gold standard are chosen.
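The distance-based selection can be sketched in Python from per-position first-unique lengths like the example sequence region above. Two assumptions are made: V_i(j) counts the positions in the window whose j-mer is unique and lies fully inside the window, and G is read as an ideal window in which every oligonucleotide of length at least `min_unique_len` is unique, so a window of length N contains N - j + 1 unique j-mers and the counts drop by one per extra base (n1, n1-1, ...). All names are hypothetical.

```python
import math


def vector_for_window(first_len, start, probe_len, lengths):
    """V_i(j): positions in [start, start + probe_len) whose j-mer is unique and
    fits in the window. A j-mer at p is unique when first_len[p] != -1 and j >= first_len[p]."""
    end = start + probe_len
    return [sum(1 for p in range(start, end - j + 1)
                if first_len[p] != -1 and first_len[p] <= j)
            for j in lengths]


def gold_standard(probe_len, lengths, min_unique_len):
    """Assumed ideal: nothing unique below min_unique_len, everything unique from
    there on, i.e. probe_len - j + 1 unique j-mers (n1, n1 - 1, n1 - 2, ...)."""
    return [probe_len - j + 1 if j >= min_unique_len else 0 for j in lengths]


def distance(v, g):
    """Euclidean distance D(V_i, G) over the chosen lengths."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, g)))


def choose_probe_start(first_len, probe_len, lengths, min_unique_len):
    """Start of the window whose vector is closest to the gold standard."""
    g = gold_standard(probe_len, lengths, min_unique_len)
    starts = range(len(first_len) - probe_len + 1)
    return min(starts, key=lambda s: distance(
        vector_for_window(first_len, s, probe_len, lengths), g))
```

Because uniqueness only grows with length, a single stored length per position is enough to fill every V_i(j), which is why the positional record suffices for probe design.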


Finding unique oligonucleotides

Oligo length   Space without coding   Space with coding
4              256                    32
5              1,024                  128
6              4,096                  512
7              16,384                 2,048
8              65,536                 8,192
9              262,144                32,768
10             1,048,576              131,072
11             4,194,304              524,288
12             16,777,216             2,097,152
13             67,108,864             8,388,608
14             268,435,456            33,554,432
15             1,073,741,824          134,217,728
16             4,294,967,296          536,870,912
17             17,179,869,184         2,147,483,648
18             68,719,476,736         8,589,934,592
19             274,877,906,944        34,359,738,368
20             1,099,511,627,776      137,438,953,472

  • Enumerating oligonucleotides

    • Binary arithmetic: 00 stands for A, 01 for T, 10 for C, and 11 for G.

      Example: T G A T = binary 01110001 = decimal 113

    • The enumeration is complete, dense, and nonredundant.

  • Counting oligonucleotides

    • Direct counting.

    • The complete space of possible oligonucleotides grows as 4^n.

    • The memory of current computers allows handling oligonucleotides of length up to 16 on a PC and up to 18 on Sun Solaris. With algorithmic enhancements we can go up to 24 (but there is no need). The best resolution is provided by length 18 for the human genome and by lengths 12 to 14 for most bacterial genomes.

  • Table legend: computable on a desktop; computable on a workstation with large memory; computable on a workstation with large memory using the enhanced algorithm; hardly computable.
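The 2-bit enumeration and direct counting can be sketched in Python as follows, assuming the coding above with the first base in the highest bits and input restricted to A, C, G, T (ambiguity codes such as N would need separate handling). The counting table has one byte per possible k-mer, so it needs 4^k bytes; for k = 16 that is 4,294,967,296 bytes (4 GiB). Function names are illustrative.

```python
CODE = {"A": 0b00, "T": 0b01, "C": 0b10, "G": 0b11}
BASES = "ATCG"  # BASES[code] inverts CODE


def encode(oligo):
    """Index of an oligonucleotide in [0, 4**len(oligo)); first base in the highest bits."""
    value = 0
    for base in oligo:
        value = (value << 2) | CODE[base]
    return value


def decode(value, length):
    """Inverse of encode for a known oligonucleotide length."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def count_kmers_direct(seq, k):
    """Direct counting: one byte per possible k-mer (4**k bytes), counts saturate at 255.
    The k-mer index is updated with a rolling shift instead of re-encoding each window."""
    table = bytearray(4 ** k)
    mask = (1 << (2 * k)) - 1
    value = 0
    for i, base in enumerate(seq):
        value = ((value << 2) | CODE[base]) & mask
        if i >= k - 1 and table[value] < 255:
            table[value] += 1
    return table
```

`encode("TGAT")` returns 113 (binary 01110001). Because every index in [0, 4^k) corresponds to exactly one k-mer, the table needs no keys, and a single pass over the sequence fills it in linear time.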


Program realization and data formats

(Figure: per-position record: symbol, length of the first unique oligonucleotide at this position, overrepresented flag.)

  • Input: a list of FASTA files (genomes etc.).

  • u_find.exe / u_findm.exe: run for the minimal oligonucleotide length and for all desired oligonucleotide lengths; the output is a FASTA file marked for unique oligonucleotides.

  • u_code.exe: stores the data in “rich” FASTA format.

  • u_design.exe: produces the microarray probes.

Storing data in “rich” FASTA format

Results of the search for unique oligonucleotides are stored in a “rich” FASTA format. Essentially, it is a linear record of positional information, like a regular FASTA file, but with additional information encoded.
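The slides do not give the byte layout of the “rich” FASTA record, so the following Python sketch is a purely hypothetical encoding of the three per-position fields (symbol, length of the first unique oligonucleotide, overrepresented flag) into one byte: 2 bits for the base in the A=00, T=01, C=10, G=11 coding, 5 bits for the length (0 standing for the -1 "unknown" value), and 1 bit for the flag. Five bits cover lengths up to 31, enough for the maximum length of 24 mentioned above.

```python
BASES = "ATCG"  # 2-bit codes A=00, T=01, C=10, G=11 (upper-case input assumed)


def pack_position(base, first_unique_len, overrepresented):
    """Hypothetical byte: bits 7-6 base, bits 5-1 first-unique length (0 = unknown), bit 0 flag."""
    length = 0 if first_unique_len == -1 else first_unique_len
    if not 0 <= length <= 31:
        raise ValueError("length must fit in 5 bits")
    return (BASES.index(base) << 6) | (length << 1) | int(overrepresented)


def unpack_position(byte):
    """Inverse of pack_position: (base, length or -1, flag)."""
    length = (byte >> 1) & 0b11111
    return BASES[byte >> 6], (length if length else -1), bool(byte & 1)


def pack_record(seq, lengths, flags):
    """Pack a whole sequence region into bytes, one byte per position."""
    return bytes(pack_position(b, n, f) for b, n, f in zip(seq, lengths, flags))


def unpack_record(data):
    """Unpack bytes back into parallel lists of bases, lengths and flags."""
    fields = [unpack_position(byte) for byte in data]
    return ("".join(f[0] for f in fields),
            [f[1] for f in fields],
            [f[2] for f in fields])
```

A fixed one-byte-per-position layout keeps the record linear like FASTA, so the fields for any position can be read by offset without parsing the rest of the file.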


Design of control probes using nonexistent oligonucleotides

Goal: sequences that have no homology to any genome (no BLAST hits above the threshold).

  • Selecting nonexistent oligonucleotides

  • Overlapping and merging oligonucleotides

  • Choosing probes from merged sequences

Nonexistent oligonucleotides:

    AATGCTAGCTA
    ATGCTAGCTAC
    CTAGCTACGGA
    AGCTACGGAAT

Merged nonexistent sequence:

    AATGCTAGCTACGGAAT . . . . . .

Probe selection (temperature, secondary structure):

    ATGCTAGCTACGGA
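The first two steps for control probes can be sketched in Python as a brute-force enumeration of k-mers absent from both strands of every genome, followed by chaining oligonucleotides at their longest suffix/prefix overlap. Assumptions: the chaining order is given (choosing which nonexistent oligonucleotides to chain is not covered), `min_overlap` is a hypothetical parameter, and the final probe selection by temperature and secondary structure is not shown. With `min_overlap=8`, the four oligonucleotides of the example merge into AATGCTAGCTACGGAAT.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def nonexistent_kmers(genomes, k):
    """All k-mers that occur on neither strand of any genome (brute force, small k only)."""
    seen = set()
    for genome in genomes:
        for strand in (genome, revcomp(genome)):
            for i in range(len(strand) - k + 1):
                seen.add(strand[i:i + k])
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    return [c for c in candidates if c not in seen]


def merge_overlapping(oligos, min_overlap):
    """Chain oligos in the given order, joining each to the growing sequence at
    the longest suffix/prefix overlap of at least min_overlap bases."""
    merged = oligos[0]
    for oligo in oligos[1:]:
        for size in range(min(len(merged), len(oligo)), min_overlap - 1, -1):
            if merged.endswith(oligo[:size]):
                merged += oligo[size:]
                break
        else:
            raise ValueError(f"{oligo} does not overlap the merged sequence")
    return merged
```

Note that merging only guarantees that the chained oligonucleotides are absent; windows spanning a short overlap are not checked, which is one reason the merged sequence still goes through probe selection and a homology check.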



Properties of the proposed probe design method

  • Finding unique and nonexistent oligonucleotides takes computational time linear in the size of the genomes used.

  • Once the unique oligonucleotides of the system are represented in “rich” FASTA format, designing new probes becomes extremely fast and can be repeated as often as needed to create probes for a new set of CDS or genomic regions.

  • Selecting probes by unique oligonucleotides automatically reduces hairpins in RNA secondary structure.

  • The method can be applied to experimental systems with multiple unrelated genomes (genomes can be as distant from each other as eukaryotes and prokaryotes).

  • The method is efficient for control probe selection.

  • Problem: the method does not provide a robust estimate of sequence homology between a probe and the rest of the genomes; at the same time, the selected probes have the lowest possible homology to the rest of the genome.

  • The method provides valuable statistics about oligonucleotide usage in particular genomes and genome sets.


Probes selected: 2,997 70-mer oligos, 3,005 genes in all (with duplicates), 640 reference controls


Legionella in microbial communities

  • Biofilms are not just a bunch of microbes; they are a special environment, protected from the harsh outside world by a polysaccharide layer produced by other microbes in the community.

  • Microbial communities in biofilms have shared metabolic and regulatory networks.

  • Biofilms provide an excellent environment for horizontal gene transfer.

  • Since biofilms prevent antibiotics and other biocides from reaching the pathogens, biofilms are a significant reservoir of health-hazardous pathogens.

  • Legionella can survive in biofilms but cannot form them by itself, only as part of a microbial community.


Similar applications and potential uses of the proposed method

  • Evolutionary studies (traces of ancient events?).

    Hsieh et al., Minimal model for genome evolution and growth. Phys Rev Lett. 2003 Jan 10;90(1):018101.

    Jordan et al., A universal trend of amino acid gain and loss in protein evolution. Nature. 2005 Feb 10;433(7026):633-8. Epub 2005 Jan 19.

  • Use in organism and sequence identification: metagenomics.

    Metagenomics: "the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species." (Chen and Pachter, University of California, Berkeley)

    Bailey & Ulrich, Molecular profiling approaches for identifying novel biomarkers. Expert Opin Drug Saf. 2004 Mar;3(2):137-51. Review.

    Palmer et al., Rapid quantitative profiling of complex microbial populations. Nucleic Acids Res. 2006 Jan 10;34(1):e5.


Annotation and web pages


Clickable interactive interface

(Figure: system architecture.)

  • Adapters layer: converting and performing requests, formatting output (UNIX web server, Perl scripts, Java, C).

  • Client side: browser, HTML, Java.

  • Server side: local databases, remote databases, local methods, remote methods, memory engine, update engine, SQL engine.

  • Interactions: request, data transfer, supervision.


Solved technical problems (legend: solved, ongoing)

  • Setting up the server side

    • Setting up the MySQL server and services

  • Tools for importing and parsing external databases

    • Scripts to process flat files (Perl, MySQL): extracting related information, formatting into an SQL database, formatting into static HTML

    • Scripts to parse remote databases (Perl, Java, MySQL): extracting related information, formatting into an SQL database, formatting into static HTML

    • Update engine (under construction)

  • Web page development (HTML, JavaScript, CSS)

    • Testing with Internet Explorer, Firefox, Opera, Safari
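The flat-file path (extract related information, load it into an SQL database, render static HTML) might look like the following Python sketch. It stands in for the Perl/MySQL scripts: it uses the standard-library `sqlite3` module instead of MySQL, assumes a simplified EMBL-like layout (two-letter line codes, values starting in column 6, records ending with `//`), and the table and column names are hypothetical.

```python
import html
import sqlite3


def parse_flat_file(text, keep=("ID", "DE", "GN")):
    """Split a simplified EMBL-like flat file into records ('//' ends a record),
    keeping only the requested two-letter line codes; repeated lines are joined."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("//"):
            if current:
                records.append(current)
            current = {}
            continue
        code, value = line[:2], line[5:].strip()
        if code in keep:
            current[code] = (current.get(code, "") + " " + value).strip()
    if current:
        records.append(current)
    return records


def load_into_sql(records, conn):
    """Store the extracted fields in a (hypothetical) 'entry' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS entry "
                 "(id TEXT PRIMARY KEY, description TEXT, gene TEXT)")
    conn.executemany("INSERT OR REPLACE INTO entry VALUES (?, ?, ?)",
                     [(r.get("ID"), r.get("DE"), r.get("GN")) for r in records])
    conn.commit()


def to_static_html(conn):
    """Render the table as a static HTML table, escaping every cell."""
    rows = conn.execute("SELECT id, description, gene FROM entry ORDER BY id").fetchall()
    body = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(str(v or ''))}</td>" for v in row) + "</tr>"
        for row in rows)
    return ("<table>\n<tr><th>ID</th><th>Description</th><th>Gene</th></tr>\n"
            f"{body}\n</table>")
```

Keeping extraction, storage, and rendering as separate steps mirrors the split in the list above: the same SQL table can feed both static HTML pages and dynamic queries.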


Sources of information

  • Proprietary data, publicly available data, results of computations

  • Sequence/genome: NCBI, EMBL, TIGR, individual genomes

  • Functional domains, function and annotation: PFAM, PDB, PRODOM, PROSITE, TRANSFAC, SMART

  • Pathways: GeneNet, MetaCyc

  • Categories: GO

  • Literature: MEDLINE


Current list of integrated databases

Parsed for Legionella-related information, organized and stored locally:

  • NCBI

  • EMBL

  • UniProt

  • InterPro

  • PIRSF (PIR superfamily/family)

  • Pfam

  • PRINTS

  • PRODOM

  • PROSITE

  • HSSP

  • MedLine/PubMed

  • MetaCyc

  • NMPDR/FIG

  • KEGG


Web site scheme: integrated tools

  • Precompiled: static interactive tables; static interactive gene descriptions.

  • Dynamic (by user request): interactive data retrieval into interactive tables; interactive genome map; interactive tools (BLAST, HMM, REMOTE_SEARCH (SMART, PROSITE, etc.)).

  • Semi-dynamic: search history.

  • Server components: web server and scripts; SQL database.


  • Legionella Genome Browser.

  • Interactive.

  • You can:

    • Choose scale and region

    • Follow links to tables and annotation data

    • Choose annotation tracks to display and track parameters

    • Choose various color schemes

    • Add custom annotation tracks


Interactive tables

Row operations: select/unselect, show/hide rows

Column (field) operations: show/hide columns

Sorting columns


Interactive tables on demand


Snapshots of the NMPDR annotation pages

Region comparisons in other genomes by sequence homology:

(Figure: the region containing icmR and icmP in L. pneumophila Philadelphia 1 (L.pn Phil1) compared with Coxiella burnetii.)


Visualization of gene expression in the NMPDR system

(Figure: pathway reactions, expression ratios, Legionella gene info.)



  • Study gene expression:

    • during intracellular growth and under various environmental stresses

    • in axenically grown and protozoan-grown Legionella

    • in Legionella-containing biofilms

4. Develop models (gene networks and reporter genes) that describe relevant patterns of gene expression:

(gene networks = expressed genes + their regulators)


Expressed genes: original gene function assignments

(Figure: ~3000 genes; BLAST; ORF finders; KEGG pathways; MetaCyc; Gene Ontology; LegCyc: 181 pathways; 560 assignments; 678 assignments; 72%.)

Plus missing members. To confirm the absence of these genes:

  • Use a lower-stringency search

  • BLAST expected genes against the Legionella genome sequence

  • Search for probable motif combinations


(Figure: LegCyc, Legionella metabolic pathway overview (a portion), with histidine biosynthesis.)


Search for transcription factor binding sites + predicted operons; clusters of co-expressed genes

(Figure: the lvrA and lvh loci.)

TF site prediction (in silico):

  • Promoter manipulations

  • Co-expressed gene sets

  • Regulatory networks

Experimental confirmation of the predicted promoters; transcription start sites.

Use of confirmed motifs to identify additional co-regulated genes.


  • Columbia Genome Center

    • Jing Ju lab, S. Kalachikov, S. Pompu

    • Gene expression microarrays

    • Clusters of coexpressed genes

    • Regulatory gene knockout results (expression)

  • Molecular biology methods

    • Gene expression microarrays

    • RT-PCR

    • Transcription factors

    • Promoter verification

  • Microbiology Department

    • Prof. Shuman

    • Gene knockout

    • Phenotypic analysis

  • Computational analysis

    • Morozov Pavel, Morozova Irina

    • Operon structures

    • Putative promoters and transcriptional regulation sites

    • Detailed gene annotation

    • Regulatory network reconstruction

