Molecular biology in silico
Mikhail Gelfand, Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems, RAS. AlBio06, Moscow, July 2006.


Molecular biology in silico

Mikhail Gelfand

Research and Training Center “Bioinformatics”,

Institute for Information Transmission Problems, RAS

AlBio06, Moscow, July 2006


Propaganda

red: papers (experiments); blue: sequence fragments


Complete genomes

GOLD db. (III.2006): 361 complete genomes

Incomplete (in progress):

  • 952 bacteria

  • 58 archaea

  • 607 eukaryotes (incl. ESTs)

  • 46 metagenomes


More propaganda

Most genes will never be studied experimentally

Even in E. coli, only 20-30 new genes are characterized per year (hundreds are still uncharacterized)

Bioinformatics = molecular biology in silico

  • ~2% of all recent papers in biological journals

  • Essential component of biological research

  • Make predictions about function and regulation of genes (many quite reliable!)

  • Metabolic reconstruction and prediction of phenotype given genome

  • Identify really interesting cases, fill gaps in knowledge

    • “Universally missing genes” – not a single known gene even for ~10% of the reactions of central metabolism. No genes for >40% of reactions overall

    • “Conserved hypothetical genes” (5-15% of any bacterial genome) – essential, but unknown function




How? Similarity to known proteins

  • Useful for many purposes (allows one to annotate 50-75% of genes in a bacterial genome)

  • Necessary first step

  • May be automated

    • … to some extent …

    • in particular, care is needed to avoid overly specific predictions

    • Problem: propagation of annotation errors

  • Boring (nothing new)


Noradrenaline transporter in an archaeon?

SOURCE Methanococcus jannaschii.
ORGANISM Methanococcus jannaschii
  Archaea; Euryarchaeota; Methanococcales; Methanococcaceae; Methanococcus.
FEATURES Location/Qualifiers
  source 1..492
    /organism="Methanococcus jannaschii"
    /db_xref="taxon:2190"
  Protein 1..492
    /product="sodium-dependent noradrenaline transporter"
  CDS 1..492
    /gene="MJ1319"
    /note="similar to EGAD:HI0736 percent identity: 38.5; identified by sequence similarity; putative"
    /coded_by="U67572:71..1549"
    /transl_table=11

Now corrected: Hypothetical sodium-dependent transporter MJ1319.


Similarity to hypothetical proteins: somebody else’s errors…

The correct annotation


Genes with curious functional assignments

  • C75604: Probable head morphogenesis protein, Deinococcus radiodurans

  • O05360: Automembrane protein H, Yersinia enterocolitica

  • Q8TID9: Benzodiazepine (valium) receptor TspO, Methanosarcina acetivorans

  • NP_069403: DR-beta chain MHC class II, Archaeoglobus fulgidus


Errors in experimental papers

SwissProt:

DEFINITION Hypothetical 43.6 kDa protein.
ACCESSION P48012
...
KEYWORDS Hypothetical protein.
SOURCE Debaryomyces occidentalis
ORGANISM Debaryomyces occidentalis
  Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes; Saccharomycetales; Saccharomycetaceae; Debaryomyces.
[CAUTION] Was originally (Ref.1) thought to be 3-isopropylmalate dehydrogenase (LEU2).

PIR:

DEFINITION 3-isopropylmalate dehydrogenase (EC 1.1.1.85) - yeast (Schwanniomyces occidentalis).
ACCESSION S55845
KEYWORDS oxidoreductase.


SwissProt entry DSDX_ECOLI

-!- CAUTION: An ORF called dsdC was originally (Ref.3) assigned to the wrong DNA strand and thought to be a D-serine deaminase activator, it was then resequenced by Ref.2 and still thought to be "dsdC", but this time to function as a D-serine permease. It is Ref.1 that showed that dsdC is another gene and that this sequence should be called dsdX. It should also be noted that the C-terminal part of dsdX (from 338 onward) was also sequenced (Ref.6 and Ref.7) and was thought to be a separate ORF (don't worry, we also had difficulties understanding what happened!).


Positional clustering

  • Genes that are located in immediate proximity tend to be involved in the same metabolic pathway or functional subsystem

    • mainly in prokaryotes, very weak in eukaryotes

    • caused by operon structure, but not exclusively

      • horizontal transfer of loci containing several functionally linked operons

      • compartmentalisation of products in the cytoplasm

    • very weak evidence

      • stronger if observed in many unrelated genomes

  • May be measured

    • e.g. the STRING database/server (P.Bork, EMBL)

    • and other sources
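
One way such positional evidence might be counted is sketched below in Python. This is a minimal illustration and not the STRING scoring scheme: the genome representation (an ordered list of ortholog-group labels per genome, treated as linear), the function name `count_neighbor_genomes`, the `max_gap` parameter, and the toy gene orders are all assumptions made for the example.

```python
from collections import Counter

def count_neighbor_genomes(genomes, max_gap=1):
    """Count, for each pair of ortholog groups, the number of genomes in
    which the two genes lie within `max_gap` positions of each other.

    `genomes` maps a genome name to a list of ortholog-group labels in
    chromosomal order (a hypothetical, simplified representation that
    ignores strands and chromosome circularity).
    """
    counts = Counter()
    for gene_order in genomes.values():
        seen = set()  # count each pair at most once per genome
        for i, gene in enumerate(gene_order):
            for other in gene_order[i + 1 : i + 1 + max_gap]:
                if other != gene:
                    seen.add(tuple(sorted((gene, other))))
        counts.update(seen)
    return counts

# Toy data: the labels are illustrative, not real gene orders.
toy_genomes = {
    "genome1": ["trpA", "trpB", "x1", "x2"],
    "genome2": ["x2", "trpB", "trpA", "x3"],
    "genome3": ["trpA", "x1", "trpB"],
}
neighbor_counts = count_neighbor_genomes(toy_genomes)
print(neighbor_counts[("trpA", "trpB")])
```

A pair seen side by side in a single genome carries little weight; the count across unrelated genomes is what makes the evidence stronger, which is why the sketch reports genomes rather than occurrences.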


STRING: trpB – positional clusters


Functionally dependent genes tend to cluster on chromosomes in many different organisms

Vertical axis: number of gene pairs with association score exceeding a threshold.

Control: same graph, random re-labeling of vertices


More genomes (stronger links) => highly significant clustering


Especially in linear pathways (right)


Fusions

  • If two (or more) proteins form a single multidomain protein in some organism, they all are likely to be tightly functionally related

  • Very useful for the analysis of eukaryotes

  • Sometimes useful for the analysis of prokaryotes


STRING: trpB – fusions


Phyletic patterns

  • Functionally linked genes tend to occur together

  • Enzymes with the same function (isozymes) have complementary phyletic profiles
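
The two statements above can be illustrated with a small Python sketch. It assumes a profile is a string of '1' (gene present) and '0' (gene absent) over a fixed list of genomes; the function names and the toy profiles are hypothetical, and the Jaccard index is only one of several similarity measures that could be used here.

```python
def profile_overlap(a, b):
    """Return (both, only_a, only_b, neither) counts for two presence/absence
    profiles given as equal-length strings of '1' and '0'."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    only_a = sum(x == "1" and y == "0" for x, y in zip(a, b))
    only_b = sum(x == "0" and y == "1" for x, y in zip(a, b))
    neither = len(a) - both - only_a - only_b
    return both, only_a, only_b, neither

def jaccard(a, b):
    """Co-occurrence score: shared genomes divided by genomes with either gene."""
    both, only_a, only_b, _ = profile_overlap(a, b)
    union = both + only_a + only_b
    return both / union if union else 0.0

def is_complementary(a, b):
    """True if the two genes never occur in the same genome but both occur somewhere."""
    both, only_a, only_b, _ = profile_overlap(a, b)
    return both == 0 and only_a > 0 and only_b > 0

# Toy profiles over six genomes (made up for illustration).
linked_a, linked_b = "110110", "110100"
isozyme_1, isozyme_2 = "101000", "010110"
print(jaccard(linked_a, linked_b), is_complementary(isozyme_1, isozyme_2))
```

Functionally linked genes are expected to score high on co-occurrence, whereas alternative enzymes for the same function are expected to pass the complementarity check.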


STRING: trpB – co-occurrence (phyletic profiles)


Phyletic profiles in the Phe/Tyr pathway

shikimate kinase


Chorismate biosynthesis pathway (E. coli)

Archaeal shikimate kinase


Arithmetics of phyletic patterns

3-dehydroquinate dehydratase (EC 4.2.1.10):

  Class I (AroD)        COG0710   aompkzyq---lb-e----n---i--
+ Class II (AroQ)       COG0757   ------y-vdr-bcefghs-uj----
= Two forms combined              aompkzyqvdrlbcefghsnuj-i--

Shikimate kinase (EC 2.7.1.71):

  Typical (AroK)        COG0703   ------yqvdrlbcefghsnuj-i--
+ Archaeal-type         COG1685   aompkz--------------------
= Two forms combined              aompkzyqvdrlbcefghsnuj-i--


Arithmetics of phyletic patterns

Shikimate dehydrogenase (EC 1.1.1.25):

  AroE                  COG0169   aompkzyqvdrlbcefghsnuj-i--

5-enolpyruvylshikimate 3-phosphate synthase (EC 2.5.1.19):

  AroA                  COG0128   aompkzyqvdrlbcefghsnuj-i--

Chorismate synthase (EC 4.2.3.5):

  AroC                  COG0082   aompkzyqvdrlbcefghsnuj-i--
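
The "arithmetic" on these patterns is a position-wise union: a genome letter appears in the combined pattern if it appears in either form. A minimal Python sketch follows; the function name `combine_patterns` is an assumption of this example, while the pattern strings are copied from the slides above.

```python
def combine_patterns(*patterns):
    """Position-wise union of COG-style phyletic patterns.

    Each pattern is a string with one position per genome: a letter if the
    gene is present in that genome, '-' if it is absent.
    """
    if len({len(p) for p in patterns}) != 1:
        raise ValueError("patterns must be non-empty and of equal length")
    combined = []
    for column in zip(*patterns):
        letters = {c for c in column if c != "-"}
        if len(letters) > 1:
            raise ValueError("conflicting genome letters in one column")
        combined.append(letters.pop() if letters else "-")
    return "".join(combined)

aro_d = "aompkzyq---lb-e----n---i--"      # Class I dehydroquinate dehydratase
aro_q = "------y-vdr-bcefghs-uj----"      # Class II dehydroquinate dehydratase
aro_k = "------yqvdrlbcefghsnuj-i--"      # typical shikimate kinase
archaeal = "aompkz--------------------"   # archaeal-type shikimate kinase
aro_e = "aompkzyqvdrlbcefghsnuj-i--"      # shikimate dehydrogenase

print(combine_patterns(aro_d, aro_q) == aro_e)
print(combine_patterns(aro_k, archaeal) == aro_e)
```

Both unions reproduce the pattern of the single-form enzymes of the same pathway, which is the argument the slides make for assigning the archaeal-type gene to the shikimate kinase step.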


Distribution of association scores (monotonic for subunits, bimodal for isozymes)


E.g. transporters

  • Transporters of end products of metabolic pathways may substitute for the entire pathway

  • Transporters of compounds for catabolic pathways co-occur with the pathways

  • Transporters of intermediates substitute for upstream parts of pathways


Example: bioY


Other approaches to phyletic patterns

  • Gene signatures of lifestyles

    • e.g. thermophily: reverse gyrase is the only gene specific to all hyperthermophiles (bacterial and archaeal)

    • see COGs

  • Regulators and signals


Example: bioR

gene: black arrow; candidate site: red dot


Comparative analysis of regulation

  • Phylogenetic footprinting: regulatory sites are more conserved than non-coding regions in general and are often seen as conserved islands in alignments of gene upstream regions

  • Consistency filtering: regulons (sets of co-regulated genes) are conserved =>

    • true sites occur upstream of orthologous genes

    • false sites are scattered at random
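
Consistency filtering can be sketched as follows. This is a simplified illustration under stated assumptions: candidate sites are given as a list of (genome, gene) pairs, orthology is a precomputed lookup table, and the threshold `min_genomes` is arbitrary. The function name and all toy identifiers are hypothetical.

```python
from collections import defaultdict

def consistency_filter(candidate_sites, orthologs, min_genomes=2):
    """Keep candidate sites whose gene has orthologs with candidate sites
    in other genomes.

    candidate_sites: list of (genome, gene) pairs, each meaning that a
        candidate site was found upstream of `gene` in `genome`.
    orthologs: dict mapping (genome, gene) to an ortholog-group label.
    min_genomes: minimum number of distinct genomes in which the group
        must carry a candidate site (an arbitrary threshold).
    """
    genomes_per_group = defaultdict(set)
    for genome, gene in candidate_sites:
        group = orthologs.get((genome, gene))
        if group is not None:
            genomes_per_group[group].add(genome)
    kept_groups = {
        group
        for group, genomes in genomes_per_group.items()
        if len(genomes) >= min_genomes
    }
    return [
        (genome, gene)
        for genome, gene in candidate_sites
        if orthologs.get((genome, gene)) in kept_groups
    ]

# Toy input: three genomes, one conserved group and two isolated hits.
sites = [("g1", "ribA"), ("g2", "ribA2"), ("g3", "ribA3"), ("g1", "yyy"), ("g2", "zzz")]
groups = {
    ("g1", "ribA"): "RibA", ("g2", "ribA2"): "RibA", ("g3", "ribA3"): "RibA",
    ("g1", "yyy"): "Y", ("g2", "zzz"): "Z",
}
consistent = consistency_filter(sites, groups)
print(consistent)
```

Sites upstream of orthologous genes survive the filter, while isolated hits, which behave like randomly scattered false positives, are dropped.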


Enzymes

  • Identification of a gap in a pathway (universal, taxon-specific, or in individual genomes)

  • Search for candidates assigned to the pathway by co-localization and co-regulation (in many genomes)

  • Prediction of general biochemical function from (distant) similarity and functional patterns

  • Tentative filling of the gap

  • Verification by analysis of phylogenetic patterns:

    • Absence in genomes without this pathway

    • Complementary distribution with known enzymes for the same function
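
The verification step can be written as a few set operations. The sketch below assumes each argument is a set of genome names; the function name, the result keys, and the toy genomes are hypothetical, and the third check (whether candidate and known enzyme together cover every genome with the pathway) is an extra assumption of this example rather than a criterion stated on the slide.

```python
def check_candidate(candidate, known_enzyme, pathway):
    """Check a gap-filling candidate against phyletic criteria.

    candidate:    genomes that contain the candidate gene
    known_enzyme: genomes that contain a known enzyme for the same step
    pathway:      genomes that contain the rest of the pathway
    """
    outside_pathway = candidate - pathway
    overlap_with_known = candidate & known_enzyme
    uncovered = pathway - (candidate | known_enzyme)
    return {
        "absent_without_pathway": not outside_pathway,
        "complementary_to_known": not overlap_with_known,
        "covers_pathway": not uncovered,  # extra check, see lead-in
    }

pathway_genomes = {"A", "B", "C", "D"}
known = {"C", "D"}
good = check_candidate({"A", "B"}, known, pathway_genomes)
poor = check_candidate({"A", "E"}, known, pathway_genomes)
print(good)
print(poor)
```

A candidate that passes all checks is only a tentative gap filler; the slides treat such a prediction as a hypothesis for experimental validation.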


Transporters

  • Identification of candidates assigned to the pathway by co-localization and co-regulation (in many genomes)

  • Prediction of general function by analysis of transmembrane segments and similarity

  • Prediction of specificity by analysis of phylogenetic patterns:

    • End product if present in genomes lacking this pathway (substituting the biosynthetic pathway for an essential compound)

    • Input metabolite if absent in genomes without the pathway (catabolic, also precursors in biosynthetic pathways)

    • Entry point in the middle if substituting an upper or side part of the pathway in some genomes
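
The three specificity rules can be read as a simple decision procedure. The Python sketch below is a deliberately crude rendering under assumed inputs: each argument is a set of genome names, `downstream_only` stands for genomes that retain only the lower part of the pathway, and the function name and return labels are hypothetical.

```python
def transporter_role(transporter, full_pathway, downstream_only):
    """Suggest what a candidate transporter carries, from phyletic patterns.

    transporter:     genomes that contain the transporter gene
    full_pathway:    genomes that contain the complete pathway
    downstream_only: genomes that lack the upper or side part of the pathway
    """
    without_pathway = transporter - full_pathway - downstream_only
    if without_pathway:
        # Present where the pathway is absent: likely imports the end product.
        return "end product"
    if transporter & downstream_only:
        # Present where the upper part is missing: likely supplies an intermediate.
        return "pathway intermediate"
    if transporter:
        # Found only together with the complete pathway.
        return "input metabolite"
    return "undetermined"

full = {"g1", "g2"}
partial = {"g3"}
print(transporter_role({"g1", "g5"}, full, partial))
print(transporter_role({"g1", "g3"}, full, partial))
print(transporter_role({"g1"}, full, partial))
```

The order of the checks is a design choice of this sketch: presence in genomes with no pathway at all is treated as the strongest signal and is tested first.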


5’ UTR regions of riboflavin genes from bacteria


Conserved secondary structure of the RFN-element

Capitals: invariant (absolutely conserved) positions.

Lower case letters: strongly conserved positions.

Dashes and stars: obligatory and facultative base pairs

Degenerate positions: R = A or G; Y = C or U; K = G or U; B = not A; V = not U. N: any nucleotide. X: any nucleotide or deletion
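
The degenerate codes in this legend map directly onto regular-expression character classes. The sketch below uses Python's standard `re` module; the table `CODES`, the function name, and the short consensus string are made up for illustration and are not the RFN consensus. Lower-case (strongly conserved) letters are treated like capitals here, which loses the distinction drawn in the figure, and base pairing is not modeled at all.

```python
import re

# Degenerate RNA codes as listed in the legend above.
CODES = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]",     # A or G
    "Y": "[CU]",     # C or U
    "K": "[GU]",     # G or U
    "B": "[CGU]",    # not A
    "V": "[ACG]",    # not U
    "N": "[ACGU]",   # any nucleotide
    "X": "[ACGU]?",  # any nucleotide or deletion
}

def consensus_to_regex(consensus):
    """Compile a degenerate consensus string into a regular expression."""
    return re.compile("".join(CODES[symbol] for symbol in consensus.upper()))

# Hypothetical six-position consensus, for demonstration only.
pattern = consensus_to_regex("GRYNXC")
print(bool(pattern.fullmatch("GACUC")))   # X matched as a deletion
print(bool(pattern.fullmatch("GACUAC")))  # X matched as a nucleotide
print(bool(pattern.fullmatch("GCCUC")))   # C is not allowed at the R position
```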


RFN: the mechanism of regulation

  • Transcription attenuation

  • Translation attenuation


Early observation: an uncharacterized gene (ypaA) with an upstream RFN element


Phylogenetic tree of RFN-elements (regulation of riboflavin biosynthesis)

Figure labels: no riboflavin biosynthesis; duplications


YpaA: riboflavin (vitamin B2) transporter in Gram-positive bacteria

  • 5 predicted transmembrane segments => a transporter

  • Upstream RFN element (likely co-regulation with riboflavin genes) => transport of riboflavin or a precursor

  • S. pyogenes, E. faecalis, Listeria sp.: ypaA, no riboflavin pathway => transport of riboflavin

Prediction: YpaA is a riboflavin transporter (Gelfand et al., 1999)

Validation:

  • YpaA transports flavins (riboflavin, FMN, FAD) (by genetic analysis, Kreneva et al., 2000)

  • ypaA is regulated by riboflavin (by microarray expression study, Lee et al., 2001)

  • … via attenuation of transcription (and to some extent inhibition of translation) (Winkler et al., 2003)


A new family of nickel/cobalt transporters

  • No experimental data

  • No structural data

  • Specificity predicted by comparative genomics

  • … and then validated in experiment

  • Mutational analysis under way



Identification of the candidate regulator by the analysis of phyletic patterns

  • COG1327: the only COG with exactly the same phylogenetic pattern as the signal

    • “large scale” on the level of major taxa

    • “small scale” within major taxa:

      • absent in small parasites among alpha- and gamma-proteobacteria

      • absent in Desulfovibrio spp. among delta-proteobacteria

      • absent in Nostoc sp. among cyanobacteria

      • absent in Oenococcus and Leuconostoc among Firmicutes

      • present only in Treponema denticola among four spirochetes


COG1327 “Predicted transcriptional regulator, consists of a Zn-ribbon and ATP-cone domains”: regulator of the riboflavin pathway?


Additional evidence

  • sometimes clustered with nrd genes or with replication genes dnaB, dnaI, polA

  • candidate signals upstream of other replication-related genes

    • dNTP salvage

    • topoisomerase I, replication initiator dnaA, chromosome partitioning, DNA helicase II

  • experimental confirmation in Streptomyces (Borovok et al., 2004)


Multiple sites (nrd genes): FNR, DnaA, NrdR


Mode of regulation

  • Repressor (overlaps with promoters)

  • Co-operative binding:

    • most sites occur in tandem (> 90% of cases)

    • the distance between the copies (centers of palindromes) equals an integer number of DNA turns:

      • mainly (94%) 30-33 bp, in 84% 31-32 bp – 3 turns

      • 21 bp (2 turns) in Vibrio spp.

      • 41-42 bp (4 turns) in some Firmicutes
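
The "integer number of DNA turns" observation can be checked with simple arithmetic. The sketch below assumes a helical repeat of about 10.5 bp per turn for B-DNA; that value, the constant name, and the function name are assumptions of this example and are not taken from the slides.

```python
HELICAL_REPEAT_BP = 10.5  # approximate B-DNA repeat; an assumption of this sketch

def helical_phase(distance_bp, repeat=HELICAL_REPEAT_BP):
    """Return (nearest whole number of turns, offset from it in bp)."""
    turns = round(distance_bp / repeat)
    offset = distance_bp - turns * repeat
    return turns, offset

# Site-to-site distances quoted on the slide.
for distance in (21, 31, 32, 41, 42):
    turns, offset = helical_phase(distance)
    print(f"{distance} bp: {turns} turns, offset {offset:+.1f} bp")
```

Small offsets mean that the two sites face the same side of the double helix, which is consistent with co-operative binding of the repressor to tandem sites.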


Combined regulatory network for iron homeostasis genes in α-proteobacteria

[Figure: regulatory network. Regulators: Irr, RirA, IscR, Fur. Signals: Fe ([+Fe], [-Fe]), heme, FeS status of cell. Regulated gene groups: siderophore uptake, Fe2+/Fe3+ uptake, iron uptake systems, iron storage ferritins, iron-requiring enzymes [iron cofactor], FeS synthesis, heme synthesis, transcription factors. Other label: degraded.]

The connecting lines denote regulatory interactions, with the thickness reflecting the frequency of the interaction in the analyzed genomes. The suggested negative or positive mode of operation is shown by the dead end or arrow end of the line.


Distribution of Irr, Fur/Mur, MntR, RirA, and IscR regulons in α-proteobacteria

[Table: Fe and Mn regulons. Columns include Group, Organism, Abb., Fur/Mur, Irr, RirA, MntR, and IscR, with a + or - mark per regulon and genome.]

Organisms and abbreviations, by group:

  • A. Rhizobiales (Rhizobiaceae and others): Sinorhizobium meliloti (SM), Rhizobium leguminosarum (RL), Rhizobium etli (RHE), Agrobacterium tumefaciens (AGR), Mesorhizobium loti (ML), Mesorhizobium sp. BNC1 (MBNC), Brucella melitensis (BME), Bartonella quintana and spp. (BQ)

  • B. Bradyrhizobiaceae: Bradyrhizobium japonicum (BJ), Rhodopseudomonas palustris (RPA), Nitrobacter hamburgensis (Nham), Nitrobacter winogradskyi (Nwi)

  • C. Rhodobacterales (Rhodobacteraceae): Rhodobacter capsulatus (RC), Rhodobacter sphaeroides (Rsph), Silicibacter sp. TM1040 (STM), Silicibacter pomeroyi (SPO), Jannaschia sp. CC51 (Jann), Rhodobacterales bacterium HTCC2654 (RB2654), Roseobacter sp. MED193 (MED193), Roseovarius nubinhibens ISM (ISM), Roseovarius sp. 217 (ROS217), Loktanella vestfoldensis SKA53 (SKA53), Sulfitobacter sp. EE-36 (EE36), Oceanicola batsensis HTCC2597 (OB2597)

  • D. Hyphomonadaceae: Oceanicaulis alexandrii HTCC2633 (OA2633); Caulobacterales: Caulobacter crescentus (CC); Parvularculales: Parvularcula bermudensis HTCC2503 (PB2503); Sphingomonadales: Erythrobacter litoralis (ELI), Novosphingobium aromaticivorans (Saro), Sphingopyxis alaskensis RB2256 (Sala), Zymomonas mobilis (ZM); Rhodospirillales: Gluconobacter oxydans (GOX), Rhodospirillum rubrum (Rrub), Magnetospirillum magneticum (Amb)

  • SAR11 cluster: Pelagibacter ubique HTCC1002 (PU1002)

  • Rickettsiales: Rickettsia and Ehrlichia species

'#?' in the RirA column denotes the absence of the rirA gene in an unfinished genomic sequence and the presence of candidate RirA-binding sites upstream of the iron uptake genes.


Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - I

[Figure: phylogenetic tree. Labeled branches: Fur in γ- and β-proteobacteria (Escherichia coli, sp|P0A9A9; Pseudomonas aeruginosa, sp|Q03456; Neisseria meningitidis, sp|P0A0S7); Fur in ε-proteobacteria (Helicobacter pylori, sp|O25671); Fur in Firmicutes (Bacillus subtilis, sp|P54574); Mur in α-proteobacteria, regulator of manganese uptake genes (sit, mntH); Fur in α-proteobacteria, regulator of iron uptake and metabolism genes; Irr in α-proteobacteria.]


[Figure: sequence logos in three panels.]

  • Sequence logos for the identified Fur-binding sites in the D group of α-proteobacteria: Erythrobacter litoralis, Caulobacter crescentus, Novosphingobium aromaticivorans, Zymomonas mobilis, Oceanicaulis alexandrii, Sphingopyxis alaskensis, Rhodospirillum rubrum, Gluconobacter oxydans, Parvularcula bermudensis, Magnetospirillum magneticum

  • Identified Mur-binding sites: the A, B, and C groups of α-proteobacteria

  • Sequence logos for the known Fur-binding sites in Escherichia coli and Bacillus subtilis


Phylogenetic tree of the Fur family of transcription factors in α-proteobacteria - II

[Figure: phylogenetic tree. Labeled branches: Fur in γ- and β-proteobacteria (Escherichia coli, sp|P0A9A9; Pseudomonas aeruginosa, sp|Q03456; Neisseria meningitidis, sp|P0A0S7); Fur in ε-proteobacteria (Helicobacter pylori, sp|O25671); Fur in Firmicutes (Bacillus subtilis, sp|P54574); Mur / Fur in α-proteobacteria; Irr in α-proteobacteria, regulator of iron homeostasis.]


Sequence logos for the identified Irr binding sites in α-proteobacteria

  • The A group (8 species) - Irr

  • The B group (4 species) - Irr

  • The C group (12 species) - Irr


Phylogenetic tree of the Rrf2 family of transcription factors in α-proteobacteria

[Figure: phylogenetic tree. Labeled branches: NsrR, RirA, IscR, IscR-II; taxon labels: Rhizobiales, Rhodobacterales, Ricket.]

Labeled family members:

  • Nitrite/NO-sensing regulator NsrR (Nitrosomonas europaea, Escherichia coli)

  • Iron repressor RirA (Rhizobium leguminosarum)

  • Cysteine metabolism repressor CymR (Bacillus subtilis)

  • Cytochrome complex regulator Rrf2 (Desulfovibrio vulgaris)

  • Iron-sulfur cluster synthesis repressor IscR (Escherichia coli)

Positional clustering of rrf2-like genes with:

  • iron uptake and storage genes

  • Fe-S cluster synthesis operons

  • genes involved in nitrosative stress protection

  • sulfate uptake/assimilation genes

  • thioredoxin reductase

  • carboxymuconolactone decarboxylase-family genes

  • hmc cytochrome operon

Legend:

  • proteins with the conserved C-X(6-9)-C(4-6)-C motif within effector-responsive domain

  • proteins without a cysteine triad motif


Sequence logos for the identified RirA-binding sites in α-proteobacteria

  • The A group (8 species) - RirA

  • The C group (12 species) - RirA


Distribution of the conserved members of the Fe- and Mn-responsive regulons and the predicted RirA, Fur/Mur, Irr, and DtxR binding sites in α-proteobacteria

Gene functions:

  • Iron uptake

  • Iron storage

  • FeS synthesis

  • Iron usage

  • Heme biosynthesis

  • Regulatory genes

  • Manganese uptake


An attempt to reconstruct the history


Acknowledgements

  • Dmitry Rodionov (comparative genomics)

  • Andrei Mironov (software)

  • Alexei Vitreschak (riboswitches)

  • Slides:

    • Michael Galperin (NCBI, Bethesda)

    • Andrei Osterman (Burnham Institute, San Diego)

  • Collaboration:

    • Thomas Eitinger (Humboldt University, Berlin) – Co/Ni transporters

    • Andy Johnston (University of East Anglia) – Fe in alphas

  • Funding:

    • Howard Hughes Medical Institute

    • Russian Fund of Basic Research

    • RAS, program “Molecular and Cellular Biology”

    • INTAS

