Genomes of bacterial pathogens and their diversity
Presentation Transcript

  1. Genomes of bacterial pathogens and their diversity Philippe Glaser - pglaser@pasteur.fr

  2. Introduction: general concepts on pathogenic bacteria and their genomes • How to sequence a bacterial genome • Two examples: the genus Listeria and Streptococcus agalactiae

  3. Examples of bacterial species and diseases • Tuberculosis: Mycobacterium tuberculosis • Leprosy: Mycobacterium leprae • Cholera: Vibrio cholerae • Whooping cough (coqueluche): Bordetella pertussis • Sore throat: Streptococcus pyogenes and viruses • Meningitis: Neisseria meningitidis and other bacteria • Gonorrhea: Neisseria gonorrhoeae • Plague (la peste): Yersinia pestis • Dysentery: Shigella flexneri • Gastric cancer, ulcer, gastritis: Helicobacter pylori • Multiple diseases: Escherichia coli, Staphylococcus aureus ….

  4. Published genome sequence of bacterial pathogens (genus, counts as shown on the slide) • Shigella 2 • Chlamydiae 4 +1 • Escherichia coli 4 +2 • Neisseria 2 • Salmonella 3 • Branhamella 0 2 • Helicobacter 3 • Bordetella 3 • Pseudomonas 1 +2 • Pasteurella 1 • Yersinia 3 • Actinobacillus 0 4 • Stenotrophomonas 0 1 • Haemophilus 2 • Burkholderia 0 1 2 • Bartonella 3 • Flavobacterium 0 1 • Legionella 3 • Acinetobacter 0 3 • Leptospira 2 • Vibrio 4 • Borrelia 1 • Campylobacter 1 • Treponema 2 • Staphylococcus 4 • Mycobacterium 5 • Enterococcus 1 • Rickettsia 3 • Streptococcus 9 • Anaplasma 0 2 • Listeria 4 +1 • Coxiella 1 • Nocardia 0 2 • Ehrlichia 0 1 • Corynebacterium 1 +3 • Clostridium 2 +1 • Mycoplasma 6 • Total: > 80 published genomes

  5. Biodiversity of the microbial world • 4 000 000 000 000 000 000 000 000 000 000 bacteria on Earth • 3.5 billion years of evolution • 5000 culturable species - 500 000 (?) species

  6. Bacterial diversity in a Yellowstone hot spring Principle of the experiment: PCR amplification of 16S rRNA genes Sample Cloning 300 clones 84 sequences 14 phyla First analysis by restriction DNA Sequencing of 122 clones 54 bacterial groups 38 sequences 12 new phyla (Hugenholtz et al., J. Bacteriol. 1998, 180: 366-376)

  7. Diversity of the non-culturable bacterial world

  8. How to define a bacterial species • For eukaryotes the species definition is based on sexual reproduction. • Not possible for bacteria 1. Phenotypic definition 2. Molecular definition: 70% “similarity” by genomic DNA hybridization; more than 97% identity between the 16S rRNA genes => A convenient definition but not fully satisfactory
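
The 16S criterion above can be sketched in a few lines. This is only an illustration under stated assumptions: the two sequences are already aligned (same length, "-" for gaps), gapped columns are ignored, and the function names and toy strings are hypothetical, not taken from the presentation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_species_by_16s(aligned_a: str, aligned_b: str, threshold: float = 97.0) -> bool:
    """Apply the 'more than 97% identity' rule of thumb from the slide."""
    return percent_identity(aligned_a, aligned_b) > threshold


# Toy aligned fragments (hypothetical, far shorter than a real 16S gene).
print(percent_identity("ACGTACGT", "ACGTACGA"))
print(same_species_by_16s("ACGTACGT", "ACGTACGA"))
```

Real comparisons would use full-length 16S rRNA gene sequences and a proper alignment tool; the threshold is a convention, which is why the slide calls the definition convenient but not fully satisfactory.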

  9. Interactions between humans (the host) and bacteria • The human body constitutes multiple ecosystems for bacterial communities: the digestive tract, the throat, the skin • Other places are normally sterile (urine, milk, blood) • Symbiotic bacteria • Commensal bacteria • Pathogenic bacteria • Opportunistic pathogens and obligate pathogens

  10. Bacteria and their environments • Reservoir: animals, water, soil, food … • Human host • Vectors • The ecology of a pathogenic bacterium: understanding its adaptation to these environments (growth conditions)

  11. Some questions in the study of human bacterial pathogens • What are the virulence factors and the host-pathogen interaction factors? • What is the physiology (the metabolism) of the bacterium when it interacts with the host? • How did the bacterium evolve to adapt to its host, and how is it related to non-pathogenic relatives? • The identification of molecular tools for diagnosis and typing • The identification, on a rational basis, of antigens for acellular vaccines • The identification of drug targets • How to use genomics (and post-genomics) to answer these questions

  12. Evolution & Biodiversity Genome variability Point mutation Genome rearrangement Gene duplication Horizontal gene transfer DNA repair Barriers to DNA transfer Selection Biodiversity => virulence and pathogenicity

  13. Size of bacterial genomes • Nanoarchaeum equitans: <500 kb • Mycoplasma genitalium: 0.580 Mb, 481 genes • Minimal genome: 300-400 genes • Escherichia coli: 4.6-5.6 Mb, 4289-5648 genes • Mesorhizobium loti: 7.036 Mb, 6752 genes • Streptomyces coelicolor: 8.667 Mb, 7825 genes • Human: 3,000 Mb, 30000 genes

  14. Adaptation: transcription regulators vs genome size (http://www.regx.de/m_project_bioinformatics.php)

  15. Gene transfers in bacteria Bacteriophages Plasmids Transposons Competence Transduction Conjugation Transformation

  16. Mobile elements and gene gain • IS elements => no associated function, gene integration by IS-mediated homologous recombination, gene inactivation. • Transposons => carry functional genes • Integrons => a platform to incorporate new functions, resistance to multiple antibiotics. • Phages => may carry virulence genes (cholera toxin) • Pathogenicity (functional) islands • Plasmids => may also carry transposons or integrons • + gene duplication • Identification of such elements in genome sequences

  17. Gene loss • By homologous recombination • By insertion of IS elements • By mutation: gene => pseudogene • Evolutionary impact • Reductive evolution (M. leprae, Y. pestis, B. pertussis) • Role in virulence: lysine decarboxylase in Shigella (cadA+ derivatives are less virulent)

  18. Antigenic variation • By recombination: a gene cassette is inserted in front of an active promoter or removed from this position (Brucella, Mycoplasma gallisepticum). • By mutation: variation in the length of a microsatellite sequence (homopolymer tract) leads to a frameshift (deletion) or its reversion (Helicobacter pylori, Neisseria meningitidis)
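
The frameshift mechanism in the second bullet can be illustrated with a toy gene. The sequence below is invented for the purpose (it is not a real Helicobacter or Neisseria gene), and the helper names are hypothetical. With a tract of 9 G the reading frame stays open; losing one G shifts the frame onto a stop codon, and regaining it restores the frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_before_stop(seq: str) -> int:
    """Count codons read from position 0 until a stop codon or the end of seq."""
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            break
        count += 1
    return count


def toy_gene(tract_length: int) -> str:
    """Start codon, a poly-G tract of the given length, then a fixed downstream region."""
    return "ATG" + "G" * tract_length + "CTGAAATTGACGCAT"


in_frame = codons_before_stop(toy_gene(9))   # tract length is a multiple of 3
shifted = codons_before_stop(toy_gene(8))    # one G lost by slippage
print(in_frame, shifted)
```

In this sketch the 8-G variant is truncated after four codons, which is the "off" state; the switch is reversible because the tract can lengthen again.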

  19. Protein families and gene duplications • May arise by gene duplication or horizontal gene acquisition • Metabolic functions, surface proteins (antigens) • Correspond to a specificity of a species • Frequently discovered after whole genome sequencing

  20. Analysis of the genome of a bacterial pathogen • Annotation of the genome • Analysis of regulatory genes • Analysis of inactivated genes (pseudogenes) • Identification of protein families and mechanisms of phase variation • Identification of mobile elements • Identification of atypical regions (recently acquired) • Information obtained from comparative genomics

  21. DNA sequencing Automated DNA sequencing machines produce 800-base reads with an accuracy of 99%. => How to sequence a 4 Mb bacterial genome with an accuracy higher than 99.99%? Two strategies: directed or random

  22. Directed strategy: chromosome; ordering clones of a large-insert library (cosmids, lambda or BAC); sequencing clone by clone of the minimum tiling path; complete sequence • Random strategy: chromosome; random sequencing of a large number of clones; sequence assembly; complete sequence

  23. ‘Whole genome shotgun’ • Chromosome • Large-insert library (pSYX34 and BAC): end-sequencing (large-insert fragments) • Small-insert library (pcDNA2.1): end-sequencing (small-insert fragments) • Assembly of sequences in contigs • Closure • Annotation • Complete genome sequence

  24. Organization of a project • Choice of the strategy • Library construction • DNA preparation of plasmid clones • High-throughput sequencing of both ends of inserts • Assembly • Finishing: gap closure and resequencing of low-quality regions • Annotation

  25. Libraries • Libraries of insufficient quality => no sequence • Important features: coverage of the chromosome, absence of co-ligation, absence of clones without an insert, size of the inserts. • Different types of libraries, by insert size and vector copy number: high-copy-number vector, 1 to 3 kb inserts; low-copy-number vector, 8 to 12 kb inserts; bacterial artificial chromosome, 50 to 100 kb inserts

  26. Construction of a 1 - 3 kb long inserts library Chromosomal DNA pcDNA: high copy number vector Two repeated BstXI sites 5’CCAG TGTG ATGG…CCAG CACA CTGG3’ 3’GGTC ACAC TACC…GGTC GTGT GACC5’ Nebulization End repair by T4 polymerase Ligation of BstXI adaptors, Size selection of the inserts Purification of the digested vector (two 5’ protruding ends) 5’pCTTTCCAGCACA3’ 3’GAAAGGTCp 5’ Ligation, transformation CACA TGTG ACAC GTGT Recombinant plasmid
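
As a small check on the two vector sites printed above, the sketch below scans a sequence for the BstXI recognition pattern CCANNNNNNTGG. The spacer between the two sites is elided on the slide, so a run of A is used purely as a stand-in; the function name is hypothetical.

```python
import re

# BstXI recognition sequence: CCA, six unspecified bases, TGG.
# The pattern is its own reverse complement, so scanning one strand is enough.
# A lookahead is used so that overlapping sites would also be reported.
BSTXI_SITE = re.compile(r"(?=(CCA[ACGT]{6}TGG))")


def find_bstxi_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) for every BstXI site in dna."""
    return [(m.start(), m.group(1)) for m in BSTXI_SITE.finditer(dna.upper())]


# The two sites as written on the slide, joined by a placeholder spacer
# (the real intervening vector sequence is not given in the source).
vector_fragment = "CCAGTGTGATGG" + "A" * 10 + "CCAGCACACTGG"
print(find_bstxi_sites(vector_fragment))
```

Because the six central bases are unspecified, the two sites can differ in sequence while both being cut by BstXI.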

  27. Bacterial artificial chromosome (BAC) • Vector based on the naturally occurring F-factor plasmid found in E. coli • Cloning of DNA fragments of 100 to 300 kb (average, 150 kb) in E. coli • lacZ-based color selection of BAC clones with inserts • Strict copy number control • Stably maintained at 1-2 copies per cell

  28. BAC library construction • Preparation of chromosomal DNA in agarose plugs • Partial digestion with HindIII or BamHI • Ligation of vector + DNA purified from agarose plugs • Electroporation into E. coli DH10B • Verification of insert size on PFGE gels after NotI digestion [Gel images: size markers at 50, 100, 150 and 200 kb; inserts of 70 - 150 kb; linearized BAC vector (7 kb)]

  29. Automation

  30. DNA Sequencing 15 years ago! High throughput sequencing

  31. Automated DNA sequencing

  32. Automated sequencing

  33. Sequence assembly: Phred, Phrap, Consed (http://www.phrap.org)
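
Phred reports base quality on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The short sketch below, with hypothetical function names, converts the accuracy figures quoted earlier in the talk (99% per read, 99.99% target for the finished genome) into that scale.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred quality score Q = -10 * log10(P_error)."""
    if not 0.0 < error_probability <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse conversion: P_error = 10 ** (-Q / 10)."""
    return 10.0 ** (-quality / 10.0)


read_q = phred_quality(1 - 0.99)        # single-read accuracy of 99%
target_q = phred_quality(1 - 0.9999)    # finished-sequence target of 99.99%
print(round(read_q), round(target_q))
```

On this scale the gap between raw reads and the finished sequence is about 20 quality units, which is what redundant coverage and finishing have to make up.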

  34. Statistics and progress of the project

  35. Finishing • Re-sequencing of regions containing low-quality sequences • Sequencing of ‘missing’ regions [Figure: contigs A to F separated by sequence gaps and cloning gaps]

  36. Timing of a bacterial genome project • Library construction and verification (one month) • Plasmid preparation 5000 minipreps per Mb (7 days) • Sequencing : 10000 sequences per Mb (20 days, ABI 3700) • PCR : highly variable (250 reactions per Mb) • Consumable costs : 10 000 Euro per Mb
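
The figures above imply a sequencing depth that can be worked out directly. The sketch below combines the 10000 reads per Mb from this slide with the 800-base read length quoted earlier, and adds the standard Lander-Waterman approximation (fraction of the genome left uncovered is about e^-c at coverage c) as an assumption of this example; it ignores the minimum overlap needed to join reads, and the function name is hypothetical.

```python
import math


def shotgun_estimates(genome_mb: float, reads_per_mb: int = 10_000, read_length: int = 800) -> dict:
    """Rough shotgun statistics for a genome of genome_mb megabases."""
    genome_bp = genome_mb * 1_000_000
    n_reads = genome_mb * reads_per_mb
    coverage = n_reads * read_length / genome_bp
    uncovered_fraction = math.exp(-coverage)  # Lander-Waterman approximation
    return {
        "reads": n_reads,
        "coverage": coverage,
        "uncovered_bp": genome_bp * uncovered_fraction,
        "expected_gaps": n_reads * uncovered_fraction,
        "consumables_euro": genome_mb * 10_000,  # 10 000 Euro per Mb, from the slide
    }


for key, value in shotgun_estimates(4).items():
    print(f"{key}: {value:,.1f}")
```

For a 4 Mb genome this gives 8-fold coverage and, under the approximation, on the order of ten gaps left for the finishing stage, consistent with the PCR step being listed as highly variable.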

  37. Listeria monocytogenes: foodborne pathogen • Transmission: dairy products, meat, vegetables, fish • Disease: meningitis, encephalitis, septicemia, abortions, neonatal infections, gastroenteritis • Population at risk: elderly, newborns, immunocompromised, pregnant women • Mortality rate: 30% • Problem for food industry • Concern for public health

  38. Ecology of L. monocytogenes • Ability to survive and to grow in extreme conditions: low temperature, low water activity, broad ranges of pH… • Ubiquitous in the environment but at very low count • Variable count depending on the microenvironment and the season at a single location • Interaction with the vegetal world (silage) and the animal world (waste)

  39. Interaction of Listeria with its hosts • Carriage is frequent but transient • Low concentration of Listeria in feces • Intracellular parasite • Ability to cross three barriers: the intestinal, blood-brain and placental barriers • Provokes a broad range of diseases: gastroenteritis, septicemia, meningitis, encephalitis, abortions • At-risk population: immunocompromised, elderly, pregnant women and newborns • What are the relations between the two facets of this bacterium?

  40. Phylogenetic tree of the genus Listeria L. ivanovii L. grayi L. seeligeri L. innocua L. welshimeri L. monocytogenes B. subtilis (Pathogenic species) Vaneechoutte et al. Int J Syst Bact. (1998) 48, 127-139

  41. Genome comparison

  42. L. monocytogenes/B. subtilis synteny [Plot axes: Bacillus subtilis, Listeria monocytogenes]

  43. Synteny between Listeria genomes [Plots: L. innocua vs L. monocytogenes EGDe; L. ivanovii vs L. monocytogenes EGDe] • Absence of rearrangement between genomes • Rare translocations: probably deletion + insertion

  44. L. monocytogenes chromosome map • L. monocytogenes: 270 ‘specific’ genes • L. innocua: 149 ‘specific’ genes • G+C content • http://genolist.pasteur.fr/listilist

  45. G+C content of the 270 CDSs specific for L. monocytogenes [Histogram: x-axis G+C% (25 to 52); y-axis number of CDSs (%), 0 to 14; series: Total, Specific]
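
A histogram of this kind can be computed from the coding sequences alone. The sketch below is a minimal illustration with hypothetical function names and toy sequences: it computes G+C% per CDS, rounds it to the nearest integer bin, and expresses each bin as a percentage of all CDSs, matching the axes of the slide.

```python
from collections import Counter


def gc_percent(seq: str) -> float:
    """G+C content in percent, counting only unambiguous A, C, G, T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def gc_histogram(cds_sequences: list[str]) -> dict[int, float]:
    """Map each integer G+C% bin to the percentage of CDSs falling in it."""
    bins = Counter(round(gc_percent(s)) for s in cds_sequences)
    total = len(cds_sequences)
    return {gc: 100.0 * n / total for gc, n in sorted(bins.items())}


# Toy input (hypothetical); real use would pass the CDSs of an annotated genome.
toy_cds = ["ATGC", "AAAT", "GGCC", "ATGC"]
print(gc_histogram(toy_cds))
```

Running the same function on all CDSs and on the species-specific subset gives the two series to compare; a subset shifted away from the genome average is one hint of recent acquisition.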

  46. Competence operons in L. monocytogenes (each pair is G+C% - % identity with the B. subtilis ortholog, e.g. 37-34) • comG (A B C D E F G): 37-34, 32-21, 30-18, 31-17, 33-18, 34-37, 32-23 • comE (A B C): 34-44, 39-69, 34-32 • comC: 27-24 • comF (C A): 38-43, 35-35 • 2695.1 and 2014.1: two comEC paralogs (DNA binding protein)

  47. 41 surface proteins with an LPXTG motif [Bar chart: protein length in amino acids (0 to 2500) for each protein; * = absent from the L. innocua genome; InlA-like proteins indicated]
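
Candidate LPXTG proteins can be picked out of a predicted proteome with a simple pattern search, where X stands for any amino acid. The sketch below is only a first-pass filter with a hypothetical function name and an invented toy sequence; a real screen would also check for a signal peptide and for the position of the motif near the C-terminus.

```python
import re

# L, P, any residue, T, G (one-letter amino acid codes).
LPXTG = re.compile(r"LP[A-Z]TG")


def find_lpxtg(protein: str) -> list[int]:
    """Return the 0-based start positions of LPXTG motifs in a protein sequence."""
    return [m.start() for m in LPXTG.finditer(protein.upper())]


toy_protein = "MKKLPTTGEE"  # invented sequence for illustration
print(find_lpxtg(toy_protein))
```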

  48. L. monocytogenes / L. innocua comparison • Known virulence factors missing in L. innocua • Surface proteins missing in L. innocua • Metabolic pathways missing in L. innocua: sugar PTS, hexose phosphate permease, bile acid hydrolase, arginine deiminase, glutamate decarboxylase

  49. L. monocytogenes - L. ivanovii 2944 / 0 Virulence gene cluster L. monocytogenes 345 ‘specific’ genes inlA inlB hpt L. ivanovii 350 ‘specific’ genes bsh inlC

  50. The virulence gene cluster [Figure: gene maps of the locus in B. subtilis (prs spoVC mfd gcaD ctc yabK), L. monocytogenes, L. innocua, L. ivanovii, L. welshimeri, L. seeligeri and L. grayi, drawn next to the phylogenetic tree of the genus; genes labelled include prs, ctc, prfA, plcA, hly, mpl, actA, i-actA, plcB, orfX, orfZ, orfB, orfA, orfT, orfS and ldh; scale bar 5.5 kb] • Complex history with several insertion and deletion events