GENOME SIGNATURES OF MICROBIAL ORGANISMS IDENTIFIED BY AMINO ACID N-GRAM ANALYSIS

B. Suman Bharathi

Advisor: Judith Klein-Seetharaman

Forschungszentrum, Juelich, Germany


Genome Signatures

  • Peptide sequences that occur with unusually high frequency in a particular organism or pathogen, unlike in other organisms

  • Potential applications:

    • Drug development: synthesize drugs that target the genome signature in a pathogen

    • Sensor development: use the genome signature with an antibody to identify the organism quickly


Approach

  • Linguistic approach

  • N-gram analysis using a toolkit

  • What the BLMT toolkit provides

  • N-gram statistical analysis

  • Definition of signature sequences

  • Use of the toolkit on Neisseria meningitidis

[Figure: Neisseria meningitidis versus other species, n=4. Y-axis: occurrence of n-gram (%), 0 to 0.09. X-axis n-grams: SDGI, LAAL, AALL, LLAA, ALLA, AAAL, LAAA, ALAA, AALA, AVLA, AAAA, AVAA, AAAV, EAAA, AEAA, AAEA, AAVA, AAAE, GRLK, MPSE]

n-gram = sequence of length n
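
The definition above can be illustrated with a minimal Python sketch that counts amino acid n-grams with a sliding window and reports each as a percentage of all n-grams, matching the figure's y-axis unit. This is not the BLMT toolkit's code; the function name `ngram_percentages` and the toy sequences are hypothetical, and the sketch assumes windows never span two proteins.

```python
from collections import Counter

def ngram_percentages(proteins, n=4):
    """Return {n-gram: percentage of all n-grams} for a list of protein strings."""
    counts = Counter()
    for seq in proteins:
        # Slide a window of length n along each protein separately,
        # so that no n-gram spans the boundary between two proteins.
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * c / total for gram, c in counts.items()}

# Toy input (hypothetical sequences, not taken from any genome).
toy = ["MAAALAAL", "AAALK"]
freqs = ngram_percentages(toy, n=4)
print(freqs["AAAL"])  # AAAL accounts for 2 of the 7 windows
```

Overlapping windows are counted, so a protein of length L contributes L - n + 1 n-grams.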


Use of BLMT

  • N-gram statistical analysis gives detailed statistics: the frequency of each n-gram, together with its mean and standard deviation.

  • We have taken 45 organisms into consideration: bacteria, archaea, mycoplasmas and human.

  • Search for n-grams whose frequencies lie many standard deviations away from the mean.

  • This indicates the difference between the expected and observed frequencies of the n-grams.

  • This ultimately shows how unusual an n-gram is in one organism compared with the others.
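
One way to read "standard deviations away from the mean" is sketched below in Python. The slides do not say how the mean and standard deviation are computed; this sketch assumes they are taken over the other organisms, and that the target organism's frequency is then measured against them. The function name `deviation_from_others`, the organism labels and the toy frequencies are all hypothetical.

```python
from statistics import mean, stdev

def deviation_from_others(freq_by_org, target, gram):
    """Number of standard deviations by which the target organism's frequency
    of `gram` differs from the mean over all other organisms.

    Assumption: mean and sample standard deviation are computed over the
    other organisms only. Needs at least two other organisms."""
    others = [freqs.get(gram, 0.0)
              for org, freqs in freq_by_org.items() if org != target]
    mu, sigma = mean(others), stdev(others)
    if sigma == 0:
        return None  # every other organism has the same frequency
    return (freq_by_org[target].get(gram, 0.0) - mu) / sigma

# Invented frequencies (%), for illustration only.
toy = {
    "target_org": {"AAAL": 0.08},
    "org_1": {"AAAL": 0.01},
    "org_2": {"AAAL": 0.02},
    "org_3": {"AAAL": 0.03},
}
score = deviation_from_others(toy, "target_org", "AAAL")
print(score)  # about 6 standard deviations above the other organisms
```

A positive score means the n-gram is more frequent in the target than in the comparison set; a negative score means it is less frequent.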


Difference Between Expected and Observed Frequencies

[Figure: difference between expected and observed frequencies, plotted per n-gram. Legend: Xylella (black), Vibrio (red), Ureaplasma (green), Treponema (blue), Thermotoga (yellow)]

Positive values indicate over-represented n-grams, while negative values indicate under-represented n-grams.
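
The sign convention above corresponds to computing observed minus expected frequency. The slides do not state how the expected frequency is obtained; the Python sketch below assumes a simple independence model in which the expected frequency of an n-gram is the product of the frequencies of its individual amino acids. The function name `observed_minus_expected` is hypothetical.

```python
from collections import Counter
from math import prod

def observed_minus_expected(proteins, n=4):
    """Return {n-gram: observed frequency - expected frequency}.

    Assumption: expected frequency is the product of single-residue
    frequencies (independence model). Only n-grams that actually occur
    are returned; absent n-grams would all have negative differences."""
    residues = Counter("".join(proteins))
    total_residues = sum(residues.values())
    grams = Counter(seq[i:i + n]
                    for seq in proteins
                    for i in range(len(seq) - n + 1))
    total_grams = sum(grams.values())
    diffs = {}
    for gram, count in grams.items():
        observed = count / total_grams
        expected = prod(residues[aa] / total_residues for aa in gram)
        diffs[gram] = observed - expected  # > 0 means over-represented
    return diffs

# Toy two-letter alphabet, for illustration only.
diffs = observed_minus_expected(["ABAB"], n=2)
print(diffs)
```

In the toy input both "AB" and "BA" come out positive, because the n-gram "AA", which the independence model also expects, never occurs.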


Initial points of the difference between expected and observed frequency graph

Xylella (black)

Vibrio (red)

Ureaplasma (green)

Treponema (blue)

Thermotoga (yellow)

Ureaplasma shows high difference values (approx. 0.00021), indicating n-grams that are over-represented relative to their expected probability of occurrence in the organism.


Standard deviation away from the mean

  • Mycoplasma genitalium (black)

  • M. tuberculosis (red)

  • M. leprae (green)

  • Mesorhizobium (blue)

  • Lactococcus (yellow)

Shows the distribution of n-gram standard deviations, with both high and low values of difference, indicating the over-represented and under-represented n-grams.


Highest standard deviations away from the mean

  • Mycoplasma genitalium (black)

  • M. tuberculosis (red)

  • M. leprae (green)

  • Mesorhizobium (blue)

  • Lactococcus (yellow)

Shows the initial (highest) values of standard deviation away from the mean.

The values for n-grams of M. tuberculosis are much higher than those for M. leprae.


Comparison of genome size with varying standard deviations

  • Examine the relationship between genome size and the distribution of n-gram standard deviations for each organism.

  • The human genome is taken as the reference.

  • Compare genome size and standard deviations within the same genus but across different species.


Size Distribution of Genomes

1. Human 22889476
2. Bacteria_Mesorhizobium_loti 4080256
3. Bacteria_Pseudomonas_aeruginosaPA01 3730192
4. Bacteria_Escherichia_coliO157H7 3229098
5. Bacteria_Escherichia_coliO157H7EDL933 3228100
6. Bacteria_Escherichia_coliK12 2726558
7. Bacteria_Mycobacterium_tuberculosisH37Rv 2666338
8. Bacteria_Bacillus_subtilis 2442200
9. Bacteria_Bacillus_halodurans_C125 2384352
10. Bacteria_SynechocystisPCC6803 2072748
11. Bacteria_Vibrio_cholerae_chr1 1725852
12. Bacteria_Deinococcus_radioduransR1_chr1 1559376
13. Bacteria_Xylella_fastidiosa 1490262
14. Archaea_Archaeoglobus_fulgidus 1343990
15. Bacteria_Pasteurella_multocida 1340102
16. Bacteria_Lactococcus_lactis_subsp_lactis 1335222
17. Archaea_Aeropyrum_pernix 1280062
18. B_Neisseria_meningitidis_serogroupBstrainMC58 1178096
19. Archaea_Halobacterium_spNRC1 1178038
20. B_Neisseria_meningitidis_serogroupAstrainZ2491 1176104
21. Bacteria_Thermotoga_maritima 1167344
22. Bacteria_Pyrococcus_horikoshiiOT3 1141216
23. Bacteria_Mycobacterium_leprae_strainTN 1080756
24. A_Methanobacterium_thermoautotrophicum_deltaH 1054752
25. Bacteria_Haemophilus_influenzaeRd 1045572
26. Bacteria_Campylobacter_jejuni 1020944
27. Bacteria_Helicobacter_pylori_strainJ99 990942
28. Bacteria_Helicobacter_pylori26695 986258
29. Archaea_Methanococcus_jannaschii 970558
30. Bacteria_Aquifex_aeolicus 968068
31. Archaea_Thermoplasma_acidophilum 909164
32. Archaea_Thermoplasma_volcanium 903228
33. Bacteria_Chlamydophila_pneumoniaeJ138 735350
34. Bacteria_Chlamydophila_pneumoniaeCWL029 725492
35. Bacteria_Chlamydophila_pneumoniaeAR39 729896
36. Bacteria_Treponema_pallidum 703414
37. Bacteria_Chlamydia_muridarum 646712
38. Bacteria_Chlamydia_trachomatis 626142
39. Bacteria_Rickettsia_prowazekii_strain_MadridE 559828
40. Bacteria_Mycoplasma_pneumoniae 480870
41. Bacteria_Ureaplasma_urealyticum 457608
42. Bacteria_Buchnera_sp_APS 371470
43. Bacteria_Mycoplasma_genitalium 352826
44. Bacteria_Borrelia_burgdorferi 300106


Genome size graph and varying standard deviation values

  • Human (black, 22889476)

  • Mesorhizobium (red, 4080256)

  • P. aeruginosa (green, 3730192)

  • E_coliO157H7 (blue, 3229098)

  • E_coliO157H7EDL933 (yellow, 3228100)

The organisms are listed in descending order of genome size.

The relationship between the distribution of n-gram standard deviations and genome size is compared.


Tail end of genome size and n-gram distribution of standard deviations

Human (black, 22889476)

Mesorhizobium (red, 4080256)

P. aeruginosa (green, 3730192)

E_coliO157H7 (blue, 3229098)

E_coliO157H7EDL933 (yellow, 3228100)

The human genome, though the largest, has lower n-gram standard deviations away from the mean than the smaller genomes.


Initial points: genome size and n-gram distribution of standard deviations

Human (black, 22889476)

Mesorhizobium (red, 4080256)

P. aeruginosa (green, 3730192)

E_coliO157H7 (blue, 3229098)

E_coliO157H7EDL933 (yellow, 3228100)

The human n-gram standard deviation values are almost equal to those of Mesorhizobium, although Mesorhizobium has a much smaller genome.


Genome size and n-gram distribution of standard deviations

  • Human (black, 22889476)

  • E_coliK12 (red, 2726558)

  • M. tuberculosis (green, 2666338)

  • B. subtilis (blue, 2442200)

  • B. halodurans (yellow, 2384352)

  • Synechocystis (brown, 2072748)

M. tuberculosis has very high n-gram standard deviation values.

They exceed the human values, despite its smaller genome size.


Initial points of genome size and n-gram distribution of standard deviations

Human (black, 22889476)

E_coliK12 (red, 2726558)

M. tuberculosis (green, 2666338)

B. subtilis (blue, 2442200)

B. halodurans (yellow, 2384352)

Synechocystis (brown, 2072748)

The thickness of the lines indicates the genome size.

The thinnest line represents E_coliK12.

Mycobacterium tuberculosis shows the highest values.


Final points of genome size and n-gram distribution of standard deviations

Human (black, 22889476)

E_coliK12 (red, 2726558)

M. tuberculosis (green, 2666338)

B. subtilis (blue, 2442200)

B. halodurans (yellow, 2384352)

Synechocystis (brown, 2072748)

M. tuberculosis and all the other organisms shown here have n-grams with higher difference values than human.


Same genus / different species

  • 4-grams in M. tuberculosis lie many more standard deviations from the mean than 4-grams in M. leprae.


Mycobacterium

[Figure panels: M. tuberculosis; M. leprae]


Other Organisms

[Figure panels: Neisseria meningitidis; Thermotoga maritima; Synechocystis spec.; Haemophilus influenzae; Human]


Conclusions

  • N-grams that are at least 30 standard deviations away from the mean are significant candidates for genome signatures.

  • Difference graphs: estimate the likelihood of an n-gram being observed in an organism.

  • Genome size graphs: there is no specific relationship between the size of a genome and its n-gram standard deviation values.

  • Same genus, different species (genome size specified): there is a noticeable difference between the Mycobacterium species M. leprae and M. tuberculosis.
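
The 30-standard-deviation criterion in the first conclusion could be applied as in the Python sketch below. It assumes a dictionary of per-n-gram deviation scores has already been computed, and it keeps only positive deviations on the assumption that signatures are unusually frequent n-grams. The function name `signature_candidates`, the placeholder n-gram labels and the scores are invented for illustration and are not results from the study.

```python
def signature_candidates(deviation_by_gram, threshold=30.0):
    """Return n-grams whose deviation score is at least `threshold`
    standard deviations above the mean, highest score first.

    Assumption: only over-represented n-grams (positive scores) count
    as signature candidates."""
    hits = {gram: score for gram, score in deviation_by_gram.items()
            if score >= threshold}
    return sorted(hits, key=hits.get, reverse=True)

# Placeholder labels and invented scores, for illustration only.
example = {"NGRAM_A": 41.5, "NGRAM_B": 30.0, "NGRAM_C": -35.2, "NGRAM_D": 4.1}
candidates = signature_candidates(example)
print(candidates)  # ['NGRAM_A', 'NGRAM_B']
```

Replacing the comparison with `abs(score) >= threshold` would also flag strongly under-represented n-grams, if those are of interest.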


Current and future work

  • Find n-gram signatures in E. coli.

  • Explore the relationship between genome size and the distribution of n-gram standard deviations for different species of the same genus.

  • Find more specific targets, in terms of signature peptides, to differentiate species among all 44 organisms studied.

