
GENOME SIGNATURES OF MICROBIAL ORGANISMS IDENTIFIED BY AMINO ACID N-GRAM ANALYSIS

B. Suman Bharathi

Advisor: Judith Klein-Seetharaman

Forschungszentrum, Juelich, Germany

Genome Signatures

- Peptide sequences that occur with unusually high frequency in a particular organism or pathogen, unlike in other organisms
- Potential applications:
  - Drug development: synthesize drugs that target the genome signature in a pathogen
  - Sensor development: use the genome signature to identify an organism quickly using an antibody

Approach

- Linguistic approach
- N-gram analysis using toolkit
- What the BLMT toolkit provides
  - N-gram statistical analysis
  - Definition of signature sequences
- Use of toolkit on Neisseria meningitidis

[Figure: Neisseria meningitidis versus other species, n = 4. Y-axis: occurrence of n-gram (%), from 0 to 0.09. X-axis: the 4-grams SDGI, LAAL, AALL, LLAA, ALLA, AAAL, LAAA, ALAA, AALA, AVLA, AAAA, AVAA, AAAV, EAAA, AEAA, AAEA, AAVA, AAAE, GRLK, MPSE.]

n-gram = sequence of length n
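To make the definition concrete, here is a minimal Python sketch of counting amino acid n-grams and expressing each as a percentage of all n-gram positions, the quantity on the figure's y-axis. It is not the BLMT implementation; the function name `ngram_percentages` and the toy sequences are assumptions made for illustration.

```python
from collections import Counter


def ngram_percentages(proteins, n=4):
    """Return {n-gram: percentage of all n-gram positions} for protein sequences.

    Hypothetical helper for illustration; not part of BLMT.
    """
    counts = Counter()
    for seq in proteins:
        # Slide a window of length n along each protein.
        # Windows never span the boundary between two proteins.
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: 100.0 * c / total for gram, c in counts.items()}


# Toy sequences, not real proteome data.
toy = ["MAAALLAA", "AAAL"]
freqs = ngram_percentages(toy, n=4)
```

Counting per protein rather than over one concatenated string is a design choice: it avoids creating artificial n-grams where one protein ends and the next begins.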

Use of BLMT

- N-gram statistical analysis gives detailed statistics: the frequency of each n-gram together with its mean and standard deviation.
- We considered 45 organisms: bacteria, archaea, mycoplasmas and human.
- Search for n-grams whose frequencies lie many standard deviations away from the mean.
- This indicates the difference between the expected and observed frequencies of the n-grams.
- It ultimately shows how unusual an n-gram is in one organism compared with the others.
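The slides do not spell out how the mean and standard deviation are computed. As a hedged sketch, the Python below assumes they are taken over the frequencies of the same n-gram in the other organisms, leaving the organism of interest out, and uses the population standard deviation. The function name `zscore_vs_others` and the toy numbers are hypothetical.

```python
from statistics import mean, pstdev


def zscore_vs_others(freq_by_organism, target):
    """Number of standard deviations between the target organism's n-gram
    frequency and the mean frequency in all other organisms.

    Assumption: mean and standard deviation exclude the target organism.
    """
    others = [f for org, f in freq_by_organism.items() if org != target]
    if len(others) < 2:
        raise ValueError("need at least two other organisms")
    mu = mean(others)
    sigma = pstdev(others)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("frequency is identical in all other organisms")
    return (freq_by_organism[target] - mu) / sigma


# Toy frequencies for one n-gram in three made-up organisms.
toy = {"org_a": 1.0, "org_b": 3.0, "org_x": 5.0}
z = zscore_vs_others(toy, "org_x")
```

Leaving the target out matters for large thresholds: if the target were included among 45 values, its distance from the mean could not exceed about 6.6 population standard deviations (the square root of 44).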

Difference between expected and observed frequencies

- Xylella (black)
- Vibrio (red)
- Ureaplasma (green)
- Treponema (blue)
- Thermotoga (yellow)

[Figure: difference between expected and observed frequency, plotted per n-gram.]

The positive values indicate over-represented n-grams, while the negative values indicate under-represented n-grams.
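The slides do not define how the expected frequency is obtained. One common null model, assumed here purely for illustration, treats residues as independent, so the expected frequency of an n-gram is the product of its single amino acid frequencies. The sketch below computes observed minus expected under that assumption; `observed_minus_expected` is a hypothetical name.

```python
from collections import Counter
from math import prod


def observed_minus_expected(proteins, n=4):
    """Return {n-gram: observed frequency - expected frequency}.

    Assumption: expected frequency = product of the frequencies of the
    individual residues (independence model). Only n-grams that occur at
    least once are reported.
    """
    residue_counts = Counter()
    ngram_counts = Counter()
    for seq in proteins:
        residue_counts.update(seq)
        for i in range(len(seq) - n + 1):
            ngram_counts[seq[i:i + n]] += 1
    total_residues = sum(residue_counts.values())
    total_ngrams = sum(ngram_counts.values())
    diffs = {}
    for gram, count in ngram_counts.items():
        observed = count / total_ngrams
        expected = prod(residue_counts[aa] / total_residues for aa in gram)
        diffs[gram] = observed - expected
    return diffs


# Toy sequence, not real data: "AB" occurs more often than independence predicts.
diffs = observed_minus_expected(["ABAB"], n=2)
```

A positive value marks an n-gram as over-represented and a negative value as under-represented, matching the sign convention stated above.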

Initial points of the difference between expected and observed frequency graph

- Xylella (black)
- Vibrio (red)
- Ureaplasma (green)
- Treponema (blue)
- Thermotoga (yellow)

Ureaplasma shows high difference values (approx. 0.00021), indicating over-representation of n-grams compared to the expected probability of occurrence in the organism.

Standard deviation away from the mean

- Mycoplasma genitalium (black)
- M. tuberculosis (red)
- M. leprae (green)
- Mesorhizobium (blue)
- Lactococcus (yellow)

Shows the distribution of n-gram standard deviations with both high and low values of difference, indicating the over-represented and under-represented n-grams.

Highest standard deviations away from the mean

- Mycoplasma genitalium (black)
- M. tuberculosis (red)
- M. leprae (green)
- Mesorhizobium (blue)
- Lactococcus (yellow)

Shows the initial (highest) values of standard deviation away from the mean.

The n-gram values of M. tuberculosis are much higher than those of M. leprae.

Comparison of genome size with varying standard deviations

- Examine the relationship between genome size and distribution of n-gram standard deviations for each organism
- Human genome taken as reference.
- Compare genome size and standard deviations within same genus but across different species.
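The slides make this comparison visually. If one wanted a single number for it, a rank correlation between genome size and a per-organism summary (for example the highest n-gram standard deviation value) is one option. This is a swapped-in technique, not the method used in the presentation; the pure-Python sketch below assumes no tied values, and the names `ranks` and `spearman` are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to len(values); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation for two equal-length lists without ties."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Toy values only, not results from the study.
rho_same_order = spearman([1, 2, 3], [10, 20, 30])
rho_reversed = spearman([1, 2, 3], [3, 2, 1])
```

A value near 0 would be consistent with no specific relationship between size and standard deviation values, while values near 1 or -1 would indicate a monotonic trend.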

Size distribution of genomes

1. Human 22889476
2. Bacteria_Mesorhizobium_loti 4080256
3. Bacteria_Pseudomonas_aeruginosaPA01 3730192
4. Bacteria_Escherichia_coliO157H7 3229098
5. Bacteria_Escherichia_coliO157H7EDL933 3228100
6. Bacteria_Escherichia_coliK12 2726558
7. Bacteria_Mycobacterium_tuberculosisH37Rv 2666338
8. Bacteria_Bacillus_subtilis 2442200
9. Bacteria_Bacillus_halodurans_C125 2384352
10. Bacteria_SynechocystisPCC6803 2072748
11. Bacteria_Vibrio_cholerae_chr1 1725852
12. Bacteria_Deinococcus_radioduransR1_chr1 1559376
13. Bacteria_Xylella_fastidiosa 1490262
14. Archaea_Archaeoglobus_fulgidus 1343990
15. Bacteria_Pasteurella_multocida 1340102
16. Bacteria_Lactococcus_lactis_subsp_lactis 1335222
17. Archaea_Aeropyrum_pernix 1280062
18. B_Neisseria_meningitidis_serogroupBstrainMC58 1178096
19. Archaea_Halobacterium_spNRC1 1178038
20. B_Neisseria_meningitidis_serogroupAstrainZ2491 1176104
21. Bacteria_thermotoga_maritima 1167344
22. Bacteria_Pyrococcus_horikoshiiOT3 1141216
23. Bacteria_Mycobacterium_leprae_strinTN 1080756
24. A_Methanobacterium_thermoautotrophicum_deltaH 1054752
25. Bacteria_Haemophilus_influenzaeRd 1045572
26. Bacteria_Campylobacter_jejuni 1020944
27. Bacteria_Helicobacter_pylori_strianJ99 990942
28. Bacteria_Helicobacter_pylori26695 986258
29. Archaea_Methanococcus_jannaschii 970558
30. Bacteriae_Aquifex_aeolicus 968068
31. Archaea_Thermoplasma_acidophilum 909164
32. Archaea_thermoplasma_volcanium 903228
33. Bacteria_Chlamydophila_pneumonieaeJ138 735350
34. Bacteria_Chlamydophila_pneumonieaCWL029 725492
35. Bacteria_Chlamydophila_pneumonieaeAR39 729896
36. Bacteria_Treponema_pallidum 703414
37. Bacteria_Chlamydia_muridarum 646712
38. Bacteria_Chlamydia_trachomatis 626142
39. Bacteria_Rickettsia_prowazekii_strain_MadridE 559828
40. Bacteria_Mycoplasma_pneumoniae 480870
41. Bacteria_Ureaplasma_urealyticum 457608
42. Bacteria_Buchnera_sp_APS 371470
43. mycoplasma genitalium 352826
44. Bacteria_Borrelia_burgdorferi 300106

Genome size graph and varying standard deviation values

- Human (black, 22889476)
- Mesorhizobium (red, 4080256)
- P. aeruginosa (green, 3730192)
- E_coliO157H7 (blue, 3229098)
- E_coliO157H7EDL933 (yellow, 3228100)

The organisms are listed in descending order of genome size.

The relation between the distribution of n-gram standard deviations and genome size is compared.

Tail end of genome size and n-gram distribution of standard deviations

- Human (black, 22889476)
- Mesorhizobium (red, 4080256)
- P. aeruginosa (green, 3730192)
- E_coliO157H7 (blue, 3229098)
- E_coliO157H7EDL933 (yellow, 3228100)

The human genome, though the largest in size, has low n-gram standard deviation values away from the mean compared to smaller genomes.

Initial points: genome size and n-gram distribution of standard deviations

- Human (black, 22889476)
- Mesorhizobium (red, 4080256)
- P. aeruginosa (green, 3730192)
- E_coliO157H7 (blue, 3229098)
- E_coliO157H7EDL933 (yellow, 3228100)

Human n-gram standard deviation values are almost equal to those of Mesorhizobium, though Mesorhizobium has a much smaller genome.

Genome size and n-gram distribution of standard deviations

- Human (black, 22889476)
- E_coliK12 (red, 2726558)
- M. tuberculosis (green, 2666338)
- B. subtilis (blue, 2442200)
- B. halodurans (yellow, 2384352)
- Synechocystis (brown, 2072748)

M. tuberculosis has very high n-gram standard deviation values. They exceed the values for human, despite its smaller genome size.

Initial points of genome size and n-gram distribution of standard deviations

- Human (black, 22889476)
- E_coliK12 (red, 2726558)
- M. tuberculosis (green, 2666338)
- B. subtilis (blue, 2442200)
- B. halodurans (yellow, 2384352)
- Synechocystis (brown, 2072748)

The thickness of lines indicates the genome size.

The thinnest line represents E_coliK12.

Mycobacterium tuberculosis shows the highest values.

Final points of genome size and n-gram distribution of standard deviations

- Human (black, 22889476)
- E_coliK12 (red, 2726558)
- M. tuberculosis (green, 2666338)
- B. subtilis (blue, 2442200)
- B. halodurans (yellow, 2384352)
- Synechocystis (brown, 2072748)

M. tuberculosis and all other organisms here have n-grams with higher difference values than human.

Same genus / different species

- 4-grams in M. tuberculosis lie many more standard deviations from the mean than 4-grams in M. leprae

[Figure panels: Neisseria meningitidis, Thermotoga maritima, Synechocystis spec., Haemophilus influenzae, Human, Other Organisms]

Conclusions

- N-grams that are at least 30 standard deviations away from the mean are significant candidates for genome signatures.
- Difference graphs: estimate how likely an n-gram is to be observed in an organism.
- Genome size graphs: there is no specific relationship between the size of a genome and its standard deviation values.
- Same genus and different species, where genome size is specified: there is a noticeable difference between the Mycobacterium species (M. leprae and M. tuberculosis).
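The first conclusion amounts to a threshold filter. As a minimal sketch, assuming the number of standard deviations from the mean has already been computed for each n-gram, candidates can be selected as below. The function name `signature_candidates`, the `over_only` flag and the toy keys are hypothetical; the only value taken from the slides is the threshold of 30.

```python
def signature_candidates(zscores, threshold=30.0, over_only=True):
    """Return (n-gram, z) pairs at least `threshold` standard deviations
    from the mean, strongest first.

    over_only=True keeps only over-represented n-grams (positive z), in
    line with the definition of a signature as unusually frequent.
    """
    if over_only:
        hits = [(g, z) for g, z in zscores.items() if z >= threshold]
    else:
        hits = [(g, z) for g, z in zscores.items() if abs(z) >= threshold]
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Toy values with placeholder keys, not results from the study.
toy = {"gram_a": 31.0, "gram_b": 5.0, "gram_c": -40.0, "gram_d": 45.0}
over = signature_candidates(toy)
both = signature_candidates(toy, over_only=False)
```

Setting `over_only=False` also returns strongly under-represented n-grams, which the phrase "away from the mean" would permit.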

Current and future work

- Find n-gram signatures in E. coli.
- Explore the relationship between genome size and the distribution of n-gram standard deviations across different species of the same genus.
- Find more specific targets to differentiate species in terms of signature peptides for all 44 organisms taken for study.
