
Sequence analysis of CpG islands reveals possible functional correlation between genes and its CpG island sequence

Henry Hyun-il Paik, Bioinformatics, School of Informatics, Indiana University


Presentation Transcript


  1. Sequence analysis of CpG islands reveals possible functional correlation between genes and its CpG island sequence • Henry Hyun-il Paik • Bioinformatics, School of Informatics, Indiana University

  2. Outline • What CpG islands are • The Known Relations between CpG islands and Genes • Motivation and Goal • Data set • Procedures • Results • Discussion

  3. What are CpG islands? • CpG dinucleotides are rare in mammalian DNA • In mammals, DNA methylation occurs predominantly at CpG sites • Over evolutionary time, methylated cytosines may be converted to thymine by deamination • CpG → TpG • CpG islands are short stretches of DNA with a higher frequency of the CG dinucleotide • They are usually not methylated

  4. What are CpG islands? • Definition from Gardiner-Garden & Frommer • At least 200 bp long • G+C content: > 50% • Observed/expected CpG ratio: >= 0.6 • Definition from Takai & Jones • Longer than 500 bp • G+C content: > 55% • Observed/expected CpG ratio: >= 0.65 • Under this stricter definition, CpG islands are more likely to be associated with the 5’ regions of genes, and most Alu repeats are excluded • There are about 29,000 such regions in the human genome
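The two definitions on this slide differ only in their thresholds, so one check can be parameterized for both. The following is a minimal Python sketch, assuming the commonly used formula observed/expected CpG = (number of CG × N) / (number of C × number of G) for a sequence of length N. The function names `cpg_stats` and `is_cpg_island` are illustrative and do not come from the presentation, and the check is applied to a whole sequence rather than to a sliding window.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return n, gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.50, min_obs_exp=0.6):
    """Check a sequence against length, G+C, and obs/exp thresholds.

    The defaults follow the Gardiner-Garden & Frommer values quoted on the slide.
    """
    n, gc_fraction, obs_exp = cpg_stats(seq)
    return n >= min_len and gc_fraction > min_gc and obs_exp >= min_obs_exp
```

Passing `min_len=500, min_gc=0.55, min_obs_exp=0.65` approximates the Takai & Jones thresholds. The slide says "longer than 500 bp", so the `>=` length comparison used here is an assumption at the boundary.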

  5. What are CpG islands?

  6. CpG islands & Genes • CpG islands located in the promoter regions of genes can play important roles in gene silencing • Housekeeping genes • Almost all housekeeping genes are associated with at least one CpG island • These islands start 5’ of the transcription start site and cover one or more exons and introns • Tissue-specific genes • About 40% of tissue-specific genes are associated with CpG islands • The position of these islands is not as strongly biased toward the transcription start site as in housekeeping genes

  7. CpG islands & Genes • Not all CpG islands are associated with genes • Ioshikhes & Zhang identified features that discriminate promoter-associated CpG islands from non-associated ones • There are methylation-prone and methylation-resistant CpG islands • Feltus et al. found patterns that discriminate methylation-prone from methylation-resistant CpG islands

  8. CpG islands & Genes • [Figure: positions of CpG islands relative to a gene: promoter (5’ end) CpG islands, CpG islands in the gene body, and 3’ end CpG islands]

  9. Motivation and Objective • Our project was inspired by these ideas • The mechanical definition applies the criteria as they stand • At least 200 bases long • G+C content: > 50% • Observed/expected CpG ratio: >= 0.6 • We tried to find the “semantic meaning” of CpG islands: the correlation between CpG islands and gene functions • Are there any significant CpG island patterns related to gene function?

  10. Motivation and Objective • [Figure: CpGi 1 paired with Gene 1; CpGi 2 paired with Gene 2] • We assume that gene 1 and gene 2 have similar functions • Then the sequences of gene 1 and gene 2 are probably similar • Our goal is to find CpG island patterns shared by genes that have similar functions

  11. Data Set • Reference: Larsen F., Gundersen G., Lopez L., Prydz H. CpG islands as gene markers in the human genome. Genomics 13:1095-1107 (1992) • Total number of entries: 1711 • Entries with no islands: 1212 • Entries with islands: 499 • Total number of islands: 928 • Length of CpG islands • Average size of islands: 465 bp • Shortest detectable island: 200 bp • Largest island: 3340 bp

  12. A Snapshot of the Data Set

  13. Procedures • All-to-all comparison: FASTA • Clustering: BAG • Motif (pattern) discovery and search for each cluster: MEME, MAST • Database search with CpG island patterns: BLAST

  14. Clustering • We use the clustering program BAG by Sun Kim • We compare each CpG island against all CpG islands using FASTA; the comparison results are the input to BAG • BAG makes clusters based on sequence similarity
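The overall shape of this step (score every pair, then group sequences that are similar enough) can be sketched in a few lines of Python. This is a simplified stand-in and not the method used in the presentation: a k-mer Jaccard similarity replaces FASTA alignment scores, and plain single-linkage grouping with union-find replaces BAG's own clustering algorithm. The names `similarity` and `cluster`, the k-mer size, and the threshold are all assumptions made for illustration.

```python
from itertools import combinations


def kmers(seq, k=4):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=4):
    """Jaccard similarity of k-mer sets (a crude stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def cluster(seqs, threshold=0.5, k=4):
    """Group named sequences whose pairwise similarity reaches the threshold.

    seqs maps a name to a DNA string. Sequences are linked transitively
    (single linkage), which is an assumption, not BAG's actual behavior.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All-to-all comparison: every unordered pair is scored once.
    for a, b in combinations(seqs, 2):
        if similarity(seqs[a], seqs[b], k) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

The all-to-all loop is quadratic in the number of sequences, which is manageable for a data set of a few hundred islands like the one described here.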

  15. Motif Discovery & Search • MEME discovers patterns for each cluster • To assess the significance of a pattern, MAST searches all CpG islands with the pattern • From the E-values we can see how significant the pattern is and how often it occurs • Profiles are made to represent each cluster
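To make the idea of searching sequences with a discovered pattern concrete, here is a minimal Python sketch of scanning a sequence with a log-odds position weight matrix. It is not MEME or MAST: the matrix is built from already aligned motif instances, a uniform background is assumed, and no E-value is computed. The function names `build_pwm` and `best_hit` and the pseudocount are illustrative assumptions.

```python
import math


def build_pwm(instances, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from equal-length, aligned motif instances."""
    width = len(instances[0])
    total = len(instances) + 4 * pseudocount
    pwm = []
    for i in range(width):
        column = [s[i] for s in instances]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, seq):
    """Return (score, start) of the best-scoring window, or None if seq is too short."""
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        # Bases outside ACGT (for example N) contribute a neutral score of 0.
        score = sum(pwm[i].get(seq[start + i], 0.0) for i in range(width))
        if best is None or score > best[0]:
            best = (score, start)
    return best
```

A positive score means the window looks more like the motif than like the uniform background. Turning scores into E-values, as MAST does, requires a statistical model that this sketch leaves out.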

  16. Motif Discovery & Search

  17. BLAST • The entire GenBank was searched with the CpG island profile, not with the gene sequence • We see how efficiently the profile can find genes that have similar function • This checks the validity of the profile
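One simple way to quantify how well a profile finds functionally similar genes is the fraction of hits whose description mentions the expected function. The Python sketch below assumes the hit descriptions have already been extracted from a BLAST report as plain strings. The function name `fraction_matching` and the keyword-matching criterion are assumptions for illustration, not the evaluation used in the presentation.

```python
def fraction_matching(hit_descriptions, keyword):
    """Fraction of hit descriptions containing the keyword (case-insensitive)."""
    if not hit_descriptions:
        return 0.0
    keyword = keyword.lower()
    matches = sum(1 for text in hit_descriptions if keyword in text.lower())
    return matches / len(hit_descriptions)


# Placeholder descriptions, invented only to show the call; not real search results.
example_hits = [
    "placeholder hit 1: heat-shock protein",
    "placeholder hit 2: Heat-Shock protein",
    "placeholder hit 3: unrelated gene",
]
example_fraction = fraction_matching(example_hits, "heat-shock")
```

Keyword matching is crude, since descriptions of the same function can be worded differently, so a curated functional annotation would be a better basis where one is available.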

  18. Results • Of 115 clusters in total, 26 have members with similar gene function • These 26 clusters fall into two categories depending on CpG island location • 18 clusters have CpG islands in the coding region • 8 clusters have CpG islands in the promoter region

  19. Results • An example of a CpG island in the gene body • Cluster #18: Human heat-shock protein HSP70B' gene • MEME • MAST • Profile sequence: ATCATCGCCAACGACCAGGGCAACCGCACCACCCCCAGCTACGTGGCCTT • BLAST

  20. Results • An example of a promoter CpG island • Cluster #25: Human gene for creatine kinase B • MEME • MAST • Profile sequence: GAGGAGTCCTACGAAGTGTTCAAGGATCTCTTCGACCCCATCATTGAGGA • BLAST

  21. Gene & CpG islands in promoter region

  22. Gene & CpG islands in CDS

  23. Discussion • The BLAST results imply that CpG islands in both the promoter region and the CDS are good markers for gene sequences • Although promoter CpG islands are few in number, they represented their clusters significantly • Since many CpG islands tend to cover exons, they can be used to identify transcripts • More data are needed to support this result and to derive generic patterns

  24. Acknowledgement • Dr. Sun Kim • Dr. Paul Ma • Arvind • Bioperl community

  25. Comments & Questions
