Sequence analysis of CpG islands reveals possible functional correlation between genes and its CpG island sequence


Sequence analysis of CpG islands reveals possible functional correlation between genes and its CpG island sequence. Henry Hyun-il Paik Bioinformatics, School of Informatics Indiana University. Outline. What CpG islands are The Known Relations between CpG islands and Genes

Presentation Transcript

Sequence analysis of CpG islands reveals possible functional correlation between genes and its CpG island sequence

Henry Hyun-il Paik

Bioinformatics, School of Informatics

Indiana University

Outline
  • What CpG islands are
  • The Known Relations between CpG islands and Genes
  • Motivation and Goal
  • Data set
  • Procedures
  • Results
  • Discussion
What are CpG islands?
  • CpG dinucleotides are rare in mammalian DNA
  • In mammals, DNA methylation occurs almost exclusively at CpG sites
  • Over evolutionary time, methylated cytosines tend to be converted to thymine by deamination
    • CpG → TpG
  • CpG islands are short stretches of DNA with a higher frequency of the CG dinucleotide
  • They are usually unmethylated
What are CpG islands?
  • Definition from Gardiner-Garden & Frommer
    • At least 200 bases long
    • G+C content: > 50%
    • observed CpG/expected CpG ratio: >= 0.6
  • Definition from Takai & Jones
    • Longer than 500 bp
    • G+C content: > 55%
    • observed CpG/expected CpG ratio: >= 0.65
    • Under this definition, CpG islands are more likely to be associated with the 5’ regions of genes, and most Alu repeats are excluded
  • There are about 29,000 such regions in the human genome
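The two sets of thresholds above can be checked with a few lines of code. The following is a minimal sketch, assuming the usual observed/expected formula (number of CpG × sequence length) / (number of C × number of G). The function names `cpg_stats` and `is_cpg_island` are hypothetical, and the sketch tests one candidate sequence as a whole rather than sliding a window along a genome as real island finders do.

```python
def cpg_stats(seq):
    """Return (length, G+C fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: obs/exp = (#CpG * N) / (#C * #G)
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return n, gc_fraction, obs_exp


def is_cpg_island(seq, min_len=200, min_gc=0.50, min_ratio=0.60):
    """Apply length, G+C and obs/exp thresholds to one candidate sequence.

    The defaults follow the Gardiner-Garden & Frommer values quoted above;
    how the boundary cases are treated (>= vs >) is an assumption here.
    """
    n, gc, ratio = cpg_stats(seq)
    return n >= min_len and gc > min_gc and ratio >= min_ratio


if __name__ == "__main__":
    candidate = "CG" * 100  # toy 200 bp sequence, not real data
    print(cpg_stats(candidate))
    print(is_cpg_island(candidate))
    # Stricter Takai & Jones style thresholds
    print(is_cpg_island(candidate, min_len=500, min_gc=0.55, min_ratio=0.65))
```

Passing the thresholds as parameters lets the same check serve both definitions, which makes it easy to see how many candidates the stricter one excludes.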
CpG islands & Genes
  • CpG islands located in the promoter regions of genes can play important roles in gene silencing
  • Housekeeping genes
    • Almost all housekeeping genes are associated with at least one CpG island
    • The CpG islands start 5’ of the transcription start site and cover one or more exons and introns
  • Tissue-specific genes
    • About 40% of tissue-specific genes are associated with islands
    • These islands are not as strongly positioned toward the transcription start site as in housekeeping genes
CpG islands & Genes
  • Not all CpG islands are associated with genes
    • Ioshikhes & Zhang identified features that discriminate promoter-associated from non-associated CpG islands
  • There are methylation-prone and methylation-resistant CpG islands
    • Feltus et al. found patterns that discriminate methylation-prone from methylation-resistant CpG islands
CpG islands & Genes

[Figure: position of CpG islands (CpGi) relative to a gene: promoter CpG islands at the 5’ end, CpG islands in the gene body, and CpG islands at the 3’ end.]

Motivation and Objective
  • Our project was inspired by these ideas
  • The mechanical definition of a CpG island relies on sequence criteria alone
    • At least 200 bases long
    • G+C content: > 50%
    • observed CpG/expected CpG ratio: >= 0.6
  • We tried to find the “semantic meaning” of CpG islands: the correlation between CpG islands and gene functions
  • Are there any significant CpGi patterns related to the gene functions?
Motivation and Objective

[Figure: CpGi 1 paired with Gene 1, and CpGi 2 paired with Gene 2.]

  • We assume that gene 1 and gene 2 have similar functions
  • Then the sequences of gene 1 and gene 2 are probably similar
  • Our goal is to find CpGi patterns shared by genes with similar functions
Data Set
  • Reference:
    • Larsen F., Gundersen G., Lopez L., Prydz H. CpG islands as gene markers in the human genome. Genomics 13:1095-1107 (1992)
  • Total number of entries: 1711
  • Entries with no islands: 1212
  • Entries with islands: 499
  • Total number of islands: 928
  • The Length of CpG islands
    • Average size of islands: 465 bp
    • Shortest detectable island: 200 bp
    • Largest island: 3340 bp
Procedures
  • Clustering: Fasta all-to-all comparison, then clustering by BAG
  • Motif (pattern) discovery & search for each cluster: MEME, then MAST
  • Database search with CpG island patterns: BLAST

Clustering
  • We use a clustering program, BAG by Sun Kim
  • We compare each CpG island against all other CpG islands using fasta; the comparison results are the input to BAG
  • BAG makes clusters based on sequence similarity
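The idea of turning all-to-all comparison scores into clusters can be sketched as follows. This is not BAG's algorithm: as a simplified stand-in it groups islands into connected components, linking any two islands whose pairwise score reaches a cutoff. The function name `cluster_by_similarity`, the `(id_a, id_b, score)` tuple format, and the cutoff value are assumptions made for illustration only.

```python
def cluster_by_similarity(ids, hits, min_score):
    """Group sequence ids into connected components of the similarity graph.

    ids:  iterable of sequence identifiers
    hits: iterable of (id_a, id_b, score) tuples from an all-to-all comparison
          (hypothetical format, e.g. parsed from pairwise alignment output)
    """
    ids = list(ids)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if a != b and score >= min_score:
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Toy scores, not real data
    islands = ["i1", "i2", "i3", "i4"]
    scores = [("i1", "i2", 120), ("i2", "i3", 95), ("i3", "i4", 20)]
    print(cluster_by_similarity(islands, scores, min_score=50))
```

Connected components are the loosest possible grouping: one chain of pairwise hits is enough to merge two islands, so a real clustering tool would normally apply stricter criteria.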
Motif Discovery & Search
  • MEME discovers patterns for each cluster
  • To assess the significance of a pattern, MAST searches all CpG islands with the pattern
  • The E-value shows how significant the pattern is and how often it occurs
  • Profiles are made to represent each cluster
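To illustrate what searching sequences with a pattern profile involves, here is a minimal sketch that builds a log-odds position weight matrix from aligned motif instances and finds the best-scoring window in a sequence. It is not the MEME or MAST implementation and it computes no E-values; the function names, the pseudocount, and the uniform background frequency are all assumptions.

```python
import math


def build_pwm(instances, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length motif instances."""
    width = len(instances[0])
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in instances]
        total = len(column) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_hit(pwm, seq):
    """Return (start, score) of the highest-scoring window, or None if seq is too short."""
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Characters outside ACGT (e.g. N) contribute a neutral score of 0
        score = sum(pwm[i].get(ch, 0.0) for i, ch in enumerate(window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    # Toy motif instances, not taken from the data set
    pwm = build_pwm(["ACGT", "ACGT", "ACGA"])
    print(best_hit(pwm, "TTACGTTT"))
```

Scoring every CpG island this way and ranking the results gives a rough picture of how specific a cluster's pattern is, which is the question the MAST step addresses with proper statistics.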
BLAST
  • The entire GenBank was searched with the CpG island profile rather than with the gene sequence
  • This shows how effectively the profile finds genes that have similar functions
  • This tests the validity of the profile
Results
  • Of 115 clusters in total, 26 have members with similar gene functions
  • These 26 clusters fall into two categories depending on CpGi location
    • 18 clusters have CpGi’s in the coding region
    • 8 clusters have CpGi’s in the promoter region
Results
  • One example of a CpGi in the gene body
  • Cluster # 18: Human heat-shock protein HSP70B' gene
    • MEME
    • MAST
    • Profile sequence: ATCATCGCCAACGACCAGGGCAACCGCACCACCCCCAGCTACGTGGCCTT
    • BLAST
Results
  • One example of a promoter CpGi
  • Cluster # 25: Human gene for creatine kinase B
    • MEME
    • MAST
    • Profile sequence: GAGGAGTCCTACGAAGTGTTCAAGGATCTCTTCGACCCCATCATTGAGGA
    • BLAST
Discussion
  • The BLAST results imply that CpG islands in both the promoter region and the CDS are good markers for gene sequences
  • Even though there are few promoter CpG islands, they represented their clusters significantly
  • Since many CpG islands tend to cover exons, they can be used to identify transcripts
  • More data are needed to support this result and to derive generic patterns
Acknowledgement
  • Dr. Sun Kim
  • Dr. Paul Ma
  • Arvind
  • Bioperl community