
Kendra Baughman

York Marahrens’ Lab

UCLA

Finding Sequence Motifs in Alu Transposons that Enhance the Expression of Nearby Genes


Overview

Goal

Background

Prior Studies

Strategy

Results

Remaining Tasks

Future Directions


Goal

Determine if there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression


Background – Alu Elements

Repetitive sequence

Transposons (DNA sequences that make copies of themselves and insert elsewhere in the genome)

Over 1 million in human genome

~50 subfamilies categorized by sequence differences


Prior Studies

“Repetitive sequence environment distinguishes housekeeping genes”

Eller, Daniel et al. submitted

“Alu abundance positively correlates with gene expression level”

C.D. Eller et al., submitted


Higher Alu concentration near widely expressed genes


Higher Alu concentration near highly expressed genes


Alu Subfamilies

[Figure: number of Alu elements in each subfamily; axis labels: "Subfamily", "# Alu in the Subfamily"]


Data

Human gene expression levels from microarray data (Stan Nelson’s lab, UCLA)

Alu information from UCSC Genome Browser, RepeatMasker tracks


Goal, reiterated

Determine if there are motifs present among Alu elements near highly expressed genes, and missing from Alu elements near poorly expressed genes, that might contribute to gene expression


Strategy

Find Alu “near” high and low expression genes (within 20kb)

Perform multiple sequence alignment on Alu sequences

Identify motifs preferentially conserved around highly expressed genes (these motifs could help the genes be highly expressed)


Screening the genes…

Used Perl scripts to extract information from MySQL databases

Grouped genes by expression level in R

Chose genes in top and bottom 20%

[Figure: genes plotted by expression level; axis labels: "Expression Level", "Genes"]
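
The slide does the grouping in R; the same selection step can be sketched in Python. This is only an illustration under assumptions: the gene names and expression values are hypothetical, and the function name `split_by_expression` is not from the presentation.

```python
def split_by_expression(expression, fraction=0.20):
    """Return (high, low) lists of gene names.

    `expression` maps gene name -> expression level. The top and bottom
    `fraction` of genes, ranked by expression, form the two groups.
    """
    ranked = sorted(expression, key=expression.get)  # ascending expression
    k = int(len(ranked) * fraction)                  # genes per group
    low = ranked[:k]
    high = ranked[len(ranked) - k:]
    return high, low


# Hypothetical expression values, for illustration only.
example = {
    "geneA": 12.0, "geneB": 850.0, "geneC": 40.0, "geneD": 300.0,
    "geneE": 5.0, "geneF": 95.0, "geneG": 1500.0, "geneH": 60.0,
    "geneI": 220.0, "geneJ": 33.0,
}
high, low = split_by_expression(example)
print("HI genes:", high)
print("LO genes:", low)
```

With ten genes and a 20% cutoff, each group holds two genes; genes in the middle 60% are ignored.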


Screening the Alu…

  • Used MySQL queries to determine flanking region

  • Used Perl scripts to screen Alu located within 20kb of genes

  • Omitted Alu in overlapping flanking regions

PERCENTAGES OF ALU THROWN OUT

         Chrom1 1st 20mb   Chrom10   Chrom19 1st 20mb
10kb     3%                6%        20%
20kb     7%                7%        28%
50kb     17%               11%       50%

[Diagram labels: LO-gene, HI-gene, HI-Alu, ??-Alu, LO-Alu]
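
The screening step above was done with MySQL queries and Perl scripts that are not shown. A minimal Python sketch of the same idea follows. Several things here are assumptions: the record fields (`chrom`, `start`, `end`, `group`), the example coordinates, treating the gene body as part of the window, and reading "overlapping flanking regions" as "the Alu falls in the window of more than one gene".

```python
def flank(gene, size=20_000):
    """Gene interval extended by `size` bp on each side (clamped at 0)."""
    return max(0, gene["start"] - size), gene["end"] + size


def screen_alu(alus, genes, size=20_000):
    """Map Alu name -> 'HI' or 'LO' for Alu inside exactly one gene window.

    Alu that fall inside the windows of two or more genes are ambiguous
    (the "??-Alu" case) and are dropped, as are Alu near no gene.
    """
    kept = {}
    for alu in alus:
        hits = []
        for gene in genes:
            if gene["chrom"] != alu["chrom"]:
                continue
            lo, hi = flank(gene, size)
            if alu["start"] >= lo and alu["end"] <= hi:
                hits.append(gene)
        if len(hits) == 1:
            kept[alu["name"]] = hits[0]["group"]
    return kept


# Hypothetical coordinates, for illustration only.
genes = [
    {"name": "g1", "chrom": "chr1", "start": 100_000, "end": 110_000, "group": "HI"},
    {"name": "g2", "chrom": "chr1", "start": 140_000, "end": 150_000, "group": "LO"},
]
alus = [
    {"name": "a1", "chrom": "chr1", "start": 85_000, "end": 85_300},    # near g1 only
    {"name": "a2", "chrom": "chr1", "start": 125_000, "end": 125_300},  # in both windows
    {"name": "a3", "chrom": "chr1", "start": 160_000, "end": 160_300},  # near g2 only
    {"name": "a4", "chrom": "chr1", "start": 500_000, "end": 500_300},  # near no gene
]
kept = screen_alu(alus, genes)
print(kept)
```

Widening `size` makes more windows overlap, so more Alu are discarded; that is the trend the table reports across 10kb, 20kb and 50kb.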


Alignment Process…

  • First alignment tool: ClustalW

    • Slow, inaccurate

  • Second alignment tool: T-COFFEE

    • Can’t handle hundreds of sequences

  • Third alignment tool: MUSCLE

  • Aligning thousands of sequences = big gaps and processing limitations

  • Chose to analyze by subfamily (S, Sp/q)

    • Aligned elements around highly expressed genes

    • Aligned elements around poorly expressed genes

    • Profile high/low alignment

    • Consensus sequence alignment


  • Alignment viewed in Jalview


Alignments of AluSp/q and AluS Elements

[Figure: alignments of High Alu for the AluSp/q and AluS subfamilies, with a color scale running from high to low conservation]


Alu consensus sequence and frequency of consensus base

AluSp/q

High Alu:       TATCCACGCCTGCAAAATCTCAGCCACTCCCAAAGTTGCTGCG
Alu w/ a base:  *5547666896759699995769699999999999*9989979
All Alu:        0444762289674300448576809499545545409449808

Low Alu:        CANCC-CGCCT-CGTAATCCCAA--------AATGTT--TG-G
Alu w/ a base:  77488 66899 67444999995        455645  98 9
All Alu:        76044 55899 37444989894        454045  98 8

AluS

High Alu:       TGCTCAGAAATTTCTCGGCTCACTGCAACCTCCGTATCACCCC
Alu w/ a base:  596**65559458765699999978999999966566******
All Alu:        0860005458443600233333323333333345400000000

Low Alu:        CG---A-AA--------------------CTCCGT--T---CT
Alu w/ a base:  56   5 69                    555655  6   99
All Alu:        55   4 58                    444544  0   77
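
The two frequency rows can be computed per alignment column. The sketch below shows one way to do it, not the presenter's code. It assumes that "Alu w/ a base" means the denominator excludes gapped sequences, that "All Alu" counts every sequence, that each digit is the frequency in tenths, and that "*" marks a column where no sequence has a base. The slides do not define this encoding, and the toy alignment is hypothetical.

```python
from collections import Counter


def column_summary(alignment):
    """For each column return (consensus base, freq w/ a base, freq over all).

    `alignment` is a list of equal-length strings with '-' for gaps.
    The second value is None when every sequence has a gap in that column.
    """
    rows = []
    for column in zip(*alignment):
        bases = [c for c in column if c != "-"]
        if not bases:
            rows.append(("-", None, 0.0))
            continue
        base, count = Counter(bases).most_common(1)[0]
        rows.append((base, count / len(bases), count / len(column)))
    return rows


def digit(freq):
    """One-character code: '*' if undefined, else tenths capped at 9."""
    return "*" if freq is None else str(min(9, int(freq * 10)))


# Toy alignment, for illustration only.
toy = ["ACG-", "ACT-", "A-G-", "TCG-"]
summary = column_summary(toy)
consensus = "".join(base for base, _, _ in summary)
with_base = "".join(digit(f) for _, f, _ in summary)
all_alu = "".join(digit(f) for _, _, f in summary)
print("Consensus:     ", consensus)
print("Alu w/ a base: ", with_base)
print("All Alu:       ", all_alu)
```

A column scoring high in the High Alu rows and low (or gapped) in the Low Alu rows is the kind of position the Goal slide describes as a candidate motif.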


Remaining Tasks

Analyze the remaining subfamilies

Determine whether identified motifs agree across subfamilies

BLAST motifs against all Alu sequences and correlate alignment scores with expression level
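
The last task, correlating alignment scores with expression level, could be sketched as a rank correlation. The slide does not name a statistic; Spearman's rho is used here purely as an example, and the scores and expression values are hypothetical.

```python
def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical motif alignment scores and expression levels of nearby genes.
scores = [10, 20, 30, 40]
expression = [1, 5, 3, 9]
rho = spearman(scores, expression)
print(round(rho, 3))
```

A rank-based measure is less sensitive to the skewed distribution typical of expression data than a plain Pearson correlation on raw values.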


Future Directions

Cluster alignments into a relationship tree to see if HI and LO Alu groups cluster differently from each other

Create a matrix of pairwise alignments and cluster these into a tree using nearest neighbour clustering

Use Hidden Markov Models or Gibbs sampling to identify sequence motifs (non-multiple sequence alignment method of motif finding)
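
The pairwise-matrix and nearest-neighbour clustering idea can be illustrated with a small single-linkage routine. This is a sketch under assumptions: the distance is the fraction of mismatched columns between already-aligned sequences (a stand-in for a pairwise alignment score), and the sequence names and contents are hypothetical.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Fraction of columns where two equal-length aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs):
    """Nearest-neighbour (single-linkage) clustering into a nested-tuple tree.

    `seqs` maps name -> aligned sequence. At each step the two clusters
    whose closest members are nearest to each other are merged.
    """
    names = list(seqs)
    dist = {
        frozenset(pair): mismatch_distance(seqs[pair[0]], seqs[pair[1]])
        for pair in combinations(names, 2)
    }
    clusters = [(name, [name]) for name in names]  # (subtree, member names)
    while len(clusters) > 1:
        def linkage(pair):
            members_a = clusters[pair[0]][1]
            members_b = clusters[pair[1]][1]
            return min(dist[frozenset((p, q))]
                       for p in members_a for q in members_b)

        i, j = min(combinations(range(len(clusters)), 2), key=linkage)
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Hypothetical aligned sequences, for illustration only.
seqs = {
    "hi1": "ACGTACGT",
    "hi2": "ACGTACGA",
    "lo1": "TTGTCCGT",
    "lo2": "TTGTCCGA",
}
tree = single_linkage(seqs)
print(tree)
```

If HI and LO Alu carried distinct motifs, they would be expected to fall into separate branches, as the toy sequences do here.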


Acknowledgements

Danny Eller

York Marahrens

Marc Suchard

Chiara Sabatti

SoCalBSI

NIH/NSF

