David Goldberg

CS 1950 Directed Study

RNA Sequence

Exon: GATTACACATGCCGTAG

Down Intron: CCCACTCCATGATTACAC

Up Intron: CATGCCGTAGCTCATGCC

GCCACGTCTTTTGCTCTTTGCAGGATTACATCACTGGAAACTTTAGCCACGTAAACTTTA

Pattern 1:ACATCAC

Pattern 2:ACGT

Desired Upgrades

Current Program:

  • Command line arguments only (2 patterns)
  • Cannot use the ambiguity codes Y (C or T), R (A or G), or N (any base)
  • Only Checks Human RNA for patterns
  • Has static search length
  • Result file displays Human RNA id, mouse RNA id, and last 75 characters.
  • Only Searches Down Intron

New Program:

  • Better user interface
  • Ability to use Y, R, and N
  • Control over the distance between patterns and the length of the search
  • Easier to decipher result file
  • Checks Human and Mouse RNA
  • Can search up or down introns and exons.
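One way the Y, R, and N upgrade could work is to translate each ambiguity code into a regular-expression character class before searching. The sketch below is in Python for illustration only (the slides say the actual program is written in Perl). The names `IUPAC_CLASSES` and `pattern_to_regex` are hypothetical, and the mapping assumes sequences written with T, as on the slides.

```python
import re

# Assumed mapping for the three ambiguity codes named on the slide:
# R = purine (A or G), Y = pyrimidine (C or T), N = any base.
IUPAC_CLASSES = {"R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def pattern_to_regex(pattern: str) -> str:
    """Translate a pattern such as 'AYGN' into a regex string.

    Plain bases are escaped and passed through unchanged; only R, Y and N
    are expanded. This is a hypothetical helper, not the author's code.
    """
    return "".join(
        IUPAC_CLASSES.get(base, re.escape(base)) for base in pattern.upper()
    )


if __name__ == "__main__":
    regex = pattern_to_regex("AYGN")
    print(regex)  # A[CT]G[ACGT]
    # The exon from the RNA Sequence slide contains ATGC, which matches.
    print(bool(re.search(regex, "GATTACACATGCCGTAG")))  # True
```

Expanding the codes up front keeps the rest of the search unchanged, since the expanded pattern is an ordinary regular expression.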
Possible Problems
  • Programming in Perl
  • Extensive Use of Regular Expressions
  • Trouble determining exactly what needs to be done
  • Not yet clear whether everything we want can be done

Old Program:

  • Command line arguments only (2 patterns)
  • Cannot use Y, R, or N
  • Only Checks Human RNA for patterns
  • Has static search length
  • Result file displays Human RNA id, mouse RNA id, and last 75 characters.
  • Only Searches Down Intron

New Program:

  • Prompts user for inputs:
    • Path of database with default
    • 2 patterns
    • Minimum and Maximum distance between patterns
    • Whether to search from the 3’ splice site (beginning) or the 5’ splice site (end)
    • Length from beginning or end to search
    • Which part to search (down intron, exon, up intron)
  • Will find matches in either the Human RNA, Mouse RNA or both
  • Result file displays Human RNA id, Mouse RNA id, sequence searched, 1st pattern found, sequence in between 1st pattern and 2nd pattern, and 2nd pattern
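The inputs listed above can be combined into a single bounded search: cut a window of the requested length from the beginning or end of the sequence, then look for the 1st pattern, a gap within the minimum and maximum distance, and the 2nd pattern. The following is a minimal Python sketch under those assumptions, not the program from the slides (which is in Perl). `find_pair` and its parameters are hypothetical names, and taking the first match with the shortest allowed gap is an assumed choice.

```python
import re


def find_pair(sequence, pattern1, pattern2, min_gap, max_gap,
              search_len, from_end=True):
    """Search a window of `sequence` for pattern1, a gap, then pattern2.

    Returns (window, pattern1, gap, pattern2) or None when nothing matches.
    The gap is matched lazily, so the shortest allowed gap is preferred.
    """
    if from_end:
        # Last `search_len` characters (the whole sequence if it is shorter).
        window = sequence[max(0, len(sequence) - search_len):]
    else:
        window = sequence[:search_len]
    regex = re.compile(
        "(%s)(.{%d,%d}?)(%s)"
        % (re.escape(pattern1), min_gap, max_gap, re.escape(pattern2))
    )
    match = regex.search(window)
    if match is None:
        return None
    return window, match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    # Sequence taken from one row of the New Program Results File slide.
    print(find_pair("CCACGTCTTCTTCTTTTCAG", "ACG", "TT", 0, 10, 20))
    # ('CCACGTCTTCTTCTTTTCAG', 'ACG', 'TC', 'TT')
```

The returned tuple lines up with the fields the new result file shows: sequence searched, 1st pattern, sequence in between, and 2nd pattern.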

Old Program Results File

Pattern1=ACG Pattern2=TT

humanID mouseID

ENSG00000124721_61 ENSMUSG00000033826_64 CENA GTAAGTTTTTATTTTTATTTATATCTACGTAGAAAGAGTTCCTTATTTAAAGGTGCTTAGTTTGCCTTCTCTGAT

ENSG00000113569_8 ENSMUSG00000022142_8 CENA GTAAGTAGAAAACAATAAATTTGGCAAGTACAACTAATTTCTAACACATTGTTCCCTCAACGTTTTCTTCAGAAA

ENSG00000105323_14 ENSMUSG00000040725_13 CENA GTGAGAGAATGAGTGTGTGTTTGTATGTAGTGATCGCACGTGTGCTTTTGAACCTGAGCAAGTTAGGTGGAGGCG

...


New Program Results File

Pattern1=ACG Pattern2=TT Search=up SITE=3'

humanID

mouseID

ENSG00000134690_4 SE CTACAACGTTCTTTTTAAAG ACG TT

ENSMUSG00000028873_3 SE Not Found

ENSMUSG00000026954_6 CENA TTTTATTCATACGCTTACAG ACG C TT

ENSG00000115145_5 CENA Not Found

ENSG00000124721_67 CENA CCACGTCTTCTTCTTTTCAG ACG TC TT

ENSMUSG00000033826_70 CENA Not Found

ENSG00000052126_20 CENA ACGTTTTCTAATATTCCCAG ACG TT

ENSMUSG00000030231_11 CENA Not Found

ENSG00000138468_2 SE CACGTCTTTGGTTTTTGTAG ACG TC TT

ENSMUSG00000022591_2 SE TACGTCTTTCATTTTTGTAG ACG TC TT

ENSG00000151376_4 CENA ACGTGTTTTATTTCTTTTAG ACG TG TT

ENSMUSG00000030621_4 CENA Not Found

...

Exon, Intron Program
  • Wanted a program that searched the end of the down intron and beginning of the exon.
  • The first pattern would be in the intron.
  • The second pattern would be in the exon.
  • Exons usually start with GT. When the exon starts with GT, those two bases are skipped during pattern matching; when GT is absent, the program still tries to match the 2 patterns.
RNA Sequence

Exon: GATTACACATGCCGTAG

Down Intron: CCCACTCCATGATTACAC

Up Intron: CATGCCGTAGCTCATGCC

ACTCCATGATTACAC GATTACACATG

Pattern 1:GATT

Pattern 2:ACAT

Exon, Intron Program
  • Prompts user for inputs:
    • Path of database with default
    • 2 patterns
    • Minimum distance between patterns
  • Will find matches in either the Human RNA, Mouse RNA or both
  • Result file displays:
    • Human RNA id and Mouse RNA id
    • Small part of the down intron before the 1st pattern
    • 1st pattern found
    • Sequence between the 1st pattern and the end of the down intron
    • The GT sequence, if it was at the start of the exon
    • Beginning of the exon up to the 2nd pattern
    • 2nd pattern
    • Small part of the exon after the 2nd pattern
    • Length of the sequence between the 1st pattern and the end of the intron
    • Length of the sequence between the start of the exon and the 2nd pattern
RNA Sequence

Exon: GTATTACACATGCCGTA

Down Intron: CCCACTCCATGATTACAC

Up Intron: CATGCCGTAGCTCATGCC

ACTCCATGATTACAC GTATTACACATGCCGTA

Gap lengths: 4 (1st pattern to end of down intron), 5 (after the leading GT to 2nd pattern)

Pattern 1:GATT

Pattern 2:ACAT
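The junction search in this example can be sketched as follows: find the 1st pattern in the down intron, skip a leading GT in the exon if present, then find the 2nd pattern in the exon and report both gap lengths. This is an illustrative Python sketch, not the author's Perl program. `junction_search` is a hypothetical name, and using the occurrence of the 1st pattern closest to the intron end is an assumption.

```python
def junction_search(intron, exon, pattern1, pattern2):
    """Match pattern1 in the down intron and pattern2 in the exon.

    Returns a dict with the gap before the junction, the optional GT,
    the gap after the junction, and both gap lengths, or None when
    either pattern is missing.
    """
    # Assumption: take the occurrence closest to the end of the intron.
    start1 = intron.rfind(pattern1)
    if start1 < 0:
        return None
    intron_gap = intron[start1 + len(pattern1):]

    # A leading GT in the exon is set aside and not used for matching.
    gt = "GT" if exon.startswith("GT") else ""
    rest = exon[len(gt):]

    start2 = rest.find(pattern2)
    if start2 < 0:
        return None
    exon_gap = rest[:start2]

    return {
        "intron_gap": intron_gap,
        "gt": gt,
        "exon_gap": exon_gap,
        "intron_gap_len": len(intron_gap),
        "exon_gap_len": len(exon_gap),
    }


if __name__ == "__main__":
    result = junction_search("CCCACTCCATGATTACAC", "GTATTACACATGCCGTA",
                             "GATT", "ACAT")
    print(result["intron_gap_len"], result["exon_gap_len"])  # 4 5
```

With the sequences and patterns from this slide, the sketch gives gaps of 4 and 5, the same two numbers shown on the slide.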


Exon, Intron Program Results File

Pattern1=ACTG Pattern2=TTAC Max Space:15

humanID mouseID

ENSG00000163872_13 ENSMUSG00000041215_12 CENA TGTAACATCT ACTG TCAAG GT AACATTC TTAC TGCGTT 5 7

ENSG00000135390_3 ENSMUSG00000010371_3 CENA GGAGAT ACTG ACAGATGAG GT ACC TTAC AGTGGAGTTG 9 3

ENSG00000103876_8 ENSMUSG00000030630_8 CENA CTTATGAACG ACTG GAGTG GT AA TTAC TGGAGCTCTGC 5 2

ENSG00000156253_3 ENSMUSG00000041079_3 SE TGCCTGAAATT ACTG TCAG GT ACG TTAC AGAAGCTCTG 4 3

ENSG00000151490_18 ENSMUSG00000030223_18 CENA AGAAGAGGAA ACTG ACAAA GT AAGTTTTTC TTAC TATG 5 9

...
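A row of this results file can be split on whitespace into twelve fields, and the two trailing numbers can be checked against the lengths of the two gap sequences. The Python sketch below assumes every row has all twelve fields, including the GT column, which may not hold for rows where GT is absent. `JunctionRow`, `parse_row`, and the field names are hypothetical, and the meaning of the third column (CENA or SE) is not stated on the slides.

```python
from typing import NamedTuple


class JunctionRow(NamedTuple):
    human_id: str
    mouse_id: str
    tag: str            # third column (CENA or SE); meaning not stated
    intron_before: str  # part of the down intron before the 1st pattern
    pattern1: str
    intron_gap: str     # between the 1st pattern and the end of the intron
    gt: str
    exon_gap: str       # between the start of the exon and the 2nd pattern
    pattern2: str
    exon_after: str     # part of the exon after the 2nd pattern
    intron_gap_len: int
    exon_gap_len: int


def parse_row(line: str) -> JunctionRow:
    """Parse one whitespace-separated result row (assumes 12 fields)."""
    fields = line.split()
    if len(fields) != 12:
        raise ValueError("expected 12 fields, got %d" % len(fields))
    return JunctionRow(*fields[:10], int(fields[10]), int(fields[11]))


if __name__ == "__main__":
    row = parse_row(
        "ENSG00000163872_13 ENSMUSG00000041215_12 CENA TGTAACATCT ACTG "
        "TCAAG GT AACATTC TTAC TGCGTT 5 7"
    )
    # The reported lengths agree with the gap sequences in the row.
    print(len(row.intron_gap) == row.intron_gap_len)  # True
    print(len(row.exon_gap) == row.exon_gap_len)      # True
```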