
Biology is the science of reverse-engineering life


albertruiz

Presentation Transcript


  1. Biology is the science of reverse-engineering life
     • Living organisms are molecular machines capable of replicating themselves
     • The functional unit of life is the “cell”
        • 1-100 µm in diameter
        • contains a primary information store: the genome

  2. The Structure of a Genome
     • Generally one or several strands of a polymer, called DNA, packaged into “chromosomes”
     • Information is encoded as the order of the monomer sub-units (of types “A”, “C”, “G”, “T”) in the linear polymer
     • Each cell carries the entire genome of the organism

  3. The Nature of DNA
     • Linear, water-soluble, molecular data storage
     • The polymer is actually “double-stranded”: each strand is the “reverse-complement” of the other
     • The double strand is 2 nm in diameter
     • Each monomer unit (“base”) added lengthens the strand by 0.34 nm
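A minimal sketch of the two ideas on this slide, not taken from the original deck: computing a reverse-complement and converting a base count to physical length using the 0.34 nm per base figure. The function names `reverse_complement` and `length_in_nm` are hypothetical.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string: reverse the order, then swap A<->T and C<->G.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the characters
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base, preserving case
    return $rc;
}

# Contour length in nanometres, using the 0.34 nm rise per base from the slide.
sub length_in_nm {
    my ($n_bases) = @_;
    return $n_bases * 0.34;
}

print reverse_complement("GATTACA"), "\n";          # prints TGTAATC
printf "%.2f m\n", length_in_nm(3e9) * 1e-9;        # 3 Gb genome: prints 1.02 m
```

Under these figures, a 3-gigabase genome stretched end to end would be about a metre long while remaining only 2 nm wide.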

  4. DNA Storage Density
     • Genome length of an average bacterium is 2 megabases (Mb)
     • Human genome: 3 gigabases (Gb)
     • A typical DNA “prep” solution contains about 25 petabytes/ml (a ml is about 20 drops of a liquid)
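To relate bases to bytes, one can assume each base carries 2 bits, since there are four possible symbols. That assumption, and the helper name `genome_bytes`, are illustrative and not from the slides; the sketch counts one strand only, because the second strand is redundant.

```perl
use strict;
use warnings;

# Assumption: 4 possible bases => log2(4) = 2 bits of information per base.
my $BITS_PER_BASE = 2;

# Convert a genome length in bases to bytes (8 bits per byte).
sub genome_bytes {
    my ($n_bases) = @_;
    return $n_bases * $BITS_PER_BASE / 8;
}

printf "bacterium (2 Mb): %.1f MB\n", genome_bytes(2e6) / 1e6;   # 0.5 MB
printf "human (3 Gb): %.0f MB\n",     genome_bytes(3e9) / 1e6;   # 750 MB
```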

  5. Perl and Genomics
     • Good:
        • Perl is quick to write
        • Excellent for parsing
        • DWIM (“do what I mean”) is good for the typical biologist
     • Bad:
        • Not as fast running
     • Result: frequently middleware

  6. My Perl scripts
     • 167 in my /bin
     • Most are for either dealing with system stuff or parsing output from other programs
     • A few are meant to directly analyze “sequence data”

  7. Example Sequence Analysis Program
     • ssr3.pl
     • “ssr” is “simple sequence repeat”, aka “microsatellite”
     • E.g.:
       >Echinomicrosat_01_B04_T7
       XXXXXCAGAAGCGCTTCACAATTAAAAGCAAATCATACAAATATGATCAT
       CAGGCAGGCTATTTGAACACACTGTTTCGCACTGAACTCATAGTCACATT
       TCAGTCGTTCAGTGAGATGATTCATATGGCATAATTTGAACTGACGTTCG
       CTCTGACTATCGTTCAGCTCGTTGTGGGCACAATCGTTAGTCAGTTCGTT
       CACTCAACCACACACACACACACACACACGGAAACATCAGATTCGAGCTA
       AGCTCTTATTACAGCTGATCAGTAGGAGCACTGTTAGACAGTCTACTAAA
       TCAATATCAATTATCCCCCCCACACAACCATGGCTTCTGXXXXX
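The record above is in FASTA format: a `>` header line followed by sequence lines. The slides do not show how ssr3.pl reads its input, so the following is only a sketch of one common way to load such a file into a hash. The name `parse_fasta` is hypothetical; `%sequences` mirrors the hash name used in the core routine.

```perl
use strict;
use warnings;

# Parse FASTA text into a hash of name => sequence (line breaks removed).
sub parse_fasta {
    my ($text) = @_;
    my (%sequences, $name);
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;                  # trim trailing whitespace / CR
        next if $line eq '';
        if ($line =~ /^>(\S+)/) {           # header line: name is the first word
            $name = $1;
            $sequences{$name} = '';
        }
        elsif (defined $name) {
            $sequences{$name} .= $line;     # a sequence may span many lines
        }
    }
    return %sequences;
}

my %sequences = parse_fasta(">demo_read\nXXACGT\nACGTXX\n");
print "$_: ", length($sequences{$_}), " bases\n" for sort keys %sequences;
# prints "demo_read: 12 bases"
```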

  8. Example run of ssr3.pl
       % ssr3.pl Echinomicrosat_01_B04_T7.fasta
       Name                      Seq Len  Range    # of repetitions of sub unit  Sub unit
       Echinomicrosat_01_B04_T7  344      209-228  10 of repeat "CA"
       -----------------------------
       >Echinomicrosat_01_B04_T7
       XXXXXCAGAAGCGCTTCACAATTAAAAGCAAATCATACAAATATGATCAT
       CAGGCAGGCTATTTGAACACACTGTTTCGCACTGAACTCATAGTCACATT
       TCAGTCGTTCAGTGAGATGATTCATATGGCATAATTTGAACTGACGTTCG
       CTCTGACTATCGTTCAGCTCGTTGTGGGCACAATCGTTAGTCAGTTCGTT
       CACTCAACCACACACACACACACACACACGGAAACATCAGATTCGAGCTA
       AGCTCTTATTACAGCTGATCAGTAGGAGCACTGTTAGACAGTCTACTAAA
       TCAATATCAATTATCCCCCCCACACAACCATGGCTTCTGXXXXX

  9. ssr3.pl core routine:
       while ( $sequences{$x} =~ m/
               #Capture each ssr sub-unit within tolerance
               #Note "?" for lazy capture. Ensures "AC" is
               #the repeat unit instead of "ACAC" for example
               ([ACGT]{$min_repeat_unit_len,$max_repeat_unit_len}?)
               \1{$min_repeat_num,}
           /gix ) {
           my $repeat_unit  = $1;
           my $start_of_ssr = $-[0]+1;    # 1-based start of the whole match
           my $end_of_ssr   = $+[0];      # 1-based end of the whole match
           my $ssr          = $&;
           # Despite the name, this is the number of repeat units in the match
           my $ssr_length   = length($ssr)/length($repeat_unit);
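The slide cuts off before the loop closes, so here is a self-contained sketch of the same lazy-capture-plus-backreference technique. The threshold values and the `find_ssrs` wrapper are assumptions for illustration; the slides do not state which values ssr3.pl actually uses.

```perl
use strict;
use warnings;

# Hypothetical thresholds; the slides do not give the real ssr3.pl settings.
my $min_repeat_unit_len = 2;
my $max_repeat_unit_len = 6;
my $min_repeat_num      = 4;   # additional copies required after the first unit

# Return a list of hashrefs describing each simple sequence repeat found.
sub find_ssrs {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ m/
            ([ACGT]{$min_repeat_unit_len,$max_repeat_unit_len}?)  # lazy: shortest unit first
            \1{$min_repeat_num,}                                  # then further copies of it
        /gix) {
        my $unit = $1;
        push @hits, {
            unit    => $unit,
            start   => $-[0] + 1,                        # 1-based start of match
            end     => $+[0],                            # 1-based end of match
            repeats => ($+[0] - $-[0]) / length($unit),  # total copies of the unit
        };
    }
    return @hits;
}

printf "%s x%d at %d-%d\n", @{$_}{qw(unit repeats start end)}
    for find_ssrs("TTCACACACACACAGG");
# prints "CA x6 at 3-14"
```

Because the capture group is lazy, the engine tries the shortest candidate unit first, so a run of "CACACA..." is reported with unit "CA" rather than "CACA". Characters outside A, C, G, T (such as the masked "X" bases in the example read) can never be part of a match.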
