Code-Level Parameter Estimation

Code-Level Parameter Estimation: The Dryest Presentation Ever. Bob Zimmermann, 7 September 2005. What is it that we’re doing here again? An object-oriented, extensible parameter estimator with minimized redundant code.


Presentation Transcript


  1. Code-Level Parameter Estimation: The Dryest Presentation Ever
     Bob Zimmermann, 7 September 2005

  2. What is it that We’re Doing Here Again?
     [Figure labels: Annotations, Sequences, Parameters!]
     • An object-oriented, extensible parameter estimator
     • A parameter estimator with minimized redundant code
     • A usable parameter estimator

  3. Overview
     Parameter estimation has 5 main phases:
     • Instantiation
       • Read in config files, initialize gHMM
     • Annotation
       • Convert annotations to state sequences
       • Segment the annotations
     • Regioning
       • Convert annotations to regions
     • Counting
       • Count the models
     • Estimation

  4. Instantiation: What the User Sees
     • 3 levels of configuration
       • Instance file: command-line options describing the sequences and annotations to be estimated
       • gHMM file: HMM description, model description, null region description
       • Feature Map file: describes the conversion required to get from annotations to state sequences
     • The user only inputs an instance file

  5. Instantiation: UML

  6. Annotation: Steps
     • Annotations are read in one by one, possibly by chromosome
     • Any number of sequences can be associated with each annotation
     • Annotations are converted into features
     • Null regions are applied to the appropriate features
     • Segmentation
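
The "annotations are converted into features" step can be pictured with a small sketch. This is not the estimator's actual code: the function name `exons_to_states`, the `[start, end]` input format, and the `Esngl` label for single-exon genes are all hypothetical, and the frame suffix seen in state names such as Einit0 is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the sorted, non-overlapping exons of one
# transcript into labelled features. Each exon is [start, end], inclusive.
sub exons_to_states {
    my (@exons) = @_;
    my @states;
    for my $i (0 .. $#exons) {
        my ($start, $end) = @{ $exons[$i] };
        my $label = @exons == 1   ? 'Esngl'   # assumed name for a single-exon gene
                  : $i == 0       ? 'Einit'
                  : $i == $#exons ? 'Eterm'
                  :                 'Exon';
        push @states, [$label, $start, $end];
        # The gap before the next exon becomes an intron feature.
        push @states, ['Intron', $end + 1, $exons[$i + 1][0] - 1]
            if $i < $#exons;
    }
    return @states;
}

my @states = exons_to_states([100, 199], [300, 349], [500, 620]);
print join("\t", @$_), "\n" for @states;
```

In the real tool this mapping is presumably driven by the Feature Map file rather than hard-coded labels.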

  7. Annotation: A Review of Layering and Segmentation
     [Figure: state labels Einit0, Intron0, Exon0, shown twice]

  8. Annotation: UML

  9. Regioning: Segmentation and Counting
     [Figure: Einit0 and Eterm0 features with Acceptor and Stop regions; callouts “Parent Region” and “Context”]

  10. Regioning: UML

  11. Regioning: Simplified
     • A region includes the sequence to count
     • A region defines exactly where a model should be counted
     • The accessor needs no knowledge of strand: regions are reverse complemented on instantiation
     • Simply count from region start to region end on the provided string
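
A minimal sketch of the idea that strand is handled once, at instantiation. The `Region` class below is illustrative only: the constructor arguments (`seq_ref`, `start`, `end`, `strand`), the 0-based inclusive coordinates, and the `string` helper are assumptions, while the accessor names `start`, `end` and `strRef` follow the code on slide 17.

```perl
use strict;
use warnings;

package Region;

# Assumed constructor: seq_ref points at the forward-strand sequence,
# start/end are 0-based inclusive, strand is '+' or '-'.
sub new {
    my ($class, %args) = @_;
    my ($seq_ref, $start, $end) = @args{qw(seq_ref start end)};
    my $strand = $args{strand} // '+';

    if ($strand eq '-') {
        # Reverse complement once, then remap the coordinates so that
        # callers can always count left to right.
        my $rc = reverse $$seq_ref;
        $rc =~ tr/ACGTacgt/TGCAtgca/;
        my $len = length $rc;
        ($start, $end) = ($len - 1 - $end, $len - 1 - $start);
        $seq_ref = \$rc;
    }
    return bless { str_ref => $seq_ref, start => $start, end => $end }, $class;
}

sub strRef { $_[0]{str_ref} }
sub start  { $_[0]{start} }
sub end    { $_[0]{end} }

# Return the counted portion of the region as a plain string.
sub string {
    my ($self) = @_;
    return substr ${ $self->{str_ref} }, $self->{start},
        $self->{end} - $self->{start} + 1;
}

package main;

my $seq     = 'AACCGT';
my $forward = Region->new(seq_ref => \$seq, start => 1, end => 3, strand => '+');
my $reverse = Region->new(seq_ref => \$seq, start => 1, end => 3, strand => '-');

print $forward->string, "\n";   # ACC
print $reverse->string, "\n";   # GGT
```

Copying the whole reverse-complemented sequence per region is wasteful; a real implementation would more likely share one reverse-complemented copy per sequence. The point here is only that counting code never branches on strand.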

  12. Estimation: General Idea
     • Smoothing
       • Each model is given a smoother
     • Normalization
     • Scoring
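
The three estimation steps can be sketched on a toy set of counts. The slides do not say which smoother or scoring function is used, so this example assumes add-one (pseudocount) smoothing and a log2 odds score against a null model; the function names and the counts are made up for illustration.

```perl
use strict;
use warnings;

# Smoothing (assumed: add a pseudocount to every symbol in the alphabet).
sub smooth {
    my ($counts, $alphabet, $pseudo) = @_;
    my %smoothed = map { ($_ => ($counts->{$_} // 0) + $pseudo) } @$alphabet;
    return \%smoothed;
}

# Normalization: turn counts into probabilities that sum to 1.
sub normalize {
    my ($counts) = @_;
    my $total = 0;
    $total += $_ for values %$counts;
    my %prob = map { ($_ => $counts->{$_} / $total) } keys %$counts;
    return \%prob;
}

# Scoring (assumed: log2 odds of the model against the null model).
sub score {
    my ($prob, $null_prob) = @_;
    my %score = map { ($_ => log($prob->{$_} / $null_prob->{$_}) / log(2)) }
        keys %$prob;
    return \%score;
}

my @alphabet = qw(A C G T);
my %pos  = (A => 6, C => 1, G => 1);            # T was never observed
my %null = (A => 2, C => 2, G => 2, T => 2);

my $p = normalize(smooth(\%pos,  \@alphabet, 1));
my $q = normalize(smooth(\%null, \@alphabet, 1));
my $s = score($p, $q);

printf "%s  p=%.3f  score=%+.3f\n", $_, $p->{$_}, $s->{$_} for @alphabet;
```

Without the smoothing step, the unobserved T would get probability zero and its log score would be undefined, which is the usual reason for smoothing before normalizing.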

  13. Estimation: UML

  14. Smoother
     • smoothAref(), smoothHref() - smooth the counts for the given parameters

  15. Duration
     • countFeature() - count the feature duration in the model
     • smooth() - smooth the counts of the distribution using your smoothers
     • normalize(), score() - convert your counts to scores

  16. Emission
     • init() - initialize internal variables
     • clear() - zero out all parameters
     • countRegion(), countNullRegion() - count a region
     • smooth() - use your smoothers to smooth the data
     • normalize(), score() - convert parameters to probabilities or scores
     • outputPrepare() - set the parameter string

  17. Putting it All Together

          # Count every (order+1)-mer of a region into per-position buckets.
          sub _countString {
              my ($this, $region, $null) = @_;

              # Null regions and positive regions accumulate separately.
              my $buck = $null ? $this->nullCounts : $this->posCounts;

              my $start   = $region->start;
              my $length  = $region->end - $region->start + 1;
              my $weight  = $region->weight;
              my $context = $region->context;
              my $order   = $this->order;
              my $strRef  = $region->strRef;

              for my $pos (0 .. $length - 1) {
                  # The n-mer is the base at this position plus the $order
                  # bases before it, so $start must be at least $order.
                  my $nmer = substr($$strRef, $start + $pos - $order, $order + 1);
                  $buck->[$pos + $context]->{$nmer} += $weight;
              }
          }
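
To see what this routine accumulates, here is a self-contained sketch that runs the same loop against stand-in objects. `MockRegion` and `MockModel` are hypothetical stubs exposing only the accessors the routine calls; the real region and model classes are not shown in the slides.

```perl
use strict;
use warnings;

package MockRegion;   # hypothetical stand-in for a region object
sub new     { my ($class, %a) = @_; return bless {%a}, $class }
sub start   { $_[0]{start} }
sub end     { $_[0]{end} }
sub weight  { $_[0]{weight} }
sub context { $_[0]{context} }
sub strRef  { $_[0]{strRef} }

package MockModel;    # hypothetical stand-in for an emission model
sub new {
    my ($class, %a) = @_;
    return bless { order => $a{order}, pos => [], null => [] }, $class;
}
sub order      { $_[0]{order} }
sub posCounts  { $_[0]{pos} }
sub nullCounts { $_[0]{null} }

# Same counting logic as on the slide.
sub _countString {
    my ($this, $region, $null) = @_;
    my $buck   = $null ? $this->nullCounts : $this->posCounts;
    my $start  = $region->start;
    my $length = $region->end - $region->start + 1;
    my $order  = $this->order;
    my $strRef = $region->strRef;
    for my $pos (0 .. $length - 1) {
        my $nmer = substr($$strRef, $start + $pos - $order, $order + 1);
        $buck->[$pos + $region->context]{$nmer} += $region->weight;
    }
}

package main;

my $seq    = 'ACGTAC';
my $model  = MockModel->new(order => 1);
my $region = MockRegion->new(
    start => 1, end => 3, weight => 2, context => 0, strRef => \$seq,
);
$model->_countString($region, 0);

# Print one line per position bucket, e.g. "0: AC=2".
for my $pos (0 .. $#{ $model->posCounts }) {
    my $bucket = $model->posCounts->[$pos];
    print "$pos: ",
        join(', ', map { $_ . '=' . $bucket->{$_} } sort keys %$bucket), "\n";
}
```

With an order of 1, each position in the region contributes one dinucleotide, weighted by the region's weight. Note that Perl's `substr` treats a negative offset as counting from the end of the string, so a region starting closer than `$order` bases to the sequence start would silently pick up the wrong n-mer.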

  18. Performance
     • Runs in about 1-2 hours on the whole genome
     • Uses < 2 GB of memory (keeps the entire sequence in memory)
     • Further optimizations can be applied

  19. Prognosis
     • Running tests now with Randy
     • Releasing a testing version to another lab
     • Lower-level testing inside the lab
     • Available on CPAN by the end of the year

  20. Next: Predicting skipped exons!
