
More on iParameterEstimation


  1. More on iParameterEstimation
     Bob Zimmermann, 27 April 2005

  2. iParameterEstimation Data Flow
     [Diagram. Labeled components: Sequence (ACTATTACGTATTAGGATCCGAATGAGGATTA…), Annotation, Parameter Template, Dispatcher, Feature Mapping, State Annotation, Backend, gHMM, and Model1, Model2, Model3, Model4; example features: TTAA, AAGG, CCTT, GTATT, TGCA, GCTC, TCCA.]

  3. What Kind of Model?
     • Existing Model: Learn XML (not hard)
         • (SDT, LUT, WMM, WAM, CDS, ISO)
     • New Model: Learn OO Perl (not too hard)
         • Inherit from the iPE::Model class and count
     • AltSplice Model: Invent an algorithm to locate instances (kinda hard)

  4. Overview on Adding Models
     • Update DTD
     • Update XML*
     • Update gHMM.pm
     • Add a file YourModel.pm to iPE/Models
     • Count and test!
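A new model type has to be known in more than one place. It must appear in the DTD's `model` enumeration, and the code that turns a `model` attribute into an object must also recognize it. The Python sketch below illustrates that second step with a hypothetical name-to-class registry standing in for gHMM.pm. `MODEL_REGISTRY`, `register`, `build_model`, and the class names are assumptions for illustration, not iPE's actual API (iPE itself is written in Perl).

```python
# Hypothetical registry standing in for the model lookup in gHMM.pm.
MODEL_REGISTRY = {}


def register(name):
    """Class decorator: record a model class under its XML `model` name."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


class Model:
    """Stand-in for the iPE::Model base class."""
    def __init__(self, attrs):
        self.attrs = dict(attrs)


@register("WMM")
class WMM(Model):
    pass


@register("MYMODEL")  # hypothetical new model type
class MyModel(Model):
    pass


def build_model(attrs):
    """Instantiate the class named by the element's `model` attribute."""
    kind = attrs["model"]
    if kind not in MODEL_REGISTRY:
        raise ValueError(f"unknown model type {kind!r}; register it first")
    return MODEL_REGISTRY[kind](attrs)


m = build_model({"name": "Stop", "model": "MYMODEL"})
print(type(m).__name__)  # MyModel
```

In this sketch, a model name that the XML accepts but the registry lacks fails at lookup time. This is one reason both the DTD and the code need updating together.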

  5. DTD
     • XML files are (sometimes) linked to a DTD: a Document Type Definition
     • Allows us to do some simple error checking before the data are processed
     • Is a human- and computer-readable definition of the expected pattern of the data
     • DTD : XML :: Class Definition : Object (a DTD describes an XML document the way a class definition describes its objects)

  6. XML Hierarchy
     • XML is a hierarchical data description language
     • Example:
         param_template
           …
           sequence_models
             …
             string_model “Stop”
               string_submodel “TAG”
               string_submodel “TGA”
               string_submodel “TAA”
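Because XML is hierarchical, a parser returns a tree that can be walked recursively. This Python sketch uses the standard library's `xml.etree.ElementTree` on a small fragment shaped like the example above. The `…` parts are omitted and only the `name` attributes are kept. The sketch prints one line per element, indented by depth. The `outline` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment following the param_template > sequence_models >
# string_model > string_submodel nesting shown on the slide.
DOC = """
<param_template>
  <sequence_models>
    <string_model name="Stop">
      <string_submodel name="TAG"/>
      <string_submodel name="TGA"/>
      <string_submodel name="TAA"/>
    </string_model>
  </sequence_models>
</param_template>
"""


def outline(elem, depth=0):
    """Return one indented line per element, visiting children depth-first."""
    label = elem.tag
    if "name" in elem.attrib:
        label += f' "{elem.attrib["name"]}"'
    lines = ["  " * depth + label]
    for child in elem:  # iterates child elements only, not whitespace text
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(DOC)
print("\n".join(outline(root)))
```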

  7. ELEMENTs, ATTLISTs
     • The generic hierarchy is described in the DTD:
         <!ELEMENT param_template (author, date)>
         <!ELEMENT author (#PCDATA)>
         <!ELEMENT date (#PCDATA)>
     • This might describe the following:
         <?xml version="1.0"?>
         <!DOCTYPE param_template SYSTEM "param_template.dtd">
         <param_template>
           <author>Bob Zimmermann</author>
           <date>4/26/05</date>
         </param_template>
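Python's standard library does not validate documents against a DTD. The check that the DTD above expresses is still easy to sketch by hand: `param_template` must contain exactly an `author` followed by a `date`, and each must hold only text (`#PCDATA`). The function below is a hypothetical stand-in for a real validating parser, such as lxml's `DTD` class. It is not part of iPE.

```python
import xml.etree.ElementTree as ET


def check_param_template(xml_text):
    """Return a list of problems; an empty list means the document fits
    <!ELEMENT param_template (author, date)> with #PCDATA children."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "param_template":
        problems.append(f"root is <{root.tag}>, expected <param_template>")
    children = [child.tag for child in root]
    if children != ["author", "date"]:
        problems.append(f"children {children}, expected ['author', 'date']")
    for child in root:
        if len(child):  # #PCDATA means text only, no nested elements
            problems.append(f"<{child.tag}> must contain only text")
    return problems


GOOD = """<?xml version="1.0"?>
<!DOCTYPE param_template SYSTEM "param_template.dtd">
<param_template>
  <author>Bob Zimmermann</author>
  <date>4/26/05</date>
</param_template>"""

BAD = "<param_template><date>4/26/05</date></param_template>"

print(check_param_template(GOOD))  # []
print(check_param_template(BAD))   # one problem: author is missing
```

The external `param_template.dtd` is not fetched here. The parser only records the `DOCTYPE` line, so the rules live in the Python code instead.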


  9. ELEMENTs, ATTLISTs, cont’d
         <!ELEMENT param_template (author, date, states, init_model, trans_model,
                                   state_durations, sequence_models, conservation_models)>
         …
         <!ELEMENT sequence_models (string_model|fixed_string_model)+>
         <!ELEMENT string_model (string_submodel|fixed_string_submodel)*>
         <!ELEMENT string_submodel (string_submodel|fixed_string_submodel)*>
         <!ELEMENT fixed_string_submodel (#PCDATA)>
         <!ELEMENT fixed_string_model (#PCDATA)>
         …
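A DTD content model is essentially a regular expression over child element names. In it, `,` means sequence, `|` means choice, and `*` and `+` mean repetition. The sketch below makes that concrete for two of the declarations above. It joins child tag names with trailing spaces and matches them with Python's `re.fullmatch`. The helper names are hypothetical, and the hand translation covers only these two patterns.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models; each child name is followed by one space.
CONTENT_MODELS = {
    # <!ELEMENT sequence_models (string_model|fixed_string_model)+>
    "sequence_models": r"(?:(?:string_model|fixed_string_model) )+",
    # <!ELEMENT string_model (string_submodel|fixed_string_submodel)*>
    "string_model": r"(?:(?:string_submodel|fixed_string_submodel) )*",
}


def children_match(elem):
    """True if elem's sequence of child tags satisfies its content model."""
    pattern = CONTENT_MODELS[elem.tag]
    names = "".join(child.tag + " " for child in elem)
    return re.fullmatch(pattern, names) is not None


ok = ET.fromstring(
    "<sequence_models><string_model/><fixed_string_model/></sequence_models>")
empty = ET.fromstring("<sequence_models/>")
print(children_match(ok), children_match(empty))  # True False
```

The `+` on `sequence_models` is what rejects the empty element. An empty `string_model` would pass, because its model uses `*`.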

  10. ELEMENTs, ATTLISTs, cont’d
     • Elements have zero or more attributes, plus their content (child elements or character data):
         <!ATTLIST sequence_models >
         <!ATTLIST conservation_models >
         <!ATTLIST string_model
             name      CDATA #REQUIRED
             source    CDATA #REQUIRED
             states    CDATA #REQUIRED
             focus     CDATA #REQUIRED
             length    CDATA #REQUIRED
             begin     CDATA #REQUIRED
             end       CDATA #REQUIRED
             data      CDATA ""
             model     (SDT|WAM|WMM|WWAM|LUT|CDS|MIX|ISO|SIG) #REQUIRED
             submodels CDATA #REQUIRED>
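The `ATTLIST` for `string_model` applies three kinds of rules. Most attributes are `#REQUIRED`, `data` defaults to `""`, and `model` is restricted to an enumerated list. A hand-written Python check of those three rules might look like the following. It is an illustrative sketch with hypothetical names, not how iPE or any DTD validator is implemented.

```python
import xml.etree.ElementTree as ET

REQUIRED = ["name", "source", "states", "focus", "length",
            "begin", "end", "model", "submodels"]
MODEL_TYPES = {"SDT", "WAM", "WMM", "WWAM", "LUT", "CDS", "MIX", "ISO", "SIG"}
DEFAULTS = {"data": ""}


def check_string_model(elem):
    """Apply the string_model ATTLIST rules; return (attrs, errors)."""
    attrs = {**DEFAULTS, **elem.attrib}  # declared defaults fill missing values
    errors = [f"missing #REQUIRED attribute {a!r}"
              for a in REQUIRED if a not in attrs]
    if "model" in attrs and attrs["model"] not in MODEL_TYPES:
        errors.append(f"model={attrs['model']!r} is not in the enumeration")
    return attrs, errors


elem = ET.fromstring('<string_model name="Stop" model="XYZ" source="DNA"/>')
attrs, errors = check_string_model(elem)
print(attrs["data"] == "", len(errors))  # True 7
```

The seven errors are six missing required attributes (`states`, `focus`, `length`, `begin`, `end`, `submodels`) plus the out-of-enumeration `model` value.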

  11. Bringing it all together
         <string_model name="Start" model="SDT" source="DNA"
             states="Einit0 Einit1 Einit2 Einit- Esngl"
             begin="-6" end="3" focus="6" length="12" submodels="2">
           <string_submodel name="ATG" model="WMM" submodels="0" />
           <fixed_string_submodel name="NNN" model="WMM"> . . . . </fixed_string_submodel>
         </string_model>
         <string_model name="Stop" model="SDT" source="DNA"
             states="Eterm Eterm0- Eterm1- Eterm2- Esngl"
             begin="L" end="L+12" focus="3" length="12" submodels="2">
           <string_submodel name="TAA" model="WMM" submodels="0" />
           <string_submodel name="TAG" model="WMM" submodels="0" />
           <string_submodel name="TGA" model="WMM" submodels="0" />
           <fixed_string_submodel name="NNN" model="WMM"> . . . . </fixed_string_submodel>
         </string_model>
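Once a parameter template like the one above is parsed, its attributes become the inputs to model construction. The Python sketch below reads the `Stop` entry and splits the space-separated `states` list. The entry is trimmed to its attributes and `string_submodel` children, and the elided `fixed_string_submodel` body is left out. The `summarize` helper and its choice of fields are hypothetical, since the slide does not show how iPE consumes these attributes.

```python
import xml.etree.ElementTree as ET

STOP = """
<string_model name="Stop" model="SDT" source="DNA"
    states="Eterm Eterm0- Eterm1- Eterm2- Esngl" begin="L" end="L+12"
    focus="3" length="12" submodels="2">
  <string_submodel name="TAA" model="WMM" submodels="0" />
  <string_submodel name="TAG" model="WMM" submodels="0" />
  <string_submodel name="TGA" model="WMM" submodels="0" />
</string_model>
"""


def summarize(elem):
    """Collect fields a model constructor might need from a string_model."""
    return {
        "name": elem.get("name"),
        "model": elem.get("model"),
        "states": elem.get("states").split(),  # space-separated state names
        "length": int(elem.get("length")),
        "focus": int(elem.get("focus")),
        "submodels": [sub.get("name") for sub in elem.findall("string_submodel")],
    }


info = summarize(ET.fromstring(STOP))
print(info["states"])     # ['Eterm', 'Eterm0-', 'Eterm1-', 'Eterm2-', 'Esngl']
print(info["submodels"])  # ['TAA', 'TAG', 'TGA']
```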

  12. OO Perl
     • Objects: hash references ($object->{member})
     • Classes: packages
     • Methods: subs ($object->method($arg);)
     • Inheritance: @ISA, use base ("");

  13. iPE Object Hierarchy, Revisited
     [Class diagram. Labeled classes: Estimator, AnnotatedSequence, Model, Locus, Duration, Emission, Transition, Initial, Explicit, WMM, Geometric, LUT, …]

  14. Extending the Model Base Class
     • The base class is a container for an array of scalar values representing the parameters
     • Update iPE/gHMM.pm
     • Add a new .pm file to the iPE/Models directory
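The base class described above is a container for an array of scalar parameter values. Below is a minimal Python analogue of that idea. iPE's real base class is the Perl `iPE::Model`; the class and method names here are hypothetical. The sketch keeps the parameters in a flat list and gives subclasses shared zeroing, counting, and tab-separated output.

```python
class Model:
    """Hypothetical stand-in for a model base class: a flat array of
    scalar parameters plus helpers that every subclass can reuse."""

    def __init__(self, n_params, pseudocount=0.0):
        # Start every parameter at the pseudocount ("zeroing out").
        self.params = [float(pseudocount)] * n_params

    def add(self, index, weight=1.0):
        """Add a (possibly weighted) count to one parameter."""
        self.params[index] += weight

    def to_tsv(self):
        """Tab-separated, human-readable dump of the current values."""
        return "\t".join(f"{p:g}" for p in self.params)


class BaseComposition(Model):
    """Example subclass: one parameter per nucleotide."""
    ALPHABET = "ACGT"

    def __init__(self):
        super().__init__(len(self.ALPHABET), pseudocount=1.0)

    def count(self, seq, weight=1.0):
        for base in seq:
            if base in self.ALPHABET:  # skip N and other symbols
                self.add(self.ALPHABET.index(base), weight)


m = BaseComposition()
m.count("ACGTAAN")
print(m.to_tsv())  # 4, 2, 2, 2 separated by tabs
```

A subclass only decides what each array slot means and how to count into it. Storage and output stay in the base class.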

  15. What You Will Be Responsible For
     • Construction
     • Zeroing out (pseudocounting) your parameters and null parameters
     • Counting positives and nulls
         • Apply a weighted count to every base you see
     • Normalizing and calculating the log-odds
     • Outputting a Zoe header
         • Other output formats will have auto-generated headers
     • Outputting your parameters
         • Whatever state you are in (counted, normalized, or log-odds), print the parameters tab-separated and human-readable
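The responsibilities above form a lifecycle. A model is pseudocounted, counts positives and nulls with weights, normalizes, takes log-odds, and can be printed at any stage. Below is a compact Python sketch for a hypothetical single-position base-frequency model. It is illustrative only: the class name, the use of log base 2, and the output layout are assumptions. It also does not write a Zoe header, because the slides do not specify that format.

```python
import math


class BaseFreqModel:
    """Hypothetical single-position model showing the lifecycle:
    pseudocount -> weighted counts -> normalize -> log-odds -> output."""

    ALPHABET = "ACGT"

    def __init__(self, pseudocount=1.0):
        # Zero out (pseudocount) both the positive and the null parameters.
        self.pos = {b: pseudocount for b in self.ALPHABET}
        self.null = {b: pseudocount for b in self.ALPHABET}
        self.state = "counted"

    def count(self, seq, weight=1.0, null=False):
        """Apply a weighted count to every base seen (non-ACGT is skipped)."""
        table = self.null if null else self.pos
        for base in seq:
            if base in table:
                table[base] += weight

    def normalize(self):
        """Turn both count tables into probability distributions."""
        for table in (self.pos, self.null):
            total = sum(table.values())
            for b in table:
                table[b] /= total
        self.state = "normalized"

    def logize(self):
        """Replace positive probabilities with log2(P_pos / P_null)."""
        if self.state != "normalized":
            raise RuntimeError("normalize() before logize()")
        self.pos = {b: math.log2(self.pos[b] / self.null[b]) for b in self.ALPHABET}
        self.state = "log-odds"

    def output(self):
        """Tab-separated, human-readable dump; valid in any state."""
        return "\n".join(f"{b}\t{self.pos[b]:.3f}" for b in self.ALPHABET)


m = BaseFreqModel()
m.count("AAAC")             # positive examples
m.count("ACGT", null=True)  # null (background) examples
m.normalize()
m.logize()
print(m.output())  # A 1.000, C 0.000, G -1.000, T -1.000 (tab-separated)
```

Weighting lets a training example count fractionally, for instance by a confidence score. With `weight=1.0` it reduces to plain counting.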

  16. Future Topics
     • Feature-level Parallelization
     • Cluster Parallelization
     • EM (Baum-Welch) Estimator
