
Fast and Accurate MRFs through Evidence-Specific Structures 


Presentation Transcript


  1. Fast and Accurate MRFs through Evidence-Specific Structures Veselin Stoyanov and Jason Eisner HLT/COE and CLSP, Johns Hopkins University

  2. Why Evidence-Specific MRFs? • Data may be missing heterogeneously: in relational settings, for example, which variables are observed varies from instance to instance. • We want to learn evidence-specific structures in order to: • Improve accuracy (under approximate inference) • Speed up inference (while keeping accuracy high)

  3. Markov Random Fields [Figure: an example MRF, drawn as a graph over variables A–F.]

  4. Heterogeneous Evidence [Figure: the same MRF with one pattern of variables observed.]

  5. Heterogeneous Evidence [Figure: the same MRF with a different pattern of variables observed.]

  6. Evidence-Specific Structures [Figure: the MRF with some factors dropped, given the observed evidence.]

  7. Evidence-Specific Structures through Gates • Gates [Minka and Winn, 2008] [Figure: the example MRF over A–F with a gate G1–G7 attached to each factor.]

  8. Evidence-Specific Structures through Gates • The gates are conditioned on the inputs. [Figure: the same gated MRF; gate values are determined by the observed evidence.]

  9. Evidence-Specific Structures through Gates • Gates [Minka and Winn, 2008] • A formalism for expressing contextual independence • In a vanilla MRF: p(y) ∝ ∏_α ψ_α(y_α) • Including a gate g_α ∈ {0,1} to control each factor ψ_α: p(y) ∝ ∏_α ψ_α(y_α)^(g_α) • A soft version allows g_α ∈ [0,1]
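As a concrete illustration of the gated product above, here is a minimal sketch of our own (not the authors' code; the toy factor table and all names are assumptions):

    import numpy as np

    def unnormalized_score(assignment, factors, gates):
        # p(y) is proportional to prod_a psi_a(y_a) ** g_a: g_a = 1 keeps the
        # factor, g_a = 0 removes it, and 0 < g_a < 1 softly damps it.
        score = 1.0
        for (vars_a, psi_a), g_a in zip(factors, gates):
            score *= psi_a[tuple(assignment[v] for v in vars_a)] ** g_a
        return score

    # One pairwise factor between binary variables 0 and 1 favoring agreement.
    factors = [((0, 1), np.array([[2.0, 0.5], [0.5, 2.0]]))]
    print(unnormalized_score([1, 1], factors, gates=[1.0]))  # gate on  -> 2.0
    print(unnormalized_score([1, 1], factors, gates=[0.0]))  # gate off -> 1.0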

  10. Evidence-Specific Structures through Gates • Each gate is a classifier that decides whether the corresponding factor should be on or off • It uses features of the observation pattern and/or the observed values • Inference time can be expressed as a function of the number of gates that are on • Gates can also be partially on: the messages over the corresponding factors are damped
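The slide leaves the form of the classifier open; a logistic gate is one natural instantiation. A hypothetical sketch of ours (the weights and feature names, which anticipate slide 21, are made up):

    import math

    def gate_value(weights, features):
        # A soft gate in [0, 1]: a logistic classifier over evidence features.
        # Thresholding the output gives a hard on/off decision; intermediate
        # values can instead damp the messages over the factor.
        z = sum(weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-z))

    weights = {"fab_1_o": 1.5, "fab_h_h": -2.0}  # learned jointly with the MRF
    print(gate_value(weights, ["fab_1_o"]))  # ~0.82: keep the factor
    print(gate_value(weights, ["fab_h_h"]))  # ~0.12: drop (or heavily damp) it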

  11. A Two-Step Test-Time Process 1. Compute gate values [Figure: the gated MRF; each gate G1–G7 is evaluated from the evidence.]

  12. A Two-Step Test-Time Process 2. Run inference on the active factors [Figure: the gated MRF with all gates still shown.]

  13. A Two-Step Test-Time Process 2. Run inference on the active factors [Figure: factors whose gates are off start to disappear.]

  14. A Two-Step Test-Time Process 2. Run inference on the active factors [Figure: more inactive factors are removed.]

  15. A Two-Step Test-Time Process 2. Run inference on the active factors [Figure: the final pruned graph, containing only the active factors.]
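Putting the two steps together, a self-contained sketch of our own, in which brute-force enumeration stands in for the approximate inference that would actually be run on the active factors:

    import itertools
    import numpy as np

    def marginals(n_vars, factors):
        # Exact marginals over binary variables by enumeration; a stand-in
        # for approximate inference (e.g., loopy BP) on the active factors.
        p = np.zeros((n_vars, 2))
        for y in itertools.product([0, 1], repeat=n_vars):
            score = 1.0
            for vars_a, psi_a in factors:
                score *= psi_a[tuple(y[v] for v in vars_a)]
            for v in range(n_vars):
                p[v, y[v]] += score
        return p / p.sum(axis=1, keepdims=True)

    def predict(n_vars, factors, gates, threshold=0.5):
        # Step 1 happened upstream: one gate value per factor, computed from
        # the evidence pattern. Step 2: keep only the active factors and run
        # inference; its cost now scales with the number of open gates.
        active = [f for f, g in zip(factors, gates) if g >= threshold]
        return marginals(n_vars, active)

    factors = [((0, 1), np.array([[2.0, 0.5], [0.5, 2.0]])),
               ((1, 2), np.array([[2.0, 0.5], [0.5, 2.0]]))]
    print(predict(3, factors, gates=[0.9, 0.1]))  # only the first factor is used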

  16. Gates are Random Variables • We can define an MRF over the gates: [Figure: the gated MRF with the gates G1–G7 drawn as nodes.]

  17. Gates are Random Variables • We can define an MRF over the gates: [Figure: the same model; the gates themselves are now connected by factors.]

  18. Training • The objective: learn MRF parameters and gate features that optimize a combined measure of accuracy and speed on training data • We will use ERMA [Stoyanov, Ropson and Eisner, 2011] to jointly learn the MRF and gate parameters
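The transcript omits the formula for this objective; one plausible form, using the number of open gates as a proxy for inference time (our notation, not necessarily the authors'):

    def training_objective(task_loss, gates, speed_weight=0.1):
        # Prediction error plus a runtime penalty: the count of open gates is
        # a proxy for inference time, so speed_weight trades accuracy for speed.
        return task_loss + speed_weight * sum(gates)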

  19. ERMA • Empirical Risk Minimization under Approximations • An algorithm for learning the parameters of graphical models that will be used with approximations (approximate inference and decoding) • Uses back-propagation of error to compute gradients of the loss with respect to the parameters • Runs a local optimizer to find parameters that (locally) minimize the loss
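A conceptual sketch of ours of this training loop; finite differences stand in for ERMA's back-propagation through inference, and the toy loss is an assumption:

    import numpy as np

    def erma_style_train(params, end_to_end_loss, lr=0.1, steps=200, eps=1e-5):
        # Differentiate the loss of the approximate-inference output w.r.t.
        # all parameters, then descend: gradient descent is the local optimizer.
        params = np.asarray(params, dtype=float)
        for _ in range(steps):
            grad = np.zeros_like(params)
            base = end_to_end_loss(params)
            for i in range(len(params)):
                bumped = params.copy()
                bumped[i] += eps
                grad[i] = (end_to_end_loss(bumped) - base) / eps
            params -= lr * grad
        return params

    # Toy end-to-end loss whose minimum is at (1, -2).
    print(erma_style_train([0.0, 0.0], lambda p: (p[0] - 1)**2 + (p[1] + 2)**2))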

  20. Experiments • Synthetic data • Sample a random MRF with 4-ary factors • Sample training and test data • Forget the structure • Start learning from a fully connected graph with binary (pairwise) factors

  21. Experiments • Gate features • A conjunction of the factor id and the r.v. statuses • Say GAB controls factor FAB between r.v.'s A and B • A (and likewise B) can have status 1, 0, hidden, or output • Features for GAB are: fab_{1,0,h,o}_{1,0,h,o} • For instance, if A = 1 and B is an output, feature fab_1_o will fire • The total number of features for a gate is 4x4 – 2x2 = 12 (we can exclude factors for which both r.v.'s are observed)
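A small sketch of this feature scheme (hypothetical code of ours; the status encoding follows the slide):

    def rv_status(var, evidence, outputs):
        # Map a variable to one of the four statuses {1, 0, h, o}.
        if var in evidence:
            return str(evidence[var])          # observed as 1 or 0
        return "o" if var in outputs else "h"  # output vs. hidden

    def gate_features(factor_id, a, b, evidence, outputs):
        # One conjoined indicator per (factor id, status of A, status of B),
        # e.g. fab_1_o fires when A is observed as 1 and B is an output.
        return ["%s_%s_%s" % (factor_id,
                              rv_status(a, evidence, outputs),
                              rv_status(b, evidence, outputs))]

    print(gate_features("fab", "A", "B", evidence={"A": 1}, outputs={"B"}))
    # -> ['fab_1_o'], one of the 4x4 - 2x2 = 12 usable features for this gate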

  22. Results

  23. Conclusions and Future Work • Evidence-specific structures can result in more accurate and/or faster models. • Gates and ERMA can be used to represent and learn conditional independencies. • We will be applying our model to relational data.
