1 / 16

Efficient Inference on Sequence Segmentation Models

Efficient Inference on Sequence Segmentation Models. Sunita Sarawagi IIT Bombay sunita@iitb.ac.in. Sequence segmentation models. Flexible & accurate models for many applications Speech segmentation on phonemes Syntactic chunking Protein/Gene finding

fola
Download Presentation

Efficient Inference on Sequence Segmentation Models

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Efficient Inference on Sequence Segmentation Models Sunita Sarawagi IIT Bombay sunita@iitb.ac.in

  2. Sequence segmentation models • Flexible & accurate models for many applications • Speech segmentation on phonemes • Syntactic chunking • Protein/Gene finding • Information extraction with entity-level features • Whole entity match with database of entities • Length of entity between 3 and 8 words • Third or fourth token of entity a “-” • Last three tokens are digits From Keshet et al ’05 NIPS wkshp

  3. Similarity to author’s column in database Sequence Vs. Segmentation Models t x y Features describe the single word “Fagin” y1 y2 y3 y4 y5 y6 y7 y8 l,u x y Features describe full entity

  4. Segmentation models • Input: sequence x=x1,x2..xn, label set Y • Output: segmentation S=s1,s2…sp • sj = (start position, end position, label) = (tj,uj,yj) • Score: F(x,s) = • Transition potentials • Segment starting at i has label y and previous label is y’ • Segment potentials • Segment starting at i’, ending at i, and with label y. • All positions from i’ to i get same label. • Inference • Most likely segmentation (Max-margin trainers) • Marginal around segments (likelihood-based & exponentiated-gradient trainers)

  5. Inference: Marginal for a segment Forward messages (L = max segment length) O(n L2) Matrix notation: for L = n Segment Marginal y1 y2 y3 y4 y5 y6 y7 y8

  6. Goal • Speed up segmentation models • Currently 3—8 times slower than sequence models • Eliminate L, the hard limit on segment length • Efficiently handle mix of potentials spanning varying number of tokens • Pay the penalty of segmentation models only for longer entity level features instead of all of them Empirical results on extraction tasks: • Segmentation models with few entity features: higher accuracy at the same cost as sequence models

  7. Succinct potentials • Key insight • Compactly represent features on overlapping segments • Main challenge • Inference algorithms on compact potentials where cost is independent of segments a potential applies to • Four kinds of potentials

  8. Applications with mixed potentials • Named Entity Recognition • Speech segmentation on phonemes

  9. Efficient Inference: forward pass • Sharing computation • Split potentials (y-s) into two parts: different Common to all segments ending after i-1 Common to all segments starting before i-m Maximum gap between boundary of any y

  10. Optimized forward pass Two sets of modified forward messages y1 y2 y3 y4 y5 y6 y7 y8 y9 O(n m2) Similar two sets of backward messages Same strategy for max-product inference

  11. Marginals around potentials • Direct computation of marginals is O(n2) • Reduced to O(1) by two tricks • Decomposing potentials as in a,b • Sharing computations across adjacent potentials Direct Optimized a bit more tricky

  12. Complexity and data structures • Complexity of computing marginals • Optimized: O(nm+H), Original: O(nL+G) • H = number of features in succinct form • G = O(L2H) (In real-data |G| = 5--10 times |H|) • Achieved via incremental computation of q • Special data structure for storing y to compute qi’:i in O(1) time from previous qi’:i-1 • Marginals m computed in sorted order: increasing start boundary, decreasing end boundary

  13. Empirical evaluation • Task: • Citations: Cora, articles (L=20) • Address: Indian address (L=7) • Features: • Token-level • Orthographic properties/lexicon match of words at the start, end, middle, left, right of segment • Entity-level • TFIDF Match with lexicon, entity length • Methods • Sequence-BCEU: Begin-Continue-End-Unique labels • Segment: Original un-optimized algorithm • Segment-Opt: Optimized inference with compact potentials

  14. Running time and Accuracy

  15. Limit on segment length (L) • L (Hard limit on segment length) • Too small  reduced accuracy 9081 • Too large  increased running time 30 minutes  1 hour • m (Maximum entity-level features) • Reduced by half  accuracy still 3% higher than Sequence • Too large  running time does increases by only 30%

  16. Concluding remarks • Segmentation models: natural, flexible, accurate • Main limitation: inference expensive • Solved via a compact design of shared potentials • New efficient inference algorithms • Pays penalty of entity-level features only when needed • Running time comparable to sequence models • No hard limit on segment length • Future work: • Features that are functions of distance from boundary • Other models: 2-D segmentation? Code: http://crf.sourceforge.net

More Related