PoS tagging and Chunking with HMM and CRF

Presentation Transcript

  1. PoS tagging and Chunking with HMM and CRF • Dept. of CSE, IIT Madras • Pranjal Awasthi, Delip Rao, Ravindran Balaraman

  2. Outline • Overview of the system • PoS tagging with HMM • Chunking with CRF • Results • Summary

  3. Overview of the system • Aim: to leverage existing tools and algorithms (developed for English) for the NLPAI task • Tools used: TnT tagger, TBL, MALLET

  4. Overview of the system • PoS tagging: TnT + TBL • Chunking: CRF (MALLET)

  5. The TnT tagger (Brants, 2000) • A second-order Hidden Markov Model based tagger • Used for English and other languages • On the NLPAI dataset, TnT alone gave F1 = 78.9 • Why TnT? • PoS tagging is a sequence labeling task • HMMs and CRFs are good candidates
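To make the HMM idea concrete, here is a minimal sketch of Viterbi decoding for a first-order (tag-bigram) HMM. It is not TnT: TnT is second-order (tag trigrams) with its own smoothing and unknown-word handling. The add-one smoothing, the `<s>` start symbol, and all function names below are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in tagged_sentences:
        prev = "<s>"  # assumed sentence-start symbol
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (a stand-in for real smoothing)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit):
    """Return the most probable tag sequence under the bigram HMM."""
    tags = list(emit)
    n_tags = len(tags)
    n_words = len({w for c in emit.values() for w in c}) + 1  # +1 for unknowns
    # best[i][t] = (log score of best path ending in tag t at i, backpointer)
    best = [{t: (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags), p)
                for p in tags
            )
            column[t] = (score + log_prob(emit[t], words[i], n_words), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]
```

A second-order model would condition each transition on the two preceding tags, so the dynamic-programming state becomes a tag pair instead of a single tag.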

  6. Poor performance of CRFs in PoS tagging • On the NLPAI dataset, F1 = 69.4 • Features used: w_{i-1}, w_{i-1}w_i, w_{i+1}, w_i w_{i+1} • A linear-chain CRF was used (MALLET) • Reasons for poor performance • Large number of PoS tags (26) compared to chunking • Selection of features • Type of CRF?
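The four features listed on this slide (previous word, previous-plus-current bigram, next word, current-plus-next bigram) can be sketched as a small extraction function. The feature-string format, the `<s>` and `</s>` padding symbols, and the function name are assumptions for illustration; the slides do not say how the features were encoded.

```python
def context_features(words, i):
    """Build the four context features for the token at position i."""
    prev_w = words[i - 1] if i > 0 else "<s>"               # assumed padding
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"  # assumed padding
    cur = words[i]
    return [
        f"w-1={prev_w}",           # w_{i-1}
        f"w-1w0={prev_w}_{cur}",   # w_{i-1} w_i
        f"w+1={next_w}",           # w_{i+1}
        f"w0w+1={cur}_{next_w}",   # w_i w_{i+1}
    ]
```

Note that every feature here is lexical; nothing captures suffixes, prefixes, or word shape, which is one way the "selection of features" concern could show up.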

  7. Transformation Based Learning (Brill, 1995) • Added as a post-processing step to “correct” TnT output • Idea: • Derive correction rules during training by observing what has gone wrong • Apply these rules at test time
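The derive-then-apply idea can be sketched as a greedy learner over a single rule template: "change tag a to b when the previous tag is c". This is only an illustration of the general technique; the template, the scoring, and the function names are assumptions, not the authors' TBL code, and a real system would use many templates.

```python
from collections import Counter

def apply_rule(tags, rule):
    """Apply (a, b, c): retag a as b wherever the previous tag is c.
    Context is read from the input tags, not from already-changed ones."""
    a, b, c = rule
    return [b if i > 0 and t == a and tags[i - 1] == c else t
            for i, t in enumerate(tags)]

def learn_rules(predicted, gold, max_rules=10):
    """Greedily pick the rule with the best net error reduction, apply it
    to the training tags, and repeat until no rule helps."""
    current = [list(s) for s in predicted]
    rules = []
    for _ in range(max_rules):
        fixes, breaks = Counter(), Counter()
        for cur, ref in zip(current, gold):
            for i in range(1, len(cur)):
                if cur[i] != ref[i]:
                    fixes[(cur[i], ref[i], cur[i - 1])] += 1  # rule would fix this
                else:
                    breaks[(cur[i], cur[i - 1])] += 1  # rule would break this
        if not fixes:
            break
        best = max(fixes, key=lambda r: fixes[r] - breaks[(r[0], r[2])])
        if fixes[best] - breaks[(best[0], best[2])] <= 0:
            break
        rules.append(best)
        current = [apply_rule(s, best) for s in current]
    return rules
```

At test time the learned rules are applied in order to the tagger's output, one `apply_rule` call per rule.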

  8. Transformation Based Learning (contd …) • Use of TBL improved F1 by 1% • TBL is sensitive to the templates used • Possible improvements on template selection • Training time can be long unless indexing is used

  9. Summary of PoS tagging Results

  10. Chunking with CRF • Based on (Sha & Pereira, 2003) • Using SimpleTagger provided with MALLET • Chunking accuracies
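As a rough illustration of preparing data for a line-oriented sequence tagger such as SimpleTagger, the sketch below writes one token per line as space-separated features followed by the label, with a blank line between sentences. The choice of word and PoS tag as features, the `w=`/`pos=` prefixes, and the BIO chunk labels are assumptions; the slides do not list the chunking features actually used.

```python
def to_tagger_lines(sentence):
    """Format one sentence of (word, pos_tag, chunk_label) triples as
    'feature feature ... label' lines, ending with a blank line."""
    lines = [f"w={word} pos={pos} {chunk}" for word, pos, chunk in sentence]
    return "\n".join(lines) + "\n\n"
```

For example, `to_tagger_lines([("the", "DT", "B-NP"), ("dog", "NN", "I-NP")])` yields two token lines followed by an empty line that marks the sentence boundary.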

  11. Summary • Demonstrated the use of off-the-shelf software for tagging and chunking • Only code written: TBL + glue scripts • Overall PoS F1 = 80.74 and Chunk F1 = 79.58 • Have we “hit the wall” in pure ML based tools? • Not sure yet!
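For readers unfamiliar with how a chunk-level F1 figure is typically obtained, here is a minimal sketch that scores predicted BIO labels against gold labels by exact span match. The BIO encoding and exact-match criterion are assumptions; the slides do not describe the evaluation script used for the numbers above.

```python
def extract_chunks(labels):
    """Return the set of (start, end, type) spans encoded by BIO labels."""
    chunks, start, ctype = set(), None, None
    for i, lab in enumerate(list(labels) + ["O"]):  # sentinel closes last chunk
        continues = lab.startswith("I-") and ctype == lab[2:]
        if not continues:
            if start is not None:
                chunks.add((start, i, ctype))
            start, ctype = None, None
            if lab.startswith(("B-", "I-")):
                start, ctype = i, lab[2:]
    return chunks

def chunk_f1(gold, pred):
    """Harmonic mean of chunk precision and recall (exact span match)."""
    g, p = extract_chunks(gold), extract_chunks(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)
```

With gold `["B-NP", "I-NP", "B-VP", "O"]` and prediction `["B-NP", "I-NP", "O", "O"]`, one of two gold chunks is found and the single predicted chunk is correct, so precision is 1, recall is 0.5, and F1 is 2/3.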

  12. Thanks!