PoS Tagging and Chunking with HMM and CRF
Dept. of CSE, IIT Madras
Pranjal Awasthi, Delip Rao, Ravindran Balaraman
Outline
• Overview of the system
• PoS tagging with HMM
• Chunking with CRF
• Results
• Summary
Overview of the system
• Aim: To leverage existing tools and algorithms (for English) for the NLPAI task
• Tools used: TnT tagger, TBL, MALLET
Overview of the system (contd …)
[Pipeline diagram: PoS Tagging with TnT + TBL, followed by Chunking with CRF (MALLET)]
The TnT tagger (Brants, 2000)
• A second-order Hidden Markov Model based tagger
• Used for English and other languages
• On the NLPAI dataset, TnT alone gave F1 = 78.9
• Why TnT?
  • PoS tagging is a sequence labeling task
  • HMMs and CRFs are good candidates
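For reference, the decoding objective of a second-order (trigram) HMM tagger can be written as below. This is the textbook formulation, not a transcription of TnT's internals: the symbols w_i (words), t_i (tags) and n (sentence length) are notation introduced here, t_0 and t_{-1} are assumed to be sentence-start markers, and smoothing and unknown-word handling are left out.

```latex
\hat{t}_{1..n} \;=\; \operatorname*{arg\,max}_{t_1,\dots,t_n}
  \prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2}) \, P(w_i \mid t_i)
```

The "second order" in the slide refers to the transition term conditioning on the two previous tags rather than one.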
Poor performance of CRFs in PoS tagging
• For the NLPAI dataset, F1 = 69.4
• Features used: w_{i-1}, w_{i-1}w_i, w_{i+1}, w_i w_{i+1}
• A linear-chain CRF was used (MALLET)
• Reasons for poor performance
  • Large number of PoS tags (26) compared to chunking
  • Selection of features
  • Type of CRF?
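The four context features listed above can be sketched as follows. This is a minimal illustration, not the extraction code used for the reported experiments: the function name, the feature-string layout and the "<PAD>" boundary token are all assumptions made for the example.

```python
from typing import List


def context_features(words: List[str], i: int, pad: str = "<PAD>") -> List[str]:
    """Return the four context features for the word at position i.

    Features: previous word, previous+current bigram, next word,
    current+next bigram. `pad` (hypothetical) stands in for positions
    outside the sentence.
    """
    prev_w = words[i - 1] if i > 0 else pad
    next_w = words[i + 1] if i + 1 < len(words) else pad
    cur_w = words[i]
    return [
        f"w[-1]={prev_w}",
        f"w[-1]w[0]={prev_w}|{cur_w}",
        f"w[+1]={next_w}",
        f"w[0]w[+1]={cur_w}|{next_w}",
    ]


# Illustrative English sentence; the NLPAI data is not reproduced here.
sentence = ["the", "cat", "sat"]
features = [context_features(sentence, i) for i in range(len(sentence))]
```

Note that the current word w_i appears only inside the two bigram features, which may be one aspect of the "selection of features" concern raised on the slide.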
Transformation Based Learning (Brill, 1995)
• Added as a post-processing step to “correct” TnT output
• Idea:
  • Derive correction rules during training by observing what has gone wrong
  • Apply these rules at test time
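The idea can be sketched with a single rule template, "change tag A to B when the previous tag is C". This is a simplified illustration under stated assumptions, not the TBL code written for this system: the `Rule` type, the one-template restriction, the single-sequence scoring and the English tags in the example are all hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Optional


class Rule(NamedTuple):
    from_tag: str
    to_tag: str
    prev_tag: str  # the rule fires only when the preceding tag equals this


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Return a copy of `tags` with `rule` applied; context is read from the input tags."""
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
            out[i] = rule.to_tag
    return out


def best_rule(predicted: List[str], gold: List[str]) -> Optional[Rule]:
    """Pick the rule with the largest net error reduction, or None if nothing helps."""
    score: Counter = Counter()
    # Each tagging error proposes the rule that would have fixed it.
    for i in range(1, len(predicted)):
        if predicted[i] != gold[i]:
            score[Rule(predicted[i], gold[i], predicted[i - 1])] += 1
    # Penalize each rule for every correct tag it would break.
    for rule in list(score):
        for i in range(1, len(predicted)):
            fires = predicted[i] == rule.from_tag and predicted[i - 1] == rule.prev_tag
            if fires and predicted[i] == gold[i]:
                score[rule] -= 1
    if not score:
        return None
    rule, gain = max(score.items(), key=lambda kv: kv[1])
    return rule if gain > 0 else None


# Toy example: the base tagger mislabels a noun after a determiner.
predicted = ["DT", "VB", "VBD"]
gold = ["DT", "NN", "VBD"]
rule = best_rule(predicted, gold)
corrected = apply_rule(predicted, rule) if rule else predicted
```

A full learner would repeat this step, applying the chosen rule to the training data and searching again until no rule yields a positive gain; the ordered rule list is then replayed on the base tagger's test output.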
Transformation Based Learning (contd …)
• Adding TBL improved F1 by 1%
• TBL is sensitive to the rule templates used
• Template selection could be improved
• Training time can be long unless indexing is used
Chunking with CRF
• Based on (Sha & Pereira, 2003)
• Using SimpleTagger provided with MALLET
• Chunking accuracies
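Preparing data for SimpleTagger is mostly a formatting job, of the kind a glue script would do. The sketch below assumes the input layout described in the MALLET sequence tagging documentation: one token per line, whitespace-separated features followed by the label, and a blank line between sequences. The function name, the choice of word and PoS tag as features, and the B-/I- chunk labels are illustrative assumptions, not the feature set used for the reported results.

```python
from typing import List, Sequence, Tuple

Token = Tuple[Sequence[str], str]  # (features, label)


def to_simpletagger(sentences: List[List[Token]]) -> str:
    """Serialize sentences as 'feat1 feat2 ... label' lines, blank line between sentences."""
    blocks = []
    for sent in sentences:
        # Features must not contain whitespace, since it is the field separator.
        lines = [" ".join(list(feats) + [label]) for feats, label in sent]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Toy data: word and PoS tag as features, chunk tag as the label.
example = [
    [(["the", "DT"], "B-NP"), (["cat", "NN"], "I-NP")],
    [(["sat", "VBD"], "B-VP")],
]
text = to_simpletagger(example)
```

In this arrangement the PoS tags produced by the tagging stage become input features for the chunker.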
Summary
• Demonstrated the use of off-the-shelf software for tagging and chunking
• Only code written: TBL + glue scripts
• Overall PoS F1 = 80.74 and Chunk F1 = 79.58
• Have we “hit the wall” with pure ML-based tools?
  • Not sure yet!