Explore term extraction from financial news, analyzing metrics like Frequency, Conditional Probability, and Mutual Information. Identify uni-grams, bi-grams, tri-grams based on frequency and MI values. Learn about Extreme Status indicators and potential further work like PAT-Tree and Pattern Filter.
Term Extraction from Financial News Jian-Shiun 2008/10/31
Data Collection • Period: 2008/10/10 ~ 2008/10/30 • Number of news articles: 1,987
[Table: grams, docs]
Metrics • Frequency • Conditional Probability • Mutual Information
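A minimal sketch of the first two metrics, assuming character n-grams (uni-, bi-, tri-grams) and assuming conditional probability means count(gram) / count(gram without its last character); the slides do not give the exact definitions. The function names and the toy strings are hypothetical, not taken from the news data.

```python
from collections import Counter


def ngram_counts(docs, max_n=3):
    """Count character n-grams (n = 1..max_n) over a list of strings."""
    counts = Counter()
    for doc in docs:
        for n in range(1, max_n + 1):
            for i in range(len(doc) - n + 1):
                counts[doc[i:i + n]] += 1
    return counts


def conditional_probability(counts, gram):
    """Estimate P(last char | preceding chars) as count(gram) / count(prefix).

    This definition is an assumption; the slides only name the metric.
    """
    prefix = gram[:-1]
    if not prefix or counts[prefix] == 0:
        raise ValueError("gram needs a prefix that occurs in the corpus")
    return counts[gram] / counts[prefix]


# Toy input standing in for news text (not real data).
docs = ["abab", "abc"]
counts = ngram_counts(docs)
print(counts["ab"])                            # raw frequency of the bi-gram "ab"
print(conditional_probability(counts, "bc"))   # P("c" | "b")
```

For Chinese news text the same counting applies per character, since the input is unsegmented.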
Mutual Information • If f(w) ≥ f(c1) f(c2) … f(cn), then MI(w) ≥ 0
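The sign condition above follows if MI is a log ratio. A minimal sketch, assuming MI(w) = log( f(w) / (f(c1) f(c2) … f(cn)) ) with f as relative frequency; the slide does not state the formula, so this definition, the function name, and the toy string are assumptions.

```python
import math
from collections import Counter


def mutual_information(gram, gram_counts, char_counts):
    """log( f(gram) / product of f(c) for each character c in gram ).

    f is relative frequency: n-gram frequency among n-grams of the same
    length, character frequency among all characters (an assumption).
    """
    if gram_counts[gram] == 0:
        raise ValueError("gram does not occur in the corpus")
    f_w = gram_counts[gram] / sum(gram_counts.values())
    total_chars = sum(char_counts.values())
    product = 1.0
    for c in gram:
        product *= char_counts[c] / total_chars
    return math.log(f_w / product)


# Toy string standing in for news text (not real data).
text = "aaabbbab"
char_counts = Counter(text)
bigram_counts = Counter(text[i:i + 2] for i in range(len(text) - 1))

for gram in ("aa", "ba"):
    print(gram, mutual_information(gram, bigram_counts, char_counts))
```

In this toy string "aa" has f(w) above the product of its character frequencies and gets a positive MI, while "ba" falls below the product and gets a negative MI.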
Extreme Status Using MI • f(w) is very low, and MI is very high* • f(w) is very low, and MI is very low • f(w) is very high, and MI is very high* • f(w) is very high, and MI is very low
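The four cases above can be sketched as a simple threshold check. The slide does not say how "very low" and "very high" are decided, so the cut-off parameters below are hypothetical, as is the function name. The `starred` flag only mirrors the asterisks on the slide, which mark the two very-high-MI cases.

```python
def extreme_status(freq, mi, freq_low, freq_high, mi_low, mi_high):
    """Classify a term into one of the four extreme statuses.

    Returns (freq_label, mi_label, starred), or None when either value
    lies between its low and high cut-offs. All cut-offs are hypothetical.
    """
    if freq <= freq_low:
        freq_label = "low"
    elif freq >= freq_high:
        freq_label = "high"
    else:
        return None

    if mi <= mi_low:
        mi_label = "low"
    elif mi >= mi_high:
        mi_label = "high"
    else:
        return None

    # The slide puts an asterisk on the two cases where MI is very high.
    starred = mi_label == "high"
    return freq_label, mi_label, starred


# Illustrative values only, not measured from the news data.
print(extreme_status(freq=2, mi=9.5, freq_low=3, freq_high=500, mi_low=0.0, mi_high=8.0))
```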
Further Work • PAT-Tree • Pattern Filter • Cross Validate with CKIP
Reference • Liu Kaiying (2000), Automatic Word Segmentation and Tagging of Chinese Text (中文文本自動分詞和標註), Beijing: The Commercial Press.