Personal Research in NLP
-- as a Master of Engineering student
Li Jun
Department of Computer Science and Technology, Tsinghua University

Outline
TextMatrix Project
Sentiment Classification
Experiment 1
Experiment 2
Machine Translation

TextMatrix (C++, cross-platform)
Available on my website: http://nlp.csai.tsinghua.edu.cn/~lj/
Abbreviations:
Word-based unigram: WBU
Word-based bigram: WBB
Character-based bigram: CBB
Character-based trigram: CBT
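
The four feature types above can be illustrated with a minimal sketch. It assumes the text has already been segmented into words. The function names (word_ngrams, char_ngrams, extract_features) are hypothetical and are not taken from TextMatrix.

```python
def word_ngrams(tokens, n):
    """Return word-based n-grams from a list of already-segmented words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character-based n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def extract_features(tokens):
    """Build the four feature lists named in the abbreviations (hypothetical helper)."""
    text = "".join(tokens)
    return {
        "WBU": word_ngrams(tokens, 1),  # word-based unigrams
        "WBB": word_ngrams(tokens, 2),  # word-based bigrams
        "CBB": char_ngrams(text, 2),    # character-based bigrams
        "CBT": char_ngrams(text, 3),    # character-based trigrams
    }


features = extract_features(["好久", "不见"])
print(features["CBB"])  # ['好久', '久不', '不见']
```

Character-based features need no word segmenter, which is one reason they are commonly tried for Chinese text.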
SVM, NB, ME, and ANN using WBU as features with different feature weighting schemes
SVM, NB, ME, and ANN using WBU, WBB, CBB, and CBT as features, with the feature weighting scheme that obtained the best performance
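
The slides do not name the feature weighting schemes that were compared. As an illustration only, the sketch below shows three common choices (binary presence, term frequency, and TF-IDF); the scheme names and the function weight_documents are assumptions, not the schemes used in the experiments.

```python
import math
from collections import Counter


def weight_documents(docs, scheme="tfidf"):
    """Turn tokenized documents into {feature: weight} dicts.

    The three schemes here are assumed examples of feature weighting.
    """
    doc_counts = [Counter(doc) for doc in docs]

    # Document frequency: number of documents containing each feature.
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())

    n_docs = len(docs)
    weighted = []
    for counts in doc_counts:
        if scheme == "presence":
            vec = {t: 1.0 for t in counts}
        elif scheme == "tf":
            vec = {t: float(c) for t, c in counts.items()}
        elif scheme == "tfidf":
            vec = {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        weighted.append(vec)
    return weighted


docs = [["good", "good", "movie"], ["bad", "movie"]]
print(weight_documents(docs, "tf")[0])  # {'good': 2.0, 'movie': 1.0}
```

With this plain TF-IDF variant, a feature that occurs in every document (here "movie") gets weight 0, so it contributes nothing to classification.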
Suffix tree for the string BANANA
a substring-group = a single feature
Zhang D, Lee WS. Extracting key-substring-group features for text classification. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '06).
A modified program that supports Chinese is available in TextMatrix v1.1
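
The idea that a substring-group acts as a single feature can be sketched on the BANANA example. The sketch assumes that a group collects all substrings occurring at exactly the same start positions, which corresponds to the substrings ending on one edge of the suffix tree. It enumerates substrings naively in quadratic time rather than building a suffix tree, so it is an illustration and not the algorithm from the cited paper or the TextMatrix implementation. The function name substring_groups is hypothetical.

```python
from collections import defaultdict


def substring_groups(text):
    """Group substrings of `text` by their set of start positions.

    Naive O(n^2) enumeration, for illustration only; a suffix tree
    yields the same grouping far more efficiently.
    """
    occurrences = defaultdict(set)
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            occurrences[text[start:end]].add(start)

    groups = defaultdict(list)
    for substring, starts in occurrences.items():
        groups[frozenset(starts)].append(substring)

    # Order the members of each group from shortest to longest.
    return {starts: sorted(subs, key=len) for starts, subs in groups.items()}


for starts, subs in substring_groups("BANANA").items():
    print(sorted(starts), subs)
```

For BANANA this gives six groups. For example, "AN" and "ANA" both occur only at positions 1 and 3, so they fall into one group and would count as one feature instead of two.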
Text Classification Package is available at http://nlp.csai.tsinghua.edu.cn/~lj/
To be published soon in my master's thesis
echo "long time no see" | moses -config moses.ini
BEST TRANSLATION: 好久 不见