Stemming Algorithms. Information Retrieval and Recommendation Techniques: Midterm Report. Advisor: Professor 黃三益. Students: 9142608 黃哲修, 9142609 張家豪. Outline: Introduction; Types of stemming algorithms; Experimental evaluations of stemming; Stemming to compress inverted files; Summary; Appendix. Introduction.
measured with recall and precision, and on their speed, size, and so on
Test Word: READABLE
Corpus: ABLE, APE, BEATABLE, FIXABLE, READ, READABLE,
READING, READS, RED, ROPE, RIPE
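The successor variety counts for this test word can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the end of a word is not counted as a successor, so the full word READABLE scores 0 here (some presentations count a blank as a successor instead).

```python
# Corpus from the slide; the test word is READABLE.
CORPUS = ["ABLE", "APE", "BEATABLE", "FIXABLE", "READ", "READABLE",
          "READING", "READS", "RED", "ROPE", "RIPE"]


def successor_variety(prefix, corpus):
    """Number of distinct letters that follow `prefix` in the corpus.

    Assumption: end-of-word is not counted as a successor.
    """
    n = len(prefix)
    successors = {w[n] for w in corpus if w.startswith(prefix) and len(w) > n}
    return len(successors)


def successor_varieties(word, corpus):
    """Successor variety of every prefix of `word`, shortest prefix first."""
    return [(word[:i], successor_variety(word[:i], corpus))
            for i in range(1, len(word) + 1)]


if __name__ == "__main__":
    for prefix, variety in successor_varieties("READABLE", CORPUS):
        print(f"{prefix:<9}{variety}")
```

With this corpus the count rises again at the prefix READ (successors A, I, S), which is the kind of peak a successor variety stemmer would use to cut the word into READ and ABLE.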
has the successor j is given by
statistics => st ta at ti is st ti ic cs
unique digrams = at cs ic is st ta ti
statistical => st ta at ti is st ti ic ca al
unique digrams = al at ca ic is st ta ti
A and B are the numbers of unique digrams in the first and second words, respectively. C is the number of unique digrams shared by the two words.
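As a rough sketch, the digram comparison above can be computed with Python sets. The similarity measure used here is Dice's coefficient, S = 2C / (A + B), which is the measure usually paired with these definitions of A, B, and C; the function names are hypothetical.

```python
def unique_digrams(word):
    """Set of distinct adjacent letter pairs in `word`."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice_similarity(first, second):
    """Dice's coefficient S = 2C / (A + B) over unique digrams."""
    a = unique_digrams(first)
    b = unique_digrams(second)
    if not a and not b:
        return 0.0  # words too short to contain any digram
    shared = a & b  # C: unique digrams present in both words
    return 2 * len(shared) / (len(a) + len(b))


if __name__ == "__main__":
    print(sorted(unique_digrams("statistics")))   # 7 unique digrams
    print(sorted(unique_digrams("statistical")))  # 8 unique digrams
    print(dice_similarity("statistics", "statistical"))
```

For the two example words, A = 7, B = 8, and C = 6 (at, ic, is, st, ta, ti), so S = 12 / 15 = 0.80.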
Then “ies” -> “y”
Then “es” -> “e”
Then “s” -> “NULL”
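The three rewrite rules above can be sketched as a small function. The slide shows only the "Then" parts, so the guard conditions below are an assumption taken from the commonly cited form of the S stemmer ("ies" but not "eies" or "aies"; "es" but not "aes", "ees", or "oes"; "s" but not "us" or "ss"), with only the first applicable rule being used. The function name is hypothetical.

```python
def s_stem(word):
    """Apply the first applicable plural-removal rule, or return the word unchanged.

    Assumption: the exclusion lists come from the commonly cited S stemmer,
    not from the slide itself.
    """
    w = word.lower()
    if w.endswith("ies") and not w.endswith(("eies", "aies")):
        return w[:-3] + "y"   # "ies" -> "y"
    if w.endswith("es") and not w.endswith(("aes", "ees", "oes")):
        return w[:-2] + "e"   # "es" -> "e"
    if w.endswith("s") and not w.endswith(("us", "ss")):
        return w[:-1]         # "s" -> NULL
    return w


if __name__ == "__main__":
    for word in ["ponies", "plates", "cats", "glass", "corpus"]:
        print(word, "->", s_stem(word))
```

The rules are ordered from the longest suffix to the shortest, so a word such as "ponies" is handled by the "ies" rule before the more general "es" and "s" rules get a chance to fire.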
1. The measure, denoted m, of a stem is based on its alternating vowel-consonant sequences.
2. *<X> --- the stem ends with a given letter X
3. *v* --- the stem contains a vowel
4. *d --- the stem ends in a double consonant
5. *o --- the stem ends with a consonant-vowel-consonant sequence, where the final consonant is not w, x, or y
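The stem conditions listed above can be sketched as small predicates. This is an illustrative sketch, not a full Porter stemmer: the function names are hypothetical, stems are assumed to be lowercase, and it follows Porter's usual convention that a, e, i, o, u are vowels and y is a vowel when it follows a consonant. The measure m is the number of vowel-to-consonant transitions, that is, the m in the form [C](VC)^m[V].

```python
def is_consonant(stem, i):
    """True if stem[i] is a consonant; 'y' after a consonant counts as a vowel."""
    ch = stem[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(stem, i - 1)
    return True


def measure(stem):
    """m: number of VC sequences in a stem of the form [C](VC)^m[V]."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and prev_is_vowel:
            m += 1  # a vowel run has just been closed by a consonant
        prev_is_vowel = not consonant
    return m


def ends_with(stem, letter):
    """*<X>: the stem ends with the given letter."""
    return stem.endswith(letter)


def contains_vowel(stem):
    """*v*: the stem contains a vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_double_consonant(stem):
    """*d: the stem ends in a double consonant."""
    n = len(stem)
    return n >= 2 and stem[-1] == stem[-2] and is_consonant(stem, n - 1)


def ends_cvc(stem):
    """*o: the stem ends consonant-vowel-consonant, final consonant not w, x, or y."""
    n = len(stem)
    return (n >= 3
            and is_consonant(stem, n - 3)
            and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1)
            and stem[-1] not in "wxy")


if __name__ == "__main__":
    for stem in ["tree", "by", "trouble", "oats", "troubles", "private"]:
        print(stem, measure(stem))
```

For example, "tree" and "by" have m = 0, "trouble" and "oats" have m = 1, and "troubles" and "private" have m = 2.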
Suffix conditions take the form: (current_suffix == pattern)
if (the second or third rule of step 1b was used) step1b1(stem);
Lennon et al. report the following compression percentages for various stemmers and databases. It is obvious that the savings in storage can be substantial.
Compression rates also increase for affix removal stemmers as the number of suffixes increases.