Discussion Class 3

The Porter Stemmer

Discussion Classes

Format:

Questions.

Ask a member of the class to answer.

Provide opportunity for others to comment.

When answering:

Stand up.

Give your name. Make sure that the TA hears it.

Speak clearly so that all the class can hear.

Suggestions:

Do not be shy at presenting partial answers.

Differing viewpoints are welcome.

Question 1: Stemming

Who wrote this paper? When? For what audience?

Define the terms: stem, suffix, prefix, conflation

What makes a good stemming algorithm? How would you measure it?

Porter proposes a criterion for removing suffixes. What is it? Do you agree with it?

Question 2: Effectiveness

Earlier system            Present system

precision   recall        precision   recall
    0       57.24             0       58.60
   10       56.85            10       58.13
   20       52.85            20       53.92
   30       42.61            30       43.51
   40       42.20            40       39.39
   50       39.06            50       38.85
   60       32.86            60       33.18
   70       31.64            70       31.19
   80       27.15            80       27.52
   90       24.59            90       25.85
  100       24.59           100       25.85

Explain the data in this table.

The paper calls this "the standard recall cutoff method". Have you any comments?
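As a starting point for the discussion, the two systems can be compared row by row. The sketch below assumes that the first number in each pair is a recall cutoff (in percent) and the second is the precision at that cutoff; the slide's column headings do not say this, so treat that reading as an assumption. The names CUTOFFS, EARLIER, PRESENT and the helper functions are made up for this illustration.

```python
# Values copied from the table above.
# ASSUMPTION: first number of each pair = recall cutoff (%),
# second number = precision (%) at that cutoff.
CUTOFFS = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
EARLIER = [57.24, 56.85, 52.85, 42.61, 42.20, 39.06, 32.86, 31.64, 27.15, 24.59, 24.59]
PRESENT = [58.60, 58.13, 53.92, 43.51, 39.39, 38.85, 33.18, 31.19, 27.52, 25.85, 25.85]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def cutoffs_where_earlier_wins():
    """Cutoffs at which the earlier system has the higher second-column value."""
    return [c for c, e, p in zip(CUTOFFS, EARLIER, PRESENT) if e > p]


if __name__ == "__main__":
    print("mean, earlier system:", round(mean(EARLIER), 2))
    print("mean, present system:", round(mean(PRESENT), 2))
    print("cutoffs where earlier system is ahead:", cutoffs_where_earlier_wins())
```

The differences are small at every cutoff, which is worth keeping in mind when judging what the table does and does not show.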

Question 3: Categories of Stemmer

The following diagram illustrates the various categories of stemmer. Porter's algorithm is shown by the red path. What do these terms mean?

Conflation methods
    Manual
    Automatic (stemmers)
        Affix removal
            Longest match
            Simple removal
        Successor variety
        Table lookup
        n-gram

Question 4: Mechanics Step 1a

The paper gives the following example of Step 1a. Explain what this step does.

Suffix    Replacement    Examples
sses      ss             caresses -> caress
ies       i              ponies -> poni
                         ties -> ti
ss        ss             caress -> caress
s         null           cats -> cat
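The Step 1a table can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the names STEP_1A_RULES and step_1a are hypothetical, and the sketch assumes lowercase input and that the first (longest) matching suffix is the only rule applied.

```python
# Rules of Step 1a as listed in the table above: (suffix, replacement).
# Longer suffixes come first, so "sses" is tried before "ss" and "s".
STEP_1A_RULES = (
    ("sses", "ss"),
    ("ies", "i"),
    ("ss", "ss"),
    ("s", ""),
)


def step_1a(word: str) -> str:
    """Apply the first matching Step 1a rule; return other words unchanged."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


if __name__ == "__main__":
    for w in ("caresses", "ponies", "ties", "caress", "cats"):
        print(w, "->", step_1a(w))
```

Note the role of the ss -> ss rule: it changes nothing, but because it matches before the plain s rule, it stops "caress" from being cut to "cares".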

Question 5: Mechanics Step 1b

Conditions    Suffix    Replacement    Examples
(m > 0)       eed       ee             feed -> feed
                                       agreed -> agree
(*v*)         ed        null           plastered -> plaster
                                       bled -> bled
(*v*)         ing       null           motoring -> motor
                                       sing -> sing

(a) Explain this table

(b) How does this table apply to: "exceeding", "ringed"?
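The three rules in the table can be sketched as follows. This is a hedged illustration under stated assumptions: the function names are hypothetical, input is lowercase, the longest matching suffix decides which rule is considered (so a word ending in eed is never tested against the ed rule), and the follow-up adjustments that the paper applies after removing ed or ing are not in the table and are omitted here.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """A consonant is any letter other than a, e, i, o, u,
    and other than y preceded by a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # y is a consonant at the start of a word or after a vowel.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m = number of vowel-to-consonant transitions in the stem."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return pattern.count("vc")


def contains_vowel(stem: str) -> bool:
    """The *v* condition: the stem contains at least one vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def step_1b_table(word: str) -> str:
    """Apply only the three rules shown in the Step 1b table."""
    if word.endswith("eed"):
        stem = word[:-3]
        return stem + "ee" if measure(stem) > 0 else word
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            return stem if contains_vowel(stem) else word
    return word


if __name__ == "__main__":
    for w in ("feed", "agreed", "plastered", "bled", "motoring", "sing"):
        print(w, "->", step_1b_table(w))
```

Tracing "exceeding" and "ringed" through step_1b_table by hand before running it is a useful way to check an answer to part (b).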

Question 6: Mechanics Step 5a

Step 5a is defined as follows. What does this do and why?

(m > 1) E ->                 probate -> probat
                             rate -> rate
(m = 1 and not *o) E ->      cease -> ceas
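A small sketch of Step 5a may help when working through the examples. It assumes lowercase input and uses the paper's definition of *o (the stem ends consonant-vowel-consonant, where the final consonant is not w, x or y); the function names are hypothetical and chosen only for this illustration.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Any letter other than a, e, i, o, u, and other than y after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m = number of vowel-to-consonant transitions in the stem."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return pattern.count("vc")


def ends_cvc(stem: str) -> bool:
    """The *o condition: stem ends consonant-vowel-consonant,
    and the final consonant is not w, x or y."""
    if len(stem) < 3:
        return False
    n = len(stem)
    return (
        is_consonant(stem, n - 3)
        and not is_consonant(stem, n - 2)
        and is_consonant(stem, n - 1)
        and stem[-1] not in "wxy"
    )


def step_5a(word: str) -> str:
    """Drop a final e when (m > 1), or when (m = 1 and not *o)."""
    if not word.endswith("e"):
        return word
    stem = word[:-1]
    m = measure(stem)
    if m > 1 or (m == 1 and not ends_cvc(stem)):
        return stem
    return word


if __name__ == "__main__":
    for w in ("probate", "rate", "cease"):
        print(w, "->", step_5a(w))
```

With these definitions, "rate" keeps its e because the stem "rat" has m = 1 and satisfies *o, while "cease" loses it because "ceas" ends vowel-vowel-consonant.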

Question 7: Ad Hoc Decisions

Discuss the following:

"The algorithm is careful not to remove a suffix when the stem is too short, the length of the stem being given by its measure, m. There is no linguistic basis for this approach. It was merely observed that m could be used quite effectively to help decide whether or not it was wise to take off a suffix."

(a) What is m?

(b) Why is it a reasonable measure?

(c) What anomalies does it produce?
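For part (a), it can help to compute m for a few words. The paper writes every word in the form [C](VC)^m[V], where C is a run of consonants and V a run of vowels, and m counts the VC pairs. The sketch below is one way to compute it, assuming lowercase input; the function names are hypothetical, and the sample words are the ones the paper itself uses to illustrate m = 0, 1 and 2.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Any letter other than a, e, i, o, u, and other than y after a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def cv_pattern(word: str) -> str:
    """Map each letter to 'c' or 'v', e.g. 'tree' -> 'ccvv'."""
    return "".join("c" if is_consonant(word, i) else "v" for i in range(len(word)))


def measure(word: str) -> int:
    """m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    return cv_pattern(word).count("vc")


if __name__ == "__main__":
    samples = ("tr", "ee", "tree", "y", "by",
               "trouble", "oats", "trees", "ivy",
               "troubles", "private", "oaten", "orrery")
    for w in samples:
        print(f"{w:10s} {cv_pattern(w):10s} m = {measure(w)}")
```

Because m counts vowel-consonant alternations rather than letters or syllables, words of quite different lengths can share the same m, which is a useful lead-in to part (c).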

Question 8: Stemming in Web Searching

(a) In Web search engines, the tendency is not to use stemming. Why? (There are several answers.)

(b) Does your answer to part (a) mean that stemming is no longer useful?