Advanced best match
Download
1 / 26

Advanced best match - PowerPoint PPT Presentation


  • 298 Views
  • Uploaded on

Advanced best match Mark Sanderson Porto, 2000 Aims To build on what you have done so far, reviewing more sophisticated ways of ranking documents in relation to a query. Objectives At the end of this lecture you will be able to describe a range of statistically based approaches to IR namely

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Advanced best match' - albert


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Advanced best match l.jpg

Advanced best match

Mark Sanderson

Porto, 2000


Slide2 l.jpg
Aims

  • To build on what you have done so far, reviewing more sophisticated ways of ranking documents in relation to a query.


Objectives l.jpg
Objectives

  • At the end of this lecture you will be able to

    • describe a range of statistically based approaches to IR namely

      • Best match ranking

      • Passage retrieval

      • Web page popularity techniques

      • Web link counting


Slide4 l.jpg
Why?

  • Extend your knowledge in how IR systems work


Up till now l.jpg
Up till now…

  • Inverse document frequency

    • More frequent - bad

    • Less frequent - good

  • More often a term is used (term frequency)

    • More likely document is about that term

    • Depends on document length?


Basically covers it l.jpg
Basically covers it

  • More sophisticated formulas about

    • Basically tf*idf

    • Some variation, e.g.

      • tf normalisation

      • Divide by max tf?

      • Look at test collections

        • Singhal, A., Buckley, C., Mitra, M. (1996): Pivoted Document Length Normalization, in Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval : 21-29


Anything else l.jpg
Anything else?

  • How to make ranking better?

    • Term location?

      • Why? big documents

    • Web link analysis

      • Why?

    • Popularity

      • Why?


Warning l.jpg
Warning

  • Customise system to a collection…

    • …and it become less generally applicable.


Term location l.jpg
Term location

  • Prefer documents where terms are closer together?

    • Passage retrieval

      • Hearst, M.A., Plaunt, C. (1993): Subtopic structuring for full-length document access, in Proceedings of the 16th annual international ACM SIGIR conference on Research and Development in Information Retrieval: 59-68

      • Callan, J. (1994): Passage­Level Evidence in Document Retrieval, in Proceedings of the 17th annual international ACM-SIGIR conference on Research and development in information retrieval: 302-310


Hearst and plaunt l.jpg

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system searching over two years of the Financial Times newspaper. The system, called NRT (News Retrieval Tool), was described in a paper by Donna Harman, "User Friendly Systems Instead of User-Friendly Front Ends". Donna's paper appears in JASIS and the "Readings in IR" book by Sparck-Jones and Willett.

1st

2nd

3rd

4th

Hearst and Plaunt

  • Split document into passages

  • Rank a document based on scores of each of it’s passages

  • What is a passage?

    • Paragraph?

    • TextTiling?

    • (non-overlapping) Fixed window?


Texttiling l.jpg
TextTiling?

  • Compute sentence block similarity


Results l.jpg
Results?

  • TextTiling and paragraphs work equally well

  • Fixed windows work less well.


Callan l.jpg

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system searching over two years of the Financial Times newspaper. The system, called NRT (News Retrieval Tool), was described in a paper by Donna Harman, "User Friendly Systems Instead of User-Friendly Front Ends". Donna's paper appears in JASIS and the "Readings in IR" book by Sparck-Jones and Willett.

1st

2nd

3rd

4th

Callan

  • Split document into passages

  • Rank a document based on score of its highest ranking passage

  • What is a passage?

    • Paragraph?

      • Bounded paragraph

    • (overlapping) Fixed window?


Results14 l.jpg
Results?

  • Overlapping passage did the best.




How do they do that l.jpg
How do they do that?

  • Best passage retrieval in the result list

    • Tombros, A. and Sanderson, M. Advantages of query-biased summaries in IR, in Proceedings of the 21st ACM SIGIR conference, Pages 2-10, 1998


Web link analysis l.jpg

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Web link analysis

  • Expand representation of document with text

    • Image retrieval by hypertext linksV. Harmandas, M. Sanderson and M.D. DunlopIn the Proceedings of the 20th ACM SIGIR conference, Pages 296-303, 1997


Some sites better than others l.jpg

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Research interests/publications/lecturing/supervising

My publications list probably does a reasonable job of ostensively defining my interests and past activities. For those with a preference for more explicit definitions, see below.

I'm now working at the CIIR with an interest in automatically constructed categorisations and means of explaining these constructions to users. I also plan to work some more on a couple aspects of my thesis that look promising.

Supervised by Keith van Rijsbergen in the Glasgow IR group, I finished my Ph.D. in 1997 looking at the issues surrounding the use of Word Sense Disambiguation applied to IR: a number of publications have resulted from this work. While doing my Ph.D., I was fortunate enough to apply for and get a small grant which enabled Ross Purves and I to investigate the use of IR ranked retrieval in the field of avalanche forecasting (snow avalanches that is), this resulted in a paper in JDoc. At Glasgow, I also worked on a number of TREC submissions and also co-wrote the guidelines for creating the very short queries introduced in TREC-6. Finally, I was involved in lecturing work on the AIS Advanced MSc course writing and presenting two short courses on Implementation and NLP applied to IR. I also supervised/co-supervised four MSc students. The work of three of these bright young things have been published in a number of good conferences.

My first introduction to IR was the building (with Iain Campbell) of an interface to a probabilistic IR system

Some sites better than others?

  • Hubs and authorities

    • Search for “Harvard”

      • Gibson, D., Kleinberg, J., Raghavan, P. (1998): Inferring Web Communities from Link Topology, in Proceedings of The 9th ACM Conference on Hypertext and Hypermedia: links, objects, time and space—structure in hypermedia systems: 225-234


Like google l.jpg
Like Google?

  • www.google.com

  • Base ranking on best match plus authority weight

  • Harder to spam authorities?


Popularity l.jpg
Popularity?

  • Query IMDB (www.imdb.org) for “Titanic”

Titanic (1915)

Titanic (1997)

Titanic (1943)

Titanic (1953)

Titanic 2000 (1999)

Titanic: Anatomy of a Disaster (1997)

Titanic: Answers from the Abyss (1999)

Titanic Chronicles, The (1999)

Titanic in a Tub: The Golden Age of Toy Boats (1981)

Titanic Too: It Missed the Iceberg (2000)

Titanic Town (1998)

Titanic vals (1964)...aka Titanic Waltz (1964) (USA)

Atlantic (1929)...aka Titanic: Disaster in the Atlantic (1999) (USA: video title)

Night to Remember, A (1958)...aka Titanic latitudine 41 Nord (1958) (Italy)

Gigantic (2000)...aka Untitled Titanic Spoof (1998) (USA: working title)


Use popularity l.jpg
Use popularity

  • Query “titanic” on IMDB

    • Titanic (1997)

    • Titanic Too: It Missed the Iceberg (2000)


Other examples l.jpg
Other examples

  • Direct Hit

    • www.directhit.com

    • Increasingly popular



Url specific stuff l.jpg
URL specific stuff

  • Prefer web pages with short URLs

    • http://dis.shef.ac.uk/mark/

    • http://dis.shef.ac.uk/mark/trivia/links/stuff/

  • Prefer with “.com”

    • not “.pt”?


Conclusions l.jpg
Conclusions

  • By now, you will be able to

    • describe a range of statistically based approaches to IR namely

      • Best match ranking

      • Passage retrieval

      • Web page popularity techniques

      • Web link counting


ad