
The SearchMaster's Toolbox

The SearchMaster's Toolbox. David Hawking. ECIR Industry Day 01 Apr 2010. UK Customers. From 2004/5 : Staffordshire University, Scottish Care Commission From 2009 :The Electoral Commission, Digital UK, Hargreaves Lansdown


Presentation Transcript


  1. The SearchMaster's Toolbox. David Hawking. ECIR Industry Day, 01 Apr 2010.

  2. UK Customers. From 2004/5: Staffordshire University, Scottish Care Commission. From 2009: The Electoral Commission, Digital UK, Hargreaves Lansdown. From 2010: London School of Economics and Political Science, Incisive Media, British Medical Journal, East Ayrshire Council, ...

  3. “Search is life”

  4. Costs of poor search. Butler Group: up to 10% of salary costs are wasted through ineffective search. IDC: a company with 1000 information workers can expect to waste more than $5M p.a. due to poor search. Accenture: in a survey of 1000 middle managers, respondents spent as long as 2 hrs/day searching for information.

  5. Who's the SearchMaster in your organisation?

  6. Stakeholders expect every SearchMaster to do her duty! To make external website search work: sales conversions, information dissemination, reduced inquiry handling load. To provide effective search of corporate information: happy, productive employees (plus students and other stakeholders).

  7. Give them the tools and they will do the job! Searchmaster End-user • Simple • Powerful

  8. 1. The basic search tool. It should: have good performance out of the box, without weeks of implementation; be simple to configure; avoid features which are too complex to use or set up; be able to cover your content and scale to the necessary level.

  9. 2. FineTuner. Every search deployment is different: web, database, fileshare, Lotus. The weighting of ranking features must accommodate the differences. Manual tweaking is fraught with danger: fix one query, break a dozen. Make a test file and use a tuning tool to learn feature weightings.

  10. Testfile desiderata. Representative of real workload; need an unbiased sample; many queries (typically >> 100); multiple weighted answers (where applicable); redirects; equivalent answers. See es.csiro.au/C-TEST/
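To make the idea of "multiple weighted answers" and "equivalent answers" concrete, here is a minimal sketch of how a testfile might be represented and scored. The layout, the URLs, and the scoring rule (best answer weight divided by rank) are assumptions chosen for illustration; they are not the C-TEST format and not necessarily what any tuning tool uses.

```python
from typing import Dict, List

# Hypothetical testfile: query -> {acceptable URL: weight in (0, 1]}.
# Equivalent answers (e.g. a redirect and its target) share the same weight.
TESTFILE: Dict[str, Dict[str, float]] = {
    "library opening hours": {
        "/library/hours": 1.0,
        "/library/opening-times": 1.0,  # equivalent answer
        "/library": 0.5,                # partially useful answer
    },
    "term dates": {"/calendar/term-dates": 1.0},
}


def score_query(ranking: List[str], answers: Dict[str, float]) -> float:
    """Return the best weight/rank over the ranked results (0.0 if no answer appears)."""
    best = 0.0
    for rank, url in enumerate(ranking, start=1):
        best = max(best, answers.get(url, 0.0) / rank)
    return best


def score_testfile(run: Dict[str, List[str]],
                   testfile: Dict[str, Dict[str, float]]) -> float:
    """Average the per-query scores; queries missing from the run score 0."""
    total = sum(score_query(run.get(q, []), a) for q, a in testfile.items())
    return total / len(testfile)
```

Here `run` maps each query to the ranked URLs the search engine returned for it, so one number summarises how well a given configuration serves the whole testfile.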

  11. Academic research on evaluation. Masses of academic research: how does it translate to tuning an enterprise search system? Setting good defaults. Tuning to specific characteristics in hundreds of customer deployments. Note: the system starts with no user interaction data. Creation of testfiles must be affordable.

  12. Spreadsheet testfile

  13. LSE Case Study

  14. Sources of testfiles at LSE. A-Z sitemap (>500 entries): biased toward anchortext. Keymatches file (>500 entries): pessimistic. Click data (>250 queries with > t clicks): biased toward clicks, 100% success! Pop/crit queries (134 manually judged). All biased: use a sampling tool!
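The slide does not say how the sampling tool works. One simple way to get a less biased query set is to draw submissions uniformly at random from the raw query log, so that popular queries appear in proportion to how often they are really asked. The sketch below assumes the log is available as a plain list of query strings; the function name and normalisation are hypothetical.

```python
import random
from collections import Counter
from typing import List


def sample_queries(log: List[str], k: int, seed: int = 0) -> Counter:
    """Sample up to k submissions uniformly at random (without replacement).

    Queries are lower-cased and stripped first; blank entries are ignored.
    The result counts how many times each distinct query was drawn.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    normalised = [q.strip().lower() for q in log if q.strip()]
    picked = rng.sample(normalised, min(k, len(normalised)))
    return Counter(picked)
```

The sampled queries would still need to be judged by hand to become testfile entries, which is why the sample size has to stay affordable.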

  15. Dimension-at-a-time tuning. [Figure: axes dim1 and dim2, with points labelled 1, 2, 3]
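Dimension-at-a-time tuning can be read as a form of coordinate search: hold every weight fixed except one, try a set of candidate values for that one, keep the best, then move to the next dimension. The following is a generic sketch under that reading, not Funnelback's implementation; `evaluate` is a hypothetical callback that would run the testfile against the engine with the given weights and return a score.

```python
from typing import Callable, List, Sequence


def tune_dimension_at_a_time(
    evaluate: Callable[[List[float]], float],
    start: Sequence[float],
    candidates: Sequence[float],
    max_sweeps: int = 5,
) -> List[float]:
    """Greedy coordinate search: improve one weight at a time until no sweep helps."""
    weights = list(start)
    best = evaluate(weights)
    for _ in range(max_sweeps):
        improved = False
        for dim in range(len(weights)):
            for value in candidates:
                trial = weights.copy()
                trial[dim] = value
                score = evaluate(trial)
                if score > best:  # keep only strict improvements
                    best, weights, improved = score, trial, True
        if not improved:
            break  # a full sweep changed nothing, so stop early
    return weights
```

Each call to `evaluate` means running every testfile query, which is how a tuning run over many dimensions reaches millions of query executions.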

  16. Popular/Critical Set

  17. Fine tuning summary. Tuning a large number of dimensions (Funnelback FineTune covers 38); millions of query executions; achieves substantial gains.

  18. But why do queries still fail? Misspelled: “Europian Conferense oninformation retreival”. Query words don't match the document: “door” or “MOPEM” v. “manually operated personnel egress mechanism”. There is no answer to that question (maybe there should be). Scope issues.

  19. Need more tools!

  20. 3. Spelling suggestion tools. Suggestions may be useful even if words are correctly spelled: Carlton furball club → Carlton football club. Suggestions should be based on the whole query, not word-by-word. Don't suggest queries which make no sense in the collection being searched. Autocompletion: guide users to the best query. Context is king.
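A small sketch of whole-query suggestion follows. It compares the entire query string against a list of queries known to work in the collection (an assumed input, for example drawn from click logs or page titles) and uses the standard library's `difflib` similarity as a stand-in for whatever matching a real product would use.

```python
import difflib
from typing import Iterable, Optional


def suggest(query: str, known_queries: Iterable[str],
            cutoff: float = 0.8) -> Optional[str]:
    """Suggest the closest known query, or None if the query is already known
    or nothing is similar enough. Matching is on the whole query string."""
    q = " ".join(query.lower().split())
    known = sorted({" ".join(k.lower().split()) for k in known_queries})
    if q in known:
        return None  # already a query that works in this collection
    matches = difflib.get_close_matches(q, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Because candidates come only from `known_queries`, the function cannot suggest a query that makes no sense in the collection, and "furball" is corrected even though it is a correctly spelled word.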

  21. 4. Query expansion tools. Manual rules: Rego → [registration rego]; MOPEM → [“manually operated personnel egress mechanism door”]. Related queries (automatic): based on co-clicking. Contextual navigation (on-the-fly): finding superphrases in a deep result set. Faceting (semi-automatic).
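Two of the techniques on this slide are easy to sketch: manual expansion rules and related queries derived from co-clicking. The rule syntax, function names, and click-log shape below are assumptions made for illustration; only the Rego rule itself comes from the slide.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Manual rule from the slide: a trigger term maps to its list of alternatives.
RULES: Dict[str, List[str]] = {"rego": ["registration", "rego"]}


def expand(query: str, rules: Dict[str, List[str]]) -> List[List[str]]:
    """Return one list of alternatives per query term (the term itself if no rule)."""
    return [rules.get(term, [term]) for term in query.lower().split()]


def related_queries(clicks: Iterable[Tuple[str, str]], query: str) -> List[str]:
    """Queries that share clicked URLs with `query`, most shared URLs first.

    `clicks` is an assumed log of (query, clicked URL) pairs.
    """
    urls_by_query: Dict[str, Set[str]] = defaultdict(set)
    for q, url in clicks:
        urls_by_query[q].add(url)
    target = urls_by_query.get(query, set())
    overlap = {q: len(urls & target)
               for q, urls in urls_by_query.items()
               if q != query and urls & target}
    return sorted(overlap, key=lambda q: (-overlap[q], q))
```

Manual rules need someone to maintain them, while the co-click approach needs no editorial effort but only becomes useful once enough click data has accumulated.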

  22. 5. Reporting and alerting tools. Reporting on queries which: produced no results; logged behaviour suggestive of unfulfilment. Alerting when: submissions of a query (or group of related queries) sharply increase in frequency. For: business intelligence; triggering creation of or changes to content.

  23. Query Spike Alerting
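The slides do not describe how a spike is detected. One common and simple approach, sketched here as an assumption, is to compare today's count for each query with the mean and standard deviation of its recent daily counts. The threshold, the minimum count, and the data layout are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def spiking_queries(history: Dict[str, List[int]], today: Dict[str, int],
                    threshold: float = 3.0, min_count: int = 10) -> List[str]:
    """Return queries whose count today is far above their historical daily counts.

    A query is flagged when today's count exceeds mean + threshold * stdev
    (the stdev is floored at 1.0 so very quiet queries need a real jump),
    or when it has no history at all and reaches min_count.
    """
    alerts = []
    for query, count in today.items():
        if count < min_count:
            continue  # too rare to be worth an alert
        past = history.get(query, [])
        if not past:
            alerts.append(query)  # brand-new query that is already frequent
            continue
        limit = mean(past) + threshold * max(pstdev(past), 1.0)
        if count > limit:
            alerts.append(query)
    return sorted(alerts)
```

Grouping related queries before counting, as the previous slide suggests, would catch spikes spread across several phrasings of the same need.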

  24. Conclusions • Search is important. • Organisations benefit when someone takes responsibility for effective search: the SearchMaster. • Academic research into evaluation needs careful translation for use in enterprise search tuning. • Further tools are needed to overcome poor queries and missing content. Thanks to Mike Swanson of Oxfam Australia for the Ned Kelly line.
