Commercial Online Databases and the Internet OSS ‘99 Global Information Forum May 24, 1999 Anne Caputo Dow Jones Interactive Publishing
Traditional Search Services Challenge the Web • The Internet Searchoff • September 1997-February 1998 • Susan Feldman, DATASEARCH • sef2@cornell.edu • Goal • Compare searching traditional online services with searching the World Wide Web • Effectiveness in finding information • When to use which one • Strengths of each approach
Searchoff Ground Rules • Be a trained, experienced searcher • Use a real question from a client • Search either Dialog or Dow Jones Interactive • Relevance-rank the results • Rank the top 30 retrieved documents on a scale of 1 to 5
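The ground rules amount to a simple tally: each searcher scores at most 30 documents from 1 to 5. The sketch below assumes that "relevance points" means the sum of those scores; the slides do not define the term, so treat that, and the name `tally`, as illustrative assumptions.

```python
from collections import Counter

TOP_N = 30  # ground rule: judge only the top 30 retrieved documents


def tally(scores: list[int]) -> tuple[int, int, Counter]:
    """Return (documents judged, relevance points, documents per score).

    Assumes "relevance points" = sum of the 1-5 scores, which the slides
    do not define explicitly.
    """
    judged = scores[:TOP_N]
    for s in judged:
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} is outside the 1-5 scale")
    return len(judged), sum(judged), Counter(judged)


if __name__ == "__main__":
    # Invented scores for one searcher's result list, not study data.
    docs, points, by_score = tally([5, 4, 4, 1, 3, 2, 1, 5])
    print(docs, points, sorted(by_score.items()))
    # prints: 8 25 [(1, 2), (2, 1), (3, 1), (4, 2), (5, 2)]
```

Truncating to 30 before validating mirrors the rule that documents below the cutoff are never judged.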
Subjects Searched • Business: 38% • Technology: 18% • Medicine/Pharmaceuticals: 14% • Science: 10% • Humanities: 8% • Engineering: 6% • Other: 6%
Web Search Engines Used • AltaVista: 45% • HotBot: 20% • Excite: 14% • Infoseek: 14% • Lycos: 5% • WebFerret: 2%
[Chart: Internet Search-Off Results, comparing Web totals with DIALOG/Dow Jones totals for relevance points and number of documents. Data labels: 1400 and 1143 (relevance points); 515 and 484 (documents).]
Searching time • Total searching time: • DIALOG/Dow Jones: 594 minutes • WWW search engines: 1230 minutes • Plus formatting time
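The time comparison is easy to quantify. This sketch uses only the two totals on the slide; formatting time is not included because the slide gives no figure for it.

```python
# Searching time reported in the Searchoff, in minutes (formatting time excluded).
DIALOG_DJ_MINUTES = 594
WEB_MINUTES = 1230

extra_minutes = WEB_MINUTES - DIALOG_DJ_MINUTES
ratio = WEB_MINUTES / DIALOG_DJ_MINUTES

print(f"Web searching took {extra_minutes} more minutes "
      f"({extra_minutes / 60:.1f} hours), {ratio:.2f}x the DIALOG/Dow Jones time")
# prints: Web searching took 636 more minutes (10.6 hours), 2.07x the DIALOG/Dow Jones time
```

In other words, the Web searches took roughly twice as long before any formatting work began.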
Searching assumptions: traditional search engines • Information exists on the subject • The information is high quality • The information is current • The information is expensive • To find it, we need expertise and training to know how and where to search • It will be a surprise if we can’t find something
Searching assumptions: World Wide Web • There MIGHT be information on the topic • Quality and timeliness are unpredictable • The information is free • There’s no telling how the search engine works • Searching requires no skill • Searching requires no training • It will be a surprise if we find something
[Chart: Retrieved Documents by Relevance, comparing the number of Web and DIALOG/Dow Jones documents at each relevance rank, from RANKED 1 (less relevant) to RANKED 5 (more relevant).]
Conclusion • DIALOG training has influenced an entire generation of searchers: we automatically shift into Boolean searching
Digression: • Nested Boolean searches don’t take advantage of the strong points of Web search engines • Statistical search engines search a whole territory. Boolean engines search for a point in that territory
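The "point versus territory" contrast can be illustrated on a toy collection. This is a minimal sketch, not how any particular engine works: Boolean AND returns an unranked set of documents containing every term, while a simple smoothed TF-IDF sum (a stand-in for the statistical ranking the slide refers to) scores every document sharing any term. The documents and function names are invented for illustration.

```python
import math
from collections import Counter

# Invented toy collection for illustration only.
DOCS = {
    "d1": "chemical engineering plant safety report",
    "d2": "plant safety regulations for chemical storage",
    "d3": "engineering careers and salaries",
    "d4": "safety of household chemical products",
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def boolean_and(query: list[str]) -> set[str]:
    """Boolean AND: an unranked set of documents containing every term."""
    wanted = set(query)
    return {d for d, text in DOCS.items() if wanted <= set(tokenize(text))}


def statistical_rank(query: list[str]) -> list[tuple[str, float]]:
    """Rank every document sharing at least one term by a smoothed TF-IDF sum."""
    n = len(DOCS)
    counts = {d: Counter(tokenize(text)) for d, text in DOCS.items()}
    df = {q: sum(1 for c in counts.values() if q in c) for q in query}
    scores = {}
    for d, c in counts.items():
        score = sum(c[q] * math.log(1 + n / df[q]) for q in query if df[q])
        if score > 0:
            scores[d] = score
    # sorted() is stable, so tied documents keep their collection order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    q = ["chemical", "plant", "safety"]
    print(sorted(boolean_and(q)))               # ['d1', 'd2']
    print([d for d, _ in statistical_rank(q)])  # ['d1', 'd2', 'd4']
```

The Boolean query hits the "point" (d1, d2) and silently drops d4; the statistical ranking covers the territory and lets the searcher decide how far down the list to read.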
Web Strategies • Map the territory: • Use your searching skills to create lists of related terms • Omit Boolean operators • Let the search engine work without interference • Put the most important and rarest words first • Use MORE LIKE THIS to improve results
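The term-ordering advice can be sketched as a small query builder. Everything here is hypothetical: `build_web_query` is an invented helper, the frequency counts stand in for a searcher's rough sense of how common each word is (Web engines did not expose such numbers), and operators are dropped only when written in upper case, as in a DIALOG-style query.

```python
# Hypothetical document frequencies (roughly how many pages contain each word).
DOC_FREQ = {
    "biodegradable": 1_200,
    "polymer": 8_000,
    "packaging": 95_000,
    "food": 900_000,
}

# Operators as typed in a DIALOG-style query; lower-case words are kept.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def build_web_query(terms: list[str], important: set[str]) -> str:
    """Drop Boolean operators; put important words first, rarer before commoner."""
    words = [t for t in terms if t not in BOOLEAN_OPERATORS]
    # Sort key: important words first (False sorts before True), then by
    # ascending frequency; words with no estimate are treated as most common.
    words.sort(key=lambda w: (w not in important, DOC_FREQ.get(w, float("inf"))))
    return " ".join(words)


if __name__ == "__main__":
    boolean_habit = ["food", "AND", "packaging", "OR", "biodegradable", "polymer"]
    print(build_web_query(boolean_habit, {"packaging", "biodegradable"}))
    # prints: biodegradable packaging polymer food
```

The result is a plain list of related terms with no operators, leaving the engine free to rank by its own statistics.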
Web Strategies • Use phrases when possible to eliminate irrelevant material • Ignore the useless hits and pursue the good ones • Don’t worry about finding six million documents • Just look at the top 30 • Rephrase the search • Move to another search engine if you don’t find anything
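Why phrases help, and why the top 30 is enough, can be shown with a toy client-side filter. This is a hypothetical illustration with invented hit titles; real engines apply phrase matching on the server, and the helper names are not from any search product.

```python
def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words appear consecutively and in order in text."""
    words, target = text.lower().split(), phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def contains_all_words(text: str, phrase: str) -> bool:
    """Bag-of-words match: every word present, in any order or position."""
    return set(phrase.lower().split()) <= set(text.lower().split())


def review_hits(ranked_hits: list[str], phrase: str, top_n: int = 30) -> list[str]:
    """Keep only phrase matches, then look at no more than the first top_n."""
    return [h for h in ranked_hits if contains_phrase(h, phrase)][:top_n]


if __name__ == "__main__":
    hits = [
        "solar cell efficiency record",
        "cell phone solar charger",
        "efficiency of the solar cell market",
    ]
    print([contains_all_words(h, "solar cell") for h in hits])  # [True, True, True]
    print(review_hits(hits, "solar cell"))
    # prints: ['solar cell efficiency record', 'efficiency of the solar cell market']
```

All three titles contain both words, but only two contain the phrase; the cell-phone charger drops out, and the reviewer never reads past the 30th surviving hit.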
Conclusions: traditional search services • Predictable archives • Chemical Engineering • Electrical Engineering • Strengths • History and background on companies • History and historical figures • Market reports, industry reports
Conclusions: traditional search services • Current drug studies (authoritative) • Industry newsletters and journals • Financial industry coverage • Scholarly journal articles • High quality information • Quick searches when you know the information is likely to be there
Conclusions: The Web • Pictures and illustrations • Some conference coverage and papers • Product information (comes from the company itself) • Small companies: products and background • Medical statistics (current) • If you know where to find the information
Conclusions: use both • To supplement each other for: • Standards • Articles on topics of general interest • Popular subjects • Organizations • Directory information • Reviews/evaluations/how-to information
Conclusions: use both • Government regulations and other agency information • Competitive intelligence • Obscure topics • Clues for finding information on and offline
Conclusions: general • Time is money. • Free information that takes too long to find and format is expensive information • The Web is a new tool. • We need to learn to use both online sources well • Vary strategies and approach to take advantage of each medium
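"Time is money" can be made concrete with a simple cost model: labor for searching and formatting plus any online charges. This is a sketch under stated assumptions: the hourly rate, session times, and fee are invented placeholders, not figures from the Searchoff, and `total_cost` is a hypothetical helper.

```python
HOURLY_RATE = 60.0  # hypothetical cost of one searcher-hour, in dollars


def total_cost(search_minutes: float, format_minutes: float,
               online_charges: float = 0.0) -> float:
    """Labor for searching and formatting, plus any fees paid to the service."""
    labor = (search_minutes + format_minutes) * HOURLY_RATE / 60
    return labor + online_charges


if __name__ == "__main__":
    # Invented sessions: a "free" Web search vs a shorter fee-based search.
    web = total_cost(search_minutes=120, format_minutes=30)
    fee_based = total_cost(search_minutes=45, format_minutes=5, online_charges=75.0)
    print(f"Web: ${web:.2f}, fee-based service: ${fee_based:.2f}")
    # prints: Web: $150.00, fee-based service: $125.00
```

Under these made-up numbers the "free" search costs more once labor is counted; with different rates or times the comparison can go either way, which is why both media are worth knowing well.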