Google and Google Scholar

Google and Google Scholar. Roger Mills and Judy Reading, May 2007. Welcome to the Web: the world’s biggest haystack. What can you do in a haystack? Romp; get hay fever; have unexpected encounters; sleep; not do research. So what do you fancy? Finding needles.


Presentation Transcript

  1. Google and Google Scholar Roger Mills and Judy Reading May 2007

  2. Welcome to the Web The world’s biggest haystack

  3. What can you do in a haystack? • Romp • Get hay fever • Have unexpected encounters • Sleep • Not do research • So what do you fancy?

  4. Finding needles • Google helps you find needles in haystacks • But: • Google is an index of web pages • A journal article is not a web page • So Google is not good at finding journal articles • However: • An image of a journal article may be placed on a web page • So Google may find it • If it’s free and not behind a firewall • How do you know?

  5. Google is fast • Very fast • Proudly fast • Tells you how fast • Found OUCS home page in 0.09 secs • Also found 350,000 other ‘relevant’ pages • But put home page first • Brilliant - How does it do it? • Not telling….

  6. Did I need 350,000 references? • Nobody looks at all the references Google retrieves • So why display them? • Algorithm takes into account links made by other pages • And click-throughs • So the top result for a given search is determined over time by the people who make that search • Is that the same as the ‘best’ result?
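
Slide 6 says the ranking takes into account links made by other pages. Google does not disclose its production algorithm, so the following is only a minimal sketch of the published PageRank idea, assuming a tiny made-up link graph. The function name `pagerank`, the page names and the parameter values are all illustrative assumptions, not anything taken from Google.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along links.

    `links` maps each page to the list of pages it links to.
    This is a teaching sketch, not Google's actual ranking.
    """
    # Every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Each page starts the round with a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split equally among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical graph: three pages all link to "home".
example_links = {"a": ["home"], "b": ["home"], "c": ["home", "a"], "home": []}
scores = pagerank(example_links)
top_page = max(scores, key=scores.get)
```

In this toy graph the page that everyone links to ends up on top, which is the slide's point: the top result reflects what other people link to, which is not necessarily the same as the ‘best’ result.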

  7. OK, how would you do it? • To index a document, I’d read it first. • Google can’t read • We don’t read the web – we view it • We remember references visually – that red book on the third shelf down… • If Google can list all the red books on all the third shelves down in all the world I’m bound to find it, right? • Actually I remember I saw it in Oxford, so I just need to list all the red books in Oxford – doddle • That’s not really how Google works – is it?

  8. So you read the article, and then…? • Give it some index terms • Not ones I’ve just made up, but ones from a standard list. • That way, everyone will know what the article’s about, and every article on the same topic can be found. • Provided everyone agrees what the article’s about. • Then I’d list the authors in a standard form: so everything by Roger Mills, Roger Anthony Mills, Roger A Mills, R Anthony Mills, Anthony Mills, R A Mills can be found in one go. • That’s a controlled vocabulary. • Works for journal titles too.
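
The author-name part of slide 8 can be sketched as a small authority file: every variant form points to one standard heading. The heading "Mills, Roger A." and the names `AUTHORITY`, `normalise` and `canonical_author` are assumptions made for this illustration; a real catalogue would use its own authority records.

```python
# One standard heading (assumed form) and the variants listed on the slide.
AUTHORITY = {
    "Mills, Roger A.": [
        "Roger Mills",
        "Roger Anthony Mills",
        "Roger A Mills",
        "R Anthony Mills",
        "Anthony Mills",
        "R A Mills",
    ],
}


def normalise(name):
    """Lower-case a name and drop full stops, commas and extra spaces."""
    cleaned = name.replace(".", " ").replace(",", " ").lower()
    return " ".join(cleaned.split())


# Reverse lookup: normalised variant -> standard heading.
LOOKUP = {
    normalise(variant): heading
    for heading, variants in AUTHORITY.items()
    for variant in variants
}


def canonical_author(name):
    """Return the standard heading for a name, or None if it is not listed."""
    return LOOKUP.get(normalise(name))
```

With this in place, a search on any listed variant, such as `canonical_author("R. A. Mills")`, resolves to the same heading, so everything by that author can be found in one go. A name missing from the file returns `None`, which is exactly the situation a searcher faces on Google.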

  9. Google doesn’t do that • No controlled terms • So you must think of synonyms, different forms of name, title abbreviations etc • You must define the context – that matters….

  10. Knitting according to Google

  11. OK, we get it. So let’s invent… • Google Scholar • Let’s team up with publishers so they let us search behind their firewalls • Let’s modify our algorithm so it excludes non-scholarly material (how do we define that?) • Let’s look at citations so when one article we index cites another one we index, we can move it higher up the relevance ranking • Let’s link together different versions of the same article • Let’s include library locations for full-text access • Let’s see how it goes

  12. But let’s not allow: • creation of sets • Or controlled vocabularies • Or combining of searches • Or hit rate figures for individual search terms • Or proximity searching • Or saving and e-mailing results • Or creation of alerts • Or standardisation of journal names/abbreviations • Or info on what is included and what is not • Or info on how the system decides what is scholarly • Or an indication of update frequency – seems slower than normal Google

  13. Which of these statements is true? • Google is comprehensive • Google is all I need • Google is up-to-date • Google is not evil • Google is commercial • Google is independent • Google is secretive • Google wants to rule the world • Google wants to beat Microsoft • Google loves me • I love Google

  14. Google is a family • A range of products under a common brand • Some add value to the basic search engine; others are nothing to do with searching • Google Scholar is a variant of the standard search engine • It uses a different algorithm, but we don’t know how it differs

  15. What’s in Google Scholar? • “Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations. Google Scholar helps you identify the most relevant research across the world of scholarly research.”

  16. NB: only in Beta • Features may change • Developing in tandem with Google Books, which will include digitised texts from Oxford collections and others • In competition with WoK, ScienceDirect, SCOPUS, Scirus etc

  17. Content • Algorithm to identify scholarly materials crawled by Google from the open web • Access to materials locked behind subscription barriers • Must include abstract • Full-text access requires institutional subscriptions or individual payment • Includes peer-reviewed papers, theses, books, preprints, abstracts, full-text, citations, etc.

  18. Library links • Includes OpenURL links to local library holdings • In Oxford displays as ‘Oxford Full Text’ beside title

  19. Includes citation data • Uses ‘citation extraction’ to build connections between papers • ‘Cited by’ link lists items (known to Google Scholar) that cite the original paper • Cited items not available online are listed with prefix [citation] • ‘Citation analysis’ puts the most-cited papers at the top of the results list
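
The ‘cited by’ idea on slide 19 can be sketched in a few lines, assuming citation extraction has already produced (citing paper, cited paper) pairs. The function names and paper labels are hypothetical, and Google Scholar's real ranking is not public, so this shows only the general technique of ordering by citation count.

```python
from collections import defaultdict


def cited_by_index(citations):
    """Build a 'cited by' index from (citing, cited) pairs."""
    index = defaultdict(set)
    for citing, cited in citations:
        index[cited].add(citing)
    return index


def rank_by_citations(papers, index):
    """Order papers by how many known papers cite them, most-cited first.

    Ties are broken alphabetically so the order is repeatable.
    """
    return sorted(papers, key=lambda p: (-len(index.get(p, ())), p))


# Made-up example: B and C cite A, and C also cites B.
pairs = [("B", "A"), ("C", "A"), ("C", "B")]
index = cited_by_index(pairs)
ranking = rank_by_citations(["C", "B", "A"], index)
```

Only citations between items the system already knows about are counted, which is one reason the ‘cited by’ figures on later slides differ between Scholar and WoS.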

  20. Searching • AND implied between words as in normal Google • + to include common words, letters or numbers that Google’s search technology generally ignores • “quote marks” to search for a phrase • minus sign – to exclude from a search • OR for either search term • author: for author search • intitle: to search document title • restrict by date and publication • advanced search screen available
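
As a memory aid, the operators on slide 20 can be assembled into a query string. The helper `build_query` and its parameter names are invented for this sketch; it only builds text to paste into the search box and does not call any Google service.

```python
def build_query(words=(), phrase=None, either=(), exclude=(),
                author=None, in_title=None):
    """Combine search terms using the operators listed on the slide."""
    parts = list(words)                      # AND is implied between words
    if phrase:
        parts.append(f'"{phrase}"')          # quote marks for a phrase
    if either:
        parts.append(" OR ".join(either))    # OR for either search term
    parts.extend(f"-{word}" for word in exclude)   # minus sign to exclude
    if author:
        parts.append(f"author:{author}")     # author search
    if in_title:
        parts.append(f"intitle:{in_title}")  # search the document title
    return " ".join(parts)


query = build_query(phrase="French national identity",
                    exclude=["cuisine"], author="Mills")
```

Here `query` holds `"French national identity" -cuisine author:Mills`, which can be tried in both Google and Google Scholar for the exercise that follows.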

  21. Exercise • Try searching for: French national identity • In Google and Google Scholar • With and without quotation marks • Now try searching in Web of Science (or other relevant database) • Is it clear why results differ? • What approach provides the most useful results: • For writing a paper for publication • For quoting in a thesis • For preparing a speech • For preparing for a pub quiz • Or any other purpose…

  22. Help screens

  23. Earlier version

  24. Alternatives to Google • Google it! • See Charles Knight’s up-to-date ‘Top 100’ list in Read/Write Web: http://www.readwriteweb.com/archives/top_100_alternative_search_engines_mar07.php • Use Intute (www.intute.ac.uk) for reputable human-selected sites, chosen for a UK academic audience • Check OxLIP (www.ouls.ox.ac.uk/oxlip) for a complete listing and subject guide to university-subscribed databases. Most list the sources they cover and use controlled vocabularies for indexing

  25. An example of Google’s strengths and weaknesses in finding a specific article: a search done in 2005 and repeated in Nov 2006

  26. Biology search: glutathione in green Arabidopsis

  27. WoS

  28. Exact article in one step

  29. Scholar phrase search 2005: 15 results, this one at 7

  30. Scholar phrase search 2006: 16 results, this one first

  31. Scholar keyword search 2005: 2420 results, this one at 10

  32. Scholar keyword search 2006: 4800 results, this one first

  33. Google keyword search 2005: 17600 results, this one first

  34. Google keyword search 2006: 169000 articles, this one first

  35. Google phrase search 2005: 59 results, this first

  36. Google phrase search 2006: 86 results, this first

  37. Scholar 2005: ‘all 7 versions’

  38. Scholar 2005: cited by 2

  39. Scholar 2006: cited by 14

  40. WoS 2005: cited by 3

  41. WoS 2006: cited by 15

  42. Comparing citations data: 2005 (figure; labels: GS, SC)

  43. Comparing citations data: 2006 (figure; label: GS)

  44. Citations arranged by most cited

  45. SCIRUS phrase search: 2 journals, this first; 8 other web sources (inc previous versions of this talk!)

  46. SCIRUS keyword search: 735 journals, this first; 6996 others

  47. Biological Abs phrase search: exact match in 1; note controlled keywords

  48. SCIRUS • Very similar to Scholar but can also: • Mark records • Save records • E-mail records • Export set in RIS format (for Endnote)

  49. Search on controlled terms in Biological Abstracts

  50. Omitting ‘green’, 14 results