Google Scholar - pros and cons

Google Scholar - pros and cons. Roger Mills and Sue Bird, February 2009.

Presentation Transcript


  1. Google Scholar - pros and cons Roger Mills and Sue Bird February 2009

  2. Today • Google Scholar offers a very convenient method of retrieving article citations and often the accompanying full text, and is growing in popularity. This session offers tips on using it effectively and extending your search to other sources should Scholar's coverage prove inadequate for your purposes. • What Google does and doesn't do • Coverage of Google Scholar (GS) • Setting up GS to retrieve local full-text • Other customisation of GS • Comparison of GS, Web of Science, SCOPUS etc.

  3. Welcome to the Web The world’s biggest haystack

  4. What can you do in a haystack? • Romp about • Get hay fever • Have unexpected encounters • Sleep • Not do research • So why would you start there?

  5. Finding needles • Google helps you find needles in haystacks But: • Google is an index of web pages • A journal article is not [necessarily] a web page • So Google is not good at finding journal articles However: • An image of a journal article may be placed on a web page • So Google may find it • If it’s free and not behind a firewall • How do you know?

  6. Google is fast • Very fast • Proudly fast • Tells you how fast • Found OUCS home page in 0.08 secs • Also found 428,000 other ‘relevant’ pages • But put home page first • Brilliant - How does it do it? • Not telling….

  7. Did I need 428,000 references? • Nobody looks at all the references Google retrieves • So why display them? • Algorithm takes into account links made by other pages • And click-throughs • So the top result for a given search is determined over time by the people who make that search • Is that the same as the ‘best’ result? • It means Google can work out appropriate advertising to display

  8. OK, how would you do it? To index a document, I’d read it first. • Google can’t read • We don’t read the web – we view it • We remember references visually – that red book on the third shelf down… • If Google can list all the red books on all the third shelves down in all the world I’m bound to find it, right? • Actually, I remember I saw it in Oxford, so I just need to list all the red books in Oxford – a doddle • That’s not really how Google works – is it? Based on memory, rather than problem analysis?

  9. So you read the article, and then…? Give it some index terms • Not ones I’ve just made up, but ones from a standard list. • That way, everyone will know what the article’s about, and every article on the same topic can be found. • Provided everyone agrees what the article’s about. Then I’d list the authors in a standard form: so everything by Roger Mills, Roger Anthony Mills, Roger A Mills, R Anthony Mills, Anthony Mills, R A Mills can be found in one go. • That’s a controlled vocabulary. • Works for journal titles too.
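
The name-matching idea on this slide can be sketched in a few lines of Python. This is only an illustration: the authority file, the preferred heading "Mills, Roger A." and the function names are assumptions made for the example, not part of any real catalogue or of Google Scholar.

```python
import re
from typing import Optional

# Hypothetical authority file: one preferred heading, followed by the
# variant forms a cataloguer has decided belong to the same person.
AUTHORITY = {
    "Mills, Roger A.": [
        "Roger Mills", "Roger Anthony Mills", "Roger A Mills",
        "R Anthony Mills", "Anthony Mills", "R A Mills",
    ],
}


def _key(name: str) -> str:
    """Lower-case the name and collapse punctuation and spacing."""
    return " ".join(re.sub(r"[^\w\s]", " ", name).lower().split())


# Invert the authority file into a variant -> preferred heading lookup.
LOOKUP = {
    _key(variant): heading
    for heading, variants in AUTHORITY.items()
    for variant in variants
}


def preferred_heading(name: str) -> Optional[str]:
    """Return the controlled form of a name, or None if it is not listed."""
    return LOOKUP.get(_key(name))
```

The point of the sketch is that a person has to decide which variants belong together; the code only looks the decision up. A search engine without such a list has to treat each form as a different string.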

  10. Google doesn’t do that • No controlled terms • So you must think of synonyms, different forms of name, title abbreviations etc • You must also define the context – that matters….

  11. Knitting according to Google

  12. OK, we get it. So let’s invent… • Let’s team up with publishers so they let us search behind their firewalls • Let’s modify our algorithm so it excludes non-scholarly material (how do we define that?) • Let’s look at citations so when one article we index cites another one we index, we can move it higher up the relevance ranking • Let’s link together different versions of the same article • Let’s include library locations for full-text access • Let’s see how it goes

  13. But let’s not allow: • creation of sets • Or controlled vocabularies • Or combining of searches • Or hit rate figures for individual search terms • Or proximity searching • Or saving and e-mailing results • Or creation of alerts • Or standardisation of journal names/abbreviations • Or info on what is included and what is not • Or info on how the system decides what is scholarly • Or an indication of update frequency – seems slower than normal Google

  14. Which of these statements is true? • Google is comprehensive • Google is all I need • Google is up-to-date • Google is not evil • Google is commercial • Google is independent • Google is secretive • Google wants to rule the world • Google wants to beat Microsoft • Google loves me • I love Google

  15. Google is a family • A range of products under a common brand • Some add value to the basic search engine; others are nothing to do with searching • Google Scholar is a variant of the standard search engine • It uses a different algorithm, but we don’t know how it differs

  16. What’s in Google Scholar? “Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations. Google Scholar helps you identify the most relevant research across the world of scholarly research.”

  17. NB: only in Beta • Launched 18 Nov 2004 but still beta - features change • Developing in tandem with Google Books, which includes digitised texts from Oxford collections and others • In competition with WoK, SCOPUS etc

  18. Content • Algorithm to identify scholarly materials crawled by Google from the open web • Access to materials locked behind subscription barriers • Must include abstract • Full-text access requires institutional subscriptions or individual payment, unless open-access • Includes peer-reviewed papers, theses, books, preprints, abstracts, full-text, citations, etc. • Mostly post-1993? • Updated every 2-3 months?

  19. Library links • Includes OpenURL links to local library holdings • In Oxford displays as ‘Oxford Full Text’ beside title • May need to set this up in ‘Scholar preferences’
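
As a rough illustration of what an OpenURL link carries, the sketch below builds one in the key/encoded-value style of OpenURL 1.0 (Z39.88-2004) for the Harzing and van der Wal article cited later in these slides. The resolver address is a placeholder and the function name is made up for the example; the address a real library uses, and the exact keys its resolver expects, would have to come from that library.

```python
from urllib.parse import urlencode

# Placeholder resolver address (an assumption for this example only).
RESOLVER_BASE = "https://resolver.example.org/openurl"


def journal_openurl(atitle, jtitle, volume, spage, date, aulast,
                    base=RESOLVER_BASE):
    """Build an OpenURL 1.0 key/value link describing a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,   # article title
        "rft.jtitle": jtitle,   # journal title
        "rft.volume": volume,
        "rft.spage": spage,     # start page
        "rft.date": date,
        "rft.aulast": aulast,   # first author's surname
    }
    return f"{base}?{urlencode(params)}"


url = journal_openurl(
    atitle="Google Scholar as a new source for citation analysis",
    jtitle="Ethics in science and environmental politics",
    volume="8", spage="61", date="2008", aulast="Harzing",
)
```

The link describes the article rather than pointing at one copy of it, which is what lets the local resolver decide where the reader's own institution holds the full text.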

  20. Includes citation data • Uses ‘citation extraction’ to build connections between papers • ‘Cited by’ link lists items (known to Google Scholar) that cite the original paper • Cited items not available online are listed with prefix [citation] • ‘Citation analysis’ puts the most-cited papers at the top of the results list
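
A toy version of the 'cited by' idea is sketched below. The data is invented and the ranking rule (most-cited first, ties broken alphabetically) is a deliberate simplification; how Google Scholar actually weighs citations is not published.

```python
from collections import defaultdict

# Hypothetical toy data: each indexed paper maps to the papers it cites.
REFERENCES = {
    "A": ["C"],
    "B": ["A", "C"],
    "C": [],
    "D": ["A", "C", "X"],  # "X" is cited but is not itself indexed
}


def cited_by(references):
    """Invert the reference lists: cited paper -> set of citing papers."""
    index = defaultdict(set)
    for paper, cited in references.items():
        for target in cited:
            index[target].add(paper)
    return index


def rank_by_citations(references):
    """List every known paper, most-cited first (ties in name order)."""
    index = cited_by(references)
    papers = set(references) | set(index)
    return sorted(papers, key=lambda p: (-len(index.get(p, ())), p))


def label(paper, references):
    """Prefix items known only from other papers' reference lists."""
    return paper if paper in references else f"[citation] {paper}"


ranking = [label(p, REFERENCES) for p in rank_by_citations(REFERENCES)]
```

Note that the counts depend entirely on which papers are in `REFERENCES`: a citing paper that is not indexed contributes nothing, which is why counts differ between services.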

  21. Citation analysis • ‘Cited by’ numbers will differ in GS, WoS and Scopus because they are based on different literature sets • For a recent comparison see: Harzing, Anne-Wil K. and Ron van der Wal (2008). Google Scholar as a new source for citation analysis. Ethics in Science and Environmental Politics, Vol. 8: 61–73. http://www.int-res.com/articles/esep2008/8/e008p061.pdf

  22. From that article: • as a general rule of thumb, we would suggest that using GS might be most beneficial for 3 of the GS categories: (1) business, administration, finance & economics; (2) engineering, computer science & mathematics; (3) social sciences, arts & humanities. • Although broad comparative searches can be done for other disciplines, we would not encourage heavy reliance on GS for individual academics working in other areas without verifying results with either Scopus or WoS.

  23. and Meho & Yang (2007) [found] that GS missed 40.4% of the citations found by the union of WoS and Scopus, suggesting that GS does miss some important refereed citations. It must also be said though that the union of WoS and Scopus misses 61.04% of the citations in GS. Further, Meho & Yang (2007) found that most of the citations uniquely found by GS are from refereed sources. The social sciences, arts and humanities, and engineering in particular seem to benefit from GS’s better coverage of (citations in) books, conference proceedings and a wider range of journals. The natural and health sciences are generally well covered in ISI and hence GS might not provide higher citation counts. In addition, user feedback … seems to indicate that for some disciplines in the natural and health sciences GS’s journal coverage is very patchy.
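
The two percentages quoted above can both be large at once because they are measured against different totals: one is a share of the WoS/Scopus union, the other a share of GS. A small sketch with invented citation sets (the numbers are illustrative and are not Meho & Yang's data) shows the arithmetic.

```python
def share_missed(reference_set, other_set):
    """Percentage of reference_set that does not appear in other_set."""
    if not reference_set:
        raise ValueError("reference_set must not be empty")
    missed = reference_set - other_set
    return 100 * len(missed) / len(reference_set)


# Hypothetical citation identifiers found by each service.
wos = {1, 2, 3, 4}
scopus = {3, 4, 5}
gs = {4, 5, 6, 7, 8, 9}

union = wos | scopus                       # {1, 2, 3, 4, 5}
gs_misses = share_missed(union, gs)        # share of the union GS lacks
union_misses = share_missed(gs, union)     # share of GS the union lacks
```

With these toy sets GS lacks 3 of the union's 5 citations, while the union lacks 4 of GS's 6, so each source misses more than half of the other.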

  24. Searching • AND implied between words as in normal Google • + to include common words, letters or numbers that Google’s search technology generally ignores • “quote marks” to search for a phrase • minus sign – to exclude from a search • OR for either search term • author: for author search • intitle: to search document title • restrict by date and publication • Advanced search screen available
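
The operators listed on this slide can be combined into a single query string. The helper below is a hypothetical convenience function written for this example, not part of any Google tool, and the search terms are made up; it only assembles text to type into the search box.

```python
def build_query(words=(), phrase=None, any_of=(), exclude=(),
                author=None, in_title=None):
    """Assemble a query string from the operators listed on the slide."""
    parts = list(words)                        # AND is implied between words
    if phrase:
        parts.append(f'"{phrase}"')            # quote marks for a phrase
    if any_of:
        parts.append(" OR ".join(any_of))      # OR for either term
    parts.extend(f"-{term}" for term in exclude)  # minus sign to exclude
    if author:
        parts.append(f"author:{author}")
    if in_title:
        parts.append(f"intitle:{in_title}")
    return " ".join(parts)


query = build_query(
    words=["rhinoceros"],
    any_of=["tusks", "horns"],
    exclude=["beetle"],
    in_title="ecology",
)
```

Because there is no controlled vocabulary, the `any_of` and `exclude` lists are where the searcher supplies the synonyms and context that an indexer would otherwise have provided.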

  25. Help screen - original version

  26. For example • Always worth searching Google too • Let’s try: rhinoceros tusks • Context might be: • Ecology • Law • Medicine • Art etc.

  27. Alternatives to Google • Google it! • Try www.altsearchengines.com for specialised alternatives • Use Intute www.intute.ac.uk for reputable human-selected sites, chosen for a UK academic audience • Check OxLIP+ www.oxlip-plus.ouls.ox.ac.uk for a complete listing of, and subject guide to, university-subscribed databases. Most list the sources they cover and use controlled vocabularies for indexing

  28. An example of Google’s strengths and weaknesses in finding a specific article • A search done in 2005 and repeated in Nov 2006:
