
How Google Works: Are Search Engines Really Dumb and Should NTTIers Care?

Paul Barron, Director of Library and Archives, George C. Marshall Foundation (barronpb@marshallfoundation.org)


Presentation Transcript


  1. How Google Works: Are Search Engines Really Dumb and Should NTTIers Care? Paul Barron, Director of Library and Archives, George C. Marshall Foundation (barronpb@marshallfoundation.org). All Rights Reserved. This presentation may be copied and distributed for nonprofit educational purposes only. Revised November 2011

  2. We know our students … For them, “to Google” is a lifestyle, a habit pattern. Do you agree? “Whereas libraries once seemed like the best answer to the question, ‘Where do I find…?’ the search engine now rules.” No Brief Candle: Reconceiving Research Libraries for the 21st Century, Part II, Council on Library and Information Resources, http://www.clir.org/pubs/reports/pub142/pub142.pdf. JEFF STAHLER: (c) Columbus Dispatch, dist. by Newspaper Enterprise Association, Inc. WVPT 2012 NTTI

  3. Are you surprised that … “The prevalence of Google in student research is well-documented, but the Illinois researchers found something they did not expect: students were not very good at using Google.” “They were clueless about how the search engine organizes and displays its results.” “Consequently, the students did not know how to build a search that would return good sources.” “What Students Don’t Know,” Ethnographic Research in Illinois Academic Libraries Project, Inside Higher Ed (http://tinyurl.com/3m6yyhp)

  4. Web Searching: A 21st Century Habit “Email and search form the core of online communication and online information gathering. Perhaps the most significant change over that time is that [they] have become more habitual.” Search and email still top the list of most popular online activities, Pew Internet Project, http://www.pewinternet.org/Reports/2011/Search-and-email/Report.aspx

  5. Predictable Behavior? “This notion of ‘doing a search’ is rapidly becoming outmoded. My daughter is five; [h]er notion of ‘going to’ an engine and searching is pretty comical. She just assumes that everything is always knowable, instantaneously, at her fingertips through some mechanism.” Stefan Weitz, Director, Bing Search; Five Visionaries Sum Up The Future Of Search: Part II, http://tinyurl.com/6zrgwzh

  6. From Habit to Second Nature • “Googling has become second nature. Many college students turn to the Web as the starting point for serious academic research.” Getting Past Google: Perspectives on Information Literacy from the Millennial Mind, Educause Learning Initiative, http://net.educause.edu/ir/library/pdf/ELI3007.pdf • Question: When did these students begin this habitual behavior?

  7. If educators hope … To change students’ excessive use of Google, educators must embrace Google and learn how the search engine works, in order to influence students to integrate Google use with other reliable sources of information.

  8. Why learn how Google works? “Users are not familiar with how search engines ‘find’ what they are looking for. [U]sers might benefit from having more information [on how] Google ‘crawls’ the Web and determines how a website is ranked.” “In Google We Trust: Users’ Decisions on Rank, Position, and Relevance,” Laura Granka and others, Journal of Computer-Mediated Communication; User Experience Researcher, Google, Inc.

  9. Improve Students’ Media Literacy • They need to understand how the search engine determines and ranks results. • Why? Because … • “Students ‘trust’ search engines and perceive sites [as] credible because a site was returned at the top of the results by the search engine.” “Trust Online: Young Adults’ Evaluation of Web Content,” International Journal of Communication 4 (2010), 468-494, ijoc.org/ojs/index.php/ijoc/article/download/636/423

  10. Presentation Objective • Increase our understanding of how search engines and Google work by dispelling search engine myths. • Propose a plan to increase the use of library research databases: not by excluding Google use, but by integrating Google use with use of library databases. • Goal: Enable us to help our students become better researchers.

  11. Presentation Objective: Dispel … Search engine myths: search engines understand a searcher’s query, treat all sites and domains the same when determining results, and determine the results based on the popularity of the site with searchers. But we’re not equal. I’m .edu. I’m .net.

  12. Presentation Objective: Dispel … • Search engine myths: • Google accepts payment for ranking a site higher in the search results. • Google removes sites from the database when staff find them offensive or when searchers request it.

  13. Myth: Google Accepts “Pay for Ranking” “At Google we take our commitment to delivering useful and impartial search results very seriously.” “We don’t ever accept payment to add a site to our index, update it more often or improve its ranking.” Matt Cutts, Head of Google’s Webspam Team, http://www.google.com/howgoogleworks

  14. The Center of the Search Universe [chart, 2000 to 2012] http://www.bruceclay.com/serc_histogram/histogram.htm

  15. Google: it’s … “… the World Brain.” “How Google Dominates Us,” The New York Review of Books, http://tinyurl.com/3t3yg2f

  16. Google’s Power: From Duke U Law “Google has become the index of choice for online information; [it] steers our thoughts and learning online. Google’s control … constitutes an awesome ability to set the course of human knowledge.” Google’s Law, Greg Lastowka, Duke University School of Law, http://works.bepress.com/cgi/viewcontent.cgi?article=1003&context=lastowka

  17. And from another law review … “Whoever controls search has enormous influence on us. Search engines shape what we read, who we listen to, and who gets heard. No search engine comes closer to controlling search than Google.” James Grimmelmann, “The Google Dilemma,” New York Law School Law Review, Jan. 2009: 939, http://works.bepress.com/james_grimmelmann/19

  18. Students’ #1 Online Information Source: Google was the go-to resource for almost all of the students in the sample. Nearly all of the students in the sample reported always using Google, both for course-related research and everyday life research. “How College Students Seek Information in the Digital Age,” http://tinyurl.com/yfp7ol5 Google Great! We are one level above gossip.

  19. Should educators be concerned? “There are consequences to our students and our educational system if we [allow] a search engine to define the parameters of effective research.” The University of Google: Education in the (Post) Information Age, Tara Brabazon

  20. First: Learn how search engines work? “[S]uccessful search on the Web is difficult. Learning how to use search engines should be central in any Internet skills training. Novices in our study were ignorant about the limited scope of search engines and the necessity to state a search query at an adequate level of specificity.” Web Search Behavior of Internet Experts and Newbies, Christoph Hölscher & Gerhard Strube, http://www9.org/w9cdrom/81/81.html

  21. Why learn how Google works? Because … “We expect a lot [of] search engines. We ask them vague questions about topics that we are unfamiliar [with] and anticipate a concise organized response.” “You would have better success if you laid your head on the keyboard and coaxed the computer to read your mind.” Understanding Search Engines: Mathematical Modeling and Text Retrieval, Michael W. Berry and Murray Browne

  22. Why? For example: Define a hokie? “Human uses of language are often illogical, playfully misleading, false or nefarious, thus human semantics can never be made comprehensible to machines.” The Fate of the Semantic Web, Pew Internet & American Life Project, May 2010, http://pewinternet.org/2010/Semantic-Web.aspx

  23. We must understand that … “(S)earch engines have no understanding of words or language. (They) don't recognize user intent, can't distinguish goal-oriented search from browsing search.” A ResourceShelf Interview: 20 Questions with Dr. Gary Flake, Ph.D., Head of Yahoo! Research Labs, http://searchenginewatch.com/showPage.html?page=3372051, Thursday, June 3, 2004

  24. And in 2010 … “We can write a computer program to beat the very best human chess players, but we can't write a program to understand a sentence [with] anywhere near the precision of a child.” “Helping Computers Understand Language,” Steven Baker, Google Software Engineer, Official Google Blog, January 19, 2010

  25. And in 2011 … “Keyword-based search engines index web pages by keywords and not by concepts or topics. In fact they do not understand the content of the web pages.” Toward Topic Search on the Web, Microsoft Research, March 2011, http://research.microsoft.com/apps/pubs/default.aspx?id=145837
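To make the Microsoft Research point concrete, here is a minimal Python sketch of a keyword (inverted) index with Boolean AND matching. The page names and texts are hypothetical, and real search engines add stemming, synonym handling, and many other refinements; the sketch only shows the core idea that a page is found because it contains the typed words, not because the engine grasped the topic.

```python
from collections import defaultdict

# Hypothetical mini-corpus: page name -> page text.
PAGES = {
    "page_a": "how to repair a car engine",
    "page_b": "automobile engine repair guide",
    "page_c": "history of the hokie bird mascot",
}

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of pages that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return pages containing every query word (Boolean AND).

    Matching is purely on spelled-out words: no concepts, no synonyms.
    """
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is untouched
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(keyword_search(index, "car engine")))         # ['page_a']
print(sorted(keyword_search(index, "automobile engine")))  # ['page_b']
```

Both pages are about engine repair, but each query finds only the page that uses its exact words.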

  26. And in 2012 … “Google has a confession to make: It does not understand you. Google Fellow Amit Singhal says Google doesn’t understand the question. ‘We cross our fingers and hope someone on the web has written about these things or topics.’ ” Google Knowledge Graph Could Change Search Forever, http://mashable.com/2012/02/13/google-knowledge-graph-change-search/

  27. Why? Because … “I’ve been at this for two decades now; search isn’t out of its infancy yet. The science is at the point where we are crawling. Soon we’ll walk. I hope in my lifetime I’ll see search enter its adolescence.” Amit Singhal, Google Fellow, “This stuff is tough,” 25 February 2010, http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html

  28. If Google doesn’t understand my query … … how does Google determine how to select and rank the results in response to my query?

  29. What Google Considers on the Webpage • Google’s algorithms rely on more than 200 unique signals to determine a ranking. For example, • how often the search terms occur on the webpage, • if the search terms appear in the title or URL, and • whether synonyms of the search terms occur on the page. Facts about Google and Competition, www.google.com/press/competition/howgooglesearchworks.html
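A toy scorer can show how on-page signals like the three listed above might combine without any understanding of the query. This is a hypothetical Python sketch: the weights and the synonym table are invented for illustration, since the slide does not say how Google weights its signals. Note that the scorer only "knows" a synonym because someone wrote it into the table.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    title: str
    body: str

# Hypothetical synonym table: synonyms must be supplied in advance.
SYNONYMS = {"car": {"automobile", "auto"}}

# Hypothetical weights, chosen only for illustration.
W_BODY, W_TITLE, W_URL, W_SYNONYM = 1.0, 3.0, 2.0, 0.5

def on_page_score(page: Page, query: str) -> float:
    """Toy score built from the three on-page signals named on the slide."""
    body_words = page.body.lower().split()
    title_words = page.title.lower().split()
    url = page.url.lower()
    score = 0.0
    for term in query.lower().split():
        score += W_BODY * body_words.count(term)   # how often the term occurs
        if term in title_words:
            score += W_TITLE                       # term appears in the title
        if term in url:
            score += W_URL                         # term appears in the URL
        for syn in SYNONYMS.get(term, set()):
            score += W_SYNONYM * body_words.count(syn)  # synonyms on the page
    return score

a = Page("example.com/car-repair", "Car repair basics", "car engine car brakes")
b = Page("example.org/guide", "Repair guide", "automobile engine auto brakes")
print(on_page_score(a, "car"))  # 7.0
print(on_page_score(b, "car"))  # 1.0
```

Page `a` wins because it repeats the literal word "car" in its body, title, and URL; the sketch has no way to judge which page is the better guide.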

  30. What Google Considers Off the Webpage: Links • PageRank • PageRank counts the number and the quality of links to a webpage to determine how important the website is. • The assumption is that important websites receive more links from other websites. Facts about Google and Competition, www.google.com/press/competition/howgooglesearchworks.html
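The original PageRank formulation, published by Sergey Brin and Lawrence Page in 1998, can be sketched with power iteration: each page passes a share of its own rank to the pages it links to, and a damping factor (0.85 in the original paper) models a reader who sometimes jumps to a random page instead. This textbook version is an illustration only, not Google's current production algorithm, and the four-page link graph is hypothetical.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank computed by power iteration.

    links maps each page to the pages it links to; every link target must
    also appear as a key. Pages with no outlinks spread their rank evenly.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}  # random-jump share
        for page, outlinks in links.items():
            if outlinks:
                share = rank[page] / len(outlinks)  # split rank across links
                for target in outlinks:
                    new[target] += d * share
            else:  # dangling page: distribute its rank to every page
                for target in pages:
                    new[target] += d * rank[page] / n
        rank = new
    return rank

# Hypothetical link graph: a, b and c all link to hub; hub links only to a.
web = {"hub": ["a"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
r = pagerank(web)
print(sorted(r, key=r.get, reverse=True))  # ['hub', 'a', 'b', 'c']
```

`hub` ranks first because three pages link to it; `a` ranks second with only one inbound link, because that single link comes from the highly ranked `hub`. That is the "quality" half of "number and quality of links".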

  31. The Value of Quality Links “With PageRank, five or six high-quality links from websites would be valued much more highly than twice as many links from less reputable or established sites.” Librarian Central, How does Google collect and rank results? http://www.google.com/librariancenter/articles/0512_01.html
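Under the textbook PageRank formula, one link passes along the damping factor times the linking page's rank divided by that page's number of outbound links. That arithmetic is why a few links from established pages can outweigh many links from obscure ones. The ranks and link counts below are hypothetical, chosen only to show the calculation.

```python
D = 0.85  # damping factor used in the original PageRank paper

def link_value(sources: list[tuple[float, int]]) -> float:
    """Total rank passed along by incoming links.

    Each source is (source_rank, source_outlink_count); under the textbook
    formula one link passes D * source_rank / source_outlink_count.
    """
    return sum(D * rank / outlinks for rank, outlinks in sources)

# Hypothetical: 6 links from established pages (rank 0.01, 20 outlinks each).
strong = link_value([(0.01, 20)] * 6)
# Hypothetical: 12 links from obscure pages (rank 0.0001, 50 outlinks each).
weak = link_value([(0.0001, 50)] * 12)
print(strong > weak)  # True
```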

  32. What factor was missing in that table? The fact that the site is popular with us, the searchers who view the sites!

  33. Searchers’ Preferences: Low Importance

  34. Why not consider usage data? “We believe the approach which relies heavily on an individual's tastes and preferences [to rank results] just doesn't produce the quality and relevant ranking that our algorithms do.” Amit Singhal, Google Fellow, “This stuff is tough,” 25 February 2010, http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html

  35. Let’s consider Google’s challenge. “Every day Google answers more than one billion questions from people around the globe in 181 countries and 146 languages. 15% of the searches we see every day we’ve never seen before.” Facts about Google and Competition, http://www.google.com/press/competition/howgooglesearchworks.html One Billion Dollars
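Taking the slide's figures at face value, the 15% share is large in absolute terms:

```latex
0.15 \times 1{,}000{,}000{,}000 = 150{,}000{,}000 \ \text{never-before-seen queries per day}
```

One reading, in line with slides 36 to 38, is that for roughly 150 million queries a day there is no earlier click history that could be used to rank the results.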

  36. Google and Usage Data “Peter Norvig confirmed that Google does collect usage data. However, when Google tries new ranking models, Google does not use real usage data to tune their search ranking algorithm.” “How Google Measures Search Quality,” Datawocky, http://tinyurl.com/6mpt4u

  37. Why!?! First: “We have all been trained to trust Google and click on the first result.” Ibid. “College students trust Google; they click on the number one abstract most of the time, even when the abstracts are less relevant.” In Google We Trust: Users’ Decisions on Rank, Position, and Relevance, Laura Granka, Journal of Computer-Mediated Communication

  38. Trusting Google too Much? “Second: For informational queries … if a result on page 4 provides better information than the results on the first three pages, users will not know this result exists! Therefore, usage behavior does not provide the best feedback on the rankings.” “How Google Measures Search Quality,” Datawocky, http://tinyurl.com/6mpt4u But we are the best results!
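The Datawocky argument can be illustrated with a simple position-bias click model, often called the examination hypothesis in the click-modeling literature: a searcher clicks a result only if they look at it and judge it relevant. The examination probabilities and relevance scores below are hypothetical, and the sketch assumes ten results per page, so page 4 begins at rank 31.

```python
def examine_prob(rank: int) -> float:
    """Hypothetical chance a searcher even looks at a result at this rank."""
    return 1.0 / rank if rank <= 10 else 0.002  # almost nobody reaches page 2+

def expected_clicks(results: list[tuple[str, float]],
                    searchers: int) -> dict[str, float]:
    """Examination-hypothesis model: click = examined AND judged relevant.

    results is an ordered list of (url, relevance), relevance in [0, 1].
    """
    return {
        url: searchers * examine_prob(rank) * relevance
        for rank, (url, relevance) in enumerate(results, start=1)
    }

# Hypothetical ranking: a mediocre page at rank 1, an excellent page at rank 31.
ranking = (
    [("mediocre.example", 0.3)]
    + [(f"filler{i}.example", 0.2) for i in range(2, 31)]
    + [("excellent.example", 0.95)]
)
clicks = expected_clicks(ranking, searchers=10_000)
print(round(clicks["mediocre.example"]))   # 3000
print(round(clicks["excellent.example"]))  # 19
```

Read naively, the click counts would "confirm" that the mediocre page belongs on top, which is exactly why click data alone gives poor feedback on the ranking.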

  39. Google Gullibility “Many users are at the search engine's mercy and mainly click the top links, a behavior [called] Google Gullibility. Sadly, while these top links are often not what they really need, users don't know how to do better.” Jakob Nielsen's Alertbox, February 4, 2008, User Skills Improving, But Only Slightly, http://www.useit.com/alertbox/user-skills.html

  40. From Google Gullibility to “Blind Trust” “[S]tudents and faculty appear to be satisfied, especially with Google. [Librarians] stressed that there is blind trust and an increasing reliance on search results, especially on whatever appears on the first couple of screens.” Search engine use behavior of students and faculty: User perceptions and implications for future research, Oya Y. Rieger, First Monday, http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2716/2385

  41. And look at the first three results. “… 100% of participants looked at the top of the page, 85% looked at the bottom listing. Anything below the fold dropped dramatically to 50% at the top and a lowly 20% at the bottom.” Eye Tracking Web Usability Study Reveals the “Golden Triangle,” June 14, 2010, http://tinyurl.com/5tj4mqw

  42. Consider this … “The computer screen is … literally a small thing [that] may display just over 300 words. If this world becomes our reality, we actually are relying on less information, not the more that is available.” “The Google-ization of Knowledge,” Natasja Larson, Laura Servage, and Jim Parsons, Faculty of Education, University of Alberta, http://www.eric.ed.gov/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/28/03/99.pdf

  43. Google doesn’t need to consider … … the popularity of a website with searchers because its algorithm is so up-to-date that Google always returns the best results. Right? RIGHT! RIGHT!

  44. Relevance in Google = Only an Opinion Google’s … “assessments of the ‘value’ of a web page are subjectively-determined [by] formulae to come up with a ranking. PageRanks are opinions. They're professional opinions, but they remain opinions.” “Google Replies to SearchKing Lawsuit,” http://research.yale.edu/lawmeme/, Thursday, January 9, 2003

  45. Google's rankings are protected opinion. “The court simply finds there is no conceivable way to prove that the relative significance assigned to a given Web site is false. Accordingly, the court concludes Google's PageRanks are entitled to full constitutional protection.” “Judge Dismisses Suit Against Google,” http://news.cnet.com/2100-1032_3-1011740.html, May 30, 2003

  46. Evaluating Google’s Opinion: Google returns all sites with the phrase “martin luther king.”
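Phrase matching of this kind can be sketched with a positional index, which records where each word occurs on each page. The pages below are hypothetical. The point of the sketch is that a match only says the words appear together and in order; it says nothing about who published the page or whether it is reliable.

```python
from collections import defaultdict

def build_positional_index(pages: dict[str, str]) -> dict:
    """Map word -> {page name: [positions where the word occurs]}."""
    index: dict = defaultdict(lambda: defaultdict(list))
    for name, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][name].append(pos)
    return index

def phrase_search(index: dict, phrase: str) -> set[str]:
    """Return pages where the phrase's words appear consecutively, in order."""
    words = phrase.lower().split()
    if not words or words[0] not in index:
        return set()
    hits = set()
    for page, starts in index[words[0]].items():
        for start in starts:
            # Each following word must sit exactly i positions after the first.
            if all(start + i in index.get(word, {}).get(page, [])
                   for i, word in enumerate(words[1:], start=1)):
                hits.add(page)
                break
    return hits

# Hypothetical pages: the index sees only words and positions, not publishers.
PAGES = {
    "encyclopedia_entry": "martin luther king jr led the montgomery bus boycott",
    "unknown_publisher": "a page about martin luther king",
    "street_map": "king martin and luther street map",
}
index = build_positional_index(PAGES)
print(sorted(phrase_search(index, "martin luther king")))
# ['encyclopedia_entry', 'unknown_publisher']
```

Both matching pages qualify equally; deciding which one deserves a top position is left to the ranking signals, not to any understanding of the subject.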

  47. Google’s 4th Result as of 11-10-2011

  48. Martin Luther King.org Homepage

  49. Martin Luther King.org is hosted by …

  50. The student wants to know … Why was that site returned as the 4th result among the millions of results!?! I thought Google and other search engines always returned the best results.
