1 / 52

Creating a Path through the Labyrinth

Creating a Path through the Labyrinth. Using Sitemaps to Enhance Discoverability. Sandra McIntyre, Mountain West Digital Library Anne Morrow, University of Utah Patrick OBrien, University of Utah (volunteer) Lisa Chaufty, University of Utah. The Importance of Discoverability.

seth-barr
Download Presentation

Creating a Path through the Labyrinth

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Creating a Path through the Labyrinth Using Sitemaps to Enhance Discoverability Sandra McIntyre, Mountain West Digital LibraryAnne Morrow, University of UtahPatrick OBrien, University of Utah (volunteer)Lisa Chaufty, University of Utah Western CONTENTdm Users Group

  2. The Importance of Discoverability

  3. Why is this a priority? • For our digital collections • The J. Willard Marriott Library hosts over 100 outstanding digital collections, containing over 1 million digital items. • For USpace, one of our large collections • Our mission is to collect, preserve, and provide access to the intellectual capital of the University of Utah, to reflect the University’s excellence, and to share that work with others.

  4. Sharing • Share what? • Unique materials, such as theses and dissertations • Above all, the work of our faculty • Q: Why do faculty submit to institutional repositories? • A: One primary motivator for faculty IR contribution is to make sure other scholars can find and cite their work. • A: Faculty will access and contribute to an IR if they see significant input activity. • How are people going to find what we want to share?

  5. De Rosa, C., et al. Perception of Libraries and Information Resources. OCLC Membership Report, 1-17. http://www.oclc.org/reports/pdfs/Percept_all.pdf

  6. De Rosa, C., et al. Perception of Libraries and Information Resources. OCLC Membership Report, 1-18. http://www.oclc.org/reports/pdfs/Percept_all.pdf

  7. In the beginning…

  8. Today…

  9. User discovery • Library Website • OAI harvesters (e.g., Mountain West Digital Library) • WorldCat • Search engines • Google • Google Scholar • Google Images • Yahoo • Bing • Baidu • And more…

  10. Focus on search engine discoverability • Googlebots • Crawls website and harvests links • Used by Google to build search index • We want to lend the bots a hand • What can we do to improve the bot-crawling of the digital library? • Understanding the relationship between Googlebots and CONTENTdm

  11. Cross-departmental collaboration • Search Engine Optimization (SEO) A-Team • Collection Managers • Lisa Chaufty, Coordinator of the Institutional Repository • Mountain West Digital Library • Sandra McIntyre, MWDL’s Program Director • IT Division • Kenning Arlitsch, Associate Director for IT Services • Systems Development • Application Development • Anne Morrow, Digital Initiatives Librarian • Search Engine Marketing (SEM) expertise • Patrick OBrien, Volunteer Advisor

  12. CONTENTdm and Google

  13. Google and CONTENTdm’s OAI • Big change in mid-2008:Google announced it would no longer crawl Open Archives Initiative (OAI) streams • Before then, Google would crawl OAI metadata for digital collections and index it • Many digital collections have been slowly “disappearing” from Google since then

  14. Index and store data Web Crawlers – how they work

  15. Dynamic pages • CONTENTdm constructs pages in HTML on the fly • Header • Record retrieved from database and formatted • Footer

  16. Dynamic page • Have to tell crawler how to assemble it (with URL)

  17. CONTENTdm: More challenges for crawlers • Compound objects use frameset that makes it harder to get to the content of the page • Table layout, Javascript, and CSS clutter up the code and are not used for structure of text, just for styling and management • Except for splash page, there is no hierarchy of linking that gives any context – “islands” of separate pages • Usable content is often far down in the code on the delivered page

  18. In the works • OCLC is working with Google and others to enhance visibility of resources in WorldCat • Expect more of a linkage between WorldCat and search engines

  19. Google Sitemaps

  20. Google Sitemaps • Instructions to Google’s web crawler: Crawl these URLs to get my content • One Sitemap for each collection • Not the same as a “site map” (contents page for website) “Here is a list of the URLs of the dynamic pages that I want you to crawl, one for each item.”

  21. Google Sitemap – example http://content.lib.utah.edu/sitemaps/sitemap_ir-main-001.xml

  22. Sitemap Index • XML file listing all the Sitemaps on your server “Here is a list of all the Sitemap files.”

  23. Sitemap Index - example http://content.lib.utah.edu/cdm4/autositemap/sitemapindex.xml

  24. Implementing Google Sitemaps • Create Sitemaps, one for each collection, and Sitemap Index. • Register with Google Webmaster Tools. • Inform Google about the location of your Sitemap Index. • In Webmaster Tools • In the robots.txt file on the server • Monitor crawler results.

  25. Step 1: Create Sitemaps and Index • According to the protocol at http://www.sitemaps.org: • Create a Sitemap file for each collection. • Create a Sitemap Index file. • See Terry Reese’s “makemap” script athttp://digitalcollections.library.oregonstate.edu/php/makemap.txt

  26. Step 2: Webmaster Tools Registration • Register (free) with Google Webmaster Tools at http://www.google.com/webmasters/tools • You will need a Google account • Follow the directionsto prove your controlover the site by adding a <meta> tag to the home page

  27. Step 2: Webmaster Tools Registration

  28. Step 3: Inform Google • Step 3A: Submit the address of Sitemap Index file on Webmaster Tools.

  29. Step 3: Inform Google • Step 3B: Modify the robots.txt file at the root of your CONTENTdm server to specify the location of the Sitemaps Index.

  30. Step 4: Monitor crawler results • Monitor crawler results on Webmaster Tools. • Top search queries • Links to your site • Keywords • Internal links • Crawl errors • Crawl stats • HTML suggestions

  31. Using and Maintaining Sitemaps • Re-generating (updating) Sitemaps and Sitemap Index frequently • Checking the crawler stats in Google Webmaster Tools and initiating changes as needed • Noticing the impact in Google searches

  32. Initial Progress

  33. Initial approach to SEO • Generated Sitemaps and Sitemap Index file: Applied a variation of Terry Reese’s makemap.php script • Edited robots.txt file • Registered with Webmaster Tools • Started observing crawler statistics

  34. The Next Phase • Enlisted expertise of Patrick OBrien • Sitemaps • Diagnose crawl error reports • Make recommendations • Proposing strategy for the future of SEO

  35. Short-term Opportunities

  36. Know your customers and what they value. Faculty • Publication Page Views • Publication Downloads • Requests for Information • Publication Citations Value Value High High Collection Donors Digital Collection Pages Indexed Digital Collection Page Views Digital Collection Visitors Requests for More Info Physical Collection Visitors Reproductions Ordered

  37. Why can’t the public find our content? Public CONTENTdm ? ? ? ? What do they value? • Are you worthy enough for their customer (i.e Index)? • How much will their customer value the introduction (i.e, Visibility)?

  38. Are you worthy enough for their customer? Can they trust you with their customer? Is your content worth an investment of their resources?

  39. Check the Crawl Errors Page Forbidden (401 errors) User Not Authorized (403 errors) Network Unreachable (5xx errors) Page Not Found (404 errors)

  40. Address errors and don’t leave users stranded! Low Trust Example 403 Error

  41. Eliminate sitemap and robots.txt conflicts Robots.txt Sitemap User-agent: * Disallow: /dmscripts/ Disallow: /cdm4/admin/ Disallow: /cdm4/client/ Disallow: /cdm4/cqr/ Disallow: /cdm4/images/ Disallow: /cdm4/includes/ Disallow: /cdm4/jscripts/ Disallow: /cdm-diagnostics/ Disallow: /cgi-bin/ Disallow: /images/ Disallow: /u/ User-agent: * Disallow: /dmscripts/ Disallow: /cdm4/admin/ Disallow: /cdm4/client/ Disallow: /cdm4/cqr/ Disallow: /cdm4/images/ Disallow: /cdm4/includes/ Disallow: /cdm4/jscripts/ Disallow: /cdm-diagnostics/ Disallow: /cgi-bin/ Disallow: /images/ Disallow: /u/ http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith

  42. Provide sitemaps linking context with simple URLs http://content.lib.utah.edu/ http://content.lib.utah.edu/cdm4/az.php#D http://content.lib.utah.edu/cdm4/az_details.php?id=44 http://content.lib.utah.edu/cdm4/browse.php?CISOROOT=/DardHunter http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919

  43. Increase Page Crawl efficiency A Papermaking Pilgrimage to Japan, Korea and China

  44. Are you worthy enough for their customer? • Can they trust you with their customer? • Check the Crawl Errors in Google Webmaster • Address errors and don’t leave their customers stranded! • Is your content worth an investment of their resources? • Eliminate sitemap & robots.txt conflicts • Provide sitemaps linking context with simple URLs • Increase Page Crawl efficiency

  45. How much will their customer value the introduction (i.e., Visibility)? Is your content relevant? Is your content credible? Is your content accessible?

  46. Summary

  47. Recommendations for CONTENTdm managers • Assemble the right players: your SEO team • Set your priorities • Create linking strategy from home page, to index of collections, to collection “splash” page, and to item pages • Create Sitemaps and Sitemap Index file • Set up a regular process to update Sitemaps • Eliminate any conflicts between robots.txt and Sitemaps • Get involved with CONTENTdm’s new Web templates • Set up to monitor results!

More Related