Navigating Digital Labyrinths: Enhancing Discoverability with Sitemaps

Creating a Path through the Labyrinth Using Sitemaps to Enhance Discoverability Sandra McIntyre, Mountain West Digital LibraryAnne Morrow, University of UtahPatrick OBrien, University of Utah (volunteer)Lisa Chaufty, University of Utah Western CONTENTdm Users Group

The Importance of Discoverability

Why is this a priority? • For our digital collections • The J. Willard Marriott Library hosts over 100 outstanding digital collections, containing over 1 million digital items. • For USpace, one of our large collections • Our mission is to collect, preserve, and provide access to the intellectual capital of the University of Utah, to reflect the University’s excellence, and to share that work with others.

Sharing • Share what? • Unique materials, such as theses and dissertations • Above all, the work of our faculty • Q: Why do faculty submit to institutional repositories? • A: One primary motivator for faculty IR contribution is to make sure other scholars can find and cite their work. • A: Faculty will access and contribute to an IR if they see significant input activity. • How are people going to find what we want to share?

De Rosa, C., et al. Perception of Libraries and Information Resources. OCLC Membership Report, 1-17. http://www.oclc.org/reports/pdfs/Percept_all.pdf

De Rosa, C., et al. Perception of Libraries and Information Resources. OCLC Membership Report, 1-18. http://www.oclc.org/reports/pdfs/Percept_all.pdf

In the beginning…

Today…

User discovery • Library Website • OAI harvesters (e.g., Mountain West Digital Library) • WorldCat • Search engines • Google • Google Scholar • Google Images • Yahoo • Bing • Baidu • And more…

Focus on search engine discoverability • Googlebots • Crawls website and harvests links • Used by Google to build search index • We want to lend the bots a hand • What can we do to improve the bot-crawling of the digital library? • Understanding the relationship between Googlebots and CONTENTdm

Cross-departmental collaboration • Search Engine Optimization (SEO) A-Team • Collection Managers • Lisa Chaufty, Coordinator of the Institutional Repository • Mountain West Digital Library • Sandra McIntyre, MWDL’s Program Director • IT Division • Kenning Arlitsch, Associate Director for IT Services • Systems Development • Application Development • Anne Morrow, Digital Initiatives Librarian • Search Engine Marketing (SEM) expertise • Patrick OBrien, Volunteer Advisor

CONTENTdm and Google

Google and CONTENTdm’s OAI • Big change in mid-2008:Google announced it would no longer crawl Open Archives Initiative (OAI) streams • Before then, Google would crawl OAI metadata for digital collections and index it • Many digital collections have been slowly “disappearing” from Google since then

Index and store data Web Crawlers – how they work

Dynamic pages • CONTENTdm constructs pages in HTML on the fly • Header • Record retrieved from database and formatted • Footer

Dynamic page • Have to tell crawler how to assemble it (with URL)

CONTENTdm: More challenges for crawlers • Compound objects use frameset that makes it harder to get to the content of the page • Table layout, Javascript, and CSS clutter up the code and are not used for structure of text, just for styling and management • Except for splash page, there is no hierarchy of linking that gives any context – “islands” of separate pages • Usable content is often far down in the code on the delivered page

In the works • OCLC is working with Google and others to enhance visibility of resources in WorldCat • Expect more of a linkage between WorldCat and search engines

Google Sitemaps

Google Sitemaps • Instructions to Google’s web crawler: Crawl these URLs to get my content • One Sitemap for each collection • Not the same as a “site map” (contents page for website) “Here is a list of the URLs of the dynamic pages that I want you to crawl, one for each item.”

Google Sitemap – example http://content.lib.utah.edu/sitemaps/sitemap_ir-main-001.xml

Sitemap Index • XML file listing all the Sitemaps on your server “Here is a list of all the Sitemap files.”

Sitemap Index - example http://content.lib.utah.edu/cdm4/autositemap/sitemapindex.xml

Implementing Google Sitemaps • Create Sitemaps, one for each collection, and Sitemap Index. • Register with Google Webmaster Tools. • Inform Google about the location of your Sitemap Index. • In Webmaster Tools • In the robots.txt file on the server • Monitor crawler results.

Step 1: Create Sitemaps and Index • According to the protocol at http://www.sitemaps.org: • Create a Sitemap file for each collection. • Create a Sitemap Index file. • See Terry Reese’s “makemap” script athttp://digitalcollections.library.oregonstate.edu/php/makemap.txt

Step 2: Webmaster Tools Registration • Register (free) with Google Webmaster Tools at http://www.google.com/webmasters/tools • You will need a Google account • Follow the directionsto prove your controlover the site by adding a <meta> tag to the home page

Step 2: Webmaster Tools Registration

Step 3: Inform Google • Step 3A: Submit the address of Sitemap Index file on Webmaster Tools.

Step 3: Inform Google • Step 3B: Modify the robots.txt file at the root of your CONTENTdm server to specify the location of the Sitemaps Index.

Step 4: Monitor crawler results • Monitor crawler results on Webmaster Tools. • Top search queries • Links to your site • Keywords • Internal links • Crawl errors • Crawl stats • HTML suggestions

Using and Maintaining Sitemaps • Re-generating (updating) Sitemaps and Sitemap Index frequently • Checking the crawler stats in Google Webmaster Tools and initiating changes as needed • Noticing the impact in Google searches

Initial Progress

Initial approach to SEO • Generated Sitemaps and Sitemap Index file: Applied a variation of Terry Reese’s makemap.php script • Edited robots.txt file • Registered with Webmaster Tools • Started observing crawler statistics

The Next Phase • Enlisted expertise of Patrick OBrien • Sitemaps • Diagnose crawl error reports • Make recommendations • Proposing strategy for the future of SEO

Short-term Opportunities

Know your customers and what they value. Faculty • Publication Page Views • Publication Downloads • Requests for Information • Publication Citations Value Value High High Collection Donors Digital Collection Pages Indexed Digital Collection Page Views Digital Collection Visitors Requests for More Info Physical Collection Visitors Reproductions Ordered

Why can’t the public find our content? Public CONTENTdm ? ? ? ? What do they value? • Are you worthy enough for their customer (i.e Index)? • How much will their customer value the introduction (i.e, Visibility)?

Are you worthy enough for their customer? Can they trust you with their customer? Is your content worth an investment of their resources?

Check the Crawl Errors Page Forbidden (401 errors) User Not Authorized (403 errors) Network Unreachable (5xx errors) Page Not Found (404 errors)

Address errors and don’t leave users stranded! Low Trust Example 403 Error

Eliminate sitemap and robots.txt conflicts Robots.txt Sitemap User-agent: * Disallow: /dmscripts/ Disallow: /cdm4/admin/ Disallow: /cdm4/client/ Disallow: /cdm4/cqr/ Disallow: /cdm4/images/ Disallow: /cdm4/includes/ Disallow: /cdm4/jscripts/ Disallow: /cdm-diagnostics/ Disallow: /cgi-bin/ Disallow: /images/ Disallow: /u/ User-agent: * Disallow: /dmscripts/ Disallow: /cdm4/admin/ Disallow: /cdm4/client/ Disallow: /cdm4/cqr/ Disallow: /cdm4/images/ Disallow: /cdm4/includes/ Disallow: /cdm4/jscripts/ Disallow: /cdm-diagnostics/ Disallow: /cgi-bin/ Disallow: /images/ Disallow: /u/ http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith http://content.lib.utah.edu/cgi-bin/browseresults.exe?CISOROOT=/DC_Beckwith

Provide sitemaps linking context with simple URLs http://content.lib.utah.edu/ http://content.lib.utah.edu/cdm4/az.php#D http://content.lib.utah.edu/cdm4/az_details.php?id=44 http://content.lib.utah.edu/cdm4/browse.php?CISOROOT=/DardHunter http://content.lib.utah.edu/cdm4/document.php?CISOROOT=/DardHunter&CISOPTR=1919

Increase Page Crawl efficiency A Papermaking Pilgrimage to Japan, Korea and China

Are you worthy enough for their customer? • Can they trust you with their customer? • Check the Crawl Errors in Google Webmaster • Address errors and don’t leave their customers stranded! • Is your content worth an investment of their resources? • Eliminate sitemap & robots.txt conflicts • Provide sitemaps linking context with simple URLs • Increase Page Crawl efficiency

How much will their customer value the introduction (i.e., Visibility)? Is your content relevant? Is your content credible? Is your content accessible?

Summary

Recommendations for CONTENTdm managers • Assemble the right players: your SEO team • Set your priorities • Create linking strategy from home page, to index of collections, to collection “splash” page, and to item pages • Create Sitemaps and Sitemap Index file • Set up a regular process to update Sitemaps • Eliminate any conflicts between robots.txt and Sitemaps • Get involved with CONTENTdm’s new Web templates • Set up to monitor results!

Navigating Digital Labyrinths: Enhancing Discoverability with Sitemaps

Navigating Digital Labyrinths: Enhancing Discoverability with Sitemaps

Presentation Transcript

Through the Labyrinth: How Women Become Leaders

Creating a Character Through Dialogue

Creating a Chartres Labyrinth

Creating a 7-Circuit Left-Handed Cretan Labyrinth

Escaping the Labyrinth

Skills Sheet: ‘Creating a path’

LABYRINTH

A PILGRIM’S PATH THROUGH THE TEMPLE

Labyrinth

Navigating the Labyrinth

A String in the Labyrinth

The Path to Creating a World-Class Organization

The vestibular labyrinth

Labyrinth

Creating a difference through partnerships

Labyrinth

THE GAME « A MYSTERY LABYRINTH »

The Building of A Labyrinth

Labyrinth

Labyrinth

Escaping the Labyrinth

Iamnobody89757- A Journey Through the Labyrinth of Digital Identity