How many PURLs would an URL Checker check…

Millennium URL Checker in the Real World. Mary M. Strouse, Catholic University of America. ILUG 2004.

Presentation Transcript


  1. How many PURLs would an URL Checker check… Millennium URL Checker in the Real World. Mary M. Strouse, Catholic University of America. ILUG 2004.

  2. Evolution • 1994: Field 856 |u designated for URLs • 1998: MARCxGen debuts • 2000: URLVerify (telnet version, Rel. 2000) • 2002: |u added to other MARC fields • 2003: Millennium URL Checker (Rel. 2002 ph. 2) M. Strouse ILUG 2004

  3. URLVerify Web Report: http://[catalog]/screens/urlverify.html

  4. Millennium URL Checker Report: summary of error types (uncheck to hide). Integrated with MilCat.

  5. Sort by column headers. Resize columns (no truncation).

  6. Highlight a row and click “Edit” to open record. Access public view. Click edit to get MARC record.

  7. Clicking “GO” opens URL in a browser window (for rechecking).

  8. Locating Missing Links

  9. Automatic Substitution of New URL: check boxes to select, then click preview tab.

  10. Uncheck any errors, click “process”. Summary screen.

  11. Correcting URL Directly in Report: 1) Type in “New URL” 2) Check replace box 3) Preview & process

  12. Copying Old URL to Edit Window • Check replace box (must do first) • Select Old URL - New URL • Edit in new URL window • Preview & Process

  13. Find and Replace (New URL)

  14. Interactive Reports: toggle between most recent Automatic and Interactive reports. Create new interactive report.

  15. Interactive report can run against entire database, a review file, an index range, or a keyword search.

  16. Monday Morning Recheck

  17. Can’t minimize or work with desktop while report is running.

  18. Error Types

  19. Malformed URL (-2): htp://app.comm.uscourts.gov
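The malformed-URL case on this slide (a mistyped `htp` scheme) can be illustrated with a short check. This is only a sketch of the idea, not how URL Checker itself decides: the accepted scheme list and the function name `is_malformed` are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Assumption: the schemes a checker would accept; the slides do not list them.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_malformed(url: str) -> bool:
    """Return True when the URL has an unrecognized scheme or no host part."""
    parts = urlsplit(url.strip())
    return parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc


# The slide's example has "htp" where "http" was intended.
print(is_malformed("htp://app.comm.uscourts.gov"))   # True
print(is_malformed("http://app.comm.uscourts.gov"))  # False
```

A check like this needs no network access, so it could run at cataloging time, before any link checker sees the record.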

  20. New error type in Phase 3 (Millennium report only): Network is unreachable (-7)

  21. http://public.afca.scott.af.mil/public….

  22. PURLs and Other Redirects: every server redirection reported as an error.

  23. Redirection can be a sign a resource has moved, and maintenance is warranted.

  24. Missing slash after directory name reported as permanent redirect (301). Edit to eliminate from future reports.

  25. Server-side redirect to add timestamp: http://library.nps/navy.mil/uhtbin.cgisirsi/Sun+Apr+20+22:28:15+PDT+2003/0/520/nss.pdf

  26. All PURLs are identified as redirects, not checked further. True also of 3rd-party link checkers (except Xenu).

  27. I-Hate-PURLs Workflow: use automatic substitution to replace PURL with (current) underlying URL. Replace box can’t be batch-selected.

  28. Beware the “Leaving GPO” Message

  29. URL Checker reports entire frwebgate “wrapper” as the new URL: http://frwebgate.access.gpo.gov/cgi-bin/leaving.cgi?from=exitpurl.html&to=http%3A//www.uscourts.gov/ttb/index.html
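The wrapper URL on this slide carries the real destination, percent-encoded, in its `to` query parameter. A minimal sketch of recovering it before substituting a new URL into a record might look like the following; the function name `unwrap_leaving_gpo` is hypothetical, and the code assumes the wrapper always has the shape shown on the slide.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_leaving_gpo(url: str) -> str:
    """Return the destination inside a leaving.cgi wrapper, else the URL unchanged."""
    parts = urlsplit(url)
    if parts.path.endswith("/leaving.cgi"):
        # parse_qs percent-decodes values, so "http%3A//" becomes "http://".
        target = parse_qs(parts.query).get("to")
        if target:
            return target[0]
    return url


wrapped = ("http://frwebgate.access.gpo.gov/cgi-bin/leaving.cgi"
           "?from=exitpurl.html&to=http%3A//www.uscourts.gov/ttb/index.html")
print(unwrap_leaving_gpo(wrapped))
```

URLs that are not wrappers pass through untouched, so the function could be applied to a whole "New URL" column without special-casing.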

  30. Library-editable URLBlock File. Not a substitute for honoring “no robots” conventions!

  31. Block can be a full URL, domain name or text string. III-specified blocks for major aggregators. PURL.ACCESS.GPO.GOV
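Since a block entry can be a full URL, a domain name, or any text string, one plausible reading is that an entry blocks every URL that contains it. The sketch below assumes case-insensitive substring matching; the slides do not document the actual matching rules, and both `is_blocked` and the sample PURL path are hypothetical.

```python
def is_blocked(url: str, blocks: list[str]) -> bool:
    """Return True if any non-empty block entry occurs in the URL (case-insensitive)."""
    lowered = url.lower()
    return any(entry.strip().lower() in lowered for entry in blocks if entry.strip())


# The entry shown on the slide; the URL paths below are made up for illustration.
blocks = ["PURL.ACCESS.GPO.GOV"]
print(is_blocked("http://purl.access.gpo.gov/GPO/example", blocks))   # True
print(is_blocked("http://www.uscourts.gov/ttb/index.html", blocks))   # False
```

Under this reading, removing the `PURL.ACCESS.GPO.GOV` entry is all it takes to bring GPO PURLs back into the report.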

  32. Trust-the-Government Workflow: 1. Unblock GPO PURLs and run interactive report monthly (e.g., after Marcive load)

  33. 2. Exclude working redirects, troubleshoot others. Must load entire report before excluding redirects (slow).

  34. WAIS Database searches reported as timeout errors (-6).

  35. WAM Proxy Rewrite URLs Not Checked: Host Unreachable (-5). 3rd-party link checkers report all proxy-rewrite URLs OK even if nonexistent.

  36. Fool-the-System Workflow: 856 41|u http://heinonline.org/HeinOnline/CollectionIndex.pl?journal-cjtl |z<A href="http://0-heinonline.org.columbo.law.cua.edu/HeinOnline/CollectionIndex.pl?journal=cjtl"> View via Hein Online </A> Underlying URL in |u; PURL or proxy-rewrite URL within anchor tag in |z.

  37. “Multi-threading” Rate • The number of simultaneous “calls” sent to servers at a given time: URL Checker > 100; 3rd-party link checkers: 20-30 (often user-configurable) • At issue when many resources concentrated on a few servers • URL Checker activity may be perceived as an “attack”
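To make the "multi-threading rate" concrete, here is a minimal sketch of a checker whose number of simultaneous calls is capped by a configurable worker count. It does not describe URL Checker's internals: `check_all`, the default of 20 workers, and the stub `PeakCounter` (which stands in for a real HTTP request so the example runs offline) are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def check_all(urls, check, max_workers=20):
    """Run check(url) for every URL with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(check, urls)))


class PeakCounter:
    """Stub checker that records the highest number of simultaneous calls."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __call__(self, url):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for network latency
        with self._lock:
            self.active -= 1
        return "ok"


counter = PeakCounter()
urls = [f"http://example.org/{i}" for i in range(50)]  # placeholder URLs
results = check_all(urls, counter, max_workers=5)
print(len(results), "checked; peak simultaneous calls:", counter.peak)
```

A global cap like this still lets every worker hit the same server at once; when many resources sit on a few hosts, a per-host limit would be the gentler design.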

  38. Summary: What URL Checker Checks • URLs in subfield u of 856 fields in Bib. Records (but not URLs in other subfields) • URLs in 956 fields in electronic reserves (Millennium Media) records

  39. And What it Doesn’t… • URLs or domains in the URLBlock file (aggregators, etc.) • PURLs and other redirects • Proxy-rewrite URLs in WAM • Electronic journal issue URLs in checkin boxes • URLs in bibliographic record notes

  40. Suggestions for Further Development – Reports & Editing • Pre-configure large interactive reports (faster loading) • Allow minimization during report prep • Bypass summary of attached items • Improve copy & paste, batch select & replace • Interactive checking of “New URL” column

  41. Suggestions for Further Development – Functionality • Follow redirects to final destination • Honor page-level and server-level robot exclusions, and report with a unique status code • Customize multi-threading rate • Output report in CSV (comma-delimited) format
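The first suggestion, following redirects to the final destination, can be sketched as a loop that keeps requesting the `Location` target until a non-redirect status comes back. This is an illustration of the requested behavior, not an existing URL Checker feature. The `fetch` callable is an assumed interface (it returns a status code and an optional `Location` value), and a lookup table replaces real HTTP requests so the example runs offline.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Assumed interface: fetch(url) -> (HTTP status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def resolve_final(url: str, fetch: Fetch, max_hops: int = 10) -> Tuple[str, int]:
    """Follow redirects and return (final URL, final status)."""
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
        status, location = fetch(url)
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)  # Location may be relative
            continue
        return url, status
    raise ValueError("too many redirects")


# Stub responses: a directory URL missing its trailing slash answers with a 301.
table = {
    "http://example.org/docs": (301, "/docs/"),
    "http://example.org/docs/": (200, None),
}
print(resolve_final("http://example.org/docs", lambda u: table[u]))
```

With the final URL and its status in hand, a report could distinguish a harmless missing-slash redirect or a working PURL from a redirect chain that ends in an error.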

  42. URL Checker Documentation • Millennium Manual (Rls. 2003) • Permissions (#105370) • Reports (#105371) • Edit/Replace capability (#105372) • URLBlock (#105373)

  43. URLVerify Documentation • Innopac manual, pages 102151-102153 • Maintaining Hyperlinks in the WebPac: Tools and Tradeoffs (IUG 8, May 2000) http://www.du.edu/~ttyler/iug2000/ctw/index.html • Tom Tyler’s freeware http://www.du.edu/~ttyler/freeware/

  44. URL Display WWWOptions • DISPLAY_856 – Defines the order and placement of subfields that form the hypertext link in an OPAC display (default is |z then |u). Multiple subfields (including access and usage notes) display as a single underlined link. Enhancement request: separate WWWoptions to control display of link and notes.

  45. URL Display WWWOptions • LINK856TEXT – Defines the phrase that appears above the hypertext link in a full display (default is “Click here to:”) • ICON_856LINK – Controls display of 856 link in a brief display (Manual #102168)

  46. Contact: Mary M. Strouse, DuFour Law Library, Catholic University of America, strouse@law.cua.edu. Thank you!
