A Web Services Approach for Search and Retrieve: The Next Generation Z39.50

TLA NetFair, April 7, 2005, Austin, TX. A Web Services Approach for Search and Retrieve: The Next Generation Z39.50. William E. Moen <wemoen@unt.edu>, School of Library and Information Sciences, Texas Center for Digital Knowledge, University of North Texas, Denton, TX 76203.


Presentation Transcript


  1. TLA NetFair, April 7, 2005, Austin, TX
     A Web Services Approach for Search and Retrieve: The Next Generation Z39.50
     William E. Moen <wemoen@unt.edu>
     School of Library and Information Sciences
     Texas Center for Digital Knowledge
     University of North Texas
     Denton, TX 76203

  2. Overview
     • Quick description of SRW
     • Brief background – historical, political, conceptual
     • Non-technical (almost) introduction to SRW
     • Common Query Language (CQL), briefly
     • Concluding thoughts
     NetFair -- Texas Library Association -- April 2005

  3. What is SRW/U?
     • An XML-based protocol for searching, retrieving, and other information retrieval transactions
     • Cast in the standards/technologies for web services
        • XML
        • SOAP
        • HTTP (POST and GET)
     • Brings the concepts and experience of Z39.50 into the web environment using web technologies

  4. Why SRW/U?
     • Genesis: several years of soul-searching by Z39.50 developers and implementors
     • The “web” had become the common implementation environment
     • Z39.50 was not perceived as web-friendly
     • What was needed:
        • Simpler
        • More comprehensible
        • More easily implemented
        • Web-compatible
        • Retain the intellectual contribution of Z39.50

  5. Taking action: June 2001
     • Invitational meeting to discuss moving Z39.50 to an XML-based protocol
     • Goal
        • Lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50 and discarding those aspects no longer useful or meaningful
     • Objective
        • Define specifications for a new web service based on Z39.50 together with web technologies
        • Separate the Z39.50 abstract model and its associated semantics from its specific encoding and wire protocol (i.e., ASN.1/BER and TCP/IP)
     • Initially called Z39.50 Next Generation (ZNG)
        • Intended as a proof of concept
        • Defining only those protocol specifications that participants would actually implement

  6. ZING – Z39.50 International Next Generation
     • Make the intellectual/semantic content of Z39.50 more broadly available
     • Several ZING initiatives: ZOOM, ez39.50, ZeeRex, SRW/U
     • Make Z39.50 more attractive by lowering barriers to implementation
        • Use of XML to represent and encode data
        • Use of HTTP for transport
        • Use of SOAP for interaction between client and server, based on Remote Procedure Call (RPC)
     FOR MORE INFORMATION, VISIT THE ZING WEBSITE… http://www.loc.gov/z3950/agency/zing/

  7. SRW/U, SRW, SRU
     • SRW/U: Search and Retrieve for the Web
        • General designation for this initiative
     • SRW: Search and Retrieve Web Service
        • HTTP POST
        • Simple Object Access Protocol (SOAP)
        • XML messages
     • SRU: Search and Retrieve URL Service
        • HTTP GET
        • Request parameters included in URL syntax
     • Development
        • Version 1.0: November 2001
        • Version 1.1: February 2002
        • Registered with NISO in fall 2004
     FOR MORE INFORMATION, VISIT THE SRW WEBSITE… http://www.loc.gov/srw

  8. Networked information retrieval
     • What’s needed:
        • Identifying a target to search
        • A vocabulary for expressing search requests, search criteria, retrieval requests, etc.
        • Methods to encode requests to, and responses from, the target
        • Methods to transport the requests and responses across a network
     • In other words, a protocol and supporting specifications

  9. Abstract Model of IR

  10. Abstract model of Z39.50

  11. Z39.50 classic & SRW

  12. SRW Overview
     • Builds on Z39.50 concepts and web technologies
     • Web technologies: XML, SOAP, HTTP
     • Uses a new, human-readable query language
     • Combines several Z39.50 features into three “operation types”
        • searchRetrieve operation
        • scan operation
        • explain operation

  13. searchRetrieve operation
     • The core of the protocol
     • Expresses the search and additional criteria
     • Records are returned in XML
     • Request parameters
        • version
        • query
     • Optional request parameters
        • sortKeys
        • recordPacking
        • recordSchema
        • recordXPath
        • stylesheet
     • Response parameters
        • version
        • numberOfRecords
     • Optional response parameters
        • resultSetId
        • resultSetIdleTime
        • records
        • diagnostics
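The parameter list above maps naturally onto a record type. The Python sketch below is a hypothetical client-side helper, not part of any SRW toolkit: it models only the request parameters named on this slide, keeps the wire names (camelCase) as field names so they can be sent unchanged, checks the two required parameters, and drops optional ones that were never set. The accepted `recordPacking` values (`xml`, `string`) are an assumption based on SRW 1.1.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class SearchRetrieveRequest:
    """Hypothetical model of the searchRetrieve request parameters on this slide."""

    # Required parameters
    version: str
    query: str
    # Optional parameters; None means "do not send"
    sortKeys: Optional[str] = None
    recordPacking: Optional[str] = None
    recordSchema: Optional[str] = None
    recordXPath: Optional[str] = None
    stylesheet: Optional[str] = None

    def __post_init__(self) -> None:
        if not self.version or not self.query:
            raise ValueError("version and query are required")
        # Assumed SRW 1.1 values: records packed as XML or as an escaped string
        if self.recordPacking not in (None, "xml", "string"):
            raise ValueError(f"unsupported recordPacking: {self.recordPacking!r}")

    def to_params(self) -> Dict[str, str]:
        """Return only the parameters that were actually set, keyed by wire name."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }
```

For example, `SearchRetrieveRequest(version="1.1", query="dc.title computer", recordSchema="dc").to_params()` yields just those three parameters, ready to be serialized as XML for SRW or as a query string for SRU.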

  14. SRW & XML
     • XML as foundation for protocol
     • Provides syntax for intelligent markup
     • Defines or references XML schemas
        • searchRetrieveRequest
        • searchRetrieveResponse

  15. searchRetrieveRequest example
     • Sent as an HTTP POST
     • XML document is sent to the server
     • Using SOAP to wrap the request

        <searchRetrieveRequest>
          <version>1.1</version>
          <query>dc.title all "Squirrel Hungry"</query>
          <maximumRecords>1</maximumRecords>
          <startRecord>1</startRecord>
          <recordSchema>dc</recordSchema>
        </searchRetrieveRequest>
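Before POSTing, an SRW client wraps a request like the one above in a SOAP envelope. The sketch below builds that envelope with Python's standard `xml.etree.ElementTree`. The SOAP 1.1 envelope namespace is standard; the SRW namespace URI `http://www.loc.gov/zing/srw/` and the `build_request` helper are assumptions for illustration, so check the SRW schemas before relying on them.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRW 1.1 namespace

ET.register_namespace("SOAP", SOAP_NS)
ET.register_namespace("srw", SRW_NS)


def build_request(query: str, start_record: int = 1, maximum_records: int = 1,
                  record_schema: str = "dc", version: str = "1.1") -> bytes:
    """Hypothetical helper: serialize a searchRetrieveRequest inside a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # One child element per protocol parameter, in the order listed here
    for name, value in (
        ("version", version),
        ("query", query),
        ("startRecord", start_record),
        ("maximumRecords", maximum_records),
        ("recordSchema", record_schema),
    ):
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = str(value)
    # ElementTree escapes &, < and > in text content during serialization
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

The returned bytes would then form the body of an ordinary HTTP POST to the SRW server; the transport details are outside this sketch.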

  16. searchRetrieveResponse example

        <searchRetrieveResponse>
          <version>1.1</version>
          <numberOfRecords>10</numberOfRecords>
          <records>
            <record>
              <recordSchema>info:srw/schema/1/dc-v1.1</recordSchema>
              <recordData>
                <dc:record>
                  <dc:title>Squirrel is Hungry</dc:title>
                </dc:record>
              </recordData>
            </record>
          </records>
        </searchRetrieveResponse>
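The response above uses a `dc:` prefix without declaring it, so as printed it is not namespace-well-formed XML. The Python sketch below adds namespace declarations (the SRW namespace URI, and binding `dc` to the Dublin Core elements URI, are assumptions) and extracts the hit count and the titles with the standard `xml.etree.ElementTree`. Note that `numberOfRecords` reports the total number of matching records (10), even though only one record comes back, because the request asked for `maximumRecords` of 1.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumed namespace bindings; the slide omits them
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

SAMPLE_RESPONSE = """<srw:searchRetrieveResponse
    xmlns:srw="http://www.loc.gov/zing/srw/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>10</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordData>
        <dc:record><dc:title>Squirrel is Hungry</dc:title></dc:record>
      </srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def parse_response(xml_text: str) -> Tuple[int, List[str]]:
    """Return (total hits reported by the server, titles of the records actually returned)."""
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    titles = [el.text or "" for el in root.iterfind(".//srw:recordData//dc:title", NS)]
    return total, titles
```

Calling `parse_response(SAMPLE_RESPONSE)` gives the total of 10 together with the single returned title; a client would use the difference to decide whether to request further records with a later `startRecord`.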

  17. searchRetrieve response
     • Records returned in response
     • All records in XML syntax
     • According to one or more XML schemas (semantics)
        • Dublin Core
        • ONIX
        • MODS
        • MARCXML

  18. searchRetrieve example

        <searchRetrieveRequest>
          <version>1.1</version>
          <query>dc.title computer</query>
          <startRecord>1</startRecord>
          <maximumRecords>10</maximumRecords>
          <recordPacking>xml</recordPacking>
          <recordSchema>dc</recordSchema>
        </searchRetrieveRequest>

     • Retrieval results
        • XML view
        • Screen shot with stylesheet applied

  19. SRW results

  20. SRW Model
     [Diagram: a client exchanges SOAP messages over HTTP POST with the server side, where an SRW layer passes requests to a database application (DB App) and its database (DB).]

  21. SRU briefly
     • Protocol requests can be carried via HTTP GET
     • searchRetrieveRequest parameters are expressed in standard URL syntax
        • baseURL and search part separated by a question mark “?”
     • Response is an XML document containing records
     • A searchRetrieveRequest in SRU:
        • http://www.loc.gov/z39voy?operation=searchRetrieve&version=1.1&query=texas&recordSchema=mods&startRecord=1&maximumRecords=1
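Because SRU carries the same parameters in a URL, a client needs little more than URL encoding. This sketch uses only Python's standard `urllib.parse`; the `sru_url` and `sru_params` helpers are hypothetical. Building the slide's own example reproduces its URL exactly, and a multi-word query shows how spaces and quotation marks get percent-encoded.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_url(base_url: str, operation: str = "searchRetrieve", **params: object) -> str:
    """Hypothetical helper: append SRU parameters to a base URL after a '?'."""
    query_string = urlencode({"operation": operation, **params})
    return f"{base_url}?{query_string}"


def sru_params(url: str) -> dict:
    """Decode the parameters of an SRU URL back into a dict of single values."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


# The request shown on the slide
example = sru_url("http://www.loc.gov/z39voy", version="1.1", query="texas",
                  recordSchema="mods", startRecord=1, maximumRecords=1)
```

Keyword arguments keep their order, so the parameters appear in the URL in the order they were passed; a server should not depend on that order, but it makes the output easy to compare with the slide.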

  22. SRU Model
     [Diagram: a client sends an HTTP GET request, for example http://www.loc.gov/z39voy?operation=searchRetrieve&version=1.1&query=texas&recordSchema=mods&startRecord=1&maximumRecords=1, to a web server; on the server side the request passes to an SRW layer, a database application (DB App), and its database (DB).]

  23. searchRetrieve query
     • An SRW query consists of one or more query statements linked by Boolean operators
     • Five categories of query statements:
        • single search clause
        • two or more search clauses linked by Booleans
        • search clauses and result sets linked by Booleans
        • two or more result sets linked by Booleans
        • single result set
     • Expressed in the Common Query Language (CQL)

  24. Common Query Language (CQL)
     • A formal language for representing queries to information retrieval systems
     • Human-readable
     • Search clause
        • Always includes a term
           • Simple terms consist of one or more words
        • May include an index name
           • To limit the search to a particular field/element
           • Index name includes a base name and may include a prefix
              • title, subject
              • dc.title, dc.subject
     • Several index sets have been defined (called context sets in SRW)
        • dc
        • bath
        • srw
     • A context set defines the available indexes for a particular application
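The prefix and base-name rule can be illustrated with a small lookup. In this Python sketch the registry `CONTEXT_SETS` is hypothetical and deliberately tiny (the real dc, bath, and srw context sets define many more indexes), and treating an unprefixed name such as `title` as belonging to a default `dc` set is an assumption; in practice the server decides what an unprefixed index means.

```python
from typing import Dict, Set, Tuple

# Hypothetical, illustrative subset of indexes per context set
CONTEXT_SETS: Dict[str, Set[str]] = {
    "dc": {"title", "subject", "creator"},
    "bath": {"title", "subject"},
    "srw": {"serverChoice"},
}
DEFAULT_CONTEXT_SET = "dc"  # assumption: server-defined in practice


def resolve_index(name: str) -> Tuple[str, str]:
    """Split an index name into (context set, base name) and check that it is defined."""
    prefix, dot, base = name.rpartition(".")
    context_set = prefix if dot else DEFAULT_CONTEXT_SET
    if context_set not in CONTEXT_SETS:
        raise KeyError(f"unknown context set: {context_set}")
    if base not in CONTEXT_SETS[context_set]:
        raise KeyError(f"{context_set} does not define index {base!r}")
    return context_set, base
```

With this registry, `dc.title` and plain `title` both resolve to the same index, while a name the context set does not define is rejected rather than silently searched.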

  25. Other components of CQL
     • Relation
        • <, >, <=, >=, =, <>
        • exact: used for string matching
        • all: when the term is a list of words, all of the words must be found
        • any: when the term is a list of words, at least one of the words must be found
     • Boolean operators: and, or, not
     • Proximity (prox operator)
        • relation (<, >, <=, >=, =, <>)
        • distance (integer)
        • unit (word, sentence, paragraph, element)
        • ordering (ordered or unordered)
     • Masking rules and special characters
        • single asterisk (*) to mask zero or more characters
        • single question mark (?) to mask a single character
        • caret (^) to indicate anchoring, left or right
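The `all`, `any`, and `exact` relations and the two masking characters can be made concrete with a matcher over a single record field. This Python sketch is not how any particular SRW server evaluates CQL: the whitespace tokenization, the case-insensitive comparison, and the `matches` and `word_pattern` helpers are assumptions, and `^` anchoring, proximity, and the comparison relations are left out.

```python
import re


def word_pattern(word: str) -> re.Pattern:
    """Compile one search word, treating * as zero or more characters and ? as exactly one."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch) for ch in word
    )
    return re.compile(regex, re.IGNORECASE)


def matches(field: str, relation: str, term: str) -> bool:
    """Evaluate a CQL-style relation between a record field and a search term."""
    if relation == "exact":
        # Whole-string comparison (case-insensitive here, by assumption)
        return field.strip().lower() == term.strip().lower()
    field_words = field.split()
    # For each word of the term: does it match some whole word of the field?
    found = [
        any(pattern.fullmatch(w) for w in field_words)
        for pattern in map(word_pattern, term.split())
    ]
    if relation == "all":
        return all(found)
    if relation == "any":
        return any(found)
    raise ValueError(f"unsupported relation: {relation!r}")
```

For instance, `matches("The Complete Dinosaur", "any", "dinosaur bird reptile")` is true because one word is found, while the same term under `all` is false.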

  26. CQL examples
     • Simple queries:
        • dinosaur
        • "the complete dinosaur"
     • Boolean
        • dinosaur and bird or dinobird
        • "feathered dinosaur" and (yixian or jehol)
     • Proximity
        • foo prox bar
        • foo prox/>/4/word/ordered bar
     • Indexes
        • title = dinosaur
        • bath.title="the complete dinosaur"
        • srw.serverChoice=dinosaur
     • Relations
        • year > 1998
        • title all "complete dinosaur"
        • title any "dinosaur bird reptile"
        • title exact "the complete dinosaur"
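The Simple queries, Indexes, and Relations examples all share the shape index, relation, term, where the index and relation are optional. The Python sketch below splits one such search clause into those three parts with a regular expression. It is a hypothetical helper, not a full CQL parser: it does not handle Booleans, `prox`, relation modifiers, parentheses, or escaped quotes inside terms.

```python
import re
from typing import Optional, Tuple

# Optional index (may carry a context-set prefix) plus relation, then a term
CLAUSE = re.compile(
    r'^\s*'
    r'(?:(?P<index>[A-Za-z][\w.]*)\s*'
    r'(?P<relation><=|>=|<>|=|<|>|\b(?:all|any|exact)\b)\s*)?'
    r'(?P<term>"[^"]*"|\S+)\s*$'
)


def parse_clause(text: str) -> Tuple[Optional[str], Optional[str], str]:
    """Return (index, relation, term); index and relation are None for a bare term."""
    m = CLAUSE.match(text)
    if m is None:
        raise ValueError(f"not a single search clause: {text!r}")
    term = m.group("term")
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        term = term[1:-1]  # strip the quotes around a multi-word term
    return m.group("index"), m.group("relation"), term
```

Applied to `bath.title="the complete dinosaur"`, the sketch returns the index `bath.title`, the relation `=`, and the unquoted phrase; a Boolean query such as `dinosaur and bird` is rejected, since it is more than one clause.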

  27. SRW/U & classic Z39.50

     SRW/U                                                | Classic Z39.50
     -----------------------------------------------------+---------------------------------------
     No explicit concept of connection, session, or state | Stateful
     Result sets named by server                          | Result sets named by client
     Single record syntax (XML), multiple schemas         | Multiple record syntaxes
     String (i.e., human-readable) queries                | No human-readable query language
     CQL                                                  | Type 1 query using attribute sets
     Named indexes                                        | Use attribute to identify access point

     • Z39.50 concepts retained
        • Result sets
        • Abstract access points
        • Abstract record schemas
        • Explain
        • Diagnostics

  28. What problems does SRW solve?
     • Addresses the need for standards-based searching in the networked environment
     • Shows the vitality of the Z39.50 concepts and implements them in a web services & URL access context
     • Offers database providers a web-friendly method for offering standards-based searching of resources
     • Provides a low-barrier-to-entry solution using commonly available technologies
     • The XML format of records provides for more reuse, and more interesting use, of resources

  29. Possible implementation venues
     • Gateways to existing Z39.50 servers
     • Lightweight SRW/U servers for specialized databases
     • Cost-effective search access to commercial databases (e.g., citation, full-text)
     • Metasearching
     • Beyond libraries to many other information communities

  30. SRU at The European Library (TEL)
     Graphic from: van Veen, T., & Oldroyd, B. (2004, February). Search and retrieval in The European Library. D-Lib Magazine, 10(2). Retrieved February 24, 2005, from D-Lib Magazine website: http://www.dlib.org/dlib/february04/vanveen/02vanveen.html

  31. References
     • Z39.50 International Next Generation – ZING
        • http://www.loc.gov/z3950/agency/zing/
     • Search and Retrieve for the Web – SRW/U
        • http://www.loc.gov/srw
     • A Gentle Introduction to SRW
        • http://www.loc.gov/z3950/agency/zing/srw/introduction.html
     • A Gentle Introduction to CQL
        • http://zing.z3950.org/cql/intro.html
     • Search and Retrieval in The European Library: A New Approach, by van Veen and Oldroyd, D-Lib Magazine (February 2004)
        • http://www.dlib.org/dlib/february04/vanveen/02vanveen.html
