
Eye to the Telescope: Future-Gazing Current Projects from OCLC Research




Presentation Transcript


    1. Eye to the Telescope: Future-Gazing & Current Projects from OCLC Research. 2006 Amigos Conference, 11 May 2006, Dallas, TX. Eric Childress, OCLC Research.
       Notes: My sincere thanks to Bonnie Juergens and Laura Kimberly for kindly arranging my appearance here, and to all of you for attending such an early event!

    2. Outline: The Big Picture; Pattern Recognition; Brand, Data, Technology trends; The Library - next phase; Selected OCLC Research work

    3. Pattern recognition
       Production anywhere, Global distribution: Make products anywhere, ship them everywhere; Offshore business processes & research centers
       Big brands & micro channels: Mega-publishers, -media, -retailers, -search engines; Niche markets exploited via AdWords & affiliate programs
       Portable devices, digital content, interactive Net: iPods, now with video; Are iPhones next? Ringtones, iTunes, Podcasts, Vlogs/Google Video, online gaming, etc.
       Self-service, micro-consumption: The “convenience” society – 24x7 stores, ATMs, click-n-buy; Disaggregation – consume by the news story, song, etc.
       Intellectual Property issues: Big business not-so-secretly wants all transactions billable; Open Source & Open Content rising (e.g., Apache, Creative Commons)
       Notes:
       Outsourcing – business process outsourcing (BPO) is expected to grow from $6B USD (2005) to $14B USD (2010) [http://www.thehindubusinessline.com/2006/05/08/stories/2006050802730200.htm]. Accounting services, call centers, pharmacy trials, software development, and research (e.g., Microsoft, IBM) are displaying a significant offshore presence.
       Big brands – Wal-Mart commonly accounts for as much as 20% of total sales volume for bestselling books in the U.S. Amazon & Google especially have leveraged narrow distribution channels by sharing the wealth. Also worthy of note is a very robust used book market built in part by “long-tail” sales from online outlets (BISG estimates the 2004 used book market at $2.2 billion: 111 million books, 8.4% of total consumer spending on books). Microsoft is now squarely targeting Google – Amazon has recently switched engines to adopt Microsoft (N.B. estimates indicate that 10% of Amazon users clicked through to Google links when Google was Amazon’s partner). Rumors of Microsoft-Yahoo discussions.
       Portable devices – the ringtones market has risen from $68M (2003) to a projected $600M (2006); MP3 devices: from 58 million devices shipped in 2005 to 116 million in 2007; Video: YouTube (30 million streams per day) [see http://www.engadget.com/2006/05/04/the-clicker-youtube-and-fair-use-a-match-made-in-heaven/]
       IP issues – a complex space gets more complicated. Copyright as originally conceived (i.e. public domain as the default) is greatly diminished – Disney and other corporate content players wield significant political influence. Fair Use and the First Sale Doctrine are inconvenient… Countering this has been the Creative Commons and similar efforts that allow IP owners to conveniently cede rights for IP reuse with no or minimal, uniform terms and conditions.

    4. Voices carry
       Old media losing to new media: Broadcast radio vs Satellite & Internet radio; Newspapers vs Google News, Craigslist, etc.
       Brand & voice through new channels: Blogging by top execs & by staff; Personal branding – “Webcred” is key to one’s fortunes
       Individual-driven content rising: Personal web pages; Blogs (a new one each second!); Digital images/video (flickr, Picasa, YouTube); Bookmarks, etc. (e.g., del.icio.us, furl, digg, technorati)
       Infotainment increasingly social & peer-to-peer: Community authorship, open content (Wikipedia); MySpace, Facebook, etc. personal presence services
       Notes: MySpace has 70M registered users, had 47M transactions in Feb. 2006 & is the second most visited destination on the Web after Yahoo [http://www.azstarnet.com/business/125984]

    5. The blogosphere is doubling every 3 months (a new blog created every second) [http://www.sifry.com/alerts/archives/000432.html]
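As a rough illustration of what "doubling every 3 months" and "a new blog every second" imply, here is a minimal sketch. It assumes the doubling rate stays constant, which the slide does not claim, and the helper name `growth_factor` is hypothetical.

```python
def growth_factor(months: float, doubling_period_months: float = 3.0) -> float:
    """Return the multiplicative growth after `months`, assuming the
    population doubles once every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)


# One new blog per second, expressed per day.
SECONDS_PER_DAY = 24 * 60 * 60
new_blogs_per_day = SECONDS_PER_DAY * 1

print(growth_factor(12))    # four doublings in a year -> 16.0
print(new_blogs_per_day)    # 86400
```

Under that constant-rate assumption, one year means four doublings, so the blogosphere would be sixteen times its starting size.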

    6. Data rules
       Deep indexing: Amazon’s “Search Inside” and “Statistically Improbable Phrases”; Google, Yahoo, Microsoft underwriting library digitization work; Library space: NetLibrary, Alexander Street, many others indexing content
       Custom search feeds: Google Alerts, News topic RSS, etc.
       Instant verification: Many voices, many fact-checkers widely-distributed – Spin doctors beware!
       Recommendation systems: Amazon, Apple iTunes, other retailers – “people like you chose…”
       Novel concepts: Pandora – suggests music based on intrinsic patterns of music you like (the “music genome”)
       Empowered consumption: My iPod, my tags, my playlists; Reuse, derive, mix content from many sources (e.g. Mashups)
       Notes: The "world churns out new digital information equivalent to the entire collection of the U.S. Library of Congress every 15 minutes. Such a proliferation of information in digital format, occurring almost 100 times a day, adds up to approximately five exabytes (five quintillion bytes or five billion gigabytes) a year" [http://www.nist.gov/public_affairs/techbeat/tb2006_0330.htm#bytes]
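The figures in the quoted NIST item can be cross-checked with a few lines of arithmetic. This sketch assumes decimal units (1 exabyte = 10^18 bytes) and a 365-day year. The per-interval volume it derives is implied by the quote's own numbers and is not an independent measurement of the Library of Congress collection.

```python
MINUTES_PER_DAY = 24 * 60

# "every 15 minutes" -> how many times per day ("almost 100" in the quote)
intervals_per_day = MINUTES_PER_DAY // 15
intervals_per_year = intervals_per_day * 365

EXABYTE = 10 ** 18    # decimal exabyte, an assumption
GIGABYTE = 10 ** 9
TERABYTE = 10 ** 12

bytes_per_year = 5 * EXABYTE
gigabytes_per_year = bytes_per_year // GIGABYTE

# Volume implied for each 15-minute interval.
terabytes_per_interval = bytes_per_year / intervals_per_year / TERABYTE

print(intervals_per_day)                 # 96
print(gigabytes_per_year)                # 5000000000 (five billion)
print(round(terabytes_per_interval, 1))
```

The quote is internally consistent: 96 intervals a day is "almost 100", and five exabytes is five billion gigabytes.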

    7. Techscape
       Web 2.0: The Network spans all attached devices (e.g., iPods, phones, etc.); Software resides on the Net, not the workstation; “Participative Net” – social environment, shared content reused
       Everywhere Net: Internet, GPS, cellphone, municipal wireless…
       System refactoring: Modularity (micro-services, remixing, multiple sources); Layering (loosely-coupled systems); Interoperability (low-friction, high reuse); Lightweight protocols gaining favor (e.g., SRW/SRU, microformats); Machine-oriented services (web services)
       Notes:
       Web 2.0: Source: http://radar.oreilly.com/archives/2005/10/web_20_compact_definition.html For more information & interesting graphic: What Is Web 2.0: Design Patterns and Business Models for the Next Generation of Software / Tim O’Reilly http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html?page=1
       Mashup defined: “a website or web application that seamlessly combines content from more than one source into an integrated experience. Content used in mashups is typically sourced from a third party via a public interface or API. Other methods of sourcing content for mashups include Web feeds (e.g. RSS or Atom) and JavaScript includes.” [http://en.wikipedia.org/wiki/Mashup_%28web_application_hybrid%29]

    8. Libraries - next phase
       Surfacing seamlessly: Point-of-need delivery (e.g., library content in non-library apps such as the Web, course management systems, etc.); Open WorldCat, RedLightGreen, OAIster, etc.; Open standards, easy integration of data from many sources
       Re-thinking, re-engineering: Library 2.0 changes systems & services; Moving towards “Lego”-like modularity in systems & data; User-tasks-oriented designs (e.g., NCSU catalog); Adding means for users to contribute, shape their own experiences
       Supporting Library 2.0 will mean changing organizations & operations: More building space for people-to-people interaction, less for books; Process & operational changes (Example: Choose-acquire-catalog vs Acquire-choose-catalog)

    9. Cascading commoditization: Open standardization of content/software + exposure. “Any services that can be abstracted to generic network services will be” -- Robin Murray

    10. Selected OCLC Research work
        Making data work harder:
          Data mining of WorldCat (e.g., FRBR (Functional Requirements for Bibliographic Records) clustering of related records)
          FictionFinder – browse/search all fiction works in WorldCat
          Audience Level – assigns an audience indicator value based on data in bib records for a work, or – alternatively – by inferring audience from the type and number of libraries holding a work
          xISBN – send OCLC an ISBN, receive all ISBNs for the same work
        New views, new uses:
          DeweyBrowser – Dewey-based visualization of WorldCat, more
          Live Search – An AJAX-based search interface that leverages FRBR, advanced relevance, and rank-by-holdings to provide fast results
          Terminology Services – Controlled vocabularies searchable in a sidebar
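The xISBN idea (one ISBN in, all ISBNs for the same work out) can be sketched as a two-way lookup over a local table. This is not the xISBN service or its API. The function names are hypothetical, and the identifiers in `ISBN_TO_WORK` are made-up placeholders, not real ISBNs or WorldCat work IDs.

```python
from collections import defaultdict

# Hypothetical ISBN -> work identifier table (placeholder values only).
ISBN_TO_WORK = {
    "0000000001": "work-A",
    "0000000002": "work-A",
    "0000000003": "work-A",
    "0000000004": "work-B",
}


def build_work_index(isbn_to_work):
    """Invert the ISBN -> work table into work -> set of ISBNs."""
    work_to_isbns = defaultdict(set)
    for isbn, work_id in isbn_to_work.items():
        work_to_isbns[work_id].add(isbn)
    return work_to_isbns


def related_isbns(isbn, isbn_to_work, work_to_isbns):
    """Return every ISBN sharing a work with `isbn` (including itself),
    or an empty list if the ISBN is unknown."""
    work_id = isbn_to_work.get(isbn)
    if work_id is None:
        return []
    return sorted(work_to_isbns[work_id])


WORK_INDEX = build_work_index(ISBN_TO_WORK)
print(related_isbns("0000000002", ISBN_TO_WORK, WORK_INDEX))
```

The hard part in practice is deciding which records belong to the same work. The lookup itself is simple once that grouping exists.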

    11. FRBR Group 1 Entities
        Notes: The FRBR model is made up of three sets of entities. Group 1 is a 4-level bibliographic model that’s described as you see here. [Read the slide] OK … what does this really mean?
        A work: Shakespeare’s Hamlet is a work … not any particular edition or version or translation … but Hamlet as an intellectual concept.
        An expression: this is a work realized in a particular version or translation. The original text of Hamlet is an expression; so is a French translation or a German one.
        A manifestation: this is an expression that is issued or published … So the edition of the Andre Gide French translation published in Paris in 1946 is a manifestation.
        An item: a copy of a manifestation on the shelves in a library … so the copy of Andre Gide’s translation of Hamlet in the stacks at the Library of Congress with the call number PR2779.H3G5 1946
        These four entities in Group 1 are the product of intellectual or artistic endeavor. The entities form a hierarchy with work at the top of the model so it may help to see them in a diagram as here. This diagram begins to show how the Group 1 entities are related to each other. A single work can be realized in various expressions … for example, a work may be translated into many different languages (a one-to-many relationship). One or more expressions can be embodied in one or more manifestations (a many-to-many relationship).
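The four Group 1 entities map naturally onto nested record types. The sketch below uses the Hamlet example from the notes. The class and field names are illustrative assumptions, not part of the FRBR specification, and the model is simplified to a strict tree, so the many-to-many link between expressions and manifestations is not represented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """A single copy of a manifestation."""
    location: str
    call_number: str


@dataclass
class Manifestation:
    """An expression as issued or published."""
    place: str
    year: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """A work realized in a particular version or translation."""
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The intellectual concept, independent of any edition."""
    title: str
    creator: str
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count the physical copies recorded anywhere under a work."""
    return sum(
        len(m.items) for e in work.expressions for m in e.manifestations
    )


hamlet = Work(
    title="Hamlet",
    creator="Shakespeare",
    expressions=[
        Expression("Original text"),
        Expression(
            "French translation by Andre Gide",
            manifestations=[
                Manifestation(
                    place="Paris",
                    year=1946,
                    items=[Item("Library of Congress", "PR2779.H3G5 1946")],
                )
            ],
        ),
    ],
)

print(len(hamlet.expressions), count_items(hamlet))
```

A fuller model would let one manifestation point at several expressions, for example through shared identifiers rather than nesting.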

    12. OCLC FRBR work set: algorithm-based cluster of related WorldCat records*
        Notes: *Similar to Family of works; see: Tillett, Barbara. 2004. What is FRBR?: A Conceptual Model for the Bibliographic Universe. Available at: http://www.loc.gov/cds/downloads/FRBR.PDF
        Incorporating the concepts of the FRBR model in systems:
          Superior presentation of search results: Esp. in large files – more intuitive clustering
          May help streamline library cataloging: Reduces repeated keying of work-related info
          Bibliographic & management intelligence: New insights into works (e.g., OCLC’s 1000 list); Libraries can operate at workset level (e.g., ILL)
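To make "algorithm-based cluster of related records" concrete, here is a deliberately simplified sketch that groups records by a normalized author/title key. It is not OCLC's work-set algorithm, whose details the slides do not give. The record fields, IDs, and sample data are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def work_key(record: dict) -> str:
    """Build a matching key from the author and title fields."""
    return normalize(record["author"]) + "/" + normalize(record["title"])


def cluster(records: list) -> dict:
    """Group record IDs that share the same work key."""
    work_sets = defaultdict(list)
    for record in records:
        work_sets[work_key(record)].append(record["id"])
    return dict(work_sets)


# Hypothetical bibliographic records with inconsistent punctuation and case.
records = [
    {"id": "rec1", "author": "Shakespeare, William", "title": "Hamlet"},
    {"id": "rec2", "author": "SHAKESPEARE, WILLIAM.", "title": "Hamlet."},
    {"id": "rec3", "author": "Austen, Jane", "title": "Emma"},
]

print(cluster(records))
```

A key this naive would miss translations published under a different title, which is one reason a production approach needs more than string normalization.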

    13. As of mid 2005?

    14. FictionFinder
        An OCLC experimental prototype
        Supports searching & browsing of fiction materials cataloged in WorldCat: Fiction records: 2.8 million; Unique works: 1.4 million; Total holdings: 130 million
        Employs FRBR to: Build a “work” view & cluster related records; Support the creation of special indexes
        OCLC Research team: Diane Vizine-Goetz (lead), Roger Thompson, Carol Hickey, Lance Osborne, J.D. Shipengrover
        New version: Available later in 2006; Improved navigation & work-based displays
        Notes:
        Interface: http://fictionfinder.oclc.org
        Project page: http://www.oclc.org/research/projects/frbr/fictionfinder.htm
        What’s in*: Fiction (Drama, Novels, Short stories); Text (Including eBooks); Sound (Audiobooks & cassettes, etc.)
        What’s out*: Works about fiction, drama, etc.; Movies, films, video; Music

    16. Prototype redesign in progress…

    22. Questions?
        Notes: My inspiration for the title of this presentation

    23. Further reading
        OCLC Reports: http://www.oclc.org/reports
        OCLC Research: http://www.oclc.org/research
        OCLC-related blogs: Lorcan Dempsey http://orweblog.oclc.org; Thom Hickey http://outgoing.typepad.com/outgoing; Stu Weibel http://weibel-lines.typepad.com; It’s All Good http://scanblog.blogspot.com
