Open Source Software for Digital Libraries

Jon Dunn

Associate Director for Technology

John A. Walsh

Manager of Electronic Text Technologies

Indiana University Digital Library Program

IU Digital Library Brown Bag Series, Bloomington, IN, 09 April 2004

Outline
  • Open Source Introduction
  • Categories of Open Source Software for Libraries
  • Open Source Digital Library Systems
  • Open Source XML Tools and Systems
What is open source software?
  • In the phrase open source, source refers to source code, the human-readable computer code which is the origin, or source, of the computer application. Open refers to the terms of access to that computer source code. So open source software is software for which the source code is freely available. But this is a very general and incomplete definition.
  • A detailed definition of open source software is maintained by the Open Source Initiative.
Advantages and Disadvantages

Advantages

  • Access to source code and ability and right to modify it
  • Right to redistribute modifications to benefit wider community
  • Free
  • Excellent support networks
  • Large and enthusiastic user base

Disadvantages

  • Limited or no accountability
  • Informal and unaccountable support channels
Categories of Open Source Software
  • Operating Systems
    • Linux
  • Programming Languages
    • Perl, PHP, Python
  • Applications
    • Apache, Tomcat, emacs, grep, MySQL, sendmail, ssh
Different Open Source Licenses
  • GNU GPL ("General Public License")
  • GNU Lesser GPL
  • BSD License
  • Mozilla Public License
  • IU Open Source License
  • And more...
Open Source Software in the DLP
  • Linux, Apache, Tomcat, PHP, Perl, DLXS, ImageMagick, ePrints, MySQL, Darwin Streaming Server, emacs, CVS, Webalizer, LibXML, LibXSLT, Saxon, and more!
Open Source Resources
  • Open Source Initiative
  • GNU
  • SourceForge
Some categories of open source library software
  • Library-oriented search engines
    • Cheshire, Pears
  • Z39.50 toolkits
    • ZetaPerl (Perl), JAFER (Java), YAZ (C/C++)
  • MARC parsers
    • MARC.pm (Perl), MARC4J (Java)
  • Image processing
    • ImageMagick, tiffinfo/tiffdump
Some categories of open source library software
  • Portals
    • MyLibrary
  • OAI service providers and data providers
    • PHP OAI Data Provider
    • Lots! See www.openarchives.org
  • METS tools
    • Page turners, toolkits, more: see www.loc.gov/mets/
  • Digital object repositories
    • Fedora
A Good Starting Point
  • oss4lib: Open Source Systems for Libraries
    • www.oss4lib.org
Complete DL Systems
  • DSpace
  • Eprints
  • Greenstone
DSpace
  • “DSpace is a groundbreaking digital institutional repository that captures, stores, indexes, preserves, and redistributes the intellectual output of a university’s research faculty in digital formats.”
  • Developed jointly by MIT Libraries and Hewlett-Packard
  • Licensed under BSD distribution license
  • www.dspace.org
DSpace
  • Supports submission of, management of, and access to digital content
    • Formats: text, images, audio, video
  • Organized based on organizational needs of a large university
    • Communities and collections
DSpace Features
  • Digital preservation
    • Persistent IDs, support levels for different file formats
  • Access control
  • Versioning
  • Search and retrieval
    • Based on qualified Dublin Core metadata
  • OAI-PMH data provider
    • To support metadata harvesters
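A metadata harvester talks to an OAI-PMH data provider through plain HTTP requests that carry a `verb` parameter. The following is a minimal sketch of building such a request, not DSpace code: the base URL is a hypothetical placeholder, while the `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; a real repository publishes its own base URL.
BASE_URL = "https://repository.example.edu/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (the request itself is not sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: restrict harvesting to one set
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL))
```

A harvester would fetch this URL and parse the XML response; that step is left out here.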
DSpace Technology
  • OS: Unix or Linux
  • Written in Java
  • PostgreSQL relational database
  • Provides complete Web user interface, but Java APIs available
DSpace Demonstration
  • MIT DSpace
    • dspace.mit.edu
EPrints
  • “free software which creates online archives”
  • Developed by University of Southampton, UK
  • Supports self-archiving of e-prints
  • Can be configured as institutional repository or otherwise, e.g. repository focused on particular research area or discipline
  • Licensed under GNU General Public License
  • software.eprints.org
EPrints
  • Supports submission of, management of, and access to digital content
  • Can support multiple archives on one server
  • Moderated or unmoderated archives
  • Search and retrieval
    • Based on metadata
    • Metadata can be customized for different archives and document types
  • No access control
  • OAI-PMH data provider
EPrints Technology
  • OS: Unix or Linux
  • Written in Perl
  • Requirements:
    • Apache web server
    • MySQL relational database
EPrints Demonstration
  • Digital Library of the Commons
    • dlc.dlib.indiana.edu
Greenstone
  • “Suite of software for building and distributing digital library collections”
  • Developed by University of Waikato, New Zealand
    • Developed in cooperation with UNESCO and the Human Info NGO
  • Licensed under GNU General Public License
  • www.greenstone.org
Greenstone Features
  • Supports creation and management of collections by administrator(s)
  • Web interface for search and retrieval
    • Customizable metadata
    • Supports full text search of content
  • Extensive document filters
    • Word, Excel, PowerPoint, PDF, ...
    • Can extract metadata from documents
  • Many ways to build a collection, including:
    • Local files
    • Retrieve web sites
    • Retrieve objects via OAI-PMH
Greenstone Features
  • Focus on:
    • Ease of installation
    • Ease of use
    • Internationalization
      • Full support for English, French, Spanish, Russian, and Kazakh
      • Support for many other languages
    • Low barriers to use
      • Minimal system requirements
      • Creation of CD-ROMs
Greenstone Technology
  • Runs on Windows (back to 3.1), Linux, Mac OS X, Unix
  • Written in C++, Perl, and Java
  • Uses MG/MG++ search engine
  • Several different Web and Java/Swing user interfaces for various functions
  • Web interface for user access
Greenstone Demonstration
  • Examples at www.greenstone.org
Open Source XML Tools and Systems
  • Utilities
    • Xalan, Xerces, libxml, libxslt, Saxon
  • Editors
    • emacs / nxml-mode
  • Database / Search Engines
    • Apache Xindice
    • Berkeley DB XML
    • eXist
  • Publishing / Web Application Frameworks
    • AxKit
    • Cocoon
XML Databases & Search Engines
  • Apache Xindice
  • Berkeley DB XML
  • eXist
Apache Xindice
  • http://xml.apache.org/xindice/
  • Technology: Java
  • Optimized for large numbers of small XML files. Does not work well on large files.
Berkeley DB XML
  • http://www.sleepycat.com/products/xml.shtml
  • Technology: C
  • C++ and Java APIs
eXist
  • http://exist.sourceforge.net/
  • Technology: Java
XML Publishing / Web Application Frameworks
  • XML Publishing, or Web Application, Frameworks provide systems for publishing XML data in a variety of formats, such as HTML, WAP/WML, PDF, etc. Both AxKit and Cocoon use a "pipeline" paradigm to route incoming requests through different processing routines.
  • Apache AxKit
  • Apache Cocoon
Apache AxKit
  • http://axkit.org/
  • Technology: Perl
  • AxKit is an XML Application Server for Apache. It provides on-the-fly conversion from XML to any format, such as HTML, WAP or text using either W3C standard techniques, or flexible custom code. AxKit also uses a built-in Perl interpreter to provide some amazingly powerful techniques for XML transformation.
Apache Cocoon
  • http://cocoon.apache.org/
  • Technology: Java
  • "Apache Cocoon is a web development framework built around the concepts of separation of concerns and component-based web development."
Cocoon: Key Concepts
  • publishing framework
  • XML and XSLT
  • "pipelined SAX processing"
  • separation of:
    • content
    • logic
    • style
  • centralized configuration
  • sophisticated caching
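"Pipelined SAX processing" means that each stage receives parse events (start element, characters, end element) and forwards them to the next stage, without building the whole document in memory. The sketch below illustrates the idea with Python's standard `xml.sax` module. It is not Cocoon's Java API, and the `RenameElement` stage and the `title`/`h1` element names are made up for the example.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class RenameElement(XMLFilterBase):
    """Transformer stage: renames one element and forwards every other event unchanged."""

    def __init__(self, parent, old, new):
        super().__init__(parent)
        self._old, self._new = old, new

    def _map(self, name):
        return self._new if name == self._old else name

    def startElement(self, name, attrs):
        super().startElement(self._map(name), attrs)

    def endElement(self, name):
        super().endElement(self._map(name))


def run_pipeline(xml_text):
    """Chain parser (generator) -> filter (transformer) -> XMLGenerator (serializer)."""
    out = io.StringIO()
    parser = xml.sax.make_parser()                  # emits SAX events from the input
    stage = RenameElement(parser, "title", "h1")    # rewrites events in flight
    stage.setContentHandler(XMLGenerator(out, encoding="utf-8"))  # writes events back out as XML
    stage.parse(io.StringIO(xml_text))
    return out.getvalue()


print(run_pipeline("<doc><title>Hi</title></doc>"))
```

More filter stages can be added by wrapping one `XMLFilterBase` around another, each taking the previous stage as its parent.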
Cocoon: Problems to Be Solved
  • Separation of content, style, logic, and management functions in an XML-content-based web site
Cocoon: Basic mechanisms for processing XML documents
  • Dispatching based on Matchers
  • Generation of XML documents (from content, logic, relational DB, objects or any combination) through Generators
  • Transformation (to another XML, objects or any combination) of XML documents through Transformers
  • Aggregation of XML documents through Aggregators
  • Rendering XML through Serializers
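The mechanisms above can be sketched as a tiny request handler: a matcher picks a route, a generator produces a document, transformers rewrite it, and a serializer renders the result. This is a toy model under stated assumptions, not Cocoon code. Documents are plain dicts rather than XML, aggregation is left out, and every function and file name is hypothetical.

```python
def generate(src):
    """Generator: produce the initial document (a toy dict standing in for XML)."""
    return {"source": src, "body": "hello"}


def upper_transform(doc):
    """Transformer: derive a new document from the incoming one."""
    return {**doc, "body": doc["body"].upper()}


def serialize_html(doc):
    """Serializer: render the final document as HTML text."""
    return "<p>" + doc["body"] + "</p>"


def serialize_text(doc):
    """Serializer: render the final document as plain text."""
    return doc["body"]


# Each route pairs a matcher (a predicate on the request path) with a pipeline.
ROUTES = [
    (lambda path: path.endswith(".html"), [upper_transform], serialize_html),
    (lambda path: path.endswith(".txt"), [], serialize_text),
]


def handle(path):
    """Dispatch a request to the first route whose matcher accepts it."""
    for matches, transformers, serializer in ROUTES:
        if matches(path):
            doc = generate(path.rsplit(".", 1)[0] + ".xml")
            for transform in transformers:
                doc = transform(doc)
            return serializer(doc)
    return None  # no matcher accepted the request


print(handle("intro.html"))  # <p>HELLO</p>
```

As in this sketch, the first matching route handles the request, so the order of the routes matters.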
Cocoon: The Pipeline

Sequence of interactions:

Generators, Transformers, & Serializers
  • Generators
  • Transformers
  • Serializers
Cocoon: Configuration: The Sitemap

<?xml version="1.0"?>
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
    ...
  </map:components>
  <map:views>
    ...
  </map:views>
  <map:pipelines>
    <map:pipeline>
      <map:match>
        ...
      </map:match>
      ...
    </map:pipeline>
    ...
  </map:pipelines>
  ...
</map:sitemap>

Cocoon: Configuration: A Pipeline

<map:pipelines>
  <map:pipeline>
    <map:match pattern="technochat/">
      <map:generate src="technochat/index.xhtml"/>
      <map:serialize/>
    </map:match>
    <map:match pattern="technochat/*.xml">
      <map:read mime-type="text/xml" src="technochat/{1}.xml"/>
    </map:match>
    <map:match pattern="technochat/*.html">
      <map:generate src="technochat/{1}.xml"/>
      <map:transform src="technochat/tei2html.xsl"/>
      <map:serialize/>
    </map:match>
    <map:match pattern="technochat/*.css">
      <map:read mime-type="text/css" src="technochat/resources/styles/{1}.css"/>
    </map:match>
    <map:match pattern="technochat/*.svg.jpg">
      <map:generate src="technochat/{1}.xml"/>
      <map:transform src="technochat/tei2svg.xsl"/>
      <map:serialize type="svg2jpeg"/>
    </map:match>
    <map:match pattern="technochat/*.svg">
      <map:generate src="technochat/{1}.xml"/>
      <map:transform src="technochat/tei2svg.xsl"/>
      <map:serialize type="svgxml"/>
    </map:match>
    <map:match pattern="technochat/*.pdf">
      <map:generate src="technochat/{1}.xml"/>
      <map:transform src="technochat/tei2fo.xsl"/>
      <map:serialize type="fo2pdf"/>
    </map:match>
  </map:pipeline>
</map:pipelines>
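In the match patterns of this pipeline, the `*` wildcard captures part of the request path, and `{1}` in a `src` attribute is replaced by the captured text. The sketch below approximates that substitution with a regular expression. It is an illustration, not Cocoon's matcher: it assumes a single `*` does not cross a `/`, it does not handle `**`, and the request path `technochat/intro.html` is a made-up example.

```python
import re


def compile_pattern(pattern):
    """Turn a wildcard pattern into a regex; each * becomes a capture group."""
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "([^/]*)".join(parts) + "$")


def resolve(pattern, src_template, request_path):
    """Return src_template with {1}, {2}, ... filled in, or None if the pattern does not match."""
    match = compile_pattern(pattern).match(request_path)
    if match is None:
        return None
    src = src_template
    for index, value in enumerate(match.groups(), start=1):
        src = src.replace("{%d}" % index, value)
    return src


print(resolve("technochat/*.html", "technochat/{1}.xml", "technochat/intro.html"))
# technochat/intro.xml
```

Under these assumptions, a request for `technochat/intro.html` would be generated from `technochat/intro.xml` and passed through `tei2html.xsl`, following the third `map:match` above.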