SEARCH ABILITIES IN DIGITAL LIBRARIES WITH GENERIC DATABASES

Kunal Bansal (kbansal@cs.odu.edu)

Overview
  • ‘Information Search’ has grown to be an industry of its own with the advent of the WWW
  • Pioneers such as Google, MSN and Yahoo! Search pose serious challenges to both traditional and digital libraries.
  • Cataloging information such as electronic journals, e-books and reference papers in legacy databases becomes cumbersome as complexity and the number of titles increase.

Intelligent Internet Databases

Overview (Continued)
  • Vast amounts of information of different content and type, such as preprint and e-print servers, digital repositories and media archives, are all integrated in databases.
  • Estimates suggest there are ~ 1 billion ‘visible’ web pages and up to ~ 550 billion ‘deep’ pages in 200,000 sites as of 2001.
  • Google’s search index was recorded at 4.2 billion ‘visible’ pages as of May 2004 compared to 3.3 billion in 2003.

Vision (Identifying Possibilities)
  • Pages of scientific interest mainly identified through their domains (~ 22 billion).
  • Searches will not be influenced by the advertisement industry, but instead focus on content search related to quality.
  • Future searches will be based on a major shared data source, allowing individual customization for integration with a local host.
  • The new service would offer robustness and reliability comparable to Google, but with quality, ‘proven’ content provided by networked libraries.

Drawbacks in Current Systems (Challenges)
  • Main focus is metadata such as bibliographic content, references, keywords and abstracts, often restricted to HTML and TXT formats
  • Available data tends to be copyrighted, and free academic content is skipped in the process
  • Sequentially incoming responses are presented as a joint result, which increases dependence on the target (source) databases, decreases performance and limits scalability

Drawbacks in Current Systems (Search Comfort and Page Rank)
  • Traditional Boolean searching reduces ease of search for the user
  • Search engines incorporating linguistic analysis and semantic dictionaries allow greater tolerance, but can still return irrelevant content due to factors such as page rank
  • Ranking of results is based on several unrelated factors, such as payments to search index providers by the owners of those pages

Preliminary Requirements (For Academic Content)
  • Indexing resources which are factually and intellectually sound, thus enforcing a certain standard of quality.
  • Handling data heterogeneity using intelligent markup of certain searches for routing and search enhancement
  • Results should overcome page rank and be filterable using further parameters from the user
  • Automated generation of metadata on the fly
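
The requirement that results can be filtered using further user parameters can be illustrated with a minimal sketch. The Record fields, the refine helper and the sample hits are all hypothetical and are not taken from the slides; they only show the idea of narrowing a result set by attribute instead of relying on rank alone.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A search hit; the field names are illustrative assumptions."""
    title: str
    doc_type: str
    year: int
    peer_reviewed: bool


def refine(results, **criteria):
    """Keep only records whose attributes equal every user-supplied criterion."""
    return [
        r for r in results
        if all(getattr(r, field) == wanted for field, wanted in criteria.items())
    ]


# Invented sample hits, used only to exercise the filter.
hits = [
    Record("Deep web indexing", "article", 2004, True),
    Record("Library portals", "preprint", 2004, False),
    Record("Federated search", "article", 2001, True),
]

refined = refine(hits, doc_type="article", year=2004)
print([r.title for r in refined])
```

Each extra keyword argument narrows the list further, so a user could start broad and add parameters step by step.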

Federated Academic Network (Indexi)
  • Interoperability amongst heterogeneous digital libraries such as GlOSS and STARTS
  • Searchable Database Markup Language (Search DB-ML), based on XML and independent of the DB (though not widely used)
  • Central approach in which digital libraries are linked to a central service with an XML description of their capabilities
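
As a rough illustration of a central service reading an XML description of a library's capabilities, the sketch below parses a small capability document. The element and attribute names (library, searchable, format) are invented for this example and do not reproduce Search DB-ML or any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical capability description; element and attribute names are invented.
CAPABILITIES_XML = """
<library name="ExampleDL">
  <searchable field="title" operators="contains,equals"/>
  <searchable field="author" operators="equals"/>
  <format>HTML</format>
  <format>PDF</format>
</library>
"""


def load_capabilities(xml_text):
    """Parse a capability description into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    fields = {
        node.get("field"): node.get("operators").split(",")
        for node in root.findall("searchable")
    }
    formats = [node.text for node in root.findall("format")]
    return {"name": root.get("name"), "fields": fields, "formats": formats}


def supports(capabilities, field, operator):
    """Tell whether the library declares the given operator for the given field."""
    return operator in capabilities["fields"].get(field, [])


caps = load_capabilities(CAPABILITIES_XML)
print(supports(caps, "title", "contains"), supports(caps, "author", "contains"))
```

A central service holding such descriptions could decide, per library, which parts of a user query it is able to forward.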

DL Definition Language (Digital Library Language)
  • Extension of search capabilities by describing the APIs of a large number of libraries.
  • Improvements in the tags of DLDL are driven by the following factors:
    • Information already included in the library
    • Access methods
    • Information to be retrieved

Mapping Queries (Federated Approach with XML)
  • XML specification contains the mapping information
  • Generic specifications, together with an included digital library’s behavior, are used to generate the digital library’s XML specification
  • Resulting user interface is simple enough for future developments and modifications
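
The idea that an XML specification carries the mapping information can be sketched as follows. The specification format, the native parameter names (ti, au, py) and the dl.example.org address are assumptions made for illustration only; a real digital library specification would look different.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical mapping specification for one library; all names are invented.
MAPPING_XML = """
<library name="ExampleDL" base="http://dl.example.org/search">
  <map generic="title" native="ti"/>
  <map generic="author" native="au"/>
  <map generic="year" native="py"/>
</library>
"""


def load_mapping(xml_text):
    """Read the base URL and the generic-to-native field mapping from the spec."""
    root = ET.fromstring(xml_text.strip())
    mapping = {m.get("generic"): m.get("native") for m in root.findall("map")}
    return root.get("base"), mapping


def translate(query, base_url, mapping):
    """Rewrite a generic query as the library's native URL.

    Fields the library does not map are dropped.
    """
    native = {mapping[k]: v for k, v in query.items() if k in mapping}
    return base_url + "?" + urlencode(native)


base, mapping = load_mapping(MAPPING_XML)
url = translate(
    {"title": "digital libraries", "author": "Smith", "language": "en"},
    base,
    mapping,
)
print(url)
```

Adding another library then means adding another specification file rather than changing the translation code.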

Integration of Data Sources (Dataflow)

Screenshot 1 : Database Flow

Search Engine Solutions (Advanced Search)

Screenshot 2 : Advanced Search Capabilities

Search Engine Solutions (Results)

Screenshot 3 : Advanced Search Results

Geo-coding and Geo-parsing (Mapping)
  • Unique form of searching for data corresponding to geographical co-ordinates such as latitude and longitude
  • Processing of ingested documents in digital media libraries
  • Place references in the documents are matched against a gazetteer, which ties them to existing latitudes and longitudes
  • Currently marketed by MetaCarta, which allows its search technology to be queried through an XML Web Service for deeper integration with existing applications
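
The gazetteer lookup described above can be sketched in a few lines. This is a toy illustration only, not MetaCarta's method: the three-entry gazetteer, the approximate coordinates and the geoparse helper are all assumptions, and real geo-parsing must also handle multi-word and ambiguous place names.

```python
import re

# Toy gazetteer: place name -> (latitude, longitude). Coordinates are approximate.
GAZETTEER = {
    "norfolk": (36.85, -76.29),
    "berlin": (52.52, 13.40),
    "london": (51.51, -0.13),
}


def geoparse(text, gazetteer=GAZETTEER):
    """Return (place, lat, lon) for each gazetteer entry mentioned in the text."""
    found = []
    seen = set()
    for token in re.findall(r"[A-Za-z]+", text):
        key = token.lower()
        # Report each place once, in order of first mention.
        if key in gazetteer and key not in seen:
            seen.add(key)
            lat, lon = gazetteer[key]
            found.append((key, lat, lon))
    return found


doc = "The archive in Berlin exchanges records with a library in Norfolk."
print(geoparse(doc))
```

Once a document carries coordinates, it can be retrieved by a map region instead of by keywords.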

Geo-coding and Geo-parsing (MetaCarta)

Screenshot 4 : Search results from MetaCarta for USEPA

Worldwide Initiatives (Distributed Content Gateways)
  • German ‘Vascoda’ portal

www.vascoda.de/ & www.vascoda.com

  • Deutsche Forschungsgemeinschaft (DFG)

www.dfg.de/ & www.dfg.de/en/

  • American Research Libraries ‘Scholars Portal’

www.arl.org/arl/pr/scholars_portal.html

  • British Resource Discovery Network (RDN)

www.rdn.ac.uk/partners/

  • European RENARDUS

www.renardus.org

  • North American SCOUT project

www.scout.wisc.edu

Future Developments (Additional Needs)
  • Introduction of template technology to add additional search boxes for user-supplied parameters
  • The search interface should be developed based on the APIs used for the search.
  • Automation and configuration of the gathering and pre-processing of items of interest
  • Ultimate goal is to enable a user to search multiple independent, discretely mounted data sources or databases through one query (in the case of federated systems)
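
The goal of searching several independent sources through one query can be sketched as a simple fan-out and merge. The sources here are in-memory lists standing in for real databases, and the names library_a, library_b, library_c, search_source and federated_search are hypothetical; a real system would call each library's own API instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in sources: in-memory lists instead of real, independently mounted databases.
SOURCES = {
    "library_a": ["Federated search basics", "XML for libraries"],
    "library_b": ["Geo-parsing documents", "Federated search basics"],
    "library_c": ["Metadata generation"],
}


def search_source(name, query):
    """Query one source; a real system would call the library's own API here."""
    return [(title, name) for title in SOURCES[name] if query.lower() in title.lower()]


def federated_search(query):
    """Send one query to every source in parallel and merge the hits.

    Duplicate titles are reported once, credited to the first source listed.
    """
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda name: search_source(name, query), SOURCES))
    merged, seen = [], set()
    for batch in batches:
        for title, source in batch:
            if title not in seen:
                seen.add(title)
                merged.append((title, source))
    return merged


print(federated_search("search"))
```

Querying the sources in parallel rather than one after another keeps the response time close to that of the slowest single source.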

Conclusions
  • Search abilities can take off only when content providers make a concerted effort to enhance information
  • Localized infrastructure for the searches needs to be given priority to advance existing indexes
  • More investment is needed in technologies such as federated searches, improvements in DB-XML, search APIs and SOAP
  • Ease of use (so-called search comfort) needs to be far better.

References (Courtesy: World Wide Web)
  • Searching Digital Libraries – Pros and Cons

http://www.dlib.org/dlib/june04/lossau/06lossau.html

  • Search Engines for Digital Libraries – A Realization

http://www.dlib.org/dlib/september04/lossau/09lossau.html

  • Federated Searches for Libraries

http://www9.org/final-posters/poster17.html & http://en.wikipedia.org/wiki/Federated_search

  • MetaCarta

http://www.metacarta.com

  • Page Rank Citation – Bringing Order to the Web

http://dbpubs.stanford.edu:8090/pub/showDoc.Fulltext?lang=en&doc=1999-66&format=pdf&compression

Questions & Comments

Anybody?
