

GIR-WG @ OGF19

Grid Information Retrieval Working Group

January 30, 2007

Chapel Hill, NC


Agenda

  • IP Policy reminder

  • Introduce participants

  • GIR-WG charter & overview

  • GIR document status review

  • Reference implementations

  • Mention of related work elsewhere

  • Paul Kim presentation

  • Chris Fallen presentation

  • Discussion

Session Particulars

  • OGF IP policies apply

  • GIR-WG chairs:

    • Dr. Greg Newby, Arctic Region Supercomputing Center

    • Dr. Paul Yangwoo Kim, Dongguk U.

    • Nassib Nassar, RENCI

What is GIR-WG?

  • GIR-WG was chartered by OGF to develop standards and reference implementations for information retrieval (IR) on computational grids.

  • GIR-WG has published a Requirements document under GGF (GFD-I.027)

  • Our first Experimental document was published recently (GFD-E.082)

  • Progress on the Architecture document is dormant, awaiting practical experience

  • Practical experience is being gained, and will result in at least further experimental documents.

What is Information Retrieval?

  • IR is the science and method of delivering documents that are relevant to human information needs.

  • Rather than delivering sets of matching documents (as DBMS do), IR systems rank matching documents.

  • IR systems usually focus on textual input data (a.k.a. natural language), either unformatted or formatted (plain text, HTML, XML, etc.).
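The contrast between set-based matching and ranked retrieval can be sketched in a few lines of Python. The toy documents and the raw term-count scoring are illustrative assumptions only; production IR systems use more refined term weighting.

```python
from collections import Counter

# Hypothetical toy collection, used only for illustration.
DOCS = {
    "d1": "grid computing for information retrieval",
    "d2": "retrieval of grid grid grid resources",
    "d3": "database management systems",
}

def boolean_match(query, docs):
    """DBMS-style: return the unordered set of documents containing every query term."""
    terms = query.lower().split()
    return {doc_id for doc_id, text in docs.items()
            if all(t in text.lower().split() for t in terms)}

def ranked_match(query, docs):
    """IR-style: score each document by query-term frequency, best first."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Sort by descending score; break ties by document id for determinism.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Both functions see the same matching documents for the query "grid retrieval", but only `ranked_match` tells the user which one to read first.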

GIR-WG Charter

  • The GIR WG will establish a specific set of requirements, an architecture, and detailed specifications for Information Retrieval (IR) on computational grids. GIR will provide document collection management, indexing/searching, and query processing services to grid users and applications.

  • GIR Milestones:

    • GIR Requirements Document - Stakeholder-driven list of service-level requirements for building a grid-based IR system. Published in 2005 as GFD-I.027.

    • GIR Architecture Document - Describes the overall system, composed of integrated grid services, scenarios, etc. Draft under consideration since 2004; based on Experimental document outcomes, final version is expected in 2007.

    • Experimental Documents - Experiences with GIR implementations or partial implementations (query processors, indexers, collection managers...). GFD-E.082 in 2006; others under consideration.

    • GIR Recommendation Draft Document - Describes each service in detail, with sections for different implementation platforms (such as Web Services, Grid Services, standalone...). Draft is expected after the Architecture document, in 2008.

    • GIR Recommendation Final Document - After the Draft Recommendation, based on independent interoperable implementations and further practical experiences. Within 2 years of the Draft Recommendation.

Why IR is a good candidate for Grid computing

  • Excellent for “divide and conquer” coarse-grained parallelism

    • Input items are discrete

    • Coordination across subsets of a document collection can be minimal

    • Results from multiple sources can be coordinated and relevance ranked together

    • Queries may be handled independently
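The coarse-grained pattern in the list above can be sketched as follows. This is a minimal illustration, not a GIR reference implementation: local threads stand in for grid nodes, each partition is a hypothetical in-memory dictionary, and the term-count score is an assumption chosen for brevity.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_partition(partition, query):
    """Score one subset of the collection independently of all others."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in partition.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            hits.append((score, doc_id))
    return hits

def distributed_search(partitions, query, k=3):
    """Fan the query out to every partition, then rank the combined hits."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_partition = list(pool.map(lambda p: search_partition(p, query), partitions))
    # The only coordination step: merge the partial results and keep the top k.
    all_hits = [hit for hits in per_partition for hit in hits]
    top = heapq.nlargest(k, all_hits)
    return [(doc_id, score) for score, doc_id in top]
```

The partitions never communicate with each other; the single merge at the end is what makes the workload a good fit for loosely coupled resources.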

Significant Progress

  • Documents:

    • “GIR Requirements” published

    • “GIR Architecture” in mid-draft (dormant)

    • Experimental document: published

  • Implementation:

    • MCNC released a technology preview

    • Kim’s work: an experimental document

    • Newby’s work: heading to an experimental document

    • Nassar’s work: Sarcomere & Amberfish, open source toolkit based on GT4

    • Fallen & Newby distributed IR research

Requirements overview (per GFD-I.027)

  • Desirability of Grid infrastructure for IR, notably enterprise IR:

    • VO (for security, segmentation)

    • Conceptual separation of functions (for indexing, collection management & query processing)

    • Flexible but coarse-grained flow of control among elements

    • Persistence of queries, collections and indexes

  • Three primary components :

    • Collection manager: handles input gathering, transformation, transport, staging and delivery

    • Indexer: builds the core information retrieval representation of a collection

    • Query processor: responds to user needs, including standing information needs (i.e., information filtering)
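One way to picture the separation of the three components is the sketch below. All class and method names are hypothetical and are not taken from GFD-I.027; each component is reduced to an in-memory toy so the division of responsibilities stays visible.

```python
from collections import defaultdict

class CollectionManager:
    """Gathers raw inputs and delivers normalized documents."""
    def __init__(self):
        self._docs = {}

    def add(self, doc_id, raw_text):
        # "Transformation" here is only case and whitespace normalization.
        self._docs[doc_id] = " ".join(raw_text.lower().split())

    def deliver(self):
        return dict(self._docs)

class Indexer:
    """Builds an inverted index: term -> {doc_id: term frequency}."""
    def build(self, docs):
        index = defaultdict(dict)
        for doc_id, text in docs.items():
            for term in text.split():
                index[term][doc_id] = index[term].get(doc_id, 0) + 1
        return dict(index)

class QueryProcessor:
    """Answers ad hoc queries and keeps standing queries for filtering."""
    def __init__(self, index):
        self._index = index
        self._standing = {}

    def search(self, query):
        scores = defaultdict(int)
        for term in query.lower().split():
            for doc_id, tf in self._index.get(term, {}).items():
                scores[doc_id] += tf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

    def register_standing_query(self, name, query):
        self._standing[name] = set(query.lower().split())

    def filter_new_document(self, text):
        """Return the names of standing queries that a new document matches."""
        words = set(text.lower().split())
        return sorted(n for n, terms in self._standing.items() if terms & words)
```

Because each class only consumes the output of the previous one, the three roles could in principle run on different nodes and exchange data through whatever transport the grid provides.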

Implementation Approaches

  • Do not rely on particular implementations or middleware (e.g., Globus)

  • Pursue different types of Grid implementations:

    • Minimalist, home grown

    • Globus-based

    • Pure Web services

  • Each approach can be a separate Experimental document and will appear as an appendix in the Architecture document

GFD-E.082

  • Kim: Grid Information Retrieval System for Dynamically Reconfigurable Virtual Organization

  • Practical experience with reallocation of GIR nodes based on system load

    • A node can serve as indexer, collection manager or query processor, based on system load

    • Dynamic reallocation of nodes within a computational grid
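The idea of reassigning node roles according to load can be illustrated with a simple greedy policy. This is not the algorithm described in GFD-E.082; the role names, the queue-length load metric and the policy itself are assumptions made for the sake of a small example.

```python
ROLES = ("collection_manager", "indexer", "query_processor")

def reallocate(nodes, pending):
    """Assign each node a role, favoring roles with more pending work.

    nodes:   list of node names
    pending: dict mapping role -> number of queued tasks (hypothetical load metric)
    """
    if len(nodes) < len(ROLES):
        raise ValueError("need at least one node per role")
    # Every role keeps at least one node so that no service disappears.
    counts = {role: 1 for role in ROLES}
    spare = len(nodes) - len(ROLES)
    # Hand out spare nodes one at a time to the role with the most
    # pending work per node already assigned to it.
    for _ in range(spare):
        busiest = max(ROLES, key=lambda r: pending.get(r, 0) / counts[r])
        counts[busiest] += 1
    assignment, remaining = {}, list(nodes)
    for role in ROLES:
        for _ in range(counts[role]):
            assignment[remaining.pop(0)] = role
    return assignment
```

Calling `reallocate` again with fresh queue lengths yields a new assignment, which is the sense in which the allocation is dynamic.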

Nassar: Sarcomere

See http://sourceforge.net/projects/sarcomere/

  • Sarcomere calls a collection of documents a "database". One or more "indexes" can be created per database. Each index represents an access point for searching the document collection. In theory, indexes can differ in how they constrain the queries (e.g. by fields), what kind of data structures are used, etc. At the moment only Amberfish full text indexes are supported (index type = "Amberfish").

  • Current port types (very rudimentary and highly subject to change):

    • createDatabase

    • deleteDatabase

    • createIndex

    • deleteIndex

    • addDocument

    • Search

  • Stay tuned for more developments!
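To show how the database, index and document operations listed above fit together, here is a hypothetical in-memory stand-in written in Python. It is not the Sarcomere API: the method signatures, return values and the naive term matching are all assumptions, and only the operation names and the "Amberfish" index type come from the slide.

```python
class ToyRetrievalService:
    """In-memory illustration of the listed operations (not Sarcomere itself)."""

    def __init__(self):
        # database name -> {"docs": [text, ...], "indexes": {index name: index type}}
        self._databases = {}

    def create_database(self, name):
        if name in self._databases:
            raise ValueError(f"database {name!r} already exists")
        self._databases[name] = {"docs": [], "indexes": {}}

    def delete_database(self, name):
        del self._databases[name]

    def create_index(self, database, index_name, index_type="Amberfish"):
        self._databases[database]["indexes"][index_name] = index_type

    def delete_index(self, database, index_name):
        del self._databases[database]["indexes"][index_name]

    def add_document(self, database, text):
        docs = self._databases[database]["docs"]
        docs.append(text)
        return len(docs) - 1  # hypothetical document id

    def search(self, database, index_name, query):
        db = self._databases[database]
        if index_name not in db["indexes"]:
            raise KeyError(f"no index {index_name!r} in database {database!r}")
        terms = query.lower().split()
        # Naive full-text match: every query term must occur in the document.
        return [doc_id for doc_id, text in enumerate(db["docs"])
                if all(t in text.lower().split() for t in terms)]
```

The search call names an index rather than just a database, mirroring the slide's point that each index is a separate access point to the same collection.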

Newby: Multisearch

  • How can we merge result sets from different IR engines?

    • Desire to merge based on global relevance

    • Challenging because different IR engines have different scoring/ranking algorithms

    • Challenging because different collections have different characteristics, influencing ranking

  • Used for TREC by Fallen & Newby 2005, 2006

Simple interface to an Axis/Tomcat backend

  • Results are merged based on statistical normalization

  • No accounting for different IR engines or different collections

    • Simplifying assumption: all IR rankings come from the same basic distribution
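A merge based on statistical normalization could look like the following sketch. The slide does not say which normalization Multisearch uses; z-score normalization is assumed here purely as a common example, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def zscore_normalize(results):
    """Map one engine's raw scores to z-scores (mean 0, standard deviation 1)."""
    scores = [score for _, score in results]
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no ranking information to preserve.
        return [(doc_id, 0.0) for doc_id, _ in results]
    return [(doc_id, (score - mu) / sigma) for doc_id, score in results]

def merge(result_lists):
    """Normalize each engine's result list, then rank all documents together."""
    merged = []
    for results in result_lists:
        if results:
            merged.extend(zscore_normalize(results))
    return sorted(merged, key=lambda pair: (-pair[1], pair[0]))
```

After normalization an engine that scores in the hundreds and one that scores in single digits become directly comparable, which is exactly where the same-distribution assumption enters: nothing corrects for one engine or collection being systematically better than another.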

Opportunities for Interaction

  • OGSA-DAI provides middleware for basic query and result set transport

  • Search from multiple databases; add a higher-level merger

  • Seems promising for GIR!

    • http://www.ogsadai.org.uk

Discussion of GIR-WG

  • Your questions, thoughts and suggestions

Get Involved!

  • Visit http://www.gir-wg.org

  • Subscribe to [email protected]

  • Talk with the chairs about data, reference implementations and documents
