1 / 19

The Niagara Project

The Niagara Project. “I have avoided networking like the plague. I am terrified of getting [a connection] because it’s like drinking from Niagara Falls.” - Arthur C. Clarke. Who is working on Niagara?. Professors: DeWitt and Naughton @ UW, Maier @ OGI Students: Lots of them!

alaura
Download Presentation

The Niagara Project

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Niagara Project “I have avoided networking like the plague. I am terrified of getting [a connection] because it’s like drinking from Niagara Falls.” - Arthur C. Clarke

  2. Who is working on Niagara? • Professors: DeWitt and Naughton @ UW, Maier @ OGI • Students: Lots of them! • See http://www.cs.wisc.edu/niagara

  3. Goals of the Niagara Project: • In broadest terms, to: • improve the precision of Internet searching • allow queries over the whole Internet (the “FROM *” clause) • work over streams as well as static files • monitor the Internet for changes • Not finished yet...

  4. Current status: • Completed three java prototypes: • A “text-in-context” XML search engine. • An XML-QL query engine. • An XML-QL trigger engine. • Doing the same thing (again) in C++, maybe with Quilt as query language. • Finding (solving?) interesting research problems along the way...

  5. Text-in-Context XML SE • Rather than ask: What are all the documents that contain the string “Montreal”? We can ask: What are all the documents that contain ship departure information for a ship whose name is “Montreal”?

  6. How it works: • Locate documents by crawling the web or using explicit input from user. • Build local index on these docs that supports fast evaluation of Search Engine Query Language (SEQL) queries. • Return URL’s of documents that satisfy SEQL queries. • Two uses: stand alone, or part of XML-QL

  7. XML-QL Query Engine • Evaluates queries expressed in XML-QL. • Result is XML • Different from Search Engine: Instead of asking: Find all files with ship departure events where the ship’s name is “Montreal”? We can ask What is a list of departure dates for ships named “Montreal”?

  8. Ex: Fragment of XML file... <department> <deptname> Electrical Engineering </deptname> <faculty> <name> <lastname>Robertson</lastname> <firstname>Pedro</firstname> </name> <phone>6988086</phone> <email>Robertson.Pedro@foo.edu</email> <office>660</office> </faculty> </department>

  9. XML-QL Query... WHERE <department> <deptname>"Electrical Engineering"</> <faculty> <name> <lastname> $v2 </> <firstname> $v3 </> </> </> content_as $v4 CONSTRUCT <fname> $v4 </>

  10. Important Question • Which documents should be consulted to answer an XML-QL query? We support three approaches: • explicitly listed documents (“in foo.xml”) • documents conforming to DTD (“conforms to some_dtd.xml”) • documents that satisfy search engine predicates extracted from query

  11. Example of third approach: • Given the previous XML-QL query finding first and last names of EE faculty members, the system will extract this Search Engine query: department CONTAINS (deptname IS "Electrical Engineering" AND faculty CONTAINS name CONTAINS (lastname AND firstname))

  12. Control Flow for Typical Query • So full flow of typical XML-QL query: • user submits XML-QL query • system extracts SEQL query from XML-QL, passes it to search engine • search engine evaluates SEQL query, returns list of URLs to XML-QL query engine • XML-QL engine fetches documents from URL list, evaluates query • Answer returned to the user.

  13. XML-QL Trigger Engine • Goal: • allow users to define “triggers” on XML files using XML-QL predicates. • Scale to huge numbers of triggers by exploiting commonality among sets of triggers.

  14. Some research topics... • Semantics and impl. of queries over streams? • Use RDBMS for anything at all? • How smart should the search engine be? • Can you use caching anywhere? • Query optimization: plan space, stats? • How should you index (cached?) XML? • What do you do with queryable sources? • How do you handle huge #s of triggers? • Performance, performance, performance.

  15. A Petabyte in your Pocket David DeWitt, Dave Maier @ OGI, Jeff Naughton

  16. Title of NSF ITR Project • What does it mean? • Goal is to have, available from a PDA, your evolving and customized view of all the on-line digital data that exists anywhere. • Goal is not to develop holographic memory technology or DNA-based storage units.

  17. What the PetDB is: • An example of what can be done with new software infrastructure termed “Net Data Managers” (NDMs.) • NDMs: • focus on data movement as well as storage • store and query data of arbitrary types without a schema having been defined • execute queries and triggers over tens of thousands of information sites

  18. Connection with Niagara... • Niagara is a very early prototype of a simple NDM. • Project goal: • continue developing Niagara and working on research problems that arise • prototype a simple NDM application using Niagara to see if we are on the right track

  19. For more information... • Web site http://www.cs.wisc.edu/niagara • Talk to me or any other Niagara project member...

More Related