Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library

Presentation Transcript


  1. Object-Relational Database Applications -- The UC Berkeley Environmental Digital Library University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management Database Management -- Spring 1998 -- R. Larson

  2. Today • Object Relational Database Applications • The Berkeley Digital Library Project • Slides from RRL and Robert Wilensky, EECS • Use of DBMS in DL project.

  3. Final Presentations and Reports • Specifications for final report are on the Web Site under assignments • Presentations (1 on Nov. 28, Others on Nov 30, Dec 5th and 7th (Full))

  4. Today • Object Relational Applications • The UCB Digital Library

  5. Overview • What is a Digital Library? • Overview of Ongoing Research on Information Access in Digital Libraries

  6. Digital Libraries Are Like Traditional Libraries... • Involve large repositories of information (storage, preservation, and access) • Provide information organization and retrieval facilities (categorization, indexing) • Provide access for communities of users (communities may be as large as the general public or as small as the employees of a particular organization)

  7. Traditional Library System • [Diagram with the labels: Originators, Libraries, Users]

  8. But Digital Libraries Are Different From Libraries... • Not a physical location with local copies; objects held closer to originators • Decoupling of storage, organization, access • Enhanced Authoring (origination, annotation, support for work groups) • Subscription, pay-per-view supported in addition to “free” browsing. • Integration into user tasks.

  9. A Digital Library Infrastructure Model • [Diagram with the labels: Originators, Index Services, Repositories, Network, Users]

  10. UC Berkeley Digital Library Project • Focus: Work-centered digital information services • Testbed: Digital Library for the California Environment • Research: Technical agenda supporting user-oriented access to large distributed collections of diverse data types. • Part of the NSF/NASA/DARPA Digital Library Initiative (Phases 1 and 2)

  11. UCB Digital Library Project: Research Organizations • UC Berkeley EECS, SIMS, CED, IS&T • UCOP • Xerox PARC’s Document Image Decoding group and Work Practices group • Hewlett-Packard • NEC • SUN Microsystems • IBM Almaden • Microsoft • Ricoh California Research • Philips Research

  12. Testbed: An Environmental Digital Library • Collection: Diverse material relevant to California’s key habitats. • Users: A consortium of state agencies, development corporations, private corporations, regional government alliances, educational institutions, and libraries. • Potential: Impact on state-wide environmental system (CERES)

  13. The Environmental Library - Users/Contributors • California Resources Agency, California Environment Resources Evaluation System (CERES) • California Department of Water Resources • The California Department of Fish & Game • SANDAG • UC Water Resources Center Archives • New Partners: CDL and SDSC

  14. The Environmental Library - Contents • Environmental technical reports, bulletins, etc. • County general plans • Aerial and ground photography • USGS topographic maps • Land use and other special purpose maps • Sensor data • “Derived” information • Collection databases for the classification and distribution of the California biota (e.g., SMASCH) • Supporting 3-D, economic, traffic, etc. models • Videos collected by the California Resources Agency

  15. The Environmental Library - Contents • As of late 2000, the collection represents about one terabyte of data, including over 165,000 digital images, about 300,000 pages of environmental documents, and nearly 2 million records in geographical and botanical databases.

  16. Botanical Data: • The CalFlora Database contains taxonomical and distribution information for more than 8000 native California plants. The Occurrence Database includes over 600,000 records of California plant sightings from many federal, state, and private sources. The botanical databases are linked to our CalPhotos collection of California plants, and are also linked to external collections of data, maps, and photos.

  17. Geographical Data: • Much of the geographical data in our collection is being used to develop our web-based GIS Viewer. The Street Finder uses 500,000 Tiger records of S.F. Bay Area streets along with 70,000 records from the USGS GNIS database. California Dams is a database of information about the 1395 dams under state jurisdiction. An additional 11 GB of geographical data represents maps and imagery that have been processed for inclusion as layers in our GIS Viewer. This includes Digital Ortho Quads and DRG maps for the S.F. Bay Area.

  18. Documents: • Most of the 300,000 pages of digital documents are environmental reports and plans that were provided by California state agencies. This collection includes documents, maps, articles, and reports on the California environment including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and county plans. Documents in this collection come from the California Department of Water Resources (DWR), California Department of Fish and Game (DFG), San Diego Association of Governments (SANDAG), and many other agencies. Among the most frequently accessed documents are County General Plans for every California county and a survey of 125 Sacramento Delta fish species.

  19. Documents - cont. • The collection also includes about 20Mb of full-text (HTML) documents from the World Conservation Digital Library. In addition to providing online access to important environmental documents, the document collection is the testbed for our Multivalent Document research.

  20. Testbed Success Stories • LUPIN: CERES’ Land Use Planning Information Network • California County General Plans and other environmental documents. • Enter at Resources Agency Server, documents stored at and retrieved from UCB DLIB server. • California flood relief efforts • High demand for some data sets only available on our server (created by document recognition). • CalFlora: Creation and interoperation of repositories pertaining to plant biology. • Cloning of services at Cal State Library, FBI

  21. Research Highlights • Documents • Multivalent Document prototype • Page images, structured documents, GIS data, photographs • Intelligent Access to Content • Document recognition • Vision-based Image Retrieval: stuff, thing, scene retrieval • Natural Language Processing: categorizing the web, Cheshire II, TileBar Interfaces

  22. Multivalent Documents • MVD Model • radically distributed, open, extensible • “behaviors” and “layers” • behaviors conform to a protocol suite • inter-operation via “IDEG” • Applied to “enlivening legacy documents” • various nice behaviors, e.g., lenses

  23. Document Presentation • Problem: Digital libraries must deliver digital documents -- but in what form? • Different forms have advantages for particular purposes • Retrieval • Reuse • Content Analysis • Storage and archiving • Combining forms (Multivalent documents)

  24. Spectrum of Digital Document Representations • Adapted from Fox, E.A., et al. “Users, User Interfaces and Objects: Envision, an Electronic Library”, JASIS 44(8), 1993

  25. Document Representation: Multivalent Documents • Primary user interface/document model for UCB Digital Library (Wilensky & Phelps) • Goal: An approach to new document representations and their authoring. • Supports active, distributed, composable transformations of multimedia documents. • Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.
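
The idea of composable behaviors acting on document layers can be illustrated with a minimal sketch. This is not the MVD implementation or its API: the names `Layer`, `Document`, `show_ocr_text`, and `word_count_annotation` are hypothetical, and the layer contents are made-up stand-ins. The sketch only shows how independent behaviors might each read the layers they need and contribute to the presentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is an illustration of the
# "layers" and "behaviors" idea, not the MVD API.


@dataclass
class Layer:
    name: str     # e.g. "image" or "ocr"
    content: str  # stand-in for the layer's actual data


# A behavior receives the whole document and returns something to display.
Behavior = Callable[["Document"], str]


@dataclass
class Document:
    layers: Dict[str, Layer] = field(default_factory=dict)
    behaviors: List[Behavior] = field(default_factory=list)

    def add_layer(self, layer: Layer) -> None:
        self.layers[layer.name] = layer

    def render(self) -> List[str]:
        # Behaviors are applied independently of one another, so new
        # ones can be added without changing the existing ones.
        return [behavior(self) for behavior in self.behaviors]


def show_ocr_text(doc: Document) -> str:
    return doc.layers["ocr"].content


def word_count_annotation(doc: Document) -> str:
    words = doc.layers["ocr"].content.split()
    return f"annotation: {len(words)} words"


doc = Document()
doc.add_layer(Layer("image", "<page bitmap placeholder>"))
doc.add_layer(Layer("ocr", "County general plan for Alameda"))
doc.behaviors.extend([show_ocr_text, word_count_annotation])
print(doc.render())
```

Keeping behaviors as plain callables is one simple way to make them composable; a real system would also need a protocol for ordering and for behaviors that modify what other behaviors see.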

  26. Multivalent Documents • Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate). Webster’s 7th Collegiate Dictionary • [Figure: a Scanned Page Image shown with the layers OCR Mapping Layer, OCR Layer, Table Layer, GIS Layer, and Cheshire Layer, together with Network Protocols & Resources]

  29. MVD Third Party Work • Japanese support by NEC; application to office document management • Printing, support for other OCR formats, by HP • Chinese character and multilingual lens by UCB Instructional Support staff (Owen McGrath) • Automatic enlivening of documents via Transcend proxy.

  30. MVD Forthcoming • Support for XML + style sheets • More robust parsing • Saving where you want • Media adaptors for • Continuous media • Near image formats, word proc. formats • Improve authoring tools • Interoperation with paper • Application versus applet? • Release to community, get feedback, iterate.

  31. GIS in the MVD Framework • Layers are georeferenced data sets. • Behaviors are • display semi-transparently • pan • zoom • issue query • display context • “spatial hyperlinks” • annotations • Written in Java (to be merged with MVD-1 code line?)
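
A rough sketch of how georeferenced layers and the pan and zoom behaviors listed above might fit together follows. The viewer itself is written in Java; this Python sketch is only an illustration, and the class names `GeoLayer` and `Viewport`, the layer names, and all coordinates are assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GeoLayer:
    """A georeferenced data set described by its bounding box (degrees)."""
    name: str
    west: float
    south: float
    east: float
    north: float
    opacity: float = 1.0  # values below 1.0 mean semi-transparent display


@dataclass
class Viewport:
    """The region of the map currently on screen."""
    west: float
    south: float
    east: float
    north: float

    def pan(self, dx: float, dy: float) -> None:
        # Shift the window without changing its size.
        self.west += dx
        self.east += dx
        self.south += dy
        self.north += dy

    def zoom(self, factor: float) -> None:
        # factor > 1 zooms in, factor < 1 zooms out, around the center.
        if factor <= 0:
            raise ValueError("zoom factor must be positive")
        cx = (self.west + self.east) / 2
        cy = (self.south + self.north) / 2
        half_w = (self.east - self.west) / (2 * factor)
        half_h = (self.north - self.south) / (2 * factor)
        self.west, self.east = cx - half_w, cx + half_w
        self.south, self.north = cy - half_h, cy + half_h

    def visible(self, layers: List[GeoLayer]) -> List[str]:
        # A layer is drawn only if its bounding box overlaps the window.
        return [
            layer.name
            for layer in layers
            if layer.west < self.east and layer.east > self.west
            and layer.south < self.north and layer.north > self.south
        ]


# Made-up layers and coordinates, for illustration only.
layers = [
    GeoLayer("ortho imagery", -122.6, 37.6, -122.2, 37.9),
    GeoLayer("dams", -121.5, 38.5, -121.0, 39.0, opacity=0.5),
]
view = Viewport(-123.0, 37.0, -122.0, 38.0)
before_pan = view.visible(layers)
view.pan(1.0, 1.0)
after_pan = view.visible(layers)
view.zoom(2.0)
print(before_pan, after_pan, view)
```

Treating the viewport as the only mutable state keeps pan and zoom independent of the layers, which is one way to let the same behaviors apply to any georeferenced data set.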

  32. GIS Viewer: Recent Developments • Annotation and saving • points, rectangles (w. labels and links), vectors • saving of annotations as separate layer • Integration with address, street finding, gazetteer services • Application to image viewing: tilePix • Castanet client

  36. GIS Viewer Example • http://elib.cs.berkeley.edu/annotations/gis/buildings.html

  37. Geographic Information: Plans and Ideas • More annotations, flexible saving • Support for large vector data sets • Interoperability • On-the-fly • conversion of formats • generation of “catalogs” • Via OGDI/GLTP • Experimenting with various CERES servers

  38. Documents: Information from scanned documents • Built document recognizers for some important documents, e.g. “Bulletin 17”, “TR-9”. • Recognized document structure, with order-of-magnitude better OCR. • Automatically generated a 1395-item relational database of dams. • Enabled access via forms, map interfaces. • Enabled interoperation with image DB.
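
To make the idea of form and map access to a relational dam database concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, column names, function names, and all rows are assumptions invented for the example; the slides do not describe the actual schema or the DBMS queries used by the project.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical dam schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE dams (
            id        INTEGER PRIMARY KEY,
            name      TEXT NOT NULL,
            county    TEXT NOT NULL,
            height_ft REAL,
            lat       REAL,
            lon       REAL
        )
        """
    )
    # Placeholder rows, not real dam records.
    placeholder_rows = [
        ("Example Dam A", "County X", 120.0, 38.0, -121.0),
        ("Example Dam B", "County X", 45.0, 38.2, -121.3),
        ("Example Dam C", "County Y", 300.0, 36.5, -119.0),
    ]
    conn.executemany(
        "INSERT INTO dams (name, county, height_ft, lat, lon) "
        "VALUES (?, ?, ?, ?, ?)",
        placeholder_rows,
    )
    return conn


def dams_in_county(conn: sqlite3.Connection, county: str) -> List[str]:
    """Form-style access: the county comes from a user-filled field."""
    cur = conn.execute(
        "SELECT name FROM dams WHERE county = ? ORDER BY name", (county,)
    )
    return [row[0] for row in cur]


def dams_in_box(conn: sqlite3.Connection, west: float, south: float,
                east: float, north: float) -> List[str]:
    """Map-style access: the box comes from a region selected on a map."""
    cur = conn.execute(
        "SELECT name FROM dams "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? ORDER BY name",
        (west, east, south, north),
    )
    return [row[0] for row in cur]


conn = build_demo_db()
print(dams_in_county(conn, "County X"))
print(dams_in_box(conn, -120.0, 36.0, -118.0, 37.0))
```

Both access paths use parameterized queries, so values typed into a form or taken from a map selection are never spliced into the SQL text.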

  39. Document Recognition: Future Plans • Document recognizers: for ~ dozen document types • Development and integration of mathematical OCR and recognition. • Eventually produce document recognizer generator, i.e., make it easier to write recognizers.

  40. Vision-Based Image Retrieval Find objects by grouping coherent low-level properties • Stuff-based queries: “blobs” • Basic blobs: colors, sizes, variable number • demonstrated utility for interesting queries • “Blob world”: Above plus texture, applied to • retrieving similar images • successful learning scene classifier • Thing-finding: Successfully deployed detectors adding body plans (adding shape, geometry and kinematic constraints)
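
A toy sketch of a "stuff"-based query over blobs is given below. It is not the project's retrieval algorithm: it assumes each image has already been reduced to a few blobs described only by mean color and relative size, and it substitutes a plain Euclidean color distance with a size penalty for whatever matching the real system uses. The names `Blob`, `blob_distance`, and `rank_images` and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Blob:
    color: Tuple[float, float, float]  # mean RGB, each channel in 0..1
    size: float                        # fraction of the image area


def blob_distance(a: Blob, b: Blob, size_weight: float = 0.5) -> float:
    # Assumed scoring: color difference plus a weighted size difference.
    return dist(a.color, b.color) + size_weight * abs(a.size - b.size)


def rank_images(query: Blob, images: Dict[str, List[Blob]]) -> List[str]:
    # An image scores as well as its best-matching blob; lower is better.
    # Every image is assumed to have at least one blob.
    scores = {
        name: min(blob_distance(query, blob) for blob in blobs)
        for name, blobs in images.items()
    }
    return sorted(scores, key=lambda name: scores[name])


# Made-up image descriptions, for illustration only.
images = {
    "lake": [Blob((0.1, 0.3, 0.8), 0.7), Blob((0.5, 0.5, 0.5), 0.3)],
    "poppy_field": [Blob((0.9, 0.5, 0.1), 0.4), Blob((0.2, 0.6, 0.2), 0.6)],
}
orange_query = Blob((1.0, 0.5, 0.0), 0.5)
ranking = rank_images(orange_query, images)
print(ranking)
```

Scoring an image by its single best blob matches the intuition of asking for "an image containing a large orange region"; queries over several blobs would need a rule for combining per-blob scores.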

  41. Image Retrieval Research • Finding “Stuff” vs “Things” • BlobWorld • Other Vision Research

  42. (Old “stuff”-based image retrieval: Query)

  43. (Old “stuff”-based image retrieval: Result)

  44. Blobworld: use regions for retrieval • We want to find general objects • Represent images based on coherent regions

  45. (“Thing”-based image retrieval using “body plans”: Result)