from geospatial data to knowledge the service framework implementation and research issues l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
From Geospatial data to Knowledge—The Service Framework, Implementation, and Research Issues PowerPoint Presentation
Download Presentation
From Geospatial data to Knowledge—The Service Framework, Implementation, and Research Issues

Loading in 2 Seconds...

play fullscreen
1 / 33

From Geospatial data to Knowledge—The Service Framework, Implementation, and Research Issues - PowerPoint PPT Presentation


  • 168 Views
  • Uploaded on

From Geospatial data to Knowledge—The Service Framework, Implementation, and Research Issues Liping Di Laboratory for Advanced Information Technology and Standards (LAITS) George Mason University Introduction Geospatial data is the major type of data that human beings has collected.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'From Geospatial data to Knowledge—The Service Framework, Implementation, and Research Issues' - adamdaniel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
from geospatial data to knowledge the service framework implementation and research issues

From Geospatial data to Knowledge—The Service Framework, Implementation, and Research Issues

Liping Di

Laboratory for Advanced Information Technology and Standards (LAITS)

George Mason University

introduction
Introduction
  • Geospatial data is the major type of data that human beings has collected.
    • more than 80% of the data are geospatial data.
  • Image/gridded data is dominant form of geospatial data in terms of volume.
    • Most of those data are collected by the EO community.
  • Geospatial data will grow to ~exabyte very soon.
    • NASA EOSDIS has more than one petabyte of data in archives; more than 2 terabytes per day of new data are added.
    • Application data centers: 10’s of terabytes of imagery
    • Tens of thousands of datasets on-line now.
  • How to effectively, wisely, and easily use the geospatial data is the key information technology issue that we have to solve.
the problems
The Problems
  • In order for the geospatial data to be useful, they have to be converted to user-specific information and knowledge.
  • However, the conversion requires:
    • Significant amount of knowledge
      • Domain knowledge for information/knowledge extraction from raw data
      • Domain knowledge on the geospatial data processing/formats
    • Significant amount of computer hardware and software resources.
    • As a result, currently the use of geospatial data is very expensive
  • Most geo-imagery will never be directly analysed by humans
    • Human attention is the scarce resource, insufficient to analyse petabytes of geospatial data.
    • Many datasets have not been analysed once before they are archived.
  • The fundamental problem is that current data and information systems running by EO agencies only can provide data at best, not the user-specific information and knowledge.
    • Rich in geospatial data but poor in up-to-date geospatial information and knowledge.
what users need
What Users Need
  • The ready-to-use geospatial information and knowledge that can answer the specific application questions of the end users.
  • Who cares about if or not the answer is derived from Landsat TM, or SPOT, or field observations, as long as the users can obtain the right answer and accuracy of it easily from geospatial information systems.
  • Instead of only experts can use the geospatial data, everyone, from students to decision-makers can obtain and use the geospatial information and knowledge easily.
what we need to do
What We Need to Do
  • As geospatial technology developers, we need to:
    • Make the geospatial information the mainstream information so that anyone can easily obtain the ready-to-use geoinformation and knowledge if they want.
  • What we need to develop is a system that can automatically convert the geospatial data to user-specific geoinformation and knowledge.
    • Automate the process from geospatial data to information to knowledge.
steps from geospatial data to information knowledge
Steps from Geospatial Data to Information & Knowledge
  • Three steps in the processes for conversion of geospatial data into the information and knowledge
    • Geoquery: to discover the data needed for the project and obtain the data.
    • Geo-assembly: assemble data from multiple sources into a homogeneous form for analysis.
    • Geo-analysis/modeling: geospatial information extraction (GIE) and geospatial knowledge discovery (GKD).
  • In order to automate the process of conversion, we need to automate all of the three steps.
the service oriented architecture soa
The Service-Oriented Architecture (SOA)
  • The key component in the service-oriented architecture is services
  • A service is a well-defined set of actions. It is self-contained, stateless, and does not depend on the state of other services.
  • Stateless means that each time a consumer interacts with a Web Service, an action is performed. After the results of the service invocation have been returned, the action is finished. There is no assumption that subsequent invocations are associated with prior ones.
  • In the service-oriented architecture, the description of a service is essentially a description of the messages that are exchanged between the consumer and the service.
  • Standard-based individual services can been chained together to solve complex tasks.
  • The implementation of SOA in the web environment is called Web services.
web services
Web Services
  • Web Services are self-contained, self-describing, modular applications that can be published, located, and dynamically invoked across the Web.
  • Web services perform functions, which can be anything from simple requests to complicated business processes.
  • Once a Web service is deployed, other applications (and other Web services) can discover and invoke the deployed service.
  • The real power of web services relies on
    • Everyone on the Internet can set up a web service to provide service to anyone who wants—many services will be available.
    • The standard-based services can be chained together dynamically to solve complicated tasks – Just in-time integration.
the difference between web service and open grid service
The Difference between Web Service and Open Grid Service
  • Globus 3.0 implemented the Open Grid Service Architecture.
  • The fundamental concepts of services in the Grid are the same as Web services.
  • The differences between Grid and Web services include
    • A Web service can be invoked by any consumer over the Web while a Grid service can only be invoked by consumers within the virtual organization, similar to the difference between Internet and Intranet.
    • Web services practice has been extended in Grid to accommodate the additional requirements of Grid services
      • Stateful interactions between consumers and services
      • Exposure of a web service’s “publicly visible state”
      • Access to (possibly large amounts of) identifiable data
      • Service lifetime management
approach for the automation of gkd
Approach for the automation of GKD
  • In order for automating the process of geoinformation extraction and KD, all steps in the geospatial KD has to be automated.
  • The fundamental requirement for the automation of GKD is that all data and associated metadata have to be on-line.
  • Geoquery (including the data access): The OGC standards on data discovery and on-line access allow for automation of geoquery and data access.
  • Geo-assembly: The automated data transformation services have been implemented as part of data access that allows to obtain data from multiple sources in a ready-to-analyze homogenous form through interoperable, personalized, on-demand data access and services (IPODAS).
  • Geo-analysis/modeling: Automation of GKD through the approach of automated web service chaining is proposed.
geo object geo tree virtual dataset geospatial models
Geo-object, Geo-tree, Virtual Dataset, Geospatial Models

modeling and virtual data services

no service

data service

User Requested

User Obtained

archived geo-object

user geo-object

Geospatial web/Grid services

Intermediate geo-object

Automated data transformation service(WCS/WFS)

relationship between geo tree and service chain
Relationship between Geo-tree and Service Chain
  • Geo-tree is a conceptual model representing the step-by-step how a geo-object (representing either GI or GK) is derived.
    • representing the domain knowledge needed for analyzing geospatial data to derive user-specific GI or GK.
  • The root of a geo-tree is a virtual product that the Geo-tree can derive, and all sub-trees are also virtual products.
  • A service chain is the instantiation of the Geo-tree. If a Geo-tree can be instantiated, then all virtual products in the tree can be produced on demand.
types of service chaining
Types of Service Chaining
  • Three types of service chaining are defined in ISO 19119:
    • User-defined (transparent) – the Human user defines and manages the chain.
    • Workflow-managed (translucent) – the Human user invokes a service that manages and controls the chain, where the user is aware of the individual services in the chain.
    • Aggregate (opaque) – the Human user invokes a service that carries out the chain, where the user has no awareness of the individual services in the chain.
  • An intelligent geospatial web service system needs to support all three chaining methods.
construction of service chains
Construction of Service Chains
  • The first type of chaining allows users to construct a geospatial model to be run in the system
    • Require domain knowledge—for expert to contribute their domain knowledge.
    • The knowledge is kept in the Geo-tree/service chain.
  • The second type of chaining basically is to use existing geo-tree to materialize a virtual object.
    • Anyone can use this type of chaining to produce a virtual product on demand.
    • Anyone can use but it is not able to produce a product who’s geo-tree doesn’t already exist in a data/information system.
  • The third type of chaining require the system to be intelligent enough to automatically form a geo-tree/service chain by decomposing user’s query.
    • require the domain knowledge
    • require the automated reasoning.
    • Anyone can use and can produce a new product based on users’ query automatically.
  • The first two types of chains do not require significant machine intelligence.
    • Current technology is enough for implementing such chaining approach.
  • The third one requires significant machine intelligence
    • Current technologies are not able to provide such kind of chaining.
    • Significant research is needed.
features of service based d i systems
Features of Service-based D&I Systems
  • Evolvable
    • The system will grow its capabilities with more and more service modules and models become available.
  • Truly open, distributed system
    • anyone can set up a service and to become a part of the system after proper registration of his services.
    • This is not true in GRID environment
  • Build by the community for the community
    • User community involvement of the development is one of the keys for the success of the system since they can contribute service modules or models to the system.
    • A contributor is not necessary to set up the services by using their resources. The contributions can be hosted anywhere on the web.
  • Complying with standards for building the service modules and models are the key for the system to work.
the key standard environments
The Key Standard Environments
  • The system has to be built on the two key standard environment.
    • common data environment
    • common service environment
  • The common data environment provides the standard interfaces for data discovery and access from multiple data archives.
    • upon that the service can be built independent of data sources and formation.
    • The OGC WCS, WFS, WMS, and WRS are the fundamental standards for building a common data environment.
  • The common service environment provides standard interfaces and methods for service declaration, discovery, chaining, binding, and execution.
    • foundation for interoperating and chaining services.
    • The W3C provides the base standards and OGC provides the geospatial extensions.
implementation in grid environment
Implementation in Grid Environment
  • The implementation in Web service environment has been detailed in the paper.
  • This presentation gives some details in the implementation in Grid environment (geospatial information Grid)
ogc interfaces to the geospatial data grid
OGC Interfaces to the Geospatial Data Grid

MCS Query

MCS Database

OGC WRS Query

MCS Web Server

WRS

Server

inter-faces

Logical Filenames

Metadata Catalog Service

OGC

Client

Logical Filenames

WRS Results

Physical locations

Replica index node

Replica cat.

OGC data protocols

Replica Location Service

WCS/WFS/WMS

Server inter-faces

Physical locations

Transformed Data

Physical Storage System/files

Data

research issues in virtual geospatial data services
Research Issues in Virtual Geospatial Data Services
  • Representation of Geo-Tree.
    • The Geotree/model description
      • Need a language to describe the geo-tree and subtree
      • Only logical and thematic description of the tree.
      • Not attach to individual physical files, use virtual data types
  • Service module cataloging
    • Need a catalog in MCS to catalog all service modules available (modules not necessary in the same system)
    • Describe the inputs, outputs, and how to invoke the service
    • Use for both manually or automatically constructing the geo-tree.
    • Use for instantiation of the virtual dataset (to create the workflow)
research issues in virtual geospatial data services22
Research Issues in Virtual Geospatial Data Services
  • Geo-tree/model database
    • Contains all geo-trees available
  • Virtual dataset cataloging
    • Need to catalog all virtual datasets in a geo-tree (the root of the geo-tree and all intermediate datasets ).
    • Catalog the virtual datasets with the real data set, use the same description as the real dataset in the catalog except for two things:
      • no description of spatial and temporal coverage
      • include a point to the entry to the geotree database where the specific geotree is located.
additional required functional components
Additional Required Functional Components
  • Logical Instantiation
    • This component will check if a virtual data can be materialized against a specific user search.
    • Generate logical filenames which are unique and different from the real logical filenames.
    • Generate the logical workflow per request of physical Instantiation (filenames in the workflow are logical names)
  • Physical Instantiation
    • This component will produce the executable workflow when user actually requests the virtual dataset.
    • What workflow language should be used?
  • Workflow execution manager
    • Manage the execution of the workflow for materializing the virtual dataset.
    • Return the materialized dataset to users.
virtual data services in the geospatial data grid
Virtual Data Services In the Geospatial Data Grid

Metadata Catalog Service

MCS Web Server

MCS Query

MCS Database

1

2

GeoTree Lib

LF

logical instant.

Module cata.

OGC WRS Query

WRS

Server

2

LF

OGC

Client

Replica index node

Replica cat.

PL

WRS Results

Replica Location Service

3

OGC data protocols

2

Workflow Execution Manager

2

Physical Inst.

WCS/WFS/

WMS

Server

4

5

Transformed Data

PL

Physical Storage System/files

Data

1: Matched virtual datasets

2: logically instanced virtual filenames (LIVF)

3. logical workflow

4. Physical workflow

5. Data

LF: Logical filename Yellow: New component

PL: Physical Location Light Yellow: Modified MCS component

research on semantic intelligent geospatial grid
Research on Semantic/intelligent Geospatial Grid
  • The goal is to fully automate the process from data to knowledge in a limited domain.
  • Four key issues for geospatial grid services
    • How to automatically decompose user’s query (user object) to construct the geo-tree based on distributed data and service catalogs?
    • How to represent the geo-trees in computer understandable and executable workflows?
    • How to manage, share, and reuse geo-trees that represent the geospatial knowledge of a specific domain?
    • How to execute the geo-tree at the distributed, web service environment automatically to derive the product that exactly meets the user’s query.
  • This first one requires geospatial domain knowledge while all other three are common in the broad web service community.
    • Ontology-based machine reasoning may be the solution for the issues.
    • AI community is working on this issue. And we will develop geospatial ontologies to test the AI solution.
  • The other three issues are also concerned by W3C and OGC. And we will follow closely the progresses made by them.
the development team
The Development Team
  • PI, Liping Di, LAITS/GMU.
  • Co-I, Williams Johnston NASA Ames and DOE LBNL.
  • Co-I, Deans Williams, DOE LLNL.
implementation plan
Implementation Plan
  • The first phase is the testbed and initial integration, including the setup of the development environment, preliminary design of the integration, and implementation of WCS access to Grid-managed data.
  • The second phase is the data naming and location transparency, which include the use of Data Grid and Replica Services (metadata catalogues, replication location management, reliable file transfer services, and network caches) to provide naming and location independence for data used by NWGISS and revising NWGISS to invoke such Grid services.
    • The approach to investigating the Data Grid and Replica Services will be to configure a Data Grid testbed. This will be followed by the integration of NWGISS data catalogs into a data Grid catalog and the investigation of naming approaches, followed by interfacing NWGISS with data generators and Data Grid Replica Location service
  • The third phase is the virtual dataset research and development.
the development environment
The development environment
  • A prototype development environment has been setup at LAITS/GMU
    • Three machines—2 Linux and 1 SunFire Unix servers.
    • Machines are linked through 100 Mb LAN.
    • External link to Internet through dedicated T1 line.
  • The real development/demo environment is being set
    • GMU will purchase a server with 4-8 Tb of disk space. The machine will be hosted at NASA Goddard.
    • NASA AMES will provide a machine with 4-8 Tb disk space and 30 Tb near real-time storage device.
    • DOE LLNL will provide a machine with 2-4 Tb of disk space.
    • 1 Gb/sec Internet link
  • The machine will be populated with NASA EOS data
    • e.g., MODIS, ASTER, Landsat, MISR.
current status
Current Status
  • Most of the phase-one developments are near complete.
    • MCS has been extended to handle spatial, temporal, and parameter-based search by adding a layer on top of MCS.
    • WRS interface has been implemented on top of MCS for data search and discovery.
    • WCS server has been modified to access data within the virtual organization.
    • WRS and WCS are connected for searching and then deliver the one-demand data to users.
  • A demonstration will show the on-demand access of Grid-managed EOS data through a OGC client.
    • You will not see the Grid because it supposes to work invisible to clients outside to the Grid.
near term plan within 6 months
Near-term Plan (within 6 months)
  • Build the real development/test environment.
  • Modify WCS server so that it can work as the client to another WCS server remotely located in other machine.
    • to fetch just the right amount of data to the requested machine within the Grid.
  • Test the service concepts
    • Register services in MCS
    • Couple services with data
      • enable to search available services associated with data
      • enable to search available data with a given service
  • Develop fundamental data services
    • Reformatting, subsetting/resampling
    • Georectification/reprojection
    • Supervised and unsupervised classification services
the nasa eosdis data pools
The NASA EOSDIS Data Pools
  • The NASA EOSDIS project is implementing data pools that contains huge amount of remote sensing data on-line for users to directly and rapidly access.
  • The data pools will be operated at each of nine NASA’s distributed active archive centers (DAACs).
  • Each data pool will provide discipline-specific EOSDIS data archived at the DAAC.
  • DAACs are connected through the high-speed network
  • Currently there are total four operational data pools at GSFC, Langley, EDC, and NSIDC.
    • Both data search through search criteria and data finding through browsing/drilling-down are provided.
    • ftp for data downloading. No data services is provided.
  • OGC WCS interface is being implemented.
    • provide better data access than FTP.
the ceos grid data pool application
The CEOS Grid Data Pool Application
  • Deploy the geospatial Grid software developed by GMU-led team to data pools as one of CEOS Grid Applications
    • Initially at NASA Goddard DAAC.
    • Intend to expand to all data pools.
  • The application will provide
    • secured sharing of computing resources among the data pools.
    • a single point of entry to all resources in the pools--location transparent.
    • geospatial standard-based data discovery and access.
    • Automatic data transformation services
    • Virtual data services
    • Interactive geospatial modeling, execution, and model sharing.
contribution to the ceos grid activities
Contribution to the CEOS Grid Activities
  • Share the technology, software, and experience with other CEOS Grid application projects
  • Contribute the Grid data pool application to the CEOS Grid testbed for technology demonstration.
  • The technology and the software created by this project can be used to create a CEOS-based Global EO Data and Information Grid
    • Support globally sharing the EO data, information, and/or computational resources.
    • Support International scientific and EO initiatives such as Integrated Global Observation System (IGOS).
    • Support the use of EO data/information in the developing countries for environmental monitoring and decision support.