Domain Specific Software Architectures for Science: Lecture for Software Architectures, USC 578

Dan Crichton

April 2010

Topics
  • Introduction – who am I?
  • Architecture – what it means to me
  • Challenges in Developing Architectures
  • Reference Architecture vs Domain Specific Software Architectures
  • Experience in Science
  • Lessons Learned
  • Q&A
Who am I?
  • Employed by Jet Propulsion Laboratory since 1995; prior software engineering positions at Hughes Aircraft Company and in private industry
  • MS in Computer Science, USC; 20+ years of experience
  • Program Manager & Principal Computer Scientist for
    • Planetary Data System Engineering in Solar System Exploration Directorate
    • Data Systems and Technology in Earth and Technology Directorate
  • Principal Investigator for
    • Informatics Center, Early Detection Research Network, National Cancer Institute
    • Facilitating Integration of NASA and Earth System Grid, NASA
    • Object Oriented Data Technology
  • Several co-Investigator Tasks
Architecture: why do I care?
  • Architecture is a game changer in our business
    • Enable scientific discovery, novel engineering, etc
    • Coordination across multiple enterprises
  • Data system costs per mission, project, investigation, etc are high
  • Technology infusion is limited
  • Experience and knowledge reuse
But, there are challenges
  • Lack of true architects
    • Most think of point solutions or confuse architecture and implementation
    • Abstracting is difficult
  • Governance is often at a project level; little view at an enterprise level
  • Limited planning and understanding of the reference requirements
Architects: what are they?
  • Effective Architects have…
  • Years of experience
  • Holistic view of domain
    • Look at both aesthetics and practical details
    • Variable technical depth
  • Lifecycle roles
    • Strong involvement up-front
    • May oversee development
    • Chooses stable steps in development
  • Effective Architects are not…
  • Lone inventors or scientists
    • The architect is a good communicator and politician -- architectures must be sold and explained and their integrity maintained
    • Architecting is not a science, but depends on science
  • Purely technologists
    • Architecture is a strategy
  • “Top level only” designers
    • Details are often critical
  • Collaborators
    • A coherent vision is critical; they drive it
Architecture: what is it?
  • The fundamental organization of a system embodied in its components, their relationships to each other, and to the environment, and the principles guiding its design and evolution. (ANSI/IEEE Std. 1471-2000)
Communicating an architecture
  • A good architecture is one that can be communicated to the stakeholders
  • A good architecture presents viewpoints of the system that address stakeholder concerns
  • A good architecture uses models and descriptions that are relevant to the stakeholders
    • Different models may be used to present different viewpoints (e.g., a UML model of the system may be appropriate for some but not all stakeholders)
Viewpoints and views
  • The view is what you see; the viewpoint is where you look from
  • A viewpoint is a template for constructing a view
    • Enterprise, Functional, Informational, etc
  • A view is a description of the entire system from the perspective of a set of related concerns. A view is composed of one or more models.
  • A model is an abstraction or representation of some aspect of a thing
  • Examples: RM-ODP, FEAF, TOGAF, etc

(Project Managers, Engineers, Scientists, Business Analysts, …)
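
The relationship between viewpoints, views, and models can be sketched as a small data structure. This is only an illustration of the definitions on this slide, not part of any framework named here; the class names, the `addresses` helper, and the sample concerns are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Viewpoint:
    """Template for constructing a view: where you look from."""
    name: str
    concerns: frozenset


@dataclass(frozen=True)
class Model:
    """Abstraction or representation of some aspect of the system."""
    name: str
    notation: str


@dataclass
class View:
    """Description of the whole system from one viewpoint: what you see."""
    viewpoint: Viewpoint
    models: list = field(default_factory=list)

    def addresses(self, concern: str) -> bool:
        # A view only speaks to the concerns its viewpoint was defined for.
        return concern in self.viewpoint.concerns


# Hypothetical example: an informational viewpoint with one UML model.
informational = Viewpoint("Informational", frozenset({"metadata", "data semantics"}))
info_view = View(informational, [Model("Domain information model", "UML")])
```

A stakeholder whose concern is not covered by a viewpoint (for example, deployment cost) would need a different view, possibly built from different models.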

Reference Architectures
  • Show components, functions, and interfaces at a high level of abstraction
  • Likewise, we consider information models to also be part of a reference architecture (at a sufficiently abstract level)
    • In observing systems, the information model patterns are highly compatible as a reference information model
  • Implementation neutral; architectural frameworks can be useful in defining a structure for a reference architecture
  • We use Reference Architectures to give us a strategic advantage as well as improve enterprise scale software
Domain Specific Software Architectures*
  • Domain model
    • Leverage experts who have the “holistic” view and can drive the need for product lines
    • An unambiguous view is critical (in fact, this has been a problem in science arenas)
  • Reference requirements
    • Drives the reference architecture
    • However, it is critical to map domain models to reference requirements in order to understand the solution space
  • Reference architecture
    • Satisfies an abstracted set of functions from the reference requirements
    • It’s engineered for the “ilities”: reusability, extensibility and configurability
    • It demonstrates the separation of functional elements of the architecture

* Tracz, Will, Domain-Specific Software Architecture, ACM SIGSOFT, 1995

RAs vs DSSAs in Science
  • In science data systems, construction of multiple architecture viewpoints of a system is critical
    • Process/Enterprise
    • Information/Data
    • Technology
  • We find the “viewpoints” are similar, but models can be domain specific
    • This is the opportunity to develop a reusable reference architecture if the “patterns” can be extracted
Scientific data systems
  • Covers a wide variety of disciplines
    • Solar system exploration
    • Astrophysics
    • Earth science
    • Biomedicine
    • etc
  • Each has its own communities, standards and systems
  • But, there is an underlying reference architecture and discipline software architectures in each!
The “e-science” trend
  • Highly distributed, multi-organizational systems
    • Systems are moving towards loosely coupled systems or federations in order to solve science problems which span center and institutional environments
  • Sharing of data and services which allow for the discovery, access, and transformation of data
    • Systems are moving towards publishing of services and data in order to address data and computationally-intensive problems
    • Infrastructures which are being built to handle future demand
  • Address complex modeling, inter-disciplinary science and decision support needs
    • Need a dynamic environment where data and services can be used quickly as the building blocks for constructing predictive models and answering critical science questions
  • Changing the way in which data analysis is performed
    • Moving towards analysis of distributed data to increase the study power
    • Enabling greater collaboration across centers
Context: Space data systems

[Figure: end-to-end space data system. Nodes: Spacecraft and Scientific Instruments, Spacecraft / Lander, Relay Satellite, Data Acquisition and Command, Mission Operations, Instrument/Sensor Operations, Science Data Processing, Science Data Archive, Data Analysis and Modeling, Science Team, External Science Community. Information exchanged: Primitive Information Object, Simple Information Object, Telemetry Information Package, Science Information Package, Planning Information Object, Instrument Planning Information Object, Science Products (Information Objects).]

  • Common Meta Models for Describing Space Information Objects
  • Common Data Dictionary end-to-end
Earth Science Data Systems

[Figure: DS Mission #1 and DS Mission #2 feed Science Processing Center 1 and Science Processing Center 2, each with Archive & Distribution (DAAC 1, DAAC 2). Other labels: PO.DAAC; SMAP, DESDynI; Other Data Sources (e.g. NOAA); Distributed Data Analysis (Subsetting, Gridding, Transformation, Modeling); Infrastructure to support Analysis of Distributed Data; Users.]

Patterns in scientific data systems
  • Instrument and Spacecraft Commands
  • Instruments that capture observations
  • Generation of Engineering and Science Data Products
  • Data Processing
  • Data Management
  • Data Distribution
  • Distributed Facilities
  • Data Movement
Finding the reference architecture
  • Simple SOA-style pattern
  • Data/Information Architecture
  • Components, middleware, and communication
  • NOTE: Process is implicit here
“Ilities” in science data systems
  • Usability
  • Diversity within the domain
  • Scalability
  • Reliability
  • Portability
  • NOTE: Our reference architecture must address these ilities long term
Specialization within domains
  • Domain information models
    • Planetary Science Ontology
    • Cancer Biomarker Ontology
    • Etc
  • Specific services and domain implementations are derived from the reference architecture
    • Reference Architecture->Domain Specific Software Architecture-> Domain Implementations
  • In these science domains, the architectures need to be long-lived (20+ years)
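
The chain Reference Architecture -> Domain Specific Software Architecture -> Domain Implementations can be sketched as three layers of classes. This is a minimal illustration under stated assumptions: the class names, the `find` operation, and the vocabulary terms are hypothetical and are not taken from the lecture or from any existing system.

```python
from abc import ABC, abstractmethod


class ResourceCatalog(ABC):
    """Reference-architecture level: an implementation-neutral function."""

    @abstractmethod
    def find(self, attribute: str, value: str) -> list:
        """Return sorted identifiers of resources whose attribute equals value."""


class PlanetaryCatalog(ResourceCatalog):
    """Domain-specific level: pins down the domain vocabulary."""

    VOCABULARY = frozenset({"target", "instrument", "mission"})  # hypothetical terms

    def find(self, attribute: str, value: str) -> list:
        if attribute not in self.VOCABULARY:
            raise KeyError(f"{attribute!r} is not in the planetary vocabulary")
        return self._lookup(attribute, value)

    @abstractmethod
    def _lookup(self, attribute: str, value: str) -> list:
        """Storage-specific search, supplied by each domain implementation."""


class InMemoryPlanetaryCatalog(PlanetaryCatalog):
    """Domain implementation: one concrete storage choice."""

    def __init__(self, records: dict):
        self._records = dict(records)

    def _lookup(self, attribute: str, value: str) -> list:
        return sorted(
            rid for rid, meta in self._records.items() if meta.get(attribute) == value
        )


catalog = InMemoryPlanetaryCatalog(
    {"img-1": {"target": "Mars"}, "img-2": {"target": "Europa"}}
)
```

Only the bottom layer knows how records are stored, so it can be replaced as technology changes while the two upper layers stay stable, which matters when the architecture has to live for 20+ years.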
Software product lines
  • This is about strategy more than technology
  • Goal is a software product line that
    • Implements our reference architecture
    • Allows for construction of core software components that can be reused across projects and science disciplines
    • Can demonstrate sufficient cost and schedule benefits without sacrificing flexibility in meeting requirements and adapting to technology change
    • Extensions can be applied at the discipline level
Object Oriented Data Technology
  • Represents both a reference architecture AND a software product line for science data systems
    • Exploits common patterns
    • Delivers reusable software components as building blocks for construction of higher order data systems
  • Applied to multiple science disciplines
  • Funded originally back in 1998; runner up for NASA Software of the Year in 2003
  • Heavily used by NASA and NIH projects
Architectural principles*
  • Separate the technology and the information architecture
  • Encapsulate the messaging layer to support different messaging implementations
  • Encapsulate individual data systems to hide uniqueness
  • Provide data system location independence
  • Require that communication between distributed systems use metadata
  • Define a model for describing systems and their resources
  • Provide scalability in linking both number of nodes and size of data sets
  • Allow systems using different data dictionaries and metadata implementations to be integrated
  • Leverage existing software, where possible (e.g., open source, etc)

* Crichton, D, Hughes, J. S, Hyon, J, Kelly, S. “Science Search and Retrieval using XML”, Proceedings of the 2nd National Conference on Scientific and Technical Data, National Academy of Science, Washington DC, 2000.
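
Three of the principles above (encapsulate individual data systems, provide location independence, communicate using metadata) can be sketched together. The two "legacy" systems, their storage formats, and the `Registry` class are hypothetical and exist only to illustrate the idea; this is not the OODT implementation.

```python
class CsvArchive:
    """Hypothetical legacy system that stores 'id,title' text rows."""

    def __init__(self, rows):
        self._titles = dict(row.split(",", 1) for row in rows)

    def describe(self, product_id: str) -> dict:
        # The system's uniqueness is hidden behind a common metadata shape.
        return {"identifier": product_id, "title": self._titles[product_id]}


class RecordStore:
    """Hypothetical system that keeps nested records under its own field names."""

    def __init__(self, table):
        self._table = table

    def describe(self, product_id: str) -> dict:
        return {"identifier": product_id, "title": self._table[product_id]["name"]}


class Registry:
    """Resolves logical node names so callers never hold a physical location."""

    def __init__(self):
        self._nodes = {}

    def register(self, logical_name: str, system) -> None:
        self._nodes[logical_name] = system

    def describe(self, logical_name: str, product_id: str) -> dict:
        return self._nodes[logical_name].describe(product_id)


registry = Registry()
registry.register("imaging", CsvArchive(["img-1,Mars mosaic"]))
registry.register("specimens", RecordStore({"sp-9": {"name": "Serum sample"}}))
```

Callers ask the registry for `"imaging"` or `"specimens"` and receive the same metadata keys from both, regardless of how or where each system stores its data.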

Architectural focus
  • Consistent distributed capabilities
    • Resource discovery (data, metadata, services, etc), “grid-ing” loosely coupled science systems, workflow management
  • On-demand, shared services (e.g., processing, translation, etc)
    • Processing
    • Translation
  • Deploy high throughput data movement mechanisms
  • End-to-end capabilities across the science environment
  • Reduce local software solutions that do not scale
    • Increasing importance of developing an “enterprise” approach with common services
  • Build value-added services and capabilities on top of the infrastructure
Exploiting common patterns
  • How data is managed (registry/repository, information objects themselves)…
  • How data is generated, captured, etc (e.g., workflow and data processing)…
  • How data is accessed (metadata, data)…
  • How information is discovered …
  • How data is distributed (e.g., transformed)…
  • How data is visualized…
What does OODT do?
  • Tie together loosely coupled distributed heterogeneous data systems into a virtual data grid
  • Support critical functions
    • Data Production and workflow
    • Data Distribution
    • Data Discovery (including query optimization across highly distributed systems)
    • Data Access
  • An architectural approach first, an implementation second
    • Adapts to different distributed computing deployments
    • Promotes a REST-style architectural pattern for search and retrieval
  • Scalability in linking together large, distributed data sets
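
A REST-style search can be sketched as a query carried entirely in a URL, so that any HTTP client can issue it. The base URL and the parameter names `q` and `max` below are assumptions made for illustration; they are not OODT's actual endpoints or parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base_url: str, keyword_query: str, max_results: int) -> str:
    """Encode a keyword query as a GET URL."""
    params = {"q": keyword_query, "max": max_results}  # hypothetical parameter names
    return f"{base_url}?{urlencode(params)}"


def read_search_url(url: str) -> dict:
    """Server-side view: recover the query parameters from the URL."""
    parsed = parse_qs(urlsplit(url).query)
    return {"q": parsed["q"][0], "max": int(parsed["max"][0])}


url = build_search_url("https://example.org/search", "target = Mars", 10)
```

Because the request is self-describing, it can be logged, cached, or forwarded to another node without any shared client library.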
OODT data architecture focus
  • On types of and relationships among a software system’s data
  • Decomposition of data within a software system to its logical components and interactions
    • Components: Data Elements, Data Dictionary, Data Models of individual data sources
    • Interactions: Mappings from the Data Dictionary to Data Models, Data Element structural comparison
  • Some standards currently exist for data architecture
    • ISO/IEC 11179: Specification and Standardization of Data Elements
    • Dublin Core Metadata Initiative: Dublin Core Data Elements to describe any electronic resource
  • Specifications for the Data Architecture
    • Common XML schema for managing information about data resources
    • Common XML schema for messaging between distributed services
    • Methods for integrating existing domain models within architecture
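
The mapping between a common data dictionary and the data models of individual sources can be sketched as follows. The three common element names are borrowed from Dublin Core; the two sources and their local field names are hypothetical.

```python
COMMON_ELEMENTS = frozenset({"identifier", "title", "creator"})

# Hypothetical local field names for two data sources.
MAPPINGS = {
    "planetary": {"PRODUCT_ID": "identifier", "PRODUCT_NAME": "title"},
    "biomedical": {"specimen_no": "identifier", "label": "title", "pi": "creator"},
}


def check_mappings(mappings: dict) -> None:
    """Reject any mapping that targets an element missing from the dictionary."""
    for source, mapping in mappings.items():
        unknown = set(mapping.values()) - COMMON_ELEMENTS
        if unknown:
            raise ValueError(f"{source} maps to undefined elements: {sorted(unknown)}")


def to_common(source: str, record: dict) -> dict:
    """Translate a source record into common data elements, dropping unmapped fields."""
    mapping = MAPPINGS[source]
    return {mapping[name]: value for name, value in record.items() if name in mapping}


check_mappings(MAPPINGS)
planetary = to_common("planetary", {"PRODUCT_ID": "img-1", "PRODUCT_NAME": "Mars mosaic"})
biomedical = to_common("biomedical", {"specimen_no": "sp-9", "label": "Serum", "freezer": "B2"})
```

Each source keeps its own data model; only the mapping table has to change when a new source is integrated.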
OODT data architecture models

[Figure: UML class diagram of the nasa.pds.xmlquery package. Diagram labels: Request/Response Model; Resource Metadata Model; Based on Dublin Core; Based on ISO/IEC 11179.
Classes: XMLQuery, QueryElement, QueryHeader, QueryResult.
Associations (multiplicity 1 where shown): selectSet, fromSet, whereSet, queryHeader, result.
Attributes, in extracted order: resultModeId: String, propogationType: String, propogationLevels: String, role: String, maxResults: int, value: String, kwqString: String, numResults: int, mimeAccept: List, id: String, title: String, list: List, description: String, type: String, statusID: String, securityType: String, revisionNote: String, dataDictID: String.]
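
The request/response model in the diagram can be approximated with plain data classes. The class and attribute names are taken from the slide (including its spelling of `propogationType`), but the assignment of attributes to classes is an assumption reconstructed from the flattened diagram, and the default values and the `role` strings are hypothetical. This is a sketch in Python, not the Java classes of the actual package.

```python
from dataclasses import dataclass, field


@dataclass
class QueryElement:
    role: str
    value: str


@dataclass
class QueryHeader:
    id: str
    title: str = ""
    description: str = ""
    type: str = ""
    statusID: str = ""
    securityType: str = ""
    revisionNote: str = ""
    dataDictID: str = ""


@dataclass
class QueryResult:
    items: list = field(default_factory=list)  # the slide names this attribute "list"


@dataclass
class XMLQuery:
    queryHeader: QueryHeader
    kwqString: str
    resultModeId: str = ""
    propogationType: str = ""  # spelling as on the slide
    propogationLevels: str = ""
    maxResults: int = 100  # hypothetical default
    numResults: int = 0
    mimeAccept: list = field(default_factory=list)
    selectSet: list = field(default_factory=list)
    fromSet: list = field(default_factory=list)
    whereSet: list = field(default_factory=list)
    result: QueryResult = field(default_factory=QueryResult)


# Hypothetical query: the role strings are illustrative only.
query = XMLQuery(
    queryHeader=QueryHeader(id="q-1", title="Mars images"),
    kwqString="target = Mars",
    whereSet=[
        QueryElement(role="elemName", value="target"),
        QueryElement(role="LITERAL", value="Mars"),
        QueryElement(role="RELOP", value="EQ"),
    ],
)
```

The query carries its own header metadata and an empty result container, so the same object can travel to a server and come back populated.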

OODT software components
  • Profile Service – A server-based registry that is able to either serve local XML profiles or plug into an existing catalog. This component provides resource discovery.
  • Product Service – A server component that plugs into existing repositories and serves products. This includes translation services, etc
  • Catalog and Archive Service – Transaction-based server that catalogs and archives products providing profile and product servers for discovery and distribution
  • Query Service – Provides query management across distributed services to enable discovery.
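
The division of labour among these components can be sketched with in-memory stand-ins. The class and method names below are hypothetical and only mirror the roles described above (discovery, product retrieval, distributed query); the Catalog and Archive Service is omitted for brevity.

```python
class ProfileServer:
    """Resource discovery: answers 'which products match?' from metadata profiles."""

    def __init__(self, profiles: dict):
        self._profiles = profiles  # product id -> metadata dict

    def query(self, attribute: str, value: str) -> list:
        return [pid for pid, meta in self._profiles.items() if meta.get(attribute) == value]


class ProductServer:
    """Product retrieval: returns the content of a product held in a repository."""

    def __init__(self, repository: dict):
        self._repository = repository  # product id -> content

    def retrieve(self, product_id: str) -> bytes:
        return self._repository[product_id]


class QueryService:
    """Fans one query out to every known profile server and merges the hits."""

    def __init__(self, profile_servers: list):
        self._servers = profile_servers

    def discover(self, attribute: str, value: str) -> list:
        hits = set()
        for server in self._servers:
            hits.update(server.query(attribute, value))
        return sorted(hits)


node_a = ProfileServer({"img-1": {"target": "Mars"}, "img-2": {"target": "Europa"}})
node_b = ProfileServer({"img-7": {"target": "Mars"}})
products = ProductServer({"img-1": b"...pixels...", "img-7": b"...pixels..."})
found = QueryService([node_a, node_b]).discover("target", "Mars")
```

Discovery and retrieval stay separate: the query service returns identifiers only, and the caller then asks a product server for the content it actually needs.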
Distributed architecture

1. Science data tools and applications use “APIs” to connect to a virtual data repository
2. Middleware creates the data grid infrastructure connecting distributed heterogeneous systems and data
3. Repositories for storing and retrieving many types of data

[Figure: Visualization Tools, Web Search Tools, and Analysis Tools each connect through an OODT API to the OODT Reusable Data Grid Framework, which connects to Mission Data Repositories, Biomedical Data Repositories, and Engineering Data Repositories.]

Technology architecture

[Figure: deployment of OODT services. Labels: Service Registry (Name Server, Registry Server); Web I/F and Desktop I/F clients (WSDL); Query Integration; Node 1 Profile Server (three instances) with Product Catalogs; Product Server and Repository (two instances) holding Science Products; Repository/Archive Server. Messages between services are labelled XML Request and Information Object.]

  • Common Meta Models for Describing Space Information Objects
  • Common Data Dictionary end-to-end


OODT software implementation
  • OODT is Open Source
  • Developed using open source software (i.e. Java/J2EE and XML)
  • Implemented reusable, extensible Java-based software components
    • Core software for building and connecting data management systems
  • Provided messaging as a “plug-in” component that can be replaced independent of the other core components. Messaging components include:
    • CORBA, Java RMI, JXTA, Web Services, etc
    • REST seems to have prevailed
  • Provided client APIs in Java, C++, HTTP, Python, IDL
  • Simple installation on a variety of platforms (Windows, Unix, Mac OS X, etc)
  • Used international data architecture standards
    • ISO/IEC 11179 – Specification and Standardization of Data Elements
    • Dublin Core Metadata Initiative
    • W3C’s Resource Description Framework (RDF) from Semantic Web Community
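
Messaging as a replaceable “plug-in” can be sketched as a client that depends only on a small `send(destination, payload)` contract. Everything named below (the transport classes, the `Client`, the echo handler) is hypothetical; it shows the design idea rather than OODT's CORBA, RMI, or web-service bindings.

```python
import json


class InProcessTransport:
    """Delivers messages by direct function call (stand-in for a real transport)."""

    def __init__(self):
        self._handlers = {}

    def bind(self, destination: str, handler) -> None:
        self._handlers[destination] = handler

    def send(self, destination: str, payload: bytes) -> bytes:
        return self._handlers[destination](payload)


class CountingTransport:
    """A second implementation: wraps any transport and counts messages sent."""

    def __init__(self, inner):
        self._inner = inner
        self.sent = 0

    def send(self, destination: str, payload: bytes) -> bytes:
        self.sent += 1
        return self._inner.send(destination, payload)


class Client:
    """Core component: depends only on the send(destination, payload) contract."""

    def __init__(self, transport):
        self._transport = transport

    def ask(self, destination: str, message: dict) -> dict:
        raw = self._transport.send(destination, json.dumps(message).encode("utf-8"))
        return json.loads(raw.decode("utf-8"))


def echo_handler(payload: bytes) -> bytes:
    request = json.loads(payload.decode("utf-8"))
    return json.dumps({"echo": request["text"]}).encode("utf-8")


base = InProcessTransport()
base.bind("profile-server", echo_handler)
counting = CountingTransport(base)
```

`Client(base)` and `Client(counting)` behave identically from the caller's side, which is the property that lets the messaging layer change over time without touching the core components.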
EDRN Knowledge Environment
  • EDRN has been a pioneer in the use of informatics technologies to support biomarker research
  • EDRN has developed a comprehensive infrastructure to support biomarker data management across EDRN’s distributed cancer centers
    • Twelve institutions are sharing data
    • Same architectural framework as planetary science
  • It supports capture and access to a diverse set of information and results
    • Biomarkers
    • Proteomics
    • Biospecimens
    • Various technologies and data products (image, micro-satellite, …)
    • Study Management
Application to planetary science
  • Often unique, one of a kind missions
    • Can drive technological changes
  • Instruments are competed and developed by academic and industry partners
    • Highly distributed acquisition and processing across partner organizations
    • Highly diverse data sets given heterogeneity of the instruments and the targets (i.e. solar system)
  • Missions are required to share science data results with the research community requiring:
    • Common domain information model used to drive system implementations
    • Expert scientific help to the user community on using the data
    • Peer-review of data results to ensure quality
    • Distribution of data to the community
  • Planetary science data from NASA (and some international) missions is deposited into the Planetary Data System
Earth Science Data Systems

[Figure: multi-mission Earth science data system. Labels: Airborne Instruments; Surface Instruments; Data Acquisition/Ingestion; Data Production/Processing; Data Integration; Catalogs; Policies & Rules; Local Storage (Models, Data, etc); Special Product Processing Environment / Computational Infra; Distributed Data Analysis; Modeling and Visualization Facility; Web Portal; Other Data Systems; Testbed and Operational Deployed Environments.]

Application to Climate Research
  • Highly distributed modeling and observational systems
  • Heterogeneous implementations
  • Different purposes
  • But, brought together as a virtual system, provides new science discovery opportunities

[Figure labels: Models; Observations]

Lessons Learned
  • A reference architecture is critical for driving a strategy and supporting large-scale/enterprise systems
    • However, limited experience in organizations to build reference architectures
    • Useful ways to represent the architecture can be tough!
    • How detailed to make the reference architecture is an art! (Don’t let the implementation drive the RA)
  • Product lines are useful for providing reusable components based on the reference architecture
More Lessons Learned
  • Distributed service architectures
    • Not anything new (my experience with them goes back to the early 1990s)
    • But, often, newer technologies and approaches are seen as a panacea
  • Technology is not a replacement for a conceptual architecture
    • My experience is that definition of the architecture independent of technology is critical
    • The goal should be stability in the architecture model; the selection of appropriate technology will change over time
    • This is why an architect is much more of a strategist than a technologist
Final Thoughts
  • Software architecture in science is critical to
    • Reducing cost of building science data systems
    • Building virtual organizations
    • Constructing software product lines
    • Driving standards
    • Supporting new paradigms in mission operations and scientific research
  • Science is still learning how to best leverage technology in a collaborative discovery environment, but significant progress is being made!
Resources
  • (1) Tracz, Will. Domain-Specific Software Architecture. ACM SIGSOFT, 1995.
  • (2) D. Crichton, S. Kelly, C. Mattmann, Q. Xiao, J. S. Hughes, J. Oh, M. Thornquist, D. Johnsey, S. Srivastava, L. Esserman, and B. Bigbee. A Distributed Information Services Architecture to Support Biomarker Discovery in Early Detection of Cancer. In Proceedings of the 2nd IEEE International Conference on e-Science and Grid Computing, pp. 44, Amsterdam, the Netherlands, December 4th-6th, 2006.
  • (3) C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes. A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications. In Proceedings of the 28th International Conference on Software Engineering (ICSE06), pp. 721-730, Shanghai, China, May 20th-28th, 2006.
EDRN’s Ontology Model
  • EDRN has developed a high-level ontology model for biomarker research which provides standards for the capture of biomarker information across the enterprise
  • Specific models are derived from this high-level model
    • Model of biospecimens
    • Model for each class of science data
  • EDRN is specifically focusing on a granular model for annotating biomarkers, studies and scientific results
  • EDRN has a set of EDRN Common Data Elements which is used to provide standard data elements and values for the capture and exchange of data

[Figure labels: EDRN CDE Tools; EDRN Biomarker Ontology Model]