CLASS Presentation to the NOAA Science Advisory Board’s Data Archiving and Access Requirements Working Group
Robert Rank
CLASS New Campaigns Manager
May 24-25, 2007
DAARWG _ May 24-25, 2007

Objectives
• Review CLASS mission, role, drivers, challenges
• What is an ‘Archive’
• Open Archival Information System (OAIS)
• CLASS Architectural Transitions
• Discuss GEO-IDE/CLASS joint efforts
• CLASS Campaign Status
  • NODC IOC/FOC
  • CLASS APIs
  • NPP DDR
• Open discussion

CLASS Mission
In its simplest form, CLASS’s mission is to provide IT infrastructure and support for NOAA archives.

What is an “Archive?”
• Historical definition: roughly “record preservation”
  • Semantics, functions varied significantly across “archives”
• Digital information preservation is driving changes
  • Need for long-term understandability/usability
  • “Data” vs. “Information”
• Modern definition: OAIS-RM
  • “… an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community.”

OAIS-RM
• International standard reference model for “open information archives”
• Referenced in:
  • Report to Congress, CLASS L1Rs, ARWG-A&A Reqs
• Initially inspired by need to establish common framework for discussions between archival entities
• Provides
  • Terminology and concepts
  • Guidelines
  • Typical functional entities and services
  • Information taxonomy
  • “Mandatory responsibilities”
  • Core ‘Definition of Archive’

OAIS Mandatory Responsibilities
• Negotiate for and accept appropriate information from Producers
• Obtain sufficient control of the information to ensure Long-Term Preservation
• Determine the Designated Community
• Ensure Independent Understandability
• Follow documented policies and procedures for preservation
• Make information available to the Designated Community

CLASS/Archive Relationship
• Information Preservation – (Requirements definition)
  • Archive “performs”
  • CLASS “provides IT capabilities in support of”
• Expertise
  • Archive: information structure, semantics, usage; science
  • CLASS: IT
• Focus
  • Archive: information
  • CLASS: data (bits)
• Stewardship
  • Archive: science data stewardship - (Requirements development)
  • CLASS: data stewardship
• Key takeaway: CLASS is not (nor does it “have”) an archive

CLASS Mission Revisited
CLASS’s mission is to provide IT infrastructure and support that enable NOAA archives to implement OAISs.

OAIS-RM Functional Entities in CLASS Context

Selected CLASS Attributes
• Vision
  • “… An enterprise-wide IT system supporting long-term, secure storage of and access to NOAA’s archived environmental datasets” – CLASS L1Rs
• Scope
  • “… NOAA (digital spatial environmental) data”
  • “… capable of supporting both existing and new archive collections …” – CLASS L1Rs
• Applicability
  • “NOAA has directed that both legacy and emerging environmental observing systems requiring long-term archive plan to use CLASS.” – CLASS L1Rs
• Customers - (Requirements development)
  • NOAA archives (e.g., NNDCs & Centers of Data) are CLASS’s direct customers
  • Consumers and Producers are archives’ direct customers

Key CLASS Challenges
• NOAA roles, responsibilities
  • Identification and allocation of all archival mission roles and responsibilities, operations
  • Essential to CLASS success, both practice and perception
• NOAA integration/interoperation – GEO-IDE
• Data heterogeneity/specificity
• Large data volumes
• Data management
• Long-term data preservation
  • Complex, evolving problem
  • Currently in a developmental, research stage
• Widely divergent user needs, requirements
• Evolving from legacy system

Addressing the Challenges - Internal
• Emphasize generality, scalability, flexibility
• Depend on standards wherever possible
  • Data models
  • Ontologies
• Provide infrastructure customizable by external entities
  • Customer Interface for Access
• Software must support hardware refresh transparently
  • Abstract hardware in software
  • Layered architecture
• Track preservation-related best-practices and similar archival projects

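The point about abstracting hardware in software can be illustrated with a minimal sketch. Everything named here (`StorageBackend`, `InMemoryBackend`, `refresh_hardware`) is hypothetical and not part of CLASS; the sketch only assumes that archive code talks to storage through one narrow interface, so a hardware refresh becomes a copy between two implementations of that interface.

```python
from typing import Dict, List, Protocol


class StorageBackend(Protocol):
    """Hypothetical interface that hides storage hardware from archive code."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...

    def keys(self) -> List[str]: ...


class InMemoryBackend:
    """Stand-in for one generation of storage hardware (illustration only)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def keys(self) -> List[str]:
        return sorted(self._blobs)


def refresh_hardware(old: StorageBackend, new: StorageBackend) -> int:
    """Copy every object to the new backend and return how many were copied."""
    copied = 0
    for key in old.keys():
        new.put(key, old.get(key))
        copied += 1
    return copied


# Callers see the same put/get interface before and after the refresh.
old_tier = InMemoryBackend()
old_tier.put("granule-001", b"sample bits")
new_tier = InMemoryBackend()
copied = refresh_hardware(old_tier, new_tier)
```

In a layered design of this kind, only the backend classes would change when hardware is replaced; the layers above keep calling `put` and `get`.
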
Addressing the Challenges - External
• Continue to push for clarification of roles and responsibilities
  • Archive ConOps is key
• Continue to stress clear, comprehensive requirements
  • L1Rs have helped enormously
  • Push for L2, L* requirements
  • ARWG – Archive & Access Requirements - V2.2
    • Are they complete? Do they need to be revised?
• Transparency
• Early, strong commitment to GEO-IDE

CLASS’s SOA
• Consistency with FEAF/NOAA EA
• Facilitates interoperability & provides public interfaces
  • Enables custom “client” applications
  • Reduces pressure to be all things to all people
• Foundation for distributed infrastructure
• Facilitates evolution from “as-is” system
• Facilitates cost-effective extensibility
• Primary services
  • Externally visible
    • Ingest
    • Access
  • Internally visible
    • Archival Storage
    • Data Management

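A rough sketch of how the four primary services could relate, assuming the simplest possible contracts. The class and method names below are invented for illustration and do not describe the actual CLASS interfaces; the only idea carried over from the slide is that Ingest and Access face clients, while Archival Storage and Data Management sit behind them.

```python
from typing import Dict, List


class ArchivalStorage:
    """Internally visible: holds the bits."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def store(self, object_id: str, data: bytes) -> None:
        self._objects[object_id] = data

    def retrieve(self, object_id: str) -> bytes:
        return self._objects[object_id]


class DataManagement:
    """Internally visible: holds the metadata used to locate objects."""

    def __init__(self) -> None:
        self._catalog: Dict[str, Dict[str, str]] = {}

    def register(self, object_id: str, metadata: Dict[str, str]) -> None:
        self._catalog[object_id] = dict(metadata)

    def find(self, **criteria: str) -> List[str]:
        return sorted(
            object_id
            for object_id, metadata in self._catalog.items()
            if all(metadata.get(k) == v for k, v in criteria.items())
        )


class IngestService:
    """Externally visible: the only way data enters the system."""

    def __init__(self, storage: ArchivalStorage, catalog: DataManagement) -> None:
        self._storage = storage
        self._catalog = catalog

    def ingest(self, object_id: str, data: bytes, metadata: Dict[str, str]) -> None:
        self._storage.store(object_id, data)
        self._catalog.register(object_id, metadata)


class AccessService:
    """Externally visible: clients search and fetch without touching storage."""

    def __init__(self, storage: ArchivalStorage, catalog: DataManagement) -> None:
        self._storage = storage
        self._catalog = catalog

    def search(self, **criteria: str) -> List[str]:
        return self._catalog.find(**criteria)

    def fetch(self, object_id: str) -> bytes:
        return self._storage.retrieve(object_id)


storage = ArchivalStorage()
catalog = DataManagement()
ingest = IngestService(storage, catalog)
access = AccessService(storage, catalog)
ingest.ingest("granule-001", b"sample bits", {"data_type": "grid"})
ingest.ingest("granule-002", b"more bits", {"data_type": "swath"})
matches = access.search(data_type="grid")
```

Because custom client applications would depend only on `IngestService` and `AccessService`, the two internal services could be reworked or distributed without changing what clients call.
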
CLASS Architectural Transitions - Internal
• Transition to SOA
  • Wrap existing functionality in services
  • Refactor behind-the-scenes later
• Transition to layered architecture
  • Hardware abstraction
  • Distributed infrastructure
• Continue to emphasize, expand componentization
• Tactics
  • Prototype services and vet internally
  • Incrementally redesign subsystems
  • Increase use of standards (formats, code, terminology, etc.)
  • Decouple CLASS Web Interface from CLASS internals

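The "wrap now, refactor later" tactic might look like the following sketch. The legacy function, its pipe-delimited query string, and the `SearchService` name are all made up for the example; nothing here reflects real CLASS code. The service exposes a typed request and result, and the as-is behavior stays untouched behind it.

```python
from dataclasses import dataclass


def legacy_order_lookup(raw_query: str) -> str:
    """Stand-in for as-is functionality with an ad hoc, string-based contract."""
    dataset, start, end = raw_query.split("|")
    return f"{dataset}:{start}:{end}:OK"


@dataclass(frozen=True)
class SearchRequest:
    dataset: str
    start_date: str
    end_date: str


@dataclass(frozen=True)
class SearchResult:
    dataset: str
    status: str


class SearchService:
    """Hypothetical service facade: stable interface outside, legacy call inside."""

    def search(self, request: SearchRequest) -> SearchResult:
        # Translate the typed request into the legacy string format.
        raw = legacy_order_lookup(
            f"{request.dataset}|{request.start_date}|{request.end_date}"
        )
        # Translate the legacy reply back into a typed result.
        dataset, _start, _end, status = raw.split(":")
        return SearchResult(dataset=dataset, status=status)


result = SearchService().search(SearchRequest("SST", "2007-05-24", "2007-05-25"))
```

Once callers depend only on `SearchService.search`, the body can later be rewritten without the legacy function and no client needs to change.
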
CLASS Architectural Transitions - External
• Track industry best-practices and others’ lessons-learned
• Work with GEO-IDE to
  • Specify services, interfaces, standards
  • Vet CLASS-proposed services, interfaces, standards
• Publish services gradually, starting with friendly users
• …

Keys to NOAA Integration Efforts
• Standard terminology, so people can exchange information efficiently
• Interfaces and APIs, so systems know how to communicate
• Standards, so data can be exchanged
• Developing GEO-IDE partnerships is the key
• Specification of roles and responsibilities

Where Should CLASS Participate in Pilots with GEO-IDE?
• Services and signatures
• Spatial operations
• Structural Data Types
• Metadata
• Standards
• Development of guidelines
• Early visibility and awareness help us
• Interoperability
• WHO do we need to interoperate with?

Priorities for CLASS/GEO-IDE Relationship
• Start interactions now
  • CLASS already moving forward
  • Interactions now will help avoid re-work
• Identify contacts and conduits
• Identify joint efforts and pilot projects
  • APIs – Prototype now
  • NODC IOC/FOC
• Agree on priorities, schedule
• Initiate joint activities

CLASS NODC IOC Campaign Status
NODC IOC – Five (5) Operational Threads – June 07, 2007
• Ingest Operations – The thread begins when the NODC SIP arrives at CLASS.
• Dissemination Operations – Two separate sub-threads are considered, depending on the restriction level of an AIP.
• Data Update – This thread begins when NODC requests its data from CLASS for purposes of updating its data.
• Data Integrity Check Processing – This thread is triggered by a schedule for integrity checks on the stored data.
• Restriction Level Reset – This thread begins with the receipt of a restriction level reset request message from NODC.

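As an illustration of the Data Integrity Check Processing thread, here is a minimal sketch of one scheduled pass. It assumes a SHA-256 checksum is recorded for each object at ingest and recomputed later; the slides do not say which fixity mechanism CLASS uses, and the function and identifier names are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Checksum used as the fixity value in this sketch (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def run_integrity_check(stored: Dict[str, bytes], recorded: Dict[str, str]) -> List[str]:
    """Return ids that are missing or whose checksum no longer matches the recorded one."""
    failures: List[str] = []
    for object_id, expected in sorted(recorded.items()):
        data = stored.get(object_id)
        if data is None or sha256_hex(data) != expected:
            failures.append(object_id)
    return failures


# Checksums are recorded once, at ingest time.
stored = {"aip-1": b"ocean profile", "aip-2": b"buoy record"}
recorded = {object_id: sha256_hex(data) for object_id, data in stored.items()}

# Simulate silent corruption of one stored object, then run a scheduled pass.
stored["aip-2"] = b"buoy recorc"
failures = run_integrity_check(stored, recorded)
```

A scheduler would call `run_integrity_check` periodically and hand any reported ids to a repair or re-ingest procedure.
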
CLASS NODC FOC Campaign Status
NODC FOC implementation steps – FY07-08
• The Archive Requirements Working Group (ARWG) and CLASS will review the existing Submission Agreement (SA) templates, and will define additional templates for different types of data (e.g., non-periodic data, historical data)
• Develop one or more SAs for NODC data as needed
• Develop the NODC-CLASS Interface Control Document (ICD)
• Evaluate the IOC and improve it if necessary
• Resolve the technical issues regarding deletion and versioning of NODC data
• Define the FOC requirements
• Define steps for transition from the IOC to FOC
• Implement the FOC
• Transfer all NODC data to CLASS for long-term storage

CLASS API prototype background
• NGDC has a strong interest in a CLASS API in order to fully integrate CLASS within the center
• NGDC has extensive experience in API development through the SPIDR, SABR, and ESG systems
• The August 2006 Users workshop showed strong user interest in CLASS APIs as well
• An initial API was unveiled and discussed at the Asheville workshop (CLASS, SABR, SPIDR, etc.)

CLASS API Goals
• First draft of a user-focused WS interface
• Demonstration of the concept of “fundamental separation” of archive and storage from access
• Interaction with and demonstration for users
• Technology discovery and evaluation of cutting-edge tools for CLASS
• First integration of multiple data types through CLASS (time-series, grid, swath, etc.)

CLASS-NGDC Prototype Scope

Current snapshot of the CLASS API architecture

CLASS NPP Campaign DDR Deliverables
• Updated Documents
  • CLASS-NPP Submission Agreement
  • Software Description Document
  • Allocated Requirements with NPP Requirements
  • Review Item Discrepancy (RID) form
• New Documents
  • NPP Delta Design Review Organization Note
  • Software Upgrade Plan - (Gap Analysis)
  • Hardware Upgrade Plan for NPP
  • Network Upgrade Plan for NPP
  • Prioritization Policies and Procedures Document
  • CLASS Load Test Plan for NPP
  • Performance Benchmark Technical Report
  • DDR Presentation Slides

CLASS NPP Campaign Status
The CLASS-NPP Delta Design Review (DDR) is scheduled for June 21-22, 2007, at NSOF, Suitland, Md.
DAARWG membership is invited.

Discussion?
Thank you!

Background Slides

Selected CLASS Bounds/Assertions
• CLASS is not an archive/OAIS
  • OAIS-RM is important for CLASS, but CLASS does not (and cannot) conform to or comply with it
• CLASS does not have a science mission
  • “Data producers and data centers are responsible for the science data stewardship missions and the development and maintenance of science data stewardship data, information, and metadata.” – CLASS L1Rs
  • Ramifications for data specificity problem
• CLASS is an extant operational system
  • Future versions of CLASS will be evolutions
  • The as-is CLASS system was developed for a very different set of requirements than those which now exist

Selected CLASS Architectural Drivers
• Requirements: L1, L* (future), A&A v. 2.2, system and allocated
• As-is system
• New campaigns
• Data heterogeneity & volumes
• Constant change
• Long-term mission
• OAIS-RM
• GEO-IDE

Selected CLASS “ilities”
• Flexibility: adaptation to change
• Generality: support variety in data types, user needs, etc.
• Scalability: support increasing data volumes and user activity
• Interoperability: fundamental to NOAA, user community
• Security: essential aspect of any generally-accessible IT system
• Reliability: essential to long-term secure storage mission
• Maintainability and evolvability: address long-term mission and change
• Openness and standards conformance: support interoperability, usability
• Modularity and layering: promote flexibility and maintainability
• Heterogeneity: provide flexibility, cost reduction alternatives

Flexibility is Essential
• CLASS’s environment is characterized by change
• Emphasis on evolution, evolvability
• What can be done, not what must be done
  • Example: support multiple nodes
  • Example: support small-footprint service deployment
• Key goal: provide options to CLASS PM

CLASS Long-Term System Architecture Status
• Documents
  • Long-Term System Architecture Overview - done
  • To-Be System Architecture Overview – in progress, end of 2007
  • Long-Term System Architecture Transition Plan
  • Long-Term System Architecture Reference Manual
• Work in progress
  • Service decomposition
  • Interface development
  • Data specificity approaches workable for both CLASS, NOAA
  • Infusing LTSA thinking into CLASS redesigns

Service-Oriented Architecture
• Design style used throughout all aspects of creating and using business services
• Defines the ways in which services are deployed and managed
• Increases reuse
• Lowers overall costs
• Improves extensibility
• Maps easily and directly to a business’s operational processes
• Supports a better division of labor between IT and business personnel
• Uses description model capable of unifying new and old IT systems
• Most important application is connecting the various operational systems that automate an enterprise’s business processes
  • For CLASS
    • Internally: connecting Ingest, Archival Storage, Data Management, and Access
    • Externally: enabling participation in NOAA SOA; interoperability with other NOAA systems
  • For NOAA, connecting new and legacy IT systems (including CLASS)
• Facilitates composition of services across disparate pieces of software, whether old or new; inter- or intra-enterprise; and regardless of platform

Archive ConOps
• Identification and allocation of all archival mission roles and responsibilities, operations
• Essential to CLASS success, both practice and perception
• Example: CLASS strategy for dealing with data specificity needs to be feasible within NOAA

Strawman Process for Breaking Down Stovepipes
• Extra-CLASS
  • Decide which stovepipes are to be transitioned into CLASS
• Analysis
  • Analyze stovepipe holdings and capabilities
  • Assess impacts on CLASS for subsuming stovepipe holdings
  • Draft Submission Agreement and ICD
• IOC
  • Demonstrate ingest and rudimentary access for sample of stovepipe holdings
  • Poll stovepipe users regarding UI issues
• “Historical ingest” campaign
  • Finalize Submission Agreement and ICD
  • Extend CLASS-written UI as needed, or …
  • … develop new stand-alone UI that duplicates the look-and-feel of the old stovepipe’s interface, but uses CLASS services
  • Initiate ingest