Building a caGrid Node
Getting Started
• This course is self-paced, but is also hands-on. Therefore, you will need to download an accompanying code base to complete the exercises associated with this course.
• The code base can be downloaded from: http://pir.georgetown.edu/~suzek/caBIG_Bootcamp_July2008/gridPIR_Training.zip
• From the download:
  1) Copy gridPIR_Training.zip to "C:\" on your machine
  2) Unzip gridPIR_Training.zip (overwrite existing files if needed)
• ** Do not proceed with this course until you have downloaded and unzipped the above files. You will not be able to complete the exercises without this zip file. **
Session Goals By the end of this session, you will be able to: • Use Introduce to create a node on the Grid • More specifically: • Install a public data service on caGrid (no security) • Deploy a data service using caCORE SDK-generated artifacts and the Introduce toolkit • Use caGrid Data Services • Use caGrid Metadata Service APIs • Use caGrid for Semantic Interoperability
Lessons • Lesson 1: Installing caGrid for Data Service Deployment • Lesson 2: Deploying a caGrid Data Service • Lesson 3: Using caGrid Data Services • Lesson 4: Using caGrid Metadata Service APIs • Lesson 5: Using caGrid for Semantic Interoperability
Lesson 1: Installing caGrid for Data Service Deployment Overview • In this lesson, we will cover: • caGrid Overview • Steps involved in caGrid installation • There will be an exercise to install caGrid 1.2 on a Windows machine
Lesson 1: Installing caGrid for Data Service Deployment Outline • Overview • caGrid • caGrid Infrastructure • Step-by-step caGrid Installation for Data Service Deployment
Lesson 1: Installing caGrid for Data Service Deployment What is caGrid?
• A development project of the Architecture Workspace
• A service-oriented infrastructure that supports caBIG
• An architecture that allows building a grid of your own
• Enables collaborating institutions to share information and analytical resources efficiently and securely
Lesson 1: Installing caGrid for Data Service Deployment caGrid Community Involvement
• caGrid itself provides no real “data” or “analysis” to caBIG™; it is the enabling infrastructure that allows the community to develop Analytical Services and Data Services
• Community members add value to the grid as applications, services (data/analytical), and processes
• caGrid provides the necessary core services, APIs, and tooling
• Community members develop end-user applications/clients that consume the resources provided on the grid
Lesson 1: Installing caGrid for Data Service Deployment caGrid Infrastructure
• Client and service APIs are object oriented, and operate over well-defined and curated data types
• Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR)
• Object definitions are drawn from controlled terminology, and the vocabulary is registered in the Enterprise Vocabulary Services (EVS); their relationships are thus semantically described
• Objects are serialized to XML that adheres to XML schemas registered in the Global Model Exchange (GME)
Lesson 1: Installing caGrid for Data Service Deployment caGrid Infrastructure – cont’d • Service and hosting center metadata are registered in the Index Service
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Before starting
• Download the caGrid 1.2 installer: http://gforge.nci.nih.gov/frs/download.php/3738/caGrid-installer-1.2.zip
• Set the environment variables:
  • JAVA_HOME: Location of Java JDK 1.5.X
  • ANT_HOME: Location of Ant 1.6.5
  • CATALINA_HOME: Location of Tomcat ver. 5.0.28
  • GLOBUS_LOCATION: Location of Globus Toolkit ver. 4.0.3
  If Ant, Globus Toolkit, and/or Tomcat are not available, the caGrid installer installs them
• Unzip caGrid-installer-1.2.zip
• Run the caGrid installer: java -jar caGrid-installer-1.2.jar
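Before launching the installer, it can help to confirm which of the four environment variables are already set. The following is only a minimal sketch in Python, not part of the caGrid tooling; the function name check_environment is hypothetical, and the script checks only that each variable points at an existing directory, not that the installed version matches the one listed on the slide.

```python
import os

# Variable names and expected versions as listed on the slide.
REQUIRED = {
    "JAVA_HOME": "Java JDK 1.5.X",
    "ANT_HOME": "Ant 1.6.5",
    "CATALINA_HOME": "Tomcat 5.0.28",
    "GLOBUS_LOCATION": "Globus Toolkit 4.0.3",
}


def check_environment(env=None):
    """Map each required variable to 'ok', 'unset', or 'missing directory'.

    Hypothetical helper: it does not verify versions, only that the
    variable is set and names a directory that exists.
    """
    env = os.environ if env is None else env
    report = {}
    for name in REQUIRED:
        path = env.get(name)
        if not path:
            report[name] = "unset"
        elif not os.path.isdir(path):
            report[name] = "missing directory"
        else:
            report[name] = "ok"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name:16} {status:18} (expected: {REQUIRED[name]})")
```

Any variable reported as unset for Ant, Tomcat, or Globus is not necessarily a problem, since the installer can install those prerequisites itself.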
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: License Agreement
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Installation Types • Choose any combination of installation types to install one or more caGrid components • For data service deployment, choose the options “Install caGrid” and “Configure Container” • This is not a “secure” installation since our data service is public; additional steps, such as securing the container, are required for a secure caGrid node
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Service Container • Choose Tomcat or Globus as the service container
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Prerequisites Ant Tomcat Globus Toolkit • Install (or reinstall) prerequisite software
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Location • Provide the directory where caGrid will be installed
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Target Grid
• Choose one of the available grids:
  • NCICB Development
  • NCICB Production
  • NCICB QA
  • OSU Development
  • OSU Training
  • and more
• Each target grid uses different URLs for the caGrid core services. For instance, the service URLs for the OSU Training Grid are:
cagrid.master.index.service.url=http://training03.cagrid.org:6080/wsrf/services/DefaultIndexService
cagrid.master.cadsr.service.url=http://training02.cagrid.org:6080/wsrf/services/cagrid/CaDSRService
cagrid.master.gme.service.url=http://training02.cagrid.org:6080/wsrf/services/cagrid/GlobalModelExchange
cagrid.master.gridgrouper.service.url=https://training03.cagrid.org:6443/wsrf/services/cagrid/GridGrouper
cagrid.master.dorian.service.url=https://dorian.cagrid.org:6443/wsrf/services/cagrid/Dorian
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Container Configuration • Securing the container is needed to host secure services. • Secure services are those that require clients to use one of the Globus Security Infrastructure (GSI) authentication mechanisms.
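The target grid settings above are plain key=value properties, so they are easy to inspect programmatically. The sketch below, in Python, parses the OSU Training Grid entries quoted on the slide and lists the scheme, host, and port of each core service. The helper names parse_properties and summarize are hypothetical, and the sketch assumes every key follows the cagrid.master.<service>.service.url pattern seen here.

```python
from urllib.parse import urlparse

# Property lines copied from the OSU Training Grid example on the slide.
TRAINING_GRID = """\
cagrid.master.index.service.url=http://training03.cagrid.org:6080/wsrf/services/DefaultIndexService
cagrid.master.cadsr.service.url=http://training02.cagrid.org:6080/wsrf/services/cagrid/CaDSRService
cagrid.master.gme.service.url=http://training02.cagrid.org:6080/wsrf/services/cagrid/GlobalModelExchange
cagrid.master.gridgrouper.service.url=https://training03.cagrid.org:6443/wsrf/services/cagrid/GridGrouper
cagrid.master.dorian.service.url=https://dorian.cagrid.org:6443/wsrf/services/cagrid/Dorian
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def summarize(props):
    """Return (service, scheme, host, port) for each service URL.

    Assumes keys look like cagrid.master.<service>.service.url.
    """
    rows = []
    for key, url in props.items():
        service = key.split(".")[2]
        parts = urlparse(url)
        rows.append((service, parts.scheme, parts.hostname, parts.port))
    return rows


if __name__ == "__main__":
    for row in summarize(parse_properties(TRAINING_GRID)):
        print(row)
```

In this example the GridGrouper and Dorian entries use https on port 6443, while the Index Service, caDSR, and GME entries use plain http on port 6080.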
Lesson 1: Installing caGrid for Data Service Deployment caGrid Installation: Completion
Lesson 1: Installing caGrid for Data Service Deployment Additional Information • caGrid Wiki: • http://www.cagrid.org/mwiki/index.php?title=CaGrid • caBIG™ Architecture WS caGrid Web Page: • https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/
Lesson 1: Installing caGrid for Data Service Deployment Exercise • Install caGrid 1.2 on your local machine using the instructions located at: https://gforge.nci.nih.gov/docman/view.php/196/13043/01_Installing_caGrid.doc
Lesson 2: Deploying a caGrid Data Service Overview • In this lesson, we will cover: • Steps involved in deployment/creation of a caGrid Data Service using caCORE SDK 3.2.1 generated artifacts • Selecting type of service and template • Selecting domain model • Selecting the schema • Providing metadata (hosting site/POC etc.) • Deploying the service • There will be exercises to create and deploy gridPIR on a local caGrid node
Lesson 2: Deploying a caGrid Data Service Outline • Overview • Major steps for deployment • Introduce Toolkit • Step-by-step deployment of a Data Service; gridPIR
Lesson 2: Deploying a caGrid Data Service caGrid Data Service Deployment – Major steps
• Provide client and service APIs that are object oriented
• Provide objects that are defined in UML and registered in the Cancer Data Standards Repository (caDSR)
• Provide object definitions drawn from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS)
• Provide XML schemas used for XML serialization of objects that are registered in the Global Model Exchange (GME)
Lesson 2: Deploying a caGrid Data Service caGrid Data Service Deployment – Major steps • Provide service metadata about the service and the center where the service is deployed
Lesson 2: Deploying a caGrid Data Service Service Metadata (Domain Model Portion)
<ns135:UMLAttribute dataTypeName="CHARACTER" description="UniProtKB primary accession number." name="uniprotkbPrimaryAccession" publicID="2322254" version="1.0">
  <ns135:SemanticMetadata conceptCode="C25402" conceptDefinition="A control number unique ….." conceptName="Accession Number" order="1"/>
  …..
  <ns135:ValueDomain longName="Protein UniProtKB Primary Accession Number Genomic Identifier">
    <ns135:enumerationCollection/>
  </ns135:ValueDomain>
</ns135:UMLAttribute>
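To illustrate how a client might read this portion of the service metadata, here is a minimal sketch in Python using the standard library XML parser. It is not the caGrid Metadata API. The namespace URI urn:example:metadata is a placeholder chosen for this sketch (the slide does not show the declaration behind the ns135 prefix), the conceptDefinition text stays truncated as in the source, and the function name describe_attribute is hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace: the real URI bound to ns135 is not shown on the slide.
NS = {"md": "urn:example:metadata"}

# Fragment adapted from the slide; the conceptDefinition is truncated there too.
FRAGMENT = """\
<md:UMLAttribute xmlns:md="urn:example:metadata"
    dataTypeName="CHARACTER"
    description="UniProtKB primary accession number."
    name="uniprotkbPrimaryAccession" publicID="2322254" version="1.0">
  <md:SemanticMetadata conceptCode="C25402"
      conceptDefinition="A control number unique ..."
      conceptName="Accession Number" order="1"/>
  <md:ValueDomain longName="Protein UniProtKB Primary Accession Number Genomic Identifier">
    <md:enumerationCollection/>
  </md:ValueDomain>
</md:UMLAttribute>
"""


def describe_attribute(xml_text):
    """Summarize one UMLAttribute element: name, type, concepts, value domain."""
    attr = ET.fromstring(xml_text)
    concepts = [
        (c.get("conceptCode"), c.get("conceptName"))
        for c in attr.findall("md:SemanticMetadata", NS)
    ]
    domain = attr.find("md:ValueDomain", NS)
    return {
        "name": attr.get("name"),
        "type": attr.get("dataTypeName"),
        "publicID": attr.get("publicID"),
        "concepts": concepts,
        "valueDomain": domain.get("longName") if domain is not None else None,
    }


if __name__ == "__main__":
    print(describe_attribute(FRAGMENT))
```

The summary shows the three layers the slide is pointing at: the attribute as registered in caDSR (publicID and version), its EVS concept annotation (conceptCode and conceptName), and its value domain.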
Lesson 2: Deploying a caGrid Data Service Introduce: Grid Service Authoring Toolkit • An open-source and extensible toolkit • Supports easy development and deployment of WS/WSRF compliant Grid services by hiding low level details of the Globus Toolkit • Enables the implementation of strongly-typed Grid services • Facilitates caGrid data service development using caCORE SDK artifacts through pluggable service styles
Lesson 2: Deploying a caGrid Data Service Grid-enablement of Protein Information Resource (gridPIR) • A data service to provide comprehensive and fully annotated protein-related information for genomic and proteomic cancer research • Developed using a model-driven approach and caCORE SDK 3.2.1 • All data is public, so no security layer is implemented
Lesson 2: Deploying a caGrid Data Service Introduce: Create a caGrid Service • Start Introduce: ant introduce • Modify an existing service • Deploy an existing service • Browse Data Types from caDSR or GME
Lesson 2: Deploying a caGrid Data Service Introduce: Enter service information • An analytical service exposes operation(s) with input/output objects • A data service exposes objects that present the data resource
Lesson 2: Deploying a caGrid Data Service Introduce: Data Service Configuration • Different service styles (including caCORE SDK) are supported • gridPIR is generated using caCORE SDK v3.2.1 • Optional extensions for Bulk Data Transfer or Web Services Enumeration
Lesson 2: Deploying a caGrid Data Service Introduce: caCORE SDK-generated Client Selection Two options for client selection: • Option 1: Use the remote API if the caCORE-like system (API) and the caGrid Data Service are on different machines • Option 2: Use the local API if both the caCORE-like system (API) and the caGrid Data Service are deployed on the same machine
Lesson 2: Deploying a caGrid Data Service Introduce: Remote API Selection (Option 1) Library folder (including client jar) generated by caCORE SDK
Lesson 2: Deploying a caGrid Data Service Introduce: Remote API Selection (Option 1) (cont’d) • Treat all queries as case-insensitive • Use Common Security Module • Enter the URL for the remote caCORE-like gridPIR API (publicly accessible)
Lesson 2: Deploying a caGrid Data Service Introduce: Local API Selection (Option 2) Library (including client jar) and configuration folders are generated by caCORE SDK
Lesson 2: Deploying a caGrid Data Service Introduce: Local API Selection (Option 2) (cont’d) • Treat all queries as case-insensitive
Lesson 2: Deploying a caGrid Data Service Introduce: Choose objects (model) service exposes 1. Fetch models from caDSR 2. Select gridPIR model v1.2 3. Select package from gridPIR model 4. Add selected packages
Lesson 2: Deploying a caGrid Data Service Introduce: Choose XML Schema Find schemas from GME (if registered) OR Resolve schemas manually
Lesson 2: Deploying a caGrid Data Service Introduce: Choose XML Schema – Manual Resolution (cont’d) XSD generated by caCORE SDK
Lesson 2: Deploying a caGrid Data Service Introduce: Enter Service Description 1. Select Metadata Tab 2. Select ServiceMetadata row 3. Edit Property
Lesson 2: Deploying a caGrid Data Service Introduce: Enter Service Metadata (cont’d) • Enter: • POC • Hosting Center • Address
Lesson 2: Deploying a caGrid Data Service Introduce: Deploy gridPIR Data Service Deploy an existing service
Lesson 2: Deploying a caGrid Data Service Introduce: Select Data Service Location in the File System • Compiled service stubs • Metadata files • Library files • XML schemas • Source code for service stubs
Lesson 2: Deploying a caGrid Data Service Introduce: Select Data Service Location in the File System • Container information • Register to Index Service? • URL for Index Service
Lesson 2: Deploying a caGrid Data Service Verify Deployment URL for deployed service
Lesson 2: Deploying a caGrid Data Service Additional Information • caGrid Wiki: • http://www.cagrid.org/mwiki/index.php?title=CaGrid • Introduce Toolkit Wiki: • http://www.cagrid.org/mwiki/index.php?title=Introduce • caGrid Data Services Wiki: • http://www.cagrid.org/mwiki/index.php?title=Data_Services • caBIG™ Architecture WS caGrid Web Page: • https://cabig.nci.nih.gov/workspaces/Architecture/caGrid/
Lesson 2: Deploying a caGrid Data Service Exercise • Deploy a caGrid data service for gridPIR on your local machine using instructions located at: https://gforge.nci.nih.gov/docman/view.php/196/13044/02_Creating_caGrid_Node.doc and https://gforge.nci.nih.gov/docman/view.php/196/13045/03_Deploying_caGrid_Node.doc
Lesson 3: Using caGrid Data Services Overview • In this lesson, we will cover: • Ways to query data services • Creating and running queries in caBIG Query Language (CQL) • There will be an exercise to create and run CQL queries from the command line against gridPIR production.
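As a preview of what a CQL query looks like, the sketch below builds a small query document in Python. It assumes the CQL 1 XML layout (a CQLQuery root containing a Target with a nested Attribute restriction) and the namespace shown in the code; both should be checked against the CQL schema shipped with your caGrid installation. The target class name example.Protein and the accession value are placeholders, not gridPIR's actual class or data, and build_cql is a hypothetical helper rather than a caGrid API.

```python
import xml.etree.ElementTree as ET

# Assumed CQL 1 namespace; verify against the schema in your caGrid install.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def build_cql(target_class, attribute, value, predicate="EQUAL_TO"):
    """Build a CQL query selecting target_class objects by one attribute.

    Hypothetical helper: produces
    <CQLQuery><Target name=...><Attribute .../></Target></CQLQuery>.
    """
    ET.register_namespace("", CQL_NS)  # serialize with a default namespace
    query = ET.Element(f"{{{CQL_NS}}}CQLQuery")
    target = ET.SubElement(query, f"{{{CQL_NS}}}Target", name=target_class)
    ET.SubElement(
        target,
        f"{{{CQL_NS}}}Attribute",
        name=attribute,
        value=value,
        predicate=predicate,
    )
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # Placeholder class name and accession value, for illustration only.
    print(build_cql("example.Protein", "uniprotkbPrimaryAccession", "P12345"))
```

The attribute name uniprotkbPrimaryAccession is taken from the service metadata example in Lesson 2; the resulting XML can be saved to a file and passed to whichever command-line query client the exercise provides.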