geWorkbench caGrid TeraGrid Integration

Scott Oster, Ohio State University – Dept. of Biomedical Informatics. Christine Hung, Columbia University – JCSB/C2B2. caBIG Architecture Face-to-Face, Salt Lake City, UT, January 2008.


  1. geWorkbench caGrid TeraGrid Integration Scott Oster Ohio State University – Dept. of Biomedical Informatics Christine Hung Columbia University – JCSB/C2B2 caBIG Architecture Face-to-Face Salt Lake City, UT January 2008

  2. Agenda • Overview (5 min) • Introduction to the TeraGrid Workgroup • Background on geWorkbench and the geWorkbench/caGrid/TeraGrid Project • Technology (10 min) • Steps to establishing geWorkbench/caGrid/TeraGrid Interface • Use of caGrid Security (GTS, Grid Grouper, Dorian, CDS) • Workflow and communications between services • Demo (5 min) • Discussion (5 min)

  3. Team Members • geWorkbench (Columbia University) • Christine Hung • Kiran Keshav • caGrid (Ohio State University) • Scott Oster • Stephen Langella • caGrid/TeraGrid (Argonne National Laboratory) • Ravi Madduri • TeraGrid (Argonne National Laboratory) • Stuart Martin • Management • Aris Floratos (Columbia University) • Krishnakant Shanbhag (Argonne National Laboratory) • Michael Keller (Booz Allen Hamilton) • Patrick McConnell (Duke University) • Nancy Wilkins-Diehr (San Diego Supercomputer Center)

  4. Overview • Primary problem to address • Lack of infrastructure and operating procedures to support high performance computing needs of caBIG • Overarching goals • Regular caGrid services will run as caGrid/TeraGrid gateway services • Virtualize TeraGrid resources (both compute and storage) • Approach: labor divided between domain and technical tasks • Use cases will be drafted to identify the needs of the community • Existing TeraGrid Gateway projects will be surveyed to identify lessons learned and potential technology for reuse • Demonstrate approach through working prototype • Document best practices and develop “cookbook”

  5. TeraGrid Overview “TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource.” • Characteristics: • > 250 teraflops of computing capability • > 30 petabytes of online and archival data storage • high-performance networks • Mechanics: • Prospective users request an allocation of HPC resources from a review committee • Allocations are granted, and credentials are issued • Jobs are run with credentials and resource usage is billed to the allocation

  6. caGrid Gateway Service Overview • caGrid service running in the caBIG™ environment which acts as a bridge or proxy to TeraGrid resources for a subset of caBIG™ users • should meet Gold compatibility requirements • Created for a specific scientific scenario: • abstracts away the details of leveraging TeraGrid for performance intensive operations • uses domain-specific operations and data types • has access to TeraGrid allocation • Alleviates the need for caBIG™ users to: • understand the complexities of TeraGrid (or HPC systems) • obtain TeraGrid accounts/allocations
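The gateway idea on this slide can be illustrated with a minimal Java sketch. Every name in it (`ComputeBackend`, `MeanCenteringGateway`, `LocalBackend`, the account string) is hypothetical and chosen only for illustration; none of it is caGrid or TeraGrid API. It assumes the gateway alone holds the community account and exposes a single domain-specific operation, so callers never deal with accounts or job submission.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical backend abstraction; a real gateway would submit work to TeraGrid here.
interface ComputeBackend {
    double[] run(String communityAccount, double[] input);
}

// Domain-facing service: callers see a domain operation, never accounts or jobs.
final class MeanCenteringGateway {
    private final ComputeBackend backend;
    private final String communityAccount; // held by the gateway, not by callers
    private final List<String> usageLog = new ArrayList<>();

    MeanCenteringGateway(ComputeBackend backend, String communityAccount) {
        this.backend = backend;
        this.communityAccount = communityAccount;
    }

    // Domain-specific operation; the caller name is recorded so usage can be attributed.
    double[] meanCenter(String caller, double[] expression) {
        usageLog.add(caller);
        return backend.run(communityAccount, expression);
    }

    List<String> usageLog() {
        return Collections.unmodifiableList(new ArrayList<>(usageLog));
    }
}

// Local stand-in for the remote resource so the sketch runs without any grid access.
final class LocalBackend implements ComputeBackend {
    @Override
    public double[] run(String communityAccount, double[] input) {
        double sum = 0.0;
        for (double v : input) {
            sum += v;
        }
        double mean = input.length == 0 ? 0.0 : sum / input.length;
        double[] centered = new double[input.length];
        for (int i = 0; i < input.length; i++) {
            centered[i] = input[i] - mean;
        }
        return centered;
    }
}

final class GatewayDemo {
    public static void main(String[] args) {
        MeanCenteringGateway gateway =
                new MeanCenteringGateway(new LocalBackend(), "community-account-placeholder");
        double[] result = gateway.meanCenter("alice", new double[] {1.0, 2.0, 3.0});
        System.out.println(java.util.Arrays.toString(result));
        System.out.println(gateway.usageLog());
    }
}
```

Swapping `LocalBackend` for an implementation that talks to a remote resource would leave callers unchanged, which is the point of the bridge or proxy role described above.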

  7. geWorkbench – a Platform for Integrated Genomics • Integrated genomics analysis application • Support for gene expression data, sequences, pathways, and structure • 50+ visualization and analysis modules • Access to local and remote data sources and analytical services • Integration with biological annotation sources • Development Platform • Open source • Java based • Component architecture • Facilitates customization

  8. geWorkbench – a Platform for Integrated Genomics • Large collection of components • Data parsers: Affy MAS/GCOS (txt and CEL), Genepix, RMA, FASTA, caArray, PDB. • Data Management: Project folders, marker/sequence/array groups. • Visualization: Dendrograms, color mosaics, scatter plots, SOM clusters, BLAST results, dot matrices. • Analyses: Hierarchical clustering, t-test, SVM, ARACNE, MEDUSA, MatrixREDUCE. • 3rd Party components: Cytoscape, GoMiner, GeneWays, GenePattern, MEV. • Complete list at www.geworkbench.org.

  9. geWorkbench – a Platform for Integrated Genomics http://www.geworkbench.org/

  10. geWorkbench – Graphical User Interface Projects Area Visualization Area Selection Area Command Area

  11. Clustering

  12. caGrid Service

  13. TeraGrid Aware caGrid Service

  14. Creating the Gateway Service • Manually stage the binary (jar file) on TeraGrid • Takes in .ser files as input • Produces results also in a .ser file • Used the RAVi plugin for Introduce to create the gateway service • http://www-unix.mcs.anl.gov/~neillm/ravi/ • Gateway gridFTPs input data and parameters from geWorkbench to TeraGrid • geWorkbench passes input to the gateway in geWorkbench’s native format (caDSR compliant) • Gateway serializes the input before gridFTPing to TeraGrid • Gateway invokes the staged binary • Gateway gridFTPs results back to geWorkbench • Gateway deserializes the result file • Gateway returns results to geWorkbench in its native format • Gateway service is a secured caGrid service which in turn invokes TeraGrid with a caBIG community account
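The serialize and deserialize steps above can be sketched in Java, assuming the .ser files are standard Java object serialization (the slide does not say so explicitly). `ExpressionMatrix` and `SerRoundTrip` are hypothetical names used only for illustration, not geWorkbench classes, and the GridFTP transfer and remote invocation are omitted.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a geWorkbench input object; not a real geWorkbench class.
final class ExpressionMatrix implements Serializable {
    private static final long serialVersionUID = 1L;
    final String[] markers;
    final double[][] values;

    ExpressionMatrix(String[] markers, double[][] values) {
        this.markers = markers;
        this.values = values;
    }
}

final class SerRoundTrip {
    // Write a Serializable object to a .ser file (the step before the transfer out).
    static void writeSer(Serializable obj, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Read an object back from a .ser file (the step after the transfer back).
    static Object readSer(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("input", ".ser");
        try {
            ExpressionMatrix input = new ExpressionMatrix(
                    new String[] {"geneA", "geneB"},
                    new double[][] {{1.0, 2.5}, {0.3, 4.2}});
            writeSer(input, file);
            // The GridFTP transfer and the staged binary's run would happen here; omitted.
            ExpressionMatrix copy = (ExpressionMatrix) readSer(file);
            System.out.println(copy.markers[1] + " " + copy.values[1][1]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Under this assumption, the class definitions on the reading side must be compatible with the ones that wrote the file, which is one reason to pin `serialVersionUID` on anything exchanged between the gateway and the staged binary.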

  15. Steps to establishing geWorkbench/caGrid/TeraGrid Interface

  16. caGrid Security (GTS, Grid Grouper, Dorian, CDS) http://www.cagrid.org/mwiki/index.php?title=GAARDS:Main

  17. Workflow and Communications Between Services

  18. Special Thanks • caGrid (Security Services) • Scott Oster • Stephen Langella • caGrid (RAVi Plugin, Gateway Service) • Ravi Madduri

  19. Demo and Discussions

  20. Steps to establishing geWorkbench/caGrid/TeraGrid Interface

  21. caGrid Security (GTS, Grid Grouper, Dorian, CDS) http://www.cagrid.org/mwiki/index.php?title=GAARDS:Main

  22. caGrid Security (GTS, Grid Grouper, Dorian, CDS) http://www.cagrid.org/mwiki/index.php?title=GAARDS:Main

  23. caGrid Security (GTS, Grid Grouper, Dorian, CDS)