
GridBLAST: High-Throughput BLAST on the Grid


Presentation Transcript


  1. GridBLAST: High-Throughput BLAST on the Grid
     Arun Krishnan, Projects Leader, HPC, BII, Singapore
     arun@bii.a-star.edu.sg
     http://www.bii.a-star.edu.sg/~arun
     http://gridblast.bii.a-star.edu.sg

  2. Agenda
     • GridBLAST
        • Architecture
        • Results
        • Demo
     • Other Projects
        • inGRD: Grid Resource Discovery
        • GridX: Meta-Scheduler for the Grid
        • Miscellaneous

  3. [Figure: Grid SPMD Architecture. A head (initiating) node runs GridBLAST over the Globus middleware and sends executables, databases and input files to the remote grid nodes; each remote node runs a remote script and returns its results to the head node.]

  4. [Figure: Mini-Grid. Three nodes (Node 1, Node 2, Node 3) on two networks (Network A, Network B), connected through routers to a backbone.]

  5. GridBLAST Main Script Flow Sheet
     • Split queries into multiple files
     • Statically schedule the queries across nodes
     • Tar and zip the executables, databases and query files
     • Spawn a separate thread of execution for each remote node
     • In each thread, spawn jobs using globusrun
     • Gather results and clean up
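The main-script flow above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not GridBLAST's code: the remote step is a hypothetical stub (`run_on_remote_node`) standing in for staging files and calling `globusrun`, the tar/zip step is omitted, and static scheduling is shown as a simple round-robin assignment of query files to nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_queries(queries, n_files):
    """Split the query list into n_files roughly equal chunks."""
    size, extra = divmod(len(queries), n_files)
    chunks, start = [], 0
    for i in range(n_files):
        end = start + size + (1 if i < extra else 0)
        chunks.append(queries[start:end])
        start = end
    return chunks


def static_schedule(chunks, nodes):
    """Assign chunks to nodes round-robin, fixed before any job runs."""
    plan = {node: [] for node in nodes}
    for i, chunk in enumerate(chunks):
        plan[nodes[i % len(nodes)]].extend(chunk)
    return plan


def run_on_remote_node(node, queries):
    """Hypothetical stand-in for staging files and calling globusrun."""
    return [f"{node}:{q}:hits" for q in queries]


def gridblast_main(queries, nodes, n_files):
    plan = static_schedule(split_queries(queries, n_files), nodes)
    # One thread of execution per remote node, as on the flow sheet.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = {n: pool.submit(run_on_remote_node, n, qs)
                   for n, qs in plan.items()}
        # Gather results; cleanup of temporary files would follow here.
        return {n: f.result() for n, f in futures.items()}
```

For example, `gridblast_main([f"q{i}" for i in range(7)], ["node1", "node2", "node3"], 6)` splits seven queries into six files and gives node1 the files holding q0, q1 and q4.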

  6. Remote Node Script File
     • Open a GASS server
     • Copy files, executables and databases from the initiating node
     • Untar and unzip the BLAST executables, databases and query files
     • Single processor?
        • Yes: spawn blastall jobs until all queries are done
        • No: spawn a scatter server, then spawn scatter clients using the local job managers; a work-queue scheduler distributes queries until all queries are done
     • Tar the results file and copy it to the initiating node
     • Clean up temporary directories/files and exit
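The multi-processor branch above relies on a work-queue scheduler: clients pull the next query whenever they become free, rather than receiving a fixed share up front. A minimal Python sketch, assuming scatter clients can be modelled as threads and using a hypothetical `fake_blastall` in place of a real blastall run:

```python
import queue
import threading


def fake_blastall(query_file):
    """Hypothetical placeholder for running blastall on one query file."""
    return f"{query_file}.out"


def work_queue_schedule(query_files, n_clients):
    """Run every query file once and return the sorted output names."""
    if n_clients <= 1:
        # Single-processor branch: run blastall on each query in turn.
        return sorted(fake_blastall(qf) for qf in query_files)

    work = queue.Queue()
    for qf in query_files:
        work.put(qf)
    results, lock = [], threading.Lock()

    def scatter_client():
        # Keep pulling queries until the queue is empty, so faster
        # clients naturally end up processing more of them.
        while True:
            try:
                qf = work.get_nowait()
            except queue.Empty:
                return
            out = fake_blastall(qf)
            with lock:
                results.append(out)

    clients = [threading.Thread(target=scatter_client) for _ in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    # In GridBLAST the results would next be tarred and copied back.
    return sorted(results)
```

Unlike the static schedule used across grid nodes, the shared queue balances load dynamically inside a single cluster.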

  7. Bound on communication time
     • A bound on the communication time corresponding to the maximum parallel execution time for an SPMD-type grid application is given by: [equation not shown]
     [Plot: T_comm_Max, Bound and Speedup (Speedup = TG/TL); y-axis: normalized bound/communication time; x-axis: problem size (# of queries)]
     • The minmax problem can be formulated as: [equation not shown]
     [Plot: TC_Max_Prop, Bound_Prop, TC_Max_Minmax, Bound_Minmax; y-axis: normalized bound/communication time; x-axis: problem size (# of queries)]

  8. Minmax Scheduler
     [Plot: Speedup for two different schemes. Series: Speedup_Minmax, Speedup_Prop; y-axis: Speedup = TG/TL; x-axis: problem size (# of queries)]
     [Plot: Query distribution across nodes for two different schemes. Series: Prop: Node1; Prop: Node2, Node3; Minmax: Node1; Minmax: Node2; Minmax: Node3; y-axis: queries/node; x-axis: problem size (# of queries)]
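The formulation behind the minmax scheduler is not preserved here, so the following only illustrates the two allocation schemes under an assumed linear cost model, T_i = c_i + n_i * t_i, where c_i is a fixed communication/staging time for node i and t_i its per-query time. "Prop" is taken to mean allocation proportional to node speed; the minmax version greedily gives each query to the node that would finish it earliest, which under this assumed model yields the smallest possible maximum finish time. Neither function is claimed to be the presentation's exact method.

```python
import heapq


def finish_times(alloc, c, t):
    """Assumed model: node i finishes at c_i + n_i * t_i."""
    return [ci + ni * ti for ni, ci, ti in zip(alloc, c, t)]


def proportional(n_queries, t):
    """Split queries in proportion to node speed (1 / t_i), ignoring c_i."""
    speeds = [1.0 / ti for ti in t]
    total = sum(speeds)
    alloc = [int(n_queries * s / total) for s in speeds]
    # Hand out the remainder left by rounding down, fastest nodes first.
    order = sorted(range(len(t)), key=lambda i: t[i])
    for k in range(n_queries - sum(alloc)):
        alloc[order[k % len(t)]] += 1
    return alloc


def minmax(n_queries, c, t):
    """Give each query to the node that would finish it earliest."""
    alloc = [0] * len(t)
    # Heap entries: (finish time if this node takes one more query, node).
    heap = [(ci + ti, i) for i, (ci, ti) in enumerate(zip(c, t))]
    heapq.heapify(heap)
    for _ in range(n_queries):
        done, i = heapq.heappop(heap)
        alloc[i] += 1
        heapq.heappush(heap, (done + t[i], i))
    return alloc
```

With c = [0, 30, 30] and t = [1, 1, 1] (a local node with no communication cost and two remote nodes), 60 queries are split [20, 20, 20] by the proportional scheme, finishing at 50 time units, and [40, 10, 10] by minmax, finishing at 40.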

  9. Experimental Results

  10. inGRD: Inter Network Grid Resource Discovery

  11. Why inGRD?
     • MDS can provide inconsistent information, because what it reports depends on how Grid administrators configure the Globus GIIS/GRIS.
     • inGRD does not require installing additional sensors on every compute node within a grid node; it uses resource information already collected by the job managers.
     • Pre-formatted data on the Grid nodes enables faster requests and faster collection and processing of large amounts of data.

  12. inGRD overview
     • inGRD sensors are installed on Grid nodes to collect available resource information from their compute clusters.
     • inGRD client applications facilitate the submission of requests to, and the collection of responses from, the inGRD-enabled Grid nodes.
     • Results are represented as a single XML document.
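The overview says inGRD returns results as a single XML document. A minimal sketch of that aggregation step with Python's `xml.etree.ElementTree`, assuming each node's response has already been parsed into a dictionary; the element names (`gridResources`, `gridNode`) and fields such as `free_cpus` are hypothetical, not inGRD's actual schema:

```python
import xml.etree.ElementTree as ET


def merge_reports(reports):
    """Merge per-node resource reports into one XML string.

    reports maps a grid-node name to a dict of (hypothetical) resource fields.
    """
    root = ET.Element("gridResources")
    for node, fields in sorted(reports.items()):
        elem = ET.SubElement(root, "gridNode", name=node)
        for key, value in fields.items():
            ET.SubElement(elem, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, `merge_reports({"node1": {"free_cpus": 12, "queue": "pbs"}})` returns `<gridResources><gridNode name="node1"><free_cpus>12</free_cpus><queue>pbs</queue></gridNode></gridResources>`.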

  13. GridX: Meta-scheduler for the Grid

  14. GridX: Meta-scheduler for the Grid
     • Meta-scheduler for scheduling jobs in a grid framework
     • Will provide a user-friendly interface for grid users to submit jobs
     • Provides Grid resource information by interfacing with inGRD, NWS and Ganglia
     • Provides basic grid requirements: job submission, monitoring, cancellation, file transfer, etc.
     • Advanced features include accounting, load balancing, and static and dynamic scheduling strategies
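Some of the GridX features listed above (submission, monitoring, cancellation, load balancing) can be illustrated with a toy Python class. Everything here is hypothetical: the class and method names, the use of free-CPU counts as the load measure, and the fact that jobs are only recorded rather than executed. In GridX such figures would presumably come from inGRD, NWS or Ganglia.

```python
import itertools


class MetaScheduler:
    """Toy meta-scheduler; jobs are bookkept, never actually run."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)  # grid node -> free CPUs
        self.jobs = {}                    # job id -> (node, cpus, state)
        self._ids = itertools.count(1)

    def submit(self, cpus):
        """Place the job on the node with the most free CPUs."""
        node = max(self.free_cpus, key=self.free_cpus.get)
        if self.free_cpus[node] < cpus:
            raise RuntimeError("no grid node has enough free CPUs")
        self.free_cpus[node] -= cpus
        job_id = next(self._ids)
        self.jobs[job_id] = (node, cpus, "RUNNING")
        return job_id

    def status(self, job_id):
        """Monitoring: report the recorded state of a job."""
        return self.jobs[job_id][2]

    def cancel(self, job_id):
        """Cancel a job and release its CPUs if it was still running."""
        node, cpus, state = self.jobs[job_id]
        if state == "RUNNING":
            self.free_cpus[node] += cpus
        self.jobs[job_id] = (node, cpus, "CANCELLED")
```

Picking the least-loaded node at submission time is the simplest load-balancing rule; a fuller meta-scheduler would refresh the resource figures before each decision.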

  15. Other Projects
     • GridGene Project: a high-throughput, grid-enabled version of two different gene-finding applications, GRAIL and GeneWise
     • Project with GIS: parallelization of mass spectrometry code for analysis of proteomics data

  16. Thank You!
