
Case Studies of Using Condor for Scientists Barcelona, 2006



Presentation Transcript


  1. Case Studies of Using Condor for Scientists Barcelona, 2006

  2. Agenda • Extended user’s tutorial • Advanced uses of Condor: Java programs, DAGMan, Stork, MW, Grid computing • Case studies, and a discussion of your application’s needs

  3. BLAST

  4. Background • Each species has a genetic encoding within its cells • Humans are made of approximately 10^14 cells

  5. Background • The nucleus of each human cell contains 46 chromosomes • Each chromosome contains between 231 and 2958 genes • Each chromosome is made of approximately 25 million to 237 million base pairs

  6. Base Pairs (Simplified) • Each base pair is one of 4 nucleotides • Each nucleotide is represented by one letter: ACGT

  7. The Science Issue • Scientists ask many questions and pose computationally difficult issues: • map a species’ genome: build a huge database of information • understand evolution at a genetic level: answer homology and related questions • identify mutations and genes: develop diagnoses and medical treatments

  8. BLAST • Basic Local Alignment Search Tool • A really good pattern matching program • Answering the science questions often requires queries such as: Does the following nucleotide sequence (~1000 base pairs), or something close to it, appear in the database (several billion base pairs)? With what certainty is there a match?

  9. The Biological Magnetic Resonance Data Bank • Department of Biochemistry at University of Wisconsin-Madison • Part of the Center for Eukaryotic Structural Genomics (CESG) • Working on three dimensional protein structure

  10. The BMRB and BLAST • The BMRB (with the help of the Condor Team) has a weekly set of automated BLAST runs • These BLAST runs compare progress on the BMRB set of working proteins to the Protein Data Bank

  11. Serial versus Parallel • Too slow: The BMRB working set could be input as a single BLAST program execution • Load the Protein Data Bank database • Serially query the database with each protein in the working set • Faster: Divide the working set into pieces that allow parallel executions of BLAST
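The "divide the working set into pieces" step can be sketched in a few lines of Python. This is only an illustration: the function name `split_working_set` and the protein identifiers are hypothetical, and the slides do not say how the BMRB runs actually choose their split.

```python
def split_working_set(items, pieces):
    """Split items into `pieces` contiguous chunks whose sizes differ by at most one."""
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    base, extra = divmod(len(items), pieces)
    chunks = []
    start = 0
    for i in range(pieces):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical protein identifiers, used only for illustration.
working_set = [f"protein{i:03d}" for i in range(10)]
chunks = split_working_set(working_set, 3)
for number, chunk in enumerate(chunks):
    print(number, chunk)
```

Each chunk could then be handed to its own BLAST job; an even split is the simplest choice, not necessarily the one that minimizes parallel execution time.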

  12. Weekly BMRB Runs • Obtain and install the BLAST executable and Protein Data Bank database • Decide on the best way to split the BMRB working set of proteins to minimize the parallel execution time • Make a custom DAG for this split • Produce a report on the BMRB run

  13. The Custom DAG (figure: a DAG whose nodes are labeled B, E, and C) • B is BLAST • E is Extract results

  14. An Economics Application • Computations are done at points on a coordinate plane • Initial values are known along the axes • Computing one point at a time (serial execution) is too slow • Each point depends on 2 neighboring points: (x,y) can be computed once (x-1,y) and (x,y-1) are known
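The dependency rule above implies that all points with the same x + y are independent of one another, so they can run at the same time. A minimal Python sketch of that grouping follows; it assumes the points to compute are (x, y) with 1 <= x, y <= n and that the known axis values sit at x = 0 and y = 0. The function name `wavefronts` is illustrative.

```python
def wavefronts(n):
    """Group the points (x, y), 1 <= x, y <= n, into waves that can run in parallel.

    Point (x, y) needs (x-1, y) and (x, y-1), and both of those have
    coordinate sum x + y - 1, so every point in one wave depends only
    on points from the previous wave (or on known axis values).
    """
    waves = []
    for total in range(2, 2 * n + 1):
        wave = [(x, total - x) for x in range(1, n + 1) if 1 <= total - x <= n]
        waves.append(wave)
    return waves


waves = wavefronts(6)
for step, wave in enumerate(waves, start=1):
    print(step, wave)
```

For a 6 x 6 grid this gives 11 waves covering 36 points, compared with 36 steps for a purely serial execution.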

  15-20. The Coordinate Plane (figure, six frames: a grid with both axes numbered 1 to 6; the legend marks points as "known result" or "inputs ready")

  21. The DAG (figure: nodes arranged by level) 1-1; then 1-2 and 2-1; then 1-3, 2-2, and 3-1; then 1-4, 2-3, 3-2, and 4-1; etc.

  22. Use DAGMan • Write a program to generate the DAG input file • The submit description file (and the executable) is the same for each node in the DAG

  23. DAG Input File

      Job 1-1 gonkulate.submit
      Job 1-2 gonkulate.submit
      Parent 1-1 Child 1-2
      Job 2-1 gonkulate.submit
      Parent 1-1 Child 2-1
      Job 1-3 gonkulate.submit
      Parent 1-2 Child 1-3
      Job 2-2 gonkulate.submit
      Parent 1-2 2-1 Child 2-2
      Vars 2-2 left="file1-2"
      Vars 2-2 below="file2-1"
      Vars 2-2 result="file2-2"
      . . .

      DAG input file, continued

      Job 3-4 gonkulate.submit
      Parent 2-4 3-3 Child 3-4
      Vars 3-4 left="file2-4"
      Vars 3-4 below="file3-3"
      Vars 3-4 result="file3-4"
      . . .
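A program that generates a DAG input file of this shape might look like the Python sketch below. It is not the generator used for the original application. The slide elides the Vars lines for nodes on the border (x = 1 or y = 1), so the axis file names used here (for example `file0-2`) are an assumption that simply extends the `fileX-Y` pattern. The sketch also writes all Job lines before the Parent lines, which is a layout choice rather than something the slide shows.

```python
def make_dag(n, submit_file="gonkulate.submit"):
    """Return the text of a DAG input file for an n x n grid of nodes named x-y."""
    jobs, edges, variables = [], [], []
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            node = f"{x}-{y}"
            jobs.append(f"Job {node} {submit_file}")
            # (x, y) depends on (x-1, y) and (x, y-1) when those are grid nodes.
            parents = []
            if x > 1:
                parents.append(f"{x - 1}-{y}")
            if y > 1:
                parents.append(f"{x}-{y - 1}")
            if parents:
                edges.append(f"Parent {' '.join(parents)} Child {node}")
            # ASSUMPTION: border nodes read known axis values from files named
            # like file0-y and filex-0; the original slide does not show this.
            variables.append(f'Vars {node} left="file{x - 1}-{y}"')
            variables.append(f'Vars {node} below="file{x}-{y - 1}"')
            variables.append(f'Vars {node} result="file{x}-{y}"')
    return "\n".join(jobs + edges + variables) + "\n"


dag_text = make_dag(4)
print(dag_text)
```

Writing `dag_text` to a file and passing that file to `condor_submit_dag` would be the next step.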

  24. Submit Description File In gonkulate.submit:

      universe = vanilla
      executable = gonkulate
      output = $(result)
      should_transfer_files = YES
      when_to_transfer_output = ON_EXIT
      transfer_input_files = $(left), $(below)
      log = gonkulate.log
      notification = Never
      queue

  25. Nug30

  26. Description of Nug30 • nug30 (a Quadratic Assignment Problem instance of size 30) had been the “holy grail” of computational QAP research since 1968 • In 2000, Anstreicher, Brixius, Goux, & Linderoth set out to solve this problem • Even with a mathematically sophisticated and well-engineered algorithm, they estimated that solving the problem would require 11 CPU years.

  27. Nugent’s Problem • There is a set of N locations and a set of N facilities, and each facility must be assigned a location. To measure the cost of each possible assignment, the flow between each pair of facilities is multiplied by the distance between the pair's assigned locations, and then a sum is taken over all of the pairs. • For Nug30, N = 30

  28. QAP Definition* The formal definition of the quadratic assignment problem is: Given two sets, P ("facilities") and L ("locations"), of equal size, together with a weight function $w : P \times P \to \mathbb{R}$ and a distance function $d : L \times L \to \mathbb{R}$, find the bijection $f : P \to L$ (the "assignment") such that the cost function $\sum_{a,b \in P} w(a,b) \cdot d(f(a), f(b))$ is minimized. Usually the weight and distance functions are viewed as square real-valued matrices. *Wikipedia
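To make the cost function concrete, here is a small Python sketch that evaluates it for one assignment and then finds the minimum by trying every permutation. The 3 x 3 weight and distance matrices are made up for illustration, and exhaustive search is only workable for tiny instances.

```python
from itertools import permutations


def qap_cost(weight, distance, assignment):
    """Sum of w(a, b) * d(f(a), f(b)) over all ordered pairs, where f = assignment."""
    n = len(assignment)
    return sum(
        weight[a][b] * distance[assignment[a]][assignment[b]]
        for a in range(n)
        for b in range(n)
    )


# Made-up symmetric instance: 3 facilities, 3 locations.
weight = [[0, 3, 1],
          [3, 0, 2],
          [1, 2, 0]]
distance = [[0, 1, 4],
            [1, 0, 2],
            [4, 2, 0]]

identity_cost = qap_cost(weight, distance, [0, 1, 2])
best_assignment = min(permutations(range(3)),
                      key=lambda f: qap_cost(weight, distance, f))
best_cost = qap_cost(weight, distance, best_assignment)
print(identity_cost, best_assignment, best_cost)
```

Here `assignment[a]` is the location given to facility `a`, which plays the role of f(a) in the definition.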

  29. Scope of the Problem • This QAP problem is difficult because of the extremely large number of possible facility assignments. • The number of possible assignments is factorial in the number of facilities. N! = N x (N-1) x (N-2) x . . . x 2 x 1 • 30! is approximately 2.65 x 10^32
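The size of 30! can be checked directly with Python's arbitrary-precision integers. The evaluation rate below (one billion assignments per second) is a hypothetical figure chosen only to give a sense of scale.

```python
import math

n_assignments = math.factorial(30)
print(n_assignments)
print(f"{n_assignments:.2e}")

# HYPOTHETICAL rate: one billion assignments evaluated per second.
rate_per_second = 10**9
seconds_per_year = 365 * 24 * 3600
years = n_assignments // (rate_per_second * seconds_per_year)
print(f"about {years:.1e} years of exhaustive enumeration at that rate")
```

This is why exhaustive enumeration is not an option and a method that avoids visiting most assignments is needed.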

  30. The Simplified Approach • Method of choice is branch and bound • The complete tree has 30! leaves • Branching grows the tree • Bounding results in pruning the tree
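A toy branch and bound for a small QAP instance can illustrate branching and bounding. This sketch is not the quadratic programming bound used for Nug30. It uses a much weaker bound: with nonnegative weights and distances, the cost of a partial assignment can never decrease as more facilities are placed, so any partial assignment that already costs at least as much as the best complete one can be pruned. The instance data is made up.

```python
def branch_and_bound(weight, distance):
    """Minimize the QAP cost by depth-first search with a simple pruning bound.

    Assumes all weights and distances are nonnegative, so the cost of a
    partial assignment is a lower bound on every completion of it.
    """
    n = len(weight)
    best = {"cost": float("inf"), "assignment": None, "nodes": 0}

    def extend(assignment, used, cost):
        best["nodes"] += 1
        if cost >= best["cost"]:
            return  # bounding: this subtree cannot beat the best known solution
        k = len(assignment)  # next facility to place
        if k == n:
            best["cost"], best["assignment"] = cost, list(assignment)
            return
        for loc in range(n):  # branching: try each free location for facility k
            if loc in used:
                continue
            added = weight[k][k] * distance[loc][loc]
            for a in range(k):
                added += weight[k][a] * distance[loc][assignment[a]]
                added += weight[a][k] * distance[assignment[a]][loc]
            assignment.append(loc)
            used.add(loc)
            extend(assignment, used, cost + added)
            assignment.pop()
            used.remove(loc)

    extend([], set(), 0)
    return best


# Made-up symmetric instance: 3 facilities, 3 locations.
weight = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
distance = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
result = branch_and_bound(weight, distance)
print(result)
```

A stronger bound prunes far more of the tree, which is what makes a better bounding technique so valuable on an instance of size 30.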

  31. The Nug30 Solution • Used a new algorithm called quadratic programming bound developed by Anstreicher and Brixius • Sequential execution would have taken 7 years, so parallelization of the algorithm was important • Used MW

  32. Nug30 Computational Grid • Used tricks to make it look like one Condor pool • Flocking • Glidein • 2510 CPUs total

  33. Workers Over Time

  34. Nug30 solved

  35. The Football Pool Problem

  36. Win By Gambling • Each week, 6 games are played • The outcome of each game is one of: win, lose, tie

  37. Bet, and win $$$ • Get 5 of the 6 games correctly predicted, and you win • What is the minimum number of predictions you must make to guarantee winning?

  38. Known Values (table: number of games versus minimum predictions; the values are not preserved in this transcript)

  39. Problem Description • A covering code • An NP-hard problem • Many years of research and effort for 6 games have led to 65 < minimum number of predictions < 73 • An integer programming problem • Best solver is the commercial application CPLEX
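The covering-code view can be illustrated on a smaller case than 6 games. The Python sketch below checks whether a set of predictions guarantees at most one wrong game for every possible outcome. The 4-game example is not taken from the slides: it uses the 9 codewords of the ternary Hamming code of length 4, a standard perfect code, and the encoding of outcomes as 0, 1, 2 is an arbitrary choice.

```python
from itertools import product


def covers(predictions, games, misses=1):
    """True if every possible outcome differs from some prediction in at most `misses` games."""
    for outcome in product(range(3), repeat=games):
        if not any(
            sum(p != o for p, o in zip(prediction, outcome)) <= misses
            for prediction in predictions
        ):
            return False
    return True


# Outcomes encoded as 0 = win, 1 = lose, 2 = tie (arbitrary encoding).
# The 9 codewords (a, b, a+b, a+2b) mod 3 form the ternary Hamming code of length 4.
tickets = [(a, b, (a + b) % 3, (a + 2 * b) % 3) for a in range(3) for b in range(3)]

print(len(tickets), covers(tickets, games=4))
print(covers(tickets[:8], games=4))
```

Checking a candidate set is easy; the hard part of the problem is proving that no smaller set exists, which is where the integer programming formulation comes in.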

  40. Why the Problem is Difficult • Number of tickets possible: 6! x 3^6 • The tree that represents the problem (and solutions) has many isomorphic branches. This makes it difficult to prune the tree. • New techniques have been developed that narrow the interval containing the solution • The latest and greatest approach solves many smaller problems using MW

  41. Solution! • Not yet. . . • The first effort (many CPU years’ worth of time) had a very small error in its input • The second effort is still in progress. • All this to improve the lower bound from 65 to 70, thereby reducing the range for the solution
