
    Presentation Transcript
    1. GPU-Accelerated Genetic Algorithms
       Rajvi Shah+, P J Narayanan+, Kishore Kothapalliˆ
       IIIT Hyderabad, Hyderabad, India
       + : Center for Visual Information Technology
       ˆ : Center for Security, Theory and Algorithmic Research

    2. GAs – an introduction
       • Genetic Algorithms
         • A class of evolutionary algorithms
         • Efficiently solves optimization tasks
         • Potential applications in many fields
       • Challenges
         • Large execution time
       International Institute of Information Technology, Hyderabad, India

    3. Typical flow of a GA
       User specifies:
       • A representation for the chromosome
       • GA parameters
       • Crossover operator
       • Mutation operator
       • Termination criteria
       • A method for fitness evaluation
       [Flowchart boxes: Create Initial Population, Select Parents, Create New Population, Evaluate Fitness, Terminate? (Yes / No), Exit]
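
The boxes of this flowchart can be sketched as a short serial loop. This is a minimal illustration in Python, not the presenters' CUDA code: the function name `run_ga`, its parameters, the fixed generation count used as the termination criterion, and the choice of uniform selection with one-point crossover are all assumptions made for the example.

```python
import random

def run_ga(fitness, length, pop_size=20, generations=50, p_mut=0.01, seed=0):
    """Minimal serial GA over binary chromosomes (illustrative only)."""
    rng = random.Random(seed)
    # Create initial population: pop_size random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # assumed termination: fixed generation count
        new_pop = []
        while len(new_pop) < pop_size:
            # Select parents (uniform selection, for brevity).
            p1, p2 = rng.choice(pop), rng.choice(pop)
            # One-point crossover at a random cut position.
            cut = rng.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Flip mutation, decided gene by gene.
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop
        # Evaluate fitness and remember the best chromosome seen so far.
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best
```

For example, `run_ga(sum, length=16)` searches for a 16-bit string with as many ones as possible.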

    4. Accelerating Genetic Algorithms
       • High degree of parallelism
         • Fitness evaluation
         • Crossover
         • Mutation
       • Most obvious: chromosome-level parallelism
         • Same operations on each chromosome
         • Use a thread per chromosome

    5. Gene-level Parallelism
       • Thread-per-chromosome model
         • Good enough for small to moderate-sized multi-core processors
         • Doesn't map well to massively multithreaded GPUs
       • Solution: identify and exploit gene-level parallelism

    6. CUDA

    7. Our Approach
       • A column of threads reads a chromosome gene by gene, and the threads cooperate to perform operations
       • This results in coalesced reads and faster processing
       [Figure labels: Population Matrix in Memory; Thread Blocks in a grid]

    8. Program Execution Flow
       • On CPU: Parse GA Parameters, Generate Random Numbers, Construct Initial Population
       • On GPU: Evaluation Kernel, Statistics Update Kernel, Selection Kernel, Crossover Kernel, Mutation Kernel
       • GPU Global Memory: Random Numbers, Old Population, New Population, Fitness Scores, Statistics

    9. Program Execution Flow
       [Same diagram as slide 8. Highlighted step: Evaluation Kernel; data labels: Population, Scores]

    10. Fitness Evaluation
        • Partially parallel method
          • The user specifies a serial code fragment for fitness evaluation.
          • Threads are arranged in a 1D grid.
          • Each thread executes the user's code on one chromosome, providing chromosome-level parallelism.
          • Benefit: abstraction
        • Fully parallel method
          • A CUDA-familiar user can effectively use the 2D thread layout.
          • Use gene-level parallelism for fitness evaluation.
          • Benefit: efficiency

    11. Example – 0/1 Knapsack
        • Task
          • Given: weights, costs and knapsack capacity
          • Aim: maximize the cost
        • Representation
          • 1D binary string
          • 0/1: absence/presence of an item
          • W and C are the total weight and cost of a given representation
        • Best solution: the one with maximum C given W < Wmax
        • Fully parallel method
          • Use a group of threads to compute the total cost and weight in logarithmic time
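
The logarithmic-time sum mentioned for the fully parallel method can be illustrated with a pairwise (tree) reduction. The sketch below is serial Python that mimics the access pattern a group of threads would follow; it is not the presenters' kernel. The function names are hypothetical, and scoring an over-capacity chromosome as 0 is an assumption, since the slide only states that the best solution maximizes C subject to W < Wmax.

```python
def tree_reduce_sum(values):
    """Sum a list in about log2(n) pairwise steps, as a thread group would."""
    vals = list(values)
    n = len(vals)
    stride = 1
    while stride < n:
        # On a GPU, each iteration of this inner loop would be one thread,
        # and all of them would run in the same step.
        for i in range(0, n - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0] if vals else 0

def knapsack_fitness(chromosome, weights, costs, w_max):
    """Total cost C of the selected items, or 0 if the weight limit is violated."""
    picked_w = [g * w for g, w in zip(chromosome, weights)]
    picked_c = [g * c for g, c in zip(chromosome, costs)]
    total_w = tree_reduce_sum(picked_w)
    total_c = tree_reduce_sum(picked_c)
    # Assumption: infeasible chromosomes (W >= Wmax) get fitness 0.
    return total_c if total_w < w_max else 0
```

With weights `[2, 3, 4, 5]`, costs `[3, 4, 5, 6]` and `w_max = 6`, the chromosome `[1, 1, 0, 0]` has W = 5 and C = 7, so it is feasible and scores 7.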

    12. Program Execution Flow
        [Same diagram as slide 8. Highlighted step: Statistics Update Kernel; data labels: Scores, Statistics]

    13. Statistics
        • Selection and termination most often use population statistics
        • We use the standard parallel reduce algorithm to calculate the max, min and average scores
        • We use the highly optimized public library CUDPP to sort and rank chromosomes
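
One way to picture the reduce step is to carry a (min, max, sum) triple per element and merge neighbouring triples level by level, so that all three statistics come out of a single reduction. This is a serial Python sketch under that assumption; the function name `reduce_stats` is hypothetical, and the slides do not say whether the three statistics are computed in one pass or in separate ones.

```python
def reduce_stats(scores):
    """Return (min, max, average) of the scores via a pairwise tree reduction."""
    if not scores:
        raise ValueError("scores must be non-empty")
    # Each leaf starts as the (min, max, sum) of a single score.
    level = [(s, s, s) for s in scores]
    while len(level) > 1:
        merged = []
        # Each pair merge is independent, so a GPU could do them in one step.
        for i in range(0, len(level) - 1, 2):
            a, b = level[i], level[i + 1]
            merged.append((min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2]))
        if len(level) % 2:
            merged.append(level[-1])  # odd element is carried to the next level
        level = merged
    lo, hi, total = level[0]
    return lo, hi, total / len(scores)
```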

    14. Program Execution Flow
        [Same diagram as slide 8. Highlighted step: Selection Kernel; data labels: Statistics, Parents]

    15. Selection
        • Selection kernel
          • Uses N/2 threads
          • Each thread selects two parents for producing offspring
        • Uniform selection: selects parents in a uniform random manner
        • Roulette wheel selection: a fitness-based approach; the higher the fitness, the better the chance of selection

    16. Selection
        • Roulette wheel
          • Sort the fitness scores.
          • Compute a roulette wheel array by doing a prefix-sum scan of the scores and normalizing it.
          • Generate a random number in the range 0-1.
          • Perform a binary search in the roulette wheel array for the nearest number smaller than the random number.
          • Return the index of the result in the array.
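
The roulette wheel steps can be sketched with a prefix sum and a binary search. This Python version is an illustration, not the presenters' kernel: it assumes an exclusive prefix sum (so that "nearest smaller number" maps directly to the selected index), assumes non-negative scores, and skips the sort step because the search gives the same selection probabilities for any ordering of the scores. The function names are hypothetical.

```python
import bisect
import itertools
import random

def build_wheel(scores):
    """Normalized exclusive prefix sum: wheel[i] = sum(scores[:i]) / total."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("need at least one positive score")
    cumulative = list(itertools.accumulate(scores, initial=0))
    return [c / total for c in cumulative[:-1]]

def spin(wheel, r):
    """Index of the last wheel entry that is <= r, for r in [0, 1)."""
    return bisect.bisect_right(wheel, r) - 1

def select_parents(scores, n_pairs, rng):
    """Pick n_pairs parent index pairs; each pair stands for one selection thread."""
    wheel = build_wheel(scores)
    return [(spin(wheel, rng.random()), spin(wheel, rng.random()))
            for _ in range(n_pairs)]
```

For scores `[1.0, 3.0, 0.0, 4.0]` the wheel is `[0.0, 0.125, 0.5, 0.5]`, so index 3 is chosen for any random number in [0.5, 1) and the zero-score chromosome at index 2 is never chosen.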

    17. Program Execution Flow
        [Same diagram as slide 8. Highlighted step: Crossover Kernel; data labels: Old Population, New Population]

    18. Crossover
        [Figure: the Crossover step takes Parent1 and Parent2 from the population in GPU global memory; threads are indexed by thread idy and by thread idx 1-L. The gene values shown in the figure are not recoverable from this transcript.]
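
A one-point crossover (the operator named on the Results slide) can be written so that every gene of every child is computed independently, which is the property a thread-per-gene layout relies on. The Python sketch below is an illustration only: the function names, the precomputed `cuts` list, and the choice of producing two children per parent pair are assumptions, not details taken from the slides.

```python
def crossover_gene(parent1, parent2, cut, col):
    """Gene `col` of a child: taken from parent1 before the cut, parent2 after."""
    return parent1[col] if col < cut else parent2[col]

def crossover(old_pop, pairs, cuts):
    """Build the new population from parent index pairs and cut positions."""
    length = len(old_pop[0])
    new_pop = []
    for (i, j), cut in zip(pairs, cuts):
        p1, p2 = old_pop[i], old_pop[j]
        # Every (child, col) entry is independent of the others, so on a GPU
        # each one could be handled by its own thread.
        new_pop.append([crossover_gene(p1, p2, cut, c) for c in range(length)])
        # Assumption: the mirrored child is also kept, giving two per pair.
        new_pop.append([crossover_gene(p2, p1, cut, c) for c in range(length)])
    return new_pop
```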

    19. Program Execution Flow
        [Same diagram as slide 8. Highlighted step: Mutation Kernel; data labels: New Population, New Population]

    20. Mutation
        • Flip mutator: each thread handles one gene and mutates it with the probability of mutation
        [Figure: population grid indexed by thread id x and thread id y; thread (1,4) is called out. Legend: Gene, Flip Coin, Coin State]

    21. Mutation
        • Flip mutator: each thread handles one gene and mutates it with the probability of mutation
        [Figure: population grid indexed by thread id x and thread id y, with a coin state (T or F) shown for each gene; thread (1,4) is called out. Legend: Gene, Flip Coin, Coin State]
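
The flip mutator can be sketched as an element-wise operation over the population matrix and a matching matrix of random numbers, one "coin" per gene. This is a serial Python illustration, not the presenters' kernel; the function name and the convention that a coin value below `p_mut` means "flip" are assumptions.

```python
def flip_mutate(population, coins, p_mut):
    """Flip each binary gene whose coin value falls below p_mut."""
    return [
        # Each (gene, coin) pair is independent: one thread per gene on a GPU.
        [gene ^ 1 if coin < p_mut else gene for gene, coin in zip(row, coin_row)]
        for row, coin_row in zip(population, coins)
    ]
```

With `p_mut = 0.01`, the population `[[0, 1, 0], [1, 1, 0]]` and coins `[[0.5, 0.001, 0.9], [0.0, 0.3, 0.009]]` give `[[0, 0, 0], [0, 1, 1]]`: only the three genes whose coins are below 0.01 change.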

    22. Program Execution Flow
        [Same diagram as slide 8. Highlighted step: Generate Random Numbers; data label: Random Numbers]

    23. Random Number Generation
        • Extensive use of random numbers
        • No primitive for on-the-fly generation of a single random number
        • Solution: generate a pool of random numbers and copy it to the GPU
        • We use a CUDPP routine to generate a large pool of random numbers on the GPU (faster)
        • If better-quality random numbers are needed, this can be replaced by a CPU-based routine
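
The pool idea can be illustrated with a small class that fills a buffer once and then hands out slices of it. This is a hedged Python sketch of the general pattern only: the class name `RandomPool`, the `take` method, and the wrap-around reuse of numbers when the pool is exhausted are all assumptions, and the slides do not describe how the pool is indexed or refilled.

```python
import random

class RandomPool:
    """Pre-generated pool of uniform random numbers in [0, 1)."""

    def __init__(self, size, seed=0):
        rng = random.Random(seed)
        self._pool = [rng.random() for _ in range(size)]
        self._cursor = 0

    def take(self, count):
        """Return `count` numbers, wrapping around when the pool runs out."""
        n = len(self._pool)
        out = [self._pool[(self._cursor + k) % n] for k in range(count)]
        # Assumption: numbers are reused after a full pass over the pool.
        self._cursor = (self._cursor + count) % n
        return out
```

A kernel-like consumer would call `take` once per generation and let each thread read the entry at its own index.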

    24. Results
        • Test device: a quarter of an Nvidia Tesla S1030 GPU
        • Test problem: solve a 0/1 knapsack problem
        • Test parameters
          • Representation: a 1D binary string
          • Crossover: one-point crossover
          • Mutation: flip mutation
          • Selection: uniform and roulette wheel

    25. Results
        [Plots: average run-time for 100 iterations (uniform selection); average run-time for 100 iterations (roulette wheel selection); growth in run-time with increasing N×L]
        N: population size, L: chromosome length

    26. Scope
        • Our approach is modeled after GAlib and maintains structures for GA, Genome and Statistics
        • It is built with enough abstraction from the user program that the user does not need to know the CUDA architecture or CUDA programming
        • This can be extended to build a GPU-accelerated GA library

    27. Thank You
        rajvi.shah@research.iiit.ac.in
        pjn@iiit.ac.in
        kkishore@iiit.ac.in