
Circuit Placement on Multicore CPUs

May 10-02. Mike Drob, Grant Furgiuele, Ben Winters. Advisor: Dr. Chris Chu. Client: IBM. IBM Contact: Karl Erickson.


Presentation Transcript


  1. Circuit Placement on Multicore CPUs May 10-02 Mike Drob Grant Furgiuele Ben Winters Advisor: Dr. Chris Chu Client: IBM IBM Contact – Karl Erickson

  2. Project Overview • The circuit placement problem is a bottleneck of physical design • The current implementation is single-core only, with no threads • Will attempt to parallelize some functions of the FastPlace algorithm using the Linux pthreads library • Integrate the RQL idea (IBM) into FastPlace

  3. Project Plan • Start with the existing serial FastPlace algorithm • Parallelize the FastPlace algorithm to decrease run-time • Aim for a speedup as close to N times (N = number of cores) as possible • Realistically, expect 0.75N or 0.5N • End goal is mostly proof-of-concept • IBM uses an in-house algorithm • Contains proprietary circuit processing
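
The speedup targets on this slide can be made concrete with a short calculation. The sketch below uses hypothetical function names (they are not from FastPlace) and the usual definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of cores, so the 0.75N target corresponds to an efficiency of 0.75.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run is than the serial run.
 * Both times must be positive and use the same unit (e.g., seconds). */
double speedup(double serial_time, double parallel_time)
{
    assert(serial_time > 0.0 && parallel_time > 0.0);
    return serial_time / parallel_time;
}

/* Parallel efficiency: the fraction of the ideal N-times speedup achieved. */
double efficiency(double serial_time, double parallel_time, int cores)
{
    assert(cores > 0);
    return speedup(serial_time, parallel_time) / (double)cores;
}
```

As an illustration with made-up numbers, a run that drops from 120 s to 40 s on 4 cores has a speedup of 3 and an efficiency of 0.75.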

  4. Project Design • Written in C • Run under Linux using POSIX thread library • Consider scalability – 2, 4, 8, etc. cores • RQL implementation • IBM Concept • Netlist optimization for placement

  5. Implementation – Overall • Using data parallelism as the scheme • Assigning loop iterations to threads • Localizing variable usage • Where absolutely necessary, using thread synchronization (mutex, etc.) • To maximize speed improvement with threads, minimize the total number of tasks for threads to accomplish • Have individual threads do as much as possible
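
A minimal sketch of the data-parallel scheme described on this slide, assuming a simple array sum as a stand-in for a FastPlace loop (the names `slice_t`, `sum_slice`, and `parallel_sum` are hypothetical). Each thread gets a contiguous range of loop iterations, accumulates into a local variable, and takes the mutex only once to publish its result.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Per-thread work description: a contiguous slice of the input array. */
typedef struct {
    const double *data;
    size_t begin;            /* first index handled by this thread */
    size_t end;              /* one past the last index */
    double *total;           /* shared result, protected by lock */
    pthread_mutex_t *lock;
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double local = 0.0;      /* thread-local accumulator: no locking in the loop */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->data[i];

    pthread_mutex_lock(s->lock);     /* synchronize only once per thread */
    *s->total += local;
    pthread_mutex_unlock(s->lock);
    return NULL;
}

/* Sum n values using nthreads threads and return the total. */
double parallel_sum(const double *data, size_t n, int nthreads)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    pthread_mutex_t lock;
    double total = 0.0;
    int started = 0;

    pthread_mutex_init(&lock, NULL);
    for (int t = 0; t < nthreads; t++) {
        slice[t].data = data;
        slice[t].begin = n * (size_t)t / (size_t)nthreads;
        slice[t].end = n * (size_t)(t + 1) / (size_t)nthreads;
        slice[t].total = &total;
        slice[t].lock = &lock;
        if (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) != 0)
            break;                   /* stop launching on failure */
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    /* If a thread could not be created, finish the remaining slices serially. */
    for (int t = started; t < nthreads; t++)
        sum_slice(&slice[t]);
    pthread_mutex_destroy(&lock);
    return total;
}
```

Keeping the accumulator local means the threads contend for the mutex once each rather than once per iteration, which is the point of "localizing variable usage".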

  6. Implementation – Thread Pool • Threads are created once at start • Various Benefits: • Minimizes overhead from thread creation • Increases cache performance • Allows core scalability – number of threads running can equal cores available
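
The thread-pool idea can be sketched as follows. This is an illustrative pool written for this transcript, not the project's actual code: all names (`pool_t`, `pool_init`, `pool_submit`, `pool_wait`, `pool_destroy`, `square_task`) and the fixed queue capacity are assumptions. Workers are created once in `pool_init` and then reused for every submitted task.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_MAX_THREADS 16
#define POOL_QUEUE_CAP   256

typedef void (*task_fn)(void *);

typedef struct {
    pthread_t threads[POOL_MAX_THREADS];
    int nthreads;
    struct { task_fn fn; void *arg; } queue[POOL_QUEUE_CAP];
    int head, count;           /* ring-buffer state */
    int pending;               /* tasks queued or currently running */
    bool stop;
    pthread_mutex_t lock;
    pthread_cond_t has_work;   /* signaled when a task is queued or on shutdown */
    pthread_cond_t all_done;   /* signaled when pending drops to zero */
} pool_t;

static void *pool_worker(void *arg)
{
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->count == 0 && !p->stop)
            pthread_cond_wait(&p->has_work, &p->lock);
        if (p->count == 0) {               /* stop requested and queue drained */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        task_fn fn = p->queue[p->head].fn;
        void *a = p->queue[p->head].arg;
        p->head = (p->head + 1) % POOL_QUEUE_CAP;
        p->count--;
        pthread_mutex_unlock(&p->lock);

        fn(a);                             /* run the task outside the lock */

        pthread_mutex_lock(&p->lock);
        if (--p->pending == 0)
            pthread_cond_broadcast(&p->all_done);
        pthread_mutex_unlock(&p->lock);
    }
}

/* Create the worker threads once; returns 0 on success, -1 on failure. */
int pool_init(pool_t *p, int nthreads)
{
    if (nthreads < 1 || nthreads > POOL_MAX_THREADS) return -1;
    p->nthreads = 0; p->head = 0; p->count = 0; p->pending = 0; p->stop = false;
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->has_work, NULL);
    pthread_cond_init(&p->all_done, NULL);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&p->threads[i], NULL, pool_worker, p) != 0) return -1;
        p->nthreads++;
    }
    return 0;
}

/* Queue one task; returns -1 if the queue is full. */
int pool_submit(pool_t *p, task_fn fn, void *arg)
{
    assert(fn != NULL);
    pthread_mutex_lock(&p->lock);
    if (p->count == POOL_QUEUE_CAP) { pthread_mutex_unlock(&p->lock); return -1; }
    int tail = (p->head + p->count) % POOL_QUEUE_CAP;
    p->queue[tail].fn = fn;
    p->queue[tail].arg = arg;
    p->count++;
    p->pending++;
    pthread_cond_signal(&p->has_work);
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Block until every submitted task has finished. */
void pool_wait(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->pending > 0)
        pthread_cond_wait(&p->all_done, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

/* Stop the workers (after the queue drains) and release resources. */
void pool_destroy(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    p->stop = true;
    pthread_cond_broadcast(&p->has_work);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);
    pthread_mutex_destroy(&p->lock);
    pthread_cond_destroy(&p->has_work);
    pthread_cond_destroy(&p->all_done);
}

/* Example task: squares the int it is given. Each task touches only its own slot. */
void square_task(void *arg)
{
    int *x = arg;
    *x = *x * *x;
}
```

A caller would typically call `pool_init` once with the number of available cores, submit one task per data segment in each parallel phase, call `pool_wait` as the barrier between phases, and call `pool_destroy` at exit.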

  7. Implementation - RQL • Force-vector modulation • Forces acting upon cells • Forces are modeled as a spring potential energy problem • The native force in the algorithm tries to reduce wire length by bringing connected cells closer to each other • The spreading force tries to move cells into sparse areas within the placement region • A balance of the two is needed to meet placement and wire length objectives • Modulate the spreading forces • A high spreading force means the connection belongs to a fan-out net or boundary • Therefore, cells with connections in the top 5 percent of spreading forces are skipped in quadratic placement • Skipping these cells leaves their other connections minimized rather than degraded • Results in placing cells in their overall optimal location

  8. Implementation - RQL • During quadratic placement (global placement process) • Calculate magnitude of spreading forces for all cells in each iteration • Calculate force on current cell • If current cell’s force is above the 5% threshold, skip its placement
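
The steps above can be sketched as a threshold computation followed by a marking pass. This is an illustration only: the slide does not say how the 5% threshold is derived, so the sort-based approach and the names `spreading_threshold` and `mark_skipped` are assumptions, and the force magnitudes are taken as already computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Ascending comparison for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Return the largest magnitude that is NOT in the top `top_percent` percent
 * of cells, so that cells strictly above it form the skipped group.
 * Returns -1.0 if memory cannot be allocated. */
double spreading_threshold(const double *mag, size_t n, unsigned top_percent)
{
    assert(n > 0 && top_percent < 100);
    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1.0;
    memcpy(sorted, mag, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    size_t k = n * top_percent / 100;      /* number of cells in the top group */
    double threshold = sorted[n - 1 - k];
    free(sorted);
    return threshold;
}

/* Set skip[i] = 1 for every cell whose force exceeds the threshold.
 * Returns the number of skipped cells. */
size_t mark_skipped(const double *mag, size_t n, double threshold,
                    unsigned char *skip)
{
    size_t skipped = 0;
    for (size_t i = 0; i < n; i++) {
        skip[i] = (unsigned char)(mag[i] > threshold);
        skipped += skip[i];
    }
    return skipped;
}
```

In each quadratic-placement iteration, a caller would recompute the magnitudes, call `spreading_threshold(mag, n, 5)`, and leave the cells flagged by `mark_skipped` at their current positions.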

  9. Implementation - Functions • Move_8pt family • move_8pt, move_8pt_withMap, move_8pt_mixedMode, move_8pt_mixedMode_withMap, move_8pt_clustering, move_8pt_clustering_withMap • Calculates a score based on cell coordinates and bin utilization • Doesn’t lend itself well to parallelization • The fix? • If a new cell is within the 3x3 box around the cell currently being processed, the new cell is skipped • Helps remove significant wirelength degradation
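
A minimal sketch of the 3x3 exclusion test, assuming the box is measured in placement bins and centered on the bin of the cell currently being processed. The slide does not define the box units, and `bin_pos_t`, `within_3x3`, and `filter_candidates` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bin coordinates of a cell in the placement grid. */
typedef struct { int col; int row; } bin_pos_t;

/* True if `other` lies in the 3x3 block of bins centered on `current`. */
bool within_3x3(bin_pos_t current, bin_pos_t other)
{
    return abs(other.col - current.col) <= 1 &&
           abs(other.row - current.row) <= 1;
}

/* Copy the candidates that fall outside the 3x3 box into `kept`.
 * Returns the number of candidates kept. */
size_t filter_candidates(bin_pos_t current, const bin_pos_t *cand, size_t n,
                         bin_pos_t *kept)
{
    assert(n == 0 || (cand != NULL && kept != NULL));
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (!within_3x3(current, cand[i]))
            kept[m++] = cand[i];
    return m;
}
```

Skipping nearby cells keeps two threads from scoring and moving cells whose bin utilization overlaps, at the cost of leaving some moves for a later pass.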

  10. Implementation - Functions • Swap_move family • swap_move_FM, vswap_move, local_order3_FM, flipAllCells • Row-based data processing • Break up matrix into segments based on number of threads • Assign each thread to do X rows
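
The row-based split can be sketched as below. The even-split policy, where leftover rows go to the lowest-numbered threads, is an assumption, as are the names `row_range_t` and `rows_for_thread`.

```c
#include <assert.h>

/* Half-open range of rows [first, last) assigned to one thread. */
typedef struct { int first; int last; } row_range_t;

/* Split num_rows rows across num_threads threads as evenly as possible.
 * The first (num_rows % num_threads) threads each get one extra row. */
row_range_t rows_for_thread(int num_rows, int num_threads, int thread_id)
{
    assert(num_rows >= 0 && num_threads > 0);
    assert(thread_id >= 0 && thread_id < num_threads);

    int base = num_rows / num_threads;
    int extra = num_rows % num_threads;
    row_range_t r;
    r.first = thread_id * base + (thread_id < extra ? thread_id : extra);
    r.last = r.first + base + (thread_id < extra ? 1 : 0);
    return r;
}
```

With 10 rows and 4 threads, for example, the ranges are [0, 3), [3, 6), [6, 8), and [8, 10), so no thread has more than one row beyond any other.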

  11. Testing • Profiled original FastPlace algorithm • gprof gives CPU time per function • Profiling parallel FastPlace • Valgrind • FastPlace code outputs actual time elapsed • Can be used to compare performance • Not 100% consistent
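
Elapsed (wall-clock) time is the relevant measure for a threaded run, because on Linux the processor time returned by `clock()` is accumulated across all threads of the process. The sketch below shows one way to read both using standard POSIX and C calls; the helper names `wall_seconds` and `cpu_seconds` are hypothetical and are not taken from the FastPlace source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* expose clock_gettime in strict -std modes */
#endif
#include <assert.h>
#include <time.h>

/* Wall-clock seconds from a monotonic clock (unaffected by clock adjustments). */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;                      /* silence unused warning when NDEBUG is set */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Processor time consumed by the whole process so far, in seconds. */
double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}
```

Taking `wall_seconds()` before and after a placement phase and subtracting gives the elapsed time to compare between the serial and threaded versions. Averaging several runs helps with the run-to-run variation.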

  12. Testing & Results • Test results for correctness • Compare “wire length” results • Average total wirelength no worse than 1% greater • Threadpool is tested and working • Test results for speedup • Compared actual run-time • See slides on next page
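
The correctness criterion on this slide (average total wirelength no worse than 1% greater) can be expressed as a small check. The sketch below assumes one total wirelength per benchmark from the serial and the threaded run; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Percent change of `measured` relative to `baseline` (positive = longer). */
double percent_change(double baseline, double measured)
{
    assert(baseline > 0.0);
    return 100.0 * (measured - baseline) / baseline;
}

/* Average percent change over n benchmarks. */
double average_percent_change(const double *baseline, const double *measured,
                              size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += percent_change(baseline[i], measured[i]);
    return sum / (double)n;
}

/* True if the average wirelength increase is within max_percent. */
bool wirelength_acceptable(const double *baseline, const double *measured,
                           size_t n, double max_percent)
{
    return average_percent_change(baseline, measured, n) <= max_percent;
}
```

Calling `wirelength_acceptable(serial, threaded, n, 1.0)` applies the 1% criterion. A negative average means the threaded run produced shorter wirelength overall.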

  13. Test Results – RQL Implementation • Wire length results • Wire length decreased by 0.12% to 2.08% on ISPD98 benchmarks, 0.98% on average • Wire length decreased by 0.11% to 3.18% on ISPD2005 benchmarks, 1.39% on average • Run-time results • Some run-time slowdown • Run time increased by 3.36% on average on ISPD98 • Run time increased by 4.02% on average on ISPD2005

  14. Test Results – Global Placement

  15. Test Results – Detailed Placement

  16. Project Impact • Shows that threads can be used to speed up the placement process • With the availability of multi-core CPUs and the scalability of the thread implementation, speed improvement could continue • Reduces a bottleneck in the development process

  17. Questions?
