Circuit Placement on Multicore CPUs

Circuit Placement on Multicore CPUs • May 10-02 • Mike Drob, Grant Furgiuele, Ben Winters • Advisor: Dr. Chris Chu • Client: IBM • IBM Contact: Karl Erickson

Project Overview • The circuit placement problem is a bottleneck of physical design • FastPlace is currently single-core only, with no threads • Will attempt to parallelize some functions of the FastPlace algorithm using the Linux pthreads library • Implement the RQL idea (IBM) in FastPlace

Project Plan • Start with existing serial FastPlace algorithm • Parallelize FastPlace algorithm to decrease run-time • Aim for a speedup as close to N times (N = number of cores) as possible • Realistically, expect 0.75N or 0.5N • End goal is mostly proof of concept • IBM uses in-house algorithm • Contains proprietary circuit processing

Project Design • Written in C • Run under Linux using POSIX thread library • Consider scalability – 2, 4, 8, etc. cores • RQL implementation • IBM Concept • Netlist optimization for placement

Implementation – Overall • Using data parallelism as the scheme • Assigning loop iterations to threads • Localizing variable usage • Where absolutely necessary, using thread synchronization (mutexes, etc.) • To maximize speed improvement with threads, minimize the total number of tasks for threads to accomplish • Have individual threads do as much as possible
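The scheme on this slide (loop iterations assigned to threads, variables kept local, synchronization only where needed) can be sketched as follows. This is a minimal illustration, not FastPlace code: `parallel_sum`, `slice_t`, and `MAX_THREADS` are hypothetical names, and the loop body is a plain summation standing in for real placement work. For brevity it creates threads per call rather than reusing a pool.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* illustrative upper bound, not from the slides */

/* Hypothetical work description: one contiguous slice of the loop per thread. */
typedef struct {
    const double *values;  /* shared, read-only input */
    size_t begin;          /* first index owned by this thread */
    size_t end;            /* one past the last index owned by this thread */
    double partial;        /* result slot written only by the owning thread */
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double local = 0.0;    /* localized variable: nothing shared, so no mutex */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->values[i];
    s->partial = local;
    return NULL;
}

/* Sums n values with nthreads threads and returns the total. */
double parallel_sum(const double *values, size_t n, int nthreads)
{
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];

    for (int t = 0; t < nthreads; t++) {
        slice[t].values = values;
        slice[t].begin = n * (size_t)t / (size_t)nthreads;
        slice[t].end = n * (size_t)(t + 1) / (size_t)nthreads;
        slice[t].partial = 0.0;
        int rc = pthread_create(&tid[t], NULL, sum_slice, &slice[t]);
        assert(rc == 0);
        (void)rc;
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);   /* join is the only synchronization point */
        total += slice[t].partial;
    }
    return total;
}
```

Each thread accumulates into its own `local` variable and publishes one result, so the only synchronization is the final join.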

Implementation – Thread Pool • Threads are created once at start • Various Benefits: • Minimizes overhead from thread creation • Increases cache performance • Allows core scalability – number of threads running can equal cores available
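A thread pool of the kind described here might look like the sketch below. The slides do not show the project's pool, so every name (`pool_t`, `pool_init`, `pool_run`, `pool_destroy`, `count_job`) is an assumption made for illustration. Workers are created once, sleep on a condition variable, and each run the posted job with their own index.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* A job receives its worker index and the worker count so it can pick a slice. */
typedef void (*pool_fn)(int index, int nthreads, void *arg);

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t work_ready;  /* signalled when a job is posted or on shutdown */
    pthread_cond_t work_done;   /* signalled when the last worker finishes */
    pthread_t *threads;
    int nthreads;
    int next_index;             /* hands each worker a unique index */
    unsigned long generation;   /* incremented once per posted job */
    int remaining;              /* workers still running the current job */
    int shutdown;
    pool_fn fn;
    void *arg;
} pool_t;

static void *worker_main(void *p)
{
    pool_t *pool = p;
    unsigned long seen = 0;
    pthread_mutex_lock(&pool->lock);
    int index = pool->next_index++;
    for (;;) {
        while (!pool->shutdown && pool->generation == seen)
            pthread_cond_wait(&pool->work_ready, &pool->lock);
        if (pool->shutdown)
            break;
        seen = pool->generation;
        pool_fn fn = pool->fn;
        void *arg = pool->arg;
        int n = pool->nthreads;
        pthread_mutex_unlock(&pool->lock);
        fn(index, n, arg);                 /* run the job without holding the lock */
        pthread_mutex_lock(&pool->lock);
        if (--pool->remaining == 0)
            pthread_cond_signal(&pool->work_done);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

/* Stops and joins all workers. Call only when no job is running. */
void pool_destroy(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->work_ready);
    pthread_mutex_unlock(&pool->lock);
    for (int t = 0; t < pool->nthreads; t++)
        pthread_join(pool->threads[t], NULL);
    pthread_cond_destroy(&pool->work_done);
    pthread_cond_destroy(&pool->work_ready);
    pthread_mutex_destroy(&pool->lock);
    free(pool->threads);
}

/* Creates the workers once. Returns 0 on success, -1 on failure. */
int pool_init(pool_t *pool, int nthreads)
{
    assert(nthreads >= 1);
    pool->threads = malloc((size_t)nthreads * sizeof *pool->threads);
    if (pool->threads == NULL)
        return -1;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->work_ready, NULL);
    pthread_cond_init(&pool->work_done, NULL);
    pool->nthreads = nthreads;
    pool->next_index = 0;
    pool->generation = 0;
    pool->remaining = 0;
    pool->shutdown = 0;
    pool->fn = NULL;
    pool->arg = NULL;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&pool->threads[t], NULL, worker_main, pool) != 0) {
            pool->nthreads = t;            /* tear down the workers created so far */
            pool_destroy(pool);
            return -1;
        }
    }
    return 0;
}

/* Runs fn once on every worker and blocks until all of them have finished. */
void pool_run(pool_t *pool, pool_fn fn, void *arg)
{
    pthread_mutex_lock(&pool->lock);
    pool->fn = fn;
    pool->arg = arg;
    pool->remaining = pool->nthreads;
    pool->generation++;
    pthread_cond_broadcast(&pool->work_ready);
    while (pool->remaining > 0)
        pthread_cond_wait(&pool->work_done, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

/* Example job: each worker increments its own slot in an int array. */
void count_job(int index, int nthreads, void *arg)
{
    (void)nthreads;
    ((int *)arg)[index] += 1;
}
```

A caller would do `pool_init(&pool, ncores)` once at start-up, call `pool_run` for each parallel phase, and call `pool_destroy` at exit, so thread-creation cost is paid a single time.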

Implementation - RQL • Force-vector Modulation • Forces acting upon cells • Forces are modeled as a spring potential energy problem • The native force in the algorithm tries to reduce wire length by bringing connected cells closer to each other • The spreading force tries to move cells into sparse areas within the placement region • Need a balance of the two to meet placement and wire length objectives • Modulate the spreading forces • A high spreading force means the connection belongs to a fan-out net or boundary • Therefore, cells with connections in the top 5 percent of spreading forces are skipped in quadratic placement • Skipping these keeps the cell’s other connections minimized instead of degrading them • Results in placing cells in their overall optimal location

Implementation - RQL • During quadratic placement (global placement process) • Calculate magnitude of spreading forces for all cells in each iteration • Calculate force on current cell • If current cell’s force is above the 5% threshold, skip its placement
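The per-iteration thresholding step can be sketched as below. This is one reading of the slide, not the RQL or FastPlace implementation: it assumes each cell has a 2-D spreading-force vector, ranks cells by squared magnitude (which gives the same ordering as magnitude), and flags the top fraction for skipping. The name `flag_high_force_cells` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Sets skip[i] = 1 for cells whose spreading force (fx[i], fy[i]) ranks in the
 * top `fraction` of all cells, and 0 otherwise. Returns the number of flagged
 * cells, or -1 if memory allocation fails. Ties at the cutoff are all flagged,
 * so the count can slightly exceed fraction * n.
 */
int flag_high_force_cells(const double *fx, const double *fy, size_t n,
                          double fraction, unsigned char *skip)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    if (n == 0)
        return 0;

    double *mag2 = malloc(n * sizeof *mag2);
    double *sorted = malloc(n * sizeof *sorted);
    if (mag2 == NULL || sorted == NULL) {
        free(mag2);
        free(sorted);
        return -1;
    }

    for (size_t i = 0; i < n; i++)
        mag2[i] = fx[i] * fx[i] + fy[i] * fy[i];   /* squared magnitude */

    memcpy(sorted, mag2, n * sizeof *mag2);
    qsort(sorted, n, sizeof *sorted, cmp_double);

    size_t nskip = (size_t)(fraction * (double)n); /* rounds down */
    int flagged = 0;
    for (size_t i = 0; i < n; i++) {
        /* With nskip == 0 nothing is flagged; otherwise compare to the cutoff. */
        skip[i] = (unsigned char)(nskip > 0 && mag2[i] >= sorted[n - nskip]);
        flagged += skip[i];
    }

    free(mag2);
    free(sorted);
    return flagged;
}
```

With `fraction = 0.05` this corresponds to the 5% threshold on the slide; the placement loop would then leave flagged cells where they are for that iteration.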

Implementation - Functions • Move_8pt family • move_8pt, move_8pt_withMap, move_8pt_mixedMode, move_8pt_mixedMode_withMap, move_8pt_clustering, move_8pt_clustering_withMap • Calculates score based on cell coordinates and bin utilization • Doesn’t lend itself well to parallelization • The fix? • If a new cell is within a 3x3 box of the cell currently being processed, the new cell is skipped • Helps remove significant wirelength degradation
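The 3x3 exclusion test could be expressed as in the sketch below. The slide does not say what units the box uses; this example assumes bin coordinates, with the box centered on the bin of a cell that another thread is currently processing. `bin_t`, `within_3x3`, and `conflicts_with_active` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed representation: a cell's position expressed as a bin column and row. */
typedef struct {
    int col;
    int row;
} bin_t;

/* Returns 1 if `other` lies in the 3x3 block of bins centered on `center`. */
int within_3x3(bin_t center, bin_t other)
{
    return abs(center.col - other.col) <= 1 && abs(center.row - other.row) <= 1;
}

/*
 * Returns 1 if `candidate` falls inside the 3x3 box of any of the n bins that
 * are currently being processed, meaning the candidate should be skipped.
 */
int conflicts_with_active(const bin_t *active, size_t n, bin_t candidate)
{
    assert(active != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (within_3x3(active[i], candidate))
            return 1;
    }
    return 0;
}
```

Skipping a conflicting cell keeps two threads from scoring moves against the same neighborhood of bins at the same time, which is the situation the slide associates with wirelength degradation.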

Implementation - Functions • Swap_move family • swap_move_FM, vswap_move, local_order3_FM, flipAllCells • Row-based data processing • Break up matrix into segments based on number of threads • Assign each thread to do X rows
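The row split for the swap-move family can be illustrated with a small helper that gives each thread a contiguous block of rows. This is a sketch under the assumption that rows are independent units of work; `row_range_t` and `rows_for_thread` are hypothetical names, and the policy of handing leftover rows to the lowest-numbered threads is one choice among several.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range of rows [begin, end) owned by one thread. */
typedef struct {
    size_t begin;
    size_t end;
} row_range_t;

/*
 * Splits nrows rows among nthreads threads as evenly as possible and returns
 * the range for thread t. The first (nrows % nthreads) threads get one extra row.
 */
row_range_t rows_for_thread(size_t nrows, size_t nthreads, size_t t)
{
    assert(nthreads > 0 && t < nthreads);
    size_t base = nrows / nthreads;
    size_t extra = nrows % nthreads;
    row_range_t r;
    r.begin = t * base + (t < extra ? t : extra);
    r.end = r.begin + base + (t < extra ? 1 : 0);
    return r;
}
```

For example, 10 rows over 4 threads yields blocks of 3, 3, 2, and 2 rows, so no thread carries more than one row beyond the others.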

Testing • Profiled original FastPlace algorithm • gprof gives CPU time per function • Profiling parallel FastPlace • Valgrind • FastPlace code outputs actual time elapsed • Can be used to compare performance • Not 100% consistent
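Because gprof reports CPU time, which does not shrink when work is spread across cores, speedup has to be judged from elapsed wall-clock time. The sketch below shows one way to measure it and to relate it to the 0.5N to 0.75N expectation; the function names are hypothetical and this is not the timing code inside FastPlace.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <time.h>

/* Reads a monotonic wall-clock timestamp. */
struct timespec wall_clock_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return ts;
}

/* Elapsed seconds between two timestamps. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Speedup: serial run-time divided by parallel run-time. */
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

/* Parallel efficiency: speedup divided by the number of cores used. */
double efficiency(double serial_seconds, double parallel_seconds, int cores)
{
    assert(cores > 0);
    return speedup(serial_seconds, parallel_seconds) / (double)cores;
}
```

As a worked example, a run that takes 8 s serially and 2 s on 8 cores has a speedup of 4 and an efficiency of 0.5, the lower end of the expected range. Since the measured times are not fully consistent between runs, averaging several runs gives a steadier figure.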

Testing & Results • Test results for correctness • Compare “wire length” results • Average total wirelength no more than 1% greater • Thread pool is tested and working • Test results for speedup • Compared actual run-time • See following slides

Test Results – RQL Implementation • Wire length results • Wire length decreased by 0.12% to 2.08% on ISPD98 benchmarks, with an average of 0.98% • Wire length decreased by 0.11% to 3.18% on ISPD2005 benchmarks, with an average of 1.39% • Run-time results • Some run-time slowdown • Average increase of 3.36% on ISPD98 • Average increase of 4.02% on ISPD2005

Test Results – Global Placement

Test Results – Detailed Placement

Project Impact • Shows that threads can be used to speed up the placement process • With the availability of multi-core CPUs and the scalability of the thread implementation, speed improvement could continue • Reduces bottleneck in development process

Questions?
