Lithographic Aerial Image Simulation with FPGA based Hardware Acceleration. Jason Cong and Yi Zou, UCLA Computer Science Department. Lithography Simulation (Application): simulation of the optical imaging process. Computationally intensive and quite slow for full-chip simulation.
Lithographic Aerial Image Simulation with FPGA based Hardware Acceleration
Jason Cong and Yi Zou
UCLA Computer Science Department
Socket-compatible:
Replace one Opteron CPU with the XD1000 coprocessor
The module connects to the CPU's HyperTransport bus and motherboard DIMMs while utilizing the existing power supply and heat sink solution for the CPU.
Dedicated DIMM for FPGA (not shared with CPU)
The coprocessor communicates with the CPU via a HyperTransport link and behaves much like a PCI device
Loop over different rectangles
Loop over pixels
Loop over kernels
I(x,y): image intensity at (x,y)
ψk(x,y): kth kernel
φk(x,y): kth eigenvector
(x1,y1), (x2,y2), (x1,y2), (x2,y1): layout corners
τ: mask transmittance
Pseudo code of the Imaging Equation
Original loop order:
  Loop over pixels
    Loop over kernels
      Loop over layout corners
After loop interchange:
  Loop over kernels
    Loop over layout corners
      Loop over pixels
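The loop interchange above can be illustrated with a small sketch. The slides do not spell out the imaging equation in text, so this toy model assumes the intensity at a pixel is a sum over kernels of a squared sum over layout corners. The function `term` and all sizes are hypothetical stand-ins, not the authors' kernel lookup.

```python
# Toy model only: every name below is hypothetical and illustrates loop order,
# not the actual lithography kernels.

def term(pixel, kernel, corner):
    """Stand-in for one kernel lookup at a pixel relative to a layout corner."""
    return (3 * pixel + 5 * kernel + 7 * corner) % 11 - 5


def image_pixel_major(n_pixels, n_kernels, n_corners):
    """Original order: pixels -> kernels -> layout corners."""
    image = [0] * n_pixels
    for p in range(n_pixels):
        for k in range(n_kernels):
            corner_sum = 0
            for c in range(n_corners):
                corner_sum += term(p, k, c)
            # Assumed form: squared corner sum accumulated per kernel.
            image[p] += corner_sum * corner_sum
    return image


def image_kernel_major(n_pixels, n_kernels, n_corners):
    """Interchanged order: kernels -> layout corners -> pixels."""
    image = [0] * n_pixels
    for k in range(n_kernels):
        partial = [0] * n_pixels  # image partial sum for this kernel
        for c in range(n_corners):
            for p in range(n_pixels):  # innermost loop now sweeps the pixels
                partial[p] += term(p, k, c)
        for p in range(n_pixels):
            image[p] += partial[p] * partial[p]
    return image


if __name__ == "__main__":
    same = image_pixel_major(16, 3, 4) == image_kernel_major(16, 3, 4)
    print("loop orders agree:", same)
```

With pixels in the innermost loop, each iteration touches a different element of the partial-sum array, which is the kind of independent work that unrolling across several computing elements can exploit.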
[Figure: the kernel array, the layout corners of one object (one rectangle), and the image (partial sum) to which the kernel contributions are added.]
Loop unrolling
[Figure: four computing elements, each paired with one partition of the image partial sum (partitions 1 to 4). The kernel array is split into kernel partitions 1 to 4. Multiplexing logic selects among the different partitions of the kernel and provides the data for each PE, because a PE might also need the kernel data in other partitions.]
4-PE example: blocks/stages (time runs downward)
Stage  PE 1      PE 2      PE 3      PE 4
1      K1 -> I1  K2 -> I2  K3 -> I3  K4 -> I4
2      K2 -> I1  K3 -> I2  K4 -> I3  K1 -> I4
3      K3 -> I1  K4 -> I2  K1 -> I3  K2 -> I4
4      K4 -> I1  K1 -> I2  K2 -> I3  K3 -> I4
(Kn -> Im: using kernel partition n to compute image partition m)
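The 4-PE schedule above follows a simple rotation: each PE keeps its own image partition, and the kernel partition it uses advances by one at every stage. A minimal sketch of that pattern, assuming 1-based partition numbers and a hypothetical helper name `rotation_schedule`:

```python
def rotation_schedule(num_pe):
    """Return schedule[stage][pe] = (kernel_partition, image_partition), 1-based.

    PE p always computes image partition p; at each stage it uses the kernel
    partition rotated by the stage index, so no two PEs share a kernel
    partition within a stage.
    """
    schedule = []
    for stage in range(num_pe):
        row = []
        for pe in range(num_pe):
            kernel_part = (pe + stage) % num_pe + 1
            image_part = pe + 1
            row.append((kernel_part, image_part))
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for stage, row in enumerate(rotation_schedule(4), start=1):
        cells = ["PE %d: K%d -> I%d" % (pe + 1, k, i)
                 for pe, (k, i) in enumerate(row)]
        print("stage %d  %s" % (stage, "  ".join(cells)))
```

After `num_pe` stages every PE has seen every kernel partition exactly once, so each image partition has received contributions from the whole kernel array.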
Image Partial Sum Array: partition 1, partition 2, partition 3, partition 4
Kernel Array: partition 1, partition 2, partition 3, partition 4
[Figure: configurations 1 to 4 connect the image partial sum partitions to the kernel partitions. Arrays a, b, c, d are paired with registers 1, 2, 3, 4.]
Start from:
Reg_1=array_a[..]
Reg_2=array_b[..]
Reg_3=array_c[..]
Reg_4=array_d[..]
Wanted:
Reg_1=array_c[..]
Reg_2=array_d[..]
Reg_3=array_a[..]
Reg_4=array_b[..]
Shift 1 step in Y direction, shift 0 steps in X direction
[Figure: register layout after the shift: Reg_3, Reg_4, Reg_1, Reg_2]
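The register remapping above can be read as a cyclic shift on a 2x2 grid. The sketch below assumes Reg_1 and Reg_2 form the top row and Reg_3 and Reg_4 the bottom row. That layout and the function name `cyclic_shift_2d` are illustrative assumptions, not taken from the slides.

```python
def cyclic_shift_2d(grid, shift_y, shift_x):
    """Return a new grid where slot (r, c) reads from (r + shift_y, c + shift_x),
    wrapping around in both directions."""
    rows = len(grid)
    cols = len(grid[0])
    return [
        [grid[(r + shift_y) % rows][(c + shift_x) % cols] for c in range(cols)]
        for r in range(rows)
    ]


# Assumed layout: row 0 holds Reg_1, Reg_2; row 1 holds Reg_3, Reg_4.
start = [["array_a", "array_b"],
         ["array_c", "array_d"]]

# Shift 1 step in Y, 0 steps in X.
wanted = cyclic_shift_2d(start, shift_y=1, shift_x=0)

if __name__ == "__main__":
    names = ["Reg_1", "Reg_2", "Reg_3", "Reg_4"]
    flat = [cell for row in wanted for cell in row]
    for name, source in zip(names, flat):
        print("%s=%s[..]" % (name, source))
```

Under that layout, a one-step Y shift reproduces the "Wanted" assignment: Reg_1 and Reg_2 read array_c and array_d, while Reg_3 and Reg_4 read array_a and array_b.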
kernel[size];
Loop body with unrolling pragma and pipelining pragma
{
    … += kernel[…] …          // computation
}
kernel[4][4][size/16];
Loop body with unrolling pragma and pipelining pragma
{
    … += kernel[i][j][…] …    // if some indices are constant
}
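The change from `kernel[size]` to `kernel[4][4][size/16]` above amounts to splitting one flat array into 16 banks. The slides do not say how flat indices map to banks, so the following sketch assumes a simple block mapping. The helper names `split_index` and `partition` are hypothetical.

```python
def split_index(flat_index, size, rows=4, cols=4):
    """Map a flat index into (i, j, offset) for a rows x cols banked array.

    Assumes a block mapping: consecutive elements stay in the same bank.
    """
    banks = rows * cols
    if size % banks != 0:
        raise ValueError("size must be a multiple of rows * cols")
    per_bank = size // banks
    bank, offset = divmod(flat_index, per_bank)
    i, j = divmod(bank, cols)
    return i, j, offset


def partition(kernel, rows=4, cols=4):
    """Copy a flat kernel list into a rows x cols x (size / (rows*cols)) layout."""
    size = len(kernel)
    per_bank = size // (rows * cols)
    banked = [[[0] * per_bank for _ in range(cols)] for _ in range(rows)]
    for flat_index, value in enumerate(kernel):
        i, j, offset = split_index(flat_index, size, rows, cols)
        banked[i][j][offset] = value
    return banked


if __name__ == "__main__":
    banked = partition(list(range(32)))
    print(banked[0][0], banked[3][3])
```

When the first two indices are constants in an unrolled loop body, each access refers to one fixed bank, so the banks can be read in parallel without contention.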
[Figure: timeline with one software (SW) lane and two hardware (HW) lanes, showing the transfer steps DI1, DI2, DO2, DO1 and the compute steps (Comp): reading input data, computation, writing output data.]
DI1: Transferring input from software to SRAM
DI2: Transferring input from SRAM to FPGA
DO2: Transferring output from FPGA to SRAM
DO1: Transferring output from SRAM to software
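The five steps named above (DI1, DI2, Comp, DO2, DO1) can be arranged as a pipeline over successive data blocks. The sketch below is a hypothetical timing model, not the authors' measured schedule. It assumes every step takes one equal time slot and that different steps of different blocks may run in the same slot.

```python
# Step order for one data block, as named in the legend above.
STEPS = ["DI1", "DI2", "Comp", "DO2", "DO1"]


def sequential_slots(n_blocks):
    """Slots needed if each block finishes all five steps before the next starts."""
    return n_blocks * len(STEPS)


def pipelined_timeline(n_blocks):
    """Return timeline[slot] = list of (block, step) pairs active in that slot.

    Assumption: block b starts one slot after block b - 1, so its step s
    lands in slot b + s.
    """
    n_slots = n_blocks + len(STEPS) - 1
    timeline = [[] for _ in range(n_slots)]
    for block in range(n_blocks):
        for s, step in enumerate(STEPS):
            timeline[block + s].append((block, step))
    return timeline


if __name__ == "__main__":
    timeline = pipelined_timeline(4)
    for slot, active in enumerate(timeline):
        print(slot, ", ".join("block %d %s" % pair for pair in active))
    print("sequential:", sequential_slots(4), "pipelined:", len(timeline))
```

Under these assumptions the transfer steps of one block overlap with the computation of its neighbours, so the total slot count grows with the number of blocks plus a fixed fill-and-drain cost rather than five slots per block.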