
Breaking the I/O Bottleneck in GPUs with Fast Multisource RTM and FWI

Breaking the I/O Bottleneck in GPUs with Fast Multisource RTM and FWI. Chaiwoot Boonyasiriwat, Ge Zhan, Madhu Srinivasan, Markus Hadwiger, and Gerard Schuster. Jan. 7, 2010. Outline: Introduction to Graphics Processing Unit (GPU); Multisource RTM and FWI on GPU; Numerical Results.

Presentation Transcript


  1. Breaking the I/O Bottleneck in GPUs with Fast Multisource RTM and FWI. Chaiwoot Boonyasiriwat, Ge Zhan, Madhu Srinivasan, Markus Hadwiger, and Gerard Schuster. Jan. 7, 2010

  2. Outline • Introduction to Graphics Processing Unit (GPU) • Multisource RTM and FWI on GPU • Numerical Results • RTM of 2D SEG/EAGE Salt Model • FWI of Marmousi II Model • Summary • Future Work • Acknowledgment

  3. Introduction to GPU. [Figure: real-time volume rendering using GPU, by Markus Hadwiger]

  4. Performance of GPU vs CPU. [Figure: peak GFLOP/s, axis from 0 to 1000. Courtesy of NVIDIA]

  5. Memory Bandwidth of GPU vs CPU. [Figure: bandwidth in GB/s, axis from 0 to 120. Courtesy of NVIDIA]

  6. Seismic Applications for GPUs • Well Logging (Mendoza et al., 2009) • Migration (Foltinek et al., 2007; Li et al., 2009; Wang et al., 2009) • Visualization and Interpretation (Lin and Wei, 2007; Kadlec et al., 2009)

  7. GPU Architecture: High-Level View • Multiprocessors: each contains 8 processors • High performance when thousands of threads execute concurrently. [Image from Micikevicius, NVIDIA]

  8. CUDA Programming Model. [Diagram: the device holds a grid of thread blocks, shown as Block (0, 0) and Block (1, 0). Each block has its own shared memory. Each thread, shown as Thread (0, 0) and Thread (1, 0), has its own registers and local memory. Global, constant, and texture memory are shown alongside the host.]

  9. CUDA Programming Model: Grid of Threadblocks

  10. Heterogeneous Programming. [Diagram: serial code alternating with parallel kernels launched as Kernel<<<grid,block>>>]

  11. Parallel Kernel • block(BLOCK_X,BLOCK_Y) • grid(nx/BLOCK_X, nz/BLOCK_Y) • Kernel<<<grid,block>>>. [Diagram: an nx-by-nz domain tiled by Block(0,0), Block(1,0), Block(2,0) in the first row and Block(0,1), Block(1,1), Block(2,1) in the second row]
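The launch configuration on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the kernel `fill_index`, the wrapper `run_fill_index`, and the 16-by-16 block size are hypothetical. The slide writes the grid as `nx/BLOCK_X` by `nz/BLOCK_Y`, which assumes the model size is a multiple of the block size; the sketch uses ceiling division plus a bounds check so that other sizes are covered too.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_X 16   // assumed block size, not taken from the slides
#define BLOCK_Y 16

// Each thread handles one grid point (ix, iz) of an nx-by-nz model.
__global__ void fill_index(float *field, int nx, int nz)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iz = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix < nx && iz < nz)   // guard threads in partially filled blocks
        field[iz * nx + ix] = (float)(iz * nx + ix);
}

// Hypothetical host wrapper: launches the kernel and returns the number of
// grid points holding a wrong value, or -1 if a CUDA call fails.
int run_fill_index(int nx, int nz)
{
    assert(nx > 0 && nz > 0);
    size_t n = (size_t)nx * nz, bytes = n * sizeof(float);
    float *d_field = nullptr;
    if (cudaMalloc(&d_field, bytes) != cudaSuccess) return -1;

    dim3 block(BLOCK_X, BLOCK_Y);
    // Ceiling division: enough blocks to cover the whole model.
    dim3 grid((nx + BLOCK_X - 1) / BLOCK_X, (nz + BLOCK_Y - 1) / BLOCK_Y);
    fill_index<<<grid, block>>>(d_field, nx, nz);

    std::vector<float> h(n);
    cudaError_t err = cudaMemcpy(h.data(), d_field, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_field);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (h[i] != (float)i) ++bad;
    return bad;
}
```

Calling `run_fill_index(100, 50)` exercises a model whose dimensions are not multiples of the block size; a return value of 0 means every grid point was written by exactly the thread that owns it.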

  12. Outline • Introduction to Graphics Processing Unit (GPU) • Multisource RTM and FWI on GPU • Numerical Results • RTM of 2D SEG/EAGE Salt Model • FWI of Marmousi II Model • Summary • Future Work • Acknowledgment

  13. Multisource RTM/FWI. [Flowchart boxes, in order: Model and Encoded Data; Evaluate misfit function and compute gradient; Perturb Model; Evaluate misfit function; decision "Search criterion" (Yes/No); decision "Convergence criterion" (Yes/No); Done]
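The "evaluate misfit function" step in the flowchart can be sketched on the GPU as a parallel reduction. The slides do not say which misfit is used; the sketch below assumes a least-squares misfit, half the sum of squared residuals between predicted and observed encoded data. The names `misfit_partial` and `run_misfit`, and the choice of 256 threads and 64 blocks, are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

#define THREADS 256   // threads per block; must be a power of two
#define BLOCKS  64

// Each block accumulates 0.5 * (pred - obs)^2 over a strided slice of the
// data, then reduces its per-thread sums in shared memory.
__global__ void misfit_partial(const float *pred, const float *obs,
                               int n, float *partial)
{
    __shared__ float cache[THREADS];
    int tid = threadIdx.x;
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n;
         i += gridDim.x * blockDim.x) {
        float r = pred[i] - obs[i];
        sum += 0.5f * r * r;
    }
    cache[tid] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction
        if (tid < s) cache[tid] += cache[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = cache[0];
}

// Hypothetical host wrapper: returns the misfit, or -1 if a CUDA call fails.
float run_misfit(const std::vector<float> &pred, const std::vector<float> &obs)
{
    assert(pred.size() == obs.size());
    int n = (int)pred.size();
    size_t bytes = n * sizeof(float);
    float *d_pred = nullptr, *d_obs = nullptr, *d_part = nullptr;
    bool ok = cudaMalloc(&d_pred, bytes) == cudaSuccess
           && cudaMalloc(&d_obs, bytes) == cudaSuccess
           && cudaMalloc(&d_part, BLOCKS * sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_pred, pred.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_obs, obs.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    std::vector<float> part(BLOCKS, 0.0f);
    if (ok) {
        misfit_partial<<<BLOCKS, THREADS>>>(d_pred, d_obs, n, d_part);
        ok = cudaMemcpy(part.data(), d_part, BLOCKS * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_pred); cudaFree(d_obs); cudaFree(d_part);   // cudaFree(nullptr) is a no-op
    if (!ok) return -1.0f;
    double total = 0.0;                    // finish the small sum on the host
    for (float p : part) total += p;
    return (float)total;
}
```

Only one float per block is copied back to the host, so the transfer for each misfit evaluation stays small regardless of the data size.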

  14. Multisource RTM/FWI on GPU
      Read input parameters, velocity model, etc.   (Serial)
      For iter = 1, iter_max
        init_grad<<<grid,block>>>                   (Parallel)
        For is = 1, nssg
          init_pressure<<<grid,block>>>             (Parallel)
          For it = 1, nt
            modeling<<<grid,block>>>                (Parallel)
            save_boundary<<<grid,block>>>           (Parallel)
          End
        End
        …
      End
      [Slide label: Encoded Modeling]
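The inner time loop above can be sketched as two kernels and a host loop that keeps both wavefields on the GPU. The kernel names `init_pressure` and `modeling` follow the slide, but their bodies are assumptions: the slides do not give the numerical scheme, so this sketch uses a second-order finite-difference update of the 2D acoustic wave equation with a constant squared Courant number `c2`, zero boundary values, and a unit impulse source at the model center. `save_boundary` is omitted, and `run_modeling` is a hypothetical wrapper.

```cuda
#include <cassert>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_X 16   // assumed block size
#define BLOCK_Y 16

// Zero both time levels of the pressure field.
__global__ void init_pressure(float *p0, float *p1, int nx, int nz)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iz = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix < nx && iz < nz) { p0[iz * nx + ix] = 0.0f; p1[iz * nx + ix] = 0.0f; }
}

// One time step: p0 holds the previous field on entry and the next field on
// exit; p1 is the current field. c2 = (v * dt / dx)^2 is assumed constant.
__global__ void modeling(const float *p1, float *p0, float c2,
                         int nx, int nz, int isrc, float src)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iz = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix < 1 || ix >= nx - 1 || iz < 1 || iz >= nz - 1) return;  // edges stay zero
    int i = iz * nx + ix;
    float lap = p1[i - 1] + p1[i + 1] + p1[i - nx] + p1[i + nx] - 4.0f * p1[i];
    float next = 2.0f * p1[i] - p0[i] + c2 * lap;
    if (i == isrc) next += src;
    p0[i] = next;
}

// Hypothetical wrapper: runs nt steps and returns the final wavefield
// (an empty vector if a CUDA call fails).
std::vector<float> run_modeling(int nx, int nz, int nt, float c2)
{
    assert(nx >= 3 && nz >= 3 && nt >= 0);
    size_t n = (size_t)nx * nz, bytes = n * sizeof(float);
    float *p0 = nullptr, *p1 = nullptr;
    if (cudaMalloc(&p0, bytes) != cudaSuccess) return {};
    if (cudaMalloc(&p1, bytes) != cudaSuccess) { cudaFree(p0); return {}; }

    dim3 block(BLOCK_X, BLOCK_Y);
    dim3 grid((nx + BLOCK_X - 1) / BLOCK_X, (nz + BLOCK_Y - 1) / BLOCK_Y);
    int isrc = (nz / 2) * nx + nx / 2;

    init_pressure<<<grid, block>>>(p0, p1, nx, nz);
    for (int it = 0; it < nt; ++it) {
        float src = (it == 0) ? 1.0f : 0.0f;   // impulse at the first step only
        modeling<<<grid, block>>>(p1, p0, c2, nx, nz, isrc, src);
        std::swap(p0, p1);                     // p1 is now the newest field
    }

    std::vector<float> h(n);
    cudaError_t err = cudaMemcpy(h.data(), p1, bytes, cudaMemcpyDeviceToHost);
    cudaFree(p0); cudaFree(p1);
    if (err != cudaSuccess) return {};
    return h;
}
```

The wavefields are copied to the host only once, after the last step; the time loop itself involves no host-device transfers, only pointer swaps and kernel launches.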

  15. Multisource RTM/FWI on GPU
      Read input parameters, velocity model, etc.
      For iter = 1, iter_max
        init_grad<<<grid,block>>>
        For is = 1, nssg
          Encoded modeling
          Encode observed data
        End
        Compute the gradient
        Line Search
      End
      [Slide annotations, in order: Serial; Parallel; Reduce I/O; Parallel; Serial; Parallel; Parallel; Encoded data reused]
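The "encode observed data" step and its effect on host-to-GPU traffic can be sketched as below. The slides only say that phase encoding is used; this sketch assumes the simplest variant, in which each common shot gather (CSG) is multiplied by a polarity code of +1 or -1 and the results are summed into one supergather (SSG) on the host, so that a single array is transferred instead of one per shot. The function names and the flat data layout are hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Sum nshot gathers of n samples each into one supergather on the host.
// csg is laid out shot by shot; code[s] is the assumed +1/-1 polarity code.
std::vector<float> encode_supergather(const std::vector<float> &csg,
                                      const std::vector<int> &code, int n)
{
    assert(n > 0 && csg.size() == code.size() * (size_t)n);
    std::vector<float> ssg(n, 0.0f);
    for (size_t s = 0; s < code.size(); ++s)
        for (int i = 0; i < n; ++i)
            ssg[i] += (float)code[s] * csg[s * n + i];
    return ssg;
}

// Copy the encoded supergather to the GPU in one transfer of n floats,
// instead of nshot * n floats for the individual gathers.
// Returns a device pointer (caller frees with cudaFree) or nullptr on error.
float *upload_supergather(const std::vector<float> &ssg)
{
    float *d_ssg = nullptr;
    size_t bytes = ssg.size() * sizeof(float);
    if (cudaMalloc(&d_ssg, bytes) != cudaSuccess) return nullptr;
    if (cudaMemcpy(d_ssg, ssg.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_ssg);
        return nullptr;
    }
    return d_ssg;
}
```

If the same codes are kept across iterations, the uploaded supergather can stay resident on the GPU and be reused, which is one reading of the "Encoded data reused" annotation on the slide.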

  16. Outline • Introduction to Graphics Processing Unit (GPU) • Multisource RTM and FWI on GPU • Numerical Results • RTM of 2D SEG/EAGE Salt Model • FWI of Marmousi II Model • Summary • Future Work • Acknowledgment

  17. Numerical Results: RTM. 2D SEG/EAGE Salt Model

  18. Numerical Results: RTM. Conventional RTM Image using 200 CSGs

  19. Numerical Results: RTM. Multisource RTM Image using 20 SSGs • 10x speedup

  20. Outline • Introduction to Graphics Processing Unit (GPU) • Multisource RTM and FWI on GPU • Numerical Results • RTM of 2D SEG/EAGE Salt Model • FWI of Marmousi II Model • Summary • Future Work • Acknowledgment

  21. Numerical Results: FWI. Marmousi II Model

  22. Numerical Results: FWI. Conventional RTM Image using 272 CSGs

  23. Numerical Results: FWI. Multisource RTM Image using 17 SSGs • 16x speedup
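The two quoted speedups equal the ratio of the number of common shot gathers (CSGs) to the number of supergathers (SSGs). Assuming that migrating one supergather costs about the same as migrating one shot gather, which the slides do not state explicitly, the arithmetic is:

```latex
\text{speedup} \approx \frac{N_{\mathrm{CSG}}}{N_{\mathrm{SSG}}}, \qquad
\frac{200}{20} = 10 \ \text{(SEG/EAGE salt model)}, \qquad
\frac{272}{17} = 16 \ \text{(Marmousi II model)}
```

Here the symbols N_CSG and N_SSG are introduced only for this illustration; they do not appear on the slides.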

  24. Numerical Results: FWI. Multisource FWI Velocity Tomogram using 17 SSGs

  25. Outline • Introduction to Graphics Processing Unit (GPU) • Multisource RTM and FWI • Numerical Results • RTM of 2D SEG/EAGE Salt Model • FWI of Marmousi II Model • Summary • Future Work • Acknowledgment

  26. Summary • Multisource RTM and FWI are implemented on a GPU. • I/O from the host machine to the GPU is reduced by phase encoding. • Theoretical speedup is achieved for multisource RTM. • CUDA code using 1 GPU is about 10x faster than MPI code using 8 processors for a 2D RTM experiment. • A GPU is a cheap, high-performance computing machine for seismic migration and inversion.

  27. Outline • Introduction to Multisource Technology • Multisource Full-Waveform Inversion • Numerical Results • 3D SEG/EAGE Overthrust Model • Summary • Future Work • Acknowledgment

  28. Future Work • Implement 3D multisource RTM/FWI on a GPU. • Develop CUDA codes for a GPU cluster. • Develop real-time 2D multisource RTM/FWI with user interfaces (computational steering). • Joint proposals with PSU and U of U.

  29. GPU Crews

  30. Acknowledgment • Sponsors of 2009 UTAM consortium • Workstation: Benoit Marchand • Thank you for your attention
