
Speeding Up Large-Scale Geospatial Polygon Rasterization on GPGPUs

Jianting Zhang

Department of Computer Science, the City College of New York

[email protected]


Outline


  • Introduction and Motivations

  • Background and Related Works

  • The Serial Scan-Line Fill Algorithm

  • Preprocessing Polygon Collections

  • Efficient Polygon Rasterization on GPGPUs

  • Experiments and Results

  • Conclusion and Future Work



Introduction: Personal HPC-G

A. Clematis, M. Mineter, and R. Marciano. High performance computing with geographical data. Parallel Computing, 29(10):1275–1279, 2003

“Despite all these initiatives the impact of parallel GIS research has remained slight…”

“…fundamental problem remains the fact that creating parallel GIS operations is non-trivial and there is a lack of parallel GIS algorithms, application libraries and toolkits.”

  • Marrying GPGPU with GIS – The next generation High-Performance GIS in a Personal Computing Environment (Zhang 2010, HPDGIS)

    • Every personal computer is now a parallel machine: CMPs and GPUs

    • Multi-core CPUs have become mainstream; the more cores they have, the more GPU features they have

    • NVIDIA alone shipped almost 220 million CUDA-capable GPUs between 2006 and 2010 (CACM 2010/11)



Introduction – Personal HPC-G

  • Chip-Multiprocessors (CMP):

    • http://en.wikipedia.org/wiki/Multi-core_processor

    • Cores per chip: dual-core -> quad-core -> six-core -> 8/10/12

    • Chips per node: 1 -> 2 -> 4/8

    • Intel MIC (32 cores)

    • UIUC Rigel Design (1024 core)

  • Massively parallel GPGPU computing: Hundreds of GPU cores in a GPU card

    • Nvidia GTX480 (03/2010): 480 cores, 1.4 GHz, 1.5 GB, 177.4 GB/s memory bandwidth, 1.35 TFlops

    • Nvidia GTX590 (03/2011): 1024 cores, 1.2 GHz, 3 GB, 327.74 GB/s memory bandwidth, 2.49 TFlops

Parallel hardware is more affordable than ever before …



Introduction – Personal HPC-G

[Slide callouts: COM.GEO’10, SSDBM’10, ACMGIS’10, ACMGIS’11]

  • Geospatial data volumes never stop growing

    • Satellite: e.g., from GOES to GOES-R (2016)

      • http://www.goes-r.gov/downloads/GOES-R-Tri.pdf

      • Spectral (3X)*spatial (4X)* temporal (5X)=60X

      • Derived thematic data products (vector)

        • http://www.goes-r.gov/products/baseline.html

        • http://www.goes-r.gov/products/option2.html

    • Species distributions and movement data

      • E.g., 300+ million occurrence records (GBIF)

      • E.g., 717,057 polygons and 78,929,697 vertices in the distribution data for 4,148 bird species (NatureServe)

      • Animals can move across space and time

    • Event Locations, trajectories and O-D data

      • E.g., Taxi trip records (traces or O-D locations)

      • 0.5 million in NYC and 1.2 million in Beijing per day

      • From O-D to shortest paths to flow patterns

[Slide callouts: ACMGIS’08, ACMGIS’09, GeoInformatics’09, HPDGIS’11, COM.GEO’10, HPDGIS’10, ???]


Motivations


GPU-based parallel algorithm design to efficiently manage large-scale species distribution data (overlapped polygons)

  • Part 1: Extended quadtree to represent overlapped polygons (GeoInformatics’09 and ACMGIS’09)

  • Part 2: Efficient conversion from real-world geospatial polygons to quadtrees

    • Step 1: from polygons to scan-line segments. Step 2: from scan-line segments to quadtrees

  • Part 3: Query-driven visual exploration (ACMGIS’08 and ACMGIS’09)



Background and Related Works

  • Polygon rasterization on GPUs

    • State-of-the-art: OpenGL GL_POLYGON

    • Problems

      • Fixed-function, proprietary, black-box

      • Does not support complex (e.g., concave) polygons – results may be incorrect (although acceptable for display purposes)

      • GL_POLYGON is much slower than GL_TRIANGLES

      • Requires a hardware context to read back rasterization results

      • Accuracy is limited by screen resolution

      • Difficult to implement using graphics languages for GIS developers

    • GPGPU comes to the rescue

      • Being able to use GPU parallel computing power

      • Using C/C++ languages is more intuitive

      • Directly generating spatial data structures can be more efficient (than using rasterized images to construct quadtrees)

      • More client-server computing friendly

    • No previous work on polygon rasterization on GPGPUs for geospatial applications



Background and Related Works

  • Spatial Data structures on GPUs for computer graphics applications

    • KD-Tree (Zhou et al 2008, Hou et al 2001), Octree (Zhou 2011)

    • They are designed to render triangles efficiently, not to query polygons

  • Software rasterization of triangles

    • (Laine and Karras 2011), (Pantaleoni 2011), (Schwarz and Seidel 2011)

    • Results are encouraging when compared to hardware rasterization (2-8x gap)

    • Again, they are designed for rasterizing/rendering triangles, not for querying polygons



Background and Related Works

  • Geospatial Data Processing on GPUs

    • Pre-GPGPU:

      • Using graphics data structures and primitives for spatial selection and spatial join queries (Sun et al 2003)

      • Difficult and unintuitive

    • Post-GPGPU

      • Spatial similarity join (Lieberman et al 2008)

      • Density-based spatial clustering (Bohm et al 2009)

      • Min-Max quadtree for large-scale raster data (Zhang et al 2010)

      • Decoding quad-tree encoded bitplane bitmaps of large-scale raster data (Zhang et al 2011)



The Serial Scan-Line Fill Algorithm

  • For each scan line y from ymin to ymax

    • Compute the intersection points with all edges

    • Sort the intersection points and form the scan line segments

    • (Fill the raster cells in the scan line segments)

  • End

Intersection point between scan line y = y’ and the edge from (x1,y1) to (x2,y2):

x’ = x1 + (y’ - y1) / (y2 - y1) * (x2 - x1)

GDAL/GRASS codebases
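The serial loop above can be sketched in C++ as follows. This is a minimal illustration, not the GDAL or GRASS code: the `Point` and `Segment` types, the function name `scanline_segments`, sampling each row at its cell centre (y + 0.5) and the half-open crossing rule are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Point { double x, y; };                    // hypothetical vertex type
struct Segment { int y; double x_start, x_end; }; // one filled run on row y

// Scan-line segments for a single ring; the ring is closed implicitly
// (the last vertex connects back to the first).
std::vector<Segment> scanline_segments(const std::vector<Point>& ring,
                                       int ymin, int ymax) {
    std::vector<Segment> segments;
    const std::size_t n = ring.size();
    for (int y = ymin; y <= ymax; ++y) {
        const double ys = y + 0.5;  // assumption: sample at the cell centre
        std::vector<double> xs;     // intersections of this scan line
        for (std::size_t i = 0; i < n; ++i) {
            const Point& a = ring[i];
            const Point& b = ring[(i + 1) % n];
            // Half-open rule: an edge counts only if its end points lie on
            // different sides of the scan line, so horizontal edges are skipped.
            if ((a.y <= ys) != (b.y <= ys)) {
                xs.push_back(a.x + (ys - a.y) / (b.y - a.y) * (b.x - a.x));
            }
        }
        std::sort(xs.begin(), xs.end());
        // Consecutive pairs of sorted intersections bound the interior runs.
        for (std::size_t k = 0; k + 1 < xs.size(); k += 2) {
            segments.push_back({y, xs[k], xs[k + 1]});
        }
    }
    return segments;
}
```

For a concave "U"-shaped ring, the rows that cut through both arms yield two segments each, which is the case a convex-only rasterizer gets wrong.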



Polygon Rasterization on GPGPUs - Challenges

  • Unique hardware characteristics (e.g., Nvidia Tesla C2050)

    • large number of threads (1024 per SM, 14 SMs)

    • limited shared memory: 48 KB per SM (shared by 1024 threads)

    • limited registers: 32768 per SM, i.e., 32 per thread

    • Explicit shared memory management is needed to fully utilize the memory hierarchy

  • Parallelizing Scan-Line Fill Algorithm

    • Mimicking CPU algorithm (assigning a polygon to a thread)

      • Will NOT Work

      • Uncoalesced accesses to global memory are extremely inefficient

      • Insufficient registers and shared memory

    • How to assign computing blocks and threads to scan-lines and polygon edges?
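The per-thread budget implied by these numbers can be worked out directly. The sketch below only restates the arithmetic from the slide; the `SmBudget` type, the constant `kC2050` and the 16 bytes per vertex (two 8-byte doubles for x and y) are assumptions made for the illustration.

```cpp
// Per-thread resource budget for one streaming multiprocessor (SM),
// using the Tesla C2050 figures quoted on the slide.
struct SmBudget {
    int threads;       // threads per SM
    int registers;     // registers per SM
    int shared_bytes;  // shared memory per SM, in bytes
};

constexpr SmBudget kC2050 = {1024, 32768, 48 * 1024};

constexpr int registers_per_thread(SmBudget sm) {
    return sm.registers / sm.threads;
}

constexpr int shared_bytes_per_thread(SmBudget sm) {
    return sm.shared_bytes / sm.threads;
}

// Assumption: a vertex is stored as two 8-byte doubles (x and y).
constexpr int vertices_per_thread(SmBudget sm) {
    return shared_bytes_per_thread(sm) / 16;
}

static_assert(registers_per_thread(kC2050) == 32, "32 registers per thread");
static_assert(shared_bytes_per_thread(kC2050) == 48, "48 bytes per thread");
static_assert(vertices_per_thread(kC2050) == 3, "3 vertices per thread");
```

With one polygon per thread, each thread would have shared-memory room for only about three vertices, which is one way to see why mimicking the CPU algorithm does not work.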


Polygon Rasterization on GPGPUs – Design

  • The GPU SMs are divided into 14*4 computing blocks

  • A computing block has 256 threads and processes one polygon

  • All threads in a computing block loop through scan lines cooperatively

[Figure: GPU memory hierarchy, showing GPU global memory, the L2 and L1 caches, and SM1, SM2, … SMn]


Polygon Rasterization on GPGPUs – Design

[Figure: the X/Y coordinates of polygon vertices 1–6 are copied from global memory to shared memory; for each scan line y from ymin to ymax, the edges a–f are tested for intersection (each marked X or O) and the intersection points are then sorted]

X/Y coordinates in shared memory are re-used (ymax-ymin-1) times
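The intersection step for one scan line can be emulated on the host as below. This is a sketch under stated assumptions rather than the author's kernel: the `EdgeHits` type and the function `intersect_scanline` are hypothetical, edge t is taken to run from vertex t to vertex t+1, and the loop over t stands in for the threads of a computing block, each of which would handle one edge on the GPU.

```cpp
#include <cstddef>
#include <vector>

struct EdgeHits {
    std::vector<int> flag;  // 1 if edge t intersects the scan line, else 0
    std::vector<float> x;   // intersection x, meaningful only where flag is 1
};

// X and Y play the role of the vertex coordinates held in shared memory.
EdgeHits intersect_scanline(const std::vector<double>& X,
                            const std::vector<double>& Y, double ys) {
    const std::size_t n = X.size();
    EdgeHits hits;
    hits.flag.assign(n, 0);
    hits.x.assign(n, 0.0f);
    // On the GPU, each iteration would be executed by one thread.
    for (std::size_t t = 0; t < n; ++t) {
        const std::size_t u = (t + 1) % n;  // next vertex, wrapping around
        if ((Y[t] <= ys) != (Y[u] <= ys)) { // end points on opposite sides
            hits.flag[t] = 1;
            hits.x[t] = static_cast<float>(
                X[t] + (ys - Y[t]) / (Y[u] - Y[t]) * (X[u] - X[t]));
        }
    }
    return hits;
}
```

Because the same X and Y arrays are read again for every scan line, keeping them in fast on-chip memory pays off more the taller the polygon is.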


Polygon Rasterization on GPGPUs – Sorting

  • GPGPUs are extremely good at sorting

  • Sorting in shared memory is extremely fast

[Figure: parallel prefix sum (scan) over per-thread 0/1 flags, shown as Step 0 through Step 3]

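// Exclusive prefix sum (scan) of one value per thread, computed in shared memory.
// MAX_PT, Tn (threads per block) and SYNC (presumably __syncthreads();) are defined elsewhere.
// The three steps shown cover 8 elements; log2(Tn) steps are needed in general.
__device__ inline ushort scan4(ushort num) {
    __shared__ ushort ptr[2 * MAX_PT];
    ushort val = num;
    uint idx = threadIdx.x;
    ptr[idx] = 0;        // zero padding, so ptr[idx - k] below never reads out of range
    idx += Tn;
    ptr[idx] = num; SYNC
    val += ptr[idx - 1]; SYNC ptr[idx] = val; SYNC   // Step 1
    val += ptr[idx - 2]; SYNC ptr[idx] = val; SYNC   // Step 2
    val += ptr[idx - 4]; SYNC ptr[idx] = val; SYNC   // Step 3
    val = ptr[idx - 1];  // shift by one: inclusive scan -> exclusive scan
    return val;
}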

  • Benefits

    • Only true intersection results are written back to global memory

    • Saves GPU memory footprint and I/O costs

Result of exclusive scan
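How an exclusive scan lets only the true intersections be written back can be shown with a host-side emulation. This is a sketch, not the author's kernel: the function names are hypothetical, and the double-buffered loop imitates the step-by-step (Hillis-Steele style) scan that the threads of a block would perform in shared memory.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum computed in log2(n) steps; each inner loop
// iteration corresponds to the work of one GPU thread in that step.
std::vector<int> exclusive_scan_steps(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> cur(in), next(n);
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        for (std::size_t t = 0; t < n; ++t) {
            next[t] = cur[t] + (t >= offset ? cur[t - offset] : 0);
        }
        cur.swap(next);
    }
    // cur now holds the inclusive scan; shift right by one for the exclusive scan.
    std::vector<int> out(n, 0);
    for (std::size_t t = 1; t < n; ++t) out[t] = cur[t - 1];
    return out;
}

// Stream compaction: thread t writes x[t] to slot pos[t] only if flag[t] is set.
std::vector<float> compact(const std::vector<int>& flag,
                           const std::vector<float>& x) {
    const std::vector<int> pos = exclusive_scan_steps(flag);
    const std::size_t total =
        flag.empty() ? 0 : static_cast<std::size_t>(pos.back() + flag.back());
    std::vector<float> out(total);
    for (std::size_t t = 0; t < flag.size(); ++t) {
        if (flag[t]) out[static_cast<std::size_t>(pos[t])] = x[t];
    }
    return out;
}
```

For flags 0 1 1 0 0 1 0 1 the exclusive scan is 0 0 1 2 2 2 3 3, so the four flagged values land in output slots 0 to 3 and nothing else is written.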



Experiments and Results

  • Data:

    • NatureServe Western Hemisphere bird species distributions: http://www.natureserve.org/getData/birdMaps.jsp

    • 4148 birds: http://geoteci.engr.ccny.cuny.edu/geoteci/SPTestMap.html

    • 717,057 polygons, 1,199,799 rings

    • 78,929,697 vertices (1.3 GB of shp files)

    • Total number of scan-line/polygon edge intersections: 200+ billion



Experiments and Results



Discussions - handling large polygons

  • The current implementation cannot process polygons with more than a few thousand vertices

    • 8n bytes for x coordinates

    • 8n bytes for y coordinates

    • 4n bytes for x coordinates of the intersections

    • ~100 extra bytes

    • (20n + 100) < 48K, so n ~ 2000 (using a whole SM as a computing block)

    • We have limited the number of points to the number of threads (1024) - having one thread process a few vertices is not scalable

    • We need a better way to handle scalability
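The vertex limit follows from the byte counts listed above. The sketch below just solves the inequality; the function name `max_vertices` is hypothetical, and reading "48K" as 48 × 1024 bytes is an assumption.

```cpp
// Largest n with bytes_per_vertex * n + overhead_bytes < shared_bytes.
// Defaults follow the slide: 8n (x) + 8n (y) + 4n (intersection x) = 20n bytes,
// plus roughly 100 extra bytes.
constexpr int max_vertices(int shared_bytes,
                           int bytes_per_vertex = 20,
                           int overhead_bytes = 100) {
    return (shared_bytes - overhead_bytes - 1) / bytes_per_vertex;
}

// Assumption: "48K" means 48 * 1024 = 49152 bytes.
static_assert(max_vertices(48 * 1024) == 2452, "whole SM as one computing block");
```

Under that reading the bound is 2452 vertices, in line with the rough n ~ 2000 quoted on the slide.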


Discussions - handling large polygons

Proposed Solution: chunking edge list, computing separately and then assembling

[Figure: the X/Y coordinates of vertices 1–6 in global memory are split into chunks that are loaded into shared memory; the stages are chunking, computing, sorting using a separate kernel, and assembling, illustrated with intersection points (x1,y1), (x3,y1), (x2,y2), (x4,y2)]
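One way to read the proposed chunk, compute, sort and assemble pipeline is sketched below on the host. It is an interpretation, not the author's design: the `Edge`, `Hit` and `Segment` types, the function names, the cell-centre sampling and the half-open crossing rule are all assumptions. Each chunk stands for a piece of the edge list small enough to fit in shared memory, and the single `std::sort` stands in for the separate sorting kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Edge { double x1, y1, x2, y2; };            // hypothetical edge record
struct Hit { int y; double x; };                   // one scan-line intersection
struct Segment { int y; double x_start, x_end; };  // one filled run on row y

// Computing: intersect one chunk of edges with every scan line.
void intersect_chunk(const Edge* edges, std::size_t count,
                     int ymin, int ymax, std::vector<Hit>& hits) {
    for (std::size_t i = 0; i < count; ++i) {
        const Edge& e = edges[i];
        for (int y = ymin; y <= ymax; ++y) {
            const double ys = y + 0.5;  // assumption: sample at the cell centre
            if ((e.y1 <= ys) != (e.y2 <= ys)) {
                hits.push_back(
                    {y, e.x1 + (ys - e.y1) / (e.y2 - e.y1) * (e.x2 - e.x1)});
            }
        }
    }
}

std::vector<Segment> rasterize_chunked(const std::vector<Edge>& edges,
                                       std::size_t chunk_size,
                                       int ymin, int ymax) {
    std::vector<Segment> segments;
    if (chunk_size == 0) return segments;
    // Chunking: each chunk is processed independently of the others.
    std::vector<Hit> hits;
    for (std::size_t begin = 0; begin < edges.size(); begin += chunk_size) {
        const std::size_t count = std::min(chunk_size, edges.size() - begin);
        intersect_chunk(&edges[begin], count, ymin, ymax, hits);
    }
    // Sorting: order all intersections by scan line, then by x.
    std::sort(hits.begin(), hits.end(), [](const Hit& a, const Hit& b) {
        return a.y != b.y ? a.y < b.y : a.x < b.x;
    });
    // Assembling: a closed ring crosses each scan line an even number of
    // times, so consecutive pairs of sorted hits form the segments.
    for (std::size_t i = 0; i + 1 < hits.size(); i += 2) {
        segments.push_back({hits[i].y, hits[i].x, hits[i + 1].x});
    }
    return segments;
}
```

The result does not depend on the chunk size, since the chunks only contribute unordered intersection records that are ordered afterwards; that independence is what allows the edge list of an arbitrarily large polygon to be split.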



Summary and Conclusion

  • Introduced a GPGPU-accelerated software rasterization framework to rasterize and index large-scale geospatial polygons

  • Provided a GPGPU-based design and implementation for computing intersection points

  • Achieved about 20X speedup for groups of polygons with between 64 and 1024 vertices, using the bird species distribution data for the Western Hemisphere, which has about 3/4 million polygons and more than 78 million vertices

  • Discussed extending the current implementation to support polygons with arbitrarily large numbers of vertices by making extensive use of efficient sorting

  • The work reported is preliminary - several important components in realizing a dynamically integrated vector-raster data model for high-performance geospatial analysis on GPGPUs are still under development.



Future Work

  • Extend our current implementation to support large polygons with arbitrary numbers of vertices

  • Implement the quadtree construction (Step 2) based on the GPGPU-computed scan-line segments (CPU/GPU)

  • Perform a comprehensive performance comparison with that of commercial spatial database indexing

  • Integrate with front end modules in spatial databases (e.g., query parser and optimizer)



Q&A

[email protected]
