An Apples-to-Apples GPGPU Benchmark (…or at least an attempt at one)

Peter S. Shenkin


Attachment-Based Core Hopping

  • What it does

  • The architecture

  • The benchmark


Attachment-Based Core Hopping

  • What it does

    • Find a replacement for the central portion of a molecule

    • … keeping the peripheral parts in place

    • … while making “chemical sense”

    • Why would you do such a thing?

      • Increase efficacy

      • Improve “ADMET” properties

        • (Absorption, Distribution, Metabolism, Excretion, Toxicity)

      • Find new IP

    • Designed as a fast interactive desktop application

  • The architecture

  • The benchmark
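To make "keeping the peripheral parts in place" concrete, here is a minimal Python sketch under stated assumptions: the molecule is a hypothetical adjacency-list graph of atom indices, and the attachment points are taken to be the bonds that cross from the user-defined core to the periphery. That reading of "attachment-based" is an assumption for illustration; it is not a description of the application's actual code, and `attachment_bonds`, `periphery`, and `example_chain` are invented names.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: atom index -> indices of bonded atoms.
Graph = Dict[int, Set[int]]


def attachment_bonds(graph: Graph, core: Set[int]) -> List[Tuple[int, int]]:
    """Bonds (core_atom, peripheral_atom) that cross the core boundary.

    In this reading, a replacement core would have to reconnect to each
    peripheral atom listed here.
    """
    bonds = []
    for atom in sorted(core):
        for neighbor in sorted(graph[atom]):
            if neighbor not in core:
                bonds.append((atom, neighbor))
    return bonds


def periphery(graph: Graph, core: Set[int]) -> Set[int]:
    """Atoms that stay in place when the core is swapped out."""
    return set(graph) - core


# Toy five-atom chain 0-1-2-3-4 with the middle three atoms chosen as the core.
example_chain: Graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
example_core = {1, 2, 3}

if __name__ == "__main__":
    print(attachment_bonds(example_chain, example_core))  # [(1, 0), (3, 4)]
    print(periphery(example_chain, example_core))  # {0, 4}
```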


Define Core in a “Template” Molecule

  • Two ways shown, to emphasize user choice

1kv1 core

“1kv1-smaller” core


Result: 1err: olap= 0.95, relgscore= -1.37

  • Replaced C with N

  • Replaced S with C


Result: 1erb: olap= 0.80, relgscore= -0.96

  • Spiro core!


Result: 1kv2: olap= 0.29, relgscore= -0.37

  • Replaced O with N

  • Replaced N with C

  • Added an N

  • Huge shape difference!


Attachment-Based Core Hopping

  • What it does

  • The architecture

    • Workflow engine independent of application code

      • (… and APU technology)

    • Multithreaded using Qthreads; C++

    • Application stages are essentially plug-ins

  • The benchmark
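The plug-in idea can be sketched as follows: the engine knows only a small stage interface and a registry of stage names, so application code can change without touching the engine. This is a hypothetical Python sketch (`Stage`, `register`, `STAGE_REGISTRY`, and `run_workflow` are invented names); the real engine is multithreaded C++, whereas this version runs the stages one after another.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List


class Stage(ABC):
    """Hypothetical plug-in interface: the engine sees only this."""

    name = "stage"

    @abstractmethod
    def process(self, item: Any) -> List[Any]:
        """Consume one item; return zero or more items for the next stage."""


# Registry mapping stage names to factories; application code fills it in.
STAGE_REGISTRY: Dict[str, Callable[[], Stage]] = {}


def register(cls):
    """Class decorator that makes a Stage subclass available by name."""
    STAGE_REGISTRY[cls.name] = cls
    return cls


def run_workflow(stage_names: List[str], inputs: List[Any]) -> List[Any]:
    """Engine: chains the named stages without knowing what they compute."""
    stages = [STAGE_REGISTRY[name]() for name in stage_names]
    items = list(inputs)
    for stage in stages:
        next_items: List[Any] = []
        for item in items:
            next_items.extend(stage.process(item))
        items = next_items
    return items


# Two toy "application" stages, registered as plug-ins.
@register
class Expand(Stage):
    name = "expand"

    def process(self, item):
        return [item, item + 1]


@register
class KeepEven(Stage):
    name = "keep_even"

    def process(self, item):
        return [item] if item % 2 == 0 else []


if __name__ == "__main__":
    print(run_workflow(["expand", "keep_even"], [1, 4]))  # [2, 4]
```

Swapping in a different application means registering different `Stage` subclasses; `run_workflow` itself does not change.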


Architecture

Diagram: Stages 1 to 5 between input (I) and output (O), with queues and a scheduler. Legend: non-thread-safe thread, thread-safe thread, CUDA thread.
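A rough Python analogue of the diagram, assuming (this is an interpretation of the legend, not the actual design) that stages are joined by queues and that a thread-safe stage may run several worker threads while a non-thread-safe stage is confined to a single thread. CUDA threads and the scheduler's policy are not modeled; `run_pipeline`, `workers_per_safe_stage`, `SENTINEL`, and `example_stages` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple

SENTINEL = object()  # marks end of input on a queue

# (stage function, thread_safe flag); thread-safe stages get several workers.
StageSpec = Tuple[Callable[[Any], Iterable[Any]], bool]


def _worker(fn, q_in: queue.Queue, q_out: queue.Queue) -> None:
    while True:
        item = q_in.get()
        if item is SENTINEL:
            q_in.put(SENTINEL)  # let sibling workers on this queue see it too
            return
        for out in fn(item):
            q_out.put(out)


def run_pipeline(stages: List[StageSpec], inputs: Iterable[Any],
                 workers_per_safe_stage: int = 4) -> List[Any]:
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    pools = []
    for i, (fn, thread_safe) in enumerate(stages):
        # A non-thread-safe stage gets exactly one thread.
        n = workers_per_safe_stage if thread_safe else 1
        threads = [threading.Thread(target=_worker, args=(fn, queues[i], queues[i + 1]))
                   for _ in range(n)]
        for t in threads:
            t.start()
        pools.append(threads)

    for item in inputs:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    # Shut stages down in order: once every worker of stage i has exited,
    # nothing more can reach queue i + 1, so it is safe to close it.
    for i, threads in enumerate(pools):
        for t in threads:
            t.join()
        queues[i + 1].put(SENTINEL)

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    return results


example_stages: List[StageSpec] = [
    (lambda x: [x, x + 10], True),  # thread-safe: may run on several threads
    (lambda x: [x * x] if x % 2 == 0 else [], False),  # non-thread-safe: one thread
]

if __name__ == "__main__":
    # Output order can vary with multiple workers, so sort before printing.
    print(sorted(run_pipeline(example_stages, range(4))))  # [0, 4, 100, 144]
```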


Attachment-Based Core Hopping

  • What it does

  • The architecture

  • The benchmark

    • A truism that goes without saying

    • Results slowly unveiled

    • The dilemma & its resolution

    • Did we “do the right thing”?


The Truism

  • There are lies…

  • … damn lies

  • … statistics

  • … benchmarks

  • … salesmen’s claims

    … and the last two all too often interact


Results

Test system:

  • i7-930, 2.7 GHz processor

    • 4 physical cores, run hyperthreaded

  • 12 GB RAM

  • 8-lane PCIe motherboard

  • SSD drive


Results

At constant CPU utilization:

  • With two GPGPUs:

    • Speedup = 1.07 / 0.3275 = 3.3

  • With one GPGPU:

    • Speedup = 0.76 / 0.20 = 3.8
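The speedups above are plain ratios of the two quoted measurements; the snippet below reproduces the arithmetic and the rounding to one decimal place. The slide does not say what the numerator and denominator measure, so `speedup` is a hypothetical helper that simply takes them as given.

```python
def speedup(numerator: float, denominator: float) -> float:
    """Ratio of two measurements, as quoted on the slide (hypothetical helper)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


two_gpgpus = speedup(1.07, 0.3275)
one_gpgpu = speedup(0.76, 0.20)

if __name__ == "__main__":
    print(f"two GPGPUs: {two_gpgpus:.1f}x, one GPGPU: {one_gpgpu:.1f}x")
    # prints "two GPGPUs: 3.3x, one GPGPU: 3.8x"
```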


Closing Remarks

  • If we did our comparisons with a different number of threads, speedups would be different

  • If we worked on a machine with more or fewer processors, speedups would be different

  • If we used a 4-lane PCIe motherboard, or a different CPU, or a slower hard drive, speedups would be different

  • If our software architecture were different, speedups would be different

  • Conclusion from the above: the world is a complicated place

  • Do you agree that our approach is fair?
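One way to act on these remarks is to record every factor they list alongside each measured speedup, so that two numbers are compared only when their configurations match. The sketch below is a hypothetical Python record: the field names are invented, and apart from the i7-930, 4 cores, 8 PCIe lanes, and SSD taken from the test-system slide, the example values (`threads`, `software_version`) are placeholders, not reported figures.

```python
from dataclasses import asdict, dataclass, replace
from typing import List


@dataclass(frozen=True)
class BenchmarkConfig:
    """Factors the closing remarks say would change the speedup (field names hypothetical)."""

    threads: int
    cpu_model: str
    cpu_cores: int
    pcie_lanes: int
    storage: str
    software_version: str


def differing_factors(a: BenchmarkConfig, b: BenchmarkConfig) -> List[str]:
    """Fields on which two runs differ; an empty list means the runs are comparable."""
    fa, fb = asdict(a), asdict(b)
    return [name for name in fa if fa[name] != fb[name]]


# Values from the test-system slide where available; threads and
# software_version are placeholders.
test_system = BenchmarkConfig(
    threads=8,
    cpu_model="i7-930",
    cpu_cores=4,
    pcie_lanes=8,
    storage="SSD",
    software_version="hypothetical-build",
)

if __name__ == "__main__":
    slower_bus = replace(test_system, pcie_lanes=4)
    print(differing_factors(test_system, slower_bus))  # ['pcie_lanes']
```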

