Data-driven Query Processing for Immersive Computational Turbulence

Kalin Kanov

Department of Computer Science

Johns Hopkins University
The Big Picture
  • Scientific disciplines have developed a computational branch
    • Models without closed form solutions solved numerically
    • This has led to an explosion of data
  • Simulation and analysis workloads are data-intensive
    • Producing/scanning large amounts of data
  • Management of these data represents a significant challenge
    • Storage/archiving
    • Query processing
    • Visualization
Remote Immersive Analysis
  • Formerly, analysis performed during the computation
    • No data stored for subsequent examination
  • Data-intensive computing breakthroughs have allowed for new interaction with scientific numerical simulations
  • Turbulence Database Cluster
    • Stores entire space-time evolution of the simulation
    • Provides public access to world-class simulations
    • Implements “immersive turbulence*” approach
  • Introduces new challenges

*E. Perlman, R. Burns, Y. Li, and C. Meneveau. Data exploration of turbulence simulations using a database cluster. In Supercomputing, 2007.

Goals
  • Develop data-driven query processing techniques
    • Reduce I/O and computation costs
    • Reduce or eliminate storage overhead
    • Exploit domain knowledge and structure
  • Provide user interfaces that are efficient and flexible
  • Streamline the process of data ingest
Processing a Batch Query

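[Figure: a 4×4 grid of data atoms numbered 0 to 15 (rows from top to bottom: 10 11 14 15, 8 9 12 13, 2 3 6 7, 0 1 4 5) with three overlaid queries, shown above the same atoms in on-disk order 0 to 15. Evaluated one query at a time, the atoms read are q1: 0 1 2 3 4 6 8 9 12; q2: 9 11 12 14; q3: 4 5 6 7.]

  • Redundant I/O
  • Multiple disk seeks
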
I/O Streaming Evaluation Method
  • Linear data requirements of the computation allow for:
    • Incremental evaluation
    • Streaming over the data
    • Concurrent evaluation of batch queries
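
The idea of streaming over the data for a whole batch can be sketched in a few lines of Python. This is a minimal illustration, not the Turbulence Database Cluster's implementation: the names `per_query_reads` and `streaming_schedule` are hypothetical, and the three queries use the atom sets from the batch-query example in these slides.

```python
from collections import defaultdict

# Atom IDs touched by each query in the batch (from the slide example).
BATCH = {
    "q1": [0, 1, 2, 3, 4, 6, 8, 9, 12],
    "q2": [9, 11, 12, 14],
    "q3": [4, 5, 6, 7],
}


def per_query_reads(batch):
    """Atoms read when the queries are evaluated one after another."""
    return [atom for atoms in batch.values() for atom in atoms]


def streaming_schedule(batch):
    """Single pass in atom order: pair each atom with every query that needs it."""
    needed_by = defaultdict(list)
    for query, atoms in batch.items():
        for atom in atoms:
            needed_by[atom].append(query)
    # Sorting by atom ID stands in for reading in on-disk order (an assumption).
    return [(atom, needed_by[atom]) for atom in sorted(needed_by)]


naive = per_query_reads(BATCH)
schedule = streaming_schedule(BATCH)
print(len(naive), "atom reads one query at a time")
print(len(schedule), "atom reads in a single streamed pass")
for atom, queries in schedule:
    print(atom, queries)
```

For this batch, evaluating the queries separately reads 17 atoms, while the streamed pass reads each of the 13 distinct atoms once and hands shared atoms (4, 6, 9, 12) to both queries that need them.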
Processing a Batch Query

[Figure: the same 4×4 atom grid and three queries. With I/O streaming, the 13 atoms needed by the batch are read once each in on-disk order (0 1 2 3 4 5 6 7 8 9 11 12 14), and each atom is applied to every query that needs it: atoms 4 and 6 serve q1 and q3, atoms 9 and 12 serve q1 and q2.]

  • Sequential I/O
  • Single pass


Lagrange Polynomial Interpolation

[Figure: interpolation diagram with parts labeled "Lagrange coefficients" and "Data"]
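
As a rough illustration of how Lagrange coefficients combine with the data, the sketch below interpolates in one dimension over four nodes. The dimension, the number of nodes, and the names `lagrange_coefficients` and `interpolate` are assumptions made for the example; the slides do not specify them. Because the result is a plain weighted sum, it can also be accumulated piece by piece as data arrive, which is what makes incremental evaluation possible.

```python
def lagrange_coefficients(nodes, x):
    """Weights l_i(x) = prod over j != i of (x - x_j) / (x_i - x_j)."""
    coeffs = []
    for i, xi in enumerate(nodes):
        w = 1.0
        for j, xj in enumerate(nodes):
            if j != i:
                w *= (x - xj) / (xi - xj)
        coeffs.append(w)
    return coeffs


def interpolate(nodes, values, x):
    """Interpolated value: sum of coefficient * data over all nodes."""
    coeffs = lagrange_coefficients(nodes, x)
    return sum(c * v for c, v in zip(coeffs, values))


nodes = [0.0, 1.0, 2.0, 3.0]
values = [n ** 3 for n in nodes]  # samples of f(x) = x^3
x = 1.5

coeffs = lagrange_coefficients(nodes, x)
full = interpolate(nodes, values, x)

# Incremental evaluation: the same sum built up as the data arrive in two chunks.
partial = 0.0
for chunk in (slice(0, 2), slice(2, 4)):
    partial += sum(c * v for c, v in zip(coeffs[chunk], values[chunk]))

print(full, partial)
```

Four nodes reproduce a cubic exactly, so both `full` and `partial` equal 1.5 cubed, and the chunked sum matches the all-at-once sum regardless of how the data are split.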

128 Workload
  • I/O Streaming
    • Each atom is read only once
    • Effective cache usage
  • Join/Order By executes entire batch as a join
  • Sorting leads to more sequential access
  • Over an order of magnitude improvement
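
Why sorting makes access more sequential can be seen with a toy cost model. The sketch below assumes that an atom's ID is a proxy for its position on disk and measures the total jump between consecutively accessed atoms; the function name `seek_distance` and the arrival order are hypothetical, and the numbers say nothing about the measured results reported on this slide.

```python
def seek_distance(order):
    """Total distance jumped between consecutively accessed atom IDs (toy model)."""
    return sum(abs(b - a) for a, b in zip(order, order[1:]))


# Hypothetical order in which a batch happens to request its atoms.
arrival_order = [12, 0, 9, 4, 14, 1, 6, 11, 2, 8, 3, 5, 7]
sorted_order = sorted(arrival_order)

print("unsorted:", seek_distance(arrival_order))
print("sorted:  ", seek_distance(sorted_order))
```

After sorting, the total jump is just the span from the first to the last atom, which is the smallest value this model can produce for a given set of atoms.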

I/O Streaming alleviates I/O bottleneck

  • Computation emerges as the more costly operation
Particle Tracking

[Figure: the Web Server/Mediator distributes points to DB Node 1 through DB Node N based on xp(tm). On each node, a Computational Module retrieves data from the Storage Layer and produces x*p(tm).]


Particle Tracking

[Figure: the Web Server/Mediator redistributes points to DB Node 1 through DB Node N based on x*p(tm). On each node, a Computational Module retrieves data from the Storage Layer and produces xp(tm+1).]
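
The two diagrams suggest a two-stage time step: points are routed by xp(tm) to obtain an intermediate position x*p(tm), then routed again by x*p(tm) to obtain xp(tm+1). The sketch below illustrates that flow under several assumptions that are not stated in the slides: a second-order predictor-corrector (Heun) update, one-dimensional positions, an analytic `velocity` function standing in for values interpolated from stored data, and an even split of a periodic domain across nodes. All names (`velocity`, `node_for`, `distribute`, `track_step`) are hypothetical.

```python
from collections import defaultdict

NUM_NODES = 2   # hypothetical number of DB nodes
DOMAIN = 4.0    # hypothetical length of a 1-D periodic domain


def velocity(x, t):
    """Stand-in for the velocity a DB node would interpolate from stored data."""
    return x  # hypothetical field u(x, t) = x


def node_for(x):
    """Hypothetical partitioning: the domain is split evenly across the nodes."""
    return int((x % DOMAIN) / (DOMAIN / NUM_NODES))


def distribute(positions):
    """Mediator step: group particle indices by the node that owns each position."""
    groups = defaultdict(list)
    for i, x in enumerate(positions):
        groups[node_for(x)].append(i)
    return groups


def track_step(xp, t, dt):
    """Advance all particles from t to t + dt in two routed stages."""
    # Stage 1: route by xp(tm); each node predicts x*p(tm) for its particles.
    x_star = list(xp)
    for ids in distribute(xp).values():
        for i in ids:
            x_star[i] = xp[i] + dt * velocity(xp[i], t)

    # Stage 2: route by x*p(tm); each node corrects to xp(tm+1).
    # The velocity at xp(tm) is recomputed here for brevity; a real system
    # would more likely carry it along from stage 1.
    x_next = list(xp)
    for ids in distribute(x_star).values():
        for i in ids:
            slope = 0.5 * (velocity(xp[i], t) + velocity(x_star[i], t + dt))
            x_next[i] = xp[i] + dt * slope
    return x_next


xp_next = track_step([1.0, 3.0], t=0.0, dt=0.1)
print(xp_next)
```

Each stage needs its own round trip through the mediator because a particle's intermediate position may belong to a different node than its starting position.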

Summary and Future Work
  • Extend I/O streaming technique to different decomposable kernel computations:
    • Differentiation
    • Spatial interpolation
    • Temporal interpolation
    • Filtering and coarse-graining
  • Provide a flexible user interface
    • Allow for different filter functions
    • Allow for new kernel computations
  • Improve particle tracking routine
    • Reduce communication between mediator and DB nodes
    • Asynchronous processing
    • Caching and pre-fetching
Questions

Images courtesy of Kai Buerger (buerger@tum.de)