Computational issues in nanotechnology and stochastic computing

Ashok Srinivasan

Department of Computer Science

Florida State University


Motivation

  • Research areas

    • Parallel algorithms

    • Scientific computing

    • Discrete algorithms

    • Applications


Motivation ... 2

  • Applications needing large amounts of computational power

    • Nanotechnology

    • Pharmaceuticals

    • Finance

    • Defense


Motivation ... 3

  • New computational paradigms

    • Grid computing

    • Massive parallelism

  • Need new algorithmic paradigms

    • Develop algorithms and software tools for the computational environment 5 to 10 years from now


Motivation ... 4

  • Algorithms

    • Scalable

    • Latency tolerant

  • Enabling technologies

    • Fault tolerant

    • Usable software tools


Outline

  • Applications

    • Nanotechnology

      • Background

      • Sequential computation

      • Parallelization

      • Research issues

  • Algorithms

    • Stochastic techniques

      • Scalable parallelization

      • Linear algebra

      • Applications


Applications

  • Nanotechnology

    • Background

    • Sequential computation

    • Parallelization

    • Research issues


Background

  • Uses of Carbon nanotubes

    • Materials

    • NEMS

    • Transistors

    • Displays

    • Etc

  • www.ipt.arc.nasa.gov


Sequential computation

  • Molecular dynamics, using Brenner’s potential

    • Short-range interactions

    • Neighbors can change dynamically during the course of the simulation

    • Computational scheme

      • Find force on each particle due to interactions with “close” neighbors

      • Update position and velocity of each atom

Conventional particle methods, with pair-wise interactions
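The computational scheme above (find close neighbors, compute forces, update positions and velocities) can be sketched as a time-stepping loop. This is a minimal illustration under loudly stated assumptions: atoms on a line with unit mass, a hypothetical harmonic pair term standing in for Brenner's many-body potential, and velocity Verlet standing in for the predictor/corrector integrator used in the profiled code.

```python
# Toy molecular dynamics loop: rebuild neighbors, compute forces, update atoms.
# Assumptions (hypothetical, for illustration only): 1D atoms, unit masses,
# harmonic pair potential U = 0.5*K*(d - R0)^2 within CUTOFF. This is NOT
# Brenner's potential, and velocity Verlet replaces predictor/corrector.
K, R0, CUTOFF = 1.0, 1.0, 1.5


def neighbor_pairs(x):
    """Crude O(N^2) neighbor search; neighbors may change between steps."""
    n = len(x)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(x[j] - x[i]) < CUTOFF]


def compute_forces(x, pairs):
    f = [0.0] * len(x)
    for i, j in pairs:
        r = x[j] - x[i]
        d = abs(r)
        # Force on j is -dU/dx_j = -K*(d - R0)*sign(r); atom i gets the opposite.
        fj = -K * (d - R0) * (1.0 if r > 0 else -1.0)
        f[j] += fj
        f[i] -= fj
    return f


def energy(x, v):
    pot = sum(0.5 * K * (abs(x[j] - x[i]) - R0) ** 2 for i, j in neighbor_pairs(x))
    return pot + sum(0.5 * vi * vi for vi in v)


def md_step(x, v, f, dt):
    """One velocity Verlet step (unit masses)."""
    v_half = [vi + 0.5 * dt * fi for vi, fi in zip(v, f)]
    x_new = [xi + dt * vh for xi, vh in zip(x, v_half)]
    f_new = compute_forces(x_new, neighbor_pairs(x_new))
    v_new = [vh + 0.5 * dt * fn for vh, fn in zip(v_half, f_new)]
    return x_new, v_new, f_new


x, v = [0.0, 1.2], [0.0, 0.0]      # a stretched two-atom "bond"
f = compute_forces(x, neighbor_pairs(x))
e0 = energy(x, v)
for _ in range(1000):
    x, v, f = md_step(x, v, f, dt=0.01)
print(abs(energy(x, v) - e0))      # energy drift stays small
```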


Force computations

  • Pair interactions

  • Bond angles

  • Dihedral

  • Multibody



Profile of execution time

  • 1: Force

  • 2: Neighbor list

  • 3: Predictor/corrector

  • 4: Thermostatting

  • 5: Miscellaneous



Neighbor search

  • Neighbor lists

    • Crude algorithm

      • Compare each pair, and determine if they are close enough

      • $O(N^2)$ for $N$ atoms

    • Cell based algorithm

      • Divide space into cells

      • Place atoms in their respective cells

      • Compare atoms only in neighboring cells

      • Problem

        • Many empty cells

        • Inefficient use of memory
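The cell-based search can be sketched in Python and checked against the crude pairwise comparison. This sketch assumes 3D coordinates in a non-periodic box and a cell side equal to the cutoff radius `rc`. Storing cells in a dictionary keyed by integer cell coordinates is one common way to avoid allocating the many empty cells noted above; it is an illustrative choice, not the data structure used in this work.

```python
import itertools
import math
import random
from collections import defaultdict


def brute_force_pairs(pts, rc):
    """Crude algorithm: compare every pair, O(N^2)."""
    n = len(pts)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(pts[i], pts[j]) < rc}


def cell_list_pairs(pts, rc):
    """Cell-based search: bin atoms into cells of side rc, then compare
    only atoms in the same or adjacent (3x3x3) cells."""
    cells = defaultdict(list)  # only non-empty cells are ever stored
    for idx, p in enumerate(pts):
        key = tuple(math.floor(c / rc) for c in p)
        cells[key].append(idx)
    pairs = set()
    for key, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=3):
            other = cells.get(tuple(k + o for k, o in zip(key, offset)))
            if not other:
                continue
            for i in members:
                for j in other:
                    if i < j and math.dist(pts[i], pts[j]) < rc:
                        pairs.add((i, j))
    return pairs


rng = random.Random(42)
atoms = [tuple(rng.uniform(0.0, 5.0) for _ in range(3)) for _ in range(300)]
assert cell_list_pairs(atoms, 1.0) == brute_force_pairs(atoms, 1.0)
```

Two atoms closer than `rc` can differ by at most one cell index along each axis, which is why scanning the 27 surrounding cells finds every pair.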


Computational geometry techniques

  • Orthogonal search data structures

    • K-d tree

      • Tree construction time: O(N log N)

      • Worst case search overhead: $O(N^{2/3})$

      • Memory: O(N)

    • Range tree

      • Tree construction time: $O(N \log^2 N)$

      • Worst case search overhead: $O(\log^2 N)$

      • Memory: $O(N \log^2 N)$


Desired properties of search techniques

  • Update should be efficient

    • But the number of atoms does not change

    • Position changes only slightly

    • The queries are known too

    • Use knowledge of the structure of the nanotube

    • Account for periodic boundary conditions

    • Parallelization


Parallelization

  • Shared memory

    • Common memory

    • Multiple threads divide the computation amongst themselves

  • Distributed memory

    • Distinct memory for each process

    • Processes communicate to exchange data

  • Distributed shared memory

    • Memory physically distributed, but logically shared

    • Data locality important


Shared memory parallelization

  • Do each of the following loops in parallel

    • For each atom

      • Update forces due to atom i

      • If neighboring atoms are owned by other threads, update an auxiliary array

    • For each thread

      • Collect force terms for atoms it owns

    • Srivastava et al., SC-97 and CSE 2001

      • Simulated $10^5$ to $10^7$ atoms

      • Up to 32 processors

      • Speedup around 16

      • Include long-range forces too
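The two parallel loops can be sketched with Python threads. Assumptions: a hypothetical 1D spring force between bonded atoms, threads owning contiguous blocks of atoms, and bonds stored once with i < j. Each thread writes only to the forces of atoms it owns and to its own auxiliary array, so no locks are needed; a barrier separates the two loops. Python's global interpreter lock means this shows the data layout, not a speedup.

```python
import threading


def owner(atom, n_atoms, n_threads):
    """Block ownership: thread t owns a contiguous range of atoms."""
    return atom * n_threads // n_atoms


def toy_force(x, i, j):
    """Hypothetical spring force on atom i due to atom j (not Brenner's)."""
    return x[j] - x[i]


def serial_forces(x, bonds):
    force = [0.0] * len(x)
    for i, j in bonds:
        f = toy_force(x, i, j)
        force[i] += f
        force[j] -= f
    return force


def threaded_forces(x, bonds, n_threads):
    n = len(x)
    force = [0.0] * n
    aux = [[0.0] * n for _ in range(n_threads)]  # one auxiliary array per thread
    barrier = threading.Barrier(n_threads)

    def worker(t):
        # Loop 1: each thread handles the bonds (i, j) whose atom i it owns.
        for i, j in bonds:
            if owner(i, n, n_threads) != t:
                continue
            f = toy_force(x, i, j)
            force[i] += f
            if owner(j, n, n_threads) == t:
                force[j] -= f
            else:
                aux[t][j] -= f  # atom j belongs to another thread
        barrier.wait()
        # Loop 2: each thread collects auxiliary terms for the atoms it owns.
        for a in range(n):
            if owner(a, n, n_threads) == t:
                force[a] += sum(aux[s][a] for s in range(n_threads))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return force


positions = [0.1 * i * i for i in range(20)]
bonds = [(i, j) for i in range(20) for j in range(i + 1, min(i + 3, 20))]
print(threaded_forces(positions, bonds, n_threads=4))
```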


Message passing parallelization

  • Decompose domain into cells

    • Each cell contains its atoms

  • Assign a set of adjacent cells to each processor

  • Each processor computes values for its cells

    • Communicates with neighbors when their data is needed

  • Caglar and Griebel, World Scientific, 1999

    • Simulated $10^8$ atoms on up to 512 processors

    • Linear speedup for 160,000 atoms on 64 processors


Load balancing

  • Atom based decomposition

    • For each atom, compute forces due to each bond, angle, and dihedral

    • Load not balanced


Load balancing ... 2

  • Bond based decomposition

    • For each bond, compute forces due to that bond, angles, and dihedrals

    • Finer grained

    • Load still not balanced!


Load balancing ... 3

  • Load imbalance was not caused by granularity

    • Symmetry is used to reduce calculations:

      • If i > j, don’t compute bond (i,j)

    • So threads get unequal load

  • Change condition to

    • If i+j is even, don’t compute bond (i,j) if i > j

    • If i+j is odd, don’t compute bond (i,j) if i < j

    • Does not work, due to regular structure of nanotube

  • Use a different condition to balance load
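The two skip conditions above can be written as small predicates and fed to a harness that counts how many bonds each thread computes. The final condition used to balance the load is not given on the slide, so only the two stated rules appear here; the block ownership of atoms and the ring-shaped test structure are assumptions for illustration. Both rules compute every bond exactly once, which is what the check at the end confirms.

```python
def skip_lexical(i, j):
    """Original rule: for bond (i, j), skip if i > j."""
    return i > j


def skip_parity(i, j):
    """Modified rule: if i + j is even, skip when i > j; if odd, skip when i < j."""
    return i > j if (i + j) % 2 == 0 else i < j


def per_thread_load(neighbors, n_threads, skip):
    """Count bonds computed by each thread under a skip rule.

    neighbors[i] lists the atoms bonded to atom i; thread t owns a
    contiguous block of atoms (an assumed ownership scheme).
    """
    n = len(neighbors)
    load = [0] * n_threads
    computed = set()
    for i in range(n):
        t = i * n_threads // n
        for j in neighbors[i]:
            if not skip(i, j):
                load[t] += 1
                computed.add(frozenset((i, j)))
    return load, computed


# Hypothetical structure: a periodic ring, each atom bonded to i±1 and i±2.
n_atoms = 12
ring = [[(i + d) % n_atoms for d in (-2, -1, 1, 2)] for i in range(n_atoms)]
for rule in (skip_lexical, skip_parity):
    load, _ = per_thread_load(ring, n_threads=3, skip=rule)
    print(rule.__name__, load)
```

Running the harness on the real atom-interaction lists would show whether a given rule leaves threads with unequal work.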


Load balancing ... 4

  • Load is much better balanced now

    • ... at least for this simple configuration


Locality

  • Locality important to reduce cache misses

  • Current scheme based on lexical ordering

  • Alternate: Decompose based on a breadth first search traversal of the atom-interaction graph
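The breadth-first alternative can be sketched as a renumbering step: traverse the atom-interaction graph breadth first, then hand out contiguous chunks of the new order to threads, so that atoms that interact tend to be stored and owned close together. The adjacency-list input and the contiguous chunking are assumptions for illustration.

```python
from collections import deque


def bfs_order(adj, start=0):
    """Breadth-first ordering of all vertices (restarts for disconnected parts)."""
    n = len(adj)
    seen = [False] * n
    order = []
    for root in [start] + list(range(n)):
        if seen[root]:
            continue
        seen[root] = True
        queue = deque([root])
        while queue:
            v = queue.popleft()
            order.append(v)
            for u in sorted(adj[v]):
                if not seen[u]:
                    seen[u] = True
                    queue.append(u)
    return order


def block_partition(order, n_threads):
    """Give each thread a contiguous chunk of the BFS order."""
    size = -(-len(order) // n_threads)  # ceiling division
    return [order[k:k + size] for k in range(0, len(order), size)]


# A path 3-0-4-1-2 whose labels are scrambled relative to the geometry.
adj = [[3, 4], [4, 2], [1], [0], [0, 1]]
order = bfs_order(adj, start=3)
print(order, block_partition(order, 2))  # [3, 0, 4, 1, 2] [[3, 0, 4], [1, 2]]
```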


Locality ... 2


Research issues

  • Neighbor search

    • More efficient data structures

    • Update should be efficient

      • But the number of atoms does not change

      • Position changes only slightly

    • The queries are known too

    • May be able to use knowledge of the structure of the nanotube

    • Account for periodic boundary conditions

    • Parallelization


Research issues ... 2

  • Load balancing and locality

    • Better graph based techniques

    • Geometric partitioning

    • Dynamic schemes

    • Use structure of the tube

      • Spectral partitioning

  • Multi-scale

    • Space

    • Time


Algorithms

  • Stochastic techniques

    • Scalable parallelization

    • Linear algebra

    • Applications


Scalable parallelization

  • Conventional Monte Carlo parallelization

    • Perform identical computations on each processor, but with a different random number sequence

    • Finally, combine the results

    • Latency tolerant and fault tolerant
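The conventional scheme can be sketched as follows. Each simulated "processor" runs the same estimator with its own random number stream and reports a (sum, count) pair, and the pairs are combined at the end; losing one processor's result still leaves an unbiased estimate, which is one sense in which the scheme is fault tolerant. Assumptions: a thread pool stands in for processors, the integrand (estimating the integral of x^2 over [0, 1]) is hypothetical, and seeding Python's `random.Random` with distinct integers is only a stand-in for a proper parallel random number generator such as SPRNG.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def local_estimate(rank, samples, base_seed=2024):
    """Identical computation on every processor, different random stream."""
    rng = random.Random(base_seed + rank)  # assumption: one seed per processor
    return sum(rng.random() ** 2 for _ in range(samples)), samples


def combine(results):
    """Pool the partial sums from whichever processors reported."""
    total = sum(t for t, _ in results)
    count = sum(c for _, c in results)
    return total / count


def parallel_mc(n_procs, samples_per_proc):
    with ThreadPoolExecutor(max_workers=n_procs) as pool:
        return list(pool.map(local_estimate, range(n_procs),
                             [samples_per_proc] * n_procs))


results = parallel_mc(4, 50_000)
print(combine(results))      # close to 1/3
print(combine(results[:3]))  # still an estimate of 1/3 if one processor is lost
```

The only communication is the final combine, which is why the approach tolerates latency.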


Linear algebra

  • Linear solvers

  • Matrix-vector multiplication

  • Smallest eigenvalue and eigenvector

  • Largest eigenvalue and eigenvector


Monte Carlo power method

  • Obtain the eigenvector for the largest eigenvalue as

    • $A^m h$, suitably normalized, as $m \to \infty$, for some starting vector $h$

    • Use a random walk of length $m$ to estimate $A^m h$

      • Initial probabilities given by $P_a = |h_a| / \sum_i |h_i|$

      • Transition probability from state $b$ to state $a$ given by $p_{ab} = |a_{ab}| / \sum_i |a_{ib}|$

      • Define random variables $W_i$ as $W_0 = h_{k_0} / P_{k_0}$ and $W_i = W_{i-1} \, a_{k_i k_{i-1}} / p_{k_i k_{i-1}}$, where $k_i$ is the $i$th state of the random walk

      • Then $E(W_i \delta_{a k_i}) = (A^i h)_a$, where $\delta$ is the Kronecker delta ($\delta_{ij} = 1$ if $i = j$, and 0 otherwise).
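The estimator above can be sketched directly: draw the start state from $P$, take $m$ steps with transition probabilities $p_{ab}$, carry the weight $W$ along, and add $W_m$ to the component of the final state. The matrix, starting vector, walk count, and seed below are illustrative assumptions; the exact product is computed alongside for comparison.

```python
import random


def mc_power_estimate(A, h, m, num_walks, seed=0):
    """Estimate A^m h with random walks of length m.

    Start state a is drawn with P_a = |h_a| / sum_i |h_i|; a step from
    state b to state a has probability p_ab = |a_ab| / sum_i |a_ib|.
    Assumes every column of A reached by a walk has a nonzero entry.
    """
    n = len(h)
    states = list(range(n))
    rng = random.Random(seed)
    h_abs = [abs(v) for v in h]
    h_total = sum(h_abs)
    out_w = [[abs(A[a][b]) for a in states] for b in states]  # out_w[b][a] = |a_ab|
    out_total = [sum(w) for w in out_w]
    estimate = [0.0] * n
    for _ in range(num_walks):
        k = rng.choices(states, weights=h_abs)[0]
        w = h[k] * h_total / h_abs[k]            # W_0 = h_k0 / P_k0
        for _ in range(m):
            nxt = rng.choices(states, weights=out_w[k])[0]
            p = out_w[k][nxt] / out_total[k]
            w *= A[nxt][k] / p                   # W_i = W_{i-1} a_{k_i k_{i-1}} / p_{k_i k_{i-1}}
            k = nxt
        estimate[k] += w                         # contributes W_m * delta_{a k_m}
    return [e / num_walks for e in estimate]


def exact_power(A, h, m):
    x = list(h)
    for _ in range(m):
        x = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
h = [1.0, 1.0, 1.0]
approx = mc_power_estimate(A, h, m=5, num_walks=50000, seed=1)
exact = exact_power(A, h, 5)   # [683.0, 1365.0, 683.0]
print(approx, exact)           # direction approaches the dominant eigenvector (1, 2, 1)
```

This $A$ has eigenvalues 1, 2 and 4, so $A^5 h$ is already dominated by the eigenvector $(1, 2, 1)$ of the largest eigenvalue.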


MC inverse iterations

  • Obtain the eigenvector for the smallest eigenvalue as

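    • $(A^{-1})^i h$, suitably normalized, as $i \to \infty$, for some $h$

    • Repeatedly solve $A x_{k+1} = x_k$, with $x_0 = h$

    • MC linear solve: write $A = I - C$. Then

      • $y_k = C y_{k-1} + h = \sum_{i=0}^{k} C^i y_0$, with $y_0 = h$

      • Estimate $y_k$ for large $k$ ($y_k \to A^{-1} h$ when the spectral radius of $C$ is less than 1), for example by using the matrix-vector product technique to estimate each $C^i y_0$.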
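The inverse iteration above can be sketched with the series $y_k = C y_{k-1} + h$ evaluated deterministically; a Monte Carlo version would replace each $C^i y_0$ with the random-walk matrix-vector estimate from the previous slide. The test matrix, step count, and number of series terms are illustrative assumptions.

```python
import math


def neumann_solve(A, b, terms):
    """Approximate A^{-1} b via y_k = C y_{k-1} + b with C = I - A.

    After `terms` steps y equals sum_{i=0}^{terms} C^i b; this converges to
    A^{-1} b when the spectral radius of C is below 1.
    """
    n = len(b)
    C = [[(1.0 if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]
    y = list(b)
    for _ in range(terms):
        y = [sum(C[i][j] * y[j] for j in range(n)) + b[i] for i in range(n)]
    return y


def inverse_iteration(A, h, steps, terms):
    """x_{k+1} approximately solves A x_{k+1} = x_k, normalized every step."""
    x = list(h)
    for _ in range(steps):
        x = neumann_solve(A, x, terms)
        norm = math.sqrt(sum(v * v for v in x))
        x = [v / norm for v in x]
    return x


A = [[1.0, 0.2], [0.2, 0.5]]        # C = I - A has spectral radius about 0.57
v = inverse_iteration(A, [1.0, 1.0], steps=50, terms=200)
Av = [sum(A[i][j] * v[j] for j in range(2)) for i in range(2)]
lam = sum(a * b for a, b in zip(Av, v))   # Rayleigh quotient (v has unit norm)
print(lam)                           # about 0.4298, the smallest eigenvalue of A
```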


Applications

  • Graph partitioning

  • Seriation


Graph partitioning

  • Applications in

    • Parallel computing

    • VLSI

    • Databases

    • Clustering

    • Linear programming

    • Matrix reordering

Partition the vertices into components of equal size such that the number of edges between vertices in different components is minimized.

Heuristic: Compute the Fiedler vector of the graph Laplacian L (the eigenvector of its second-smallest eigenvalue). Partition vertices such that all vertices with Fiedler component smaller than the median are in one component, and the rest in another. Recursively apply this algorithm.
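The heuristic can be sketched as recursive spectral bisection. Assumptions: the graph is given as adjacency lists, and the Fiedler vector is approximated by plain power iteration on $cI - L$ (with $c$ twice the maximum degree) while projecting out the constant vector. This is a simple stand-in, not the Monte Carlo or deflation-based solvers discussed in these slides, and its fixed starting vector is not guaranteed to work for every graph.

```python
import math


def laplacian_matvec(adj, x):
    """y = L x for the graph Laplacian L = degree matrix - adjacency matrix."""
    return [len(adj[v]) * x[v] - sum(x[u] for u in adj[v]) for v in range(len(adj))]


def fiedler_vector(adj, iters=500):
    """Approximate Fiedler vector by power iteration on c*I - L with the
    constant vector projected out (c = twice the maximum degree)."""
    n = len(adj)
    c = 2 * max((len(nb) for nb in adj), default=0) or 1
    x = [i - (n - 1) / 2 for i in range(n)]   # deterministic, mean-zero start
    for _ in range(iters):
        lx = laplacian_matvec(adj, x)
        y = [c * xi - li for xi, li in zip(x, lx)]
        mean = sum(y) / n
        y = [v - mean for v in y]              # stay orthogonal to (1, ..., 1)
        norm = math.sqrt(sum(v * v for v in y)) or 1.0
        x = [v / norm for v in y]
    return x


def spectral_bisect(adj, verts, levels):
    """Split verts at the median Fiedler component, then recurse `levels` times."""
    if levels == 0 or len(verts) < 2:
        return [verts]
    index = {v: i for i, v in enumerate(verts)}
    local = [[index[u] for u in adj[v] if u in index] for v in verts]  # induced subgraph
    f = fiedler_vector(local)
    order = sorted(range(len(verts)), key=lambda i: f[i])
    half = len(verts) // 2
    left = sorted(verts[i] for i in order[:half])
    right = sorted(verts[i] for i in order[half:])
    return spectral_bisect(adj, left, levels - 1) + spectral_bisect(adj, right, levels - 1)


def edge_cut(adj, parts):
    part_of = {v: p for p, vs in enumerate(parts) for v in vs}
    return sum(1 for v in range(len(adj)) for u in adj[v]
               if u > v and part_of[u] != part_of[v])


# Hypothetical test graph: two 4-cliques joined by the single edge (3, 4).
edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]
edges += [(i + 4, j + 4) for i, j in edges] + [(3, 4)]
adj = [[] for _ in range(8)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
parts = spectral_bisect(adj, list(range(8)), levels=1)
print(parts, edge_cut(adj, parts))
```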


Seriation

  • Applications in

    • DNA sequencing

    • Matrix envelope reduction

    • Archaeological dating

Given a similarity function $f$, find a permutation $\pi$ such that $\pi(i) < \pi(j) < \pi(k)$ implies $f(i,j) > f(i,k)$

Heuristic: Compute the Fiedler vector of L. Order vertices by the values of the corresponding components of the Fiedler vector.


Acceleration techniques for Laplacian of a graph

  • Deflation: define H as:

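    • $h_{ij} = -1$ if $j = 1$, $h_{ij} = 1$ if $j = i > 1$, and 0 otherwise

    • $HLH^{-1}$ yields a deflated matrix $B$: the first column of $HLH^{-1}$ is zero, and $B$ is its trailing $(n-1) \times (n-1)$ block.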

    • B is at least half as sparse as L, and can be computed in time proportional to the number of non-zero elements of B.

    • The Fiedler vector is easily computed from the eigenvector of the smallest eigenvalue of B.

  • Shift and use matrix-vector multiplication

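    • If $D = 2 \sum_i d_i$, where $d_i$ is the degree of vertex $i$, compute the largest eigenvalue of $DI - B$; since $D$ exceeds every eigenvalue of $B$, this corresponds to the smallest eigenvalue of $B$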
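The deflation and shift can be checked numerically on a small graph. This dense sketch assumes a connected graph given as an edge list and reads $d_i$ as vertex degrees. It forms $HLH^{-1}$ explicitly (this $H$ happens to be its own inverse) rather than in time proportional to the non-zeros of $B$, runs plain power iteration on $DI - B$, and recovers the Fiedler vector of $L$ from the eigenvector of $B$. With $D = 2 \sum_i d_i$ the spectral gap of $DI - B$ is small, so many iterations are used.

```python
import math
import random


def laplacian(n, edges):
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def deflation_matrix(n):
    """h_ij = -1 if j = 1, 1 if j = i > 1, else 0 (column 0 here is j = 1)."""
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][0] = -1.0
        if i > 0:
            H[i][i] = 1.0
    return H


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def fiedler_by_deflation(n, edges, iters=3000, seed=0):
    L = laplacian(n, edges)
    H = deflation_matrix(n)
    M = matmul(matmul(H, L), H)        # H is its own inverse, so M = H L H^{-1}
    B = [row[1:] for row in M[1:]]     # first column of M is zero; B is the rest
    r = M[0][1:]
    D = 2.0 * sum(L[i][i] for i in range(n))   # D = 2 * sum of degrees
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    mu = 0.0
    for _ in range(iters):             # power iteration on D*I - B
        w = [D * y[i] - sum(B[i][j] * y[j] for j in range(n - 1)) for i in range(n - 1)]
        mu = math.sqrt(sum(v * v for v in w))
        y = [v / mu for v in w]
    lam = D - mu                       # smallest eigenvalue of B = lambda_2 of L
    z = [sum(a * b for a, b in zip(r, y)) / lam] + y   # eigenvector of M
    x = [sum(H[i][j] * z[j] for j in range(n)) for i in range(n)]  # back to L
    return lam, x, L


lam, fiedler, L = fiedler_by_deflation(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(lam)   # about 0.382 (= 2 - 2cos(pi/5)) for this 5-vertex path
```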



Edge cut and time using deflated matrix, relative to exact Fiedler vector, for inverse iterations. Solid line – test.graph, dash-dotted line – hammond.graph.



Comparison of current stationary process (solid line), with Jacobi (dash-dotted) and Gauss-Seidel (dashed), for test.graph.


Research issues ... 3

  • We have developed non-Jacobi based techniques, with theoretically better properties

  • Other stationary and non-stationary methods

  • Use the structure of the application, for example the nanotube, to accelerate convergence

