
IFESTOS: A KB System for POEMS

IFESTOS Architecture and Goals for POEMS Knowledge Base (KB). Benchmark features, machine architectures, numerical solutions, application performance metrics, system performance profiles, and status of the IFESTOS project.


Presentation Transcript


  1. IFESTOS: A KB System for POEMS Elias Houstis, John Rice, Ann Catlin, Naren Ramakrishnan, and V. Verykios Purdue University Department of Computer Sciences August 98

  2. IFESTOS Architecture diagram

  3. IFESTOS Goals for POEMS KB • Predict the performance of a conceptual design by comparing it with the performance of existing designs/implementations, given user-defined computational goals and design features • Rank the various designs/implementations based on their performance data from well-designed benchmarks with specific features, with respect to ranges of values of selected performance indicators • Estimate operational parameters of a new design based on the performance data of “similar” designs

  4. PDE Application Benchmark Features (population, solvers, and parameters) for POEMS KB Generation • Problem Population • A general elliptic PDE with a non-rectangular domain leading to a large non-symmetric FD algebraic system • Two self-adjoint PDEs for which FEM is applicable, including a 3-D PDE problem • An elliptic PDE leading to a large symmetric FD system • SWEEP3D • Solvers • Finite Difference and Finite Element discretizers • At least 5 different domain decomposition algorithms that give significantly different partitionings of grid/mesh data • Four grid/mesh sizes (small, moderate, large, very large) • ITPACK: Jacobi type, SOR type, and CG type • AZTEC routines

  5. Machine Architectures • Purdue’s SP2 (16 processors) • Use 2, 4, 5, 6, 8, 12, 13, and 16 processor configurations • National SP2 (Large Configuration) • LAN workstations • Simulator • Analytical Models

  6. Numerical Solution Data Collected for Each PDE Application Run • Boundary points found in the domain • Boundary pieces found in the domain • Grid size • Solution Error in Max, L1, L2 norms • Total elapsed time for the post-processing module
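The solution error norms listed above can be sketched as follows. This is a minimal illustration, assuming the computed and exact solutions are available as equal-length sequences of grid values; the function name `error_norms` is hypothetical, and the norms are plain unscaled vector norms (the slides do not say whether PELLPACK scales them by grid spacing).

```python
import math

def error_norms(computed, exact):
    """Return the (max, L1, L2) norms of the pointwise solution error.

    Assumption: unscaled discrete norms over the grid values.
    """
    errors = [abs(c - e) for c, e in zip(computed, exact)]
    max_norm = max(errors)                                  # largest pointwise error
    l1_norm = sum(errors)                                   # sum of absolute errors
    l2_norm = math.sqrt(sum(err * err for err in errors))   # Euclidean norm of the error
    return max_norm, l1_norm, l2_norm
```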

  7. Application Performance Metrics Generated by PELLPACK System (per-processor and per-run) • Domain processor module time • Discretization module time • Indexing time • Linear algebra solution module time • Communication time • Total elapsed time

  8. SP2 System Performance Metrics • cpu_user_utilization: CPU percentage allocated to the user (mean, std) • cpu_kernel_utilization: CPU percentage allocated to the kernel (mean, std) • cpu_wait: CPU percentage spent waiting (mean, std) • cpu_idle: CPU percentage spent idling (mean, std) • cswitch: the number of context or task switches • syscalls: the number of calls made into kernel services • pagefaults: the number of page faults • total_xfers: the number of DMA transfers to all disks • blocks_read: the number of blocks of data read from all disks • blocks_written: the number of blocks of data written to all disks • ip_packets_rcvd: the number of IP protocol packets received • ip_packets_sent: the number of IP protocol packets sent • sending_time: time spent sending data • receiving_time: time spent receiving data • broadcasting_time: time spent broadcasting data • barrier_time: time spent in barrier primitives • allreducing_time: time spent in MPI all-reduce primitives

  9. CPU and Communication based performance profiles of Application/Architecture pairs • Tcomp(p): global computation time vs. no. of processors • Tcomm(p): global communication time vs. no. of processors • T(p): global execution time vs. no. of processors • S(p): speed up vs. no. of processors (S(p) = T(1)/T(p)) • E(p): efficiency vs. no. of processors (= S(p)/p) • η(p): efficacy vs. no. of processors (= S(p)^2/p) • ηbusyproc : no. of busy processors vs. execution time • ηcommproc : no. of communicating processors vs. execution time • ηcompproc : no. of computing processors vs. execution time
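As a rough illustration of the derived profiles, the sketch below computes speed up, efficiency, and efficacy from measured execution times. It uses S(p) = T(1)/T(p) and η(p) = S(p)^2/p as given on the slide, and the standard definition of efficiency, E(p) = S(p)/p. The function name and the dictionary layout are assumptions for illustration, and a single-processor baseline T(1) is assumed to be available.

```python
def performance_profiles(times):
    """Derive S(p), E(p), and eta(p) from execution times.

    times: dict mapping processor count p -> T(p) in seconds.
    Assumption: times[1] (the single-processor baseline) is present.
    """
    t1 = times[1]
    profiles = {}
    for p in sorted(times):
        speedup = t1 / times[p]            # S(p) = T(1) / T(p)
        efficiency = speedup / p           # E(p) = S(p) / p
        efficacy = speedup * speedup / p   # eta(p) = S(p)^2 / p
        profiles[p] = {"S": speedup, "E": efficiency, "eta": efficacy}
    return profiles
```

For example, with T(1) = 100 s and T(4) = 40 s, the speed up is 2.5, the efficiency is 0.625, and the efficacy is 1.5625.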

  10. Memory and I/O based Performance Profiles • Pagefaults vs. no. of processors • Total_xfers vs. no. of processors • Blocks_read vs. no. of processors • Blocks_written vs. no. of processors

  11. Communication Overhead Profiles • avg total no. of ip_packets_rcvd vs. no. of processors • avg total no. of ip_packets_sent vs. no. of processors • avg sending_time vs. no. of processors • avg receiving_time vs. no. of processors • avg broadcasting_time vs. no. of processors • avg barrier_time vs. no. of processors • avg allreducing_time vs. no. of processors
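A profile of the form "avg metric vs. no. of processors" could be assembled along these lines. This is only a sketch: the record layout (a list of dictionaries with a `processors` key plus one key per metric) and the function name `average_profile` are assumptions, and the average here is taken over all records that share a processor count.

```python
from collections import defaultdict
from statistics import mean

def average_profile(runs, metric):
    """Average one metric over all records with the same processor count.

    runs: iterable of dicts with a "processors" key and the metric key
    (hypothetical layout, not the IFESTOS schema).
    Returns a dict {processor count: average value}, ordered by processor count.
    """
    grouped = defaultdict(list)
    for run in runs:
        grouped[run["processors"]].append(run[metric])
    return {p: mean(values) for p, values in sorted(grouped.items())}
```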

  12. Status of the IFESTOS project • IFESTOS Kernel (60%) • KB1: Performance of SP2 architecture on PDE applications (20%) • KB2: Performance of parallel linear solvers on large PDE discretization systems on SP2 (10%)

  13. IFESTOS Goals for Linear Algebra Solvers KB • Predict the performance of linear solvers on new problems with features similar to those in the linear algebra benchmark population (i.e., someone gives the size and characteristics of the system and then wants to find out the best solver to use) • Rank the various linear solvers over specific benchmarks with some (or all) of the features present (e.g., symmetric systems, FD systems, FEM systems, non-symmetric systems). The ranking is made for all machine configurations. • Estimate the iteration parameters of linear solvers for a given system based on the performance data of “similar” systems
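The ranking goal can be pictured with a small sketch. It assumes benchmark records stored as dictionaries with `solver`, `features` (a set of strings), and `time` keys, and it ranks by mean elapsed time. The layout, the function name, and the choice of mean time as the ranking criterion are all assumptions, not the IFESTOS method. Ranking per machine configuration would mean calling it once per configuration.

```python
from collections import defaultdict
from statistics import mean

def rank_solvers(records, required_features):
    """Rank solvers by mean time over benchmarks that have all required features.

    records: iterable of dicts with "solver", "features" (set), and "time" keys
    (hypothetical layout). Returns [(solver, mean_time), ...], fastest first.
    """
    times = defaultdict(list)
    for record in records:
        # Keep only benchmarks exhibiting every requested feature.
        if required_features <= record["features"]:
            times[record["solver"]].append(record["time"])
    ranking = [(solver, mean(values)) for solver, values in times.items()]
    ranking.sort(key=lambda pair: pair[1])
    return ranking
```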

  14. Linear Algebra Benchmark Features (population, solvers, and parameters) for Linear Solvers • Problem Population • 10 large, non-symmetric systems of 2D FD origin • 10 large, symmetric systems of 2D and 3D FEM origin • 10 large, symmetric systems of 2D and 3D FD origin • Solvers • 5 domain decomposition algorithms with significantly different partitionings • All applicable ITPACK routines (some apply only to special types of systems) • All applicable AZTEC routines • How about LAPACK? • Machines • SP2 with 2, 4, 8, 16 processors • SGU with 2, 4, 8, 16, 32 processors • NOW with 2, 4, 8, 16, 32 processors

  15. IFESTOS Linear Algebra User Specific Functionality • Select the best algorithm for the user’s linear system. • The KB expects a list of features, an estimate of system size, and desired bounds for memory/execution time • The KB returns the name of the algorithm, parameter values, and estimates for the needed resources, including the machine configuration, and an exemplar to explain its decision • Verify some assumptions or answer domain specific questions • Are iterative solvers better than direct solvers for large systems? • Is CG an efficient method for non-symmetric systems? • What is the best method for FD symmetric systems? • What is the best method for FEM systems?
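One way to picture the selection query is a nearest-exemplar lookup over stored benchmark records. The sketch below is an assumption-laden illustration, not the IFESTOS algorithm. The record keys (`features`, `size`, `time`, `memory`), the similarity score (Jaccard overlap of features minus a small penalty on the order-of-magnitude gap in system size), and the function name are all hypothetical.

```python
import math

def select_solver(records, features, size, max_time, max_memory):
    """Return the most similar benchmark record within the bounds, or None.

    records: iterable of dicts with "features" (set), "size", "time",
    and "memory" keys (hypothetical layout). The returned record serves
    as the exemplar that explains the recommendation.
    """
    def similarity(record):
        shared = len(features & record["features"])
        union = len(features | record["features"]) or 1
        # Penalize differences in the order of magnitude of the system size.
        size_gap = abs(math.log10(size) - math.log10(record["size"]))
        return shared / union - 0.1 * size_gap

    # Discard exemplars that violate the user's time or memory bounds.
    feasible = [r for r in records
                if r["time"] <= max_time and r["memory"] <= max_memory]
    if not feasible:
        return None
    return max(feasible, key=similarity)
```

Returning the whole record rather than just a solver name keeps the exemplar's measured resources and machine configuration available to justify the choice.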
