
Last Level Cache (LLC) Performance of Data Mining Workloads on a CMP: A Case Study of Parallel Bioinformatics Workloads



Presentation Transcript


  1. Last Level Cache (LLC) Performance of Data Mining Workloads on a CMP: A Case Study of Parallel Bioinformatics Workloads
     Aamer Jaleel, Intel, VSSAD; University of MD (ajaleel@eng.umd.edu, Aamer.Jaleel@intel.com)
     Matthew Mattina, Tilera Corporation (mmattina@tilera.com)
     Bruce Jacob, University of MD, ECE Department (blj@eng.umd.edu)

  2. Paper Motivation
     • Growth of CMPs and design issues
     • Growth of data and emergence of new workloads: Recognition, Mining, and Synthesis (RMS) workloads
     [Figure: CMP design questions (CPU, cache); the world's data increasing: database, medicine, finance, spatial, stock]

  3. Paper Contributions
     • First to characterize memory behavior of parallel data-mining workloads on a CMP
       • Bioinformatics workloads
     • Sharing analysis:
       • Varying amount of data shared between threads
       • Shared data frequently accessed
       • Degree of sharing is a function of cache size
     • Cache performance studies:
       • Private vs. shared cache studies
       • Greater sharing leads to better shared cache performance

  4. Bioinformatics
     • Using software to understand and analyze biological data
     • Why bioinformatics?
       • Sophisticated algorithms and huge data sets
     • Use mathematical and statistical methods to solve biological problems
       • Sequence analysis
       • Protein structure prediction
       • Gene classification
       • And many, many more…
     Src: http://www.imb-jena.de/~rake/Bioinformatics_WEB

  5. Parallel Bioinformatics Workloads
     • Structure learning:
       • GeneNet – Hill climbing, Bayesian network learning
       • SNP – Hill climbing, Bayesian network learning
       • SEMPHY – Structural Expectation Maximization algorithm
     • Optimization:
       • PLSA – Dynamic programming
     • Recognition:
       • SVM-RFE – Feature selection
     • OpenMP workloads developed by Intel Corporation
       • Donated to Northwestern University, NU-MineBench Suite
       • http://cucis.ece.northwestern.edu/projects/DMS/MineBench.html
       • Also made available at: http://www.ece.umd.edu/biobench/

  6. Experimental Methodology - Pin
     • Pin – x86 dynamic binary instrumentation tool
       • Developed at VSSAD, Intel
       • ATOM-like tool for Intel XScale, IA-32, and IPF Linux binaries
       • Provides infrastructure for writing program analysis tools (pin tools)
       • Supports instrumentation of multi-threaded workloads
       • Hosted at: http://rogue.colorado.edu/Pin

  7. The simCMPcache Pin tool
     • Instruments all memory references of an application
     • Gathers numerous cache performance statistics
     • Captures time-varying behavior of applications
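
As a rough illustration of the kind of bookkeeping a cache-modeling tool performs on a stream of memory references, here is a minimal sketch of a set-associative LRU cache that counts hits and misses. It is not the simCMPcache implementation and does not use the Pin API; the class name `LRUCache`, its parameters, and the default line size and associativity are all assumptions made for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy set-associative cache with LRU replacement (hypothetical helper)."""

    def __init__(self, size_bytes, line_bytes=64, ways=4):
        self.line_bytes = line_bytes
        self.ways = ways
        # Number of sets follows from capacity, line size, and associativity.
        self.num_sets = size_bytes // (line_bytes * ways)
        # One ordered dict per set: oldest entry first, newest entry last.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one memory reference; return True on a hit."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[line] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0
```

Feeding every instrumented address to `access` and sampling `miss_rate()` at fixed instruction intervals would give a simple view of time-varying behavior.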

  8. Experimental Methodology

  9. Measuring Data Sharing
     • Shared cache line:
       • More than one core accesses the same cache line during its lifetime in the cache
     • Shared access:
       • Access to a shared cache line
     • Active-shared access:
       • Access to a shared cache line where the last core to access it is not the current core
     • Ex: accesses by the core IDs shown in red on the slide are active-shared accesses
       Core IDs: …1, 2, 2, 2, 1, 3, 4, 3, 2, 2, 2…
     [Figure: four cores (C0, C1, C2, C3) attached to a shared cache; legend: 1 core, 2 core, 3 core, 4 core]
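
The definitions above can be sketched as a small classifier over a trace of (core ID, cache line) pairs. This is an illustrative sketch, not the authors' tool: the function name `classify_accesses` is hypothetical, lines are assumed never to be evicted (so a line's lifetime is the whole trace), and an access is counted as shared only once a second core has touched the line.

```python
def classify_accesses(trace):
    """Count total, shared, and active-shared accesses.

    trace: iterable of (core_id, line_addr) pairs.
    Assumption: no evictions, so each line's lifetime spans the whole trace.
    """
    lines = {}  # line_addr -> (set of cores seen so far, last core to access)
    total = shared = active_shared = 0
    for core, line in trace:
        total += 1
        cores, last_core = lines.get(line, (set(), None))
        cores.add(core)
        if len(cores) > 1:  # line has been touched by more than one core
            shared += 1
            if last_core != core:  # previous access came from a different core
                active_shared += 1
        lines[line] = (cores, core)
    return {"total": total, "shared": shared, "active_shared": active_shared}
```

Applied to the slide's core-ID sequence on a single line, `classify_accesses((c, 0) for c in [1, 2, 2, 2, 1, 3, 4, 3, 2, 2, 2])` counts an access as active-shared whenever the core ID differs from the one before it.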

  10. Data Sharing Behavior
     [Figure: "How Much Shared?" and "Access Frequency" (4-threaded run) for PLSA, GeneNet, SEMPHY, SNP, and SVM at 4MB, 8MB, 16MB, 32MB, and 64MB LLC sizes; legend: cache miss, 1 thread, 2 thread, 3 thread, 4 thread]
     • Sharing is dependent on algorithm and varies with cache size
     • Workloads fully utilize a 64MB LLC
     • Reducing cache misses improves data sharing
     • Despite size of shared footprint, shared data frequently referenced

  11. Sharing: Phase Dependent & f(cache size)
     [Figure: "How Much Shared?" over a 4-threaded run for (a) SEMPHY and (b) SVM with 4 MB, 16 MB, and 64 MB LLCs; sequential phases marked; legend: 1 thread, 2 thread, 3 thread, 4 thread]

  12. Shared/Private Cache – SEMPHY
     [Figure: miss rate vs. total instructions (billions) for a private cache (16MB total LLC, 4MB/core) and a shared cache (16MB total LLC)]
     • SEMPHY with 4 threads
     • Shared cache outperforms private caches

  13. Shared Refs & Shared Caches…
     [Figure: GeneNet, 16MB LLC, 4-threaded run: % total accesses, private LLC miss rate, and shared LLC miss rate, with phases A and B marked; legend: cache miss, 1 thread, 2 thread, 3 thread, 4 thread]
     • Phase A: shared caches perform better than private caches (25%)
     • Phase B: shared caches marginally better than private caches (5%)
     • Shared caches BETTER when shared data frequently referenced
     • Most workloads frequently reference shared data
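
To make the shared-versus-private comparison concrete, here is a toy sketch that replays one trace against a single shared LRU cache and against equal-sized private LRU caches with the same total capacity. It is a deliberately simplified model with fully associative caches and no coherence traffic; the helpers `lru_misses` and `compare` are hypothetical names, and the synthetic trace in the usage note is not data from the study.

```python
from collections import OrderedDict


def lru_misses(line_stream, capacity_lines):
    """Count misses of a fully associative LRU cache holding capacity_lines lines."""
    cache = OrderedDict()
    misses = 0
    for line in line_stream:
        if line in cache:
            cache.move_to_end(line)  # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)  # evict least recently used
            cache[line] = True
    return misses


def compare(trace, num_cores, total_lines):
    """Return (shared_miss_rate, private_miss_rate) for equal total capacity.

    trace: list of (core_id, line_addr) pairs, core IDs in range(num_cores).
    """
    shared = lru_misses((line for _, line in trace), total_lines)
    private = sum(
        lru_misses((line for core, line in trace if core == c),
                   total_lines // num_cores)
        for c in range(num_cores)
    )
    return shared / len(trace), private / len(trace)
```

For example, with four cores repeatedly sweeping the same eight lines, `compare([(core, line) for _ in range(10) for line in range(8) for core in range(4)], 4, 16)` shows the shared cache holding the whole working set while each four-line private cache thrashes. The gap in this synthetic case is far larger than the 5% to 25% differences reported on the slide.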

  14. Summary
     • This paper:
       • Memory behavior of parallel bioinformatics workloads
     • Key points:
       • Workloads exhibit a large amount of data sharing
       • Data sharing is a function of the total cache available
       • Eliminating cache misses improves data sharing
       • Shared data frequently referenced
       • Shared caches outperform private caches, especially when shared data is frequently used

  15. Ongoing Work on Bio-Workloads
     University of Maryland
     • BioBench: A Benchmark Suite for Bioinformatics Applications
     • BioParallel: Parallel Bioinformatics Applications Suite (in progress)
     Brought to you by Maryland Memory-Systems Research
     "BioBench: A benchmark suite of bioinformatics applications." K. Albayraktaroglu, A. Jaleel, X. Wu, M. Franklin, B. Jacob, C.-W. Tseng, and D. Yeung. Proc. 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2005), pp. 2-9. Austin TX, March 2005.
     http://www.ece.umd.edu/biobench/
