GPU Computing with Python and Anaconda: The Next Frontier

Learn how Python is becoming the glue that binds data science, how rapid integration empowers data scientists to combine new technologies, and the two primary goals in store for Anaconda.

Presentation Transcript


  1. GPU Computing with Python and Anaconda: The Next Frontier • Accelerate. Connect. Empower. • Stan Seibert, Director of Community Innovation • © 2017 Anaconda, Inc. - Confidential & Proprietary

  2. GPUs & Python: A Great Combination • Python is becoming the glue that binds data science • Rapid integration empowers data scientists to combine new technologies • This is our goal for Anaconda: • Free distribution of Python and R for Win/Mac/Linux • Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...

  3. Deep Learning: An Early Success • Powerful machine learning technique • Many great open source options • Every major package has a Python interface • Very compute intensive ➡ Perfect for GPU acceleration [Figure: diagram with four ReLU labels]

  4. Numba: JIT Python Compilation • Compile numerical Python functions for CPU or GPU • Based on the LLVM compiler library • Great for rapid, custom algorithm development
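
To make the Numba slide concrete, here is a minimal CPU-only sketch of JIT-compiling a numerical Python function. The function `leibniz_pi` and its series are illustrative choices, not taken from the talk, and the `ImportError` fallback is an assumption added so the sketch still runs as plain Python where Numba is not installed. GPU compilation goes through `numba.cuda.jit` and needs a CUDA device, so it is not shown here.

```python
import math

try:
    from numba import njit  # compiles the decorated function through LLVM
except ImportError:
    def njit(func):
        """Fallback (assumption): run as ordinary Python if Numba is absent."""
        return func


@njit
def leibniz_pi(n_terms):
    """Approximate pi with the Leibniz series using a plain scalar loop."""
    total = 0.0
    sign = 1.0
    for k in range(n_terms):
        total += sign / (2.0 * k + 1.0)
        sign = -sign
    return 4.0 * total


# The first call triggers compilation; later calls reuse the machine code.
approx = leibniz_pi(1_000_000)
error = abs(approx - math.pi)
```

Tight scalar loops like this one are where a JIT tends to help most, because the interpreter overhead per iteration disappears once the function is compiled.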

  5. Problem: An Ecosystem of Silos? [Diagram labels: ETL/Data Prep, Machine Learning, Database, Visualization, each with its own Data, on the GPU]

  6. Problem: An Ecosystem of Silos? [Diagram labels: ETL/Data Prep, Machine Learning, Database, Visualization, each with its own Data, on the GPU, connected by three links marked "CPU transfer"]

  7. Problem: An Ecosystem of Silos? • Why do GPU applications share data through slow CPU memory? [Diagram labels: ETL/Data Prep, Machine Learning, Database, Visualization, each with its own Data, on the GPU, connected by three links marked "CPU transfer"]

  8. GPU Open Analytics Initiative • Goal: Standardize data exchange between GPU analytics applications • Current members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock • http://gpuopenanalytics.com/

  9. Streamlining the Data Science Pipeline • All data stays on the GPU [Diagram labels: Packed Array, GDF, Apache Arrow, Python Data Transformation, Generalized Linear Model, GPU Database]

  10. GPU Dataframe (GDF) • A format for tabular data in GPU memory • Exchange GDF between different libraries • Move between processes using CUDA IPC • Based on Apache Arrow • Code lives in a separate library • Work in progress to move functionality into the Arrow project
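
Since GDF is described as based on Apache Arrow, a small sketch of an Arrow-style column may help. Arrow stores a column as a contiguous value buffer plus a validity bitmap with one bit per row (least-significant bit first, 1 meaning "not null"). The sketch below builds that pair in ordinary host memory with the standard library only. The helper names `make_column` and `is_valid` are hypothetical, and the slides do not spell out GDF's exact GPU layout, so treat this as an illustration of the Arrow idea and not of GDF itself.

```python
from array import array


def make_column(values):
    """Build (data buffer, validity bitmap) from floats, where None marks a null."""
    # Nulls still occupy a slot in the data buffer; their value is unspecified,
    # and 0.0 is used here as an arbitrary placeholder.
    data = array("d", (0.0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per row, rounded up
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # set bit i, LSB-first within a byte
    return data, bitmap


def is_valid(bitmap, i):
    """Return True if row i is non-null according to the bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


data, bitmap = make_column([1.5, None, 3.0])
```

Because both buffers are flat and fixed-width, they can be handed to another library without a per-row conversion, which is the property a shared dataframe format relies on.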

  11. PyGDF: Python GPU Dataframes • A Python library for manipulating GPU Dataframes: • Create from NumPy arrays and Pandas Dataframes • Exchange between processes • Math operations • Sort, Filter, Join, Group By • Ideal for data manipulation and feature engineering stages between data source and machine learning • Not intended to replace dedicated database applications • Interoperates with our Python compiler for GPU: Numba
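
The operations listed on this slide can be illustrated with their CPU counterparts in pandas. This is a sketch on the CPU only: the slide does not show PyGDF's own calls, so nothing below should be read as the PyGDF API, and the `trips` and `cities` tables are made-up example data.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one table built from NumPy arrays, one lookup table.
trips = pd.DataFrame({
    "city_id": np.array([0, 1, 0, 2, 1, 0]),
    "fare": np.array([12.0, 7.5, 30.0, 4.0, 9.5, 18.0]),
})
cities = pd.DataFrame({
    "city_id": [0, 1, 2],
    "city": ["Austin", "Boston", "Chicago"],
})

trips["fare_with_tip"] = trips["fare"] * 1.15             # math operation
long_trips = trips[trips["fare"] > 8.0]                   # filter
ranked = long_trips.sort_values("fare", ascending=False)  # sort
joined = ranked.merge(cities, on="city_id", how="left")   # join
mean_fare = joined.groupby("city")["fare"].mean()         # group by
```

A chain like this (derive a column, filter, join a lookup table, aggregate) is typical of the feature engineering stage the slide places between the data source and machine learning.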

  12. PyGDF: Group By Performance • GPU speedup becomes very large above 10 million elements • Aggregation functions are extremely efficient on the GPU

  13. Dask: Distributed Computing • Scalable execution of task graphs, from single computers to 1000+ node clusters • Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!
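
The idea of a "resource aware" scheduler can be sketched with a toy placement function. This is not Dask's scheduler: `Node`, `Task`, and `assign` are hypothetical names, and the greedy first-fit policy is an assumption chosen for brevity. In Dask itself, workers can be started with a resource declaration such as `--resources "GPU=1"` and tasks submitted with `resources={'GPU': 1}`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    resources: dict  # e.g. {"GPU": 1}; empty for a CPU-only node


@dataclass
class Task:
    name: str
    needs: dict = field(default_factory=dict)  # e.g. {"GPU": 1}


def assign(tasks, nodes):
    """Greedy first-fit: place each task on the first node that can cover its needs.

    Returns {task name: node name or None}; None means the task must wait.
    """
    free = {n.name: dict(n.resources) for n in nodes}
    placement = {}
    for task in tasks:
        placement[task.name] = None
        for node in nodes:
            avail = free[node.name]
            if all(avail.get(r, 0) >= amt for r, amt in task.needs.items()):
                for r, amt in task.needs.items():
                    avail[r] -= amt  # reserve the resource for this task
                placement[task.name] = node.name
                break
    return placement


nodes = [Node("cpu-1", {}), Node("gpu-1", {"GPU": 1})]
tasks = [
    Task("load"),
    Task("train", {"GPU": 1}),
    Task("plot"),
    Task("train-2", {"GPU": 1}),
]
placement = assign(tasks, nodes)
```

In this run the CPU-only tasks land on `cpu-1`, the first GPU task lands on `gpu-1`, and the second GPU task is left unplaced because the only GPU is already reserved.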

  14. The Future • In flight: • Merger of common code into Apache Arrow GPU support • Node.js interface to GDF (Graphistry) • Dask GDF: Distributed GPU dataframe • Other potential future projects: • Tensor exchange between Python GPU libraries • GPU shared memory service (Plasma for GPU) • Can we improve the interaction of unified memory and IPC? • What do you want to see?

  15. Learn More • GPU Open Analytics Website: http://gpuopenanalytics.com • GOAI GitHub Organization: https://github.com/gpuopenanalytics/ • GOAI Google Group: https://groups.google.com/forum/#!forum/gpuopenanalytics
