
Deep Learning Workflows: Training and Inference

Discover the range of AI applications and the tools for the deep learning workflows used to build them.

nvidia


Presentation Transcript


  1. Oct 18th AI Connect Speakers. WiBD Introduction & DL Use Cases: Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA. Deep Learning Workflows (w/ a demo): Kari Briski, Director of Deep Learning Software Product, NVIDIA. Deep Learning in Enterprise: Nazanin Zaker, Data Scientist, SAP Innovation Center Network. Event Hashtags: #IamAI, #WiBD. 10/20/2017, Women in Big Data

  2. AI CONNECT. Renee Yao, Product Marketing Manager, NVIDIA

  3. Agenda: AI Connect
     • 6:00-7:00pm – Registration and Networking
     • 7:00-7:15pm – “WiBD Introduction & DL Use Cases”, Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA
     • 7:15-7:45pm – “Deep Learning Workflows (with a live demo)”, Kari Briski, Director of Deep Learning Software Product, NVIDIA
     • 7:45-8:15pm – “Deep Learning in Enterprise”, Nazanin Zaker, Data Scientist, SAP Innovation Center Network
     • 8:15-8:30pm – Wrap-up & Giveaways
     Other events listed on the slide: February, Apache Hadoop Training @ Cloudera; March, @ Strata+Hadoop World SJ; May, Apache Drill and Apache Spark @ MapR; June, @ Hadoop Summit; June, Career Empowerment @ Andreessen Horowitz; June, @ Spark Summit

  4. Join us: Be Part of The Solution. Become a member or a sponsor.
     • Website: womeninbigdata.org
     • LinkedIn: “Women in Big Data Forum”
     • Meetup: meetup.com/Women-in-Big-Data-Meetup/
     • Twitter: @DataWomen
     • Video: https://www.youtube.com/channel/UCOaMT7A9SVkeBdvYNxiITVA

  5. DEEP LEARNING WORKFLOWS: DEEP LEARNING TRAINING AND INFERENCE Kari Briski, 10-18-17

  6. AI APPLICATIONS: Recommendation Engines, Sentiment Analysis, Image Classification, Voice Recognition, Language Translation, Object Detection. Categories: NATURAL LANGUAGE PROCESSING, SPEECH & AUDIO, COMPUTER VISION.

  7. AI APPLICATIONS: Recommendation Engines, Sentiment Analysis, Image Classification, Voice Recognition, Language Translation, Object Detection. Categories: NATURAL LANGUAGE PROCESSING, COMPUTER VISION, SPEECH & AUDIO. Tasks: Neural Machine Translation, Object Detection, ASR (automatic speech recognition), Classification, Generation, Question & Answer, Segmentation, Processing, Sentiment Analysis, Visual Q&A, Audio-classification, Search and recommendation engines, Denoising.

  8. ACCELERATED DEEP LEARNING TRAINING STACK. AI Applications are Built on NVIDIA Hardware and Software End-to-End. Applications: Recommendation Engines, Sentiment Analysis, Image Classification, Voice Recognition, Language Translation, Object Detection. Categories: NATURAL LANGUAGE PROCESSING, COMPUTER VISION, SPEECH AND AUDIO.

  9. NVIDIA TOOLS FOR DEEP LEARNING WORKFLOW. Stages: DATA: GATHER AND LABEL; TRAINING; DEPLOY WITH TENSORRT. Captions: Gather data; Curate data sets; Rapidly label data, guide training, get insights; Accelerated Deep Learning Training Software Stack; NVIDIA DEEP LEARNING SDK. Diagram labels: DATA MANAGEMENT, TRAINING DATA, TRAINING, MODEL ASSESSMENT, TRAINED NETWORK. Deployment targets: EMBEDDED: Jetson TX; AUTOMOTIVE: Drive PX (XAVIER); DATA CENTER: Tesla (Pascal, Volta).

  10. DL FLOW. Stages: IMPORT; PREPROCESS (clean, clip, label, normalize, ..); TRAIN; SCORE + OPTIMIZE, VISUALIZATION; DEPLOY; INFERENCE & MICROSERVICES (inference, prediction); RESULT. Other labels: Source Dataset; Curated Dataset; Format…; MODEL ZOO; REST API; tune, compile + runtime; VISUALIZATION.
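
The PREPROCESS step on this slide names clean, clip, label, and normalize. As a rough illustration only, the three numeric operations can be sketched in plain Python. The function name `preprocess`, the use of `None` to mark a bad record, and the sample values are assumptions made for this sketch; they do not come from the presentation or from any NVIDIA tool.

```python
from statistics import mean, pstdev


def preprocess(samples, low, high):
    """Clean, clip, and normalize a list of numeric readings (illustrative only)."""
    # Clean: drop missing values (None stands in for a bad record here).
    cleaned = [x for x in samples if x is not None]
    # Clip: limit each value to the [low, high] range.
    clipped = [min(max(x, low), high) for x in cleaned]
    # Normalize: shift to zero mean and scale to unit standard deviation.
    mu = mean(clipped)
    sigma = pstdev(clipped) or 1.0  # fall back to 1.0 when the data is constant
    return [(x - mu) / sigma for x in clipped]


# Hypothetical raw readings; the None entry is removed by the clean step.
raw = [0.5, None, 12.0, 3.0, -4.0, 7.5]
curated = preprocess(raw, low=0.0, high=10.0)
```

Real pipelines work on images, audio, or text rather than a flat list, but the order shown (clean, then clip, then normalize) is a common one because the normalization statistics are then computed on data that is already free of outliers.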

  11. INFRASTRUCTURE FOR AI

  12. GATHER DATA, CURATE, LABEL

  13. Crowd Source. Tools: VATIC, ViPER, Home-grown. Free Labeled Data: Computer Vision, Translation, Speech & Audio.

  14. STEP 1, Project Setup (Project Manager): Project named; Classifier types defined; Labeling task settings defined; Sequences added.
      STEP 2, Curation (Curator): Which pieces of data make the most sense to us.
      STEP 3, Labeling (Data Labeler): Labels created; Attributes of labels selected; Frames committed for QA.
      STEP 4, QA (Data Labeler): Frames accepted or rejected; Rejection reason specified.
      STEP 5, Export (Data Labeler): Data set sent to training.

  15. TRAINING

  16. NVIDIA DEEP LEARNING SOFTWARE TRAINING STACK. Applications: Recommendation Engines, Sentiment Analysis, Image Classification, Voice Recognition, Language Translation, Object Detection. Categories: NATURAL LANGUAGE PROCESSING, SPEECH AND AUDIO, COMPUTER VISION. UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION: DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras. Where: In-the-Cloud, At Your Desk, On-Prem.

  17. ACCELERATED DEEP LEARNING TRAINING STACK. Applications: Recommendation Engines, Sentiment Analysis, Image Classification, Voice Recognition, Language Translation, Object Detection. Categories: NATURAL LANGUAGE PROCESSING, SPEECH AND AUDIO, COMPUTER VISION. Productivity: Workflow, Data and Job Management, Experiments (UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION): DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras. Deep Learning Software Libraries (AKA Frameworks): DEEP LEARNING FRAMEWORKS. Architecture Specific Libraries: cuFFT, cuBLAS, cuSPARSE, cuDNN, NCCL (labels: DEEP LEARNING, MATH LIBRARIES, COMMUNICATION). Where: In-the-Cloud, At Your Desk, On-Prem.

  18. ACCELERATED DEEP LEARNING TRAINING STACK. Applications: Recommendation Engines, Sentiment Analysis, Image Classification, Voice Recognition, Language Translation, Object Detection. Categories: NATURAL LANGUAGE PROCESSING, SPEECH AND AUDIO, COMPUTER VISION. UI / JOB MANAGEMENT / DATASET VERSIONING / VISUALIZATION: DIGITS, NVIDIA GPU Cloud, NVDocker, Keras, Kubernetes. Framework labels: Paddle; NV OPTIMIZED; NV ACCELERATED. Libraries: cuFFT, cuBLAS, cuSPARSE, cuDNN, NCCL (labels: DEEP LEARNING, MATH LIBRARIES, COMMUNICATION). Where: In-the-Cloud, At Your Desk, On-Prem.

  19. GENERATIONAL GPU PERFORMANCE & TENSOR CORES. Chart: Single GPU Generational Training Scaling; bars for K80, P100, V100, V100 TC; ResNet-50; 1, 4, 8 GPU training on DGX-1 Volta.

  20. GENERATIONAL GPU PERFORMANCE & TENSOR CORES. 3-3.5X CNN training over Pascal with Volta Tensor Core math. Chart: Single GPU Generational Training Scaling; bars for K80, P100, V100, V100 TC; ResNet-50; 1, 4, 8 GPU training on DGX-1 Volta.

  21. TIME TO SOLUTION (HOURS). Recursive Neural Networks: Training OpenNMT to accuracy (13 epochs). Convolutional Neural Networks: Training ImageNet to accuracy (90 epochs) with ResNet-50. Bars: K80, 8x K80, P100, 8x P100, V100, 8x V100. Annotations: 1 weekend, 1 day, 1 afternoon.

  22. WHERE TO TRAIN: In-the-Cloud, At Your Desk, On-Prem

  23. INFERENCE: DEPLOY YOUR TRAINED NETWORK TO INFER IN APPLICATIONS

  24. NOW WHAT? TRAINED NETWORK MODEL. Chart: Throughput (Images/sec) for CPU, K80 TF, P100 TF, P100 TRT.

  25. OPTIMIZE. TRAINED NETWORK MODEL. Chart: Throughput (Images/sec) for CPU, K80 TF, P100 TF, P100 TRT.

  26. NVIDIA TENSORRT: Maximize inference throughput for latency-critical services. High-performance neural network inference optimizer and runtime engine for production deployment. Diagram labels: TRAINED NETWORK MODEL, TensorRT Optimizer, OPTIMIZED NETWORK, TensorRT Runtime Engine. Targets: EMBEDDED: Jetson TX; AUTOMOTIVE: Drive PX (XAVIER); DATA CENTER: Tesla (Pascal, Volta).

  27. NVIDIA TENSORRT PROGRAMMABLE INFERENCING PLATFORM. TensorRT on: TESLA P4, JETSON TX2, DRIVE PX 2, NVIDIA DLA, TESLA V100.

  28. NVIDIA TensorRT: Programmable Inference Accelerator. Data center: Tesla; Embedded: Jetson; Automotive: Drive PX.
      • Maximize throughput and minimize latency
      • Deploy reduced precision without retraining and without accuracy loss
      • Train in any framework, deploy in TensorRT without overhead
      developer.nvidia.com/tensorrt

  29. VOLTA ON A BUDGET LATENCY BENCHMARKS. Charts: Throughput (image/s) vs Latency (ms); Throughput on a 200 ms latency budget. Series: CPU-Only, V100 + TensorFlow, V100 + TensorRT. Models: ResNet-50 (ImageNet), OpenNMT (English to German). Annotations: 19, 7, 6, 3X, 6X.
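
The idea of measuring throughput under a latency budget, as on this slide, can be sketched as follows: among the configurations whose latency fits the budget, pick the one that processes the most images per second. This is a minimal sketch. The function name `best_throughput` and every number in `measured` are hypothetical and are not read from the charts.

```python
def best_throughput(measurements, budget_ms):
    """Return (batch_size, images_per_sec) for the fastest config within budget.

    measurements is a list of (batch_size, latency_ms) pairs, where latency_ms
    is the time to process one whole batch. Returns None if nothing fits.
    """
    within = [
        (batch, batch * 1000.0 / latency_ms)  # images per second
        for batch, latency_ms in measurements
        if latency_ms <= budget_ms
    ]
    if not within:
        return None
    return max(within, key=lambda pair: pair[1])


# Hypothetical (batch size, latency in ms) pairs, not taken from the slide.
measured = [(1, 2.0), (8, 5.0), (32, 16.0), (128, 60.0)]
choice = best_throughput(measured, budget_ms=20.0)
```

Larger batches usually raise throughput but also raise latency, which is why a benchmark that fixes the budget first gives a fairer comparison between inference stacks than peak throughput alone.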

  30. ENABLE INT8 INFERENCE. TensorRT is ENABLER for entropy quantization. Maintain accuracy without retraining.
      Flow: Training Framework (fp32) → Calibrate & Quantize (100’s of samples of training data) → Inference (TensorRT int8)

      Network       FP32 TOP 1   INT8 TOP 1   DIFFERENCE
      Alexnet       57.22%       56.96%       0.26%
      Googlenet     68.87%       68.49%       0.38%
      VGG           68.56%       68.45%       0.11%
      Resnet-50     73.11%       72.54%       0.57%
      Resnet-101    74.58%       74.14%       0.44%
      Resnet-152    75.18%       74.56%       0.61%
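
To make the calibrate-and-quantize step on this slide more concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It deliberately swaps in a simple max-abs rule for choosing the scale; the slide refers to entropy-based calibration, which selects the clipping threshold differently and is not reproduced here. All function names and sample values are assumptions for illustration and are not TensorRT APIs.

```python
def calibrate_scale(calibration_values):
    """Derive a symmetric INT8 scale from sample activations (max-abs rule)."""
    max_abs = max(abs(v) for v in calibration_values)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(values, scale):
    """Map FP32 values to integer codes, saturating at [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map integer codes back to approximate FP32 values."""
    return [c * scale for c in codes]


# Hypothetical calibration samples standing in for a small calibration set.
calibration = [-2.0, -0.5, 0.0, 0.25, 1.0, 1.5]
scale = calibrate_scale(calibration)

activations = [-2.0, 0.1, 0.8, 3.0]  # 3.0 lies outside the calibrated range
codes = quantize(activations, scale)
restored = dequantize(codes, scale)
```

Values inside the calibrated range come back with an error of at most half a quantization step, while the out-of-range value 3.0 saturates at code 127. Choosing the range from a small calibration set, rather than retraining, is the trade-off the slide describes.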

  31. NVIDIA TENSORRT: Maximize inference throughput for latency-critical services. DATA CENTER, Tesla (Pascal, Volta): Large Batch, Low Latency, Production-ready. AUTOMOTIVE, Drive PX (XAVIER): Real-time execution, high resolution, high throughput, small footprint. EMBEDDED, Jetson TX: Low power, small footprint, multi-inference.

  32. “On average TensorRT has doubled the speed of our inference which is pretty amazing!” Source: Paul Kruszewski, CEO, WRNCH
      “Self-driving car’s having real-time execution is obviously very important. With our ResNet101 network, TensorRT brought our inference time down from 250ms to 89ms.”
      “On average we see around 10x speedup, with between 3-70x speedups depending on the scenarios” Source: Matthew Zeiler, CEO, Clarifai
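
For readers comparing the quoted figures, the 250 ms to 89 ms result can be converted into a speedup factor and a percentage reduction. The helper name `speedup` is an assumption for this sketch; only the two latency values come from the quote.

```python
def speedup(before_ms, after_ms):
    """Return how many times faster the new latency is than the old one."""
    return before_ms / after_ms


ratio = speedup(250.0, 89.0)      # roughly 2.8x faster
reduction = 1.0 - 89.0 / 250.0    # fraction of inference time removed
```

The ratio works out to about 2.8x, which sits close to the low end of the 3-70x range given in the other quote.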


  34. FAST IMPLEMENTATION OF TENSORFLOW. NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

  35. EXAMPLE WORKFLOWS

  36. DL DATACENTER WORKFLOW: TensorRT increases productivity and shortens time to results. Flow labels: MODEL ZOO; TRAIN; SCORE + OPTIMIZE, VISUALIZATION; DEPLOY; INFERENCE & MICROSERVICES (inference, prediction); REST API; RESULT. Other labels: tune, compile + runtime; A/B Testing, Use data; Automated with TensorRT.

  37. DL EDGE / IVA WORKFLOW. Transfer Learning: Train and deploy to edge in less than a minute. NVIDIA DIGITS: >10k pulls, >2.5k stars.

  38. DEMO: DEEP LEARNING WORKFLOW. Transfer Learning: Train and deploy to edge in less than a minute. A special THANK YOU! Zheng Liu & Varun Praveen

  39. IN SUMMARY

  40. WHO, WHAT, WHERE. APPLICATION DEVELOPER: Scale and deploy successful applications with great user experience. RESEARCHERS: Explore the “next big thing” opportunity to fuel business. APPLIED DL / DATA SCIENTISTS: Retrain with data, productize models for consistency, focus on quality.

  41. WHO, WHAT, WHERE. RESEARCHERS: Explore the “next big thing” opportunity to fuel business, and find ways to productize it. APPLIED DL / DATA SCIENTISTS: Retrain, productize models for consistency, quality, tuning with right data. APPLICATION DEVELOPER: Scale and deploy successful applications with great user experience. Applications: Recommendation Engines, Language Translation, Image Classification, Sentiment Analysis, Voice Recognition, Object Detection. Framework label: Paddle.

  42. WHO, WHAT, WHERE. DATA SCIENTISTS: Retrain, productize models for consistency, quality, tuning with right data. RESEARCHERS: Explore the “next big thing” opportunity to fuel business, and find ways to productize it. APPLICATION DEVELOPER: Scale and deploy successful applications with great user experience. Applications: Recommendation Engines, Language Translation, Image Classification, Sentiment Analysis, Voice Recognition, Object Detection. Labels: Paddle; TensorRT; Training or Deploying.
