Deep Learning Workflows: Training and Inference

Oct 18th AI Connect speakers:
• Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA: WiBD Introduction & DL Use Cases
• Kari Briski, Director of Deep Learning Software Product, NVIDIA: Deep Learning Workflows (w/ a demo)
• Nazanin Zaker, Data Scientist, SAP Innovation Center Network: Deep Learning in Enterprise
Event Hashtags: #IamAI, #WiBD
10/20/2017 Women in Big Data

AI CONNECT: Renee Yao, Product Marketing Manager, NVIDIA

Agenda
• 6:00-7:00pm: Registration and networking
• 7:00-7:15pm: “WiBD Introduction & DL Use Cases”, Renee Yao, Product Marketing Manager, Deep Learning and Analytics, NVIDIA
• 7:15-7:45pm: “Deep Learning Workflows (with a live demo)”, Kari Briski, Director of Deep Learning Software Product, NVIDIA
• 7:45-8:15pm: “Deep Learning in Enterprise”, Nazanin Zaker, Data Scientist, SAP Innovation Center Network
• 8:15-8:30pm: Wrap-up & giveaways

WiBD event series: February, Apache Hadoop Training @ Cloudera; March @ Strata+Hadoop World SJ; May, Apache Drill and Apache Spark @ MapR; June @ Hadoop Summit; June, Career Empowerment @ Andreessen Horowitz; June @ Spark Summit; AI Connect.

Join us: Be Part of The Solution. Become a member or a sponsor.
• Website: womeninbigdata.org
• LinkedIn: “Women in Big Data Forum”
• Meetup: meetup.com/Women-in-Big-Data-Meetup/
• Twitter: @DataWomen
• Video: https://www.youtube.com/channel/UCOaMT7A9SVkeBdvYNxiITVA

DEEP LEARNING WORKFLOWS: DEEP LEARNING TRAINING AND INFERENCE Kari Briski, 10-18-17

AI APPLICATIONS
Across natural language processing, speech & audio, and computer vision: recommendation engines, sentiment analysis, image classification, voice recognition, language translation, object detection.

AI APPLICATIONS (expanded)
Across natural language processing, computer vision, and speech & audio: neural machine translation, object detection, ASR (automatic speech recognition), classification, generation, question & answer, segmentation, processing, sentiment analysis, visual Q&A, audio classification, search and recommendation engines, denoising.

ACCELERATED DEEP LEARNING TRAINING STACK
AI applications (recommendation engines, sentiment analysis, image classification, voice recognition, language translation, object detection, spanning natural language processing, computer vision, and speech and audio) are built on NVIDIA hardware and software end-to-end.

NVIDIA TOOLS FOR DEEP LEARNING WORKFLOW
• Data: gather and label. Gather data and curate data sets (data management) to produce training data.
• Training: rapidly label data, guide training, get insights, and assess the model (model assessment) on the accelerated deep learning training software stack, producing a trained network.
• Deploy with TensorRT: embedded (Jetson TX), automotive (Drive PX (Xavier)), data center (Tesla (Pascal, Volta)).
Underlying software: NVIDIA Deep Learning SDK.

DL FLOW
Source dataset → PREPROCESS (clean, clip, label, normalize, format…) → curated dataset → TRAIN (tune, compile + runtime), or IMPORT from a model zoo → SCORE + OPTIMIZE, with visualization → DEPLOY → INFERENCE & MICROSERVICES via a REST API → RESULT (inference, prediction), with visualization.
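The PREPROCESS stage of the flow (clean, clip, normalize) can be sketched in plain Python as below. This is a minimal illustration, assuming each sample is a fixed-length list of numbers; the function name `preprocess`, the default clip range [0, 255], and the choice of per-feature z-score normalization are assumptions for illustration, not the pipeline used in the talk.

```python
import math
from typing import List, Optional, Sequence


def preprocess(samples: Sequence[Optional[Sequence[float]]],
               lo: float = 0.0, hi: float = 255.0) -> List[List[float]]:
    """Hypothetical preprocessing step: clean, clip, then normalize.

    - clean: drop missing samples and samples containing non-finite values
    - clip: bound every value to [lo, hi]
    - normalize: z-score each feature using statistics of the cleaned set
    """
    # Clean: keep only complete, finite samples.
    cleaned = [list(s) for s in samples
               if s is not None and all(math.isfinite(v) for v in s)]
    if not cleaned:
        return []
    # Clip: bound outliers to the valid range.
    clipped = [[min(max(v, lo), hi) for v in s] for s in cleaned]
    # Normalize: per-feature mean and (population) standard deviation.
    n, dims = len(clipped), len(clipped[0])
    means = [sum(s[d] for s in clipped) / n for d in range(dims)]
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((s[d] - means[d]) ** 2 for s in clipped) / n) or 1.0
            for d in range(dims)]
    return [[(s[d] - means[d]) / stds[d] for d in range(dims)] for s in clipped]
```

For example, `preprocess([[10.0, 300.0], None, [20.0, float("nan")], [30.0, -5.0]])` drops the missing and NaN samples, clips 300.0 to 255.0 and -5.0 to 0.0, and returns `[[-1.0, 1.0], [1.0, -1.0]]`.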

INFRASTRUCTURE FOR AI

GATHER DATA, CURATE, LABEL

Labeling tools and data sources: crowd-source tools, free labeled data, VATIC, ViPER, and home-grown tools, covering computer vision, translation, and speech & audio.

Labeling workflow:
• STEP 1, Project Setup (Project Manager): project named; classifier types defined; labeling task settings defined; sequences added.
• STEP 2, Curation (Curator): decide which pieces of data make the most sense to us.
• STEP 3, Labeling (Data Labeler): labels created; attributes of labels selected; frames committed for QA.
• STEP 4, QA (Data Labeler): frames accepted or rejected; rejection reason specified.
• STEP 5, Export (Data Labeler): data set sent to training.
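Steps 3 to 5 of this labeling workflow (commit for QA, accept or reject with a reason, export) can be modeled as a small state machine. The sketch below is illustrative only: the `Frame`, `Status`, `review`, and `export` names are hypothetical and do not correspond to any specific labeling tool, and Steps 1 and 2 (project setup, curation) are not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    LABELED = "labeled"      # Step 3: labels created, frame committed for QA
    ACCEPTED = "accepted"    # Step 4: QA accepted the frame
    REJECTED = "rejected"    # Step 4: QA rejected the frame (reason required)


@dataclass
class Frame:
    frame_id: int
    labels: List[str]
    status: Status = Status.LABELED
    rejection_reason: Optional[str] = None


def review(frame: Frame, accept: bool, reason: Optional[str] = None) -> None:
    """Step 4 (QA): accept a frame, or reject it with a stated reason."""
    if frame.status is not Status.LABELED:
        raise ValueError(f"frame {frame.frame_id} is not awaiting QA")
    if accept:
        frame.status = Status.ACCEPTED
    else:
        if not reason:
            raise ValueError("a rejection reason must be specified")
        frame.status = Status.REJECTED
        frame.rejection_reason = reason


def export(frames: List[Frame]) -> List[dict]:
    """Step 5 (export): only accepted frames are sent on to training."""
    return [{"id": f.frame_id, "labels": f.labels}
            for f in frames if f.status is Status.ACCEPTED]
```

Requiring a reason on rejection mirrors the “rejection reason specified” step, and filtering on `ACCEPTED` at export keeps unreviewed or rejected frames out of the training set.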

TRAINING

NVIDIA DEEP LEARNING SOFTWARE TRAINING STACK
• UI / job management / dataset versioning / visualization: DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras
• Where it runs: in the cloud, at your desk, or on-prem
Applications served: recommendation engines, sentiment analysis, image classification, voice recognition, language translation, object detection (natural language processing, speech and audio, computer vision).

ACCELERATED DEEP LEARNING TRAINING STACK (for the same AI applications)
• Productivity (workflow, data and job management, experiments; UI / job management / dataset versioning / visualization): DIGITS, NVIDIA GPU Cloud, HumanLoop, MagLev, Keras
• Deep learning software libraries (AKA frameworks): deep learning frameworks
• Architecture-specific libraries: deep learning math libraries (cuFFT, cuBLAS, cuSPARSE, cuDNN) and communication (NCCL)
• Where it runs: in the cloud, at your desk, or on-prem
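The COMMUNICATION layer (NCCL) provides collectives such as all-reduce, which data-parallel training uses to sum gradients across GPUs. The sketch below simulates the ring all-reduce algorithm in pure Python, assuming each “GPU” is just a Python list of gradient values. It illustrates the algorithm only; it is not NCCL's API or its actual implementation.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) across len(buffers) ranks.

    Each rank's buffer is split into n chunks. Reduce-scatter takes n-1 steps,
    after which rank r holds the full sum of chunk (r + 1) % n; all-gather then
    takes n-1 more steps to circulate the reduced chunks to every rank.
    """
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]  # copy so the inputs are not modified
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c: int) -> range:
        return range(bounds[c], bounds[c + 1])

    # Reduce-scatter: at step s, rank r sends chunk (r - s) % n to rank r + 1,
    # which adds it into its own copy. Messages are snapshotted first so all
    # sends in a step happen "simultaneously".
    for s in range(n - 1):
        msgs = [((r - s) % n, [data[r][i] for i in chunk((r - s) % n)], r)
                for r in range(n)]
        for c, values, r in msgs:
            dst = (r + 1) % n
            for i, v in zip(chunk(c), values):
                data[dst][i] += v

    # All-gather: at step s, rank r forwards the fully reduced chunk
    # (r + 1 - s) % n to rank r + 1, which overwrites its copy.
    for s in range(n - 1):
        msgs = [((r + 1 - s) % n, [data[r][i] for i in chunk((r + 1 - s) % n)], r)
                for r in range(n)]
        for c, values, r in msgs:
            dst = (r + 1) % n
            for i, v in zip(chunk(c), values):
                data[dst][i] = v
    return data
```

After the call every rank holds the element-wise sum; dividing by the number of ranks gives the averaged gradient. The ring schedule is used because each rank only talks to its neighbor and sends about 2(n-1)/n of its buffer in total, independent of the number of ranks.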

ACCELERATED DEEP LEARNING TRAINING STACK (for the same AI applications)
• UI / job management / dataset versioning / visualization: DIGITS, NVIDIA GPU Cloud, NVDocker, Keras, Kubernetes
• Frameworks: NV-optimized and NV-accelerated deep learning frameworks (e.g., Paddle)
• Deep learning math libraries: cuFFT, cuBLAS, cuSPARSE, cuDNN; communication: NCCL
• Where it runs: in the cloud, at your desk, or on-prem

GENERATIONAL GPU PERFORMANCE & TENSOR CORES
Volta with Volta Tensor Core math: 3-3.5X CNN training over Pascal.
[Chart: Single GPU generational training scaling. ResNet-50; 1, 4, 8 GPU training on DGX-1 Volta. GPUs: K80, P100, V100, V100 TC (Tensor Cores).]

TIME TO SOLUTION (HOURS)
[Chart: training time in hours for recurrent neural networks (training OpenNMT to accuracy, 13 epochs) and convolutional neural networks (training ImageNet to accuracy, 90 epochs, with ResNet-50) on K80, 8x K80, P100, 8x P100, V100, and 8x V100. Reference marks: 1 weekend, 1 day, 1 afternoon.]
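Time to solution is, to a first approximation, epochs × training-set size ÷ training throughput. The sketch below turns that into hours. It assumes the standard ILSVRC-2012 (ImageNet-1k) training-set size of 1,281,167 images and ignores data-loading, evaluation, and checkpointing overhead; the two throughput values are hypothetical placeholders, not numbers read from the chart.

```python
def hours_to_train(epochs: int, samples_per_epoch: int,
                   samples_per_sec: float) -> float:
    """Estimate wall-clock training time in hours, ignoring I/O,
    evaluation and checkpointing overhead."""
    total_samples = epochs * samples_per_epoch
    return total_samples / samples_per_sec / 3600.0


IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training set size

if __name__ == "__main__":
    # Hypothetical throughputs (images/sec), for illustration only.
    for label, ips in [("slower system", 500.0), ("faster system", 5000.0)]:
        hours = hours_to_train(90, IMAGENET_TRAIN_IMAGES, ips)
        print(f"{label}: {hours:.1f} hours")
```

With these placeholder throughputs, 90 epochs take roughly 64 hours (a long weekend) versus roughly 6.4 hours (an afternoon), which is the kind of gap the chart's “1 weekend / 1 day / 1 afternoon” marks convey.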

WHERE TO TRAIN: in the cloud, at your desk, or on-prem

INFERENCE: DEPLOY YOUR TRAINED NETWORK TO INFER IN APPLICATIONS

NOW WHAT? OPTIMIZE
[Chart: inference throughput (images/sec) of the trained network model on CPU, K80 + TensorFlow, P100 + TensorFlow, and P100 + TensorRT.]

NVIDIA TENSORRT
High-performance neural network inference optimizer and runtime engine for production deployment. Maximize inference throughput for latency-critical services.
Trained network model → TensorRT Optimizer → optimized network → TensorRT Runtime Engine, deployed to embedded (Jetson TX), automotive (Drive PX (Xavier)), or data center (Tesla (Pascal, Volta)).
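One common kind of optimization an inference optimizer can apply to a trained network is layer fusion: at inference time a batch-normalization layer is a fixed affine transform, so it can be folded into the preceding linear or convolution layer's weights and bias, leaving one layer to execute instead of two. The sketch below shows this for a single linear unit in pure Python. It illustrates the general idea only; the function names are hypothetical and this is not TensorRT's API or internal implementation.

```python
import math
from typing import List, Sequence, Tuple


def linear(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """One output unit of a dense layer: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def batchnorm(y: float, gamma: float, beta: float, mean: float, var: float,
              eps: float = 1e-5) -> float:
    """Inference-mode batch norm using the stored running mean and variance."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_batchnorm(weights: Sequence[float], bias: float, gamma: float,
                   beta: float, mean: float, var: float, eps: float = 1e-5
                   ) -> Tuple[List[float], float]:
    """Fold batch norm into the linear unit: returns (w', b') such that
    linear(w', b', x) == batchnorm(linear(w, b, x), ...) for every x."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in weights], (bias - mean) * scale + beta
```

The fused unit does a single multiply-add pass per input, and because the folding happens once offline, the runtime never evaluates the batch-norm layer at all.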

NVIDIA TENSORRT PROGRAMMABLE INFERENCING PLATFORM
Targets: Tesla P4, Jetson TX2, Drive PX 2, NVIDIA DLA, Tesla V100.

NVIDIA TensorRT: Programmable Inference Accelerator (developer.nvidia.com/tensorrt)
• Maximize throughput and minimize latency
• Deploy reduced precision without retraining and with minimal accuracy loss
• Train in any framework, deploy in TensorRT without overhead
Platforms: Tesla (data center), Jetson (embedded), Drive PX (automotive).

VOLTA ON A BUDGET: LATENCY BENCHMARKS
Throughput on a 200 ms latency budget.
[Chart: throughput (images/s) vs. latency (ms) for CPU-only, V100 + TensorFlow, and V100 + TensorRT on ResNet-50 (ImageNet) and OpenNMT (English to German). Annotations: 19, 7, 6; 3X, 6X.]
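Measuring throughput “on a latency budget” means serving the largest batch whose latency still fits the budget: bigger batches raise throughput but also raise per-request latency. The sketch below assumes a hypothetical linear latency model (fixed overhead plus a per-item cost); real latency curves must be measured, and the numbers in the test are made up rather than taken from the benchmark.

```python
from typing import Tuple


def best_batch(budget_ms: float, overhead_ms: float, per_item_ms: float,
               max_batch: int = 256) -> Tuple[int, float]:
    """Return (batch_size, throughput in items/s) for the largest batch whose
    modeled latency overhead_ms + batch * per_item_ms fits within budget_ms.

    Under this linear model throughput = batch / latency grows with batch,
    so the largest feasible batch is also the highest-throughput one.
    Returns (0, 0.0) if even a single item exceeds the budget.
    """
    best = (0, 0.0)
    for batch in range(1, max_batch + 1):
        latency_ms = overhead_ms + batch * per_item_ms
        if latency_ms > budget_ms:
            break
        best = (batch, batch / latency_ms * 1000.0)
    return best
```

For example, with a 200 ms budget, 10 ms overhead, and 2 ms per image, the largest feasible batch is 95 images, at about 475 images/s. A faster per-item cost lets a larger batch fit the same budget, which is how an optimized engine converts lower latency into higher throughput.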

ENABLE INT8 INFERENCE
TensorRT is the enabler for entropy-calibrated quantization: take an FP32 model from the training framework, calibrate and quantize it in TensorRT using hundreds of samples of training data, then run INT8 inference. Accuracy is maintained without retraining.

Model        FP32 Top-1   INT8 Top-1   Difference
AlexNet      57.22%       56.96%       0.26%
GoogLeNet    68.87%       68.49%       0.38%
VGG          68.56%       68.45%       0.11%
ResNet-50    73.11%       72.54%       0.57%
ResNet-101   74.58%       74.14%       0.44%
ResNet-152   75.18%       74.56%       0.61%
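The “calibrate & quantize” step maps FP32 values to 8-bit integers using a scale derived from a calibration threshold. TensorRT's entropy calibration picks that threshold by minimizing information loss (KL divergence) between the FP32 and quantized distributions; the sketch below swaps in the simpler max-abs calibration so that the symmetric quantize/dequantize arithmetic stays visible. It is a stand-in for illustration, not TensorRT's calibrator, and the function names are hypothetical.

```python
from typing import List, Sequence


def calibrate_max(samples: Sequence[float]) -> float:
    """Max-abs calibration: the threshold is the largest |x| observed.
    (A simpler stand-in for TensorRT's entropy calibration.)"""
    return max(abs(x) for x in samples)


def quantize(xs: Sequence[float], threshold: float) -> List[int]:
    """Symmetric linear quantization of FP32 values to INT8 in [-127, 127].
    Values beyond the threshold saturate at +/-127."""
    scale = threshold / 127.0
    return [max(-127, min(127, round(x / scale))) for x in xs]


def dequantize(qs: Sequence[int], threshold: float) -> List[float]:
    """Map INT8 codes back to approximate FP32 values."""
    scale = threshold / 127.0
    return [q * scale for q in qs]
```

Within the threshold, the round-trip error is at most half a quantization step (threshold / 254). Max-abs calibration lets a single outlier inflate the step for every value; choosing a smaller, saturating threshold, as entropy calibration does, trades clipping a few outliers for finer resolution on the bulk of the distribution.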

NVIDIA TENSORRT
Maximize inference throughput for latency-critical services.
• Data center (Tesla: Pascal, Volta): large batch, low latency, production-ready
• Automotive (Drive PX (Xavier)): real-time execution, high resolution, high throughput, small footprint
• Embedded (Jetson TX): low power, small footprint, multi-inference

“On average TensorRT has doubled the speed of our inference which is pretty amazing!” (Source: Paul Kruszewski, CEO, WRNCH)
“Self-driving cars having real-time execution is obviously very important. With our ResNet101 network, TensorRT brought our inference time down from 250ms to 89ms.”
“On average we see around 10x speedup, with between 3-70x speedups depending on the scenarios.” (Source: Matthew Zeiler, CEO, Clarifai)

FAST IMPLEMENTATION OF TENSORFLOW

EXAMPLE WORKFLOWS

DL DATACENTER WORKFLOW
TensorRT increases productivity and shortens time to results.
TRAIN (tune, compile + runtime) → SCORE + OPTIMIZE, with visualization → DEPLOY, automated with TensorRT → INFERENCE & MICROSERVICES (model zoo, REST API) → RESULT (inference, prediction). Deployed models are compared through A/B testing, using data.
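For A/B testing models served behind a REST API, one common approach is deterministic, hash-based traffic splitting, so each user consistently sees the same model variant. The sketch below is an illustration under that assumption: `choose_variant` and the 10% default split are hypothetical and not part of TensorRT or any NVIDIA product.

```python
import hashlib


def choose_variant(user_id: str, fraction_b: float = 0.1) -> str:
    """Route a request to model "A" or model "B" by hashing the user id.

    Hashing (rather than a random choice per request) keeps each user on the
    same variant across requests; about fraction_b of users go to model B.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in [0, 10000)
    return "B" if bucket < fraction_b * 10_000 else "A"
```

The inference microservice would call `choose_variant` before dispatching a request to the corresponding deployed engine, then log the variant alongside the result so the two models' outcomes can be compared.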

DL EDGE/IVA WORKFLOW
Transfer learning: train and deploy to the edge in less than a minute.
NVIDIA DIGITS: >10k pulls, >2.5k stars.
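Transfer learning is what makes “train and deploy in under a minute” plausible: the layers of a network already trained on a large dataset are reused as a frozen feature extractor, and only a small new output head is trained on the new task. The pure-Python sketch below shows that split. The `frozen_backbone` function is a hypothetical fixed feature map standing in for pretrained layers, and the head is plain logistic regression trained with stochastic gradient descent; none of this is how DIGITS implements transfer learning.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Backbone = Callable[[Sequence[float]], List[float]]


def frozen_backbone(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for pretrained layers: a fixed feature map that
    is never updated during transfer learning."""
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias feature


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(data: Sequence[Tuple[Sequence[float], int]], backbone: Backbone,
               epochs: int = 200, lr: float = 0.5) -> List[float]:
    """Train only a new logistic-regression head on top of frozen features."""
    # Features are computed once because the backbone's weights never change.
    feats = [(backbone(x), y) for x, y in data]
    w = [0.0] * len(feats[0][0])
    for _ in range(epochs):
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            # Stochastic gradient step on the log loss, head weights only.
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
    return w


def predict(w: Sequence[float], backbone: Backbone, x: Sequence[float]) -> int:
    z = sum(wi * fi for wi, fi in zip(w, backbone(x)))
    return 1 if z > 0 else 0


if __name__ == "__main__":
    # Hypothetical new task: label is 1 when both coordinates share a sign.
    rng = random.Random(0)
    data = []
    while len(data) < 200:
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        if abs(x[0] * x[1]) > 0.05:
            data.append((x, 1 if x[0] * x[1] > 0 else 0))
    w = train_head(data, frozen_backbone)
    acc = sum(predict(w, frozen_backbone, x) == y for x, y in data) / len(data)
    print(f"training accuracy: {acc:.2f}")
```

Because the backbone's features are computed once and only four head weights are learned, training is cheap; the same reasoning is why retraining just the final layers of a pretrained network is fast.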

DEMO: DEEP LEARNING WORKFLOW
Transfer learning: train and deploy to the edge in less than a minute.
A special THANK YOU! Zheng Liu & Varun Praveen

IN SUMMARY

WHO, WHAT, WHERE
• RESEARCHERS: explore the “next big thing” opportunity to fuel business, and find ways to productize it
• APPLIED DL / DATA SCIENTISTS: retrain with the right data, tune, and productize models for consistency and quality
• APPLICATION DEVELOPERS: scale and deploy successful applications with a great user experience
Applications: recommendation engines, language translation, image classification, sentiment analysis, voice recognition, object detection. Frameworks: Paddle. Stages: training, or deploying with TensorRT.
