
SINGA: Putting Deep Learning into the Hands of Multimedia Users



Presentation Transcript


  1. SINGA: Putting Deep Learning into the Hands of Multimedia Users http://singa.apache.org/ Wei Wang, Gang Chen, Tien Tuan Anh Dinh, Jinyang Gao, Beng Chin Ooi, Kian-Lee Tan, and Sheng Wang

  2. • Introduction
       • Multimedia data and applications
       • Motivations: deep learning models and training, and design principles
     • SINGA
       • Usability
       • Scalability
       • Implementation
     • Experiments

  3. Introduction
     Multimedia data (audio, image/video, text) powers applications in social media, e-commerce, and health-care, and deep learning has been noted for its effectiveness for multimedia applications! A wave of deep-learning startups illustrates the trend: VocalIQ and Perceptio (acquired by Apple), Madbits (acquired by Twitter), LookFlow (acquired by Yahoo! Flickr), AlchemyAPI (acquired by IBM), Deepomatic (e-commerce product search), Descartes Labs (satellite images), Clarifai (tagging), Semantria (NLP tasks in >10 languages), ParallelDots, and Ldibon.

  4. Motivations: Model Categories
     • Feedforward models (CNN, MLP, auto-encoder): image/video classification (Krizhevsky, Sutskever, and Hinton, 2012; Szegedy et al., 2014; Simonyan and Zisserman, 2014a)

  5. Motivations: Model Categories
     • Energy models (DBN, RBM, DBM): speech recognition (Dahl et al., 2012)

  6. Motivations: Model Categories
     • Recurrent neural networks (RNN, LSTM, GRU): natural language processing (Mikolov et al., 2010; Cho et al., 2014)

  7. Motivations: Model Categories
     Three categories in all: feedforward models (CNN, MLP, auto-encoder), energy models (DBN, RBM, DBM), and recurrent neural networks (RNN, LSTM, GRU).
     Design Goal I. Usability: easy to implement various models.

  8. Motivations: Training Process
     • Training process: update the model parameters to minimize the prediction error
     • Training algorithm: mini-batch Stochastic Gradient Descent (SGD), with gradients computed by back-propagation (BP) or contrastive divergence (CD)
     • Training time = (time per SGD iteration) x (number of SGD iterations)
     • It takes a long time to train large models over large datasets, e.g., 2 weeks for training OverFeat (Sermanet et al.), as reported by Intel (https://software.intel.com/sites/default/files/managed/74/15/SPCS008.pdf). A sketch of the SGD update follows.
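
     To make the cost model concrete, here is a minimal sketch of one mini-batch SGD step in C++; the names are illustrative, not SINGA's API:

       #include <cstddef>
       #include <vector>

       // One SGD step: w <- w - lr * g, where g is the gradient of the
       // prediction error averaged over the current mini-batch
       // (computed by BP for feedforward/recurrent models, CD for RBMs).
       void SGDStep(std::vector<float>& w, const std::vector<float>& g,
                    float lr) {
         for (std::size_t i = 0; i < w.size(); ++i)
           w[i] -= lr * g[i];
       }
       // Total training time = (cost of computing g and applying the update)
       //                       x (number of iterations until convergence).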

  9. Motivations: Distributed Training Frameworks
     • Synchronous training (Google Sandblaster, Dean et al., 2012; Baidu AllReduce, Wu et al., 2015)
       • Reduces the time per iteration
       • Scalable on a single node with multiple GPUs, but cannot scale to a large cluster
     • Asynchronous training (Google Downpour, Dean et al., 2012; Hogwild!, Recht et al., 2011)
       • Reduces the number of iterations per machine
       • Scalable on a big cluster of commodity (CPU) machines, but training is less stable
     • Hybrid frameworks combine the two
     Design Goal II. Scalability: not just flexible, but also efficient and adaptive to run different training frameworks. The sketch below contrasts the two styles.
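
     A minimal sketch of the difference between the two styles (illustrative C++, not SINGA's actual classes):

       #include <cstddef>
       #include <vector>

       // Synchronous: wait for all workers, aggregate their gradients, then
       // apply a single update; one slow worker delays the whole group.
       void SyncUpdate(std::vector<float>& w,
                       const std::vector<std::vector<float>>& grads,
                       float lr) {
         for (std::size_t i = 0; i < w.size(); ++i) {
           float sum = 0;
           for (const auto& g : grads) sum += g[i];
           w[i] -= lr * sum / grads.size();
         }
       }

       // Asynchronous: apply each worker's gradient as soon as it arrives;
       // workers may read slightly stale parameters (Downpour/Hogwild! style).
       void AsyncUpdate(std::vector<float>& w, const std::vector<float>& g,
                        float lr) {
         for (std::size_t i = 0; i < w.size(); ++i)
           w[i] -= lr * g[i];
       }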

  10. SINGA: A Distributed Deep Learning Platform

  11. Usability: Abstractions
      The main abstractions are NeuralNet, Layer, and TrainOneBatch (which loops until a stop condition). A layer wraps its feature and gradient blobs, its parameters, and the logic to compute them:

        class Layer {
          vector<Blob> data, grad;
          vector<Param*> param;
          ...
          void Setup(LayerProto& conf, vector<Layer*> src);
          void ComputeFeature(int flag, vector<Layer*> src);
          void ComputeGradient(int flag, vector<Layer*> src);
        };
        Driver::RegisterLayer<FooLayer>("Foo"); // register new layers
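
      As an illustration, a user-defined layer only needs to fill in these three methods. A minimal sketch of the FooLayer registered above, assuming the Layer, Blob, LayerProto, and Param types from the snippet (the method bodies are placeholders, not SINGA's real code):

        class FooLayer : public Layer {
         public:
          void Setup(LayerProto& conf, vector<Layer*> src) {
            // Read hyper-parameters from conf and shape the data/grad/param
            // blobs to match the source layers' outputs.
          }
          void ComputeFeature(int flag, vector<Layer*> src) {
            // Forward pass: read src[i]->data, write this layer's data blobs.
          }
          void ComputeGradient(int flag, vector<Layer*> src) {
            // Backward pass: fill the param gradients and the source layers'
            // grad blobs.
          }
        };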

  12. Usability: Neural Net Representation
      [Figure: a NeuralNet is a graph of layers, from an input layer through hidden layers to a loss layer with labels; the same representation expresses feedforward models (e.g., CNN), RBMs, and RNNs.]

  13. Usability: TrainOneBatch
      TrainOneBatch runs one mini-batch through the NeuralNet, and the training loop repeats it until the stop condition: back-propagation (BP) for feedforward models and RNNs, contrastive divergence (CD) for RBMs. Users just need to override the TrainOneBatch function to implement other algorithms! A BP-style sketch follows.
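
      A sketch of a BP-style TrainOneBatch, assuming the NeuralNet can return its layers in topological order and the source layers of each layer; kTrain, layers(), and srcof() are assumptions for illustration, not SINGA's exact API:

        void TrainOneBatch(NeuralNet* net) {
          const vector<Layer*>& layers = net->layers(); // topological order
          // Forward pass: compute features layer by layer.
          for (int i = 0; i < (int)layers.size(); ++i)
            layers[i]->ComputeFeature(kTrain, net->srcof(layers[i]));
          // Backward pass: compute gradients in reverse order.
          for (int i = (int)layers.size() - 1; i >= 0; --i)
            layers[i]->ComputeGradient(kTrain, net->srcof(layers[i]));
          // Parameter gradients are then sent to the servers for the update.
        }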

  14. Scalability: Partitioning for Distributed Training
      NeuralNet partitioning strategies:
      1. Partition the layers into different subsets, one per worker.
      2. Partition each single layer on the batch dimension.
      3. Partition each single layer on the feature dimension.
      4. Hybrid partitioning combining 1, 2 and 3.
      Users just need to CONFIGURE the partitioning scheme; SINGA takes care of the real work (e.g., slicing and connecting layers). The sketch below illustrates strategy 2.
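
      For intuition, batch-dimension partitioning (strategy 2) slices each mini-batch across workers, which all run a replica of the same layer. SINGA realizes this by inserting slice/connection layers automatically; the snippet below only illustrates the slicing arithmetic, with hypothetical names:

        #include <algorithm>

        struct Range { int begin, end; };

        // Which rows of a mini-batch a given worker processes.
        Range BatchSlice(int batch_size, int num_workers, int worker_id) {
          int base = batch_size / num_workers;
          int extra = batch_size % num_workers;
          int begin = worker_id * base + std::min(worker_id, extra);
          int size = base + (worker_id < extra ? 1 : 0);
          return {begin, begin + size};
        }
        // e.g., batch_size=256, num_workers=4: worker 0 gets rows [0,64),
        // worker 1 gets [64,128), and so on.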

  15. Scalability: Training Framework
      [Figure: cluster topology, with a worker group (several workers training one neural net) exchanging parameters with a server group (several servers) via inter-node communication.]
      Synchronous training cannot scale to a large group size.

  16. Scalability: Training Framework
      [Figure: cluster topology with multiple worker groups and server groups spread across nodes.]
      Communication is the bottleneck!

  17. Scalability: Training Frameworks
      SINGA is able to configure most known frameworks:
      • Synchronous: (a) Sandblaster, (b) AllReduce
      • Asynchronous: (c) Downpour, (d) distributed Hogwild

  18. Implementation: SINGA Software Stack
      • Main thread: Driver::Train() launches the job, then Stub::Run() routes messages between local threads and remote nodes.
      • Worker thread: while (not stop): Worker::TrainOneBatch()
      • Server thread: while (not stop): Server::Update()
      Built-in models (CNN, RBM, RNN) run on top of the Driver, Stub, Worker, and Server components; optional components include Mesos, Zookeeper, HDFS, DiskFile, and Docker; supported platforms are Ubuntu, CentOS, and MacOS.
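
      Putting the loops together, a minimal sketch of the per-thread control flow (illustrative only; the stop flag accessor and the message plumbing are assumptions, not SINGA's exact code):

        void RunWorker(Worker* worker) {
          while (!worker->stop())
            worker->TrainOneBatch(); // compute gradients, ship to servers
        }

        void RunServer(Server* server) {
          while (!server->stop())
            server->Update();        // apply received gradients to parameters
        }
        // The main thread calls Driver::Train(), which spawns the worker and
        // server threads above and then runs Stub::Run() to route parameter
        // messages to local threads and remote nodes.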

  19. Deep Learning as a Service (DLaaS): SINGA's Rafiki
      [Architecture: third-party apps (web, mobile, ...) and developers (browser GUI) issue HTTP requests to the Rafiki Server, which handles user, job, model, and node management plus routing (load balancing), backed by a database and a file storage system (e.g., HDFS); requests are forwarded to Rafiki Agents, each driving SINGA instances through Timon, a C++ wrapper.]
      Goals:
      1. Improve the usability of SINGA.
      2. "Level" the playing field by taking care of the complex system plumbing work and its reliability, efficiency and scalability.

  20. Comparison: Features of the Systems
      [Table: feature comparison with other open-source projects; MXNet as of 28/09/15.]

  21. Experiment --- Usability
      Used SINGA to train three known models and verified the results.
      RBM and deep auto-encoders: Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.

  22. Experiment --- Usability
      Deep multi-modal neural networks (CNN + MLP):
      W. Wang, X. Yang, B. C. Ooi, D. Zhang, Y. Zhuang: Effective Deep Learning Based Multi-Modal Retrieval. VLDB Journal, special issue of VLDB'14 best papers, 2015.
      W. Wang, B. C. Ooi, X. Yang, D. Zhang, Y. Zhuang: Effective Multi-Modal Retrieval Based on Stacked Auto-Encoders. Int'l Conference on Very Large Data Bases (VLDB), 2014.

  23. Experiment --- Usability
      Recurrent neural network language model: Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: Recurrent neural network based language model. INTERSPEECH 2010, Makuhari, Chiba, JP.

  24. Experiment --- Efficiency and Scalability
      Trained a DCNN over CIFAR-10 (https://code.google.com/p/cuda-convnet) with synchronous training; Caffe on a GTX 970 as the baseline.
      • Single node: 4 NUMA nodes (Intel Xeon 7540, 2.0 GHz), 6 cores per node with hyper-threading enabled, 500 GB memory.
      • Cluster: 32 nodes, each with a quad-core Intel Xeon 3.1 GHz CPU and 8 GB memory, connected by a 1 Gbps switch; 4 workers per node.

  25. Experiment --- Scalability
      Trained the same DCNN over CIFAR-10 (https://code.google.com/p/cuda-convnet) with asynchronous training, comparing SINGA against Caffe on the single node and on the cluster.

  26. Conclusions
      • Programming model, abstractions, and system architecture
        • Easy to implement different models
        • Flexible and efficient to run different training frameworks
      • Experiments
        • Trained models from the different categories
        • Scalability tests for different training frameworks
      • SINGA: usable, extensible, efficient and scalable
        • Apache SINGA v0.1.0 has been released
        • v0.2.0 (with GPU-CPU, DLaaS, and more features) out next month
        • Being used for healthcare analytics, product search, ...

  27. Thank You!
      Acknowledgement: the Apache SINGA team (ASF mentors, contributors, committers, and users) and funding agencies (NRF, MOE, A*STAR).
