
READINGS IN DEEP LEARNING



Presentation Transcript


  1. READINGS IN DEEP LEARNING 4 Sep 2013

  2. ADMINISTRIVIA • New course numbers (11-785/786) are assigned • Should be up on the hub shortly • Lab assignment 1 is up • Due date: 2 weeks from today • Google group: is everyone on? • Website issues • WordPress not yet an option (CMU CS setup) • Piazza?

  3. Poll for next 2 classes • Monday, Sep 9 • The perceptron: A probabilistic model for information storage and organization in the brain • Rosenblatt • Not really about the logistic perceptron, more about the probabilistic interpretation of learning in connectionist networks • The Organization of Behavior • Donald Hebb • About the Hebbian learning rule

  4. Poll for next 2 classes • Wed, Sep 11 • Optimal unsupervised learning in a single-layer linear feedforward neural network • Terence Sanger • Generalized Hebbian learning rule • The Widrow-Hoff learning rule • Widrow and Hoff • Will be presented by Pallavi Baljekar
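
For readers unfamiliar with the two rules named on this slide, here is a minimal NumPy sketch of a single update step for each: Sanger's generalized Hebbian rule for a single-layer linear network, and the Widrow-Hoff (LMS) rule for a linear unit. The function names, learning rate, and toy data are illustrative choices, not taken from the papers.

```python
import numpy as np

def sanger_update(W, x, eta=0.01):
    """One step of Sanger's generalized Hebbian rule for a linear layer y = W x.
    Drives the rows of W toward the leading principal components of the input."""
    y = W @ x
    # np.tril keeps only the k <= i terms of the decorrelating sum
    dW = eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W + dW

def widrow_hoff_update(w, x, target, eta=0.01):
    """One step of the Widrow-Hoff (LMS / delta) rule for a linear unit y = w . x."""
    return w + eta * (target - w @ x) * x

# Toy usage on random data
rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(2, 5))   # 2 output units, 5 inputs
w = np.zeros(5)
for _ in range(1000):
    x = rng.normal(size=5)
    W = sanger_update(W, x)
    w = widrow_hoff_update(w, x, target=x.sum())
```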

  5. Notices • Success of the course depends on good presentations • Please send in your slides 1-2 days before your presentation • So that we can ensure they are OK • You are encouraged to discuss your papers with us/your classmates while preparing for them • Use the Google group for discussion

  6. A new project • Distributed large-scale training of NNs • Looking for volunteers

  7. The Problem: Distributed data • Training enormous networks • Billions of units • From large amounts of data • Billions or trillions of instances • Data may be localized • Or distributed

  8. The problem: Distributed computing • A single computer will not suffice • Need many processors • Tens, hundreds, or thousands of computers • Of possibly varying types and capacities

  9. Challenge • Getting the data to the computers • Tons of data to many computers • Bandwidth problems • Timing issues • Synchronizing the learning

  10. Logistical Challenges • How to transfer vast amounts of data to processors • Which processor gets how much data • Not all processors are equally fast • Not all data take equal amounts of time to process • … and which data • Data locality

  11. Learning Challenges • How to transfer parameters to processors • Networks are large, billions or trillions of parameters • Each processor must have the latest copy of parameters • How to receive updates from processors • Each processor learns on local data • Updates from all processors must be pooled
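
The slide leaves the pooling mechanism open; a minimal sketch of the usual data-parallel pattern, in which each worker computes a gradient on its local shard and a supervisor averages the gradients before applying a single update, is below. The squared loss, shard sizes, and learning rate are illustrative choices, and network transport is elided.

```python
import numpy as np

def local_gradient(params, shard):
    """Worker-side step: gradient of a squared loss on the worker's local shard."""
    X, y = shard
    return 2.0 * X.T @ (X @ params - y) / len(y)

def pooled_update(params, shards, lr=0.01):
    """Supervisor-side step: gather one gradient per worker, average, apply once."""
    grads = [local_gradient(params, s) for s in shards]  # in reality received over the network
    return params - lr * np.mean(grads, axis=0)

# Toy usage: 4 "workers", each holding its own shard of a linear-regression problem
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
shards = []
for _ in range(4):
    X = rng.normal(size=(100, 3))
    shards.append((X, X @ true_w))
params = np.zeros(3)
for _ in range(200):
    params = pooled_update(params, shards)
```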

  12. Learning Challenges • Synchronizing processor updates • Some processors slower than others • Inefficient to wait for slower ones • In order to update parameters at all processors • Requires asynchronous updates • Each processor updates when done • Problem: Different processors now have different sets of parameters • Other processors may have updated parameters already • Requires algorithmic changes • How to update asynchronously • Which updates to trust
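
The slide does not commit to a scheme; a toy sketch of one common option, tracking a parameter version and down-weighting or rejecting updates computed against stale parameters, is below. The staleness threshold and the 1/(1+staleness) weighting are arbitrary illustrative choices.

```python
import numpy as np

class AsyncParameterStore:
    """Toy supervisor-side store: applies worker updates as they arrive, and
    down-weights or rejects updates that were computed against stale parameters."""

    def __init__(self, dim, lr=0.01, max_staleness=10):
        self.params = np.zeros(dim)
        self.version = 0
        self.lr = lr
        self.max_staleness = max_staleness

    def read(self):
        # A worker pulls the current parameters and remembers which version it saw.
        return self.params.copy(), self.version

    def apply(self, grad, seen_version):
        staleness = self.version - seen_version
        if staleness > self.max_staleness:
            return False                    # too stale: reject the update
        scale = 1.0 / (1.0 + staleness)     # older updates count for less
        self.params -= self.lr * scale * grad
        self.version += 1
        return True

# Toy usage: a slow worker's update arrives after 12 faster updates and is rejected
store = AsyncParameterStore(dim=3)
_, slow_version = store.read()
for _ in range(12):
    _, v = store.read()
    store.apply(np.ones(3), v)                 # fresh updates, accepted at full weight
print(store.apply(np.ones(3), slow_version))   # False: 12 versions stale
```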

  13. Current Solutions • Faster processors • GPUs • GPU programming required • Large simple clusters • Simple distributed programming • Large heterogeneous clusters • Techniques for asynchronous learning

  14. Current Solutions • Still assume data distribution is not a major problem • Assume relatively fast connectivity • Gigabit Ethernet • Fundamentally cluster-computing based • Local area network

  15. New project • Distributed learning • Wide area network • Computers distributed across the world

  16. New project • Supervisor/Worker architecture • One or more supervisors • May be a hierarchy • A large number of workers • Supervisors in charge of resource and task allocation, gathering and redistributing updates, synchronization

  17. New project • Challenges • Data allocation • Optimal policy for data distribution • Minimal latency • Maximum locality

  18. New project • Challenges • Computation allocation • Optimal policy for learning • Compute load proportional to compute capacity • Reallocation of data/tasks as required
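
The slide only states the goal; a minimal sketch of a compute-proportional allocation, splitting instances according to each worker's measured throughput so that all workers finish at roughly the same time, is below. The throughput numbers are made up.

```python
def proportional_allocation(n_instances, throughputs):
    """Split n_instances across workers in proportion to their throughput
    (instances/second), so that all workers finish at roughly the same time."""
    total = sum(throughputs)
    alloc = [int(n_instances * t / total) for t in throughputs]
    alloc[-1] += n_instances - sum(alloc)   # hand any rounding remainder to the last worker
    return alloc

# Toy usage: one million instances over three machines of very different speeds
print(proportional_allocation(1_000_000, [50.0, 200.0, 750.0]))
# [50000, 200000, 750000]
```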

  19. New project • Challenges • Parameter allocation • Do we have to distribute all parameters? • Can learning be local?

  20. New project • Challenges • Trustable updates • Different processors/LANs have different speeds • How do we trust their updates • Do we incorporate or reject?

  21. New project • Optimal resynchronization: how much do we transmit? • Should not have to retransmit everything • Entropy coding? • Bit-level optimization?
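
As one concrete reading of "should not have to retransmit everything", the sketch below sends only the parameters whose change exceeds a threshold, as a sparse index/value delta; entropy coding of that delta would be a further step and is not shown. The threshold is an arbitrary illustrative value.

```python
import numpy as np

def encode_delta(old, new, threshold=1e-3):
    """Sparse delta: indices and values of parameters that changed meaningfully."""
    diff = new - old
    idx = np.flatnonzero(np.abs(diff) > threshold)
    return idx, diff[idx]

def apply_delta(params, idx, values):
    out = params.copy()
    out[idx] += values
    return out

# Toy usage: only 2 of 1000 parameters moved enough to be worth transmitting
rng = np.random.default_rng(0)
old = rng.normal(size=1000)
new = old.copy()
new[[3, 500]] += 0.5
idx, vals = encode_delta(old, new)
assert np.allclose(apply_delta(old, idx, vals), new)
print(len(idx))   # 2
```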

  22. Possibilities • Massively parallel learning • Never-ending learning • Multimodal learning • GAIA..

  23. Asking for Volunteers • Will be an open source project • Write to Anders

  24. Today • Bain’s theory: Lars Mahler • Linguist, mathematician, philosopher • One of the earliest people to propose a connectionist architecture • Anticipated many modern ideas • McCulloch and Pitts: Kartik Goyal • Early model of the neuron: threshold gates • Earliest model to consider excitation and inhibition
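
For reference on the second reading, a minimal sketch of a McCulloch-Pitts threshold unit in its absolute-inhibition form: any active inhibitory input vetoes firing, otherwise the unit fires when enough excitatory inputs are active.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit with absolute inhibition: the unit fires (returns 1)
    only if no inhibitory input is active and the number of active excitatory
    inputs reaches the threshold."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# Toy usage: a 2-input AND gate (threshold 2) and the effect of inhibition
print(mcculloch_pitts([1, 1], [], threshold=2))    # 1
print(mcculloch_pitts([1, 0], [], threshold=2))    # 0
print(mcculloch_pitts([1, 1], [1], threshold=2))   # 0: inhibition vetoes firing
```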
