
HiPS : Hierarchical Parameter Synchronization in Large-Scale Distributed Machine Learning

This paper discusses the challenges in communication and synchronization in large-scale distributed machine learning and proposes HiPS, a hierarchical parameter synchronization approach. The paper presents the HiPS design, theoretical evaluation, simulation evaluation, and testbed evaluation, showcasing its effectiveness in reducing communication costs and improving performance in distributed machine learning.


Presentation Transcript


  1. HiPS: Hierarchical Parameter Synchronization in Large-Scale Distributed Machine Learning. Jinkun Geng, Dan Li, Yang Cheng, Shuai Wang, and Junfeng Li

  2. ACM SIGCOMM Workshop on NetAI

  3. Background • Distributed Machine Learning = Computation + Communication

  4. Background • Strong Computation Power (GPU & TPU)

  5. Background • Communication Challenge • TCP: High Latency & Low Throughput, Kernel Overheads, etc. • RDMA: A Promising Alternative to TCP

  6. Background • A MNIST Benchmark with 1 Million Parameters

  7. Background • RoCE/RDMA: Multi-Vendor Ecosystem • Many Problems in Fat-Tree based Deployment

  8. Background • Fat-Tree based Deployment • PFC pause frame storm [SIGCOMM'15, '16, NS-3 Simulation] • Resilient RoCE: Performance Sacrifice [Chelsio Tech] • Synchronization Performance

  9. Background • Fat-Tree based Deployment • PFC pause frame storm [SIGCOMM'15, '16] • Resilient RoCE: Performance Sacrifice • Server-Centric Networks

  10. Background • Fat-Tree based Deployment • Synchronization Performance • Hierarchical Synchronization

  11. Background • Server-Centric Networks • Fewer hops lead to fewer PFC pause frames • Servers prevent the cascading effect of PFC pause frames

  12. Background • Synchronization Algorithm • PS-based • Mesh-based • Ring-based

  13. Background • Synchronization Algorithm • PS-based (Pull + Push)
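
A minimal single-process sketch of the pull + push pattern named on this slide, assuming a single in-memory parameter server and random stand-in gradients (not the implementation measured in the talk):

import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.params = np.zeros(dim)

    def pull(self):
        # Pull phase: workers fetch the latest global parameters.
        return self.params.copy()

    def push(self, grad, lr=0.01):
        # Push phase: workers send gradients; the server applies the update.
        self.params -= lr * grad

ps = ParameterServer(dim=4)
for step in range(3):
    weights = ps.pull()
    grad = np.random.randn(weights.shape[0])  # stand-in for a locally computed gradient
    ps.push(grad)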

  14. Background • Synchronization Algorithm • Mesh-based (Diffuse + Collect)
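
A rough single-process sketch of the diffuse + collect pattern, assuming N fully meshed workers where worker i is responsible for aggregating shard i; the in-memory lists stand in for the mesh of network transfers:

import numpy as np

N = 4                                                   # number of workers (assumed)
grads = [np.arange(8, dtype=float) * (w + 1) for w in range(N)]

# Diffuse: worker w splits its gradient into N shards and sends shard i to worker i.
shards = [np.array_split(g, N) for g in grads]

# Worker i aggregates the i-th shard received from every peer.
aggregated = [sum(shards[w][i] for w in range(N)) / N for i in range(N)]

# Collect: every worker gathers the aggregated shards back from all peers.
result = np.concatenate(aggregated)
assert np.allclose(result, sum(grads) / N)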

  15. Background • Synchronization Algorithm • Ring-based (Scatter + Gather)

  16. Background • Synchronization Algorithm • Ring-based (Scatter + Gather)
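
A minimal single-process sketch of ring synchronization in its two phases, scatter(-reduce) followed by (all)gather; the ring size and toy tensors are assumptions for illustration:

import numpy as np

N = 4                                                   # ring size (assumed)
# Each worker starts with its own gradient, split into N chunks.
chunks = [np.array_split(np.ones(8) * (w + 1), N) for w in range(N)]

# Scatter phase: in step s, worker w sends chunk (w - s) to its right neighbour,
# which adds it to its own copy; after N - 1 steps each worker holds one fully
# reduced chunk.
for s in range(N - 1):
    for w in range(N):
        idx = (w - s) % N
        chunks[(w + 1) % N][idx] = chunks[(w + 1) % N][idx] + chunks[w][idx]

# Gather phase: in step s, worker w forwards the reduced chunk (w + 1 - s) to its
# right neighbour; after N - 1 steps every worker holds all reduced chunks.
for s in range(N - 1):
    for w in range(N):
        idx = (w + 1 - s) % N
        chunks[(w + 1) % N][idx] = chunks[w][idx]

# All workers end with the same fully reduced gradient.
total = np.concatenate(chunks[0])
assert all(np.allclose(np.concatenate(chunks[w]), total) for w in range(N))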

  17. HiPS Design • Map Logical View and Physical Structure • Flexible (Topology-Aware) • Hierarchical (Efficient)

  18. HiPS Design • HiPS in BCube

  19. HiPS Design • HiPS in BCube

  20. HiPS Design • HiPS in BCube

  21. HiPS Design • HiPS in BCube (Server <01>)

  22. HiPS Design • HiPS in BCube

  23. HiPS Design • HiPS in Torus
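
Only the slide titles for this part of the design survive in the transcript. As a rough illustration of the hierarchical idea (not the slides' exact algorithm), the sketch below synchronizes dimension by dimension for servers addressed with two digits as in BCube(n,1): aggregate within groups sharing the first digit, then within groups sharing the second digit, after which every server holds the global aggregate. The same per-dimension pattern applies in Torus.

import numpy as np

n = 3                                        # BCube(n, 1): n * n servers (assumed)
grads = {(a, b): np.full(4, float(a * n + b)) for a in range(n) for b in range(n)}

# Level 0: aggregate within each group of servers sharing the first digit.
level0 = {}
for a in range(n):
    partial = sum(grads[(a, b)] for b in range(n))
    for b in range(n):
        level0[(a, b)] = partial

# Level 1: aggregate the partial sums within each group sharing the second digit.
# Each such group contains exactly one partial sum per level-0 group, so the
# result is the global aggregate, now present on every server.
level1 = {}
for b in range(n):
    total = sum(level0[(a, b)] for a in range(n))
    for a in range(n):
        level1[(a, b)] = total

assert all(np.allclose(v, sum(grads.values())) for v in level1.values())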

  24. Theoretical Evaluation

  25. Theoretical Evaluation

  26. Theoretical Evaluation
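
The comparison tables from these three slides are not preserved in the transcript. For background only, and as standard results rather than figures taken from the slides, the per-worker traffic volumes of the flat schemes introduced earlier, for N workers and model size M, are roughly

\[
V_{\text{PS}} \approx 2M, \qquad
V_{\text{Mesh}} \approx 2\,\frac{N-1}{N}\,M, \qquad
V_{\text{Ring}} = 2\,\frac{N-1}{N}\,M,
\]

while a hierarchical scheme such as HiPS replaces one flat round over all N servers with one round per topology dimension over small groups (for example, groups of size n in BCube(n,1)), which is where its claimed synchronization-time advantage comes from.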

  27. Future Work • Conduct Further Comparative Study • Integrate HiPS into DML Systems

  28. Simulation Evaluation • NS-3 Simulation with VGG Workload • BCube: GST (Global Synchronization Time) reduced by 37.5%∼61.9% • Torus: GST reduced by 49.6%∼66.4% • Figures: GST Comparison with RDMA in BCube; GST Comparison with RDMA in Torus

  29. Testbed Evaluation • System Instance of HiPS: BML • Add an OP in TensorFlow • 9 Servers, each equipped with 2 RNICs (BCube(3,1)) • MNIST and VGG19 as Benchmarks • Ring Allreduce in Ring and Mesh-based (P2P) Sync in Fat-Tree as Baselines
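
The slide states that BML adds an OP in TensorFlow, but the transcript contains no code. The sketch below is only a stand-in showing where such a synchronization op would plug into a training step, wrapping a placeholder routine with tf.py_function; the real BML op is a native RDMA kernel whose name and interface are not given here.

import tensorflow as tf

def hips_sync_numpy(grad):
    # Placeholder: a real implementation would exchange 'grad' over the BCube
    # topology and return the globally averaged tensor.
    return grad

def hips_sync(grad):
    # Wrap the placeholder as a TensorFlow op; a production op would instead be
    # loaded from a compiled library (e.g., via tf.load_op_library).
    return tf.py_function(hips_sync_numpy, [grad], Tout=grad.dtype)

grads = [tf.constant([1.0, 2.0, 3.0])]
synced = [hips_sync(g) for g in grads]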

  30. Testbed Evaluation

  31. Testbed Evaluation • GST reduced by 18.7%∼56.4%

  32. Ongoing Work • Conduct Further Comparative Study • Optimize HiPS in DML Systems • More Cases of Network for AI

  33. Thanks! • NASP Research Group • https://nasp.cs.tsinghua.edu.cn/
