
The Pitfalls and Guidelines for Mobile ML Benchmarking

Presentation Transcript


  1. The Pitfalls and Guidelines for Mobile ML Benchmarking • Fei Sun • Software Engineer, AI platform, Facebook

  2. Mobile ML Deployment at Facebook

  3. ML Deployment at Facebook • Machine learning workloads used at Facebook • Neural style transfer • Person segmentation • Object detection • Classification • etc.

  4. Neural Style Transfer

  5. Person Segmentation • Mermaid effect in Instagram

  6. Object Detection

  7. Pitfalls of Mobile ML Benchmarking

  8. Hardware Variations • CPU • GPU • DSP • NPU

  9. Software Variations • ML frameworks (e.g. PyTorch/Caffe2, TFLite) • Kernel libraries (e.g. NNPACK, Eigen) • Compute libraries (e.g. OpenCL, OpenGL ES) • Proprietary libraries • Different build flags

  10. OEM Influence • Some performance-related knobs are not tunable in user space. • Changing them requires a rooted phone or a specially built phone. • But the OEM/chipset vendors can change them for different scenarios. • Some algorithms (e.g. scheduling) are hard-coded by OEM/chipset vendors, yet they strongly influence performance.
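
As one concrete illustration of such a knob, the sketch below (not from the talk) reads the CPU frequency governor over adb. The sysfs path is the standard Linux cpufreq interface; the helper name and structure are just for illustration, and writing to the governor (e.g. pinning it to "performance" to stabilize benchmarks) typically needs a rooted or specially built phone.

```python
import subprocess

GOVERNOR_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

def read_cpu_governor(serial=None):
    """Read the CPU frequency governor of cpu0 over adb.

    Reading usually works on stock devices; writing it typically requires a
    rooted or specially built phone, which is exactly the OEM constraint
    described in the slide above.
    """
    cmd = ["adb"] + (["-s", serial] if serial else []) + ["shell", "cat", GOVERNOR_PATH]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print("cpu0 governor:", read_cpu_governor())
```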

  11. Run-to-Run Variation • Run-to-run variation on mobile is significant • Differences between iterations of the same run • Differences between separate runs • How do we correlate the metrics collected from benchmark runs with production runs? • How do we specify the benchmark condition?
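
A minimal sketch (not from the talk) of one way to make run-to-run variation visible: summarize per-iteration latencies with a median and an approximate p90 per run, and report the spread across runs instead of collapsing everything into one mean. The function name and choice of statistics are assumptions for illustration only.

```python
import statistics

def summarize_latencies(runs):
    """Summarize per-iteration latencies (ms) from several benchmark runs.

    `runs` is a list of runs, each a list of per-iteration latencies.
    Reporting median and p90 per run, plus the spread across runs, exposes
    run-to-run variation instead of hiding it in a single average.
    """
    per_run = []
    for latencies in runs:
        ordered = sorted(latencies)
        p90 = ordered[int(0.9 * (len(ordered) - 1))]  # simple nearest-rank p90
        per_run.append({"median": statistics.median(ordered), "p90": p90})

    medians = [r["median"] for r in per_run]
    return {
        "per_run": per_run,
        "median_of_medians": statistics.median(medians),
        "run_to_run_spread_ms": max(medians) - min(medians),
    }

# Example: three runs of the same model, five iterations each (made-up numbers).
print(summarize_latencies([
    [21.3, 20.8, 22.1, 20.9, 21.0],
    [24.0, 23.5, 23.8, 24.2, 23.9],
    [21.1, 21.4, 20.7, 21.2, 21.3],
]))
```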

  12. How to Avoid Benchmark Engineering • Is it useful if • the hardware only runs the models in the benchmark suite efficiently? • the software is only bug-free when running the models in the benchmark suite? • a different binary is used to benchmark each model? • the code becomes bloated because the binary includes different algorithms for different models?

  13. Guidelines of Mobile ML Benchmarking

  14. Key Components of Mobile ML Benchmarking • Relevant mobile ML models • Representative mobile ML workloads • A unified, standardized methodology for evaluating the metrics in different scenarios

  15. Make Mobile ML Benchmarking Repeatable • Too many variables and pitfalls • Let another person judge the benchmarking result • Easily reproduce the result • Easily separate out the steps • Easily visualize results from different benchmark runs
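
One way to let someone else reproduce or judge a run is to record the benchmark condition next to the result. The sketch below captures the model, framework commit, build flags, and device in a JSON blob; all field names and the example values are illustrative assumptions, not a fixed schema.

```python
import json
import platform
from datetime import datetime, timezone

def record_benchmark_condition(model_name, framework, framework_commit,
                               build_flags, device_info):
    """Capture the conditions a result was produced under, so another person
    can reproduce the run or spot why two results differ."""
    return {
        "model": model_name,
        "framework": framework,
        "framework_commit": framework_commit,
        "build_flags": build_flags,
        "device": device_info,
        "host": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical values, purely for illustration.
condition = record_benchmark_condition(
    model_name="person_segmentation",
    framework="caffe2",
    framework_commit="abc1234",
    build_flags=["-O2", "-DNDEBUG"],
    device_info={"model": "Pixel 3", "os": "Android 9"},
)
print(json.dumps(condition, indent=2))
```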

  16. Facebook AI Performance Evaluation Platform

  17. Challenges of ML Benchmarking • Which ML models matter? • How to enhance ML model performance? • How to evaluate system performance? • How to guide future hardware design?

  18. Goals of the AI Performance Evaluation Platform • Provide a model zoo of important models • Normalize the benchmarking metrics and conditions • Automate the benchmarking process • Measure performance honestly

  19. Backend & Framework Agnostic Benchmarking Platform • Same code base deployed to different backends • Vendor needs to plugin their backend to the platform • The build/run environment is unique • The benchmark metrics are displayed in different formats • Good faith approach • Easily reproducing the result

  20. Benchmarking Flow

  21. Benchmark Harness Diagram • Harness provides: the model repository and metadata, a repo driver, benchmark preprocessing, the benchmark driver, data preprocess/postprocess pipelines, regression detection, and reporters that send results to the screen, files, or online storage. • Vendor provides: backend drivers (e.g. a Caffe2 driver or a custom framework driver) behind a standard API, plus data converters between the harness and each backend.

  22. Current Features • Supports Android, iOS, Windows, and Linux devices. • Supports PyTorch/Caffe2, TFLite, and other frameworks. • Supports latency, accuracy, FLOPs, power, and energy metrics. • Saves results locally or remotely, or displays them on screen. • Works with Git and Mercurial (hg) version control. • Caches models and binaries locally. • Continuously runs benchmarks on new commits. • A/B testing for regression detection.
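
As a sketch of the A/B-style regression check mentioned above: compare latency samples from a baseline commit and a candidate commit and flag the candidate if its median is worse by more than a tolerance. The threshold, the use of medians, and the function name are illustrative choices, not FAI-PEP's exact rule.

```python
import statistics

def detect_regression(baseline_ms, candidate_ms, threshold=0.05):
    """Flag a regression when the candidate's median latency exceeds the
    baseline's median by more than `threshold` (5% here), relative to the
    baseline."""
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    relative_change = (cand - base) / base
    return {
        "baseline_median_ms": base,
        "candidate_median_ms": cand,
        "relative_change": relative_change,
        "regression": relative_change > threshold,
    }

# Interleaving A and B runs on the same device reduces the chance that
# thermal drift is mistaken for a code regression.
print(detect_regression(
    baseline_ms=[21.0, 20.8, 21.2, 21.1],
    candidate_ms=[23.5, 23.9, 23.7, 23.6],
))
```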

  23. Open Source • Facebook AI Performance Evaluation Platform • https://github.com/facebook/FAI-PEP • The community is encouraged to contribute

  24. Questions
