
The Pitfalls and Guidelines for Mobile ML Benchmarking

Presentation Transcript


  1. The Pitfalls and Guidelines for Mobile ML Benchmarking • Fei Sun • Software Engineer, AI platform, Facebook

  2. Mobile ML Deployment at Facebook

  3. ML Deployment at Facebook • Machine learning workloads used at Facebook • Neural style transfer • Person segmentation • Object detection • Classification • etc.

  4. Neural Style Transfer

  5. Person Segmentation • Mermaid effect in Instagram

  6. Object Detection

  7. Pitfalls of Mobile ML Benchmarking

  8. Hardware Variations • CPU • GPU • DSP • NPU

  9. Software Variations • ML frameworks (e.g. PyTorch/Caffe2, TFLite) • Kernel libraries (e.g. NNPACK, Eigen) • Compute libraries (e.g. OpenCL, OpenGL ES) • Proprietary libraries • Different build flags

  10. OEM Influence • Some performance-related knobs are not tunable in user space. • Changing them requires a rooted phone or a specially built phone. • But the OEM/chipset vendors can change them for different scenarios. • Some algorithms (e.g. scheduling) are hard-coded by OEM/chipset vendors, yet they strongly influence performance.
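
As one concrete illustration of such a knob, the sketch below (not from the talk) reads the CPU frequency governor over adb. The sysfs path is the standard Linux cpufreq interface; the helper name and structure are just for illustration, and writing to the governor (e.g. pinning it to "performance" to stabilize benchmarks) typically needs a rooted or specially built phone.

```python
import subprocess

GOVERNOR_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"

def read_cpu_governor(serial=None):
    """Read the CPU frequency governor of cpu0 over adb.

    Reading usually works on stock devices; writing it typically requires a
    rooted or specially built phone, which is exactly the OEM constraint
    described in the slide above.
    """
    cmd = ["adb"] + (["-s", serial] if serial else []) + ["shell", "cat", GOVERNOR_PATH]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print("cpu0 governor:", read_cpu_governor())
```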

  11. Run-to-Run Variation • Run-to-run variation on mobile is significant • Differences between iterations of the same run • Differences between separate runs • How do we correlate the metrics collected from benchmark runs with production runs? • How do we specify the benchmark condition?
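
A minimal sketch (not from the talk) of one way to make run-to-run variation visible: summarize per-iteration latencies with a median and an approximate p90 per run, and report the spread across runs instead of collapsing everything into one mean. The function name and choice of statistics are assumptions for illustration only.

```python
import statistics

def summarize_latencies(runs):
    """Summarize per-iteration latencies (ms) from several benchmark runs.

    `runs` is a list of runs, each a list of per-iteration latencies.
    Reporting median and p90 per run, plus the spread across runs, exposes
    run-to-run variation instead of hiding it in a single average.
    """
    per_run = []
    for latencies in runs:
        ordered = sorted(latencies)
        p90 = ordered[int(0.9 * (len(ordered) - 1))]  # simple nearest-rank p90
        per_run.append({"median": statistics.median(ordered), "p90": p90})

    medians = [r["median"] for r in per_run]
    return {
        "per_run": per_run,
        "median_of_medians": statistics.median(medians),
        "run_to_run_spread_ms": max(medians) - min(medians),
    }

# Example: three runs of the same model, five iterations each (made-up numbers).
print(summarize_latencies([
    [21.3, 20.8, 22.1, 20.9, 21.0],
    [24.0, 23.5, 23.8, 24.2, 23.9],
    [21.1, 21.4, 20.7, 21.2, 21.3],
]))
```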

  12. How to Avoid Benchmark Engineering • Is it useful if • the hardware only runs the models in the benchmark suite efficiently? • the software is only bug-free when running the models in the benchmark suite? • a different binary is used to benchmark each model? • the code becomes bloated because the binary includes different algorithms for different models?

  13. Guidelines of Mobile ML Benchmarking

  14. Key Components of Mobile ML Benchmarking • Relevant mobile ML models • Representative mobile ML workloads • A unified, standardized methodology for evaluating the metrics in different scenarios

  15. Make Mobile ML Benchmarking Repeatable • Too many variables and pitfalls • Let another person judge the benchmarking result • Easily reproduce the result • Easily separate out the steps • Easily visualize results from different benchmark runs
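
One way to let someone else reproduce or judge a run is to record the benchmark condition next to the result. The sketch below captures the model, framework commit, build flags, and device in a JSON blob; all field names and the example values are illustrative assumptions, not a fixed schema.

```python
import json
import platform
from datetime import datetime, timezone

def record_benchmark_condition(model_name, framework, framework_commit,
                               build_flags, device_info):
    """Capture the conditions a result was produced under, so another person
    can reproduce the run or spot why two results differ."""
    return {
        "model": model_name,
        "framework": framework,
        "framework_commit": framework_commit,
        "build_flags": build_flags,
        "device": device_info,
        "host": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical values, purely for illustration.
condition = record_benchmark_condition(
    model_name="person_segmentation",
    framework="caffe2",
    framework_commit="abc1234",
    build_flags=["-O2", "-DNDEBUG"],
    device_info={"model": "Pixel 3", "os": "Android 9"},
)
print(json.dumps(condition, indent=2))
```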

  16. Facebook AI Performance Evaluation Platform

  17. Challenges of ML Benchmarking • Which ML models matter? • How to enhance ML model performance? • How to evaluate system performance? • How to guide future hardware design?

  18. Goals of the AI Performance Evaluation Platform • Provide a model zoo of important models • Normalize the benchmarking metrics and conditions • Automate the benchmarking process • Measure performance honestly

  19. Backend & Framework Agnostic Benchmarking Platform • Same code base deployed to different backends • Vendor needs to plugin their backend to the platform • The build/run environment is unique • The benchmark metrics are displayed in different formats • Good faith approach • Easily reproducing the result

  20. Benchmarking Flow

  21. Benchmark Harness Diagram • Harness provides: the model repository and metadata, a repo driver, benchmark preprocessing, the benchmark driver, data preprocess/postprocess pipelines, regression detection, and reporters that send results to the screen, files, or online storage. • Vendor provides: backend drivers (e.g. a Caffe2 driver or a custom framework driver) behind a standard API, plus data converters between the harness and each backend.

  22. Current Features • Supports Android, iOS, Windows, and Linux devices. • Supports PyTorch/Caffe2, TFLite, and other frameworks. • Supports latency, accuracy, FLOPs, power, and energy metrics. • Saves results locally or remotely, or displays them on screen. • Works with Git and Mercurial (hg) version control. • Caches models and binaries locally. • Continuously runs benchmarks on new commits. • A/B testing for regression detection.
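
As a sketch of the A/B-style regression check mentioned above: compare latency samples from a baseline commit and a candidate commit and flag the candidate if its median is worse by more than a tolerance. The threshold, the use of medians, and the function name are illustrative choices, not FAI-PEP's exact rule.

```python
import statistics

def detect_regression(baseline_ms, candidate_ms, threshold=0.05):
    """Flag a regression when the candidate's median latency exceeds the
    baseline's median by more than `threshold` (5% here), relative to the
    baseline."""
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    relative_change = (cand - base) / base
    return {
        "baseline_median_ms": base,
        "candidate_median_ms": cand,
        "relative_change": relative_change,
        "regression": relative_change > threshold,
    }

# Interleaving A and B runs on the same device reduces the chance that
# thermal drift is mistaken for a code regression.
print(detect_regression(
    baseline_ms=[21.0, 20.8, 21.2, 21.1],
    candidate_ms=[23.5, 23.9, 23.7, 23.6],
))
```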

  23. Open Source • Facebook AI Performance Evaluation Platform • https://github.com/facebook/FAI-PEP • The community is encouraged to contribute

  24. Questions
