This part covers incomplete methods for neural network verification: interval bound propagation and linear programming relaxation, which sidestep a non-convex, NP-hard problem by replacing the reachable set with a convex superset.
Neural Network Verification, Part 4: Incomplete Methods
Incomplete methods can prove some, but not all, true properties.
[Figure: the verification pipeline. A condition on the inputs is pushed through the deep network, and we ask: is there an erroneous output? The answer is either "Error" or "Safe".]
• Non-convexity of the reachable set makes the problem NP-hard.
• Idea: replace the non-convex set by a convex superset.
• Suppose the non-convex set has no erroneous output.
• The convex superset might still contain one, so it can give an incorrect "Error" answer; this is what makes the method incomplete.
Outline
• Interval Bound Propagation (Mirman et al., 2018; Gowal et al., 2018)
• Linear Programming Relaxation
• Results
[Figure: the same pipeline. Interval bound propagation uses an axis-aligned convex superset: a box of per-unit intervals.]
Example
Inputs: -2 ≤ x1 ≤ 2 and -2 ≤ x2 ≤ 2.
Unit a: ain = x1 + x2, aout = max{ain, 0}.
Minimum value of ain? -4. Maximum value of ain? 4. So ain ∈ [-4, 4], and applying the ReLU, aout ∈ [0, 4].
Unit b: bin = x1 - x2, bout = max{bin, 0}.
Minimum value of bin? -4. Maximum value of bin? 4. So bin ∈ [-4, 4] and bout ∈ [0, 4].
Output: z = -aout - bout.
Minimum value of z? -8. Maximum value of z? 0. So z ∈ [-8, 0].
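To make the propagation concrete, here is a minimal sketch of interval bound propagation in NumPy for the two-unit example above (the helper names ibp_affine and ibp_relu are our own, not from any library):

```python
import numpy as np

def ibp_affine(l, u, W, b):
    # Interval arithmetic for x -> W x + b: positive weights pick the
    # matching endpoint of [l, u], negative weights the opposite one.
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return W_pos @ l + W_neg @ u + b, W_pos @ u + W_neg @ l + b

def ibp_relu(l, u):
    # ReLU is monotone, so apply it to both interval endpoints.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

# The example above: ain = x1 + x2, bin = x1 - x2, inputs in [-2, 2].
W1, b1 = np.array([[1.0, 1.0], [1.0, -1.0]]), np.zeros(2)
l, u = ibp_affine(np.array([-2.0, -2.0]), np.array([2.0, 2.0]), W1, b1)
print(l, u)            # [-4. -4.] [4. 4.]
l, u = ibp_relu(l, u)  # aout, bout in [0, 4]
# Output layer: z = -aout - bout.
l, u = ibp_affine(l, u, np.array([[-1.0, -1.0]]), np.zeros(1))
print(l, u)            # [-8.] [0.]
```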
Outline
• Interval Bound Propagation
• Linear Programming Relaxation (Wong and Kolter, 2018)
• Results
Example
Verification as an optimization problem:

min z
s.t. -2 ≤ x1 ≤ 2, -2 ≤ x2 ≤ 2
     ain = x1 + x2
     bin = x1 - x2
     aout = max{ain, 0}
     bout = max{bin, 0}
     z = -aout - bout

The input bounds and the equalities defining ain, bin, and z are linear constraints: easy to handle. The ReLU constraints aout = max{ain, 0} and bout = max{bin, 0} are non-linear, and they make the problem NP-hard.
Relaxation (Ehlers, 2017)
For ain ∈ [l, u] with l < 0 < u, replace the non-convex ReLU constraint aout = max{ain, 0} with a convex superset, the triangle:
aout ≥ 0,  aout ≥ ain,  aout ≤ u(ain - l)/(u - l).
[Figure: the ReLU graph over [l, u] and the triangle enclosing it.]
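The upper face of the triangle passes through (l, 0) and (u, u), giving slope u/(u - l) and intercept -lu/(u - l). A small sketch (relu_triangle is our own name):

```python
def relu_triangle(l, u):
    # Convex-hull ("triangle") relaxation of aout = max(ain, 0)
    # for ain in [l, u] with l < 0 < u:
    #   aout >= 0,  aout >= ain,  aout <= s * ain + t
    s = u / (u - l)   # slope of the upper face
    t = -l * s        # intercept of the upper face
    return s, t

# In the running example, l = -4, u = 4 gives aout <= 0.5 * ain + 2.
print(relu_triangle(-4.0, 4.0))  # (0.5, 2.0)
```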
Example
Applying the relaxation with l = -4 and u = 4 for both units turns the problem into a Linear Program, for which several "efficient" solvers exist:

min z
s.t. -2 ≤ x1 ≤ 2, -2 ≤ x2 ≤ 2
     ain = x1 + x2
     bin = x1 - x2
     aout ≥ 0, aout ≥ ain, aout ≤ 0.5ain + 2
     bout ≥ 0, bout ≥ bin, bout ≤ 0.5bin + 2
     z = - aout - bout
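As a sanity check, this exact LP can be handed to an off-the-shelf solver. A minimal sketch using scipy.optimize.linprog; the variable ordering and encoding are our own choices:

```python
import numpy as np
from scipy.optimize import linprog

# Variable order: x1, x2, ain, bin, aout, bout, z.
c = np.array([0, 0, 0, 0, 0, 0, 1.0])        # minimize z

# Equalities: ain = x1 + x2, bin = x1 - x2, z = -aout - bout.
A_eq = np.array([[-1, -1, 1, 0, 0, 0, 0],
                 [-1,  1, 0, 1, 0, 0, 0],
                 [ 0,  0, 0, 0, 1, 1, 1]], dtype=float)
b_eq = np.zeros(3)

# Relaxed ReLU constraints (aout >= 0, bout >= 0 live in the bounds):
#   ain - aout <= 0         i.e.  aout >= ain
#   aout - 0.5*ain <= 2     i.e.  aout <= 0.5*ain + 2, same for b.
A_ub = np.array([[0, 0,  1.0, 0,   -1,  0, 0],
                 [0, 0, -0.5, 0,    1,  0, 0],
                 [0, 0, 0,  1.0,    0, -1, 0],
                 [0, 0, 0, -0.5,    0,  1, 0]], dtype=float)
b_ub = np.array([0, 2, 0, 2], dtype=float)

bounds = [(-2, 2), (-2, 2), (None, None), (None, None),
          (0, None), (0, None), (None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=bounds, method="highs")
print(res.fun)  # -6.0: tighter than the IBP bound of -8,
                # but looser than the true minimum of -4
```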
Outline
• Interval Bound Propagation
• Linear Programming Relaxation
  • LP Duality
• Results
Example
min_x -3x1 - x2 - 2x3
s.t. -x1 ≤ 0, -x2 ≤ 0, -x3 ≤ 0
     x1 + x2 + 3x3 ≤ 30
     2x1 + 2x2 + 5x3 ≤ 24
     4x1 + x2 + 2x3 ≤ 36

Scale the constraints and add them up: 2 × (-x2 ≤ 0), 7 × (-x3 ≤ 0), and 3 × (x1 + x2 + 3x3 ≤ 30) sum to
3x1 + x2 + 2x3 ≤ 90,
a lower bound on the solution: -3x1 - x2 - 2x3 ≥ -90.
Example
A different scaling: 1 × (-x1 ≤ 0) and 1 × (4x1 + x2 + 2x3 ≤ 36) sum to
3x1 + x2 + 2x3 ≤ 36,
a tighter lower bound on the solution: -3x1 - x2 - 2x3 ≥ -36.
What scaling gives the tightest lower bound?
Example
Attach a multiplier to each constraint: y1, y2, y3 to -x1 ≤ 0, -x2 ≤ 0, -x3 ≤ 0, and y4, y5, y6 to the three remaining constraints. We should be able to add up the inequalities, so y1, y2, y3, y4, y5, y6 ≥ 0.
For the sum to bound the objective, the coefficient of x1 should be 3: -y1 + y4 + 2y5 + 4y6 = 3.
The coefficient of x2 should be 1: -y2 + y4 + 2y5 + y6 = 1.
The coefficient of x3 should be 2: -y3 + 3y4 + 5y5 + 2y6 = 2.
And the lower bound should be tightest: max_y -30y4 - 24y5 - 36y6.
Dual
max_y -30y4 - 24y5 - 36y6
s.t. y1, y2, y3, y4, y5, y6 ≥ 0
     -y1 + y4 + 2y5 + 4y6 = 3
     -y2 + y4 + 2y5 + y6 = 1
     -y3 + 3y4 + 5y5 + 2y6 = 2

The original problem is called the primal. The dual of the dual is the primal.
Primal: min_x cᵀx  s.t.  Ax ≤ b
Dual:  max_{y ≥ 0} -bᵀy  s.t.  Aᵀy = -c
(with signs chosen to match the example above)
Weak Duality
For any feasible primal solution x and any feasible dual solution y:
primal value at x ≥ dual value at y.
The dual provides a lower bound.
Strong Duality
For an optimal primal solution x* and an optimal dual solution y*:
primal value at x* = dual value at y*.
(Some mild conditions have to be satisfied.)
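To see both forms of duality at work on the numerical example above, here is a minimal sketch that solves the primal and the dual with scipy.optimize.linprog (the encoding is ours); the two optimal values should coincide:

```python
import numpy as np
from scipy.optimize import linprog

# Primal: min c^T x  s.t.  A x <= b (the x >= 0 constraints are
# the first three rows of A, as in the example above).
c = np.array([-3.0, -1.0, -2.0])
A = np.array([[-1, 0, 0], [0, -1, 0], [0, 0, -1],
              [1, 1, 3], [2, 2, 5], [4, 1, 2]], dtype=float)
b = np.array([0, 0, 0, 30, 24, 36], dtype=float)
primal = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 3,
                 method="highs")

# Dual: max -b^T y  s.t.  A^T y = -c, y >= 0.
# linprog minimizes, so minimize b^T y and negate the value.
dual = linprog(b, A_eq=A.T, b_eq=-c, bounds=[(0, None)] * 6,
               method="highs")

print(primal.fun)  # -28.0
print(-dual.fun)   # -28.0: the values coincide (strong duality)
```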
Example
Back to the relaxed Linear Program. By weak duality, any dual feasible solution, not just the optimal one, gives a valid lower bound on z; Wong and Kolter (2018) show that such a solution can be evaluated using a "dual network", a backward pass through the original network, without calling an LP solver.
Outline
• Interval Bound Propagation
• Linear Programming Relaxation
• Results
Experimental Setup
The condition on the inputs is an ε-perturbation of an image; the goal is robust deep learning.
MNIST with ε = 0.1
Nominal network: test error 0.65%; error under PGD attack 27.72%.
MNIST with ε = 0.3
Nominal network: test error 0.65%; error under PGD attack 99.63%.
CIFAR-10 with ε = 8/255
Nominal network: test error 16.66%; error under PGD attack 100%.
(* Where the PGD error was not available, it was replaced by the incomplete-method bound.)
[Tables comparing robust training methods appeared as figures on these slides; only the nominal rows are recoverable here.]
Results
• Interval bounds are useful in practice.
• Significantly faster than LP relaxation methods.
• But they require careful tuning of more hyperparameters.
• Still a long way to go to reach "real" networks.
Questions?
References for other incomplete methods are on the tutorial webpage.