15.2. The Optim package

For more advanced optimization methods, Julia provides a number of state-of-the-art packages. Here we will consider the Optim package. It has many options and features, but we will only demonstrate some basic usage.

using Optim

In its simplest form, you can solve our optimization problem by providing only the function \(f(x)\) and the initial guess \(x_0\). The output shows that it found a local minimum, and that it used the Nelder-Mead method, which is derivative-free (zeroth order).

f(x) = -sin(x[1]^2/2 - x[2]^2/4 + 3) * cos(2x[1] + 1 - exp(x[2])) # Same as last section
res = optimize(f, [0, 0.5])
 * Status: success

 * Candidate solution
    Final objective value:     -2.072854e-01

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    31
    f(x) calls:    60
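
The returned object res stores more than the printed summary; for example, Optim.minimizer and Optim.minimum extract the approximate minimizer and the corresponding objective value:

xmin = Optim.minimizer(res)   # approximate minimizer (a vector)
fmin = Optim.minimum(res)     # objective value at xmin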

If a gradient function is available, a gradient-based solver will be used automatically (in this case, the L-BFGS method). The keyword argument inplace=false is needed since our gradient function df returns the gradient rather than modifying its argument in place.

function df(x) 
    # Same as last section
    a1 = x[1]^2/2 - x[2]^2/4 + 3
    a2 = 2x[1] + 1 - exp(x[2])
    b1 = cos(a1)*cos(a2)
    b2 = sin(a1)*sin(a2)
    return -[x[1]*b1 - 2b2, -x[2]/2*b1 + exp(x[2])*b2]
end
res = optimize(f, df, [0, 0.5]; inplace=false)
 * Status: success

 * Candidate solution
    Final objective value:     -2.072854e-01

 * Found with
    Algorithm:     L-BFGS

 * Convergence measures
    |x - x'|               = 9.09e-07 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.07e-06 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.19e-12 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 5.74e-12 ≰ 0.0e+00
    |g(x)|                 = 5.19e-10 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    8
    f(x) calls:    20
    ∇f(x) calls:   20
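
Optim also accepts gradients written in its default in-place style, where the gradient function overwrites its first argument. A minimal sketch (the name df! is just a convention; it reuses df from above):

function df!(g, x)
    # Fill g with the gradient at x, reusing the out-of-place df
    g .= df(x)
    return g
end
res = optimize(f, df!, [0, 0.5])   # no inplace=false needed here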

The method can also be specified explicitly, e.g., the gradient descent method:

res = optimize(f, df, [0, 0.5], GradientDescent(); inplace=false)
 * Status: success

 * Candidate solution
    Final objective value:     -2.072854e-01

 * Found with
    Algorithm:     Gradient Descent

 * Convergence measures
    |x - x'|               = 2.20e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 2.60e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.75e-15 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 8.44e-15 ≰ 0.0e+00
    |g(x)|                 = 5.64e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    24
    f(x) calls:    65
    ∇f(x) calls:   65
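
Convergence tolerances, iteration limits, and progress tracing can be adjusted by passing an Optim.Options object; the values below are only illustrative:

opts = Optim.Options(g_tol=1e-12, iterations=1000, show_trace=false)
res = optimize(f, df, [0, 0.5], GradientDescent(), opts; inplace=false)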

If the gradient function is not available, it can be computed using automatic differentiation:

res = optimize(f, [0, 0.5], GradientDescent(); autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     -2.072854e-01

 * Found with
    Algorithm:     Gradient Descent

 * Convergence measures
    |x - x'|               = 2.20e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 2.60e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.97e-15 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 9.51e-15 ≰ 0.0e+00
    |g(x)|                 = 5.64e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    24
    f(x) calls:    65
    ∇f(x) calls:   65
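
The :forward option computes the gradient with forward-mode automatic differentiation (the ForwardDiff package). As a sanity check, we can compare it against our analytic gradient df, assuming ForwardDiff is available:

using ForwardDiff
x0 = [0.0, 0.5]
ForwardDiff.gradient(f, x0) ≈ df(x0)   # expected to return true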

This also works for the Hessian matrix in Newton’s method. Note the very fast convergence: only 4 iterations are needed.

res = optimize(f, [0, 0.5], Newton(); autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     -2.072854e-01

 * Found with
    Algorithm:     Newton's Method

 * Convergence measures
    |x - x'|               = 4.82e-07 ≰ 0.0e+00
    |x - x'|/|x'|          = 5.69e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.69e-13 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 8.17e-13 ≰ 0.0e+00
    |g(x)|                 = 2.90e-14 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    4
    f(x) calls:    9
    ∇f(x) calls:   9
    ∇²f(x) calls:  4
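
Newton's method can also be given an explicit Hessian function, for example an analytic one or, as in the sketch below, one computed with ForwardDiff; this assumes the inplace=false keyword applies to out-of-place gradient and Hessian functions in the same way as to the gradient alone:

using ForwardDiff
d2f(x) = ForwardDiff.hessian(f, x)   # stand-in for an analytic Hessian
res = optimize(f, df, d2f, [0, 0.5], Newton(); inplace=false)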

Finally, we use the BFGS solver with automatic differentiation. This is a widely used method, since it achieves convergence comparable to Newton’s method without requiring explicit Hessian matrices:

res = optimize(f, [0, 0.5], BFGS(); autodiff=:forward)
 * Status: success

 * Candidate solution
    Final objective value:     -2.072854e-01

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 1.09e-05 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.28e-05 ≰ 0.0e+00
    |f(x) - f(x')|         = 2.91e-10 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.41e-09 ≰ 0.0e+00
    |g(x)|                 = 4.04e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    6
    f(x) calls:    14
    ∇f(x) calls:   14
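
Keep in mind that all of these solvers find local minima, so the result can depend on the initial guess. As an experiment, one can restart BFGS from a different (arbitrarily chosen) starting point and compare the minimizers:

res2 = optimize(f, [2.0, 1.0], BFGS(); autodiff=:forward)
Optim.minimizer(res2), Optim.minimum(res2)   # may differ from the solution found above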