SpeedMapping.jl

SpeedMapping.speedmapping — Function

speedmapping(x₀; m!, kwargs...) accelerates the convergence of a mapping m!(x_out, x_in) to a fixed point of m! using the Alternating Cyclic Extrapolation (ACX) algorithm. Since gradient descent is an example of such a mapping, speedmapping(x₀; g!, kwargs...) can also perform multivariate optimization based on the gradient function g!(∇, x).

Reference: N. Lepage-Saucier, Alternating cyclic extrapolation methods for optimization algorithms, arXiv:2104.04974 (2021). https://arxiv.org/abs/2104.04974
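
A minimal sketch of the two calling modes (the mapping, gradient and variable names below are illustrative, not from the package):

julia> using SpeedMapping

julia> shrink!(x_out, x_in) = (x_out .= (x_in .+ 1) ./ 2);  # fixed point of x ↦ (x + 1) / 2 is x = 1

julia> res_map = speedmapping(zeros(3); m! = shrink!);  # fixed-point mode: res_map.minimizer ≈ ones(3)

julia> grad!(∇, x) = (∇ .= x .- 1);  # gradient of f(x) = ‖x - 1‖² / 2

julia> res_opt = speedmapping(zeros(3); g! = grad!);  # optimization mode: res_opt.minimizer ≈ ones(3)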

Arguments

  • x₀ :: AbstractArray: The starting point; must have a floating-point eltype.

Main keyword arguments:

  • m!(x_out, x_in): A map for which a fixed point must be found.
  • g!(∇, x): The gradient of a function to be minimized.
  • f(x): The objective function to be minimized. Providing it is useful to i) compute a good initial α (learning rate) for the gradient descent, ii) optimize using autodiff, or iii) track the progress of the algorithm.
  • lower, upper :: Union{AbstractArray, Nothing} = nothing: Box constraints for the optimization. NOTE: When appropriate, they should also be used with m!. Even if m! always keeps x_out within bounds, an extrapolation step could throw x out of bounds.
  • tol :: Float64 = 1e-10, Lp :: Real = 2: When using m!, the algorithm stops when norm(F(xₖ) - xₖ, Lp) ≤ tol. When using g!, the algorithm stops when norm(∇f(xₖ), Lp) ≤ tol. A usage sketch follows this list.
  • maps_limit :: Real = 1e6: Maximum number of m! calls or g! calls.
  • time_limit :: Real = 1000: Maximum time in seconds.
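
For instance, a looser max-norm stopping criterion combined with caps on mapping calls and run time could be requested as follows (a sketch; the mapping and the particular settings are illustrative):

julia> half_step!(x_out, x_in) = (x_out .= x_in ./ 2 .+ 1);  # illustrative mapping with fixed point x = 2

julia> res = speedmapping(zeros(100); m! = half_step!, tol = 1e-8, Lp = Inf, maps_limit = 10_000, time_limit = 60);  # res.converged reports whether the tolerance was met within both limits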

Minor keyword arguments:

  • orders :: Array{Int64} = [3, 3, 2]: Determines ACX's alternating order. Entries must be between 1 and 3 (where 1 means no extrapolation). The two recommended orders are [3, 2] and [3, 3, 2], the latter being potentially better for highly non-linear applications (see paper).
  • check_obj :: Bool = false: In case of NaN or Inf values, the algorithm restarts at the best past iterate. If check_obj = true, progress is monitored with the value of the objective (requires f). Otherwise, it is monitored with norm(F(xₖ) - xₖ, Lp). Advantages of check_obj = true: it is more precise, and if the algorithm converges to a bad local minimum, it returns the best past iterate instead (see the sketch after this list). Advantages of check_obj = false: for well-behaved applications, it avoids the effort and time of providing f and calling it at every iteration.
  • store_info :: Bool = false: Stores xₖ, σₖ and αₖ (see paper).
  • buffer :: Float64 = 0.01: If xₖ goes out of bounds, it is brought back in with a buffer, e.g. xₖ = buffer * xₖ₋₁ + (1 - buffer) * upper. Setting buffer = 0.001 may speed up box-constrained optimization.
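
As a sketch (reusing the Rosenbrock f, g! and x₀ defined in the examples below), the objective-based safeguard and iterate storage can be switched on like this:

julia> res = speedmapping(x₀; f, g!, check_obj = true, store_info = true);  # progress monitored with f; intermediate xₖ, σₖ and αₖ are stored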

Keyword arguments to fine-tune fixed-point mapping acceleration (using m!):

  • σ_min :: Real = 0.0: Setting to 1 may avoid stalling (see paper).
  • stabilize :: Bool = false: Performs a stabilization mapping before extrapolating. Setting to true may improve the performance for applications like the EM or MM algorithm (see paper and the sketch below).
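
For example, for a slowly converging mapping that tends to stall, both options could be tried together (a sketch; the mapping below is illustrative):

julia> slow_map!(x_out, x_in) = (x_out .= 0.99 .* x_in .+ 0.01);  # fixed point at x = 1; plain iteration converges slowly

julia> res = speedmapping(zeros(5); m! = slow_map!, σ_min = 1, stabilize = true);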

Example: Finding a dominant eigenvalue

julia> using LinearAlgebra

julia> using SpeedMapping

julia> A = ones(10) * ones(10)' + Diagonal(1:10);

julia> function power_iteration!(x_out, x_in, A)
           mul!(x_out, A, x_in)
           x_out ./= maximum(abs.(x_out))
       end;

julia> res = speedmapping(ones(10); m! = (x_out, x_in) -> power_iteration!(x_out, x_in, A))
(minimizer = [0.41214914122181, 0.44095060739689546, 0.4740798646565511, 0.5125916147320679, 0.5579135738427364, 0.6120273727167589, 0.677766040697063, 0.7593262786058278, 0.8632012019116191, 1.0], maps = 16, f_calls = 0, converged = true, norm_∇ = 2.8042637093700587e-9)

julia> V = res.minimizer;

julia> dominant_eigenvalue = V'A * V / V'V
16.310005690792195
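
As a sanity check (not part of the original example), the result can be compared with LinearAlgebra's dense eigensolver; since A is symmetric positive definite, its largest eigenvalue is also the dominant one:

julia> dominant_eigenvalue ≈ eigmax(A);  # expected to hold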

Example: Minimizing a multidimensional Rosenbrock

julia> f(x) = sum(100 * (x[i,1]^2 - x[i,2])^2 + (x[i,1] - 1)^2 for i ∈ 1:size(x,1));

julia> function g!(∇, x)
           ∇[:,1] .=  400(x[:,1].^2 .- x[:,2]) .* x[:,1] .+ 2(x[:,1] .- 1)
           ∇[:,2] .= -200(x[:,1].^2 .- x[:,2])
           return nothing
       end;

julia> x₀ = 1.0 * [-4 -3 -2 -1; 0 1 2 3]';

Optimizing, providing f and g!

julia> speedmapping(x₀; f, g!)
(minimizer = [0.9999999999962509 0.9999999999924867; 0.9999999999941136 0.9999999999882022; 0.9999999999752885 0.9999999999504781; 0.9999999989360835 0.9999999978679094], maps = 156, f_calls = 11, converged = true, norm_∇ = 9.524035730285234e-10)

Optimizing without objective

julia> speedmapping(x₀; g!)
(minimizer = [1.0000000000000542 1.0000000000001086; 0.9999999999999626 0.999999999999925; 0.9999999999997893 0.9999999999995778; 0.9999999999998465 0.9999999999996921], maps = 142, f_calls = 0, converged = true, norm_∇ = 3.524659991326121e-13)

Optimizing without gradient (the gradient is then computed by automatic differentiation)

julia> speedmapping(x₀; f)
(minimizer = [0.9999999999973896 0.9999999999947689; 0.9999999999958996 0.9999999999917817; 0.9999999999829567 0.9999999999658452; 0.9999999992743016 0.9999999985456991], maps = 156, f_calls = 11, converged = true, norm_∇ = 6.496402053553401e-10)

Optimizing with a box constraint

julia> upper = [0.5ones(4) Inf * ones(4)];

julia> speedmapping(x₀; f, g!, upper)
(minimizer = [0.5 0.25; 0.5 0.24999999999999864; 0.49999999999999795 0.2499999999975679; 0.4999999999999807 0.2499999999773891], maps = 138, f_calls = 11, converged = true, norm_∇ = 7.196533863189794e-9)
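
Lower bounds are supplied the same way. A small sketch (the starting point and the bound below are illustrative, and the output is not reproduced):

julia> lower = [-Inf * ones(4) 1.5ones(4)];

julia> res_lb = speedmapping(2ones(4, 2); f, g!, lower);  # the second column of res_lb.minimizer should end up at the 1.5 bound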