SpeedMapping.jl
SpeedMapping.speedmapping — Function
speedmapping(x₀; m!, kwargs...)
accelerates the convergence of a mapping m!(x_out, x_in) to a fixed point of m! by the Alternating cyclic extrapolation algorithm (ACX). Since gradient descent is an example of such a mapping, speedmapping(x₀; g!, kwargs...) can also perform multivariate optimization based on the gradient function g!(∇, x).
Reference: N. Lepage-Saucier, Alternating cyclic extrapolation methods for optimization algorithms, arXiv:2104.04974 (2021). https://arxiv.org/abs/2104.04974
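As a minimal quick-start sketch (not one of the package's own examples): accelerating the scalar fixed-point iteration xₖ₊₁ = cos(xₖ), whose fixed point is ≈ 0.739085 (the Dottie number). The mapping is wrapped as an in-place m! acting on a one-element array.
julia> using SpeedMapping
julia> res = speedmapping([1.0]; m! = (x_out, x_in) -> x_out .= cos.(x_in));
julia> res.minimizer[1];  # should be ≈ 0.739085, with res.converged == true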
Arguments
x₀ :: AbstractArray: The starting point; must be of eltype Float.
Main keyword arguments:
m!(x_out, x_in): A map for which a fixed point must be found.
g!(∇, x): The gradient of a function to be minimized.
f(x): The objective function to be minimized. It is useful to i) compute a good initial α (learning rate) for the gradient descent, ii) optimize using autodiff, or iii) track the progress of the algorithm.
lower, upper :: Union{AbstractArray, Nothing} = nothing: Box constraints for the optimization. NOTE: When appropriate, they should also be used with m!. Even if m! always keeps x_out within bounds, an extrapolation step could throw x out of bounds.
tol :: Float64 = 1e-10, Lp :: Real = 2: When using m!, the algorithm stops when norm(F(xₖ) - xₖ, Lp) ≤ tol. When using g!, the algorithm stops when norm(∇f(xₖ), Lp) ≤ tol.
maps_limit :: Real = 1e6: Maximum number of m! or g! calls.
time_limit :: Real = 1000: Maximum time in seconds. (A call sketch showing how these stopping options are passed follows this list.)
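For instance (a sketch, assuming x₀ and a gradient g! defined as in the Rosenbrock example below), the stopping tolerance and budgets are passed as keywords:
julia> speedmapping(x₀; g!, tol = 1e-8, maps_limit = 10_000, time_limit = 60);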
Minor keyword arguments:
orders :: Array{Int64} = [3, 3, 2]: Determines ACX's alternating order. Entries must be between 1 and 3 (where 1 means no extrapolation). The two recommended orders are [3, 2] and [3, 3, 2], the latter being potentially better for highly non-linear applications (see paper).
check_obj :: Bool = false: In case of NaN or Inf values, the algorithm restarts at the best past iterate. If check_obj = true, progress is monitored with the value of the objective (requires f); otherwise, it is monitored with norm(F(xₖ) - xₖ, Lp). Advantages of check_obj = true: it is more precise, and if the algorithm converges to a bad local minimum, it returns the best past iterate instead. Advantage of check_obj = false: for well-behaved applications, it avoids the effort and time of providing f and calling it at every iteration. (A call sketch using orders and check_obj follows this list.)
store_info :: Bool = false: Stores xₖ, σₖ and αₖ (see paper).
buffer :: Float64 = 0.01: If xₖ goes out of bounds, it is brought back in with a buffer, e.g. xₖ = buffer * xₖ₋₁ + (1 - buffer) * upper. Setting buffer = 0.001 may speed up box-constrained optimization.
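For example (a sketch, again assuming the x₀, f and g! of the Rosenbrock example below), the shorter alternating order can be combined with objective-based monitoring:
julia> speedmapping(x₀; f, g!, orders = [3, 2], check_obj = true);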
Keyword arguments to fine-tune fixed-point mapping acceleration (using m!):
σ_min :: Real = 0.0: Setting to 1 may avoid stalling (see paper).
stabilize :: Bool = false: Performs a stabilization mapping before extrapolating. Setting to true may improve the performance for applications like the EM or MM algorithms (see paper and the sketch below).
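For instance (a sketch; θ₀ and em_step! are hypothetical names for a starting parameter array and an EM update written as a mapping m!(x_out, x_in)), both options would be passed as:
julia> speedmapping(θ₀; m! = em_step!, stabilize = true, σ_min = 1);  # em_step! is a hypothetical EM update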
Example: Finding a dominant eigenvalue
julia> using LinearAlgebra
julia> using SpeedMapping
julia> A = ones(10) * ones(10)' + Diagonal(1:10);
julia> function power_iteration!(x_out, x_in, A)
mul!(x_out, A, x_in)
x_out ./= maximum(abs.(x_out))
end;
julia> res = speedmapping(ones(10); m! = (x_out, x_in) -> power_iteration!(x_out, x_in, A))
(minimizer = [0.41214914122181, 0.44095060739689546, 0.4740798646565511, 0.5125916147320679, 0.5579135738427364, 0.6120273727167589, 0.677766040697063, 0.7593262786058278, 0.8632012019116191, 1.0], maps = 16, f_calls = 0, converged = true, norm_∇ = 2.8042637093700587e-9)
julia> V = res.minimizer;
julia> dominant_eigenvalue = V'A * V / V'V
16.310005690792195
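As a sanity check (a sketch; eigvals from LinearAlgebra computes the full spectrum densely, so this is only practical for small matrices), the Rayleigh quotient above should agree with the largest eigenvalue up to roughly the solver tolerance:
julia> λ_max = maximum(eigvals(A));  # dense reference value; should match dominant_eigenvalue above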
Example: Minimizing a multidimensional Rosenbrock
julia> f(x) = sum(100 * (x[i,1]^2 - x[i,2])^2 + (x[i,1] - 1)^2 for i ∈ 1:size(x,1));
julia> function g!(∇, x)
∇[:,1] .= 400(x[:,1].^2 .- x[:,2]) .* x[:,1] .+ 2(x[:,1] .- 1)
∇[:,2] .= -200(x[:,1].^2 .- x[:,2])
return nothing
end;
julia> x₀ = 1.0 * [-4 -3 -2 -1; 0 1 2 3]';
Optimizing, providing f and g!
julia> speedmapping(x₀; f, g!)
(minimizer = [0.9999999999962509 0.9999999999924867; 0.9999999999941136 0.9999999999882022; 0.9999999999752885 0.9999999999504781; 0.9999999989360835 0.9999999978679094], maps = 156, f_calls = 11, converged = true, norm_∇ = 9.524035730285234e-10)
Optimizing without objective
julia> speedmapping(x₀; g!)
(minimizer = [1.0000000000000542 1.0000000000001086; 0.9999999999999626 0.999999999999925; 0.9999999999997893 0.9999999999995778; 0.9999999999998465 0.9999999999996921], maps = 142, f_calls = 0, converged = true, norm_∇ = 3.524659991326121e-13)
Optimizing without gradient
julia> speedmapping(x₀; f)
(minimizer = [0.9999999999973896 0.9999999999947689; 0.9999999999958996 0.9999999999917817; 0.9999999999829567 0.9999999999658452; 0.9999999992743016 0.9999999985456991], maps = 156, f_calls = 11, converged = true, norm_∇ = 6.496402053553401e-10)
Optimizing with a box constraint
julia> upper = [0.5ones(4) Inf * ones(4)];
julia> speedmapping(x₀; f, g!, upper)
(minimizer = [0.5 0.25; 0.5 0.24999999999999864; 0.49999999999999795 0.2499999999975679; 0.4999999999999807 0.2499999999773891], maps = 138, f_calls = 11, converged = true, norm_∇ = 7.196533863189794e-9)
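To check programmatically that the constrained solution respects the bounds (a small sketch; resc is a name introduced here for illustration), capture the result and compare elementwise:
julia> resc = speedmapping(x₀; f, g!, upper);
julia> all(resc.minimizer .<= upper);  # expected to hold at the returned solution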