I have a nonlinear (nonconvex) program that I am solving with Ipopt and using DiffOpt to differentiate through. I have been having trouble getting correct gradients and am not sure if this is a numerical issue with my model formulation, or if there is something more fundamental going on. Here is a pared-down example that demonstrates my issue.
```julia
using JuMP, Ipopt, DiffOpt, FiniteDifferences

# define parameter value
ϕ0 = [0.0];

# define constants used in the model
Phigh = 0.4412123910369964;
Plow = 0.357171935601378;
c(Nsep) = (1 / (1 + 2 * Nsep))^2
cols = 2;
reqwakes = 1;
Nwakes = 1;
aprev = 0.1 * ones(2, 1);
rhist = Phigh * ones(1);
normalized_rhist = [1];
predhorz = 2;
KP = 0.5 * π / 4;

model = DiffOpt.nonlinear_diff_model(Ipopt.Optimizer)

# define decision variables
JuMP.@variable(model, 0.05 <= a[1:cols, -reqwakes:Nwakes] <= 1/3)

# fix certain values
for j = 1:cols, n = 1:reqwakes
    JuMP.fix(a[j, n - cols], aprev[j, n]; force=true)
end
JuMP.fix.(a[end, 0:end], 0.333333; force=true)

# define parameter
JuMP.@variable(model, ϕ in JuMP.Parameter(ϕ0[1]))

# define parameterized signal
scale = (Phigh - Plow) / 2;
shift = (Phigh + Plow) / 2;
JuMP.@variables(model, begin
    Pref_unscaled[t=1:predhorz]
    Pref_var[t=1:predhorz]
end)
JuMP.fix.(Pref_unscaled[1], reverse(normalized_rhist))
JuMP.@constraints(model, begin
    [t=2:predhorz], Pref_unscaled[t] == ϕ * Pref_unscaled[t-1]
    [t=1:predhorz], Pref_var[t] == (scale * Pref_unscaled[t] + shift)
end)

# define NL model expressions
JuMP.@expressions(model, begin
    CP[j=1:cols, n=0:Nwakes], 4 * a[j, n] * (1 - a[j, n])^2
    δV[j=1:cols, n=0:Nwakes], 2 * sqrt(sum((a[prev, n - (j - prev)] * c(j - prev))^2 for prev = 1:j-1; init = 0))
    u[j=1:cols, n=0:Nwakes], (1 - δV[j, n])
    Pcap[j=1:cols, n=0:Nwakes], CP[j, n] * u[j, n]^3
    Pgen[n=0:Nwakes], sum(Pcap[j, n] for j = 1:cols)
end)

# define objective
JuMP.@objective(model, Min, sum((KP * Pgen[n] - Pref_var[1 + n])^2 for n = 0:Nwakes))
```
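As a side note, the algebraic pieces of the model can be checked in isolation outside JuMP. Assuming the standard actuator-disc form, `CP` peaks at a = 1/3 (the Betz limit, 16/27), which is consistent with the upper bound on `a` and the downstream column being fixed at 0.333333:

```julia
# The model's per-turbine expressions as plain Julia functions.
CP(a) = 4 * a * (1 - a)^2          # power coefficient, actuator-disc form
c(Nsep) = (1 / (1 + 2 * Nsep))^2   # wake-decay factor from the model

# CP is maximized at the unconstrained optimum a = 1/3 (Betz limit 16/27):
println(CP(1/3))  # ≈ 0.5926
println(c(1))     # 1/9: one column of separation scales the deficit by 1/9
```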
With this model, I vary ϕ and find the optimal a, producing the following curve.

I then take the derivative of a with respect to ϕ using both DiffOpt
```julia
DiffOpt.empty_input_sensitivities!(model)
DiffOpt.set_reverse_variable(model, model[:a][1, 0], 1)
DiffOpt.reverse_differentiate!(model)
dadϕ = DiffOpt.get_reverse_parameter(model, model[:ϕ]);
```
and finite differencing.
```julia
function solve_a_ϕ0_manual(ϕ0)
    JuMP.set_parameter_value.(model[:ϕ], ϕ0)
    JuMP.optimize!(model)
    JuMP.assert_is_solved_and_feasible(model)
    asolved_new = JuMP.value.(model[:a])
    return asolved_new[1, 0]
end

dadϕ_fdm = FiniteDifferences.grad(FiniteDifferences.forward_fdm(2, 1; factor=1e6), solve_a_ϕ0_manual, ϕ0);
```
This returns problematic results. In the flat portion of the plot, e.g., at ϕ0 = 0.0, both gradients are "close" to 0 but still differ significantly:
```
dadϕ = -9.871980439887741e-10
dadϕ_fdm = ([-5.972732875606716e-8],)
dadϕ_fdm[1] / dadϕ = [60.501871047818184]
```
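(One caveat with the finite-difference number in the flat region: the solver only returns the optimum to roughly its convergence tolerance, so a difference quotient of two solves can be dominated by that noise when the true derivative is near zero. A toy sketch of this effect, using a quantized stand-in for the solver call rather than the actual model, and assuming Ipopt's default `tol` of about 1e-8:)

```julia
using FiniteDifferences

# Stand-in for solve_a_ϕ0_manual: x ↦ x², but returned only to ~8
# digits, mimicking an optimizer converged to tolerance ≈ 1e-8.
# The true derivative at x = 0.5 is exactly 1.0.
quantized(x) = round(x[1]^2; digits=8)

for factor in (1.0, 1e3, 1e6)
    g = FiniteDifferences.grad(FiniteDifferences.forward_fdm(2, 1; factor=factor), quantized, [0.5])
    println("factor = ", factor, "  =>  grad ≈ ", g[1][1])
end
```

With a small step the 1e-8-scale quantization swamps the quotient; a large `factor` recovers the derivative, which is presumably why the solver-based FD above needs `factor=1e6`.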
In the changing portion of the plot, e.g., at ϕ0 = 1.0, the gradients are closer but still do not match well:
```
dadϕ = -0.03988513330240084
dadϕ_fdm = ([-0.04102676532022559],)
dadϕ_fdm[1] / dadϕ = [1.02862299617176]
```
Plotting the gradients as tangent lines on the curve suggests that finite differencing is giving the more accurate results here.
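To help localize where things go wrong, one check I would consider is running the same reverse-mode calls on a problem whose sensitivity is known analytically; if this passes, the discrepancy is specific to the nonconvex model (for example, the variables fixed at the 1/3 bound, where the solution may not be smoothly differentiable). A minimal sketch, assuming the same packages and DiffOpt API as above:

```julia
using JuMP, Ipopt, DiffOpt

# min (x - ϕ)²  =>  x* = ϕ, so dx*/dϕ = 1 exactly.
check = DiffOpt.nonlinear_diff_model(Ipopt.Optimizer)
JuMP.set_silent(check)
JuMP.@variable(check, x)
JuMP.@variable(check, ϕ in JuMP.Parameter(2.0))
JuMP.@objective(check, Min, (x - ϕ)^2)
JuMP.optimize!(check)

DiffOpt.empty_input_sensitivities!(check)
DiffOpt.set_reverse_variable(check, x, 1)
DiffOpt.reverse_differentiate!(check)
dxdϕ = DiffOpt.get_reverse_parameter(check, ϕ)   # expect ≈ 1.0
```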
Any insight would be appreciated!