
Trouble debugging gradients of nonlinear model #318

@sunnyxxchen

Description

I have a nonlinear (nonconvex) program that I am solving with Ipopt and using DiffOpt to differentiate through. I have been having trouble getting correct gradients and am not sure if this is a numerical issue with my model formulation, or if there is something more fundamental going on. Here is a pared-down example that demonstrates my issue.

# packages used below (modeling, solver, differentiation, and finite-difference check)
using JuMP, Ipopt, DiffOpt, FiniteDifferences

# define parameter value
ϕ0 = [0.0];

# define constants used in model
Phigh = 0.4412123910369964;
Plow = 0.357171935601378;
c(Nsep) = (1 / (1 + 2*Nsep))^2
cols = 2; 
reqwakes = 1; 
Nwakes = 1;
aprev = 0.1 * ones(2, 1);
rhist = Phigh * ones(1);
normalized_rhist = [1];
predhorz = 2;
KP = 0.5 * π/4;

model = DiffOpt.nonlinear_diff_model(Ipopt.Optimizer)

# define decision variables
JuMP.@variable(model, 0.05 <= a[1:cols, -reqwakes:Nwakes] <= 1/3)

# fix certain values
for j=1:cols, n=1:reqwakes
    JuMP.fix(a[j, n - cols], aprev[j, n]; force=true)
end

JuMP.fix.(a[end, 0:end], 0.333333; force=true)

# define parameter
JuMP.@variable(model, ϕ in JuMP.Parameter(ϕ0[1]))

# define parameterized signal
scale = (Phigh - Plow) / 2;
shift = (Phigh + Plow) / 2;

JuMP.@variables(model, begin
               Pref_unscaled[t=1:predhorz]
               Pref_var[t=1:predhorz]
           end)

JuMP.fix.(Pref_unscaled[1], reverse(normalized_rhist))

JuMP.@constraints(model, begin
                 [t=2:predhorz], Pref_unscaled[t] .== ϕ * Pref_unscaled[t-1]
                 [t=1:predhorz], Pref_var[t] == (scale * Pref_unscaled[t] + shift)
             end)

# define NL model expressions
JuMP.@expressions(model, begin
                     CP[j=1:cols, n=0:Nwakes], 4 * a[j,n] * (1 - a[j,n])^2
                     δV[j=1:cols, n=0:Nwakes], 2 * sqrt( sum( (a[prev,n-(j-prev)] * c(j-prev) )^2 for prev=1:j-1; init=0) )
                     u[j=1:cols, n=0:Nwakes], (1 - δV[j,n])
                     Pcap[j=1:cols, n=0:Nwakes], CP[j,n] * u[j,n]^3
                     Pgen[n=0:Nwakes], sum(Pcap[j,n] for j=1:cols)
                 end)

# define objective
JuMP.@objective(model, Min, sum( (KP * Pgen[n] - Pref_var[1+n])^2 for n=0:Nwakes))

With this model, I vary ϕ and find the optimal a, producing the following curve.

[Image: optimal a as a function of ϕ]
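
For context, the curve was produced with a sweep roughly like the following; the range, resolution, and plotting call are illustrative rather than the exact script I ran.

# sweep ϕ, re-solve, and record the optimal a[1, 0] at each value
# (illustrative range and resolution)
ϕrange = range(0.0, 1.0; length=50)
asolved = map(ϕrange) do ϕval
    JuMP.set_parameter_value(model[:ϕ], ϕval)
    JuMP.optimize!(model)
    JuMP.value(model[:a][1, 0])
end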

I then take the derivative of a w.r.t. ϕ using both DiffOpt

DiffOpt.empty_input_sensitivities!(model)

DiffOpt.set_reverse_variable.(model, model[:a][1, 0], 1)
DiffOpt.reverse_differentiate!(model)

dadϕ = DiffOpt.get_reverse_parameter.(model, model[:ϕ]);

and finite differencing.

function solve_a_ϕ0_manual(ϕ0)
    JuMP.set_parameter_value.(model[:ϕ], ϕ0)
    
    JuMP.optimize!(model)

    JuMP.assert_is_solved_and_feasible(model)

    asolved_new = JuMP.value.(model[:a]);

    return asolved_new[1, 0]
end

dadϕ_fdm = FiniteDifferences.grad(FiniteDifferences.forward_fdm(2, 1; factor=1e6), solve_a_ϕ0_manual, ϕ0);

This returns problematic results. In the flat portion of the plot, e.g., ϕ0 = 0.0, both gradients are "close" to 0, but still differ significantly.

dadϕ = -9.871980439887741e-10
dadϕ_fdm = ([-5.972732875606716e-8],)
dadϕ_fdm[1] / dadϕ = [60.501871047818184]

In the changing portion of the plot, e.g., ϕ0 = 1.0, the gradients are closer but still don't match well.

dadϕ = -0.03988513330240084
dadϕ_fdm = ([-0.04102676532022559],)
dadϕ_fdm[1] / dadϕ = [1.02862299617176]
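
In case it helps with reproduction, the finite-difference number can also be sanity-checked with plain central differences around the same point; the step sizes below are arbitrary.

# plain central-difference check of da/dϕ at ϕ0 = 1.0 (step sizes arbitrary)
for h in (1e-3, 1e-4, 1e-5)
    g = (solve_a_ϕ0_manual([1.0 + h]) - solve_a_ϕ0_manual([1.0 - h])) / (2h)
    println("h = $h  central difference ≈ $g")
end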

Plotting both gradients as tangent lines on the curve shows that finite differencing is giving me the more accurate result here.
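
For reference, the tangent overlay was done roughly like this (a Plots.jl sketch that reuses ϕrange and asolved from the sweep above; styling and exact values are not important).

using Plots

# overlay tangent lines at ϕ0 using both gradient estimates
# (assumes ϕrange and asolved from the sweep sketch above)
a0 = solve_a_ϕ0_manual(ϕ0)
plt = Plots.plot(ϕrange, asolved; xlabel="ϕ", ylabel="a[1, 0]", label="optimal a[1, 0]")
Plots.plot!(plt, ϕrange, a0 .+ dadϕ .* (ϕrange .- ϕ0[1]); label="DiffOpt tangent")
Plots.plot!(plt, ϕrange, a0 .+ dadϕ_fdm[1][1] .* (ϕrange .- ϕ0[1]); label="finite-difference tangent")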

Any insight would be appreciated!
