Adjoint state method

The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used, for example, in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^{*}={\overline {A}}^{T}$ is used.

When the initial problem consists of calculating the product $s^{T}x$, where $x$ must satisfy $Ax=b$, the dual problem can be realized as calculating the product $r^{T}b$ ($=s^{T}x$), where $r$ must satisfy $A^{*}r=s$. Here, $r$ is called the adjoint state vector.
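
A minimal numerical sketch of this identity, assuming real matrices so that the adjoint $A^{*}$ is simply the transpose $A^{T}$; the matrix $A$ and the vectors $s$, $b$ below are made-up illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # illustrative nonsingular system matrix
b = rng.standard_normal(n)
s = rng.standard_normal(n)

# Direct problem: solve A x = b, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint (dual) problem: solve A^T r = s, then evaluate r^T b.
r = np.linalg.solve(A.T, s)       # for real A, the adjoint A* is the transpose A^T
dual = r @ b

print(direct, dual)               # the two values agree up to rounding error
```

Both forms cost one linear solve here; the adjoint form pays off when $s^{T}x$ is needed for many right-hand sides $b$ with the same $s$, since $r$ is computed only once.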

General case

The original adjoint calculation method goes back to Jean Cea,[6] with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable $u\in {\mathcal {U}}$, an optimization variable $v\in {\mathcal {V}}$, an objective functional $J:{\mathcal {U}}\times {\mathcal {V}}\to \mathbb {R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_{v}(u)=0$ (usually the weak form of a partial differential equation), thus the considered objective is $j(v)=J(u_{v},v)$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v)=\nabla _{v}J(u_{v},v)+\nabla _{u}J(u_{v},v)\,\nabla _{v}u_{v}.$$

Unfortunately, the term $\nabla _{v}u_{v}$ is often very hard to compute analytically, since the dependence of $u_{v}$ on $v$ is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

$${\text{minimize}}\quad j(v)=J(u_{v},v)$$
$${\text{subject to}}\quad D_{v}(u_{v})=0$$

has an associated Lagrangian functional ${\mathcal {L}}:{\mathcal {U}}\times {\mathcal {V}}\times {\mathcal {U}}\to \mathbb {R}$ defined by

$${\mathcal {L}}(u,v,\lambda )=J(u,v)+\langle D_{v}(u),\lambda \rangle ,$$

where $\lambda \in {\mathcal {U}}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot ,\cdot \rangle$ is an inner product on ${\mathcal {U}}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$${\begin{cases}d_{u}{\mathcal {L}}(u,v,\lambda ;\delta _{u})=d_{u}J(u,v;\delta _{u})+\langle \delta _{u},D_{v}^{*}(\lambda )\rangle =0&\forall \delta _{u}\in {\mathcal {U}},\\d_{v}{\mathcal {L}}(u,v,\lambda ;\delta _{v})=d_{v}J(u,v;\delta _{v})+\langle d_{v}D_{v}(u;\delta _{v}),\lambda \rangle =0&\forall \delta _{v}\in {\mathcal {V}},\\d_{\lambda }{\mathcal {L}}(u,v,\lambda ;\delta _{\lambda })=\langle D_{v}(u),\delta _{\lambda }\rangle =0&\forall \delta _{\lambda }\in {\mathcal {U}},\end{cases}}$$

where $d_{x}F(x;\delta _{x})$ is the Gateaux derivative of $F$ with respect to $x$ in the direction $\delta _{x}$. The last equation is equivalent to $D_{v}(u)=0$, the state equation, whose solution is $u_{v}$. The first equation is the so-called adjoint state equation,

$$\langle \delta _{u},D_{v}^{*}(\lambda )\rangle =-d_{u}J(u_{v},v;\delta _{u})\quad \forall \delta _{u}\in {\mathcal {U}},$$

because the operator involved is the adjoint operator of $D_{v}$, namely $D_{v}^{*}$. Solving this equation yields the adjoint state $\lambda _{v}$. The gradient of the quantity of interest $j$ with respect to $v$ satisfies $\langle \nabla j(v),\delta _{v}\rangle =d_{v}j(v;\delta _{v})=d_{v}{\mathcal {L}}(u_{v},v,\lambda _{v};\delta _{v})$ (the second equation above with $u=u_{v}$ and $\lambda =\lambda _{v}$), so it can be identified by successively solving the direct and the adjoint state equations. The process is even simpler when the operator $D_{v}$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand sides.
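
In summary, the gradient is obtained in three steps, restated here in the same notation:

$$\text{(i) solve } D_{v}(u_{v})=0;\qquad \text{(ii) solve } \langle \delta _{u},D_{v}^{*}(\lambda _{v})\rangle =-d_{u}J(u_{v},v;\delta _{u})\ \ \forall \delta _{u}\in {\mathcal {U}};\qquad \text{(iii) } \langle \nabla j(v),\delta _{v}\rangle =d_{v}J(u_{v},v;\delta _{v})+\langle d_{v}D_{v}(u_{v};\delta _{v}),\lambda _{v}\rangle .$$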

Example: Linear case

In a real finite-dimensional linear context, the objective function could be $J(u,v)=\langle Au,v\rangle$, for $v\in \mathbb {R} ^{n}$, $u\in \mathbb {R} ^{m}$ and $A\in \mathbb {R} ^{n\times m}$, and let the state equation be $B_{v}u=b$, with $B_{v}\in \mathbb {R} ^{m\times m}$ and $b\in \mathbb {R} ^{m}$.

The Lagrangian functional of the problem is ${\mathcal {L}}(u,v,\lambda )=\langle Au,v\rangle +\langle B_{v}u-b,\lambda \rangle$, where $\lambda \in \mathbb {R} ^{m}$.

The derivative of ${\mathcal {L}}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_{v}=B_{v}^{-1}b$. The derivative of ${\mathcal {L}}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta _{u}\in \mathbb {R} ^{m}$,

$$d_{u}[\langle B_{v}\cdot {}-b,\lambda \rangle ](u;\delta _{u})=-\langle A^{\top }v,\delta _{u}\rangle \iff \langle B_{v}\delta _{u},\lambda \rangle =-\langle A^{\top }v,\delta _{u}\rangle \iff \langle B_{v}^{\top }\lambda +A^{\top }v,\delta _{u}\rangle =0\iff B_{v}^{\top }\lambda =-A^{\top }v.$$

Thus, we can write symbolically $\lambda _{v}=-B_{v}^{-\top }A^{\top }v$. The gradient would be

$$\langle \nabla j(v),\delta _{v}\rangle =\langle Au_{v},\delta _{v}\rangle +\langle \nabla _{v}B_{v}:\lambda _{v}\otimes u_{v},\delta _{v}\rangle ,$$

where $\nabla _{v}B_{v}={\frac {\partial B_{ij}}{\partial v_{k}}}$ is a third-order tensor, $\lambda _{v}\otimes u_{v}=\lambda _{v}u_{v}^{\top }$ is the dyadic (outer) product between the adjoint and direct states, and $:$ denotes a double tensor contraction. It is assumed that $B_{v}$ has a known analytic expression that can be differentiated easily.
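
The formula above can be checked numerically against finite differences. In the sketch below, the parametrization $B_{v}=B_{0}+\operatorname{diag}(v)$ (for which $\nabla _{v}B_{v}:\lambda \otimes u$ reduces to the elementwise product of $\lambda$ and $u$) and all matrices are illustrative assumptions, not part of the example above:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
A  = rng.standard_normal((m, m))                           # here n = m for simplicity
B0 = 5.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m))   # keeps B_v well conditioned
b  = rng.standard_normal(m)
v  = rng.standard_normal(m)

def B(v):
    # Illustrative parametrization: B_v = B0 + diag(v), so dB_ij/dv_k = delta_ik * delta_jk.
    return B0 + np.diag(v)

def j(v):
    u = np.linalg.solve(B(v), b)                # direct state equation  B_v u = b
    return v @ (A @ u)                          # J(u, v) = <A u, v>

# Adjoint gradient: one direct solve and one adjoint solve, regardless of len(v).
u_v   = np.linalg.solve(B(v), b)
lam_v = np.linalg.solve(B(v).T, -(A.T @ v))     # adjoint equation  B_v^T lambda = -A^T v
grad  = A @ u_v + lam_v * u_v                   # second term: (grad_v B_v) : (lambda outer u)

# Central finite-difference check of every component.
eps = 1e-6
fd = np.array([(j(v + eps * np.eye(m)[k]) - j(v - eps * np.eye(m)[k])) / (2 * eps)
               for k in range(m)])
print(np.max(np.abs(grad - fd)))                # should be tiny (rounding/truncation error)
```

Note that the cost of the gradient is two linear solves, independent of the number of parameters in $v$, which is the property highlighted in the introduction.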

Numerical consideration for the self-adjoint case

If the operator $B_{v}$ is self-adjoint, $B_{v}=B_{v}^{\top }$, the direct state equation and the adjoint state equation share the same left-hand side. To avoid ever explicitly inverting a matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in $O(m^{3})$ operations for the decomposition and $O(m^{2})$ operations for each solve. The same decomposition can then be reused to solve the adjoint state equation in only $O(m^{2})$ additional operations, since the matrices are the same.
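
A brief sketch of that reuse, assuming SciPy is available; the symmetric matrix $B$ and the right-hand sides below are made up for the illustration:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m = 500
M = rng.standard_normal((m, m))
B = M + M.T + 2.0 * m * np.eye(m)     # symmetric (self-adjoint) and well conditioned
b = rng.standard_normal(m)            # right-hand side of the direct state equation
c = rng.standard_normal(m)            # right-hand side of the adjoint equation, e.g. -A^T v

lu, piv = lu_factor(B)                # O(m^3) factorization, done once
u   = lu_solve((lu, piv), b)          # direct state,  O(m^2)
lam = lu_solve((lu, piv), c)          # adjoint state, O(m^2), reusing the same factorization
```

Even when $B_{v}$ is not symmetric, the same factorization can still serve the adjoint solve, since `lu_solve((lu, piv), c, trans=1)` solves the transposed system $B_{v}^{\top }\lambda =c$.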

References

  1. ^ Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018-06-01). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2. ISSN 1615-1488. S2CID 125712091.
  2. ^ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". Advances in Neural Information Processing Systems. 31.
  3. ^ Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International. 167 (2): 495–503.
  4. ^ McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method" (PDF). ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. Archived (PDF) from the original on 29 January 2022. Retrieved 28 October 2022.
  5. ^ Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization" (PDF). Sweden: Linköping University of Technology. Archived (PDF) from the original on 9 October 2022. Retrieved 28 October 2022.
  6. ^ Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût". ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique (in French). 20 (3): 371–402. doi:10.1051/m2an/1986200303711.

External links

  • A well-written explanation by Errico: What is an adjoint Model?
  • Another well-written explanation with worked examples, written by Bradley [1]
  • More technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course [2]
  • MIT notes [3]

