Adjoint state method

The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and more recently in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used, for example, in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^{*}={\overline {A}}^{T}$ is used.

When the initial problem consists of calculating the product $s^{T}x$, where $x$ must satisfy $Ax=b$, the dual problem can be realized as calculating the product $r^{T}b$ ($=s^{T}x$), where $r$ must satisfy $A^{*}r=s$. Here, $r$ is called the adjoint state vector.
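
A minimal numerical sketch of this identity, assuming real matrices so that the adjoint $A^{*}$ is simply the transpose $A^{T}$; the matrix $A$ and the vectors $s$, $b$ below are made-up illustrative data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))   # illustrative nonsingular system matrix
b = rng.standard_normal(n)
s = rng.standard_normal(n)

# Direct problem: solve A x = b, then evaluate s^T x.
x = np.linalg.solve(A, b)
direct = s @ x

# Adjoint (dual) problem: solve A^T r = s, then evaluate r^T b.
r = np.linalg.solve(A.T, s)       # for real A, the adjoint A* is the transpose A^T
dual = r @ b

print(direct, dual)               # the two values agree up to rounding error
```

Both forms cost one linear solve here; the adjoint form pays off when $s^{T}x$ is needed for many right-hand sides $b$ with the same $s$, since $r$ is computed only once.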

General case

The original adjoint calculation method goes back to Jean Cea,[6] with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable $u\in {\mathcal {U}}$, an optimization variable $v\in {\mathcal {V}}$, an objective functional $J:{\mathcal {U}}\times {\mathcal {V}}\to \mathbb {R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_{v}(u)=0$ (usually the weak form of a partial differential equation), thus the considered objective is $j(v)=J(u_{v},v)$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v)=\nabla _{v}J(u_{v},v)+\nabla _{u}J(u_{v},v)\,\nabla _{v}u_{v}.$$

Unfortunately, the term $\nabla _{v}u_{v}$ is often very hard to compute analytically, since the dependence of $u_{v}$ on $v$ is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

$${\text{minimize}}\quad j(v)=J(u_{v},v)$$
$${\text{subject to}}\quad D_{v}(u_{v})=0$$

has an associated Lagrangian functional ${\mathcal {L}}:{\mathcal {U}}\times {\mathcal {V}}\times {\mathcal {U}}\to \mathbb {R}$ defined by

$${\mathcal {L}}(u,v,\lambda )=J(u,v)+\langle D_{v}(u),\lambda \rangle ,$$

where $\lambda \in {\mathcal {U}}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot ,\cdot \rangle$ is an inner product on ${\mathcal {U}}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$${\begin{cases}d_{u}{\mathcal {L}}(u,v,\lambda ;\delta _{u})=d_{u}J(u,v;\delta _{u})+\langle \delta _{u},D_{v}^{*}(\lambda )\rangle =0&\forall \delta _{u}\in {\mathcal {U}},\\d_{v}{\mathcal {L}}(u,v,\lambda ;\delta _{v})=d_{v}J(u,v;\delta _{v})+\langle d_{v}D_{v}(u;\delta _{v}),\lambda \rangle =0&\forall \delta _{v}\in {\mathcal {V}},\\d_{\lambda }{\mathcal {L}}(u,v,\lambda ;\delta _{\lambda })=\langle D_{v}(u),\delta _{\lambda }\rangle =0&\forall \delta _{\lambda }\in {\mathcal {U}},\end{cases}}$$

where $d_{x}F(x;\delta _{x})$ is the Gateaux derivative of $F$ with respect to $x$ in the direction $\delta _{x}$. The last equation is equivalent to $D_{v}(u)=0$, the state equation, whose solution is $u_{v}$. The first equation is the so-called adjoint state equation,

$$\langle \delta _{u},D_{v}^{*}(\lambda )\rangle =-d_{u}J(u_{v},v;\delta _{u})\quad \forall \delta _{u}\in {\mathcal {U}},$$

because the operator involved is the adjoint operator of $D_{v}$, namely $D_{v}^{*}$. Solving this equation yields the adjoint state $\lambda _{v}$. The gradient of the quantity of interest $j$ with respect to $v$ satisfies $\langle \nabla j(v),\delta _{v}\rangle =d_{v}j(v;\delta _{v})=d_{v}{\mathcal {L}}(u_{v},v,\lambda _{v};\delta _{v})$ (the second equation above with $u=u_{v}$ and $\lambda =\lambda _{v}$), so it can be identified by successively solving the direct and the adjoint state equations. The process is even simpler when the operator $D_{v}$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand sides.
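
In summary, the gradient is obtained in three steps, restated here in the same notation:

$$\text{(i) solve } D_{v}(u_{v})=0;\qquad \text{(ii) solve } \langle \delta _{u},D_{v}^{*}(\lambda _{v})\rangle =-d_{u}J(u_{v},v;\delta _{u})\ \ \forall \delta _{u}\in {\mathcal {U}};\qquad \text{(iii) } \langle \nabla j(v),\delta _{v}\rangle =d_{v}J(u_{v},v;\delta _{v})+\langle d_{v}D_{v}(u_{v};\delta _{v}),\lambda _{v}\rangle .$$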

Example: Linear case

In a real finite-dimensional linear context, the objective function could be $J(u,v)=\langle Au,v\rangle$, for $v\in \mathbb {R} ^{n}$, $u\in \mathbb {R} ^{m}$ and $A\in \mathbb {R} ^{n\times m}$, and let the state equation be $B_{v}u=b$, with $B_{v}\in \mathbb {R} ^{m\times m}$ and $b\in \mathbb {R} ^{m}$.

The Lagrangian functional of the problem is ${\mathcal {L}}(u,v,\lambda )=\langle Au,v\rangle +\langle B_{v}u-b,\lambda \rangle$, where $\lambda \in \mathbb {R} ^{m}$.

The derivative of ${\mathcal {L}}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_{v}=B_{v}^{-1}b$. The derivative of ${\mathcal {L}}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta _{u}\in \mathbb {R} ^{m}$,

$$d_{u}[\langle B_{v}\cdot {}-b,\lambda \rangle ](u;\delta _{u})=-\langle A^{\top }v,\delta _{u}\rangle \iff \langle B_{v}\delta _{u},\lambda \rangle =-\langle A^{\top }v,\delta _{u}\rangle \iff \langle B_{v}^{\top }\lambda +A^{\top }v,\delta _{u}\rangle =0\iff B_{v}^{\top }\lambda =-A^{\top }v.$$

Thus, we can write symbolically $\lambda _{v}=-B_{v}^{-\top }A^{\top }v$. The gradient would be

$$\langle \nabla j(v),\delta _{v}\rangle =\langle Au_{v},\delta _{v}\rangle +\langle \nabla _{v}B_{v}:\lambda _{v}\otimes u_{v},\delta _{v}\rangle ,$$

where $\nabla _{v}B_{v}={\frac {\partial B_{ij}}{\partial v_{k}}}$ is a third-order tensor, $\lambda _{v}\otimes u_{v}=\lambda _{v}u_{v}^{\top }$ is the dyadic (outer) product between the adjoint and direct states, and $:$ denotes a double tensor contraction. It is assumed that $B_{v}$ has a known analytic expression that can be differentiated easily.
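
The formula above can be checked numerically against finite differences. In the sketch below, the parametrization $B_{v}=B_{0}+\operatorname{diag}(v)$ (for which $\nabla _{v}B_{v}:\lambda \otimes u$ reduces to the elementwise product of $\lambda$ and $u$) and all matrices are illustrative assumptions, not part of the example above:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 6
A  = rng.standard_normal((m, m))                           # here n = m for simplicity
B0 = 5.0 * np.eye(m) + 0.1 * rng.standard_normal((m, m))   # keeps B_v well conditioned
b  = rng.standard_normal(m)
v  = rng.standard_normal(m)

def B(v):
    # Illustrative parametrization: B_v = B0 + diag(v), so dB_ij/dv_k = delta_ik * delta_jk.
    return B0 + np.diag(v)

def j(v):
    u = np.linalg.solve(B(v), b)                # direct state equation  B_v u = b
    return v @ (A @ u)                          # J(u, v) = <A u, v>

# Adjoint gradient: one direct solve and one adjoint solve, regardless of len(v).
u_v   = np.linalg.solve(B(v), b)
lam_v = np.linalg.solve(B(v).T, -(A.T @ v))     # adjoint equation  B_v^T lambda = -A^T v
grad  = A @ u_v + lam_v * u_v                   # second term: (grad_v B_v) : (lambda outer u)

# Central finite-difference check of every component.
eps = 1e-6
fd = np.array([(j(v + eps * np.eye(m)[k]) - j(v - eps * np.eye(m)[k])) / (2 * eps)
               for k in range(m)])
print(np.max(np.abs(grad - fd)))                # should be tiny (rounding/truncation error)
```

Note that the cost of the gradient is two linear solves, independent of the number of parameters in $v$, which is the property highlighted in the introduction.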

Numerical consideration for the self-adjoint case

If the operator $B_{v}$ is self-adjoint, $B_{v}=B_{v}^{\top }$, the direct state equation and the adjoint state equation share the same left-hand side. To avoid ever explicitly inverting a matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in $O(m^{3})$ operations for the decomposition and $O(m^{2})$ operations for each solve. The same decomposition can then be reused to solve the adjoint state equation in only $O(m^{2})$ additional operations, since the matrices are the same.
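
A brief sketch of that reuse, assuming SciPy is available; the symmetric matrix $B$ and the right-hand sides below are made up for the illustration:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(2)
m = 500
M = rng.standard_normal((m, m))
B = M + M.T + 2.0 * m * np.eye(m)     # symmetric (self-adjoint) and well conditioned
b = rng.standard_normal(m)            # right-hand side of the direct state equation
c = rng.standard_normal(m)            # right-hand side of the adjoint equation, e.g. -A^T v

lu, piv = lu_factor(B)                # O(m^3) factorization, done once
u   = lu_solve((lu, piv), b)          # direct state,  O(m^2)
lam = lu_solve((lu, piv), c)          # adjoint state, O(m^2), reusing the same factorization
```

Even when $B_{v}$ is not symmetric, the same factorization can still serve the adjoint solve, since `lu_solve((lu, piv), c, trans=1)` solves the transposed system $B_{v}^{\top }\lambda =c$.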

References

  1. ^ Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018-06-01). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2. ISSN 1615-1488. S2CID 125712091.
  2. ^ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". Advances in Neural Information Processing Systems. 31.
  3. ^ Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International. 167 (2): 495–503.
  4. ^ McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method" (PDF). ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. Archived (PDF) from the original on 29 January 2022. Retrieved 28 October 2022.
  5. ^ Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization" (PDF). Sweden: Linköping University of Technology. Archived (PDF) from the original on 9 October 2022. Retrieved 28 October 2022.
  6. ^ Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût". ESAIM: Mathematical Modelling and Numerical Analysis - Modélisation Mathématique et Analyse Numérique (in French). 20 (3): 371–402. doi:10.1051/m2an/1986200303711.

External links

  • A well-written explanation by Errico: What is an adjoint Model?
  • Another well-written explanation with worked examples, written by Bradley [1]
  • More technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course [2]
  • MIT notes [3]

