Optimal transport algorithms: the Benamou–Brenier algorithm


The Benamou–Brenier algorithm


This is a continuous numerical method which transforms the optimal transport problem into a tractable \(d+1\)-dimensional convex variational problem. We will describe it using the theory of geodesics in Wasserstein space (the method looks for the geodesic curve \(\mu_t\) instead of looking for the transport map).

The other two classical continuous methods are:

  • Angenent–Haker–Tannenbaum: based on the fact that the optimal transport map should be a gradient; the energy is decreased by removing the non-gradient part;

  • Loeper–Rapetti: based on solving the Monge–Ampère equation.

Both methods require smooth, non-vanishing densities, as well as special domains (a rectangle or, better, a torus) to handle the boundary data. The problem of solving the Monge–Ampère equation on more general domains is more subtle and has only recently been solved.

The Benamou–Brenier scheme and its numerical implementation

The results of Sections 5.3 and 5.4 allow us to rewrite the optimization problem for the cost \(|x-y|^p\) in a clever way and to transform it into a convex optimization problem. Indeed:

  • Solving OT for the cost \(|x-y|^p\) is equivalent to finding a constant-speed geodesic in the space \(\mathbb W_p\): from an optimal plan one can reconstruct the geodesic, and from a geodesic (through its velocity field) one can reconstruct the optimal transport;

  • Constant-speed geodesics can be found by minimizing \(\int_0^1|\mu'|(t)^p\,dt\);

  • In the case of the Wasserstein space, we have \(|\mu'|(t)^p = \int_\Omega|\mathbf v_t|^p\,d\mu_t\), where \(\mathbf v\) is a velocity field solving the continuity equation together with \(\mu\). This field is not unique, but the metric derivative \(|\mu'|(t)\) equals the minimal \(L^p\) norm among all admissible fields.

    As a consequence of these considerations, for \(p>1\), solving the minimal kinetic energy problem

    \[\min \left\{\int_0^1\int_\Omega|\mathbf v_t|^p\,d\varrho_t\,dt : \partial_t\varrho_t + \nabla\cdot(\mathbf v_t\varrho_t) = 0,\ \varrho_0=\mu,\ \varrho_1=\nu\right\}\]

among the curves connecting \(\mu\) to \(\nu\) allows us to find the optimal transport between these two measures.
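As a quick sanity check of this dynamical formulation, consider a 1D density translating at constant speed \(c\): the pair \((\varrho_t,\mathbf v_t\equiv c)\) is admissible for the continuity equation, and for \(p=2\) its kinetic energy is \(c^2\), which is exactly \(W_2(\mu,\nu)^2\) for a translation by \(c\). A minimal numerical sketch (assuming NumPy; the Gaussian density, grid, and number of time samples are illustrative choices):

```python
import numpy as np

# rho_t(x) = standard Gaussian translated by t*c; the field v_t = c solves
# the continuity equation d_t rho + d_x(v rho) = 0 along this curve.
c = 0.7
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

energy = 0.0
for t in np.linspace(0.0, 1.0, 50):
    rho_t = np.exp(-0.5 * (x - t * c) ** 2) / np.sqrt(2 * np.pi)
    # inner integral: int |v_t|^2 d rho_t = c^2 * (total mass, here ~1)
    energy += (c ** 2 * rho_t).sum() * dx / 50

print(energy)  # ~ 0.49 = c^2 = W_2(mu, nu)^2 for this translation
```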

On the other hand, in the variables \((\varrho_t,\mathbf v_t)\), this minimization problem has nonlinear constraints (due to the product \(\mathbf v_t\varrho_t\)) and a non-convex functional (since \((t,x)\mapsto t|x|^p\) is not convex). However, we will see that it can be transformed into a convex problem by combining it with the tools of Section 5.3.1.

To this end, it is sufficient to switch variables from \((\varrho_t,\mathbf v_t)\) to \((\varrho_t,E_t)\), where \(E_t=\mathbf v_t\varrho_t\), and to use the functional \(\mathscr B_p\) in space-time. Recall that \(\mathscr B_p(\varrho,E):=\int f_p(\varrho,E)\), where \(f_p:\mathbb R \times \mathbb R^d \to \mathbb R \cup \{+\infty\}\) is defined in Lemma 5.17:

\[f_{p}(t, x):=\sup _{(a, b) \in K_{q}}(a t+b \cdot x)=\left\{\begin{array}{ll}
\frac{1}{p} \frac{|x|^{p}}{t^{p-1}} & \text { if } t>0, \\
0 & \text { if } t=0,\ x=0, \\
+\infty & \text { if } t=0,\ x \neq 0, \text { or } t<0,
\end{array}\right.\]

where \(K_{q}:=\left\{(a, b) \in \mathbb{R} \times \mathbb{R}^{d}: a+\frac{1}{q}|b|^{q} \leq 0\right\}\). The problem then becomes

Problem 6.1. Solve

\[\left(\mathrm{B}_{p} \mathrm{P}\right) \quad \min \left\{\mathscr{B}_{p}(\varrho, E): \partial_{t} \varrho_{t}+\nabla \cdot E_{t}=0,\ \varrho_{0}=\mu,\ \varrho_{1}=\nu\right\}.\]

We can write \(\mathscr{B}_{p}(\varrho, E)=\int_{0}^{1} \mathscr{B}_{p}\left(\varrho_{t}, E_{t}\right) \mathrm{d} t=\int_{0}^{1} \int_{\Omega} f_{p}\left(\varrho_{t}(x), E_{t}(x)\right) \mathrm{d} x \,\mathrm{d} t\),
the third expression implicitly assuming \(\varrho_{t}, E_{t} \ll \mathscr{L}^{d}\). Indeed, as we saw in Proposition 5.18, the functional \(\mathscr{B}_{p}\) admits an integral representation of this form as soon as \(\varrho_{t}\) and \(E_{t}\) are absolutely continuous with respect to a same positive measure.
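The closed form of \(f_p\) can be sanity-checked against its definition as a sup over \(K_q\). A small sketch for \(p=q=2\) in dimension 1 (assuming NumPy; the sup is sampled on the boundary \(a=-|b|^2/2\) of \(K_2\), where it is attained for \(t>0\), and the grid bounds are arbitrary choices):

```python
import numpy as np

def f2_closed(t, x):
    # f_2(t, x) = |x|^2 / (2 t) for t > 0 (1D case of the formula above)
    return 0.5 * x ** 2 / t

def f2_sup(t, x):
    # f_2(t, x) = sup over (a, b) in K_2 of (a t + b x); for t > 0 the sup
    # is attained on the boundary a = -b^2/2, so we sample b densely there.
    b = np.linspace(-20.0, 20.0, 40001)
    a = -0.5 * b ** 2
    return np.max(a * t + b * x)

print(f2_closed(0.8, 1.3), f2_sup(0.8, 1.3))  # both ~ 1.05625
```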

We also stress that the constraints \(\partial_{t} \varrho_{t}+\nabla \cdot E_{t}=0\), \(\varrho_{0}=\mu\), \(\varrho_{1}=\nu\) are in fact a divergence constraint in space-time (considering the vector field \(\left.(\varrho, E):[0,1] \times \Omega \rightarrow \mathbb{R}^{d+1}\right)\)). The spatial boundary constraints are of no-flux type, and the initial and final values of \(\varrho\) provide non-homogeneous Neumann data on the boundaries \(\{0\} \times \Omega\) and \(\{1\} \times \Omega\). Indeed, the whole constraint can be read as \(\nabla_{t, x} \cdot(\varrho, E)=\delta_{0} \otimes \mu-\delta_{1} \otimes \nu\) (note that this point of view, with derivatives in \((t, x)\), will reappear in this section). The functional that we minimize is 1-homogeneous, and \(\left(\mathrm{B}_{p} \mathrm{P}\right)\) thus becomes a dynamical (space-time) version of the Beckmann problem seen in Section 4.2. This is indeed the problem obtained in [196], where the cost \(|x-y|^{p}\) is transformed into a homogeneous transport cost in space-time.

Now the constraints are linear and the functional is convex. However, the functional (and the function \(f_p\)) is convex, but not very much so since, as we said, it is 1-homogeneous. In particular, it is neither strictly convex nor differentiable. This reduces the efficiency of any gradient descent algorithm, but improved methods can be used.

In [34], the authors propose a numerical method based on this convex change of variables, on duality, and on the so-called augmented Lagrangian.

Note: Saddle points, Uzawa and the augmented Lagrangian

Suppose we need to minimize a convex function \(f: \mathbb{R}^{N} \rightarrow \mathbb{R}\) subject to \(k\) linear equality constraints \(A x=b\) (where \(b \in \mathbb{R}^{k}\) and \(A \in \mathrm{M}^{k \times N}\)). This problem is equivalent to

\[\min _{x \in \mathbb{R}^{N}} f(x)+\sup _{\lambda \in \mathbb{R}^{k}} \lambda \cdot(A x-b).\]

This gives a min-max problem for the Lagrangian \(L(x, \lambda):=f(x)+\lambda \cdot(A x-b)\). If we believe in duality, finding a minimizer of \(f\) under the constraints is equivalent to finding a maximizer of \(g(\lambda):=\inf _{x \in \mathbb{R}^{N}} f(x)+\lambda \cdot(A x-b)\). This is also equivalent to finding a saddle point of \(L\), i.e. a point \((\bar{x}, \bar{\lambda})\) satisfying \(L(x, \bar{\lambda}) \geq L(\bar{x}, \bar{\lambda}) \geq L(\bar{x}, \lambda)\) for every \(x\) and every \(\lambda\), i.e. a point which is a minimum in \(x\) and a maximum in \(\lambda\).

Maximizing in \(\lambda\) is easier, since the problem is unconstrained: a gradient algorithm can be used (see Box 6.8). We only need to compute \(\nabla g(\lambda)\), which is given by \(\nabla g(\lambda)=A x(\lambda)-b\) and is thus relatively easy to obtain, \(x(\lambda)\) being the (hopefully unique) minimizer of \(x \mapsto f(x)+\lambda \cdot(A x-b)\).
The algorithm obtained from this idea, called the Uzawa algorithm, is the following: given \(\left(x_{k}, \lambda_{k}\right)\), set \(x_{k+1}:=x\left(\lambda_{k}\right)\) and \(\lambda_{k+1}=\lambda_{k}+\tau \nabla g\left(\lambda_{k}\right)=\lambda_{k}+\tau\left(A x_{k+1}-b\right)\). For a given small \(\tau\), the sequence of points \(x_{k}\) converges to the minimizer of the original problem under reasonable assumptions.
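The Uzawa iteration can be sketched on a toy problem, \(f(x)=\frac12|x|^2\) with one linear constraint, where the inner minimization has the closed form \(x(\lambda)=-A^t\lambda\) (a minimal illustration assuming NumPy; the matrix, step size, and iteration count are arbitrary choices):

```python
import numpy as np

def uzawa(A, b, tau=0.1, iters=500):
    """Uzawa algorithm for min 0.5*|x|^2 subject to Ax = b."""
    lam = np.zeros(A.shape[0])
    for _ in range(iters):
        x = -A.T @ lam                   # x(lam): closed-form inner minimizer
        lam = lam + tau * (A @ x - b)    # gradient ascent step on g(lam)
    return x

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x = uzawa(A, b)
print(x)  # -> close to [0.5, 0.5], the projection of the origin onto Ax=b
```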

An alternative idea, which makes the point \(x(\lambda)\) easier to compute and accelerates convergence, is the following: instead of the Lagrangian \(L(x, \lambda)\), use the variant \(\widetilde{L}(x, \lambda):=L(x, \lambda)+\frac{\tilde{\tau}}{2}|A x-b|^{2}\), for a given value of \(\tilde{\tau}\). The saddle-point conditions for \(\tilde{L}\) and \(L\) are:

\[\text { for } \tilde{L}:\left\{\begin{array}{l}
\nabla f(x)+A^{t} \lambda+\tilde{\tau}(A x-b)=0, \\
A x-b=0,
\end{array}\right. \quad \text { for } L:\left\{\begin{array}{l}
\nabla f(x)+A^{t} \lambda=0, \\
A x-b=0.
\end{array}\right.\]
Hence \(\tilde{L}\) and \(L\) have the same saddle points, but using \(\tilde{L}\) makes the problem more convex in \(x\) and thus easier to handle. Then, the same gradient algorithm, with step size \(\tau\), is used on \(\tilde{g}(\lambda):=\inf _{x}f(x)+\lambda \cdot(A x-b)+\frac{\tilde{\tau}}{2}|A x-b|^{2}\). Taking \(\tau=\tilde{\tau}\), the following algorithm is iterated:

\[\begin{array}{l}
x_{k+1}=\operatorname{argmin}_x\, f(x)+\lambda_{k} \cdot(A x-b)+\frac{\tau}{2}|A x-b|^{2}, \\
\lambda_{k+1}=\lambda_{k}+\tau\left(A x_{k+1}-b\right).
\end{array}\]
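The same toy problem illustrates the augmented variant: for \(f(x)=\frac12|x|^2\), the \(x\)-step solves the regularized linear system \((I+\tau A^tA)x = A^t(\tau b - \lambda)\) (a minimal sketch assuming NumPy; the matrix, \(\tau\), and iteration count are arbitrary choices):

```python
import numpy as np

def augmented_lagrangian(A, b, tau=0.5, iters=200):
    """Method of multipliers for min 0.5*|x|^2 subject to Ax = b."""
    k, N = A.shape
    lam = np.zeros(k)
    M = np.eye(N) + tau * A.T @ A        # Hessian of the augmented x-problem
    for _ in range(iters):
        # x-step: argmin 0.5|x|^2 + lam.(Ax-b) + (tau/2)|Ax-b|^2
        x = np.linalg.solve(M, A.T @ (tau * b - lam))
        lam = lam + tau * (A @ x - b)    # multiplier update
    return x

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
print(augmented_lagrangian(A, b))  # -> close to [0.5, 0.5]
```

Note how the quadratic penalty makes the inner problem strongly convex in \(x\) even when \(f\) alone is not, which is exactly the point of the augmentation.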

Here are the main steps in the conception of this algorithm.

First, we write the constraints in weak form (according to (4.3)). This means that we want to solve

\[\min _{\varrho, E} \quad \mathscr{B}_{p}(\varrho, E)+\sup _{\phi}\left(-\int_{0}^{1} \int_{\Omega}\left(\left(\partial_{t} \phi\right) \varrho_{t}+\nabla \phi \cdot E_{t}\right)+G(\phi)\right),\]

where we set

\[G(\phi):=\int_{\Omega} \phi(1, x) \,\mathrm{d} \nu(x)-\int_{\Omega} \phi(0, x) \,\mathrm{d} \mu(x),\]

and the sup is computed over all functions \(\phi\) defined on \([0,1]\times \Omega\). (We do not care about their regularity here, since they will anyway be represented by their values at the points of a grid in \([0,1] \times \mathbb R^d\).)

Note 6.2. Let us observe the relation between the above problem and the Hamilton–Jacobi equation. We consider the simplest case, \(p=2\). In this case, we can write the problem as

\[\min _{(E, \varrho): \varrho \geq 0} \int_{0}^{1} \int_{\Omega} \frac{|E|^{2}}{2 \varrho}+\sup _{\phi}-\int_{0}^{1} \int_{\Omega}\left(\left(\partial_{t} \phi\right) \varrho+\nabla \phi \cdot E\right)+G(\phi),\]

where we used the expression of \(\mathscr{B}_{2}\), valid for absolutely continuous measures with \(\varrho \geq 0\) (if \(\varrho=0\), we must impose \(E=0\) in order to have finite energy).

If we exchange inf and sup, we obtain the following problem:

\[\sup _{\phi} \quad G(\phi)+\inf _{(E, \varrho): \varrho \geq 0} \int_{0}^{1} \int_{\Omega}\left(\frac{|E|^{2}}{2 \varrho}-\left(\partial_{t} \phi\right) \varrho-\nabla \phi \cdot E\right).\]

We can first compute the optimal \(E\), for fixed \(\varrho\) and \(\phi\), which gives \(E=\varrho \nabla \phi\). The problem becomes

\[\sup _{\phi} \quad G(\phi)+\inf _{\varrho \geq 0} \int_{0}^{1} \int_{\Omega}\left(-\left(\partial_{t} \phi\right) \varrho-\frac{1}{2}|\nabla \phi|^{2} \varrho\right).\]

The condition for the infimum to be finite (and hence to vanish) is \(\partial_{t} \phi+\frac{1}{2}|\nabla \phi|^{2} \leq 0\), and at the optimum we must have equality on \(\{\varrho>0\}\). This gives the Hamilton–Jacobi equation:

\[\partial_{t} \phi+\frac{1}{2}|\nabla \phi|^{2}=0 \quad \varrho \text { -a.e. }\]

Moreover, from the optimal \(\phi\), we can recover the Kantorovich potentials via \(\psi(x):=\phi(1, x)\) and \(\varphi(x):=-\phi(0, x)\).

In this variational problem, \(\mathscr{B}_{p}\) may be expressed as a sup, so that we obtain:

\[\min _{\varrho, E} \sup _{(a, b) \in K_{q}, \phi} \int_{0}^{1} \int_{\Omega}\left(a(t, x) \mathrm{d} \varrho+b(t, x) \cdot \mathrm{d} E-\partial_{t} \phi \,\mathrm{d} \varrho-\nabla \phi \cdot \mathrm{d} E\right)+G(\phi).\]

Let \(m=(\varrho, E)\). Here \(m:[0,1] \times \Omega \rightarrow \mathbb{R}^{d+1}\) is a \(d+1\)-dimensional vector field defined on a \(d+1\)-dimensional space. We do not pay attention to whether \(m\) is a measure or a true function, since anyway we will consider it in a discretized setting, where \(m\) will simply be a function defined at each point of a grid in \([0,1] \times \mathbb{R}^{d}\). Similarly, let \(\xi=(a, b)\). We also denote by \(\nabla_{t, x} \phi\) the space-time gradient of \(\phi\), i.e. \(\nabla_{t, x} \phi=\left(\partial_{t} \phi, \nabla \phi\right)\).

The problem is then rewritten as:

\[\min _{m} \sup _{\xi, \phi: \xi \in K_{q}}\left\langle\xi-\nabla_{t, x} \phi, m\right\rangle+G(\phi).\]

This is where the idea of the augmented Lagrangian method comes in. Indeed, the above problem recalls a Lagrangian, but in the reverse way: the dual variable must be \(m\), while the primal variables are \((\xi, \phi)\). The function \(f(\xi, \phi)\) includes the term \(G(\phi)\) and the constraint \(\xi \in K_{q}\), and there is an equality constraint \(\xi=\nabla_{t, x} \phi\). We do not actually care about the original constrained problem giving rise to this Lagrangian, but we decide to add to the optimization the term \(\frac{\tau}{2}\left|\xi-\nabla_{t, x} \phi\right|^{2}\) (for a small step size \(\tau\)).

We therefore look for a solution of the following problem:

\[\min _{m} \sup _{\xi, \phi: \xi \in K_{q}}\left\langle\xi-\nabla_{t, x} \phi, m\right\rangle+G(\phi)-\frac{\tau}{2}\left|\xi-\nabla_{t, x} \phi\right|^{2}.\]

To find the optimizers, we proceed as follows: we produce a sequence \(m_{k}\) and, for each \(k\), find the optimal \(\left(\xi_{k}, \phi_{k}\right)\). Rather than computing this optimal pair \(\left(\xi_{k}, \phi_{k}\right)\) exactly, we optimize in two steps (first fixing \(\xi\) and optimizing over \(\phi\), then, for this \(\phi\), optimizing over \(\xi\)).

The algorithm consists of iterating the following three steps. Suppose we have a triple \(\left(m_{k}, \xi_{k}, \phi_{k}\right)\):

  • Given \(m_{k}\) and \(\xi_{k}\), find the optimal \(\varphi_{k+1}\) by solving

    \[ \max _{\varphi}-\left\langle\nabla_{t, x} \varphi, m_{k}\right\rangle+G(\varphi)-\frac{\tau}{2}\left\|\xi_{k}-\nabla_{t, x} \varphi\right\|^{2},\]

    which boils down to minimizing a quadratic problem in \(\nabla_{t, x} \varphi\). The solution can be found by solving the Laplace equation \(\tau \Delta_{t, x} \varphi=\nabla \cdot\left(\tau \xi_{k}-m_{k}\right)\), with a space-time Laplacian and Neumann boundary conditions; these conditions are homogeneous in space and, because of \(G\), non-homogeneous at \(t=0\) and \(t=1\). Most Laplace solvers find the solution in \(O(N \log N)\) operations, where \(N\) is the number of discretization points.

  • Given \(m_{k}\) and \(\varphi_{k+1}\), find the optimal \(\xi_{k+1}\) by solving

    \[\max _{\xi \in K_{q}}\left\langle\xi, m_{k}\right\rangle-\frac{\tau}{2}\left|\xi-\nabla_{t, x} \varphi_{k+1}\right|^{2}.\]

    By expanding the square, we see that this problem is equivalent to the projection of \(\nabla_{t, x} \varphi_{k+1}+\frac{1}{\tau} m_{k}\), and no gradient appears in this optimization step. The projection is performed pointwise: for each \((t, x)\), one selects the point \(\xi=(a, b)\) of the convex set \(K_{q}\) which is closest to \(\nabla_{t, x} \varphi_{k+1}(t, x)+\frac{1}{\tau} m_{k}(t, x)\). If we have a projection algorithm onto this convex set of \(\mathbb{R}^{d+1}\) requiring a constant number of operations, the cost of this pointwise step is \(O(N)\).

  • Finally, update \(m\) by setting \(m_{k+1} \leftarrow m_{k}-\tau\left(\xi_{k+1}-\nabla_{t, x} \varphi_{k+1}\right)\).
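The pointwise projection onto \(K_q\) in the second step above can be sketched for \(p=q=2\), i.e. projecting a point \((\alpha,\beta)\in\mathbb R\times\mathbb R^d\) onto \(\{(a,b): a+\frac12|b|^2\le 0\}\). The KKT conditions give \(a=\alpha-\mu\) and \(b=\beta/(1+\mu)\) for a scalar multiplier \(\mu>0\) solving a monotone one-dimensional equation (equivalently, a cubic equation in \(1+\mu\)); here it is solved by bisection for simplicity (a sketch assuming NumPy):

```python
import numpy as np

def project_K2(alpha, beta):
    """Project (alpha, beta) in R x R^d onto K_2 = {(a,b): a + |b|^2/2 <= 0}."""
    beta = np.asarray(beta, dtype=float)
    if alpha + 0.5 * beta @ beta <= 0:
        return alpha, beta               # already inside: nothing to do
    # KKT: a = alpha - mu, b = beta/(1+mu), with mu > 0 the root of the
    # decreasing function F(mu) = alpha - mu + |beta|^2 / (2 (1+mu)^2).
    F = lambda mu: alpha - mu + (beta @ beta) / (2.0 * (1.0 + mu) ** 2)
    lo, hi = 0.0, 1.0
    while F(hi) > 0:                     # bracket the root
        hi *= 2.0
    for _ in range(80):                  # bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    return alpha - mu, beta / (1.0 + mu)

a, b = project_K2(0.0, np.array([2.0]))
print(a + 0.5 * b @ b)  # ~ 0: the projection lands on the boundary of K_2
```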

    In total, this algorithm requires \(O(N \log N)\) operations per iteration (where \(N\) is given by the discretization in space-time). It can be proven to converge. Compared with other algorithms, the Benamou–Brenier method has some important advantages:

    1. So far, it is essentially the only method which handles vanishing densities effortlessly, with no special assumptions on their supports;

    2. It is not restricted to the quadratic cost: we presented it for the power costs \(c(x,y)=|x-y|^p\), but it can also be adapted to other convex cost functions \(h(x-y)\), and to the costs generated by arbitrary Lagrangian actions; it can handle costs depending on position and time, and is suitable for Riemannian manifolds;

    3. It can handle convex constraints on the density \(\varrho\) (e.g., lower or upper bounds);

    4. It can handle various “dynamical” problems, adding penalizations on the density or on the velocity, as occurs in mean field games (see Section 8.4.4); this has been used, for instance, in [35];

    5. It can handle multiple populations, with possibly interacting behaviors.

We refer to [96, 248] for the numerical treatment of the variants mentioned in points 2 and 3 above, and to [37] for the implementation of the multi-population case. Here we give a simple picture (Fig. 6.1) of the simplest case, obtained via the Benamou–Brenier method on a torus.
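For completeness, here is a sketch of the \(O(N\log N)\) Laplace solve used in the first step of the iterations above: with homogeneous Neumann conditions on a cell-centered grid, the discrete five-point Laplacian is diagonalized by the type-II discrete cosine transform (a minimal 2D illustration assuming NumPy and SciPy; the right-hand side must have zero mean, and the actual algorithm uses the space-time version with non-homogeneous data at \(t=0,1\)):

```python
import numpy as np
from scipy.fft import dctn, idctn

def neumann_poisson(f, h):
    """Solve the discrete Delta(phi) = f with homogeneous Neumann BCs
    on a cell-centered grid of step h, via DCT-II, in O(N log N)."""
    n1, n2 = f.shape
    fh = dctn(f, type=2, norm='ortho')
    k1 = np.arange(n1)[:, None]
    k2 = np.arange(n2)[None, :]
    # eigenvalues of the 5-point Neumann Laplacian in the DCT-II basis
    lam = ((2 * np.cos(np.pi * k1 / n1) - 2)
           + (2 * np.cos(np.pi * k2 / n2) - 2)) / h ** 2
    lam[0, 0] = 1.0          # zero mode: the solution is defined up to a
    fh /= lam                # constant, which we fix by imposing zero mean
    fh[0, 0] = 0.0
    return idctn(fh, type=2, norm='ortho')
```

As a check, applying the five-point Laplacian (with mirrored ghost cells) to a grid function and feeding the result back into this solver recovers that function up to its mean.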


Fig. 6.1. Geodesic between a Gaussian density on the 2D torus and the same Gaussian displaced by the vector (1/2, 1/2). Note that, in order to avoid the cut locus, the initial density splits into four pieces.