This course is on finite dimensional optimization, which means having finite number of parameters. We now went over the syllabus. Here is a summary:
Homeworks will be graded using E,S,U grades. Most will get S, few will get E. Matlab will be used.
Cardinal rule Develop an answer using given material in class only. Can use other basic things like Laplace transform, etc...
A wise person once said: “Fundamental difficulties are invariant under reformulation”.
In a problem, identify the objective function, constraints and variables. Optimization problems from different fields can be formulated into a common framework for solving using optimization methods.
Ingredients in this case: Set \(U\subseteq \Re ^{n}\), the constraint set. Problem has \(n\) parameters (the decision variables). Therefore \(u\in U\). One dimensional problem has \(n=1\). An example is to find optimal resistor value where \(U=100\cdots 200\) Ohms. For \(n=2\), an example is to find two resistors \(u= \begin{bmatrix} u_{1}\\ u_{2} \end{bmatrix} \) with \(100\leq u_{1}\leq 200\) and \(300\leq u_{2}\leq 400\).
Reader Often we describe \(U\) graphically in \(\Re ^{n}\). Typically for only \(n=1,2,3\). Do this for the above example.
Reader design a bandpass filter with passband from \(\omega _{1}\) to \(\omega _{2}\) with \(\omega _{1},\omega _{2}>0\). Describe graphically the set \(U\).
Reader Often \(U\) is sphere in \(\Re ^{n}\) described by \({\displaystyle \sum \limits _{i=0}^{n}} \left ( u_{i}-u_{i,0}\right ) ^{2}\leq R\) where \(u_{i},_{0}\) are coordinates of center of sphere.
Reader Suppose we are designing an input voltage \(u\left ( t\right ) \) on \(t\in \left [ 0,1\right ] \) such that \(\int _{0}^{1}u^{2}\left ( t\right ) dt\leq 5\). This does not fit in above framework. This is function space problem.
An important class of \(U\): A generalization of rectangle in 2D to hypercube in \(\Re ^{n}\) with \(\left \vert u_{i}-u_{i,0}\right \vert \leq r_{i}\) for \(i=1\cdots n\).
Reader For \(n=3\) and \(u_{0}=\begin{bmatrix} u_{1,0}\\ u_{2,0}\\ u_{3,0}\end{bmatrix} =\begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix} \) and \(r_{1}=1,r_{2}=2,r_{3}=3\), then sketch \(U\).
Unconstrained problem is one when we say \(U=\Re ^{n}\). There are no constraints on \(U\). These are easier to solve than constrained problems. In practice, there will always be constraints. \(U\) is sometimes called the set of decision variables, or also called the input. The other criteria, is the objective function \(J\left ( u\right ) \), or more formally \(J:\Re ^{n}\rightarrow \Re \). The objective function \(J\left ( u\right ) \) is obtained from your goals.
Example we want to go from \(A\) to \(B\) in least time. Often the selection of \(J\left ( u\right ) \) is not straight forward. Selection of \(J\left ( u\right )\) can be difficult in areas such as social or environmental science.
In stock markets, \(J(u)\) is given and we just use it. Example is \(J(u) = u^{2}+6u+e^{-u}\) in \(\Re ^{n}\). Often we test the algorithm first in \(\Re ^{1}\) or \(\Re ^{2}\) before going to higher dimensions. In \(\Re ^{4}\) an example is \[ J(u) =u_1 u_2-3 u_1 u_3 u_4+6u_2 u_3-6 u_1-4 u_2-u_3 u_4+12 \] This has \(16\) vertices.
Reader Can you find max \(J(u)\) subject to \(U\) described by \(\left \vert u_{i}\right \vert \leq i\) using common sense?
Answer It will be on the vertices of hypercube.
In VLSI, with hundreds of resistors, there are \(2^{n}\) vertices, so the problem becomes computationally harder to solve very quickly. To proof that the optimal value is at a vertex, the idea is to freeze all other variables except for one at a time. This gives a straight line in the free variable. Hence the optimal is at ends of the line.
The main case problem: Find \(u^{\ast }\) that minimizes \(J\left ( u\right ) \) with the constraints \(u\in U\). \(u^{\ast }\) might not exist. When it exists, then \[ J\left ( u^{\ast }\right ) =J^{\ast }=\min _{u\in U}J\left ( u\right ) \] When we begin, we do not know if \(u^{\ast }\) exist. We write \[ J\left ( u^{\ast }\right ) =\inf _{u\in U}J\left ( u\right ) \] Example \(U=\Re ^{n}\), \(J\left ( u\right ) =e^{-u}\). So \(u^{\ast }\) do not exist. We do not allow \(u^{\ast }\) to take values \(\pm \infty \). But we allow \(J\left ( u^{\ast }\right ) \) itself to become \(\pm \infty \). For example, if \(J\left ( u\right ) =u^{2}\) then \(u^{\ast }=0\).
Tractability. \(U\subseteq \Re ^{n}\), the decision variables. \(J:\Re ^{n}\rightarrow \Re \). Problem: find \(u^{\ast }\) such that \(J\left ( u^{\ast }\right ) =\inf _{u\in U}J\left ( u\right ) \).
\(u^{\ast }\), when it exists, is called the optimal element. We do not allow \(u^{\ast }=\pm \infty \), but allow \(J\left ( u^{\ast }\right ) =\pm \infty \). For example. \(J\left ( u\right ) =\frac{1}{u}\) on \(U=\left ( 0,\infty \right ) \). We say \(\sup _{u\in U}J\left ( u\right ) =J^{\ast }=0\,\) but \(u^{\ast }=\infty \) do not exist. Hence \(J^{\ast }\) is a limiting supremum value. Another example is \(J\left ( u\right ) =-u\) on \(\Re \). Therefore \(J^{\ast }=-\infty \) but \(u^{\ast }\) do not exist.
Reader: We can consider \(\max _{u\in U}J\left ( u\right ) \equiv \sup _{u\in U}J\left ( u\right ) \). Note that \[ \max _{u\in U}J\left ( u\right ) =-\min _{u\in U}J\left ( u\right ) \] But \(u_{\min }^{\ast }\neq u_{\max }^{\ast }\).
Some problems will be tractable and some are not. Some problems will not have defined algorithms and some might not have certified algorithms. Some problems have algorithms but are not tractable. NP hard problems can be either tractable or not tractable. What about stochastic problems? If \(u\) is random variable, we can not write \(J\left ( u\right ) \). But instead we work with \(\tilde{J}\left ( u\right ) =E\left ( J\left ( u\right ) \right ) \) where \(E\) is expectation. So now \(\tilde{J}\left ( u\right ) \) fits in the frameworks we used earlier.
We also need to make distinction between explicit and implicit \(J\left ( u\right ) \). When we write \(J\left ( u\right ) =u_{1}^{2}+e^{u_{2}}\cos u_{1}+u_{2}\), then \(J\left ( u\right ) \) here is explicit. But if we have a circuit as below, where \(J\left ( u\right ) =V_{out}\), then here \(J\left ( u\right ) \) is implicit, since we have to solve the complicated RLC circuit to find \(J\left ( u\right ) \), so we implicitly assume \(J\left (u\right ) \) exists.
First detailed example Optimal farming example. Let \(y\left ( k\right ) \) be the annual crop value where \(k\) is the year. This is the end of year value of the crop. At end of year, we use some of this to invest and the rest we keep as profit. Let \(u\left ( k\right ) \in \left [ 0,1\right ] \) be the fraction of \(y\left ( k\right ) \) invested back. Therefore \(\left ( 1-u\left ( k\right ) \right ) \) is the fraction of \(y\left ( k\right ) \) which is taken out as profit. Dynamics of the problem are: \(y\left ( k+1\right ) =y\left ( k\right ) \) if we invest nothing (i.e. \(u=0\)). But if we invest, then \[ y\left ( k+1\right ) =y\left ( k\right ) +\omega \left ( k\right ) \overset{\text{amount invested}}{\overbrace{u\left ( k\right ) y\left ( k\right ) }}\] Where \(\omega \left ( k\right ) \) is an independent and identically distributed (i.i.d.) random variable which depends on the weather and other variables. Let \(E\left ( \omega \left ( k\right ) \right ) =\varpi \). We have \(N\) years planning horizon. What about \(J\left ( u\right ) \)? If we model things as a convex problem, we can solve it, but if we do not, it becomes hard to solve.\begin{equation} J\left ( u\right ) =E\left ( y\left ( k\right ) +{\displaystyle \sum \limits _{k=0}^{N-1}} y\left ( k\right ) \left ( 1-u\left ( k\right ) \right ) \right ) \tag{1} \end{equation}
The set \(U\) here is \(\left \{ u\left ( 0\right ) ,u\left ( 1\right ) ,\cdots ,u\left ( N-1\right ) \right \} \). In this example \(J\left ( u\right ) \) is implicit. We need to make \(J\left ( u\right ) \) explicit. Let us now calculate \(J\left ( u\right ) \) for \(N=2\)\begin{align*} y_{1} & =y_{0}+\omega _{0}u_{0}y_{0}\\ & =y_{0}\left ( 1+\omega _{0}u_{0}\right ) \end{align*}
In class, we assumed that \(y_{0}=1\). But will keep it here as \(Y\), which is the initial conditions.\begin{equation} y_{1}=Y\left ( 1+\omega _{0}u_{0}\right ) \tag{2} \end{equation} Now for the second year, we have\begin{align*} y_{2} & =y_{1}+\omega _{1}u_{1}y_{1}\\ & =y_{1}\left ( 1+\omega _{1}u_{1}\right ) \end{align*}
Substituting (2) into the above gives\begin{align*} y_{2} & =Y\left ( 1+\omega _{0}u_{0}\right ) \left ( 1+\omega _{1}u_{1}\right )\\ & =Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1} \end{align*}
Therefore from (1) we have for \(N=2\)\begin{align*} J\left ( u\right ) & =E\left ( y_{2}+{\displaystyle \sum \limits _{k=0}^{1}} y_{k}\left ( 1-u_{k}\right ) \right ) \\ & =E\left ( \overbrace{Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1}}^{y}_{2}+\left [ Y\left ( 1-u_{0}\right ) +y_{1}\left ( 1-u_{1}\right ) \right ] \right ) \\ & =E\left ( Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1}+\left [ Y\left ( 1-u_{0}\right ) +Y\left ( 1+\omega _{0}u_{0}\right ) \left ( 1-u_{1}\right ) \right ] \right ) \\ & =E\left ( Y+Y\omega _{0}u_{0}+Y\omega _{1}u_{1}+Y\omega _{0}\omega _{1}u_{0}u_{1}+Y-Yu_{0}+Y-Yu_{1}+Y\omega _{0}u_{0}-Y\omega _{0}u_{0}u_{1}\right ) \\ & =Y\left ( 3+\varpi u_{0}+\varpi u_{1}+\varpi ^{2}u_{0}u_{1}-u_{0}-u_{1}+\varpi u_{0}-\varpi u_{0}u_{1}\right ) \\ & =Y\left ( 3+2\varpi u_{0}+u_{1}\left ( \varpi -1\right ) +\varpi ^{2}u_{0}u_{1}-u_{0}-\varpi u_{0}u_{1}\right ) \\ & =Y\left ( 3+u_{0}\left ( 2\varpi -1\right ) +u_{1}\left ( \varpi -1\right ) +u_{0}u_{1}\left ( \varpi ^{2}-\varpi \right ) \right ) \end{align*}
Reader Use syms to find \(J\left ( u\right ) \) for \(N=4\).
If we want maximum profit, then common sense says that \(\varpi \) should be large (good weather). Then \(u^{\ast }=\left ( 1,1\right ) \). This says \(u_{0}=1\) and \(u_{1}=1\). In other words, we invest everything back into the crop each year. The above was obtained by maximizing term by term since all terms are positive. If \(\varpi <\frac{1}{2}\) (bad weather), then \(u^{\ast }=\left ( 0,0\right ) \) and all coefficients are negative.
In the above, \(J\left ( u\right ) \) is multilinear function in \(u_{0},u_{1}\). If all \(u_{k}\) are held constant except for one at a time, then \(J\left ( u\right )\) becomes linear in the free parameter. In HW 1, we need to proof that extreme point of a multilinear function is at a vertex. So for this problem, the possible solutions are \(\left ( 0,0\right ) ,\left ( 0,1\right ) ,\left ( 1,0\right ) ,\left ( 1,1\right ) \).
Reader Is there a value of \(\varpi \) that gives \(\left ( 1,0\right ) \) and \(\left ( 0,1\right ) \)?
For arbitrary \(N\) there are \(2^{N}\) vertices in hypercube for a multilinear \(J\left ( u\right ) \) where \(u\in \Re ^{n}\).So for large \(n\), a multilinear problem is much harder to solve than a convex optimization problem. Even simple problems demonstrate intractability. Let \(M=N\times N\) matrix. Let each every \(m_{ij}\) be known within bounds \(m_{ij}^{-}\leq m_{ij}\leq m_{ij}^{+}\). The question is: What are the bounds on the determinant of matrix \(M\)?
Reader ponder this question in the context of multilinear problem. To determine this for small \(n\) in Matlab, here is some code for \(n=4\) (uses allcomb which is a file exchange file)
Running the above gives
Today will be on multilinear functions, then back to infimum of \(J\left ( u\right ) \). Then we will go over real analysis to see when a \(\min \) is reached. We need to find first if there exists a minimum before starting to search for it. For the farming problem we looked at, say \(\varpi =2\). We want to look at contours in \(\Re ^{2}\) and \(\Re ^{3}\). Try first in \(\Re ^{2}\).
Reader Obtain level sets, contour lines, defined as \(\Lambda _{\gamma }=\left \{ u:J\left ( u\right ) =\gamma \right \} \). In words, contour line is curve of equal values of \(J\left ( u\right ) \). For farming problem, show contour lines of \(J\left ( u\right ) \) looks like
For \(\varpi =2\), and using results from last lecture, where we had \[ J\left ( u\right ) =Y\left ( 3+u_{0}\left ( 2\varpi -1\right ) +u_{1}\left ( \varpi -1\right ) +u_{0}u_{1}\left ( \varpi ^{2}-\varpi \right ) \right ) \] Where \(Y=y\left ( 0\right ) =1\), the above becomes\begin{equation} J\left ( u\right ) =3+u_{0}\left ( 2\varpi -1\right ) +u_{1}\left ( \varpi -1\right ) +u_{0}u_{1}\left ( \varpi ^{2}-\varpi \right ) \tag{1} \end{equation}
Reader Use Matlab syms to obtain \(J\left ( u\right ) \) for any \(N\)
Answer Below is a function which generates \(J\left ( u\right ) \) for any \(N\). Function called nma_farm(N,y0), takes in \(N\), which is how many years and \(y\left ( 0\right ) \), the initial value. \(\omega 0\) below is \(\varpi \), the mean of the \(\omega \) random variable).
Here are example run outputs
At \(u^{\ast }=\left ( u_{0},u_{1}\right ) =\left ( 1,1\right ) \) then (1) becomes \begin{align*} J^{\ast } & =3+\left ( 4-1\right ) +\left ( 2-1\right ) +\left ( 4-2\right ) \\ & =9 \end{align*}
Reader Look at \(J\left ( u\right ) \) in \(\Re ^{3}\) for the farming problem. Today’s topic: When does the optimal element \(u^{\ast }\) exist?
When we have more than one objective function, \(J_{i}\left ( u\right ) :U\subseteq \Re ^{n}\rightarrow \Re \), for \(i=1\cdots m\). We call \(u^{\ast }\in U\) a Pareto optimal, if the following holds: Given any other \(u\in U\), we can not have the following two relations be true at the same time:
This is related to efficiency in economics. So something that is not Pareto optimal, can be eliminated.
Reader Say \(U=\left ( 0,\infty \right ) \), \(J_{1}\left ( u\right ) =u+1,J_{2}\left ( u\right ) =\frac{1}{u+1}\), describe the set of Pareto optimal solutions.
Reader By creating \(J\left ( u\right ) =\alpha _{1}J_{1}\left ( u\right ) +\alpha _{2}J_{2}\left ( u\right ) \), with \(\alpha _{i}\geq 0\), show the solution (optimal \(J\left ( u\right ) \)), depends on \(\alpha _{i}\) which reflects different utility functions.
Now we will talk about existence of optimal element \(u^{\ast }\). We will always assume that \(J\left ( u\right ) \) is continuous function in \(u\in U\). Two different conditions on the set \(U\) can be made
In both cases, we still want conditions on \(J\left ( u\right ) \) itself as well. We begin with the definition of \[ \boxed{ J^{\ast }=\inf _{u\in U}J\left ( u\right ) } \] Remember that we do not allow \(u=\pm \infty \), but \(J^{\ast }\) can be \(\pm \infty \).
Example \(J\left ( u\right ) =-u\) on \(\Re \). Then \(J^{\ast }=-\infty \), but there is no optimal element \(u^{\ast }=-\infty \). But it is always true that there is a sequence \(\left \{ u^{k}\right \} _{k=1}^{\infty }=u^{k}\) such that \(J\left ( u^{k}\right ) \rightarrow J^{\ast }\) even though there is no \(u^{\ast }\) element. We write\[ \lim _{k\rightarrow \infty }J\left ( u^{k}\right ) =J^{\ast }\] A bad set \(U\) is often one which is not closed. Example is of a gas pedal which when pressed to the floor will cause the car to malfunction, but the goal is to obtain the shortest travel time, which will require one to press the gas pedal to the floor in order to obtain the highest car speed.
This shows that there is no optimal \(u^{\ast }\) but we can get as close to it as we want. This is an example of a set \(U\) that is not closed on one end. We always prefer to work with closed sets. An open set is one such as \(U=(0,\frac{\pi }{4}]\), and a closed set is one such as \(U=\left [ 0,\frac{\pi }{4}\right ] \).
Definition of continuity If \(u^{k}\rightarrow u_{0}\) then \(J\left ( u^{k}\right ) \rightarrow J\left ( u_{0}\right ) \), we write \(J\left ( u^{0}\right ) =\lim _{k\rightarrow \infty }J\left ( u^{k}\right ) \). This is for every sequence \(u^{k}\).
Equivalent definition: There is continuity at \(u^{0}\) if the following holds: Given any \(\varepsilon \geq 0\,\ \) there exists \(\delta \) such that \(\left \vert J\left ( u\right ) -J\left ( u^{0}\right ) \right \vert <\varepsilon \), whenever \(\left \vert u-u^{0}\right \vert <\delta \).
Closed sets Includes all its limit points. Examples:
A set can be both open and closed. \(\Re \) is such a set. To show a set is closed, we need to show that the limit of any sequence is also in the set.
The intersection of closed sets remains closed, but the union can be an open set. This is important in optimization. If \(U_{i}\) represent constraint sets, then the intersection of the constraints remain a closed set. Closed sets also contain its boundaries.
Now we will talk about boundness of set. A set is bounded if we can put it inside a sphere of some radius. We always use the Euclidean norm. If a set is bounded, using one norm, then it is bounded in all other definitions of norms. There is more than one definition of a norm. But we will use Euclidean norm.
We like to work with compact sets. A Compact set is one which is both bounded and closed. These are the best for optimization.
Each bounded sequence in \(\Re ^{n}\) has a convergent sub-sequence. This is useful, since it says if the sequence is bounded, then we can always find at least one sub-sequence in it, which is convergent. For example \(u^{k}=\cos k\). This does not converge, but is has a subsequence in it which does converge. The same for \(u^{k}=\left (-1\right ) ^{k}\).
We will spend few minutes to review existence of optimal solutions then we will talk about computability and convexity. We know now what \(J\left ( u\right ) \) being continuous means. Closed sets are very important for well posed optimization problems. Typical closed set is \(u\leq k\), where \(k\) is constant. If the function \(J\left ( u\right ) \) where \(u\in U\), is a continuous function, then the set \(U\) must be closed.
Reader Give a proof.
We talked about \(U\) being closed and bounded (i.e. compact). Compact sets are best for optimization. We talked about sequence \(\left \{ u^{k}\right \} _{k=1}^{\infty }=u^{k}\) and a subsequence in this sequence \(\left \{ u^{k_{i}}\right \} _{i=1}^{\infty }=u^{k_{i}}\). If \(u^{k}\) converges to \(u^{\ast }\) then we say \(\lim _{k\rightarrow \infty }\left \Vert u^{k}-u^{\ast }\right \Vert =0.\) Finally, we talked about Bolzano-Weierstrass theorem.
Suppose \(J:U\rightarrow \Re \) is continuous and assume \(U\) is compact (i.e. bounded and closed) and non-empty, then there exists an optimal element \(u^{\ast }\in U\,\ \)such that \(J\left ( u^{\ast }\right ) =J^{\ast }=\min _{u\in U}J\left ( u\right ) \). This does not say that \(u^{\ast }\) is unique. Just that it exists.
Proof Let \(u^{k}\in U\) be a sequence such that \(J\left ( u^{k}\right ) \rightarrow J^{\ast }\), then by Bolzano-Weierstrass \(u^{k_{i}}\) be a convergent subsequence with limit \(u^{\ast }\in U\).
Reader Show that \(J\left ( u^{k_{i}}\right ) \) also converges to \(J^{\ast }\). Note: \(J\left ( u^{k_{i}}\right ) \) is sequence of real numbers, which converges to a real number \(J^{\ast }\).
Example \(u^{k}=\left ( -1\right ) ^{k}\). Let \(J\left ( u\right ) =e^{-u^{2}}\) then \(J\left ( u\right ) \) converges, but \(u^{k}\) does not. Hence we need to look for subsequence \(u^{k_{i}}\) in \(u^{k}\). Now, by continuity, \(J\left ( u^{\ast }\right ) =\lim _{i\rightarrow \infty }J\left ( u^{k_{i}}\right ) =J^{\ast }\).
There are many problems where the set is open, as in unconstrained problems. These are called open cone problems.
Coercive function Suppose \(U\subset \Re ^{n}\) and \(J:U\rightarrow \not \Re \). We say that \(J\) is positive coercive if \begin{equation} \boxed{ \lim _{\left \Vert u\right \Vert \rightarrow \infty }J\left ( u\right ) =\infty \tag{*} } \end{equation} Initially, think of \(U\) as the whole \(\Re ^{n}\) space. So \(J\left ( u\right ) \) blows up at \(\infty \). Note: there is a type of uniform continuity implied by Eq (*). What Eq (*) means is that given any \(\gamma \ggg 0\), arbitrarily large, there exists radius \(R\) such that \(J\left ( u\right ) >\gamma \), whenever \(\left \Vert u\right \Vert >R\). This basically says that for a Coercive function we can always find a sphere, where all values of this function are larger than some value for any \(\left \Vert u\right \Vert >R\). This is useful, if we are searching for a minimum, in that we can obtain a cut off on the values in \(U\) to search for.
Example \(J\left ( u\right ) =u^{2}\) is positive coercive, but \(J\left ( u\right ) =e^{u}\) is not (since we can’t find a sphere to limit values within it to some number), since as \(u\rightarrow -\infty \), the function \(e^{u}\) does not blow up. \[ J\left ( u\right ) =au^{2}+bu+c \] Is coercive only when \(a>0\). The most famous coercive function is the positive definite quadratic form. Let \(A\) be \(n\times n\) symmetric and positive definite matrix and let \(b\in \Re ^{n}\). Let \(c\) be any real number, then \(J\left ( u\right ) =u^{T}Au+b^{T}u+c\) is coercive. This is essential to quadratic programming.
Why is uT Au + bT u + c coercive? From matrix algebra, if \(A\) is \(n\times n\) symmetric and \(x\in \Re ^{n}\), then \(\lambda _{\min }\left \Vert x\right \Vert ^{2}\leq x^{T}Ax\leq \lambda _{\max }\left \Vert x\right \Vert ^{2}\). For \(x,y\in \Re ^{n}\), by Schwarz inequality, \(\left \vert \left \langle x,y\right \rangle \right \vert ^{2}\leq \left \langle x,x\right \rangle \left \langle y,y\right \rangle \) or \(x^{T}y\leq \left \Vert x\right \Vert \left \Vert y\right \Vert \), to establish \(J\left ( u\right ) =u^{T}Au+b^{T}u+c\,\ \)notice then \(J\left ( u\right ) \geq \lambda _{\min }\left ( A\right ) \left \Vert u\right \Vert ^{2}-\left \Vert b\right \Vert \left \Vert u\right \Vert +c\). Since \(A\) is positive definite, then \(\lambda _{\min }\left ( A\right ) >0\) (smallest eigenvalue must be positive). So this is the same as scalar problem \(au^{2}+by+c\). Hence \(J\left ( u\right ) \) is coercive function in \(u\).
Reader What if we have a physical problem, leading to \(u^{T}Au+b^{T}u+c\), but \(A\) is not symmetric, what to do? Solution: We can always work with the symmetrical part of \(A\) using \(u^{T}Au=\frac{1}{2}u^{T}\left ( A+A^{T}\right ) u\), hence we work with \[ J\left ( u\right ) =\frac{1}{2}u^{T}\left ( A+A^{T}\right ) u+b^{T}u+c \] Instead.
Coercivity theorem Suppose \(J:U\rightarrow \Re \) is a continuous function and coercive. And \(U\subseteq \Re ^{n}\) is closed. Then an optimal element \(u^{\ast }\in U\) exist minimizing \(J\left ( u\right ) \). Reader Consider the maximization problem instead of minimization.
Now we start on a new topic: Convexity. Toward finding optimal. A set \(U\subseteq \Re ^{n}\) is said to be convex set if the following holds: For any \(u^{0},u^{1}\in U\), and \(\lambda \in \left [ 0,1\right ] \) then it follows that \(\lambda u^{0}+\left ( 1-\lambda \right ) u^{1}\in U\). In words, this says that all points on the straight line between any points in the set, are also in the set. Examples:
Reader If \(U_{1},U_{2}\) are both convex then show the intersection is also convex.
What about countable intersections \(\cap _{i=1}^{\infty }U_{i}\) ?
Answer Yes. Example \(U_{i}=\left \{ u\in \Re ^{n}:\left \Vert u\right \Vert \leq e^{-i}\right \} \). The union is not a convex set.
Constraints using OR are union. So harder to work with OR constraints, since union of convex sets is not convex.
Reader Describe all possible convex sets on \(\Re ^{1}\).
Reader Suppose \(U_{i}\) is defined by set of inequalities \(a_{i}^{T}u\leq b_{i}\), \(i=1\cdots m\), these are intersections of sets. This is used in Linear programming.
Back to the most important topic in optimization, which is convexity. We want to build on convex sets.
Definition A set \(U\subseteq \Re ^{n}\) is convex if the following holds: Given any \(u^{0},u^{1}\in U\,\ \) and \(\lambda \in \left [ 0,1\right ] \), then \(u^{\lambda }=\left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\) is also in \(U\). We will discuss convex function also soon.
Reader Show that the set \(\left \{ u\in \Re ^{n},\left \Vert u\right \Vert \leq r\right \} \) is convex.
Answer: Let \(\left \Vert u_{1}\right \Vert \leq r,\left \Vert u_{2}\right \Vert \leq r\). Then \(\left \Vert \left ( 1-\lambda \right ) u_{1}+\lambda u_{2}\right \Vert \leq \left \Vert \left ( 1-\lambda \right ) u_{1}\right \Vert +\left \Vert \lambda u_{2}\right \Vert \) by triangle inequality. Hence
\[ \left \Vert \left ( 1-\lambda \right ) u_{1}+\lambda u_{2}\right \Vert \leq \left ( 1-\lambda \right ) \left \Vert u_{1}\right \Vert +\lambda \left \Vert u_{2}\right \Vert \]
If some center point, say \(\hat{u}\) is given, then \(\left \{ u:\left \Vert u-\hat{u}\right \Vert \leq r\right \} \) is convex set. The translation \(u+\hat{u}\) is also convex set. Other norms on \(\Re ^{n}\) are important. \(\left \Vert u\right \Vert _{1}={\displaystyle \sum \limits _{i=1}^{n}} \left \vert u_{i}\right \vert \) and \(\left \Vert u\right \Vert _{\infty }=\max \left \{ \left \vert u_{1}\right \vert ,\left \vert u_{2}\right \vert ,\cdots ,\left \vert u_{n}\right \vert \right \} \).
Reader In \(\Re ^{2}\), sketch unit ball \(\left \{ u:\left \Vert u\right \Vert \leq r\right \} \) using norms \(\left \Vert u\right \Vert _{1}\) and \(\left \Vert u\right \Vert _{\infty }\)
Reader Do the above for \(\Re ^{3}\).
Reader Suppose \(U\) is convex, and \(u^{0},u^{1},\cdots ,u^{m}\in U\) and \(\lambda _{0},\lambda _{1},\cdots ,\lambda _{m}\geq 0\) with \({\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}=1\), then show that \({\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}u^{i}\in U\). (See HW 2). For three points, \(u^{0},u^{1},u^{2}\), then \({\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}u^{i}\) is the mixture, which is convex set between all the three points:
In words, these are flat sided shapes, which are convex. Let \(v^{1},v^{2},\cdots ,v^{m}\in \Re ^{n}\) be given. We call set \(U\) a polytope generated by \(v^{i}\) if \(U\) is a set of mixtures of \(v^{i}\). That is, \[ U=\left \{{\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}v^{i}:\lambda _{i}\geq 0\text{ and }{\displaystyle \sum \limits _{i=1}^{m}} \lambda _{i}=1\right \} \] The following are some illustration. Given these points
Notice that the points \(v^{6},v^{7}\) are redundant and have not been used. In higher dimensions, it will be harder to know which are the vertices of the extreme points.
Extreme points Let \(U\subseteq \Re ^{n}\) be convex set. A point \(u\in U\) is said to be an extreme point if the following holds: \(u\) can not be written as a convex combination of other points \(u^{0},u^{1},\cdots \). Examples:
So far we talked about convex sets. Now we will talk about convex functions. \(J\left ( u\right ) :\Re ^{n}\rightarrow \Re \) is said to be convex function is the following is satisfied. Given any \(u^{0},u^{1}\in U\subseteq \Re ^{n}\) and \(\lambda \in \left [ 0,1\right ] \) then \[ \boxed{ J\left ( \left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\right ) \leq \left ( 1-\lambda \right ) J\left ( u^{0}\right ) +\lambda J\left ( u^{1}\right ) } \] In words, it means the function value \(J\left ( u\right ) \) between 2 points, is always below the straight cord points joining \(J\left ( u^{0}\right ) \) and \(J\left ( u^{1}\right ) \)
For example, the following is not a convex function.
But the above is convex over some regions. But overall, it is not a convex function. We can not use this definition to check if a function is convex for higher dimensions. In that case, we have to use the Hessian to check.
Reader A function \(J\left ( u\right ) \) is concave if \(-J\left ( u\right ) \) is convex. Example: \(e^{\alpha u}\) is convex. Any linear function is convex function. \(a^{T}u+b\) is convex. Also \(-\log \left ( u\right ) \) is a convex function. And \(J\left ( u\right ) =au^{2}+bu+c\) with \(a>0\) is a convex function. What about the following? \[ \max \left \{ J_{1},J_{2}\right \} =\max \left \{ \left ( a_{1}\right ) ^{T}u+b_{1},\left ( a_{2}\right ) ^{T}u+b_{2}\right \} \] Is it convex function? The pointwise maximum of two or more convex functions is a convex function.
Reader Suppose \(J_{i}:\Re ^{n}\rightarrow \Re \) are convex functions, not necessarily linear, for \(i=1,\cdots m\,\), then \(J\left ( u\right ) =\max \left \{ J_{1}\left ( u\right ) ,\cdots ,J_{m}\left ( u\right ) \right \} \) is convex function. Hint, use \(\left ( 1-\lambda \right ) J\left ( u\right ) +\lambda J\left ( u\right ) =\left ( 1-\lambda \right ) \max \left ( \cdots \right ) +\cdots \).
Next is to connect convex functions to convex sets.
An interpretation of convex function in 1D is bowl shaped. But pictures are only for low dimensions. Convex applications have taken off in the last 15 years. For example, CVX software. Why is convex so great? There are useful properties of convex functions
Reader Suppose \(U\subseteq \Re ^{n}\) is convex set, and \(J:U\rightarrow \Re \) is convex function. Then show the set of minimizer elements \(U^{\ast }=\left \{ u\in U,J\left ( u\right ) =\min _{u\in U}J\left ( u\right ) \right \} \) is a convex set.
Other nice properties of convex functions is that point wise maximum of convex functions is also a convex function. The maximum over an indexed collection is also a convex function. Let \(I\) be a set, perhaps uncountable, for each \(i\in I\), suppose we have a convex function \(J_{i}:\Re ^{n}\rightarrow \Re \) is convex. Let \(J\left ( u\right ) =\sup _{i\in I}J_{i}\left ( u\right ) \) and assume \(J\left ( u\right ) <\infty \).
Reader Show that \(J\left ( u\right ) \) is convex function. Example: \(J_{q}\left ( u\right ) =6u^{2}+\left ( 6q-\cos q\right ) +e^{-q}\). Where \(\left \Vert q\right \Vert \leq 1\). Then \(J\left ( u\right ) =\max _{\left \vert q\right \vert \leq 1}J_{q}\left ( u\right ) \).
Given \(J:\Re ^{n}\rightarrow \Re \), define \(epiJ\) as set \[ \boxed{ \left \{ \left ( u,v\right ) \in \Re ^{n+1}:v\geq J\left ( u\right ) \right \} } \] \(J\left ( u\right ) \) is convex function and \(epiJ\) is a convex set. See HW2 \(epi\) problem.
Begin with \(J:\Re ^{n}\rightarrow \Re \). How to check \(J\left ( u\right ) \) is convex? We assume \(J\left ( u\right ) \) is twice differentiable, called \(C^{2}\). And also assume \(U\) is open and convex set.
Definition: Gradient \(\nabla J\left ( u\right ) =\begin{bmatrix} \frac{\partial J}{\partial u_{1}} & \frac{\partial J}{\partial u_{1}} & \cdots & \frac{\partial J}{\partial u_{n}}\end{bmatrix} ^{T}\in \Re ^{n}\).
The Hessian \[ \nabla ^{2}J\left ( u\right ) =\begin{bmatrix} \frac{\partial J^{2}}{\partial u_{1}^{2}} & \frac{\partial ^{2}J}{\partial u_{1}\partial u_{2}} & \cdots & \frac{\partial ^{2}J}{\partial u_{1}\partial u_{n}}\\ \frac{\partial ^{2}J}{\partial u_{2}\partial u_{1}} & \frac{\partial J^{2}}{\partial u_{2}^{2}} & \cdots & \frac{\partial ^{2}J}{\partial u_{2}\partial u_{n}}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial ^{2}J}{\partial u_{n}\partial u_{1}} & \frac{\partial ^{2}J}{\partial u_{n}\partial u_{2}} & \cdots & \frac{\partial J^{2}}{\partial u_{n}^{2}}\end{bmatrix} \]
Reader Why the Hessian is always symmetric? (Order of differentiation does not matter).
Positive definite Hessian is the same as saying second derivative is greater than zero.
Suppose \(J:\Re ^{n}\rightarrow \Re \) is \(C^{2}\). Then \(J\) is convex function in \(U\) iff \(\nabla ^{2}J\left ( u\right ) \) is positive semi-definite matrix for all \(u\in U\).
Proof We first proof for \(n=1\). Sufficiency: Suppose \(\nabla ^{2}J\left ( u\right ) \geq 0\) for all \(u\in U\), (i.e. this is the same as saying \(\frac{\partial ^{2}J}{\partial u^{2}}>0\), and let \(u^{0},u^{1}\) be given in \(U\). We need to show that for \(\lambda \in \left [ 0,1\right ] \), that \(J\left ( \lambda u^{0}+\left ( 1-\lambda \right ) u^{1}\right ) \leq \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \). We write \[ J\left ( u^{\lambda }\right ) =J\left ( u^{0}\right ) +\int _{u^{0}}^{u^{\lambda }}J^{\prime }\left ( \xi \right ) d\xi \] Since \(J^{\prime }\left ( \xi \right ) \) non-decreasing and \(J^{\prime \prime }\left ( \xi \right ) >0\), so the above can be upper bounded as\begin{equation} J\left ( u^{\lambda }\right ) \leq J\left ( u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{0}\right ) \tag{1} \end{equation} Similarly, \begin{align} J\left ( u^{1}\right ) & =J\left ( u^{\lambda }\right ) +\int _{u^{\lambda }}^{u^{1}}J^{\prime }\left ( \xi \right ) d\xi \nonumber \\ J\left ( u^{1}\right ) & \geq J\left ( u^{\lambda }\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{\lambda }\right ) \nonumber \\ J\left ( u^{\lambda }\right ) & \leq J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \tag{2} \end{align}
Therefore \(\lambda \overset{(1)}{\overbrace{J\left ( u^{\lambda }\right ) }}+\left ( 1-\lambda \right ) \overset{(2)}{\overbrace{J\left ( u^{\lambda }\right ) }}\) using (1) and (2) gives \begin{align*} \lambda J\left ( u^{\lambda }\right ) +\left ( 1-\lambda \right ) J\left ( u^{\lambda }\right ) & \leq \lambda \left [ J\left ( u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{0}\right ) \right ] +\left ( 1-\lambda \right ) \left [ J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] \\ J\left ( u^{\lambda }\right ) & \leq \lambda \left [ J\left ( u^{0}\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{0}J^{\prime }\left ( u^{\lambda }\right ) \right ] +\left [ J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] -\lambda \left [ J\left ( u^{1}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] \\ & =\lambda J\left ( u^{0}\right ) +\lambda u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -\lambda u^{0}J^{\prime }\left ( u^{\lambda }\right ) +J\left ( u^{1}\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{1}J^{\prime }\left ( u^{\lambda }\right ) -\left [ \lambda J\left ( u^{1}\right ) +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \right ] \\ & =\lambda J\left ( u^{0}\right ) +\lambda u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -\lambda u^{0}J^{\prime }\left ( u^{\lambda }\right ) +J\left ( u^{1}\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{1}J^{\prime }\left ( u^{\lambda }\right ) -\lambda J\left ( u^{1}\right ) -u^{\lambda }\lambda J^{\prime }\left ( u^{\lambda }\right ) +u^{1}\lambda J^{\prime }\left ( u^{\lambda }\right ) \\ & =\lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) -\lambda u^{0}J^{\prime }\left ( u^{\lambda }\right ) +u^{\lambda }J^{\prime }\left ( u^{\lambda }\right ) -u^{1}J^{\prime }\left ( u^{\lambda }\right ) +u^{1}\lambda J^{\prime }\left ( u^{\lambda }\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{\lambda }-u^{1}\right ) \end{align*}
But \(u^{\lambda }=\lambda u^{0}+\left ( 1-\lambda \right ) u^{1}\), hence the above becomes
\begin{align*} J\left ( u^{\lambda }\right ) & \leq \left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( \lambda u^{0}+\left ( 1-\lambda \right ) u^{1}-u^{1}\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( \lambda u^{0}+u^{1}-\lambda u^{1}-u^{1}\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) +J^{\prime }\left ( u^{\lambda }\right ) \left ( \lambda u^{0}-\lambda u^{1}\right ) \\ & =\left [ \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \right ] +\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) -\lambda J^{\prime }\left ( u^{\lambda }\right ) \left ( u^{1}-u^{0}\right ) \\ & =\lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) \end{align*}
Hence we showed that \(J\left ( u^{\lambda }\right ) \leq \lambda J\left ( u^{0}\right ) +\left ( 1-\lambda \right ) J\left ( u^{1}\right ) .\) Same idea can be used to establish necessity. We now establish \(\nabla ^{2}J\left ( u\right ) >0\) and convexity next time. We carry this to \(n\) dimensions.
For \(n=1\), Let \(J:U\rightarrow \Re ^{n}\), where \(U\) is open set (so that we can differentiate on it), and convex set. Let \(J\) is \(C^{2}\). \(J\) is convex iff \(\frac{d^{2}J}{du^{2}}\geq 0\) for all \(u\in U\). We are trying to establish the Hessian theorem. Now we want to show the above for \(n>1\). We start with the bridging lemma, which will use to proof the Hessian theorem.
Given \(J:U\rightarrow \Re ^{n}\), and \(U\) is convex set, we want to know if \(J\) is a convex function. The lemma says that \(J\) is convex iff the following condition holds:
Given any \(\mathbf{u}\in U\), \(\mathbf{z} \in \Re ^{n}\), then the function \(\tilde{J}\left ( \lambda \right ) =J\left ( \mathbf{u}+\lambda \mathbf{z}\right ) \) is convex on the set \(\Lambda =\left \{ \mathbf{u},\lambda \mathbf{z\in U}\right \} \). Notice that the function \(\tilde{J}\left ( \lambda \right ) \) is scalar valued. It depends on scalar \(\lambda \). Hence we say \(\tilde{J}\left ( \lambda \right ) \doteqdot \Re \rightarrow \Re \). This lemma says that the function \(\tilde{J}\left ( \lambda \right ) \) is convex in any direction we move to from \(\mathbf{u}\) in the direction of \(\mathbf{z}\) within the set \(U\) iff \(J\) is convex function.
Proof See also the handout. Necessity: Assume \(J\) is convex function. We must show that \(\tilde{J}\) is convex function. Pick \(\mathbf{z}\in \Re \) and \(\lambda \in \left [ 0,1\right ] \) and any scalars \(\alpha ^{0},\alpha ^{1}\in \Lambda \). We must show that\[ \tilde{J}\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) \leq \lambda \tilde{J}\left ( \alpha ^{0}\right ) +\left ( 1-\lambda \right ) \tilde{J}\left ( \alpha ^{1}\right ) \] Indeed, from \(\tilde{J}\left ( \lambda \right ) \doteqdot J\left ( \mathbf{u}+\lambda \mathbf{z}\right ) \), then\begin{align} \tilde{J}\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) & \doteqdot J\left ( \mathbf{u}+\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) \mathbf{z}\right ) \tag{1}\\ & \doteqdot J\left ( \lambda \left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) \left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \right ) \tag{2} \end{align}
Reader Going from (1) to (2) above is just a rewriting and manipulation only. Now since \(J\) is assumed convex, then\[ J\left ( \lambda \left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) \left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \right ) \leq \lambda J\left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) J\left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \] Therefore (1) becomes\begin{align*} \tilde{J}\left ( \lambda \alpha ^{0}+\left ( 1-\lambda \right ) \alpha ^{1}\right ) & \leq \lambda J\left ( \mathbf{u}+\alpha ^{0}\mathbf{z}\right ) +\left ( 1-\lambda \right ) J\left ( \mathbf{u}+\alpha ^{1}\mathbf{z}\right ) \\ & \leq \lambda \tilde{J}\left ( \alpha ^{0}\right ) +\left ( 1-\lambda \right ) \tilde{J}\left ( \alpha ^{1}\right ) \end{align*}
Hence \(\tilde{J}\) is convex function. QED. Now to proof sufficiency. Assume that \(\tilde{J}\) is convex, we need to show that this implies that \(J\) is convex on \(U\). Since \(\tilde{J}\) then\begin{align} \tilde{J}\left ( \left ( 1-\lambda \right ) \alpha ^{0}+\lambda \alpha ^{1}\right ) & \leq \left ( 1-\lambda \right ) \tilde{J}\left ( \alpha ^{0}\right ) +\lambda \tilde{J}\left ( \alpha ^{1}\right ) \nonumber \\ & =\left ( 1-\lambda \right ) J\left ( u+\alpha ^{0}z\right ) +\lambda J\left ( u+\alpha ^{1}z\right ) \tag{3} \end{align}
But \begin{align} \tilde{J}\left ( \left ( 1-\lambda \right ) \alpha ^{0}+\lambda \alpha ^{1}\right ) & =J\left ( u+\left [ \left ( 1-\lambda \right ) \alpha ^{0}+\lambda \alpha ^{1}\right ] z\right ) \tag{4}\\ & =J\left ( \left ( 1-\lambda \right ) \left ( u+\alpha ^{0}z\right ) +\lambda \left ( u+\alpha ^{1}z\right ) \right ) \tag{5} \end{align}
Where the main trick was going from (4) to (5) by just rewriting, so it match what we have in (3). Now replacing (5) into LHS of (3) we find\[ J\left ( \left ( 1-\lambda \right ) \left ( u+\alpha ^{0}z\right ) +\lambda \left ( u+\alpha ^{1}z\right ) \right ) \leq \left ( 1-\lambda \right ) J\left ( u+\alpha ^{0}z\right ) +\lambda J\left ( u+\alpha ^{1}z\right ) \] Let \(\left ( u+\alpha ^{0}z\right ) \equiv u^{0},u+\alpha ^{1}z\equiv u^{1}\), both in \(U\), then the above becomes\[ J\left ( \left ( 1-\lambda \right ) u^{0}+\lambda u^{1}\right ) \leq \left ( 1-\lambda \right ) J\left ( u^{0}\right ) +\lambda J\left ( u^{1}\right ) \] Hence \(J\) is convex function. QED. Now the bridging lemma is proved. we use it to proof the Hessian theorem.
Let \(J=U\rightarrow \Re \), where \(U\) is open set in \(\Re ^{n}\). Hessian theorem says that \(J\) is convex function on \(U\) iff \(\nabla ^{2}J\left ( u\right ) \) is PSD (positive semi-definite) evaluated at each \(u\in U\).
Reader Suppose \(\nabla ^{2}J\left ( u\right ) \) is PSD, does this imply strict convexity on \(J\left ( u\right ) \)? Answer: No. Need an example.
See handout Hessian for proof.
Algorithms We will now start new chapter. Looking at algorithms to find optimal of \(J\left ( u\right ) \). Preliminaries: Begin with \(J:\Re ^{n}\rightarrow \Re \).
Strong local minimum \(u^{\ast }\) is strong local minimum if there exists \(\delta >0\) such that \(J\left ( u^{\ast }\right ) <J\left ( u\right ) \) for all \(u\) such that \(\left \Vert u^{\ast }-u\right \Vert <\delta \).
We say \(u^{\ast }\) is global minimum if \(J\left ( u^{\ast }\right ) \leq J\left ( u\right ) \) for all \(u\). Henceforth, \(J\left ( u\right ) \) is \(C^{2}\). From undergraduate calculus, \(u^{\ast }\) is strong local minimum if the following is satisfied: (for \(n=2\))
For \(J:\Re ^{n}\rightarrow \Re \), in other words, in higher dimensions, define gradient\[ \boxed{ \left ( \nabla J\left ( u\right ) \right ) ^{T}=\begin{bmatrix} \frac{\partial J}{\partial u_{1}}\\ \frac{\partial J}{\partial u_{2}}\\ \vdots \\ \frac{\partial J}{\partial u_{n}}\end{bmatrix} } \] Normally we consider gradient as column vector. Then we say that a point \(u^{\ast }\in \Re ^{n}\) is strong local min. if \(\nabla J\left ( u\right ) =\mathbf{0}\), and \(\nabla ^{2}J\left ( u\right ) >0\).
Proof: Suppose \(u^{\ast }\) is strong local min. Then looking at neighborhood of \(u^{\ast }\), let \(v\in \Re ^{n}\) be arbitrary. Look at \(J\left ( u^{\ast }+v\right ) \) and expand in Taylor series.\[ J\left ( u^{\ast }+v\right ) =J\left ( u^{\ast }\right ) +\nabla J\left ( u^{\ast }\right ) v+v^{T}\nabla ^{2}J\left ( u\right ) v+H.O.T\left ( O\left ( \left \Vert v\right \Vert ^{3}\right ) \right ) \] Since \(u^{\ast }\) is strong local min. then \(\nabla J\left ( u^{\ast }\right ) =0\). Hence \(J\left ( u^{\ast }+v\right ) =J\left ( u^{\ast }\right ) +v^{T}\nabla ^{2}J\left ( u\right ) v+H.O.T.\) Since \(J\left ( u^{\ast }+v\right ) >J\left ( u^{\ast }\right ) \) (since strong local minimum), then this implies that \[ v^{T}\nabla ^{2}J\left ( u\right ) v>0 \] Since \(v^{T}\nabla ^{2}J\left ( u\right ) v\) dominate over \(H.O.T.\). This complete the proof.
Reminder, test 1 next Thursday Feb. 18, 2016. Up to and including HW 3. Today’s lecture on gradient based optimization. We developed two conditions. \(u^{\ast }\) is strong local minimum when \(\nabla J\left ( u^{\ast }\right ) =0\) and \(\nabla ^{2}J\left ( u^{\ast }\right ) <0\).
We mention line searches. It is about optimization for one variable functions only. There will be reading assignment on line search. Some methods used are
And more.
We will now set up application areas. Optimal gain control and circuit analysis problems. In general we have\[ \dot{x}=Ax+Bu \] Where \(\dot{x}\) is \(n\times 1,A\) is \(n\times n\), \(B\) is \(n\times m\) and \(u\) is \(m\times 1\). System has \(n\) states. \(u\) is the input. This can be voltage or current sources. \(u\) is the control and \(x\) is the state. We want to select \(u\) so that \(x\left ( t\right ) \) behaves optimally. Classical setup is to use state feedback \[ u=kx+v \] Where \(k\) is \(m\times n\) is called the feedback gain matrix and \(v\) is extra input but we will not use it. It is there for extra flexibility if needed. We use optimization to determine \(k\). Entries of \(k_{ij}\) are our optimization variables.
Often, with \(u=0\), the system \(\dot{x}=Ax\) may be unstable or have overshoot. We will set up a performance objective aimed at reducing or eliminating the badness of the original response (with no feedback control). Let \[ J\left ( k\right ) =\int _{0}^{\infty }\overset{\left \Vert x\right \Vert ^{2}}{\overbrace{x^{T}\left ( t\right ) x\left ( t\right ) }}dt \] In the above, \(J\left ( k\right )\) is implicit function of \(k\). This cost function was found to work well in practice. We now want to make \(J\left ( k\right ) \) explicit in \(k\). We can solve for \(x\left ( t\right ) \) from \begin{align*} \dot{x} & =\left ( A+Bk\right ) x\\ x & =x\left ( 0\right ) e^{\left ( A+Bk\right ) t} \end{align*}
Then \(J\left ( k\right ) =\int _{0}^{\infty }x^{T}\left ( 0\right ) e^{\left ( A+Bk\right ) ^{T}t}dt\). But this is not practical to use. This is not closed form and hard to compute. So how can we come up with closed form for \(J\left ( k\right ) \) which is easier to work with? Let us look at the closed loop. Let \(v=0\) and we have\begin{align*} \dot{x} & =Ax+Bkx\\ & =\left ( A+Bk\right ) x\\ & =A_{c}x \end{align*}
Where \(A_{c}\) is the closed loop system matrix. Let us find a matrix \(P\) if possible such that \[ d\left ( x^{T}\left ( t\right ) Px\left ( t\right ) \right ) =-x^{T}\left ( t\right ) x\left ( t\right ) \] So that now \begin{align*} J\left ( k\right ) & =\int _{a}^{b}x^{T}\left ( t\right ) x\left ( t\right ) dt\\ & =-\int _{a}^{b}d\left ( x^{T}\left ( t\right ) Px\left ( t\right ) \right ) \\ & =\int _{b}^{a}d\left ( x^{T}\left ( t\right ) Px\left ( t\right ) \right ) \\ & =x^{T}\left ( a\right ) Px\left ( a\right ) -x^{T}\left ( b\right ) Px\left ( b\right ) \end{align*}
Can we find \(P\)?\begin{align*} d\left ( x^{T}Px\right ) & =x^{T}P\dot{x}+\dot{x}^{T}Px\overset{?}{=}-x^{T}x\\ & =x^{T}P\left ( A_{c}x\right ) +\left ( A_{c}x\right ) ^{T}Px\overset{?}{=}-x^{T}x\\ & =x^{T}P\left ( A_{c}x\right ) +\left ( x^{T}A_{c}^{T}\right ) Px\overset{?}{=}-x^{T}x \end{align*}
Bring all the \(x\) to LHS then \[ A_{c}^{T}x+PA_{c}=-I \] Where \(I\) is the identity matrix. This is called the Lyapunov equation. This is the equation to determine \(P\). Without loss of generality, we insist on \(P\) being symmetric matrix. Using this \(P\), now we write\begin{align*} J\left ( k\right ) & =\int _{0}^{\infty }x^{T}\left ( t\right ) x\left ( t\right ) dt\\ & =-\int _{0}^{\infty }d\left ( x^{T}Px\right ) \\ & =\left . x^{T}Px\right \vert _{\infty }^{0}\\ & =x^{T}\left ( 0\right ) Px\left ( 0\right ) -x^{T}\left ( \infty \right ) Px\left ( \infty \right ) \end{align*}
For stable system, \(x\left ( \infty \right ) \rightarrow 0\) (remember that we set \(v=0\), so there is no external input, hence if the system is stable, it must end up in zero state eventually). Therefore\[ J\left ( k\right ) =x^{T}\left ( 0\right ) Px\left ( 0\right ) \] With \(k\) satisfying \[ A_{c}^{T}\left ( k\right ) x+PA_{c}\left ( k\right ) =-I \] Example Let \(y^{\prime \prime }=u\). Hence \(x_{1}^{\prime }=x_{2},x_{2}^{\prime }=u\). Note, this is not stable with \(u=0\). Using linear state feedback,\begin{align*} u & =kx\\ u & =\begin{bmatrix} k_{1} & k_{2}\end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \end{align*}
Hence \begin{align*} x^{\prime } & =Ax+Bu\\ & =Ax+Bkx\\ & =\left ( A+Bk\right ) x\\\begin{bmatrix} x_{1}^{\prime }\\ x_{2}^{\prime }\end{bmatrix} & =\left ( \overset{A}{\overbrace{\begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix} }}+\overset{B}{\overbrace{\begin{bmatrix} 0\\ 1 \end{bmatrix} }}\overset{k}{\overbrace{\begin{bmatrix} k_{1} & k_{2}\end{bmatrix} }}\right ) \begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \\ & =\left ( \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix} +\begin{bmatrix} 0 & 0\\ k_{1} & k_{2}\end{bmatrix} \right ) \begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \\ & =\overset{A_{c}}{\overbrace{\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} }}\begin{bmatrix} x_{1}\\ x_{2}\end{bmatrix} \end{align*}
For stable closed loop, we need \(k_{1}<0,k_{2}<0\,\) by looking at characteristic polynomial roots. Now we solve the Lyapunov equation.\begin{align*} A_{c}^{T}P+PA_{c} & =-I\\\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} ^{T}\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix} +\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix}\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} & =\begin{bmatrix} -1 & 0\\ 0 & -1 \end{bmatrix} \\\begin{bmatrix} 0 & k_{1}\\ 1 & k_{2}\end{bmatrix}\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix} +\begin{bmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{bmatrix}\begin{bmatrix} 0 & 1\\ k_{1} & k_{2}\end{bmatrix} & =\begin{bmatrix} -1 & 0\\ 0 & -1 \end{bmatrix} \end{align*}
Solving for \(P\) gives\[ P=\begin{bmatrix} \frac{k_{2}^{2}-k_{1}+k_{1}^{2}}{2k_{1}k_{2}} & -\frac{1}{2k_{1}}\\ -\frac{1}{2k_{1}} & \frac{1-k_{1}}{2k_{1}k_{2}}\end{bmatrix} \] With \(x\left ( 0\right ) =\begin{bmatrix} 1\\ 1 \end{bmatrix} \), then\begin{align*} J\left ( k\right ) & =x\left ( 0\right ) ^{T}Px\left ( 0\right ) \\ & =\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} \frac{k_{2}^{2}-k_{1}+k_{1}^{2}}{2k_{1}k_{2}} & -\frac{1}{2k_{1}}\\ -\frac{1}{2k_{1}} & \frac{1-k_{1}}{2k_{1}k_{2}}\end{bmatrix}\begin{bmatrix} 1\\ 1 \end{bmatrix} \\ & =\frac{k_{1}^{2}+k_{2}^{2}-2k_{1}-2k_{2}+1}{2k_{1}k_{2}} \end{align*}
Here is Matlab script to generate the above
Let \(k_{1}=k_{2}=k<0\), we obtain\[ J\left ( k\right ) =\frac{2k^{2}-4k+1}{2k^{2}}\] As \(k\rightarrow -\infty \) then \(J^{\ast }\rightarrow 1\). Therefore \(J^{\ast }\) can never get to zero. This means there is no \(k_{1}^{\ast },k_{2}^{\ast }\) such that \(\nabla J\left ( k^{\ast }\right ) =0\). Set \(k\) is not compact. Not coercive either. This is ill posed problem. This can be remedied by changing the control to \[ J\left ( k\right ) =\int _{0}^{\infty }x^{T}xdt+\lambda \int _{0}^{\infty }u^{T}udt \]
Starting new chapter. Gradient based optimization. Many algorithms involve line searches. In other words, optimization in \(\Re ^{n}\) is often solved by performing many optimization (line searches) in \(\Re \).
Optimization algorithm: Starting with \(\mathbf{u}^{k}\) and direction \(\mathbf{v}\) we study \(J\left ( \mathbf{u}^{k}+h\mathbf{v}\right )\) where \(h\) is the step size. This is called line search. \(h\in \left [ 0,h_{\max }\right ] \). We want to use optimal step size \(h^{\ast }\). Once found, then \[ \boxed{ \mathbf{u}^{k+1}=\mathbf{u}^{k}+h^{\ast }\mathbf{v} } \] Reader Read and learn about line search. Bisection, Golden section, Fibonacci and many more. We will not cover this in this course. We also want to minimize the number of function evaluations, since these can be expensive.
One way to do line search, is to do \(h=0\, \delta \, h_{\max }\) and then evaluate \(J\left ( h\right ) \) and pick the minimizing \(h^{\ast }\). For stopping criteria, we can check for the following
We also need to pick a starting point for the search. This is \(\mathbf{u}^{0}\). What if we do not know where to start? We can pick multiple starting locations. And pick the best result obtained.
Reader Find \(\min _{\left \Vert \mathbf{v}\right \Vert =1}J\left ( \mathbf{u}+\mathbf{v}\right ) \). Show optimal \(\mathbf{v}\) is \[ \boxed{ v^{\ast } =\frac{-\nabla J\left ( u\right ) }{\left \Vert \nabla J\left ( u\right ) \right \Vert } } \] This is called myopic local terrain. Gets us to local minimum. The steepest descent algorithm is the following:
See my class study notes for detailed algorithm of all search methods we did in this course.
Later we will study conjugate gradient methods.
Example: Let \(u^{0}=\left [ 1,1\right ] \). Let \(h=0.1\). Let \(J\left ( u\right ) =u_{1}^{2}+2u_{2}^{2}-6u_{1}u_{2}+2u_{1}+u_{2}+4\). Then \[ \nabla J\left ( u\right ) =\begin{bmatrix} 2u_{1}-6u_{2}+2\\ 6u_{2}-6u_{1}+1 \end{bmatrix} \]
So \(\nabla J\left ( u^{0}\right ) =\begin{bmatrix} -2\\ 1 \end{bmatrix} \) and \(\left \Vert \nabla J\left ( u^{0}\right ) \right \Vert =\frac{1}{\sqrt{5}}\). Hence \begin{align*} u^{1} & =u^{0}-h\begin{bmatrix} -2\\ 1 \end{bmatrix} \frac{1}{\sqrt{5}}\\ & =\begin{bmatrix} 1\\ 1 \end{bmatrix} -0.1\begin{bmatrix} -2\\ 1 \end{bmatrix} \frac{1}{\sqrt{5}}\\ & =\begin{bmatrix} 1.0894\\ 0.95528 \end{bmatrix} \end{align*}
Reader Find \(u^{2}\).
Exam 1
Watch for HW4 going out today. Implementation of steepest descent with optimal step size. See handout circuit for 2 stage amplifier.
As the number of stages increases it becomes harder to analytically determine the optimal capacitance of each stage to produce maximum power. For two stages, by direct circuit analysis we obtain \[ J\left ( u\right ) =\left ( 11-u_{1}-u_{2}\right ) ^{2}+\left ( 1+u_{1}+10u_{2}-u_{1}u_{2}\right ) ^{2}\] Where \(u_{i}\) is capacitance. There are two optimal values, they are \(u^{\ast }=\left ( 10,1\right ) \) which is a maximum and \(u^{\ast }=\left ( 13,4\right ) \) which is a minimum. There is also a minimum at \(\left (7,-2\right ) \).
Geometric insight on what might go wrong with steepest descent: Gradient algorithms work best from a far but as they get close to the optimal point, there are better algorithms such as the generalized Newton Raphson method which works best when close to the optimal point. If the step size \(h\) is big, we approach the optimal fast, but because the step size is large, we can overshoot and will end up oscillating around the optimal point. If \(h\) is too small, the search will become very slow. Hence we use steepest descent but with optimal step size, where the step size is calculated at each step. Ingredients of the steepest descent algorithm are:
Reader: Consider oscillation issue.
Convergence result. From Polak. Let \(J\left ( u\right ) \) be smooth and differentiable. Let \(u^{\ast }\) be strong local minimum. Assume constant \(0\leq m\leq M\) s.t.
\[ mu^{T}m\leq u^{T}\nabla ^{2}J\left ( u\right ) u\leq Mu^{T}M \]
In neighborhood of \(u^{\ast }\). This criteria says that there is a good convexity and a bad convexity. What does this mean? We’ll say more about this. In the neighborhood of \(u^{\ast }\) let \(\theta =\frac{m}{M}\). Interpretation:
Define \(E=J\left ( u^{0}\right ) -J\left ( u^{\ast }\right ) \) then Polak says
\[ 0\leq J\left ( u^{k}\right ) -J\left ( u^{\ast }\right ) \leq E\theta ^{k}\]
Best case is when \(E\) is small and \(\theta \) is small. This is local result.
Convergence can be
These are the three convergence types we will cover. The second algorithm has quadratic convergence, which is the generalized Newton-Raphson method. We will start on this now but will cover it fully next lecture.
The idea is to approximate \(J\left ( u\right ) \) as quadratic at each step and obtain \(h_{k}\). By assuming \(J\left ( k\right ) \) is quadratic locally, we approximate \(J\left (u\right ) \) using Taylor and drop all terms after the Hessian. Now we find where the minimum is and use the step size to find \(u^{k+1}\). More on this next lecture.
We will start the class with a reader problem. Consider \begin{align} J\left ( u\right ) & =\frac{1}{2}u^{T}Au+b^{T}u+c\tag{1}\\ \nabla J\left ( u\right ) & =Au^{k}+b \tag{2} \end{align}
with \(A\) being symmetric positive definite (PSD) matrix. This is classic quadratic objective function. You can take a complete course on quadratic optimization. The global optimal is at the solution for \(\nabla J(u) =0\). Hence we write\begin{align*} \nabla J(u) & = 0\\ Au^{\ast } +b & = 0\\ u^{\ast } & = -A^{-1} b \tag{3} \end{align*}
Note that since \(A\) is PSD (the Hessian is PSD), then we know that \(J\left ( u\right ) \) is convex. Hence the local minimum is also a global minimum. Now we imagine we are doing steepest descent on this function and we are at iterate \(u^{k}\) with optimal step size, which we can make as large as we want. Hence we need to optimize \[ \tilde{J}\left ( h\right ) =J\left ( u^{k}-h\left [ \nabla J\left ( u\right ) \right ] \right ) \] for \(h\). Notice we did not divide by \(\left \Vert \nabla J\left ( u\right ) \right \Vert \) here, since the step size is free to be as large as needed. Expanding the above using (2) gives\begin{align*} \tilde{J}\left ( h\right ) & =J\left ( u^{k}-h\left ( Au^{k}+b\right ) \right ) \\ & =J\left ( \left ( I-hA\right ) u^{k}-hb\right ) \end{align*}
Using (1) for RHS of the above gives\[ \tilde{J}\left ( h\right ) =\frac{1}{2}\left ( \left ( I-hA\right ) u^{k}-hb\right ) ^{T}A\left ( \left ( I-hA\right ) u^{k}-hb\right ) +b^{T}\left ( \left ( I-hA\right ) u^{k}-hb\right ) +c \] The above is quadratic in \(h\). The optimal \(h\) we are solving for. Simplifying gives\[ \tilde{J}\left ( h\right ) =\frac{1}{2}\left ( Au^{k}+b\right ) ^{T}A\left ( Au^{k}+b\right ) h^{2}-\left ( Au^{k}+b\right ) ^{T}\left ( Au^{k}+b\right ) h+\text{ constant terms}\] To find optimal \(h\), then\begin{align*} \frac{d\tilde{J}\left ( h\right ) }{dh} & =0\\ \left ( Au^{k}+b\right ) ^{T}A\left ( Au^{k}+b\right ) h-\left ( Au^{k}+b\right ) ^{T}\left ( Au^{k}+b\right ) & =0\\ h^{\ast } & =\frac{\left ( Au^{k}+b\right ) ^{T}\left ( Au^{k}+b\right ) }{\left ( Au^{k}+b\right ) ^{T}A\left ( Au^{k}+b\right ) } \end{align*}
In practice, we would need to check \(\frac{d^{2}\tilde{J}\left ( h\right ) }{dh^{2}}\) also to make sure \(h\) is minimizer.
Reader Why does it take multiple iterations to get the common sense answer \(u^{\ast }=-A^{-1}b\)?
For quadratic objective function \(J\left ( u\right ) \) we can obtain \(u^{\ast }\) in one step, using \(u^{\ast }=-A^{-1}b\). This is the idea behind the generalized Newton-Raphson method. \(J\left ( u^{k}\right ) \) is approximated as quadratic function at each step, and \(h^{\ast }\) is found from above. To elaborate, expanding by Taylor\[ J\left ( u^{k}+\Delta u\right ) =J\left ( u^{k}\right ) +\nabla J\left ( u^{k}\right ) ^{T}\Delta u+\frac{1}{2}\Delta u^{T}\nabla ^{2}J\left ( u^{k}\right ) \Delta u+HOT \] We approximate as quadratic by dropping higher order terms and optimize for \(\Delta u\) (same as \(h\) used earlier), and here \(\nabla ^{2}J\left ( u^{k}\right ) \) is same as the \(A\) above also. Therefore we find \[ \Delta u^{\ast }=-\left [ \nabla ^{2}J\left ( u^{k}\right ) \right ] ^{-1}\nabla J\left ( u^{k}\right ) \] This converges in one step \(\Delta u^{\ast }\) if \(J\left ( u\right ) \) was actually a quadratic function. Notice that Newton method is expensive if used repeatedly, as it requires finding Hessian at each step and also finding the inverse of it. The algorithm is: Initialize \(u^{0}\). Then iterate, where\begin{equation} u^{k+1}=u^{k}-\left [ \nabla ^{2}J\left ( u^{k}\right ) \right ] ^{-1}\nabla J\left ( u^{k}\right ) \tag{4} \end{equation} Then check for convergence. If Hessian fails to be PSD, in this case \(J\left ( u^{k+1}\right ) \) can end up increasing not decreasing. How to stop? We can try more iterations to see if \(J\left ( u^{k}\right ) \) will decrease again.
Example Let \[ J\left ( u\right ) =\left ( 11-u_{1}-u_{2}\right ) ^{2}+\left ( 1+10u_{2}+u_{1}-u_{1}u_{2}\right ) ^{2}\] Pick \(u^{0}=\left ( 18,3\right ) \) then\begin{align*} \nabla J\left ( u^{0}\right ) & =\begin{bmatrix} 40\\ 100 \end{bmatrix} \\ \nabla ^{2}J\left ( u^{0}\right ) & =\begin{bmatrix} 10 & 44\\ 44 & 130 \end{bmatrix} \end{align*}
We see here that \(\nabla ^{2}J\left ( u^{0}\right ) \) is not PSD (determinant is negative). Now we do the iterate equation (4), obtaining\begin{align*} u^{1} & =u^{0}-\begin{bmatrix} 10 & 44\\ 44 & 130 \end{bmatrix}\begin{bmatrix} 40\\ 100 \end{bmatrix} \\ & =\begin{bmatrix} 18\\ 3 \end{bmatrix} -\begin{bmatrix} 10 & 44\\ 44 & 130 \end{bmatrix}\begin{bmatrix} 40\\ 100 \end{bmatrix} \\ & =\begin{bmatrix} 19.3\\ 1.8 \end{bmatrix} \end{align*}
If we look at the contour plot, we will see this point made \(J\left ( u\right ) \) larger (away from optimal) but if we let it iterate more, it will turn and move back to the optimal point.
Now we will start on the third algorithm. Conjugate gradient algorithms (CG) are family of algorithms with property of superlinear convergence. It also has quadratic convergence.
Quadratic convergence says the following: If \(J\left ( u\right ) \) is positive definite quadratic form, the iterate \(u^{k}\rightarrow u^{\ast }\) completes in finite number of steps \(N\). This means if we give the algorithm a quadratic form function, it will converge in \(N\) steps to the optimal. The difference between this and Newton-Raphson, is that there is no Hessian to be calculated using this algorithm as the case was with Newton-Raphson. The conjugate direction algorithms work well from far away and also when close to the optimal point. (note: Steepest descent worked well from a far, but not when getting close to the optimal point \(u^{\ast }\)). The CG algorithms also have the property of superlinear convergence. This property do not apply to steepest descent.
What is superlinear convergence? A sequence \(\left \{ u^{k}\right \} \) in \(\Re ^{n}\) is said to converge super-linearly to \(u^{\ast }\) if the following holds: Given any \(\theta \in (0,1]\), then \(\frac{\left \Vert u^{k}-u^{\ast }\right \Vert }{\theta ^{k}}\rightarrow 0\). The important part of this definition is that the above should go to zero for any \(\theta \in (0,1]\) and not some \(\theta \). Examples below illustrate this. Let \(\left \{ u^{k}\right \} =\frac{1}{k}\). Hence the sequence is \(\left \{ 1,\frac{1}{2},\frac{1}{3},\cdots \right \} \). Clearly this sequence goes to \(u^{\ast }=0\). Does it converge super-linearly to \(u^{\ast }\)? Applying the definition\[ \frac{\left \Vert \frac{1}{k}-0\right \Vert }{\theta ^{k}}=\frac{1}{k\theta ^{k}}\] If we can find one \(\theta \) that do not converge to zero, then we are done. Trying \(\theta =\frac{3}{4}\), then the above becomes \(\frac{4^{k}}{k3^{k}}\) which do not go to zero as \(k\rightarrow \infty \). Hence this is not superlinear. How about \(\left \{ u^{k}\right \} =\frac{1}{k^{2}}\). This is still not superlinear. Similarly \(\left \{ u^{k}\right \} =\frac{1}{k^{m}}\). What about \(\left \{ u^{k}\right \} =\frac{1}{e^{k}}\). Here we get \[ \frac{\left \Vert \frac{1}{k}-0\right \Vert }{\theta ^{k}}=\frac{e^{-k}}{\theta ^{k}}\] Trying \(\theta =\frac{1}{2}\) gives \(\frac{2^{k}}{e^{k}}\) which is not superlinear (do not go to zero for large \(k\)). But if \(\theta =\frac{2}{3}\) it will converge. But it has to converge for all \(\theta \), so \(\frac{1}{e^{k}}\) is not superlinear. How about \(\left \{ u^{k}\right \} =\frac{1}{e^{k^{2}}}\,\) here we find it is superlinear. We obtain \(\frac{e^{-k^{2}}}{\theta ^{k}}\) and this goes to zero for any \(\theta \). To show this, use \(\log \) on it and simplify. (Reader).
Next time we will go over conjugate direction algorithm in more details.
Today lecture will be devoted to conjugate direction (C.D.) algorithms. We will start by remembering that there are many C.D. algorithms. From last lecture, be aware of: Superlinear convergence and quadratic convergence. The quadratic convergence concept is that on a P.S.D. (positive symmetric definite) form, the algorithm will converge in \(n\) steps or less (where \(n\) is the size \(A\)). Using exact arithmetic (not counting for floating point errors). We will proof this today. We will some preliminaries, then go over ingredients and go over examples, then go over properties of conjugate gradient.
Preliminaries: Let \(A^{n\times n}\) be positive definite symmetric, then the pair of vectors \(u,v\) are said to be mutually conjugate w.r.t. \(A\) if \[ u^{T}Av=0 \] This is generalization of orthogonality. Because we can take \(A=I_{n}\) which is PSD.
Reader A set of distinct mutually conjugate vectors always exist for \(A\). These are the eigenvectors of \(A\). The proof starts with writing \(Av=\lambda _{1}v\) and \(Au=\lambda _{2}u\), then applying \(u^{T}Av=0\).
We will use this set of vectors as search directions. We will generate these vectors on the fly during the search and do line search along these directions. So instead of using \(\nabla J\left ( u\right ) \) as the direction we did line search on when using steepest descent, we will now use the conjugate vectors instead.
Properties: Suppose \(v^{0},v^{1},\cdots ,v^{n-1}\) is a set of mutually conjugate vectors w.r.t \(A\). (\(A\) is PSD). First step is to show these vectors are linearly independent.
Lemma: \(v^{i}\) are linearly independent. Proof: Suppose \begin{equation}{\displaystyle \sum \limits _{i=0}^{n-1}} \alpha _{i}v^{i}=0 \tag{1} \end{equation} For scalars \(\alpha _{i}\). We must show that all \(\alpha _{i}=0\). Let use consider \(\alpha _{k}\). If we can show that \(\alpha _{k}=0\) for any \(k\), then we are done. Multiply (1) by \(\left ( v^{k}\right ) ^{T}A\). Then\begin{align*} \left ( v^{k}\right ) ^{T}A{\displaystyle \sum \limits _{i=0}^{n-1}} \alpha _{i}v^{i} & =0\\{\displaystyle \sum \limits _{i=0}^{n-1}} \alpha _{i}\left ( v^{k}\right ) ^{T}Av^{i} & =0 \end{align*}
By mutual conjugate property, then all terms above vanish except \(\alpha _{k}\left ( v^{k}\right ) ^{T}Av^{k}\). Hence \[ \alpha _{k}\left ( v^{k}\right ) ^{T}Av^{k}=0 \] But \(A\) is PSD and \(v^{k}\neq 0\), therefore \(\alpha _{k}=0\) is only choice. QED. We have proved that \(v^{0},v^{1},\cdots ,v^{n-1}\) are linearly independent So we can expand any vector \(u\in \Re ^{n}\) using these as basis vectors\begin{equation} u={\displaystyle \sum \limits _{i=0}^{n-1}} a_{i}v^{i} \tag{2} \end{equation} Let us find the coefficients \(a_{i}\). Premultiply by \(\left ( v^{k}\right ) ^{T}A\) both sides\[ \left ( v^{k}\right ) ^{T}Au={\displaystyle \sum \limits _{i=0}^{n-1}} a_{i}\left ( v^{k}\right ) ^{T}Av^{i}\] As before, by mutual conjugate, the RHS becomes \(a_{k}\left ( v^{k}\right ) ^{T}Av^{k}\). Solving for \(a_{k}\) gives\[ a_{k}=\frac{\left ( v^{k}\right ) ^{T}Au}{\left ( v^{k}\right ) ^{T}Av^{k}}\] Hence (2) becomes\[ u={\displaystyle \sum \limits _{i=0}^{n-1}} \frac{\left ( v^{i}\right ) ^{T}Au}{\left ( v^{i}\right ) ^{T}Av^{i}}v^{i}\] This gives any vector \(u\) in terms of set of vectors \(v^{i}\) that are linearly independent
Conjugate directions algorithm ingredients are:
Example: Fletcher Reeves. \begin{align*} v^{0} & =-\nabla J\left ( u\right ) \\ v^{k+1} & =-\nabla J\left ( u^{k+1}\right ) +\frac{\left \Vert \nabla J\left ( u^{k+1}\right ) \right \Vert ^{2}}{\left \Vert \nabla J\left ( u^{k}\right ) \right \Vert ^{2}}v^{k} \end{align*}
Reader Normalize above for implementation.
Reader What is \(A\) above? Where are these \(v^{k}\) vectors mutually conjugate? \(A\) is the Hessian. Note: These algorithms (C.D.) converge for convex \(J\left ( u\right ) \). If \(J\left ( u\right ) \) is not convex or we do not know, we need to put conditions to make sure it is converging.
For Polyak-Ribieve, see homework.
Consider quadratic form \[ J\left ( u\right ) =\frac{1}{2}u^{T}Au+b^{T}u+c \] With \(A=A^{T}\) and positive definite \(n\times n~\)matrix. Let \(v^{0},\cdots ,v^{n+1}\) be mutually conjugate w.r.t. \(A\). Let step size be as large as we want. The conjugate direction algorithm converges to optimal \(u^{\ast }=-A^{-1}b\) in \(n\) steps or less
Proof
Let \(u^{k}\) be the \(k^{th}\) iterate. If \(u^{k}=u^{\ast }\) and \(k\leq n\) then we are done. Without loss of generality, assume \(u^{k}\neq u^{\ast }\). We must show that \(u^{n}=u^{\ast }\). We first find \(h_{k}\), the step size at iterate \(k\). From \begin{align*} \tilde{J}\left ( h\right ) & =J\left ( u^{k}+hv^{k}\right ) \\ & =\frac{1}{2}\left ( u^{k}+hv^{k}\right ) ^{T}A\left ( u^{k}+hv^{k}\right ) +b^{T}\left ( u^{k}+hv^{k}\right ) +c \end{align*}
This is quadratic in \(h\). \[ \tilde{J}(h) = \frac{1}{2} (v^{k})^{T} A v^{k} h^{2}+ \left ( (v^{k})^{T} A u^{k} + b^{T} v^{k} \right ) h + \overbrace{ \frac{1}{2} u^{k} A u^{k} +c}^{\text{constant term}} \] Now taking derivative gives\[ \frac{d\tilde{J}\left ( h\right ) }{dh}=\left ( v^{k}\right ) ^{T}Av^{k}h+\left ( \left ( v^{k}\right ) ^{T}Au^{k}+b^{T}v^{k}\right ) \] Setting this to zero and solving for \(h\) gives\begin{align} h^{\ast } & =-\frac{\left ( v^{k}\right ) ^{T}Au^{k}+b^{T}v^{k}}{\left ( v^{k}\right ) ^{T}Av^{k}}\nonumber \\ & =-\frac{\left ( v^{k}\right ) ^{T}\left ( Au^{k}+b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}} \tag{3} \end{align}
Hence\begin{align*} u^{n} & =u^{0}+h_{0}v^{0}+\cdots +h_{n-1}v^{n-1}\\ & =u^{0}+{\displaystyle \sum \limits _{k=0}^{n-1}} h_{k}v^{k} \end{align*}
Using (3) in the RHS of above, replacing each \(h_{k}\) with the optimal \(h\) at each iterate gives\begin{align*} u^{n} & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}\left ( Au^{k}+b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}Au^{k}}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k} \end{align*}
Replacing \(u^{k}\) in the second term above in the RHS with \(u^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}v^{i}\) gives\begin{equation} u^{n}=u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}-{\displaystyle \sum \limits _{k=0}^{n-1}} \frac{\left ( v^{k}\right ) ^{T}A\left ( u^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}v^{i}\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}v^{k} \tag{4} \end{equation} But \begin{align*} \left ( v^{k}\right ) ^{T}A\left ( u^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}v^{i}\right ) & =\left ( v^{k}\right ) ^{T}Au^{0}+{\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}\left ( v^{k}\right ) ^{T}v^{i}\\ & =\left ( v^{k}\right ) ^{T}Au^{0} \end{align*}
Since all terms in \({\displaystyle \sum \limits _{i=0}^{k-1}} h_{i}\left ( v^{k}\right ) ^{T}v^{i}\) vanish by mutual conjugate property. Using this to simplify (4) gives\begin{align*} u^{n} & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}Au^{0}}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =u^{0}-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}\left ( Au^{0}+b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k} \end{align*}
Reader \(\left ( v^{k}\right ) ^{T}Au^{0}\) is expansion of \(u^{0}\). Using this in the above reduces it to \[ u^{n}=-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\] Insert \(AA^{-1}\) into the above gives\begin{align*} u^{n} & =-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( v^{k}\right ) ^{T}AA^{-1}b}{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =-{\displaystyle \sum \limits _{k=0}^{n-1}} \left ( \frac{\left ( \left ( v^{k}\right ) ^{T}A\right ) \left ( A^{-1}b\right ) }{\left ( v^{k}\right ) ^{T}Av^{k}}\right ) v^{k}\\ & =-A^{-1}b \end{align*}
But \(-A^{-1}b=u^{\ast }\). QED.
Following some extra remarks added later from An introduction to optimization by Ching and Zak, 1996:
See Example 10.1, page 133 of the above text for illustrations how to determine each of \(v^{i}\) vectors for given \(A\) matrix.
We are about to enter new phase of the course with constraints and linear programming. Until now we used iterative methods to solve unconstrained problems. These are gradient based. Also looked at Newton-Raphson. We used steepest descent and conjugate directions. These methods are mainly applied to problem without constraints. i.e. \(u\) is free, where \(u\) are the variables. But in HW4 we had problem where \(u\) was the capacitance. This can not be negative. But we did not account for this. When we have constraints and want to use the above iterative methods, there are ad hoc methods to handle this, but we will not cover these ad-hoc methods in this course, but will mention some of them.
We can check that no constraint is violated during the search and start a new search. There are literature on what is called “projection methods” and other names.
One good method is called the “penalty function method”. This works as follows
\[ \tilde{J}\left ( u\right ) =J\left ( u\right ) +J_{penalty}\left ( u\right ) \]
The original objective function is \(J\left ( u\right ) \) and \(J_{penalty}\left ( u\right ) \) is function we add such that it becomes very large when \(u\) constraint is violated (\(u\notin U\)). (assuming we are minimizing \(J(u)\)).
This method works on many problems. For example, if we do not want \(u_{1}\) to be negative, we can add\[ J_{penalty}\left ( u\right ) =-100\min \left ( 0,u_{1}\right ) \] This way, when \(u_{1} \leq 0\), the result will be very large and positive. Hence \(\tilde{J}\left ( u\right ) \) will become very large and the search will avoid this region and turn away during the line search.
But a theory with name of “Kohn Tucker” is the main way to handle such problems under heading of “non-linear programming”.
Now we go back to linear programming which we will cover over the next 4–5 lectures. key points of linear programming are
L.P. works fast by not visiting each vertex. But with some input L.P. can become slow and force it to visit all vertices.
Polytopes are central to L.P since polytopes are described by constraints. See handout “Polytopes” taken from textbook Barmish, Robust Control.
Polytope is convex hull of finite point set. These are the generators \(v^{1},\cdots ,v^{N}\). Polytopes have extreme vertices. L.P. visits vertices. If the set is bounded, we call it polytope, else we call it polyhedron. So with linear inequalities constraints and bounded, we have polytopes.
Reader A linear function \(J\left ( u\right ) =a^{T}u\) on polytope \(P\) achieves its Max. or Min. at an extreme point. We showed that the Max. of convex function is at a vertex. we can also show that Min. of concave is at a vertex. Linear functions are both concave and convex. QED.
Often we do not have list of vertices. Need to first generate them. They are generated from the constraint inequalities.
How many vertices to search? McMullen’s Upper Bound Theorem gives us the answer. Assuming we have \(m\) constraints and \(n\) variables, where \(m\geq n\). Then\[ V\left ( m,n\right ) =\begin{pmatrix} m-\left \lfloor \frac{1}{2}(n+1)\right \rfloor \\ m-n \end{pmatrix} +\begin{pmatrix} m-\left \lfloor \frac{1}{2}(n+2)\right \rfloor \\ m-n \end{pmatrix} \] Example for \(m=20,n=10\,\ \) we get \(4004\) vertices. So this is a small problem.
Reader Consider \(n=500,m=2000\) which is very modest in terms of current technology, assuming it takes 1 microsecond per vertex, then it will take order of \(10^{296}\) years to search all the vertices.
Up to last few years, L.P. was considered a completed research. But in the last 5–10 years, there is new L.P. research starting. Modern L.P. solvers use linear inequalities description as input. This is expressed and formulated as \(Ax=b\). What if vertices or some vertices generating mechanism was given as input instead of the constraints themselves? How to convert the vertices to constraints? This is a difficult problem.
Today we will begin the mechanism of doing L.P. (linear programming). We know the extreme points is where the optimal will occur. But searching all extreme points is not practical for large \(N\) as shown before. One can take a whole course just on L.P. but here we will cover the main ideas. We basically have a linear objective function in \(u\) and linear constraints in \(u\) where \(u\) are the variables. This is called the raw L.P. formulation. This is converted to standard form L.P. and solved using the simplex method.
The solver finds the first vertex then in a clever way moves to another until it finds the optimal one. The solver solves two linear programming problems and these are:
Standard form LP is\[ \boxed{ \begin{array} [c]{cc}\min & c^{T}x\\ st & Ax=b \end{array} } \] Notice that solution might be infimum above. Example.\[\begin{array} [c]{cc}\min & x_{1}\\ st & \begin{array} [c]{c}x_{2}\leq 5\\ x_{1}<0 \end{array} \end{array} \] The solution is \(x_{1}=-\infty \). This is closed by unbounded.
Ingredients of Linear programming are:
What if we have some variables \(x_{i}\) which we want to be negative? We replace \(x_{i}\) with new variable \(x_{j}=-x_{i}.\) Now \(x_{j}\geq 0\). Now in each place we have \(x_{i}\) which is negative and can’t use, then we replace it with \(-x_{j}\). This is now the same as before, but \(x_{i}\) is gone and replaced with \(-x_{j}\) and \(x_{j}\) is now positive. So it is standard form.
At the end, when we obtain the solution, we replace \(x_{j}\) back to \(-x_{i}\). (what about free variables?).
What if we have inequality in the raw L.P.? how to convert to equality for standard form? We use Slack variables and Surplus variables .
Example given \(x_{1}+2x_{2}-x_{3}\leq 6\), then introduce new slack variable \(x_{4}\) and rewrite the constraint as \(x_{1}+2x_{2}-x_{3}+x_{4}=6.\)
If we have constraint \(x_{1}+2x_{2}-x_{3}\geq 6\), then we need surplus variable \(x_{4}\). Rewrite as \(x_{1}+2x_{2}-x_{3}-x_{4}=6.\) Once we solve the LP problem and obtain \(x^{\ast }\), we need to recover from this solution the actual variables of the raw LP (these are the \(u\) variables) and these do not contain any slack nor surplus variables.
See also handout sector patrol. The objective function is \(E\left ( T\right ) =\frac{\frac{u_{1}}{3}}{10}+\frac{\frac{u_{2}}{3}}{5}=\frac{u_{1}}{30}+\frac{u_{2}}{15}\). Constraints are \(u_{1}\geq 0,u_{2}\geq 0\) and\begin{align*} 2u_{1}+2u_{2} & \geq 4\\ 2u_{1}+2u_{2} & \leq 10 \end{align*}
And as per handout, we need to add this constraint in order to obtain a realistic solution\[ u_{2}\geq 1.5u_{1}\] The above is the raw LP. Convert to standard form, using \(x\) as variables. It becomes\begin{align*} 2x_{1}+2x_{2}-x_{4} & =4\\ 2x_{1}+2x_{2}+x_{3} & =10\\ -1.5x_{1}+x_{2}-x_{5} & =0 \end{align*}
Where \(x_{3},x_{4},x_{5}\) above were added to make it standard form. Writing it as \(Ax=b\), the constraint equation is (we put the slack variables first by convention)\[\begin{bmatrix} 2 & 2 & 1 & 0 & 0\\ 2 & 2 & 0 & -1 & 0\\ -1.5 & 1 & 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{4}\end{bmatrix} =\begin{bmatrix} 10\\ 4\\ 0 \end{bmatrix} \] And \(c^{T}x\) becomes (this is the objective function, notice we added the slack and surplus variables to it, but they are all zeros).\[\begin{bmatrix} \frac{1}{30} & \frac{1}{15} & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{4}\end{bmatrix} \] Reader find common sense solution working in \(u_{1},u_{2}\) domain. (will do this next lecture).
We will define two solutions: The basic solution, and basic feasible solution.
A vector \(x\) is said to be basic solution if it solves \(Ax=b\) and the non-zeros elements of \(x\) correspond to the linearly independent columns of \(A.\)
Reader Is there a basic infeasible solution?
Recall, the standard LP problem is
\[\begin{array} [c]{cc}\min & c^{T}x\\ st & Ax=b \end{array} \]
We talked about transforming the problem from raw LP to standard LP. The patrol sector problem, solved using common sense graphical approach is given below
The optimal \(u^{\ast }\) has to be at one of the vertices of the feasible region. It will be at the vertex shown
Using Matlab, the above is solved as follows\[\begin{bmatrix} 2 & 2 & 1 & 0 & 0\\ 2 & 2 & 0 & -1 & 0\\ -1.5 & 1 & 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{4}\end{bmatrix} =\begin{bmatrix} 10\\ 4\\ 0 \end{bmatrix} \] And \(c^{T}x\) becomes (this is the objective function, notice we add the slack and surplus variables to it, but they are all zeros).\[\begin{bmatrix} \frac{1}{30} & \frac{1}{15} & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{4}\end{bmatrix} \] The code is
Result of above run
In the above, we only need to map \(x(1)\) to \(u_{1}\) and \(x(2)\) to \(u_{2}\) to read the result. We see that Matlab result matches the graphical solution.
definition For LP, we say \(\mathbf{x}\) is feasible if \(\mathbf{x}\) satisfies the constraints.
For example, for the sector patrol, let \(U\) be the feasible set in \(\Re ^{2}\) (raw LP). However, in standard LP, the feasible set is in \(\Re ^{5}\).
Reader Obtain feasible set in \(\Re ^{5}\). Obtain feasible point with either \(x_{3},x_{4},x_{5}\) nonzero.
Basic solution A vector \(x\) is basic solution if the non-zero components of \(x\) corresponds to the linearly independent columns of \(A\). We do not require feasibility to be basic solution).
The difference in LP and standard \(A\mathbf{x}=\mathbf{b}\) solution we have seen before many times in linear algebra, is that in LP, we want to solve \(A\mathbf{x}=\mathbf{b}\) but with \(\mathbf{x}\geq \mathbf{0}\) and at same time have \(\mathbf{x}\) be optimal. This what makes LP different from standard methods of solving this problem.
The algorithm takes a solution which is feasible and makes it feasible basic solution. Then after that, we move from one basic feasible solution to another basic feasible solution while at the same time making \(J\left ( u\right ) \) smaller until it reaches the optimal value.
Reader LP has at least one basic solution.
\(A\) has \(m\) linearly independent columns, since it has rank \(m\).\begin{align*} A\mathbf{x} & =\mathbf{b}\\\begin{bmatrix} A_{basic} & A_{notbasic}\end{bmatrix}\begin{bmatrix} x_{basic}\\ x_{notbasic}\end{bmatrix} & =\begin{bmatrix} b_{basic}\\ b_{notbasic}\end{bmatrix} \end{align*}
Theorem Suppose LP has feasible solution, then it has a basic feasible solution. Remember: \(\mathbf{x}\) is feasible if it is in the feasible region (satisfies constraints), and \(\mathbf{x}\) is basic solution if the non-zero elements of \(\mathbf{x}\) correspond to linearly independent columns of \(A\).
Proof say \(\mathbf{x}\) is feasible. Let \(a^{1},a^{2},\cdots ,a^{p}\) denote columns of \(A\) corresponding to nonzero entries of \(\mathbf{x}\). Without loss of generality, say the first \(p\) columns of \(A\). (we can always rearrange \(A\) to make it so). There are two cases:
Now we do the squeezing process. For \(\varepsilon >0\) define vector \(\eta ^{\varepsilon }\) with components\[ \eta ^{\varepsilon }=\begin{array} [c]{cc}x_{i}-\varepsilon y_{i} & \text{for }i\leq p\\ 0 & i>p \end{array} \] Reader \(\eta ^{\varepsilon }\) is feasible for small \(\varepsilon \). Hence \[ A\eta ^{\varepsilon }=A\mathbf{x}-\varepsilon A\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{p}\\ 0\\ \vdots \\ 0 \end{bmatrix} \] Let \(\varepsilon =\min \left \{ \frac{x_{i}}{y_{i}};y_{i}>0\right \} \). Reader \(\eta ^{\varepsilon }\) is basic and has at least one more zero entry than \(\mathbf{x}\). So now \(\mathbf{x}\) has \(p-1\) columns of \(A\) corresponding to non-zero entries in \(\mathbf{x}\). Continuing this process, we keep finding other basic feasible solutions.
example \[\begin{bmatrix} 1 & 2 & 1 & 2 & 0\\ 1 & 1 & 1 & 2 & 1\\ 2 & -1 & 1 & 2 & 1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 3\\ 4\\ 4 \end{bmatrix} \] Let starting \(\mathbf{x}=\begin{bmatrix} 0\\ 0\\ 1\\ 1\\ 1 \end{bmatrix} \) This is feasible since it satisfies \(A\mathbf{x}=\mathbf{b}\) with \(\mathbf{x}\geq \mathbf{0}\). But not basic, since the last 3 columns are not linearly independent. (the last three columns of \(A\). Since these are the ones that correspond to non-zero elements of \(\mathbf{x}.\) we now write\[ y_{3}a^{3}+y_{4}a^{4}+y_{5}a^{5}=0 \] Where \(a^{3},a^{4},a^{5}\) represent the last three columns of \(A\) and \(y^{i}\) are the scalars we want to solve for. Solving, gives \begin{align*} y_{3} & =2\\ y_{4} & =-1\\ y_{5} & =0 \end{align*}
Hence, first find \(\varepsilon =\min \left \{ \frac{x_{i}}{y_{i}};y>0\right \} \), which we find to be \(\varepsilon =\frac{1}{2}\). Now we find\[ \mathbf{x}^{new}=\eta ^{\varepsilon }=\begin{bmatrix} x_{1}-\varepsilon y_{1}\\ x_{2}-\varepsilon y_{2}\\ x_{3}-\varepsilon y_{3}\\ x_{4}-\varepsilon y_{4}\\ x_{5}-\varepsilon y_{5}\end{bmatrix} =\begin{bmatrix} 0-\frac{1}{2}\left ( 0\right ) \\ 0-\frac{1}{2}\left ( 0\right ) \\ 1-\frac{1}{2}\left ( 2\right ) \\ 1-\frac{1}{2}\left ( -1\right ) \\ 1-\frac{1}{2}\left ( 0\right ) \end{bmatrix} =\begin{bmatrix} 0\\ 0\\ 0\\ \frac{3}{2}\\ 1 \end{bmatrix} \] Since now \(a^{4},a^{5}\) are linearly independent (these are the fourth and fifth columns of \(A\)) and these correspond to the non zero of the new \(\mathbf{x}=\eta ^{\varepsilon }\), then the new \(\mathbf{x}\) is basic and feasible. So we have started with feasible solution, and from it obtained a basic feasible solution. This is not the final optimal solution \(\mathbf{x}\) but we repeat this process now.
If we have a feasible solution, we can obtain a basic feasible solution from it using the squeezing method. We have not talked about optimality yet. The next thing to consider is optimality. We will proof if we have an optimal feasible solution, then by obtaining a basic solution from it, the basic solution will remain optimal. This is called the optimality theorem. Then we will talk about extreme points.
If an optimal feasible solution exist, then an optimal feasible and basic solution exist as well.
Proof Suppose \(\mathbf{x}\) is optimal and feasible. If \(\mathbf{x}\) is basic, we are done. If not, now we need to do the squeeze process to make it basic, but now we have to do the squeeze making sure it remains optimal. Say \(a^{1},a^{2},\cdots ,a^{p}\) are the columns associated with non-zero entries in \(\mathbf{x}\). As before, we way, WLOG these are the first \(p\) columns in \(A\). Hence there exist scalars \(y_{i}\), not all zero, such that \(\sum _{i=1}^{p}y_{i}a^{i}=0\). Define \(\eta ^{\varepsilon }\) for scalar \(\varepsilon \) such that \[ \eta ^{\varepsilon }=\begin{array} [c]{cc}x_{i}-\varepsilon y_{i} & \text{for }i\leq p\\ 0 & i>p \end{array} \] Reader For small \(\varepsilon \), say \(\left \vert \varepsilon \right \vert \leq \delta \), then \(\eta ^{\varepsilon }\) is still feasible. Claim: For \(\varepsilon \) suitable small, \(\eta ^{\varepsilon }\) is optimal. It suffice to show that \(c^{T}y=0\) with \(y_{i}=0\) for \(i>p\). This being the case, then \(c^{T}\eta ^{\varepsilon }=c^{T}\mathbf{x}=J^{\ast }.\) By contradiction: Say \(c^{T}y\neq 0\). Let \(\varepsilon =\delta sgn\left ( c^{T}y\right ) \). Let us show that \(\eta ^{\varepsilon }\) is better than \(\mathbf{x}\). This contradicts optimality. Now\begin{align*} c^{T}\eta ^{\varepsilon } & =c^{T}\left ( x-\varepsilon y\right ) \\ & =c^{T}x-c^{T}\left ( \delta sgn\left ( c^{T}y\right ) \right ) y\\ & =c^{T}x-\delta sgn\left ( c^{T}y\right ) \left ( c^{T}y\right ) \\ & =J^{\ast }-\delta \left \vert c^{T}y\right \vert \\ & <J^{\ast } \end{align*}
This contradicts optimality. QED.
Now we will talk about extreme points. Extreme points and basic feasible solution are the same thing.
Let \(P=\left \{ x\in \Re ^{n},x\geq 0,Ax\leq b\right \} \). This polyhedron is the feasible set. The set of extreme points of \(P\) are the basic feasible solution.
Proof See handout extreme send today. We need to get to the first feasible solution. This can be hard to obtain. Once we find a feasible solution, then we use it to find the first basic feasible solution, and from them we repeat the process (using the squeeze method). As we move from one basic feasible solution to another, we do this by making \(J\left ( u\right ) \) smaller.
Example Consider LP with constraints \begin{align*} x_{i} & \geq 0,i=1,2,3\\ 2x_{1}+3x_{2} & =1\\ x_{1}+x_{2}+x_{3} & =1 \end{align*}
Note that \(x_{1}+x_{2}+x_{3}=1\) is common in LP optimization. It is called unit simplex. Here is a plot of the above.
Reader Plane for \(2x_{1}+3x_{2}=1\) intersect the unit simplex else no feasible region exist. So there are two basic feasible solutions at \(\left ( 0,\frac{1}{3},\frac{2}{3}\right ) \) and \(\left ( \frac{1}{2},0,\frac{1}{2}\right ) \). These are the two red points shown in the above plot.
We will use the sector patrol problem to show the mechanisms of simplex. \begin{align*} Ax & =b\\\begin{bmatrix} 2 & 2 & 1 & 0 & 0\\ 2 & 2 & 0 & -1 & 0\\ -1.5 & 1 & 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} & =\begin{bmatrix} 10\\ 4\\ 0 \end{bmatrix} \end{align*}
Step 1. Form partial tableau. \[\begin{bmatrix} A & b \end{bmatrix} =\begin{bmatrix} 2 & 2 & 1 & 0 & 0 & 10\\ 2 & 2 & 0 & -1 & 0 & 4\\ -1.5 & 1 & 0 & 0 & -1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} r_{1}\\ r_{2}\\ r_{3}\end{bmatrix} \] We need to identify an identify matrix. By row operations we obtain\begin{equation} \begin{bmatrix} r_{1}\\ -r_{2}\\ -r_{3}\end{bmatrix} =\begin{bmatrix} 2 & 2 & 1 & 0 & 0 & 10\\ -2 & -2 & 0 & 1 & 0 & -4\\ 1.5 & -1 & 0 & 0 & 1 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} r_{1}^{\prime }\\ r_{2}^{\prime }\\ r_{3}^{\prime }\end{bmatrix} \tag{1} \end{equation} Now we can read out \(\mathbf{x}^{1}\) and we see that \(x_{3}=10,x_{4}=-4,x_{5}=0\). These are the entries in \(x\) which corresponds to the columns of the unit matrix inside \(A\). All other entries in \(x\) are assumed zero. Hence \[ \mathbf{x}^{1}=\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 0\\ 0\\ 10\\ -4\\ 0 \end{bmatrix} \] Note this is not feasible but basic. Now we go to next step. We have to remove an entry from \(x^{1}\) and move in its place another entry. Let us pick \(x_{3}\) to kick out and move in \(x_{1}\). By row operations applied to (1) we obtain\[\begin{bmatrix} \frac{r_{1}^{\prime }}{2}\\ r_{2}^{\prime }+2r_{1}^{\prime \prime }\\ r_{3}^{\prime }-1.5r_{1}^{\prime \prime }\end{bmatrix} =\begin{bmatrix} 1 & 1 & \frac{1}{2} & 0 & 0 & 5\\ 0 & 0 & 1 & 1 & 0 & 6\\ 0 & -2.5 & -0.75 & 0 & 1 & -7.5 \end{bmatrix} \rightarrow \begin{bmatrix} r_{1}^{\prime \prime }\\ r_{2}^{\prime \prime }\\ r_{3}^{\prime \prime }\end{bmatrix} \] Hence the new identity matrix is \(a^{1},a^{4},a^{5}\) and therefore \(x_{1}=5,x_{4}=6,x_{5}=-7.5\), These are the entries in \(x\) which corresponds to the columns of the unit matrix inside \(A\). All other entries in \(x\) are assumed zero. Hence \[ \mathbf{x}^{2}=\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\end{bmatrix} =\begin{bmatrix} 5\\ 0\\ 0\\ 6\\ -7.5 \end{bmatrix} \] This is the second basic \(\mathbf{x}\) we found.
Reader Find basic solution via pivoting row operations. Now we want to redo the above, with feasibility in mind which we did not consider above when moving elements out and selecting which one to move in.
Example
\begin{align*} Ax & =b\\\begin{bmatrix} 1 & 0 & 0 & 4 & 3 & 2\\ 0 & 1 & 0 & 2 & -1 & 2\\ 0 & 0 & 1 & -1 & 2 & 1 \end{bmatrix}\begin{bmatrix} x_{1}\\ x_{2}\\ x_{3}\\ x_{4}\\ x_{5}\\ x_{6}\end{bmatrix} & =\begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix} \\\begin{bmatrix} A & b \end{bmatrix} & =\begin{bmatrix} 1 & 0 & 0 & 4 & 3 & 2 & 1\\ 0 & 1 & 0 & 2 & -1 & 2 & 2\\ 0 & 0 & 1 & -1 & 2 & 1 & 3 \end{bmatrix} \end{align*}
Now we need to start with feasible solution. Last example we did not care about this, but now we need the first solution to be feasible. Here \(x_{1}=1,x_{2}=2,x_{3}=3\) (read out, from the identity matrix, since the first three columns are linearly independent). We we use the squeeze process, to decide which one to kick out and which to move in. Let us for now choose arbitrarily \(x_{4}\) (fourth column) to move in and we have to kick out a column. Need \(\varepsilon ^{\ast }\) to decide. \[ \varepsilon ^{\ast }=\min \left \{ \frac{1}{4},\frac{1}{2}\right \} =\frac{1}{4}\] Hence it is the first column to kick out. Remember that when doing the above to determine which \(x\) to kick out, we only divide by those entries in the column which are positive. If there is a negative entry, then do not use it. That is why in the above, we did not write \(\varepsilon ^{\ast }=\min \left \{ \frac{1}{4},\frac{1}{2},\frac{3}{-1}\right \}\). We will continue next lecture.
Class planning:
Exercise to do
We still need to know how to find first feasible solution. In the following example
\(x_{1}\) | \(x_{2}\) | \(x_{3}\) | \(x_{4}\) | \(x_{5}\) | \(x_{6}\) | b |
\(1\) | \(0\) | \(0\) | \(2\) | \(4\) | \(6\) | \(4\) |
\(0\) | \(1\) | \(0\) | \(1\) | \(2\) | \(3\) | \(3\) |
\(0\) | \(0\) | \(1\) | \(-1\) | \(2\) | \(1\) | \(1\) |
A solution which is basic and feasible is \(x_{1}=4,x_{2}=3,x_{3}=1\) and all others \(x_{i}=0,i=4\cdots 6\).
\[ x=\begin{pmatrix} 4\\ 3\\ 1\\ 0\\ 0\\ 0 \end{pmatrix} \]
This was just read out from the identity matrix in the first 3 columns above. Let us now assume we want fourth column to be in the basis. We have to remove one of the current columns in the basis in order to let another column in. \(\min \left \{ \frac{4}{2},\frac{3}{1}\right \} =2\). This means the first row is the pivot row. Notice we do not consider \(\frac{1}{-1}\) when doing the minimum operation. Any negative value in a column is bypassed. Now we know that first row is pivot row and that we want fourth column in. This is all what we need to go to the next step. We know need to normalize entry \(\left ( 1,4\right ) \) to one. (before it was 2). After normalizing the pivot row (by dividing the whole row by 2) we obtain
\(x_{1}\) | \(x_{2}\) | \(x_{3}\) | \(x_{4}\) | \(x_{5}\) | \(x_{6}\)
| b |
\(\frac{1}{2}\) | \(0\) | \(0\) | \(1\) | \(2\) | \(3\) | \(2\) |
\(0\) | \(1\) | \(0\) | \(1\) | \(2\) | \(3\) | \(3\) |
\(0\) | \(0\) | \(1\) | \(-1\) | \(2\) | \(1\) | \(1\) |
Only now we start applying row operations, with row one as pivot row, we make all other entries in fourth column below \(\left ( 1,4\right ) \) zero, This gives the following
\(x_{1}\) | \(x_{2}\) | \(x_{3}\) | \(x_{4}\) | \(x_{5}\) | \(x_{6}\) | b | |
\(r_{1}\) | \(\frac{1}{2}\) | \(0\) | \(0\) | \(1\) | \(2\) | \(3\) | \(2\) |
\(^{r_{2}-r_{1}}\) | \(-\frac{1}{2}\) | \(1\) | \(0\) | \(0\) | \(0\) | \(0\) | \(1\) |
\(r_{3}+r_{1}\) | \(\frac{1}{2}\) | \(0\) | \(1\) | \(0\) | \(4\) | \(4\) | \(3\) |
Now that we have a new identity matrix, we read out the new solution which is \(x_{2}=1,x_{4}=2,x_{3}=3\) and all other \(x\) entries zero.\[ x=\begin{pmatrix} 0\\ 1\\ 3\\ 2\\ 0\\ 0 \end{pmatrix} \] We will now revisit this, with optimality in mind. This means we need to know which column to bring into the basis. The question is, which \(x_{i}\) to bring in. Set up this tableau1
\[ \overset{A}{\overbrace{\begin{pmatrix} 1 & \cdots & 0 & a\left ( 1,m+1\right ) & \cdots & a\left ( 1,n\right ) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \cdots \\ 0 & \cdots & 1 & a\left ( m,m+1\right ) & \cdots & a\left ( m,n\right ) \end{pmatrix} }}\overset{b}{\overbrace{\begin{pmatrix} y\left ( 1,0\right ) \\ y\left ( 2,0\right ) \\ \vdots \\ y\left ( m,0\right ) \end{pmatrix} }}\] Therefore, the current feasible and basic solution is \(x_{1}=y\left ( 1,0\right ) ,x_{2}=y\left ( 2,0\right ) ,\cdots ,x_{m}=y\left ( m,0\right ) \). All other \(x_{i}=0\). We need now to do a feasibility preserving perturbation.
Allow \(x_{m+1},x_{m+2},\cdots ,x_{n}\) to be part of the solution.\begin{align*} x_{1}+\overset{\text{we allow this in}}{\overbrace{\sum _{i=m+1}^{n}a\left ( 1,i\right ) x_{i}}} & =y\left ( 1,0\right ) \\ x_{2}+\sum _{i=m+1}^{n}a\left ( 2,i\right ) x_{i} & =y\left ( 2,0\right ) \\ & \vdots \\ x_{m}+\sum _{i=m+1}^{n}a\left ( m,i\right ) x_{i} & =y\left ( 2,0\right ) \end{align*}
Now \(\boldsymbol{x}=\begin{pmatrix} x_{1} & x_{2} & \cdots & x_{m} & 0 & \cdots & 0 \end{pmatrix} ^{T}\) is still feasible, but no longer basic. We know that \(J=c^{T}x\), hence\begin{align*} J & =c_{1}\left ( y(1,0) - \sum _{i=m+1}^{n} a(1,i) x_{i}\right ) + c_{2}\left ( y(2,0) - \sum _{i=m+1}^{n} a(2,i) x_{i}\right ) + \cdots +c_{m} \left ( y(m,0) -\sum _{i=m+1}^{n} a(m,i) x_{i}\right ) + c_{m+1} x_{m+1} + \cdots +c_{n} x_{n}\\ J & = \overbrace{\sum _{i=1}^{m} c_{i} y(i,0)}^{ \text{current }J_{0} \text{value}} -c_{1} \sum _{i=1}^{m} a(1,i) x_{i} -c_{2} \sum _{i=1}^{m} a(2,i) x_{i} -\cdots -c_{m} \sum _{i=1}^{m} a(m,i) x_{i} +c_{m+1}x_{m+1} +\cdots +c_{n} x_{n} \end{align*}
Therefore \[ J=J_{0}-\left ( -c_{m+1}+c_{1}a\left ( 1,m+1\right ) +c_{2}a\left ( 2,m+1\right ) +\cdots +c_{m}a\left ( m,m+1\right ) \right ) x_{m+1}-\left ( -c_{m+1}+c_{1}a\left ( 2,m+1\right ) + \cdots +c_{m}a\left ( m,m+\right ) \right ) x_{m+2}-\cdots -\left ( -c_{n}+c_{1}a\left ( 1,n\right ) +c_{2}a\left ( 2,n\right ) +\cdots +c_{m}a\left ( m,n\right ) \right ) x_{n}\] Define cost coefficients, for \(j=m+1,\cdots ,n\)\[ r_{j} = c_{j} - \sum _{i=1}^{m} c_{i} a(i,j) \] Hence\[ J = J_{0} + \sum _{j=m+1}^{n} r_{j} x_{j} \] To minimize \(J\) we need to pick the most negative \(r_{j}\). This tells us which \(x_{j}\) we need to bring into the basis in order to reduce \(J\). Now we add extra row (last row) which is \(J\left ( u\right ) \) to the table, to keep cost in it. Here is the table with the cost coefficient row added\[ \overset{A}{\overbrace{\begin{pmatrix} 1 & \cdots & 0 & a\left ( 1,m+1\right ) & \cdots & a\left ( 1,n\right ) \\ \vdots & \ddots & \vdots & \vdots & \ddots & \cdots \\ 0 & \cdots & 1 & a\left ( m,m+1\right ) & \cdots & a\left ( m,n\right ) \\ 0 & 0 & 0 & r_{m+1} & \cdots & r_{n}\end{pmatrix} }}\overset{b}{\overbrace{\begin{pmatrix} y\left ( 1,0\right ) \\ y\left ( 2,0\right ) \\ \vdots \\ -J_{0}\end{pmatrix} }}\] state street vendor example Selling rings and bracelets. Ring has 3 oz. gold., 1 oz. silver. Bracelet has 1 oz. gold, 2 oz. silver. Profit on ring is $4 and profit on bracelet is $5. Initially we have 8 oz. gold and 9 0z silver. How many rings and bracelets to produce to maximize profit? This is the heart of many production problems. Note: we need integer LP to solve this. But if the quantities are very large, it will make little difference. So for now we can ignore the issue of integer programming. \begin{align*} u_{1} & =\text{number of rings}\\ u_{2} & =\text{number of bracelets}\\ J\left ( u\right ) & =4u_{1}+5u_{2} \end{align*}
Since we want to maximize this, then \[ J\left ( u\right ) =-4u_{1}-5u_{2}\] With \(u_{i}\geq 0\). Constraints are \(3u_{1}+u_{2}\leq 8\) and \(u_{1}+2u_{2}\leq 9\). We convert to standard LP with slack variables and make the first table
\(x_{1}\) | \(x_{2}\) | \(x_{3}\) | \(x_{4}\) | \(b\) | |
row 1 | \(3\) | \(2\) | \(1\) | \(0\) | \(8\) |
row 2 | \(1\) | \(2\) | \(0\) | \(1\) | \(9\) |
\(J\left ( u\right ) \) | \(-4\) | \(-5\) | \(0\) | \(0\) | \(0\) |
The first basic feasible solution is \(x_{3}=8,x_{4}=9\). To decide which column to bring in, we see the most negative is column 2 (last row is \(-5\)). To find the pivot row, we see it is first row since \(\min \left \{ \frac{8}{2},\frac{9}{2}\right \} =\left \{ 4,4.5\right \} =4\). Now we do the first stage.
Reader obtain the following
\(x_{1}\) | \(x_{2}\) | \(x_{3}\) | \(x_{4}\) | \(b\) | |
row 1 | \(2.5\) | \(0\) | \(1\) | \(-0.5\) | \(3.5\) |
row 2 | \(0.5\) | \(1\) | \(0\) | \(0.5\) | \(4.5\) |
\(J\left ( u\right ) \) | \(-1.5\) | \(0\) | \(0\) | \(2.5\) | \(22.5\) |
Hence \(x_{2}=4.5,x_{3}=3.5\). Since there is still a negative entry in the last row we need to repeat the process again. We Keep doing this until there are no negative entries in the last (third) row.
Now we will now talk about how to find the first basic feasible solution. There are two cases. If the number of slack variables is \(m\) then first basic feasible solution can be read out. This means there is no phase one LP. Case two. The number of slack variables is \(z<m\). Now we need to solve the first phase LP. Then use its result to solve second phase LP. In phase one, we introduce new artificial variables \(y_{i}\) as many as \(m-z\) and new artificial cost function \(J\left ( y\right ) \) which we want to minimize to zero.
See HW6, first problem for an example of how to solve first phase LP.
No class.
No class.
Coming up soon is the special problem. It is like one HW but can count up to two HW’s weight. Note: Possible rescheduling for April 6, 2016 remain in place.
Now we will start on today lecture. Often a problem is given to you, but it is not an LP problem. Sometimes it is not obvious how to convert it to an LP problem. Sometime we need algebra or cleaner reformulation of the problem to make it an LP problem. For example, in HW6, we had a min-max problem but it is was possible to convert it to an LP problem. See key solution for HW6.
Another application area for LP is control. Example is the minimum fuel problem. In this, we want to go from some state to final state with minimum control effort. The control effort is generic name which can mean many things depending on the problem itself. We also want to do this in minimal time. We begin with the discrete state equation \[ x(k+1)= A x(k) + B u(k) \] With \(x(0)\) given as the initial state. In the above, \(x\) is an \(n\times 1\) vector, and \(A\) is \(n\times n\) and \(B\) is \(n\times m\) where \(m\) is number of inputs, and \(u\) is \(m\times 1\) input vector.
We want to select \(u(k)\) sending \(x(0)\) to some target \(x^{\ast }\) at some future time \(k=N\). With \(N\) being minimal, and control effort minimum. Assume for now that \(u\) is scalar, which means one input, then an energy measure is \[ \sum _{k=0}^{N-1} \left | u(k) \right | ^{2} \] On the other hand, a peak measure is \[ \max \{ u(k),k=0,\dots ,N-1\} \] But we will consider the fuel measure given by \[ \sum _{k=0}^{N-1} \left | u(k) \right | \] We will use fuel measure in the LP problem. The constraint is \begin{align*} \left | u(k) \right | \leq U \tag{$\ast $} \end{align*}
Which says that control is bounded. Note that is \((A,B)\) is controllable, we can get from initial state to final state in one step if we want, but the effort will be very large. We also want \(x(N)=x^{\ast }\). The above two are the constraints in this problem. The objective function is \[ J(u) = \sum _{k=0}^{N-1} \left | u(k) \right | \] Therefore \begin{align*} x(1) = & A x(0) + B u(0)\\ x(2) = & A^{2} x(0) + A B u(0) + B u(1)\\ x(3) = & A^{3} x(0) + A^{2} B u(0) + A B u(1) + B u(2)\\ & \vdots \\ x(N) = & \underbrace{ A^{N} x(0) + \sum _{k=0}^{N-1} A^{N-1-k} B u(k) }_{x^{\ast } \text{This is the linear constraint}} \end{align*}
Now we rewrite the constraint \(\left | u(k) \right | \leq U \) as \[ u(k) = u_{p}(k) - u_{n}(k) \] with \(u_{p}(k),u_{n}(k)\) being positive. The objective function becomes (where we now put \(N\) as parameter, to say this is for a specific value of \(N\) \begin{align*} J_{N}(u) & = \sum _{k=0}^{N-1} \left | u(k) \right | \\ & = \sum _{k=0}^{N-1} u_{p}(k) + u_{n}(k) \end{align*}
Equation \(\ast \) above becomes \begin{align*} u_{p}(k) & \leq U\\ u_{n}(k) & \leq U \end{align*}
So minimizing \(J_{N}(u)\) is now an LP problem in \(2N\) raw variables (we still need to add the needed slack variables). So by doubling the number of variables, we were able to convert this control problem to an LP problem. Let \[ N^{\ast } = \inf \{ N : LP_{N} \hspace{5pt} \text{feasible} \} \] Reader Argue that \(l^{\infty }\) measure also lead to an LP problem. \(l^{\infty }\) measure is \(\max \{u(0),u(1),\dots \}\).
Example Let \(A = \begin{pmatrix} 1 & 1-e^{-T}\\ 0 & e^{-T}\end{pmatrix} \) where \(T=1\) is the sample time. And let \(B=\begin{pmatrix} 0\\ 1 \end{pmatrix} \). The bound on \(u(k)=1\). This means \(U=1\). Let \(x(0)=\begin{pmatrix} -40.91\\ 43.50 \end{pmatrix} \) and let the target \(x^{\ast }=\begin{pmatrix} 0\\ 0 \end{pmatrix} \). Find \(u^{\ast }\) and \(N^{\ast }\). Running LP for different values of \(N\) from \(1,\dots ,4\) we find the first feasible solution at \(N=4\). These is the resulting optimal effort \(u(k)\) \begin{align*} u(0) & = -0.3009\\ u(1) & = -1\\ u(2) & = -0.2999\\ u(3) & = -1\\ \end{align*}
And the corresponding optimal objective function \(J^{\ast }=2.6008\). In the above, we have priorities on minimal time first.
Now we will start on the next topic, which is dynamic programming. This involves discrete decisions. In this course we will cover only discrete dynamic programming and not continuous dynamic programming. We will be making sequential decisions in time. For example, if \(u(k)=\{-1,0,1\}\) then the decision tree will look like
We will get tree with potentially large number of branches. Combinatories arise. We get the curse of dimensionality problem again. For dynamic programming, Bellman is considered the person who originated the subject.
Exam.
For steepest descent problem, with optimal step size, max is \(1\), need to use \(\|u^{k+1}-u^{k}\| \leq 1\).
Next we will have special problem. Expect it next week.
Back to dynamic programming. Bellman secret is simple but theory is powerful. Main things to take from this course are
Suppose we want to take trip from NY to San Francisco, such that the total toll is minimum. We also must go west at each step we take once we start from NY. Not allowed to go east direction, even if the cost might be lower.
This diagram shows the possible routes and the cost (toll) for each segment.
Dynamic programming is now used to find optimal route in the above problem. (i.e. the route with least toll (cost) from NY to SFO). Dynamic programming is based on what if decisions. Instead of starting from NY and trying every possible route, we instead start backwards, and ask, what if we were in PORT, which route would we take?
Clearly the only route PORT to SFO with cost of 5 exist. Then we ask, what if we were in LV, which route would we take? we see it is LV to SFO with cost of 3. Then we ask what if we were in SNY? Then since LV had cost 3, then the route \(\text{SNY}\to \text{LV}\to \text{SFO}\) would be the one to take, with cost of \(2+3=5\). Each time we find the cost from one city to SFO, we label the city with this cost. We keep moving back to the east, doing the same. When we arrive all the way to JFK, then we see that the lowest cost is \[ J^{\ast } = \text{JFK} \to \text{NO} \to \text{PHX} \to \text{LV} \to \text{SFO} \]
The following diagram shows the route above, with the cost of moving from each city to SFO given next to each city name on the diagram.
If we were to solve the above problem using direct evaluation the number of computations is of order (assuming even \(n\) is \(\frac{(n-1)! n!}{\frac{n}{2}! \frac{n}{2}!}\) while with dynamic programming method as explained above, it is \(\frac{n^{2}}{2}+n\). For \(20\) cities, this given \(220\) for dynamic programming compare to over one million computation for the direct approach (trying all the possible routes).
We will only consider discrete dynamic programming. This is an optimization problem. The variables are not continuous. The variable take in discrete fixed values of choices each time. These applications are useful for integrate programming problems. An integer programming problem is much harder than continuous ones with much larger complexity.
When we are given an integer programming problem which is hard, we can try to approximate it to continuous programming problem and solve it more easily that way. For example \[ \min _{x} c^{T} x \quad \text{subject to} \quad Ax=b \] where \(x\) is allowed to be integers, is a hard problem. But if we relax it and allow \(x\) to take any value so that the problem becomes continuous, then it will become much easier to solve using Linear Programming.
There are papers written on when we can approximate integer programming problems as continuous.
We will be making sequential decisions. \(u_{0},u_{1},\dots ,u_{N-1}\) i.e. \(N\) decisions where \(u\in \mathbb{R}^{n}\). If we are given a problem which is not sequential, we can treat it as one for this purpose. For example, the car toll problem above, we formulate it to sequential but we did not have to do this. But the final answer \(J^{\ast }\) should come out the same no matter how it was formulated of course.
There will be states \(x(k),\, k=0,1,\dots ,N\) where \(x(N)\) is the terminal state. The constraints are state dependent. Because it depends on where we are when making the decision. For the car toll problem above, the decision depended on which city we were in. We denote the decision as \(u(k) \in \Omega \) and the state equation is \[ x(k+1) = f(x(k),u(k),k) \] and the objective function is \[ J(u)= \underbrace{ \sum _{k=0}^{N-1} J(x(k),u(k),k) }_{\text{stage cost}} + \underbrace{\psi (x(N))}_{\text{terminal cost}} \]
The terminal cost, is a cost applied once we reach the terminal state.
Now we need to define a subproblem. Notion of subproblem: Will begin at \(k=L\) and will be in state \(x(L)\). The cost when in state \(L\) is therefore \[ \boxed{ J_{L}(u) = \sum _{k=L}^{N-1} J(x(k),u(k),k) + \psi (x(N)) } \] Suppose \(u^{\ast }\) is the optimal decision when we are at state \(x(k)\). Let \(x^{\ast }{k}\) be the optimal trajectory from \(x(k)\). i.e. \(x^{\ast }(k)\) is corresponding state path beginning at given \(x(0)\). Hence \[ x^{\ast }(k+1) = f(x^{\ast }(k),u^{\ast }(k),k), \,\, k=0,1,\dots ,N-1 \]
If the subproblem begins at \(x(L)=x^{\ast }(L)\) i.e. we being subproblem along optimal trajectory of the original problem, then \(u^{\ast }(L),u^{\ast }(L+1),\dots ,u^{\ast }(N-1)\) is optimal for the subproblem.
What all this means, is that if the subproblem is optimal, then its trajectory has to be part of the overall problem optimal trajectory. An optimal subproblem, can not become sub-optimal when viewed as part of the main problem.
No class.
The first part of the lecture was on describing the special problem we have to do. This is described in the special problem HW itself included in HW chapters, under special problem.
Now we go back to dynamic programming. The state equation is\[ x\left ( k+1\right ) =f\left ( x\left ( k\right ) ,u\left ( k\right ) ,k\right ) \qquad k=0,1,\cdots N \quad \text{and} \quad u\left ( k\right ) \in \Omega _{k}\] And the objective function is\[ J=\Psi \left ( x\left ( N\right ) \right ) +\sum _{k=0}^{N-1}J_{k}\left ( x\left ( k\right ) ,u\left ( k\right ) \right ) \] Where \(\Psi \left ( x\left ( N\right ) \right ) \) is the cost of the terminal stage.
A subproblem is defined at intermediate point.
P.O.O. (principle Of optimality): if initial state of a subproblem is on the optimal trajectory of original problem, then the subproblem is also optimal.
Proof by contradiction Let \[ u\left ( k\right ) =\left \{ \begin{array} [c]{c}u^{\ast }\left ( k\right ) \qquad k=0,\cdots ,L-1\\ u_{new}^{\ast }\left ( k\right ) \qquad k=L,\cdots N-1 \end{array} \right . \]
Plug the above in the original problem. We will get a suboptimal solution.
Translating POO. to dynamic programming.
\[ \boxed{ I\left ( x\left ( N-1\right ) ,1\right ) =\min _{u\left ( N-1\right ) \in \Omega _{N-1}}\left \{ \overset{\text{cost of one step}}{\overbrace{J\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) }}+\Psi \left ( x\left ( N\right ) \right ) \right \} } \]
The above is the optimal cost with one step to terminal stage. This is similar to what we did for the routing problem from NY to San Francisco before. The above is when we are standing in Portland and looking for the last step to take to San Francisco.
To minimize the above, we have to express everything in the same state \(x\left ( N-1\right ) \). So we write the above, using the state equation, as
\begin{equation} I\left ( x\left ( N-1\right ) ,1\right ) =\min _{u\left ( N-1\right ) \in \Omega _{N-1}}\left \{ J\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) +\Psi \left ( f\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) \right ) \right \} \tag{1} \end{equation}
Now we find \(u^{\ast }\left ( N-1\right ) \) of the above, using \[ \frac{dI\left ( x\left ( N-1\right ) ,1\right ) }{du\left ( N-1\right ) }=\frac{d}{du\left ( N-1\right ) }J\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) +\Psi \left ( f\left ( x\left ( N-1\right ) ,u\left ( N-1\right ) \right ) \right ) =0 \]
And solve for \(u\left ( N-1\right ) \) using standard calculus. Once we find \(u^{\ast }\left ( N-1\right ) \,\), we plug it back into (1) and obtain
\[ I^{\ast }\left ( x\left ( N-1\right ) ,1\right ) =J\left ( x\left ( N-1\right ) ,u^{\ast }\left ( N-1\right ) \right ) +\Psi \left ( f\left ( x\left ( N-1\right ) ,u^{\ast }\left ( N-1\right ) \right ) \right ) \]
Notice now there is no \(\min _{u\left ( N-1\right ) \in \Omega _{N-1}}\) since we already done this. We now apply POO. again to find cost from stage \(N-2\)
\[ I\left ( x\left ( N-2\right ) ,2\right ) =\min _{u\left ( N-2\right ) \in \Omega _{N-2}}\left \{ J\left ( x\left ( N-2\right ) ,u\left ( N-2\right ) \right ) +I^{\ast }\left ( x\left ( N-1\right ) ,1\right ) \right \} \]
Notice the difference now. For all stages back, beyond \(N-1\), we use the cost found from the ahead stage, which is \(I^{\ast }\left ( x\left ( N-1\right ) ,1\right ) \) in the above case. We now repeat the process, and find optimal \(u^{\ast }\left ( x\left ( N-2\right ) ,2\right ) \). See HW 7, problem 2 for detailed example how to do this. More generally,
\[ I\left ( x\left ( L\right ) ,N-L\right ) =\min _{u\left ( L\right ) \in \Omega _{L}}\left \{ J\left ( x\left ( L\right ) ,u\left ( L\right ) \right ) +I^{\ast }\left ( x\left ( L+1\right ) ,N-L-1\right ) \right \} \]
The above is called the dynamic programming equation.
At stage \(L\) the optimal cost from stage \(L\) with \(N-L\) steps to go is\[ \boxed{ I\left ( x_{L},N-L\right ) =\min _{u\left ( N-L\right ) \in \Omega _{L}}\left \{ J\left ( x_{L},u\left ( L\right ) \right ) +I\left ( x_{L+1},N-\left ( L+1\right ) \right ) \right \} } \] With appropriate initialization.
Some comments: The trickiest part is how to use this equation. Must be careful. Think of \(u\left ( L\right ) \) as feedback. We call the optimal \(u^{\ast }\left ( L\right ) .\)
Warning. There is a constraint on \(u\). Do not use derivative to find optimal without being careful about the limits and constraints. For example, if \(\left \vert u\left ( L\right ) \right \vert \leq 1\) and we have quadratic form. We will now use an example to show how to use these equations. Let\begin{equation} x_{k+1}=x_{k}-u_{k} \tag{1} \end{equation} Where \(u_{k}\) is free to take any value. Let the objective function be\begin{equation} J\left ( x_{k},u_{k}\right ) =\sum _{k=0}^{N-1}\left ( x_{k+1}-u_{k}\right ) ^{2}+u_{k}^{2} \tag{2} \end{equation} Reflecting a simple tracking mechanism. We always start at \(x\left ( N-1\right ) \) with one stage to go. Hence the optimal cost from \(x_{N-1}\) with one step to go is\[ I\left ( x_{N-1},1\right ) =\min _{u\left ( N-1\right ) \in \Omega _{1}}\left \{ J\left ( x_{N-1},u_{N-1}\right ) +\Psi \left ( x_{N}\right ) \right \} \] \(\Psi \left ( x_{N}\right ) \) is the terminal cost. Let us now remove it from the rest of the computation to simplify things. We also replace \(J\) in the above from (2) and obtain\begin{align*} I\left ( x_{N-1},1\right ) & =\min _{u\left ( N-1\right ) }\left \{ \left ( x_{\left ( \left ( N-1\right ) +1\right ) }-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \\ & =\min _{u\left ( N-1\right ) }\left \{ \left ( x_{N}-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \\ & =\min _{u\left ( N-1\right ) }\left \{ \left ( x_{N}-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \end{align*}
We want everything in terms of \(x_{N-1}\). So we use (1) to write \(x_{N}=x_{N-1}-u_{N-1}\) and plug it back in the last equation above to obtain\begin{align} I\left ( x_{N-1},1\right ) & =\min _{u_{N-1}}\left \{ \left ( x_{N-1}-u_{N-1}-u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \nonumber \\ & =\min _{u_{N-1}}\left \{ \left ( x_{N-1}-2u_{N-1}\right ) ^{2}+u_{N-1}^{2}\right \} \tag{3} \end{align}
Only now to take derivative, in order to find \(u_{N-1}^{\ast }\). therefore\begin{align*} \frac{dI\left ( x_{N-1},1\right ) }{u_{N-1}} & =0\\ 2\left ( x_{N-1}-2u_{N-1}\right ) \left ( -2\right ) +2u_{N-1} & =0\\ u_{N-1}^{\ast } & =\frac{2}{5}x_{N-1} \end{align*}
Now that we found the optimal \(u_{N-1}^{\ast }\) we go back to (3) and replace \(u_{N-1}\) in (3) by \(u_{N-1}^{\ast }\). Hence\begin{align*} I\left ( x_{N-1},1\right ) & =\left ( x_{N-1}-2\left ( \frac{2}{5}x_{N-1}\right ) \right ) ^{2}+\left ( \frac{2}{5}x_{N-1}\right ) ^{2}\\ & =\frac{1}{5}x_{N-1}^{2} \end{align*}
Now we backup one step. We need to find\begin{equation} I\left ( x_{N-2},2\right ) =\min _{u_{N-2}}\left \{ J\left ( x_{N-2},u_{N-1}\right ) +I\left ( x_{N-1},1\right ) \right \} \tag{4} \end{equation} Notice that we used \(I\left ( x_{N-1},1\right ) \) in place of what we had before, which was the terminal cost \(\Psi \left ( x\left ( N\right ) \right ) \). Since now we are two steps behind. All the work before was for finding optimal \(I\left ( x_{N-1},1\right ) \). So now (4) becomes\begin{equation} I\left ( x_{N-2},2\right ) =\min _{u_{N-2}}\left \{ J\left ( x_{N-1},u_{N-1}\right ) +\frac{1}{5}x_{N-1}^{2}\right \} \tag{5} \end{equation} But from (2) \[ J\left ( x_{N-1},u_{N-2}\right ) =\left ( x_{N-1}-u_{N-2}\right ) ^{2}+u_{N-2}^{2}\] Hence (5) becomes\[ I\left ( x_{N-2},2\right ) =\min _{u_{N-2}}\left \{ \left ( x_{N-1}-u_{N-2}\right ) ^{2}+u_{N-2}^{2}+\frac{1}{5}x_{N-1}^{2}\right \} \] We need to use the state equation \(x_{k+1}=x_{k}-u_{k}\) to rewrite \(x_{N-1}\) in the above, since we want everything in \(N-2\) terms. Therefore the above becomes\begin{align} I\left ( x_{N-2},2\right ) & =\min _{u_{N-2}}\left \{ \left ( \left ( x_{N-2}-u_{N-2}\right ) -u_{N-2}\right ) ^{2}+u_{N-2}^{2}+\frac{1}{5}\left ( x_{N-2}-u_{N-2}\right ) ^{2}\right \} \nonumber \\ & =\min _{u_{N-2}}\left \{ \frac{26}{5}u_{N-2}^{2}-\frac{22}{5}u_{N-2}x_{N-2}+\frac{6}{5}x_{N-2}^{2}\right \} \tag{6} \end{align}
Now we take derivative, to find \(u_{N-2}^{\ast }\)\begin{align*} \frac{dI\left ( x_{N-2},2\right ) }{u_{N-2}} & =0\\ \frac{52}{5}u_{N-2}-\frac{22}{5}x_{N-2} & =0\\ u_{N-2}^{\ast } & =\frac{11}{26}x_{N-2} \end{align*}
Now that we found the optimal \(u_{N-2}^{\ast }\), we go back to (6) and replace \(u_{N-2}\) there with \(u_{N-2}^{\ast }\)\begin{align*} I\left ( x_{N-2},2\right ) & =\frac{26}{5}\left ( \frac{11}{26}x_{N-2}\right ) ^{2}-\frac{22}{5}\left ( \frac{11}{26}x_{N-2}\right ) x_{N-2}+\frac{6}{5}x_{N-2}^{2}\\ & =\frac{7}{26}x_{N-2}^{2} \end{align*}
Reader Carry out one more stage and obtain \(J^{\ast }=\left ( x\left ( 0\right ) ,3\right ) \)
Answer \begin{align*} I\left ( x_{N-3},3\right ) & =\min _{u_{N-3}}\left \{ J\left ( x_{N-3},u_{N-3}\right ) +I\left ( x_{N-2},2\right ) \right \} \\ & =\min _{u_{N-3}}\left \{ J\left ( x_{N-3},u_{N-3}\right ) +\frac{7}{26}x_{N-2}^{2}\right \} \end{align*}
But from (2) \(J\left ( x_{N-3},u_{N-3}\right ) =\left ( x_{N-2}-u_{N-3}\right ) ^{2}+u_{N-3}^{2}\), hence the above becomes\[ I\left ( x_{N-3},3\right ) =\min _{u_{N-3}}\left \{ \left ( x_{N-2}-u_{N-3}\right ) ^{2}+u_{N-3}^{2}+\frac{7}{26}x_{N-2}^{2}\right \} \] We need to use the state equation \(x_{k+1}=x_{k}-u_{k}\) to rewrite \(x_{N-2}\) in the above, since we want everything in \(N-3\) terms. Therefore the above becomes\begin{align} I\left ( x_{N-3},3\right ) & =\min _{u_{N-3}}\left \{ \left ( \left ( x_{N-3}-u_{N-3}\right ) -u_{N-3}\right ) ^{2}+u_{N-3}^{2}+\frac{7}{26}\left ( x_{N-3}-u_{N-3}\right ) ^{2}\right \} \nonumber \\ & =\min _{u_{N-3}}\left \{ \frac{137}{26}u_{N-3}^{2}-\frac{59}{13}u_{N-3}x_{N-3}+\frac{33}{26}x_{N-3}^{2}\right \} \tag{7} \end{align}
Now we take derivative, to find \(u_{N-3}^{\ast }\)\begin{align*} \frac{dI\left ( x_{N-3},3\right ) }{u_{N-3}} & =0\\ \frac{274}{26}u_{N-3}-\frac{59}{13}x_{N-3} & =0\\ u_{N-3}^{\ast } & =\frac{59}{137}x_{N-3} \end{align*}
Replace this back in (7)\begin{align*} I\left ( x_{N-3},3\right ) & =\frac{137}{26}\left ( \frac{59}{137}x_{N-3}\right ) ^{2}-\frac{59}{13}\left ( \frac{59}{137}x_{N-3}\right ) x_{N-3}+\frac{33}{26}x_{N-3}^{2}\\ & =\frac{40}{137}x_{N-3}^{2} \end{align*}
Let us do one more one, \(N=4.\)\begin{align*} I\left ( x_{N-4},4\right ) & =\min _{u_{N-4}}\left \{ J\left ( x_{N-4},u_{N-4}\right ) +I\left ( x_{N-3},3\right ) \right \} \\ & =\min _{u_{N-4}}\left \{ J\left ( x_{N-4},u_{N-4}\right ) +\frac{40}{137}x_{N-3}^{2}\right \} \end{align*}
But from (2) \(J\left ( x_{N-4},u_{N-4}\right ) =\left ( x_{N-3}-u_{N-4}\right ) ^{2}+u_{N-4}^{2}\), hence the above becomes\[ I\left ( x_{N-4},4\right ) =\min _{u_{N-4}}\left \{ \left ( x_{N-3}-u_{N-4}\right ) ^{2}+u_{N-4}^{2}+\frac{40}{137}x_{N-3}^{2}\right \} \] We need to use the state equation \(x_{k+1}=x_{k}-u_{k}\) to rewrite \(x_{N-3}\) in the above, since we want everything in \(N-4\) terms. Therefore the above becomes\begin{align} I\left ( x_{N-4},4\right ) & =\min _{u_{N-4}}\left \{ \left ( \left ( x_{N-4}-u_{N-4}\right ) -u_{N-4}\right ) ^{2}+u_{N-4}^{2}+\frac{40}{137}\left ( x_{N-4}-u_{N-4}\right ) ^{2}\right \} \nonumber \\ & =\min _{u_{N-4}}\left \{ \frac{725}{137}u_{N-4}^{2}-\frac{628}{137}u_{N-4}x_{N-4}+\frac{177}{137}x_{N-4}^{2}\right \} \tag{8} \end{align}
Now we take derivative, to find \(u_{N-4}^{\ast }\)\begin{align*} \frac{dI\left ( x_{N-4},4\right ) }{u_{N-4}} & =0\\ \frac{1450}{137}u_{N-4}-\frac{628}{137}x_{N-4} & =0\\ u_{N-4}^{\ast } & =\frac{314}{725}x_{N-4} \end{align*}
Therefore (8) becomes\begin{align*} I\left ( x_{N-4},4\right ) & =\frac{725}{137}\left ( \frac{314}{725}x_{N-4}\right ) ^{2}-\frac{628}{137}\left ( \frac{314}{725}x_{N-4}\right ) x_{N-4}+\frac{177}{137}x_{N-4}^{2}\\ & =\frac{217}{725}x_{N-4}^{2} \end{align*}
A table of the summary
\(L\) | \(I\left ( x_{N-L},L\right ) \) |
\(1\) | \(0.2\ x_{N-1}^{2}\) |
\(2\) | \(0.2692\ x_{N-2}^{2}\) |
\(3\) | \(0.29197\ x_{N-3}^{2}\) |
\(4\) | \(0.29931\ x_{N-4}^{2}\) |
So for \(N=4\)
\(L\) | \(I\left ( x_{4-L},L\right ) \) |
\(1\) | \(0.2\ x_{3}^{2}\) |
\(2\) | \(0.2692\ x_{2}^{2}\) |
\(3\) | \(0.29197\ x_{1}^{2}\) |
\(4\) | \(0.29931\ x_{0}^{2}\) |
Using \(x_{k+1}=x_{k}-u_{k}\) to write everything in terms of \(x_{0}\)
\(L\) | \(I\left ( x_{4-L},L\right ) \) |
\(1\) | \(0.2\ \left ( \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) -u_{2}\right ) ^{2}\) |
\(2\) | \(0.2692\ \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) ^{2}\) |
\(3\) | \(0.29197\ x_{1}^{2}\) \(\left ( x_{0}-u_{0}\right ) ^{2}\) |
\(4\) | \(0.29931\ x_{0}^{2}\) |
So total cost is
\[ I=0.2\ \left ( \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) -u_{2}\right ) ^{2}+0.2692\ \left ( \left ( x_{0}-u_{0}\right ) -u_{1}\right ) ^{2}+0.29197\ x_{1}^{2}\left ( x_{0}-u_{0}\right ) ^{2}+0.29931\ x_{0}^{2}\] This example is special case of LQR. What happens in this example above, or more generally when \(N\rightarrow \infty \) ? As \(N\rightarrow \infty \) we will see later that the feedback gains become time invariant. This is called steady state LQR and will we arrive at the Riccati equations.
Next example is allocation problem We will do it in two steps. We can solve this without using D.P. but will use D.P. to illustrate the method. Consider two investments.
Let say with start with fixed amount of money \(k\) dollars. We have constraint: \(b_{r}\) is maximum allowed amount of investment in real estate, \(b_{o}\) is the maximum amount allowed for oil investment. To avoid trivial solution, assume also that \(b_{r}+b_{o}>k\). Let \(u_{r}\) the amount invested in real estate, and let \(u_{o}\) amount invested in oil.
Common sense solution is \(u_{0}^{\ast }=k-b_{o}\) and \(u_{1}^{\ast }=b_{o}\) since investment in oil has higher return. \(u_{1}\) is investment in oil, and \(u_{0}\) is investment in real estate. Let do this using D.P. The objective function is\[ J\left ( u\right ) =2u_{0}+4u_{1}\] And the state equation is (we only have two states \(x_{1},x_{0}\))\[ x_{1}=x_{0}-u_{0}\] Initial state is \(x_{0}=k\) which is the money we have at the start. There are two stages. Hence \(N=2\). We start with\begin{align*} I\left ( x_{N-1},1\right ) & =I\left ( x_{1},1\right ) \\ & =\max _{u_{1}\in \Omega _{1}}\left \{ J\left ( u\right ) \right \} \end{align*}
i.e. we assume we have one stage to go, and that we have made initial investment in real estate and now we are making investment in oil. Hence\[ \Omega _{1}=\left [ 0\cdots \min \left \{ b_{o},x_{1}\right \} \right ] \] Hence \(u_{1}^{\ast }=\min \left \{ b_{o},x_{1}\right \} \) and\[ I\left ( x_{1},1\right ) =4\min \left \{ b_{o},x_{1}\right \} \] Now backup one step.\begin{align*} I\left ( x_{0},2\right ) & =\max _{u_{0}\in \Omega _{0}}\left \{ J\left ( u\right ) +I\left ( x_{1},1\right ) \right \} \\ & =\max _{u_{0}\in \Omega _{0}}\left \{ 2u_{0}+4\min \left \{ b_{o},x_{1}\right \} \right \} \end{align*}
Where \(\Omega _{0}=\left [ 0\cdots b_{o}\right ] \,\). Therefore since \(x_{1}=x_{0}-u_{0}\) the above becomes\[ I\left ( x_{0},2\right ) =\max _{u_{0}\in \Omega _{0}}\left \{ 2u_{0}+4\min \left \{ b_{o},x_{0}-u_{0}\right \} \right \} \] Let \begin{align*} F\left ( u_{0}\right ) & =2u_{0}+4\min \left \{ b_{o},x_{0}-u_{0}\right \} \\ & =2u_{0}+4\min \left \{ b_{o},k-u_{0}\right \} \end{align*}
Find the maximum using graphics method. This gives \(u_{o}^{\ast }=k-b_{o}\) which is the same using the common sense approach.
The following diagram shows the solution of the above using the dynamic graph method.
In the following, some of the terms were re-written with the index being as subscript, as it is easier to see on the screen, Hence instead of \(x\left ( k\right ) \) as was done in the lecture, it becomes \(x_{k}\) and \(u\left ( k\right ) \) becomes \(u_{k}\) and so on.
Now we start with the state equation\[ \boxed{ x_{k+1}=Ax_{k}+Bu_{k} } \] In the above, \(A\) is \(n\times n\) and \(B\) is \(n\times m\) where \(m\) is the number of inputs, and \(u_{k}\) is column vector of size \(m\), and similarly for \(x_{k+1}\) and \(x_{k}\). In continuous time, the above is\[ \boxed{ \dot{x}\left ( t\right ) =Ax\left ( t\right ) +Bu\left ( t\right ) } \] We can also allow time varying \(A,B\) in the above, but in this discussion we assumed these are constant matrices. The goal is to make \(x\left ( k\right ) \) track, or approach some desired \(x^{d}\left ( k\right ) \) with as little effort \(u\left ( k\right ) \) as possible (what about also as fast as possible at same time?). \(u\left ( k\right ) \) is called the control effort. This diagram illustrates this
When \(x^{d}\left ( \infty \right ) =0\), this means we want to bring the system to stable state. We want now to quantify the goal \(J\left ( u\right ) \). So the problem is to bring the state \(x\left ( k\right ) \) to zero using as small effort \(u\) as possible. We write the cost \(J\left ( u\right ) \) as\[ \boxed{ J\left ( u\right ) =\sum _{k=0}^{N-1}x_{k+1}^{T}Qx_{k+1}+u_{k}^{T}Ru_{k} } \] This is just something we have to accept as given. We did not derive this, but it makes sense. Notice for example, when \(x\left ( k\right ) \) is small, then \(J\left ( u\right ) \) is small. But from now on, we just use the above as given. In the above, \(Q\) and \(R\) are called weighting matrices. For example, if \(u\left ( 1\right ) \) is more important than \(u\left ( 2\right ) \), we adjust the values in \(Q\) to reflect different weights we want to assign. In the above, \(Q\) is \(n\times m\) and \(R\) is \(m\times m\). Both \(Q\) and \(R\) are positive definite and symmetric. Now we start using the Bellman dynamic equations.\begin{align} I\left ( x_{N-1},1\right ) & =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ J\left ( x_{N-1},u_{N-1}\right ) ,\Psi \left ( x_{N}\right ) \right \} \nonumber \\ & =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ x_{N}^{T}Qx_{N}+u_{N-1}^{T}Ru_{N-1}\right \} \tag{1} \end{align}
We ignore the terminal cost \(\Psi \left ( x_{N}\right ) \) for now. Be careful with the indices. Notice in the above, after replacing \(J\left ( u\right ) \), that \(x\) has index \(N\) and the \(u\) has the \(N-1\) index on it. This is due to how \(J\left ( u\right ) \) is given to us to use, which has \(x_{k+1}\) in it already. EQ (1) is our starting point. Now we start applying the dynamic equations recursively. First , we replace the state equation in the above and obtain\begin{equation} I\left ( x_{N-1},1\right ) =\min _{u_{N-1}\in \Omega _{N-1}}\left \{ \left ( Ax_{N-1}+Bu_{N-1}\right ) ^{T}Q\left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\right \} \tag{2} \end{equation} Notice that now all indices on \(x\) are \(N-1\), this is because the state equation being \(x_{k+1}=Ax_{k}+Bu_{k}\). Notice that (2) is quadratic in \(u_{N-1}\). This is important. Doing one more simplification on (2) gives (where the leading \(\min _{u_{N-1}\in \Omega _{N-1}}\) is now removed, just to make the equations fit), but it is assumed to be on each equation on what follows
\begin{align*} I\left ( x_{N-1},1\right ) & =\left ( \left ( Ax_{N-1}\right ) ^{T}+\left ( Bu_{N-1}\right ) ^{T}\right ) Q\left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( x_{N-1}^{T}A^{T}+u_{N-1}^{T}B^{T}\right ) Q\left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( x_{N-1}^{T}A^{T}Q+u_{N-1}^{T}B^{T}Q\right ) \left ( Ax_{N-1}+Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( x_{N-1}^{T}A^{T}Q+u_{N-1}^{T}B^{T}Q\right ) \left ( Ax_{N-1}\right ) +\left ( x_{N-1}^{T}A^{T}Q+u_{N-1}^{T}B^{T}Q\right ) \left ( Bu_{N-1}\right ) +u_{N-1}^{T}Ru_{N-1}\\ & =x_{N-1}^{T}A^{T}QAx_{N-1}+u_{N-1}^{T}B^{T}QAx_{N-1}+x_{N-1}^{T}A^{T}QBu_{N-1} + u_{N-1}^{T}B^{T}QBu_{N-1} +u_{N-1}^{T}Ru_{N-1}\\ & =\left ( u_{N-1}^{T}B^{T}QBu_{N-1}+u_{N-1}^{T}Ru_{N-1}\right ) + u_{N-1}^{T}B^{T}QAx_{N-1}+x_{N-1}^{T}A^{T}QBu_{N-1}+x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} + \left ( u_{N-1}^{T}B^{T}QAx_{N-1}+x_{N-1}^{T}A^{T}QBu_{N-1}\right ) +\left ( x_{N-1}^{T} A^{T}QAx_{N-1}\right ) \\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} + \left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( \left ( QBu_{N-1}\right ) ^{T}\left ( x_{N-1}^{T}A^{T}\right ) ^{T}\right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} + \left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( \left ( QBu_{N-1}\right ) ^{T}\left ( Ax_{N-1}\right ) \right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} +\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( \left ( u_{N-1}^{T}\left ( QB\right ) ^{T}\right ) \left ( Ax_{N-1}\right ) \right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} +\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}\left ( B^{T}Q^{T}\right ) \left ( Ax_{N-1}\right ) \right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1} +\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}B^{T}Q^{T}Ax_{N-1}\right ) ^{T}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1} \end{align*}
But \(Q^{T}=Q\) and the above becomes\begin{align*} I\left ( x_{N-1},1\right ) &=u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1}+\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}B^{T}QAx_{N-1}\right ) ^{T}\right ) \\ & \qquad +x_{N-1}^{T}A^{T}QAx_{N-1} \tag{2} \end{align*}
Using \[ \boxed{H=\frac{1}{2}\left ( H+H^{T}\right ) } \] On the middle term \(\left ( u_{N-1}^{T}B^{T}QAx_{N-1}+\left ( u_{N-1}^{T}B^{T}QAx_{N-1}\right ) ^{T}\right ) \), where \(H=u_{N-1}^{T}B^{T}QAx_{N-1}\) reduces (2) to \begin{align*} I\left ( x_{N-1},1\right ) &=\min _{u_{N-1}\in \Omega _{N-1}}\left \{ u_{N-1}^{T}\left ( B^{T}QB+R\right ) u_{N-1}+2\left ( u_{N-1}^{T}B^{T}QAx_{N-1}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\right \} \tag{3} \end{align*}
The above is in the form \(I=a_{1}u^{2}+2a_{2}u+a_{3}\), therefore it is quadratic in \(u_{N-1}\). Taking derivative w.r.t. \(u_{N-1}\) since this is what we are minimizing \(I\) with respect to, we obtain from (3)\begin{align*} \frac{\partial I\left ( x_{N-1},1\right ) }{\partial u_{N-1}} & =0\\ 0 & =2\left ( B^{T}QB+R\right ) u_{N-1}+2\left ( B^{T}QAx_{N-1}\right ) \end{align*}
\(R\) is positive definite, and \(Q\) is positive definite. Solving for \(u_{N-1}\) gives\begin{align} u_{N-1}^{\ast } & =-\left ( B^{T}QB+R\right ) ^{-1}\left ( B^{T}QAx_{N-1}\right ) \nonumber \\ & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QAx_{N-1} \tag{4} \end{align}
Reader \(\left ( B^{T}QB+R\right ) \) is the Hessian. Show it is positive definite matrix. Since the Hessian is P.D., then \(u_{N-1}^{\ast }\) is global min. Eq (4) is linear feedback on state \(\left ( N-1\right ) \). i.e. we write\[ u^{\ast }\left ( N-1\right ) =K\left ( N-1\right ) x\left ( N-1\right ) \] Where \(K\left ( N-1\right ) \) is called the gain matrix which is\[ K\left ( N-1\right ) =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA \] In all the expressions below, we see the term \(\left ( B^{T}QB+R\right ) \) repeating. So to simplify things and make the equations smaller, let \[ \fbox{$\Phi =B^TQB+R$}\] Hence\[ K\left ( N-1\right ) =-\Phi ^{-1}B^{T}QA \] And therefore\begin{align*} u_{N-1}^{\ast } & =\left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\\ & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QAx_{N-1} \end{align*}
Now we go back to (3) and replace the \(u_{N-1}\) in that expression with \(u_{N-1}^{\ast }\) we found in (4). (we remove the \(\min _{u_{N-1}\in \Omega _{N-1}}\) we had in (3), since it is now the minimum)\begin{align*} I^{\ast }\left ( x_{N-1},1\right ) & =u_{N-1}^{\ast T}\Phi u_{N-1}^{\ast }+2\left ( u_{N-1}^{\ast T}B^{T}QAx_{N-1}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =\left ( \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\right ) ^{T}\Phi \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\\ & \qquad +2\left ( \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}\right ) ^{T}\left ( B^{T}QAx_{N-1}\right ) +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =-x_{N-1}^{T}\left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\Phi \left ( -\Phi ^{-1}B^{T}QA\right ) x_{N-1}-2x_{N-1}^{T}\left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\left ( B^{T}QAx_{N-1}\right ) \\ & \qquad +x_{N-1}^{T}A^{T}QAx_{N-1}\\ & =x_{N-1}^{T}\left [ \left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\left ( B^{T}QA\right ) -2\left ( \Phi ^{-1}B^{T}QA\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ \left ( B^{T}QA\right ) ^{T}\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) -2\left ( B^{T}QA\right ) ^{T}\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ \left ( QA\right ) ^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) -2\left ( QA\right ) ^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ A^{T}Q^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) -2A^{T}Q^{T}B\left ( \Phi ^{-1}\right ) ^{T}\left ( B^{T}QA\right ) +A^{T}QA\right ] x_{N-1} \end{align*}
But \(Q=Q^{T}\) and \(\left ( \Phi ^{-1}\right ) ^{T}=\Phi ^{-1}\). Note that \(\Phi \) is the Hessian matrix, and it is positive definite, and assumed symmetric. That is why \(\left ( \Phi ^{-1}\right ) ^{T}=\Phi ^{-1}\). But we did not proof this. It was a reader to show this is positive definite. The above therefore becomes\begin{align*} I^{\ast }\left ( x_{N-1},1\right ) & =x_{N-1}^{T}\left [ A^{T}QB\Phi ^{-1}B^{T}QA-2A^{T}Q^{T}B\Phi ^{-1}B^{T}QA+A^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}\left [ A^{T}Q-A^{T}QB\Phi ^{-1}B^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}A^{T}Q\left [ I-B\Phi ^{-1}B^{T}QA\right ] x_{N-1}\\ & =x_{N-1}^{T}A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QAx_{N-1} \end{align*}
Let\[ M_{N-1}=A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA \] Then\[ I^{\ast }\left ( x_{N-1},1\right ) =x_{N-1}^{T}M_{N-1}x_{N-1}\] Now that we found \(I^{\ast }\left ( x_{N-1},1\right ) \), we go back one more step\begin{align} I\left ( x_{N-2},2\right ) & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ J\left ( x_{N-2},u_{N-2}\right ) +I^{\ast }\left ( x_{N-1},1\right ) \right \} \nonumber \\ & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ x_{N-1}^{T}Qx_{N-1}+u_{N-2}^{T}Ru_{N-2}+I^{\ast }\left ( x_{N-1},1\right ) \right \} \nonumber \\ & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ x_{N-1}^{T}Qx_{N-1}+u_{N-2}^{T}Ru_{N-2}+x_{N-1}^{T}M_{N-1}x_{N-1}\right \} \nonumber \\ & =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ x_{N-1}^{T}\left ( Q+M_{N-1}\right ) x_{N-1}+u_{N-2}^{T}Ru_{N-2}\right \} \tag{6} \end{align}
Think of \(\left ( Q+M_{N-1}\right ) \) as the new \(Q\) matrix at stage \(N-2\). We need to replace everything to be at \(N-2\) stage, using the state equation \(x_{k+1}=Ax_{k}+Bu_{k}\) then\[ x_{N-1}=Ax_{N-2}+Bu_{N-2}\] Hence (6) becomes\[ I\left ( x_{N-2},2\right ) =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ \left ( Ax_{N-2}+Bu_{N-2}\right ) ^{T}\left ( Q+M_{N-1}\right ) \left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\right \} \] As above, remove \(\min _{u_{N-2}\in \Omega _{N-2}}\) in what follows so that equations fit on the page and let \[ \fbox{$Q^\prime =Q+M_N-1$}\] Then\begin{align} I\left ( x_{N-2},2\right ) & =\left ( \left ( Ax_{N-2}\right ) ^{T}+\left ( Bu_{N-2}\right ) ^{T}\right ) Q^{\prime }\left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =\left ( \left ( Ax_{N-2}\right ) ^{T}Q^{\prime }+\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }\right ) \left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =\left ( Ax_{N-2}\right ) ^{T}Q^{\prime }\left ( Ax_{N-2}+Bu_{N-2}\right ) +\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }\left ( Ax_{N-2}+Bu_{N-2}\right ) +u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =\left ( Ax_{N-2}\right ) ^{T}Q^{\prime }Ax_{N-2}+\left ( Ax_{N-2}\right ) ^{T}Q^{\prime }Bu_{N-2}+\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }Ax_{N-2}+\left ( Bu_{N-2}\right ) ^{T}Q^{\prime }Bu_{N-2} +\\ & \qquad u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =x_{N-2}^{T}A^{T}Q^{\prime }Ax_{N-2}+x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}Ru_{N-2}\nonumber \\ & =u_{N-2}^{T}\left ( B^{T}Q^{\prime }B+R\right ) u_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}+x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} \tag{7} \end{align}
But \begin{align*} x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} & =\left ( u_{N-2}^{T}\left ( x_{N-2}^{T}A^{T}Q^{\prime }B\right ) ^{T}\right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( u_{N-2}^{T}\left ( \left ( A^{T}Q^{\prime }B\right ) ^{T}x_{N-2}\right ) \right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( u_{N-2}^{T}\left ( \left ( Q^{\prime }B\right ) ^{T}Ax_{N-2}\right ) \right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( u_{N-2}^{T}\left ( B^{T}\left ( Q^{\prime }\right ) ^{T}Ax_{N-2}\right ) \right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} \end{align*}
But \(Q^{\prime }=\left ( Q^{\prime }\right ) ^{T}\) since symmetric2 then the above becomes\begin{align} x_{N-2}^{T}A^{T}Q^{\prime }Bu_{N-2}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2} & =\left ( u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}+u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\nonumber \\ & =2\left ( u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) \tag{8} \end{align}
Using \(H=\frac{1}{2}\left ( H+H^{T}\right ) \). Replacing (8) into (7) gives\begin{equation} I\left ( x_{N-2},2\right ) =u_{N-2}^{T}\left ( B^{T}Q^{\prime }B+R\right ) u_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}+2\left ( u_{N-2}^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) \tag{9} \end{equation} We now find \(u_{N-2}^{\ast }\)\begin{align*} \frac{\partial I\left ( x_{N-2},2\right ) }{\partial u_{N-2}} & =0\\ 0 & =2u_{N-2}\left ( B^{T}Q^{\prime }B+R\right ) +2B^{T}Q^{\prime }Ax_{N-2} \end{align*}
Hence\[ u_{N-2}^{\ast }=\left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A\right ) x_{N-2}\] Where \(Q^{\prime }=Q+M_{N-1}\). Hence\begin{align*} K\left ( N-2\right ) & =-\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A\\ & =-\left ( B^{T}\left ( Q+M_{N-1}\right ) B+R\right ) ^{-1}B^{T}\left ( Q+M_{N-1}\right ) A \end{align*}
Now we go back to \(I\left ( x_{N-2},2\right ) \) and replace \(u_{N-2}\) with \(u_{N-2}^{\ast }\) we just found. From (9)\begin{align*} I^{\ast }\left ( x_{N-2},2\right ) & =\left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) \left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) +\\ & \qquad x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}+2\left ( \left ( -\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}B^{T}Q^{\prime }Ax_{N-2}\right ) \\ & =\left ( B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}\left ( B^{T}Q^{\prime }B+R\right ) \left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}+\\ & \qquad x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}-2\left ( \left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}B^{T}Q^{\prime }Ax_{N-2}\\ & =\left ( B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}\\ & -2\left ( B^{T}Q^{\prime }Ax_{N-2}\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}\\ & =x_{N-2}^{T}\left ( B^{T}Q^{\prime }A\right ) ^{T}\left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }Ax_{N-2}+x_{N-2}^{T}\left ( A^{T}Q^{\prime }A\right ) x_{N-2}\\ & =x_{N-2}^{T}\left [ \left ( A^{T}Q^{\prime T}B\right ) \left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A+\left ( A^{T}Q^{\prime }A\right ) \right ] x_{N-2} \end{align*}
Reader Argue that \(I^{\ast }\left ( x_{N-2},2\right ) \) looks like\[ I^{\ast }\left ( x_{N-2},2\right ) =x_{N-2}^{T}M_{N-2}x_{N-2}\] Reader Find \(M_{N-2}\)
\begin{align*} M_{N-2} & =\left ( A^{T}Q^{\prime T}B\right ) \left ( B^{T}Q^{\prime }B+R\right ) ^{-1}B^{T}Q^{\prime }A+\left ( A^{T}Q^{\prime }A\right ) \\ & =\left ( A^{T}\left ( Q+M_{N}-1\right ) ^{T}B\right ) \left ( B^{T}\left ( Q+M_{N}-1\right ) B+R\right ) ^{-1}B^{T}\left ( Q+M_{N}-1\right ) A\\ & \qquad +\left (A^{T}\left ( Q+M_{N}-1\right ) A\right ) \end{align*}
But \(M_{N}-1=A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\), hence
\begin{align*} M_{N-2} & =\left ( A^{T}\left ( Q+\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) \right ) ^{T}B\right ) \left ( B^{T}\left ( Q+M_{N}-1\right ) B+R\right ) ^{-1}B^{T}\\ & \left ( Q+\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) \right ) A+\left ( A^{T}\left ( Q+\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) \right ) A\right ) \end{align*}
But \(\Phi =B^{T}QB+R\), hence the above becomes
\begin{align*} M_{N-2} & =\left ( A^{T}\left ( Q+\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) \right ) ^{T}B\right ) \left ( B^{T}\left ( Q+M_{N}-1\right ) B+R\right ) ^{-1}B^{T}\\ & \hspace{0.75in}\left ( Q+\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) \right ) A\\ & \hspace{1in}+\left ( A^{T}\left ( Q+\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) \right ) A\right ) \end{align*}
Summary
\[ \boxed{ \begin{array}{rl} u_{N-i}^{\ast } & =K_{N-i}x_{N-i}\\ I^{\ast }\left ( x_{N-i}\right ) & =x_{N-i}^{T}M_{N-i}x_{N-i} \end{array} }\] Reader Find a formula for \(K\left ( N-3\right ) \).
The expression for \(K_{N-i}\) is\begin{align*} K_{N-1} & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA\\ K_{N-2} & =-\left ( B^{T}\left ( Q+M_{N-1}\right ) B+R\right ) ^{-1}B^{T}\left ( Q+M_{N-1}\right ) A\\ & =-\left ( B^{T}QB+R+B^{T}M_{N-1}B\right ) ^{-1}\left ( B^{T}QA+B^{T}M_{N-1}A\right ) \end{align*}
Where \(M_{N-1}=A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\) and \(\Phi =B^{T}QB+R\), hence \begin{align*} K_{N-2} & =-\left ( B^{T}QB+R+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) B\right ) ^{-1}\\ & \qquad \left ( B^{T}QA+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\Phi ^{-1}B^{T}\right ] QA\right ) A\right ) \\ & =-\left ( B^{T}QB+R+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) B\right ) ^{-1}\\ & \qquad \hspace{0.1in}\hspace{1in}\left ( B^{T}QA+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) A\right ) \end{align*}
The final optimal cost will be\[ J^{\ast }=x_{0}^{T}M_{0}x_{0}\]
Reader Let \(A=\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} ,B=\begin{pmatrix} 0\\ 1 \end{pmatrix} ,Q=\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \) and \(R=1\), the weight on input. For \(N=3\), find \(K\left ( 2\right ) ,K\left ( 1\right ) ,K\left ( 0\right ) \) solving for LQR problem. We will get optimal gain \(K\left ( i\right ) \) and these will not be the same. We do not like time varying gains, as in this case. We like the gain matrix to be constant, as it is easier to manger and more safe to use. If we make \(N\) very large, then gain will become constant. We start from very large \(N\) and go back to zero.
Solution
\(N=3,\)\begin{align*} K\left ( N-1\right ) & =-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA\\ K\left ( 2\right ) & =-\left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} +1\right ) ^{-1}\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} \\ & =-\left ( 2+1\right ) ^{-1}\begin{pmatrix} -1 & 4 \end{pmatrix} \\ & =\frac{-1}{3}\begin{pmatrix} -1 & 4 \end{pmatrix} \\ & =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix} \end{align*}
Hence \[ \fbox{$K\left ( 2\right ) =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix} =\begin{pmatrix} 0.33333 & -1.333\,3 \end{pmatrix} $}\] Will do things from now on using the state equations directly, as it seems easier. Starting again\begin{align*} I\left ( x\left ( 3-1\right ) ,1\right ) & =\min _{u\left ( 2\right ) }\left \{ J\left ( x_{2},u_{2}\right ) \right \} \\ I\left ( x_{2},1\right ) & =\min _{u_{2}}\left \{ x_{3}^{T}Qx_{3}+u_{2}^{T}Ru_{2}\right \} \end{align*}
But \(x_{3}=Ax_{2}+Bu_{2}\) from state equation, hence the above becomes\begin{align*} I\left ( x_{2},1\right ) & =\min _{u_{2}}\left \{ \left ( Ax_{2}+Bu_{2}\right ) ^{T}Q\left ( Ax_{2}+Bu_{2}\right ) +u_{2}^{T}Ru_{2}\right \} \\ & =\min _{u_{2}}\left \{ \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} x_{2}+\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} x_{2}+\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) +u_{2}^{T}\left ( 1\right ) u_{2}\right \} \\ & =\min _{u_{2}}\left \{ \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{2}\end{pmatrix} \right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{2}\end{pmatrix} \right ) +u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ \begin{pmatrix} x_{11}-2x_{21}\\ u_{2}-x_{11}+3x_{21}\end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_{11}-2x_{21}\\ u_{2}-x_{11}+3x_{21}\end{pmatrix} +u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ \left ( x_{11}-2x_{21}\right ) \left ( u_{2}+x_{11}-x_{21}\right ) +\left ( 2u_{2}-x_{11}+4x_{21}\right ) \left ( u_{2}-x_{11}+3x_{21}\right ) +u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ 2u_{2}^{2}-2u_{2}x_{11}+8u_{2}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}+u_{2}^{2}\right \} \\ & =\min _{u_{2}}\left \{ 3u_{2}^{2}-2u_{2}x_{11}+8u_{2}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\right \} \end{align*}
Hence\begin{align*} \frac{\partial I\left ( x_{2},1\right ) }{\partial u_{2}} & =0\\ 0 & =6u_{2}-2x_{11}+8x_{21}\\ 0 & =6u_{2}+\begin{pmatrix} -2 & 8 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ 0 & =6u_{2}+\begin{pmatrix} -2 & 8 \end{pmatrix} x_{1}\\ u_{2}^{\ast } & =-\frac{1}{6}\begin{pmatrix} -2 & 8 \end{pmatrix} x_{1}\\ & =\begin{pmatrix} \frac{2}{6} & \frac{-8}{6}\end{pmatrix} x_{1} \end{align*}
Therefore\[ \fbox{$u_2^\ast =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} $}\] And \[ \fbox{$K\left ( 2\right ) =\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix} =$ $\begin{pmatrix} 0.33333 & -1.333\,3 \end{pmatrix} $}\] Which is the same as above using \(-\left ( B^{T}QB+R\right ) ^{-1}B^{T}QA\).
Now we find \(I^{\ast }\left ( x_{2},1\right ) \) by using \(u_{2}^{\ast }\) found above back in \(I\left ( x_{2},1\right ) \)\begin{align*} I^{\ast }\left ( x_{2},1\right ) & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \\ & \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{2}\right ) +u_{2}^{T}\left ( 1\right ) u_{2}\\ & \\ & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix}\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) ^{T}\\ & \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix}\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) \\ & \\ & +\left ( \begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{1}{3} & \frac{-4}{3}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ & =\begin{pmatrix} x_{11}-2x_{21}\\ \frac{5}{3}x_{21}-\frac{2}{3}x_{11}\end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_{11}-2x_{21}\\ \frac{5}{3}x_{21}-\frac{2}{3}x_{11}\end{pmatrix} \\ & +x_{11}\left ( \frac{1}{9}x_{11}-\frac{4}{9}x_{21}\right ) -x_{21}\left ( \frac{4}{9}x_{11}-\frac{16}{9}x_{21}\right ) \\ & \\ & =\left ( x_{11}-2x_{21}\right ) \left ( \frac{4}{3}x_{11}-\frac{7}{3}x_{21}\right ) +\left ( \frac{1}{3}x_{11}-\frac{4}{3}x_{21}\right ) \left ( \frac{2}{3}x_{11}-\frac{5}{3}x_{21}\right ) \\ & +x_{11}\left ( \frac{1}{9}x_{11}-\frac{4}{9}x_{21}\right ) -x_{21}\left ( \frac{4}{9}x_{11}-\frac{16}{9}x_{21}\right ) \\ & \\ & =\frac{5}{3}x_{11}^{2}-\frac{22}{3}x_{11}x_{21}+\frac{26}{3}x_{21}^{2} \end{align*}
We have to convert this to \(x_{1}^{T}x_{1}\) to be able to use it in the next stage since we need to apply the state equation to it. We see that\[ \frac{5}{3}x_{11}^{2}-\frac{22}{3}x_{11}x_{21}+\frac{26}{3}x_{21}^{2}=\begin{pmatrix} x_{11} & x_{21}\end{pmatrix}\begin{pmatrix} c_{11} & c_{12}\\ c_{12} & c_{22}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \] Solving gives \(\begin{pmatrix} c_{11} & c_{12}\\ c_{12} & c_{22}\end{pmatrix} =\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} \), hence\[ \fbox{$I^\ast \left ( x_2,1\right ) =x_2^T\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} x_2$}\] Now we find \(I\left ( x\left ( N-2\right ) ,2\right ) =I\left ( x\left ( 3-2\right ) ,2\right ) =I\left ( x_{1},2\right ) \)\begin{align*} I\left ( x_{1},2\right ) & =\min _{u_{1}}\left \{ J\left ( x_{1},u_{1}\right ) +I^{\ast }\left ( x_{2},1\right ) \right \} \\ & =\min _{u_{3}}\left \{ x_{2}^{T}Qx_{2}+u_{1}^{T}Ru_{1}+I^{\ast }\left ( x_{2},1\right ) \right \} \end{align*}
But \(x_{2}=Ax_{1}+Bu_{1}\) from state equation, hence the above becomes\begin{align*} I\left ( x_{1},2\right ) & =\min _{u_{3}}\left \{ \left ( Ax_{1}+Bu_{1}\right ) ^{T}Q\left ( Ax_{1}+Bu_{1}\right ) +u_{1}^{T}Ru_{1}+I^{\ast }\left ( x_{2},1\right ) \right \} \\ & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}\right ) +u_{1}^{2}\\ & +x_{2}^{T}\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} x_{2} \end{align*}
Now we have to use the state equations in \(I^{\ast }\left ( x_{2},1\right ) \) to update the last term above, this is important, since everything should be at the same state\begin{align*} I\left ( x_{1},2\right ) & =3u_{1}^{2}-2u_{1}x_{11}+8u_{1}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\\ & +\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}\right ) ^{T}\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}\right ) \\ & \\ & =3u_{1}^{2}-2u_{1}x_{11}+8u_{1}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\\ & \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{1}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{5}{3} & -\frac{22}{6}\\ -\frac{22}{6} & \frac{26}{3}\end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{1}\end{pmatrix} \right ) \\ & \\ & =3u_{1}^{2}-2u_{1}x_{11}+8u_{1}x_{21}+2x_{11}^{2}-10x_{11}x_{21}+14x_{21}^{2}\\ & \frac{26}{3}u_{1}^{2}-\frac{74}{3}u_{1}x_{11}+\frac{200}{3}u_{1}x_{21}+\frac{53}{3}x_{11}^{2}-\frac{286}{3}x_{11}x_{21}+\frac{386}{3}x_{21}^{2}\\ & \\ & =\frac{35}{3}u_{1}^{2}-\frac{80}{3}u_{1}x_{11}+\frac{224}{3}u_{1}x_{21}+\frac{59}{3}x_{11}^{2}-\frac{316}{3}x_{11}x_{21}+\frac{428}{3}x_{21}^{2} \end{align*}
Now we take derivative to find optimal \(u_{1}^{\ast }\)\begin{align*} \frac{\partial I\left ( x_{1},2\right ) }{\partial u_{1}} & =0\\ 0 & =\frac{70}{3}u_{1}-\frac{80}{3}x_{11}+\frac{224}{3}x_{21}\\ u_{1}^{\ast } & =\frac{3}{70}\left ( \frac{80}{3}x_{11}-\frac{224}{3}x_{21}\right ) \\ & =\frac{8}{7}x_{11}-\frac{16}{5}x_{21}\\ & =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \end{align*}
Hence \[ \fbox{$u_1^\ast =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} $}\] And \[ K\left ( 1\right ) =\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix} =\begin{pmatrix} 1.1429 & -3.2 \end{pmatrix} \] Therefore, we find \(I^{\ast }\left ( x_{1},2\right ) \) using \(u_{1}^{\ast }\)\[ I^{\ast }\left ( x_{1},2\right ) =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}^{\ast }\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{1}^{\ast }\right ) +u_{1}^{\ast T}\left ( 1\right ) u_{1}^{\ast }\] But \(u_{1}^{\ast }=\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \), hence the above becomes\begin{align*} I^{\ast }\left ( x_{1},2\right ) & =\begin{pmatrix} x_{11}-2x_{21}\\ \frac{1}{7}x_{11}-\frac{1}{5}x_{21}\end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_{11}-2x_{21}\\ \frac{1}{7}x_{11}-\frac{1}{5}x_{21}\end{pmatrix} +\left ( \begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{8}{7} & -\frac{16}{5}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ & =\left ( x_{11}-2x_{21}\right ) \left ( \frac{15}{7}x_{11}-\frac{21}{5}x_{21}\right ) -\left ( \frac{1}{5}x_{21}-\frac{1}{7}x_{11}\right ) \left ( \frac{9}{7}x_{11}-\frac{12}{5}x_{21}\right ) \\ & +x_{11}\left ( \frac{64}{49}x_{11}-\frac{128}{35}x_{21}\right ) -x_{21}\left ( \frac{128}{35}x_{11}-\frac{256}{25}x_{21}\right ) \\ & \\ & =\frac{178}{49}x_{11}^{2}-\frac{82}{5}x_{11}x_{21}+\frac{478}{25}x_{21}^{2} \end{align*}
We have to write the above as \(x^{T}Cx\) in order to update the state the next stage. As before, we solve for \(C\) from\begin{align*} \frac{178}{49}x_{11}^{2}-\frac{82}{5}x_{11}x_{21}+\frac{478}{25}x_{21}^{2} & =\begin{pmatrix} x_{11} & x_{21}\end{pmatrix}\begin{pmatrix} c_{11} & c_{12}\\ c_{12} & c_{22}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \\ & =c_{11}x_{11}^{2}+2c_{12}x_{11}x_{21}c_{22}x_{21}^{2} \end{align*}
Hence \(c_{11}=\frac{178}{49},c_{22}=\frac{478}{25},c_{12}=-\frac{82}{10}\), therefore\[ I^{\ast }\left ( x_{1},2\right ) =\begin{pmatrix} x_{11} & x_{21}\end{pmatrix}\begin{pmatrix} \frac{178}{49} & \frac{82}{10}\\ \frac{82}{10} & \frac{478}{25}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} \] Now we back one more final step to find \(K\left ( 0\right ) \). We find \(I\left ( x\left ( N-3\right ) ,3\right ) =I\left ( x\left ( 0\right ) ,3\right ) \)\begin{align*} I\left ( x_{0},3\right ) & =\min _{u0}\left \{ J\left ( x_{0},u_{0}\right ) +I^{\ast }\left ( x_{1},2\right ) \right \} \\ & =\min _{u_{0}}\left \{ x_{1}^{T}Qx_{1}+u_{0}^{T}Ru_{0}+I^{\ast }\left ( x_{1},2\right ) \right \} \end{align*}
But \(x_{1}=Ax_{0}+Bu_{0}\) from state equation, hence the above becomes\begin{align*} I\left ( x_{0},3\right ) & =\min _{u_{0}}\left \{ \left ( Ax_{0}+Bu_{0}\right ) ^{T}Q\left ( Ax_{0}+Bu_{0}\right ) +u_{0}^{T}Ru_{0}+I^{\ast }\left ( x_{1},2\right ) \right \} \\ & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{0}\right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ 1 \end{pmatrix} u_{0}\right ) +u_{0}^{2}\\ & +x_{1}^{T}\begin{pmatrix} \frac{178}{49} & \frac{82}{10}\\ \frac{82}{10} & \frac{478}{25}\end{pmatrix} x_{1} \end{align*}
Now we have to use the state equations in \(I^{\ast }\left ( x_{1},2\right ) \) to update the last term above, this is important, since everything should be at the same state\begin{align*} I\left ( x_{0},3\right ) & =\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{0}\end{pmatrix} \right ) ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{0}\end{pmatrix} \right ) +u_{0}^{2}\\ & +\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{0}\end{pmatrix} \right ) ^{T}\begin{pmatrix} \frac{178}{49} & \frac{82}{10}\\ \frac{82}{10} & \frac{478}{25}\end{pmatrix} \left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} +\begin{pmatrix} 0\\ u_{0}\end{pmatrix} \right ) \\ & \\ & =\left ( x_{11}-2x_{21}\right ) \left ( u_{0}+x_{11}-x_{21}\right ) +u_{0}^{2}+\left ( 2u_{0}-x_{11}+4x_{21}\right ) \left ( u_{0}-x_{11}+3x_{21}\right ) \\ & +\left ( x_{11}-2x_{21}\right ) \left ( \frac{41}{5}u_{0}-\frac{1119}{245}x_{11}+\frac{4247}{245}x_{21}\right ) +\\ & \left ( \frac{478}{25}u_{0}-\frac{273}{25}x_{11}+\frac{1024}{25}x_{21}\right ) \left ( u_{0}-x_{11}+3x_{21}\right ) \\ & \\ & =\frac{553}{25}u_{0}^{2}-\frac{596}{25}u_{0}x_{11}+\frac{2248}{25}u_{0}x_{21}+\frac{10\,232}{1225}x_{11}^{2}-\frac{70\,132}{1225}x_{11}x_{21}+\frac{125\,208}{1225}x_{21}^{2} \end{align*}
Now we take derivative to find optimal \(u_{0}^{\ast }\)\begin{align*} \frac{\partial I\left ( x_{0},3\right ) }{\partial u_{0}} & =0\\ 0 & =2\frac{553}{25}u_{0}-\frac{596}{25}x_{11}+\frac{2248}{25}x_{21} \end{align*}
Hence\[ u_{0}^{\ast }=\frac{596}{603}x_{11}-\frac{2248}{603}x_{21}\] Therefore\[ \fbox{$u_0^\ast =\begin{pmatrix} \frac{596}{603} & -\frac{2248}{603}\end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} =\begin{pmatrix} 0.98839 & -3.728 \end{pmatrix}\begin{pmatrix} x_{11}\\ x_{21}\end{pmatrix} $}\] And \[ K\left ( 0\right ) =\begin{pmatrix} 0.988\,39 & -3.728 \end{pmatrix} \] Matlab dlqr gives a slightly different result and the signs are switched. This needs to be looked at it more. Let us verify \(K\left ( 1\right ) \) using the Bellman dynamic equations we derived earlier which is\begin{align} K\left ( 1\right ) & =-\left ( B^{T}QB+R+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) B\right ) ^{-1}\nonumber \\ & \left ( B^{T}QA+B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) A\right ) \tag{10} \end{align}
Let \(\Delta =B^{T}\left ( A^{T}Q\left [ Q^{-1}-B\left ( B^{T}QB+R\right ) ^{-1}B^{T}\right ] QA\right ) ,\) hence\[ K\left ( 1\right ) =-\left ( B^{T}QB+R+\Delta B\right ) ^{-1}\left ( B^{T}QA+\Delta A\right ) \] We already found \(K\left ( 1\right ) =\begin{pmatrix} 1.1429 & -3.2 \end{pmatrix} \) using the direct method. Just need to verify using the dynamic equations found.\begin{align*} \Delta & =\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\left ( \begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} \left [ \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix} ^{-1}-\begin{pmatrix} 0\\ 1 \end{pmatrix} \left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} +1\right ) ^{-1}\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\right ] \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} \right ) \\ & =\begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 1 & -1\\ -1 & 4 \end{pmatrix}\begin{pmatrix} \frac{2}{3} & -\frac{1}{3}\\ -\frac{1}{3} & \frac{1}{3}\end{pmatrix}\begin{pmatrix} 1 & -1\\ -1 & 4 \end{pmatrix} \\ & =\begin{pmatrix} -\frac{11}{3} & \frac{26}{3}\end{pmatrix} \end{align*}
Hence (10) becomes\begin{align*} K\left ( 1\right ) & =-\left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} +1+\begin{pmatrix} -\frac{11}{3} & \frac{26}{3}\end{pmatrix}\begin{pmatrix} 0\\ 1 \end{pmatrix} \right ) ^{-1}\left ( \begin{pmatrix} 0\\ 1 \end{pmatrix} ^{T}\begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} +\begin{pmatrix} -\frac{11}{3} & \frac{26}{3}\end{pmatrix}\begin{pmatrix} 1 & -2\\ -1 & 3 \end{pmatrix} \right ) \\ & =-\begin{pmatrix} -\frac{8}{7} & \frac{16}{5}\end{pmatrix} \\ & =\begin{pmatrix} 1.142\,9 & -3.2 \end{pmatrix} \end{align*}
Which matches the direct method used above. Similar results are obtained using Matlab. Matlab has the signs reversed, this needs to be investigated to find out why.
In Mathematica:
Next we will talk about variations of dynamic programming. Floor and ceiling. We might want to optimize for maximum of some variable. For economy, this could be ceiling of unemployment. We can modify the dynamic equations to handle these problems. Floor and ceiling violate the additive process we used in D.P. but we can still manage to use the dynamic equations with some modifications.
Today’s lecture is on variations of dynamic programming. Many integer programming problem can be cast as D.P. The emphasis will be heuristics more than proofs.
One such variation is when the objective function is in terms of the floor or ceiling of variable. Another variation is steady state (this is when the number of stages becomes very large and goes to infinity).
Floor and ceiling To begin, say that \(x_{k}\) is scalar. We are interested in the floor of \(x_{k}\). This is \(\min _{k}x_{k}\). We can also talk about ceiling. This is \(\max _{k}x\left ( k\right ) \). More formally
\[ x_{k+1}=f\left ( x_{k},u_{k}\right ) \]
Where \(u\left ( k\right ) \in \Omega \). We introduce a monitoring function \(g\left ( x\left ( k\right ) \right ) \). Therefore, the floor problem can be stated as
\[ \max _{u_{k}\in \Omega _{k}}\min _{k=1\cdots N}g\left ( x_{k}\right ) \]
For example, the ceiling problem could be to minimize the maximum of unemployment, stated as
\[ \min _{u_{k}\in \Omega _{k}}\max _{k=1\cdots N}g\left ( x_{k}\right ) \]
Modeling our D.P. analysis Assume we are doing the floor problem now. Then we write
\[ I\left ( x_{N-1},1\right ) =\max _{u_{N-1}\in \Omega _{N-1}}g\left ( x_{N}\right ) \]
And for the ceiling
\[ I\left ( x_{N-1},1\right ) =\min _{u_{N-1}\in \Omega _{N-1}}g\left ( x_{N}\right ) \]
From now on, we continue with the ceiling problem. For the next stage we obtain
\[ I\left ( x_{N-2},2\right ) =\min _{u_{N-2}\in \Omega _{N-2}}\left \{ \max \left \{ g\left ( x_{N}\right ) ,I^{\ast }\left ( x_{N-1},1\right ) \right \} \right \} \]
And the recursion formula becomes
\[ I\left ( x_{L},N-L\right ) =\min _{u_{L}\in \Omega _{L}}\left \{ \max \left \{ g\left ( x_{L+1}\right ) ,I^{\ast }\left ( x_{L+1},N-L-1\right ) \right \} \right \} \]
When applying the above, we see one difference from previous examples of D.P., now we have \(\Omega _{L}\) which might depends on \(u_{L-1},u_{L-2},\cdots \). Now we will go over one example. A simplified economy model example. Let \(N\) be the years of horizon planning. Let \(y_{k}\) be the national income for the \(k\) year. Let \(c_{k}\) be the consumer expenditure. Let \(I_{k}\) be the business investment. And let \(u_{k}\) be the government expenditure. The constraint is
\[ \sum _{k=0}^{N-1}u_{k}\leq U \]
Where \(U\) is the budget given and \(u_{k}\geq 0\). Hence in a given year we have
\begin{align} y_{k} & =c_{k}+I_{k}+u_{k-1}\tag{1}\\ c_{k} & =\alpha y_{k-1}\tag{2} \end{align}
Where \(\alpha \) is propensity factor to consume. \begin{equation} I_{k}=\beta \left ( c_{k}-c_{k-1}\right ) \tag{3} \end{equation} Reader Eliminate \(c_{k},I_{k}\) from (1,2,3) to obtain\begin{equation} y_{k}=\left ( 1+\beta \right ) \alpha y_{k-1}-\alpha \beta y_{k-2}+u_{k-1}\tag{4} \end{equation} Our goal is to control the output \(y_{k}\) using the input \(u_{k}\).
Reader Let \(z_{k}=y_{k}-y_{k-1}\)
Therefore, now \[ y_{k-2}=y_{k-1}-z_{k-1}\] Substituting the above in (4) gives\begin{align*} y_{k} & =\left ( 1+\beta \right ) \alpha y_{k-1}-\alpha \beta \left ( y_{k-1}-z_{k-1}\right ) +u_{k-1}\\ & =\alpha y_{k-1}+\alpha \beta z_{k-1}+u_{k-1} \end{align*}
We have state equation\[ y_{k+1}=\alpha y_{k}+\alpha \beta z_{k}+u_{k}\] Next we want state equation for \(z_{k}\)
Reader\[ z_{k+1}=\left ( \alpha -1\right ) y_{k}+\alpha \beta z_{k}+u_{k}\] Now define state \(x_{1}\left ( k\right ) =y_{k}\) and \(x_{2}\left ( k\right ) =z_{k}\), hence the state equation in matrix form becomes\[ x_{k+1}=\begin{pmatrix} \alpha & \alpha \beta \\ \alpha -1 & \alpha \beta \end{pmatrix} x_{k}+\begin{pmatrix} 1\\ 1 \end{pmatrix} u_{k}\] We want to maximize the floor of the national income \(y_{k}\) using D.P. To illustrate two stages, i.e. \(N=2,\) let \(U=1\) dollar, and let \(\alpha =\beta =\frac{1}{2}\) Hence\[\begin{pmatrix} x_{1}\left ( k+1\right ) \\ x_{2}\left ( k+1\right ) \end{pmatrix} =\begin{pmatrix} \frac{1}{2} & \frac{1}{4}\\ \frac{-1}{2} & \frac{1}{4}\end{pmatrix}\begin{pmatrix} x_{1}\left ( k\right ) \\ x_{2}\left ( k\right ) \end{pmatrix} +\begin{pmatrix} 1\\ 1 \end{pmatrix} u_{k}\] Monitor function is \(g\left ( x\left ( k\right ) \right ) =x_{1}\left ( k\right ) \)
Begin with one stage to go
\begin{equation} I\left ( x\left ( 1\right ) ,1\right ) =\max _{u\left ( 1\right ) \in \Omega _{1}}x_{1}\left ( 2\right ) \tag{5} \end{equation}
We know \(u\left ( 1\right ) \leq 1-u\left ( 0\right ) \)
Hence (5) becomes
\[ I\left ( x\left ( 1\right ) ,1\right ) =\max _{u\left ( 1\right ) \in \left [ 0\cdots 1-u\left ( 0\right ) \right ] }\left ( \frac{1}{2}x_{1}\left ( 1\right ) +\frac{1}{4}x_{2}\left ( 1\right ) +u\left ( 1\right ) \right ) \]
The minimizer is \(u^{\ast }\left ( 1\right ) =1-u\left ( 0\right ) \), therefore
\[ I^{\ast }\left ( x\left ( 1\right ) ,1\right ) =\frac{1}{2}x_{1}\left ( 1\right ) +\frac{1}{4}x_{2}\left ( 1\right ) +1-u\left ( 0\right ) \]
Now we go back one more step
\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ g\left ( x\left ( 1\right ) ,I\left ( x\left ( 1\right ) ,1\right ) \right ) \right \} \]
Where \(\Omega _{0}=\left [ 0\ldots 1\right ] \) since we have one dollar to start with. Hence
\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \left [ 0\ldots 1\right ] }\min \left \{ x_{1}\left ( 1\right ) ,\frac{1}{2}x_{1}\left ( 1\right ) +\frac{1}{4}x_{2}\left ( 1\right ) +1-u\left ( 0\right ) \right \} \]
Back everything to get an equation in \(u\left ( 0\right ) \)\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \left [ 0\ldots 1\right ] }\min \left \{ \frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) ,\frac{1}{2}\left ( \frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) \right ) +\frac{1}{4}\left ( -\frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) \right ) 1-u\left ( 0\right ) \right \} \]
This reduces to
\[ I\left ( x\left ( 0\right ) ,2\right ) =\max _{u\left ( 0\right ) \in \left [ 0\ldots 1\right ] }\min \left \{ A+u\left ( 0\right ) ,B\right \} \]
Where \begin{align*} A & =\frac{1}{2}x_{1}\left ( 0\right ) +\frac{1}{4}x_{2}\left ( 0\right ) +u\left ( 0\right ) \\ B & =\frac{1}{8}x\left ( 0\right ) +\frac{3}{16}x_{2}\left ( 0\right ) +1-\frac{1}{4}u\left ( 0\right ) \end{align*}
Hence the function to minimize is \[ F\left ( u\left ( 0\right ) \right ) =\min \left \{ A+u\left ( 0\right ) ,B\right \} \]
Consider case when \(A\geq B\) and case \(A<B\).
Reader work out the different cases.
We will start today with one more dynamic problem which will be useful for HW problem. Then we will start on steady state. Example is a floor problem. \begin{align*} x_{1}\left ( k+1\right ) & =\min \left \{ x_{1}\left ( k\right ) ,x_{2}\left ( k\right ) \right \} +u\left ( k\right ) \\ x_{2}\left ( k+1\right ) & =x_{1}\left ( k\right ) u\left ( k\right ) \end{align*}
And initial state is \begin{align*} x_{1}\left ( 0\right ) & =1\\ x_{2}\left ( 0\right ) & =-1 \end{align*}
And \[ J=\min _{k=1,2}x_{2}\left ( k\right ) \] With \(\left \vert u\left ( k\right ) \right \vert \leq M\). One step to go is\begin{align*} I\left ( x\left ( 1\right ) ,1\right ) & =\max _{u\left ( 1\right ) \in \Omega _{1}}\left \{ \min J\right \} \\ & =\max _{u\left ( 1\right ) \in \Omega _{1}}x_{2}\left ( 2\right ) \\ & =\max _{\left \vert u\left ( 1\right ) \right \vert \leq M}x_{1}\left ( 1\right ) u\left ( 1\right ) \end{align*}
Hence \(u^{\ast }=M\) sign\(\left ( x_{1}\left ( x\right ) \right ) \), therefore\[ I^{\ast }\left ( x\left ( 1\right ) ,1\right ) =M\text{ }abs\left ( x_{1}\left ( 1\right ) \right ) \] With two steps\begin{align*} I\left ( x\left ( 0\right ) ,2\right ) & =\max _{u\left ( 0\right ) \in \Omega _{0}}\left \{ \min \left \{ J\left ( x\left ( 1\right ) ,I\left ( x\left ( 1\right ) ,1\right ) \right ) \right \} \right \} \\ & =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ x_{2}\left ( 1\right ) ,M\left \vert x_{1}\left ( 1\right ) \right \vert \right \} \\ & =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ x_{1}\left ( 0\right ) u\left ( 0\right ) ,M\left \vert \min \left \{ x_{1}\left ( 0\right ) ,x_{2}\left ( 0\right ) \right \} +u\left ( 0\right ) \right \vert \right \} \\ & =\max _{u\left ( 0\right ) \in \Omega _{0}}\min \left \{ u\left ( 0\right ) ,M\left \vert u\left ( 0\right ) -1\right \vert \right \} \end{align*}
Where \(F\left ( u\right ) =\min \left \{ u\left ( 0\right ) ,M\left \vert u\left ( 0\right ) -1\right \vert \right \} \). Consider the case \(0<M\leq \sqrt{2}\) and case \(M>\sqrt{2}\). See key solution for HW 7 for the solution.
We now start talking about steady state. Notion of functional equations., then iterative solution to steady state problem, then Riccati equation. We begin with\[ x_{k+1}=f\left ( x_{k},u_{k}\right ) \] With a constraint on \(u_{k}\) given by \(u_{k}\in \Omega _{k}\). The branch cost is \[ J=\Psi \left ( x_{N}\right ) +\sum _{k=0}^{N-1}J_{k}\left ( x_{k},u_{k}\right ) \] Then let \(N\rightarrow \infty \). We are hoping to get \(J_{N}\) and obtain \(u_{N}^{\ast }=u_{0}^{\ast },u_{1}^{\ast },\cdots ,u_{N-1}^{\ast }\) optimal controls at each stage. Notice that when \(N\) changes, then the whole sequence \(u_{N}^{\ast }\) changes also, and not just one term. We now ask, does \(J_{N}\) converge? does \(u_{N}^{\ast }\) converge? To make sense of the above, we remove the terminal cost \(\Psi \left ( x_{N}\right ) \) from this analysis. We also want \(\Omega _{k}\) to be fixed for any \(N\). This means we have the same constraints all the time.
If steady state solution exist, then it satisfies\begin{equation} I\left ( x\right ) =\min _{u\in \Omega }\left \{ J\left ( x,u\right ) +I\left ( f\left ( x,u\right ) \right ) \right \} \tag{1} \end{equation} This is a functional equation. Solving it means finding \(u^{\ast }\). The solution \(u^{\ast }\) in (1) is a function of \(x\). i.e.. \(u\) at state \(x\) is a feedback. Say \(u^{\ast }=\sigma \left ( x\right ) \). Substitute this in (1) gives\[ I\left ( x\right ) =\min _{u\in \Omega }\left \{ J\left ( x,\sigma \left ( x\right ) \right ) +I\left ( f\left ( x,\sigma \left ( x\right ) \right ) \right ) \right \} \] We do not know \(I\left ( x\right ) \) here. Before, we knew \(I\), but now we do not know \(I\). So we have a function space issue to find \(I\left ( x\right ) \).
Example \begin{align*} J\left ( x,u\right ) & =x^{2}+xu+u^{2}\\ f\left ( x,u\right ) & =xu+x+u\\ x\left ( k+1\right ) & =x\left ( k\right ) u\left ( k\right ) +x\left ( k\right ) +u\left ( k\right ) \end{align*}
Let \(u\) be free variable with no constraints. Hence (1) becomes\[ I\left ( x\right ) =\min _{u}\left \{ x^{2}+xu+u^{2}+I\left ( xu+x+u\right ) \right \} \]
This is a functional equation since \(I\left ( x\right ) \), appears on both sides of the equation. There are two famous methods to solve functional equations. The first is the iterative method. Begin with initial \(I_{o}\left ( x\right ) \), then \(I_{k+1}\left ( x\right ) =\min _{u\left ( k\right ) }\left \{ J\left ( x,u\right ) +I_{k}\left ( f\left ( x,u\right ) \right ) \right \} \). We get sequence of solutions of \(u\left ( k\right ) \) and \(I\left ( x\left ( k\right ) \right ) \) and check for convergence.
Example \begin{align*} J\left ( x,u\right ) & =u^{2}+\left ( x-u\right ) ^{2}\\ x\left ( k+1\right ) & =x\left ( k\right ) -u\left ( k\right ) \end{align*}
\(I_{0}\left ( x\right ) =0\), hence\[ I_{1}\left ( x\right ) =\min _{u}\left \{ u^{2}+\left ( x-u\right ) ^{2}\right \} \] \(\frac{d}{du}=0\) gives \(u^{\ast }=\frac{x}{2}\), hence \(I_{1}^{\ast }\left ( x\right ) =\frac{x^{2}}{4}+\frac{x^{2}}{4}=\frac{x^{2}}{2}\). Next stage becomes\begin{align*} I_{2}\left ( x\right ) & =\min _{u}\left \{ u^{2}+\left ( x-u\right ) ^{2}+I_{1}^{\ast }\right \} \\ & =\min _{u}\left \{ u^{2}+\left ( x-u\right ) ^{2}+\frac{\left ( x-u\right ) ^{2}}{2}\right \} \end{align*}
\(\frac{d}{du}=0\) gives \(u^{\ast }=\frac{3}{5}x\), hence \(I_{1}^{\ast }\left ( x\right ) =\frac{3}{5}x^{2}\).
Reader Continue this process. Does it converge? It will converge eventually leading to \(u^{\ast }\rightarrow kx\)
The special problem will be returned next Tuesday.
The plan for today: we have been talking about steady state. This lead to functional equations in D.P. so far, we talked about iterative solution. Today we will talk about closed form solution. Analogy between differential equations and functional equations. In differential equations, the iterative method is called Picard iterations method.
For linear state equations, we can get closed form solution for the functional equation. We start with a guess of the solution with a parameter to find. Now we will use our main example to illustrate this.\begin{align*} x_{k+1} & =x_{k}-u_{k}\\ J & =\sum u_{k}^{2}+\left ( x_{k+1}-u_{k}\right ) ^{2} \end{align*}
Notice the cross term with \(x\) and \(u\) in it. Now we guess a form for \(I\). Let \(I=px^{2}\) and then we try to find \(p\). First write \(J\) above so that all indices are the same with the help of the state equation. This will reduce the chance of error later on\begin{align*} J & =\sum u_{k}^{2}+\left ( \left ( x_{k}-u_{k}\right ) -u_{k}\right ) ^{2}\\ & =\sum u_{k}^{2}+x_{k}^{2}+4u_{k}^{2}-4x_{k}u_{k}\\ & =\sum x_{k}^{2}+5u_{k}^{2}-4x_{k}u_{k} \end{align*}
Consider\begin{align} I\left ( x\right ) & =\min _{u\in \Omega }\left \{ J\left ( x,u\right ) +I\left ( f\left ( x,u\right ) \right ) \right \} \nonumber \\ px^{2} & =\min _{u\in \Omega }\left \{ \left ( 5u^{2}+x^{2}-4xu\right ) +p\left ( x-u\right ) ^{2}\right \} \tag{1} \end{align}
\begin{align*} \frac{d\left ( 5u^{2}+x^{2}-4xu\right ) +p\left ( x-u\right ) ^{2}}{du} & =0\\ 0 & =2u-4\left ( x-2u\right ) -2p\left ( x-u\right ) \end{align*}
Solving gives \(u^{\ast }=\frac{2+p}{5+p}x\). Substitute back in (1)\[ px^{2}=\left ( 5u^{2}+x^{2}-4x\left ( \frac{2+p}{5+p}x\right ) \right ) +p\left ( x-\left ( \frac{2+p}{5+p}x\right ) \right ) ^{2}\] And obtain an equation in \(p\) and solve for \(p\). We find roots are \(p=0.302\) and \(p=-3.3\). If everything was done correct, there should be a positive root. Always pick the positive one. This is special case of LQR. In LQR there is no cross term between \(x,u\). While in the above there was. Reader For \(x\left ( 0\right ) =1\) find \(J^{\ast }\).
Example Consider \[ x\left ( k+1\right ) =x\left ( k\right ) +2u\left ( k\right ) \] With constraint \(u\left ( k\right ) \in \left [ -1,1\right ] \)\[ J=\sum _{k=0}^{\infty }e^{x\left ( k+1\right ) }\] Guess \(I=ae^{x}\) then\[ I\left ( x\right ) =\min _{u\in \left [ -1,1\right ] }\left ( e^{x+2u}+ae^{x+2u}\right ) \]
Reader \(u^{\ast }=-1\)
Therefore\[ ae^{x}=\left ( e^{x-2}+ae^{x-2}\right ) \] Solving gives \[ a=\frac{1}{e^{2}-1}>0 \] For LQR, the steady state is given by\begin{align*} x\left ( k+1\right ) & =Ax\left ( k\right ) +Bu\left ( k\right ) \\ J & =\sum _{k=0}^{\infty }x^{T}\left ( k+1\right ) Qx\left ( k+1\right ) +u^{T}\left ( k\right ) Ru\left ( k\right ) \end{align*}
Where \(Q,R\) are weight matrices and are positive definite symmetric. \(I\) should be quadratic in the state \(x\left ( k\right ) \). \[ I\left ( x\right ) =\min _{u}x^{T}Qx+u^{T}Ru \] Guess \(I=x^{T}Px\) and now solve for \(P\), this leads to Riccati matrix equation.\begin{equation} I\left ( x\right ) =\min _{u}\left \{ x^{T}Qx+u^{T}Ru+\left ( Ax+Bu\right ) ^{T}P\left ( Ax+Bu\right ) \right \} \tag{2} \end{equation} Taking gradient w.r.t. \(u\) and setting to zero, gives\begin{align*} 2Ru+2B^{T}PBu+2B^{T}PAx & =0\\ u^{\ast } & =-\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx \end{align*}
Back to (2) we find\begin{align*} x^{T}Px & =x^{T}Qx+\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) ^{T}R\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) \\ & +\left ( Ax+B\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) \right ) ^{T}P\left ( Ax+B\left ( -\left ( R+B^{T}PB\right ) ^{-1}B^{T}PAx\right ) \right ) \end{align*}
Solving to \(P\), we obtain the Riccati matrix equation\[ \boxed{ P=A^{T}PA-A^{T}PB\left ( R+B^{T}PB\right ) ^{-1}B^{T}PA+Q } \] In Matlab, use dare() to solve this for \(P\).
Remarks Are we sure the solution exist? i.e. does there exist positive definite \(P\) that satisfies the Riccati equation above? Yes, a solution exist if \(\left ( A,B\right ) \) is controllable system. Notice that the Riccati equation is not linear in \(P\). This is solved numerically. Consider the special case of LQR with one state and one input. Hence everything is scalar. We obtain\[ p=a^{2}p-apb\frac{1}{r+b^{2}p}bpa+q \] Solving for \(p\), show that there is solution \(p>0\). Assume \(b>0\) for controllability.
This is the last lecture. Final exam is next lecture. Review of special problem and results obtained by different reports. General approaches to solving the special problem included: Cluster analysis, noisy gradient and random search.
We talked about steady state. Quadratic regulator has no cross coupling terms between \(x\) and \(u\)\[ J=\sum _{k=0}^{\infty }x^{T}\left ( x+1\right ) Qx\left ( k+1\right ) +u^{T}Ru \] For general regulator, one can get a cross term as in \(\ J=ax^{2}+bxu+cu\) but we did not discuss this.
Test 3, will have 4 questions on dynamic programming. With dynamic programming, one can solve the problem using the Bellman equations or using the graphical method. If there are finite stages, and the state \(x\) is discrete \(x\left ( k+1\right ) =f\left ( x\left ( k\right ) ,u\left ( k\right ) \right ) \) and if \(u\) is discrete, then a graphical method can be used. If there are constraints, this will reduce the size of the tree more.
Course review
We can take an integer linear programming problem, which is hard to solve and treat it as continuous problem under special conditions and solve it much easier. We did not discuss non-linear programming and kuhn-Tucker conditions. But for many non-linear programming problem, it is possible to use the penalty method. There is also large scale linear programming, where sparsity becomes important. Also parallel programming become important for these problems. For dynamic programming, most of the books are on continuous time, and very few discusses discrete dynamic programming problems.
Final exam l.6546 — TeX4ht warning — “SaveEverypar’s: 3 at “begindocument and 2 “enddocument —