Matlab was used to generate the contour plots. The plots generated are given below and the source code used is listed in the appendix.
Matlab 2015a was used to implement the steepest descent algorithm; the listing is given in the appendix. The following is an outline of the general algorithm expressed as pseudocode.
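As a rough illustration of this outline (not the appendix listing), a minimal Matlab sketch of the fixed-step version is shown below. The function handles \texttt{J} and \texttt{gradJ}, the step size \texttt{h}, and the tolerance \texttt{tol} are illustrative names, not the ones used in the actual code.
\begin{verbatim}
function u = steepest_descent_fixed(J, gradJ, u0, h, tol, max_iter)
    % Minimal sketch: steepest descent with a fixed step size h.
    % J and gradJ are function handles for the objective and its gradient.
    u = u0;
    for k = 1:max_iter
        g = gradJ(u);            % gradient at the current point
        if norm(g) <= tol        % stop when the gradient norm is small
            break
        end
        u = u - h*g;             % move along the negative gradient
    end
end
\end{verbatim}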
In all of the following results, convergence to the optimal point was determined as follows. First, a maximum number of iterations was used to guard against slow convergence or other unforeseen cases. This is a safety measure that is recommended in any iterative method. The limit was set very high, at one million iterations; if convergence was not reached by this count, the iterations stop.
The second check was the main criterion: the norm of the gradient \(\| \nabla J(u) \| \) must fall below a minimum value. Since \(\| \nabla J(u) \| \) is zero at the optimal point, this is the most common check used to stop the iterations. The norm was calculated after each step, and when \(\| \nabla J(u) \| \) became smaller than \(0.01\) the search stopped. The value \(0.01\) was selected arbitrarily. All cases below used the same convergence criterion.
A third check was added to verify that the value of the objective function \(J(u)\) did not increase after each step. If \(J(u)\) increased, the search stopped, since this indicates that the step size is too large and oscillation has started. This condition occurred many times when using a fixed step size, but it did not occur with the optimal step size.
The relevant Matlab code used to implement these convergence criteria is part of the listing in the appendix.
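For illustration only, a minimal sketch of the three checks, with hypothetical variable names, could look like this:
\begin{verbatim}
function done = has_converged(k, max_iter, g, grad_tol, J_new, J_old)
    % Sketch of the three stopping checks described above.
    done = false;
    if k > max_iter            % safety limit, set very high (10^6)
        done = true;
    elseif norm(g) < grad_tol  % main check: gradient norm below 0.01
        done = true;
    elseif J_new > J_old       % objective increased: step size too large,
        done = true;           % oscillation has started
    end
end
\end{verbatim}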
The results of these runs are given below in table form. For each starting point, the search path \(u^k\) is plotted on top of the contour plot. An animation of each is also available when running the code. The path \(u^k\) shows that the search direction is along the negative gradient, which is perpendicular to the tangent of the contour line at each level.
When trying different starting points, all with \(u_{1}>1,u_{2}>0\), the search did converge to \(u^{\ast }=[7,-2]\), but the outcome depended on where the search started. When starting close to \(u^{\ast }\), for example from \(u^0=[6.5,1]\), the search converged using a fixed step size of \(h=0.01\) with no oscillation observed. The result is shown below.
However, when starting from a point too far away from \((7,-2)\), for example \([20,20]\), the search did not converge to \((7,-2)\) but instead converged to the second local minimum at \(u^{\ast }=[13,4]\), as seen below.
If the starting point is closer to one local minimum than to the other, the search converges to the nearest local minimum.
In this problem there are two local minima, one at \((7,-2)\) and the other at \((4,13)\). The location of the starting point \(u^0\) determines which \(u^{\ast }\) the algorithm converges to.
The steepest descent algorithm used in the first problem was modified to support an optimal step size. The following is the updated general algorithm expressed as pseudocode. The line search used to find the optimal step was the standard golden section method (listing added to the appendix).
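A minimal sketch of the modified loop is shown below. It assumes a golden section routine \texttt{golden\_section(f, a, b, tol)} that minimizes a one-dimensional function over \([a,b]\) (a sketch of that routine is given later where the line search is discussed); the bracket \texttt{h\_max} and all names are illustrative, not those of the appendix listing.
\begin{verbatim}
function u = steepest_descent_optimal(J, gradJ, u0, h_max, tol, max_iter)
    % Sketch: steepest descent where the step size is chosen at each
    % iteration by a line search along the descent direction.
    u = u0;
    for k = 1:max_iter
        g = gradJ(u);
        if norm(g) <= tol
            break
        end
        d   = -g;                              % descent direction
        phi = @(h) J(u + h*d);                 % objective along the ray
        h   = golden_section(phi, 0, h_max, sqrt(eps('double')));
        u   = u + h*d;                         % take the optimal step
    end
end
\end{verbatim}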
For \(n=2\), the Rosenbrock function is \begin{align*} J(u) & = 100 \left ( u_2-u_{1}^{2}\right )^{2} + \left (1-u_{1}\right ) ^{2}\\ \nabla J\left ( u\right ) & =\begin{bmatrix} \frac{\partial J}{\partial u_{1}}\\ \frac{\partial J}{\partial u_{2}}\end{bmatrix} =\begin{bmatrix} -400\left ( u_{2}-u_{1}^{2}\right ) u_{1}-2\left ( 1-u_{1}\right ) \\ 200\left ( u_{2}-u_{1}^{2}\right ) \end{bmatrix} \end{align*}
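For illustration, this function and its gradient can be written directly as Matlab anonymous functions; the following is a sketch, not the appendix listing, and assumes \(u\) is a \(2\times 1\) vector.
\begin{verbatim}
% Rosenbrock function and its gradient for n = 2 (u is a 2x1 vector)
J     = @(u) 100*(u(2) - u(1)^2)^2 + (1 - u(1))^2;
gradJ = @(u) [ -400*(u(2) - u(1)^2)*u(1) - 2*(1 - u(1));
                200*(u(2) - u(1)^2) ];
\end{verbatim}
Handles of this form can be passed directly to a descent routine such as the sketches above.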
Over the range \(-2.5\leq u_{i} \leq 2.5\), the steepest descent algorithm was run on the above function. The following contour plots show the level sets, with repeated zooming around \((1,1)\), the location of the optimal point \(u^{\ast }=(1,1)\) where \(J^{\ast }=0\).
In all of the results below, where the fixed step is compared to the optimal step, the convergence criterion was the same: stop the search when \[ \| \nabla J(u) \| \leq 0.001 \] The search was started from different locations. The first observation was that when using the optimal step, the search jumps very quickly into the narrow valley of the function, moving towards \(u^{\ast }\) in only one or two steps. After getting close to the optimal point, the search moved very slowly towards \(u^{\ast }\) inside the valley because the optimal step size became smaller and smaller.
The closer the search got to \(u^{\ast }\), the smaller the step size became, and convergence was very slow at the end. The plot below shows the optimal step size used at each iteration. Zooming in shows the zigzag pattern, which is more apparent when using a small fixed step size. Below is an example using a fixed step size of \(h=0.01\), showing the pattern inside the valley of the function.
Here is a partial list of the level set values from one run using the optimal step, starting from an arbitrary point. It shows that in one step \(J(u)\) went down from \(170\) to \(0.00038\), but after that the search became very slow: the optimal step size kept shrinking and the rate of reduction of \(J(u)\) decreased.
A golden section line search was implemented with a tolerance of \(\sqrt{\varepsilon }\approx 1.5\times 10^{-8}\), where \(\varepsilon \) is Matlab's double precision machine epsilon (\texttt{eps}), and used for finding the optimal step size.
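The actual implementation is in the appendix; a minimal sketch of a golden section line search of this kind is the following, with illustrative names:
\begin{verbatim}
function x = golden_section(f, a, b, tol)
    % Sketch of a golden section line search: minimizes f over [a, b]
    % by shrinking the bracket until its width is below tol.
    r = (sqrt(5) - 1)/2;          % inverse golden ratio, about 0.618
    c = b - r*(b - a);
    d = a + r*(b - a);
    while (b - a) > tol
        if f(c) < f(d)            % minimum lies in [a, d]
            b = d;
        else                      % minimum lies in [c, b]
            a = c;
        end
        c = b - r*(b - a);
        d = a + r*(b - a);
    end
    x = (a + b)/2;                % midpoint of the final bracket
end
\end{verbatim}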
The following plot shows the optimal step size at each iteration in a typical part of the search; the step size becomes smaller and smaller as the search approaches the optimal point \(u^{\ast }\). The plot to the right shows the path \(u^k\) taken.
To compare the fixed step size and the optimal step size \(h\), the search was started from the same point and the number of steps needed to converge was recorded.
In these runs, a maximum iteration limit was set at \(10^6\).
Starting from \((-2,0.8)\)
step size | number of iterations to converge
optimal | \(4089\)
\(0.1\) | Did not converge within the maximum iteration limit
\(0.05\) | Did not converge within the maximum iteration limit, but stopped closer to \(u^{\ast }\) than the case using \(h=0.1\)
\(0.01\) | Did not converge within the maximum iteration limit, but stopped closer to \(u^{\ast }\) than the case using \(h=0.05\)
The following plots show the paths taken in the above tests. They show that using a fixed step size leads to many zigzag steps, which slows the search and is not as efficient as using the optimal step size.
Using a fixed step size of \(h=0.1\), the search stopped making progress after some point due to oscillation and remained stuck in the middle of the valley.
The following is a partial list of the values of \(J(u)\) at each iteration using a fixed step size \(h\). It shows the search fluctuating between two levels as it gets closer to the optimal point \(u^{\ast }\) without converging.
The search was terminated when oscillation was detected. It stopped far away from \(u^{\ast }\) when the fixed step size was large. As the fixed step size decreased, the final \(u^{k}\) reached was closer to \(u^{\ast }\), but the search still did not converge within the maximum iteration limit, unlike the case with the optimal step size.
The optimal step size produced the best result: it converged to \(u^{\ast }\) within the maximum iteration limit and the zigzag pattern was smaller.
Starting from \((-1.4,-2.2)\)
step size | number of iterations to converge
optimal | \(537\)
\(0.1\) | Did not converge within the maximum iteration limit
\(0.05\) | Did not converge within the maximum iteration limit, but stopped closer to \(u^{\ast }\) than the case using \(h=0.1\)
\(0.01\) | Did not converge within the maximum iteration limit, but stopped closer to \(u^{\ast }\) than the case using \(h=0.05\)
The following plots show the paths taken in the above tests. The same observations apply as for the previous starting point. In conclusion, one should use an optimal step size even though it requires more CPU time.
A program was written to automate the search for arbitrary \(n\). For example, for \(n=3\)\begin{align*} J\left ( u\right ) & =100\left ( u_{2}-u_{1}^{2}\right ) ^{2}+\left ( 1-u_{1}\right ) ^{2}+100\left ( u_{3}-u_{2}^{2}\right ) ^{2}+\left ( 1-u_{2}\right ) ^{2}\\ \nabla J\left ( u\right ) & =\begin{bmatrix} \frac{\partial J}{\partial u_{1}}\\ \frac{\partial J}{\partial u_{2}}\\ \frac{\partial J}{\partial u_{3}}\end{bmatrix} =\begin{bmatrix} -400\left ( u_{2}-u_{1}^{2}\right ) u_{1}-2\left ( 1-u_{1}\right ) \\ 200\left ( u_{2}-u_{1}^{2}\right ) -400\left ( u_{3}-u_{2}^{2}\right ) u_{2}-2\left ( 1-u_{2}\right ) \\ 200\left ( u_{3}-u_{2}^{2}\right ) \end{bmatrix} \end{align*}
And for \(n=4\)\begin{align*} J\left ( u\right ) & =100\left ( u_{2}-u_{1}^{2}\right ) ^{2}+\left ( 1-u_{1}\right ) ^{2}+100\left ( u_{3}-u_{2}^{2}\right ) ^{2}+\left ( 1-u_{2}\right ) ^{2}+100\left ( u_{4}-u_{3}^{2}\right ) ^{2}+\left ( 1-u_{3}\right ) ^{2}\\ \nabla J\left ( u\right ) & =\begin{bmatrix} \frac{\partial J}{\partial u_{1}}\\ \frac{\partial J}{\partial u_{2}}\\ \frac{\partial J}{\partial u_{3}}\\ \frac{\partial J}{\partial u_{4}}\end{bmatrix} =\begin{bmatrix} -400\left ( u_{2}-u_{1}^{2}\right ) u_{1}-2\left ( 1-u_{1}\right ) \\ 200\left ( u_{2}-u_{1}^{2}\right ) -400\left ( u_{3}-u_{2}^{2}\right ) u_{2}-2\left ( 1-u_{2}\right ) \\ 200\left ( u_{3}-u_{2}^{2}\right ) -400\left ( u_{4}-u_{3}^{2}\right ) u_{3}-2\left ( 1-u_{3}\right ) \\ 200\left ( u_{4}-u_{3}^{2}\right ) \end{bmatrix} \end{align*}
The pattern for \(\nabla J(u)\) is now clear. Let \(i\) be the row number of \(\nabla J\left (u\right ) \), where \(i=1\cdots N\). Then the gradient vector for any \(N\) is \[ \nabla J\left ( u\right ) =\begin{bmatrix} \frac{\partial J}{\partial u_{1}}\\ \frac{\partial J}{\partial u_{2}}\\ \vdots \\ \frac{\partial J}{\partial u_{N-1}}\\ \frac{\partial J}{\partial u_{N}}\end{bmatrix} =\begin{bmatrix} -400\left ( u_{2}-u_{1}^{2}\right ) u_{1}-2\left ( 1-u_{1}\right ) \\ 200\left ( u_{2}-u_{1}^{2}\right ) -400\left ( u_{3}-u_{2}^{2}\right ) u_{2}-2\left ( 1-u_{2}\right ) \\ \vdots \\ 200\left ( u_{N-1}-u_{N-2}^{2}\right ) -400\left ( u_{N}-u_{N-1}^{2}\right ) u_{N-1}-2\left ( 1-u_{N-1}\right ) \\ 200\left ( u_{N}-u_{N-1}^{2}\right ) \end{bmatrix} \] where each interior row \(1<i<N\) has the form \(200\left ( u_{i}-u_{i-1}^{2}\right ) -400\left ( u_{i+1}-u_{i}^{2}\right ) u_{i}-2\left ( 1-u_{i}\right ) \). The program implements the above to automatically generate \(\nabla J\left ( u\right ) \) and \(J\left ( u\right ) \) for any \(N\), then performs the search using steepest descent as before. The function that evaluates \(J(u)\) is part of the source code listing in the appendix.
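A sketch of what such a function can look like, for a column vector \(u\) of any length \(N\geq 2\), is given below; the name \texttt{rosenbrock\_J} is illustrative and not the name used in the actual listing.
\begin{verbatim}
function f = rosenbrock_J(u)
    % Sketch: Rosenbrock objective for a column vector u, any length N >= 2
    f = sum(100*(u(2:end) - u(1:end-1).^2).^2 + (1 - u(1:end-1)).^2);
end
\end{verbatim}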
The function that evaluates \(\nabla J(u)\) is likewise part of the listing in the appendix.
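A corresponding sketch of the gradient evaluation, again with an illustrative name, follows the row pattern derived above:
\begin{verbatim}
function g = rosenbrock_gradJ(u)
    % Sketch: gradient of the Rosenbrock objective for any length N >= 2
    N = length(u);
    g = zeros(N, 1);
    g(1) = -400*(u(2) - u(1)^2)*u(1) - 2*(1 - u(1));   % first row
    for i = 2:N-1                                      % interior rows
        g(i) = 200*(u(i) - u(i-1)^2) ...
               - 400*(u(i+1) - u(i)^2)*u(i) - 2*(1 - u(i));
    end
    g(N) = 200*(u(N) - u(N-1)^2);                      % last row
end
\end{verbatim}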
Two runs were made: one using a fixed step size \(h=0.01\), and one using the optimal step size. Both started from the same point \((-2,-2,\dots ,-2)\). Each time \(N\) was increased and the CPU time recorded. The same convergence criterion was used, \(\| \nabla J(u)\| \leq 0.0001\), with a maximum iteration limit of \(10^6\).
Only the CPU time used by the steepest descent call was recorded.
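As an illustration of how the timing can be restricted to the descent call for a given \(N\), the following sketch reuses the hypothetical names from the earlier sketches; the line search bracket of \(1\) and the function names are assumptions, not the actual driver code.
\begin{verbatim}
% Sketch: time only the steepest descent call for a given N.
u0 = -2*ones(N, 1);                  % starting point (-2, -2, ..., -2)
t0 = cputime;
u  = steepest_descent_optimal(@rosenbrock_J, @rosenbrock_gradJ, ...
                              u0, 1, 0.0001, 1e6);
elapsed = cputime - t0;              % CPU seconds used by this call only
\end{verbatim}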
A typical run is given below. An example of the command used for \(N=8\) is
The program nma_HW4_problem_2_part_b_CPU was run with \(N\) increasing in increments of \(20\) up to \(N=1000\). Here is the final result.
Discussion of results. The fixed step size \(h=0.01\) was selected arbitrarily as the comparison baseline. Using a fixed step size almost always produced oscillation when the search was near the optimal point, at which point the search stopped.
Using an optimal step size, the search took more time, as can be seen from the above plot, but it was reliable in that it converged, although at a very slow rate when close to the optimal point.
Almost all of the CPU time was spent in the line search when using the optimal step size. This additional computation is the main difference between the fixed and optimal step size methods.
With the fixed step, \(\| \nabla J(u)\| \) was evaluated once at each step, while with the optimal step, in addition to this computation, the objective \(J(u)\) itself was also evaluated repeatedly at each step inside the golden section line search.
Even so, the optimal line search converged much better than the fixed step size search did: it produces much better convergence at the cost of much more CPU time.
The plot above shows that with the fixed step size, CPU time grows linearly with \(N\), while with the optimal step size, the CPU time also grows linearly but with a much larger slope, indicating that it is more expensive to use.