question: Consider the solution of \(A\mathbf{\vec{x}}=\mathbf{\vec{b}}+\mathbf{\vec{n}}\) where \(A\) is \(m\times n\) matrix, \(\mathbf{\vec{b}}\epsilon R^{m},\mathbf{\vec{x}}\epsilon R^{n}\) and \(\mathbf{\vec{n}}\) is vector of i.i.d. Gaussian \(N\left ( 0,\sigma ^{2}\mathbf{I}_{m}\right ) \) noise vector. (i.e. noise is white Gaussian noise). Determine the best solution \(\vec{x}\)
Answer
Let us refer to the observed output (which includes the noise \(\mathbf{\vec{n}}\)) as \(\mathbf{\vec{z}}\), hence we write \(\mathbf{\vec{z}}=\mathbf{\vec{b}}+\mathbf{\vec{n}}\) where \(\mathbf{\vec{b}}\) is the uncontaminated output (what the observed output would be if there is no noise).
Since the noise \(\mathbf{\vec{n}}\) is an additive noise to the output \(\mathbf{\vec{b}}\) of the system, the since the noise has zero mean, then the mean of \(\mathbf{\vec{z}}\) will be the same as the mean of \(\mathbf{\vec{b}}\). But \(\mathbf{\vec{b}}\) is a deterministic signal which does not change, hence its mean is its value, hence the mean of \(\mathbf{\vec{z}}\,\ \)is \(\mathbf{\vec{b}\,.}\)
Now, \(\mathbf{\vec{z}}\) is described by a probability density function PDF as follows ( \(\mathbf{\vec{z}}\) is in \(R^{m}\), hence it is \(m\) long vector)
\begin{align*} \Pr \left ( \mathbf{\vec{z};}\left \{ \mu \left ( \mathbf{\vec{z}}\right ) ,\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =\frac{1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac{m}{2}}}\exp \left ( -\frac{1}{2\sigma ^{2}}\left \Vert \mathbf{\vec{z}}-\mu \left ( \mathbf{\vec{z}}\right ) \right \Vert ^{2}\right ) \\ \Pr \left ( \mathbf{\vec{z};}\left \{ \mathbf{\vec{b},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =\frac{1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac{m}{2}}}\exp \left ( -\frac{1}{2\sigma ^{2}}\left \Vert \mathbf{\vec{z}}-\mathbf{\vec{b}}\right \Vert ^{2}\right ) \end{align*}
But since \(A\mathbf{\vec{x}}=\mathbf{\vec{b}}\), then the above can be written as
\[ \Pr \left ( \mathbf{\vec{z};}\left \{ A\mathbf{\vec{x},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) =\frac{1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac{m}{2}}}\exp \left ( -\frac{1}{2\sigma ^{2}}\left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \]
Since \(A\) is a constant matrix (system is assumed to be time-invariant), hence from the above we see that the expression gives the probability of observing \(\mathbf{\vec{z}}\) for a given \(\mathbf{\vec{x}}\). Hence the best estimate of \(\mathbf{\vec{x}}\) would be the one which maximizes this probability. Instead of maximizing the PDF directly, we maximize its natural logarithm (a mathematical convenience trick, no more).
Now find the natural logarithm of the above quantity, and find where the result is maximum.
\begin{align} \frac{\partial }{\partial \mathbf{\vec{x}}}\ln \Pr \left ( \mathbf{\vec{z};}\left \{ A\mathbf{\vec{x},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =\frac{\partial }{\partial \mathbf{\vec{x}}}\left ( \ln \frac{1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac{m}{2}}}\exp \left ( -\frac{1}{2\sigma ^{2}}\left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \right ) \nonumber \\ & =\nabla _{\mathbf{\vec{x}}}\left ( \ln \frac{1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac{m}{2}}}+\ln \exp \left ( -\frac{1}{2\sigma ^{2}}\left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \right ) \nonumber \\ & =\overset{0}{\overbrace{\nabla _{\mathbf{\vec{x}}}\left ( \ln \frac{1}{\left ( 2\pi \sigma ^{2}\right ) ^{\frac{m}{2}}}\right ) }}+\nabla _{\mathbf{\vec{x}}}\left ( -\frac{1}{2\sigma ^{2}}\left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \nonumber \\ & =-\frac{1}{2\sigma ^{2}}\nabla _{\mathbf{\vec{x}}}\left ( \left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \tag{1} \end{align}
Start with \(m=n=1\), hence the above becomes1
\begin{align*} \frac{\partial }{\partial \mathbf{\vec{x}}}\ln \Pr \left ( \mathbf{\vec{z};}\left \{ A\mathbf{\vec{x},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =-\frac{1}{2\sigma ^{2}}\nabla _{x_{1}}\left ( z_{1}^{2}+a_{11}^{2}x_{1}^{2}-2z_{1}a_{11}x_{1}\right ) \\ & =-\frac{1}{2\sigma ^{2}}\left ( 2a_{11}^{2}x_{1}-2a_{11}z_{1}\right ) \\ & =-\frac{1}{\sigma ^{2}}\left ( a_{11}^{2}x_{1}-a_{11}z_{1}\right ) \end{align*}
Set the above to zero and solve for \(x_{1}\)
\begin{align*} 0 & =\left ( a_{11}x_{1}-z_{1}\right ) \\ x_{1} & =\frac{z_{1}}{a_{11}} \end{align*}
Hence \[ \fbox{$x_1=\frac{z_1}{a_11}$}\]
This matches the least squares solution \(x_{1}=\left ( A^{T}A\right ) ^{-1}A^{T}z_{1}\rightarrow x_{1}=\left ( a_{11}a_{11}\right ) ^{-1}a_{11}z_{1}=\frac{z_{1}}{a_{11}}\)
Now I need to do this for \(m=n=2,\) and assuming that \(var\left ( z_{1}\right ) =var\left ( z_{1}\right ) \). We can start from equation (1) above, shown again below
\begin{align*} \frac{\partial }{\partial \mathbf{\vec{x}}}\ln \Pr \left ( \mathbf{\vec{z};}\left \{ A\mathbf{\vec{x},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =-\frac{1}{2\sigma ^{2}}\nabla _{\mathbf{\vec{x}}}\left ( \left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \\ -2\sigma ^{2}\frac{\partial }{\partial \mathbf{\vec{x}}}\ln \Pr \left ( \mathbf{\vec{z};}\left \{ A\mathbf{\vec{x},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =\nabla _{\mathbf{\vec{x}}}\left ( \left \Vert \mathbf{\vec{z}}-A\mathbf{\vec{x}}\right \Vert ^{2}\right ) \\ & =\nabla _{\mathbf{\vec{x}}}\left ( \left \Vert \begin{pmatrix} z_{1}\\ z_{2}\end{pmatrix} -\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix} x_{1}\\ x_{2}\end{pmatrix} \right \Vert _{2}^{2}\right ) \\ & =\nabla _{\mathbf{\vec{x}}}\left ( \left \Vert \begin{pmatrix} z_{1}\\ z_{2}\end{pmatrix} -\begin{pmatrix} a_{11}x_{1}+a_{12}x_{2}\\ a_{21}x_{1}+a_{22}x_{2}\end{pmatrix} \right \Vert _{2}^{2}\right ) \\ & =\nabla _{\mathbf{\vec{x}}}\left ( \left \Vert \begin{pmatrix} z_{1}-\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \\ z_{2}-\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \end{pmatrix} \right \Vert _{2}^{2}\right ) \\ & =\nabla _{\mathbf{\vec{x}}}\left ( \left [ z_{1}-\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \right ] ^{2}+\left [ z_{2}-\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \right ] ^{2}\right ) \end{align*}
Let \begin{align*} f\left ( x_{1},x_{2}\right ) & =\left [ z_{1}-\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \right ] ^{2}+\left [ z_{2}-\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \right ] ^{2}\\ & =\left [ z_{1}^{2}+\left ( a_{11}x_{1}+a_{12}x_{2}\right ) ^{2}-2z_{1}\left ( a_{11}x_{1}+a_{12}x_{2}\right ) \right ] +\left [ z_{2}^{2}+\left ( a_{21}x_{1}+a_{22}x_{2}\right ) ^{2}-2z_{2}\left ( a_{21}x_{1}+a_{22}x_{2}\right ) \right ] \\ & =\left [ z_{1}^{2}+a_{11}^{2}x_{1}^{2}+a_{12}^{2}x_{2}^{2}+2a_{11}a_{12}x_{1}x_{2}-2z_{1}a_{11}x_{1}-2z_{1}a_{12}x_{2}\right ] +\\ & \left [ z_{2}^{2}+a_{21}^{2}x_{1}^{2}+a_{22}^{2}x_{2}^{2}+2a_{21}a_{22}x_{1}x_{2}-2z_{2}a_{21}x_{1}-2z_{2}a_{22}x_{2}\right ] \end{align*}
Then
\[ \nabla _{\mathbf{\vec{x}}}\left ( f\left ( x_{1},x_{2}\right ) \right ) =\begin{pmatrix} \frac{\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{1}}\\ \frac{\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{2}}\end{pmatrix} \]
so \begin{align*} \frac{\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{1}} & =\left [ 0+2a_{11}^{2}x_{1}+0+2a_{11}a_{12}x_{2}-2z_{1}a_{11}-0\right ] +\left [ 0+2a_{21}^{2}x_{1}+0+2a_{21}a_{22}x_{2}-2z_{2}a_{21}-0\right ] \\ & =2a_{11}^{2}x_{1}+2a_{11}a_{12}x_{2}-2z_{1}a_{11}+2a_{21}^{2}x_{1}+2a_{21}a_{22}x_{2}-2z_{2}a_{21}\\ & =x_{1}\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) +x_{2}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) -2z_{1}a_{11}-2z_{2}a_{21} \end{align*}
and\begin{align*} \frac{\partial f\left ( x_{1},x_{2}\right ) }{\partial x_{2}} & =\left [ 0+0+2a_{12}^{2}x_{2}+2a_{11}a_{12}x_{1}-0-2z_{1}a_{12}\right ] +\left [ 0+0+2a_{22}^{2}x_{2}+2a_{21}a_{22}x_{1}-0-2z_{2}a_{22}\right ] \\ & =2a_{12}^{2}x_{2}+2a_{11}a_{12}x_{1}-2z_{1}a_{12}+2a_{22}^{2}x_{2}+2a_{21}a_{22}x_{1}-2z_{2}a_{22}\\ & =x_{1}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) +x_{2}\left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -2z_{1}a_{12}-2z_{2}a_{22} \end{align*}
Hence we obtain that
\begin{align*} \frac{\partial }{\partial \mathbf{\vec{x}}}\ln \Pr \left ( \mathbf{\vec{z};}\left \{ A\mathbf{\vec{x},}\sigma ^{2}\mathbf{I}_{m}\right \} \right ) & =-\frac{1}{2\sigma ^{2}}\nabla _{\mathbf{\vec{x}}}\left ( f\left ( x_{1},x_{2}\right ) \right ) \\ & =-\frac{1}{2\sigma ^{2}}\begin{pmatrix} x_{1}\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) +x_{2}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) -2z_{1}a_{11}-2z_{2}a_{21}\\ x_{1}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) +x_{2}\left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -2z_{1}a_{12}-2z_{2}a_{22}\end{pmatrix} =\begin{pmatrix} 0\\ 0 \end{pmatrix} \end{align*}
Hence \[ x_{1}\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) +x_{2}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) -2z_{1}a_{11}-2z_{2}a_{21}=0 \]
and \[ x_{1}\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) +x_{2}\left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -2z_{1}a_{12}-2z_{2}a_{22}=0 \]
so, solve for \(x_{1},x_{2}\). Write as \(Ax=b\) and solve:
\[\begin{pmatrix} \left ( 2a_{11}^{2}+2a_{21}^{2}\right ) & \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \\ \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) & \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) \end{pmatrix}\begin{pmatrix} x_{1}\\ x_{2}\end{pmatrix} =\begin{pmatrix} 2z_{1}a_{11}+2z_{2}a_{21}\\ 2z_{1}a_{12}+2z_{2}a_{22}\end{pmatrix} \]
Solve for \begin{align*} \begin{pmatrix} x_{1}\\ x_{2}\end{pmatrix} & =\begin{pmatrix} \left ( 2a_{11}^{2}+2a_{21}^{2}\right ) & \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \\ \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) & \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) \end{pmatrix} ^{-1}\begin{pmatrix} 2z_{1}a_{11}+2z_{2}a_{21}\\ 2z_{1}a_{12}+2z_{2}a_{22}\end{pmatrix} \\ & =\frac{\begin{pmatrix} \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) & -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \\ -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) & \left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \end{pmatrix}\begin{pmatrix} 2z_{1}a_{11}+2z_{2}a_{21}\\ 2z_{1}a_{12}+2z_{2}a_{22}\end{pmatrix} }{\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) }\\ & \\ & =\frac{\begin{pmatrix} \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) \left ( 2z_{1}a_{11}+2z_{2}a_{21}\right ) -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2z_{1}a_{12}+2z_{2}a_{22}\right ) \\ -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2z_{1}a_{11}+2z_{2}a_{21}\right ) +\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \left ( 2z_{1}a_{12}+2z_{2}a_{22}\right ) \end{pmatrix} }{\left ( 2a_{11}^{2}+2a_{21}^{2}\right ) \left ( 2a_{12}^{2}+2a_{22}^{2}\right ) -\left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) \left ( 2a_{11}a_{12}+2a_{21}a_{22}\right ) }\\ & \\ & =\frac{\begin{pmatrix} 4a_{11}z_{1}a_{22}^{2}+4a_{12}^{2}a_{21}z_{2}-4a_{11}a_{12}z_{2}a_{22}-4z_{1}\allowbreak a_{12}a_{21}a_{22}\\ 4z_{1}a_{12}a_{21}^{2}+4a_{11}^{2}z_{2}a_{22}-4a_{11}z_{1}a_{21}a_{22}-4a_{11}\allowbreak a_{12}a_{21}z_{2}\end{pmatrix} }{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ & \\ & =\begin{pmatrix} \frac{4a_{11}z_{1}a_{22}^{2}+4a_{12}^{2}a_{21}z_{2}-4a_{11}a_{12}z_{2}a_{22}-4z_{1}\allowbreak a_{12}a_{21}a_{22}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ \frac{4z_{1}a_{12}a_{21}^{2}+4a_{11}^{2}z_{2}a_{22}-4a_{11}z_{1}a_{21}a_{22}-4a_{11}\allowbreak a_{12}a_{21}z_{2}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\end{pmatrix} \end{align*}
so
\begin{align*} x_{1} & =\frac{4a_{11}z_{1}a_{22}^{2}+4a_{12}^{2}a_{21}z_{2}-4a_{11}a_{12}z_{2}a_{22}-4z_{1}\allowbreak a_{12}a_{21}a_{22}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ & =\left ( \frac{z_{1}a_{22}-a_{12}z_{2}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \end{align*}
and
\begin{align*} x_{2} & =\frac{4z_{1}a_{12}a_{21}^{2}+4a_{11}^{2}z_{2}a_{22}-4a_{11}z_{1}a_{21}a_{22}-4a_{11}\allowbreak a_{12}a_{21}z_{2}}{4a_{11}^{2}a_{22}^{2}-8a_{11}a_{12}a_{21}a_{22}+4a_{12}^{2}a_{21}^{2}}\\ & =\left ( \frac{a_{11}z_{2}-z_{1}a_{21}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \end{align*}
Hence
\begin{equation} \begin{pmatrix} x_{1}\\ x_{2}\end{pmatrix} =\begin{pmatrix} \left ( \frac{z_{1}a_{22}-a_{12}z_{2}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \\ \left ( \frac{a_{11}z_{2}-z_{1}a_{21}}{a_{11}a_{22}-a_{12}a_{21}}\right ) \end{pmatrix} \tag{2} \end{equation}
is the least squares error. To validate
\begin{align} \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix}\begin{pmatrix} x_{1}\\ x_{2}\end{pmatrix} & =\begin{pmatrix} z_{1}\\ z_{2}\end{pmatrix} \nonumber \\\begin{pmatrix} x_{1}\\ x_{2}\end{pmatrix} & =\left ( \begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} ^{T}\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \right ) ^{-1}\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} ^{T}\begin{pmatrix} z_{1}\\ z_{2}\end{pmatrix} \nonumber \\ & =\left ( \begin{pmatrix} a_{11} & a_{21}\\ a_{12} & a_{22}\end{pmatrix}\begin{pmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{pmatrix} \right ) ^{-1}\begin{pmatrix} a_{11} & a_{21}\\ a_{12} & a_{22}\end{pmatrix}\begin{pmatrix} z_{1}\\ z_{2}\end{pmatrix} \nonumber \\ & =\begin{pmatrix} a_{11}^{2}+a_{21}^{2} & a_{11}a_{12}+a_{21}a_{22}\\ a_{11}a_{12}+a_{21}a_{22} & a_{12}^{2}+a_{22}^{2}\end{pmatrix} ^{-1}\begin{pmatrix} a_{11}z_{1}+a_{21}z_{2}\\ z_{1}a_{12}+z_{2}a_{22}\end{pmatrix} \nonumber \\ & =\begin{pmatrix} \frac{\left ( a_{12}^{2}+a_{22}^{2}\right ) }{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}} & -\frac{a_{11}a_{12}+a_{21}a_{22}}{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}}\\ -\frac{a_{11}a_{12}+a_{21}a_{22}}{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}} & \frac{\left ( a_{11}^{2}+a_{21}^{2}\right ) }{\left ( a_{11}a_{22}-a_{12}a_{21}\right ) ^{2}}\end{pmatrix}\begin{pmatrix} a_{11}z_{1}+a_{21}z_{2}\\ z_{1}a_{12}+z_{2}a_{22}\end{pmatrix} \nonumber \\ & =\begin{pmatrix} \frac{\left ( z_{1}a_{22}-a_{12}z_{2}\right ) }{a_{11}a_{22}-a_{12}a_{21}}\\ \frac{a_{11}z_{2}-z_{1}a_{21}}{a_{11}a_{22}-a_{12}a_{21}}\end{pmatrix} \tag{3} \end{align}
Compare equations (2) and (3) above, they are the same. OK, confirmed.