In this post, I will address a few topics that are usually considered part of geometry, since they concern angles and lengths. However, they also have a close relationship to linear algebra, and they are often discussed in a first course on linear algebra.
The dot product and angles.
If \(\bf u\) and \(\bf v\) are vectors in \(\mathbb R^n\), their dot product \({\bf u}\cdot{\bf v}\) is the number that is equal to \({\bf u}^{\top}{\bf v}\) (thinking of this as matrix multiplication). More explicitly, if \({\bf u} = (u_1,u_2,\ldots,u_n)\) and \({\bf v} = (v_1,v_2,\ldots,v_n)\) then
\({\bf u}\cdot{\bf v} = u_1v_1+u_2v_2+\ldots+u_nv_n = \sum_{i=1}^n u_iv_i.\)
The dot product satisfies some nice properties. Let \({\bf u}, {\bf v}, {\bf w}\) be vectors, and \(\alpha\) a scalar. Then the dot product is…
- symmetric: \({\bf u}\cdot{\bf v} = {\bf v}\cdot{\bf u}\).
- linear on each side: on the right side this means \({\bf u}\cdot({\bf v}+{\bf w}) = {\bf u}\cdot{\bf v}+{\bf u}\cdot{\bf w}\) and \({\bf u}\cdot(\alpha{\bf v}) = \alpha ({\bf u}\cdot{\bf v})\); by symmetry, the same equations hold on the left side.
- positive definite: \({\bf u}\cdot{\bf u} \geq 0\) for every \(\bf u\), and \({\bf u}\cdot{\bf u} = 0\) only if \({\bf u} = \vec{\bf 0}\).
These three properties are all direct consequences of how multiplication of real numbers works, and how the dot product is defined. For example, the positive definite property holds because \(u_i^2\geq 0\) for each real number \(u_i\), and a sum of non-negative numbers can only equal 0 if every term is zero. So \({\bf u}\cdot{\bf u} = 0\) forces each \(u_i^2 = 0\), and hence each \(u_i = 0\).
Example. Let \({\bf a} = (4, 1, 0)\), \({\bf b} = (1,-4,1)\), and \({\bf c} = (0, 2, 2)\). Then, since \({\bf a}\cdot{\bf b} = 4 - 4 = 0\), you can quickly see that
\({\bf a}\cdot(3{\bf b} - {\bf c}) = (-1){\bf a}\cdot{\bf c} = (-1)\begin{bmatrix}4\\ 1\\ 0\end{bmatrix}\cdot\begin{bmatrix}0\\ 2\\ 2\end{bmatrix} = -2.\)
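If you want to check calculations like this numerically, here is a minimal sketch (assuming Python with NumPy, which the post itself doesn't use; the variable names simply mirror the example above):

```python
import numpy as np

# the vectors from the example above
a = np.array([4, 1, 0])
b = np.array([1, -4, 1])
c = np.array([0, 2, 2])

print(np.dot(a, b))        # 0, so a and b are orthogonal
print(np.dot(a, c))        # 2
print(np.dot(a, 3*b - c))  # -2, matching (-1)(a . c) by linearity
```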
The length (also called the Euclidean norm, or \(\ell_2\)-norm) of a vector \(\bf u\) in \(\mathbb R^n\) is equal to \(\sqrt{{\bf u}\cdot{\bf u}}\). To see why this is a reasonable definition, suppose the vector is in the plane (\(\mathbb R^2\)), say \({\bf u} = (u_1, u_2)\). By moving horizontally along the \(x\)-axis to get to \(u_1\), and then moving vertically to get to \(u_2\), you arrive at the tip of the arrow representing \(\bf u\) (see the figure). The Pythagorean theorem says that the length of that arrow must be \(\sqrt{u_1^2 + u_2^2} = \sqrt{{\bf u}\cdot{\bf u}}\). When the vector is in \(\mathbb R^n\), you can do this “two coordinates at a time,” applying the Pythagorean theorem each time. 1
Some properties of the norm.
For a scalar \(\alpha\), you get \(|\alpha{\bf u}| = |\alpha||{\bf u}|\). 2
An inequality that relates the dot product to the length of vectors is the Cauchy-Schwarz inequality. It says
\(|{\bf u}\cdot{\bf v}| \leq |{\bf u}||{\bf v}|\)
for any vectors \({\bf u}, {\bf v}\) in \(\mathbb R^n\). You can remember this inequality through the relationship of the dot product to the angle \(\theta\) between two vectors, likely encountered in a first linear algebra course or in Calculus III:
\({\bf u}\cdot{\bf v} = |{\bf u}||{\bf v}|\cos\theta\).
Since \(-1\leq \cos\theta\leq 1\), this equation implies the Cauchy-Schwarz inequality. 3
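As a small illustration of this relationship, the sketch below (again assuming NumPy; the two vectors are arbitrary choices) recovers the angle between two vectors from the ratio \(\frac{{\bf u}\cdot{\bf v}}{|{\bf u}||{\bf v}|}\):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
# clip guards against tiny round-off pushing the ratio outside [-1, 1]
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
print(theta, np.degrees(theta))  # ~0.7854 radians, i.e. 45 degrees
```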
In addition, the norm satisfies the triangle inequality. That is, for any \({\bf u}, {\bf v}\), it must be that \(|{\bf u}+{\bf v}| \leq |{\bf u}| + |{\bf v}|\).
Proof. If \({\bf u}\cdot{\bf v} \geq 0\) then \({\bf u}\cdot{\bf v} = |{\bf u}\cdot{\bf v}|\). And so, using the Cauchy-Schwarz inequality,
\(|{\bf u}+{\bf v}|^2 = ({\bf u}+{\bf v})\cdot ({\bf u}+{\bf v}) = |{\bf u}|^2+2({\bf u}\cdot{\bf v})+|{\bf v}|^2\)
\(\leq |{\bf u}|^2+2|{\bf u}||{\bf v}|+|{\bf v}|^2 = (|{\bf u}| + |{\bf v}|)^2.\)
Taking the square root of both sides gives the desired inequality.
If \({\bf u}\cdot{\bf v} < 0\) instead, then you don’t need to use \(|{\bf u}\cdot{\bf v}| \leq |{\bf u}||{\bf v}|\). A negative dot product gives
\(|{\bf u}+{\bf v}|^2 = ({\bf u}+{\bf v})\cdot ({\bf u}+{\bf v}) = |{\bf u}|^2+2({\bf u}\cdot{\bf v})+|{\bf v}|^2\)
\(\leq |{\bf u}|^2+|{\bf v}|^2 \leq |{\bf u}|^2+2|{\bf u}||{\bf v}|+|{\bf v}|^2 = (|{\bf u}| + |{\bf v}|)^2\). \(\blacksquare\)
The distance between two points \({\bf x} = (x_1, x_2,\ldots, x_n)\) and \({\bf y} = (y_1, y_2,\ldots, y_n)\) is defined to be the length of the vector beginning at one point and ending at the other. So, the distance equals \(|{\bf x} - {\bf y}|\).
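Both inequalities, and the distance formula, are easy to check numerically; here is one possible sketch (the random vectors and the dimension 5 are arbitrary, and NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(5)
v = rng.standard_normal(5)

# Cauchy-Schwarz and the triangle inequality
assert abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v)
assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)

# the distance between the points u and v
print(np.linalg.norm(u - v))
```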
Angles and Orthogonality.
In the section above, we saw that the angle between two vectors \(\bf u\) and \(\bf v\) is defined through the dot product. When \({\bf u}\cdot{\bf v}\) is positive, then the angle is acute (less than \(\pi/2\)), since those are the angles where the cosine is positive. When \({\bf u}\cdot{\bf v}\) is negative, then the angle is obtuse (more than \(\pi/2\)).
Vectors are called orthogonal if \({\bf u}\cdot{\bf v} = 0\) (so, the angle is \(\pi/2\)). Something nice happens if you have a basis where each pair of basis vectors is orthogonal, and each one has length one (an orthonormal basis):
Say that \(\{\bf{q_1, q_2,\ldots, q_k}\}\) is an orthonormal basis and that \({\bf x} = \alpha_1{\bf q_1}+\alpha_2{\bf q_2}+\ldots+\alpha_k{\bf q_k}\). Then \({\bf x}\cdot{\bf q_i} = \alpha_i\) for each \(i\). (To see this, take the dot product of both sides with \({\bf q_i}\): every term with \(j\ne i\) vanishes since \({\bf q_j}\cdot{\bf q_i}=0\), and \({\bf q_i}\cdot{\bf q_i}=1\).)
Example. Let \(\mathsf Q = \{{\bf q_1}, {\bf q_2}, {\bf q_3}\}\), where
\({\bf q_1} = \begin{bmatrix}1/2\\ 1/2\\ 1/\sqrt{2}\end{bmatrix},\quad {\bf q_2}=\begin{bmatrix}-1/2\\ -1/2\\ 1/\sqrt{2}\end{bmatrix},\quad {\bf q_3}=\begin{bmatrix}1/\sqrt{2}\\ -1/\sqrt{2}\\ 0\end{bmatrix}\)
Then find \({\bf x} = \begin{bmatrix}1\\ 0\\ 1\end{bmatrix}\) as a linear combination of the vectors in \(\mathsf Q\).
Note that \(|{\bf q_1}| = |{\bf q_2}| = |{\bf q_3}| = 1\). Also, \({\bf q_1}\cdot{\bf q_2} = {\bf q_1}\cdot{\bf q_3} = {\bf q_2}\cdot{\bf q_3} = 0\). And so \(\mathsf Q\) is orthonormal. Thus, since
\({\bf x}\cdot{\bf q_1} = \dfrac{1+\sqrt{2}}{2},\ {\bf x}\cdot{\bf q_2} = \dfrac{-1+\sqrt{2}}{2},\ \textrm{and }{\bf x}\cdot{\bf q_3} = \dfrac{1}{\sqrt{2}}\)
it must be that \({\bf x} = \dfrac{1+\sqrt{2}}{2}{\bf q_1} + \dfrac{-1+\sqrt{2}}{2}{\bf q_2}+\dfrac{1}{\sqrt{2}}{\bf q_3}\).
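Here is the same computation done numerically, as a sanity check (assuming NumPy; the exact values \((1+\sqrt 2)/2\), \((-1+\sqrt 2)/2\), \(1/\sqrt 2\) show up as decimals):

```python
import numpy as np

q1 = np.array([0.5, 0.5, 1/np.sqrt(2)])
q2 = np.array([-0.5, -0.5, 1/np.sqrt(2)])
q3 = np.array([1/np.sqrt(2), -1/np.sqrt(2), 0.0])
x  = np.array([1.0, 0.0, 1.0])

# coordinates of x in the orthonormal basis are just dot products
coeffs = [np.dot(x, q) for q in (q1, q2, q3)]
print(coeffs)  # approximately [1.2071, 0.2071, 0.7071]

# reconstruct x from those coordinates
x_rebuilt = coeffs[0]*q1 + coeffs[1]*q2 + coeffs[2]*q3
print(np.allclose(x, x_rebuilt))  # True
```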
Every subspace has an orthonormal basis.
If \(\mathsf V\) is a (non-zero) subspace of \(\mathbb R^n\) and has a basis \(\{{\bf b_1},{\bf b_2},\ldots,{\bf b_k}\}\), where \(1\leq k\leq n\), then there is a process to find an orthonormal basis of \(\mathsf V\), called the Gram-Schmidt process. It proceeds as follows, first getting a set \(\mathsf Q\) of vectors that are pairwise orthogonal.
The Gram-Schmidt process. First, set \({\bf q_1} = {\bf b_1}\). Now, instead of using \({\bf b_2}\) for the next vector, we subtract a vector from it in order to get a vector that is orthogonal to \({\bf q_1}\). More precisely, we define
\({\bf q_2} = {\bf b_2} - \left(\dfrac{{\bf b_2}\cdot{\bf q_1}}{{\bf q_1}\cdot{\bf q_1}}\right){\bf q_1}\).
The vector \(\left(\dfrac{{\bf b_2}\cdot{\bf q_1}}{{\bf q_1}\cdot{\bf q_1}}\right){\bf q_1}\) is called the projection of \(\bf b_2\) to \(\bf q_1\). So, to get \(\bf q_2\) you subtract the projection of \(\bf b_2\) to \(\bf q_1\) from the vector \(\bf b_2\). We can directly check that \({\bf q_1}\cdot{\bf q_2} = 0\):
\({\bf q_1}\cdot{\bf q_2} = {\bf q_1}\cdot{\bf b_2} - \left(\dfrac{{\bf b_2}\cdot{\bf q_1}}{{\bf q_1}\cdot{\bf q_1}}\right)({\bf q_1}\cdot{\bf q_1}) = 0\).
The vectors \(\bf q_1\) and \(\bf q_2\) will be elements of \(\mathsf Q\). Define the other elements, \({\bf q_3},\ldots,{\bf q_k}\), by starting with the corresponding \(\bf b_i\) vector and, for each \(j < i\), subtracting its projection to \(\bf q_j\). For example,
\({\bf q_3} ={\bf b_3} - \left(\dfrac{\bf b_3\cdot q_1}{\bf q_1\cdot q_1}\right){\bf q_1} - \left(\dfrac{\bf b_3\cdot q_2}{\bf q_2\cdot q_2}\right){\bf q_2}\).
This gives a set \(\mathsf Q = \{{\bf q_1}, {\bf q_2}, \ldots, {\bf q_k}\}\) of pairwise orthogonal vectors (\({\bf q_i}\cdot{\bf q_j} = 0\) if \(i \ne j\)). 4 We know that \(\mathsf Q \subset \mathsf V\) since each of the vectors in \(\mathsf Q\) is defined as a linear combination of vectors known to be in \(\text{Span}({\bf b_1},\ldots,{\bf b_k})\).
Exercise. Explain why \(\{{\bf q_1}, {\bf q_2}, \ldots, {\bf q_k}\}\) is an independent set of vectors.
Finally, to get an orthonormal basis one needs to multiply each \(\bf q_i\) by the reciprocal of its length. So, define
\({\bf u_i} = \dfrac{1}{|{\bf q_i}|}{\bf q_i}\)
for each \(1\le i\le k\). Then \(\{{\bf u_1}, {\bf u_2}, \ldots, {\bf u_k}\}\) is an orthonormal basis of \(\mathsf V\).
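The whole process fits in a few lines of code. The sketch below is one way to write it (assuming NumPy; the function name gram_schmidt is just a label, and the input is assumed to be a linearly independent list of vectors):

```python
import numpy as np

def gram_schmidt(basis):
    """Return an orthonormal basis of the span of `basis`
    (assumed linearly independent), following the process above."""
    qs = []
    for b in basis:
        b = np.asarray(b, dtype=float)
        # subtract from b its projection onto each q found so far
        q = b - sum((np.dot(b, p) / np.dot(p, p)) * p for p in qs)
        qs.append(q)
    # normalize at the end to get the orthonormal basis
    return [q / np.linalg.norm(q) for q in qs]

# quick check with the two vectors used in the projection example below
u1, u2 = gram_schmidt([[1, -1, 1], [4, 2, 0]])
print(np.dot(u1, u2))                          # ~0 (orthogonal)
print(np.linalg.norm(u1), np.linalg.norm(u2))  # 1.0 1.0
```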
We noticed before that, for the purpose of writing \(\bf x\) as a linear combination of basis vectors, it is very nice to have the basis be orthonormal (since, in that case, the components of \(\bf x\) are its dot products with the basis elements). This is quite useful when you want the vector in a given subspace that is closest to \(\bf x\).
Orthogonal Projection.
The goal here is the following. You have a vector \(\bf x\), and a subspace \(\mathsf V\), in \(\mathbb R^n\). Thinking of \(\bf x\) as a point in \(\mathbb R^n\), there is a point in \(\mathsf V\) that is closest to it. Call the vector ending at this point the orthogonal projection of \(\bf x\) to \(\mathsf V\), or \(\text{proj}_{\mathsf V}({\bf x})\) for short. (The name comes from the fact that the vector that starts at \(\bf x\) and ends at \(\text{proj}_{\mathsf V}({\bf x})\) is orthogonal to every vector in \(\mathsf V\). 5) You want to be able to find \(\text{proj}_{\mathsf V}({\bf x})\) from \(\bf x\).
Say that we have an orthonormal basis \(\mathsf B_{ON}\) of \(\mathsf V\). Then:
(1) the set of vectors orthogonal to every vector of \(\mathsf V\) is a subspace of \(\mathbb R^n\) (it equals the null space of the matrix whose rows are the vectors in \(\mathsf B_{ON}\));
(2) if you take a basis of that orthogonal subspace and combine (take a union of) all of its vectors with those in \(\mathsf B_{ON}\), then this is a basis \(\mathsf B\) of \(\mathbb R^n\).
So, our vector \(\bf x\) can be written as a linear combination of vectors in \(\mathsf B\). If you take the part of the linear combination that only uses the basis vectors that came from \(\mathsf B_{ON}\), then this is the projection \(\text{proj}_{\mathsf V}({\bf x})\). In other words, say that \(\mathsf B_{ON} = \{{\bf u_1}, {\bf u_2}, \ldots, {\bf u_k}\}\) (remember, this is an orthonormal basis of \(\mathsf V\)) and that \(\mathsf B = \{{\bf u_1}, {\bf u_2}, \ldots, {\bf u_k}, {\bf b_{k+1}}, \ldots, {\bf b_n}\}\). Then, writing \(\bf x\) in this basis, say we get
\({\bf x} = \alpha_1{\bf u_1}+\ldots + \alpha_k{\bf u_k} + \alpha_{k+1}{\bf b_{k+1}} + \ldots + \alpha_n{\bf b_n}\).
It must be that \(\text{proj}_{\mathsf V}({\bf x}) = \alpha_1{\bf u_1}+\ldots + \alpha_k{\bf u_k}\). Wonderfully, we don’t even have to worry about the other vectors \({\bf b_{k+1}},\ldots,{\bf b_n}\), or their scalar coefficients, because:
\(\alpha_1 = {\bf u_1}\cdot{\bf x};\qquad \alpha_2 = {\bf u_2}\cdot{\bf x};\qquad\ldots\qquad \alpha_k = {\bf u_k}\cdot{\bf x}\).
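Here is a small numerical illustration of this decomposition (assuming NumPy; the orthonormal basis \({\bf u_1}, {\bf u_2}\) and the vector \(\bf x\) below are arbitrary choices): the part of \(\bf x\) left over after subtracting the projection is orthogonal to every basis vector of \(\mathsf V\).

```python
import numpy as np

# an (arbitrarily chosen) orthonormal basis of a plane V in R^3
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])
x  = np.array([3.0, -2.0, 5.0])

proj = np.dot(x, u1) * u1 + np.dot(x, u2) * u2    # alpha_1 u1 + alpha_2 u2
residual = x - proj

print(proj)                                        # [ 0.5  0.5  5. ]
print(np.dot(residual, u1), np.dot(residual, u2))  # 0.0 0.0 (up to round-off)
```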
Example. Let \({\bf a} = (4,2,0)\) and \({\bf b} = (1,-1,1)\). The subspace \(\text{Span}({\bf a}, {\bf b})\) is a plane in \(\mathbb R^3\). Find the orthogonal projection of \({\bf x} = (0, 8, 4)\) to this plane.
First, find an orthonormal basis for the plane. Let \({\bf q_1}={\bf b}\) and, following Gram-Schmidt, define
\({\bf q_2} = {\bf a} - \left(\dfrac{{\bf a}\cdot{\bf q_1}}{{\bf q_1}\cdot{\bf q_1}}\right){\bf q_1}\).
Since \({\bf a}\cdot{\bf q_1} = 2\) and \(|{\bf q_1}|^2 = 3\), we have that
\({\bf q_2} = {\bf a} - \left(\dfrac{2}{3}\right){\bf q_1} = \dfrac{1}{3}\begin{bmatrix}10\\ 8\\ -2\end{bmatrix}\).
Computing that \(|{\bf q_1}| = \sqrt{3}\) and \(|{\bf q_2}| = \sqrt{168}/3 = \sqrt{56}/\sqrt{3}\), we have that our orthonormal basis is given by
\({\bf u_1} = \dfrac{1}{\sqrt{3}}\begin{bmatrix}1\\ -1\\ 1\end{bmatrix},\qquad {\bf u_2} = \dfrac{1}{\sqrt{3}\sqrt{56}}\begin{bmatrix}10\\ 8\\ -2\end{bmatrix}\).
Now we can get the projection. Call the plane that we are projecting to \(\mathsf P\). Calculate that \({\bf x}\cdot{\bf u_1} = \dfrac{-4}{\sqrt{3}}\) and \({\bf x}\cdot{\bf u_2} = \dfrac{\sqrt{56}}{\sqrt{3}}\). And so,
\(\text{proj}_{\mathsf P}({\bf x}) = \dfrac{-4}{\sqrt{3}}{\bf u_1} + \dfrac{\sqrt{56}}{\sqrt{3}}{\bf u_2} = \begin{bmatrix}2\\ 4\\ -2\end{bmatrix}\).
Takeaway: to get the orthogonal projection of a vector \(\bf x\) to a subspace \(\mathsf V\), find an orthonormal basis of \(\mathsf V\) and take the dot product of \(\bf x\) with each vector in that basis (this gives you scalars \(\alpha_1,\ldots,\alpha_k\)); the projection of \(\bf x\) is then the linear combination of the orthonormal basis vectors that uses those scalars.
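Putting the takeaway into code, here is a sketch that redoes the example above end to end (assuming NumPy; it should print the vector \((2, 4, -2)\) up to round-off):

```python
import numpy as np

a = np.array([4.0, 2.0, 0.0])
b = np.array([1.0, -1.0, 1.0])
x = np.array([0.0, 8.0, 4.0])

# Gram-Schmidt on {b, a}, exactly as in the example
q1 = b
q2 = a - (np.dot(a, q1) / np.dot(q1, q1)) * q1
u1 = q1 / np.linalg.norm(q1)
u2 = q2 / np.linalg.norm(q2)

# orthogonal projection of x onto the plane spanned by a and b
proj = np.dot(x, u1) * u1 + np.dot(x, u2) * u2
print(np.round(proj, 6))  # [ 2.  4. -2.]
```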
1. First, use \(u_1\) and \(u_2\) in their coordinate plane, getting an arrow of length \(\sqrt{u_1^2+u_2^2}\). Then, use that length and a perpendicular segment of length \(u_3\), and fill these out to a right triangle. By the Pythagorean theorem, the hypotenuse, with coordinates \((u_1,u_2,u_3)\), has length
\(\sqrt{\left(\sqrt{u_1^2+u_2^2}\right)^2+u_3^2} = \sqrt{u_1^2+u_2^2+u_3^2}\).
This can be extended to any number of coordinates, giving length \(|{\bf u}| = \sqrt{u_1^2+\ldots+u_n^2}\).
2. The \(|\ |\) on the \(\alpha\) is the absolute value and on the \(\bf u\) it is the norm, or length, in \(\mathbb R^n\). But note, when \(n=1\) the norm is the same thing as the absolute value. You can think of real numbers as vectors in \(\mathbb R^1\).
3. More accurately, the Cauchy-Schwarz inequality is the more fundamental fact; it can be proved (even for more general inner products) with no reference to angles. In fact, typically the (cosine of the) angle between the vectors is said to be defined by the ratio \({\scriptsize\dfrac{{\bf u}\cdot{\bf v}}{|{\bf u}||{\bf v}|}}\). Cauchy-Schwarz guarantees this definition makes sense.
4. The verification that \({\bf q_j}\cdot{\bf q_i} = 0\) for all \(j < i\) is almost entirely the same as the verification above that \({\bf q_1}\cdot{\bf q_2} = 0\), by noting that we already know that the various \(\bf q_j\), \(j < i\), are pairwise orthogonal.
5. You can check this using the dot product; essentially, it's the Pythagorean theorem. Pick a not-orthogonal vector from \(\bf x\) to a point in \(\mathsf V\), and use the dot product and the Pythagorean theorem to see it must be longer than the one starting at \(\bf x\) and which is orthogonal to all vectors in \(\mathsf V\).