Matrix multiplication

Intuition and definition

Matrix multiplication is a symbolic way of substituting one linear change of variables into another, or, in more familiar notation, of composing two linear functions, e.g. \[g(f(x))\].

Let \[x^{\prime}=ax+by\] and \[y^{\prime}=cx+dy\], \[x^{\prime\prime}=a^{\prime}x^{\prime}+b^{\prime}y^{\prime}\] and \[y^{\prime\prime}=c^{\prime}x^{\prime}+d^{\prime}y^{\prime}\]. To express \[x^{\prime\prime}\] and \[y^{\prime\prime}\] in terms of \[x\] and \[y\]:

\begin{align*} x^{\prime\prime}&=a^{\prime}x^{\prime}+b^{\prime}y^{\prime}\\ &=a^{\prime}(ax+by)+b^{\prime}(cx+dy)\\ &=(a^{\prime}a+b^{\prime}c)x+(a^{\prime}b+b^{\prime}d)y\\ y^{\prime\prime}&=c^{\prime}x^{\prime}+d^{\prime}y^{\prime}\\ &=c^{\prime}(ax+by)+d^{\prime}(cx+dy)\\ &=(c^{\prime}a+d^{\prime}c)x+(c^{\prime}b+d^{\prime}d)y \end{align*}

Writing out the variables every time is tedious, so we use arrays to keep track of the coefficients. The computation above coincides with the matrix product:

\begin{align*} \begin{pmatrix} a^{\prime}&b^{\prime}\\c^{\prime}&d^{\prime} \end{pmatrix} \begin{pmatrix} a&b\\c&d \end{pmatrix} = \begin{pmatrix} a^{\prime}a+b^{\prime}c&a^{\prime}b+b^{\prime}d\\ c^{\prime}a+d^{\prime}c&c^{\prime}b+d^{\prime}d \end{pmatrix} \end{align*}

So matrix multiplication is just a bookkeeping device for systems of linear substitutions plugged into one another. The formula may not look intuitive at first, but it encodes a simple idea: combining two linear changes of variables in succession.

The notation above is a little convoluted, so let's take a simple example: \[u=3x+7y,v=-2x+11y\] and \[p=13u-20v,q=2u+6v\]. Expressing \[p\] and \[q\] in terms of \[x\] and \[y\] gives \[p=79x-129y,q=-6x+80y\], which coincides with the matrix product:

\begin{align*} \begin{pmatrix} 13&-20\\2&6 \end{pmatrix} \begin{pmatrix} 3&7\\-2&11 \end{pmatrix} &= \begin{pmatrix} 79&-129\\-6&80 \end{pmatrix} \end{align*}
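The same check can be done numerically. The following is a minimal sketch in plain Python; the names `matmul`, `first`, `second` and `substitute` are illustrative helpers introduced here, not part of the original text.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Coefficients of u = 3x + 7y, v = -2x + 11y
first = [[3, 7], [-2, 11]]
# Coefficients of p = 13u - 20v, q = 2u + 6v
second = [[13, -20], [2, 6]]

# The later substitution goes on the left of the product
product = matmul(second, first)

def substitute(x, y):
    """Apply the two substitutions one after the other."""
    u, v = 3 * x + 7 * y, -2 * x + 11 * y
    return 13 * u - 20 * v, 2 * u + 6 * v

# Direct substitution agrees with the product matrix applied to (x, y)
x, y = 5, -4
p, q = substitute(x, y)
assert p == product[0][0] * x + product[0][1] * y
assert q == product[1][0] * x + product[1][1] * y

print(product)  # [[79, -129], [-6, 80]]
```

Note that the matrix of the substitution applied second is written first in the product, mirroring the order in \[g(f(x))\].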

Now you may be wondering why this is needed. A simple answer is that matrices represent linear transformations in a natural way: the matrix obtained by multiplication describes the mapping you get by applying one transformation followed by the other.

To demonstrate this with functions, let \[f:\mathbb{R}^{2}\to\mathbb{R}^{3}\] and \[g:\mathbb{R}^{3}\to \mathbb{R}^{4}\] be defined by \[f(\begin{pmatrix} x\\y \end{pmatrix})=\begin{pmatrix} x+2y\\x-y\\x \end{pmatrix}\] and \[g(\begin{pmatrix} a\\b\\c \end{pmatrix})=\begin{pmatrix} a+b+c\\c\\0\\a \end{pmatrix}\].

With the way we defined both functions, it only makes sense to form \[g(f(x))\], not \[f(g(x))\]. In \[f(g(x))\], the function \[g\] would produce a vector in \[\mathbb{R}^{4}\], but \[f\] requires an input in \[\mathbb{R}^{2}\]. This also explains why the condition \[n_{1}=m_{2}\] (defined later on) is a requirement for matrix multiplication.

If we expand \[g(f(x))\],

\begin{align*} g(f(\textbf{x}))&=g(f( \begin{pmatrix} x\\y \end{pmatrix} ))\\ &=g( \begin{pmatrix} x+2y\\x-y\\x \end{pmatrix} )\\ &= \begin{pmatrix} 3x+y\\x\\0\\x+2y \end{pmatrix} \end{align*}

We can also rewrite \[\begin{pmatrix} x+2y\\x-y\\x \end{pmatrix}=\begin{pmatrix} 1&2\\1&-1\\1&0 \end{pmatrix}\cdot\begin{pmatrix} x\\y \end{pmatrix}=A\cdot\mathbf{x}\] and \[\begin{pmatrix} a+b+c\\c\\0\\a \end{pmatrix}=\begin{pmatrix} 1&1&1\\0&0&1\\0&0&0\\1&0&0 \end{pmatrix}\cdot\begin{pmatrix} a\\b\\c \end{pmatrix}=B\cdot\mathbf{w}\], where \[\mathbf{w}\] is the column vector with entries \[a,b,c\]. Thus, \[\begin{pmatrix} 3x+y\\x\\0\\x+2y \end{pmatrix}=\begin{pmatrix} 3&1\\1&0\\0&0\\1&2 \end{pmatrix}\cdot\begin{pmatrix} x\\y \end{pmatrix}=C\cdot\mathbf{x}\].

Composing \[g(f(\mathbf{x}))\] once more, we get \[g(f(\mathbf{x}))=g(A\mathbf{x})=B(A\mathbf{x})=C\mathbf{x}\], which brings us back to matrix multiplication: \[BA\mathbf{x}=C\mathbf{x}\] for every \[\mathbf{x}\], so \[BA=C\].
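The composition can be verified with a short script. This is only a sketch in plain Python: `f` and `g` follow the definitions above, while `matmul` and `apply` are hypothetical helper names introduced for illustration.

```python
def f(v):
    """f: R^2 -> R^3 as defined in the text."""
    x, y = v
    return [x + 2 * y, x - y, x]

def g(w):
    """g: R^3 -> R^4 as defined in the text."""
    a, b, c = w
    return [a + b + c, c, 0, a]

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def apply(M, v):
    """Multiply the matrix M by the column vector v."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

A = [[1, 2], [1, -1], [1, 0]]                     # matrix of f (3x2)
B = [[1, 1, 1], [0, 0, 1], [0, 0, 0], [1, 0, 0]]  # matrix of g (4x3)
C = matmul(B, A)                                  # matrix of g after f (4x2)

v = [2, 3]
# Composing the functions, chaining the matrices, and using the
# product matrix all give the same vector.
assert g(f(v)) == apply(B, apply(A, v)) == apply(C, v)

print(C)        # [[3, 1], [1, 0], [0, 0], [1, 2]]
print(g(f(v)))  # [9, 2, 0, 8]
```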

Scalar

Multiplying a matrix by a scalar \[k\] multiplies every entry by \[k\]. For example, assume \[A\] is a \[2\times2\] matrix:

\begin{align*} A&= \begin{pmatrix} a & b \\ c & d \\ \end{pmatrix} \\ kA&= \begin{pmatrix} ka & kb \\ kc & kd \\ \end{pmatrix} \end{align*}

Matrices

Assume two matrices of order \[m_{1}\times n_{1}\] and \[m_{2}\times n_{2}\]. For their product to be defined, the condition \[n_{1}=m_{2}\] must hold, as explained above. For example, for two \[2\times2\] matrices:

\begin{align*} A&= \begin{pmatrix} a & b \\ c & d \\ \end{pmatrix},\, B= \begin{pmatrix} e & f \\ g & h \\ \end{pmatrix}\\ AB&= \begin{pmatrix} ae+bg & af+bh \\ ce+dg & cf+dh \\ \end{pmatrix} \end{align*}

For products of three or more matrices, say \[A\], \[B\], \[C\], \[D\], the grouping does not matter as long as the order of the factors is kept: \[(AB)CD\] and \[A(BC)D\] give the same answer. In other words, matrix multiplication is associative.
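As a quick numerical illustration of this grouping rule, here is a small sketch in plain Python. The matrices are arbitrary examples chosen for this check, and `matmul` is an illustrative helper, not something defined in the text.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

# Arbitrary example matrices (assumed values, for illustration only)
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 3]]

left = matmul(matmul(A, B), C)   # (AB)C
right = matmul(A, matmul(B, C))  # A(BC)

# Regrouping leaves the result unchanged
assert left == right
print(left)  # [[4, 3], [8, 9]]
```

A single example does not prove the rule; it only shows the two groupings agreeing on one instance.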

Let \[A=(a_{ij})\] be an \[m\times n\] matrix and \[B=(b_{ij})\] be an \[n\times p\] matrix. The matrix product \[AB\] is defined to be the \[m\times p\] matrix whose \[i,j\] entry is \[\sum_{k=1}^{n}(a_{ik}\cdot b_{kj})\]. For example, for the entry in the first row and second column of \[AB\], the formula is shorthand for \[a_{11}b_{12}+a_{12}b_{22}+\cdots+a_{1n}b_{n2}\].

Non-commutativity

An operation is commutative if, for any two elements \[A\] and \[B\] such that the product \[AB\] is defined, \[BA\] is also defined and \[AB=BA\]. For matrix multiplication, \[AB\ne BA\] in general, so it is non-commutative. To demonstrate this, \[\begin{pmatrix} 0&1\\0&0 \end{pmatrix}\begin{pmatrix} 0&0\\1&0 \end{pmatrix}=\begin{pmatrix} 1&0\\0&0 \end{pmatrix}\] but \[\begin{pmatrix} 0&0\\1&0 \end{pmatrix}\begin{pmatrix} 0&1\\0&0 \end{pmatrix}=\begin{pmatrix} 0&0\\0&1 \end{pmatrix}\]. There are, however, special cases in which two matrices do commute:

  • Zero matrix:

    We define a zero matrix as

    \begin{align*} 0_{mn}= \begin{pmatrix} 0 & 0 & \cdots \\ 0 & 0 & \cdots \\ \vdots & \vdots & \ddots \\ \end{pmatrix} \end{align*}

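    Any matrix multiplied by a zero matrix of compatible size gives a zero matrix. In particular, if \[A\] is a square matrix and \[B\] is the zero matrix of the same order, then \[AB=BA=0\].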

  • Powers

    Assume:

    \begin{align*} A= \begin{pmatrix} 2 & 5 \\ 1 & 2 \\ \end{pmatrix},\, A^{2}= \begin{pmatrix} 9 & 20 \\ 4 & 9 \\ \end{pmatrix} \end{align*}

    Here, \[AA^{2}=A^{2}A\]; in general, powers of the same square matrix commute: \[A^{m}A^{n}=A^{n}A^{m}\].

  • Identity matrix

    If \[A\] is an \[m\times n\] matrix, then \[I_{m}A=AI_{n}=A\]. For a square matrix (\[m=n\]) this means \[A\] commutes with the identity matrix.
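The counterexample and the three special cases above can be checked with a few lines of plain Python. This is a sketch; `matmul` and `commute` are hypothetical helper names, and the matrices are the ones used in this section.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

def commute(P, Q):
    """Return True when PQ equals QP."""
    return matmul(P, Q) == matmul(Q, P)

# The counterexample from the text
E = [[0, 1], [0, 0]]
F = [[0, 0], [1, 0]]
print(matmul(E, F))  # [[1, 0], [0, 0]]
print(matmul(F, E))  # [[0, 0], [0, 1]]

# The special cases, using the matrix A from the "Powers" item
A = [[2, 5], [1, 2]]
Z = [[0, 0], [0, 0]]   # zero matrix
I = [[1, 0], [0, 1]]   # identity matrix
A2 = matmul(A, A)      # A squared

print(commute(E, F))   # False
print(commute(A, Z))   # True
print(commute(A, A2))  # True
print(commute(A, I))   # True
```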

Matrix multiplication happens columnwise

Consider what happens when we multiply an \[m\times n\] matrix by an \[n\times 1\] column vector. Let \[A=\begin{pmatrix} a&b&c\\d&e&f \end{pmatrix}\] and \[\mathbf{x}=\begin{pmatrix} x\\y\\z \end{pmatrix}\]. Then \[A\mathbf{x}=\begin{pmatrix} ax+by+cz\\dx+ey+fz \end{pmatrix}\], which can also be written as \[x\begin{pmatrix} a\\d \end{pmatrix}+y\begin{pmatrix} b\\e \end{pmatrix}+z\begin{pmatrix} c\\f \end{pmatrix}\]. If we denote the \[j\]th column of \[A\] by \[\mathbf{c}_{j}\], the expression simplifies to \[x\mathbf{c}_{1}+y\mathbf{c}_{2}+z\mathbf{c}_{3}\], which is known as a linear combination of \[\mathbf{c}_{1},\mathbf{c}_{2},\mathbf{c}_{3}\].
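To make the two readings concrete, the sketch below computes \[A\mathbf{x}\] both row by row and as a linear combination of columns, in plain Python. The numeric entries are arbitrary values assumed for illustration, and the variable names are not from the text.

```python
# Arbitrary 2 x 3 matrix and vector (assumed values, for illustration)
A = [[1, 2, 3],
     [4, 5, 6]]
x = [2, -1, 3]

# Row view: each entry is a row of A dotted with x
row_view = [sum(a * t for a, t in zip(row, x)) for row in A]

# Column view: x[0]*c1 + x[1]*c2 + x[2]*c3
columns = [list(col) for col in zip(*A)]  # c1, c2, c3
col_view = [0] * len(A)
for coeff, col in zip(x, columns):
    col_view = [acc + coeff * entry for acc, entry in zip(col_view, col)]

assert row_view == col_view
print(col_view)  # [9, 21]
```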
