Chapter 3 Applications of Matrix Multiplication

As we will begin to see here, matrix multiplication has a number of uses in data modeling and problem solving. It expresses a rather large number of operations in a surprisingly compact way. The more comfortable we can be with this compact notation and what it entails, the more understanding we can have with analytical tools like Principal Components Analysis, Factor Analysis, Markov Chains, and Optimization (to name a few).

3.1 Systems of Equations

Matrix multiplication creates a system of equations, which is nothing more than a collection of equations which hold true simultaneously. Suppose we take the matrix-vector product: \[\A\x=\b\] Where \[\A=\pm 1&2&3&1\\0&3&2&1\\1&1&1&4\mp \quad \x=\pm x_1\\x_2\\x_3\\x_4 \mp \quad \mbox{and} \quad \b=\pm 10\\15\\6\mp\] Let’s take a look at what happens when we write the equation \(\A\x=\b\) the old-fashioned way, without matrices: \[ \pm 1&2&3&1\\0&3&2&1\\1&1&1&4\mp\pm x_1\\x_2\\x_3\\x_4 \mp = \pm 10\\15\\6\mp \] \[ \Longrightarrow\begin{cases}\begin{align} x_1+2x_2+3x_3+x_4 = 10\\ 3x_2+2x_3+x_4 = 15\\ x_1+x_2+x_3+4x_4 = 6\end{align}\end{cases} \]

We get a system of three equations. In general, a system of equations is nothing more than a matrix equation, \(\A\x=\b\) where the matrix \(\A\) contains the coefficients on the parameters you wish you find, \(\x\) is a vector containing those unknown parameters and \(\b\) is a vector containing the right hand sides of the equations. These systems of equations pop-up in all types of data applications from regression analysis to optimization. Let’s consider a scenario which mimics the real-world and try to model it using a matrix-vector product.

Example 3.1 (System of Equations) A large manufacturing company has recently signed a deal to manufacture trail mix for a well-known food label. This label makes 3 versions of its product - one for airlines, one for grocery stores, and one for gas stations. Each version has a different mixture of peanuts, raisins, and chocolate which serves as the base of the trail mix. The base mixtures are made in 15 kg batches and sent to a second building for packaging.

The following table contains the information about the mixes, each row containing the recipe for a 15 kg batch. There is also some additional information on the costs of the ingredients, the price the manufacturer can charge for the mixtures and the amount of storage allocated for each ingredient.

Raisins
(kg/batch)
Peanuts
(kg/batch)
Chocolate
(kg/batch)
Sale Price
($/kg)
Airline (a) 7 6 2 4.99
Grocery (g) 2 5 8 6.50
Gas Station (s) 6 4 5 5.50

|Storage (kg) | 380 | 500| 620| | |Cost ($/kg) | 2.55|4.65|4.80| |

a. If the manufacturer wanted to use up all the ingredients in storage each day, how many batches of each mixture (airline, gas station, and grocery) should be made?

We can gather from the table that 1 batch of the airline mixture contains 7 kgs of raisins. We want the total number of kgs of raisins from each of the 3 mixtures to match the storage capacity of raisins, which is 380 kg. We can set this up as a system of equations, one for each ingredient, where \[\begin{eqnarray} a&=&\mbox{batches of airline mixture}\\ g&=&\mbox{batches of grocery mixture}\\ s&=&\mbox{batches of gas station mixture}\\ a,g,s &\geq & 0 \end{eqnarray}\] as follows: \[\begin{cases}\begin{eqnarray} 7 a+6 s+2g &=& 380 \quad \mbox{(Raisins)}\\ 6 a+4 s+5g &=& 500 \quad \mbox{(Peanuts)}\\ 2 a+5 s+8g &=& 620 \quad \mbox{(Chocolate)}\end{eqnarray}\end{cases}\]

We can then transform this system of equations into matrix form: \[\pm 7&2&6\\6&5&4\\2&8&5 \mp \pm a\\g\\s \mp = \pm 380\\500\\620 \mp\]

While we haven’t yet discussed how to solve such a system of equations, you can verify that \[a=20\,batches \quad g=60\,batches \quad s=20\,batches\] is indeed a solution - in fact, it is the only possible solution. Determining solutions such as this, and establishing that they are unique (i.e. that they are the only possible solution) is one of the many tools that the study of linear algebra will provide.

b. Use matrix-vector multiplication to determine how much it costs the manufacturer to produce 1 batch of each mixture

For 1 batch of airline mixture, the manufacturer will spend \[7 kg\times \$2.55/kg = \$17.85\,\,\mbox{ on raisins.}\]

Of course, we need to add in the cost of peanuts and chocolate and then repeat this calculation for both grocery and gas station mixtures.

This is conveniently done in one matrix-vector multiplication: \[\pm 7&6&2\\2&5&8 \\6&4&5 \mp \pm 2.55\\4.65\\4.80 \mp = \pm 55.35\\66.75\\57.90\mp\] Thus, the cost of 1 batch of each type of mixture is:

Mixture Cost ($/batch)
airline 55.35
grocery 66.75
gas station 57.90

3.1.1 Big Systems of Equations

When we expand our minds to the possibilities associated with matrix-matrix products, the systems that we can generate get very large very quickly.

For example, let’s consider a \(2\times 2\) example, \[\A\X=\B\] where \[\A=\pm 2&3\\1&4 \mp \qquad \X=\pm x_{11} & x_{12} \\x_{21} & x_{22} \mp \qquad \B=\pm 7 & 6 \\ 5& 9 \mp\]

Let’s take a look at what the simple equation \(\A\X=\B\) is really saying in terms of all the matrix values that we have. All we have to do is write out the multiplication “the long way.” For example, to get the element in the first row and first column of \(\B\) (in this case, 7) we would compute the inner product of the first row of \(\A\) with the first column of \(\X\):

\[\pm 2 & 3 \mp \pm x_{11} \\ x_{21} \mp = \red{2x_{11}+3x_{21} = 7}\]

Now the equation in red above is just one of 4. We have one equation for each element of \(\B\)!. Let’s make sure we understand how to get all 4 of these equations: \[\begin{eqnarray} 2x_{11}+3x_{21} &=& 7 \\ 2x_{12}+3x_{22} &=&6\\ 1x_{11}+4x_{21} &=& 5 \\ 1x_{12}+4x_{22} &=&9\\ \end{eqnarray}\]

Now, a list of 4 equations does not seem that big. But what if the dimensions of the matrix \(\B\) were \(9\times10\)? By now you should be able to see that we’d have a system of 90 equations! The number of equations generated will always equal the number of elements in the right hand side matrix \(\B\).

3.2 Regression Analysis

In statistics, the solution to these systems of equations is exactly what we are trying to find when we do regression analysis. Take, for example, a regression analysis with some dependent variable, \(\y\), and two independent variables, \(\h,\w\). The preliminary goal of this analysis is to find unknown parameters \(\beta_0, \beta_1, \dots\) such that \[\begin{equation} \y= \beta_0+\beta_1\h+\beta_2\w \tag{3.1} \end{equation}\]

This is the single equation we usually consider when talking about regression analysis - but what about all those data points? Suppose, for simplicity, we have only 4 observations as listed in the following table:

\(\h\) \(\w\) \(\y\)
3 3 6
2 3 6
5 6 10
6 5 9

When we write the model from Equation (3.1), what we are really saying is that the equation holds true for each of the 4 observations in our dataset. So rather than 1 single equation, what we really have here is 4 equations - 1 for each observation:

\[\begin{eqnarray} \beta_0 + 3 \beta_1 + 3 \beta_2 &=& 6 \quad \mbox{(obs. 1)}\\ \beta_0 + 2 \beta_1 + 3 \beta_2 &=& 6 \quad \mbox{(obs. 2)}\\ \beta_0 + 5 \beta_1 + 6 \beta_2 &=& 10 \,\,\, \mbox{(obs. 3)}\\ \beta_0 + 6 \beta_1 + 5 \beta_2 &=& 9 \quad \mbox{(obs. 4)}\\ \end{eqnarray}\]

Rather than writing all these equations out, we instead represent the situation in matrix format as \[\X\bbeta = \y,\] Where \[\X=\pm 1&3&3\\1&2&3\\1&5&6\\1&6&5 \mp \quad \bbeta = \pm \beta_0 \\\beta_1 \\ \beta_2 \mp \quad \mbox{and}\quad \y = \pm 6 \\6 \\10\\9\mp\]

We know from our experiences with data that this situation will not have an exact solution: our data does not fall exactly on some straight line or surface. Instead, we have to consider some error, \(\boldsymbol\epsilon\) and try to minimize it: \[\X\bbeta + \boldsymbol\epsilon = \y,\] where \[\boldsymbol\epsilon = \pm \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \epsilon_4 \mp\] is a vector containing the residuals. We will get into the exact details of this shortly, but for now it is important that we see how to set up the regression equation in terms of matrices and vectors. The typical regression equation with the intercept will always involve adding a column of 1’s to the matrix of independent variables, as seen in the previous example.

3.3 Linear Combinations

Let’s revisit the second part of Example 3.1, where the task was to use matrix-vector multiplication to determine how much it would cost for the manufacturer to produce 1 batch of each mixture.

Essentially what we want to do is take a linear combination of the amounts of raisins, peanuts, and chocolate where the scalar weights are the cost of each ingredient: \[\bordermatrix{~& \mbox{Cost of 1kg}}{}{\begin{pmatrix}\mbox{airline}\\ \mbox{grocery}\\ \mbox{gas station} \end{pmatrix}} = \$2.55 \bordermatrix{~& raisins}{}{ \begin{pmatrix} 7\\ 2\\ 6\end{pmatrix}} + \$4.65 \bordermatrix{~& peanuts}{}{ \begin{pmatrix} 6\\ 5\\ 4\end{pmatrix}} + \$4.80 \bordermatrix{~& chocolate}{}{ \begin{pmatrix} 2\\ 8\\ 5\end{pmatrix}}\] This linear combination is exactly the same as the matrix-vector product originally used: \[\pm 7&6&2\\2&5&8 \\6&4&5 \mp \pm 2.55\\4.65\\4.80 \mp = \pm 55.35\\66.75\\57.90\mp\]

Matrix multiplication is nothing more than a series of linear combinations. Let’s develop another quick example.

Example 3.2 (Linear Combinations of Variables) Suppose we have data for 100 postal packages using 3 variables: height \(\bo{h}\), weight \(\bo{w}\), and volume \(\bo{v}\). If we create a data matrix, \(\X\), the size of the matrix will be \(100\times 3\) and the three columns will be composed of the variables height, weight, and volume. The previous sentence is written mathematically by creating a partitioned matrix: \[\X = \left( \bo{h} | \bo{w} | \v \right)\]

If we wanted to create a new variable vector, \(\bo{c}\), which equaled the height plus twice the weight of the package, we’d want to compute the following linear combination: \[\bo{c} = \bo{h} + 2\bo{w} + 0\v\]

This could be accomplished by multiplying our whole data matrix by the vector \(\pm 1\\2\\0\mp\).

\[\begin{eqnarray*} \bo{c}&=&\underset{(100\times 3)}{\X}\pm 1\\2\\0\mp \cr &=&\left( \bo{h} | \bo{w} | \v \right)\pm 1\\2\\0\mp \cr &=& \bo{h} + 2\bo{w} + 0\v \end{eqnarray*}\]

If this example confuses you, you ought to write out a smaller matrix of values for the three variables, height, weight and volume. Write down 3 observations or so and see how the linear combination of these columns is precisely the same as the matrix-vector product. Being able to think of these two ideas as interchangeable will be fundamental when we start talking about factor analysis and principal components analysis.

If we dissect our formula for a system of linear equations, \(\A\x=\bo{b}\), we will find that the right-hand side vector \(\bo{b}\) can be expressed as a linear combination of the columns in the coefficient matrix, \(\A\).

\[\begin{eqnarray*} \bo{b}&=& \A\x\\ \bo{b}&=& (\A_1|\A_2|\dots|\A_n)\pm x_1\\x_2\\ \vdots\\x_3 \mp \\ \bo{b}&=& x_1\A_1 + x_2\A_2 + \dots + x_n\A_n \end{eqnarray*}\] A concrete example of this expression is given in Example 3.3.

Example 3.3 (Systems of Equations as Linear Combinations) Consider the following system of equations: \[\begin{eqnarray} 3x_1 + 2x_2 + 9x_3 &=& 1\\ 4x_1 + 2x_2 + 3x_3 &=& 5\\ 2x_1 + 7x_2 + \,x_3 &=& 0 \end{eqnarray}\] We can write this as a matrix vector product \(\A\x=\bo{b}\) where \[\A=\pm 3 & 2 & 9\\4 & 2 & 3\\2 &7&1\mp \,\,\,\x=\pm x_1\\x_2\\x_3\mp \mbox{ and } \bo{b}=\pm 1\\5\\0 \mp\] We can also write \(\bo{b}\) as a linear combination of columns of \(\A\): \[x_1 \pm 3\\4\\2 \mp +x_2 \pm 2\\2\\7\mp + x_3 \pm9\\3\\1 \mp = \pm 1\\5\\0 \mp\]

Similarly, if we have a matrix-matrix product, we can write each column of the result as a linear combination of columns of the first matrix. Let \(\A_{m\times n}\), \(\X_{n\times p}\), and \(\B_{m\times p}\) be matrices. If we have \(\A\X=\B\) then \[ (\A_1 | \A_2 | \dots | \A_n) \pm x_{11} & x_{12} & \dots & x_{1p} \\x_{21} & x_{22} & \dots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2}&\dots &x_{np} \mp = (\B_1 | \B_2 | \dots | \B_n) \]

and we can write \[\B_j = \A\X_j = x_{1j}\A_1 + x_{2j}\A_2 + x_{3j}\A_3 + \dots + x_{nj}\A_n.\] A concrete example of this expression is given in Example 3.4.

Example 3.4 (Linear Combinations in Matrix-Matrix Products) Suppose we have the following matrix formula: \[\A\X=\B\] Where \(\A=\pm 2 & 1 & 3\\1 & 4 & 2\\ 3 & 2 & 1 \mp\), \(\X=\pm 5&6\\9&5\\7&8 \mp\). Then \[\begin{eqnarray} \B &=&\pm 2 & 1 & 3\\1 & 4 & 2\\ 3 & 2 & 1 \mp \pm 5&6\\9&5\\7&8 \mp \\ ~ &=& \pm 2(5)+1(9)+3(7)&2(6) +1(5)+3(8)\\1(5)+4(9)+2(7)&1(6)+4(5)+2(8)\\3(5)+2(9)+1(7)&3(6)+2(5)+1(8) \mp \end{eqnarray}\] and we can immediately notice that the columns of \(\B\) are linear combinations of columns of \(\A\): \[\B_1 = 5\pm 2\\1\\3\mp+9\pm 1\\4\\2 \mp + 7 \pm 3\\2\\1 \mp\] \[\B_2 = 6\pm 2\\1\\3\mp+5\pm 1\\4\\2 \mp + 8 \pm 3\\2\\1 \mp\]

We may also notice that the rows of \(\B\) can be expressed as a linear combination of rows of \(\X\):

\[\B_{1\star} = 2\pm 5& 6 \mp + 1\pm 9& 5 \mp + 3\pm 7& 8 \mp \] \[\B_{2\star} = 1\pm 5& 6 \mp + 4\pm 9& 5 \mp + 2\pm 7& 8 \mp \] \[\B_{3\star} = 3\pm 5& 6 \mp + 2\pm 9& 5 \mp + 1\pm 7& 8 \mp \]

Linear combinations are everywhere, and they can provide subtle but important meaning in the sense that they can break data down into a sum of parts.

You should convince yourself of one final view of matrix multiplication, as the sum of outer products. In this case \(\B\) is the sum of 3 outer products (3 matrices of rank 1) involving the columns of \(\A\) and corresponding rows of \(\X\): \[\B=\acol{1}\X_{1\star}+\acol{2}\X_{2\star}+\acol{3}\X_{3\star}.\]

Example 3.4 turns out to have important implications for our interpretation of matrix factorizations. In this context we’d call \(\A\X\) a factorization of the matrix \(\B\). We will see how to use these expressions to our advantage in later chapters.

3.4 Exercises

  1. A florist offers three sizes of flower arrangements (small, medium, large) containing three types of flowers (roses, daisies, and chrysanthemums). The number of each type of flower in each size arrangement is given in the table below, along with the selling price of each arrangement and the cost of each individual flower.

    Roses Daisies Chrys. Price
    Small 1 3 3 $10
    Medium 2 4 6 $15
    Large 4 8 6 $20
    Cost $0.50 $0.25 $0.10
    Let \[\A = \pm 1 &3 & 3\\2& 4& 6\\4& 8&6 \mp \quad \bo{p}= \pm 10\\15\\20 \mp \quad \bo{c} = \pm 0.50\\0.25\\0.10 \mp.\]
    1. Determine the matrix-vector product that produces a vector, \(\y\) which gives the total cost of creating each size arrangement (small, medium, and large).
    2. Suppose that an order came in for 2 small arrangements and 2 large arrangements. Let \[\bo{v} = \pm 2\\0\\2 \mp\] Using matrix arithmetic (and writing out the formula) determine both the price of this order and the total profit to the florist.
  2. In an effort to learn more about visitors to social media websites, you’ve collected data on whether or not individuals have visited certain sites in the past week. These are binary variables indicating if the website has been visited (“1”) or not (“0”). Using the following dataset, answer the questions below.

    Name FB Insta Link Twit TikTk YuTube GoogPlus
    Tito 1 1 0 0 0 0 0
    Jojo 0 1 0 0 0 1 1
    Loki 0 1 1 1 1 1 1
    Niko 0 0 1 1 1 1 1
    Yaya 1 0 0 1 0 1 1
    Kujo 1 1 1 0 0 0 1
    Arod 0 0 0 0 1 0 0
    Park 1 1 1 1 0 1 1
    Yorp 0 1 0 1 0 0 0
    Hoki 0 0 1 1 0 1 1
    1. Suppose we were interested to know how many social media sites two users have in common - we intend to use this as a sort of similarity score to inform us which users might have similar preferences. What matrix operation might be useful to compute this similarity score? In particular, let \(\bo{t}\) represent the data for Tito so that \[\mathbf{t} = [t_1 \,\, t_2\,\, \dots \,\, t_7] = [1\,\,1\,\,0\,\,0\,\,0\,\,0\,\,0].\] Likewise, let \(\bo{j}\) represent the data for Jojo: \[\mathbf{j} = [j_1 \,\, j_2\,\, \dots \,\, j_7] = [0\,\,1\,\,0\,\,0\,\,0\,\,1\,\,1].\] Write down an arithmetic expression using \(\mathbf{t}\) and \(\mathbf{j}\) that will compute the number of social media sites that Tito and Jojo have in common.
    2. Repeat the same exercise from part (a) with the users Hoki and Yorp.
    3. Consider using the information from parts (a) and (b) to come up with a formula to compute ALL of the pairwise similarity scores in one matrix. What information do the diagonal entries of this matrix provide? Note: a similarity matrix has rows and columns corresponding to the same entities (in this case, the users) and the \((i,j)^{TH}\)-entry in the matrix would tell us the similarity score of user \(i\) and user \(j\).
    4. What if our goal was not to examine similar users, but to examine similarities between social media sites? Reframe the above analysis to pertain to the similarity between websites, so the similarity score between Facebook (FB) and Instagram (Insta) is computed to be the number of users in the dataset using both sites. What information do the diagonal entries of this matrix provide?
    5. What words describe the similarity matrices found in part (c) and (d)? Rectangular? Square? Symmetric? Diagonal? Upper Triangular? Identity?
  3. Write the following system of equations as a matrix-vector product \(\A\x=\b\): \[\begin{eqnarray*} 2x_2 +3x_3&=& 8 \\ 2x_1+3x_2+1x_3 &=& 5 \\ x_1-x_2-2x_3 &=&-5 \end{eqnarray*}\]

  4. Linear regression involves a system of equations of the form \(\X\boldsymbol\beta = \y\). For the following data and regression model, write out each piece of the equation (\(\X,\boldsymbol\beta, \y\)), whether it is known or unknown. Note that you are not asked to solve this system of equations. Is this system of equations likely to have an exact solution? Justify your answer. Again you don’t have to perform any calculations, this problem is only for discussion.

  5. group age \(\mathbf{VO_2}\) max mile pace (min)
    A 32 55 8.3
    A 23 45 8.3
    B 41 44 7.9
    B 19 31 9.9
    B 27 41 6.9
    B 25 60 5.3
    A 21 45 7.5
    B 45 43 7.1
    C 22 49 6.6

    \[ mile\_pace = \beta_0 + \beta_1 (group_A)+ \beta_2 (group_B) + \beta_3 (age) + \beta_4 (VO_2\_max) \]

  6. A model is being developed to predict a student’s SAT score based upon some numeric attributes. The data being used for this model is provided below:

    Observation PSAT score Mother’s SAT score SAT Score
    1 1600 1700 1750
    2 1800 1250 1750
    3 1750 1300 1600
    4 1200 1800 1450
    5 1350 1950 1500

    If our regression model is \[SAT\_score = \beta_0 + \beta_1* PSAT\_score + \beta_2* Mothers\_SAT\_Score + \epsilon\] Show how we’d set up the underlying matrix equation for regression analysis, \[\y = \X\bbeta \] by defining the matrices/vectors \(\X, \bbeta, \mbox{ and } \y\).

  7. Suppose a company collected daily data regarding the sales and revenue of particular products for which prices fluctuate daily:

    Monday Tuesday Wednesday Thursday Friday
    Product Sales Rev. Sales Rev. Sales Rev. Sales Rev. Sales Rev.
    Widgets 1 195 5 945 2 400 2 450 5 790
    Gadgets 35 350 13 110 25 300 45 497 90 789
    1. Show how you could use matrix addition to compute the total weekly sales and revenue of each product.
    2. Now suppose you find out that both the sales and the revenue numbers in the table above were listed in hundreds (i.e. that 100 widgets were sold on Monday, bringing in $19,500 in revenue). Using your answer from part a. show how you would use scalar multiplication to represent the exact weekly numbers for revenue and units sold.
  8. For a general matrix \(\A_{m\times n}\) describe what the following products will provide. Also give the size of the result (i.e. “\(n\times 1\) vector” or “scalar”). Recall that \(\e\) denotes the vector of all ones, and \(\e_i\) denotes the \(i^{th}\) column of the identity matrix.
    1. \(\A\e_j\)
    2. \(\e_i^T\A\)
    3. \(\e_i^T\A\e_j\)
    4. \(\A\e\)
    5. \(\e^T\A\)
    6. \(\frac{1}{n}\e^T\A\)
  9. Let \(\bo{D}_{n\times n}\) be a diagonal matrix with diagonal elements \(D_{ii}\). What effect does multiplying a matrix \(\A_{n\times m}\) on the left by \(\bo{D}\) have? What effect does multiplying a matrix \(\A_{m\times n}\) on the right by \(\bo{D}\) have? If you cannot see this effect in a general sense, try writing out a simple \(3\times 3\) matrix as an example first.