# Motivation

Multivariate linear regression before linear algebra:

\begin{align*} y^{(1)} &\approx \theta_0 + \theta_1 x_1^{(1)} + \theta_2 x_2^{(1)} + \ldots + \theta_n x_n^{(1)} \\ y^{(2)} &\approx \theta_0 + \theta_1 x_1^{(2)} + \theta_2 x_2^{(2)} + \ldots + \theta_n x_n^{(2)} \\ \ldots \\ y^{(m)} &\approx \theta_0 + \theta_1 x_1^{(m)} + \theta_2 x_2^{(m)} + \ldots + \theta_n x_n^{(m)} \end{align*}

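Multivariate linear regression after linear algebra: $$\mathbf{y} \approx X \boldsymbol{\theta}$$

Here $\mathbf{y} \in \mathbb{R}^m$ stacks the targets, and row $i$ of $X$ is $(1, x_1^{(i)}, \ldots, x_n^{(i)})$, so the leading 1 multiplies $\theta_0$.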
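As a rough illustration of why the two forms agree, the sketch below evaluates the right-hand side both ways on a tiny made-up data set. The numbers and the names `features` and `theta` are assumptions for this example only, as is the convention of prepending a column of ones to $X$ so that $\theta_0$ acts as the intercept.

```python
import numpy as np

# Hypothetical data: m = 3 examples, n = 2 features
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])
theta = np.array([0.5, 2.0, -1.0])   # theta_0, theta_1, theta_2

# "Before": one equation per example
pred_loop = []
for row in features:
    pred_loop.append(theta[0] + theta[1] * row[0] + theta[2] * row[1])

# "After": prepend a column of ones, then take a single product
X = np.hstack([np.ones((features.shape[0], 1)), features])
pred_matrix = np.dot(X, theta)

print(pred_loop)
print(pred_matrix)
```

Both computations give the same three predictions; the matrix form simply hides the loop over examples.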

# Linear Algebra in ML

• Succinct notation for models and algorithms
• Numerical tools (save coding!) $$\boldsymbol{\theta} = (X^{T}X)^{-1}X^T\mathbf{y} $$
• Inspiration for new models and problems: Netflix
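The closed-form expression in the second bullet can be tried out directly. The following is only a sketch on hypothetical, noise-free data (the matrix `X` and the vector `theta_true` are invented for illustration); it uses the transpose and inverse operations that are introduced later in these notes.

```python
import numpy as np

# Hypothetical noise-free data generated from a known theta
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # first column of ones = intercept
theta_true = np.array([2.0, 0.5])
y = np.dot(X, theta_true)

# Literal translation of the formula theta = (X^T X)^{-1} X^T y
theta_inv = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y)

# Alternative: solve the linear system (X^T X) theta = X^T y
theta_solve = np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))

print(theta_inv)
print(theta_solve)
```

Because the data contain no noise, both routes should recover `theta_true` up to rounding. Solving the linear system is generally preferred in practice over forming the inverse explicitly.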

# Netflix

Gladiator Silence of the Lambs WALL-E Toy Story
Alice 5 4 1
Bob 5 2
Carol 5
David 5 5
Eve 5 4

Predicting the missing ratings is a matrix completion problem, commonly approached with matrix factorization

# Topics

• Matrices
• Vectors
• Matrix-Matrix multiplication (and special cases)
• Transpose
• Inverse

# Matrices

A matrix is a rectangular array of numbers

$$A = \left[ \begin{array}{cc} 101 & 10 \\ 54 & 13 \\ 10 & 47 \end{array} \right]$$

When $A$ has $m$ rows and $n$ columns, we say that:

• $A$ is an $m \times n$ matrix
• $A \in \mathbb{R}^{m \times n}$

The entry in row $i$ and column $j$ is denoted $A_{ij}$

• sometimes $a_{ij}$ or $(A)_{ij}$

# Example

$$A = \left[ \begin{array}{cc} 101 & 10 \\ 54 & 13 \\ 10 & 47 \end{array} \right]$$
• $A \in \mathbb{R}^{3 \times 2}$
• $A_{11}= 101$
• $A_{32}=$
• $A_{22}=$
• $A_{23}=$

# Matrices in Python

In [3]:
import numpy as np

# Pass a list of lists to the np.array constructor
A = np.array([[101, 10],
              [54,  13],
              [10,  47]])

print(A)

m, n = A.shape

print("A has %d rows and %d columns" % (m, n))

[[101  10]
[ 54  13]
[ 10  47]]
A has 3 rows and 2 columns


# Matrix Indexing in Python

Note that Python is zero-indexed, so the math entry $A_{ij}$ is A[i-1, j-1] in code

In [23]:
print(A[0,0])   # A_11 in math
print(A[2,1])   # A_32 in math
print(A[1,1])   # A_22 in math
print(A[1,2])   # A_23 in math. ERROR!

101
47
13

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-8a5b8a7640a3> in <module>()
2 print(A[2,1])   # A_32 in math
3 print(A[1,1])   # A_22 in math
----> 4 print(A[1,2])   # A_23 in math. ERROR!

IndexError: index 2 is out of bounds for axis 1 with size 2

# Vectors

A vector is an $n \times 1$ matrix:

$$\mathbf{x} = \left[ \begin{array}{c} 8 \\ 2.4 \\ 1 \\ -10 \end{array} \right]$$
• We write $\mathbf{x} \in \mathbb{R}^n$ (instead of $\mathbf{x} \in \mathbb{R}^{n \times 1}$)

• The $i$th entry is $x_i$

# Example

$$\mathbf{x} = \begin{bmatrix} 8 \\ 2.4 \\ 1 \\ -10 \end{bmatrix}$$
• $\mathbf{x} \in \mathbb{R}^4$
• $x_1 =$
• $x_4 =$
In [22]:
x = np.array([8, 2.4, 1, -10])
print(x[0])   # x_1 in math
print(x[3])   # x_4 in math

8.0
-10.0


If two matrices have the same size, we can add them by adding corresponding elements

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 3 & 5 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ 2 & 4 \end{bmatrix}$$
• Subtraction is similar
• Matrices of different sizes cannot be added or subtracted
In [5]:
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[3, 5],
              [-1, 0]])
print(A + B)

[[4 7]
[2 4]]

In [6]:
A = np.array([[1, 2],
              [3, 4]])
C = np.array([[1, 2, 3]])
print(A + C)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-b01738fa6a0a> in <module>()
2               [3, 4]])
3 C = np.array([[1, 2, 3]])
----> 4 print(A + C)

ValueError: operands could not be broadcast together with shapes (2,2) (1,3) 
In [2]:
import numpy as np

# Beware: broadcasting. This works, and is a convenient feature, but
# it is not a linear algebra operation
A = np.array([[1, 2],
              [3, 4]])
D = np.array([1, 2])
print(A + D)

[[2 4]
 [4 6]]

In [8]:
# Do this to broadcast a column vector
A = np.array([[1, 2],
              [3, 4]])
D = np.array([[1], [2]])   # a 2x1 vector or "column vector"
print(A + D)

[[2 3]
[5 6]]
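
To make explicit what broadcasting does in the two cells above, the sketch below builds the repeated matrices by hand with `np.tile` and checks that ordinary same-size addition gives the same result. The names `row`, `col`, `row_repeated` and `col_repeated` are introduced here for illustration only.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

row = np.array([1, 2])        # shape (2,): treated as a row
col = np.array([[1], [2]])    # shape (2, 1): a column vector

# Broadcasting behaves as if the smaller array were repeated to match A
row_repeated = np.tile(row, (2, 1))   # [[1, 2], [1, 2]]
col_repeated = np.tile(col, (1, 2))   # [[1, 1], [2, 2]]

print(np.array_equal(A + row, A + row_repeated))
print(np.array_equal(A + col, A + col_repeated))
```

In other words, a one-dimensional array is added to every row, while a 2x1 array is added to every column.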


# Scalar Multiplication

A scalar $x \in \mathbb{R}$ is a real number (i.e., not a vector)

$$\text{e.g., } 2,\, 3,\, \pi,\, \sqrt{2},\, 1.843,\, \ldots$$

Scalar times a matrix:

$$2 \cdot \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 6 \\ -4 & 0 \end{bmatrix}$$

(multiply each entry by the scalar)

In [9]:
B = 2 * np.array([[1, 3], [-2, 0]])
print(B)

[[ 2  6]
[-4  0]]


# Matrix-Matrix Multiplication

Can multiply two matrices if their inner dimensions match

$$A \in \mathbb{R}^{m \times n}, B \in \mathbb{R}^{n \times p}$$$$C = AB \quad \in \mathbb{R}^{m \times p}$$

The product has entries $$C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$$

# Matrix-Matrix Multiplication

$$C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$$

Move along the $i$th row of $A$ and the $j$th column of $B$. Multiply corresponding entries, then add.

$\newcommand{\r}{\mathbf}$

$$\begin{bmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & \r{c_{32}} & c_{33}\\ c_{41} & c_{42} & c_{43} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \r{a_{31}} & \r{a_{32}} \\ a_{41} & a_{42} \\ \end{bmatrix} \begin{bmatrix} b_{11} & \r{b_{12}} & b_{13} \\ b_{21} & \r{b_{22}} & b_{23} \end{bmatrix}$$$$c_{32} = a_{31}b_{12} + a_{32}b_{22}$$
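
The entrywise definition translates directly into three nested loops. This is a minimal sketch for illustration (the helper `matmul_loops` and the test matrices are made up here, with the same 4x2 and 2x3 shapes as in the picture above); in practice NumPy's built-in product is much faster.

```python
import numpy as np

def matmul_loops(A, B):
    """Multiply A (m x n) by B (n x p) using the entrywise definition."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError("inner dimensions do not match")
    C = np.zeros((m, p))
    for i in range(m):          # row of A
        for j in range(p):      # column of B
            for k in range(n):  # walk along the row and the column
                C[i, j] += A[i, k] * B[k, j]
    return C

# Hypothetical 4x2 and 2x3 matrices
A = np.arange(8).reshape(4, 2)
B = np.arange(6).reshape(2, 3)

C = matmul_loops(A, B)
print(C.shape)
print(np.allclose(C, np.dot(A, B)))
```

Note the zero-based indices: `C[2, 1]` in code corresponds to $c_{32}$ in the formula.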

# Example

$$A = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix}$$$$AB = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix} =$$
In [10]:
A = np.array([[1, -1], [0, 3]])
B = np.array([[3,  2], [-1, 0]])
print(A * B)   # NOT matrix multiplication

[[ 3 -2]
[ 0  0]]

In [11]:
# OH NO! A*B gives elementwise multiplication!

# Use np.dot for matrix multiplication
# (for 2-D arrays in Python 3.5+, A @ B is equivalent)
print(np.dot(A, B))

[[ 4  2]
[-3  0]]


# Multiplication Properties

• Associative $$(AB)C = A(BC)$$
• Distributive $$A(B+C) = AB + AC$$ $$(B+C)D = BD + CD$$
• Not commutative $$\r{AB \neq BA}$$
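
These properties can be spot-checked numerically. The sketch below reuses $A$ and $B$ from the earlier example and adds two further matrices, `C` and `D`, that are made up for illustration. A check on one set of matrices is of course not a proof.

```python
import numpy as np

A = np.array([[1, -1], [0, 3]])
B = np.array([[3, 2], [-1, 0]])
C = np.array([[0, 1], [2, -1]])   # hypothetical
D = np.array([[1, 0], [1, 1]])    # hypothetical

# Associative: (AB)C = A(BC)
print(np.array_equal(np.dot(np.dot(A, B), C), np.dot(A, np.dot(B, C))))

# Distributive, from the left and from the right
print(np.array_equal(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C)))
print(np.array_equal(np.dot(B + C, D), np.dot(B, D) + np.dot(C, D)))

# Not commutative: AB and BA differ for this pair
print(np.dot(A, B))
print(np.dot(B, A))
```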

# Matrix-Vector Multiplication

A (worthy) special case of matrix-matrix multiplication:

$$A \in \mathbb{R}^{m \times n}, \quad \mathbf{x} \in \mathbb{R}^n$$$$\mathbf{y} = A\mathbf{x} \in \mathbb{R}^{m}$$

Definition $$y_i = \sum_{j=1}^n A_{ij} x_j$$

# Matrix-Vector Multiplication

$$y_i = \sum_{j=1}^n A_{ij} x_j$$$$\begin{bmatrix} y_1 \\ y_2 \\ \r{y_3} \\ y_4 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \r{a_{31}} & \r{a_{32}} \\ a_{41} & a_{42} \end{bmatrix} \begin{bmatrix} \r{x_1} \\ \r{x_2} \end{bmatrix}$$$$y_3 = a_{31} x_1 + a_{32} x_2$$

# Example

$$A = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix}1 \\ -1\end{bmatrix} \quad \mathbf{z} = \begin{bmatrix}8 \\ 1.5\end{bmatrix}$$$$A \mathbf{x} = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix}1 \\ -1\end{bmatrix} =$$$$A \mathbf{z} = \qquad \qquad \qquad \quad$$
In [12]:
A = np.array([[1, -1], [0, 3]])

x = np.array([1, -1])
z = np.array([8, 1.5])

x_rowvec = np.array([[1, -1]])
x_colvec = np.array([[1], [-1]])

print(np.dot(A, x))
print(np.dot(A, z))

# print(np.dot(A, x_rowvec))  # error: shapes (2,2) and (1,2) do not align
print(np.dot(A, x_colvec))

[ 2 -3]
[ 6.5  4.5]
[[ 2]
[-3]]


# Transpose

Transposition of a matrix swaps the rows and columns $$A = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix}, \quad A^T = \begin{bmatrix}1 & 0 \\ -1 & 3 \end{bmatrix}.$$

Definition:

• Let $A \in \mathbb{R}^{m \times n}$
• The transpose $A^T \in \mathbb{R}^{n \times m}$ has entries
$$(A^T)_{ij} = A_{ji}.$$

# Example

$\newcommand{\b}{\mathbf}$

$$A = \begin{bmatrix} 3 & 2 \\ -1 & 0 \\ 1 & 4 \end{bmatrix}\qquad A^T = \begin{bmatrix} 3 & -1 & 1 \\ 2 & 0 & 4 \end{bmatrix}$$$$\b{x} = \begin{bmatrix} 1 \\ -3 \\ 2\end{bmatrix} \qquad \b{x}^T = \begin{bmatrix} 1 & -3 & 2\end{bmatrix}$$
In [13]:
A = np.array([[3, 2], [-1, 0], [1, 4]])

print(A.T)  # a NumPy array exposes its transpose as the .T attribute

[[ 3 -1  1]
[ 2  0  4]]


# Dot Product

• A special special-case of matrix-matrix multiplication
• Let $\b{x}, \b{y}$ be vectors of same size ($\b{x}, \b{y} \in \mathbb{R}^n$).

• Their dot product is \begin{align*} \b{x}^T \b{y} &= \sum_{i=1}^n x_i y_i \\ &= \begin{bmatrix}x_1 & x_2 & \ldots & x_n \end{bmatrix} \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \end{align*}

In [14]:
x = np.array([1,2,3])
y = np.array([2,4,5])

print(np.dot(x, y))

25


# Vector Norm

The norm of a vector is \begin{align*} \| \b{x} \| &= \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2} \\ &= \sqrt{\b{x}^T \b{x}} \end{align*}

Geometric interpretation: length of the vector
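
As a quick check of the two expressions for the norm, the sketch below computes it three ways for a made-up vector (chosen so that the length is 5). `np.linalg.norm` is NumPy's built-in routine, which by default returns this same Euclidean norm for a one-dimensional array.

```python
import numpy as np

x = np.array([3.0, 4.0])   # hypothetical vector; its length should be 5

norm_from_sum = np.sqrt(np.sum(x ** 2))   # sqrt(x_1^2 + ... + x_n^2)
norm_from_dot = np.sqrt(np.dot(x, x))     # sqrt(x^T x)
norm_builtin = np.linalg.norm(x)

print(norm_from_sum, norm_from_dot, norm_builtin)
```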

# Transpose Properties

• Transpose of transpose

$$(A^T)^T = A$$

• Transpose of sum

$$(A+B)^T = A^T + B^T$$

• Transpose of product

$$(AB)^T = B^T A^T$$
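
A small numerical spot check of the three properties. Here `A` is the 3x2 matrix from the transpose example, while `B` and `C` are made up for illustration.

```python
import numpy as np

A = np.array([[3, 2], [-1, 0], [1, 4]])    # 3 x 2, from the transpose example
B = np.array([[1, 0], [2, 5], [-3, 1]])    # 3 x 2, hypothetical
C = np.array([[1, -1, 2], [0, 3, 1]])      # 2 x 3, hypothetical

# Transpose of transpose
print(np.array_equal(A.T.T, A))

# Transpose of sum
print(np.array_equal((A + B).T, A.T + B.T))

# Transpose of product: note the reversed order on the right-hand side
print(np.array_equal(np.dot(A, C).T, np.dot(C.T, A.T)))
```

The order reversal matters: with these shapes, $(AC)^T$ is 3x3, whereas $A^T C^T$ would be 2x2.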

# Identity

The identity matrix $I \in \mathbb{R}^{n \times n}$ has entries

$$I_{ij} = \begin{cases}1 & i=j \\ 0 & i \neq j\end{cases},$$$$I_{1 \times 1} = [1], \qquad I_{2 \times 2} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}, \qquad I_{3 \times 3} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix}.$$

For any $A, B$ of appropriate dimensions \begin{align*} IA &= A \\ BI &= B \end{align*}

In [15]:
# Use numpy.eye to create identity matrices of different dimensions

I = np.eye(1)
print(I)

I = np.eye(2)
print(I)

I = np.eye(3)
print(I)

[[ 1.]]
[[ 1.  0.]
[ 0.  1.]]
[[ 1.  0.  0.]
[ 0.  1.  0.]
[ 0.  0.  1.]]

In [16]:
I = np.eye(2)
A = np.array([[1,2], [3,4]])

print(A)
print(np.dot(A, I))   # AI = A (printed as floats because np.eye returns floats)
print(np.dot(I, A))   # IA = A

[[1 2]
[3 4]]
[[ 1.  2.]
[ 3.  4.]]
[[ 1.  2.]
[ 3.  4.]]


# Inverse

The inverse $A^{-1} \in \mathbb{R}^{n \times n}$ of a square matrix $A \in \mathbb{R}^{n \times n}$ satisfies $$AA^{-1} = I = A^{-1}A$$

Compare to division of scalars $$x x^{-1} = 1 = x^{-1} x$$

Not all matrices are invertible

• E.g., $A$ not square, $A = [0]$, $A = \begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}$, many more
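
In NumPy, asking for the inverse of such a matrix raises `np.linalg.LinAlgError`. The sketch below wraps the call in a hypothetical helper, `try_inverse`, and tries it on the non-invertible examples listed above plus one invertible diagonal matrix for comparison.

```python
import numpy as np

def try_inverse(M):
    """Return the inverse of M, or None if NumPy reports that none exists."""
    try:
        return np.linalg.inv(M)
    except np.linalg.LinAlgError as err:
        print("not invertible:", err)
        return None

zero_1x1 = np.array([[0.0]])
zero_2x2 = np.zeros((2, 2))
not_square = np.ones((2, 3))
diagonal = np.array([[1.0, 0.0], [0.0, 2.0]])   # invertible, for comparison

for M in (zero_1x1, zero_2x2, not_square, diagonal):
    print(try_inverse(M))
```

One caveat: the error is only raised for matrices that are exactly singular or not square. A nearly singular matrix may still return a numerically unreliable result without any error.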

# Inverse

$$A = \begin{bmatrix}1 & 0 \\ 0 & 2\end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2}\end{bmatrix}$$

Is $B$ the inverse of $A$?

$$A = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix} \qquad A^{-1} = \begin{bmatrix}1 & \frac{1}{3} \\ 0 & \frac{1}{3}\end{bmatrix}$$

In [17]:
A = np.array([[1, -1], [0, 3]])

# Use numpy.linalg.inv to invert a matrix
print(np.linalg.inv(A))

[[ 1.          0.33333333]
[ 0.          0.33333333]]
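
One way to confirm a candidate inverse is to multiply it with the original matrix on both sides and compare with the identity. The sketch below does this for the computed inverse and for the diagonal pair from the question above. It uses `np.allclose` rather than exact equality because the entries are floating-point numbers; the names `A_inv` and `A_diag` are introduced here for illustration.

```python
import numpy as np

A = np.array([[1.0, -1.0], [0.0, 3.0]])
A_inv = np.linalg.inv(A)
I = np.eye(2)

# Both products should equal the identity, up to floating-point rounding
print(np.allclose(np.dot(A, A_inv), I))
print(np.allclose(np.dot(A_inv, A), I))

# The diagonal pair from the question above
A_diag = np.array([[1.0, 0.0], [0.0, 2.0]])
B = np.array([[1.0, 0.0], [0.0, 0.5]])
print(np.allclose(np.dot(A_diag, B), I) and np.allclose(np.dot(B, A_diag), I))
```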


# Inverse Properties

• Inverse of inverse

$$(A^{-1})^{-1} = A$$

• Inverse of product

$$(AB)^{-1} = B^{-1}A^{-1}$$

• Inverse of transpose $$(A^{-1})^T = (A^T)^{-1} := A^{-T}$$
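
A numerical spot check of the three properties, reusing $A$ and $B$ from the matrix multiplication example (both are invertible, since their determinants are 3 and 2). The alias `inv` is introduced here only for brevity.

```python
import numpy as np

A = np.array([[1.0, -1.0], [0.0, 3.0]])
B = np.array([[3.0, 2.0], [-1.0, 0.0]])
inv = np.linalg.inv

# Inverse of inverse
print(np.allclose(inv(inv(A)), A))

# Inverse of product: note the reversed order
print(np.allclose(inv(np.dot(A, B)), np.dot(inv(B), inv(A))))

# Inverse of transpose
print(np.allclose(inv(A).T, inv(A.T)))
```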