Motivation

Multivariate linear regression before linear algebra:

$$ \begin{align*} y^{(1)} &\approx \theta_0 + \theta_1 x_1^{(1)} + \theta_2 x_2^{(1)} + \ldots + \theta_n x_n^{(1)} \\ y^{(2)} &\approx \theta_0 + \theta_1 x_1^{(2)} + \theta_2 x_2^{(2)} + \ldots + \theta_n x_n^{(2)} \\ \ldots \\ y^{(m)} &\approx \theta_0 + \theta_1 x_1^{(m)} + \theta_2 x_2^{(m)} + \ldots + \theta_n x_n^{(m)} \end{align*} $$

Multivariate linear regression after linear algebra $$ \mathbf{y} \approx X \boldsymbol{\theta} $$

Linear Algebra in ML

  • Succinct notation for models and algorithms
  • Numerical tools (save coding!) $$ \boldsymbol{\theta} = (X^{T}X)^{-1}X^T\mathbf{y} $$
  • Inspiration for new models and problems: Netflix
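The normal equation in the bullet above can be evaluated directly with NumPy. A minimal sketch with made-up data (the matrix `X`, the vector `y`, and the underlying line $y = 1 + 2x$ are invented here purely for illustration):

```python
import numpy as np

# Toy data (invented for illustration): m = 4 examples, a column of ones
# for the intercept theta_0 plus one feature column.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 1 + 2x

# theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(theta)   # recovers intercept 1 and slope 2
```

In practice one would call `np.linalg.lstsq` rather than forming the inverse explicitly, but the one-liner above matches the formula term by term.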

Netflix

        Gladiator   Silence of the Lambs   WALL-E   Toy Story
Alice       5               4                          1
Bob                         5                2
Carol                                        5
David       5                                          5
Eve         5               4

(blank = not rated; most of the table is missing)

Predicting the missing ratings is a matrix completion problem, commonly attacked with matrix factorization
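One common way to hold such a ratings table in code is a matrix with NaN marking the unobserved entries. A sketch with placeholder numbers (these ratings are illustrative, not the actual table entries):

```python
import numpy as np

# Rows = users, columns = movies; np.nan marks an unrated movie.
# (Placeholder ratings for illustration only.)
R = np.array([[5.0,    4.0,    np.nan, 1.0],
              [np.nan, 5.0,    2.0,    np.nan],
              [np.nan, np.nan, 5.0,    np.nan]])

observed = ~np.isnan(R)   # boolean mask of the known ratings
print(observed.sum(), "of", R.size, "entries are observed")
```

Matrix completion asks for the values in the `np.nan` positions, typically by assuming the full matrix is (approximately) a product of two small factors.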

Topics

  • Matrices
  • Vectors
  • Matrix-Matrix multiplication (and special cases)
  • Transpose
  • Inverse

Matrices

A matrix is a rectangular array of numbers

$$ A = \left[ \begin{array}{cc} 101 & 10 \\ 54 & 13 \\ 10 & 47 \end{array} \right] $$

When $A$ has $m$ rows and $n$ columns, we say that:

  • $A$ is an $m \times n$ matrix
  • $A \in \mathbb{R}^{m \times n}$

The entry in row $i$ and column $j$ is denoted $A_{ij}$

  • sometimes $a_{ij}$ or $(A)_{ij}$

Example

$$ A = \left[ \begin{array}{cc} 101 & 10 \\ 54 & 13 \\ 10 & 47 \end{array} \right] $$
  • $A \in \mathbb{R}^{3 \times 2}$
  • $A_{11}= 101$
  • $A_{32}= 47$
  • $A_{22}= 13$
  • $A_{23}=$ undefined ($A$ has only 2 columns)

Matrices in Python

In [3]:
import numpy as np

# Pass a list of lists to the np.array constructor
A = np.array([[101, 10],
              [54,  13],
              [10,  47]])

print(A)

m, n = A.shape

print("A has %d rows and %d columns" % (m, n))
[[101  10]
 [ 54  13]
 [ 10  47]]
A has 3 rows and 2 columns

Matrix Indexing in Python

Note that Python is zero-indexed

In [23]:
print(A[0, 0])   # A_11 in math
print(A[2, 1])   # A_32 in math
print(A[1, 1])   # A_22 in math
print(A[1, 2])   # A_23 in math. ERROR!
101
47
13
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-8a5b8a7640a3> in <module>()
      2 print(A[2, 1])   # A_32 in math
      3 print(A[1, 1])   # A_22 in math
----> 4 print(A[1, 2])   # A_23 in math. ERROR!

IndexError: index 2 is out of bounds for axis 1 with size 2

Vectors

A vector is an $n \times 1$ matrix:

$$ \mathbf{x} = \left[ \begin{array}{c} 8 \\ 2.4 \\ 1 \\ -10 \end{array} \right] $$
  • We write $\mathbf{x} \in \mathbb{R}^n$ (instead of $\mathbf{x} \in \mathbb{R}^{n \times 1}$)

  • The $i$th entry is $x_i$

Example

$$ \mathbf{x} = \begin{bmatrix} 8 \\ 2.4 \\ 1 \\ -10 \end{bmatrix} $$
  • $\mathbf{x} \in \mathbb{R}^4$
  • $x_1 = 8$
  • $x_4 = -10$
In [22]:
x = np.array([8, 2.4, 1, -10])
print(x[0])
print(x[3])
8.0
-10.0

Addition

If two matrices have the same size, we can add them by adding corresponding elements

$$ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 3 & 5 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ 2 & 4 \end{bmatrix} $$
  • Subtraction is similar
  • Matrices of different sizes cannot be added or subtracted
In [5]:
A = np.array([[1, 2], 
              [3, 4]])
B = np.array([[3, 5],
              [-1, 0]])
print(A + B)
[[4 7]
 [2 4]]
In [6]:
A = np.array([[1, 2], 
              [3, 4]])
C = np.array([[1, 2, 3]])
print(A + C)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-b01738fa6a0a> in <module>()
      2               [3, 4]])
      3 C = np.array([[1, 2, 3]])
----> 4 print(A + C)

ValueError: operands could not be broadcast together with shapes (2,2) (1,3)
In [2]:
# Beware: broadcasting. This will work, and is a nice feature, but
# is not a linear algebra operation
A = np.array([[1, 2],
              [3, 4]])
D = np.array([1, 2])
print(A + D)   # D is added to each row of A
[[2 4]
 [4 6]]
In [8]:
# Do this to broadcast a column vector
A = np.array([[1, 2], 
              [3, 4]])
D = np.array([[1], [2]])   # a 2x1 vector or "column vector"
print(A + D)
[[2 3]
 [5 6]]

Scalar Multiplication

A scalar $x \in \mathbb{R}$ is a real number (i.e., not a vector)

$$\text{e.g., } 2,\, 3,\, \pi,\, \sqrt{2},\, 1.843,\, \ldots$$

Scalar times a matrix:

$$ 2 \cdot \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 6 \\ -4 & 0 \end{bmatrix} $$

(multiply each entry by the scalar)

In [9]:
B = 2 * np.array([[1,3], [-2,0]])
print(B)
[[ 2  6]
 [-4  0]]

Matrix-Matrix Multiplication

Can multiply two matrices if their inner dimensions match

$$ A \in \mathbb{R}^{m \times n}, B \in \mathbb{R}^{n \times p} $$$$ C = AB \quad \in \mathbb{R}^{m \times p} $$

The product has entries $$ C_{ij} = \sum_{k=1}^n A_{ik} B_{kj} $$

Matrix-Matrix Multiplication

$$ C_{ij} = \sum_{k=1}^n A_{ik} B_{kj} $$

Move along the $i$th row of $A$ and the $j$th column of $B$. Multiply corresponding entries, then add.

$\newcommand{\r}{\mathbf}$

$$ \begin{bmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & \r{c_{32}} & c_{33}\\ c_{41} & c_{42} & c_{43} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \r{a_{31}} & \r{a_{32}} \\ a_{41} & a_{42} \\ \end{bmatrix} \begin{bmatrix} b_{11} & \r{b_{12}} & b_{13} \\ b_{21} & \r{b_{22}} & b_{23} \end{bmatrix} $$$$ c_{32} = a_{31}b_{12} + a_{32}b_{22} $$
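The entrywise formula can be checked with an explicit triple loop. A sketch for clarity only (the helper name `matmul_loops` is made up, and in practice one would always use `np.dot` instead):

```python
import numpy as np

def matmul_loops(A, B):
    """Compute C = AB via C_ij = sum_k A_ik * B_kj."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, -1.0], [0.0, 3.0]])
B = np.array([[3.0,  2.0], [-1.0, 0.0]])
print(matmul_loops(A, B))   # agrees with np.dot(A, B)
```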

Example

$$ A = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix} $$$$ AB = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ -3 & 0 \end{bmatrix} $$
In [10]:
A = np.array([[1, -1], [0, 3]])
B = np.array([[3,  2], [-1, 0]])
print(A * B)   # NOT matrix multiplication
[[ 3 -2]
 [ 0  0]]
In [11]:
# OH NO! A*B gives elementwise multiplication!

# Use np.dot for matrix multiplication
print(np.dot(A, B))
[[ 4  2]
 [-3  0]]

Multiplication Properties

  • Associative $$ (AB)C = A(BC) $$
  • Distributive $$ A(B+C) = AB + AC $$ $$ (B+C)D = BD + CD $$
  • Not commutative, in general $$\r{AB \neq BA}$$
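These properties are easy to spot-check numerically. A sketch with random matrices (random examples illustrate but do not prove the identities; for commutativity, a single counterexample is all that is needed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2))

# Associative and distributive laws hold up to floating-point error
print(np.allclose(np.dot(np.dot(A, B), C), np.dot(A, np.dot(B, C))))  # True
print(np.allclose(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C)))     # True

# Commutativity fails for a typical random pair
print(np.allclose(np.dot(A, B), np.dot(B, A)))                        # False
```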

Matrix-Vector Multiplication

A (worthy) special case of matrix-matrix multiplication:

$$ A \in \mathbb{R}^{m \times n}, \quad \mathbf{x} \in \mathbb{R}^n $$$$ \mathbf{y} = A\mathbf{x} \in \mathbb{R}^{m} $$

Definition $$ y_i = \sum_{j=1}^n A_{ij} x_j $$

Matrix-Vector Multiplication

$$ y_i = \sum_{j=1}^n A_{ij} x_j $$$$ \begin{bmatrix} y_1 \\ y_2 \\ \r{y_3} \\ y_4 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \r{a_{31}} & \r{a_{32}} \\ a_{41} & a_{42} \end{bmatrix} \begin{bmatrix} \r{x_1} \\ \r{x_2} \end{bmatrix} $$$$ y_3 = a_{31} x_1 + a_{32} x_2 $$
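The entrywise definition can likewise be written as a double loop. A sketch for clarity, not efficiency (the helper name `matvec_loops` is made up; `np.dot` is the real tool):

```python
import numpy as np

def matvec_loops(A, x):
    """Compute y_i = sum_j A_ij * x_j."""
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]
    return y

A = np.array([[1.0, -1.0], [0.0, 3.0]])
x = np.array([1.0, -1.0])
print(matvec_loops(A, x))   # same as np.dot(A, x)
```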

Example

$$ A = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix}1 \\ -1\end{bmatrix} \quad \mathbf{z} = \begin{bmatrix}8 \\ 1.5\end{bmatrix} $$$$ A \mathbf{x} = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix}1 \\ -1\end{bmatrix} = $$$$ A \mathbf{z} = \qquad \qquad \qquad \quad $$
In [12]:
A = np.array([[1, -1], [0, 3]])

x = np.array([1, -1])
z = np.array([8, 1.5])

x_rowvec = np.array([[1, -1]])
x_colvec = np.array([[1], [-1]])

print(np.dot(A, x))
print(np.dot(A, z))

# print(np.dot(A, x_rowvec))  # error
print(np.dot(A, x_colvec))
[ 2 -3]
[ 6.5  4.5]
[[ 2]
 [-3]]

Transpose

Transposition of a matrix swaps the rows and columns $$ A = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix}, \quad A^T = \begin{bmatrix}1 & 0 \\ -1 & 3 \end{bmatrix}. $$

Definition:

  • Let $A \in \mathbb{R}^{m \times n}$
  • The transpose $A^T \in \mathbb{R}^{n \times m}$ has entries
$$ (A^T)_{ij} = A_{ji}. $$

Example

$\newcommand{\b}{\mathbf}$

$$ A = \begin{bmatrix} 3 & 2 \\ -1 & 0 \\ 1 & 4 \end{bmatrix}\qquad A^T = \begin{bmatrix} 3 & -1 & 1 \\ 2 & 0 & 4 \end{bmatrix} $$$$ \b{x} = \begin{bmatrix} 1 \\ -3 \\ 2\end{bmatrix} \qquad \b{x}^T = \begin{bmatrix} 1 & -3 & 2\end{bmatrix} $$
In [13]:
A = np.array([[3, 2], [-1, 0], [1, 4]])

print(A.T)  # .T is the transpose attribute of a numpy array
[[ 3 -1  1]
 [ 2  0  4]]

Dot Product

  • A special special-case of matrix-matrix multiplication
  • Let $\b{x}, \b{y}$ be vectors of same size ($\b{x}, \b{y} \in \mathbb{R}^n$).

  • Their dot product is $$ \begin{align*} \b{x}^T \b{y} &= \sum_{i=1}^n x_i y_i \\ &= \begin{bmatrix}x_1 & x_2 & \ldots & x_n \end{bmatrix} \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \end{align*} $$

In [14]:
x = np.array([1,2,3])
y = np.array([2,4,5])

print(np.dot(x, y))
25

Vector Norm

The norm of a vector is $$ \begin{align*} \| \b{x} \| &= \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2} \\ &= \sqrt{\b{x}^T \b{x}} \end{align*} $$

Geometric interpretation: length of the vector
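A quick check of the two forms of the definition, using the 3-4-5 right triangle:

```python
import numpy as np

x = np.array([3.0, 4.0])
print(np.sqrt(np.dot(x, x)))   # 5.0, straight from the definition sqrt(x^T x)
print(np.linalg.norm(x))       # 5.0, NumPy's built-in (Euclidean) norm
```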

Transpose Properties

  • Transpose of transpose

    $$ (A^T)^T = A $$

  • Transpose of sum

    $$ (A+B)^T = A^T + B^T $$

  • Transpose of product

    $$ (AB)^T = B^T A^T $$
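A numerical spot-check of the three properties (random matrices are used for illustration; note how the product rule reverses the order so the shapes line up):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))   # same shape as A, for the sum rule
C = rng.standard_normal((2, 4))   # inner dimension matches A, for the product rule

print(np.allclose(A.T.T, A))                           # (A^T)^T = A
print(np.allclose((A + B).T, A.T + B.T))               # (A+B)^T = A^T + B^T
print(np.allclose(np.dot(A, C).T, np.dot(C.T, A.T)))   # (AC)^T = C^T A^T
```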

Identity

The identity matrix $I \in \mathbb{R}^{n \times n}$ has entries

$$ I_{ij} = \begin{cases}1 & i=j \\ 0 & i \neq j\end{cases}, $$$$ I_{1 \times 1} = [1], \qquad I_{2 \times 2} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}, \qquad I_{3 \times 3} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix}. $$

For any $A, B$ of appropriate dimensions $$ \begin{align*} IA &= A \\ BI &= B \end{align*} $$

In [15]:
# Use numpy.eye to create identity matrices of different dimensions

I = np.eye(1)
print(I)

I = np.eye(2)
print(I)

I = np.eye(3)
print(I)
[[ 1.]]
[[ 1.  0.]
 [ 0.  1.]]
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
In [16]:
I = np.eye(2)
A = np.array([[1,2], [3,4]])

print(A)
print(np.dot(A, I))
print(np.dot(I, A))
[[1 2]
 [3 4]]
[[ 1.  2.]
 [ 3.  4.]]
[[ 1.  2.]
 [ 3.  4.]]

Inverse

The inverse $A^{-1} \in \mathbb{R}^{n \times n}$ of a square matrix $A \in \mathbb{R}^{n \times n}$ satisfies $$ AA^{-1} = I = A^{-1}A $$

Compare to division of scalars $$ x x^{-1} = 1 = x^{-1} x $$

Not all matrices are invertible

  • E.g., $A$ not square, $A = [0]$, $A = \begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}$, many more
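NumPy reflects this: asking for the inverse of a singular matrix raises `numpy.linalg.LinAlgError`. A sketch using the all-zeros example above:

```python
import numpy as np

A = np.array([[0.0, 0.0],
              [0.0, 0.0]])   # the zero matrix has no inverse

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as e:
    print("not invertible:", e)
```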

Inverse

$$ A = \begin{bmatrix}1 & 0 \\ 0 & 2\end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2}\end{bmatrix} $$

Is $B$ the inverse of $A$?

$$ A = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix} \qquad A^{-1} = \begin{bmatrix}1 & \frac{1}{3} \\ 0 & \frac{1}{3}\end{bmatrix} $$

Verify on your own.

In [17]:
A = np.array([[1, -1], [0, 3]])

# Use numpy.linalg.inv to invert a matrix
print(np.linalg.inv(A))
[[ 1.          0.33333333]
 [ 0.          0.33333333]]

Inverse Properties

  • Inverse of inverse

    $$ (A^{-1})^{-1} = A $$

  • Inverse of product

    $$(AB)^{-1} = B^{-1}A^{-1} $$

  • Inverse of transpose $$ (A^{-1})^T = (A^T)^{-1} := A^{-T} $$
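The three properties can be spot-checked numerically (the matrices here are chosen to be invertible; as with the transpose rule, note the reversed order in the product property):

```python
import numpy as np

A = np.array([[1.0, -1.0], [0.0, 3.0]])   # det = 3, invertible
B = np.array([[2.0,  1.0], [1.0, 1.0]])   # det = 1, invertible

Ainv = np.linalg.inv(A)
Binv = np.linalg.inv(B)

print(np.allclose(np.linalg.inv(Ainv), A))                      # (A^-1)^-1 = A
print(np.allclose(np.linalg.inv(np.dot(A, B)),
                  np.dot(Binv, Ainv)))                          # (AB)^-1 = B^-1 A^-1
print(np.allclose(Ainv.T, np.linalg.inv(A.T)))                  # (A^-1)^T = (A^T)^-1
```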