Motivation

Multivariate linear regression before linear algebra:

$$ \begin{align*} y^{(1)} &\approx \theta_0 + \theta_1 x_1^{(1)} + \theta_2 x_2^{(1)} + \ldots + \theta_n x_n^{(1)} \\ y^{(2)} &\approx \theta_0 + \theta_1 x_1^{(2)} + \theta_2 x_2^{(2)} + \ldots + \theta_n x_n^{(2)} \\ \ldots \\ y^{(m)} &\approx \theta_0 + \theta_1 x_1^{(m)} + \theta_2 x_2^{(m)} + \ldots + \theta_n x_n^{(m)} \end{align*} $$

Multivariate linear regression after linear algebra $$ \mathbf{y} \approx X \boldsymbol{\theta} $$

Linear Algebra in ML

  • Succinct notation for models and algorithms
  • Numerical tools (save coding!) $$ \boldsymbol{\theta} = (X^{T}X)^{-1}X^T\mathbf{y} $$
  • Inspiration for new models and problems: Netflix
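The normal equation in the bullet above can be evaluated directly with NumPy. A minimal sketch with made-up data (the matrix `X`, the vector `y`, and the underlying line $y = 1 + 2x$ are invented here purely for illustration):

```python
import numpy as np

# Toy data (invented for illustration): m = 4 examples, a column of ones
# for the intercept theta_0 plus one feature column.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])   # exactly y = 1 + 2x

# theta = (X^T X)^{-1} X^T y
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(theta)   # recovers intercept 1 and slope 2
```

In practice one would call `np.linalg.lstsq` rather than forming the inverse explicitly, but the one-liner above matches the formula term by term.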

Netflix

        Gladiator   Silence of the Lambs   WALL-E   Toy Story
Alice       5               4                          1
Bob                         5                2
Carol                                        5
David       5                                          5
Eve         5               4

(blank = not rated; most of the table is missing)

Predicting the missing ratings is a matrix completion problem, commonly attacked with matrix factorization
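One common way to hold such a ratings table in code is a matrix with NaN marking the unobserved entries. A sketch with placeholder numbers (these ratings are illustrative, not the actual table entries):

```python
import numpy as np

# Rows = users, columns = movies; np.nan marks an unrated movie.
# (Placeholder ratings for illustration only.)
R = np.array([[5.0,    4.0,    np.nan, 1.0],
              [np.nan, 5.0,    2.0,    np.nan],
              [np.nan, np.nan, 5.0,    np.nan]])

observed = ~np.isnan(R)   # boolean mask of the known ratings
print(observed.sum(), "of", R.size, "entries are observed")
```

Matrix completion asks for the values in the `np.nan` positions, typically by assuming the full matrix is (approximately) a product of two small factors.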

Topics

  • Matrices
  • Vectors
  • Matrix-Matrix multiplication (and special cases)
  • Transpose
  • Inverse

Matrices

A matrix is a rectangular array of numbers

$$ A = \left[ \begin{array}{cc} 101 & 10 \\ 54 & 13 \\ 10 & 47 \end{array} \right] $$

When $A$ has $m$ rows and $n$ columns, we say that:

  • $A$ is an $m \times n$ matrix
  • $A \in \mathbb{R}^{m \times n}$

The entry in row $i$ and column $j$ is denoted $A_{ij}$

  • sometimes $a_{ij}$ or $(A)_{ij}$

Example

$$ A = \left[ \begin{array}{cc} 101 & 10 \\ 54 & 13 \\ 10 & 47 \end{array} \right] $$
  • $A \in \mathbb{R}^{3 \times 2}$
  • $A_{11}= 101$
  • $A_{32}= 47$
  • $A_{22}= 13$
  • $A_{23}=$ undefined ($A$ has only 2 columns)

Matrices in Python

In [3]:
import numpy as np

# Pass a list of lists to the np.array constructor
A = np.array([[101, 10],
              [54,  13],
              [10,  47]])

print(A)

m, n = A.shape

print("A has %d rows and %d columns" % (m, n))
[[101  10]
 [ 54  13]
 [ 10  47]]
A has 3 rows and 2 columns

Matrix Indexing in Python

Note that Python is zero-indexed

In [23]:
print(A[0, 0])   # A_11 in math
print(A[2, 1])   # A_32 in math
print(A[1, 1])   # A_22 in math
print(A[1, 2])   # A_23 in math. ERROR!
101
47
13
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-23-8a5b8a7640a3> in <module>()
      2 print(A[2, 1])   # A_32 in math
      3 print(A[1, 1])   # A_22 in math
----> 4 print(A[1, 2])   # A_23 in math. ERROR!

IndexError: index 2 is out of bounds for axis 1 with size 2

Vectors

A vector is an $n \times 1$ matrix:

$$ \mathbf{x} = \left[ \begin{array}{c} 8 \\ 2.4 \\ 1 \\ -10 \end{array} \right] $$
  • We write $\mathbf{x} \in \mathbb{R}^n$ (instead of $\mathbf{x} \in \mathbb{R}^{n \times 1}$)

  • The $i$th entry is $x_i$

Example

$$ \mathbf{x} = \begin{bmatrix} 8 \\ 2.4 \\ 1 \\ -10 \end{bmatrix} $$
  • $\mathbf{x} \in \mathbb{R}^4$
  • $x_1 = 8$
  • $x_4 = -10$
In [22]:
x = np.array([8, 2.4, 1, -10])
print(x[0])
print(x[3])
8.0
-10.0

Addition

If two matrices have the same size, we can add them by adding corresponding elements

$$ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 3 & 5 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ 2 & 4 \end{bmatrix} $$
  • Subtraction is similar
  • Matrices of different sizes cannot be added or subtracted
In [5]:
A = np.array([[1, 2], 
              [3, 4]])
B = np.array([[3, 5],
              [-1, 0]])
print(A + B)
[[4 7]
 [2 4]]
In [6]:
A = np.array([[1, 2], 
              [3, 4]])
C = np.array([[1, 2, 3]])
print(A + C)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-b01738fa6a0a> in <module>()
      2               [3, 4]])
      3 C = np.array([[1, 2, 3]])
----> 4 print(A + C)

ValueError: operands could not be broadcast together with shapes (2,2) (1,3)
In [2]:
# Beware: broadcasting. This will work, and is a nice feature, but
# is not a linear algebra operation
A = np.array([[1, 2],
              [3, 4]])
D = np.array([1, 2])
print(A + D)   # D is added to each row of A
[[2 4]
 [4 6]]
In [8]:
# Do this to broadcast a column vector
A = np.array([[1, 2], 
              [3, 4]])
D = np.array([[1], [2]])   # a 2x1 vector or "column vector"
print(A + D)
[[2 3]
 [5 6]]

Scalar Multiplication

A scalar $x \in \mathbb{R}$ is a real number (i.e., not a vector)

$$\text{e.g., } 2,\, 3,\, \pi,\, \sqrt{2},\, 1.843,\, \ldots$$

Scalar times a matrix:

$$ 2 \cdot \begin{bmatrix} 1 & 3 \\ -2 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 6 \\ -4 & 0 \end{bmatrix} $$

(multiply each entry by the scalar)

In [9]:
B = 2 * np.array([[1,3], [-2,0]])
print(B)
[[ 2  6]
 [-4  0]]

Matrix-Matrix Multiplication

Can multiply two matrices if their inner dimensions match

$$ A \in \mathbb{R}^{m \times n}, B \in \mathbb{R}^{n \times p} $$$$ C = AB \quad \in \mathbb{R}^{m \times p} $$

The product has entries $$ C_{ij} = \sum_{k=1}^n A_{ik} B_{kj} $$

Matrix-Matrix Multiplication

$$ C_{ij} = \sum_{k=1}^n A_{ik} B_{kj} $$

Move along the $i$th row of $A$ and the $j$th column of $B$. Multiply corresponding entries, then add.

$\newcommand{\r}{\mathbf}$

$$ \begin{bmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & \r{c_{32}} & c_{33}\\ c_{41} & c_{42} & c_{43} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \r{a_{31}} & \r{a_{32}} \\ a_{41} & a_{42} \\ \end{bmatrix} \begin{bmatrix} b_{11} & \r{b_{12}} & b_{13} \\ b_{21} & \r{b_{22}} & b_{23} \end{bmatrix} $$$$ c_{32} = a_{31}b_{12} + a_{32}b_{22} $$
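The entrywise formula can be checked with an explicit triple loop. A sketch for clarity only (the helper name `matmul_loops` is made up, and in practice one would always use `np.dot` instead):

```python
import numpy as np

def matmul_loops(A, B):
    """Compute C = AB via C_ij = sum_k A_ik * B_kj."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, -1.0], [0.0, 3.0]])
B = np.array([[3.0,  2.0], [-1.0, 0.0]])
print(matmul_loops(A, B))   # agrees with np.dot(A, B)
```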

Example

$$ A = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix} $$$$ AB = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 4 & 2 \\ -3 & 0 \end{bmatrix} $$
In [10]:
A = np.array([[1, -1], [0, 3]])
B = np.array([[3,  2], [-1, 0]])
print(A * B)   # NOT matrix multiplication
[[ 3 -2]
 [ 0  0]]
In [11]:
# OH NO! A*B gives elementwise multiplication!

# Use np.dot for matrix multiplication
print(np.dot(A, B))
[[ 4  2]
 [-3  0]]

Multiplication Properties

  • Associative $$ (AB)C = A(BC) $$
  • Distributive $$ A(B+C) = AB + AC $$ $$ (B+C)D = BD + CD $$
  • Not commutative, in general $$\r{AB \neq BA}$$
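These properties are easy to spot-check numerically. A sketch with random matrices (random examples illustrate but do not prove the identities; for commutativity, a single counterexample is all that is needed):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2))

# Associative and distributive laws hold up to floating-point error
print(np.allclose(np.dot(np.dot(A, B), C), np.dot(A, np.dot(B, C))))  # True
print(np.allclose(np.dot(A, B + C), np.dot(A, B) + np.dot(A, C)))     # True

# Commutativity fails for a typical random pair
print(np.allclose(np.dot(A, B), np.dot(B, A)))                        # False
```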

Matrix-Vector Multiplication

A (worthy) special case of matrix-matrix multiplication:

$$ A \in \mathbb{R}^{m \times n}, \quad \mathbf{x} \in \mathbb{R}^n $$$$ \mathbf{y} = A\mathbf{x} \in \mathbb{R}^{m} $$

Definition $$ y_i = \sum_{j=1}^n A_{ij} x_j $$

Matrix-Vector Multiplication

$$ y_i = \sum_{j=1}^n A_{ij} x_j $$$$ \begin{bmatrix} y_1 \\ y_2 \\ \r{y_3} \\ y_4 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ \r{a_{31}} & \r{a_{32}} \\ a_{41} & a_{42} \end{bmatrix} \begin{bmatrix} \r{x_1} \\ \r{x_2} \end{bmatrix} $$$$ y_3 = a_{31} x_1 + a_{32} x_2 $$
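The entrywise definition can likewise be written as a double loop. A sketch for clarity, not efficiency (the helper name `matvec_loops` is made up; `np.dot` is the real tool):

```python
import numpy as np

def matvec_loops(A, x):
    """Compute y_i = sum_j A_ij * x_j."""
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):
        for j in range(n):
            y[i] += A[i, j] * x[j]
    return y

A = np.array([[1.0, -1.0], [0.0, 3.0]])
x = np.array([1.0, -1.0])
print(matvec_loops(A, x))   # same as np.dot(A, x)
```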

Example

$$ A = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix}1 \\ -1\end{bmatrix} \quad \mathbf{z} = \begin{bmatrix}8 \\ 1.5\end{bmatrix} $$$$ A \mathbf{x} = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix} \begin{bmatrix}1 \\ -1\end{bmatrix} = $$$$ A \mathbf{z} = \qquad \qquad \qquad \quad $$
In [12]:
A = np.array([[1, -1], [0, 3]])

x = np.array([1, -1])
z = np.array([8, 1.5])

x_rowvec = np.array([[1, -1]])
x_colvec = np.array([[1], [-1]])

print(np.dot(A, x))
print(np.dot(A, z))

# print(np.dot(A, x_rowvec))  # error
print(np.dot(A, x_colvec))
[ 2 -3]
[ 6.5  4.5]
[[ 2]
 [-3]]

Transpose

Transposition of a matrix swaps the rows and columns $$ A = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix}, \quad A^T = \begin{bmatrix}1 & 0 \\ -1 & 3 \end{bmatrix}. $$

Definition:

  • Let $A \in \mathbb{R}^{m \times n}$
  • The transpose $A^T \in \mathbb{R}^{n \times m}$ has entries
$$ (A^T)_{ij} = A_{ji}. $$

Example

$\newcommand{\b}{\mathbf}$

$$ A = \begin{bmatrix} 3 & 2 \\ -1 & 0 \\ 1 & 4 \end{bmatrix}\qquad A^T = \begin{bmatrix} 3 & -1 & 1 \\ 2 & 0 & 4 \end{bmatrix} $$$$ \b{x} = \begin{bmatrix} 1 \\ -3 \\ 2\end{bmatrix} \qquad \b{x}^T = \begin{bmatrix} 1 & -3 & 2\end{bmatrix} $$
In [13]:
A = np.array([[3, 2], [-1, 0], [1, 4]])

print(A.T)  # .T is the transpose attribute of a numpy array
[[ 3 -1  1]
 [ 2  0  4]]

Dot Product

  • A special special-case of matrix-matrix multiplication
  • Let $\b{x}, \b{y}$ be vectors of same size ($\b{x}, \b{y} \in \mathbb{R}^n$).

  • Their dot product is $$ \begin{align*} \b{x}^T \b{y} &= \sum_{i=1}^n x_i y_i \\ &= \begin{bmatrix}x_1 & x_2 & \ldots & x_n \end{bmatrix} \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \end{align*} $$

In [14]:
x = np.array([1,2,3])
y = np.array([2,4,5])

print(np.dot(x, y))
25

Vector Norm

The norm of a vector is $$ \begin{align*} \| \b{x} \| &= \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2} \\ &= \sqrt{\b{x}^T \b{x}} \end{align*} $$

Geometric interpretation: length of the vector
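A quick check of the two forms of the definition, using the 3-4-5 right triangle:

```python
import numpy as np

x = np.array([3.0, 4.0])
print(np.sqrt(np.dot(x, x)))   # 5.0, straight from the definition sqrt(x^T x)
print(np.linalg.norm(x))       # 5.0, NumPy's built-in (Euclidean) norm
```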

Transpose Properties

  • Transpose of transpose

    $$ (A^T)^T = A $$

  • Transpose of sum

    $$ (A+B)^T = A^T + B^T $$

  • Transpose of product

    $$ (AB)^T = B^T A^T $$
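A numerical spot-check of the three properties (random matrices are used for illustration; note how the product rule reverses the order so the shapes line up):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
B = rng.standard_normal((3, 2))   # same shape as A, for the sum rule
C = rng.standard_normal((2, 4))   # inner dimension matches A, for the product rule

print(np.allclose(A.T.T, A))                           # (A^T)^T = A
print(np.allclose((A + B).T, A.T + B.T))               # (A+B)^T = A^T + B^T
print(np.allclose(np.dot(A, C).T, np.dot(C.T, A.T)))   # (AC)^T = C^T A^T
```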

Identity

The identity matrix $I \in \mathbb{R}^{n \times n}$ has entries

$$ I_{ij} = \begin{cases}1 & i=j \\ 0 & i \neq j\end{cases}, $$$$ I_{1 \times 1} = [1], \qquad I_{2 \times 2} = \begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix}, \qquad I_{3 \times 3} = \begin{bmatrix}1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{bmatrix}. $$

For any $A, B$ of appropriate dimensions $$ \begin{align*} IA &= A \\ BI &= B \end{align*} $$

In [15]:
# Use numpy.eye to create identity matrices of different dimensions

I = np.eye(1)
print(I)

I = np.eye(2)
print(I)

I = np.eye(3)
print(I)
[[ 1.]]
[[ 1.  0.]
 [ 0.  1.]]
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]
In [16]:
I = np.eye(2)
A = np.array([[1,2], [3,4]])

print(A)
print(np.dot(A, I))
print(np.dot(I, A))
[[1 2]
 [3 4]]
[[ 1.  2.]
 [ 3.  4.]]
[[ 1.  2.]
 [ 3.  4.]]

Inverse

The inverse $A^{-1} \in \mathbb{R}^{n \times n}$ of a square matrix $A \in \mathbb{R}^{n \times n}$ satisfies $$ AA^{-1} = I = A^{-1}A $$

Compare to division of scalars $$ x x^{-1} = 1 = x^{-1} x $$

Not all matrices are invertible

  • E.g., $A$ not square, $A = [0]$, $A = \begin{bmatrix}0 & 0\\ 0 & 0\end{bmatrix}$, many more
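NumPy reflects this: asking for the inverse of a singular matrix raises `numpy.linalg.LinAlgError`. A sketch using the all-zeros example above:

```python
import numpy as np

A = np.array([[0.0, 0.0],
              [0.0, 0.0]])   # the zero matrix has no inverse

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as e:
    print("not invertible:", e)
```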

Inverse

$$ A = \begin{bmatrix}1 & 0 \\ 0 & 2\end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2}\end{bmatrix} $$

Is $B$ the inverse of $A$?

$$ A = \begin{bmatrix}1 & -1 \\ 0 & 3 \end{bmatrix} \qquad A^{-1} = \begin{bmatrix}1 & \frac{1}{3} \\ 0 & \frac{1}{3}\end{bmatrix} $$

Verify on your own.

In [17]:
A = np.array([[1, -1], [0, 3]])

# Use numpy.linalg.inv to invert a matrix
print(np.linalg.inv(A))
[[ 1.          0.33333333]
 [ 0.          0.33333333]]

Inverse Properties

  • Inverse of inverse

    $$ (A^{-1})^{-1} = A $$

  • Inverse of product

    $$(AB)^{-1} = B^{-1}A^{-1} $$

  • Inverse of transpose $$ (A^{-1})^T = (A^T)^{-1} := A^{-T} $$
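The three properties can be spot-checked numerically (the matrices here are chosen to be invertible; as with the transpose rule, note the reversed order in the product property):

```python
import numpy as np

A = np.array([[1.0, -1.0], [0.0, 3.0]])   # det = 3, invertible
B = np.array([[2.0,  1.0], [1.0, 1.0]])   # det = 1, invertible

Ainv = np.linalg.inv(A)
Binv = np.linalg.inv(B)

print(np.allclose(np.linalg.inv(Ainv), A))                      # (A^-1)^-1 = A
print(np.allclose(np.linalg.inv(np.dot(A, B)),
                  np.dot(Binv, Ainv)))                          # (AB)^-1 = B^-1 A^-1
print(np.allclose(Ainv.T, np.linalg.inv(A.T)))                  # (A^-1)^T = (A^T)^-1
```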