
Linear Algebra

This guide covers the essential linear algebra concepts needed for machine learning, including vectors, matrices, matrix calculus, and their applications.

Basics

Vector: $\mathbf{x} = [x_1, x_2, \dots, x_n]^T$

Matrix: $\mathbf{A} \in \mathbb{R}^{m \times n}$

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

Transpose: $(\mathbf{A}^T)_{ij} = \mathbf{A}_{ji}$

Matrix Multiplication: $\mathbf{C} = \mathbf{AB}$ where $C_{ij} = \sum_k A_{ik}B_{kj}$

Identity Matrix: $\mathbf{I}_n$ where $I_{ij} = 1$ if $i = j$, else $0$

Inverse: $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$
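
These definitions map directly onto NumPy. A minimal sanity-check sketch (the 2×2 matrices are arbitrary examples, not taken from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # arbitrary example matrix
B = np.array([[0.0, 1.0],
              [1.0, 1.0]])

C = A @ B                         # matrix multiplication: C_ij = sum_k A_ik * B_kj
I = np.eye(2)                     # identity matrix I_2
A_inv = np.linalg.inv(A)          # inverse (exists here since det(A) = -2 != 0)

assert np.allclose(A @ I, A)              # A I = A
assert np.allclose(A_inv @ A, I)          # A^{-1} A = I
assert np.allclose((A @ B).T, B.T @ A.T)  # (AB)^T = B^T A^T (first property below)
```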

Properties:

  • $(\mathbf{AB})^T = \mathbf{B}^T\mathbf{A}^T$
  • $(\mathbf{A}^T)^T = \mathbf{A}$
  • $(\mathbf{A} + \mathbf{B})^T = \mathbf{A}^T + \mathbf{B}^T$
  • $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$
  • $(\mathbf{A}^T)^{-1} = (\mathbf{A}^{-1})^T$

Trace and Determinant

Trace: Sum of diagonal elements

$$\text{tr}(\mathbf{A}) = \sum_{i=1}^n A_{ii}$$

Properties:

  • $\text{tr}(\mathbf{A} + \mathbf{B}) = \text{tr}(\mathbf{A}) + \text{tr}(\mathbf{B})$
  • $\text{tr}(\mathbf{AB}) = \text{tr}(\mathbf{BA})$
  • $\text{tr}(\mathbf{A}^T) = \text{tr}(\mathbf{A})$

Determinant: $\det(\mathbf{A})$ or $|\mathbf{A}|$

For a 2×2 matrix:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$
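
Both quantities are one-liners in NumPy. A quick check on an arbitrary 2×2 example:

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [2.0, 4.0]])   # arbitrary 2x2 example

print(np.trace(M))           # 3 + 4 = 7, the sum of the diagonal
print(np.linalg.det(M))      # 3*4 - 1*2 = 10, the ad - bc formula above
```
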
Eigenvalues and Eigenvectors

For a square matrix $\mathbf{A}$:

$$\mathbf{A}\mathbf{v} = \lambda \mathbf{v}$$

where $\lambda$ is an eigenvalue and $\mathbf{v}$ is the corresponding eigenvector. A numerical check follows the properties below.

Properties:

  • $\det(\mathbf{A}) = \prod_i \lambda_i$
  • $\text{tr}(\mathbf{A}) = \sum_i \lambda_i$
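
A minimal NumPy check of the eigenvalue equation and the two properties above, using an arbitrary symmetric example so the eigenvalues are real:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                  # arbitrary symmetric example

eigvals, eigvecs = np.linalg.eig(A)         # columns of eigvecs are eigenvectors

v0 = eigvecs[:, 0]
assert np.allclose(A @ v0, eigvals[0] * v0)            # A v = lambda v
assert np.isclose(np.prod(eigvals), np.linalg.det(A))  # det(A) = product of eigenvalues
assert np.isclose(np.sum(eigvals), np.trace(A))        # tr(A) = sum of eigenvalues
```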

Matrix Calculus

Matrix-Vector Product Derivatives:

$$\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{A}} = \mathbf{x}^T$$
$$\frac{\partial \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = \mathbf{A}$$
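
The second identity says that the Jacobian of $\mathbf{x} \mapsto \mathbf{A}\mathbf{x}$ is $\mathbf{A}$ itself. A finite-difference sketch with arbitrary example sizes:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian of f at x with central differences."""
    y = f(x)
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0]])      # arbitrary 2x3 example
x = np.array([0.5, -1.0, 2.0])

J = numerical_jacobian(lambda v: A @ v, x)
assert np.allclose(J, A, atol=1e-5)  # d(Ax)/dx = A
```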

Matrix Product Derivatives:

$$\frac{\partial \mathbf{AB}}{\partial \mathbf{A}} = \mathbf{B}^T$$
$$\frac{\partial \mathbf{AB}}{\partial \mathbf{B}} = \mathbf{A}^T$$

Dot Product Derivatives:

For $z = \mathbf{y}^T\mathbf{x}$:

$$\frac{\partial z}{\partial \mathbf{x}} = \mathbf{y}, \quad \frac{\partial z}{\partial \mathbf{y}} = \mathbf{x}$$
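
Componentwise, $z = \sum_i y_i x_i$, so differentiating with respect to a single entry gives

$$\frac{\partial z}{\partial x_j} = y_j, \qquad \frac{\partial z}{\partial y_j} = x_j,$$

and stacking over $j$ recovers the vector formulas above.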

Quadratic Form (very important!):

For $f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$:

$$\frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\mathbf{x}$$

If $\mathbf{A}$ is symmetric:

$$\frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}$$
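
Because the $(\mathbf{A} + \mathbf{A}^T)$ factor is easy to forget when $\mathbf{A}$ is not symmetric, a finite-difference check is a useful sanity test (arbitrary non-symmetric example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])       # deliberately non-symmetric
x = np.array([0.7, -1.2])

f = lambda v: v @ A @ v          # quadratic form x^T A x

eps = 1e-6
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])   # central differences

assert np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5)  # matches (A + A^T) x
assert not np.allclose(grad_fd, 2 * A @ x)             # 2Ax holds only for symmetric A
```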

Linear Form:

$$\frac{\partial (\mathbf{a}^T\mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}$$
$$\frac{\partial (\mathbf{x}^T\mathbf{a})}{\partial \mathbf{x}} = \mathbf{a}$$

Norm Squared:

$$\frac{\partial (\mathbf{x}^T\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{x}$$
$$\frac{\partial \|\mathbf{x}\|^2}{\partial \mathbf{x}} = 2\mathbf{x}$$
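
The norm-squared rule is the symmetric quadratic form with $\mathbf{A} = \mathbf{I}$; it also follows componentwise:

$$\frac{\partial}{\partial x_j} \sum_i x_i^2 = 2x_j \quad \Rightarrow \quad \frac{\partial \|\mathbf{x}\|^2}{\partial \mathbf{x}} = 2\mathbf{x}$$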

Common ML Applications

Linear Regression Loss:

$$L(\mathbf{w}) = \|\mathbf{X}\mathbf{w} - \mathbf{y}\|^2 = (\mathbf{X}\mathbf{w} - \mathbf{y})^T(\mathbf{X}\mathbf{w} - \mathbf{y})$$
$$\frac{\partial L}{\partial \mathbf{w}} = 2\mathbf{X}^T(\mathbf{X}\mathbf{w} - \mathbf{y})$$

Setting the gradient to zero gives the Normal Equation:

$$\mathbf{w} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
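
A minimal sketch of the normal equation on synthetic data (the data and weights below are made up for illustration; in practice `np.linalg.solve` or `np.linalg.lstsq` is preferred over forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # synthetic design matrix
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=100)   # targets with a little noise

# Normal equation: w = (X^T X)^{-1} X^T y
w_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically safer equivalents
w_solve = np.linalg.solve(X.T @ X, X.T @ y)
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(w_normal, w_solve, atol=1e-8)
assert np.allclose(w_normal, w_lstsq, atol=1e-6)
print(w_normal)                                # close to w_true
```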