
Basics

Basic Derivatives

| Function | Derivative |
| --- | --- |
| f(x) = c (constant) | f'(x) = 0 |
| f(x) = x^n | f'(x) = nx^{n-1} |
| f(x) = e^x | f'(x) = e^x |
| f(x) = \ln(x) | f'(x) = \frac{1}{x} |
| f(x) = a^x | f'(x) = a^x \ln(a) |
| f(x) = \log_a(x) | f'(x) = \frac{1}{x \ln(a)} |
| f(x) = g(h(x)) (chain rule) | f'(x) = g'(h(x)) \cdot h'(x) |
| f(x) = u(x) \cdot v(x) (product rule) | f'(x) = u'(x) \cdot v(x) + u(x) \cdot v'(x) |
| f(x) = \frac{u(x)}{v(x)} (quotient rule) | f'(x) = \frac{u'(x) \cdot v(x) - u(x) \cdot v'(x)}{v(x)^2} |
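A quick way to sanity-check any of these rules is to compare the analytic derivative against a finite-difference approximation. The snippet below is a minimal sketch in Python; the test functions and evaluation points are just illustrative choices, not part of the rules themselves.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def cube(x):
    return x**3

def x_squared_times_exp(x):
    return x**2 * math.exp(x)

# Power rule: d/dx x^3 = 3x^2, so the derivative at x = 2 should be ~12
print(numerical_derivative(cube, 2.0), 3 * 2.0**2)

# Product rule: d/dx [x^2 * e^x] = 2x*e^x + x^2*e^x
print(numerical_derivative(x_squared_times_exp, 2.0),
      (2 * 2.0 + 2.0**2) * math.exp(2.0))
```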

Logarithm and Exponential Properties

| Logarithm Rules | Exponential Rules |
| --- | --- |
| \ln(ab) = \ln(a) + \ln(b) | e^a \cdot e^b = e^{a+b} |
| \ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b) | \frac{e^a}{e^b} = e^{a-b} |
| \ln(a^b) = b \ln(a) | (e^a)^b = e^{ab} |
| \ln(e) = 1 | e^0 = 1 |
| \ln(1) = 0 | e^{\ln(x)} = x |
| \ln(e^x) = x | e^{\ln(a) + \ln(b)} = ab |
| \ln\left(\prod_{i} a_i\right) = \sum_{i} \ln(a_i) | \prod_{i} e^{f(x_i)} = e^{\sum_{i} f(x_i)} |
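These identities are easy to confirm numerically. Here is a minimal sketch; the values of a, b, and the list xs are arbitrary illustrations.

```python
import math

a, b = 2.5, 4.0

# ln(ab) = ln(a) + ln(b)
print(math.log(a * b), math.log(a) + math.log(b))

# ln(a^b) = b * ln(a)
print(math.log(a**b), b * math.log(a))

# A product of exponentials equals the exponential of the sum
xs = [0.1, 0.5, 1.2]
print(math.prod(math.exp(x) for x in xs), math.exp(sum(xs)))
```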

Important Derivatives for ML

Sigmoid Function:

\sigma(x) = \frac{1}{1 + e^{-x}}
\sigma'(x) = \sigma(x)(1 - \sigma(x))
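Here is a small sketch of the sigmoid derivative identity in code, checked against a central-difference approximation; the test point x = 0.7 is an arbitrary choice.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Uses the identity sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(sigmoid_grad(x), numeric)  # both ~0.2217
```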

Softmax Function (for class i):

\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
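A direct implementation of this formula can overflow for large inputs. A standard trick (not stated in the formula above, but common practice) is to subtract the maximum before exponentiating, which leaves the ratio unchanged. A minimal sketch with an illustrative input vector:

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability; it cancels in the ratio
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # ~[0.659, 0.242, 0.099]
print(sum(probs))   # 1.0
```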

Derivative of a Logarithm (the key step when differentiating log-likelihoods):

\frac{d}{dx}\ln(f(x)) = \frac{f'(x)}{f(x)}
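A quick numerical check of this rule, using a hypothetical inner function f(x) = x^2 + 1 and an arbitrary test point:

```python
import math

def f(x):
    return x**2 + 1

def f_prime(x):
    return 2 * x

x, h = 1.5, 1e-6
numeric = (math.log(f(x + h)) - math.log(f(x - h))) / (2 * h)
print(numeric, f_prime(x) / f(x))  # both ~0.923
```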

Partial Derivatives

For a function f(x, y) of multiple variables:

\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}

Example: f(x, y) = x^2 + 3xy + y^2

\begin{align}
\frac{\partial f}{\partial x} &= 2x + 3y \\
\frac{\partial f}{\partial y} &= 3x + 2y
\end{align}
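The same example can be checked by holding one variable fixed and applying the one-dimensional difference quotient to the other. A minimal sketch, evaluated at an arbitrary point (1, 2):

```python
def f(x, y):
    return x**2 + 3 * x * y + y**2

def partial_x(f, x, y, h=1e-6):
    # Hold y fixed, vary x
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    # Hold x fixed, vary y
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x, y = 1.0, 2.0
print(partial_x(f, x, y), 2 * x + 3 * y)  # ~8.0 vs 8.0
print(partial_y(f, x, y), 3 * x + 2 * y)  # ~7.0 vs 7.0
```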

Gradient

The gradient is a vector of all partial derivatives:

\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}

The gradient points in the direction of steepest ascent, which is why gradient descent moves in the negative gradient direction to minimize loss functions.
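To make this concrete, here is a tiny gradient-descent loop on a toy loss f(x, y) = x^2 + y^2; the loss, learning rate, and starting point are all illustrative choices, not a prescription.

```python
def grad(x, y):
    # Gradient of f(x, y) = x^2 + y^2: (df/dx, df/dy) = (2x, 2y)
    return (2 * x, 2 * y)

x, y = 3.0, -4.0   # arbitrary starting point
lr = 0.1           # learning rate

for _ in range(100):
    gx, gy = grad(x, y)
    # Step against the gradient (direction of steepest descent)
    x, y = x - lr * gx, y - lr * gy

print(x, y)  # both approach 0.0, the minimizer of the loss
```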

Summary

Calculus is essential for:

  • Gradient Descent: Computing how to update model parameters
  • Backpropagation: Calculating gradients in neural networks
  • Optimization: Finding minima/maxima of loss functions
  • Understanding Convergence: Analyzing how algorithms improve over iterations

Master these concepts and you'll understand the mathematical foundation of how machine learning models learn! 🚀