
Basics

Basic Derivatives

| Function | Derivative |
| --- | --- |
| $f(x) = C$ (constant) | $f'(x) = 0$ |
| $f(x) = x^n$ | $f'(x) = nx^{n-1}$ |
| $f(x) = 1 - \exp(Cx)$ | $f'(x) = -C\exp(Cx)$ |
| $f(x) = \exp(x)$ | $f'(x) = \exp(x)$ |
| $f(x) = \exp(g(x))$ | $f'(x) = \exp(g(x)) \cdot g'(x)$ |
| $f(x) = \log(x)$ | $f'(x) = \frac{1}{x}$ |
| $f(x) = \log(C)$ | $f'(x) = 0$ |
| $f(x) = g(h(x))$ (chain rule) | $f'(x) = g'(h(x)) \cdot h'(x)$ |
| $f(x) = u(x) \cdot v(x)$ (product rule) | $f'(x) = u'(x)\,v(x) + u(x)\,v'(x)$ |
| $f(x) = \lvert x \rvert$ | $f'(x) = \operatorname{sign}(x)$ for $x \neq 0$ |
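
As a quick sanity check, here is a minimal Python sketch that compares the product and chain rules above against central finite differences; the example functions `u`, `v`, `g`, and `h_` are arbitrary illustrative choices, not taken from the table.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Arbitrary example functions, chosen only to exercise the rules.
u = lambda x: x**3            # u'(x) = 3x^2
v = lambda x: math.exp(x)     # v'(x) = exp(x)
g = lambda x: math.log(x)     # g'(x) = 1/x
h_ = lambda x: x**2 + 1       # h'(x) = 2x

x = 1.5

# Product rule: (u*v)'(x) = u'(x) v(x) + u(x) v'(x)
product_rule = 3 * x**2 * math.exp(x) + x**3 * math.exp(x)
print(product_rule, numerical_derivative(lambda t: u(t) * v(t), x))

# Chain rule: (g(h(x)))' = g'(h(x)) * h'(x)
chain_rule = (1 / h_(x)) * (2 * x)
print(chain_rule, numerical_derivative(lambda t: g(h_(t)), x))
```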

Common ML Functions

  • Logarithm and Exponential:

    | Logarithm Rules | Exponential Rules |
    | --- | --- |
    | $\ln(ab) = \ln(a) + \ln(b)$ | $e^a \cdot e^b = e^{a+b}$ |
    | $\ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b)$ | $\frac{e^a}{e^b} = e^{a-b}$ |
    | $\ln(a^b) = b\ln(a)$ | $(e^a)^b = e^{ab}$ |
    | $\ln(e) = 1$ | $e^0 = 1$ |
    | $\ln(1) = 0$ | $e^{\ln(x)} = x$ |
    | $\ln(e^x) = x$ | $e^{\ln(a) + \ln(b)} = ab$ |
    | $\ln\left(\prod_{i} a_i\right) = \sum_{i} \ln(a_i)$ | $\prod_{i} e^{f(x_i)} = e^{\sum_{i} f(x_i)}$ |
  • Sigmoid Function (see the derivation and sketch after this list):

    $$\sigma(x) = \frac{1}{1 + e^{-x}}$$
    $$\sigma'(x) = \sigma(x)(1 - \sigma(x))$$
  • Softmax Function (for class $i$):

    $$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
  • Log-Likelihood:

    $$\frac{d}{dx}\ln(f(x)) = \frac{f'(x)}{f(x)}$$
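
The sigmoid derivative identity quoted above is worth deriving once; this is the standard algebra, included only for completeness:

```latex
\sigma'(x)
  = \frac{d}{dx}\bigl(1 + e^{-x}\bigr)^{-1}
  = \frac{e^{-x}}{\bigl(1 + e^{-x}\bigr)^{2}}
  = \underbrace{\frac{1}{1 + e^{-x}}}_{\sigma(x)} \cdot
    \underbrace{\frac{e^{-x}}{1 + e^{-x}}}_{1 - \sigma(x)}
  = \sigma(x)\bigl(1 - \sigma(x)\bigr)
```

A minimal NumPy sketch of these functions follows. The helper names `sigmoid` and `softmax` are my own, and the max-subtraction inside `softmax` is a common numerical-stability trick rather than part of the definition; the sketch checks the sigmoid identity, the softmax normalization, and the log-of-product rule numerically.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Softmax over a 1-D score vector."""
    # Subtracting max(x) leaves the result unchanged (e^{a-c} / sum e^{b-c})
    # but avoids overflow for large scores.
    z = np.exp(x - np.max(x))
    return z / z.sum()

# Sigmoid derivative identity vs. central finite differences
x, eps = 0.7, 1e-6
finite_diff = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
identity = sigmoid(x) * (1 - sigmoid(x))
print(finite_diff, identity)        # agree to ~6 decimal places

# Softmax produces non-negative weights that sum to 1
scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())

# Log of a product equals the sum of logs (the log-likelihood trick)
probs = np.array([0.9, 0.8, 0.95])
print(np.log(np.prod(probs)), np.log(probs).sum())
```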

Gradient

The gradient of a scalar function $f(x_1, \dots, x_n)$ is the vector of all its partial derivatives:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

The gradient points in the direction of steepest ascent, which is why gradient descent moves in the negative gradient direction to minimize loss functions.
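
To make that last point concrete, here is a minimal gradient-descent sketch on an assumed toy quadratic loss; the loss function, learning rate, and iteration count are illustrative choices, not from the text.

```python
import numpy as np

def loss(w):
    """Toy quadratic loss with its minimum at w = (1, -2) (assumed)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def grad(w):
    """Gradient of the loss: the vector of its partial derivatives."""
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

w = np.zeros(2)      # starting point
lr = 0.1             # learning rate (step size)
for _ in range(100):
    w = w - lr * grad(w)   # step in the negative gradient direction

print(w, loss(w))    # w ends up close to [1, -2], loss close to 0
```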