## Basic Derivatives

| Function | Derivative |
|---|---|
| $f(x) = c$ (constant) | $f'(x) = 0$ |
| $f(x) = x^n$ | $f'(x) = nx^{n-1}$ |
| $f(x) = e^x$ | $f'(x) = e^x$ |
| $f(x) = \ln(x)$ | $f'(x) = \frac{1}{x}$ |
| $f(x) = a^x$ | $f'(x) = a^x \ln(a)$ |
| $f(x) = \log_a(x)$ | $f'(x) = \frac{1}{x \ln(a)}$ |
| $f(x) = g(h(x))$ (chain rule) | $f'(x) = g'(h(x)) \cdot h'(x)$ |
| $f(x) = u(x) \cdot v(x)$ (product rule) | $f'(x) = u'(x) \cdot v(x) + u(x) \cdot v'(x)$ |
| $f(x) = \frac{u(x)}{v(x)}$ (quotient rule) | $f'(x) = \frac{u'(x) \cdot v(x) - u(x) \cdot v'(x)}{v(x)^2}$ |
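A quick way to build confidence in these rules is to compare them against a finite-difference estimate. The sketch below is illustrative only: the functions `u`, `v`, the test point `x0`, and the helper `numerical_derivative` are made-up choices, not part of the table above.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Product rule: f(x) = u(x) * v(x) with u(x) = x^3, v(x) = sin(x)
u, du = lambda x: x**3, lambda x: 3 * x**2
v, dv = lambda x: math.sin(x), lambda x: math.cos(x)
x0 = 1.3
product_rule = du(x0) * v(x0) + u(x0) * dv(x0)
print(product_rule, numerical_derivative(lambda x: u(x) * v(x), x0))

# Chain rule: f(x) = g(h(x)) with g(y) = e^y, h(x) = x^2
chain_rule = math.exp(x0**2) * 2 * x0          # g'(h(x)) * h'(x)
print(chain_rule, numerical_derivative(lambda x: math.exp(x**2), x0))
```

Both printed pairs should agree to several decimal places.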
## Logarithm and Exponential Properties

| Logarithm Rules | Exponential Rules |
|---|---|
| $\ln(ab) = \ln(a) + \ln(b)$ | $e^a \cdot e^b = e^{a+b}$ |
| $\ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b)$ | $\frac{e^a}{e^b} = e^{a-b}$ |
| $\ln(a^b) = b \ln(a)$ | $(e^a)^b = e^{ab}$ |
| $\ln(e) = 1$ | $e^0 = 1$ |
| $\ln(1) = 0$ | $e^{\ln(x)} = x$ |
| $\ln(e^x) = x$ | $e^{\ln(a) + \ln(b)} = ab$ |
| $\ln\left(\prod_{i} a_i\right) = \sum_{i} \ln(a_i)$ | $\prod_{i} e^{f(x_i)} = e^{\sum_{i} f(x_i)}$ |
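The product-to-sum identity in the last row is the reason likelihoods are almost always handled in log space. A minimal numerical check, using made-up per-example probabilities purely for illustration:

```python
import math

# Hypothetical per-example probabilities under some model
probs = [0.9, 0.75, 0.6, 0.85]

log_of_product = math.log(math.prod(probs))     # ln(prod_i p_i)
sum_of_logs = sum(math.log(p) for p in probs)   # sum_i ln(p_i)
print(log_of_product, sum_of_logs)              # equal up to float rounding
```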
## Important Derivatives for ML

**Sigmoid Function**:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma'(x) = \sigma(x)(1 - \sigma(x))$$
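A minimal sketch checking that $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ matches a finite-difference estimate; the test points are arbitrary:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (-2.0, 0.0, 1.5):
    analytic = sigmoid(x) * (1.0 - sigmoid(x))
    h = 1e-6
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(x, analytic, numeric)
```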
**Softmax Function** (for class $i$):

$$\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
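A direct translation of the formula into code. Subtracting the maximum before exponentiating is a common numerical-stability trick added here for the sketch; it is not part of the formula above, and it does not change the result because the factor $e^{-\max}$ cancels in the ratio.

```python
import math

def softmax(xs):
    # Shift by max(xs) so the exponentials cannot overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([2.0, 1.0, 0.1]))   # outputs are positive and sum to 1
```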
**Log-Likelihood**:

$$\frac{d}{dx}\ln(f(x)) = \frac{f'(x)}{f(x)}$$
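A quick check of this identity for an arbitrary example function, $f(x) = x^2 + 1$ (chosen only for illustration):

```python
import math

# d/dx ln(f(x)) = f'(x) / f(x), checked at x = 2 for f(x) = x^2 + 1
f = lambda x: x**2 + 1
df = lambda x: 2 * x

x0, h = 2.0, 1e-6
analytic = df(x0) / f(x0)
numeric = (math.log(f(x0 + h)) - math.log(f(x0 - h))) / (2 * h)
print(analytic, numeric)
```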
## Partial Derivatives

For a function $f(x, y)$ of multiple variables:

$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x+h, y) - f(x, y)}{h}$$

**Example**: $f(x, y) = x^2 + 3xy + y^2$
$$\begin{align} \frac{\partial f}{\partial x} &= 2x + 3y \\ \frac{\partial f}{\partial y} &= 3x + 2y \end{align}$$
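The same partials can be estimated numerically by perturbing one variable while holding the other fixed, mirroring the limit definition. A small sketch using the example above (the evaluation point is arbitrary):

```python
def f(x, y):
    return x**2 + 3 * x * y + y**2

def partial_x(f, x, y, h=1e-6):
    # Only x is perturbed; y stays fixed.
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

x0, y0 = 1.0, 2.0
print(partial_x(f, x0, y0), 2 * x0 + 3 * y0)   # ~8.0 vs 8.0
print(partial_y(f, x0, y0), 3 * x0 + 2 * y0)   # ~7.0 vs 7.0
```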
## Gradient

The gradient is a vector of all partial derivatives:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$

The gradient points in the direction of steepest ascent, which is why gradient descent moves in the negative gradient direction to minimize loss functions.
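To make "move in the negative gradient direction" concrete, here is a minimal gradient-descent sketch on the simple convex function $f(x, y) = x^2 + y^2$, whose gradient is $\nabla f = [2x,\ 2y]$. The function, learning rate, and starting point are illustrative choices, not anything fixed by the text above.

```python
def grad(x, y):
    # Gradient of f(x, y) = x^2 + y^2
    return (2 * x, 2 * y)

x, y = 3.0, -4.0        # arbitrary starting point
lr = 0.1                # learning rate (step size)

for step in range(50):
    gx, gy = grad(x, y)
    x -= lr * gx        # step against the gradient: steepest descent
    y -= lr * gy

print(x, y)             # both approach 0, the minimizer of f
```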
## Summary

Calculus is essential for:

- **Gradient Descent**: Computing how to update model parameters
- **Backpropagation**: Calculating gradients in neural networks
- **Optimization**: Finding minima/maxima of loss functions
- **Understanding Convergence**: Analyzing how algorithms improve over iterations

Master these concepts and you'll understand the mathematical foundation of how machine learning models learn! 🚀