Generalized Linear Models

Generalized Linear Models (GLMs) extend linear regression to handle different response types (binary, count, etc.) by using exponential family distributions.

Three Components of a GLM

  1. Exponential Family Distribution: $Y|x;\theta \sim \text{ExpFamily}(\eta)$
  2. Linear Predictor: The natural parameter $\eta$ and the inputs $x$ are related linearly: $\eta = \theta^T x$
  3. Response Function: $\mu = \mathbb{E}[Y|x] = g^{-1}(\eta)$, where $g^{-1}$ is the canonical response function
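
To make the three components concrete, here is a minimal sketch using Poisson regression as the example (the parameter and input values are made up for illustration):

```python
import numpy as np

# Made-up parameters and input, for illustration only.
theta = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, 3.0])

eta = theta @ x        # 2. linear predictor: eta = theta^T x
mu = np.exp(eta)       # 3. response function: mu = g^{-1}(eta) = e^eta for Poisson

# 1. the response is then modeled as Y | x ~ Poisson(mu)
y_sample = np.random.default_rng(0).poisson(mu)
```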

Exponential Family Form

A distribution belongs to the exponential family if it can be written as:

$$p(y;\eta) = b(y)\exp\left(\eta^T T(y) - a(\eta)\right)$$

where:

  • $\eta$ = natural parameter
  • $T(y)$ = sufficient statistic (usually $T(y) = y$)
  • $a(\eta)$ = log partition function
  • $b(y)$ = base measure
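
As a worked example of this definition (the standard manipulation, spelled out here for concreteness), the Bernoulli distribution with mean $p$ can be rewritten as

$$p(y;p) = p^y(1-p)^{1-y} = \exp\left(y \log\frac{p}{1-p} + \log(1-p)\right)$$

so $\eta = \log\frac{p}{1-p}$, $T(y) = y$, $a(\eta) = -\log(1-p) = \log(1+e^\eta)$, and $b(y) = 1$, matching the Bernoulli row of the table below.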

Key Properties:

$$\mathbb{E}[T(Y)] = \frac{\partial a(\eta)}{\partial \eta} \qquad \text{Var}(T(Y)) = \frac{\partial^2 a(\eta)}{\partial \eta^2}$$
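
These identities can be checked numerically; here is a small sketch for the Poisson case, where $a(\eta) = e^\eta$ and both the mean and the variance equal $\lambda = e^\eta$:

```python
import numpy as np

# Finite-difference check of E[T(Y)] = da/deta and Var(T(Y)) = d^2a/deta^2
# for the Poisson case, where a(eta) = exp(eta) and E[Y] = Var(Y) = exp(eta).
a = np.exp
eta, h = 0.7, 1e-4

mean_from_a = (a(eta + h) - a(eta - h)) / (2 * h)            # first derivative
var_from_a = (a(eta + h) - 2 * a(eta) + a(eta - h)) / h**2   # second derivative

lam = np.exp(eta)
assert np.isclose(mean_from_a, lam, rtol=1e-4)
assert np.isclose(var_from_a, lam, rtol=1e-4)
```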

Common GLMs

| Distribution | Mean | Variance | $\eta$ (natural parameter) | $a(\eta)$ | $g^{-1}(\eta)$ (response function) | $g(\mu)$ (link) |
|---|---|---|---|---|---|---|
| Gaussian | $\mu$ | $\sigma^2$ | $\mu$ | $\frac{\eta^2}{2}$ | $\eta$ | $\mu$ |
| Bernoulli | $p$ | $p(1-p)$ | $\log\frac{p}{1-p}$ | $\log(1+e^\eta)$ | $\frac{1}{1+e^{-\eta}}$ | $\log\frac{\mu}{1-\mu}$ |
| Poisson | $\lambda$ | $\lambda$ | $\log\lambda$ | $e^\eta$ | $e^\eta$ | $\log\mu$ |

Canonical Link: Using $g(\mu) = \eta$ makes optimization convex and gradients simpler.

Identity Link (Gaussian): For the Gaussian GLM the response function is the identity, $g^{-1}(\eta) = \eta$, so the natural parameter equals the mean ($\eta = \mu$) and the link is also the identity, $g(\mu) = \mu$. With the link and response functions identical, linear regression is particularly simple.
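
The response functions in the table are just three different inverse links applied to the same linear predictor; a small illustrative sketch (parameter values are made up):

```python
import numpy as np

# Canonical response functions g^{-1}(eta) from the table above.
response = {
    "gaussian": lambda eta: eta,                      # identity
    "bernoulli": lambda eta: 1 / (1 + np.exp(-eta)),  # sigmoid
    "poisson": lambda eta: np.exp(eta),               # exponential
}

theta = np.array([0.3, -0.1])   # made-up parameters, for illustration only
x = np.array([1.0, 2.0])
eta = theta @ x                 # same linear predictor in every case
predictions = {name: g_inv(eta) for name, g_inv in response.items()}
```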

Naive Bayes with Exponential Family

Key Result: When class-conditional distributions are from the exponential family, the posterior has a logistic form.

Setup: For binary classification with:

  • Bernoulli prior: $p(y) = \phi^y (1-\phi)^{1-y}$ where $y \in \{0,1\}$
  • Exponential family class-conditionals: $p(x|y=j;\eta_j) = b(x)\exp(\eta_j^T T(x) - a(\eta_j))$

Derivation: Using Bayes' rule:

$$p(y=1|x;\phi,\eta_0,\eta_1) = \frac{p(y=1;\phi)\,p(x|y=1;\eta_1)}{p(y=0;\phi)\,p(x|y=0;\eta_0) + p(y=1;\phi)\,p(x|y=1;\eta_1)}$$

Substituting the exponential family form and simplifying (dividing numerator and denominator by the numerator; the base measure $b(x)$ cancels):

$$p(y=1|x;\phi,\eta_0,\eta_1) = \frac{1}{1 + \exp\left(\log\frac{1-\phi}{\phi} + (\eta_0 - \eta_1)^T T(x) + a(\eta_1) - a(\eta_0)\right)}$$

This has the form $\sigma(\tilde{\eta}^T T(x) + c)$, where $\sigma(t) = \frac{1}{1+\exp(-t)}$ is the sigmoid function, with:

$$\tilde{\eta} = \eta_1 - \eta_0$$
$$c = a(\eta_0) - a(\eta_1) - \log\frac{1-\phi}{\phi}$$

Interpretation: This shows that Naive Bayes with exponential family distributions produces the same decision boundary as logistic regression, though the parameters are estimated differently (generatively vs. discriminatively).
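
As a numerical sanity check of this equivalence, here is a sketch using 1-D Gaussian class-conditionals with a shared, known variance (for which $\eta_j = \mu_j/\sigma^2$, $T(x) = x$, and $a(\eta_j) = \mu_j^2/(2\sigma^2)$); all parameter values below are made up for illustration:

```python
import numpy as np

# Made-up prior and 1-D Gaussian class-conditionals with shared variance.
phi, mu0, mu1, sigma2 = 0.3, -1.0, 2.0, 1.5

def gauss_pdf(x, mu):
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-5, 5, 11)

# Posterior computed directly from Bayes' rule.
num = phi * gauss_pdf(x, mu1)
posterior_bayes = num / ((1 - phi) * gauss_pdf(x, mu0) + num)

# Same posterior from the sigmoid form sigma(eta_tilde * T(x) + c).
eta_tilde = (mu1 - mu0) / sigma2                                 # eta_1 - eta_0
c = (mu0**2 - mu1**2) / (2 * sigma2) - np.log((1 - phi) / phi)   # a(eta_0) - a(eta_1) - log((1-phi)/phi)
posterior_sigmoid = 1 / (1 + np.exp(-(eta_tilde * x + c)))

assert np.allclose(posterior_bayes, posterior_sigmoid)
```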

info

The decision boundary is the set $\{x : p(y=1|x;\phi,\eta_0,\eta_1) = \frac{1}{2}\}$. Based on the previous part, this is the same as $\tilde{\eta}^T T(x) + c = 0$. For this to be linear in $x$, $T$ must be affine in $x$, i.e. $T(x) = Ax + v$ for some matrix $A$ and vector $v$.

Optimization

Log-Likelihood: Given training data $\{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^m$, the log-likelihood is:

$$\ell(\theta) = \sum_{i=1}^m \log p(y^{(i)}|\mathbf{x}^{(i)};\theta) = \sum_{i=1}^m \left[ \eta^{(i)T} T(y^{(i)}) - a(\eta^{(i)}) + \log b(y^{(i)}) \right]$$

where $\eta^{(i)} = \theta^T \mathbf{x}^{(i)}$.

Grouped Form: For classification, grouping terms by class $j$, where $S_j = \{i : y^{(i)} = j\}$ and $n_j = |S_j|$, the $\eta_j$-dependent terms are:

$$\ell_j(\eta_j) \propto \sum_{i \in S_j} \eta_j^{\top} T(\mathbf{x}^{(i)}) - n_j\, a(\eta_j)$$
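
Setting the gradient of $\ell_j$ to zero and using the identity $\mathbb{E}[T(X)] = \nabla a(\eta)$ from above gives the maximum-likelihood condition (a direct consequence of the grouped form, written out here for completeness):

$$\nabla a(\hat\eta_j) = \frac{1}{n_j}\sum_{i \in S_j} T(\mathbf{x}^{(i)})$$

i.e., the fitted class-conditional expectation of $T(X)$ matches the empirical class average (moment matching).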

Gradient: For a GLM with canonical link, the gradient has the form:

$$\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)})\right) x^{(i)}$$

where $h_\theta(x) = \mathbb{E}[Y|x;\theta] = g^{-1}(\theta^T x)$ is the hypothesis function.

Key fact: With the canonical link, the negative log-likelihood is convex, so gradient descent converges to the global optimum.
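
A minimal sketch of maximizing the log-likelihood with this gradient, using the Bernoulli GLM (logistic regression) on synthetic data; the learning rate, iteration count, and data-generating parameters below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data (made-up generating parameters).
m, d = 200, 3
X = rng.normal(size=(m, d))
true_theta = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_theta)))

# Gradient ascent on the log-likelihood for the Bernoulli GLM (canonical link).
theta = np.zeros(d)
lr = 0.1
for _ in range(1000):
    h = 1 / (1 + np.exp(-X @ theta))   # h_theta(x) = g^{-1}(theta^T x) = sigmoid
    grad = X.T @ (y - h)               # sum_i (y^(i) - h_theta(x^(i))) x^(i)
    theta += lr * grad / m             # averaged gradient for a stable step size
```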