Gaussian Discriminant Analysis (GDA)
A generative learning algorithm that models class-conditional distributions as multivariate Gaussians.
Key Assumptions
- Class-conditional distributions are multivariate Gaussian: $x \mid y = k \sim \mathcal{N}(\mu_k, \Sigma)$
- Shared covariance matrix: Both classes use the same $\Sigma$ (Quadratic Discriminant Analysis (QDA) uses a different $\Sigma_k$ per class)
- Covariance matrix properties: $\Sigma$ must be symmetric and positive semi-definite (PSD):
- Symmetric: $\Sigma = \Sigma^T$
- PSD: $z^T \Sigma z \ge 0$ for all $z$, or equivalently, all eigenvalues $\lambda_i \ge 0$
- For GDA to work properly, $\Sigma$ should be positive definite (PD), i.e., invertible (a quick numeric check is sketched below)
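A minimal numpy sketch of these checks (the matrix `Sigma` here is a made-up example, not taken from any dataset):

```python
import numpy as np

# Hypothetical 2x2 covariance matrix, used only for illustration.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

is_symmetric = np.allclose(Sigma, Sigma.T)
eigvals = np.linalg.eigvalsh(Sigma)   # eigenvalues of the symmetric matrix
is_psd = bool(np.all(eigvals >= 0))   # PSD: all eigenvalues >= 0
is_pd = bool(np.all(eigvals > 0))     # PD: all eigenvalues > 0 (so Sigma is invertible)

print(is_symmetric, eigvals, is_psd, is_pd)
```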
Model
For binary classification $y \in \{0, 1\}$:
- $y \sim \mathrm{Bernoulli}(\phi)$
- $x \mid y = 0 \sim \mathcal{N}(\mu_0, \Sigma)$
- $x \mid y = 1 \sim \mathcal{N}(\mu_1, \Sigma)$
Parameters: $\phi$, $\mu_0$, $\mu_1$, $\Sigma$
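A quick sketch of the generative process under these assumptions (the parameter values are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values, not estimated from any real data.
phi = 0.4                               # p(y = 1)
mu0 = np.array([0.0, 0.0])              # class-0 mean
mu1 = np.array([2.0, 1.0])              # class-1 mean
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])          # shared covariance

def sample(n):
    """Draw n (x, y) pairs from the GDA generative model."""
    y = rng.binomial(1, phi, size=n)    # y ~ Bernoulli(phi)
    means = np.where(y[:, None] == 1, mu1, mu0)
    X = np.array([rng.multivariate_normal(m, Sigma) for m in means])
    return X, y

X, y = sample(500)
```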
Maximum Likelihood Estimates
Given a training set $\{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, the maximum likelihood estimates are:
- $\phi = \frac{1}{m}\sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = 1\}$ (fraction of positive examples)
- $\mu_k = \frac{\sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = k\}\, x^{(i)}}{\sum_{i=1}^{m} \mathbb{1}\{y^{(i)} = k\}}$ (mean of the examples in class $k$)
- $\Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)} - \mu_{y^{(i)}}\right)\left(x^{(i)} - \mu_{y^{(i)}}\right)^T$ (pooled covariance around the class means)
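A minimal numpy sketch of these estimates (assuming `X` is an $(m, d)$ feature array and `y` an $(m,)$ array of 0/1 labels, e.g., from the sampling sketch above; `gda_mle` is just an illustrative name):

```python
import numpy as np

def gda_mle(X, y):
    """Closed-form MLE for binary GDA with a shared covariance matrix."""
    m = X.shape[0]
    phi = np.mean(y == 1)                               # fraction of positive examples
    mu0 = X[y == 0].mean(axis=0)                        # class-0 mean
    mu1 = X[y == 1].mean(axis=0)                        # class-1 mean
    centered = X - np.where(y[:, None] == 1, mu1, mu0)  # subtract each example's class mean
    Sigma = centered.T @ centered / m                   # pooled covariance (sum of outer products / m)
    return phi, mu0, mu1, Sigma
```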
Decision Boundary
GDA produces a linear decision boundary (when $\Sigma$ is shared across both classes).
The decision boundary is the set where $p(y = 1 \mid x) = p(y = 0 \mid x)$, which simplifies to $\theta^T x + \theta_0 = 0$ with:
$\theta = \Sigma^{-1}(\mu_1 - \mu_0)$, $\quad \theta_0 = \frac{1}{2}\left(\mu_0^T \Sigma^{-1} \mu_0 - \mu_1^T \Sigma^{-1} \mu_1\right) + \log\frac{\phi}{1 - \phi}$
This is linear in $x$ (an affine function), making it the same form as logistic regression: $p(y = 1 \mid x) = \sigma(\theta^T x + \theta_0)$.
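A sketch of turning fitted GDA parameters (e.g., the output of the `gda_mle` sketch above) into this linear boundary:

```python
import numpy as np

def gda_boundary(phi, mu0, mu1, Sigma):
    """Return (theta, theta0) such that p(y=1 | x) = sigmoid(theta @ x + theta0)."""
    Sigma_inv = np.linalg.inv(Sigma)
    theta = Sigma_inv @ (mu1 - mu0)
    theta0 = 0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1) + np.log(phi / (1 - phi))
    return theta, theta0

def predict(X, theta, theta0):
    """Classify as y = 1 on the positive side of the affine boundary."""
    return (X @ theta + theta0 > 0).astype(int)
```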
GDA vs Logistic Regression
| Aspect | GDA | Logistic Regression (LR) |
|---|---|---|
| Type | Generative | Discriminative |
| Assumptions | Strong (Gaussian distributions) | Weak (only needs linear decision boundary) |
| Data Efficiency | More efficient when assumptions hold | Needs more data |
| Robustness | Sensitive to assumption violations | More robust to distribution violations |
| Decision Boundary | Linear (with shared $\Sigma$) | Linear |
| Parameter Estimation | MLE of $\phi, \mu_0, \mu_1, \Sigma$ | MLE of $\theta$ directly |
| When to Use | Data truly Gaussian, small dataset | Large dataset, unknown distributions |
Key Insight: GDA and LR produce the same form of linear decision boundary, but they estimate parameters differently (a small numerical check follows this list):
- GDA: Models $p(x \mid y)$ and $p(y)$ → derives $p(y \mid x)$ via Bayes' rule
- LR: Directly models $p(y = 1 \mid x) = \sigma(\theta^T x)$ parametrically
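This can be checked numerically: with the same (here made-up) parameters, the posterior obtained via Bayes' rule matches the sigmoid of the linear score.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters only (same hypothetical values as the earlier sketches).
phi, mu0, mu1 = 0.4, np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

def posterior_bayes(x):
    """GDA route: p(y=1 | x) via Bayes' rule over the Gaussian class-conditionals."""
    p1 = phi * multivariate_normal.pdf(x, mean=mu1, cov=Sigma)
    p0 = (1 - phi) * multivariate_normal.pdf(x, mean=mu0, cov=Sigma)
    return p1 / (p0 + p1)

# LR-form route: sigmoid of the affine score, with theta, theta0 derived from the same parameters.
Sigma_inv = np.linalg.inv(Sigma)
theta = Sigma_inv @ (mu1 - mu0)
theta0 = 0.5 * (mu0 @ Sigma_inv @ mu0 - mu1 @ Sigma_inv @ mu1) + np.log(phi / (1 - phi))

def posterior_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-(x @ theta + theta0)))

x = np.array([1.0, 0.5])
print(posterior_bayes(x), posterior_sigmoid(x))  # the two values agree
```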
Generalization and Reduction
Generalization: GDA can be generalized in several directions:
- QDA (Quadratic Discriminant Analysis): Different $\Sigma_k$ per class → produces quadratic decision boundaries (see the sketch after this list)
- Naive Bayes: Conditional independence assumption → $\Sigma$ is diagonal
- General exponential family: Replace Gaussian with other distributions (see GLM)
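To contrast with the shared-covariance case, a minimal sketch of the QDA log-odds under made-up per-class parameters; because the covariances differ, a quadratic term in $x$ remains:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative per-class parameters; the covariance matrices deliberately differ.
phi, mu0, mu1 = 0.5, np.array([0.0, 0.0]), np.array([2.0, 1.0])
Sigma0 = np.array([[1.0, 0.0], [0.0, 1.0]])
Sigma1 = np.array([[2.0, 0.5], [0.5, 0.5]])

def qda_log_odds(x):
    """log p(y=1 | x) - log p(y=0 | x); quadratic in x because Sigma0 != Sigma1."""
    l1 = np.log(phi) + multivariate_normal.logpdf(x, mean=mu1, cov=Sigma1)
    l0 = np.log(1 - phi) + multivariate_normal.logpdf(x, mean=mu0, cov=Sigma0)
    return l1 - l0

print(qda_log_odds(np.array([1.0, 0.5])))  # the decision boundary is the set where this equals 0
```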
Reduction to Special Cases:
- If features are conditionally independent given the class: $\Sigma$ becomes diagonal
- If classes are balanced ($\phi = 0.5$) and the class means are equal ($\mu_0 = \mu_1$): GDA reduces to random guessing
- If $\Sigma$ is the identity matrix: the decision boundary depends only on Euclidean distance to the class means
Why Covariance Must Be PSD
The covariance matrix $\Sigma$ must be PSD because:
- Variance is always non-negative: For any vector $z$, $\mathrm{Var}(z^T x) = z^T \Sigma z \ge 0$
- Inverse exists: For GDA, we need $\Sigma^{-1}$ in the Gaussian density, so $\Sigma$ must be positive definite (PD), i.e., PSD with strictly positive eigenvalues
- Physical interpretation: Covariance captures how features co-vary; negative variance is meaningless
Ensuring PSD: The MLE formula automatically produces a PSD matrix because it is a sum of outer products $\left(x^{(i)} - \mu_{y^{(i)}}\right)\left(x^{(i)} - \mu_{y^{(i)}}\right)^T$, each of which is PSD.
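A tiny numeric illustration of this argument (the centered vectors are random stand-ins, not real data): each outer product is PSD, so their average, which has exactly the form of the MLE covariance, has non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
centered = rng.normal(size=(100, 3))          # stand-ins for x^(i) - mu_{y^(i)}

# Average of rank-1 outer products: exactly the form of the MLE covariance.
Sigma = sum(np.outer(c, c) for c in centered) / len(centered)

print(np.allclose(Sigma, Sigma.T))                    # symmetric
print(np.all(np.linalg.eigvalsh(Sigma) >= -1e-12))    # eigenvalues are (numerically) non-negative
```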