Introduction
Supervised learning is a branch of machine learning in which an algorithm learns from labeled training data to make predictions on new inputs, rather than being explicitly programmed for the task.
Formal Definitions
Hypothesis, Model, and Prediction Function
A hypothesis (also called a model or prediction function) is a function $h : \mathcal{X} \to \mathcal{Y}$ that maps from the input space $\mathcal{X}$ to the output space $\mathcal{Y}$.
Training Set
A training set is a set of pairs $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$ such that $x^{(i)} \in \mathcal{X}$ and $y^{(i)} \in \mathcal{Y}$ for $i = 1, \dots, n$.
The value $n$ is the training set size.
Goal of Learning
Goal: Use the training set to find (= learn) a good model $h$.
- What "good" means is not always easy to define (part of the modeling challenge).
- We will want to use the model on new data, not the training set (generalization).
Problem Types
If $\mathcal{Y}$ is continuous, then we call it a regression problem.
If $\mathcal{Y}$ is discrete, then we call it a classification problem (binary or multi-class).
What is Supervised Learning?
In supervised learning, we have:
- Input variables ($x$): Features or predictors
- Output variable ($y$): Target or label
- Training data: A set of examples $\{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$
The goal is to learn a function $h$ (called a hypothesis) that maps inputs to outputs, such that $h(x)$ is a good predictor for the corresponding value of $y$.
Types of Supervised Learning
1. Regression
When the target variable is continuous:
- Predicting house prices
- Forecasting stock prices
- Estimating temperature
Example: Predicting a house price based on its size:
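A minimal sketch of what this might look like, using scikit-learn's LinearRegression on made-up size/price numbers (the data values here are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: house sizes in square feet and prices in dollars.
sizes = np.array([[850], [1200], [1500], [2100], [2600]])          # shape (n_samples, 1)
prices = np.array([180_000, 245_000, 310_000, 400_000, 470_000])   # targets

model = LinearRegression()
model.fit(sizes, prices)  # learn slope and intercept from the data

# Predicted price for an unseen 1800 sq ft house.
print(model.predict(np.array([[1800]])))
```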
2. Classification
When the target variable is discrete (categorical):
- Email spam detection (spam/not spam)
- Image recognition (cat/dog/bird)
- Disease diagnosis (positive/negative)
Example: Binary classification with logistic regression:
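A corresponding sketch with scikit-learn's LogisticRegression on a tiny made-up spam dataset (both the features and the labels are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative features: [number of links, number of ALL-CAPS words] per email.
X = np.array([[0, 1], [1, 0], [8, 12], [10, 9], [0, 0], [7, 15]])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = spam, 0 = not spam

clf = LogisticRegression()
clf.fit(X, y)

# Predict a class label and its probability for a new email.
print(clf.predict(np.array([[9, 11]])))        # predicted class (0 or 1)
print(clf.predict_proba(np.array([[9, 11]])))  # [P(not spam), P(spam)]
```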
The Learning Process
1. Collect training data: Gather labeled examples
2. Choose a model: Select an appropriate algorithm (linear regression, decision tree, neural network, etc.)
3. Train the model: Use the training data to learn the parameters
4. Evaluate: Test the model on unseen data
5. Deploy: Use the model to make predictions on new data
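A minimal end-to-end sketch of these five steps with scikit-learn (the synthetic data and the choice of LinearRegression are illustrative assumptions, not part of the notes):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Collect training data (here: synthetic y = 3x + noise).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(0, 1, size=200)

# 2. Choose a model.
model = LinearRegression()

# 3. Train the model on one portion of the data...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)

# 4. ...and evaluate it on data held out from training.
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))

# 5. Deploy: predict on genuinely new inputs.
print(model.predict(np.array([[4.2]])))
```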
Key Concepts
Loss Function
Measures how well our hypothesis $h_\theta(x)$ predicts the true value $y$. For example, the Mean Squared Error (MSE):
$$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
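In code, the MSE above is essentially a one-liner; a small NumPy sketch (the arrays are illustrative):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Squared Error: the average of the squared residuals."""
    return float(np.mean((y_pred - y_true) ** 2))

# Illustrative values: residuals are -0.5, 0, 1, so MSE = 1.25 / 3.
print(mse(np.array([3.0, 5.0, 7.0]), np.array([2.5, 5.0, 8.0])))
```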
Optimization
The process of finding the parameters $\theta$ that minimize the loss function $J(\theta)$. Common methods include:
- Gradient Descent: Iteratively update parameters in the direction that reduces the loss
- Normal Equation: Analytical solution for linear regression (sketched just after this list)
- Stochastic Gradient Descent (SGD): Update parameters using one example at a time
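For linear regression the normal equation $\theta = (X^\top X)^{-1} X^\top y$ gives the minimizer in closed form. A minimal NumPy sketch on synthetic data (using a least-squares solve, which computes the same solution but is numerically safer than an explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.uniform(0, 10, 100)])  # intercept column + feature
y = 2.0 + 3.0 * X[:, 1] + rng.normal(0, 1, 100)               # true theta = [2, 3]

# Minimizes ||X theta - y||^2; for full-rank X this is the normal-equation solution.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [2, 3]
```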
Gradient Descent Update Rule
$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
where $\alpha$ is the learning rate.
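A minimal sketch of this update rule applied to linear regression with the MSE loss (the data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(100), rng.uniform(0, 10, 100)])  # intercept + feature
y = 2.0 + 3.0 * X[:, 1] + rng.normal(0, 1, 100)               # true theta = [2, 3]

theta = np.zeros(2)   # initial parameters
alpha = 0.01          # learning rate

for _ in range(5000):
    residuals = X @ theta - y                  # h_theta(x) - y for every example
    gradient = (2 / len(y)) * X.T @ residuals  # gradient of the MSE w.r.t. theta
    theta -= alpha * gradient                  # the update rule above

print(theta)  # approaches the least-squares solution, roughly [2, 3]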
Overfitting vs Underfitting
- Underfitting: Model is too simple and cannot capture the underlying pattern in the data
- Overfitting: Model is too complex and fits the training data too well, including noise
- Good fit: Model generalizes well to unseen data
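One way to see all three regimes is to vary model complexity and compare training error against test error; a sketch with polynomial regression (the degrees and the synthetic data are illustrative choices):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.2, 60)   # nonlinear ground truth + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

for degree in (1, 4, 15):  # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),  # training error
          mean_squared_error(y_test, model.predict(X_test)))    # test error
```

Typically degree 1 underfits (high error on both sets), degree 15 overfits (low training error but higher test error), and an intermediate degree generalizes best.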
Addressing Overfitting
- Regularization: Add penalty terms to the loss function (see the Ridge sketch after this list)
- Cross-validation: Use part of the training data for validation
- More training data: Helps the model learn the true underlying pattern
- Feature selection: Remove irrelevant features
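As an example of the first point, L2 regularization adds a penalty $\lambda \|\theta\|_2^2$ to the loss; in scikit-learn this is Ridge regression, where the `alpha` parameter plays the role of $\lambda$ (the value below is illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge, LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 10))          # few examples, many features
y = X[:, 0] + rng.normal(0, 0.1, 30)   # only the first feature actually matters

# An unregularized fit can spread weight across irrelevant features;
# the L2 penalty shrinks coefficients toward zero.
print(LinearRegression().fit(X, y).coef_)
print(Ridge(alpha=1.0).fit(X, y).coef_)
```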
Train/Validation/Test Split
- Training set (60-80%): Used to train the model
- Validation set (10-20%): Used to tune hyperparameters and prevent overfitting
- Test set (10-20%): Used to evaluate final model performance
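One common way to produce such a split is two chained calls to scikit-learn's train_test_split; here an illustrative 60/20/20 split on placeholder data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(-1, 1)  # placeholder features
y = np.arange(100)                 # placeholder targets

# First carve off 20% as the test set, then split the rest 75/25,
# which yields 60% train / 20% validation / 20% test overall.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```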
Evaluation Metrics
For Regression
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- R² Score
For Classification
- Accuracy
- Precision, Recall, F1-Score
- Confusion Matrix
- ROC-AUC
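scikit-learn implements all of the metrics above; a small sketch computing several of the classification metrics on illustrative labels:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # illustrative ground-truth labels
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])  # illustrative predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
```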
Next Steps
In the following sections, we'll explore:
- Linear Models: Linear regression, logistic regression, and regularization
- Generative Learning: Gaussian Discriminant Analysis and Naive Bayes
- Advanced topics and real-world applications
Key Takeaway: Supervised learning uses labeled data to learn a mapping from inputs to outputs, enabling predictions on new, unseen data.