Maze Navigation: From Classical Algorithms to Reinforcement Learning
Maze navigation is a fundamental problem in computer science and artificial intelligence. This example demonstrates the progression from classical graph search algorithms (BFS, DFS, Dijkstra) to reinforcement learning, and explains why RL is necessary when the environment is unknown or dynamic.
Classical Algorithms: When You Know Everything
Depth-First Search (DFS)
DFS explores as far as possible along each branch before backtracking. It typically finds a path quickly, but not necessarily the shortest one.
Visualization:
- Notice how DFS plunges deep into one path, hits a dead end, and backtracks.
- It visits cells in a specific, winding order.
Time Complexity: O(V + E), where V is the number of cells and E the number of open connections between adjacent cells.
Key Assumption: You have complete knowledge of the maze structure.
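To make the exploration order concrete, here is a minimal sketch of iterative DFS on a grid maze; the `maze` (0 = open, 1 = wall), `start`, and `goal` names are illustrative assumptions, not the code behind the simulation below:

```python
def dfs(maze, start, goal):
    """Iterative depth-first search on a grid; returns a path, not necessarily the shortest."""
    stack = [start]
    parent = {start: None}                  # doubles as the visited set
    while stack:
        cell = stack.pop()                  # LIFO: keep extending the most recent branch
        if cell == goal:
            path = []
            while cell is not None:         # walk parent pointers back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nbr in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nbr
            if (0 <= nr < len(maze) and 0 <= nc < len(maze[0])
                    and maze[nr][nc] == 0 and nbr not in parent):
                parent[nbr] = cell
                stack.append(nbr)
    return None                             # goal unreachable
```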
Dijkstra's Algorithm
Dijkstra finds the shortest path in a weighted graph. In an unweighted maze like this, it expands uniformly like a "wavefront" from the start.
Visualization:
- Notice how Dijkstra explores all nearby cells first before moving further away.
- It guarantees finding the shortest path (fewest steps).
Time Complexity: O((V + E) log V) with a binary-heap priority queue.
Key Assumption: You have complete knowledge of the maze structure.
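A minimal sketch of Dijkstra on the same kind of grid, under the same illustrative assumptions; with unit step costs its frontier expands as the uniform wavefront described above:

```python
import heapq

def dijkstra(maze, start, goal):
    """Dijkstra's algorithm with unit step costs; returns a shortest path."""
    dist = {start: 0}
    parent = {start: None}
    frontier = [(0, start)]                 # min-heap keyed by distance from start
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        if d > dist[cell]:                  # stale heap entry, already improved
            continue
        r, c = cell
        for nbr in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nbr
            if 0 <= nr < len(maze) and 0 <= nc < len(maze[0]) and maze[nr][nc] == 0:
                nd = d + 1                  # every move costs 1 in an unweighted maze
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    parent[nbr] = cell
                    heapq.heappush(frontier, (nd, nbr))
    return None
```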
Why Reinforcement Learning?
Classical algorithms work perfectly when you have full knowledge of the environment. However, in many real-world scenarios, this assumption breaks down:
1. Unknown Environment
Problem: You don't know the maze layout beforehand.
- BFS/DFS/Dijkstra: Require the full graph structure to work
- RL Solution: Agent explores and learns the environment through trial and error
Example: A robot navigating an unknown building, a game agent in a procedurally generated world.
2. Stochastic Transitions
Problem: Actions don't always work as expected (e.g., 80% chance of moving forward, 20% chance of slipping).
- BFS/DFS/Dijkstra: Assume deterministic transitions
- RL Solution: Learns transition probabilities and optimizes expected reward
Example: Robot with imperfect motors, game with random events.
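As a toy illustration of these slip dynamics, here is a hypothetical stochastic step function (the names, probabilities, and grid encoding are assumptions for the sketch, not part of the simulation):

```python
import random

# The four moves and their row/column offsets on the grid (0 = open cell, 1 = wall).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def slippery_step(maze, cell, action, slip_prob=0.2):
    """Hypothetical stochastic transition: the intended move succeeds with
    probability 1 - slip_prob; otherwise the agent slips into another direction."""
    if random.random() < slip_prob:
        action = random.choice([a for a in MOVES if a != action])   # slip
    dr, dc = MOVES[action]
    r, c = cell[0] + dr, cell[1] + dc
    if 0 <= r < len(maze) and 0 <= c < len(maze[0]) and maze[r][c] == 0:
        return (r, c)        # move succeeded
    return cell              # bumped into a wall: stay in place
```

A planner that assumes its moves always succeed can only hope its precomputed path survives the slips; a Q-Learning agent simply keeps sampling this function and learns which actions pay off in expectation.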
3. Dynamic Environments
Problem: The maze changes over time (walls appear/disappear, rewards change).
- BFS/DFS/Dijkstra: Require recomputation from scratch when the environment changes
- RL Solution: Can adapt online as the environment changes
Example: Traffic navigation with changing road conditions, dynamic obstacle avoidance.
4. Reward-Based Learning
Problem: You want to optimize for a reward signal, not just reach a goal.
- BFS/DFS/Dijkstra: Optimize path length or cost
- RL Solution: Learns to maximize cumulative reward (which may include penalties for long paths, energy consumption, etc.)
Example: Game AI that needs to collect coins while avoiding enemies.
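As a toy illustration, a reward signal that encodes more than "reach the goal" might look like this (the values and event names are hypothetical, chosen only to show how extra objectives are folded into one number):

```python
def reward(reached_goal, collected_coin, hit_enemy):
    """Hypothetical reward signal: the agent is paid for behaviour, not just arrival."""
    r = -0.01                 # small per-step penalty discourages long paths
    if collected_coin:
        r += 0.5              # bonus for side objectives
    if hit_enemy:
        r -= 1.0              # penalty for unsafe behaviour
    if reached_goal:
        r += 1.0              # main objective
    return r
```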
5. Partial Observability
Problem: You can only see part of the maze (e.g., only adjacent cells).
- BFS/DFS/Dijkstra: Require full observability
- RL Solution: Can work with partial observations using state estimation
Example: Real-world navigation with limited sensors.
Q-Learning: Model-Free Reinforcement Learning
Q-Learning is a model-free RL algorithm that learns the optimal action-value function without needing to know the transition probabilities.
Key Concepts
- Q-Value: $Q(s, a)$ = expected cumulative reward from state $s$ taking action $a$, then following the optimal policy
- Bellman Equation: $Q^*(s, a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^*(s', a') \,\big]$
- Update Rule: $Q(s, a) \leftarrow Q(s, a) + \alpha \big[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]$
where:
- $\alpha$ = learning rate
- $\gamma$ = discount factor (how much we value future rewards)
- $r$ = immediate reward
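A minimal tabular sketch of this update, with `alpha` and `gamma` mirroring the symbols above and `next_state` playing the role of $s'$ (the `Q` table and function names are illustrative):

```python
from collections import defaultdict

# Q[(state, action)] -> current action-value estimate; unseen pairs default to 0.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One Q-Learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in actions)   # use 0 here if next_state is terminal
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
```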
Exploration vs. Exploitation
- Exploration: Try random actions to discover the environment (high $\epsilon$)
- Exploitation: Use learned knowledge to take the best action (low $\epsilon$)
- Epsilon-Greedy: With probability $\epsilon$, explore; otherwise, exploit
Initially, the agent explores heavily ($\epsilon$ near 1). As it learns, it exploits more ($\epsilon$ decays toward 0).
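Continuing the sketch above, an epsilon-greedy policy with a decaying $\epsilon$ might look like this (the action set and decay schedule are illustrative choices, not those used by the simulation below):

```python
import random

ACTIONS = ["up", "down", "left", "right"]     # illustrative action set

def choose_action(state, epsilon):
    """Epsilon-greedy selection over the Q table from the previous sketch."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)                        # explore
    return max(ACTIONS, key=lambda a: Q[(state, a)])         # exploit

epsilon = 1.0                                 # start out almost purely exploratory
for episode in range(500):
    ...                                       # run one episode, calling q_update after each step
    epsilon = max(0.05, epsilon * 0.99)       # decay toward mostly exploiting
```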
Interactive Simulation
Try the different algorithms below. Notice how:
- DFS/Dijkstra visualize the search process, assuming full knowledge of the maze.
- Q-Learning starts in a "Fog of War". The agent (orange dot) must explore the maze step-by-step to learn the layout and find the goal.
- The Q-Learning agent initially wanders randomly (exploring), then gradually converges to the optimal path as it learns (exploiting).
Comparison Table
| Aspect | BFS/DFS/Dijkstra | Q-Learning (RL) |
|---|---|---|
| Prior Knowledge | Requires full maze map | None required |
| Stochasticity | Assumes deterministic | Handles stochastic transitions |
| Adaptability | Must recompute on changes | Adapts online |
| Optimization | Path length/cost | Cumulative reward |
| Partial Observability | Requires full observability | Can work with partial info |
| Learning | No learning (just search) | Learns from experience |
| Use Case | Known, static environments | Unknown, dynamic environments |
When to Use Each Approach
Use Classical Algorithms (BFS/DFS/Dijkstra) when:
- You have complete knowledge of the environment
- The environment is static
- Transitions are deterministic
- You need a solution immediately
Use Reinforcement Learning when:
- The environment is unknown or partially observable
- The environment is dynamic or stochastic
- You want to optimize for complex reward signals
- You can afford to learn through trial and error
Real-World Applications
- Classical Algorithms: GPS navigation (known road network), pathfinding in video games (static maps)
- Reinforcement Learning: Autonomous robots (unknown terrain), game AI (opponent behavior), recommendation systems (changing user preferences), autonomous vehicles (dynamic traffic)
Key Takeaway
Classical algorithms are planning methods: given a model of the environment, compute the optimal path. Reinforcement learning is a learning method: learn how to act directly from interaction, without being handed a model up front. When the environment is unknown, stochastic, or dynamic, RL becomes essential.