Maze Navigation: From Classical Algorithms to Reinforcement Learning

Maze navigation is a fundamental problem in computer science and artificial intelligence. This example demonstrates the progression from classical graph search algorithms (BFS, DFS, Dijkstra) to reinforcement learning, and explains why RL is necessary when the environment is unknown or dynamic.

Legend (for the interactive visualization below): 🟩 Start · 🟥 Goal · ⬛ Wall · 🔵 Path · 🔷 Visited (Search Area)

Classical Algorithms: When You Know Everything

Depth-First Search (DFS)

DFS explores as far as possible along each branch before backtracking. It typically finds a path quickly, but not necessarily the shortest one.

Visualization:

  • Notice how DFS plunges deep into one path, hits a dead end, and backtracks.
  • It visits cells in a specific, winding order.

Time Complexity: $O(V + E)$

Key Assumption: You have complete knowledge of the maze structure.
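
To make the exploration order concrete, here is a minimal iterative DFS sketch in Python. It assumes the maze is a 2D list with 0 for free cells and 1 for walls, and that cells are (row, col) tuples; the function name `dfs_path` and this encoding are illustrative, not the demo's actual implementation.

```python
def dfs_path(maze, start, goal):
    """Iterative DFS on a grid maze (0 = free cell, 1 = wall).

    Returns one start-to-goal path as a list of (row, col) cells --
    not necessarily the shortest -- or None if the goal is unreachable.
    """
    rows, cols = len(maze), len(maze[0])
    stack = [start]
    parent = {start: None}  # doubles as the visited set

    while stack:
        cell = stack.pop()
        if cell == goal:
            # Walk the parent chain back to the start to rebuild the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nbr = (r + dr, c + dc)
            if (0 <= nbr[0] < rows and 0 <= nbr[1] < cols
                    and maze[nbr[0]][nbr[1]] == 0 and nbr not in parent):
                parent[nbr] = cell
                stack.append(nbr)

    return None  # goal not reachable from start
```

Using an explicit stack instead of recursion avoids Python's recursion limit on large mazes; the last-in, first-out order is what produces the deep, winding exploration seen in the visualization.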

Dijkstra's Algorithm

Dijkstra's algorithm finds the shortest path in a weighted graph. In an unweighted maze like this, every step costs the same, so it expands uniformly like a "wavefront" from the start (behaving just like BFS).

Visualization:

  • Notice how Dijkstra explores all nearby cells first before moving further away.
  • It guarantees finding the shortest path (fewest steps).

Time Complexity: $O((V + E) \log V)$

Key Assumption: You have complete knowledge of the maze structure.
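
Below is a matching sketch of Dijkstra's algorithm on the same grid encoding, using Python's `heapq` as the priority queue; with unit step costs the frontier expands as the uniform wavefront described above. The encoding and names are carried over from the DFS sketch and are assumptions, not the demo's code.

```python
import heapq

def dijkstra_path(maze, start, goal):
    """Dijkstra's algorithm on a grid maze (0 = free cell, 1 = wall).

    With unit step costs it returns (distance, path) for a shortest
    path, or (inf, None) if the goal is unreachable.
    """
    rows, cols = len(maze), len(maze[0])
    dist = {start: 0}
    parent = {start: None}
    frontier = [(0, start)]  # min-heap of (distance so far, cell)

    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return d, path[::-1]
        if d > dist[cell]:
            continue  # stale heap entry: a shorter route was already found
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nbr = (r + dr, c + dc)
            if (0 <= nbr[0] < rows and 0 <= nbr[1] < cols
                    and maze[nbr[0]][nbr[1]] == 0):
                nd = d + 1  # every move costs 1 in an unweighted maze
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    parent[nbr] = cell
                    heapq.heappush(frontier, (nd, nbr))

    return float("inf"), None
```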

Why Reinforcement Learning?

Classical algorithms work perfectly when you have full knowledge of the environment. However, in many real-world scenarios, this assumption breaks down:

1. Unknown Environment

Problem: You don't know the maze layout beforehand.

  • BFS/DFS/Dijkstra: Require the full graph structure to work
  • RL Solution: Agent explores and learns the environment through trial and error

Example: A robot navigating an unknown building, a game agent in a procedurally generated world.

2. Stochastic Transitions

Problem: Actions don't always work as expected (e.g., 80% chance of moving forward, 20% chance of slipping).

  • BFS/DFS/Dijkstra: Assume deterministic transitions
  • RL Solution: Learns transition probabilities and optimizes expected reward

Example: Robot with imperfect motors, game with random events.
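
As a toy illustration (not part of the demo above), the step function below implements exactly this kind of slippery transition: the intended move succeeds with probability 1 − `slip_prob`, otherwise the agent moves in a random other direction. The action names and the 20% default are assumptions chosen to mirror the example.

```python
import random

# Unit moves on the grid, keyed by action name.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def slippery_step(state, action, slip_prob=0.2):
    """Execute the intended move with probability 1 - slip_prob;
    otherwise 'slip' and move in a random other direction.

    Wall and boundary checks are omitted for brevity. A planner that
    assumes deterministic moves cannot model this, while a Q-learning
    agent simply averages over the outcomes it experiences.
    """
    if random.random() < slip_prob:
        action = random.choice([a for a in ACTIONS if a != action])
    dr, dc = ACTIONS[action]
    return (state[0] + dr, state[1] + dc)
```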

3. Dynamic Environments

Problem: The maze changes over time (walls appear/disappear, rewards change).

  • BFS/DFS/Dijkstra: Require recomputation from scratch when the environment changes
  • RL Solution: Can adapt online as the environment changes

Example: Traffic navigation with changing road conditions, dynamic obstacle avoidance.

4. Reward-Based Learning

Problem: You want to optimize for a reward signal, not just reach a goal.

  • BFS/DFS/Dijkstra: Optimize path length or cost
  • RL Solution: Learns to maximize cumulative reward (which may include penalties for long paths, energy consumption, etc.)

Example: Game AI that needs to collect coins while avoiding enemies.

5. Partial Observability

Problem: You can only see part of the maze (e.g., only adjacent cells).

  • BFS/DFS/Dijkstra: Require full observability
  • RL Solution: Can work with partial observations using state estimation

Example: Real-world navigation with limited sensors.

Q-Learning: Model-Free Reinforcement Learning

Q-Learning is a model-free RL algorithm that learns the optimal action-value function $Q(s, a)$ without needing to know the transition probabilities.

Key Concepts

  • Q-Value: $Q(s, a)$ = expected cumulative reward from state $s$ taking action $a$, then following the optimal policy
  • Bellman Equation: $Q(s, a) = R(s, a) + \gamma \max_{a'} Q(s', a')$
  • Update Rule: $Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$ (see the sketch below)

where:

  • $\alpha$ = learning rate
  • $\gamma$ = discount factor (how much we value future rewards)
  • $R$ = immediate reward
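
A minimal sketch of this update in Python, assuming a tabular Q-function stored as a dictionary keyed by `(state, action)`; the default values of `alpha` and `gamma` are illustrative.

```python
from collections import defaultdict

# Q-table: unseen (state, action) pairs default to 0.0.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.95):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)]

    (For a terminal next_state the bootstrap term would normally be
    dropped; that case is omitted here for brevity.)
    """
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next      # R + gamma * max_a' Q(s', a')
    td_error = td_target - Q[(state, action)]   # how wrong the current estimate is
    Q[(state, action)] += alpha * td_error
    return Q[(state, action)]
```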

Exploration vs. Exploitation

  • Exploration: Try random actions to discover the environment (high $\epsilon$)
  • Exploitation: Use learned knowledge to take the best action (low $\epsilon$)
  • Epsilon-Greedy: With probability $\epsilon$, explore; otherwise, exploit

Initially, the agent explores heavily ($\epsilon \approx 1$). As it learns, it exploits more ($\epsilon \to 0$).
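
Here is a minimal sketch of epsilon-greedy selection together with a simple decay schedule, reusing the `defaultdict` Q-table from the previous sketch; the decay constants are illustrative assumptions.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def decayed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Simple exponential decay: start near 1 (explore heavily) and
    settle toward a small floor as training progresses."""
    return max(eps_min, eps_start * decay ** episode)
```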

Interactive Simulation

Try the different algorithms below. Notice how:

  1. DFS/Dijkstra visualize the search process, assuming full knowledge of the maze.
  2. Q-Learning starts in a "Fog of War". The agent (orange dot) must explore the maze step-by-step to learn the layout and find the goal.
  3. The Q-Learning agent initially wanders randomly (exploring), then gradually converges to the optimal path as it learns (exploiting).

Comparison Table

| Aspect | BFS/DFS/Dijkstra | Q-Learning (RL) |
| --- | --- | --- |
| Prior Knowledge | Requires full maze map | None required |
| Stochasticity | Assumes deterministic | Handles stochastic transitions |
| Adaptability | Must recompute on changes | Adapts online |
| Optimization | Path length/cost | Cumulative reward |
| Partial Observability | Requires full observability | Can work with partial info |
| Learning | No learning (just search) | Learns from experience |
| Use Case | Known, static environments | Unknown, dynamic environments |

When to Use Each Approach

Use Classical Algorithms (BFS/DFS/Dijkstra) when:

  • You have complete knowledge of the environment
  • The environment is static
  • Transitions are deterministic
  • You need a solution immediately

Use Reinforcement Learning when:

  • The environment is unknown or partially observable
  • The environment is dynamic or stochastic
  • You want to optimize for complex reward signals
  • You can afford to learn through trial and error

Real-World Applications

  • Classical Algorithms: GPS navigation (known road network), pathfinding in video games (static maps)
  • Reinforcement Learning: Autonomous robots (unknown terrain), game AI (opponent behavior), recommendation systems (changing user preferences), autonomous vehicles (dynamic traffic)

Key Takeaway

Classical algorithms are planning methods: given a model of the environment, find the optimal path. Reinforcement learning is a learning method: the agent learns how to act directly from interaction, without being given a model. When the environment is unknown, stochastic, or dynamic, RL becomes essential.