
Imitation Learning

Imitation Learning (IL) enables robots to learn from expert demonstrations rather than trial-and-error.

Overview

Imitation learning allows robots to learn complex behaviors by observing and mimicking expert demonstrations, making it ideal when:

  • Expert demonstrations are available
  • Reward functions are hard to specify
  • Sample efficiency is critical
  • Safe exploration is important

graph LR
    A[Expert Demonstrations] --> B[Learning Algorithm]
    B --> C[Policy]
    C --> D[Robot Execution]
    D --> E{Performance OK?}
    E -->|No| F[Collect More Data]
    F --> A
    E -->|Yes| G[Deploy]

IL Approaches

Behavioral Cloning (BC)

Supervised learning from state-action pairs.

# Simple behavioral cloning
policy = train_supervised(states, actions)

Pros: Simple, fast, stable

Cons: Distribution shift, no recovery from mistakes


DAgger (Dataset Aggregation)

Interactive learning with expert corrections.

# DAgger algorithm
1. Train policy on expert data
2. Execute policy, collect states
3. Query expert for actions on those states
4. Add to dataset, retrain
5. Repeat

Pros: Addresses distribution shift

Cons: Requires expert during training
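
A minimal sketch of the loop, assuming hypothetical helpers `train_supervised(states, actions)` and `query_expert(state)` plus a classic gym-style env:

# DAgger loop (sketch). Assumes hypothetical helpers:
#   train_supervised(states, actions) -> policy fit to the pairs
#   query_expert(state)               -> expert's action for a state
states, actions = list(demo_states), list(demo_actions)

for _ in range(n_iterations):
    policy = train_supervised(states, actions)

    # Roll out the current policy to visit the states it actually reaches
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        states.append(state)                  # keep the visited state...
        actions.append(query_expert(state))   # ...labeled by the expert
        state, reward, done, info = env.step(action)
        if done:
            break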


Inverse Reinforcement Learning (IRL)

Learn a reward function from demonstrations, then optimize it with RL.

# IRL pipeline
1. Infer reward function from demos
2. Use RL to optimize that reward

Pros: Generalizes better, interpretable rewards

Cons: Computationally expensive
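
As a concrete instance of this pipeline, here is a sketch of linear-reward feature matching (apprenticeship-learning style); `featurize`, `run_rl`, and `rollout` are hypothetical helpers:

import numpy as np

# Linear-reward IRL sketch (feature matching). Assumes hypothetical helpers:
#   featurize(trajectory) -> feature-expectation vector for one trajectory
#   run_rl(reward_fn)     -> policy optimized against reward_fn
#   rollout(policy)       -> trajectories sampled from the policy

# 1. Feature expectations of the expert demonstrations
mu_expert = np.mean([featurize(traj) for traj in demonstrations], axis=0)

w = np.zeros_like(mu_expert)  # reward weights: r(s) = w . phi(s)
for _ in range(n_irl_iters):
    # 2. Optimize a policy against the current reward estimate
    policy = run_rl(lambda phi: w @ phi)
    mu_policy = np.mean([featurize(traj) for traj in rollout(policy)], axis=0)

    # 3. Push the reward toward features the expert hits more often
    w += learning_rate * (mu_expert - mu_policy)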


Quick Start

Basic Behavioral Cloning

import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

# Define dataset
class DemonstrationDataset(Dataset):
    def __init__(self, states, actions):
        self.states = torch.FloatTensor(states)
        self.actions = torch.FloatTensor(actions)

    def __len__(self):
        return len(self.states)

    def __getitem__(self, idx):
        return self.states[idx], self.actions[idx]

# Define policy network
class BCPolicy(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
            nn.Tanh()  # Assuming normalized actions
        )

    def forward(self, state):
        return self.network(state)

# Train (demo_states, demo_actions: expert pairs of shape
# (N, state_dim) and (N, action_dim))
dataset = DemonstrationDataset(demo_states, demo_actions)
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

policy = BCPolicy(state_dim=10, action_dim=4)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
criterion = nn.MSELoss()

for epoch in range(100):
    for states, actions in dataloader:
        predicted_actions = policy(states)
        loss = criterion(predicted_actions, actions)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Deploy: query the trained policy for one action
policy.eval()
obs = env.reset()
with torch.no_grad():
    action = policy(torch.FloatTensor(obs)).numpy()
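
Single-step inference is rarely the whole story; at deployment the policy runs closed-loop for an episode. A minimal sketch, assuming the classic gym `reset()`/`step()` API (Gymnasium instead returns `(obs, info)` from `reset()` and a 5-tuple from `step()`):

# Closed-loop rollout for one episode (sketch, classic gym API)
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    with torch.no_grad():
        action = policy(torch.FloatTensor(obs)).numpy()
    obs, reward, done, info = env.step(action)
    total_reward += reward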

When to Use Imitation Learning

Ideal Scenarios

  • High-quality demonstrations available
  • Reward function difficult to specify
  • Safe operation required (no random exploration)
  • Fast learning needed (fewer samples than RL)

Avoid When

  • No access to expert demonstrations
  • Need to surpass expert performance
  • Demonstrations are low quality or inconsistent
  • Task requires exploration

Comparison with RL

Aspect              Imitation Learning       Reinforcement Learning
------              ------------------       ----------------------
Data                Expert demonstrations    Environment interaction
Sample Efficiency   High                     Low
Performance         Limited by expert        Can surpass expert
Reward              Not needed               Required
Safety              Safe (follows expert)    Risky (explores)

Data Requirements

Quality over Quantity

# Good demonstration characteristics
✓ Consistent expert behavior
✓ Diverse state coverage
✓ Optimal or near-optimal actions
✓ Task completion demonstrated

# Poor demonstration characteristics
✗ Inconsistent actions for same state (see the check below)
✗ Limited state diversity
✗ Suboptimal behavior
✗ Task failures
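
A cheap screen for the "inconsistent actions for same state" failure mode is to compare actions taken from near-identical states. A sketch using scikit-learn nearest neighbors on the Quick Start's `demo_states`/`demo_actions` arrays (the 0.5 threshold is an arbitrary, task-dependent assumption):

import numpy as np
from sklearn.neighbors import NearestNeighbors

# If near-identical states carry very different expert actions, BC will
# average them into actions the expert never actually took.
states = np.asarray(demo_states)
actions = np.asarray(demo_actions)

nn_index = NearestNeighbors(n_neighbors=5).fit(states)
_, neighbor_ids = nn_index.kneighbors(states)

# Per-state action spread across its 5 nearest-neighbor states
spread = actions[neighbor_ids].std(axis=1).mean(axis=1)
n_bad = int((spread > 0.5).sum())  # threshold is task-dependent
print(f"{n_bad} states with inconsistent expert actions")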

How Much Data?

Task Complexity         Demonstrations Needed
---------------         ---------------------
Simple reaching         10-50
Pick and place          100-500
Complex manipulation    1,000-10,000
Dexterous tasks         10,000+

Integration with Other Methods

IL + RL

Fine-tune IL policy with RL:

# 1. Pre-train with BC
bc_policy = train_behavioral_cloning(demonstrations)

# 2. Fine-tune with RL (schematic; real libraries typically take
#    a policy class plus an env, then load the BC weights into it)
rl_policy = PPO(policy=bc_policy)
rl_policy.learn(total_timesteps=100_000)
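
For a self-contained picture of what "fine-tune with RL" means, here is a minimal sketch in plain PyTorch: keep the BC-trained network and continue updating the same parameters with a simple REINFORCE objective. The gym-style env and reward handling are assumptions; a real system would use PPO-style clipping instead:

import torch

# REINFORCE fine-tuning of the BC policy (sketch).
# `policy` is the BCPolicy from the Quick Start, already trained with BC.
log_std = torch.zeros(4, requires_grad=True)  # learned exploration noise
optimizer = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=3e-4)

for episode in range(500):
    obs, done = env.reset(), False
    log_probs, rewards = [], []
    while not done:
        mean = policy(torch.FloatTensor(obs))
        dist = torch.distributions.Normal(mean, log_std.exp())
        action = dist.sample()  # explore around the BC action
        log_probs.append(dist.log_prob(action).sum())
        obs, reward, done, info = env.step(action.numpy())
        rewards.append(reward)

    # Baseline-free REINFORCE: reinforce whole episodes by their return
    loss = -torch.stack(log_probs).sum() * sum(rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()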

IL + VLA

Use IL to bootstrap VLA models:

# Pre-train VLA on demonstrations
vla_model.pretrain(demonstrations)

# Fine-tune for new tasks
vla_model.finetune(new_task_data)
