
Quick Start Paths

Choose your learning path based on your goals and available resources.

Path 1: Imitation Learning (Fastest)

Best for: Tasks with available demonstrations, rapid prototyping

Timeline: 1-2 weeks

Requirements:

  • 50-500 demonstrations
  • Robot with teleoperation capability
  • GPU for training (RTX 3090 or better)

Steps

1. Collect Demonstrations (2-3 days)

# Set up teleoperation
python setup_teleop.py --robot franka --device spacemouse

# Collect 100 demonstrations
python collect_demos.py --num_episodes 100 --task pick_place
→ Teleoperation Guide

2. Prepare Dataset (1 day)

# Convert to LeRobot format
python convert_to_lerobot.py --input demos/ --output dataset/

# Validate
python validate_dataset.py --path dataset/
→ LeRobot Format

3. Train Diffusion Policy (3-5 days)

from diffusion_policy import DiffusionPolicy

model = DiffusionPolicy(obs_dim=10, action_dim=7)
model.train(dataset, epochs=1000)
→ Diffusion Policy | → Training
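Under the hood, diffusion-policy training is noise regression: corrupt a demonstrated action at a random diffusion step and train the network to predict the injected noise. A minimal stdlib sketch of that objective for a scalar action (the schedule and function names are illustrative, not the library's API):

```python
import math
import random

# Illustrative noise schedule: alpha_bar shrinks from ~1 toward ~0.
T = 100
alpha_bar = [math.cos(0.5 * math.pi * (t + 1) / (T + 1)) ** 2 for t in range(T)]

def noisy_action(action, t, eps):
    """Forward diffusion: blend the clean action with Gaussian noise."""
    return math.sqrt(alpha_bar[t]) * action + math.sqrt(1 - alpha_bar[t]) * eps

def denoising_loss(predict_noise, action):
    """One sample of the epsilon-prediction (MSE) training objective."""
    t = random.randrange(T)
    eps = random.gauss(0.0, 1.0)
    x_t = noisy_action(action, t, eps)
    return (predict_noise(x_t, t) - eps) ** 2
```

A predictor that recovers the true noise drives this loss to zero, which is exactly what the policy network is trained toward.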

4. Deploy (2-3 days)

# Real-time inference
action = model.predict(observation)
robot.execute(action)
→ Deployment
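Deployment with diffusion policies is typically receding-horizon: predict a chunk of actions, execute only the first few, then replan from the new observation. A toy stdlib sketch of that loop (`predict_chunk` is a stub standing in for the trained model, and the scalar "observation" is a placeholder for real robot state):

```python
def predict_chunk(observation, horizon=8):
    """Stub policy: returns a fake trajectory of `horizon` actions."""
    return [observation + i for i in range(horizon)]

def run_episode(steps=20, horizon=8, execute_k=4):
    """Receding-horizon control: execute the first K of each H-step plan."""
    observation, executed = 0, []
    while len(executed) < steps:
        chunk = predict_chunk(observation, horizon)
        for action in chunk[:execute_k]:
            executed.append(action)
            observation = action  # pretend the robot tracked the action
    return executed[:steps]
```

Executing only a prefix of each predicted chunk trades a little extra inference for robustness to drift between plan and reality.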

Expected Success Rate: 70-90% on seen scenarios


Path 2: Reinforcement Learning (Most Flexible)

Best for: Tasks with definable rewards, optimization needed

Timeline: 2-4 weeks

Requirements:

  • Simulation environment
  • Reward function
  • Multi-GPU setup (4x RTX 3090 recommended)

Steps

1. Set Up Simulation (3-5 days)

# Install IsaacLab
git clone https://github.com/isaac-sim/IsaacLab.git
cd IsaacLab && ./isaaclab.sh --install

# Verify
python -m isaaclab.envs --task Isaac-Reach-Franka-v0
→ IsaacLab Setup

2. Define Reward Function (1-2 days)

def compute_reward(obs, action, next_obs):
    # Distance to goal (dense shaping term)
    dist = torch.norm(next_obs['ee_pos'] - goal, dim=-1)
    dist_reward = -dist

    # Success bonus
    success = (dist < 0.02).float()
    success_reward = success * 10.0

    # Action smoothness penalty
    action_penalty = -0.01 * torch.sum(action**2, dim=-1)

    return dist_reward + success_reward + action_penalty
→ Reward Design
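A quick way to validate a shaped reward before launching a long training run is to reimplement it as a scalar function and check that each term pulls in the right direction. A stdlib sanity check mirroring the reward above (the goal position is illustrative):

```python
import math

GOAL = (0.5, 0.0, 0.3)  # illustrative goal position

def shaped_reward(ee_pos, action):
    """Scalar version of the shaped reward above, for sanity-checking."""
    dist = math.dist(ee_pos, GOAL)
    dist_reward = -dist                            # dense: pull toward the goal
    success_reward = 10.0 if dist < 0.02 else 0.0  # sparse success bonus
    action_penalty = -0.01 * sum(a * a for a in action)  # smoothness
    return dist_reward + success_reward + action_penalty
```

States closer to the goal should always score higher, and larger actions should always score lower; if either check fails, the agent will optimize the wrong thing.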

3. Train with PPO/SAC (1-2 weeks)

from stable_baselines3 import PPO
import gymnasium as gym

# Note: IsaacLab's vectorized envs need an SB3-compatible wrapper before training
env = gym.make("Isaac-Reach-Franka-v0", num_envs=4096)
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=10_000_000)
→ RL Training | → Algorithms
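What PPO actually learns from those 4096 parallel rollouts is an advantage signal, commonly computed with Generalized Advantage Estimation. A stdlib sketch of GAE over one rollout (the gamma and lambda values are the usual defaults, not tuned for this task):

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.
    `values` carries one extra bootstrap entry for the final state."""
    advantages, gae = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        # TD error at step t, then exponentially-weighted backward sum
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With gamma = lam = 1 and zero values this reduces to plain reward-to-go, which is a handy check when debugging a rollout buffer.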

4. Sim-to-Real Transfer (1 week)

# Domain randomization
env.randomize_physics()
env.randomize_visuals()

# Fine-tune on real robot (optional); SB3's learn() takes no env argument
model.set_env(real_robot_env)
model.learn(total_timesteps=100_000)
→ Sim-to-Real
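The `randomize_physics()` call above is schematic; conceptually, domain randomization resamples simulator parameters every episode so the real world looks like just one more sample from the training distribution. A stdlib sketch with made-up parameter names and ranges:

```python
import random

# Illustrative nominal values and randomization half-widths (as fractions).
NOMINAL = {"mass": 1.0, "friction": 0.8, "motor_gain": 1.0}
RANGES = {"mass": 0.2, "friction": 0.3, "motor_gain": 0.1}

def randomize_physics(rng=random):
    """Sample one episode's physics parameters around their nominals."""
    return {k: NOMINAL[k] * rng.uniform(1 - RANGES[k], 1 + RANGES[k])
            for k in NOMINAL}
```

Widening the ranges improves transfer robustness at the cost of a harder (and slower) training problem, so the ranges themselves are worth tuning.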

Expected Success Rate: 80-95% after domain randomization


Path 3: Vision-Language-Action (Most Advanced)

Best for: Language-conditioned tasks, multi-task learning

Timeline: 3-6 weeks

Requirements:

  • 1000-100k demonstrations with language annotations
  • Large-scale compute (8x A100 GPUs for training)
  • Pre-trained VLM (or use OpenVLA)

Steps

1. Collect Multi-Modal Data (2-3 weeks)

# Collect with language annotations
collector = DataCollector(robot, camera)
for task in tasks:
    instruction = get_language_instruction(task)
    demo = collector.collect(instruction)
    dataset.add(demo, instruction)
→ Data Collection

2. Prepare LeRobot Dataset (3-5 days)

from lerobot import LeRobotDataset

dataset = LeRobotDataset.create(
    repo_id="username/my_robot_dataset",
    fps=30,
    robot_type="franka"
)

for episode in demonstrations:
    dataset.add_episode(episode)

dataset.push_to_hub()
→ LeRobot

3. Fine-Tune OpenVLA (1-2 weeks)

from openvla import OpenVLAModel

# Load pre-trained model
model = OpenVLAModel.from_pretrained("openvla/openvla-7b")

# Fine-tune on your data
model.finetune(
    dataset,
    num_epochs=10,
    learning_rate=1e-5,
    use_lora=True
)
→ OpenVLA | → Fine-tuning
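The `use_lora=True` flag matters for the compute budget: LoRA trains a pair of rank-r matrices per adapted layer instead of the full weight matrix. A small arithmetic sketch of the savings (the layer size and rank are illustrative, not OpenVLA's defaults):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for one rank-r adapter (A: r x d_in, B: d_out x r)."""
    return rank * (d_in + d_out)

def full_params(d_in, d_out):
    """Trainable parameters for full fine-tuning of the same layer."""
    return d_in * d_out

# Example: a 4096x4096 projection with rank-16 adapters.
savings = full_params(4096, 4096) / lora_params(4096, 4096, 16)  # 128x fewer
```

Fewer trainable parameters also means a much smaller optimizer state, which is often what makes 7B-scale fine-tuning fit on the hardware listed above.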

4. Deploy with Optimization (1 week)

# Optimize for inference
model = torch.compile(model)
model.half()  # FP16

# Deploy
action = model.predict(image, instruction)
→ Optimization
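The FP16 cast is what makes a 7B-parameter model fit on a single 24 GB GPU; the arithmetic is just parameter count times bytes per parameter. A quick sketch (ignores activations and KV cache, which need additional headroom):

```python
def model_memory_gb(num_params, bytes_per_param):
    """Approximate weight memory in GiB, ignoring activations."""
    return num_params * bytes_per_param / 1024**3

fp32_gb = model_memory_gb(7e9, 4)  # FP32: over 26 GiB, too big for a 24 GB card
fp16_gb = model_memory_gb(7e9, 2)  # FP16: about 13 GiB, leaves room to run
```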

Expected Success Rate: 60-85% including novel instructions


Comparison

Aspect              IL             RL                   VLA
Timeline            1-2 weeks      2-4 weeks            3-6 weeks
Data Needed         50-500 demos   Sim + reward         1k-100k demos + language
Compute             1x RTX 3090    4x RTX 3090          8x A100
Generalization      Limited        Good (within task)   Excellent (cross-task)
Sample Efficiency   High           Low                  Medium
Language Control    No             No                   Yes
Difficulty          Easy           Medium               Hard

Hybrid Approaches

  1. Pre-train with IL on demonstrations
  2. Fine-tune with RL for optimization
# Stage 1: IL pre-training
il_policy = train_behavioral_cloning(demos)

# Stage 2: RL fine-tuning -- SB3 takes a policy class, not an instance,
# so load the IL weights into the PPO policy (architectures must match)
rl_policy = PPO("MlpPolicy", env)
rl_policy.policy.load_state_dict(il_policy.state_dict())
rl_policy.learn(total_timesteps=1_000_000)

Benefits: Faster convergence, better final performance
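The faster-convergence benefit can be illustrated with a toy optimization: a warm start closer to the optimum simply needs fewer gradient steps. This 1-D quadratic is purely illustrative, not an RL algorithm:

```python
def steps_to_converge(start, target=0.0, lr=0.1, tol=1e-3):
    """Gradient descent on (x - target)^2: a stand-in for how far the
    optimizer must travel from its initialization."""
    x, steps = start, 0
    while abs(x - target) > tol:
        x -= lr * 2 * (x - target)  # gradient step on the quadratic
        steps += 1
    return steps

cold_start = steps_to_converge(start=10.0)  # random init
warm_start = steps_to_converge(start=0.5)   # IL-pretrained init, near optimum
```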

Offline RL

Train RL on logged datasets without environment interaction:

from d3rlpy.algos import CQL

# Train on offline dataset
cql = CQL(use_gpu=True)
cql.fit(dataset, n_epochs=100)

→ Offline RL
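The key idea in CQL is a conservatism penalty that pushes Q-values down for actions the dataset never shows, measured as a logsumexp over all actions minus the Q of the logged action. A tabular stdlib sketch (action names and alpha are illustrative):

```python
import math

def cql_penalty(q_values, data_action, alpha=1.0):
    """Conservative penalty: logsumexp of Q over all actions minus the
    Q-value of the action actually seen in the dataset."""
    m = max(q_values.values())  # subtract max for numerical stability
    logsumexp = m + math.log(sum(math.exp(q - m) for q in q_values.values()))
    return alpha * (logsumexp - q_values[data_action])

q = {"left": 1.0, "right": 1.2, "grasp": 0.9}
penalty = cql_penalty(q, data_action="right")
```

The penalty is always non-negative and grows when out-of-dataset actions look as good as the logged one, which is exactly the overestimation offline RL must guard against.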

Resource Requirements

Minimum (IL Path)

  • GPU: 1x RTX 3090 (24GB)
  • RAM: 32GB
  • Storage: 500GB SSD
  • Cost: ~$2k hardware

Recommended (RL Path)

  • GPU: 4x RTX 4090 (96GB total)
  • RAM: 128GB
  • Storage: 2TB NVMe
  • Cost: ~$8k hardware

Professional (VLA Path)

  • GPU: 8x A100 (640GB total)
  • RAM: 512GB
  • Storage: 10TB NVMe
  • Cost: ~$80k hardware or cloud

Cloud Options

For IL

  • Lambda Labs: 1x A100 @ $1.10/hr
  • RunPod: 1x RTX 4090 @ $0.40/hr
  • Budget: ~$100-200 for complete training

For RL

  • Lambda Labs: 4x A100 @ $4.40/hr
  • RunPod: 4x RTX 4090 @ $1.60/hr
  • Budget: ~$500-1000 for complete training

For VLA

  • Lambda Labs: 8x A100 @ $8.80/hr
  • AWS: p4d.24xlarge @ $32.77/hr
  • Budget: ~$3000-5000 for fine-tuning
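The budgets above are just hourly rate times training hours. A quick sketch for sizing a run (the rates come from the lists above; the hour counts are rough assumptions):

```python
def cloud_cost(rate_per_hr, hours):
    """Total cost of a cloud training run, ignoring storage and egress."""
    return rate_per_hr * hours

# Rough sizing using the rates listed above (hour counts are guesses).
il_cost = cloud_cost(1.10, 120)   # ~$132, within the $100-200 IL budget
vla_cost = cloud_cost(8.80, 400)  # ~$3520, within the $3000-5000 VLA budget
```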

What You'll Build

IL Path Example

Task: Pick and place objects Input: RGB image + robot state Output: 7-DoF arm actions Performance: 85% success on trained objects

RL Path Example

Task: Reach target positions Input: Joint states + target position Output: Joint velocities Performance: 95% success on random targets

VLA Path Example

Task: Language-conditioned manipulation Input: Image + "pick up the red cup" Output: Action sequence Performance: 75% success on novel instructions

Next Steps

  1. Choose your path based on requirements
  2. Set up environment (hardware/cloud)
  3. Follow detailed guides:
     • IL Training Guide
     • RL Training Guide
     • VLA Training Guide
  4. Join community for support
  5. Iterate and improve

Getting Help

Ready to start? Pick your path and dive in!