bryercowan/hermes-embodied

@bryercowan 10

Self-improving robotics via Hermes Agent — provision GPUs, fine-tune VLAs, and autonomously improve robot policies.

hermes-agentroboticsVLAfine-tuningMuJoCoLeRobotvast-aiself-improvingautomation

Install

$ npx skills add bryercowan/hermes-embodied

README

# GitHub Repository: bryercowan/hermes-embodied

**URL:** https://github.com/bryercowan/hermes-embodied
**Author:** bryercowan
**Description:** Self-improving robotics via Hermes Agent. One skill that provisions cloud GPUs, fine-tunes VLA models, runs sim evaluation, and autonomously improves robot policies. Built for the Nous Research Hermes Agent Hackathon.
**Homepage:** 
**Language:** Python

## Stats
- Stars: 10
- Forks: 2
- Open Issues: 2
- Commits: 5
- Created: 2026-03-05T21:06:01Z
- Updated: 2026-06-11T19:24:34Z
- Pushed: 2026-03-05T22:43:38Z

## README
# Hermes Embodied: Self-Improving Robotics via Hermes Agent

> "Any robot owner can fine-tune a state-of-the-art VLA by talking to their agent. No ML expertise needed."

## What Is This?

Hermes Embodied turns [Hermes Agent](https://github.com/NousResearch/hermes-agent) into a **self-improving robotics trainer**. It adds three Hermes skills that close the loop between robot execution, training data collection, and model improvement — all orchestrated through natural language.

The same self-improvement loop that Hermes uses to get better at coding tasks (via Tinker-Atropos RL) now extends to **physical robot control** via Vision-Language-Action models.

## Architecture

```
┌─────────────────────────────────────────────────────┐
│                   HERMES AGENT                       │
│  (Reasoning Layer — plans, monitors, orchestrates)   │
├─────────────────────────────────────────────────────┤
│                                                      │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ vast-gpu  │  │  vla-trainer │  │  robot-loop   │  │
│  │  (skill)  │  │   (skill)    │  │   (skill)     │  │
│  │           │  │              │  │               │  │
│  │ Provision │  │ SmolVLA /    │  │ Deploy model  │  │
│  │ & manage  │  │ GR00T fine-  │  │ Collect traj  │  │
│  │ cloud GPU │  │ tuning on    │  │ Auto-retrain  │  │
│  │ instances │  │ LeRobot data │  │ when improved │  │
│  └──────────┘  └──────────────┘  └───────────────┘  │
│                                                      │
├─────────────────────────────────────────────────────┤
│              SIMULATION / HARDWARE                   │
│                                                      │
│  MuJoCo + LeRobot gym_hil    OR    SO-ARM101 + USB  │
│  (Franka Panda sim tasks)          (Physical arm)    │
└─────────────────────────────────────────────────────┘
```

## The Self-Improvement Loop

1. **Deploy** — Hermes loads a VLA checkpoint and runs it in sim (or on hardware)
2. **Collect** — Every rollout is recorded as a LeRobot trajectory (state, action, camera, reward)
3. **Curate** — Hermes filters successful trajectories (reward > threshold)
4. **Train** — Provisions a GPU on Vast.ai and fine-tunes SmolVLA on the new data
5. **Evaluate** — Runs open-loop eval comparing new checkpoint vs. old
6. **Promote** — If new model is better, it becomes the active policy
7. **Repeat** — Scheduled via Hermes cron, runs autonomously

## Skills

### `vast-gpu` — Cloud GPU Infrastructure
Provision, monitor, and teardown GPU instances on Vast.ai through natural language.
- "Spin up an A100 for training" → finds cheapest A100, creates instance, returns SSH access
- "How's my training instance?" → checks status, GPU utilization, cost so far
- "Tear down the GPU" → destroys instance, confirms billing stopped

### `vla-trainer` — VLA Fine-Tuning Pipeline
End-to-end fine-tuning of Vision-Language-Action models.
- Supports SmolVLA (450M, fast) and GR00T N1.5 (3B, powerful)
- Handles data prep, LeRobot format conversion, stats validation
- Runs training on Vast.ai with WandB monitoring
- Open-loop evaluation with trajectory visualization

### `robot-loop` — Continuous Improvement
The autonomous improvement cycle.
- Runs VLA inference in MuJoCo simulation
- Collects and scores trajectories
- Triggers retraining when enough new data accumulates
- A/B tests new checkpoints against current best
- Promotes winners, logs everything

## Quick Start

```bash
# Tell Hermes what you want
"Set up a simulation environment for pick-and-place tasks"

# Hermes installs MuJoCo, LeRobot, configures the Franka Panda env

"Train SmolVLA on the pick-and-place demo dataset"

# Hermes provisions a Vast.ai GPU, downloads data, runs fine-tuning

"Deploy the trained model and start the improvement loop"

# Hermes runs inference in sim, collects trajectories, schedules retraining
```

## Hardware Support (Optional)

For physical deployment on SO-ARM101:
- Leader arm (teleoperation/demo recording)
- Follower arm (autonomous execution)  
- USB cameras (wrist + global view)
- Any Linux machine with USB ports

## Models Supported

| Model | Params | Train Time (A100) | VRAM | Best For |
|-------|--------|-------------------|------|----------|
| SmolVLA | 450M | ~4h / 20k steps | 22GB | Fast iteration, prototyping |
| GR00T N1.5 | 3B | ~4h / 10k steps | 25GB | Production, complex tasks |
| GR00T N1.6 | 3B | ~4h / 10k steps | 25GB | Latest, best performance |

## Cost Estimate

- Vast.ai A100 80GB: ~$1/hr → ~$4 per training run
- Vast.ai A6000 48GB: ~$0.50/hr → ~$2 per training run
- Simulation: Free (local CPU/GPU)
- Physical arm (optional): ~$200-$440

## Project Structure

```
hermes-embodied/
├── README.md
├── skills/
│   ├── vast-gpu/
│   │   └── SKILL.md
│   ├── vla-trainer/
│   │   └── SKILL.md
│   └── robot-loop/
│       └── SKILL.md
├── scripts/
│   ├── setup_sim.py          # MuJoCo + LeRobot environment setup
│   ├── collect_trajectories.py # Run VLA in sim, save rollouts
│   ├── train_smolvla.py      # Fine-tuning wrapper
│   ├── evaluate.py           # Open-loop eval + metrics
│   └── improvement_loop.py   # Full autonomous loop
├── configs/
│   ├── sim_env.json          # Simulation environment config
│   ├── training.yaml         # Training hyperparameters
│   └── vast_instance.yaml    # GPU instance specs
└── docs/
    └── ARCHITECTURE.md
```

## Built With

- [Hermes Agent](https://github.com/NousResearch/hermes-agent) — AI agent framework with skills, memory, and RL training
- [LeRobot](https://github.com/huggingface/lerobot) — Open-source robotics framework by Hugging Face
- [SmolVLA](https://huggingface.co/lerobot/smolvla_base) — 450M parameter Vision-Language-Action model
- [Vast.ai](https://vast.ai) — Affordable cloud GPU rental
- [MuJoCo](https://mujoco.org/) — Physics simulation for robotics
- [WandB](https://wandb.ai) — Experiment tracking

## License

MIT

Information

Repository

bryercowan/hermes-embodied

Language

Python

Created

2026/6/18

Updated

2026/6/19

Homepage

https://github.com/bryercowan/hermes-embodied