Hermes Agent: Self-Improving AI for NVIDIA RTX

Nous Research's Hermes Agent reached 140,000 GitHub stars in three months and now ranks as the most-used AI agent globally according to OpenRouter's app usage data. On May 13, NVIDIA featured Hermes as the centerpiece of its RTX AI Garage program, pairing it with optimized setup guides for RTX PC owners who want a self-improving local AI agent that never sends data to the cloud.

What Hermes Agent Does Differently

Most agent frameworks give you a fixed set of tools and stop there. Hermes takes a different approach: every time the agent completes a complex task or receives feedback, it writes a new reusable skill and saves it permanently. The next time a similar task appears, Hermes pulls from its own library rather than starting over.

Three other characteristics separate it from AutoGPT, CrewAI, or browser-based assistants:

Contained sub-agents. Long tasks spawn isolated workers, each with its own context window and tool set. Memory from one sub-task cannot contaminate another.
Curated tool library. Nous Research manually stress-tests every plugin and integration before it ships, so reliability is a design goal rather than a beta feature.
Better orchestration with identical models. NVIDIA's testing shows Hermes produces stronger results using the same model weights as competing frameworks, because the coordination layer itself reduces noise between reasoning steps.

For creators already running cloud-based agents, see how Claude's parallel agent architecture compares in its approach to multi-step self-learning workflows.

Self-improving loop: 3D metal circular arrow with cube transforming into refined version — Hermes refines its own reasoning across multi-step tasks without human intervention.

Who Built Hermes

Nous Research has focused on fine-tuning open-weight models since 2023. The company's Hermes instruction-tuned model series built a large community before the agent framework arrived. NVIDIA's RTX AI Garage announcement on May 13 puts Hermes alongside hardware-optimized guidance from NVIDIA engineers and integrates it with the DGX Spark personal AI supercomputer, which ships with 128 GB of unified memory and 1 petaflop of AI performance.

Hardware Requirements

Hermes runs on three tiers of NVIDIA consumer and professional hardware:

Hardware	VRAM	Recommended Model	Category
RTX 4090 / RTX 5090	24 GB	Qwen 3.6 35B (~20 GB)	Consumer
RTX PRO Workstation	48 GB+	Qwen 3.6 72B	Professional
DGX Spark	128 GB unified	Any open-weight model	Enthusiast

The efficiency numbers matter. Qwen 3.6 35B fits in 20 GB of VRAM while benchmarking ahead of 120B-parameter models that require 70 GB. Qwen 3.6 27B matches 400B-parameter model accuracy at one-sixteenth the memory footprint. If you own an RTX 4090, you already have enough hardware to run a competitive reasoning agent locally.

NVIDIA RTX graphics card with orange glow on ivory surface representing local GPU inference — Hermes runs entirely on local hardware -- no cloud API calls required.

Getting Hermes Running: 5-Step Setup

Setup takes 15 to 30 minutes on an RTX 4090 or better machine.

Install a local inference server. Choose LM Studio for a GUI experience or Ollama for CLI control. Both serve models over an OpenAI-compatible local endpoint that Hermes connects to natively.
Download Qwen 3.6 35B. In LM Studio, search "qwen-3.6-35b" and download the Q4_K_M quantization, which fits in 24 GB. In Ollama, run ollama pull qwen3.6:35b from the terminal.
Clone the Hermes repository. Run git clone https://github.com/nousresearch/hermes-agent and follow the README to point Hermes at your local inference endpoint.
Run a first task. Try something file-based: "Scan my Downloads folder, categorize files by type, and move them to labeled subfolders." Hermes executes the task and saves a reusable skill from the experience.
Let skills accumulate. After a week of regular use, check the skills directory. Hermes builds a personal library of automation routines tuned to your file structure and workflow patterns.

Three stepping stones from concrete to marble to frosted glass representing multi-step workflow progression — Multi-step reasoning breaks complex creative tasks into manageable stages.

Creative Workflows Worth Testing

Because Hermes accesses your local filesystem directly and can call local API endpoints, it handles tasks that cloud agents cannot safely manage:

Batch asset preparation. Rename, resize, and move exports from a generation session, then update a project manifest JSON for downstream tools. The ComfyUI production workflow guide shows how these pipelines connect end-to-end.
Research compilation. Feed Hermes a list of competitor tool URLs and ask it to extract pricing, feature updates, and release notes into a structured spreadsheet without manual copying.
Prompt iteration loops. Write a parent task that generates image prompt variants, calls a local Stable Diffusion endpoint, labels each output by variant, and scores them against a plain-text rubric you define.
Background monitoring. Run Hermes as a persistent service that watches folders, triggers notifications, or posts status updates when conditions you define are met. The 24/7 autonomous operation mode is built in from launch.

All four workflows run entirely offline with no usage cost beyond your hardware and electricity.

Frequently Asked Questions

What is the minimum GPU requirement for Hermes Agent?

An NVIDIA RTX 4090 with 24 GB of VRAM is the practical minimum for running a capable reasoning model like Qwen 3.6 35B alongside the Hermes framework. Older RTX cards with less VRAM can run smaller 7B to 14B models, which handle simpler tasks but lack the reasoning depth needed for complex multi-step agentic work.

Can Hermes Agent run without a GPU?

Technically yes, via CPU inference with llama.cpp, but throughput drops to roughly 2 to 5 tokens per second on a high-end CPU compared to 60 to 100 tokens per second on an RTX 4090. For overnight background tasks, CPU-only is workable. For interactive sessions, it is not practical.

Does Hermes only work with Qwen models?

No. Hermes is model-agnostic and works with any OpenAI-compatible local inference endpoint. NVIDIA highlights Qwen 3.6 for its memory efficiency, but Llama 4, Mistral, and other instruction-tuned open-weight models work with the framework.

How does Hermes compare to AutoGPT?

AutoGPT popularized autonomous AI agents but gained a reputation for looping on ambiguous tasks without heavy prompt engineering. Hermes addresses this through skill persistence (learned behaviors that survive session restarts) and sub-agent isolation (separate context windows per task), which reduces the runaway loop pattern AutoGPT became known for.

Can Hermes generate images or video?

Not directly. Hermes is a text-and-tool orchestration agent. It can call external APIs or local endpoints like ComfyUI or Stable Diffusion to trigger generation, but the rendering happens in the downstream tool. Think of Hermes as the coordinator that writes the brief and manages the output, not the renderer.

Is Hermes Agent free?

Yes. The Hermes Agent repository is open-source under a permissive license. The hardware is yours, and there are no subscription fees or API costs for the agent framework itself.