Computation Inside the Model

A GPU-native virtual machine that runs inside LLM inference. No tool calls. No latency. Native computation.

LLMs Can't Compute

When a language model needs to multiply two numbers, it doesn't calculate. It pattern-matches from training data. Or it calls an external tool—JSON to an API, wait for response, parse, continue.

Three forward passes. Network latency. CPU orchestration. The model doesn't think in computation. It asks someone else to compute.

// Standard LLM with tool use
User: What is 7 × 6?

LLM outputs: {"tool": "calculator", "expression": "7 * 6"}

--- network round trip to tool server ---

Tool returns: {"result": 42}

--- another forward pass ---

LLM outputs: "The answer is 42"

What if computation were native?

Ruffian embeds a virtual machine directly into the token generation loop. When the model outputs special tokens, the GPU executes the computation and injects the result—all in the same cycle.

// Ruffian: computation inside inference
User: What is 7 × 6?

LLM outputs: "7 × 6 = [CALC:7*6] = VM[42]"
                      ↑              ↑
              Thinks in math   GPU computes inline

One forward pass. Zero network calls. Native.
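
A rough sketch of such a decode loop, assuming a hypothetical model.sample_next, vm.execute, and [CALC:...] marker; Ruffian's actual token format and kernel interface may differ.

# Token generation loop with an embedded VM (all names are illustrative)
import re

CALC = re.compile(r"\[CALC:([^\]]+)\]$")   # matches a just-completed [CALC:...] marker

def generate_with_vm(model, tokenizer, vm, prompt, max_tokens=256):
    text = prompt
    for _ in range(max_tokens):
        next_token = model.sample_next(tokenizer.encode(text))   # one decoding step
        text += tokenizer.decode([next_token])

        match = CALC.search(text)
        if match:
            # The VM kernel runs on the same GPU as the model: no host round trip,
            # no network call, no second forward pass.
            result = vm.execute(match.group(1))                  # e.g. "7*6" -> 42
            text += " = VM[" + str(result) + "]"                 # inject the result inline

        if next_token == tokenizer.eos_token_id:
            break
    return text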

API calls: 0
CPU round trips: 0
Instructions/sec: 280M
VM memory: 600 MB

Not a Calculator. A Platform.

With 600MB of unified memory, the VM can run real programs. Lisp interpreters. C compilers. Proof checkers. An operating system that persists inside the model.

Tiny (1 KB): Original prototype limits. Calculator-level.
Small (1 MB): Quick tests. Simple programs.
Medium (72 MB): Real programs. Compilers. Interpreters.
Large (609 MB): Operating systems. Persistent state. Self-modification.
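
For illustration, the tiers might map to code along these lines; MemoryTier and make_vm_memory are invented names, not Ruffian's API.

# Hypothetical mapping of the memory tiers above to concrete allocations
from enum import Enum

class MemoryTier(Enum):
    TINY = 1 * 1024            # 1 KB: original prototype limit
    SMALL = 1 * 1024 ** 2      # 1 MB: quick tests, simple programs
    MEDIUM = 72 * 1024 ** 2    # 72 MB: compilers, interpreters
    LARGE = 609 * 1024 ** 2    # 609 MB: OS-scale, persistent state

def make_vm_memory(tier):
    # One flat address space; the real VM allocates this on the GPU.
    return bytearray(tier.value)

memory = make_vm_memory(MemoryTier.LARGE)   # ~609 MB addressable by the VM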

Tested on MacBook Air M1. 1 billion instructions in 3.6 seconds. No performance penalty for larger memory.
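
The quoted throughput follows directly from those benchmark numbers:

# Throughput implied by the M1 benchmark: 1 billion instructions in 3.6 seconds
instructions = 1_000_000_000
seconds = 3.6
print(f"{instructions / seconds / 1e6:.0f}M instructions/sec")   # 278M, roughly the 280M quoted above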

The Vision

Train models that think natively in computation. Models that write code, verify it, and learn from it. Models that can experiment on themselves.

Current: Prototype on MacBook Air
Next: Fine-tune small models (3B-7B) with VM tokens
Then: Train from scratch with compute as a first-class capability
Goal: Models that reason formally. Proofs with receipts.