A GPU-native virtual machine that runs inside LLM inference. No tool calls. No latency. Native computation.
When a language model needs to multiply two numbers, it doesn't calculate. It pattern-matches from training data. Or it calls an external tool: JSON to an API, wait for the response, parse, continue.
Three forward passes. Network latency. CPU orchestration. The model doesn't think in computation; it asks someone else to compute.
// Standard LLM with tool use
User: What is 7 × 6?
LLM outputs: {"tool": "calculator", "expression": "7 * 6"}
  --- network round trip to tool server ---
Tool returns: {"result": 42}
  --- another forward pass ---
LLM outputs: "The answer is 42"
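The orchestration overhead above can be made concrete with a small sketch. Everything here is illustrative: `generate` and `call_tool_server` are stand-ins, not a real inference or tool-server API.

```python
import json

def call_tool_server(request: dict) -> dict:
    """Stand-in for a network round trip to a calculator service."""
    if request["tool"] == "calculator":
        # eval() stands in for the remote calculator; a real server
        # would parse the expression safely.
        return {"result": eval(request["expression"], {"__builtins__": {}})}
    raise ValueError("unknown tool")

def generate(prompt: str) -> str:
    """Stand-in for one LLM forward pass."""
    if "result" in prompt:
        return "The answer is " + prompt.rsplit(" ", 1)[-1]
    return json.dumps({"tool": "calculator", "expression": "7 * 6"})

# Pass 1: the model emits a tool call instead of an answer.
tool_call = json.loads(generate("What is 7 x 6?"))
# --- network round trip happens here ---
response = call_tool_server(tool_call)
# Pass 2: the result is fed back in and the model continues.
answer = generate(f"Tool returned result {response['result']}")
print(answer)  # → The answer is 42
```

Two forward passes, one network hop, and a CPU loop gluing them together, just to multiply two numbers.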
Ruffian embeds a virtual machine directly into the token generation loop. When the model outputs special tokens, the GPU executes the computation and injects the result—all in the same cycle.
// Ruffian: computation inside inference
User: What is 7 × 6?
LLM outputs: "7 × 6 = [CALC:7*6] = VM[42]"
                      ↑            ↑
              Thinks in math   GPU computes inline

One forward pass. Zero network calls. Native.
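The interception idea can be sketched in a few lines. The `[CALC:...]` marker syntax comes from the example above; the toy token generator and the decode loop are assumptions for illustration, not Ruffian's implementation (which executes on the GPU, not in Python).

```python
import re

# Marker emitted by the model when it wants inline computation.
CALC = re.compile(r"\[CALC:([0-9+*/() -]+)\]")

def model_tokens():
    """Stand-in for the model's token stream during one forward pass."""
    yield from "7 × 6 = [CALC:7*6]".split(" ")

def decode_with_vm(tokens):
    out = []
    for tok in tokens:
        out.append(tok)
        m = CALC.fullmatch(tok)
        if m:
            # The "VM" step: execute on the spot and splice the result
            # into the stream. No extra forward pass, no network hop.
            value = eval(m.group(1), {"__builtins__": {}})
            out += ["=", f"VM[{value}]"]
    return " ".join(out)

print(decode_with_vm(model_tokens()))  # → 7 × 6 = [CALC:7*6] = VM[42]
```

The key property: the injected `VM[42]` lands in the context immediately, so the model's next tokens condition on a computed result rather than a pattern-matched guess.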
With 600MB of unified memory, the VM can run real programs. Lisp interpreters. C compilers. Proof checkers. An operating system that persists inside the model.
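To make "an operating system that persists inside the model" concrete, here is a toy VM whose memory survives across invocations. The instruction set is invented for illustration; Ruffian's actual ISA and 600MB memory layout are not described in this document.

```python
class TinyVM:
    """Toy stack machine with memory that persists across runs."""

    def __init__(self, mem_words: int):
        self.mem = [0] * mem_words   # survives between run() calls
        self.stack = []

    def run(self, program):
        for op, *args in program:
            if op == "push":
                self.stack.append(args[0])
            elif op == "mul":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a * b)
            elif op == "store":          # pop top of stack into mem[addr]
                self.mem[args[0]] = self.stack.pop()
            elif op == "load":           # push mem[addr] onto the stack
                self.stack.append(self.mem[args[0]])
        return self.stack[-1] if self.stack else None

vm = TinyVM(mem_words=1024)
vm.run([("push", 7), ("push", 6), ("mul",), ("store", 0)])
# Memory persists, so a later "program" can build on earlier state:
print(vm.run([("load", 0)]))  # → 42
```

Because state outlives any single program, one generation can compile or compute something and a later generation can pick up exactly where it left off.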
Tested on a MacBook Air M1: 1 billion instructions in 3.6 seconds, roughly 280 million instructions per second. No performance penalty for larger memory.
Train models that think natively in computation. Models that write code, verify it, and learn from it. Models that can experiment on themselves.