A language model learns to program a GPU virtual machine through examples alone
Qwen 2.5 Coder is a family of code-focused language models. Ruffian is a GPU virtual machine that compiles C to CUDA shaders. We want to see if Qwen can learn to write Ruffian code through pure in-context learning: no fine-tuning, just examples in the prompt.
It works. With 10 examples, Qwen writes working GPU code 85% of the time.
During inference, when the model outputs C code in a marked region, the VM intercepts it, compiles and executes on-GPU, and injects the result back into the token stream:
LLM outputs: "847 x 293 = [C: return 847*293;]"
↓
VM compiles & executes on GPU → injects result
↓
Token stream continues: "847 x 293 = [C: return 847*293;] = VM[248171]"
The model doesn't need to know the answer. It just writes C code; the VM guarantees correctness. No network round trips, no JSON tool protocol—execution happens on the same GPU that runs inference.
Our question is whether Qwen can learn, from in-context examples alone, to emit [C:...] blocks that the VM will accept and execute.
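The interception step can be sketched in plain C. This is an illustrative stand-in, not Ruffian's actual implementation: `vm_execute` is a hypothetical stub for the compile-and-run entry point, and the real VM works on token buffers rather than strings.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative stub: the real Ruffian VM compiles the code to CUDA
   and runs it on-device. Here it just returns a fixed result. */
static int vm_execute(const char *code) {
    (void)code;
    return 248171; /* e.g. the result of "return 847*293;" */
}

/* Rewrites "...[C: code]" into "...[C: code] = VM[result]".
   Returns 1 if a complete marker was found and spliced, 0 otherwise. */
int inject_result(const char *stream, char *out, size_t outlen) {
    const char *open = strstr(stream, "[C:");
    if (!open) return 0;                 /* no marker: pass through */
    const char *close = strchr(open, ']');
    if (!close) return 0;                /* block still being generated */

    char code[256];
    size_t n = (size_t)(close - open - 3);
    if (n >= sizeof code) return 0;
    memcpy(code, open + 3, n);
    code[n] = '\0';

    int result = vm_execute(code);
    snprintf(out, outlen, "%.*s = VM[%d]",
             (int)(close - stream + 1), stream, result);
    return 1;
}
```

With the stub above, `inject_result("847 x 293 = [C: return 847*293;]", buf, sizeof buf)` fills `buf` with `847 x 293 = [C: return 847*293;] = VM[248171]`, mirroring the token-stream example.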
Ruffian supports most C constructs. 2,116 tests pass across CPU and CUDA GPU:
| Feature | Status | Example |
|---|---|---|
| While/for loops | PASS | while(i < 10) i++; |
| ++/-- operators | PASS | i++; --j; |
| Compound assignment | PASS | sum += i; |
| All comparisons | PASS | == != < > <= >= |
| Function calls | PASS | a = func(a); |
| Arrays (up to 2M elements) | PASS | int arr[2000000]; |
| Nested loops | PASS | while(a){ while(b){ } } |
| Recursion | PASS | return fib(n-1) + fib(n-2); |
| Bitwise ops | PASS | & \| ^ << >> |
| Ternary operator | PASS | a > b ? a : b |
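As an illustration of the subset in practice, here is a plain-C routine that combines several rows of the table (a while loop, compound assignment, bitwise ops, and the ternary operator). It is a sketch, not taken from the test suite; inside a [C:...] block the VM would execute it unchanged:

```c
/* Counts set bits using only constructs the feature table marks
   as supported: a while loop, compound assignment, bitwise AND
   and shift, and the ternary operator. */
int popcount(unsigned int x) {
    int count = 0;
    while (x != 0) {
        count += (x & 1) ? 1 : 0;  /* ternary + compound assignment */
        x >>= 1;                   /* bitwise shift */
    }
    return count;
}
```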
We verify 46 algorithms covering sorting, searching, number theory, and more:
| Category | Algorithms | Tests |
|---|---|---|
| Sorting | Bubble, Selection, Insertion | 3 |
| Data Structures | Stack, Queue operations | 4 |
| Number Theory | GCD, Prime check, Fibonacci, Factorial, Divisors, Collatz | 15 |
| Array Operations | Min, Max, Sum, Reverse, Search, Count | 9 |
| Bitwise | Popcount, Power of 2, Bit position | 5 |
| Project Euler | PE001, PE002, PE003, PE005, PE007, PE009, PE010 | 7 |
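For a concrete instance, the Collatz step count from the number-theory row fits entirely in the supported subset. A plain-C sketch (not the verbatim test-suite version):

```c
/* Number of Collatz steps to reach 1: halve even values,
   map odd values to 3n+1. Uses long to leave headroom for
   the intermediate peaks the sequence can reach. */
int collatz_steps(long n) {
    int steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```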
Qwen solves Project Euler problems correctly when given the VM:
| Problem | Description | Answer |
|---|---|---|
| PE001 | Sum of multiples of 3 or 5 below 1000 | 233168 |
| PE002 | Even Fibonacci sum ≤ 4M | 4613732 |
| PE003 | Largest prime factor of 13195 | 29 |
| PE005 | LCM of 1-10 | 2520 |
| PE006 | Sum square difference (1-100) | 25164150 |
| PE007 | 6th prime | 13 |
| PE010 | Sum of primes below 100 | 1060 |
Here's PE001 (sum of multiples of 3 or 5 below 1000). During inference, the model would output this inside a [C:...] block and the VM would return 233168:
int pe001(int n){
int s=0; int i=1;
while(i<n){if(i%3==0||i%5==0){s=s+i;}i=i+1;}
return s;
} return pe001(1000);
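PE002 follows the same pattern. A plain-C sketch of the kind of block that yields the VM[4613732] answer listed in the table above:

```c
/* Sum of even-valued Fibonacci terms not exceeding limit.
   For limit = 4000000 this matches the PE002 answer above. */
int pe002(int limit) {
    int a = 1, b = 2, sum = 0;
    while (b <= limit) {
        if (b % 2 == 0) sum += b;
        int t = a + b;
        a = b;
        b = t;
    }
    return sum;
}
```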
> Qwen generates correct algorithms on the first try. The code is textbook-correct.
We teach Qwen the [C:...] syntax with a few examples. The model writes C code; the VM executes it. The prompt achieves 85%+ success:
You write C code for the Ruffian GPU VM.
When you need to compute something, output it inside [C: ... ] markers.
The VM will execute your code and return the result.
EXAMPLES:
User: What is 5 factorial?
Assistant: 5! = [C: int f(int n){int r=1;while(n>1){r=r*n;n=n-1;}return r;} return f(5);] = VM[120]
User: What is the 10th Fibonacci number?
Assistant: fib(10) = [C: int fib(int n){int a=0,b=1;for(int i=0;i<n;i++){int t=a+b;a=b;b=t;}return a;} return fib(10);] = VM[55]
User: What is gcd(48, 18)?
Assistant: gcd(48,18) = [C: int gcd(int a,int b){while(b!=0){int t=b;b=a%b;a=t;}return a;} return gcd(48,18);] = VM[6]
[YOUR TASK]
Small models write valid C instantly. Qwen picks up the [C:...] syntax and C subset from a few examples. It needs more examples to learn algorithm patterns, but syntactic compliance is immediate.
Explicit rules beat implicit patterns. Telling Qwen constraints explicitly works better than hoping it infers them from examples.
AI finds edge cases. When Qwen's correct-looking code fails, we find compiler bugs. The model becomes an accidental fuzzer—it generates valid C that exposes VM issues humans miss.
The model never computes. The model writes [C: return 847*293;] and the VM computes the result. The model only needs to write correct C—it never guesses the answer.
The whole system runs on a single GPU: no API calls, no cloud dependencies, no rate limits.