## Demo
Full Level 1 playthrough showing proactive SLM hints, dynamic difficulty adjustment, interactive Q&A, and the debug overlay.
## The Problem

Most players quit a game within the first month. Frustration, missed mechanics, and difficulty walls are the top causes.

- Traditional tutorials are skippable and static. Players who miss mechanics struggle silently and churn.
- Difficulty settings are one-size-fits-all, with no way to adapt to individual player behavior in real time.
- Rule-based systems handle common scenarios, but the long tail of hundreds of state combinations goes unaddressed.
## Why an On-Device SLM?
One base model, multiple use cases via hot-swappable LoRA adapters. SLMs are the right choice when the output needs to be language: contextual hints, dialogue, Q&A, or natural language tool arguments.
### What the SLM does
- Hints at the right moment when a player misses a mechanic
- Secretly adjusts difficulty when a player is struggling
- Tracks unused abilities and nudges at the right moment
- Stays silent 90%+ of the time
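The "stays silent 90%+ of the time" behavior implies a cheap gate that runs before any model call, so the SLM is only invoked when the state warrants it. A minimal sketch of such a gate, with illustrative names and thresholds (`HintGate`, the cooldown, the "stuck" heuristics) that are not from the real codebase:

```python
import time

# Hypothetical sketch of the silence gate that runs before any SLM call.
# Thresholds and state fields are illustrative assumptions.
class HintGate:
    """Decides whether the current game state even warrants invoking the SLM."""

    def __init__(self, cooldown_s=45.0, max_hints_per_level=3):
        self.cooldown_s = cooldown_s
        self.max_hints_per_level = max_hints_per_level
        self.last_hint_at = float("-inf")
        self.hints_shown = 0

    def should_consider_hint(self, state, now=None):
        now = time.monotonic() if now is None else now
        if self.hints_shown >= self.max_hints_per_level:
            return False  # hint budget for this level exhausted
        if now - self.last_hint_at < self.cooldown_s:
            return False  # too soon after the last hint
        # Only escalate to the model when the player looks visibly stuck.
        return state.get("deaths_at_checkpoint", 0) >= 3 or state.get("idle_seconds", 0) > 60

    def record_hint(self, now=None):
        self.last_hint_at = time.monotonic() if now is None else now
        self.hints_shown += 1
```

Gating in game code keeps the common case (nothing to say) free of any inference cost.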
### Why not a large model?
- Too slow for real-time game loops
- Per-inference cost scales with player count
- Fine-tuning for a narrow domain is expensive and fragile
- A small fine-tuned model can outperform a general large model on narrow, structured tasks
- Works offline, no network dependency
See also: NVIDIA Research, "Small Language Models are the Future of Agentic AI".
## Use Cases
All use cases share one base model in memory. Each has a separate LoRA adapter that swaps in milliseconds.
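The adapter-per-use-case design amounts to a small dispatch layer in front of the loaded base model. A sketch of that layer, assuming hypothetical adapter file names and an `apply_lora` call standing in for the real (custom) llama.cpp GDExtension API:

```python
# Illustrative per-use-case adapter dispatch. Paths and the runtime's
# apply_lora method are stand-ins, not the real GDExtension API.
ADAPTERS = {
    "assistant": "adapters/assistant_lora.gguf",
    "qa":        "adapters/ask_pip_lora.gguf",
    "companion": "adapters/companion_lora.gguf",
    "director":  "adapters/level_director_lora.gguf",
}

class AdapterManager:
    def __init__(self, runtime):
        self.runtime = runtime  # wraps the always-loaded base model
        self.active = None

    def use(self, use_case):
        """Hot-swap the LoRA for `use_case`; no-op if it is already active."""
        if use_case == self.active:
            return
        path = ADAPTERS[use_case]        # raises KeyError for unknown use cases
        self.runtime.apply_lora(path)    # stand-in for the millisecond swap
        self.active = use_case
```

Because only the small adapter weights change, switching use cases never reloads the 1.2 GB base model.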
### Proactive Game Assistant
Invisible AI that watches gameplay and decides when and how to help. Structured tool-call output with game-side guardrails.
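"Game-side guardrails" means the model's tool call is treated as untrusted output and validated before execution. A minimal sketch, with a hypothetical tool whitelist and clamp bounds (the real tool names and schema are not shown in this document):

```python
import json

# Sketch of game-side guardrails: validate the SLM's tool-call JSON against a
# whitelist schema and clamp numeric arguments. Tool names are illustrative.
ALLOWED_TOOLS = {
    "show_hint":       {"text": str},
    "set_enemy_speed": {"multiplier": float},
    "no_op":           {},
}

def validate_tool_call(raw):
    """Return (tool, args) if the call is well-formed and in-bounds, else None."""
    try:
        call = json.loads(raw)
        tool, args = call["tool"], call.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    schema = ALLOWED_TOOLS.get(tool)
    if schema is None or set(args) != set(schema):
        return None
    for key, typ in schema.items():
        if not isinstance(args[key], typ):
            return None
    if tool == "set_enemy_speed":
        # Clamp difficulty nudges so a bad generation can't break the game.
        args["multiplier"] = max(0.5, min(1.5, args["multiplier"]))
    return tool, args
```

A malformed or out-of-schema generation simply becomes a no-op, which is the safe failure mode for an invisible assistant.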
### Interactive Q&A (Ask Pip)
Player asks anything about the game world. RAG-powered answers using embedded lore + session context.
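The in-engine RAG step reduces to: embed the question, rank lore chunks by cosine similarity, and prepend the top chunks to the prompt. A dependency-free sketch of that ranking, where the query vector stands in for the output of the real 20 MB embedding model:

```python
import math

# Minimal in-engine RAG sketch: rank embedded lore chunks against the query
# vector and build a grounded prompt. Vectors here stand in for real embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve(query_vec, lore_index, k=2):
    """lore_index: list of (chunk_text, vector) pairs; returns the top-k chunks."""
    ranked = sorted(lore_index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_vec, lore_index):
    context = "\n".join(retrieve(query_vec, lore_index))
    return f"Answer using only this lore:\n{context}\n\nQuestion: {question}"
```

Grounding the answer in retrieved lore is what lets a 2B model stay factual about the game world.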
### AI Companion (Pip)
Companion that comments on gameplay in real-time. Free-text generation with personality constraints.
### Level Director
SLM configures procedural level generation based on player preferences and session history.
## Architecture
```
Game state JSON (every 2-7 s)
  → on-device SLM (base model + LoRA adapter)
  → tool call / text / Q&A response
  → game executes
```
The game code is identical across all tiers; only the transport layer changes.
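One step of the pipeline above can be sketched as: serialize a state snapshot, query the SLM, and act on its reply. The field and tool names below are illustrative assumptions, not the real schema:

```python
import json

# Sketch of one agent step in the 2-7 s loop. Field and tool names are
# illustrative; the real state schema lives in the game code.
def snapshot_state(player, session):
    return json.dumps({
        "level": session["level"],
        "hp": player["hp"],
        "deaths_at_checkpoint": player["deaths"],
        "unused_abilities": player["unused_abilities"],
        "seconds_since_last_hint": session["since_hint"],
    })

def tick(model, player, session, execute):
    """Serialize state, query the model, execute its tool call; no_op = stay silent."""
    reply = model(snapshot_state(player, session))
    call = json.loads(reply)
    if call["tool"] != "no_op":
        execute(call["tool"], call.get("args", {}))
```

Because the model only ever sees a JSON snapshot and only ever returns a tool call, the same loop works unchanged whether inference happens on-device or in the cloud tier.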
### On-Device
Model runs locally on CPU, GPU, or NPU: no network latency, zero per-inference cost, works offline. This is what the demo shows.
### Cloud SLM (Hybrid)
Same fine-tuned model deployed to AWS as fallback for devices that can't run locally. Same API, same accuracy.
## Model Stack
| Component | Size | Purpose |
|---|---|---|
| Base model (Qwen 3.5 2B, quantized) | 1.2 GB | Text generation, always loaded |
| LoRA adapters | 40-80 MB each | Per use case, hot-swapped at runtime |
| Embedding model | 20 MB | Vector search for game lore (RAG) |
| Vector database | 5 MB | In-engine vector store |
| Total on-device | 1.3 GB | Fits on mobile (4+ GB RAM) |
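The table's total checks out if only one adapter is resident at a time: base model plus the largest adapter plus the RAG components lands near 1.3 GB, leaving headroom on a 4 GB device.

```python
# Sanity check on the table's memory budget, assuming one resident adapter.
base_mb, adapter_mb, embedder_mb, vectordb_mb = 1200, 80, 20, 5
total_mb = base_mb + adapter_mb + embedder_mb + vectordb_mb
print(total_mb / 1000)  # ~1.3 GB on-device footprint
```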
## Training
Amazon SageMaker for fine-tuning, Amazon Bedrock for data enrichment. Low cost per training run. No real gameplay data needed to start.
At 2B parameters, structured output and RAG-grounded text work reliably; ungrounded free text does not.
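Starting without real gameplay data implies synthetic fine-tuning pairs: a constructed game-state snapshot as the prompt and the desired tool call as the completion, with a teacher model used to enrich or vary them. A sketch of the pair format, with illustrative field and tool names:

```python
import json

# Illustrative shape of one fine-tuning example: prompt = game-state JSON,
# completion = the structured tool call we want the 2B model to emit.
# States can be constructed synthetically before any real gameplay data exists.
def make_example(state, tool, args):
    return {
        "prompt": json.dumps(state, sort_keys=True),
        "completion": json.dumps({"tool": tool, "args": args}, sort_keys=True),
    }

example = make_example(
    {"deaths_at_checkpoint": 4, "unused_abilities": ["dash"]},
    "show_hint",
    {"text": "Try your dash to cross the gap."},
)
```

Keeping the completion as strict JSON is what makes the narrow task learnable at this model size and makes game-side validation trivial.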
## Tech Stack
### Game
- Godot 4.6 (GDScript)
- Custom llama.cpp GDExtension
- In-engine RAG (embeddings + vector store)
- Ninja Adventure asset pack (CC0)
### Training & Infrastructure
- Amazon SageMaker (GPU training)
- Amazon Bedrock (teacher enrichment)
- MLflow (experiment tracking)
- Kiro IDE (development)