
Ember Quest

On-Device Small Language Model Use Cases for Games

Large language models are too expensive, too slow, and too hard to fine-tune for real-time game AI. A small, domain-specific model trained on Amazon SageMaker runs on-device with low latency and zero per-inference cost.

Godot 4.6 · Qwen 2B · llama.cpp · SageMaker · Bedrock

Demo

Full Level 1 playthrough showing proactive SLM hints, dynamic difficulty adjustment, interactive Q&A, and the debug overlay.

The Problem

Most players quit a game within the first month. Frustration, missed mechanics, and difficulty walls are the top causes.

😤

Traditional tutorials are skippable and static. Players who miss mechanics struggle silently and churn.

🎚️

Difficulty settings are one-size-fits-all. There is no way to adapt to individual player behavior in real time.

📏

Rule-based systems handle only the common scenarios; the long tail of hundreds of game-state combinations goes unaddressed.

Why an On-Device SLM?

One base model, multiple use cases via hot-swappable LoRA adapters. SLMs are the right choice when the output needs to be language: contextual hints, dialogue, Q&A, or natural language tool arguments.

What the SLM does

  • Hints at the right moment when a player misses a mechanic
  • Secretly adjusts difficulty when a player is struggling
  • Tracks unused abilities and nudges at the right moment
  • Stays silent 90%+ of the time

Why not a large model?

  • Too slow for real-time game loops
  • Per-inference cost scales with player count
  • Fine-tuning for a narrow domain is expensive and fragile
  • A small fine-tuned model can outperform a general-purpose large model on narrow, structured tasks
  • Works offline, no network dependency

See also: NVIDIA Research, "Small Language Models are the Future of Agentic AI"

Use Cases

All use cases share one base model in memory. Each has a separate LoRA adapter that swaps in milliseconds.
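The swap pattern can be sketched as a small registry keyed by use case. This is a minimal illustration, not the project's actual API: the adapter file names, the `SlmRuntime` class, and the point where the real llama.cpp GDExtension call would go are all assumptions.

```python
# Minimal sketch of per-use-case LoRA adapter selection.
# The base model stays resident; only the small adapter changes.
# Adapter paths and class names here are illustrative assumptions.

ADAPTERS = {
    "assistant": "adapters/assistant.lora",   # proactive hints
    "qa":        "adapters/qa.lora",          # Ask Pip (RAG Q&A)
    "companion": "adapters/companion.lora",   # free-text commentary
}

class SlmRuntime:
    def __init__(self, base_model_path: str):
        self.base = base_model_path   # loaded once, ~1.2 GB resident
        self.active_adapter = None

    def use(self, use_case: str) -> str:
        """Hot-swap the LoRA adapter for the requested use case."""
        path = ADAPTERS[use_case]
        if path != self.active_adapter:
            # In the real engine this would call into the llama.cpp
            # GDExtension to apply the adapter; here we just record it.
            self.active_adapter = path
        return self.active_adapter

rt = SlmRuntime("models/qwen-2b-q4.gguf")
rt.use("assistant")
rt.use("qa")
```

Because only the 40-80 MB adapter changes, switching use cases avoids reloading the 1.2 GB base model, which is what makes millisecond swaps plausible.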

Ready

Proactive Game Assistant

Invisible AI that watches gameplay and decides when and how to help. Structured tool-call output with game-side guardrails.
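One way to picture the game-side guardrails: the model emits a structured tool call, and the game validates it before executing anything. The tool names, schema, and cooldown value below are hypothetical, chosen only to show the shape of the check.

```python
import json

# Hypothetical tool-call schema and guardrails; the real tool names
# and limits in Ember Quest may differ.
ALLOWED_TOOLS = {"show_hint", "adjust_difficulty", "stay_silent"}
HINT_COOLDOWN_S = 30.0

def validate_tool_call(raw: str, seconds_since_last_hint: float) -> dict:
    """Parse model output and apply game-side guardrails.
    Falls back to stay_silent on anything malformed or disallowed."""
    silent = {"tool": "stay_silent", "args": {}}
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return silent
    if call.get("tool") not in ALLOWED_TOOLS:
        return silent
    if call["tool"] == "show_hint" and seconds_since_last_hint < HINT_COOLDOWN_S:
        return silent  # never spam hints, whatever the model says
    if call["tool"] == "adjust_difficulty":
        # Clamp to a small step so one bad inference can't swing the game.
        delta = float(call.get("args", {}).get("delta", 0.0))
        call["args"] = {"delta": max(-0.1, min(0.1, delta))}
    return call
```

The key design point is that the model only proposes; the game decides. A malformed or out-of-policy call degrades to silence rather than to a visible glitch.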

Ready

Interactive Q&A (Ask Pip)

Player asks anything about the game world. RAG-powered answers using embedded lore + session context.
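The retrieval half of that pipeline can be shown with a toy example. The real system uses a 20 MB embedding model and an in-engine vector store; this sketch substitutes bag-of-words vectors and made-up lore lines purely to show the retrieve-then-prompt flow.

```python
import math
import re
from collections import Counter

# Made-up lore snippets standing in for the embedded game lore.
LORE = [
    "The Ember Shrine restores health when lit with a fire orb.",
    "Pip the companion fears water and will not cross rivers.",
    "The dash ability passes through thorn walls unharmed.",
]

def embed(text: str) -> Counter:
    # Toy stand-in for the real embedding model: word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(LORE, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

# Retrieved passages are prepended to the SLM prompt so answers stay
# grounded in game lore rather than the model's imagination.
top = retrieve("how do I get past the thorn walls?")
```

Grounding matters doubly at this model size: as noted later, RAG-grounded text is within a 2B model's reach while ungrounded free text is not.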

In Progress

AI Companion (Pip)

Companion that comments on gameplay in real time. Free-text generation with personality constraints.

Planned

Level Director

SLM configures procedural level generation based on player preferences and session history.

Architecture

Game State JSON (every 2-7 s)
    → On-Device SLM (base model + LoRA adapter)
    → Tool call / text / Q&A response
    → Game executes

The game code is identical across all tiers; only the transport layer changes.
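That separation can be sketched as a transport interface with interchangeable backends. The class and endpoint names are illustrative assumptions; the real calls (local llama.cpp inference, an AWS endpoint request) are reduced to placeholders.

```python
from typing import Protocol

class SlmTransport(Protocol):
    def infer(self, prompt: str) -> str: ...

class OnDeviceTransport:
    """Calls the local llama.cpp runtime; zero network, zero per-call cost."""
    def infer(self, prompt: str) -> str:
        return f"[local] {prompt}"  # placeholder for the real local call

class CloudTransport:
    """Same fine-tuned model behind an AWS endpoint, used as a fallback
    for devices that can't run the model locally."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
    def infer(self, prompt: str) -> str:
        return f"[{self.endpoint}] {prompt}"  # placeholder for an HTTP call

def assist(transport: SlmTransport, game_state_json: str) -> str:
    # Game-side logic never changes; only `transport` differs per device.
    return transport.infer(game_state_json)
```

Because both backends serve the same fine-tuned weights behind the same interface, falling back to the cloud changes latency and cost, not behavior.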

Current

On-Device

Model runs locally on CPU, GPU, or NPU. Zero latency, zero cost, works offline. This is what the demo shows.

Planned

Cloud SLM (Hybrid)

Same fine-tuned model deployed to AWS as fallback for devices that can't run locally. Same API, same accuracy.

Model Stack

Component                             Size            Purpose
Base model (Qwen 3.5 2B, quantized)   1.2 GB          Text generation, always loaded
LoRA adapters                         40-80 MB each   Per use case, hot-swapped at runtime
Embedding model                       20 MB           Vector search for game lore (RAG)
Vector database                       5 MB            In-engine vector store
Total on-device                       1.3 GB          Fits on mobile (4+ GB RAM)

Training

Amazon SageMaker for fine-tuning, Amazon Bedrock for data enrichment. Low cost per training run. No real gameplay data needed to start.
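"No real gameplay data needed to start" implies synthetic scenarios shaped into training records. A plausible sketch of that shaping step is below; the field names, prompt wording, and prompt/completion JSONL layout are all assumptions, and the teacher-labeling step (Amazon Bedrock) is represented by a hand-written label.

```python
import json

def to_training_record(state: dict, label: dict) -> str:
    """Turn one synthetic game state plus its teacher-generated tool-call
    label into a JSONL fine-tuning record (hypothetical schema)."""
    prompt = (
        "You are the Ember Quest assistant. Given the game state, "
        "respond with one tool call.\nState: " + json.dumps(state, sort_keys=True)
    )
    return json.dumps({"prompt": prompt, "completion": json.dumps(label)})

# A synthetic scenario and a label a teacher model might produce for it.
scenario = {"deaths_in_row": 3, "zone": "thorn_maze", "dash_used": False}
label = {"tool": "show_hint", "args": {"id": "dash"}}
record = to_training_record(scenario, label)
```

Generating states programmatically and labeling them with a large teacher model is what keeps the per-run training cost low: the expensive model is used once per example, offline, instead of once per player, online.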

At 2B parameters, structured output and RAG-grounded text work reliably; ungrounded free-text generation does not.

Tech Stack

Game

  • Godot 4.6 (GDScript)
  • Custom llama.cpp GDExtension
  • In-engine RAG (embeddings + vector store)
  • Ninja Adventure asset pack (CC0)

Training & Infrastructure

  • Amazon SageMaker (GPU training)
  • Amazon Bedrock (teacher enrichment)
  • MLflow (experiment tracking)
  • Kiro IDE (development)

References