No install · No server
Run a real AI model in your browser
We’ll download SmolLM2 135M (~150 MB) directly from Hugging Face, then run inference entirely on your device, using WebGPU when available and falling back to WASM when it isn’t. After the first load, the weights stay in your browser cache for next time.
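In Transformers.js terms, the whole flow fits in a few lines. A minimal sketch, assuming Transformers.js v3 (`@huggingface/transformers`); the checkpoint id `HuggingFaceTB/SmolLM2-135M-Instruct` and the prompt are illustrative:

```ts
import { pipeline } from '@huggingface/transformers';

// Pick a backend: WebGPU if the browser exposes it, otherwise WASM.
const device = 'gpu' in navigator ? 'webgpu' : 'wasm';

// First call downloads the weights from the Hugging Face Hub;
// the browser caches them, so later loads are near-instant.
const generator = await pipeline(
  'text-generation',
  'HuggingFaceTB/SmolLM2-135M-Instruct', // assumed checkpoint id
  { device },
);

// Inference runs on-device; the prompt never leaves the browser.
const out = await generator('Write a haiku about browsers.', {
  max_new_tokens: 64,
});
console.log(out);
```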
No backend
The model weights, tokenizer, and runtime all live in your browser. Nothing about your prompt leaves this device.
WebGPU when available
Modern browsers expose your GPU through the WebGPU API. Transformers.js (via ONNX Runtime Web) runs the model’s operations as GPU compute shaders.
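Detecting support takes a few lines. A sketch of the kind of check this page runs (the exact logic is illustrative, not the page’s actual code):

```ts
// Minimal WebGPU feature check with a WASM fallback.
// (In TypeScript, navigator.gpu needs the @webgpu/types package.)
async function pickBackend(): Promise<'webgpu' | 'wasm'> {
  if ('gpu' in navigator && navigator.gpu) {
    // Some browsers define navigator.gpu but hand back no adapter
    // (e.g. unsupported hardware), so ask for one before committing.
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return 'webgpu';
  }
  return 'wasm';
}
```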
Cached after first run
The browser keeps the ~150 MB of weights around, so subsequent runs start in under a second.
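The weights live in the browser’s Cache Storage, so you can see them yourself. A sketch to run in a devtools console; the cache name `transformers-cache` is an assumption about Transformers.js internals, so confirm it under Application → Cache Storage:

```ts
// List the model files cached via the browser Cache API.
// 'transformers-cache' is an assumed cache name, not a documented one.
const cache = await caches.open('transformers-cache');
const requests = await cache.keys();
for (const req of requests) console.log(req.url);
```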
Runs but feels slow? Find a model that’s a better fit for your hardware →