No install · No server
Run a real AI model in your browser
We’ll download SmolLM2 135M (~150 MB) directly from Hugging Face, then run inference entirely on your device, using WebGPU when available and falling back to WASM when it isn’t. After the first load, the weights stay in your browser cache for next time.
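In Transformers.js terms, the whole flow fits in a few lines. A minimal sketch, assuming Transformers.js v3 (`@huggingface/transformers`); the checkpoint id `HuggingFaceTB/SmolLM2-135M-Instruct` and the prompt are illustrative:

```ts
import { pipeline } from '@huggingface/transformers';

// Pick a backend: WebGPU if the browser exposes it, otherwise WASM.
const device = 'gpu' in navigator ? 'webgpu' : 'wasm';

// First call downloads the weights from the Hugging Face Hub;
// the browser caches them, so later loads are near-instant.
const generator = await pipeline(
  'text-generation',
  'HuggingFaceTB/SmolLM2-135M-Instruct', // assumed checkpoint id
  { device },
);

// Inference runs on-device; the prompt never leaves the browser.
const out = await generator('Write a haiku about browsers.', {
  max_new_tokens: 64,
});
console.log(out);
```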
No backend
The model weights, tokenizer, and runtime all live in your browser. Nothing about your prompt leaves this device.
WebGPU when available
Modern browsers expose your GPU through the WebGPU API. Transformers.js (via ONNX Runtime Web) runs the model’s operations as GPU compute shaders.
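Detecting support takes a few lines. A sketch of the kind of check this page runs (the exact logic is illustrative, not the page’s actual code):

```ts
// Minimal WebGPU feature check with a WASM fallback.
// (In TypeScript, navigator.gpu needs the @webgpu/types package.)
async function pickBackend(): Promise<'webgpu' | 'wasm'> {
  if ('gpu' in navigator && navigator.gpu) {
    // Some browsers define navigator.gpu but hand back no adapter
    // (e.g. unsupported hardware), so ask for one before committing.
    const adapter = await navigator.gpu.requestAdapter();
    if (adapter) return 'webgpu';
  }
  return 'wasm';
}
```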
Cached after first run
The browser keeps the ~150 MB of weights around, so subsequent runs start in under a second.
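The weights live in the browser’s Cache Storage, so you can see them yourself. A sketch to run in a devtools console; the cache name `transformers-cache` is an assumption about Transformers.js internals, so confirm it under Application → Cache Storage:

```ts
// List the model files cached via the browser Cache API.
// 'transformers-cache' is an assumed cache name, not a documented one.
const cache = await caches.open('transformers-cache');
const requests = await cache.keys();
for (const req of requests) console.log(req.url);
```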
Runs but feels slow? Find a model that’s a better fit for your hardware →