Best Local AI Models for Tiny Models (Phone/Browser/Edge)

Under 2B parameters — runs on phones, edge devices, in-browser.

Verdict

Among tiny models for phone, browser, and edge use, SmolLM2 135M is the top pick: it has the lowest memory requirement in this guide (0.6GB minimum) and a quality score of 100%. If you need a bit more power, Qwen 2.5 0.5B is a close second at 1.0GB.

Models for phones, browsers, and edge devices have to balance output quality against memory use. Prioritize a high quality score, a low VRAM requirement, and a permissive license. Running a model locally keeps data on the device, avoids network round trips, and works without an internet connection, which suits real-time applications and sensitive data processing.

Top picks

  1. #1

    SmolLM2 135M · 0.135B · apache-2.0 · min 0.6GB

    The smallest and most efficient model with top-notch quality.

    SmolLM2 135M has the lowest VRAM requirement of the five picks, 0.6GB, and the smallest parameter count, 0.135B. Despite its size, it posts a quality score of 100% and ships under the permissive Apache-2.0 license. That makes it a good fit for older or budget smartphones and constrained edge devices, and for real-time applications where quick responses matter.

  2. #2

    Qwen 2.5 0.5B · 0.5B · apache-2.0 · min 1.0GB

    A slightly larger but highly efficient model with robust performance.

    Qwen 2.5 0.5B has 0.5B parameters and a VRAM requirement of 1.0GB. It holds a quality score of 98% and is licensed under Apache-2.0. It needs 0.4GB more VRAM than SmolLM2 135M, so it suits devices with slightly more memory, such as mid-range smartphones and edge devices.

  3. #3

    TinyLlama 1.1B · 1.1B · apache-2.0 · min 1.1GB

    A well-rounded model with a slight trade-off in size.

    TinyLlama 1.1B has 1.1B parameters and a VRAM requirement of 1.1GB. It scores 98% in quality and is licensed under Apache-2.0. It suits devices with more memory to spare, such as higher-end smartphones and edge devices, and its larger size allows for more complex tasks than the sub-1B picks.

  4. #4

    Llama 3.2 1B Instruct · 1.24B · llama3.2 · min 1.3GB

    High-quality performance with a moderate size.

    Llama 3.2 1B Instruct has 1.24B parameters and a VRAM requirement of 1.3GB, the highest of the five picks, and it posts a quality score of 100%. It is released under the Llama 3.2 Community License rather than Apache-2.0, so check the license terms before shipping it. It is a versatile choice for devices with more available memory and for users who need a bit more power.

  5. #5

    SmolLM2 360M · 0.36B · apache-2.0 · min 0.8GB

    A compact model with reliable performance.

    SmolLM2 360M is a compact model with 0.36B parameters and a VRAM requirement of 0.8GB, the second lowest of the five picks. It scores 98% in quality and is licensed under Apache-2.0. It is a reliable choice for devices with moderate memory constraints, and it is particularly useful for applications that need a bit more than SmolLM2 135M can handle.
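The parameter counts and minimum memory figures in the list above can be compared on a common scale. The following is a minimal sketch in Python that divides each listed minimum by the parameter count; the `PICKS` table and the helper name `gb_per_billion_params` are illustrative, and the guide does not say how the minimums were measured.

```python
# (name, parameters in billions, minimum memory in GB), copied from the list above.
PICKS = [
    ("SmolLM2 135M", 0.135, 0.6),
    ("Qwen 2.5 0.5B", 0.5, 1.0),
    ("TinyLlama 1.1B", 1.1, 1.1),
    ("Llama 3.2 1B Instruct", 1.24, 1.3),
    ("SmolLM2 360M", 0.36, 0.8),
]


def gb_per_billion_params(params_b, min_gb):
    """Listed minimum memory divided by parameter count (GB per billion)."""
    return min_gb / params_b


for name, params_b, min_gb in PICKS:
    ratio = gb_per_billion_params(params_b, min_gb)
    print(f"{name:<24}{ratio:.2f} GB per 1B params")
```

In these figures the ratio shrinks as the parameter count grows, which would be consistent with part of each minimum being a fixed cost rather than model weights. Treat it as a rough comparison only.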

Hardware guidance

Going by the minimums listed above, every model in this category needs between 0.6GB and 1.3GB of memory. SmolLM2 135M (0.6GB) and SmolLM2 360M (0.8GB) are the easiest to fit, Qwen 2.5 0.5B (1.0GB) and TinyLlama 1.1B (1.1GB) sit in the middle, and Llama 3.2 1B Instruct (1.3GB) needs the most. A device with 8GB of RAM is sufficient for any of them. What matters in practice is how much memory is still free once the operating system and other apps have taken their share, so leave some headroom above the listed minimum.
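To check a specific device against the minimum memory figures listed under Top picks, one option is to filter the list by a memory budget. This is a minimal sketch in Python; the function name `models_that_fit` and the `headroom_gb` safety margin are hypothetical and do not come from the guide.

```python
# Minimum memory per model in GB, copied from the Top picks list.
MIN_MEMORY_GB = {
    "SmolLM2 135M": 0.6,
    "SmolLM2 360M": 0.8,
    "Qwen 2.5 0.5B": 1.0,
    "TinyLlama 1.1B": 1.1,
    "Llama 3.2 1B Instruct": 1.3,
}


def models_that_fit(free_memory_gb, headroom_gb=0.5):
    """Return the picks whose listed minimum fits in free memory minus headroom.

    headroom_gb is an assumed safety margin, not a figure from the guide.
    """
    budget = free_memory_gb - headroom_gb
    return [name for name, need in MIN_MEMORY_GB.items() if need <= budget]


# Example: a device with 1.5GB free and the default 0.5GB headroom.
print(models_that_fit(1.5))
```

With 1.5GB free and the default margin, the budget is 1.0GB, so the call returns the two SmolLM2 models and Qwen 2.5 0.5B.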

When to skip local

Local models still lose to hosted APIs when a task needs more compute than a model under 2B parameters can offer, or when it depends on real-time collaboration. For these cases, consider hosted alternatives such as Anthropic's Claude or Google's Gemini API (the successor to the PaLM API), which provide more capable models and scale on demand.
