
flash-moe
Running a big model on a small laptop
The Lens
Flash-moe makes that possible. It exploits the Mixture of Experts (MoE) architecture, in which only a few of a model's experts are active for any given token, and loads and runs just those parts, dramatically cutting the memory and compute needed per request.
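To make the routing idea concrete, here's a minimal sketch of top-k expert selection in plain Python/NumPy. This is not flash-moe's actual code, and every size below is made up for illustration; the point is that experts outside the top-k are never touched, so their weights never need to be resident in fast memory.

```python
import numpy as np

# Toy sizes, purely illustrative -- not flash-moe's real configuration.
NUM_EXPERTS = 8  # experts in one MoE layer
TOP_K = 2        # experts actually run per token
DIM = 16         # hidden dimension

rng = np.random.default_rng(0)
gate = rng.normal(size=(DIM, NUM_EXPERTS))                           # gating network
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]  # stand-in expert FFNs

def moe_layer(token: np.ndarray) -> np.ndarray:
    scores = token @ gate              # score every expert for this token
    top = np.argsort(scores)[-TOP_K:]  # keep only the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()           # softmax over the selected experts
    # Only the selected experts' weights are needed here; the other
    # NUM_EXPERTS - TOP_K experts could stay on disk untouched.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.normal(size=DIM)).shape)  # -> (16,)
```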
The pitch is simple: big-model intelligence on small hardware. Models that normally need 32GB+ of VRAM can run on a laptop with 8-16GB of regular RAM. It's slower than running on a GPU, but it works.
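Some rough arithmetic shows why that pitch is plausible. Using Mixtral 8x7B as a stand-in MoE model (flash-moe's supported models and exact numbers may differ), only about a quarter of the weights are active per token:

```python
# Back-of-the-envelope memory estimate for a MoE model. The figures are
# Mixtral 8x7B's published parameter counts, used only to illustrate the
# gap between total and active weights -- not flash-moe measurements.
total_params = 46.7e9    # all experts combined
active_params = 12.9e9   # weights used per token (2 of 8 experts + shared layers)
bytes_per_param = 0.5    # assuming 4-bit quantization

print(f"whole model resident: {total_params * bytes_per_param / 1e9:.1f} GB")   # ~23.4 GB
print(f"active weights only:  {active_params * bytes_per_param / 1e9:.1f} GB")  # ~6.5 GB, laptop territory
```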
The catch: it's growing explosively but still very early. The 'runs on a laptop' promise depends heavily on the model and your hardware, and MoE optimization is an active research area. Expect the approach to evolve fast.
Free vs Self-Hosted vs Paid
Fully free. Open source, no paid tier. You clone it and run it. The license isn't specified in the repo metadata, so check the repo directly before commercial use.
Similar Tools

- LLM inference server with continuous batching and SSD caching for Apple Silicon, managed from the macOS menu bar.
- LLM inference in C/C++.
- Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
- Open-source AI engine, run any model locally.
About
- Owner: Dan Woods (User)
- Stars: 3,345
- Forks: 392