
omlx
LLM inference server with continuous batching and SSD caching for Apple Silicon, managed from the macOS menu bar.
The Lens
Omlx puts an LLM inference server in your macOS menu bar. Click the icon, pick a model, and you have a local AI API running. It uses continuous batching (multiple requests served concurrently on one model) and SSD caching (models load faster after the first run), both optimized specifically for Apple Silicon.
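To see what continuous batching buys you, fire several requests at the server concurrently instead of one at a time; a batching server interleaves them on the same model rather than queueing them serially. A rough sketch, assuming the OpenAI-style chat completions endpoint described below; the port and model name are placeholders:

```python
# Send several prompts concurrently to a local omlx server and time them.
# With continuous batching, total wall-clock time should be far less than
# the sum of the individual request times. URL and model are assumptions.
import asyncio
import time

import httpx

URL = "http://localhost:8080/v1/chat/completions"  # hypothetical local endpoint

async def ask(client: httpx.AsyncClient, prompt: str) -> str:
    resp = await client.post(URL, json={
        "model": "local-model",  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

async def main() -> None:
    prompts = [f"Give me one fact about Apple Silicon, number {i}." for i in range(8)]
    async with httpx.AsyncClient(timeout=120) as client:
        start = time.perf_counter()
        answers = await asyncio.gather(*(ask(client, p) for p in prompts))
        print(f"{len(answers)} responses in {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```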
This is the easiest way to run local LLMs on a Mac right now. No Docker, no Python environments, no config files. Menu bar app, one click, done. The API is OpenAI-compatible, so any tool that talks to OpenAI can point at your local omlx instead. Apache 2.0 licensed, written in Python.
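In practice, OpenAI compatibility means you can reuse the official OpenAI Python client and just swap the base URL. A minimal sketch; the port, path, and model name are assumptions, so use whatever the menu bar app actually reports:

```python
# Point the official OpenAI Python client at a local omlx server.
# base_url, api_key handling, and the model name are illustrative
# assumptions, not omlx documentation.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # hypothetical local endpoint
    api_key="not-needed",                 # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize continuous batching in one sentence."}],
)
print(response.choices[0].message.content)
```

The same trick works for any tool with a configurable OpenAI base URL, which is most of them.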
The catch: Mac only. Apple Silicon specifically; Intel Macs are either unsupported or severely limited. Performance depends on your Mac's unified memory: 8GB will run small models, but you need 32GB+ for anything serious. And "menu bar simplicity" means less control over advanced settings like quantization, context length, and memory allocation.
Free vs Self-Hosted vs Paid
Fully free: fully open source under Apache 2.0. No paid tier, no cloud version. Everything runs on your Mac; the only cost is the electricity and the Mac you already own.
About
- Stars: 8,630
- Forks: 728