Best LLM Inference Tools

Run and serve large language models: local inference, production serving, and model management.

Ranked by score. Updated weekly.

1

ollama

100

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

174,916 · Go · permissive

2

openai-python

97

The official Python library for the OpenAI API

31,083 · Python · permissive

3

Transformers

91

Model framework for state-of-the-art ML

161,927 · Python · permissive

4

llama.cpp

91

LLM inference in C/C++

118,187 · C++ · permissive

5

vLLM

91

High-throughput LLM inference and serving engine

84,365 · Python · permissive

6

ragflow

90

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs

83,652 · Python · permissive

7

sglang

85

SGLang is a high-performance serving framework for large language models and multimodal models.

29,670 · Python · permissive

8

Open WebUI

84

Self-hosted AI interface for LLMs

143,037 · TypeScript · permissive

9

LiteLLM

84

SDK and proxy to call 100+ LLM APIs in OpenAI format

51,597 · Python · permissive

10

LocalAI

83

Open-source AI engine, run any model locally

47,150 · Go · permissive

11

Ray

83

AI compute engine for ML workloads at scale

43,026 · Python · permissive

12

Gradio

83

Build and share ML demo apps in Python

42,998 · Python · permissive

13

CLIProxyAPI

83

Wraps Gemini CLI, Antigravity, ChatGPT Codex, Claude Code, Qwen Code, and iFlow as an OpenAI/Gemini/Claude/Codex-compatible API service, giving API access to the free Gemini 2.5 Pro, GPT 5, Claude, and Qwen models

38,416 · Go · permissive

14

Label Studio

83

Multi-type data labeling and annotation

27,694 · TypeScript · permissive

15

MLX

83

Array framework for Apple silicon

27,263 · C++ · permissive

16

MLflow

83

Open source AI/ML lifecycle platform

26,746 · Python · permissive

17

Langfuse

81

Open source LLM engineering platform

29,795 · TypeScript · permissive

18

pytorch

80

Tensors and Dynamic neural networks in Python with strong GPU acceleration

101,032 · Python · unknown

19

headroom

80

The Context Optimization Layer for LLM Applications

51,232 · Python · permissive

20

text-generation-webui

71

Local LLM interface with text, vision, and training

47,376 · Python · strong-copyleft
