# Local Models

Human supports multiple local inference servers. No API key is required for any local provider. Add your provider to `~/.human/config.json` and set `default_provider` and `default_model`.
## llama.cpp

llama.cpp provides a high-performance local server for GGUF models.

### Install (macOS)

```sh
brew install llama.cpp
```

Or build from source:

```sh
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make -j
```

### Download a model

Download a GGUF model from Hugging Face or run:

```sh
# Example: Llama 3.2 3B
curl -L -o models/llama-3.2-3b.gguf "https://huggingface.co/bartowski/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_K_M.gguf"
```

### Start the server

```sh
llama-server --model path/to/model.gguf --port 8080
```

Or, if you built from source:

```sh
./llama-server -m models/llama-3.2-3b.gguf --port 8080
```

### Human config

Add to `~/.human/config.json`:

```json
{
  "default_provider": "llamacpp",
  "default_model": "llama-3.2-3b",
  "providers": [
    { "name": "llamacpp", "base_url": "http://localhost:8080/v1" }
  ]
}
```

Use `"llama.cpp"` as the provider name if you prefer (same backend). The `default_model` should match the name you pass to `llama-server` (or a friendly alias; the server may accept partial matches).
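The same config change can be scripted rather than edited by hand. A minimal sketch, assuming only the schema shown above; the `add_provider` helper and the temp-file path are illustrative, not part of Human:

```python
import json
import tempfile
from pathlib import Path

def add_provider(config_path, name, base_url, model):
    """Merge a provider entry into a Human-style config file (schema assumed from the docs)."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", [])
    # Drop any existing entry with the same name so the list stays deduplicated.
    providers[:] = [p for p in providers if p.get("name") != name]
    providers.append({"name": name, "base_url": base_url})
    config["default_provider"] = name
    config["default_model"] = model
    path.write_text(json.dumps(config, indent=2))
    return config

# Point Human at a local llama.cpp server (use ~/.human/config.json for real use).
config_path = Path(tempfile.mkdtemp()) / "config.json"
cfg = add_provider(config_path, "llamacpp",
                   "http://localhost:8080/v1", "llama-3.2-3b")
```

Running it again with a different provider simply replaces the defaults while keeping earlier entries in `providers`.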
## Ollama

Ollama is the easiest way to run local models.

### Install

```sh
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
```

### Pull a model

```sh
ollama pull llama3
ollama pull mistral
ollama pull codellama
```

### Start Ollama (if not already running)

Ollama runs as a service. If it's not started:

```sh
ollama serve
```

Default URL: `http://localhost:11434`

### Human config

Add to `~/.human/config.json`:

```json
{
  "default_provider": "ollama",
  "default_model": "llama3",
  "providers": [
    { "name": "ollama", "base_url": "http://localhost:11434" }
  ]
}
```

Models: `llama3`, `mistral`, `codellama`, `qwen2`, `phi3`, `gemma2`, etc. Use the exact name from `ollama list`:

```sh
ollama list
```

Example output:

```
NAME             ID         SIZE    MODIFIED
llama3:latest    abc123...  4.7 GB  2 days ago
mistral:7b       def456...  4.1 GB  1 week ago
```
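If you want to pick the `default_model` programmatically, the tabular output above is easy to parse. A small sketch; the parsing helper is our own, not an Ollama API, and it assumes the column layout shown above:

```python
def parse_ollama_list(output: str) -> list[str]:
    """Extract model names (first column) from `ollama list` output."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    # Skip the header row; the model name is the first whitespace-delimited field.
    return [line.split()[0] for line in lines[1:]]

example = """\
NAME             ID         SIZE    MODIFIED
llama3:latest    abc123...  4.7 GB  2 days ago
mistral:7b       def456...  4.1 GB  1 week ago
"""
models = parse_ollama_list(example)
print(models)  # ['llama3:latest', 'mistral:7b']
```

In real use you would feed it `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout` instead of the sample string.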
## LM Studio

LM Studio provides a graphical interface to load and run local models.

- Download and install LM Studio
- Download a model (e.g. Llama 3.2) from the in-app model browser
- Load the model and start the local server (Server tab)
- Default port: 1234

### Human config

Add to `~/.human/config.json`:

```json
{
  "default_provider": "lmstudio",
  "default_model": "lmstudio-community/Llama-3.2-3B-Instruct-GGUF",
  "providers": [
    { "name": "lmstudio", "base_url": "http://localhost:1234/v1" }
  ]
}
```

Use `"lm-studio"` as the provider name if you prefer (same backend). The model name must match what LM Studio shows in the Server tab when the local server is running. Check the "Loaded model" display for the exact identifier.
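Instead of copying the identifier from the GUI, you can also ask the server itself: OpenAI-compatible servers such as LM Studio's expose a `GET /v1/models` endpoint. A sketch, where the sample payload below is illustrative, not real LM Studio output:

```python
def model_ids(models_response: dict) -> list[str]:
    """Pull model identifiers out of an OpenAI-style GET /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]

# In real use, fetch the payload from the running server, e.g.:
#   import json, urllib.request
#   response = json.load(urllib.request.urlopen("http://localhost:1234/v1/models"))
# Illustrative payload in the same shape:
response = {"data": [
    {"id": "lmstudio-community/Llama-3.2-3B-Instruct-GGUF", "object": "model"},
]}
ids = model_ids(response)
print(ids[0])  # lmstudio-community/Llama-3.2-3B-Instruct-GGUF
```

Whatever the endpoint returns as `id` is what belongs in `default_model`.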
## vLLM

vLLM is a high-throughput server for Hugging Face-style models.

### Install

```sh
pip install vllm
```

### Start server

```sh
python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-3.2-3B-Instruct --port 8000
```

Example output:

```
INFO:     Started server process
INFO:     Uvicorn running on http://0.0.0.0:8000
```

### Human config

Add to `~/.human/config.json`:

```json
{
  "default_provider": "vllm",
  "default_model": "meta-llama/Llama-3.2-3B-Instruct",
  "providers": [
    { "name": "vllm", "base_url": "http://localhost:8000/v1" }
  ]
}
```

The `default_model` must match the `--model` passed to the vLLM server.
## sglang

sglang is a fast structured generation engine.

### Install and run

```sh
pip install "sglang[all]"
python -m sglang.launch_server --model-path meta-llama/Llama-3.2-3B-Instruct --port 30000
```

### Human config

Add to `~/.human/config.json`:

```json
{
  "default_provider": "sglang",
  "default_model": "meta-llama/Llama-3.2-3B-Instruct",
  "providers": [
    { "name": "sglang", "base_url": "http://localhost:30000/v1" }
  ]
}
```
## osaurus

osaurus is another OpenAI-compatible server.

```sh
pip install osaurus
osaurus serve --model <model-name> --port 1337
```

Config:

```json
{
  "default_provider": "osaurus",
  "default_model": "model-name",
  "providers": [
    { "name": "osaurus", "base_url": "http://localhost:1337/v1" }
  ]
}
```
## Comparison

| Provider | Port | Best for |
|---|---|---|
| llama.cpp | 8080 | GGUF models, low RAM |
| Ollama | 11434 | Easiest setup, many models |
| LM Studio | 1234 | GUI, casual use |
| vLLM | 8000 | High throughput, HF models |
| sglang | 30000 | Fast structured output |
| osaurus | 1337 | Alternative server |
All of these servers expose the OpenAI chat completions API, so Human treats them uniformly.
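Because every provider above speaks the same protocol, one request shape works against all of them; only the base URL and model name change. A minimal stdlib sketch of that idea (this is our own illustration, not Human's client code, and it assumes the server appends `/chat/completions` to the configured `base_url`):

```python
import json
from urllib.request import Request, urlopen

def chat_request(base_url: str, model: str, prompt: str) -> Request:
    """Build an OpenAI-style chat completions request for any of the servers above."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return Request(
        base_url.rstrip("/") + "/chat/completions",
        data=body,  # a Request with a body is sent as POST
        headers={"Content-Type": "application/json"},
    )

# The same helper targets any provider; only the URL and model name differ.
req = chat_request("http://localhost:8080/v1", "llama-3.2-3b", "Hello!")
# With a server running:
#   reply = json.load(urlopen(req))["choices"][0]["message"]["content"]
print(req.full_url)  # http://localhost:8080/v1/chat/completions
```

Swapping in `http://localhost:8000/v1` and `meta-llama/Llama-3.2-3B-Instruct` would target the vLLM setup instead, with no other changes.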