houtini-lm
by houtini-ai · ★ 88 · Score 48
MCP server that offloads bounded tasks from Claude Code to local/cloud LLMs to reduce token consumption.
Overview
Houtini LM is an MCP server that saves tokens by delegating bounded tasks from Claude Code to local or cloud LLMs. It works with endpoints including LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras, and OpenRouter. The server discovers the models available on the endpoint, routes each task to a suitable one, and tracks performance metrics. It offloads well-defined tasks such as code review, test generation, and documentation, and leaves complex reasoning with Claude.
When to choose this
Developers already using Claude Code who want to reduce API costs by offloading bounded tasks to local or cloud LLMs without compromising on complex reasoning.
When NOT to choose this
Users who need ultra-low latency responses, as local LLM inference is typically 3-30× slower than frontier models.
Tools this server exposes
4 tools extracted from the README:
- chat
  Parameters: message: string, system?: string, temperature?: number, max_tokens?: number, json_schema?: object, model?: string
  Send a task to an LLM and get an answer. Offloads bounded tasks like generating boilerplate or explanations.
- custom_prompt
  Parameters: instruction: string, system?: string, context?: string, temperature?: number, max_tokens?: number, json_schema?: object, model?: string
  Send a three-part prompt (system, context, instruction) to an LLM for consistent high-quality output.
- code_task
  Parameters: code: string, task: string, language?: string, max_tokens?: number, model?: string
  Built-in code analysis with pre-configured system prompts for finding bugs, explaining code, or writing tests.
- code_task_files
  Parameters: task: string, files: string[], max_tokens?: number, model?: string
  Review multiple related files or large files directly from disk without context window limits.
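To make the parameter lists above concrete, here is a minimal sketch of the JSON-RPC `tools/call` request an MCP client would send to invoke `code_task`. The envelope follows the Model Context Protocol. The `build_tool_call` helper, the `TOOL_PARAMS` table, and the sample code string are illustrative assumptions, not part of houtini-lm. In normal use Claude Code builds these requests itself.

```python
import json

# Parameter names copied from the tool list above; a trailing "?" marks an optional one.
TOOL_PARAMS = {
    "chat": ["message", "system?", "temperature?", "max_tokens?", "json_schema?", "model?"],
    "custom_prompt": ["instruction", "system?", "context?", "temperature?",
                      "max_tokens?", "json_schema?", "model?"],
    "code_task": ["code", "task", "language?", "max_tokens?", "model?"],
    "code_task_files": ["task", "files", "max_tokens?", "model?"],
}


def build_tool_call(name, arguments, request_id=1):
    """Hypothetical helper: check argument names, then wrap them in an MCP tools/call request."""
    spec = TOOL_PARAMS[name]
    required = {p for p in spec if not p.endswith("?")}
    allowed = {p.rstrip("?") for p in spec}
    given = set(arguments)
    missing = required - given
    unknown = given - allowed
    if missing or unknown:
        raise ValueError(f"missing={sorted(missing)} unknown={sorted(unknown)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Sample payload (made up for illustration).
request = build_tool_call(
    "code_task",
    {
        "code": "def add(a, b):\n    return a - b\n",
        "task": "Find bugs in this function",
        "language": "python",
    },
)
print(json.dumps(request, indent=2))
```

Checking names before sending is a design choice of this sketch: it surfaces a misspelled or missing parameter locally instead of spending a round trip to the server.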
Installation
# For Claude Code
claude mcp add houtini-lm -- npx -y @houtini/lm
# For Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234"
      }
    }
  }
}
FAQ
- What types of tasks get offloaded to the local model?
  Bounded, well-defined tasks like generating test stubs, code review, converting formats, generating mock data, and writing type definitions. Complex reasoning, tool access, and multi-file changes remain with Claude.
- How does model routing work?
  Houtini-LM queries your LLM server for available models, looks up their metadata on HuggingFace, and maintains a local database with model capabilities. It then scores loaded models against task types to automatically pick the best one for each task.
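The routing answer above can be pictured with a small sketch. The source does not document the real scoring rule, so everything here is an assumption made for illustration: the capability tags, the `TASK_NEEDS` table, the overlap-count score, and the model names are all hypothetical. It only shows the general shape of "score loaded models against a task type and pick the best".

```python
from dataclasses import dataclass, field


@dataclass
class ModelInfo:
    name: str                                        # hypothetical model identifier
    capabilities: set = field(default_factory=set)   # hypothetical capability tags
    loaded: bool = False                             # whether the LLM server has it loaded


# Hypothetical mapping from task type to the capability tags it benefits from.
TASK_NEEDS = {
    "code_review": {"code"},
    "test_generation": {"code", "instruction"},
    "documentation": {"instruction"},
}


def score(model, task_type):
    """Assumed score: number of needed capability tags the model has."""
    return len(model.capabilities & TASK_NEEDS[task_type])


def pick_model(models, task_type):
    """Return the highest-scoring loaded model, or None if nothing is loaded."""
    candidates = [m for m in models if m.loaded]
    if not candidates:
        return None
    return max(candidates, key=lambda m: score(m, task_type))


models = [
    ModelInfo("example-chat", {"instruction"}, loaded=True),
    ModelInfo("example-coder", {"code", "instruction"}, loaded=True),
    ModelInfo("example-large", {"code", "instruction", "reasoning"}, loaded=False),
]
best = pick_model(models, "test_generation")
print(best.name)
```

Only loaded models are considered, matching the description above; a model that would score well but is not loaded is ignored.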
Last updated · Auto-generated from public README + GitHub signals.