houtini-lm

Name: houtini-lm
Rating: 2.4 (88 reviews)
Author: houtini-ai

by houtini-ai · ★ 88 · Score 48

MCP server that offloads bounded tasks from Claude Code to local/cloud LLMs to reduce token consumption.

ai-llm · developer-tools · productivity

Last commit: 3 mo ago

Indexed: 56d ago

Overview

Houtini LM is a specialized MCP server designed to save tokens by delegating bounded tasks from Claude Code to local or cloud LLMs. It works with various endpoints including LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras, and OpenRouter. The server intelligently routes tasks to appropriate models, tracks performance metrics, and provides model discovery capabilities. It focuses on offloading well-defined tasks like code review, test generation, and documentation while keeping complex reasoning tasks with Claude.
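
To make the model discovery step concrete, here is a minimal sketch that assumes the endpoint speaks an OpenAI-style API, where models are listed under a `/v1/models` path. The function names and the sample payload are hypothetical, no network call is made, and this is not taken from Houtini LM's own code.

```python
import json


def models_url(base_url: str) -> str:
    """Build the model-listing URL (assumed path: /v1/models)."""
    return base_url.rstrip("/") + "/v1/models"


def parse_model_ids(response_body: str) -> list[str]:
    """Extract ids from an OpenAI-style listing: {"data": [{"id": ...}, ...]}."""
    payload = json.loads(response_body)
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


# Hypothetical response body; real servers return more fields per model.
SAMPLE_RESPONSE = json.dumps({
    "object": "list",
    "data": [
        {"id": "example-coder-7b", "object": "model"},
        {"id": "example-chat-3b", "object": "model"},
    ],
})

if __name__ == "__main__":
    print(models_url("http://localhost:1234/"))
    print(parse_model_ids(SAMPLE_RESPONSE))
```

In practice the response body would come from an HTTP GET against the configured endpoint; the parsing step stays the same.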

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Offloading code review and test generation to local LLMs while Claude handles architectural planning

you: Using cloud APIs like DeepSeek for boilerplate code generation to save Claude's expensive tokens

you: Generating commit messages and documentation from diffs using cheaper local models

you: What types of tasks get offloaded to the local model?

you: How does model routing work?

When to choose this

Developers already using Claude Code who want to reduce API costs by offloading bounded tasks to local or cloud LLMs without compromising on complex reasoning.

When NOT to choose this

Users who need ultra-low latency responses, as local LLM inference is typically 3-30× slower than frontier models.

Tools this server exposes

4 tools extracted from the README

chat (message: string, system?: string, temperature?: number, max_tokens?: number, json_schema?: object, model?: string)
Send a task to an LLM and get an answer. Offloads bounded tasks like generating boilerplate or explanations.

custom_prompt (instruction: string, system?: string, context?: string, temperature?: number, max_tokens?: number, json_schema?: object, model?: string)
Send a three-part prompt (system, context, instruction) to an LLM for consistent high-quality output.

code_task (code: string, task: string, language?: string, max_tokens?: number, model?: string)
Built-in code analysis with pre-configured system prompts for finding bugs, explaining code, or writing tests.

code_task_files (task: string, files: string[], max_tokens?: number, model?: string)
Review multiple related files or large files directly from disk without context window limits.
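
MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `code_task`, using the parameter names from the list above. The helper name `build_code_task_call` and the argument checks are illustrative assumptions; an MCP client such as Claude Code normally constructs this message for you.

```python
import json

# Parameter names copied from the code_task entry in the tool list.
CODE_TASK_REQUIRED = ("code", "task")
CODE_TASK_OPTIONAL = ("language", "max_tokens", "model")


def build_code_task_call(request_id: int, **arguments) -> dict:
    """Return a JSON-RPC 2.0 `tools/call` request for the code_task tool."""
    missing = [name for name in CODE_TASK_REQUIRED if name not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    allowed = CODE_TASK_REQUIRED + CODE_TASK_OPTIONAL
    unknown = [name for name in arguments if name not in allowed]
    if unknown:
        raise ValueError(f"unknown arguments: {unknown}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "code_task", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_code_task_call(
        1,
        code="def add(a, b):\n    return a - b\n",
        task="Find the bug in this function.",
        language="python",
    )
    print(json.dumps(request, indent=2))
```

The same envelope applies to the other three tools; only `params.name` and the argument set change.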

Comparable tools

claude-llms-mcp · mcp-llm · local-mcp-server · ollama-mcp

Installation

# For Claude Code
claude mcp add houtini-lm -- npx -y @houtini/lm

# For Claude Desktop
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234"
      }
    }
  }
}
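
If `claude_desktop_config.json` already lists other servers, the entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows, assuming the config has already been loaded into a dict. The helper name `add_houtini_server` and the `other-server` entry are hypothetical; the keys and values mirror the snippet above.

```python
import copy
import json


def add_houtini_server(config: dict, endpoint_url: str) -> dict:
    """Return a copy of config with a houtini-lm entry under mcpServers."""
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = updated.setdefault("mcpServers", {})
    servers["houtini-lm"] = {
        "command": "npx",
        "args": ["-y", "@houtini/lm"],
        "env": {"HOUTINI_LM_ENDPOINT_URL": endpoint_url},
    }
    return updated


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = {"mcpServers": {"other-server": {"command": "node", "args": ["server.js"]}}}
    merged = add_houtini_server(existing, "http://localhost:1234")
    print(json.dumps(merged, indent=2))
```

Existing entries under `mcpServers` are preserved; only the `houtini-lm` key is added or overwritten.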

FAQ

What types of tasks get offloaded to the local model?
Bounded, well-defined tasks like generating test stubs, code review, converting formats, generating mock data, and writing type definitions. Complex reasoning, tool access, and multi-file changes remain with Claude.

How does model routing work?
Houtini LM queries your LLM server for available models, looks up their metadata on HuggingFace, and maintains a local database with model capabilities. It then scores loaded models against task types to automatically pick the best one for each task.
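
The scoring step described in the routing answer can be pictured roughly as follows. This is a sketch under stated assumptions, not the server's actual algorithm: the capability table, the model names, the task-type labels, and the `pick_model` function are all hypothetical.

```python
# Hypothetical per-model scores by task type; a real table would be
# derived from model metadata rather than hard-coded.
MODEL_CAPABILITIES = {
    "example-coder-7b": {"code": 0.9, "docs": 0.5, "chat": 0.4},
    "example-chat-3b": {"code": 0.3, "docs": 0.6, "chat": 0.8},
}


def pick_model(task_type, loaded_models, capabilities=MODEL_CAPABILITIES):
    """Return the loaded model with the highest score for task_type.

    Unknown models or task types score 0.0. On a tie, the model that
    appears first in loaded_models wins. Returns None if nothing is loaded.
    """
    if not loaded_models:
        return None
    return max(
        loaded_models,
        key=lambda name: capabilities.get(name, {}).get(task_type, 0.0),
    )


if __name__ == "__main__":
    loaded = ["example-chat-3b", "example-coder-7b"]
    for task in ("code", "chat"):
        print(task, pick_model(task, loaded))
```

Only models that are currently loaded are candidates, which matches the answer's note that loaded models are the ones scored.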


Last updated 2026-05-17 · Auto-generated from public README + GitHub signals.