houtini-lm

by houtini-ai · 88 · Score 48

MCP server that offloads bounded tasks from Claude Code to local/cloud LLMs to reduce token consumption.

Tags: ai-llm, developer-tools, productivity
Forks: 17
Open issues: 2
Last commit: 1 mo ago
Indexed: 2d ago

Overview

Houtini LM is an MCP server that saves tokens by delegating bounded tasks from Claude Code to local or cloud LLMs. It works with endpoints including LM Studio, Ollama, vLLM, DeepSeek, Groq, Cerebras, and OpenRouter. The server routes each task to a suitable model, tracks performance metrics, and supports model discovery. It offloads well-defined tasks such as code review, test generation, and documentation, and leaves complex reasoning to Claude.

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Offloading code review and test generation to local LLMs while Claude handles architectural planning
you: Using cloud APIs like DeepSeek for boilerplate code generation to save Claude's expensive tokens
you: Generating commit messages and documentation from diffs using cheaper local models
you: What types of tasks get offloaded to the local model?
you: How does model routing work?

When to choose this

Developers already using Claude Code who want to reduce API costs by offloading bounded tasks to local or cloud LLMs without compromising on complex reasoning.

When NOT to choose this

Users who need ultra-low latency responses, as local LLM inference is typically 3-30× slower than frontier models.

Tools this server exposes

4 tools extracted from the README
  • chat(message: string, system?: string, temperature?: number, max_tokens?: number, json_schema?: object, model?: string)

    Send a task to an LLM and get an answer. Offloads bounded tasks like generating boilerplate or explanations.

  • custom_prompt(instruction: string, system?: string, context?: string, temperature?: number, max_tokens?: number, json_schema?: object, model?: string)

    Send a three-part prompt (system, context, instruction) to an LLM for consistent high-quality output.

  • code_task(code: string, task: string, language?: string, max_tokens?: number, model?: string)

    Built-in code analysis with pre-configured system prompts for finding bugs, explaining code, or writing tests.

  • code_task_files(task: string, files: string[], max_tokens?: number, model?: string)

    Review multiple related files or large files directly from disk without context window limits.
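
As a rough illustration of how a client might invoke one of these tools, the sketch below builds an MCP `tools/call` JSON-RPC request for `code_task`. The argument names come from the parameter list above; the helper name `build_tool_call`, the request id, and the sample code string are illustrative assumptions, not part of Houtini LM.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    # Drop optional arguments the caller left unset so they are absent from the request.
    cleaned = {key: value for key, value in arguments.items() if value is not None}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": cleaned},
    }


request = build_tool_call(
    "code_task",
    {
        "code": "def add(a, b):\n    return a - b",
        "task": "Find the bug in this function",
        "language": "python",
        "max_tokens": None,  # optional; left out of the request
        "model": None,       # optional; left out of the request
    },
)
print(json.dumps(request, indent=2))
```

In practice Claude Code issues these calls itself; the sketch only shows the shape of the payload that the required and optional parameters map onto.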

Comparable tools

claude-llms-mcp, mcp-llm, local-mcp-server, ollama-mcp

Installation

# For Claude Code
claude mcp add houtini-lm -- npx -y @houtini/lm

# For Claude Desktop, add to claude_desktop_config.json:
{
  "mcpServers": {
    "houtini-lm": {
      "command": "npx",
      "args": ["-y", "@houtini/lm"],
      "env": {
        "HOUTINI_LM_ENDPOINT_URL": "http://localhost:1234"
      }
    }
  }
}
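
If claude_desktop_config.json already lists other servers, the entry has to be merged rather than pasted over the file. A minimal sketch of that merge follows, assuming the config is plain JSON with a top-level `mcpServers` object as shown above. The function name `add_houtini_server` is hypothetical, and the real location of the config file varies by operating system, so the path is passed in rather than guessed; the demo writes to a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def add_houtini_server(config_path: Path, endpoint_url: str) -> dict:
    """Merge the houtini-lm entry into a Claude Desktop config file (hypothetical helper)."""
    # Start from the existing config, if any, so other servers are preserved.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["houtini-lm"] = {
        "command": "npx",
        "args": ["-y", "@houtini/lm"],
        "env": {"HOUTINI_LM_ENDPOINT_URL": endpoint_url},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already contains another (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"other": {"command": "x"}}}))
    merged = add_houtini_server(demo_path, "http://localhost:1234")
    print(sorted(merged["mcpServers"]))
```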

FAQ

What types of tasks get offloaded to the local model?
Bounded, well-defined tasks like generating test stubs, code review, converting formats, generating mock data, and writing type definitions. Complex reasoning, tool access, and multi-file changes remain with Claude.
How does model routing work?
Houtini LM queries your LLM server for available models, looks up their metadata on Hugging Face, and maintains a local database of model capabilities. It then scores the loaded models against task types to pick the best one for each task automatically.
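
The scoring function itself is not described here, so the following is only a sketch of what scoring loaded models against a task type could look like. The `ModelInfo` fields, the capability tags, the model names, and the weights are all hypothetical; none of them reflect Houtini LM's actual schema or algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class ModelInfo:
    # Hypothetical record; the real local database schema is not documented here.
    name: str
    capabilities: set[str] = field(default_factory=set)
    context_length: int = 4096


# Hypothetical mapping from task type to the capability tags it benefits from.
TASK_NEEDS = {
    "code_review": {"code", "reasoning"},
    "docs": {"chat"},
}


def score(model: ModelInfo, task_type: str) -> float:
    """Score a model for a task: capability overlap plus a small context-length bonus."""
    needs = TASK_NEEDS.get(task_type, set())
    overlap = len(needs & model.capabilities)
    context_bonus = min(model.context_length / 32768, 1.0) * 0.5
    return overlap + context_bonus


def pick_model(loaded: list[ModelInfo], task_type: str) -> ModelInfo:
    """Return the highest-scoring loaded model for the given task type."""
    if not loaded:
        raise ValueError("no models are loaded")
    return max(loaded, key=lambda m: score(m, task_type))


loaded = [
    ModelInfo("general-chat-7b", {"chat"}, 8192),
    ModelInfo("coder-14b", {"code", "reasoning"}, 32768),
]
print(pick_model(loaded, "code_review").name)
```

With these made-up weights the code-oriented model wins for `code_review` and the chat model wins for `docs`, which is the kind of per-task selection the answer above describes.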

Last updated · Auto-generated from public README + GitHub signals.