
FastAPI-BitNet

by grctest · 38 · Score 42

FastAPI-based MCP server for Microsoft's BitNet model with session management, chat, and benchmarking capabilities.

Tags: ai-llm, developer-tools
Forks: 14 · Open issues: 1 · Last commit: 11 mo ago · Indexed: 2d ago

Overview

FastAPI-BitNet provides a robust REST API built with FastAPI to manage and interact with BitNet model instances. It allows developers to programmatically control llama-cli and llama-server processes for automated testing, benchmarking, and interactive chat sessions. The server integrates with VS Code Copilot via the Model Context Protocol, enabling seamless model interaction within development workflows.

Try asking AI

After installing, here are 5 things you can ask your AI assistant:

you: Programmatic testing and benchmarking of BitNet models
you: Chat with multiple BitNet instances via API
you: Integrate BitNet capabilities into VS Code Copilot as a tool
you: What models are supported?
you: How do I integrate with VS Code?

When to choose this

Choose FastAPI-BitNet if you need an MCP interface for Microsoft's BitNet model with comprehensive session management and benchmarking capabilities, particularly when integrating with VS Code.

When NOT to choose this

Not suitable for production environments requiring high availability or load balancing, as it appears to be a single-instance implementation without built-in redundancy.

Tools this server exposes

9 tools extracted from the README
  • create_session

    Start a new llama-cli or llama-server session

  • stop_session

    Stop a running llama-cli or llama-server session

  • check_session_status

    Check the status of a running session

  • chat_with_session

    Send a prompt to a running BitNet session and receive a response

  • initialize_multiple_instances

    Initialize multiple BitNet instances simultaneously

  • shutdown_multiple_instances

    Shut down multiple BitNet instances in a single API call

  • run_benchmark

    Run a benchmark test on a GGUF model

  • calculate_perplexity

    Calculate perplexity scores for a model on test data

  • estimate_server_capacity

    Estimate maximum number of BitNet instances the server can handle

Note: Tool names inferred from feature descriptions in the README, as no explicit 'Tools' section was found. The README describes functionality for session management, chat operations, and benchmarking, which were mapped to tool names.
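
MCP clients invoke tools through JSON-RPC 2.0 requests using the `tools/call` method. As a minimal sketch of what a call to one of the tools listed above might look like on the wire, the Python below builds such a request. The tool name comes from the inferred list; the argument names (`session_id`, `prompt`) are hypothetical, since the server's actual parameter schema is not shown on this page.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC requests carry an id so that
# responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical arguments: the real schema should be read from the
# server's `tools/list` response, not from this sketch.
request = build_tool_call(
    "chat_with_session",
    {"session_id": "demo", "prompt": "Hello, BitNet"},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would construct and send this message; the sketch only illustrates the shape of the payload.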

Comparable tools

llama-cpp-mcp, transformers-mcp, ollama-mcp, vllm-mcp

Installation

  1. Prerequisites: Docker Desktop, Conda, Python 3.10+
  2. Set up Python environment:

```bash
conda create -n bitnet python=3.11
conda activate bitnet
```

  3. Install the Hugging Face CLI:

```bash
pip install -U "huggingface_hub[cli]"
```

  4. Download BitNet model:

```bash
huggingface-cli download microsoft/BitNet-b1.58-2B-4T-gguf --local-dir app/models/BitNet-b1.58-2B-4T
```

  5. Run with Docker (recommended):

```bash
docker build -t fastapi_bitnet .
docker run -d --name ai_container -p 8080:8080 fastapi_bitnet
```
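
Before building the image, it can help to confirm that the model download actually produced a usable file. The sketch below is one way to do that, not part of the project: it looks for `.gguf` files under the download directory used above and checks that each starts with the four-byte `GGUF` magic header. The function name `find_gguf_files` is made up for this example.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def find_gguf_files(model_dir: str) -> list[Path]:
    """Return .gguf files under model_dir whose header matches the GGUF magic."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    valid = []
    for path in sorted(root.rglob("*.gguf")):
        with path.open("rb") as fh:
            if fh.read(4) == GGUF_MAGIC:
                valid.append(path)
    return valid


if __name__ == "__main__":
    # Directory taken from the download step above.
    models = find_gguf_files("app/models/BitNet-b1.58-2B-4T")
    if models:
        for model in models:
            print(f"OK: {model}")
    else:
        print("No valid GGUF file found; re-run the download step.")
```

A truncated or failed download may still pass this check, since only the header is inspected; it catches a missing directory or an obviously wrong file.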

Claude Desktop Configuration

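Add to claude_desktop_config.json. Claude Desktop starts each entry by running "command" as a local program, so an HTTP endpoint needs a local bridge. The entry below assumes the mcp-remote npm package (run through npx) as that bridge, which the README does not specify:

{
  "mcpServers": {
    "fastapi-bitnet": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "http://localhost:8080/mcp"]
    }
  }
}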

FAQ

What models are supported?
Currently supports Microsoft's BitNet-b1.58-2B-4T model in GGUF format.
How do I integrate with VS Code?
Run the server and configure VS Code Copilot to use 'http://localhost:8080/mcp' as an HTTP MCP server.
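
As a rough illustration of that VS Code setup, the Python sketch below generates a workspace configuration for the endpoint. It assumes VS Code's `.vscode/mcp.json` format, with a top-level `servers` map and entries of `"type": "http"`; the key name `fastapi-bitnet` is arbitrary, and the helper `vscode_mcp_config` is hypothetical.

```python
import json


def vscode_mcp_config(name: str, url: str) -> dict:
    """Return a dict shaped like VS Code's mcp.json for one HTTP MCP server."""
    return {"servers": {name: {"type": "http", "url": url}}}


config = vscode_mcp_config("fastapi-bitnet", "http://localhost:8080/mcp")
# Print rather than write, so the result can be reviewed before it is
# saved as .vscode/mcp.json in the workspace.
print(json.dumps(config, indent=2))
```

The server from the Docker step must be running on port 8080 before VS Code can connect to it.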


Last updated · Auto-generated from public README + GitHub signals.