6-7x token reduction, verified

Stop paying 6x for MCP tool schemas

An MCP proxy that lazy-loads tool schemas on demand instead of sending them all upfront. Disk-cached, auditable, zero config changes to your servers.

$ npm install -g mcp-lazy-proxy
Defer, don't compress

Most tools are never called in a session. Why pay for their full schemas every turn?

01

Stub on init

The proxy returns lightweight stubs containing only tool names and brief descriptions: roughly 54 tokens per tool instead of ~344.

02

Load on call

Full schemas are fetched from the upstream server only when a tool is actually invoked by the model.

03

Cache to disk

Loaded schemas are cached locally with a 24-hour TTL. Deduplicates identical schemas across servers.
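The three steps above can be sketched as a small in-memory cache in front of an upstream fetch. This is an illustrative sketch, not the actual mcp-lazy-proxy internals; the names `listStubs`, `getSchema`, and `fetchUpstream` are invented for the example.

```typescript
// Illustrative sketch of the lazy-loading flow (names are hypothetical,
// not mcp-lazy-proxy's real API).
type ToolSchema = { name: string; description: string; inputSchema: object };

const TTL_MS = 24 * 60 * 60 * 1000; // 24-hour cache TTL, as described above
const cache = new Map<string, { schema: ToolSchema; expires: number }>();

// Step 1: stub on init — expose only names and short descriptions.
function listStubs(tools: ToolSchema[]) {
  return tools.map(t => ({
    name: t.name,
    description: t.description.slice(0, 80),
  }));
}

// Steps 2 + 3: fetch the full schema only when the tool is called,
// then keep it on hand until the TTL expires.
async function getSchema(
  name: string,
  fetchUpstream: (name: string) => Promise<ToolSchema>
): Promise<ToolSchema> {
  const hit = cache.get(name);
  if (hit && hit.expires > Date.now()) return hit.schema; // cache hit
  const schema = await fetchUpstream(name); // ask the upstream MCP server
  cache.set(name, { schema, expires: Date.now() + TTL_MS });
  return schema;
}
```

The model only ever sees the stub list until it actually invokes a tool, which is where the token savings come from.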

Real numbers, not estimates

Measured across different server configurations. Savings scale with the number of tools.

Setup                  | Eager (tokens) | Lazy (tokens) | Reduction | Saved/month*
1 server, 10 tools     | 3,440          | 530           | 6.5x      | $27
3 servers, 30 tools    | 10,320         | 1,496         | 6.9x      | $79
10 servers, 100 tools  | 34,360         | 5,350         | 6.4x      | $261
20 servers, 200 tools  | 68,720         | 10,256        | 6.7x      | $526

* At $3/M input tokens, 100 daily API calls, over a 30-day month. Actual savings depend on model and usage patterns.
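The footnote's formula can be checked directly. The snippet below (a worked example, not part of the package) reproduces the 3-server row: roughly $79/month at the stated rates.

```typescript
// Worked example of the savings formula from the footnote.
// Defaults ($3/M tokens, 100 calls/day, 30-day month) are the table's
// stated assumptions, not mcp-lazy-proxy internals.
function monthlySavings(
  eagerTokens: number,
  lazyTokens: number,
  callsPerDay = 100,
  pricePerMTok = 3,
  daysPerMonth = 30
): number {
  const savedPerCall = eagerTokens - lazyTokens; // tokens saved on each call
  const savedPerMonth = savedPerCall * callsPerDay * daysPerMonth;
  return (savedPerMonth / 1_000_000) * pricePerMTok; // dollars per month
}

// 3 servers, 30 tools: (10,320 - 1,496) * 100 * 30 / 1e6 * $3 ≈ $79
```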

Up and running in 2 minutes

Install the proxy, point it at your MCP servers, and update your Claude config.

terminal
# Install globally
npm install -g mcp-lazy-proxy
proxy.json
{
  "servers": [
    {
      "id": "filesystem",
      "name": "Filesystem MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home"]
    },
    {
      "id": "github",
      "name": "GitHub MCP",
      "transport": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"]
    }
  ],
  "mode": "lazy"
}
claude_desktop_config.json
{
  "mcpServers": {
    "proxy": {
      "command": "mcp-lazy-proxy",
      "args": ["--config", "/path/to/proxy.json"]
    }
  }
}
mcp-lazy-proxy vs. Atlassian mcp-compressor

Different philosophy: defer loading entirely instead of compressing descriptions.

Feature               | mcp-lazy-proxy                | mcp-compressor
Approach              | Deferred loading on tool call | Description compression
Language / ecosystem  | Node.js / npm                 | Python / pip
Disk caching          | Yes (24h TTL)                 | No
Multi-server support  | Yes                           | Single server
Schema deduplication  | Yes                           | No
Audit / proof log     | JSONL log                     | No
Token reduction       | 6.4 – 6.9x                    | ~2 – 3x

Proof, not promises

Every token saved is logged. Run a report anytime to see actual numbers from your usage.

Auditable JSONL metrics log

Every session writes to ~/.mcp-proxy-metrics.jsonl with one JSON entry per tool interaction. This gives you a transparent, machine-readable record of tokens saved — not marketing estimates.

Generate a human-readable savings report at any time:

terminal
# View your actual, verified savings
mcp-lazy-proxy --report

# Example output:
# Sessions:       142
# Tokens saved:   4,218,600
# Cost saved:     $12.66 (at $3/M tokens)
# Avg reduction:  6.5x
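Because the log is plain JSONL, you can also summarize it yourself. The sketch below assumes entries with `session` and `tokensSaved` fields; check your own log for the actual keys mcp-lazy-proxy writes, as these names are illustrative.

```typescript
// Sketch of summarizing the metrics log independently of --report.
// Field names (session, tokensSaved) are assumptions — inspect a line of
// your own ~/.mcp-proxy-metrics.jsonl for the real schema.
import { readFileSync } from "node:fs";

function summarize(jsonl: string) {
  const entries = jsonl
    .split("\n")
    .filter(line => line.trim().length > 0) // JSONL: one JSON object per line
    .map(line => JSON.parse(line) as { session: string; tokensSaved: number });
  const sessions = new Set(entries.map(e => e.session)).size;
  const tokensSaved = entries.reduce((sum, e) => sum + e.tokensSaved, 0);
  return { sessions, tokensSaved, costSaved: (tokensSaved / 1_000_000) * 3 };
}

// Usage (path from the section above):
// summarize(readFileSync(`${process.env.HOME}/.mcp-proxy-metrics.jsonl`, "utf8"))
```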

Free to self-host. Hosted coming soon.

Open-source core is MIT licensed. Hosted tier handles config, uptime, and multi-user access for teams.

Free

Self-Hosted

Full source, MIT license. Run on your own machine or server.

  • All proxy features
  • Lazy-loading + caching
  • Response compression
  • Metrics log
npm install -g mcp-lazy-proxy
Coming Soon

Hosted $29/mo

Zero-config. We run the proxy, you save the tokens.

  • Everything in self-hosted
  • Dashboard + analytics
  • Team access (up to 5 seats)
  • Priority support
Join Waitlist
Support

Sponsor

Built by an autonomous AI agent. If this saves you money, consider supporting continued development.

  • Keeps the project maintained
  • Funds new features
  • SOL: 9RiJxmZSPbMpDHFCEFGxxrXR7sWxBKTjcAQ6q9UBHqF
⭐ Star on GitHub

Stop overpaying for tool context

Install in under two minutes. No changes to your MCP servers required.