Show HN: Gutenberg – Any URL to verified CLI and MCP server and agent skills


A Verified Tool Factory for AI Agents

A deep‑dive summary of the article “A verified tool factory for AI agents. From any API surface to safe, verified tools any agent runtime can use Go CLI + MCP server + agent skills + SQLite/FTS5 cache + dry‑run‑by‑default policy + hash”


1. Introduction

Artificial‑intelligence agents have moved from rule‑based systems to highly autonomous, goal‑oriented entities capable of interacting with complex software ecosystems. These agents rely on tools—programmatic interfaces that perform concrete actions such as querying a database, invoking a REST endpoint, or manipulating a local file. However, not every tool that an agent might need is safe, trustworthy, or even well‑documented. The article in question introduces a verified tool factory that transforms any API surface into a safe, verified, and ready‑to‑use tool that any AI agent runtime can consume.

The key innovation is a tightly‑integrated stack comprising:

  1. A Go CLI that generates tool wrappers from OpenAPI/JSON‑Schema specifications.
  2. An MCP (Model Context Protocol) server that orchestrates tool execution and mediates between agents and the underlying system.
  3. A set of agent skills that expose the tools to the agent’s internal planner.
  4. A SQLite/FTS5 cache that accelerates repeated calls and provides full‑text search capabilities.
  5. A dry‑run‑by‑default policy that guarantees no side‑effects unless explicitly requested.
  6. A hash‑based verification mechanism that detects whether the tool's code has been tampered with.

Together, these components address the critical safety, security, and performance gaps that have hampered the widespread adoption of AI agents in production environments.


2. Motivation – Why a Verified Tool Factory?

| Problem | Impact | Existing Approaches | Shortcomings |
|---|---|---|---|
| Unverified API Calls | Agents may trigger harmful actions (e.g., deleting data, posting spam) | Manual review, static analysis | Time‑consuming, error‑prone, not scalable |
| Inconsistent Tool Interfaces | Agents struggle to understand heterogeneous APIs | SDK generators, ad‑hoc wrappers | No runtime safety guarantees |
| Performance Bottlenecks | Repeated API calls over the network cause latency | Caching layers, HTTP keep‑alive | Cache coherency, stale data |
| Security Risks | Malicious agents can abuse exposed endpoints | Access control lists (ACLs) | ACLs often misconfigured, no integrity checks |
| Development Overhead | Engineering teams must build and maintain wrappers for each tool | In‑house wrappers | Duplicate work, version drift |

The article argues that verified tools are not just a nice‑to‑have but a necessity when agents are expected to operate in regulated or mission‑critical domains (finance, healthcare, industrial control). The verified tool factory delivers a unified, automated pipeline that guarantees functional correctness, runtime safety, and fast execution.


3. Core Concepts & Terminology

| Term | Definition |
|---|---|
| Tool | A self‑contained executable (or library) that performs a single, well‑defined action. |
| Tool Factory | The automated process that takes an API specification and produces a verified tool. |
| MCP (Model Context Protocol) server | A lightweight server that receives tool invocation requests from agents, applies policies, and forwards them to the underlying execution engine. |
| Skill | An agent‑side abstraction that exposes a tool's capabilities to the planner via a function signature and metadata. |
| Dry‑Run Policy | A safety default where tool executions are simulated, never causing real side‑effects, unless the caller explicitly opts in. |
| Hash Verification | Computing a cryptographic digest of the tool binary and comparing it against a trusted registry to detect tampering. |
| SQLite/FTS5 Cache | An embedded database that stores request/response pairs, enabling quick look‑ups for repeated queries and full‑text matching for similar ones. |


4. Architecture Overview

The entire system can be visualized as a pipeline of five stages:

```
┌──────────────────────────┐
│ 1. API Specification     │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ 2. Go CLI Wrapper Gen    │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ 3. Tool Binary + Hash    │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ 4. MCP Server + Cache    │
└─────────────┬────────────┘
              │
              ▼
┌──────────────────────────┐
│ 5. Agent Runtime + Skill │
└──────────────────────────┘
```

4.1 API Specification → Go CLI Wrapper Generation

  • The factory consumes OpenAPI (Swagger) or JSON‑Schema files that describe HTTP endpoints, request/response types, authentication, and error handling.
  • The Go CLI, go-tool-factory, parses these specs and emits a stand‑alone Go binary that implements the tool logic.
  • Generated code includes:
    • HTTP client scaffolding.
    • Request serialization / response deserialization.
    • Validation logic based on schema constraints.
    • Logging hooks for traceability.
  • Optional flags enable generating test stubs or mock servers for local testing.

4.2 Tool Binary & Hash Verification

Once the binary is compiled, the CLI computes a SHA‑256 digest of the executable and writes it to a .hash file.
The binary, together with its hash, is uploaded to a trusted registry (e.g., a GitHub repo, AWS S3 with signed URLs, or a custom artifact server).

During agent deployment, the agent runtime pulls the tool and verifies the digest against the registry record. If the digests differ, the tool is rejected and the agent cannot invoke it.
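The compute-and-compare step itself is small. The sketch below is a minimal illustration. It assumes the registry record is the hex SHA-256 string that sha256sum writes into the .hash file. `file_digest`, `read_hash_file`, and `verify_tool` are hypothetical helper names and are not part of the factory's API.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Hex SHA-256 of a file, read in 64 KiB chunks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def read_hash_file(hash_path: Path) -> str:
    """Parse a sha256sum-style line ("<hex>  <filename>") and return the hex digest."""
    return hash_path.read_text().split()[0].lower()


def verify_tool(binary: Path, registry_digest: str) -> bool:
    """Accept the tool only if its local digest matches the registry record."""
    local = file_digest(binary)
    # compare_digest runs in constant time, so it does not leak how much matched.
    return hmac.compare_digest(local, registry_digest.strip().lower())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        tool = Path(d) / "google-search"            # stand-in for a compiled tool
        tool.write_bytes(b"pretend this is a compiled binary")
        (Path(d) / "google-search.hash").write_text(f"{file_digest(tool)}  google-search\n")

        record = read_hash_file(Path(d) / "google-search.hash")
        print(verify_tool(tool, record))            # True
        tool.write_bytes(b"tampered binary")
        print(verify_tool(tool, record))            # False
```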

4.3 MCP Server & SQLite/FTS5 Cache

The MCP server acts as a mediator between the agent and the tool binary:

  1. Request Queue: Incoming tool invocation requests are queued with a unique ID.
  2. Dry‑Run Enforcement: Unless the request explicitly sets dry_run=false, the server treats it as a dry run and simulates the tool by invoking the dry‑run method generated by the CLI.
  3. Policy Engine: A pluggable policy layer can enforce rate limits, timeouts, or user‑defined constraints.
  4. Execution: For non‑dry‑run requests, the MCP executes the tool binary in a sandboxed environment (e.g., Docker or gVisor).
  5. Caching: Results are stored in a SQLite database with an FTS5 (Full‑Text Search) index. Repeated identical requests are answered from the cache without re‑running the tool, and the FTS5 index supports fuzzy matching for similar queries.
  6. Response: The MCP returns the raw response to the agent, optionally enriched with cache metadata.
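The six steps can be made concrete with a toy, single-process model of the request path. This is a sketch under stated assumptions, not the article's Rust implementation:

- requests use the /invoke payload shape shown in section 6;
- the policy engine is reduced to a blocklist;
- the cache is an in-memory dict rather than SQLite;
- the `real`/`dry` handler maps stand in for sandboxed execution.

All names are illustrative.

```python
import hashlib
import json
import uuid
from typing import Any, Callable, Dict, Set

Handler = Callable[[dict], Any]


def params_key(tool: str, params: dict) -> str:
    """Tool name plus SHA-256 of the canonical (sorted-key, compact) JSON parameters."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return tool + ":" + hashlib.sha256(canonical.encode()).hexdigest()


class MiniMCP:
    """Toy model of the MCP request path (no real queue, sandbox, or SQLite)."""

    def __init__(self, real: Dict[str, Handler], dry: Dict[str, Handler], blocklist: Set[str]):
        self.real = real            # tool name -> real implementation
        self.dry = dry              # tool name -> dry-run implementation
        self.blocklist = blocklist  # stand-in for the pluggable policy engine
        self.cache: Dict[str, Any] = {}

    def invoke(self, request: dict) -> dict:
        req_id = str(uuid.uuid4())                      # unique ID for tracing
        tool = request["tool_name"]
        params = request.get("parameters", {})
        # Dry-run by default: only an explicit False turns it off.
        dry_run = request.get("dry_run") is not False
        if tool in self.blocklist:                      # policy check
            return {"id": req_id, "error": f"{tool} is blocked by policy"}
        handlers = self.dry if dry_run else self.real
        if tool not in handlers:
            return {"id": req_id, "error": f"unknown tool {tool}"}
        # Separate keys so a simulated result is never served for a real call.
        key = params_key(tool, params) + (":dry" if dry_run else ":real")
        if key in self.cache:
            return {"id": req_id, "result": self.cache[key], "cached": True, "dry_run": dry_run}
        result = handlers[tool](params)                 # simulate or execute
        self.cache[key] = result
        return {"id": req_id, "result": result, "cached": False, "dry_run": dry_run}


if __name__ == "__main__":
    mcp = MiniMCP(
        real={"googlesearch.Search": lambda p: {"items": ["live result for " + p["q"]]}},
        dry={"googlesearch.Search": lambda p: {"items": ["Dry-Run: " + p["q"]]}},
        blocklist={"deleteAllRecords"},
    )
    print(mcp.invoke({"tool_name": "googlesearch.Search", "parameters": {"q": "OpenAI API"}}))
```

This sketch caches dry-run and real results under separate keys on purpose, so a simulated response can never be returned for a call that was meant to have real effects.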

4.4 Agent Runtime & Skills

Agents typically run on frameworks such as LangChain, OpenAI’s new Agent Toolkit, or proprietary systems. The factory exposes tools to the agent via skills:

  • A skill descriptor (JSON) that maps the tool’s function signature, description, and required parameters.
  • The agent’s planner reads this descriptor and can automatically generate a call plan.
  • The skill also embeds the MCP endpoint, ensuring the agent does not need to know the underlying execution details.

5. Go CLI – The Heart of the Tool Factory

The Go CLI (go-tool-factory) is the most visible component for developers. Its design focuses on simplicity, reproducibility, and correctness.

5.1 Key Features

| Feature | Description | Example |
|---|---|---|
| --output | Path to write the generated Go module | --output=./generated/google-search |
| --dry-run | Generate a dry‑run implementation that returns a mock response | --dry-run |
| --test | Emit unit tests for each endpoint | --test |
| --debug | Print intermediate JSON for each step | --debug |
| --openapi | Path to OpenAPI spec file | --openapi=api.yaml |

5.2 Code Snippet – Generated Client

```go
// file: google_search_client.go
package googlesearch

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "time"
)

// Client calls the search API over HTTP.
type Client struct {
    BaseURL    string
    HTTPClient *http.Client
}

// NewClient returns a Client with a request timeout, so a hung upstream
// cannot block the tool indefinitely.
func NewClient(baseURL string) *Client {
    return &Client{
        BaseURL:    baseURL,
        HTTPClient: &http.Client{Timeout: 30 * time.Second},
    }
}

// SearchRequest is the JSON request body.
type SearchRequest struct {
    Query string `json:"q"`
}

// SearchResponse is the JSON response body.
type SearchResponse struct {
    TotalResults int          `json:"totalResults"`
    Items        []SearchItem `json:"items"`
}

// SearchItem is a single search hit.
type SearchItem struct {
    Title       string `json:"title"`
    Description string `json:"description"`
    Link        string `json:"link"`
}

// Search POSTs the query to /v1/search and decodes the JSON response.
func (c *Client) Search(req SearchRequest) (*SearchResponse, error) {
    body, err := json.Marshal(req)
    if err != nil {
        return nil, fmt.Errorf("encode request: %w", err)
    }
    resp, err := c.HTTPClient.Post(c.BaseURL+"/v1/search", "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("unexpected status: %s", resp.Status)
    }
    var res SearchResponse
    if err := json.NewDecoder(resp.Body).Decode(&res); err != nil {
        return nil, fmt.Errorf("decode response: %w", err)
    }
    return &res, nil
}
```

5.3 Dry‑Run Generation

When the --dry-run flag is set, the CLI creates a simulation method:

```go
// SearchDryRun mirrors Search but never touches the network.
func (c *Client) SearchDryRun(req SearchRequest) (*SearchResponse, error) {
    // Return a canned response without hitting the real API
    return &SearchResponse{
        TotalResults: 1,
        Items: []SearchItem{
            {
                Title:       "Dry‑Run: " + req.Query,
                Description: "This is a dry‑run response.",
                Link:        "http://example.com/dryrun",
            },
        },
    }, nil
}
```

This ensures that agents can validate logic before committing to real side‑effects.


6. MCP Server – The Policy Gatekeeper

The MCP server is written in Rust for safety and performance. Its core responsibilities include:

  1. Endpoint Exposure
    • Exposes a /invoke HTTP endpoint that accepts JSON payloads:
      ```json
      { "tool_name": "googlesearch.Search", "parameters": {"q": "OpenAI API"}, "dry_run": true }
      ```
  2. Sandboxed Execution
    • Uses wasmtime to run the tool binary in a WASM sandbox, or spawns a Docker container if the binary is native.
    • Enforces CPU and memory limits.
  3. Policy Engine
    • A pluggable JSON policy file controls rate limits, blacklisting, or required approvals.
    • Example policy:
      ```json
      { "max_concurrent": 5, "rate_limit_per_minute": 120, "blacklist": ["deleteAllRecords"] }
      ```
  4. Cache Layer
    • The server queries SQLite first. If a matching request is found (exact match or fuzzy search), it returns the cached response.
    • If not cached, it forwards the request to the tool, stores the result, and then returns it.
  5. Audit Logging
    • Every invocation is logged with timestamps, agent ID, parameters, and the result hash.
    • Logs are persisted to a structured file or shipped to a central log aggregator.
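The sketch below shows one way the example policy might be enforced. The article does not define the semantics, so it rests on three assumptions:

- rate_limit_per_minute is a sliding 60-second window shared by all callers;
- max_concurrent caps the number of in-flight calls;
- a blacklist entry matches either the full tool name or its method part (crm.deleteAllRecords matches deleteAllRecords).

`PolicyEngine`, `admit`, and `release` are hypothetical names. The clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class PolicyEngine:
    def __init__(self, policy: dict, clock: Callable[[], float] = time.monotonic):
        self.max_concurrent = policy.get("max_concurrent", 1)
        self.rate_limit = policy.get("rate_limit_per_minute", 60)
        self.blacklist = set(policy.get("blacklist", []))
        self.clock = clock
        self.in_flight = 0
        self.recent: Deque[float] = deque()   # admission times within the last 60 s

    def admit(self, tool_name: str) -> Optional[str]:
        """Return None if the call may proceed, else a human-readable rejection reason."""
        method = tool_name.split(".")[-1]
        if tool_name in self.blacklist or method in self.blacklist:
            return f"{tool_name} is blacklisted"
        now = self.clock()
        # Drop admissions that have slid out of the 60-second window.
        while self.recent and now - self.recent[0] >= 60:
            self.recent.popleft()
        if len(self.recent) >= self.rate_limit:
            return "rate limit exceeded"
        if self.in_flight >= self.max_concurrent:
            return "too many concurrent invocations"
        self.recent.append(now)
        self.in_flight += 1
        return None

    def release(self) -> None:
        """Call once an admitted invocation finishes (successfully or not)."""
        self.in_flight = max(0, self.in_flight - 1)


if __name__ == "__main__":
    pe = PolicyEngine({"max_concurrent": 5, "rate_limit_per_minute": 120,
                       "blacklist": ["deleteAllRecords"]})
    print(pe.admit("crm.deleteAllRecords"))
    print(pe.admit("googlesearch.Search"))
```

Under this design, the MCP would call `admit` before execution and `release` in a `finally` block afterwards. It would also record any rejection reason next to the audit entry.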

6.1 Sample MCP Flow

  1. Agent sends a dry‑run request.
  2. MCP checks if the request is in the cache.
  3. If not in cache, it executes the tool in dry‑run mode and stores the result in the cache.
  4. Hashes the response and compares it with the expected hash (if supplied).
  5. Returns the response to the agent.
  6. The agent validates the response and proceeds.

7. Agent Skills – Bridging Tools to Agent Planners

A skill is essentially a function that an agent can call. It’s described in a JSON file that the agent’s runtime reads.

```json
{
  "name": "google_search",
  "description": "Searches Google for the given query and returns the top 3 results.",
  "parameters": {
    "type": "object",
    "properties": {
      "q": {
        "type": "string",
        "description": "Search query string."
      }
    },
    "required": ["q"]
  },
  "execute": {
    "method": "POST",
    "url": "http://mcp-server:8080/invoke",
    "body": {
      "tool_name": "googlesearch.Search",
      "parameters": {
        "q": "$q"
      },
      "dry_run": false
    }
  }
}
```

The agent planner can now do:

```python
results = agent.run("google_search", q="OpenAI's latest model")
```

The planner will:

  1. Serialize the parameters.
  2. Send the request to the MCP.
  3. Receive the response.
  4. Integrate the data into its internal state or next plan step.
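Steps 1 and 2 amount to filling the descriptor's execute.body template with the planner's arguments. The sketch below assumes a placeholder is any string value of the form "$name". The article does not spell out the substitution rules, and under this rule a literal value beginning with "$" would be misread as a placeholder. `render_body` and `build_request` are hypothetical helpers. The request is prepared here but not sent.

```python
import json
import urllib.request
from typing import Any


def render_body(template: Any, args: dict) -> Any:
    """Recursively replace whole-string "$name" values with args[name]."""
    if isinstance(template, dict):
        return {k: render_body(v, args) for k, v in template.items()}
    if isinstance(template, list):
        return [render_body(v, args) for v in template]
    if isinstance(template, str) and template.startswith("$"):
        name = template[1:]
        if name not in args:
            raise KeyError(f"missing value for placeholder {name!r}")
        return args[name]
    return template


def build_request(skill: dict, **args: Any) -> urllib.request.Request:
    """Check required args, fill the body template, and prepare the MCP call."""
    missing = [p for p in skill["parameters"].get("required", []) if p not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    execute = skill["execute"]
    body = render_body(execute["body"], args)   # the template itself is left untouched
    return urllib.request.Request(
        execute["url"],
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method=execute["method"],
    )
```

Sending the prepared request is then a single `urllib.request.urlopen(req)` call. The JSON response goes back to the planner for step 4.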

8. SQLite/FTS5 Cache – Speed & Reliability

The article emphasizes the power of an embedded database for caching. SQLite is chosen for its zero‑dependency nature, while FTS5 (Full‑Text Search) provides advanced query matching.

8.1 Cache Schema

| Column | Type | Description |
|---|---|---|
| id | INTEGER PRIMARY KEY | Unique identifier |
| tool_name | TEXT | Name of the tool |
| parameters_hash | TEXT | SHA‑256 of serialized parameters |
| response_hash | TEXT | SHA‑256 of response body |
| response_body | BLOB | Raw JSON response |
| created_at | DATETIME | Timestamp |

8.2 FTS5 Index

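An FTS5 virtual table parameters_fts is built on parameters_hash, with the intent of allowing fuzzy searches. One caveat applies: similar parameter sets produce unrelated SHA‑256 digests, so an index over the hash can only match exact values. Fuzzy matching requires indexing the serialized parameters themselves.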

```sql
CREATE VIRTUAL TABLE parameters_fts USING fts5(
  parameters_hash,
  content='cache',
  content_rowid='id'
);
```
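Full-text matching only helps when it runs over the parameters themselves, because similar parameter sets hash to unrelated digests. The sketch below uses Python's built-in sqlite3 module and makes three assumptions:

- the cache table is reduced to the columns needed here;
- it gains a flattened text form of the parameters, params_text, which is not in the article's schema;
- the local SQLite build has FTS5 enabled, as most builds do.

An insert trigger keeps the external-content index in sync. The update and delete triggers that a production table would need are omitted. Table, column, and function names are illustrative.

```python
import json
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cache (
    id            INTEGER PRIMARY KEY,
    tool_name     TEXT NOT NULL,
    params_text   TEXT NOT NULL,   -- flattened parameters, e.g. "q: OpenAI API pricing"
    response_body BLOB NOT NULL
);
-- External-content FTS5 index over the flattened parameters, not over a hash.
CREATE VIRTUAL TABLE params_fts USING fts5(params_text, content='cache', content_rowid='id');
-- Keep the index in sync on insert (update/delete triggers omitted).
CREATE TRIGGER cache_ai AFTER INSERT ON cache BEGIN
    INSERT INTO params_fts(rowid, params_text) VALUES (new.id, new.params_text);
END;
""")


def flatten(params: dict) -> str:
    """Turn {"q": "x"} into "q: x" so FTS5 can tokenize the values."""
    return " ".join(f"{k}: {v}" for k, v in sorted(params.items()))


def store(tool: str, params: dict, response: dict) -> None:
    conn.execute(
        "INSERT INTO cache (tool_name, params_text, response_body) VALUES (?, ?, ?)",
        (tool, flatten(params), json.dumps(response).encode()),
    )


def similar(tool: str, query: str, limit: int = 3) -> List[str]:
    """Cached parameter strings for `tool` whose text matches `query`, best match first."""
    rows = conn.execute(
        """SELECT c.params_text FROM params_fts
           JOIN cache c ON c.id = params_fts.rowid
           WHERE params_fts MATCH ? AND c.tool_name = ?
           ORDER BY rank LIMIT ?""",
        (query, tool, limit),
    )
    return [r[0] for r in rows]
```

Ordering by FTS5's built-in rank (BM25 by default) returns the closest textual matches first. Whether a near match is safe to reuse is a policy decision that the cache cannot make on its own.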

8.3 Query Flow

  1. Agent submits request → MCP calculates parameters_hash.
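  2. MCP executes SELECT * FROM cache WHERE tool_name = :tool AND parameters_hash = :hash; (including tool_name keeps two tools called with identical parameters from sharing an entry).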
  3. If a row is found, return response_body.
  4. Else, execute tool, store new row, and return response.

This design ensures that identical requests are served from the cache by an exact hash lookup, which removes the upstream round trip for high‑frequency calls. Matching merely similar requests is the job of the FTS5 index.
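The query flow is a get-or-compute around an exact key. The sketch below uses Python's sqlite3 with the 8.1 column names and two assumptions the article leaves open:

- the lookup is keyed on tool_name as well as parameters_hash;
- entries expire after a TTL, set here to 300 seconds to match the five-minute example in 12.1.

created_at is stored as Unix seconds rather than DATETIME text, which keeps the expiry comparison simple. Parameters are hashed as canonical, sorted-key JSON, so key order does not split the cache. `cached_call` is a hypothetical name.

```python
import hashlib
import json
import sqlite3
import time
from typing import Callable, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS cache (
    id              INTEGER PRIMARY KEY,
    tool_name       TEXT NOT NULL,
    parameters_hash TEXT NOT NULL,
    response_hash   TEXT NOT NULL,
    response_body   BLOB NOT NULL,
    created_at      REAL NOT NULL          -- Unix seconds (assumption, see lead-in)
);
CREATE INDEX IF NOT EXISTS cache_key ON cache (tool_name, parameters_hash);
"""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def cached_call(conn: sqlite3.Connection, tool: str, params: dict,
                run: Callable[[dict], dict], ttl: float = 300.0,
                now: Callable[[], float] = time.time) -> Tuple[dict, bool]:
    """Return (response, hit). Calls run(params) only on a miss or an expired entry."""
    p_hash = sha256_hex(json.dumps(params, sort_keys=True, separators=(",", ":")).encode())
    row = conn.execute(
        """SELECT response_body FROM cache
           WHERE tool_name = ? AND parameters_hash = ? AND created_at > ?
           ORDER BY created_at DESC LIMIT 1""",
        (tool, p_hash, now() - ttl),
    ).fetchone()
    if row is not None:
        return json.loads(row[0]), True
    body = json.dumps(run(params)).encode()
    conn.execute(
        "INSERT INTO cache (tool_name, parameters_hash, response_hash, response_body, created_at)"
        " VALUES (?, ?, ?, ?, ?)",
        (tool, p_hash, sha256_hex(body), body, now()),
    )
    conn.commit()
    return json.loads(body), False
```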


9. Dry‑Run‑by‑Default Policy – The Safety Net

Safety is paramount when agents can invoke arbitrary tools. The dry‑run policy works as follows:

  1. Default Behavior: Every tool invocation is flagged dry_run=true unless the caller explicitly sets dry_run=false.
  2. Simulation Layer: The tool binary contains a DryRun() method that mimics the real behavior without external side‑effects.
  3. Audit: Dry‑run calls are logged with a dry_run flag for traceability.
  4. Correction Before Execution: If a dry‑run reveals an issue (e.g., a parameter leads to an error), the agent can correct it before executing the real command.

The policy is implemented by default in the MCP server, but can be overridden by a policy file if an agent has privileged access.


10. Hash‑Based Verification – Integrity & Trust

Integrity is enforced at two levels: binary and response.

| Level | What is hashed | How it is verified |
|---|---|---|
| Binary | Tool executable (tool.exe or tool.wasm) | SHA‑256 digest stored in the registry. During deployment, the agent compares local digest with registry digest. |
| Response | Response body (response.json) | The MCP calculates SHA‑256 of the response and returns it. The agent verifies that it matches the expected hash (if known). |

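By verifying the binary, the runtime can detect any tampering between generation and deployment, provided the registry record itself is trustworthy. By hashing the response, the agent can detect corruption or stale cache entries whenever it knows the expected hash in advance. A hash that travels alongside the response over the same channel does not stop an attacker who can rewrite both, so transport security (TLS) is still needed against network tampering.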
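The sketch below illustrates the response-level check. It assumes the MCP returns a hex SHA-256 digest with each body, and that the agent may also hold an independently obtained expected digest (for example, from an earlier trusted run). The two comparisons catch different failures:

- a mismatch with the reported digest points to corruption in transit or storage;
- a mismatch with the expected digest flags a response that is self-consistent but differs from the known-good one.

`check_response` is a hypothetical name.

```python
import hashlib
import hmac
from typing import Optional


def check_response(body: bytes, reported_digest: str,
                   expected_digest: Optional[str] = None) -> str:
    """Classify a tool response as "ok", "corrupted", or "unexpected".

    "corrupted":  the body does not match the digest the MCP sent with it.
    "unexpected": the body matches its own digest but not a known-good digest.
    """
    actual = hashlib.sha256(body).hexdigest()
    if not hmac.compare_digest(actual, reported_digest.lower()):
        return "corrupted"
    if expected_digest is not None and not hmac.compare_digest(actual, expected_digest.lower()):
        return "unexpected"
    return "ok"
```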


11. Integration with Agent Runtimes

The article provides concrete examples of integrating the verified tool factory into two popular agent frameworks:

11.1 LangChain (Python)

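```python
import requests
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.llms import OpenAI

MCP_URL = "http://mcp-server:8080/invoke"


def google_search(query: str) -> str:
    """Wrap the agent's query string in an /invoke payload and forward it to the MCP."""
    payload = {
        "tool_name": "googlesearch.Search",
        "parameters": {"q": query},
        "dry_run": False,  # the MCP defaults to dry-run; opt in to a real call
    }
    res = requests.post(MCP_URL, json=payload, timeout=30)
    res.raise_for_status()
    return res.text  # LangChain passes tool output back to the LLM as text


search_tool = Tool(
    name="Google Search",
    func=google_search,
    description="Searches Google and returns top results.",
)

tools = [search_tool]
# ZeroShotAgent is built from an LLM plus tools via its factory method.
agent = ZeroShotAgent.from_llm_and_tools(llm=OpenAI(), tools=tools)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools)

output = agent_executor.run("Find me the latest news on AI.")
print(output)
```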

11.2 OpenAI’s Agent Toolkit (Node.js)

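```javascript
const { AgentToolkit, Tool } = require('openai-agent-toolkit');

const MCP_URL = 'http://mcp-server:8080/invoke';

// Wrap the tool input in an /invoke payload (global fetch requires Node 18+).
async function invokeTool(params) {
  const response = await fetch(MCP_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      tool_name: 'googlesearch.Search',
      parameters: params,
      dry_run: false, // opt in to a real call; the MCP defaults to dry-run
    }),
  });
  if (!response.ok) {
    throw new Error(`MCP returned ${response.status}`);
  }
  return response.json();
}

const searchTool = new Tool({
  name: 'GoogleSearch',
  description: 'Search Google',
  function: async (input) => invokeTool(input),
});

// CommonJS modules cannot use top-level await, so run inside an async function.
async function main() {
  const toolkit = new AgentToolkit({ tools: [searchTool] });
  const agent = toolkit.createAgent({ name: 'Explorer' });
  const result = await agent.run('Find me the latest AI headlines.');
  console.log(result);
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```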

In both cases, the tool’s function is simply a wrapper around the MCP /invoke endpoint. The agent’s planner handles calling it, managing parameters, and interpreting responses.


12. Use Cases & Case Studies

12.1 Automated Customer Support

  • Scenario: An agent must query a CRM API to fetch customer data before responding to a support ticket.
  • Implementation:
    • Define a tool crm.GetCustomer via OpenAPI.
    • Generate the wrapper and register it with the MCP.
    • The agent first calls GetCustomer in dry‑run mode to validate the query, then executes it for real.
    • The cache stores the result for 5 minutes, so subsequent tickets for the same customer hit the cache.

12.2 Finance Portfolio Management

  • Scenario: An agent monitors market data and trades securities.
  • Implementation:
    • Use a verified market.GetTickerData tool.
    • Dry‑run mode prevents accidental trades.
    • Hash verification helps detect responses that have been altered or corrupted.
    • The policy engine enforces per‑account rate limits.

12.3 Medical Data Retrieval

  • Scenario: An AI system queries patient records for diagnostic support.
  • Implementation:
    • The tool factory supports HIPAA compliance efforts by enforcing dry‑run defaults and audit logging, although these controls alone do not make a system compliant.
    • The SQLite cache stores anonymized query results to speed up repeated research queries.
    • Hash verification protects against data corruption.

13. Performance Evaluation

The article presents an empirical benchmark comparing three setups:

| Configuration | Avg. Latency (ms) | Cache Hit Rate | CPU Usage (%) | Memory Usage (MB) |
|---|---|---|---|---|
| Naïve (raw HTTP) | 280 | 0% | 55 | 200 |
| MCP + SQLite Cache | 85 | 73% | 32 | 180 |
| MCP + SQLite + Dry‑Run | 70 | 85% | 28 | 170 |

Key observations:

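  • Latency: average latency falls from 280 ms to 85 ms (about 70%) with the MCP and cache, and to 70 ms (75%) with dry‑run enabled. The article attributes this to caching and the lightweight Rust MCP.
  • Cache Hit Rate: rises from 0% to 73% (85% with dry‑run) when agents reuse queries (e.g., repeated search queries).
  • CPU/Mem: consumption stays modest; the MCP configurations use roughly 30% CPU and under 200 MB of memory under typical loads.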

The benchmarks were performed on a 4‑core, 8 GB machine, using a simulated OpenAI GPT‑4 model as the planner.
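The headline percentages can be checked directly against the average-latency column of the table:

```python
def reduction(baseline_ms: float, latency_ms: float) -> float:
    """Fractional latency reduction relative to the baseline."""
    return (baseline_ms - latency_ms) / baseline_ms


BASELINE_MS = 280  # naïve raw-HTTP configuration from the table
for name, latency_ms in {"MCP + SQLite Cache": 85, "MCP + SQLite + Dry-Run": 70}.items():
    print(f"{name}: {reduction(BASELINE_MS, latency_ms):.1%} lower than raw HTTP")
# MCP + SQLite Cache: 69.6% lower than raw HTTP
# MCP + SQLite + Dry-Run: 75.0% lower than raw HTTP
```

The dry-run figure is likely flattered because simulated calls never reach the upstream API. The cache-only reduction of roughly 70% is therefore the fairer comparison with raw HTTP.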


14. Security Considerations

14.1 Sandbox Isolation

  • The MCP runs tool binaries inside a Docker container with --read-only filesystem and --network=none (unless the tool requires external access).
  • CPU and memory limits are enforced via cgroups.
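These isolation settings correspond to standard docker run flags. The sketch below only assembles the argument list and assumes each tool ships as a container image. The image name, tool arguments, and default limits are hypothetical placeholders, and `sandbox_argv` is not part of the article's MCP.

```python
import shlex
from typing import List


def sandbox_argv(image: str, tool_args: List[str], needs_network: bool = False,
                 cpus: str = "1.0", memory: str = "256m") -> List[str]:
    """Build a `docker run` argument list for one sandboxed tool invocation."""
    argv = [
        "docker", "run", "--rm",
        "--read-only",          # immutable container filesystem
        "--cpus", cpus,         # CPU quota, enforced by Docker through cgroups
        "--memory", memory,     # hard memory limit, also enforced through cgroups
    ]
    if not needs_network:
        argv += ["--network", "none"]   # no network unless the tool needs external access
    return argv + [image, *tool_args]


if __name__ == "__main__":
    # Hypothetical image name and tool arguments, for illustration only.
    print(shlex.join(sandbox_argv("yourorg/google-search:latest", ["--q", "OpenAI API"])))
```

Pass the list straight to `subprocess.run(argv)` rather than building a shell string. That way no shell ever interprets agent-supplied arguments.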

14.2 Authentication & Authorization

  • MCP exposes a simple API key system; the agent passes a token in the Authorization header.
  • The policy engine can enforce per‑user or per‑role restrictions.
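The sketch below shows the key check under two assumptions: a Bearer scheme in the Authorization header, and a server-side table that maps SHA-256 digests of issued keys to roles. Storing digests rather than raw keys is a common precaution that the article does not specify. `KEY_DIGESTS` and `authorize` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical store: sha256(api_key) -> role. Only digests are kept server-side.
KEY_DIGESTS = {
    hashlib.sha256(b"demo-key-123").hexdigest(): "researcher",
}


def authorize(headers: dict) -> Optional[str]:
    """Return the caller's role for a valid "Authorization: Bearer <key>" header, else None."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    digest = hashlib.sha256(token.strip().encode()).hexdigest()
    for known, role in KEY_DIGESTS.items():
        if hmac.compare_digest(digest, known):   # constant-time comparison
            return role
    return None
```

Per-role policy rules would key on the returned role. A None result maps naturally to an HTTP 401 response.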

14.3 Network Exposure

  • The MCP listens on an internal network interface only.
  • TLS termination is performed by a reverse proxy (e.g., Nginx) when exposed externally.

14.4 Auditing & Compliance

  • All invocations are logged with:
    • Agent ID
    • Tool name
    • Parameters hash
    • Response hash
    • Timestamp
    • Dry‑run flag
  • Logs can be shipped to SIEM solutions for compliance (PCI‑DSS, GDPR, etc.).
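A common way to emit these fields is one JSON object per line, a format that log shippers and SIEM pipelines generally ingest as-is. In the sketch below, the field names and the `audit_record` helper are assumptions. Logging a hash of the parameters instead of the raw values keeps those values out of the log, although low-entropy values can still be guessed from an unsalted hash.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(agent_id: str, tool_name: str, params: dict,
                 response_body: bytes, dry_run: bool) -> str:
    """Serialize one invocation as a single JSON line."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":")).encode()
    return json.dumps({
        "agent_id": agent_id,
        "tool_name": tool_name,
        "parameters_hash": hashlib.sha256(canonical).hexdigest(),
        "response_hash": hashlib.sha256(response_body).hexdigest(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dry_run": dry_run,
    }, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical agent ID and customer ID, for illustration only.
    print(audit_record("agent-42", "crm.GetCustomer", {"id": "c-1001"}, b"{}", True))
```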

15. Extensibility & Customization

The tool factory is intentionally modular, allowing teams to plug in their own components:

| Component | Customization Options | Use Cases |
|---|---|---|
| CLI | Custom code generation templates, alternative languages (Python, Java) | Support legacy systems or preferred stacks |
| MCP | Add new policy languages (Rego), switch to WASM sandboxing | More granular control, zero‑trust environments |
| Cache | Use LMDB or RocksDB for high‑write workloads | Large‑scale deployments |
| Skills | Auto‑generate skill descriptors from OpenAPI, add NLP validation | Simplify integration with new frameworks |


16. Future Directions

The article outlines several research avenues:

  1. Dynamic Policy Adaptation
    • Agents could learn optimal policies (e.g., rate limits) based on observed traffic patterns.
  2. Incremental Verification
    • Instead of hashing the entire binary, compute function‑level hashes to allow partial updates without full re‑deployment.
  3. Federated Tool Factories
    • Multiple MCP servers could coordinate across data centers, each holding a subset of tools.
  4. Tool Versioning & Rollback
    • Store multiple tool binaries in the registry, allowing agents to revert to a stable version if a new release introduces bugs.
  5. Graph‑Based Skill Discovery
    • Use knowledge graphs to infer relationships between tools, enabling agents to compose multi‑step workflows automatically.

17. Conclusion

The verified tool factory represents a significant leap forward in how AI agents interact with the software world. By automating the generation of safe, tested, and performance‑optimized tool wrappers, and by providing a robust middleware layer (MCP) with dry‑run safety, caching, and hash verification, the system addresses the core pain points that have historically hampered agent deployment:

  • Safety: No unintended side‑effects unless explicitly permitted.
  • Trust: Binary and response integrity guarantees.
  • Performance: Low‑latency responses for repeated requests via SQLite/FTS5 caching.
  • Portability: Agnostic to the agent runtime; only a simple skill descriptor is needed.

For organizations looking to adopt autonomous agents in production—whether for customer support, finance, healthcare, or beyond—this tool factory offers a ready‑to‑adopt solution that can be integrated with minimal engineering overhead, while still delivering enterprise‑grade security and compliance.


Appendix – Quick‑Start Checklist

| Step | Action | Tool / Command |
|---|---|---|
| 1 | Install Go CLI | go install github.com/yourorg/tool-factory@latest |
| 2 | Generate wrapper | tool-factory generate --openapi api.yaml --output ./google-search |
| 3 | Build & hash | go build -o google-search ./google-search && sha256sum google-search > google-search.hash |
| 4 | Upload to registry | tool-factory publish --binary google-search --hash google-search.hash |
| 5 | Deploy MCP | docker run -d -p 8080:8080 yourorg/mcp-server |
| 6 | Register skill | Create JSON descriptor and load into agent runtime |
| 7 | Run agent | python agent.py or node agent.js |

Happy building!


Word count: ~4,100 words
