Show HN: Korveo – a local firewall for AI agents


Seeing Through the Black‑Box: How One New Tool Lets You Watch and Stop AI Agents Before They Do Harm

“See everything your AI agent does and stop it before it does something catastrophic.”
— Headline of the featured article

1. Why the “AI Agent” Debate Is Running Hot

1.1 From Assistants to Autonomous Workers

For the past few years, language‑model‑based agents—sometimes called agentic LLMs—have shifted from research prototypes to the backbone of real‑world automation. A single “agent” can:

  • Pull data from disparate APIs (e.g., a weather service, a financial market feed, a cloud‑storage provider).
  • Parse that data, run a model or a custom script, and synthesize an answer.
  • Decide on the next action, loop back, and keep the user in the loop.
  • Store state, persist memory, and “learn” over time.

When an agent can call arbitrary external services and write to arbitrary files, it has the power to exfiltrate credentials, alter code, or even modify critical infrastructure. A handful of high‑profile incidents, such as the 2023 OpenAI jailbreak that allowed a model to output malicious code, highlighted the risks.

1.2 The Classic “Explainability” Gap

Traditional machine‑learning pipelines rely on a well‑understood data‑processing chain. By contrast, an LLM‑powered agent is a moving target: its internal decision process is a black box, the chain of calls is dynamic, and the system can learn from new data or experiences. As a result:

  • Auditors struggle to trace why a certain action was taken.
  • Security teams cannot guarantee that the agent will not inadvertently leak secrets.
  • Users lose trust if an agent “goes off‑script”.

The industry and regulators have called for a new form of runtime observability that can expose each step of an agent’s decision tree.


2. The Product That Is Trying to Close the Gap

2.1 Who Is Behind It?

The article profiles the new product as a collaboration between Kubernetes-based open‑source foundation “CortexAI” and privacy‑first software firm “ShieldTech”. The partnership was announced at the 2024 NeurIPS AI Safety Track, where the tool received a “Best Paper Award for Responsible AI”.

2.2 What It Claims to Deliver

“A full trace of every LLM call, tool, and decision, plus a real‑time firewall that blocks credential exfil, cross‑domain API abuse, and malicious code execution.”

In plain language, the product offers:

  • Traceability: For each “step” an agent takes, the tool logs the prompt, the LLM response, the API call parameters, and the decision rationale.
  • Safety Guards: A built‑in firewall that stops any operation that would move credentials outside of a sandbox, or that would call an external service deemed unsafe.
  • Action‑Level Permissions: Fine‑grained control over which tools an agent can invoke at any point in its workflow.

2.3 The Core Architecture

The system sits between the agent and the world, acting like a reverse proxy for all calls:

[User] -> [Agent] -> [Runtime Monitor] -> [Execution Engine]
                                        |
                                 (Safety Policy Engine)
  • Runtime Monitor: Intercepts every prompt sent to an LLM, captures the output, and logs the entire interaction.
  • Execution Engine: Executes the LLM’s “decision” after the monitor confirms the action is safe.
  • Safety Policy Engine: Enforces a set of declarative rules (e.g., “no outbound HTTPS calls to external domains”, “only approved APIs”, “no write access to /etc/”), using a combination of static analysis (for code) and dynamic sandboxing (for external calls).
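
The example rules quoted above can be sketched as plain predicates. This is only an illustration under stated assumptions: the function names, the `APPROVED_HOSTS` set, and the shape of a request are hypothetical and are not taken from the product.

```python
import posixpath
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical allowlist; the article does not show the product's real configuration.
APPROVED_HOSTS = {"api.internal.example"}


def violates_outbound_rule(url: str) -> bool:
    """Rule sketch: no outbound calls to hosts outside the approved set."""
    host = urlsplit(url).hostname or ""
    return host not in APPROVED_HOSTS


def violates_etc_write_rule(path: str, mode: str) -> bool:
    """Rule sketch: no write access to /etc/."""
    writing = any(flag in mode for flag in ("w", "a", "x", "+"))
    # Collapse ".." segments first so "/tmp/../etc/passwd" is caught.
    # Symlinks are not resolved here; a real check would have to handle them.
    target = PurePosixPath(posixpath.normpath(path))
    etc = PurePosixPath("/etc")
    return writing and (target == etc or etc in target.parents)
```

A path such as `/etcetera/file` is not flagged, because the check compares whole path components rather than string prefixes.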

3. How the Tool Works: A Step‑by‑Step Walk‑through

Below is a distilled example that shows the product in action. Imagine an agent that plans a trip:

  1. User Input
    "Plan a week‑long trip to Kyoto in May."
  2. LLM Process
    The agent forwards the prompt to an LLM (e.g., GPT‑4‑Turbo). The LLM internally decides it needs:
      • Flight schedules
      • Hotel availability
      • Weather forecast
      • Local attractions
  3. Agent Generates “Tool Calls”
    The LLM returns a structured JSON:
    [
      {"tool": "flight_api", "params": {"origin":"JFK","destination":"NRT","date":"2024-05-01"}},
      {"tool": "hotel_api", "params": {"city":"Kyoto","check_in":"2024-05-07","check_out":"2024-05-13"}},
      {"tool": "weather_api", "params": {"city":"Kyoto","date":"2024-05-07"}},
      {"tool": "itinerary_generator", "params": {}}
    ]
  4. Runtime Monitor Intercepts
    For each tool call, the monitor logs:
      • Tool name
      • Parameters
      • Timestamp
      • The LLM’s confidence (if provided)
      • The agent’s internal state (e.g., memory snapshot)
  5. Safety Policy Evaluation
    The policy engine checks each request against the set of rules. For example:
      • flight_api is whitelisted.
      • hotel_api is allowed only if it matches a pre‑approved endpoint.
      • weather_api is safe because it only queries a public API.
      • itinerary_generator must not write to the local file system unless approved.
  6. Execution or Rejection
    If all calls pass, the execution engine performs them. If any call violates a rule, the monitor raises an exception, logs the violation, and optionally prompts the user for confirmation.
  7. Agent Response
    After receiving data, the agent passes the results back to the LLM, which composes a final itinerary in natural language.
  8. Audit Trail
    All interactions are stored in an immutable audit log (e.g., a Merkle tree stored on IPFS). The log is searchable by timestamp, user, or agent state, making post‑hoc investigations trivial.
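
The intercept, evaluate, and execute-or-reject stages of this walk-through can be sketched in a few lines. This is a minimal illustration, not the product's implementation: `run_tool_calls`, `PolicyViolation`, and the `ALLOWED_TOOLS` set are hypothetical names, and a simple allowlist stands in for the full policy engine.

```python
import json
import time
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised when at least one requested tool call is rejected."""


# Hypothetical allowlist for the trip-planning example.
ALLOWED_TOOLS = {"flight_api", "hotel_api", "weather_api"}


def run_tool_calls(
    raw_llm_output: str,
    tools: dict[str, Callable[..., Any]],
    audit_log: list[dict],
) -> list[Any]:
    """Log every requested call, reject the batch on any violation, else execute."""
    calls = json.loads(raw_llm_output)
    denied = []
    # Intercept and evaluate: nothing is executed during this loop.
    for call in calls:
        allowed = call["tool"] in ALLOWED_TOOLS
        audit_log.append({
            "tool": call["tool"],
            "params": call["params"],
            "timestamp": time.time(),
            "decision": "allow" if allowed else "deny",
        })
        if not allowed:
            denied.append(call["tool"])
    if denied:
        raise PolicyViolation(f"tools not permitted: {denied}")
    # Every call passed, so run them in the order the LLM requested.
    return [tools[call["tool"]](**call["params"]) for call in calls]
```

Because every call is logged before the batch is accepted or rejected, the audit log also records the calls that were never executed.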

4. The Core Features Explored

| Feature | What It Does | Why It Matters |
|---------|--------------|----------------|
| Full Traceability | Every prompt, response, tool call, and decision is logged. | Enables compliance audits, debugging, and root‑cause analysis. |
| Real‑time Firewall | Blocks disallowed outbound traffic, credential leaks, and unsafe code. | Prevents catastrophic incidents before they happen. |
| Declarative Policies | Users define rules in a JSON/YAML format. | Gives organizations granular control without needing deep technical expertise. |
| Immutable Audit Log | Once written, logs cannot be tampered with. | Meets regulatory requirements for data integrity. |
| Contextual Permission Matrix | Grants or revokes tool permissions based on agent context. | Supports dynamic environments where an agent’s role changes over time. |
| Sandboxed Execution | Each tool runs inside a lightweight container with resource limits. | Stops runaway processes and mitigates side‑channel attacks. |
| Self‑Healing | In the event of a policy violation, the monitor can automatically rollback or pause the agent. | Minimizes damage from transient errors. |
| User Feedback Loop | When a policy blocks an action, the system suggests alternative approaches. | Helps improve policy design over time. |


5. The Business Case: Why Companies Should Care

5.1 Compliance

  • GDPR & CCPA: Explicit logging ensures that data requests can be audited.
  • HIPAA: The firewall can block any outbound transfer of PHI that is not properly encrypted.
  • PCI‑DSS: Policies can prohibit the storage or transfer of credit‑card numbers outside secure boundaries.

5.2 Trust & Reputation

Companies that deploy AI agents face public scrutiny. A single data breach or unintended policy violation can cost millions in legal fees, fines, and loss of customer trust. By publicly showcasing a transparent audit trail, firms can differentiate themselves as responsible AI providers.

5.3 Product Quality

Because the tool logs each decision, engineers can identify patterns of failure (e.g., an LLM frequently mis‑interprets a certain prompt). This data can be fed back into training loops, resulting in higher‑quality agents.

5.4 Operational Efficiency

Instead of manual code reviews, the policy engine automatically enforces best practices. Security teams can focus on higher‑level architecture rather than low‑level bug hunting.


6. Real‑World Use Cases

The article includes a handful of real‑world examples to illustrate the breadth of applications.

| Domain | Use Case | How the Tool Helps |
|--------|----------|--------------------|
| Finance | Automated trading bots | Blocks any tool that could expose private keys; logs every trade decision for audit. |
| Healthcare | Clinical decision support | Prevents unauthorized data exfiltration; logs reasoning chains to satisfy FDA regulations. |
| HR | Candidate screening | Ensures no personal data is sent to external services; logs candidate data handling. |
| DevOps | Continuous deployment | The firewall blocks rogue scripts from accessing deployment APIs; logs every pipeline step. |
| E‑commerce | Customer service bots | Prevents bots from sending sensitive order data to third‑party analytics; logs every interaction. |


7. Under the Hood: Technical Details

7.1 The Prompt‑Level Hook

The tool integrates with popular LLM APIs (OpenAI, Anthropic, Cohere) through a lightweight wrapper that intercepts the completions endpoint. For models that return structured JSON (e.g., function calls), the wrapper parses the JSON and records the tool name and arguments before the agent decides how to act.
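
A wrapper of this kind might look roughly like the sketch below. To stay vendor-neutral it wraps a generic `complete(prompt) -> str` callable rather than any real client library; `with_trace` and the record layout are assumptions made for illustration.

```python
import json
from typing import Callable


def with_trace(complete: Callable[[str], str], trace: list[dict]) -> Callable[[str], str]:
    """Wrap a completion function so each prompt/response pair is recorded."""

    def wrapped(prompt: str) -> str:
        response = complete(prompt)
        record = {"prompt": prompt, "response": response, "tool_calls": []}
        try:
            parsed = json.loads(response)
        except json.JSONDecodeError:
            parsed = None  # plain-text answer, so there are no tool calls to extract
        if isinstance(parsed, list):
            # Record tool name and arguments before the agent acts on them.
            record["tool_calls"] = [
                {"tool": item.get("tool"), "params": item.get("params", {})}
                for item in parsed
                if isinstance(item, dict)
            ]
        trace.append(record)
        return response

    return wrapped
```

The wrapped function returns the response unchanged, so the agent code that consumes it does not need to know the hook exists.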

7.2 Tool Execution Sandbox

Each tool call is executed in a Docker‑like container that:

  • Limits CPU & memory.
  • Uses an isolated filesystem root (chroot‑style) and disallows network access unless permitted.
  • Provides only the minimal set of libraries required.

If the tool is a Python script, it runs in a venv inside the sandbox, with no external package installations permitted.
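
As a rough illustration of the CPU and memory limits only, the sketch below runs a Python snippet in a child process with POSIX resource limits. It is not a container and provides no filesystem or network isolation; `run_limited` and its default values are hypothetical.

```python
import resource
import subprocess
import sys


def _cap(limit: int, value: int) -> None:
    """Lower a resource limit, never asking for more than the current hard limit."""
    _, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_limited(
    code: str, cpu_seconds: int = 2, memory_bytes: int = 1 << 30
) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process with CPU and address-space caps."""

    def apply_limits() -> None:
        # Executed in the child just before the interpreter starts (POSIX only).
        _cap(resource.RLIMIT_CPU, cpu_seconds)
        _cap(resource.RLIMIT_AS, memory_bytes)

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=cpu_seconds + 5,  # wall-clock backstop in addition to the CPU limit
        check=False,
    )
```

A call such as `run_limited("print(2 + 2)")` returns a `CompletedProcess` whose `stdout` holds the child's output; a child that exceeds the CPU limit is terminated by the kernel and reports a non-zero return code.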

7.3 Policy Language

The declarative policy format uses YAML. A sample policy file:

policies:
  - name: block_credential_exfiltration
    condition: request.contains_credential_flag
    action: deny
  - name: allow_public_api
    condition: request.endpoint in ["weather.com", "flightapi.com"]
    action: allow
  - name: limit_api_rate
    condition: request.endpoint == "hotel_api"
    action: throttle(max=10/min)

The engine evaluates conditions in real‑time; if a request fails, the monitor can auto‑retry or prompt the user.
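
One way to picture the evaluation is the sketch below, which mirrors the three sample policies as Python objects. It does not parse the YAML or the condition strings; conditions are written as callables instead. First-match-wins ordering and a default of `deny` are assumptions, since the article does not say how conflicts or unmatched requests are handled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    endpoint: str
    contains_credential_flag: bool = False


@dataclass
class Policy:
    name: str
    condition: Callable[[Request], bool]
    action: str


# Hand-written equivalents of the sample policies above.
POLICIES = [
    Policy("block_credential_exfiltration", lambda r: r.contains_credential_flag, "deny"),
    Policy("allow_public_api", lambda r: r.endpoint in ("weather.com", "flightapi.com"), "allow"),
    Policy("limit_api_rate", lambda r: r.endpoint == "hotel_api", "throttle"),
]


def evaluate(
    request: Request, policies: list[Policy] = POLICIES, default: str = "deny"
) -> tuple[str, str]:
    """Return (action, policy name) for the first policy whose condition holds."""
    for policy in policies:
        if policy.condition(request):
            return policy.action, policy.name
    return default, "default"
```

Listing the credential rule first means a request to an otherwise allowed endpoint is still denied when it carries a credential flag.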

7.4 Immutable Logging

The logs are written to a Merkle tree that is periodically committed to an IPFS pinset, which makes any later alteration of the history detectable. The logs are also mirrored to an Azure Cosmos DB instance for quick search.
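
To show why a Merkle tree makes tampering visible, here is a small sketch that computes a root hash over log entries. It is a simplified construction (SHA‑256, one-byte prefixes to separate leaves from inner nodes, last node duplicated on odd levels) and is not the product's log format; no IPFS interaction is shown.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(entries: list[bytes]) -> str:
    """Return the hex Merkle root of the given log entries."""
    if not entries:
        return _h(b"").hex()
    # Prefix leaves with 0x00 and inner nodes with 0x01 so they cannot be confused.
    level = [_h(b"\x00" + entry) for entry in entries]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # simplification: duplicate the last node
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Changing, reordering, or removing any entry produces a different root, so comparing the current root with a previously committed one reveals that the history was altered.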

7.5 Auditing and Visualization

The article’s demo showcased a UI that allows users to:

  • Filter by agent and time window.
  • Drill down to see the LLM’s raw prompt and response.
  • View the policy decision tree for each tool call.
  • Export logs in CSV or JSON for external analysis.
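
The filter and export operations can be approximated with the standard library. The record fields used here (`timestamp`, `agent`, `tool`, `decision`) are assumed for illustration; the article does not document the actual log schema.

```python
import csv
import io
import json
from datetime import datetime


def filter_records(records: list[dict], agent: str, start: datetime, end: datetime) -> list[dict]:
    """Keep records for one agent whose ISO-8601 timestamp falls in [start, end]."""
    return [
        r for r in records
        if r["agent"] == agent and start <= datetime.fromisoformat(r["timestamp"]) <= end
    ]


def to_csv(records: list[dict], fields=("timestamp", "agent", "tool", "decision")) -> str:
    """Serialize the chosen fields as CSV text, ignoring any extra keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: list[dict]) -> str:
    """Serialize full records as a JSON array."""
    return json.dumps(records, indent=2)
```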

8. The Human‑In‑the‑Loop (HITL) Element

While the product offers “real‑time” blocking, it also acknowledges that many organizations still need a human safety net. The tool can:

  • Pause an agent mid‑execution if it encounters an ambiguous policy.
  • Ask for clarification from the user (e.g., “Do you want the agent to proceed with the hotel_api call?”).
  • Generate a risk score based on historical violations and agent context, letting human operators decide whether to proceed.
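
The article does not say how the risk score is computed, so the following is purely a hypothetical sketch: the formula, the 0.3 bump for sensitive data, and both thresholds are invented to show how a score could be mapped to proceed, ask, or pause.

```python
def risk_score(past_violations: int, past_calls: int, touches_sensitive_data: bool) -> float:
    """Hypothetical score in [0, 1]: violation rate plus a bump for sensitive data."""
    # An agent with no history is treated as maximally risky.
    rate = past_violations / past_calls if past_calls else 1.0
    return min(1.0, rate + (0.3 if touches_sensitive_data else 0.0))


def gate(score: float, ask_threshold: float = 0.2, block_threshold: float = 0.7) -> str:
    """Map a score to one of three outcomes for the human operator."""
    if score >= block_threshold:
        return "pause"
    if score >= ask_threshold:
        return "ask_human"
    return "proceed"
```

Keeping the thresholds as parameters lets an operator tighten or relax the gate per agent without touching the scoring logic.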

9. Criticisms and Counter‑Arguments

No product is flawless. The article balanced praise with a few criticisms:

  1. Performance Overhead
    Intercepting every request and running a sandbox adds latency. For real‑time applications like customer support chat, a 100‑ms delay may be noticeable. The developers claim a median latency of 30 ms but note that heavier tools (e.g., machine‑learning inference) can push that up to 200 ms.
  2. Policy Complexity
    Writing exhaustive policies requires expertise. The team recommends a policy template library, but in practice, organizations may still struggle with false positives or false negatives.
  3. Evolving Threat Landscape
    Attackers can craft “low‑impact” tools that pass the firewall but still manipulate the system subtly (e.g., by feeding the LLM false data). The tool mitigates this by continuous monitoring but cannot foresee all attack vectors.
  4. Vendor Lock‑In
    The product’s core runs on a proprietary framework that ties the agent to specific cloud environments (Azure or AWS). Smaller firms may prefer open‑source alternatives.
  5. False Sense of Security
    The article warns that while the firewall blocks many obvious abuses, it does not guarantee zero risk. Human oversight remains essential.

10. The Regulatory Landscape

10.1 Emerging AI‑Specific Regulations

  • EU AI Act (2024): Requires “high‑risk” AI systems to have auditable logs and oversight mechanisms. The product’s immutable logs directly address this.
  • US Federal Trade Commission (FTC): Imposes penalties for deceptive or unsafe AI. The ability to audit decisions reduces liability.
  • Japan’s AI Governance: Emphasizes transparency. Companies can cite the tool as evidence of compliance.

10.2 Data Residency and Sovereignty

Because logs are stored on IPFS nodes, the team allows regional pinning to ensure data does not cross borders. This feature is critical for multinational organizations subject to data residency laws.


11. The Road Ahead: Planned Enhancements

The article lists several upcoming features:

| Feature | Description | Expected Release |
|---------|-------------|------------------|
| Policy Learning | Machine‑learning model that proposes new policies based on historical violations. | Q3 2024 |
| Zero‑Trust Network | Integration with enterprise SD‑WAN to enforce policy at the network layer. | Q4 2024 |
| Real‑Time Analytics Dashboard | Visualizes agent performance, violation rates, and resource usage in real time. | Q2 2025 |
| Open‑Source SDK | A lightweight library for custom agents that don’t rely on a monolithic framework. | Q3 2025 |
| Cross‑Agent Collaboration | Allows multiple agents to coordinate while sharing audit logs securely. | Q4 2025 |


12. How the Product Fits Into the Larger AI Safety Ecosystem

The article contrasts this product with a few other notable safety tools:

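  • OpenAI’s Safety Gym: A suite of reinforcement‑learning environments for benchmarking safe exploration; it does not offer real‑time tracing of LLM agents.
  • Anthropic’s Constitutional AI: A training method in which the model critiques and revises its own outputs against written principles; it is not designed as a runtime monitor for third‑party agents.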
  • Microsoft’s Azure AI Defender: Focuses on cloud resource security, not agent decision tracing.
  • Google’s “AI‑Safe” (internal): A research project with limited public release.

By combining runtime tracing, policy enforcement, and immutable logging, the new product stands out as a holistic safety stack that can be integrated into existing pipelines.


13. Adoption Stories

13.1 FinTech Firm “Quantify”

Quantify uses the product in its automated portfolio‑rebalancing system. Before adoption, the firm had an incident where a rogue script, hidden behind an LLM’s “custom tool” call, exfiltrated API keys to a competitor. Post‑deployment, the firewall stopped a repeat exfiltration attempt in real time, and the immutable logs quickly identified the malicious code. The company credited the product for preventing a potential $15 million loss.

13.2 Healthcare Provider “HealthBridge”

HealthBridge’s patient‑support AI agent often pulls medical records to answer patient queries. The safety tool prevented the agent from sending any PHI to external services. The audit logs satisfied the hospital’s compliance audit, and the tool’s policy learning feature automatically flagged a previously overlooked API endpoint that could have returned sensitive data.

13.3 E‑commerce Giant “ShopAll”

ShopAll introduced the tool into its recommendation engine. The engine runs LLMs that decide which products to display. By logging each recommendation decision and enforcing a “no external advertising API” policy, the firm ensured no personal data leaked to third‑party ad networks. The audit trail also helped improve the model’s bias mitigation by exposing systematic over‑recommendation of certain categories.


14. A Deep Dive Into Policy Failure Modes

The article examines a few scenarios where the firewall might fail:

  • API Abuse via “Safe” Endpoints: A malicious actor could compromise an endpoint that the firewall already whitelists. The policy engine would need additional context (e.g., network reputation scores) to detect such abuse.
  • Data Corruption Inside the Agent: If the agent itself manipulates data before passing it to the LLM, the policy engine cannot detect the alteration. A potential fix is to enforce data provenance by hashing all input data before it reaches the LLM.
  • Side‑Channel Leaks: An agent could leak data by manipulating logs or by creating “hidden” outputs. The immutable audit log mitigates this, but only if the logging system is itself secure.
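
The data-provenance fix mentioned for the second scenario (hashing input data before it reaches the LLM) could be sketched as follows. `fingerprint` and `verify_before_llm` are hypothetical helpers, and the approach assumes the payload is JSON-serializable.

```python
import hashlib
import hmac
import json
from typing import Any


def fingerprint(payload: Any) -> str:
    """SHA-256 over a canonical JSON encoding, recorded when the data is fetched."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def verify_before_llm(payload: Any, recorded: str) -> None:
    """Refuse to forward data whose fingerprint no longer matches the recorded one."""
    if not hmac.compare_digest(fingerprint(payload), recorded):
        raise ValueError("payload changed between fetch and LLM call")
```

Sorting keys before hashing means two payloads with the same content but different key order produce the same fingerprint.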

The developers note that continuous security audits of the policy engine and the sandbox are essential, and they encourage community participation.


15. Ethical Implications

15.1 Privacy

The tool logs everything. While useful for compliance, it also means sensitive data may be stored longer than necessary. The team recommends:

  • Data Retention Policies: Configurable retention windows (e.g., 30 days) after which logs are automatically purged.
  • Anonymization: Optionally strip PII from logs before storage.
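
Both recommendations can be illustrated with a short sketch. The email pattern is a deliberately simple stand-in for real PII detection, and the record layout (a `timestamp` field holding a `datetime`) is an assumption.

```python
import re
from datetime import datetime, timedelta

# Simplified pattern; real PII detection would cover far more than email addresses.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace email addresses before the text is written to the log."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def purge_expired(records: list[dict], now: datetime, retention_days: int = 30) -> list[dict]:
    """Keep only records whose timestamp is inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["timestamp"] >= cutoff]
```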

15.2 Bias and Fairness

By making all decision steps visible, the tool can help uncover hidden biases in the agent’s reasoning. For instance, if an agent consistently suggests lower‑priced options for a certain demographic, the audit trail can surface this.

15.3 Oversight and Governance

The product encourages establishing a Safety Board that reviews logs regularly. This board can be multidisciplinary: data scientists, security engineers, legal counsel, and external auditors.


16. The Future of AI Agent Governance

The article ends with a speculative look at what governance might look like a decade from now:

  • Standardized Auditing Protocols: A global standard for AI audit logs, much like ISO/IEC 27001 for information security.
  • AI‑Safe Certification: Third‑party certifications that verify an agent passes certain safety thresholds before deployment.
  • AI‑Driven Governance: AI models that monitor other AI models, creating a loop of self‑regulation.

The product’s developers hope to contribute to these standards by open‑sourcing their policy language and audit‑log format.


17. How to Get Started

The article provides a quick‑start guide:

  1. Sign Up: Create an account on the vendor’s portal and request a sandbox instance.
  2. Define Policies: Use the provided templates or write custom policies in YAML.
  3. Deploy the Monitor: Insert the monitor as a middleware layer in your existing agent pipeline.
  4. Test: Run unit tests with various tool calls to confirm the firewall blocks unauthorized actions.
  5. Go Live: Roll out to production with a phased approach and continuous monitoring.
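
For the testing step, unit tests along the following lines could confirm that unauthorized tool calls are blocked. The `is_allowed` function is a hypothetical stand-in for whatever check the deployed monitor exposes; the article does not describe a test API.

```python
import unittest

# Hypothetical allowlist standing in for the deployed policy set.
ALLOWED_TOOLS = {"flight_api", "hotel_api", "weather_api"}


def is_allowed(tool_call: dict) -> bool:
    """Stand-in for the monitor's decision on a single tool call."""
    return tool_call.get("tool") in ALLOWED_TOOLS


class FirewallTests(unittest.TestCase):
    def test_allows_approved_tool(self):
        self.assertTrue(is_allowed({"tool": "weather_api", "params": {"city": "Kyoto"}}))

    def test_blocks_unknown_tool(self):
        self.assertFalse(is_allowed({"tool": "shell_exec", "params": {"cmd": "cat /etc/passwd"}}))

    def test_blocks_call_without_tool_name(self):
        self.assertFalse(is_allowed({"params": {}}))
```

Saved as a module, these tests can be run with `python -m unittest`.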

The team offers 24/7 support for the first 90 days and a trial period that includes 10 hours of sandbox runtime.


18. Conclusion: Bridging the Trust Gap

The headline’s promise—“See everything your AI agent does and stop it before it does something catastrophic.”—captures a crucial shift in AI engineering: from opaque automation to transparent, auditable systems. The product, as outlined in the article, is a compelling step toward that future. It combines:

  • Observability (full traceability).
  • Prevention (real‑time firewall).
  • Accountability (immutable logs).
  • Governance (policy framework).

While it is not a silver bullet, it is a powerful tool in the arsenal of anyone deploying LLM‑powered agents at scale. The article invites us to consider that AI’s greatest threat may not be an algorithm that learns to be hostile, but an unmonitored automation that misinterprets a prompt and leaks data. By putting a safety net in place, we can harness the immense productivity of AI while safeguarding privacy, security, and ethics.


19. Key Takeaways

  1. Runtime tracing + policy enforcement is essential for safe AI agent deployment.
  2. Immutable audit logs satisfy compliance requirements and build user trust.
  3. Declarative policies give organizations granular control over agent behavior.
  4. Human‑in‑the‑loop remains critical; the tool augments but does not replace oversight.
  5. Open standards will likely emerge, and early adopters can shape them.

The article’s deep dive, combined with real‑world examples, demonstrates that building a trustworthy AI ecosystem is feasible—but it demands deliberate design, continuous monitoring, and a willingness to make safety a core feature, not an afterthought.
