Show HN: Gortex – MCP server for cross-repo code intelligence

📚 Deep Dive into the New Code‑Intelligence Engine

An exhaustive 4‑K‑word summary of the latest “in‑memory knowledge graph” engine that powers 15 AI coding agents


1. Executive Overview

In the pursuit of smarter AI‑assisted software development, a new code‑intelligence engine has emerged. It transforms entire code repositories into a richly connected, in‑memory knowledge graph that can be queried from a command‑line interface (CLI), an MCP (Model Context Protocol) server, and a web UI.

What makes this engine revolutionary is not just its data structure, but its integration with 15 of the most popular AI coding agents (Claude Code, Kiro, Cursor, Windsurf, and others). The result is a single source of truth that each agent can tap into for context, semantics, and search, reducing hallucinations and increasing code‑generation accuracy.

TL;DR – The engine indexes repositories, builds a knowledge graph, exposes it via multiple interfaces, and powers AI agents to deliver highly contextual, accurate, and efficient code suggestions.

2. Why a Knowledge‑Graph‑Based Engine?

Traditional code search tools rely on simple keyword matching or regex patterns. They fail to capture semantic relationships between functions, classes, and modules. This engine uses a graph model where nodes represent code entities and edges encode relationships such as calls, inherits, depends on, and commentary.

2.1 Benefits of Graph‑Based Representation

| Aspect | Graph Advantage | Implication for AI Agents |
|--------|-----------------|---------------------------|
| Contextual Search | Traverses paths (e.g., find all functions that call serialize() in utils/) | Agents can answer “which file uses serialize()?” instantly. |
| Semantic Understanding | Type inference, data flow, call graphs | Agents can predict bugs, recommend refactors. |
| Efficient Diffing | Edge‑based comparison between commits | Agents spot regressions, suggest fixes. |
| Hotspot Detection | Node degrees reflect complexity | Agents flag high‑risk modules. |
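To make the graph model concrete, here is a minimal sketch of nodes, typed edges, and the "who calls serialize() in utils/" traversal from the table. The `CodeGraph` class, its method names, and the sample node ids are illustrative assumptions, not the engine's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CodeGraph:
    """Tiny in-memory graph: nodes keyed by id, edges indexed by endpoint and type."""
    nodes: dict = field(default_factory=dict)  # node id -> attributes
    out_edges: dict = field(default_factory=lambda: defaultdict(set))  # (src, kind) -> {dst}
    in_edges: dict = field(default_factory=lambda: defaultdict(set))   # (dst, kind) -> {src}

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, src, kind, dst):
        # Store both directions so "who calls X?" is a single lookup.
        self.out_edges[(src, kind)].add(dst)
        self.in_edges[(dst, kind)].add(src)

    def callers_of(self, func_id, path_prefix=""):
        """Functions with a `calls` edge into func_id, optionally filtered by file path."""
        return sorted(
            src for src in self.in_edges[(func_id, "calls")]
            if self.nodes[src].get("file", "").startswith(path_prefix)
        )


graph = CodeGraph()
graph.add_node("serialize", kind="function", file="utils/codec.py")
graph.add_node("save_user", kind="function", file="utils/store.py")
graph.add_node("render", kind="function", file="web/views.py")
graph.add_edge("save_user", "calls", "serialize")
graph.add_edge("render", "calls", "serialize")

print(graph.callers_of("serialize", path_prefix="utils/"))  # ['save_user']
```

Keeping a reverse index (`in_edges`) trades some memory for constant-time caller lookups, which is the kind of query the table describes.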

2.2 In‑Memory vs. Disk‑Backed

By keeping the graph in RAM, query latency stays in the low‑millisecond range for most operations (Section 6.1 reports averages between 1.5 ms and 12.8 ms). This is critical for real‑time IDE integrations where a developer expects an instant response when typing.


3. Core Architecture

Below is a high‑level diagram of the system:

+----------------+   +------------------+   +----------------+   +-----------------+
|  Repository    |   |  Indexer Engine  |   | Knowledge Graph|   |  Interfaces     |
|   Source       |-->| (Parser & Mapper)|-->| (In‑Memory)    |-->| (CLI, MCP, UI)  |
+----------------+   +------------------+   +----------------+   +-----------------+
          |                       |                     |                |
          +-----------------------+---------------------+----------------+
                                    |                 |
                           +--------v--------+  +-----v-----+
                           | AI Agent 1 (e.g.|  | AI Agent N|
                           | Claude Code)    |  | (Cursor)  |
                           +-----------------+  +-----------+

3.1 Repository Input Layer

  1. Git Clone & Pull – The engine can handle local paths or remote Git URLs.
  2. File Scanning – Recursive walk through directories; ignores .git/, node_modules/, etc.
  3. Language Detection – Uses file extensions and shebang lines to tag files (Python, JavaScript, Java, Rust, etc.).
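The scanning and language-detection steps above can be sketched as follows. The ignore list and extension map only cover the examples named in the text, and `scan_repository` and `detect_language` are hypothetical names chosen for this illustration.

```python
import os

IGNORED_DIRS = {".git", "node_modules"}  # the article's examples; extend as needed
EXTENSIONS = {".py": "Python", ".js": "JavaScript", ".java": "Java", ".rs": "Rust"}
SHEBANG_HINTS = {"python": "Python", "node": "JavaScript"}


def detect_language(path):
    """Tag a file by extension first, then by its shebang line; None if unknown."""
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    try:
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            first_line = handle.readline()
    except OSError:
        return None
    if first_line.startswith("#!"):
        for hint, language in SHEBANG_HINTS.items():
            if hint in first_line:
                return language
    return None


def scan_repository(root):
    """Walk root recursively, skipping ignored directories, and yield (path, language)."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            language = detect_language(path)
            if language is not None:
                yield path, language
```

Calling `list(scan_repository("path/to/repo"))` would return the tagged files; files with no recognized extension or shebang are skipped in this sketch.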

3.2 Parsing & Mapping

| Phase | Tools | Output |
|-------|-------|--------|
| Lexical Analysis | tree-sitter parsers | Token streams |
| AST Construction | Language‑specific AST libraries | Abstract Syntax Trees |
| Semantic Mapping | Custom rules + language servers | Symbol tables, type info |

The engine leverages the tree-sitter parsing engine for speed and language coverage, supplemented by language‑specific static analysis tools for richer semantics (e.g., Rust’s rustc, Python’s mypy).
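As a rough illustration of the parse-then-map step, the sketch below uses Python's built-in `ast` module as a stand-in for tree-sitter (it handles Python source only) and extracts function definitions plus direct `calls` edges. The `extract_symbols` helper and the edge triple layout are assumptions made for this example.

```python
import ast


def extract_symbols(source, filename="<memory>"):
    """Return (functions, call_edges) found in one Python source string."""
    tree = ast.parse(source, filename=filename)
    functions, call_edges = [], []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append(node.name)
            # Record calls made by simple name, e.g. serialize(...), inside this
            # function. Calls inside nested functions are attributed to the outer
            # function as well, which is acceptable for a sketch.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    call_edges.append((node.name, "calls", inner.func.id))
    return functions, call_edges


SAMPLE = '''
def serialize(obj):
    return repr(obj)

def save(obj):
    return serialize(obj)
'''

functions, edges = extract_symbols(SAMPLE)
print(sorted(functions))
print(sorted(edges))
```

Method calls such as `obj.serialize()` are ignored here because resolving them needs the type information that the semantic-mapping phase is meant to supply.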

3.3 Knowledge Graph Construction

  1. Node Types
  • File
  • Module
  • Class
  • Function
  • Variable
  • Comment
  • Dependency
  • Test Case
  2. Edge Types
  • calls
  • inherits
  • depends_on
  • defined_in
  • comment_on
  • imports
  3. Graph Database – Built on a Neo4j‑like in‑memory engine (custom C++/Rust implementation) with efficient adjacency lists.
  4. Incremental Updates – When a new commit is processed, only changed nodes/edges are re‑computed.
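One way to picture the incremental update is to drop everything that was defined in a changed file and re-insert the freshly parsed result, leaving the rest of the graph alone. This is a simplified sketch under that assumption; the `apply_file_change` helper and the dict/set layout are hypothetical.

```python
def apply_file_change(nodes, edges, changed_file, new_nodes, new_edges):
    """Replace everything that was defined in changed_file; leave the rest untouched.

    nodes: dict of node id -> {"file": ...}
    edges: set of (src, kind, dst) triples
    Returns the set of node ids that were removed as stale.
    """
    stale = {node_id for node_id, attrs in nodes.items() if attrs["file"] == changed_file}
    for node_id in stale:
        del nodes[node_id]
    # Drop edges that originate in the changed file. Edges pointing into it from
    # other files are kept and simply dangle if their target no longer exists.
    edges.difference_update({edge for edge in edges if edge[0] in stale})
    nodes.update(new_nodes)
    edges.update(new_edges)
    return stale


nodes = {"a.foo": {"file": "a.py"}, "b.bar": {"file": "b.py"}}
edges = {("a.foo", "calls", "b.bar"), ("b.bar", "calls", "a.foo")}

# a.py was edited: foo() was renamed to baz().
removed = apply_file_change(
    nodes, edges, "a.py",
    new_nodes={"a.baz": {"file": "a.py"}},
    new_edges={("a.baz", "calls", "b.bar")},
)
```

After this call the edge from `b.bar` to the removed `a.foo` is still present, which is exactly the sort of dangling reference a later consistency check could flag.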

3.4 Indexing Pipeline

  1. Snapshot Creation – A complete copy of the repo is taken.
  2. Parallel Processing – Files are processed concurrently across CPU cores.
  3. Batch Graph Updates – Nodes and edges are inserted in bulk for cache efficiency.
  4. Consistency Checks – Cycles, unreachable code, and missing type information are flagged.
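The parallel-processing and batch-update steps might look roughly like this. The per-file worker is a deliberately naive placeholder, and a thread pool is used to keep the sketch simple; in CPython a process pool would be closer to the "across CPU cores" behavior described above. All names here are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_file(item):
    """Hypothetical per-file worker: returns (nodes, edges) for one file."""
    path, text = item
    names = [
        line[4:].partition("(")[0].strip()
        for line in text.splitlines()
        if line.startswith("def ")
    ]
    nodes = {f"{path}::{name}": {"file": path} for name in names}
    edges = {(node_id, "defined_in", path) for node_id in nodes}
    return nodes, edges


def index_snapshot(files, workers=4):
    """Parse files concurrently, then merge the results into the graph in one pass."""
    all_nodes, all_edges = {}, set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(parse_file, files.items()))
    # Batch insert: workers never touch the shared graph, so no locking is needed.
    for nodes, edges in results:
        all_nodes.update(nodes)
        all_edges.update(edges)
    return all_nodes, all_edges


snapshot = {
    "a.py": "def foo():\n    pass\n",
    "b.py": "def bar(x):\n    return x\nx = 1\n",
}
indexed_nodes, indexed_edges = index_snapshot(snapshot)
```

Having workers return plain data and merging on a single thread is a design choice that keeps the graph free of concurrent writes.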

3.5 Persistence & Reloading

  • Checkpointing – Periodic snapshots of the graph are persisted to disk (binary format) for crash recovery.
  • Hot Reload – When a repo changes, only a subset of the graph is reloaded, keeping the service online.
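A checkpoint that survives a crash mid-write is commonly done by writing to a temporary file and then renaming it over the previous snapshot. The sketch below follows that pattern but uses JSON for readability, whereas the text describes a binary format; the function names and file layout are assumptions.

```python
import json
import os
import tempfile


def save_checkpoint(path, nodes, edges):
    """Write the graph to a temp file, then swap it into place with a rename."""
    payload = {"nodes": nodes, "edges": sorted(edges)}
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk first
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Read a checkpoint back into (nodes, edges)."""
    with open(path, "r", encoding="utf-8") as handle:
        payload = json.load(handle)
    # JSON has no tuple type, so edges come back as lists and are converted here.
    edges = {tuple(edge) for edge in payload["edges"]}
    return payload["nodes"], edges
```

On restart, a service would call `load_checkpoint` on the most recent snapshot and then re-index only the commits made after it.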

4. Interface Layer

4.1 Command‑Line Interface (CLI)

The CLI exposes a rich set of commands:

| Command | Description |
|---------|-------------|
| ci index <repo_path> | Indexes the repository. |
| ci search <query> | Runs a natural‑language or graph query. |
| ci diff <old_rev> <new_rev> | Shows changes in graph form. |
| ci stats | Graph statistics (node counts, average degree). |
| ci export <format> | Exports graph or subgraph (JSON, CSV). |

Example:

$ ci search "find all functions that call serialize()"

Returns a JSON array of function nodes with file paths.
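For readers who want to prototype a similar command surface, the command table could be mirrored with Python's `argparse` along these lines. This only illustrates the command shapes listed above; how the real `ci` binary parses its arguments is not described, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per row of the command table."""
    parser = argparse.ArgumentParser(prog="ci")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("index").add_argument("repo_path")
    sub.add_parser("search").add_argument("query")
    diff = sub.add_parser("diff")
    diff.add_argument("old_rev")
    diff.add_argument("new_rev")
    sub.add_parser("stats")
    sub.add_parser("export").add_argument("format", choices=["json", "csv"])
    return parser


args = build_parser().parse_args(["search", "find all functions that call serialize()"])
print(args.command, "->", args.query)
```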

4.2 MCP Server (Model Context Protocol)

The MCP is a gRPC‑based microservice that allows AI agents to:

  • Query: Send SPARQL‑like graph queries and receive results instantly.
  • Update: Push new nodes/edges as code is generated by an agent.
  • Auth: Token‑based authentication to ensure only authorized agents can modify the graph.

Why MCP?
The MCP server is designed for high‑throughput, low‑latency interactions, enabling agents to embed code intelligence directly into their pipelines.
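The query, update, and auth responsibilities listed above can be sketched independently of any wire protocol as a single request handler. The request shape, the token store, and the function names below are all assumptions for illustration and do not reflect the server's real API.

```python
import hmac

AUTHORIZED_TOKENS = {"agent-token-123"}  # hypothetical; load from a secret store in practice


def is_authorized(token):
    """Check the token against the allow-list using a constant-time comparison."""
    candidate = token.encode("utf-8")
    return any(
        hmac.compare_digest(candidate, known.encode("utf-8"))
        for known in AUTHORIZED_TOKENS
    )


def handle_request(graph_edges, token, request):
    """Dispatch one request dict of the form {"op": "query" | "update", ...}."""
    if not is_authorized(token):
        return {"ok": False, "error": "unauthorized"}
    if request["op"] == "query":
        kind, target = request["kind"], request["target"]
        sources = sorted(src for src, k, dst in graph_edges if k == kind and dst == target)
        return {"ok": True, "result": sources}
    if request["op"] == "update":
        # Agents push new edges as (src, kind, dst) triples.
        graph_edges.update(tuple(edge) for edge in request["edges"])
        return {"ok": True, "result": len(request["edges"])}
    return {"ok": False, "error": "unknown op"}


shared_edges = {("save", "calls", "serialize")}
reply = handle_request(
    shared_edges, "agent-token-123",
    {"op": "query", "kind": "calls", "target": "serialize"},
)
```

Checking authorization before looking at the operation keeps unauthenticated callers from learning which operations exist.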

4.3 Web UI

A modern single‑page application (SPA) written in React + GraphQL that visualizes:

  • Repository Explorer – Folder tree with node counts.
  • Graph View – Interactive force‑directed layout of selected subgraph.
  • Search Panel – Natural‑language search with instant suggestions.
  • Diff Viewer – Visual diff between two commits with node coloring.

The UI connects to the MCP via WebSocket for live updates.


5. Integration with AI Coding Agents

The engine’s real value shines when AI agents tap into it for contextual code generation. Below is a deep dive into how 15 leading agents (Claude Code, Kiro, Cursor, Windsurf, etc.) use the engine.

5.1 Claude Code

  • Contextual Retrieval – Claude sends a request to the MCP to fetch all functions in the current module that match a given name pattern.
  • Semantic Prompting – The retrieved nodes are converted into a prompt that includes function signatures and docstrings.
  • Fine‑Tuned Generation – Claude’s model uses the enriched prompt to generate the implementation, reducing hallucinations.

5.2 Kiro

  • Graph‑Based Diff – Kiro compares the user’s current code with the repository’s baseline graph to identify missing imports or unused functions.
  • Live Suggestions – On the fly, Kiro proposes refactors that align with the dependency graph.

5.3 Cursor

  • Context Awareness – Cursor queries the graph to understand the call hierarchy around the cursor position.
  • Predictive Autocomplete – By knowing which functions are called frequently together, Cursor offers autocomplete options that are semantically coherent.

5.4 Windsurf

  • Test‑Driven Search – Windsurf queries the graph for test cases that exercise a particular function, helping developers add missing unit tests.
  • Code Quality Metrics – It retrieves node centrality scores (e.g., betweenness) to flag high‑risk code blocks.

Common Pattern – All agents share a two‑step pipeline:

  1. Graph Query – Fetch context.
  2. Prompt Engineering – Feed context to the LLM.

This reduces the need for massive context windows in the LLM, keeping generation efficient.
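The second step of that pipeline, turning retrieved graph nodes into prompt text, could be as simple as the sketch below. The node fields (`signature`, `file`, `doc`), the character budget, and the `build_prompt` helper are assumptions; real agents will format context in their own ways.

```python
def build_prompt(task, context_nodes, max_chars=2000):
    """Render retrieved function nodes as a context block ahead of the task."""
    lines = ["Relevant code context:"]
    used = 0
    for node in context_nodes:
        entry = f"- {node['signature']}  # {node['file']}: {node['doc']}"
        if used + len(entry) > max_chars:
            break  # stay within budget instead of growing the context window
        lines.append(entry)
        used += len(entry)
    lines.append("")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


retrieved = [
    {"signature": "def serialize(obj) -> str", "file": "utils/codec.py", "doc": "Encode obj."},
    {"signature": "def save_user(user) -> None", "file": "utils/store.py", "doc": "Persist a user."},
]
print(build_prompt("Implement load_user() to mirror save_user().", retrieved))
```

Because only the signatures and docstrings of graph-selected functions are included, the prompt stays short even when the repository is large.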


6. Performance & Scalability

6.1 Benchmarks

| Repo Size | Indexing Time (1 core) | Indexing Time (8 cores) | Query Latency (avg.) |
|-----------|------------------------|-------------------------|----------------------|
| 100 KB | 4 s | 1.2 s | 1.5 ms |
| 1 MB | 45 s | 9 s | 3.2 ms |
| 10 MB | 8 min | 1.2 min | 5.4 ms |
| 100 MB | 1 hr | 12 min | 12.8 ms |

  • Memory Footprint – Roughly 1.5 × repo size in RAM (e.g., a 10 MB repo uses ~15 MB).
  • Node/Edge Count – For a typical 10 MB repo, ~30 k nodes and 200 k edges.

6.2 Horizontal Scaling

  • Shard Strategy – Each repository’s graph resides on its own node; agents query the appropriate node via MCP routing.
  • Load Balancing – A lightweight HTTP gateway distributes requests based on repository hash.
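Routing by repository hash can be sketched with a stable digest and a modulo over the shard list. The host names and the `route_repository` helper are hypothetical, and the text does not say which hash function the gateway actually uses.

```python
import hashlib


def route_repository(repo_id, shard_hosts):
    """Pick the shard that owns repo_id using a stable hash of its identifier."""
    # hashlib is used instead of the built-in hash(), which is salted per process
    # for strings and would route the same repository differently after a restart.
    digest = hashlib.sha256(repo_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shard_hosts)
    return shard_hosts[index]


SHARDS = ["graph-0.internal", "graph-1.internal", "graph-2.internal"]
owner = route_repository("acme/payments", SHARDS)
```

Plain modulo hashing remaps most repositories whenever the shard count changes; a consistent-hashing ring would reduce that churn at the cost of more bookkeeping.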

6.3 Fault Tolerance

  • Checkpoint Resilience – In case of crash, the MCP loads the last checkpoint in under 5 s.
  • Replication – Optional read replicas for high‑availability deployments.

7. Use Cases Beyond Code Generation

  1. Security Audits – By traversing call edges, the engine can flag functions that call exec() or eval() across languages.
  2. Documentation Generation – Auto‑generate diagrams of module interactions.
  3. Continuous Integration (CI) – Embed graph queries in CI pipelines to detect unexpected API changes.
  4. Legacy Migration – Visualize how older modules depend on newer ones, guiding migration strategies.
  5. Bug Tracking – Correlate bug reports with the code path that leads to the failure.
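The security-audit case above amounts to a reverse traversal over call edges. A minimal sketch, assuming call edges are available as (caller, callee) pairs and treating `exec` and `eval` as the only risky sinks:

```python
from collections import deque

RISKY_CALLS = {"exec", "eval"}


def functions_reaching_risky_calls(call_edges):
    """Return every function that calls exec()/eval() directly or through other calls."""
    callers = {}
    for src, dst in call_edges:
        callers.setdefault(dst, set()).add(src)
    flagged = set()
    queue = deque(RISKY_CALLS)
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in flagged:  # also protects against call cycles
                flagged.add(caller)
                queue.append(caller)
    return flagged - RISKY_CALLS


audit_edges = [
    ("run_plugin", "eval"),
    ("handle_request", "run_plugin"),
    ("render", "format_html"),
]
print(sorted(functions_reaching_risky_calls(audit_edges)))  # ['handle_request', 'run_plugin']
```

The breadth-first walk starts from the sinks rather than from every function, so the cost depends on how much code can reach `exec` or `eval`, not on the size of the whole graph.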

8. Challenges & Mitigations

| Challenge | Impact | Mitigation |
|-----------|--------|------------|
| Parsing Ambiguity | Some languages lack stable parsers. | Use fallback regex or language‑specific AST generators. |
| Large Repos | Indexing times grow linearly. | Incremental indexing, lazy loading of seldom‑accessed subgraphs. |
| Dynamic Code | Runtime constructs (metaprogramming) escape static analysis. | Runtime instrumentation hooks to capture dynamic behavior. |
| Versioning | Keeping multiple repo snapshots synced. | Store commit IDs as node attributes; use graph diffs for updates. |
| Security | Exposure of private repos via MCP. | Enforce strict ACLs, TLS, and audit logs. |
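The versioning mitigation, graph diffs keyed by commit ID, reduces to set differences when edges are stored as triples. The snapshot layout and the `diff_graphs` name below are assumptions made for this sketch.

```python
def diff_graphs(old_snapshot, new_snapshot):
    """Compare two snapshots of the form {"commit": str, "edges": set of triples}."""
    old_edges, new_edges = old_snapshot["edges"], new_snapshot["edges"]
    return {
        "from": old_snapshot["commit"],
        "to": new_snapshot["commit"],
        "added": sorted(new_edges - old_edges),
        "removed": sorted(old_edges - new_edges),
    }


before = {"commit": "a1b2c3", "edges": {("save", "calls", "serialize")}}
after = {
    "commit": "d4e5f6",
    "edges": {("save", "calls", "serialize_v2"), ("load", "calls", "parse")},
}
changes = diff_graphs(before, after)
```

Only the `added` and `removed` lists need to be applied to bring a graph from one commit to the next, which is what makes diff-based updates cheaper than a full re-index.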


9. Future Roadmap

  1. Semantic Embedding Layer – Augment graph nodes with vector embeddings for semantic search.
  2. Cross‑Repository Graph – Link dependencies across projects (e.g., shared libraries).
  3. AI‑Driven Suggestion Engine – Build a meta‑model that learns from agent interactions to propose new graph queries automatically.
  4. Graph‑Aware Prompt Generation – Integrate graph traversal directly into prompt engineering pipelines.
  5. Open‑Source SDK – Provide Python/Rust clients for easy integration into custom IDEs or tooling.

10. Conclusion

The code‑intelligence engine represents a paradigm shift in how we think about code search, analysis, and AI‑assisted development. By converting codebases into a rich, in‑memory knowledge graph and exposing it through versatile interfaces, the engine delivers:

  • Lightning‑fast queries that empower real‑time code suggestions.
  • Contextual depth that reduces hallucinations in LLM outputs.
  • Scalable architecture that can handle massive monorepos.
  • Cross‑tool interoperability via the MCP, making it a future‑proof backbone for AI agents.

In a landscape where AI coding assistants are becoming mainstream, this engine is a critical piece of infrastructure. It not only bridges the semantic gap between raw code and intelligent models but also sets a new standard for how developers can visualize, understand, and manipulate their codebases in unprecedented ways.

