Hong Kong's Votee AI and Toronto's Beever AI Open-Source Beever Atlas -- Turns Your Telegram, Discord, Mattermost, Microsoft Teams and Slack Chats Into a Living Wiki
Summarizing the New Open‑Source LLM Knowledge Base for Team Chat
1. Introduction
The article introduces a fresh entrant to the growing landscape of large‑language‑model (LLM) tools: a fully open‑source knowledge base designed specifically for team chat. The product promises to blend the conversational power of GPT‑style models with the rigor of a searchable, citation‑bearing memory layer that can be queried by humans and bots alike. In short, it is a “chat‑first” LLM knowledge base that lets teams query a curated knowledge store in real time while preserving the provenance of every answer.
The core idea is simple yet ambitious: make it effortless for any team—whether an indie startup or a Fortune‑500 company—to turn its internal documents, FAQs, meeting notes, and code bases into a live, searchable knowledge graph that can answer questions in the same style as a knowledgeable coworker. The article outlines how this is achieved, the two editions of the software (open‑source vs. enterprise), the technical stack, the licensing model, and the broader implications for the AI‑first workplace.
2. Product Overview
2.1. “Chat‑First” Knowledge Management
The system is engineered around the premise that most knowledge consumption happens in chat. Instead of siloed wikis or document repositories, the product exposes its content through a familiar messaging interface—Slack, Microsoft Teams, Discord, or a custom web chat. This lowers the friction for adoption, as employees can simply type a question and receive an answer enriched with context and citations.
2.2. Searchable, Citation‑Bearing Memory Layer
At its heart lies a vector‑based search engine that indexes every document, file, and chat history. When a user asks a question, the system retrieves the most relevant passages, then feeds them to an LLM (currently GPT‑4‑style) for synthesis. Crucially, the output includes citations—links back to the original source document or snippet—allowing teams to verify, edit, or expand on the answer. This is a key differentiator from many “ChatGPT‑like” knowledge bases that return answers without provenance.
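As a rough illustration of how citation markers could be tied back to sources, the sketch below maps each `[n]` marker in a generated answer to the n-th retrieved source. The function name `resolve_citations`, the example paths, and the assumption that markers are 1-based indexes into the retrieved list are all hypothetical; the article does not describe the product's actual mechanism.

```python
import re


def resolve_citations(answer: str, sources: list[str]) -> dict[int, str]:
    """Map each [n] marker in the answer to its 1-based entry in sources."""
    resolved = {}
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        # Ignore markers that point outside the retrieved list.
        if 1 <= n <= len(sources):
            resolved[n] = sources[n - 1]
    return resolved


# Hypothetical answer and source paths, for illustration only.
answer = "Rollbacks need sign-off from the on-call lead [2]."
sources = ["wiki/deploys.md", "wiki/rollbacks.md"]
links = resolve_citations(answer, sources)
```

Markers that fall outside the retrieved list are dropped here rather than raising an error, on the assumption that a hallucinated citation should never resolve to a real link.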
2.3. Open‑Source vs. Enterprise Editions
| Feature | Open‑Source (Apache 2.0) | Enterprise |
|---------|--------------------------|------------|
| License | Apache 2.0 (full source code, no cost) | Commercial license (source‑available, with support contracts) |
| Deployment | Self‑hosted or hosted on any cloud (Kubernetes, Docker) | Managed hosting (SaaS) with single‑sign‑on, SAML, SCIM, and 24/7 support |
| Security | Requires in‑house security measures | Built‑in SOC‑2, ISO 27001 compliance, role‑based access control, data‑at‑rest encryption |
| Scalability | Limited by customer’s infra | Auto‑scaling, load balancing, multi‑tenant isolation |
| Integrations | Community‑driven | Enterprise‑grade connectors (Salesforce, Jira, Confluence, GitHub, etc.) |
| Feature Roadmap | Community roadmap, open‑source milestones | Dedicated product team, quarterly releases, roadmap sync with customers |
The open‑source edition is geared toward individual developers, small startups, or those who are comfortable managing their own infra and security. The enterprise edition offers the same core functionality but adds the reliability, compliance, and support that larger organizations demand.
3. Technical Architecture
3.1. Data Ingestion Pipeline
- Connectors – The system ships with a rich library of connectors: file‑system watchers, email clients, cloud drives (Google Drive, OneDrive), Git repositories, and API hooks (REST, GraphQL).
- Pre‑processing – Documents are split into manageable chunks (sentences or paragraphs) using a tokenizer, then cleaned of metadata.
- Embeddings – Each chunk is transformed into a vector using a state‑of‑the‑art embedding model (e.g., OpenAI’s text‑embedding‑ada‑002 or a self‑hosted alternative like Sentence‑Transformers).
- Indexing – Vectors are stored in a high‑performance vector database (Weaviate, Pinecone, or an open‑source alternative like Milvus). Metadata accompanies each vector, enabling advanced filtering (date, author, tags).
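The four ingestion steps above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the product's code: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and all names (`Chunk`, `split_into_chunks`, `ingest`) are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def split_into_chunks(document: str) -> list[str]:
    """Split on blank lines so each paragraph becomes one chunk."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hashed bag-of-words vector, L2-normalised. Stand-in for a real model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def ingest(document: str, metadata: dict, index: list[Chunk]) -> int:
    """Chunk, embed, and store a document; return the number of chunks added."""
    chunks = split_into_chunks(document)
    for position, text in enumerate(chunks):
        index.append(Chunk(text, toy_embed(text), {**metadata, "position": position}))
    return len(chunks)


index: list[Chunk] = []
doc = "Deploys run every Tuesday.\n\nRollbacks need approval from the on-call lead."
added = ingest(doc, {"author": "alice", "tags": ["ops"]}, index)

# Metadata stored next to each vector allows filtering before similarity search.
by_alice = [c for c in index if c.metadata["author"] == "alice"]
```

Storing the chunk position alongside author and tags is one way to keep enough information to point a citation back at a specific part of the source document.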
3.2. Retrieval‑Augmented Generation (RAG)
When a user queries the chat interface, the system executes a two‑stage pipeline:
- Retrieval – The query is embedded and the nearest vectors are fetched, sorted by relevance. The top‑k passages (default k=10) are passed to the LLM.
- Generation – The LLM receives a prompt that includes the user’s question and the retrieved passages. It then produces an answer, incorporating citations in a consistent format (e.g., [1], [2]). The system can optionally post-process the answer to align with company style guidelines (tone, formatting).
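A minimal sketch of the two stages follows, assuming cosine similarity over a toy bag-of-words representation in place of a real embedding model and vector store. The prompt wording and the function names (`embed`, `retrieve`, `build_prompt`) are illustrative assumptions; the LLM call itself is omitted.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real deployment would call a model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 10) -> list[str]:
    """Stage 1: rank passages by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: list[str]) -> str:
    """Stage 2 input: number the passages so the model can cite them as [n]."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(retrieved, start=1))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )


passages = [
    "The VPN requires a hardware token.",
    "Expense reports are due on the fifth of each month.",
    "Hardware tokens are issued by the IT desk.",
]
question = "When are expense reports due?"
top = retrieve(question, passages, k=2)
prompt = build_prompt(question, top)
```

Numbering the passages in the prompt is what makes the `[1]`, `[2]` markers in the answer resolvable afterwards, since each number corresponds to a known retrieved passage.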
3.3. Citation Management
- Source Mapping – Each cited passage is linked back to its source document and even to the exact line number, ensuring traceability.
- Dynamic Refresh – When documents change, the ingestion pipeline re‑indexes only the affected chunks, so citations stay up‑to‑date.
- Audit Trail – All queries and responses are logged with timestamps, user IDs, and source IDs, allowing compliance teams to audit knowledge usage.
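One plausible way to re-index only the affected chunks is to compare content hashes, and the audit trail can be a structured log record. The sketch below assumes both; the article does not say how the product detects changes or formats its logs, and `refresh`, `audit_record`, and the `handbook.md#0` source ID format are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def chunk_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh(doc_id: str, chunks: list[str], index: dict[tuple[str, int], str]) -> list[int]:
    """Update stored hashes for doc_id; return positions that need re-embedding."""
    changed = []
    for position, text in enumerate(chunks):
        digest = chunk_hash(text)
        if index.get((doc_id, position)) != digest:
            index[(doc_id, position)] = digest
            changed.append(position)
    # Drop entries for chunks that no longer exist in the document.
    for key in [k for k in index if k[0] == doc_id and k[1] >= len(chunks)]:
        del index[key]
    return changed


def audit_record(user_id: str, query: str, source_ids: list[str]) -> str:
    """Serialise one query event with the fields the audit trail lists."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query": query,
        "source_ids": source_ids,
    })


index: dict[tuple[str, int], str] = {}
first = refresh("handbook.md", ["Leave policy: 20 days.", "Remote work is allowed."], index)
second = refresh("handbook.md", ["Leave policy: 25 days.", "Remote work is allowed."], index)
record = json.loads(audit_record("u42", "How much leave do I get?", ["handbook.md#0"]))
```

Keying hashes by position is the simplest scheme but has a known weakness: inserting a paragraph near the top shifts every later position and marks those chunks as changed. Keying by the content hash itself would avoid that at the cost of a slightly more involved lookup.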
3.4. API Layer
A lightweight REST/GraphQL API sits atop the vector store and LLM, exposing endpoints for:
- Querying – /ask, /search, /chat.
- Managing Knowledge – /upload, /delete, /update.
- Administration – /stats, /health, /config.
The API is fully authenticated via OAuth2 or API keys, with role‑based access control (read‑only, editor, admin).
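The role check could look roughly like the sketch below. The endpoints and the three roles come from the description above, but the mapping between them is an assumption made for illustration (for example, that /delete needs editor rights and /config needs admin rights), as is the deny-by-default behaviour.

```python
# Roles ordered from least to most privileged.
ROLE_RANK = {"read-only": 0, "editor": 1, "admin": 2}

# Assumed minimum role per endpoint; the article lists the endpoints and roles
# but not how they map onto each other.
REQUIRED_ROLE = {
    "/ask": "read-only", "/search": "read-only", "/chat": "read-only",
    "/upload": "editor", "/delete": "editor", "/update": "editor",
    "/stats": "admin", "/health": "admin", "/config": "admin",
}


def is_allowed(role: str, endpoint: str) -> bool:
    """Return True if the role meets the endpoint's minimum; deny unknowns."""
    if role not in ROLE_RANK or endpoint not in REQUIRED_ROLE:
        return False
    return ROLE_RANK[role] >= ROLE_RANK[REQUIRED_ROLE[endpoint]]
```

For example, `is_allowed("editor", "/upload")` is true while `is_allowed("read-only", "/delete")` is false. Ranking roles keeps the check to a single comparison, which works as long as the roles form a strict hierarchy.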
3.5. UI/UX
- Chat UI – A minimal, clean chat window with conversation history, attachments, and “Show source” toggles.
- Knowledge Explorer – A searchable dashboard where users can browse documents, filter by tags, and see citations.
- Admin Console – For managing users, connectors, and performance metrics.
The UI is open‑source and modular, allowing teams to replace or extend components (e.g., integrate with Mattermost or Microsoft Teams).
4. Key Features
| Feature | Description | Value |
|---------|-------------|-------|
| Multi‑modal Retrieval | Supports text, code, and even images (via CLIP embeddings). | Broadens the types of knowledge that can be queried. |
| Dynamic Updates | Real‑time indexing of new documents. | Keeps the knowledge base fresh without downtime. |
| Citation‑Rich Answers | Each response links back to source passages. | Enables auditability and trust. |
| Fine‑Tuning Hooks | Ability to fine‑tune the LLM on company‑specific data. | Improves domain relevance (e.g., legal, medical). |
| Custom Prompts | Users can supply prompt templates that enforce brand voice. | Consistency across answers. |
| Conversation Context | Maintains dialogue state over multiple turns. | More natural, context‑aware conversations. |
| Compliance & Security | SOC‑2, ISO 27001, data encryption, role‑based access. | Meets enterprise regulatory requirements. |
| Extensible Plugin System | Plugins for additional data sources or downstream actions (e.g., ticket creation). | Future‑proofing. |
| Metrics Dashboard | Query volume, latency, accuracy, user satisfaction. | Operational insight. |
| Offline Mode | Runs locally with a cached vector index and local LLM. | Useful for privacy‑concerned environments. |
5. Use Cases
5.1. Customer Support
Support agents can ask the knowledge base for troubleshooting steps, policy references, or product specs. The system supplies concise, citation‑rich answers, and agents can click “See source” to verify before replying to the customer. The knowledge base can even auto‑populate knowledge‑base articles from customer interactions.
5.2. Sales Enablement
Sales reps can pull product specs, competitor comparisons, or pricing data instantly. By embedding product documentation into the system, they can answer buyer questions in real time, citing relevant whitepapers or case studies.
5.3. R&D and Engineering
Developers can query API docs, code snippets, or design documents. The citation layer directs them to the exact lines in the repository, reducing the time to find and reuse code.
5.4. Legal & Compliance
Legal teams can ask for policy interpretations, regulatory updates, or contract clauses. The system can cross‑reference the latest legislation and supply the relevant passages.
5.5. Onboarding
New hires can ask about company policies, role expectations, or project overviews. The knowledge base delivers a “chat‑first” onboarding experience, which scales as the organization grows.
6. Community & Ecosystem
The open‑source edition is backed by an active community on GitHub, Discord, and Slack. The repository boasts:
- Hundreds of stars and thousands of forks
- Regular releases (semi‑annual major releases, quarterly minor patches)
- Contribution guidelines with a “good first issue” label
- CI/CD pipelines that automatically run unit tests, linting, and security scans
- Community showcases: startups using it to power internal help desks, NGOs building knowledge portals
The enterprise edition leverages this community as a foundation but adds a dedicated product team that maintains a separate, source‑available branch. This dual‑licensing model allows customers to choose the level of support and compliance they need.
7. Licensing & Pricing
7.1. Open‑Source (Apache 2.0)
- Cost: Zero.
- Freedom: Unlimited modifications, redistribution, and commercial use.
- Support: Community‑driven, no SLA.
7.2. Enterprise
- Pricing Model: Per‑user per‑month, with volume discounts.
- Add‑ons: Dedicated support, training, custom integrations.
- Contract Terms: 12‑month minimum, 20 % early‑termination penalty.
Both editions are free to try, but the enterprise version requires a paid subscription for the managed SaaS offering.
8. Roadmap & Future Enhancements
8.1. Short‑Term (0–6 months)
- Multi‑tenant support for larger organizations.
- Zero‑shot fine‑tuning via in‑app prompts.
- Better UI for source navigation (interactive document viewer).
- Performance optimizations (caching, pre‑fetching).
8.2. Mid‑Term (6–12 months)
- Automated summarization of long documents.
- Multi‑language support for global teams.
- Advanced security controls (data residency, GDPR tools).
- Integration with BI tools (Power BI, Tableau) to embed knowledge analytics.
8.3. Long‑Term (12–24 months)
- Self‑learning: The system will start adjusting embeddings based on query patterns.
- AI‑driven policy enforcement: Automatic flagging of outdated or conflicting information.
- Hybrid cloud‑on‑prem deployments for regulated industries.
- Open‑AI‑style developer hub: Allow third parties to publish custom connectors.
The roadmap is publicly accessible on the company’s website, and community members can vote on features.
9. Competitive Landscape
The article briefly surveys competing solutions:
| Vendor | Strengths | Weaknesses | Licensing |
|--------|-----------|------------|-----------|
| Microsoft Viva Knowledge | Deep Teams integration, enterprise‑grade security | Closed ecosystem, expensive | Proprietary |
| Notion AI + API | Flexible docs, good UI | No built‑in citation, limited scalability | Proprietary |
| LlamaIndex + OpenAI | Modular, research‑grade | Requires custom development | Open‑source |
| Coveo + GPT | Enterprise‑grade search | Proprietary, expensive | Proprietary |
Compared to these, the new open‑source LLM knowledge base shines in its citation transparency, chat‑first design, and dual licensing that balances community freedom with enterprise reliability.
10. Conclusion
The article paints a compelling picture: a chat‑first, citation‑rich LLM knowledge base that democratizes access to internal knowledge while ensuring compliance, security, and scalability. By offering an Apache 2.0‑licensed core and a robust enterprise edition, the product targets both the open‑source community and traditional enterprises. Its architecture—leveraging vector search, RAG, and modular connectors—positions it as a versatile platform for any organization that wants to turn its documents into an interactive knowledge assistant.
For teams looking to modernize their knowledge management without locking into proprietary solutions, this product offers an attractive blend of flexibility, performance, and trustworthiness. The open‑source route allows rapid experimentation, while the enterprise version guarantees the SLA, compliance, and support that mission‑critical organizations require.
Final Takeaway
The new LLM knowledge base is not just another AI chatbot; it is a structured, auditable, and extensible ecosystem that brings LLMs to the core of team collaboration. Whether you’re a small startup building an internal help desk or a multinational corporation managing complex knowledge assets, this solution gives you the tools to ask, answer, and cite—right from your chat interface.