Build a Knowledge Base from Slack and Chat History

The Problem

Your 25-person engineering team has been using Slack for 3 years. Somewhere in those 140,000 messages are the answers to every recurring question: why the billing service uses a separate database, how to configure the staging VPN, what the workaround is for that OAuth edge case. But nobody can find anything.

Engineers spend 20-30 minutes per day searching Slack for answers that were already given months ago. The same questions get asked every few weeks -- "How do I connect to the staging DB?" has been answered 11 times across 3 channels, by 6 different engineers, each with slightly different instructions. When someone leaves the company, their knowledge disappears into an archive that nobody reads. Two senior engineers left last year, and their departures created knowledge gaps the team is still stumbling into.

The team tried a wiki. It went stale in 2 months. The problem was never the wiki itself -- it was that nobody has time to write documentation from scratch. Writing a clean article about VPN setup takes 30 minutes. Explaining it in a Slack thread while helping someone takes 5 minutes and happens naturally. The knowledge already exists, scattered across thousands of messages. It just needs to be extracted, deduplicated, and organized.

The Solution

Using the data-extractor, web-scraper, and markdown-writer skills, the agent parses 3 years of Slack exports, identifies the high-value threads (problem/solution pairs, architecture decisions, setup guides), deduplicates answers to the same question, and produces a structured knowledge base of 412 articles organized into searchable categories with cross-references.

Step-by-Step Walkthrough

Step 1: Export and Analyze Chat History

Parse our Slack export (slack-export.zip). Identify the highest-value messages: solutions to problems, architectural decisions, config explanations, and how-to threads.

The analysis covers 143,291 messages across 47 channels spanning January 2023 through February 2026. The vast majority of messages are not knowledge -- they are casual conversation, meeting scheduling, emoji reactions, and social chat. The filter identifies high-value content by looking for problem/solution patterns, code blocks, configuration snippets, and decision-making language:

Content Type	Count
Problem/solution threads	342
Architecture decisions	89
How-to explanations	156
Config/setup guides	78
Total high-value items	665

The richest channels: #engineering (127 items), #incidents (84), #help (68), #devops (63). Incident channels are particularly valuable because solutions posted during outages tend to be precise, battle-tested, and well-explained -- engineers write clearly when the pressure is on and someone needs to follow their instructions exactly.

Step 2: Extract and Deduplicate Knowledge

The same question answered 11 times is not 11 knowledge articles -- it is one article, using the most complete and most recent answer:

Extract the top knowledge threads. Deduplicate — if the same question was answered multiple times, keep the most complete answer. Group by topic.

Extraction produces 412 unique knowledge items after merging 47 duplicate threads into single authoritative answers. The deduplication is important -- the staging DB question has been answered 11 times, but only 2 of those answers are complete and current. The rest reference old hostnames or missing environment variables.

Category	Items	Examples
Infrastructure and DevOps	94	Database connections, VPN setup, deploy procedures, Docker configs
Application Architecture	67	Service boundaries, API design decisions, data model choices, why things are the way they are
Debugging and Troubleshooting	118	Common errors with solutions, performance fixes, third-party API quirks
Onboarding and Setup	53	Environment setup, tool configs, access request procedures
Business Logic	80	Billing rules, compliance requirements, feature specifications, edge case handling

The debugging category is the largest because engineers naturally explain solutions in detail when they are helping someone in real time. A Slack thread about fixing a production issue often contains better documentation than anything written after the fact -- it includes the symptoms, the investigation steps, the red herrings, and the actual fix. These threads are essentially documentation written under real conditions. They just need to be reformatted.

Step 3: Generate Knowledge Base Articles

Convert the extracted threads into structured knowledge base articles. Each should have a clear title, context, solution, and last-verified date.

Each article follows a consistent format: a clear question as the title, context explaining when and why this comes up, the solution with exact commands or code, and metadata about the source and verification date. Three representative examples:

infrastructure/connect-to-staging-database.md

Q: How do I connect to the staging database?
A: Use the VPN first (see vpn-setup.md), then: psql "postgresql://readonly:$STAGING_DB_PASS@staging-db:5432/app"
Source: #devops, confirmed by 3 engineers. Last verified: 2026-01.

architecture/why-billing-has-separate-db.md

Q: Why does billing use a separate PostgreSQL instance?
A: Isolation for PCI compliance. Billing data cannot share a database with user-generated content per the compliance requirements established in March 2024.
Source: #engineering, decision made 2024-03. Decision by CTO and lead architect.

debugging/oauth-refresh-token-race-condition.md

Q: Users randomly get logged out mid-session.
A: Race condition in refresh token rotation. Two concurrent requests both try to rotate the token, and one gets invalidated. Fix: add a mutex lock around the rotation logic. See PR #847 for the implementation.
Source: #incidents, resolved 2025-08.

The architecture decisions are particularly valuable. Six months from now, when someone asks "why does billing use a separate database?", the answer is documented with the original context and decision-makers -- instead of requiring someone to remember a conversation from 2024.

Step 4: Build Navigation and Search

Create a table of contents, tag index, and search-friendly structure. Add cross-references between related articles.

The knowledge base structure:

docs/knowledge-base/
  README.md          # Table of contents: 5 categories, 412 articles
  _tags.md           # Tag index: 34 tags, most used: "database" (41), "auth" (28)
  infrastructure/    # 94 articles
  architecture/      # 67 articles
  debugging/         # 118 articles
  onboarding/        # 53 articles
  business-logic/    # 80 articles

Cross-references link related articles: connect-to-staging-database.md links to vpn-setup.md and database-credentials.md. The oauth-refresh-token-race-condition.md article links to auth-architecture.md for broader context on the auth system design.

Every article uses consistent YAML frontmatter (title, tags, date, source channel), making the knowledge base compatible with GitHub search, Docusaurus, GitBook, or any static site generator. The team can deploy it however they prefer without restructuring.

Step 5: Set Up Continuous Knowledge Capture

A knowledge base that stops growing goes stale -- which is exactly what killed the wiki. The last step prevents the same problem from recurring:

Create a Slack workflow that lets engineers flag messages as "knowledge" and automatically adds them to the base.

The workflow is designed to be as low-friction as possible:

React with the :brain: emoji on any valuable Slack message
A bot collects the full thread and asks the person for a one-line summary
A draft article is created in docs/knowledge-base/inbox/
Weekly: a reviewer approves, edits, or rejects inbox items and merges them to the main knowledge base

The key difference from the wiki approach: engineers are not writing documentation. They are flagging conversations that already happened. The writing was done in the natural flow of helping someone. The :brain: emoji just captures it before it disappears into the archive.

Additionally, a monthly report surfaces the most-searched-but-missing topics -- questions people are looking for in the knowledge base that do not have articles yet. This turns knowledge gaps into a visible backlog that can be addressed systematically.

Real-World Example

Priya, a staff engineer at a 30-person healthtech startup, noticed her team spent more time searching Slack than writing code. She exported 3 years of chat history and ran the extraction workflow on a Monday morning. The data-extractor identified 342 problem-solution threads, including the staging database question that had been answered 11 separate times by 6 different engineers -- each one slightly different, none of them easy to find when you needed the answer.

The markdown-writer produced 412 articles organized into 5 categories. She deployed them as a Docusaurus site on the internal network. Within the first week, the #help channel volume dropped 35%. New hires started finding answers on their own instead of posting questions and waiting hours for a response. After 2 months, the knowledge base had grown to 480 articles as engineers used the :brain: emoji workflow to capture new solutions in real time -- turning the stale wiki problem into a living knowledge system that grows naturally from the conversations the team is already having.

Skills stack · 3 skills

data-extractor

markdown-writer

web-scraper