Building a Multi-Agent Development Workflow
- The Problem
- Architecture Overview
- Layer 1: Ruler - Unified Agent Configuration
- Layer 2: Beads - Git-Native Issue Tracking
- Layer 3: PostgreSQL Advisory Locks - Atomic Claims
- Layer 4: Git Worktrees - Filesystem Isolation
- Layer 5: Skillz - Domain-Specific Validation
- Layer 6: Agent Learning - Context Passing & State Consistency
- Layer 7: Git Hooks
- Layer 8: Session Protocols
- Putting It All Together
- Complete Setup Checklist
- Measuring Success
- Common Pitfalls
- Conclusion
AI coding assistants are powerful. But running multiple agents across different machines? That's where things get interesting, and chaotic.
I’ve built a workflow that lets multiple AI agents work on the same codebase without stepping on each other’s toes. This post breaks down the complete system using open-source tools.
The Problem
Imagine this: You have Claude running in your IDE, another instance in your terminal, and maybe a third on your laptop. All working on the same repo. What could go wrong?
- Race conditions: Two agents pick up the same task
- Conflicting changes: Agent A refactors a function while Agent B adds to it
- Build artifact conflicts: Agent A's `.venv` gets corrupted by Agent B's `uv sync`
- Lost context: Agent crashes mid-task, work is abandoned
- Configuration drift: Each agent follows different rules
- No visibility: You have no idea what each agent is doing
The solution isn’t to use fewer agents; it’s to build coordination infrastructure.
Architecture Overview
┌──────────────────────────────────────────────────────────────┐
│ Agent Configuration Layer │
│ ┌──────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ Ruler │ │ Skillz │ │ MCP Servers │ │
│ │ (Rules) │ │ (Validation) │ │ (Tools) │ │
│ └──────────────┘ └──────────────┘ └────────────────────┘ │
├──────────────────────────────────────────────────────────────┤
│ Issue Tracking Layer │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Beads - Git-native issue tracking │ │
│ │ - JSONL format, one issue per line │ │
│ │ - Sync via git pull/push │ │
│ │ - CLI for agents, not web UI │ │
│ └────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────┤
│ Coordination Layer │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ Advisory Locks │ │ Git Worktrees │ │ Claim Protocol │ │
│ │ (Who works) │ │ (Isolation) │ │ (Atomic ops) │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
├──────────────────────────────────────────────────────────────┤
│ Context & Learning Layer │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Agent Learning - Institutional knowledge │ │
│ │ - PostgreSQL (project_dev) + Git JSONL sync │ │
│ │ - Automatic extraction from commits │ │
│ │ - Context-aware retrieval with token budgets │ │
│ │ - Staleness detection & confidence decay │ │
│ └────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────┤
│ Git Integration Layer │
│ ┌────────────┐ ┌─────────────┐ ┌──────────────────────┐ │
│ │ pre-commit │ │ pre-push │ │ post-merge │ │
│ │ (lint) │ │ (validate) │ │ (auto-sync) │ │
│ └────────────┘ └─────────────┘ └──────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Let’s build each layer.
Layer 1: Ruler - Unified Agent Configuration
Every agent needs the same rules. Without consistency, you get chaos. Ruler solves this.
The Problem with Multiple Agents
Different AI tools expect configuration in different places:
| Tool | Config Location |
|---|---|
| Claude Code | CLAUDE.md |
| Cursor | .cursor/rules/ |
| GitHub Copilot | .github/copilot-instructions.md |
| Windsurf | .windsurfrules |
Maintaining 5 copies of the same rules? Nightmare.
Ruler: Single Source of Truth
Ruler generates agent-specific configs from a single source:
.ruler/
├── ruler.toml # Which agents to generate for
├── AGENTS.md # Main rules (the source of truth)
├── python-patterns.md # Python-specific rules
├── testing-rules.md # Testing requirements
└── security-rules.md # Security constraints
ruler.toml:
[agents.claude]
output = "CLAUDE.md"
mcp_output = ".mcp.json"
[agents.cursor]
output = ".cursor/rules/AGENTS.md"
mcp_output = ".cursor/mcp.json"
[agents.copilot]
output = ".github/copilot-instructions.md"
AGENTS.md (your rules):
# Project Rules
## Critical Constraints
1. **Never decrease test coverage** - 90%+ required
2. **TODOs must reference issues** - Format: `# TODO(#123): description`
3. **No secrets in code** - Use environment variables
## Before Every Commit
pytest && pylint --recursive=y .
## Code Style
- Type hints on ALL functions
- Async/await everywhere
- 80 char lines
Generate all configs:
npx @intellectronica/ruler apply
Now Claude, Cursor, and Copilot all follow the same rules.
Modular Rules
Split rules into focused files:
<!-- .ruler/AGENTS.md -->
# Project Rules
{{include:python-patterns.md}}
{{include:testing-rules.md}}
{{include:security-rules.md}}
Each file handles one concern. Ruler concatenates them.
Layer 2: Beads - Git-Native Issue Tracking
GitHub Issues are great for humans. Terrible for agents. Why?
- API rate limits
- Network latency
- No offline access
- Can’t atomically claim work
Beads solves this with git-native issue tracking.
Installation
# macOS
brew install steveyegge/tap/bd
# Or from source
cargo install beads
Initialize in Your Repo
cd your-project
bd init
This creates:
.beads/
├── config.yaml # Configuration
├── issues.jsonl # Issues (git-tracked)
└── metadata.json # Local state
Basic Commands
# Create issues
bd create --title="Add user authentication" --type=feature --priority=2
# List open issues
bd list --status=open
# Show ready work (no blockers)
bd ready
# View issue details
bd show proj-abc1
# Claim work
bd update proj-abc1 --status=in_progress
# Complete work
bd close proj-abc1 --reason="Implemented in commit abc123"
# Sync with remote
bd sync
Why JSONL?
Issues are stored as JSON Lines, one issue per line:
{"id":"proj-001","title":"Add auth","status":"open","priority":2,"created_at":"2026-01-08T10:00:00Z"}
{"id":"proj-002","title":"Fix login","status":"in_progress","priority":1,"created_at":"2026-01-08T11:00:00Z"}
{"id":"proj-003","title":"Update docs","status":"closed","priority":3,"created_at":"2026-01-08T12:00:00Z"}
Benefits:
- Git-native: Changes tracked like code
- Merge-friendly: Conflicts are per-line, easy to resolve
- Offline-capable: No network needed
- Fast: Local SQLite for queries, JSONL for sync
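To make "fast" concrete, the whole corpus loads with a few lines of Python (a sketch; `bd` itself uses local SQLite for queries, per the note above):

```python
import json

# Each line of the corpus is one independent JSON document
with open(".beads/issues.jsonl") as f:
    issues = [json.loads(line) for line in f if line.strip()]

# Query locally, no network round-trip
open_issues = [i for i in issues if i["status"] == "open"]
print(f"{len(open_issues)} open issues")
```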
Dependencies
Beads tracks issue dependencies:
# Create related issues
bd create --title="Design auth API" --type=task
# Created: proj-xyz1
bd create --title="Implement auth API" --type=task
# Created: proj-xyz2
# Add dependency (implement depends on design)
bd dep add proj-xyz2 proj-xyz1
# See what's blocked
bd blocked
# proj-xyz2 won't show in `bd ready` until proj-xyz1 is closed
Conflict Resolution
When two agents edit issues.jsonl simultaneously:
<<<<<<< HEAD
{"id":"proj-001","status":"in_progress","assignee":"agent-a"}
=======
{"id":"proj-001","status":"in_progress","assignee":"agent-b"}
>>>>>>> origin/main
Resolution rules (built into `bd sync`):
- `closed` beats `in_progress` beats `open`
- Later timestamp wins for the same status
- Both versions kept for different issues
After resolving, `bd sync` rebuilds the local database.
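For intuition, the precedence logic looks roughly like this (a sketch of the rules above; `bd` implements this internally):

```python
# Precedence: closed > in_progress > open; ties broken by timestamp
STATUS_RANK = {"closed": 2, "in_progress": 1, "open": 0}

def resolve(ours: dict, theirs: dict) -> dict:
    """Pick the winning version of a single conflicting issue line."""
    if STATUS_RANK[ours["status"]] != STATUS_RANK[theirs["status"]]:
        return max(ours, theirs, key=lambda i: STATUS_RANK[i["status"]])
    # Same status: the later update wins
    return max(ours, theirs, key=lambda i: i.get("updated_at", ""))
```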
Layer 3: PostgreSQL Advisory Locks - Atomic Claims
The beads claim protocol has a race window:
# Agent A # Agent B
bd sync bd sync
# Both see issue proj-abc1 as open
bd update proj-abc1 --status=in_progress
bd update proj-abc1 --status=in_progress
# Both agents now working on same issue!
The solution: PostgreSQL advisory locks.
Why Advisory Locks?
| Feature | Benefit |
|---|---|
| Session-level | Auto-release on agent crash (no orphaned locks) |
| Non-blocking | Immediate failure if already claimed (no waiting) |
| Fast | Pure in-memory, no table writes (~1ms overhead) |
| Cluster-aware | Works across all PostgreSQL connections |
| Built-in | No external dependencies (Redis, ZooKeeper, etc.) |
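These properties are easy to verify yourself. In the sketch below (assuming a local PostgreSQL and an illustrative DSN), two connections contend for the same key: the loser fails immediately instead of blocking, and closing the winner's connection releases the lock, which is exactly what happens when an agent crashes.

```python
import asyncio
from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

DSN = "postgresql+asyncpg://localhost:5432/project_dev"  # illustrative DSN

async def main():
    engine = create_async_engine(DSN)
    async with engine.connect() as conn_a, engine.connect() as conn_b:
        # Connection A wins the lock on key 42
        got_a = (await conn_a.execute(text("SELECT pg_try_advisory_lock(42)"))).scalar()
        # Connection B fails immediately; no blocking, no waiting
        got_b = (await conn_b.execute(text("SELECT pg_try_advisory_lock(42)"))).scalar()
        print(got_a, got_b)  # True False
    # Both connections are now closed, so the lock is gone; a crashed
    # agent's session releases its locks the same way
    await engine.dispose()

asyncio.run(main())
```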
Implementation
Create a simple lock module:
# db/advisory_locks.py
import hashlib
from contextlib import asynccontextmanager
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession
def _issue_id_to_lock_key(issue_id: str) -> int:
"""Convert issue ID to PostgreSQL advisory lock key (int32)."""
hash_digest = hashlib.md5(issue_id.encode()).hexdigest()
return int(hash_digest[:8], 16) % (2**31 - 1)
async def acquire_issue_lock(session: AsyncSession, issue_id: str) -> bool:
"""Try to acquire advisory lock. Returns True if acquired."""
lock_key = _issue_id_to_lock_key(issue_id)
result = await session.execute(
text("SELECT pg_try_advisory_lock(:key)"),
{"key": lock_key}
)
return result.scalar()
async def release_issue_lock(session: AsyncSession, issue_id: str) -> bool:
"""Release advisory lock. Returns True if released."""
lock_key = _issue_id_to_lock_key(issue_id)
result = await session.execute(
text("SELECT pg_advisory_unlock(:key)"),
{"key": lock_key}
)
return result.scalar()
@asynccontextmanager
async def issue_lock(session: AsyncSession, issue_id: str):
"""Context manager for issue locks with auto-cleanup."""
acquired = await acquire_issue_lock(session, issue_id)
try:
yield acquired
finally:
if acquired:
await release_issue_lock(session, issue_id)
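The context manager is the safest entry point, since the lock is released even if the body raises. A usage sketch (`get_session` is the project's session factory, as used in the claim script below):

```python
# Usage sketch: hold the lock for the whole critical section
async with get_session() as session:
    async with issue_lock(session, "proj-abc1") as acquired:
        if not acquired:
            print("proj-abc1 already claimed by another agent")
        else:
            ...  # safe to update issue state here; lock auto-released on exit
```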
The Claim Script
Create a script that atomically claims issues:
# scripts/bd_claim.py
import asyncio
import subprocess
import sys
from db.advisory_locks import acquire_issue_lock
from db.client import get_session
async def claim_issue(issue_id: str) -> bool:
"""Atomically claim an issue via advisory lock + beads update."""
async with get_session() as session:
# Step 1: Try to acquire lock
if not await acquire_issue_lock(session, issue_id):
print(f"Issue {issue_id} already claimed by another agent")
return False
# Step 2: Update beads status (lock held, safe from races)
result = subprocess.run(
["bd", "update", issue_id, "--status=in_progress"],
capture_output=True, text=True
)
if result.returncode != 0:
print(f"Failed to update beads: {result.stderr}")
return False
print(f"Successfully claimed {issue_id}")
return True
if __name__ == "__main__":
issue_id = sys.argv[1]
success = asyncio.run(claim_issue(issue_id))
sys.exit(0 if success else 1)
Shell wrapper for convenience:
#!/bin/bash
# scripts/bd_claim.sh
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR/.."
exec uv run python scripts/bd_claim.py "$@"
Updated Claim Protocol
# Agent A # Agent B
bd_claim.sh proj-abc1 bd_claim.sh proj-abc1
# ✓ Lock acquired # ✗ Already claimed
# ✓ Issue claimed # Agent B picks different task
No more race conditions.
Layer 4: Git Worktrees - Filesystem Isolation
Advisory locks solve coordination (who works on what). But agents still share the same working directory. This causes:
- Build artifact conflicts: Agent A's `pytest` corrupts Agent B's `__pycache__`
- Virtual environment races: Simultaneous `uv sync` commands
- Uncommitted change conflicts: Agent A's WIP blocks Agent B's tests
The solution: git worktrees.
What Are Worktrees?
Git worktrees let you check out multiple branches simultaneously in different directories:
# Main repo
/project/ # main branch
# Worktrees (separate directories, shared .git)
/project-worktrees/
├── proj-abc1/ # work/proj-abc1 branch
├── proj-xyz2/ # work/proj-xyz2 branch
└── proj-def3/ # work/proj-def3 branch
Key insight: All worktrees share the same .git directory, saving disk space while providing complete filesystem isolation.
Worktree Management Script
Create a script that combines claiming with worktree creation:
#!/bin/bash
# scripts/agent_worktree.sh
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
WORKTREE_BASE="${WORKTREE_BASE:-$REPO_ROOT/../project-worktrees}"
create_worktree() {
local issue_id="$1"
local worktree_path="$WORKTREE_BASE/$issue_id"
local branch_name="work/$issue_id"
# Step 1: Claim issue via advisory lock (atomic)
echo "Claiming issue $issue_id..."
if ! "$SCRIPT_DIR/bd_claim.sh" "$issue_id"; then
echo "Failed to claim (already taken?)"
exit 1
fi
# Step 2: Create worktree directory
mkdir -p "$WORKTREE_BASE"
# Step 3: Create worktree on new branch
echo "Creating worktree at $worktree_path..."
cd "$REPO_ROOT"
git worktree add "$worktree_path" -b "$branch_name"
# Step 4: Copy environment files (not in git)
if [ -f "$REPO_ROOT/.env" ]; then
cp "$REPO_ROOT/.env" "$worktree_path/.env"
fi
# Step 5: Install dependencies
echo "Installing dependencies..."
cd "$worktree_path"
uv sync --quiet
echo "Worktree ready: $worktree_path"
echo "Branch: $branch_name"
}
remove_worktree() {
local issue_id="$1"
local worktree_path="$WORKTREE_BASE/$issue_id"
local branch_name="work/$issue_id"
echo "Removing worktree..."
cd "$REPO_ROOT"
git worktree remove "$worktree_path" --force 2>/dev/null || rm -rf "$worktree_path"
git worktree prune
# Delete branch if merged
git branch -d "$branch_name" 2>/dev/null || true
echo "Cleanup complete"
}
list_worktrees() {
echo "Active worktrees:"
git worktree list
if [ -d "$WORKTREE_BASE" ]; then
echo ""
echo "Worktree status:"
for dir in "$WORKTREE_BASE"/*/; do
if [ -d "$dir" ]; then
local id=$(basename "$dir")
local branch=$(cd "$dir" && git branch --show-current)
local uncommitted=$(cd "$dir" && git status --porcelain | wc -l)
echo " $id (branch: $branch, uncommitted: $uncommitted files)"
fi
done
fi
}
case "${1:-}" in
create) create_worktree "$2" ;;
remove) remove_worktree "$2" ;;
list) list_worktrees ;;
*) echo "Usage: $0 {create|remove|list} [issue-id]" ;;
esac
Worktree Workflow
# 1. Create isolated worktree (claims issue automatically)
scripts/agent_worktree.sh create proj-abc1
# 2. Work in isolated directory
cd ../project-worktrees/proj-abc1
# Make changes with complete isolation
uv run pytest # Won't affect other agents
uv sync # Safe, isolated .venv
git add . && git commit -m "Implement feature"
git push -u origin work/proj-abc1
# 3. Create PR, get review, merge to main
# 4. Cleanup
scripts/agent_worktree.sh remove proj-abc1
bd close proj-abc1 && bd sync
Worktree vs Advisory Locks
Both mechanisms solve different problems:
| Mechanism | Purpose | Scope |
|---|---|---|
| Advisory Locks | WHO works on WHAT | Coordination |
| Worktrees | HOW they work | Isolation |
They work together:
- Lock prevents duplicate claims (coordination)
- Worktree prevents file conflicts (isolation)
Disk Space Considerations
Each worktree duplicates working files but shares .git:
| Component | Main Repo | Per Worktree |
|---|---|---|
| `.git/` | ~200MB | 0 (shared) |
| Source code | ~50MB | ~50MB |
| `.venv/` | ~300MB | ~300MB |
| `node_modules/` | ~200MB | ~200MB |
| Total | ~750MB | ~550MB |
For 5 concurrent agents: ~750MB + 5×550MB = ~3.5GB total
Worth it for complete isolation.
Layer 5: Skillz - Domain-Specific Validation
Generic linting catches syntax errors. Skills catch domain errors.
Skillz is an MCP server that provides domain-specific validation skills.
What’s a Skill?
A skill is a specialized instruction set with tool restrictions:
# .ruler/skills/api-validator/SKILL.md
---
name: api-validator
description: Validate API endpoints follow conventions. Triggers on "api check", "endpoint review".
allowed-tools: Read, Grep, Glob
---
# API Validation Skill
## Checks
1. All endpoints return `{data, meta}` structure
2. Error responses include `error_code` and `message`
3. No breaking changes to existing endpoints
4. Authentication required on non-public routes
## Validation Process
1. Find all router files: `glob("**/routers/**/*.py")`
2. Check response models include required fields
3. Verify error handlers follow pattern
4. Report violations with file:line references
The `allowed-tools` restriction is crucial: a validation skill should only read files, never edit them.
Setting Up Skillz
Add to your MCP configuration (.mcp.json):
{
"skillz": {
"command": "uvx",
"args": ["skillz@latest", "/path/to/your/project/.ruler/skills"]
}
}
Ruler can generate this automatically:
# ruler.toml
[mcp.skillz]
command = "uvx"
args = ["skillz@latest", ".ruler/skills"]
Skill Categories
Create skills for different validation needs:
.ruler/skills/
├── api-validator/ # API contract compliance
├── security-check/ # OWASP patterns
├── architecture/ # Layer boundaries
├── test-quality/ # Test coverage, meaningful assertions
└── domain-rules/ # Business logic constraints
Example architecture skill:
# .ruler/skills/architecture/SKILL.md
---
name: architecture
description: Validate layer boundaries. Triggers on "architecture check", "layer validation".
allowed-tools: Read, Grep, Glob
---
# Architecture Validation
## Layer Rules
| Layer | Can Import From |
|-------------------|---------------------|
| 1. Core | Nothing |
| 2. Domain | Layer 1 |
| 3. Application | Layers 1, 2 |
| 4. Infrastructure | Layers 1, 2, 3 |
| 5. Presentation | All layers |
## Validation
1. Find all Python files
2. Extract imports
3. Check each import against layer rules
4. Report violations: "file.py:15 - Data layer cannot import from Presentation layer"
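For intuition, the import check the skill describes could look roughly like this in plain Python (a sketch; mapping files to layers via directory names is an assumption):

```python
import ast
from pathlib import Path

# Illustrative mapping from directory name to layer number
LAYER_OF = {"core": 1, "domain": 2, "application": 3,
            "infrastructure": 4, "presentation": 5}

def layer_of_path(path: Path) -> int | None:
    return next((LAYER_OF[p] for p in path.parts if p in LAYER_OF), None)

def violations(path: Path):
    src_layer = layer_of_path(path)
    if src_layer is None:
        return
    tree = ast.parse(path.read_text())
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            module = getattr(node, "module", None) or node.names[0].name
            dep_layer = LAYER_OF.get(module.split(".")[0])
            # A layer may only import from layers below it
            if dep_layer is not None and dep_layer > src_layer:
                yield (f"{path}:{node.lineno} - layer {src_layer} "
                       f"cannot import from layer {dep_layer}")

for py_file in Path(".").rglob("*.py"):
    for violation in violations(py_file):
        print(violation)
```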
Invoking Skills
In your agent instructions:
## When to Use Skills
- Before committing router changes: invoke `api-validator`
- Before any commit: invoke `architecture`
- When adding auth code: invoke `security-check`
Agents can invoke skills via MCP tool calls or slash commands (depending on the client).
Layer 6: Agent Learning - Context Passing & State Consistency
One of the biggest challenges in multi-agent workflows is context passing between agents and managing state consistency at scale.
The Agent Learning System solves this by transforming ephemeral task context into durable institutional knowledge that persists across agents and sessions.
The Learning Problem
When Agent A discovers that “Repository classes must use AsyncSession, not engine directly,” that insight is lost when the session ends. Agent B working on a similar task later has to rediscover this pattern, wasting time and potentially making inconsistent decisions.
Without learning:
- Each agent rediscovers the same patterns
- Inconsistent decisions across agents
- No shared understanding of architectural choices
- Context lost when sessions end
With learning:
- Agents inherit prior discoveries
- Consistent decisions across sessions
- Shared institutional knowledge
- Context persists across machines
Architecture
The system uses a two-tier storage architecture:
- PostgreSQL database (`project_dev`): fast queries, semantic search, staleness detection
- Git-synced JSONL (`.learnings/corpus.jsonl`): cross-machine sync, version control, offline access
┌─────────────────────┐ ┌─────────────────────┐
│ Agent A (Mac 1) │ │ Agent B (Mac 2) │
│ │ │ │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ Local PostgreSQL│ │ │ │ Local PostgreSQL│ │
│ │ (learnings DB) │ │ │ │ (learnings DB) │ │
│ └────────┬────────┘ │ │ └────────▲────────┘ │
│ │ export │ │ import │ │
│ ▼ │ │ │ │
│ ┌─────────────────┐ │ │ ┌─────────────────┐ │
│ │ .learnings/ │ │ │ │ .learnings/ │ │
│ │ corpus.jsonl │ │ │ │ corpus.jsonl │ │
│ └────────┬────────┘ │ │ └────────▲────────┘ │
└──────────┼──────────┘ └──────────┼──────────┘
│ │
│ git push git pull │
│ │
└───────────► GitHub/GitLab ◄──────────────┘
(.learnings/corpus.jsonl)
Storage Schema
Each learning captures:
| Field | Purpose |
|---|---|
| `insight` | The learning (1-3 sentences, specific) |
| `category` | Classification: architecture, domain, patterns, trade-offs, anti-patterns, edge-cases, tooling |
| `evidence` | Supporting evidence: `{"files": [...], "commits": [...], "docs": [...]}` |
| `confidence` | 0.0-1.0 (decays over time) |
| `relevance_scope` | Where it applies: global, layer-1 through layer-5, file-specific |
| `tags` | Semantic tags for retrieval |
| `embedding` | 384-dim vector for semantic search |
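Put together, a single learning in `.learnings/corpus.jsonl` might look like this (field values are illustrative, following the schema above):

```python
record = {
    "insight": "Repository classes must use AsyncSession, not engine directly",
    "category": "architecture",
    "evidence": {
        "files": ["backend/repositories/user_repository.py"],
        "commits": ["abc123"],
        "docs": [],
    },
    "confidence": 1.0,
    "relevance_scope": "layer-4",
    "tags": ["async-patterns", "database"],
    # "embedding": [...]  # 384 floats, omitted for brevity
}
```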
Automatic Learning Extraction
Learnings are automatically extracted from completed tasks via git hooks:
# Agent implements feature
git commit -m "Add feature X (proj-abc1)"
# Post-commit hook automatically:
# 1. Analyzes commit message, files changed, code diff
# 2. Uses LLM to extract key learnings
# 3. Stores learnings in database
# 4. Exports to .learnings/corpus.jsonl
The extraction prompt prioritizes implementation-specific insights:
- Patterns & decisions: How was the feature implemented?
- Research-backed choices: Thresholds, formulas based on research
- Architectural choices: Why reuse existing infrastructure?
- Edge cases: Floating-point precision, boundary conditions
- Trade-offs: What alternatives were considered?
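Returning to the hook itself, a minimal post-commit sketch might gather the raw material like this (the LLM call and persistence step are placeholders, not the system's actual code):

```python
import re
import subprocess

def last_commit() -> tuple[str, str]:
    """Grab the latest commit message and a diff summary."""
    message = subprocess.run(["git", "log", "-1", "--format=%B"],
                             capture_output=True, text=True).stdout
    diffstat = subprocess.run(["git", "diff", "HEAD~1", "--stat"],
                              capture_output=True, text=True).stdout
    return message, diffstat

message, diffstat = last_commit()
# Only extract when the commit references an issue, e.g. "(proj-abc1)"
match = re.search(r"\(([a-z]+-[a-z0-9]+)\)", message)
if match:
    issue_id = match.group(1)
    # Placeholder: feed message + diffstat to the extraction prompt,
    # store the results, then export to .learnings/corpus.jsonl
    print(f"extracting learnings for {issue_id}")
```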
Context-Aware Retrieval
When starting a new task, agents retrieve relevant learnings:
learnings = get_task_relevant_learnings(
task_description="Implement feature X",
files_being_modified=["backend/models/user.py"],
tags=["authentication", "security"],
max_tokens=2000 # Stay within LLM context limits
)
Hybrid retrieval combines three signals:
- File/tag matching (30%): Exact file paths and tag overlap
- Embedding similarity (30%): Semantic similarity using vector embeddings
- LLM semantic scoring (40%): LLM understands conceptual alignment
Results are merged, deduplicated, and ranked by combined relevance score.
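A sketch of that combination (the weights come from the list above; the per-signal scores are placeholders):

```python
def combined_score(file_tag: float, embedding: float, llm: float) -> float:
    """Each argument is a 0-1 relevance signal for one learning."""
    return 0.30 * file_tag + 0.30 * embedding + 0.40 * llm

def rank(candidates: list[dict]) -> list[dict]:
    """Merge, deduplicate by id, and rank by combined score."""
    seen, ranked = set(), []
    for c in candidates:
        if c["id"] in seen:  # same learning found by multiple signals
            continue
        seen.add(c["id"])
        c["score"] = combined_score(c["file_tag_score"],
                                    c["embedding_score"], c["llm_score"])
        ranked.append(c)
    return sorted(ranked, key=lambda c: c["score"], reverse=True)
```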
Token Budget Management
To address the context passing challenge, the system includes token budget management:
# Greedy packing algorithm stays within budget
learnings = get_task_relevant_learnings(
task_description="...",
max_tokens=2000
)
# Response includes budget metadata
# {
# "count": 5,
# "learnings": [...],
# "budget": {
# "total_tokens": 1847,
# "truncated_count": 2,
# "truncated": true,
# "available_budget": 1900
# }
# }
This prevents context overflow while maximizing relevant information.
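A greedy packer along these lines reproduces the metadata shape above (a sketch; real token counting would use a tokenizer rather than a character estimate):

```python
def pack_learnings(ranked: list[dict], max_tokens: int = 2000) -> dict:
    """Take highest-ranked learnings first until the budget is spent."""
    packed, used, truncated = [], 0, 0
    for learning in ranked:
        cost = len(learning["insight"]) // 4  # rough chars-per-token estimate
        if used + cost > max_tokens:
            truncated += 1
            continue
        packed.append(learning)
        used += cost
    return {
        "count": len(packed),
        "learnings": packed,
        "budget": {
            "total_tokens": used,
            "truncated_count": truncated,
            "truncated": truncated > 0,
        },
    }
```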
Staleness Detection
To maintain state consistency, learnings automatically decay when evidence files are modified:
# Pre-commit hook detects modified files
git commit -m "Refactor event repository"
# Output:
# ⚠️ Reduced confidence for 2 learnings due to modified evidence files
# Learnings automatically updated to prevent staleness
Confidence decay:
- 10% reduction every 30 days (time-based)
- 20% reduction when evidence files modified (event-based)
- Learnings with `confidence < 0.1` are filtered out during retrieval
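Expressed as code, the decay rules look like this (a sketch; the function names are illustrative):

```python
from datetime import datetime, timezone

def decayed_confidence(base: float, created_at: datetime,
                       evidence_modified: bool) -> float:
    """Apply the time-based and event-based decay rules."""
    age_periods = (datetime.now(timezone.utc) - created_at).days // 30
    conf = base * (0.9 ** age_periods)   # 10% off every 30 days
    if evidence_modified:
        conf *= 0.8                      # 20% off when evidence changes
    return conf

def is_retrievable(conf: float) -> bool:
    return conf >= 0.1                   # below the floor, ignored at retrieval
```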
Git-Based Sync
Learnings sync across machines via git, similar to Beads:
# Agent A: Store learning
store_learning(
insight="Repository classes must use AsyncSession, not engine directly",
category="architecture",
evidence_files=["backend/repositories/user_repository.py"],
...
)
# Commit triggers export
git commit -m "Add feature"
# → Exports to .learnings/corpus.jsonl
# Agent B: Pull and import
git pull
# → Post-merge hook imports learnings to local database
# Agent B can now query
get_task_relevant_learnings(...)
# → Sees learning from Agent A!
Benefits:
- Offline-capable: Work locally, sync when connected
- Version controlled: Git history shows learning evolution
- No infrastructure: Reuses git, no separate database server
- Conflict resolution: Git merge handles conflicts (timestamp wins)
Observability
To address the debugging story challenge, the system provides observability:
# Learning corpus health
stats = get_learning_statistics()
# {
# "total_count": 150,
# "active_count": 120,
# "invalidated_count": 30,
# "by_category": {"architecture": 45, "domain": 30, ...},
# "average_confidence": 0.87
# }
# Red flags:
# - Average confidence < 0.5 (too much low-quality data)
# - Invalidation rate > 50% (learnings becoming stale too fast)
# - One category dominates (imbalanced coverage)
Integration with Workflow
Before starting work:
# Retrieve relevant learnings as context
learnings = get_task_relevant_learnings(
task_description=issue.description,
files_being_modified=issue.files,
tags=issue.tags
)
# Agent reviews learnings alongside documentation
# Makes consistent decisions based on prior discoveries
After completing work:
# Automatic extraction from commit
# (handled by post-commit hook)
# Or manual storage for specific insights
store_learning(
insight="Algorithm stability can be negative after very poor performance",
category="edge-cases",
evidence_files=["backend/utils/calculation_tools.py"],
confidence=0.9,
tags=["algorithm", "edge-cases", "numerical-stability"]
)
Design Decisions
Why a separate database (`project_dev`)?
- Production data separation: AgentLearning is development tooling, not core platform
- Physical isolation: Production database never contains development metadata
- Clean backups: Production backups exclude development tooling data
- Performance: Development tooling queries don’t impact production database
Why PostgreSQL + JSONB?
- Structured schema enables filtering by category, confidence, tags
- JSONB evidence provides flexible file/commit/doc references
- Partial indexes optimize active high-confidence learnings
- Semantic search via pgvector embeddings
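As an illustration of that schema (a sketch; the table and index names are assumptions, not the actual model):

```python
from pgvector.sqlalchemy import Vector  # pip install pgvector
from sqlalchemy import Column, Float, Index, String, Text
from sqlalchemy.dialects.postgresql import ARRAY, JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Learning(Base):
    __tablename__ = "learnings"

    id = Column(String, primary_key=True)
    insight = Column(Text, nullable=False)
    category = Column(String, nullable=False)  # architecture, domain, ...
    evidence = Column(JSONB)                   # {"files": [...], "commits": [...]}
    confidence = Column(Float, default=1.0)
    tags = Column(ARRAY(String))
    embedding = Column(Vector(384))            # pgvector semantic search

    # Partial index: queries touch only active, high-confidence rows
    __table_args__ = (
        Index("ix_learnings_active", "category",
              postgresql_where=(confidence > 0.5)),
    )
```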
Why NOT SQLite-vector?
- Write locking = multi-agent conflicts
- No concurrent writes (we have multiple agents)
Why NOT ChromaDB?
- 10x storage overhead (10GB vs 1GB for same data)
- Overkill for <1000 learnings
Future Enhancements
- Learning validation skill: `/learning-check` validates quality, detects duplicates
- Cross-agent learning sync: Real-time sync via PostgreSQL (currently git-based)
- Tier 2 consolidation: LLM-summarized patterns when corpus exceeds 1000 learnings
- Learning diffs: `bd learning diff` to show changes between machines
Layer 7: Git Hooks
Hooks enforce the workflow automatically. No discipline required.
Beads Hooks
Beads installs its own git hooks:
bd hooks install
This creates:
- pre-commit: Flushes pending issue changes
- pre-push: Validates issues are synced
- post-merge: Auto-syncs after pull
Pre-commit Framework
Combine with pre-commit for code quality:
# .pre-commit-config.yaml
repos:
# Code formatting
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.4.0
hooks:
- id: ruff-check
args: [--fix]
- id: ruff-format
# Beads sync
- repo: local
hooks:
- id: bd-sync
name: bd sync (flush)
entry: bash -c 'bd sync --flush-only || true'
language: system
always_run: true
pass_filenames: false
# Type checking
- repo: local
hooks:
- id: mypy
name: mypy
entry: mypy
language: system
types: [python]
# Ruler apply (regenerate agent configs if rules changed)
- repo: local
hooks:
- id: ruler-apply
name: ruler apply
entry: npx @intellectronica/ruler apply
language: system
files: ^\.ruler/
Custom Pre-push Validation
#!/bin/bash
# .git/hooks/pre-push (or via bd hooks)
# 1. No uncommitted changes
if ! git diff --quiet; then
echo "Error: Uncommitted changes"
exit 1
fi
# 2. Issues synced
bd sync --status || {
echo "Error: Beads not synced. Run 'bd sync'"
exit 1
}
# 3. Tests pass with coverage
pytest --cov=. --cov-fail-under=80 || exit 1
# 4. Warn about in-progress issues
in_progress=$(bd list --status=in_progress --count)
if [ "$in_progress" -gt 0 ]; then
echo "Warning: You have $in_progress in-progress issues"
bd list --status=in_progress
read -p "Continue push? [y/N] " confirm < /dev/tty  # prompt from terminal; hook stdin carries ref data
[ "$confirm" = "y" ] || exit 1
fi
Layer 8: Session Protocols
The most overlooked piece: how agents start and end work.
Session Start Protocol
Add this to your AGENTS.md:
## Multi-Agent Coordination
**Session Start Protocol** (MANDATORY):
Option A - Shared directory (simple):
1. Get latest state: `bd sync`
2. Find available work: `bd ready`
3. Claim atomically: `scripts/bd_claim.sh <id>`
4. Push claim: `bd sync`
5. Now read and plan: `bd show <id>`
Option B - Isolated worktree (recommended for parallel work):
1. Get latest state: `bd sync`
2. Find available work: `bd ready`
3. Create worktree: `scripts/agent_worktree.sh create <id>`
(This claims + creates isolated directory)
4. Work in: `cd ../project-worktrees/<id>`
**Stale Work Detection**:
Check for potentially abandoned work:
- `bd list --status=in_progress`
- If resuming abandoned work: `bd comment <id> "Resuming from previous session"`
Session Close Protocol
## Session Close Protocol
**CRITICAL**: Before ending any session, complete this checklist:
For shared directory:
1. `git status` - Check what changed
2. `git add <files>` - Stage code changes
3. `bd sync` - Sync beads changes
4. `git commit -m "..."` - Commit code
5. `bd sync` - Sync any new beads changes
6. `git push` - Push to remote
For worktree:
1. `git add . && git commit` - Commit in worktree
2. `git push -u origin work/<id>` - Push branch
3. Create PR if ready
4. `scripts/agent_worktree.sh remove <id>` - Cleanup worktree
5. `bd close <id> && bd sync` - Close issue
**Work is not done until pushed.**
Putting It All Together
Here’s the complete workflow with worktrees:
Agent A (Machine 1) Agent B (Machine 2)
─────────────────── ───────────────────
bd sync bd sync
bd ready bd ready
→ proj-001: Add auth → proj-001: Add auth
→ proj-002: Fix bug → proj-002: Fix bug
agent_worktree.sh create proj-001 (Agent A claims first)
→ Lock acquired
→ Worktree: ../worktrees/proj-001 bd sync (sees claim)
bd ready
→ proj-002: Fix bug
cd ../worktrees/proj-001 agent_worktree.sh create proj-002
→ Lock acquired
→ Worktree: ../worktrees/proj-002
# Complete isolation cd ../worktrees/proj-002
uv run pytest # Agent A's tests
uv sync # Agent A's venv uv run pytest # Agent B's tests
uv sync # Agent B's venv
# Retrieve relevant learnings # Retrieve relevant learnings
get_task_relevant_learnings(...) get_task_relevant_learnings(...)
# → Sees prior architectural # → Sees prior patterns
# patterns discovered # from Agent A
git commit && git push git commit && git push
# work/proj-001 branch # work/proj-002 branch
# → Post-commit hook extracts # → Post-commit hook extracts
# learnings automatically # learnings automatically
agent_worktree.sh remove proj-001 agent_worktree.sh remove proj-002
bd close proj-001 && bd sync bd close proj-002 && bd sync
# → Learnings exported to # → Learnings exported to
# .learnings/corpus.jsonl # .learnings/corpus.jsonl
Key properties:
- No coordinator needed: Git is the source of truth
- Atomic claims: PostgreSQL advisory locks prevent races
- Complete isolation: Each agent has its own directory
- Context persistence: Agent learning system captures and shares insights
- Offline-capable: Work locally, sync when connected
- Self-healing: Session-level locks auto-release on crash
Complete Setup Checklist
1. Install Tools
# Beads
brew install steveyegge/tap/bd
# Ruler (via npx, no install needed)
npx @intellectronica/ruler --help
# Pre-commit
pip install pre-commit
2. Initialize Project
cd your-project
# Initialize beads
bd init
# Create ruler structure
mkdir -p .ruler/skills
touch .ruler/ruler.toml
touch .ruler/AGENTS.md
# Create scripts directory
mkdir -p scripts
3. Configure Ruler
# .ruler/ruler.toml
[agents.claude]
output = "CLAUDE.md"
mcp_output = ".mcp.json"
[agents.cursor]
output = ".cursor/rules/AGENTS.md"
[mcp.skillz]
command = "uvx"
args = ["skillz@latest", ".ruler/skills"]
4. Write Your Rules
# .ruler/AGENTS.md
## Critical Constraints
1. Never decrease test coverage
2. TODOs must reference issues
3. No secrets in code
## Multi-Agent Coordination
**Session Start**:
- Simple: `bd sync` → `bd ready` → `bd_claim.sh <id>` → `bd sync` → plan
- Isolated: `bd sync` → `bd ready` → `agent_worktree.sh create <id>` → work
**Session Close**:
- Simple: `git status` → `git add` → `bd sync` → `git commit` → `bd sync` → `git push`
- Isolated: commit → push branch → remove worktree → `bd close <id>` → `bd sync`
## Before Every Commit
pytest && pylint .
5. Create Scripts
# Create advisory lock module
cat > db/advisory_locks.py << 'EOF'
# ... (see implementation above)
EOF
# Create claim script
cat > scripts/bd_claim.py << 'EOF'
# ... (see implementation above)
EOF
# Create worktree script
cat > scripts/agent_worktree.sh << 'EOF'
# ... (see implementation above)
EOF
chmod +x scripts/*.sh
6. Set Up Agent Learning (Optional but Recommended)
# Create separate database for agent learnings
createdb project_dev
# Enable pgvector extension (for semantic search)
psql project_dev -c "CREATE EXTENSION IF NOT EXISTS vector;"
# Set environment variable
export AGENT_LEARNING_DB_URL=postgresql+asyncpg://localhost:5432/project_dev
# Create .learnings directory (git-tracked)
mkdir -p .learnings
# Note: do NOT add .learnings/ to .gitignore;
# it must be committed to git for cross-machine sync
Git hooks (already configured if using pre-commit):
- Pre-commit: Exports learnings to `.learnings/corpus.jsonl`
- Post-commit: Extracts learnings from the commit (if an issue ID is present)
- Post-merge: Imports learnings from `.learnings/corpus.jsonl`
Usage:
# Before starting work: Retrieve relevant learnings
learnings = get_task_relevant_learnings(
task_description="Implement feature X",
files_being_modified=["backend/models/user.py"],
tags=["authentication", "security"]
)
# After completing work: Automatic extraction via git hook
# Or manual storage:
store_learning(
insight="Repository classes must use AsyncSession, not engine directly",
category="architecture",
evidence_files=["backend/repositories/user_repository.py"],
session_id="full-cycle-2026-01-10-123",
confidence=1.0,
tags=["async-patterns", "database"]
)
7. Generate and Commit
# Generate agent configs
npx @intellectronica/ruler apply
# Install hooks
bd hooks install
pre-commit install
# Commit everything
git add .
git commit -m "Add multi-agent workflow infrastructure"
git push
Measuring Success
How do you know if your multi-agent workflow is working?
| Metric | Healthy | Warning |
|---|---|---|
| Race conditions per week | 0 | >2 |
| Stale in_progress issues | 0 | >1 |
| Merge conflicts on issues.jsonl | <1/week | >3/week |
| Build artifact conflicts | 0 | >1/week |
| Agent idle time | <5% | >20% |
| Rule violations caught by hooks | Decreasing | Increasing |
| Learning corpus size | Growing | Stagnant |
| Average learning confidence | >0.7 | <0.5 |
| Learning invalidation rate | <30% | >50% |
Use bd stats for project health:
bd stats
# Open: 12 In Progress: 2 Closed: 45 Blocked: 3
Check worktree status:
scripts/agent_worktree.sh list
# Active worktrees:
# proj-abc1 (branch: work/proj-abc1, uncommitted: 0 files)
# proj-xyz2 (branch: work/proj-xyz2, uncommitted: 3 files)
Check learning corpus health:
from dev_tools.agent_learning.tools import get_learning_statistics
stats = get_learning_statistics()
# {
# "total_count": 150,
# "active_count": 120,
# "invalidated_count": 30,
# "by_category": {"architecture": 45, "domain": 30, ...},
# "average_confidence": 0.87
# }
Common Pitfalls
1. Reading Before Claiming
# WRONG
bd show proj-001 # Read first
# ... 5 minutes planning ...
bd update proj-001 --status=in_progress # Too late! Agent B claimed it
# RIGHT (with advisory locks)
scripts/bd_claim.sh proj-001 # Atomic claim
bd show proj-001 # Now read safely
2. Forgetting to Push
# Agent A
git commit -m "Done with proj-001"
bd close proj-001
# Forgets to push, closes laptop
# Agent B (next day)
bd sync # Sees proj-001 still in_progress locally
# Confusion ensues
Solution: Pre-push hooks warn about uncommitted work.
3. Not Using Worktrees for Parallel Work
# WRONG: Two agents in same directory
# Agent A Agent B
uv sync uv sync
# Race condition on .venv!
# RIGHT: Isolated worktrees
agent_worktree.sh create proj-001 agent_worktree.sh create proj-002
cd ../worktrees/proj-001 cd ../worktrees/proj-002
uv sync uv sync
# Each has its own .venv - no conflicts
4. Not Using Dependencies
# WRONG: Create independent issues
bd create --title="Design API"
bd create --title="Implement API"
bd create --title="Test API"
# Agent might start "Test API" before "Implement API" is done
# RIGHT: Create with dependencies
bd create --title="Design API" # → proj-001
bd create --title="Implement API" # → proj-002
bd create --title="Test API" # → proj-003
bd dep add proj-002 proj-001 # Implement depends on Design
bd dep add proj-003 proj-002 # Test depends on Implement
# Now `bd ready` only shows unblocked work
Conclusion
Multi-agent development isn’t about having more agents; it’s about coordination infrastructure. As one Reddit commenter noted:
“multi-agent workflows are becoming pretty essential for serious dev work tbh. the tradeoffs between agent specialization and orchestration complexity are real though - most people swing too hard one way or the other.”
The solution is a layered architecture that addresses each coordination challenge:
- Ruler: Single source of truth for agent rules
- Beads: Git-native issue tracking with dependencies
- Advisory Locks: Atomic claim operations (no races)
- Worktrees: Complete filesystem isolation
- Skillz: Domain-specific validation skills
- Agent Learning: Context passing & state consistency
- Git Hooks: Automatic enforcement
- Protocols: Explicit session start/end procedures
Key Insights
Coordination vs. Isolation: These are different problems requiring different solutions.
- Advisory locks solve coordination (who works on what)
- Worktrees solve isolation (how they work without conflicts)
- Agent learning solves context passing (what agents know)
Context Passing & State Consistency: The agent learning system addresses the critical challenge:
“what usually trips teams up: context passing between agents and managing state consistency at scale.”
By transforming ephemeral task context into durable institutional knowledge, agents inherit prior discoveries and make consistent decisions across sessions and machines.
Observability: As another commenter noted:
“also the debugging story is rough. single agent is easier to reason about. multiple agents means you need solid observability and tracing to figure out where something went sideways.”
The system provides observability through:
- Learning corpus health metrics
- Staleness detection (automatic confidence decay)
- Git history for learning evolution
- Structured evidence linking learnings to files/commits/docs
Getting Started
The tools are open source and composable. Start simple, add layers as needed:
- Start: Beads for issue tracking
- Add: Ruler when you have multiple agents
- Layer in: Advisory locks when you hit race conditions
- Add: Worktrees when build artifacts conflict
- Enable: Agent learning when context passing becomes a bottleneck
The result: agents that work together without stepping on each other, with shared institutional knowledge and consistent decision-making.
Resources:
- Beads - Git-native issue tracking
- Ruler - Multi-agent configuration
- Skillz - MCP skills server
- Pre-commit - Git hooks framework
- PostgreSQL Advisory Locks - Lock documentation
- Git Worktree - Worktree documentation
- Agent Learning System - Context passing & state consistency