Internal Investigations
Cross-reference communications between multiple parties to establish who knew what and when.
The Problem
Internal investigations require tracing information flow across multiple people and communication channels. You need to build a timeline showing when key decisions were made, who was informed, and what was discussed.
Workflow
1. Ingest All Relevant Sources
uv run python dedup.py
uv run python ingest.py --reset
2. Map the Communication Landscape
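Conceptually, a communication landscape is message volume broken down by sender and period. The sketch below illustrates that idea. It is hypothetical and is not explore.py's implementation. It assumes each message can be reduced to a (sender, date) pair, and its year argument only loosely mirrors the --year flag.

```python
from collections import Counter
from datetime import date
from typing import List, Optional, Tuple


def landscape(messages: List[Tuple[str, date]], year: Optional[int] = None) -> Counter:
    """Count messages per (sender, "YYYY-MM") bucket, optionally for one year only."""
    counts: Counter = Counter()
    for sender, sent in messages:
        if year is not None and sent.year != year:
            continue  # loosely mirrors a --year style filter
        counts[(sender, f"{sent.year}-{sent.month:02d}")] += 1
    return counts


# Made-up messages, already reduced to (sender, date) pairs.
messages = [
    ("alice", date(2025, 1, 14)),
    ("alice", date(2025, 1, 20)),
    ("bob", date(2025, 2, 3)),
    ("bob", date(2024, 12, 30)),
]
for (sender, month), n in sorted(landscape(messages, year=2025).items()):
    print(sender, month, n)
```

Sorting the buckets shows at a glance who was active in which month. At corpus scale, the explore.py commands answer the same question.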
Use explore.py to understand who communicated and when:
# Full corpus overview
uv run python explore.py
# Focus on specific year
uv run python explore.py --year 2025
# Check specific individuals
uv run python explore.py --sender [email protected]
uv run python explore.py --sender [email protected]
3. Track Specific Communications
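Tracking communications between specific people amounts to filtering on sender and date. The following is a minimal, hypothetical sketch of that filter and is not query.py's code. The record fields (sender, date, subject) are assumptions. So is treating both ends of the date range as inclusive.

```python
from datetime import date


def between(messages, senders, start, end):
    """Keep messages sent by any address in the comma-separated `senders`
    string, dated from `start` to `end` inclusive (an assumption)."""
    wanted = {s.strip().lower() for s in senders.split(",")}
    return [
        m for m in messages
        if m["sender"].lower() in wanted and start <= m["date"] <= end
    ]


# Hypothetical records; example.com addresses are placeholders.
messages = [
    {"sender": "alice@example.com", "date": date(2025, 3, 1), "subject": "Q1 plan"},
    {"sender": "bob@example.com", "date": date(2025, 6, 30), "subject": "Sign-off"},
    {"sender": "carol@example.com", "date": date(2025, 4, 2), "subject": "FYI"},
    {"sender": "alice@example.com", "date": date(2025, 7, 1), "subject": "Follow-up"},
]
hits = between(messages, "alice@example.com,bob@example.com",
               date(2025, 1, 1), date(2025, 6, 30))
print([m["subject"] for m in hits])  # ['Q1 plan', 'Sign-off']
```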
Search for communications between key individuals:
mkdir -p evidence
uv run python query.py \
--sender "[email protected],[email protected]" \
--date-range 2025-01-01 2025-06-30 \
--export-json evidence/alice-bob.json
4. Search for Key Topics
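A semantic query typically ranks messages by how close their embedding vectors are to the query's embedding, then keeps the best k. The sketch below shows that ranking using cosine similarity over made-up three-dimensional vectors. It assumes the embeddings have already been computed. It says nothing about which model or index query.py actually uses.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, docs, k):
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Invented embeddings; real ones would come from an embedding model.
docs = {
    "msg-1": [0.9, 0.1, 0.0],  # e.g. "Board approved the budget"
    "msg-2": [0.1, 0.9, 0.2],  # e.g. "Lunch order for Friday"
    "msg-3": [0.7, 0.3, 0.1],  # e.g. "Please sign off on the contract"
}
query = [1.0, 0.0, 0.0]  # stands in for "approval authorisation sign-off decision"
print(top_k(query, docs, k=2))
```

A larger k catches more borderline matches but leaves more items to triage later. That trade-off is why the queries in this step use --top-k 200.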
uv run python query.py \
--semantic "approval authorisation sign-off decision" \
--date-range 2025-01-01 2025-06-30 \
--top-k 200 --export-json evidence/approvals.json
uv run python query.py \
--semantic "risk warning concern escalation" \
--date-range 2025-01-01 2025-06-30 \
--top-k 200 --export-json evidence/warnings.json
uv run python query.py \
--semantic "meeting minutes discussion agreed actions" \
--source-type meeting_note \
--top-k 200 --export-json evidence/meetings.json
5. Merge Results
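Merging combines the exported result sets and removes items that more than one query found. The sketch below is hypothetical, and merge.py's real logic and file format may differ. It assumes each export is a JSON list of objects with an id and a score. It keeps the highest-scoring copy of each item and records which exports matched it in a matched_in field (an invented name).

```python
import json
import tempfile
from pathlib import Path


def merge_exports(paths):
    """Merge result lists, keeping one record per id (the highest-scoring copy)
    and noting every export file that matched it."""
    merged = {}
    for path in paths:
        for rec in json.loads(Path(path).read_text()):
            best = merged.get(rec["id"])
            if best is None or rec.get("score", 0) > best.get("score", 0):
                sources = best["matched_in"] if best else []
                merged[rec["id"]] = {**rec, "matched_in": sources}
            merged[rec["id"]]["matched_in"].append(Path(path).name)
    return list(merged.values())


with tempfile.TemporaryDirectory() as tmp:
    approvals = Path(tmp, "approvals.json")
    warnings = Path(tmp, "warnings.json")
    approvals.write_text(json.dumps([{"id": "m1", "score": 0.8}, {"id": "m2", "score": 0.5}]))
    warnings.write_text(json.dumps([{"id": "m1", "score": 0.9}]))
    merged = merge_exports([approvals, warnings])
print(merged)
```

A shell glob such as evidence/*.json also matches evidence/merged.json and evidence/triaged.json once they exist. If you re-run this step, you may prefer to list the four export files explicitly.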
uv run python merge.py evidence/*.json --output evidence/merged.json
6. Triage Separately (recommended)
Run triage as a standalone step so results are saved to disk:
# Recommended: gemini-flash for triage (cheapest, fastest)
uv run python analyze.py evidence/merged.json \
--triage \
--model gemini-flash \
--truncate 500 \
--concurrency 5 \
--context "Key decisions, who was involved, what information was available" \
--output evidence/triaged.json \
--dry-run
# Free, private, slow (local Mistral 7B):
uv run python analyze.py evidence/merged.json \
--triage \
--local \
--context "Key decisions, who was involved, what information was available" \
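--output evidence/triaged.json
Use gemini-flash for triage (cheapest, fastest), with --truncate 500 and --concurrency 5 for speed. Review the --dry-run cost estimate, then run the command again without --dry-run so that evidence/triaged.json is actually written.
Checkpoints are saved after every wave, so an interrupted run can be resumed by re-running the same command.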
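The wave-and-checkpoint pattern can be sketched as follows. This is a hypothetical illustration and not analyze.py's implementation. The names triage and score are invented, and score() stands in for the model call. A wave is assumed to be one batch of up to concurrency requests. The checkpoint is a plain JSON file keyed by item id. Re-running skips anything already in the checkpoint, which is what makes resuming cheap.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

calls = []  # records which items were actually sent to the (fake) model


def score(item, truncate):
    """Stand-in for a model call: a real triage step would send the
    truncated text to an LLM and parse a relevance score."""
    calls.append(item["id"])
    text = item["text"][:truncate]
    return {"id": item["id"], "relevance": 8 if "approve" in text else 2}


def triage(items, checkpoint, concurrency=5, truncate=500):
    """Score items in waves of `concurrency`, saving a checkpoint after each wave."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    pending = [it for it in items if it["id"] not in done]
    for start in range(0, len(pending), concurrency):
        wave = pending[start:start + concurrency]
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            for result in pool.map(lambda it: score(it, truncate), wave):
                done[result["id"]] = result
        checkpoint.write_text(json.dumps(done))  # persist progress after every wave
    return done


items = [{"id": f"m{i}", "text": "please approve" if i % 3 == 0 else "lunch"}
         for i in range(7)]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp, "triage-checkpoint.json")
    triage(items[:3], ckpt)        # simulate a run that stopped after 3 items
    results = triage(items, ckpt)  # re-run: only the 4 remaining items are scored
print(len(results), len(calls))   # 7 7
```

Without the checkpoint, the second run would have scored all seven items again, for ten model calls instead of seven.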
7. Deep Analysis (on triaged results)
Use --deep-only to skip triage and run deep analysis on already-triaged data.
Start with --min-relevance 5 to avoid missing borderline evidence:
uv run python analyze.py evidence/triaged.json \
--deep-only \
--min-relevance 5 \
--context "Build a chronological timeline of key decisions, who was involved, and what information was available at each decision point" \
--model deepseek \
--dry-run
Review the cost estimate, then run the command again without --dry-run. If the output is noisy, tighten to --min-relevance 7. This reuses evidence/triaged.json, so no re-triage is needed.
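Assuming --min-relevance filters on scores that triage has already saved, changing the threshold is just a re-filter. The following is a minimal, hypothetical sketch. It assumes each triaged record carries a numeric relevance field (an invented name, with made-up scores) and that the threshold is inclusive.

```python
# Made-up triage output; the field names are assumptions.
triaged = [
    {"id": "m1", "relevance": 9},
    {"id": "m2", "relevance": 5},
    {"id": "m3", "relevance": 6},
    {"id": "m4", "relevance": 2},
]


def select(records, min_relevance):
    """Keep ids whose saved triage score is at least min_relevance."""
    return [r["id"] for r in records if r["relevance"] >= min_relevance]


print(select(triaged, 5))  # ['m1', 'm2', 'm3']
print(select(triaged, 7))  # ['m1']
```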
8. Export Timeline
uv run python export.py analysis_output.md --output investigation-timeline.md
Tips
- Always triage separately with --triage --output so results are saved.
- Start with --min-relevance 5 to avoid missing borderline evidence.
- Use --date-range to narrow the investigation window.
- Combine --sender filters with --semantic to find specific topics discussed by specific people.
- Meeting notes often contain the clearest record of decisions; filter them with --source-type meeting_note.
- For maximum privacy during the investigation, use --local to keep all analysis on-machine.
- Use gemini-flash for triage (cheapest, fastest) and deepseek for deep analysis (best reasoning).
- Use --truncate 500 and --concurrency 5 for fast triage.
- Use --retry-failed to re-triage failed batches without re-running everything.
- Use --read N to inspect individual results in full before exporting.
- Triage checkpoints save progress; re-run the same command to resume if interrupted.
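At its core, the question of who knew what and when reduces to finding each person's earliest exposure to a topic. The sketch below is a hypothetical illustration and not export.py's behaviour. It assumes messages have already been tagged with a topic (for example during triage). It also treats receiving a message as knowing its contents, which an investigator would still need to verify.

```python
from datetime import date

# Hypothetical, already-tagged messages.
messages = [
    {"date": date(2025, 2, 10), "sender": "alice", "to": ["bob"], "topic": "risk"},
    {"date": date(2025, 1, 15), "sender": "carol", "to": ["alice"], "topic": "risk"},
    {"date": date(2025, 3, 2), "sender": "bob", "to": ["dave", "alice"], "topic": "risk"},
]


def first_exposure(messages, topic):
    """Map each person to the earliest date they sent or received a message on `topic`."""
    seen = {}
    for m in sorted(messages, key=lambda m: m["date"]):
        if m["topic"] != topic:
            continue
        for person in [m["sender"], *m["to"]]:
            seen.setdefault(person, m["date"])  # keep only the first date
    return seen


# Print a chronological "who knew when" timeline for one topic.
for person, when in sorted(first_exposure(messages, "risk").items(), key=lambda kv: kv[1]):
    print(f"{when.isoformat()}  {person}")
```

Sorting by date before scanning matters: an earlier message found later in the list would otherwise be missed by setdefault.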