# Workplace Disputes
Gather email trails, meeting notes, and diary entries to build a chronological evidence pack for HR or legal review.
## The Problem
Workplace disputes often involve scattered evidence across email threads, meeting notes, and personal records spanning months or years. Manually assembling a timeline is time-consuming and risks missing critical communications.
## Workflow
### 1. Configure Sources
Set up `config.yaml` to point at the relevant source directories:
```yaml
deduplication:
  source_paths:
    - "~/exported-emails/work"

ingestion:
  sources:
    - type: diary
      path: "~/diary/work-journal"
      entry_separator: "## "
    - type: meeting_note
      path: "~/meeting-notes/team"
    - type: document
      path: "~/documents/hr-correspondence"
```

### 2. Ingest Everything
```bash
uv run python dedup.py
uv run python ingest.py --reset
```

### 3. Multi-Search Strategy
Run several searches with different phrasings to maximise recall:
```bash
mkdir -p evidence
uv run python query.py \
  --semantic "performance review concerns feedback" \
  --top-k 200 --export-json evidence/reviews.json
uv run python query.py \
  --semantic "meeting conduct behaviour complaint" \
  --top-k 200 --export-json evidence/conduct.json
uv run python query.py \
  --semantic "workload pressure stress unreasonable" \
  --top-k 200 --export-json evidence/workload.json
```

### 4. Merge and Deduplicate
```bash
uv run python merge.py evidence/*.json --output evidence/merged.json
```

### 5. Filter by Specific People
Narrow down to communications involving specific individuals:
```bash
uv run python query.py \
  --sender "[email protected],[email protected]" \
  --date-range 2024-01-01 2025-01-01 \
  --export-json evidence/key-people.json
```

### 6. Triage Separately (recommended)
Run triage as a standalone step so results are saved to disk. This lets you re-run deep analysis at different relevance thresholds without re-triaging.
```bash
# Recommended: gemini-flash for triage (cheapest, fastest)
uv run python analyze.py evidence/merged.json \
  --triage \
  --model gemini-flash \
  --truncate 500 \
  --concurrency 5 \
  --context "Identify incidents, dates, and communications relevant to a workplace grievance" \
  --output evidence/triaged.json \
  --dry-run
```
```bash
# Free, private, slow (local Mistral 7B):
uv run python analyze.py evidence/merged.json \
  --triage \
  --local \
  --context "Identify incidents, dates, and communications relevant to a workplace grievance" \
  --output evidence/triaged.json
```

- Use `gemini-flash` for triage: it is the cheapest and fastest API model.
- `--truncate 500` caps document bodies at 500 characters (enough for relevance scoring, much faster).
- `--concurrency 5` sends five batches in parallel.
- Checkpoints save after every wave; if interrupted, re-run the same command to resume.
- Use `--retry-failed` to re-triage only the failed batches from a previous run.
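The batching, concurrency, and checkpoint behaviour described above can be sketched as follows. This is a conceptual illustration, not the tool's actual internals: `score_batch` stands in for the model call, and every name here is an assumption:

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def score_batch(batch: list[dict]) -> list[int]:
    # Stand-in for the LLM relevance call: truncate each body to 500 chars
    # and return a 0-10 score per document.
    return [min(10, len(doc.get("body", "")[:500]) // 50) for doc in batch]

def triage(docs: list[dict], checkpoint: Path, batch_size: int = 10,
           concurrency: int = 5) -> dict[str, list[int]]:
    """Score documents in parallel batches, checkpointing after every wave."""
    done: dict[str, list[int]] = (
        json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    )
    batches = [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for wave_start in range(0, len(batches), concurrency):
            # One wave = up to `concurrency` batches, skipping finished ones.
            wave = {str(i): batches[i]
                    for i in range(wave_start, min(wave_start + concurrency, len(batches)))
                    if str(i) not in done}
            for idx, scores in zip(wave, pool.map(score_batch, wave.values())):
                done[idx] = scores
            checkpoint.write_text(json.dumps(done))  # resume point
    return done
```

Because finished batch indices are skipped on the next run, an interrupted job resumes where it left off rather than re-spending API calls.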
### 7. Deep Analysis (on triaged results)
Use `--deep-only` to skip triage and run deep analysis on already-triaged data.
Start with `--min-relevance 5` to cast a wide net, then tighten if needed:
```bash
uv run python analyze.py evidence/triaged.json \
  --deep-only \
  --min-relevance 5 \
  --context "Identify incidents, dates, and communications relevant to a workplace grievance" \
  --model deepseek \
  --dry-run
```

Review the cost estimate, then run without `--dry-run` to proceed.
If the output is too noisy, re-run at `--min-relevance 7` (no re-triage needed).
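Tightening the threshold without re-triaging works because the scores are already saved to disk; conceptually it is just a filter over the triage file. A minimal sketch, where the `relevance` field name is an assumption about the saved schema:

```python
import json
from pathlib import Path

def filter_by_relevance(triaged_path: str, min_relevance: int) -> list[dict]:
    """Keep only records scored at or above the threshold."""
    records = json.loads(Path(triaged_path).read_text())
    return [r for r in records if r.get("relevance", 0) >= min_relevance]
```

Raising the threshold from 5 to 7 simply shrinks this list; no model calls are involved.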
### 8. Export for Review
```bash
uv run python export.py analysis_output.md --clipboard
```

## Tips
- Always triage separately with `--triage --output` so results are saved to disk
- Start with `--min-relevance 5` to avoid missing borderline evidence
- Use `--date-range` to focus on the relevant period
- The pseudonymisation layer automatically protects names when using cloud models
- For maximum privacy, use `--local` to keep everything on your machine
- Use `gemini-flash` for triage (cheapest, fastest) and `deepseek` for deep analysis (best reasoning)
- Use `--truncate 500` and `--concurrency 5` for fast triage
- Use `--retry-failed` to re-triage failed batches without re-running everything
- Export to CSV with `--export results.csv` for spreadsheet review
- Triage checkpoints save progress; re-run to resume if interrupted
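If the built-in CSV export doesn't fit your review workflow, the JSON exports flatten to a spreadsheet-friendly CSV in a few lines. A sketch, assuming each export is a JSON list of flat records; column names are taken from whatever fields the records contain:

```python
import csv
import json
from pathlib import Path

def json_to_csv(json_path: str, csv_path: str) -> None:
    """Flatten a JSON export into a CSV with one row per record."""
    records = json.loads(Path(json_path).read_text())
    # Union of all field names, so records with differing keys still fit.
    fields = sorted({key for record in records for key in record})
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)  # missing fields are left blank
```

This gives reviewers a familiar spreadsheet view while the JSON remains the source of truth.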