CLI Commands

All Foxhound scripts are run with `uv run python <script>.py`.

dedup.py — Email Deduplication

Strips quoted text from email reply chains and groups messages into threads.

uv run python dedup.py

Thread detection uses Message-ID, In-Reply-To, and References headers, with Thread-Topic fallback for Outlook.
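
The header-based grouping described above can be sketched as follows. This is a minimal illustration, not the logic in dedup.py: the function names `thread_key` and `group_threads` are hypothetical, and the key order (References root, then In-Reply-To, then Thread-Topic, then the message's own Message-ID) is an assumption. A fuller version would also link message IDs transitively.

```python
from collections import defaultdict
from email import message_from_string


def thread_key(msg):
    """Return a key shared by messages in the same thread (assumed scheme)."""
    # References lists ancestors oldest-first, so its first entry is the root.
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    # A reply without References still names its direct parent.
    parent = (msg.get("In-Reply-To") or "").strip()
    if parent:
        return parent
    # Outlook fallback: group by Thread-Topic, case-insensitively.
    topic = (msg.get("Thread-Topic") or "").strip()
    if topic:
        return "topic:" + topic.lower()
    # No threading headers at all: the message starts its own thread.
    return (msg.get("Message-ID") or "").strip()


def group_threads(raw_messages):
    """Map each thread key to the Message-IDs that belong to it."""
    threads = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        threads[thread_key(msg)].append(msg.get("Message-ID"))
    return dict(threads)
```

Keying on the first References entry puts a root message and all of its replies in the same group, because the root falls through to its own Message-ID.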

ingest.py — Embed Sources

Embeds all configured sources into ChromaDB using all-MiniLM-L6-v2.

uv run python ingest.py          # Incremental ingest
uv run python ingest.py --reset  # Re-ingest from scratch

Flag	Description
`--reset`	Clear ChromaDB and re-ingest all sources

explore.py — Corpus Exploration

Free corpus statistics: sender breakdowns, date ranges, and interactive exploration.

uv run python explore.py
uv run python explore.py --year 2025
uv run python explore.py --sender alice@example.com
uv run python explore.py --source-type meeting_note
uv run python explore.py -i

Flag	Description
`--year YYYY`	Filter stats to a specific year
`--sender EMAIL`	Filter stats to a specific sender
`--source-type TYPE`	Filter by source type (`email`, `diary`, `meeting_note`, `document`)
`-i`	Interactive exploration mode
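
To make the filters concrete, here is a rough sketch of how the statistics could be computed over already-loaded records. The record fields (`sender`, `date`, `source_type`) and the function name `corpus_stats` are assumptions for illustration, not explore.py's actual data model.

```python
from collections import Counter
from datetime import date


def corpus_stats(records, year=None, sender=None, source_type=None):
    """Summarise records, applying only the filters that were supplied."""
    selected = [
        r for r in records
        if (year is None or r["date"].year == year)
        and (sender is None or r["sender"] == sender)
        and (source_type is None or r["source_type"] == source_type)
    ]
    dates = [r["date"] for r in selected]
    return {
        "count": len(selected),
        "by_sender": Counter(r["sender"] for r in selected),
        # None when the filters match nothing.
        "date_range": (min(dates), max(dates)) if dates else None,
    }


# Hypothetical sample corpus.
records = [
    {"sender": "alice@example.com", "date": date(2025, 1, 5), "source_type": "email"},
    {"sender": "bob@example.com", "date": date(2025, 3, 1), "source_type": "meeting_note"},
    {"sender": "alice@example.com", "date": date(2024, 12, 1), "source_type": "email"},
]
stats_2025 = corpus_stats(records, year=2025)
```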

query.py — Search and Filter

Metadata filtering, semantic search, and export.

uv run python query.py --semantic "search phrase" --top-k 100
uv run python query.py --sender alice@example.com --year 2025
uv run python query.py --count-only --sender alice@example.com

Flag	Description
`--semantic TEXT`	Semantic search query
`--top-k N`	Number of results to return (default: 200)
`--sender EMAIL`	Filter by sender (comma-separated for multiple)
`--date-range START END`	Filter by date range (YYYY-MM-DD format)
`--year YYYY`	Filter by year
`--source-type TYPE`	Filter by source type
`--count-only`	Show count only, no results
`--show`	Show body previews inline
`--read N`	Read result N in full
`--open N`	Open source file for result N (macOS)
`--export FILE.csv`	Export to CSV
`--export-json FILE.json`	Export to JSON (for analysis pipeline)
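
As an illustration of how `--semantic`, `--top-k`, and a comma-separated `--sender` could combine, the sketch below ranks toy vectors by cosine similarity after a metadata filter. In Foxhound the vector search is handled by ChromaDB; the `search` function, the record fields, and the two-dimensional vectors here are hypothetical stand-ins.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs, query_vec, top_k=200, senders=None):
    """Filter by sender first, then return the top_k most similar documents."""
    allowed = {s.strip() for s in senders.split(",")} if senders else None
    candidates = [d for d in docs if allowed is None or d["sender"] in allowed]
    ranked = sorted(candidates, key=lambda d: cosine(d["vector"], query_vec), reverse=True)
    return ranked[:top_k]


# Hypothetical documents with pre-computed embeddings.
docs = [
    {"id": 1, "sender": "alice@example.com", "vector": [1.0, 0.0]},
    {"id": 2, "sender": "bob@example.com", "vector": [0.0, 1.0]},
    {"id": 3, "sender": "alice@example.com", "vector": [0.7, 0.7]},
]
top_two = search(docs, [1.0, 0.1], top_k=2)
```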

merge.py — Combine Results

Merge and deduplicate multiple search result files.

uv run python merge.py evidence/*.json --output evidence/merged.json

Flag	Description
`--output FILE`	Output file path
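
A minimal sketch of merging with deduplication, assuming each result file is a JSON list of objects that carry a unique `id` field. The field name and the first-occurrence-wins rule are assumptions, not confirmed behaviour of merge.py.

```python
import json
from pathlib import Path


def merge_results(paths, key="id"):
    """Combine several result files, keeping the first record seen per key."""
    merged = {}
    for path in paths:
        records = json.loads(Path(path).read_text(encoding="utf-8"))
        for record in records:
            # setdefault leaves an existing entry untouched, so duplicates are dropped.
            merged.setdefault(record[key], record)
    # Dicts preserve insertion order, so output follows first-seen order.
    return list(merged.values())


def write_merged(paths, output, key="id"):
    """Write the merged records to the output path as JSON."""
    Path(output).write_text(json.dumps(merge_results(paths, key), indent=2), encoding="utf-8")
```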

analyze.py — AI Analysis

AI-powered triage and deep analysis.

Recommended workflow: run triage as a separate step, then run deep analysis at a relevance threshold of your choice.

# Step 1: Triage (fast; gemini-flash recommended)
# Note: --dry-run only prints a cost estimate; remove it to run the step.
uv run python analyze.py results.json \
  --triage --model gemini-flash --truncate 500 --concurrency 5 \
  --context "Investigation focus" --output triaged.json --dry-run
 
# Step 2: Deep analysis on triaged results (deepseek recommended)
uv run python analyze.py triaged.json \
  --deep-only --min-relevance 5 \
  --context "Key themes" --model deepseek --dry-run
 
# Retry failed triage batches
uv run python analyze.py triaged.json \
  --retry-failed --model gemini-flash --truncate 500 \
  --context "Investigation focus" --output triaged-fixed.json
 
# Local only (free, private, slow)
uv run python analyze.py results.json --triage --local --context "Focus" --output triaged.json

Model Recommendations (Feb 2026)

Model	Best For	Input/Output Cost	Speed
`gemini-flash`	Triage — cheapest, fastest	$0.10/$0.40 per M tokens	Very fast
`deepseek`	Deep analysis — best reasoning per dollar	$0.25/$0.38 per M tokens	Moderate
`gemini-2.5`	Analysis — newer thinking model	$0.30/$2.50 per M tokens	Fast
`gemini-3`	Analysis — latest, agentic	$0.50/$3.00 per M tokens	Fast
`gemini-free`	Testing — free but rate-limited	Free	Fast
`haiku`	High quality — most expensive	$0.80/$4.00 per M tokens	Fast
`--local`	Maximum privacy — Ollama Mistral 7B	Free	Very slow

Rule of thumb: Use gemini-flash for triage (cheap, fast, good enough for scoring), deepseek for deep analysis (best reasoning per dollar).
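
The per-million-token prices in the table translate into a cost estimate with simple arithmetic. The sketch below uses three rows from the table. The function name `estimate_cost` and the token counts are illustrative assumptions, and this is not necessarily how `--dry-run` computes its figure.

```python
# (input price, output price) in USD per million tokens, copied from the table above.
PRICES_PER_M = {
    "gemini-flash": (0.10, 0.40),
    "deepseek": (0.25, 0.38),
    "haiku": (0.80, 4.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost for the given token counts."""
    price_in, price_out = PRICES_PER_M[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical triage run: 2M input tokens, 100k output tokens.
# gemini-flash: 2.0 * 0.10 + 0.1 * 0.40 = 0.24 USD
flash_cost = estimate_cost("gemini-flash", 2_000_000, 100_000)
# haiku for the same run: 2.0 * 0.80 + 0.1 * 4.00 = 2.00 USD
haiku_cost = estimate_cost("haiku", 2_000_000, 100_000)
```

On the same hypothetical workload the difference is roughly eightfold, which is the reasoning behind using the cheapest model for triage.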

Flags

Flag	Description
`--context TEXT`	Analysis context/question
`--model MODEL`	Model choice (see table above) — applies to triage and analysis
`--local`	Use Ollama Mistral 7B (free, private, slow)
`--triage`	Triage only — score and save results without deep analysis
`--full-pipeline`	Triage first, then deep analysis on high-scoring docs
`--deep-only`	Skip triage, run deep analysis on already-triaged data
`--retry-failed`	Re-triage only failed documents from a previous run
`--min-relevance N`	Minimum triage score for deep analysis (default: 7; 5 is recommended)
`--truncate N`	Truncate each doc body to N chars for triage (e.g. 500)
`--concurrency N`	Run N triage batches in parallel (default: 1, try 5 for API models)
`--output FILE`	Output file (default: `analysis_output.md`)
`--dry-run`	Show cost estimate without running (works with `--triage --model`)
`--no-pseudonymise`	Skip pseudonymisation (not recommended for cloud models)
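
How `--truncate`, `--concurrency`, and `--min-relevance` interact can be sketched as below. Everything here is a stand-in: `keyword_score` replaces the real model call, the batch size is arbitrary, and the function names are hypothetical rather than taken from analyze.py.

```python
from concurrent.futures import ThreadPoolExecutor


def keyword_score(batch):
    """Stand-in for a model call: score 8 if the body mentions 'budget', else 2."""
    return [8 if "budget" in doc["body"] else 2 for doc in batch]


def triage(docs, truncate_chars=500, batch_size=2, concurrency=5, score_batch=keyword_score):
    """Score truncated documents in parallel batches and attach a relevance score."""
    # Only the truncated body is sent for scoring; the original doc is kept intact.
    trimmed = [{**d, "body": d["body"][:truncate_chars]} for d in docs]
    batches = [trimmed[i:i + batch_size] for i in range(0, len(trimmed), batch_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in batch order, so scores line up with docs.
        results = list(pool.map(score_batch, batches))
    scores = [score for batch in results for score in batch]
    return [{**d, "relevance": s} for d, s in zip(docs, scores)]


def select_for_deep_analysis(triaged, min_relevance=7):
    """Keep documents whose triage score meets the threshold."""
    return [d for d in triaged if d["relevance"] >= min_relevance]


# Hypothetical documents; the second mentions 'budget' only after character 600.
docs = [
    {"id": 1, "body": "budget overrun discussed"},
    {"id": 2, "body": "x" * 600 + " budget"},
    {"id": 3, "body": "lunch plans"},
]
selected = select_for_deep_analysis(triage(docs, truncate_chars=500, concurrency=2), min_relevance=5)
```

The second document shows the trade-off of truncation: its relevant text sits beyond the cut-off, so it scores low at 500 characters and is only selected with a larger limit.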

pseudonymise.py — Privacy

Manage pseudonymisation aliases.

uv run python pseudonymise.py --show
uv run python pseudonymise.py --add "Alice Smith" "alice@example.com"
uv run python pseudonymise.py --reset

Flag	Description
`--show`	Display current alias map
`--add NAME EMAIL`	Pre-register an identity
`--reset`	Clear all aliases
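
The idea of an alias map can be illustrated with a small sketch in which a registered name and email share one alias that is substituted into text before it leaves the machine. The `AliasMap` class and the `Person_N` alias format are hypothetical; the actual alias scheme used by pseudonymise.py is not documented here.

```python
class AliasMap:
    """Toy alias registry: each identity (name or email) maps to a stable alias."""

    def __init__(self):
        self._by_identity = {}
        self._count = 0

    def add(self, name, email):
        """Register a name and email under one shared alias and return it."""
        self._count += 1
        alias = f"Person_{self._count}"
        self._by_identity[name] = alias
        self._by_identity[email] = alias
        return alias

    def show(self):
        """Return a copy of the current identity-to-alias mapping."""
        return dict(self._by_identity)

    def reset(self):
        """Forget all aliases."""
        self._by_identity.clear()
        self._count = 0

    def apply(self, text):
        """Replace every known identity in text with its alias."""
        # Longest identities first, so a full name is replaced before any shorter overlap.
        for identity in sorted(self._by_identity, key=len, reverse=True):
            text = text.replace(identity, self._by_identity[identity])
        return text


aliases = AliasMap()
aliases.add("Alice Smith", "alice@example.com")
masked = aliases.apply("From: Alice Smith <alice@example.com>")
```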

export.py — Format Output

Format analysis for export.

uv run python export.py analysis_output.md --clipboard
uv run python export.py analysis_output.md --output report.md
uv run python export.py results.json --raw --clipboard

Flag	Description
`--clipboard`	Copy to clipboard
`--output FILE`	Save to file
`--raw`	Export raw evidence package
