CLI Commands
All Foxhound scripts are run with `uv run python <script>.py`.
dedup.py — Email Deduplication
Strips quoted text from email reply chains and groups messages into threads.
uv run python dedup.py

Thread detection uses Message-ID, In-Reply-To, and References headers, with Thread-Topic fallback for Outlook.
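The header-based threading described above can be sketched as follows. This is a minimal illustration, not the code in dedup.py: the function name `group_threads`, the union-find grouping, and the rule that Thread-Topic is consulted only when a message carries no In-Reply-To or References header are all assumptions.

```python
from collections import defaultdict
from email import message_from_string


def group_threads(raw_messages):
    """Group raw RFC 5322 messages into threads; returns lists of message indices."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    anchors = []
    for i, raw in enumerate(raw_messages):
        msg = message_from_string(raw)
        refs = ((msg.get("References") or "") + " " + (msg.get("In-Reply-To") or "")).split()
        ids = list(refs)
        own = (msg.get("Message-ID") or "").strip()
        if own:
            ids.append(own)
        topic = (msg.get("Thread-Topic") or "").strip().lower()
        if not refs and topic:
            # Assumed fallback: with no reply headers, link on the Outlook topic.
            ids.append("topic:" + topic)
        if not ids:
            ids.append(f"orphan:{i}")  # nothing to link on, so its own thread
        for other in ids[1:]:
            union(ids[0], other)
        anchors.append(ids[0])

    threads = defaultdict(list)
    for i, anchor in enumerate(anchors):
        threads[find(anchor)].append(i)
    return list(threads.values())
```

Messages that share any identifier end up in the same group, so a reply that cites only an older ancestor in References still joins the thread.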
ingest.py — Embed Sources
Embeds all configured sources into ChromaDB using all-MiniLM-L6-v2.
uv run python ingest.py # Incremental ingest
uv run python ingest.py --reset # Re-ingest from scratch

| Flag | Description |
|---|---|
| `--reset` | Clear ChromaDB and re-ingest all sources |
explore.py — Corpus Exploration
Free stats, sender breakdowns, date ranges, and interactive exploration.
uv run python explore.py
uv run python explore.py --year 2025
uv run python explore.py --sender [email protected]
uv run python explore.py --source-type meeting_note
uv run python explore.py -i

| Flag | Description |
|---|---|
| `--year YYYY` | Filter stats to a specific year |
| `--sender EMAIL` | Filter stats to a specific sender |
| `--source-type TYPE` | Filter by source type (email, diary, meeting_note, document) |
| `-i` | Interactive exploration mode |
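As a rough picture of what a sender breakdown and date range amount to, here is a small sketch. The record layout (dicts with `sender` and ISO-formatted `date` keys) and the function name `corpus_stats` are hypothetical and may not match what explore.py reads internally.

```python
from collections import Counter


def corpus_stats(records, year=None):
    """Summarise records: total count, per-sender counts, earliest and latest date."""
    if year is not None:
        # ISO dates start with the year, so a prefix check is enough here.
        records = [r for r in records if r["date"].startswith(f"{year}-")]
    dates = sorted(r["date"] for r in records)
    return {
        "total": len(records),
        "by_sender": Counter(r["sender"] for r in records).most_common(),
        "date_range": (dates[0], dates[-1]) if dates else None,
    }
```

No model is called at any point, which is why this kind of exploration costs nothing to run.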
query.py — Search and Filter
Metadata filtering, semantic search, and export.
uv run python query.py --semantic "search phrase" --top-k 100
uv run python query.py --sender [email protected] --year 2025
uv run python query.py --count-only --sender [email protected]

| Flag | Description |
|---|---|
| `--semantic TEXT` | Semantic search query |
| `--top-k N` | Number of results to return (default: 200) |
| `--sender EMAIL` | Filter by sender (comma-separated for multiple) |
| `--date-range START END` | Filter by date range (YYYY-MM-DD format) |
| `--year YYYY` | Filter by year |
| `--source-type TYPE` | Filter by source type |
| `--count-only` | Show count only, no results |
| `--show` | Show body previews inline |
| `--read N` | Read result N in full |
| `--open N` | Open source file for result N (macOS) |
| `--export FILE.csv` | Export to CSV |
| `--export-json FILE.json` | Export to JSON (for analysis pipeline) |
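One way to think about how the metadata filters combine: each supplied filter narrows the result set, and a record must pass all of them. The sketch below assumes records are dicts with `sender`, `date`, and `source_type` keys and that the date range is inclusive at both ends; neither detail is confirmed by query.py, and the helper names are hypothetical.

```python
from datetime import date


def parse_senders(arg):
    """Split a comma-separated --sender value into a normalised set."""
    return {s.strip().lower() for s in arg.split(",") if s.strip()}


def matches(record, senders=None, date_range=None, source_type=None):
    """Return True when a record passes every filter that was supplied."""
    if senders and record["sender"].lower() not in senders:
        return False
    if date_range:
        start, end = date_range
        # Assumption: both START and END are included in the range.
        if not (start <= date.fromisoformat(record["date"]) <= end):
            return False
    if source_type and record["source_type"] != source_type:
        return False
    return True
```

Filters left as `None` are skipped, so calling `matches(record)` with no filters accepts every record.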
merge.py — Combine Results
Merge and deduplicate multiple search result files.
uv run python merge.py evidence/*.json --output evidence/merged.json

| Flag | Description |
|---|---|
| `--output FILE` | Output file path |
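Merging with deduplication can be pictured as concatenating the result files and keeping the first record seen for each identifier. This sketch assumes each file holds a JSON array of objects with an `id` field; the actual export schema and the key merge.py deduplicates on are not documented here.

```python
import json
from pathlib import Path


def merge_results(paths, key="id"):
    """Concatenate JSON result lists, keeping the first record seen per key."""
    seen = set()
    merged = []
    for path in paths:
        for record in json.loads(Path(path).read_text(encoding="utf-8")):
            if record[key] in seen:
                continue  # duplicate of a record from an earlier file
            seen.add(record[key])
            merged.append(record)
    return merged
```

Because the first occurrence wins, the order in which files are passed decides which copy of a duplicated record survives.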
analyze.py — AI Analysis
AI-powered triage and deep analysis.
Recommended workflow: triage separately, then analyse at chosen threshold.
# Step 1: Triage (fast — gemini-flash recommended for triage)
uv run python analyze.py results.json \
--triage --model gemini-flash --truncate 500 --concurrency 5 \
--context "Investigation focus" --output triaged.json --dry-run
# Step 2: Deep analysis on triaged results (deepseek recommended)
uv run python analyze.py triaged.json \
--deep-only --min-relevance 5 \
--context "Key themes" --model deepseek --dry-run
# Retry failed triage batches
uv run python analyze.py triaged.json \
--retry-failed --model gemini-flash --truncate 500 \
--context "Investigation focus" --output triaged-fixed.json
# Local only (free, private, slow)
uv run python analyze.py results.json --triage --local --context "Focus" --output triaged.json

Model Recommendations (Feb 2026)

| Model | Best For | Input/Output Cost | Speed |
|---|---|---|---|
| `gemini-flash` | Triage: cheapest, fastest | $0.10/$0.40 per M tokens | Very fast |
| `deepseek` | Deep analysis: best reasoning per dollar | $0.25/$0.38 per M tokens | Moderate |
| `gemini-2.5` | Analysis: newer thinking model | $0.30/$2.50 per M tokens | Fast |
| `gemini-3` | Analysis: latest, agentic | $0.50/$3.00 per M tokens | Fast |
| `gemini-free` | Testing: free but rate-limited | Free | Fast |
| `haiku` | High quality: most expensive | $0.80/$4.00 per M tokens | Fast |
| `--local` | Maximum privacy: Ollama Mistral 7B | Free | Very slow |
Rule of thumb: Use gemini-flash for triage (cheap, fast, good enough for scoring), deepseek for deep analysis (best reasoning per dollar).
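To see how the per-million-token prices turn into a run cost, the arithmetic can be sketched like this. The prices are copied from the table above; the function name `estimate_cost` is hypothetical, and how `--dry-run` actually counts tokens is not specified, so token counts are passed in directly.

```python
# USD per million tokens as (input, output), copied from the table above.
PRICES = {
    "gemini-flash": (0.10, 0.40),
    "deepseek": (0.25, 0.38),
    "gemini-2.5": (0.30, 2.50),
    "gemini-3": (0.50, 3.00),
    "haiku": (0.80, 4.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Return the estimated USD cost of one run for the given token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For example, 2,000,000 input tokens and 100,000 output tokens come to about $0.24 on `gemini-flash` and about $2.00 on `haiku`, which is the gap the rule of thumb is pointing at.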
Flags
| Flag | Description |
|---|---|
| `--context TEXT` | Analysis context/question |
| `--model MODEL` | Model choice (see table above); applies to triage and analysis |
| `--local` | Use Ollama Mistral 7B (free, private, slow) |
| `--triage` | Triage only: score and save results without deep analysis |
| `--full-pipeline` | Triage first, then deep analysis on high-scoring docs |
| `--deep-only` | Skip triage, run deep analysis on already-triaged data |
| `--retry-failed` | Re-triage only failed documents from a previous run |
| `--min-relevance N` | Minimum triage score for deep analysis (default: 7, recommend 5) |
| `--truncate N` | Truncate each doc body to N chars for triage (e.g. 500) |
| `--concurrency N` | Run N triage batches in parallel (default: 1, try 5 for API models) |
| `--output FILE` | Output file (default: analysis_output.md) |
| `--dry-run` | Show cost estimate without running (works with --triage --model) |
| `--no-pseudonymise` | Skip pseudonymisation (not recommended for cloud models) |
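The interplay of `--truncate`, `--concurrency`, and `--min-relevance` can be illustrated with a small offline sketch. Everything here is an assumption made for illustration: `score_batch` is a stand-in that scores by body length instead of calling a model, `batch_size` is a hypothetical parameter, and the `relevance` field name may differ from what analyze.py writes.

```python
from concurrent.futures import ThreadPoolExecutor


def score_batch(batch):
    """Stand-in for a model call: scores by body length so the sketch runs offline."""
    return [{**doc, "relevance": min(10, len(doc["body"]) // 10)} for doc in batch]


def triage(docs, truncate=500, concurrency=5, batch_size=2):
    """Truncate bodies, then score batches in parallel, preserving input order."""
    trimmed = [{**d, "body": d["body"][:truncate]} for d in docs]
    batches = [trimmed[i:i + batch_size] for i in range(0, len(trimmed), batch_size)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        scored = pool.map(score_batch, batches)  # map() yields results in input order
        return [doc for batch in scored for doc in batch]


def select_for_deep_analysis(triaged, min_relevance=7):
    """Keep documents whose triage score meets the threshold (inclusive)."""
    return [d for d in triaged if d["relevance"] >= min_relevance]
```

Threads suit this shape of work because API-backed triage is dominated by waiting on network responses, and lowering the threshold from 7 to 5 simply widens the set passed on to deep analysis.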
pseudonymise.py — Privacy
Manage pseudonymisation aliases.
uv run python pseudonymise.py --show
uv run python pseudonymise.py --add "Alice Smith" "[email protected]"
uv run python pseudonymise.py --reset

| Flag | Description |
|---|---|
| `--show` | Display current alias map |
| `--add NAME EMAIL` | Pre-register an identity |
| `--reset` | Clear all aliases |
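An alias map of this kind can be pictured as a lookup from real names and addresses to stable placeholders that is applied to text before it leaves the machine. The sketch below is illustrative only: the `AliasMap` class, the `Person_N` alias format, and the case-insensitive matching are assumptions, not the behaviour of pseudonymise.py.

```python
import re


class AliasMap:
    """Assign stable placeholder aliases to names and email addresses."""

    def __init__(self):
        self.aliases = {}  # real value (lowercased) -> alias
        self.count = 0

    def add(self, name, email):
        """Pre-register an identity so its name and email share one alias number."""
        self.count += 1
        self.aliases[name.lower()] = f"Person_{self.count}"
        self.aliases[email.lower()] = f"person_{self.count}@example.invalid"

    def apply(self, text):
        """Replace every registered name or email in text, longest value first."""
        for real in sorted(self.aliases, key=len, reverse=True):
            alias = self.aliases[real]
            text = re.sub(re.escape(real), lambda _m, a=alias: a, text, flags=re.IGNORECASE)
        return text

    def reset(self):
        """Clear all aliases and restart numbering."""
        self.aliases.clear()
        self.count = 0
```

Pre-registering an identity keeps a person's name and address tied to the same number, so the analysis output can still be read as being about one individual.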
export.py — Format Output
Format analysis for export.
uv run python export.py analysis_output.md --clipboard
uv run python export.py analysis_output.md --output report.md
uv run python export.py results.json --raw --clipboard

| Flag | Description |
|---|---|
| `--clipboard` | Copy to clipboard |
| `--output FILE` | Save to file |
| `--raw` | Export raw evidence package |