
# CLI Commands

All Foxhound scripts are run with `uv run python <script>.py`.

## `dedup.py` — Email Deduplication

Strips quoted text from email reply chains and groups messages into threads.

```
uv run python dedup.py
```

Thread detection uses the `Message-ID`, `In-Reply-To`, and `References` headers, with a `Thread-Topic` fallback for Outlook messages.
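The header-based grouping can be sketched as follows. This is a stdlib-only illustration of the technique, not `dedup.py`'s actual internals; the function name and message-dict fields are assumptions.

```python
def assign_threads(messages):
    """Group parsed emails into threads via Message-ID / In-Reply-To / References.

    `messages` is a list of dicts with 'message_id' and optional
    'in_reply_to', 'references' (list), and 'thread_topic' keys.
    Returns a dict mapping each message_id to its thread's root id.
    """
    thread_of = {}    # message_id -> thread root id
    topic_root = {}   # Thread-Topic -> thread root id (Outlook fallback)

    for msg in messages:
        mid = msg["message_id"]
        # Prefer explicit ancestry: In-Reply-To first, then References.
        ancestors = ([msg["in_reply_to"]] if msg.get("in_reply_to") else []) \
            + list(msg.get("references") or [])
        root = None
        for anc in ancestors:
            if anc in thread_of:
                root = thread_of[anc]
                break
        # Outlook fallback: reuse a thread already seen with this Thread-Topic.
        if root is None and msg.get("thread_topic"):
            root = topic_root.get(msg["thread_topic"])
        if root is None:
            root = mid  # no known ancestor: start a new thread
        thread_of[mid] = root
        if msg.get("thread_topic"):
            topic_root.setdefault(msg["thread_topic"], root)
    return thread_of
```

Messages with broken or missing headers still land in the right thread as long as they share a `Thread-Topic` with an earlier message, which is exactly the Outlook case the fallback covers.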

## `ingest.py` — Embed Sources

Embeds all configured sources into ChromaDB using `all-MiniLM-L6-v2`.

```
uv run python ingest.py          # Incremental ingest
uv run python ingest.py --reset  # Re-ingest from scratch
```

| Flag | Description |
|------|-------------|
| `--reset` | Clear ChromaDB and re-ingest all sources |
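Incremental ingest typically means skipping documents whose content is already in the store. A minimal stdlib sketch of that skip logic, assuming a content-hash scheme (the hashing approach and field names are illustrative, not necessarily what `ingest.py` does):

```python
import hashlib


def incremental_ids(docs, already_ingested):
    """Return the ids of documents that still need embedding.

    `docs` maps doc_id -> text; `already_ingested` maps doc_id -> content
    hash recorded on a previous run. A doc is (re-)embedded if it is new
    or its text changed, which is what makes repeat runs cheap.
    """
    todo = []
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if already_ingested.get(doc_id) != digest:
            todo.append(doc_id)
    return todo
```

Under this scheme, `--reset` is equivalent to treating `already_ingested` as empty, so every document is embedded again.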

## `explore.py` — Corpus Exploration

Free corpus statistics: sender breakdowns, date ranges, and an interactive exploration mode.

```
uv run python explore.py
uv run python explore.py --year 2025
uv run python explore.py --sender [email protected]
uv run python explore.py --source-type meeting_note
uv run python explore.py -i
```

| Flag | Description |
|------|-------------|
| `--year YYYY` | Filter stats to a specific year |
| `--sender EMAIL` | Filter stats to a specific sender |
| `--source-type TYPE` | Filter by source type (`email`, `diary`, `meeting_note`, `document`) |
| `-i` | Interactive exploration mode |

## `query.py` — Search and Filter

Metadata filtering, semantic search, and export.

```
uv run python query.py --semantic "search phrase" --top-k 100
uv run python query.py --sender [email protected] --year 2025
uv run python query.py --count-only --sender [email protected]
```

| Flag | Description |
|------|-------------|
| `--semantic TEXT` | Semantic search query |
| `--top-k N` | Number of results to return (default: 200) |
| `--sender EMAIL` | Filter by sender (comma-separated for multiple) |
| `--date-range START END` | Filter by date range (YYYY-MM-DD format) |
| `--year YYYY` | Filter by year |
| `--source-type TYPE` | Filter by source type |
| `--count-only` | Show count only, no results |
| `--show` | Show body previews inline |
| `--read N` | Read result N in full |
| `--open N` | Open source file for result N (macOS) |
| `--export FILE.csv` | Export to CSV |
| `--export-json FILE.json` | Export to JSON (for analysis pipeline) |

## `merge.py` — Combine Results

Merge and deduplicate multiple search result files.

```
uv run python merge.py evidence/*.json --output evidence/merged.json
```

| Flag | Description |
|------|-------------|
| `--output FILE` | Output file path |
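Merging with deduplication usually keys on a stable document identifier. A minimal sketch, assuming each file holds a JSON list of result dicts with an `id` field (an assumption about the export format):

```python
import json


def merge_result_files(paths):
    """Combine several JSON result files, keeping the first record per id.

    Later duplicates of an id are dropped, so the order of the input
    files decides which copy of a duplicate wins.
    """
    seen = set()
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for rec in json.load(fh):
                if rec["id"] not in seen:
                    seen.add(rec["id"])
                    merged.append(rec)
    return merged
```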

## `analyze.py` — AI Analysis

AI-powered triage and deep analysis.

Recommended workflow: run triage as a separate pass, then run deep analysis at your chosen relevance threshold.

```
# Step 1: Triage (fast — gemini-flash recommended for triage)
uv run python analyze.py results.json \
  --triage --model gemini-flash --truncate 500 --concurrency 5 \
  --context "Investigation focus" --output triaged.json --dry-run

# Step 2: Deep analysis on triaged results (deepseek recommended)
uv run python analyze.py triaged.json \
  --deep-only --min-relevance 5 \
  --context "Key themes" --model deepseek --dry-run

# Retry failed triage batches
uv run python analyze.py triaged.json \
  --retry-failed --model gemini-flash --truncate 500 \
  --context "Investigation focus" --output triaged-fixed.json

# Local only (free, private, slow)
uv run python analyze.py results.json --triage --local --context "Focus" --output triaged.json
```

### Model Recommendations (Feb 2026)

| Model | Best For | Input/Output Cost | Speed |
|-------|----------|-------------------|-------|
| `gemini-flash` | Triage — cheapest, fastest | $0.10 / $0.40 per M tokens | Very fast |
| `deepseek` | Deep analysis — best reasoning per dollar | $0.25 / $0.38 per M tokens | Moderate |
| `gemini-2.5` | Analysis — newer thinking model | $0.30 / $2.50 per M tokens | Fast |
| `gemini-3` | Analysis — latest, agentic | $0.50 / $3.00 per M tokens | Fast |
| `gemini-free` | Testing — free but rate-limited | Free | Fast |
| `haiku` | High quality — most expensive | $0.80 / $4.00 per M tokens | Fast |
| `--local` | Maximum privacy — Ollama Mistral 7B | Free | Very slow |

Rule of thumb: use `gemini-flash` for triage (cheap, fast, good enough for scoring) and `deepseek` for deep analysis (best reasoning per dollar).
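The `--dry-run` cost estimate reduces to simple arithmetic over the per-million-token prices in the table above. A worked sketch; the token counts per document are illustrative assumptions, and the function is not `analyze.py`'s actual estimator:

```python
def triage_cost(n_docs, tokens_in_per_doc, tokens_out_per_doc,
                price_in_per_m, price_out_per_m):
    """Estimate API cost in dollars from per-million-token prices."""
    total_in = n_docs * tokens_in_per_doc
    total_out = n_docs * tokens_out_per_doc
    return (total_in * price_in_per_m + total_out * price_out_per_m) / 1_000_000


# 10,000 docs truncated to ~500 chars (assume ~125 tokens in, ~50 tokens out)
# at gemini-flash prices ($0.10 in / $0.40 out per M tokens):
estimate = triage_cost(10_000, 125, 50, 0.10, 0.40)  # ≈ $0.33 for the whole pass
```

The same arithmetic shows why `--truncate` matters: input tokens dominate triage volume, so halving the truncation length roughly halves the input side of the bill.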

### Flags

| Flag | Description |
|------|-------------|
| `--context TEXT` | Analysis context/question |
| `--model MODEL` | Model choice (see table above) — applies to triage and analysis |
| `--local` | Use Ollama Mistral 7B (free, private, slow) |
| `--triage` | Triage only — score and save results without deep analysis |
| `--full-pipeline` | Triage first, then deep analysis on high-scoring docs |
| `--deep-only` | Skip triage, run deep analysis on already-triaged data |
| `--retry-failed` | Re-triage only failed documents from a previous run |
| `--min-relevance N` | Minimum triage score for deep analysis (default: 7, recommend 5) |
| `--truncate N` | Truncate each doc body to N chars for triage (e.g. 500) |
| `--concurrency N` | Run N triage batches in parallel (default: 1, try 5 for API models) |
| `--output FILE` | Output file (default: `analysis_output.md`) |
| `--dry-run` | Show cost estimate without running (works with `--triage --model`) |
| `--no-pseudonymise` | Skip pseudonymisation (not recommended for cloud models) |

## `pseudonymise.py` — Privacy

Manage pseudonymisation aliases.

```
uv run python pseudonymise.py --show
uv run python pseudonymise.py --add "Alice Smith" "[email protected]"
uv run python pseudonymise.py --reset
```

| Flag | Description |
|------|-------------|
| `--show` | Display current alias map |
| `--add NAME EMAIL` | Pre-register an identity |
| `--reset` | Clear all aliases |
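At its core, pseudonymisation is a stable alias map applied to text before it leaves the machine. A minimal stdlib sketch of the substitution step; the alias format and case-sensitive matching are simplifying assumptions, not a description of `pseudonymise.py`'s behaviour:

```python
import re


def apply_aliases(text, alias_map):
    """Replace every known identity with its stable alias.

    `alias_map` maps real strings (names, emails) to aliases such as
    'Person-01'. Longer keys are substituted first so 'Alice Smith'
    wins over a bare 'Alice' when both are registered.
    """
    for real in sorted(alias_map, key=len, reverse=True):
        text = re.sub(re.escape(real), alias_map[real], text)
    return text
```

Because the map is stable across runs, the same person always gets the same alias, which is what lets cloud-model analysis output be mapped back to real identities locally.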

## `export.py` — Format Output

Format analysis for export.

```
uv run python export.py analysis_output.md --clipboard
uv run python export.py analysis_output.md --output report.md
uv run python export.py results.json --raw --clipboard
```

| Flag | Description |
|------|-------------|
| `--clipboard` | Copy to clipboard |
| `--output FILE` | Save to file |
| `--raw` | Export raw evidence package |

MIT 2026 © Docs Hub