Due Diligence
Search across thousands of documents for red flags, inconsistencies, or undisclosed risks before a deal closes.
The Problem
Due diligence requires reviewing large volumes of documents under time pressure. Important details hide in email threads, board minutes, and contracts. Traditional keyword search misses contextually relevant content that uses different wording.
Workflow
1. Configure Document Sources
Point Foxhound at the due diligence data room:
ingestion:
  sources:
    - type: document
      path: "~/due-diligence/contracts"
    - type: document
      path: "~/due-diligence/financial-reports"
    - type: document
      path: "~/due-diligence/board-minutes"
    - type: meeting_note
      path: "~/due-diligence/meeting-notes"
2. Ingest the Corpus
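Before ingesting, it may be worth confirming that each configured directory exists, since a mistyped path could otherwise go unnoticed. The following is a minimal sketch and not part of Foxhound: `SOURCE_PATHS` simply repeats the paths from the configuration above, and `missing_sources` is a hypothetical helper name.

```python
from pathlib import Path

# Paths copied from the configuration in step 1; adjust if yours differ.
SOURCE_PATHS = [
    "~/due-diligence/contracts",
    "~/due-diligence/financial-reports",
    "~/due-diligence/board-minutes",
    "~/due-diligence/meeting-notes",
]


def missing_sources(paths):
    """Return the configured paths that are not existing directories.

    Hypothetical helper for illustration; it is not a Foxhound API.
    """
    missing = []
    for raw in paths:
        # expanduser() resolves a leading "~" to the home directory.
        if not Path(raw).expanduser().is_dir():
            missing.append(raw)
    return missing


if __name__ == "__main__":
    for path in missing_sources(SOURCE_PATHS):
        print(f"missing source directory: {path}")
```

If the script prints nothing, every listed directory was found.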
uv run python ingest.py --reset
3. Explore the Corpus
Get a high-level view of what you’re working with:
uv run python explore.py
uv run python explore.py --source-type document
4. Targeted Searches
Run semantic searches for common red flags:
mkdir -p evidence
uv run python query.py \
  --semantic "litigation pending lawsuit legal dispute" \
  --top-k 200 --export-json evidence/litigation.json
uv run python query.py \
  --semantic "debt obligation liability undisclosed" \
  --top-k 200 --export-json evidence/liabilities.json
uv run python query.py \
  --semantic "regulatory compliance violation fine penalty" \
  --top-k 200 --export-json evidence/compliance.json
uv run python query.py \
  --semantic "key person departure resignation retention" \
  --top-k 200 --export-json evidence/key-people.json
5. Merge and Analyse
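Before merging, a quick check that every export parses as JSON can catch a search that failed or wrote nothing. This is a minimal sketch and not part of Foxhound; it assumes only that each export is a JSON file and makes no assumption about the schema inside. `check_exports` is a hypothetical helper name.

```python
import json
from pathlib import Path


def check_exports(directory="evidence"):
    """Map each *.json file in `directory` to "ok" or an error message.

    Hypothetical helper for illustration; it is not a Foxhound API.
    """
    report = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            with path.open(encoding="utf-8") as handle:
                json.load(handle)  # Parse only; the content is not inspected.
            report[path.name] = "ok"
        except (OSError, ValueError) as exc:
            # ValueError covers json.JSONDecodeError and bad encodings.
            report[path.name] = f"unreadable: {exc}"
    return report


if __name__ == "__main__":
    for name, status in check_exports().items():
        print(f"{name}: {status}")
```

An empty file is reported as unreadable, which is usually the case worth noticing before a merge.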
uv run python merge.py evidence/*.json --output evidence/merged.json
uv run python analyze.py evidence/merged.json \
  --full-pipeline \
  --context "Identify red flags, undisclosed risks, and material inconsistencies across the document set" \
  --model deepseek
6. Export Findings
uv run python export.py analysis_output.md --output due-diligence-report.md
Tips
- Use --source-type document to filter searches to formal documents only
- Run explore.py first to understand the volume and date range of your corpus
- Semantic search finds conceptually related content even when exact terms differ: for example, “financial difficulty” matches “cash flow problems”
- For large corpora (>1000 documents), use gemini-flash for its 1M token context window
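To judge whether the merged evidence is likely to fit a given context window, a rough size estimate can help. The sketch below is an illustration under stated assumptions and not part of Foxhound: it uses the common rule of thumb of about four characters per token for English text, which varies by model and tokenizer, and `estimate_tokens` and `fits_context` are hypothetical names. The 1M figure is the one quoted in the tip above.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # Rough rule of thumb (assumption); real tokenizers vary.
CONTEXT_WINDOW = 1_000_000  # The 1M-token window mentioned in the tip above.


def estimate_tokens(text, chars_per_token=CHARS_PER_TOKEN):
    """Return a rough token estimate for `text`, rounded up."""
    return -(-len(text) // chars_per_token)  # Ceiling division.


def fits_context(text, window=CONTEXT_WINDOW):
    """Return True if the estimated token count is within `window`."""
    return estimate_tokens(text) <= window


if __name__ == "__main__":
    merged = Path("evidence/merged.json")
    if merged.exists():
        content = merged.read_text(encoding="utf-8")
        print(f"~{estimate_tokens(content):,} tokens; fits: {fits_context(content)}")
```

Treat the result as an order-of-magnitude guide only; prompts and model output also consume part of the window.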