A domain-agnostic engine that treats any dataset as a living organism: one that continuously breathes, grows, evolves, and prunes itself using the open internet.
Every structured dataset — investors, companies, drugs, space missions, geopolitical entities — begins decaying the moment it is created. The world moves. The data doesn't.
Existing solutions (Crunchbase, PitchBook, Apollo) are domain-locked, expensive, and closed. They work for one slice of the world and charge accordingly. No open, self-hostable, domain-agnostic equivalent exists.
The Living Database treats any dataset not as a static table but as a living organism with four biological behaviours running continuously: breathing, growing, evolving, and pruning.
The engine runs as a scheduled pipeline. Every cycle, it processes a batch of records through five stages, using a self-hosted LLM and free or open internet sources.
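
As a rough illustration of what one cycle might look like, the sketch below pushes a batch of records through an ordered list of stage functions and isolates failures per record. The five stages are not named in this section, so `normalise_name` and `mark_checked` are hypothetical stand-ins, as are `run_cycle`, `CycleResult`, and the dict-based record shape. It is a plain loop and does not attempt to reproduce LangGraph's orchestration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: a record is a plain dict of field name -> value.
Record = Dict[str, object]
Stage = Callable[[Record], Record]


@dataclass
class CycleResult:
    processed: List[Record] = field(default_factory=list)
    # Each failure is (original record, name of the failing stage, error text).
    failed: List[Tuple[Record, str, str]] = field(default_factory=list)


def run_cycle(batch: List[Record], stages: List[Stage]) -> CycleResult:
    """Run every record in the batch through the stages, in order."""
    result = CycleResult()
    for record in batch:
        current = dict(record)  # work on a copy so the input batch is untouched
        try:
            for stage in stages:
                current = stage(current)
        except Exception as exc:
            # One bad record should not abort the whole cycle.
            result.failed.append((record, stage.__name__, str(exc)))
            continue
        result.processed.append(current)
    return result


# Hypothetical placeholder stages, only here to exercise the loop.
def normalise_name(record: Record) -> Record:
    record["name"] = str(record["name"]).strip()
    return record


def mark_checked(record: Record) -> Record:
    record["checked"] = True
    return record


batch = [{"name": "  Acme Corp "}, {"title": "no name field"}]
result = run_cycle(batch, [normalise_name, mark_checked])
```

Here the first record passes both stages, while the second lacks a `name` field and ends up in `result.failed` without stopping the cycle. Keeping failures per record is one possible design choice; a real pipeline might retry them instead.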
| Layer | Tool | Cost | Role |
|---|---|---|---|
| LLM | Gemma 4 (local) | Free | Schema inference, extraction, merge decisions |
| Crawling | Crawl4AI | Free / OSS | JS-rendered page scraping, returns clean markdown |
| Search | Serper.dev / Exa / DuckDuckGo | Free tier | Live web queries per record, per field |
| Public Data | SEC EDGAR, OpenCorporates, arXiv, RSS | Free | Structured sources for specific domain types |
| Orchestration | LangGraph | OSS | Agent pipeline, state management, retry logic |
| Storage | Supabase | Free tier | Dataset storage, change logs, provenance tracking |
| Scheduler | Cron / Supabase Edge Functions | Free | Trigger daily refresh cycles per record tier |
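
The scheduler row mentions refresh cycles "per record tier". A minimal sketch of how a daily trigger might pick its batch is shown below. The tier names (`hot`, `warm`, `cold`), their intervals, the record fields (`id`, `tier`, `last_refreshed`), and the `select_due` helper are all assumptions made for illustration, not part of the described system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tiers and intervals; the source only says cycles run per tier.
REFRESH_INTERVALS = {
    "hot": timedelta(days=1),
    "warm": timedelta(days=7),
    "cold": timedelta(days=30),
}


def select_due(records, now, batch_size):
    """Return up to batch_size records whose refresh interval has elapsed,
    ordered from most overdue to least overdue."""
    due = []
    for rec in records:
        interval = REFRESH_INTERVALS[rec["tier"]]
        overdue_by = now - rec["last_refreshed"] - interval
        if overdue_by >= timedelta(0):
            due.append((overdue_by, rec))
    # Most overdue first; the sort is stable, so ties keep input order.
    due.sort(key=lambda item: item[0], reverse=True)
    return [rec for _, rec in due[:batch_size]]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": 1, "tier": "hot", "last_refreshed": now - timedelta(days=2)},
    {"id": 2, "tier": "warm", "last_refreshed": now - timedelta(days=3)},
    {"id": 3, "tier": "cold", "last_refreshed": now - timedelta(days=45)},
    {"id": 4, "tier": "hot", "last_refreshed": now - timedelta(hours=12)},
]
due_now = select_due(records, now, batch_size=2)
```

With this sample data, records 3 and 1 are due (15 days and 1 day overdue respectively), while records 2 and 4 are still within their intervals. In a Supabase-backed setup the same filter would more likely be a SQL query over a `last_refreshed` column; the in-memory version is only meant to show the selection rule.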
The value isn't in re-doing investor data. It's in every domain where no Crunchbase exists — vast, valuable, structurally scattered knowledge that the world hasn't indexed.