Government / Defense · Confidential
Enterprise Document Search & Indexing Platform
Built a full-stack document intelligence platform for a government contractor, combining Elasticsearch, NLP, and React to make large, secure document repositories searchable across networked NAS systems.
Services: Custom Software & Apps · Data Engineering & Automation · AI Pipelines & Agents
Client
Confidential: government contractor operating under strict data security requirements. Client name withheld by agreement.
The challenge
A government contractor managing large repositories of sensitive documents across Synology NAS systems had no way to search or surface content across their holdings. Files lived in silos. Finding anything meant knowing exactly where it lived. Analysts were losing hours a week to manual retrieval.
The system needed to be air-gap-friendly, containerized for auditability, and built to enterprise security standards: no SaaS document tools, no third-party indexing services.
What we built
A self-hosted, full-stack document search and indexing platform with end-to-end automation from NAS ingestion to browser-based search.
Ingestion & extraction
A Python backend crawls the NAS mounts, uses Apache Tika to extract content from a wide range of document formats (PDF, DOCX, XLSX, scanned images, and more), and passes the raw text through a spaCy-based NLP pipeline that automatically generates summaries, extracted entities, and keyword sets.
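The crawl step can be illustrated with a minimal sketch. It is not the client's code: the extension list, the record fields, and the `extract_text` callable are all assumptions made for illustration. In a real deployment `extract_text` would be whatever calls Apache Tika, and the NLP outputs would be added to the record afterwards.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterator

# Hypothetical whitelist; the formats actually handled are not listed in full here.
SUPPORTED = {".pdf", ".docx", ".xlsx", ".png", ".jpg", ".tif", ".tiff"}


def crawl(root: Path) -> Iterator[Path]:
    """Yield supported files under a mounted share, in a stable sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            yield path


def build_record(path: Path, extract_text: Callable[[Path], str]) -> dict:
    """Combine basic file metadata with extracted text into one record.

    `extract_text` is a placeholder for the real extractor (assumed to wrap Tika).
    """
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "path": str(path),
        "extension": path.suffix.lower().lstrip("."),
        "size_bytes": stat.st_size,
        "modified": modified.isoformat(),  # ISO 8601 timestamp in UTC
        "content": extract_text(path).strip(),
    }
```

Passing the extractor in as a callable keeps the crawl logic testable without a Tika server, which may be convenient in an air-gapped development setup.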
Search engine
All extracted content, metadata, and NLP outputs are indexed into Elasticsearch. The cluster runs in Docker alongside Kibana for operational visibility and analytics dashboards, giving administrators a live view of corpus health, indexing throughput, and query patterns.
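As a rough illustration of the indexing step, the sketch below defines an index mapping and turns records into bulk actions. The index name, field names, and the choice of a path hash as document ID are assumptions for this example, not the platform's actual schema.

```python
import hashlib
from typing import Iterable, Iterator

INDEX_NAME = "documents"  # hypothetical index name

# Hypothetical mapping: full-text fields use "text", facet fields use "keyword".
INDEX_BODY = {
    "mappings": {
        "properties": {
            "path": {"type": "keyword"},
            "extension": {"type": "keyword"},
            "modified": {"type": "date"},
            "content": {"type": "text"},
            "summary": {"type": "text"},
            "entities": {"type": "keyword"},
            "keywords": {"type": "keyword"},
        }
    }
}


def to_bulk_actions(records: Iterable[dict]) -> Iterator[dict]:
    """Convert records into action dicts in the shape used by bulk helpers.

    The ID is derived from the file path, so re-crawling the same file
    overwrites its existing document instead of creating a duplicate.
    """
    for record in records:
        doc_id = hashlib.sha1(record["path"].encode("utf-8")).hexdigest()
        yield {"_index": INDEX_NAME, "_id": doc_id, "_source": record}
```

With the official Python client, actions of this shape can be passed to `elasticsearch.helpers.bulk(client, to_bulk_actions(records))`. The deterministic ID is one possible design choice; it makes repeated crawls idempotent at the cost of treating a moved file as a new document.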
Frontend
A modern React.js frontend powered by Searchkit gives users full-text search with faceted navigation: filter by document type, date range, extracted entities, or custom metadata. Results surface with NLP-generated summaries so analysts can judge relevance at a glance without opening files.
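To make the faceted search concrete, here is a hand-written approximation of the kind of Elasticsearch request body such a search might produce. Searchkit builds its own requests, so this is only a sketch: the function name, field names, and aggregation names are assumptions.

```python
from typing import List, Optional


def build_search_body(
    text: str,
    extensions: Optional[List[str]] = None,
    modified_from: Optional[str] = None,
    modified_to: Optional[str] = None,
    size: int = 10,
) -> dict:
    """Build a full-text query with optional facet filters and facet counts."""
    filters = []
    if extensions:
        # Document-type facet: match any of the selected extensions.
        filters.append({"terms": {"extension": extensions}})

    date_range = {}
    if modified_from:
        date_range["gte"] = modified_from
    if modified_to:
        date_range["lte"] = modified_to
    if date_range:
        filters.append({"range": {"modified": date_range}})

    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match over the content and the summary.
                "must": [{"multi_match": {"query": text, "fields": ["content", "summary"]}}],
                # Filters narrow the result set without affecting scoring.
                "filter": filters,
            }
        },
        # Terms aggregations supply the counts shown next to each facet value.
        "aggs": {
            "by_extension": {"terms": {"field": "extension"}},
            "by_entity": {"terms": {"field": "entities"}},
        },
        # Return only what the result list needs, including the summary.
        "_source": ["path", "summary", "extension", "modified"],
    }
```

For example, `build_search_body("budget report", extensions=["pdf"], modified_from="2023-01-01")` restricts results to PDFs modified since the start of 2023 while still returning facet counts for the filtered set.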
Infrastructure
The entire stack ships as a Docker Compose configuration with separate dev and production profiles. Synology NAS integration handles enterprise storage, with mounted network paths feeding the ingestion pipeline. No document ever leaves the client’s infrastructure.
Stack
| Layer | Technology |
|---|---|
| Ingestion | Python, Apache Tika |
| NLP | spaCy, custom extraction pipeline |
| Search | Elasticsearch |
| Monitoring | Kibana |
| Frontend | React.js, Searchkit |
| Infrastructure | Docker, Synology NAS |
Outcome
Analysts went from manual file hunting to sub-second search across millions of documents. The platform was deployed to production within the client’s secure environment and has operated continuously since, remaining auditable and fully self-hosted.
This engagement demonstrates Builtwell’s capability to build production-grade enterprise systems where SaaS isn’t an option.