Case Study

BioAtlas – biosecurity intelligence platform

A biological Security Operations Centre (SOC) that fuses wastewater, genomic, and epidemiological signals into an evidence-linked intelligence graph, surfaces explainable alerts with triage workflows, and provides sequence-level threat analysis tools for biosecurity analysts.

Next.js, FastAPI, Python, Supabase, ESM-2, Mapbox

Problem & Context

Existing biosecurity tools either focus narrowly on sequence analysis (such as Valthos, which is backed by OpenAI and Founders Fund) or provide generic dashboards without explainability. I found no full-stack intelligence platform that fuses upstream surveillance signals, generates evidence-backed alerts, and integrates sequence-level analysis in a single analyst workflow. After being unable to join Valthos because of its US citizenship requirement, I decided to build the platform they don't have.

Architecture

BioAtlas is built around an evidence-first intelligence pipeline. ML models produce Evidence, not Alerts directly. Evidence supports Claims. Claims aggregate into Alerts. Analysts interact with the full chain, making every alert explainable and traceable back to source data.
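To make the Evidence → Claim → Alert chain concrete, here is a minimal sketch using standard-library dataclasses rather than the project's Pydantic/SQLAlchemy models. The evidence fields (inputs, outputs, confidence, render hints) follow the evidence object described under Key Design Decisions; the class layout, the `observation_ids` key, and the `trace` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Evidence:
    """One ML output, stored together with the inputs that produced it."""
    id: str
    method: str                   # hypothetical label, e.g. "changepoint"
    inputs: dict[str, Any]        # hypothetical: holds "observation_ids" of source data
    outputs: dict[str, Any]
    confidence: float             # 0.0 to 1.0
    render_hints: dict[str, Any] = field(default_factory=dict)


@dataclass
class Claim:
    """A statement supported by one or more pieces of evidence."""
    id: str
    statement: str
    evidence: list[Evidence]


@dataclass
class Alert:
    """What the analyst sees; it holds claims, never raw model output."""
    id: str
    title: str
    claims: list[Claim]

    def trace(self) -> list[str]:
        """Walk Alert -> Claim -> Evidence -> source observation IDs."""
        lines = []
        for claim in self.claims:
            lines.append(f"claim {claim.id}: {claim.statement}")
            for ev in claim.evidence:
                sources = ", ".join(ev.inputs.get("observation_ids", []))
                lines.append(
                    f"  evidence {ev.id} ({ev.method}, conf={ev.confidence:.2f}) <- {sources}"
                )
        return lines


ev = Evidence("ev1", "changepoint", {"observation_ids": ["obs-17", "obs-18"]},
              {"break_index": 42}, 0.87)
alert = Alert("al1", "Wastewater rise", [Claim("c1", "Viral load shifted upward", [ev])])
print("\n".join(alert.trace()))
```

Because the alert only reaches data through its claims and their evidence, the explanation an analyst sees is derived from the same objects that justified the alert.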

Stage 1 (Surveillance): Ingests CDC NWSS wastewater data, Nextstrain genomic metadata, and epidemiological reports, and normalizes them into a unified observation layer.
Stage 2 (Characterization): Runs change-point detection, anomaly scoring, and cross-signal correlation to generate evidence, claims, and risk-scored alerts with analyst triage workflows.
Stage 3 (Sequence Analysis): Uses Meta's ESM-2 protein language model, an open alternative to proprietary models, to distinguish natural from engineered sequences, score fitness, and support adaptive countermeasure design.
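Stage 2's change-point detection is done with ruptures. As a dependency-free illustration of what such a detector computes, the sketch below finds the single split that most reduces the summed squared deviation from segment means (a least-squares, mean-shift cost). It is a stand-in for exposition, not BioAtlas's detector; the function names and the `min_size` parameter are hypothetical.

```python
def sse(xs: list[float]) -> float:
    """Sum of squared deviations from the segment mean (least-squares cost)."""
    if not xs:
        return 0.0
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def best_changepoint(series: list[float], min_size: int = 2) -> tuple[int, float]:
    """Return (index, cost_reduction) for the best single mean-shift split.

    The index is where the second segment starts. Returns (-1, 0.0) when no
    split lowers the cost (e.g. a constant series).
    """
    if len(series) < 2 * min_size:
        raise ValueError("series too short for the requested min_size")
    total = sse(series)
    best_idx, best_gain = -1, 0.0
    # Every candidate split leaves at least min_size points on each side.
    for k in range(min_size, len(series) - min_size + 1):
        gain = total - (sse(series[:k]) + sse(series[k:]))
        if gain > best_gain:
            best_idx, best_gain = k, gain
    return best_idx, best_gain
```

ruptures generalizes this idea to multiple breaks with search methods such as binary segmentation or PELT, stopping either at a fixed number of breakpoints or when a penalty outweighs the cost reduction.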

Implementation

Frontend: Next.js 15 (App Router) with shadcn/ui, Tailwind CSS, Recharts for time-series dashboards, and Mapbox GL JS for geospatial anomaly views.
Backend: FastAPI with Pydantic v2, SQLAlchemy 2.0, and Alembic migrations. Supabase for managed Postgres, auth, and realtime alert subscriptions.
Analytics: polars for data pipelines, ruptures for change-point detection, scikit-learn for anomaly detection, and httpx for async data ingestion.
Sequence tools: ESM-2 via PyTorch for per-residue embeddings and log-likelihoods, BioPython for sequence parsing, and Isolation Forest for engineered-sequence detection.
Infrastructure: Vercel (frontend), Railway (backend, plus Redis for the arq job queue), Supabase (database, auth, and realtime), and GitHub Actions for CI.
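ESM-2 log-likelihoods are commonly turned into a fitness score by comparing the model's log-probability of the mutant residue with that of the wild-type residue at each mutated position. The sketch below shows only that arithmetic, in the spirit of wild-type marginal scoring. It assumes the per-position log-probabilities were already computed (for example, by applying a log-softmax to ESM-2 logits), and the `LogProbs` layout and helper names are hypothetical rather than part of any ESM API.

```python
# Hypothetical input layout: log_probs[i][aa] = log P(aa at 0-based position i),
# e.g. taken from a log-softmax over ESM-2 logits for the wild-type sequence.
LogProbs = list[dict[str, float]]


def parse_mutation(mut: str) -> tuple[str, int, str]:
    """Parse a substitution like 'A3G' into ('A', 2, 'G') with a 0-based position."""
    wt, pos, mt = mut[0], int(mut[1:-1]), mut[-1]
    return wt, pos - 1, mt


def mutation_score(sequence: str, log_probs: LogProbs, mutations: list[str]) -> float:
    """Sum of log P(mutant) - log P(wild type) over the listed substitutions.

    Positive scores mean the model finds the mutant residues more plausible
    than the wild type at those positions; negative scores mean less plausible.
    """
    score = 0.0
    for mut in mutations:
        wt, i, mt = parse_mutation(mut)
        if sequence[i] != wt:
            raise ValueError(f"{mut}: sequence has {sequence[i]} at position {i + 1}")
        score += log_probs[i][mt] - log_probs[i][wt]
    return score
```

Keeping the model call outside the scoring function means the arithmetic can be tested without PyTorch or a GPU, and the same scorer works whether the log-probabilities come from the 650M model or a smaller checkpoint.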

Key Design Decisions

Evidence-first architecture: Every ML output becomes a traceable evidence object with inputs, outputs, confidence, and render hints. This is what distinguishes BioAtlas from dashboard-style tools.
Linked ontology: Pathogens, lineages, locations, observations, evidence, claims, alerts, investigations, and response bundles are all connected in a graph, not siloed tables.
Outcome feedback loop: Analysts mark alerts as true or false positives, and these labels are used to tune detection thresholds over time.
Open models over proprietary: ESM-2 (650M params) runs on a single GPU or CPU, providing detection and fitness scoring without vendor lock-in.
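The outcome feedback loop can be as simple as re-picking a detector's alert threshold from analyst labels. The sketch below chooses the threshold with the best F1 score on alerts that analysts have marked as true or false positives. The source does not say which objective BioAtlas optimizes, so F1 and the function name are assumptions.

```python
def tune_threshold(labeled: list[tuple[float, bool]]) -> tuple[float, float]:
    """Pick the alert-score threshold with the best F1 on analyst-labeled alerts.

    labeled: (detector_score, analyst_marked_true_positive) pairs.
    An alert fires when score >= threshold. Returns (threshold, f1);
    returns (inf, 0.0) if there are no true positives to learn from.
    """
    positives = sum(1 for _, tp in labeled if tp)
    best_t, best_f1 = float("inf"), 0.0
    # Only observed scores can change which alerts fire, so test just those.
    for t in sorted({s for s, _ in labeled}):
        fired = [(s, tp) for s, tp in labeled if s >= t]
        tp_count = sum(1 for _, tp in fired if tp)
        if tp_count == 0:
            continue
        precision = tp_count / len(fired)
        recall = tp_count / positives
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

One caveat with this kind of loop: analysts only label alerts that were surfaced, so the labels can justify raising a threshold but say little about true events that scored below the current one.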

Product Surface

Triage Queue: Alerts with severity, confidence, location, top reason, and status. Row actions for opening cases, dismissing, or escalating.
Alert Detail: Evidence timeline with narrative, evidence cards with mini-plots, and full audit trail of analyst actions.
Map View: Anomaly scores by geography with time slider and drill-down per region.
Data Explorer: Filter by pathogen, region, date, and signal type. Export CSV and browse raw observations.
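The Data Explorer's filter-then-export flow can be sketched with the standard library alone. The row schema (`date`, `pathogen`, `region`, `signal_type`, `value`) and the function names below are hypothetical; a production version would more likely push these filters into the Postgres query than filter rows in Python.

```python
import csv
import io
from datetime import date
from typing import Iterable, Optional

FIELDS = ["date", "pathogen", "region", "signal_type", "value"]  # hypothetical schema


def filter_observations(
    rows: Iterable[dict],
    pathogen: Optional[str] = None,
    region: Optional[str] = None,
    signal_type: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> list[dict]:
    """Keep rows matching every supplied filter; None means 'any'. Dates are inclusive."""
    out = []
    for r in rows:
        if pathogen is not None and r["pathogen"] != pathogen:
            continue
        if region is not None and r["region"] != region:
            continue
        if signal_type is not None and r["signal_type"] != signal_type:
            continue
        if start is not None and r["date"] < start:
            continue
        if end is not None and r["date"] > end:
            continue
        out.append(r)
    return out


def to_csv(rows: list[dict]) -> str:
    """Serialize filtered rows for CSV export (dates are written as ISO strings)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for r in rows:
        writer.writerow({k: r[k] for k in FIELDS})
    return buf.getvalue()
```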

Outcomes

BioAtlas is under active development. It demonstrates that the full-stack intelligence layer upstream of sequence-analysis tools can be built by a single engineer using open models and modern web infrastructure. The platform aims to fill a gap that even well-funded companies in the space have not addressed: connecting surveillance data to analyst workflows with full explainability.
