Docs · OGN platform
GPU-native genomics operating system
From raw reads to GIAB-validated variant calls in a continuous GPU pipeline. This is the control surface for the engine: CLI, pipelines, benchmarks, and deployment runbooks.
CUDA 12+ · Hopper · Ampere · GIAB-validated flows · Schemas stable
OGN Architecture Overview
This note is the high‑level map for engineers and integrators. It explains where the core engine lives, how host‑side orchestration works, how data bundles are structured, and where verification hooks in.
1. Engine core
- OGN/ – C++/CUDA libraries that implement the GPU‑first genomics engine:
  - common/ – shared utilities and CUDA guards.
  - io/ – FASTA/Q/CRAM ingest, pinned host buffers, S3 staging helpers.
  - fm_index/, fm2_index/ – FM index construction and queries.
  - mapper/ – ORBIT/seed‑and‑extend mapper, WFA/DPX paths, CPU fallback.
  - sw/ – Smith–Waterman kernels and host wrappers.
  - pairhmm/ – log‑space PairHMM kernels (with optional DPX acceleration).
  - variant/ – variant calling stages and orchestration.
- include/OGN/ – public headers that mirror the OGN/ layout; this is the stable C++ include surface for downstream applications.
- include/ogn/ – higher‑level C++ APIs (ogn_run_api.hpp, ogn_variant_runner_api.hpp, schema.hpp, etc.) and schema types shared with tools and SDKs.
The CMake target ogn links the core engine pieces; CLI tools under apps/ and benchmarks under bench/ depend on this library.
2. Host scheduling and pipelines
- apps/ – host binaries:
  - ogn_run – end‑to‑end mapping + variant calling runner for local use.
  - ogn_variant_runner – pipeline entrypoint used by orchestration layers.
  - ogn_profile_exporter – Prometheus‑style metrics exporter for profiles.
- pipelines/ – YAML/JSON pipeline definitions that describe how to map from inputs (reads + reference) to outputs (VCF, metrics, artifacts).
- profiles/ – small JSON descriptors for named profiles (e.g., illumina_wgs), used by the Python CLI and jobs API.
- schema/ and schemas/ – JSON/Protobuf schema definitions and generated code for profile, provenance, and metrics payloads.
These components are responsible for resource scheduling, device selection, and composing the lower‑level kernels into usable pipelines. They deliberately keep the core engine decoupled from any particular workflow manager or orchestration stack.
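As an illustration of how host-side code can compose lower-level stages without depending on a workflow manager, here is a minimal sketch. It is not the OGN scheduler: the stage names (map, call_variants), the context keys, and the run_pipeline helper are all hypothetical, and the real pipeline definitions live under pipelines/ in a schema not shown here.

```python
from typing import Callable, Dict, List

# A stage takes the accumulated context and returns an updated copy.
Stage = Callable[[dict], dict]


def run_pipeline(stage_names: List[str], registry: Dict[str, Stage], context: dict) -> dict:
    """Run the named stages in order, threading the context through each one."""
    for name in stage_names:
        if name not in registry:
            raise KeyError(f"unknown stage: {name}")
        context = registry[name](context)
    return context


# Hypothetical stand-ins for real engine stages; they only record what ran.
def map_reads(ctx: dict) -> dict:
    return {**ctx, "alignments": f"aligned({ctx['reads']})"}


def call_variants(ctx: dict) -> dict:
    return {**ctx, "vcf": f"variants({ctx['alignments']})"}


REGISTRY = {"map": map_reads, "call_variants": call_variants}

result = run_pipeline(
    ["map", "call_variants"],
    REGISTRY,
    {"reads": "sample.fastq", "reference": "ref.fa"},
)
print(result["vcf"])
```

Because the stage list is plain data, the same ordering could in principle be driven from a YAML/JSON definition or from an external orchestrator without touching the stage implementations.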
3. Data bundles and OGX
- data/ogx/ – OmegaAlign (OGX) bundles; each bundle is a directory with a manifest.json and pre‑built indices suitable for GPU ingest.
- benchmarks/ – benchmark configurations and regression profiles:
  - benchmarks/runs_hg002_wgs_ogn/ – HG002 WGS benchmark outputs.
  - benchmarks/regression/ (planned) – small golden regression packs used for CI perf/accuracy checks.
pipelines/ + profiles/ + OGX bundle paths together form the contract “given reads + reference + profile → VCF + metrics”.
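The bundle layout described above (a directory containing a manifest.json) lends itself to a simple presence check before a run. The sketch below assumes only that layout; the inspect_bundle helper and its messages are hypothetical, and it does not inspect manifest fields because their schema is not described here.

```python
import json
from pathlib import Path
from typing import Tuple


def inspect_bundle(bundle_dir: str) -> Tuple[bool, str]:
    """Return (ok, reason) for a directory expected to hold an OGX bundle."""
    bundle = Path(bundle_dir)
    if not bundle.is_dir():
        return False, "bundle directory missing"

    manifest_path = bundle / "manifest.json"
    if not manifest_path.is_file():
        return False, "manifest.json missing"

    try:
        manifest = json.loads(manifest_path.read_text())
    except json.JSONDecodeError as exc:
        return False, f"manifest.json is not valid JSON: {exc}"

    # Assumption: the manifest is a JSON object; its keys are not checked here.
    if not isinstance(manifest, dict):
        return False, "manifest.json is not a JSON object"
    return True, "ok"
```

A caller could use the returned reason to decide which tests or pipelines to skip when a bundle is absent.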
The scripts/ogn-doctor.sh helper inspects whether a bundle such as data/ogx/chr20_10M.ogx.json/ is present and reports which parts of the test suite can run.
4. Verification and benchmarks
- tests/ – CTest/C++ tests plus Python tests:
  - Unit tests for core kernels (FM index, SW, PairHMM, mapper).
  - Integration tests such as Smoke_RunAndVerify and AlignmentTraceback.
  - CLI tests for ogn_cli and schema/provenance checks.
- bench/ and tools/ – benchmarking harnesses:
  - ogn_bench_* binaries for FM, SW, WFA, PairHMM, mapper.
  - tools/perf_guard.py and bench/perf_baseline.json for CI perf gating.
  - GIAB tooling (tools/run_giab_validation.py, tools/collect_giab_metrics.py) for HG002/HG005/HG007 validation.
- CI workflows:
  - ogn-engine-core – golden linux‑release build + tests (CPU and GPU).
  - ogn-bench-regression – perf regression guard versus baseline.
  - ogn-giab-hg002-chr20 and ogn-giab-wgs – GIAB correctness checks.
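The idea of perf gating against a baseline can be sketched as follows. This is not the logic of tools/perf_guard.py, and the format of bench/perf_baseline.json is not shown in this note; the sketch assumes a hypothetical flat mapping from benchmark name to wall time in seconds and a hypothetical 10% tolerance.

```python
from typing import Dict, List


def find_regressions(
    baseline: Dict[str, float],
    measured: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return benchmarks whose measured time exceeds baseline by more than `tolerance`."""
    regressions = []
    for name, base_seconds in baseline.items():
        if name not in measured:
            continue  # benchmark was not run this time; nothing to compare
        if measured[name] > base_seconds * (1.0 + tolerance):
            regressions.append(name)
    return sorted(regressions)


# Hypothetical numbers: sw is within tolerance, pairhmm is 25% slower.
baseline = {"sw": 1.0, "pairhmm": 2.0}
measured = {"sw": 1.05, "pairhmm": 2.5}
print(find_regressions(baseline, measured))  # -> ['pairhmm']
```

A CI job built on this pattern would exit non-zero when the returned list is non-empty, which is what turns a benchmark run into a gate.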
Over time, VeriBiota‑style “proof profiles” will sit on top of these artifacts: small JSON schemas for alignment/variant proofs plus validators that can be run in CI against representative runs.
5. SDKs and integration points
- OGN Core Kit (ogn-core-kit) – open adoption surface: ogn CLI, ogn-runner, Job Spec v1, SDKs, and workflow adapters.
- openapi/ – OpenAPI spec for the gateway; maps the core “reads + reference + profile → VCF + metrics” contract to HTTP.
The platform treats the core kit and the gateway as integration layers over a small, stable set of contracts (Job Spec, artifact identity rules, and proof/provenance schemas).
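To make the “reads + reference + profile” side of the contract concrete, the sketch below models a job request as a small validated record serialized to JSON. The JobRequest class and its field names are hypothetical and are not the Job Spec v1 schema, which is not reproduced in this note.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical request shape: inputs only, outputs are VCF + metrics."""

    reads: List[str]   # paths or URIs to read files
    reference: str     # path or URI to the reference
    profile: str       # named profile, e.g. "illumina_wgs"

    def validate(self) -> None:
        if not self.reads:
            raise ValueError("at least one reads path is required")
        if not self.reference:
            raise ValueError("reference is required")
        if not self.profile:
            raise ValueError("profile is required")


def to_payload(job: JobRequest) -> str:
    """Validate the request and serialize it with stable key ordering."""
    job.validate()
    return json.dumps(asdict(job), sort_keys=True)


payload = to_payload(
    JobRequest(reads=["sample_R1.fastq"], reference="ref.fa", profile="illumina_wgs")
)
print(payload)
```

Keeping the request this small is the point of the design: a CLI, an SDK, or an HTTP gateway can each build the same record and hand it to the same runner.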
6. What is considered stable
At a high level:
- Stable, versioned surface:
  - C++ headers under include/OGN/ and include/ogn/.
  - JSON/Protobuf schemas under schema/ and schemas/.
  - The Python/Rust SDK APIs (ogn_sdk) plus the public CLI/runner shipped via OGN Core Kit.
- Internal / subject to change:
  - Most code under OGN/ and apps/ that is not explicitly documented as public.
  - Experimental scripts under scripts/dev/, benchmarking harnesses, and ad‑hoc tooling under tools/.
Future refactors should preserve the stable surface while allowing internal layout changes, new kernels, and new pipelines behind those contracts.