Docs · OGN platform

GPU-native genomics operating system

From raw reads to GIAB-validated variant calls in a continuous GPU pipeline. This is the control surface for the engine: CLI, pipelines, benchmarks, and deployment runbooks.

CUDA 12+ · Hopper · Ampere · GIAB-validated flows · Schemas stable

OGN Architecture Overview

This note is the high‑level map for engineers and integrators. It explains where the core engine lives, how host‑side orchestration works, how data bundles are structured, and where verification hooks in.

1. Engine core

  • OGN/ – C++/CUDA libraries that implement the GPU‑first genomics engine:
    • common/ – shared utilities and CUDA guards.
    • io/ – FASTA/FASTQ/CRAM ingest, pinned host buffers, S3 staging helpers.
    • fm_index/, fm2_index/ – FM index construction and queries.
    • mapper/ – ORBIT/seed‑and‑extend mapper, WFA/DPX paths, CPU fallback.
    • sw/ – Smith–Waterman kernels and host wrappers.
    • pairhmm/ – log‑space PairHMM kernels (with optional DPX acceleration).
    • variant/ – variant calling stages and orchestration.
  • include/OGN/ – public headers that mirror the OGN/ layout; this is the stable C++ include surface for downstream applications.
  • include/ogn/ – higher‑level C++ APIs (ogn_run_api.hpp, ogn_variant_runner_api.hpp, schema.hpp, etc.) and schema types shared with tools and SDKs.
The CMake target ogn links the core engine pieces; CLI tools under apps/ and benchmarks under bench/ depend on this library.

2. Host scheduling and pipelines

  • apps/ – host binaries:
    • ogn_run – end‑to‑end mapping + variant calling runner for local use.
    • ogn_variant_runner – pipeline entrypoint used by orchestration layers.
    • ogn_profile_exporter – Prometheus‑style metrics exporter for profiles.
  • pipelines/ – YAML/JSON pipeline definitions that describe how inputs (reads + reference) are turned into outputs (VCF, metrics, artifacts).
  • profiles/ – small JSON descriptors for named profiles (e.g., illumina_wgs), used by the Python CLI and jobs API.
  • schema/ and schemas/ – JSON/Protobuf schema definitions and generated code for profile, provenance, and metrics payloads.
These components are responsible for resource scheduling, device selection, and composing the lower‑level kernels into usable pipelines. They deliberately keep the core engine decoupled from any particular workflow manager or orchestration stack.
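
A named profile such as illumina_wgs could be resolved to its JSON descriptor roughly as follows. This is a minimal sketch: the load_profile helper and the <name>.json file naming inside profiles/ are assumptions made for illustration, not the documented behaviour of the OGN Python CLI or jobs API.

```python
import json
from pathlib import Path


def load_profile(profiles_dir: Path, name: str) -> dict:
    """Load a named profile descriptor from a profiles directory.

    Assumption: each profile lives in a file called <name>.json.
    The real layout of profiles/ may differ.
    """
    path = profiles_dir / f"{name}.json"
    if not path.is_file():
        raise FileNotFoundError(f"no profile descriptor at {path}")
    with path.open(encoding="utf-8") as fh:
        profile = json.load(fh)
    # A descriptor is expected to be a JSON object, not a list or scalar.
    if not isinstance(profile, dict):
        raise ValueError(f"profile {name!r} is not a JSON object")
    return profile
```

Failing early on a missing or malformed descriptor keeps the error close to the configuration mistake instead of surfacing later inside a pipeline run.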

3. Data bundles and OGX

  • data/ogx/ – OmegaAlign (OGX) bundles; each bundle is a directory with a manifest.json and pre‑built indices suitable for GPU ingest.
  • benchmarks/ – benchmark configurations and regression profiles:
    • benchmarks/runs_hg002_wgs_ogn/ – HG002 WGS benchmark outputs.
    • benchmarks/regression/ (planned) – small golden regression packs used for CI perf/accuracy checks.
  • pipelines/ + profiles/ + OGX bundle paths together form the contract “given reads + reference + profile → VCF + metrics”.
The scripts/ogn-doctor.sh helper inspects whether a bundle such as data/ogx/chr20_10M.ogx.json/ is present and reports which parts of the test suite can run.
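
The presence check described above can be sketched in a few lines. This is an illustration only: bundle_status is a hypothetical helper, not the logic of scripts/ogn-doctor.sh, and it relies solely on the stated rule that a bundle is a directory containing a manifest.json. It does not inspect the pre-built indices.

```python
from pathlib import Path


def bundle_status(bundle_dir: Path) -> str:
    """Classify an OGX bundle directory as 'missing', 'no-manifest', or 'ok'.

    Hypothetical helper: only checks that the directory exists and
    contains a manifest.json file.
    """
    if not bundle_dir.is_dir():
        return "missing"
    if not (bundle_dir / "manifest.json").is_file():
        return "no-manifest"
    return "ok"
```

A test runner could use the returned status to skip bundle-dependent tests rather than fail them when the data is not installed.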

4. Verification and benchmarks

  • tests/ – CTest/C++ tests plus Python tests:
    • Unit tests for core kernels (FM index, SW, PairHMM, mapper).
    • Integration tests such as Smoke_RunAndVerify and AlignmentTraceback.
    • CLI tests for ogn_cli and schema/provenance checks.
  • bench/ and tools/ – benchmarking harnesses:
    • ogn_bench_* binaries for FM, SW, WFA, PairHMM, mapper.
    • tools/perf_guard.py and bench/perf_baseline.json for CI perf gating.
    • GIAB tooling (tools/run_giab_validation.py, tools/collect_giab_metrics.py) for HG002/HG005/HG007 validation.
  • CI workflows:
    • ogn-engine-core – golden linux‑release build + tests (CPU and GPU).
    • ogn-bench-regression – perf regression guard versus baseline.
    • ogn-giab-hg002-chr20 and ogn-giab-wgs – GIAB correctness checks.
Over time, VeriBiota‑style “proof profiles” will sit on top of these artifacts: small JSON schemas for alignment/variant proofs plus validators that can be run in CI against representative runs.
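
The CI perf gating listed above boils down to comparing fresh measurements against a stored baseline. The sketch below shows one way such a guard might work. It is not the implementation of tools/perf_guard.py: the function name, the flat name-to-runtime mapping, the "lower is better" convention, and the 5% default tolerance are all assumptions, and the real format of bench/perf_baseline.json is not described here.

```python
def find_regressions(
    baseline: dict[str, float],
    measured: dict[str, float],
    tolerance: float = 0.05,
) -> list[str]:
    """Return the benchmarks that regressed against the baseline.

    Assumptions: values are runtimes (lower is better) and a benchmark
    that is absent from `measured` counts as a regression.
    """
    regressions = []
    for name, base in sorted(baseline.items()):
        current = measured.get(name)
        # Flag missing results and runtimes beyond the allowed slack.
        if current is None or current > base * (1.0 + tolerance):
            regressions.append(name)
    return regressions
```

A CI job would typically fail when the returned list is non-empty and print the offending benchmark names.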

5. SDKs and integration points

  • OGN Core Kit (ogn-core-kit) – open adoption surface: ogn CLI, ogn-runner, Job Spec v1, SDKs, and workflow adapters.
  • openapi/ – OpenAPI spec for the gateway; maps the core “reads + reference + profile → VCF + metrics” contract to HTTP.
The platform treats the core kit and the gateway as integration layers over a small, stable set of contracts (Job Spec, artifact identity rules, and proof/provenance schemas).
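
To make the "reads + reference + profile → VCF + metrics" contract concrete, here is a minimal sketch of how a client might model it. The class and field names are hypothetical and chosen for illustration; they are not the Job Spec v1 schema or the OpenAPI definitions, which should be taken from the core kit and openapi/ respectively.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class JobRequest:
    """Inputs of the contract (hypothetical field names)."""
    reads: tuple[str, ...]   # paths or URIs of read files
    reference: str           # reference or bundle location
    profile: str             # named profile, e.g. "illumina_wgs"

    def validate(self) -> None:
        if not self.reads:
            raise ValueError("at least one reads input is required")
        if not self.reference or not self.profile:
            raise ValueError("reference and profile must be non-empty")

    def to_json(self) -> str:
        # Tuples serialise as JSON arrays; keys are sorted for stable output.
        return json.dumps(asdict(self), sort_keys=True)


@dataclass(frozen=True)
class JobResult:
    """Outputs of the contract (hypothetical field names)."""
    vcf: str       # location of the produced VCF
    metrics: str   # location of the metrics payload
```

Keeping the request immutable and serialisable makes it easy to hash or log alongside the outputs it produced.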

6. What is considered stable

At a high level:
  • Stable, versioned surface:
    • C++ headers under include/OGN/ and include/ogn/.
    • JSON/Protobuf schemas under schema/ and schemas/.
    • The Python/Rust SDK APIs (ogn_sdk) plus the public CLI/runner shipped via OGN Core Kit.
  • Internal / subject to change:
    • Most code under OGN/ and apps/ that is not explicitly documented as public.
    • Experimental scripts under scripts/dev/, benchmarking harnesses, and ad‑hoc tooling under tools/.
Future refactors should preserve the stable surface while allowing internal layout changes, new kernels, and new pipelines behind those contracts.
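
One way to keep this boundary visible in review is to classify changed paths against the stable prefixes listed above. The helper below is a hypothetical sketch, not an existing OGN tool. It covers only the directories named in this section; the SDK and CLI surfaces are omitted because their repository paths are not given here.

```python
from pathlib import PurePosixPath

# Directory prefixes named as stable in this section.
STABLE_PREFIXES = (
    ("include", "OGN"),
    ("include", "ogn"),
    ("schema",),
    ("schemas",),
)


def is_stable_surface(path: str) -> bool:
    """Return True if a repository-relative path lies under a stable prefix."""
    parts = PurePosixPath(path).parts
    # Compare whole path components so "schema_old/" does not match "schema/".
    return any(parts[: len(prefix)] == prefix for prefix in STABLE_PREFIXES)
```

A CI step could require an explicit compatibility note whenever any changed file satisfies is_stable_surface.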
Platform architecture | OGN documentation | Omnis Genomics