Skip to content
Docs · OGN platform

GPU-native genomics operating system

From raw reads to GIAB-validated variant calls in a continuous GPU pipeline. This is the control surface for the engine: CLI, pipelines, benchmarks, and deployment runbooks.

CUDA 12+Hopper · AmpereGIAB-validated flowsSchemas stable
Viewing
Public benchmark page

OGN Benchmark Contract v1.1 (Public Summary)

  • Panels: GIAB core (HG001-HG007) smoke slice, multi-platform HG002 smoke, CHM13 perf stress
  • Reference/Truth: GRCh38 + GIAB v4.2.1 for accuracy; CHM13v2.0 for perf stress
  • Determinism: fixed manifests, seeds, and env.json (hardware, drivers, container, git tag)
  • CI gating: accuracy and performance on every PR/push; gold snapshots stored in-repo
  • Reproducibility: validated across CPU/GPU targets; env + metrics published with releases
  • Contract files: metrics_spec.json, metrics_perf_spec.json, thresholds_*.json, gold TSVs under benchmarks/gold
  • Badge: CLI and UI show “Verified by OGN Benchmark Contract v1.1 ✓” with hover text citing GIAB/CHM13 panels

GPU-native ingest contract (OGX chr20)

  • Scope: chr20 OGX micro-bundle + overrides; validates Omega chain overrides, ingest performance, and OGC stamping.
  • Parity: device ingest path is alignment-identical to the host path (MapperDeviceIngestEquiv GPU test).
  • Perf gate: h2d throughput and wall-time thresholds calibrated on RTX 5070 and enforced in CI (benchmarks/thresholds_ingest_ogx_chr20.json).
  • Provenance: ingest metrics flow from mapper logs → bench TSV → OGC manifests, so every capsule carries real h2d bytes/gbps and wall time.
  • Status is reported alongside GIAB in CI/release notes; failures block merges/releases just like the accuracy/perf panels.

Coming next (v1.2 target): HG002 multi-platform consistency

  • Illumina 2x150, PacBio HiFi, and ONT R10.4.1 on HG002
  • Metrics: per-platform recall/precision plus cross-platform genotype concordance in GIAB confident regions
  • Will ship with thresholds, gold TSVs/JSON, and a CI gate like the other panels
To ship a release, OGN must pass this contract. Releases list the contract version passed and the environments used. Public site targets: ogn.bio/bench, ogn.bio/benchmark-contract, ogn.bio/verified.

How to reproduce (public buckets)

  • Grab the manifests under benchmarks/smoke/ (GIAB chr20 slices) and benchmarks/smoke_multip/ (HG002 multi-platform smoke).
  • Inputs are publicly hosted in GIAB/NCBI buckets; see benchmarks/configs/benchmarks.yaml for URLs.
  • Run python tools/ogn_bench.py bench smoke and compare with the in-repo gold TSVs.

Public microbench contract (v1.0)

  • Purpose: expose the low-level knobs reviewers ask about (FM lookup, S3 jitter, a single GPU kernel timing harness, a CPU reference baseline).
  • Contract: bench/micro/manifest.yaml → JSONL records (schema ogn/bench/micro/v1).
  • Golden + thresholds: bench/micro/gold/baseline.jsonl, bench/micro/thresholds.json.
  • Run: ./ogn bench micro --build-missing (or tools/run_bench.sh for smoke+micro in one command).

Three headline numbers (fill + rebaseline on release hardware)

  • CPU baseline (SW align_ref, 2048×128×128): ~6.2k alignments/s, ~0.10 GCUPS (bench/micro/gold/baseline.jsonl).
  • OGN GPU (SW kernel, 10k×256×256 on RTX 5070): ~1.03M alignments/s, ~67.5 GCUPS (bench/micro/gold/baseline.jsonl).
  • FM lookup (synthetic 1 MiB text, 100k×L16 queries, loops=256): see bench/micro/gold/baseline.jsonl.
  • Competitor placeholder (Parabricks-style): TBD (tracked in docs; replace with real runs once licensed).
Pipeline perf_guard baseline (chr20_sim32): 520k reads/s, 3.8 GCUPS, 9.2 GB/s ingest (bench/perf_baseline.json).

Cost estimate (procurement-friendly)

After any run, use:
BashPowerShellPython API (coming)
ogn CLI
ogn estimate-cost <profile>
This prints per-run GPU hours, storage, and egress estimates plus a USD breakdown using config/cost_model.json (overrideable via OGN_*_USD env vars).
Example output (illustrative; not a performance claim):
$ ogn estimate-cost illumina_wgs
Profile: illumina_wgs
Run: /path/to/runs/illumina_wgs_hg002/20251216T120000Z
State: succeeded
GPU: NVIDIA A100 (count=1)
Wall: 2592.00s

Estimates (per run):
  GPU hours:  0.7200
  Storage:   12.4000 GB (12400000000 bytes)
  Egress:    0.3000 GB (300000000 bytes; scope=results)
  Retention: 30 days

Costs (USD; config/cost_model.json):
  GPU compute: $1.7280 @ $2.4000/hr (A100_80GB)
  Storage:    $0.2852 @ $0.0230/GB-month (scaled to 30d)
  Egress:     $0.0270 @ $0.0900/GB
  Total:      $2.0402

System bench contract (HG002)

  • HG002 is the end-to-end credibility anchor; we keep the harnesses visible and versioned.
  • Harness: bench/hg002/run_bench.sh (local runner that stages inputs, executes Nextflow pipelines/giab_validation.nf, and writes bench/reports/<timestamp>/).
  • Contract: accuracy/perf gated against in-repo gold/thresholds (see bench/hg002/baseline.json plus benchmarks/thresholds_*.json and benchmarks/gold/*).
  • UI/CLI language stays consistent with microbench: versioned manifest → machine-readable outputs → gold + thresholds → CI gate.
Public benchmark page | OGN documentation | Omnis Genomics