GPU-native genomics operating system

From raw reads to GIAB-validated variant calls in a continuous GPU pipeline. This is the control surface for the engine: CLI, pipelines, benchmarks, and deployment runbooks.

CUDA 12+ · Hopper · Ampere · GIAB-validated flows · Schemas stable

OGN CLI quickstart

OGN runs a continuous GPU pipeline from raw reads to GIAB-checked variant calls, with no CPU bottlenecks in the hot path. This quickstart takes you from a fresh environment to a validated HG002 run, and then to your own sample with the same profile.

This guide shows how to use the ogn command-line tool in four steps, followed by simple mode for your own samples:

Check your setup
Stage the HG002 bundle
Run a GIAB-checked HG002 analysis
Inspect the results

Then run the same profile on your own sample.

The mental model is simple:

doctor → setup → run → results

1. Requirements

You need:

The OGN CLI installed (python -m pip install ogn-sdk)
Access to an OGN engine binary (ogn_variant_runner) on your PATH (distributed separately)
Python available
samtools and hap.py on the PATH for GIAB validation
A writable data root (default /data)

By default the CLI uses:

data root: /data or the OGN_DATA_ROOT environment variable
runs root: <repo>/runs or the OGN_RUNS_ROOT environment variable

You can override the data root per command with --data-root.
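A minimal sketch of how the two roots resolve, assuming the precedence implied above: an explicit --data-root beats OGN_DATA_ROOT, which beats /data. The helper names are hypothetical and are not part of ogn-sdk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DATA_ROOT = Path("/data")


def resolve_data_root(flag: Optional[str] = None,
                      env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical helper: --data-root, then OGN_DATA_ROOT, then /data."""
    if flag:
        return Path(flag)
    if env.get("OGN_DATA_ROOT"):
        return Path(env["OGN_DATA_ROOT"])
    return DEFAULT_DATA_ROOT


def resolve_runs_root(repo_root: Path,
                      env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical helper: OGN_RUNS_ROOT if set, otherwise <repo>/runs."""
    if env.get("OGN_RUNS_ROOT"):
        return Path(env["OGN_RUNS_ROOT"])
    return repo_root / "runs"
```

Passing `env` explicitly keeps the precedence testable without touching the real environment.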

2. Directory layout expectations

At minimum the CLI expects:

2.1 HG002 bundle (for GIAB path)

After ogn setup hg002 you should have:

<data_root>/hg002/
    HG002_wgs.cram            # or HG002_wgs.bam
    HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
    HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
    HG002_GRCh38_1_22_v4.2.1_benchmark.bed
    GRCh38_noalt.fa
    GRCh38_noalt.fa.fai

<data_root>/profiles/hg002_summary.json

The summary JSON records the concrete file names under <data_root>/hg002 and is how ogn run discovers the HG002 bundle.
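As a pre-flight check, the sketch below looks for the default file names from the tree above. It is only an approximation: the real CLI reads the concrete names from hg002_summary.json, so a bundle with different names can still be valid. `missing_hg002_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Default names copied from the HG002 layout above.
HG002_REQUIRED = [
    "HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz",
    "HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi",
    "HG002_GRCh38_1_22_v4.2.1_benchmark.bed",
    "GRCh38_noalt.fa",
    "GRCh38_noalt.fa.fai",
]
HG002_ALIGNMENTS = ["HG002_wgs.cram", "HG002_wgs.bam"]  # either one is enough


def missing_hg002_files(data_root: Path) -> List[str]:
    """Return bundle entries missing under data_root (an empty list means complete)."""
    bundle = data_root / "hg002"
    missing = [name for name in HG002_REQUIRED if not (bundle / name).is_file()]
    if not any((bundle / name).is_file() for name in HG002_ALIGNMENTS):
        missing.append("HG002_wgs.cram or HG002_wgs.bam")
    if not (data_root / "profiles" / "hg002_summary.json").is_file():
        missing.append("profiles/hg002_summary.json")
    return missing
```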

2.2 Reference for simple runs

For any other sample name, the default reference layout is:

<data_root>/reference/GRCh38_noalt.fa
<data_root>/reference/GRCh38_noalt.fa.fai

A config override may be added later; for now, this is the default layout.

2.3 Sample layout for simple runs

For a sample named patient123, the CLI looks for either:

<data_root>/patient123/patient123.cram

or

<data_root>/patient123/patient123.bam

The CLI will not guess other names for now. If the files are missing it fails fast with a clear hint.
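The lookup rule can be sketched as follows. The CRAM-then-BAM order and the error wording mirror section 7.3; `resolve_sample_input` and `SampleInputError` are hypothetical names, not the CLI's internals.

```python
from pathlib import Path


class SampleInputError(Exception):
    """Raised when neither <sample>.cram nor <sample>.bam exists."""

    def __init__(self, sample: str, sample_dir: Path):
        self.title = f"No CRAM or BAM found for sample {sample} under {sample_dir}"
        self.hint = f"Expected {sample}.cram or {sample}.bam under {sample_dir}."
        super().__init__(self.title)


def resolve_sample_input(data_root: Path, sample: str) -> Path:
    """Return <data_root>/<sample>/<sample>.cram, else .bam; never guess other names."""
    sample_dir = data_root / sample
    for ext in (".cram", ".bam"):  # CRAM is checked first, then BAM
        candidate = sample_dir / f"{sample}{ext}"
        if candidate.is_file():
            return candidate
    raise SampleInputError(sample, sample_dir)  # fail fast with a hint
```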

3. Step one: check readiness with `ogn doctor`

From your OGN repo or from a venv where ogn is installed:

ogn doctor

Typical output when everything is ready:

OK GPU ready, OGN ready, GIAB bundle available

That line means:

a CUDA device was detected
ogn_variant_runner was found
the data root exists and is writable
the HG002 summary exists under <data_root>/profiles/hg002_summary.json
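Those four conditions map onto simple local checks. The sketch below approximates them; in particular it treats an nvidia-smi binary on PATH as a stand-in for CUDA device detection, which is an assumption and not necessarily how ogn doctor probes the GPU. `readiness` is a hypothetical helper.

```python
import os
import shutil
from pathlib import Path
from typing import Dict


def readiness(data_root: Path) -> Dict[str, bool]:
    """Approximate the four conditions behind the doctor's OK line."""
    return {
        # Assumption: nvidia-smi on PATH stands in for "a CUDA device was detected".
        "gpu": shutil.which("nvidia-smi") is not None,
        "engine": shutil.which("ogn_variant_runner") is not None,
        "data_root_writable": data_root.is_dir() and os.access(data_root, os.W_OK),
        "hg002_summary": (data_root / "profiles" / "hg002_summary.json").is_file(),
    }
```

For example, `all(readiness(Path("/data")).values())` is true only when every check passes.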

If you do not use /data, point the doctor at another root:

ogn doctor --data-root /your/path

You can also get machine-readable output:

ogn doctor --json

which returns a small JSON summary including GPU info and bundle status.
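To consume that summary from a script, one option is a small subprocess wrapper. The exit-code behavior of ogn doctor and its JSON field names are not documented on this page, so this sketch returns both the exit code and the parsed object and leaves interpretation to the caller. `run_json` is a hypothetical helper.

```python
import json
import subprocess
from typing import Any, Optional, Sequence, Tuple


def run_json(cmd: Sequence[str], timeout: float = 120.0) -> Tuple[int, Optional[Any]]:
    """Run a command that prints JSON on stdout; return (exit code, parsed JSON or None)."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, payload


# Example: code, report = run_json(["ogn", "doctor", "--json"])
```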

4. Step two: stage HG002 with `ogn setup hg002`

To stage the standard HG002 bundle into your data root:

ogn setup hg002 --data-root /data

This command:

Creates /data/hg002 if it does not exist
Copies or links the HG002 BAM or CRAM, truth VCF and BED, and reference FASTA into that directory
Runs basic integrity checks (e.g. samtools quickcheck if available)
Writes a summary JSON at /data/profiles/hg002_summary.json

On success you see something like:

HG002 bundle ready under /data/hg002
Summary: /data/profiles/hg002_summary.json

If the CLI cannot write under /data, you will see an error such as:

Error: Could not create /data/hg002. Ask your admin to grant write access or provide a different data root with --data-root /your/path

In that case pick another directory and reuse it as your --data-root for the rest of this guide.
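A rough sketch of the staging steps, assuming symlinks are acceptable (the CLI may copy instead). `stage_hg002`, the role names, and the summary JSON keys are all hypothetical, because the real hg002_summary.json schema is not documented here. `samtools quickcheck` is a real samtools subcommand and only runs if samtools is found.

```python
import json
import shutil
import subprocess
from pathlib import Path
from typing import Dict


def stage_hg002(sources: Dict[str, Path], data_root: Path) -> Path:
    """Link source files into <data_root>/hg002 and write profiles/hg002_summary.json.

    `sources` maps a role (hypothetical, e.g. "alignments", "reference") to a file.
    """
    bundle = data_root / "hg002"
    try:
        bundle.mkdir(parents=True, exist_ok=True)
    except PermissionError as exc:
        raise RuntimeError(
            f"Could not create {bundle}. Ask your admin to grant write access "
            "or provide a different data root with --data-root /your/path"
        ) from exc

    files = {}
    for role, src in sources.items():
        target = bundle / src.name
        if not target.exists() and not target.is_symlink():
            target.symlink_to(src.resolve())  # link rather than copy large inputs
        if role == "alignments" and shutil.which("samtools"):
            # Basic integrity check, only when samtools is available.
            subprocess.run(["samtools", "quickcheck", str(target)], check=True)
        files[role] = src.name

    summary = data_root / "profiles" / "hg002_summary.json"
    summary.parent.mkdir(parents=True, exist_ok=True)
    summary.write_text(json.dumps({"bundle_dir": str(bundle), "files": files}, indent=2))
    return summary
```

Re-running the function is safe: existing links are left in place and the summary is rewritten.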

5. Step three: GIAB-checked HG002 run

This is the main “press button, get truth-checked run” command:

ogn run profile=illumina_wgs sample=hg002 --data-root /data

What happens:

ogn loads /data/profiles/hg002_summary.json
It resolves the reference, CRAM or BAM, and truth VCF and BED
It builds a run directory under <runs_root>/illumina_wgs_hg002/<timestamp>/
It calls tools/run_giab_validation.py with ogn_variant_runner as the engine
It copies the final gVCF from the GIAB validation run to results/HG002.g.vcf.gz
It writes:
- status.json with state and any error info
- metrics.json with runtime, paths, and accuracy numbers
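The run-directory bookkeeping can be sketched like this. The timestamp format, the "running" state, and the status.json keys are assumptions (the examples only show an elided "2025..." prefix and a "succeeded" state); `create_run_dir` and `write_status` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional


def write_status(run_dir: Path, state: str, error: Optional[Dict[str, Any]] = None) -> None:
    """Write status.json with a state and, on failure, an error block (hypothetical schema)."""
    payload: Dict[str, Any] = {"state": state}
    if error is not None:
        payload["error"] = error
    (run_dir / "status.json").write_text(json.dumps(payload, indent=2))


def create_run_dir(runs_root: Path, profile: str, sample: str) -> Path:
    """Create <runs_root>/<profile>_<sample>/<timestamp>/results and an initial status.json."""
    # Assumed timestamp format (UTC, second resolution).
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_dir = runs_root / f"{profile}_{sample}" / stamp
    (run_dir / "results").mkdir(parents=True)  # raises if the same second is reused
    write_status(run_dir, "running")
    return run_dir
```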

A successful run ends with a short summary:

Run complete
Profile illumina_wgs
Sample hg002
GPU NVIDIA RTX 4090
Outputs <runs_root>/illumina_wgs_hg002/2025.../results

If something goes wrong, you will always see:

Run failed
Profile illumina_wgs
Sample hg002
Error: <short title>
Hint: <how to fix>
Details: <log path or (no log)>

and status.json in the run directory will have a matching error block.
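One way to keep the terminal block and the status.json error block consistent is to render both from the same fields. This sketch rebuilds the terminal format shown above; `failure_block` is a hypothetical name.

```python
from typing import Optional


def failure_block(profile: str, sample: str, title: str, hint: str,
                  log_path: Optional[str] = None) -> str:
    """Render the terminal failure summary; Details falls back to "(no log)"."""
    return "\n".join([
        "Run failed",
        f"Profile {profile}",
        f"Sample {sample}",
        f"Error: {title}",
        f"Hint: {hint}",
        f"Details: {log_path or '(no log)'}",
    ])
```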

6. Step four: inspect results with `ogn results`

To inspect the latest run:

ogn results --latest

Example output for a successful GIAB run:

Run: /path/to/runs/illumina_wgs_hg002/2025...
Profile: illumina_wgs
Sample: hg002
State: succeeded

Outputs:
  VCF: /path/to/runs/.../results/HG002.g.vcf.gz
  BAM/CRAM: None

Accuracy vs GIAB HG002
  Panel        HG002_wgs
  Truth build  GRCh38 VCF v4.2.1 BED v4.2.1
  SNP F1       0.9978
  INDEL F1     0.9712
  Status       PASS
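This page does not say how --latest picks a run. A plausible sketch, assuming the most recently modified <profile>_<sample>/<timestamp> directory under the runs root wins (sorting by timestamp name would be an equally valid choice). `latest_run` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_run(runs_root: Path) -> Optional[Path]:
    """Return the newest <profile>_<sample>/<timestamp> directory by mtime, or None."""
    if not runs_root.is_dir():
        return None
    run_dirs = [
        run_dir
        for group in runs_root.iterdir() if group.is_dir()
        for run_dir in group.iterdir() if run_dir.is_dir()
    ]
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)
```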

You can also point ogn results at a specific run directory:

ogn results --run /path/to/runs/illumina_wgs_hg002/2025...

For automation and dashboards you can fetch the combined JSON view:

ogn results --latest --json

That returns the run directory, basic fields, and the full metrics.json including the accuracy block.
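For CI or dashboards, a simple gate over that JSON might look like the sketch below. The key layout (`metrics` → `accuracy` → `snp_f1` / `indel_f1`) is hypothetical, so adjust it to the actual output. The thresholds are the caller's choice; the example values in the test come from the sample output above.

```python
from typing import Any, Dict, Optional


def accuracy_gate(view: Dict[str, Any], min_snp_f1: float,
                  min_indel_f1: float) -> Optional[bool]:
    """True/False for runs with an accuracy block, None for simple runs without one.

    Assumed (hypothetical) layout: view["metrics"]["accuracy"] with "snp_f1" and "indel_f1".
    """
    accuracy = view.get("metrics", {}).get("accuracy")
    if accuracy is None:
        return None  # no truth set, so there is nothing to gate on
    return accuracy["snp_f1"] >= min_snp_f1 and accuracy["indel_f1"] >= min_indel_f1
```

Returning None rather than False keeps simple-mode runs from being reported as accuracy failures.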

7. Simple mode for your own samples

For a real sample, the command shape is the same: only the sample name changes, and there is no GIAB truth set to validate against.

7.1 Prepare the reference

Make the reference directory once under your data root:

mkdir -p /data/reference
cp GRCh38_noalt.fa /data/reference/
samtools faidx /data/reference/GRCh38_noalt.fa

You should now have:

/data/reference/GRCh38_noalt.fa
/data/reference/GRCh38_noalt.fa.fai
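To sanity-check the index, you can read the .fai directly. For a FASTA file each line holds tab-separated NAME, LENGTH, OFFSET, LINEBASES and LINEWIDTH columns (per the samtools faidx documentation); this sketch keeps only name and length. `read_fai` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict


def read_fai(fai_path: Path) -> Dict[str, int]:
    """Map contig name to length from a samtools .fai index."""
    contigs: Dict[str, int] = {}
    with fai_path.open() as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:  # skip blank or malformed lines
                contigs[fields[0]] = int(fields[1])
    return contigs
```

A quick check such as confirming that the expected primary contigs are present catches a truncated or mismatched reference before a long run.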

7.2 Prepare the sample

For a sample named patient123:

mkdir -p /data/patient123
cp patient123.cram /data/patient123/patient123.cram

or:

cp patient123.bam /data/patient123/patient123.bam

The CLI expects exactly these names today: <sample>/<sample>.cram or <sample>/<sample>.bam.

7.3 Run the simple profile

Use the same profile:

ogn run profile=illumina_wgs sample=patient123 --data-root /data

The CLI:

Checks for /data/patient123/patient123.cram then /data/patient123/patient123.bam
Verifies the reference under /data/reference
Builds a run directory under <runs_root>/illumina_wgs_patient123/<timestamp>/
Calls ogn_variant_runner directly and writes results/patient123.g.vcf.gz
Writes status.json and a metrics.json without an accuracy block
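The split between the GIAB path (section 5) and this simple path can be summarized as a small decision function. This is a sketch under an assumption: sample hg002 always requires the staged summary, and any other name takes the simple path. `select_mode` is hypothetical.

```python
from pathlib import Path


def select_mode(data_root: Path, sample: str) -> str:
    """Return "giab" for hg002 (requires the staged summary) and "simple" otherwise."""
    if sample == "hg002":
        summary = data_root / "profiles" / "hg002_summary.json"
        if not summary.is_file():
            raise FileNotFoundError(
                f"{summary} not found; run `ogn setup hg002 --data-root {data_root}` first"
            )
        return "giab"
    return "simple"
```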

Success looks like:

Run complete
Profile illumina_wgs
Sample patient123
GPU NVIDIA RTX 4090
Outputs <runs_root>/illumina_wgs_patient123/2025.../results

If the CRAM or BAM is missing you will see for example:

Run failed
Profile illumina_wgs
Sample patient123
Error: No CRAM or BAM found for sample patient123 under /data/patient123
Hint: Expected patient123.cram or patient123.bam under /data/patient123.
Details: (no log)

If the reference is missing you will see a similar message pointing at /data/reference.

7.4 Inspect simple run results

Use the same results command:

ogn results --latest

For simple runs the output looks the same but does not contain the GIAB accuracy section, because there is no truth dataset:

Run: /path/to/runs/illumina_wgs_patient123/2025...
Profile: illumina_wgs
Sample: patient123
State: succeeded

Outputs:
  VCF: /path/to/runs/.../results/patient123.g.vcf.gz
  BAM/CRAM: None

The JSON output (ogn results --latest --json) still includes metrics.json with runtime and GPU info.

8. Developer notes

When working inside the repo, you can call the CLI through the module path instead of installing it:

python -m sdk.python.ogn_cli.main doctor --data-root /data
python -m sdk.python.ogn_cli.main setup hg002 --data-root /data
python -m sdk.python.ogn_cli.main run profile=illumina_wgs sample=hg002 --data-root /data
python -m sdk.python.ogn_cli.main results --latest

The behavior is identical to the ogn entry point; the module form is just convenient while hacking on the codebase.
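For scripting, the same four-verb pattern can be driven from Python with either the installed entry point or the module form. This wrapper is hypothetical and only shells out to the commands shown in this guide; it assumes the key=value run syntax works for both forms, since the two are described as behaving identically.

```python
import subprocess
import sys
from typing import List, Sequence

OGN = ["ogn"]  # installed entry point
OGN_MODULE = [sys.executable, "-m", "sdk.python.ogn_cli.main"]  # in-repo form


def quickstart_commands(cli: Sequence[str], data_root: str,
                        profile: str, sample: str) -> List[List[str]]:
    """Build the doctor, setup (hg002 only), run, results sequence from this guide."""
    cmds = [[*cli, "doctor", "--data-root", data_root]]
    if sample == "hg002":
        cmds.append([*cli, "setup", "hg002", "--data-root", data_root])
    cmds.append([*cli, "run", f"profile={profile}", f"sample={sample}", "--data-root", data_root])
    cmds.append([*cli, "results", "--latest"])
    return cmds


def run_in_order(cmds: Sequence[Sequence[str]]) -> int:
    """Run each command, stopping at the first non-zero exit code."""
    for cmd in cmds:
        code = subprocess.run(list(cmd)).returncode
        if code != 0:
            return code
    return 0
```

Usage: `run_in_order(quickstart_commands(OGN, "/data", "illumina_wgs", "hg002"))`.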

With this CLI and layout in place, a new hire only has to learn a single pattern:

doctor to check, setup to stage data, run to launch a profile, results to see what happened.
