
GPU-native genomics operating system

From raw reads to GIAB-validated variant calls in a continuous GPU pipeline. This is the control surface for the engine: CLI, pipelines, benchmarks, and deployment runbooks.

CUDA 12+ · Hopper · Ampere · GIAB-validated flows · Schemas stable

OGN CLI quickstart

OGN runs a continuous GPU pipeline from raw reads to GIAB-checked variant calls, with no CPU bottlenecks in the hot path. This quickstart takes you from a fresh environment to a validated HG002 run, and then to your own sample with the same profile.
This guide walks through the ogn command-line tool in four steps:
  1. Check your setup
  2. Stage the HG002 bundle
  3. Run a GIAB-checked HG002 analysis
  4. Inspect the results
It then shows how to run the same profile on your own sample.
The mental model is simple:
doctor → setup → run → results

1. Requirements

You need:
  1. The OGN CLI installed (python -m pip install ogn-sdk)
  2. The OGN engine binary ogn_variant_runner on your PATH (distributed separately)
  3. A CUDA-capable NVIDIA GPU
  4. Python, which the CLI and the GIAB validation script run on
  5. samtools and hap.py on your PATH for GIAB validation
  6. A writable data root (default /data)
By default the CLI uses:
  • data root: the OGN_DATA_ROOT environment variable if set, otherwise /data
  • runs root: the OGN_RUNS_ROOT environment variable if set, otherwise <repo>/runs
You can override the data root per command with --data-root.
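The lookup order can be sketched in Python as follows. The sketch assumes that the --data-root flag takes priority over OGN_DATA_ROOT, which in turn takes priority over /data. It also assumes that OGN_RUNS_ROOT overrides <repo>/runs in the same way. resolve_data_root and resolve_runs_root are hypothetical helpers, not part of ogn-sdk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DATA_ROOT = Path("/data")


def resolve_data_root(cli_value: Optional[str], env: Mapping[str, str]) -> Path:
    """Assumed precedence: --data-root flag, then OGN_DATA_ROOT, then /data."""
    if cli_value:
        return Path(cli_value)
    if env.get("OGN_DATA_ROOT"):
        return Path(env["OGN_DATA_ROOT"])
    return DEFAULT_DATA_ROOT


def resolve_runs_root(repo_root: Path, env: Mapping[str, str]) -> Path:
    """Assumed precedence: OGN_RUNS_ROOT if set, otherwise <repo>/runs."""
    if env.get("OGN_RUNS_ROOT"):
        return Path(env["OGN_RUNS_ROOT"])
    return repo_root / "runs"


if __name__ == "__main__":
    # os.environ is a Mapping[str, str], so it can be passed directly.
    print(resolve_data_root(None, os.environ))
    print(resolve_runs_root(Path.cwd(), os.environ))
```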

2. Directory layout expectations

At minimum the CLI expects:

2.1 HG002 bundle (for GIAB path)

After ogn setup hg002 you should have:
<data_root>/hg002/
    HG002_wgs.cram            # or HG002_wgs.bam
    HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
    HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
    HG002_GRCh38_1_22_v4.2.1_benchmark.bed
    GRCh38_noalt.fa
    GRCh38_noalt.fa.fai

<data_root>/profiles/hg002_summary.json
The summary JSON records the concrete file names under <data_root>/hg002 and is how ogn run discovers the HG002 bundle.
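Below is a minimal sketch of a layout check. It assumes that the file names listed above are fixed and that either alignment format is acceptable. It inspects the directory directly rather than parsing hg002_summary.json, because that file's schema is not documented here. missing_hg002_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# File names copied from the layout above; the alignment may be a CRAM or a BAM.
HG002_REQUIRED = [
    "HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz",
    "HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi",
    "HG002_GRCh38_1_22_v4.2.1_benchmark.bed",
    "GRCh38_noalt.fa",
    "GRCh38_noalt.fa.fai",
]
HG002_ALIGNMENTS = ["HG002_wgs.cram", "HG002_wgs.bam"]


def missing_hg002_files(data_root: Path) -> List[str]:
    """Return the expected HG002 bundle entries that are absent under data_root."""
    bundle = data_root / "hg002"
    # is_file() follows symlinks, so linked inputs count as present.
    missing = [name for name in HG002_REQUIRED if not (bundle / name).is_file()]
    if not any((bundle / name).is_file() for name in HG002_ALIGNMENTS):
        missing.append("HG002_wgs.cram or HG002_wgs.bam")
    if not (data_root / "profiles" / "hg002_summary.json").is_file():
        missing.append("profiles/hg002_summary.json")
    return missing
```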

2.2 Reference for simple runs

For any other sample name, the default reference layout is:
<data_root>/reference/GRCh38_noalt.fa
<data_root>/reference/GRCh38_noalt.fa.fai
This location is fixed for now; a config override may be added later.
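The same fail-fast idea applies to the reference. The hypothetical check_reference helper below looks only at the default paths above. If the index is missing, its error points at samtools faidx. It is a sketch, not the CLI's own code.

```python
from pathlib import Path


def check_reference(data_root: Path) -> Path:
    """Return the default reference FASTA, failing fast if it or its index is missing."""
    fasta = data_root / "reference" / "GRCh38_noalt.fa"
    fai = fasta.with_name(fasta.name + ".fai")  # GRCh38_noalt.fa.fai
    if not fasta.is_file():
        raise FileNotFoundError(f"Reference not found: {fasta}")
    if not fai.is_file():
        raise FileNotFoundError(f"Missing index {fai}; run: samtools faidx {fasta}")
    return fasta
```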

2.3 Sample layout for simple runs

For a sample named patient123:
<data_root>/patient123/patient123.cram
or
<data_root>/patient123/patient123.bam
The CLI does not try other file names for now. If neither file exists, it fails fast with a clear hint.
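The lookup order can be sketched as follows: CRAM first, then BAM, as simple mode describes in section 7.3. resolve_sample_alignment is a hypothetical helper. Its error text mirrors the CLI message shown in section 7.3, but the CLI's actual implementation may differ.

```python
from pathlib import Path


def resolve_sample_alignment(data_root: Path, sample: str) -> Path:
    """Return <data_root>/<sample>/<sample>.cram, else .bam, else fail fast."""
    sample_dir = data_root / sample
    for ext in (".cram", ".bam"):  # CRAM is checked first, then BAM
        candidate = sample_dir / f"{sample}{ext}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(
        f"No CRAM or BAM found for sample {sample} under {sample_dir}. "
        f"Expected {sample}.cram or {sample}.bam under {sample_dir}."
    )
```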

3. Step one: check readiness with ogn doctor

From your OGN repo or from a venv where ogn is installed:
ogn doctor
Typical good output:
OK GPU ready, OGN ready, GIAB bundle available
That line means:
  • a CUDA device was detected
  • ogn_variant_runner was found
  • the data root exists and is writable
  • the HG002 summary exists under <data_root>/profiles/hg002_summary.json
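The four conditions can be approximated with the Python standard library. This sketch is not what ogn doctor runs. In particular, it treats nvidia-smi on the PATH as a stand-in for CUDA device detection. That is an assumption, and it is weaker than a real device query. doctor_checks is a hypothetical helper.

```python
import os
import shutil
from pathlib import Path
from typing import Dict


def doctor_checks(data_root: Path) -> Dict[str, bool]:
    """Approximate the readiness conditions listed above."""
    return {
        # Proxy only: nvidia-smi on PATH does not prove a usable CUDA device.
        "gpu": shutil.which("nvidia-smi") is not None,
        "engine": shutil.which("ogn_variant_runner") is not None,
        "data_root_writable": data_root.is_dir() and os.access(data_root, os.W_OK),
        "hg002_summary": (data_root / "profiles" / "hg002_summary.json").is_file(),
    }
```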
If you do not use /data, you can point the doctor at another root:
ogn doctor --data-root /your/path
You can also get machine-readable output:
ogn doctor --json
This returns a small JSON summary, including GPU info and bundle status.

4. Step two: stage HG002 with ogn setup hg002

To stage the standard HG002 bundle into your data root:
ogn setup hg002 --data-root /data
This command:
  • Creates /data/hg002 if it does not exist
  • Copies or links the HG002 BAM or CRAM, truth VCF and BED, and reference FASTA into that directory
  • Runs basic integrity checks (e.g. samtools quickcheck if available)
  • Writes a summary JSON at /data/profiles/hg002_summary.json
On success you see something like:
HG002 bundle ready under /data/hg002
Summary: /data/profiles/hg002_summary.json
If the CLI cannot write under /data, you will see an error such as:
Error: Could not create /data/hg002. Ask your admin to grant write access or provide a different data root with --data-root /your/path
In that case, pick another directory and reuse it as your --data-root for the rest of this guide.
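The steps that ogn setup hg002 performs can be approximated as follows. The role names, the link-or-copy switch and the summary JSON layout are all hypothetical; the real hg002_summary.json schema is not documented here. As in the "if available" behavior described above, samtools quickcheck runs only when samtools is on the PATH.

```python
import json
import shutil
import subprocess
from pathlib import Path
from typing import Dict


def stage_hg002(sources: Dict[str, Path], data_root: Path,
                link: bool = True, quickcheck: bool = True) -> Path:
    """Stage HG002 inputs under <data_root>/hg002 and write a summary JSON.

    `sources` maps a role (hypothetical names such as "alignment" or
    "truth_vcf") to an existing file. The summary layout is illustrative only.
    """
    bundle = data_root / "hg002"
    bundle.mkdir(parents=True, exist_ok=True)  # create the bundle dir if absent
    staged: Dict[str, str] = {}
    for role, src in sources.items():
        dest = bundle / src.name
        if not dest.exists():
            if link:
                dest.symlink_to(src.resolve())  # avoid copying large inputs
            else:
                shutil.copy2(src, dest)
        staged[role] = dest.name  # record the concrete file name

    # Basic integrity check, only when samtools is installed.
    if quickcheck and "alignment" in staged and shutil.which("samtools"):
        subprocess.run(["samtools", "quickcheck", str(bundle / staged["alignment"])],
                       check=True)

    summary = data_root / "profiles" / "hg002_summary.json"
    summary.parent.mkdir(parents=True, exist_ok=True)
    summary.write_text(json.dumps({"bundle_dir": str(bundle), "files": staged}, indent=2))
    return summary
```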

5. Step three: GIAB-checked HG002 run

This is the main “press button, get truth-checked run” command:
ogn run profile=illumina_wgs sample=hg002 --data-root /data
What happens:
  • ogn loads /data/profiles/hg002_summary.json
  • It resolves the reference, CRAM or BAM, and truth VCF and BED
  • It builds a run directory under <runs_root>/illumina_wgs_hg002/<timestamp>/
  • It calls tools/run_giab_validation.py with ogn_variant_runner as the engine
  • It copies the final GIAB gVCF into results/HG002.g.vcf.gz
  • It writes:
    • status.json with state and any error info
    • metrics.json with runtime, paths, and accuracy numbers
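The run directory layout can be sketched as follows. The timestamp format and the status.json fields are assumptions; the docs show only that the timestamp starts with the year. create_run_dir is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(runs_root: Path, profile: str, sample: str) -> Path:
    """Create <runs_root>/<profile>_<sample>/<timestamp>/results plus a status.json."""
    # Assumed timestamp format; microseconds keep back-to-back runs distinct.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    run_dir = runs_root / f"{profile}_{sample}" / stamp
    # No exist_ok: refuse to reuse an existing run directory.
    (run_dir / "results").mkdir(parents=True)
    status = {"state": "running", "error": None}  # hypothetical schema
    (run_dir / "status.json").write_text(json.dumps(status, indent=2))
    return run_dir
```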
A successful run ends with a short summary:
Run complete
Profile illumina_wgs
Sample hg002
GPU NVIDIA RTX 4090
Outputs <runs_root>/illumina_wgs_hg002/2025.../results
If something goes wrong, you will always see:
Run failed
Profile illumina_wgs
Sample hg002
Error: <short title>
Hint: <how to fix>
Details: <log path or (no log)>
The status.json file in the run directory will also contain a matching error block.
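Below is a sketch of how the failure summary above could be rendered from a status.json error block. The error keys title, hint and log are hypothetical names, chosen to match the Error, Hint and Details lines. Check the real status.json before relying on them.

```python
from typing import Any, Dict


def format_failure(error: Dict[str, Any], profile: str, sample: str) -> str:
    """Render the 'Run failed' summary from a status.json-style error block."""
    return "\n".join([
        "Run failed",
        f"Profile {profile}",
        f"Sample {sample}",
        f"Error: {error.get('title', 'unknown error')}",
        f"Hint: {error.get('hint', 'no hint available')}",
        # Fall back to the same placeholder the CLI prints when there is no log.
        f"Details: {error.get('log') or '(no log)'}",
    ])
```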

6. Step four: inspect results with ogn results

To inspect the latest run:
ogn results --latest
Example output for a successful GIAB run:
Run: /path/to/runs/illumina_wgs_hg002/2025...
Profile: illumina_wgs
Sample: hg002
State: succeeded

Outputs:
  VCF: /path/to/runs/.../results/HG002.g.vcf.gz
  BAM/CRAM: None

Accuracy vs GIAB HG002
  Panel        HG002_wgs
  Truth build  GRCh38 VCF v4.2.1 BED v4.2.1
  SNP F1       0.9978
  INDEL F1     0.9712
  Status       PASS
You can also point ogn results at a specific run directory:
ogn results --run /path/to/runs/illumina_wgs_hg002/2025...
For automation and dashboards you can fetch the combined JSON view:
ogn results --latest --json
That returns the run directory, basic fields, and the full metrics.json including the accuracy block.
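Below is a sketch of latest-run discovery over the <runs_root>/<profile>_<sample>/<timestamp>/ layout. Choosing by directory modification time is an assumption; the CLI may instead sort by timestamp name. find_latest_run is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def find_latest_run(runs_root: Path) -> Optional[Path]:
    """Return the newest <runs_root>/<profile>_<sample>/<timestamp> directory, if any."""
    if not runs_root.is_dir():
        return None
    # Each run lives two levels down: <profile>_<sample>/<timestamp>.
    runs = [p for p in runs_root.glob("*/*") if p.is_dir()]
    if not runs:
        return None
    # Assumption: "latest" means most recently modified.
    return max(runs, key=lambda p: p.stat().st_mtime)
```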

7. Simple mode for your own samples

For your own samples, the command has the same shape. Only the sample name changes, and there is no GIAB truth set to validate against.

7.1 Prepare the reference

Make the reference directory once under your data root:
mkdir -p /data/reference
cp GRCh38_noalt.fa /data/reference/
samtools faidx /data/reference/GRCh38_noalt.fa
You should now have:
/data/reference/GRCh38_noalt.fa
/data/reference/GRCh38_noalt.fa.fai

7.2 Prepare the sample

For a sample named patient123:
mkdir -p /data/patient123
cp patient123.cram /data/patient123/patient123.cram
or:
cp patient123.bam /data/patient123/patient123.bam
The CLI expects exactly these names today: <sample>/<sample>.cram or <sample>/<sample>.bam.

7.3 Run the simple profile

Use the same profile:
ogn run profile=illumina_wgs sample=patient123 --data-root /data
The CLI:
  • Checks for /data/patient123/patient123.cram then /data/patient123/patient123.bam
  • Verifies the reference under /data/reference
  • Builds a run directory under <runs_root>/illumina_wgs_patient123/<timestamp>/
  • Calls ogn_variant_runner directly and writes results/patient123.g.vcf.gz
  • Writes status.json and a metrics.json without an accuracy block
Success looks like:
Run complete
Profile illumina_wgs
Sample patient123
GPU NVIDIA RTX 4090
Outputs <runs_root>/illumina_wgs_patient123/2025.../results
If neither the CRAM nor the BAM exists, you will see, for example:
Run failed
Profile illumina_wgs
Sample patient123
Error: No CRAM or BAM found for sample patient123 under /data/patient123
Hint: Expected patient123.cram or patient123.bam under /data/patient123.
Details: (no log)
If the reference is missing you will see a similar message pointing at /data/reference.

7.4 Inspect simple run results

Use the same results command:
ogn results --latest
For simple runs the output looks the same but does not contain the GIAB accuracy section, because there is no truth dataset:
Run: /path/to/runs/illumina_wgs_patient123/2025...
Profile: illumina_wgs
Sample: patient123
State: succeeded

Outputs:
  VCF: /path/to/runs/.../results/patient123.g.vcf.gz
  BAM/CRAM: None
The --json output still includes metrics.json, with runtime and GPU info.
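Dashboards that consume metrics.json can branch on whether an accuracy block is present. This sketch assumes the block sits under a top-level accuracy key, which these docs do not confirm. load_accuracy is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_accuracy(run_dir: Path) -> Optional[Dict[str, Any]]:
    """Return the accuracy block from metrics.json, or None for simple runs."""
    metrics = json.loads((run_dir / "metrics.json").read_text())
    # Assumed key name; simple runs have no truth set and so no accuracy block.
    return metrics.get("accuracy")
```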

8. Developer notes

When working inside the repo, you can call the CLI through the module path instead of installing it:
python -m sdk.python.ogn_cli.main doctor --data-root /data
python -m sdk.python.ogn_cli.main setup hg002 --data-root /data
python -m sdk.python.ogn_cli.main run illumina_wgs hg002 --data-root /data
python -m sdk.python.ogn_cli.main results --latest
The behavior is identical to the ogn entry point; the module form is just convenient while hacking on the codebase.
With this CLI and layout in place, a new hire only has to learn a single pattern:
doctor to check, setup to stage data, run to launch a profile, results to see what happened.
OGN documentation: CLI, pipelines, deployment, benchmarks | Omnis Genomics