GPU-native genomics operating system
From raw reads to GIAB-validated variant calls in a continuous GPU pipeline. This is the control surface for the engine: CLI, pipelines, benchmarks, and deployment runbooks.
CUDA 12+ · Hopper · Ampere · GIAB-validated flows · Schemas stable
OGN CLI quickstart
OGN runs a continuous GPU pipeline from raw reads to GIAB-checked variant calls with no CPU bottlenecks in the hot path. This quickstart is the control surface: it gets you from “fresh environment” to a validated HG002 run and then to your own sample with the same profile.
This guide shows how to use the ogn command-line tool in four steps:
- Check your setup
- Stage the HG002 bundle
- Run a GIAB-checked HG002 analysis
- Run the same profile on your own sample
The mental model is simple:
doctor → setup → run → results
1. Requirements
You need:
- The OGN CLI installed (python -m pip install ogn-sdk)
- Access to an OGN engine binary (ogn_variant_runner) on your PATH (distributed separately)
- Python available
- samtools and hap.py on the PATH for GIAB validation
- A writable data root (default /data)
By default the CLI uses:
- data root: /data or the OGN_DATA_ROOT environment variable
- runs root: <repo>/runs or the OGN_RUNS_ROOT environment variable
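One way to picture these defaults is a small lookup that prefers an explicit per-command value (such as one given with --data-root), then the environment variable, then the built-in default. The precedence order and the function name are assumptions for illustration; the actual CLI may resolve the root differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_DATA_ROOT = "/data"


def resolve_data_root(cli_value: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the data root: explicit value, then OGN_DATA_ROOT, then /data."""
    env = os.environ if env is None else env
    # Assumed precedence: per-command value > environment variable > default.
    chosen = cli_value or env.get("OGN_DATA_ROOT") or DEFAULT_DATA_ROOT
    return Path(chosen)
```

Passing env explicitly keeps the function easy to exercise without touching the real environment.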
You can override the data root per command with --data-root.
2. Directory layout expectations
At minimum the CLI expects:
2.1 HG002 bundle (for GIAB path)
After ogn setup hg002 you should have:
<data_root>/hg002/
  HG002_wgs.cram # or HG002_wgs.bam
  HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
  HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz.tbi
  HG002_GRCh38_1_22_v4.2.1_benchmark.bed
  GRCh38_noalt.fa
  GRCh38_noalt.fa.fai
<data_root>/profiles/hg002_summary.json
The summary JSON records the concrete file names under <data_root>/hg002 and is how ogn run discovers the HG002 bundle.
2.2 Reference for simple runs
For any other sample name, the default reference layout is:
<data_root>/reference/GRCh38_noalt.fa
<data_root>/reference/GRCh38_noalt.fa.fai
A config override may be added later; for now this layout is the default.
2.3 Sample layout for simple runs
For a sample named patient123:
<data_root>/patient123/patient123.cram
or
<data_root>/patient123/patient123.bam
The CLI will not guess other names for now. If the files are missing it fails fast with a clear hint.
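The naming rule can be sketched as a lookup that tries the CRAM name first and then the BAM name, and returns nothing rather than guessing. This is a minimal illustration, not the CLI's own code; find_sample_input is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional


def find_sample_input(data_root, sample: str) -> Optional[Path]:
    """Return <data_root>/<sample>/<sample>.cram, else .bam, else None."""
    sample_dir = Path(data_root) / sample
    for ext in (".cram", ".bam"):  # CRAM is tried first, then BAM
        candidate = sample_dir / f"{sample}{ext}"
        if candidate.is_file():
            return candidate
    return None  # caller is expected to fail fast with a hint
```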
3. Step one: check readiness with ogn doctor
From your OGN repo or from a venv where ogn is installed:
ogn doctor
Typical good output:
OK GPU ready, OGN ready, GIAB bundle available
That line means:
- a CUDA device was detected
- ogn_variant_runner was found
- the data root exists and is writable
- the HG002 summary exists under <data_root>/profiles/hg002_summary.json
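For a rough idea of what such checks involve, the sketch below approximates the four conditions with the Python standard library. It is not how ogn doctor is implemented: the presence of nvidia-smi on the PATH is only a stand-in for real CUDA device detection, and readiness_report and its key names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def readiness_report(data_root: str = "/data") -> dict:
    """Approximate the four readiness checks listed above."""
    root = Path(data_root)
    return {
        # Assumption: nvidia-smi on PATH is used as a proxy for a CUDA device.
        "gpu_tool_found": shutil.which("nvidia-smi") is not None,
        "engine_found": shutil.which("ogn_variant_runner") is not None,
        "data_root_writable": root.is_dir() and os.access(root, os.W_OK),
        "hg002_summary_present":
            (root / "profiles" / "hg002_summary.json").is_file(),
    }
```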
If you do not use /data you can point the doctor at another root:
ogn doctor --data-root /your/path
You can also get machine-readable output:
ogn doctor --json
This returns a small JSON summary including GPU info and bundle status.
4. Step two: stage HG002 with ogn setup hg002
To stage the standard HG002 bundle into your data root:
ogn setup hg002 --data-root /data
This command:
- Creates /data/hg002 if it does not exist
- Copies or links the HG002 BAM or CRAM, truth VCF and BED, and reference FASTA into that directory
- Runs basic integrity checks (e.g. samtools quickcheck if available)
- Writes a summary JSON at /data/profiles/hg002_summary.json
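The "if available" integrity check could look roughly like the following. It relies only on the documented behaviour of samtools quickcheck (a non-zero exit status when a file fails the check); the wrapper name and the choice to return None when samtools is missing are assumptions, not the CLI's actual logic.

```python
import shutil
import subprocess
from typing import Optional


def quickcheck(path: str) -> Optional[bool]:
    """Run `samtools quickcheck` on a BAM/CRAM; None if samtools is absent."""
    samtools = shutil.which("samtools")
    if samtools is None:
        return None  # check skipped because samtools is not on the PATH
    result = subprocess.run(
        [samtools, "quickcheck", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0  # True only if the file passed the check
```

A three-valued result (True, False, None) lets the caller tell a failed file apart from a skipped check.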
On success you see something like:
HG002 bundle ready under /data/hg002
Summary: /data/profiles/hg002_summary.json
If the CLI cannot write under /data you will see an error such as:
Error: Could not create /data/hg002. Ask your admin to grant write access or provide a different data root with --data-root /your/path
In that case pick another directory and reuse it as your --data-root for the rest of this guide.
5. Step three: GIAB-checked HG002 run
This is the main “press button, get truth-checked run” command:
ogn run profile=illumina_wgs sample=hg002 --data-root /data
What happens:
- ogn loads /data/profiles/hg002_summary.json
- It resolves the reference, CRAM or BAM, and truth VCF and BED
- It builds a run directory under <runs_root>/illumina_wgs_hg002/<timestamp>/
- It calls tools/run_giab_validation.py with ogn_variant_runner as the engine
- It copies the final GIAB gVCF into results/HG002.g.vcf.gz
- It writes:
  - status.json with state and any error info
  - metrics.json with runtime, paths, and accuracy numbers
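The run directory step can be illustrated with a short sketch that composes <runs_root>/<profile>_<sample>/<timestamp>/ and creates a results folder inside it. The timestamp format, the initial "running" value, and the helper name are all assumptions made for the example; only the directory shape and the status.json file name come from this guide.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(runs_root, profile: str, sample: str, now=None) -> Path:
    """Create <runs_root>/<profile>_<sample>/<timestamp>/results."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # hypothetical timestamp format
    run_dir = Path(runs_root) / f"{profile}_{sample}" / stamp
    (run_dir / "results").mkdir(parents=True, exist_ok=False)
    # Placeholder status; the "running" value is illustrative only.
    (run_dir / "status.json").write_text(json.dumps({"state": "running"}))
    return run_dir
```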
A successful run ends with a short summary:
Run complete
Profile illumina_wgs
Sample hg002
GPU NVIDIA RTX 4090
Outputs <runs_root>/illumina_wgs_hg002/2025.../results
If something goes wrong you will always see:
Run failed
Profile illumina_wgs
Sample hg002
Error: <short title>
Hint: <how to fix>
Details: <log path or (no log)>
The status.json in the run directory will have a matching error block.
6. Step four: inspect results with ogn results
To inspect the latest run:
ogn results --latest
Example output for a successful GIAB run:
Run: /path/to/runs/illumina_wgs_hg002/2025...
Profile: illumina_wgs
Sample: hg002
State: succeeded
Outputs:
VCF: /path/to/runs/.../results/HG002.g.vcf.gz
BAM/CRAM: None
Accuracy vs GIAB HG002
Panel HG002_wgs
Truth build GRCh38 VCF v4.2.1 BED v4.2.1
SNP F1 0.9978
INDEL F1 0.9712
Status PASS
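For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below assumes the SNP F1 and INDEL F1 rows follow that standard definition; the function name is illustrative and any counts you feed it are your own, not values from an OGN run.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, F1 drops sharply when either precision or recall is low, which is why it is a common single-number summary for variant calling accuracy.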
You can also point ogn results at a specific run directory:
ogn results --run /path/to/runs/illumina_wgs_hg002/2025...
For automation and dashboards you can fetch the combined JSON view:
ogn results --latest --json
That returns the run directory, basic fields, and the full metrics.json including the accuracy block.
7. Simple mode for your own samples
For a real sample, the command shape is the same, just with a different sample name and without GIAB truth.
7.1 Prepare the reference
Make the reference directory once under your data root:
mkdir -p /data/reference
cp GRCh38_noalt.fa /data/reference/
samtools faidx /data/reference/GRCh38_noalt.fa
You should now have:
/data/reference/GRCh38_noalt.fa
/data/reference/GRCh38_noalt.fa.fai
7.2 Prepare the sample
For a sample named patient123:
mkdir -p /data/patient123
cp patient123.cram /data/patient123/patient123.cram
or:
cp patient123.bam /data/patient123/patient123.bam
The CLI expects exactly these names today: <sample>/<sample>.cram or <sample>/<sample>.bam.
7.3 Run the simple profile
Use the same profile:
ogn run profile=illumina_wgs sample=patient123 --data-root /data
The CLI:
- Checks for /data/patient123/patient123.cram then /data/patient123/patient123.bam
- Verifies the reference under /data/reference
- Builds a run directory under <runs_root>/illumina_wgs_patient123/<timestamp>/
- Calls ogn_variant_runner directly and writes results/patient123.g.vcf.gz
- Writes status.json and a metrics.json without an accuracy block
Success looks like:
Run complete
Profile illumina_wgs
Sample patient123
GPU NVIDIA RTX 4090
Outputs <runs_root>/illumina_wgs_patient123/2025.../results
If the CRAM or BAM is missing you will see for example:
Run failed
Profile illumina_wgs
Sample patient123
Error: No CRAM or BAM found for sample patient123 under /data/patient123
Hint: Expected patient123.cram or patient123.bam under /data/patient123.
Details: (no log)
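The failure summary has a fixed shape, which makes it easy to reproduce in wrapper scripts or tests. The sketch below mirrors the six lines shown above; format_failure is a hypothetical helper, not part of the CLI.

```python
from typing import Optional


def format_failure(profile: str, sample: str, error: str, hint: str,
                   log_path: Optional[str] = None) -> str:
    """Render the six-line failure summary in the layout shown above."""
    details = log_path if log_path else "(no log)"
    return "\n".join([
        "Run failed",
        f"Profile {profile}",
        f"Sample {sample}",
        f"Error: {error}",
        f"Hint: {hint}",
        f"Details: {details}",
    ])
```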
If the reference is missing you will see a similar message pointing at /data/reference.
7.4 Inspect simple run results
Use the same results command:
ogn results --latest
For simple runs the output looks the same but does not contain the GIAB accuracy section, because there is no truth dataset:
Run: /path/to/runs/illumina_wgs_patient123/2025...
Profile: illumina_wgs
Sample: patient123
State: succeeded
Outputs:
VCF: /path/to/runs/.../results/patient123.g.vcf.gz
BAM/CRAM: None
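Since simple runs carry no accuracy section, a script that reads a run directory directly has to treat that section as optional. The sketch below loads status.json and metrics.json and reports whether an accuracy block is present. The "accuracy" key name and the load_run helper are assumptions; check an actual metrics.json for the real field names.

```python
import json
from pathlib import Path


def load_run(run_dir) -> dict:
    """Read status.json and metrics.json from a run directory."""
    run_dir = Path(run_dir)
    status = json.loads((run_dir / "status.json").read_text())
    metrics = json.loads((run_dir / "metrics.json").read_text())
    return {
        "run_dir": str(run_dir),
        "status": status,
        "metrics": metrics,
        # "accuracy" is a hypothetical key name for the GIAB block.
        "has_accuracy": "accuracy" in metrics,
    }
```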
The JSON output still includes the metrics.json with runtime and GPU info.
8. Developer notes
When working inside the repo, you can call the CLI through the module path instead of installing it:
python -m sdk.python.ogn_cli.main doctor --data-root /data
python -m sdk.python.ogn_cli.main setup hg002 --data-root /data
python -m sdk.python.ogn_cli.main run profile=illumina_wgs sample=hg002 --data-root /data
python -m sdk.python.ogn_cli.main results --latest
The behavior is identical to the ogn entry point; the module form is just convenient while hacking on the codebase.
With this CLI and layout in place, a new hire only has to learn a single pattern:
doctor to check, setup to stage data, run to launch a profile, results to see what happened.
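To script the same pattern, the commands from this guide can be assembled as argument lists and run in order. This is a sketch under two assumptions: that ogn exits with a non-zero status when a step fails, and that stopping at the first failure is the desired behaviour. The helper names are hypothetical; the command lines themselves are the ones documented above.

```python
import subprocess
from typing import List


def quickstart_commands(sample: str, data_root: str = "/data",
                        profile: str = "illumina_wgs") -> List[List[str]]:
    """Build the doctor, setup, run, results sequence as argv lists."""
    cmds = [["ogn", "doctor", "--data-root", data_root]]
    if sample == "hg002":
        # Only the HG002 path needs the bundle staged by `ogn setup`.
        cmds.append(["ogn", "setup", "hg002", "--data-root", data_root])
    cmds.append(["ogn", "run", f"profile={profile}", f"sample={sample}",
                 "--data-root", data_root])
    cmds.append(["ogn", "results", "--latest"])
    return cmds


def run_quickstart(sample: str, data_root: str = "/data") -> None:
    """Execute each step in order, stopping at the first failure."""
    for cmd in quickstart_commands(sample, data_root):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Keeping command construction separate from execution means the sequence can be inspected or logged without invoking the engine.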