
PatentChecker enzyme-platform quickstart

PatentChecker now includes an enzyme-platform workflow for patent disclosures covering enzymes used in mRNA manufacturing and other in vitro transcription (IVT) processes.
The common path is:
  1. Ingest a WIPO/PCT publication or a signed sequence-listing evidence pack.
  2. Normalize disclosed sequences against a pinned reference (default: canonical T7 RNA polymerase).
  3. Extract patent text sections with provenance and confidence metadata.
  4. Emit a deterministic report with candidate ranking, typed comparisons, and raw-artifact references.
This is an evidence pipeline, not legal advice. It helps counsel and platform teams inspect what was disclosed, how closely it matches a known polymerase reference, and which patent-text sections support the result.

What the enzyme-platform workflow adds

  • compile-pack: build an enzyme-platform report from an existing signed sequence-listing evidence pack.
  • compile-wipo: start from a WIPO publication number (WO...), run WIPO ingest, fetch patent text, and compile the report in one step.
  • Zero-config T7 reference selection: the default path uses the pinned t7_rnap preset, so the common workflow does not require manual FASTA wiring.
  • Patent-text provenance: abstract, claims, and description each carry source, extraction method, confidence tier, hash, and raw-artifact path.
  • Candidate ranking: disclosed protein sequences are ranked for polymerase-likeness before full comparison.
  • Type gating: DNA/RNA disclosures are not treated as protein matches; they are flagged explicitly as type mismatches.

Fast path: compile from a signed evidence pack

Use this when you already have a sequence-listing evidence pack:
patentchecker enzyme-platform compile-pack \
  --evidence-pack out/sequence_listing_st26__sha256_<...> \
  --sequence-header SEQ_ID_NO_78 \
  --sequence-header SEQ_ID_NO_79 \
  --spec spec.txt \
  --out out/enzyme_platform_report.json
Common behavior:
  • default reference preset: t7_rnap
  • deterministic report schema: enzyme-platform-report.v0.1
  • typed disclosed sequences: protein, dna, or rna
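The compile-pack invocation above repeats --sequence-header once per sequence. As a rough sketch of how that composes when scripting the CLI, the Python below only assembles the documented flags into an argument list; it does not use a PatentChecker Python API. The function name, the placeholder paths, and the treatment of --spec as optional are assumptions made for illustration.

```python
import shlex
from typing import List, Optional, Sequence


def build_compile_pack_argv(
    evidence_pack: str,
    sequence_headers: Sequence[str],
    out_path: str,
    spec_path: Optional[str] = None,
) -> List[str]:
    """Assemble a compile-pack command line from the flags shown in this quickstart.

    Hypothetical helper: only flags that appear in the documented example are used.
    """
    argv = [
        "patentchecker", "enzyme-platform", "compile-pack",
        "--evidence-pack", evidence_pack,
    ]
    # One --sequence-header flag per disclosed sequence of interest.
    for header in sequence_headers:
        argv += ["--sequence-header", header]
    # Assumption: --spec may be omitted (the compile-wipo example does not pass it).
    if spec_path is not None:
        argv += ["--spec", spec_path]
    argv += ["--out", out_path]
    return argv


# Placeholder paths for illustration only.
EXAMPLE_ARGV = build_compile_pack_argv(
    evidence_pack="out/my_evidence_pack",
    sequence_headers=["SEQ_ID_NO_78", "SEQ_ID_NO_79"],
    out_path="out/enzyme_platform_report.json",
    spec_path="spec.txt",
)
print(shlex.join(EXAMPLE_ARGV))
```

With patentchecker on PATH, the list can be passed to subprocess.run(argv, check=True). Keeping it as a list avoids shell quoting problems in paths.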

Fast path: compile directly from a WIPO publication

Use this when you want PatentChecker to run the WIPO ingest step for you:
patentchecker enzyme-platform compile-wipo \
  --wipo WO2024211833A2 \
  --key-id <key-id> \
  --private-key-file <private-key.b64> \
  --sequence-header SEQ_ID_NO_78 \
  --sequence-header SEQ_ID_NO_79 \
  --out out/WO2024211833A2.enzyme_platform_report.json
What happens inside compile-wipo:
  1. WIPO sequence-listing ingest creates a signed evidence pack.
  2. Patent text is fetched and normalized.
  3. Section-level provenance is attached for abstract, claims, and description.
  4. The enzyme-platform compiler generates the final report.

Patent text ingest and provenance

PatentChecker records where each text section came from, not just the merged text.
Current high-confidence sources:
  • Google Patents structured HTML
  • WIPO PATENTSCOPE HTML
  • PDF text layer fallback when structured HTML is unavailable
Each selected section includes:
  • section
  • source
  • source_url
  • extraction_method
  • confidence
  • sha256
  • artifact_path
Each run also persists the fetched raw patent-text artifacts so reviewers can inspect the exact bytes that produced the normalized section text.
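To make the provenance fields concrete, here is a minimal Python sketch of how a reviewer might rebuild and check a section record against a persisted raw artifact. It assumes the sha256 field is the SHA-256 of the raw artifact bytes with a "sha256:" prefix (the prefix matches the report example; which bytes are hashed is an assumption). The function names, the extraction_method value, and the URL are hypothetical.

```python
import hashlib
from typing import Dict


def sha256_tag(raw_bytes: bytes) -> str:
    """Return a 'sha256:<hex>' tag for the given bytes."""
    return "sha256:" + hashlib.sha256(raw_bytes).hexdigest()


def build_section_record(
    section: str,
    source: str,
    source_url: str,
    extraction_method: str,
    confidence: str,
    raw_bytes: bytes,
    artifact_path: str,
) -> Dict[str, str]:
    """Build a record with the seven per-section fields listed above."""
    return {
        "section": section,
        "source": source,
        "source_url": source_url,
        "extraction_method": extraction_method,
        "confidence": confidence,
        # Assumption: the hash covers the raw artifact bytes, not the normalized text.
        "sha256": sha256_tag(raw_bytes),
        "artifact_path": artifact_path,
    }


def artifact_matches(record: Dict[str, str], raw_bytes: bytes) -> bool:
    """Check that the bytes on disk still match the hash stored in the record."""
    return record["sha256"] == sha256_tag(raw_bytes)


# Illustrative values only; none of these come from a real run.
RAW = b"<html><section id='claims'>1. A polymerase variant ...</section></html>"
RECORD = build_section_record(
    section="claims",
    source="google_patents",
    source_url="https://example.invalid/patent",
    extraction_method="structured_html",
    confidence="high",
    raw_bytes=RAW,
    artifact_path="artifacts/claims.html",
)
print(artifact_matches(RECORD, RAW))
print(artifact_matches(RECORD, RAW + b" "))
```

A check like this lets a reviewer confirm that the artifact at artifact_path is byte-for-byte the one the report refers to.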
When counsel needs high-confidence sections only, keep strict patent-text handling enabled in your operational workflow.

Candidate ranking and typed comparisons

The report includes a candidate_ranking block that scores each disclosed protein sequence for polymerase-likeness against the selected reference.
The current strategy uses:
  • length proximity to canonical T7 RNAP
  • offset-tolerant seed-window matching
  • N-terminal similarity sketch
  • motif-window similarity
This ranking is intended to surface the most plausible polymerase disclosures before full comparison review.
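The sketch below illustrates two of the four signals listed above, length proximity and offset-tolerant seed matching, in Python. It is not the polymerase_likeness_v0 implementation. The scoring formulas, the weights, the seed length, and the toy sequences (which are made up and are not T7 RNAP) are all assumptions chosen to show the idea.

```python
from typing import Dict, List, Set, Tuple


def length_proximity(candidate_len: int, reference_len: int) -> float:
    """1.0 when lengths match, falling toward 0.0 as they diverge."""
    longest = max(candidate_len, reference_len)
    if longest == 0:
        return 0.0
    return 1.0 - abs(candidate_len - reference_len) / longest


def seeds(sequence: str, k: int) -> Set[str]:
    """All distinct k-length windows of the sequence."""
    return {sequence[i:i + k] for i in range(max(0, len(sequence) - k + 1))}


def seed_window_score(candidate: str, reference: str, k: int = 4) -> float:
    """Fraction of reference seeds found anywhere in the candidate.

    Positions are ignored, so a candidate shifted by a tag or signal
    peptide still matches; that is the offset tolerance in this sketch.
    """
    ref_seeds = seeds(reference, k)
    if not ref_seeds:
        return 0.0
    return len(ref_seeds & seeds(candidate, k)) / len(ref_seeds)


def rank_candidates(
    candidates: Dict[str, str],
    reference: str,
    w_length: float = 0.3,  # hypothetical weight
    w_seed: float = 0.7,    # hypothetical weight
) -> List[Tuple[str, float]]:
    """Score each candidate and sort best-first, breaking ties by id for determinism."""
    scored = {
        seq_id: w_length * length_proximity(len(seq), len(reference))
        + w_seed * seed_window_score(seq, reference)
        for seq_id, seq in candidates.items()
    }
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


# Toy data: made-up sequences, not real polymerase sequences.
TOY_REFERENCE = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
TOY_CANDIDATES = {
    "near_variant": "GS" + TOY_REFERENCE,  # same sequence shifted by a 2-residue prefix
    "unrelated": "AAAAGGGGCCCC",
}
RANKING = rank_candidates(TOY_CANDIDATES, TOY_REFERENCE)
for seq_id, score in RANKING:
    print(seq_id, round(score, 3))
```

Sorting on (negative score, id) keeps the order stable across runs, in line with the deterministic report described earlier.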
The report also carries typed comparisons:
  • protein vs protein comparisons receive full alignment and mutation output
  • DNA/RNA vs protein comparisons have their scores zeroed and are marked with type_mismatch
That prevents biologically invalid comparisons from being reported as meaningful hits.
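The gating rule can be sketched in a few lines of Python. This is an illustration under assumptions, not PatentChecker's comparison code: the function name is hypothetical, type_mismatch is modeled as a boolean field, and the identity calculation is a naive position-by-position placeholder standing in for a real alignment.

```python
from typing import Dict, Union

Comparison = Dict[str, Union[str, float, bool]]


def naive_identity(a: str, b: str) -> float:
    """Placeholder identity: matching positions over the longer length (no alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def compare_typed(
    molecule_type: str,
    disclosed: str,
    reference_protein: str,
    reference_id: str = "t7_rnap",
) -> Comparison:
    """Compare a disclosed sequence to a protein reference, gating on molecule type."""
    if molecule_type != "protein":
        # DNA/RNA against a protein reference: zero the score and flag it
        # instead of reporting a meaningless similarity.
        return {
            "reference_id": reference_id,
            "identity_fraction": 0.0,
            "type_mismatch": True,
        }
    return {
        "reference_id": reference_id,
        "identity_fraction": naive_identity(disclosed, reference_protein),
        "type_mismatch": False,
    }


# Toy inputs for illustration only.
print(compare_typed("protein", "MKTAYIAK", "MKTAYIAK"))
print(compare_typed("dna", "ATGAAAACC", "MKTAYIAK"))
```

Gating before scoring matters because A, C, G, and T are also valid amino-acid letters, so an ungated comparison could report a non-zero identity for a nucleotide string.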

Key report fields

Expect these fields in the report (abridged; elided values are shown as "..."):
{
  "candidate_ranking": {
    "strategy": "polymerase_likeness_v0",
    "reference_id": "t7_rnap"
  },
  "section_provenance": [
    {
      "section": "claims",
      "source": "google_patents",
      "confidence": "high",
      "artifact_path": "..."
    }
  ],
  "patent_text_artifacts": [
    {
      "source": "google_patents",
      "sha256": "sha256:..."
    }
  ],
  "disclosed_sequences": [
    {
      "molecule_type": "protein",
      "comparisons": [
        {
          "reference_id": "t7_rnap",
          "identity_fraction": 1.0
        }
      ]
    }
  ]
}
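As a usage sketch, the Python below reads a report shaped like the abridged example above and pulls out the parts a reviewer usually looks at first: the high-confidence text sections and the protein comparisons above an identity threshold. It relies only on the field names shown in the example. The helper names and the 0.9 threshold are assumptions, and the embedded JSON is a trimmed copy of the example, not output from a real run.

```python
import json
from typing import Any, Dict, List, Tuple

# Trimmed copy of the example fields above; elided values stay as "...".
REPORT_JSON = """
{
  "candidate_ranking": {"strategy": "polymerase_likeness_v0", "reference_id": "t7_rnap"},
  "section_provenance": [
    {"section": "claims", "source": "google_patents", "confidence": "high", "artifact_path": "..."}
  ],
  "disclosed_sequences": [
    {
      "molecule_type": "protein",
      "comparisons": [{"reference_id": "t7_rnap", "identity_fraction": 1.0}]
    }
  ]
}
"""


def high_confidence_sections(report: Dict[str, Any]) -> List[str]:
    """Names of sections whose provenance is marked high confidence."""
    return [
        entry["section"]
        for entry in report.get("section_provenance", [])
        if entry.get("confidence") == "high"
    ]


def protein_hits(report: Dict[str, Any], min_identity: float = 0.9) -> List[Tuple[str, float]]:
    """(reference_id, identity_fraction) pairs for protein sequences at or above the threshold."""
    hits = []
    for sequence in report.get("disclosed_sequences", []):
        if sequence.get("molecule_type") != "protein":
            continue  # DNA/RNA entries are type mismatches, not hits
        for comparison in sequence.get("comparisons", []):
            if comparison.get("identity_fraction", 0.0) >= min_identity:
                hits.append((comparison["reference_id"], comparison["identity_fraction"]))
    return hits


REPORT = json.loads(REPORT_JSON)
print(high_confidence_sections(REPORT))
print(protein_hits(REPORT))
```

To read a report written by compile-pack or compile-wipo instead, load the file passed to --out with json.load.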

When to use this workflow

Use the enzyme-platform lane when you care about:
  • engineered polymerases in mRNA manufacturing
  • IVT process disclosures tied to sequence listings
  • dsRNA / yield / fidelity claims that need a sequence-backed evidence packet
  • counsel review of polymerase-like candidates inside large PCT sequence listings
OGN documentation | Omnis Genomics