PatentChecker enzyme-platform quickstart
PatentChecker now includes an enzyme-platform workflow for mRNA manufacturing and other IVT enzyme disclosures.
The common path is:
- Ingest a WIPO/PCT publication or a signed sequence-listing evidence pack.
- Normalize disclosed sequences against a pinned reference (default: canonical T7 RNA polymerase).
- Extract patent text sections with provenance and confidence metadata.
- Emit a deterministic report with candidate ranking, typed comparisons, and raw-artifact references.
This is an evidence pipeline, not legal advice. It helps counsel and platform teams inspect what was disclosed, how closely it matches a known polymerase reference, and which patent-text sections support the result.
What the enzyme-platform workflow adds
- compile-pack: build an enzyme-platform report from an existing signed sequence-listing evidence pack.
- compile-wipo: start from a WO... publication, run WIPO ingest, fetch patent text, and compile the report in one step.
- Zero-config T7 reference selection: the default path uses the pinned t7_rnap preset, so the common workflow does not require manual FASTA wiring.
- Patent-text provenance: abstract, claims, and description each carry source, extraction method, confidence tier, hash, and raw-artifact path.
- Candidate ranking: disclosed protein sequences are ranked for polymerase-likeness before full comparison.
- Type gating: DNA/RNA disclosures are not treated as protein matches; they are flagged explicitly as type mismatches.
Fast path: compile from a signed evidence pack
Use this when you already have a sequence-listing evidence pack:

    patentchecker enzyme-platform compile-pack \
      --evidence-pack out/sequence_listing_st26__sha256_<...> \
      --sequence-header SEQ_ID_NO_78 \
      --sequence-header SEQ_ID_NO_79 \
      --spec spec.txt \
      --out out/enzyme_platform_report.json

Common behavior:
- default reference preset: t7_rnap
- deterministic report schema: enzyme-platform-report.v0.1
- typed disclosed sequences: protein, dna, or rna
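Until the Python API ships, the compile-pack invocation can be scripted by assembling the argument vector in Python. The sketch below uses only the flags shown in the command above; the helper name `build_compile_pack_argv` and the placeholder paths are illustrative assumptions, not part of PatentChecker.

```python
import shlex
from typing import List, Sequence


def build_compile_pack_argv(
    evidence_pack: str,
    sequence_headers: Sequence[str],
    spec: str,
    out: str,
) -> List[str]:
    """Assemble a compile-pack argument vector (hypothetical helper).

    Only flags that appear in the quickstart command are emitted.
    """
    argv = [
        "patentchecker", "enzyme-platform", "compile-pack",
        "--evidence-pack", evidence_pack,
    ]
    # --sequence-header is repeated once per disclosed sequence of interest.
    for header in sequence_headers:
        argv += ["--sequence-header", header]
    argv += ["--spec", spec, "--out", out]
    return argv


if __name__ == "__main__":
    argv = build_compile_pack_argv(
        evidence_pack="out/my_evidence_pack",  # placeholder path
        sequence_headers=["SEQ_ID_NO_78", "SEQ_ID_NO_79"],
        spec="spec.txt",
        out="out/enzyme_platform_report.json",
    )
    # Print a copy-pasteable command line; to execute it instead,
    # pass argv to subprocess.run(argv, check=True).
    print(shlex.join(argv))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when evidence-pack paths contain spaces or shell metacharacters.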
Fast path: compile directly from a WIPO publication
Use this when you want PatentChecker to run the WIPO ingest step for you:

    patentchecker enzyme-platform compile-wipo \
      --wipo WO2024211833A2 \
      --key-id <key-id> \
      --private-key-file <private-key.b64> \
      --sequence-header SEQ_ID_NO_78 \
      --sequence-header SEQ_ID_NO_79 \
      --out out/WO2024211833A2.enzyme_platform_report.json

What happens inside compile-wipo:
- WIPO sequence-listing ingest creates a signed evidence pack.
- Patent text is fetched and normalized.
- Section-level provenance is attached for abstract, claims, and description.
- The enzyme-platform compiler generates the final report.
Patent text ingest and provenance
PatentChecker records where each text section came from, not just the merged text.
Current high-confidence sources:
- Google Patents structured HTML
- WIPO PATENTSCOPE HTML
- PDF text layer fallback when structured HTML is unavailable
Each selected section includes:
- section
- source
- source_url
- extraction_method
- confidence
- sha256
- artifact_path
Each run also persists the fetched raw patent-text artifacts so reviewers can inspect the exact bytes that produced the normalized section text.
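Because each entry carries both a hash and an artifact path, a reviewer can re-hash the persisted bytes and compare. The sketch below is a minimal illustration under several assumptions: that `sha256` is recorded as `sha256:<hex>` (the form shown in the report example), that it covers the raw artifact bytes rather than the normalized text, and that `artifact_path` is relative to a base directory you supply. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks and return it in 'sha256:<hex>' form."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def verify_artifact(entry: dict, base_dir=".") -> bool:
    """Return True when the file on disk matches the recorded hash.

    Assumes the entry has 'sha256' and 'artifact_path' keys and that the
    recorded hash was computed over the raw artifact bytes.
    """
    actual = sha256_of_file(Path(base_dir) / entry["artifact_path"])
    return actual == entry["sha256"]
```

If the recorded hash turns out to cover the normalized section text instead, the same comparison applies, but against the normalized text rather than the raw file.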
When counsel needs high-confidence sections only, keep strict patent-text handling enabled in your operational workflow.
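As a downstream complement to strict handling (not a substitute for it), a report can be checked for sections that lack a high-confidence source. This sketch assumes the `section_provenance` layout from the report example and that `"high"` is the tier value of interest; the other tier names and the function name are assumptions.

```python
from typing import Iterable, List


def sections_missing_high_confidence(
    report: dict,
    required: Iterable[str] = ("abstract", "claims", "description"),
) -> List[str]:
    """List required sections with no 'high' confidence provenance entry."""
    high = {
        entry.get("section")
        for entry in report.get("section_provenance", [])
        if entry.get("confidence") == "high"
    }
    return [name for name in required if name not in high]
```

An empty result means every required section has at least one high-confidence entry; anything else is a list of section names to send back for manual review.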
Candidate ranking and typed comparisons
The report includes a candidate_ranking block built for polymerase-likeness against the selected reference.
The current strategy uses:
- length proximity to canonical T7 RNAP
- offset-tolerant seed-window matching
- N-terminal similarity sketch
- motif-window similarity
This ranking is intended to surface the most plausible polymerase disclosures before full comparison review.
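To make the signals above concrete, here is a toy scoring sketch. It is not the polymerase_likeness_v0 implementation: the formulas, equal weighting, parameter defaults, and function names are all assumptions chosen for illustration, and the motif-window signal is omitted because it depends on motif definitions this page does not give.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def length_proximity(candidate: str, reference: str) -> float:
    """1.0 for equal lengths, falling toward 0.0 as lengths diverge."""
    longest = max(len(candidate), len(reference))
    if longest == 0:
        return 0.0
    return 1.0 - abs(len(candidate) - len(reference)) / longest


def seed_window_score(candidate: str, reference: str,
                      k: int = 5, stride: int = 5, tolerance: int = 10) -> float:
    """Fraction of reference seeds found near the same offset in the candidate."""
    hits = total = 0
    for pos in range(0, len(reference) - k + 1, stride):
        total += 1
        seed = reference[pos:pos + k]
        # Allow the seed to shift by up to `tolerance` residues either way.
        window = candidate[max(0, pos - tolerance):pos + k + tolerance]
        if seed in window:
            hits += 1
    return hits / total if total else 0.0


def n_terminal_similarity(candidate: str, reference: str,
                          n: int = 30, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets over the first n residues."""
    a, b = kmers(candidate[:n], k), kmers(reference[:n], k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidates(candidates: Dict[str, str],
                    reference: str) -> List[Tuple[str, float]]:
    """Score each candidate with an unweighted mean and sort best-first."""
    scored = []
    for name, seq in candidates.items():
        score = (length_proximity(seq, reference)
                 + seed_window_score(seq, reference)
                 + n_terminal_similarity(seq, reference)) / 3
        scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Cheap signals like these only need to order candidates well enough that the expensive full alignment runs on the plausible ones first; they are not a replacement for the comparison itself.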
The report also carries typed comparisons:
- protein vs protein comparisons receive full alignment and mutation output
- DNA/RNA vs protein comparisons are zeroed and marked with type_mismatch
That prevents biologically invalid comparisons from being reported as meaningful hits.
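The gating rule can be sketched as a check that runs before any scoring. In this illustration `reference_id` and `identity_fraction` follow the report example, while representing `type_mismatch` as a boolean field, the input dict shapes, and the ungapped positional identity (standing in for a real alignment) are assumptions.

```python
def positional_identity(a: str, b: str) -> float:
    """Ungapped identity over the longer length (stand-in for alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / longest


def gated_comparison(disclosed: dict, reference: dict) -> dict:
    """Compare only when molecule types agree; otherwise zero and flag."""
    result = {
        "reference_id": reference["id"],
        "identity_fraction": 0.0,
        "type_mismatch": False,
    }
    if disclosed["molecule_type"] != reference["molecule_type"]:
        # A DNA or RNA disclosure is never scored against a protein reference.
        result["type_mismatch"] = True
        return result
    result["identity_fraction"] = positional_identity(
        disclosed["sequence"], reference["sequence"]
    )
    return result
```

Keeping the flag separate from the score lets a reviewer distinguish "not comparable" from "compared and found dissimilar", which a bare 0.0 would conflate.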
Key report fields
Expect these fields in the report:

    {
      "candidate_ranking": {
        "strategy": "polymerase_likeness_v0",
        "reference_id": "t7_rnap"
      },
      "section_provenance": [
        {
          "section": "claims",
          "source": "google_patents",
          "confidence": "high",
          "artifact_path": "..."
        }
      ],
      "patent_text_artifacts": [
        {
          "source": "google_patents",
          "sha256": "sha256:..."
        }
      ],
      "disclosed_sequences": [
        {
          "molecule_type": "protein",
          "comparisons": [
            {
              "reference_id": "t7_rnap",
              "identity_fraction": 1.0
            }
          ]
        }
      ]
    }

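A short reader for these fields might look like the sketch below. It relies only on the keys shown in the example report and tolerates missing ones; the function name and the output line format are illustrative assumptions, and a real report may carry additional fields.

```python
import json
from typing import List


def summarize_report(report: dict) -> List[str]:
    """Produce one line for the ranking block and one per comparison."""
    ranking = report.get("candidate_ranking", {})
    lines = [
        f"ranking strategy={ranking.get('strategy')} "
        f"reference={ranking.get('reference_id')}"
    ]
    for index, seq in enumerate(report.get("disclosed_sequences", [])):
        for comparison in seq.get("comparisons", []):
            lines.append(
                f"sequence[{index}] type={seq.get('molecule_type')} "
                f"vs {comparison.get('reference_id')}: "
                f"identity={comparison.get('identity_fraction')}"
            )
    return lines


if __name__ == "__main__":
    # Path taken from the compile-pack example; adjust to your own output.
    with open("out/enzyme_platform_report.json", encoding="utf-8") as handle:
        for line in summarize_report(json.load(handle)):
            print(line)
```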
When to use this workflow
Use the enzyme-platform lane when you care about:
- engineered polymerases in mRNA manufacturing
- IVT process disclosures tied to sequence listings
- dsRNA / yield / fidelity claims that need a sequence-backed evidence packet
- counsel review of polymerase-like candidates inside large PCT sequence listings