PatentChecker
Platform quickstart (control plane + runners)
This is the operational “how to use it” guide for the headless PatentChecker platform: programs, watchlists, runners, drift, webhooks, triage, and retention.
PatentChecker is not a “run a command, get an answer” tool. It’s a continuously running monitoring system that produces verifiable, immutable evidence bundles and drift events you can review and disposition.
Mental model
PatentChecker has four moving parts:
- Control plane (SaaS): Programs, watchlists, runs, drift events, receipts, retention, webhooks.
- Runner (hosted or customer VPC): Executes the job, produces deterministic artifacts, uploads by digest, finalizes the run.
- Scheduler + outbox: Creates runs on schedule and delivers signed webhooks.
- Triage API: Humans review drift, assign, disposition, and download/export evidence.
Everything important is an artifact with digests. The UI (later) is just a view onto those immutable artifacts.
Request flow (high level)
program (active corpus snapshot)
-> watchlist (schedule + query ref + retention policy)
-> run (immutable record, pinned corpus snapshot)
-> job (leased to a runner)
-> runner executes + uploads artifacts (deduped by sha256)
-> run finalized (idempotent)
-> drift event created (new vs old run)
-> webhook delivered (signed, at-least-once)
-> human triage (assign + dispositions)
-> retention pins evidence until drift is closed, then purges with a receipt
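Every artifact in this flow is content-addressed, so identical bytes always resolve to the same identifier. A minimal sketch of how such a digest is computed (the platform's exact canonicalization rules are not specified here, so treat this as an illustration):

```python
import hashlib

def artifact_digest(data: bytes) -> str:
    """Content-address an artifact: sha256 over the raw bytes, hex-encoded."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Identical bytes always yield the identical digest, which is what makes
# dedupe-safe uploads and immutable, verifiable evidence bundles possible.
d = artifact_digest(b'{"hits": []}')
```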
Where the code lives (local dev)
You will typically interact with three repos:
- ~/patentchecker: the trust anchor (artifact contracts + verifier + drift diff engine)
- ~/patentchecker-platform: the control plane + workers + reference runner
- ~/patentchecker-adapters: search adapters (BLAST/DIAMOND/etc.), depending on your deployment
This doc assumes you are running the platform from ~/patentchecker-platform.
0) Run the stack locally
From the platform repo:
cd ~/patentchecker-platform
docker compose -f docker-compose.dev.yml up --build
This starts:
- control plane API (Fastify)
- Postgres
- MinIO (S3-compatible object storage)
- scheduler worker
- outbox worker (webhooks)
- retention worker
The default dev auth token is the CONTROL_PLANE_API_KEY from the compose file (often devkey).
1) Create a program (unit of value)
Programs are the unit you sell and measure: one program per target / modality / team.
export BASE_URL="http://localhost:3000"
export TOKEN="devkey"
curl -sS -X POST "$BASE_URL/v1/programs" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: prog-kras-001" \
-d '{"name":"KRAS-G12D Program","description":"Primary oncology program"}'
Save the returned program_id.
2) Register + activate a corpus snapshot (what was checked)
Every run is pinned to exactly one corpus snapshot digest. The scheduler uses the program’s active corpus snapshot when creating runs (unless you override it on a run-now request).
Register a snapshot (minimal provenance example for source_type=customer):
curl -sS -X POST "$BASE_URL/v1/programs/<program_id>/corpus-snapshots" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: cs-kras-001" \
-d '{
"corpus_snapshot_digest":"sha256:<64>",
"source_type":"customer",
"manifest":{
"jurisdictions":["US"],
"sequence_types":["protein"],
"made_public_until":"2025-01-01"
}
}'
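One way to produce a corpus_snapshot_digest value is to hash a canonical serialization of the snapshot manifest. The platform's actual digest specification may differ, so this is only an illustration of the canonicalization idea:

```python
import hashlib
import json

def corpus_snapshot_digest(manifest: dict) -> str:
    # Canonicalize: sorted keys, no insignificant whitespace, so the same
    # logical manifest always hashes to the same digest regardless of key order.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()

digest = corpus_snapshot_digest({
    "jurisdictions": ["US"],
    "sequence_types": ["protein"],
    "made_public_until": "2025-01-01",
})
```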
Activate it for the program (what new runs will use by default):
curl -sS -X POST "$BASE_URL/v1/programs/<program_id>/corpus-snapshots/activate" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: cs-kras-activate-001" \
-d '{"corpus_snapshot_digest":"sha256:<64>","reason":"baseline"}'
3) Create a watchlist (what to monitor)
Watchlists define:
- runner target (hosted vs VPC runner group)
- schedule (interval for now)
- retention policy (what evidence is kept / purged)
- query input reference (where your sequence payload lives)
Example (customer-bucket pointer):
curl -sS -X POST "$BASE_URL/v1/programs/<program_id>/watchlists" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: wl-kras-protein-001" \
-d '{
"name":"KRAS protein watch",
"enabled":true,
"runner_target":{"kind":"hosted"},
"schedule":{"kind":"interval","interval_seconds":86400},
"retention_policy":{"retention_enabled":true,"keep_last_n":10,"keep_days":90,"legal_hold":false},
"query_input_ref":{"kind":"customer_bucket_pointer","uri":"s3://customer-bucket/watchlists/kras.json"}
}'
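The interval schedule above can be read as "a new run becomes due interval_seconds after the last one". A sketch of that computation (assumed semantics; the scheduler's exact catch-up behavior for watchlists that have fallen behind is not specified here):

```python
def next_run_at(last_run_at: int, interval_seconds: int, now: int) -> int:
    """Return the next due time as a Unix timestamp.

    If the watchlist has fallen behind schedule, the run is due immediately.
    """
    due = last_run_at + interval_seconds
    return due if due > now else now
```

For a daily watchlist (interval_seconds=86400) last run at t=0, the next run is due at t=86400; if "now" is already past that, it is due immediately.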
Save the returned watchlist_id.
4) Trigger a run (run-now) or wait for the scheduler
The scheduler creates runs automatically based on the watchlist schedule.
To force a run now:
curl -sS -X POST "$BASE_URL/v1/watchlists/<watchlist_id>/runs" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: run-now-001" \
-d '{}'
This creates:
- a run row (system-of-record)
- a job row (leaseable unit for a runner)
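The Idempotency-Key header makes run-now safe to retry: repeating the same request returns the original run instead of creating a duplicate. A toy in-memory sketch of that server-side contract (the real control plane persists keys durably; this is illustrative only):

```python
import uuid

_seen: dict[str, dict] = {}  # idempotency key -> previously created resource

def create_run(idempotency_key: str, watchlist_id: str) -> dict:
    """Create a run once per idempotency key; replays return the same row."""
    if idempotency_key in _seen:
        return _seen[idempotency_key]
    run = {"run_id": str(uuid.uuid4()), "watchlist_id": watchlist_id}
    _seen[idempotency_key] = run
    return run

first = create_run("run-now-001", "wl_123")
retry = create_run("run-now-001", "wl_123")  # network retry: same run back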
5) Execute the job with a runner
Option A: Reference runner (dev)
In dev, use the platform’s reference runner to prove the end-to-end lifecycle:
cd ~/patentchecker-platform
npm ci
npm run run:reference-runner
The runner:
- pulls a job lease
- heartbeats until completion
- produces a deterministic run directory + bundle
- uploads artifacts by sha256 (dedupe-safe)
- finalizes the run (idempotent)
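The digest-addressed upload step can be sketched like this (a toy in-memory store standing in for MinIO/S3; the real runner streams files through the control plane's upload API):

```python
import hashlib

class ArtifactStore:
    """Content-addressed store: uploading the same bytes twice is a no-op."""

    def __init__(self):
        self._blobs: dict[str, bytes] = {}
        self.uploads = 0

    def put(self, data: bytes) -> str:
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:  # dedupe: skip bytes we already hold
            self._blobs[digest] = data
            self.uploads += 1
        return digest

store = ArtifactStore()
d1 = store.put(b"run bundle bytes")
d2 = store.put(b"run bundle bytes")  # retried upload: no second write
```

This is why a runner can safely retry uploads after a crash: re-sending the same artifact changes nothing.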
Option B: Hosted runner (prod)
Same protocol, but deployed as a long-running service in your cloud.
Option C: Customer VPC runner (prod upsell)
Same runner container, but deployed in customer infra. The control plane schedules jobs and receives only receipts/artifacts; customer secrets remain in the VPC.
6) Inspect runs and download evidence
List runs:
curl -sS "$BASE_URL/v1/runs?watchlist_id=<watchlist_id>&limit=50" \
-H "Authorization: Bearer $TOKEN"
Download the run bundle (presigned URL):
curl -sS "$BASE_URL/v1/runs/<run_id>/bundle" \
-H "Authorization: Bearer $TOKEN"
Get the bundle manifest inline:
curl -sS "$BASE_URL/v1/runs/<run_id>/bundle/manifest" \
-H "Authorization: Bearer $TOKEN"
Bundle-truth viewer index (UI should follow only returned links; never compose URLs client-side):
curl -sS "$BASE_URL/v1/runs/<run_id>/view-index" \
-H "Authorization: Bearer $TOKEN"
7) Drift is created automatically (what humans review)
When a run finalizes, the platform compares it to the prior finalized run for the same watchlist and creates a drift event.
List drift events for a program:
curl -sS "$BASE_URL/v1/programs/<program_id>/drift-events?state=new&limit=50" \
-H "Authorization: Bearer $TOKEN"
Get one drift event:
curl -sS "$BASE_URL/v1/drift-events/<drift_event_id>" \
-H "Authorization: Bearer $TOKEN"
Download evidence for a drift event (both new + old bundles):
curl -sS "$BASE_URL/v1/drift-events/<drift_event_id>/bundle" \
-H "Authorization: Bearer $TOKEN"
Explain (stub, but stable pointers):
curl -sS "$BASE_URL/v1/drift-events/<drift_event_id>/explain" \
-H "Authorization: Bearer $TOKEN"
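Conceptually, a drift event is a set difference between the new run's results and the prior run's, keyed by a stable hit identity. The real diff engine lives in ~/patentchecker; a toy sketch of the idea (the "patent_id" key is a hypothetical field name):

```python
def diff_hits(old: list[dict], new: list[dict], key: str = "patent_id") -> dict:
    """Compare two runs' hit lists and report what appeared/disappeared."""
    old_keys = {h[key] for h in old}
    new_keys = {h[key] for h in new}
    return {
        "added": sorted(new_keys - old_keys),    # new exposure since last run
        "removed": sorted(old_keys - new_keys),  # hits that dropped out
    }

drift = diff_hits(
    old=[{"patent_id": "US111"}, {"patent_id": "US222"}],
    new=[{"patent_id": "US222"}, {"patent_id": "US333"}],
)
```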
8) Webhooks (how teams “feel” the system without UI)
Register a webhook endpoint:
curl -sS -X POST "$BASE_URL/v1/webhook-endpoints" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: wh-001" \
-d '{"url":"https://example.com/webhooks/patentchecker"}'
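A consumer for these deliveries needs two things: signature verification and dedupe by event_id. A minimal sketch, assuming the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body (the header name, encoding, and secret format here are assumptions; check the endpoint registration response for the real secret and scheme):

```python
import hashlib
import hmac

SECRET = b"whsec_example"   # hypothetical secret returned at registration
_handled: set[str] = set()  # event_ids we have already processed

def verify(raw_body: bytes, signature_hex: str) -> bool:
    expected = hmac.new(SECRET, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(expected, signature_hex)

def handle(event: dict) -> bool:
    """Return True if processed, False if it was a duplicate delivery."""
    if event["event_id"] in _handled:  # at-least-once delivery => dedupe
        return False
    _handled.add(event["event_id"])
    return True
```

In production, persist handled event_ids durably; an in-memory set loses its dedupe state on restart.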
Events are delivered at-least-once and signed with an HMAC secret. Design consumers to dedupe by event_id.
9) Triage primitives (assignment + dispositions)
Assign a drift event:
curl -sS -X POST "$BASE_URL/v1/drift-events/<drift_event_id>/assign" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: assign-001" \
-d '{"assigned_to_user_id":"user_demo"}'
Create a disposition (append-only):
curl -sS -X POST "$BASE_URL/v1/drift-events/<drift_event_id>/dispositions" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-H "Idempotency-Key: disp-001" \
-d '{"label":"relevant","reason_code":"triage","comment":"Overlaps our KRAS program; counsel review"}'
State semantics (must stay consistent across API + retention):
- new: no dispositions
- open: latest disposition is non-terminal (e.g. needs_review, escalate)
- closed: latest disposition is terminal (e.g. relevant, not_relevant)
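These semantics reduce to "state is a function of the latest disposition". A sketch (the terminal/non-terminal label sets below mirror the examples above and may not be exhaustive):

```python
TERMINAL = {"relevant", "not_relevant"}

def drift_state(dispositions: list[str]) -> str:
    """Derive drift-event state from its append-only disposition log."""
    if not dispositions:
        return "new"
    return "closed" if dispositions[-1] in TERMINAL else "open"
```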
10) Retention (evidence purge with receipts)
Retention is designed to be safe and auditable:
- Evidence for runs referenced by any non-closed drift event is pinned (both new + old sides).
- When evidence is purged, the run/drift history remains, and the system writes a retention deletion receipt artifact.
- Evidence never “silently disappears”.
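The pinning rule can be sketched as: a run's evidence is purgeable only when no non-closed drift event references the run on either side and the retention policy allows it. A simplified model (keep_days and the keep_days/keep_last_n interplay are omitted; these are assumed semantics, not the platform's exact retention algorithm):

```python
def purgeable_runs(runs: list[str], drift_events: list[dict],
                   keep_last_n: int, legal_hold: bool = False) -> list[str]:
    """runs: run ids newest-first; drift_events: dicts with state,
    new_run_id, old_run_id. Returns ids whose evidence may be purged."""
    if legal_hold:
        return []                                  # legal hold pins everything
    pinned = set(runs[:keep_last_n])               # policy: keep most recent N
    for ev in drift_events:
        if ev["state"] != "closed":                # new/open drift pins both sides
            pinned.add(ev["new_run_id"])
            pinned.add(ev["old_run_id"])
    return [r for r in runs if r not in pinned]

candidates = purgeable_runs(
    runs=["r3", "r2", "r1"],
    drift_events=[{"state": "open", "new_run_id": "r1", "old_run_id": "r0"}],
    keep_last_n=1,
)
```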
After evidence is purged, bundle/explain endpoints degrade deterministically:
- HTTP 410 Gone
- error code EVIDENCE_PURGED
- details.retention_deletion_receipt_artifact_id points to the receipt artifact
When a retention deletion receipt exists, you can fetch its viewer link:
curl -sS "$BASE_URL/v1/runs/<run_id>/retention-deletion-receipt/view" \
-H "Authorization: Bearer $TOKEN"
What to do next as a user (dogfood)
If you’re dogfooding internally:
- Stand up one real protein watchlist.
- Wire webhooks to Slack.
- Review drift weekly and disposition everything.
- Track: alerts/week, time-to-triage, false positives.
That’s the shortest path to real signal before building UI or scaling corpus ingestion.