
Integration Orchestration

Automated ingestion has always been a core part of Dendra. What changed in Release 3 is who can set it up and how the platform keeps it running.

In Release 2, integration settings and transformation logic lived in GitHub (dendra-worker-state and related worker task packages). Dedicated worker processes watched for repository changes and reconfigured themselves. Spinning up a new Campbell LDMP connection, a HOBOlink pull, or a webhook endpoint typically required a backend engineer to edit state documents, deploy workers, and wire NATS subjects by hand.

Release 3 moves integration configuration into the Platform API and the Management interface. Users create integration configs through Configure Integrations. For each config, a long-running orchestrator job then listens for changes, reconciles the desired state against what is running, and stands up the ETL (extract/transform/load) pipeline automatically.

For background job mechanics, see Job Scheduler and Workers.

Every integration has a type — the kind of vendor connection — and one or more configs — your organization’s actual setup for that type.

| | Integration type | Integration config |
| --- | --- | --- |
| What it is | Template for a vendor integration | A configured instance of that type |
| Defined by | Deploy config (api.toml in api-services) | User or API client via Configure Integrations |
| Direction & scope | Inbound or outbound; organization or station | Inherits from the type |
| Settings | Required variables and their types | Actual values: strings, secrets, references to other configs |
| Runtime | Declares job types the orchestrator runs when enabled | Enabled flag and operation mode (test or live) for pipeline routing |
| Example | Campbell Scientific Station | One LoggerNet station linked to an LDMP org config, with table names |

See the Integrations guide for supported types.
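The split between a type and its configs can be pictured as a small data model. This is only an illustrative sketch: the class names, field names, and the `validate` helper are assumptions made for the example and do not come from the Platform API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class IntegrationType:
    """Template for a vendor integration (hypothetical shape)."""
    name: str
    direction: str                       # "inbound" or "outbound"
    scope: str                           # "organization" or "station"
    required_settings: dict[str, type]   # variable name -> expected Python type
    job_types: tuple[str, ...]           # job types run when a config is enabled


@dataclass
class IntegrationConfig:
    """A configured instance of an integration type (hypothetical shape)."""
    type: IntegrationType
    settings: dict[str, Any]
    enabled: bool = False
    operation_mode: str = "test"         # "test" or "live"


def validate(config: IntegrationConfig) -> list[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    problems = []
    for key, expected in config.type.required_settings.items():
        if key not in config.settings:
            problems.append(f"missing setting: {key}")
        elif not isinstance(config.settings[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    if config.operation_mode not in ("test", "live"):
        problems.append(f"unknown operation mode: {config.operation_mode}")
    return problems


# Hypothetical station-scoped type that refers to an organization-scoped LDMP config.
station_type = IntegrationType(
    name="campbell-station",
    direction="inbound",
    scope="station",
    required_settings={"ldmp_config_id": str, "station_name": str, "tables": list},
    job_types=("extract", "transform"),
)

station_config = IntegrationConfig(
    type=station_type,
    settings={"ldmp_config_id": "ldmp-org", "station_name": "Ridge", "tables": ["TenMin"]},
)
```

In this sketch the type carries everything that is fixed at deploy time, while the config carries only values plus the enabled flag and operation mode, mirroring the table above.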

Organization-scoped configs (e.g. LDMP server credentials) are often referenced by station-scoped configs (e.g. a specific LoggerNet station and table list). The orchestrator treats dependencies and dependents as part of the operational picture — a change to a parent LDMP config can require child station integrations to reconcile.
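One way to reason about which integrations need to reconcile after a parent changes is to invert the reference graph and walk it. The sketch below assumes configs are identified by string IDs and that references are available as a plain mapping; both the IDs and the function names are hypothetical.

```python
from collections import defaultdict, deque


def build_dependents(references: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert 'config -> configs it references' into 'config -> configs that reference it'."""
    dependents = defaultdict(list)
    for config_id, parents in references.items():
        for parent in parents:
            dependents[parent].append(config_id)
    return dict(dependents)


def configs_to_reconcile(changed_id: str, references: dict[str, list[str]]) -> list[str]:
    """Return the changed config followed by its transitive dependents, breadth-first."""
    dependents = build_dependents(references)
    ordered = [changed_id]
    queue = deque([changed_id])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in ordered:
                ordered.append(child)
                queue.append(child)
    return ordered


# Hypothetical organization: two stations share one LDMP org config.
references = {
    "ldmp-org": [],
    "station-a": ["ldmp-org"],
    "station-b": ["ldmp-org"],
    "hobolink-org": [],
}
affected = configs_to_reconcile("ldmp-org", references)
```

Here a change to `ldmp-org` marks both stations for reconciliation, while the unrelated `hobolink-org` config is left alone.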

Each enabled integration config gets a continuous job of type OrchestrateIntegration. The API server creates this job when the config is first saved, and the job scheduler keeps it running.

The orchestration job:

  1. Subscribes to change events for integration configs in the organization.
  2. Reconciles on each signal and at startup, loading the config and its related configs from the API.
  3. Compares an operational fingerprint (a hash of state and settings) to the last applied fingerprint stored in integration checkpoints.
  4. When the fingerprint changes or on first enable: provisions integration infrastructure and child jobs (extract, transform, etc.).
  5. When the config is disabled or deleted: tears down child jobs and infrastructure, then retires the orchestrator job.

This follows a familiar control-loop pattern: desired state in the API, observed state in running jobs, reconcile until they match. Cosmetic fields (e.g. name, description) do not trigger redeploys.
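The fingerprint comparison can be sketched in a few lines. This is a minimal illustration, not the orchestrator's actual code: the choice of SHA-256 over canonical JSON, the exact set of cosmetic fields, the inclusion of related configs in the hash, and the `reconcile` return values are all assumptions.

```python
import hashlib
import json

# Assumed set of fields that never affect the running pipeline.
COSMETIC_FIELDS = {"name", "description"}


def operational_fingerprint(config: dict, related: list[dict]) -> str:
    """Hash the operational parts of a config and its related configs."""
    def operational(c: dict) -> dict:
        return {k: v for k, v in c.items() if k not in COSMETIC_FIELDS}

    payload = {
        "config": operational(config),
        "related": sorted(
            (operational(r) for r in related), key=lambda r: str(r.get("id"))
        ),
    }
    # Canonical JSON so that key order does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(config: dict, related: list[dict], checkpoints: dict) -> str:
    """Return the action a reconcile pass would take: teardown, provision, or noop."""
    if config.get("deleted") or not config.get("enabled"):
        checkpoints.pop("applied_fingerprint", None)
        return "teardown"
    fingerprint = operational_fingerprint(config, related)
    if checkpoints.get("applied_fingerprint") == fingerprint:
        return "noop"
    # A real orchestrator would create child jobs here before recording the fingerprint.
    checkpoints["applied_fingerprint"] = fingerprint
    return "provision"


config = {
    "id": "station-a",
    "name": "Ridge",
    "enabled": True,
    "operation_mode": "test",
    "settings": {"tables": ["TenMin"]},
}
checkpoints: dict = {}
first = reconcile(config, [], checkpoints)                          # first enable
second = reconcile(config, [], checkpoints)                         # nothing changed
renamed = reconcile(dict(config, name="Ridge 2"), [], checkpoints)  # cosmetic edit only
```

Because the name is stripped before hashing, the renamed config produces the same fingerprint and the pass is a no-op, which is the behavior described for cosmetic fields.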

The data pipeline itself — the shared JetStream stream that carries messages between stages — is system infrastructure. The backend worker ensures that stream exists when it starts (see below). The integration orchestrator focuses on vendor-specific child jobs for each config.

---
config:
  flowchart:
    nodeSpacing: 40
    rankSpacing: 60
    padding: 16
---
flowchart TB
    User["User or API client"]
    Manage["Management interface"]
    API["Integration service<br/>(Platform API)"]
    Events["Change events<br/>(NATS)"]
    Sched["Job scheduler"]
    Orch["OrchestrateIntegration<br/>(continuous job)"]
    Child["Child jobs<br/>(extract · transform · …)"]
    Pipeline["Data pipeline<br/>(JetStream subjects)"]
    Load["Load · archive · preview"]

    User --> Manage
    User --> API
    Manage --> API
    API -->|"create config + orchestrator job"| Sched
    API --> Events
    Events --> Orch
    Sched --> Orch
    Orch -->|"read config · checkpoints"| API
    Orch -->|"create/cancel child jobs"| Sched
    Sched --> Child
    Child --> Pipeline
    Pipeline --> Load
    Load --> API

After the orchestrator job starts, child jobs take over the ETL work — talking to the vendor system, reshaping records, and passing them along. Each integration type declares which job types it needs (extract, transform, and so on).

Records move through the data pipeline in stages. Jobs hand off messages by publishing to JetStream subjects — named channels scoped to the integration, its operation mode, and the processing stage:

| Stage | What it is | Where it goes next |
| --- | --- | --- |
| raw | Data as pulled from the vendor | Archived; then on to transform |
| decode | Vendor format decoded (when the type requires it) | Prepare |
| prep | Standardized and UTC-normalized, ready to use | Test: preview via API. Live: location-scoped prep for loading (see below) |

The last segment of each subject name marks whether that stage succeeded or failed:

| Ending | Meaning |
| --- | --- |
| .ok | Stage finished successfully; downstream jobs can consume the message |
| .err | Stage failed; logged for alerting or retry |

Test and live modes use separate branches of the pipeline, so sample data can be inspected before it reaches production tables.
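To make the subject scoping concrete, here is a small sketch of how such names could be composed. The token order, the `pipeline` prefix, and the helper names are assumptions for illustration; the actual subject layout used by the platform is not specified here. The only NATS facts relied on are that subjects are dot-separated tokens and that `*` and `>` are wildcards.

```python
STAGES = ("raw", "decode", "prep")
MODES = ("test", "live")


def _check_token(token: str) -> str:
    """Reject values that cannot be used as a single NATS subject token."""
    if not token or any(ch in token for ch in " .*>"):
        raise ValueError(f"invalid subject token: {token!r}")
    return token


def stage_subject(integration_id: str, mode: str, stage: str,
                  ok: bool = True, prefix: str = "pipeline") -> str:
    """Build a subject scoped to integration, operation mode, stage, and outcome."""
    if mode not in MODES:
        raise ValueError(f"unknown operation mode: {mode}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    outcome = "ok" if ok else "err"
    return ".".join([prefix, _check_token(integration_id), mode, stage, outcome])


def next_stage(stage: str, uses_decode: bool):
    """Return the stage that follows, or None after prep."""
    if stage == "raw":
        return "decode" if uses_decode else "prep"
    if stage == "decode":
        return "prep"
    return None


raw_ok = stage_subject("station-a", "test", "raw")
prep_err = stage_subject("station-a", "live", "prep", ok=False)
```

Keeping the operation mode as its own token means a consumer for live data never matches a test subject, which is one simple way to keep the two branches apart.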

Not every integration uses every stage — a simpler pipeline may go straight from raw to prep. Release 2 followed the same staged-stream pattern over NATS. Release 3 wires the subjects and jobs automatically when you save an integration config.

The orchestrator job reports runtime phase and log entries back through the Integration service (ReportIntegrationRuntime). Phases include pending, starting, running, idle (disabled), draining, error, and retired. Checkpoints store operational progress — including the applied fingerprint and extract cursors — so jobs can resume without reprocessing entire histories.
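The role of an extract cursor can be shown with a short sketch. The phase names below are the ones listed above (the list may not be exhaustive); the checkpoint key `extract_cursor`, the record shape, and the `extract_batch` helper are hypothetical, and timestamps are assumed to be ISO 8601 UTC strings in a single format so that string comparison matches time order.

```python
from enum import Enum


class Phase(str, Enum):
    """Runtime phases named in this section."""
    PENDING = "pending"
    STARTING = "starting"
    RUNNING = "running"
    IDLE = "idle"
    DRAINING = "draining"
    ERROR = "error"
    RETIRED = "retired"


def extract_batch(records: list[dict], checkpoints: dict, batch_size: int = 2) -> list[dict]:
    """Return the next records after the stored cursor and advance the cursor."""
    cursor = checkpoints.get("extract_cursor")
    fresh = [r for r in records if cursor is None or r["time"] > cursor]
    batch = sorted(fresh, key=lambda r: r["time"])[:batch_size]
    if batch:
        checkpoints["extract_cursor"] = batch[-1]["time"]
    return batch


# Hypothetical vendor records; a restarted job reuses the same checkpoints dict.
records = [
    {"time": "2024-01-01T00:00:00Z", "value": 1.0},
    {"time": "2024-01-01T00:10:00Z", "value": 1.2},
    {"time": "2024-01-01T00:20:00Z", "value": 1.1},
]
checkpoints: dict = {}
first_batch = extract_batch(records, checkpoints)
second_batch = extract_batch(records, checkpoints)
third_batch = extract_batch(records, checkpoints)
```

Each call picks up where the previous one stopped, so a job that restarts with the stored cursor does not re-read records it already handed to the pipeline.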

In the Management interface, the Monitor Integrations page (Tools > Data Setup > Monitor Integrations) is where users monitor and troubleshoot data integrations and their pipelines.

Here is a quick side-by-side of how integration setup changed from Release 2 to Release 3:

| Concern | Release 2 | Release 3 |
| --- | --- | --- |
| Where settings live | GitHub state repo + worker env | Integration configs via Platform API |
| Who can add an integration | Backend engineer (typically) | Org admins/curators through the management interface or API |
| How workers learn config | State repo watchers, manual deploy | Orchestrator job reconcile loop on API change events |
| Pipeline setup | Ops scripts, worker-specific wiring | Orchestrator creates integration child jobs |
| Observability | Logs, NATS tooling | Integration runtime API + Monitor Integrations page |
| Test vs. live data | Varied per integration | First-class operation_mode on each integration config |