Scale and Storage Management

During the rollout of Release 2, a deliberate deployment decision was made: all metadata would live in a single MongoDB store, while each organization received its own dedicated InfluxDB instance for time-series data. This provided strong data separation between organizations.

In practice, organizations varied widely in size and data volume, and InfluxDB’s memory consumption grows with the number of unique time series (cardinality). The separation made it possible to run smaller, independent InfluxDB instances instead of a few very large ones. Smaller instances are easier to manage and scale on a Kubernetes cluster.

While the Release 2 API handled this multi-instance setup effectively behind the scenes, it created a usability issue for end users, who were occasionally exposed to infrastructure details such as which server their data lived on. From the user's perspective, the work is with the tables produced by their data loggers. Users should not have to think about storage servers, sharding decisions, or capacity planning.

Release 3 addresses this usability issue by separating user-facing tables from storage layout. Users interact with logical tables that mirror logger output. Where and how data is stored for a given time period is determined automatically by the system and is not exposed to users.

The storage configuration lives in api.toml. It defines shardspaces that map organizations (or groups of organizations) to storage locations and time ranges, plus table storage rulesets that describe the physical database and table patterns used for reads (match.*) and writes (store.*).

A shardspace holds one or more time-bounded shards. Each shard names a location; at deploy time, that location is wired to a physical timeseries database.

---
config:
  flowchart:
    nodeSpacing: 40
    rankSpacing: 60
    padding: 16
---
flowchart TB
    subgraph OrgSS["Shardspace · organization"]
        direction TB
        Shard1["Shard 1<br/>2020 – 2023"]
        Shard2["Shard 2<br/>2023 – present"]
    end

    Loc1["location · historical"]
    Loc2["location · current"]

    DB1[(Any DB<br/>dedicated instance)]
    DB2[(InfluxDB<br/>shared instance)]

    Shard1 --> Loc1
    Shard2 --> Loc2
    Loc1 --> DB1
    Loc2 --> DB2
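
To make the shard lookup concrete, here is a minimal Python sketch of picking a shard for a timestamp. It is not the API's actual implementation. The field names `begins_at`, `ends_before`, and `location` follow the api.toml excerpt, the year boundaries follow the diagram, and the exact dates, the `find_shard` helper, and the half-open range `[begins_at, ends_before)` (inferred from the name `ends_before`) are assumptions.

```python
from datetime import datetime, timezone


def parse_utc(value: str) -> datetime:
    """Parse a UTC timestamp such as "2023-01-01T00:00:00Z"."""
    # Rewriting "Z" keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_shard(shards, moment):
    """Return the first shard whose range contains `moment`, else None.

    Assumption: ranges are half-open, begins_at <= moment < ends_before.
    """
    for shard in shards:
        if parse_utc(shard["begins_at"]) <= moment < parse_utc(shard["ends_before"]):
            return shard
    return None


# Hypothetical shards mirroring the diagram: one historical, one current.
organization_shards = [
    {
        "begins_at": "2020-01-01T00:00:00Z",
        "ends_before": "2023-01-01T00:00:00Z",
        "location": "historical",
    },
    {
        "begins_at": "2023-01-01T00:00:00Z",
        "ends_before": "9999-12-31T23:59:59.999Z",
        "location": "current",
    },
]

shard = find_shard(organization_shards, datetime(2024, 5, 1, tzinfo=timezone.utc))
print(shard["location"])  # current
```

With half-open ranges, a timestamp that falls exactly on a boundary belongs to the later shard, so adjacent shards never both claim the same point in time.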

Here is a simplified excerpt showing shared and dedicated storage side by side:

# Table storage: physical database/table patterns
#
# match.*: read/discovery (regex); store.*: write (template tokens)

# "common": organizations share one InfluxDB; databases are namespaced per org
[table_storage_rulesets.common.organization]
match.database = "org_{organization.id}"
match.table = "tab_(?P<id>\\w+)"
store.database = "org_{organization.id}"
store.table = "tab_{params.table_id}"

[table_storage_rulesets.common.station]
match.database = "org_{organization.id}"
match.table = "sta_{station.id}_tab_(?P<id>\\w+)"
store.database = "org_{organization.id}"
store.table = "sta_{station.id}_tab_{params.table_id}"

# "dedicated": organization has its own InfluxDB instance
[table_storage_rulesets.dedicated.organization]
match.database = "org"
match.table = "tab_(?P<id>\\w+)"
store.database = "org"
store.table = "tab_{params.table_id}"

[table_storage_rulesets.dedicated.station]
match.database = "sta_{station.id}"
match.table = "tab_(?P<id>\\w+)"
store.database = "sta_{station.id}"
store.table = "tab_{params.table_id}"

# Shardspaces: which storage location serves data for a given time range

# Co-located organizations on shared storage
[[shardspaces.common.shard]]
begins_at = "0001-01-01T00:00:00Z"
ends_before = "9999-12-31T23:59:59.999Z"
location = "experimental"
table_storage_ruleset = "common"

# Organization with a dedicated Release 2 instance; data stays in place
[[shardspaces.organization.6092b070492ae15e05876ed8.shard]]
begins_at = "0001-01-01T00:00:00Z"
ends_before = "9999-12-31T23:59:59.999Z"
location = "cdfw"
table_storage_ruleset = "dedicated"
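
The relationship between the store.* templates and the match.* patterns may be easier to see in code. The Python sketch below is one possible reading of the rules above, not the API's actual implementation. The `expand`, `physical_table`, and `logical_table_id` helpers, the station id `s1`, the table id `hourly`, and the use of a full-string regex match are all assumptions made for illustration.

```python
import re

# Matches template tokens such as {organization.id} or {params.table_id}.
TOKEN = re.compile(r"\{(\w+)\.(\w+)\}")


def expand(template, context, escape=False):
    """Replace {group.key} tokens with values from `context`.

    With escape=True the values are regex-escaped, so the result can be
    compiled as a pattern without the ids being read as regex syntax.
    """
    def substitute(token):
        value = str(context[token.group(1)][token.group(2)])
        return re.escape(value) if escape else value

    return TOKEN.sub(substitute, template)


def physical_table(rule, context):
    """Write path: build the physical table name from store.table."""
    return expand(rule["store"]["table"], context)


def logical_table_id(rule, context, physical_name):
    """Read path: recover the table id from a physical name via match.table."""
    pattern = expand(rule["match"]["table"], context, escape=True)
    found = re.fullmatch(pattern, physical_name)
    return found.group("id") if found else None


# The "common" station rule from the excerpt, as a TOML parser would load it.
common_station = {
    "match": {
        "database": "org_{organization.id}",
        "table": r"sta_{station.id}_tab_(?P<id>\w+)",
    },
    "store": {
        "database": "org_{organization.id}",
        "table": "sta_{station.id}_tab_{params.table_id}",
    },
}

# Hypothetical station and table ids; the organization id is from the excerpt.
context = {
    "organization": {"id": "6092b070492ae15e05876ed8"},
    "station": {"id": "s1"},
    "params": {"table_id": "hourly"},
}

name = physical_table(common_station, context)
print(name)                                             # sta_s1_tab_hourly
print(logical_table_id(common_station, context, name))  # hourly
```

In this reading, a write expands the template into one concrete name, while a read turns the same naming scheme into a pattern and extracts the table id through the named group `id`, so the two directions round-trip.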

With the shardspace configuration:

  • Organizations on dedicated Release 2 instances can move to Release 3 with shard rules that point at their existing location and the dedicated ruleset, so no bulk copy of historical data is required.
  • Organizations on shared storage can use the common ruleset, which namespaces databases per organization inside one InfluxDB instance.
  • Operators can add a timeseries location and shard rules in api.toml without changing how users refer to tables.
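
The points above can be illustrated with a small Python sketch of how a shardspace might be selected for an organization. The dictionary mirrors what a TOML parser would produce for the shardspaces in the excerpt. The `shards_for` helper, the fallback from an organization-specific shardspace to `common`, and the id `hypothetical-org` are assumptions, not documented behavior.

```python
def shards_for(shardspaces, organization_id):
    """Return the shard list that applies to an organization.

    Assumption: an organization-specific shardspace takes precedence,
    and every other organization falls back to the "common" shardspace.
    """
    dedicated = shardspaces.get("organization", {}).get(organization_id)
    space = dedicated if dedicated is not None else shardspaces["common"]
    return space["shard"]


# Parsed form of the shardspaces in the excerpt.
shardspaces = {
    "common": {
        "shard": [
            {
                "begins_at": "0001-01-01T00:00:00Z",
                "ends_before": "9999-12-31T23:59:59.999Z",
                "location": "experimental",
                "table_storage_ruleset": "common",
            }
        ]
    },
    "organization": {
        "6092b070492ae15e05876ed8": {
            "shard": [
                {
                    "begins_at": "0001-01-01T00:00:00Z",
                    "ends_before": "9999-12-31T23:59:59.999Z",
                    "location": "cdfw",
                    "table_storage_ruleset": "dedicated",
                }
            ]
        }
    },
}

for organization_id in ("6092b070492ae15e05876ed8", "hypothetical-org"):
    shard = shards_for(shardspaces, organization_id)[0]
    print(organization_id, "->", shard["location"], shard["table_storage_ruleset"])
```

Both organizations would refer to a table by the same logical name. Only the resolved location and ruleset differ, which is why an operator can change either without affecting users.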

Storage topology stays in configuration, and users continue to work with tables and datastreams.