Skip to content

Templates

A template is the shared skeleton across a set of configs, produced by simplifying their parsed trees. Where the configs agree, the template preserves the literal value; where they disagree, the template places a variable "hole" whose values and counts you can inspect.

Anti-unification

Given a set of terms (here, parsed TOML or YAML values), anti-unification finds the most specific term that generalises all of them, using fresh variables where they disagree. For trees, this is structural recursion: walk in parallel, agree → copy the literal, disagree → emit a hole.

It's thresholdless (no tuning parameters) and has been well-studied since Plotkin and Reynolds, 1970.

See the Wikipedia article for the formal treatment.

Set-based array handling

Arrays of objects (such as CI workflow steps or dependabot update entries) are anti-unified with set semantics rather than positional alignment. Elements are clustered by structural similarity — specifically, by the intersection of key-sets across all elements — and anti-unified within each cluster.

This means that a checkout step in position 0 of one workflow and position 2 of another will be recognised as the same kind of step and grouped together. Within a cluster, variation becomes holes; clusters present in some instances but not others become optional set elements.

Arrays of scalars (such as Python version lists) continue to use positional alignment.

Hole addresses for set elements use a predicate label showing the varying keys: steps[name,run].run means "the step cluster whose name and run keys vary." In profile display, these are simplified to steps[] for readability.

What nave does with it

nave build groups tracked files by their logical kind (e.g. "all dependabot configs") and anti-unifies each group. The output, for a fleet of 9 dependabot configs, looks like this:

━━ .github/dependabot.yml ━━
  instances: 9

  template:
    updates:
      - cooldown?: ⟨?0⟩
        directory: "/"
        package-ecosystem: ⟨?1⟩
        schedule:
          interval: ⟨?2⟩
    version: 2

  holes:
    updates[0].cooldown  [optionalkey]  3/9 optional  [constant when present]
        3× {"default-days":7}
    updates[0].package-ecosystem  [string]  9/9
        8× "github-actions"
        1× "cargo"
    updates[0].schedule.interval  [string]  9/9
        6× "weekly"
        3× "monthly"

Read as: across 9 files, three hole positions exist. The ecosystem is github-actions in 8 instances and cargo in one. The cooldown is absent from 6 files and, when present, always takes the same value.

Hole kinds

Holes are classified by how they vary:

  • Scalar — a leaf differs (the common case).
  • OptionalKey — a key is present in some files but not others. The hole's presence count (e.g. 3/9) tells you the ratio. When present, Nave descends into the subtree rather than treating it as opaque.
  • Shape — two trees disagree structurally at some position (different types, or different array lengths). This is rare for machine-authored configs and usually indicates genuine drift worth investigating.

Source hints

Some holes aren't really variables — they're functionally determined by the repo they live in. project.name in pyproject.toml is the clearest case: every file has a distinct value, but the value is the repo name (with the conventional kebab-casesnake_case allowance, and PEP 503 normalisation).

After anti-unifying, Nave checks each hole's observed values against per-repo names. If every observed value matches, the hole is flagged DerivedFromRepoName in the build report:

project.name  [string]  40/40  [derived: repo name]

This catches project.name directly, and separately flags things like tool.coverage.run.source[0] and tool.isort.known_first_party[0] — these look like free parameters, but their values are always the Python module path, which equals the normalised package name.

The other hint is ConstantWhenPresent — for optional keys whose value, when present, is always the same (like the cooldown block above). This is a signal that the key could be made mandatory or removed entirely without functional impact.

Cohorts

A cohort is the subset of files sharing the same value at a given hole. "6× weekly / 3× monthly" are two cohorts of the updates[0].schedule.interval hole.

Cohorts are how you decide which configs to bulk-edit. "Move the 3 monthly repos to weekly" is a single codemod targeting a single cohort; you pass the same query to nave pen create that you'd pass to nave build --match to isolate it.

Profiles

A profile is a formal concept from formal concept analysis (FCA): a maximal pair of (repos, hole-value bindings) where every repo in the set shares every binding, and every binding is shared by every repo.

Profiles are computed automatically by nave build from the hole observations. They answer the question "which configuration choices travel together across my fleet?"

For example, in a fleet of 9 dependabot configs with two holes (package-ecosystem and schedule interval), FCA discovers three profiles:

  • 5 repos use github-actions + weekly
  • 3 repos use github-actions + monthly
  • 1 repo uses cargo + weekly

These three profiles completely explain the variation in the fleet.

When the concept lattice has hierarchical structure (some profiles are refinements of others), the display shows deltas: only the bindings that are new relative to parent profiles. This makes the progressive specialisation of configuration visible — "Profile 2 refines Profile 1 by adding the specific version detection script."

Why it works on configs

Anti-unification gives clean results on dependabot/pyproject/workflows because those files are structurally rigid: they're machine-readable, share a spec, and humans rarely reshape them. On freeform prose it would be useless; on config it recovers exactly the template you'd write by hand.

Co-occurrence mode

Whole-file anti-unification can be too coarse. If you have 20 workflow files, each containing 5 steps, and you only care about the uses: pattern for a specific action, template-ing the whole file drowns the signal.

nave build --co-occur --where ... --where ... addresses this by anti-unifying subtrees where multiple --where terms co-occur, rather than whole files. The rule can be considered as "the deepest non-root object ancestor shared by all anchor matches".