About state
One of the greatest underlying assumptions about dbt is that its operations should be stateless and idempotent. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result. A given run of dbt doesn't need to "know" about any other run; it just needs to know about the code in the project and the objects in your database as they exist right now.
That said, dbt does store "state"—a detailed, point-in-time view of project resources, database objects, and invocation results—in the form of its artifacts. If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result.
dbt can leverage artifacts from a prior invocation as long as their file path is passed to the --state flag. This is a prerequisite for:
- The state: selector, whereby dbt can identify resources that are new or modified by comparing code in the current project against the state manifest.
- Deferring to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest.
Together, these two features enable "slim CI". We expect to add more features in future releases that can leverage artifacts passed to the --state flag.
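To make the comparison idea concrete, here is a minimal Python sketch that flags nodes as new or modified by comparing file checksums between two manifest.json files. It assumes each entry under the manifest's nodes key carries a checksum object with a checksum field; the function names are hypothetical and this is not dbt's implementation. dbt's own state: comparison considers more than the file checksum, so treat this as an illustration only.

```python
import json
from pathlib import Path


def load_node_checksums(manifest_path):
    """Map each node's unique_id to its file checksum from a manifest.json.

    Assumes the layout {"nodes": {unique_id: {"checksum": {"checksum": ...}}}}.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    return {
        unique_id: node.get("checksum", {}).get("checksum")
        for unique_id, node in manifest.get("nodes", {}).items()
    }


def new_or_modified(current, previous):
    """Return unique_ids missing from `previous` or whose checksum differs."""
    return sorted(
        unique_id
        for unique_id, checksum in current.items()
        if unique_id not in previous or previous[unique_id] != checksum
    )
```

A caller would load the current project's manifest and the manifest found under the --state path, then pass both mappings to new_or_modified.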
Establishing state
State and defer can be set by environment variables as well as CLI flags:
- --state or DBT_ARTIFACT_STATE_PATH: file path
- --defer or DBT_DEFER_TO_STATE: boolean
If both the flag and env var are provided, the flag takes precedence.
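The precedence rule can be sketched in a few lines of Python. The function name resolve_state_path is hypothetical and this is not how dbt resolves its settings internally; it only illustrates "flag wins over env var" for the DBT_ARTIFACT_STATE_PATH variable named above.

```python
import os


def resolve_state_path(flag_value=None, environ=None):
    """Return the state path, preferring the CLI flag over the env var.

    `flag_value` stands in for whatever was passed to --state (None if absent).
    `environ` defaults to the real process environment.
    """
    environ = os.environ if environ is None else environ
    if flag_value:
        return flag_value
    return environ.get("DBT_ARTIFACT_STATE_PATH")
```

With both values present, the flag value is returned; with only the environment variable set, its value is used; with neither, the result is None.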
Notes:
- The --state artifacts must be of schema versions that are compatible with the currently running dbt version.
- The path to state artifacts can be set via the --state flag or DBT_ARTIFACT_STATE_PATH environment variable. If both the flag and env var are provided, the flag takes precedence.
- These are powerful, complex features. Read about known caveats and limitations to state comparison.
The "result" status
Another element of job state is the result of a prior dbt invocation. After executing a dbt run, for example, dbt creates the run_results.json artifact, which contains execution times and success / error status for dbt models. You can read more about run_results.json on the 'run results' page.
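As a rough illustration of what the artifact holds, the Python sketch below groups node identifiers by status. It assumes run_results.json has a top-level results list whose entries include unique_id and status fields; the helper names are hypothetical, and the exact schema depends on your dbt version.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_run_results(state_dir):
    """Parse run_results.json from a directory of dbt artifacts."""
    return json.loads((Path(state_dir) / "run_results.json").read_text())


def group_by_status(run_results):
    """Group unique_ids from parsed run results by their status string."""
    grouped = defaultdict(list)
    for result in run_results.get("results", []):
        grouped[result["status"]].append(result["unique_id"])
    return dict(grouped)
```

For example, group_by_status(load_run_results("path/to/prod/artifacts")) would return a dictionary keyed by status, which is roughly the information a result-based selector relies on.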
The following dbt commands produce run_results.json artifacts whose results can be referenced in subsequent dbt invocations:
- dbt run
- dbt test
- dbt build (new in dbt version v0.21.0)
- dbt seed
After issuing one of the above commands, you can reference the results by adding a selector to a subsequent command as follows:
# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
$ dbt run --select result:<status> --defer --state path/to/prod/artifacts
The available options depend on the node type:
| | model | seed | snapshot | test |
|---|---|---|---|---|
| result:error | ✅ | ✅ | ✅ | ✅ |
| result:success | ✅ | ✅ | ✅ | |
| result:skipped | ✅ | ✅ | ✅ | |
| result:fail | | | | ✅ |
| result:warn | | | | ✅ |
| result:pass | | | | ✅ |
Combining state and result selectors
The state and result selectors can also be combined in a single invocation of dbt to capture errors from a previous run OR any new or modified models.
$ dbt run --select result:<status>+ state:modified+ --defer --state ./<dbt-artifact-path>
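The selection logic of the command above can be approximated in Python. This sketch assumes that space-separated selectors form a union (the "OR" described above), that a trailing + adds all downstream nodes, and that a child_map dictionary maps each node to its direct children. The function names are hypothetical and this is a simplification, not dbt's selector implementation.

```python
from collections import deque


def with_descendants(roots, child_map):
    """Expand `roots` to include every downstream node (the trailing `+`)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def combined_selection(by_result, by_state, child_map):
    """Union of both selections, each expanded to its downstream nodes."""
    return with_descendants(by_result, child_map) | with_descendants(by_state, child_map)
```

Here by_result stands for the nodes matched by result:<status> and by_state for the nodes matched by state:modified; a node that appears in either expanded set is selected once.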