Skip to main content

fail_calc

Test queries are written to return a set of failing records, ones not matching the expectation or assertion declared by that test: duplicate records, null values, etc.

Most often, this is the count of rows returned by the test query: the default value of fail_calc is count(*). But it can also be a custom calculation, whether an aggregate calculation or simply the name of a column to be selected from the test query.

Most tests do not use the fail_calc config, preferring to return a count of failing rows. For the tests that do, the most common place to set the fail_calc config is right within a generic test block, alongside its query definition. All the same, fail_calc can be set in all the same places as other configs.

For instance, you can configure a unique test to return sum(n_records) instead of count(*) as the failure calculation: that is, the number of rows in the model containing a duplicated column value, rather than the number of distinct column values that are duplicated.

Tip

Beware using functions like sum() for fail_calc in any test that has the potential to return no rows at all.

If no rows are returned, the test won't pass or fail but will return the following error:

None is not of type 'integer'

Failed validating 'type' in schema['properties']['failures']:
{'type': 'integer'}

On instance['failures']:
None

To avoid this issue, use a case statement to ensure that 0 is returned when no rows exist:

fail_calc: "case when count(*) > 0 then sum(n_records) else 0 end"

Configure a specific instance of a generic (schema) test:

models/<filename>.yml
version: 2

models:
- name: my_model
columns:
- name: my_columns
tests:
- unique:
config:
fail_calc: "case when count(*) > 0 then sum(n_records) else 0 end"
0