
ClickHouse configurations

Models

| Type | Supported? | Details |
|------|------------|---------|
| view materialization | YES | Creates a view. |
| table materialization | YES | Creates a table. See below for the list of supported engines. |
| incremental materialization | YES | Creates a table if it doesn't exist, and then writes only updates to it. |
| ephemeral materialization | YES | Creates an ephemeral/CTE materialization. The model is internal to dbt and does not create any database objects. |

View Materialization

A dbt model can be created as a ClickHouse view and configured using the following syntax:

dbt_project.yml
models:
  <resource-path>:
    +materialized: view
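
The same setting can also be applied per model with a config() block at the top of the model's SQL file. The following is a minimal sketch; the model, column, and upstream names are hypothetical:

models/active_users.sql
{{ config(materialized='view') }}

-- a simple view over an upstream model
select user_id, last_seen
from {{ ref('raw_events') }}
where last_seen > now() - INTERVAL 30 DAY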

Table Materialization

A dbt model can be created as a ClickHouse table and configured using the following syntax:

dbt_project.yml
models:
  <resource-path>:
    +materialized: table
    +order_by: [ <column-name>, ... ]
    +engine: <engine-type>
    +partition_by: [ <column-name>, ... ]
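
As an illustration, a table model with an explicit engine, sort key, and partition key might look like the following sketch (the model, column, and upstream names are hypothetical):

models/daily_sales.sql
{{ config(
    materialized='table',
    engine='MergeTree()',
    order_by=['sale_date', 'store_id'],
    partition_by='toYYYYMM(sale_date)'
) }}

select
    sale_date,
    store_id,
    sum(amount) as total_amount
from {{ ref('raw_sales') }}
group by sale_date, store_id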

Table Configuration

| Option | Description | Required? |
|--------|-------------|-----------|
| materialized | How the model will be materialized into ClickHouse. Must be table to create a table model. | Required |
| engine | The table engine to use when creating tables. See the list of supported engines below. | Optional (default: MergeTree()) |
| order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: tuple()) |
| partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |
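
Roughly speaking, these settings map onto the corresponding clauses of a ClickHouse CREATE TABLE statement. The DDL below is only an illustrative sketch (the database, columns, and types are hypothetical, and the exact statement dbt-clickhouse generates may differ):

CREATE TABLE analytics.daily_sales
(
    sale_date Date,
    store_id UInt32,
    total_amount Decimal(18, 2)
)
ENGINE = MergeTree()                  -- from +engine
PARTITION BY toYYYYMM(sale_date)      -- from +partition_by
ORDER BY (sale_date, store_id)        -- from +order_by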

Incremental Materialization

A table model is rebuilt from scratch on every dbt run. This can be infeasible and extremely costly for larger result sets or complex transformations. To address this challenge and reduce build time, a dbt model can be created as an incremental ClickHouse table and is configured using the following syntax:

dbt_project.yml
models:
  <resource-path>:
    +materialized: incremental
    +order_by: [ <column-name>, ... ]
    +engine: <engine-type>
    +partition_by: [ <column-name>, ... ]
    +unique_key: [ <column-name>, ... ]
    +inserts_only: [ True|False ]
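
In the model itself, dbt's is_incremental() macro is typically used to restrict the query to new rows on incremental runs. A minimal sketch, with hypothetical model and column names:

models/events_incremental.sql
{{ config(
    materialized='incremental',
    engine='MergeTree()',
    order_by=['event_time', 'event_id'],
    unique_key='event_id'
) }}

select event_id, event_time, payload
from {{ ref('raw_events') }}
{% if is_incremental() %}
  -- on incremental runs, only select rows newer than what the target already holds
  where event_time > (select max(event_time) from {{ this }})
{% endif %}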

Incremental Table Configuration

| Option | Description | Required? |
|--------|-------------|-----------|
| materialized | How the model will be materialized into ClickHouse. Must be incremental to create an incremental model. | Required |
| unique_key | A tuple of column names that uniquely identify rows. For more details on uniqueness constraints, see here. | Required. If not provided, altered rows will be added twice to the incremental table. |
| engine | The table engine to use when creating tables. See the list of supported engines below. | Optional (default: MergeTree()) |
| order_by | A tuple of column names or arbitrary expressions. This allows you to create a small sparse index that helps find data faster. | Optional (default: tuple()) |
| partition_by | A partition is a logical combination of records in a table by a specified criterion. The partition key can be any expression from the table columns. | Optional |
| inserts_only | (Deprecated, see the append materialization strategy.) If True, incremental updates will be inserted directly into the target incremental table without creating an intermediate table. | Optional (default: False) |
| incremental_strategy | The strategy to use for incremental materialization. delete+insert and append are supported. For additional details on strategies, see here. | Optional (default: 'default') |
| incremental_predicates | Incremental predicate clause to be applied to delete+insert materializations. | Optional |
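
For example, the delete+insert strategy can be combined with incremental_predicates to limit the window of rows that are compared and rewritten. This is a hedged sketch; the predicate, model, and column names are hypothetical:

models/recent_events.sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    incremental_predicates=['event_time >= now() - INTERVAL 7 DAY'],
    unique_key='event_id',
    order_by=['event_time', 'event_id']
) }}

select event_id, event_time, payload
from {{ ref('raw_events') }}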

Snapshot

dbt snapshots allow a record to be made of changes to a mutable model over time. This in turn allows point-in-time queries on models, where analysts can “look back in time” at the previous state of a model. This functionality is supported by the ClickHouse connector and is configured using the following syntax:

snapshots/<model_name>.sql
{{
    config(
        target_schema = "<schema_name>",
        unique_key = "<column-name>",
        strategy = "<strategy>",
        updated_at = "<updated_at-column-name>",
    )
}}
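
Put together, a complete snapshot file wraps that config in a snapshot block and selects from the model being tracked. A minimal sketch using the timestamp strategy, with hypothetical names:

snapshots/orders_snapshot.sql
{% snapshot orders_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='order_id',
        strategy='timestamp',
        updated_at='updated_at',
    )
}}

select * from {{ ref('orders') }}

{% endsnapshot %}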

Snapshot Configuration

| Option | Description | Required? |
|--------|-------------|-----------|
| target_schema | The ClickHouse database name where the snapshot table will be created. | Required |
| unique_key | A tuple of column names that uniquely identify rows. | Required. If not provided, altered rows will be added twice to the snapshot table. |
| strategy | Defines how dbt knows if a row has changed. More about dbt strategies here. | Required |
| updated_at | If using the timestamp strategy, the timestamp column to compare. | Only if using the timestamp strategy |
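
If no reliable timestamp column exists, dbt's check strategy compares a list of columns instead. A hedged sketch, again with hypothetical names:

snapshots/customers_snapshot.sql
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='check',
        check_cols=['email', 'status'],
    )
}}

select * from {{ ref('customers') }}

{% endsnapshot %}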

Supported Table Engines

| Type | Details |
|------|---------|
| MergeTree (default) | https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree/ |
| HDFS | https://clickhouse.com/docs/en/engines/table-engines/integrations/hdfs |
| MaterializedPostgreSQL | https://clickhouse.com/docs/en/engines/table-engines/integrations/materialized-postgresql |
| S3 | https://clickhouse.com/docs/en/engines/table-engines/integrations/s3 |
| EmbeddedRocksDB | https://clickhouse.com/docs/en/engines/table-engines/integrations/embedded-rocksdb |
| Hive | https://clickhouse.com/docs/en/engines/table-engines/integrations/hive |

If you encounter issues connecting to ClickHouse from dbt with one of the above engines, please report an issue here.

Cross Database Macro Support

dbt-clickhouse supports most of the cross database macros now included in dbt-core, with the following exceptions:

  • The listagg SQL function (and therefore the corresponding dbt macro) is not supported by ClickHouse. You can achieve similar results with the ClickHouse groupArray function (see the sketch after this list), but in some cases subqueries may be required to achieve the desired ordering.
  • The split_part SQL function is implemented in ClickHouse using the splitByChar function. This function requires a constant string for the "split" delimiter, so the delimiter parameter used for this macro will be interpreted as a string, not a column name.
  • Similarly, the replace SQL function in ClickHouse requires constant strings for the old_chars and new_chars parameters, so those parameters will be interpreted as strings rather than column names when invoking this macro.
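
As an illustration of the listagg workaround mentioned above, ClickHouse's groupArray and arrayStringConcat functions can be combined to build a delimited string per group. The model and column names below are hypothetical:

models/tags_per_user.sql
select
    user_id,
    -- listagg-style aggregation: collect values into an array, then join them;
    -- note that groupArray does not guarantee ordering without a sorted subquery
    arrayStringConcat(groupArray(tag), ', ') as tags_csv
from {{ ref('user_tags') }}
group by user_id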

Setting quote_columns

To prevent a warning, make sure to explicitly set a value for quote_columns in your dbt_project.yml. See the doc on quote_columns for more information.

seeds:
  +quote_columns: false # or `true` if you have csv column headers with spaces