
Connect Apache Spark

If you're using Databricks, use dbt-databricks

If you're using Databricks, the dbt-databricks adapter is recommended over dbt-spark. If you're still using dbt-spark with Databricks, consider migrating from the dbt-spark adapter to the dbt-databricks adapter.

For the Databricks version of this page, refer to Databricks setup.

note

See Connect Databricks for the Databricks version of this page.

dbt Cloud supports connecting to an Apache Spark cluster using the HTTP method or the Thrift method. Note: while the HTTP method can be used to connect to an all-purpose Databricks cluster, the ODBC method is recommended for all Databricks connections. For further details on configuring these connection parameters, see the dbt-spark documentation.

To learn how to optimize performance with data platform-specific configurations in dbt Cloud, refer to Apache Spark-specific configuration.

The following fields are available when creating an Apache Spark connection using the HTTP and Thrift connection methods:

| Field | Description | Examples |
| --- | --- | --- |
| Host Name | The hostname of the Spark cluster to connect to | yourorg.sparkhost.com |
| Port | The port to connect to Spark on | 443 |
| Organization | Optional (default: 0) | 0123456789 |
| Cluster | The ID of the cluster to connect to | 1234-567890-abc12345 |
| Connection Timeout | Number of seconds after which to timeout a connection | 10 |
| Connection Retries | Number of times to attempt connecting to cluster before failing | 10 |
| User | Optional | dbt_cloud_user |
| Auth | Optional, supply if using Kerberos | KERBEROS |
| Kerberos Service Name | Optional, supply if using Kerberos | hive |
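
As a rough illustration of how these fields fit together, the Python sketch below models them as a small data class and turns them into a dictionary shaped like a dbt-spark profile target. The class name `SparkConnection`, its defaults, and the validation rules are hypothetical and are not part of dbt Cloud. The key names (`connect_timeout`, `connect_retries`, `kerberos_service_name`, and so on) reflect my understanding of the dbt-spark profile and should be checked against the dbt-spark documentation before use.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SparkConnection:
    """Hypothetical container mirroring the connection fields listed above."""

    host: str                                     # Host Name
    method: str = "http"                          # "http" or "thrift"
    port: int = 443                               # Port (default here is an assumption)
    organization: str = "0"                       # Organization, optional (default: 0)
    cluster: Optional[str] = None                 # Cluster ID
    connect_timeout: int = 10                     # Connection Timeout, in seconds
    connect_retries: int = 10                     # Connection Retries
    user: Optional[str] = None                    # User, optional
    auth: Optional[str] = None                    # Auth, e.g. "KERBEROS"
    kerberos_service_name: Optional[str] = None   # Kerberos Service Name

    def validate(self) -> None:
        """Apply a few sanity checks (assumed rules, not dbt's own validation)."""
        if self.method not in ("http", "thrift"):
            raise ValueError(f"unsupported method: {self.method!r}")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        if self.connect_timeout < 0 or self.connect_retries < 0:
            raise ValueError("timeout and retries must be non-negative")
        if self.auth == "KERBEROS" and not self.kerberos_service_name:
            raise ValueError("kerberos_service_name is needed when auth is KERBEROS")

    def to_profile(self) -> Dict[str, Any]:
        """Return a profile-like dict, omitting optional fields left unset."""
        self.validate()
        profile = {"type": "spark", **asdict(self)}
        return {key: value for key, value in profile.items() if value is not None}


if __name__ == "__main__":
    # Example values are taken from the table above.
    connection = SparkConnection(
        host="yourorg.sparkhost.com",
        method="thrift",
        user="dbt_cloud_user",
        auth="KERBEROS",
        kerberos_service_name="hive",
    )
    print(connection.to_profile())
```

Optional fields left as `None` are dropped from the result, which mirrors how the Optional fields in the table can simply be left blank.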
Configuring a Spark connection