Packages
Software engineers frequently modularize code into libraries. These libraries help programmers operate with leverage: they can spend more time focusing on their unique business logic, and less time implementing code that someone else has already spent the time perfecting.
In dbt, libraries like these are called packages. dbt's packages are so powerful because so many of the analytic problems we encounter are shared across organizations, for example:
- transforming data from a consistently structured SaaS dataset, for example:
  - turning Snowplow or Segment pageviews into sessions
  - transforming AdWords or Facebook Ads spend data into a consistent format
- writing dbt macros that perform similar functions, for example:
  - generating SQL to union together two relations, pivot columns, or construct a surrogate key
  - creating custom schema tests
  - writing audit queries
- building models and macros for a particular tool used in your data stack, for example:
dbt packages are in fact standalone dbt projects, with models and macros that tackle a specific problem area. When you add a package to your project, the package's models and macros become part of your own project. This means:
- Models in the package will be materialized when you execute dbt run.
- You can use ref in your own models to refer to models from the package.
- You can use macros in the package in your own project.

Note that defining and installing dbt packages is different from defining and installing Python packages.
Use cases
In dbt v1.6, we added a new configuration file called dependencies.yml. The file can contain both types of dependencies: "package" and "project" dependencies.
- "Package" dependencies let you add source code from someone else's dbt project into your own, like a library.
- "Project" dependencies provide a different way to build on top of someone else's work in dbt.
If your dbt project doesn't require Jinja within the package specifications, you can simply rename your existing packages.yml to dependencies.yml. However, if your project's package specifications use Jinja, particularly for scenarios like adding an environment variable or using the Git token method in a private Git package specification, you should continue using the packages.yml file name.
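To make the rule concrete, here is a sketch of two package specifications, shown as two YAML documents separated by ---. The first contains no Jinja, so the same content should work whether the file is named packages.yml or dependencies.yml. The second builds its Git URL with env_var(), so it has to stay in packages.yml. The version range and the awesome_repo URL are placeholders reused from the examples elsewhere on this page.

```yaml
# Document 1: no Jinja anywhere, so this can live in packages.yml
# or be renamed to dependencies.yml unchanged.
packages:
  - package: dbt-labs/snowplow
    version: [">=0.7.0", "<0.8.0"]
---
# Document 2: the Git URL is rendered with Jinja (env_var),
# so this specification must stay in packages.yml.
packages:
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git"
```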
Examine the following tabs to understand the differences and determine when you should use dependencies.yml or packages.yml.
- When to use Project dependencies
- When to use Package dependencies
Project dependencies are designed for the dbt Mesh and cross-project reference workflow:
- Use dependencies.yml when you need to set up cross-project references between different dbt projects, especially in a dbt Mesh setup.
- Use dependencies.yml when you want to include both projects and non-private dbt packages in your project's dependencies.
  - Private packages are not supported in dependencies.yml because the file intentionally doesn't support Jinja rendering or conditional configuration. This keeps the configuration static and predictable and ensures compatibility with other services, like dbt Cloud.
- Use dependencies.yml for organization and maintainability if you're using both cross-project refs and dbt Hub packages. This reduces the need for multiple YAML files to manage dependencies.
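As a rough illustration of that last point, a single dependencies.yml might hold both a Hub package and a cross-project dependency. This is a sketch under assumptions: the upstream project name jaffle_finance is hypothetical, the version range is only an example, and the projects/name keys reflect our understanding of the dbt Mesh syntax rather than anything spelled out in this section.

```yaml
# dependencies.yml (sketch)
# Hub package: installed into your project as source code, like a library.
packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.9.0", "<1.0.0"]

# Project dependency: enables cross-project refs in a dbt Mesh setup.
# "jaffle_finance" is a hypothetical upstream project name.
projects:
  - name: jaffle_finance
```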
Package dependencies allow you to add source code from someone else's dbt project into your own, like a library:
- If you only use packages like those from the dbt Hub, you can remain with packages.yml.
- Use packages.yml when you want to download dbt packages, such as dbt projects, into your root or parent dbt project. Note that it doesn't contribute to the dbt Mesh workflow.
- Use packages.yml to include packages, including private packages, in your project's dependencies. If you have private packages that you need to reference, packages.yml is the way to go. packages.yml supports Jinja rendering for historical reasons, allowing dynamic configurations. This can be useful if you need to insert values, like a Git token from an environment variable, into your package specifications.
Currently, to use private Git repositories in dbt, you need to use a workaround that involves embedding a Git token with Jinja. This is not ideal, as it requires extra steps like creating a user and sharing a Git token. We're planning to introduce a simpler method soon that won't require Jinja-embedded secret environment variables. For that reason, dependencies.yml does not support Jinja.
How do I add a package to my project?
- Add a file named dependencies.yml or packages.yml to your dbt project. This should be at the same level as your dbt_project.yml file.
- Specify the package(s) you wish to add using one of the supported syntaxes, for example:
packages:
  - package: dbt-labs/snowplow
    version: 0.7.0

  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: 0.9.2

  - local: /opt/dbt/redshift
The default packages-install-path is dbt_packages.
- Run dbt deps to install the package(s). Packages get installed in the dbt_packages directory; by default, this directory is ignored by git to avoid duplicating the source code for the package.
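If you want packages installed somewhere other than dbt_packages, a minimal sketch of the override in dbt_project.yml follows. The directory name vendor_packages is hypothetical.

```yaml
# dbt_project.yml (excerpt, sketch)
# Overrides the default install location (dbt_packages) used by dbt deps.
packages-install-path: vendor_packages
```

If you change the path, you will likely also want to add the new directory to your .gitignore, since the default ignore behavior described above only covers dbt_packages.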
How do I specify a package?
You can specify a package using one of the following methods, depending on where your package is stored.
Hub packages (recommended)
dbt Labs hosts the Package hub, a registry for dbt packages, as a courtesy to the dbt Community, but does not certify or confirm the integrity, operability, effectiveness, or security of any Packages. Please read the dbt Labs Package Disclaimer before installing Hub packages.
You can install available hub packages in the following way:
packages:
  - package: dbt-labs/snowplow
    version: 0.7.3 # version number
Hub packages require a version to be specified – you can find the latest release number on dbt Hub. Since Hub packages use semantic versioning, we recommend pinning your package to the latest patch version from a specific minor release, like so:
packages:
  - package: dbt-labs/snowplow
    version: [">=0.7.0", "<0.8.0"]
Where possible, we recommend installing packages via dbt Hub, since this allows dbt to handle duplicate dependencies. This is helpful in situations such as:
- Your project uses both the dbt-utils and Snowplow packages, and the Snowplow package also uses the dbt-utils package.
- Your project uses both the Snowplow and Stripe packages, both of which use the dbt-utils package.
In comparison, other package installation methods are unable to handle the duplicate dbt-utils package.
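For instance, a packages.yml that installs both dbt-utils and Snowplow from the Hub might look like the sketch below. The version ranges are illustrative assumptions: dbt can settle on a single dbt-utils version only if some release satisfies both your range and the range declared by the Snowplow release it resolves.

```yaml
packages:
  # Declared directly by your project.
  - package: dbt-labs/dbt_utils
    version: [">=0.9.0", "<1.0.0"]

  # Also depends on dbt_utils; because both come from the Hub,
  # dbt deps can reconcile the two requirements into one install.
  - package: dbt-labs/snowplow
    version: [">=0.7.0", "<0.8.0"]
```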
Advanced users can choose to host an internal version of the package hub based on this repository and point dbt at it by setting the DBT_PACKAGE_HUB_URL environment variable.
Prerelease versions
Some package maintainers may wish to push prerelease versions of packages to the dbt Hub, in order to test out new functionality or compatibility with a new version of dbt. A prerelease version is demarcated by a suffix, such as a1 (first alpha), b2 (second beta), or rc3 (third release candidate).
By default, dbt deps will not include prerelease versions when resolving package dependencies. You can enable the installation of prereleases in one of two ways:
- Explicitly specifying a prerelease version in your version criteria
- Setting install_prerelease to true, and providing a compatible version range
For example, both of the following configurations would successfully install 0.4.5-a2 for the dbt_artifacts package:
packages:
  - package: brooklyn-data/dbt_artifacts
    version: 0.4.5-a2

packages:
  - package: brooklyn-data/dbt_artifacts
    version: [">=0.4.4", "<0.4.6"]
    install_prerelease: true
Git packages
Packages stored on a Git server can be installed using the git syntax, like so:
packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git" # git URL
    revision: 0.9.2 # tag or branch name
Add the Git URL for the package, and optionally specify a revision. The revision can be:
- a branch name
- a tagged release
- a specific commit (full 40-character hash)
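To track a branch rather than a fixed release, the revision is simply the branch name. The sketch below assumes the repository has a branch called main. Because a branch moves as new commits land, the installed code can change between dbt deps runs, so a tag or commit hash is the safer choice when you need reproducible builds.

```yaml
packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: main # branch name (assumed to exist); resolves to the branch's latest commit
```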
Example of a revision specifying a 40-character hash:
packages:
  - git: "https://github.com/dbt-labs/dbt-utils.git"
    revision: 4e28d6da126e2940d17f697de783a717f2503188
Internally hosted tarball URL
Some organizations have security requirements to pull resources only from internal services. To address the need to install packages from hosted environments such as Artifactory or cloud storage buckets, dbt Core enables you to install packages from internally-hosted tarball URLs.
packages:
  - tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.9.6
    name: 'dbt_utils'
Here, name: 'dbt_utils' specifies the subfolder of dbt_packages that dbt creates and installs the package source code into.
Private packages
SSH Key Method (Command Line only)
If you're using the Command Line, private packages can be cloned via SSH and an SSH key.
When you use SSH keys to authenticate to your Git remote server, you don’t need to supply your username and password each time. Read more about SSH keys, how to generate them, and how to add them to your Git provider here: GitHub and GitLab.
packages:
  - git: "git@github.com:dbt-labs/dbt-utils.git" # git SSH URL
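The revision key described under Git packages should apply the same way to an SSH URL. For example, a sketch that pins the tag used earlier, assuming that tag exists in the repository you clone:

```yaml
packages:
  - git: "git@github.com:dbt-labs/dbt-utils.git" # git SSH URL
    revision: 0.9.2 # tag, branch name, or full 40-character commit hash
```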
If you're using dbt Cloud, the SSH key method will not work, but you can use the HTTPS Git Token Method.
Git token method
This method lets you clone via HTTPS by passing in a Git token through an environment variable. Be careful of the expiration date of any token you use, as an expired token could cause a scheduled run to fail. Additionally, user tokens can create a challenge if the user ever loses access to a specific repo.
If you are using dbt Cloud, you must adhere to the naming conventions for environment variables. Environment variables in dbt Cloud must be prefixed with either DBT_ or DBT_ENV_SECRET. Environment variable keys are uppercased and case sensitive. When referencing {{env_var('DBT_KEY')}} in your project's code, the key must exactly match the variable defined in dbt Cloud's UI.
In GitHub:
packages:
  # use this format when accessing your repository via a GitHub application token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

  # use this format when accessing your repository via a classic personal access token
  - git: "https://{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL

  # use this format when accessing your repository via a fine-grained personal access token (username sometimes required)
  - git: "https://GITHUB_USERNAME:{{env_var('DBT_ENV_SECRET_GIT_CREDENTIAL')}}@github.com/dbt-labs/awesome_repo.git" # git HTTPS URL
Read more about creating a GitHub Personal Access token here. You can also use a GitHub App installation token.
In GitLab:
packages:
  - git: "https://{{env_var('DBT_USER_NAME')}}:{{env_var('DBT_ENV_SECRET_DEPLOY_TOKEN')}}@gitlab.example.com/dbt-labs/awesome_project.git" # git HTTPS URL
Read more about creating a GitLab Deploy Token here and how to properly construct your HTTPS URL here. Deploy tokens can be managed by Maintainers only.
In Azure DevOps:
packages:
  - git: "https://{{env_var('DBT_ENV_SECRET_PERSONAL_ACCESS_TOKEN')}}@dev.azure.com/dbt-labs/awesome_project/_git/awesome_repo" # git HTTPS URL
Read more about creating a Personal Access Token here.
In Bitbucket:
packages:
  - git: "https://{{env_var('DBT_USER_NAME')}}:{{env_var('DBT_ENV_SECRET_PERSONAL_ACCESS_TOKEN')}}@bitbucketserver.com/scm/awesome_project/awesome_repo.git" # for Bitbucket Server
Read more about creating a Personal Access Token here.