For the BigQuery project in the getting started guide, you'll learn how to set up BigQuery and connect it to dbt Cloud.
This guide will walk you through:
- Setting up a new BigQuery instance
- Accessing sample data in a public data set
- Connecting dbt Cloud to BigQuery
Before beginning this guide, make sure that you have access to new or existing Google account. You can use a personal or work account to set up BigQuery through Google Cloud Platform (GCP).
Before jumping into the steps below, login to your Google account.
Navigate to the BigQuery Console.
- If you don't have a Google Cloud Platform account you will be asked to create one.
- If you do have one (or multiple) it will likely log you into your oldest account. Click your profile picture to the right and verify your are using the correct email account.
Create a new project for this tutorial:
- If you've just created a BigQuery account, you'll be prompted to create a new project straight away.
- If you already have an existing organization, you can select the project drop down in the header bar, and create a new project from there.
Select NEW PROJECT. You should see a project name automatically populate. You can change the name to something more relevant, for example "dbt Learn - Bigquery Setup."
BigQuery supports public data sets that can be directly queried, so we will show you how to access those datasets via select statements. Additionally, we will show you how to populate your own database objects with that data.
Navigate to the BigQuery Console again. Make sure your new project is selected in the header. If you do not see your account or project, click your profile picture to the right and verify your are using the correct email account.
Copy and paste the below queries into the Query Editor to validate that you are able to run them successfully.
select * from `dbt-tutorial.jaffle_shop.customers`;
select * from `dbt-tutorial.jaffle_shop.orders`;
select * from `dbt-tutorial.stripe.payment`;
Verify you can see an output:
Create datasets. Datasets in BigQuery are equivalent to schemas in a traditional database.
- Find your project in the picker. Click the three dots to expose options.
- Click Create dataset.
- Fill in
Dataset IDas required. This will be used like schema in fully qualified references to your database objects, i.e. database.schema.table, so choose a name that fits the purpose, in this case we will be creating one now for
jaffle_shopand one for
- Leave the default values in the rest of the fields:
Data locationcan be left blank -- if selected, this determines the GCP location where your data is stored. The current default location is the US multi-region. All tables within this dataset will share this location.
- Even though it is unchecked, billing table expiration will be set automatically to 60 days, because billing has not been enabled for this project, so GCP defaults to deprecating tables.
- Allow Google to manage encryption.
- Repeat steps i through iv for the second dataset,
Connecting to dbt Cloud
You will learn how to connect dbt Cloud to Google BigQuery so that you can leverage the power of dbt to transform data in BigQuery.
Generate BigQuery credentials
In order to let dbt connect to your warehouse, you'll need to generate a keyfile. This is analogous to using a database user name and password with most other data warehouses.
- Go to the BigQuery credential wizard. Make sure your new project is selected in the header. If you do not see your account or project, click your profile picture to the right and verify your are using the correct email account.
- Select + Create Credentials then select Service account.
dbt-userin the Service account name field, then click Create and Continue.
- Type and select BigQuery Admin in the Role field.
- Click Continue.
- Leave fields blank in the "Grant users access to this service account" section and click Done.
- Click the service account that you just created.
- Select Keys.
- Click Add Key then select Create new key.
- Select JSON as the key type then click Create.
- You should be prompted to download the JSON file. Save it locally to an easy-to-remember spot, with a clear filename. For example,
Create a dbt Cloud account
Let's start this section by creating a dbt Cloud account if you haven't already.
- Navigate to dbt Cloud.
- If you don't have a dbt Cloud account, create a new one, and verify your account via email.
- If you already have a dbt Cloud account, you can create a new project from your existing account:
- Click the gear icon in the top-right, then click Projects.
- Click + New Project.
- You've arrived at the "Setup a New Project" page.
- Type "Analytics" in the dbt Project Name field. You will be able to rename this project later.
- Click Continue.
Connect dbt Cloud to BigQuery
Now let's set up the connection between dbt Cloud and BigQuery.
- Click BigQuery to set up your connection.
- Click Upload a Service Account JSON File in BigQuery settings.
- Select the JSON file you downloaded in Generate BigQuery Credentials. dbt Cloud will fill in all the necessary fields.
- Click Test at the top. This verifies that dbt Cloud can access your BigQuery account.
- If you see "Connection test Succeeded!" then click Continue. If it fails, you might need to go back and regenerate your BigQuery credentials.
Initialize your repository and start development
Setting up a managed repository
dbt Cloud uses Git for version control, but using a managed repository makes this easier. To set up a managed repository:
- Under "Add repository from", select Managed.
- Type a name for your repo such as
- Click Create. It will take a few seconds for your repository to be created and imported.
- Once you see the "Successfully imported repository," click Continue.
Initialize your dbt project
Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:
- Click Develop from the upper left. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
- Above the file tree to the left, click Initialize your project. This builds out your folder structure with example models.
- Make your initial commit by clicking Commit. Use the commit message
initial commit. This creates the first commit to your managed repo and allows you to open a branch where you can add new dbt code.
- Now you should be able to directly query data from your warehouse and execute dbt run. You can try this out now:
- In "Scratchpad 1", delete all text and paste your warehouse-specific code into Scratchpad 1:
select * from `dbt-tutorial.jaffle_shop.customers`
select * from default.jaffle_shop_customers
select * from jaffle_shop_customers
select * from raw.jaffle_shop.customers
- In the command line bar at the bottom, type
dbt runand click Enter. We will explore what happens in the next section of the tutorial.
Congratulations! You have successfully completed the following:
- Set up a new BigQuery instance
- Accessing sample data in a public data set
- Connected dbt Cloud to BigQuery
You can Continue the tutorial by building your first project. You will learn and apply the fundamentals of dbt with models, tests, docs, and deploying your first job!