
Query the Discovery API

The Discovery API supports ad-hoc queries and integrations. If you are new to the API, refer to About the Discovery API for an introduction.

Use the Discovery API to evaluate data pipeline health and project state across runs or at a moment in time. dbt Labs provides a GraphQL explorer for this API, which lets you run queries and browse the schema.

Since GraphQL describes the data in the API, the schema displayed in the GraphQL explorer accurately represents the graph and fields available to query.

Prerequisites

Authorization

Requests are authorized using a service token. dbt Cloud admin users can generate a Metadata Only service token that is authorized to execute queries against the Discovery API.

Once you've created a token, use it in the Authorization header of requests to the dbt Cloud Discovery API. Include the Token prefix (or the equivalent Bearer prefix) before the token; without a prefix, the request fails with a 401 Unauthorized error.
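As a minimal sketch, the header format can be expressed in Python. The helper name `auth_headers` is hypothetical; it only builds the header dictionary and does not contact the API.

```python
def auth_headers(token: str, scheme: str = "Bearer") -> dict:
    """Build request headers for the Discovery API (hypothetical helper).

    `scheme` may be "Bearer" or "Token"; both prefixes are equivalent.
    Sending the token without a prefix leads to a 401 Unauthorized error.
    """
    if scheme not in ("Bearer", "Token"):
        raise ValueError("scheme must be 'Bearer' or 'Token'")
    return {
        "Authorization": f"{scheme} {token}",
        "Content-Type": "application/json",
    }


print(auth_headers("YOUR_TOKEN"))
print(auth_headers("YOUR_TOKEN", scheme="Token"))
```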

Access the Discovery API

  1. Create a service account token to authorize requests. dbt Cloud Admin users can generate a Metadata Only service token, which authorizes requests to the Discovery API.

  2. Find your API URL using the endpoint https://metadata.{YOUR_ACCESS_URL}/graphql.

    • Replace {YOUR_ACCESS_URL} with the appropriate Access URL for your region and plan. For example, if your multi-tenant region is North America, your endpoint is https://metadata.cloud.getdbt.com/graphql. If your multi-tenant region is EMEA, your endpoint is https://metadata.emea.dbt.com/graphql.
  3. For specific query points, refer to the schema documentation.
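The endpoint pattern in step 2 is simple string substitution. The sketch below, using a hypothetical `discovery_endpoint` helper (Python 3.9+ for `removeprefix`), reproduces the two example endpoints from the Access URLs `cloud.getdbt.com` and `emea.dbt.com`.

```python
def discovery_endpoint(access_url: str) -> str:
    """Return the Discovery API GraphQL endpoint for a dbt Cloud Access URL."""
    # Strip a scheme or trailing slash in case a full URL was pasted in.
    host = access_url.removeprefix("https://").removeprefix("http://").rstrip("/")
    return f"https://metadata.{host}/graphql"


print(discovery_endpoint("cloud.getdbt.com"))  # https://metadata.cloud.getdbt.com/graphql
print(discovery_endpoint("emea.dbt.com"))      # https://metadata.emea.dbt.com/graphql
```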

Run queries using HTTP requests

You can run queries by sending a POST request to the https://metadata.YOUR_ACCESS_URL/graphql endpoint, making sure to replace:

  • YOUR_ACCESS_URL with the appropriate Access URL for your region and plan.

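  • YOUR_TOKEN in the Authorization header with your actual API token. Be sure to include the Token or Bearer prefix.

  • QUERY_BODY with your GraphQL query. In the curl request this is the full JSON body, for example { "query": "<query text>" }; in the Python example it is the query text only.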

  • VARIABLES with a dictionary of your GraphQL query variables, such as a job ID or a filter.

  • ENDPOINT with the endpoint you're querying, such as environment.

    curl 'https://metadata.YOUR_ACCESS_URL/graphql' \
    -H 'authorization: Bearer YOUR_TOKEN' \
    -H 'content-type: application/json' \
    -X POST \
    --data 'QUERY_BODY'

Python example:

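import requests

# Replace YOUR_TOKEN, QUERY_BODY (the query text), VARIABLES, and ENDPOINT with your own values.
response = requests.post(
    "https://metadata.YOUR_ACCESS_URL/graphql",
    headers={"authorization": "Bearer " + YOUR_TOKEN, "content-type": "application/json"},
    json={"query": QUERY_BODY, "variables": VARIABLES},
)
response.raise_for_status()  # Fail fast on HTTP errors such as 401 Unauthorized.

metadata = response.json()["data"][ENDPOINT]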
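If you prefer to avoid the third-party `requests` package, the same call can be sketched with the standard library. The helper names `build_request` and `extract_data` are hypothetical. The error check assumes the common GraphQL convention of reporting query problems in a top-level `errors` list, which is not described on this page.

```python
import json
import urllib.request


def build_request(endpoint: str, token: str, query: str, variables: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the Discovery API."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_data(payload: dict, field: str):
    """Return payload["data"][field], raising if the response carries GraphQL errors."""
    # GraphQL servers often return errors with HTTP 200, so check the body explicitly.
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload["data"][field]


# Usage (sending requires network access and a real token):
# req = build_request(
#     "https://metadata.cloud.getdbt.com/graphql", "YOUR_TOKEN",
#     "query ($id: BigInt!) { environment(id: $id) { applied { models(first: 5) { edges { node { name } } } } } }",
#     {"id": 123},
# )
# with urllib.request.urlopen(req) as resp:
#     environment = extract_data(json.load(resp), "environment")
```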

Every query requires an environment ID or job ID. You can get the ID from a dbt Cloud URL or by using the Admin API.
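To pull an ID out of a dbt Cloud URL programmatically, a small regular expression is enough. This sketch assumes URLs shaped like `.../deploy/<account_id>/projects/<project_id>/environments/<environment_id>` or `.../jobs/<job_id>`; check the URLs in your own account before relying on it. The helper name `ids_from_url` is hypothetical.

```python
import re

# Assumed URL shape: .../environments/<id> or .../jobs/<id> (verify against your account).
_ID_PATTERN = re.compile(r"/(environments|jobs)/(\d+)")


def ids_from_url(url: str) -> dict:
    """Extract environment and/or job IDs from a dbt Cloud URL (hypothetical helper)."""
    found = {}
    for kind, value in _ID_PATTERN.findall(url):
        # Map the path segment to a query-variable-style key.
        key = "environmentId" if kind == "environments" else "jobId"
        found[key] = int(value)
    return found


print(ids_from_url("https://cloud.getdbt.com/deploy/1/projects/2/environments/3"))  # {'environmentId': 3}
```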

There are several illustrative example queries on this page. For more examples, refer to Use cases and examples for the Discovery API.

Reasonable use

Discovery (GraphQL) API usage is subject to request rate and response size limits to maintain the performance and stability of the metadata platform and prevent abuse.

Job-level endpoints are subject to query complexity limits. Nested nodes (like parents), code (like rawCode), and catalog columns are considered the most complex. Break overly complex queries into separate queries that include only the fields you need. For most use cases, dbt Labs recommends using the environment endpoint instead to get the latest descriptive and result metadata for a dbt Cloud project.
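Because requests can be rejected when they exceed rate limits, a client may want to retry with exponential backoff. This is a generic sketch, not a documented dbt Cloud client behavior: which errors count as retryable is an assumption left to the caller, and the `with_backoff` name is hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], retries: int = 4, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` with exponential backoff (1s, 2s, 4s, ... by default).

    Assumption: `call` raises an exception when a request is rejected,
    for example by a rate limit.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The injectable `sleep` parameter keeps the helper testable without real delays.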

Retention limits

You can use the Discovery API to query data from the previous three months. For example, if today were April 1, you could query data back to January 1.
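The April 1 to January 1 example corresponds to subtracting three calendar months. This sketch computes that cutoff; treating the window as calendar months (with the day clamped for shorter months) is an assumption, and the service's exact boundary may differ.

```python
import calendar
from datetime import date


def earliest_queryable(today: date, months: int = 3) -> date:
    """Return the date `months` calendar months before `today` (assumed retention rule)."""
    # Use a zero-based month index so the subtraction can cross year boundaries.
    index = today.year * 12 + (today.month - 1) - months
    year, month = divmod(index, 12)
    month += 1
    # Clamp the day for shorter months, e.g. May 31 -> Feb 29 in a leap year.
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(earliest_queryable(date(2024, 4, 1)))  # 2024-01-01
```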

Run queries with the GraphQL explorer

You can run ad-hoc queries directly in the GraphQL API explorer and use the document explorer on the left-hand side to see all possible nodes and fields.

Refer to the Apollo explorer documentation for setup and authorization info.

  1. Access the GraphQL API explorer and select fields you want to query.

  2. Select Variables at the bottom of the explorer and replace any null fields with your unique values.

  3. Authenticate using Bearer auth with YOUR_TOKEN. Select Headers at the bottom of the explorer and select +New header.

  4. Select Authorization in the header key dropdown list and enter your token in the value field. Remember to include the Bearer (or Token) prefix. Your header should be in this format: {"Authorization": "Bearer <YOUR_TOKEN>"}.


(Image: Enter the header key and Bearer auth token values)
  5. Run your query by clicking the blue query button in the top right of the Operation editor (to the right of the query). You should see a successful query response on the right side of the explorer.
(Image: Run queries using the Apollo Server GraphQL explorer)

Fragments

Use the ... on notation to query across lineage and retrieve results from specific node types.

query ($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, filter: { uniqueIds: "MODEL.PROJECT.MODEL_NAME" }) {
        edges {
          node {
            name
            ancestors(types: [Model, Source, Seed, Snapshot]) {
              ... on ModelAppliedStateNestedNode {
                name
                resourceType
                materializedType
                executionInfo {
                  executeCompletedAt
                }
              }
              ... on SourceAppliedStateNestedNode {
                sourceName
                name
                resourceType
                freshness {
                  maxLoadedAt
                }
              }
              ... on SnapshotAppliedStateNestedNode {
                name
                resourceType
                executionInfo {
                  executeCompletedAt
                }
              }
              ... on SeedAppliedStateNestedNode {
                name
                resourceType
                executionInfo {
                  executeCompletedAt
                }
              }
            }
          }
        }
      }
    }
  }
}
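Because each fragment returns different fields, client code has to branch on what is present. The sketch below takes one model node from the response (at `data.environment.applied.models.edges[i].node`) and groups its ancestors by `resourceType`, picking `executeCompletedAt` for models, seeds, and snapshots and `maxLoadedAt` for sources. The helper name and the sample `resourceType` values used in testing are hypothetical.

```python
from collections import defaultdict


def summarize_ancestors(model_node: dict) -> dict:
    """Group a model's ancestors by resourceType as (name, timestamp) pairs."""
    summary = defaultdict(list)
    for ancestor in model_node.get("ancestors", []):
        name = ancestor["name"]
        # Only the source fragment requests sourceName; prefix it for readability.
        if "sourceName" in ancestor:
            name = f"{ancestor['sourceName']}.{name}"
        # Models, seeds, and snapshots carry executionInfo; sources carry freshness.
        timestamp = (
            (ancestor.get("executionInfo") or {}).get("executeCompletedAt")
            or (ancestor.get("freshness") or {}).get("maxLoadedAt")
        )
        summary[ancestor["resourceType"]].append((name, timestamp))
    return dict(summary)
```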

Pagination

Querying large datasets can impact performance on multiple functions in the API pipeline. Pagination eases the burden by returning smaller data sets one page at a time. This is useful for returning a particular portion of the dataset or the entire dataset piece-by-piece to enhance performance. dbt Cloud utilizes cursor-based pagination, which makes it easy to return pages of constantly changing data.

Use the PageInfo object to return information about the page. The available fields are:

  • startCursor (string): Corresponds to the first node in the returned edges.
  • endCursor (string): Corresponds to the last node in the returned edges.
  • hasNextPage (boolean): Whether there are more nodes after the returned results.

There are connection variables available when making the query:

  • first (integer): Returns the first n nodes for each page, up to 500.
  • after (string): Sets the cursor after which to retrieve nodes. As a best practice, set after to the object ID returned in the endCursor of the previous page.

Below is an example that returns the first 500 models after the object ID specified in the variables. The PageInfo object returns the object ID where the cursor starts, where it ends, and whether there is a next page.

(Image: Example of pagination)

Below is a code example of the PageInfo object:

pageInfo {
  startCursor
  endCursor
  hasNextPage
}
totalCount # Total number of records across all pages
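The cursor loop itself can be sketched in a few lines. `iter_nodes` is a hypothetical helper: it accepts any `fetch_page(after)` function that returns the connection object (the dict holding `edges` and `pageInfo`), so it can wrap whichever HTTP client you use. `MODELS_PAGE_QUERY` is an illustrative query built from the `first` and `after` connection variables described above.

```python
from typing import Callable, Iterator, Optional

# Illustrative query: pages through models and requests pageInfo for each page.
MODELS_PAGE_QUERY = """
query ($environmentId: BigInt!, $first: Int!, $after: String) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, after: $after) {
        edges { node { name } }
        pageInfo { endCursor hasNextPage }
      }
    }
  }
}
"""


def iter_nodes(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every node across pages, following endCursor until hasNextPage is false."""
    after = None  # No cursor for the first page.
    while True:
        connection = fetch_page(after)
        for edge in connection["edges"]:
            yield edge["node"]
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            break
        # Resume after the last node of this page.
        after = page_info["endCursor"]
```

In practice, `fetch_page` would send `MODELS_PAGE_QUERY` with variables such as `{"environmentId": ..., "first": 500, "after": after}` and return `data.environment.applied.models`.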

Filters

Filtering helps to narrow down the results of an API query. If you want to query and return only models and tests that are failing or find models that are taking too long to run, you can fetch execution details such as executionTime, runElapsedTime, or status. This helps data teams monitor the performance of their models, identify bottlenecks, and optimize the overall data pipeline.
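Once execution details are fetched, spotting slow models can be done client-side. This sketch assumes each model node was requested with `executionInfo { executionTime }` and that `executionTime` is in seconds; both the field placement and the unit are assumptions to confirm in the schema documentation. The helper name is hypothetical.

```python
def slowest_models(nodes: list, threshold_seconds: float) -> list:
    """Return (name, executionTime) for models slower than the threshold, slowest first."""
    slow = []
    for node in nodes:
        # Skip nodes that have no recorded execution time.
        seconds = (node.get("executionInfo") or {}).get("executionTime")
        if seconds is not None and seconds > threshold_seconds:
            slow.append((node["name"], seconds))
    return sorted(slow, key=lambda pair: pair[1], reverse=True)
```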

Below is an example that filters for results of models that have succeeded on their lastRunStatus:

(Image: Example of filtering)

Below is an example that filters for models that have an error on their last run and tests that have failed:

query ModelsAndTests($environmentId: BigInt!, $first: Int!) {
  environment(id: $environmentId) {
    applied {
      models(first: $first, filter: { lastRunStatus: error }) {
        edges {
          node {
            name
            executionInfo {
              lastRunId
            }
          }
        }
      }
      tests(first: $first, filter: { status: "fail" }) {
        edges {
          node {
            name
            executionInfo {
              lastRunId
            }
          }
        }
      }
    }
  }
}
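A natural next step is to group the failures by the run that produced them, so you can investigate one run at a time. This sketch takes `data.environment.applied` from the ModelsAndTests response; the helper name `failures_by_run` is hypothetical.

```python
from collections import defaultdict


def failures_by_run(applied: dict) -> dict:
    """Map each lastRunId to the failing model and test names in the response."""
    by_run = defaultdict(list)
    for kind in ("models", "tests"):
        for edge in applied.get(kind, {}).get("edges", []):
            node = edge["node"]
            run_id = (node.get("executionInfo") or {}).get("lastRunId")
            # Label entries "model:<name>" or "test:<name>".
            by_run[run_id].append(f"{kind[:-1]}:{node['name']}")
    return dict(by_run)
```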