JDBC
The dbt Semantic Layer Java Database Connectivity (JDBC) API enables users to query metrics and dimensions using the JDBC protocol, while also providing standard metadata functionality.
A JDBC driver is a software component enabling a Java application to interact with a data platform. Here's some more information about our JDBC API:
- The Semantic Layer JDBC API utilizes the open-source JDBC driver with ArrowFlight SQL protocol.
- You can download the JDBC driver from Maven.
- The dbt Semantic Layer supports ArrowFlight SQL driver version 12.0.0 and higher.
- You can embed the driver into your application stack as needed, and you can use dbt Labs' example project for reference.
- If you’re a partner or user building a homegrown application, you’ll need to install an AWS root CA to the Java Trust documentation (specific to Java and JDBC call).
dbt Labs partners can use the JDBC API to build integrations in their tools with the dbt Semantic Layer
Using the JDBC API
If you are a dbt user or partner with access to dbt Cloud and the dbt Semantic Layer, you can setup and test this API with data from your own instance by configuring the Semantic Layer and obtaining the right JDBC connection parameters described in this document.
You may be able to use our JDBC API with tools that do not have an official integration with the dbt Semantic Layer. If the tool you use allows you to write SQL and either supports a generic JDBC driver option (such as DataGrip) or supports Dremio and uses ArrowFlightSQL driver version 12.0.0 or higher, you can access the Semantic Layer API.
Refer to Get started with the dbt Semantic Layer for more info.
Note that the dbt Semantic Layer API doesn't support ref
to call dbt objects. Instead, use the complete qualified table name. If you're using dbt macros at query time to calculate your metrics, you should move those calculations into your Semantic Layer metric definitions as code.
Authentication
dbt Cloud authorizes requests to the dbt Semantic Layer API. You need to provide an environment ID, host, and service account tokens.
Connection parameters
The JDBC connection requires a few different connection parameters.
This is an example of a URL connection string and the individual components:
jdbc:arrow-flight-sql://semantic-layer.cloud.getdbt.com:443?&environmentId=202339&token=SERVICE_TOKEN
JDBC parameter | Description | Example |
---|---|---|
jdbc:arrow-flight-sql:// | The protocol for the JDBC driver. | jdbc:arrow-flight-sql:// |
semantic-layer.cloud.getdbt.com | The access URL for your account's dbt Cloud region. You must always add the semantic-layer prefix before the access URL. | For dbt Cloud deployment hosted in North America, use semantic-layer.cloud.getdbt.com |
environmentId | The unique identifier for the dbt production environment, you can retrieve this from the dbt Cloud URL when you navigate to Environments under Deploy. | If your URL ends with .../environments/222222 , your environmentId is 222222 |
SERVICE_TOKEN | dbt Cloud service token with “Semantic Layer Only” and "Metadata Only" permissions. Create a new service token on the Account Settings page. | token=SERVICE_TOKEN |
*Note — If you're testing locally on a tool like DataGrip, you may also have to provide the following variable at the end or beginning of the JDBC URL &disableCertificateVerification=true
.
Querying the API for metric metadata
The Semantic Layer JDBC API has built-in metadata calls which can provide a user with information about their metrics and dimensions.
Expand the following toggles for examples and metadata commands:
Querying the API for metric values
To query metric values, here are the following parameters that are available. Your query must have either a metric
or a group_by
parameter to be valid.
Parameter | Description | Example |
---|---|---|
metrics | The metric name as defined in your dbt metric configuration | metrics=['revenue'] |
group_by | Dimension names or entities to group by. We require a reference to the entity of the dimension (other than for the primary time dimension), which is pre-appended to the front of the dimension name with a double underscore. | group_by=['user__country', 'metric_time'] |
grain | A parameter specific to any time dimension and changes the grain of the data from the default for the metric. | group_by=[Dimension('metric_time') grain('week|day|month|quarter|year')] |
where | A where clause that allows you to filter on dimensions and entities using parameters. This takes a filter list OR string. Inputs come with Dimension , and Entity objects. Granularity is required if the Dimension is a time dimension | "{{ where=Dimension('customer__country') }} = 'US')" |
limit | Limit the data returned | limit=10 |
order | Order the data returned by a particular field | order_by=['order_gross_profit'] , use - for descending, or full object notation if the object is operated on: order_by=[Metric('order_gross_profit').descending(True) ] |
compile | If true, returns generated SQL for the data platform but does not execute | compile=True |
saved_query | A saved query you can use for frequently used queries. | select * from {{ semantic_layer.query(saved_query="new_customer_orders" |
Note on time dimensions and metric_time
You will notice that in the list of dimensions for all metrics, there is a dimension called metric_time
. Metric_time
is a reserved keyword for the measure-specific aggregation time dimensions. For any time-series metric, the metric_time
keyword should always be available for use in queries. This is a common dimension across all metrics in a semantic graph.
You can look at a single metric or hundreds of metrics, and if you group by metric_time
, it will always give you the correct time series.
Additionally, when performing granularity calculations that are global (not specific to a particular time dimension), we recommend you always operate on metric_time
and you will get the correct answer.
Note that metric_time
should be available in addition to any other time dimensions that are available for the metric(s). In the case where you are looking at one metric (or multiple metrics from the same data source), the values in the series for the primary time dimension and metric_time
are equivalent.
Examples
Refer to the following examples to help you get started with the JDBC API.
Fetch metadata for metrics
You can filter/add any SQL outside of the templating syntax. For example, you can use the following query to fetch the name and dimensions for a metric:
select name, dimensions from {{
semantic_layer.metrics()
}}
WHERE name='food_order_amount'
Query common dimensions
You can select common dimensions for multiple metrics. Use the following query to fetch the name and dimensions for multiple metrics:
select * from {{
semantic_layer.dimensions(metrics=['food_order_amount', 'order_gross_profit'])
}}
Query grouped by time
The following example query uses the shorthand method to fetch revenue and new customers grouped by time:
select * from {{
semantic_layer.query(metrics=['food_order_amount','order_gross_profit'],
group_by=['metric_time'])
}}
Query with a time grain
Use the following example query to fetch multiple metrics with a change in time dimension granularities:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month')])
}}
Group by categorical dimension
Use the following query to group by a categorical dimension:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'), 'customer__customer_type'])
}}
Query only a dimension
In this case, you'll get the full list of dimension values for the chosen dimension.
select * from {{
semantic_layer.query(group_by=['customer__customer_type'])
}}
Query with where filters
Where filters in API allow for a filter list or string. We recommend using the filter list for production applications as this format will realize all benefits from the Predicate pushdown where possible.
Where Filters have a few objects that you can use:
-
Dimension()
— Used for any categorical or time dimensions.Dimension('metric_time').grain('week')
orDimension('customer__country')
. -
TimeDimension()
— Used as a more explicit definition for time dimensions, optionally takes in a granularityTimeDimension('metric_time', 'month')
. -
Entity()
— Used for entities like primary and foreign keys -Entity('order_id')
.
For TimeDimension()
, the grain is only required in the WHERE
filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains.
For example, consider this Semantic model and Metric config, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model.
semantic_model:
name: my_model_source
defaults:
agg_time_dimension: created_month
measures:
- name: measure_0
agg: sum
- name: measure_1
agg: sum
agg_time_dimension: order_year
dimensions:
- name: created_month
type: time
type_params:
time_granularity: month
- name: order_year
type: time
type_params:
time_granularity: year
metrics:
- name: metric_0
description: A metric with a month grain.
type: simple
type_params:
measure: measure_0
- name: metric_1
description: A metric with a year grain.
type: simple
type_params:
measure: measure_1
Assuming the user is querying metric_0
and metric_1
together in a single request, a valid WHERE
filter would be:
"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"
Invalid filters would be:
-
"{{ TimeDimension('metric_time') }} > '2020-01-01'"
— metrics in the query are defined based on measures with different grains. -
"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"
—metric_1
is not available at a month grain.
- Use the following example to query using a
where
filter with the string format:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
where="{{ Dimension('metric_time').grain('month') }} >= '2017-03-09' AND {{ Dimension('customer__customer_type' }} in ('new') AND {{ Entity('order_id') }} = 10")
}}
- (Recommended for better performance) Use the following example to query using a
where
filter with a filter list format:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
where=["{{ Dimension('metric_time').grain('month') }} >= '2017-03-09'", "{{ Dimension('customer__customer_type') }} in ('new')", "{{ Entity('order_id') }} = 10"])
}}
Query with a limit
Use the following example to query using a limit
or order_by
clause:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10)
}}
Query with Order By Examples
Order By can take a basic string that's a Dimension, Metric, or Entity, and this will default to ascending order
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10,
order_by=['order_gross_profit'])
}}
For descending order, you can add a -
sign in front of the object. However, you can only use this short-hand notation if you aren't operating on the object or using the full object notation.
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time')],
limit=10,
order_by=[-'order_gross_profit'])
}}
If you are ordering by an object that's been operated on (for example, you changed the granularity of the time dimension), or you are using the full object notation, descending order must look like:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('week')],
limit=10,
order_by=[Metric('order_gross_profit').descending(True), Dimension('metric_time').grain('week').descending(True) ])
}}
Similarly, this will yield ascending order:
select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('week')],
limit=10,
order_by=[Metric('order_gross_profit'), Dimension('metric_time').grain('week')])
}}
Query with compile keyword
-
Use the following example to query using a
compile
keyword:select * from {{
semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
group_by=[Dimension('metric_time').grain('month'),'customer__customer_type'],
compile=True)
}} -
Use the following example to compile SQL with a saved query. You can use this for frequently used queries.
select * from {{ semantic_layer.query(saved_query="new_customer_orders", limit=5, compile=True}}
When querying saved queries,you can use parameters such as where
, limit
, order
, compile
, and so on. However, keep in mind that you can't access metric
or group_by
parameters in this context. This is because they are predetermined and fixed parameters for saved queries, and you can't change them at query time. If you would like to query more metrics or dimensions, you can build the query using the standard format.
Query a saved query
Use the following example to query a saved query:
select * from {{ semantic_layer.query(saved_query="new_customer_orders", limit=5}}
The JDBC API will use the saved query (new_customer_orders
) as defined and apply a limit of 5 records.