What is dbt?

dbt's tagline: "Analytics engineering tools designed for analysts". It is installed with:

pip install dbt

What dbt does: it takes all of your .sql files and sends them to the BigQuery API to be run. Besides BigQuery, dbt also fully supports Snowflake, Redshift and PostgreSQL.

Jinja templating is used, so it's recommended to refer to tables using ref:
SELECT
  *
FROM
  {{ ref('table_name') }}
Because every model refers to other tables through ref, dbt can infer the full DAG. Suppose you're working on an update of table_x.sql:

dbt run --model table_x     (just table_x)
dbt run --model +table_x    (table_x and everything upstream of it)
dbt run --model table_x+    (table_x and everything downstream of it)
dbt run --model +table_x+   (table_x and everything upstream and downstream)
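These selector semantics can be sketched in plain Python. This is a simplified illustration of upstream/downstream closure over a DAG, not dbt's actual implementation, and the toy model names are invented:

```python
from collections import defaultdict

def build_graph(edges):
    """edges: list of (upstream, downstream) pairs."""
    down = defaultdict(set)  # node -> direct children
    up = defaultdict(set)    # node -> direct parents
    for u, v in edges:
        down[u].add(v)
        up[v].add(u)
    return up, down

def closure(node, neighbors):
    """All nodes reachable from `node` via `neighbors`, including itself."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(neighbors[n])
    return seen

# A toy DAG: raw -> table_x -> report
up, down = build_graph([("raw", "table_x"), ("table_x", "report")])

# dbt run --model +table_x  -> table_x plus everything upstream
print(sorted(closure("table_x", up)))    # ['raw', 'table_x']
# dbt run --model table_x+  -> table_x plus everything downstream
print(sorted(closure("table_x", down)))  # ['report', 'table_x']
```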
Because dbt infers the DAG, it knows which queries can run in parallel.

{{
  config(
    materialized="incremental",
    incremental_strategy="insert_overwrite",
    partitions=get_list_of_day("\'" + var('ds') + "\'", -4),
    partition_by={"field": "partition_date", "data_type": "date"},
  )
}}
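Note that get_list_of_day is a project-specific macro, not a dbt built-in. Assuming it returns the list of day partitions ending at var('ds') (so -4 means the last 4 days, matching the ds-3 .. ds filter used in the incremental query), a hypothetical Python equivalent of its logic would be:

```python
from datetime import date, timedelta

def get_list_of_day(ds: str, n_days: int) -> list:
    """Hypothetical equivalent of the project's get_list_of_day macro:
    the abs(n_days) consecutive dates ending at ds, as ISO strings."""
    end = date.fromisoformat(ds)
    return [str(end - timedelta(days=d)) for d in range(abs(n_days) - 1, -1, -1)]

print(get_list_of_day("2021-06-10", -4))
# ['2021-06-07', '2021-06-08', '2021-06-09', '2021-06-10']
```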
Every .sql file can have configuration at the top as a Jinja template. A model can be materialized as incremental, table or ephemeral. When materialized as incremental with the strategy insert_overwrite, dbt will overwrite the indicated partition(s), writing into them in sequence.

WITH t_join AS (
SELECT *
FROM
{{ ref('table_a') }} JOIN {{ ref('table_b') }} USING(user_id)
{% if is_incremental() %}
WHERE
a_date BETWEEN DATE_SUB(DATE('{{ var("ds") }}'), INTERVAL 5 DAY)
AND DATE('{{ var("ds") }}')
AND b_date BETWEEN DATE_SUB(DATE('{{ var("ds") }}'), INTERVAL 5 DAY)
AND DATE('{{ var("ds") }}')
{% endif %}
)
SELECT * FROM t_join
WHERE
a_date BETWEEN DATE_SUB(DATE('{{ var("ds") }}'), INTERVAL 3 DAY)
AND DATE('{{ var("ds") }}')
Using the is_incremental macro/flag, we can force this to be false with:
dbt run --model table_x --full-refresh
dbt will first delete the existing table and then run the full-refresh query.

For testing, dbt has dbt test.

Running dbt with Airflow: in development, ref('table_x') resolves to

tvlk-data-financial-dev.oliver_chen.finserv_l2_table_x

When dbt is run in staging/prod, the table will be written to:

tvlk-data-financial-stg/prod.finserv_l2.table_x
dbt allows us to specify table and column descriptions in a YAML file:

models:
  - name: table_x
    description: "Description of table_x"
    columns:
      - name: field_a
        description: "Description of field_a"

dbt docs generate then builds a documentation site from these descriptions.
⌘F is a low-latency search tool. Each table matches the .sql file name that generates it, so ⌘P (in VSCode) performs a table-name search.

dbt deployment pipeline:

1. black and flake8 run on the dags/ folder.
2. The dags/ folder is deployed to the staging DAG bag.
3. The dags/ folder is deployed to the prod DAG bag.
4. dbt run is executed using KubernetesPodOperator, on the same GKE cluster as Airflow is on.

This covers the L2, L3 and L4 tables.

If we use a single dbt run to run ALL the queries, then Composer is just a very expensive cron job. A simpler cron service could trigger dbt run instead, and it could be completely "serverless" if it's triggered to run on Cloud Build.

Why are we recommending dbt CLI but not dbt Cloud?
Which pair of statements are both correct?
A linear ordering of the vertices in a Directed Acyclic Graph (DAG) is said to be topologically sorted if for every directed edge u→v in the DAG, vertex u comes before v in the ordering.
What is the computational complexity to topologically sort a DAG that has V number of vertices and E number of edges?
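For reference, a DAG with V vertices and E edges can be topologically sorted in O(V + E) time with Kahn's algorithm; a minimal Python sketch:

```python
from collections import deque, defaultdict

def topo_sort(vertices, edges):
    """Kahn's algorithm: O(V + E). Returns a topological order,
    or raises ValueError if the graph has a cycle."""
    indeg = {v: 0 for v in vertices}   # in-degree of every vertex
    adj = defaultdict(list)            # adjacency list: u -> [v, ...]
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:               # "remove" u's outgoing edges
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(vertices):
        raise ValueError("graph has a cycle")
    return order

print(topo_sort(["a", "b", "c"], [("a", "b"), ("b", "c")]))  # ['a', 'b', 'c']
```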