What dbt (data build tool) does:

- dbt's tagline: "Analytics engineering tools designed for analysts".
- Install it with pip install dbt.
- You write your models as .sql files; dbt run takes all of your .sql files and sends them to the BQ API to be run.
- dbt also fully supports Snowflake, Redshift and PostgreSQL.
- Jinja templating is used, so it's recommended to refer to tables using ref:
    SELECT
      *
    FROM
      {{ ref('table_name') }}
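For intuition, "send them to the BQ API" boils down to submitting the compiled SQL as a query job. The snippet below is a minimal sketch of that idea using the google-cloud-bigquery client, not dbt's actual code; the project name and compiled-file path are illustrative only.

    # Rough illustration only: dbt adds compilation, DDL/DML wrapping, retries, etc.
    from google.cloud import bigquery

    client = bigquery.Client(project="tvlk-data-financial-dev")  # hypothetical project

    # dbt writes compiled SQL under target/compiled/; the exact path depends on the project layout.
    with open("target/compiled/my_project/models/table_x.sql") as f:
        compiled_sql = f.read()

    job = client.query(compiled_sql)  # submit the query job to BigQuery
    job.result()                      # block until it finishes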
Because models reference each other through ref, dbt can infer the full DAG. Suppose you're working on an update of table_x.sql:

- dbt run --models table_x runs only table_x.
- dbt run --models +table_x also runs everything upstream of table_x.
- dbt run --models table_x+ also runs everything downstream of table_x.
- dbt run --models +table_x+ runs both.

Because dbt infers the DAG, it knows which queries can run in parallel.

    {{
      config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        partitions=get_list_of_day("\'" + var('ds') + "\'", -4),
        partition_by={"field": "partition_date", "data_type": "date"},
      )
    }}
Each model's .sql file can have configuration at the top as a Jinja template. A model can be materialized as incremental, table or ephemeral. When it is incremental with the insert_overwrite strategy, dbt will overwrite the indicated partition(s), writing into them in sequence:

    WITH t_join AS (
      SELECT *
      FROM
        {{ ref('table_a') }} JOIN {{ ref('table_b') }} USING(user_id)
      {% if is_incremental() %}
      WHERE
        a_date BETWEEN DATE_SUB(DATE('{{ var("ds") }}'), INTERVAL 5 DAY)
          AND DATE('{{ var("ds") }}')
        AND b_date BETWEEN DATE_SUB(DATE('{{ var("ds") }}'), INTERVAL 5 DAY)
          AND DATE('{{ var("ds") }}')
      {% endif %}
    )

    SELECT * FROM t_join
    WHERE
      a_date BETWEEN DATE_SUB(DATE('{{ var("ds") }}'), INTERVAL 3 DAY)
        AND DATE('{{ var("ds") }}')
Incremental runs are guarded by the is_incremental() macro/flag; we can force it to be false with:

    dbt run --models table_x --full-refresh

dbt will first delete the existing table and then run the full-refresh query.

dbt also has dbt test, which we can run with Airflow. Targets are environment-specific: during development, ref('table_x') resolves to tvlk-data-financial-dev.oliver_chen.finserv_l2_table_x, and when dbt is run in staging/prod the table is written to tvlk-data-financial-stg/prod.finserv_l2.table_x.
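As a minimal sketch of that mechanism (not dbt's actual implementation, which derives these names from profiles.yml and macros such as generate_schema_name), the target-dependent resolution of ref() can be pictured like this; the target-to-dataset mapping below is illustrative only:

    from jinja2 import Template

    # Illustrative target -> (project, dataset, table-name prefix) mapping.
    TARGETS = {
        "dev":  ("tvlk-data-financial-dev",  "oliver_chen", "finserv_l2_"),
        "prod": ("tvlk-data-financial-prod", "finserv_l2",  ""),
    }

    def make_ref(target):
        project, dataset, prefix = TARGETS[target]
        def ref(model_name):
            # Build a fully qualified BigQuery identifier for the given model.
            return f"`{project}.{dataset}.{prefix}{model_name}`"
        return ref

    sql = Template("SELECT * FROM {{ ref('table_x') }}")
    print(sql.render(ref=make_ref("dev")))   # ...dev.oliver_chen.finserv_l2_table_x
    print(sql.render(ref=make_ref("prod")))  # ...prod.finserv_l2.table_x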
dbt allows us to specify table and column descriptions in a yaml file:
    models:
      - name: table_x
        description: "Description of table_x"
        columns:
          - name: field_a
            description: "Description of field_a"
The documentation is then generated with:

    dbt docs generate
⌘F is a low latency search tool: each table is named after the .sql file that generates it, so ⌘P (in VSCode) performs a table name search.

CI/CD:

- black and flake8 run on the dags/ folder.
- The dags/ folder is deployed to the staging DAG bag, then to the prod DAG bag.
- dbt run is executed with a KubernetesPodOperator on the same GKE cluster that Airflow is on, building the L2, L3 and L4 tables (a sketch of such a task follows below).
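A minimal sketch of that task, assuming a pre-built Docker image that contains the dbt project; the DAG id, image name and flags below are illustrative, not our actual configuration:

    from datetime import datetime

    from airflow import DAG
    # The import path varies with the Airflow / cncf-kubernetes provider version.
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    with DAG(
        dag_id="dbt_run_finserv",        # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        dbt_run = KubernetesPodOperator(
            task_id="dbt_run",
            name="dbt-run",
            namespace="default",
            image="gcr.io/our-project/dbt:latest",  # hypothetical image with the dbt project baked in
            cmds=["dbt"],
            # Pass the execution date through as the ds variable used by the models.
            arguments=["run", "--vars", '{"ds": "{{ ds }}"}'],
            get_logs=True,
        )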
But if we just use dbt run to run ALL the queries, then Composer is a very expensive cron job; any cron service could trigger dbt run instead. It could even be completely "serverless" if it's triggered to run on Cloud Build.

Why are we recommending the dbt CLI and not dbt Cloud?
Which pair of statements are both correct?
A linear ordering of the vertices in a Directed Acyclic Graph (DAG) is said to be topologically sorted if for every directed edge u→v in the DAG, vertex u comes before v in the ordering.
What is the computational complexity of topologically sorting a DAG with V vertices and E edges?
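For reference, the answer is O(V + E). Below is a minimal sketch of Kahn's algorithm, which achieves this bound (illustrative only, not dbt's actual scheduler):

    from collections import deque

    def topological_sort(graph):
        """graph maps each vertex to the list of vertices it points to (edges u -> v)."""
        indegree = {v: 0 for v in graph}
        for u in graph:
            for v in graph[u]:
                indegree[v] += 1

        queue = deque(v for v, d in indegree.items() if d == 0)  # vertices with no incoming edge
        order = []
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    queue.append(v)

        if len(order) != len(graph):
            raise ValueError("graph has a cycle, so it is not a DAG")
        return order

    # Example: table_a -> table_x -> table_y
    print(topological_sort({"table_a": ["table_x"], "table_x": ["table_y"], "table_y": []}))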