As the title says, I'm looking for a way to force-skip a model when one of its sources is not updated/fresh, based on dbt source freshness. Our current setup is as follows:
We have models that read from different Snowflake tables and are materialized as tables. The sources are refreshed at different frequencies, and if one of a model's sources has not been updated, we want dbt to skip that model, since recomputing it would just return the same data.
We have tried adding an if/else to the model itself with Jinja and running "SELECT * FROM {{ this }}" to recreate the table from the old data, but it is very hacky and doesn't really skip the model; a rough sketch of that workaround is below.
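For context, the workaround looked roughly like this (the freshness check is shown as a hypothetical is_source_fresh() macro, since that part is custom):
{# hypothetical macro; the actual freshness check has to be implemented separately #}
{% if is_source_fresh('my_source', 'my_table') %}
    select * from {{ source('my_source', 'my_table') }}
{% else %}
    -- source not fresh: rebuild the table from its own current contents (the hacky part)
    select * from {{ this }}
{% endif %}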
So we are looking for a better way to leverage the result of the dbt source freshness command to decide whether a model should run or be skipped.
If you're running v1.1 or newer and don't mind an experimental API, you can use the source_status selector to only refresh models downstream from sources that have received new data. From the docs:
Another element of job state is the source_status of a prior dbt invocation. After executing dbt source freshness, for example, dbt creates the sources.json artifact which contains execution times and max_loaded_at dates for dbt sources.
That means your script that runs dbt in production needs to invoke dbt twice, with the first invocation saving the state of the sources. Again, from the docs:
# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
$ dbt source freshness # must be run again to compare current to previous state
$ dbt build --select source_status:fresher+ --state path/to/prod/artifacts
If you want to do the opposite, and exclude models downstream from sources that haven't updated, you can use the --exclude flag instead:
# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
$ dbt source freshness # must be run again to compare current to previous state
$ dbt build --exclude source_status:error+ --state path/to/prod/artifacts
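Putting the two invocations together, a production script could look roughly like this (paths are illustrative; the state directory must contain the sources.json written by the previous run):
$ dbt source freshness                                   # writes target/sources.json for the current run
$ dbt build --select source_status:fresher+ --state path/to/prod/artifacts
$ cp target/sources.json path/to/prod/artifacts/         # save this run's state for the next comparison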
Related
I am using Liquibase scripts with a CorDapp. Previously, the first-version databaseChangeLog file had all the table creations in one single changeset, and at a later point we split it into separate databaseChangeLog files, one changeset each.
Now the problem is that some production/testing environments already contain data created with the older script, but we want to use the new scripts.
The change looks like this:
Old: abc-master.xml included abc-init.xml (the usual way)
New: abc-master.xml includes abc-v1.xml, and abc-v1.xml includes a table-v1.xml file for each table creation
The solution we were considering:
Create new tables with slightly different names → copy the data from the old tables into them → drop the old tables. Then we can remove the old tables and the old scripts (I assume).
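In raw SQL terms, the per-table step would be something like this (table names are placeholders):
CREATE TABLE MY_TABLE_V2 AS SELECT * FROM MY_TABLE;   -- new table under a slightly different name, with the old data copied in
DROP TABLE MY_TABLE;                                  -- drop the old table once the copy is verified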
Still, the DATABASECHANGELOG table will probably retain the entries for the old scripts; would that be a problem?
Or is there a far better way to do this?
Many thanks.
I answered this also on the Liquibase forums, but I'll copy it here for other people.
The filename is part of the unique key on the databasechangelog table (ID/Author/Filename). So when you change the filename of a changeset that has already executed, it is now in fact a new changeset as far as Liquibase is concerned.
I normally recommend that my customers never manually update the databasechangelog table, but in this case I think it might be the best course of action for you. That way your new file structure is properly reflected in the databasechangelog table.
I would run an update-sql command on the new file structure against one of your databases where you have already executed the changesets. This will show you which changesets are pending, and also the filename values that you need to update.
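For illustration, the manual correction would look something like this per changeset (the ID, author, and paths here are hypothetical; verify them against the update-sql output and back up the table first):
UPDATE DATABASECHANGELOG
SET FILENAME = 'changelog/table-v1.xml'        -- the new path of the already-executed changeset
WHERE ID = 'create-my-new-table'
  AND AUTHOR = 'team'
  AND FILENAME = 'changelog/abc-init.xml';     -- the old path currently recorded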
We are planning to go with:
<preConditions onFail="MARK_RAN">
    <not>
        <tableExists tableName="MY_NEW_TABLE"/>
    </not>
</preConditions>
for all the table-creation changesets in the new, split-out structure (a fuller sketch follows the list below). Our assumptions are:
We can keep only this new structure in the code and remove the old init file.
For environments with existing data, even though the changesets in the new structure are treated as new changesets to run, the preconditions will prevent them from actually running.
For fresh DB deployments, it will work as expected, creating all the required tables.
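For reference, each table-v1.xml would then hold something along these lines (IDs, author, table, and column names are placeholders), inside the usual databaseChangeLog root element:
<changeSet id="create-my-new-table-v1" author="team">
    <preConditions onFail="MARK_RAN">
        <not>
            <tableExists tableName="MY_NEW_TABLE"/>
        </not>
    </preConditions>
    <createTable tableName="MY_NEW_TABLE">
        <column name="ID" type="VARCHAR(64)">
            <constraints primaryKey="true" nullable="false"/>
        </column>
    </createTable>
</changeSet>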
Background: I have a few models which are materialized as 'table'. These tables are populated with a wipe (truncate) and load. Now I want to protect the existing data in a table if the query used to populate it returns an empty result set. How can I make sure an empty result set does not replace the existing data in the table?
My table lives in Snowflake, and I am using dbt to model the output table.
In a nutshell: commit the transaction only when the SQL statement used returns a non-empty result set.
Have you tried using the dbt ref() function, which allows us to reference one model within another?
https://docs.getdbt.com/reference/dbt-jinja-functions/ref
If you are loading data in a way that is not controlled by dbt and then using that table, it is called a source. You can read more about this here.
dbt does not control what you load into a source; everything else, the T in ELT, is controlled wherever you reference a model via the ref() function. A great example, if you have a source that changes and you load it into a table while making sure that incoming data does not "drop" already-recorded data, is "incremental" materialization (a minimal sketch is shown below). I suggest you read more here.
Thinking incrementally takes time and practice; it is also recommended to do a --full-refresh every now and then.
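As a minimal sketch, assuming a source called 'raw' with a table 'events' and a monotonically increasing loaded_at column (all names are placeholders), an incremental model could look like:
{{ config(materialized='incremental', unique_key='id') }}

select id, payload, loaded_at
from {{ source('raw', 'events') }}

{% if is_incremental() %}
  -- only take rows newer than what the target table already holds;
  -- an empty batch then simply inserts nothing instead of wiping the table
  where loaded_at > (select max(loaded_at) from {{ this }})
{% endif %}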
You can also have pre-hooks and post-hooks that check your sources with clever macros, and you can add dbt tests. We would really need a bit more context about what you have and what you wish to achieve to suggest a concrete answer.
Regarding the new on_schema_change='sync_all_columns' config, I have a question. I have tested it a bunch of times, and it seems that when a new column is added it does not automatically insert data into it. It also doesn't really apply the data type changes it implies.
{{
    config(
        materialized='incremental',
        on_schema_change='sync_all_columns',
        incremental_strategy='merge'
    )
}}
(https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models)
Am I doing something wrong?
As far as I understand the documentation:
Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.
So it looks like it shouldn't require a --full-refresh. Note, though, that the same page also states that none of the on_schema_change behaviours backfill values in old records for newly added columns; if you need those values populated, it suggests manual updates or triggering a --full-refresh, which matches the behaviour you are seeing.
link:
https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#what-if-the-columns-of-my-incremental-model-change
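If you do need the newly added column populated for pre-existing rows, a one-off full refresh of just that model is one option (the model name here is a placeholder):
$ dbt run --full-refresh --select my_incremental_model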
I am trying to debug a complex Logic Apps workflow, and I need to see the variables and their values for an instance in the run history.
How can I see a list of all variables and their values for one of the run history entries?
Update #1:
I use the run history to track/debug Logic App runs:
Within the run history, I can see the variables:
When I have more than 20 variables, inspecting them one at a time becomes difficult. Is there any way to see a list of all variables and their values in the view above?
If not, what are my options?
I have three tables/entities whose data I want to preserve when I load Doctrine2 fixtures. Of course, right now, when I run doctrine:fixtures:load, it purges the entire database (except migration_versions) and then loads the fixtures.
I realize that I can use the --append switch to only add data to the database, but I do want to remove most of the data from the database.
How do I preserve table data from only three tables/entities when using Doctrine2 fixtures?
How about separating the "append-only" and "delete-only" fixture classes into separate folders and then running two console commands, specifying the fixtures path with --fixtures?
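For example, something along these lines (the paths and console binary are illustrative, and check that your version of DoctrineFixturesBundle still supports the --fixtures option):
$ php app/console doctrine:fixtures:load --fixtures=src/AppBundle/DataFixtures/Reloadable            # purge-and-load pass for everything that may be wiped
$ php app/console doctrine:fixtures:load --append --fixtures=src/AppBundle/DataFixtures/Preserved    # append pass for the three preserved tables, no purge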