Regarding the new on_schema_change='sync_all_columns' config, I have a question. I tested it a bunch of times, and it seems that when a new column is added, dbt doesn't automatically insert data into it for existing rows. It also doesn't really apply the data type changes it implies.
{{
    config(
        materialized='incremental',
        on_schema_change='sync_all_columns',
        incremental_strategy='merge'
    )
}}
(https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models)
Am I doing something wrong?
As far as I understand the documentation:
Incremental models can now be configured to include an optional on_schema_change parameter to enable additional control when incremental model columns change. These options enable dbt to continue running incremental models in the presence of schema changes, resulting in fewer --full-refresh scenarios and saving query costs.
It looks like it shouldn't require a full refresh.
link:
https://docs.getdbt.com/docs/building-a-dbt-project/building-models/configuring-incremental-models#what-if-the-columns-of-my-incremental-model-change
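For context, what I expected was that existing rows would also get values for the newly added column. As far as I can tell, the only ways to fill them are a --full-refresh or a one-off backfill statement along these lines (the table and column names here are just placeholders, and the UPDATE ... FROM syntax may need adjusting for your warehouse):
-- one-off backfill, run manually after the incremental model has synced the new column
update analytics.my_incremental_model as tgt
set new_col = src.new_col
from raw.source_table as src
where tgt.id = src.id
  and tgt.new_col is null;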
Related
As the title says, I'm looking for a way to force-skip a model if one of my sources is not updated/fresh, using dbt source freshness. Our current setup is as follows:
We have models that source from different Snowflake tables, and we materialize them as tables. The sources have different freshness frequencies, and if one of the sources for a model has not been updated, we want to skip that model, since recomputing it would just return the same data.
We have tried using if/else in the model itself with Jinja, running "SELECT * FROM {{ this }}" to recreate the table from the old data, but that is very hacky and doesn't really skip the model.
So we are looking for better ways to leverage the result of the dbt source freshness command to determine whether a model should run or just be skipped.
If you're running v1.1 or newer and don't mind an experimental API, you can use the source_status selector to only refresh models downstream from sources that have received new data. From the docs:
Another element of job state is the source_status of a prior dbt invocation. After executing dbt source freshness, for example, dbt creates the sources.json artifact which contains execution times and max_loaded_at dates for dbt sources.
That means your script that runs dbt in production needs to invoke dbt twice, with the first invocation saving the state of the sources. Again, from the docs:
# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
$ dbt source freshness # must be run again to compare current to previous state
$ dbt build --select source_status:fresher+ --state path/to/prod/artifacts
If you want to do the opposite, and exclude models downstream from sources that haven't updated, you can use the --exclude flag instead:
# You can also set the DBT_ARTIFACT_STATE_PATH environment variable instead of the --state flag.
$ dbt source freshness # must be run again to compare current to previous state
$ dbt build --exclude source_status:error+ --state path/to/prod/artifacts
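One thing to double-check: the source_status selector only has something to compare if freshness is configured on the sources in the first place. A minimal sources.yml along these lines (the names and thresholds are just illustrative) is what lets dbt source freshness record the max_loaded_at values the selector relies on:
# models/sources.yml (illustrative names and thresholds)
version: 2
sources:
  - name: raw
    schema: raw
    loaded_at_field: _loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
      - name: customers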
Background: I have a few models which are materialized as 'table'. These tables are populated with a wipe (truncate) and load. Now I want to protect the existing data in a table if the query used to populate it returns an empty result set. How can I make sure an empty result set does not replace my existing data in the table?
The tables live in Snowflake, and I am using dbt to model the output tables.
In a nutshell: commit the transaction only when the SQL statement returns a non-empty result set.
Have you tried using the dbt ref() function, which allows us to reference one model within another?
https://docs.getdbt.com/reference/dbt-jinja-functions/ref
If you are loading data in a way that is not controlled via dbt and then using that table, it is called a source. You can read more about this here.
dbt does not control what you load into a source; everything else, the T in ELT, is controlled wherever you reference a model via the ref() function. A great example, if you have a source that changes and you load it into a table while making sure the incoming data does not "drop" already recorded data, is the "incremental" materialization. I suggest you read more here.
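A minimal sketch of what that looks like (the source, key, and column names are assumptions on my part): on incremental runs only new rows are merged on top of what the table already holds, so an unchanged or empty source adds nothing instead of wiping the table.
{{ config(materialized='incremental', unique_key='id') }}

select id, updated_at, payload
from {{ source('raw', 'orders') }}
{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already loaded
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}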
Thinking incrementally takes time and practice, and it is also recommended to do a --full-refresh every now and then.
You can have pre-hooks and post-hooks that check your sources with clever macros, and you can add dbt tests. We would really need a bit more context about what you have and what you wish to achieve to suggest a real answer.
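For the "empty result set" concern specifically, one option is a singular dbt test on the source: the test returns rows (and therefore fails) when the source is empty, and with dbt build a failing upstream test normally causes the downstream models to be skipped rather than rebuilt. A rough sketch, where the source name is an assumption:
-- tests/assert_raw_orders_not_empty.sql
-- fails (returns one row) when the source table is empty, passes otherwise
select count(*) as n
from {{ source('raw', 'orders') }}
having count(*) = 0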
I have a scenario where a Java developer has made a change to the variable that is used to transfer data into column col of table tbl.
Now I have to change the column from varchar(15) to varchar(10). But before making this change, I have to handle the existing data and the constraints/dependencies on that column.
What should be the best sequence for doing so?
I am thinking of checking the constraints first, then trimming the existing data, and then altering the table.
Please suggest how to handle the constraints/dependencies and, before handling them, how to check for such dependencies.
Schema-evolution (the DDL changes that happen over time to tables and columns in a database, while preserving existing data and functionality) is a well understood topic with several solutions, some of which are RDBMS independent, others are built-in to the RDBMS solution.
A key requirement for production environments is to need both a forward-change and a backout, which can be run unattended.
Many open source advocates use Liquibase which also has a commercial variant.
Db2 for Linux/Unix/Windows also offers a built-in stored procedure, SYSPROC.ALTOBJ, which helps automate various schema-evolution alterations, including decreasing the size of a column. You would need to study its documentation carefully and test it fully in non-production environments until you are satisfied. Read about it here:
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.sql.rtn.doc/doc/r0011934.html
You can grow your own script, of course, in whatever language you prefer, including SQL, but remember that you should also build and test a back-out script.
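For the specific varchar(15) to varchar(10) change, and assuming Db2 for LUW as in the answer above, the forward script usually boils down to checking, backing up, trimming, and altering; the backup is what makes a back-out possible. Table, column, and key names below are placeholders. On recent Db2 versions the ALTER may leave the table in reorg-pending state (so plan a REORG TABLE), and dependencies such as views or check constraints on the column can be looked up beforehand in catalog views like SYSCAT.TABDEP and SYSCAT.CHECKS.
-- 1. Find rows that will not fit into VARCHAR(10)
SELECT COUNT(*) FROM tbl WHERE LENGTH(col) > 10;

-- 2. Keep a copy of the rows you are about to change, for the back-out script
CREATE TABLE tbl_col_backup AS (SELECT pk, col FROM tbl WHERE LENGTH(col) > 10) WITH DATA;

-- 3. Trim (or otherwise correct) the long values, then shrink the column
UPDATE tbl SET col = SUBSTR(col, 1, 10) WHERE LENGTH(col) > 10;
ALTER TABLE tbl ALTER COLUMN col SET DATA TYPE VARCHAR(10);

-- Back-out: widen the column again and restore the saved values
-- ALTER TABLE tbl ALTER COLUMN col SET DATA TYPE VARCHAR(15);
-- UPDATE tbl t SET col = (SELECT b.col FROM tbl_col_backup b WHERE b.pk = t.pk)
--   WHERE EXISTS (SELECT 1 FROM tbl_col_backup b WHERE b.pk = t.pk);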
I am a newbie to CakePHP and could really use some help and suggestions!
The application I am working on currently interacts with two databases. Both databases have more or less the same schema and table structure, and I have to save some information in both of them. So I have a table, say "employee_information", in both databases; both tables have a set of common fields (first_name, last_name, birthday, gender, etc.) and some other fields specific to that database.
Now I have to save some information into the other database using CakePHP's Model::save() method. Previously I was simply switching the data source and using a SQL INSERT to do this, and it was working fine, but now I would really like to use CakePHP's conventional methods, because I think I am missing a great deal by not using Cake's own methods (data sanitizing, in my case).
I tried switching the data source and using Model::save(); the method did not work. It did not log any errors, but it also did not add any record to the database.
// Using the following snippet in the model to save:
$this->setDataSource('secondary_database');
$this->save($this->data);
$this->setDataSource('primary_database');
Any ideas or suggestions would be highly appreciated!
Thanks!
You're almost there, but you need to set up two db configs and select them with useDbConfig.
For example:
$this->User->save($this->data); //Saves data to default (first) database
$this->User->useDbConfig('second'); //Selects second database for next uses
$this->User->save($this->data); //Saves data to second database too
//$this->User->useDbConfig('default'); //Not needed unless you want to do stuff with the default database again later in the same code.
But if I'd need to save different fields in each DB, then I'd go with different models.
Setting a custom table for the model after switching the data source worked for me. (http://api.cakephp.org/1.3/class-Model.html#_setSource)
$this->User->setDataSource('secondary_database');
$this->User->setSource('secondary_database_table');
$this->User->save($this->data, array(
    'validate' => true,
    'fieldList' => $fieldList // specific fields that need to be updated
));
I'm using Zend Framework's Zend_Db_Table classes to fetch data from a database.
I'd like to "refine" each row I fetch from a table by adding something to it. Within a plain old SQL query I would write eg. SELECT *, dueDate<NOW() AS isOverdue. In this example, feeding an extra field to the SQL query would be possible, but sometimes it might be more suitable to do the extra stuff with PHP. Anyway, I'd use this information mainly in my views, eg. to highlight overdue items accordingly.
What would be a good place to add this isOverdue data in a ZF application?
My thoughts so far:
finding that ZF has a built-in mechanism for this (not successful so far)
subclassing Zend_Db_Table_Row
overriding _fetch() in my Zend_Db_Table class
rethinking whether this is a sane pattern at all :)
As a bonus, it would be nice if I could still use ZF to update rows. Maybe that would be (another) reason for a naming convention for custom fields?
Why reinvent the wheel? There's built-in functionality to do this:
$this->select()->from('your_table_name_here', array('*', 'dueDate<NOW() AS isOverdue'));
Simply specify which columns you need using the second parameter of the from() function, and it will generate the SQL you need (by default, if you do not pass a second parameter, it generates a "SELECT * FROM table" query).