Snowflake DB deployment with GitLab-CI

I have multiple Snowflake accounts, with the database creation scripts and stored procedures kept in a GitLab repo. Is there a recommended approach to deploying these with GitLab-CI, such that deployed DB versions can be tracked and a DB rollback is also feasible?
Right now I am trying to use Python in the pipeline to connect to Snowflake and execute the SQL script files; for rollback, specific clean-up and rollback SQL scripts are needed, which makes on-demand rollback a challenge.
I'm looking for the right approach or a sample to refer to.
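For what it's worth, here is a minimal sketch of the Python-on-pipeline approach described above: a GitLab-CI job that connects to Snowflake with the snowflake-connector-python package and applies versioned SQL files in order. The file layout, environment variable names, and the version-tracking table are assumptions for illustration, not an established convention.

```python
# deploy.py - minimal sketch: apply versioned SQL scripts to Snowflake from CI.
# Assumes scripts are named like migrations/V001__create_db.sql and that
# credentials are provided as CI/CD variables (variable names are assumptions).
import os
from pathlib import Path

import snowflake.connector  # pip install snowflake-connector-python

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role=os.environ.get("SNOWFLAKE_ROLE", "SYSADMIN"),
    warehouse=os.environ.get("SNOWFLAKE_WAREHOUSE", "DEPLOY_WH"),
)

try:
    cur = conn.cursor()
    # Hypothetical bookkeeping table so deployed versions can be tracked.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS ADMIN.PUBLIC.SCHEMA_HISTORY "
        "(version STRING, applied_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP())"
    )
    applied = {row[0] for row in cur.execute(
        "SELECT version FROM ADMIN.PUBLIC.SCHEMA_HISTORY").fetchall()}

    for script in sorted(Path("migrations").glob("V*.sql")):
        version = script.name.split("__")[0]
        if version in applied:
            continue  # already deployed, skip
        # Naive statement split; fine for simple DDL scripts without semicolons in bodies.
        for statement in script.read_text().split(";"):
            if statement.strip():
                cur.execute(statement)
        cur.execute(
            "INSERT INTO ADMIN.PUBLIC.SCHEMA_HISTORY (version) VALUES (%s)",
            (version,),
        )
finally:
    conn.close()
```

Rollback would still need its own set of clean-up scripts applied in reverse order; dedicated tools such as Flyway or Snowflake's schemachange formalize exactly this versioning pattern and may be easier to maintain than a hand-rolled script.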

Related

How to set up a dev to prod workflow on snowflake (and dbt)?

We are currently implementing Snowflake and dbt and want to split our Snowflake databases between dev and prod, so that we have a database to test on before releasing new data models. We are planning to use dbt to create all of our data models going forward. I have a couple of questions on the logistics of the workflow:
How do we keep dev and prod in sync? (Or should they be?) I know Snowflake has a clone feature that recreates the metadata without copying the data over. Should we clone our prod database to dev? On a daily basis? What about users who have materialized resources in dev -- they would lose that data.
Should we make deployment to prod part of the CI/CD process, so that only a fully merged pull request (tested on Snowflake dev) can be deployed to Snowflake prod? Would that present too much of a bottleneck?
I'm curious to understand how people have architected their workflows when maintaining both a dev and a prod Snowflake environment.
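As a side note on the clone question above: refreshing dev from prod with a zero-copy clone boils down to a single statement, which could be scheduled from a small script. A minimal sketch, assuming database names DEV and PROD and credential variables that are placeholders:

```python
# refresh_dev.py - sketch: zero-copy clone of PROD into DEV in Snowflake.
# Note this replaces DEV entirely, so anything materialized only in DEV is lost.
import os
import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    role="SYSADMIN",  # needs privileges on both databases
)
try:
    # Zero-copy clone: only metadata is created, no data is physically copied.
    conn.cursor().execute("CREATE OR REPLACE DATABASE DEV CLONE PROD")
finally:
    conn.close()
```

This is exactly why the caveat in the question matters: a blanket clone wipes user work in dev, which is why per-user dev schemas, as the answer below suggests, are the more common pattern.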
A common implementation is to have user-specific dev schemas (e.g., dbt_lfolsom) that are written to and overwritten whenever a user executes any kind of dbt run; and then a single prod schema (e.g., analytics) that's written to and overwritten when jobs are executed "in production."
Running dbt "in development" means a dbt command is executed by an individual user (using dbt Cloud or dbt CLI) so that data is written to a dedicated schema that is used specifically for development.
Running dbt "in production" means running dbt commands that are configured as jobs in dbt Cloud (or using another orchestration tool) write data to a specific "prod" target schema.
Check out this article on running dbt in production.
With dbt Cloud, you can also write to PR-specific schemas that are created automatically when you create or update a GitHub PR, which may be what you'd like to use for CI/CD. Check out this article on options.
You can (should) definitely configure prod jobs to run only on fully merged code.
If for some reason you really need prod and dev to be separate databases instead of separate schemas, I think you would create separate dbt projects that use code from the same git remote repo but are configured (in dbt Cloud or using your profiles.yml) to write to different Snowflake databases. But I think that's an unconventional approach that would require more work.
To change the database where you run your code, go to the
top right corner -> your profile -> in credentials on the left, choose the project, fill in the dev section (here you select only the schema), and above it you can override the default for this project.
This default is changed only for you and is indicated by a strikethrough over the default value with your selection shown next to it.

Flyway migration with existing database using Hasura - PostgreSQL backup

I've developed a web platform that uses a PostgreSQL database along with Hasura to provide a GraphQL interface. The platform is deployed on Google Cloud: the database runs in a Google Cloud SQL instance, and Hasura and a simple node.js server run on Cloud Run instances.
Since the database will keep growing, I need a secure and reliable way to keep track of changes made in the development environment and then deploy them to the production database.
The bulk of the edits to the database schema are done using the Hasura Console, and for now I just need a solution to track schema changes made in the development environment so that only the needed changes are deployed to production.
Reading about migrations I found Flyway as a solution for keeping track of these changes. However, I still have some concerns about introducing Flyway into the project, and a couple of questions arise:
Is it possible to use the PostgreSQL (pgAdmin) backup files as migrations?
How could I run a migration from the development to the production database? Just by adding the remote URL from Google Cloud SQL and running the migration?
There's not much need to keep track of changes to the data in production.
Is there a better option to control changes between the development and production databases?
If I take frequent schema backups (using the pgAdmin Backup tool) and restore them on the production database, would that do what I want?
Is it possible to use the PostgreSQL (pgAdmin) backup files as migrations?
I think you are going the wrong way. Flyway is about the migration scripts you execute to move the DB from one version to the next. A backup file contains the whole database. If you want to replace the whole database with a new version of it, you may simply drop the old one and create the new one, but you will lose data that way. You can of course use Flyway to run the restore for you, but then all you gain is the version table, and if you upgrade across several versions, multiple restores will be performed that are not needed.
How could I run a migration from the development to the production database? Just by adding the remote URL from Google Cloud SQL and running the migration?
I tried googling it (I entered "Google Cloud SQL flyway") and the first result pointed me to Umberto D'Ovido's post Setup Flyway with Google Cloud SQL. I'm sure with a little effort you'll find the instructions.
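In practice, pointing Flyway at a Cloud SQL Postgres instance is mostly a matter of the JDBC URL and credentials. Here is a minimal sketch of driving the Flyway CLI from a script; the host, database name, and migration directory are assumptions, and Cloud SQL may additionally require the Cloud SQL Auth Proxy or an authorized network:

```python
# migrate_prod.py - sketch: run Flyway CLI migrations against a Cloud SQL Postgres DB.
import os
import subprocess

env = os.environ.copy()
# Flyway reads these environment variables natively.
env["FLYWAY_URL"] = "jdbc:postgresql://10.0.0.5:5432/mydb"   # hypothetical host/db
env["FLYWAY_USER"] = os.environ["DB_USER"]
env["FLYWAY_PASSWORD"] = os.environ["DB_PASSWORD"]
env["FLYWAY_LOCATIONS"] = "filesystem:./sql"                 # V1__init.sql, V2__..., etc.

# 'flyway migrate' applies any pending versioned migrations and records them
# in the flyway_schema_history table.
subprocess.run(["flyway", "migrate"], env=env, check=True)
```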

Repeatable Flyway Migration

How do I achieve repeatable migration of SQL scripts to every database? I have a segment called API, and it needs to be deployed to all of the existing databases on the SQL Server.
Although I am able to repeatedly run/execute the set of scripts based on the naming convention, I am not able to run them on every DB.
As of now, I have a data-system.json file where all the DBs and segments are registered, and I am using this to run a particular segment against a single DB.
I'm not 100% sure what you're asking, but in reference to the first part of your question:
How do I achieve repeatable migration of SQL scripts to every database?
If you want to run your Flyway scripts on multiple databases, you can use the 'migrate' command in the Flyway CLI to do that (https://flywaydb.org/documentation/command/migrate).
You can configure the environment-specific info (e.g. login credentials) using environment variables (https://flywaydb.org/documentation/envvars).
Thanks
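Building on that, one way to cover the "every database" part is to loop over the databases registered in the data-system.json file mentioned in the question and invoke flyway migrate once per database. A minimal sketch, where the JSON structure, segment path, and connection details are assumptions:

```python
# migrate_all.py - sketch: run Flyway migrations against every registered DB.
import json
import os
import subprocess

# Hypothetical structure: {"databases": [{"name": "Sales", "server": "sql01"}, ...]}
with open("data-system.json") as f:
    config = json.load(f)

for db in config["databases"]:
    env = os.environ.copy()
    env["FLYWAY_URL"] = (
        f"jdbc:sqlserver://{db['server']};databaseName={db['name']}"
    )
    env["FLYWAY_USER"] = os.environ["SQL_USER"]
    env["FLYWAY_PASSWORD"] = os.environ["SQL_PASSWORD"]
    env["FLYWAY_LOCATIONS"] = "filesystem:./segments/api"  # the API segment scripts
    print(f"Migrating {db['name']} on {db['server']}...")
    subprocess.run(["flyway", "migrate"], env=env, check=True)
```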

Azure continuous deployment from GitHub and database upgrades

I have a Web application that I usually deploy using Web Deploy directly from Visual Studio (from whatever branch I am currently using in VS - normally master). But now I'm introducing a second web app on Azure that will be built from the same repo but a different branch. To make things simpler I will be configuring both web apps on Azure to integrate directly with GitHub and associating each with a specific branch.
I also added two additional web.config files: Web.Primary.config and Web.Secondary.config and configured app settings on Azure portal of each web app by adding additional value SCM_BUILD_ARGS and set them to
SCM_BUILD_ARGS=-p:PublishProfile=Primary // in primary web app
SCM_BUILD_ARGS=-p:PublishProfile=Secondary // in secondary web app
which I understand will transform the correct config file with the specific external services' configuration (DB connection, mail server, etc.).
Now the additional step that I would like to include in continuous deployment is to run a set of SQL scripts that I have in my repo and that I used to run manually to upgrade the database during Web Deploy from VS. The individual scripts perform specific database upgrade steps:
backup current tables - backup creates a set of Backup_OriginalTableName tables that are copied from existing ones and populated with existing data
drop whole DB model - all non-backup objects are dropped: procedures, functions, types, views, tables...
create model - creates all tables, views and indices
create user types
create user functions
create stored procedures
restore data to new tables from backup tables - this step may occasionally break if new non-nullable columns introduced in the new model don't have defaults defined on them; I will somehow have to mitigate this by adding an additional script that adds the missing columns to the backup tables and gives them some defaults, but that's a completely different issue.
I used to also have a set of batch files (BAT) in my VS solution that simply executed sqlcmd against specific database instance and executed these scripts in predefined order (as above). Hence I had batches:
Recreate Local.bat - this one used additional SQL scripts to not restore from backup but rather to recreate an empty DB with only lookup tables being populated and some default data for development purposes (like predefined test users)
Restore Local.bat - I used this script to simply restore database from backup tables discarding any invalid data I may have created while debugging/testing since last DB recreate/upgrade/restore
Upgrade Local.bat - upgrade local development DB executing scripts mentioned above
Upgrade Production.bat - upgrade production DB on Azure executing scripts mentioned above
So to support the whole deployment process I was now doing manually in VS I would now like to also execute these scripts against specific Azure SQL DB during continuous deployment. I suppose I should be running these right after code deployment because if that one fails, DB shouldn't be upgraded either.
I'm a bit confused about where and how to do this. Can I configure it somewhere in the Azure portal? I was looking for resources on the web but I can't seem to find any relevant information on how to add deployment steps that execute these scripts. I think this is a fairly everyday scenario, as it's hard to think of web apps that don't require databases these days.
Maybe it's just my process for DB upgrade/deployment that is wrong, so let me also know if there is any other normal way to do DB upgrades/migrations with continuous deployment on Azure... I may change my process to accommodate this.
Note 1: I'm not using Entity Framework or any other full-blown ORM. I'm using NPoco instead, and all my DB logic is built in stored procedures that the DAL uses.
Note 2: I'm aware of the recently introduced staging capabilities of Azure, but my apps are on a cheaper plan that doesn't support staging, and I want to keep it this way since I may be introducing additional web apps along the way that will use additional code branches and resources (DB, mail, etc.).
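For reference, the "run the upgrade scripts right after deployment" step described above could be scripted roughly like this, e.g. as a post-deployment step or a separate CI job. This is only a sketch; the script names mirror the steps listed in the question, the server and database names are assumptions, and it shells out to sqlcmd, which has to be available on the build agent:

```python
# upgrade_db.py - sketch: apply the upgrade scripts in order against Azure SQL.
import os
import subprocess

SERVER = "myserver.database.windows.net"   # hypothetical Azure SQL server
DATABASE = "MyAppDb"                        # hypothetical database name

# Order mirrors the upgrade steps described in the question.
SCRIPTS = [
    "01_backup_tables.sql",
    "02_drop_model.sql",
    "03_create_model.sql",
    "04_create_user_types.sql",
    "05_create_user_functions.sql",
    "06_create_stored_procedures.sql",
    "07_restore_from_backup.sql",
]

for script in SCRIPTS:
    print(f"Running {script}...")
    subprocess.run(
        [
            "sqlcmd",
            "-S", SERVER,
            "-d", DATABASE,
            "-U", os.environ["SQL_USER"],
            "-P", os.environ["SQL_PASSWORD"],
            "-b",              # stop and return a non-zero exit code on error
            "-i", script,
        ],
        check=True,            # abort the whole upgrade if any script fails
    )
```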
It sounds to me like your db project is a good candidate for SSDT and inclusion in source control. You can create a MyDB.sqlproj that builds your db as a dacpac, and then you can use SqlPackage.exe Publish to accomplish your deployment to Azure.
We recently brought our databases under source control and follow a similar process to build and automatically deploy them (but not to a SQL Azure DB). We've found the source control, SSDT tooling support, and automated deployment options to be worth the effort of setting up and maintaining our project this way.
This SO question has some good notes for Azure deployment of a dacpac using SSDT:
How to publish DACPAC file to a SQL Server database project via SQLPackage.exe of SSDT?
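To make the dacpac suggestion concrete, publishing to an Azure SQL database from a CI step boils down to a single SqlPackage.exe invocation. Here is a minimal sketch wrapping it in a script; the paths, server, and database names are assumptions:

```python
# publish_dacpac.py - sketch: deploy an SSDT-built dacpac with SqlPackage.exe.
import os
import subprocess

subprocess.run(
    [
        "SqlPackage.exe",                             # assumes it is on PATH
        "/Action:Publish",
        r"/SourceFile:MyDB\bin\Release\MyDB.dacpac",  # output of the .sqlproj build
        "/TargetServerName:myserver.database.windows.net",
        "/TargetDatabaseName:MyAppDb",
        f"/TargetUser:{os.environ['SQL_USER']}",
        f"/TargetPassword:{os.environ['SQL_PASSWORD']}",
    ],
    check=True,
)
```

The Publish action diffs the dacpac against the target database and generates and applies the change script for you, which would replace the drop-and-recreate sequence described in the question.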

Flyway/Liquibase for Database Structure and DBUnit for Database Inserts?

I have the following scenario for my application:
1 Production Server
1 Test Server
n Development Computers
For database migration we use Hibernate Schema Update for the schema and DBUnit for filling in all the production data (on all servers/computers). When the schema update is done I generate a new DTD file for the new schema, so I can do a fresh import of the DBUnit XML. The application updates the database at startup with the XML file (only on development and test servers/computers!)
Of course this approach is not optimal and is fragile. So I looked at Liquibase and Flyway. Both seem to be great tools, but what I don't get is: how do I migrate the data? In my case, I dump the data of the production system once a week and add it to the application's source control as a DBUnit XML file, so all developers have "fresh" data and the test server has current production data, too.
The problem I see with Liquibase and Flyway is that there is no solution for automatically diffing the database data and generating the migration changes.
So my idea is the following:
Set Hibernate to validate instead of update.
When a STRUCTURAL database change is needed, I add it to the migration script for the major version
No database inserts are in the migration script.
Generate a new DTD for DBUnit based on the new database structure
Generate the DBUnit XML from the production database.
Another idea would be to utilize Flyway's JavaMigration and provide an initial database dump based on DBUnit. All other changes to database data would be handled in migration scripts. But the problem remains: how to diff the current migration script state against the production database state?
It would be awesome if anyone could provide me hints how to handle my scenario :)
If your goal is to use dumps of the PROD database in DEV and TEST environments, I would:
Configure the DB migration tool to run on application startup (both Flyway and Liquibase support this through their respective APIs)
Package all the DB structure migrations together with the app
Dump both data and structure from PROD
This way, when the PROD database is restored to DEV or TEST, the old metadata table of the migration tool is restored as well.
When the app starts, the migration tool will discover that the db structure is outdated and upgrade it to the newest version. Done.
No need to use DBUnit for this.
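The answer above refers to the Flyway/Liquibase APIs for running migrations at application startup. As a rough illustration of the same idea (sketched here by shelling out to the Flyway CLI; in a Java app you would call the Flyway API directly), the point is simply that the app ships its own migrations and applies whatever is pending before serving traffic. The environment variable names and migration directory are assumptions:

```python
# startup.py - sketch: apply pending Flyway migrations before the app starts.
import os
import subprocess

def migrate_on_startup() -> None:
    """Run 'flyway migrate' so a restored PROD dump is upgraded automatically.

    Because the dump also contains Flyway's history table, only migrations
    newer than the restored state are applied.
    """
    env = os.environ.copy()
    env["FLYWAY_URL"] = os.environ["DB_JDBC_URL"]        # hypothetical variable
    env["FLYWAY_USER"] = os.environ["DB_USER"]
    env["FLYWAY_PASSWORD"] = os.environ["DB_PASSWORD"]
    env["FLYWAY_LOCATIONS"] = "filesystem:./db/migration"
    subprocess.run(["flyway", "migrate"], env=env, check=True)

if __name__ == "__main__":
    migrate_on_startup()
    # ... start the application here ...
```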
The short answer is that all your changes would be done through Liquibase or Flyway.
We use Flyway, with the same prod/test/development setup.
We make all db changes (structure or metadata) using Flyway migration scripts, stored in source control. Each time we do a new deployment to an environment, we first run the migration scripts there (using either the command line tool or the maven plugin). The code first goes to development environment, gets integration tested there and keeps going to test and production.
The main thing to watch out for is that Flyway requires linear versioning of the migration files, so if two developers check in migrations at the same time, one of them will have to rename theirs.
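That renaming problem is easy to catch in CI before Flyway refuses to run. A small sketch of such a check, assuming only the standard V<version>__<description>.sql naming; the migration directory path is a placeholder:

```python
# check_versions.py - sketch: fail CI if two migrations share a version number.
import re
import sys
from collections import Counter
from pathlib import Path

MIGRATION_DIR = Path("src/main/resources/db/migration")  # hypothetical location

versions = []
for path in MIGRATION_DIR.glob("V*.sql"):
    match = re.match(r"V([\d_.]+)__", path.name)
    if match:
        # Flyway uses underscores in filenames where the version has dots (V1_2 -> 1.2).
        versions.append(match.group(1).replace("_", "."))

duplicates = [v for v, count in Counter(versions).items() if count > 1]
if duplicates:
    print(f"Duplicate migration versions: {', '.join(duplicates)}")
    sys.exit(1)
print("Migration versions are unique.")
```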
