How to handle code migration when working with Snowflake? - snowflake-cloud-data-platform

I am trying to understand the best way to migrate code when working with Snowflake. There are two scenarios, one where we have only one snowflake account and that houses all environments (dev, test, prod). And the other has two accounts (non-prod, prod). With the second option, I was planning to create a separate script that will set up the correct database name which is based on environment (dev_ent_dw, prod_ent_dw) and then I will refer these as variables when creating objects. Example -
set env = ‘dev’;
set db = $env || ‘_ent_dw.’;
Right now we’re running everything manually so the devops team will run these upfront before running ddl scripts. We may do something similar with the former scenario but I am wondering if folks can share best practices of dealing with this as I am sure it would be common topic at large enterprises.

We have different accounts for each environment (dev, qa, prod). We use Azure DevOps for change management within our team, including the Git repos and Azure Pipelines for deploying scripts via schemachange.
We do NOT append an environment to objects, as that is handled by the account.
Developers write migration scripts and check them into source control. We then create Version folders and move/rename the migration scripts for deployment, and run a pipeline to execute the changes. The only thing we need to change is the URL to deploy against, and that is handled within the pipeline itself. The nice thing here is that we do not need to tweak anything going from different branches in source control.
We have not been doing much with creating clones for development work, as we usually only have one developer working on changes to a set of objects at a time. We are exploring ways to improve our process, but what we have works fairly well for our current needs.

Related

Can SSDT maintain different environment users and permissions in a single project?

We use different service accounts for the different deployment environments, so Dev has Account_A and Prod has Account_B, and any test app using Account_A will not have access to Prod. Or, as another example, Account_A can have read/write permissions in Dev, but only read permissions in Prod.
Up until now there has been no source control on the database definitions, just manual scripts everywhere, and I'd like to create a SSDT solution in Azure DevOps for this. I understand how you can set up releases to handle different database names across environments (Db_Dev vs Db_Prod, for example), but I'm not able to find anything about different users & permissions across environments.
Is this possible in SSDT? As far as I can tell, I have 2 options, but I'm hoping there's a better way:
Handle users and permissions outside of source control
Handle them somehow in a post-deployment script.
Caveat: I'm only talking about Windows Authentication users & groups. Passwords will obviously not be going into source control.
I wrote about this a long time ago here: https://schottsql.com/2013/05/14/ssdt-setting-different-permissions-per-environment/
You really are dealing with environment variables and a bunch of post-deploy scripts in order to do this. Your better option is to assign permissions to database roles so those are all consistent, then assign your users to those roles in each environment as appropriate - outside of SSDT. It's a lot less painful than trying to create/maintain logins and users in a series of post-deploy scripts in the long run.
On DevOps side, an environment is a collection of resources, such as Kubernetes clusters and virtual machines, that can be targeted by deployments from a pipeline. Typical examples of environment names are Dev, Test, QA, Staging, and Production. You can secure environments by specifying which users and pipelines are allowed to target an environment.
You can control who can create, view, use, and manage the environments
with user permissions. There are four roles - Creator (scope: all
environments), Reader, User, and Administrator. In the specific
environment's user permissions panel, you can set the permissions that
are inherited and you can override the roles for each environment.
More details, check the following link:
https://learn.microsoft.com/en-us/azure/devops/pipelines/process/environments?view=azure-devops#security
Managing users and permissions in SSDT is a bit pain in the 🍑 and usually they are not maintained there. However you still have options:
Create post-script as you mentioned (bad choice)
Create separate projects for each environment
What I mean by separate project is: you need to create separate project for each environment, then add reference of your main project (where all of your objects exist) and set reference type as "the same database". Then in that project you'll add all needed users/permissions/modifications. These projects will have their own publish profiles as well. 1 issue you might face as well is that his project should reference ALL databases/dacpacs as your main project as well.

When using Continuous or Automated Deployment, how do you deploy databases?

I'm looking at implementing Team City and Octopus Deploy for CI and Deployment on demand. However, database deployment is going to be tricky as many are old .net applications with messy databases.
Redgate seems to have a nice plug-in for Team City, but the price will probably be stumbling block
What do you use? I'm happy to execute scripts, but it's the comparison aspect (i.e. what has changed) I'm struggling with.
We utilize a free tool called RoundhousE for handling database changes with our project, and it was rather easy to use it with Octopus Deploy.
We created a new project in our solution called DatabaseMigration, included the RoundhousE exe in the project, a folder where we keep the db change scripts for RoundhousE, and then took advantage of how Octopus can call powershell scripts before, during, and after deployment (PreDeploy.ps1, Deploy.ps1, and PostDeploy.ps1 respectively) and added a Deploy.ps1 to the project as well with the following in it:
$roundhouse_exe_path = ".\rh.exe"
$scripts_dir = ".\Databases\DatabaseName"
$roundhouse_output_dir = ".\output"
if ($OctopusParameters) {
$env = $OctopusParameters["RoundhousE.ENV"]
$db_server = $OctopusParameters["SqlServerInstance"]
$db_name = $OctopusParameters["DatabaseName"]
} else {
$env="LOCAL"
$db_server = ".\SqlExpress"
$db_name = "DatabaseName"
}
&$roundhouse_exe_path -s $db_server -d $db_name -f $scripts_dir --env $env --silent -o > $roundhouse_output_dir
In there you can see where we check for any octopus variables (parameters) that are passed in when Octopus runs the deploy script, otherwise we have some default values we use, and then we simply call the RoundhousE executable.
Then you just need to have that project as part of what gets packaged for Octopus, and then add a step in Octopus to deploy that package and it will execute that as part of each deployment.
We've looked at the RedGate solution and pretty much reached the same conclusion you have, unfortunately it's the cost that is putting us off that route.
The only things I can think of are to generate version controlled DB migration scripts based upon your existing database, and then execute these as part of your build process. If you're looking at .NET projects in future (that don't use a CMS), could potentially consider using entity framework code first migrations.
I remember looking into this a while back, and for me it seems that there's a whole lot of trust you'd have to get put into this sort of process, as auto-deploying to a Development or Testing server isn't so bad, as the data is probably replaceable... But the idea of auto-updating a UAT or Production server might send the willies up the backs of an Operations team, who might be responsible for the database, or at least restoring it if it wasn't quite right.
Having said that, I do think its the way to go, though, as its far too easy to be scared of database deployment scripts, and that's when things get forgotten or missed.
I seem to remember looking at using Red Gate's SQL Compare and SQL Data Compare tools, as (I think) there was a command-line way into it, which would work well with scripted deployment processes, like Team City, CruiseControl.Net, etc.
The risk and complexity comes in more when using relational databases. In a NoSQL database where everything is "document" I guess continuous deployment is not such a concern. Some objects will have the "old" data structure till they are updated via the newly released code. In this situation your code would need to be able to support different data structures potentially. Missing properties or those with a different type should probably be covered in a well written, defensively coded application anyway.
I can see the risk in running scripts against the production database, however the point of CI and Continuous Delivery is that these scripts will be run and tested in other environments first to iron out any "gotchas" :-)
This doesn't reduce the amount of finger crossing and wincing when you actually push the button to deploy though!
Having database deploy automation is a real challenge especially when trying to perform the build once deploy many approach as being done to native application code.
In the build once deploy many, you compile the code and creates binaries and then copy them within the environments. From the database point of view, is the equivalent to generate the scripts once and execute them in all environments. This approach doesn't handle merges from different branches, out-of-process changes (critical fix in production) etc…
What I know works for database deployment automation (disclaimer - I'm working at DBmaestro) as I hear this from my customers is using the build and deploy on demand approach. With this method you build the database delta script as part of the deploy (execute) process. Using base-line aware analysis the solution knows if to generate the deploy script for the change or protect the target and not revert it or pause and allow you to merge changes and resolve the conflict.
Consider a simple solution we have tried successfully at this thread - How to continuously delivery SQL-based app?
Disclaimer - I work at CloudMunch
We using Octopus Deploy and database projects in visual studio solution.
Build agent creates a nuget packages using octopack with a dacpac file and publish profiles inside and pushes it onto NuGet server.
Then release process utilizes the SqlPackage.exe utility to generate the update script for the release environment and adds it as an artifact to the release.
Previously created script executed in the next step with SQLCMD.exe utility.
This separation of create and execute steps gives us a possibility to have a manual step in between, so that someone verifies before the script is executed on Live environment, not to mention, that script saved as an artifact in the release can always be referred to, at any later point.
Would there be a demand I would provide more details and step scripts.

Twist to the standard “SQL database change workflow best practices”

Twist to the standard “SQL database change workflow best practices”
Background
ASP.NET/C# Web App
MS SQL
Environments
Production
UAT
Test
Dev
We create patch scripts (XML and sql) that are source controlled in Mercurial. We have cmd line utility that installs patches to DB (utitlity.exe install –patch) from a Release folder the build packages. Patches have meta data that helps with when patch should run and we log patches installed in a table in the target DB. All these were covered in the 3 year old question:
SQL Server database change workflow best practices
Our Problem/Twist
I think this works well for tables, views, functions and stored procedures. We struggle with application configuration data. Here are some touch points on application configurations.
New client. BA performs system study and fit analysis. Out of this comes a configuration word document of what application configurations need to be setup. Note some of these may also come in phases over time. We need to get these new configurations into the system for the developer and client UAT.
Developer works on feature request or bug fix. A new configuration change comes out of that change. The configuration needs to make it into the system for testing and promotion to UAT and up.
QA finds that the developer missed an associated configuration change. That configuration needs to make it into the system for promotion to UAT and up.
Build goes to UAT. Client performs acceptance testing but find they really want to change another unassociated configuration and have it promoted with the changes. In other words they found they want to change a business process by a configuration. The configuration needs to make it into the system for promotion to PRD.
As the client operates in PRD they may tweak application settings. These configurations need to make it into the system for future development and testing.
The general issue is making sure we are accounting for all the configurations and accidently not miss any during promotions which causes grief.
Our Attempts At A Process
a. We have had member of the QA team to write patches (xml and sql) and check those in. This requires a build to make sure those get into the package. With this approach it really just took care of item 1 above and we fell apart on the other items. The nice thing is for the items that made it into the patches it was just an install with the utility.
b. A developer threw together a Config page on the application. All the configurations could be uploaded and downloaded via XML document but it requires the app to be running. For item 1, member of QA team would manually setup configurations in the application and then would download the Config.xml file. This XML file would be used to upload configurations in other environments. We would use text diff tool to look at differences between config.xml files from different environments. This addressed item 1 and the others items but had problems. Problems were not all configurations made it into the XML document (just needs to be fixed by developer), some of the configurations didn’t have a UI in the application so you still had to manually go to the database on some, comparing the XML document with text diff was difficult at time (looked mostly due to sorting but I’m sure there are other issues), XML was not very human readable and finally the XML document did not allow for deleting existing incorrect or outdated configs.
c. Recently we went with option B, but over time for a new client we just started manually tracking configs and promoting them manually by hand (UI and DB) through the promotions. Needless to say lots of human errors.
So we have been looking at solutions. Eventually it would be great to get as much automation in as possible. I’m looking at going with the scripting approach and just focusing on process, documentation and looking at using Redgate data compare in addition to what we had been doing with compare on config.xml. With Redgate we have to create views though and there is no way to create update scripts from that approach except to manually update the scripts. It does at least allow a comparison without the app running. I’m also looking at pulling out the configs from our normal patches and making it a system independent of the build (utility.exe –patch –config). When I say focus on process it will be things like if we compare and find a config change either reported by client or not, we still script it, just means we have to have a process in place to quickly revalidate config install before promoting to the next level. As for documentation looking at making the original QA document a living document instead of just an upfront document. The goal is to try and enhance clarity and reduce missing configurations during promotion. Unfortunately it doesn’t improve speed of delivery.
Does anyone have any recommendations or best practices to pass along. Thanks.
Can I ask exactly what you mean by application configuration. I'm interpreting that as both:
Config files in the web application
Static reference data inside the database
Full disclosure I work for Red Gate. You might be interested in taking a look at Deployment Manager, it's a deployment tool that deploys applications, databases and configuration. It's free for up to 5 projects and target servers.
The approach it uses is to package application code and the database state into packages. These packages can be deployed into dev, test, staging and production environments. The same package is deployed to each environment.
Any application configuration that needs to change between environments is handled in one of the ways below:
Variable substitution in web.config. The tool allows you to specify override values for variables in these files, and set these per environment/server
Substituting the web.config file per environment.
Custom powershell scripts that are run pre/post deploy. You could use these to execute custom SQL based on the environment or server.
Static data within the database, using SQL Source Control's static
data feature. I've written a blog post about how to supply
different sets of static data to different environments/customers.
This allows you to source control the application configurations and deploy them to different environments.

Auto-deploy Zend Framework 2 application + Database schema + Actual data

Background:
I am using GitHub to store a ZF2 application.
The database schema + the actual data stored inside the schema are not being stored inside a version control. At the moment I am in development mode, so I have some database dump scripts that I load into the database when I need to. I also tweak entries in the database via phpMyAdmin when I need ongoing granular control for immediate testing purposes. I am also looking into using Doctrire ORM, so my schema will be part of my code via Annotations, and that will be checked into GitHub. Doctrine ORM will generate the actual schema for me, although it is still a separate step in the deployment process. The actual data however, will still be outside of the application and outside of the repository and currently has to be dealt with separately and is not automated.
Goal:
I want to be able to deploy ZF2 application and the database schema, and the data onto Zend Server and have it "just work" in the most automated, least manual way possible.
Question:
What is a recommended, best practice way to deploy every aspect of ZF2 application in the most automated, least manual way possible and have it "just work"? Let's focus on the Development and Testing mode here, as in Production it may be good to have separate deployment steps to protect against accidental live data overwrites.
You can try Phing (http://www.phing.info/) for deploying your PHP application, adjusting directory permissions, running database migrations, running unit tests, etc. I used Phing in couple of my projects with great success.

Integration tests in Continuous Integration environment: Database and filesystem state

I'm trying to implement automated integration tests for my application. It's a very complex monster. You could say that its database and part of the filesystem are part of its state, because it saves image files in the hard drive, and references to those in the DB. The software needs all those, in a coherent state, to work properly.
Back to writing tests: To run any relevant test, I need some image files in the filesystem, and certain records filled in the database. I thought of putting all of these in a separate folder called TestEnvironmentData in the repository, and retrieving them from the Continuous Integration Server (Team City), but a colleague said the repo is quite full as it is, and that I should set up a special directory, and databases, only in the Continuous Integration server. I don't like that because the tests success depend on me manually mantaining stuff in the server, and restoring initial state before every test becomes cumbersome.
What do you guys do when you need to write integration tests for an app like this? The main goal is having an automated test harness to approach a large scale refactoring. There's lots of spaghetti code and the app's current architecture is hardly unit testable, that's why I decided on integration tests first.
Any alternative approach is welcome.
Developer Repeatability is key when setting up a Continous Integrations Server. I have set one up for my last three employers and I have found the key to success is the developers being able to run the same tests from their dev system in order to get the same results as the CI Server.
The easiest way to do this would be to check in the test artifacts into source control but you could also use dropbox or a Network Share that you copy them from in one of the build steps.
For a .Net solution I have always used MsBuild as you can most easily replicate the build process of Visual Studio and get the same binaries/deployables. As for keeping your database in sync so that tests can be repeatable in the past I used the MbUnit test framework and the [Rollback] attribute as it would roll back any changes to Sql Server that happened in the test. I believe that Nunit now has this attribute as well.
The CI server is great for finding code that breaks existing functionality but unless developers can reproduce the error on their machine they won't trust the CI server for some time.
First of all, we use Maven to build our code. It's like ant, but it relies on convention instead of configuration for many things, like Ruby On Rails does. One of those conventions is a standardized directory structure:
(project)----src----main----(language)
| | \--resources
| \--test----(language)
| \--resources
\--target---...
Using a directory structure like this makes it easy to keep your application resources and testing resources near each other, yet still be able to build for test or build for production, or just build both but just package up the application parts after running the tests.
As far as resetting the database between tests, how you do that is greatly dependent on the DBMS you're using. For instance, if you're using MySQL it's very easy to get the test data the way you want and do a mysqldump to a file you then load before the test. With other DBMSs you may have to drop and recreate the tables and reload the data, or make separate tables for the starting point and use a CREATE/SELECT sql statement to duplicate it each time.
There really is no reliable way around the "reset the database between tests" step.

Resources