There are multiple ways to pass values to an SSIS package: variables, package parameters, project parameters, or saving the values in a table and having SSIS pick them up from there. My usual approach is to use project parameters and map them to variables. When the project is deployed to the SSIS catalog, environments can be set up to override the parameter values per user requirements. I am now weighing the risks and ease of having the user set up an environment to pass parameter values versus setting up a table to hold the values and coding the SSIS package to read them. Please share your thoughts on the pros and cons of both approaches.
For example, assume we have an SSIS package that saves data to a CSV file, and the folder path where the CSV files must be saved varies by server (DEV/UA/Prod). Is it better to store the folder path in a table along with the server name, or to expose the folder path as a parameter and have the user who executes the package set its value in the environment at execution time, depending on the server?
Update on 23 Mar 2022 - Based on all the valuable inputs, I decided to use parameters and variables rather than a SQL table to pick up values.
In my experience variables are best served by using an Execute SQL task and returning the results to a variable. It's modular and means certain steps can easily be disabled if need be.
For managing connections (without outright hard-coding a connection string) I'd advise supplying the CSV file location via a parameter. A parameter can be modified in the deployment environment via SQL Server Agent and doesn't require changes to a source table. If I can avoid it, I never put file location information in a source table, as it makes the table less portable.
As mentioned in the official documentation:
Integration Services (SSIS) parameters allow you to assign values to properties within packages at the time of package execution.
Parameters were introduced in SQL Server 2012. They were added to avoid having to use variables and external configuration files to pass arguments at package execution time.
If the CSV file directory changes based on the package environment, parameters are the best fit for this situation, although the other options can still be used.
References
SSIS Parameters vs. Variables
There are several methods available, and each has its pros and cons.
Utilizing parameters, whether package- or project-level, is good for anything that needs to change regularly at execution time, since they can be changed from a script and the package started at the end of that same script. It also means whoever needs to execute the packages must have the appropriate permissions and knowledge.
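As a rough illustration of that pattern, assuming the package is deployed to the SSIS catalog (the folder, project, package, and parameter names below are placeholders), the calling script can create an execution, override a parameter for that run, and start it:

DECLARE @execution_id BIGINT;

-- Create an execution for the deployed package (names are hypothetical).
EXEC SSISDB.catalog.create_execution
    @folder_name = N'Finance',
    @project_name = N'Exports',
    @package_name = N'ExportToCsv.dtsx',
    @use32bitruntime = 0,
    @reference_id = NULL,
    @execution_id = @execution_id OUTPUT;

-- Override a parameter for this run only (object_type 20 = project parameter, 30 = package parameter).
EXEC SSISDB.catalog.set_execution_parameter_value
    @execution_id,
    @object_type = 30,
    @parameter_name = N'CsvFolderPath',
    @parameter_value = N'\\fileserver\exports\dev';

-- Start the execution.
EXEC SSISDB.catalog.start_execution @execution_id;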
Setting up environments is good for static content such as connection strings or email addresses for error alerts. It is possible to set up one master environment that other folders reference, or you can set one up for each folder. The downside is that the person deploying the package needs to know how the environments are used, and if they live outside the catalog folder, an extra step is required to map them in the SQL Server Agent job.
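A minimal sketch of that environment setup using the SSISDB catalog procedures; every folder, environment, variable, and parameter name here is a placeholder:

-- Create an environment and a variable in it.
EXEC SSISDB.catalog.create_environment
    @folder_name = N'Finance', @environment_name = N'PROD';

EXEC SSISDB.catalog.create_environment_variable
    @folder_name = N'Finance', @environment_name = N'PROD',
    @variable_name = N'CsvFolderPath', @data_type = N'String',
    @sensitive = 0, @value = N'\\fileserver\exports\prod';

-- Let the project reference the environment ('R' = relative, same folder).
DECLARE @reference_id BIGINT;
EXEC SSISDB.catalog.create_environment_reference
    @folder_name = N'Finance', @project_name = N'Exports',
    @environment_name = N'PROD', @reference_type = 'R',
    @reference_id = @reference_id OUTPUT;

-- Bind the project parameter to the environment variable ('R' = referenced value).
EXEC SSISDB.catalog.set_object_parameter_value
    @object_type = 20, @folder_name = N'Finance', @project_name = N'Exports',
    @parameter_name = N'CsvFolderPath', @parameter_value = N'CsvFolderPath',
    @value_type = 'R';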
My preferred method is to create one table that holds the information; as its first step, the package connects to that table and loads the values into variables. I have implemented this at my current position and it has become the standard on all packages. It allows some content to be defaulted, and we have separate tables for DEV, QA, and Prod, so the values are filled in as the packages are migrated. The table contains the package name, variable name, variable value, and audit columns to see when rows were added or updated. The table is temporal, so it tracks all changes.
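A rough sketch of such a table, assuming SQL Server 2016+ for system-versioning; all names here are placeholders of my own, not the original poster's:

CREATE TABLE dbo.PackageConfig
(
    PackageName   SYSNAME        NOT NULL,  -- or N'DEFAULT' for shared defaults
    VariableName  SYSNAME        NOT NULL,
    VariableValue NVARCHAR(4000) NOT NULL,
    ModifiedBy    SYSNAME        NOT NULL
        CONSTRAINT DF_PackageConfig_ModifiedBy DEFAULT SUSER_SNAME(),
    ValidFrom     DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo       DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo),
    CONSTRAINT PK_PackageConfig PRIMARY KEY (PackageName, VariableName)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.PackageConfigHistory));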
The packages execute a stored procedure that pivots the rows to return a single row. The pivot is dynamic, so it adds columns to the result set as needed. When a value is marked as the default it appears for all packages, but if the same variable name is also listed under a specific package name, that value is used instead of the default. For example, when testing I may want to send all error messages to my email instead of the group inbox, so I add a record of (My Package Name, Email_Alert, My Email Address). That shows as my email address during testing; when going to QA or Prod I do not include that record in those tables, so the default inbox is used.
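And a hedged sketch of the kind of pivoting procedure described, using the same placeholder table and a 'DEFAULT' package name as the convention for shared defaults (STRING_AGG assumes SQL Server 2017+):

CREATE PROCEDURE dbo.usp_GetPackageConfig
    @PackageName SYSNAME
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

    -- Column list: every variable name that applies to this package.
    SELECT @cols = STRING_AGG(QUOTENAME(VariableName), N',')
    FROM (SELECT DISTINCT VariableName
          FROM dbo.PackageConfig
          WHERE PackageName IN (@PackageName, N'DEFAULT')) AS v;

    -- Package-specific rows win over the defaults, then pivot to a single row.
    SET @sql = N'
        SELECT ' + @cols + N'
        FROM (
            SELECT VariableName, VariableValue
            FROM (
                SELECT VariableName, VariableValue,
                       ROW_NUMBER() OVER (PARTITION BY VariableName
                           ORDER BY CASE WHEN PackageName = @pkg THEN 0 ELSE 1 END) AS rn
                FROM dbo.PackageConfig
                WHERE PackageName IN (@pkg, N''DEFAULT'')
            ) ranked
            WHERE rn = 1
        ) src
        PIVOT (MAX(VariableValue) FOR VariableName IN (' + @cols + N')) p;';

    EXEC sys.sp_executesql @sql, N'@pkg SYSNAME', @pkg = @PackageName;
END;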
Utilizing the table also lets me build an SSRS report that shows the variables used by each package, and lets me change values as needed while keeping an audit log of who changed what value and when. This is useful when something needs to change for backdating or anything else: I can make the change, execute the package, and then change the value back. If the department is ever audited, I have a full audit trail I can provide in minutes rather than days. We have also implemented a rule that no values may be hard-coded into variables anymore; they must come from the table. Stored procedure names are saved in the table and passed to the package as well, so if we need to update a procedure we do not need to redeploy the package.
We try to build all SSIS packages so we can adjust to changes without needing to redeploy, as redeployment is when mistakes are most often made.
Related
I have a SQL Server database working with a .NET 2015 MVC 5 application. My database code is source controlled in an SSDT project. I use SqlPackage.exe to deploy the database to the staging environment using the .dacpac file created by the SSDT project build, run from a PowerShell task in the VSTS build.
This way I can make DB schema changes in a source-controlled way. Now the problem concerns master data insertion for the database.
I use a SQL script file that contains the data insertion statements and is executed as a post-deployment script. This file is also source controlled.
The problem is that we initially prepared the insertion script to target a sprint (taking sprint n as a base), which works well for the first release. But if we update some master data in the next sprint, how should the master data insert script be updated?
Add new update/insert queries at the end of the script file? In this case the post-deployment script will be executed by CI and will try to insert the data again and again in subsequent builds, which will eventually fail if we have made schema changes to the master tables of this database.
Update the existing insert queries in the data insertion script? In this case we also have trouble, because at the post-build event the whole data set will be re-inserted.
Maintain a separate data insertion script for each sprint and update the script reference to the new file in the SSDT post-build event? This approach takes manual effort and is error-prone, because the developer has to remember the process. The other problem is that if we need to set up one more database server in the distributed server farm, the multiple data insertion scripts will throw errors: SSDT has the latest schema and will create the database from it, but the older data scripts insert data against a previous schema (the sprint-wise schema that was changed in later sprints).
So can anyone suggest the best approach, one that needs less manual effort but can cover all the above cases?
Thanks
Rupendra
Make sure your pre- and post-deployment scripts are always idempotent. How you implement that is up to you, but the scripts should be able to be run any number of times and always produce correct results.
If your schema changes in a way that affects the deployment scripts, then updating the scripts is a dependency of that change and should accompany it in source control.
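For example (table, columns, and values are purely illustrative), an idempotent insert guards itself so that re-running the post-deployment script on every build is harmless:

-- Safe to run on every deployment: only inserts the row when it is missing.
IF NOT EXISTS (SELECT 1 FROM dbo.OrderStatus WHERE StatusCode = N'SHIPPED')
BEGIN
    INSERT INTO dbo.OrderStatus (StatusCode, Description)
    VALUES (N'SHIPPED', N'Order has left the warehouse');
END;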
Versioning of your database is already a built-in feature of SSDT: the project file itself has a node for the version, and there is a whole slew of free versioning build tasks in VSTS you can use to set it as well. When SqlPackage.exe publishes your project with the database version set, a record is updated in msdb.dbo.sysdac_instances. That is much easier than trying to manage and update your own home-grown version solution, and you're not cluttering up your application's database with tables and other objects unrelated to the application itself.
I agree with keeping sprint information out of the mix.
In our projects, I label source on successful builds with the build number, which of course creates a point in time marker in source that is linked to a specific build.
I would suggest using MERGE statements instead of plain inserts. This way you are protected from duplicate inserts within a sprint's scope.
The next question is how to distinguish inserts belonging to different sprints. I would suggest implementing version numbering to keep the database in sync with the sprints, so create a table DbVersion(version int).
Then, in the post-deployment script, do something like this:
DECLARE @currentVersion INT = 2;  -- the version this deployment brings the database to
DECLARE @version INT = (SELECT ISNULL(MAX(version), 0) FROM DbVersion);

IF @version < 1
BEGIN
    -- inserts/merges for sprint 1
END;
IF @version < 2
BEGIN
    -- inserts/merges for sprint 2
END;
-- ... and so on for later sprints
INSERT INTO DbVersion (version) VALUES (@currentVersion);
What I have done on most projects is to create MERGE scripts, one per table, that populate "master" or "static" data. There are tools such as https://github.com/readyroll/generate-sql-merge that can be used to help generate these scripts.
These get called from a post-deployment script rather than from a post-build action. I normally create a single post-deployment script for the project (you're only allowed one anyway!) and then include all the individual static data scripts using the :r syntax. A post-deploy script is just a .sql file with a build action of "PostDeploy"; it can be created manually or by using the "Add New Object" dialog in SSDT and selecting Script -> Post-Deployment Script.
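For illustration, with hypothetical file names, the single post-deployment script then just pulls the per-table static data scripts in with :r:

-- Script.PostDeployment.sql (Build Action: PostDeploy)
:r .\StaticData\Currency.data.sql
:r .\StaticData\Country.data.sql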
These files (including the post-deploy script) can then be versioned along with the rest of your source files; if you make a change to the table definition that requires a change in the merge statement that populates the data, then these changes can be committed together.
When you build the dacpac, all the master data will be included, and since you are using merge rather than insert, you are guaranteed that at the end of the deployment the contents of the tables will match the contents of your source control, just as SSDT/sqlpackage guarantees that the structure of your tables matches the structure of their definitions in source control.
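A minimal example of such a per-table MERGE, with a made-up dbo.Currency table standing in for one of the static data tables:

MERGE INTO dbo.Currency AS tgt
USING (VALUES
    (N'USD', N'US Dollar'),
    (N'EUR', N'Euro')
) AS src (Code, Name)
    ON tgt.Code = src.Code
WHEN MATCHED AND tgt.Name <> src.Name THEN
    UPDATE SET Name = src.Name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Code, Name) VALUES (src.Code, src.Name)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;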
I'm not clear on how the notion of a "sprint" comes into this, unless a "sprint" means a "release"; in this case the dacpac that is built and released at the end of the sprint will contain all the changes, both structural and "master data" added during the sprint. I think it's probably wise to keep the notion of a "sprint" well away from your source control!
I am using Visual Studio 2015 (Enterprise) and a DB project. The schema compare I have set up compares my local DB with my local files, which are under source control. This works fine for most files, but I have some SPs with SQLCMD variables that define the database name, and I'm having trouble editing these SPs in a sensible way.
Here's an example of a statement in the SP on environment 1:
SELECT T.SomeField1, T.SomeField2
FROM [LinkedServer].[Database1].dbo.SomeTable T
And that statement in the same SP residing on environment 2:
SELECT T.SomeField1, T.SomeField2
FROM [LinkedServer].[Database2].dbo.SomeTable T
(The only difference is Database1 vs Database2.)
And here's what the SP snippet looks like in the local file which is under source control:
SELECT T.SomeField1, T.SomeField2
FROM [LinkedServer].[$(MySqlcmdVariable)].dbo.SomeTable T
The DB Project defines $(MySqlcmdVariable) differently depending on the environment.
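(For reference, the variable is only substituted when the script is published or run in SQLCMD mode; in SSMS with SQLCMD Mode enabled, the snippet can be executed locally by supplying a value explicitly, for example:)

:setvar MySqlcmdVariable Database1

SELECT T.SomeField1, T.SomeField2
FROM [LinkedServer].[$(MySqlcmdVariable)].dbo.SomeTable T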
I've tried a few different things that don't quite work:
Editing the SP directly within Visual Studio. The problem is that I can't execute the SP because the SQLCMD variables aren't resolved until you Update (from file to DB) or Publish to the DB. This might work for minor changes, but for day-to-day development of SPs I find it cumbersome.
Update the SP in SSMS and merge. This would be my preferred method, but I can't figure out a good way to merge the SQL changes from the DB back into the source file. When I do my DB project compare, if the file is not modified the comparison shows no differences, as expected, despite the presence of the SQLCMD variables. But as soon as any change is made, the difference shows up along with every SQLCMD variable present in the SP! (I feel like this is a bug in VS.) Now I can't update from DB to source because the SQLCMD variables would get overwritten. Note that updating from source to DB does work, but the change I want to keep is in the DB, so I need to go the other way.
It would be nice if the compare tool had an editor so I could manually copy lines from one side to the other. Alternatively, it would be nice if I could use a custom compare tool, as I can when comparing files under source control; I can't figure out how to use my preferred compare tool for a DB project when one of the sides is the DB.
What I've resigned myself to for now is making changes in SSMS, then opening my compare tool, copying the SP to one side, comparing with the existing file, and manually merging in my changes. I'm convinced there has to be a better way, though.
We are just trying to implement SSDT in our project.
We have lots of clients for one of our products, which is built on a single database (DBDB) containing tables and stored procedures only.
We created one SSDT project for database DBDB (using VS 2012 > SQL Server Object Explorer > right-click the database > Create New Project).
Once we build that project it creates one .sql file.
Problem: if we run that file on a client's DBDB, it creates all the tables again and deletes all the records in them [this fulfils the schema requirement but wipes the existing records :-( ].
What we need: only the changes that are not yet present on the client's DBDB should be applied as updates.
Note: we have no direct access to the client's DBDB database to compare it with our latest DBDB. We can only send them some magic script file that will update their DBDB to the latest state.
The only way to update the client's DB is to compare the DB schemas and then apply the delta. Whichever way you do it, you will need some way to get hold of the schema that is running at the client:
If you ship a versioned product, it is easiest to deploy version N-1 to your development server and compare it to the version N you are going to ship. This way, SSDT can generate the migration script you need to send to the client to bring their DB up to the current schema.
If you don't have a versioned product, or your client might have altered the schema, you will need to find a way to extract the schema on site (maybe using SSDT there) and then let SSDT create the delta.
Option: you can skip the compare feature of SSDT altogether, but then you need to write the migration script yourself. For each modification to the schema, you write the DDL statements yourself and wrap them in IF clauses that check for the old state, so the changes are made only once and only if the old state exists. This way it doesn't really matter from which state to which state you are going, as the script determines for each step whether and what to do.
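For example, a hand-written step of such a migration script might look like this (table and column names are made up):

-- Add a column only if it is not already there, so the script is safe
-- to run against any older state of the database.
IF NOT EXISTS (
    SELECT 1
    FROM sys.columns
    WHERE object_id = OBJECT_ID(N'dbo.Customer')
      AND name = N'MiddleName'
)
BEGIN
    ALTER TABLE dbo.Customer ADD MiddleName NVARCHAR(50) NULL;
END;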
The last option is the most flexible, but it requires thorough testing of its own, and of course it should have been started well before the situation you are in now, where you no longer know what the changes have been. But it can help next time.
This only applies to schema changes on the tables, because you can always fall back to simply dropping and recreating ALL stored procedures, since nothing is lost by dropping them.
It sounds like you may not be pushing the changes correctly. You have a couple of options if you've built a SQL project:
Give them the dacpac and have them use SQLPackage to update their own database.
Generate an update script against your customer's "current" version and give that to them.
In any case, it sounds like your publish option might be set to drop and recreate the database each time. I've written quite a few articles on SSDT SQL Projects and getting started that might be helpful here: http://schottsql.blogspot.com/2013/10/all-ssdt-articles.html
Our firm does not have a dedicated DBA but has select developers performing DBA functions. We update our database often during a development cycle and have a release script with the various updates. We keep our DB schema and objects in a Database Project in Visual Studio.
However, we often encounter two stumbling blocks that cause time-intensive manual intervention:
Developers cannot always sync from the Database Project to their local database, because if we have added a NOT NULL field to an existing table that contains data, the VS deploy process isn't smart enough to automagically insert "test" data just to get the field into the table (unless this is a setting somewhere?). We would of course follow up, if possible, with a script to populate the field with real data, but we can't because the deployment fails.
Sometimes a developer will restore a backup from some random past date. There is no way of knowing exactly which DB updates were applied to this database, so they don't know which scripts to start applying. What we do in this case is check each script, chronologically, to see whether its changes have been applied to the database; if so, move on to the next script. Repeat.
One method we have discussed is creating a "Database Update Level" table in the database with one field and one row, which would maintain the level the database has been updated through. For example, when the first script is run, update the level to 2. In each DB script, we would wrap the statements in a check such as:
IF (SELECT Database_Update_Level FROM Database_Update_Level) < 2
BEGIN
    -- do some things here
    UPDATE Database_Update_Level SET Database_Update_Level = 2;
END
The DB scripts can then be run on any database, because the individual statements won't execute below a certain level.
This feels like we're missing something, because this must be a common problem for every development shop that allows developers to develop locally.
Any insights would be greatly appreciated.
Thanks.
Regarding the restore problem, I don't see many solutions; you might try to prevent full restores and run scripts to populate the tables instead. As for versioning structures, do you use SSDT (SQL Server Data Tools) in VS? You can generate DACPACs and generate diff scripts.
But what you are saying is that you also alter structures directly in the database? Is there no way to avoid that? If not, you could for example use DDL triggers (http://www.mssqltips.com/sqlservertip/2085/sql-server-ddl-triggers-to-track-all-database-changes/) to at least get notified that something changed.
One easy way to solve the NOT NULL problem is to establish default constraints (which could be just an empty string, the max number value for the data type, the max date value, etc.). When the publish occurs, the new column will be populated with the default value; see the sketch below.
For the second issue, I'd use post-deploy scripts in your SSDT project to keep the data in sync, using NOT EXISTS checks to make incremental changes. That way you can simply publish the database and allow the data updates to occur one after another.
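A minimal sketch of that approach, with made-up table, column, and constraint names:

-- Adding a NOT NULL column to a populated table; the default fills existing rows
-- so the SSDT publish can succeed.
ALTER TABLE dbo.Customer
    ADD Region NVARCHAR(20) NOT NULL
        CONSTRAINT DF_Customer_Region DEFAULT (N'');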
After reviewing all the different options I am still confused.
Here is the scenario. We have multiple databases on the same server, and we would like a single SSIS job to handle imports into (or exports from) a table from (or to) a file. We call this from VB.NET and the job runs under SSIS on the server. We don't have xp_cmdshell available.
We need to pass the job unique job information (it is possible that two people could be running the same job against the same DB, or against different DBs on the same server), the database connection information (this cannot be stored and selected in the job, as DBs may be added or removed as needed and we don't want to reconfigure the job), and the file name/path (on the server or a permitted UNC path available to SSIS).
We have looked at the option of declaring the job and job steps and then directly executing the job. We like this idea in that the jobs would be unique, and the SQL proc the job calls could report issues back to a common log table by job id, which would then be available for review.
What I don't really follow is how to pass the information that this job needs.
In http://code.msdn.microsoft.com/Calling-a-SSIS-Package-a35afefb I see them passing parameters using the SET command, but I get confused by the explanation that things are processed twice. Also, in that example, would I be changing the master DB reference to my DB in the Add Job Step?
My issue is that no example is really a clean, simple case of passing parameters and changing DBs; many use different options, like a list of DBs to process from a data source, and none really cleanly shows me what to do with a variable that will be passed down to a called stored procedure.
I don't have time to delve deep and experiment; I need to see how it is done, as I am trying to understand it one level back so I know how we can utilize it and fit in the information we need (i.e., what connection information I need in order to assign it dynamically), and where in the grand scheme that information comes from. (We don't store that in the DB doing the work; we have a repository in a central DB for that, but I don't know exactly what I need to store!)
Brian
Parameters that are dynamic to a single run of a job can be passed in to the SSIS package through a config table. The process that starts the job sets any necessary parameters in the config table before starting the job. The job kicks off the SSIS package, which has a connection manager to read the values out of the config table and into parameter values within the SSIS package.
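A rough sketch of that flow; the table, columns, and job name below are hypothetical, not from the original post:

-- Stage the run-specific values; a status flag lets the package claim its
-- pending row, so concurrent callers don't collide.
INSERT INTO dbo.SsisJobConfig (TargetDatabase, ImportFilePath, Status)
VALUES (N'CustomerDb17', N'\\fileserver\imports\batch42.csv', N'Pending');

-- Kick off the agent job that runs the SSIS package.
EXEC msdb.dbo.sp_start_job @job_name = N'ImportCsvJob';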
You mentioned that you have database connection information; if you choose to pass in parameters through a table, keep in mind that storing SQL login information in a database is bad practice. The connection manager in the SSIS package should use Windows authentication, and any permissions the SSIS package needs can be granted to the SQL Agent service account.
From what I understand, you want to run a package (or packages) via a SQL Agent job, and the database it will run against is subject to change.
As supergrady says, you can pass in specific parameters to the package through a config table.
What I did was to create a config table and add a status column (a bit that indicates on/off, true/false). This allows me to run a SQL script that sets the status for the specific databases I want and turns off those I don't. For me this is easier than opening up the job and fiddling with the command-line values, which is another way of getting what you want. I hope this helps.