When publishing a dacpac with sqlpackage.exe, it runs Schema Compare first, followed by pre-deployment scripts. This causes a problem when, for instance, you need to drop a table or rename a column. Schema Compare was done before the object was modified and the deployment fails. Publish must be repeated to take the new schema into account.
Anyone have a work-around for this that does not involve publishing twice?
Gert Drapers called it as pre-pre-deployment script here
Actually it is a challenge. If you need to add non-nullable and foreign key column to a table full of data - you can do with a separate script only.
If you are the only developer - that is not a problem, but when you have a large team that "separate script" has to be somehow executed before every DB publish.
The workaround we used:
Create separate SQL "Before-publish" script (in DB project) which has a property [Build action = None]
Create custom MSBuild Task where to call SQLCMD.EXE utility passing "Before-publish" script as a parameter, and then to call SQLPACKAGE.EXE utility passing DB.dacpac
Add a call of the custom MSBuild Task to db.sqlproj file. For example:
<UsingTask
TaskName="MSBuild.MsSql.DeployTask"
AssemblyFile="$(MSBuildProjectDirectory)\Deploy\MsBuild.MsSql.DeployTask.dll" />
<Target Name="AfterBuild">
<DeployTask
Configuration="$(Configuration)"
DeployConfigPath="$(MSBuildProjectDirectory)\Deploy\Deploy.config"
ProjectDirectory="$(MSBuildProjectDirectory)"
OutputDirectory="$(OutputPath)"
DacVersion="$(DacVersion)">
</DeployTask>
</Target>
MsBuild.MsSql.DeployTask.dll above is that custom MSBuild Task.
Thus the "Before-publish" script could be called from Visual Studio.
For CI we used a batch file (*.bat) where the same two utilities (SQLCMD.EXE & SQLPACKAGE.EXE) were called.
The final process we've got is a little bit complicated and should be described in a separate article - here I mentioned a direction only :)
Move from using visual studio to using scripts that drive sqlpackage.exe and you have the flexibility to run scripts before the compare:
https://the.agilesql.club/Blog/Ed-Elliott/Pre-Deploy-Scripts-In-SSDT-When-Are-They-Run
ed
We faced a situation when we need to transform data from one table into other during deployment of the database project. Of course it is a problem to do using the DB project due to in the pre-deployment the destination table (column) still doesn't exist but in post-deployment script the source table (column) is already absent.
To transform data from TableA to TableB we used the following idea (This approach can be used for any data modifications):
Developer adds destination table (dbo.TableB) into the DB project and deploys it onto the local DB (without committing to a SVN)
He or she creates a pre-deployment transformation script. The trick is that the script put the result data into a temporary table: #TableB
Developer deletes the dbo.TableA in the DB project. It is assumed that the table will be deleted during execution of the main generated script.
Developer writes a post-deployment script that copies data form #TableB to dbo.TableB that was just created by the main script.
All of the changes are committed into the SVN.
This way we don't need the pre-pre-deployment script due to we store the intermediate data in the temporary table.
I'd like to say that the approach that uses the pre-pre-deployment script had the same intermediate (temporary) data, however it is stored not in temporary tables but in real tables. It happens between pre-pre-deployment and pre-deployment. After execution of pre-deployment script this intermediate data disappears.
What is more, the approach with using temporary tables allows us to face the following complicated but real situation: Imagine that we have two transformations in our DB project:
TableA -> TableB
TableB -> TableC
Apart from that we have two databases:
DatabaeA that have the TableA
DatabaeB where the TableA was already transformed into the TableB. The TableA is absent in the DatabaseB.
Nonetheless we can deal this situation. We need just one new action in the pre-deployment. Before the transformation we try to copy data form the dbo.TableA into #TableA. And the transformation script works with temporary tables only.
Let me show you how this idea works in DatabaseA and DatabaseB.
It is assumed that the DB project has two couples of the pre and post deployment scripts: "TableA -> TableB" and "TableB -> TableC".
Below is the example of the scripts for "TableB -> TableC" transformation.
Pre-deployment script
----[The data preparation block]---
--We must prepare to possible transformation
--The condition should verufy the existance of necessary columns
IF OBJECT_ID('dbo.TableB') IS NOT NULL AND
OBJECT_ID('tempdb..#TableB') IS NULL
BEGIN
CREATE TABLE #TableB
(
[Id] INT NOT NULL PRIMARY KEY,
[Value1] VARCHAR(50) NULL,
[Value2] VARCHAR(50) NULL
)
INSERT INTO [#TableB]
SELECT [Id], [Value1], [Value2]
FROM dbo.TableB
END
----[The data transformation block]---
--The condition of the transformation start
--It is very important. It must be as strict as posible to ward off wrong executions.
--The condition should verufy the existance of necessary columns
--Note that the condition and the transformation must use the #TableA instead of dbo.TableA
IF OBJECT_ID('tempdb..#TableB') IS NOT NULL
BEGIN
CREATE TABLE [#TableC]
(
[Id] INT NOT NULL PRIMARY KEY,
[Value] VARCHAR(50) NULL
)
--Data transformation. The source and destimation tables must be temporary tables.
INSERT INTO [#TableC]
SELECT [Id], Value1 + ' '+ Value2 as Value
FROM [#TableB]
END
Post-deployment script
--Here must be a strict condition to ward of a failure
--Checking of the existance of fields is a good idea
IF OBJECT_ID('dbo.TableC') IS NOT NULL AND
OBJECT_ID('tempdb..#TableC') IS NOT NULL
BEGIN
INSERT INTO [TableC]
SELECT [Id], [Value]
FROM [#TableC]
END
In the DatabaseA the pre-deployment script has already created the #TableA. Therefore the data preparation block won't be executed due to there is no dbo.TableB in the database.
However the data transformation will be executed because there is the #TableA in the database that was created by the transformation block of the "TableA -> TableB".
In the DatabaseB the data preparation and transformation blocks for the "TableA -> TableB" script won't be executed. However we already have the the transformed data in the dbo.TableB. Hence the the data preparation and transformation blocks for the "TableB -> TableC" will be executed without any problem.
I use the below work around in such scenarios
If you would like to drop a table
Retain the table within the dacpac (Under Tables folder).
Create a post deployment script to drop the table.
If you would like to drop a column
Retain the column in the table definition within dacpac (Under Tables folder).
Create a post deployment script to drop the column.
This way you can drop tables and columns from your database and whenever you make the next deployment ( may be after few days or even months) exclude that table/columns from dacpac so that dacpac is updated with the latest schema.
Related
On the ETL server I have a DW user table.
On the prod OLTP server I have the sales database. I want to pull the sales only for users that are present in the user table on the ETL server.
Presently I am using an execute SQL task to fetch the DW users into a SSIS System.Object variable. Then using a for each loop to loop through each item (userid) in this variable and via a data flow task fetch the OLTP sales table for each user and dump it into the DW staging table. The for each is taking long time to run.
I want to be able to do an inner join so that the response is quicker, but I cant do this since they are on separate servers. Neither can I use a global temp table to make the inner join, for the same reason.
I tried to collect the DW users into a comma separated string variable and then using it (via string_split) to query into OLTP, but this is also taking more time at the pre-execute phase (not sure why exactly) even for small number of users.
I also am aware of lookup transform but that too will result in all oltp rows to be brought into the dw etl server to test the lookup condition.
Is there any alternate approach to be able to do an inner join by taking the list of users into the source?
Note: I do not have write permissions on the OLTP db.
Based on the comments, I think we can use a temporary table to solve this.
Can you help me understand this restriction? "Neither can I use a global temp table to make the inner join, for the same reason."
The restriction is since oltp server and dw server are separate so can't have global temp table common to both servers. Hope makes sense.
The general pattern we're going to do is
Execute SQL Task to create a temporary table on the OLTP server
A Data Flow task to populate the new temporary table. Source = DW. Destination = OLTP. Ensure Delay Validation = True
Modify existing Data Flow. Modify source to be a query that uses the temporary table i.e. SELECT S.* FROM oltp.sales AS S WHERE EXISTS (SELECT * FROM #SalesPerson AS SP WHERE SP.UserId = S.UserId); Ensure Delay Validation = True
A long form answer on using temporary tables (global to set the metadata, regular thereafter)
I don't use temp table in SSIS
Temporary tables, live in tempdb. Your OLTP and DW connection managers likely do not point to tempdb. To be able to reference a temporary table, local or global, in SSIS you need to either define an additional connection manager for the same server that points explicitly at tempdb so you can use the drop down in the source/destination components (technically accurate but dumb). Or, you use an SSIS Variable to hold the name of the table and use the ~From Variable~ named option in source/destination component (best option, maximum flexibility).
Soup to nuts example
I will use WideWorldImporters as my OLTP system and WideWorldImportersDW as my DW system.
One-time task
Open SQL Server Management Studio, SSMS, and connect to your OLTP system. Define a global temporary table with a unique name and the expected structure. Leave your connection open so the table structure remains intact during initial development.
I used the following statement.
DROP TABLE IF EXISTS #SO_70530036;
CREATE TABLE #SO_70530036(EmployeeId int NOT NULL);
Keep track of your query because we'll use it later on but as I advocate in my SSIS answers, perform the smallest task, test that it works and then go on to the next. It's the only way to debug.
Connection Managers
Define two OLE DB Connection Managers. WWI_DW uses points to the named instance DEV2019UTF8 and WWI_OLTP points to DEV2019EXPRESS. Right click on WWI_OLTP and select Properties. Find the property RetainSameConnection and flip that from the default of False to True. This ensures the same connection is used throughout the package. As temporary tables go out of scope when the connection goes away, closing and reopening a connection in a package will result in a fatal error.
These two databases on different instances so we can't cheat and directly comingle data.
Variables
Define 4 variables in SSIS, all of type String.
TempTableName - I used a value of ##SO_70530036 but use whatever value you specified in the One-time task section.
QuerySourceEmployees - This will be the query you run to generate the candidate set of data to go into the temporary table. I used SELECT TOP (3) E.[WWI Employee ID] AS EmployeeId FROM Dimension.Employee AS E WHERE E.[Is SalesPerson] = CAST(1 AS bit);
QueryDefineTables - Remember the drop/create statements from the on-time task? We're going to use the essence of them but use the expression builder to let us dynamically swap the table name. I clicked the ellipses, ..., on the Expression section and used the following "DROP TABLE IF EXISTS " + #[User::TempTableName] + "; CREATE TABLE " + #[User::TempTableName] + "( EmployeeId int NOT NULL);" You should be able to copy the Value from the row and paste it into SSMS to confirm it works.
QuerySales - This is the actual query you're going to use to pull your filtered set of sales data. Again, we'll use the Expression to allow us to dynamically reference the temporary table name. The prettified version of the expression would look something like
"SELECT
SI.InvoiceID
, SI.SalespersonPersonID
, SO.OrderID
, SOL.StockItemID
, SOL.Quantity
, SOL.OrderLineID
FROM
Sales.Invoices AS SI
INNER JOIN
Sales.Orders AS SO
ON SO.OrderID = SI.OrderID
INNER JOIN
Sales.OrderLines AS SOL
ON SO.OrderID = SOL.OrderID
WHERE
EXISTS (SELECT * FROM " + #[User::TempTableName] + " AS TT WHERE TT.EmployeeID = SI.SalespersonPersonID);"
Again, you should be able to pull the Value from the three queries and run them independently and verify they work.
Execute SQL Task
Add an Execute SQL task to the Control Flow. I named mine SQL Create temporary table My Connection Manager is WWI_OLTP and I changed the SQLSourceType to Variable and the SourceVariable is User::QueryDefineTables
Every time your package runs, the first thing it will do is establish create the temporary table. Which is good because SSIS is a metadata driven ETL engine and the next two steps would fail if the table didn't exist.
Data Flow Task - Prime the pump
This data flow is where we'll transfer DW data back to the OLTP system so can filter in the source system.
Drag a Data Flow Task onto the Control Flow. I named mine DFT Load Temp and before you click into it, right click on the Task and find the DelayValidation property and change this from the default of False to True. Normally, a package validates all metadata before actual execution begins as the idea is you want to know everything is good before any data starts moving. Since we're using temporary tables, we need to tell the execution engine "trust us, it'll be ready"
Double click inside the Data Flow Task.
Add an OLE DB Source. I named mine OLESRC SourceEmployees I use the connection manager WWI_DW. My data access mode changes to SQL command from variable and then I select my variable User::QuerySourceEmployees
Add an OLE DB Destination. I named mine OLEDST TempTableName and double clicked to configure it. The Connection Manager is WWI_OLTP and again, since the table lives in tempdb, we can't select it from the drop down. Change the Data access mode to Table name or view name variable - fast load and then select your variable name User::TempTableName. Click the Mapping tab and ensure source columns map to destination columns.
Data Flow Task - Transfer data
Finally, we will pull our source data, nicely filtered against the data from our target system.
Add an OLE DB Source. I named it OLESRC QuerySales. The Connection Manager is WWI_OLTP. Data access mode again changes to SQL command from variable and the variable name is User::QuerySales
From here, do whatever else you need to do to make the magic happen.
Instead of having 270k rows with an unfiltered query
I have 67k as there are only 3 employees in the temporary table.
Reference package
But wait, there's more!
Close out visual studio, open it back up and try to touch something in the data flows. Suddenly, there are red Xs everywhere! Any time you close a data flow component, it fires a revalidate metadata operation and guess what, it can't do that as the connection to the temporary table is gone.
The package will run fine, it will not throw VS_NEEDSNEWMETADATA but editing/maintenance becomes a pain.
If you switched from global temporary table to local, switch the table name variable's value back to a global and then run the define statement in SSMS. Once that's done, then you can continue editing the package.
I assure you, the local temporary table does work once you have the metadata set and you use queries via variables for source/destination.
No need for the global temporary table hack, or the SET FMTONLY OFF hack (which no longer works).
Just specify the result set metadata in the SQL query with WITH RESULT SETS. eg
EXEC ('
create table #t
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
insert into #t (Id, Name, Number)
select object_id, name, 12
from sys.objects
select * from #t
')
WITH RESULT SETS
(
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
)
If you need to parameterize the query, there's a bit of a catch because there are some limitations in how SSIS discovers parameters. SSIS runs sp_describe_undeclared_parameters, which doesn't really work with batches that call sp_executesql, because sp_executesql has a very unique way it handles parameters, one which you couldn't replicate with a user stored procedure.
So to parameterize the query you'll either need to pass the parameter values into the query using the "query from variable" and SSIS expressions, or push all this TSQL into a stored procedure.
ETL Script to dynamically map multiple execute sql resultset to multiple tables (table name based on sql file provided)
I have a source folder with sql files ( I can put it up as stored procedures as well ) . I know how to loop and execute sql tasks in a foreach container. Now the part where I'm stuck is I need to use the final result set of each sql queries and shove it into a table with the same name as the sql file.
So, Folder -> script1.sql , script2.sql etc -> ETL -> goes to table script1, table script2 etc.
EDIT : Based on the comment made by Joe, I just want to say that I'm aware of using insert within a script but I need to insert it onto a table in a different server.And Linked servers are not the ideal solutions
Any psuedocode or link to tutorials will be extremely helpful . Thanks!
I would add the table creation to the script. It is probably the simplest way to do this. If your script is Select SomeField From Table1, you could change it to Select SomeField Into Table script1 From Table1. Then there is no need to map in SSIS which is not easy to do from my experience.
We have a moderately-sized SSDT project (~100 tables) that's deployed to dozens of different database instances. As part of our build process we generate a .dacpac file and then when we're ready to upgrade a database we generate a publish script and run it against the database. Some db instances are upgraded at different times so it's important that we have a structured process for these upgrades and versioning.
Most of the generated migration script is dropping and (re)creating procs, functions, indexes and performing any structural changes, plus some data scripts included in a Post-Deployment script. It's these two data-related items I'd like to know how best to structure within the project:
Custom data migrations needed between versions
Static or reference data
Custom data migrations needed between versions
Sometimes we want to perform a one-off data migration as part of an upgrade and I'm not sure the best way to incorporate this into our SSDT project. For example, recently I added a new bit column dbo.Charge.HasComments to contain (redundant) derived data based on another table and will be kept in sync via triggers. An annoying but necessary performance improvement (only added after careful consideration & measurement). As part of the upgrade the SSDT-generated Publish script will contain the necessary ALTER TABLE and CREATE TRIGGER statements, but I also want to update this column based on data in another table:
update dbo.Charge
set HasComments = 1
where exists ( select *
from dbo.ChargeComment
where ChargeComment.ChargeId = Charge.ChargeId )
and HasComments = 0
What's the best way to include this data migration script in my SSDT project?
Currently I have each of these types of migrations in a separate file that's included in the Post-Deployment script, so my Post-Deployment script ends up looking like this:
-- data migrations
:r "data migration\Update dbo.Charge.HasComments if never populated.sql"
go
:r "data migration\Update some other new table or column.sql"
go
Is this the right way to do it, or is there some way to tie in with SSDT and its version tracking better, so those scripts aren't even run when the SSDT Publish is being run against a database that's already at a more recent version. I could have my own table for tracking which migrations have been run, but would prefer not to roll-my-own if there's a standard way of doing this stuff.
Static or reference data
Some of the database tables contain what we call static or reference data, e.g. list of possible timezones, setting types, currencies, various 'type' tables etc. Currently we populate these by having a separate script for each table that is run as part of the Post-Deployment script. Each static data script inserts all the 'correct' static data into a table variable and then inserts/updates/deletes the static data table as needed. Depending on the table it might be appropriate only to insert or only insert and delete but not to update existing records. So each script looks something like this:
-- table listing all the correct static data
declare #working_data table (...)
-- add all the static data that should exist into the working table
insert into #working_data (...) select null, null null where 1=0
union all select 'row1 col1 value', 'col2 value', etc...
union all select 'row2 col1 value', 'col2 value', etc...
...
-- insert any missing records in the live table
insert into staticDataTableX (...)
select * from #working_data
where not exists ( select * from staticDataTableX
where [... primary key join on #working_data...] )
-- update any columns that should be updated
update staticDataTableX
set ...
from staticDataTableX
inner join #working_data on [... primary key join on #working_data...]
-- delete any records, if appropriate with this sort of static data
delete from staticDataTableX
where not exists ( select * from staticDataTableX
where [... primary key join on #working_data...] )
and then my Post-Deployment script has a section like this:
-- static data. each script adds any missing static/reference data:
:r "static_data\settings.sql"
go
:r "static_data\other_static_data.sql"
go
:r "static_data\more_static_data.sql"
go
Is there a better or more conventional way to structure such static data scripts as part of an SSDT project?
To track whether or not the field has already been initialized, try adding an Extended Property when the initialize is performed (it can also be used to determine the need for the initialize):
To add the extended property:
EXEC sys.sp_addextendedproperty
#name = N'EP_Charge_HasComments',
#value = N'Initialized',
#level0type = N'SCHEMA', #level0name = dbo,
#level1type = N'TABLE', #level1name = Charge,
#level2type = N'COLUMN', #level2name = HasComments;
To check for the extended property:
SELECT objtype, objname, name, value
FROM fn_listextendedproperty (NULL,
'SCHEMA', 'dbo',
'TABLE', 'Charge',
'COLUMN', 'HasComments');
For reference data, try using a MERGE. It's MUCH cleaner than the triple-set of queries you're using.
MERGE INTO staticDataTableX AS Target
USING (
VALUES
('row1_UniqueID', 'row1_col1_value', 'col2_value'),
('row2_UniqueID', 'row2_col1_value', 'col2_value'),
('row3_UniqueID', 'row3_col1_value', 'col2_value'),
('row4_UniqueID', 'row4_col1_value', 'col2_value')
) AS Source (TableXID, col1, col2)
ON Target.TableXID = Source.TableXID
WHEN MATCHED THEN
UPDATE SET
Target.col1 = Source.col1,
Target.col2 = Source.col2
WHEN NOT MATCHED BY TARGET THEN
INSERT (TableXID, col1, col2)
VALUES (Source.TableXID, Source.col1, Source.col2)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
How can I generate script instead of manually writing
if exists (select ... where id = 1)
insert ...
else
update ...
Very boring to do that with many records!
Using management studio to generate script 'Data only' generates only inserts. So running that against existing db gives error on primary keys.
For SQL 2008 onwards you could start using Merge statements along with a CTE
A simple example for a typical id/description lookup table
WITH stuffToPopulate(Id, Description)
AS
(
SELECT 1, 'Foo'
UNION SELECT 2, 'Bar'
UNION SELECT 3, 'Baz'
)
MERGE Your.TableName AS target
USING stuffToPopulate as source
ON (target.Id = source.Id)
WHEN MATCHED THEN
UPDATE SET Description=source.Description
WHEN NOT MATCHED THEN
INSERT (Id, Description)
VALUES (source.Id, source.Description);
Merge statements have a bunch of other functionality that is useful (such as NOT MATCHED BY DESTINATION, NOT MATCHED BY SOURCE). The docs (linked above) will give you much more info.
MERGE is one of the most effective methods to do this.
However, writing a Merge statement is not very intuitive at the beginning, and generating lines for many rows or many tables is a time-consuming process.
I'd suggest using one of the tools to simplify this challenge:
Data Script Writer (Desktop Application for Windows)
Generate SQL Merge (T-SQL Stored Procedure)
I wrote a blog post about these tools recently and approach to leveraging SSDT for deployment database with data. Find out more:
Script and deploy the data for database from SSDT project
I hope this can help.
I want to update a static table on my local development database with current values from our server (accessed on a different network/domain via VPN). Using the Data Import/Export wizard would be my method of choice, however I typically run into one of two issues:
I get primary key violation errors and the whole thing quits. This is because it's trying to insert rows that I already have.
If I set the "delete from target" option in the wizard, I get foreign key violation errors because there are rows in other tables that are referencing the values.
What I want is the correct set of options that means the Import/Export wizard will update rows that exist and insert rows that do not (based on primary key or by asking me which columns to use as the key).
How can I make this work? This is on SQL Server 2005 and 2008 (I'm sure it used to work okay on the SQL Server 2000 DTS wizard, too).
I'm not sure you can do this in management studio. I have had some good experiences with
RedGate SQL Data Compare in synchronising databases, but you do have to pay for it.
The SQL Server Database Publishing Wizard can export a set of sql insert scripts for the table that you are interested in. Just tell it to export just data and not schema. It'll also create the necessary drop statements.
One option is to download the data to a new table, then use commands similar to the following to update the target:
update target set
col1 = d.col1,
col2 = d.col2
from downloaded d
inner join target t on d.pk = t.pk
insert into target (col1, col2, ...)
select (d.col1, d.col2, ...) from downloaded d
where d.pk not in (select pk from target)
If you disable the FK constrains during the 2nd option - and resume them after finsih - it will work.
But if you are using identity to create pk that are involves in the FK - it will cause a problem, so it works only if the pk values remains the same.