I am running an SSIS package that contains many (7) reads from a single flat file uploaded from an external source. There is consistently a deadlock in every environment (Test, Pre-Production, and Production) on one of the data flows that uses a Slowly Changing Dimension to update an existing SQL table with both new and changed rows.
I have three groups coming off the SCD:
- Inferred Member Updates Output goes directly to an OLE DB update command.
- Historical Attribute Output goes to a derived column box that sets a delete date, then to an OLE DB update command, and then to a union box where it is unioned with the last group, New Output.
- New Output goes into a union box along with the Historical output, then to a derived column box that adds update/create dates, and then inserts the values into the same SQL table as the Inferred Member Output's OLE DB command.
The only error I am getting in my log looks like this:
"Transaction (Process ID 170) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
I could put a (NOLOCK) hint into the OLE DB commands, but I have read that this isn't the way to go.
I am using SQL Server 2012 Data Tools to investigate and edit the Package, but I am unsure where to go from here to find the issue.
I want to put out there that I am a novice in terms of SSIS programming... with that out of the way, any help would be greatly appreciated, even if it is just pointing me to a place I haven't looked for help.
Adding an index on the column used in the WHERE condition may resolve your issue. With the index in place, the transactions execute faster, which reduces the chance of a deadlock.
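A rough sketch, where the table and column names are only placeholders for whatever your OLE DB Command's WHERE clause actually filters on:

-- Hypothetical example: index the column the update's WHERE clause filters on
CREATE NONCLUSTERED INDEX IX_DimCustomer_BusinessKey
    ON dbo.DimCustomer (BusinessKey);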
I have created a new dataset using the Snowflake connector and used it as the source dataset in a Lookup activity.
Then I am trying to INSERT the record into Snowflake using the following query:
INSERT INTO SAMPLE_TABLE VALUES ('TEST', 1, 1, CURRENT_TIMESTAMP, 'TEST'); -- (all values are passed)
Result: the row gets inserted into Snowflake, but my pipeline fails with the error below.
Failure happened on 'Source' side. ErrorCode=UserErrorOdbcInvalidQueryString,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The following ODBC Query is not valid: 'INSERT INTO SAMPLE_TABLE VALUES('TEST',1,1,CURRENT_TIMESTAMP,'TEST');'
Could you please share your advice or any lead to solve this problem?
Thanks.
Rajesh
Lookup, as the name suggests, is for searching and retrieving data, not for inserting. However, you can enclose your INSERT code in a procedure and execute it using the Lookup activity.
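A minimal sketch, assuming the SAMPLE_TABLE from the question and a hypothetical procedure name; the RETURN value gives the Lookup activity a result set to read:

CREATE OR REPLACE PROCEDURE INSERT_SAMPLE_ROW()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
    INSERT INTO SAMPLE_TABLE VALUES ('TEST', 1, 1, CURRENT_TIMESTAMP, 'TEST');
    RETURN 'OK'; -- the Lookup activity needs something to read back
END;
$$;

-- Query to use in the Lookup activity:
CALL INSERT_SAMPLE_ROW();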
However, I strongly advise against doing this. Remember that when inserting data into Snowflake you create at least one micro-partition (around 16 MB); if you insert one row at a time, the performance will be terrible and the data will take up a disproportionate amount of space. Remember that Snowflake is not a transactional (OLTP) database.
Instead, it's better to save all the records in an intermediate file and then import the entire file in one move.
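The load itself would then typically be a single COPY INTO from a stage; a minimal sketch, where the stage and file names are assumptions:

-- Bulk-load the whole file in one statement (stage and file names are placeholders)
COPY INTO SAMPLE_TABLE
FROM @my_stage/sample_rows.csv
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');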
You can use the Lookup activity to perform operations other than selects; it just HAS to have an output. I've gotten around it with a Postgres database, doing CREATE TABLEs, TRUNCATEs, and one-off INSERTs, by just concatenating a
select current_date;
after the main query.
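For example, a one-off insert (the table and columns are hypothetical); the trailing SELECT is only there so the Lookup activity gets a row back:

INSERT INTO audit_log (event_name, logged_at) VALUES ('manual_fix', now());
select current_date;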
Note: the SQL Script activity will definitely be better for this; we are waiting on Postgres support in it, though.
I have a SQL query with more than 200 lines of code that follows the steps below. I need to run this every day to generate Table_A. I now have a requirement to build an SSIS package with the same process and create Table_A with SSIS. The details below describe the current SQL process:
DROP TABLE table_A;

SELECT ... INTO table_A
FROM (SELECT ... FROM table_B
      UNION ALL SELECT ... FROM table_C
      UNION ALL SELECT ... FROM table_D) AS src;
Key factors: from table_B, table_C, and table_D I need to pull 20 of the 40 columns. The column names vary, and I need to rename and standardise the column names and certain data types so that they map to a uniform set of columns in Table_A.
This is already set up as a SQL query, but I need to know what the best practice is and how to translate it into SSIS. Should I use an "Execute SQL Task" in the control flow, or a Data Flow Task involving an OLE DB source and OLE DB destination?
Execute SQL Task is what you're going to want. The Execute SQL Task is designed to run an arbitrary query that may or may not return a result set. You've already done the hard work of getting your code working correctly so all you need to do is define a Connection Manager (likely an OLE DB) and paste in your code.
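A rough sketch of the statement you might paste into that task; the column names here are placeholders, since the real renaming/standardising logic comes from your existing 200-line query:

-- Drop and rebuild Table_A in one batch inside the Execute SQL Task
IF OBJECT_ID('dbo.table_A', 'U') IS NOT NULL
    DROP TABLE dbo.table_A;

SELECT src.colB1              AS StandardCol1,   -- rename/standardise here
       CAST(src.colB2 AS INT) AS StandardCol2
INTO   dbo.table_A
FROM (
    SELECT colB1, colB2 FROM dbo.table_B
    UNION ALL
    SELECT colC1, colC2 FROM dbo.table_C
    UNION ALL
    SELECT colD1, colD2 FROM dbo.table_D
) AS src;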
In this case, SSIS is going to be nothing more than a coordinator/execution framework for your existing SQL process. And, speaking as someone who has written more than a few SSIS packages, that's perfectly acceptable.
A Data Flow Task, I find, is more appropriate when you need to move the data from tables B, C, and D into a remote database, or when you need to perform transformation logic on them that isn't easily done in T-SQL.
A Data Flow Task will not support creating the table at run-time. All SSIS tasks perform a validation check - either on package start or it can be delayed until the specific task begins. One of the checks a Data Flow Task is going to perform is "does the target table exist (and does the structure match my cached copy)?"
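So if you did go the Data Flow route instead, the target table would have to exist before validation runs; one sketch, with the same placeholder columns as above, is to create it once up front (and optionally set DelayValidation on the Data Flow Task):

-- Run once, or in an up-front Execute SQL Task
IF OBJECT_ID('dbo.table_A', 'U') IS NULL
    CREATE TABLE dbo.table_A (
        StandardCol1 VARCHAR(50),
        StandardCol2 INT
    );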
I'm using an OLE DB Destination to populate a table with values from a web service.
The package will be scheduled to run in the early AM for the prior day's activity. However, if this fails, the package can be executed manually.
My concern is that if the operator chooses a date range that overlaps existing data, the whole package will fail (verified).
I would like it to:
- INSERT the missing values (works as expected if there are no duplicates)
- ignore the duplicates, not cause the package to fail, and raise an exception that can be captured by the Windows application log (logged as a warning)
- collect the number of successfully inserted records and the number of duplicates
If it matters, I'm using Data access mode = Table or view - fast load (destination settings shown in the screenshot).
Suggestions on how to achieve this are appreciated.
That's not a feature.
If you don't want the error (duplicates), then you need to defend against it - much as you'd do in your favorite language. Instead of relying on error handling, you test for the existence of the error-inducing thing (a Lookup Transform to identify whether a row already exists in the destination) and then filter the duplicates out (send only the No Match Output on to the destination).
The technical solution you absolutely should not implement
Change the access mode from "Table or View Name - Fast Load" to "Table or View Name". This changes the method of insert from a bulk/set-based operation to singleton inserts. By inserting one row at a time, this allows the SSIS package to evaluate the success/failure of each row's save. You then need to go into the advanced editor (your screenshot) and change the Error disposition from Fail Component to Ignore Failure.
This solution should not be used, as it yields poor performance, generates unnecessary workload, and has the potential to mask other save errors beyond just "duplicates" - referential integrity violations, for example.
Here's how I would do it:
- Point your SSIS destination to a staging table that will be empty when the package is run.
- Insert all rows into the staging table.
- Run a stored procedure that uses SQL to import records from the staging table to the final destination table, WHERE the records don't already exist in the destination table.
- Collect the desired metadata and do whatever you want with it.
- Empty the staging table for the next use.
(Those last three steps would all be done in the same stored procedure; see the sketch below.)
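A minimal sketch of that stored procedure, assuming hypothetical table names (dbo.StagingActivity, dbo.Activity) and a single business-key column:

CREATE PROCEDURE dbo.usp_LoadActivityFromStaging
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @staged INT = (SELECT COUNT(*) FROM dbo.StagingActivity);

    -- Import only the rows that do not already exist in the destination
    INSERT INTO dbo.Activity (BusinessKey, ActivityDate, Amount)
    SELECT s.BusinessKey, s.ActivityDate, s.Amount
    FROM dbo.StagingActivity AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Activity AS a
                      WHERE a.BusinessKey = s.BusinessKey);

    DECLARE @inserted INT = @@ROWCOUNT;

    -- Collect the metadata: inserted vs. duplicate counts
    SELECT @inserted           AS InsertedRows,
           @staged - @inserted AS DuplicateRows;

    -- Empty the staging table for the next use
    TRUNCATE TABLE dbo.StagingActivity;
END;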
I'm trying to create an SSIS package which will periodically send data to another database. I want to send only new records (but I need to keep the sent records), so I created a status column in my source table.
I want my package to update this column after successfully sending the data, but I can't simply update all rows with "unsent" status, because some rows may have been added during package execution. I also can't use transactions (I mean the isolation levels that would solve my problem: I can't use Serializable because I mustn't prevent users from adding new rows, and the Sequence Container doesn't support Snapshot).
My next idea was to use a recordset and, after sending the data to the other database, use it to get the IDs of the sent rows, but I couldn't find a way to use it as a data source.
I don't think I should set the status to "to send" and then update it to "sent"; I believe it would be too costly.
Now I'm thinking about using a temporary table, but I'm not convinced that this is the right way to do it. Am I missing something?
The Recordset is a destination; you cannot use it as a source in a Data Flow Task.
But since the data is saved to a variable, it is available in the control flow.
After the Data Flow completes, go back to the control flow and create a Foreach Loop that iterates over the recordset variable.
Read each recordset value into a variable and use it to run an update query.
Also, see if the "Lookup Transform" can be useful to you. You can generate rows that match or don't match.
I will improve the answer based on the discussion.
What you have here is a very typical data mirroring problem. To start with, I would not simply have a boolean that signifies that a record was "sent" to the destination (mirror) database. At the very least, I would put a LastUpdated datetime column in the source table, and have triggers on that table, on insert and update, that put the system date into that column. Then, every day I would execute an SSIS package that reads the records updated in the last week, checks to see if those records exist in the destination, splitting the datastream into records already existing and records that do not exist in the destination. For those that do exist, if the LastUpdated in the destination is less than the LastUpdated in the source, then update them with the values from the source. For those that do not exist in the destination, insert the record from the source.
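A rough sketch of those pieces, with assumed table, column, and trigger names (dbo.SourceTable keyed by Id):

-- Track the last change on every row (column and trigger names are assumptions)
ALTER TABLE dbo.SourceTable
    ADD LastUpdated DATETIME NOT NULL DEFAULT (GETDATE());
GO
CREATE TRIGGER dbo.trg_SourceTable_LastUpdated
ON dbo.SourceTable
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE s
    SET    s.LastUpdated = GETDATE()
    FROM   dbo.SourceTable AS s
    JOIN   inserted        AS i ON i.Id = s.Id;
END;
GO
-- Source query for the daily SSIS package: only the last week's changes
SELECT Id, Col1, Col2, LastUpdated
FROM   dbo.SourceTable
WHERE  LastUpdated >= DATEADD(DAY, -7, GETDATE());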
It gets a little more interesting if you also have to deal with record deletions.
I know it may seem wasteful to read and check a week's worth every day, but your database should hardly feel it; it provides a lot of good double-checking and saves you a lot of headaches by giving you a simple, error-tolerant algorithm. If some record doesn't get transferred because of a hiccup on the network, no worries - it gets picked up the next day.
I would still set up the SSIS package as a server task that sends me an email with any errors, so that I can keep track. Most days you get no errors, and when there are errors, you can wait a day or resolve the cause and let the next day's run pick up the problems.
I am doing a similar thing; in my case, I have a status on the source record.
- I read in all records with a status of "New".
- Then I use an OLE DB Command to execute SQL on each row, changing the status to "In Progress" (in your WHERE clause, enter a ? as the value in the Component Property tab; you can then configure it as a parameter from the table row, such as an ID or some PK, in the Column Mappings tab).
- Once the records are processed, you can change all "In Progress" records to "Success" or something similar using another OLE DB Command.
Depending on what you are doing, you can use the status to mark records that errored at some point and require further attention.
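The parameterized statement inside the per-row OLE DB Command might look like this (the table, column, and status values are assumptions for illustration):

-- The ? is the parameter you map to the row's ID/PK in the Column Mappings tab
UPDATE dbo.SourceRecords
SET    Status = 'In Progress'
WHERE  Id = ?;

-- And the final set-based flip, e.g. in an Execute SQL Task after the Data Flow:
UPDATE dbo.SourceRecords
SET    Status = 'Success'
WHERE  Status = 'In Progress';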
I am using the SSIS Analysis Services Processing Task to process a cube using Process Update only.
The source is transactional Type 1, so when we get an update to existing records, the source deletes them and inserts the new records. The cube's Process Update then fails due to a missing key at the source (caused by deleting the source records when they are updated).
For example: the source table has 1000 records and the cube has processed 1000 records. Then 5 records are deleted and 5 new ones are inserted in the source; processing sees the 5 new records and throws a key error while processing the cube.
We could use Process Full to avoid this problem, but that takes a performance hit.
How can we do this using Process Update only?
thanks
prav
A quick fix could be done in the Analysis Services processing settings:
Select Process > Change Settings… > Dimension key errors.
Choose "Use custom error configuration".
In Stop on Error, ensure that On Error Action is set to "Stop Logging".
In Specific error conditions, make sure that "Report and Continue" is selected.
Click OK and the cube will be reprocessed, ignoring this error.
If there are deleted records in the source table, you have to process the cube/measure group/partition using full mode.