SSIS - Error Output - Redirect row - sql-server

I've got a question about the result I'm getting with the execution of a task in SSIS.
First of all, this query was originally executed from Access. The source is a set of tables in Oracle and the destination is a local table in Access. This table has a composite primary key. When I run the query from Access I get over one million records, but before inserting the result into the table, Access shows a message informing me that 26 records violate the primary key constraint (they are duplicates), so they are not taken into account.
I have created the destination table in SQL Server with the same primary key and I am using the same source used in Access (the same query), but when the data flow begins to work, more than 200,000 records are immediately redirected to the error output. Of course, I was expecting the same result seen in Access, with only 26 records treated as errors.
This is the message from Access:
This is my configuration for SSIS, and its result:
Result
I have tried to explain this issue as clearly as possible, but English is not my mother tongue.
If you need me to clarify anything, please ask.
Regards.

I'll assume that you're using the default configuration for the OLE DB Destination. This means Rows per batch is empty (-1) and Maximum insert commit size is 2147483647.
Rows per batch
Specify the number of rows in a batch. The default value of this
property is –1, which indicates that no value has been assigned.
Maximum insert commit size
Specify the batch size that the OLE DB destination tries to commit
during fast load operations. The value of 0 indicates that all data is
committed in a single batch after all rows have been processed.
If the rows are offered to the OLE DB Destination in batches of 200,000, all of those rows will be inserted in one batch/transaction. If the batch contains one error, then the whole batch will fail.
Changing Rows per batch to 1 will solve this problem, but it will have a performance impact since each row has to be inserted separately.
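To make the batch behaviour concrete, here is a minimal T-SQL sketch (dbo.DemoTarget is a hypothetical table, not part of the original package): a multi-row insert behaves like one batch, so a single duplicate key rejects every row in it, while inserting row by row only loses the offending row.

```sql
CREATE TABLE dbo.DemoTarget (Id INT PRIMARY KEY);
INSERT INTO dbo.DemoTarget (Id) VALUES (1);

-- This multi-row insert fails as a whole because Id = 1 already exists,
-- so Id = 2 and Id = 3 are not inserted either.
INSERT INTO dbo.DemoTarget (Id) VALUES (2), (1), (3);

-- Inserting row by row (the effect of committing each row separately)
-- loses only the duplicate.
INSERT INTO dbo.DemoTarget (Id) VALUES (2);
BEGIN TRY
    INSERT INTO dbo.DemoTarget (Id) VALUES (1); -- this one fails on its own
END TRY
BEGIN CATCH
    PRINT 'Duplicate row skipped';
END CATCH;
INSERT INTO dbo.DemoTarget (Id) VALUES (3);
```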

Related

Destination table becomes truncated at start of running Kettle script

I have a Kettle script that reads from Table A, parses the data, then sends it to Table 1 and Table 2. In the Kettle script, I disabled the branch that populates Table 2 and ran the script; from this, Table 1 was populated. After that, I did it the other way around to populate the other table (Table 2); that is, I disabled the branch that populates Table 1. While the script was running, I noticed that Table 1 was being truncated while Table 2 was being populated. After the whole migration script finished, both tables were populated.
I also noticed the 'Truncate Table' flag on the destination table. I just don't understand why the truncation happens given that I disabled the branch that runs it. Any explanations for this?
The truncation happens when the step is initialized. Regardless of whether the incoming hop is enabled or disabled, the truncation will always happen. The same happens in steps like Text file output, where a 0-byte file is created when the transformation starts.

SSIS Deadlock with a Slowly Changing Dimension

I am running an SSIS package that contains many (7) reads from a single flat file uploaded from an external source. There is consistently a deadlock in every environment (Test, Pre-Production, and Production) on one of the data flows that uses a Slowly Changing Dimension to update an existing SQL table with both new and changed rows.
I have three groups coming off the SCD:
-The Inferred Member Updates output goes directly to an OLE DB update command.
-The Historical Attribute output goes to a derived column box that sets a delete date, then to an OLE DB update command, and then to a union box where it is unioned with the last group, the New output.
-The New output goes into a union box along with the Historical output, then to a derived column box that adds an update/create date, and then inserts the values into the same SQL table as the Inferred Member output OLE DB command.
The only error I am getting in my log looks like this:
"Transaction (Process ID 170) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction."
I could put the (NOLOCK) statement into the OLE db commands, but I have read that this isn't the way to go.
I am using SQL Server 2012 Data Tools to investigate and edit the Package, but I am unsure where to go from here to find the issue.
I want to put it out there that I am a novice in terms of SSIS programming... with that out of the way, any help would be greatly appreciated, even if it is just pointing me to a place I haven't looked for help.
Adding an index on the column used in the WHERE condition may resolve your issue. With the index in place, the update transactions execute faster, which reduces the chance of a deadlock.
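As a minimal sketch of that suggestion, assuming hypothetical names (dbo.DimCustomer for the dimension table and BusinessKey for the column the OLE DB Command filters on in its WHERE clause):

```sql
-- Hypothetical table and column names; adjust them to the actual dimension.
-- An index on the filtered column lets the UPDATE seek instead of scan, so it
-- holds fewer locks for less time and is less likely to deadlock with the
-- insert path of the data flow.
CREATE NONCLUSTERED INDEX IX_DimCustomer_BusinessKey
    ON dbo.DimCustomer (BusinessKey);
```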

Clustered Column Store Index gets created in the beginning and vanishes once the job is completed

I have an existing application which has many SQL Server stored procedures that run as below. These stored procedures are applied to a data file and the computation is done according to some business rules.
1) Pre-process
2) Process
3) Post-Process
In the pre-process step, we create n tables with a clustered columnstore index in place. When the job kicks off, the tables are created with the clustered columnstore index, but the indexes vanish once the job is completed. (This happens only for a large input data file.)
When I run the job on a small data file, the clustered columnstore index is created on the tables and it still exists after the job completes.
Note: the code is the same when I execute it for both small and large data files.
Can somebody share your thoughts if you have encountered a similar problem?
Two things will cause an already fully established Index to 'vanish' from a table:
A process or user deletes it.
The transaction in which the index was created is rolled back, either because an exception was raised later in the transaction, the transaction wasn't recoverable, or via an explicit Rollback.
And that's it. Your answer lies in one of the two above.
I know this is not the answer you were looking for; it is, however, guaranteed to be THE answer. Somewhere your code is failing, and that's why the indexes are vanishing.
SQL Server isn't a slapdash RDBMS - if it arbitrarily and randomly dropped indexes, you know we'd be all over it. By your own admission, you have complicated code.
Our data warehouse routinely drops and rebuilds indexes of all sorts - the only times it has been 'missing' them have been the result of a bug in our code.
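The rollback case is easy to reproduce. Here is a minimal sketch (dbo.BigStage is a hypothetical table name) showing that an index created inside a transaction disappears when that transaction is rolled back later, for example by an error during a large load:

```sql
CREATE TABLE dbo.BigStage (Id INT NOT NULL, Amount MONEY);

BEGIN TRANSACTION;

-- The index is created as part of the open transaction.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_BigStage ON dbo.BigStage;

-- ... the load runs, something throws, the error handler rolls back ...
ROLLBACK TRANSACTION;

-- The index is gone along with everything else done in the transaction.
SELECT name
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.BigStage')
  AND name IS NOT NULL;   -- returns no rows
```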

Ignore duplicate records in SSIS' OLE DB destination

I'm using an OLE DB Destination to populate a table with values from a web service.
The package will be scheduled to run in the early AM for the prior day's activity. However, if this fails, the package can be executed manually.
My concern is that if the operator chooses a date range that overlaps existing data, the whole package will fail (verified).
I would like it to:
INSERT the missing values (works as expected if there are no duplicates)
ignore the duplicates, not cause the package to fail, and raise an exception that can be captured by the Windows application log (logged as a warning)
collect the number of successfully inserted records and the number of duplicates
If it matters, I'm using Data access mode = Table or view - fast load.
Suggestions on how to achieve this are appreciated.
That's not a feature.
If you don't want the error (duplicates), then you need to defend against it, much as you'd do in your favorite language. Instead of relying on error handling, you test for the existence of the error-inducing thing (a Lookup Transform to identify whether the row already exists in the destination) and then filter the duplicates out (redirect the No Match Output).
The technical solution you absolutely should not implement
Change the access mode from "Table or View Name - Fast Load" to "Table or View Name". This changes the insert from a bulk/set-based operation to singleton inserts. By inserting one row at a time, the SSIS package can evaluate the success/failure of each row's save. You then need to go into the advanced editor (your screenshot) and change the Error disposition from Fail Component to Ignore Failure.
This solution should not be used, as it yields poor performance, generates an unnecessary workload, and has the potential to mask other save errors beyond just duplicates, referential integrity violations for example.
Here's how I would do it:
1) Point your SSIS destination to a staging table that will be empty when the package is run.
2) Insert all rows into the staging table.
3) Run a stored procedure that uses SQL to import records from the staging table into the final destination table, WHERE the records don't already exist in the destination table.
4) Collect the desired metadata and do whatever you want with it.
5) Empty the staging table for the next use.
(Those last three steps would all be done in the same stored procedure.)
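A minimal sketch of such a stored procedure, assuming hypothetical names (dbo.StageActivity for the staging table, dbo.Activity for the destination, and ActivityId as the key that defines a duplicate):

```sql
CREATE PROCEDURE dbo.ImportFromStaging
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @total    INT,
            @inserted INT;

    SELECT @total = COUNT(*) FROM dbo.StageActivity;

    -- Insert only the rows whose key is not already in the destination.
    INSERT INTO dbo.Activity (ActivityId, ActivityDate, Amount)
    SELECT s.ActivityId, s.ActivityDate, s.Amount
    FROM dbo.StageActivity AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.Activity AS a
                      WHERE a.ActivityId = s.ActivityId);

    SET @inserted = @@ROWCOUNT;

    -- Meta-data: how many rows were new and how many were skipped as duplicates.
    SELECT @inserted AS InsertedRows,
           @total - @inserted AS DuplicateRows;

    -- Empty the staging table for the next run.
    TRUNCATE TABLE dbo.StageActivity;
END;
```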

sql server table fast load isn't

I've inherited an SSIS package which loads 500K rows (about 30 columns) into a staging table.
It's been cooking now for about 120 minutes and it's not done --- this suggests it's running at less than 70 rows per second. I know that everybody's environment is different but I think this is a couple orders of magnitude off from "typical".
Oddly enough the staging table has a PK constraint on an INT (identity) column -- and now I'm thinking that it may be hampering the load performance. There are no other constraints, indexes, or triggers on the staging table.
Any suggestions?
---- Additional information ------
The source is a tab-delimited file which connects to two separate Data Flow Components that add some static data (the run date and batch ID) to the stream, which then connects to an OLE DB Destination Adapter.
Access mode is OpenRowset using FastLoad
FastLoadOptions are TABLOCK,CHECK_CONSTRAINTS
Maximum insert commit size: 0
I’m not sure about the etiquette of answering my own question -- so sorry in advance if this is better suited for a comment.
The issue was the data type of the input columns from the text file: they were all declared as "text stream [DT_TEXT]", and when I changed that to "string [DT_STR]", 2 million rows loaded in 58 seconds, which is now in the realm of "typical". I'm not sure what the Text File Source is doing when columns are declared that way, but it's behind me now!
I'd say there is a problem of some sort; I bulk insert into a staging table from a file with 20 million records and more columns, including an identity field, in far less time than that, and SSIS is supposed to be faster than SQL Server 2000 bulk insert.
Have you checked for blocking issues?
If it is running in one big transaction, that may explain things. Make sure that a commit is done every now and then.
You may also want to check processor load, memory and IO to rule out resource issues.
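For the blocking check, a minimal sketch using the standard dynamic management views (no assumptions beyond a SQL Server 2005 or later instance): run this while the load is in progress to see whether the loading session is waiting on another session.

```sql
-- Sessions that are currently blocked, who is blocking them, and what they wait on.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```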
This is hard to say.
If there were complex ETL, I would check the max number of threads allowed in the data flows and see if some things can run in parallel.
But it sounds like it's a simple transfer.
With 500,000 rows, batching is an option, but I wouldn't think it necessary for that few rows.
The PK identity should not be an issue. Do you have any complex constraints or persisted calculated columns on the destination?
Is this pulling or pushing over a slow network link? Is it pulling or pushing from a complex SP or view? What is the data source?
