INSERT/UPDATE conditional split in SSIS working incorrectly - sql-server

I looked for an answer to my specific issue before posting and didn't find anything. I have a conditional split in SSIS that is inserting and updating, except that it seems to update 250+ rows each time it runs, whether or not anything changed in the source. The insert works correctly. But it only works when I set "ignore error" on the conditional split; otherwise an error saying the expression "evaluated to NULL, but the "Conditional Split" requires a Boolean results" shows up. Any idea how I can fix this? My conditional split looks like this:
UPDATE = [Copy of ORDER_TYPE] != ORDER_TYPE || [Copy of WEEK] != WEEK || [Copy of GOAL] != GOAL || [Copy of WEEK_START] != WEEK_START || [Copy of WEEK_END] != WEEK_END || [Copy of DIVISION_DESC] != DIVISION_DESC || [Copy of SUB_ORDER_TYPE] != SUB_ORDER_TYPE
INSERT = ISNULL(ID) || ISNULL(WEEK) || ORDER_TYPE == ""
I followed this tutorial.

In a situation like this, it's impossible for us to debug what is happening as we don't have access to your data, your package and the results of the above boolean conditions.
What I do, when faced with a problem like this, is add one, possibly two, Derived Column components before the Conditional Split. I'd call the first DER Action Flags, as it generates the boolean conditions for the action we should take.
Add the columns IsInsert and IsUpdate and then use the above expressions for them. Now connect your derived column to the Conditional Split and replace the two expressions there so they just use the new derived columns. Add a data viewer immediately before the split and you can verify whether your logic is sound.
Given the length of your UPDATE expression, I would break that into individual column evaluations in a derived column placed before the DER Action Flags component I described above. Call it something like DER Compute Changed Flags to indicate we're computing whether each column has changed.
In this Derived Column component, you'll break out each column change check i.e.
Changed_ORDER_TYPE [Copy of ORDER_TYPE] != ORDER_TYPE
Changed_WEEK [Copy of WEEK] != WEEK
That then simplifies the IsUpdate logic to Changed_ORDER_TYPE || Changed_WEEK...
Now that data viewer will show you the exact condition that causes a change to be erroneously flagged. That distills the problem down to "these two inputs and this expression are not evaluating as expected" (and that is something we can figure out).
Based on your "ignore error" comment, I assume you have a condition with NULL comparison that might not be covered by the referenced link.
And since this is a series of comments converted to an answer, here is the asker's follow-up:
That worked. The data viewer showed that one of my input columns was coming through as float while the table data type was int, so values with decimals differed from those in the table once they were converted to int. Fixed.
Future readers: verify that your data types are consistent (double-click the connector lines coming out of a component and select Metadata), as data conversion rules might surprise you in unexpected ways.
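The per-column flag idea, along with the NULL and data type pitfalls discussed in this thread, can be modelled outside SSIS. The following is only a Python sketch of the logic, not SSIS code: the function names, the three-column subset, and the sample rows are assumptions made for illustration, and `None` stands in for NULL.

```python
from typing import Any, Dict, Optional

# Hypothetical subset of the columns compared in the question.
COLUMNS = ["ORDER_TYPE", "WEEK", "GOAL"]


def raw_changed(src: Any, dst: Any) -> Optional[bool]:
    """Model a comparison that yields NULL (None) when either side is NULL."""
    if src is None or dst is None:
        return None
    return src != dst


def safe_changed(src: Any, dst: Any) -> bool:
    """NULL-safe variant: two NULLs count as equal, NULL vs. a value is a change."""
    if src is None or dst is None:
        return (src is None) != (dst is None)
    return src != dst


def change_flags(source_row: Dict[str, Any], target_row: Dict[str, Any]) -> Dict[str, bool]:
    """One flag per column, so a viewer can show exactly which column differs."""
    return {f"Changed_{c}": safe_changed(source_row[c], target_row[c]) for c in COLUMNS}


# The source delivers GOAL as a float, while the table stored it as an int.
source = {"ORDER_TYPE": "A", "WEEK": 12, "GOAL": 10.5}
target = {"ORDER_TYPE": "A", "WEEK": 12, "GOAL": 10}

flags = change_flags(source, target)
is_update = any(flags.values())
```

Inspecting `flags` rather than only `is_update` plays the role of the data viewer: it shows that `Changed_GOAL` alone triggers the update, which points at the float versus int mismatch.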

Related

SSIS Derived Column - Text in Numeric Field is not converting

I'm importing thousands of csv files into an SQL DB. They each have two columns: Date and Value. In some of the files, the value column contains simply a period (ex: "."). I've tried to create a derived column that will handle any cell that contains a period with the following code:
FINDSTRING((DT_WSTR,1)[VALUE],".",1) != 0 ? NULL(DT_R8) : [VALUE]
But, when the package runs it gets the following error when it reaches the cell with the period in it:
The data conversion for column "VALUE" returned status value 2 and status text
"The value could not be converted because of a potential loss of data".
I'm guessing there might be an escape character that I'm missing in my FINDSTRING function but I can't seem to find what that may be. Does anyone have any thoughts on how I can get around this issue?
Trying to debug things like this is why I always advocate adding many Derived Columns to the Data Flow. It's impossible to debug that entire expression. Instead, first find the position of the period and add that as a new column. Then you can feed that into the ternary operation and bit by bit you can add data viewers to ensure you are seeing what you expect to see.
Personally, I'd take a different approach. It seems that you'd like to turn any value that is just a period into a NULL of type DT_R8.
Add a derived column, TrimmedValue, and use this expression to remove any leading/trailing whitespace:
RTRIM(LTRIM(Value))
Add a second derived column component; this time we'll add a column MenopausalValue, as it will remove the period. Use this expression:
(TrimmedValue == ".") ? NULL(DT_WSTR, 50) : TrimmedValue
Now, you can add your final Derived Column wherein we convert the string representation of Value to the floating point representation.
IsNull(MenopausalValue) ? NULL(DT_R8) : (DT_R8) MenopausalValue
If the above shows an error, then you need to apply the following version as I can never remember the evaluation sequence for ternary operations that change type.
(DT_R8) (IsNull(MenopausalValue) ? NULL(DT_R8) : (DT_R8) MenopausalValue)
Examples of breaking these operations into many steps for debugging purposes
https://stackoverflow.com/a/15176398/181965
https://stackoverflow.com/a/31123797/181965
https://stackoverflow.com/a/33023858/181965
You can do it like this:
TRIM(Value) == "." ? NULL(DT_R8) : (DT_R8)Value
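The staged approach from the first answer (trim, null out a lone period, then cast) can be sketched in Python to show why small steps are easier to inspect than one long expression. This is an illustration only, not SSIS code: the function names are hypothetical, `None` stands in for NULL, and `float` stands in for DT_R8.

```python
from typing import Optional


def trimmed_value(value: Optional[str]) -> Optional[str]:
    """Step 1: strip leading/trailing spaces (SSIS LTRIM/RTRIM remove spaces only)."""
    return None if value is None else value.strip(" ")


def without_period(trimmed: Optional[str]) -> Optional[str]:
    """Step 2: a value that is just "." becomes NULL; anything else passes through."""
    return None if trimmed == "." else trimmed


def to_float(text: Optional[str]) -> Optional[float]:
    """Step 3: cast the remaining text to a floating point number."""
    return None if text is None else float(text)


def convert(value: Optional[str]) -> Optional[float]:
    """Chain the three steps; each intermediate result can be inspected on its own."""
    return to_float(without_period(trimmed_value(value)))
```

Because each step is a separate function, a failing row can be checked after step 1 and step 2 before the cast is ever attempted, which mirrors placing a data viewer between Derived Column components.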

Redirect NULL or blank values from Flat File

I am importing records from a flat file source to a SQL table which has 4 columns which do not accept NULL values. And what I would like to do is redirect the records which contain a NULL or blank value for the particular 4 fields to a flat file destination.
Below you can see the table configuration:
And here is a sample from my flat file source where I have blanked out the county_code in the first record, the UCN in the second record, and the action_id in the third.
If I run my package as it is currently configured, it errors out due to the constraints:
The column status returned was: "The value violated the integrity constraints for the column.".
So my question is: how do I redirect these rows? I think I should use a conditional split, but I am not certain, and I don't know how I would configure it. My attempts have been futile so far.
Any suggestions?
Add a Derived Column Transformation after your Flat File Source. There you'll test whether the not nullable columns are null.
For ease of debugging, I would add a flag for each of those columns in question.
null_timestamp (ISNULL(timestamp) || LEN(RTRIM(timestamp)) == 0) ? true : false
An expression like this will determine whether the column from flat file is null or whether the trimmed length is zero.
Once you have your flags tested, then you'd add in a Conditional Split. The conditional split routes rows based on a boolean expression. I would add a Bad Data output to it and use an expression like
null_timestamp || null_county_code || null_etc
Because they are boolean, if we OR the values together if any of those were to be true, then the whole expression becomes true and rows are routed to the bad data path.
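The flag-then-split routing described above can be modelled in a few lines of Python. This is a sketch of the logic only, not SSIS code: the list of required columns is an assumption pieced together from the question, `None` stands in for NULL, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]

# Assumed names for the four NOT NULL columns.
REQUIRED = ["timestamp", "county_code", "UCN", "action_id"]


def is_blank(value: Optional[str]) -> bool:
    """Mirror ISNULL(col) || LEN(RTRIM(col)) == 0."""
    return value is None or len(value.rstrip(" ")) == 0


def null_flags(row: Row) -> Dict[str, bool]:
    """One debugging flag per required column, e.g. null_timestamp."""
    return {f"null_{c}": is_blank(row.get(c)) for c in REQUIRED}


def split_rows(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Route a row to bad data when any flag is true, otherwise to the good path."""
    good: List[Row] = []
    bad: List[Row] = []
    for row in rows:
        (bad if any(null_flags(row).values()) else good).append(row)
    return good, bad


rows = [
    {"timestamp": "2015-01-01", "county_code": "   ", "UCN": "A1", "action_id": "7"},
    {"timestamp": "2015-01-02", "county_code": "12", "UCN": None, "action_id": "8"},
    {"timestamp": "2015-01-03", "county_code": "13", "UCN": "C3", "action_id": "9"},
]
good_rows, bad_rows = split_rows(rows)
```

Here `bad_rows` corresponds to the Bad Data output that would feed the flat file destination, and `good_rows` to the default path into the table.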
I would simply add a conditional split and name the Output accordingly:
Could you load the data to a temp table first, then using 2 separate queries against the temp table either insert to your table, or write out to flat file?

Insert/transform to 0 or 1 based on presence of text

I'd like to insert to a table's bit column either 0 or 1, 0 where there is no text, 1 where there is. I am pulling data through an excel source in SSIS. In the example below, it should go:
Client 1: 1 1 1
Client 2: 1 0 1
How it looks in excel:
Products: Ball Bicycle Bat
Client 1: Ball Bicycle Bat
Client 2: Ball Bat
Client 3: Ball
Client 4: Bicycle Bat
Any way to achieve this in SSIS?
You'll need to use a few items to accomplish this.
How do I add new columns to a data flow task?
A Derived Column will allow you to create these new boolean columns. A derived column uses an Expression to compute a value. You'll need to specify the column name when you create a new one. You'll also want to ensure the data types are as expected.
What's my expression?
The first question is whether "no text" comes in as NULL, an empty string, or a padded string. Knowing whether the current row satisfies one of those conditions tells us whether to make our new column 1 or 0.
A value is either NULL or NOT NULL. If it's not null, then we need to worry about whether it's empty/padded. We test for NULL via the ISNULL expression.
A boolean OR is expressed as ||
An equality test is done via ==
We check for an empty string using the LEN expression.
We need to remove all the trailing spaces from our column to see if it reduces down to an empty string. We'll use RTRIM to accomplish this.
An Expression must evaluate to something, but we have conditional (if) logic. The way we work around that is the ternary operator (boolean test) ? (true expr) : (false expr)
The SSIS system data type for boolean is DT_BOOL
A cast is performed by (DT_TYPE) VALUE
Putting it all together
Add a derived column component to your data flow. Make the column name Col1HasText
Line breaks added for readability
(isnull([Col1]) || LEN(RTRIM([Col1])) == 0) ?
((DT_BOOL) 0) :
((DT_BOOL) 1)
English version: If Col1 is null, or if the length of the right-trimmed version of Col1 is zero, then we cast zero to a data type of DT_BOOL (boolean). Otherwise, we cast one to a data type of DT_BOOL.
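To check the logic against the sample from the question, here is the same test as a small Python sketch. It is not SSIS code: `has_text` is a hypothetical name, `None` stands in for NULL, and how empty Excel cells actually arrive (NULL, empty string, or padded string) is an assumption, so all three are handled.

```python
from typing import Dict, List, Optional


def has_text(cell: Optional[str]) -> int:
    """Return 0 when the cell is NULL, empty, or only spaces; otherwise 1."""
    if cell is None or len(cell.rstrip(" ")) == 0:
        return 0
    return 1


# Columns are Ball, Bicycle, Bat; blanks are modelled in three different ways.
sheet: Dict[str, List[Optional[str]]] = {
    "Client 1": ["Ball", "Bicycle", "Bat"],
    "Client 2": ["Ball", None, "Bat"],
    "Client 3": ["Ball", "", "   "],
    "Client 4": [None, "Bicycle", "Bat"],
}

bits = {client: [has_text(cell) for cell in cells] for client, cells in sheet.items()}
```

With these inputs, Client 1 maps to 1 1 1 and Client 2 to 1 0 1, matching the result the question asks for.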

SSIS Excel error column

I am importing an excel file and grabbing data from it. In a handful of string columns, some rows have the value of #VALUE!. How would I filter these rows out in SSIS? I have tried checking for NULL values, as well as checking to see if that row's column is equal to #VALUE!
([ALT_TYPENAME] == "#VALUE!")
However, the rows pass right through and are not filtered at all. Is it even possible to filter for these? The columns are imported as DT_STR.
Ok, you need to change the order in your conditional split. First you are checking if ISNULL == True, then ISNULL == False.
One of those two conditions will always be true, so the row will be sent down that path, and the third condition ( == "#VALUE!") will never be evaluated.
Try evaluating your last condition first.
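The ordering issue can be illustrated with a Python sketch of first-match routing, where a row goes to the first output whose condition is true. This is a model of the behaviour described above, not SSIS code: the output names and the `route` helper are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[str]]
Condition = Tuple[str, Callable[[Row], bool]]


def route(row: Row, conditions: List[Condition], default: str = "Default") -> str:
    """Return the name of the first condition that is true, else the default output."""
    for name, predicate in conditions:
        if predicate(row):
            return name
    return default


def is_null(row: Row) -> bool:
    return row["ALT_TYPENAME"] is None


def not_null(row: Row) -> bool:
    return row["ALT_TYPENAME"] is not None


def is_excel_error(row: Row) -> bool:
    return row["ALT_TYPENAME"] == "#VALUE!"


# Original order: one of the first two conditions is always true.
original_order = [("IsNull", is_null), ("NotNull", not_null), ("ExcelError", is_excel_error)]
# Fixed order: the specific test runs before the catch-all pair.
fixed_order = [("ExcelError", is_excel_error), ("IsNull", is_null), ("NotNull", not_null)]

bad_row = {"ALT_TYPENAME": "#VALUE!"}
original_output = route(bad_row, original_order)
fixed_output = route(bad_row, fixed_order)
```

In the original order the `#VALUE!` row leaves through the not-null output and the third condition is never reached; moving that condition first sends the row to its own output.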
You can do this by using a Conditional Split transform in between your Excel source and your destination.
Create an object variable (I name mine Discard) and a Recordset Destination based on that variable. Set your Conditional Split's condition to Column == "#VALUE!" and direct anything that meets that criteria to the Recordset to discard it, while everything else follows the default path to your Destination.
If you need to discard based on multiple columns potentially containing "#VALUE!" just expand the condition to an OR that encompasses all of the columns.
An added benefit of this technique is you can use the Discard Recordset at the end of the job to create a fall out report if you need one.

SSIS Flat File with Numeric DataType with Spaces

I'm using SSIS to load a fixed length Flat File into SQL.
I have a weight field that has been giving me trouble all day.
It has a length of 8 with 6 DECIMAL POSITIONS IMPLIED (99V990099).
The problem I'm having is when it isn't populated and has 8 spaces.
Everything I try gets an error:
"Invalid character value for cast specification".
OR
"Conversion failed because the data value overflowed the specified type.".
OR
Data conversion failed.
The data conversion for column "REL_WEIGHT" returned status value 2 and status text
"The value could not be converted because of a potential loss of data.".
I've tried declaring it as DT_String & DT_Numeric.
I've tried many variations of:
TRIM([REL_WEIGHT])=="" ? (DT_STR,8,1252)NULL(DT_STR,8,1252) : REL_WEIGHT
ISNULL([REL_WEIGHT]) || TRIM([REL_WEIGHT]) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)[REL_WEIGHT]
TRIM(REL_WEIGHT) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)REL_WEIGHT
But nothing seems to work.
Please someone out there have the fix for this!
I think you may be running afoul of the following point, explained nicely at http://vsteamsystemcentral.com/cs21/blogs/applied_business_intelligence/archive/2009/02/01/ssis-expression-language-and-the-derived-column-transformation.aspx:
You can add a DT_STR Cast statement to the expression for the MiddleName, but it doesn't change the Data Type. Why can't we change the data type for existing columns in the Derived Column transformation? We're replacing the values, not changing the data type. Is it impossible to change the data type of an existing column in the Derived Column? Let's put it this way: It is not possible to convert the data type of a column when you are merely replacing the value. You can, however, accomplish the same goal by creating a new column in the Data Flow.
I've solved this on past occasions by loading the data from the flat file as strings, and then deriving a new column in a Derived Column transformation which is of numeric type. You can then perform the appropriate trimming, validation, casting, etc. in the SSIS expression for that new column in the transformation.
Here, I found an example SSIS expression I used at one point to derive a time value from a 4-digit string:
(ISNULL(Last_Update_Time__orig) || TRIM(Last_Update_Time__orig) == "") ? NULL(DT_DBTIME2,0) : (DT_DBTIME2,0)(SUBSTRING(TRIM(Last_Update_Time__orig),1,2)+":"+SUBSTRING(TRIM(Last_Update_Time__orig),3,2)+":00")
There has to be a better way to do it, but I found a way that works.
Create a Derived Column expression:
TRIM(REL_WEIGHT) == "" ? (DT_STR,9,1252)"0.0000000" : (DT_STR,9,1252)(LEFT(REL_WEIGHT,2) + "." + RIGHT(REL_WEIGHT,6))
Then create a Data Conversion Task to change it to numeric and set the scale to 6.
And then map the [Copy of NewField] to my SQL table field, set up as Decimal(8,6).
I don't know how this will perform when loading a million records; probably not the best. If someone knows a better way to do this performance-wise, please let me know.
Thanks,
Jeff
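The string handling in that workaround (blank becomes zero, otherwise insert a decimal point after the first two characters) can be checked with a short Python sketch. It is an illustration only, not SSIS code: `parse_rel_weight` is a hypothetical name, and the field is assumed to always be exactly 8 characters wide, as the fixed-length layout in the question implies.

```python
from decimal import Decimal

FIELD_WIDTH = 8      # total characters in the fixed-width field
INTEGER_DIGITS = 2   # digits before the implied decimal point
SCALE = 6            # digits after the implied decimal point


def parse_rel_weight(raw: str) -> Decimal:
    """Convert an 8-character field with 6 implied decimals into a Decimal."""
    if len(raw) != FIELD_WIDTH:
        raise ValueError(f"expected {FIELD_WIDTH} characters, got {len(raw)}")
    if raw.strip(" ") == "":
        # Unpopulated field (all spaces): treat as zero, as in the workaround.
        return Decimal("0.000000")
    # Equivalent of LEFT(REL_WEIGHT,2) + "." + RIGHT(REL_WEIGHT,6).
    return Decimal(raw[:INTEGER_DIGITS] + "." + raw[-SCALE:])
```

Using `Decimal` rather than `float` keeps all six decimal places exact, which matches a Decimal(8,6) destination column.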

Resources