SSIS Derived Column - Text in Numeric Field is not converting

I'm importing thousands of CSV files into a SQL DB. They each have two columns: Date and Value. In some of the files, the Value column contains just a period (e.g. "."). I've tried to create a derived column that will handle any cell that contains a period with the following code:
FINDSTRING((DT_WSTR,1)[VALUE],".",1) != 0 ? NULL(DT_R8) : [VALUE]
But, when the package runs it gets the following error when it reaches the cell with the period in it:
The data conversion for column "VALUE" returned status value 2 and status text
"The value could not be converted because of a potential loss of data".
I'm guessing there might be an escape character that I'm missing in my FINDSTRING function but I can't seem to find what that may be. Does anyone have any thoughts on how I can get around this issue?

Trying to debug things like this is why I always advocate adding many Derived Column components to the Data Flow. It's impossible to debug that entire expression in one go. Instead, first find the position of the period and add that as a new column. Then you can feed that into the ternary operation, and bit by bit you can add data viewers to ensure you are seeing what you expect to see.
Personally, I'd take a different approach. It seems that you'd like to turn any value that is just a period into a NULL of type DT_R8.
Add a derived column, TrimmedValue, and use this expression to remove any leading/trailing whitespace:
RTRIM(LTRIM(Value))
Add a second Derived Column component; this time we'll add a column MenopausalValue, as it will remove the period. Use this expression:
(TrimmedValue == ".") ? NULL(DT_WSTR, 50) : TrimmedValue
Now, you can add your final Derived Column wherein we convert the string representation of Value to the floating point representation.
IsNull(MenopausalValue) ? NULL(DT_R8) : (DT_R8) MenopausalValue
If the above shows an error, then you need to apply the following version as I can never remember the evaluation sequence for ternary operations that change type.
(DT_R8) (IsNull(MenopausalValue) ? NULL(DT_R8) : (DT_R8) MenopausalValue)
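To see the end-to-end logic those three derived columns implement, here is a rough sketch in plain Scala (not SSIS expression syntax; the toValue name is purely illustrative): trim, treat a lone period as missing, otherwise convert to a double.
import scala.util.Try
// Rough sketch of what the three derived columns do end to end.
def toValue(raw: String): Option[Double] = {
  val trimmed = raw.trim                 // RTRIM(LTRIM(Value))
  if (trimmed == ".") None               // MenopausalValue -> NULL
  else Try(trimmed.toDouble).toOption    // (DT_R8) cast
}
toValue("  .  ") // None       -> NULL(DT_R8)
toValue("42.5")  // Some(42.5)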
Examples of breaking these operations into many steps for debugging purposes
https://stackoverflow.com/a/15176398/181965
https://stackoverflow.com/a/31123797/181965
https://stackoverflow.com/a/33023858/181965

You can do it like this:
TRIM(Value) == "." ? NULL(DT_R8) : (DT_R8)Value

How to add leading zeros in ADF data flow from the expression builder
For example: I have a column with the numeric value "000001", but it comes through as just 1 in the SQL DB. If I put the entire value in single quotes it comes through correctly, but I need a dynamic way of implementing this without hard-coding.
I agree with @Larnu's comments that even if we give 00001 to an int-type column it will come back as just 1.
So we either have to supply the values in single quotes ('00001') to use them like that, or import the incoming data as a string instead of an int.
As you are using an ADF data flow, if you want the value 00001 you can generate it using a derived column transformation on the SQL source. How you build it depends on your requirement, such as how the number of leading 0's varies, so adjust accordingly.
Sample demo:
concat('0000', toString(id))
Use that column as per your requirement; after that you can convert it back to the original integer value with toInteger().
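For what it's worth, the "dynamic" part the question asks about is just left-padding to a fixed width instead of hard-coding four zeros. A rough illustration of that intent in Scala (not ADF expression syntax; padLeft and the width of 6 are assumptions), which you would then translate into the data flow expression language:
// Pad a value with zeros on the left up to a target width.
def padLeft(value: Int, width: Int): String = {
  val digits = value.toString
  "0" * math.max(0, width - digits.length) + digits
}
padLeft(1, 6)       // "000001"
padLeft(12345, 6)   // "012345"
padLeft(1234567, 6) // "1234567" (already wider than the target, left as is)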

INSERT/UPDATE conditional split in SSIS working incorrectly

I looked for an answer for my specific issue before posting and didn't find anything. I have a conditional split in SSIS that is inserting and updating, except that it seems to be updating 250+ rows each time it runs, whether an update was made to the source or not. Insert works correctly. But it only works when I set "ignore error" on the conditional split; otherwise the "evaluated to NULL, but the "Conditional Split" requires a Boolean results" error shows up. Any idea on how I can fix this? My conditional split looks like this:
UPDATE = [Copy of ORDER_TYPE] != ORDER_TYPE || [Copy of WEEK] != WEEK || [Copy of GOAL] != GOAL || [Copy of WEEK_START] != WEEK_START || [Copy of WEEK_END] != WEEK_END || [Copy of DIVISION_DESC] != DIVISION_DESC || [Copy of SUB_ORDER_TYPE] != SUB_ORDER_TYPE
INSERT = ISNULL(ID) || ISNULL(WEEK) || ORDER_TYPE == ""
I followed this tutorial.
In a situation like this, it's impossible for us to debug what is happening, as we don't have access to your data, your package, or the results of the above boolean conditions.
What I do, when faced with a problem like this, is add one, possibly two, Derived Column components before the Conditional Split. The first I'd call DER Action Flags, as we'll generate the boolean conditions for the action we should take.
Add columns IsInsert and IsUpdate and then use the above expressions. Now connect your derived column to the Conditional Split and replace the two expressions to just use our new derived columns. Add a data viewer immediately before the split and you can verify whether your logic is sound.
Given the length of your UPDATE expression, I would break that into individual column evaluations in a derived column before the DER Action Flags I described above. Call it something like DER Compute Changed Flags to indicate we're computing whether each column has changed.
In this Derived Column component, you'll break out each column change check, i.e.
Changed_ORDER_TYPE [Copy of ORDER_TYPE] != ORDER_TYPE
Changed_WEEK [Copy of WEEK] != WEEK
That then simplifies the IsUpdate logic to Changed_ORDER_TYPE || Changed_WEEK...
Now that data viewer will show you the exact condition that is causing the change to be erroneously flagged. That distills the problem down to "these two inputs and this expression are not evaluating as expected" (and that is something we can figure out).
Based on your "ignore error" comment, I assume you have a condition with a NULL comparison that might not be covered by the referenced link.
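To make the NULL problem concrete, here is a rough sketch of a null-safe "has this column changed?" check in Scala (purely illustrative, not SSIS expression syntax; the changed helper is an assumption). The point is that a NULL on either side must still yield a definite true/false, which in SSIS expressions means guarding each comparison with ISNULL checks:
// Null-safe change detection: never returns "unknown".
def changed[T](oldValue: Option[T], newValue: Option[T]): Boolean =
  (oldValue, newValue) match {
    case (None, None)          => false  // both NULL: unchanged
    case (None, _) | (_, None) => true   // exactly one NULL: changed
    case (Some(a), Some(b))    => a != b // both present: compare values
  }
// A NULL WEEK no longer poisons the whole expression.
val isUpdate = Seq(
  changed(Option("STANDARD"), Option("RUSH")), // ORDER_TYPE changed -> true
  changed(Option.empty[Int], Option(202301))   // WEEK was NULL      -> true
).exists(identity)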
And since this is a series of comments converted to an answer, here is the resolution:
That worked. The data viewer showed that one of my input columns was converting as float, while the table datatype was int, so the values with decimals are different from those in the table, as they get converted to int. Fixed.
Future readers: verify your data types are consistent (double-click the connector lines out of a component and select Metadata), as data conversion rules might surprise you in unexpected ways.

scala readCsvFile behaviour

I'm using Flink to load a CSV file into a DataSet of POJOs, defined through a Scala case class, using the readCsvFile method, and I have a problem that I cannot solve.
When there is a record in the CSV with a format error in any of its fields, it is discarded, and I assume that the only way to keep those records is to type all the fields as String and do the validations myself.
The problem is that if the last field after the delimiter is empty, the record is discarded by default, I think because it is considered not to have the expected number of fields, and it is not possible to handle this record error. If the empty value is in any of the earlier fields, there is no problem.
Example
field1|field2|field3
a||c
a|b|
In this example, the first record is returned by the readCsvFile method but not the second.
Is this behaviour right? And is there any workaround to get the record?
Thanks
Case classes and tuples in Flink do not support null values. Therefore, a||c is invalid if the empty field is not a String. I recommend using the RowCsvInputFormat in this case. It supports nulls, and the generic rows can be converted to any other class in a subsequent map operator.
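A rough sketch of how that might be wired up with the legacy batch DataSet API (the file path, the Record case class, and the delimiters are assumptions, and RowCsvInputFormat constructor overloads vary a little between Flink versions, so treat this as an outline rather than exact API):
import org.apache.flink.api.scala._
import org.apache.flink.api.java.io.RowCsvInputFormat
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.typeutils.RowTypeInfo
import org.apache.flink.core.fs.Path
import org.apache.flink.types.Row
// Stand-in for the original case class.
case class Record(field1: String, field2: String, field3: String)
object ReadCsvAsRows {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // Read every field as a string so malformed or empty values survive parsing.
    val fieldTypes: Array[TypeInformation[_]] = Array(
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.STRING_TYPE_INFO,
      BasicTypeInfo.STRING_TYPE_INFO)
    val format = new RowCsvInputFormat(new Path("/path/to/file.csv"), fieldTypes, "\n", "|")
    format.setSkipFirstLineAsHeader(true)
    // Tell the Scala API what a Row of three strings looks like.
    implicit val rowTypeInfo: RowTypeInfo = new RowTypeInfo(fieldTypes: _*)
    val rows: DataSet[Row] = env.createInput(format)
    // Convert the generic rows to the case class, handling nulls ourselves.
    val records: DataSet[Record] = rows.map { row =>
      def field(i: Int): String = Option(row.getField(i)).map(_.toString).getOrElse("")
      Record(field(0), field(1), field(2))
    }
    records.print()
  }
}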
The problem is that, as you say, if the field is a String the record should be valid even if it is null, and this doesn't happen when the null value is in the last field. The behaviour is different depending on the position.
I will also try RowCsvInputFormat as you recommend.
Thanks

SSIS Package - Derived Column Conditional Expression for Date Fields with Zero

I am relatively new to SSIS and its data types. I have successfully created a Data Flow task that imports data from a comma-delimited .txt flat file to SQL Server. An error occurs when running the task, at the point where a date field in the .txt file has 0.
For a Derived Column expression to convert the date fields with 0 to Null, I have come up with the following so far...
[Latest Bill Due Date]==0 ? NULL(DT_DATE) : (DT_DATE)[Latest Bill Due Date]
...but the logic isn't accepted and the error message appears:
The data types "DT_WSTR" and "DT_I4" are incompatible for binary operator "==". The operand types could not be implicitly cast into compatible types for the operation. To perform this operation, one or both operands need to be explicitly cast with a cast operator.
Thanks in advance for any direction.
I had a similar problem: a text file had a 00000000 value that I had to convert to null in a datetime column. What ended up working for me was setting null as the default value for that column in the table and also adding a Script Component as a Transformation. Add an output column, something like 'VerifyNullDateVar', and inside the script do something like:
if (Row.DATEVAR == 0)
{
    // The input is the all-zero placeholder rather than a real date.
    Row.VerifyNullDateVar = 2;
}
else
{
    // The input holds an actual date value.
    Row.VerifyNullDateVar = 1;
}
DATEVAR is the input column you get from the text file. After that, split the flow on the value of VerifyNullDateVar:
VerifyNullDateVar == 1
VerifyNullDateVar == 2
Finally, you need to set up 2 OLE DB Destinations: one for when you save a date value in the table, and the other for when you don't save anything in that column, so it gets the default null value.
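To illustrate just the all-zero-date idea outside SSIS, here is a rough Scala sketch (illustrative only; the parseDate name and the yyyyMMdd format are assumptions): the sentinel collapses to "no value", anything else is parsed, and parse failures also become "no value" instead of failing the row.
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try
val fmt = DateTimeFormatter.ofPattern("yyyyMMdd")
// Treat blank or all-zero input as a missing date.
def parseDate(raw: String): Option[LocalDate] = {
  val v = raw.trim
  if (v.isEmpty || v.forall(_ == '0')) None
  else Try(LocalDate.parse(v, fmt)).toOption
}
parseDate("00000000") // None -> store SQL NULL
parseDate("20230415") // Some(2023-04-15)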

SSIS Flat File with Numeric DataType with Spaces

I'm using SSIS to load a fixed length Flat File into SQL.
I have a weight field that has been giving me trouble all day.
It has a length of 8 with 6 DECIMAL POSITIONS IMPLIED (99V990099).
The problem I'm having is when it isn't populated and has 8 spaces.
Everything I try gets an error:
"Invalid character value for cast specification."
OR
"Conversion failed because the data value overflowed the specified type.".
OR
Data conversion failed.
The data conversion for column "REL_WEIGHT" returned status value 2 and status text
"The value could not be converted because of a potential loss of data.".
I've tried declaring it as DT_STR and DT_NUMERIC.
I've tried many variations of:
TRIM([REL_WEIGHT])=="" ? (DT_STR,8,1252)NULL(DT_STR,8,1252) : REL_WEIGHT
ISNULL([REL_WEIGHT]) || TRIM([REL_WEIGHT]) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)[REL_WEIGHT]
TRIM(REL_WEIGHT) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)REL_WEIGHT
But nothing seems to work.
Please someone out there have the fix for this!
I think you may be running afoul of the following point, explained nicely at http://vsteamsystemcentral.com/cs21/blogs/applied_business_intelligence/archive/2009/02/01/ssis-expression-language-and-the-derived-column-transformation.aspx:
You can add a DT_STR Cast statement to the expression for the MiddleName, but it doesn't change the Data Type. Why can't we change the data type for existing columns in the Derived Column transformation? We're replacing the values, not changing the data type. Is it impossible to change the data type of an existing column in the Derived Column? Let's put it this way: It is not possible to convert the data type of a column when you are merely replacing the value. You can, however, accomplish the same goal by creating a new column in the Data Flow.
I've solved this on past occasions by loading the data from the flat file as strings, and then deriving a new column in a Derived Column transformation which is of numeric type. You can then perform the appropriate trimming, validation, casting, etc. in the SSIS expression for that new column in the transformation.
Here, I found an example SSIS expression I used at one point to derive a time value from a 4-digit string:
(ISNULL(Last_Update_Time__orig) || TRIM(Last_Update_Time__orig) == "") ? NULL(DT_DBTIME2,0) : (DT_DBTIME2,0)(SUBSTRING(TRIM(Last_Update_Time__orig),1,2)+":"+SUBSTRING(TRIM(Last_Update_Time__orig),3,2)+":00")
There has to be a better way to do it, but I found a way that works.
Create a Derived Column Expression:
TRIM(REL_WEIGHT) == "" ? (DT_STR,9,1252)"0.0000000" : (DT_STR,9,1252)(LEFT(REL_WEIGHT,2) + "." + RIGHT(REL_WEIGHT,6))
Then create a Data Conversion transformation to change it to numeric with scale 6.
And then map the [Copy of NewField] to my SQL table field set up as Decimal(8,6).
I don't know what the performance of that will be when loading a million records; probably not the best. If someone knows how to do this in a better way performance-wise, please let me know.
Thanks,
Jeff
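For readers who mainly want the implied-decimal logic itself, here is a rough sketch in Scala (illustrative only; the parseWeight name is an assumption): a populated field is taken to be exactly 8 digits with the decimal point implied after the first two, and an all-space field becomes "no value".
// 2 integer digits, 6 implied decimal digits.
def parseWeight(raw: String): Option[BigDecimal] = {
  val v = raw.trim
  if (v.isEmpty) None                                // 8 spaces -> NULL
  else Some(BigDecimal(v.take(2) + "." + v.drop(2))) // "12345678" -> 12.345678
}
parseWeight("        ") // None
parseWeight("12345678") // Some(12.345678)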
