Data Conversion text to numeric in SSIS is removing characters - sql-server

I am facing a strange issue while using SSIS "Data Conversion component" to convert string to decimal datatype. I use SSIS 2016.
The source data input has values of mixed data types- string, integer, decimal and is defined as varchar in the flat file source. The target data type expected is numeric. When explicit type conversion happens from string to decimal, we expect the alphanumeric values to get rejected to error table and only the numeric values to pass through.
Instead, we are seeing some alphanumeric values shedding the characters in the value and passing through successfully with no error.
Examples: Value "3,5" converted to 35
Value "11+" converted to 11
We do not have control over source data and will not be able to replace char data before passing data into Data conversion component.
We have tried the below steps as a workaround and it has worked.
i.e,
First Data Conversion from DT_STR to DT_NUMERIC
Capture error rows that fail the above conversion
Second Data Conversion from DT_NUMERIC to DT_DECIMAL
But as the source data is not reliable, we may have to apply this workaround wherever there are numeric fields (int types & deicmals) which is not a friendly solution.
So checking with you all to understand if there is an easier and better solution tried out by anyone.

I did not expect this result, but I tried an expression task and it worked for DT_DECIMAL:
(DT_DECIMAL,1)"11+" -- evaluates to 11.0
But it does not work for DT_NUMERIC. SSIS won't allow a direct numeric result, but it can be nested inside a cast to DT_DECIMAL. Just to demonstrate that, in an expression task even this "numerically valid" cast would not be permitted, because the output simply can't be of type DT_NUMERIC:
(DT_NUMERIC, 3, 0)123
But this is permitted:
(DT_DECIMAL,0)((DT_NUMERIC, 3, 0)123)
So as long as you are happy to specify a precision and scale big enough to hold your data during the "validity" check done by DT_NUMERIC, and then cast it from there to DT_DECIMAL, all in a derived column transform, then DT_NUMERIC seems to enforce the strict semantics you want.
SSIS allows this:
(DT_DECIMAL,0)((DT_NUMERIC, 2, 0)"11")
But not either of these:
(DT_DECIMAL,0)((DT_NUMERIC, 2, 0)"11+")
(DT_DECIMAL,0)((DT_NUMERIC, 2, 0)"3,5")

#billinkc Sorry for not responding to you earlier.
We are working under some restrictions:
(1) All we want to do is capture datatype issues in input data, so we wanted to harness the capability of SSIS Data Conversion Component in SSIS.
(2) DBA doesn't want us to use SQL for type conversions, so we are required to do these conversions between flat file source and flat file destination using SSIS.
(3) We are required to capture the type conversion errors at every step of conversion into an error output file with error column name and error description, to be used later. So we cannot remove char data in the field before passing it to Data Conversion component.
#allmhuran - We have used Derived column task before Data Conversion component to replace unnecessary characters in one of the other fields, but using the same for type conversion makes achieving (3) difficult. Because error output from Derived column task and Data Conversion component cannot be redirected to the same error output file.
We can completely ignore Data Conversion component and use only Derived column task to do all type conversions, whether single or nested. I am trying this and the error descriptions do not always look good, but the cons of the former method can be overcome. I will try this out!

Related

ORA-22835: Buffer too small and ORA-25137: Data value out of range

We are using a software that has limited Oracle capabilities. I need to filter through a CLOB field by making sure it has a specific value. Normally, outside of this software I would do something like:
DBMS_LOB.SUBSTR(t.new_value) = 'Y'
However, this isn't supported so I'm attempting to use CAST instead. I've tried many different attempts but so far these are what I found:
The software has a built-in query checker/validator and these are the ones it shows as invalid:
DBMS_LOB.SUBSTR(t.new_value)
CAST(t.new_value AS VARCHAR2(10))
CAST(t.new_value AS NVARCHAR2(10))
However, the validator does accept these:
CAST(t.new_value AS VARCHAR(10))
CAST(t.new_value AS NVARCHAR(10))
CAST(t.new_value AS CHAR(10))
Unfortunately, even though the validator lets these ones go through, when running the query to fetch data, I get ORA-22835: Buffer too small when using VARCHAR or NVARCHAR. And I get ORA-25137: Data value out of range when using CHAR.
Are there other ways I could try to check that my CLOB field has a specific value when filtering the data? If not, how do I fix my current issues?
The error you're getting indicates that Oracle is trying to apply the CAST(t.new_value AS VARCHAR(10)) to a row where new_value has more than 10 characters. That makes sense given your description that new_value is a generic audit field that has values from a large number of different tables with a variety of data lengths. Given that, you'd need to structure the query in a way that forces the optimizer to reduce the set of rows you're applying the cast to down to just those where new_value has just a single character before applying the cast.
Not knowing what sort of scope the software you're using provides for structuring your code, I'm not sure what options you have there. Be aware that depending on how robust you need this, the optimizer has quite a bit of flexibility to choose to apply predicates and functions on the projection in an arbitrary order. So even if you find an approach that works once, it may stop working in the future when statistics change or the database is upgraded and Oracle decides to choose a different plan.
Using this as sample data
create table tab1(col clob);
insert into tab1(col) values (rpad('x',3000,'y'));
You need to use dbms_lob.substr(col,1) to get the first character (from the default offset= 1)
select dbms_lob.substr(col,1) from tab1;
DBMS_LOB.SUBSTR(COL,1)
----------------------
x
Note that the default amount (= length) of the substring is 32767 so using only DBMS_LOB.SUBSTR(COL) will return more than you expects.
CAST for CLOB does not cut the string to the casted length, but (as you observes) returns the exception ORA-25137: Data value out of range if the original string is longert that the casted length.
As documented for the CAST statement
CAST does not directly support any of the LOB data types. When you use CAST to convert a CLOB value into a character data type or a BLOB value into the RAW data type, the database implicitly converts the LOB value to character or raw data and then explicitly casts the resulting value into the target data type. If the resulting value is larger than the target type, then the database returns an error.

SSIS Data conversion transformation

in SSIS I read a csv file with column format - for example 1.25 or 2.50.
In the datatransformation task I transform into decimal dt_decimal scala 2.
In the datatable the column has the format decimal(18,2).
The data will be stored with 125.00 or 25.00 instead of 1.25 and 2.50.
What do I have to adjust?
Possible Issue causes
(1) Data type mismatch
I think that a similar issue is caused by data type mismatch between source and destination, or data transformation output and Destination.
(2) Numeric separators
Another cause may be if the numeric values contains a comma , as formatting such as thousands commas 1,000,000 and decimal separator . like 1.02.
Possible solutions
(1) Specify data type in source
To prevent this issue to be caused by data transformations and if your source data is formatted well. Then there is no need for Data Transformation. Inside the Flat File Connection Manager editor. GoTo Advanced Tab, Select the column that contains decimal and change its data type (try DT_NUMERIC and DT_DECIMAL) and precision and scale property.
If the issue still happening, be sure that both input and output has same metadata (precision and scale).
(2) Derived Column
Or you should use a Derived Column Transformation with a similar expression:
(DT_NUMERIC,18,2)[COLUMN]
(3) Replace Separator using derived column
You can replace separator using a derived column
(DT_NUMERIC,18,2)REPLACE([COLUMN], ",", ".")

SSIS Data Conversion Error despite using Data Conversion and accurate destination Datatype

I'm getting the following error when I run my package:
[Data Conversion [2]] Error: Data conversion failed while converting column "FieldName" (373) to column "Copy of FieldName" (110).
The conversion returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
However, I don't understand why. I have double checked the inputs and outputs to validate that they make sense and are what I expect. I've also checked all the raw data in the column in my excel file.
My package setup:
Excel Datasource feeding Data Conversion then Derived Column and finally output to Ole DB Destination (sql)
What I've done:
I opened the advanced editor on the data conversion. I confirmed that the incoming data type is DT_STR which can be expected since the source datatype wasn't correctly identified. It is actually a date in my excel file. I confirmed that the data conversion output column is database timestamp [DT_DBTIMESTAMP] as I have set it to be. My destination table has a DateTime datatype for FieldName.
What am I missing?
I think this is a date format issue, check that column does not contains empty strings or NULL values.
Also check that values are similar to yyyy-MM-dd HH:mm:ss date format.
To read more about SSIS data types check the following article:
Integration Services Data Types
Also when converting string values to datetime, if values are well formated, just map the source column to the destination without Data conversion Transformation and they will be implicitly converted

RODBC sqlUpdate failed: Invalid character value for cast specification

Here is a screenshot of the console output after executing the sqlUpdate() function. As you can see, it fails at the first row of the data frame.
I suspect that the problem has to do with the introduced_at list component of my Bills.df data frame.
Bills.df$introduced_at <- as.Date(Bills.df$introduced_at, "%Y-%m-%d")
Before this statement, introduced_at is of type character. But this changes it to double. I understand that this may have something to do with the R internals, since it uses C data types to store various R data structures. Maybe it's the wrong expression for data type conversion?
Additionally, if this is a Date type issue, I'd also like to know how to properly convert the updated_at type to Datetime to prevent a similar issue from occurring. The structure of updated_at is as follows: 2015-09-01T10:19:13-04:00. I'm not comfortable with as.POSIXlt and asPOSIXct so I'm not sure which function is appropriate.
"updated_at": "2015-09-01T10:19:13-04:00"

SSIS Flat File with Numeric DataType with Spaces

I'm using SSIS to load a fixed length Flat File into SQL.
I have a weight field that has been giving me trouble all day.
It has a length of 8 with 6 DECIMAL POSITIONS IMPLIED (99V990099).
The problem i'm having is when it isn't populated and has 8 spaces.
Everything i try gets an error:
"Invalid character value for cast specification"."
OR
"Conversion failed because the data value overflowed the specified type.".
OR
Data conversion failed.
The data conversion for column "REL_WEIGHT" returned status value 2 and status text
"The value could not be converted because of a potential loss of data.".
I've tried declaring it as DT_String & DT_Numeric.
I've tried many variations of:
TRIM([REL_WEIGHT])=="" ? (DT_STR,8,1252)NULL(DT_STR,8,1252) : REL_WEIGHT
ISNULL([REL_WEIGHT]) || TRIM([REL_WEIGHT]) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)[REL_WEIGHT]
TRIM(REL_WEIGHT) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)REL_WEIGHT
But nothing seems to work.
Please someone out there have the fix for this!
I think you may be running afoul of the following point, explained nicely at http://vsteamsystemcentral.com/cs21/blogs/applied_business_intelligence/archive/2009/02/01/ssis-expression-language-and-the-derived-column-transformation.aspx:
You can add a DT_STR Cast statement to the expression for the MiddleName, but it doesn't change the Data Type. Why can't we change the data type for existing columns in the Derived Column transformation? We're replacing the values, not changing the data type. Is it impossible to change the data type of an existing column in the Derived Column? Let's put it this way: It is not possible to convert the data type of a column when you are merely replacing the value. You can, however, accomplish the same goal by creating a new column in the Data Flow.
I've solved this on past occasions by loading the data from the flat file as strings, and then deriving a new column in a Derived Column transformation which is of numeric type. You can then perform the appropriate trimming, validation, casting, etc. in the SSIS expression for that new column in the transformation.
Here, I found an example SSIS expression I used at one point to derive a time value from a 4-digit string:
(ISNULL(Last_Update_Time__orig) || TRIM(Last_Update_Time__orig) == "") ? NULL(DT_DBTIME2,0) : (DT_DBTIME2,0)(SUBSTRING(TRIM(Last_Update_Time__orig),1,2)+":"+SUBSTRING(TRIM(Last_Update_Time__orig),3,2)+":00")
There has to be a better way to do it, But i found a way that works.
Create a Derived Column Expression:
TRIM(REL_WEIGHT) == "" ? (DT_STR,9,1252)"0.0000000" : (DT_STR,9,1252)(LEFT(REL_WEIGHT,2) + "." + RIGHT(REL_WEIGHT,6))
THEN Create a Data Conversion Task to change it to Numeric and set scale to 6.
And then Map the [Copy of NewField] to my SQL table field set up as Decimal(8,6).
I don't know how the performance will be of that when loading a million records, probably not the best. If someone knows how to do this in a better way performance wise please let me know.
Thanks,
Jeff

Resources