I am creating a package in SSIS and want to convert a file with one large column into multiple columns.
I have a table containing several rows with a single column of raw data. The data was copied from a Notepad file, and each row contains pipe delimiters to separate the columns, but because it came from Notepad, each row was copied as one large column. I want to split each row into multiple columns based on their start/end positions.
I tried using SSIS Derived Column Transformation with the SUBSTRING function, but the Data Type is automatically populated as text stream[DT_TEXT], and I get the following error:
Error at [Derived Column[113]]: The function "SUBSTRING" does not support the data type "DT_TEXT" for parameter number 1. The type of the parameter could not be implicitly cast into a compatible type for the function. To perform this operation, the operand needs to be explicitly cast with a cast operator.
Error at [Derived Column[113]]: Evaluating function "SUBSTRING" failed with error code 0xC0047089.
Error at [Derived Column[113]]: Computing the expression "SUBSTRING([RawData],1,5)" failed with error code 0xC00470C5. The expression may have errors, such as divide by zero, that cannot be detected at parse time, or there may be an out-of-memory error.
Error at [Derived Column[113]]: The expression "SUBSTRING([RawData], 1,5)" on "Derived Column.Outputs[Derived Column Output].Columns[Derived Column 1]" is not valid.
Error at [Derived Column[113]]: Failed to set property "Expression" on "Derived Column.Outputs[Derived Column Output].Columns[Derived Column 1]".
When I review other Derived Column Transformation illustrations utilizing SUBSTRING with a file containing individual columns, I notice the Data Type is shown as DT_WSTR.
Do I need to convert to this Data Type? If so, how do I explicitly cast DT_TEXT data types to DT_WSTR with a cast operator in SSIS Derived Column Transformation?
Otherwise, how else could I handle this conversion?
Derived Column Name: EmployerNo
Derived Column: Replace 'RawData'
Expression: SUBSTRING( [RawData], 1, 5 )
Data Type: text stream[DT_TEXT]
I expect the RawData column to be split up (converted) into 8 different columns based on their start and end positions.
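To make the expected result concrete, here is a minimal Python sketch (not SSIS code) that splits one RawData row by fixed positions. It counts start positions from 1, the way SUBSTRING does. Only EmployerNo at position 1 with length 5 comes from the question. The other column names, their positions, and the sample row are hypothetical.

```python
# Hypothetical layout: (output column, 1-based start, length).
# EmployerNo (1, 5) is from the question; FieldB and FieldC are made up.
# Positions 6 and 10 hold the pipe delimiters, so they are skipped.
LAYOUT = [
    ("EmployerNo", 1, 5),
    ("FieldB", 7, 3),
    ("FieldC", 11, 4),
]


def substring_1based(value: str, start: int, length: int) -> str:
    """Return up to `length` characters starting at 1-based position `start`."""
    return value[start - 1:start - 1 + length]


def split_fixed(raw: str, layout=LAYOUT) -> dict:
    """Split one RawData row into named columns by position."""
    return {name: substring_1based(raw, start, length) for name, start, length in layout}


print(split_fixed("12345|ABC|2024"))
```

In the package, each LAYOUT entry would correspond to one Derived Column output. If every field is reliably separated by "|", splitting on the delimiter may be simpler than tracking positions.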
Referring to the SUBSTRING (SSIS Expression) documentation:
Returns the part of a character expression that starts at the specified position and has the specified length.
You have to convert the DT_TEXT column to DT_STR/DT_WSTR before using the SUBSTRING function. One way to do this is in a Script Component, with a function similar to this one:
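// At the top of ScriptMain.cs
using System;
using System.Text;
using Microsoft.SqlServer.Dts.Pipeline; // BlobColumn

// Reads a BLOB input column into a .NET string. DT_TEXT holds code-page bytes,
// so decode with the column's code page (1252 assumed here); Encoding.Unicode
// is only correct for DT_NTEXT columns.
string BlobColumnToString(BlobColumn blobColumn)
{
    if (blobColumn.IsNull)
        return string.Empty;
    var blobLength = Convert.ToInt32(blobColumn.Length);
    var blobData = blobColumn.GetBlobData(0, blobLength);
    var stringData = Encoding.GetEncoding(1252).GetString(blobData);
    return stringData;
}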
Or, if the DT_TEXT values never exceed the length used in the cast (4,000 here; DT_STR allows up to 8,000), try casting directly in the SSIS expression. The cast syntax is (DT_STR, length, code page):
SUBSTRING( (DT_STR,4000,1252)[RawData], 1, 5 )
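The encoding choice in a Script Component helper matters because DT_TEXT stores bytes in a code page, while DT_NTEXT stores UTF-16LE. The Python sketch below illustrates this outside SSIS, assuming code page 1252. It shows that decoding code-page bytes as UTF-16 does not give back the original text.

```python
# Bytes as a DT_TEXT column with code page 1252 would hold them (assumption).
raw_bytes = "Café|123".encode("cp1252")

# Correct: decode with the same code page the bytes were written in.
as_code_page = raw_bytes.decode("cp1252")

# Wrong for DT_TEXT: treating the bytes as UTF-16LE, which is what DT_NTEXT uses.
as_utf16 = raw_bytes.decode("utf-16-le", errors="replace")

print(as_code_page)
print(as_utf16)
```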
Related
I have an SSIS project in which a Flat File Source reads a CSV file. It contains an Order Item Id field formatted as a string surrounded by quotes, like "347262171". I want to convert it to a numeric value so I can use it as an index, but everything I try gives me this result:
Data conversion failed. The data conversion for column "Order Item ID" returned status value 2 and status text "The value could not be converted because of a potential loss of data."
What would be the easiest workaround for this?
You can add a Derived Column Transformation (DCT) to the data flow where you add an expression that removes quotes from the value:
REPLACE( [ID FIELD], "\"", "" )
where ID FIELD is the column with the ID value in your data. Add this as a new Unicode string column (DT_WSTR, the SSIS counterpart of NVARCHAR) to your data flow, e.g. STRIPPED_ID_FIELD.
Then add a second DCT in which you cast this value to a number with (DT_NUMERIC,10,0)[STRIPPED_ID_FIELD], and name the new column NUM_ID_FIELD.
I'd do this in a second, separate DCT so that you can add an error output to it and redirect that output to a Recordset Destination. Then add a Data Viewer to the error output to see which records are wrong, for instance ID fields that contain a letter you weren't expecting.
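The two-step idea can be sketched in Python. First strip the quotes, then cast, and keep failed rows aside the way an error output would. This illustrates the logic only; it is not SSIS code. The NUMERIC(10,0) check mirrors the cast above, and the sample values are made up.

```python
from decimal import Decimal, InvalidOperation


def strip_quotes(value: str) -> str:
    # Step 1, like the REPLACE expression: remove every double quote.
    return value.replace('"', "")


def convert_ids(raw_ids):
    """Return (converted, rejected); rejected plays the role of the error output."""
    converted, rejected = [], []
    for raw in raw_ids:
        stripped = strip_quotes(raw)  # first Derived Column
        try:
            number = Decimal(stripped)  # second step: the cast
            # NUMERIC(10,0): a whole number with at most 10 digits
            fits = number == number.to_integral_value() and abs(number) < 10 ** 10
        except InvalidOperation:
            fits = False
        if fits:
            converted.append(int(number))
        else:
            rejected.append(raw)
    return converted, rejected


good, bad = convert_ids(['"347262171"', '"12A45"', '""'])
print(good, bad)
```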
If you are using a flat file, you can remove the double quotes in the Flat File Connection Manager by setting Text qualifier to ".
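A text qualifier works like the quote character of a CSV parser. As an analogy only (Python's csv module, not SSIS), setting the quote character to a double quote returns the field without its quotes, so it converts to a number directly. The sample data is made up.

```python
import csv
import io

# Made-up sample file with a quoted ID column.
data = 'Order Item ID,Qty\n"347262171",2\n'

# quotechar plays the role of the Text qualifier setting.
rows = list(csv.reader(io.StringIO(data), quotechar='"'))

header, first = rows[0], rows[1]
order_item_id = int(first[0])  # quotes were already removed by the parser
print(header, order_item_id)
```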
In the source CSV file, the DailyTurnover column holds integers, but some rows contain \N values, which correspond to NULL.
The question is: how can I convert these \N values to NULL when loading them into a destination column of int data type with SSIS?
I would handle this by adding a derived column that checks for \N and replaces it with a NULL int.
After your source component, add a Derived Column transformation. Set the "Derived Column" option to Replace 'DailyTurnover', and enter this expression (untested):
[DailyTurnover] == "\\N" ? NULL(DT_WSTR, 50) : [DailyTurnover]
Then map the derived column to the destination.
EDIT: DT_I4 was replaced with DT_WSTR in the expression above, based on error messages the OP received. NULL(DT_WSTR, n) takes a length, so change 50 to the length of the DailyTurnover column.
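Here is the logic of that expression, plus the later conversion to int, as a Python sketch (illustration only). In the SSIS expression, "\\N" is an escaped backslash followed by N. That is the same two characters as the raw string r"\N" below. The sample values are made up.

```python
from typing import Optional

NULL_MARKER = r"\N"  # two characters: a backslash followed by N


def parse_daily_turnover(value: str) -> Optional[int]:
    """Return None (NULL) for the \\N marker, otherwise the integer value."""
    if value == NULL_MARKER:
        return None
    return int(value)


values = ["120", r"\N", "75"]
parsed = [parse_daily_turnover(v) for v in values]
print(parsed)
```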
I am relatively new to SSIS and its data types. I have successfully created a Data Flow task that imports data from a comma-delimited .txt flat file to SQL Server. An error occurs when running the task, at the point where a date field in the .txt file has 0.
For a Derived Column expression to convert the date fields with 0 to Null, I have come up with the following so far...
[Latest Bill Due Date]==0 ? NULL(DT_DATE) : (DT_DATE)[Latest Bill Due Date]
...but the logic isn't accepted and the error message appears:
The data types "DT_WSTR" and "DT_I4" are incompatible for binary operator "==". The operand types could not be implicitly cast into compatible types for the operation. To perform this operation, one or both operands need to be explicitly cast with a cast operator.
Thanks in advance for any direction.
I had a similar problem: a text file contained 00000000 values that had to become NULL in a datetime column. What ended up working for me was creating the table with NULL as the column's default value and adding a Script Component as a transformation. Add an output column such as 'VerifyNullDateVar', and inside the script do something like this:
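// Script Component (transformation): flag rows whose date value is 0.
// Assumes DATEVAR arrives as a numeric input column.
if (Row.DATEVAR == 0)
{
    // No real date: route to the destination that leaves the date column unmapped
    Row.VerifyNullDateVar = 2;
}
else
{
    // An actual date: route to the destination that writes the date
    Row.VerifyNullDateVar = 1;
}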
DATEVAR is the input column you get from the text file. After that, use a Conditional Split to route rows on the value of VerifyNullDateVar:
VerifyNullDateVar == 1
VerifyNullDateVar == 2
Finally, set up two OLE DB Destinations. The first receives rows with a date value to save in the table. The second receives rows where the date column is not mapped, so it gets the default NULL value.
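The flag-and-route approach can be sketched in Python to show the row flow (illustration only, not SSIS code). It assumes the dates arrive as yyyymmdd strings. The 00000000 placeholder suggests this format, but the answer does not state it. Rows flagged 2 leave the date out, so the table default (NULL) applies.

```python
from datetime import datetime


def route_rows(rows):
    """Flag each row like the Script Component, then split it between two destinations."""
    with_date, without_date = [], []
    for row in rows:
        raw = row["DATEVAR"]
        if raw.strip("0") == "":  # "0", "00000000", or empty: no real date
            row["VerifyNullDateVar"] = 2
            del row["DATEVAR"]  # not mapped, so the column default (NULL) is used
            without_date.append(row)
        else:
            row["VerifyNullDateVar"] = 1
            row["DATEVAR"] = datetime.strptime(raw, "%Y%m%d").date()  # assumed format
            with_date.append(row)
    return with_date, without_date


dated, undated = route_rows([{"DATEVAR": "20240131"}, {"DATEVAR": "00000000"}])
print(dated, undated)
```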
I have a column named Paid_Date in my table that holds a nonstandard date value, with a colon between the date and time parts (e.g. '04MAY2015:00:00:00').
In order to exclude that colon, I have used an expression
((DT_DBTIMESTAMP)SUBSTRING(PAID_DATE,1,9))
in my Derived Column transformation.
When I try running this, I'm getting the following error.
Expression used in my transformation: (DT_DBTIMESTAMP)SUBSTRING(PAID_DATE,1,9)
Source column data type : varchar
Source column Value: 04MAY2015:00:00:00
Error: [Derived Column [1613]] Error: An error occurred while attempting to perform a type cast.
Detailed Error:[Derived Column [1613]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR.
The "component "Derived Column" (1613)" failed because error code 0xC0049064 occurred,
and the error row disposition on "output column "PaidDate" (1954)" specifies failure on error.
An error occurred on the specified object of the specified component.
There may be error messages posted before this with more information about the failure.
I have even tried to use the following expression:
(DT_DBTIMESTAMP)((DT_STR,50,1252)SUBSTRING(PAID_DATE,1,9))
Is it not possible to convert a string after a SUBSTRING operation?
Casting a string to DT_DBTIMESTAMP requires a specific string format:
yyyy-mm-dd hh:mm:ss[.fff]
You'll need to transform the string to the standard format before the cast will work.
Extract the month, and use a Lookup transformation or nested conditional (ternary) operators to change it to a two-digit numeric string. Then assemble the parts into a string you can cast to DT_DBTIMESTAMP.
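Those steps can be sketched in Python for the sample value from the question. This is an illustration only: a Python dict stands in for the month lookup, and the fixed positions assume every value looks like 04MAY2015:00:00:00. In SSIS expression terms, the day, month, and year would be SUBSTRING(PAID_DATE,1,2), SUBSTRING(PAID_DATE,3,3), and SUBSTRING(PAID_DATE,6,4).

```python
from datetime import datetime

MONTHS = {
    "JAN": "01", "FEB": "02", "MAR": "03", "APR": "04",
    "MAY": "05", "JUN": "06", "JUL": "07", "AUG": "08",
    "SEP": "09", "OCT": "10", "NOV": "11", "DEC": "12",
}


def to_timestamp_string(value: str) -> str:
    """Rearrange 'ddMONyyyy:hh:mm:ss' into 'yyyy-mm-dd hh:mm:ss'."""
    day = value[0:2]                    # SUBSTRING(PAID_DATE, 1, 2)
    month = MONTHS[value[2:5].upper()]  # SUBSTRING(PAID_DATE, 3, 3) plus lookup
    year = value[5:9]                   # SUBSTRING(PAID_DATE, 6, 4)
    time_part = value[10:18]            # SUBSTRING(PAID_DATE, 11, 8), after the colon
    return f"{year}-{month}-{day} {time_part}"


result = to_timestamp_string("04MAY2015:00:00:00")
print(result)
# The rearranged string now parses with the yyyy-mm-dd hh:mm:ss layout.
print(datetime.strptime(result, "%Y-%m-%d %H:%M:%S"))
```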
I'm using SSIS to load a fixed length Flat File into SQL.
I have a weight field that has been giving me trouble all day.
It has a length of 8 with 6 DECIMAL POSITIONS IMPLIED (99V990099).
The problem I'm having is when it isn't populated and contains 8 spaces.
Everything I try gets an error:
"Invalid character value for cast specification".
OR
"Conversion failed because the data value overflowed the specified type.".
OR
Data conversion failed.
The data conversion for column "REL_WEIGHT" returned status value 2 and status text
"The value could not be converted because of a potential loss of data.".
I've tried declaring it as DT_STR and as DT_NUMERIC.
I've tried many variations of:
TRIM([REL_WEIGHT])=="" ? (DT_STR,8,1252)NULL(DT_STR,8,1252) : REL_WEIGHT
ISNULL([REL_WEIGHT]) || TRIM([REL_WEIGHT]) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)[REL_WEIGHT]
TRIM(REL_WEIGHT) == "" ? (DT_NUMERIC,8,6)0 : (DT_NUMERIC,8,6)REL_WEIGHT
But nothing seems to work.
Does anyone out there have a fix for this?
I think you may be running afoul of the following point, explained nicely at http://vsteamsystemcentral.com/cs21/blogs/applied_business_intelligence/archive/2009/02/01/ssis-expression-language-and-the-derived-column-transformation.aspx:
You can add a DT_STR Cast statement to the expression for the MiddleName, but it doesn't change the Data Type. Why can't we change the data type for existing columns in the Derived Column transformation? We're replacing the values, not changing the data type. Is it impossible to change the data type of an existing column in the Derived Column? Let's put it this way: It is not possible to convert the data type of a column when you are merely replacing the value. You can, however, accomplish the same goal by creating a new column in the Data Flow.
I've solved this on past occasions by loading the data from the flat file as strings, and then deriving a new column in a Derived Column transformation which is of numeric type. You can then perform the appropriate trimming, validation, casting, etc. in the SSIS expression for that new column in the transformation.
Here, I found an example SSIS expression I used at one point to derive a time value from a 4-digit string:
(ISNULL(Last_Update_Time__orig) || TRIM(Last_Update_Time__orig) == "") ? NULL(DT_DBTIME2,0) : (DT_DBTIME2,0)(SUBSTRING(TRIM(Last_Update_Time__orig),1,2)+":"+SUBSTRING(TRIM(Last_Update_Time__orig),3,2)+":00")
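Here is the same null-or-reformat logic as a Python sketch for readability (illustration only; the sample inputs are made up). SSIS TRIM removes leading and trailing spaces but not tabs, so the sketch strips spaces only.

```python
from typing import Optional


def hhmm_to_time(raw: Optional[str]) -> Optional[str]:
    """Turn a 4-digit 'hhmm' string into 'hh:mm:00'; None or blank becomes None (NULL)."""
    if raw is None or raw.strip(" ") == "":
        return None
    trimmed = raw.strip(" ")  # like TRIM(): leading/trailing spaces only
    return trimmed[0:2] + ":" + trimmed[2:4] + ":00"


print(hhmm_to_time("0930"))
print(hhmm_to_time("    "))
```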
There has to be a better way to do it, but I found a way that works.
Create a Derived Column Expression:
TRIM(REL_WEIGHT) == "" ? (DT_STR,9,1252)"0.0000000" : (DT_STR,9,1252)(LEFT(REL_WEIGHT,2) + "." + RIGHT(REL_WEIGHT,6))
Then add a Data Conversion transformation to change it to numeric (DT_NUMERIC) with a scale of 6.
Then map [Copy of NewField] to my SQL table field, which is defined as Decimal(8,6).
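The implied-decimal conversion itself can be sketched in Python (illustration only). Instead of concatenating LEFT(..., 2), ".", and RIGHT(..., 6), it moves the decimal point six places with Decimal.scaleb. Blanks become 0, as in the expression above; returning None instead would load NULL. The sample values are made up.

```python
from decimal import Decimal


def implied_decimal(raw: str, scale: int = 6) -> Decimal:
    """Parse a fixed-width field with `scale` implied decimal places."""
    text = raw.strip(" ")
    if text == "":
        return Decimal(0)  # blank field: same default as the expression above
    # '12345678' -> 12345678E-6 -> 12.345678
    return Decimal(text).scaleb(-scale)


print(implied_decimal("12345678"))
print(implied_decimal("        "))
```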
I don't know how this will perform when loading a million records; probably not the best. If someone knows a faster way to do this, please let me know.
Thanks,
Jeff