Difference between Data Conversion and Derived Column in SSIS? - sql-server

I am learning SSIS for a new requirement. I came across these two transformations - Data Conversion and Derived Column. But we can convert data types in the Derived Column transformation itself, so why did Microsoft add a separate Data Conversion transformation? I searched on Google but did not find a proper answer.

Hope this helps:
The purpose of the Data Conversion transformation is to do just that: data conversion.
The Derived Column transformation, by contrast, is used for most other transformations, and data conversion happens to be one of the things it can do. If you only intend to convert data types and perform no other transform, using Data Conversion keeps the package simpler and more readable.
Data Conversion gives the end user a simple UI for fulfilling the requirement of changing the data type of incoming columns.
Derived Column can also achieve data conversion, but there you have to explicitly write an expression to cast the type.
To give an analogy: you can read an Excel file in the data flow using an Excel source or an OLE DB source. That doesn't mean the Excel source shouldn't exist. It's simply easier to use.
Source: Code Project
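As a concrete illustration, the same conversion can be done either way. In a Derived Column you write the cast expression by hand (the column name here is hypothetical):

```
(DT_I4)[QuantityString]
```

whereas the Data Conversion transformation produces an equivalent typed output column purely through its dialog, with no expression to maintain.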

Related

Column "" cannot convert between unicode and non-unicode string data types

I am trying to import data from a flat file into an Azure SQL database table, and I have a Merge to combine it with another source too. But when I map the fields from the flat file to the Azure SQL database, I keep getting an error like
Column "Location" cannot convert between unicode and non-unicode string data types
Upon looking at some forums, I tried changing the data type of the field to Unicode string [DT_WSTR], and I even tried string [DT_STR].
In the destination Azure SQL database, the column in question is the Location field.
Can anyone please suggest what I am missing here? Any help is greatly appreciated.
Changing the column data types from the component's Advanced Editor will not solve the problem. If the imported values contain Unicode characters, you cannot convert them to non-Unicode strings, and you will receive this exception. Before suggesting solutions, I highly recommend reading this article to learn more about data type conversion in SSIS:
SSIS Data types: Change from the Advanced Editor vs. Data Conversion Transformations
Getting back to your issue, there are several solutions you could try:
Change the destination column's data type (if possible).
Use the Data Conversion transformation and implement error handling logic where the values throwing exceptions are redirected to a staging table, or are manipulated before being re-imported into the destination table. You can refer to the following article: An overview of Error Handling in SSIS packages
From the flat file connection manager, go to the "Advanced" tab and change the column data type to DT_STR.
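If you take the staging-table route, a quick T-SQL check can show which rows actually carry characters that would be lost in a non-Unicode column (the table and column names below are hypothetical; adjust the lengths to your schema):

```sql
-- Rows whose Location value changes when round-tripped through VARCHAR
-- are the ones containing Unicode-only characters.
SELECT Location
FROM stg.FlatFileImport
WHERE Location <> CONVERT(NVARCHAR(200), CONVERT(VARCHAR(200), Location));
```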

Specifying flat file data types vs using data conversion

This may be a stupid question but I must ask since I see it a lot... I have inherited quite a few packages in which developers use the Data Conversion transformation when dumping flat files into their respective SQL Server tables. This is pretty straightforward, however I always wonder: why wouldn't the developer just specify the correct data types within the flat file connection and then do a straight load into the table?
For example:
Typically I will see flat file connections with columns that are DT_STR and then converted into the correct type within the package, i.e. DT_STR of length 50 to DT_I4. However, if the staging table and the flat file are based on the same schema, why wouldn't you just specify the correct types (DT_I4) in the flat file connection? Is there any added benefit (performance, error handling) to using the Data Conversion task that I am not aware of?
This is a good question with not one right answer. Here is the strategy that I use:
If the data source is unreliable
i.e. sometimes int or date values are strings, such as when you have the literal word 'null' instead of the value being blank. I would treat the data source as strings and deal with converting the data downstream.
This could mean just staging the data in a table, using the database to do the conversions, and loading from there. This pattern avoids the source component throwing errors, which is always tricky to troubleshoot. It also avoids having to add error handling to Data Conversion components.
Instead, if the database throws a conversion error, you can easily look at the data in your staging table to examine the problem. Lastly, SQL is much more forgiving with date conversions than SSIS.
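As a sketch of that staging pattern (table and column names are hypothetical), T-SQL's TRY_CONVERT lets the database attempt the conversions and surface the problem rows instead of failing the whole load:

```sql
-- Everything was staged as strings; TRY_CONVERT yields NULL
-- for values that are not valid in the target type.
SELECT OrderQty, OrderDate
FROM stg.Orders
WHERE TRY_CONVERT(INT, OrderQty) IS NULL
   OR TRY_CONVERT(DATE, OrderDate) IS NULL;
```

Rows returned by this query can be inspected or corrected in the staging table before the clean load into the destination.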
If the data source is reliable
If the dates and numbers are always dates and numbers, I would define the datatypes in the connection manager. This makes it clear what you are expecting from the file and makes the package easier to maintain with fewer components.
Additionally, if you go to the advanced properties of the flat file source, integers and dates can be set to fast parse, which will speed up the read time: https://msdn.microsoft.com/en-us/library/8893ea9d-634c-4309-b52c-6337222dcb39?f=255&MSPPError=-2147217396
When I use data conversion
I rarely use the Data Conversion component. But one case where I find it useful is converting to or from Unicode. This can be necessary when reading from an ADO.NET source, which always treats the input as Unicode, for example.
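If you would rather do that Unicode cast in a Derived Column, the equivalent expression looks like the following (the column name and the 1252 code page are illustrative assumptions, not from the thread):

```
(DT_STR, 50, 1252)[UnicodeColumn]
```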
You could change the output data type in the flat file connection manager on the Advanced page, or right-click the source in the data flow and open the Advanced Editor to change the data type before loading it.
I think one benefit is that the Data Conversion transformation lets you output an extra column, usually named "Copy of <column>", and in some cases you might use both columns. Also, when you load data from an Excel source, everything comes in as Unicode, so you need to use Data Conversion to do the conversion, etc.
Also, just FYI, you could also use the Derived Column transformation to convert the data type.
UPDATE [needs further confirmation]:
In the flat file source connection manager, the maximum length of the string type is 255, while in Data Conversion it can be set above 255.

SSIS: Handling 1/0 Fields in Data Flow Task

I am building a Data Flow Task in an SSIS package that pulls data in from an OLE DB Source (MS Access table), converts data types through a Data Conversion Transformation, then routes that data to an OLE DB Destination (SQL Server table).
I have a number of BIT columns for flag variables in the destination table and am having trouble with truncation when converting these 1/0 columns to (DT_BYTES,1). Converting from DT_WSTR and DT_I4 to (DT_BYTES,1) results in the same truncation, and I have verified that it is happening at that step through the Data Viewer.
It appears that I need to create a derived column similar to what is described in the answers to the question linked below, but instead of converting to DT_BOOL, I need to convert to (DT_BYTES,1), as casting from DT_BOOL to DT_BYTES is apparently illegal?
SSIS Converting a char to a boolean/bit
I have made several attempts at creating a derived column with variations of the logic below, but haven’t had any luck. I am guessing that I need to use Hex literals in the “1 : 0” portion of the expression, but I haven’t been able to find valid syntax for that:
(DT_BYTES,1)([Variable_Name] == (DT_I4)1 ? 1 : 0)
Am I approaching this incorrectly? I can’t be the first person to need to insert BIT data into a SQL Server table, and the process above just seems unnecessarily complex to me.
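One commonly suggested approach (an assumption here, since this thread has no accepted answer) is to skip DT_BYTES entirely: the OLE DB Destination maps DT_BOOL to the SQL Server bit type, so a Derived Column expression like the following (column name hypothetical) is usually all that is needed:

```
[Variable_Name] == 1 ? TRUE : FALSE
```

The resulting DT_BOOL column can then be mapped straight onto the destination's BIT column, with no byte-level cast involved.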

Storing Serialized Information In SQL Server using F#

I am currently working on a project in F# that takes in data from Excel spreadsheets, determines if it is compatible with an existing table in SQL Server, and then adds the relevant rows to the existing table.
Some of the data I am working with is more specific than the types provided by T-SQL. That is, T-SQL has a type "date", but I need to distinguish between sets of dates that are at the beginning of each month or the end of each month. This same logic applies to many other types as well. If I have types:
Date(Beginning)
Date(End)
they will both be converted to the T-SQL type "date" before being added to the table, therefore erasing some of the more specific information.
In order to solve this problem, I am keeping a log of the serialized types in F#, along with which column number in the SQL Server table they apply to. My question is: is there any way to store this log somewhere internally in SQL Server so that I can access it and compare the serialized types of the incoming data to the serialized types of the data that already exists in the table before making new inserts?
Keeping metadata outside of the DB and maintaining it manually makes your DB "expensive" to manage and increases the risk of errors that you might not even detect until something bad happens.
If you have control over the table schema, there are at least a couple of simple options. For example, you can add a column that stores the type info. For something simple with just a couple of possible values, as you described, add a new column to store the actual type value, and update the F# code to de-serialize the source into separate DATE and type (BEGINNING/END) values, which are then inserted into the table. Simple, easy to maintain, and easily consumed.
You could also create a user-defined type for each date subtype, but that can be confusing to another DBA/dev and makes it more complicated to retrieve data from your application. This is generally not a good approach.
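A minimal T-SQL sketch of that type-column option (the table, column, and constraint names are hypothetical):

```sql
-- Store the date's subtype alongside the date itself,
-- constrained to the two values the F# code distinguishes.
ALTER TABLE dbo.ImportedData
    ADD DatePosition VARCHAR(10) NOT NULL
        CONSTRAINT DF_ImportedData_DatePosition DEFAULT 'BEGINNING'
        CONSTRAINT CK_ImportedData_DatePosition
            CHECK (DatePosition IN ('BEGINNING', 'END'));
```

With the subtype stored per row, incoming data can be compared against existing rows with an ordinary query rather than an external log.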
Yes, you can do that if you want to.

How to apply same data manipulation codes to a group of SSIS components' inputs?

I am new to SSIS.
I have a number of MS Access tables to transform to SQL. Some of these tables have datetime fields that need to go through some rules before landing in their respective SQL tables. I want to use a Script Component that deals with these kinds of fields, converting them to the desired values.
Since all of these fields need the same modification rules, I want to apply the same code base to all of them, thus avoiding code duplication. What would be the best option for this scenario?
I know I can't use the same Script Component and direct all of those datasets' outputs to it, because unfortunately it doesn't support multiple inputs. So the question is: is it possible to apply a set of generic data manipulation rules
to a group of different datasets' fields without repeating the rules? I could use a Script Component for each OLE DB input and apply the same rule to each, but that would not be an efficient way of doing it.
Any help would be highly appreciated.
SQL Server Integration Services has a specific task to suit this need, called the Data Conversion transformation. This can be accomplished on the data source or via the task, as noted here.
You can also use the Derived Column transformation to convert data. This transformation is also simple, select an input column and then chose whether to replace this column or create a new output column. Then you apply an expression for the output column.
So why use one over the other?
The Data Conversion transformation will take an input, convert the type, and provide a new output column. If you use the Derived Column transformation, you get to apply an expression to the data, which allows you to do more complex manipulations.
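For instance, a shared datetime rule could be written once as a Derived Column expression and pasted into each data flow (the column name and the floor-date rule below are hypothetical examples, not from the question):

```
ISNULL([ModifiedDate]) || [ModifiedDate] < (DT_DBTIMESTAMP)"1900-01-01"
    ? (DT_DBTIMESTAMP)"1900-01-01"
    : [ModifiedDate]
```

This does not eliminate the duplication entirely, but it keeps the rule declarative and identical across data flows, which is easier to keep in sync than per-source Script Components.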