In SSIS, how can I split data from one row into 2 rows?
For example:
FROM :
ID Data
1 On/Off
2 On/Off
TO :
ID Data
1 On
1 Off
2 On
2 Off
Solution Overview
You have to use a Script Component to achieve this. Use an asynchronous output buffer to generate multiple rows from one row based on your own logic.
Solution Details
Add a DataFlow Task
In the DataFlow Task add a Flat File Source, Script Component, and a Destination
In the Script Component, select ID, Data columns as Input
Go to the Input and Outputs page, click on the Output, and change the SynchronousInputID property to None
Add two Output Columns ID and Data into the Output
Change the Script language to Visual Basic
Inside the Script editor write the following code
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Split the Data column on "/" and emit one output row per value
    Dim strValues() As String = Row.Data.Split(CChar("/"))
    For Each str As String In strValues
        Output0Buffer.AddRow()
        Output0Buffer.ID = Row.ID
        Output0Buffer.Data = str
    Next
End Sub
Additional Information
For more details follow these links:
SSIS - Script Component, Split single row to multiple rows
Output multiple rows from script component per single row input
Using T-SQL
Based on your comments, this is a link that contains an example of how this can be done using a SQL command:
Turning a Comma Separated string into individual rows
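As a rough sketch of the linked technique: on SQL Server 2016 and later, the split can be done directly with STRING_SPLIT (the table and column names here are assumptions, not from your question):

```sql
-- Hypothetical names: dbo.SourceTable with columns ID and Data (e.g. 'On/Off')
SELECT t.ID, s.value AS Data
FROM dbo.SourceTable AS t
CROSS APPLY STRING_SPLIT(t.Data, '/') AS s;
```

On older versions, the linked article's approach (a numbers table or XML-based split) would be needed instead.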
Related
I'm writing an SSIS package to load data from a .csv into a db.
There's a column in the csv file that is supposed to have a count, but the records sometimes have text, so I can't just load the data in as an integer. It looks something like this:
I want the data to land in the db destination as an integer instead of a string. I want the transformation to change any text to a 1, any blank value to a 1, and leave all the other numbers as-is.
My attempts so far have included using the Derived Column functionality, for which I couldn't get the right expression(s), it seems, and creating a temp table to run a SQL query over the data, which kept breaking my data flow.
There are three approaches you can follow.
(1) Using a derived column
You should add a derived column with the following expression to check if the values are numeric or not:
(DT_I4)[count] == (DT_I4)[count] ? [count] : 1
Then in the derived column editor, go to the error output configuration and set the error handling event to Ignore failure.
Now add another derived column to replace null values with 1 :
REPLACENULL([count_derivedcolumn],1)
You can refer to the following article for a step-by-step guide:
Validate Numeric or Non-Numeric Data in SQL Server Integration Services without the Script Task
(2) Using a script component
If you know C# or Visual Basic.NET, you can add a Script Component to check whether the value is numeric and replace nulls and string values with 1.
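As a minimal sketch of that script-component logic, the cleanup can be isolated in a small helper (the class and method names here are ours, not part of SSIS): any blank, null, or non-numeric value becomes 1, and valid integers pass through unchanged.

```csharp
using System;

// Hypothetical helper mirroring the cleanup described above.
public static class CountCleaner
{
    public static int NormalizeCount(string raw)
    {
        // Blank or null -> 1
        if (string.IsNullOrWhiteSpace(raw))
            return 1;

        // Valid integer passes through; any text -> 1
        int n;
        return int.TryParse(raw.Trim(), out n) ? n : 1;
    }
}
```

Inside the Script Component you would call this from Input0_ProcessInputRow and assign the result to an integer output column (column names depend on your own metadata).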
(3) Update data in SQL
You can stage data in its initial form into the SQL database and use an update query to replace nulls and string values with 1 as follows:
UPDATE [staging_table]
SET [count] = 1
WHERE [count] IS NULL or ISNUMERIC([count]) = 0
Hi, I have one doubt in SSIS:
I want to load multiple CSV files into a SQL Server table using an SSIS package.
While loading, we need to consider the data from the headers onwards.
The source path has 3 CSV files with fixed header columns and data,
but each file has a file description and creation-date information before the headers.
In one file the description is on row 2 and the header row starts on row 4, followed by the data.
In another file the description is on row 1 and the headers with data start from row 9 onwards, and a third file has the description from row 5 with the header row starting on row 7. The column headers are fixed in all the CSV files.
Files location :
C:\test\a.csv
C:\test\b.csv
C:\test\c.csv
a.csv file data like below :
here the description and dates are on rows 2 and 3; the actual data starts from row 4 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-20
id |name|loc
1 |a |hyd
b.csv file data like below :
here the description and dates are on rows 1 and 2; the actual data starts from row 9 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21
id |name|loc
10 |b |chen
c.csv file data like below :
here the description and dates are on rows 5 and 6; the actual data starts from row 9 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21
id |name|loc
20 |c |bang
Based on the above 3 files I want to load the data into the target SQL Server table emp:
id |name |loc
1 |a |hyd
2 |b |chen
3 |c |bang
Here is what I tried on the package side:
Create variables:
filelocationpath: C:\test\
filename: C:\test\a.csv
Drag and drop a For Each Loop container:
choose the enumerator type Foreach File Enumerator
directory: C:\test
variable mappings: configure filename
type of file: *.csv
retrieve file name: name and extension
Inside the For Each Loop container I dragged and dropped a Data Flow Task
and created a flat file connection; one of the files is used to configure it, with header rows to skip set to 1. I used a Data Conversion for the required columns, configured an OLE DB destination table, and created a dynamic connection expression on the flat file connection to pass the file name dynamically.
After executing the package, the 2nd file failed because of the description and date information:
the description and dates do not come on fixed rows in the next day's files;
they can arrive on different rows.
Is it possible to find out dynamically how many rows to skip and pass that count to the header rows to skip? Is this possible in SSIS?
Please tell me how to achieve this task in SSIS.
If the number of rows you should skip is constant, then try searching YouTube for this video: Delete Top N Rows from Flat File in SSIS Package.
If you still need to find that amount and don't know it in advance, then write the number of useless rows into a variable and pass that value to the package for processing.
Workaround
In the Flat File connection manager uncheck the read header from first row option, then go to the advanced tab and define the columns metadata manually (column name, length ...)
Within the Data Flow Task, add a script component
In the Script Component Editor, go to the Input and Output Tab and add an Output column of type boolean
In the script editor, keep checking whether the first column value is equal to the column header. While this condition is not met, set the output column value to False; once the column value equals the column header, set the output column value to True for all remaining rows.
Next to the Script component, add a Conditional split to filter row based on the generated column value (rows with False value must be ignored)
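The flagging logic from the steps above can be sketched as a standalone helper (the names FlagDataRows and headerText are ours, not from SSIS): rows before the header row yield False, and the header row plus everything after it yield True, so the Conditional Split can discard the False rows (and the header row itself, if desired).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the per-row flag the script component would emit.
public static class HeaderFlagger
{
    public static List<bool> FlagDataRows(IEnumerable<string> firstColumnValues, string headerText)
    {
        var flags = new List<bool>();
        bool seenHeader = false;

        foreach (var value in firstColumnValues)
        {
            // Once the header row is found, every row from here on is flagged True
            if (!seenHeader && value == headerText)
                seenHeader = true;

            flags.Add(seenHeader);
        }

        return flags;
    }
}
```

In the real Script Component the same comparison runs once per row in Input0_ProcessInputRow, writing the boolean to the output column added in the step above.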
Create a new file connection with a single column for the same file.
Add a Data flow task with a transformation script component.
Attach a read-write variable to the script component as an index (skiprows in the example code) and check the first characters of each row in the process input row.
bool checkRow;
int rowCount;

public override void PreExecute()
{
    base.PreExecute();
    checkRow = true;
    rowCount = 0;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    if (checkRow)
    {
        rowCount++;
        if (Row.Data.StartsWith("id |"))
            checkRow = false;
    }
}

public override void PostExecute()
{
    base.PostExecute();
    Variables.skiprows = rowCount; // set script variable
}
Then you just need to use the variable in the HeaderRowsToSkip expression of the original flat file connection.
If the files are going to be very large, you can force the script to fail once you have found the first row (a zero division, for example). Add an error event handler and set the system variable Propagate to false (#[System::Propagate] = false).
I have two merge joins in data flow task. I want to set the IsSorted property for inputs of second merge join.
But it is giving error as "The IsSorted Property must be set to True on both sources of this transformation."
following is the image of this:
UPDATE 1
From the answer and the comments below, the IsSorted property can be found in the data sources' (Excel + OLE DB) Advanced Editor. But the Merge Join transformation doesn't have this property, and I need to merge the first Merge Join output with the Excel Source without using a Sort component.
Update 2 (Workaround)
After the Merge Join, add a Script Component in which you add one output column (which will be used as the second join key). In the script, just assign the original key value to this column.
Then, in the script's input and output properties, set the output's IsSorted property to True
Example:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Output0Buffer.AddRow()
    Output0Buffer.outEmployeeName = Row.EmployeeName
    Output0Buffer.outEmployeeNumber = Row.EmployeeNumber
    Output0Buffer.outLoginName = Row.LoginName
End Sub
Update 1
If you are looking to generate sorted output from the Merge Join transformation, then I recommend you follow this link:
Merge Join component sorted outputs [SSIS]
Initial Answer
The IsSorted property can be edited from the Advanced Editor.
Just right-click on the OLE DB Source (or Excel Source if needed), go to the Input and Output Properties, click on the Output, and you will find the IsSorted property in the Properties tab.
Then you should set the SortKeyPosition for the columns.
I am using a Flat File Source --> Script Component as transformation --> OLE DB destination in my data flow.
The source reads all the rows from the flat file, and I want to skip the last row (the trailer record) when updating the database.
Since it contains NULL values, the database throws an error.
Please assist me in how to resolve this.
To ignore the last row you have to do the following steps:
Add a DataFlow Task (let's name it DFT RowCount)
Add a Global Variable of Type System.Int32 (Name: User::RowCount)
In this DataFlow Task add a Flat File Source (The file you want to import)
Add a RowCount component next to the Flat File Source
Map the RowCount result to the variable User::RowCount
Add Another DataFlow Task (let's name it DFT Import)
In DFT Import add a Flat File Source (File you need to Import)
Add a Script Component next to the Flat File Source
Add User::RowCount Variable to the Script ReadOnly Variables
Add an Output Column of type DT_BOOL (Name: IsLastRow)
In the Script Window write the following Script
Dim intRowCount As Integer = 0
Dim intCurrentRow As Integer = 0

Public Overrides Sub PreExecute()
    MyBase.PreExecute()
    intRowCount = Variables.RowCount
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    intCurrentRow += 1
    If intCurrentRow = intRowCount Then
        Row.IsLastRow = True
    Else
        Row.IsLastRow = False
    End If
End Sub
Add a Conditional Split Next to the Script Component
Split Rows using the Following Expression
[IsLastRow] == False
Add the OLEDB Destination next to the conditional Split
Side note: if you want to ignore rows in another case (not the last row), just change the script written in the Script Component to meet your requirements
If your requirement is to avoid rows having null values in the flat file, then you can follow the approach below:
Read data from the flat file using a source component.
Use a Conditional Split component, and in the case expression provide !ISNULL(Column1) && !ISNULL(Column2) (Column1 and Column2 can be whichever columns you wish; if your flat file has a column named, say, ID that has no null values except in the last row, then you can use !ISNULL(ID)).
Map the case output to the OLE DB destination.
Hope this helps.
How to skip some records in script component without using conditional split component?
Create a script component with asynchronous outputs
To skip records in a script component, you need to create the script component with asynchronous outputs. By default, a script component uses synchronous output, which means that each and every row that is input to the script will also be an output from the script.
If you're using SQL Server 2005, I think you'll have to start with a new Script component, because you can't change from synchronous to asynchronous once you've worked with a Script component. In SSIS for SQL Server 2008 you can switch a Script component from synchronous to asynchronous.
Edit your Script component and select the Inputs and Outputs tab.
Select the Output buffer in the treeview.
Select the SynchronousInputID property and change the value to None.
Select the Output Columns branch in the treeview. You must use the Add Column button to create a column for each input column.
Scripting
Skipping rows
Now you can edit your script. In the procedure that processes the rows, you will add some code to control skipping and outputting rows. When you want to skip a row, you will use the Row.NextRow() command where Row is the name of the input buffer. Here's an example:
If Row.number = 5 Then
    Row.NextRow()
End If
In this example rows that have a 5 in the number column will be skipped.
Outputting rows
After applying your other transformation logic, you need to indicate that the row should go to the output. This is initiated with the Output0Buffer.AddRow() command, where Output0Buffer is the name of the output buffer. The AddRow function creates the next output row, which pushes the previous row out of the Script Component.
After you create the new row, you must assign values to the columns in the new row.
Output0Buffer.AddRow()
Output0Buffer.number = Row.number
This example adds a new row to the buffer and assigns the number value from the input buffer to the number column in the output buffer.