I am using Flat File Source --> Script Component (as a transformation) --> OLE DB Destination in my data flow.
The source reads all the rows from the flat file, and I want to skip the last row (the trailer record) when updating the database.
Since it contains NULL values, the database throws an error.
Please assist me with how to resolve this.
Regards,
VHK
To ignore the last row, follow these steps:
Add a Data Flow Task (let's name it DFT RowCount)
Add a global variable of type System.Int32 (Name: User::RowCount)
In this Data Flow Task, add a Flat File Source (the file you want to import)
Add a Row Count component next to the Flat File Source
Map the Row Count result to the variable User::RowCount
Add another Data Flow Task (let's name it DFT Import)
In DFT Import, add a Flat File Source (the file you need to import)
Add a Script Component next to the Flat File Source
Add the User::RowCount variable to the Script Component's ReadOnlyVariables
Add an output column of type DT_BOOL (Name: IsLastRow)
In the script editor, write the following script:
Dim intRowCount As Integer = 0
Dim intCurrentRow As Integer = 0

Public Overrides Sub PreExecute()
    MyBase.PreExecute()
    'Read the total row count captured by DFT RowCount
    intRowCount = Variables.RowCount
End Sub

Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    intCurrentRow += 1
    'Flag only the last row of the file
    If intCurrentRow = intRowCount Then
        Row.IsLastRow = True
    Else
        Row.IsLastRow = False
    End If
End Sub
Add a Conditional Split next to the Script Component
Split the rows using the following expression:
[IsLastRow] == FALSE
Add the OLE DB Destination next to the Conditional Split
Side note: if you want to ignore rows in another case (not the last row), just change the script written in the Script Component to meet your requirements.
If your requirement is to avoid rows having null values in the flat file, then you can follow this approach:
Read the data from the flat file using the source component.
Use a Conditional Split component, and in the case expression provide !ISNULL(Column1) && !ISNULL(Column2) (Column1 and Column2 can be whichever columns you choose; if your flat file has a column, say ID, that has no null values except in the last row, then you can simply use !ISNULL(ID)).
Map the case output to the OLE DB destination.
Hope this helps.
I am new to SSIS. I am trying to create a separate Excel file dynamically in the Data Flow Task for each iteration of a Foreach Loop. Please guide me.
You can use the following approach.
Create an Excel file template in the folder where you want to drop the new files.
Connect your Excel destination to the template file created in the folder.
Create two variables:
Variable: IterationCount, data type Int32, default value 1.
Variable: FileName, data type String, with the expression:
"Mybasefilename_" + (DT_STR, 4, 1252) @[User::IterationCount] + ".xlsx"
On your Excel connection manager, right-click, open Properties, go to Expressions, click the ellipsis button, and look for the file name property.
Set the property value using the @[User::FileName] variable. If the name property is not available, use the ConnectionString property; in that case, add the folder path as part of your FileName variable so it forms the full destination path and file name.
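For the ConnectionString case, the FileName expression might look something like this (the folder C:\Output is an assumption for illustration):
"C:\\Output\\Mybasefilename_" + (DT_STR, 4, 1252) @[User::IterationCount] + ".xlsx"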
The last step: in your FELC (Foreach Loop Container) you need to update the IterationCount variable on each iteration.
Since you cannot read the iteration index directly, you need to use an expression in the FELC, an Expression Task, or a Script Task to update the IterationCount variable.
Expression Task example:
@[User::IterationCount] = @[User::IterationCount] + 1
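If you prefer the Script Task route from the links below, a minimal C# sketch could look like this (assuming User::IterationCount is added to the task's ReadWriteVariables):

public void Main()
{
    // Increment the iteration counter used to build the next file name
    int count = (int)Dts.Variables["User::IterationCount"].Value;
    Dts.Variables["User::IterationCount"].Value = count + 1;

    Dts.TaskResult = (int)ScriptResults.Success;
}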
Helpful Links:
Microsoft - SSIS ForEach Loop Container
SSIS Expression Task
SSIS - Updating variables using Script Task
Hi, I have a doubt in SSIS.
I want to load multiple CSV files into a SQL Server table using an SSIS package.
While loading, we need to consider the data from the header row onwards.
The source path has 3 CSV files with fixed header columns and data,
but each file has a file description and creation-date information before the headers:
in one file the description is on row 2 and the header row, with data, starts on row 4.
In another file the description is on row 1 and the headers and data start from row 9 onwards, and in another the file description is on row 5 and the header row starts on row 7. The column headers are fixed in all the CSV files.
Files location :
C:\test\a.csv
C:\test\b.csv
C:\test\c.csv
a.csv file data is like below:
here the description and dates are on rows 2 and 3; the actual data starts from row 4 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-20
id |name|loc
1 |a |hyd
b.csv file data is like below:
here the description and dates are on rows 1 and 2; the actual data starts from row 9 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21
id |name|loc
10 |b |chen
c.csv file data is like below:
here the description and dates are on rows 5 and 6; the actual data starts from row 9 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21
id |name|loc
20 |c |bang
Based on the above 3 files, I want to load the data into the target SQL Server table emp:
id |name|loc
1 |a |hyd
2 |b |chen
3 |c |bang
Here is what I tried on the package side:
Create variables:
filelocationpath: C:\test\
filename: C:\test\a.csv
Drag and drop the Foreach Loop Container:
choose the enumerator type: Foreach File Enumerator
directory: C:\test
variable mappings: configure the filename variable
type of file: *.csv
retrieve file name: name and extension
Inside the Foreach Loop Container I dragged and dropped a Data Flow Task
and created a flat file connection; it is configured from one of the files, with header rows to skip set to 1. I used a Data Conversion on the required columns, configured an OLE DB Destination table, and created a dynamic connection expression for the flat file connection to pass the file name dynamically.
After executing the package, the 2nd file failed because of the description and dates information:
the description and dates do not come on fixed rows in the next day's files;
the description and dates will come on different rows.
Is it possible to find dynamically how many rows to skip and pass that count to the header-rows-to-skip setting? Is this possible in SSIS?
Please tell me how to achieve this task in SSIS.
If the number of rows you need to skip is constant, try searching YouTube for this video: Delete Top N Rows from Flat File in SSIS Package.
If you don't know that amount in advance, write the number of useless rows into a variable and then pass that value to the processing package.
Workaround
In the Flat File connection manager, uncheck the option to read the header from the first row, then go to the Advanced tab and define the column metadata manually (column name, length, ...)
Within the Data Flow Task, add a Script Component
In the Script Component editor, go to the Inputs and Outputs tab and add an output column of type Boolean
In the script editor, keep checking whether the first column's value equals the column header: while this condition is not met, set the output column value to False; once the column value equals the column header, set the output column value to True for all remaining rows
Next to the Script Component, add a Conditional Split to filter rows based on the generated column value (rows with a False value must be ignored)
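A minimal sketch of that script, assuming the first input column is named Column0, the Boolean output column is named IsDataRow, and "id" is the known header value of the first column:

private bool headerFound = false;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Rows stay flagged False until the header row is reached
    if (!headerFound && Row.Column0.Trim() == "id")
        headerFound = true;

    Row.IsDataRow = headerFound;
}

Note that the header row itself is flagged True here, so exclude it in the same Conditional Split as well, for example with [IsDataRow] && [Column0] != "id".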
Create a new flat file connection with a single column for the same file.
Add a Data Flow Task with a transformation Script Component.
Attach a read-write variable to the Script Component as an index (skiprows in the example code) and check the first characters of each row in ProcessInputRow.
bool checkRow;
int rowCount;

public override void PreExecute()
{
    base.PreExecute();
    checkRow = true;
    rowCount = 0;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Count rows until the header row is found
    if (checkRow)
    {
        rowCount++;
        if (Row.Data.StartsWith("id |"))
            checkRow = false;
    }
}

public override void PostExecute()
{
    base.PostExecute();
    Variables.skiprows = rowCount; // set script variable
}
Then you just need to use that variable in the HeaderRowsToSkip property expression of the original flat file connection.
If the files are going to be very large, you can force the script to fail once you have found the first row (a zero division, for example). Add an error event handler and set the system variable Propagate to false (@[System::Propagate] = false).
In SSIS, how can I split data from one row into 2 rows?
For example:
FROM:
ID Data
1 On/Off
2 On/Off
TO:
ID Data
1 On
1 Off
2 On
2 Off
Solution Overview
You have to use a Script Component to achieve this. Use an asynchronous output buffer to generate multiple rows from one row based on your own logic.
Solution Details
Add a Data Flow Task
In the Data Flow Task, add a Flat File Source, a Script Component, and a destination
In the Script Component, select the ID and Data columns as input
Go to the Inputs and Outputs page, click on the output and change the SynchronousInputID property to None
Add two output columns, ID and Data, to the output
Change the script language to Visual Basic
Inside the script editor, write the following code:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)

    'Split the incoming value on "/" and emit one output row per part
    Dim strValues() As String = Row.Data.Split(CChar("/"))

    For Each str As String In strValues
        Output0Buffer.AddRow()
        Output0Buffer.ID = Row.ID
        Output0Buffer.Data = str
    Next

End Sub
Additional Information
For more details follow these links:
SSIS - Script Component, Split single row to multiple rows
Output multiple rows from script component per single row input
Using T-SQL
Based on your comments, here is a link that contains an example of how this can be done using a SQL command:
Turning a Comma Separated string into individual rows
I've got some SSIS packages that take CSV files that come from the vendor and put them into our local database. The problem I'm having is that sometimes the vendor adds or removes columns, and we don't have time to update our packages before the next run, which causes the SSIS packages to abend. I want to prevent this from happening.
I've tried reading in the CSV files line by line, stripping out new columns, and then using an insert statement to put the altered line into the table, but that takes far longer than our current process (the CSV files can have thousands or hundreds of thousands of records).
I've started looking into using ADO connections, but my local machine has neither the ACE nor JET providers and I think the server the package gets deployed to also lacks those providers (and I doubt I can get them installed on the deployment server).
I'm at a loss as to how I can load the tables while ignoring newly added or removed columns (although if a CSV file lacks a column the table has, that's not a big deal) in a way that's fast and reliable. Any ideas?
I went with a different approach, which seems to be working (after I worked out some kinks). What I did was take the CSV file rows and put them into a temporary datatable. When that was done, I did a bulk copy from the datatable to my database. In order to deal with missing or new columns, I determined what columns were common to both the CSV and the table and only processed those common columns (new columns were noted in the log file so they can be added later). Here's my BulkCopy module:
Private Sub BulkCopy(csvFile As String)

    Dim i As Integer
    Dim rowCount As Int32 = 0
    Dim colCount As Int32 = 0
    Dim writeThis As ArrayList = New ArrayList

    tempTable = New DataTable()

    Try
        '1) Set up the columns in the temporary data table, using commonColumns
        For i = 0 To commonColumns.Count - 1
            tempTable.Columns.Add(New DataColumn(commonColumns(i).ToString))
            tempTable.Columns(i).DataType = GetDataType(commonColumns(i).ToString)
        Next

        '2) Start adding data from the csv file to the temporary data table
        While Not csvReader.EndOfData
            currentRow = csvReader.ReadFields() 'Read the next row of the csv file
            rowCount += 1
            writeThis.Clear()
            For index = 0 To UBound(currentRow)
                If commonColumns.Contains(csvColumns(index)) Then
                    Dim location As Integer = tableColumns.IndexOf(csvColumns(index))
                    Dim columnType As String = tableColumnTypes(location).ToString
                    If currentRow(index).Length = 0 Then
                        writeThis.Add(DBNull.Value)
                    Else
                        writeThis.Add(currentRow(index))
                    End If
                End If
            Next
            Dim row As DataRow = tempTable.NewRow()
            row.ItemArray = writeThis.ToArray
            tempTable.Rows.Add(row)
        End While
        csvReader.Close()

        '3) Bulk copy the temporary data table to the database table.
        Using copy As New SqlBulkCopy(dbConnection)
            '3.1) Set up the column mappings
            For i = 0 To commonColumns.Count - 1
                copy.ColumnMappings.Add(commonColumns(i).ToString, commonColumns(i).ToString)
            Next
            '3.2) Set the destination table name
            copy.DestinationTableName = tableName
            '3.3) Copy the temporary data table to the database table
            copy.WriteToServer(tempTable)
        End Using

    Catch ex As Exception
        message = "*****ERROR*****" + vbNewLine
        message += "BulkCopy: Encountered an exception of type " + ex.GetType.ToString()
        message += ": " + ex.Message + vbNewLine + "***************" + vbNewLine
        LogThis(message)
    End Try

End Sub
There may be something more elegant out there, but this so far seems to work.
Look into Biml, which builds and executes your SSIS package dynamically based on the metadata at run time.
Based on this comment:
I've tried reading in the CSV files line by line, stripping out new columns, and then using an insert statement to put the altered line into the table, but that takes far longer than our current process (the CSV files can have thousands or hundreds of thousands of records).
And this:
I used a csvreader to read the file. The insert was via a sqlcommand object.
It would appear at first glance that the bottleneck is not the flat file source but the destination. An OLE DB Command executes in a row-by-row fashion, one statement per input row. Changing this to an OLE DB Destination converts the process to a bulk insert operation. To test this, just use the flat file source and connect it to a Derived Column. Run that and check the speed. If it's faster, switch to the OLE DB Destination and try again. It also helps to insert into a heap (no clustered or nonclustered indexes) and to use TABLOCK.
However, this does not solve your whole varied-file problem. I don't know what the flat file source does if you are short a column or more compared to how you originally configured it at design time. It might fail, or it might import the rows in some jagged form where part of the next row is assigned to the last columns of the current row. That could be a big mess.
However, I do know what happens when a flat file source gets extra columns. I filed this Connect item for it, which was sadly rejected: https://connect.microsoft.com/SQLServer/feedback/details/963631/ssis-parses-flat-files-incorrectly-when-the-source-file-contains-unexpected-extra-columns
What happens is that the extra columns are concatenated into the last column. If you plan for it, you could make the last column large and then parse it in SQL from the staging table. Alternatively, you could jam the whole row into SQL and parse each column from there. That's a bit clunky, though, because you'll have a lot of CHARINDEX() calls checking the positions of values all over the place.
An easier option might be to parse it in .NET in a script component, using some combination of Split() to get all the values and checking the count of values in the array to know how many columns you have. This would also allow you to direct the rows to different outputs based on what you find.
And lastly, you could ask the vendor to commit to a format: either a fixed number of columns, or a format that handles variation, like XML.
I've got a C# solution (I haven't checked it, but I think it works) for a source Script Component.
It reads the header into an array using Split.
Then, for each data row, it uses the same Split function and uses the header value to match the column, assigning the row value to the corresponding output.
You will need to put all the output columns into the output area.
All columns that are not present will have a null value on exit.
public override void CreateNewOutputRows()
{
    using (System.IO.StreamReader sr = new System.IO.StreamReader(@"[filepath and name]"))
    {
        string fullText = sr.ReadToEnd();
        string[] rows = fullText.Split('\n');

        // Get header values from the first row
        string[] header = rows[0].Split(',');

        for (int i = 1; i < rows.Length; i++)
        {
            // Skip blank lines (e.g. a trailing newline at end of file)
            if (rows[i].Trim().Length == 0) continue;

            string[] rowVals = rows[i].Split(',');

            // One output row per data row
            Output0Buffer.AddRow();

            for (int j = 0; j < rowVals.Length; j++)
            {
                // Deal with each known header name
                switch (header[j].Trim())
                {
                    case "Field 1 Name": // this is where you use known column names
                        Output0Buffer.FieldOneName = rowVals[j]; // cast if not string
                        break;
                    case "Field 2 Name":
                        Output0Buffer.FieldTwoName = rowVals[j]; // cast if not string
                        break;
                    // continue this pattern for all column names
                }
            }
        }
    }
}
I have a static table that has something like this:
xy_Jan10
yz_Feb11
xx_March14
by_Aug09
etc.
These names are static and stored in a table. I am using a Foreach Loop Container, so first I read and save all the static names mentioned above into a System.Object variable. Next, the Foreach Loop Container loops through each file name and saves it into another string variable called strFileName. Inside the Foreach Loop Container I have a Script Task that first checks whether the file exists, and this is where I have the problem: for each file name that comes into the variable, I want to check whether that file exists first. If it exists, I want to load it into my table; if not, I want to move on and check the next file name, and so on. I only want to load when the variable file name matches a file on the network drive; if it is not found, I want to check the next one until I have gone through my entire static list of names. My issue is that the Script Task stops at the first name where it finds no match, but I want it to continue to the next variable name in the list and load it, because there are a lot of other matches that are not loaded. Here is my Script Task:
Please note: the files I am loading are SAS files.
Public Sub Main()
    '
    ' Add your code here
    '
    Dim directory As DirectoryInfo = New DirectoryInfo("\\840KLM\DataMart\CPTT\Cannon")
    Dim file As FileInfo() = directory.GetFiles("*.sas7bdat")

    If file.Length > 0 Then
        Dts.Variables("User::var_IsFileExist").Value = True
    Else
        Dts.Variables("User::var_IsFileExist").Value = False
    End If

    Dts.TaskResult = ScriptResults.Success
End Sub
It looks like you need to wrap the script task inside a ForEach loop container. There's plenty of information about how to do this on the web, or even on Stack Overflow: How to loop through Excel files and load them into a database using SSIS package?
or
http://www.sqlis.com/sqlis/post/Looping-over-files-with-the-Foreach-Loop.aspx
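For the existence check itself, a minimal C# Script Task sketch (assuming User::strFileName is listed in ReadOnlyVariables and User::var_IsFileExist in ReadWriteVariables); it sets the flag instead of failing, so the loop can move on to the next name:

public void Main()
{
    string folder = @"\\840KLM\DataMart\CPTT\Cannon";
    string fileName = Dts.Variables["User::strFileName"].Value.ToString();
    string fullPath = System.IO.Path.Combine(folder, fileName);

    // True only when the current iteration's file actually exists on the share
    Dts.Variables["User::var_IsFileExist"].Value = System.IO.File.Exists(fullPath);

    Dts.TaskResult = (int)ScriptResults.Success;
}

A precedence constraint with the expression @[User::var_IsFileExist] == TRUE can then route only the matching files into the load.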