Hi, I have a question about SSIS.
I want to load multiple CSV files into a SQL Server table using an SSIS package.
While loading, the data must be taken from the header row onwards.
The source path has 3 CSV files with fixed header columns followed by data,
but each file has a file description and creation-date information before the headers, and
the header row position differs per file: in one file the description is on row 2 and the header row starts on row 4 with the data.
In another file the description starts on row 1 and the headers with data start from row 9, and in the third file the description starts on row 5 and the header row starts on row 7. The column headers are fixed in all the CSV files.
Files location :
C:\test\a.csv
C:\test\b.csv
C:\test\c.csv
a.csv file data looks like below:
here the description and date rows are rows 2 and 3; the actual data starts from row 4 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-20
id |name|loc
1 |a |hyd
b.csv file data looks like below:
here the description and date rows are rows 1 and 2; the actual data starts from row 9 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21
id |name|loc
10 |b |chen
c.csv file data looks like below:
here the description and date rows are rows 5 and 6; the actual data starts from row 9 onwards
descritiion:empinfromationforhydlocation
creadeddate:2018-04-21
id |name|loc
20 |c |bang
Based on the above 3 files I want to load the data into the target SQL Server table emp:
id |name|loc
1 |a |hyd
2 |b |chen
3 |c |bang
Here is what I tried on the package side:
Create variables:
filelocationpath: C:\test\
filename: C:\test\a.csv
Drag and drop a Foreach Loop container:
choose the Foreach File enumerator as the enumerator type
directory: C:\test
variable mappings: configure the filename variable
files: *.csv
retrieve file name: name and extension
Inside the Foreach Loop container I dropped a Data Flow Task
and created a flat file connection. It is configured using one of the files, with header rows to skip set to 1; a Data Conversion handles the required columns, the output is mapped to an OLE DB destination table, and a dynamic ConnectionString expression is set on the flat file connection to pass the file name dynamically.
After executing the package, the 2nd file failed because of the description and date rows:
the description and date rows do not appear on fixed row numbers in the next day's files;
they can arrive on different rows each time.
Is it possible to dynamically find out how many rows to skip and pass that count into the header rows to skip setting? Is that possible in SSIS?
Please tell me how to achieve this task in SSIS.
If the number of rows to skip is constant, look on YouTube for this video: Delete Top N Rows from Flat File in SSIS Package.
If you still need to find that number and don't know it up front, write the count of useless rows into a variable and then pass that value to the package for processing.
Workaround
In the Flat File connection manager, uncheck the option to read the header from the first row, then go to the Advanced tab and define the column metadata manually (column name, length, ...)
Within the Data Flow Task, add a Script Component
In the Script Component editor, go to the Inputs and Outputs tab and add an output column of type Boolean
In the script editor, keep checking whether the first column value equals the column header. While this condition is not met, set the output column value to False; once the column value equals the column header, set the output column value to True for all remaining rows (see the sketch after this list)
Next to the Script Component, add a Conditional Split to filter rows based on the generated column value (rows with a False value must be ignored)
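A minimal C# sketch of that script logic, assuming the manually defined first column is exposed as Column0 and the added Boolean output column is called IsDataRow (both names are placeholders for whatever you defined in the editor; the comparison value "id" is whatever your header row holds in the first column):

bool headerFound = false;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Flag everything up to and including the header row as False so the
    // Conditional Split discards it; everything after the header is real data.
    if (!headerFound)
    {
        string firstValue = Row.Column0_IsNull ? string.Empty : Row.Column0;
        if (firstValue.Trim() == "id")   // first column value on the header row
            headerFound = true;
        Row.IsDataRow = false;
    }
    else
    {
        Row.IsDataRow = true;
    }
}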
Create a new flat file connection with a single column for the same file.
Add a Data Flow Task with a transformation Script Component.
Attach a read-write variable to the Script Component to hold the index (skiprows in the example code) and check the first characters of each row in ProcessInputRow.
bool checkRow;
int rowCount;

public override void PreExecute()
{
    base.PreExecute();
    checkRow = true;
    rowCount = 0;
}

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Count rows until the header row ("id |name|loc") is reached
    if (checkRow)
    {
        rowCount++;
        if (Row.Data.StartsWith("id |"))
            checkRow = false;
    }
}

public override void PostExecute()
{
    base.PostExecute();
    Variables.skiprows = rowCount; // write the count back to the package variable
}
Then you just need to use your variable in the HeaderRowsToSkip expression of the original flat file connection.
If the files are going to be very large, you can force the script to fail once it has found the header row (a division by zero, for example) so the rest of the file is not read. Add an OnError event handler and set the system variable Propagate to false (@[System::Propagate] = false) so the forced error does not fail the package.
Related
I'm receiving around 100 Excel files on a daily basis. Among these 100 files there are 4 types, whose names start with ALC, PLC, GLC and SLC followed by some random number, and each Excel file's sheet name is the same as the file name.
Inside each file, regardless of type, cell A3 contains 'Request by' followed by a user name, e.g. Request by 'Ajeet', and we want to pick only the files requested by 'Ajeet'. The first few rows are not formatted; the actual data starts from:
ALC: data starts from cell A33
PLC: data starts from cell A36
GLC: data starts from cell A32
SLC: data starts from cell A38
A few files have no data; in those cases "NoData" is written in the cell where the data would normally start for that file type.
All file types contain the same number of columns.
So how can we handle all these situations in SSIS and load the data into a single SQL table, without using a Script Task? I have attached a snapshot of one of the files for reference.
This will help.
how-to-read-data-from-an-excel-file-starting-from-the-nth-row-with-sql-server-integration-services
Copying the solution here in case the link is unavailable
Solution 1 - Using the OpenRowset Function
Solution 2 - Query Excel Sheet
Solution 3 - Google It
Google it. The information above is from the first search result.
I created the tables as below:
source:([id:`symbol$()] ric:();source:();Date:`datetime$())
property:([id:`symbol$()] Value:())
Then I have two .csv files which contain the data for these two tables.
property.csv looks like below:
id,Value
TEST1,1
TEST2,2
source.csv looks like below:
id,ric,source,Date
1,TRST,QO,2017-07-07 11:42:30.603
2,TRST2,QOT,2018-07-07 11:42:30.603
Now, how do I load the CSV file data into each of the tables in one go?
You can use 0: to load delimited records. https://code.kx.com/wiki/Reference/ZeroColon
The simplest form of the function is (types; delimiter) 0: filehandle
The types should be given as their uppercase letter representations, one for each column, or a blank space to ignore a column. E.g. using "SJ" for property.csv would mean reading the id column as a symbol and the Value column as a long.
The delimiter specifies how the columns are separated, in your case comma-separated values (CSV). You can pass the delimiter as a string ",", which treats every row as part of the data and returns a nested list of the columns; you can either insert that into a table with a matching schema, or pair it with the headers yourself and flip the resulting dictionary to get a table, like so: flip `id`value!("IS";",") 0: `:test.txt.
If you have the column headers as the first row in the csv, you can pass an enlisted delimiter, enlist ",", which will then use the column headers and return a table in kdb with these as the column names, which you can rename if you see fit.
As the files you want to read in have different column types and are to be loaded into different tables, you could create a function to read them in, for example:
{x insert (y;enlist ",") 0:z}'[(`source;`property);("SSSP";"SJ");(`:source.csv;`:property.csv)]
This allows you to specify the name of the table the data should be inserted into, the column types and the file handle of the file.
I would suggest a timestamp instead of the (deprecated) datetime, as it is stored as a long instead of a float, so there will be no issues with comparisons.
You can use key to list the contents of the directory and then load each csv file; note that here the directory is the current working directory, so you may need to prepend the directory path to each file before using 0:.
files: key `:.; /get the contents of the dir
files: files where files like "*.csv"; /keep only the csv files
m:`property.csv`source.csv!("SJ";"JSSZ"); /create the type mappings for each csv file
{[f] .[first ` vs f;();:; (m[f];enlist csv) 0: hsym f]}each files /load each csv into a table named after the file
In SSIS, how can I split the data from one row into 2 rows?
for example :
FROM :
ID Data
1 On/Off
2 On/Off
TO :
ID Data
1 On
1 Off
2 On
2 Off
Solution Overview
You have to use a Script Component to achieve this. Use an asynchronous output buffer to generate multiple rows from one row based on your own logic.
Solution Details
Add a DataFlow Task
In the DataFlow Task add a Flat File Source, Script Component, and a Destination
In the Script Component, select the ID and Data columns as input
Go to the Inputs and Outputs page, click on the output and change the SynchronousInputID property to None
Add two output columns, ID and Data, to the output
Change the script language to Visual Basic
Inside the script editor, write the following code:
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    ' Split the Data value on "/" and emit one output row per piece
    Dim strValues() As String = Row.Data.Split(CChar("/"))
    For Each str As String In strValues
        Output0Buffer.AddRow()
        Output0Buffer.ID = Row.ID
        Output0Buffer.Data = str
    Next
End Sub
Additional Information
For more details follow these links:
SSIS - Script Component, Split single row to multiple rows
Output multiple rows from script component per single row input
Using T-SQL
Based on your comments, this is a link that contains an example of how this can be done using a SQL command:
Turning a Comma Separated string into individual rows
OK, I'm using SQL Server 2012.
I have an 'Orders Folder' that contains CSV files. I need to loop through the CSV files, load them into a SQL table, and then move the CSVs into an 'Archive Folder'. I want to do this in SSIS and know it is possible using a Foreach Loop container, which I understand.
The CSV file has the following headings:
Customer / Item / Qty / Date
My SQL table has the following headings:
Customer / Item / Qty / Date / User
The tricky bit is that the CSV files carry the username for each order in the file name, which looks like this (USERNAME obviously changes):
FWD_Order_USERNAME_01_02_2016_1006_214.csv
I need to extract the USERNAME and append it to the SQL table for each csv file when it is imported - how do I do this?
Thanks,
Michael
You can populate a package variable with the username from the file name, and use a Derived Column transformation to add it as a new column in your dataflow.
You are using a Foreach Loop to process the files,
so you must already be storing the file name in a variable.
1) Create one more variable, username, set its EvaluateAsExpression property to true, and give it an expression that extracts the user name from the file name variable.
2) In the data flow, add a Derived Column based on the username variable and map it to the User column in the destination (the parsing logic is sketched below).
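For illustration, the username sits between fixed tokens in a name like FWD_Order_USERNAME_01_02_2016_1006_214.csv, so it can be pulled out by splitting on underscores. A minimal C# sketch of that parsing, e.g. inside a Script Task (the variable names User::filename and User::username are assumptions, and it assumes the username itself contains no underscores):

public void Main()
{
    // Both variables must be listed in the task's ReadOnlyVariables / ReadWriteVariables.
    // User::filename holds something like C:\Orders Folder\FWD_Order_USERNAME_01_02_2016_1006_214.csv
    string fileName = System.IO.Path.GetFileName(Dts.Variables["User::filename"].Value.ToString());

    // Pattern is FWD_Order_<USERNAME>_<date/time parts>.csv, so the username is the third underscore-separated token
    string[] parts = fileName.Split('_');
    Dts.Variables["User::username"].Value = parts.Length > 2 ? parts[2] : string.Empty;

    Dts.TaskResult = (int)ScriptResults.Success;
}

The same split-on-underscore idea can of course be expressed as an SSIS expression on the variable instead, which is what the answer above describes.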
I have been given a task to load a simple flat file into another flat file using an SSIS package. The source flat file contains a zip code field; my task is to load into the other flat file only the rows with a valid zip code, which is a 5-digit zip code, and redirect the invalid rows to a new file.
Since I am new to SSIS, any help or ideas are much appreciated.
You can add a Derived Column which determines the length of the field, then add a Conditional Split based on that column: a length of 5 goes down the good path, anything else goes down the reject path (an equivalent check in code is sketched below).
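For reference, if a Script Component were used instead of the Derived Column, the same validation could be expressed with a regular expression that requires exactly five digits. In this C# sketch, ZipCode and IsValidZip are hypothetical column names for the input zip code column and a Boolean output flag that the Conditional Split would test:

using System.Text.RegularExpressions;

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // True only when the zip code is exactly five digits; null zip codes fail the check
    Row.IsValidZip = !Row.ZipCode_IsNull && Regex.IsMatch(Row.ZipCode.Trim(), @"^\d{5}$");
}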