How to handle this scenario in a single SSIS package? - sql-server

I'm receiving around 100 Excel files on a daily basis. Among these 100 files there are four types, whose names start with ALC, PLC, GLC, or SLC followed by some random number, and each Excel file's sheet name is the same as its file name.
Inside each file, cell A3 contains 'Request by' followed by a user name, e.g. Request by 'Ajeet', and we want to pick only the files requested by 'Ajeet'. The first few rows are not formatted; the actual data starts at a different cell for each type:
ALC data starts at cell A33
PLC data starts at cell A36
GLC data starts at cell A32
SLC data starts at cell A38
A few files have no data; in that case "NoData" is written, in the respective file type, in the cell where the data would normally start.
All file types contain the same number of columns.
So how can we handle all of these situations in SSIS and load the data into a single SQL table, without using a Script Task? I have attached a snapshot of one of the files for your reference.

This will help:
how-to-read-data-from-an-excel-file-starting-from-the-nth-row-with-sql-server-integration-services
Copying the solution here in case the link becomes unavailable:
Solution 1 - Use the OpenRowset function
Solution 2 - Query the Excel sheet
Solution 3 - Google it (the information above is from the first search result)
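To make the per-type logic concrete, here is a minimal sketch (in Python with openpyxl, outside SSIS, so purely illustrative) of the three checks each file needs: the requester in cell A3, the type-specific start row, and the "NoData" marker. The folder path and the exact wording in A3 are assumptions.

from pathlib import Path
from openpyxl import load_workbook

START_ROW = {"ALC": 33, "PLC": 36, "GLC": 32, "SLC": 38}   # from the question

def rows_to_load(path):
    wb = load_workbook(path, read_only=True)
    ws = wb[path.stem]                    # sheet name equals file name
    start = START_ROW[path.stem[:3]]      # ALC / PLC / GLC / SLC prefix

    # Only files requested by 'Ajeet' are of interest (A3 holds the requester).
    if "Ajeet" not in str(ws["A3"].value or ""):
        return []

    # Empty files carry a "NoData" marker in the cell where data would start.
    if str(ws.cell(row=start, column=1).value or "").strip() == "NoData":
        return []

    return list(ws.iter_rows(min_row=start, values_only=True))

for f in Path(r"C:\Incoming").glob("[APGS]LC*.xlsx"):      # hypothetical folder
    print(f.name, len(rows_to_load(f)))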

Related

Merging Text/Data files into Rows and Columns for over 80m lines of data

I've been assigned to take over 5k CSV files and merge them to create separate files which contain transposed data, with each filename becoming a column in a new file (source column 1 being extracted from each file as the data) and the rows being dates.
I was after some input/suggestions on how to accomplish this.
Example details as follows:
File1.csv -> File5000.csv
Each file contains the following
Date, Quota, Price, % Value, BaseCost,...etc..,Units
'date1','value1-1','value1-2',....,'value1-8'
'date2','value2-1','value2-2',....,'value2-8'
....etc....
'date20000','value20000-1','value20000-2',....,'value20000-8'
The resulting/merged csv file(s) would look like this:
Filename: Quota.csv
Date,'File1','File2','File3',etc.,'File5000'
'date1','file1-value1-1','file2-value1-1','file3-value1-1',
etc.,'file5000-value1-1'
'date20000','file1-value20000-1','file2-value20000-1','file3-value20000-1',
etc.,'file5000-value20000-1'
Filename: Price.csv
Date,'File1','File2','File3',etc.,'File5000'
'date1','file1-value1-2','file2-value1-2','file3-value1-2',
etc.,'file5000-value1-2'
'date20000','file1-value20000-2','file2-value20000-2','file3-value20000-2',
etc.,'file5000-value20000-2'
....up to Filename: Units.csv
Date,'File1','File2','File3',etc.,'File5000'
'date1','file1-value1-8','file2-value1-8','file3-value1-8',
etc.,'file5000-value1-8'
'date20000','file1-value20000-8','file2-value20000-8','file3-value20000-8',
etc.,'file5000-value20000-8'
I've been able to use an array construct to reformat the data, but due to the sheer number of files and entries it uses far too much RAM: the array gets too big, and that approach doesn't scale.
I was thinking of simply loading each of the 5,000 files one at a time, extracting each line one at a time per file, then outputting the results to each of the new files 1-8 row by row; however, this may take an extremely long time even on an SSD, with over 80 million lines of data across 5k+ files.
The idea was: load File1.csv, extract the first line, and store the date and first-column value in a simple array. Then load File2.csv, extract the first line, check that the date matches, and if so store its first-column value in the same array... repeat for all 5k files, then write the array out to the new Quota.csv file. Then repeat for each subsequent date, again extracting only the first data column of each file. Then repeat the whole process for column 2 data, and so on up to column 8... taking forever :(
Any ideas/suggestions on an approach via a scripting language?
Note: The machine it will likely run on only has 8GB RAM, using *nix.
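One way to keep the row-at-a-time idea while avoiding both the giant array and the repeated passes is to stream all source files in parallel and write the eight output files as you go, so memory holds only one row per file. A rough Python sketch, under two stated assumptions: every file lists the same dates in the same order, and the OS open-file limit has been raised to cover 5k handles (e.g. ulimit -n 6000 on *nix). The column names are placeholders for the elided headers.

import csv
from contextlib import ExitStack

# Placeholder column names: the question elides some of the 8 headers.
COLUMNS = ["Quota", "Price", "PctValue", "BaseCost", "Col5", "Col6", "Col7", "Units"]
FILES = [f"File{i}.csv" for i in range(1, 5001)]

with ExitStack() as stack:
    readers = [csv.reader(stack.enter_context(open(f, newline=""))) for f in FILES]
    writers = []
    for name in COLUMNS:
        out = stack.enter_context(open(f"{name}.csv", "w", newline=""))
        w = csv.writer(out)
        w.writerow(["Date"] + FILES)      # one output column per source file
        writers.append(w)

    for r in readers:                     # skip each source file's header row
        next(r)

    for rows in zip(*readers):            # one row from every file at once
        date = rows[0][0]
        assert all(row[0] == date for row in rows), "dates out of sync"
        for k, w in enumerate(writers, start=1):
            w.writerow([date] + [row[k] for row in rows])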

Auto-generating destinations of split files in SSIS

I am working on my first SSIS package. I have a view with data that looks something like:
Loc Data
1 asd
1 qwe
2 zxc
3 jkl
And I need all of the rows to go to different files based on the Loc value. So all of the data rows where Loc = 1 should end up in the file named Loc1.txt, and the same for each other Loc.
It seems like this can be accomplished with a Conditional Split to flat file destinations, but that would require a destination for each Loc value. I have a lot of locations, and they will all be handled the same way other than being split into different files.
Is there a built-in way to do this without creating a bunch of destination components? Or can I at least use the Script Component as a way to do this?
You should be able to set an expression using a variable. Define your path up to the directory and then set the variable equal to that column.
You'll need an Execute SQL Task to return a result set, and a Foreach Loop container to loop over every row of it.
I don't have access at the moment to post screenshots, but this link should help outline the steps.
When your package runs, the expression will look like:
'C:\Documents\MyPath\location' + @[User::LocationColumn] + '.txt'
It should end up feeding your directory with files according to location.
Map User::LocationColumn to the Location column in your result set, and write your query to group by Location so that all records for a given Location are written to a single file.
I spent some time trying to complete this task using the method @Phoenix suggested, but stumbled upon this video along the way.
I ended up going with the method shown in the video. I was hoping I wouldn't have to separate it into multiple SELECT statements for each location, plus an extra one to grab the distinct locations, but I thought the SSIS implementation in the video was much cleaner than the alternative.
Change the connection manager's connection string to an expression that uses a variable; by varying the variable, the destination file also changes. The connection string expression is:
'C:\Documents\ABC\Files\' + @[User::data] + '.txt'
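For reference, the end result both answers are steering toward, sketched in Python outside SSIS (the exported view file and paths are hypothetical):

import csv
from collections import defaultdict

# Group the view's rows by Loc, then write one Loc<value>.txt per group.
groups = defaultdict(list)
with open("view_export.csv", newline="") as src:   # hypothetical export of the view
    for row in csv.DictReader(src):                # expects Loc and Data columns
        groups[row["Loc"]].append(row["Data"])

for loc, values in groups.items():
    with open(f"Loc{loc}.txt", "w") as out:        # Loc1.txt, Loc2.txt, ...
        out.write("\n".join(values) + "\n")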

SSIS error handling: redirect rows whose zip code field is longer than 5 characters from a flat file

I have been given a task to load a simple flat file into another using an SSIS package. The source flat file contains a zip code field; my task is to load into another flat file only the rows with a correct zip code, meaning a 5-digit zip code, and redirect the invalid rows to a new file.
Since I am new to SSIS, any help or ideas are much appreciated.
You can add a Derived Column which determines the length of the field, then add a Conditional Split based on that column: <= 5 goes down the good path, > 5 goes down the reject path.
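The same split, sketched in Python to pin the rule down. The length test mirrors the answer above; the commented variant is a stricter, assumed check (exactly 5 digits), and the zip code's position in the row is also an assumption.

def is_good(zip_code: str) -> bool:
    # Mirrors the Derived Column + Conditional Split: length <= 5 is "good".
    # Stricter assumed variant (needs import re):
    #   re.fullmatch(r"\d{5}", zip_code.strip()) is not None
    return len(zip_code.strip()) <= 5

with open("source.txt") as src, \
     open("good.txt", "w") as good, \
     open("reject.txt", "w") as reject:
    for line in src:
        zip_code = line.rstrip("\n").split(",")[-1]   # assumed: zip is the last field
        (good if is_good(zip_code) else reject).write(line)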

How to combine column header names in Excel before loading it into a target table using SSIS

There is an Excel sheet (the screenshot is only a part, showing the first 5 columns of a wider 800-column sheet!) with the format mentioned below.
The column names are in the 4th row. I need to make the column names a combination of the 3rd and 4th rows before loading the data into a table using SQL Server Integration Services (SSIS).
Example of the column names required in the target table:
[Plcn Pyll SUM 00BASELINE]
[Plcn Pyll SUM 04QUARTER_2014_15_Q1]
[Plcn Pyll SUM 08QUARTER_2014_15_Q2]
How can I import the Excel file to meet this requirement?
(The SSIS Flat File connection manager has some limitations, right? So does this have to be handled at the Excel VBA level?)
If I convert the Excel file to txt and try to import it using the Flat File connection manager, it shows
"there is more than one data source column with the same name".
The thing is, 00BASELINE appears more than once among the 800 column headers.
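One workaround is to flatten the sheet before the SSIS load, joining rows 3 and 4 into a single, unique header row so the Flat File connection manager no longer sees duplicate names. A sketch in Python with openpyxl; the file names, and the assumption that data starts on row 5, are mine:

import csv
from openpyxl import load_workbook

ws = load_workbook("source.xlsx", read_only=True).active
all_rows = list(ws.iter_rows(values_only=True))

# Join the 3rd and 4th rows cell-by-cell into one header, e.g.
# "Plcn Pyll SUM" + "00BASELINE" -> "Plcn Pyll SUM 00BASELINE".
row3, row4 = all_rows[2], all_rows[3]
header = [f"{a or ''} {b or ''}".strip() for a, b in zip(row3, row4)]

with open("flattened.txt", "w", newline="") as out:
    w = csv.writer(out)
    w.writerow(header)
    for row in all_rows[4:]:              # assumed: data starts on row 5
        w.writerow(row)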

Compare two Excel files in an SSIS Foreach Loop Container

Introduction: I have multiple Excel files which loop through a Foreach Loop Container in an SSIS package.
The first Excel file, Excel1.xlsx, contains the old data (for example, a column named EffectiveDate populated with dates from 2001-01-01 to 2013-04-01).
The second Excel file, Excel2.xlsx, contains the new entries with EffectiveDate from 2013-05-01 onward, and also contains some old data from Excel1.xlsx.
These two files loop through Foreach Loop Container.
Problem: Once the first Excel file Excel1.xlsx is loaded, I want to compare it with the second Excel file Excel2.xlsx, update the EffectiveDate of the old data in Excel2.xlsx with the EffectiveDate of the matching rows in Excel1.xlsx, and set all other rows (the new entries) of Excel2.xlsx to GetDate().
Is it possible to get this done in a single Data Flow Task?
And also, how do I compare two Excel files in a single container?
You can have two Excel sources within one Data Flow Task. You could use a Merge Join to compare the values and feed the result to an Excel destination.
If you want to loop through 10 Excel files, comparing one to another, I would suggest that your Merge Join output be the second Excel source, and that you map your container variable to the first Excel source. That way, everything from Excel file 1 is put into the output file, and then for each subsequent file only the entries not already listed in the output file are added.
If you get hung up on any of the steps individually, I'm sure I or others can help you push through the sticking points.
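The compare-and-update rule itself, sketched in Python with pandas just to make it concrete (the join key "Key" and the file/column names are assumptions; pandas needs openpyxl to read .xlsx):

from datetime import datetime
import pandas as pd

old = pd.read_excel("Excel1.xlsx")
new = pd.read_excel("Excel2.xlsx")

# Left-join Excel2 against Excel1 on the assumed business key.
merged = new.merge(old[["Key", "EffectiveDate"]], on="Key",
                   how="left", suffixes=("", "_old"))

# Matching (old) rows inherit Excel1's EffectiveDate; new rows get
# today's date, the equivalent of GetDate().
merged["EffectiveDate"] = merged["EffectiveDate_old"].fillna(datetime.now())
merged = merged.drop(columns="EffectiveDate_old")

merged.to_excel("Excel2_updated.xlsx", index=False)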
