SSIS export sql table to multiple flat files based on rowcount - sql-server

I know this may be a simple task, but I have yet to find a simple answer. I have a large SQL table that I want to export into multiple flat files (.csv to be exact) of 10,000 records each. I want to do this using SSIS, and from what I gather I will need a FOREACH LOOP container, but that is as far as I have got. As an added bonus, a few of the columns have commas in the data itself, so when the file gets delimited by commas the data still needs to be preserved without stripping out the original comma.
All the videos I have come across use scripts, or split the output by the type of data, or some other way. I just want CSV files based on a set number of records in each file. Any help is much appreciated.

Related

With SSIS, how do you export SQL results to multiple CSV files?

In my SSIS package, I have an Execute SQL Task that is supposed to return up to one hundred million (100,000,000) rows.
I would like to export these results to multiple CSV files, where each file has a maximum of 500,000 rows. So if the SQL task generates 100,000,000 results, I would like to produce 200 csv files with 500,000 records in each.
What are the best SSIS tasks that can automatically partition the results into many exported CSV files?
I am currently developing a script task but find that it's not very performant. I am a bit new to SSIS so I am not familiar with all the different tasks available, and I'm wondering if maybe there's another one that can do it much more efficiently.
Any recommendations?
Static approach
First add a dataflow task.
In the dataflow task add the following:
A source: for example an ADO NET Source that contains the query to retrieve the data.
A conditional split: every condition you add will result in a blue output arrow, and you need to connect every arrow to a destination.
An Excel destination or flat file destination, depending on whether you want Excel files or CSV files. For CSV files you'll need to set up a flat file connection.
In the conditional split you can add multiple conditions to split out your data (for example, one condition per row-number bucket) and have a default output.
Dynamic approach
Use an Execute SQL Task to retrieve the variables that drive a for loop (BatchSize, Start, End).
Add a For Loop / Foreach Loop container.
Add a dataflow task inside the loop and pass in the parameters from the loop.
(You can pass parameters/expressions to the dataflow using its Expressions property.)
Fetch the data with a source in the dataflow task based on the parameters from the for loop.
Write to a destination (Excel/CSV) with a dynamic name based on the parameters of the loop.
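For illustration only, here is a rough sketch in plain C# of the same batching logic the loop implements: count the rows, page through them with OFFSET/FETCH, and write one CSV per page, quoting fields so embedded commas survive. The table, column, and connection names are placeholders; inside SSIS the paging query would sit in the dataflow source and the file name would come from an expression on the flat file connection.

using System;
using System.Data.SqlClient;
using System.IO;

class ExportInBatches
{
    static void Main()
    {
        const int batchSize = 500000;  // rows per CSV file
        string connStr = "Server=.;Database=MyDb;Integrated Security=true;";  // placeholder

        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();

            // Total row count determines how many files we produce.
            long total;
            using (var countCmd = new SqlCommand("SELECT COUNT_BIG(*) FROM dbo.MyTable", conn))
                total = (long)countCmd.ExecuteScalar();

            long batches = (total + batchSize - 1) / batchSize;
            for (long b = 0; b < batches; b++)
            {
                // OFFSET/FETCH needs a deterministic ORDER BY to produce stable pages.
                using (var cmd = new SqlCommand(
                    "SELECT Id, Name, Notes FROM dbo.MyTable ORDER BY Id " +
                    "OFFSET @offset ROWS FETCH NEXT @take ROWS ONLY", conn))
                {
                    cmd.Parameters.AddWithValue("@offset", b * batchSize);
                    cmd.Parameters.AddWithValue("@take", batchSize);

                    using (var reader = cmd.ExecuteReader())
                    using (var writer = new StreamWriter(string.Format("export_{0:D4}.csv", b)))
                    {
                        writer.WriteLine("Id,Name,Notes");
                        while (reader.Read())
                        {
                            // Quote text fields so embedded commas are preserved
                            // (placeholder columns; assumes non-null values).
                            string name = "\"" + reader.GetString(1).Replace("\"", "\"\"") + "\"";
                            string notes = "\"" + reader.GetString(2).Replace("\"", "\"\"") + "\"";
                            writer.WriteLine(reader.GetInt64(0) + "," + name + "," + notes);
                        }
                    }
                }
            }
        }
    }
}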

Can SSIS support loading of files with varying column lengths in each row?

Currently I receive a daily file of around 750k rows and each row has a 3 character identifier at the start.
For each identifier, the number of columns can change but are specific to the identifier (e.g. SRH will always have 6 columns, AAA will always have 10 and so on).
I would like to be able to automate this file into an SQL table through SSIS.
This solution is currently built in MS Access using VBA, just looping through recordsets with a CASE statement; it then writes each record to the relevant table.
I have been reading up on BULK INSERT, BCP (with a format file) and Conditional Split in SSIS; however, I always seem to get stuck at the first hurdle of even loading the file in, as SSIS errors out due to the variable column layouts.
The data file is pipe delimited and looks similar to the below.
AAA|20180910|POOL|OPER|X|C
SRH|TRANS|TAB|BARKING|FORM|C|1.026
BHP|1
*BPI|10|16|18|Z
BHP|2
*BPI|18|21|24|A
(* I have added the * to show that these are child records of the parent record; in this case BHP can have multiple BPI records underneath it.)
I would like to be able to load the TXT file into a staging table, and then I can write the TSQL to loop through the records and parse them to their relevant tables (AAA - tblAAA, SRH - tblSRH...)
I think you should read each row as one column of type DT_WSTR with length = 4000, then implement the same logic you wrote in VBA within a Script Component (VB.NET / C#); a rough sketch follows the links below. There are similar posts that can give you some insights:
SSIS ragged file not recognized CRLF
SSIS reading LF as terminator when its set as CRLF
How to load mixed record type fixed width file? And also file contain two header
SSIS Flat File - CSV formatting not working for multi-line fileds
how to skip a bad row in ssis flat file source
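As a minimal sketch of that Script Component approach (C#), assuming the flat file source delivers each raw line in a single DT_WSTR input column named Line, and that asynchronous outputs named AAAOutput, SRHOutput and BPIOutput have been added in the component designer. The output column names below are illustrative, not from the original post.

private string currentBhp;  // tracks the last BHP parent seen, so BPI child rows can reference it

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    string[] parts = Row.Line.Split('|');
    switch (parts[0])
    {
        case "AAA":
            // AAA records always carry the same fixed set of columns
            AAAOutputBuffer.AddRow();
            AAAOutputBuffer.FileDate = parts[1];
            AAAOutputBuffer.Pool = parts[2];
            // ... map the remaining AAA columns
            break;
        case "SRH":
            SRHOutputBuffer.AddRow();
            SRHOutputBuffer.TransType = parts[1];
            // ... map the remaining SRH columns
            break;
        case "BHP":
            // remember the current parent record
            currentBhp = parts[1];
            break;
        case "BPI":
            BPIOutputBuffer.AddRow();
            BPIOutputBuffer.ParentBhp = currentBhp;
            BPIOutputBuffer.Value1 = parts[1];
            // ... map the remaining BPI columns
            break;
    }
}

Each output can then feed its own destination (tblAAA, tblSRH, and so on), or you can keep your plan and land everything in a staging table first.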

SSIS: add non-existent column to a CSV source

I am loading a large set (tens of thousands) of CSV files into a single staging SQL Server table, using a standard SSIS approach.
Vast majority of source CSV files have identical column structure (order, set of columns, data types). There's around 140 columns all together.
However, in certain (<1%) cases a source file will be lacking some columns (I know exactly which columns they are, and there are three possible combinations of missing columns). This is by design i.e. this is a valid business scenario (meh).
Can I somehow create a "virtual" column (filled with NULL/empty/blank values) for a source CSV connection if (and only if) that column does not exist in the physical source CSV file?
I know I can read the CSV header with a C# scripting component and create multiple source connections, then redirect to the right data flow based on the existence (or lack) of certain columns, but I am hoping for a more "elegant" solution, with just a single CSV data source "smart" enough to "artificially" add blank columns that are missing from the source file.
For simplicity let's assume that the full column set is:
ID;C1;C2;C3
And that C3 is missing occasionally i.e. some CSV files are:
ID;C1;C2
Any hints welcome.
No, there is no "smart" CSV data source built in to SSIS.
You are certainly going to need to use a script component, but instead of using a Script Task outside the dataflow that directs the control flow to the correct dataflow, you can simply create one dataflow that has a script component as the data source. The script component reads the CSV that is currently being imported, and if the column in question is missing, it supplies it with NULL or default values.
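A minimal sketch of such a script source (C#), assuming the full column set from the question (ID;C1;C2;C3), string output columns with the same names, and a read-only variable holding the current file path (here called SourceFilePath, which is an assumption):

public override void CreateNewOutputRows()
{
    string path = Variables.SourceFilePath;  // read-only variable mapped into the component (assumed name)
    using (var reader = new System.IO.StreamReader(path))
    {
        string[] header = reader.ReadLine().Split(';');
        bool hasC3 = System.Array.IndexOf(header, "C3") >= 0;

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split(';');
            Output0Buffer.AddRow();
            Output0Buffer.ID = fields[0];
            Output0Buffer.C1 = fields[1];
            Output0Buffer.C2 = fields[2];

            if (hasC3)
                Output0Buffer.C3 = fields[3];
            else
                Output0Buffer.C3_IsNull = true;  // supply NULL when the column is missing from the file
        }
    }
}

The same header check extends to the other two missing-column combinations mentioned in the question.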

What is a better alternative to Excel for loading data to a SQL Server database?

I have a huge amount of trouble loading spreadsheets into a SQL Server database.
Currently, I'm using an SSIS package to load the data and I have had to make lots of adjustments to get the data to load:
All numbers must be formatted as text (otherwise they don't load properly).
Sometimes numbers must be preceded with single quote (') to get them to load.
If a column has a mix of number cells and text cells, the text cells must come first in the file (otherwise only numbers load and text comes in as NULL).
If a user changes a column name the file will not load.
If a user changes a tab name the file won't load.
If a user adds a new column (even at the end of a sheet) the file won't load.
Extra sheets in the file are not a problem, thankfully!
Dates seem sensitive as to whether or not they will load properly.
Connection strings to the Excel file must include "IMEX=1" or things are worse.
Scheduled SSIS jobs must be run as 32-bit even on 64-bit system.
I've been loading the data (usually 200,000-500,000 rows per file) into a table with all fields defined as nvarchar. Then, once loaded, I transfer that data in the next step of the SSIS package to the working table with typed data fields.
All of the requirements that I must put on the user for how to format the Excel file are really a pain. We usually have to send the file back multiple times until all the formatting issues are corrected before the file will load. I'd like to eliminate this thrash.
I know I'm not the only one that is facing this type of problem. So, I must ask...
What is a better alternative to Excel for loading data into a SQL Server database?
Or, am I going about this the wrong way? Should I be using something other than SSIS to load Excel spreadsheets?
You can try OpenRowSet:
SELECT *
INTO SomeTable
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
    'Excel 8.0;Database=\\servername\c$\filename.xls;HDR=YES;IMEX=1', [Sheet2$])
Not really a SQL answer, but an easy one:
You could require the users to copy and paste data into an Excel spreadsheet where everything but the data fields to be included is locked. This will prevent many of the pain points described.

How to import variable record length CSV file using SSIS?

Has anyone been able to get a variable record length text file (CSV) into SQL Server via SSIS?
I have tried time and again to get a CSV file into a SQL Server table, using SSIS, where the input file has varying record lengths. For this question, the two different record lengths are 63 and 326 bytes. All record lengths will be imported into the same 326 byte width table.
There are over 1 million records to import.
I have no control of the creation of the import file.
I must use SSIS.
I have confirmed with MS that this has been reported as a bug.
I have tried several workarounds. Most have been where I try to write custom code to intercept the record, and I can't seem to get that to work as I want.
I had a similar problem, and used custom code (a Script Task) and a Script Component under the Data Flow tab.
I have a Flat File Source feeding into a Script Component. Inside there I use code to manipulate the incoming data and fix it up for the destination.
My issue was that the provider was using '000000' to mean no date available, and another column had a padding/trim issue.
You should have no problem importing this file. Just make sure that when you create the Flat File connection manager you select the Delimited format, then set the SSIS column length to the maximum file column length so it can accommodate any data.
It appears you are using the Fixed width format, which is not correct for CSV files (since you have variable-length columns), or maybe you've incorrectly set the column delimiter.
Same issue. In my case, the target CSV file has header & footer records with formats completely different than the body of the file; the header/footer are used to validate completeness of file processing (date/times, record counts, amount totals - "checksum" by any other name ...). This is a common format for files from "mainframe" environments, and though I haven't started on it yet, I expect to have to use scripting to strip off the header/footer, save the rest as a new file, process the new file, and then do the validation. Can't exactly expect MS to have that out-of-the box (but it sure would be nice, wouldn't it?).
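For what it's worth, a rough sketch of that pre-processing step inside a Script Task's Main (C#), with placeholder paths; the header and footer are kept so they can drive the completeness validation afterwards:

public void Main()
{
    string[] lines = System.IO.File.ReadAllLines(@"C:\inbox\extract.txt");  // placeholder path

    string header = lines[0];                  // keep for validation (dates, record counts, totals)
    string footer = lines[lines.Length - 1];

    // Everything between the header and footer becomes the file the dataflow reads.
    string[] body = new string[lines.Length - 2];
    System.Array.Copy(lines, 1, body, 0, lines.Length - 2);
    System.IO.File.WriteAllLines(@"C:\inbox\extract_body.txt", body);

    Dts.TaskResult = (int)ScriptResults.Success;  // standard Script Task boilerplate
}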
You can write a script task using C# to iterate through each line and pad it with the proper number of commas. This assumes, of course, that all of the data aligns with the proper columns.
I.e. as you read each record, you can "count" the number of commas. Then, just append X number of commas to the end of the record until it has the correct number of commas.
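A sketch of that padding idea, again inside a Script Task's Main (C#); the path and the expected delimiter count are placeholders:

public void Main()
{
    const int expectedCommas = 25;  // delimiters in a full-width record (placeholder)
    var padded = new System.Collections.Generic.List<string>();

    foreach (string line in System.IO.File.ReadLines(@"C:\inbox\input.csv"))
    {
        int commas = 0;
        foreach (char c in line)
            if (c == ',') commas++;

        // Append enough commas to bring every record up to the full column count.
        padded.Add(line + new string(',', expectedCommas - commas));
    }

    System.IO.File.WriteAllLines(@"C:\inbox\input_padded.csv", padded);
    Dts.TaskResult = (int)ScriptResults.Success;
}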
Excel has an issue that causes this kind of file to be created when converting to CSV.
If you can do this "by hand" the best way to solve this is to open the file in Excel, create a column at the "end" of the record, and fill it all the way down with 1s or some other character.
Nasty, but can be a quick solution.
If you don't have the ability to do this, you can do the same thing programmatically as described above.
Why can't you just import it as a text file and set the column delimiter to "," and the row delimiter to CRLF?
