With SSIS, how do you export SQL results to multiple CSV files?

In my SSIS package, I have an Execute SQL Task that is supposed to return up to one hundred million (100,000,000) rows.
I would like to export these results to multiple CSV files, where each file has a maximum of 500,000 rows. So if the SQL task returns 100,000,000 rows, I would like to produce 200 CSV files with 500,000 records in each.
What are the best SSIS tasks for automatically partitioning the results into many exported CSV files?
I am currently developing a Script Task but find that it's not very performant. I am fairly new to SSIS, so I'm not familiar with all the tasks available, and I'm wondering whether another one could do this much more efficiently.
Any recommendations?

Static approach
First add a dataflow task.
In the dataflow task add the following:
A source (for example, an ADO NET source) containing the query that retrieves the data.
A Conditional Split: every condition you add produces a blue output arrow. You need to connect every arrow to a destination.
An Excel Destination or Flat File Destination, depending on whether you want Excel or CSV files. For CSV files you'll need to set up a flat file connection.
In the Conditional Split you can add multiple conditions to split out your data, plus a default output.
Each flat file destination needs its own Flat File Connection Manager.
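With the static approach, each conditional-split condition routes a fixed slice of rows to one destination. As a sketch of the logic only (assuming the source query adds a RowNumber column, e.g. via ROW_NUMBER() — that column is not in the original question):

```python
BATCH_SIZE = 500_000

def output_index(row_number: int) -> int:
    """Which destination file a row lands in.
    Mimics an ordered list of conditional-split conditions:
        RowNumber <= 500000    -> output 0
        RowNumber <= 1000000   -> output 1  (conditions are evaluated in order)
        ...and so on, plus a default output for the remainder.
    """
    return (row_number - 1) // BATCH_SIZE
```

The drawback is that 200 output files means 200 hand-wired conditions and destinations, which is why the dynamic approach scales better for this row count.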
Dynamic approach
Use an Execute SQL Task to retrieve the variables that drive the loop (BatchSize, Start, End).
Add a For Loop (or Foreach Loop) container.
Add a Data Flow Task inside the loop and pass in the parameters from the loop.
(You can pass parameters/expressions into the data flow using its Expressions property.)
Fetch the data with a source in the Data Flow Task based on the parameters from the loop.
Write to a destination (Excel/CSV) with a dynamic name based on the parameters of the loop.
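The loop bookkeeping can be sketched like this (the variable names, file-name pattern, and the OFFSET/FETCH query shape are illustrative assumptions, not part of the original answer):

```python
def batches(total_rows: int, batch_size: int):
    """Yield (start, end, filename) for each loop iteration,
    i.e. the values the For Loop would pass into the data flow."""
    for i in range(0, total_rows, batch_size):
        start = i
        end = min(i + batch_size, total_rows)
        yield start, end, f"export_{i // batch_size + 1:03d}.csv"

# Inside the loop, the parameterized source query could follow a pattern like:
#   SELECT ... FROM dbo.MyTable ORDER BY Id
#   OFFSET ? ROWS FETCH NEXT ? ROWS ONLY
# (a keyset/seek predicate on Id is usually faster than OFFSET at this scale)
```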

Related

Excel to SQL (SSIS) - importing more than 1 file, every file has more than 1 sheet, and the data in Excel starts from the 3rd row

What is the best way to build this?
I know how to do each part separately, but putting them together has me in a pickle.
Please help; I haven't found any videos or sites covering this.
Just to clarify -
The tables (in Excel) have the same design (each in a different sheet).
Some Excel files have 4 sheets, some have only 3.
Many thanks,
Eyal
Assuming that all of the Excel files to be imported are located in the same folder, you will first create a For-Each loop in your control flow. Here you will create a user variable that will be assigned the full path and file name of the Excel file being read (you'll need to define the .xls or .xlsx extension in the loop in order to limit it to reading only Excel files). The following link shows how to set up the first part.
How to read data from multiple Excel files with SQL Server Integration Services
Within this loop you will then create another For-Each loop that iterates through all of the worksheets in the Excel file currently being read. The following link shows how to read the rows and columns from each worksheet into the database table.
Use SSIS to import all of the worksheets from an Excel file
The outer loop will pick up each Excel file and the inner loop will read each worksheet, regardless of how many there are. The key is that the format of each worksheet must be the same. Also, using the Excel source in the data flow, you can define the line of each worksheet at which to begin reading. The process will continue until all of the Excel files have been read.
For good tracking and auditing purposes, it is a good idea to include counters in the automated process to track the number of files read and the number of worksheets read from each. I also like to first import all of the records into staging tables, where any issues and cleanup can be handled efficiently using SQL before populating the results into the final production tables.
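The nested-loop flow, the skip-two-header-rows rule, and the counters can be sketched with in-memory stand-ins (the workbook/sheet structures below are made up; in SSIS the outer Foreach enumerates *.xlsx files and the inner Foreach enumerates each workbook's sheets):

```python
HEADER_ROWS = 2  # the data starts on the 3rd row of every sheet

def import_all(workbooks):
    """workbooks: {file_name: {sheet_name: rows}} stand-in for a folder.
    Outer loop per file, inner loop per worksheet (3 or 4 sheets both work),
    skipping the header rows and counting files/sheets for auditing."""
    staged = []
    files_read = sheets_read = 0
    for _file_name, sheets in workbooks.items():    # outer Foreach loop
        files_read += 1
        for _sheet_name, rows in sheets.items():    # inner Foreach loop
            sheets_read += 1
            staged.extend(rows[HEADER_ROWS:])       # data from the 3rd row on
    return staged, files_read, sheets_read
```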
Hope this all helps.

Best practice: Script Task, or truncate table and re-read flat file?

I am wondering which one is the best practice out of these two options:
Read the text file (.txt) in reverse and load only the lines where the date > MAX(date) in the table, using a Script Task in SSIS.
Re-read the entire flat file (.txt) every time SSIS runs, truncate the table, and reinsert everything.
Thanks in advance.
You should follow these steps:
Add an Execute SQL Task that gets the max date and stores it in a date variable.
Add a Data Flow Task.
In the Data Flow Task, add a Flat File Source, a Conditional Split, and an OLE DB Destination.
In the Conditional Split, keep only rows where [Date] > @[User::MaxDate].
This approach is better than the two approaches you mentioned.
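The filter the conditional split applies is just a comparison against the stored max date; as a sketch (the Date column name and variable name are placeholders):

```python
from datetime import date

def rows_to_load(rows, max_date):
    """Keep only rows newer than the max date already in the destination
    table, mirroring a split condition like [Date] > @[User::MaxDate]."""
    return [row for row in rows if row["Date"] > max_date]
```

This way the file is still read every run, but only the new rows are written, so there is no truncate/reload and no custom reverse-reading code.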
Additional Information
SSIS Basics: Using the Execute SQL Task to Generate Result Sets
Simple SSIS: Importing Data from Flat Text Files
Result Sets in the Execute SQL Task

How to automate export of multiple tables to delimited text in SSIS?

I am trying to create some sort of automation whereby I can generate a series of pipe-delimited text extracts for about 100 different tables each month. Each extract would be based on a simple query like this:
SELECT *
FROM tablename
WHERE AsOfDate = 'currentmonth'
where both tablename and currentmonth would be variables. The tablename variable would change for each of the tables, but currentmonth would remain the same throughout the execution.
I have been attempting to build an SSIS package that uses a ForEach Loop container that runs through a list of all the table names and passes that variable into a SQL string, which is then used by the OLE DB Data source in the data flow.
However, all of these tables have different columns. Based on what I can tell, it would not be feasible to do a simple OLE DB Source to a Flat File Destination within that loop container since the Flat File Connection Manager must be configured to account for the different columns of each table.
Would there be any feasible way to do this outside of configuring the process manually for each of the 100+ tables?
You could look into Biml (Business Intelligence Markup Language), which programmatically generates your data flows based on metadata.
Or you could use a Script Task that loops through the tables, loops through their columns, and generates the text files directly, without using any data flow at all.
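Because the column list differs per table, the script can discover it at run time from the reader's metadata instead of configuring it per table. A minimal sketch of that idea, using Python's sqlite3 as a stand-in for the real connection (an SSIS Script Task would use System.Data.SqlClient and a SqlDataReader; table and file names are illustrative):

```python
import sqlite3

def export_tables(conn, table_names, as_of, out_dir="."):
    """Export each table to a pipe-delimited text file, one file per table.
    Columns are read from cursor metadata, so differing schemas are fine.
    table_names must come from your own metadata list, never user input,
    since the table name is interpolated into the SQL text."""
    for table in table_names:
        cur = conn.execute(f"SELECT * FROM {table} WHERE AsOfDate = ?", (as_of,))
        cols = [d[0] for d in cur.description]        # column names at run time
        path = f"{out_dir}/{table}_{as_of}.txt"
        with open(path, "w", encoding="utf-8") as f:
            f.write("|".join(cols) + "\n")
            for row in cur:
                f.write("|".join(str(v) for v in row) + "\n")
```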

SSIS export sql table to multiple flat files based on rowcount

I know this may be a simple task, but I have yet to find a simple answer. I have a large SQL table that I want to export into multiple flat files (.csv, to be exact) of 10,000 records each. I want to do this using SSIS, and from what I gather I will need a Foreach Loop container. This is as far as I have gotten. As an added complication, a few of the columns have commas in the data itself, so when the file is comma-delimited those values still need to be preserved without removing the original comma.
All the videos I have come across use scripts, or split by the type of the data, or some other approach. I just want CSV files with a set number of records in each. Any help is much appreciated.
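As a sketch of the behaviour being asked for (not an SSIS solution), Python's csv module shows both parts at once: chunking into files of at most 10,000 rows, and standard CSV quoting, which wraps any field containing a comma in quotes so the embedded comma survives. File names here are invented:

```python
import csv
import itertools

def write_chunks(rows, header, rows_per_file=10_000, prefix="export"):
    """Write rows into numbered CSV files of at most rows_per_file rows each.
    csv.writer quotes fields containing commas, preserving embedded commas."""
    it = iter(rows)
    filenames = []
    for n in itertools.count(1):
        chunk = list(itertools.islice(it, rows_per_file))
        if not chunk:
            break
        name = f"{prefix}_{n:03d}.csv"
        with open(name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(chunk)
        filenames.append(name)
    return filenames
```

In SSIS terms, the dynamic approach from the first question above does the same thing: a loop that fetches one batch per iteration and writes it to a flat file destination whose connection string is an expression.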

Dynamic Columns in Flat File Destination

I am working on a generic SSIS package that receives a flat file, adds new columns to it, and generates a new flat file.
The problem I have is that the number of new columns varies based on a stored procedure XML parameter. I tried to use the "Execute Process Task" to call BCP, but the XML parameter is too long for the command line.
I searched the web and found that you cannot dynamically change an SSIS package at runtime, and that I would have to use a script task to generate the output. I started down that path and found that you still have to tell the script component how many columns it will be receiving, and that is exactly what I do not know at design time.
I found a third party SSIS extension from CozyRoc, but I want to do it without any extensions.
Has anyone done something like this?
Thanks!
If the number of columns is unknown at run time then you will have to do something dynamically, and that means using a script task and/or a script component.
The workflow could be:
Parse the XML to get the number of columns
Save the number of columns in a package variable
Add columns to the flat file based on the variable
This is all possible using script tasks, although if there is no data flow involved, it might be easier to do the whole thing in an external Perl script or C# program and just call that from your package.
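Parsing the XML parameter and extending each flat-file row could be sketched like this. The XML shape here (`<columns><column name="..." default="..."/></columns>`) is a hypothetical assumption, since the actual parameter schema isn't given in the question:

```python
import xml.etree.ElementTree as ET

def added_columns(xml_text):
    """Parse the (hypothetical) XML parameter into (name, default) pairs."""
    root = ET.fromstring(xml_text)
    return [(c.get("name"), c.get("default", "")) for c in root.findall("column")]

def extend_row(fields, columns):
    """Append one value per configured column to a flat-file row."""
    return fields + [default for _name, default in columns]
```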
