SSIS Export all data from one table into multiple files - loops

I have a table called customers which contains around 1,000,000 records. I need to transfer all the records to 8 different flat files which increment the number in the filename e.g cust01, cust02, cust03, cust04 etc.
I've been told this can be done using a for loop in SSIS. Please can someone give me a guide to help me accomplish this.
The logic behind this should be something like "count number of rows", "divide by 8", "export that amount of rows to each of the 8 files".

To me, it will be more complex to create a package that loops through and calculates the amount of data and then queries the top N segments or whatever.
Instead, I'd just create a package with 9 total connection managers. One to your Data Database (Source) and then 8 identical Flat File Connection managers but using the patterns of FileName1, Filename2 etc. After defining the first FFCM, just copy, paste and edit the actual file name.
Drag a Data Flow Task onto your Control Flow and wire it up as an OLE/ADO/ODBC source. Use a query, don't select the table as you'll need something to partition the data on. I'm assuming your underlying RDBMS supports the concept of a ROW_NUMBER() function. Your source query will be
SELECT
MT.*
, (ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) % 8 AS bucket
FROM
MyTable AS MT;
That query will pull back all of your data plus assign a monotonically increasing number from 1 to ROWCOUNT which we will then apply the modulo (remainder after dividing) operator to. By modding the generated value by 8 guarantees us that we will only get values from 0 to 7, endpoints inclusive.
You might start to get twitchy about the different number bases (base 0, base 1) being used here, I know I am.
Connect your source to a Conditional Split. Use the bucket column to segment your data into different streams. I would propose that you map bucket 1 to File 1, bucket 2 to File 2... finally with bucket 0 to file 8. That way, instead of everything being a stair step off, I only have to deal with end point alignment.
Connect each stream to a Flat File Destination and boom goes the dynamite.

You could create a rownumber with a Script Component (don't worry very easy): http://microsoft-ssis.blogspot.com/2010/01/create-row-id.html
or you could use a rownumber component like http://microsoft-ssis.blogspot.com/2012/03/custom-ssis-component-rownumber.html or http://www.sqlis.com/post/Row-Number-Transformation.aspx
For dividing it in 8 files you could use the Balanced Data Distributor or the Conditional Split with a modulo expression (using your new rownumber column):

Related

How to export more than 1 million rows to Excel with SSIS 2016?

I'm using SSIS with SQL Server 2016 to produce both text and Excel files (also version 2016). Some data flow tasks return more than a million rows of results. Writing to text is not an issue. However, Excel is limited to 1 million rows per sheet.
How do I configure the Excel Destination, or other component, to ensure more than a million rows are saved in the target workbook using multiple sheets as needed?
Sometimes, you need to push back to whomever has requested "exporting all the data to Excel" as it's just not an option. Is an analyst really going to be able to do anything with a a spreadsheet with multi-million rows? No, no they will not.
Purely as an exercise of "how could I do this really bad thing"
(excluded from this exercise is a custom Script Destination as I didn't feel like writing code)
You must determine ahead of time what is a reasonable upper bound on the the number of sheets to be created. You can either limit your worksheet to a million or hit the actual limit of 1,048,576 rows per sheet. Possibly 1,048,575 rows because you want to repeat your header across sheets.
Whatever that maximum number of sheets is, N, you will need to create to create N Excel Destinations
You'll need to have a ROW_NUMBER() function applied to your source data so you'll have to have a custom query there
SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum FROM dbo.MyTable;
The Conditional Split will use the modulo function to assign rows to their paths
RowNum % N == 1
RowNum % N == 2
...
RowNum % N == (N-1)
and the default output path is for == 0

How to read a flat file and load it into 2 different SQL tables(different table structure) using SSIS 2019

I have a flat file which doesn't have the header record. The data except the trailing record is like a fixed width flat file with no delimiters.
Data in flat file looks like:
TOM ROLLS
DAVECHILLS
TOTAL2XYZ
Fixed Width data(first 2 lines as shown in the above flat file data)
ColumnName Start position End position
Name 1 4
Last_name 5 9
I want to load the data(till trailing record) in data_table and the trailing record(starting with Total) in another table. The data in the total table should look like
c1 c2
2 XYZ
For the data table, I am currently using "fixed width" and dividing the data into different column and it is working fine. Can you please help to load this last trailing record in a different table(Total table as discussed above)
You have not provided enough data for me to test because I can find several methods to load one row and accomplish what you are asking but those methods not necessarily work with multiple rows depending on the structure of you source data.
On the surface it appears that you simply need to make another flat file connection and define the starting and end position to extract only the data for the second table.

SQL Invoice Query Performance converting Credits to negative numbers

I have a 3rd party database that contains Invoice data I need to report on. The Quantity and Amount Fields are stored as Positive numbers regardless of whether the "invoice" is a Credit Memo or actual Invoice. There is a single character field that contains the Type "I" = Invoice, "R" = Credit.
In a report that is equating 1.4 million records, I need to sum this data, so that Credits subtract from the total and Invoices add to the total, and I need to do this for 8 different columns in the report (CurrentYear, PreviousYear, etc)
My problem is performance of the many different ways to achieve this.
The Best performing seems to be using a CASE statement within the equation like so:
Case WHEN ARH.AccountingYear - 2 = #iCurrentYear THEN ARL.ShipQuantity * (CASE WHEN InvoiceType = 'R' THEN -1 ELSE 1 END) ELSE 0 END as PPY_INVOICED_QTY
But code readable wise, this is super ugly since I have to do it to 8 different columns, performance is good, runs against all 1.4M records in 16 seconds.
Using a Scalar UDF kills performance
Case WHEN ARH.AccountingYear - 2 = #iCurrentYear THEN ARL.ShipQuantity * dbo.fn_GetMultiplier(ARH.InvoiceType) ELSE 0 END as PPY_INVOICED_QTY
Takes almost 5 minutes. So can't do that.
Other options I can think of would be:
Multiple levels of Views, use a new view to add a Multiplier column, then SELECT from that and do the multiplication using the new column
Build a table that has 2 columns and 2 records, R, -1 and I, 1, and join it based on InvoiceType, but this seems excessive.
Any other ideas I am missing, or suggestions on best practice for this sort of thing? I cannot change the stored data, that is established by the 3rd party application.
I decided to go with the multiple views as Igor suggested, actually using the nested version, even though readability is lower, maintenance is easier due to only 1 named view instead of 2. Performance is similar to the 8 different case statements, so overall running in just under 20 seconds.
Thanks for the insights.

SSIS- Split output into multiple files

I am using SSIS Data Tools to create data extracts from a legacy system.
Our new system needs the files that it imports to be split into 5MB files.
Is there anyway that I can split the files into separate files?
I'm thinking that because the data is already in the database, I can do a loop, or something similar, will select a certain amount of records at a time.
Any input appreciated!
If your source is SQL, use the Row_Number function against the table key to allocate a number per row e.g.
Row_number() OVER (Order by Customer_Id) as RowNumber
and then wrap your query in a CTE or make it a sub query with a where clause to give you the number of rows that will equate to a 5MD file e.g.
WHERE RowNumber >= 5000 and RowNumber <10000
You will need to call this source target several times (with different Row Start and Row End values), so probably best to
Find number of total records in control flow and set a TotalRows parameter
Create a loop in your control flow
Set 3 parameters in your control flow to iterate the through each set of records and store the data in seperate files. e.g. first loop would set
RowStart = 0
RowEnd = 5000
FileName = MyFile_[date]_0_to_4999

SPSS :Loop through the values of variable

I have a dataset that has patient data according to the site they visited our mobile clinic. I have now written up a series of commands such as freqs and crosstabs to produce the analyses I need, however I would like this to be done for patients at each site, rather than the dataset as whole.
If I had only one site, a mere filter command with the variable that specifies a patient's site would suffice, but alas I have 19 sites, so I would like to find a way to loop through my code to produce these outputs for each site. That is to say for i in 1 to 19:
1. Take the i th site
2. Compute a filter for this i th site
3. Run the tables using this filtered data of patients at ith site
Here is my first attempt using DO REPEA. I also tried using LOOP earler.
However it does not work I keep getting an error even though these are closed loops.
Is there a way to do this in SPSS syntax? Bear in mind I do not know Python well enough to do this using that plugin.
*LOOP #ind= 1 TO 19 BY 1.
DO REPEAT #ind= 1 TO 20.
****8888888888888888888888888888888888888888888888888888888 Select the Site here.
COMPUTE filter_site=(RCDSITE=#ind).
USE ALL.
FILTER BY filter_site.
**********************Step 3: Apply the necessary code for tables
*********Participation in the wellness screening, we actually do not care about those who did FP as we are not reporting it.
COUNT BIO= CheckB (1).
* COUNT FPS=CheckF(1).
* COUNT BnF= CheckB CheckF(1).
VAL LABEL BIO
1 ' Has the Wellness screening'
0 'Does not have the wellness screening'.
*VAL LABEL FPS
1 'Has the First patient survey'.
* VAL LABEL BnF
1 'Has either Wellness or FPS'
2 'Has both surveys done'.
FREQ BIO.
*************************Use simple math to calcuate those who only did the Wellness/First Patient survey FUB= F+B -FnB.
*******************************************************Executive Summary.
***********Blood Pressure.
FREQ BP.
*******************BMI.
FREQ BMI.
******************Waist Circumference.
FREQ OBESITY.
******************Glucose.
FREQ GLUCOSE.
*******************Cholesterol.
FREQ TC.
************************ Heamoglobin.
FREQ HAEMOGLOBIN.
*********************HIV.
FREQ HIV.
******************************************************************************I Lifestyle and General Health.
MISSING VALUES Gender GroupDep B8 to B13 ('').
******************Graphs 3.1
Is this just Frequencies you are producing? Try the SPLIT procedure by the variable RCDSITE. Should be enough.
SPLIT FILES allows you to partition your data by up to eight variables. Then each procedure will automatically iterate over each group.
If you need to group the results at a higher level than the procedure, that is, to run a bunch of procedures for each group before moving on to the next one so that all the output for a group will be together, you can use the SPSSINC SPLIT DATASET and SPSSINC PROCESS files extension commands to do this.
These commands require the Python Essentials. That and the commands can be downloaded from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral) if you have at least version 18.
HTH,
Jon Peck
A simple but perhaps not very elegant way is to select from the menu: Data/Select Cases/If condition, there you enter the filter for site 1 and press Paste, not OK.
This will give the used filter as syntax code.
So with some copy/paste/replace/repeat you can get the freqs and all other results based on the different sites.

Resources