SSIS Conditional Split to Flat File Destinations - sql-server

I've got a data flow task which has a conditional split which then leads to two different flat file destination. The thing that is puzzling me, is why do I have different 'available destination columns' in the flat file editor's mappings tab than I do in the alternative flat file editor's mapping tab.
I'm hoping this is a Newbie error, but it's had me stumped all morning.

At the bottom of the screen, is a box with the connection manager.
If you have two different flatfile destination then you should have a connection manager for each. Inside those objects is where the columns for each destination is defined. Double click them and examine the column definitions for each. They are probably different, which is why you have a different set of columns available on the mapping page for each of your destinations.

Related

Importing Excel Data Seems to Randomly Give Null Values

Using SSIS for Visual Studio 2017 for some excel file imports.
I've created a package with several loop containers that call to specific packages to handle some files. I have an issue with one particular package being executed in that it seemingly randomly decides the data for columns is NULL per excel file. I was/am under the impression that this is part of the registry setting for TypeGuessRows (changed initially to 0 then to 1000 as a test) located at
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel
The reason I think this is because the various files being brought in generally have the same data, but it seems that if the first few rows of columns in the source data contains only numbers, that the data with mixed values will not be brought in correctly. All other columns aside from this seems fine.
Looking at the source files, all have the same datatype.
I've tried changing the registry TypeGuessRows value and ensured that the output column property was string-based instead of numerical.
The connection string has IMEX=1
So I fixed it. Or at least found a sufficient workaround that should help anyone in my situation. I think it has to do with the cache of SSIS.
I ended up putting a sort function on the problem column so the records getting read as NULL for having a random data type are read first, and not being considered random. I will say, I tried this initially and it didn't work.
Through a little experiment of making a new data flow in the same package I discovered that this solution actually does work, hence me thinking the cache was the issue.
If anyone has any further questions on this, let me know.
This issue is related to the OLEDB provider used to read excel files: Since excel is not a database where each column has a specific data type, OLEDB provider tries to identify the dominant data types found in each column and replace all other data types that cannot be parsed with NULLs.
There are many articles found online discussing this issue and giving several workarounds (links listed below).
But after using SSIS for years, i can say that best practice is to convert excel files to csv files and read them using Flat File components.
Or, if you don't have the choice to convert excel to flat files then you can force excel connection manager to ignore headers from the first row bu adding HDR=NO to the connection string and adding IMEX=1 to tell the OLEDB provider to specify data types from the first row (which is the header - all string most of the time), in this case all columns are imported as string and no values are replaced with NULLs but you will lose the headers and a additional row (header row is imported).
If you cannot ignore the header row, just add a dummy row that contains dummy string values (example: aaa) after the header row and add IMEX=1 to the connection string.
Helpful links
SSIS Excel Data Import - Mixed data type in Rows
Mixed data types in Excel column
Importing data from Excel having Mixed Data Types in a column (SSIS)
Why SSIS always gets Excel data types wrong, and how to fix it!
EXCEL IN SSIS: FIXING THE WRONG DATA TYPES
IMEX= 1 extended properties in ssis

SSRS - producing csv file with mixed format header & footer

I have a query,
I have 3 CTE's produced by a TSQL query, which are used as a list of union-ed demographics, we have a requirement to submit this file in a particular format with a header and footer row in each file saved as csv. Can someone suggest the best method to produce the following format from SSRS, ultimately the header and footer will be generated from a SQL table which will hold the available submission ID's, I am wondering if this format can be automated from SSRS, or if there is a better approach, as you can see the header and footer appear to be tab separated, without the additional columns, then the comma separated data block, followed by another tab separated footer.
Any suggestions would be appreciated - thanks very much
format required below (dummy data):
0001011RNL DBS 473520180116
10,6374635,19540714,,,7253647987,p1,,john,,2,3 MADE UP ST,,local,CUMBRIA,,ca12 0T1,,,,,,,,A82635,,
10,9283746,19650325,,,7536482965,p2,,peter,,2,38 MADE UP ST,,WIGTON,,,ca12 0T3,,,,,,,,A82045,,
9901011RNL DBS 473520180116
UPDATED based on user suggestions: SSIS appears to be the best approach - I have found the following resource which appears to be what is required, I have taken this from https://www.sqlservercentral.com/Forums/Topic815666-148-1.aspx
provided by John Dempsey
Place 2 Data Flow Tasks on the Control Flow tab.
Use the first Data Flow Task to generate your Header and Detail records (rows)
a. Obtain the data from your data source
b. Transform your data as needed to satisfy the results for your Detail Rows
c. Set up your Flat File Destination in the layout for your Detail Rows. (Don't worry about Header yet)
d. Upon opening your Flat File Destination task you will notice a item for the "Header:". If you were to enter something in the box here it would be hard-coded as the header for you within your file. But, instead of it being hard-coded you can make it dynamic. You can create a SSIS variable to store your dynamically built header information, then assign it to the Header property of the Flat File Destination in the 1st Data Flow task.
Set dynamic Header SSIS variable to Header property of Flat File Destination in 1st Data Flow Task.
a. On the Control Flow Tab select the first Data Flow Task then go to the properties window for task
b. Expand [+] Expressions property and click the [...] button
c. In the Property Expressions Editor window click in the left column to bring up a list of properties available within the Data Flow Task for you Header and Detail Rows.
d. Choose the property [Flat File Destination].[Header] (if you haven't renamed it yet) setting the expression to the name of your SSIS variable name you created for your dynamic header.
Your dynamic header should now show up when you run the first data flow task with your detail rows.
Open up the second data flow and create a source for your footer, I used a script component as an input data source, mapping the source row to an SSIS variable that contained my Footer Row.
Create your Flat File Destination for you footer making sure that you set the Overwrite property = False. Your Flat File Destination will be the same file name as your header and detail file name from you first Data Flow Task, but the layout will be different in order to line up with the footer.
Upon running you SSIS, the package will run through executing your first Data Flow Task which writes the data from the Header property, then each detail row. Then, it will run the second Data Flow Task writing the footer row to the end of the same file used in your Header and Detail Data Flow Task.

SSIS : Creating a flat file with different row formats

I want to create a flat file output, where format of rows is different.
file has header, middle data rows, footer row.
file will look Like below
H|deptcode123|deptNameXYZ|totalemp300
E|Sam|Johnson|address1|empCode1|........many other columns
E|Sam2|Johnson2|address2|empCode2|........many other columns
E|Sam4|Johnson3|address3|empCode3|........many other columns
E|Sam5|Johnson4|address4|empCode4|........many other columns
J|300|250000
How can I generate this file in SSIS. Input will come from different tables, I am planning to write 3 separate queries/ sp's to get the header, middle row and footer row record.
To do this you need a data flow and connection manager for each different type of rowset. For example to have different header, body, and footer you would need 3 dataflows and 3 flat file connection managers. Each flat file connection manager points to the same file. The trick is to make sure the setting Overwrite data in the file in the Flat File destination is unchecked. This way each data flow executes and appends to the file and each data flow can have its discrete columns and data types.
If you want to create a flat file where rows has with different metadata. You have to use a one column flat file connection manager. With Dt_WStr data type and length = 4000
Use 3 consecutive DataFlow task using the same Flat file destination
First one write the header, second one the middle rows, third one the footer.
You can concatenate values from the select statment or using a Script Component

SSIS: add non-existent column to a CSV source

I am loading a large set (10s of thousands) of CSV files into a single staging sql server table, using standard SSIS approach.
Vast majority of source CSV files have identical column structure (order, set of columns, data types). There's around 140 columns all together.
However, in certain (<1%) cases a source file will be lacking some columns (I know exactly which columns they are, and there are three possible combinations of missing columns). This is by design i.e. this is a valid business scenario (meh).
Can I somehow create a "virtual" column (filled with NULL/empty/blank values) for a source CSV connection if (and only if) that column does not exist in the physical source CSV file?
I know I can read CSV header with a C# scripting component and create multiple source connections, and re-direct to the right data flow based on existence (or lack) of certain columns but I am hoping for a more "elegant" solution, with just single CSV data source "smart" enough to "artificially" add blank columns that are missing in the source file.
For simplicity let's assume that the full column set is:
ID;C1;C2;C3
And that C3 is missing occasionally i.e. some CSV files are:
ID;C1;C2
Any hints welcome.
No, there is no "smart" CSV data source built in to SSIS.
You are certainly going to need to use a script component, but instead of using a Script Task outside the dataflow that directs the control flow to the correct dataflow, you can simply create one dataflow that has a script component as the data source. The script component reads the CSV that is currently being imported, and if the column in question is missing, it supplies it with NULL or default values.

What is a better alternative to Excel for loading data to a SQL Server database?

I have a huge amount of trouble loading spreadsheets into a SQL Server database.
Currently, I'm using an SSIS package to load the data and I have had to make lots of adjustments to get the data to load:
All numbers must be formatted as text (otherwise they don't load properly).
Sometimes numbers must be preceded with single quote (') to get them to load.
If a column has a mix of number cells and text cells, the text cells must come first in the file (otherwise only numbers load and text comes in as NULL).
If a user changes a column name the file will not load.
If a user changes a tab name the file won't load.
If a user adds a new column (even at the end of a sheet) the file won't load.
Extra sheets in the file is not a problem, thankfully!
Dates seem sensitive whether or not they will load properly.
Connection strings to the Excel file must include "IMEX=1" or things are worse.
Scheduled SSIS jobs must be run as 32-bit even on 64-bit system.
I've been loading the data (usually 200,000-500,000 rows per file) into a table with all fields defined as nvarchar. Then, when loaded I transfer that data in the next step of the SSIS package to the working table with typed data fields.
All of the requirements that I must put on the user for how to format the Excel file is really a pain. We usually have to send the file back multiple times until all the formatting issues are correct before the file will load. I'd like to eliminate this thrash.
I know I'm not the only one that is facing this type of problem. So, I must ask...
What is a better alternative to Excel for loading data into a SQL Server database?
Or, am I going about this the wrong way? Should I be using something other than SSIS to load Excel spreadsheets?
You can try OpenRowSet:
SELECT *
INTO SomeTable
From OpenRowSet('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=\\servername\c$\filename.xls;HDR=YES;IMEX=1', [Sheet2$])
Not really a SQL answer, but an easy one:
You could require the users to copy and paste data to an excel spreadsheet where everything but the data fields to be included are locked. This will prevent many of the pain points described.

Resources