I'm using a double while loop to get a lot of results from several different tables. I get everything I need (500+ subjects, each with 1000+ rows), but each result set comes back in a separate grid. I would like to save each "while" iteration's result to a different .csv file. Is there any way to do this?
It might be possible to do this using SQLCMD or BCP, but it would be quite cumbersome to code, requiring quite a few variables and dynamic SQL.
If faced with this scenario, I would personally go with an SSIS package:
Use package variables to generate destination filenames dynamically
Use a Foreach Loop container instead of the while loop
Put a Data Flow Task inside the container and use the SELECT as the source and the file from the first step as the destination
It is pretty easy to do.
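For comparison, the BCP route mentioned above might look something like the sketch below. This is only a minimal sketch: the table, path, and server names are hypothetical, and it assumes xp_cmdshell is enabled.

DECLARE @SubjectId int, @Cmd varchar(1000);
DECLARE SubjectCursor CURSOR FOR SELECT SubjectId FROM dbo.Subjects;
OPEN SubjectCursor;
FETCH NEXT FROM SubjectCursor INTO @SubjectId;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Export one file per subject, named after the subject id.
    SET @Cmd = 'bcp "SELECT * FROM MyDb.dbo.Results WHERE SubjectId = '
             + CAST(@SubjectId AS varchar(10))
             + '" queryout "C:\Export\Subject_' + CAST(@SubjectId AS varchar(10))
             + '.csv" -c -t, -T -S MyServer';
    EXEC master..xp_cmdshell @Cmd;
    FETCH NEXT FROM SubjectCursor INTO @SubjectId;
END
CLOSE SubjectCursor;
DEALLOCATE SubjectCursor;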
I will be importing records from thousands of CSV files using SSIS. These CSV files will contain a Postal Code column, which has the format A5A 5A5, where "A" is any letter and "5" is any number from 0 to 9.
There is a space between the "A5A" and "5A5" that I want to remove, so that all Postal Codes appear as "A5A5A5".
I am reviewing the documentation and see several options, and I'm trying to narrow down the best one, i.e. the one that requires the least number of steps. So far I am looking at the Derived Column transformation, but that would involve adding another column to my SQL Table.
Is there a way I can trim the space without having to add an extra column?
As @Larnu answered in the comments, a Derived Column is likely the most appropriate component to use here.
The expression you're looking for is a REPLACE. Syntax ought to be
REPLACE([PostalCode], " ", "")
You have 10 columns from your CSV. The Derived Column can either replace an existing column or add a new one to the row buffer. I would advocate adding a new column, PostalCodeStripped or something like that. At some point, something weird is going to happen with the data and you'll get an A5A 5A5 that didn't get the space stripped. Having both the original and the parsed value available while debugging helps sort out problems (oh, this has a non-breaking space or a tab instead of a space, or in addition to one).
But, just because a column is in the buffer does not mean you need to create a column for that in the destination table. Just unmap the PostalCode from the row buffer and map PostalCodeStripped to the PostalCode column in the database. You'll see what I'm talking about in the destination component. By default, they'll map based on name matching but you're welcome to wire them up however you see fit.
ELT is an alternate option: bulk load the data into a staging table, then do a simple SELECT into the destination to do the transformation. I might be tempted not to use SSIS at all. BCP or Import-DbaCsv (from the dbatools PowerShell module) would both be quick alternatives. If you know PowerShell and want to process the files in a pipeline, you can pipe the files into Import-DbaCsv. The PowerShell script can also call Invoke-DbaQuery to run UPDATE or INSERT queries to do the transformation.
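The staging-to-destination step might then be a single statement; a minimal sketch, with hypothetical table and column names:

INSERT INTO dbo.PostalAddress (PostalCode /*, the other nine columns */)
SELECT REPLACE(PostalCode, ' ', '') /*, the other nine columns */
FROM dbo.PostalAddressStaging;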
SSIS can also just do the bulk load and then run T-SQL to do the transformations. I don't like the overhead of maintaining and upgrading SSIS packages; I'd take T-SQL jobs over SSIS jobs any day. (We have about half a year of FTE work to upgrade our SSIS packages to SQL 2019. The T-SQL jobs just keep working when moved to a new version.)
Or go the ETL route and do the transformation in the SSIS data flow. A Derived Column transformation between a Flat File Source and an OLE DB Destination should do the trick.
To handle multiple files, you can use the Foreach Loop Container. There's an enumerator for files using a wildcard path. (The initial T-SQL task just truncates the table for testing.)
You'll need to parameterize the flat file connection so that the source reads each file in turn.
For PowerShell, it might be something like the script below (no transformation yet):
# Load every CSV in the folder into dbo.Test, creating the table if it doesn't exist.
Get-ChildItem 'C:\TestFolder\*.csv' |
    Import-DbaCsv -SqlInstance 'localhost\DEV' -Database 'Test' -Schema 'dbo' -Table 'Test' -AutoCreateTable -Verbose
If you run this in the ISE, be aware of a bug where the connection might not be released after calling Import-DbaCsv, which will cause it to hang. This is not an issue from the command line as far as I can tell. (If this happens to you, you may have to kill the ISE process; closing it is not enough.)
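The transformation itself could then be a single UPDATE run via Invoke-DbaQuery (or an Execute SQL Task); a minimal sketch against the hypothetical dbo.Test table above:

-- Strip the embedded space after the bulk load.
UPDATE dbo.Test
SET PostalCode = REPLACE(PostalCode, ' ', '')
WHERE PostalCode LIKE '% %';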
I am creating an SSIS package to import CSV file data into a SQL Server table.
Some of the rows in the CSV files will have missing values.
For example, if a row has the format: value1,value2,value3 and value2 is missing,
then it will render as: value1,,value3 in the CSV file.
When the above happens (value2 is missing) in my SSIS package, I want NULL to go into the receiving SQL Server column that would hold value2.
I understand that I can add a "Script" task to my SSIS package to apply this rule. However, I'm concerned that this will drastically reduce the performance of my SSIS package. I'm not an expert on the inner workings of SSIS/SQL Server, but I'm concerned that the script will cause my package to lose "BULK INSERT" capabilities (and other efficiencies), since the script will have to inspect every row and apply changes as needed.
Can anyone confirm if adding such a script will cause major performance impacts? Or does the SSIS/SQL-Server engine run the script on every row and then bulk-insert? Is there another way I can apply this rule without taking a performance hit?
Firstly, you can use a Script Task when required. A Script Task is executed only once per execution of the whole package, not once per row. For per-row processing there is a different component, the Script Component. When the regular SSIS tasks are not enough to achieve what you want, you can surely use a Script Component; I don't believe it is a performance killer unless you implement it badly.
Secondly, for this particular requirement you can simply use the Flat File Source to import your CSV file. With its "Retain null values from the source as null values in the data flow" option checked, it will put NULL in the column when there is no value. I'm assuming this is a valid CSV file and that each row has the correct number of commas for its fields (one fewer than the number of fields, actually), even if the value is empty or null for some fields.
I want to copy and merge data from tables with identical structure (in a number of different source databases) to a single table of similar structure in a destination database. From time to time I need to add or remove a source database.
This is currently achieved using a Data Flow Task containing an OLEDB source with a SQL query within which there is a UNION for each of the databases I am extracting from. There is quite a lot of SQL within each UNION so, if I need to add fields, I need to add the same additional SQL to each UNION. Similarly, when I add or remove a source database I need to add or remove a UNION.
I was hoping that, rather than using a UNION with a lot of duplicated code, I could instead use a Foreach Loop Container that executes SQL held in a variable, using parameters to substitute the database name and other database-dependent items on each iteration. However, I hit problems with that: I assume the Data Flow Task within the loop could not interpret the incoming fields because of the use of what is effectively dynamic SQL.
Any suggestions as to how I might best achieve this without duplicating a lot of SQL?
It sounds like you have your loop figured out for moving from database to database. As long as the table schemas are identical (other than names as noted) from database to database, this should work for you.
Inside the For Each Loop container, create either a Script Task or an Execute SQL Task, whichever you're more comfortable working with.
Use that task to dynamically generate the SQL of your OLE DB Source query, changing the Customer Code prefix for each iteration. Assign the SQL text to a variable, either directly in a Script Task, or by assigning the Result Set of your Execute SQL Task (the result set being the query text) to a variable.
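If you go the Execute SQL Task route, the statement that builds the per-iteration query can be as simple as the sketch below; the column and table names are hypothetical, and ? is the OLE DB parameter marker mapped to the loop variable holding the current database name (if the parameter mapping balks at being used inside an expression, build the string in a Script Task instead):

-- Return the source query text as a one-row, one-column result set.
SELECT N'SELECT Col1, Col2, Col3 FROM [' + ? + N'].dbo.SourceTable' AS QueryText;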
Inside your Data Flow Task, in the OLE DB Source, under Data Access Mode select "SQL Command from variable". Select the variable that you populated with your query in the last task.
You'll also need to handle changing the connection string between iterations, but, again, it sounds like you have a handle on that part already.
I'm trying to manipulate a column in SSIS which looks like the sample below, after I removed unwanted rows with a Derived Column and a Conditional Split in my Data Flow Task. The source for this is a flat file.
XXX008001161022061116030S1TVCO3057
XXX008002161022061146015S1PUAG1523
XXX009001161022063116030S1DVLD3002
XXX009002161022063146030S1TVCO3057
XXX009003161022063216015S1PUAG1523
XXX010001161022065059030S1MVMA3020
XXX010002161022065129030S1TVCO3057
XXX01000316102206515901551PPE01504
The first three numbers from the left (starting with "008" on the first row) represent a series, and the next three ("001") represent a number within the series. What I need is to renumber the series sequentially starting from "001", from the first series through to the last, leaving the rest of each row unchanged.
The desired result would thus look like:
XXX001001161022061116030S1TVCO3057
XXX001002161022061146015S1PUAG1523
XXX002001161022063116030S1DVLD3002
XXX002002161022063146030S1TVCO3057
XXX002003161022063216015S1PUAG1523
XXX003001161022065059030S1MVMA3020
XXX003002161022065129030S1TVCO3057
XXX00300316102206515901551PPE01504
...
My potential solution would be to load the file into a temporary database table and query it with SQL from there, but I am trying to avoid this.
The final destination is a flatfile.
Does anybody have any ideas how to pull this off in SSIS? Other solutions are appreciated also.
Thanks in advance
I would definitely use the staging table approach and use window functions to accomplish this. I could see a use case for avoiding the database if SSIS was on a different machine than the database engine and there was a need to offload the processing to the SSIS box.
In that case I would create a Script Transformation. You can process each row and make the necessary changes before passing it to the output. You can use C# or VB.
There are many examples out there. Here is an MSDN article: https://msdn.microsoft.com/en-us/library/ms136114.aspx
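For the staging-table route, the renumbering itself can be done with DENSE_RANK. A minimal sketch, assuming the lines were staged into a hypothetical dbo.StagingRows(RawLine) table and that the series numbers sort in the desired order:

-- Keep the first 3 characters, replace the series (positions 4-6)
-- with its dense rank padded to 3 digits, and keep the rest of the line.
SELECT LEFT(RawLine, 3)
     + RIGHT('000' + CAST(DENSE_RANK() OVER (ORDER BY SUBSTRING(RawLine, 4, 3)) AS varchar(10)), 3)
     + SUBSTRING(RawLine, 7, LEN(RawLine)) AS RenumberedLine
FROM dbo.StagingRows;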
I am working on a generic SSIS package that receives a flat file, add new columns to it, and generate a new flat file.
The problem I have is that the number of new columns varies based on a stored procedure's XML parameter. I tried to use the Execute Process Task to call BCP, but the XML parameter is too long for the command line.
I searched the web and found that you cannot dynamically change an SSIS package during runtime, and that I would have to use a script task to generate the output. I started going down that path and found that you still have to let the script component know how many columns it will be receiving, and that is exactly what I do not know at design time.
I found a third party SSIS extension from CozyRoc, but I want to do it without any extensions.
Has anyone done something like this?
Thanks!
If the number of columns is unknown at run time then you will have to do something dynamically, and that means using a script task and/or a script component.
The workflow could be:
Parse the XML to get the number of columns
Save the number of columns in a package variable
Add columns to the flat file based on the variable
This is all possible using script tasks, although if there is no data flow involved, it might be easier to do the whole thing in an external Perl script or C# program and just call that from your package.
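As an illustration, the first step could even be done in T-SQL via an Execute SQL Task, mapping the result to the package variable; the XML shape below is purely hypothetical:

DECLARE @ColumnSpec xml = N'<Columns><Column name="Extra1"/><Column name="Extra2"/></Columns>';
-- Count the column definitions supplied in the parameter.
SELECT @ColumnSpec.value('count(/Columns/Column)', 'int') AS ColumnCount;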