Importing multiple UTF-8 files with varying number of columns - sql-server

So we all know you can't BULK INSERT a UTF-8 text file into SQL Server 2008, so it's a Flat File Source to a database for me. However, I have 200 files. How can I import them all at once instead of creating an SSIS task for each different set of column counts?
For example, one SSIS Data Flow Task for the files with 30 columns, a different SSIS task for the files with 34 columns, etc.
Note that a Foreach Loop Container does not work; I tried it and it failed.
Also, after column 20 the order of the columns differs in some of the files as well.

For 200 files with differing schemas, you will want to consider automating SSIS package (and Data Flow Task) creation. Once a Source Adapter (Flat File or other) is added to a Data Flow Task, the schema of the source is coupled to the Data Flow. You can see this in action if you create a Flat File Source connected to one of your files, add some other component to the Data Flow Task, connect a Data Flow Path from the Source Adapter to the component, and then open the Metadata page of the Data Flow Path Editor. From inside Business Intelligence Development Studio you cannot dynamically modify Data Flow Path schemas, but you can create them dynamically from a .NET application or with a third-party package-generation platform.
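As a starting point for driving that generation, the files first need to be grouped by schema. A minimal sketch in Python (assuming comma-delimited files with header rows; the function names are illustrative):

```python
import csv
import glob
from collections import defaultdict

def schema_key(path, delimiter=","):
    """Read only the header row and return it as a tuple (the file's 'schema')."""
    with open(path, encoding="utf-8") as f:
        return tuple(next(csv.reader(f, delimiter=delimiter)))

def group_by_schema(pattern):
    """Group file paths by header signature; each group can then drive
    one generated package / Data Flow with a matching Flat File layout."""
    groups = defaultdict(list)
    for path in sorted(glob.glob(pattern)):
        groups[schema_key(path)].append(path)
    return dict(groups)
```

Each resulting group shares one column layout (and column order), so one generated Data Flow per group suffices.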
Hope this helps,
Andy

Related

Load all CSVs from path on local drive into AzureSQL DB w/Auto Create Tables

I frequently need to validate CSVs submitted from clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and have the wizard create the table based on the CSV (file name becomes table name, and the headers become the column names). Then we run a set of stored procedures that checks the information_schema for said table(s) and matches that up with our specs, etc.
Most of the time this involves loading multiple files at a time for a client, which becomes very time-consuming and laborious very quickly when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once to the same end, but xp_cmdshell is not supported by Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import. I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL DB, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage, and then use a Copy Data activity to load the data from the CSV files into Azure SQL DB tables, without creating those tables upfront.
ADF supports 'auto create' of sink tables.
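If ADF is not an option, the header-to-table step can also be scripted. A minimal sketch, assuming a comma-delimited file with a header row; it defaults every column to NVARCHAR(4000) (much like the Import Wizard's loose defaults), and the sanitizer is deliberately crude:

```python
import csv
import os
import re

def create_table_sql(csv_path, schema="dbo"):
    """Build a CREATE TABLE statement that mimics the CSV: the file name
    becomes the table name and the headers become the column names.
    Every column is NVARCHAR(4000) here; tighten the types afterwards."""
    table = os.path.splitext(os.path.basename(csv_path))[0]
    with open(csv_path, encoding="utf-8") as f:
        headers = next(csv.reader(f))
    def clean(s):  # crude identifier sanitizer
        return re.sub(r"\W", "_", s.strip())
    cols = ",\n".join(f"    [{clean(h)}] NVARCHAR(4000) NULL" for h in headers)
    return f"CREATE TABLE [{schema}].[{clean(table)}] (\n{cols}\n);"
```

The generated statement can be run against Azure SQL before a bcp load, and the information_schema checks from the existing stored procedures still apply afterwards.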

create reusable controls/packages in one dtsx package in SSIS?

We use SSIS for our data automation. The caveat is we don't use the normal way mentioned online: for our environment, we save the Package.dtsx file on a server where a Windows job executes it using dtexec.exe.
I have multiple SSIS packages to pull data from various sources (Oracle, MySQL, SQL Server) and the general flow for them is the same. The table names are different but I will use data as the table names for one of the sources/SSIS packages.
- back up the table data into bak_data on the destination DB
- import new data from the source into data
- compare data quality (row count) of data against bak_data
- if data quality does meet our threshold, send a success e-mail (Execute SQL Task against our destination DB using sp_send_dbmail)
- if data quality does not meet our threshold, back up data to bad_data, then restore from bak_data to data and send a failure e-mail
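Since only the table name varies, the steps above can be templated outside SSIS and fed to the package as variables. A sketch of the idea (the T-SQL strings and the bak_/bad_ prefixes follow the naming above; the statements are illustrative, not a drop-in implementation):

```python
def build_steps(table):
    """Render the fixed T-SQL steps for one table name."""
    bak, bad = f"bak_{table}", f"bad_{table}"
    return {
        "backup":  f"IF OBJECT_ID('{bak}') IS NOT NULL DROP TABLE {bak}; "
                   f"SELECT * INTO {bak} FROM {table};",
        "count":   f"SELECT (SELECT COUNT(*) FROM {table}) AS new_rows, "
                   f"(SELECT COUNT(*) FROM {bak}) AS old_rows;",
        "restore": f"SELECT * INTO {bad} FROM {table}; "
                   f"TRUNCATE TABLE {table}; "
                   f"INSERT INTO {table} SELECT * FROM {bak};",
    }
```

One generic package with Execute SQL Tasks bound to these strings (table name passed via a package parameter on the dtexec command line) would then cover every source.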
Since the steps are always the same I thought I could use Control Flow Package Parts and then just use variables for the table names and what not.
But upon further investigation I realized I cannot do that, because the Control Flow Package Part (.dtsxp) is a separate file referenced from the Package.dtsx file.
I can copy it to our automation server but not sure if that will be enough to work when Package.dtsx is executed using dtexec.
Is there any way I can create reusable controls/packages given my constraints/situation?

Load excel files with different columns from a directory into database using SSIS package

I have 40 Excel sheets in a single folder. I want to load them all into different tables in a SQL Server database through an SSIS package. The difficulty I am having is the different number and names of columns in each Excel sheet.
Can this task be achieved with a single package?
Another option, if you want to do it in one Data Flow: you can write a custom C# source component with multiple outputs. In the Script Component you'll figure out the file type and send the data to the proper output.
The NPOI library (https://npoi.codeplex.com/) is a good way to read Excel files in C#.
But if you have fixed file formats, I would prefer to create N Data Flows inside a Foreach Loop container. Use regular Excel source components and just ignore errors in each Data Flow. This lets you take a file and try to load it in each Data Flow one by one; on error you will not fail the package but just go on to the next Data Flow until you find the proper file format.
It can only be done by adding multiple sources, or by using a Script Component that writes a flag saying which sheet it is. Then you can use a Conditional Split and route to multiple destinations.
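The conditional-split idea reduces to matching each sheet's header row against the known layouts. A minimal sketch in Python (the layout keys and destination table names are hypothetical; in SSIS this lookup would live in the Script Component and feed the split):

```python
# Known header layouts -> destination table (names are hypothetical).
LAYOUTS = {
    ("id", "name", "amount"): "dbo.Sales",
    ("id", "name", "amount", "region"): "dbo.SalesRegional",
}

def route(header, layouts=LAYOUTS):
    """Mimic the conditional split: normalize the first row of a sheet
    and look up the destination table; None means 'unknown layout'."""
    key = tuple(str(h).strip().lower() for h in header)
    return layouts.get(key)
```

An unknown layout (None) maps to the "ignore the error and try the next Data Flow" branch from the answer above.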

How to automatically import XML file into various new tables using SSIS

I have a very large XML file (with an XSD file) which has many different objects that need to be imported into SQL Server tables. By objects I mean the highest-level XML wrapping tags, e.g. Products, Orders, Locations, etc.
My current method using SSIS has been to:
Create a Data Flow Task
Set up a new XML source in the data flow task
Drag a new OLE DB Destination into the task
Link the XML output to the input of the OLE DB Destination
Open the OLE DB Destination task and ask it to create a new table for the data to go into
I have to repeat steps 3-5 for all the objects in the XML file, which could run into the hundreds. There's no way I can do this manually.
Is there any way to get SSIS to just create new tables for all the different objects in SQL Server and import the data into those? So it would automatically create dbo.Products, dbo.Locations, dbo.Customers and put the correct XML data into those tables.
I can't see any other feasible way of doing this.
Is there any way to get SSIS to just create new tables for all the different objects in SQL Server and import the data into those?
No :(
There are really two problems here, and you have not gotten to the second one yet: SSIS is probably going to choke reading a very large XML file. The XML Source component loads the entire file into memory when it reads it.
There are a couple of alternatives that I can think of:
- use XSLT transforms
- roll your own SAX parser and use a Script Component source
For the XSLT method, you would transform each object into a flat file, i.e. parse just your customer data into a CSV format, and then add Data Flows to read in each flat file. The drawback is that SSIS uses an earlier version of XSLT which also loads the whole file into memory rather than streaming it. However, I have seen this perform very well on files that were 0.5 GB in size. Also, XSLT can be challenging to learn if you are not already familiar with it, but it is very powerful for getting data into a relational form.
The SAX parser method would allow you to stream the file and pull out the parts that you want into relational form. In a Script Component, you could direct different objects to different outputs.
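As an illustration of the streaming approach (outside SSIS), a minimal Python sketch using `xml.etree.ElementTree.iterparse`, which processes the file incrementally much like a SAX parser; the depth bookkeeping assumes the objects of interest are direct children of the root:

```python
import xml.etree.ElementTree as ET

def stream_objects(source):
    """Stream a large XML file and yield (tag, record) for each direct
    child of the root (Products, Orders, ...), releasing memory as we go."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # a direct child of the root just closed
                yield elem.tag, {c.tag: c.text for c in elem}
                elem.clear()  # drop the parsed subtree from memory
```

Grouping the yielded records by tag gives one row set per object type, which can then be bulk-loaded table by table.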

Speeding Up ETL DB2 to SQL Server?

I came across this blog post when looking for a quicker way of importing data from a DB2 database to SQL Server 2008.
http://blog.stevienova.com/2009/05/20/etl-method-fastest-way-to-get-data-from-db2-to-microsoft-sql-server/
I'm trying to figure out how to achieve the following:
3) Create a BULK Insert task, and load up the file that the Execute Process task created. (Note you have to create a .FMT file for fixed-width import. I created a .NET app to load the FDF file (the transfer description) which will auto-create a .FMT file for me, and a SQL CREATE statement as well, saving time and tedious work.)
I've got the data in a TXT file and a separate FDF with the details of the table structure. How do I combine them to create a suitable .FMT file?
I couldn't figure out how to create suitable .FMT files.
Instead I ended up creating replica tables from the source DB2 system in SQL Server and ensured that the column order was the same as what was coming out of the IBM File Transfer Utility.
Using an Excel sheet to control which file transfers/tables should be loaded (allowing me to enable/disable them as I please), along with a Foreach Loop in SSIS, I've got a suitable solution to load multiple tables quickly from our DB2 system.
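For reference, a non-XML format file for a fixed-width character layout can be generated once the column names and widths are known (e.g. parsed from the FDF, which is site-specific and not shown). A minimal, unvalidated sketch assuming plain SQLCHAR fields, format-file version 10.0, and a blank collation:

```python
def make_fmt(columns, version="10.0"):
    """Emit a minimal non-XML bcp format file for a fixed-width character
    file. `columns` is a list of (column_name, width) pairs; the last
    field terminates on the row delimiter \\r\\n."""
    lines = [version, str(len(columns))]
    last = len(columns)
    for i, (name, width) in enumerate(columns, start=1):
        term = r'"\r\n"' if i == last else '""'
        # host field #, host type, prefix len, data len, terminator,
        # server column #, server column name, collation
        lines.append(f'{i} SQLCHAR 0 {width} {term} {i} {name} ""')
    return "\n".join(lines) + "\n"
```

The output would be saved as a .FMT file and passed to BULK INSERT via WITH (FORMATFILE = '...'); widths and terminators should be checked against the actual export before relying on it.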
