Automated file import with SSIS package - sql-server

I am very new to SSIS and its capabilities. I am busy building a new project that will upload files to a database. The problem I am facing is that the files and the tables differ from one another.
So I created a separate table that maps each file's columns to the columns of the specific table the data needs to be stored in. I want the user to manage this part when they receive a new file or when a file layout changes.
As far as I know, SSIS lets you map each file to a table, and the package can be scheduled as a task.
My question is: will SSIS be able to handle this, or should I handle this process in code?
Many thanks in advance

I would say it all depends on the amount of data that will be imported into your SQL Server. For large data sets (normally 10,000+ rows) it becomes a necessity to use SSIS, as you would see performance gains in your application. Here is a simple example of creating an SSIS package using code. For smaller data operations I would suggest using a combination of this and this. Or, to create a dynamic table on your SQL Server based on the file format, look at this.

SSIS can be very picky about file formats, so if the files are completely different, then it probably isn't the tool for the job. For flat files, SSIS requires the ordering of columns to be the same.
If you know that your files will only ever arrive in one of 5 formats (for example), it wouldn't be much trouble to write 5 packages to import them. If any new file could have a totally different schema, I don't think SSIS would be the right tool for the job.
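If the import is handled in code instead, the mapping table described in the question can drive the insert directly. The following is a minimal Python sketch under several assumptions: the mapping has already been read from the mapping table into a dict, the files are CSV with a header row, and `sqlite3` stands in for SQL Server so that the example is self-contained. The function name and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3


def import_with_mapping(conn, file_obj, table, mapping):
    """Insert CSV rows into `table`, renaming columns via `mapping`.

    `mapping` is {file column name: table column name}. File columns that
    are not in the mapping are ignored.
    """
    # Table and column names are interpolated into the SQL text, so only
    # plain identifiers are accepted here.
    for name in [table, *mapping.values()]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    reader = csv.DictReader(file_obj)
    missing = [c for c in mapping if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"file is missing mapped columns: {missing}")

    file_cols = list(mapping)
    table_cols = ", ".join(mapping[c] for c in file_cols)
    placeholders = ", ".join("?" for _ in file_cols)
    sql = f"INSERT INTO {table} ({table_cols}) VALUES ({placeholders})"

    rows = [tuple(row[c] for c in file_cols) for row in reader]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (full_name TEXT, email_address TEXT)")
    data = io.StringIO("Name,Email\nAnn,ann@example.com\n")
    mapping = {"Name": "full_name", "Email": "email_address"}
    print(import_with_mapping(conn, data, "customers", mapping))
```

When a file layout changes, only the mapping rows need to change; a file that lacks a mapped column is rejected before anything is inserted.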

Related

Automate import of CSV files in SQL Server

I'm currently using SSIS to import a whole slew of CSV files into our system on a regular basis. These import processes are scheduled using the SQL Server Agent - which should have a happy ending. However, one of our vendors from which we're receiving data likes to change up the file format every now and then (feels like twice a month) and it is a royal pain to implement these changes in SSIS.
Is there a less painful way for me to get these imported into SQL Server? My requirements are fairly simple:
The file formats are CSV, they're delimited with commas, and are text qualified with double quotes.
The file name will indicate into which table I need these imported
It needs to be something which can be automated
Changes in file format should not be that much of a pain
If something does go wrong, I need to be able to know what it was - logging of some sort
Thanks so much!
BULK INSERT is another option you can choose. You can define your own template of the file layout with a format file:
https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql
https://jamesmccaffrey.wordpress.com/2010/06/21/using-sql-bulk-insert-with-a-format-file/
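As a rough sketch of how the requirements in the question could map onto BULK INSERT, the Python helper below builds the statement text from a file path. It assumes that the file name without its extension is exactly the target table name, that the first row is a header, and that the server is SQL Server 2017 or later, where the `FORMAT = 'CSV'` and `FIELDQUOTE` options are available. The helper name and the paths are hypothetical, and the helper only returns the statement; executing it is left to whatever scheduler or client is in use.

```python
import re
from pathlib import PureWindowsPath


def bulk_insert_sql(file_path, schema="dbo", error_file=None):
    """Build a BULK INSERT statement for a quoted, comma-delimited file.

    Assumption: the file name without extension is the target table name.
    """
    table = PureWindowsPath(file_path).stem
    # Refuse anything that is not a plain identifier, because the name is
    # placed directly into the statement text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"cannot derive a table name from {file_path!r}")

    options = ["FORMAT = 'CSV'", "FIELDQUOTE = '\"'", "FIRSTROW = 2"]
    if error_file is not None:
        # Rejected rows are written to this file, which helps with logging.
        options.append("ERRORFILE = '" + error_file.replace("'", "''") + "'")

    escaped_path = file_path.replace("'", "''")
    return (
        f"BULK INSERT [{schema}].[{table}]\n"
        f"FROM '{escaped_path}'\n"
        f"WITH ({', '.join(options)});"
    )


if __name__ == "__main__":
    print(bulk_insert_sql(r"C:\imports\Orders.csv",
                          error_file=r"C:\imports\Orders.err"))
```

If the vendor's file names carry a date or other suffix, the table-name rule would need to be adapted.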
You can look into using BIML, which generates SSIS packages dynamically from metadata.
I have tried the Java solution "dbis". Please check it below.
https://dbisweb.wordpress.com/
It keeps the migration info in an XML file, which you can edit in any text editor.
However, it needs a static table name.

Extract data from thousands of Excel files into database

We use SharePoint 2013 as a library to hold thousands of Excel files, with almost never consistent formatting, to manage projects occurring on servers. Somewhere in these files, possibly formatted as table objects, is a common set of server names.
Somehow, without being able to change this process in the short term, I need to pull data from all these files to identify how many projects are targeting a particular server.
I've got access to SQL Server 2016 Enterprise, and I'm wondering if something like PolyBase could help with this. I also wonder about SSIS, but I don't expect any table to look exactly like another one.
Other tools may be an option, but I'm not sure what can handle this scale and variety. I think daily updates to the data would be enough, but even so it's still a mess.
How do I pull thousands of varied Excel tables into a database? Is this even possible?
Any longer term solution that doesn't allow them to format and annotate like Excel is unlikely to actually be adopted.
The less you know in advance, the more difficult it will be...
Some ideas:
Technology
read about FROM OPENROWSET, which allows you to read from an Excel file
read about linked server
Use Excel and its great abilities through VBA to iterate through all your Excel-Sheets, open them, analyse them and fill proper tables. Within Excel you know most about your messy data...
Target structure
You might create thousands of tables, each representing one single sheet in all your Excel files. You could query these tables with dynamically created SQL (using meta-data of INFORMATION_SCHEMA) or think about Full-Text-Search
You might import each sheet into one single XML structure (SELECT * ... FOR XML PATH('...')). In this case you'd need a target table with columns for the path and name of your Excel file, the name of the sheet, and an XML column for your data. Another approach would be to represent each file as one XML document and include all sheets there. Try to define common naming for all your data. Querying XML allows you to query columns without knowing their actual names (XQuery with XPath using *).
If your Excel files are already xlsx, you might open them with UNZIP and take the existing XML as-is.
To be honest: I do not think that any tool can do the magic to import such a wide range of mess automatically...
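The unzip idea can be sketched in a few lines of Python. An xlsx file is a ZIP archive, and most cell text is stored once in the part `xl/sharedStrings.xml`, so the sketch below reads that part and checks which known server names occur in it. This is only an illustration under stated assumptions: the list of known server names is supplied by the caller, a server name fills a whole cell, and strings stored inline in the worksheet parts (which some writers produce) are not covered. The function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def shared_strings(xlsx):
    """Return every shared string in an xlsx file (path or file object)."""
    with zipfile.ZipFile(xlsx) as archive:
        try:
            data = archive.read("xl/sharedStrings.xml")
        except KeyError:
            return []  # the workbook has no shared strings part
    root = ET.fromstring(data)
    # Each <si> is one string. Rich text splits it over several <t>
    # elements, so all <t> descendants are joined (phonetic text, if
    # present, is included as well).
    return ["".join(t.text or "" for t in si.iterfind(".//m:t", NS))
            for si in root.iterfind("m:si", NS)]


def mentioned_servers(xlsx, known_servers):
    """Return the known server names that appear as whole cell values."""
    cells = {s.strip().lower() for s in shared_strings(xlsx)}
    return sorted(name for name in known_servers if name.lower() in cells)


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory instead of a real workbook.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "xl/sharedStrings.xml",
            '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/'
            '2006/main"><si><t>SRV-APP01</t></si></sst>',
        )
    print(mentioned_servers(buf, ["srv-app01", "srv-db02"]))
```

The per-file result (file path plus matched server names) is small and regular, so it can be loaded into a single table regardless of how each workbook is formatted.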

How can I import multiple csv files from a folder into sql, into their own separate table

I would like some advice on the best way to go about doing this. I have multiple files all with different layouts and I would like to create a procedure to import them into new tables in sql.
I have written a procedure which uses xp_cmdshell to get the list of file names in a folder, then uses a cursor to loop through those file names and a bulk insert to get them into SQL, but I don't know the best way to create a new table with a new layout each time.
I thought that if I could import just the column row into a temp table, I could use that to create a new table to do my bulk insert into, but I couldn't get that to work.
So what's the best way to do this using SQL? I am not that familiar with .NET either. I have thought about doing this in SSIS. I know it's easy enough to load multiple files which have the same layout in SSIS, but can it be done with variable layouts?
Thanks
You could use BimlScript to make the whole process automated where you just point it at the path of interest and it writes all the SSIS and T-SQL DDL for you, but for the effort involved in writing the C# you'd need, you may as well just put the data dump into SQL Server in the C#, too.
You can use SSIS to solve this issue, though, and there are a few levels of effort to pick from.
The easiest is to use the SQL Server Import and Export Wizard to create SSIS packages from your Excel spreadsheets that will dump the sheet into its own table. You'd have to run this wizard every time you had a new spreadsheet you wanted to import, but you could save the package(s) so that you could re-import that spreadsheet again.
The next level would be to edit a saved SSIS package (or write one from scratch) to parameterize the file path and the destination table names, and you could then re-use that package for any spreadsheets that followed the same format.
Further along would be to write a package that determines which of the packages from the previous level to call. If you can query the header rows effectively, you could probably write an SSIS package that accepts a path as an input parameter, finds all the Excel sheets in that path, queries the header rows to determine the spreadsheet format, and then passes that information to the parameterized package for that format type.
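The dispatch step described above can be illustrated outside SSIS. The Python sketch below reads only the header row, normalizes it, and looks up which package should handle the file. It assumes CSV input rather than Excel sheets to stay self-contained, and the registry contents and package names are hypothetical.

```python
import csv
import io

# Hypothetical registry: normalized header row -> package that loads it.
KNOWN_FORMATS = {
    ("id", "name", "email"): "LoadCustomers.dtsx",
    ("order_id", "customer_id", "total"): "LoadOrders.dtsx",
}


def header_signature(file_obj):
    """Return the header row as a tuple of trimmed, lower-cased names."""
    header = next(csv.reader(file_obj), [])
    return tuple(col.strip().lower() for col in header)


def choose_package(file_obj, formats=KNOWN_FORMATS):
    """Return the package name for this file's layout, or None if unknown."""
    return formats.get(header_signature(file_obj))


if __name__ == "__main__":
    sample = io.StringIO("ID, Name ,Email\n1,Ann,ann@example.com\n")
    print(choose_package(sample))
```

A file whose header matches no registered layout yields `None`, which is the point at which a new package (or a new registry entry) is needed.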
SSIS development is, of course, its own topic - Integration Services Features and Tasks on MSDN is a good place to start. SSIS has its quirks, and I highly recommend learning BimlScript if you want to do a lot of SSIS development. If you'd like to talk over what the ideas above would require in more detail, please feel free to message me.
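For the idea in the question of reading only the column row and creating the table from it, a short sketch of the code-based route might look like the following. It is written in Python rather than C# purely for illustration, assumes every column can be staged as `NVARCHAR(MAX)`, and only generates the `CREATE TABLE` text. The function names are hypothetical.

```python
import csv
import io
import re


def _clean(name, fallback):
    """Reduce a raw header or table name to letters, digits and underscores."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_")
    return cleaned or fallback


def create_table_sql(table, file_obj):
    """Build a CREATE TABLE statement from the header row of a CSV file."""
    header = next(csv.reader(file_obj), None)
    if not header:
        raise ValueError("file has no header row")

    columns, seen = [], set()
    for position, raw in enumerate(header, start=1):
        name = _clean(raw, f"column_{position}")
        # Give repeated header names a numeric suffix so they stay unique.
        base, suffix = name, 2
        while name.lower() in seen:
            name = f"{base}_{suffix}"
            suffix += 1
        seen.add(name.lower())
        columns.append(f"    [{name}] NVARCHAR(MAX) NULL")

    table_name = _clean(table, "imported_table")
    return f"CREATE TABLE [{table_name}] (\n" + ",\n".join(columns) + "\n);"


if __name__ == "__main__":
    sample = io.StringIO('Item Code,"Price (USD)"\nA1,9.99\n')
    print(create_table_sql("vendor_prices", sample))
```

Staging everything as text keeps the load from failing on unexpected values; typed columns can be derived afterwards once the layout is understood.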

Convert or output SSIS package/job to SQL script?

I understand this may be a little far-fetched, but is there a way to take an existing SSIS package and get an output of the job it's doing as T-SQL? I mean, that's basically what it is, right? Transferring data from one database to another can be done with T-SQL as well.
I'm wondering this because I'm trying to get away from using SSIS packages for data transfer and instead use EF/LINQ to do this on the fly in my application. Currently I have an SSIS package that transfers and formats data from one database to another in preparation for being written out to an Excel file. This SSIS package runs nightly and helps speed up the generation of the Excel file, because once the data is transferred to the second db, it's already formatted correctly.
However, if I could leverage EF and maybe some LINQ to SQL to format the data from the first database on the fly and write it out to Excel quickly without having to use this second db, that would be great. So can my original question be done? Can I extract the T-SQL representation of an SSIS package somehow?
SSIS packages are not exclusively T-SQL. They can consist of custom back-end code, file system changes, Office document creation steps, etc., to name only a few. As a result, generating the entirety of an SSIS package's work as T-SQL isn't possible, because the full breadth of its work isn't limited to SQL Server.

SSIS import file versus retrieving data directly from source

We wish to import data into a SQL Server database from a source location located elsewhere in the company WAN in another country.
We will be using SSIS to perform the import, but wonder where the best place to perform the extract and transform would be. We could create a view on the source SQL Server, and SSIS would retrieve data directly from that. The alternative would be to drop a file out of the source and have SSIS import the data from that file.
I am thinking the former is a cleaner solution, but would be interested to know whether there are any benefits in using files or potential issues with grabbing the data directly.
Thanks
I would avoid using files if possible, especially if your starting point is a database. By extracting to a file, you would be adding an unnecessary layer to the process that would increase the possibility of errors. Typical issues with extracted files include unwittingly using an old or incomplete file (if the extract failed) and masking manual edits that users make directly in the file to work around data issues.
If you have a SQL Server database, then creating a stored procedure or view, or entering SQL into SSIS, would give you a defined interface between the source and SSIS. Including the transform with the extract does blur the interface a little, but it is quite common for simple transformations that do not depend on any target (or secondary source) data for the load.
An issue you may need to consider when grabbing data (with either approach) is the transactional state of data. Depending on your source, you may need to handle data in various states of completeness and act appropriately.
