SSIS import file versus retrieving data directly from source - sql-server

We wish to import data into a SQL Server database from a source location located elsewhere in the company WAN in another country.
We are to be using SSIS to perform the import but wonder where would be the best place to perform the extract and transform. We could create a view on the source SQL server and SSIS will directly retrieve data from that. The alternative would be to drop a file out of the source and have SSIS import the data from that file.
I am thinking the former is a cleaner solution but would be interested to know whether there are any benefits in using files or potential issues with grabbing the data direct?
Thanks

I would avoid using files if possible, especially if your starting point is a database. By extracting to a file, you would be adding an unnecessary layer in the process that would increase the possiblity of errors. Typical issues of using extracted files include unwittingly using an old / incomplete file (if extract failed) and masking user manually edited changes (direct in file for data issues).
If you have a SQL Server database, then creating a stored procedure, view or entering sql into SSIS would give you defined interface between source and SSIS. Including the transform with the extract does blur the interface a little, but is quite common for simple transformation that do not depend on any target (or secondary source) data for the load.
An issue you may need to consider when grabbing data (with either approach) is the transactional state of data. Depending on your source, you may need to handle data in various states of completeness and act appropriately.

Related

What ssis design pattern is good for flat file to table load

Planning to import 1000s of text file to sql server tables. All these files are having different structures and it goes to corresponding new tables in
SQl. Different methods coming in mind is using of Biml /creating ssis packages with number of data flows/using import wizard.
What is the ssis design pattern to achieve this in most time efficient way. This is a onetime load though.
How to handle the failure ? : I am not considering the checkpoint because ,when Control Flow tasks are run in parallel checkpoints act a little erratically.
This might give you a starting point if you want to see what each file has for data, without opening every one of them. The DefaultDir will reside on your SQL Server that you are running from.
SELECT * FROM
OPENROWSET ('MSDASQL', 'Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\PathtoFiles',
'select * from FileName.csv');

Extract data from thousands of Excel files into database

We use SharePoint 2013 as a library to hold thousands of Excel files, with almost never consistent formatting, to manage projects occurring on servers. Somewhere in these maybe formatted as table objects is a common set of server names.
Somehow, without being able to change this process in the short term, I need to pull data from all these files to identify how many projects are targeting a particular server.
I've got access to SQL Server 2016 enterprise, and wondering if something like PolyBase could help with this? I also wonder about SSIS but I don't expect any tables to look exactly like another one.
Other tools may be an option, but I'm not sure what can handle this scale and variety. I think daily updates to the data would be enough, but even so it's still a mess.
How do I pull thousands of varied excel tables into a database? Is this even possible?
Any longer term solution that doesn't allow them to format and annotate like excel is unlikely to actually be adopted.
The less you know in advance, the more difficult it will be...
Some ideas:
Technology
read about FROM OPENROWSET which allows to read from an Excel
read about linked server
Use Excel and its great abilities through VBA to iterate through all your Excel-Sheets, open them, analyse them and fill proper tables. Within Excel you know most about your messy data...
Target structure
You might create thousands of tables, each representing one single sheet in all your Excel files. You could query these tables with dynamically created SQL (using meta-data of INFORMATION_SCHEMA) or think about Full-Text-Search
You might import each sheet into one single XML-structure (SELECT * ... FOR XML PATH('...')). In this case you'd need a target table with columns for Path and name of your Excel, Name of the sheet and an XML column for your data. Another approach was to represent each File on one XML and include all sheets there. Try to define common naming for all your data. Querying XML allows to query columns without knowing their actual names (XQuery with XPath using *).
If your Excels are xlsx already, you might open them with UNZIP and take the existing XML as-is.
To be honest: I do not think that any tool can do the magic to import such a wide range of mess automatically...

Automated file import with SSIS package

I am very new to SSIS and its capabilities. I am busy building a new project that will upload files to a database. The problem I am facing is that files and tables differentiate from one another.
So what was done is I created a table that will map each file's columns to the specific table's column the data needs to be stored in, in a separate table. I want the user to manage this part when they receive a new file or the file layout changes some how.
As far as I know about SSIS is that you can map each file to a table and it can be scheduled as task.
My question is will SSIS be able to handle this or should I handle this process in code?
Many thanks in advance
I would say it all depends on the amount of data that would be imported into your SQL server, for large data sets (Normally 10000+ Rows) it becomes a necessity to utilize the SSIS as you would receive performance gains in your application. Here is a simple example of creating a SSIS package using code. For smaller data operations I would suggest using a combination of this and this. Or to Create a dynamic table on your SQL server based on the file format, look at this
SSIS can be very picky about file formats, so if the files are completely different, then it probably isnt the tool for the job. For flat files, SSIS requires the ordering of columns to be the same.
If you know that your files will only ever arrive in one of 5 formats (for example), it wouldn't be much trouble to write 5 packages to import them. If any new file could have a totally different schema, I dont think SSIS would be the right tool for the job.

How to automatically import XML file into various new tables using SSIS

I have a very large XML file (with an xsd file) which has many different objects that need to be imported into SQL Server tables. When I mean objects, I mean the highest level XML wrapping tag e.g. Products, Orders, Locations etc
My current method using SSIS has been to:
Create a Data Flow Task
Set up a new XML source in the data flow task
Drag a new OLE DB Destination into the task
Link the XML output to the input of the OLE DB Destination
Open the OLE DB Destination task and ask it to create a new table for the data to go into
I have to repeat steps 3-5 for all the objects in the XML file which could run into the hundreds. Theres no way that I can manually do this.
Is there anyway to get SSIS to just create new tables for all the different objects in SQL server and import the data into those? So it would automatically create dbo.Products, dbo.Locations, dbo.Customers and put the correct XML data into those tables.
I can't see any other feasible way of doing this.
Is there anyway to get SSIS to just create new tables for all the
different objects in SQL server and import the data into those?
No :(
There's really two problems, here. You have not gotten to the second one, yet, which is ssis is probably going to choke reading a very large xml file. The XML source component loads the entire file into memory when it reads it.
There are a couple of alternatives that I can think of:
- use XSLT transforms
- roll your own sax parser and use a script component source
For the XSLT method, you would transform each object into a flatfile i.e. parse just your customer data into a csv format and then add data flows to read in each flat file. The drawbacks are that ssis uses an earlier version of XSLT which also loads the whole file into memory, rather than streaming it. However, I have seen this perform very well on files that were 0.5 GB in size. Also, XLST can be challenging to learn if you are not already familiar, but it is very powerful in getting data into a relational form.
The sax parser method would allow you to stream the file and pull out the parts that you want into a relational form. In a script component, you could direct different objects to different outputs.

SQL Server XML (multi-namespace) Parsing question

I'm trying to use SQL Server Integration Services (SSIS) to parse an XML file and map the elements to database columns across several tables in a single database.
However, when I use the Data Flow Task->XML Source to try and parse an example XML file (that file is located here, XSD is located here), it says
"http://www.exchangenetwork.net/schema/TRI/4:TransferWasteQuantity" has multiple members named "http://www.exchangenetwork.net/schema/TRI/F:WasteQuantityCatastrophicMeasure"
Is there any way to get SSIS to parse XML data such as this? This schema changes regularly so I'd prefer to do as little parsing code outside of the data mappings as possible. Also if there's a better way to do this outside of SSIS (say, by using SQL Server Analysis Services) then that would work too.
Apparently, due to the complexity of the XML, this is not possible with the latest iterations of SQL, at least without significant modification of the incoming XML submission and XSD, which would likely lead to data corruption, therefore it is conclusive that it is not feasible to implement.

Resources