Import XML files with different structures using SSIS - sql-server

I have multiple XML files with different structures, yet they all share a specific set of nodes.
My objective is to only import those mutual nodes into an SQL Server table.
How can I proceed, knowing that I can't generate an .xsd for every type, as there are many possible XML file variations?
Thanks in advance.

A simple solution would be to load all these XML files into an XML table (two columns: FileId and XmlData). This can be done as the first step of the package.
Then write a stored procedure that shreds the XML from this table into your final tables. This stored procedure is called as the second step of the SSIS package.
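A rough sketch of the two steps, assuming hypothetical table, column, and node names (adjust them to your actual shared nodes):

```sql
-- Step 1 target: a staging table the package loads each file into
CREATE TABLE dbo.XmlStaging
(
    FileId  INT IDENTITY(1,1) PRIMARY KEY,
    XmlData XML NOT NULL
);
GO

-- Step 2: shred only the shared nodes into the final table
CREATE PROCEDURE dbo.ShredCommonNodes
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.FinalTable (FileId, CommonField1, CommonField2)
    SELECT
        s.FileId,
        n.value('(Field1/text())[1]', 'NVARCHAR(100)'),
        n.value('(Field2/text())[1]', 'NVARCHAR(100)')
    FROM dbo.XmlStaging AS s
    CROSS APPLY s.XmlData.nodes('//CommonNode') AS t(n);
END;
```

Because the XPath in nodes() only looks for //CommonNode, the surrounding structure of each file can vary freely.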

I don't think this is easy to do using SSIS.
If you don't have other choices, you can use a Script Component as a Source, parse the XML files using a C# or VB.NET script, and then generate output rows with only the desired columns.
You can refer to the following articles for more information about using Script component:
Creating a Source with the Script Component
SSIS Script Component Overview
SchemaMapper class library
For a while now, I have been working on a project called SchemaMapper. It is a class library developed in C#. You can use it to import tabular data from XML and other formats with different structures into a unified SQL Server table.
SchemaMapper - C# schema mapping class library
You can refer to the project wiki for a step by step guide:
SchemaMapper Wiki
Also, feel free to check the following answer, it may give you some insights:
How to Map Input and Output Columns dynamically in SSIS?

So what I did as a solution to this particular case was:
1- Add a C# script that reads the XML file, keeps only the common nodes, and saves their values to my DTS variables.
2- Insert the variables I just populated into my SQL Server table.
Both tasks were in a for each loop to go through all the xml files in a specific directory.
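For step 2, the insert can be an Execute SQL Task with a parameterized statement; the table and column names below are placeholders, and each ? is mapped to one of the DTS variables on the task's Parameter Mapping page (OLE DB connection syntax):

```sql
INSERT INTO dbo.TargetTable (CommonField1, CommonField2, SourceFileName)
VALUES (?, ?, ?);
```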
Hope it helps!

Related

Automate import of CSV files in SQL Server

I'm currently using SSIS to import a whole slew of CSV files into our system on a regular basis. These import processes are scheduled using the SQL Server Agent, which should be the end of the story. However, one of our vendors from which we're receiving data likes to change up the file format every now and then (it feels like twice a month), and it is a royal pain to implement these changes in SSIS.
Is there a less painful way for me to get these imported into SQL Server? My requirements are fairly simple:
The file formats are CSV, they're delimited with commas, and are text qualified with double quotes.
The file name will indicate into which table I need these imported
It needs to be something which can be automated
Changes in file format should not be that much of a pain
If something does go wrong, I need to be able to know what it was - logging of some sort
Thanks so much!
BULK INSERT is another option you can choose. You can define your own templates (format files) for it:
https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql
https://jamesmccaffrey.wordpress.com/2010/06/21/using-sql-bulk-insert-with-a-format-file/
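As a rough sketch (the path, table name, and file layout are assumptions), a quote-qualified CSV can be loaded like this; FORMAT = 'CSV' and FIELDQUOTE require SQL Server 2017 or later, and on older versions you would point FORMATFILE at a format file instead:

```sql
BULK INSERT dbo.VendorOrders
FROM 'C:\Imports\VendorOrders.csv'
WITH
(
    FORMAT = 'CSV',            -- parse quoted fields correctly (2017+)
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,              -- skip the header row
    ERRORFILE = 'C:\Imports\VendorOrders.err'  -- rejected rows land here
);
```

The ERRORFILE option covers the logging requirement: rows that fail to parse are written to the error file instead of silently failing the whole load.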
You can look into using BIML, which dynamically generates packages based on the meta data at run time.
I have tried the Java solution "dbis". Please check below.
https://dbisweb.wordpress.com/
It keeps the migration configuration in an XML file, which you can edit in any text editor.
However, it requires a static table name.

How can I import multiple csv files from a folder into sql, into their own separate table

I would like some advice on the best way to go about doing this. I have multiple files all with different layouts and I would like to create a procedure to import them into new tables in sql.
I have written a procedure which uses xp_cmdshell to get the list of file names in a folder, then uses a cursor to loop through those file names and bulk insert them into SQL, but I don't know the best way to create a new table with a new layout each time.
I thought that if I could import just the header row into a temp table, I could use it to create a new table to do my bulk insert into, but I couldn't get that to work.
So what's the best way to do this using SQL? I am not that familiar with .NET either. I have thought about doing this in SSIS; I know it's easy enough to load multiple files which have the same layout in SSIS, but can it be done with variable layouts?
thanks
You could use BimlScript to make the whole process automated where you just point it at the path of interest and it writes all the SSIS and T-SQL DDL for you, but for the effort involved in writing the C# you'd need, you may as well just put the data dump into SQL Server in the C#, too.
You can use SSIS to solve this issue, though, and there are a few levels of effort to pick from.
The easiest is to use the SQL Server Import and Export Wizard to create SSIS packages from your Excel spreadsheets that will dump the sheet into its own table. You'd have to run this wizard every time you had a new spreadsheet you wanted to import, but you could save the package(s) so that you could re-import that spreadsheet again.
The next level would be to edit a saved SSIS package (or write one from scratch) to parameterize the file path and the destination table names, and you could then re-use that package for any spreadsheets that followed the same format.
Further along would be to write a package that determined which of the packages from the previous level to call. If you can query the header rows effectively, you could probably write an SSIS package that accepts a path as an input parameter, finds all the Excel sheets in that path, queries the header rows to determine the spreadsheet format, and then passes that information to the parameterized package for that format type.
SSIS development is, of course, its own topic - Integration Services Features and Tasks on MSDN is a good place to start. SSIS has its quirks, and I highly recommend learning BimlScript if you want to do a lot of SSIS development. If you'd like to talk over what the ideas above would require in more detail, please feel free to message me.

How to automatically import XML file into various new tables using SSIS

I have a very large XML file (with an .xsd file) which has many different objects that need to be imported into SQL Server tables. By objects, I mean the highest-level XML wrapping tags, e.g. Products, Orders, Locations, etc.
My current method using SSIS has been to:
Create a Data Flow Task
Set up a new XML source in the data flow task
Drag a new OLE DB Destination into the task
Link the XML output to the input of the OLE DB Destination
Open the OLE DB Destination task and ask it to create a new table for the data to go into
I would have to repeat steps 3-5 for all the objects in the XML file, which could run into the hundreds. There's no way that I can do this manually.
Is there anyway to get SSIS to just create new tables for all the different objects in SQL server and import the data into those? So it would automatically create dbo.Products, dbo.Locations, dbo.Customers and put the correct XML data into those tables.
I can't see any other feasible way of doing this.
Is there anyway to get SSIS to just create new tables for all the different objects in SQL server and import the data into those?
No :(
There are really two problems here. You haven't gotten to the second one yet, which is that SSIS will probably choke reading a very large XML file. The XML source component loads the entire file into memory when it reads it.
There are a couple of alternatives that I can think of:
- use XSLT transforms
- roll your own sax parser and use a script component source
For the XSLT method, you would transform each object into a flat file, i.e. parse just your customer data into CSV format, and then add data flows to read in each flat file. The drawback is that SSIS uses an earlier version of XSLT which also loads the whole file into memory rather than streaming it. However, I have seen this perform very well on files around 0.5 GB in size. XSLT can also be challenging to learn if you are not already familiar with it, but it is very powerful for getting data into a relational form.
The sax parser method would allow you to stream the file and pull out the parts that you want into a relational form. In a script component, you could direct different objects to different outputs.

Possible to query the create script for an XML Schema Collection?

As a first step to adding our SQL objects to version control, I've written a script that queries sys.objects and sys.sql_modules to script out all objects as CREATEs. I used an example from here to script out my tables. The goal is to preserve the change history of our SQL objects and eventually automate SQL steps we currently perform manually. Now I need to script out XML schema collections as CREATEs, but I haven't had luck finding an example of how. I imagine it would involve querying the several sys.xml_schema_* catalog views and piecing the XML together element by agonizing element. Does anyone have a working example of how to achieve this?
NOTE: The requirement is to achieve this through a SQL script, not a 3rd party component (i.e. RedGate) nor a Visual Studio Database Project.
You can use the function XML_SCHEMA_NAMESPACE (Transact-SQL):
Reconstructs all the schemas or a specific schema in the specified XML schema collection.
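For example (the collection name dbo.MySchemaCollection is hypothetical), you can wrap the function's output in the DDL yourself; note that this reconstructs an equivalent schema rather than returning the original script verbatim:

```sql
-- Inspect the reconstructed schema content
SELECT XML_SCHEMA_NAMESPACE(N'dbo', N'MySchemaCollection');

-- Build a CREATE statement around it (doubling embedded quotes)
SELECT N'CREATE XML SCHEMA COLLECTION [dbo].[MySchemaCollection] AS N'''
     + REPLACE(CONVERT(NVARCHAR(MAX),
           XML_SCHEMA_NAMESPACE(N'dbo', N'MySchemaCollection')),
           N'''', N'''''')
     + N''';';
```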

Import Complex XML file into SQL server

I have about 6000 XML files that need to be imported into a SQL Server database. I need to find certain fields in those XML files and import them into multiple tables. The files are InfoPath forms saved as XML documents. Each file has about 20 fields that need to go into 6 tables, in their respective columns. I analyzed the data fields and created the tables and the relations between them. Now I have to start importing the data.
I tried using the XML source in SSIS, but the data is too complex for SSIS to handle, so the XML source will not work for me. I worked on using a Script Task, but I'm not a core C#/VB developer. I also tried BULK INSERT, but I don't have bulk insert permissions. A friend suggested I use XQuery, but I think XQuery only works for parsing once the XML is already in a table (not sure).
Can anyone please tell me the best approach to start this task.
Thanks in Advance!
Kumar
I faced a similar scenario, and here is the approach I used:
1. Create tables to hold the XML data (with columns of the xml data type)
2. Read the source XML files as text and insert them into those tables
3. Use XQuery in a stored procedure to extract the data and populate the target tables
I can provide more pointers/information if you need.
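A condensed sketch of the three steps (the file path, namespace URI, and field names are illustrative; note that OPENROWSET(BULK ...) also needs bulk permissions, so if those are unavailable the file read in step 2 would have to happen in SSIS or client code instead):

```sql
-- 1. Holding table with an xml column
CREATE TABLE dbo.InfoPathDocs (DocId INT IDENTITY PRIMARY KEY, Doc XML);

-- 2. Read one source file as a blob and convert it to xml
--    (repeat per file, e.g. from a loop over the directory)
INSERT INTO dbo.InfoPathDocs (Doc)
SELECT CONVERT(XML, BulkColumn)
FROM OPENROWSET(BULK 'C:\Files\form0001.xml', SINGLE_BLOB) AS x;

-- 3. XQuery extraction into one of the target tables;
--    InfoPath documents are namespaced, so declare the prefix first
WITH XMLNAMESPACES ('http://example.com/infopath/myXSD' AS my)
INSERT INTO dbo.TargetTable1 (FieldA, FieldB)
SELECT
    d.Doc.value('(/my:myFields/my:fieldA/text())[1]', 'NVARCHAR(200)'),
    d.Doc.value('(/my:myFields/my:fieldB/text())[1]', 'NVARCHAR(200)')
FROM dbo.InfoPathDocs AS d;
```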
The best way to achieve this is to use the open-source tool Talend Open Studio.