Import Complex XML file into SQL Server

I have about 6,000 XML files that need to be imported into a SQL Server database. I need to extract certain fields from those XML files and import them into multiple tables. The files are InfoPath forms saved as XML documents. Each file has about 20 fields that need to go into six tables, in their respective columns. I analyzed the data fields and created the tables and the relationships between them. Now I have to start importing the data.
I tried using the XML source in SSIS, but the data is too complex for SSIS to handle, so the XML source will not work for me. I looked at using a Script Task, but I'm not really a C#/VB developer. I also tried BULK INSERT, but I don't have bulk insert permissions. A friend suggested XQuery, but I think XQuery only works for parsing if the XML is already in a table (not sure).
Can anyone please tell me the best approach to start this task?
Thanks in advance!
Kumar

I faced a similar scenario, and here is the approach I used:
1. Create staging tables to hold the XML data (with columns of the xml data type).
2. Read the source XML files as text and insert them into those tables.
3. Use XQuery in a stored procedure to shred the XML and populate the target tables.
I can provide more pointers/information if you need them.
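The T-SQL specifics depend on your schema, but the shape of the three steps above can be sketched in Python, with sqlite3 standing in for SQL Server and ElementTree playing the role of the XQuery shredding; the field names are hypothetical:

```python
import sqlite3
import xml.etree.ElementTree as ET

# Step 1: a staging table that holds each file's raw XML as text.
# (In SQL Server this column would be of the xml data type.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE XmlStaging (FileName TEXT, XmlData TEXT)")

# Step 2: read a source file as text and insert it unparsed.
sample = "<form><CustomerName>Kumar</CustomerName><OrderTotal>42.50</OrderTotal></form>"
conn.execute("INSERT INTO XmlStaging VALUES (?, ?)", ("file1.xml", sample))

# Step 3: shred the stored XML into typed columns of a target table.
# In SQL Server this would be XQuery (nodes()/value()) inside a
# stored procedure; here ElementTree does that job.
conn.execute("CREATE TABLE Customers (CustomerName TEXT, OrderTotal REAL)")
for (xml_text,) in conn.execute("SELECT XmlData FROM XmlStaging"):
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO Customers VALUES (?, ?)",
        (root.findtext("CustomerName"), float(root.findtext("OrderTotal"))),
    )

print(list(conn.execute("SELECT * FROM Customers")))  # [('Kumar', 42.5)]
```

The point of staging the raw XML first is that the shredding logic lives in one stored procedure you can rerun and debug, instead of being buried in the load pipeline.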

The best way to achieve this is to use the open-source tool Talend Open Studio.

Related

Import XML files with different structures using SSIS

I have multiple XML files with different structures, yet they all share a specific set of nodes.
My objective is to only import those mutual nodes into an SQL Server table.
How can I proceed, knowing that I can't generate an .xsd for every type, as there are many possible XML file variations?
Thanks in advance.
A simple solution would be to load all of these XML files into an XML staging table (two columns: FileId and XmlData). This can be done as the first step of the package.
Then write a stored procedure that shreds the XML from this table into your final tables. This stored procedure would be called as the second step of the SSIS package.
I don't think it is easy to do that using SSIS alone.
If you have no other choice, you can use a Script Component as a source, parse the XML files in a C# or VB.NET script, and generate output rows with only the desired columns.
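Here is a sketch of that parse-and-keep-only-the-mutual-nodes idea, in Python rather than the C#/VB.NET you would use inside a Script Component; the node names are hypothetical:

```python
import xml.etree.ElementTree as ET

# The set of nodes known to exist in every file variation;
# everything else in a given file is ignored. (Hypothetical names.)
COMMON_NODES = ["Id", "Name", "CreatedDate"]

def extract_common(xml_text):
    """Return one output row containing only the mutual nodes."""
    root = ET.fromstring(xml_text)
    # .//tag searches at any depth, so the shared nodes can sit at
    # different positions in each file's structure.
    return {tag: root.findtext(".//" + tag) for tag in COMMON_NODES}

variant_a = "<doc><Id>1</Id><Name>A</Name><CreatedDate>2019-01-01</CreatedDate><Extra>x</Extra></doc>"
variant_b = "<doc><header><Id>2</Id></header><body><Name>B</Name><CreatedDate>2019-02-01</CreatedDate></body></doc>"

print(extract_common(variant_a))
print(extract_common(variant_b))
```

Because the search is depth-independent, no .xsd per variation is required; only the list of mutual node names matters.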
You can refer to the following articles for more information about using Script component:
Creating a Source with the Script Component
SSIS Script Component Overview
SchemaMapper class library
For a while, I have been working on a project called SchemaMapper. It is a class library developed in C#. You can use it to import tabular data from XML and other formats with different structures into a unified SQL Server table.
SchemaMapper - C# schema mapping class library
You can refer to the project wiki for a step by step guide:
SchemaMapper Wiki
Also, feel free to check the following answer, it may give you some insights:
How to Map Input and Output Columns dynamically in SSIS?
So what I did as a solution in this particular case was:
1. Add a C# script that reads the XML file, keeps only the common nodes, and saves their values to my DTS variables.
2. Insert the variables I just populated into my SQL Server table.
Both tasks were inside a Foreach Loop container that iterates over all the XML files in a specific directory.
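The same loop can be sketched outside SSIS. A Python version, with sqlite3 standing in for the SQL Server table and hypothetical node names:

```python
import glob
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical common nodes shared by every file.
COMMON_NODES = ("Id", "Name")

def load_directory(folder, conn):
    """Loop over every .xml file in the folder (the Foreach Loop),
    pull out the common nodes (the script task), and insert one row
    per file (the SQL task)."""
    for path in sorted(glob.glob(os.path.join(folder, "*.xml"))):
        root = ET.parse(path).getroot()
        values = tuple(root.findtext(".//" + tag) for tag in COMMON_NODES)
        conn.execute("INSERT INTO Target (Id, Name) VALUES (?, ?)", values)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Target (Id TEXT, Name TEXT)")
with tempfile.TemporaryDirectory() as folder:
    # Create two sample files to stand in for the source directory.
    for i in (1, 2):
        with open(os.path.join(folder, f"f{i}.xml"), "w") as f:
            f.write(f"<doc><Id>{i}</Id><Name>N{i}</Name></doc>")
    load_directory(folder, conn)
print(list(conn.execute("SELECT * FROM Target ORDER BY Id")))
```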
Hope it helps!

Is it possible to feed a database table from an XML file?

We have some (stable) data saved in a generic database (a database that contains a database structure and its data). To be used, this data must be rewritten. Currently, we have an application that exports this data as XML files to a very specific location.
We need to add this data to several databases. I know it's possible to load XML into tables, but we'd like a direct link between the XML files and the database tables (reducing data duplication and the risk of people updating the generated tables instead of using the proper methods).
Is that possible?
Would it be very slow?
You can use SSIS to import XML files into database tables. This works well if the XML files conform to a schema.
https://www.mssqltips.com/sqlservertip/3141/importing-xml-documents-using-sql-server-integration-services/

How can I import multiple CSV files from a folder into SQL Server, each into its own table?

I would like some advice on the best way to go about doing this. I have multiple files, all with different layouts, and I would like to create a procedure to import them into new tables in SQL Server.
I have written a procedure that uses xp_cmdshell to get the list of file names in a folder, then uses a cursor to loop through those file names and a BULK INSERT to load them into SQL Server, but I don't know the best way to create a new table with a new layout each time.
I thought that if I could import just the header row into a temp table, I could use it to create a new table to do my bulk insert into, but I couldn't get that to work.
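The header-row idea is workable. A minimal sketch in Python of deriving the CREATE TABLE DDL from just the header line; defaulting every column to NVARCHAR(255) is an assumption, since the header alone tells you nothing about types:

```python
import csv
import io

def create_table_from_header(table_name, csv_text):
    """Read only the header row of a CSV and emit CREATE TABLE DDL
    with every column typed NVARCHAR(255) -- deliberately loose,
    because the real types aren't known from the header alone."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = ", ".join(f"[{name.strip()}] NVARCHAR(255)" for name in header)
    return f"CREATE TABLE [{table_name}] ({cols})"

ddl = create_table_from_header("Import1", "OrderId,Customer,Amount\n1,Acme,9.99\n")
print(ddl)
```

Once the table exists, the existing cursor-plus-BULK INSERT loop can target it; typed columns can be fixed up afterwards with ALTER TABLE if needed.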
So what's the best way to do this using SQL? I am not that familiar with .NET either. I have thought about doing this in SSIS; I know it's easy enough to load multiple files that share the same layout in SSIS, but can it be done with variable layouts?
thanks
You could use BimlScript to automate the whole process: you point it at the path of interest and it writes all the SSIS packages and T-SQL DDL for you. But for the effort involved in writing the required C#, you might as well do the data load into SQL Server in C# as well.
You can use SSIS to solve this issue, though, and there are a few levels of effort to pick from.
The easiest is to use the SQL Server Import and Export Wizard to create SSIS packages from your Excel spreadsheets that will dump the sheet into its own table. You'd have to run this wizard every time you had a new spreadsheet you wanted to import, but you could save the package(s) so that you could re-import that spreadsheet again.
The next level would be to edit a saved SSIS package (or write one from scratch) to parameterize the file path and the destination table names, and you could then re-use that package for any spreadsheets that followed the same format.
Further along would be to write a package that determines which of the packages from the previous level to call. If you can query the header rows effectively, you could probably write an SSIS package that accepts a path as an input parameter, finds all the Excel sheets in that path, queries the header rows to determine the spreadsheet format, and then passes that information to the parameterized package for that format type.
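The header-row dispatch can be sketched like this (Python; the package names and header signatures are hypothetical):

```python
import csv
import io

# Map each known header signature to the parameterized package that
# handles that layout. (Package names are hypothetical.)
FORMATS = {
    frozenset({"OrderId", "Customer", "Amount"}): "LoadOrders.dtsx",
    frozenset({"Sku", "Qty", "Warehouse"}): "LoadInventory.dtsx",
}

def pick_package(csv_text):
    """Read the header row and decide which package handles this file.
    Using a frozenset makes the match insensitive to column order."""
    header = frozenset(next(csv.reader(io.StringIO(csv_text))))
    return FORMATS.get(header, "UnknownLayout")

print(pick_package("Sku,Qty,Warehouse\nA1,5,East\n"))  # LoadInventory.dtsx
```

An "UnknownLayout" fallback is worth having so new layouts get flagged rather than silently mis-loaded.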
SSIS development is, of course, its own topic - Integration Services Features and Tasks on MSDN is a good place to start. SSIS has its quirks, and I highly recommend learning BimlScript if you want to do a lot of SSIS development. If you'd like to talk over what the ideas above would require in more detail, please feel free to message me.

How to automatically import XML file into various new tables using SSIS

I have a very large XML file (with an .xsd file) that has many different objects that need to be imported into SQL Server tables. By objects, I mean the highest-level XML wrapping tags, e.g. Products, Orders, Locations, etc.
My current method using SSIS has been to:
Create a Data Flow Task
Set up a new XML source in the data flow task
Drag a new OLE DB Destination into the task
Link the XML output to the input of the OLE DB Destination
Open the OLE DB Destination task and ask it to create a new table for the data to go into
I have to repeat steps 3-5 for all the objects in the XML file, which could run into the hundreds. There's no way I can do this manually.
Is there any way to get SSIS to create new tables for all the different objects in SQL Server and import the data into them? It would then automatically create dbo.Products, dbo.Locations, dbo.Customers and put the correct XML data into those tables.
I can't see any other feasible way of doing this.
Is there anyway to get SSIS to just create new tables for all the
different objects in SQL server and import the data into those?
No :(
There are really two problems here. You haven't hit the second one yet, which is that SSIS will probably choke reading a very large XML file: the XML source component loads the entire file into memory when it reads it.
There are a couple of alternatives that I can think of:
- use XSLT transforms
- roll your own SAX parser and use a Script Component source
For the XSLT method, you would transform each object into a flat file, i.e. parse just your customer data into CSV format, and then add data flows to read each flat file. The drawback is that SSIS uses an earlier version of XSLT that also loads the whole file into memory rather than streaming it. However, I have seen this perform very well on files around 0.5 GB in size. Also, XSLT can be challenging to learn if you aren't already familiar with it, but it is very powerful for getting data into relational form.
The SAX parser method would allow you to stream the file and pull out the parts you want into relational form. In a Script Component, you could direct different objects to different outputs.
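A minimal streaming sketch of that routing idea, using Python's iterparse rather than a C# SAX parser, with hypothetical tag names:

```python
import io
import xml.etree.ElementTree as ET

def stream_objects(source, wanted=("Product", "Order")):
    """Stream the document with iterparse instead of loading it whole,
    and route each top-level object type to its own output list."""
    outputs = {tag: [] for tag in wanted}
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in outputs:
            outputs[elem.tag].append({c.tag: c.text for c in elem})
            elem.clear()  # release the element -- the point of streaming
    return outputs

xml_doc = b"""<root>
  <Product><Sku>A1</Sku></Product>
  <Order><Id>7</Id></Order>
  <Product><Sku>B2</Sku></Product>
</root>"""
print(stream_objects(io.BytesIO(xml_doc)))
```

Because each element is cleared as soon as its row is emitted, memory use stays roughly constant regardless of file size, which is exactly what the XML source component cannot do.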

Speeding Up ETL DB2 to SQL Server?

I came across this blog post when looking for a quicker way of importing data from a DB2 database to SQL Server 2008.
http://blog.stevienova.com/2009/05/20/etl-method-fastest-way-to-get-data-from-db2-to-microsoft-sql-server/
I'm trying to figure out how to achieve the following:
3) Create a BULK Insert task, and load up the file that the execute process task created. (note you have to create a .FMT file for fixed-width import. I create a .NET app to load the FDF file (the transfer description) which will auto create a .FMT file for me, and a SQL Create statement as well – saving time and tedious work)
I've got the data in a TXT file and a separate FDF with the details of the table structure. How do I combine them to create a suitable .FMT file?
I couldn't figure out how to create suitable .FMT files.
Instead, I ended up creating replica tables of the source DB2 system in SQL Server and ensured that the column order was the same as what was coming out of the IBM File Transfer Utility.
Using an Excel sheet to control which file transfers/tables should be loaded (allowing me to enable/disable them as I please), along with a Foreach Loop in SSIS, I have a suitable solution to load multiple tables quickly from our DB2 system.
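For anyone who still wants to generate the .FMT file itself, here is a minimal sketch based on my understanding of the non-XML BCP format-file layout; the column names and widths (the kind of information an FDF transfer description supplies) are hypothetical, and the output should be verified against the BCP documentation and a test load with bcp -f:

```python
def make_fmt(columns, bcp_version="10.0"):
    """Build a non-XML BCP format file for a fixed-width text file.
    `columns` is a list of (name, width) pairs. Line 1 is the bcp
    version, line 2 the field count; each data row is: host field
    order, host data type, prefix length, data length, terminator,
    server column order, server column name, collation. Fixed-width
    fields use SQLCHAR with an empty terminator; the last field
    carries the row terminator instead."""
    lines = [bcp_version, str(len(columns))]
    last = len(columns)
    for i, (name, width) in enumerate(columns, start=1):
        term = r'"\r\n"' if i == last else '""'
        lines.append(f'{i}\tSQLCHAR\t0\t{width}\t{term}\t{i}\t{name}\t""')
    return "\n".join(lines) + "\n"

print(make_fmt([("CustNo", 8), ("CustName", 30)]))
```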
