I am looking for an alternative way to load XML data into a database with SSIS.
Why an alternative? Because we have had some very bad experiences with SSIS and XML:
- SSIS does not raise an error when the XML does not match the XSD (unexpected fields were present in the XML)
- SSIS could not load basic XML because it did not like the structure (we ended up editing the XML on the fly to change the structure)
- low performance
- (very) cumbersome development for complex XML with parent-child nodes
- when looping over several XML files, SSIS skips some of them without raising any error, as if they were invisible. If you run the process again, it works.
So I would like to avoid these issues by using an alternative way to load XML with SSIS.
Why still use SSIS? Because it does have some benefits:
- centralized management and monitoring
- a specific audit flow that lets us follow the loading of XML files, text files, tables, etc.
- developers skilled in SSIS
So I am looking for a way to do something close to this:
1. Loop over a folder containing several XML files -> SSIS component
2. For each XML file, insert a row in the audit table marking it as "in treatment"
3. Validate the XML against the XSD and raise an error on mismatch -> ?
4. On error, move the XML file to a reject folder or flag it in the audit table as "error" -> ?
5. If there is no error, parse the XML as a DOM (or equivalent) -> ?
6. Insert the content of the XML into database tables, keeping the link between parent and child nodes. This means I need to be able to retrieve sequence values from the SQL Server database on the fly -> ?
7. Update the audit table to set the XML file as "Loaded" -> ?
8. Move the XML file to another folder
Without SSIS in the equation, I would use a Java or Python program, but I would like to keep SSIS at least as a container.
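For comparison, here is a rough Python sketch of the steps listed above, the kind of program SSIS would only call as a container. It rests on several assumptions: sqlite3 stands in for SQL Server, the table names (audit, parent, child) and the EXPECTED_TAGS set are made up for illustration, and the validate function is only a placeholder for real XSD validation, which the Python standard library does not provide (a third-party library such as lxml or xmlschema would be needed for that).

```python
import shutil
import sqlite3
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical whitelist standing in for the XSD.
EXPECTED_TAGS = {"orders", "order", "sku", "qty"}


def validate(root):
    """Placeholder check: reject any element the 'schema' does not know."""
    for elem in root.iter():
        if elem.tag not in EXPECTED_TAGS:
            raise ValueError(f"unexpected element <{elem.tag}>")


def load_folder(inbox, loaded, rejected, conn):
    """Process every *.xml file in inbox, auditing each step."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS audit (file TEXT PRIMARY KEY, status TEXT);
        CREATE TABLE IF NOT EXISTS parent (id INTEGER PRIMARY KEY, file TEXT, name TEXT);
        CREATE TABLE IF NOT EXISTS child (id INTEGER PRIMARY KEY, parent_id INTEGER,
                                          name TEXT, value TEXT);
    """)
    loaded.mkdir(exist_ok=True)
    rejected.mkdir(exist_ok=True)
    for path in sorted(inbox.glob("*.xml")):
        # Mark the file as being processed before touching it.
        conn.execute("INSERT OR REPLACE INTO audit VALUES (?, 'in treatment')", (path.name,))
        conn.commit()
        try:
            root = ET.parse(path).getroot()   # raises ParseError if malformed
            validate(root)                    # raises ValueError on mismatch
            for parent in root:
                cur = conn.execute(
                    "INSERT INTO parent (file, name) VALUES (?, ?)",
                    (path.name, parent.tag),
                )
                parent_id = cur.lastrowid     # generated key links children to parent
                for child in parent:
                    conn.execute(
                        "INSERT INTO child (parent_id, name, value) VALUES (?, ?, ?)",
                        (parent_id, child.tag, child.text),
                    )
            status, target = "Loaded", loaded
        except (ET.ParseError, ValueError):
            conn.rollback()                   # discard partial inserts for this file
            status, target = "error", rejected
        conn.execute("UPDATE audit SET status = ? WHERE file = ?", (status, path.name))
        conn.commit()
        shutil.move(str(path), str(target / path.name))
```

Against SQL Server, the lastrowid step would presumably be replaced by whatever the database uses to hand out keys (an identity column or a sequence), which is exactly the "retrieve sequences on the fly" part of step 6.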
Thanks a lot
We have some (stable) data that is saved in a generic database (a database that contains a database structure and its data). To be used, this data must be rewritten. Currently, we have an application that exports this data as XML files to a very specific location.
We need to add this data to some databases. I know it's possible to load XML inside tables, but we'd like a direct link between the XML files and the database tables (reducing data duplication and risk of seeing people update the generated tables instead of using proper methods).
Is that possible?
Would it be very slow?
You can use SSIS to import XML files into database tables. This will work well if the XML files conform to a schema.
https://www.mssqltips.com/sqlservertip/3141/importing-xml-documents-using-sql-server-integration-services/
I would like some advice on the best way to go about doing this. I have multiple files, all with different layouts, and I would like to create a procedure to import them into new tables in SQL Server.
I have written a procedure which uses xp_cmdshell to get the list of file names in a folder, then uses a cursor to loop through those file names and a bulk insert to get them into SQL Server, but I don't know the best way to create a new table with a new layout each time.
I thought that if I could import just the column header row into a temp table, I could use that to create a new table to do my bulk insert into, but I couldn't get that to work.
So what's the best way to do this using SQL? I am not that familiar with .NET either. I have thought about doing this in SSIS. I know it's easy enough to load multiple files which have the same layout in SSIS, but can it be done with variable layouts?
Thanks
You could use BimlScript to make the whole process automated where you just point it at the path of interest and it writes all the SSIS and T-SQL DDL for you, but for the effort involved in writing the C# you'd need, you may as well just put the data dump into SQL Server in the C#, too.
You can use SSIS to solve this issue, though, and there are a few levels of effort to pick from.
The easiest is to use the SQL Server Import and Export Wizard to create SSIS packages from your Excel spreadsheets that will dump the sheet into its own table. You'd have to run this wizard every time you had a new spreadsheet you wanted to import, but you could save the package(s) so that you could re-import that spreadsheet again.
The next level would be to edit a saved SSIS package (or write one from scratch) to parameterize the file path and the destination table names, and you could then re-use that package for any spreadsheets that followed the same format.
Further along would be to write a package that determines which of the packages from the previous level to call. If you can query the header rows effectively, you could probably write an SSIS package that accepts a path as an input parameter, finds all the Excel sheets in that path, queries the header rows to determine the spreadsheet format, and then passes that information to the parameterized package for that format type.
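To make the header-row idea concrete, here is a small Python sketch of the dispatch logic only. It is not SSIS code, and it makes some assumptions: the files are CSV rather than Excel (the standard library cannot read Excel), and the KNOWN_FORMATS registry and the package names in it are hypothetical. The last function shows how the same header row could drive a CREATE TABLE statement for a layout that has not been seen before, with every column typed as NVARCHAR(255) as a placeholder.

```python
import csv

# Hypothetical registry: normalized header row -> package that loads that layout.
KNOWN_FORMATS = {
    ("id", "name", "price"): "LoadProducts.dtsx",
    ("id", "customer", "total"): "LoadOrders.dtsx",
}


def header_signature(path):
    """Return the first row of a CSV file as a tuple of trimmed, lowercase names."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    return tuple(col.strip().lower() for col in header)


def choose_package(path):
    """Return the package registered for this file's layout, or None if unknown."""
    return KNOWN_FORMATS.get(header_signature(path))


def create_table_sql(table, columns):
    """Build a staging-table DDL statement from a header row."""
    def quote(name):
        # Bracket-quote the identifier; a literal ']' is escaped by doubling it.
        return "[" + name.replace("]", "]]") + "]"

    cols = ",\n".join(f"    {quote(c)} NVARCHAR(255) NULL" for c in columns)
    return f"CREATE TABLE {quote(table)} (\n{cols}\n);"
```

A file whose header is not in the registry returns None from choose_package, at which point the caller could fall back to create_table_sql to stage it in a new table.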
SSIS development is, of course, its own topic - Integration Services Features and Tasks on MSDN is a good place to start. SSIS has its quirks, and I highly recommend learning BimlScript if you want to do a lot of SSIS development. If you'd like to talk over what the ideas above would require in more detail, please feel free to message me.
I have a very large XML file (with an XSD file) containing many different objects that need to be imported into SQL Server tables. By objects, I mean the highest-level XML wrapping tags, e.g. Products, Orders, Locations, etc.
My current method using SSIS has been to:
1. Create a Data Flow Task
2. Set up a new XML source in the data flow task
3. Drag a new OLE DB Destination into the task
4. Link the XML output to the input of the OLE DB Destination
5. Open the OLE DB Destination task and ask it to create a new table for the data to go into
I have to repeat steps 3-5 for every object in the XML file, and there could be hundreds of them. There's no way that I can do this manually.
Is there any way to get SSIS to just create new tables for all the different objects in SQL Server and import the data into those? So it would automatically create dbo.Products, dbo.Locations, dbo.Customers and put the correct XML data into those tables.
I can't see any other feasible way of doing this.
"Is there anyway to get SSIS to just create new tables for all the different objects in SQL server and import the data into those?"
No :(
There are really two problems here. You have not gotten to the second one yet, which is that SSIS is probably going to choke when reading a very large XML file. The XML Source component loads the entire file into memory when it reads it.
There are a couple of alternatives that I can think of:
- use XSLT transforms
- roll your own sax parser and use a script component source
For the XSLT method, you would transform each object into a flat file, i.e. parse just your customer data into CSV format, and then add data flows to read in each flat file. The drawback is that SSIS uses an earlier version of XSLT, which also loads the whole file into memory rather than streaming it. However, I have seen this perform very well on files that were 0.5 GB in size. Also, XSLT can be challenging to learn if you are not already familiar with it, but it is very powerful for getting data into a relational form.
The sax parser method would allow you to stream the file and pull out the parts that you want into a relational form. In a script component, you could direct different objects to different outputs.
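To illustrate the SAX idea, here is a minimal sketch in Python rather than in an SSIS script component (which would be written in C# or VB). It assumes each record is an element such as Product or Order whose children are simple text fields; the class name, the emit callback and the tag names are all made up for the example. Rows are handed to the callback as soon as a record closes, so nothing but the current record is held in memory.

```python
import xml.sax


class RecordRouter(xml.sax.ContentHandler):
    """Stream records out of a large XML file, routing them by record tag."""

    def __init__(self, record_tags, emit):
        super().__init__()
        self.record_tags = set(record_tags)
        self.emit = emit          # callback: emit(record_tag, row_dict)
        self._record = None       # tag of the record currently being read
        self._row = None          # fields collected so far for that record
        self._field = None        # tag of the field currently being read
        self._text = []

    def startElement(self, name, attrs):
        if self._record is None and name in self.record_tags:
            # A new record starts; its attributes become the first columns.
            self._record, self._row = name, dict(attrs.items())
        elif self._record is not None:
            self._field, self._text = name, []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._field:
            self._row[name] = "".join(self._text).strip()
            self._field = None
        elif name == self._record:
            # Record complete: send it to the output for its type.
            self.emit(name, self._row)
            self._record = self._row = None


def stream_records(source, record_tags, emit):
    """Parse source (a file name or binary stream) and emit one row per record."""
    xml.sax.parse(source, RecordRouter(record_tags, emit))
```

In a script component the emit callback would correspond to adding a row to the output buffer for that object type; here it could just as well write to a CSV file per record tag.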
I have about 6000 XML files that I need to import into a SQL database. I need to pick out some fields from those XML files and import them into multiple tables. The files are InfoPath forms saved as XML documents. Each file has about 20 fields that need to go into the respective columns of 6 tables. I have analyzed the data fields and created the tables and the relationships between them. Now I have to start importing the data.
I tried using the XML Source in SSIS, but the data is too complex for SSIS to handle, so the XML Source will not work for me. I worked on using a Script Task, but I'm not a core C#/VB developer. I also tried BULK INSERT, but I don't have bulk insert permissions. A friend suggested XQuery, but I think XQuery only works for parsing if the XML is already in a table (not sure).
Can anyone please tell me the best approach to start this task?
Thanks in Advance!
Kumar
I faced a similar scenario, and here is the approach I used:
1. create tables to hold xml data (with columns as xml data type)
2. read source xml files as TEXT and insert them into those tables
3. use XQuery in stored procedure to extract data and populate target tables
I can provide more pointers/information if you need.
The best way to achieve this is to use the open-source tool Talend Open Studio.
I'm trying to use SQL Server Integration Services (SSIS) to parse an XML file and map the elements to database columns across several tables in a single database.
However, when I use the Data Flow Task->XML Source to try and parse an example XML file (that file is located here, XSD is located here), it says
"http://www.exchangenetwork.net/schema/TRI/4:TransferWasteQuantity" has multiple members named "http://www.exchangenetwork.net/schema/TRI/F:WasteQuantityCatastrophicMeasure"
Is there any way to get SSIS to parse XML data such as this? This schema changes regularly so I'd prefer to do as little parsing code outside of the data mappings as possible. Also if there's a better way to do this outside of SSIS (say, by using SQL Server Analysis Services) then that would work too.
Apparently, due to the complexity of the XML, this is not possible with the latest versions of SQL Server, at least not without significant modification of the incoming XML submission and XSD, which would likely lead to data corruption. I have therefore concluded that it is not feasible to implement.