We have some (stable) data saved in a generic database (a database that contains a database structure and its data). To be used, this data must be rewritten. Currently, we have an application that exports this data to XML files in a very specific location.
We need to add this data to some databases. I know it's possible to load XML into tables, but we'd like a direct link between the XML files and the database tables (reducing data duplication and the risk of people updating the generated tables instead of using the proper methods).
Is that possible?
Would it be very slow?
You can use SSIS to import XML files into database tables. This will work well if the XML files conform to a schema.
https://www.mssqltips.com/sqlservertip/3141/importing-xml-documents-using-sql-server-integration-services/
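If you want to keep a more direct link between the XML files and the tables instead of copying the data in via SSIS, a rough T-SQL sketch like the following can read and shred a file at query time (the file path, element names and column types are only placeholders for your actual export):

    -- Read the exported XML file and shred one object type into rows.
    DECLARE @doc XML;

    SELECT @doc = CONVERT(XML, BulkColumn)
    FROM OPENROWSET(BULK 'C:\Export\Products.xml', SINGLE_BLOB) AS src;

    SELECT
        p.value('(Id)[1]',   'INT')            AS ProductId,
        p.value('(Name)[1]', 'NVARCHAR(200)')  AS ProductName
    FROM @doc.nodes('/Products/Product') AS t(p);

Wrapping a query like this in a view would avoid materialising the data in tables, but the file is re-read and re-parsed on every query, so it is worth testing whether that is fast enough for your volumes.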
Related
I frequently need to validate CSVs submitted by clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and having the wizard create the table based on the CSV (the file name becomes the table name, and the headers become the column names). Then we run a set of stored procedures that check INFORMATION_SCHEMA for said table(s) and match that up with our specs, etc. (a simplified version of that check is below).
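The kind of check we run is roughly like this (dbo.FileSpec is a simplified, made-up spec table, and the table name is just an example):

    -- Report spec columns that are missing from the table the wizard created.
    SELECT s.ColumnName AS MissingColumn
    FROM dbo.FileSpec AS s
    LEFT JOIN INFORMATION_SCHEMA.COLUMNS AS c
           ON c.TABLE_NAME  = 'Client_Orders'   -- table created from the CSV
          AND c.COLUMN_NAME = s.ColumnName
    WHERE s.FileType = 'Orders'
      AND c.COLUMN_NAME IS NULL;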
Most of the time this involves loading multiple files at a time for a client, which becomes very time-consuming and laborious when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once to get the same result, but xp_cmdshell is not supported on Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import... I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL DB, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage and then use a Copy Data activity to load the data from the CSV files into Azure SQL DB tables, without creating those tables upfront.
ADF supports 'auto create' of sink tables; see the ADF documentation for the Copy activity and its Azure SQL Database sink.
Could anyone explain, in layman's terms, what the difference is between those two data types and, more importantly, what the pros and cons are of using either of them to store files in a database?
If context is needed: I am creating a web app where users can upload many different kinds of data, like images, Excel files, .docx, etc.
Blobs are stored in a varbinary(MAX) column with the value stored in data pages inside the database data file(s).
With FILESTREAM, values are stored as individual files separately on the filesystem, with an individual file for each row and value. These files are managed internally by SQL Server and can be stored and retrieved using T-SQL just like normal varbinary(MAX) values or with Win32 APIs.
There are also FileTables, which are specialized tables with a predefined schema built on top of FILESTREAM. FileTables provide T-SQL access like blobs and FILESTREAM, and can optionally be accessed via a SQL Server-managed UNC path, similar to a normal Windows share. Creating/deleting files via the share inserts/deletes rows in the FileTable and vice versa.
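As a rough illustration, the difference at the table level is just the FILESTREAM attribute (this assumes FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup; names are placeholders):

    -- Plain blob: the bytes live in data pages inside the database files.
    CREATE TABLE dbo.UploadedFileBlob (
        FileId   INT IDENTITY PRIMARY KEY,
        FileName NVARCHAR(260) NOT NULL,
        Content  VARBINARY(MAX) NOT NULL
    );

    -- FILESTREAM: the bytes live as individual files on the filesystem,
    -- managed by SQL Server; the table needs a ROWGUIDCOL column.
    CREATE TABLE dbo.UploadedFileStream (
        FileId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileName NVARCHAR(260) NOT NULL,
        Content  VARBINARY(MAX) FILESTREAM NULL
    );

For a web app, the usual rule of thumb is that small files are fine as plain varbinary(MAX) blobs, while FILESTREAM tends to pay off for larger files, since it keeps the bytes out of the data pages and supports streaming Win32 access.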
We use SharePoint 2013 as a library holding thousands of Excel files, with rarely consistent formatting, to manage projects occurring on servers. Somewhere in these files, possibly formatted as table objects, is a common set of server names.
Somehow, without being able to change this process in the short term, I need to pull data from all these files to identify how many projects are targeting a particular server.
I've got access to SQL Server 2016 Enterprise, and I'm wondering if something like PolyBase could help with this. I also wonder about SSIS, but I don't expect any two tables to look exactly alike.
Other tools may be an option, but I'm not sure what can handle this scale and variety. I think daily updates to the data would be enough, but even so it's still a mess.
How do I pull thousands of varied excel tables into a database? Is this even possible?
Any longer-term solution that doesn't allow them to format and annotate like Excel is unlikely to actually be adopted.
The less you know in advance, the more difficult it will be...
Some ideas:
Technology
Read about FROM OPENROWSET, which allows you to read directly from an Excel file (a minimal sketch follows this list)
Read about linked servers
Use Excel itself and its VBA capabilities to iterate through all your Excel sheets, open them, analyse them and fill proper tables. Within Excel you know the most about your messy data...
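A minimal sketch of the OPENROWSET idea (it assumes the ACE OLE DB provider is installed on the SQL Server machine, ad hoc distributed queries are enabled, and the path and sheet name are placeholders):

    -- Read one sheet of one workbook as a rowset.
    SELECT *
    FROM OPENROWSET(
            'Microsoft.ACE.OLEDB.12.0',
            'Excel 12.0;Database=\\fileshare\projects\ServerList.xlsx;HDR=YES',
            'SELECT * FROM [Sheet1$]'
         ) AS xl;

You would still need something (dynamic SQL, a small script, or SSIS) to loop this over thousands of file paths and sheet names.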
Target structure
You might create thousands of tables, each representing a single sheet across all your Excel files. You could query these tables with dynamically created SQL (using the metadata in INFORMATION_SCHEMA) or think about full-text search.
You might import each sheet into a single XML structure (SELECT * ... FOR XML PATH('...')). In this case you'd need a target table with columns for the path and name of your Excel file, the name of the sheet, and an XML column for your data (a sketch follows this list). Another approach would be to represent each file as one XML document and include all its sheets there. Try to define common naming for all your data. Querying XML allows you to query columns without knowing their actual names (XQuery with XPath using *).
If your Excel files are .xlsx already, you might unzip them and take the XML inside the package as-is.
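For the XML-target idea, a sketch of what the landing table and a name-agnostic query could look like (table, column and server names are made up):

    -- One row per imported sheet, with the sheet contents kept as XML.
    CREATE TABLE dbo.ImportedSheet (
        SheetId   INT IDENTITY PRIMARY KEY,
        FilePath  NVARCHAR(400) NOT NULL,
        SheetName NVARCHAR(128) NOT NULL,
        Data      XML NOT NULL
    );

    -- Find sheets that mention a given server anywhere, without knowing
    -- which column or element holds the server name (wildcard XPath).
    SELECT FilePath, SheetName
    FROM dbo.ImportedSheet
    WHERE Data.exist('//*[. = "SRV-APP-01"]') = 1;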
To be honest: I do not think that any tool can do the magic to import such a wide range of mess automatically...
I have a very large XML file (with an accompanying XSD file) which has many different objects that need to be imported into SQL Server tables. By objects, I mean the highest-level XML wrapping tags, e.g. Products, Orders, Locations, etc.
My current method using SSIS has been to:
Create a Data Flow Task
Set up a new XML source in the data flow task
Drag a new OLE DB Destination into the task
Link the XML output to the input of the OLE DB Destination
Open the OLE DB Destination task and ask it to create a new table for the data to go into
I have to repeat steps 3-5 for all the objects in the XML file, which could run into the hundreds. There's no way I can do this manually.
Is there any way to get SSIS to just create new tables for all the different objects in SQL Server and import the data into them? So it would automatically create dbo.Products, dbo.Locations, dbo.Customers and put the correct XML data into those tables.
I can't see any other feasible way of doing this.
Is there any way to get SSIS to just create new tables for all the different objects in SQL Server and import the data into them?
No :(
There are really two problems here. You have not gotten to the second one yet, which is that SSIS is probably going to choke reading a very large XML file: the XML Source component loads the entire file into memory when it reads it.
There are a couple of alternatives that I can think of:
- use XSLT transforms
- roll your own SAX parser and use a Script Component source
For the XSLT method, you would transform each object into a flat file, i.e. parse just your customer data into CSV format, and then add data flows to read in each flat file. The drawback is that SSIS uses an earlier version of XSLT which also loads the whole file into memory rather than streaming it. However, I have seen this perform very well on files that were 0.5 GB in size. Also, XSLT can be challenging to learn if you are not already familiar with it, but it is very powerful for getting data into a relational form.
The SAX parser method would allow you to stream the file and pull out the parts that you want into relational form. In a Script Component, you could direct different objects to different outputs.
I have about 6000 XML files that I need to import into a SQL database. I need to find certain fields in those XML files and import them into multiple tables. The files are InfoPath forms saved as XML docs. Each file has about 20 fields that need to go into 6 tables in their respective columns. I analyzed the data fields and created the tables and the relations between them. Now I have to start importing the data.
I tried using the XML source in SSIS, but the data is too complex for SSIS to handle, so the XML source will not work for me. I looked at using a Script Task, but I'm not a core C#/VB developer. I also tried to use BULK INSERT, but I don't have bulk insert permissions. A friend suggested using XQuery, but I think XQuery only works for parsing once the XML is already in a table (not sure).
Can anyone please tell me the best approach to start this task?
Thanks in Advance!
Kumar
I faced a similar scenario, and here is the approach I used (a rough sketch follows the steps):
1. create staging tables to hold the XML data (with columns of the xml data type)
2. read the source XML files as text and insert them into those tables
3. use XQuery in a stored procedure to extract the data and populate the target tables
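A rough sketch of those three steps (paths, element names and target columns are only placeholders; step 2 uses OPENROWSET(BULK ...) for illustration, but if bulk permissions are a problem the same insert can be fed from a small application or SSIS package that reads the file as text):

    -- 1. Staging table with an XML column.
    CREATE TABLE dbo.XmlStaging (
        SourceFile NVARCHAR(400) NOT NULL,
        Doc        XML           NOT NULL
    );

    -- 2. Load one source file and cast it to XML
    --    (loop over the ~6000 files from a file-list table or a driver script).
    INSERT INTO dbo.XmlStaging (SourceFile, Doc)
    SELECT 'C:\Import\form0001.xml', CONVERT(XML, BulkColumn)
    FROM OPENROWSET(BULK 'C:\Import\form0001.xml', SINGLE_BLOB) AS src;

    -- 3. Shred the staged XML into a target table with XQuery.
    --    InfoPath forms usually use namespaces, so you may need
    --    WITH XMLNAMESPACES ('...your form namespace...' AS my) and my: prefixes.
    INSERT INTO dbo.TargetTable (FieldA, FieldB)
    SELECT
        f.value('(FieldA)[1]', 'NVARCHAR(100)'),
        f.value('(FieldB)[1]', 'NVARCHAR(100)')
    FROM dbo.XmlStaging AS s
    CROSS APPLY s.Doc.nodes('/myFields') AS t(f);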
I can provide more pointers/information if you need.
The best way to achieve this is to use the open-source tool Talend Open Studio.