Retrieve specific fields of the imported to SQL Server multiple XMLs - sql-server

I have multiple XML files with the same structure, I have imported them to SQL Server 2017 with the following commands:
DDL:
CREATE DATABASE xmlFiles
GO
USE xmlFiles
CREATE TABLE tblXMLFiles (IntCol int, XmlData xml);
GO
DML:
USE xmlFiles
INSERT INTO [dbo].[tblXMLFiles](XmlData) SELECT * FROM OPENROWSET(BULK 'C:\xmls\1.xml', SINGLE_BLOB) AS x;
INSERT INTO [dbo].[tblXMLFiles](XmlData) SELECT * FROM OPENROWSET(BULK 'C:\xmls\2.xml', SINGLE_BLOB) AS x;
…
INSERT INTO [dbo].[tblXMLFiles](XmlData) SELECT * FROM OPENROWSET(BULK 'C:\xmls\N.xml', SINGLE_BLOB) AS x;
Now I want to query the data:
USE xmlFiles
GO
DECLARE #XML AS XML, #hDoc AS INT, #SQL NVARCHAR (MAX)
SELECT #XML = XmLData FROM tblXMLFiles
EXEC sp_xml_preparedocument #hDoc OUTPUT, #XML
SELECT Surname , GivenNames
FROM OPENXML(#hDoc, 'article/ref-list/ref/mixed-citation')
WITH
(
Surname [varchar](100) 'string-name/surname',
GivenNames [varchar](100) 'string-name/given-names'
)
EXEC sp_xml_removedocument #hDoc
GO
The query is working, but the problem is that it returns the data only when there is only one row in a data source table — tblXMLFiles. If I add more than one row, I get empty result set.
Important:
The situation is changing if I add to the outer SELECT clause (SELECT #XML = XmLData…) the TOP statement, then it returns the queried data of the specific row number, according to the TOP value.
How can I retrieve the data not only when there is one line in the table, but many rows?

FROM OPENXML with the corresponding SPs to prepare and to remove a document is outdated and should not be used any more. Rather use the appropriate methods the XML data type provides.
Without an example of your XML it is quite difficult to offer a solution, but my magic crystal ball tells me, that it might be something like this:
SELECT f.IntCol
,mc.value('(string-name/surname)[1]','nvarchar(max)') AS Surname
,mc.value('(string-name/given-names)[1]','nvarchar(max)') AS GivenNames
FROM dbo.tblXMLFiles AS f
OUTER APPLY f.XmlData.nodes('article/ref-list/ref/mixed-citation') AS A(mc)

Related

What is the best way to import this very large xml file into a SQL Server database

I have a XML file with 95 gb of data (1444 mio rows). I need to import some of the data into a SQL Server table.
I have made a sample file that I'm trying to import into my SQL Server with the following code. I don't get any errors, but I also don't get any data in the table.
Sample file: https://1drv.ms/u/s!AnJeuk8W8KbEjblueb6bjKaDWJ5XAw?e=2molDZ
CREATE DATABASE DMR_DB2
GO
USE DMR_DB2
GO
CREATE TABLE XMLwithOpenXML
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)
INSERT INTO XMLwithOpenXML(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'C:\Users\kn\Desktop\ESStatistikListeModtag-20220911-222128\Test.xml', SINGLE_BLOB) AS x;
--SELECT * FROM XMLwithOpenXML
DECLARE #XML AS XML, #hDoc AS INT, #SQL NVARCHAR (MAX)
SELECT #XML = XMLData FROM XMLwithOpenXML
EXEC sp_xml_preparedocument #hDoc OUTPUT, #XML
SELECT KoeretoejIdent, KoeretoejArtNummer, KoeretoejArtNavn
FROM OPENXML(#hDoc, 'ESStatistikListeModtag_I/StatistikSamling/Statistik')
WITH
(
KoeretoejIdent [varchar](50) '#KoeretoejIdent',
KoeretoejArtNummer [varchar](100) '#KoeretoejArtNummer',
KoeretoejArtNavn [varchar](100) 'KoeretoejArtNavn'
)
EXEC sp_xml_removedocument #hDoc
GO
Please try the following solution.
It works for your small XML sample.
Microsoft proprietary OPENXML() and its companions sp_xml_preparedocument and sp_xml_removedocument are kept just for backward compatibility with the obsolete SQL
Server 2000. Their use is diminished just to very few fringe cases.
Starting from SQL Server 2005 onwards, it is strongly recommended to re-write your SQL and switch it to XQuery.
As #Lamu already pointed out, SQL Server XML type column can hold up to 2 GB XML.
For 95 GB you would need to use SSQL Server Integration Services (SSIS).
SSIS has no limitation on the XML fie size. It will handle whatever the Operation System (OS) file system allows.
SQL
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.tbl;
CREATE TABLE tbl (
ID INT IDENTITY(1, 1) PRIMARY KEY,
XmlColumn XML
);
INSERT INTO tbl(XmlColumn)
SELECT * FROM OPENROWSET(BULK N'c:\Downloads\Test.xml', SINGLE_BLOB) AS x;
WITH XMLNAMESPACES(DEFAULT 'http://skat.dk/dmr/2007/05/31/')
SELECT c.value('(KoeretoejIdent/text())[1]', 'VARCHAR(50)') as KoeretoejIdent
, c.value('(KoeretoejArtNummer/text())[1]', 'INT') as KoeretoejArtNummer
, c.value('(KoeretoejArtNavn/text())[1]', 'NVARCHAR(50)') as KoeretoejArtNavn
FROM tbl
CROSS APPLY XmlColumn.nodes('/ESStatistikListeModtag_I/StatistikSamling/Statistik') AS t(c);

Getting XML to feed into SQL Server table

I am trying to get a (for right now) simple XML to feed into a SQL Server table.
The XML is:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfSafeEODBalance xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SafeEODBalance>
<Lane>1</Lane>
<PouchId>06292019053041001</PouchId>
<BusinessDay>6/29/2019</BusinessDay>
<BusinessStartingTime>6/29/2019 5:36:58 AM</BusinessStartingTime>
<BusinessEndingTime>6/30/2019 12:15:55 AM</BusinessEndingTime>
<StartingBalance>0.0000</StartingBalance>
<EndingBalance>8</EndingBalance>
</SafeEODBalance>
<SafeEODBalance>
<Lane>2</Lane>
<PouchId>06292019053042002</PouchId>
<BusinessDay>6/29/2019</BusinessDay>
<BusinessStartingTime>6/29/2019 5:36:58 AM</BusinessStartingTime>
<BusinessEndingTime>6/30/2019 12:15:55 AM</BusinessEndingTime>
<StartingBalance>100.0000</StartingBalance>
<EndingBalance>2</EndingBalance>
</SafeEODBalance>
</ArrayOfSafeEODBalance>
And saved to C:\Users\cj\Documents\EodBalance.xml
I have set up the SQL Server table [dbo].[EndofDay] which has the columns of each of these exactly:
Here is the query I am trying:
INSERT INTO [dbo].[EndofDay] ([PouchID], [Lane], [BusinessDay], BusinessStartingTime, BusinessEndingTime, [StartingBalance], [EndingBalance])
SELECT
MY_XML.SafeEODBalance.query('PouchId').value('.', 'VARCHAR(25)'),
MY_XML.SafeEODBalance.query('Lane').value('.', 'NCHAR(2)'),
MY_XML.SafeEODBalance.query('BusinessDay').value('.', 'DATE'),
MY_XML.SafeEODBalance.query('BusinessStartingTime').value('.', 'DATETIME'),
MY_XML.SafeEODBalance.query('BusinessEndingTime').value('.', 'DATETIME'),
MY_XML.SafeEODBalance.query('StartingBalance').value('.', 'NCHAR(10)'),
MY_XML.SafeEODBalance.query('EndingBalance').value('.', 'NCHAR(10)')
FROM
(SELECT CAST(MY_XML AS XML)
FROM OPENROWSET(BULK 'C:\Users\cj\Documents\EodBalance.xml',SINGLE_BLOB) AS T(MY_XML)) AS T(MY_XML)
CROSS APPLY MY_XML.nodes('SafeEODBalance/SafeEODBalances') AS MY_XML (SafeEODBalance);
When I run this I get:
(0 rows affected)
Completion time: 2019-08-29T16:07:12.3361442-04:00
Which obviously should feed two lines into this, but it is giving nothing in the table.
Here is adjusted working SQL. Just uncomment the INSERT lines when you are ready.
SQL
WITH XmlFile (xmlData) AS
(
SELECT CAST(BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\Users\cj\Documents\EodBalance.xml', SINGLE_BLOB) AS x
)
--INSERT INTO [dbo].[EndofDay]
--([PouchID], [Lane], [BusinessDay], BusinessStartingTime, BusinessEndingTime, [StartingBalance], [EndingBalance])
SELECT c.value('(PouchId/text())[1]', 'VARCHAR(25)') AS [PouchId]
, c.value('(Lane/text())[1]', 'NCHAR(2)') AS [Lane]
, c.value('(BusinessDay/text())[1]', 'DATE') AS [BusinessDay]
, c.value('(BusinessStartingTime)[1]', 'datetime') AS [BusinessStartingTime]
, c.value('(BusinessEndingTime/text())[1]', 'datetime') AS [BusinessEndingTime]
, c.value('(StartingBalance/text())[1]', 'MONEY') AS [StartingBalance]
, c.value('(EndingBalance/text())[1]', 'MONEY') AS [EndingBalance]
FROM XmlFile CROSS APPLY xmlData.nodes('/ArrayOfSafeEODBalance/SafeEODBalance') AS t(c);
** EDIT ** As pointed out in the comments below, this answer uses legacy functions and SPs so should not be used unless you are running on a pre-2005 version of SQL
Here is a slightly different approach, using a variable to store the XML from OPENROWSET and the stored procedure sp_xml_preparedocument to convert it into an XML document.
Once in XML document form it can be queried using OPENXML(). This has the possible advantage that if you have a large or complex XML structure from which you wish to make several extracts, you can re-use the XML document repeatedly without having to reload the original XML file.
Be sure to remove the XML document using sp_xml_removedocument when you have finished with it to free up the server cache.
-- Load the XML file and convert it to an XML document
DECLARE #XML AS XML, #hXML AS INT;
SELECT #XML = CONVERT(XML, x.BulkColumn)
FROM OPENROWSET(BULK 'C:\Users\cj\Documents\EodBalance.xml\EodBalance.xml', SINGLE_BLOB) AS x;
EXEC sp_xml_preparedocument #hXML OUTPUT, #XML
-- Select data from the XML document
SELECT Lane, PouchID, BusinessDay, BusinessStartingTime, BusinessEndingTime, StartingBalance, EndingBalance
FROM OPENXML(#hXML, 'ArrayOfSafeEODBalance/SafeEODBalance') WITH
(
Lane [varchar](2) 'Lane',
PouchId [varchar](50) 'PouchId',
BusinessDay [date] 'BusinessDay',
BusinessStartingTime [datetime] 'BusinessStartingTime',
BusinessEndingTime [datetime] 'BusinessEndingTime',
StartingBalance [varchar](50) 'StartingBalance',
EndingBalance [varchar](50) 'EndingBalance'
);
-- Remove the XML document from the cache
EXEC sp_xml_removedocument #hXML;

XML Parsing View

Good Day
I have a query that views and sorts the contents of my SQL. It's easy if you know what the xml file is but I am trying to create the xml file as a parameter because it's not always the same. How can I add the parameter to the path? I did try but it says it's incorrect.
This is the code in a view that works:
select
c3.value('#CtlgID','nvarchar(50)') AS 'ID',
c4.value('#label','nvarchar(50)') AS 'ID',
c5.value('#label','nvarchar(50)') AS 'ID'
from
(
select
cast(c1 as xml)
from
OPENROWSET (BULK 'C:\ISP\bin\EN\XML\Cataloghi\menuCat_756.xml',SINGLE_BLOB) as T1(c1)
)as T2(c2)
cross apply c2.nodes('/node') T3(c3)
cross apply c2.nodes('/node/node') T4(c4)
cross apply c2.nodes('/node/node/node') T5(c5)
I am trying to add this to a stored procedure:
PROCEDURE [dbo].[Update_ISP_Child]
-- Add the parameters for the stored procedure here
#p1 nvarchar(50) = 'menuCat_756.xml'
AS
BEGIN
select
c3.value('#CtlgID','nvarchar(50)') AS 'ID',
c4.value('#label','nvarchar(50)') AS 'ID',
c5.value('#label','nvarchar(50)') AS 'ID'
from
(
select
cast(c1 as xml)
from
OPENROWSET (BULK 'C:\ISP\bin\EN\XML\Cataloghi\' + #p1,SINGLE_BLOB) as T1(c1)
)as T2(c2)
cross apply c2.nodes('/node') T3(c3)
cross apply c2.nodes('/node/node') T4(c4)
cross apply c2.nodes('/node/node/node') T5(c5)
END
When I add my parameter as #p1 it doesn't work.
Thanks.
Ruan
You have to build the whole statement dynamically:
--the file's name
DECLARE #filename VARCHAR(250)='menuCat_756.xml';
--a tolerant staging table to help us get the result of EXEC
DECLARE #staging TABLE(TheXml NVARCHAR(MAX));
--a dynamically created command
DECLARE #cmd NVARCHAR(MAX)=
N'select c1
from OPENROWSET (BULK ''C:\ISP\bin\EN\XML\Cataloghi\' + #filename + ''',SINGLE_CLOB) as T1(c1)';
--This will shift the "result set" (which is one single XML) into our staging table
INSERT INTO #staging
EXEC(#cmd);
--Test it
SELECT * FROM #staging
Hint 1: I changed SINLGE_BLOB to SINLGE_CLOB
Hint 2: You must cast the result to XML before you use it as XML.
It is good practice to do casts after the import to avoid errors hardly to find...

Openrowset bulk insert on a specific row

I am new to SQL, but am trying to learn its logic, I am assuming bulk insert will insert on all rows in this case a blob. (pdf file) below is my code but what I am trying to accomplish is, inserting a pdf file that I have put on SQL server in a row that has a matching Primary Key that I specify. So far I am missing the where clause to specify the PK
Declare #sql varchar(max)
Declare #filePath varchar(max)
Set #filePath = 'C:\iphone.pdf'
Set #sql='INSERT INTO HDData.dbo.PurchasedCellPhoneInfo(Receipt) SELECT * FROM OPENROWSET(BULK '''+ #filePath+''', SINGLE_BLOB) AS BLOB'
exec(#sql)
May I use an update t-SQL query instead of insert? and how would I drop the where to specify a specific row I want to insert this blob in?
Any help would be appreciated.
I also have tried this, following #misterPositive's suggestion for update query:
Declare #criteria varchar(50)
SET #criteria ='352014075399147'
UPDATE HDData.dbo.PurchasedCellPhoneInfo SET Receipt =
(SELECT Receipt FROM OPENROWSET (BULK 'C:\352014075399147.pdf, SINGLE_BLOB') a)
WHERE(IMEI = #criteria)
i do recieve this message:
Either a format file or one of the three options SINGLE_BLOB, SINGLE_CLOB, or SINGLE_NCLOB must be specified. i like this update query as it seems to fit what im trying to do.
You can do this to UPDATE:
UPDATE MyTable
SET blobField =
(SELECT BulkColumn FROM OPENROWSET (BULK 'C:\Test\Test1.pdf', SINGLE_BLOB) a)
WHERE (CriteriaField = #criteria)
Here is another way for PK
CREATE VIEW [dbo].[VWWorkDataLoad]
AS
SELECT RecordLine
FROM [dbo].[WorkDataLoad];
Now BULK INSERT should then look like:
BULK INSERT [dbo].[VWWorkDataLoad] FROM 'D:\NPfiles\TS082114.trn'
WITH (FIRSTROW = 2,FIELDTERMINATOR = ',' , ROWTERMINATOR = '\n');
If you want to insert new records then you could have an identity column for your PK and not have to worry about it. I have also seen functions used when a table is designed without identity on PK. Something like GetTableNamePK() in the select list.
If you want to update an existing record then you will want a where clause as you mentioned. This worked for me in testing:
Update TestBlob Set BinaryData = (SELECT * FROM OPENROWSET(BULK 'c:\temp\test.pdf', SINGLE_BLOB) AS BLOB)
Where ID = 2
If you do not want to use the Identity or Function this worked where ID is a primary key and I want to insert the BLOB with PK of 3:
INSERT INTO TestBlob2 (ID, BinaryData) SELECT 3, * FROM OPENROWSET(BULK 'c:\temp\test.pdf', SINGLE_BLOB) AS BLOB

Update a table row with XML (SQL Server)

I have a xml file below generated using
SELECT * FROM case WHERE ticketNo=#ticketNo FOR XML RAW,
ELEMENTS
The XML looks like this:
<row>
<ticketNo>1</ticketNo>
<caller>name</caller>
<category>3</category>
<service>4</service>
<workgroup>5</workgroup>
</row>
And I update my table using this query with the same with some value changed
UPDATE case
set caller = xmldoc.caller
set category = xml.category
from OpenXml(#idoc, '/row')
with (ticketNo VARCHAR(50)'./ticketNo',
caller VARCHAR(50) './caller',
category VARCHAR(50) './category') xmldoc
where dbo.tb_itsc_case.ticketNo = xmldoc.ticketNo
Is it possible to update the table without specifying the individual column?
You can not do an update without specifying the columns and you can not get data from XML without specifying what nodes to get the data from.
If you can use the "new" XML data type that was introduced in SQL Server 2005 you can do like this instead.
declare #XML xml =
'<row>
<ticketNo>1</ticketNo>
<caller>name</caller>
<category>3</category>
<service>4</service>
<workgroup>5</workgroup>
</row>'
update [case] set
[caller] = #XML.value('(/row/caller)[1]', 'varchar(50)'),
category = #XML.value('(/row/category)[1]', 'varchar(50)')
where
ticketNo = #XML.value('(/row/ticketNo)[1]', 'varchar(50)')
I was able to use this to update 1 row in a table. It requires you to have all fields specified in the XML. It replaces the existing row with the one specified in the XML.
I wish I could figure out how to do it for only the columns that are in the XML. I work on projects where the fields in the table change frequently and it requires re-specifying all the procedures to list out the column names.
create procedure Update_TableName (#xml xml) as
DECLARE #handle INT
EXEC sp_xml_preparedocument #handle OUTPUT, #xml
DECLARE #ID varchar(255)
SELECT #ID = ID FROM OPENXML (#handle, '/data', 2) WITH TableName
if exists (select 1 from TableName where ID = #ID)
delete from TableName where ID = #ID
Insert into TableName
SELECT * FROM OPENXML (#handle, '/data', 2) WITH TableName
EXEC sp_xml_removedocument #handle
XML:
<data>
<ID>9999</ID>
<Column1>Data</Column1>
<Column2>Data</Column2>
<Column3>Data</Column3>
</data>

Resources