Bulk Insert Cannot Ignore Errors in XML file? - sql-server

I create a table called XMLTable, with one column called XMLCol of data type xml.
Then I am trying to use SQL Server bulk insert to insert data into the table:
BULK INSERT [XMLTable] FROM 'F:\mydata.dat' WITH (DATAFILETYPE = 'widechar', FORMATFILE = 'F:\myformat.XML', MAXERRORS = 2147483647, ROWS_PER_BATCH = 1, TABLOCK);
I set MAXERRORS to 2147483647 so that the bulk insert will ignore errors and continue inserting the remaining records when it encounters any errors in the XML data.
My format file(myformat.xml) is:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="\x2C\x00"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" xsi:type="SQLNVARCHAR" NULLABLE="YES"/>
</ROW>
</BCPFORMAT>
My data file(mydata.dat) is:
As you can see, there are two records in the data file. I modify the first one by changing the '/' to 0 so that it is invalid. The second record is a valid XML.
When I bulk insert the data, I will get the following error:
Msg 9455, Level 16, State 1, Line 1
XML parsing: line 1, character 5, illegal qualified name character
The error is expected since I modified the data file on purpose. However, BULK INSERT does not ignore the error and insert the second record, even though I specify MAXERRORS as 2147483647. Why?
The official Microsoft documentation at https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15 says: "MAXERRORS = max_errors Specifies the maximum number of syntax errors allowed in the data before the bulk-import operation is canceled. Each row that cannot be imported by the bulk-import operation is ignored and counted as one error. If max_errors is not specified, the default is 10." Then why does a single error make the whole BULK INSERT fail?

You can try the following approach. (1) Your destination DB table uses the NVARCHAR(MAX) data type to hold the XML. (2) TRY_CAST() returns NULL for any string that only looks like XML but cannot be parsed, and real XML otherwise.
This way BULK INSERT itself never hits an XML parsing error, and you will be able to troubleshoot problematic XML that is not well-formed, contains wrong characters, or has anything else wrong with it.
USE tempdb;
GO
DROP TABLE IF EXISTS XMLTable;
CREATE TABLE XMLTable(XMLData NVARCHAR(MAX));
BULK INSERT [XMLTable]
FROM 'e:\Temp\XMLBulkINSERT\Data.DAT'
WITH
(
DATAFILETYPE = 'widechar',
FORMATFILE = 'e:\Temp\XMLBulkINSERT\Data_FORMATFILE.xml',
MAXERRORS = 2147483647,
ROWS_PER_BATCH = 12,
TABLOCK
);
SELECT *
, TRY_CAST(XMLData AS XML) AS RealXML
FROM dbo.XMLTable;
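As a follow-up to the query above, here is a minimal sketch of how the staged rows could be split into bad and good ones. It assumes the staging table from the answer; the table name dbo.XMLTableTyped and its column XMLCol are hypothetical and only there for illustration.

```sql
-- Rows that were loaded but cannot be parsed as XML (for troubleshooting)
SELECT XMLData AS BadXML
FROM dbo.XMLTable
WHERE XMLData IS NOT NULL
  AND TRY_CAST(XMLData AS XML) IS NULL;

-- Hypothetical typed destination table; the name is an assumption
CREATE TABLE dbo.XMLTableTyped (XMLCol XML);

-- Copy only the rows that parse successfully
INSERT INTO dbo.XMLTableTyped (XMLCol)
SELECT TRY_CAST(XMLData AS XML)
FROM dbo.XMLTable
WHERE TRY_CAST(XMLData AS XML) IS NOT NULL;
```

The staging table stays untouched, so the rejected rows remain available for inspection after the good ones have been copied.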

Related

sql server 15 parse xml column returns null

I want to query an XML value in SQL Server and get the auditingCompanyAddress value.
create table sqm (data xml)
insert into sqm
select '<taxComplianceReport xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.gsis.gr/cpaaudit/v12">
<auditingGeneralInformation>
<auditingHeader>
<auditingCompany>
<auditingCompanyTitle>ΣΥΝΕΡΓΑΖΟΜΕΝΟΙ ΟΡΚΩΤΟΙ ΛΟΓΙΣΤΕΣ Α.Ε. </auditingCompanyTitle>
<auditingCompanyRegisterNum>125</auditingCompanyRegisterNum>
<auditingCompanyVatNumber>094394659</auditingCompanyVatNumber>
<auditingCompanyAddress>Φ. ΝΕΓΡΗ 3, 11257 ΑΘΗΝΑ</auditingCompanyAddress>
<auditingCompanyFee>4000</auditingCompanyFee>
</auditingCompany>
<certifiedAccountant1>
<certifiedAccountantsName>ΚΑΛΛΕΣ ΝΙΚΟΛΑΟΣ</certifiedAccountantsName>
<certifiedAccountantsRegisterNum>1590</certifiedAccountantsRegisterNum>
<accountantVatNumber>035209342</accountantVatNumber>
<certifiedAccountantsCity />
<certifiedAccountantsFee>0</certifiedAccountantsFee>
</certifiedAccountant1>
<disclaimer>true</disclaimer>
<companyName>ΣΑΛΑΓΙΑΝΝΗΣ Γ.ΑΒΕΕ</companyName>
<companyVatNumber>094357246</companyVatNumber>
<periodFrom>2018-01-01</periodFrom>
<periodTo>2018-12-31</periodTo>
<fiscalYear>2018</fiscalYear>
<conclusionReportVatCompliance>1</conclusionReportVatCompliance>
<nonImportantDiffReportVatCompliance>2</nonImportantDiffReportVatCompliance>
<pendingQuestions>false</pendingQuestions>
<fiscalSubjectsNotAuditedDueToVatProblems>
<exists>false</exists>
<comments />
</fiscalSubjectsNotAuditedDueToVatProblems>
</auditingHeader>
</auditingGeneralInformation>
</taxComplianceReport>'
I am trying to get it with the following SQL query:
select
m.c.value('(auditingCompanyAddress)[1]', 'VARCHAR(max)') as auditingCompanyAddress
from sqm as s
outer apply s.data.nodes('taxComplianceReport/auditingGeneralInformation/auditingHeader/auditingCompany') as m(c)
but it returns NULL. I think the problem is the taxComplianceReport element, but I don't know how to resolve it.
Any ideas?
There's a namespace in your XML yet you don't define it in your SQL. Define the DEFAULT one and it works:
WITH XMLNAMESPACES(DEFAULT 'http://www.gsis.gr/cpaaudit/v12')
SELECT m.c.value('(auditingCompanyAddress)[1]', 'VARCHAR(max)') as auditingCompanyAddress
FROM sqm AS s
OUTER APPLY s.data.nodes('taxComplianceReport/auditingGeneralInformation/auditingHeader/auditingCompany') AS m(c);
Note that, for me, this returns the varchar value 'F. ??G?? 3, 11257 ?T???' as (at least on my collation) a varchar cannot contain characters like Λ and Σ. If you get ?s as well, ensure you are using nvarchars.
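To illustrate both points (the default namespace and nvarchar), here is a small self-contained sketch. The XML document, the namespace URI urn:example:v1, and the element names are made up for the example; note the N prefix on the literal, without which the Greek characters may already be lost before the XML is parsed.

```sql
-- Made-up document with a default namespace and non-Latin text
DECLARE @x xml = N'<report xmlns="urn:example:v1"><address>ΑΘΗΝΑ</address></report>';

-- Declare the same namespace as DEFAULT and read the value as nvarchar
WITH XMLNAMESPACES (DEFAULT 'urn:example:v1')
SELECT @x.value('(/report/address/text())[1]', 'nvarchar(100)') AS address;
```

Without the WITH XMLNAMESPACES clause the same path matches nothing and the query returns NULL, which is the symptom described in the question.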

XML parsing: line 1, character 345, duplicate attribute

I am trying to get the particular attribute value from XML column, but I'm getting an error
XML parsing: line 1, character 345, duplicate attribute
My code:
select
ship_to_cust_num,
tank_num,
tank_capacity_qty,
tank_pkg_type_code,
COALESCE(REPLACE(CAST(CAST(b.tank_inspection AS NTEXT) AS XML).value('(/TankInspection/Questions/Question[@AASAQno="9"]/@QAns)[1]', 'VARCHAR(50)'), '#', ''), 0)
from
bulk_site_tank (nolock)b
where
convert(varchar, b.tank_inspection) != 'NULL'
The simple answer is that the error is telling you the problem. But to explain further. Take this simple statement:
DECLARE @xml varchar(MAX);
SET @xml = '
<root>
<child>
<element attribute="1">value</element>
<element attribute="2" attribute="2">Another Value</element>
</child>
</root>';
SELECT *
FROM (VALUES(CONVERT(xml, @xml)))V(X);
If you run that, you'll get the error:
Msg 9437, Level 16, State 1, Line 11 XML parsing: line 5, character 46, duplicate attribute
That's unsurprising: if you look, the second element node has attribute declared twice.
So, how do you fix this?
Firstly, this means that you're storing your XML data in a data type other than xml. XML should be stored using the xml data type (that's exactly what it's for), and only valid XML can be stored in it; with it, you wouldn't have been able to insert invalid XML into the row and wouldn't be in this position. As things stand, there's only one thing you can do: find all the "bad" rows:
SELECT tank_inspection
FROM bulk_site_tank
WHERE TRY_CONVERT(xml,tank_inspection) IS NULL
AND tank_inspection IS NOT NULL;
Then inspect every single row returned in the above dataset and fix the data. Make it valid XML. After that, fix your data type:
ALTER TABLE bulk_site_tank ALTER COLUMN tank_inspection xml;
Now everything is valid XML, you can fix that query of yours:
SELECT ship_to_cust_num,
tank_num,
tank_capacity_qty,
tank_pkg_type_code,
REPLACE(b.tank_inspection.value('(/TankInspection/Questions/Question[@AASAQno="9"]/@QAns)[1]', 'varchar(50)'), '#', '') --AS ?
FROM bulk_site_tank b
WHERE b.tank_inspection IS NOT NULL;
Note that I changed to ANSI NULL syntax (IS NOT NULL) and got rid of the NOLOCK (as I assume you don't know what it actually does here). The CAST/CONVERT expressions are gone too, and I've removed the COALESCE: your value expression returns a varchar(50) while the COALESCE has 0 (an int) as its second parameter, so it would implicitly cast the value returned from the XML to an int and likely result in a conversion error.
I'm afraid it's up to you to clean up your data, though; no one else can help you here. This is just one reason why poor data type choices are a problem: if the correct data type had been used then, as I said before, the invalid XML could never have been inserted.
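If a fallback value is still wanted, one option is to keep both COALESCE arguments as character types. This is only a sketch with a made-up variable @answer standing in for the value returned from the XML:

```sql
-- int has higher data type precedence than varchar, so this attempts
-- to convert the string to int and fails with a conversion error:
-- SELECT COALESCE('abc', 0);

-- Hypothetical stand-in for the varchar(50) value read from the XML
DECLARE @answer varchar(50) = NULL;

-- Both arguments are character types, so nothing is converted to int
SELECT COALESCE(@answer, '0') AS QAns;
```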
Good luck!

Simple SQL Bulk Insert not working

I'm trying to create a simple Bulk Insert command to import a fixed width text file into a table. Once I have this working I'll then expand on it to get my more complex import working.
I'm currently receiving the error...
Msg 4866, Level 16, State 7, Line 1
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
Obviously I have checked the terminator in the file. For test data I just typed a 3 line text file in Notepad. At this stage I'm just trying to import one column per line. I have padded the first two lines so each one is 18 characters long.
Test.txt
This is line one
This is line two
This is line three
When I view the file in Notepad++ and turn on all characters I see CRLF on the end of each line and no blank lines at the end of the file.
This is the SQL I'm using:
USE [Strata]
GO
drop table VJR_Bulk_Staging
Create Table [dbo].[VJR_Bulk_Staging](
[rowid] int Identity(1,1) Primary Key,
[raw] [varchar](18) not null)
GO
Bulk Insert [VJR_Bulk_Staging]
From 'c:\temp\aba\test.txt'
with (FormatFile='c:\temp\aba\test2.xml')
Here is the format XML file. I have tried several variations. This one was created using the BCP command.
bcp strata.dbo.vjr_bulk_staging format nul -f test2.xml -x -n -T -S Server\Instance
This created a record and a row entry for my rowid column which I thought was a problem as that is an identity field, so I removed it.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="18" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="raw" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
I'm testing on SQL Server 2008 R2 Express.
Any ideas where I'm going wrong?
I think the problem is with your prefix being 2 bytes long:
xsi:type="CharPrefix" PREFIX_LENGTH="2"
From what you have posted you don't have a prefix in your data file. Set the PREFIX_LENGTH to 0 in your format file, or provide the proper prefix in your data file.
You can find more information about prefix datatypes and what the prefix is about in the documentation: Specify Prefix Length in Data Files by Using bcp (SQL Server).
I think what you really wanted is type CharTerm with a proper TERMINATOR (\r\n in your case).
This works.
Option 1: Non-XML Format File
9.0
1
1 SQLCHAR 0 18 "\r\n" 2 raw SQL_Latin1_General_CP1_CI_AS
or simply
9.0
1
1 SQLCHAR "" "" "\r\n" 2 "" ""
Option 2: XML Format File
Ugly as hell work-around
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="2" xsi:type="SQLINT"/>
<COLUMN SOURCE="1" xsi:type="SQLCHAR"/>
</ROW>
</BCPFORMAT>
P.s.
It seems to me that there is a bug in the design of the XML format file.
Unlike the Non-XML format file, there is no option to indicate the position of the loaded column (and the names are just for the clarity of the scripts; they have no real meaning).
The XML example in the documentation does not work
Use a Format File to Skip a Table Column (SQL Server)
Could you please try the following command and check whether the bulk insert happens? Note that I have added a last line specifying the delimiters.
USE [Strata]
GO
drop table VJR_Bulk_Staging
Create Table [dbo].[VJR_Bulk_Staging](
[rowid] int Identity(1,1) Primary Key,
[raw] [varchar](18) not null)
GO
Bulk Insert [VJR_Bulk_Staging]
From 'c:\temp\aba\test.txt'
WITH ( FIELDTERMINATOR ='\t', ROWTERMINATOR ='\n',FIRSTROW=1 )

Bulk Import of XML Into Existing Tables

I am new to XML and SQL Server and am trying to import an XML file into SQL Server 2010. I have 14 tables that I would like to parse the data into. All 14 table names are listed in the XML as nodes (I think). I found some example code that worked with the simple example XML, but my XML seems a little more complicated and may not be structured optimally; unfortunately, I can't change that. As a basic attempt, I tried to insert the data into just one field of one existing table (SILVX_SN16000), but the Message pane shows "(0 rows(s) affected)". Thanks in advance for looking at this.
USE TEST
Declare @xml XML
Select @xml =
CONVERT(XML,bulkcolumn,2) FROM OPENROWSET(BULK 'C:\Users\Kevin_S\Documents \SilvxInSightImport.xml',SINGLE_BLOB) AS X
SET ARITHABORT ON
Insert into [SILVX_SN16000]
(
md_group
)
Select
P.value('MD_GROUP[1]','NVARCHAR(255)') AS md_group
From @xml.nodes('/TableData/Row') PropertyFeed(P)
Here is a much-shortened (rows removed) version of my XML:
<?xml version="1.0" ?>
<SilvxInSightImport Version="1.0" Host="uslsss17" Date="14-09-14_20-40-02">
<Tables Count="14">
<Table Name="SN16000">
<TableSchema>
<Column><COLUMN_NAME>PARENT_HPKEY</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>MD_GROUP</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>PKEY</COLUMN_NAME><DATA_TYPE>NUMBER</DATA_TYPE></Column>
<Column><COLUMN_NAME>S_STATE</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>NAME</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>ROUTER_ID</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
<Column><COLUMN_NAME>IP_ADDR</COLUMN_NAME><DATA_TYPE>VARCHAR2</DATA_TYPE></Column>
</TableSchema>
<TableData>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>161888</PKEY><NAME>UODEDTM010</NAME><ROUTER_ID>10.41.32.129</ROUTER_ID> <IP_ADDR>10.41.32.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>278599</PKEY><NAME>UODEETM010</NAME><ROUTER_ID>10.41.4.129</ROUTER_ID> <IP_ADDR>10.41.4.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
<Row><MD_GROUP>100.120.25162</MD_GROUP><PARENT_HPKEY>100</PARENT_HPKEY> <PKEY>183583</PKEY><NAME>UODEGRM010</NAME><ROUTER_ID>10.41.76.129</ROUTER_ID> <IP_ADDR>10.41.76.129</IP_ADDR><S_STATE>IS-NR</S_STATE></Row>
NT_HPKEY>100</PARENT_HPKEY><PKEY>811003</PKEY><NAME>UODWTIN010</NAME> <ROUTER_ID>10.27.36.130</ROUTER_ID><IP_ADDR>10.27.36.130</IP_ADDR><S_STATE>IS-NR</S_STATE> </Row>
</TableData>
</Table>
</Tables>
</SilvxInSightImport>
The xPath in .nodes() must specify the whole path to the Row nodes so you should start with SilvxInSightImport and work your way down to Row.
/SilvxInSightImport/Tables/Table/TableData/Row
In your case you have multiple table nodes, one for each table and I assume you only need one table at a time. You can use a predicate on the table name in the .nodes() xPath expression.
/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row
Your whole query for SN16000 should look something like this.
select T.X.value('(MD_GROUP/text())[1]', 'varchar(20)') as MD_GROUP,
T.X.value('(PARENT_HPKEY/text())[1]', 'int') as PARENT_HPKEY,
T.X.value('(PKEY/text())[1]', 'int') as PKEY,
T.X.value('(NAME/text())[1]', 'varchar(20)') as NAME,
T.X.value('(ROUTER_ID/text())[1]', 'varchar(20)') as ROUTER_ID,
T.X.value('(IP_ADDR/text())[1]', 'varchar(20)') as IP_ADDR,
T.X.value('(S_STATE/text())[1]', 'varchar(20)') as S_STATE
from @XML.nodes('/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row') as T(X)
You have to sort out the data types used for each column.
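The effect of the full path plus the table-name predicate can be tried on a cut-down document. This is a sketch only: the XML below is a reduced, hand-written sample in the shape of the file from the question, and the second table named OTHER is invented to show that the predicate filters it out.

```sql
-- Reduced sample in the shape of the import file; "OTHER" is a made-up table
DECLARE @xml xml = N'
<SilvxInSightImport>
  <Tables>
    <Table Name="SN16000">
      <TableData>
        <Row><MD_GROUP>100.120.25162</MD_GROUP><PKEY>161888</PKEY></Row>
        <Row><MD_GROUP>100.120.25162</MD_GROUP><PKEY>278599</PKEY></Row>
      </TableData>
    </Table>
    <Table Name="OTHER">
      <TableData>
        <Row><MD_GROUP>ignored</MD_GROUP><PKEY>1</PKEY></Row>
      </TableData>
    </Table>
  </Tables>
</SilvxInSightImport>';

-- Only rows under the Table element whose Name attribute is SN16000 are shredded
SELECT T.X.value('(MD_GROUP/text())[1]', 'varchar(20)') AS MD_GROUP,
       T.X.value('(PKEY/text())[1]', 'int')             AS PKEY
FROM @xml.nodes('/SilvxInSightImport/Tables/Table[@Name = "SN16000"]/TableData/Row') AS T(X);
```

Wrapping the SELECT in an INSERT INTO with a matching column list would then load the shredded rows into the target table.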

Skip Column in OPENROWSET (BULK)

Trying to bulk insert lots of rows into a table.
My SQL statement:
INSERT INTO [NCAATreasureHunt-dev].dbo.CatalinaCodes(Code)
SELECT (Code)
FROM OPENROWSET(BULK 'C:\Users\Administrator\Desktop\NCAATreasureHunt\10RDM.TXT',
FORMATFILE='C:\Users\Administrator\Desktop\NCAATreasureHunt\formatfile.xml') as t1;
10RDM.TXT:
DJKF61TGN7
Q9TVM16Z6Z
X44T4169FN
JQ2PT1ZXZK
C7NW71QPNG
SFJRR1FWKZ
TYZJW1ZPFY
9MR3M1J3N5
QJ6R217JTK
TVJVW19TYT
formatfile.xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="C1" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="C1" NAME="Code" xsi:type="SQLNVARCHAR" />
</ROW>
</BCPFORMAT>
This is the error I'm getting:
Cannot insert the value NULL into column 'Claimed', column does not allow nulls. INSERT fails.
I'm trying to skip the Claimed column. What am I doing wrong in my format file?
See if this answer helps.
With an XML format file, you cannot skip a column when you are
importing directly into a table by using a bcp command or a BULK
INSERT statement. However, you can import into all but the last column
of a table. If you have to skip any but the last column, you must
create a view of the target table that contains only the columns
contained in the data file. Then, you can bulk import data from that
file into the view.
To use an XML format file to skip a table column by using
OPENROWSET(BULK...), you have to provide explicit list of columns in
the select list and also in the target table, as follows:
INSERT ... SELECT FROM OPENROWSET(BULK...)
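Applied to the statement in the question, the pattern might look like the sketch below. The error message says that Claimed does not allow NULLs, so one option is to supply a value for it in the select list. That Claimed is a numeric or bit flag where 0 means "not claimed" is an assumption, not something the question states; a default constraint on the column would be another route.

```sql
-- Explicit column lists on both sides; Claimed gets a constant value.
-- Assumption: Claimed is a bit/int flag and 0 means "not yet claimed".
INSERT INTO [NCAATreasureHunt-dev].dbo.CatalinaCodes (Code, Claimed)
SELECT t1.Code, 0
FROM OPENROWSET(BULK 'C:\Users\Administrator\Desktop\NCAATreasureHunt\10RDM.TXT',
     FORMATFILE = 'C:\Users\Administrator\Desktop\NCAATreasureHunt\formatfile.xml') AS t1;
```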
