MSSQL Bulk Insert Unicode 16 - sql-server

I'm trying to insert a UTF-8 CSV file into MSSQL.
The file contains Scandinavian, Russian and Chinese characters that the insert does not represent properly. For example, Brøndy is stored as Br├©ndy. Furthermore, this string has a length of 7 characters instead of 6.
First Try
Table Definition
CREATE TABLE [PSAP].[staging].[ADRC]
(
[MANDT] [NVARCHAR] (12) COLLATE Latin1_General_CI_AS NULL
, [CITY1] [NVARCHAR] (80) COLLATE Latin1_General_CI_AS NULL
)
XML Format File
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' />
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR='";"' />
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR='";"' />
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR='"' />
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR='\n' />
</RECORD>
<ROW>
<COLUMN SOURCE="2" NAME="CLIENT" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="ADRC" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
Bulk Insert Statement
BULK INSERT [PSAP].[staging].[ADRC]
FROM 'E:\sourceData\ADRC.csv' WITH (FIRSTROW = 2, FORMATFILE = 'E:\sourceData\ADRC.csv.xml' )
Data Example
"CLIENT";"CITY1"
"600";"Brøndy"
"600";"武戏"
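The symptom described above (Brøndy becoming Br├©ndy, 7 characters instead of 6) is what you would see if the file's UTF-8 bytes were read one byte per character in an OEM code page. The sketch below reproduces it in Python, assuming code page 850; which code page the server actually applied is an assumption, not something the question confirms.

```python
# Reproduce the garbling from the question: UTF-8 bytes read as code page 850.
original = "Brøndy"
utf8_bytes = original.encode("utf-8")   # 'ø' becomes two bytes: 0xC3 0xB8

# Assumption: the reader interprets each byte as one CP850 character.
garbled = utf8_bytes.decode("cp850")

print(garbled)                      # the garbled form shown in the question
print(len(original), len(garbled))  # one extra character for the two-byte 'ø'

# The damage is reversible as long as no bytes were dropped along the way.
restored = garbled.encode("cp850").decode("utf-8")
print(restored == original)
```

If this is what is happening, the usual remedies are to tell the loader that the file is UTF-8 (BULK INSERT accepts CODEPAGE = '65001' on SQL Server 2016 and later) or to convert the file to UTF-16 before loading it.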

Related

Inserting XML data in to a table

I need to insert data from an external XML file into a SQL Server table. I tried the code below, but it inserts a single record with NULL values in every column.
Declare @xml XML
Select @xml =
CONVERT(XML,bulkcolumn,2) FROM OPENROWSET(BULK
'C:\Users\PC901\Downloads\Tags.xml',SINGLE_BLOB) AS X
SET ARITHABORT ON
TRUNCATE TABLE Tags
Insert into Tags
(
ID,WikiPostId,ExcerptPostId,Count,TagName
)
Select
P.value('ID[1]','BIGINT') AS ID,
P.value('WikiPostId[1]','BIGINT') AS WikiPostId,
P.value('ExcerptPostId[1]','BIGINT') AS ExcerptPostId,
P.value('Count[1]','BIGINT') AS Count,
P.value('TagName[1]','VARCHAR(100)') AS TagName
From @xml.nodes('/tags') PropertyFeed(P)
SELECT * FROM Tags
and the sample XML would be
<?xml version="1.0" encoding="utf-8"?>
<tags>
<row Id="1" TagName=".net" Count="283778" ExcerptPostId="3624959" WikiPostId="3607476" />
<row Id="2" TagName="html" Count="826083" ExcerptPostId="3673183" WikiPostId="3673182" />
<row Id="3" TagName="javascript" Count="1817846" ExcerptPostId="3624960" WikiPostId="3607052" />
<row Id="4" TagName="css" Count="588062" ExcerptPostId="3644670" WikiPostId="3644669" />
<row Id="5" TagName="php" Count="1286873" ExcerptPostId="3624936" WikiPostId="3607050" />
</tags>
Here you go:
declare @xml xml = '<?xml version="1.0" encoding="utf-8"?>
<tags>
<row Id="1" TagName=".net" Count="283778" ExcerptPostId="3624959" WikiPostId="3607476" />
<row Id="2" TagName="html" Count="826083" ExcerptPostId="3673183" WikiPostId="3673182" />
<row Id="3" TagName="javascript" Count="1817846" ExcerptPostId="3624960" WikiPostId="3607052" />
<row Id="4" TagName="css" Count="588062" ExcerptPostId="3644670" WikiPostId="3644669" />
<row Id="5" TagName="php" Count="1286873" ExcerptPostId="3624936" WikiPostId="3607050" />
</tags>'
Select
P.value('@Id','BIGINT') AS ID,
P.value('@WikiPostId','BIGINT') AS WikiPostId,
P.value('@ExcerptPostId','BIGINT') AS ExcerptPostId,
P.value('@Count','BIGINT') AS Count,
P.value('@TagName','VARCHAR(100)') AS TagName
From @xml.nodes('/tags/row') PropertyFeed(P)
outputs
ID WikiPostId ExcerptPostId Count TagName
----------- -------------------- -------------------- -------------------- ----------
1 3607476 3624959 283778 .net
2 3673182 3673183 826083 html
3 3607052 3624960 1817846 javascript
4 3644669 3644670 588062 css
5 3607050 3624936 1286873 php
(5 rows affected)
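Two things differ between the question's query and the working one: the context node (/tags yields a single node, /tags/row yields one per row), and element lookup versus attribute lookup (ID[1] looks for a child element named ID, while the data lives in the attribute Id). A rough illustration of the same distinction in Python's standard ElementTree, used here only as an analogy for the XQuery behaviour:

```python
import xml.etree.ElementTree as ET

xml_text = """<tags>
  <row Id="1" TagName=".net" Count="283778" ExcerptPostId="3624959" WikiPostId="3607476" />
  <row Id="2" TagName="html" Count="826083" ExcerptPostId="3673183" WikiPostId="3673182" />
</tags>"""

root = ET.fromstring(xml_text)

# Shape of the original attempt: one context node (<tags>) and a lookup of a
# child element <ID>. No such element exists, so nothing is found
# (SQL Server returns NULL in the same situation).
child_lookup = root.find("ID")

# Shape of the fix: one context node per <row>, values read from attributes.
rows = [(int(r.get("Id")), r.get("TagName")) for r in root.findall("row")]

print(child_lookup)
print(rows)
```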

Replace multiple XML attribute values in all records with Transact-SQL?

Problem
My MSSQL database has a table records with an XML column data, which is used like this:
<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-GB">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-GB">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>
I want to update all rows in the table at once, replacing /record/field/@lang with en-US wherever it is currently en-GB (all elements with that attribute value).
Already tried something like...
declare @i int;
declare @xml xml;
set @xml = (select top(1) [data] from [my-database].[dbo].[records]);
select @i = @xml.value('count(/record/field[lang="en-GB"])', 'int')
while @i > 0
begin
set @xml.modify('
replace value of
(/record/field[lang="en-GB"]/text())[1]
with "en-US"
')
set @i = @i - 1
end
select @xml;
... but it returns the data unchanged and only works if a single row is selected. How can I make this work and update all rows in one go?
Solution
I ended up using XQuery as suggested by Shnugo. My slightly generalized query looks like this:
UPDATE [my-database].[dbo].[records] SET data = data.query(N'
<record>
{
for $attr in /record/@*
return $attr
}
{
for $fld in /record/*
return
if (local-name($fld) = "field")
then <field>
{
for $attr in $fld/@*
return
if (local-name($attr) = "lang" and $attr = "en-GB")
then attribute lang {"en-US"}
else $attr
}
{$fld/node()}
</field>
else $fld
}
</record>
')
FROM [my-database].[dbo].[records]
WHERE [data].exist('/record/field[@lang="en-GB"]') = 1;
SELECT * FROM [my-database].[dbo].[records]
The name of the topmost node <record> apparently needs to be hard-coded, because SQL Server doesn't support dynamic element names (nor attribute names). Its attributes, as well as all child elements other than <field>, are copied automatically by the code above.
An ugly solution without xquery, xpath...:
DECLARE @xml XML = N'<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-GB">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-GB">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>'
SET @xml = REPLACE(CAST(@xml AS nvarchar(max)), '"en-GB"', '"en-US"')
SELECT @xml
And use modify()
DECLARE @nodeCount int
DECLARE @i int
SET @i = 1
SELECT @nodeCount = @xml.value('count(/record/field/@lang)','int')
WHILE (@i <= @nodeCount)
BEGIN
Set @xml.modify('replace value of (/record/field/@lang)[.="en-GB"][1] with "en-US"')
SET @i = @i + 1
END
SELECT @xml
Demo link: Rextester
I'm adding this as a second answer because it follows a completely different approach. The following code uses .query() with a FLWOR expression to read the XML as-is but change the attribute lang when its value is en-GB:
DECLARE @xml XML=
N'<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-GB">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-GB">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>';
The query
SELECT @xml.query
(N'
<record id="{/record/@id}">
{
for $fld in /record/field
return <field>
{
for $attr in $fld/@*
return
if(local-name($attr)="lang" and $attr="en-GB") then attribute lang {"en-US"}
else $attr
}
{$fld/text()}
</field>
}
</record>
')
The result
<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-US">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-US">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>
UPDATE: This works with all table's rows too:
Try this to update a full table at once:
DECLARE @tbl TABLE(ID INT IDENTITY,YourXml XML)
INSERT INTO @tbl VALUES
(
N'<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-GB">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-GB">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>'
)
,(
N'<record id="2">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-GB">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-GB">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>'
);
UPDATE @tbl SET YourXml=YourXml.query
(N'
<record id="{/record/@id}">
{
for $fld in /record/field
return <field>
{
for $attr in $fld/@*
return
if(local-name($attr)="lang" and $attr="en-GB") then attribute lang {"en-US"}
else $attr
}
{$fld/text()}
</field>
}
</record>
');
SELECT * FROM @tbl
I'd avoid the cast to a string type due to side effects (but this might be the easiest approach, especially if the XML might include other nodes, which you do not show in your example...)
I'd avoid loops too.
My approach was to shred and re-create the XML:
DECLARE @xml XML=
N'<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-GB">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-GB">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>';
--The query will read all field's values and rebuild the XML with the changed language
WITH Shredded AS
(
SELECT fld.value(N'@tag',N'nvarchar(max)') AS tag
,fld.value(N'@occ',N'int') AS occ
,fld.value(N'@lang',N'nvarchar(max)') AS lang
,fld.value(N'(./text())[1]',N'nvarchar(max)') AS content
FROM @xml.nodes(N'/record/field') AS A(fld)
)
SELECT @xml.value(N'(/record/@id)[1]',N'int') AS [@id]
,(
SELECT tag AS [@tag]
,occ AS [@occ]
,CASE WHEN lang='en-GB' THEN 'en-US' ELSE lang END AS [@lang]
,content AS [*]
FROM Shredded
FOR XML PATH('field'),TYPE
) AS [*]
FOR XML PATH(N'record')
The result
<record id="1">
<field tag="DI" occ="1" lang="de-DE">Höhe</field>
<field tag="DI" occ="1" lang="en-US">height</field>
<field tag="WA">173</field>
<field tag="EE">cm</field>
<field tag="DI" occ="2" lang="de-DE">Breite</field>
<field tag="DI" occ="2" lang="en-US">width</field>
<field tag="WA">55</field>
<field tag="EE">cm</field>
</record>
Yeah, unfortunately, the replace value of statement only updates one node at a time. So in your case, a quick and dirty replace would be the easiest to write (and, with luck, maybe even the fastest to run):
update t set [data] = cast(
replace(cast(t.[data] as nvarchar(max)), N' lang="en-GB"', N' lang="en-US"')
as xml)
from dbo.Records t
where t.[data].exist('/record/field[@lang="en-GB"]') = 1;
If your XML schema varies such that there is no guarantee the /record node will always be at the top level, you might want to modify the filter like this:
where t.[data].exist('//record/field[@lang="en-GB"]') = 1;
Another approach would be to use a FLWOR statement, but if the XML structure varies significantly and contains other unpredictable nodes, it becomes rather difficult not to lose anything accidentally, and covering every case will in turn lead to poorer performance. For this approach to be viable, your XML schema has to be very stable.
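The "side effects" of going through a string can be made concrete with a small Python sketch. The record below is hypothetical: its second field carries the search string inside its text, which is legal XML. A plain string replacement rewrites that text as well, whereas an attribute-aware rewrite leaves it alone. ElementTree is used here only as a stand-in to show the difference, not as a substitute for the T-SQL above.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the second field contains the search string in its text.
xml_text = (
    '<record id="1">'
    '<field tag="DI" occ="1" lang="en-GB">height</field>'
    '<field tag="NOTE">set lang="en-GB" for UK users</field>'
    '</record>'
)

# String replacement rewrites every match, including the one inside the text.
by_string = xml_text.replace(' lang="en-GB"', ' lang="en-US"')

# Attribute-aware rewrite: only lang attributes on <field> elements change.
root = ET.fromstring(xml_text)
for field in root.iter("field"):
    if field.get("lang") == "en-GB":
        field.set("lang", "en-US")

note_text = root.findall("field")[1].text

print(by_string)   # the NOTE text has been altered too
print(note_text)   # the NOTE text is untouched
```

Whether this matters depends on the data: if the text content can never contain the attribute string, the cast-and-replace shortcut is safe.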

How to call a stored procedure with parameter from dataimporthandler

Can anybody tell me how to call a stored procedure with a parameter in the db-config.xml file?
I have a stored procedure that takes a datetime parameter and returns the rows created or modified after that time.
When I try to call that stored procedure from DataImportHandler, it does not succeed.
My db-config.xml is like below
<dataConfig>
<dataSource type="JdbcDataSource" name="ds-1" driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://DataBaseURL;databaseName=DBtest" user="sa" password="MyPassword" readOnly="True" />
<document>
<entity dataSource="ds-1" name="JobSeekerDetails" query="exec [SP_GetAllJobSeekerDetails] '${dih.last_index_time}'"
deltaImportQuery="exec [SP_GetAllJobSeekerDetails] '${dih.last_index_time}'"
deltaQuery="exec [SP_GetAllJobSeekerDetails] '${dih.last_index_time}'">
<field column="FName" name="FName" />
<field column="LName" name="LName" />
<field column="Exp_year" name="Exp_year" />
<field column="Exp_Month" name="Exp_Month" />
<field column="INDUSTRY" name="INDUSTRY" />
<field column="DESIREDPOSITION" name="DESIREDPOSITION" />
<field column="ROLE_NAME" name="ROLE_NAME" />
<field column="PHONE_NUMBER" name="PHONE_NUMBER" />
<field column="COUNTRY_NAME" name="COUNTRY_NAME" />
<field column="STATE_NAME" name="STATE_NAME" />
<field column="CITY_NAME" name="CITY_NAME" />
<field column="POSTAL_CODE" name="POSTAL_CODE" />
</entity>
</document>
</dataConfig>
Please help me find out what I am doing wrong.
I have also tried without the exec keyword inside the query attribute.
Thanks in advance.

xml.value() method in SQL Server (Getting a value inside an XML query)

I have an XML Query like this:
<ChangeSet xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Change DateTime="2011-12-02T09:01:58.3615661-08:00" UserId="3123">
<Table ChangeType="Insert" Name="EVNT_LN_AFF">
<Keys>
<Key FieldName="DIR_CD" Value="NB" />
<Key FieldName="LN_ID" Value="A" />
<Key FieldName="EVNT_ID" Value="10T000289" />
</Keys>
<ChangedFields>
<Field FieldName="DIR_CD" Previous="" Current="NB" />
<Field FieldName="LN_ID" Previous="" Current="A" />
<Field FieldName="EVNT_ID" Previous="" Current="10T000289" />
<Field FieldName="UD_DTTM" Previous="" Current="12/2/2011 9:01:59 AM" />
<Field FieldName="UD_USER_ID" Previous="" Current="3123" />
</ChangedFields>
</Table>
(The query goes on)
Now I want to use a statement like this:
SELECT TOP 1000 [CHG_LOG_ID]
, [EVNT_ID]
, [DATA_XML_TXT]
, [UD_DTTM]
FROM [MY_PROJ].[dbo].[EVNT_CHG_LOG]
WHERE DATA_XML_TXT.value('(/ChangeSet/Change/Table/ChangedFields/UD_USER_ID)[0]','varchar(50)') like '%3123%'
But when I execute the query, I don't get any results.
I tested the following XQuery, and it should give you what you need:
SELECT TOP 1000 [CHG_LOG_ID]
, [EVNT_ID]
, [DATA_XML_TXT]
, [UD_DTTM]
FROM [MY_PROJ].[dbo].[EVNT_CHG_LOG]
WHERE DATA_XML_TXT.value('(/ChangeSet/Change/Table/ChangedFields/Field[@FieldName="UD_USER_ID"]/@Current)[1]','varchar(50)') like '%3123%'
Note: Indexing in XQuery starts at 1, not 0.
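The original query had two separate problems: it looked for an element named UD_USER_ID, whereas the value sits in the Current attribute of a Field element whose FieldName is UD_USER_ID, and it used position 0. A small Python sketch on a trimmed copy of the sample, using the standard ElementTree module (whose limited XPath support also counts positions from 1), shows both points:

```python
import xml.etree.ElementTree as ET

xml_text = """<ChangedFields>
  <Field FieldName="DIR_CD" Previous="" Current="NB" />
  <Field FieldName="UD_USER_ID" Previous="" Current="3123" />
</ChangedFields>"""
root = ET.fromstring(xml_text)

# Positions are 1-based: [1] is the first <Field>.
first = root.find("Field[1]")

# The user id is an attribute value on the Field selected by its FieldName.
user = root.find("Field[@FieldName='UD_USER_ID']")

# There is no element called UD_USER_ID, so this finds nothing.
missing = root.find("UD_USER_ID")

print(first.get("FieldName"))
print(user.get("Current"))
print(missing)
```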

Validating individual XML elements in SQL Server 2008R2

I'm writing a stored procedure to process XML data uploaded by the user:
<People>
<Person Id="1" FirstName="..." LastName="..." />
<Person Id="2" FirstName="..." LastName="..." />
<Person Id="3" FirstName="..." LastName="..." />
<Person Id="4" FirstName="..." LastName="..." />
<Person Id="5" FirstName="..." LastName="..." />
</People>
I would like to use a schema to make sure that the entities are valid, but I don't want the entire process to fail just because of one invalid entity. Instead, I would like to log all invalid entities to a table and process the valid entities as normal.
Is there a recommended way to do this?
A pure SQL approach would be:
Create a schema collection that defines <Person>:
CREATE XML SCHEMA COLLECTION [dbo].[testtest] AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Person">
<xs:complexType>
<xs:attribute name="Id" type="xs:int" use="required"/>
<xs:attribute name="FirstName" type="xs:string" use="required"/>
<xs:attribute name="LastName" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
'
(one-time operation)
Have an XML query that selects each <Person> node from <People> as a separate row.
Declare a cursor on that query and select each row into an untyped xml variable. After the select, try to assign to a typed xml variable from within a try-catch block.
Resulting code would look like:
declare @source xml = N'
<People>
<Person Id="1" FirstName="..." LastName="..." />
<Person Id="2" FirstName="..." LastName="..." />
<Person Id="f" FirstName="..." LastName="..." />
<Person Id="4" FirstName="..." LastName="..." />
<Person Id="5" FirstName="..." LastName="..." />
</People>';
declare foo cursor
local
forward_only
read_only
for
select t.p.query('.')
from @source.nodes('People/Person') as t(p)
;
declare @x xml (dbo.testtest);
declare @x_raw xml;
open foo;
fetch next from foo into @x_raw;
while @@fetch_status = 0
begin
begin try
set @x = @x_raw;
print cast(@x_raw as nvarchar(max)) + ': OK';
end try
begin catch
print cast(@x_raw as nvarchar(max)) + ': FAILED';
end catch;
fetch next from foo into @x_raw;
end;
close foo;
deallocate foo;
end;
close foo;
deallocate foo;
Result:
<Person Id="1" FirstName="..." LastName="..."/>: OK
<Person Id="2" FirstName="..." LastName="..."/>: OK
<Person Id="f" FirstName="..." LastName="..."/>: FAILED
<Person Id="4" FirstName="..." LastName="..."/>: OK
<Person Id="5" FirstName="..." LastName="..."/>: OK
A simpler option is to create a CLR stored procedure that would parse XML in a .NET language.
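The same validate-each-element-on-its-own pattern can also be sketched outside the database. The Python example below is only an illustration of the control flow: the function validate_person is a hypothetical, hand-written stand-in for the schema (required attributes, integer Id), not real XSD validation, and the lists stand in for the log table and the normal processing path.

```python
import xml.etree.ElementTree as ET

source = """<People>
  <Person Id="1" FirstName="..." LastName="..." />
  <Person Id="f" FirstName="..." LastName="..." />
  <Person Id="4" FirstName="..." LastName="..." />
</People>"""

REQUIRED = ("Id", "FirstName", "LastName")

def validate_person(person):
    """Hypothetical stand-in for the schema: required attributes, integer Id."""
    for name in REQUIRED:
        if name not in person.attrib:
            raise ValueError(f"missing attribute {name}")
    int(person.attrib["Id"])  # raises ValueError for a non-integer Id

valid, invalid = [], []
for person in ET.fromstring(source).findall("Person"):
    try:
        validate_person(person)
        valid.append(person.attrib["Id"])       # process as normal
    except ValueError as exc:
        invalid.append((person.attrib.get("Id"), str(exc)))  # log and continue

print(valid)
print(invalid)
```

One bad element ends up in the invalid list while the others are still processed, which mirrors what the try/catch inside the cursor loop achieves.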

Resources