I've been facing an issue for a few hours, and I can't seem to get my head around this one.
So I have a SQL Server 2008 R2 database, collation SQL_Latin1_General_CP1_CI_AS.
Inside there is a table with a field named incoming_name. The collation of this field is also SQL_Latin1_General_CP1_CI_AS, and it is an NVARCHAR(255).
I have a .csv file with around 123000 rows. It's a basic csv: no double quotes around text, but no commas inside the fields either, so when I run a manual import into my database it works fine. The incoming_name field contains all kinds of text, never longer than 255 characters, and a few lines contain French accents (like 'Château d'Agassac').
Now I try to use the code
select
    test_file.[INCOMING_NAME] COLLATE SQL_Latin1_General_CP1_CI_AS AS [INCOMING_NAME],
    test_file.[PRODUCT_CODE] AS [PRODUCT_CODE]
from OPENROWSET(
    BULK 'INSERT PATH OF THE .CSV HERE',
    FORMATFILE = 'INSERT PATH OF THE FORMAT FILE HERE',
    FIRSTROW = 2
) AS test_file
With the format file
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="255" COLLATION="SQL_LATIN1_GENERAL_CP1_CI_AS" />
<FIELD ID="29" xsi:type="CharTerm" TERMINATOR="\r\n" />
</RECORD>
<ROW>
<COLUMN SOURCE="4" NAME="INCOMING_NAME" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="29" NAME="PRODUCT_CODE" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
The import works fine, and I get all my data, with the right values in the right fields, except for the accents...
For example, when I add where test_file.incoming_name like '%agassac%' at the end of my query, I get a result like 'ChÃ¢teau d'Agassac' instead of the original data 'Château d'Agassac' in my database.
What I don't understand is that at every step of the process I feel like I picked an accent-sensitive collation with a Unicode datatype (NVARCHAR), so I really don't understand why the import doesn't pick up the accents.
Thanks for reading this long question,
John.
EDIT: Ok, it looks like the .csv file I want to import is encoded with utf-8, and SQL Server 2008 doesn't want to support utf-8 import. Now I have no idea what to do. Any idea welcome...
I think adding widenative as DATAFILETYPE should resolve the issue. Please refer to this link for further details: http://msdn.microsoft.com/en-us/library/ms189941.aspx
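Another workaround that is often suggested for pre-2016 versions is to re-save the file as UTF-16 (most text editors can do this) and load it with DATAFILETYPE = 'widechar'. A minimal sketch, not the asker's exact setup: the table name and path are placeholders, and it assumes the .csv has been re-saved as UTF-16 LE:

```sql
-- Sketch only: assumes test.csv was re-saved as UTF-16 LE and that
-- dbo.incoming_data matches the file's columns.
BULK INSERT dbo.incoming_data
FROM 'C:\import\test.csv'
WITH (
    DATAFILETYPE = 'widechar',  -- read the file as Unicode (UTF-16)
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
);
```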
After much trying, I have found that since SQL Server 2017 (2016?), loading UTF-8 encoded CSV files through BULK INSERT has become possible by using the options CODEPAGE = 65001 and DATAFILETYPE = 'Char', as explained in some other questions.
What doesn't seem to work is doing the same when using an XML format file. I have tried this while still using the CODEPAGE and DATAFILETYPE options, and also with these options omitted. And I have tried it with the most simple dataset: one row, one column, containing some text with a UTF-8 character.
This is the XML Formatfile I am using.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="STREET" xsi:type="NCharTerm" TERMINATOR="\r\n" MAX_LENGTH="1000" COLLATION="Latin1_General_CS_AS_WS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="STREET" NAME="STREET" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
Even though the source data only contains some text with 1 special character, the end result looks like this: 慊潫ⵢ瑓晥慦瑓慲鿃㐱
When using xsi:type="CharTerm" instead of xsi:type="NCharTerm", the result looks like this: ...-StraÃŸe ...
Am I doing something wrong, or has UTF-8 support not been properly implemented for XML format files?
After playing around with this, I have found the solution.
Notes
This works with or without BOM header. It is irrelevant.
The culprit was using the COLLATION parameter in the XML file. Omitting it solved the encoding problem. I have an intuitive sense as to why this is the case, but maybe someone with more insight could explain in the comments...
The DATAFILETYPE = 'char' option doesn't seem necessary.
In the XML file, the xsi:type for the field needs to be CharTerm, not NCharTerm.
This works with \r\n, \n, or \r. As long as you set the TERMINATOR correctly, this works. No \n\0 variations required (this would even break functionality, since this is not UTF-16 or UCS-2).
Below you can find a proof-of-concept for easy reuse...
data.txt
ß
ß
ß
Table
CREATE TABLE [dbo].[TEST](
TEST [nvarchar](500) NULL
)
formatfile.xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="20"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="TEST" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
Bulk insert
bulk insert TEST..TEST
from 'data.txt'
with (formatfile = 'formatfile.xml', CODEPAGE = 65001)
Change your terminator to TERMINATOR="\r\0\n\0". You have to account for the extra bytes when using NCharTerm.
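Applied to the format file from the question, that suggestion would look like the fragment below. Note this only makes sense if the data file really is UTF-16; it reuses the STREET field from above, with the \0 bytes accounting for NCharTerm's two-byte characters:

```xml
<FIELD ID="STREET" xsi:type="NCharTerm" TERMINATOR="\r\0\n\0" MAX_LENGTH="1000"/>
```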
I'm trying to create a simple Bulk Insert command to import a fixed width text file into a table. Once I have this working I'll then expand on it to get my more complex import working.
I'm currently receiving the error...
Msg 4866, Level 16, State 7, Line 1
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
Obviously I have checked the terminator in the file. For test data I just typed a 3 line text file in Notepad. At this stage I'm just trying to import one column per line. I have padded the first two lines so each one is 18 characters long.
Test.txt
This is line one
This is line two
This is line three
When I view the file in Notepad++ and turn on all characters, I see CRLF at the end of each line and no blank lines at the end of the file.
This is the SQL I'm using:
USE [Strata]
GO
drop table VJR_Bulk_Staging
Create Table [dbo].[VJR_Bulk_Staging](
[rowid] int Identity(1,1) Primary Key,
[raw] [varchar](18) not null)
GO
Bulk Insert [VJR_Bulk_Staging]
From 'c:\temp\aba\test.txt'
with (FormatFile='c:\temp\aba\test2.xml')
Here is the format XML file. I have tried several variations. This one was created using the BCP command.
bcp strata.dbo.vjr_bulk_staging format nul -f test2.xml -x -n -T -S Server\Instance
This created a RECORD and a ROW entry for my rowid column, which I thought was a problem since that is an identity field, so I removed it.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="18" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="raw" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
I'm testing on SQL Server 2008 R2 Express.
Any ideas where I'm going wrong?
I think the problem is with your prefix being 2 bytes long:
xsi:type="CharPrefix" PREFIX_LENGTH="2"
From what you have posted you don't have a prefix in your data file. Set the PREFIX_LENGTH to 0 in your format file, or provide the proper prefix in your data file.
You can find more information about prefix datatypes and what the prefix is about in the documentation: Specify Prefix Length in Data Files by Using bcp (SQL Server).
I think what you really wanted is type CharTerm with a proper TERMINATOR (\r\n in your case).
This works.
Option 1: Non-XML Format File
9.0
1
1 SQLCHAR 0 18 "\r\n" 2 raw SQL_Latin1_General_CP1_CI_AS
or simply
9.0
1
1 SQLCHAR "" "" "\r\n" 2 "" ""
Option 2: XML Format File
Ugly as hell work-around
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="2" xsi:type="SQLINT"/>
<COLUMN SOURCE="1" xsi:type="SQLCHAR"/>
</ROW>
</BCPFORMAT>
P.S.
It seems to me that there is a bug in the design of the XML format file.
Unlike the non-XML format file, there is no option to indicate the position of the loaded column (the names are just for the clarity of the scripts; they have no real meaning).
The XML example in the documentation, Use a Format File to Skip a Table Column (SQL Server), does not work.
Could you please try the following command and check whether the bulk insert happens? Please note I have added the last line, specifying the delimiters.
USE [Strata]
GO
drop table VJR_Bulk_Staging
Create Table [dbo].[VJR_Bulk_Staging](
[rowid] int Identity(1,1) Primary Key,
[raw] [varchar](18) not null)
GO
Bulk Insert [VJR_Bulk_Staging]
From 'c:\temp\aba\test.txt'
WITH ( FIELDTERMINATOR ='\t', ROWTERMINATOR ='\n',FIRSTROW=1 )
Trying to bulk insert lots of rows into a table.
My SQL statement:
INSERT INTO [NCAATreasureHunt-dev].dbo.CatalinaCodes(Code)
SELECT (Code)
FROM OPENROWSET(BULK 'C:\Users\Administrator\Desktop\NCAATreasureHunt\10RDM.TXT',
FORMATFILE='C:\Users\Administrator\Desktop\NCAATreasureHunt\formatfile.xml') as t1;
10RDM.TXT:
DJKF61TGN7
Q9TVM16Z6Z
X44T4169FN
JQ2PT1ZXZK
C7NW71QPNG
SFJRR1FWKZ
TYZJW1ZPFY
9MR3M1J3N5
QJ6R217JTK
TVJVW19TYT
formatfile.xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="C1" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="C1" NAME="Code" xsi:type="SQLNVARCHAR" />
</ROW>
</BCPFORMAT>
This is the error I'm getting:
Cannot insert the value NULL into column 'Claimed', column does not allow nulls. INSERT fails.
I'm trying to skip the Claimed column. What am I doing wrong in my format file?
See if this answer helps.
With an XML format file, you cannot skip a column when you are
importing directly into a table by using a bcp command or a BULK
INSERT statement. However, you can import into all but the last column
of a table. If you have to skip any but the last column, you must
create a view of the target table that contains only the columns
contained in the data file. Then, you can bulk import data from that
file into the view.
To use an XML format file to skip a table column by using
OPENROWSET(BULK...), you have to provide explicit list of columns in
the select list and also in the target table, as follows:
INSERT ... SELECT FROM OPENROWSET(BULK...)
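Note that listing columns explicitly only helps if the skipped column accepts NULL or has a default. Since Claimed is NOT NULL here, one option is to supply a value for it in the SELECT. A sketch, assuming (this is an assumption, not from the question) that 0 is an acceptable default for an unclaimed code:

```sql
-- Sketch: provide an explicit value for the NOT NULL Claimed column,
-- since the data file does not contain one.
INSERT INTO [NCAATreasureHunt-dev].dbo.CatalinaCodes (Code, Claimed)
SELECT t1.Code, 0
FROM OPENROWSET(
    BULK 'C:\Users\Administrator\Desktop\NCAATreasureHunt\10RDM.TXT',
    FORMATFILE = 'C:\Users\Administrator\Desktop\NCAATreasureHunt\formatfile.xml'
) AS t1;
```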
My table, named 'dictionary', has two columns named 'column1' and 'column2'. Both accept NULL values. The data type of both columns is INT. Now I want to insert into only column2 from a text file using bcp. I made a format file, which looks like this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="7"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="24"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column2" xsi:type="SQLINT"/>
</ROW>
</BCPFORMAT>
and my bulk insert statement is:
BULK INSERT dictionary
FROM 'C:\Users\jka\Desktop\n.txt'
WITH
(
FIELDTERMINATOR = '\n',
ROWTERMINATOR = '\n',
FORMATFILE = 'path to my format file.xml'
)
But it didn't work. How can I solve this?
N.B.
My txt file looks like
123
456
4101
......
One more question (edited):
I can fill one column with this technique, but how can I fill another column from a text file in the same way, starting from the 1st row?
Assuming that your format file is correct I believe you need to ditch FIELDTERMINATOR and ROWTERMINATOR from your BULK INSERT
BULK INSERT dictionary
FROM 'C:\Users\jka\Desktop\n.txt'
WITH (FORMATFILE = 'path to my format file.xml')
Also make sure that:
input file's encoding is correct. In your case most likely it should be ANSI and not UTF-8 or Unicode.
row terminator (which is second field terminator in your format file) is actually \r\n and not \n.
UPDATE Since you need to skip first column:
With an XML format file, there is no way to skip a column when you are importing directly into a table by using a BULK INSERT statement. In order to achieve desired result and still use XML format file you need to use OPENROWSET(BULK...) and provide explicit list of columns in the select list and in the target table.
So to insert data only to column2 use:
INSERT INTO dictionary(column2)
SELECT column2
FROM OPENROWSET(BULK 'C:\temp\infile1.txt',
FORMATFILE='C:\temp\bulkfmt.xml') as t1;
If your data file has only one field your format file can look like this
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="C1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="24"/>
</RECORD>
<ROW>
<COLUMN SOURCE="C1" NAME="column2" xsi:type="SQLINT"/>
</ROW>
</BCPFORMAT>
Your data file contains one field, so your format file should reflect that
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
Good afternoon to all, I have this scenario:
I am using SQL Server 'BulkInsert' command to insert data in a table from a positional (.txt) file.
I use a .xml file to define the structure of the file: the position (and length) of the fields, and their names.
These are 2 sample rows of the .txt positional file:
AAA111111Surname 1 Name 1
BBB222222Surname 23 Name 99
My .xml format file is defined as below:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="3" />
<FIELD ID="2" xsi:type="CharFixed" LENGTH="6" />
<FIELD ID="3" xsi:type="CharFixed" LENGTH="20" />
<FIELD ID="4" xsi:type="CharFixed" LENGTH="20" />
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" />
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="AlfaCode" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="2" NAME="NumericCode" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="Surname" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="Name" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
My SQL Server code is:
DELETE from MY_TABLE
BULK INSERT MY_TABLE FROM 'C:\Directory\InputFile.txt'
WITH (
FORMATFILE = 'C:\Directory\FormatFile.xml'
)
But when I run in SQL Server the sp, I have the following error:
Msg 4866, Level 16, State 1, Line 3
The bulk load failed. The column is too long in the data file for row 1, column 58. Verify that the field terminator and row terminator are specified correctly.
Msg 7399, Level 16, State 1, Line 3
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 3
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
This has always run perfectly until 2 months ago, when some wrong data were introduced into the file and the procedure failed. Now the data in InputFile.txt are correct again, but the procedure still doesn't work.
I checked InputFile.txt, FormatFile.xml and, to be sure, also MY_TABLE more than once, but everything seems perfect.
I am desperate because everything seems OK; I also compared against the old .xml files, which differ only by a few added fields.
Sorry if my English is bad.
Don't hesitate to ask me for more information.
Thanks to all
I think it's most likely that your input file is still wrong (even though you think you fixed it).
In your example, you have 50 characters on line 1 before the line break.
Your XML says you should only have 49 chars! (3+6+20+20)
Your second line in the example only has 39 characters before the linebreak.
It's probably also worth opening your txt file in a text editor that will show you the line breaks. For instance, in notepad++, go to View -> Show Symbol -> Show All Characters. Then you can see the CR and LF characters, to verify they are there.