SQL Server - BULK INSERT from positional .txt using .xml format file - error

Good afternoon to all, I have this scenario:
I am using the SQL Server BULK INSERT command to insert data into a table from a positional (.txt) file.
To define the structure of the file, I use a .xml format file that defines the position (and length) of the fields and their names.
These are 2 sample rows of the .txt positional file:
AAA111111Surname 1 Name 1
BBB222222Surname 23 Name 99
My .xml format file is defined as below:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="3" />
<FIELD ID="2" xsi:type="CharFixed" LENGTH="6" />
<FIELD ID="3" xsi:type="CharFixed" LENGTH="20" />
<FIELD ID="4" xsi:type="CharFixed" LENGTH="20" />
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" />
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="AlfaCode" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="2" NAME="NumericCode" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="Surname" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="Name" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
My SQL Server code is:
DELETE from MY_TABLE
BULK INSERT MY_TABLE FROM 'C:\Directory\InputFile.txt'
WITH (
FORMATFILE = 'C:\Directory\FormatFile.xml'
)
But when I run the stored procedure in SQL Server, I get the following error:
Msg 4866, Level 16, State 1, Line 3
The bulk load failed. The column is too long in the data file for row 1, column 58. Verify that the field terminator and row terminator are specified correctly.
Msg 7399, Level 16, State 1, Line 3
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 3
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
This had always run perfectly until 2 months ago, when some wrong data were introduced into the file and the procedure failed. Now the data in InputFile.txt are correct again, but the procedure doesn't work.
I checked InputFile.txt, FormatFile.xml and, to be sure, MY_TABLE more than once, but everything seems fine.
I am desperate because everything seems OK; I also compared against the old .xml files, which were only changed by adding some fields.
Please answer ASAP, and sorry if my English is bad.
Don't hesitate to ask me for any other information.
Thanks to all

I think it's most likely that your input file is still wrong (even though you think you fixed it).
In your example, you have 50 characters on line 1 before the line break.
Your XML says you should only have 49 chars! (3+6+20+20)
Your second line in the example only has 39 characters before the linebreak.
It's probably also worth opening your .txt file in a text editor that will show you the line breaks. For instance, in Notepad++, go to View -> Show Symbol -> Show All Characters. Then you can see the CR and LF characters and verify they are there.
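If you want to check the line lengths from SQL Server itself, a minimal sketch like the following can help (the staging table is hypothetical and the path is taken from your example; adjust as needed):
-- Load each raw line into a single wide column, then report the distinct line lengths.
CREATE TABLE #RawLines (RawLine varchar(max) NULL);
BULK INSERT #RawLines
FROM 'C:\Directory\InputFile.txt'
WITH (ROWTERMINATOR = '\r\n');
-- DATALENGTH is used instead of LEN because LEN ignores trailing spaces,
-- which are significant in a fixed-width layout. Every row should report 49.
SELECT DATALENGTH(RawLine) AS LineLength, COUNT(*) AS RowsWithThatLength
FROM #RawLines
GROUP BY DATALENGTH(RawLine)
ORDER BY LineLength;
Any line that does not report the expected length is one that will break the fixed-width format file.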

Related

Problem with SQL bulk insert tab delimited file

I am having a problem using bulk insert. The issue is that the source files (tab delimited) I'm dealing with contain rows that end in CR/LF without the remaining empty columns being filled in with tabs. So when the data is pulled into SQL Server, those shortened lines get combined with the previous line; basically multiple rows are merged into one rather than being written as two separate rows with NULLs at the end of the first row.
Example to illustrate the problem: sample .txt file
column1 column2 column3 column4 column5
1 2 3 4 5
2 5 4 6
4 4 6 4
4 5 6 4 6
SQL to create table and bulk insert
CREATE TABLE test (
[column1] varchar(MAX) NULL,
[column2] varchar(MAX) NULL,
[column3] varchar(MAX) NULL,
[column4] varchar(MAX) NULL,
[column5] varchar(MAX) NULL
)
BULK INSERT test
FROM 'c:\temp\testimport.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r'
);
The really strange thing is that I can use the data import wizard and it imports the data perfectly, without any issue, and handles the lack of tabs for the columns just fine. But I don't know what the wizard is doing behind the scenes to make this happen. I would love to have the code it uses to create the table and do the insert, as that would probably answer my question for me. At the end of the day I can't use the wizard, as this will eventually be part of an automated task I'll be running against a SQL Server Express database on multiple files with different names but the same column headers.
Maybe bulk insert isn't the way to go here? Or there is something obvious I'm missing that someone else might know off the top of their head. Either way, all help is appreciated and thanks in advance.
As Tim H suggested I've made a few attempts at creating a format file to accommodate the data. Results so far are as follows.
Using
bcp temp.dbo.test format nul -x -f test_format.xml -n -T
produces
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Using this generated format file as-is produces:
Msg 4866, Level 16, State 7, Line 31
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
My attempt to edit the XML to make it work:
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
This does insert the data, but unfortunately it still produces the same jumbled result, with multiple source lines combined into the same row.
Do you have control over the source files? If not, is each column a fixed width or a variable width? I know your CREATE TABLE example uses varchar(MAX) columns. The bulk insert feature in SQL Server allows you to use a format file that better defines how the expected input is formatted, column by column, including whether a column is nullable. Microsoft's documentation for bulk inserts is actually pretty helpful (https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15), especially the link at the end of the page to format files.
This page (https://learn.microsoft.com/en-us/sql/relational-databases/import-export/keep-nulls-or-use-default-values-during-bulk-import-sql-server?view=sql-server-ver15) directly deals with null values, which would be your predicament.
A better answer would be to add the following to your BULK INSERT...WITH statement: KEEPNULLS. It does what you would expect: empty columns keep their NULL values instead of having the columns' default values inserted, which is what bulk insert does by default.
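For example, a minimal sketch based on the BULK INSERT from the question, with KEEPNULLS added:
BULK INSERT test
FROM 'c:\temp\testimport.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r',
KEEPNULLS -- empty fields stay NULL instead of getting column defaults
);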
I never found a direct solution for this in SQL Express. I ended up going with PowerShell scripting to solve the problem. Import-CSV pulled the data from the files uniformly and without issue. I'm not sure why, but it handled the data far better than SQL did. From there I used variables for each line, Invoke-SqlCmd, and some SQL scripting to import them into the DB. It worked like a charm. Since this process all runs on the local server there aren't any security issues to worry about, so it was an acceptable solution. Thanks again for all the suggestions and help though.

How to load UTF-8 CSV files using Bulk Insert and an XML Format file in SQL Server 2017

After much trying, I have found that since SQL Server 2017 (2016?), loading UTF-8 encoded CSV files through BULK INSERT has become possible by using the options CODEPAGE = 65001 and DATAFILETYPE = 'char', as explained in some other questions.
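For reference, a minimal sketch of such a plain (no format file) UTF-8 load; the table name, file path and terminators below are only placeholders:
BULK INSERT dbo.SomeTable
FROM 'C:\data\utf8data.csv'
WITH
(
DATAFILETYPE = 'char',
CODEPAGE = 65001, -- tells SQL Server the data file is UTF-8
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\r\n'
);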
What doesn't seem to work is doing the same when using an XML format file. I have tried this while still using the CODEPAGE and DATAFILETYPE options, and also with these options omitted. And I have tried it with the simplest possible dataset: one row, one column, containing some text with a UTF-8 character.
This is the XML format file I am using:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="STREET" xsi:type="NCharTerm" TERMINATOR="\r\n" MAX_LENGTH="1000" COLLATION="Latin1_General_CS_AS_WS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="STREET" NAME="STREET" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
Even though the source data only contains some text with 1 special character, the end result looks like this: 慊潫ⵢ瑓晥慦⵮瑓慲鿃⁳㐱
When using xsi:type="CharTerm" instead of xsi:type="NCharTerm" the result looks like this: ...-Straßs ...
Am I doing something wrong, or has UTF-8 support not been properly implemented for XML format files?
After playing around with this, I have found the solution.
Notes
This works with or without BOM header. It is irrelevant.
The culprit was using the COLLATION parameter in the XML file. Omitting it solved the encoding problem. I have an intuitive sense as to why this is the case, but maybe someone with more insight could explain in the comments...
The DATAFILETYPE = 'char' option doesn't seem necessary.
In the XML file, the xsi:type for the field needs to be CharTerm, not NCharTerm.
This works with \r\n, \n, or \r. As long as you set the TERMINATOR correctly, this works. No \n\0 variations required (this would even break functionality, since this is not UTF-16 or UCS-2).
Below you can find a proof-of-concept for easy reuse...
data.txt
ß
ß
ß
Table
CREATE TABLE [dbo].[TEST](
TEST [nvarchar](500) NULL
)
formatfile.xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="20"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="TEST" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
Bulk insert
bulk insert TEST..TEST
from 'data.txt'
with (formatfile = 'formatfile.xml', CODEPAGE = 65001)
Change your terminator to TERMINATOR="\r\0\n\0". You have to account for the extra bytes when using NCharTerm.

Simple SQL Bulk Insert not working

I'm trying to create a simple Bulk Insert command to import a fixed width text file into a table. Once I have this working I'll then expand on it to get my more complex import working.
I'm currently receiving the error...
Msg 4866, Level 16, State 7, Line 1
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
Obviously I have checked the terminator in the file. For test data I just typed a 3 line text file in Notepad. At this stage I'm just trying to import one column per line. I have padded the first two lines so each one is 18 characters long.
Test.txt
This is line one
This is line two
This is line three
When I view the file in Notepad++ and turn on all characters, I see CRLF at the end of each line and no blank lines at the end of the file.
This is the SQL I'm using:
USE [Strata]
GO
drop table VJR_Bulk_Staging
Create Table [dbo].[VJR_Bulk_Staging](
[rowid] int Identity(1,1) Primary Key,
[raw] [varchar](18) not null)
GO
Bulk Insert [VJR_Bulk_Staging]
From 'c:\temp\aba\test.txt'
with (FormatFile='c:\temp\aba\test2.xml')
Here is the format XML file. I have tried several variations. This one was created using the BCP command.
bcp strata.dbo.vjr_bulk_staging format nul -f test2.xml -x -n -T -S Server\Instance
This created a record and a row entry for my rowid column, which I thought was a problem since that is an identity field, so I removed it.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="18" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="raw" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
I'm testing on SQL Server 2008 R2 Express.
Any ideas where I'm going wrong?
I think the problem is with your prefix being 2 bytes long:
xsi:type="CharPrefix" PREFIX_LENGTH="2"
From what you have posted you don't have a prefix in your data file. Set the PREFIX_LENGTH to 0 in your format file, or provide the proper prefix in your data file.
You can find more information about prefix datatypes and what the prefix is about in the documentation: Specify Prefix Length in Data Files by Using bcp (SQL Server).
I think what you really wanted is type CharTerm with a proper TERMINATOR (\r\n in your case).
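As a side note, if you regenerate the format file with bcp in character mode (-c) instead of native mode (-n), it should produce CharTerm fields with \t and \r\n terminators as a starting point, for example:
bcp strata.dbo.vjr_bulk_staging format nul -f test2.xml -x -c -T -S Server\Instance
You would still need to remove (or adjust) the entry generated for the rowid identity column, as you did before.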
This works.
Option 1: Non-XML Format File
9.0
1
1 SQLCHAR 0 18 "\r\n" 2 raw SQL_Latin1_General_CP1_CI_AS
or simply
9.0
1
1 SQLCHAR "" "" "\r\n" 2 "" ""
Option 2: XML Format File
Ugly as hell work-around
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="2" xsi:type="SQLINT"/>
<COLUMN SOURCE="1" xsi:type="SQLCHAR"/>
</ROW>
</BCPFORMAT>
P.S.
It seems to me that there is a bug in the design of the XML format file.
Unlike the non-XML format file, there is no option to indicate the position of the loaded column (and the names are just for the clarity of the scripts; they have no real meaning).
The XML example in the documentation does not work
Use a Format File to Skip a Table Column (SQL Server)
Could you please try the following command and check whether the BULK INSERT works? Please note I have added the last line specifying the delimiters.
USE [Strata]
GO
drop table VJR_Bulk_Staging
Create Table [dbo].[VJR_Bulk_Staging](
[rowid] int Identity(1,1) Primary Key,
[raw] [varchar](18) not null)
GO
Bulk Insert [VJR_Bulk_Staging]
From 'c:\temp\aba\test.txt'
WITH ( FIELDTERMINATOR ='\t', ROWTERMINATOR ='\n',FIRSTROW=1 )

Bulk Insert with Format File (Fixed Width) - Unexpected end of file was encountered

BULK INSERT [Alldlyinventory]
FROM 'C:\Users\Admin\Documents\2NobleEstates\DATA\Download\Output\test.txt'
WITH (FORMATFILE = 'C:\SQL Data\FormatFiles\test.xml');
Format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="8" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="7" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="4" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="DAY_NUMBER" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="2" NAME="LCBO_NO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="LOCATION_NUMBER" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="LISTING_STATUS" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="QTY_ON_HAND" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
But I am getting the following error on SQL Server 2014:
Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
It's a fixed-width import.
Sample txt:
2016032803170570371L 000000014
2016032803367430371L 000000013
2016032803403800371L 000000036
2016032804007540371L 000000015
Looking at your sample text file, it looks like you have a row terminator that is carriage return ({CR}) + linefeed ({LF}).
You can inspect this by opening the text file with a text editor that can show special symbols. I can recommend Notepad++ which is free and good for this purpose (Menu View>Show Symbol>Show All Characters).
If the row terminator is indeed {CR}{LF}, you should use xsi:type="CharTerm" along with a TERMINATOR="\r\n" attribute for the last <FIELD> in the <RECORD> element:
<RECORD>
...
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
You can find more information on fixed field import in the following link: XML Format Files (SQL Server) # Importing fixed-length or fixed-width fields

MS SQL Server, Bulk Insert failing insert file in UTF-16 BE

I have a problem with BULK INSERT on MS SQL Server 2012. The input file is saved in UTF-16 BE.
BULK INSERT Positions
FROM 'C:\DEV\Test\seq.filename.csv'
WITH
(
DATAFILETYPE = 'widechar',
FORMATFILE = 'C:\DEV\Test\Format.xml'
);
Format file:
<?xml version="1.0" encoding="utf-16"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="ActionCode" xsi:type="NCharFixed" LENGTH="4" />
<FIELD ID="T1" xsi:type="NCharFixed" LENGTH="2" />
<FIELD ID="ReofferCounter" xsi:type="NCharFixed" LENGTH="6" />
<FIELD ID="T1" xsi:type="NCharFixed" LENGTH="2" />
... other fields....
</RECORD>
<ROW>
<COLUMN SOURCE="ActionCode" NAME="DT" xsi:type="SQLNCHAR" LENGTH="255" />
<COLUMN SOURCE="ReofferCounter" NAME="NO" xsi:type="SQLNCHAR" LENGTH="255" />
</ROW>
</BCPFORMAT>
Input file sample:
02|+00|... other cols....
02|+00|... other cols....
I have two problems:
1) If the input file has UTF-16 BE encoding, I get only Chinese characters instead of numbers.
2) If I convert the input file to UTF-16 LE, I see the correct characters, but the data are shifted one character to the left - as if the BOM was parsed (and counted as 1 character) but not transformed to the output (which I do not want).
Questions:
1) Is there a way to import a file in UTF-16 BE without converting it to LE?
2) What causes the shift, and how can I avoid it?
