Unicode flat file import to SQL Server

I am trying to bulk import data into MS SQL Server 2016, but because of characters that take 2 bytes (like Ü, Ä, etc.), I am facing a problem: fields wrapping into the next field.
The source is a fixed-length, Unicode (UTF-8) text file with special (multi-byte) characters.
This is a sample part of the file:
ABS525 0128211024200
ABS526 0128211024200
ABS527 0128211024200
ABS528 0128211024200
ABS529 0128211024200
Ölrücklaufleitung 0128211037390
Ölzu- und Ölrücklaufle0128211037390
Ölzulaufleitung 0128211037390
The field lengths are 22, 4, 3, 5 and 1.
I have tried every approach:
- the Import Wizard in Management Studio,
- SSDT import,
- bulk import,
- OPENROWSET,
- the bcp command line.
Nothing worked. More precisely, they all work unless the row contains a special character.
This is my bulk insert code:
BULK INSERT [tecdoc2].[dbo].[211]
FROM 'C:\Users\Administrator\Desktop\D_TAF24\211yeni.0128'
WITH (MAXERRORS=50, CODEPAGE = '65001', DATAFILETYPE = 'widechar', FORMATFILE = 'C:\Users\Administrator\Desktop\BCP_Formats\a211.xml')
This is my format file (I have tried many combinations here):
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="22" />
<FIELD ID="2" xsi:type="CharFixed" LENGTH="4" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="3" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="5" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="ArtNr" xsi:type="SQLNVARCHAR" LENGTH="22" />
<COLUMN SOURCE="2" NAME="DLNr" xsi:type="SQLNCHAR" />
<COLUMN SOURCE="3" NAME="SA" xsi:type="SQLNCHAR" />
<COLUMN SOURCE="4" NAME="GenArtNr" xsi:type="SQLNCHAR" />
<COLUMN SOURCE="5" NAME="Losch-Flag" xsi:type="SQLNCHAR" />
</ROW>
</BCPFORMAT>
All columns in the SQL table are nvarchar with the specified lengths (I have also tried many variations here: double the specified lengths, MAX, etc.).
Do you have any advice? I would appreciate it.
With Kind Regards,
Murat

I have exactly the same problem, but with OPENROWSET. If the file is delimited, it works fine.
The only workaround I have found is to import the whole row into a single, sufficiently large nvarchar column and parse it in the database. That works fine, but it is a royal pain.
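If the bulk loader measures CharFixed LENGTH in bytes rather than characters (which would match the symptom that only rows containing Ü, Ö, etc. break), every 2-byte UTF-8 character pushes the rest of the row one position to the right, while a delimited file is unaffected because field boundaries are found by searching for the delimiter. The Python sketch below is an illustration outside SQL Server, not SQL Server's own logic: it cuts one sample row from the question at character positions and at UTF-8 byte positions. `split_by_chars` and `split_by_bytes` are hypothetical helpers.

```python
# Field widths from the question, intended as character counts.
WIDTHS = [22, 4, 3, 5, 1]


def split_by_chars(line, widths):
    """Cut a fixed-width record at character positions."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


def split_by_bytes(line, widths, encoding="utf-8"):
    """Cut the same record at byte positions, as a byte-counting loader would."""
    raw = line.encode(encoding)
    fields, pos = [], 0
    for width in widths:
        # errors="replace" keeps the demo running if a cut falls inside a multi-byte character
        fields.append(raw[pos:pos + width].decode(encoding, errors="replace"))
        pos += width
    return fields


row = "Ölzu- und Ölrücklaufle0128211037390"  # sample row from the question
print(len(row), len(row.encode("utf-8")))  # 35 38
print(split_by_chars(row, WIDTHS))
print(split_by_bytes(row, WIDTHS))
```

Cut by characters, the row gives "Ölzu- und Ölrücklaufle", "0128", "211", "03739" and "0". Cut by bytes, the first 22 bytes hold only "Ölzu- und Ölrücklau", the leftover "fle" spills into the second field, and every later field shifts, which is the wrapping described in the question. Importing the whole line and splitting it afterwards with character-based SUBSTRING sidesteps this.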
If you change your format file to be:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" />
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="RowData" xsi:type="SQLNVARCHAR" LENGTH="35"/>
</ROW>
</BCPFORMAT>
Then your import query can be:
INSERT INTO [tecdoc2].[dbo].[211]
(
ArtNr
,DLNr
,SA
,GenArtNr
,[Losch-Flag]
)
SELECT SUBSTRING(src.RowData, 1, 22) AS ArtNr -- T-SQL positions are 1-based
,SUBSTRING(src.RowData, 23, 4) AS DLNr
,SUBSTRING(src.RowData, 27, 3) AS SA
,SUBSTRING(src.RowData, 30, 5) AS GenArtNr
,SUBSTRING(src.RowData, 35, 1) AS [Losch-Flag]
FROM OPENROWSET ( BULK 'C:\Users\Administrator\Desktop\D_TAF24\211yeni.0128'
,FORMATFILE = 'C:\Users\Administrator\Desktop\BCP_Formats\a211.xml'
,CODEPAGE = '65001' -- UTF-8
,FIRSTROW = 1
) AS src
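In T-SQL, SUBSTRING positions are 1-based: when start is less than 1, the positions before 1 still count toward length, so SUBSTRING(x, 0, 22) returns only 21 characters and the first field has to start at 1. The Python function below is a hypothetical re-implementation of that documented rule (for non-negative lengths), used to check the (start, length) pairs for the 22-4-3-5-1 layout against a sample row. Because RowData is nvarchar at this point, the positions count characters, not bytes.

```python
def tsql_substring(s, start, length):
    """Mimic T-SQL SUBSTRING(expression, start, length) for length >= 0.

    Positions are 1-based. When start < 1, the positions before 1 still
    count toward length, so only start + length - 1 characters come back.
    """
    if length < 0:
        raise ValueError("T-SQL rejects a negative length")
    if start < 1:
        return s[:max(start + length - 1, 0)]
    return s[start - 1:start - 1 + length]


# (start, length) per column for the 22-4-3-5-1 layout, 1-based as in T-SQL.
LAYOUT = {
    "ArtNr": (1, 22),
    "DLNr": (23, 4),
    "SA": (27, 3),
    "GenArtNr": (30, 5),
    "Losch-Flag": (35, 1),
}

row = "Ölzu- und Ölrücklaufle0128211037390"  # RowData after an nvarchar import
print(tsql_substring(row, 0, 22))  # Ölzu- und Ölrücklaufl (21 characters)
print({col: tsql_substring(row, start, n) for col, (start, n) in LAYOUT.items()})
```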

Related

Problem with SQL bulk insert tab delimited file

I am having a problem using BULK INSERT. The source files I'm dealing with are tab-delimited and contain rows that end in CR/LF without tabs for the remaining empty columns. When the data is pulled into SQL Server, those shortened lines get combined into the previous line, so multiple rows end up merged into one instead of being written as separate rows with NULLs at the end of the first row.
Example to illustrate the problem: sample .txt file
column1 column2 column3 column4 column5
1 2 3 4 5
2 5 4 6
4 4 6 4
4 5 6 4 6
SQL to create the table and run the bulk insert:
CREATE TABLE test (
[column1] varchar(MAX) NULL,
[column2] varchar(MAX) NULL,
[column3] varchar(MAX) NULL,
[column4] varchar(MAX) NULL,
[column5] varchar(MAX) NULL
)
BULK INSERT test
FROM 'c:\temp\testimport.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r'
);
The really strange thing is that the data import wizard imports the data perfectly and handles the missing tabs without any issue. But I don't know what the wizard is doing behind the scenes to make this happen. I would love to see the code it uses to create the table and do the insert, as that would probably answer my question. Ultimately, I can't use the wizard, because this will be part of an automated task run against a SQL Server Express database on multiple files with different names but the same column headers.
Maybe bulk insert isn't the way to go here? Or is there something obvious I'm missing that someone else might know off the top of their head? Either way, all help is appreciated, and thanks in advance.
As Tim H suggested, I've made a few attempts at creating a format file to accommodate the data. The results so far are as follows.
Using
bcp temp.dbo.test format nul -x -f test_format.xml -n -T
produces
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Using this format file as is produces:
Msg 4866, Level 16, State 7, Line 31
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
My attempt at editing the XML to make it work:
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
This does insert the data, but unfortunately it still produces the same jumbled result, with overlapping lines in the same row.
Do you have control over the source files? If not, is each column fixed width or variable width? I notice that your CREATE TABLE example uses varchar(max) columns. The bulk insert feature in SQL Server lets you use a format file that defines more precisely how the input is formatted, column by column, including whether a column is nullable. Microsoft's documentation for bulk inserts is actually quite helpful (https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15), especially the link to format files at the end of the page.
This page (https://learn.microsoft.com/en-us/sql/relational-databases/import-export/keep-nulls-or-use-default-values-during-bulk-import-sql-server?view=sql-server-ver15) directly deals with null values, which would be your predicament.
A better answer would be to add KEEPNULLS to your BULK INSERT ... WITH clause. It does what you would expect: empty columns keep NULL instead of receiving the column's default value, which is what BULK INSERT inserts by default when a default is defined.
I never found a direct solution within SQL Server Express for this, so I ended up solving the problem with PowerShell scripting. Import-Csv pulled the data from the files uniformly and without issue; I'm not sure why, but it handled the data far better than SQL Server did. From there I used variables for each line, plus Invoke-Sqlcmd and some SQL scripting, to import them into the database. It worked like a charm. Since the whole process runs on the local server, there aren't any security issues to worry about, so it was an acceptable solution. Thanks again for all the suggestions and help.
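The same pre-processing idea can be sketched in Python instead of PowerShell (a swapped-in technique, not the script described above). Assuming the header row defines how many columns every row should have, `pad_rows` (a hypothetical helper) appends empty fields to each short row so that every line carries all of its tab separators, which a plain tab/CR LF bulk insert can then split consistently. Reading and writing the actual files is left out.

```python
import csv
import io


def pad_rows(text, delimiter="\t"):
    """Return delimited text in which every row has as many fields as the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    width = len(rows[0])
    out = io.StringIO()
    # CR LF line endings, matching a '\r\n' row terminator on the SQL Server side
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\r\n")
    for row in rows:
        if len(row) > width:
            raise ValueError(f"row has {len(row)} fields, header has {width}: {row!r}")
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


# The sample file from the question, with real tabs.
sample = (
    "column1\tcolumn2\tcolumn3\tcolumn4\tcolumn5\r\n"
    "1\t2\t3\t4\t5\r\n"
    "2\t5\t4\t6\r\n"
    "4\t4\t6\t4\r\n"
    "4\t5\t6\t4\t6\r\n"
)
print(pad_rows(sample))
```

Writing the padded text back to a file and pointing BULK INSERT at it keeps the SQL side unchanged; the short rows simply arrive with their trailing columns empty.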

Bulk Insert with Format File (Fixed Width) - Unexpected end of file was encountered

BULK INSERT [Alldlyinventory]
FROM 'C:\Users\Admin\Documents\2NobleEstates\DATA\Download\Output\test.txt'
WITH (FORMATFILE = 'C:\SQL Data\FormatFiles\test.xml');
Format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="8" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="7" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="4" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="10" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="DAY_NUMBER" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="2" NAME="LCBO_NO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="LOCATION_NUMBER" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="LISTING_STATUS" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="QTY_ON_HAND" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
But I am getting the following error on SQL Server 2014:
Msg 4832, Level 16, State 1, Line 1
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
It's a fixed width import.
Sample txt:
2016032803170570371L 000000014
2016032803367430371L 000000013
2016032803403800371L 000000036
2016032804007540371L 000000015
Looking at your sample text file, it looks like you have a row terminator that is carriage return ({CR}) + linefeed ({LF}).
You can check this by opening the text file in a text editor that can show special symbols. I can recommend Notepad++, which is free and good for this purpose (Menu View > Show Symbol > Show All Characters).
If the row terminator is indeed {CR}{LF}, you should use xsi:type="CharTerm" along with a TERMINATOR="\r\n" attribute for the last <FIELD> in the <RECORD> element:
<RECORD>
...
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
You can find more information on importing fixed-width fields in XML Format Files (SQL Server), in the section "Importing fixed-length or fixed-width fields".
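The numbers behind Msg 4832 can be checked with a simplified Python model, assuming each sample row ends in CR LF as the answer suggests. With all five fields CharFixed, the loader reads 8 + 7 + 4 + 1 + 10 = 30 bytes per record, but each line is 32 bytes long, so every record starts 2 bytes later than the one before and the final record runs out of data. This is an illustration, not SQL Server's parser; `cut`, `read_all_fixed` and `read_last_terminated` are hypothetical helpers.

```python
WIDTHS = [8, 7, 4, 1, 10]  # DAY_NUMBER, LCBO_NO, LOCATION_NUMBER, LISTING_STATUS, QTY_ON_HAND

# Sample rows from the question, assuming each ends in CR LF (30 data bytes + 2).
data = (
    b"2016032803170570371L 000000014\r\n"
    b"2016032803367430371L 000000013\r\n"
    b"2016032803403800371L 000000036\r\n"
    b"2016032804007540371L 000000015\r\n"
)


def cut(chunk, widths):
    """Slice a byte string into consecutive fixed-width fields."""
    fields, pos = [], 0
    for width in widths:
        fields.append(chunk[pos:pos + width].decode("ascii"))
        pos += width
    return fields


def read_all_fixed(data, widths):
    """Model of the original format file: five CharFixed fields, no row terminator."""
    record_len = sum(widths)
    rows, pos = [], 0
    while pos < len(data):
        chunk = data[pos:pos + record_len]
        if len(chunk) < record_len:
            raise EOFError(f"unexpected end of file: only {len(chunk)} of {record_len} bytes left")
        rows.append(cut(chunk, widths))
        pos += record_len
    return rows


def read_last_terminated(data, widths, terminator=b"\r\n"):
    """Model of the fix: fixed fields first, then the last field runs up to CR LF."""
    fixed = sum(widths[:-1])
    rows = []
    for line in data.split(terminator):
        if line:  # skip the empty piece after the final terminator
            rows.append(cut(line, widths[:-1]) + [line[fixed:].decode("ascii")])
    return rows


try:
    read_all_fixed(data, WIDTHS)
except EOFError as exc:
    print("all CharFixed:", exc)
for parsed in read_last_terminated(data, WIDTHS):
    print(parsed)
```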

Unable to retrieve specific fields from a text file in bulk insert

I have the following lines in a text file delimited by "|". I only want to retrieve the Surname and First_Name fields and write them into a table.
Released_Date|Label|Type|Id|FormId|Title|Surname|First_Name|Middle_Name
25/07/2014|XCS|CDE|V000011|F000011|Miss|Dālwó|Cabĉver|Ann
25/07/2014|XCS|CDE|V000011|F000011|Miss|Rtyālwó|sabĉper|Joanne
I created this XML format file to retrieve only Surname and First_Name:
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\n"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="Surname"/>
<COLUMN SOURCE="2" NAME="First_Name"/>
</ROW>
</BCPFORMAT>
And I created this stored procedure to read it:
ALTER PROC dbo.ImportTextFile
AS
BULK INSERT test FROM 'C:\Program Files\Data Import.txt'
WITH
(
FIELDTERMINATOR ='|',
ROWTERMINATOR ='\n',
FIRSTROW =2,
FORMATFILE = 'C:\Program Files\cabcolumns.xml'
);
There are no errors, but the whole row from the text file gets inserted into the two columns of the table, while I want only Surname and First_Name. I'm not sure what I am doing wrong. The DDL of the table is below. Please help.
CREATE TABLE [dbo].[test](
[Surname] [nvarchar](4000) COLLATE SQL_Latin1_General_CP1253_CI_AI NULL,
[First_Name] [nvarchar](4000) COLLATE SQL_Latin1_General_CP1253_CI_AI NULL
) ON [PRIMARY]
I think the issue lies in the terminators in the XML file and in the numbering of the source columns.
As a first test, you could change the field terminator on a sample of the data (to find out whether the terminator itself is the issue), updating all the configuration files accordingly.
Once the terminator issue is ruled out, the documentation has an example of how to skip columns when importing data (notice the field IDs):
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR=","/>
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="7" NAME="Surname"/>
<COLUMN SOURCE="8" NAME="First_Name"/>
</ROW>
</BCPFORMAT>
then to import:
ALTER PROC dbo.ImportTextFile
AS
BULK INSERT test FROM 'C:\Program Files\Data Import.txt'
WITH (FIRSTROW = 2, FORMATFILE = 'C:\Program Files\cabcolumns.xml', LASTROW = 3);
By explicitly setting the number of the last row, you can avoid issues if the last line is empty or if the system has trouble correctly detecting the end of the data.
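How the COLUMN SOURCE attributes pick fields can be checked outside SQL Server with a few lines of Python. This is an illustrative model that assumes the nine-field, pipe-delimited layout from the question (for the comma-delimited test copy, pass delimiter=","); `map_columns` is a hypothetical helper. FIELD IDs are 1-based, so SOURCE="7" is the seventh value on the line.

```python
HEADER = "Released_Date|Label|Type|Id|FormId|Title|Surname|First_Name|Middle_Name"
FIELD_COUNT = len(HEADER.split("|"))  # 9 FIELD elements in the format file

# Mirrors <COLUMN SOURCE="7" NAME="Surname"/> and <COLUMN SOURCE="8" NAME="First_Name"/>.
COLUMN_SOURCES = {"Surname": 7, "First_Name": 8}


def map_columns(line, sources=COLUMN_SOURCES, delimiter="|"):
    """Split one data line into FIELD values and keep only the mapped ones.

    FIELD IDs are 1-based, so SOURCE n corresponds to list index n - 1.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return {name: fields[source - 1] for name, source in sources.items()}


for data_line in [
    "25/07/2014|XCS|CDE|V000011|F000011|Miss|Dālwó|Cabĉver|Ann",
    "25/07/2014|XCS|CDE|V000011|F000011|Miss|Rtyālwó|sabĉper|Joanne",
]:
    print(map_columns(data_line))
```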

"BULK LOAD DATA CONVERSION ERROR" for csv file

I am trying to import a .csv file, but I am getting "BULK LOAD DATA CONVERSION ERROR" for the last column. The file looks like this:
"123456","123","001","0.00"
I have tried the following row terminator:
ROWTERMINATOR = '"\r\n'
Nothing works. Any ideas on what is causing this error for this record? Thanks
As in the example below, remove the quotes from your CSV and use "\r\n" as the row terminator.
Always use an XML format file when doing a bulk insert. It provides several advantages, such as validation of data files.
The format file maps the fields of the data file to the columns of the table. You can use a non-XML or XML format file to bulk import data with a bcp command or with a BULK INSERT or INSERT ... SELECT * FROM OPENROWSET(BULK...) Transact-SQL statement.
Given your input file, suppose you have the following table:
CREATE TABLE myTestFormatFiles (
Col1 smallint,
Col2 nvarchar(50),
Col3 nvarchar(50),
Col4 nvarchar(50)
);
Your sample data file would then be:
10,Field2,Field3,Field4
15,Field2,Field3,Field4
46,Field2,Field3,Field4
58,Field2,Field3,Field4
A sample XML format file would be:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="7"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="Col1" xsi:type="SQLSMALLINT"/>
<COLUMN SOURCE="2" NAME="Col2" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="Col3" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="Col4" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
If you are unfamiliar with format files, check XML Format Files (SQL Server).
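Why the quotes break the last column can be shown with a small Python sketch. It is illustrative only and models the failure rather than SQL Server's parser: splitting the sample record on commas leaves the quotation marks inside each value, so "0.00" arrives as text with quote characters that cannot be converted to a number, while a CSV-aware reader strips them. `to_decimal` and `unquote_line` are hypothetical helpers, and removing the quotes is only safe when no value contains a comma.

```python
import csv
import io
from decimal import Decimal, InvalidOperation

line = '"123456","123","001","0.00"\r\n'  # sample record from the question

# Splitting on the comma alone leaves the quote characters inside each value.
naive = line.rstrip("\r\n").split(",")
print(naive[-1])  # "0.00" (quotes included)


def to_decimal(value):
    """Return a Decimal, or None as a stand-in for a conversion error."""
    try:
        return Decimal(value)
    except InvalidOperation:
        return None


def unquote_line(text):
    """Re-emit one CSV record without quotes; refuse values that contain a comma."""
    fields = next(csv.reader(io.StringIO(text)))
    if any("," in field for field in fields):
        raise ValueError("a value contains a comma; dropping the quotes would split it")
    return ",".join(fields)


print(to_decimal(naive[-1]))  # None
clean = unquote_line(line)
print(clean)  # 123456,123,001,0.00
print(to_decimal(clean.split(",")[-1]))  # 0.00
```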

Use xml format file to edit csv before bulk insert into ms sql

SQL: bulk insert
bulk insert TESTING
from 'D:\Testing.csv'
with
( FIRSTROW=2,
DATAFILETYPE='char',
FIELDTERMINATOR=',',
ROWTERMINATOR = '\n',
FORMATFILE = 'D:\Testing.xml');
XML: format file
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="Address1" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="Address2" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="Address3" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="Address4" xsi:type="CharTerm" TERMINATOR='\n' />
</RECORD>
<ROW>
<COLUMN SOURCE="Address1" NAME="COLUMN1" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="Address2" NAME="COLUMN2" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="Address3" NAME="COLUMN3" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="Address4" NAME="COLUMN4" xsi:type="SQLVARYCHAR" />
</ROW>
</BCPFORMAT>
The CSV file I am using contains addresses. I created the SQL table before running the bulk insert; it has four columns for the address.
Testing.csv
"Address1","Address2","Address3","Address4"
"Lot 180, Street 19, "," Oakland Park, "," Kuala Lumpur, "," Selangor"
I want to get output like in the table above. When I try to use the XML format file in the bulk insert, I receive the following error message:
Bulk load: An unexpected end of file was encountered in the data file.
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
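As a hedged illustration outside SQL Server, the Python snippet below shows what a '","' terminator does to the data row in Testing.csv: the embedded commas survive, but the opening quote of the first value and the closing quote of the last value stay in the data, so those two quotes need separate handling in the format file (one commonly used approach is an extra, unmapped leading field terminated by a single quote character). It does not reproduce the end-of-file error itself.

```python
import csv
import io

# Data row from Testing.csv in the question.
line = '"Lot 180, Street 19, "," Oakland Park, "," Kuala Lumpur, "," Selangor"'

# Splitting on '","' (the terminator used in the format file) removes the inner
# quote-comma-quote sequences, but the first value keeps its opening quote
# and the last value keeps its closing quote.
by_terminator = line.split('","')

# A CSV-aware reader treats the quotes as enclosures and keeps the embedded commas.
by_csv = next(csv.reader(io.StringIO(line)))

print(by_terminator)
print(by_csv)
```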
