Bulk insert fixed width fields - sql-server

How do you specify field lengths with the Bulk Insert command?
Example: If I had a file named c:\Temp\TableA.txt that contained:
123ABC
456DEF
And I had a table such as:
use tempdb
CREATE TABLE TABLEA(
Field1 char(3),
Field2 char(3)
)
BULK INSERT TableA FROM 'C:\Temp\TableA.txt'
SELECT * FROM TableA
Then how would I specify the lengths for Field1 and Field2?

I think you need to define a format file
e.g.
BULK INSERT TableA FROM 'C:\Temp\TableA.txt'
WITH (FORMATFILE = 'C:\Temp\Format.xml')
SELECT * FROM TableA
For that to work, though, you need a Format File, obviously.
See here for general info about creating one:
Creating a Format File
At a guess, from looking at the Schema, something like this might do it:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="3"/>
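<FIELD ID="2" xsi:type="CharFixed" LENGTH="3"/>
<!-- consumes the CR/LF at the end of each line; not mapped to any column -->
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\r\n"/>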
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="Field1" xsi:type="SQLCHAR" LENGTH="3"/>
<COLUMN SOURCE="2" NAME="Field2" xsi:type="SQLCHAR" LENGTH="3"/>
</ROW>
</BCPFORMAT>
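Why the line break matters with CharFixed fields: a fixed-width field consumes a fixed number of bytes and knows nothing about lines. The CR/LF after each record therefore has to be read by some field; otherwise every following record shifts by two characters. The Python sketch below is a simplified model of that behaviour, not SQL Server's actual parser. `read_fixed` is a hypothetical helper, and its optional `terminator` argument plays the role of an extra, unmapped CharTerm field.

```python
def read_fixed(data: bytes, widths: list[int], terminator: bytes = b"") -> list[list[str]]:
    """Split `data` into records made of fixed-width fields.

    If `terminator` is given, everything up to and including its next
    occurrence is skipped after each record, like an unmapped CharTerm
    field at the end of the RECORD section.
    """
    records = []
    pos = 0
    while pos < len(data):
        fields = []
        for width in widths:
            fields.append(data[pos:pos + width].decode("ascii"))
            pos += width
        if terminator:
            end = data.find(terminator, pos)
            pos = len(data) if end == -1 else end + len(terminator)
        records.append(fields)
    return records


sample = b"123ABC\r\n456DEF\r\n"
print(read_fixed(sample, [3, 3]))           # [['123', 'ABC'], ['\r\n4', '56D'], ['EF\r', '\n']]
print(read_fixed(sample, [3, 3], b"\r\n"))  # [['123', 'ABC'], ['456', 'DEF']]
```

Without the terminator, the second record starts with the line break from the first one. With it, both records line up with Field1 and Field2.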

You'd want to use a format file with your BULK INSERT. Something like:
9.0
2
1 SQLCHAR 0 03 "" 1 Field1 ""
2 SQLCHAR 0 03 "\r\n" 2 Field2 ""
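The non-XML file above is regular enough to generate. Each field line lists:

- the host field order
- the host data type
- the prefix length
- the data length
- the terminator
- the server column order
- the column name
- the collation

Only the last field carries the row terminator. Below is a minimal Python sketch that assumes all columns are fixed-width SQLCHAR. `non_xml_format` is a hypothetical helper, not part of any SQL Server tool; bcp can also write a format file for an existing table with its `format` option.

```python
def non_xml_format(columns: list[tuple[str, int]],
                   version: str = "9.0",
                   row_terminator: str = r"\r\n") -> str:
    """Build a non-XML bcp format file for fixed-width character columns.

    Each column is (name, width). Fields have no terminator except the last,
    which ends with the row terminator (written as the escape text \\r\\n).
    """
    lines = [version, str(len(columns))]
    for order, (name, width) in enumerate(columns, start=1):
        terminator = row_terminator if order == len(columns) else ""
        # host order, host type, prefix length, data length, terminator,
        # server column order, server column name, collation ("" = none given)
        lines.append(f'{order} SQLCHAR 0 {width} "{terminator}" {order} {name} ""')
    # finish with a line break after the last field line
    return "\n".join(lines) + "\n"


print(non_xml_format([("Field1", 3), ("Field2", 3)]), end="")
```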

Related

Problem with SQL bulk insert tab delimited file

I am having a problem with BULK INSERT. The source files (tab-delimited) that I'm dealing with contain rows that end in CR/LF without tabs for the remaining empty columns. So when the data is pulled into SQL Server, those shortened lines get combined with adjacent lines. Basically, multiple rows are combined into one, instead of being written as separate rows with NULLs at the end of the first row.
Example to illustrate the problem: sample .txt file
column1 column2 column3 column4 column5
1 2 3 4 5
2 5 4 6
4 4 6 4
4 5 6 4 6
SQL to create table and bulk insert
CREATE TABLE test (
[column1] varchar(MAX) NULL,
[column2] varchar(MAX) NULL,
[column3] varchar(MAX) NULL,
[column4] varchar(MAX) NULL,
[column5] varchar(MAX) NULL
)
BULK INSERT test
FROM 'c:\temp\testimport.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r'
);
The really strange thing is that I can use the data import wizard and it imports the data perfectly, without any issue, and handles the lack of tabs for the columns just fine. But I don't know what the wizard is doing behind the scenes to make this happen. I would love to have the code it uses to create the table and do the insert as that would probably answer my question for me. At the end of the day I can't use the wizard as this will eventually be part of an automated task I'll be running against an SQL Server Express database on multiple files with different names but the same column header.
Maybe bulk insert isn't the way to go here? Or there is something obvious I'm missing that someone else might know off the top of their head. Either way, all help is appreciated and thanks in advance.
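Here is what seems to be happening, as a simplified model rather than anything taken from SQL Server's source. BULK INSERT does not split the file into lines first. It reads field 1 up to the field terminator, then field 2 up to the next one, and so on; the row terminator is simply the terminator of the last field. A short row therefore swallows the beginning of the next line.

The Python sketch below reproduces that on the sample rows, with the header omitted and CR/LF assumed as the row terminator. `terminator_scan` is a hypothetical function.

```python
def terminator_scan(data: str, terminators: list[str]) -> list[list[str]]:
    """Model of terminator-driven parsing: each field ends at the next
    occurrence of its own terminator, wherever that is in the file."""
    rows = []
    pos = 0
    while pos < len(data):
        row = []
        for term in terminators:
            end = data.find(term, pos)
            if end == -1:  # no terminator left: take the rest of the file
                end = len(data)
            row.append(data[pos:end])
            pos = end + len(term)
        rows.append(row)
    return rows


data = "1\t2\t3\t4\t5\r\n2\t5\t4\t6\r\n4\t4\t6\t4\r\n4\t5\t6\t4\t6\r\n"
for row in terminator_scan(data, ["\t"] * 4 + ["\r\n"]):
    print(row)
# ['1', '2', '3', '4', '5']
# ['2', '5', '4', '6\r\n4', '4\t6\t4']
# ['4', '5', '6', '4', '6']
```

Four input lines come out as three rows: the second row's column4 runs across the line break and column5 holds the rest of the third line.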
As Tim H suggested, I've made a few attempts at creating a format file to accommodate the data. The results so far are as follows.
Using
bcp temp.dbo.test format nul -x -f test_format.xml -n -T
produces
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharPrefix" PREFIX_LENGTH="2" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Using this format file as is produces:
Msg 4866, Level 16, State 7, Line 31
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
My attempt at editing the XML to make it work:
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\t" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column1" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="2" NAME="column2" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="column3" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="column4" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="column5" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
This does insert the data, but unfortunately it still produces the same jumbled insert, with overlapping lines in the same row.
Do you have control over the source files? If not, is each column fixed width or variable width? I notice your CREATE TABLE example uses varchar(max) columns. The bulk insert feature in SQL Server lets you use a format file that defines more precisely how the expected input is formatted, column by column, including whether a column is nullable. Microsoft's documentation for bulk inserts is actually quite helpful (https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?redirectedfrom=MSDN&view=sql-server-ver15), especially the link to format files at the end of the page.
This page (https://learn.microsoft.com/en-us/sql/relational-databases/import-export/keep-nulls-or-use-default-values-during-bulk-import-sql-server?view=sql-server-ver15) directly deals with null values, which would be your predicament.
A better answer would be to add KEEPNULLS to your BULK INSERT...WITH clause. It specifies that empty columns keep a NULL value during the import, instead of receiving the column's default value. Without it, BULK INSERT inserts the column default (where one is defined) for an empty field.
I never found a direct solution in SQL Server Express for this, so I ended up solving the problem with PowerShell scripting. Import-CSV pulled the data from the files uniformly and without issue; I'm not sure why, but it handled the data far better than SQL did. From there I used variables for each line, plus Invoke-SQLCmd and some SQL scripting, to import the rows into the DB. It worked like a charm. Since this process runs entirely on the local server, there aren't any security issues to worry about, so it was an acceptable solution. Thanks again for all the suggestions and help.
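One way to read the PowerShell workaround above is that each line gets normalized before SQL Server sees it. If you would rather keep BULK INSERT, the same idea can be applied as a preprocessing step: pad every short row with empty tab-separated fields so it has as many fields as the header.

Below is a minimal Python sketch. `pad_tab_rows` is a hypothetical helper, and it assumes the header line has the full column count. Empty fields are then loaded as NULL (or as the column default, unless KEEPNULLS is specified).

```python
import csv
import io


def pad_tab_rows(text: str) -> str:
    """Return tab-delimited text in which every row has as many fields as the header."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))
    if not rows:
        return ""
    width = len(rows[0])  # the header line defines the expected column count
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\r\n")
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))
    return out.getvalue()


raw = "column1\tcolumn2\tcolumn3\r\n1\t2\t3\r\n4\t5\r\n"
print(repr(pad_tab_rows(raw)))
```

The padded text can be written to a new file and loaded with the original FIELDTERMINATOR/ROWTERMINATOR settings.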

Unicode flat file import to SQL

I am trying to bulk import data into SQL Server 2016, but because of characters that take 2 bytes (such as Ü, Ä, etc.), I am facing a problem with wrapping fields.
The source is a fixed-length Unicode (UTF-8) text file with special (wide) characters.
This is a sample part of the file:
ABS525                0128211024200
ABS526                0128211024200
ABS527                0128211024200
ABS528                0128211024200
ABS529                0128211024200
Ölrücklaufleitung     0128211037390
Ölzu- und Ölrücklaufle0128211037390
Ölzulaufleitung       0128211037390
Field lengths are: 22 - 4 - 3 - 5 - 1
I tried every way:
- import wizard in Management Studio,
- SSDT import,
- bulk import,
- openrowset,
- bcp command line
Nothing worked. More precisely, they all work unless there is a special character in the row.
This is my bulk insert code:
BULK INSERT [tecdoc2].[dbo].[211]
FROM 'C:\Users\Administrator\Desktop\D_TAF24\211yeni.0128'
WITH (MAXERRORS=50, CODEPAGE = '65001', DATAFILETYPE = 'widechar', FORMATFILE = 'C:\Users\Administrator\Desktop\BCP_Formats\a211.xml')
This is my format file (here, I tried a lot of combinations):
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="22" />
<FIELD ID="2" xsi:type="CharFixed" LENGTH="4" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="3" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="5" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="1" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="ArtNr" xsi:type="SQLNVARCHAR" LENGTH="22" />
<COLUMN SOURCE="2" NAME="DLNr" xsi:type="SQLNCHAR" />
<COLUMN SOURCE="3" NAME="SA" xsi:type="SQLNCHAR" />
<COLUMN SOURCE="4" NAME="GenArtNr" xsi:type="SQLNCHAR" />
<COLUMN SOURCE="5" NAME="Losch-Flag" xsi:type="SQLNCHAR" />
</ROW>
</BCPFORMAT>
All columns in SQL are nvarchar with the specified lengths (I tried a lot of variations here too: doubling the specified lengths, max, etc.).
Do you have any advice? I would appreciate it.
With Kind Regards,
Murat
This is exactly the problem I am having, but with OPENROWSET. If the file is delimited, it works fine.
The only way around this issue I have found is to import the whole row into a single nvarchar(Big Enough) column and parse it out with the database. Works fine then, but a royal pain in the bottom.
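A likely reason the fixed widths only break on rows with special characters is that in UTF-8, Ö and ü take two bytes each. This sketch assumes the CharFixed LENGTH values are counted in bytes of the data file. That would match the symptom described, but it is an assumption here. Under it, every field after such a character starts too early.

The Python sketch below compares byte-based and character-based splitting of one sample row. `split_by_bytes` and `split_by_chars` are hypothetical helpers.

```python
WIDTHS = [22, 4, 3, 5, 1]  # ArtNr, DLNr, SA, GenArtNr, Losch-Flag


def split_by_bytes(line: str, widths: list[int], encoding: str = "utf-8") -> list[str]:
    """Cut the encoded line into fixed numbers of bytes, then decode each piece."""
    raw = line.encode(encoding)
    fields, pos = [], 0
    for width in widths:
        # errors="replace" because a cut can land inside a multi-byte character
        fields.append(raw[pos:pos + width].decode(encoding, errors="replace"))
        pos += width
    return fields


def split_by_chars(line: str, widths: list[int]) -> list[str]:
    """Cut the decoded line into fixed numbers of characters."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields


row = "Ölrücklaufleitung".ljust(22) + "0128211037390"
print(len(row), len(row.encode("utf-8")))  # 35 37
print(split_by_bytes(row, WIDTHS))         # fields shifted by two bytes
print(split_by_chars(row, WIDTHS))         # ['Ölrücklaufleitung     ', '0128', '211', '03739', '0']
```

This is also why loading the whole line into one nvarchar column and cutting it up by characters in SQL avoids the problem.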
If you change your format file to be:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="35" />
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="RowData" xsi:type="SQLNVARCHAR" LENGTH="35"/>
</ROW>
</BCPFORMAT>
Then your import query can be:
INSERT INTO [tecdoc2].[dbo].[211]
(
ArtNr
,DLNr
,SA
,GenArtNr
,[Losch-Flag]
)
SELECT SUBSTRING(src.RowData, 1, 22) AS ArtNr -- SUBSTRING positions start at 1
,SUBSTRING(src.RowData, 23, 4) AS DLNr
,SUBSTRING(src.RowData, 27, 3) AS SA
,SUBSTRING(src.RowData, 30, 5) AS GenArtNr
,SUBSTRING(src.RowData, 35, 1) AS [Losch-Flag]
FROM OPENROWSET ( BULK 'C:\Users\Administrator\Desktop\D_TAF24\211yeni.0128'
,FORMATFILE = 'C:\Users\Administrator\Desktop\BCP_Formats\a211.xml'
,CODEPAGE = '65001' -- UTF-8
,FIRSTROW = 1
) AS src
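T-SQL's SUBSTRING counts positions from 1, so the first column should start at 1. With a start of 0 and a length of 22, it returns only the first 21 characters. The remaining start positions follow from the field widths 22, 4, 3, 5 and 1.

The Python sketch below generates the expressions from those widths and mimics SUBSTRING's documented handling of a start below 1. `substring_exprs` and `tsql_substring` are hypothetical helpers.

```python
def substring_exprs(columns: list[tuple[str, int]], source: str = "src.RowData") -> list[str]:
    """Build SUBSTRING expressions for fixed-width columns (T-SQL positions are 1-based)."""
    exprs, start = [], 1
    for name, width in columns:
        exprs.append(f"SUBSTRING({source}, {start}, {width}) AS [{name}]")
        start += width
    return exprs


def tsql_substring(value: str, start: int, length: int) -> str:
    """Mimic T-SQL SUBSTRING: characters from position max(start, 1)
    up to position start + length - 1 (both 1-based, inclusive)."""
    first = max(start, 1)
    last = start + length - 1
    return value[first - 1:last] if last >= first else ""


columns = [("ArtNr", 22), ("DLNr", 4), ("SA", 3), ("GenArtNr", 5), ("Losch-Flag", 1)]
for expr in substring_exprs(columns):
    print(expr)

row = "Ölzu- und Ölrücklaufle0128211037390"
print(len(tsql_substring(row, 0, 22)), len(tsql_substring(row, 1, 22)))  # 21 22
```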

Unable to retrieve specific fields from a text file in bulk insert

I have the following lines in a text file delimited by "|". I only want to retrieve the Surname and Firstname and write it into a table.
Released_Date|Label|Type|Id|FormId|Title|Surname|First_Name|Middle_Name
25/07/2014|XCS|CDE|V000011|F000011|Miss|Dālwó|Cabĉver|Ann
25/07/2014|XCS|CDE|V000011|F000011|Miss|Rtyālwó|sabĉper|Joanne
I created an XML format file to retrieve only the Surname and First_Name fields:
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\n"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="Surname"/>
<COLUMN SOURCE="2" NAME="First_Name"/>
</ROW>
</BCPFORMAT>
And I created this stored procedure to read it:
ALTER PROC dbo.ImportTextFile
AS
BULK INSERT test FROM 'C:\Program Files\Data Import.txt'
WITH
(
FIELDTERMINATOR ='|',
ROWTERMINATOR ='\n',
FIRSTROW =2,
FORMATFILE = 'C:\Program Files\cabcolumns.xml'
);
There are no errors, but the whole row from the text file gets inserted into the two columns of the table, whereas I want only Surname and First_Name. I'm not sure what I am doing wrong. The DDL of the table is below. Please help.
CREATE TABLE [dbo].[test](
[Surname] [nvarchar](4000) COLLATE SQL_Latin1_General_CP1253_CI_AI NULL,
[First_Name] [nvarchar](4000) COLLATE SQL_Latin1_General_CP1253_CI_AI NULL
) ON [PRIMARY]
I think the issue lies in the terminators in the XML file and in the numbering of the source columns.
A first test could be a quick change of the field terminator on a sample of the data (to find out whether the terminator itself is the issue), updating all the configuration files accordingly.
Once the terminator is ruled out, the documentation has an example of how to skip columns when importing data (note the field IDs):
<?xml version="1.0"?>
<BCPFORMAT
xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR="|"/>
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR="\n"/>
</RECORD>
<ROW>
<COLUMN SOURCE="7" NAME="Surname"/>
<COLUMN SOURCE="8" NAME="First_Name"/>
</ROW>
</BCPFORMAT>
Then, to import:
ALTER PROC dbo.ImportTextFile
AS
BULK INSERT test FROM 'C:\Program Files\Data Import.txt'
WITH (FIRSTROW = 2, FORMATFILE = 'C:\Program Files\cabcolumns.xml', LASTROW = 3);
By explicitly setting the number of the last row, you can avoid problems if the last line is empty or if the end of the data is not detected correctly.
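Writing nine FIELD elements by hand is error-prone, and the terminator has to match the data file: a pipe here, not the comma used in the documentation example. The Python sketch below derives the XML format file from the header line and the wanted column names. `build_format_file` is a hypothetical helper, and it assumes the last field ends with \n, as in the question.

```python
import xml.etree.ElementTree as ET

BCP_NS = "http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def build_format_file(header: str, wanted: list[str],
                      field_sep: str = "|", row_terminator: str = r"\n") -> str:
    """Return an XML format file that reads every field of `header`
    but maps only the `wanted` ones to table columns.
    `field_sep` must not need XML escaping (| and , are fine)."""
    names = header.split(field_sep)
    lines = ['<?xml version="1.0"?>',
             f'<BCPFORMAT xmlns="{BCP_NS}" xmlns:xsi="{XSI_NS}">',
             "<RECORD>"]
    for field_id, _ in enumerate(names, start=1):
        # every field ends with the separator, except the last one
        term = row_terminator if field_id == len(names) else field_sep
        lines.append(f'<FIELD ID="{field_id}" xsi:type="CharTerm" TERMINATOR="{term}"/>')
    lines.append("</RECORD>")
    lines.append("<ROW>")
    for name in wanted:
        # SOURCE is the 1-based position of the field in the data file
        # (names.index raises ValueError if a wanted name is not in the header)
        lines.append(f'<COLUMN SOURCE="{names.index(name) + 1}" NAME="{name}"/>')
    lines += ["</ROW>", "</BCPFORMAT>"]
    xml_text = "\n".join(lines)
    ET.fromstring(xml_text)  # raises ParseError if the output is not well-formed
    return xml_text


header = "Released_Date|Label|Type|Id|FormId|Title|Surname|First_Name|Middle_Name"
print(build_format_file(header, ["Surname", "First_Name"]))
```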

"BULK LOAD DATA CONVERSION ERROR" for csv file

I am trying to import a .csv file, but I am getting "BULK LOAD DATA CONVERSION ERROR" for the last column. The file looks like this:
"123456","123","001","0.00"
I have tried the following row terminator:
ROW TERMINATOR = "\"\r\n"
Nothing is working. Any ideas on what is causing this record to have this error? Thanks
As in the example below, remove the quotes from your CSV and use "\r\n" as the row terminator.
Always use an XML format file when doing a bulk insert; it provides several advantages, such as validation of data files.
The format file maps the fields of the data file to the columns of the table. You can use a non-XML or an XML format file to bulk import data with the bcp command, a BULK INSERT statement, or an INSERT ... SELECT * FROM OPENROWSET(BULK...) Transact-SQL statement.
Given your input file, suppose you have a table like the one below:
CREATE TABLE myTestFormatFiles (
Col1 smallint,
Col2 nvarchar(50),
Col3 nvarchar(50),
Col4 nvarchar(50)
);
Your sample data file would then be:
10,Field2,Field3,Field4
15,Field2,Field3,Field4
46,Field2,Field3,Field4
58,Field2,Field3,Field4
A sample XML format file would be:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="7"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="100" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="Col1" xsi:type="SQLSMALLINT"/>
<COLUMN SOURCE="2" NAME="Col2" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="Col3" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="Col4" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
If you are unfamiliar with format files, check XML Format Files (SQL Server).
Example is illustrated here
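The advice to remove the quotes can be scripted. The Python sketch below reads quoted CSV with the standard csv module and writes it back unquoted, with CR/LF line endings. `strip_csv_quotes` is a hypothetical helper. It refuses any row where a value itself contains a comma, quote or line break, because such a value cannot be represented without quoting.

```python
import csv
import io


def strip_csv_quotes(text: str) -> str:
    """Rewrite quoted CSV as plain comma-separated text with CR/LF line endings."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, lineterminator="\r\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        for value in row:
            if any(ch in value for ch in ',"\r\n'):
                raise ValueError(f"value {value!r} cannot be written without quotes")
        writer.writerow(row)
    return out.getvalue()


print(repr(strip_csv_quotes('"123456","123","001","0.00"\r\n')))
# '123456,123,001,0.00\r\n'
```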

Bulk insert using format file

My table named 'dictionary' has two columns named 'column1' and 'column2'. Both accept NULL values, and the data type of both columns is INT. Now I want to insert into column2 only, from a text file, with a bulk insert. I made a format file, which looks like this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="7"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="24"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="column2" xsi:type="SQLINT"/>
</ROW>
</BCPFORMAT>
and my BULK INSERT statement is:
BULK INSERT dictionary
FROM 'C:\Users\jka\Desktop\n.txt'
WITH
(
FIELDTERMINATOR = '\n',
ROWTERMINATOR = '\n',
FORMATFILE = 'path to my format file.xml'
)
But it didn't work. How can I solve this?
N.B.:
My .txt file looks like this:
123
456
4101
......
Edit, one more question:
I can fill one column with this technique, but how can I then fill another column from a text file in the same way, starting from the first row?
Assuming that your format file is correct, I believe you need to ditch FIELDTERMINATOR and ROWTERMINATOR from your BULK INSERT:
BULK INSERT dictionary
FROM 'C:\Users\jka\Desktop\n.txt'
WITH (FORMATFILE = 'path to my format file.xml')
Also make sure that:
- the input file's encoding is correct. In your case it should most likely be ANSI, not UTF-8 or Unicode (UTF-16);
- the row terminator (the terminator of the second field in your format file) is actually \r\n and not \n.
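Both checks can be done on the raw bytes before running BULK INSERT. The rough Python sketch below looks for a byte order mark and counts CR/LF versus bare LF line endings; `inspect_data_file` is a hypothetical helper. A missing BOM does not prove the file is ANSI, since UTF-8 files are often written without one. The line-ending count is only meaningful for single-byte or UTF-8 files.

```python
import codecs

BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def inspect_data_file(raw: bytes) -> dict:
    """Report the byte order mark (if any) and the line-ending style of `raw`."""
    bom = next((name for mark, name in BOMS if raw.startswith(mark)), None)
    crlf = raw.count(b"\r\n")
    bare_lf = raw.count(b"\n") - crlf  # LF bytes not preceded by CR
    if crlf and not bare_lf:
        endings = "CRLF"
    elif bare_lf and not crlf:
        endings = "LF"
    elif crlf or bare_lf:
        endings = "mixed"
    else:
        endings = "none"
    return {"bom": bom, "line_endings": endings}


# with open(r"C:\Users\jka\Desktop\n.txt", "rb") as f:
#     print(inspect_data_file(f.read()))
print(inspect_data_file(b"123\r\n456\r\n4101\r\n"))  # {'bom': None, 'line_endings': 'CRLF'}
```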
UPDATE: Since you need to skip the first column:
With an XML format file, there is no way to skip a column when you are importing directly into a table with a BULK INSERT statement. To achieve the desired result and still use an XML format file, you need to use OPENROWSET(BULK...) and provide an explicit list of columns, both in the select list and for the target table.
So to insert data only to column2 use:
INSERT INTO dictionary(column2)
SELECT column2
FROM OPENROWSET(BULK 'C:\temp\infile1.txt',
FORMATFILE='C:\temp\bulkfmt.xml') as t1;
If your data file has only one field, your format file can look like this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="C1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="24"/>
</RECORD>
<ROW>
<COLUMN SOURCE="C1" NAME="column2" xsi:type="SQLINT"/>
</ROW>
</BCPFORMAT>
Your data file contains one field, so your format file should reflect that:
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n"/>
</RECORD>

Resources