I need to import data from some small CSV files containing Cyrillic and other UTF-8 characters into SQL Server 2014.
I know that only SQL Server 2016 (13.x) supports UTF-8 natively, but...
Googling around, I built this statement:
BULK INSERT [test].[dbo].[csv]
FROM 'C:\Book1.csv'
WITH
(
FIELDTERMINATOR = ','
,ROWTERMINATOR = '\n'
,DATAFILETYPE = 'widechar'
,CODEPAGE = 'OEM'
,FIRSTROW = 1
,TABLOCK
)
That works well.
But since the files to import have different structures (different field names in different positions), I thought the best solution would be to use a .fmt file for each file structure, therefore something like this:
BULK INSERT [test].[dbo].[csv]
FROM 'C:\Book1.csv'
WITH
(
FORMATFILE = 'C:\csv.FMT'
,DATAFILETYPE = 'widechar'
,CODEPAGE = 'OEM'
,FIRSTROW = 1
,TABLOCK
)
where csv.fmt is:
12.0
9
1 SQLNCHAR 2 200 "," 1 UserID Latin1_General_CI_AS
2 SQLNCHAR 2 60 "," 2 TypeID Latin1_General_CI_AS
3 SQLNCHAR 2 60 "," 3 TrackingID Latin1_General_CI_AS
4 SQLNCHAR 2 200 "," 4 Name Latin1_General_CI_AS
5 SQLNCHAR 2 200 "," 5 Address Latin1_General_CI_AS
6 SQLNCHAR 2 20 "," 6 PostalCode Latin1_General_CI_AS
7 SQLNCHAR 2 200 "," 7 City Latin1_General_CI_AS
8 SQLNCHAR 2 200 "," 8 StateOrProvince Latin1_General_CI_AS
9 SQLNCHAR 2 20 "\n" 9 CountryID Latin1_General_CI_AS
but I get:
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
I tried different versions of the row terminator: "\n", "\r\n", and also "\r\0\n\0" as suggested by
bcp test.dbo.csv format nul -F 1 -w -T -f csv.fmt
but none of them works.
Can you suggest a direction?
Thanks
Related
When I execute the bcp command below to import data from a .csv file into my table in SQL Server, I get an ODBC error.
Here is the command:
DECLARE @Error INT
EXEC @Error = master..xp_cmdshell 'bcp DB.dbo.tbl_CASHBAL IN "H:\Imports\CASH BAL.csv" -f H:\CASHBAL.fmt -S myserver -U user -P xxxx'
SELECT @Error
Here is the table structure in SQL:
CREATE TABLE [UBS].[dbo].tbl_BilotherCASHBALN
(
[DATE] VARCHAR(100),
[SCODE] VARCHAR(100),
[MY-ACC-N] VARCHAR(100),
[YOUR-ACC-N] VARCHAR(100),
[CASH-BAL] VARCHAR(100)
)
This is my format file:
12.0
5
1 SQLCHAR 0 100 "\t" 1 DATE SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "\t" 2 SCODE SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\t" 3 MY-ACC-N SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "\t" 4 YOUR-ACC-N SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 100 "\r\n" 5 CASH-BAL SQL_Latin1_General_CP1_CI_AS
And this is the error I get when I execute my bcp command:
Error = [Microsoft][ODBC Driver 11 for SQL Server]
Unexpected EOF encountered in BCP data file
I have included a screenshot of the complete error message.
And here is one row of data from the .csv file (which I need in the SQL Server table without double quotes):
"2021-01-30","IX","0001234567","XYZ01234","2305123.19"
Finally I was able to figure this out. I needed to modify the format file by adding one space to the field terminator of my last column: I had not noticed that there was a trailing space after the last item in each row. I also needed to add one extra column to the format file with a double quote as its delimiter, to consume the opening quote of each row; I set its server column number to 0 so it is not imported. The final format file with proper field terminators looks like this:
8.0
6
1 SQLCHAR 0 1 "\"" 0 Unwanted ""
2 SQLCHAR 0 100 "\",\"" 1 DATE SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 100 "\",\"" 2 SCODE SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 100 "\",\"" 3 MY-ACC-N SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 100 "\",\"" 4 YOUR-ACC-N SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 100 "\" " 5 CASH-BAL SQL_Latin1_General_CP1_CI_AS
And the execution line within SQL Server Management Studio looks like this (I added -F 2 to skip the header):
DECLARE @Error INT
EXEC @Error = master..xp_cmdshell 'bcp DB.dbo.tbl_CASHBAL IN "H:\Imports\CASH BAL.csv" -f H:\CASHBAL.fmt -F 2 -S myserver -U user -P xxxx'
SELECT @Error
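For reference, the same load could also be run server-side with BULK INSERT instead of shelling out to bcp via xp_cmdshell. This is a sketch using the same paths and format file as above; note that BULK INSERT resolves paths on the server, so the SQL Server service account would need access to the H: locations:

```sql
-- Server-side equivalent of the bcp command above (untested sketch).
BULK INSERT DB.dbo.tbl_CASHBAL
FROM 'H:\Imports\CASH BAL.csv'
WITH (
    FORMATFILE = 'H:\CASHBAL.fmt',
    FIRSTROW = 2  -- skip the header row
);
```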
See this small sample of my CSV file:
"ID","TRANSACTION_TIME","CONTAINER_NUMBER","EVENT"
33115541,"2019-04-03 00:47:41.000000","MSKU1128096",
33115538,"2019-04-03 01:34:49.000000","MSKU1128096","Gate Out"
33115545,"2019-04-03 00:47:55.000000","MSKU4717839",
This is the format file I created
14.0
4
1 SQLCHAR 0 0 ",\"" 2 ID ""
2 SQLCHAR 0 0 "\",\"" 3 TRANSACTION_TIME ""
3 SQLCHAR 0 0 "\",\"" 4 CONTAINER_NUMBER ""
4 SQLCHAR 0 0 "\"\r\n" 5 EVENT SQL_Latin1_General_CP1_CI_AS
The issue is that the 4th column may have null values, as you can see from rows 1 and 3 (excluding the header).
Below is my BULK INSERT command:
bulk insert dbo.DRISPIN_CONTAINER_HISTORY_STG1
from 'e:\dri_container_history_initial.csv'
with (
firstrow = 2,
formatfile = 'e:\container_history_initial.fmt'
)
When I run this I get the following error:
Msg 8152, Level 16, State 13, Line 305
String or binary data would be truncated.
I have also tried specifying a prefix length of 2, but I get some different errors.
I know I could take the values in with the qualifiers into a staging table and then strip them out, but ideally I would like to see if there is a way to do this with BULK INSERT or bcp alone.
Thanks in advance
Full CSV support was added in SQL Server 2017. I suspect that's the version used here, since the file's format version number is 14.0.
The following command will load the file, using a double quote as the FIELDQUOTE character and CRLF as the row terminator:
create table testtable
(
"ID" bigint,
"TRANSACTION_TIME" datetime2(0),
"CONTAINER_NUMBER" varchar(200),
"EVENT" varchar(200)
)
bulk insert dbo.testtable
from 'c:\path\to\testcsv.csv'
with (
format='csv',
FIRSTROW=2
)
select * from testtable
The results are:
ID TRANSACTION_TIME CONTAINER_NUMBER EVENT
33115541 2019-04-03 00:47:41 MSKU1128096 NULL
33115538 2019-04-03 01:34:49 MSKU1128096 Gate Out
33115545 2019-04-03 00:47:55 MSKU4717839 NULL
FORMAT = 'CSV' still can't handle a missing newline at the end of the file.
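For completeness, the relevant defaults can also be spelled out explicitly; this sketch should be equivalent to the statement above (SQL Server 2017+):

```sql
BULK INSERT dbo.testtable
FROM 'c:\path\to\testcsv.csv'
WITH (
    FORMAT = 'CSV',          -- RFC 4180-style quote handling
    FIELDQUOTE = '"',        -- the default quote character
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\r\n',
    FIRSTROW = 2             -- skip the header row
);
```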
I am trying to use BULK INSERT for a .txt file that is comma-separated, but a few columns also contain double quotes, because of which some rows are not inserted properly.
I am using BULK INSERT with a .fmt file, but it still gives me the error: Cannot bulk load. Invalid column number in the format file "\\server\Data\Transfer\formatfile1.fmt".
I am out of ideas as to what is wrong with the format file I have created.
Here is the data in .txt format
"NUMBER","DATE","JOIN_NUMBER","CEO","FAX","ACTNUM"
1,3/31/2005 0:00:00,2,,"9037983933",5
6,3/31/2005 0:00:00,7,,"5048899070",7
7,3/31/2005 0:00:00,8,,"2289384313",7
12,3/31/2005 0:00:00,11,"Tom Johnson, SVP","8607611980",8
13,3/31/2005 0:00:00,12,,"2252146851",3
This is the format file:
13.0
6
1 SQLCHAR 0 5 ",\"" 1 NUMBER SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50 ",\"" 2 DATE SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 20 ",\"" 3 JOIN_NUMBER SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 500 "\",\"" 4 CEO SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 12 "\",\"" 5 Fax SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 10 "\"\r\n" 6 ACTNUM SQL_Latin1_General_CP1_CI_AS
I am using SQL Server 2016.
This is the SQL code I am trying to use for the bulk insert:
GO
SET ANSI_WARNINGS OFF
GO
BULK INSERT FDICDev.dbo.fs220D_test
FROM '\\server\Data\Transfer\textdata.txt'
WITH ( FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FORMATFILE = '\\server\Data\Transfer\formatfile1.fmt' )
Can someone point out what the issue is here?
Your format file is wrong. You have told it to expect a double-quote character as part of the column terminator of, well, every column. However, only your 4th and 5th columns (CEO and FAX) are quote-qualified.
The terminators that are just a comma should be represented with ",".
Only where a quote character is expected on either side of the comma should you include the \".
Here's a helpful tutorial.
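Applied to the sample data above, a corrected format file would look something like this. It is a sketch, and it assumes CEO and FAX are always quote-qualified; rows where CEO is empty and unquoted (such as the first data row) cannot be described by a static format file at all, so those would need preprocessing, or FORMAT = 'CSV' on SQL Server 2017+:

```
13.0
6
1 SQLCHAR 0 5   ","      1 NUMBER       SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 50  ","      2 DATE         SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 20  ",\""    3 JOIN_NUMBER  SQL_Latin1_General_CP1_CI_AS
4 SQLCHAR 0 500 "\",\""  4 CEO          SQL_Latin1_General_CP1_CI_AS
5 SQLCHAR 0 12  "\","    5 Fax          SQL_Latin1_General_CP1_CI_AS
6 SQLCHAR 0 10  "\r\n"   6 ACTNUM       SQL_Latin1_General_CP1_CI_AS
```

Each terminator here includes the \" only where the sample rows actually put a quote on that side of the comma.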
I have an input file with records
1,2014030000000212,0x060000000000000000000000000000
1,2014030000000215,0x050000000000000000000000000000
1,2014030000000221,0x080000000000000000000000000000
I use a FormatFile
11.0
3
1 SQLINT 0 4 "," 1 ClientCode ""
2 SQLCHAR 0 20 "," 2 AccountID SQL_Latin1_General_CP1_CI_AS
3 SQLBINARY 0 64 "\r\n" 3 mask ""
When I use
BULK INSERT TempBinaryMask FROM 'C:\Temp\BinaryData.txt' WITH (FORMATFILE = 'C:\Temp\BinaryFormat.txt')
it inserts the data, but it messes up my varbinaries, which end up looking like this:
49 2014030000000212 0x3078303630303030303030303030303030303030303030303030303030303030
49 2014030000000215 0x3078303530303030303030303030303030303030303030303030303030303030
49 2014030000000221 0x3078303830303030303030303030303030303030303030303030303030303030
I also just noticed that my ClientCode is wrong: it is 49 instead of 1. Is there something I'm doing wrong?
This is my table definition
CREATE TABLE TempBinaryMask
(
ClientCode int,
AccountID varchar(20),
mask varbinary(64)
)
For some reason the format file was the problem.
I changed my input file to
1,2014030000000212,060000000000000000000000000000
1,2014030000000215,050000000000000000000000000000
1,2014030000000221,080000000000000000000000000000
and used
BULK INSERT TempBinaryMask from 'C:\Temp\BinaryData.txt' WITH (DATAFILETYPE='char', FIELDTERMINATOR=',')
to import the data, and it worked perfectly.
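The likely reason is that SQLINT and SQLBINARY in a non-XML format file describe native (binary) fields in the data file, not character data: the ASCII byte for '1' is 0x31, i.e. 49, and the literal text "0x06..." was stored as its raw character bytes. If a format file is still wanted, declaring every field as SQLCHAR should match the character file; here is an untested sketch for the 0x-stripped input:

```
11.0
3
1 SQLCHAR 0 12 ","    1 ClientCode ""
2 SQLCHAR 0 20 ","    2 AccountID  SQL_Latin1_General_CP1_CI_AS
3 SQLCHAR 0 64 "\r\n" 3 mask       ""
```

As in the DATAFILETYPE='char' load above, SQL Server should then convert the hex text to varbinary during the insert.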
I've tried the XML format file as well as the non-XML one, and both gave me different types of errors.
Trying to use BULK INSERT with SQL Server.
I have a table like this:
CREATE TABLE AA
(
AA_ID int identity NOT NULL,
B_ID int NOT NULL,
QUALITY_CODE nvarchar(4),
VALUE_NUM numeric(18,4)
)
A FORMATFILE like this:
10.0
3
1 SQLINT 0 0 "\t" 2 B_ID ""
2 SQLNCHAR 0 0 "\t" 3 QUALITY_CODE Latin1_General_CI_AS
3 SQLNUMERIC 0 0 "\r\n" 4 VALUE_NUM ""
and a data file like this:
6 OK 50.0000
6 OK 49.0000
6 OK 1023.0000
6 OK 340.0000
When I issue this statement:
BULK INSERT dbo.AA
FROM 'C:/path/aa.dat'
WITH ( FORMATFILE = 'C:/path/aa.fmt')
I get this error:
[Microsoft][SQL Server Native Client 10.0][SQL Server]Invalid data for type "numeric"(SQL-42000) [state was 42000 now 01000]
I have checked that my data file has CR LF line endings.
I have checked that my data file has tab characters between each field.
I cannot work out what is going wrong.
My code and files are here: https://github.com/rjattrill/MsSql_BulkInsertExample
I found that the best approach was to import most things as VARCHAR using OPENROWSET and then let INSERT cast automatically. The built-in data type conversion in the bulk utilities looks poorly documented and difficult to use; it is easier to import most columns as VARCHAR with SELECT FROM OPENROWSET and let the ordinary DML machinery cast at the INSERT stage, possibly at the cost of some performance.
Here is an updated format file:
10.0
4
1 SQLINT 0 0 "\t" 2 B_ID ""
2 SQLCHAR 0 0 "\t" 3 PERIOD_START Latin1_General_CI_AS
3 SQLCHAR 0 0 "\t" 4 QUALITY_CODE Latin1_General_CI_AS
4 SQLCHAR 0 0 "\r\n" 5 VALUE_NUM ""
And code to use it:
INSERT INTO dbo.AA (B_ID, PERIOD_START, QUALITY_CODE, VALUE_NUM)
SELECT a.* FROM OPENROWSET(
BULK 'C:/src_github/MsSql_BulkInsertExample/aa.dat',
FORMATFILE = 'C:/src_github/MsSql_BulkInsertExample/aa.fmt',
FIRSTROW = 1
) as a
Full example here: https://github.com/rjattrill/MsSql_BulkInsertExample