SQL Server bulk insert command fails importing textfile - sql-server

I am trying to load the following txt file into a table (using the import wizard did not work either):
http://download.geonames.org/export/dump/admin1CodesASCII.txt
using the following script:
drop table tempregions
create table TempRegions
(
code varchar(500),
name varchar(500),
asciiName varchar(500),
somenumber varchar(500)
);
BULK INSERT GeoNames
FROM 'C:\Users\Administrator\Desktop\geonames\admin1CodesASCII.txt'
WITH(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r\n'
)
go
I am getting the following error (using SQL Server 2012):
Msg 4864, Level 16, State 1, Line 10
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (geonameid).
thanks for the help

The text contains non-standard ASCII characters and you have not defined a codepage; the error is there to protect you. Find and specify an appropriate codepage as per the syntax at http://msdn.microsoft.com/en-us/library/ms188365.aspx
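For illustration, a minimal sketch of that syntax, assuming the file is UTF-8 (as the GeoNames dumps are) and that you are loading into the TempRegions table created above; note that CODEPAGE = '65001' is only accepted on SQL Server 2016 and later, so on 2012 you would first convert the file to a codepage the server supports (e.g. 1252):
BULK INSERT TempRegions
FROM 'C:\Users\Administrator\Desktop\geonames\admin1CodesASCII.txt'
WITH (
    CODEPAGE = '65001',        -- assumption: source file is UTF-8; needs SQL Server 2016+
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '\r\n'
)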

Related

Bulk Insert support for Unicode separator

I am using Azure Data Factory to archive data from an Azure SQL DB to Azure Blob Storage, and BULK INSERT to retrieve the data.
I am using the below as the row and column separators.
Column delimiter:\u0001
Row delimiter:\u0003
My BULK INSERT is below.
BULK INSERT mytable FROM 'MyPath/file.txt'
WITH (DATA_SOURCE = 'MySource',FIELDTERMINATOR ='\u0001', ROWTERMINATOR = '\u0003');
I am getting the below error:
Msg 4866, Level 16, State 1, Line 41
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
The documentation says Unicode is supported for FIELDTERMINATOR and ROWTERMINATOR, so what could be the issue?
It seems Unicode is not fully supported for BULK INSERT terminators. The documentation states: "Only the t, n, r, 0 and '\0' characters work with the backslash escape character to produce a control character."
Link: https://learn.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server?view=azuresqldb-current
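One possible workaround (an untested sketch, assuming the exported file really contains the single-byte control characters U+0001 and U+0003) is to give the terminators as hex byte strings instead of \u escapes, the same way the broken-pipe answer below does:
BULK INSERT mytable
FROM 'MyPath/file.txt'
WITH (
    DATA_SOURCE = 'MySource',
    FIELDTERMINATOR = '0x01',  -- hex byte for the \u0001 column delimiter
    ROWTERMINATOR = '0x03'     -- hex byte for the \u0003 row delimiter
);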

UTF-8 file BULK INSERT with broken pipe field returns error

I'm trying to load a UTF-8 flat file with a broken pipe (¦) delimiter.
The content of the flat file is very simple and the line is ended by CRLF
AAA¦BBB¦CCC
The code is
create table l_testfile
(COL1 nvarchar(255),
COL2 nvarchar(255),
COL3 nvarchar(255)
)
BULK INSERT l_testfile
FROM 'C:\testfile.txt'
WITH (CODEPAGE = '65001', DATAFILETYPE = 'Char', FIELDTERMINATOR = '¦')
And this results in the error
Msg 4832, Level 16, State 1, Line 16
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 16
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 16
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
When changing the broken pipe (¦) to a normal pipe (|), the BULK INSERT works fine. Also, loading an ANSI file with a broken pipe does not give any error.
Am I missing something?
I don't have access to a SQL 2016 instance I can test this on right now, but I believe the issue is caused by the different ways the ¦ character is encoded between UTF-8 and your local varchar codepage.
When you specify FIELDTERMINATOR in the BULK INSERT command, you're specifying it as a varchar - but the encoding of ¦ in most single-byte codepages is 0xA6, whereas in UTF-8 it's 0xC2A6 - as a result, the terminator is never matched, which causes the error (I'm not certain but I suspect this is because the single-byte value is converted internally to the UCS2-LE representation 0x00A6).
I believe the bulk insert should work correctly if you use the bytes which make up ¦ in UTF-8 as the FIELDTERMINATOR:
BULK INSERT l_testfile
FROM 'C:\testfile.txt'
WITH (CODEPAGE = '65001', DATAFILETYPE = 'Char', FIELDTERMINATOR = '0xC2A6')
(Using | as a delimiter is successful because it's encoded as the same single byte, 0x7C, in both your varchar codepage and in UTF-8, so the terminator matches either way.)

Bulk Insert Formatting Issue from CSV File

I am doing a bulk insert from a CSV file.
In one of my columns, the values contain a colon, such as 36:21.0. For every row in this column I am getting the following error:
"Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 11, column 3 (MyColumnName)."
Does anyone know a workaround to this so that I will be able to bulk insert the columns that have a colon in the data along with the rest of my columns?
Here is my query if you are interested:
BULK INSERT dbo.[PropertyDefinition] FROM
'//MY CSV FILE PATH HERE'
WITH(
FIRSTROW = 2,
DATAFILETYPE ='char',
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
Your query is correct.
I don't think the colon is causing the problem, because the field terminator and row terminator do not include a colon.
This problem is usually caused by a data type mismatch between the file and the table.
Just make sure that the data type you are giving for column 3 matches the data type of the data in the file at row 11, column 3.
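For instance (a hypothetical sketch; the real definition of dbo.[PropertyDefinition] is not shown in the question), if column 3 is declared as a numeric, date or time type, a value such as 36:21.0 cannot be converted and produces exactly this message. Declaring the column as a character type lets the raw text load, and it can be parsed afterwards:
-- Assumption: the third column currently has a non-character type.
-- Loading it as varchar avoids the conversion error; parse the value afterwards.
ALTER TABLE dbo.[PropertyDefinition]
    ALTER COLUMN MyColumnName varchar(20);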

Bulk insert breaks when adding DATAFILETYPE='widenative'

I have this little sql script to import a semicolon separated file into a specific table of my database:
BULK
INSERT foo_bar
FROM 'C:\Users\JohnDoe\projects\foo\ftp-data-importator\bar.txt'
WITH
(
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
FIRSTROW = 2,
MAXERRORS = 100000,
ERRORFILE = 'c:\temp\foobar_bulk_log.txt'
)
GO
And it's working like a charm.
The only problem is that some special Unicode characters like ó or é are not being inserted with the file's encoding respected.
So I added the following line inside the WITH clause parentheses:
DATAFILETYPE = 'widenative'
And instead of respecting the encoding, it breaks the whole execution and gives me the following error:
Msg 4866, Level 16, State 5, Line 5
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
Msg 7301, Level 16, State 2, Line 5
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
Where is the problem?
Instead of DataFileType try using CODEPAGE=1252.
Try specifying widechar instead of widenative. Your original statement is using character mode, not native BCP format. Also, ensure the source file is saved as Unicode (UTF-16), not UTF-8.
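Putting that together, a sketch of the original statement with widechar (assuming the file is re-saved as UTF-16; if it stays ANSI/UTF-8, the CODEPAGE suggestion above is the one to try instead):
BULK INSERT foo_bar
FROM 'C:\Users\JohnDoe\projects\foo\ftp-data-importator\bar.txt'
WITH
(
    DATAFILETYPE = 'widechar',   -- character data read from a UTF-16 (Unicode) file
    FIELDTERMINATOR = ';',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    MAXERRORS = 100000,
    ERRORFILE = 'c:\temp\foobar_bulk_log.txt'
)
GO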

Bulk insert with text qualifier in SQL Server

I am trying to bulk insert a few records into a test table from a CSV file.
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
The bulk insert code should get rid of the first row and insert the data into the table. It gets rid of the first row all right, but gets confused in the delimiter section. The first column is wkt, and the column value is double quoted and has commas within the value.
So I guess my question is: is there a way to tell BULK INSERT that the double quoted part is one column, regardless of the commas within it?
The CSV file looks like this:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))", 123123.22
You need to use a 'format file' to implement a text qualifier for bulk insert. Essentially, you will need to teach the bulk insert that there are potentially different delimiters for each field.
Create a text file called "level_2.fmt" and save it.
11.0
2
1 SQLCHAR 0 8000 "\"," 1 wkt SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 40 "\r\n" 2 area SQL_Latin1_General_CP1_CI_AS
The first line, "11.0" refers to your version of SQL. The second line shows that your table, [level2_import], has two columns. Each line after that will describe a column, and obeys the following format:
[Host file field order] [Host file data type] [Prefix length] [Host file data length] [Field terminator] [Server column order] [Server column name] [Column collation]
Once you've created that file, you can read in your data with the following bulk insert statement:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FORMATFILE='D:\level_2.fmt'
);
Refer to this blog for a detailed explanation of the format file.
SQL Server 2017 finally added support for text qualifiers and the CSV format defined in RFC 4180. It should be enough to write:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', ROWTERMINATOR = '\n', FIRSTROW = 2 )
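If your file ever used a quote character other than the default double quote, FORMAT = 'CSV' also accepts a FIELDQUOTE option; a sketch with the default value spelled out, just for illustration:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', FIELDQUOTE = '"', ROWTERMINATOR = '\n', FIRSTROW = 2 )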
Try renaming the format file from a .fmt extension to .txt instead; that worked for me.
I had this issue working with LDAP data: the dn contains commas, as do other fields that contain dns. Try changing your field terminator to another, unused character, like a pipe | or a semicolon ;. Do this in both the data and the file definition.
So the code should be:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
and your CSV:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))"; 123123.22
