BULK INSERT issues with SQL Server 2012 and higher releases - sql-server

We work with a 3rd party and they provide us files that are basically a dump from their DB. Our company supports SQL Server 2012 as well as SQL Server 2014 and up. I need to BULK INSERT these files and have ONE set of files work for any client.
They provide us the files, from a UNIX system, encoded as UTF-8. I am aware that SQL Server 2012 doesn't support UTF-8. From reading on here, I have gone the route of converting those files to UTF-16 (using TextPad 8). In total there are about 22 files.
I use the following syntax:
BULK INSERT database.dbo.tablename
FROM '\\server\filename.txt'
WITH (FIRSTROW = 2, ROWTERMINATOR = '0x0a')
That of course works for all the files on the SQL Server 2014 box.
ONE file of the 22 does NOT work for SQL Server 2012 and I cannot figure out what is wrong. That particular file goes into a table defined this way:
CREATE TABLE [dbo].[Map]
(
termid int NOT NULL,
mapguid char(22) NOT NULL,
mapsequence int NOT NULL,
conceptguid char(22) NOT NULL,
mapdefnguid char(22) NOT NULL,
mapquality int NULL,
CONSTRAINT [PK_Map]
PRIMARY KEY CLUSTERED ([termid] ASC, [mapguid] ASC, [mapsequence] ASC)
) ON [PRIMARY];
This is what the sample data looks like:
termid mapguid mapsequence conceptguid mapdefnguid mapquality
260724 Nm9T2QFFs67xk2/zCgEDHw 0 AExH2wEce5u4wbhnqf4ZgQ TDMQWQE6UQdXAoATCgECyQ
172288 AW8L6AEj+br0hsZ3CgEBig 0 BgCTWgDjf6OlTk1oCwsLDQ AUKoDQEjn6KrxIAJCgEBmw
377707 PtArUQE7q1ajeoiRCgEDAQ 0 ACSYtQDsdrQtN1h2qf79/w TDMQWQE6UsYdrYAbCgECeg
Tab is the column separator and LF is the row terminator character.
This is the error I get:
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (termid).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 1 (termid).
I've searched for that error on Google (and here) and have seen that you can get it when a value is literally the string 'NULL' instead of being blank.
I've even gone so far as to create my own file and I still get the same errors. In my own file I actually populate the last row, thinking maybe that was causing issues, but the error seems to indicate it doesn't like something in the very first column.
Can anyone help me with some suggestions please?

I don't know if this is REALLY an answer, but somehow the file imports fine with UTF-8 encoding, which doesn't make a lot of sense to me since SQL Server 2012 isn't supposed to support that. I looked at the data in the table and it appears to be fine, so I don't really have an explanation there.
I then converted the file to UTF-16 and re-ran the process and started getting the above errors again, so... shrug.
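One thing worth noting, though I haven't verified it on the 2012 box (so treat it as an assumption): when the data file is UTF-16, BULK INSERT normally also needs DATAFILETYPE = 'widechar', otherwise it reads the file as single-byte characters. A sketch, using the same placeholder path as above:
BULK INSERT database.dbo.tablename
FROM '\\server\filename.txt'
WITH (FIRSTROW = 2, ROWTERMINATOR = '0x0a', DATAFILETYPE = 'widechar')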

Related

T-SQL BULK INSERT type mismatch

I am trying to do a simple BULK INSERT from a large CSV file to a table. The table and the file have matching columns. This is my code:
BULK INSERT myTable
FROM 'G:\Tests\mySource.csv'
WITH (
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
-- ROWTERMINATOR = '0x0a',
BATCHSIZE = 1000,
MAXERRORS = 2
)
GO
As you can see, I have tried row terminators \n and 0x0a (and a bunch more).
I keep getting a type mismatch error:
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 18 (createdAt).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 18 (createdAt).
Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 18 (createdAt).
Msg 4865, Level 16, State 1, Line 1
Cannot bulk load because the maximum number of errors (2) was exceeded.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 1
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
Column createdAt is of type datetime:
CREATE TABLE [dbo].[myTable]
(
...
[createdAt] [datetime] NULL,
...
)
These are the values of the createdAt column as taken from the first three rows:
2020-08-22 13:51:57
2020-08-22 14:13:13
2020-08-22 14:16:23
I also tried a different date format, as suggested, and I also tried changing the column type to DATETIME2(n):
2020-08-22T13:51:57
2020-08-22T14:13:13
2020-08-22T14:16:23
I have no idea what else to review.
I would appreciate any help.
Thanks!
SQL Server supports many string-literal formats that can be converted to dates and times - see the MSDN Books Online topics on CAST and CONVERT. Most of those formats depend on the settings you have - therefore, they might work sometimes, and sometimes not. The DATETIME datatype in particular is notoriously picky about which string formats work and which (most) don't... DATETIME2(n) is much more forgiving and less picky to deal with!
The way to solve this is to use the (slightly adapted) ISO-8601 date format that is supported by SQL Server - this format always works, regardless of your SQL Server language and dateformat settings.
The ISO-8601 format supported by SQL Server comes in two flavors:
YYYYMMDD for just dates (no time portion); note here: no dashes! That's very important, because YYYY-MM-DD is NOT independent of the dateformat settings in your SQL Server and will NOT work in all situations!
or:
YYYY-MM-DDTHH:MM:SS for dates and times - note here: this format has dashes (but they can be omitted), and a fixed T as delimiter between the date and time portion of your DATETIME.
This is valid for SQL Server 2000 and newer.
If you use SQL Server 2008 or newer and the DATE datatype (only DATE - not DATETIME!), then you can indeed also use the YYYY-MM-DD format and that will work, too, with any settings in your SQL Server.
Don't ask me why this whole topic is so tricky and somewhat confusing - that's just the way it is. But with the YYYYMMDD format, you should be fine for any version of SQL Server and for any language and dateformat setting in your SQL Server.
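A quick way to see this for yourself (a sketch you can run anywhere; TRY_CAST assumes SQL Server 2012 or later):
SET DATEFORMAT dmy;
SELECT
    TRY_CAST('2020-08-22 13:51:57' AS datetime)   AS dash_format,    -- NULL under dmy: the 22 is read as the month
    TRY_CAST('2020-08-22T13:51:57' AS datetime)   AS iso_with_T,     -- converts regardless of dateformat
    TRY_CAST('20200822' AS datetime)              AS yyyymmdd,       -- converts regardless of dateformat
    TRY_CAST('2020-08-22 13:51:57' AS datetime2)  AS as_datetime2;   -- converts regardless of dateformat
SET DATEFORMAT mdy;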
The recommendation for SQL Server 2008 and newer is to use DATE if you only need the date portion, and DATETIME2(n) when you need both date and time. You should try to start phasing out the DATETIME datatype wherever possible.
In your case, I'd try one of two things:
if you can - use DATETIME2(n) instead of DATETIME as your column's datatype - that alone might solve all your problems (a minimal sketch follows below)
if you can't use DATETIME2(n) - try to use 2020-08-22T13:51:57 instead of 2020-08-22 13:51:57 when specifying your date & time in the CSV import file.
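A minimal sketch of the first option (my assumption being that no index, constraint, or other dependency blocks the change):
ALTER TABLE dbo.myTable ALTER COLUMN createdAt datetime2(0) NULL;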

Bulk Load Data Conversion Error - Can't Find Answer

For some reason I keep receiving the following error when trying to bulk insert a CSV file into SQL Express:
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 75 (Delta_SM_RR).
Msg 4864, Level 16, State 1, Line 89
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 75 (Delta_SM_RR).
Msg 4864, Level 16, State 1, Line 89
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 4, column 75 (Delta_SM_RR).
... etc.
I have been attempting to insert this column as both decimal and numeric, and keep receiving this same error (if I take out this column, the same error appears for the subsequent column).
Please see below for an example of the data; all data points within this column contain decimals and are rounded after the third decimal point:
Delta_SM_RR
168.64
146.17
95.07
79.85
60.52
61.03
-4.11
-59.57
1563.09
354.36
114.78
253.46
451.5
Any sort of help or advice would be greatly appreciated as it seems that a number of people of SO have come across this issue. Also, if anyone knows of another automated way to load a CSV into SSMS, that would be a great help as well.
Edits:
Create Table Example_Table
(
[Col_1] varchar(255),
[Col_2] numeric(10,5),
[Col_3] numeric(10,5),
[Col_4] numeric(10,5),
[Col_5] date,
[Delta_SM_RR] numeric(10,5)
)
GO
BULK INSERT
Example_Table
FROM 'C:\pathway\file.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
FIRSTROW = 2
);
Table Schema - This is a standalone table (further calculations and additional tables are built off of this single table, however at the time of bulk insert it is the only table)
It's likely that your data has an error in it. That is, that there is a character or value that can't be converted explicitly to NUMERIC or DECIMAL. One way to check this and fix it is to
Change [Delta_SM_RR] numeric(10,5) to [Delta_SM_RR] nvarchar(256)
Run the bulk insert
Find your error row: select * from Example_Table where [Delta_SM_RR] like '%[^-.0-9]%'
Fix the data at the source, or delete from Example_Table where [Delta_SM_RR] like '%[^-.0-9]%'
The LIKE pattern in the last two statements matches rows where there is something other than a digit, period, or hyphen.
For your date column you can follow the same logic: change the column to VARCHAR, then find the rows that can't be converted by using ISDATE(). (A consolidated sketch of the whole approach follows below.)
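Putting those steps together as a consolidated sketch (using the table and column names from the question):
-- 1. in the CREATE TABLE, declare the suspect column as text: [Delta_SM_RR] nvarchar(256)
-- 2. run the same BULK INSERT as before
-- 3. find the offending rows
SELECT * FROM Example_Table WHERE [Delta_SM_RR] LIKE '%[^-.0-9]%';
-- 4. fix them at the source, or remove them
DELETE FROM Example_Table WHERE [Delta_SM_RR] LIKE '%[^-.0-9]%';
-- same idea for the date column, once it has been declared as VARCHAR:
SELECT * FROM Example_Table WHERE ISDATE([Col_5]) = 0;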
I'll bet anything there is some weird character in your data set. Open your data set in Notepad++ and view the data. Any aberration should become apparent very quickly! The problem is coming from Col75 and it's affecting the first several rows, and thus everything that comes after that also fails to load.
Make sure that .csv is not using text qualifiers and that none of your fields in the .csv have a comma inside the desired value.
I am struggling with this issue right now. The issue is that I have a 68-column report I am trying to import.
Column 17 is a "Description" column that has a double-quote text qualifier on top of the comma delimiting.
A bulk insert with a comma field terminator won't recognize the double-quote text qualifier and will munge all of the data to the right of the offending column.
It looks like, to overcome this, you need to create a .fmt file to instruct the bulk insert which columns it needs to treat as simply delimited, and which columns it needs to treat as delimited and qualified (see this answer).
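As a side note, an alternative rather than what is described above: if you happen to be on SQL Server 2017 or later, BULK INSERT also accepts FORMAT = 'CSV' together with FIELDQUOTE, which handles the double-quote qualifier without a .fmt file. A sketch against the table from the question:
BULK INSERT Example_Table
FROM 'C:\pathway\file.csv'
WITH
(
    FORMAT = 'CSV',        -- SQL Server 2017 and later only
    FIELDQUOTE = '"',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2
);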

Netezza error - Missing values in datafile cause: ERROR [HY000] Parse error in the data file

I'm currently working with the Netezza appliance v7.1 using Aginity Version 4.3.1671.22924.
I'm trying to load data from a text file by creating an external table. The text file is tab-separated and contains missing values (which are causing the error). The SQL command, which I pass to the Aginity Query Analyzer, executes 'successfully' - but when I try to view the data, the error message "Error [HY000] Parse error in the data file" is returned.
My data file looks something like this:
1 NYT 2009
2 2010
3 CTIO 2010
The first column is an ID, the second a name and the third year.
Note that the second column has a missing observation in the second row - if I fill it in, there is no error. Also, if I put the missing value in the third column, there is no error either.
CREATE EXTERNAL TABLE TESTLOAD(
id varchar(3) null,
name varchar(5) null,
year integer null
)
USING (
dataobject('address of the txt file containing data')
remotesource 'ODBC'
delimiter '\t'
fillRecord true );
select * from TESTLOAD;
The external table TESTLOAD is created successfully, but the 'select * from TESTLOAD' query cannot be executed, for some reason that involves the missing values.
My hope was that the fillRecord option would handle the missing values but I might be misunderstanding the manual there.
Does anybody know how to handle the issue?
Any help on this is much appreciated!

SQL Server 2005 Alter table column not being recognized

This is on a replicated table in SQL Server 2005
I used the following command:
ALTER TABLE dbo.apds ALTER COLUMN docket nvarchar(12) NULL;
and it executed with no errors; everything looks clean.
Column spec shows it now has 12 (was previously set to 6) on both tables, publisher and subscriber.
But when I try to put more than 6 characters in that column, I get the error:
Msg 8152, Level 16, State 13, Procedure trgapdsupdate, Line 5
String or binary data would be truncated.
I can still only write 6 characters of data to that column, even though it shows 12 as the column specification.
Any ideas?
Thank you in advance.
You say
I get the error: Msg 8152, Level 16, State 13, Procedure trgapdsupdate, Line 5:
String or binary data would be truncated. The statement has been terminated.
So what is trgapdsupdate?
From the name it looks like an update trigger on the table apds?
Does that need to be updated to deal with the new column length? For example, it might write to an audit table whose column definition also needs to be updated.
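A quick way to check is to pull up the trigger's definition (standard metadata calls, nothing specific to this schema):
EXEC sp_helptext 'trgapdsupdate';
-- or
SELECT OBJECT_DEFINITION(OBJECT_ID('trgapdsupdate'));
If it copies docket into another table or variable still declared as nvarchar(6), that would explain the truncation error.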

Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)"

I'm trying to load my database with tons of data from a .csv file that is about 1.4 GB in size, but when I run my code I get errors.
Here's my code:
USE [Intradata NYSE]
GO
CREATE TABLE CSVTest1
(Ticker varchar(10) NULL,
dateval date NULL,
timevale time(0) NULL,
Openval varchar(10) NULL,
Highval varchar(10) NULL,
Lowval varchar(10) NULL,
Closeval varchar(10) NULL,
Volume varchar(10) NULL
)
GO
BULK
INSERT CSVTest1
FROM 'c:\intramerge.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT *
FROM CSVTest1
GO
--Drop the table to clean up database.
DROP TABLE CSVTest1
GO
I'm trying to build a database with lots of stock quotes, but I get this error message:
Msg 4832, Level 16, State 1, Line 2
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
I do not understand much of SQL, but I hope to catch a thing or two. I hope someone sees what might be very obvious.
Resurrecting an old question, but in case this helps someone else: after much trial-and-error I was finally (finally!) able to get rid of this error by changing this:
ROWTERMINATOR = '\n'
To this:
ROWTERMINATOR = '0x0A'
I had the same issue.
Solution:
Verify the CSV or text file in a text editor like Notepad++. The last line might be incomplete - remove it.
I got the same error when I had a different number of delimited fields in my CSV than columns I had in my table. Check if you have the right number of fields in intramerge.csv.
Methods to determine which rows have issues:
Open the CSV in a spreadsheet, add a filter to all the data and look for empty values - those are the rows with fewer columns.
Use https://csvlint.com to create validation rules; it can detect the problems in your CSV as well.
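If you would rather stay in T-SQL, another way to spot uneven rows is to load each whole line into a single wide column and count the delimiters (a sketch - the staging table and the "impossible" field terminator are my own choices, not part of the original question):
CREATE TABLE #rawlines (line varchar(max) NULL);

BULK INSERT #rawlines
FROM 'c:\intramerge.csv'
WITH (FIELDTERMINATOR = '|~|',   -- a string that never occurs, so each line lands in one column
      ROWTERMINATOR = '0x0a');

SELECT LEN(line) - LEN(REPLACE(line, ',', '')) AS commas_per_row, COUNT(*) AS row_count
FROM #rawlines
GROUP BY LEN(line) - LEN(REPLACE(line, ',', ''));
Any group with an unexpected comma count points at the malformed rows.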
This is my solution: just give up.
I always end up using SSMS and [ Tasks > Import Data ].
I have never managed to get a real-world .csv file to import using this method. It is an utterly useless function that only works on pristine datasets that don't exist in the real world. Perhaps I've never had any luck because the datasets I deal with are quite messy and are generated by third parties.
And if it goes wrong, it doesn't give any clue as to why. Microsoft, you sadden me with your utter incompetence in this area.
Microsoft, perhaps add some error messages, so it says why it rejected it? Which line did it fail on? Which column did it fail on? It's almost impossible to fix the issue if you don't know why it failed!
This is an old question, but it seems my finding might enlighten some other people having a similar issue.
The default SSIS timeout value appears to be 30 seconds. Any service-bound or IO-bound operation in your package that runs well beyond that value causes a timeout. Increasing that timeout value (or changing it to "0" for no timeout) resolves the issue.
I got this error when my format file (i.e. specified using the FORMATFILE param) had a column width that was smaller than the actual column size (e.g. varchar(50) instead of varchar(100)).
I got this exception when the char field in my SQL table was too small for the text coming in. Try making the column bigger.
This might be a bad idea with a full 1.5GB, but you can try it on a subset (start with a few rows):
CREATE TABLE CSVTest1
(Ticker varchar(MAX) NULL,
dateval varchar(MAX) NULL,
timevale varchar(MAX) NULL,
Openval varchar(MAX) NULL,
Highval varchar(MAX) NULL,
Lowval varchar(MAX) NULL,
Closeval varchar(MAX) NULL,
Volume varchar(MAX) NULL
)
... do your BULK INSERT, then
SELECT MAX(LEN(Ticker)),
MAX(LEN(dateval)),
MAX(LEN(timevale)),
MAX(LEN(Openval)),
MAX(LEN(Highval)),
MAX(LEN(Lowval)),
MAX(LEN(Closeval)),
MAX(LEN(Volume))
FROM CSVTest1
This will help tell you if your estimates of column sizes are way off. You might also find your columns are out of order, or the BULK INSERT might still fail for some other reason.
I encountered a similar issue, but in this case the file being loaded contained some blank lines. Removing the blank lines solved it.
Alternatively, as the file was delimited, I added the correct number of delimiters to the blank lines, which again allowed the file to import successfully - use this option if the blank lines need to be loaded.
This can also happen if your file's columns are separated with ";" but you are using "," as the FIELDTERMINATOR (or the other way around).
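In other words, the WITH clause has to match the file; a sketch for a semicolon-separated file, reusing the table from the question:
BULK INSERT CSVTest1
FROM 'c:\intramerge.csv'
WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\n');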
I just want to share my solution to this. The problem was the size of the table columns; use varchar(255) and all should work.
The bulk insert will not tell you if the import values will "fit" into the field format of the target table.
For example: I tried to import decimal values into a float field. But as the values all had a comma as decimal point, it was unable to insert them into the table (it was expecting a point).
These unexpected results often happen when the provided CSV value is an export from an Excel file. Your computer's regional settings decide which decimal separator is used when saving an Excel file as a CSV, so CSVs provided by different people will give different results.
Solution: import all fields as VARCHAR, and try to deal with the values afterwards.
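A sketch of that last step, converting the staged text afterwards (TRY_CONVERT assumes SQL Server 2012 or later; the staging table and column names are placeholders, not from the question):
SELECT TRY_CONVERT(float, REPLACE(raw_value, ',', '.')) AS clean_value
FROM dbo.StagingTable;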
For anyone who happens to come across this post, my problem was a simple oversight in regard to syntax. I had this inline with some Python, and brought it straight into SSMS:
BULK
INSERT access_log
FROM '[my path]'
WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n');
The problem being, of course, the double backslashes which were needed in Python for the way I had this embedded as a string in the script. Correcting to '\t' and '\n' obviously fixed it.
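For completeness, the statement as it should end up in SQL Server (path elided as in the original):
BULK INSERT access_log
FROM '[my path]'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');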
The same happened to me. It turned out to be due to duplicate column names. Renaming the columns to be unique fixed it.
Please look at your file: if there are any special characters or spaces at the end of the file, remove them and try again.
I came across another potential reason. I got this error when a column in my table was int but the values in the CSV file contained commas (thousands separators). Changing the cells to plain number formatting let the data import.
In my case I was using a txt file to import data into SQL Server. All the columns matched and I couldn't find what was wrong. In the end it was an encoding problem.
Solution: use Notepad++ to change the file to the right encoding.
I was getting this error when trying to pass NULL for int columns, even though those columns are nullable.
So I opened the CSV file in an editor, replaced all the NULL values with empty values, and it worked.
Before Data:
636,NULL,NULL,1,5,K0007,105,NULL,2023-02-15 11:27:11.563
After Data:
636,,,1,5,K0007,105,,2023-02-15 11:27:11.563
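If editing the file by hand isn't practical, an alternative is to stage the columns as text and convert while copying (a sketch - the table and column names here are made up for illustration, and TRY_CONVERT assumes SQL Server 2012 or later):
-- stage everything as varchar via BULK INSERT into dbo.Staging_Import, then:
INSERT INTO dbo.Target_Import (qty, price)
SELECT TRY_CONVERT(int, NULLIF(qty_raw, 'NULL')),
       TRY_CONVERT(int, NULLIF(price_raw, 'NULL'))
FROM dbo.Staging_Import;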
