Workaround to BULK INSERT NULL values [SQL Server] - sql-server

I have not used SQL Server much (I usually use PostgreSQL) and I find it hard to believe/accept that one simply cannot insert NULL values from a text file using BULK INSERT if the file has a value that indicates null or missing data (NULL, NA, na, null, -, ., etc.).
I know BULK INSERT can keep NULLs if the field is empty (link), but this is not a nice solution for my case because I have > 50 files, all of them relatively big (> 25 GB), so I do not want to edit them. And I cannot find a way to tell SQL Server / BULK INSERT that a certain value should be interpreted as NULL.
This is, I would say, pretty standard when importing data from text files in most tools (e.g. COPY table_name FROM 'file_path' WITH (DELIMITER '\t', NULL 'NULL') in PostgreSQL, or readr::read_delim(file = "file", delim = "\t", na = "NULL") with the readr package in R, just to name a couple of examples).
Even more annoying is the fact that the file I want to import was actually exported from SQL Server. It seems that by default, instead of leaving NULL as empty fields in the text files, it writes the value NULL (which makes the file bigger, but anyway). So it seems very odd that the "import" feature (BULK INSERT or the bcp utility) of one tool (SQL Server) cannot properly import the files exported by default by the very same tool.
I've been googling around (link1, link2, link3, link4) and cannot find a workaround for this (other than editing my files to change NULL to empty fields, or importing everything as varchar and later working in the database to change types, and so on). So I would really appreciate any ideas.
For the sake of a reproducible example, here is a sample table where I want to import this sample data stored in a text file:
Sample table:
CREATE TABLE test
(
[item] [varchar](255) NULL,
[price] [int] NULL
)
Sample data stored in file.txt:
item1, 34
item2, NULL
item3, 55
Importing the data ...
BULK INSERT test
FROM 'file.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
But this fails because on the second line it finds NULL for an integer field. This field, however, allows NULL values. So I want it to understand that this is just a NULL value and not a character value.
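For what it's worth, here is a rough sketch of the "import everything as varchar, then convert" workaround mentioned above (the staging table name is made up, and TRY_CONVERT needs SQL Server 2012 or later):

CREATE TABLE test_staging
(
    [item] [varchar](255) NULL,
    [price] [varchar](255) NULL  -- load the price as plain text first
)

BULK INSERT test_staging
FROM 'file.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')

INSERT INTO test (item, price)
SELECT item,
       -- the literal text NULL becomes a real NULL; everything else is converted to int
       TRY_CONVERT(int, NULLIF(LTRIM(RTRIM(price)), 'NULL'))
FROM test_staging

This still means touching every row twice inside the database, but it avoids editing the 25 GB files themselves.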

Related

Insert fixed values using BCP

I'm trying to import a TXT file using bcp.
My TXT file is like this:
abc|cba
xyz|zyx
My Table is like this:
Field_1 -> Identity field
Field_2 -> Varchar(3)
Field_3 -> Varchar(3)
Field_4 -> Varchar(1) In this case I must set the default value 'P'
Field_5 -> Varchar(1) In this case I must set the default value 'C'
My table with values must be:
1,abc,cba,P,C
2,xyz,zyx,P,C
Note: My TXT file is huge (around 200 GB); I can't import it into another table and then pass all the values to this table (just saying).
Version: SQL Server 2014 (SP2)
You cannot generate data via bcp; you must depend on SQL Server to do that, as Jeroen commented. To add to his comment: the identity value is not a default, so you should continue to use the identity property of the column.
For both (identity and default), you must use the -f option of bcp. This is the option to include a format file, which directs the bcp utility to see and handle the data as stated in the format file.
Using a format file, you can specify which fields in the file are mapped to which columns in the destination table. To exclude a column, just set its destination value to "0".
Format files and the bcp utility are much larger topics in and of themselves, but to answer your question: yes, it is possible, and using a format file with modified destination values (set to "0") is the way to do it.
Doing this, you can process the data once. Using PowerShell to append data is possible, but unnecessary and less efficient. To do this in one action with bcp, you need a format file.
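For illustration only, a non-XML format file for this case might look roughly like the following (the version number matches SQL Server 2014; the lengths, names, and terminators are assumptions based on the question). The two file fields are mapped to server columns 2 and 3; the identity column and the two defaulted columns are simply not referenced, so SQL Server populates them:

12.0
2
1   SQLCHAR   0   3   "|"      2   Field_2   ""
2   SQLCHAR   0   3   "\r\n"   3   Field_3   ""

It would then be used with something like this (server, database, table, and file names are placeholders):

bcp MyDatabase.dbo.MyTable in MyData.txt -f MyFormat.fmt -T -S MyServer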

NULL Value Handling for CSV Files Via External Tables in Snowflake

I am trying to get the NULL_IF parameter of a file format working when applied to an external table.
I have a source CSV file containing NULL values in some columns. NULLs in the source file appear in the format "\N" (all non-numeric values in the file are quoted). Here is an example line from the raw CSV where the ModifiedOn value is NULL in the source system:
"AirportId" , "IATACode" , "CreatedOn" , "ModifiedOn"
1 , "ACU" , "2015-08-25 16:58:45" , "\N"
I have a file format defined including the parameter NULL_IF = "\\N"
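For reference, such a file format might be defined roughly like this (everything other than NULL_IF is an assumption about the file layout):

CREATE OR REPLACE FILE FORMAT CSV_1
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('\\N');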
The following select statement successfully interprets the correct rows as holding NULL values.
SELECT $8
FROM @MyS3Bucket
(
file_format => 'CSV_1',
pattern => '.*MyFileType.*.csv.gz'
)
However if I use the same file format with an external table like this:
CREATE OR REPLACE EXTERNAL TABLE MyTable
(
MyColumn varchar as (value:c8::varchar)
)
WITH LOCATION = @MyS3Bucket
FILE_FORMAT = (FORMAT_NAME = 'CSV_1')
PATTERN = '.*MyFileType_.*.csv.gz';
Each row holds \N as a value rather than NULL.
I assume this is caused by external tables providing a single variant output that can then be further split rather than directly presenting individual columns in the csv file.
One solution is to code the NULL handling into the external view like this:
CREATE OR REPLACE EXTERNAL TABLE MyTable
(
MyColumn varchar as (NULLIF(value:c8::varchar, '\\N'))
)
WITH LOCATION = @MyS3Bucket
FILE_FORMAT = (FORMAT_NAME = 'CSV_1')
PATTERN = '.*MyFileType_.*.csv.gz';
However, this leaves me at risk of having to rewrite a lot of external table code if the file format changes, whereas the file format could/should centralise that NULL definition. It would also mean the NULL conversion has to be handled column by column rather than file by file, increasing code complexity.
Is there a way that I can have the NULL values appear through an external table without handling them explicitly through column definitions?
Ideally this would be applied through a file format object but changes to the format of the raw file are not impossible.
I am able to reproduce the issue, and it seems like a bug. If you have access to Snowflake support, it would be better to submit a support case regarding this issue, so you can easily follow the process.

BULK INSERT from CSV into SQL Server causes error

I've got a simple table in CSV format:
999,"01/01/2001","01/01/2001","7777777","company","channel","01/01/2001"
990,"01/01/2001","01/01/2001","767676","hhh","tender","01/01/2001"
3838,"01/01/2001","01/01/2001","888","jhkh","jhkjh","01/01/2001"
08987,"01/01/2001","01/01/2001","888888","hkjhjkhv","jhgjh","01/01/2001"
8987,"01/01/2001","01/01/2001","9999","jghg","hjghg","01/01/2001"
jhkjhj,"01/01/2001","01/01/2001","9999","01.01.2001","hjhh","01/01/2001"
090009,"","","77777","","","01/01/2001"
980989,"01/01/2001","01/01/2001","888","","jhkh","01/01/2001"
0000,"01/01/2001","01/01/2001","99999","jhjh","","01/01/2001"
92929,"01/01/2001","01/01/2001","222","","","01/01/2001"
I'm trying to import that data into SQL Server using BULK INSERT (Transact-SQL)
set dateformat DMY;
BULK INSERT Oracleload
FROM '\\Mac\Home\Desktop\Test\T_DOGOVOR.csv'
WITH
(FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
KEEPNULLS);
The output gives me the following error:
Msg 4864, Level 16, State 1, Line 4
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (date_begin)....
Maybe something is wrong with the date format. But what script do I need to write to fix that error?
Please help.
Thanks in advance.
Neither BULK INSERT nor bcp can (properly) handle CSV files, especially if they have (correct) " quotes. Alternatives are SSIS or PowerShell.
I always look at the data in Notepad++ to see if there are some weird characters or non-printable characters, like a line break or something. For this, it seems you could open the file in Notepad (if you don't have Notepad++), do a find-and-replace of " with nothing, save the file, and re-do the bulk load.
This record:
jhkjhj,"01/01/2001","01/01/2001","9999","01.01.2001","hjhh","01/01/2001"
The first column has a numeric type of some kind. You can't put the jhkjhj value into that field.
Additionally, some records have empty values ("") in date fields. These are likely to be interpreted as empty strings rather than null dates, and will not convert properly.
But the error refers to "row 1, column 2". That's this value:
"01/01/2001"
Again, the import is interpreting this as a string, rather than a date. I suspect it's trying to import the quotes (") instead of just using them as separators.
You might try bulk loading into a special holding table and then re-importing from there. Alternatively, you can change how the data is exported, or write a program to pre-clean it: strip the quotes from fields that shouldn't have them, and isolate records whose data won't insert into an exception file and report on them.
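A rough sketch of the holding-table approach (column names other than date_begin are placeholders, and TRY_CONVERT needs SQL Server 2012 or later):

CREATE TABLE OracleloadStaging
(
    col1       varchar(100) NULL,
    date_begin varchar(100) NULL,
    col3       varchar(100) NULL,
    col4       varchar(100) NULL,
    col5       varchar(100) NULL,
    col6       varchar(100) NULL,
    col7       varchar(100) NULL
);

BULK INSERT OracleloadStaging
FROM '\\Mac\Home\Desktop\Test\T_DOGOVOR.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Strip the quotes and convert; rows where TRY_CONVERT returns NULL did not
-- parse as dd/mm/yyyy dates and can be routed to an exception table instead.
SELECT TRY_CONVERT(date, REPLACE(date_begin, '"', ''), 103) AS date_begin
FROM OracleloadStaging;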

Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)"

I'm trying to load my database with tons of data from a 1.4 GB .csv file, but when I run my code I get errors.
Here's my code:
USE [Intradata NYSE]
GO
CREATE TABLE CSVTest1
(Ticker varchar(10) NULL,
dateval date NULL,
timevale time(0) NULL,
Openval varchar(10) NULL,
Highval varchar(10) NULL,
Lowval varchar(10) NULL,
Closeval varchar(10) NULL,
Volume varchar(10) NULL
)
GO
BULK
INSERT CSVTest1
FROM 'c:\intramerge.csv'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
--Check the content of the table.
SELECT *
FROM CSVTest1
GO
--Drop the table to clean up database.
DROP TABLE CSVTest1
GO
I'm trying to build a database with lots of stock quotes, but I get this error message:
Msg 4832, Level 16, State 1, Line 2
Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error. The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)"
I do not understand much of SQL, but I hope to catch a thing or two. Hope someone sees what might be very obvious.
Resurrecting an old question, but in case this helps someone else: after much trial-and-error I was finally (finally!) able to get rid of this error by changing this:
ROWTERMINATOR = '\n'
To this:
ROWTERMINATOR = '0x0A'
(0x0A is a bare line feed; this typically matters when the file has Unix-style LF line endings, because a '\n' row terminator is treated as \r\n by bulk import.)
I had the same issue.
Solution:
Verify the CSV or text file in a text editor like Notepad++. The last line might be incomplete; remove it.
I got the same error when I had a different number of delimited fields in my CSV than columns I had in my table. Check if you have the right number of fields in intramerge.csv.
Methods to determine rows with issues:
Open the CSV in a spreadsheet, add a filter to all the data, and look for empty values; those are the rows with fewer columns.
Use https://csvlint.com to create your validation rules; it can detect problems in your CSV as well.
This is my solution: just give up.
I always end up using SSMS and [ Tasks > Import Data ].
I have never managed to get a real-world .csv file to import using this method. It is an utterly useless function that only works on pristine datasets that don't exist in the real world. Perhaps I've never had any luck because the datasets I deal with are quite messy and generated by third parties.
And if it goes wrong, it doesn't give any clue as to why. Microsoft, you sadden me with your utter incompetence in this area.
Microsoft, perhaps add some error messages, so it says why it rejected it? Which line did it fail on? Which column did it fail on? It's almost impossible to fix the issue if you don't know why it failed!
This is an old question, but it seems my finding might enlighten some other people having a similar issue.
The default SSIS timeout value appears to be 30 seconds. Any service-bound or IO-bound operation in your package that runs well beyond that value causes a timeout. Increasing the timeout (change it to "0" for no timeout) resolves the issue.
I got this error when my format file (i.e. specified using the FORMATFILE param) had a column width that was smaller than the actual column size (e.g. varchar(50) instead of varchar(100)).
I got this exception when the char field in my SQL table was too small for the text coming in. Try making the column bigger.
This might be a bad idea with a full 1.5GB, but you can try it on a subset (start with a few rows):
CREATE TABLE CSVTest1
(Ticker varchar(MAX) NULL,
dateval varchar(MAX) NULL,
timevale varchar(MAX) NULL,
Openval varchar(MAX) NULL,
Highval varchar(MAX) NULL,
Lowval varchar(MAX) NULL,
Closeval varchar(MAX) NULL,
Volume varchar(MAX) NULL
)
... do your BULK INSERT, then
SELECT MAX(LEN(Ticker)),
MAX(LEN(dateval)),
MAX(LEN(timevale)),
MAX(LEN(Openval)),
MAX(LEN(Highval)),
MAX(LEN(Lowval)),
MAX(LEN(Closeval)),
MAX(LEN(Volume))
FROM CSVTest1
This will help tell you if your estimates of the column sizes are way off. You might also find your columns are out of order, or the BULK INSERT might still fail for some other reason.
I encountered a similar issue, but in this case the file being loaded contained some blank lines. Removing the blank lines solved it.
Alternatively, as the file was delimited, I added the correct number of delimiters to the blank lines, which again allowed the file to import successfully - use this option if the blank lines need to be loaded.
This can also happen if your file's columns are separated with ";" but you are using "," as the FIELDTERMINATOR (or the other way around).
I just want to share my solution to this. The problem was the size of the table columns; use varchar(255) and all should work.
The bulk insert will not tell you if the import values will "fit" into the field format of the target table.
For example: I tried to import decimal values into a float field. But as the values all had a comma as decimal point, it was unable to insert them into the table (it was expecting a point).
These unexpected results often happen when the provided CSV value is an export from an Excel file. Your computer's regional settings decide which decimal separator is used when saving an Excel file as CSV, so CSVs provided by different people will produce different results.
Solution: import all fields as VARCHAR, and try to deal with the values afterwards.
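As a rough illustration of that approach (the staging table and column names are made up; TRY_CONVERT needs SQL Server 2012 or later):

CREATE TABLE dbo.ImportStaging (raw_price varchar(50) NULL);
INSERT INTO dbo.ImportStaging VALUES ('1234,56'), ('7,5');

-- Normalise the decimal separator after the load, then convert.
SELECT TRY_CONVERT(float, REPLACE(raw_price, ',', '.')) AS price
FROM dbo.ImportStaging;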
For anyone who happens to come across this post, my problem was a simple oversight in regard to syntax. I had this inline with some Python, and brought it straight into SSMS:
BULK
INSERT access_log
FROM '[my path]'
WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n');
The problem being, of course, the double backslashes which were needed in Python for the way I had this embedded as a string in the script. Correcting to '\t' and '\n' obviously fixed it.
The same happened to me. It turns out this was due to duplicate column names. I renamed the columns to be unique and it works fine.
Please look at your file; if there are any special characters or spaces at the end of the file, remove them and try again.
I came across another potential reason. I got this error when a column in my table was an int but the values in the CSV file contained commas. Changing them to plain number formatting imported the data.
In my case I was using a txt file to import data into SQL Server. All the columns matched and I couldn't find what was wrong. In the end, it was an encoding problem.
Solution: use Notepad++ to change the file to the right encoding.
I was getting this error when I tried to pass NULL for int columns, even though those columns are nullable.
So I opened the csv file in an editor and replaced all NULL values with empty values, and it worked.
Before Data:
636,NULL,NULL,1,5,K0007,105,NULL,2023-02-15 11:27:11.563
After Data:
636,,,1,5,K0007,105,,2023-02-15 11:27:11.563

Oracle considers empty strings to be NULL while SQL Server does not - how is this best handled?

I have to write a component that re-creates SQL Server tables (structure and data) in an Oracle database. This component also has to take new data entered into the Oracle database and copy it back into SQL Server.
Translating the data types from SQL Server to Oracle is not a problem. However, a critical difference between Oracle and SQL Server is causing a major headache. SQL Server considers a blank string ("") to be different from a NULL value, so a char column can be defined as NOT NULL and yet still include blank strings in the data.
Oracle considers a blank string to be the same as a NULL value, so if a char column is defined as NOT NULL, you cannot insert a blank string. This is causing my component to break whenever a NOT NULL char column contains a blank string in the original SQL Server data.
So far my solution has been to not use NOT NULL in any of my mirror Oracle table definitions, but I need a more robust solution. This has to be a code solution, so the answer can't be "use so-and-so's SQL2Oracle product".
How would you solve this problem?
Edit: here is the only solution I've come up with so far, and it may help to illustrate the problem. Because Oracle doesn't allow "" in a NOT NULL column, my component could intercept any such value coming from SQL Server and replace it with "#" (just for example).
When I add a new record to my Oracle table, my code has to write "#" if I really want to insert a "", and when my code copies the new row back to SQL Server, it has to intercept the "#" and instead write "".
I'm hoping there's a more elegant way.
Edit 2: Is it possible that there's a simpler solution, like some setting in Oracle that gets it to treat blank strings the same way all the other major databases do? And would this setting also be available in Oracle Lite?
I don't see an easy solution for this.
Maybe you can store your values as one or more blanks -> ' ', which aren't NULLS in Oracle, or keep track of this special case through extra fields/tables, and an adapter layer.
My typical solution would be to add a constraint in SQL Server forcing all string values in the affected columns to have a length greater than 0:
CREATE TABLE Example (StringColumn VARCHAR(10) NOT NULL)
ALTER TABLE Example
ADD CONSTRAINT CK_Example_StringColumn CHECK (LEN(StringColumn) > 0)
However, as you have stated, you have no control over the SQL Database. As such you really have four choices (as I see it):
Treat empty string values as invalid, skip those records, alert an operator and log the records in some manner that makes it easy to manually correct / re-enter.
Convert empty string values to spaces.
Convert empty string values to a code (i.e. "LEGACY" or "EMPTY").
Rollback transfers that encounter empty string values in these columns, then put pressure on the SQL Server database owner to correct their data.
Number four would be my preference, but it isn't always possible. The action you take will really depend on what the Oracle users need. Ultimately, if nothing can be done about the SQL Server database, I would explain the issue to the Oracle business system owners, explain the options and consequences, and let them make the decision. :)
NOTE: I believe in this case SQL Server actually exhibits the "correct" behaviour.
Do you have to permit empty strings in the SQL Server system? If you can add a constraint to the SQL Server system that disallows empty strings, that is probably the easiest solution.
It's nasty and could have unexpected side effects... but you could just insert chr(0) rather than ''.
drop table x
drop table x succeeded.
create table x ( id number, my_varchar varchar2(10))
create table succeeded.
insert into x values (1, chr(0))
1 rows inserted
insert into x values (2, null)
1 rows inserted
select id,length(my_varchar) from x
ID LENGTH(MY_VARCHAR)
---------------------- ----------------------
1 1
2
2 rows selected
select * from x where my_varchar is not null
ID MY_VARCHAR
---------------------- ----------
1
NOT NULL is a database constraint used to stop invalid data from being put into your database. It is not serving any purpose in your Oracle database, so I would not have it.
I think you should just continue to allow NULLS in any Oracle column that mirrors a SqlServer column that is known to contain empty strings.
If there is a logical difference in the SqlServer database between NULL and empty string, then you would need something extra to model this difference in Oracle.
I'd go with an additional column on the oracle side. Have your column allow nulls and have a second column that identifies whether the SQL-Server side should get a null-value or empty-string for the row.
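A minimal Oracle-side sketch of that idea (all names are illustrative):

CREATE TABLE example_mirror (
    string_column    VARCHAR2(10),                  -- NULL allowed on the Oracle side
    string_was_empty CHAR(1) DEFAULT 'N' NOT NULL   -- 'Y' means SQL Server should get '' back, not NULL
);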
For those who think a NULL and an empty string should be considered the same: a NULL has a different meaning from an empty string. It captures the difference between 'undefined' and 'known to be blank'. As an example, a record may have been automatically created but never validated by user input, and thus receives a NULL in the expectation that when a user validates it, it will be set to empty. Practically, we may not want to trigger logic on a NULL but may want to on an empty string. This is analogous to the case of a three-state checkbox of Yes/No/Undefined.
Neither SQL Server nor Oracle has got it entirely right. A blank should not satisfy a NOT NULL constraint, and there is a need for an empty string to be treated differently from a NULL.
If you are migrating data you might have to substitute a space for an empty string. Not very elegant, but workable. This is a nasty "feature" of Oracle.
A while ago I wrote an explanation of how Oracle handles null values on my blog. Check it here: http://www.psinke.nl/blog/hello-world/ and let me know if you have any more questions.
If you have data from a source with empty values and you must convert to an Oracle database where columns are NOT NULL, there are 2 things you can do:
remove the not null constraint from the Oracle column
Check, for each individual column, whether it is acceptable to place a ' ', 0, or dummy date in the column in order to be able to save your data.
Well, the main point I'd consider is the absence of cases where some field can be NULL, the same field can be an empty string, and the business logic requires distinguishing these values. Assuming there are none, I'd use this logic:
check MSSQL if column has NOT NULL constraint
check MSSQL if column has CHECK(column <> '') or similar constraint
If both are true, make the Oracle column NOT NULL. If only one is true, make the Oracle column nullable. If neither is true, raise an INVALID DESIGN exception (or maybe ignore it, if that is acceptable for this application).
When sending data from MSSQL to Oracle, just do nothing special; all data will be transferred correctly. When sending data back to MSSQL, any non-null data should be sent as is. For null strings you should decide whether they should be inserted as NULL or as an empty string. To do this, check the table design again (or remember the previous result) and see if the column has a NOT NULL constraint. If it has one, use an empty string; if not, use NULL. Simple and clever.
Sometimes, if you work with an unknown and unpredictable application, you cannot check for the existence of a "not empty string" constraint because it can take various forms. If so, you can either use the simplified logic (make Oracle columns always nullable) or check whether you can insert an empty string into the MSSQL table without error.
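A rough T-SQL sketch of that check (the table name is a placeholder); the returned definition text still has to be inspected to see whether a given CHECK actually enforces column <> '':

SELECT c.name        AS column_name,
       c.is_nullable,
       cc.name       AS constraint_name,
       cc.definition AS check_definition
FROM sys.columns AS c
LEFT JOIN sys.check_constraints AS cc
       ON cc.parent_object_id = c.object_id
      AND (cc.parent_column_id = c.column_id OR cc.parent_column_id = 0)  -- 0 = table-level constraint
WHERE c.object_id = OBJECT_ID('dbo.Example');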
Although, for the most part, I agree with most of the other responses (not going to get into an argument about any I disagree with - not the place for that :) )
I do notice that OP mentioned the following:
"Oracle considers a blank string to be the same as a NULL value, so if a char column is defined as NOT NULL, you cannot insert a blank string."
Specifically calling out CHAR, and not VARCHAR2.
Hence, talking about an "empty string" of length 0 (i.e. '') is moot.
If he's declared the CHAR as, for example, CHAR(5), then just add a space to the empty string coming in; Oracle is going to pad it anyway. You'll end up with a five-space string.
Now, if OP meant VARCHAR2, well yeah, that's a whole other beast, and yeah, the difference between empty string and NULL becomes relevant.
SQL> drop table junk;
Table dropped.
SQL>
SQL> create table junk ( c1 char(5) not null );
Table created.
SQL>
SQL> insert into junk values ( 'hi' );
1 row created.
SQL>
SQL> insert into junk values ( ' ' );
1 row created.
SQL>
SQL> insert into junk values ( '' );
insert into junk values ( '' )
*
ERROR at line 1:
ORA-01400: cannot insert NULL into ("GREGS"."JUNK"."C1")
SQL>
SQL> insert into junk values ( rpad('', 5, ' ') );
insert into junk values ( rpad('', 5, ' ') )
*
ERROR at line 1:
ORA-01400: cannot insert NULL into ("GREGS"."JUNK"."C1")
SQL>
SQL> declare
2 lv_in varchar2(5) := '';
3 begin
4 insert into junk values ( rpad(lv_in||' ', 5) );
5 end;
6 /
PL/SQL procedure successfully completed.
SQL>

Resources