I am trying to import a CSV file (though it is '|' delimited) using BULK INSERT. The problem is that some columns are wrapped in double quotes as a text qualifier, but this is not consistent across all rows in the same column. Inside the double quotes the column delimiter '|' can appear as part of the text. For example:
Col1|Col2|Col3
1|text1|101
2|"text2|text2a"|102
3|text3|103
4|"text4"|104
In this case I can't use a format file to specify the column delimiter, because Col2 does not have a consistent delimiter (it is either | or |").
However, if I import via the Import Wizard and select '"' as the text qualifier, the wizard successfully sorts everything into columns, recognizing '"' as a text qualifier rather than as part of the column delimiter ("| or |"). Is there an equivalent way to do this with BULK INSERT?
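For reference: on SQL Server 2017 and later, BULK INSERT has a native CSV mode that supports a text qualifier, much like the wizard does. A minimal sketch, assuming a placeholder table name and file path:

BULK INSERT dbo.TargetTable
FROM 'C:\Data\pipe_delimited.csv'
WITH (
    FORMAT = 'CSV',         -- RFC 4180-style parsing, SQL Server 2017+
    FIELDQUOTE = '"',       -- text qualifier; '"' is also the default in CSV mode
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2            -- skip the Col1|Col2|Col3 header
);

With FORMAT = 'CSV', a '|' inside a quoted value such as "text2|text2a" is treated as data rather than as a column delimiter. On older versions, a format file or a staging table is the usual workaround.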
Related
I'm trying to import data from a .csv file into a SQL Server table.
Using the code below, I can read from the file:
BULK INSERT #TempTable
FROM '\\Data\TestData\ImportList.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2, LASTROW = 3)
GO
(I added LastRow = 3 so I was just getting a subset of the data rather than dealing with all 2000 rows)
But I am getting multiple columns into a single column:
If I use the Import/Export wizard in SSMS, with the below settings, I see the expected results in the preview:
Can anyone give me some pointers as to how I need to update my query to make it work correctly?
Here is a sample of what the CSV data looks like:
TIA.
You probably need to specify " as Text qualifier.
Your fields seem to be quoted and most likely contain commas, which are currently splitting your fields.
Or, if it works fine using <none> as Text qualifier, try to use FIELDQUOTE = '' or FIELDQUOTE = '\b' in your query. FIELDQUOTE defaults to '"'.
It's hard to tell what's really wrong without looking at some raw csv data that includes those quotes (as seen in your first screenshot).
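If the server is SQL Server 2017 or later, the qualifier can also be set directly in the BULK INSERT statement. A sketch based on the query above (FIRSTROW and LASTROW simply carried over):

BULK INSERT #TempTable
FROM '\\Data\TestData\ImportList.csv'
WITH (
    FORMAT = 'CSV',        -- CSV parsing mode, SQL Server 2017+
    FIELDQUOTE = '"',      -- treat "..." as a text qualifier
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 2,
    LASTROW = 3
);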
I have not used SQL Server much (I usually use PostgreSQL) and I find it hard to believe/accept that one simply cannot insert NULL values from a text file using BULK INSERT when the file has a value that indicates null or missing data (NULL, NA, na, null, -, ., etc.).
I know BULK INSERT can keep NULL if the field is empty (link), but that is not a nice solution for my case because I have > 50 files, all of them relatively big (> 25 GB), so I do not want to edit them. And I cannot find a way to tell SQL Server / BULK INSERT that a certain value should be interpreted as NULL.
This is, I would say, pretty standard in importing data from text files in most tools. (e.g. COPY table_name FROM 'file_path' WITH (DELIMITER '\t', NULL 'NULL') in PostgreSQL, or readr::read_delim(file = "file", delim = "\t", na = "NULL") in R and the readr package, just to name a couple of examples).
Even more annoying is the fact that the file I want to import was actually exported from SQL Server. It seems that by default, instead of leaving NULL as empty fields in the text files, it writes the value NULL (which makes the file bigger, but anyway). So it seems very odd that the "import" feature (BULK INSERT or the bcp utility) of one tool (SQL Server) cannot properly import the files exported by default by the very same tool.
I've been googling around (link1, link2, link3, link4) and cannot find a workaround for this (other than editing my files to replace NULL with empty fields, or importing everything as varchar and later working in the database to change types and so on). So I would really appreciate any ideas.
For the sake of a reproducible example, here is a sample table where I want to import this sample data stored in a text file:
Sample table:
CREATE TABLE test
(
[item] [varchar](255) NULL,
[price] [int] NULL
)
Sample data stored in file.txt:
item1, 34
item2, NULL
item3, 55
Importing the data ...
BULK INSERT test
FROM 'file.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
But this fails because on the second line it finds NULL for an integer field. This field, however, allows NULL values. So I want it to understand that this is just a NULL value and not a character value.
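BULK INSERT itself has no equivalent of PostgreSQL's NULL 'NULL' option, so the literal has to be converted on the way in. One sketch (admittedly a variant of the staging workaround mentioned above) pulls the raw text through OPENROWSET(BULK ...) with a small non-XML format file so that both fields arrive as character data. The format file below, file.fmt, is an assumption about the layout, and its 12.0 version line must match your SQL Server/bcp version:

12.0
2
1   SQLCHAR   0   255   ","      1   item    SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   50    "\r\n"   2   price   SQL_Latin1_General_CP1_CI_AS

The literal NULL can then be mapped to a real NULL as the data is inserted:

INSERT INTO test (item, price)
SELECT item,
       NULLIF(LTRIM(price), 'NULL')   -- literal NULL becomes a real NULL; ' 34' still converts to int
FROM OPENROWSET(BULK 'file.txt', FORMATFILE = 'file.fmt') AS raw;

Adjust the row terminator in the format file ("\r\n" vs "\n") to match how the file was produced.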
I've got a simple table in CSV format:
999,"01/01/2001","01/01/2001","7777777","company","channel","01/01/2001"
990,"01/01/2001","01/01/2001","767676","hhh","tender","01/01/2001"
3838,"01/01/2001","01/01/2001","888","jhkh","jhkjh","01/01/2001"
08987,"01/01/2001","01/01/2001","888888","hkjhjkhv","jhgjh","01/01/2001"
8987,"01/01/2001","01/01/2001","9999","jghg","hjghg","01/01/2001"
jhkjhj,"01/01/2001","01/01/2001","9999","01.01.2001","hjhh","01/01/2001"
090009,"","","77777","","","01/01/2001"
980989,"01/01/2001","01/01/2001","888","","jhkh","01/01/2001"
0000,"01/01/2001","01/01/2001","99999","jhjh","","01/01/2001"
92929,"01/01/2001","01/01/2001","222","","","01/01/2001"
I'm trying to import that data into SQL Server using BULK INSERT (Transact-SQL)
set dateformat DMY;
BULK INSERT Oracleload
FROM '\\Mac\Home\Desktop\Test\T_DOGOVOR.csv'
WITH
(FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
KEEPNULLS);
I get the following error in the output:
Msg 4864, Level 16, State 1, Line 4
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (date_begin)....
Maybe something is wrong with the date format. But what script do I need to write to fix that error?
Please help.
Thanks in advance.
Neither BULK INSERT nor bcp can (properly) handle CSV files, especially if they have (correct) " quotes. Alternatives are SSIS or PowerShell.
I always look at the data in Notepad++ to see if there are any weird or non-printable characters, like a stray line break. For this file, it seems you can open it in Notepad (if you don't have Notepad++), do a find-and-replace of " with nothing, save the file, and re-run the bulk load.
This record:
jhkjhj,"01/01/2001","01/01/2001","9999","01.01.2001","hjhh","01/01/2001"
The first column has a numeric type of some kind. You can't put the jhkjhj value into that field.
Additionally, some records have empty values ("") in date fields. These are likely to be interpreted as empty strings rather than null dates, and will not convert properly.
But the error refers to "row 1, column 2". That's this value:
"01/01/2001"
Again, the import is interpreting this as a string, rather than a date. I suspect it's trying to import the quotes (") instead of just using them as separators.
You might try bulk loading into a special holding table and then re-importing from there. Alternatively, you can change how the data is exported, or write a program to pre-clean it: strip the quotes from fields that shouldn't have them, and isolate records with data that won't insert into an exception file for reporting.
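If you go the holding-table route, a rough sketch might look like this. The staging column names and widths are guesses from the sample rows, the target column order is assumed to match the CSV, and TRY_CONVERT requires SQL Server 2012 or later:

CREATE TABLE Oracleload_staging
(
    col1 varchar(50), date_begin varchar(50), date_end varchar(50),
    col4 varchar(50), col5 varchar(255), col6 varchar(255), col7 varchar(50)
);

BULK INSERT Oracleload_staging
FROM '\\Mac\Home\Desktop\Test\T_DOGOVOR.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Strip the quotes, turn empty strings into NULLs, and let TRY_CONVERT
-- return NULL instead of failing for values that are not valid dates or numbers.
SET DATEFORMAT DMY;
INSERT INTO Oracleload
SELECT TRY_CONVERT(int,  REPLACE(col1, '"', '')),
       TRY_CONVERT(date, NULLIF(REPLACE(date_begin, '"', ''), '')),
       TRY_CONVERT(date, NULLIF(REPLACE(date_end,   '"', ''), '')),
       REPLACE(col4, '"', ''),
       REPLACE(col5, '"', ''),
       REPLACE(col6, '"', ''),
       TRY_CONVERT(date, NULLIF(REPLACE(col7, '"', ''), ''))
FROM Oracleload_staging;

Rows where TRY_CONVERT comes back NULL (like the jhkjhj one) can then be copied to an exception table and reported on before the staging data is cleared.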
I am running SQL Server 2008 and using the BULK INSERT command. While inserting the data, I am trying to remove the double quotes (") from the CSV file. This works partially, but it doesn't work for all the records; please check my code and the screenshot of the result.
Bulk Insert tblUsersXTemp
from 'C:\FR0250Members161212_030818.csv'
WITH (FIELDTERMINATOR = '","',
ROWTERMINATOR = '"\n"',
--FormatFile =''
ERRORFILE = 'C:\bulk_insert_BadData.txt')
After you do the bulk insert, you could replace the double quotes.
UPDATE tblUsersXTemp
SET usxMembershipID = REPLACE(usxMembershipID, CHAR(34), '')
I believe you need a format file; that's what I think is going on.
If you use the BULK INSERT command without a format file, you will end up with a quotation mark prefix on the first column value and a quotation mark suffix on the last column value of every row.
Reference
Example from reference:
BULK INSERT tblPeople
FROM 'bcp.txt'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n',
FORMATFILE = 'bcp.fmt');
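The reference doesn't show the contents of bcp.fmt. For a fully quoted file, the usual non-XML format file trick is an extra dummy field that consumes the leading quote of each row, roughly like the sketch below; the column names and widths are placeholders, and the 10.0 version line must match your SQL Server/bcp version:

10.0
4
1   SQLCHAR   0   2     "\""       0   FIRST_QUOTE   ""
2   SQLCHAR   0   100   "\",\""    1   Col1          SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   100   "\",\""    2   Col2          SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   100   "\"\r\n"   3   Col3          SQL_Latin1_General_CP1_CI_AS

Field 1 maps to table column 0, so the opening quote of each row is read and thrown away; the terminators on the remaining fields include the surrounding quotes, so those are stripped from every other value as well.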
You could also potentially have dirty data that uses quotes for more than just delimiters.
I'm looking to do a batch load into a table called temp_data, where some of the columns are nullable dates.
Here is what I have so far:
LOAD TABLE some.temp_data
(SomeIntegerColumn ',', SomeDateColumn DATE('YYYYMMDD') NULL('NULL'), FILLER(1), SomeStringColumn ',')
USING CLIENT FILE '{0}' ESCAPES OFF DELIMITED BY ',' ROW DELIMITED BY '#'
and I'm trying to load the following file:
500,NULL,Monthly#
500,NULL,Monthly#
500,NULL,Monthly#
Unfortunately the error I get is:
ERROR [07006] [Sybase][ODBC Driver][Sybase IQ]Cannot convert NULL,Mon
to a date (column SomeDateColumn)
Any ideas why this wouldn't work?
It appears that it's reading the 8 characters following the first delimiter and trying to interpret them as a date.
Try switching to FORMAT BCP. The following example could work on your sample file:
LOAD TABLE some.temp_data (
SomeIntegerColumn
, SomeDateColumn NULL('NULL')
, SomeStringColumn
)
USING CLIENT FILE '{0}'
ESCAPES OFF
DELIMITED BY ','
ROW DELIMITED BY '#'
FORMAT BCP
In addition, FORMAT BCP has the advantage of not requiring trailing delimiters.