BULK INSERT some rows being added with quotation marks - sql-server

I'm attempting to BULK INSERT a tab-separated text file into a table containing only VARCHAR columns. For some reason, some of the data is getting double quotation marks placed around it, seemingly at random, while other rows are unaffected:
domain sku type product
amazon.com b0071n529i laptop hp_4535s_a7k08ut#aba_15.6-inch_laptop
amazon.com b00715sj82 laptop "dell_64gb_mini_pcie_ssd_pata,_f462n"
The statement I'm using looks like this:
BULK INSERT database
FROM 'file.txt' WITH (FIRSTROW = 1, FIELDTERMINATOR = '\t', ROWTERMINATOR = '0x0a');

If those double quotes are your only issue, then stripping them after insertion would be the better solution. (The quoting is likely not random: CSV-style exporters typically wrap a field in quotes when its value contains a comma, and the dell row's product value is exactly the one with a comma in it.)
UPDATE A
SET A.Product = REPLACE(A.Product, '"', '')
WHERE LEFT(A.Product, 1) = '"' OR RIGHT(A.Product, 1) = '"'
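On SQL Server 2017 and later, an alternative is to let BULK INSERT strip the quotes itself with FORMAT = 'CSV' (covered in the text-qualifier question below). A minimal sketch, assuming the real target table is dbo.Products with the file's four columns:
BULK INSERT dbo.Products
FROM 'file.txt'
WITH (
    FORMAT = 'CSV',           -- RFC 4180 parsing: quoted fields are unwrapped
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR = '0x0a',
    FIRSTROW = 1
);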

Related

BULK INSERT type mismatch when table created from same .CSV

I receive update information for items on a daily basis via a CSV file that includes date/time information in the format YYYY-MM-DDThh:mm:ss.
I used the Management Studio task "Import Flat File..." to create a table dbo.fullItemList and import the contents of the initial file. It identified the date/time columns as type datetime2(7) and imported the data correctly. I then copied this table to create a blank table dbo.dailyItemUpdate.
I want to create a script that imports the CSV file into dbo.dailyItemUpdate, uses a MERGE statement to update dbo.fullItemList, and then wipes dbo.dailyItemUpdate ready for the next day.
The bit I can't get to work is the import. As the table already exists, I'm using the following:
BULK INSERT dbo.dailyItemUpdate
FROM 'pathToFile\ReceivedFile.csv'
WITH
(
DATAFILETYPE = 'char',
FIELDQUOTE = '"',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
)
But I get a "type mismatch..." error on the date/time columns. How come the BULK INSERT fails, even though the data type was picked up by the "Import Flat File" function?

Bulk Insert support for Unicode separator

I am using Azure Data Factory to archive data from Azure SQL DB to Azure Blob Storage, and BULK INSERT to retrieve the data.
I am using the below as row and column separators.
Column delimiter:\u0001
Row delimiter:\u0003
My BULK INSERT is below.
BULK INSERT mytable
FROM 'MyPath/file.txt'
WITH (DATA_SOURCE = 'MySource', FIELDTERMINATOR = '\u0001', ROWTERMINATOR = '\u0003');
I am getting the below error:
Msg 4866, Level 16, State 1, Line 41
The bulk load failed. The column is too long in the data file for row 1, column 1. Verify that the field terminator and row terminator are specified correctly.
The documentation says Unicode is supported for FIELDTERMINATOR and ROWTERMINATOR, so what could be the issue?
It seems Unicode is not fully supported for BULK INSERT. Per the documentation, only the t, n, r, 0 and '\0' characters work with the backslash escape character to produce a control character, so an escape like '\u0001' is passed through as literal text rather than as U+0001.
Link: https://learn.microsoft.com/en-us/sql/relational-databases/import-export/specify-field-and-row-terminators-sql-server?view=azuresqldb-current
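A workaround that may be worth trying: specify the control characters as hex bytes instead of \u escapes, the same hex notation the first question above uses for ROWTERMINATOR = '0x0a'. Whether FIELDTERMINATOR honors hex notation depends on the SQL Server version, so treat this as a sketch:
BULK INSERT mytable
FROM 'MyPath/file.txt'
WITH (DATA_SOURCE = 'MySource',
      FIELDTERMINATOR = '0x01',  -- U+0001 as a hex byte
      ROWTERMINATOR = '0x03');   -- U+0003 as a hex byte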

Bulk Import CSV file into SQL Server - remove double quotes

I am running SQL Server 2008 and using the BULK INSERT command. While inserting the data, I am trying to remove double quotes (") from the CSV file. This works partially, but not for all of the records; please check my code below.
Bulk Insert tblUsersXTemp
from 'C:\FR0250Members161212_030818.csv'
WITH (FIELDTERMINATOR = '","',
ROWTERMINATOR = '"\n"',
--FormatFile =''
ERRORFILE = 'C:\bulk_insert_BadData.txt')
After you do the bulk insert, you could replace the double quotes.
UPDATE tblUsersXTemp
SET usxMembershipID = REPLACE(usxMembershipID, CHAR(34), '')
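CHAR(34) is the double-quote character. Note that this cleans only usxMembershipID, so the same REPLACE would need repeating for each column that imported with stray quotes.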
You need a format file, I believe; that's what I think is going on.
If you use the following BULK INSERT command to import the data without a format file, you will end up with a quotation mark prefix on the first column's values and a quotation mark suffix on the last column's values.
Reference
Example from reference:
BULK INSERT tblPeople
FROM 'bcp.txt'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n',
FORMATFILE = 'bcp.fmt');
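The reference does not reproduce bcp.fmt itself. A minimal sketch of what a non-XML format file for a two-column tblPeople might look like (the column names FirstName and LastName are hypothetical), following the same layout as the level_2.fmt example in the next question:
10.0
2
1 SQLCHAR 0 100 "\",\"" 1 FirstName SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "\"\r\n" 2 LastName SQL_Latin1_General_CP1_CI_AS
These terminators do not consume the leading quote on the first column, so it would still need a REPLACE (or an extra dummy field in the format file) afterwards.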
You could also potentially have dirty data that uses quotes for more than just delimiters.

Bulk insert with text qualifier in SQL Server

I am trying to bulk insert a few records into a test table from a CSV file:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
The bulk insert code should get rid of the first row and insert the data into the table. It gets rid of the first row all right, but gets confused in the delimiter section: the first column is wkt, and its value is double quoted and has commas within the value.
So I guess my question is whether there is a way to tell BULK INSERT that the double-quoted part is one column, regardless of the commas within it?
The CSV file looks like this:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))", 123123.22
You need to use a 'format file' to implement a text qualifier for bulk insert. Essentially, you will need to teach the bulk insert that there are potentially different delimiters for each field.
Create a text file called "level_2.fmt" and save it.
11.0
2
1 SQLCHAR 0 8000 "\"," 1 wkt SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 40 "\r\n" 2 area SQL_Latin1_General_CP1_CI_AS
The first line, "11.0" refers to your version of SQL. The second line shows that your table, [level2_import], has two columns. Each line after that will describe a column, and obeys the following format:
[Source Column Number][DataType][Min Size][Max Size][Delimiter pattern][Destination Column Number][Destination Column Name][Case sensitivity of database]
Once you've created that file, you can read in your data with the following bulk insert statement:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FORMATFILE='D:\level_2.fmt'
);
Refer to this blog for a detailed explanation of the format file.
SQL Server 2017 finally added support for text qualifiers and the CSV format defined in RFC 4180. It should be enough to write:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', ROWTERMINATOR = '\n', FIRSTROW = 2 )
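With FORMAT = 'CSV', the default FIELDQUOTE is the double-quote character, so the quoted wkt value is read as a single column without any format file.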
Try renaming the file from .fmt to .txt instead; that worked for me.
I have this issue working with LDAP data: the dn contains commas, as do other fields that contain dns. Try changing your field terminator to another, unused character, like a pipe (|) or semicolon (;). Do this in both the data and the file definition.
So the code should be:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
and your CSV:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))"; 123123.22

SQL Server : convert string to Int when importing from CSV

In my CSV file there is an AuthorID column, but the values are in string format.
Is it possible in a SQL Server query, when importing a CSV, to convert String to Integer?
BULK
INSERT Author
FROM 'C:\author.csv'
WITH
(
FIRSTROW = 2,
<Convert Column 1 to Integer>
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO
For data manipulation whilst transferring data, use SSIS packages; I don't think you can do anything to the data itself with a BULK INSERT statement whilst inserting it into a table. The destination column should simply have the right data type, and it should work.
In your case, if you set the data type of the destination column to INT and your data contains nothing but numeric strings, it should work just fine. If you still get errors, check the data you are inserting; if any of it needs changing before it can go into an INT column, you might have to consider using an SSIS package.
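A minimal sketch of that point, assuming the file's first column is AuthorID and a hypothetical second column AuthorName: when the destination column is typed INT, BULK INSERT converts the character data implicitly, with no cast written in the statement.
CREATE TABLE Author (
    AuthorID   INT,           -- "123" in the file converts implicitly
    AuthorName VARCHAR(100)   -- hypothetical second column
);

BULK INSERT Author
FROM 'C:\author.csv'
WITH (
    FIRSTROW = 2,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);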
