Bulk insert with text qualifier in SQL Server

I am trying to bulk insert a few records into a table named test from a CSV file:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
The bulk insert should skip the first row and insert the data into the table. It skips the first row all right, but gets confused by the delimiters. The first column is wkt, and its value is double quoted and contains commas within it.
So I guess my question is: is there a way to tell BULK INSERT that the double-quoted part is one column, regardless of the commas within it?
The CSV file looks like this:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))", 123123.22

You need to use a format file to implement a text qualifier for bulk insert. Essentially, you need to teach the bulk insert that each field can end with a different delimiter.
Create a text file called "level_2.fmt" with the following contents:
11.0
2
1 SQLCHAR 0 8000 "\"," 1 wkt SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 40 "\r\n" 2 area SQL_Latin1_General_CP1_CI_AS
The first line, "11.0", refers to your version of SQL Server. The second line says that your table, [level2_import], has two columns. Each line after that describes a column, and follows this format:
[Source field number] [Data type] [Prefix length] [Max data length] [Terminator pattern] [Destination column number] [Destination column name] [Collation]
Once you've created that file, you can read in your data with the following bulk insert statement:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FORMATFILE='D:\level_2.fmt'
);
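If you want to preview what the format file will parse before loading it into the table, the same format file also works with OPENROWSET; a minimal sketch using the same paths:
SELECT *
FROM OPENROWSET(
    BULK 'D:\test.csv',
    FORMATFILE = 'D:\level_2.fmt',
    FIRSTROW = 2
) AS parsed;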
Refer to this blog for a detailed explanation of the format file.

SQL Server 2017 finally added support for text qualifiers and the CSV format defined in RFC 4180. It should be enough to write:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', ROWTERMINATOR = '\n', FIRSTROW = 2 )
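With FORMAT = 'CSV', the double quote is the default text qualifier; if your file uses a different one, the FIELDQUOTE option names it explicitly. A sketch with the default spelled out:
BULK INSERT level2_import
FROM 'D:\test.csv'
WITH ( FORMAT = 'CSV', FIELDQUOTE = '"', ROWTERMINATOR = '\n', FIRSTROW = 2 );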

Try renaming the file from .fmt to .txt and referencing that instead; that worked for me.

I had this issue working with LDAP data: the DN contains commas, as do other fields that contain DNs. Try changing your field terminator to another, unused character, such as a pipe (|) or semicolon (;). Do this in both the data and the file definition.
So the code should be:
CREATE TABLE Level2_import
(wkt varchar(max),
area VARCHAR(40)
)
BULK
INSERT level2_import
FROM 'D:\test.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n'
)
and your CSV:
"MULTIPOLYGON (((60851.286135090661 510590.66974495345,60696.086128673756 510580.56976811233,60614.7860844061 510579.36978015327,60551.486015895614)))"; 123123.22

Related

Import text file data into SQL Server database

I have a text file in the format below, and I want to import it into a SQL Server database by splitting it into several columns:
Terminal, NetSales, NetAjustment, CancelsCnt, CancelAmount,
CashesCnt, CashesAmount, ClaimsCnt, ClaimsAmount, SalesCommission,
CashCommission, NetDue
I have tried to insert the text file into SQL Server using SSIS, but it inserts everything into one column instead of splitting it. I then used SQL scripting to split it into several columns, but that is not working either.
I'm having difficulty splitting the columns out of the text file.
Any ideas or help on how I can capture that column data in a proper format?
I would suggest using the SSIS Bulk Insert Task.
Bulk Insert Task in SSIS
It has the same functionality as a T-SQL BULK INSERT statement.
It lets you specify where the real first row starts via its FIRSTROW parameter.
Here is a conceptual example.
SQL
CREATE TABLE dbo.tbl (
Terminal VARCHAR(20),
NetSales VARCHAR(30),
NetAjustment VARCHAR(100),
CancelsCnt INT
...
);
BULK INSERT dbo.tbl
FROM 'e:\Temp\inputFile.csv'
WITH (FORMAT='CSV'
, DATAFILETYPE = 'char' -- { 'char' | 'native' | 'widechar' | 'widenative' }
, FIELDTERMINATOR = '\t' -- for a TAB
, ROWTERMINATOR = '\n'
, FIRSTROW = 8
, CODEPAGE = '65001');
-- test
SELECT * FROM dbo.tbl;

counting rows in csv file before bulk insert , non empty file check

I have a CSV file and am trying to run a bulk insert. But before I do the bulk insert, I want to make sure the file does not contain only the column header row; it should have at least one row of data.
select count(*) from openrowset(BULK 'file.csv', SINGLE_NCLOB) output
The SQL statement above returns everything as one row, but I want the total row count of that CSV file.
You are super close; you just do not need the COUNT, because that is simply telling you that you are trying to load from one file.
Instead you can do the following:
DECLARE @lengthOfFile INT
-- SINGLE_NCLOB loads the whole file as one value, so LEN gives its character count
SELECT @lengthOfFile = LEN(content.BulkColumn)
FROM OPENROWSET(BULK N'file.csv', SINGLE_NCLOB) AS content
IF @lengthOfFile > 0
BEGIN
SELECT @lengthOfFile -- here you implement your bulk load
END
I had the same issue and found a solution. I am bulk inserting data and using FIRSTROW = 2.
If the CSV only contains the header row and no data, BULK INSERT fails with the error:
Msg 7301, Level 16, State 2.
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
Unfortunately error 7301 is used for various conditions and not just for empty CSV data.
So when it occurs I use this query to determine the number of rows in the CSV:
select len(BulkColumn) - len(replace(BulkColumn, char(10), '')) + 1 as line_count
from openrowset(
BULK 'path/to/my_csv.csv',
DATA_SOURCE = 'my_data_source',
SINGLE_CLOB
) as data_file;
The SINGLE_CLOB parameter causes OPENROWSET to load the entire CSV file into the BulkColumn column. We take the length of the string, then replace each \n with '' and take the length of the resulting string. The difference between the two is the number of '\n' characters in the CSV.
The query adds 1 to that count to get the total number of rows.
So when I encounter error 7301, I use this to check the number of rows in the CSV, and if it is 2, I allow the process to proceed without erroring out.
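A minimal sketch of how that check might gate the load; the target table and path are placeholders, not from the original posts:
DECLARE @line_count INT;

SELECT @line_count = LEN(BulkColumn) - LEN(REPLACE(BulkColumn, CHAR(10), '')) + 1
FROM OPENROWSET(BULK 'path/to/my_csv.csv', SINGLE_CLOB) AS data_file;

IF @line_count > 2 -- more than header plus trailing newline, per the counting above
BEGIN
BULK INSERT dbo.my_table -- hypothetical target table
FROM 'path/to/my_csv.csv'
WITH (FORMAT = 'CSV', FIRSTROW = 2);
END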
The performance of this is quite OK. Files that indeed only have a header row take less than a second to count. I also tried it with a 430 MB CSV file that had over 11 million rows, and the query executed in around 6 minutes.
Since I only cared about determining whether the CSV had data, I was able to cut the processing of the 430 MB file down further to just 20 seconds by only counting the newlines in the first 5,000 characters:
select len(substring(BulkColumn, 1, 5000)) - len(replace(substring(BulkColumn, 1, 5000), char(10), '')) + 1 as row_count
from openrowset(
BULK 'import/manual_import/mp_is_other_tag_applications_2011-2020-06-06.csv',
DATA_SOURCE = 'ds_wasb',
SINGLE_CLOB
) as data_file;

BULK INSERT some rows being added with quotation marks

I'm attempting to BULK INSERT a tab-separated text file into a table containing only VARCHAR data. For some reason, some of the data is getting double quotation marks placed around it at random, while other rows are unaffected:
domain sku type product
amazon.com b0071n529i laptop hp_4535s_a7k08ut#aba_15.6-inch_laptop
amazon.com b00715sj82 laptop "dell_64gb_mini_pcie_ssd_pata,_f462n"
The statement I'm using looks like this:
BULK INSERT database
FROM 'file.txt' WITH (FIRSTROW = 1, FIELDTERMINATOR = '\t', ROWTERMINATOR = '0x0a');
If your issue is just those double quotes, then removing them after insertion would be the better solution:
UPDATE A
SET A.Product = REPLACE(A.Product, '"', '')
WHERE LEFT(A.Product, 1) = '"' OR RIGHT(A.Product, 1) = '"'

Bulk Import CSV file into SQL Server - remove double quotes

I am running SQL Server 2008 with the bulk insert command. While inserting the data I am trying to remove double quotes (") from the CSV file, which works partially but does not work for all the records. Please check my code and the screenshot of the result.
Bulk Insert tblUsersXTemp
from 'C:\FR0250Members161212_030818.csv'
WITH (FIELDTERMINATOR = '","',
ROWTERMINATOR = '"\n"',
--FormatFile =''
ERRORFILE = 'C:\bulk_insert_BadData.txt')
After you do the bulk insert, you could replace the double quotes.
UPDATE tblUsersXTemp
SET usxMembershipID = REPLACE(usxMembershipID, CHAR(34), '')
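If more than one column picked up stray quotes, the same REPLACE works across columns in a single pass; a sketch where usxFirstName stands in for a hypothetical second column:
UPDATE tblUsersXTemp
SET usxMembershipID = REPLACE(usxMembershipID, CHAR(34), ''),
usxFirstName = REPLACE(usxFirstName, CHAR(34), '') -- hypothetical column
WHERE CHARINDEX(CHAR(34), usxMembershipID) > 0
OR CHARINDEX(CHAR(34), usxFirstName) > 0;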
You need a format file, I believe; that's what I think is going on.
If you use a plain BULK INSERT command to import the data without a format file, you will end up with a quotation mark prefix on the first column's values and a quotation mark suffix on the last column's values.
Reference
Example from reference:
BULK INSERT tblPeople
FROM 'bcp.txt'
WITH (
DATAFILETYPE = 'char',
FIELDTERMINATOR = '","',
ROWTERMINATOR = '\n',
FORMATFILE = 'bcp.fmt');
You could also potentially have dirty data that uses quotes for more than just delimiters.

SQL Server bulk insert command fails importing textfile

I am trying to dump the following txt file into a table (using the wizard did not work either):
http://download.geonames.org/export/dump/admin1CodesASCII.txt
using the following
drop table tempregions
create table TempRegions
(
code varchar(500),
name varchar(500),
asciiName varchar(500),
somenumber varchar(500)
);
BULK INSERT GeoNames
FROM 'C:\Users\Administrator\Desktop\geonames\admin1CodesASCII.txt'
WITH(
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\r\n'
)
go
getting the following error (using sql server 2012)
Msg 4864, Level 16, State 1, Line 10
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 1 (geonameid).
thanks for the help
The text contains non-standard ASCII characters and you did not define a codepage. The error is there to protect you. Find and define an appropriate codepage as per the syntax at http://msdn.microsoft.com/en-us/library/ms188365.aspx
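For example, a sketch that declares the codepage explicitly. CODEPAGE = '65001' (UTF-8) is an assumption about the file's encoding, and BULK INSERT only accepts it on SQL Server 2016 and later; on SQL Server 2012 you would need to convert the file or choose the codepage that matches it. Note also that the statement above inserts into GeoNames while the CREATE TABLE makes TempRegions; the sketch targets the table that was created:
BULK INSERT TempRegions
FROM 'C:\Users\Administrator\Desktop\geonames\admin1CodesASCII.txt'
WITH (
CODEPAGE = '65001', -- UTF-8; an assumption about the file's encoding
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n' -- try '\r\n' if the file has Windows line endings
);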
