Currently I am using the SQLCMD utility to load CSV data into SQL Server. Below is the command I execute at the command prompt to load the data:
sqlcmd -Usa -Pxxx -S192.168.1.223,49546 -dlocal -i"/test.sql" -o"/test.log"
I have also copied my test.sql file contents for your reference:
SET NOCOUNT ON
BULK INSERT test FROM
"\\192.168.1.223\test.csv"
WITH
(
MAXERRORS = 1000000,
CODEPAGE = 1251,
FIELDTERMINATOR = '~%',
ROWTERMINATOR = '0x0a'
)
GO
SELECT CONVERT(varchar, @@ROWCOUNT) + ' rows affected'
GO
The insert operation works fine with the above process. My concern is that when a row is rejected because of a data type or data length error, I am unable to trace that particular row.
Each time I have to look up the rejected row number in the log file and then find the corresponding row in the data file.
Is there any option to write the error/rejected rows to a separate file, like the bad file that the ORACLE SQLPLUS utility generates?
I think the option you are looking for is not in sqlcmd, but in BULK INSERT:
ERRORFILE = 'file_name'
Specifies the file used to collect rows that have formatting errors and cannot be converted to an OLE DB rowset. These rows are copied into this error file from the data file "as is."
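Applied to the script in your question, it would look something like this (the error file path here is only an example; it must point to a location the SQL Server service account can write to, and the file must not already exist when the statement runs, since BULK INSERT creates it along with a companion .Error.Txt file containing error diagnostics):
BULK INSERT test FROM
'\\192.168.1.223\test.csv'
WITH
(
MAXERRORS = 1000000,
CODEPAGE = '1251',
FIELDTERMINATOR = '~%',
ROWTERMINATOR = '0x0a',
ERRORFILE = '\\192.168.1.223\test_errors.csv' -- rejected rows are copied here "as is"
)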
I receive update information for items on a daily basis via a CSV file that includes date/time information in the format YYYY-MM-DDThh:mm:ss.
I used the Management Studio task "Import Flat File..." to create a table dbo.fullItemList and import the contents of the initial file. It identified the date/time columns as type datetime2(7) and imported the data correctly. I then copied this table to create a blank table dbo.dailyItemUpdate.
I want to create a script that imports the CSV file to dbo.dailyItemUpdate, uses a MERGE function to update dbo.fullItemList, then wipes dbo.dailyItemUpdate ready for the next day.
The bit I can't get to work is the import. As the table already exists, I'm using the following:
BULK INSERT dbo.dailyItemUpdate
FROM 'pathToFile\ReceivedFile.csv'
WITH
(
DATAFILETYPE = 'char',
FIELDQUOTE = '"',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
)
But I get a "type mismatch..." error on the date/time columns. How come the BULK INSERT fails, even though the data type was picked up by the "Import Flat File" function?
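For reference, the overall daily script I have in mind is roughly the sketch below; the column names (itemId, itemName, lastUpdated) are just placeholders for the real ones:
-- 1. load the daily file into the staging table
BULK INSERT dbo.dailyItemUpdate
FROM 'pathToFile\ReceivedFile.csv'
WITH
(
DATAFILETYPE = 'char',
FIELDQUOTE = '"',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
TABLOCK
)

-- 2. merge the staged rows into the full item list
MERGE dbo.fullItemList AS target
USING dbo.dailyItemUpdate AS source
    ON target.itemId = source.itemId
WHEN MATCHED THEN
    UPDATE SET target.itemName = source.itemName,
               target.lastUpdated = source.lastUpdated
WHEN NOT MATCHED BY TARGET THEN
    INSERT (itemId, itemName, lastUpdated)
    VALUES (source.itemId, source.itemName, source.lastUpdated);

-- 3. wipe the staging table ready for the next day
TRUNCATE TABLE dbo.dailyItemUpdate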
I have an R script that combines years of FFIEC Bank Call Report schedules into flat files--one for each schedule--then writes each schedule to a tab-delimited, non-quoted flat file suitable for bulk inserting into SQL Server. Then I run this bulk insert command:
bulk insert CI from 'e:\CI.txt' with (firstrow = 2, rowterminator = '0x0a', fieldterminator = '\t')
The bulk insert will run for a while then quit, with this error message:
Msg 7301, Level 16, State 2, Line 4
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
I've searched here for answers and the most common problem seems to be the rowterminator argument. I know that the files I've created have a line feed without a carriage return, so '0x0a' is the correct argument (but I tried '\n' and it didn't work).
Interestingly, I tried setting the fieldterminator to gibberish just to see what happened and I got the expected error message:
The bulk load failed. The column is too long in the data file for row 1, column 1.
So that tells me that SQL Server has access to the file and is indeed starting to insert it.
Also, I did a manual import (right click on the database, Tasks -> Import Data) and SQL Server swallowed up the file without a hitch. That tells me the layout of the table is fine, and so is the file?
Is it possible there's something at the end of the file that's confusing the bulk insert? I looked in a hex editor and it ends with data followed by 0A (the hex code for a line feed).
I'm stumped and open to any possibilities!
I have a CSV file and am trying to run BULK INSERT. But before I do the bulk insert I want to make sure the file does not contain only the column header row. It should have at least 1 row of data.
select count(*) from openrowset(BULK 'file.csv', SINGLE_NCLOB) output
The above SQL statement returns everything as 1 row, but I want the total row count of that CSV file.
You are super close; you just do not need to select the count, because that simply tells you that you are loading from 1 file.
Instead you can do the following:
DECLARE @lengthOfFile INT
SELECT @lengthOfFile = len(content.BulkColumn)
FROM OPENROWSET(BULK N'file.csv', SINGLE_NCLOB) AS content
IF @lengthOfFile > 0
BEGIN
    SELECT @lengthOfFile -- here you implement your bulk load
END
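For example, with the actual load plugged into the IF block (dbo.myTable and the BULK INSERT options below are placeholders you would replace with your own):
DECLARE @lengthOfFile INT

SELECT @lengthOfFile = len(content.BulkColumn)
FROM OPENROWSET(BULK N'file.csv', SINGLE_NCLOB) AS content

IF @lengthOfFile > 0
BEGIN
    -- the file is not empty, so run the real load
    BULK INSERT dbo.myTable
    FROM 'file.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
END
Note that this only checks that the file is non-empty; a file containing just the header row would still pass, so if you need at least one data row you have to count the line breaks instead.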
I had the same issue and found a solution. I am bulk inserting data and using FIRSTROW = 2.
If the CSV only contains the header row and no data, BULK INSERT fails with the error:
Msg 7301, Level 16, State 2.
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
Unfortunately error 7301 is used for various conditions and not just for empty CSV data.
So when it occurs I use this query to determine the number of rows in the CSV:
select len(BulkColumn) - len(replace(BulkColumn, char(10), '')) + 1 as line_count
from openrowset(
BULK 'path/to/my_csv.csv',
DATA_SOURCE = 'my_data_source',
SINGLE_CLOB
) as data_file;
The SINGLE_CLOB parameter causes OPENROWSET to load the entire CSV file into the BulkColumn column. We take the length of that string, then replace each \n (CHAR(10)) with '' and take the length of the result; the difference between the two is the number of line feed characters in the CSV.
The query adds 1 to that to arrive at the total number of rows (lines) in the file.
So when I encounter error 7301, I use this query to check the number of rows in the CSV, and if it is 2 (i.e., the file contains only the header row), I allow the process to proceed without erroring out.
The performance of this is quite OK. Files that indeed only have a header row take less than a second to count. I also tried it with a 430 MB CSV file that had over 11 million rows, and the query executed in around 6 minutes.
Since I only cared about determining whether the CSV had data, I was able to cut the processing of the 430 MB file down further, to just 20 seconds, by only counting the new lines in the first 5,000 characters:
select len(substring(BulkColumn, 1, 5000)) - len(replace(substring(BulkColumn, 1, 5000), char(10), '')) + 1 as row_count
from openrowset(
BULK 'import/manual_import/mp_is_other_tag_applications_2011-2020-06-06.csv',
DATA_SOURCE = 'ds_wasb',
SINGLE_CLOB
) as data_file;
I am using Microsoft SQL Server Management Studio and I am currently importing some CSV files into a database. I am importing the CSV files into already existing tables with the BULK INSERT command, using the following query.
BULK INSERT myTable
FROM 'D:\myfolder\file.csv'
WITH
(FIRSTROW = 2,
FIELDTERMINATOR = ';', --CSV Field Delimiter
ROWTERMINATOR = '\n', -- Used to shift to the next row
ERRORFILE = 'D:\myfolder\Error Files\myErrrorFile.csv',
TABLOCK
)
This works fine for me so far, but I would like to automate the process of naming the columns in the tables. More specifically, I would like to create a table and use the contents of the first row of the CSV file as the column names. Is that possible?
The easiest way I can think of is:
right-click on the database, select: Tasks -> Import Data...
After that, the SQL Server Import and Export Wizard will open. There you can specify and customize all the settings for importing data from various sources (such as taking the column names from the first row of a file).
In your case, your data source will be Flat file source.
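If you want to do it in T-SQL rather than through the wizard, a rough sketch of one approach is to read the first line of the file with OPENROWSET and build the CREATE TABLE statement dynamically. The path, the table name and the ';' delimiter below come from your question; creating every column as NVARCHAR(255) is just an assumption you would adjust afterwards:
DECLARE @header NVARCHAR(MAX), @sql NVARCHAR(MAX)

-- read the whole file as one value and keep only the first line
SELECT @header = LEFT(BulkColumn, CHARINDEX(CHAR(10), BulkColumn + CHAR(10)) - 1)
FROM OPENROWSET(BULK 'D:\myfolder\file.csv', SINGLE_CLOB) AS f

-- strip a trailing carriage return if the file uses CRLF line endings
SET @header = REPLACE(@header, CHAR(13), '')

-- turn "col1;col2;col3" into a column list and create the table
SET @sql = 'CREATE TABLE dbo.myTable ([' +
           REPLACE(@header, ';', '] NVARCHAR(255), [') +
           '] NVARCHAR(255))'

EXEC sys.sp_executesql @sql
After that, your existing BULK INSERT with FIRSTROW = 2 can load the data into the new table.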
I have an extremely large database I need to send to the developer; the table has over 120 million rows. The developer says he only needs about 10,000 or so rows, so I was going to use sqlcmd -S -d -Q "select top 10000 * from table" -s "," -o "C:\temp\filename.csv"
Rather than truncate immediately, I decided to script out the table, rename it, and test bulk inserting. I tried using
bulk insert tablename from 'c:\temp\filename.csv'
with (
fieldterminator = ',',
rowterminator = '\n'
)
This ends in a "Bulk load data conversion error (truncation) for row 1..." error. I also tried the Import/Export Wizard and it fails with the same problem (truncation). Increasing the size of the field lengths solves the problem, but I am having trouble understanding why I need to do this. It's the same data from the same table; it should bulk insert right back in?!
Also, the problem is happening on every column in the table and by varying lengths, so there is no single number of characters I could add to every column. All the columns are of the varchar data type. Could sqlcmd be inserting some kind of corruption into the file? I have tried to look for a problem; I also tried rtrim(ltrim(columname)) to make sure there is no whitespace, but I'm not sure that is how it works. I'm using SQL Server 2012, if this helps.
thanks
You should look into BCP Queryout and BULK INSERT options. Use NATIVE format if you're going from SQL to SQL.
(BCP is command-line):
bcp "select top(10000) * from table" queryout "OUTPUTFILENAME.DAT" -S serverInstanceName -d databaseName -T -n
The Bulk Insert command is SQL (not command line):
bulk insert table from 'path\and\OUTPUTFILENAME.DAT' with (keepidentity,datafiletype = 'native');
(If the table doesn't have an identity column, you can eliminate keepidentity.)
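For reference on the switches used above: -S is the server/instance, -d the database, -T uses a trusted (Windows) connection (use -U and -P instead for a SQL login), and -n exports the data in native format, which is what datafiletype = 'native' expects on the BULK INSERT side. Because native format preserves the stored data types and lengths instead of converting everything to delimited text, it side-steps the text-conversion issues that can cause truncation errors like the one above.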