I'm importing a CSV file using SQL Server's BULK INSERT, with the following options:
MAXERRORS = 1000000,
CODEPAGE = 1251,
FIELDTERMINATOR = '~%',
ROWTERMINATOR = '0x0a',
ERRORFILE = 'C:\MyFile_BadData.log'
My problem is that BULK INSERT fails to load the last row of data.
Note that BULK INSERT reports no errors.
If I add an empty line to the end of the file, the load works without any issues.
However, I cannot modify the CSV file. Any suggestions would be appreciated.
This happens when the last line doesn't end with the row terminator. Make sure the last line ends with the row terminator, then the last row will be imported.
If you can't change the export routine that generates the CSV, use PowerShell or a similar tool to add the row terminator to the CSV. If you can't change the original file, copy it to a location where you can change it (include that step in your PowerShell script).
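The copy-then-fix step described above can be sketched in Python instead of PowerShell. This is only a minimal sketch: the function name `copy_with_row_terminator` is made up for illustration, and the default terminator `b"\n"` is an assumption that matches the `ROWTERMINATOR = '0x0a'` in the question.

```python
import shutil


def copy_with_row_terminator(src, dst, terminator=b"\n"):
    """Copy src to dst and append the terminator if dst does not end with it.

    Returns True when a terminator was appended, False otherwise.
    The original file (src) is never modified.
    """
    shutil.copyfile(src, dst)
    with open(dst, "rb+") as f:
        f.seek(0, 2)                      # jump to the end of the copy
        size = f.tell()
        if size == 0:
            return False                  # empty file: no row to terminate
        f.seek(max(0, size - len(terminator)))
        if f.read() == terminator:
            return False                  # last row is already terminated
        f.seek(0, 2)                      # reposition before switching to writing
        f.write(terminator)
        return True
```

BULK INSERT would then be pointed at the copy rather than the original. Running the function on a file that already ends with the terminator leaves the copy unchanged.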
Exactly! The file must end with the row terminator (a line break, in this case). Can you simply delete the last line and do the import? If you need an automated solution, create a small C# app that opens the file, deletes the last line, and saves the file. Then run your import process. All of this can be driven by the Windows Task Scheduler, so you can even run it as an overnight job while you are away from your computer.
https://www.digitalcitizen.life/how-create-task-basic-task-wizard
If you're on a Unix variant, you can also use sed to add the newline without editing the file by hand. The form below is the macOS/BSD one; with GNU sed, drop the empty '' after -i.
sed -i '' -e '$a\' fname.csv
Note that this will only add a newline if there isn't one. So running it multiple times will not affect a file that already has a trailing newline.
Related
I’m trying to export a table to Excel/CSV, but I’m having trouble with one long column whose values were concatenated using “char(10) + char(13)” as a delimiter for new lines. When I copy all the data from SQL Server Management Studio and use “save as” to create a CSV file, the output is broken: wherever there is a new line, the record is spread over more than one row and the column positions no longer line up.
I also tried the export wizard (I don’t know if it makes a difference), but with no success: the export keeps failing on the last step with a warning of “potential lost conversion from nvarchar to longtext” and an error of “data conversion failed ..”
To allow multiline fields in csv, those fields have to be enclosed in quotes:
123,"multiline
field",456
789,second record,147
If this is not the case in your generated csv you might have to tell the generator to quote the fields.
If the quotes are already there the csv is valid and any decent reader should take care of those multiline fields. Of course, if you open the file in Notepad you'll still see multiple lines per record, which is normal.
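As an illustration of the quoting rule, here is a small Python sketch using the standard `csv` module. The helper names `to_csv` and `from_csv` are made up for this example. The writer quotes the field that contains a line break, and the reader turns the two physical lines back into one record.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows to CSV text; fields containing line breaks get quoted."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text; quoted fields may span several physical lines."""
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["123", "multiline\nfield", "456"],
        ["789", "second record", "147"]]
text = to_csv(rows)
print(text)                    # the second field of the first record is quoted
print(from_csv(text) == rows)  # the round trip preserves the embedded line break
```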
To avoid such issues, you need to clean the data by replacing the carriage return (char(13)) and line feed (char(10)) in your SELECT statement using the following query:
SELECT replace(replace([ColumnName], char(10), ''), char(13), '')
FROM [dbo].[yourTableName]
Whenever I try to import a CSV file into sql server with more than one column I get an error (well, nothing is imported). I know the file is terminated fine because it works with 1 column ok if I modify the file and table. I am limiting the rows so it never gets to the end, the line terminator is the correct and valid one (also shown by working when having 1 column only).
All I get is this and no errors
0 rows affected
I've also checked the various other questions like this one, and they all point to a bad end of file or line terminator, but all is well here...
I have tried quotes and no quotes. For example, I have a table with 2 columns of varchar(max).
I run:
bulk insert mytable from 'file.csv' WITH (FIRSTROW=2,lastrow=4,rowterminator='\n')
My sample file is:
name,status
TEST00040697,OK
TEST00042142,OK
TEST00042782,OK
TEST00043431,BT
If I drop a column then delete the second column in the csv ensuring it has the same line terminator \n, it works just fine.
I have also tried specifying the 'errorfile' parameter but it never seems to write anything or even create the file.
Well, that was embarrassing.
SQL Server, in its wisdom, uses \t as the default field terminator, even for a CSV file. I guess when the documentation says 'FORMAT = 'CSV'' it's an example and not the default.
If only it produced actual proper and useful error messages...
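To see why the one-column file loaded but the two-column file did not, it helps to count the fields on a sample line under each terminator. The Python sketch below is only an illustration of that reasoning, not of what SQL Server does internally, and `count_fields` is a made-up helper.

```python
def count_fields(line, terminator):
    """Return how many fields a line splits into for a given field terminator."""
    return len(line.split(terminator))


sample = "TEST00040697,OK"
print(count_fields(sample, "\t"))  # fields seen with the default tab terminator
print(count_fields(sample, ","))   # fields seen with an explicit comma terminator
```

With a tab terminator the line never splits into the two fields the table expects, which is consistent with nothing being imported until the field terminator is set to a comma.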
I have a file dump that needs to be imported into SQL Server daily, and I have created a scheduled task to do this unattended. All CSV files are delimited by ',' and are Windows CR/LF files encoded in UTF-8.
To import data from these CSV files, I mainly use OpenRowset. It worked well until I ran into a file containing the value "S7". If the file contains "S7", that column is recognized as a numeric datatype during the OpenRowset import, so the other alphabetic values fail to import and come through as NULL.
This is what I have tried so far:
Using IMEX=1: openrowset('Microsoft.ACE.OLEDB.15.0','text;IMEX=1;HDR=Yes;
Using text driver: OpenRowset('MSDASQL','Driver=Microsoft Access Text Driver (*.txt, *.csv);
Using Bulk Insert with or without a format file.
The interesting part is that Bulk Insert gives me a warning about an unexpected end of file. To solve this, I tried various row terminators such as '0x0a', '\n', and '\r\n', and also not specifying one at all, but they all failed. I finally managed to import some of the records using a row terminator of ',\n'. However, the original file contains about 1000 records and only 100 are imported, without any errors or warnings.
Any tips or help would be much appreciated.
Edit 1:
The file ends with a newline character, as I can tell from Notepad++. For files that give the unexpected end of file error, I managed to get an import to run by removing the last record. However, even with this method I still cannot import all records; only some of them are imported.
I ran a query on an MS SQL database using SQL Server Management Studio, and some of the fields contained new lines. I chose to save the result as a CSV, and apparently MS SQL isn't smart enough to give me a correctly formatted CSV file.
Some of these fields with new lines are wrapped in quotes, but some aren't, I'm not sure why (it seems to quote fields if they contain more than one new line, but not if they only contain one new line, thanks Microsoft, that's useful).
When I try to open this CSV in Excel, some of the rows are wrong because of the new lines, it thinks that one row is two rows.
How can I fix this?
I was thinking I could use a regex. Maybe something like:
/,[^,]*\n[^,]*,/
Problem with this is it matches the last element of one line and the 1st of the next line.
Here is an example csv that demonstrates the issue:
field a,field b,field c,field d,field e
1,2,3,4,5
test,computer,I like
pie,4,8
123,456,"7
8
9",10,11
a,b,c,d,e
A simple regex replacement won't work, but here's a solution based on preg_replace_callback:
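// Wrap every unquoted field that spans a line break in double quotes.
function add_quotes($matches) {
    return preg_replace('~(?<=^|,)(?>[^,"\r\n]+\r?\n[^,]*)(?=,|$)~',
                        '"$0"',
                        $matches[0]);
}
// Match one logical row of exactly five fields; quoted fields may contain line breaks.
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){5}$~m';
$result = preg_replace_callback($row_regex, 'add_quotes', $source);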
The secret to $row_regex is knowing ahead of time how many columns there are. It starts at the beginning of a line (^ in multiline mode) and consumes the next five things that look like fields. It's not as efficient as I'd like, because it always overshoots on the last column, consuming the "real" line separator and the first field of the next row before backtracking to the end of the line. If your documents are very large, that might be a problem.
If you don't know in advance how many columns there are, you can discover that by matching just the first row and counting the matches. Of course, that assumes the row doesn't contain any of the funky fields that caused the problem. If the first row contains column headers you shouldn't have to worry about that, or about legitimate quoted fields either. Here's how I did it:
preg_match_all('~\G,?[^,\r\n]++~', $source, $cols);
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){' . count($cols[0]) . '}$~m';
Your sample data contains only linefeeds (\n), but I've allowed for DOS-style \r\n as well. (Since the file is generated by a Microsoft product, I won't worry about the older-Mac style CR-only separator.)
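For readers who are not using PHP, the same idea (take the column count from the header row, then treat physical lines as pieces of one record until that many fields are present) can be sketched in Python. Note that this is a different technique from the regex above: it glues lines together with a small quote-aware splitter instead of matching rows with a pattern. The names `split_fields` and `repair_csv` are made up for this sketch. It assumes the header row is intact, and it cannot detect a stray line break in the last column of a record.

```python
import csv
import io


def split_fields(record):
    """Split on commas outside double quotes.

    Returns (fields, open_quote); open_quote is True when the record
    ends inside a quoted field and therefore continues on the next line.
    """
    fields, current, in_quotes = [], [], False
    i = 0
    while i < len(record):
        ch = record[i]
        if ch == '"':
            if in_quotes and record[i + 1:i + 2] == '"':
                current.append('"')   # doubled quote inside a quoted field
                i += 1
            else:
                in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    fields.append("".join(current))
    return fields, in_quotes


def repair_csv(text):
    """Rejoin records broken by line breaks and re-emit them properly quoted."""
    lines = text.splitlines()
    ncols = len(split_fields(lines[0])[0])   # column count from the header row
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    buffer = None
    for line in lines:
        buffer = line if buffer is None else buffer + "\n" + line
        fields, open_quote = split_fields(buffer)
        if open_quote or len(fields) < ncols:
            continue                         # record is still incomplete
        writer.writerow(fields)              # writer quotes multi-line fields
        buffer = None
    if buffer is not None:                   # trailing incomplete record, kept as-is
        writer.writerow(split_fields(buffer)[0])
    return out.getvalue()
```

Applied to the example CSV from the question, every record comes back with five fields, and the `I like` / `pie` pair ends up in a single quoted field.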
If you want a programmatic solution in Java, open the file using the OpenCSV library. If it is a manual operation, open the file in a text editor such as Vim and run a replace command. If it is a batch operation, you can use a Perl command to clean up the CRLFs.
I have a text file (txt) containing formatted text (just line breaks, carriage returns and tabs)
It also contains German language characters.
I want to use the Bulk Insert command in T-SQL to read the text file into one field within a database table.
I ran this command:
CREATE TABLE #MyTestTable (
MyData NVARCHAR(MAX)
)
BULK INSERT [#MyTestTable]
FROM 'D:\MyTextFile.txt'
SELECT * FROM #MyTestTable
The problem is that it reads each line of the text file into a new row in the Temp table. I want it to read the whole file (formatting and all) into one row.
Also the German language characters appear to be lost - replaced by a non-printable character default in the Results View.
Anyone any ideas how I can achieve this?
Thanks.
You can use the ROWTERMINATOR and CODEPAGE parameters. The default row terminator is '\r\n'. For CODEPAGE, you need to know the encoding of your raw file and the default collation of your DB.
BULK INSERT [#MyTestTable]
FROM 'D:\MyTextFile.txt'
WITH (ROWTERMINATOR = '\0',
CODEPAGE = 'ACP')
Also see http://msdn.microsoft.com/en-us/library/ms188365.aspx
Use this:
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n'
Where | is your column delimiter.
Don't use bulk insert. It is made to take one record per line. You need to write code.
Handle the conversion from your text file to Unicode (nvarchar) properly in code. Bulk insert probably applied the standard codepage, losing your characters.
This really calls for a minor programming job: an hour of work or so, plus another for testing, plus however long it takes to run.
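As a rough idea of what that code could look like, here is a minimal Python sketch of the reading and decoding half. The function name `read_whole_file` is made up, and the `cp1252` default is only an assumption about how a German-language Windows text file might be encoded; use the file's real encoding. The database half is left out: the returned string would be passed as a single NVARCHAR(MAX) parameter through whichever driver you use.

```python
def read_whole_file(path, encoding="cp1252"):
    """Return the entire file as one string, decoded with an explicit encoding.

    newline="" disables newline translation, so carriage returns,
    line feeds and tabs arrive exactly as they are stored in the file.
    """
    with open(path, "r", encoding=encoding, newline="") as f:
        return f.read()
```

Because the encoding is stated explicitly rather than left to a default codepage, characters such as ä, ö, ü and ß are decoded to proper Unicode before they ever reach the database.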