Whenever I try to import a CSV file into SQL Server with more than one column, nothing is imported and I get no error. I know the file is terminated fine, because it works with one column if I modify the file and the table accordingly. I am limiting the rows so it never reaches the end of the file, and the line terminator is the correct, valid one (also shown by it working with one column only).
All I get is this and no errors
0 rows affected
I've also checked all the other similar questions like this, and they all point to a bad end of file or line terminator, but all is well here...
I have tried quotes and no quotes. For example, I have a table with 2 columns of varchar(max).
I run:
bulk insert mytable from 'file.csv' WITH (FIRSTROW=2,lastrow=4,rowterminator='\n')
My sample file is:
name,status
TEST00040697,OK
TEST00042142,OK
TEST00042782,OK
TEST00043431,BT
If I drop a column from the table and delete the second column from the CSV, making sure it still has the same \n line terminator, it works just fine.
I have also tried specifying the 'errorfile' parameter but it never seems to write anything or even create the file.
Well, that was embarrassing.
SQL Server, in its wisdom, uses \t as the default field terminator even for a CSV file; I guess when the documentation says FORMAT = 'CSV', that's an example and not the default.
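For anyone hitting the same thing: explicitly naming the field terminator (or using FORMAT = 'CSV', which as far as I know needs SQL Server 2017 or later) fixes it. With the same table and file as above, something like:

bulk insert mytable from 'file.csv'
WITH (FIRSTROW=2, LASTROW=4, FIELDTERMINATOR=',', ROWTERMINATOR='\n')

-- or, on SQL Server 2017 and later:
bulk insert mytable from 'file.csv'
WITH (FORMAT='CSV', FIRSTROW=2, LASTROW=4)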
If only it produced actual proper and useful error messages...
Related
I have a file that should have 8 characters per line, which I want to load into a table in SQL Server:
ABCD1234
ABCD5678
!
DCBA4321
DCBA9876
>
ABCDEFGH
However, I may get bad rows. With SSIS I tried all three methods:
Delimited with {CR}{LF}, Fixed Width, and finally Ragged Right.
In all cases parsing fails and the rows are redirected to the error table. When I remove the bad lines, everything is fine.
What is strange is that with a small sample like this it still works and the rows are inserted into the expected table.
When the file is big, parsing may fail not at the first bad row but at the second or third, and everything after that is inserted into the ERROR table.
Isn't it supposed to skip the bad row and insert the good ones into the expected table, even when they come after it?
Or is there another solution?
Try to add a conditional split component with the following expression in order to ignore bad rows:
LEN([InputColumn]) == 8
I think this will work as expected.
SSIS Basics: Using the Conditional Split
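If you would rather do the filtering in T-SQL instead of the Conditional Split, the same length check works after loading the raw lines into a single-column staging table. A minimal sketch, where the table and column names are placeholders:

-- keep only the lines that are exactly 8 characters long
INSERT INTO dbo.TargetTable (CodeValue)
SELECT RawLine
FROM dbo.StagingRaw
WHERE LEN(RawLine) = 8;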
I have a file dump which needs to be imported into SQL Server on a daily basis, for which I have created a scheduled task so it runs unattended. All the CSV files are delimited by ',', use Windows CR/LF line endings, and are encoded as UTF-8.
To import data from these CSV files, I mainly use OPENROWSET. It worked well until I ran into a file containing the value "S7". If a file contains the value "S7", that column gets recognized as a numeric data type during the OPENROWSET import, which makes the other alphabetic values in that column fail to import, leaving only NULL values.
This is what I have tried so far:
Using IMEX=1: openrowset('Microsoft.ACE.OLEDB.15.0','text;IMEX=1;HDR=Yes;
Using text driver: OpenRowset('MSDASQL','Driver=Microsoft Access Text Driver (*.txt, *.csv);
Using Bulk Insert with or without a format file.
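For context, the text-driver attempt from the list above was a call along these lines; the directory and file name here are made up for illustration, not the real ones:

SELECT *
FROM OPENROWSET('MSDASQL',
                'Driver={Microsoft Access Text Driver (*.txt, *.csv)};DefaultDir=C:\data\;',
                'SELECT * FROM daily.csv');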
The interesting part is that if I use Bulk Insert, it gives me a warning about an unexpected end of file. To solve this, I tried various row terminators such as '0x0a', '\n' and '\r\n', or none at all, but they all failed. Finally I managed to import some of the records by using a row terminator of ',\n'. However, the original file contains around 1000 records and only about 100 are imported, without any errors or warnings.
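For reference, the Bulk Insert variant looked roughly like this (the table name and path are placeholders; the UTF-8 code page option needs SQL Server 2016 or later, as far as I know):

BULK INSERT dbo.DailyImport
FROM 'C:\data\daily.csv'
WITH (CODEPAGE = '65001', FIRSTROW = 2, FIELDTERMINATOR = ',', ROWTERMINATOR = '0x0d0a');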
Any tips or help would be much appreciated.
Edit 1:
The file ends with a newline character, as far as I can tell from Notepad++. I managed to import the files that gave me the unexpected end of file error by removing the last record from them. However, even with this method I still cannot import all the records; only a portion of them get imported.
I'm pretty new to Oracle SQL, and my class is going over bulk loading at the moment. I pretty much get the idea; however, I am having a little trouble getting it to read all of my records.
This is my SQL file:
PROMPT Creating Table 'CUSTOMER'
CREATE TABLE CUSTOMER
(CustomerPhoneKey CHAR(10) PRIMARY KEY
,CustomerLastName VARCHAR(15)
,CustomerFirstName VARCHAR(15)
,CustomerAddress1 VARCHAR(15)
,CutomerAddress2 VARCHAR(30)
,CustomerCity VARCHAR(15)
,CustomerState VARCHAR(5)
,CustomerZip VARCHAR(5)
);
Quick and easy. Now this is my control file to load in the data:
LOAD DATA
INFILE Customer.dat
INTO TABLE Customer
FIELDS TERMINATED BY "|"
(CustomerPhoneKey, CustomerLastName, CustomerFirstName, CustomerAddress1 , CutomerAddress2, CustomerCity, CustomerState, CustomerZip)
Then the data file:
2065552123|Lamont|Jason|NULL|161 South Western Ave|NULL|NULL|98001
2065553252|Johnston|Mark|Apt. 304|1215 Terrace Avenue|Seattle|WA|98001
2065552963|Lewis|Clark|NULL|520 East Lake Way|NULL|NULL|98002
2065553213|Anderson|Karl|Apt 10|222 Southern Street|NULL|NULL|98001
2065552217|Wong|Frank|NULL|2832 Washington Ave|Seattle|WA|98002
2065556623|Jimenez|Maria|Apt 13 B|1200 Norton Way|NULL|NULL|98003
The problem is that only the last record
2065556623|Jimenez|Maria|Apt 13 B|1200 Norton Way|NULL|NULL|98003
is being loaded in. The rest are in my bad file
So I took a look at my log file, and the errors I'm getting are:
Record 1: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 2: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 3: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 4: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Record 5: Rejected - Error on table CUSTOMER, column CUSTOMERZIP.
ORA-01401: inserted value too large for column
Table CUSTOMER:
  1 Row successfully loaded.
  5 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.
Onto the question. I see that CustomerZip is the problem, and initially I had it as CHAR(5) -- I did this because my understanding of the data type is that for numeric values like a zip code, where I will not be doing arithmetic operations, it is better to store them as CHAR. I also did not use VARCHAR2(5) initially because, seeing as it is a zip code, I don't want the length to vary; it should always be 5. Maybe I'm just misunderstanding this, so if anyone can clear that up, that would be awesome.
My second question is: how do I fix this problem? Given the above understanding of these data types, it doesn't make sense that neither CHAR(5) nor VARCHAR2(5) works, as I am getting the same errors for both.
It makes even less sense that one record (the last one) actually works.
Thank you for the help in advance
Your data file has extra, invisible characters. We can't see the original but presumably it was created in Windows and has CRLF new line separators; and you're running SQL*Loader in a UNIX/Linux environment that is only expecting line feed (LF). The carriage return (CR) characters are still in the file, and Oracle is seeing them as part of the ZIP field in the file.
The last line doesn't have a CRLF (or any new-line marker), so on that line - and only that line - the ZIP field is seen as 5 characters. For all the others it's seen as six, e.g. 98001^M.
You can read more about the default behaviour in the documentation:
On UNIX-based platforms, if no terminator_string is specified, then SQL*Loader defaults to the line feed character, \n.
On Windows NT, if no terminator_string is specified, then SQL*Loader uses either \n or \r\n as the record terminator, depending on which one it finds first in the data file. This means that if you know that one or more records in your data file has \n embedded in a field, but you want \r\n to be used as the record terminator, then you must specify it.
If you open the data file in an editor like vi or vim, you'll see those extra ^M control characters.
There are several ways to fix this. You can modify the file; the simplest way to do that is to copy and paste the data into a new file created in the environment you'll run SQL*Loader in. There are utilities to convert line endings if you prefer, e.g. dos2unix, or your Windows editor may be able to save the file without the CRs. You could also add an extra field delimiter to the data file, as Ditto suggested.
Or you could tell SQL*Loader to expect CRLF by changing the INFILE line:
LOAD DATA
INFILE Customer.dat "str '\r\n'"
INTO TABLE Customer
...
... though that will then cause problems if you do supply a file created in Linux, without the CR characters.
There is a utility, dos2unix, that is present on almost all UNIX machines. If you run it on the data file, it will output the file with the DOS/Windows carriage returns stripped.
I'm importing a non-Unicode CSV file into SQL Server using SSIS. I get the error "Text was truncated or one or more characters had no match in the target code page". It fails on column 0 of row 70962, which has data just like every other row; the data in the first column is no longer than the data in the rows above it.
My column 0 is defined in the flat file connection, and in the database, as 255 wide. The data in row 70962 (and most other rows) is 17 characters.
The strange thing is, if I remove a row above row 70962 in the file, even the first row, and save the CSV file, the import runs fine. If I put the removed row back and run the import, it fails again.
So I'm not even sure how to identify what the issue is.
If I create a new flat file connection that is a single column, I can import the whole file into a single-column table. But as soon as I add the first column delimiter (i.e. second column), then it fails on that same row.
At the moment I'm just short of ideas as to how to debug this further.
You already gave the answer in your question ;)
if I remove a row above row 70962 in the file, even the first row, and
save the csv file, then the import runs fine.
You have a broken delimiter somewhere in the file. When you remove any data before the offending line, the delimiter mismatch is probably not handled properly at that point but simply left open until the very end of the file, after which the program handles it for you.
Check the row and column delimiters of the row above the one you mentioned, and of that row itself.
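Since you can already load the whole file into a single-column table, one quick way to find the broken line is to count the delimiters per row in that staging table and look for the odd one out. A rough sketch, with placeholder table and column names:

-- rows whose comma count differs from the rest are the likely culprits
SELECT LEN(RawLine) - LEN(REPLACE(RawLine, ',', '')) AS CommaCount,
       COUNT(*) AS LineCount
FROM dbo.StagingRaw
GROUP BY LEN(RawLine) - LEN(REPLACE(RawLine, ',', ''))
ORDER BY LineCount;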
I ran a query on an MS SQL database using SQL Server Management Studio, and some of the fields contained newlines. I chose to save the result as a CSV, and apparently MS SQL isn't smart enough to give me a correctly formatted CSV file.
Some of these fields with newlines are wrapped in quotes, but some aren't, and I'm not sure why (it seems to quote fields if they contain more than one newline, but not if they contain only one; thanks Microsoft, that's useful).
When I try to open this CSV in Excel, some of the rows come out wrong because of the newlines; it thinks one row is two rows.
How can I fix this?
I was thinking I could use a regex. Maybe something like:
/,[^,]*\n[^,]*,/
The problem with this is that it matches the last element of one line and the first element of the next line.
Here is an example csv that demonstrates the issue:
field a,field b,field c,field d,field e
1,2,3,4,5
test,computer,I like
pie,4,8
123,456,"7
8
9",10,11
a,b,c,d,e
A simple regex replacement won't work, but here's a solution based on preg_replace_callback:
// Wraps any unquoted field that contains a newline in double quotes.
function add_quotes($matches) {
    return preg_replace('~(?<=^|,)(?>[^,"\r\n]+\r?\n[^,]*)(?=,|$)~',
                        '"$0"',
                        $matches[0]);
}

// Matches one logical row: five comma-separated fields, quoted or not.
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){5}$~m';
$result = preg_replace_callback($row_regex, 'add_quotes', $source);
The secret to $row_regex is knowing ahead of time how many columns there are. It starts at the beginning of a line (^ in multiline mode) and consumes the next five things that look like fields. It's not as efficient as I'd like, because it always overshoots on the last column, consuming the "real" line separator and the first field of the next row before backtracking to the end of the line. If your documents are very large, that might be a problem.
If you don't know in advance how many columns there are, you can discover that by matching just the first row and counting the matches. Of course, that assumes the row doesn't contain any of the funky fields that caused the problem. If the first row contains column headers you shouldn't have to worry about that, or about legitimate quoted fields either. Here's how I did it:
// Count the fields in the first row to discover the column count.
preg_match_all('~\G,?[^,\r\n]++~', $source, $cols);
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){' . count($cols[0]) . '}$~m';
Your sample data contains only linefeeds (\n), but I've allowed for DOS-style \r\n as well. (Since the file is generated by a Microsoft product, I won't worry about the older-Mac style CR-only separator.)
See an online demo
If you want a programmatic Java solution, open the file using the OpenCSV library. If it is a manual operation, open the file in a text editor such as Vim and run a replace command. If it is a batch operation, you can use a Perl command to clean up the CRLFs.