SSIS skip bad row - sql-server

I have a file that is expected to have 8 characters per line, and I want to load it into a table in SQL Server:
ABCD1234
ABCD5678
!
DCBA4321
DCBA9876
>
ABCDEFGH
However, I may get bad rows. With SSIS I tried all three methods:
delimited with {CR}{LF}, fixed width, and finally ragged right.
In all cases parsing fails and the rows are redirected to the error table. When I remove the bad lines, everything is fine.
What is strange is that with a small sample like this it still works and everything is inserted into the expected table.
When the file is big, parsing may fail not at the first bad row but at the second or third, and all the remaining rows are inserted into the error table.
Isn't it supposed to skip the bad row and insert the good ones into the expected table even when they come after it?
Or is there another solution?

Try adding a Conditional Split component with the following expression in order to ignore the bad rows:
LEN([InputColumn]) == 8
I think this will work as expected.
SSIS Basics: Using the Conditional Split
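If you would rather skip the bad rows outside the data flow, another option is to bulk-load every line into a one-column staging table and keep only the 8-character rows. This is just a rough sketch; the table, column, and file names are made up for illustration:

-- hypothetical staging approach: load raw lines, then filter on length
CREATE TABLE dbo.StagingRaw (RawLine varchar(100) NULL);

BULK INSERT dbo.StagingRaw
FROM 'C:\data\input.txt'
WITH (ROWTERMINATOR = '\n');

INSERT INTO dbo.TargetTable (Code)
SELECT RawLine
FROM dbo.StagingRaw
WHERE LEN(RawLine) = 8;  -- good rows go to the target, everything else stays behind

If the file is CR/LF terminated, use ROWTERMINATOR = '\r\n' (or trim the trailing carriage return) so the length check still comes out to 8.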

Related

Processing an Interactive Grid manually through PL/SQL keeps throwing an error

I used this site https://community.oracle.com/thread/3937159?start=0&tstart=0 to learn how to manually process interactive grids. I got it to work on a small table with 3 columns, but when I tried to get it to work for a bigger table, it kept throwing this error:
PL/SQL: numeric or value error: character string buffer too small
I tried updating only 1 column and converting the datatype to the correct one, but the error is not going away.
This message usually means you're trying to store a value like 'AAAA' into a variable or column that only accepts 1, 2 or 3 characters, such as varchar2(3).
Make sure your columns, and any PL/SQL variables you read them into, have a size limit that fits the data you're processing.
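For illustration, a minimal sketch that reproduces the same message (the variable name is made up):

DECLARE
  v_city VARCHAR2(3);
BEGIN
  v_city := 'AAAA';  -- four characters into a three-character buffer
END;
/
-- ORA-06502: PL/SQL: numeric or value error: character string buffer too small

In an interactive grid process, that is typically a local variable declared shorter than the value coming from the grid.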

Bulk import only works with a 1-column CSV file

Whenever I try to import a CSV file with more than one column into SQL Server, I get an error (well, nothing is imported). I know the file is terminated fine, because it works with 1 column if I modify the file and the table. I am limiting the rows so it never gets to the end of the file, and the line terminator is the correct and valid one (also shown by it working when there is only 1 column).
All I get is this, with no errors:
0 rows affected
I've also checked all the other various questions like this, and they all point to a bad end-of-file or line terminator, but all is well here...
I have tried quotes and no quotes. For example, I have a table with 2 columns of varchar(max).
I run:
BULK INSERT mytable FROM 'file.csv' WITH (FIRSTROW = 2, LASTROW = 4, ROWTERMINATOR = '\n')
My sample file is:
name,status
TEST00040697,OK
TEST00042142,OK
TEST00042782,OK
TEST00043431,BT
If I drop a column from the table and delete the second column in the CSV, ensuring it keeps the same line terminator \n, it works just fine.
I have also tried specifying the ERRORFILE parameter, but it never seems to write anything or even create the file.
Well, that was embarrassing.
SQL Server, in its wisdom, uses \t as the default field terminator even for a CSV file; I guess when the documentation says FORMAT = 'CSV' it's an example and not the default.
If only it produced actual proper and useful error messages...
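For reference, here is the statement from above with the field terminator spelled out (same hypothetical table and file; FORMAT = 'CSV' would also work, but only on SQL Server 2017 and later):

BULK INSERT mytable
FROM 'file.csv'
WITH (
    FIRSTROW = 2,
    LASTROW = 4,
    FIELDTERMINATOR = ',',  -- the missing piece; the default is \t
    ROWTERMINATOR = '\n'
);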

Ragged Right in SSIS does not work properly

Hello: I have an SSIS package that imports a flat text file. The text file is a simple, fixed-width file that is also CR/LF delimited. This means that each record has a set of fixed-length columns, but each record must also end with a CR/LF.
I’ve defined the package as follows:
PROBLEM:
Some records do not have all of the columns, and are therefore shorter. However, ALL records end with a CR/LF. First I tried to import the file as fixed width, and the shorter records were misaligned because, obviously, they weren't a fixed length. Now that I am using ragged right, I am still facing the same issue. Basically, for the shorter records, SSIS borrows from the next line to make up the missing width, even though the next line itself is perfectly fine.
POSSIBLE SOLUTIONS:
1- Ignore the trailing columns that are not needed (basically ignore the problem): this works fine but is not elegant. I was hoping for a better solution.
2- Use the record type at the beginning of the line to split the file BEFORE defining columns. This also works, but I have over 500 fields, and the point of using the flat file import is to be able to generate the columns automatically.
3- Use a script component: that seems like a difficult thing to do.

Incorrect Syntax near ' ' Error, doesn't make sense

I'm trying to import .csv files into a SQL Server database on a web server. I have about 30000 rows in the table. The delimiter in the csv file is ;. It inserted 11202 rows, but after that it stops inserting and says:
Incorrect syntax near 'Farms'. Incorrect syntax near 'Dale'. Incorrect syntax near 'City'. Incorrect syntax near 'Center'. Incorrect syntax near 'Depot'.
These rows are:
111203;Greens Farms;12;446;nocity.jpg;NULL
111205;Grosvenor Dale;12;446;nocity.jpg;NULL
111219;Jewett City;12;446;nocity.jpg;NULL
111230;Mansfield Center;12;446;nocity.jpg;NULL
111231;Mansfield Depot;12;446;nocity.jpg;NULL
I thought it was about the space (' ') in city names like Greens Farms, but there are many cities that contain spaces and they were inserted successfully in the earlier rows. It doesn't make any sense.
Do you have any idea about this situation?
I'd recommend dividing your csv into two files. Of course, the first file will contain the 11202 rows that were successfully imported, and the second would include the remaining ~18798.
One would expect that the first file would be imported with no errors.
Then when you import the second file, you might find that you are dealing with a boundary restriction of some sort, if that file also starts bombing after 10 or 11K imports.
Or, you may more quickly be able to spot the problem importing the smaller second file.
If you are still getting exactly the same errors, but only a limited number, then I'd recommend removing the error rows completely and putting them in yet another file.
In this manner, you'll eventually have imported nearly all your data and you'll be left with a manageable subset where again you may be able to more easily spot the problem.
If, after all that, you've got 10 rows that give errors and you can't see any reason why, just use SQL insert statements to put them in your db.
Hopefully this isn't part of some goal to automate a regularly scheduled process!!
I'd be interested to see how this goes for you. Thanks.

Fix CSV file with new lines

I ran a query on an MS SQL database using SQL Server Management Studio, and some of the fields contained new lines. I chose to save the result as a CSV, and apparently MS SQL isn't smart enough to give me a correctly formatted CSV file.
Some of these fields with new lines are wrapped in quotes, but some aren't; I'm not sure why (it seems to quote fields if they contain more than one new line, but not if they contain only one new line; thanks Microsoft, that's useful).
When I try to open this CSV in Excel, some of the rows come out wrong because of the new lines; it thinks that one row is two rows.
How can I fix this?
I was thinking I could use a regex. Maybe something like:
/,[^,]*\n[^,]*,/
Problem with this is it matches the last element of one line and the 1st of the next line.
Here is an example csv that demonstrates the issue:
field a,field b,field c,field d,field e
1,2,3,4,5
test,computer,I like
pie,4,8
123,456,"7
8
9",10,11
a,b,c,d,e
A simple regex replacement won't work, but here's a solution based on preg_replace_callback:
function add_quotes($matches) {
    // Wrap any unquoted field that contains a line break in double quotes.
    return preg_replace('~(?<=^|,)(?>[^,"\r\n]+\r?\n[^,]*)(?=,|$)~',
                        '"$0"',
                        $matches[0]);
}
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){5}$~m';
$result = preg_replace_callback($row_regex, 'add_quotes', $source);
The secret to $row_regex is knowing ahead of time how many columns there are. It starts at the beginning of a line (^ in multiline mode) and consumes the next five things that look like fields. It's not as efficient as I'd like, because it always overshoots on the last column, consuming the "real" line separator and the first field of the next row before backtracking to the end of the line. If your documents are very large, that might be a problem.
If you don't know in advance how many columns there are, you can discover that by matching just the first row and counting the matches. Of course, that assumes the row doesn't contain any of the funky fields that caused the problem. If the first row contains column headers you shouldn't have to worry about that, or about legitimate quoted fields either. Here's how I did it:
preg_match_all('~\G,?[^,\r\n]++~', $source, $cols);
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){' . count($cols[0]) . '}$~m';
Your sample data contains only linefeeds (\n), but I've allowed for DOS-style \r\n as well. (Since the file is generated by a Microsoft product, I won't worry about the older-Mac style CR-only separator.)
See an online demo
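If it helps, a minimal end-to-end sketch tying it all together (the file names are made up):

$source = file_get_contents('export.csv');       // the CSV saved from SSMS
$result = preg_replace_callback($row_regex, 'add_quotes', $source);
file_put_contents('export-fixed.csv', $result);  // should now open cleanly in Excel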
If you want a programmatic Java solution, open the file using the OpenCSV library. If it is a manual operation, then open the file in a text editor such as Vim and run a replace command. If it is a batch operation, you can use a Perl command to clean up the CRLFs.
