SQL 2005 CSV Import Quote Delimited with inner Quotes and Commas - sql-server

I have a CSV file with quote text delimiters. Most of the 90,000 rows are fine, but I have a few rows with a text field that contains both a quote and a comma. For example, the field's value would be:
AB",AB
When delimited, this becomes
"AB"",AB"
When SQL 2005 attempts to import this I get errors such as...
Messages
Error 0xc0202055: Data Flow Task: The column delimiter for column "Column 4" was not found.
(SQL Server Import and Export Wizard)
This only seems to happen when a quote and comma are in a text value together. Values like
AB"AB which becomes "AB""AB"
or
AB,AB which becomes "AB,AB"
work fine.
Here are some example rows...
"1464885","LEVER WM","","B","MP17"
"1465075",":PLT-BC !!NOTE!!","","B",""
"1465076","BRKT-STR MTR !NOTE!","","B",""
"1465172",":BRKT-SW MTG !NOTE!","","B","MP16"
"1465388","BUSS BAR !NOTE!","","B","MP10"
"1465391","PLT-BLKHD ""NOTE""","","B","MP20"
"1465564","SPROCKET:13TEETH,74MM OD,66MM","ID W/.25"" SETSCR","B","MP6"
"S01266330002","CABLE:224"",E122/261,8 CO","","B","MP11"
The last row is an example of the problem - the "", causes the error.

I've had MAJOR problems with SSIS. Things that Access, Excel and even DTS seemed to do very well, SSIS chokes on. Variable record-length data is another problem, but yes, these embedded qualifiers are a major problem. Especially if you do not have access to the import files because they're on someone else's server that you pay to gain access to, and they might even be 4 to 5 GB in size! You can't just do a "replace all" on that every import.
You may want to look at the utility on Microsoft Downloads called "UnDouble", and here is another workaround you might try.
Seems like with SSIS in SQL Server 2008 the bug is still there. I don't know why they haven't addressed this in the parser, but it's like we went back in time with SSIS in terms of basic import functionality.
UPDATE 11-18-2010: This bug still exists in SSIS. Amazing.

How about just:
Search/replace all "", with ''; (fix all the broken fields)
Search/replace all ,''; with ,"", (to "unfix" properly empty fields.)
Search/replace all '';''; with "","", (to "unfix" properly empty fields which follow a correct encapsulation of embedded delimiters.)
That converts your original to:
"1464885","LEVER WM","","B","MP17"
"1465075",":PLT-BC !!NOTE!!","","B",""
"1465076","BRKT-STR MTR !NOTE!","","B",""
"1465172",":BRKT-SW MTG !NOTE!","","B","MP16"
"1465388","BUSS BAR !NOTE!","","B","MP10"
"1465391","PLT-BLKHD ""NOTE""","","B","MP20"
"1465564","SPROCKET:13TEETH,74MM OD,66MM","ID W/.25"" SETSCR","B","MP6"
"S01266330002","CABLE:224'';E122/261,8 CO","","B","MP11"
Which seems to run the gauntlet fine in SSIS. You may have to apply step 3 recursively to account for 3 empty fields in a row ('';'';'';, etc.), but the bottom line here is that when you have embedded text qualifiers, you have to either escape them or replace them. Let this be a lesson for your CSV creation processes going forward.
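If you'd rather script the three passes than run manual search/replaces, here's a minimal C# sketch of the same idea. It assumes the file fits in memory, the paths are placeholders, and it repeats steps 2 and 3 until nothing changes, which covers the "recursive" case above:

using System.IO;

class FixEmbeddedQualifiers
{
    static void Main()
    {
        string text = File.ReadAllText("import.csv"); // hypothetical path

        // Step 1: "", -> ''; (fixes the broken fields, but also clobbers empty fields)
        text = text.Replace("\"\",", "'';");

        // Steps 2 and 3, repeated until stable so runs of empty fields are restored too.
        string previous;
        do
        {
            previous = text;
            text = text.Replace(",'';", ",\"\",");       // step 2: restore ordinary empty fields
            text = text.Replace("'';'';", "\"\",\"\","); // step 3: restore empty fields after escaped quotes
        } while (text != previous);

        File.WriteAllText("import-fixed.csv", text); // hypothetical path
    }
}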

Microsoft says doubled double quotes inside double-quote-delimited fields just don't work. A fix is planned for the end of 2011...
In the meantime we will have to use workarounds like those described in the other answers.

I would just do a search/replace for ", and replace it with ,
Do you have access to the original file?

Related

disregard line breaks in field

I'm trying to import data into my MS SQL DB (as a flat file). However, there is a problem with one of the fields: it contains a line break within the data, which leads to the import wizard thinking it's the end of the line, hence breaking each row into two. I've tried to import the data into Excel as well (just to try it out), but it's the same behavior.
Does anyone know how to solve this? Any pre-import mechanism that might massage the data somehow?
(unfortunately, it's not practically possible for me to ask the source system to change the encoding)
//Eva-Lotta
Use this to replace the newline characters in columns that contain them:
REPLACE(REPLACE(columnName, CHAR(13), ' '), CHAR(10), ' ')
Regards
I've managed to find a workaround! I start by splitting the files into chunks (as they are 3.8 GB in size...), open them in UltraEdit, loop through them to join the two lines back together, and import them into Excel / my SQL DB. It's not neat, but it has solved my immediate problem ... but thanks for your engagement!
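For anyone who'd rather script that join step than do it in UltraEdit, here is a rough C# sketch of the same idea. It streams the file line by line (so the multi-GB size isn't a problem), and it naively assumes the data itself contains no embedded commas, so a row broken by a newline is simply one with too few fields. The paths and the column count are placeholders:

using System.IO;

class JoinBrokenRows
{
    static void Main()
    {
        const int expectedFields = 5; // placeholder: the real column count

        using (var writer = new StreamWriter("fixed.csv")) // hypothetical output path
        {
            string pending = "";
            foreach (var line in File.ReadLines("broken.csv")) // hypothetical input path
            {
                // Glue continuation lines onto the current row with a space.
                pending = pending.Length == 0 ? line : pending + " " + line;

                // Naive completeness check: assumes fields contain no embedded commas.
                if (pending.Split(',').Length >= expectedFields)
                {
                    writer.WriteLine(pending);
                    pending = "";
                }
            }
            if (pending.Length > 0) writer.WriteLine(pending);
        }
    }
}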

SQL server copy paste no column breaks

I have a very simple query in MS SQL Server 2016. I have been using 'Select All', 'Copy With Headers' and pasting the results into Excel. The results parse out perfectly into the correct columns with headers. I've been doing this for over a year with no problem. For the past 2 days, though, the data is not split into the right columns, but comes over as one long string. There are no spaces, tabs or other delimiters that would allow me to parse the data after pasting it into Excel. Because this was working fine before, I have to wonder if I somehow set a default incorrectly somewhere. I will have to default to saving to a .csv file and then importing, which works fine but is just annoying when a simple cut and paste was working before.
If you use the Text to Columns feature in Excel before the copy and paste from SQL, the pasted data will remember the settings you selected for the delimiter and use them.
Run Text to Columns now, set it to comma, and then try pasting your results again to see if this fixes your issue.
What I am assuming, as @Tim Mylott has suggested, is that you currently have it set to tabs or something else that doesn't display in a user-friendly way. Tabs don't appear when copied and pasted in the string format, but they are there behind the scenes, so the TTC feature will still work.
I had the same problem, and this solution worked perfectly. My pasted data became space-delimited when pasting into Excel, so I highlighted the first column, ran the Text to Columns wizard and deselected space and comma, leaving only tab. Then I deleted the original data and re-pasted it into Excel. No spaces!

How to retrieve the name of a file and store it in the database using SSIS package?

I'm doing an Excel loop through fifty or more Excel files. The loop goes through each Excel file, grabs all the data and inputs it into the database without error. This is the typical process of setting delay validation to true, and making sure that the expression for the Excel Connection is a string variable called EFile that is set to nothing (in the loop).
What is not working: trying to input the name of the Excel file into the database.
What's been tried (edit; SO changed my 2 to 1 - don't know why):
1. Add a Derived Column between the Excel source and the database destination, and add a column using the EFile expression (so under Expression in the Derived Column it would be @[User::EFile]). However, this inputs only a blank (nothing) into the database.
2. One suggestion was to add ANOTHER string variable, set its EvaluateAsExpression property to True, and set its Expression to the EFile variable (@[User::EFile]). The funny thing is that this does the same thing - inputs a blank into the database.
Numerous people on blogs claim they can do this, yet I haven't seen one actually address this (I have a blog and I will definitely be showing people how to do this when I get an answer because, so far, these others have fallen short). How do I grab an Excel file's name and input it in a database during a loop?
Added: Forgot to add, no scripts; the claim is that it can be done without them, so I want to see the solution without them.
Note: I already have the ability to import the data from the Excel files - that's easy (see my GitHub account, as I have two different projects for importing all sorts of txt, csv, xls, xlsx data). I am trying to also get the actual name of the file being imported also into the database. So, if there are fifty Excel files, along with the data in each file, the database will have the fifty file names alongside that data (so if each file has 1000 rows of data, each 1000 rows would also have the name of the file they came from next to them as an additional column). This point seems to cause a lot of confusion, as people assume I'm having trouble importing data in files - NOPE, see my GitHub; again that's easy. It's the FILENAME that needs to also be imported.
Test package: https://github.com/tmmtsmith/SSISLoopWithFileName
Solution: @jaimet pointed out that the Derived Column needed to use @[User::CurrentFile] (see the test package). When I first ran the package, I still got a blank value in my database. But when we originally set up the connection, we do point it to an actual file (I call this "fooling the package"), then change the expression on the connection later to @[User::CurrentFile], which is blank. The Derived Column, using the variable @[User::CurrentFile], showed a string length of 0. So, I removed the Derived Column, put the full file path and name in the variable, then added the variable to the Derived Column (which made it think the string was 91 characters long), then went back and set the variable to nothing (an English teacher would hate the THENs about right now). When I ran the package, it inputted the full file path. Maybe, like the connection, it needs to initially think that a file exists in order for it to input the full number of characters?
Appreciate all the help.
The issue is caused by the blank value in the variable @[User::FileNameInput], which made the SSIS package assume that the value of this variable will always be of zero length in the Derived Column transformation.
Change the expression on the Derived Column transformation from @[User::FileNameInput] to (DT_STR, 2000, 1252)@[User::FileNameInput].
Type casting the derived column to 2000 sets the column length to that maximum value. The value 1252 represents the code page; I assumed that you are using the ANSI code page. I took the value 2000 from your table definition because the FilePath column had the data type VARCHAR(2000). If the column data type had been NVARCHAR(2000), then the expression would be (DT_WSTR, 2000)@[User::FileNameInput].
Tim,
You're using the wrong variable in your Derived Column component. You are storing the filename in @[User::CurrentFile], but the variable that you're using in your Derived Column component is @[User::FileNameInput].
Change your Derived Column component to use @[User::CurrentFile] and you'll be good.
Hope that helps.
JT
If you are using a Foreach Loop to process the files in a folder, then I have used the technique described in SSIS Junkie's blog to get the filename into an SSIS variable: SSIS: Enumerating files in a Foreach loop
You can use the variable later in your flow to write it to the database.
To all intents and purposes, your method #1 should work. That's exactly how I would attempt to do it. I am baffled as to why it is not working. Could you perhaps share your package?
Tony, thanks very much for the link. Much appreciated.
Regards
Jamie

Fix CSV file with new lines

I ran a query on an MS SQL database using SQL Server Management Studio, and some of the fields contained newlines. I selected to save the result as a CSV, and apparently MS SQL isn't smart enough to give me a correctly formatted CSV file.
Some of these fields with newlines are wrapped in quotes, but some aren't; I'm not sure why (it seems to quote fields if they contain more than one newline, but not if they only contain one newline - thanks, Microsoft, that's useful).
When I try to open this CSV in Excel, some of the rows are wrong because of the newlines; it thinks that one row is two rows.
How can I fix this?
I was thinking I could use a regex. Maybe something like:
/,[^,]*\n[^,]*,/
The problem with this is that it matches the last element of one line and the first element of the next line.
Here is an example csv that demonstrates the issue:
field a,field b,field c,field d,field e
1,2,3,4,5
test,computer,I like
pie,4,8
123,456,"7
8
9",10,11
a,b,c,d,e
A simple regex replacement won't work, but here's a solution based on preg_replace_callback:
function add_quotes($matches) {
    // Quote any unquoted field that contains a line break.
    return preg_replace('~(?<=^|,)(?>[^,"\r\n]+\r?\n[^,]*)(?=,|$)~',
        '"$0"',
        $matches[0]);
}

$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){5}$~m';
$result = preg_replace_callback($row_regex, 'add_quotes', $source);
The secret to $row_regex is knowing ahead of time how many columns there are. It starts at the beginning of a line (^ in multiline mode) and consumes the next five things that look like fields. It's not as efficient as I'd like, because it always overshoots on the last column, consuming the "real" line separator and the first field of the next row before backtracking to the end of the line. If your documents are very large, that might be a problem.
If you don't know in advance how many columns there are, you can discover that by matching just the first row and counting the matches. Of course, that assumes the row doesn't contain any of the funky fields that caused the problem. If the first row contains column headers you shouldn't have to worry about that, or about legitimate quoted fields either. Here's how I did it:
preg_match_all('~\G,?[^,\r\n]++~', $source, $cols);
$row_regex = '~^(?:(?:(?:"[^"]*")+|[^,]*)(?:,|$)){' . count($cols[0]) . '}$~m';
Your sample data contains only linefeeds (\n), but I've allowed for DOS-style \r\n as well. (Since the file is generated by a Microsoft product, I won't worry about the older-Mac style CR-only separator.)
See an online demo
If you want a Java programmatic solution, open the file using the OpenCSV library. If it is a manual operation, then open the file in a text editor such as Vim and run a replace command. If it is a batch operation, you can use a Perl command to clean up the CRLFs.

Is there a way to escape a double quote within a text qualified string on a SSIS Csv import?

I have a CSV I'm trying to import into SQL using SSIS packages through code.
A line might look something like this
321,1234,"SOME MACHINE, MACHINE ACCESSORIES 1 1/2"" - 4"""
In this example they're using a double quote to symbolize inches, and trying to escape the inches double quote with another double quote. SSIS, however, does not honour this escaping and fails.
Is there any way I can still use the double quote symbol for inches and escape it within the quoted text?
Many suggestions are to replace the double quote with two single quotes. Is this the only workaround, or can I use some other escape technique?
I've seen people talk about using the Derived Column transformation but in my case SSIS fails at the Flat File Source step and I therefore cannot get to a derived column transform step.
I'm currently running a Script Task in the control flow, just before the data flow, to manipulate the CSV with some regexes to clean up the data.
I need the string to be text qualified with the 2 outer double quotes because of potential commas in the description column.
What can I do about the double quotes within the text qualified string?
Wow, I expected to be able to answer with "Just set the text qualifier", but figured you would have already tried that so I tried it before I answered. Surprise, SSIS doesn't support standard CSV files!
Looks like this is a common complaint. There is one comment in there from Microsoft about some samples that may help. Here is the codeplex project; they mentioned that the Regular Expression Flat File Source sample and the Delimited File Reader Source sample in particular may help -- I'm guessing the Delimited File Reader would be more worthwhile.
I ran into a similar problem yesterday.
We got a csv file that uses a comma (,) as the delimiter and a double quote (") as the text qualifier, but there is a field that contains a double quote within a double-quoted string (a non-escaped double quote within a string).
After spending half a day searching, I came up with the solution below:
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Load the file into a one-dimensional string array.
// fullFilePath is the full path + file name.
var fileContent = File.ReadAllLines(fullFilePath);

// Find double quotes that sit inside a field (not at a line boundary and
// not next to a comma) and replace each one with a single quote.
var fileContentUpdated = fileContent.Select(
    x => new Regex(@"(?<!^)(?<!\,)""(?!\,)(?!$)").Replace(x, "'")).ToArray();

// Write the string array back into the csv file.
File.WriteAllLines(fullFilePath, fileContentUpdated);
I don't see any other way than to replace the double quote with something else to avoid the issue.
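To see what the regex actually does, here's a quick self-contained C# demo on the sample line from the question. The lookarounds leave the outer text qualifiers and the delimiter-adjacent quotes alone, and the doubled quotes inside the field come out as doubled single quotes:

using System;
using System.Text.RegularExpressions;

class Demo
{
    static void Main()
    {
        // The problem line from the question, as it appears in the file:
        // 321,1234,"SOME MACHINE, MACHINE ACCESSORIES 1 1/2"" - 4"""
        var line = "321,1234,\"SOME MACHINE, MACHINE ACCESSORIES 1 1/2\"\" - 4\"\"\"";

        var cleaned = new Regex(@"(?<!^)(?<!\,)""(?!\,)(?!$)").Replace(line, "'");

        // Prints: 321,1234,"SOME MACHINE, MACHINE ACCESSORIES 1 1/2'' - 4''"
        Console.WriteLine(cleaned);
    }
}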
This answer is not applicable to 2005 as referenced here, but in case someone comes across this while searching and is using 2008, this other question appears to have a working answer: SSIS 2008 and Undouble
There is a workaround: if, in the file connection, you remove the " as the text qualifier, you can strip all the double quotes later with a Derived Column expression such as REPLACE(Item_Name, "\"", ""). The downside is that you will need to do it for every field.
I didn't find a direct way to achieve this, so I wrote a script:
Add a Script component to the data flow (make sure to connect the input arrow, or it won't recognize the columns)
Right-click the Script component -> Input Columns, and change the column's Usage Type to READWRITE
Click OK
Edit the script to replace double quotes with two single quotes:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Replace embedded double quotes with two single quotes.
    Row.Description = Row.Description.Replace("\"", "''");
}
Probably old news now, but this issue was fixed in SQL Server 2012. I was able to import the same file on a 2012 server that failed on my 2008 server.
