SSIS Lower Case Thorn Delimiter Issue - sql-server

I'm using Visual Studio 2013 SSIS and have written a package to ingest a DAT file in standard Concordance delimited format: a lowercase thorn "þ" text qualifier, a pilcrow "¶" column delimiter (character 244 in code page 850), and a CR/LF row delimiter.
I'm not sure how it happens, but the delimiter itself gets imported into the receiving table in the SQL Server 2014 database.
Any ideas would be greatly appreciated.

In the Flat File Connection Manager, do the following:
Choose {CR}{LF} as Row Delimiter
Choose ¶ as Column Delimiter
Choose þ as Text Qualifier
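To see why the text qualifier matters, here is a rough Python sketch that uses the standard csv module as a stand-in for the Flat File connection manager (an assumption for illustration; SSIS does not run this code, and the sample rows are made up). With þ as the quote character it is stripped from each value; with no qualifier, the thorns end up in the data, which matches the symptom in the question.

```python
import csv
import io


def parse_concordance(text: str, use_qualifier: bool = True) -> list[list[str]]:
    """Split Concordance-style text: ¶ between columns, þ around values, CR/LF between rows.

    Python's csv module is only a stand-in for the SSIS Flat File connection manager.
    """
    options = {"delimiter": "¶"}
    if use_qualifier:
        # Like choosing þ as the Text Qualifier: þ is removed from each value
        options["quotechar"] = "þ"
    else:
        # Like Text Qualifier <none>: þ is treated as ordinary data
        options["quoting"] = csv.QUOTE_NONE
    # newline="" leaves the CR/LF row delimiters for the csv reader to handle
    reader = csv.reader(io.StringIO(text, newline=""), **options)
    return list(reader)


sample = "þ1þ¶þJohn Doeþ\r\nþ2þ¶þJane Roeþ\r\n"
print(parse_concordance(sample))                       # [['1', 'John Doe'], ['2', 'Jane Roe']]
print(parse_concordance(sample, use_qualifier=False))  # [['þ1þ', 'þJohn Doeþ'], ['þ2þ', 'þJane Roeþ']]
```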

Related

Flat file connection manager carriage return/line feed discrepancy

I'm upgrading from SQL Server 2008R2 to 2017, and making the same jump with SSIS. There are a number of flat file imports that are picking up files that have carriage return/line feeds ({CR}{LF}) embedded within a column in the row.
The 2008R2 flat file connection manager ignores the embedded {CR}{LF}s that are within a row, but the flat file connection manager in 2017 is treating each {CR}{LF} as a new line. It's the same in an upgraded connection manager or a brand new one that I make from scratch.
In both versions, the connection managers have the same specs:
General Tab
Locale: English (United States)
Unicode: No
Code page: 1252 (ANSI-Latin I)
Format: Delimited
Text qualifier: <none>
Header row delimiter: {CR}{LF}
Header rows to skip: 0
Column names in the first data row: Checked
Columns Tab
Row delimiter: {CR}{LF}
Column delimiter: Vertical Bar{|}
Here's the (not particularly complicated) text file I'm testing with:
row_id|row_data|empty_column|created_by|one_more_field{CR}{LF}
1|random test data||ebrandt|{CR}{LF}
2|Data field with a carriage return{CR}{LF}
and a line feed embedded in it.||ebrandt|
I added the {CR}{LF} markers by hand to show where the line terminators are.
On the Columns tab, the Preview window in BIDS 2008R2 shows two rows, but in 2017 exactly the same file gets broken into three rows.
There isn't bandwidth in this project to rework all the file imports.
Is there a setting that got changed between versions that I can change back? Or is there another trick to this that I'm missing?
Edit: In response to a comment that's been deleted, I would specify a text qualifier if I could, but the files I'm getting don't have any.
In the Flat File Connection Manager, set the AlwaysCheckForRowDelimiters property to False.
Your file will then be parsed as before.
This behaviour changed in SQL Server 2012. The documentation now describes the default as follows:
By default, the Flat File connection manager always checks for a row
delimiter in unquoted data, and starts a new row when a row delimiter
is found. This enables the connection manager to correctly parse files
with rows that are missing column fields.
See the Flat File Connection Manager documentation for more about it.
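The difference between the two settings can be sketched in Python. This is a simplified model, an assumption about the parser's behaviour rather than SSIS's actual code, and it ignores text qualifiers: with the property set to False, the first n-1 columns end only at a column delimiter and only the last column ends at the row delimiter; with the 2012+ default, every row delimiter ends the row. The sample is the file from the question.

```python
def parse_always_check(text: str, col_delim: str = "|", row_delim: str = "\r\n") -> list[list[str]]:
    """Simplified model of the 2012+ default (AlwaysCheckForRowDelimiters=True):
    every row delimiter in unquoted data ends the current row."""
    lines = text.split(row_delim)
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after a final row delimiter
    return [line.split(col_delim) for line in lines]


def parse_legacy(text: str, ncols: int, col_delim: str = "|", row_delim: str = "\r\n") -> list[list[str]]:
    """Simplified model of AlwaysCheckForRowDelimiters=False (pre-2012 behaviour):
    the first ncols-1 columns end only at a column delimiter, and only the
    last column is terminated by the row delimiter."""
    rows, pos = [], 0
    while pos < len(text):
        row = []
        for _ in range(ncols - 1):
            end = text.find(col_delim, pos)
            if end == -1:
                raise ValueError("row ended before all columns were read")
            row.append(text[pos:end])
            pos = end + len(col_delim)
        end = text.find(row_delim, pos)
        if end == -1:
            end = len(text)  # last row without a trailing row delimiter
        row.append(text[pos:end])
        pos = end + len(row_delim)
        rows.append(row)
    return rows


sample = (
    "row_id|row_data|empty_column|created_by|one_more_field\r\n"
    "1|random test data||ebrandt|\r\n"
    "2|Data field with a carriage return\r\nand a line feed embedded in it.||ebrandt|"
)
print(len(parse_legacy(sample, ncols=5)))  # 3: header + 2 data rows
print(len(parse_always_check(sample)))     # 4: header + 3 data rows
```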

Why am I getting 0 rows processed in SSIS?

I am creating an SSIS package using the Microsoft Visual Studio 2012 Shell with .NET Framework 4.6.01055. The package has a Data Flow task with Flat File Source, Data Source Row count, Final Data Set count and OLE DB Destination components. It connects to a SQL database, and I have tested the connection successfully.
I have a Flat File connection manager that picks up a text file. In the Flat File Connection Manager Editor, the Preview section shows only the header columns. The only message is a warning: [Flat File Source [10]] Warning: The end of the data file was reached while reading header rows. Make sure the header row delimiter and the number of header rows to skip are correct. The file itself has 19 rows in total, the first being the header row.
The header names in the source file contain spaces, so I edited the file so that the final column name has no spaces. That did not fix the issue. The last column is a date, but I have set its OutputColumnWidth to 50 and its data type to string [DT_STR]. The row delimiter is {CR}{LF} and the column delimiter is {|}. When the package runs, the file name does not change.
In the General section of the editor: Locale = English; Unicode is not checked; Code page = 1252 (ANSI-Latin I); Format = Delimited; Text qualifier = <none>; Header row delimiter = {CR}{LF} (I have also tried just {CR} or {LF}); Header rows to skip = 0 (I have also tried 1, since there is only one header row); and Column names in the first data row is checked.
Why am I not getting data in my preview section? And why is it thinking I only have a header?
It seems to me that your text file does not use the end-of-line (EOL) marker you specified, so SSIS never splits the lines and treats the whole file as one big header.
Try opening the file in a text editor that can show EOL markers. I know that Notepad++ can do this for you.
Notepad++ will also let you change the file's encoding, in case that is also a problem.
NB: The problem could also be that you are not specifying the correct column delimiter. If the delimiter you specify in SSIS doesn't match the characters in the file, SSIS will also think that you have a single header row with everything in the first column.
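If no editor is handy, both checks can be sketched in a few lines of Python that read the raw bytes. The function names are hypothetical, and the delimiter check assumes a single-byte encoding such as code page 1252. A file that reports only LF terminators while the connection manager expects {CR}{LF} would produce exactly the "header only" symptom described in the question.

```python
def line_ending_counts(data: bytes) -> dict[str, int]:
    """Count CRLF, bare CR and bare LF terminators in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }


def first_line_has_delimiter(data: bytes, delimiter: bytes = b"|") -> bool:
    """True if the first line of the file contains the column delimiter."""
    lines = data.splitlines()  # splits on CRLF, CR or LF
    return bool(lines) and delimiter in lines[0]


# Hypothetical usage ("input.txt" is a placeholder for your source file):
# with open("input.txt", "rb") as f:
#     data = f.read()
# print(line_ending_counts(data), first_line_has_delimiter(data))

print(line_ending_counts(b"Name|Date\nBob|2020-01-01\n"))  # {'CRLF': 0, 'CR': 0, 'LF': 2}
```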
Just to add to the other answer:
I had the same problem. When I opened the file in Notepad, it became clear that there was a trailing empty line at the bottom.
So: make sure the last line of the file actually contains text.
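A minimal Python sketch of that cleanup, assuming {CR}{LF} line endings and a hypothetical helper name. Following the advice literally, it removes trailing empty or whitespace-only lines so that the file ends on a line containing data, with no terminator after it.

```python
def strip_trailing_blank_lines(text: str, eol: str = "\r\n") -> str:
    """Drop empty or whitespace-only lines from the end of the text,
    so the last line of the file holds data."""
    lines = text.split(eol)
    while lines and lines[-1].strip() == "":
        lines.pop()
    return eol.join(lines)


print(repr(strip_trailing_blank_lines("h|x\r\n1|a\r\n\r\n")))  # 'h|x\r\n1|a'
```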

Text Qualifier þ (thorn) in SSIS [duplicate]

I'm trying to read a flat file in SSIS that is in this format:
col1 þ col2 þ col 3
I'm using the Flat File connection manager, but there is no option for the 'þ' character in the Column delimiter list of the connection manager.
What would be the workaround for this, other than reading the file and replacing the thorn character with a delimiter that SSIS supports?
Being a dumb 'merican, I think the lower case thorn character is 0xFE while upper case is 0xDE. This will become important soon.
I created an SSIS package with a Flat File Connection Manager and pointed it at a comma-delimited file that looked like this:
col 1,col 2,col 3
This let me set the metadata for the file. Once all the columns are defined and the package is otherwise good, save it. Commit it to your version control system. If you're not using version control, shame on you, but at least make a copy of your .dtsx file and put it somewhere handy.
Replace the comma-delimited file with the thorn-delimited one.
What we're doing
What we're going to do is edit the XML of the SSIS package by hand to swap the , delimiter for a þ. It's a straightforward operation, but since you are going off the reservation, it's easy to foul up, and then your package won't open properly in the editor.
How to fix it
If you have the package open, close it but leave Visual Studio open. Right-click the file in Solution Explorer and select "View Code".
In an SSIS 2012 package, you'll be looking for:
DTS:ColumnDelimiter="_x002C_"
In a 2008 package, look for:
<DTS:Property DTS:Name="ColumnDelimiter" xml:space="preserve">_x002C_</DTS:Property>
What we're going to do is substitute _x00FE_ (thorn) for _x002C_ (comma). Save the file and then double-click it to open it back up.
Your connection manager should now show the thorn symbol on the Columns tab.
Interestingly enough, after you open the package, if you go back into the code, the editor will have put the literal thorn character into the file in place of the hexadecimal character code. Weird.
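The hand edit above can also be scripted. Below is a minimal Python sketch, assuming the attribute and element layouts quoted in the answer. The function names are hypothetical and this is not an SSIS tool, so back up the .dtsx file first. Columns whose delimiter is the row delimiter (_x000D__x000A_) are left alone, because only the _x002C_ token is matched.

```python
import re


def decode_dtsx_escape(token: str) -> str:
    """Decode an _xHHHH_ token, e.g. "_x00FE_" -> "þ"."""
    match = re.fullmatch(r"_x([0-9A-Fa-f]{4})_", token)
    if match is None:
        raise ValueError(f"not an _xHHHH_ token: {token!r}")
    return chr(int(match.group(1), 16))


def swap_column_delimiter(dtsx_text: str, old: str = "_x002C_", new: str = "_x00FE_") -> tuple[str, int]:
    """Replace the column delimiter token in both package formats.
    Returns (new_text, number_of_replacements)."""
    patterns = [
        # SSIS 2012 format: DTS:ColumnDelimiter="_x002C_" on each flat-file column
        r'(DTS:ColumnDelimiter=")' + re.escape(old) + r'(")',
        # SSIS 2008 format: <DTS:Property DTS:Name="ColumnDelimiter" ...>_x002C_</DTS:Property>
        r'(<DTS:Property DTS:Name="ColumnDelimiter" xml:space="preserve">)'
        + re.escape(old) + r'(</DTS:Property>)',
    ]
    total = 0
    for pattern in patterns:
        dtsx_text, count = re.subn(pattern, r"\g<1>" + new + r"\g<2>", dtsx_text)
        total += count
    return dtsx_text, total


# Hypothetical usage ("Package.dtsx" is a placeholder; the file is assumed to be UTF-8 XML):
# with open("Package.dtsx", encoding="utf-8") as f:
#     text, n = swap_column_delimiter(f.read())
# with open("Package.dtsx", "w", encoding="utf-8") as f:
#     f.write(text)

print(decode_dtsx_escape("_x002C_"), decode_dtsx_escape("_x00FE_"))  # , þ
```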

SSIS Unknown delimiter Encountered

I am currently working on an SSIS project, and I have data like this:
col1þcol2þcol3þcol4
I have followed the steps from "How to read a flatfile with lowercase thorn as the delimiter",
but when I load it in the SSIS Flat File Connection Manager, the data looks like this:
col1þcol2þcol3þcol4
What should the delimiter be in this case?
The Column delimiter drop-down in the connection manager lists only the most common options, but you can paste whatever character you need, even if it isn't on the list.
You can then just paste þ and it should work.
þ is the thorn delimiter, so you need to change the XML code (View > Code) and replace the comma token in
DTS:ColumnDelimiter="_x002C_"
with _x00FE_. Or, as @jayvee said, paste þ directly into the Column delimiter box.
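One possible cause of the symptom in the question (an assumption; the question doesn't say how the file is encoded) is a mismatch between the file's encoding and the connection manager's code page. In code page 1252, þ is the single byte 0xFE, but in UTF-8 it takes two bytes, and code page 1252 shows those as two other characters. A quick Python check:

```python
def encoded_bytes(text: str, encoding: str) -> bytes:
    """Bytes that `text` occupies in a file saved with `encoding`."""
    return text.encode(encoding)


def misread(text: str, written_as: str, read_as: str) -> str:
    """How text saved in one encoding appears when decoded with another."""
    return text.encode(written_as).decode(read_as)


print(encoded_bytes("þ", "cp1252"))  # b'\xfe'
print(encoded_bytes("þ", "utf-8"))   # b'\xc3\xbe'
print(misread("col1þcol2", written_as="utf-8", read_as="cp1252"))  # col1Ã¾col2
```

If the file turns out to be UTF-8, either setting the connection manager's code page to 65001 (UTF-8) or re-saving the file as code page 1252 should make a pasted þ delimiter match the data.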

SQL Server CSV extract of tables with newline, doublequotes and commas in columns?

I extracted about 10 tables to CSV with " as the text qualifier. The problem is that the extract doesn't look right in Excel because of special characters in a few columns: some values break onto a new row when they should stay in their column.
I've been doing this manually with the export feature in Management Studio, but what's the best way to extract the 10 tables to CSV with a double-quote qualifier using a script?
Will I have to escape commas and double quotes? What's the best way to do this?
How should I handle newlines in my columns? We need them for the migration to a new system, but the PM wants to open the files and edit them in Excel. Can they have it both ways?
I understand that much of the problem is how Excel interprets the file, whereas a load utility for another database might not treat newlines specially. But what about double quotes and commas in the data: if I don't care about Excel, must I escape them?
Many Thanks.
If you are using SQL Server 2005 or later, the export wizard can export the data to an Excel file for you.
Right-click the database and select Tasks > Export Data...
Set the source to the database.
Set the destination to Excel.
At the end of the wizard, select the option to save an SSIS package. You can then create a job to run the package on a schedule or on demand.
I'd suggest never using commas as your delimiter; they show up too frequently in the data itself. Use a tab, since tabs are hard to enter in Excel cells.
Make sure you never start a field with a space unless you want that space in the field.
Try replacing the line feeds in your text with the literal text \n.
For example, you might have:
0,1,"Line 1
Line 2", 3
Instead, I suggest:
0 1 "Line 1\nLine 2" 3
(assuming the gaps between fields are tabs)
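A minimal Python sketch of that escaping, with hypothetical helper names. It escapes backslashes first so that a literal "\n" already present in the data stays distinguishable from an escaped newline, and it also escapes tabs so they can't be mistaken for delimiters. It does not add the double quotes shown in the example.

```python
def escape_field(value: str) -> str:
    """Make a value safe for a tab-delimited line by escaping \\, newlines and tabs."""
    value = value.replace("\\", "\\\\")  # escape backslashes first so the mapping is reversible
    value = value.replace("\r\n", "\n").replace("\r", "\n")  # normalise CRLF and bare CR to LF
    return value.replace("\n", "\\n").replace("\t", "\\t")


def to_tsv_line(fields: list[str]) -> str:
    """Join escaped fields with tabs."""
    return "\t".join(escape_field(f) for f in fields)


print(repr(to_tsv_line(["0", "1", "Line 1\nLine 2", "3"])))  # '0\t1\tLine 1\\nLine 2\t3'
```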
Good luck
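As far as I know, newlines inside CSV columns are unreliable: RFC 4180 allows them inside quoted fields, but not every tool that reads CSV handles them. If you know a column could contain commas, double quotes or newlines, you can use this SQL statement to extract the value as a valid CSV field (embedded double quotes are doubled and CR/LF characters are removed):
SELECT '"' + REPLACE(REPLACE(REPLACE(CAST([yourColumnName] AS VARCHAR(MAX)), '"', '""'), CHAR(13), ''), CHAR(10), '') + '"' FROM yourTable;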
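For checking the output, here is a Python mirror of the T-SQL expression in the answer (a sketch with hypothetical function names), alongside Python's csv module with QUOTE_ALL. The second approach also doubles embedded quotes but keeps newlines inside the quoted field, as RFC 4180 permits, which may suit the migration better if the target loader supports it.

```python
import csv
import io


def quote_like_sql(value: str) -> str:
    """Mirror of the T-SQL expression: double embedded quotes,
    drop CR and LF, then wrap the value in double quotes."""
    cleaned = value.replace('"', '""').replace("\r", "").replace("\n", "")
    return '"' + cleaned + '"'


def rfc4180_row(values: list[str]) -> str:
    """Quote every field with the csv module; embedded newlines stay inside the quotes."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n").writerow(values)
    return buf.getvalue()


print(quote_like_sql('He said "hi"\r\nbye'))                        # "He said ""hi""bye"
print(repr(rfc4180_row(['a "quoted" word', "line 1\nline 2"])))  # '"a ""quoted"" word","line 1\nline 2"\r\n'
```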
