Text Qualifier þ (thorn) in SSIS [duplicate] - sql-server

I'm trying to read a flat file in SSIS which is in this format
col1 þ col2 þ col 3
I'm using the flatfile connection manager but there is no option for the 'þ' character in the column delimiter section of the connection manager.
What would be the workaround for this, other than reading the file and replacing the thorn character with an SSIS-supported delimiter?

Being a dumb 'merican, I think the lower case thorn character is 0xFE while upper case is 0xDE. This will become important soon.
I created an SSIS package with a Flat File Connection Manager. I pointed it at a comma delimited file that looked like
col 1,col 2,col 3
This allowed me to get the metadata set for the file. Once you have all the columns defined and your package is otherwise good, save it. Commit it to your version control system. If you're not using version control, shame on you, but then make a copy of your .dtsx file and put it somewhere handy.
Replace the comma delimited file with a thorn delimited one.
What we're doing
What we're going to do is edit the XML that is our SSIS package by hand to exchange the delimiter of a , with a þ. It's a straightforward operation, but since you are going off the beaten path, it's easy to foul up and then your package won't open up properly in the editor.
How to fix it
If you have the package open, close the package but leave Visual Studio open. Right click on the file and select "View Code".
In an SSIS 2012 package, you'll be looking for
DTS:ColumnDelimiter="_x002C_"
In a 2008 package,
<DTS:Property DTS:Name="ColumnDelimiter" xml:space="preserve">_x002C_</DTS:Property>
What we're going to do is substitute _x00FE_ (thorn) for _x002C_ (comma). Save the file and then double click to open it back up.
Your connection manager should now show the thorn symbol on the Columns tab.
Interestingly enough, after you open the package, if you go back into the Code, the editor will have swapped the thorn character into the file in place of the hexadecimal character code. Weird.
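The hand edit described above can also be scripted. The following is a minimal sketch, not part of the original answer: the function names `escape_code` and `swap_column_delimiter` are hypothetical, and it assumes the package XML contains the delimiter in exactly one of the two forms quoted above. Work on a copy of the .dtsx file, as the answer advises.

```python
def escape_code(ch: str) -> str:
    """Build the _xHHHH_ escape used in the package XML for one character."""
    return f"_x{ord(ch):04X}_"


def swap_column_delimiter(dtsx_xml: str, old: str = ",", new: str = "þ") -> str:
    """Replace the old column delimiter escape with the new one.

    Only the two forms quoted in the answer are touched (hypothetical helper):
    the SSIS 2012 attribute and the SSIS 2008 property element.
    """
    old_code, new_code = escape_code(old), escape_code(new)
    templates = (
        'DTS:ColumnDelimiter="{}"',
        '<DTS:Property DTS:Name="ColumnDelimiter" xml:space="preserve">{}</DTS:Property>',
    )
    for template in templates:
        dtsx_xml = dtsx_xml.replace(template.format(old_code), template.format(new_code))
    return dtsx_xml


if __name__ == "__main__":
    print(escape_code(","), escape_code("þ"))
    print(swap_column_delimiter('DTS:ColumnDelimiter="_x002C_"'))
```

The helper leaves any other delimiter value alone, so running it twice on the same text is harmless.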

Related

Failing to convert a CSV file to UTF-8 BOM(w/ Notepad++) in order to migrate it to SQL w/ SSIS and keep regional(Polish) letters

I have a CSV in Polish that I want to get into SQL w/ SSIS.
I open it in Notepad++ and it says UTF 8.
If it doesn't actually say UTF-8-BOM in the status bar then Notepad++ is only guessing the encoding. Try selecting Encoding > Encode in UTF-8-BOM, save the file, then close and reopen it to confirm the change. After saving it with a BOM (Byte Order Mark), try importing it via SSIS again using the code page 65001 (UTF-8) setting and see if it works.
#AlwaysLearning
I convert the file as the user above suggested and it now shows UTF8 BOM in the corner. I save it.
So here's the vicious circle:
a) When I choose the CSV in SSIS as UTF-8, I can see my Polish letters properly in the preview. Until I hit Run. Then I get this error:
Error at Data Flow Task [SQL Server Destination [9]]: The column "ColumnName" cannot be processed because more than one code page (65001 and 1252) are specified for it.
I get it for each column.
b) When I change the filetype in the connection manager to 1252, I can immediately see in the preview that my Polish letters are lost. But now running it works like a charm and I get no errors.
Here's what I've tried:
Changing to 1250, 65001 etc
Ticking Unicode
changing Locale to Polish, polish(Poland), English
googling
searching stack
Posting This question to stack:
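The BOM check suggested in the comment can also be done outside Notepad++. This is a minimal sketch under the assumption that the file really is UTF-8; the helper names `has_utf8_bom` and `add_utf8_bom` are hypothetical. It only covers the file side and does not by itself address the code page mismatch reported in the error message.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_bom(data: bytes) -> bytes:
    """Return the same content with a UTF-8 BOM in front, if it is missing.

    Decoding first makes the function fail loudly (UnicodeDecodeError)
    when the input is not valid UTF-8, instead of stamping a BOM on it.
    """
    data.decode("utf-8")
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data


if __name__ == "__main__":
    sample = "Łódź;Kraków".encode("utf-8")
    print(has_utf8_bom(sample), has_utf8_bom(add_utf8_bom(sample)))
```

To use it on a real file, read the file in binary mode, pass the bytes through `add_utf8_bom`, and write the result to a new file.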

Flat file connection manager carriage return/line feed discrepancy

I'm upgrading from SQL Server 2008R2 to 2017, and making the same jump with SSIS. There are a number of flat file imports that are picking up files that have carriage return/line feeds ({CR}{LF}) embedded within a column in the row.
The 2008R2 flat file connection manager ignores the embedded {CR}{LF}s that are within a row, but the flat file connection manager in 2017 is treating each {CR}{LF} as a new line. It's the same in an upgraded connection manager or a brand new one that I make from scratch.
In both versions, the connection managers have the same specs:
General Tab
Locale: English (United States)
Unicode: No
Code page: 1252 (ANSI-Latin I)
Format: Delimited
Text qualifier: <none>
Header row delimiter: {CR}{LF}
Header rows to skip: 0
Column names in the first data row: Check
Columns Tab
Row delimiter: {CR}{LF}
Column delimiter: Vertical Bar{|}
The not particularly complicated text file I'm testing with:
row_id|row_data|empty_column|created_by|one_more_field{CR}{LF}
1|random test data||ebrandt|{CR}{LF}
2|Data field with a carriage return{CR}{LF}
and a line feed embedded in it.||ebrandt|
I pasted on the line terminators, just to show that they're there.
On the Columns tab, the Preview window in BIDS 2008R2 shows two rows:
But in 2017, exactly the same file gets broken into three rows:
There isn't bandwidth in this project to rework all the file imports.
Is there a setting that got changed between versions that I can change back? Or is there another trick to this that I'm missing?
Edit: In response to a comment that's been deleted, I would specify a text qualifier if I could, but the files I'm getting don't have any.
In the Flat File Connection Manager you need to set the AlwaysCheckForRowDelimiters property to False.
Your file will then be parsed as before.
The behaviour was changed in 2012 to the following:
By default, the Flat File connection manager always checks for a row
delimiter in unquoted data, and starts a new row when a row delimiter
is found. This enables the connection manager to correctly parse files
with rows that are missing column fields.
See this link for more about it.
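To see why the same file yields two data rows under the old behaviour and three under the new one, here is a simplified model in Python. It is an illustration only and not how SSIS is implemented: `parse_legacy` assumes the old parser looks for the row delimiter only after the last column, and `parse_strict` assumes the new default splits on every row delimiter first. Both function names are hypothetical.

```python
SAMPLE = (
    "row_id|row_data|empty_column|created_by|one_more_field\r\n"
    "1|random test data||ebrandt|\r\n"
    "2|Data field with a carriage return\r\n"
    "and a line feed embedded in it.||ebrandt|"
)


def parse_legacy(text, n_cols, col_delim="|", row_delim="\r\n"):
    """Model of the old behaviour: only the last column ends at the row delimiter."""
    rows, pos = [], 0
    while pos < len(text):
        row = []
        for i in range(n_cols):
            delim = col_delim if i < n_cols - 1 else row_delim
            end = text.find(delim, pos)
            if end == -1:  # no more delimiters: the rest is the final value
                row.append(text[pos:])
                pos = len(text)
                break
            row.append(text[pos:end])
            pos = end + len(delim)
        rows.append(row)
    return rows


def parse_strict(text, col_delim="|", row_delim="\r\n"):
    """Model of the new default: every row delimiter starts a new row."""
    return [line.split(col_delim) for line in text.split(row_delim) if line]


if __name__ == "__main__":
    print(len(parse_legacy(SAMPLE, 5)) - 1, "data rows (legacy model)")
    print(len(parse_strict(SAMPLE)) - 1, "data rows (strict model)")
```

In the legacy model the embedded {CR}{LF} stays inside the row_data value of row 2, which matches the two-row preview described in the question.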

Why am I getting 0 rows processed in SSIS?

I am creating an SSIS package using MS Visual Studio 2012 Shell with .NET Framework 4.6.01055. The SSIS package has a Data Flow task with Flat File Source, Data Source Row count, Final Data Set count and OleDb destination tasks. It connects to a SQL Database and I have checked to see that my connection has been tested.
I have a flatfile connection manager which picks up a text file. On the Preview section it only shows the header columns in the flat file connection manager editor. The error message is only at warning level with the following message: [Flat File Source [10]] Warning: The end of the data file was reached while reading header rows. Make sure the header row delimiter and the number of header rows to skip are correct. The file itself has a total of 19 rows with the first being the header row.
I have spaces in the header names of the origin file, so I edited the file to have no spaces in the final column's name. That did not cure the issue. The last column is a date but I am designating it with an OutputColumnWidth of 50 and a datatype of string [DT_STR]. I have the Row delimiter as {CR}{LF}. I have the column delimiter as {|}. When run the package file name does not change.
In the General section of the editor: Locale = English; Unicode is not checked; Code Page = 1252 (ANSI-Latin1); Format = Delimited; Text qualifier = none; Header row delimiter = {CR}{LF} (I have tried just CR or LF as well); Header rows to skip = 0 (I have tried 1 as well since there is only one header row); and I have checked Column names in the first data row.
Why am I not getting data in my preview section? And why is it thinking I only have a header?
It seems to me that your text file does not have a matching EOL marker, and so SSIS never splits the lines (and treats the file as just having one big header).
Try opening the file in a text editor that lets you see the EOL marker. I know that Notepad++ can do this for you.
Notepad++ will also let you change the file's encoding, in case that is also a problem.
NB: The problem could also be that you are not specifying a correct column delimiter. If the delimiter you specify in SSIS doesn't match characters in the file, then SSIS will also think that you have a single header row where everything is in the first column.
Just to add to the other answer:
I had the same problem. When I opened the file in Notepad, it became clear that there was a trailing empty line at the bottom.
So: make sure the last line of the file actually contains text.
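If no editor that shows EOL markers is at hand, the line endings can be counted directly. This is a minimal sketch with a hypothetical helper name, `count_line_endings`; it works on the raw bytes of a single-byte or UTF-8 encoded file and simply reports how many of each kind of line ending it finds, so the result can be compared with the row delimiter configured in the connection manager.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF and lone CR in the raw bytes of a file."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LF not preceded by CR
        "CR": data.count(b"\r") - crlf,  # CR not followed by LF
    }


def ends_with_empty_line(data: bytes) -> bool:
    """True if the file ends with two or more consecutive line breaks."""
    return data.endswith((b"\r\n\r\n", b"\n\n", b"\r\r"))


if __name__ == "__main__":
    unix_style = b"a|b|c\n1|2|3\n"
    print(count_line_endings(unix_style))
    print(ends_with_empty_line(unix_style))
```

A file that reports only LF while the connection manager expects {CR}{LF} would be read as one long header, as the first answer describes.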

SSIS Unknown delimiter Encountered

I am currently working on an SSIS project. I have data like this:
col1þcol2þcol3þcol4
I have followed the steps from here: How to read a flatfile with lowercase thorn as the delimiter
but the data looks like this:
col1þcol2þcol3þcol4 when loaded in the SSIS Flat File connection manager.
What should the delimiter be in this case?
The column delimiter drop down menu in the connection manager just lists the most common options but you can paste whatever character you need, even if it's not on the list.
You can then just paste þ and it should work.
þ is a thorn delimiter
Then you need to change the XML code (go to View --> Code). Look for
DTS:ColumnDelimiter="_x002C_"
and replace _x002C_ (comma) with _x00FE_ (thorn),
or as said by #jayvee paste þ or try pasting þ
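One possible reason a thorn delimiter shows up garbled in the preview is a mismatch between the file's encoding and the code page chosen in the connection manager. The question does not confirm this, so treat the following as an assumption. The sketch shows how the same character is stored differently in code page 1252 and in UTF-8; the helper name `as_seen_with_wrong_code_page` is hypothetical.

```python
def as_seen_with_wrong_code_page(text, file_encoding="utf-8", reader_code_page="cp1252"):
    """Show how text looks when bytes written in one encoding are read in another."""
    return text.encode(file_encoding).decode(reader_code_page)


if __name__ == "__main__":
    # Thorn is one byte (0xFE) in code page 1252 but two bytes in UTF-8.
    print("þ".encode("cp1252").hex(), "þ".encode("utf-8").hex())
    print(as_seen_with_wrong_code_page("col1þcol2þcol3þcol4"))
```

If the preview shows a two-character sequence where the thorn should be, it may be worth trying code page 65001 (UTF-8) in the connection manager, or pasting that two-character sequence as the delimiter.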

Manual import into SQL Server 2000 of tab delimited text file does not format international characters

I have searched for this specific solution and while I have found similar queries, I have not found one that solves my issue. I am manually importing a tab-delimited text file of data that contains international characters in some fields.
This is one such character: Exhibit Hall C–D
It's either an em dash or an en dash in between the C & D. It copies and pastes fine, but when the data is taken into SQL Server 2000, it ends up looking like this:
Exhibit Hall C–D
The field is nvarchar and like I said, I am doing the import manually through Enterprise Manager. Any ideas on how to solve this?
The problem is that the encoding between the import file and SQL Server is mismatched. The following approach worked for me in SQL Server 2000 importing into a database with the default encoding (SQL_Latin1_General_CP1_CI_AS):
Open the .csv/.tsv file with the free text editor Notepad++, and ensure that special characters appear normal to start with (if not, try Encoding|Encode in...)
Select Encoding|Convert to UCS-2 Little Endian
Save as a new .csv/.tsv file
In SQL Server Enterprise Manager, in the DTS Import/Export Wizard, choose the new file as the data source (source type: Text File)
If not automatically detected, choose File type: Unicode (in preview on this page, the unicode characters will still look like black blocks)
On the next page, Specify Column Delimiter, choose the correct delimiter. Once chosen, Unicode characters should appear correctly in the Preview pane
Complete import wizard
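The Notepad++ conversion step above can be sketched in Python as well. This assumes the source file is UTF-8, which the answer does not state, and the function name `to_utf16_le_with_bom` is hypothetical. "UCS-2 Little Endian" in Notepad++ corresponds to UTF-16 LE with a byte order mark for characters in the Basic Multilingual Plane, which includes the dash in the question.

```python
import codecs


def to_utf16_le_with_bom(data: bytes, source_encoding: str = "utf-8") -> bytes:
    """Re-encode file content as UTF-16 little endian, prefixed with its BOM."""
    text = data.decode(source_encoding)  # source_encoding is an assumption
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


if __name__ == "__main__":
    original = "Exhibit Hall C\u2013D".encode("utf-8")  # U+2013 is an en dash
    converted = to_utf16_le_with_bom(original)
    print(converted[:2].hex(), len(original), len(converted))
```

Read the input file in binary mode, pass the bytes through the function, and write the result to a new .csv/.tsv file before pointing the import wizard at it.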
I would try using the bcp utility ( http://technet.microsoft.com/en-us/library/ms162802(v=sql.90).aspx ) with the -w parameter.
You may also want to check the text encoding of the input file.
