In the Flat File Connection Manager screen there is a checkbox to specify that the file is encoded as Unicode, but there is no way to tell which encoding will be used (UTF-8, UTF-16, ...)
Is there an official Microsoft resource as to which encoding is used?
"Unicode" in Microsoft products tends to be UTF-16LE.
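A quick way to confirm this is to look at the first bytes of the file: files written by Microsoft tools with "Unicode" selected typically begin with the FF FE byte-order mark of UTF-16LE. A minimal Python sketch (hypothetical helper, assuming you have such a file to inspect):

```python
import codecs

def sniff_bom(path):
    """Return the encoding implied by the file's byte-order mark, if any."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF32_LE):    # FF FE 00 00 (check before UTF-16LE)
        return "utf-32-le"
    if head.startswith(codecs.BOM_UTF16_LE):    # FF FE  <- typical "Unicode" output
        return "utf-16-le"
    if head.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be"
    return None                                 # no BOM: could be ANSI or BOM-less UTF-8
```

A file saved by SSMS or SSIS with "Unicode" checked should report `utf-16-le` here.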
In an SSIS package that I'm writing, I have a CSV file as a source. On the Connection Manager General page, it has 65001 as the Code page (I was testing something). Unicode is not checked.
The columns map to a SQL Server destination table with varchar (among others) columns.
There's an error at the destination: The column "columnname" cannot be processed because more than one code page (65001 and 1252) are specified for it.
My SQL columns have to be varchar, not nvarchar due to other applications that use it.
On the Connection Manager General page I then change the Code page to 1252 (ANSI - Latin I) and OK out, but when I open it again it's back to 65001. It doesn't make a difference if (just for test) I check Unicode or not.
As a note, all this started happening after the CSV file and the SQL table had columns added and removed (users, you know.) Before that, I had no issues whatsoever. Yes, I refreshed the OLE DB destination in the Advanced Editor.
This is SQL Server 2012 and whichever version of BIDS and SSIS come with it.
If you want to convert a CSV file column text stream [DT_TEXT] to the SQL varchar(max) data type, change the Flat File Connection Manager Editor property Code page to 1252 (ANSI - Latin I).
65001 Code page = Unicode (UTF-8)
Based on this Microsoft article (Flat File Connection Manager):
Code page
Specify the code page for non-Unicode text.
Also
You can configure the Flat File connection manager in the following ways:
Specify the file, locale, and code page to use. The locale is used to interpret locale-sensitive data such as dates, and the code page is used to convert string data to Unicode.
So when the flat file has a Unicode encoding (UTF-8, UTF-16, or UTF-32), this property cannot be changed; it will always revert to the file's original encoding.
For more info about the code page identifiers, you can refer to this article:
Code Page Identifiers
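If the file itself needs to move from code page 65001 to 1252, re-encoding it outside SSIS is one option. A minimal Python sketch (hypothetical helper), assuming the data fits in Windows-1252; any character that doesn't is replaced with `?`:

```python
def utf8_to_cp1252(src, dst):
    """Re-encode a UTF-8 (code page 65001) CSV as Windows-1252 (code page 1252)."""
    # "utf-8-sig" also swallows a leading EF BB BF byte-order mark, if present.
    # newline="" preserves the file's original line endings on both sides.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin, \
         open(dst, "w", encoding="cp1252", errors="replace", newline="") as fout:
        for line in fin:
            fout.write(line)
```

After this, the flat file connection manager should accept and keep code page 1252.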
I solved this in SSIS with a Derived Column transformation.
If it's a CSV file, you can still use code page 1252 to process it. When you open the flat file connection manager, it shows you the code page for the file, but you don't need to save that setting. If you have other changes to make in the connection manager, change the code page back to 1252 before you save the changes. It will process fine if there are no Unicode characters in the file.
I was running into a similar challenge, which is how I ended up on this page looking for a solution. I resolved it using a different approach.
I opened the csv in Notepad++. One of the menu options is called Encoding. If you select that, it will give you the option to "Convert to ANSI."
I knew that my file did not contain any Unicode specific characters.
When I went back to the SSIS package, I edited the flat file connection and it automatically changed it to 1252.
In my case the file was generated in Excel and (mistakenly) saved as CSV UTF-8 (Comma delimited) (*.csv) instead of simply CSV (Comma delimited) (*.csv). Once I saved the file as the correct form of CSV, the code page no longer changed from 1252 (ANSI - Latin I).
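The difference between Excel's two CSV flavors is a three-byte EF BB BF marker at the start of the file; that marker is what makes the connection manager report code page 65001. A small sketch (hypothetical helper) that detects and removes it; note this is only safe when the rest of the file is plain ASCII, since non-ASCII UTF-8 bytes would otherwise be misread under code page 1252:

```python
UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM in place; return True if one was found."""
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(UTF8_BOM):
        return False
    with open(path, "wb") as f:
        f.write(data[len(UTF8_BOM):])
    return True
```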
While saving .sql files from SQL Server Management Studio into my local Windows folder, it appears to be including some binary characters, making AccuRev comparisons impossible. I looked for possible save options and couldn't locate any. Any suggestions, please?
If you can't tell AccuRev to handle this as UTF-8 files (this sucks - these days, all software should really know about UTF-8 and handle it correctly!), then you might need to do something in SQL Server Management Studio instead.
When you have a SQL statement open and you click on "File > Save", in the "Save" dialog, there is a little down-arrow to the right of the Save button:
If you click that (instead of just clicking on the button itself), you can select "Save with Encoding", which allows you to pick what encoding to use for your files - pick something like the Windows-1252 Western European - that should not have any UTF-8 Byte-Order Mark bytes at the start:
AccuRev does handle UTF-8 character encoding. However, older versions may not have that capability.
Make sure that the file is being saved using UTF-8. Anything else will have binary content and should be typed as such.
When you export SQL files from MS SQL Server Management Studio as Unicode (the default), it puts an "FF FE" BOM at the front of the file, which forces programs to treat it as binary. Exporting as ANSI solved it: choose "Save as ANSI Text".
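The same fix can be scripted if you have many .sql files to convert. A minimal sketch (hypothetical helper), assuming the scripts were saved as UTF-16 with the FF FE BOM and contain only characters representable in Windows-1252:

```python
def unicode_sql_to_ansi(src, dst):
    """Re-save a UTF-16 ("Unicode") .sql script as ANSI (Windows-1252)."""
    # encoding="utf-16" consumes the FF FE byte-order mark automatically;
    # newline="" keeps the original CRLF line endings intact.
    with open(src, "r", encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="cp1252", errors="replace", newline="") as fout:
        fout.write(fin.read())
```

The resulting file has no BOM, so diff tools like AccuRev treat it as plain text.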
First time user/question:
I have numerous TSV files exported from a computer forensics application, encoded as "Unicode (UTF-8)". I created a package using Visual Studio 2013 and have a flat file connection manager where my code page is 65001 (UTF-8) and the advanced settings are all Unicode string (DT_WSTR). My OLE DB destination is hooked to a table with the same Unicode settings, and the component properties have "always use default code page" = 65001 and set to true.
However, the package fails with the error: "The data type for "Flat File Source.Outputs[Flat File Source Output].Columns[MyCOLUMN]" is DT_NTEXT, which is not supported with ANSI files. Use DT_TEXT instead and convert the data to DT_NTEXT using the data conversion component."
I'm puzzled: how does this have anything to do with ANSI? The file was exported encoded as UTF-8. Now, as the error suggests, I can work around this by setting my connection manager advanced properties to varchar (DT_STR), then using a data conversion task to convert each and every field to DT_WSTR, but that seems unnecessary.
Thank you.
OK, I have two machines. When I open SSMS and write queries in a .sql file, Cyrillic works. When I transfer the same .sql file to the other machine, the Cyrillic looks like "Âàëåðè". If this problem is related to encoding, how do I configure the encoding on both machines to be the same? How do I fix it?
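For what it's worth, the exact garbling in the question is classic code-page mojibake: Cyrillic text stored as Windows-1251 bytes but displayed as Windows-1252. It can be reproduced (and reversed) in a few lines of Python:

```python
# Cyrillic text encoded with the Russian code page (cp1251) but decoded
# with the Western European one (cp1252) produces exactly the symptom.
original = "Валери"
garbled = original.encode("cp1251").decode("cp1252")
print(garbled)  # Âàëåðè

# Decoding the bytes with the correct code page recovers the text:
restored = garbled.encode("cp1252").decode("cp1251")
assert restored == original
```

Saving the file as Unicode (as suggested below in this thread) avoids the ambiguity entirely, because UTF-16 carries a BOM that identifies the encoding.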
Did you look (in SSMS) under Tools > Options > Environment > International Settings to see if there are differences there? If the machines have different options enabled, this won't help you, as that's a Windows setting; but you can change the SQL setting here to use the Windows setting: choose "Same as Microsoft Windows". I'm looking at this in SQL 2014, but I'm fairly sure it's in the same place going back a few editions.
Try saving your file as Unicode:
"Save As" -> "Save with Encoding" -> Unicode (code page 1200)
I am facing an issue with SSIS where a customer wants a file (previously delivered in UTF-8) to be delivered in ANSI-1252. No big deal, I thought: change the file connection manager and done... Unfortunately it wasn't that simple. I've been stuck on this for a day and am clueless about what to try next.
The package itself:
IN: an OLE DB source with a query. The source database fields are NVARCHAR.
Next I created a Data Conversion block where I convert the incoming DT_WSTR to DT_STR using the 1252 code page.
After that is an outbound flat file destination. The flat file connection is tab-delimited, using code page 1252. I have mapped the converted columns to the columns used in this flat file. Below are some screenshots of the connection manager and destination block.
Now when I create a new .txt file from Explorer, it is ANSI (as detected by Notepad++).
When the package runs, the file comes out as UTF-8 without BOM.
I have tried experimenting with the checkbox for overwriting, as suggested in SSIS - Flat file always ANSI never UTF-8 encoded, as well as rebuilding the project from scratch and experimenting with the data conversion.
Does anyone have a suggestion on what I am missing here? The strange thing is that we have a different package with exactly the same blocks, built previously, and it does output an ANSI file (I checked the package from top to bottom). However, we are getting mixed results on different machines: some machines give an ANSI file, others the UTF-8 file.
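One possible explanation for the mixed results: a file that contains only ASCII characters is byte-for-byte identical whether it was written as ANSI (1252) or as UTF-8 without BOM, so Notepad++ has to guess which one it is. A rough sketch of that guessing logic (hypothetical helper, not part of SSIS):

```python
def classify_text_file(path):
    """Rough guess at a text file's encoding, similar to what editors do."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("ascii")
        # Pure ASCII: ANSI and BOM-less UTF-8 are indistinguishable,
        # so different editors (or settings) may report either one.
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "single-byte"  # e.g. cp1252 / ANSI
```

If the rows in the test runs happened to be pure ASCII, the "ANSI vs UTF-8" difference across machines may be a reporting difference in the editor rather than in the package output.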
Is this solved already? My idea is to delete the whole Data Flow Task and re-create it. I suppose the metadata is stuck and overwritten at each execution.
I believe you don't need to change anything in your SSIS package; just check your editor settings in Notepad++. Go to Settings --> Preferences --> New Document and uncheck the 'Apply to opened ANSI files' checkbox.
Kindly check and let me know if it works for you.