Collation, conversion and importing from SSIS with German chars - sql-server

My table is in SQL and contains only nvarchar columns.
The columns' collation is set to Latin1_General_CI_AS. I want to import via SSIS, using a flat file connection, some data that contains German characters like Ö, Ü, etc.
In the flat file connection manager I have set the code page to 65001 (UTF-8) and the locale to Germany, and in the preview the data appears correctly (e.g. Nürnberg).
However, when I execute the task and check the data in the SQL table, it appears as NÃ¼rnberg.
Am I missing something in this process?

I am assuming that your file is not Unicode, since you get the "cannot convert between unicode and non-unicode string data types" error when using the 1252 ANSI code page. If you can change the data type of your columns, I would just convert them to varchar; since varchar is not Unicode, the error should stop occurring.
The alternative might be to do something like this:
Import Package Error - Cannot Convert between Unicode and Non Unicode String Data Type, but I think it would be easier to use varchar columns.
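If the varchar route is open to you, the change is one ALTER per column. A minimal sketch, assuming a hypothetical target table dbo.ImportTarget with an nvarchar City column:

-- Convert the Unicode column to a single-byte type so it matches a
-- non-Unicode flat file source (table, column and length are hypothetical)
ALTER TABLE dbo.ImportTarget
    ALTER COLUMN City varchar(100) COLLATE Latin1_General_CI_AS NULL;

Bear in mind that any characters already stored in the column that fall outside the collation's code page would be lost by this conversion.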

Related

SSIS does not recognize Arabic Characters

My source SQL database contains Arabic characters and the column is of the xml data type. I am trying to get the data to flow from the source SQL db to the destination SQL db. However, when the data flow happens, the column which contains Arabic characters turns into gibberish (unrecognizable characters, not question marks).
How do I transfer the data correctly from the source table to the destination table? I have tried using COLLATE Arabic_CI_AI_KS_WS and used a data conversion to NTEXT, and it is still not working.
I'd appreciate any leads toward resolving this issue.
Thank you
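As a later answer on this page notes, xml and nvarchar columns preserve all Unicode characters regardless of collation, so gibberish usually means the data passed through a varchar or TEXT conversion somewhere along the way. A hedged sketch, with hypothetical database, table and column names, copying the xml straight across so no narrowing conversion can occur:

-- The xml type is stored as UTF-16 internally, so Arabic text survives
-- a direct copy with no COLLATE clause needed (all names hypothetical).
INSERT INTO DestDb.dbo.Target (payload)
SELECT payload
FROM SourceDb.dbo.Source;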

SSIS (ASCII needed): "Code page is 1252 and is required to be 20127"

I have a requirement to export a database to a tab-delimited file in the ASCII format. I am using derived columns to convert any Unicode strings to non-Unicode strings. For example, a former Unicode text stream is now cast like this:
(DT_TEXT,20127)incomingMessage
But SSIS is still looking for ANSI. I am still seeing an error at the Flat File Destination:
The code page on input column <column_name> is 1252 and is required to be 20127.
This happens for any column in the table, not just Unicode ones.
This is what I have been doing to ensure ASCII is used:
In the Flat File Connection Manager, used Code page "20127 (US-ASCII)"
Used a Derived Column to cast data types
In the OLE DB source, set the default code page to 20127
Any thoughts?
How about using the Data Conversion transformation? Connect the OLE DB source to the Data Conversion and change the metadata on the fly to suit your needs. You should be able to delete the Derived Column task if you handle the Unicode issues in the Data Conversion instead. Then you can process the records into the Flat File destination without issues.
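Whichever component does the narrowing, it can help to verify up front that the data actually fits 7-bit ASCII. A hedged T-SQL pre-check with hypothetical table and column names; under a binary collation the range [ -~] covers the printable ASCII characters (0x20 through 0x7E), so any row flagged here cannot survive code page 20127:

-- Rows containing anything outside printable ASCII (this also flags
-- tabs and line breaks; widen the range if those are expected).
SELECT incomingMessage
FROM dbo.Messages
WHERE incomingMessage LIKE N'%[^ -~]%' COLLATE Latin1_General_BIN;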

Unicode conversion, database woes (Delphi 2007 to XE2)

Currently, I am in the process of updating all of our Delphi 2007 code base to Delphi XE2. The biggest consideration is the ANSI to Unicode conversion, which we've dealt with by re-defining all base types (char/string) to ANSI types (ansichar/ansistring). This has worked in many of our programs, until I started working with the database.
The problem started when I converted a program that stores information read from a file into an SQL Server 2008 database. Suddenly simple queries that used a string to locate data would fail, such as:
SELECT id FROM table WHERE name = 'something'
The name field is a varchar. I found that I was able to complete the query successfully by prefixing the string name with an N. I was under the impression that varchar could only store ANSI characters, but it appears to be storing Unicode?
Some more information: the name field in Delphi is string[13], but I've tried dropping the [13]. The database collation is SQL_Latin1_General_CP1_CI_AS. We use ADO to interface with the database. The connection information is stored in the ODBC Administrator.
NOTE: I've solved my actual problem thanks to a bit of direction from Panagiotis. The name we read from our map file is an array[1..24] of AnsiChar. This value was being implicitly converted to string[13], which was including null characters. So a name with 5 characters was really being stored as the 5 characters + 8 null characters in the database.
varchar fields do NOT store Unicode characters. They store single-byte characters in the code page specified by the field's collation. SQL Server will try to convert characters to that code page when you try to store Unicode data or data from a different code page. You can disable this feature, but the best option is to avoid the whole mess by using nvarchar fields and UnicodeString in your application.
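A minimal T-SQL demonstration of that behaviour (the sample value is made up, and the result shown assumes a code page 1252 collation):

DECLARE @n nvarchar(20) = N'Nürnberg 東京';
DECLARE @v varchar(20);
SET @v = @n;                 -- implicit nvarchar -> varchar conversion
SELECT @v AS varchar_value;  -- with CP1252: 'Nürnberg ??'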
You mention that you changed all character types to ANSI, not Unicode, types in your application. If you want to use Unicode you should be using a Unicode type like UnicodeString. Otherwise your values will be converted to ANSI when they are sent to your server. This conversion is done by your code when you create the AnsiString that is sent to the server.
BTW, your SELECT statement passes an ANSI value to the server. You have to prefix the value with N if you want it treated as a Unicode value, e.g.:
SELECT id FROM table WHERE name = N'something'
Even this will not guarantee that your data will reach the server in a Unicode form. If you store the statement in an AnsiString the entire statement is converted to ANSI before it is sent to the server. If your app makes a wrong conversion, you will end up with mangled data on the server.
The solution is very simple: just use parameterized statements to pass Unicode values as Unicode parameters and store them in nvarchar fields. It is much faster, avoids all conversion errors, and prevents SQL injection attacks.
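On the T-SQL side the parameterized form looks like this; a sketch with hypothetical table and column names (from Delphi/ADO the equivalent is a wide-string parameter on the query rather than SQL text built by concatenation):

-- Statement text, parameter declaration and value are all nvarchar,
-- so nothing is narrowed to ANSI on the way to the server.
EXEC sp_executesql
    N'SELECT id FROM dbo.people WHERE name = @name',
    N'@name nvarchar(50)',
    @name = N'something';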

SQL Server 2008: Collation for UTF-8 code page 65001

There is a need to save an XML in UTF-8 encoding and then use it in T-SQL code to extract data.
Default database collation is SQL_Latin1_General_CP1_CI_AS.
I don't know if it is possible to save and work with UTF-8 data in SQL Server 2008, but I have an idea to use a collation with the UTF-8 code page (65001) on the XML column in order to save the data in UTF-8.
Does anybody know if it is possible or have another idea on how to work with UTF-8 data in SQL Server?
If you're dealing with xml data, store it as the xml data type. That should take care of any concerns you have (i.e. how to store it) and you'll save yourself the work of having to convert it to xml when you do work on it (e.g. xpath expressions, xquery, etc).
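A minimal sketch of that approach (table, column and element names are made up):

-- The xml type stores text as UTF-16 internally, so any Unicode
-- character survives regardless of the database collation.
CREATE TABLE dbo.Documents (id int IDENTITY(1,1) PRIMARY KEY, doc xml);

INSERT INTO dbo.Documents (doc)
VALUES (N'<city name="Nürnberg"/>');

-- Query it back with XQuery instead of string parsing:
SELECT doc.value('(/city/@name)[1]', 'nvarchar(100)') AS city_name
FROM dbo.Documents;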
You can store all Unicode characters in xml or nvarchar columns. It does not matter what collation you use. A handful of rare Chinese characters (from the supplementary plane) may be stored as pairs of nchars (surrogate pairs). But there is no loss of data.
An NVARCHAR column should do the job just fine.

How can I recover Unicode data which displays in SQL Server as "?"

I have a database in SQL Server containing a column which needs to contain Unicode data (it contains users' addresses from all over the world, e.g. القاهرة for Cairo).
This column is an nvarchar column with the database default collation (Latin1_General_CI_AS), but I've noticed that data inserted into it via SQL statements containing non-English characters displays as ?????.
The cause seems to be that I wasn't using the N prefix, e.g. I used:
INSERT INTO table (address) VALUES ('القاهرة')
Instead of:
INSERT INTO table (address) VALUES (N'القاهرة')
I was under the impression that Unicode would automatically be converted for nvarchar columns and I didn't need this prefix, but this appears to be incorrect.
The problem is I still have some data in this column which appears as ????? in SQL Server Management Studio and I don't know what it is!
Is the data still there but in an incorrect character encoding preventing it from displaying but still salvageable (and if so how can I recover it?), or is it gone for good?
Thanks,
Tom
To find out what SQL Server really stores, use
SELECT CONVERT(VARBINARY(MAX), 'some text')
SELECT CONVERT(VARBINARY(MAX), N'some text')
I just tried this with umlauted characters and Arabic (copied from Wikipedia; I have no idea what it says), both as plain strings and as N'' Unicode strings.
The results are that Arabic non-Unicode strings really end up as question marks (0x3F) in the conversion to VARCHAR.
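The same trick applied to the column itself tells you whether the stored bytes are literal 0x3F question marks (in which case the original text is gone) or valid Unicode that merely fails to display. A sketch with hypothetical table and column names:

-- Inspect the raw bytes stored in the column; rows whose bytes are all
-- 0x3F/0x00 pairs (nvarchar '?') have lost the original text for good.
SELECT address, CONVERT(varbinary(max), address) AS raw_bytes
FROM dbo.users
WHERE address LIKE N'%?%';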
SSMS sometimes won't display all characters. I just tried what you had and it worked for me; copy and paste it into Word and it might display correctly.
Usually, if SSMS can't display a character it shows boxes, not question marks.
Try writing a small client that retrieves the data to a file or web page. Check ALL your code to make sure there are no other inserts or updates that might convert the data to varchar before storing it in the tables.
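If the suspect statements live in stored procedures or other modules, a quick way to hunt for them (the column name is hypothetical):

-- Find modules whose definitions mention the column, then review each
-- one for varchar conversions or missing N prefixes.
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS module_name
FROM sys.sql_modules
WHERE definition LIKE '%address%';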
