Storing Emojis in SQL Tables

I am working with a SQL Server 2008 database on a Windows 2008 Server. Any time I try to store an emoji in my table, it is converted to a weird-looking box. When I try to store the same emoji in SQL Server 2012, it stores fine. Is it not possible to store emojis correctly in SQL Server 2008? I really cannot upgrade at this point, so that is not an option.

What we know based on details from the question and comments on the question:
Column is NVARCHAR
Value is inserted from VB.NET app via stored procedure
App hitting SQL Server 2008 (running on Windows 2008 Server) stores emoji character but "converts it to a weird looking box"
Same app code hitting SQL Server 2012 stores the same emoji character just fine
What we do not know:
How is the character being retrieved in order to determine whether or not it was stored correctly?
Are you viewing it in the app or in SSMS?
If in SSMS, are you connecting to SQL Server 2008 and 2012 using the same SSMS running on the same machine? Or are you using the version of SSMS that came with each version of SQL Server (hence they are not the same program, even if on the same machine)?
Based on the above:
Most likely this is a font issue. I say this due to:
If it were an issue of not supporting Unicode, then you would be seeing two question marks ?? (one for each surrogate character) instead of a single square box.
Emojis are nothing special. They are merely supplementary characters. And there are currently (as of Unicode v 12.0) 72,457 supplementary characters defined (and slots for another 976,119).
Supplementary Characters (emojis or otherwise) can be stored in NCHAR, NVARCHAR, and NTEXT columns without a problem, and without regard to the collation of the column or the current database.
To test this, I executed the following in a database having a default collation of SQL_Latin1_General_CP1_CI_AS, so there is definitely no "supplementary character support" there.
SELECT NCHAR(0xD83D) + NCHAR(0xDE31) AS [ScreamingFace],
       NCHAR(0xD83D) + NCHAR(0xDDFA) AS [WorldMap],
       NCHAR(0xD83D) + NCHAR(0xDF08) AS [Alchemical Symbol for Aqua Vitae];
It returns:
ScreamingFace   WorldMap   Alchemical Symbol for Aqua Vitae
😱              🗺          🜈
I see different things in different areas, all due to font differences. The chart below indicates what I am seeing:
LOCATION       FONT          Screaming   World   Alchemical Symbol
                             Face        Map     for Aqua Vitae
------------   -----------   ---------   -----   ---------------------------
Text Editor    Consolas      Yes         Yes     Square box w/ question mark
Grid Results   Code2003      Yes         Yes     Yes
Text Results   Courier New   Yes         Yes     Empty square box
Most likely you were using two different versions of SSMS, or at least SSMS on two different computers. In either case, you probably had different fonts mapped to the Grid Results, or were even using Grid Results on one and Text Results on the other.
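As an aside, the surrogate values used above are not magic numbers; they can be derived from the code point. Here is a minimal sketch (using U+1F631, Face Screaming in Fear, from the example above), since NCHAR() only accepts values up to 0xFFFF on SQL Server 2008:
DECLARE @CodePoint INT = 0x1F631; -- Face Screaming in Fear

SELECT NCHAR(0xD800 + ((@CodePoint - 0x10000) / 0x400))  -- high surrogate: 0xD83D
     + NCHAR(0xDC00 + ((@CodePoint - 0x10000) % 0x400)); -- low surrogate:  0xDE31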
In the end, if you want to know if data was stored correctly, you need to check the bytes that were stored. To do this, simply convert the string column to VARBINARY(MAX):
SELECT CONVERT(VARBINARY(MAX), string_column)
FROM schema.table;
And compare those results between the 2008 and 2012 systems. More than likely they are (or "were", given that this was almost 2.5 years ago) the same. For example, a correctly stored 😱 shows up as 0x3DD831DE, the little-endian UTF-16 encoding of the surrogate pair D83D DE31.
For more info on what characters can actually be stored in the various string datatypes in SQL Server (from SQL Server 7.0 through at least SQL Server 2019), please read the following post of mine:
How Many Bytes Per Character in SQL Server: a Completely Complete Guide

Related

What collation should I use for the Amharic language?

I am using SQL Server 2014. I want to use the Amharic language in my database. The default collation for the database was Latin1_General_CP1_CI_AS. I changed it to Latin1_General_CI_AS. Neither displays Amharic characters: they show while typing, but when committed they are changed to question marks.
What collation should I use or what am I missing?
I suspect your problem is not the collation.
Try using Unicode data types such as NCHAR and NVARCHAR (and inserting literals with the N prefix) and you'll see your saved characters.
Collation mainly governs sorting and comparing. Look through the list of collations and choose the most appropriate one.
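For example, a minimal sketch (the table name is made up). Note the N prefix on the literal: without it, the Amharic text is converted to the database's code page, and becomes question marks, before it ever reaches the column:
CREATE TABLE dbo.Words (Word NVARCHAR(100));

INSERT INTO dbo.Words (Word) VALUES (N'ሰላም'); -- N prefix: stored as Unicode
INSERT INTO dbo.Words (Word) VALUES ('ሰላም');  -- no N prefix: stored as '???'

SELECT Word FROM dbo.Words;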
You did not originally mention that you are using Full-Text Search. That requires the LANGUAGE keyword with the name of your language; however, only 53 languages are supported (see sys.fulltext_languages) and Amharic is not among them.
Your only option is to recreate your Full-Text catalog with the neutral word breaker and then re-populate it. At least it will then recognize words by spaces and punctuation marks.
See more details: https://msdn.microsoft.com/en-us/library/ms142509.aspx
You can follow the steps on MSDN:
Default Collations in SQL Server Setup
In Control Panel, find the Windows system locale name on the Advanced tab in Regional and Language Options. In Windows Vista, use the "Formats" tab. The following table shows the corresponding collation designator to match collation settings with an existing Windows locale:
So you have to change the setting in Control Panel in order to see the results.
A simple Google search brought me to this list of collations; most of them were introduced with SQL Server 2008 R2.
Your collation should be one of the Latin1_General_100_ collations.
Some more hints:
The SQL Server instance has a default collation, which is the default collation for all new databases and - very important!!! - the collation of your tempdb.
Best would be - if this works for you - to install the SQL Server with the appropriate collation.
SQL Server allows you to define a default collation at the database level too. But this can lead to deep trouble if you work with CREATE TABLE #table and use WHERE, ORDER BY, GROUP BY or JOIN (any comparison...) on character fields...
Last but not least, you can define a collation at statement level too. This means a lot of typing and rather difficult-to-read code, but offers the best control...
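For example, a sketch (with made-up names) forcing the collation in a JOIN between a user table and a #temp table that inherited tempdb's collation:
SELECT t.CustomerName
FROM   dbo.Customers AS t
JOIN   #Staging AS s
    ON s.CustomerName = t.CustomerName COLLATE Latin1_General_100_CI_AS; -- collation forced for this comparison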
How the output of a query is displayed is a different issue: that is a matter of the output window (or whatever other program you are using to display the values of your queries...). In that case it might be necessary to use the appropriate encoding and character set, which depends on the tools you are working with...

Access cannot filter on Unicode characters after tables migrated to SQL Server

I have moved MS Access 2010 Data to SQL using their tools, and now filtering by Unicode is not working in Access linked tables. I see the linked table column in SQL is "nvarchar" but in Access there is "Unicode compression" set to "No" and I can't change it.
It is my understanding that the "Unicode compression" setting only affects native Access (ACE/Jet) tables and has no effect on ODBC linked tables. Instead, what you likely need to do is change the "Collation" setting of the SQL Server database itself using SQL Server Management Studio.
For example, with the SQL Server database collation set to "SQL_Latin1_General_CP1_CI_AS" I cannot filter on Greek characters (e.g., 'γιορτή') from Access, but if I change the collation of the database to "Greek_CI_AS" then that same Access filter will work.
Edit re: comments
While this solution will work for single-byte code pages that are natively supported by SQL Server (e.g., Greek, which would correspond to Windows-1253), it won't work for languages that lack those code pages and must be represented either by
a code page that is not supported by SQL Server, or
Unicode.
ODBC linked tables in Access apparently do not fully support Unicode, passing search strings to SQL Server as 'text' rather than N'text', so SQL Server feels compelled to interpret any such text according to the single-byte code page selected via the "Collation" setting.
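You can reproduce the effect directly in SSMS (a sketch; the table and column are hypothetical) in a database using a single-byte collation such as SQL_Latin1_General_CP1_CI_AS:
SELECT * FROM dbo.Events WHERE Title =  'γιορτή'; -- coerced to the code page first: searches for '??????'
SELECT * FROM dbo.Events WHERE Title = N'γιορτή'; -- true Unicode literal: matches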

Not Storing Decimal Point in SQL Server 2008 R2 Express

I am in the UK. I have a Windows 2008 server in Germany with SQL Server 2008 R2 Express installed on it. The regional settings for Windows are set to UK. SQL Server language is set to English. When I run
sp_helplanguage @@LANGUAGE
it shows that it is set to
us_english
I have numerous tables in the database that have float datatypes in them. When I use SSMS to change one of the float values, if I type in
1234.1234
firstly it displays as
1234,1234
then when I click off the row to save it, it displays as
12341234
The ASP.NET application, which is served the data via a stored procedure and puts it into a Double (VB.NET), does a
String.Format("{0:#0.0000}", dMyDouble)
This renders as
12341234,0000
Needless to say, on my local server (all UK based) all database entered numbers display as I would expect (1234.1234) and .NET formats them as I would expect (1234.1234). I am aware that my European friends use a different notation to us in the UK, but I need the UK format to be output - and more importantly the float in the database!
The fact that .NET is not formatting correctly is, I imagine, purely because the number is not stored with the intended precision.
I have also played around a bit with a decimal(18, 4) column and I get a similar outcome. As I type 1234.1234, the point (.) is replaced with a comma (,). When the row is saved, it saves as
12341234.0000
You must set the LANGUAGE of your session to the desired setting.
But, even though you did not show any of your .NET code, the fact that you're seeing formatting issues means you're passing the values as text inside SQL commands. You should be using parameters (@variables) instead. This would have the side benefit of avoiding SQL injection issues.
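A minimal sketch of the parameterized form (table and column names are made up); the value travels as a typed FLOAT, so no locale-dependent text conversion can mangle it:
DECLARE @sql NVARCHAR(MAX) = N'UPDATE dbo.Readings SET Measurement = @val WHERE Id = @id;';

EXEC sys.sp_executesql @sql,
     N'@val FLOAT, @id INT',
     @val = 1234.1234,
     @id  = 1;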

Moving from text to varchar(MAX): Are there any troubles to expect with MS Access?

It is well-known that MS Access applications (MDBs) using SQL Server backends have trouble with certain data types. For example,
bit field support is broken: NULL values lead to strange errors,
decimal field support is broken: Sorting does not work correctly,
etc.
We are now considering moving from text/ntext fields to varchar(MAX)/nvarchar(MAX) fields, as recommended by Microsoft:
ntext, text, and image data types will be removed in a future version of Microsoft SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.
Are we going to run into trouble doing that?
I know this is an older post, but I think it is still relevant to some folks. I deal a lot with legacy data that is scaled up from Access Memo fields to SQL and then turned into a link table in Access.
I have found that scaling up to NVARCHAR(MAX) does cause issues within the linked tables. The problem varies depending on which driver you build the Access linked table with.
Using SQL Native Client 10, my first finding is that Access treats the field as NVARCHAR(4000). Using the {SQL Server} driver changes the issues, but there are still issues; with this older driver they seem harder to track down, but they do show up, usually as a similar sizing problem.
Beware: what seems to be running OK may in fact just be running correctly because the right circumstance has not been hit yet.
If your field data never requires more than 4000 characters, make it NVARCHAR(4000). Setting it to MAX is overkill if you only need 4000 anyway.
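To check whether 4000 is actually enough, something like this helps (table and column names are hypothetical; DATALENGTH returns bytes, and N-typed columns store two bytes per character):
SELECT MAX(DATALENGTH(Notes)) / 2 AS MaxCharsUsed
FROM   dbo.LegacyTable;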
We are in the same situation: MS Access front end, SQL Server back end. We are already creating all new fields as nvarchar(max) instead of ntext, without any problem on the user side. As we do not use either the text or image field types, I cannot say anything about them.
At work, we have the same setup as well (Access 2003 frontend, SQL Server 2005 backend) and we did exactly what you are asking about:
We had SQL Server tables with text/ntext columns, and we changed them to varchar(max)/nvarchar(max).
We didn't experience any problems at all, if I remember it correctly we didn't even have to re-link the tables in Access...it just worked.
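For reference, the change itself is a plain ALTER (a sketch; names are made up). The follow-up UPDATE is optional, but it pulls existing values out of the old off-row text storage into the new format where they fit:
ALTER TABLE dbo.Documents ALTER COLUMN Body NVARCHAR(MAX) NULL;

UPDATE dbo.Documents SET Body = Body; -- rewrites each value in the new storage format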
We definitely discovered a pitfall recently. The problem is that one couldn't write more than 8000 bytes to a varbinary field of a linked table, even if the field is defined as varbinary(MAX).
Proof: varbinary(MAX) on linked tables
This is an addendum to the answer by JStevens.
The newer ODBC drivers for SQL Server limit VARCHAR(MAX) to 8000 characters. Trying to enter more text via a linked ODBC table results in this ODBC error:
[Microsoft][ODBC Driver 17 for SQL Server]String data, right truncation (#0)
It works with the ancient {SQL Server} driver, or with data type TEXT.
And surprisingly, it also works with NVARCHAR(MAX) !
These findings are with Access 2010 or 2016, and SQL Server 2008 R2.
+--------------------+--------------+---------------------------------+
| Data type \ Driver | {SQL Server} | {ODBC Driver 17 for SQL Server} |
+--------------------+--------------+---------------------------------+
| VARCHAR(MAX)       | ok           | ODBC Error                      |
| NVARCHAR(MAX)      | didn't try   | ok                              |
| TEXT               | ok           | ok                              |
+--------------------+--------------+---------------------------------+
So you have to pick your poison if you need to insert more data.
{SQL Server} wasn't an option for me, e.g. because it doesn't support the DATE data type.
So I stick with TEXT and hope that "ntext, text, and image data types will be removed in a future version of SQL Server." is just an empty threat.
Necro edit: NVARCHAR(MAX) doesn't seem to have the 8000 (or 4000) character limit with the new ODBC drivers.

Character set issues with Oracle Gateways, SQL Server, and Application Express

I am migrating data from Oracle on VMS, which accesses data on SQL Server using heterogeneous services (over ODBC), to Oracle on AIX, which accesses the SQL Server via Oracle Gateways (dg4msql). The Oracle VMS database used the WE8ISO8859P1 character set. The AIX database uses WE8MSWIN1252. The SQL Server database uses "Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data" according to sp_helpsort. The SQL Server database uses nchar/nvarchar for all string columns.
In Application Express, extra characters are appearing in some cases, for example 123 shows up as %001%002%003. In sqlplus, things look ok but if I use Oracle functions like initcap, I see what appear as spaces between each letter of a string when I query the sql server database (using a database link). This did not occur under the old configuration.
I'm assuming the issue is that an nchar has extra bytes in it and the character set in Oracle can't convert it. It appears that the ODBC solution didn't support nchars so must have just cast them back to char and they showed up ok. I only need to view the sql server data so I'm open to any solution such as casting, but I haven't found anything that works.
Any ideas on how to deal with this? Should I be using a different character set in Oracle, and if so, does that apply to all schemas, since I only care about one of them?
Update: I think I can simplify this question. SQL Server table uses nchar. select dump(column) from table returns Typ=1 Len=6: 0,67,0,79,0,88 when the value is 'COX' whether I select from a remote link to sql server, cast the literal 'COX' to an nvarchar, or copy into an Oracle table as an nvarchar. But when I select the column itself it appears with extra spaces only when selecting from the remote sql server link. I can't understand why dump would return the same thing but not using dump would show different values. Any help is appreciated.
There is an incompatibility between Oracle Gateways and nchar on that particular version of SQL Server. The solution was to create views on the SQL Server side casting the nchars to varchars. Then I could select from the views via gateways and it handled the character sets correctly.
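The workaround looked roughly like this (a sketch; the object names are made up): a view on the SQL Server side that hands the gateway single-byte VARCHAR data instead of NCHAR:
CREATE VIEW dbo.Customers_ForOracle
AS
SELECT CustomerId,
       CAST(LastName AS VARCHAR(50)) AS LastName -- cast so the gateway never sees NCHAR
FROM   dbo.Customers;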
You might be interested in the Oracle NLS Lang FAQ
