German Umlaut hash - SHA256 on SQL server - sql-server

I am facing a problem when applying a SHA-256 hash to German umlaut characters.
--Without Umlaut
SELECT CONVERT(VARCHAR(MAX), HASHBYTES('SHA2_256','o'), 2) as HASH_ID
SQL Server output: 65C74C15A686187BB6BBF9958F494FC6B80068034A659A9AD44991B08C58F2D2
This matches the output of https://www.pelock.com/products/hash-calculator
--With Umlaut
SELECT CONVERT(VARCHAR(MAX), HASHBYTES('SHA2_256','ö'), 2)
SQL Server output: B0B2988B6BBE724BACDA5E9E524736DE0BC7DAE41C46B4213C50E1D35D4E5F13
Output from pelock: 6DBD11FD012E225B28A5D94A9B432BC491344F3E92158661BE2AE5AE2B8B1AD8
I want the SQL Server output to match pelock. I have tested other sources (Snowflake and Python), and all of them agree with pelock. I am not sure why SQL Server is not giving the same result. Any help is much appreciated.
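HASHBYTES, like pelock and Python's hashlib, hashes bytes rather than characters. The result therefore depends on how the text is encoded before hashing. The small Python sketch below makes this visible.

It rests on two assumptions:
- code page 1252 stands in for a Latin1 varchar collation;
- UTF-16-LE stands in for what an N'...' (nvarchar) literal holds.

ASCII 'o' is the same single byte in UTF-8 and in 1252, which is why that hash matched. 'ö' is not the same byte in the two encodings.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Upper-case hex digest, the same style as CONVERT(VARCHAR(MAX), ..., 2)."""
    return hashlib.sha256(data).hexdigest().upper()


# cp1252 stands in for a Latin1 varchar collation (an assumption);
# utf-16-le is the byte layout of nvarchar values.
ENCODINGS = ("utf-8", "cp1252", "utf-16-le")

for ch in ("o", "ö"):
    for encoding in ENCODINGS:
        data = ch.encode(encoding)
        print(f"{ch!r} {encoding:<9} bytes={data.hex():<4} sha256={sha256_hex(data)}")
```

For 'ö' the three encodings produce three different byte sequences (c3b6, f6, f600) and therefore three different hashes. The UTF-8 pair c3b6 is what Python's str.encode() produces by default.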

You have two issues:
The literal text itself is being reinterpreted, because you have the wrong database collation. You can use the N prefix to prevent that, but this leads to a second problem:
The value from pelock is the hash of the UTF-8 bytes, but using N makes the literal a UTF-16 nvarchar, so different bytes get hashed.
So you need the N prefix, a UTF-8 binary collation, and a cast back to varchar so that the string is hashed as UTF-8.
SELECT CONVERT(VARCHAR(MAX), HASHBYTES('SHA2_256',CAST(N'ö' COLLATE Latin1_General_100_BIN2_UTF8 AS varchar(100))), 2)
Result
6DBD11FD012E225B28A5D94A9B432BC491344F3E92158661BE2AE5AE2B8B1AD8
UTF-8 collations are only supported in SQL Server 2019 and later. In older versions you would need to find a different collation whose code page covers the characters you have, and it may not be possible to find one collation that covers all of your data.
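On versions before 2019, checking whether a candidate collation can hold your data amounts to checking whether its code page can encode every string. The Python sketch below does that for an illustrative list of Windows code pages: 1252 (Latin1), 1250 (Central European) and 1251 (Cyrillic). The function name and the candidate list are assumptions.

Keep in mind that a covering code page stores 'ö' as its own single byte (0xF6 in 1252), not as the UTF-8 pair that pelock hashes.

```python
def covering_code_pages(texts, candidates):
    """Return the candidate code pages that can encode every string in texts."""
    covering = []
    for code_page in candidates:
        try:
            for text in texts:
                text.encode(code_page)  # raises if any character is unmappable
        except UnicodeEncodeError:
            continue
        covering.append(code_page)
    return covering


# Python codec names for the Windows code pages behind some common collations;
# this list is illustrative, not exhaustive.
CANDIDATES = ["cp1252", "cp1250", "cp1251"]

print(covering_code_pages(["ö", "Müller"], CANDIDATES))
print(covering_code_pages(["ö", "中"], CANDIDATES))
```

The second call returns an empty list: no single-byte code page in the list covers both German and Chinese text. That is the situation the answer warns about.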

Related

characters appearing incorrectly even with Unicode source and destination (SSIS)

I am having a codepage unicode/non unicode problem and need expertise to understand it.
In SSIS I am reading data in from a UTF8 encoded text file. The datatypes are all DT_WSTR (unicode string). The destination is NVARCHAR which is also unicode.
Non-standard characters such as Ú are not being encoded correctly (they appear as a question mark in a black box).
If the character appears correctly in the input file, the source is set to DT_WSTR and the destination is nvarchar, why is the character not rendering correctly?
I have tried setting the code page of the source column to 65001, but in SSIS it's only possible to change the code page on a DT_STR (non-Unicode) type.
I'd appreciate any help in understanding why all-Unicode fields still can't store a Unicode value correctly.
Update from the OP comments
It seems my output is OK if I use Unicode types end to end (the input is DT_WSTR, the destination column is nvarchar, and when extracting again to text, the output column is DT_WSTR). The only issue is SQL Server Management Studio, which does not seem to render Unicode characters correctly in query results, whether output is set to grid or text. This is a red herring: the process overall works without issue if it is ignored.
Trying to figure out the issue
There is no problem importing Unicode characters from flat files into a SQL Server destination. The only things you have to do are:
- set the flat file encoding to match the file (Unicode, or code page 65001 for UTF-8);
- make the destination columns NVARCHAR.
Based on your question, it looks like you have met these requirements, so I can say that:
Unicode characters are imported successfully into SQL Server, but for some reason SQL Server Management Studio cannot show them in the grid results. To check that the data was imported correctly, change the result view to Results To Text.
Go to Tools >> Options >> Query Results >> Results To Text
In the second reference link I provided, they mention that:
If you use SSMS for your queries, change to output type from "Grid" to "Text", because depending on the font the grid can't show unicode.
Or you can try changing the grid results font (on my machine I use Tahoma, and it shows Unicode characters normally).
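Whether a character such as Ú survives depends on decoding the file's bytes with the encoding they were actually written in. That can be illustrated outside SSIS with a short Python sketch; the temporary file and the sample character are assumptions for illustration. It covers three cases:
- reading UTF-8 bytes as UTF-8 returns Ú intact;
- reading them as code page 1252 produces mojibake;
- reading a 1252 file as UTF-8 yields U+FFFD, the replacement character that many fonts draw as a dark shape with a question mark.

```python
import os
import tempfile

TEXT = "Ú"


def code_points(s: str) -> list:
    """Show code points so the result does not depend on the console font."""
    return [f"U+{ord(c):04X}" for c in s]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.txt")

    # Case 1: a UTF-8 file read back as UTF-8 (declared encoding matches).
    with open(path, "w", encoding="utf-8") as f:
        f.write(TEXT)
    with open(path, encoding="utf-8") as f:
        matched = f.read()

    # Case 2: the same UTF-8 bytes decoded as code page 1252 (mismatch).
    with open(path, "rb") as f:
        misread = f.read().decode("cp1252")

    # Case 3: a cp1252 file decoded as UTF-8; the invalid byte becomes U+FFFD.
    with open(path, "w", encoding="cp1252") as f:
        f.write(TEXT)
    with open(path, encoding="utf-8", errors="replace") as f:
        replaced = f.read()

print(code_points(matched), code_points(misread), code_points(replaced))
```

Printing code points rather than the characters mirrors the advice above: it checks the data itself instead of trusting whatever the display font renders.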
Experiments
You can perform the following test (taken from the links below):
SET NOCOUNT ON;
CREATE TABLE #test
( id int IDENTITY(1, 2) NOT NULL Primary KEY
,Uni nvarchar(20) NULL);
INSERT INTO #test (Uni) VALUES (N'DE: äöüßÖÜÄ');
INSERT INTO #test (Uni) VALUES (N'PL: śćźłę');
INSERT INTO #test (Uni) VALUES (N'JAP: 言も言わずに');
INSERT INTO #test (Uni) VALUES (N'CHN: 玉王瓜瓦甘生用田由疋');
SELECT * FROM #test;
GO
DROP TABLE #test;
Run the query above with both Results to Grid and Results to Text.
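As a cross-check on the storage side, the Python sketch below encodes the four sample values as UTF-16-LE, the format nvarchar uses. It confirms that each value round-trips losslessly and fits in nvarchar(20), whose length counts UTF-16 code units (byte pairs). If the data is intact like this but the grid shows boxes, the font is the likely culprit. The helper name is illustrative.

```python
# The four sample values from the test script above.
SAMPLES = [
    "DE: äöüßÖÜÄ",
    "PL: śćźłę",
    "JAP: 言も言わずに",
    "CHN: 玉王瓜瓦甘生用田由疋",
]


def utf16_code_units(text: str) -> int:
    """nvarchar(n) counts UTF-16 code units (byte pairs), so measure those."""
    return len(text.encode("utf-16-le")) // 2


for value in SAMPLES:
    stored = value.encode("utf-16-le")          # byte layout of an nvarchar value
    assert stored.decode("utf-16-le") == value  # lossless round trip
    units = utf16_code_units(value)
    print(units, "code units, fits nvarchar(20):", units <= 20)
```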
References
SQL Server 2012 not showing unicode character in results
sql server 2008 not showing and inserting unicode characters!
Import UTF-8 Unicode Special Characters with SQL Server Integration Services
Microsoft SQL Server Management Studio - query result as text

Why can I store an Ukrainian string in a varchar column?

I was a little surprised to find that I could store a Ukrainian string in a varchar column.
My table is:
create table delete_collation
(
text1 varchar(100) collate SQL_Ukrainian_CP1251_CI_AS
)
and using this query I am able to insert:
insert into delete_collation
values(N'використовується для вирішення квитки')
but when I remove the N prefix, the SELECT statement shows ??????.
Is it okay or am I missing something in understanding unicode and non-unicode with collate?
From MSDN:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
UPDATE:
Please see these similar questions:
What is the meaning of the prefix N in T-SQL statements?
Cyrillic symbols in SQL code are not correctly after insert
sql server 2012 express do not understand Russian letters
To expand on MegaTron's answer:
Using collate SQL_Ukrainian_CP1251_CI_AS, SQL Server is able to store Ukrainian characters in a varchar column by using code page 1251.
However, a string without the N prefix is a varchar literal in the database's default code page (as the MSDN quote says), not in code page 1251. That default code page has no Ukrainian letters, so they are turned into ? before the value ever reaches your column, which is why you see ??????.
So it is completely fine to use varchar and collate as you do, but you must always include the N prefix when sending such strings to the database, to avoid the intermediate conversion to the default (non-Ukrainian) code page.
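The two conversion paths can be modelled in Python. This assumes code page 1252 as the database default; the question does not say which non-Ukrainian code page it is. The function names are hypothetical, and each one models a literal reaching the CP1251 varchar column.

```python
UKRAINIAN = "використовується для вирішення квитки"
DB_DEFAULT = "cp1252"  # assumed non-Ukrainian database default code page
COLUMN = "cp1251"      # code page of SQL_Ukrainian_CP1251_CI_AS


def to_code_page(text: str, code_page: str) -> str:
    """Convert text into a code page, replacing unmappable characters with '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)


def insert_with_n(text: str) -> str:
    # N'...' is Unicode, so it is converted straight to the column's code page.
    return to_code_page(text, COLUMN)


def insert_without_n(text: str) -> str:
    # '...' is first a varchar in the database default code page,
    # and only then converted to the column's code page.
    return to_code_page(to_code_page(text, DB_DEFAULT), COLUMN)


print(insert_with_n(UKRAINIAN))
print(insert_without_n(UKRAINIAN))
```

Code page 1251 stores each Ukrainian letter in one byte, so the first path is lossless. The second path has already turned every letter into '?' before code page 1251 is involved.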

Why I can insert non-ascii characters into VARCHAR column and correctly get it back?

Below is my code sample.
DECLARE @a TABLE (a VARCHAR(20));
INSERT @a
(a)
VALUES ('中');
SELECT *
FROM @a;
I'm using SQL Server Management Studio to run it. My question is: why can I insert non-ASCII characters into a VARCHAR column and correctly get them back? As I understand it, the VARCHAR type is only for ASCII characters and NVARCHAR is for Unicode characters. Can anyone explain this? I'm on Windows 7 with SQL Server 2014 Developer Edition.
The code page used to store varchar data varies by database collation.
https://msdn.microsoft.com/en-us/library/ms189617.aspx
Varchar uses 8-bit code units, so whether '中' survives depends on the collation's code page. A double-byte code page such as 936 (used by the Chinese_PRC collations) can store it in two bytes, while a single-byte code page such as 1252 cannot. So you probably have such a collation rather than a plain Latin one.
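The code-page dependence can be checked in Python. This sketch assumes code page 936 (Chinese_PRC collations) as an example of a double-byte code page and 1252 as a typical Western one. It is not a claim about the asker's actual collation.

```python
CHAR = "中"

# Code page 936 is a double-byte character set: this character needs two bytes.
encoded_936 = CHAR.encode("cp936")
print("cp936:", encoded_936.hex(), f"({len(encoded_936)} bytes)")

# Code page 1252 is single-byte and has no Chinese characters at all.
try:
    CHAR.encode("cp1252")
    fits_1252 = True
except UnicodeEncodeError:
    fits_1252 = False
print("fits in cp1252:", fits_1252)
```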
You can find the ASCII and Extended ASCII tables at www.asciitable.com.
I don't believe '中' is an ASCII character.

SQL Server Cyrillic Writing '?????'

I am running a SQL Server 2008 R2 and I have a database containing multilingual words.
For Cyrillic words I only see '???????'
The data type is nvarchar(255) and the collation is SQL_Latin1_General_CP1_CI_AS (which was my default).
I have no idea what else I can do. Any ideas?
When you add data to the nvarchar column, use the N prefix:
INSERT INTO your_table (nvarchar_col)
SELECT N'your Cyrillic words';
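Why the N prefix matters even for an nvarchar column can be sketched in Python. Without it, the literal is first a varchar in the database default code page (1252 for SQL_Latin1_General_CP1_CI_AS). The Cyrillic letters are therefore already '?' before the value is widened into the column. The sample text and function name are illustrative.

```python
SAMPLE = "Привет, мир"  # illustrative Cyrillic text


def via_varchar_literal(text: str, code_page: str = "cp1252") -> str:
    """Model a literal without N: it passes through the database code page first."""
    return text.encode(code_page, errors="replace").decode(code_page)


# Without N: the nvarchar column receives the already-damaged text.
print(via_varchar_literal(SAMPLE))  # ??????, ???
# With N: the literal is already UTF-16, so the column receives it unchanged.
print(SAMPLE.encode("utf-16-le").decode("utf-16-le"))
```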

Allow special characters SQL Server 2008

I am using SQL Server 2008 Express edition and its collation settings are set to the default. I wish to store special characters like á ,â ,ã ,å ,ā ,ă ,ą ,ǻ in my database, but it converts them into plain characters like 'a'. How can I stop SQL Server from doing so?
Make sure that your columns use the type nvarchar(...) rather than varchar(...). The former is Unicode; the latter is non-Unicode and can only hold characters from its collation's code page.
Also, make sure that your database default collation is set to Accent Sensitive, and that your columns are stored that way. You may also want to check your instance default collation, as that affects the default collation for your system databases, particularly tempdb.
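Which of the sample characters a varchar column could hold depends on its code page. This Python sketch assumes code page 1252, the usual default for Latin1 collations. It splits the sample into characters that code page can represent and characters that need nvarchar. What SQL Server substitutes for the rest when converting (a look-alike such as 'a', or '?') is not modelled here.

```python
import unicodedata

# NFC makes sure each accented letter is one precomposed code point.
SAMPLE = unicodedata.normalize("NFC", "á ,â ,ã ,å ,ā ,ă ,ą ,ǻ")
LETTERS = [c for c in SAMPLE if c not in " ,"]


def fits_code_page(ch: str, code_page: str = "cp1252") -> bool:
    """True if the character has a mapping in the given code page."""
    try:
        ch.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


in_1252 = [c for c in LETTERS if fits_code_page(c)]
outside_1252 = [c for c in LETTERS if not fits_code_page(c)]
print("fits varchar (cp1252):", in_1252)
print("needs nvarchar:", outside_1252)
```

Only á, â, ã and å exist in code page 1252. That matches the advice to use nvarchar for the others.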
Rahul, here is a very simple query that runs perfectly on SQL 2005 and 2008:
Query
DECLARE @t1 TABLE (
Col1 nvarchar(30)
)
INSERT INTO @t1 VALUES (N'á ,â ,ã ,å ,ā ,ă ,ą ,ǻ')
SELECT * FROM @t1
Result
Col1
------------------------------
á ,â ,ã ,å ,ā ,ă ,ą ,ǻ
There is nothing special here. No collation change from default, just a simple NVARCHAR column.
You said you are "just running direct queries in the database". Can you try this query and see if you get the same results?

Resources