SHA256 base 64 hash generation in SQL Server - sql-server

I need to generate a SHA256 base 64 hash from a table in SQL Server, but I can't find that algorithm in the list of HASHBYTES arguments.
Is there a way to generate it directly in SQL Server?
Duplicate disclaimer:
My question is not a duplicate of SHA256 in T-sql stored procedure, as I am looking for the SHA256 base 64 version of the algorithm, which is not listed on that page.
Numeric Example
I have this query result in SQL Server
Start date,End date,POD,Amount,Currency
2016-01-01,2016-12-31,1234567890,12000,EUR
This gives me the following string (using a concatenate function):
2016-01-012016-12-31123456789012000EUR
With this conversion tool I get the following hash:
GMRzFNmm90KLVtO1kwTf7EcSeImq+96QTHgnWFFmZ0U
that I need to send to a customer.

First, the generator you linked outputs the base64 representation in a format that is not exactly correct: it omits the padding sequence. Though theoretically optional, padding is mandatory in MS SQL Server (tested on 2012 and 2016 versions).
With this in mind, the following code gives you what you need:
declare @s varchar(max), @hb varbinary(128), @h64 varchar(128);
select @s = '2016-01-012016-12-31123456789012000EUR';
-- SHA-256 digest of the varchar input (32 bytes)
set @hb = hashbytes('sha2_256', @s);
-- Base64-encode the digest via XQuery; the output includes the '=' padding
set @h64 = cast(N'' as xml).value('xs:base64Binary(sql:variable("@hb"))', 'varchar(128)');
select @hb as [BinaryHash], @h64 as [64Hash];
Apart from the aforementioned padding, there is another caveat to watch for. Make sure the input string is always of the same type, that is, either always varchar or always nvarchar. If some of your hashes are calculated from ASCII strings and some from UTF-16, the results will be completely different. Depending on which languages are used in your system, it might make sense to always convert the plain text to nvarchar before hashing.
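To cross-check the SQL Server output outside the database, the same computation can be sketched in Python. This is only an illustrative sketch: the function name sha256_base64 is hypothetical, and the encodings are assumptions, with "ascii" standing in for a varchar value that contains only ASCII characters and "utf-16-le" for an nvarchar value.

```python
import base64
import hashlib


def sha256_base64(text: str, encoding: str) -> str:
    """Return the padded Base64 form of the SHA-256 digest of text."""
    digest = hashlib.sha256(text.encode(encoding)).digest()  # 32 raw bytes
    return base64.b64encode(digest).decode("ascii")  # 44 chars, ends with '='


s = "2016-01-012016-12-31123456789012000EUR"

as_varchar = sha256_base64(s, "ascii")       # bytes of an ASCII-only varchar
as_nvarchar = sha256_base64(s, "utf-16-le")  # bytes of an nvarchar

print(as_varchar)              # padded form, as SQL Server emits it
print(as_varchar.rstrip("="))  # unpadded form, as the online tool shows it
print(as_varchar == as_nvarchar)  # False: different bytes, different hash
```

If the customer expects the unpadded form, stripping the trailing '=' from the SQL Server result is enough; the other 43 characters are unaffected.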

Related

HASHBYTES 'SHA1' return value that differ to standard SHA1 implementation

I am computing HASH value of each row in a table (for diffing purpose), after implementing the algorithm I am testing the results.
Results are consistent and algorithm somewhat seems to work, but testing it step by step I found a strange result.
The script:
SELECT HASHBYTES('SHA1', (SELECT INNERTBL.VALUT FOR XML RAW)) as KHASH
FROM ACLING AS INNERTBL
This should perform the SHA1 calculation on the table key, but when I perform the same calculation with an external tool I get different results.
In fact, when I compute SHA1('<row VALUT="A"/>') with an external tool (https://emn178.github.io/online-tools/sha1.html) I get a different result.
So my question is: is there something wrong with my logic, or does SQL Server simply use some non-standard SHA1 "parametrization"? (I suspect the use of a perhaps standard, but particular, padding scheme.)
Example on dbfiddle: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=efa4e0ba11c112f54e36afb5d54d2cce
SELECT HASHBYTES('SHA1','<row VALUT="A"/>'), --- you are testing this
HASHBYTES('SHA1',N'<row VALUT="A"/>') -- ..but for xml returns Nvarchar
It is important to remember that you get the same result only if the strings are binary identical. For example, if the two strings use different character sets, they will have different hash values. For more details, see https://security.stackexchange.com/questions/18290/is-sha-1-hash-always-the-same
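The encoding difference can be reproduced outside SQL Server. The Python sketch below assumes the online tool hashes the UTF-8 bytes of the text (typical for such tools, but an assumption here), while HASHBYTES on an nvarchar value hashes its UTF-16 little-endian bytes.

```python
import hashlib

text = '<row VALUT="A"/>'

# What a typical external tool hashes: the UTF-8 (here identical to ASCII) bytes.
sha1_utf8 = hashlib.sha1(text.encode("utf-8")).hexdigest()

# What HASHBYTES sees for an nvarchar value: two bytes per character, UTF-16LE.
sha1_utf16 = hashlib.sha1(text.encode("utf-16-le")).hexdigest()

print("varchar-like :", sha1_utf8.upper())
print("nvarchar-like:", sha1_utf16.upper())
```

The algorithm is the same in both cases; only the input bytes differ, which is why the two digests do not match.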

Get short hash value from the HASHBYTES('SHA1', text) independent on the SQL Server version?

For the purpose of getting the content-derived key of a longer text, I do calculate HASHBYTES('SHA1', text). It returns 20 bytes long varbinary. As I know the length of the result, I am storing it as binary(20).
To make it shorter (to be used as a key), I would like to follow the Git idea of a short hash -- as if the first (or last) characters of the hexadecimal representation. Instead of characters, I would like to get binary(5) value from the binary(20).
When trying with the SQL Server 2016 it seems that the following simple way:
DECLARE @hash binary(20) = HASHBYTES('SHA1', N'příšerně žluťoučký kůň úpěl ďábelské ódy')
DECLARE @short binary(5) = @hash
SELECT @hash, @short
Returns the leading bytes (higher order bytes):
(No column name) (No column name)
0xE02C3C55FBA0DF13ADA1B626B1E31746D57B4602 0xE02C3C55FB
However, the documentation (https://learn.microsoft.com/en-us/sql/t-sql/data-types/binary-and-varbinary-transact-sql?view=sql-server-ver15) warns that:
Conversions between any data type and the binary data types are not guaranteed to be the same between versions of SQL Server.
Well, this is not exactly a conversion. Still, does this uncertainty hold also for getting shorter version of binary from the longer version of binary? What should I expect for future versions of SQL Server?
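Independently of how SQL Server handles the assignment, the intended result (the leading bytes of the digest) can be stated explicitly. The Python sketch below is only a reference for comparison: it assumes the N'...' literal is hashed as UTF-16LE bytes, and the name short_hash is hypothetical.

```python
import hashlib


def short_hash(text: str, length: int = 5) -> bytes:
    """Return the first `length` bytes of the SHA-1 digest of text as UTF-16LE."""
    digest = hashlib.sha1(text.encode("utf-16-le")).digest()  # always 20 bytes
    return digest[:length]  # explicit prefix, no implicit conversion involved


sample = "příšerně žluťoučký kůň úpěl ďábelské ódy"
full = hashlib.sha1(sample.encode("utf-16-le")).digest()
short = short_hash(sample)

print("0x" + full.hex().upper())
print("0x" + short.hex().upper())
```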

SQL Server appears to correctly interpret non-supported string literal formats

Have an instance of SQL Server 2012 that appears to correctly interpret string literal dates whose formats are not listed in the docs (though note these docs are for SQL Server 2017).
Eg. I have a TSV with a column of dates of the format %d-%b-%y (see https://devhints.io/datetime#date-1) which looks like "25-FEB-93". However, this throws type errors when trying to copy the data into the SQL Server table (via mssql-tools bcp binary). Yet, when testing on another table in SQL Server, I can do something like...
select top 10 * from account where BIRTHDATE > '25-FEB-93'
without any errors. All this, even though the given format is not listed in the docs for acceptable date formats and it apparently also can't be used as a castable string literal when writing in new records. Can anyone explain what is going on here?
the given format is not listed in the docs for acceptable date formats
That means it's not supported, and does not have documented behavior. There's lots of strings that under certain regional settings will convert due to quirks in the parsing implementation.
It's a performance-critical code path, and so the string formats are not rigorously validated on conversion. You're expected to ensure that the strings are in a supported format.
So you may need to load the column as a varchar(n) and then convert it. eg
declare @v varchar(200) = '25-FEB-93'
select convert(datetime,replace(@v,'-',' '),6)
Per the docs, format 6 is dd mon yy. Note that this conversion "works" even without replacing the - with a space, but that's an example of the behavior you observed.
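Another option is to normalize the dates before the file reaches bcp. A minimal Python sketch, assuming an English (or C) locale for month abbreviations; the helper name normalize_date is hypothetical, and the ISO 8601 output should still be checked against the target column type.

```python
from datetime import datetime


def normalize_date(value: str) -> str:
    """Convert a %d-%b-%y date such as '25-FEB-93' to ISO 8601 (YYYY-MM-DD)."""
    # %b matches month abbreviations case-insensitively in the current locale;
    # %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    return datetime.strptime(value, "%d-%b-%y").date().isoformat()


print(normalize_date("25-FEB-93"))  # 1993-02-25
```

Note that the pivot for two-digit years differs between the two systems: Python uses 69, while SQL Server's default two digit year cutoff is 2049, so years 50 to 68 would be interpreted differently.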

Calculate MD5 for a long string

When calling HASHBYTES with long string I am getting
Msg 8152, Level 16, State 10, Line 11
String or binary data would be truncated.
I am trying to calculate the MD5 hash for multiple fields together so I can compare objects.
Is there any way around this?
Assuming you're using SQL Server 2008 or above, use the CHECKSUM function.
https://msdn.microsoft.com/en-us/library/ms189788.aspx
CHECKSUM computes a hash value, called the checksum, over its list of arguments. The hash value is intended for use in building hash indexes. If the arguments to CHECKSUM are columns, and an index is built over the computed CHECKSUM value, the result is a hash index. This can be used for equality searches over the columns.
CHECKSUM returns an error if any column is of noncomparable data type. Noncomparable data types are text, ntext, image, XML, and cursor, and also sql_variant with any one of the preceding types as its base type.
As @TimBiegeleisen said, SQL Server has an 8000-byte limit on HASHBYTES input.
However, it looks like SQL Server 2016 and later don't have this limitation:
For SQL Server 2014 (12.x) and earlier, allowed input values are limited to 8000 bytes.
https://learn.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-2017
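On versions with the 8000-byte limit, one possible workaround is to hash the input in chunks and then hash the concatenated chunk digests. The Python sketch below only illustrates that idea; the function name chunked_md5 and the 8000-byte chunk size are assumptions. The result is not the standard MD5 of the whole input, so it is only useful for comparing values produced by the same scheme.

```python
import hashlib


def chunked_md5(data: bytes, chunk_size: int = 8000) -> bytes:
    """Hash data in chunks, then hash the concatenated chunk digests."""
    digests = b"".join(
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    )
    # Each chunk digest is 16 bytes; the final MD5 covers all of them.
    return hashlib.md5(digests).digest()


# Hypothetical long value, encoded the way an nvarchar would be stored.
row = ("some long concatenated field values " * 1000).encode("utf-16-le")
print(len(row), chunked_md5(row).hex())
```

The same two-level structure could be reproduced in T-SQL with SUBSTRING and HASHBYTES on each chunk, as long as the concatenated digests themselves stay under the limit.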

Some questions about HierarchyId (SQL Server 2008)

I am a newbie in SQL Server 2008 and just got introduced to HierarchyId's.
I am learning from SQL Server 2008 - HIERARCHYID - PART I. So basically I am following the article line by line and while practicing in SSMS I found that for every ChildId some hexadecimal values are generated like 0x,0x58,0x5AC0 etc.
My questions are
What are these hexadecimal values?
Why are these generated and what is their use? I mean where can I use those hexa values?
Do we have any control over those hexa values? I mean can we update etc.
How to determine the hierarchy by looking into those hexa values.. I mean how can I determine which is the parent and which is the child?
Those hex values are simply a binary representation of the hierarchy level. In general, you should not use them directly.
You may want to check out the following example, which I think should be self-explanatory. I hope it will get you going in the right direction.
Create a table with a hierarchyid field:
CREATE TABLE groups (
group_name nvarchar(100) NOT NULL,
group_hierarchy hierarchyid NOT NULL
);
Insert some values:
INSERT INTO groups (group_name, group_hierarchy)
VALUES
('root', hierarchyid::Parse('/')),
('domain-a', hierarchyid::Parse('/1/')),
('domain-b', hierarchyid::Parse('/2/')),
('sub-a-1', hierarchyid::Parse('/1/1/')),
('sub-a-2', hierarchyid::Parse('/1/2/'));
Query the table:
SELECT
group_name,
group_hierarchy.ToString()
FROM
groups
WHERE
(group_hierarchy.IsDescendantOf(hierarchyid::Parse('/1/')) = 1);
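The effect of the query can be modeled on the path strings themselves. This is only a conceptual Python sketch: the function is_descendant_of is hypothetical and compares the text form, not the binary encoding. Like IsDescendantOf, it treats a node as a descendant of itself.

```python
groups = [
    ("root", "/"),
    ("domain-a", "/1/"),
    ("domain-b", "/2/"),
    ("sub-a-1", "/1/1/"),
    ("sub-a-2", "/1/2/"),
]


def is_descendant_of(path: str, ancestor: str) -> bool:
    """True if path lies at or below ancestor; both must end with '/'."""
    return path.startswith(ancestor)


matches = [name for name, path in groups if is_descendant_of(path, "/1/")]
print(matches)  # ['domain-a', 'sub-a-1', 'sub-a-2']
```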
Adam Milazzo wrote a great article about the innards of hierarchyid here:
http://www.adammil.net/blog/view.php?id=100
In a nutshell, it's not meaningful to work with the values in straight hex; convert them to binary instead. The reason is that nodes are not split on even byte boundaries. A single node can take as little as 5 bits if it's one of the first four nodes. The encoding grows as more nodes are used: 6 bits each for the next 4 nodes, 7 bits each for the next 8 nodes, and then it jumps to 12 bits each for the next 64 nodes, and up to 18 bits each for the next 1024.
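The size progression described above can be written as a small lookup. This Python sketch simply restates those numbers; numbering the nodes from 0 is an assumption, and the function name bits_for_node is hypothetical.

```python
# (number of nodes in the range, bits per node), in the order given above
RANGES = [(4, 5), (4, 6), (8, 7), (64, 12), (1024, 18)]


def bits_for_node(index: int) -> int:
    """Return the encoded size in bits for the node at a 0-based index."""
    start = 0
    for count, bits in RANGES:
        if start <= index < start + count:
            return bits
        start += count
    raise ValueError("index outside the ranges listed in the text")


for i in (0, 3, 4, 8, 16, 80, 1103):
    print(i, bits_for_node(i))
```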
I needed to convert a database to Postgres, and wrote a script which parses these hex values. You can check out a version I made for AdventureWorks here, search for "hierarchyid":
https://github.com/lorint/AdventureWorks-for-Postgres/blob/master/install.sql
I'll let others address your specific questions, but I will tell you, that, IMO, the HierarchyId in SQL Server 2008 isn't one of Microsoft's greatest contributions to SQL Server. They are complex and somewhat awkward. I think you will find that for many hierarchical needs, common table expressions (CTE) work great.
Randy

Resources