SQL Server 2014 Hashbytes of a nvarchar(max) result is nvarchar(max) - sql-server

Using SQL Server 2014, I have a table with an nvarchar(max) column called [ASCII File], which can contain an ASCII text file of many kilobytes. I then want to do an MD5 HASHBYTES on that file, and the resultant hash should always be 20 bytes.
Well, when I do a SELECT of hashbytes('MD5', [ASCII File]), I get "query completed with errors":
Msg 8152, Level 16, State 10, Line 4
String or binary data would be truncated.
I get the same message when I try
left(hashbytes('MD5', [ASCII File]), 50)
I get the same message when I try
convert(varchar(50), hashbytes('MD5', [ASCII File]))
It seems that since the column I am hashing is nvarchar(max), the result of the HASHBYTES function is also nvarchar(max).
Can you tell me how I can get the result to be the expected 20 bytes long, and not something so long it has to be truncated?

It seems like since the field I am doing the hashbytes on is nvarchar(max) the result of the hashbytes is nvarchar(max).
No, that is not possible, especially since the return value of HASHBYTES is VARBINARY. Also, since your tests were plain SELECT statements and not INSERT statements, there is no way for the return value to cause a truncation error. The truncation error is coming from the input value. As stated in the linked MSDN page for HASHBYTES (for SQL Server 2012 and 2014):
Allowed input values are limited to 8000 bytes. The output conforms to the algorithm standard: 128 bits (16 bytes) for MD2, MD4, and MD5; 160 bits (20 bytes) for SHA and SHA1; 256 bits (32 bytes) for SHA2_256, and 512 bits (64 bytes) for SHA2_512.
That really says it all: the input is limited to 8000 bytes, and the output is a fixed number of bytes, based on the specified algorithm.
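Those fixed digest sizes are a property of the hash algorithms themselves, not of SQL Server. For illustration only, the same sizes can be confirmed outside T-SQL with Python's standard hashlib module:

```python
import hashlib

# Digest length depends only on the algorithm, never on the input size.
data = b"t" * 50000  # 50,000 bytes -- far beyond the old 8,000-byte T-SQL input limit

assert len(hashlib.md5(data).digest()) == 16     # MD5: 128 bits
assert len(hashlib.sha1(data).digest()) == 20    # SHA1: 160 bits
assert len(hashlib.sha256(data).digest()) == 32  # SHA2_256: 256 bits
assert len(hashlib.sha512(data).digest()) == 64  # SHA2_512: 512 bits
```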
The updated documentation, for SQL Server 2016 (which has removed the 8000 byte limitation), states:
For SQL Server 2014 and earlier, allowed input values are limited to 8000 bytes.
You can run a simple test:
DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT HASHBYTES('MD5', @Test);
Returns:
50000
Msg 8152, Level 16, State 10, Line 3
String or binary data would be truncated.
If you want to pass more than 8,000 bytes to a hash function in a version of SQL Server prior to 2016, then you need to use SQLCLR. You can either write your own function, or you can download and install the free version of the SQL# SQLCLR library (which I created) and use the Util_Hash and Util_HashBinary functions:
DECLARE @Test NVARCHAR(MAX) = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000);
SELECT LEN(@Test);
SELECT SQL#.Util_Hash('MD5', CONVERT(VARBINARY(MAX), @Test));
SELECT SQL#.Util_HashBinary('MD5', CONVERT(VARBINARY(MAX), @Test));
Returns:
50000
40752EB301B41EEAEB309348CE9711D6
0x40752EB301B41EEAEB309348CE9711D6
UPDATE
In the case of using a VARCHAR(MAX) column or variable but with 8000 or fewer characters (or an NVARCHAR(MAX) column or variable with 4000 or fewer characters), there will be no issue and everything will work as expected:
DECLARE @Test VARCHAR(MAX) = REPLICATE('t', 5000);
SELECT LEN(@Test) AS [Characters],
       HASHBYTES('MD5', @Test) AS [MD5];
Returns:
5000 0x6ABFBA10B49157F2EF8C85862B6E6313
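The character-versus-byte arithmetic behind this can be sketched in Python (an illustration only: cp1252 stands in for a single-byte SQL Server code page, and UTF-16LE for NVARCHAR's internal encoding):

```python
s = "t" * 5000

# VARCHAR: one byte per character on a single-byte code page -> under the 8,000-byte limit
assert len(s.encode("cp1252")) == 5000

# NVARCHAR: two bytes per character (UTF-16LE) -> the same 5,000 characters exceed 8,000 bytes
assert len(s.encode("utf-16-le")) == 10000

# 4,000 NVARCHAR characters sit exactly at the 8,000-byte boundary
assert len(("t" * 4000).encode("utf-16-le")) == 8000
```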

In SQL Server 2016 the input-length limitation of the HASHBYTES function no longer exists:
DECLARE @Test NVARCHAR(MAX);
SET @Test = REPLICATE(CONVERT(NVARCHAR(MAX), N't'), 50000000);
SELECT LEN(@Test);
SELECT HASHBYTES('SHA2_512', @Test);
HASHBYTES (Transact-SQL)

If you are trying to hash a large varbinary or image value already in SQL Server, there are built-in functions that can do this (possibly from 2014 onwards). This simple function works for both varbinary(max) and older image fields:
/****** Object: UserDefinedFunction [dbo].[MD5Bin] Script Date: 16/07/2018 11:04:26 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- ==================================================
-- Author: Darren Steven
-- Create date: 16/07/2018
-- Description: Hashes a binary or image field with MD5
-- ==================================================
CREATE FUNCTION [dbo].[MD5Bin](@value varbinary(max))
RETURNS varchar(32)
AS
BEGIN
    RETURN SUBSTRING(master.sys.fn_sqlvarbasetostr(master.sys.fn_repl_hash_binary(@value)), 3, 32);
END
GO
Then simply call the function in your select:
SELECT dbo.MD5Bin(imageFieldName) FROM dbo.yourTable
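The function returns varchar(32) because an MD5 digest is 16 bytes and each byte prints as two hex characters. A quick sketch of that arithmetic in Python (the input bytes here are just a placeholder):

```python
import hashlib

# Any input, of any length, yields a 16-byte digest -> 32 hex characters
digest_hex = hashlib.md5(b"placeholder image bytes").hexdigest().upper()
assert len(digest_hex) == 32
```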

The 8,000-byte input length limit of the HASHBYTES (Transact-SQL) function is removed in SQL Server 2016.
Depending on the algorithm, the output sizes are:
128 bits (16 bytes) for MD2, MD4, and MD5;
160 bits (20 bytes) for SHA and SHA1;
256 bits (32 bytes) for SHA2_256
512 bits (64 bytes) for SHA2_512.

MS SQL Server EncryptByKey - String or binary data would be truncated

In theory, varchar(max) and varbinary(max) columns should be capable of storing up to 2 GB of data, but I cannot store a Unicode string 5,000 characters long.
I've looked through other questions on this topic and they all suggest checking column sizes. I've done this and see that all related columns are declared with max size.
The key difference from similar questions is that, when storing, I'm encrypting the data using EncryptByKey, and I think that is the bottleneck I'm looking for. From MSDN I know that the return type of EncryptByKey has a maximum size of 8,000 bytes; it is not clear what the maximum size of the @cleartext argument is, but I suspect it's the same.
The following code gives me the error:
OPEN SYMMETRIC KEY SK1 DECRYPTION BY CERTIFICATE Cert1;
DECLARE @tmp5k AS NVARCHAR(max);
SET @tmp5k = N'...5000 characters...';
SELECT EncryptByKey(Key_GUID('SK1'), @tmp5k);
GO
[22001][8152] String or binary data would be truncated.
How to encrypt and store big strings (around 5k unicode characters)?
So I ran into this issue when using C# and trying to encrypt and insert a long JSON string into SQL Server. What ended up working was converting the plain-text string to binary and then using the same EncryptByKey function to insert that instead.
If you're doing this in just SQL, I think you can use this expression:
CONVERT(VARBINARY(MAX), @tmp5k) AS ToBinary
So using our example:
OPEN SYMMETRIC KEY SK1 DECRYPTION BY CERTIFICATE Cert1;
DECLARE @tmp5k AS NVARCHAR(max);
SET @tmp5k = N'...5000 characters...';
SELECT EncryptByKey(Key_GUID('SK1'), CONVERT(VARBINARY(MAX), @tmp5k));
GO
And here's an example of using SQL to convert the binary back to a string:
CONVERT(VARCHAR(100), CONVERT(VARBINARY(100), @TestString)) AS StringFromBinaryFromString;
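Since NVARCHAR is stored as UTF-16LE internally, the string-to-binary-and-back round trip is lossless. A Python sketch of the same idea (the sample text is arbitrary):

```python
s = "some text to store encrypted"  # arbitrary sample string

# Analogous to CONVERT(VARBINARY(MAX), ...) on an NVARCHAR value
b = s.encode("utf-16-le")

# ...and converting back to NVARCHAR recovers the original string
assert b.decode("utf-16-le") == s
```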

SQL Server Linked Server to PostgreSQL Turkish Character Issue

I have added a PostgreSQL linked server to my SQL Server with help from this blog post. My problem is when I use the query below, I am having problems with Turkish characters.
Query on Microsoft SQL Server 2012:
SELECT *
FROM OpenQuery(CARGO, 'SELECT taxno AS ACCOUNTNUM, title AS NAME FROM view_company');
Actual results:
MUSTAFA ÞAHÝNALP
Expected results:
MUSTAFA ŞAHİNALP
The problem is that the source encoding is 8-bit Extended ASCII using Code Page 1254 -- Windows Latin 5 (Turkish). If you follow that link, you will see the Latin5 chart of characters to values. The value of the Ş character -- "Latin Capital Letter S with Cedilla" -- is 222 (Decimal) / DE (Hex). Your local server (i.e. SQL Server) has a default Collation of SQL_Latin1_General_CP1_CI_AS which is also 8-bit Extended ASCII, but using Code Page 1252 -- Windows Latin 1 (ANSI). If you follow that link, you will see the Latin1 chart that shows the Þ character -- "Latin Capital Letter Thorn" -- also having a value of 222 (Decimal) / DE (Hex). This is why your characters are getting translated in that manner.
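The byte-level mapping described above can be reproduced with Python's codec support, since both code pages are available as cp1254 and cp1252:

```python
# "Latin Capital Letter S with Cedilla" is byte 0xDE in Windows-1254 (Turkish)
assert "Ş".encode("cp1254") == b"\xde"

# Decoding the same byte with the wrong code page yields Thorn instead
assert b"\xde".decode("cp1252") == "Þ"

# Decoding with the correct code page recovers the intended character
assert b"\xde".decode("cp1254") == "Ş"
```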
There are a few things you can try:
Use sp_serveroption to set the following two options:
EXEC sp_serveroption @server=N'linked_server_name',
    @optname='use remote collation',
    @optvalue=N'true';
EXEC sp_serveroption @server=N'linked_server_name',
    @optname='collation name',
    @optvalue=N'Turkish_100_CI_AS';
I'm not sure if that will work with PostgreSQL as the remote system, but it's worth trying at least. Please note that this requires all remote column collations to correspond to this particular value: Turkish / Code Page 1254.
Force the Collation per each column:
SELECT [ACCOUNTNUM], [NAME] COLLATE Turkish_100_CI_AS
FROM OPENQUERY(CARGO, 'SELECT taxno AS ACCOUNTNUM, title AS NAME FROM view_company');
Convert the string values (just the ones with character mapping issues) to VARBINARY and insert into a temporary table where the column is set to the proper Collation:
CREATE TABLE #Temp ([AccountNum] INT, [Name] VARCHAR(100) COLLATE Turkish_100_CI_AS);
INSERT INTO #Temp ([AccountNum], [Name])
SELECT [ACCOUNTNUM], CONVERT(VARBINARY(100), [NAME])
FROM OPENQUERY(CARGO, 'SELECT taxno AS ACCOUNTNUM, title AS NAME FROM view_company');
SELECT * FROM #Temp;
This approach will first convert the incoming characters into their binary / hex representation (e.g. Ş --> 0xDE), and then, upon inserting 0xDE into the VARCHAR column in the temp table, it will translate 0xDE into the expected character of that value for Code Page 1254 (since that is the Collation of that column). The result will be Ş instead of Þ.
UPDATE
Option #1 worked for the O.P.

Why does SQL Server give me a column twice the size I requested?

After executing CREATE TABLE for a temporary table, I was verifying that the size of the field fits what I need to use.
To my surprise, SQL Server (Azure SQL) reports that the column now has double the size. Why is this?
This is what I executed, in order:
CREATE TABLE #A ( Name NVARCHAR(500) not null )
EXEC tempdb..sp_help '#A'
An NVARCHAR column in SQL Server always stores every character with 2 bytes.
So if you're asking for 500 characters (at 2 bytes each), obviously this results in column size of 1000 bytes.
That's been like this in SQL Server forever - this isn't new or Azure specific.
NVARCHAR uses 2 bytes per character, so if the size is 500, it shows the size as 1000. This is because it stores Unicode data.
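The reported length is simply characters times two. In Python terms, using UTF-16LE as a stand-in for NVARCHAR storage:

```python
s = "x" * 500  # 500 characters, as in NVARCHAR(500)
assert len(s.encode("utf-16-le")) == 1000  # sp_help reports the byte length, not characters
```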

Handling more than 8000 chars in stored proc parameter

I have a SQL stored procedure that sends a mail. Its signature looks like this:
CREATE PROCEDURE SendMail
@From varchar(40),
@To varchar(255),
@Subject varchar(255),
@Body varchar(max),
@CC varchar(255) = null,
@BCC varchar(255) = null
AS...
When the message is, for example, 5,000 characters, it works. When it is 12,000, I get the error [ODBC SQL Server Driver]String data, right truncation.
According to the help files varchar(max) can handle 2^31-1 bytes / characters.
So I tried changing @Body varchar(max) to @Body varchar(30000), and I get the error:
The size (30000) given to the type 'varchar' exceeds the maximum allowed for any data type (8000).
So the max is 8000 and not 2^31-1 bytes?
How can I handle more than 8000 characters?
You need to use nvarchar(max), instead of varchar(4000) or varchar(max). This can store up to 2 GB of text, which will solve your problem...
For more information see http://technet.microsoft.com/en-us/library/ms186939.aspx
In-row data cannot be larger than 8,060 bytes (8 KB) due to the SQL Server page size, which is 8 KB; the (max) types store larger values off-row.
varchar has a maximum of 8,000 characters.
nvarchar has a maximum of 4,000 characters (each character takes 2 bytes).
You cannot declare a parameter varchar(30000).
You should use varchar(max) or nvarchar(max):
the former holds about 2^31 characters (approx. 2 billion), the latter about 2^30 (approx. 1 billion).
Also, please note that SQL Server has a stored procedure named sp_send_dbmail that you can use to send emails.
Try using NVARCHAR(MAX) instead of VARCHAR(MAX).
Use a BLOB-style data type (image, or varbinary(max) in newer versions). I use it occasionally for very long fields, but it cannot be compared. I do not believe there is a practical maximum length beyond the 2 GB cap.
Max capacity is 2 GB of space, so you're looking at just over 1 billion 2-byte characters fitting into an NVARCHAR(MAX) field.
Using the other answer's more detailed numbers, you should be able to store
(2^31 - 1) / 2 = 1,073,741,823 double-byte characters
(1 billion, 73 million, 741 thousand and 823 characters, to be precise)
in your NVARCHAR(MAX) column (unfortunately, that last half character is wasted...).
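The arithmetic, checked in Python (2 GB of storage divided by 2 bytes per character):

```python
max_bytes = 2**31 - 1       # 2,147,483,647 bytes available to NVARCHAR(MAX)
max_chars = max_bytes // 2  # two bytes per UTF-16 code unit

assert max_chars == 1_073_741_823
```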
REPLICATE returns the input type irrespective of later assignment. It's annoying, but to avoid silent truncation, try this example:
DECLARE @x VARCHAR(MAX);
SET @x = REPLICATE(CAST('a' AS VARCHAR(MAX)), 10000);
SELECT @x, LEN(@x);
This is because SQL Server performs the REPLICATE operation before it considers what you're assigning it to or how many characters you're trying to expand it to. It only cares about the input expression to determine what it should return, and if the input is not a max type, it assumes it is meant to fit within 8,000 bytes.

SQL Server Text Datatype Maxlength = 65,535?

Software I'm working with uses a text field to store XML. From my searches online, the text datatype is supposed to hold 2^31 - 1 characters. Currently SQL Server is truncating the XML at 65,535 characters every time. I know this is caused by SQL Server, because if I add a 65,536th character to the column directly in Management Studio, it states that it will not update because characters will be truncated.
Is the max length really 65,535 or could this be because the database was designed in an earlier version of SQL Server (2000) and it's using the legacy text datatype instead of 2005's?
If this is the case, will altering the datatype to Text in SQL Server 2005 fix this issue?
That is a limitation of SSMS, not of the text field; but you should use varchar(max), since text is deprecated.
Here is also a quick test
create table TestLen (bla text)
insert TestLen values (replicate(convert(varchar(max),'a'), 100000))
select datalength(bla)
from TestLen
Returns 100000 for me
MSSQL 2000 should allow up to 2^31 - 1 non-Unicode characters in a text field, which is over 2 billion. I don't know what's causing this limitation, but you might want to try varchar(max) or nvarchar(max). These store just as many characters but also allow the regular T-SQL string functions (LEN, SUBSTRING, REPLACE, RTRIM, ...).
If you're able to convert the column, you might as well, since the text data type will be removed in a future version of SQL Server. See here.
The recommendation is to use varchar(MAX) or nvarchar(MAX). In your case, you could also use the XML data type, but that may tie you to certain database engines (if that's a consideration).
You should have a look at XML Support in Microsoft SQL Server 2005 and Beginning SQL Server 2005 XML Programming.
So I would rather try to use the data type appropriate for the use, not make a data type from a previous version fit your use.
Here's a little script I wrote for getting all the data out in chunks:
DECLARE @data NVARCHAR(MAX) = N'huge data';
DECLARE @readSentence NVARCHAR(MAX) = N'';
DECLARE @dataLength INT = (SELECT LEN(@data));
DECLARE @currIndex INT = 0;
WHILE @data <> @readSentence
BEGIN
    DECLARE @temp NVARCHAR(MAX) = N'';
    SET @temp = (SELECT SUBSTRING(@data, @currIndex, 65535));
    SELECT @temp;
    SET @readSentence += @temp;
    SET @currIndex += 65535;
END;
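The same chunking idea, sketched in Python (the 65,535-character slice size mirrors the T-SQL loop above):

```python
def read_in_chunks(data: str, size: int = 65535):
    # Yield successive fixed-size slices of the string, like the WHILE/SUBSTRING loop
    for i in range(0, len(data), size):
        yield data[i:i + size]

text = "a" * 200000
chunks = list(read_in_chunks(text))

assert "".join(chunks) == text  # reassembly recovers the original
assert len(chunks) == 4         # 200,000 / 65,535 rounds up to 4 slices
```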
