Is it possible to calculate an MD5 hash with HASHBYTES for an nvarchar(max) column in SQL Server?

I have a table with a column of data type nvarchar(max), and the column will hold more than 8000 characters.
mytext nvarchar(max)
I want to calculate a hash of that column. I am using the following code in MS SQL Server 2008/R2:
select HASHBYTES('md5',column_name)
But I get the error:
String or binary data would be truncated.
Is it possible to calculate a hash of an nvarchar(max) field in a SQL query, or is there another way to do it?
Thanks in advance.

As was mentioned, the input to HASHBYTES is limited to 8000 bytes (a limit that applies to SQL Server 2014 and earlier).
Try:
select master.sys.fn_repl_hash_binary(cast(column_name as varbinary(max)))
For this operation you have to disable FIPS validated cryptographic algorithms:
http://blog.aggregatedintelligence.com/2007/10/fips-validated-cryptographic-algorithms.html
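If the undocumented function is not an option, one possible workaround is to hash the value in pieces that each fit under the 8000-byte limit. The following is only a minimal sketch under stated assumptions: the variable names are hypothetical, and the result is a rolling hash (each step hashes the previous hash plus the next chunk), so it is NOT the standard MD5 of the whole value. It is only useful for comparing values that were hashed the same way.

```sql
-- Hypothetical sample value; in practice this would come from the nvarchar(max) column.
DECLARE @text  nvarchar(max)  = REPLICATE(CAST(N'x' AS nvarchar(max)), 10000);
DECLARE @bytes varbinary(max) = CAST(@text AS varbinary(max));
DECLARE @pos   bigint         = 1;
DECLARE @hash  varbinary(16)  = 0x;   -- running hash, starts empty

WHILE @pos <= DATALENGTH(@bytes)
BEGIN
    -- 16 bytes of previous hash + 7984 bytes of data = at most 8000 bytes of input
    SET @hash = HASHBYTES('MD5',
        @hash + CAST(SUBSTRING(@bytes, @pos, 7984) AS varbinary(7984)));
    SET @pos += 7984;
END;

SELECT @hash AS rolling_md5;   -- not equal to MD5 of the full value
```

Wrapped in a scalar function, the same loop could be applied per row; whether that performs acceptably on a large table would need to be measured.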

Related

Using varchar(8000) column value in HashByte md5 function

I want to create a hash over all the column values of a row in SQL. One of the table's columns is varchar(8000). I used HASHBYTES like this:
HASHBYTES('MD5',column1+column2+....) -- column1 is varchar(8000) and contains a string of length 8000.
It gives me the same hash for rows that have the same value in column1, even if the other columns contain different data.
When I converted column1 to varchar(max) inside HASHBYTES, I got a different hash for each row.
HASHBYTES('MD5',convert(varchar(max),column1)+column2+....)
Why doesn't HASHBYTES('MD5'...) take all the column values into account?
Another way to see the varchar(8000) issue is to calculate the length:
create a table with a varchar(8000) column and calculate the length of all the column values concatenated. It will give you only 8000. Convert the varchar(8000) to varchar(max) and it gives the correct result.
len(column1+column2...) --> 8000
len(convert(varchar(max),column1)+column2...) --> actual length
Is concatenating any string to a varchar(8000) always such an issue?
You're under the misconception that a varchar(8000) concatenated to a varchar(8000) (or even any other length <= 8000) results in a varchar(MAX). This is not true. To get a MAX length you must define at least one of the values in the expression as a MAX.
This is confirmed in the remarks in + (String Concatenation) (Transact-SQL):
Remarks
...
If the result of the concatenation of strings exceeds the limit of 8,000 bytes, the result is truncated. However, if at least one of the strings concatenated is a large value type, truncation does not occur.
As a result you need to convert one of the values first to MAX and then the rest would implicitly be cast to a MAX as well. If you don't explicitly convert (at least) one of the expressions, then the value will be truncated, as the documentation states.
Obviously this applies to nvarchar as well, where truncation occurs at 4,000 characters (which is still 8,000 bytes).
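The truncation described above can be reproduced without a table. This is a small sketch with hypothetical variable names; the hash comparison stays within the 8000-byte HASHBYTES limit because both concatenations are silently cut back to 8000 bytes.

```sql
DECLARE @a varchar(8000) = REPLICATE('a', 8000);
DECLARE @b varchar(8000) = REPLICATE('b', 100);

SELECT
    LEN(@a + @b)                        AS len_truncated,  -- 8000: result type is varchar(8000)
    LEN(CONVERT(varchar(max), @a) + @b) AS len_full;       -- 8100: result type is varchar(max)

-- Both inputs are truncated to the same 8000 characters, so the hashes match.
SELECT CASE
           WHEN HASHBYTES('MD5', @a + @b) = HASHBYTES('MD5', @a + 'zzz')
           THEN 'same hash'
           ELSE 'different hash'
       END AS hash_comparison;                             -- 'same hash'
```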

MS SQL Server EncryptByKey - String or binary data would be truncated

In theory varchar(max) and varbinary(max) columns should be capable of storing up to 2GB of data, but I cannot store a Unicode string 5000 characters long.
I've looked through other questions on this topic and they all suggest checking column sizes. I've done this and see that all related columns are declared with max size.
The key difference from similar questions is that I'm encrypting the data with EncryptByKey when storing it, and I think that is the bottleneck I'm looking for. From MSDN I know that the return type of EncryptByKey has a max size of 8000 bytes. It is not clear what the max size of the @cleartext argument is, but I suspect it's the same.
The following code gives me an error:
OPEN SYMMETRIC KEY SK1 DECRYPTION BY CERTIFICATE Cert1;
DECLARE @tmp5k AS NVARCHAR(max);
SET @tmp5k = N'...5000 characters...';
SELECT EncryptByKey(Key_GUID('SK1'), @tmp5k);
GO
[22001][8152] String or binary data would be truncated.
How can I encrypt and store big strings (around 5k Unicode characters)?
I ran into this issue when using C# and trying to encrypt and insert a long JSON string into SQL. What ended up working was converting the plain-text string to binary and then passing that to the same SQL EncryptByKey function instead.
If you're doing this in just SQL, I think you can use this conversion:
CONVERT(VARBINARY(MAX), @tmp5k) AS ToBinary
So using our example:
OPEN SYMMETRIC KEY SK1 DECRYPTION BY CERTIFICATE Cert1;
DECLARE @tmp5k AS NVARCHAR(max);
SET @tmp5k = N'...5000 characters...';
SELECT EncryptByKey(Key_GUID('SK1'), CONVERT(VARBINARY(MAX), @tmp5k));
GO
And here's an example of using SQL to convert the binary back to a string:
CONVERT(VARCHAR(100), CONVERT(VARBINARY(100), @TestString)) AS StringFromBinaryFromString ;

SQL Server - trying to convert column to XML fails

I'm in the process of importing data from a legacy MySQL database into SQL Server 2005.
I have one table in particular that's causing me grief. I've imported it from MySQL using a linked server and the MySQL ODBC driver, and I end up with this:
Col Name          Datatype  MaxLen
OrderItem_ID      bigint    8
PDM_Structure_ID  int       4
LastModifiedDate  datetime  8
LastModifiedUser  varchar   20
CreationDate      datetime  8
CreationUser      varchar   20
XMLData           text      -1
OrderHeader_ID    bigint    8
Contract_Action   varchar   10
ContractItem      int       4
My main focus is on the XMLData column - I need to clean it up and make it so that I can convert it to an XML datatype to use XQuery on it.
So I set the table option "large value types out of row" to 1:
EXEC sp_tableoption 'OrderItem', 'large value types out of row', 1
and then I go ahead and convert XMLData to VARCHAR(MAX) and do some cleanup of the XML stored in that field. All fine so far.
But when I now try to convert that column to XML datatype:
ALTER TABLE dbo.OrderItem
ALTER COLUMN XMLData XML
I get this message here:
Msg 511, Level 16, State 1, Line 1
Cannot create a row of size 8077 which is greater than the allowable maximum row size of 8060.
The statement has been terminated.
which is rather surprising, seeing that the columns besides the XMLData only make up roughly 90 bytes, and I specifically instructed SQL Server to store all "large data" off-row....
So why on earth does SQL Server refuse to convert that column to XML data??? Any ideas?? Thoughts?? Things I can check / change in my approach??
Update: I don't know what changed, but on a second attempt to import the raw data from MySQL into SQL Server, I was successfully able to convert that NTEXT -> VARCHAR(MAX) column to XML in the end..... odd..... anyhoo - works now - thanks guys for all your input and recommendations! Highly appreciated !
If you have sufficient storage space, you could try selecting from the VARCHAR(MAX) version of the table into a new table with the same schema but with XMLData set up as XML - either using SELECT INTO or by explicitly creating the table before you begin.
PS - it's a side issue unrelated to your problem, but you might want to check that you're not losing Unicode characters in the original MySQL XMLData field by this conversion since the text/varchar data types won't support them.
Can you ADD a new column of type xml?
If so, add the new xml column, update the table to set the new column equal to the XmlData column and then drop the XmlData column.
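The add/update/drop sequence could look roughly like the sketch below. It assumes XMLData has already been converted to VARCHAR(MAX) and cleaned up, as described in the question. The column name XMLDataTyped is hypothetical, and the final rename is an optional extra step that is not part of the suggestion above.

```sql
-- 1. Add a new, nullable xml column (hypothetical name).
ALTER TABLE dbo.OrderItem ADD XMLDataTyped xml NULL;
GO

-- 2. Copy the cleaned-up text into it; this fails if any row is not well-formed XML.
UPDATE dbo.OrderItem
SET XMLDataTyped = CONVERT(xml, XMLData);
GO

-- 3. Drop the old column.
ALTER TABLE dbo.OrderItem DROP COLUMN XMLData;
GO

-- 4. Optional: give the new column the original name.
EXEC sp_rename 'dbo.OrderItem.XMLDataTyped', 'XMLData', 'COLUMN';
GO
```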
Edit
I have a table "TestTable" with a "nvarchar(max)" column.
select * from sys.tables where name = 'TestTable'
This gives a result containing:
[lob_data_space_id]  [text_in_row_limit]  [large_value_types_out_of_row]
1                    0                    0
yet I can happily save 500k characters in my nvarchar(max) field.
What do you get if you query sys.tables for your OrderItem table?
If your [text_in_row_limit] is not zero, try this, which should convert any existing in-row strings into BLOBs:
exec sp_tableoption 'OrderItem', 'text in row', 0
and then try to switch from nvarchar(max) to xml.
From BOL,
Disabling the text in row option or reducing the limit of the option will require the conversion of all BLOBs; therefore, the process can be long, depending on the number of BLOB strings that must be converted. The table is locked during the conversion process.

Sql Server XML-type column duplicate entry detection

In Sql Server I am using an XML type column to store a message. I do not want to store duplicate messages.
I only will have a few messages per user. I am currently querying the table for these messages, converting the XML to string in my C# code. I then compare the strings with what I am about to insert.
Unfortunately, Sql Server pretty-prints the data in the XML typed fields. What you store into the database is not necessarily exactly the same string as what you get back out later. It is functionally equivalent, but may have white space removed, etc.
Is there an efficient way to compare an XML string that I am considering inserting with those that are already in the database? As an aside, if I detect a duplicate I need to delete the older message then insert the replacement.
0 - Add a hash column to your table.
1 - When you receive a new message, convert the whole XML to uppercase, remove all blanks and carriage returns/line feeds, then compute the hash of the normalized string.
2 - Check whether you already have a row with the resulting hash.
If yes, it is a duplicate; treat it accordingly.
If not, store the original XML along with the hash in a new row.
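Steps 1 and 2 could also be done on the SQL side. The following is only a sketch under assumptions: the table dbo.Messages and its msg_hash column are hypothetical, MD5 is an arbitrary choice of algorithm, and on SQL Server 2014 and earlier the normalized string must stay within the 8000-byte HASHBYTES input limit.

```sql
DECLARE @newMessage xml = '<message>You like apples</message>';

-- Step 1: uppercase, then strip spaces, tabs, carriage returns and line feeds.
DECLARE @normalized nvarchar(max) =
    REPLACE(REPLACE(REPLACE(REPLACE(
        UPPER(CONVERT(nvarchar(max), @newMessage)),
        N' ', N''), NCHAR(9), N''), NCHAR(13), N''), NCHAR(10), N'');

DECLARE @hash varbinary(16) = HASHBYTES('MD5', @normalized);

-- Step 2: look for an existing row with the same hash (hypothetical table and column).
SELECT CASE
           WHEN EXISTS (SELECT 1 FROM dbo.Messages WHERE msg_hash = @hash)
           THEN 'duplicate'
           ELSE 'new'
       END AS message_status;
```

Note that removing every blank also removes spaces inside text nodes, so two messages that differ only in internal spacing would be treated as duplicates.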
I'm not 100% sure on your exact implementation but here is something I played around with. The idea being a stored procedure would do the inserting. Inserting into the messages table does a basic check on existing messages (SQL 2008 syntax):
declare @messages table (msg xml)
insert into @messages values
('<message>You like oranges</message>')
,('<message>You like apples</message>')
declare @newMessage xml = '<message>You like apples</message>'
insert into @messages (msg)
select @newMessage
where @newMessage.value('(message)[1]', 'nvarchar(50)') not in (
select msg.value('(message)[1]', 'nvarchar(50)')
from @messages
)
One solution is to stop using the XML typed field. Store the XML string into a varchar typed field.
I don't really like this solution, but I don't really like p.marino's solution either. It doesn't seem right to store a hash of something that is already in the row in the table.
What if you use OPENXML on each row in the table and query the actual XML information for key nodes and/or key attributes? But then you need to do it row by row, I don't think OPENXML works with a whole set of table rows.

Length of varbinary(max) filestream on SQL Server 2008

Is there an efficient way to get the length of the data in a "varbinary(max) filestream" column?
I have only found samples that convert to varchar and then call the "LEN" function.
SELECT
length = DATALENGTH(Name),
Name
FROM
Production.Product
ORDER BY
Name
"Returns the number of bytes used to represent any expression."
T-SQL and quote taken from MSDN's DATALENGTH (Transact-SQL) library.
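The MSDN sample above uses a character column, but DATALENGTH accepts binary values directly, so no conversion to varchar is needed. A quick check with a hypothetical variable standing in for the varbinary(max) column:

```sql
-- Build a 100000-byte binary value; the cast to varchar(max) avoids the 8000-byte REPLICATE cap.
DECLARE @blob varbinary(max) =
    CAST(REPLICATE(CAST('x' AS varchar(max)), 100000) AS varbinary(max));

SELECT DATALENGTH(@blob) AS length_in_bytes;   -- 100000
```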
