My problem is that I'm trying to encrypt a column in a SQL Server database because of policies of my work place. I have access only to simple methods for encrypting (TDE seems out of my possibilities) so I've tried using EncryptByCert or EncryptByKey. I was doing fine since the documentation shows the cap at 8000 which is enough for the data we're saving.
It just so happens that when I try to save anything, it caps off at around 200 characters, generating a 514-byte varbinary. That 514-byte varbinary encrypts and decrypts fine, but its size never grows or shrinks: a single character produces the same 514 bytes as a 200-character string. Once the text I want to encrypt goes past roughly 230 characters, the column is simply left null.
Does anyone know what's happening with that?
Encryption performed by these methods is done in chunks, where the maximum chunk size is the key length minus some internal overhead (117 bytes for 1024-bit keys, and 245 bytes for the 2048-bit keys first introduced in SQL Server 2016).
If your input is any larger than that, you have to split it into chunks, encrypt them one at a time, and then concatenate the results.
Decryption, of course, should be performed accordingly. However, an important difference between the two is that the encryption chunk size must be smaller than the key, while the decryption chunk size must be exactly the key size. That's because any data, however short, is encrypted into a chunk as long as the key, so that no guesses about the input length can be made by looking at the output.
Here is an excerpt from my encryption function (written for 2012 version, so 1024 bit keys are assumed):
create function [dbo].[security_EncryptByCert]
(
    @ClearText varbinary(max)
)
returns varbinary(max) with schemabinding, returns null on null input as begin
    -- Returned value
    declare @Ret varbinary(max) = 0x,
    -- Length of the cleartext
    @Lng int = datalength(@ClearText),
    -- Starting offset of the chunk to encrypt
    @i int = 1,
    -- Chunk size, currently it can't be more than 117 bytes
    @Size int = 100,
    -- Certificate to encrypt data with
    @CertId int;
    -- Determine the certificate with which to perform encryption
    select @CertId = Id from ...
    -- Iterate chunk by chunk until the end of the text
    -- (<= so that a final chunk starting at the last byte is not skipped)
    while @i <= @Lng begin
        set @Ret += encryptbycert(@CertId, substring(@ClearText, @i, @Size));
        -- Move the pointer to the next block
        set @i += @Size;
    end;
    return @Ret;
end;
In this case, I used 100 byte chunks, not the largest possible ones. Don't really remember why, but you can use 245 bytes as a limit on 2016.
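To get a feel for how large the concatenated result becomes, the chunk arithmetic above can be sketched in Python. This is only a sizing sketch, not part of the T-SQL function: the names `OVERHEAD_BYTES`, `max_chunk_size` and `chunked_cipher_length` are made up for illustration, and the 11-byte overhead is simply inferred from the figures quoted in this answer (128 - 117 and 256 - 245). It also assumes, as stated above, that every chunk encrypts to exactly one key-length block.

```python
# Overhead inferred from the figures in the answer: 128 - 117 = 256 - 245 = 11 bytes.
OVERHEAD_BYTES = 11


def max_chunk_size(key_bits):
    """Largest cleartext chunk (in bytes) that fits into one encrypted block."""
    return key_bits // 8 - OVERHEAD_BYTES


def chunked_cipher_length(plain_len, key_bits=1024, chunk_size=None):
    """Total ciphertext length when the input is encrypted chunk by chunk.

    Assumes each chunk, however short, becomes exactly key_bits / 8 bytes.
    """
    key_bytes = key_bits // 8
    limit = max_chunk_size(key_bits)
    if chunk_size is None:
        chunk_size = limit
    if not 0 < chunk_size <= limit:
        raise ValueError("chunk size must be between 1 and %d bytes" % limit)
    chunks = -(-plain_len // chunk_size)  # ceiling division
    return chunks * key_bytes


if __name__ == "__main__":
    # 230 bytes of input, 100-byte chunks, 1024-bit key: 3 chunks of 128 bytes.
    print(chunked_cipher_length(230, key_bits=1024, chunk_size=100))
```

The same arithmetic tells you how to slice the ciphertext for decryption: cut it into pieces of exactly `key_bits // 8` bytes and decrypt each piece separately.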
I am going to encrypt several fields in an existing table. Basically, the following encryption technique is going to be used:
CREATE MASTER KEY ENCRYPTION
BY PASSWORD = 'sm_long_password#'
GO
CREATE CERTIFICATE CERT_01
WITH SUBJECT = 'CERT_01'
GO
CREATE SYMMETRIC KEY SK_01
WITH ALGORITHM = AES_256 ENCRYPTION
BY CERTIFICATE CERT_01
GO
OPEN SYMMETRIC KEY SK_01 DECRYPTION
BY CERTIFICATE CERT_01
SELECT ENCRYPTBYKEY(KEY_GUID('SK_01'), 'test')
CLOSE SYMMETRIC KEY SK_01
DROP SYMMETRIC KEY SK_01
DROP CERTIFICATE CERT_01
DROP MASTER KEY
ENCRYPTBYKEY returns a varbinary with a maximum size of 8,000 bytes. Knowing the types of the table fields that are going to be encrypted (for example: nvarchar(128), varchar(31), bigint), how can I determine the length of the new varbinary columns?
You can see the full specification here
So let's calculate:
16 byte key UID
_4 bytes header
16 byte IV (for AES, a 16 byte block cipher)
Plus then the size of the encrypted message:
_4 byte magic number
_2 bytes integrity bytes length
_0 bytes integrity bytes (warning: may be wrongly placed in the table)
_2 bytes (plaintext) message length
_m bytes (plaintext) message
CBC padding bytes
The CBC padding bytes should be calculated the following way:
16 - ((m + 4 + 2 + 2) % 16)
as padding is always applied. This will result in a number of padding bytes in the range 1..16. A sneaky shortcut is to just add 16 bytes to the total, but this may mean that you're specifying up to 15 bytes that are never used.
We can shorten this to 36 + 8 + m + 16 - ((m + 8) % 16) or 60 + m - ((m + 8) % 16). Or, if you use the little trick specified above and you don't care about the wasted bytes: 60 + m (that is, 36 + 8 + m + 16), where m is the length of the message input in bytes.
Notes:
beware that the first byte in the header contains the version number of the scheme; this answer does not and cannot specify how many bytes will be added or removed if a different internal message format or encryption scheme is used;
using integrity bytes is highly recommended in case you want to protect your DB fields against change (keeping the amount of money in an account confidential is less important than making sure the amount cannot be changed).
The example on the page assumes single byte encoding for text characters.
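As a rough illustration, the layout above can be turned into a small Python calculator. This is a sketch under the assumptions of this answer only (AES, no integrity bytes, version 1 of the message format); the function name `encrypted_size` and the constants are made up for the example. The input is the plaintext length in bytes, so an nvarchar(128) value counts as up to 256 bytes, and a bigint as 8 bytes once converted to binary.

```python
HEADER_BYTES = 16 + 4 + 16     # key GUID + header + IV
FRAMING_BYTES = 4 + 2 + 0 + 2  # magic number + integrity length + integrity bytes + message length
BLOCK_BYTES = 16               # AES block size


def encrypted_size(message_bytes):
    """Expected ENCRYPTBYKEY output size for a plaintext of message_bytes bytes."""
    inner = FRAMING_BYTES + message_bytes
    # Padding is always applied, so it is between 1 and 16 bytes.
    padding = BLOCK_BYTES - inner % BLOCK_BYTES
    return HEADER_BYTES + inner + padding


if __name__ == "__main__":
    # Worst-case plaintext sizes for the columns mentioned in the question.
    for name, size in [("nvarchar(128)", 256), ("varchar(31)", 31), ("bigint", 8)]:
        print(name, "->", "varbinary(%d)" % encrypted_size(size))
```

If integrity bytes (an authenticator) are used, the sizes will be larger than this sketch predicts, so measure with DATALENGTH before fixing the column widths.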
Based upon some tests in SQL Server 2008, the following formula seems to work. Note that @ClearText is VARCHAR():
52 + (16 * ( ((LEN(@ClearText) + 8)/ 16) ) )
This is roughly compatible with the answer by Maarten Bodewes, except that my tests showed the DATALENGTH(myBinary) to always be of the form 52 + (z * 16), where z is an integer.
LEN(myVarCharString)   DATALENGTH(encryptedString)
--------------------   -----------------------------------------
0 through 7            usually 52, but occasionally 68 or 84
8 through 23           usually 68, but occasionally 84
24 through 39          usually 84
40 through 50          100
The "myVarCharString" was a table column defined as VARCHAR(50). The table contained 150,000 records. The mention of "occasionally" is an instance of about 1 out of 10,000 records that would get bumped into a higher bucket; very strange. For LEN() of 24 and higher, there were not enough records to get the weird anomaly.
Here is some Perl code that takes a proposed length for "myVarCharString" as input from the terminal and produces an expected size for the EncryptByKey() result. The function "int()" is equivalent to "Math.floor()".
while($len = <>) {
print 52 + ( 16 * int( ($len+8) / 16 ) ),"\n";
}
You might want to use this formula to calculate a size, then add 16 to allow for the anomaly.
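For what it's worth, the tested formula and the layout-based formula from the other answer, 60 + m - ((m + 8) % 16), are algebraically the same thing, since 16 * floor((m + 8) / 16) equals (m + 8) - ((m + 8) % 16). The following Python sketch checks that over the whole 0..8000 range; the function names are made up for illustration, and m is assumed to be the plaintext length in bytes (which equals LEN() only for single-byte VARCHAR data).

```python
def size_from_tests(m):
    """Formula derived from the SQL Server 2008 measurements."""
    return 52 + 16 * ((m + 8) // 16)


def size_from_layout(m):
    """Formula derived from the documented message layout."""
    return 60 + m - ((m + 8) % 16)


def size_with_allowance(m):
    """Tested formula plus 16 bytes to allow for the occasional anomaly."""
    return size_from_tests(m) + 16


if __name__ == "__main__":
    assert all(size_from_tests(m) == size_from_layout(m) for m in range(8001))
    # Bucket boundaries for a VARCHAR(50) column, matching the table above.
    for m in (0, 7, 8, 23, 24, 39, 40, 50):
        print(m, size_from_tests(m), size_with_allowance(m))
```

So the two answers agree on the usual size; only the occasional larger values in the measurements remain unexplained by either formula.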
What is the maximum character string length / bytes that can be sent to SQL Server using RODBC's function sqlQuery()? I've been using sqlQuery() primarily for updating and inserting records in tables by sending multiple statements in one string, batches. For example, this string has 3 query statements and has a string length / byte size of 146 using nchar() when it's not broken up by newlines and spaces.
update_queries = "UPDATE tbl SET col1 = newval1 WHERE col2 = val1;
UPDATE tbl SET col1 = newval2 WHERE col2 = val2;
UPDATE tbl SET col1 = newval3 WHERE col2 = val3;"
I can send it off with sqlQuery(db_conn, update_queries), which brings me back to the question I posted. I've run into the concept of Network Packet Size in the SQL Server (64-bit) documentation. It states that the maximum batch size is 65,536 * Network Packet Size, where the default packet size is 4 KB. I assume 65,536 is in bytes, so byte-wise it's 65,536 * 4,000 = 262,144,000 bytes. Would 262,144,000 then be the maximum length a string containing valid queries could be? Can someone please clarify whether I have the right idea here, or is there another SQL Server or ODBC concept that I need to know about? Thanks
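As a side note on the arithmetic only: if the 4 KB default packet size is taken as 4,096 bytes rather than 4,000, the product comes out slightly larger. A minimal Python sketch, with made-up names, that says nothing about any additional limits RODBC or the ODBC driver may impose:

```python
PACKETS_PER_BATCH = 65536           # multiplier quoted from the documentation
DEFAULT_PACKET_SIZE_BYTES = 4096    # 4 KB taken as 4 * 1024 bytes


def max_batch_bytes(packet_size_bytes=DEFAULT_PACKET_SIZE_BYTES):
    """Maximum batch size implied by 65,536 * network packet size."""
    return PACKETS_PER_BATCH * packet_size_bytes


if __name__ == "__main__":
    print(max_batch_bytes())                 # with the default packet size
    print(max_batch_bytes() // (1024 ** 2))  # the same value in MiB
```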
Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value to be stored is smaller than N*2 bytes?
I have a huge table with many fields of fixed nvarchar type. Some are nvarchar(100) and some are nvarchar(400) etc.
The data in a column is never an exact size; it varies from 0 to N. Most of the data is shorter than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9026424 rows.
The size of RecipientName alone would be 800*9026424 = 6.72 GB,
but the actual storage size of the entire table is only 2.02 GB. Is there any compression applied, or is some size smaller than N (a power of 2) chosen?
NCHAR data type:
It is a fixed length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NChar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name     Datalength In Bytes   Length
Sachin   40                    6
Even though only 6 characters are stored, the data length column shows 40 bytes, because the value is padded to the declared size of 20 and each character uses 2 bytes.
These 40 bytes are used irrespective of the actual length of the data stored.
NVARCHAR data type:
It is a variable length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name     Datalength   Length
Sachin   12           6
Even though the declared size is 20, the data length column shows only 12 bytes, because just the 6 characters actually stored take space, at 2 bytes each.
These 12 bytes are used irrespective of the length given in the declaration.
Hope this is helpful :)
Yes, it may use less storage if the actual value to be stored is smaller than N*2 bytes.
n only specifies the maximum number of characters that can be stored in the field; the number of characters actually stored equals the number of characters you pass in.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
If you enable ROW or DATA Compression (this is a per-Index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm which is somewhat complex so not easy to calculate what it would be. And I believe that the MAX types don't allow for compression.
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).
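The byte counts discussed in this thread can be reproduced outside SQL Server with a short Python sketch, since NCHAR and NVARCHAR data is UTF-16 Little Endian. The function names are made up for illustration, and the sketch only counts the data bytes of a value (what DATALENGTH reports), ignoring per-row overhead and any compression.

```python
def nvarchar_data_bytes(value):
    """Bytes used by the characters of an NVARCHAR value (UTF-16 LE)."""
    return len(value.encode("utf-16-le"))


def nchar_data_bytes(declared_length):
    """Bytes used by an NCHAR(n) value: always padded to n two-byte units."""
    return 2 * declared_length


def worst_case_column_bytes(declared_length, rows):
    """Upper bound for an NVARCHAR(n) column if every row were completely full."""
    return 2 * declared_length * rows


if __name__ == "__main__":
    print(nvarchar_data_bytes("Sachin"))      # variable length: 6 characters
    print(nchar_data_bytes(20))               # fixed length: NCHAR(20)
    print(nvarchar_data_bytes("\U0001F600"))  # a supplementary character
    print(worst_case_column_bytes(400, 9026424) / 2 ** 30)  # estimate from the question, in GiB
```

The last figure is only the upper bound from the question; because most values are far shorter than 400 characters, the real table is much smaller even without compression.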
If I declare a column as nvarchar(max), I understand that this will allocate 2Gb of space, but does it actually take the 2Gb, of disk space, straight away once I save the changes to the table? Or, is it that it makes note that this column will allow 2Gb of data to be populated in the column?
As I understand it, space isn't allocated until it is needed.
Try the following queries:
CREATE TABLE SizeTest (
MyID int primary key
)
INSERT INTO SizeTest SELECT 1
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
UNION SELECT 5
EXEC sp_spaceused 'SizeTest'
ALTER TABLE SizeTest ADD MyBigText nvarchar(max)
EXEC sp_spaceused 'SizeTest'
UPDATE SizeTest SET MyBigText = 'This is big text' WHERE MyID = 1
EXEC sp_spaceused 'SizeTest'
DROP TABLE SizeTest
By executing these statements, you should get the following for all three sp_spaceused calls:
name      rows  reserved  data  index_size  unused
SizeTest  5     16 KB     8 KB  8 KB        0 KB
At no point is the 2GB allocated.
The maximum storage size for VARCHAR(MAX) is 2^31-1 bytes (2,147,483,647 bytes or 2GB - 1 bytes). The storage size is the actual length of data entered + 2 bytes. The data entered can be 0 characters in length. Since each character in a VARCHAR data type uses one byte, the maximum length for a VARCHAR(MAX) data type is 2,147,483,645.
The maximum storage size for NVARCHAR(MAX) is also 2^31-1 bytes (2,147,483,647 bytes or 2GB - 1 bytes). The storage size, in bytes, is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length. Since each Unicode character in an NVARCHAR data type uses two bytes, the maximum length for an NVARCHAR(MAX) data type is 1,073,741,822.
The maximum storage size for VARBINARY(MAX) is the same as the maximum storage size for VARCHAR(MAX) and NVARCHAR(MAX), which is 2^31-1 (2,147,483,647 bytes or 2GB - 1 bytes). The storage size is the actual length of the data entered + 2 bytes. The data that is entered can be 0 bytes in length.
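The maximum lengths quoted above follow directly from the 2^31-1 byte limit and the 2-byte overhead. A small Python sketch of that arithmetic, with made-up constant names and assuming one byte per VARCHAR character and two bytes per NVARCHAR character as stated above:

```python
MAX_STORAGE_BYTES = 2 ** 31 - 1  # 2,147,483,647 bytes (2 GB - 1 byte)
LENGTH_OVERHEAD_BYTES = 2        # the "+ 2 bytes" mentioned above

# One byte per character.
VARCHAR_MAX_CHARS = MAX_STORAGE_BYTES - LENGTH_OVERHEAD_BYTES

# Two bytes per character, rounded down to a whole character.
NVARCHAR_MAX_CHARS = (MAX_STORAGE_BYTES - LENGTH_OVERHEAD_BYTES) // 2

if __name__ == "__main__":
    print(VARCHAR_MAX_CHARS)
    print(NVARCHAR_MAX_CHARS)
```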
It will allow 2 GB of data :) As far as I know, the column can hold up to 2 GB, but that space is not allocated straight away.
I am integrating between 4 data sources:
InternalDeviceRepository
ExternalDeviceRepository
NightlyDeviceDeltas
MidDayDeviceDeltas
Changes flow into the InternalDeviceRepository from the other three sources.
All sources eventually are transformed to have the definition of
FIELDS
=============
IdentityField
Contract
ContractLevel
StartDate
EndDate
ContractStatus
Location
IdentityField is the PrimaryKey, Contract Key is a secondary Key only if a match exists, otherwise a new record needs to be created.
Currently I compare all the fields in a WHERE clause in SQL Statements and also in a number of places in SSIS packages. This creates some unclean looking SQL and SSIS packages.
I've been mulling computing a hash of ContractLevel, StartDate, EndDate, ContractStatus, and Location and adding that to each of the input tables. This would allow me to use a single value for comparison, instead of 5 separate ones each time.
I've never done this before, nor have I seen it done. Is there a reason that it should be used, or is that a cleaner way to do it?
It is a valid approach. Consider introducing a computed column that holds the hash and putting an index on it.
You may use either CHECKSUM function or write your own hash function like this:
CREATE FUNCTION dbo.GetMyLongHash(@data VARBINARY(MAX))
RETURNS VARBINARY(MAX)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    DECLARE @res VARBINARY(MAX) = 0x
    DECLARE @position INT = 1, @len INT = DATALENGTH(@data)
    -- HASHBYTES accepts at most 8000 bytes, so hash the input in 8000-byte slices
    WHILE 1 = 1
    BEGIN
        SET @res = @res + HASHBYTES('MD5', SUBSTRING(@data, @position, 8000))
        SET @position = @position + 8000
        IF @position > @len
            BREAK
    END
    -- Several slices give several 16-byte hashes: hash them again down to one
    WHILE DATALENGTH(@res) > 16 SET @res = dbo.GetMyLongHash(@res)
    RETURN @res
END
which will give you a 16-byte value. You may take all 16 bytes as a Guid, or only the first 8 bytes as a bigint, and compare that.
Adapt the function as you see fit: for example, make it accept a string as its parameter, or even all of your fields, instead of a varbinary.
BUT
be careful with string casing and datetime formats
if using CHECKSUM, also compare the other fields, because checksum produces duplicates
avoid using a 4-byte hash result on a relatively big table
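The caveats above about casing and date formats can be illustrated with a small Python sketch of the same idea outside the database. It is not the T-SQL function above: the names `row_hash`, `normalize` and `FIELD_SEPARATOR` are made up, the normalization rules (trim, upper-case, ISO dates) are assumptions you would adapt to your own data, and MD5 is used only to mirror the 16-byte result discussed here.

```python
import hashlib
from datetime import date

# Separator that should never occur in the data, so that
# ("ab", "c") and ("a", "bc") do not produce the same payload.
FIELD_SEPARATOR = "\x1f"


def normalize(value):
    """Bring a field into one canonical text form before hashing."""
    if value is None:
        return "\x00NULL"
    if isinstance(value, date):
        return value.isoformat()       # one fixed date format
    return str(value).strip().upper()  # ignore casing and stray spaces


def row_hash(contract_level, start_date, end_date, contract_status, location):
    """16-byte MD5 hash over the five compared fields."""
    fields = (contract_level, start_date, end_date, contract_status, location)
    payload = FIELD_SEPARATOR.join(normalize(f) for f in fields)
    return hashlib.md5(payload.encode("utf-8")).digest()


if __name__ == "__main__":
    a = row_hash("Gold", date(2020, 1, 1), date(2021, 1, 1), "Active", "NYC")
    b = row_hash("gold", date(2020, 1, 1), date(2021, 1, 1), "active ", "nyc")
    print(a == b)  # same record after normalization
```

Whatever normalization you choose has to be applied identically in every source, otherwise equal records will hash differently.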