Is it possible to determine ENCRYPTBYKEY maximum returned value by the clear text type? - sql-server

I am going to encrypt several fields in an existing table. Basically, the following encryption technique is going to be used:
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'sm_long_password#'
GO
CREATE CERTIFICATE CERT_01 WITH SUBJECT = 'CERT_01'
GO
CREATE SYMMETRIC KEY SK_01 WITH ALGORITHM = AES_256 ENCRYPTION BY CERTIFICATE CERT_01
GO
OPEN SYMMETRIC KEY SK_01 DECRYPTION BY CERTIFICATE CERT_01
SELECT ENCRYPTBYKEY(KEY_GUID('SK_01'), 'test')
CLOSE SYMMETRIC KEY SK_01
DROP SYMMETRIC KEY SK_01
DROP CERTIFICATE CERT_01
DROP MASTER KEY
ENCRYPTBYKEY returns varbinary with a maximum size of 8,000 bytes. Knowing the types of the table fields to be encrypted (for example: nvarchar(128), varchar(31), bigint), how can I define the length of the new varbinary columns?

You can see the full specification here
So let's calculate:
16 bytes key GUID
 4 bytes header
16 bytes IV (for AES, a 16-byte block cipher)
Plus the size of the encrypted message:
 4 bytes magic number
 2 bytes integrity bytes length
 0 bytes integrity bytes (warning: may be wrongly placed in the table)
 2 bytes (plaintext) message length
 m bytes (plaintext) message
CBC padding bytes
The CBC padding bytes should be calculated the following way:
16 - ((m + 4 + 2 + 2) % 16)
as padding is always applied. This will result in a number of padding bytes in the range 1..16. A sneaky shortcut is to just add 16 bytes to the total, but this may mean that you're specifying up to 15 bytes that are never used.
We can shorten this to 36 + 8 + m + 16 - ((m + 8) % 16), or 60 + m - ((m + 8) % 16). Or, if you use the little trick specified above and you don't care about the wasted bytes: 60 + m, where m is the size of the message input in bytes.
Notes:
beware that the first byte in the header contains the version number of the scheme; this answer does not and cannot specify how many bytes will be added or removed if a different internal message format or encryption scheme is used;
using integrity bytes is highly recommended in case you want to protect your DB fields against change (keeping the amount of money in an account confidential is less important than making sure the amount cannot be changed).
The example on the page assumes a single-byte encoding for text characters; for nvarchar columns, count 2 bytes per character when computing m.
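For the column types in the question, a quick sanity check of the formula (a sketch; it assumes UTF-16 storage for nvarchar, so m is twice the character count, and the derived table alias is made up):
-- m = plaintext size in bytes; expected ciphertext size = 60 + m - ((m + 8) % 16)
-- nvarchar(128): m = 256 -> 60 + 256 - (264 % 16) = 308 -> varbinary(308)
-- varchar(31):   m = 31  -> 60 + 31  - (39 % 16)  = 84  -> varbinary(84)
-- bigint:        m = 8   -> 60 + 8   - (16 % 16)  = 68  -> varbinary(68)
SELECT m, 60 + m - ((m + 8) % 16) AS expected_ciphertext_bytes
FROM (VALUES (256), (31), (8)) AS t(m);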

Based upon some tests in SQL Server 2008, the following formula seems to work. Note that @ClearText is VARCHAR:
52 + (16 * ((LEN(@ClearText) + 8) / 16))
This is roughly compatible with the answer by Maarten Bodewes, except that my tests showed DATALENGTH(myBinary) to always be of the form 52 + (z * 16), where z is an integer.
LEN(myVarCharString)   DATALENGTH(encryptedString)
--------------------   ---------------------------
 0 through  7          usually 52, but occasionally 68 or 84
 8 through 23          usually 68, but occasionally 84
24 through 39          usually 84
40 through 50          100
The "myVarCharString" was a table column defined as VARCHAR(50). The table contained 150,000 records. The mention of "occasionally" is an instance of about 1 out of 10,000 records that would get bumped into a higher bucket; very strange. For LEN() of 24 and higher, there were not enough records to get the weird anomaly.
Here is some Perl code that takes a proposed length for "myVarCharString" as input from the terminal and produces an expected size for the EncryptByKey() result. The function "int()" is equivalent to "Math.floor()".
while ($len = <>) {
    print 52 + (16 * int(($len + 8) / 16)), "\n";
}
You might want to use this formula to calculate a size, then add 16 to allow for the anomaly.
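In T-SQL the same sizing calculation with that 16-byte headroom might look like this (a sketch; integer division does the flooring, and 50 is just the example column length from above):
-- expected ciphertext size for up to 50 characters, plus 16 bytes for the anomaly
DECLARE @MaxLen int = 50;
SELECT 52 + (16 * ((@MaxLen + 8) / 16)) + 16 AS suggested_varbinary_length; -- 116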

Related

Is it safe to use the first 22 characters of a NFT pubkey as a primary key for a DB

I am wondering if it is safe to use only the first 22 characters, instead of the full 44 characters, of an NFT pubkey as a primary key in a MySQL DB. I have a DB with huge amounts of data and could save a lot of space with this approach. For instance, given the following pubkey:
AQoKYV7tYpTrFZN6P5oUufbQKAUr9mNYGe1TTJC9wajM
Would it be safe to use the first 22 characters:
AQoKYV7tYpTrFZN6P5oUuf
Or would it be safer to use the first 11 characters plus the trailing 11 characters, or does it make no difference?
AQoKYV7tYpTe1TTJC9wajM
A public key is 32 bytes, so those "44 characters" are actually the base-58 representation of those 32 bytes.
If you're only storing 22 characters, let's simplify things and say that you're storing 16 bytes out of 32 total. The chance of two given pubkeys sharing the same 16-byte sequence is 1 / 256^16 = 1 / 2^128 ≈ 2.9 * 10^-39, which is very unlikely, but possible. (Across N stored keys, the birthday bound puts the chance of any two colliding near N^2 / 2^129, still negligible for realistic table sizes.)
Here's another way to approach the problem -- how about storing the full pubkey as 32 bytes instead of as a string? Then you won't ever lose any precision.
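A minimal sketch of that idea (the table and column names are made up, and the base-58 decoding is assumed to happen in the application):
-- store the decoded 32 raw bytes instead of the 44-character base-58 string
CREATE TABLE nft_account (
    pubkey BINARY(32) NOT NULL PRIMARY KEY
);
-- the application decodes base-58 to 32 bytes before inserting; dummy value shown here
INSERT INTO nft_account (pubkey)
VALUES (X'00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF');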

How is nvarchar(n) stored in SQL Server?

Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value is smaller than N*2 bytes?
I have a huge table with many fields of nvarchar type. Some are nvarchar(100) and some are nvarchar(400), etc.
The data in a column is never an exact size; it varies from 0 to N, and most of it is less than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9,026,424 rows.
The size of RecipientName alone would be 800 * 9,026,424 = 6.72 GB,
but the actual storage size of the entire table is only 2.02 GB. Is there compression applied, or is some size smaller than N chosen?
NCHAR data type:
It is a fixed-length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For example:
Declare @Name NChar(20);
Set @Name = N'Sachin';
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name     Datalength In Bytes   Length
Sachin   40                    6
Even though the declared size is 20, the Datalength column shows 40 bytes because NCHAR uses 2 bytes for each character and is blank-padded to the declared length.
Those 40 bytes are used irrespective of the actual length of the data stored.
NVARCHAR data type:
It is a variable-length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin';
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name     Datalength   Length
Sachin   12           6
Even though the declared size is 20, the Datalength column shows 12 bytes because NVARCHAR uses 2 bytes for each character actually stored (6 characters * 2 bytes = 12).
Those 12 bytes depend only on the data, irrespective of the length in the declaration.
Hope this is helpful :)
Yes,
it may use less storage if the actual value to be stored is smaller than N*2 bytes
n just shows the maximum number of characters that can be stored in this field; the storage used corresponds to the actual number of characters you pass in.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
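A quick illustration of the byte-based limit (a sketch; the emoji is assembled from its UTF-16 surrogate pair so it also works on non-SC collations):
-- 50 four-byte characters (UTF-16 surrogate pairs) exactly fill an nvarchar(100)
DECLARE @s nvarchar(100) = REPLICATE(NCHAR(0xD83D) + NCHAR(0xDE00), 50);
SELECT DATALENGTH(@s) AS bytes, -- 200: the real limit of nvarchar(100)
       LEN(@s) AS chars;        -- 100 code units (50 under an _SC collation)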
If you enable ROW or DATA Compression (this is a per-Index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm which is somewhat complex so not easy to calculate what it would be. And I believe that the MAX types don't allow for compression.
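If you want to check what ROW compression would actually save on a given table, SQL Server ships an estimator; the schema and table names below are placeholders:
EXEC sys.sp_estimate_data_compression_savings
    @schema_name = N'dbo',
    @object_name = N'MyHugeTable',
    @index_id = NULL,
    @partition_number = NULL,
    @data_compression = N'ROW';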
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).

SQL Server encryptbycert capping off

My problem is that I'm trying to encrypt a column in a SQL Server database because of policies at my workplace. I have access only to simple methods of encrypting (TDE seems out of my reach), so I've tried using EncryptByCert and EncryptByKey. I was doing fine, since the documentation shows the cap at 8,000 bytes, which is enough for the data we're saving.
It just so happens that when I try to save anything, it caps off at around 200 characters, generating a 514-byte varbinary. The 514-byte varbinary will encrypt and decrypt fine, but it will not grow or shrink: a single character produces the same 514-byte binary as a 200-character string. Past roughly 230 characters to encrypt, it just leaves the column null.
Does anyone know what's happening with that?
Encryption performed by these methods is done in chunks, with the maximum chunk size being the key length minus some internal overhead (117 bytes for 1024-bit keys, and 245 bytes for the 2048-bit keys first introduced in SQL Server 2016).
If your input is any larger than that, you have to split it into chunks, encrypt them one at a time, then concatenate the results.
Decryption, of course, should be performed accordingly. An important difference between the two is that the encryption chunk size is smaller than the key size, while for decryption each chunk is exactly the key size. That's because any input, however short, is encrypted into a key-sized chunk, so that no guesses about the input length can be made by looking at the output.
Here is an excerpt from my encryption function (written for the 2012 version, so 1024-bit keys are assumed):
create function [dbo].[security_EncryptByCert]
(
    @ClearText varbinary(max)
)
returns varbinary(max) with schemabinding, returns null on null input as begin
    -- Returned value
    declare @Ret varbinary(max) = 0x,
        -- Length of the cleartext
        @Lng int = datalength(@ClearText),
        -- Starting offset of the chunk to encrypt
        @i int = 1,
        -- Chunk size; currently it can't be more than 117 bytes
        @Size int = 100,
        -- Certificate to encrypt data with
        @CertId int;
    -- Determine the certificate with which to perform encryption
    select @CertId = Id from ...
    -- Iterate chunk by chunk until the end of the text
    while @i < @Lng begin
        set @Ret += encryptbycert(@CertId, substring(@ClearText, @i, @Size));
        -- Move the pointer to the next block
        set @i += @Size;
    end;
    return @Ret;
end;
In this case, I used 100-byte chunks, not the largest possible ones. I don't really remember why, but you can use 245 bytes as the limit on 2016.
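For completeness, a matching decryption sketch (untested; it assumes a 1024-bit certificate named MyCert, so every ciphertext chunk is exactly 128 bytes):
create function [dbo].[security_DecryptByCert]
(
    @CipherText varbinary(max)
)
returns varbinary(max) with schemabinding, returns null on null input as begin
    declare @Ret varbinary(max) = 0x,
        @Lng int = datalength(@CipherText),
        @i int = 1,
        -- With a 1024-bit key, every encrypted chunk is exactly 128 bytes
        @Size int = 128;
    -- Decrypt chunk by chunk and reassemble the cleartext
    while @i < @Lng begin
        set @Ret += decryptbycert(cert_id('MyCert'), substring(@CipherText, @i, @Size));
        set @i += @Size;
    end;
    return @Ret;
end;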

Netezza: ERROR: 65536 : Record size limit exceeded

Can someone please explain the behavior below?
KAP.ADMIN(ADMIN)=> create table char1 ( a char(64000),b char(1516));
CREATE TABLE
KAP.ADMIN(ADMIN)=> create table char2 ( a char(64000),b char(1517));
ERROR: 65536 : Record size limit exceeded
KAP.ADMIN(ADMIN)=> insert into char1 select * from char1;
ERROR: 65540 : Record size limit exceeded
Why does this error occur during insert, if create table does not throw any error for the same table, as shown above?
KAP.ADMIN(ADMIN)=> \d char1
Table "CHAR1"
Attribute | Type | Modifier | Default Value
-----------+------------------+----------+---------------
A | CHARACTER(64000) | |
B | CHARACTER(1516) | |
Distributed on hash: "A"
./nz_ddl_table KAP char1
Creating table: "CHAR1"
CREATE TABLE CHAR1
(
A character(64000),
B character(1516)
)
DISTRIBUTE ON (A)
;
/*
Number of columns 2
(Variable) Data Size 4 - 65520
Row Overhead 28
====================== =============
Total Row Size (bytes) 32 - 65548
*/
I would like to know how the row size is calculated in the above case.
I checked the Netezza database user guide, but could not understand the calculation in the above example.
I think this link does a good job of explaining the overhead of Netezza / PDA data types:
For every row of every table, there is a 24-byte fixed overhead of the rowid, createxid, and deletexid. If you have any nullable columns, a null vector is required and it is N/8 bytes where N is the number of columns in the record.
The system rounds up the size of
this header to a multiple of 4 bytes.
In addition, the system adds a record header of 4 bytes if any of the following is true:
Column of type VARCHAR
Column of type CHAR where the length is greater than 16 (stored internally as VARCHAR)
Column of type NCHAR
Column of type NVARCHAR
Using UTF-8 encoding, each Unicode code point can require 1 - 4 bytes of storage. A 10-character string requires 10 bytes of storage if it is ASCII and up to 20 bytes if it is Latin, or as many as 40 bytes if it is Kanji.
The only time a record does not contain a header is if all the columns are defined as NOT NULL, there are no character data types larger than 16 bytes, and no variable character data types.
https://www.ibm.com/support/knowledgecenter/SSULQD_7.2.1/com.ibm.nz.dbu.doc/c_dbuser_data_types_calculate_row_size.html
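Applying those rules to char1 above (a sketch of the arithmetic): 24 bytes of fixed overhead plus a null vector for the 2 nullable columns, rounded up to a multiple of 4, gives the "Row Overhead 28" shown in the nz_ddl_table output. Both columns are CHARs longer than 16, so they are stored internally as VARCHARs and pick up the 4-byte record header: 64000 + 1516 + 4 = 65520 bytes of maximum data size. Worst case that is 28 + 65520 = 65548 bytes per row, over the 65,535-byte record limit, which is why the insert fails even though the create (apparently checked against a looser bound) succeeded.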
First create a temp table based on one row of data.
create temp table tmptable as
select *
from Table
limit 1
Then check the used bytes of the temp table. That should be the size per row.
select used_bytes
from _v_sys_object_storage_size a
inner join _v_table b
    on a.tblid = b.objid
   and b.tablename = 'tmptable'
Netezza has some limitations:
1) Maximum number of characters in a char/varchar field: 64,000
2) Maximum row size: 65,535 bytes
A record length beyond 65,535 bytes is simply impossible in Netezza.
Although a Netezza box offers huge space, it is a much better idea to size columns from accurate space forecasting rather than padding them arbitrarily. In your case, do all the attributes really require char(64000), or can they be compacted based on analysis of the real data? If further compacting is possible, revisit the attribute lengths.
Also, with such requirements, avoid insert into char1 select * ... statements, because they let the system choose its preferred data types, which tend toward the larger sizes and might not be necessary.

Calculate storage space used by sql server sql_variant data type to store fixed length data types

I'm trying to calculate the storage space used by sql_variant to store fixed length data types.
For my test I created a table with two columns:
Key int identity(1,1) primary key
Value sql_variant
I added one row with Value 1 of type int, and I used DBCC PAGE to check the size of the row, which turned out to be 21 bytes.
Using Estimate the Size of a Clustered Index I have:
Null_bitmap = 3
Fixed_Data_Size = 4 (Key column int)
Variable_Data_Size = 2 + 2 + 4 (Value column with an int)
Row_Size = 4 + 8 + 3 + 4 = 19 bytes
Why does the row take 21 bytes? What am I missing in my calculation?
I tried the same analysis with a table using an int column instead of the sql_variant, and the used byte count reported by DBCC PAGE is 15, which matches my calculation:
Null_bitmap = 3
Fixed_Data_Size = 8 (Key column int, Value column int)
Variable_Data_Size = 0
Row_Size = 4 + 8 + 3 = 15 bytes
The extra space is the sql_variant metadata information. From the BOL:
http://msdn.microsoft.com/en-us/library/ms173829.aspx
Each instance of a sql_variant column records the data value and the metadata information. This includes the base data type, maximum size, scale, precision, and collation.
For compatibility with other data types, the catalog objects, such as the DATALENGTH function, that report the length of sql_variant objects report the length of the data. The length of the metadata that is contained in a sql_variant object is not returned.
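You can inspect that metadata (though not its size) with SQL_VARIANT_PROPERTY; a small sketch:
DECLARE @v sql_variant = CONVERT(int, 1);
SELECT DATALENGTH(@v) AS data_bytes,                       -- 4: length of the int value only
       SQL_VARIANT_PROPERTY(@v, 'BaseType') AS base_type,  -- int
       SQL_VARIANT_PROPERTY(@v, 'MaxLength') AS max_bytes; -- 4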
You missed part 7:
7. Calculate the number of rows per page (8096 free bytes per page):
Rows_Per_Page = 8096 / (Row_Size + 2)
Because rows do not span pages, the number of rows per page should be
rounded down to the nearest whole row. The value 2 in the formula is
for the row's entry in the slot array of the page.
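As a worked example with the 19-byte row computed above: Rows_Per_Page = 8096 / (19 + 2) = 385 rows per page, rounded down.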
