Is it safe to use the first 22 characters of an NFT pubkey as a primary key for a DB - uuid

I am wondering if it is safe to use only the first 22 characters instead of all 44 characters of an NFT's pubkey as the primary key of a MySQL DB. I have a DB with a huge amount of data and could save a lot of space with this approach. For instance, given the following pubkey:
AQoKYV7tYpTrFZN6P5oUufbQKAUr9mNYGe1TTJC9wajM
Would it be safe to use just the first 22 characters:
AQoKYV7tYpTrFZN6P5oUuf
Or would it be safer to use the first 11 chars plus the trailing 11 chars, or does it make no difference?
AQoKYV7tYpTe1TTJC9wajM

A public key is 32 bytes, so those "44 characters" are actually the base-58 representation of those 32 bytes.
If you're only storing 22 characters, let's simplify and say that you're storing 16 bytes out of 32 total. The chance of two given pubkeys sharing the same 16-byte prefix is 1 / 256^16 = 1 / 2^128 ≈ 2.9 × 10^-39, which is extremely unlikely, but possible.
Here's another way to approach the problem -- how about storing the full pubkey as 32 bytes instead of as a string? Then you won't ever lose any precision.
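If you want to go that route, here's a minimal sketch of the idea in Python. It assumes the third-party base58 package (not mentioned above) and a MySQL column such as BINARY(32); treat it as an illustration, not a prescribed implementation:

import base58

pubkey_str = "AQoKYV7tYpTrFZN6P5oUufbQKAUr9mNYGe1TTJC9wajM"

# Decode the base-58 text form into the raw 32 public-key bytes.
raw = base58.b58decode(pubkey_str)
assert len(raw) == 32

# Store `raw` in a BINARY(32) primary-key column; re-encode it for display.
restored = base58.b58encode(raw).decode("ascii")
assert restored == pubkey_str

That way the key stays 32 bytes (smaller than the 44-character string) and no precision is lost.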

Related

Is it possible to determine ENCRYPTBYKEY maximum returned value by the clear text type?

I am going to encrypt several fields in an existing table. Basically, the following encryption technique is going to be used:
CREATE MASTER KEY ENCRYPTION
    BY PASSWORD = 'sm_long_password#'
GO
CREATE CERTIFICATE CERT_01
    WITH SUBJECT = 'CERT_01'
GO
CREATE SYMMETRIC KEY SK_01
    WITH ALGORITHM = AES_256
    ENCRYPTION BY CERTIFICATE CERT_01
GO
OPEN SYMMETRIC KEY SK_01
    DECRYPTION BY CERTIFICATE CERT_01
SELECT ENCRYPTBYKEY(KEY_GUID('SK_01'), 'test')
CLOSE SYMMETRIC KEY SK_01
DROP SYMMETRIC KEY SK_01
DROP CERTIFICATE CERT_01
DROP MASTER KEY
ENCRYPTBYKEY returns varbinary with a maximum size of 8,000 bytes. Knowing the table fields that are going to be encrypted (for example: nvarchar(128), varchar(31), bigint), how can I define the length of the new varbinary columns?
You can see the full specification here
So let's calculate:

16 bytes  key GUID
 4 bytes  header
16 bytes  IV (for AES, a 16-byte block cipher)

Plus then the size of the encrypted message:

 4 bytes  magic number
 2 bytes  integrity bytes length
 0 bytes  integrity bytes (warning: may be wrongly placed in the table)
 2 bytes  (plaintext) message length
 m bytes  (plaintext) message
          CBC padding bytes
The CBC padding bytes should be calculated the following way:
16 - ((m + 4 + 2 + 2) % 16)
as padding is always applied. This will result in a number of padding bytes in the range 1..16. A sneaky shortcut is to just add 16 bytes to the total, but this may mean that you're specifying up to 15 bytes that are never used.
We can shorten this to 36 + 8 + m + 16 - ((m + 8) % 16), or 60 + m - ((m + 8) % 16). Or, if you use the little trick specified above and you don't care about the wasted bytes: 76 + m, where m is the message input size in bytes.
Notes:
beware that the first byte in the header contains the version number of the scheme; this answer does not and cannot specify how many bytes will be added or removed if a different internal message format or encryption scheme is used;
using integrity bytes is highly recommended in case you want to protect your DB fields against change (keeping the amount of money in an account confidential is less important than making sure the amount cannot be changed).
The example on the page assumes single byte encoding for text characters.
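As a sanity check on the arithmetic, here is a small Python sketch that simply encodes the byte layout listed above; it is an estimate derived from this breakdown, not an official SQL Server formula:

def encryptbykey_size(m):
    """Estimated ENCRYPTBYKEY output size (bytes) for an m-byte plaintext,
    following the layout above: key GUID + header + IV + encrypted message."""
    fixed = 16 + 4 + 16                # key GUID, header, IV
    inner = 4 + 2 + 0 + 2 + m          # magic, integrity length, integrity bytes, message length, message
    padding = 16 - (inner % 16)        # CBC padding, always 1..16 bytes
    return fixed + inner + padding

print(encryptbykey_size(10))           # 68, i.e. 60 + 10 - ((10 + 8) % 16)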
Based upon some tests in SQL Server 2008, the following formula seems to work. Note that @ClearText is VARCHAR():
52 + (16 * ((LEN(@ClearText) + 8) / 16))
This is roughly compatible with the answer by Maarten Bodewes, except that my tests showed the DATALENGTH(myBinary) to always be of the form 52 + (z * 16), where z is an integer.
LEN(myVarCharString)   DATALENGTH(encryptedString)
--------------------   -----------------------------------------
0 through 7            usually 52, but occasionally 68 or 84
8 through 23           usually 68, but occasionally 84
24 through 39          usually 84
40 through 50          100
The "myVarCharString" was a table column defined as VARCHAR(50). The table contained 150,000 records. The mention of "occasionally" is an instance of about 1 out of 10,000 records that would get bumped into a higher bucket; very strange. For LEN() of 24 and higher, there were not enough records to get the weird anomaly.
Here is some Perl code that takes a proposed length for "myVarCharString" as input from the terminal and produces an expected size for the EncryptByKey() result. The function "int()" is equivalent to "Math.floor()".
while ($len = <>) {
    print 52 + ( 16 * int( ($len + 8) / 16 ) ), "\n";
}
You might want to use this formula to calculate a size, then add 16 to allow for the anomaly.

How is nvarchar(n) stored in SQL Server?

Does it occupy a fixed N*2 bytes, or may it use less storage if the actual value stored is smaller than N*2 bytes?
I have a huge table with many nvarchar fields of fixed declared lengths. Some are nvarchar(100) and some are nvarchar(400), etc.
The data in a column is never an exact size; it varies from 0 to N, and most of the data is less than N/2.
For example, a field called RecipientName is of type nvarchar(400) and there are 9026424 rows.
The size of RecipientName alone would then be 800 * 9026424 bytes = 6.72 GB,
but the actual storage size of the entire table is only 2.02 GB. Is there any compression applied, or is some smaller power-of-2 size (less than N) chosen?
NCHAR data type:
It is a fixed-length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NChar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength In Bytes], LEN(@Name) As [Length];
Name     Datalength   Length
Sachin   40           6
Even though only 6 characters are stored, the data length column shows 40 bytes because NCHAR reserves 2 bytes for each of the 20 declared characters.
These 40 bytes are used irrespective of the actual length of the data stored.
NVARCHAR data type:
It is a variable-length data type.
It occupies 2 bytes of space for EACH CHARACTER.
It is used to store Unicode characters (e.g. other languages like Spanish, French, Arabic, German, etc.)
For Example:
Declare @Name NVarchar(20);
Set @Name = N'Sachin'
Select @Name As Name, DATALENGTH(@Name) As [Datalength], LEN(@Name) As [Length];
Name     Datalength   Length
Sachin   12           6
Even though the declared size is 20, the data length column shows only 12 bytes because NVARCHAR uses 2 bytes for each character actually stored (6 characters here).
These 12 bytes depend on the actual data, not on the declared length.
Hope this is helpful :)
Yes, it may use less storage if the actual value to be stored is smaller than N*2 bytes.
n just specifies the maximum number of characters that can be stored in the field; the storage used corresponds to the actual number of characters you store.
And here is the documentation: nchar and nvarchar (Transact-SQL)
For non-MAX, non-XML string types, the length that they are declared as (i.e. the value within the parenthesis) is the maximum number of smallest (in terms of bytes) characters that will be allowed. But, the actual limit isn't calculated in terms of characters but in terms of bytes. CHAR and VARCHAR characters can be 1 or 2 bytes, so the smallest is 1 and hence a [VAR]CHAR(100) has a limit of 100 bytes. That 100 bytes can be filled up by 100 single-byte characters, or 50 double-byte characters, or any combination that does not exceed 100 bytes. NCHAR and NVARCHAR (stored as UTF-16 Little Endian) characters can be either 2 or 4 bytes, so the smallest is 2 and hence a N[VAR]CHAR(100) has a limit of 200 bytes. That 200 bytes can be filled up by 100 two-byte characters or 50 four-byte characters, or any combination that does not exceed 200 bytes.
If you enable ROW or PAGE compression (this is a per-index setting), then the actual space used will usually be less. NCHAR and NVARCHAR use the Unicode Compression Algorithm, which is somewhat complex, so it is not easy to calculate what the result would be. And I believe that the MAX types don't allow for compression.
Outside of those technicalities, the difference between the VAR and non-VAR types is simply that the VAR types take up only the space of each individual value inserted or updated, while the non-VAR types are blank-padded and always take up the declared amount of space (which is why one almost always uses the VAR types). The MAX types are only variable (i.e. there is no CHAR(MAX) or NCHAR(MAX)).
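To make the byte-based limit described above concrete, here is a small illustrative Python check (a sketch of the rule only, not how SQL Server itself validates input): an N[VAR]CHAR(n) value must fit in n * 2 bytes of UTF-16 LE data.

def fits_nvarchar(value, n):
    """True if `value` fits in an N[VAR]CHAR(n) column: the limit is n * 2
    bytes of UTF-16 LE, so 4-byte supplementary characters count double."""
    return len(value.encode("utf-16-le")) <= n * 2

print(fits_nvarchar("Sachin", 20))              # True: 12 bytes <= 40
print(fits_nvarchar("\U0001F4A1" * 100, 100))   # False: 400 bytes > 200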

What are some good ways to compress data across time?

I have an array of objects with time and value properties. It looks something like this.
UPDATE: dataset with epoch times rather than time strings
[{datetime:1383661634, value: 43},{datetime:1383661856, value: 40}, {datetime:1383662133, value: 23}, {datetime:1383662944, value: 23}]
The array is far larger than this, possibly hundreds of thousands of entries. I intend to build a graph to represent this array. For obvious reasons, I cannot use every bit of the data to build this graph (value vs time), so I need to normalize it across time.
So here's the main problem: there is no regular pattern to the timestamps of these objects, so I need to dynamically choose slots of time in which I either average out the values or show counts of objects in that slot.
How can I calculate slots that are user friendly, i.e. per minute, hour, eight hours, day, or so? I am looking at a maximum of 25 slots computed from the array, which I then show on the graph.
I hope this helps get my point through.
You can convert the date/time into epoch and use numpy.histogram to get the ranges:
import random, numpy
l = [ random.randint(0, 1000) for x in range(1000) ]
num_items_bins, bin_ranges = numpy.histogram(l, 25)
print num_items_bins
print bin_ranges
Gives:
[34 38 42 41 43 50 34 29 37 46 31 47 43 29 30 42 38 52 42 44 42 42 51 34 39]
[ 1. 40.96 80.92 120.88 160.84 200.8 240.76 280.72
320.68 360.64 400.6 440.56 480.52 520.48 560.44 600.4
640.36 680.32 720.28 760.24 800.2 840.16 880.12 920.08
960.04 1000. ]
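The question also asks for the average value per slot, not just counts; a sketch extending the same numpy approach to the question's data could look like this (variable names here are illustrative, not from the answer above):

import numpy as np

data = [{"datetime": 1383661634, "value": 43}, {"datetime": 1383661856, "value": 40},
        {"datetime": 1383662133, "value": 23}, {"datetime": 1383662944, "value": 23}]

times = np.array([d["datetime"] for d in data], dtype=float)
values = np.array([d["value"] for d in data], dtype=float)

counts, edges = np.histogram(times, bins=25)               # counts per slot
sums, _ = np.histogram(times, bins=edges, weights=values)  # value sums per slot

with np.errstate(invalid="ignore"):
    means = sums / counts    # average value per slot; empty slots become NaN

print(edges)   # 26 slot boundaries spanning the time range
print(means)   # mean value in each of the 25 slots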
Hard to say without knowing the nature of your values; compressing values for display is a matter of what you can afford to discard and what you can't. Some ideas though:
histogram
candlestick chart
Is this JSON, with the DateTimes transmitted as text?
Why not transmit the date as a long (Int64) and use a method to convert to/from DateTime? Depending on your language, you could use these implementations:
DateTime to Long in C#
Date to long using Unix timestamp in Java
That alone would save you a considerable amount of space, since strings are 16 bits per character while the long timestamp would be just 64 bits.
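As a rough illustration of that size difference (a Python sketch, since the linked answers are C# and Java), take the first timestamp from the question:

from datetime import datetime, timezone
import struct

dt = datetime(2013, 11, 5, 14, 27, 14, tzinfo=timezone.utc)

as_text = dt.isoformat()           # '2013-11-05T14:27:14+00:00'
as_long = int(dt.timestamp())      # 1383661634, the question's first epoch value

print(len(as_text.encode("utf-16-le")))   # 50 bytes as a 16-bit-per-char string
print(len(struct.pack("<q", as_long)))    # 8 bytes as a signed 64-bit integer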

What is the maximum characters for the NVARCHAR(MAX)? [duplicate]

This question already has answers here:
What is the maximum number of characters that nvarchar(MAX) will hold?
I have declared a column of type NVARCHAR(MAX) in SQL Server 2008; what exactly is the maximum number of characters it can hold with MAX as the length?
The max size for a column of type NVARCHAR(MAX) is 2 GB of storage.
Since NVARCHAR uses 2 bytes per character, that's approx. 1 billion characters.
Leo Tolstoy's War and Peace is a 1,440-page book containing about 600,000 words, so that might be 6 million characters, well rounded up. So you could stick about 166 copies of the entire War and Peace book into each NVARCHAR(MAX) column.
Is that enough space for your needs? :-)
By default, nvarchar(MAX) values are stored exactly the same as nvarchar(4000) values would be, unless the actual length exceeds 4000 characters; in that case, the in-row data is replaced by a pointer to one or more separate pages where the data is stored.
If you anticipate data possibly exceeding 4000 characters, nvarchar(MAX) is definitely the recommended choice.
Source: https://social.msdn.microsoft.com/Forums/en-US/databasedesign/thread/d5e0c6e5-8e44-4ad5-9591-20dc0ac7a870/
From MSDN Documentation
nvarchar [ ( n | max ) ]
Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB).
The storage size, in bytes, is two times the actual length of data entered + 2 bytes
I think actually nvarchar(MAX) can store approximately 1,070,000,000 chars.

Does a char occupy 1 byte in a database?

Does a char occupy 1 byte in a database?
EDIT:
If I define a column as varchar(1), will it reserve 1 or 2 bytes for me?
CHAR(k) takes k bytes no matter what the value is;
VARCHAR(k) takes n+1 bytes, where n is the number of characters in the value, up to a maximum of k+1 bytes.
Value        CHAR(4)    Storage Required    VARCHAR(4)    Storage Required
''           '    '     4 bytes             ''            1 byte
'ab'         'ab  '     4 bytes             'ab'          3 bytes
'abcd'       'abcd'     4 bytes             'abcd'        5 bytes
'abcdefgh'   'abcd'     4 bytes             'abcd'        5 bytes
http://dev.mysql.com/doc/refman/5.1/en/char.html
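The same arithmetic as the MySQL table above, as a small Python sketch (assuming a single-byte character set and k <= 255, so VARCHAR needs a one-byte length prefix):

def mysql_storage(value, k):
    """Return (CHAR(k) bytes, VARCHAR(k) bytes) for `value`, mirroring the
    table above: CHAR always pads to k bytes, VARCHAR stores the (possibly
    truncated) value plus a 1-byte length prefix."""
    stored = value[:k]                  # values longer than k are truncated
    return k, len(stored) + 1

for v in ["", "ab", "abcd", "abcdefgh"]:
    print(repr(v), mysql_storage(v, 4))
# ''         (4, 1)
# 'ab'       (4, 3)
# 'abcd'     (4, 5)
# 'abcdefgh' (4, 5)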
It depends on what kind of char it is. If the type is char/varchar, then 1 byte; if it is a Unicode type (nchar/nvarchar), then most probably 2 bytes.
It depends on the RDBMS, and how you define the column. You certainly could define one that only requires one byte of storage space [in SQL Server, it'd be CHAR(1)]. Overhead for row headers, null bitmasks, possibly index uniquifiers, and lots of other cruft can complicate things, but yeah, you should be able to create a column that's one byte wide.
Yes, if you specify the length of the char field as one, and the database is using a codepage based character mapping so that each character is represented as one byte.
If the database, for example, is set up to use UTF-8 for storing characters, each character will take anything from one to four bytes depending on what character it is.
However, the char data type is rather old, and some databases may actually store a char(1) field the same way as a varchar(1) field. In that case the field will also need a length, so it will take up at least one or two bytes depending on whether what you store in the field is a space (which will be stored as an empty string), maybe more depending on the database.
