Total memory needed to store all Geohash strings - geohashing

Geohash has precision levels ranging from 1 to 12 (e.g., 2p5qnb2). The higher the precision level (the longer the string), the smaller the area the Geohash covers.
What will be the total memory needed to store all Geohash strings?
Each Geohash character is either a digit 0-9 or one of 22 lowercase letters (every letter except "a", "i", "l" and "o"). That means each character fits in 5 bits (2^5 = 32 possibilities: 10 digits plus 22 letters), but let's assume 1 byte per character for simplicity.
Below is my calculation:
For precision level 12 (length 12), we need 12 bytes per string, and there are 32^12 such Geohash strings.
Similarly, for precision level 11, we need 11 bytes per string, and there are 32^11 Geohash strings.
And so on.
So we will have 32^12 + 32^11 + 32^10 + ... + 32 ≈ 1.19 × 10^18 bytes = 1,190 PB, which seems way off to me.

Related

How to sort Hexadecimal Numbers (like 10 1 A B) in C?

I want to implement a sorting algorithm in C for sorting hexadecimal numbers like:
10 1 A B
to this:
1 A B 10
The problem I am facing is that I don't understand how A and B can be less than 10, since A = 10 and B = 11 in hexadecimal. I'm sorry if I am mistaken.
Thank you!
As mentioned in the previous comments, 10 is 0x10, so this sorting seems to be no problem: 0x1 < 0xA < 0xB < 0x10
In any base a number with two digits is always greater than a number with one digit.
In hexadecimal notation we have 6 more digits available than in decimal, but they still count as one "digit":
hexadecimal digit | value in decimal representation
A | 10
B | 11
C | 12
D | 13
E | 14
F | 15
When you get a number in hexadecimal notation, its digits may happen to use none of the extra digits above, only the familiar 0..9. This can be confusing, because we must still treat them as hexadecimal. In particular, each digit in a multi-digit hexadecimal representation must be multiplied by a power of 16 (instead of 10) to be interpreted correctly. So when you get 10 as a hexadecimal number, its value is one (1) times sixteen plus zero (0): the hexadecimal number 10 has a (decimal) value of 16.
The hexadecimal numbers you gave should therefore be ordered as 1 < A < B < 10.
As a more elaborate example, the hexadecimal representation 1D6A can be converted to decimal like this:
1D6A
│││└─> 10 x 16⁰ = 10
││└──> 6 x 16¹ = 96
│└───> 13 x 16² = 3328
└────> 1 x 16³ = 4096
──── +
7530
Likewise
10
│└─> 0 x 16⁰ = 0
└──> 1 x 16¹ = 16
── +
16

Why is there a difference in precision range widths for decimal?

As is evident from the MSDN description of decimal, certain precision ranges have the same number of storage bytes assigned to them.
What I don't understand is why the ranges differ in width. The range 1 to 9 (5 storage bytes) has a width of 9, while the range 10 to 19 (9 storage bytes) has a width of 10. Then the next range (13 storage bytes) has a width of 9 again, while the one after that has a width of 10 again.
Since the storage bytes increase by 4 every time, I would have expected all of the ranges to be the same width. Or maybe the first one would be smaller, to reserve space for the sign or something, but equal in width from then on. But it goes from 9 to 10 to 9 to 10 again.
What's going on here? And if it existed, would 21 storage bytes have a precision range of 39-47, i.e. is the pattern 9-10-9-10-9-10...?
would 21 storage bytes have a precision range of 39-47
No. 2^160 = 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976, which has 49 decimal digits. So this hypothetical scenario would cater for a precision range of 39-48 (a 20-byte integer can hold some 49-digit numbers, but not all of them, since the largest value it can hold is 2^160 - 1).
The first byte is reserved for the sign.
01 is used for positive numbers; 00 for negative.
The remainder stores the value as an integer, e.g. 1.234 would be stored as the integer 1234 (or 1234 multiplied by a power of 10, depending on the declared scale).
The length of the integer is 4, 8, 12 or 16 bytes, depending on the declared precision. Some 10-digit integers can be stored in 4 bytes, but the whole 10-digit range would overflow 4 bytes, so precision 10 has to go to the next step up.
And so on.
2^32 = 4,294,967,296 (10 digits)
2^64 = 18,446,744,073,709,551,616 (20 digits)
2^96 = 79,228,162,514,264,337,593,543,950,336 (29 digits)
2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456 (39 digits)
You need to use DBCC PAGE to see this; casting the column as binary does not give you the storage representation. Alternatively, use a utility such as SQL Server Internals Viewer.
CREATE TABLE T(
A DECIMAL( 9,0),
B DECIMAL(19,0),
C DECIMAL(28,0) ,
D DECIMAL(38,0)
);
INSERT INTO T VALUES
(999999999, 9999999999999999999, 9999999999999999999999999999, 99999999999999999999999999999999999999),
(-999999999, -9999999999999999999, -9999999999999999999999999999, -99999999999999999999999999999999999999);
[DBCC PAGE output showing how the first row and the second row are stored: screenshots not available]
Note that the value after the sign byte is stored byte-reversed (least significant byte first); for example, 0x3B9AC9FF = 999999999.

Integer compression method

How can I compress a row of integers into something shorter?
Like:
Input: '1 2 4 5 3 5 2 3 1 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 5 3 5 2 3 1 2 3 4')
Note: the input will only contain numbers from 1 to 5, and it will contain 10 to 16 numbers in total.
Is there any way I can compress it to 3-5 numbers?
Here is one way. First, subtract one from each of your little numbers. For your example input that results in
0 1 3 4 2 4 1 2 0 1 2 3
Now treat that as the base-5 representation of an integer. (You can choose either most significant digit first or last.) Calculate the number in binary that means the same thing. Now you have a single integer that "compressed" your string of little numbers. Since you have shown no code of your own, I'll just stop here. You should be able to implement this easily.
Since you will have at most 16 little numbers, the resulting value will always be less than 5^16, which is 152,587,890,625. This fits into 38 bits. If you need to store smaller numbers than that, convert the resulting value into another, larger number base, such as 2^16 or 2^32. The former would result in 3 numbers, the latter in 2.
@SergGr points out in a comment that this method does not record how many integers were encoded. If that count is not stored separately, this can be a problem, since the method does not distinguish between leading zeros and encoded zeros. There are several ways to handle that if you need the number of integers included in the compression. You could require the most significant digit to be 1 (whether that is the first or the last depends on where the most significant number is). This increases the number of bits by one, so you may now need 39 bits.
Here is a toy example of variable-length encoding. Assume we want to encode two strings: 1 2 3 and 1 2 3 0 0. How will the results differ? Let's consider the two base-5 numbers 321 and 00321. They represent the same value, but let's still convert them into base 2 while preserving the padding.
1 + 2*5 + 3*5^2 = 86 dec = 1010110 bin
1 + 2*5 + 3*5^2 + 0*5^3 + 0*5^4 = 000001010110 bin
The additional 0s in the second line are there because the biggest 5-digit base-5 number, 44444, has the base-2 representation 110000110100 (12 bits), so the binary representation of the number is padded to that size.
Note that there is no need to pad the first line, because the biggest 3-digit base-5 number, 444, has the base-2 representation 1111100, i.e. the same length. For the initial string 3 2 1, some padding would be required as well, so padding might be needed even if the top digits are not 0.
Now let's add the most significant 1 to the binary representations, and those will be our encoded values:
1 2 3 => 11010110 binary = 214 dec
1 2 3 0 0 => 1000001010110 binary = 4182 dec
There are many ways to decode those values back. One of the simplest (but not the most efficient) is to first calculate the number of base-5 digits by calculating floor(log5(encoded)) and then remove the top bit and fill the digits one by one using mod 5 and divide by 5 operations.
Such variable-length encoding always adds exactly 1 bit of overhead.
It's called polidatacompressor.js, but the license will cost you; you have to ask the author about prices.
https://github.com/polidatacompressor/polidatacompressor
Ncomp(65535) will output 255, 255, and when you store this in a database as bytes, you get 2 chars.
Another way is to use hexadecimal (base 16) in JavaScript: (1231).toString(16) gives you '4cf'; in about 60% of situations this saves one character.
Or convert from base 10 to base 62 with https://github.com/base62/base62.js/:
4131 --> 14D
413131 --> 1Jtp

Number of bits assigned for double data type

How many bits out of 64 are assigned to the integer part and the fractional part in a double? Or is there a rule that specifies it?
Note: I know I already replied with a comment. This is for my own benefit as much as the OPs; I always learn something new when I try to explain it.
Floating-point values (regardless of precision) are represented as follows:
sign × significand × β^exp
where sign is 1 or -1, β is the base, exp is an integer exponent, and significand is a fraction. In this case, β is 2. For example, the real value 3.0 can be represented as 1.10₂ × 2^1, or 0.11₂ × 2^2, or even 0.011₂ × 2^3.
Remember that a binary number is a sum of powers of 2, with powers decreasing from the left. For example, 101₂ is equivalent to 1 × 2^2 + 0 × 2^1 + 1 × 2^0, which gives us the value 5. You can extend that past the radix point by using negative powers of 2, so 101.11₂ is equivalent to
1 × 2^2 + 0 × 2^1 + 1 × 2^0 + 1 × 2^-1 + 1 × 2^-2
which gives us the decimal value 5.75. A floating-point number is normalized such that there's a single non-zero digit prior to the radix point, so instead of writing 5.75 as 101.11₂, we'd write it as 1.0111₂ × 2^2.
How is this encoded in a 32-bit or 64-bit binary format? The exact format depends on the platform; most modern platforms use the IEEE-754 specification (which also specifies the algorithms for floating-point arithmetic, as well as special values such as infinity and Not a Number (NaN)); however, some older platforms may use their own proprietary format (such as VAX G and H extended-precision floats). I think x86 also has a proprietary 80-bit format for intermediate calculations.
The general layout looks something like the following:
seeeeeeee...ffffffff....
where s represents the sign bit, e represents bits devoted to the exponent, and f represents bits devoted to the significand or fraction. The IEEE-754 32-bit single-precision layout is
seeeeeeeefffffffffffffffffffffff
This gives us an 8-bit exponent (which can represent the values -126 through 127) and a 23-bit significand (giving us roughly 6 to 7 significant decimal digits). A 0 in the sign bit represents a positive value, 1 represents negative. The exponent is encoded such that 00000001₂ represents -126, 01111111₂ represents 0, and 11111110₂ represents 127 (00000000₂ is reserved for representing 0 and "denormalized" numbers, while 11111111₂ is reserved for representing infinity and NaN). This format also assumes a hidden leading significand bit that's always set to 1 for normalized numbers. Thus, our value 5.75, which we represent as 1.0111₂ × 2^2, would be encoded in a 32-bit single-precision float as
0 10000001 01110000000000000000000
│ │        └─ significand (1.0111, hidden leading bit)
│ └────────── exponent (2, stored as 2 + 127 = 129)
└──────────── sign (0, positive)
The IEEE-754 double-precision float uses 11 bits for the exponent (-1022 through 1023) and 52 bits for the significand. I'm not going to bother writing that out (this post is turning into a novel as it is).
Floating-point numbers have a greater range than integers because of the exponent; the exponent 127 only takes 8 bits to encode, but 2^127 is a 39-digit decimal number. The more bits in the exponent, the greater the range of values that can be represented. The precision (the number of significant digits) is determined by the number of bits in the significand. The more bits in the significand, the more significant digits you can represent.
Most real values cannot be represented exactly as a floating-point number; you cannot squeeze an infinite number of values into a finite number of bits. Thus, there are gaps between representable floating point values, and most values will be approximations. To illustrate the problem, let's look at an 8-bit "quarter-precision" format:
seeeefff
This gives us an exponent between -7 and 8 (we're not going to worry about special values like infinity and NaN) and a 3-bit significand with a hidden leading bit. The larger our exponent gets, the wider the gap between representable values gets. Here's a table showing the issue. The left column is the significand; each additional column shows the values we can represent for the given exponent:
sig  -1      0      1     2    3   4   5
---  ------  -----  ----  ---  --  --  --
000  0.5     1      2     4    8   16  32
001  0.5625  1.125  2.25  4.5  9   18  36
010  0.625   1.25   2.5   5    10  20  40
011  0.6875  1.375  2.75  5.5  11  22  44
100  0.75    1.5    3     6    12  24  48
101  0.8125  1.625  3.25  6.5  13  26  52
110  0.875   1.75   3.5   7    14  28  56
111  0.9375  1.875  3.75  7.5  15  30  60
Note that as we move towards larger values, the gap between representable values gets larger. We can represent 8 values between 0.5 and 1.0, with a gap of 0.0625 between each. We can represent 8 values between 1.0 and 2.0, with a gap of 0.125 between each. We can represent 8 values between 2.0 and 4.0, with a gap of 0.25 in between each. And so on. Note that we can represent all the positive integers up to 16, but we cannot represent the value 17 in this format; we simply don't have enough bits in the significand to do so. If we add the values 8 and 9 in this format, we'll get 16 as a result, which is a rounding error. If that result is used in any other computation, that rounding error will be compounded.
Note that some values cannot be represented exactly no matter how many bits you have in the significand. Just like 1/3 gives us the non-terminating decimal fraction 0.333333..., 1/10 gives us the non-terminating binary fraction 0.00011001100110011...₂ (normalized: 1.10011001100...₂ × 2^-4). We would need an infinite number of bits in the significand to represent that value.
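A double (IEEE-754 double precision, used on both 32-bit and 64-bit machines) has one sign bit, 11 exponent bits and 52 fraction bits.
Think of its value as (-1)^sign × 1.fraction × 2^(exponent - 1023), where 1.fraction is the hidden leading 1 followed by the 52 fraction bits, and the 11 exponent bits store the exponent plus a bias of 1023 (for normal numbers).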

Converting 8 bits to a scaled 12 bits equivalent

I need to convert an 8 bit number (0 - 255 or #0 - #FF) to its 12 bit equivalent (0 - 4095 or #0 - #FFF)
I don't want to do just a straight conversion of the same number; I want to represent the same scale, but in 12 bits.
For example:
0xFF in 8 bits should convert to 0xFFF in 12 bits
0x0 in 8 bits should convert to 0x0 in 12 bits
0x7F in 8 bits should convert to 0x7FF in 12 bits
0x24 in 8 bits should convert to 0x249 in 12 bits
Are there any specific algorithms or techniques that I should be using?
I am coding in C
Try x << 4 | x >> 4.
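This has been updated by the OP; it was changed from x << 4 + x >> 4, which C parses as (x << (4 + x)) >> 4, because + binds more tightly than the shift operators.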
If you can compute in a larger integer type, this may help:
b = a * ((1 << 12) - 1) / ((1 << 8) - 1)
It is ugly, but it preserves the scaling almost as requested. Of course, you can replace the expressions with constants (4095 and 255).
What about:
x = x ? ((x + 1) << 4) - 1 : 0
I use the linear equation y = mx + c,
assuming the low end of each range is zero (so c = 0).
You can scale your data by a factor of m (multiply to increase the range, divide to decrease it).
Ex.
My ADC data was 12-bit, with an integer range of 0 to 4095.
I want to shrink this data to the range 0 to 255.
m = (y2 - y1) / (x2 - x1)
m = (4095 - 0) / (255 - 0)
m ≈ 16.06 ≈ 16
So data received in 12 bits is divided by 16 to convert it to 8 bits.
This conversion is linear in nature.
Hope this is also a good idea.
