Can big-endian order be associated with the way Englishmen say the numbers under 100 and little-endian with the way Germans say those numbers?

I was thinking that, for me and most people around me, big-endian byte order in memory seems the most natural way of arranging numbers.
You start with the most significant byte, just like you write numbers down and just like you spell them, e.g. twenty-eight.
The most significant digit is written first, and then you continue with the remaining digits from the next most significant to the least significant. This is the same way you say the numbers.
But German speakers say this number in reverse (achtundzwanzig, "eight-and-twenty"): they begin with the least significant digit and then continue with the most significant digit.
I think this is a good analogy to endianness.

"I was thinking that for me... Big-endian order of the bytes in memory seems the most natural way of arranging numbers... You start with the most significant bytes, just like you write the numbers down"
Actually, all binary data (zero/one bits) is written MSB-first. We always write the value starting with the MSD (Most Significant Digit) on the left side, just like in real life.
However, having 8 slots within a byte to fill, we write the value itself starting from the right side and grow it by shifting to the left. PS: Endianness only applies at the multi-byte level.
In summary: in a single byte (holding a value under 100, like 28 or even 99),
the value 28 is written as 28 (but since it's binary format, it looks like 11100).
To write the value we start at the right side: x x x 1 1 1 0 0 (where the leftmost 1 is the MSD).
So the value itself is written MSB-style, but placed within the byte in an LSB style of writing.
There is no concept of endianness within a single-byte value.
Example: imagine bits were slots for holding 0-9 digits...
We would still write 28 as [0 0 0 0 0 0 2 8], so the twenties part is placed like MSB but the whole value starts from the right as if written in LSB style.
Since a single byte does not have endianness, writing the value 28 is never going to look like [0 0 0 0 0 0 8 2] and never like [2 8 0 0 0 0 0 0], since those would give an incorrect 82 or an incorrect 28 million.
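The multi-byte part is easy to check for yourself. A minimal C sketch (the value 0x11223344 is just an arbitrary example): inspect the bytes of a 4-byte integer through a char pointer and see what order they sit in.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t value = 0x11223344;                    /* a 4-byte value */
    unsigned char *bytes = (unsigned char *)&value;
    for (int i = 0; i < 4; i++)                     /* print bytes in memory order */
        printf("byte %d: 0x%02x\n", i, bytes[i]);
    return 0;
}

A little-endian machine (e.g. x86) prints 44 33 22 11, a big-endian machine prints 11 22 33 44, but within each byte there is no observable order at all.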
"You start with the most significant bytes, just like you write the numbers down and just like you spell them e.g twenty-eight... But the German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit. I think this is a good analogy to endianness."
Sorry, no it isn't. It stopped being a good analogy as soon as you mentioned that it involves one byte. A verbally spoken eight-twenty phrase could mean a different thing compared to the written decimal value 820.
What about the English eight-ten (aka eight-teen) for the value 18? By your logic the Germans also say eight-ten, right? What happens to eight-ten when a machine is told to simply "reverse" the input when converting between the English and German styles?

Related

Integer compression method

How can I compress a row of integers into something shorter?
Like:
Input: '1 2 4 5 3 5 2 3 1 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 5 3 5 2 3 1 2 3 4')
Note: Input will only contain numbers between 1-5, and the total count of numbers will be 10-16.
Is there any way I can compress it to 3-5 numbers?
Here is one way. First, subtract one from each of your little numbers. For your example input that results in
0 1 3 4 2 4 1 2 0 1 2 3
Now treat that as the base-5 representation of an integer. (You can choose either most significant digit first or last.) Calculate the number in binary that means the same thing. Now you have a single integer that "compressed" your string of little numbers. Since you have shown no code of your own, I'll just stop here. You should be able to implement this easily.
Since you will have at most 16 little numbers, there are 5^16 = 152,587,890,625 possible values, so the maximum result is 5^16 - 1, which fits into 38 bits. If you need to store the result as smaller numbers, convert it into another, larger number base, such as 2^16 or 2^32. The former would result in 3 numbers, the latter in 2.
@SergGr points out in a comment that this method does not encode the number of integers. If that is not stored separately, it can be a problem, since the method does not distinguish between leading zeros and coded zeros. There are several ways to handle that if you need the count included in the compression. You could require the most significant digit to be 1 (whether that digit comes first or last depends on where you put the most significant position). This increases the size by one bit, so you may now need 39 bits.
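A minimal sketch of the packing step in C, assuming at most 16 numbers so everything fits in 64 bits (pack/unpack are names of my own choosing, and the count is assumed to be stored separately):

#include <stdint.h>

/* Pack numbers 1..5 into one integer as base-5 digits,
   least significant digit first; assumes count <= 16. */
uint64_t pack(const int *nums, int count) {
    uint64_t result = 0;
    for (int i = count - 1; i >= 0; i--)
        result = result * 5 + (uint64_t)(nums[i] - 1);
    return result;
}

/* Reverse the packing; the original count must be known. */
void unpack(uint64_t packed, int *nums, int count) {
    for (int i = 0; i < count; i++) {
        nums[i] = (int)(packed % 5) + 1;
        packed /= 5;
    }
}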
Here is a toy example of variable-length encoding. Assume we want to encode two strings: 1 2 3 and 1 2 3 0 0. How will the results differ? Consider the two base-5 numbers 321 and 00321. They represent the same value, but let's convert them into base 2 preserving the padding.
1 + 2×5 + 3×5² = 86 dec = 1010110 bin
1 + 2×5 + 3×5² + 0×5³ + 0×5⁴ = 86 dec = 000001010110 bin
The additional 0s in the second line appear because the biggest 5-digit base-5 number, 44444, has the base-2 representation 110000110100, so the binary representation of our number is padded to that same 12-bit size.
Note that there is no need to pad the first line, because the biggest 3-digit base-5 number, 444, has the base-2 representation 1111100, i.e. of the same 7-bit length. For an initial string such as 3 2 1, some padding would be required in this case as well, so padding might be needed even when the top digits are not 0.
Now let's add a most significant 1 to the binary representations, and those will be our encoded values:
1 2 3 => 11010110 binary = 214 dec
1 2 3 0 0 => 1000001010110 binary = 4182 dec
There are many ways to decode those values back. One of the simplest (but not the most efficient) is to first calculate the number of base-5 digits as floor(log5(encoded)), then remove the top bit and recover the digits one by one using mod-5 and divide-by-5 operations.
Obviously, such variable-length encoding always adds exactly 1 bit of overhead.
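A hedged C sketch of this marker-bit scheme (the function names are my own; digits are stored least significant first, as in the example above). The marker is the smallest power of 2 larger than the biggest count-digit base-5 number, which is exactly the 1 bit added above the padded representation:

#include <stdio.h>
#include <stdint.h>

/* value = base-5 digits packed least significant first. */
uint64_t encode_marked(uint64_t value, int count) {
    uint64_t max = 1;
    for (int i = 0; i < count; i++)
        max *= 5;                 /* 5^count; max - 1 is the biggest packed value */
    uint64_t marker = 1;
    while (marker < max)
        marker <<= 1;             /* smallest power of 2 above max - 1 */
    return marker + value;
}

/* Recover the digit count as floor(log5(encoded)), strip the
   marker, then read the digits back. Returns the count. */
int decode_marked(uint64_t encoded, int *digits) {
    int count = 0;
    uint64_t max = 1;
    while (max * 5 <= encoded) {  /* find count with 5^count <= encoded */
        max *= 5;
        count++;
    }
    uint64_t marker = 1;
    while (marker < max)
        marker <<= 1;
    uint64_t value = encoded - marker;
    for (int i = 0; i < count; i++) {
        digits[i] = (int)(value % 5);
        value /= 5;
    }
    return count;
}

int main(void) {
    /* The string 1 2 3 packs to 86; with the marker it becomes 214 or 4182. */
    printf("%llu\n", (unsigned long long)encode_marked(86, 3));  /* 214 */
    printf("%llu\n", (unsigned long long)encode_marked(86, 5));  /* 4182 */
    return 0;
}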
It's called polidatacompressor.js, but the license will cost you; you have to ask the author about prices LOL
https://github.com/polidatacompressor/polidatacompressor
Ncomp(65535) will output 255, 255, and when you store this in a database as bytes you get 2 chars.
Another way is to use hexadecimal (base 16): in JavaScript, (1231).toString(16) gives you '4cf'. In about 60% of situations it shortens the result by one character.
Or convert base 10 to base 62: https://github.com/base62/base62.js/
4131 --> 14D
413131 --> 1Jtp
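For what it's worth, the base-16 conversion looks the same in C:

#include <stdio.h>

int main(void) {
    printf("%x\n", 1231);   /* prints 4cf, same as (1231).toString(16) */
    return 0;
}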

Is the least significant bit (LSB) always the "first" bit?

I'm reading Modern C (version Feb 13, 2018) and on page 42 it says that the bit with index 4 is the least significant bit.
Shouldn't the bit with index 0 be the least significant bit? (Same question about the MSB.)
Which is right? What's the correct terminology?
Their definition of "most significant bit" and "least significant bit" is misleading:

8-bit binary number:  1 1 1 1 0 0 0 0
Bit number:           7 6 5 4 3 2 1 0
                      |     |       |
                      |     |       least significant bit
                      |     least significant bit that is 1
                      most significant bit that is 1, and also just the most significant bit
The book's definition does not align with common/typical/mainstream/correct usage. See Wikipedia, for instance:
In computing, the least significant bit (LSB) is the bit position in a binary integer giving the units value, that is, determining whether the number is even or odd.
The book, on the other hand, seems to consider only bits that are 1, so that in an 8-bit byte representing the number 16, which we can write:
00010000
the bit that is 1 has index 4 (it's b4 in the book's notation), and then it claims that that particular number's LSB is four.
The proper definition uses LSB to denote the bit whose place value is 1, i.e. the "units" bit, so the LSB is always the rightmost bit. This definition is more useful, and I really think the book is wrong.
They're using an unusual definition of LSB and MSB, which only refers to the bits that are set to 1. So in the case of 240, the first 1 bit is b4, not b0, because b0 through b3 are all 0.
I'm not sure why the book considers this definition of LSB/MSB to be useful. It's not generally interesting for integers, although it does come into play in floating point. Floating point numbers are scaled so integers above 1 have the low-order zero bits shifted away, and the exponent is incremented to make up for this (conversely, fractions have their high-order bits shifted away, and the exponent is decremented).
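The two readings are easy to contrast in C with the book's example value 240: bit 0 is the LSB in the conventional sense, while x & -x isolates the lowest bit that happens to be 1 (the book's sense).

#include <stdio.h>

int main(void) {
    unsigned x = 240;                                  /* binary 11110000 */
    printf("bit 0 (conventional LSB): %u\n", x & 1u);  /* prints 0 */
    printf("lowest set bit: %u\n", x & -x);            /* prints 16, i.e. bit 4 */
    return 0;
}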

What is the difference between a logical OR operation and binary addition?

I'm trying to understand how binary addition and a logical OR table differ.
Do both carry a 1 forward? If not, which one performs the carry operation and which does not?
The exclusive-or (XOR) operation is like binary addition, except that there is no carry from one bit position to the next. Thus, each bit position can be evaluated independently of the rest.
I'll attempt to clarify a few points with a few illustrations.
First, addition. Basically like adding numbers in grade school. But if you have a 1-bit aligned with a 1-bit, you get a 0 with a 1 carry (i.e. 10, essentially analogous to 5 plus 5 in base-10). Otherwise, add them like 'regular' (base-10) numbers. For instance:
  ₁₁₁
  1001
+ 1111
______
 11000
Note that in the left-most column two 1's are added to give 10, which with another 1 gives 11 (similar to 5 + 5 + 5).
Now, assuming by "logical OR" you mean something along the lines of bitwise OR (an operation which basically performs the logical OR (inclusive) operation on each pair of corresponding bits), then you have this:
  1001
| 1111
______
  1111
The only case where you get a 0 bit is when both bits are 0.
Finally, since you tagged this question xor, which I assume is meant bitwise as well:
  1001
^ 1111
______
  0110 = 110₂
In this case, two 1-bits give a 0, and of course two 0-bits give 0.
With a logical OR you get a logical (Boolean) result. In other words, true OR true is true (anything other than false OR false is true). In some languages (like C) any numeric value other than 0 means true, and some languages use an explicit datatype for true/false (bool, Boolean).
In the case of a binary (bitwise) OR, you are ORing the bits of two binary values, i.e. 1 (binary 01) bitwise-OR 2 (binary 10) is binary 11:
  01
| 10
____
  11
which is 3. Thus binary OR is also an addition when the values do not have shared bits (like flag values).
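A short C program makes the contrast concrete, reusing the 1001 and 1111 operands from the illustrations above:

#include <stdio.h>

int main(void) {
    unsigned a = 0x9, b = 0xF;            /* 1001 and 1111 in binary */
    printf("a + b = %u\n", a + b);        /* 24 = 11000: addition carries */
    printf("a | b = %u\n", a | b);        /* 15 = 1111: OR never carries */
    printf("a ^ b = %u\n", a ^ b);        /*  6 = 0110: XOR is carry-less addition */

    unsigned f1 = 1, f2 = 2;              /* flags with no shared bits */
    printf("%d\n", (f1 | f2) == f1 + f2); /* 1: OR equals + for disjoint bits */
    return 0;
}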

Why is the answer to "How many ways are there to write a 15-bit string with exactly 3 1s" C(15, 3)?

I was going over my textbook to review permutations and combinatorics, which I have great difficulty comprehending despite them seeming simple, and came across this problem.
How many ways are there to write a length 15 string using binary if there must be exactly 3 "1's" and 12 "0's".
The answer to the problem was C(15, 3) or C(15, 12). Now, I understand why there are two possible solutions to the problem, but I'm puzzled as to why the answer is C(15, 12) || C(15, 3)
From my understanding, we're choosing three (or twelve) of the digits to be 1 (or 0), which is good and all, but how does that ensure that the remaining digits are the remaining 0's or 1's?
tl;dr: By using C(15,3) we ensure that we have the # of ways three digits will be 1, but how does that guarantee the remaining 12 will be 0s?
Go back to first principles:
Start with all 15 bits set to 0 [1 way to do this]
Choose 1 bit and flip it [15 ways to do this]
Choose a different bit and flip it [14 ways to do this]
Choose yet another bit and flip it [13 ways to do this]
It should be clear that exactly 3 bits are 1's and the remaining 12 are 0's
Total number of ways to do this: 1 x 15 x 14 x 13 = 2730. But that counts ordered sequences of flips, and the order in which the three bits are flipped doesn't matter: each final string is produced by 3! = 6 different orders, so the number of strings is 2730 / 3! = 455 = C(15, 3). As for the remaining digits: choosing which 3 positions hold the 1's completely determines the string, because every position not chosen stays 0.
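A brute-force check in C confirms the count by enumerating all 2^15 strings and counting those with exactly three 1-bits:

#include <stdio.h>

int main(void) {
    int count = 0;
    for (unsigned s = 0; s < (1u << 15); s++) {
        int ones = 0;
        for (unsigned t = s; t != 0; t >>= 1)
            ones += (int)(t & 1u);       /* popcount by hand */
        if (ones == 3)
            count++;
    }
    printf("%d\n", count);               /* prints 455 = C(15, 3) */
    return 0;
}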

Different bases for radix sort in C

I am having a difficult time understanding radix sort. I have no problems implementing code to work with bases of 2 or 10. However, I have an assignment that requires a command line argument to specify the radix. The radix can be anywhere from 2 - 100,000. I have spent around 10 hours trying to understand this problem. I am not asking for a direct answer, because this is homework. However, if anyone can shed some light on this, please do.
A few things I don't understand. What is the point of having base 100,000? How would that even work? I understand having a base with a symbol for every letter of the alphabet, or for every digit 0-9. I just can't seem to wrap my head around this concept.
I'm sorry if I haven't been specific enough.
A number N in any base B is just a series of digits in the range [0, B-1]. Since we don't have enough symbols to represent all the digits of large bases in a "normal" human writing system, don't think about how the number is written in characters. You just need to know that the digits are stored/written separately.
For example, 255 in base 177 is a 2-digit number in which the first digit has value 1 and the second digit has value 78, since 255₁₀ = 1×177¹ + 78×177⁰. If some culture used this base they'd have 177 distinct symbols for the digits and would write the number in only 2 digits. Since we only have 10 symbols, we need to define some symbol to delimit the digits, which is often ':'. As you can see from Wolfram Alpha, 255₁₀ = 1:78₁₇₇.
Note that not all people count in base 10. There exist cultures that count in base 4, 5, 6, 8, 12, 15, 16, 20, 24, 27, 32, 36, 60... so they have more or fewer symbols than most of us. However, among the non-decimal bases, only bases 20, 12 and 60 remain in common use nowadays.
In base 100,000 it's the same: 1234567890987654321 is a 4-digit number whose digits have the values 1234, 56789, 9876 and 54321, in order.
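That split is easy to verify in C, since the value fits in 64 bits; extracting base-100,000 digits is just repeated division:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t n = 1234567890987654321ULL;
    while (n > 0) {                       /* digits come out least significant first */
        printf("%llu\n", (unsigned long long)(n % 100000));
        n /= 100000;
    }
    return 0;                             /* prints 54321, 9876, 56789, 1234 */
}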
I was about to explain it in a comment, but basically you're talking about positional notation, where the digits are extracted with modular arithmetic. Each digit is in {0...n-1} and represents that digit times nᵏ, where k is its position. 255 in decimal is 5×10⁰ + 5×10¹ + 2×10².
So, your 255 in base 177 is hard to represent, but there's a 1 in the 177s place (177¹) and a 78 in the 1s place (177⁰).
As a general pseudocode algorithm, you want something like...
n = input value
digits = []
while n > 0
    remainder = n mod base
    digits += remainder
    n = n / base (as an integer)
Note that the digits come out least significant first, so reverse them at the end if you want the conventional most-significant-first order.
Of course, how you represent those digits is another story. MIME contains a semi-standard way of handling bases up through Base64, for example.
If it were me, I'd just delimit the digits and make it clear that that's the representation, but there's all of Unicode if you want to mess around with hexadecimal-like extensions...
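To tie this back to the assignment: a sketch of an LSD radix sort in C with the base as a run-time parameter (simplified for illustration: unsigned keys only, no allocation-error checks). A big base such as 100,000 simply means fewer but wider passes, each using a counting array of base buckets:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One counting-sort pass per base-`base` digit, least significant first. */
void radix_sort(unsigned *a, size_t n, unsigned base) {
    unsigned *out = malloc(n * sizeof *out);
    size_t *count = malloc(base * sizeof *count);
    unsigned max = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > max) max = a[i];

    for (unsigned long long div = 1; max / div > 0; div *= base) {
        memset(count, 0, base * sizeof *count);
        for (size_t i = 0; i < n; i++)
            count[(a[i] / div) % base]++;        /* histogram of this digit */
        for (unsigned d = 1; d < base; d++)
            count[d] += count[d - 1];            /* prefix sums = bucket ends */
        for (size_t i = n; i-- > 0; )            /* stable scatter, back to front */
            out[--count[(a[i] / div) % base]] = a[i];
        memcpy(a, out, n * sizeof *a);
    }
    free(out);
    free(count);
}

int main(void) {
    unsigned a[] = {170, 45, 75, 90, 802, 24, 2, 66};
    radix_sort(a, 8, 7);                         /* base 7 works just as well as 10 */
    for (int i = 0; i < 8; i++)
        printf("%u ", a[i]);
    printf("\n");
    return 0;
}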
