Matrices - memory - arrays

Let's say that I have a matrix A.
I want to know if there is any way to represent it such that only the filled cells occupy memory and the remaining ones do not, e.g.:
A = 1 0 0
    0 1 0
    0 0 1
Now, every cell would take 1 bit of memory to store the matrix,
hence I would like to know whether it is possible to store the matrix as:
A = 1
    1
    1
so that the empty spaces do not occupy any memory at all. Is there any file format that represents a matrix in such a way?

No. You're dealing with bits. It would take MORE memory to store a list of the "filled" bits than it would to simply store the bits. For example, for a simple 1x8 matrix:
0 1 2 3 4 5 6 7 <---bit-wise addresses
m = [0,1,0,0,0,1,1,1]
could be stored as a SINGLE byte of memory, at a storage ratio of 1 bit per bit.
To store just the locations of the SET bits would take 4 bytes. If all of the bits were set, you'd need 8 bytes to store those locations. So now you've gone from a constant 1-byte requirement to a variable 0-8 bytes.
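A tiny Python illustration of that comparison (hypothetical, just to make the byte counts concrete):

m = [0, 1, 0, 0, 0, 1, 1, 1]

# Dense storage: pack all 8 bits into a single byte.
packed = 0
for i, bit in enumerate(m):
    packed |= bit << i
dense = packed.to_bytes(1, "little")                      # always exactly 1 byte

# "Sparse" storage: one byte per position of each set bit.
positions = bytes(i for i, bit in enumerate(m) if bit)    # 4 bytes here, 0-8 in general

print(len(dense), len(positions))                         # 1 4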

You could devise a scheme where you store information about the positions in a list, but that would consume at least as much memory as you would save this way. So, in short: no.

Related

AVX2 instruction to combine first and third elements of two packed doubles

I have two AVX2 256 bit registers (i.e. __m256d) that store doubles. The first stores 0 1 2 3 and the other stores 4 5 6 7. I would like to get 0 2 4 6, i.e. combine the first and third elements of each register and store them in a __m256d. Is there any instruction to achieve this directly? If not, what is the fastest way to get the desired result?
Thank you for your help

Confusion about a memory alignment example

When reading some posts to learn about memory alignment, I have a question about a good answer to "What is aligned memory allocation?" by @dan04.
Reading the example he gives,
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|   bytes
|       |       |   words
The problem is that on some CPU architectures, the instruction to load a 4-byte integer from memory only works on word boundaries. So your program would have to fetch each half of b with separate instructions.
Why can't the CPU (or can it?) directly read the 4 bytes (a word, assuming 32 bits) that contain b?
For example, if I want b:
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|   bytes
    |       |       a word (assume it's 32-bit; get b directly)
read 1 word starting at address 2.
If I want a:
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|   bytes
|       |           a word
read 1 word starting at address 0, take the first 2 bytes and discard the latter 2 bytes.
If I want c and d:
 0 1 2 3 4 5 6 7
|a|a|b|b|b|b|c|d|   bytes
        |       |   a word
read 1 word starting at address 4, take the last 2 bytes and discard the first 2 bytes.
Then it seems alignment is not needed, which is definitely incorrect.
I must have misunderstood something or lack some other knowledge; please help correct me.
"Why can't (Can it?) read the 4 bytes(a word, assume 32bits) directly that contains b?"
The answer is in what you have quoted right above. The key is "on word boundaries". That is not the same as "in word size"; i.e. those CPUs can read a word's width only from exactly N*wordwidth, not from N*wordwidth+2.
A word boundary (only applicable on the mentioned platforms) is a clean multiple of the word width: 0, 4, 8, 12... but not 2, 6, 10...
Picking up your phrasing from the comment: yes.
Those CPUs can only read from addresses 0, 4, 8, 12, 16 and so on.
E.g. one word from addresses 0-3, one word from addresses 4-7.
(Note the added 12.)
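To make the restriction concrete, here is a small Python sketch (hypothetical, simulating the memory layout from the diagram above) of a CPU that can only load 4-byte words from addresses 0, 4, 8, ...; fetching the unaligned b at address 2 then takes two aligned loads plus byte shuffling, while an aligned value takes one load:

memory = bytes([0xAA, 0xAA, 0xB0, 0xB1, 0xB2, 0xB3, 0xCC, 0xDD])   # |a|a|b|b|b|b|c|d|

def load_word_aligned(addr):
    # The simulated CPU only allows word loads on 4-byte boundaries.
    assert addr % 4 == 0, "unaligned word load is not supported"
    return memory[addr:addr + 4]

# Aligned case: both 'a' bytes live entirely inside the word at address 0.
a = load_word_aligned(0)[0:2]                # one load, then discard the unused bytes

# Unaligned case: 'b' spans bytes 2..5, crossing the boundary at address 4,
# so it takes two aligned loads and the halves must be stitched together.
b = load_word_aligned(0)[2:4] + load_word_aligned(4)[0:2]

print(a.hex(), b.hex())                      # aaaa b0b1b2b3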

Integer compression method

How can I compress a row of integers into something shorter?
Like:
Input: '1 2 4 5 3 5 2 3 1 2 3 4' -> Algorithm -> Output: 'X Y Z'
and can get it back the other way around? ('X Y Z' -> '1 2 4 5 3 5 2 3 1 2 3 4')
Note: the input will only contain numbers between 1 and 5, and the total count of numbers will be 10-16.
Is there any way I can compress it to 3-5 numbers?
Here is one way. First, subtract one from each of your little numbers. For your example input that results in
0 1 3 4 2 4 1 2 0 1 2 3
Now treat that as the base-5 representation of an integer. (You can choose either most significant digit first or last.) Calculate the number in binary that means the same thing. Now you have a single integer that "compressed" your string of little numbers. Since you have shown no code of your own, I'll just stop here. You should be able to implement this easily.
Since you will have at most 16 little numbers, the maximum resulting value from that algorithm will be 5^16 which is 152,587,890,625. This fits into 38 bits. If you need to store smaller numbers than that, convert your resulting value into another, larger number base, such as 2^16 or 2^32. The former would result in 3 numbers, the latter in 2.
@SergGr points out in a comment that this method does not encode the number of integers. If that is not stored separately, it can be a problem, since the method does not distinguish between leading zeros and coded zeros. There are several ways to handle that if you need the count included in the compression. You could require the most significant digit to be 1 (whether that is first or last depends on where the most significant digit is). This increases the number of bits by one, so you may now need 39 bits.
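A minimal Python sketch of that base-5 packing (hypothetical helper names; it assumes the count of numbers is stored or known separately, as discussed above):

def encode(numbers):
    # numbers: integers 1-5; treat (n - 1) as base-5 digits, least significant first
    value = 0
    for n in reversed(numbers):
        value = value * 5 + (n - 1)
    return value

def decode(value, count):
    # count: how many numbers were encoded, stored separately
    numbers = []
    for _ in range(count):
        numbers.append(value % 5 + 1)
        value //= 5
    return numbers

packed = encode([1, 2, 4, 5, 3, 5, 2, 3, 1, 2, 3, 4])
print(packed.bit_length())            # at most 28 bits for 12 numbers
print(decode(packed, 12))             # [1, 2, 4, 5, 3, 5, 2, 3, 1, 2, 3, 4]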
Here is a toy example of variable-length encoding. Assume we want to encode two strings: 1 2 3 and 1 2 3 0 0. How will the results differ? Let's consider two base-5 numbers, 321 and 00321. They represent the same value, but still, let's convert them into base 2 while preserving the padding.
1 + 2*5 + 3*5^2 = 86 dec = 1010110 bin
1 + 2*5 + 3*5^2 + 0*5^3 + 0*5^4 = 000001010110 bin
Those additional 0s in the second line are there because the biggest 5-digit base-5 number, 44444, has a base-2 representation of 110000110100, so the binary representation of our number is padded to the same size.
Note that there is no need to pad the first line, because the biggest 3-digit base-5 number, 444, has a base-2 representation of 1111100, i.e. of the same length. For an initial string 3 2 1 some padding would be required in this case as well, so padding might be needed even when the top digits are not 0.
Now let's add the most significant 1 to the binary representations, and those will be our encoded values:
1 2 3 => 11010110 binary = 214 dec
1 2 3 0 0 => 1000001010110 binary = 4182 dec
There are many ways to decode those values back. One of the simplest (but not the most efficient) is to first calculate the number of base-5 digits by calculating floor(log5(encoded)) and then remove the top bit and fill the digits one by one using mod 5 and divide by 5 operations.
Obviously, such variable-length encoding always adds exactly 1 bit of overhead.
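A sketch of that variable-length scheme in Python (an illustration of the steps above, not the answerer's exact code; digits are written least significant first, as in the example):

import math

def encode_vl(digits):
    # digits: base-5 digits 0-4, least significant first, e.g. [1, 2, 3]
    value = 0
    for d in reversed(digits):
        value = value * 5 + d
    width = (5 ** len(digits) - 1).bit_length()   # width of the biggest k-digit value
    return value | (1 << width)                   # prepend the sentinel 1 bit

def decode_vl(encoded):
    k = int(math.log(encoded, 5))                 # floor(log5) = number of digits
                                                  # (an exact integer log is safer for big inputs)
    value = encoded ^ (1 << (encoded.bit_length() - 1))   # strip the sentinel bit
    digits = []
    for _ in range(k):
        digits.append(value % 5)
        value //= 5
    return digits

print(encode_vl([1, 2, 3]))          # 214
print(encode_vl([1, 2, 3, 0, 0]))    # 4182
print(decode_vl(4182))               # [1, 2, 3, 0, 0]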
It's called polidatacompressor.js, but the license will cost you; you have to ask the author about prices.
https://github.com/polidatacompressor/polidatacompressor
Ncomp(65535) will output 255, 255, and when you store this in a database as bytes you get 2 characters.
Another way is to use hexadecimal (base 16): in JavaScript, (1231).toString(16) gives you '4cf'; in about 60% of situations it shortens the string by one character.
Or convert from base 10 to base 62: https://github.com/base62/base62.js/
4131 --> 14D
413131 --> 1Jtp

Can big-endian order be associated with the way Englishmen say the numbers under 100 and small-endian with the way Germans say those numbers?

I was thinking that for me and most people around me big-endian order of the bytes in memory seems the most natural way of arranging numbers.
You start with the most significant bytes, just like you write the numbers down and just like you spell them, e.g. twenty-eight.
The most significant digit is written first, and then you continue to write the following digits from the next most significant to the least significant. This is the same way you say the numbers.
But the German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit.
I think this is a good analogy to endianness.
"I was thinking that for me... Big-endian order of the bytes in memory seems the most natural way of arranging numbers... You start with the most significant bytes, just like you write the numbers down"
Actually, all binary data (zero/one bits) is written in MSB format. We always write the value starting with the MSD (Most Significant Digit) on the left side, just like in real life.
However, since there are 8 slots within a byte to fill, we write the value itself starting from the right side, growing upwards by shifting to the left. PS: Endianness only applies at the multi-byte level.
In summary: in a single byte (holding a value < 100, like 28 or even 99):
The value 28 is written as 28 (but since it's binary format, it looks like 11100).
To write the value we start at the right side: x x x 1 1 1 0 0 (where the left-most 1 is the MSD).
So the value itself is written in MSB style, but noted within the byte using LSB style of writing.
There is no concept of endianness within a single-byte value.
Example : Imagine bits were slots for holding 0-9 digits...
We still write 28 as [0 0 0 0 0 0 2 8], so the twenties part is placed like the MSB, but the whole value starts from the right as if written in LSB style.
Since a single byte does not have endianness, writing the value 28 is never going to look like [0 0 0 0 0 0 8 2] and never like [2 8 0 0 0 0 0 0], since that would give an incorrect 82 or an incorrect 28 million.
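One quick way to see that endianness only shows up at the multi-byte level is to serialize the same value in both byte orders, e.g. with Python's struct module (just an illustration, not part of the original answer):

import struct

# A single-byte value has no endianness: both byte orders produce the same byte.
print(struct.pack('>B', 28).hex())   # 1c
print(struct.pack('<B', 28).hex())   # 1c

# A multi-byte value differs: big-endian stores the most significant byte first.
print(struct.pack('>I', 28).hex())   # 0000001c
print(struct.pack('<I', 28).hex())   # 1c000000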
"You start with the most significant bytes, just like you write the numbers down and just like you spell them e.g twenty-eight... But the German people say this number in reverse. They say the number beginning with the least significant digit and then continue with the most significant digit. I think this is a good analogy to endianness."
Sorry, no it isn't. It stopped being a good analogy as soon as you mentioned that it involves one byte. A verbally spoken eight-twenty phrase could mean a different thing compared to the written decimal value 820.
What about the English eight-ten (aka eight-teen) for the value 18? By your logic the Germans also say eight-ten, right? What happens to eight-ten when a machine is told to simply "reverse" the input when converting between English and German style?

Finding row with maximum no. of 1s if each row is sorted using logicalOR approach

A question similar to this may have been discussed before, but I want to discuss a different approach.
Given a boolean 2D array where each row is sorted, find the row with the maximum number of 1s.
Input Matrix :
0 1 1 1
0 0 1 1
1 1 1 1
0 0 0 0
Output : 2
How about this approach: take the logical OR of column 0 across all rows, and if the answer is 1, return the index of the row that contains that 1 and stop. In this case, if I do (0 | 0 | 1 | 0), the answer would be 1, and I would thereby return that row index. But what if the input matrix is something like:
Input matrix:
0 1 1 1
0 0 1 1
0 0 0 1
0 0 0 0
Output : 0
When I do the logical OR of column 0 of each row, the answer would be zero, so I would move to column 1 of each row; the procedure is followed until the logical OR is 1. I know other approaches to solving this problem, but I would like views on this approach.
If it's:
0 ... 0 1
0 ... 0 0
0 ... 0 0
0 ... 0 0
0 ... 0 0
You'd have to search many columns.
The maximum amount of work involved would be linear in the number of cells (O(mn)), and the other approaches outperform this.
Specifically, the approach where you start at the top right and repeatedly search left until you find a 0, then search down until you find a 1, and finally return the last row in which you found a 1, is linear in the number of rows plus columns (O(m + n)). A sketch of that walk follows.
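A minimal Python sketch of that top-right walk (a hypothetical helper, assuming each row has all of its 0s before its 1s):

def row_with_most_ones(matrix):
    # matrix: list of rows, each sorted so all 0s come before all 1s
    if not matrix or not matrix[0]:
        return -1
    m, n = len(matrix), len(matrix[0])
    best_row = -1
    row, col = 0, n - 1
    while row < m and col >= 0:
        if matrix[row][col] == 1:
            best_row = row      # this row's 1s start at least as far left as any seen so far
            col -= 1            # search left for the start of this row's 1s
        else:
            row += 1            # search down for a row with a 1 in this column
    return best_row

matrix = [[0, 1, 1, 1],
          [0, 0, 1, 1],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(row_with_most_ones(matrix))   # 2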
Your approach would work, since it's equivalent to finding the row whose leftmost 1 comes before (or at the same position as) any other row's leftmost 1. It would still be O(m * n) in the worst case:
Input Matrix :
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 1
Given that your rows are sorted, I would binary search for the position of the first 1 in each row and return the row with the minimum position. This would be O(m * log n), although you might be able to do better.
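A sketch of that binary-search variant (using Python's bisect on each sorted row; an illustration, not the answerer's code):

from bisect import bisect_left

def row_with_most_ones_bsearch(matrix):
    # For each row, binary search the index of its first 1; the best row is the
    # one whose first 1 appears earliest.
    best_row, best_pos = -1, len(matrix[0]) if matrix else 0
    for i, row in enumerate(matrix):
        pos = bisect_left(row, 1)            # equals len(row) if the row has no 1s
        if pos < best_pos:
            best_pos, best_row = pos, i
    return best_row

print(row_with_most_ones_bsearch([[0, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]))   # 2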
Your approach is likely to be orders of magnitude slower than the naive "go through the rows, and count the zeros, and remember the row with the fewest zeros." The reason is that, assuming your bits are stored one-row-at-a-time, with the bools packed tightly, then memory for the row will be in cache all at once, and bit-counting will cache beautifully.
Contrast this to your proposed approach, where for each row, the cache line will be loaded, and a single bit will be read from it. By the time you've cycled through all the rows in your array, the memory for the first row will (probably, if you've got any reasonable number of rows), be out of the cache, and the row will have to be loaded again.
Approximately, assuming a 64-byte cache line, the first approach is going to need about 1/(64*8) memory accesses per bit in the array, compared to roughly 1 memory access per bit for yours. Since counting the bits and remembering the max take just a few cycles, it's reasonable to think that the memory accesses are going to dominate the running cost, which means the first approach will run approximately 64 * 8 = 512 times faster. Of course, you'll get some of that time back because your approach can terminate early, but the 512-times speed hit is a large cost to overcome.
If your rows are super-long, you may find that a hybrid between these two approaches works excellently: count the number of bits in the first cache-line's worth of data in each row (being careful to cache-line-align each row of your data in memory), and if every row has no bits set in the first cache-line, go to the second and so forth. This combines the cache-efficiency of the first approach with the early termination of the second approach.
As with all optimisations, you should measure results, and be sure that it's important that the code is fast. The efficient solution is likely to impose annoying restrictions (like 64-byte memory alignment for rows), and the code will be harder to read than a straightforward solution.
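For reference, a sketch of the straightforward count-per-row scan this answer compares against (here each packed row is represented as a Python int, a hypothetical stand-in for tightly packed bools):

def row_with_most_ones_count(rows):
    # rows: one integer per row, holding that row's bits packed together
    best_row, best_count = -1, -1
    for i, bits in enumerate(rows):
        count = bin(bits).count("1")         # popcount of the packed row
        if count > best_count:
            best_count, best_row = count, i
    return best_row

rows = [0b0111, 0b0011, 0b1111, 0b0000]      # same matrix as above, one int per row
print(row_with_most_ones_count(rows))        # 2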
