I need to create a large binary matrix that is over the array size limit for MATLAB.
By default, MATLAB stores numeric arrays, even ones holding only integer values, as double precision arrays. But since my matrix is binary, I am hoping that there is a way to create an array of bits instead of doubles and consume far less memory.
I created a random binary matrix A and converted it to a logical array B:
A = randi([0 1], 1000, 1000);
B = logical(A);
I saved both as .mat files. They take up about the same space on my computer so I don't think MATLAB is using a more compact data type for logicals, which seems very wasteful. Any ideas?
Are you sure that the variables take the same amount of space? Logical matrices / arrays inherently use 1 byte per element, whereas randi returns double precision values, which are 8 bytes per element. A simple call to whos will show you how much memory each variable takes:
>> A = randi([0 1], 1000, 1000);
>> B = logical(A);
>> whos
  Name      Size                Bytes  Class      Attributes
  A         1000x1000         8000000  double
  B         1000x1000         1000000  logical
As you can see, A takes 8 x 1000 x 1000 = 8M bytes, whereas B takes up 1 x 1000 x 1000 = 1M bytes. There are most certainly memory savings between them.
The drawback with logicals is that they take 1 byte per element, and you're looking for 1 bit instead. The best thing I can think of is to use an unsigned integer type and pack chunks of N bits, where N is the bit width of the data type (uint8, uint16, uint32, etc.), into a single array. That way, 32 binary digits can be packed into each number and you can save this final, much smaller matrix.
Going off on a tangent - Images
In fact, this is how Java packs colour pixels when reading images in using its BufferedImage class. Each pixel in an RGB image is 24 bits, where there are 8 bits per colour channel - red, green and blue. Each pixel is represented as a proportion of red, green and blue, and the trio of 8-bit values is concatenated into a single 24-bit integer. Usually, integers are represented as 32 bits, and so you may think that 8 extra bits are being wasted. There is in fact an alpha channel that represents the transparency of each pixel, and that takes the other 8 bits. If you don't use transparency, these are assumed to be all 1s, and so the collection of these 4 groups of 8 bits constitutes 32 bits per pixel. There are, however, compression algorithms that reduce the size of each pixel on average to significantly less than 32 bits per pixel, but that's outside the scope of what I'm talking about.
Going back to our discussion, one way to represent this binary matrix in bit form would be with a for loop like so:
Abin = zeros(1, ceil(numel(A)/32), 'uint32');
for ii = 1 : numel(Abin)
    val = A((ii-1)*32 + 1 : ii*32);      % next 32 bits, taken in column-major order
    dec = bin2dec(sprintf('%d', val));   % convert the bit string to an integer
    Abin(ii) = dec;
end
Bear in mind that this will only work for matrices where the total number of elements is divisible by 32. I won't go into how to handle the case where it isn't because I solely want to illustrate the point that you can do what you ask, but it requires a bit of manipulation. Your case of 1000 x 1000 = 1M is certainly divisible by 32 (you get 1M / 32 = 31250), and so this will work.
This is probably not the most optimized code, but it gets the point across. Basically, we take chunks of 32 numbers (0/1), going column-wise from left to right, and determine the 32-bit unsigned integer those bits represent. We then store this in a single location in the matrix Abin. What you will get in the end, given your 1000 x 1000 matrix, is 31250 32-bit unsigned integers, which corresponds to 1000 x 1000 bits, or 1M bits = 125,000 bytes.
Try looking at the size of each variable now:
>> whos
  Name      Size                Bytes  Class      Attributes
  A         1000x1000         8000000  double
  Abin      1x31250            125000  uint32
  B         1000x1000         1000000  logical
To perform a reconstruction, try:
Arec = zeros(size(A));
for ii = 1 : numel(Abin)
    val = dec2bin(Abin(ii), 32) - '0';       % back to a vector of 32 bits
    Arec((ii-1)*32 + 1 : ii*32) = val(:);
end
Also not the most optimized, but it gets the point across. Given the "compressed" matrix Abin that we calculated before, for each element we reconstruct the original 32 bits and then assign them, in 32-bit chunks, back into Arec.
You can verify that Arec is indeed equal to the original matrix A:
>> isequal(A, Arec)
ans =
1
Also, check out the workspace with whos:
>> whos
  Name      Size                Bytes  Class      Attributes
  A         1000x1000         8000000  double
  Abin      1x31250            125000  uint32
  Arec      1000x1000         8000000  double
  B         1000x1000         1000000  logical
You are storing your data in a compressed file format. For mat files in versions 7.0 and 7.3, gzip compression is used. The uncompressed data has different sizes, but after compression both variables come down to roughly the same size. That happens because both contain only 0s and 1s, which can be compressed very efficiently.
Related
The size limit of a BLE packet is 20 bytes. I need to transfer the following data over it:
struct Data {
uint16_t top;
uint16_t bottom;
float accX;
float accY;
float accZ;
float qx;
float qy;
float qz;
float qw;
};
The size of Data is 32 bytes. The precision of the floats cannot be sacrificed, since they represent accelerometer and quaternion values, and representing them imprecisely would create a huge drift error (the data is integrated over time).
I don't want to send 2 packets either, as it's really important that the whole data set is captured at the same time.
I'm planning to take advantage of the range instead.
Accelerometer values are IEEE floats in the range [-10, 10].
Quaternion components are IEEE floats in the range [-1, 1]. We could drop w, as x^2 + y^2 + z^2 + w^2 = 1.
top and bottom are 10 bits each.
Knowing this information, how can I serialize Data using at most 20 bytes?
Assuming binary32, the code is using 2*16 + 7*32 bits (256 bits), and OP wants to limit this to 20*8 bits (160).
Some savings:
1) Use only 10 bits for each uint16_t.
2) A reduced exponent range saves a bit or 2 per float - it would save a few more bits if OP stated the minimum exponent as well (estimate 4 bits total per float).
3) Don't code w.
This makes for 2*10 + 6*(32-4) = 188 bits, still not down to 160.
OP says "The precision of floats can not be sacrificed", implying the 24-bit significand (23 bits explicitly coded) is needed. 7 floats * 23 bits is 161 bits, and that is not counting the sign, the exponent, nor the 2 uint16_t.
So unless some pattern or redundant information can be eliminated, OP is outta luck.
Suggest taking many samples of data and trying to compress them using LZ or other compression techniques. If OP ends up with significantly less than 20 bytes per sample on average, then the answer is yes - in theory; else you are SOL.
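Whatever field widths end up being chosen, the mechanical part of such a scheme is a bit-writer that appends an arbitrary number of bits to a byte buffer. A minimal sketch in C (the helper and the example fields are illustrative only, not a working 20-byte encoding):
#include <stdint.h>
#include <string.h>

/* Append the low 'nbits' bits of 'value' to 'buf', starting at bit offset '*bitpos'.
   'buf' must be zero-initialised and large enough. */
static void put_bits(uint8_t *buf, unsigned *bitpos, uint32_t value, unsigned nbits)
{
    for (unsigned i = 0; i < nbits; i++) {
        if (value & (1u << i))
            buf[*bitpos / 8] |= (uint8_t)(1u << (*bitpos % 8));
        (*bitpos)++;
    }
}

/* Example: two 10-bit fields followed by one raw binary32 float. */
void pack_example(uint8_t out[20], uint16_t top, uint16_t bottom, float accX)
{
    unsigned pos = 0;
    uint32_t fbits;
    memset(out, 0, 20);
    memcpy(&fbits, &accX, sizeof fbits);      /* reinterpret the IEEE-754 bit pattern */
    put_bits(out, &pos, top    & 0x3FFu, 10);
    put_bits(out, &pos, bottom & 0x3FFu, 10);
    put_bits(out, &pos, fbits, 32);
}
Decoding is the mirror image: read the same number of bits back from the same offsets and, for the floats, memcpy the recovered bit pattern back into a float.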
For a 32-bit integer, divide the range into 32 bins of consecutive integers such that each successive bin holds twice as many integers as the previous one: the first bin holds a single item (0), the second holds items 0..1, and so on up to items 0..2^31-1 in the last bin.
The fastest algorithm I could come up with, given a 32 bit integer i, is 5 cycles on an i7 (bit scan is 3 cycles):
// bin is the index of the most significant set bit, and then we clear the msb to get the item
bin_index = bsr(i)
item = i ^ (1 << bin_index)
Or equivalently (well, it stores items 0..2^31-1 in bin 0 and only item 0 in bin 31, but that doesn't matter):
// bin is the number of trailing zeroes, and then we shift down by that many bits + 1
bin_index = bsf(i)
item = i >> (bin_index + 1)
In each case the bin index is encoded as the number of leading/trailing zero bits, with a 1 to separate them from the item number. You could do the same with leading or trailing ones and a zero to separate them. Neither works with i=0, but that's not important.
The mapping between integers and the bins/items can be completely arbitrary, so long as twice as many consecutive integers end up in each successive bin and the total number of integers in the bins sums to 2^32-1. Can you think of a more efficient algorithm to bin the 32-bit integers on an i7? Keep in mind an i7 is superscalar, so any operations that don't depend on each other can execute in parallel, up to the throughput limit for each instruction type.
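For reference, here is one way the two variants above could be written in C with the GCC/Clang builtins __builtin_clz and __builtin_ctz (which map to bsr/bsf-style instructions on x86); as noted, i must be non-zero:
#include <stdint.h>

/* Variant 1: bin by the most significant set bit, clear it to get the item. */
static void bin_msb(uint32_t i, unsigned *bin_index, uint32_t *item)
{
    *bin_index = 31u - (unsigned)__builtin_clz(i);   /* index of the highest set bit */
    *item = i ^ (1u << *bin_index);
}

/* Variant 2: bin by the least significant set bit, shift it (and the 1) away. */
static void bin_lsb(uint32_t i, unsigned *bin_index, uint32_t *item)
{
    *bin_index = (unsigned)__builtin_ctz(i);         /* number of trailing zeroes */
    *item = (i >> *bin_index) >> 1;                  /* two shifts: avoids an undefined 32-bit shift for i == 0x80000000 */
}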
You can improve your algorithm by trying to sort the data first before counting zeros.
For example, compare it to 2^31 first, and if it's greater put it in that bin; otherwise go on and count trailing zeros. With this you now have half your data set put into its bin in 2 instructions... probably two cycles. The other half would take a bit longer, but the net result would be an improvement. You can likely optimize even further following this line of thought.
I guess this would also be dependent on the efficiency of branch prediction.
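One reading of that suggestion, applied to the leading-bit variant (a sketch only; whether it actually wins depends on how predictable the branch is):
static void bin_msb_branch(uint32_t i, unsigned *bin_index, uint32_t *item)
{
    if (i >= 0x80000000u) {           /* top bit set: half of all inputs land here */
        *bin_index = 31u;
        *item = i & 0x7FFFFFFFu;      /* clear the msb */
    } else {                          /* i != 0 assumed */
        *bin_index = 31u - (unsigned)__builtin_clz(i);
        *item = i ^ (1u << *bin_index);
    }
}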
Disclaimer: I am asking these questions in relation to an assignment. The assignment itself calls for implementing a bitmap and doing some operations with that, but that is not what I am asking about. I just want to understand the concepts so I can try the implementation for myself.
I need help understanding bitmaps/bit arrays and bitwise operations. I understand the basics of binary and how left/right shifts work, but I don't know exactly how using them is beneficial.
Basically, I need to implement a bitmap to store the results of a prime sieve (of Eratosthenes.) This is a small part of a larger assignment focused on different IPC methods, but to get to that part I need to get the sieve completed first. I've never had to use bitwise operations nor have I ever learned about bitmaps, so I'm kind of on my own to learn this.
From what I can tell, bitmaps are arrays of bits of a certain size, right? By that I mean you could have an 8-bit array or a 32-bit array (in my case, I need to find the primes for a 32-bit unsigned int, so I'd need the 32-bit array). So if this is an array of bits, 32 of them to be specific, then we're basically talking about a string of 32 1s and 0s. How does this translate into a list of primes? I figure that one method would evaluate the binary number and save each prime to a new array as a decimal, so all the decimal primes exist in one array, but that seems like it uses too much data.
Do I have the gist of bitmaps? Or is there something I'm missing? I've tried reading about this around the internet but I can't find a source that makes it clear enough for me...
Suppose you have a list of primes: {3, 5, 7}. You can store these numbers as a character array: char c[] = {3, 5, 7} and this requires 3 bytes.
Instead let's use a single byte such that each set bit indicates that the number is in the set. For example, 01010100. If we can set the bit we want and later test it, we can use this to store the same information in a single byte. To set it:
char b = 0;
// want to set `3` so shift 1 twice to the left
b = b | (1 << 2);
// also set `5`
b = b | (1 << 4);
// and 7
b = b | (1 << 6);
And to test these numbers:
// is 3 in the map?
if (b & (1 << 2)) {
    // it is in...
}
You are going to need a lot more than 32 bits.
You want a sieve for up to 2^32 numbers, so you will need a bit for each one of those. Each bit will represent one number, and will be 0 if the number is prime and 1 if it is composite. (You can save a bit by noting that the first stored bit could represent 2, since 1 is neither prime nor composite. It is easier to just waste that one bit.)
2^32 = 4,294,967,296
Divide by 8
536,870,912 bytes, or 1/2 GB.
So you will want an array of 2^29 bytes, or 2^27 4-byte words, or whatever you decide is best, and also a method for manipulating the individual bits stored in the chars (ints) in the array.
It sounds like eventually you are going to have several threads or processes operating on this shared memory. You may need to store it all in a file if you can't allocate all that memory to yourself.
Say you want to find the bit for x. Then let a = x / 8 and b = x - 8 * a. Then the bit is at arr[a] & (1 << b). (Avoid the modulus operator % wherever possible.)
//mark composite
a = x / 8;
b = x - 8 * a;
arr[a] |= 1 << b;
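Putting those pieces together, a small-scale sieve over such a bit array might look something like this - only a sketch, with an arbitrary small limit and illustrative names (x & 7 is x % 8 computed without the modulus operator):
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

#define LIMIT 1000u   /* sieve the numbers 0 .. LIMIT-1; the real assignment would use 2^32 */

int main(void)
{
    /* one bit per number: 0 = prime (assumed), 1 = composite */
    uint8_t *arr = calloc((LIMIT + 7) / 8, 1);
    if (!arr) return 1;

    for (uint32_t x = 2; x * x < LIMIT; x++) {
        if (arr[x / 8] & (1u << (x & 7)))            /* already known composite */
            continue;
        for (uint32_t m = x * x; m < LIMIT; m += x)
            arr[m / 8] |= (uint8_t)(1u << (m & 7));  /* mark composite */
    }

    for (uint32_t x = 2; x < LIMIT; x++)
        if (!(arr[x / 8] & (1u << (x & 7))))
            printf("%u\n", x);                       /* x survived the sieve: prime */

    free(arr);
    return 0;
}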
This sounds like a fun assignment!
A bitmap allows you to construct a large predicate function over the range of numbers you're interested in. If you just have a single 8-bit char, you can store Boolean values for each of the eight values. If you have 2 chars, it doubles your range.
So, say you have a bitmap that already has this information stored, your test function could look something like this:
bool num_in_bitmap (int num, char *bitmap, size_t sz) {
    if (num/8 >= sz) return 0;
    return (bitmap[num/8] >> (num%8)) & 1;
}
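For completeness, the matching setter would use the same indexing scheme (the name is just illustrative):
void set_num_in_bitmap (int num, char *bitmap, size_t sz) {
    if (num/8 >= sz) return;          /* out of range: silently ignore */
    bitmap[num/8] |= 1 << (num%8);    /* set the bit for 'num' */
}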
I am a newbie at solving these kinds of problems. I need to extract a variable number of bits from a single short value.
For example, if I have read a value from an array and need to fill another array, I might first read 10 bits from the earlier value, and then 6 bits into another short.
Like:
int pixelNo = 0;
short pixelValue_part = pixels[pixelNo];
// but here I need only 10 bits; in the next
// iteration I might need 4 bits, and so on.
After reading these shorts in parts, I will have to put the parts into a second array sequentially, in order to arrange all the pixels in sequence.
Note:
The problem is to arrange the whole input pixel sequence in ascending order. Each pixel is 10 bits in size, which is why I need to read the first 10 bits of a short.
Edit:
Input:  | 0,1 | 512,513 | 1024,1025 | 1536,1537 | 1,2,3 | 513,514,515 | 1025,1206,1027 | 1357,1538,1539 | ...
I have the above array as input and I want to produce output like the following array.
Output: | 0,1 | 2,3 | 4,5 | 6,7 | ...... | 513,514,515 | 1024,1025 | 1536,1537 | ...
The values in both arrays are all pixels of some image. So the pixels of the image are unarranged in the input array, and the second array is the array of arranged (sorted) pixels.
Assuming your original data is in src, and you want a span of n bits starting pos bits up from the low end, this will extract those bits:
(src >> pos) & ((1<<n)-1)
Breaking it down:
(1<<n)-1 is a mask of n 1's (binary)
src >> pos slides the bits you want down to the "bottom" of the variable
Then we bitwise-and the two together, effectively erasing the bits you don't want, leaving behind the ones you do
You can do this for each piece you need. To put the pieces together, you'd use << to shift pieces where you need them to be and then | (bitwise-or) the pieces together.
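Applied to your pixels, and assuming the 10-bit values are packed back to back starting from the low bits of pixels[0] (that packing order is an assumption - check how your data was produced), extracting the k-th pixel could look like this:
#include <stdint.h>

/* Extract the k-th 10-bit value from tightly packed 16-bit words,
   assuming values are stored low-bit-first across word boundaries.
   Note: this always reads the following word too, so keep one spare
   word at the end of the buffer (or add a bounds check). */
uint16_t get_pixel10(const uint16_t *pixels, int k)
{
    int bit  = k * 10;        /* absolute bit position where the pixel starts */
    int word = bit / 16;
    int pos  = bit % 16;

    /* pull 32 bits spanning the boundary, then shift down and mask off 10 bits */
    uint32_t src = (uint32_t)pixels[word] | ((uint32_t)pixels[word + 1] << 16);
    return (uint16_t)((src >> pos) & ((1u << 10) - 1));
}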
Could somebody explain to me why, in a 24-bit RGB bitmap file, I have to add padding whose size depends on the width of the image? What for?
I mean I must add this code to my program (in C):
if( read % 4 != 0 ) {
    read = 4 - (read%4);
    printf( "Padding: %d bytes\n", read );
    fread( pixel, read, 1, inFile );
}
Because 24 bits is an odd number of bytes (3) and for a variety of reasons all the image rows are required to start at an address which is a multiple of 4 bytes.
According to Wikipedia, the bitmap file format specifies that:
The bits representing the bitmap pixels are packed in rows. The size of each row is rounded up to a multiple of 4 bytes (a 32-bit DWORD) by padding. Padding bytes (not necessarily 0) must be appended to the end of the rows in order to bring up the length of the rows to a multiple of four bytes. When the pixel array is loaded into memory, each row must begin at a memory address that is a multiple of 4. This address/offset restriction is mandatory only for Pixel Arrays loaded in memory. For file storage purposes, only the size of each row must be a multiple of 4 bytes while the file offset can be arbitrary. A 24-bit bitmap with Width=1, would have 3 bytes of data per row (blue, green, red) and 1 byte of padding, while Width=2 would have 2 bytes of padding, Width=3 would have 3 bytes of padding, and Width=4 would not have any padding at all.
The wikipedia article on Data Structure Padding is also an interesting read that explains the reasons that paddings are generally used in computer science.
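In code, that rounding is usually done like this for a 24-bit BMP (a small sketch):
/* Stored size of one row of a 24-bit BMP: pixel bytes rounded up to a multiple of 4 */
int bmp_row_size_24bpp(int width)
{
    int rowData = width * 3;       /* bytes actually used by the pixels of one row */
    return (rowData + 3) & ~3;     /* round up to the next multiple of 4 */
}
/* the padding per row is then bmp_row_size_24bpp(width) - width * 3, i.e. 0 to 3 bytes */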
I presume this was a design decision to align rows for better memory access patterns while not wasting that much space (for a 319 px wide image you would waste 3 bytes, about 0.3%).
Imagine you need to access some odd row directly. You could access first 4 pixels of n-th row by doing:
uint8_t *startRow = bmp + n * width * 3; //3 bytes per pixel
uint8_t r1 = startRow[0];
uint8_t g1 = startRow[1];
//... Repeat
uint8_t b4 = startRow[11];
Note that if n and width are odd (and bmp is even), startRow is going to be odd.
Now if you tried to do following speedup:
uint32_t *startRow = (uint32_t *) (bmp + n * width * 3);
uint32_t a = startRow[0]; //Loading register at a time is MUCH faster
uint32_t b = startRow[1]; //but only if address is aligned
uint32_t c = startRow[2]; //else code can hit bus errors!
uint8_t r1 = (a & 0xFF000000) >> 24;
uint8_t g1 = (a & 0x00FF0000) >> 16;
//... Repeat
uint8_t b4 = (c & 0x000000FF) >> 0;
You'd run into lots of problems. In the best case scenario (that is, an Intel CPU), every load of a, b and c would need to be broken into two loads, since startRow is not divisible by 4. In the worst case scenario (e.g. Sun SPARC), your program would crash with a "bus error".
In newer designs it is common to force rows to be aligned to at least the L1 cache line size (64 bytes on Intel or 128 bytes on NVIDIA GPUs).
Short version
Because the BMP file format specifies that rows must fit exactly into 32-bit "memory cells". Because pixels are 24 bits, some row widths will not sit perfectly in those 32-bit "cells". In that case, the row is "padded up" to a full multiple of 32 bits.
8 bits per byte ∴
cell: 32 bits = 4 bytes ∴
pixel: 24 bits = 3 bytes
// If it doesn't fit perfectly in the 4-byte "cell"
if( read % 4 != 0 ) {
    // find the difference between the "cell" and the partial fit
    read = 4 - (read%4);
    printf( "Padding: %d bytes\n", read );
    // skip the difference
    fread( pixel, read, 1, inFile );
}
Long version
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor
-wiki: Word_(computer_architecture)
Computer systems basically have a preferred "word length" (though it is not so important these days). A standard data unit allows all sorts of optimisations in the architecture of the computer system (think of what shipping containers did for the shipping industry). There is a 32-bit standard called DWORD, aka Double word (I guess) - and that's what typical bitmap images are optimised for.
So if you have 24 bits per pixel, there will be various literal pixel-row lengths that will not fit nicely into the 32 bits. So in that case, pad it out.
Note: today, you are probably using a computer with a 64bit word size. Check your processor.
It depends on the format whether or not there is padding at the end of each row.
There really isn't much reason for it for 3 x 8 bit channel images since I/O is byte oriented anyway. For images with pixels packed into less than a byte (1 bit / pixel for example), padding is useful so that each row starts at a byte offset.
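For example, at 1 bit per pixel the stored row length is typically rounded up to whole bytes, and then (for BMP) to a 4-byte boundary - a small sketch:
/* Row stride for a 1 bit-per-pixel image */
int row_stride_1bpp(int width)
{
    int rowBytes = (width + 7) / 8;   /* round the row up to whole bytes */
    return (rowBytes + 3) & ~3;       /* then (for BMP) up to a 4-byte boundary */
}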