I need to create an array with 3 billion boolean variables. My memory is only 4GB, therefore I need this array to be very tight (at most one byte per variable). Theoretically this should be possible. But I found that Ruby uses way too much space for one boolean variable in an array.
ObjectSpace.memsize_of(Array.new(100, false)) #=> 840
That's more than 8 bytes per variable. I would like to know if there's a more lightweight implementation of C-arrays in Ruby.
Apart from a small memory footprint, I also need each boolean in this array to be quickly accessible, because I need to flip them as fast as possible on demand.
Ruby isn't a well-performing language, especially in memory use. As others have said, you should pack your booleans into numbers. You'll lose a lot of memory to Ruby's 'objectification'. If that's a problem for you, you could store the bits in long strings and keep the strings in an array, which wastes less memory.
http://calleerlandsson.com/2014/02/06/rubys-bitwise-operators/
You can also implement your own gem in C++, which can work directly with bits and packed 64-bit integers, wasting much less memory. An array of 64-bit integers holds 64 booleans per element, which is more than sufficient for your application.
Extremely large collections are always a problem and will require you to implement a lot to make them easier to work with. At the very least you'll have to implement one method to access a given position in an array of objects that each store more than one boolean, and another to flip it.
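As a rough sketch of the bit-level layer such a native extension would wrap (plain C here; the `BitArray` struct and `bitarray_*` names are hypothetical, and no Ruby C-API glue is shown), bit `i` lives in 64-bit word `i / 64` at position `i % 64`, so 3 billion booleans need about 375 MB:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical packed bit array: bit i is stored in words[i / 64] at bit i % 64. */
typedef struct {
    uint64_t *words;
    uint64_t nbits;
} BitArray;

/* Allocates nbits bits, all initially 0 (calloc zero-fills). Returns 0 on failure. */
int bitarray_init(BitArray *ba, uint64_t nbits) {
    uint64_t nwords = (nbits + 63) / 64;
    ba->words = calloc(nwords, sizeof *ba->words);
    ba->nbits = nbits;
    return ba->words != NULL;
}

void bitarray_free(BitArray *ba) {
    free(ba->words);
    ba->words = NULL;
}

int bitarray_get(const BitArray *ba, uint64_t i) {
    return (int)((ba->words[i >> 6] >> (i & 63)) & 1u);
}

void bitarray_set(BitArray *ba, uint64_t i, int value) {
    uint64_t mask = UINT64_C(1) << (i & 63);
    if (value)
        ba->words[i >> 6] |= mask;
    else
        ba->words[i >> 6] &= ~mask;
}

/* Flipping a bit is a single XOR on its word. */
void bitarray_flip(BitArray *ba, uint64_t i) {
    ba->words[i >> 6] ^= UINT64_C(1) << (i & 63);
}
```

Every access is one shift, one mask and one memory operation, which keeps flips cheap even at billions of entries.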
The following class may not be exactly what you're looking for. It will store 1's or 0's into an array using bits and shifting. Entries default to 0. If you need three states for each entry, 0, 1, or nil, then you'd need to change it to use two bits for each entry, rather than one.
class BitArray < Array
  # 62 on 64-bit MRI: keeps every word within the Fixnum (immediate integer) range
  BITS_PER_WORD = 0.size * 8 - 2

  def []=(n, value_0_or_1)
    index, bit = n.divmod(BITS_PER_WORD)
    word = word_at(index) || 0
    word &= ~(1 << bit)  # clear only the target bit before setting it
    super(index, word | (value_0_or_1 << bit))
  end

  def [](n)
    index, bit = n.divmod(BITS_PER_WORD)
    word = word_at(index)
    return 0 if word.nil?
    (word >> bit) & 1
  end

  # Raw Array#[] access, bypassing the bit-level override above
  def word_at(n)
    Array.instance_method(:[]).bind(self).call(n)
  end
end
I'm looking for the most efficient method of pre-allocating a logical array in MATLAB without specifying true or false at the time of pre-allocation.
When pre-allocating e.g. a 1×5 numeric array I can use nan(1,5). To my mind, this is better than using zeros(1,5), since I can easily tell which slots have been filled with data versus those that are yet to be filled. If using the zeros() solution it's hard to know whether any 0s are intentional 0s or just unfilled slots in the array.
I'm aware that I can pre-allocate a logical array using true(1,5) or false(1,5). The problem with these is similar to the use of zeros() in the numeric example; there's no way of knowing whether a slot is filled or not.
I know that one solution to this problem is to treat the array as numeric, pre-allocate it using nan(1,5), and only convert it to a logical array later when all the slots are filled. But this strikes me as inefficient.
Is there some smart way to pre-allocate a logical array in MATLAB and remain agnostic as to the actual content of that array until it is ready to be filled?
The short answer is no: the point of a logical array is that each element takes a single byte, and the implementation can store only two states (true=1 or false=0). You might assume that logicals need only a single bit, but in fact each one is stored in 8 bits (a byte) so that elements stay individually addressable without compromising performance.
If memory is a concern, you could use a single array instead of a double array, moving from 64-bit to 32-bit numbers while still being able to store NaN. Then you can cast to logical whenever required (assuming you have no NaNs by that point; otherwise the conversion will error).
If it was really important to track whether a value was ever assigned while also reducing memory, you could keep a second logical array, updated at the same time as the first, that simply stores whether each value was ever assigned. This can then be used to check whether any default values are left after assignments. Now we've dropped from 32-bit singles to two 8-bit logicals, which is worse than one logical but still twice as efficient as using floating-point numbers for the sake of NaN. Assignment operations now do roughly twice the work of a single logical array; I don't know how they compare to float assignments.
Going off-piste, you could make your own class to do this assignment tracking for you, and display the logical array as if it were capable of storing NaNs. This isn't really recommended, but I've written the code below to complete the thought experiment.
Note that you originally asked for "the most efficient method"; in terms of execution time this is definitely not going to be as efficient as the native implementation of logical arrays.
classdef nanBool
    properties
        assigned % Tracks whether element of "value" was ever assigned
        value    % Tracks boolean array
    end
    methods
        function obj = nanBool(varargin)
            % Constructor: initialise main and tracking arrays to false
            % handles same inputs as using "false()" normally
            obj.value = false(varargin{:});
            obj.assigned = false(size(obj.value));
        end
        function b = subsref(obj,S)
            % Override the indexing operator so that indexing works like it
            % would for a logical array unless accessing object properties
            % (S(1) is checked so chained access such as a.assigned(2) works)
            if strcmp(S(1).type,'.')
                b = builtin('subsref',obj,S);
            else
                b = builtin('subsref',obj.value,S);
            end
        end
        function obj = subsasgn(obj,S,B)
            % Override the assignment operator so that the value array is
            % updated when normal array indexing is used. In sync, update
            % the assigned state for the corresponding elements
            obj.value = builtin('subsasgn',obj.value,S,B);
            obj.assigned = builtin('subsasgn',obj.assigned,S,true(size(B)));
        end
        function disp(obj)
            % Override the disp function so printing to the command window
            % renders NaN for elements which haven't been assigned
            a = double(obj.value);
            a(~obj.assigned) = NaN;
            disp(a);
        end
    end
end
Test cases:
>> a = nanBool(3,1)
a =
NaN
NaN
NaN
>> a(2) = true
a =
NaN
1
NaN
>> a(3) = false
a =
NaN
1
0
>> a(:) = true
a =
1
1
1
>> whos a
  Name      Size            Bytes  Class      Attributes

  a         1x1                 6  nanBool

>> b = false(3,1); whos b
  Name      Size            Bytes  Class      Attributes

  b         3x1                 3  logical
Note that the whos test shows this custom class has the same memory footprint as two logical arrays of the same size. It also shows that the size is reported incorrectly (1x1 instead of 3x1), indicating we'd also have to override the size function in our custom class; I'm sure there are lots of other similar edge cases you'd want to handle.
You could check whether there are any "logical NaNs" (unassigned values) with something like this, or add a method that does this to the class:
fullyAssigned = all(a.assigned);
In R2021b and newer you can customize indexing for custom classes in a more controlled way instead of overriding subsref and subsasgn, but I can't test this:
https://uk.mathworks.com/help/matlab/customize-object-indexing.html
I have been wondering for a while which of the two following methods are faster or better.
MY CURRENT METHOD
I'm developing a chess game and the pieces are stored as numbers (really bytes to preserve memory) into a one-dimensional array. There is a position for the cursor corresponding to the index in the array. To access the piece at the current position in the array is easy (piece = pieces[cursorPosition]).
The problem is that getting the x and y values to check whether a move is valid requires the division and modulo operators (x = cursorPosition % 8; y = cursorPosition / 8).
Likewise when using x and y to check if moves are valid (you have to do it this way for reasons that would fill the entire page), you have to do something like - purely as an example - if pieces[y * 8 + x] != 0: movePiece = False. The obvious problem is having to do y * 8 + x a bunch of times to access the array.
Ultimately, this means that getting a piece is trivial but then getting the x and y requires another bit of memory and a very small amount of time to compute it each round.
A MORE TRADITIONAL METHOD
Using a two-dimensional array, one can implement the above process a little easier except for the fact that piece lookup is now a little harder and more memory is used. (I.e. piece = pieces[cursorPosition[0]][cursorPosition[1]] or piece = pieces[x][y]).
I don't think this is faster and it definitely doesn't look less memory intensive.
GOAL
My end goal is to have the fastest possible code that uses the least amount of memory. This will be developed for the Unix terminal (and potentially Windows CMD, if I can figure out how to represent the pieces without color using ANSI escape sequences). To connect people peer-to-peer to play chess, I will use either a secure (encrypted, with a defined protocol and structure) TCP connection or something else, and I don't know how much memory people will have, how fast their computers will be, or how strong their internet connections will be.
I also just want to learn to do this the best way possible and see if it can be done.
-
I suppose my question is one of the following:
Which of the above methods is better assuming that there are slightly more computations involving move validation (which means that the y * 8 + x has to be used a lot)?
or
Is there perhaps a method that combines the benefits of 1D and 2D arrays without as many of the drawbacks I described?
First, you should profile your code to make sure that this is really a bottleneck worth spending time on.
Second, if you're representing your position as an unsigned byte, decomposing it into X and Y coordinates will be very fast. If we use the following C code:
int getX(unsigned char pos) {
    return pos%8;
}
We get the following assembly with gcc 4.8 -O2:
getX(unsigned char):
    movl %edi, %eax
    andl $7, %eax
    ret
If we get the Y coordinate with:
int getY(unsigned char pos) {
    return pos/8;
}
We get the following assembly with gcc 4.8 -O2:
getY(unsigned char):
    shrb $3, %dil
    movzbl %dil, %eax
    ret
Neither function needs a division instruction: the modulo compiles to an AND with 7 and the division to a right shift by 3.
There is no short answer to this question; it all depends on how much time you spend optimizing.
On some architectures, two-dimensional arrays might work better than one-dimensional. On other architectures, bitmapped integers might be the best.
Do not worry about division and multiplication.
You're dividing, taking the modulo of, and multiplying by 8.
8 is a power of two, so any computer can use bitwise operations to achieve the same result (for unsigned or non-negative x):
(x * 8) is the same as (x << 3)
(x % 8) is the same as (x & (8 - 1))
(x / 8) is the same as (x >> 3)
Those operations normally take a single clock cycle, and on many modern superscalar architectures (including ARM) several of them can be issued per cycle, so their effective cost can be below one cycle.
Do not worry about using bitwise operators instead of *, % and /. If you're using a compiler that's less than a decade old, it'll optimize it for you and use bitwise operations.
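A small C sketch of the 1D board's index arithmetic written with these identities (the function names are hypothetical). It uses `uint8_t` on purpose, because for negative signed values `%` and `&` disagree in C (`-1 % 8` is `-1`, while `-1 & 7` is `7`):

```c
#include <stdint.h>

/* Square index 0..63 <-> (x, y) on an 8x8 board, using the power-of-two
   identities. Only valid for unsigned values. */
static inline unsigned square_x(uint8_t pos) { return pos & 7u; }  /* pos % 8 */
static inline unsigned square_y(uint8_t pos) { return pos >> 3; }  /* pos / 8 */

/* y * 8 + x, valid because x < 8 never overlaps the bits of y << 3 */
static inline uint8_t square_index(unsigned x, unsigned y) {
    return (uint8_t)((y << 3) | x);
}
```

A compiler will usually generate the same instructions for the `%`, `/` and `*` spellings, so the readable form costs nothing.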
What you should focus on instead, is how easy it will be for you to find out whether or not a move is legal, for instance. This will help your computer-player to "think quickly".
If you're using an 8*8 array, then it's easy to see where a rook (castle) can move by checking that only x or only y changes. For a queen, either only one of x and y changes (a rook-like move), or both change by the same number of steps (a bishop-like move).
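These geometric rules can be sketched in C on (x, y) coordinates as follows. The function names are hypothetical, and the checks ignore blocking pieces, which a real move validator would still have to scan for:

```c
#include <stdlib.h>

/* Geometry-only checks on an 8x8 board; pieces in the way are not considered. */

/* Exactly one of x and y changes. */
int rook_line(int x0, int y0, int x1, int y1) {
    return (x0 == x1) != (y0 == y1);
}

/* Both change, by the same number of steps. */
int bishop_line(int x0, int y0, int x1, int y1) {
    return x0 != x1 && abs(x1 - x0) == abs(y1 - y0);
}

int queen_line(int x0, int y0, int x1, int y1) {
    return rook_line(x0, y0, x1, y1) || bishop_line(x0, y0, x1, y1);
}
```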
If you use a one-dimensional array, you also have advantages.
But performance-wise, it might be a really good idea to use a 16x16 array or a 1x256 array.
Fill the entire array with 0x80 values (e.g. "illegal position"). Then fill the legal fields with 0x00.
If using a 1x256 array indexed as rank * 16 + file, you can check bits 3 and 7 of the index. If either of those is set, the file or the rank is greater than 7, so the position is outside the board.
Testing can be done this way:
if(position & 0x88)
{
/* move is illegal */
}
else
{
/* move is legal */
}
... or ...
if(0 == (position & 0x88))
{
/* move is legal */
}
'position' (the index) should be an unsigned byte (uint8_t in C). This way, you'll never have to worry about pointing outside the buffer.
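This layout is commonly known as the 0x88 board. A minimal C sketch, assuming square (file, rank) is stored at index rank * 16 + file (`on_board`, `sq88` and `knight_move_count` are hypothetical names), shows how it also makes knight moves cheap: a knight move becomes a fixed index offset followed by the single 0x88 test, with wrap-around past 0 or 255 landing on an off-board index:

```c
#include <stdint.h>

/* Square (file, rank) lives at index rank * 16 + file, so files 8..15 set
   bit 3 and ranks 8..15 set bit 7. */
static inline int on_board(uint8_t index) { return (index & 0x88) == 0; }
static inline uint8_t sq88(unsigned file, unsigned rank) {
    return (uint8_t)(rank * 16 + file);
}

/* Knight offsets in 0x88 coordinates (one rank = 16, one file = 1). */
static const int knight_offsets[8] = { 33, 31, 18, 14, -14, -18, -31, -33 };

/* Counts the knight moves from a square that stay on the board. */
int knight_move_count(uint8_t from) {
    int count = 0;
    for (int k = 0; k < 8; k++) {
        uint8_t to = (uint8_t)(from + knight_offsets[k]); /* wraps modulo 256 */
        if (on_board(to))
            count++;
    }
    return count;
}
```

For example, a knight in a corner has 2 moves and one on d4 has 8.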
Some people optimize their chess-engines by using 64-bit bitmapped integers.
While this is good for quickly comparing the positions, it has other disadvantages; for instance checking if the knight's move is legal.
It's not easy to say which is better, though.
Personally, I think the one-dimensional array in general might be the best way to do it.
I recommend getting familiar (very familiar) with AND, OR, XOR, bit-shifting and rotating.
See Bit Twiddling Hacks for more information.
I have one set of contiguous integer values and a corresponding set of non-contiguous values, for example:
0 -> 22
1 -> 712
2 -> 53
3 -> 12323
...
and so on.
The number of items is very large (about 10^9 to 10^10), so using a plain array is not an option.
Is there a data structure capable of fast mapping from the first values to the others with moderate memory requirements? For example:
ret = map(0); // returns 22
ret = map(3); // returns 12323
Edit: the values in this set are actually generated using a pseudo-random number generator, so I can't point to any specific distribution. The question is: is it possible to lower the memory requirements (possibly at the price of lookup speed)? I mean using something like "perfect hashing"; the time required to generate such a "perfect hash" doesn't matter.
As your range is contiguous, the obvious solution is to store your values in a contiguous int[]. Then value i is arr[i]. As the values are generated by a PRNG, it will be difficult to compress them further.
Another solution, which trades time for space, is to store the seed of your RNG and recalculate the values on the fly. This approach can be made faster, at the cost of more space, by storing intermediate seeds, e.g. the seed for keys 1000, 2000, etc.
You may be able to save some space by using exactly the number of bits required by each value. For example if your values are only 24 bits, you can save a byte over 32-bit integers. That said, there is only so much memory you can save.
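A minimal C sketch of the 24-bit case, assuming every value fits in 24 bits (`put24` and `get24` are hypothetical helper names). Each value takes 3 bytes instead of 4, so 10^10 values would need about 30 GB instead of 40 GB:

```c
#include <stddef.h>
#include <stdint.h>

/* Value i occupies bytes 3*i .. 3*i+2 of buf, least significant byte first.
   The caller provides a buffer of 3 * n bytes for n values. */
static inline void put24(uint8_t *buf, size_t i, uint32_t value) {
    uint8_t *p = buf + 3 * i;
    p[0] = (uint8_t)(value);
    p[1] = (uint8_t)(value >> 8);
    p[2] = (uint8_t)(value >> 16);
}

static inline uint32_t get24(const uint8_t *buf, size_t i) {
    const uint8_t *p = buf + 3 * i;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
}
```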
On 64-bit machines it would be feasible to mmap() a file into the address space, getting past the physical memory limit by using disk storage, at the price of performance.
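A minimal POSIX sketch of that idea (Linux or macOS assumed; `map_value_file` is a hypothetical helper name). The file is created or extended to the required size, and the kernel pages parts of it in and out as the array is accessed:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Maps a file holding `count` 32-bit values into memory, creating or
   extending it as needed. Returns NULL on failure. */
uint32_t *map_value_file(const char *path, size_t count) {
    size_t bytes = count * sizeof(uint32_t);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (uint32_t *)p;
}
```

After mapping, `values[i]` works like an ordinary array. Release the mapping with `munmap(values, count * sizeof(uint32_t))`.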
But since you mentioned using a pseudo-random generator to generate the values, how about just storing the RNG seed for specific indexes and calculating the rest of the values as needed? For example you could store the seed for indexes 0, 100, 200, ... and calculate e.g. 102 by re-seeding the RNG for 100 and calling the generator function three times.
Such an approach would reduce the memory needed by a large factor (100 in this case) and you could lessen the performance cost by bunching or caching your queries.
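A minimal C sketch of the checkpoint idea. Marsaglia's xorshift64 stands in for whatever generator actually produced the data (an assumption), value i is taken to be the (i+1)-th output after seeding, and the `Checkpoints` type and function names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in PRNG (xorshift64, seed must be nonzero); replace with the real one. */
static uint64_t xorshift64_next(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

#define CHECKPOINT_INTERVAL 100

typedef struct {
    uint64_t *states; /* generator state just before value k * INTERVAL */
    size_t ncheckpoints;
} Checkpoints;

/* One sequential pass records the state every CHECKPOINT_INTERVAL values,
   so memory drops by roughly that factor. */
int checkpoints_build(Checkpoints *cp, uint64_t seed, size_t count) {
    cp->ncheckpoints = (count + CHECKPOINT_INTERVAL - 1) / CHECKPOINT_INTERVAL;
    cp->states = malloc(cp->ncheckpoints * sizeof *cp->states);
    if (cp->states == NULL)
        return 0;
    uint64_t state = seed;
    for (size_t i = 0; i < count; i++) {
        if (i % CHECKPOINT_INTERVAL == 0)
            cp->states[i / CHECKPOINT_INTERVAL] = state;
        xorshift64_next(&state);
    }
    return 1;
}

/* Random access: restore the nearest earlier checkpoint and step forward
   at most CHECKPOINT_INTERVAL times. */
uint64_t checkpoints_value(const Checkpoints *cp, size_t i) {
    uint64_t state = cp->states[i / CHECKPOINT_INTERVAL];
    uint64_t v = 0;
    for (size_t k = 0; k <= i % CHECKPOINT_INTERVAL; k++)
        v = xorshift64_next(&state);
    return v;
}
```

The build pass still walks the whole sequence once. After that, each lookup costs at most one interval's worth of generator calls.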
If the range of your function is the set of numbers generated by a pseudo-random number generator in sequence then you can compress the series down to, well, to the code which generates the sequence plus the state of the PRNG before starting. For example, the (infinite) series of digits comprising the decimal expansion of pi is easily (and, technically, infinitely) compressed to the code to generate that series; your series could be seen as an example of something almost identical.
So, if you are willing to wait for a long time to get the last elements in the series, you can get very good compression, by writing your series not into a data structure but out of a function. That is at one end of your time/space trade-off spectrum.
At the other end of the spectrum is an array of all the numbers; this uses lots of space but gives very quick (O(1)) access to any desired element in the set. This doesn't seem to appeal to you for a variety of reasons, but I'm not sure that a cleverer data structure than an array will offer much space saving, or, for that matter, time saving.
The one obvious solution I see is to save a set of intermediate states of the PRNG at intervals, so your 'data' structure would become:
ret(0) = prng(seed, other_parameters, ...)
ret(10^5-1) = prng(seed', other_parameters, ...)
ret(2*(10^5)-1) = prng(seed'', other_parameters, ...)
and so on. Then, to get element 9765, say, you restore the PRNG state saved at ret(0) and generate the 9765th pseudo-random number after it.
Ok, so the intent is to trade speed for less memory usage.
Imagine that you have some sort of loop that fills the array.
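// some_int_psn_generator is hypothetical: it returns a pseudo-random int
// determined entirely by its argument
std::vector<int> array(intendedArraySize);
for (size_t z = 0; z < intendedArraySize; z++)
{
    array[z] = some_int_psn_generator(z);
}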
After which you can display the values.
for (size_t z = 0; z < intendedArraySize; z++)
{
std::cout << z << " " << array[z] << std::endl;
}
If that is indeed the case, and each value depends only on its index z, consider discarding the array altogether and simply recalculating the value each time.
for (size_t z = 0; z < intendedArraySize; z++)
{
std::cout << z << " " << some_int_psn_generator(z) << std::endl;
}
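If the values do not have to reproduce the output of one particular sequential generator (an assumption), a counter-based generator makes the recalculation approach work for any index in O(1) with no stored state at all. This C sketch hashes (seed, index) with the SplitMix64 mixing steps; `value_at` is a hypothetical name:

```c
#include <stdint.h>

/* Value for key z computed directly from (seed, z): no table and no
   sequential state. Each step (xor-shift, multiply by an odd constant)
   is a bijection on 64-bit integers. */
uint64_t value_at(uint64_t seed, uint64_t z) {
    uint64_t x = seed + (z + 1) * UINT64_C(0x9E3779B97F4A7C15);
    x = (x ^ (x >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)) * UINT64_C(0x94D049BB133111EB);
    return x ^ (x >> 31);
}
```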
I was reading http://comicjk.com/comic.php/906 where the problem of checking if one list is a permutation of another is presented and two solutions are proposed.
The 'c brain' solution "you know the lists will always contain four numbers or fewer, and each number will be less than 256, so we can byte-pack all the permutations into 32-bit ints and..."
The 'python brain' solution is to sort both and then compare them, this seems more obvious, but I am interested in a more efficient (and low level) solution.
My initial approach was:
int permutations(int a[4], int b[4]){
    unsigned int A = a[0] | a[1]<<8 | a[2]<<16 | (unsigned)a[3]<<24;
    unsigned int B = b[0] | b[1]<<8 | b[2]<<16 | (unsigned)b[3]<<24;
    unsigned int c=0, i=0;
    for( i=0xFF; i>0; i<<=8 ){
        /* parentheses needed: == binds tighter than & */
        if(
            (A&i) == (B&0xFF) ||
            (A&i) == (B&0xFF00) ||
            (A&i) == (B&0xFF0000) ||
            (A&i) == (B&0xFF000000)
        ) c |= i;
    }
    if( c == 0xFFFFFFFF )
        return 1;
    return 0;
}
But this can't work unless I can find an easy way to align both A&i and the masked byte of B at the same byte position (removing any trailing 0s after the byte we are looking at).
So something like
(a&i>>al)>>ar == (b&j>>bl)>>br
where al+ar == bl+br == 4 and are used to determine which byte we are examining.
Another approach
Someone in the comments box said "In C, why not simply dynamically allocate a section of memory of appropriate size and treat it as a single number?
True it'd be a bit slower than using an int but it'd also not be restricted to contain four or fewer elements or maximum number of 256, and still be faster than sorting (whether in C or Python) ..."
If we had an array whose length in bits is greater than our highest number, then we could set the appropriate bits and compare the arrays, but this requires many more comparisons (one per word) unless we can efficiently treat the array as one large number in C.
In x86 (which I have just started learning) we have the SBB (subtract with borrow) instruction, so we could subtract each part and, if the results are all zero (which we could test with JNE/JNZ), the arrays are equal.
As far as I can tell, we would still have to do one SBB and one jump per word.
Actual question
I would like to know how
byte packing
treating the whole list as one large number
can be used to check if a list is a permutation of another (assuming the lists are no longer than 4 items and each item is < 256)
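One way to make the "byte-pack" idea work is to make the packed key independent of order: sort the four bytes first, then pack them into a `uint32_t`. Two lists are then permutations of each other exactly when their keys are equal. This is a minimal C sketch (hypothetical names), assuming both lists have exactly four entries, since padding shorter lists with 0 would make {1,2,3} and {0,1,2,3} collide:

```c
#include <stdint.h>

/* Order-independent 32-bit key: sort the four bytes with a 5-comparator
   sorting network, then pack them. */
static uint32_t perm_key(const uint8_t v[4]) {
    uint8_t s[4] = { v[0], v[1], v[2], v[3] };
    static const int net[5][2] = { {0, 1}, {2, 3}, {0, 2}, {1, 3}, {1, 2} };
    for (int k = 0; k < 5; k++) {
        int i = net[k][0], j = net[k][1];
        if (s[i] > s[j]) {
            uint8_t t = s[i];
            s[i] = s[j];
            s[j] = t;
        }
    }
    return (uint32_t)s[0] | ((uint32_t)s[1] << 8) |
           ((uint32_t)s[2] << 16) | ((uint32_t)s[3] << 24);
}

int is_permutation4(const uint8_t a[4], const uint8_t b[4]) {
    return perm_key(a) == perm_key(b);
}
```

The whole comparison then becomes one 32-bit equality test, which is the "treat the list as one large number" step.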
Optimization assuming that the result is probably "no": calculate the sum (or the xor, or some other inexpensive, associative, commutative operator) of each list. If the sums differ, the answer is no without further testing. If the sums are the same, then perform a more expensive test to get a definitive answer.
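A minimal C sketch of this quick-reject test (hypothetical names), using both the sum and the xor as the cheap signature. Equal signatures only mean "maybe": for example {0, 3} and {1, 2} have the same sum (3) and the same xor (3) without being permutations of each other:

```c
#include <stddef.h>
#include <stdint.h>

/* Order-independent signature: identical for any permutation of a list. */
typedef struct {
    uint32_t sum;
    uint8_t xor_all;
} Signature;

static Signature signature(const uint8_t *v, size_t n) {
    Signature s = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        s.sum += v[i];
        s.xor_all ^= v[i];
    }
    return s;
}

/* 0: certainly not permutations. 1: might be, so run an exact test next. */
int maybe_permutation(const uint8_t *a, const uint8_t *b, size_t n) {
    Signature sa = signature(a, n), sb = signature(b, n);
    return sa.sum == sb.sum && sa.xor_all == sb.xor_all;
}
```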
I think you just want a hash that is not affected by permutations. Addition was offered as one method. I was thinking of a method that would be compatible with a bloom filter, so you could do more things with it.
A bloom filter would work with lists of arbitrary lengths and numbers of arbitrary size. It could be used to check whether a list might be a permutation of any list in a group of lists, or whether an element might exist in a list.
A bloom filter is basically an array of bits. You just 'or' the bits of the elements making up your list together to produce the bloom filter. Any list with the same elements in any order will have the same bits set. For small lists you can get away with using integer sized numbers for the bit arrays:
unsigned char b = a1|a2|a3|a4; // where a1..n are items (8 bit numbers)
if you had item a1 and a list with bloom b and wanted to know if a1 was in the list:
fPossibleMatch = ((a1&b) == a1);
If you had two lists of arbitrary lengths with blooms b1 and b2 and wanted to know if all items of b1 might exist in b2:
fPossibleMatch = ((b1&b2) == b1);
If you wanted to know whether the lists with blooms b1 and b2, which have the same number of elements, were permutations of each other:
fPossibleMatch = (b1==b2);
To cut down on false positives, widen the bloom filter. If we used a 64 bit bloom, we could use this arbitrarily chosen algorithm to spread bits out:
unsigned long long b = ((unsigned long long)a1 << (a1&0x1F)) | ((unsigned long long)a2 << (a2&0x1F)) | ((unsigned long long)a3 << (a3&0x1F)) | ((unsigned long long)a4 << (a4&0x1F));
I have a feeling that my algorithm to widen the bloom is not any good. It might just set all the bits to mush. Someone else might know of a better way. I think you get the idea though.
I think this is better:
#define MUNGE(a) ((unsigned long long)(a) << (((a)&7)<<3))
unsigned long long b = MUNGE(a1)|MUNGE(a2)|MUNGE(a3)|MUNGE(a4);
I'm not good at creating hashes.
You still have to double check any lists that have matching bloom filters. The number of false positives increases with increasing list length and element size. False positives decrease with increasing bloom size.
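A different, more conventional single-hash variant (not the MUNGE scheme above) lets each element set one bit, `value % 64`, of a 64-bit filter. This is a minimal C sketch with hypothetical names. Because duplicates collapse into the same bit, equal filters are only a hint, exactly as described above:

```c
#include <stddef.h>
#include <stdint.h>

/* One-hash Bloom-style filter: each element sets bit (value % 64). */
uint64_t bloom64(const uint8_t *v, size_t n) {
    uint64_t b = 0;
    for (size_t i = 0; i < n; i++)
        b |= UINT64_C(1) << (v[i] & 63);
    return b;
}

/* 1 if x might be in the list summarised by b, 0 if it definitely is not. */
int bloom64_may_contain(uint64_t b, uint8_t x) {
    uint64_t bit = UINT64_C(1) << (x & 63);
    return (b & bit) == bit;
}
```

For instance, 200 and 8 share bit 8, so a filter containing 200 also reports 8 as a possible member.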
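Use a hash table and iterate through each list, counting how often each element occurs in the first list and decrementing those counts for the second. This gives a solution that requires O(n) time and O(n) memory.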
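Since every element here is below 256, a plain 256-entry count array can serve as a perfect hash table. This is a minimal C sketch (`is_permutation` is a hypothetical name) that gives an exact answer and also handles duplicates and lists longer than four:

```c
#include <stddef.h>
#include <stdint.h>

/* Exact O(n) permutation test for two lists of n bytes each. */
int is_permutation(const uint8_t *a, const uint8_t *b, size_t n) {
    int counts[256] = { 0 };
    for (size_t i = 0; i < n; i++) {
        counts[a[i]]++;  /* occurrences in the first list */
        counts[b[i]]--;  /* cancelled by occurrences in the second */
    }
    for (int v = 0; v < 256; v++)
        if (counts[v] != 0)
            return 0;
    return 1;
}
```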
For my university project I'm simulating a process called random sequential adsorption.
One of the things I have to do involves randomly depositing squares (which cannot overlap) onto a lattice until there is no more room left, repeating the process several times in order to find the average 'jamming' coverage %.
Basically I'm performing operations on a large array of integers, each of which can take one of 3 possible values: 0, 1 and 2. The sites marked with '0' are empty, the sites marked with '1' are full. Initially the array is defined like this:
int i, j;
int n = 1000000000;
int array[n][n];
for(j = 0; j < n; j++)
{
    for(i = 0; i < n; i++)
    {
        array[i][j] = 0;
    }
}
Say I want to deposit 5*5 squares randomly on the array (that cannot overlap), so that the squares are represented by '1's. This would be done by choosing the x and y coordinates randomly and then creating a 5*5 square of '1's with the top-left point of the square starting at that point. I would then mark sites near the square as '2's. These represent the sites that are unavailable, since depositing a square at those sites would cause it to overlap an existing square. This process would continue until there is no more room left to deposit squares on the array (basically, no more '0's left on the array).
Anyway, to the point. I would like to make this process as efficient as possible, by using bitwise operations. This would be easy if I didn't have to mark sites near the squares. I was wondering whether creating a 2-bit number would be possible, so that I can account for the sites marked with '2'.
Sorry if this sounds really complicated, I just wanted to explain why I want to do this.
You can't create a datatype that is 2-bits in size since it wouldn't be addressable. What you can do is pack several 2-bit numbers into a larger cell:
struct Cell {
    unsigned int a : 2;
    unsigned int b : 2;
    unsigned int c : 2;
    unsigned int d : 2;
};
This specifies that each of the members a, b, c and d should occupy two bits in memory.
EDIT: This is just an example of how to create 2-bit variables, for the actual problem in question the most efficient implementation would probably be to create an array of int and wrap up the bit fiddling in a couple of set/get methods.
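A minimal C sketch of such a wrapper, here over an array of bytes with four 2-bit cells per byte (the `TwoBitArray` type and `twobit_*` names are hypothetical). For the lattice, cell (x, y) would be stored at index y * width + x:

```c
#include <stdint.h>
#include <stdlib.h>

/* Packed array of 2-bit cells (values 0..3), four cells per byte.
   Cell i lives in byte i / 4, at bit offset 2 * (i % 4). */
typedef struct {
    uint8_t *bytes;
    size_t ncells;
} TwoBitArray;

/* All cells start at 0 (empty). Returns 0 on allocation failure. */
int twobit_init(TwoBitArray *a, size_t ncells) {
    a->bytes = calloc((ncells + 3) / 4, 1);
    a->ncells = ncells;
    return a->bytes != NULL;
}

unsigned twobit_get(const TwoBitArray *a, size_t i) {
    return (a->bytes[i / 4] >> (2 * (i % 4))) & 3u;
}

void twobit_set(TwoBitArray *a, size_t i, unsigned value) {
    unsigned shift = 2 * (i % 4);
    uint8_t cleared = (uint8_t)(a->bytes[i / 4] & ~(3u << shift)); /* clear the cell */
    a->bytes[i / 4] = (uint8_t)(cleared | ((value & 3u) << shift));
}
```

This stores the 0/1/2 site states in a quarter of the space of a byte array.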
Instead of a two-bit array you could use two separate 1-bit arrays. One holds filled squares and one holds adjacent squares (or available squares if this is more efficient).
I'm not really sure that this has any benefit though over packing 2-bit fields into words.
I'd go for byte arrays unless you are really short of memory.
The basic idea
Unfortunately, there is no way to do this directly in C. You can create arrays of 1-byte, 2-byte, etc. elements, but you can't create arrays of bits.
The best thing you can do, then, is to write a new library for yourself, which makes it look like you're dealing with arrays of 2 bits, but in reality does a lot of hard work. The same way that the string libraries give you functions that work on "strings" (which in C are just arrays), you'll be creating a new library which works on "bit arrays" (which in reality will be arrays of integers, with a few special functions to deal with them as-if they were arrays of bits).
NOTE: If you're new to C, and haven't learned the ideas of "creating a new library/module", or the concept of "abstraction", then I'd recommend learning about them before you continue with this project. Understanding them is IMO more important than optimizing your program to use a little less space.
How to implement this new "library" or module
For your needs, I'd create a new module called "2-bit array", which exports functions for dealing with the 2-bit arrays, as you need them.
It would have a few functions that deal with setting/reading bits, so that you can work with it as if you have an actual array of bits (you'll actually have an array of integers or something, but the module will make it seem like you have an array of bits).
Using this module would look something like this:
// This is just an example of how to use the functions in the twoBitArray library.
twoB my_array = Create2BitArray(size); // This will "create" a twoBitArray and return it.
SetBit(my_array, 5, 1);                // Set element 5 to 1
bit b = GetBit(my_array, 5);           // Where bit is typedefed to an int by your module.
What the module will actually do is implement all these functions using regular-old arrays of integers.
For example, for GetBit(my_array, 17) the function will calculate which integer of your array holds element 17 and where inside that integer the element sits (with 32-bit ints holding 16 two-bit elements each, that is element 1 of integer 1, counting from 0), and return it using bitwise operations.
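You can compact one dimension of the array into sub-byte cells. To convert a coordinate (let's say x) to a position inside a byte holding four 2-bit cells:
byte cell = array[i][ x / 4 ];
byte shift = 2 * (x % 4);            // each cell is 2 bits wide
byte mask = 0x03 << shift;
byte data = (cell & mask) >> shift;
To write data, do the reverse: clear the cell's bits, then OR in the new value:
array[i][ x / 4 ] = (cell & ~mask) | ((data & 0x03) << shift);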