For a university project I'm simulating a process called random sequential adsorption.
One of the things I have to do involves randomly depositing squares (which cannot overlap) onto a lattice until there is no more room left, repeating the process several times in order to find the average 'jamming' coverage %.
Basically I'm performing operations on a large array of integers whose entries take one of 3 possible values: 0, 1 and 2. The sites marked with '0' are empty, and the sites marked with '1' are full. Initially the array is defined like this:
int i, j;
int n = 1000000000;
int array[n][n];

for (j = 0; j < n; j++)
{
    for (i = 0; i < n; i++)
    {
        array[i][j] = 0;
    }
}
Say I want to deposit 5*5 squares randomly on the array (which cannot overlap), so that the squares are represented by '1's. This would be done by choosing the x and y coordinates randomly and then creating a 5*5 square of '1's with the top-left point of the square starting at that point. I would then mark the sites near the square as '2's. These represent the sites that are unavailable, since depositing a square at those sites would cause it to overlap an existing square. This process would continue until there is no more room left to deposit squares on the array (basically, no more '0's left in the array).
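For concreteness, here is a minimal sketch of one deposition attempt (the lattice size N, square size K, and the helper name try_deposit are all invented for illustration; the real simulation would keep retrying random positions until no '0' sites remain):

#define N 1024   /* illustrative lattice size */
#define K 5      /* square size */

static unsigned char lattice[N][N];   /* 0 = empty, 1 = full, 2 = blocked */

/* Try to place one K*K square with its top-left corner at (x, y). */
int try_deposit(int x, int y)
{
    if (x + K > N || y + K > N)
        return 0;
    for (int j = y; j < y + K; j++)
        for (int i = x; i < x + K; i++)
            if (lattice[i][j] != 0)
                return 0;                 /* would overlap */
    for (int j = y; j < y + K; j++)
        for (int i = x; i < x + K; i++)
            lattice[i][j] = 1;            /* deposit the square */
    /* Mark every empty site whose top-left placement would overlap this square. */
    for (int j = y - (K - 1); j < y + K; j++)
        for (int i = x - (K - 1); i < x + K; i++)
            if (i >= 0 && j >= 0 && i < N && j < N && lattice[i][j] == 0)
                lattice[i][j] = 2;
    return 1;
}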
Anyway, to the point. I would like to make this process as efficient as possible, by using bitwise operations. This would be easy if I didn't have to mark sites near the squares. I was wondering whether creating a 2-bit number would be possible, so that I can account for the sites marked with '2'.
Sorry if this sounds really complicated, I just wanted to explain why I want to do this.
You can't create a datatype that is 2 bits in size, since it wouldn't be addressable. What you can do is pack several 2-bit numbers into a larger cell:

struct Cell {
    unsigned a : 2;
    unsigned b : 2;
    unsigned c : 2;
    unsigned d : 2;
};
This specifies that each of the members a, b, c and d should occupy two bits in memory.
EDIT: This is just an example of how to create 2-bit variables, for the actual problem in question the most efficient implementation would probably be to create an array of int and wrap up the bit fiddling in a couple of set/get methods.
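A minimal sketch of that suggestion (the names set2/get2 are illustrative), packing sixteen 2-bit cells into each 32-bit word:

#include <stdint.h>
#include <stddef.h>

/* 16 two-bit cells per 32-bit word; v must fit in 2 bits (0, 1 or 2 here). */
void set2(uint32_t *a, size_t i, unsigned v)
{
    size_t word = i / 16;
    unsigned shift = (i % 16) * 2;
    a[word] = (a[word] & ~(3u << shift)) | ((v & 3u) << shift);
}

unsigned get2(const uint32_t *a, size_t i)
{
    return (a[i / 16] >> ((i % 16) * 2)) & 3u;
}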
Instead of a two-bit array you could use two separate 1-bit arrays. One holds filled squares and one holds adjacent squares (or available squares if this is more efficient).
I'm not really sure that this has any benefit though over packing 2-bit fields into words.
I'd go for byte arrays unless you are really short of memory.
The basic idea
Unfortunately, there is no way to do this in C. You can create arrays of 1 byte, 2 bytes, etc., but you can't create arrays of bits.
The best thing you can do, then, is to write a new library for yourself, which makes it look like you're dealing with arrays of 2 bits, but in reality does a lot of hard work. The same way that the string libraries give you functions that work on "strings" (which in C are just arrays), you'll be creating a new library which works on "bit arrays" (which in reality will be arrays of integers, with a few special functions to deal with them as if they were arrays of bits).
NOTE: If you're new to C, and haven't learned the ideas of "creating a new library/module", or the concept of "abstraction", then I'd recommend learning about them before you continue with this project. Understanding them is IMO more important than optimizing your program to use a little less space.
How to implement this new "library" or module
For your needs, I'd create a new module called "2-bit array", which exports functions for dealing with the 2-bit arrays, as you need them.
It would have a few functions that deal with setting/reading bits, so that you can work with it as if you have an actual array of bits (you'll actually have an array of integers or something, but the module will make it seem like you have an array of bits).
Using this module would look something like this:

// This is just an example of how to use the functions in the twoBitArray library.
twoB my_array = Create2BitArray(size); // This will "create" a twoBitArray and return it.
SetBit(my_array, 5, 1); // Set bit 5 to 1.
bit b = GetBit(my_array, 5); // Where bit is typedef'ed to an int by your module.
What the module will actually do is implement all these functions using regular-old arrays of integers.
For example, for GetBit(my_arr, 17), the function will calculate which integer of your array holds bit 17 and which bit within that integer it is (with 32-bit ints, that's bit 17 % 32 = 17 of integer 17 / 32 = 0; the split depends on sizeof(int), obviously), and you'd return it by using bitwise operations.
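The internals might look something like this (a sketch, assuming twoB is typedef'ed to unsigned * and the names match the usage example above):

#include <limits.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

int GetBit(const unsigned *arr, unsigned long n)
{
    return (arr[n / WORD_BITS] >> (n % WORD_BITS)) & 1u;
}

void SetBit(unsigned *arr, unsigned long n, int value)
{
    if (value)
        arr[n / WORD_BITS] |= 1u << (n % WORD_BITS);
    else
        arr[n / WORD_BITS] &= ~(1u << (n % WORD_BITS));
}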
You can compact one dimension of the array into sub-integer cells, say four 2-bit cells per byte. To read the cell for a coordinate (let's say x, for example):

byte cell = array[i][ x / 4 ];
byte mask = 0x03 << (2 * (x % 4));
byte data = (cell & mask) >> (2 * (x % 4));

To write data, do the reverse.
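Spelled out, the write might look like this (same assumptions as above, with byte some unsigned 8-bit type):

byte cell = array[i][x / 4];
cell &= ~(0x03 << (2 * (x % 4)));        /* clear the 2-bit field */
cell |= (data & 0x03) << (2 * (x % 4));  /* insert the new value  */
array[i][x / 4] = cell;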
I need to make an algorithm (formula, function) using AND, OR, XOR, NEG, SHIFT, NOT etc. which calculates an element of an array from its index,
where the size of each element is one byte,
e.g. element = index ^ constant, where the constant is array[index] ^ index (previously calculated).
This will work only if the array size is less than 256.
How do you make a byte from an index when the index is bigger than 1 byte?
The same way; however, there will be duplicates, as there are only 256 possible values in a byte, so if your array is bigger than 256 there must be duplicates.
To avoid obvious mirroring you cannot use monotonic functions. For example,
value[ix] = ix
is monotonic, so it will produce a sawtooth-like shape, mirroring the content of the array every 256 bytes. To avoid this you need to combine more stuff together. It's similar to writing your own pseudo-random generator. The usual approaches are:
modular arithmetic
something like:
value[ix]=( ( c0*ix + c1*ix*ix + c2*ix*ix*ix )%prime )&255
If the constants c0, c1, c2 and prime are big enough, the output looks random, so there will be far fewer repeated patterns visible in the output ... but you need to use arithmetic of a bit width that can hold the prime ...
In case you are hitting the upper bound of your arithmetic's bit width, you need to use modmul, modpow to avoid overflows. See:
Modular arithmetics and NTT (finite field DFT) optimizations
swapping bits
Simply do some math on your ix where you also use ix with its bits swapped. That will change the monotonic properties a lot... This approach works best on a cumulative sub-result, however, which is not your case. I would try:
value[ix]=( ix + ((ix<<3)*5) - ((ix>>2)*7) + ((3*ix)^((ix<<4)|(ix>>4))) )&255
Playing with the constants and operators achieves different results. However, with this approach you need to check validity (which I did not!). So render a graph for the first few values (say 1024) where the x axis is ix and the y axis is value[ix]. There you should see whether the output repeats or saturates towards some value, and if it does, change the equation.
For more info see How to seed to generate random numbers?
Of course, after all this it's not possible to recover ix from value[ix] ...
I need to create an array with 3 billion boolean variables. My memory is only 4GB, therefore I need this array to be very tight (at most one byte per variable). Theoretically this should be possible. But I found that Ruby uses way too much space for one boolean variable in an array.
ObjectSpace.memsize_of(Array.new(100, false)) #=> 840
That's more than 8 bytes per variable. I would like to know if there's a more lightweight implementation of C-arrays in Ruby.
Apart from a small memory profile, I also need each boolean in this array to be quickly accessible, because I need to flip them as fast as possible on demand.
Ruby isn't a well-performing language, especially in memory use. As others said, you should pack your booleans into numbers. You'll lose a lot of memory due to Ruby's 'objectification'. If that is still too costly for you, you may pack the bits into long strings and store the strings in an array, losing less memory.
http://calleerlandsson.com/2014/02/06/rubys-bitwise-operators/
You can also implement your own gem in C++, which can naturally use bits and doubles, losing less memory. An array of doubles means 64 booleans in each position, more than sufficient for your application.
Extremely large collections are always a problem and will require you to implement a lot to make working with them easier. At the very least you'll have to implement some kind of method to access a given position in an array of objects that each store more than one boolean, and another to flip them.
The following class may not be exactly what you're looking for. It will store 1's or 0's into an array using bits and shifting. Entries default to 0. If you need three states for each entry, 0, 1, or nil, then you'd need to change it to use two bits for each entry, rather than one.
class BitArray < Array
  BITS_PER_WORD = 0.size * 8   # native integer size in bits

  def []=(n, value_0_or_1)
    word = word_at(n / BITS_PER_WORD) || 0
    word &= ~(1 << (n % BITS_PER_WORD))   # clear the target bit only
    super(n / BITS_PER_WORD, (value_0_or_1 << (n % BITS_PER_WORD)) | word)
  end

  def [](n)
    return 0 if word_at(n / BITS_PER_WORD).nil?
    (super(n / BITS_PER_WORD) >> (n % BITS_PER_WORD)) & 1
  end

  def word_at(n)
    Array.instance_method('[]').bind(self).call(n)
  end
end
I have been wondering for a while which of the two following methods is faster or better.
MY CURRENT METHOD
I'm developing a chess game and the pieces are stored as numbers (really bytes to preserve memory) into a one-dimensional array. There is a position for the cursor corresponding to the index in the array. To access the piece at the current position in the array is easy (piece = pieces[cursorPosition]).
The problem is that getting the x and y values for checking whether a move is valid requires the division and modulo operators (x = cursorPosition % 8; y = cursorPosition / 8).
Likewise when using x and y to check if moves are valid (you have to do it this way for reasons that would fill the entire page), you have to do something like - purely as an example - if pieces[y * 8 + x] != 0: movePiece = False. The obvious problem is having to do y * 8 + x a bunch of times to access the array.
Ultimately, this means that getting a piece is trivial but then getting the x and y requires another bit of memory and a very small amount of time to compute it each round.
A MORE TRADITIONAL METHOD
Using a two-dimensional array, one can implement the above process a little more easily, except that piece lookup is now a little harder and more memory is used (i.e. piece = pieces[cursorPosition[0]][cursorPosition[1]] or piece = pieces[x][y]).
I don't think this is faster and it definitely doesn't look less memory intensive.
GOAL
My end goal is to have the fastest possible code that uses the least amount of memory. This will be developed for the Unix terminal (and potentially Windows CMD, if I can figure out how to represent the pieces without color using ANSI escape sequences). I will either be using a secure (encrypted, with protocol and structure) TCP connection to connect people p2p to play chess, or something else, and I don't know how much memory people will have, how fast their computers will be, or how strong their internet connections will be.
I also just want to learn to do this the best way possible and see if it can be done.
-
I suppose my question is one of the following:
Which of the above methods is better assuming that there are slightly more computations involving move validation (which means that the y * 8 + x has to be used a lot)?
or
Is there perhaps a method that combines the benefits of 1D and 2D arrays without as many drawbacks as I described?
First, you should profile your code to make sure that this is really a bottleneck worth spending time on.
Second, if you're representing your position as an unsigned byte, decomposing it into X and Y coordinates will be very fast. If we use the following C code:
int getX(unsigned char pos) {
    return pos % 8;
}
We get the following assembly with gcc 4.8 -O2:
getX(unsigned char):
    movl %edi, %eax
    andl $7, %eax
    ret
If we get the Y coordinate with:
int getY(unsigned char pos) {
    return pos / 8;
}
We get the following assembly with gcc 4.8 -O2:
getY(unsigned char):
    shrb $3, %dil
    movzbl %dil, %eax
    ret
There is no short answer to this question; it all depends on how much time you spend optimizing.
On some architectures, two-dimensional arrays might work better than one-dimensional. On other architectures, bitmapped integers might be the best.
Do not worry about division and multiplication.
You're dividing, taking the modulo, and multiplying by 8.
This number is a power of two, so any computer can use bitwise operations to achieve the result.
(x * 8) is the same as (x << 3)
(x % 8) is the same as (x & (8 - 1))
(x / 8) is the same as (x >> 3)
Those operations are normally performed in a single clock cycle; on many modern superscalar architectures (including ARM), several such operations can even execute per cycle.
Do not worry about using bitwise operators instead of *, % and /. If you're using a compiler that's less than a decade old, it'll optimize it for you and use bitwise operations.
What you should focus on instead, is how easy it will be for you to find out whether or not a move is legal, for instance. This will help your computer-player to "think quickly".
If you're using an 8*8 array, then it's easy to see where a rook (castle) can move, by checking whether only x or only y changes. For the queen, either x or y must stay the same, or x must change by the same number of steps as y.
If you use a one-dimensional array, you also have advantages.
But performance-wise, it might be a really good idea to use a 16x16 array or a 1x256 array.
Fill the entire array with 0x80 values (e.g. "illegal position"). Then fill the legal fields with 0x00.
If using a 1x256 array, you can check bits 3 and 7 of the index. If either of those is set, then the position is outside the board.
Testing can be done this way:
if (position & 0x88)
{
    /* move is illegal */
}
else
{
    /* move is legal */
}
... or ...
if (0 == (position & 0x88))
{
    /* move is legal */
}
'position' (the index) should be an unsigned byte (uint8_t in C). This way, you'll never have to worry about pointing outside the buffer.
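As an illustration of why this layout is convenient, here is a hypothetical sketch of generating knight moves with the 0x88 test (the offsets are the standard 0x88 knight deltas; the function name is invented):

#include <stdint.h>

void knight_targets(uint8_t from)
{
    /* rank step = 16, file step = 1: the eight knight deltas */
    static const int deltas[8] = { 33, 31, 18, 14, -14, -18, -31, -33 };
    for (int i = 0; i < 8; i++) {
        int to = from + deltas[i];
        if (to >= 0 && to <= 255 && !(to & 0x88)) {
            /* 'to' is on the board; test it further */
        }
    }
}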
Some people optimize their chess-engines by using 64-bit bitmapped integers.
While this is good for quickly comparing the positions, it has other disadvantages; for instance checking if the knight's move is legal.
It's not easy to say which is better, though.
Personally, I think the one-dimensional array in general might be the best way to do it.
I recommend getting familiar (very familiar) with AND, OR, XOR, bit-shifting and rotating.
See Bit Twiddling Hacks for more information.
I came across this post, which reports the following interview question:
Given two arrays of numbers, find if each of the two arrays have the same set of integers? Suggest an algo which can run faster than NlogN without extra space?
The best that I can think of is the following:
(a) sort each array, and then (b) have two pointers moving along the two arrays and check if you find different values ... but step (a) already has NlogN complexity :(
(a) scan the shortest array and put values into a map, and then (b) scan the second array and check if you find a value that is not in the map ... here we have linear complexity, but I use extra space
... so, I can't think of a solution for this question.
Ideas?
Thank you for all the answers. I feel many of them are right, but I decided to choose ruslik's one, because it gives an interesting option that I did not think about.
You can try a probabilistic approach by choosing a commutative function for accumulation (e.g., addition or XOR) and a parametrized hash function.
unsigned addition(unsigned a, unsigned b);
unsigned hash(int n, int h_type);

unsigned hash_set(int* a, int num, int h_type) {
    unsigned rez = 0;
    for (int i = 0; i < num; i++)
        rez = addition(rez, hash(a[i], h_type));
    return rez;
}
In this way, the number of tries needed before you can decide that the probability of a false positive is below a certain threshold does not depend on the number of elements, so it will be linear.
EDIT: In the general case, the probability of the sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: to decide as fast as possible if they are surely different or if there is a chance of them being equivalent, in which case a slow deterministic method should be used. The final average complexity will be O(n), but the worst-case scenario will have the complexity of the deterministic method.
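A hypothetical usage sketch of the prefilter (probably_equal is an invented name; tries controls the false-positive probability, and both arrays are assumed to have the same length num):

/* Any mismatch proves the multisets differ; agreement on all tries
   means "probably equal" and a deterministic check should follow. */
int probably_equal(int *a, int *b, int num, int tries)
{
    for (int t = 0; t < tries; t++)
        if (hash_set(a, num, t) != hash_set(b, num, t))
            return 0;   /* definitely different */
    return 1;           /* probably the same */
}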
You said "without extra space" in the question but I assume that you actually mean "with O(1) extra space".
Suppose that all the integers in the arrays are less than k. Then you can use in-place radix sort to sort each array in time O(n log k) with O(log k) extra space (for the stack, as pointed out by yi_H in comments), and compare the sorted arrays in time O(n log k). If k does not vary with n, then you're done.
I'll assume that the integers in question are of fixed size (e.g. 32-bit).
Then, radix-quicksorting both arrays in place (aka "binary quicksort") is constant space and O(n).
In the case of unbounded integers, I believe (but cannot prove, even though it is probably doable) that you cannot break the O(n k) barrier, where k is the number of digits of the greatest integer in either array.
Whether this is better than O(n log n) depends on how k is assumed to scale with n, and therefore depends on what the interviewer expects of you.
A special, no harder case is when one array holds 1, 2, ..., n. This was discussed many times:
How to tell if an array is a permutation in O(n)?
Algorithm to determine if array contains n...n+m?
mathoverflow
and despite many tries no deterministic solutions using O(1) space and O(n) time were shown. Either you can cheat the requirements in some way (reuse input space, assume integers are bounded) or use probabilistic test.
Probably this is an open problem.
Here is a co-RP algorithm:
In linear time, iterate over the first array (A), building the polynomial
Pa = (A[0] - x)(A[1] - x)...(A[n-1] - x). Do the same for array B, naming this polynomial Pb.
We now want to answer the question "is Pa = Pb?" We can check this probabilistically as follows. Select a number r uniformly at random from the range [0...4n] and compute d = Pa(r) - Pb(r) in linear time. If d = 0, return true; otherwise return false.
Why is this valid? First of all, observe that if the two arrays contain the same elements, then Pa = Pb, so Pa(r) = Pb(r) for all r. With this in mind, we can easily see that this algorithm will never erroneously reject two identical arrays.
Now we must consider the case where the arrays are not identical. By the Schwartz-Zippel Lemma, P(Pa(r) - Pb(r) = 0 | Pa != Pb) < (n/4n). So the probability that we accept the two arrays as equivalent when they are not is < (1/4).
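A sketch of how the evaluation might be coded, assuming 32-bit array elements and working modulo a prime p < 2^32 so the products fit in 64 bits (the modular reduction is an added assumption; the argument above works over the integers):

#include <stdint.h>
#include <stddef.h>

/* Evaluate (a[0] - r)(a[1] - r)...(a[n-1] - r) mod p. */
uint64_t eval_poly(const int32_t *a, size_t n, uint64_t r, uint64_t p)
{
    uint64_t acc = 1 % p;
    for (size_t i = 0; i < n; i++) {
        int64_t t = ((int64_t)a[i] - (int64_t)(r % p)) % (int64_t)p;
        uint64_t term = (uint64_t)(t < 0 ? t + (int64_t)p : t);
        acc = (acc * term) % p;   /* both factors < p < 2^32: no overflow */
    }
    return acc;
}

/* The arrays are probably equal as multisets if, for random r,
   eval_poly(a, n, r, p) == eval_poly(b, n, r, p). */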
The usual assumption for these kinds of problems is Theta(log n)-bit words, because that's the minimum needed to index the input.
sshannin's polynomial-evaluation answer works fine over finite fields, which sidesteps the difficulties with limited-precision registers. All we need is a prime of the appropriate size (easy to find under the same assumptions that support a lot of public-key crypto) or an irreducible polynomial in (Z/2)[x] of the appropriate degree (the difficulty here is multiplying polynomials quickly, but I think the algorithm would be o(n log n)).
If we can modify the input with the restriction that it must maintain the same set, then it's not too hard to find space for radix sort. Select the (n/log n)th element from each array and partition both arrays. Sort the size-(n/log n) pieces and compare them. Now use radix sort on the size-(n - n/log n) pieces. From the previously processed elements, we can obtain n/log n bits, where bit i is on if a[2*i] > a[2*i + 1] and off if a[2*i] < a[2*i + 1]. This is sufficient to support a radix sort with n/(log n)^2 buckets.
In the algebraic decision tree model, there are known Omega(NlogN) lower bounds for computing set intersection (irrespective of the space limits).
For instance, see here: http://compgeom.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf
So unless you do clever bit manipulations/hashing type approaches, you cannot do better than NlogN.
For instance, if you used only comparisons, you cannot do better than NlogN.
You can break the O(n*log(n)) barrier if you have some restrictions on the range of numbers. But it's not possible to do this if you cannot use any extra memory (you need really silly restrictions to be able to do that).
I would also like to note that even O(n log(n)) with sorting is not trivial if you have an O(1) space limit, as merge sort uses O(n) space and quicksort (which is not even strictly O(n log(n))) needs O(log(n)) space for the stack. You have to use heapsort or smoothsort.
Some companies like to ask questions which cannot be solved, and I think it is a good practice: as a programmer you have to know both what's possible and how to code it, and also know the limits, so you don't waste your time on something that's not doable.
Check this question for a couple of good techniques to use:
Algorithm to tell if two arrays have identical members
For each integer i, check that the number of occurrences of i in the two arrays is either zero in both or nonzero in both, by iterating over the arrays.
Since the number of integers is constant the total runtime is O(n).
No, I wouldn't do this in practice.
I was just thinking whether there is a way you could hash the cumulative contents of both arrays and compare them, assuming the hashing function doesn't produce collisions from two differing patterns.
Why not find the sum, product and XOR of all the elements of one array and compare them with the corresponding values for the other array?
The XOR of the elements of each array may give zero even when the arrays differ, as in
2, 2, 3, 3
1, 1, 2, 2
But what if you compare the XORs of the two arrays for equality? Consider
10, 3
12, 5
Here the XOR of both arrays is the same: (10^3) = (12^5) = 9.
But their sum and product are different. I think two different sets of elements cannot have the same sum, product and XOR!
This can be analysed by simple bit-value examination.
Is there anything wrong with this approach?
I'm not sure that I correctly understood the problem, but if you are interested in the integers that are in both arrays:
If N is much greater than 2^SizeOf(int) (the bit count of an integer: 16, 32, 64), there is one solution:
a = Array(N); // length(a) = N
b = Array(M); // length(b) = M

// x86-64: an integer consists of 64 bits.
for i := 0 to 2^64 / 64 - 1 do   // very big, but CONST
    for k := 0 to M - 1 do
        if a[i] = b[k] then doSomething; // detected

for i := 2^64 / 64 to N - 1 do
    if not isSetBit(a[i div 64], i mod 64) then
        setBit(a[i div 64], i mod 64);

for i := 0 to M - 1 do
    if isSetBit(a[b[i] div 64], b[i] mod 64) then doSomething; // detected

O(N), without additional structures.
All I know is that comparison-based sorting cannot possibly be faster than O(NlogN), so we can eliminate most of the "common" comparison-based sorts. I was thinking of doing a bucket sort. Perhaps if this question was asked in an interview, the best response would first be to clarify what sort of data those integers represent. For example, if they represent a person's age, then we know that the range of values is limited, and we can use bucket sort at O(n). However, this will not be in place....
If the arrays have the same size, and there are guaranteed to be no duplicates, sum each of the arrays. If the sum of the values is different, then they contain different integers.
Edit: You can then sum the logs of the entries in the arrays. If that is also the same, then you have the same entries in the arrays.
Given an N-dimensional vector of small integers, is there any simple way to map it with one-to-one correspondence to a large integer number?
Say we have an N=3 vector space. Can we represent a vector X=[(int16)x1,(int16)x2,(int16)x3] using an integer (int48)y? The obvious answer is "Yes, we can". But the question is: "What is the fastest way to do this and its inverse operation?"
Will this new 1-dimensional space possess some very special useful properties?
If the three values were full 32-bit integers, you would have 3 * 32 = 96 bits of information, so without any a priori knowledge you would need 96 bits for the equivalent long integer.
However, if you know that your x1, x2, x3, values will always fit within, say, 16 bits each, then you can pack them all into a 48 bit integer.
In either case the technique is very simple: you just use shift, mask and bitwise-OR operations to pack/unpack the values.
Just to make this concrete, if you have a 3-dimensional vector of 8-bit numbers, like this:
uint8_t vector[3] = { 1, 2, 3 };
then you can join them into a single (24-bit number) like so:
uint32_t all = (vector[0] << 16) | (vector[1] << 8) | vector[2];
This number would, if printed using this statement:
printf("the vector was packed into %06x", (unsigned int) all);
produce the output
the vector was packed into 010203
The reverse operation would look like this:
uint8_t v2[3];
v2[0] = (all >> 16) & 0xff;
v2[1] = (all >> 8) & 0xff;
v2[2] = all & 0xff;
Of course this all depends on the size of the individual numbers in the vector and the length of the vector together not exceeding the size of an available integer type, otherwise you can't represent the "packed" vector as a single number.
If you have sets Si, i=1..n of size Ci = |Si|, then the cartesian product set S = S1 x S2 x ... x Sn has size C = C1 * C2 * ... * Cn.
This motivates an obvious way to do the packing one-to-one. If you have elements e1,...,en from each set, each in the range 0 to Ci-1, then you give the element e=(e1,...,en) the value e1+C1*(e2 + C2*(e3 + C3*(...Cn*en...))).
You can do any permutation of this packing if you feel like it, but unless the values are perfectly correlated, the size of the full set must be the product of the sizes of the component sets.
In the particular case of three 32 bit integers, if they can take on any value, you should treat them as one 96 bit integer.
If you particularly want to, you can map small values to small values through any number of means (e.g. filling out spheres with the L1 norm), but you have to specify what properties you want to have.
(For example, one can map (n,m) to (max(n,m)-1)^2 + k, where k=n if n<=m and k=n+m if n>m -- you can draw this as a picture of filling in a square like so:
1 2 5 | draw along the edge of the square this way
4 3 6 v
8 9 7
if you start counting from 1 and only worry about positive values; for integers, you can spiral around the origin.)
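A sketch of that pairing in code (for positive n, m, counting from 1, exactly as in the formula above; the function name is invented):

unsigned long pair(unsigned long n, unsigned long m)
{
    unsigned long mx = (n > m) ? n : m;
    unsigned long k = (n <= m) ? n : n + m;
    return (mx - 1) * (mx - 1) + k;   /* (max(n,m)-1)^2 + k */
}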
I'm writing this without having time to check details, but I suspect the best way is to represent your long integer via modular arithmetic, using k different integers which are mutually prime. The original integer can then be reconstructed using the Chinese remainder theorem. Sorry this is a bit sketchy, but hope it helps.
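A sketch of that idea, assuming small pairwise-coprime moduli chosen for illustration: the value is stored as its residues and rebuilt with the Chinese remainder theorem.

#include <stdint.h>

/* Extended Euclid: returns gcd(a, b) and finds x, y with a*x + b*y = gcd. */
static int64_t egcd(int64_t a, int64_t b, int64_t *x, int64_t *y)
{
    if (b == 0) { *x = 1; *y = 0; return a; }
    int64_t x1, y1, g = egcd(b, a % b, &x1, &y1);
    *x = y1;
    *y = x1 - (a / b) * y1;
    return g;
}

/* Rebuild v (mod m[0]*...*m[k-1]) from residues rem[i] = v mod m[i]. */
int64_t crt(const int64_t *rem, const int64_t *m, int k)
{
    int64_t M = 1, v = 0;
    for (int i = 0; i < k; i++)
        M *= m[i];
    for (int i = 0; i < k; i++) {
        int64_t Mi = M / m[i], s, t;
        egcd(Mi % m[i], m[i], &s, &t);          /* s*Mi = 1 (mod m[i]) */
        s = (s % m[i] + m[i]) % m[i];
        v = (v + (rem[i] * s % m[i]) * Mi) % M; /* fine while M stays well below 2^63 */
    }
    return v;
}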
To expand on Rex Kerr's generalised form, in C you can pack the numbers like so:
X = e[n];
X *= MAX_E[n-1] + 1;
X += e[n-1];
/* ... */
X *= MAX_E[0] + 1;
X += e[0];
And unpack them with:
e[0] = X % (MAX_E[0] + 1);
X /= (MAX_E[0] + 1);
e[1] = X % (MAX_E[1] + 1);
X /= (MAX_E[1] + 1);
/* ... */
e[n] = X;
(Where MAX_E[n] is the greatest value that e[n] can have). Note that these maximum values are likely to be constants, and may be the same for every e, which will simplify things a little.
The shifting / masking implementations given in the other answers are the special case of this where the MAX_E + 1 values are powers of 2 (and thus the multiplication and division can be done with a shift, the addition with a bitwise OR and the modulus with a bitwise AND).
There are some totally non-portable ways to make this really fast using packed unions and direct memory access, but it is suspicious that you really need this kind of speed. Methods using shifts and masks should be fast enough for most purposes. If not, consider using specialized processors like GPUs, for which vector support is optimized (parallel).
This naive storage does not possess any useful property that I can foresee, except that you can perform some computations (add, sub, bitwise logical operators) on the three coordinates at once, as long as you use only positive integers and don't overflow on add and sub.
You'd better be quite sure you won't overflow (or won't go negative for sub) or the vector will become garbage.
#include <stdint.h> // for uint8_t
long x;
uint8_t *p = (uint8_t *)&x;
or
union X {
    long L;
    uint8_t A[sizeof(long)/sizeof(uint8_t)];
};
Either works if you don't care about endianness. In my experience compilers generate better code with the union, because it doesn't set off their "you took the address of this, so I must keep it in RAM" rules as quickly. Those rules will get set off if you try to index the array with something the compiler can't optimize away.
If you do care about endianness, then you need to mask and shift.
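For instance, a minimal endianness-independent sketch (byte 0 here means the least significant byte, regardless of how the machine stores it; names are illustrative):

#include <stdint.h>

uint8_t byte_of(unsigned long x, unsigned i)
{
    return (uint8_t)(x >> (8 * i));   /* byte i, counting from the LSB */
}

void set_byte(unsigned long *x, unsigned i, uint8_t b)
{
    *x = (*x & ~(0xFFUL << (8 * i))) | ((unsigned long)b << (8 * i));
}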
I think what you want can be solved using multi-dimensional space filling curves. The link gives a lot of references on this, which in turn give different methods and insights. Here's a specific example of an invertible mapping. It works for any dimension N.
As for useful properties, these mappings are related to Gray codes.
Hard to say whether this was what you were looking for, or whether the "pack 3 16-bit ints into a 48-bit int" does the trick for you.