A Perfect Hashing Function for an 8 by 8 board?

A Perfect Hashing Function for an 8 by 8 board? - theory

I'm implementing a board with only 2 types of pieces, and was looking for a function to map from that board to a Long Integer (64 bits). I was thinking this should not be so hard, since a long integer contains more available information than an 8 by 8 array (call it grid[x][y]) with only 3 possible elements in each spot including the empty element. I tried the following:
(1) Zobrist hashing with Longs rather than ints (Just to test - I didn't actually expect that to work perfectly)
(2) Translated the grid into a 64 character string of a base 3 number, and then took that number and parsed it into a long. I think this should work, but it took a very very long time.
Is there some simpler solution to (2) involving bit operations of shifting or something like that?
Edit: Please don't give me actual code, as this is for a class project, and that would probably be considered unethical in our department (or at least not in Java).
Edit2: Basically, there are only 10 whites and 10 blacks on the board at any given time, of which no two pieces of the same color can be neighbors, either in the horizontal, vertical, or diagonal direction. Also, there are 12 spaces for each color where only that color may place pieces.

If each tile in the game can be 1 of any 3 states at any point in the game, then the minimum amount of storage required for a "perfect hash" when hashing every possible state of the game board, at any given moment will
= power(3,8*8) individual hashes
= log2(3^64) bits
= approx. 101.4 bits, so you will need at least 102 bits to store this info
At this point, you may as well just say there are 4 states for each tile, which will bring you to needing 128 bits.
Once you do this, its rather easy to make a fast hashing algorithm for the board.
E.g. (writtin as c++, may need to alter code if the platform doesn't support 128 bit numbers)
uint_128 CreateGameBoardHash(int (&game_board)[8][8])
{
uint_128 board_hash = 0;
for(int i = 0; i < 8; ++i)
{
for(int j = 0; j < 8; ++j)
{
board_hash |= game_board[i][j] << ((i * 8 + j) *2);
}
}
return board_hash;
}
This method will only waste 26 bits (little more than 3 bytes) over the optimal solution of 102 bits, but you will save a LOT of processing time that would be otherwise spent doing base 3 math.
Edit Here's a version that doesn't require 128 bits and should work on any 16-bit (or better) processor
struct GameBoardHash
{
uint16 row[8];
};
GameBoardHash CreateGameBoardHash(int (&game_board)[8][8])
{
GameBoardHash board_hash;
for(int i = 0; i < 8; ++i)
{
board_hash.row[i] = 0;
for(int j = 0; j < 8; ++j)
{
board_hash.row[i] |= game_board[j] << (j*2);
}
}
return board_hash;
}

It won't fit into a 64 bit integer. You have 64 squares and you need more than 1 bit to record each square. Why do you need it to fit into a 64 bit int? Are you targetting the ZX81?

How about a 16 byte array containing the bits? Each 2-bits, represent a position's value, so that given a position in the 8x8 board (pos=0-63), you can figure out the index by dividing pos by 4 and you can get the value by doing bit manipulation to get two bits (bit0=pos mod 4 and bit1=bit0 + 1). The two bits can be either 00, 01, or 10.

Reading your comments to David, it doesn't seem like you really need a perfect hash value. You just need a hashable object.
Make it simple for yourself... Make some hash for you position in the overwrite to GetHashCode(), and then do the rest of the work in the Equals function.
If you REALLY need it to be perfect, then you have to use a GUID to encode your data in and make your own hash that can use 128bit keys. But that is just a huge investment of time for little benifit.

Related

Separate byte in groups

So Im using this (its from another question I did),
unsigned char *y = resultado->informacion;
int i = 0;
int tam = data->tamanho;
unsigned char resAfter;
for (int i=0; i<tam;i++)
{
unsigned char x = data->informacion[i];
x <<= 3;
if (i>0)
{
resAfter = (resAfter << 5) | x;
}
else
{
resAfter = x;
}
}
printf("resAfter es %s\n", resAfter);
so at the end I have this really long number (Im estimating about 43 bits), how can I get groups of 8 bits, I think im gettin something like (010101010101010.....000) and I want to separate this in groups of 8.
Another question, I know for sure that resAfter is going to have n number of bits where n is a multiply of 8 plus 3, so my question is: is this possible? or c is going to complete the byte? like if I get 43 bits then c is going to fill them with 0 and complete so I have 48 bits; and is there a way to delete these 3 bits?
Im new on c and bitwise so sorry if what Im doing is reallly bad.

Basically in programming you deal with bytes (i think, at least in most cases), in C you deal with types of specific size (depending on system you run it on).
That said char usually has size of 1 byte, and I don't really think you can playing around with single bits. I mean u can do operation on them (<< for instance) in scale of single bits but i don't know of any standard way to preserve less than 8 bits in variable in C (though i may be wrong about it)

String to very long sequence of length less than 1 byte

I can't guess how to solve following problem. Assume I have a string or an array of integer-type variables (uchar, char, integer, whatever). Each of these data type is 1 byte long or more.
I would like to read from such array but read a pieces that are smaller than 1 byte, e.g. 3 bits (values 0-7). I tried to do a loop like
cout << ( (tab[index] >> lshift & lmask) | (tab[index+offset] >> rshift & rmask) );
but guessing how to set these variables is out of my reach. What is the metodology to solve such problem?
Sorry if question has been ever asked, but searching gives no answer.

I am sure this is not the best solution, as there some inefficiencies in the code that could be eliminated, but I think the idea is workable. I only tested it briefly:
void bits(uint8_t * src, int arrayLength, int nBitCount) {
int idxByte = 0; // byte index
int idxBitsShift = 7; // bit index: start at the high bit
// walk through the array, computing bit sets
while (idxByte < arrayLength) {
// compute a single bit set
int nValue = 0;
for (int i=2; i>=0; i--) {
nValue += (src[idxByte] & (1<<idxBitsShift)) >> (idxBitsShift-i);
if ((--idxBitsShift) < 0) {
idxBitsShift=8;
if (++idxByte >= arrayLength)
break;
}
}
// print it
printf("%d ", nValue);
}
}
int main() {
uint8_t a[] = {0xFF, 0x80, 0x04};
bits(a, 3, 3);
}
The thing with collecting bits across byte boundaries is a bit of a PITA, so I avoided all that by doing this a bit at a time, and then collecting the bits together in the nValue. You could have smarter code that does this three (or however many) bits at a time, but as far as I am concerned, with problems like this it is usually best to start with a simple solution (unless you already know how to do a better one) and then do something more complicated.

In short, the way the data is arranged in memory strictly depends on :
the Endianess
the standard used for computation/representation ( usually it's the IEEE 754 )
the type of the given variable
Now, you can't "disassemble" a data structure with this rationale without destroing its own meaning, simply put, if you are going to subdivide your variable in "bitfields" you are just picturing an undefined value.
In computer science there are data structure or informations structured in blocks, like many hashing algorithms/hash results, but a numerical value it's not stored like that and you are supposed to know what you are doing to prevent any data loss.
Another thing to note is that your definition of "pieces that are smaller than 1 byte" doesn't make much sense, it's also highly intrusive, you are losing abstraction here and you can also do something bad.

Here's the best method I could come up with for setting individual bits of a variable:
Assume we need to set the first four bits of variable1 (a char or other byte long variable) to 1010
variable1 &= 0b00001111; //Zero the first four bytes
variable1 |= 0b10100000; //Set them to 1010, its important that any unaffected bits be zero
This could be extended to whatever bits desired by placing zeros in the first number corresponding to the bits which you wish to set (the first four in the example's case), and placing zeros in the second number corresponding to the bits which you wish to remain neutral in the second number (the last four in the example's case). The second number could also be derived by bit-shifting your desired value by the appropriate number of places (which would have been four in the example's case).
In response to your comment this can be modified as follows to accommodate for increased variability:
For this operation we will need two shifts assuming you wish to be able to modify non-starting and non-ending bits. There are two sets of bits in this case the first (from the left) set of unaffected bits and the second set. If you wish to modify four bits skipping the first bit from the left (1 these four bits 111 for a single byte), the first shift would be would be 7 and the second shift would be 5.
variable1 &= ( ( 0b11111111 << shift1 ) | 0b11111111 >> shift2 );
Next the value we wish to assign needs to be shifted and or'ed in.
However, we will need a third shift to account for how many bits we want to set.
This shift (we'll call it shift3) is shift1 minus the number of bits we wish to modify (as previously mentioned 4).
variable1 |= ( value << shift3 );

Need help understanding bitmaps, bitwise operations, and C

Disclaimer: I am asking these questions in relation to an assignment. The assignment itself calls for implementing a bitmap and doing some operations with that, but that is not what I am asking about. I just want to understand the concepts so I can try the implementation for myself.
I need help understanding bitmaps/bit arrays and bitwise operations. I understand the basics of binary and how left/right shift work, but I don't know exactly how that use is beneficial.
Basically, I need to implement a bitmap to store the results of a prime sieve (of Eratosthenes.) This is a small part of a larger assignment focused on different IPC methods, but to get to that part I need to get the sieve completed first. I've never had to use bitwise operations nor have I ever learned about bitmaps, so I'm kind of on my own to learn this.
From what I can tell, bitmaps are arrays of a bit of a certain size, right? By that I mean you could have an 8-bit array or a 32-bit array (in my case, I need to find the primes for a 32-bit unsigned int, so I'd need the 32-bit array.) So if this is an array of bits, 32 of them to be specific, then we're basically talking about a string of 32 1s and 0s. How does this translate into a list of primes? I figure that one method would evaluate the binary number and save it to a new array as decimal, so all the decimal primes exist in one array, but that seems like you're using too much data.
Do I have the gist of bitmaps? Or is there something I'm missing? I've tried reading about this around the internet but I can't find a source that makes it clear enough for me...

Suppose you have a list of primes: {3, 5, 7}. You can store these numbers as a character array: char c[] = {3, 5, 7} and this requires 3 bytes.
Instead lets use a single byte such that each set bit indicates that the number is in the set. For example, 01010100. If we can set the byte we want and later test it we can use this to store the same information in a single byte. To set it:
char b = 0;
// want to set `3` so shift 1 twice to the left
b = b | (1 << 2);
// also set `5`
b = b | (1 << 4);
// and 7
b = b | (1 << 6);
And to test these numbers:
// is 3 in the map:
if (b & (1 << 2)) {
// it is in...

You are going to need a lot more than 32 bits.
You want a sieve for up to 2^32 numbers, so you will need a bit for each one of those. Each bit will represent one number, and will be 0 if the number is prime and 1 if it is composite. (You can save one bit by noting that the first bit must be 2 as 1 is neither prime nor composite. It is easier to waste that one bit.)
2^32 = 4,294,967,296
Divide by 8
536,870,912 bytes, or 1/2 GB.
So you will want an array of 2^29 bytes, or 2^27 4-byte words, or whatever you decide is best, and also a method for manipulating the individual bits stored in the chars (ints) in the array.
It sounds like eventually, you are going to have several threads or processes operating on this shared memory.You may need to store it all in a file if you can't allocate all that memory to yourself.
Say you want to find the bit for x. Then let a = x / 8 and b = x - 8 * a. Then the bit is at arr[a] & (1 << b). (Avoid the modulus operator % wherever possible.)
//mark composite
a = x / 8;
b = x - 8 * a;
arr[a] |= 1 << b;
This sounds like a fun assignment!

A bitmap allows you to construct a large predicate function over the range of numbers you're interested in. If you just have a single 8-bit char, you can store Boolean values for each of the eight values. If you have 2 chars, it doubles your range.
So, say you have a bitmap that already has this information stored, your test function could look something like this:
bool num_in_bitmap (int num, char *bitmap, size_t sz) {
if (num/8 >= sz) return 0;
return (bitmap[num/8] >> (num%8)) & 1;
}

Combining two integers with bit-shifting

I am writing a program, I have to store the distances between pairs of numbers in a hash table.
I will be given a Range R. Let's say the range is 5.
Now I have to find distances between the following pairs:
1 2
1 3
1 4
1 5
2 3
2 4
2 5
3 4
3 5
4 5
that is, the total number of pairs is (R^2/2 -R). I want to store it in a hash table. All these are unsigned integers. So there are 32 bits. My idea was that, I take an unsigned long (64 bits).
Let's say I need to hash the distance between 1 and 5. Now
long k = 1;
k = k<<31;
k+=5;
Since I have 64 bits, I am storing the first number in the first 31 bits and the second number in the second 31 bits. This guarantees unique keys which can then be used for hashing.
But when I do this:
long k = 2;
k << 31;
k+= 2;
The value of k becomes zero.
I am not able to wrap my head around this shifting concept.
Essentially what I am trying to achieve is that,
An unsigned long has | 32bits | 32 bits |
Store |1st integer|2nd integer|
How can I achieve this to get unique keys for each pair?
I am running the code on a 64 bit AMD Opteron processor. sizeof(ulong) returns 8. So it is 64 bits. Do I need a long long in such a case?
Also I need to know if this will create unique keys? From my understanding , it does seem to create unique keys. But I would like a confirmation.

Assuming you're using C or something that follows vaguely similar rules, your problem is primarily with types.
long k = 2; // This defines `k` a a long
k << 31; // This (sort of) shifts k left, but still as a 32-bit long.
What you almost certainly want/need to do is convert k to a long long before you shift it left, so you're shifting in a 64-bit word.
unsigned long first_number = 2;
unsigned long long both_numbers = (unsigned long long)first_number << 32;
unsigned long second_number = 5;
both_numbers |= second_number;
In this case, if (for example) you print out both_numbers, in hexadecimal, you should get 0x0000000200000005.

The concept makes sense. As Oli has added, you want to shift by 32, not 31 - shifting by 31 will put it in the 31st bit, so if you shifted back to the right to try and get the first number you would end up with a bit missing, and the second number would be potentially huge because you could have put a 1 in the uppermost bit.
But if you want to do bit manipulation, I would do instead:
k = 1 << 32;
k = k|5;
It really should produce the same result though. Are you sure that long is 64 bits on your machine? This is not always the case (although it usually is, I think). If long is actually 32 bits, 2<<31 will result in 0.
How large is R? You can get away with a 32 bit sized variable if R doesn't go past 65535...

optimized byte array shifter

I'm sure this has been asked before, but I need to implement a shift operator on a byte array of variable length size. I've looked around a bit but I have not found any standard way of doing it. I came up with an implementation which works, but I'm not sure how efficient it is. Does anyone know of a standard way to shift an array, or at least have any recommendation on how to boost the performance of my implementation;
char* baLeftShift(const char* array, size_t size, signed int displacement,char* result)
{
memcpy(result,array,size);
short shiftBuffer = 0;
char carryFlag = 0;
char* byte;
if(displacement > 0)
{
for(;displacement--;)
{
for(byte=&(result[size - 1]);((unsigned int)(byte))>=((unsigned int)(result));byte--)
{
shiftBuffer = *byte;
shiftBuffer <<= 1;
*byte = ((carryFlag) | ((char)(shiftBuffer)));
carryFlag = ((char*)(&shiftBuffer))[1];
}
}
}
else
{
unsigned int offset = ((unsigned int)(result)) + size;
displacement = -displacement;
for(;displacement--;)
{
for(byte=(char*)result;((unsigned int)(byte)) < offset;byte++)
{
shiftBuffer = *byte;
shiftBuffer <<= 7;
*byte = ((carryFlag) | ((char*)(&shiftBuffer))[1]);
carryFlag = ((char)(shiftBuffer));
}
}
}
return result;
}

If I can just add to what #dwelch is saying, you could try this.
Just move the bytes to their final locations. Then you are left with a shift count such as 3, for example, if each byte still needs to be left-shifted 3 bits into the next higher byte. (This assumes in your mind's eye the bytes are laid out in ascending order from right to left.)
Then rotate each byte to the left by 3. A lookup table might be faster than individually doing an actual rotate. Then, in each byte, the 3 bits to be shifted are now in the right-hand end of the byte.
Now make a mask M, which is (1<<3)-1, which is simply the low order 3 bits turned on.
Now, in order, from high order byte to low order byte, do this:
c[i] ^= M & (c[i] ^ c[i-1])
That will copy bits to c[i] from c[i-1] under the mask M.
For the last byte, just use a 0 in place of c[i-1].
For right shifts, same idea.

My first suggestion would be to eliminate the for loops around the displacement. You should be able to do the necessary shifts without the for(;displacement--;) loops. For displacements of magnitude greater than 7, things get a little trickier because your inner loop bounds will change and your source offset is no longer 1. i.e. your input buffer offset becomes magnitude / 8 and your shift becomes magnitude % 8.

It does look inefficient and perhaps this is what Nathan was referring to.
assuming a char is 8 bits where this code is running there are two things to do first move the whole bytes, for example if your input array is 0x00,0x00,0x12,0x34 and you shift left 8 bits then you get 0x00 0x12 0x34 0x00, there is no reason to do that in a loop 8 times one bit at a time. so start by shifting the whole chars in the array by (displacement>>3) locations and pad the holes created with zeros some sort of for(ra=(displacement>>3);ra>3)] = array[ra]; for(ra-=(displacement>>3);ra>(7-(displacement&7))). a good compiler will precompute (displacement>>3), displacement&7, 7-(displacement&7) and a good processor will have enough registers to keep all of those values. you might help the compiler by making separate variables for each of those items, but depending on the compiler and how you are using it it could make it worse too.
The bottom line though is time the code. perform a thousand 1 bit shifts then a thousand 2 bit shifts, etc time the whole thing, then try a different algorithm and time it the same way and see if the optimizations make a difference, make it better or worse. If you know ahead of time this code will only ever be used for single or less than 8 bit shifts adjust the timing test accordingly.
your use of the carry flag implies that you are aware that many processors have instructions specifically for chaining infinitely long shifts using the standard register length (for single bit at a time) rotate through carry basically. Which the C language does not support directly. for chaining single bit shifts you could consider assembler and likely outperform the C code. at least the single bit shifts are faster than C code can do. A hybrid of moving the bytes then if the number of bits to shift (displacement&7) is maybe less than 4 use the assembler else use a C loop. again the timing tests will tell you where the optimizations are.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight