I need your help to know whether it's possible to use a bitboard for a connect 5 game. I have seen a lot of examples for connect 4 games, but nothing for a very large board such as 15x15 or 19x19. I don't understand how I can represent such a board with 64 bits.
Thank you for your help.
It is possible, as explained in unwind's answer, but by doing that you negate the main advantage of bitboards, which is performing logical operations on the entire data structure in one CPU instruction. Optimally, the size of the bitboard would match the register width of your platform.
A possible workaround is to represent the board as multiple smaller "standard" bitboards (32-bit or 64-bit, depending on your platform) and apply logical operations on each mini-bitboard separately. For instance, a 15x15 board can be broken into four 8x8 boards:
typedef uint64_t miniboard_type; // 8x8 board

typedef struct
{
    miniboard_type miniboard[4]; // Larger 15x15 board comprising four 8x8 boards
} bitboard_type;

bitboard_type bitboard;
Correspondingly, you might also need to adjust the math and logic to account for the edges between the mini-bitboards (depending on what you want to do with your main bitboard).
Huh?
Of course it's possible, if by "bitboard" you just mean "a 2D array using one bit per cell".
It's sometimes easier if you e.g. round off so each row (or column) is a whole number of bytes, but you don't have to do that.
Just do something like:
#include <stdint.h>
#include <limits.h>
#define WIDTH 22
#define HEIGHT 17
uint8_t bits[(WIDTH * HEIGHT + CHAR_BIT - 1) / CHAR_BIT];
This declares bits as an array of bytes, with enough bytes to hold a matrix of WIDTH by HEIGHT cells.
With the example's values, that's (22 * 17 + 7) / 8 = 47 bytes (assuming CHAR_BIT is 8).
Apologies for the generic question title, I wasn't sure how to phrase it properly (suggestions welcome!)
I'm trying to get my head around some of the code for the Common Mark parser and came across this:
/* Oversize the buffer by 50% to guarantee amortized linear time
* complexity on append operations. */
bufsize_t new_size = target_size + target_size / 2;
new_size += 1;
new_size = (new_size + 7) & ~7;
So given a number, e.g. 32, it will add 32 / 2 (giving 48), add 1 (49), add 7 (56), and finally AND that with ~7, i.e. -8 (still 56).
Is this a common pattern? Specifically the adding of a number and then ANDing with its complement.
Is anyone able to provide any insight into what this is doing and what advantages, if any, exist?
The (+7) & ~7 part rounds the number up to the next multiple of 8. The trick works only for powers of 2 (7 is 2^3 - 1). If you want to round up to a multiple of 32, then use 31 instead of 7.
The reason to round the size to a multiple of 8 is probably specific to the algorithm.
It is also possible that the author of the code knows how the memory allocator works. If the allocator internally uses blocks of memory that are multiples of 8 bytes, an allocation request for any number of bytes between 1 and 8 uses an entire block. By asking for a block whose size is a multiple of 8, one gets several extra bytes for the same price.
I am trying to understand the 9-point stencil algorithm from this book. The logic is clear to me, but the calculation of the WIDTHP macro is what I am unable to understand. Here is the brief code (the original is more than 300 lines long!):
#define PAD64 0
#define WIDTH 5900
#if PAD64
#define WIDTHP ((((WIDTH*sizeof(REAL))+63)/64)*(64/sizeof(REAL)))
#else
#define WIDTHP WIDTH
#endif
#define HEIGHT 10000
REAL *fa = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);
REAL *fb = (REAL *)malloc(sizeof(REAL)*WIDTHP*HEIGHT);
The original array is 5900 x 10000, but if I define PAD64, the array becomes 5915.75 x 10000.
So far I can guess that the author is trying to align and pad the array to a 64-byte boundary. But the array returned by malloc is usually aligned (and padded); also, posix_memalign gives you a chunk of memory that is guaranteed to have the requested alignment, and we can also use
__attribute__((aligned(64)))
What impact can this WIDTHP have on my code's performance?
The idea is that each row of the matrix (or column, if it's treated as a column-major matrix) can be aligned to the start of a new cache line, by adding padding to the end of the line. Exactly what impact this has depends of course a lot on the access pattern, but in general cache-friendliness can be quite important for intensely number-crunching code.
Also, the computation is done in integer arithmetic, so the result is certainly not 5915.75; that doesn't make sense.
I was going to put this in as a comment to unwind's answer because he's right. But perhaps I can explain more clearly, albeit in more characters than will fit in a comment.
When I do the math, I get 5904 reals, which is 23616 bytes, which is 369 cache lines for 64-byte cache lines. It is the byte count, rather than the number of elements, which must be a multiple of 64.
As to why you want to pad the value of width, let's look at a smaller example. Let's pretend we had a "cache line" that holds 10 letters and that we have an "array" with a width of 8 letters and a height of 4. Now since our hypothetical array is in C, and C is row-major, the array will look something like this:
AAAAAAAA
BBBBBBBB
CCCCCCCC
DDDDDDDD
But what does it look like when it is arranged in cache lines, since those are 10 letters long?
AAAAAAAABB
BBBBBBCCCC
CCCCDDDDDD
DD
Not good. Only the first row of the array is aligned. But if we pad width by two spaces, we get this in cache:
AAAAAAAA__
BBBBBBBB__
CCCCCCCC__
DDDDDDDD__
which is what we want. Now we can have a nested loop like
for i = 1 to height
for j = 1 to width
and know that every time we start to work on the j loop, the data we need will be aligned.
Oh, and yes, they really should do something to make sure that the first element of the array is aligned. __attribute__((aligned(64))) won't work because the arrays are being allocated dynamically, but they could have used posix_memalign instead of malloc.
The WIDTHP calculation is, roughly,
(WIDTH / 64) + 1
well rounded for int-precision math. I'd give you a better answer, except in the SE mobile app it isn't viable to flick between this and the listing.
I have an MP3 board attached to a ATmega microcontroller which is additionally connected to a potentiometer. The MP3 board plays MP3 data directly through its onboard speaker and therefore I am also able to set the volume of the output.
So, as you might guess, I read the value from the pot and forward it to the microcontroller. Unfortunately, the microcontroller does not increase the volume in a linear way. So, from values 0 to 128 you hear nearly nothing, and from 128 to 255 (max) the volume increases rapidly.
I found out that the following function could solve this problem:
vol = 1 - (1 - x)^4
but x must be between 0 and 1 and the result is also between 0 and 1.
Since I am on a microcontroller, I would like to:
1. transform this formula so that I can use it with unsigned integers
2. optimize it (maybe with some cheap binary operations), because I read the pot value multiple times per second, so this function has to be calculated multiple times per second, and I want to use the microcontroller for other stuff too ;-)
Maybe some of you have an idea? Would be great!
uint8_t linearize_volume(uint8_t value) {
    // ideas?
    // please don't use bigger data types than uint16_t
}
You can "pay" with memory for CPU cycles. If you have 256 bytes of ROM available, the cheapest way of computing such a function is to build a lookup table.
Make a program that prints a list of 256 8-bit numbers with the values of your non-linear function. It does not matter how fast the program is, because you are going to run it only once. Copy the numbers the program prints into your C program as an array initializer, and perform the lookup instead of calculating the function.
You can get a decent estimate by treating the values as 8.8 fixed-point and raising to the power of four by squaring twice.
uint8_t linearize_volume(uint8_t value) {
    // Approximate 255 * (1 - (1 - x/255)^4)
    uint16_t x = 0xff - value;
    x = (x * x) >> 8;
    x = (x * x) >> 8;
    return 0xff - x;
}
First, be sure you're using a linear pot, not an audio-taper pot.
This is typical of audio outputs. The data is a sine wave, and therefore negative values are necessary. You can certainly convert negatives to positives for the sole purpose of accessing their power level, but you can't alter the sample without hearing a completely different sound.
Depending upon the output device, lower values may not pack enough power to energize your speaker much at all.
The "MP3 board" should include an ability to control the volume without your having to alter samples.
You state you read the pot and forward it to the micro. Aren't you reading the pot with the micro's ADC?
I have a variable that can take any of 3 values. If it could take only 2 values, I would have assigned it a bool type. But my variable can take 3 values. If I assign an int8_t type, I am wasting 6 bits. Though this looks like premature optimization, I have millions of instances of this type, which is going to make a huge difference in memory usage.
What datatype should I assign the variable so that the least memory is used overall?
If I do it with an enum, will that ensure less memory is used?
In particular, what datatype should I use in C, Java, Python, and MySQL?
If you really need this data type (although I'm not sure that's the case), you can use a bitfield. However, this can be constraining, since you can't define a pointer to such a type. Wasting a bit:
struct s
{
    int n:2; /* 4 states instead of 3 */
};
Here's a bit of math: naively you can describe each element with two bits, so you can pack four elements into one byte and get decent random access. Four elements have 3^4 = 81 states, so that's a usage of 81 / 256 ≈ 32%. If you want to stay on a byte boundary, you can look for the largest power of three that fits into 2^8, which is 3^5 = 243. In other words, if you use one byte to enumerate all possible states of five consecutive elements, you get a space efficiency of 243 / 256 ≈ 95%.
It makes no sense to do this packing in memory unless you're processing vast amounts of data, cannot fit everything into physical memory, and can't partition your algorithm to run on smaller chunks at a time. For efficient computation, you should at the very least use a single byte (uint8_t), or even a machine word (uint_fast8_t), to store your data. It's only when you serialize your data to disk and find that your terabytes of data are too expensive for your RAID-50 storage that you may wish to consider a complicated packing scheme. (Though then again you could just pipe your data through gzip, which basically does all that for you.)
Here's a rudimentary decoding algorithm for getting the five elements out of a byte:
unsigned int get_tristate(unsigned char const n, size_t const i)
{
    /* Conditions: n in [0, 243)
                   i in [0, 5)
       Returns: the i^th trivalent element encoded in n, in [0, 3).
    */
    static unsigned int const powers[] = { 1, 3, 9, 27, 81 };
    return (n / powers[i]) % 3;
}
Given a N-dimensional vector of small integers is there any simple way to map it with one-to-one correspondence to a large integer number?
Say, we have N=3 vector space. Can we represent a vector X=[(int16)x1,(int16)x2,(int16)x3] using an integer (int48)y? The obvious answer is "Yes, we can". But the question is: "What is the fastest way to do this and its inverse operation?"
Will this new 1-dimensional space possess some very special useful properties?
For the above example you have 3 * 32 = 96 bits of information, so without any a priori knowledge you need 96 bits for the equivalent long integer.
However, if you know that your x1, x2, x3, values will always fit within, say, 16 bits each, then you can pack them all into a 48 bit integer.
In either case the technique is very simple: you just use shift, mask, and bitwise-OR operations to pack and unpack the values.
Just to make this concrete, if you have a 3-dimensional vector of 8-bit numbers, like this:
uint8_t vector[3] = { 1, 2, 3 };
then you can join them into a single (24-bit number) like so:
uint32_t all = (vector[0] << 16) | (vector[1] << 8) | vector[2];
This number would, if printed using this statement:
printf("the vector was packed into %06x", (unsigned int) all);
produce the output
the vector was packed into 010203
The reverse operation would look like this:
uint8_t v2[3];
v2[0] = (all >> 16) & 0xff;
v2[1] = (all >> 8) & 0xff;
v2[2] = all & 0xff;
Of course this all depends on the size of the individual numbers in the vector and the length of the vector together not exceeding the size of an available integer type, otherwise you can't represent the "packed" vector as a single number.
If you have sets S_i, i = 1..n, of size C_i = |S_i|, then the Cartesian product set S = S_1 x S_2 x ... x S_n has size C = C_1 * C_2 * ... * C_n.
This motivates an obvious way to do the packing one-to-one. If you have elements e_1, ..., e_n from each set, each in the range 0 to C_i - 1, then you give the element e = (e_1, ..., e_n) the value e_1 + C_1*(e_2 + C_2*(e_3 + C_3*(... C_(n-1)*e_n ...))).
You can do any permutation of this packing if you feel like it, but unless the values are perfectly correlated, the size of the full set must be the product of the sizes of the component sets.
In the particular case of three 32 bit integers, if they can take on any value, you should treat them as one 96 bit integer.
If you particularly want to, you can map small values to small values through any number of means (e.g. filling out spheres with the L1 norm), but you have to specify what properties you want to have.
(For example, one can map (n,m) to (max(n,m)-1)^2 + k, where k = n if n <= m and k = n+m if n > m. You can draw this as a picture of filling in a square like so:
1 2 5 | draw along the edge of the square this way
4 3 6 v
8 9 7
if you start counting from 1 and only worry about positive values; for integers, you can spiral around the origin.)
I'm writing this without having time to check details, but I suspect the best way is to represent your long integer via modular arithmetic, using k different integers which are mutually prime. The original integer can then be reconstructed using the Chinese remainder theorem. Sorry this is a bit sketchy, but hope it helps.
To expand on Rex Kerr's generalised form, in C you can pack the numbers like so:
X = e[n];
X *= MAX_E[n-1] + 1;
X += e[n-1];
/* ... */
X *= MAX_E[0] + 1;
X += e[0];
And unpack them with:
e[0] = X % (MAX_E[0] + 1);
X /= (MAX_E[0] + 1);
e[1] = X % (MAX_E[1] + 1);
X /= (MAX_E[1] + 1);
/* ... */
e[n] = X;
(Where MAX_E[n] is the greatest value that e[n] can have). Note that these maximum values are likely to be constants, and may be the same for every e, which will simplify things a little.
The shifting / masking implementations given in the other answers are a generalisation of this, for cases where the MAX_E + 1 values are powers of 2 (and thus the multiplication and division can be done with a shift, the addition with a bitwise-or and the modulus with a bitwise-and).
There are some totally non-portable ways to make this really fast using packed unions and direct memory accesses. That you really need this kind of speed is suspicious; methods using shifts and masks should be fast enough for most purposes. If not, consider using a specialized processor such as a GPU, for which vector support is optimized (parallel).
This naive storage does not possess any useful property that I can foresee, except that you can perform some computations (add, sub, bitwise logical operators) on the three coordinates at once, as long as you use positive integers only and you don't overflow on add and sub.
You'd better be quite sure you won't overflow (or won't go negative on sub), or the vector will become garbage.
#include <stdint.h> // for uint8_t
long x;
uint8_t *p = (uint8_t *)&x;
or
union X {
    long L;
    uint8_t A[sizeof(long)/sizeof(uint8_t)];
};
works if you don't care about endianness. In my experience, compilers generate better code with the union because it doesn't set off their "you took the address of this, so I must keep it in RAM" rules as quickly. Those rules do get set off if you try to index the array with something the compiler can't optimize away.
If you do care about endianness, then you need to mask and shift.
I think what you want can be solved using multi-dimensional space filling curves. The link gives a lot of references on this, which in turn give different methods and insights. Here's a specific example of an invertible mapping. It works for any dimension N.
As for useful properties, these mappings are related to Gray codes.
Hard to say whether this was what you were looking for, or whether the "pack 3 16-bit ints into a 48-bit int" does the trick for you.