Can you choose #define based on parameter? - c

To simplify a commonly occurring bitmap lookup, I decided to write a macro for it. This macro would take in parameters array, position, and number of bits per entry. The latter is always 1, 2, or 3, to be determined at compile time.
The macros for one and two bits per entry are quite simple:
#define BITLOOKUP1(array, index) (((array)[(index) / CHAR_BIT] >> ((index) % CHAR_BIT)) & 1)
#define BITLOOKUP2(array, index) (((array)[2 * (index) / CHAR_BIT] >> (2 * (index) % CHAR_BIT)) & 0x3)
However, a three-bit bitmap will be more complex, since entries will cross byte boundaries. For the sake of efficiency, I do not want to include that code when I am only using one- and two-bit lookups. So I would like to use a macro of the sort
#define BITMAPLOOKUP(array, index, k) BITLOOKUPk(array, index)
However, this does not work. Is there a workaround?
NOTE: This has been updated to a non-minimal working example, as per request of multiple commenters.
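For reference, here is a minimal sketch of the three-bit case the question describes, written as a function. It assumes entries are packed least-significant-bit first (as in the one- and two-bit macros) and that the array has one spare padding byte at the end, so the second byte read never runs past the buffer; `bitlookup3` is a hypothetical name:

```c
#include <limits.h>
#include <stddef.h>

/* Returns the 3-bit entry at position index from a bitmap packed
   LSB-first. Two consecutive bytes are combined so that entries which
   straddle a byte boundary are handled; the caller must allocate one
   extra (padding) byte at the end of the array. */
static unsigned bitlookup3(const unsigned char *array, size_t index)
{
    size_t bit = 3 * index;                    /* absolute bit position */
    size_t byte = bit / CHAR_BIT;
    unsigned shift = (unsigned)(bit % CHAR_BIT);
    /* Low byte holds the start of the entry; the next byte may hold the rest. */
    unsigned both = array[byte] | ((unsigned)array[byte + 1] << CHAR_BIT);
    return (both >> shift) & 0x7u;
}
```

Because the extra byte is always read, this version has no branch for the boundary case; the price is the padding byte at the end of the buffer.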

You can paste a macro argument onto another token, with no separator, using the ## operator. So here this will work:
#define FUNCTION(k) FUNCTION##k
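Applied to the bitmap lookups, a minimal sketch might look like the following. A macro name cannot begin with a digit, so the bit count has to come last (BITLOOKUP1, BITLOOKUP2). Also, ## pastes the argument's spelling before it is macro-expanded, so if k may itself be a #define'd name, an extra level of indirection is needed; `BITMAPLOOKUP_` is a hypothetical helper for that:

```c
#include <limits.h>

/* One- and two-bit lookups, entries packed LSB-first. */
#define BITLOOKUP1(array, index) \
    (((array)[(index) / CHAR_BIT] >> ((index) % CHAR_BIT)) & 1)
#define BITLOOKUP2(array, index) \
    (((array)[2 * (index) / CHAR_BIT] >> (2 * (index) % CHAR_BIT)) & 3)

/* The inner macro does the pasting. The outer macro does not use ## on k,
   so k is fully expanded first; BITMAPLOOKUP(a, i, BITS) therefore works
   even when BITS is itself a macro expanding to 1 or 2. */
#define BITMAPLOOKUP_(array, index, k) BITLOOKUP##k(array, index)
#define BITMAPLOOKUP(array, index, k) BITMAPLOOKUP_(array, index, k)
```

Since the choice happens in the preprocessor, only the lookup that is actually named ends up in the compiled code, which is what the question asks for.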

@MadPhysicist suggests something like:
#include <stdio.h>
static void func0(void) { printf("0\n"); }
static void func1(void) { printf("1\n"); }
static void func2(void) { printf("2\n"); }
static void func3(void) { printf("3\n"); }
static void (*func[])(void) = {func0, func1, func2, func3};
int main(void)
{
func[1]();
return 0;
}
This has an advantage: you are not limited to compile time literals:
for (int i = 0; i < 3; i++) {
FUNCTION(i); /* invalid */
func[i](); /* valid */
}
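For the bitmap question, this means wrapping each lookup in a function with a common signature so that the bits-per-entry value can be an ordinary run-time variable. A minimal sketch; the names `lookup1`, `lookup2`, `lookup_by_width` and `bitmap_lookup` are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* One lookup function per supported entry width, all with the same signature. */
static unsigned lookup1(const unsigned char *a, size_t i)
{
    return (a[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}

static unsigned lookup2(const unsigned char *a, size_t i)
{
    return (a[2 * i / CHAR_BIT] >> (2 * i % CHAR_BIT)) & 3u;
}

/* Slot 0 is unused so that the table can be indexed by k directly. */
static unsigned (*const lookup_by_width[])(const unsigned char *, size_t) = {
    NULL, lookup1, lookup2
};

/* k may be a run-time value, but it must be 1 or 2 here. */
static unsigned bitmap_lookup(const unsigned char *a, size_t i, int k)
{
    return lookup_by_width[k](a, i);
}
```

The trade-off compared with the ## macro is an indirect call per lookup, and every function in the table is compiled in whether or not it is used.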

Related

Generating random values without time.h

I want to generate random numbers repeatedly without using the time.h library. I saw another post suggesting
srand(getpid());
however, that doesn't work for me: getpid hasn't been declared. Is this because I'm missing the library for it? If it is, I need to work out how to randomly generate numbers without using any libraries other than the ones I currently have.
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int minute, hour, day, month, year;
srand(getpid());
minute = rand() % (59 + 1 - 0) + 0;
hour = rand() % (23 + 1 - 0) + 0;
day = rand() % (31 + 1 - 1) + 1;
month = rand() % (12 + 1 - 1) + 1;
year = 2018;
printf("Transferred successfully at %02d:%02d on %02d/%02d/%d\n", hour,
minute, day, month, year);
return 0;
}
NB: I can only use the headers <stdio.h>, <stdlib.h>, and <string.h>; these are strict guidelines for an assignment.
getpid hasn't been declared.
Right: you haven't included the <unistd.h> header where it is declared (and, according to your comment, you cannot use it, because you're restricted to <stdlib.h>, <string.h>, and <stdio.h>).
In that case, I would use something like
#include <stdlib.h>
#include <stdio.h>
static int randomize_helper(FILE *in)
{
unsigned int seed;
if (!in)
return -1;
if (fread(&seed, sizeof seed, 1, in) == 1) {
fclose(in);
srand(seed);
return 0;
}
fclose(in);
return -1;
}
static int randomize(void)
{
if (!randomize_helper(fopen("/dev/urandom", "r")))
return 0;
if (!randomize_helper(fopen("/dev/arandom", "r")))
return 0;
if (!randomize_helper(fopen("/dev/random", "r")))
return 0;
/* Other randomness sources (binary format)? */
/* No randomness sources found. */
return -1;
}
and a simple main() to output some pseudorandom numbers:
int main(void)
{
int i;
if (randomize())
fprintf(stderr, "Warning: Could not find any sources for randomness.\n");
for (i = 0; i < 10; i++)
printf("%d\n", rand());
return EXIT_SUCCESS;
}
The /dev/urandom and /dev/random character devices are available in Linux, FreeBSD, macOS, iOS, Solaris, NetBSD, Tru64 Unix 5.1B, AIX 5.2, HP-UX 11i v2, and /dev/random and /dev/arandom on OpenBSD 5.1 and later.
As usual, it looks like Windows does not provide any such randomness sources: Windows C programs must use proprietary Microsoft interfaces instead.
The randomize_helper() returns nonzero if the input stream is NULL, or if it cannot read an unsigned int from it. If it can read an unsigned int from it, it is used to seed the standard pseudorandom number generator you can access using rand() (which returns an int between 0 and RAND_MAX, inclusive). In all cases, randomize_helper() closes non-NULL streams.
You can add other binary randomness sources to randomize() trivially.
If randomize() returns 0, rand() should return pseudorandom numbers. Otherwise, rand() will return the same default sequence of pseudorandom numbers. (They will still be "random", but the same sequence will occur every time you run the program. If randomize() returns 0, the sequence will be different every time you run the program.)
Most standard C rand() implementations are linear congruential pseudorandom number generators, often with poor choices of parameters, and as a result are slowish and not very "random".
For non-cryptographic work, I like to implement one of the Xorshift family of functions, originally by George Marsaglia. They are very, very fast, and reasonably random; they pass most of the statistical randomness tests like the diehard tests.
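In OP's case, the xorwow generator could be used. The C standard only guarantees that unsigned int is at least 16 bits (unsigned long is guaranteed at least 32), but unsigned int is 32 bits on practically all current desktop and server platforms, so we use it as the generator type here. Let's see what implementing one to replace the standard srand()/rand() would look like: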
#include <stdlib.h>
#include <stdio.h>
/* The Xorwow PRNG state: five Xorshift words, prng_state[0..4], which must
not all be zero, followed by the Weyl counter, prng_state[5]. */
static unsigned int prng_state[6] = { 1, 2, 3, 4, 5, 0 };
/* The Xorwow is a 32-bit Xorshift generator combined with a Weyl sequence. */
#define PRNG_MAX 4294967295u
unsigned int prng(void)
{
unsigned int s, t;
t = prng_state[4] & PRNG_MAX; /* oldest word */
s = prng_state[0] & PRNG_MAX; /* newest word */
prng_state[4] = prng_state[3];
prng_state[3] = prng_state[2];
prng_state[2] = prng_state[1];
prng_state[1] = s;
t ^= t >> 2;
t ^= (t << 1) & PRNG_MAX;
t ^= s ^ ((s << 4) & PRNG_MAX);
prng_state[0] = t;
prng_state[5] = (prng_state[5] + 362437u) & PRNG_MAX;
return (t + prng_state[5]) & PRNG_MAX;
}
static int prng_randomize_from(FILE *in)
{
size_t have = 0, n;
unsigned int seed[5] = { 0, 0, 0, 0, 0 };
if (!in)
return -1;
/* Keep reading until all five words have arrived (tolerates short counts). */
while (have < 5) {
n = fread(seed + have, sizeof seed[0], 5 - have, in);
if (n == 0) {
fclose(in);
return -1;
}
have += n;
}
fclose(in);
/* An all-zero Xorshift state is the one prohibited seed. */
if (((seed[0] | seed[1] | seed[2] | seed[3] | seed[4]) & PRNG_MAX) == 0)
return -1;
for (n = 0; n < 5; n++)
prng_state[n] = seed[n] & PRNG_MAX;
prng_state[5] = 0; /* the Weyl counter may start from any value */
/* Note: We might wish to "churn" the pseudorandom
number generator state, to call prng()
a few hundred or thousand times. For example:
for (n = 0; n < 1000; n++) prng();
This way, even if the seed has clear structure,
for example only some low bits set, we start
with a PRNG state with set and clear bits well
distributed.
*/
return 0;
}
int prng_randomize(void)
{
if (!prng_randomize_from(fopen("/dev/urandom", "r")))
return 0;
if (!prng_randomize_from(fopen("/dev/arandom", "r")))
return 0;
if (!prng_randomize_from(fopen("/dev/random", "r")))
return 0;
/* Other sources? */
/* No randomness sources found. */
return -1;
}
The corresponding main() to above would be
int main(void)
{
int i;
if (prng_randomize())
fprintf(stderr, "Warning: No randomness sources found!\n");
for (i = 0; i < 10; i++)
printf("%u\n", prng());
return EXIT_SUCCESS;
}
Note that PRNG_MAX has a dual purpose. On one hand, it is the maximum value prng() can return, which is an unsigned int, not an int like rand(). On the other hand, because it must be 2^32 - 1 = 4294967295, we also use it to ensure the temporary results remain 32-bit when generating the next pseudorandom number in the sequence. If the uint32_t type, declared in stdint.h or inttypes.h, were available, we could use that and drop the masks (& PRNG_MAX).
Note that the prng_randomize_from() function is written so that it still works even if the randomness source cannot provide all requested bytes at once and returns a "short count". Whether this occurs in practice is open to debate, but I prefer to be certain. Also note that it does not accept the state if it is all zeros, as that is the one prohibited initial seed state for the Xorwow PRNG.
You can obviously use both srand()/rand() and prng()/prng_randomize() in the same program. I wrote them so that the Xorwow generator functions all start with prng.
Usually, I do put the PRNG implementation into a header file, so that I can easily test it (to verify it works) by writing a tiny test program; but also so that I can switch the PRNG implementation simply by switching to another header file. (In some cases, I put the PRNG state into a structure, and have the caller provide a pointer to the state, so that any number of PRNGs can be used concurrently, independently of each other.)
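A minimal sketch of that structure-based variant, assuming <stdint.h> is available (so uint32_t replaces the & PRNG_MAX masks). The type and function names are hypothetical, the step follows the published Xorwow recurrence (five 32-bit Xorshift words plus a Weyl counter), and the seeding helper uses a simple linear congruential step purely for illustration; a real program would seed from /dev/urandom as above:

```c
#include <stdint.h>

/* Hypothetical per-generator state: five Xorshift words that must not
   all be zero, plus the Weyl counter. */
typedef struct {
    uint32_t x[5];
    uint32_t counter;
} xorwow_state;

/* Illustrative seeding: spread one 32-bit value over the five words
   with an LCG step (arbitrary constants, not part of Xorwow). */
static void xorwow_init(xorwow_state *s, uint32_t seed)
{
    for (int i = 0; i < 5; i++) {
        seed = seed * 1664525u + 1013904223u;
        s->x[i] = seed;
    }
    if (!(s->x[0] | s->x[1] | s->x[2] | s->x[3] | s->x[4]))
        s->x[0] = 1;                 /* never allow the all-zero state */
    s->counter = 0;
}

/* Advances only the generator passed in; other states are untouched. */
static uint32_t xorwow_next(xorwow_state *s)
{
    uint32_t t = s->x[4];            /* oldest word */
    uint32_t v = s->x[0];            /* newest word */
    s->x[4] = s->x[3];
    s->x[3] = s->x[2];
    s->x[2] = s->x[1];
    s->x[1] = v;
    t ^= t >> 2;
    t ^= t << 1;
    t ^= v ^ (v << 4);
    s->x[0] = t;
    s->counter += 362437u;
    return t + s->counter;
}
```

Each caller owns its own xorwow_state, so any number of generators can run side by side without sharing global state.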
however that doesn't seem to work for me getpid hasn't been declared.
That's because you need to include the headers for getpid():
#include <sys/types.h>
#include <unistd.h>
Another option is to seed with time(), declared in <time.h>, instead of getpid():
srand((unsigned int)time(NULL));
As the other answer points out, you need to include the <unistd.h> header. If you don't want to do that, put a declaration of getpid() above main(). Read the manual page of getpid() here: http://man7.org/linux/man-pages/man2/getpid.2.html
One approach may be
#include <stdio.h>
#include <stdlib.h>
int getpid(void); /* declared by hand instead of including <unistd.h>; pid_t itself comes from <sys/types.h>, and is int on Linux/glibc, but that is not guaranteed elsewhere */
int main(void) {
/* .. some code .. */
return 0;
}
Or you can use time() like
srand((unsigned int)time(NULL));

Fisher Yates algorithm gives back same order of numbers in parallel started programs when seeded over the system time

I start several C / C++ programs in parallel, which rely on random numbers. I'm fairly new to this topic, and I heard that the seed should be based on the time.
Furthermore, I use the Fisher-Yates algorithm to get a list of unique, randomly shuffled values. However, starting the program twice in parallel gives the same results for both lists.
How can I fix this? Can I use a different, but still reliable, seed?
My simple test code for this looks like this:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <time.h>
static int rand_int(int n) {
int limit = RAND_MAX - RAND_MAX % n;
int rnd;
do {
rnd = rand();
}
while (rnd >= limit);
return rnd % n;
}
void shuffle(int *array, int n) {
int i, j, tmp;
for (i = n - 1; i > 0; i--) {
j = rand_int(i + 1);
tmp = array[j];
array[j] = array[i];
array[i] = tmp;
}
}
int main(int argc,char* argv[]){
srand(time(NULL));
int x = 100;
int randvals[100];
for(int i =0; i < x;i++)
randvals[i] = i;
shuffle(randvals,x);
for(int i=0;i < x;i++)
printf("%d %d \n",i,randvals[i]);
}
I used the implementation of the Fisher-Yates algorithm from here:
http://www.sanfoundry.com/c-program-implement-fisher-yates-algorithm-array-shuffling/
I started the programs in parallel like this:
./randomprogram >> a.txt & ./randomprogram >> b.txt
and then compared both text files, which had the same content.
The end application is for data augmentation in the deep learning field. The machine runs Ubuntu 16.04 with C++11.
You're getting the same results due to how you're seeding the RNG:
srand(time(NULL));
The time function returns the time in seconds since the epoch. If two instances of the program start during the same second (which is likely if you start them in quick succession), then both will use the same seed and get the same set of random values.
You need to add more entropy to your seed. A simple way of doing this is to bitwise-XOR the process ID (getpid(), declared in <unistd.h>) with the time:
srand(time(NULL) ^ getpid());
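The effect is easy to reproduce inside a single process: seeding with the same value twice yields exactly the same shuffle, which is what happens to two copies started within the same second. A minimal sketch reusing the question's rand_int() and shuffle() logic; `shuffle_seeded` is a hypothetical helper that takes the seed as a parameter:

```c
#include <stdlib.h>

/* Uniform integer in [0, n) by rejecting the biased tail of rand(). */
static int rand_int(int n)
{
    int limit = RAND_MAX - RAND_MAX % n;
    int rnd;
    do {
        rnd = rand();
    } while (rnd >= limit);
    return rnd % n;
}

/* Fills array with 0..n-1, seeds rand() with seed, then Fisher-Yates shuffles. */
static void shuffle_seeded(int *array, int n, unsigned int seed)
{
    srand(seed);
    for (int i = 0; i < n; i++)
        array[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = rand_int(i + 1);
        int tmp = array[j];
        array[j] = array[i];
        array[i] = tmp;
    }
}
```

In the real program, the seed passed in would be something like time(NULL) ^ getpid(), so that two processes started in the same second still differ.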
As I mentioned in a comment, I like to use a Xorshift* pseudo-random number generator, seeded from /dev/urandom if present, otherwise using POSIX.1 clock_gettime() and getpid() to seed the generator.
It is good enough for most statistical work, but obviously not for any kind of security or cryptographic purposes.
Consider the following xorshift64.h inline implementation:
#ifndef XORSHIFT64_H
#define XORSHIFT64_H
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <time.h>
#ifndef SEED_SOURCE
#define SEED_SOURCE "/dev/urandom"
#endif
typedef struct {
uint64_t state[1];
} prng_state;
/* Mixes state by generating 'rounds' pseudorandom numbers,
but does not store them anywhere. This is often done
to ensure a well-mixed state after seeding the generator.
*/
static inline void prng_skip(prng_state *prng, size_t rounds)
{
uint64_t state = prng->state[0];
while (rounds-->0) {
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
}
prng->state[0] = state;
}
/* Returns an uniform pseudorandom number between 0 and 2**64-1, inclusive.
*/
static inline uint64_t prng_u64(prng_state *prng)
{
uint64_t state = prng->state[0];
state ^= state >> 12;
state ^= state << 25;
state ^= state >> 27;
prng->state[0] = state;
return state * UINT64_C(2685821657736338717);
}
/* Returns an uniform pseudorandom number [0, 1), excluding 1.
This carefully avoids the (2**64-1)/2**64 bias on 0,
but assumes that the double type has at most 63 bits of
precision in the mantissa.
*/
static inline double prng_one(prng_state *prng)
{
uint64_t u;
double d;
do {
do {
u = prng_u64(prng);
} while (!u);
d = (double)(u - 1u) / 18446744073709551616.0;
} while (d == 1.0);
return d;
}
/* Returns an uniform pseudorandom number (-1, 1), excluding -1 and +1.
This carefully avoids the (2**64-1)/2**64 bias on 0,
but assumes that the double type has at most 63 bits of
precision in the mantissa.
*/
static inline double prng_delta(prng_state *prng)
{
uint64_t u;
double d;
do {
do {
u = prng_u64(prng);
} while (!u);
d = ((double)(u - 1u) - 9223372036854775808.0) / 9223372036854775808.0;
} while (d == -1.0 || d == 1.0);
return d;
}
/* Returns an uniform pseudorandom integer between min and max, inclusive.
Uses the exclusion method to ensure uniform distribution.
*/
static inline uint64_t prng_range(prng_state *prng, const uint64_t min, const uint64_t max)
{
if (min != max) {
const uint64_t basis = (min < max) ? min : max;
const uint64_t range = (min < max) ? max-min : min-max;
uint64_t mask = range;
uint64_t u;
/* In mask, all bits up to the highest bit set in range must be set. */
mask |= mask >> 1;
mask |= mask >> 2;
mask |= mask >> 4;
mask |= mask >> 8;
mask |= mask >> 16;
mask |= mask >> 32;
/* In all cases, range <= mask < 2*range, so at worst case,
(mask = 2*range-1), this excludes at most 50% of generated values,
on average. */
do {
u = prng_u64(prng) & mask;
} while (u > range);
return u + basis;
} else
return min;
}
static inline void prng_seed(prng_state *prng)
{
#if _POSIX_TIMERS-0 > 0
struct timespec now;
#endif
FILE *src;
/* Try /dev/urandom. */
src = fopen(SEED_SOURCE, "r");
if (src) {
int tries = 16;
while (tries-->0) {
if (fread(prng->state, sizeof prng->state, 1, src) != 1)
break;
if (prng->state[0]) {
fclose(src);
return;
}
}
fclose(src);
}
#if _POSIX_TIMERS-0 > 0
#if _POSIX_MONOTONIC_CLOCK-0 > 0
if (clock_gettime(CLOCK_MONOTONIC, &now) == 0) {
prng->state[0] = (uint64_t)((uint64_t)now.tv_sec * UINT64_C(60834327289))
^ (uint64_t)((uint64_t)now.tv_nsec * UINT64_C(34958268769))
^ (uint64_t)((uint64_t)getpid() * UINT64_C(2772668794075091))
^ (uint64_t)((uint64_t)getppid() * UINT64_C(19455108437));
if (prng->state[0])
return;
} else
#endif
if (clock_gettime(CLOCK_REALTIME, &now) == 0) {
prng->state[0] = (uint64_t)((uint64_t)now.tv_sec * UINT64_C(60834327289))
^ (uint64_t)((uint64_t)now.tv_nsec * UINT64_C(34958268769))
^ (uint64_t)((uint64_t)getpid() * UINT64_C(2772668794075091))
^ (uint64_t)((uint64_t)getppid() * UINT64_C(19455108437));
if (prng->state[0])
return;
}
#endif
prng->state[0] = (uint64_t)((uint64_t)time(NULL) * UINT64_C(60834327289))
^ (uint64_t)((uint64_t)clock() * UINT64_C(34958268769))
^ (uint64_t)((uint64_t)getpid() * UINT64_C(2772668794075091))
^ (uint64_t)((uint64_t)getppid() * UINT64_C(19455108437));
if (!prng->state[0])
prng->state[0] = (uint64_t)UINT64_C(16233055073);
}
#endif /* XORSHIFT64_H */
If it can seed the state from SEED_SOURCE, it is used as-is. Otherwise, if POSIX.1 clock_gettime() is available, it is used (CLOCK_MONOTONIC, if possible; otherwise CLOCK_REALTIME). Otherwise, time (time(NULL)), CPU time spent thus far (clock()), process ID (getpid()), and parent process ID (getppid()) are used to seed the state.
If you wanted the above to also run on Windows, you'd need to add a few #ifndef _WIN32 guards, and either omit the process ID parts, or replace them with something else. (I don't use Windows myself, and cannot test such code, so I omitted such from above.)
The idea is that you can include the above file, and implement other pseudo-random number generators in the same format, and choose between them by simply including different files. (You can include multiple files, but you'll need to do some ugly #define prng_state prng_somename_state, #include "somename.h", #undef prng_state hacking to ensure unique names for each.)
Here is an example of how to use the above:
#include <stdlib.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include "xorshift64.h"
int main(void)
{
prng_state prng1, prng2;
prng_seed(&prng1);
prng_seed(&prng2);
printf("Seed 1 = 0x%016" PRIx64 "\n", prng1.state[0]);
printf("Seed 2 = 0x%016" PRIx64 "\n", prng2.state[0]);
printf("After skipping 16 rounds:\n");
prng_skip(&prng1, 16);
prng_skip(&prng2, 16);
printf("Seed 1 = 0x%016" PRIx64 "\n", prng1.state[0]);
printf("Seed 2 = 0x%016" PRIx64 "\n", prng2.state[0]);
return EXIT_SUCCESS;
}
Obviously, initializing two PRNGs like this is problematic in the fallback case, because it basically relies on clock() yielding different values for consecutive calls (so expects each call to take at least 1 millisecond of CPU time).
However, even a small change in the seeds thus generated is sufficient to yield very different sequences. I like to generate and discard (skip) a number of initial values to ensure the generator state is well mixed:
Seed 1 = 0x8a62585b6e71f915
Seed 2 = 0x8a6259a84464e15f
After skipping 16 rounds:
Seed 1 = 0x9895f664c83ad25e
Seed 2 = 0xa3fd7359dd150e83
The header also implements 0 <= prng_u64() < 2**64, 0 <= prng_one() < 1, -1 < prng_delta() < +1, and min <= prng_range(,min,max) <= max, which should be uniform.
I use the above Xorshift64* variant for tasks where a lot of quite uniform pseudorandom numbers are needed, so the functions also use the fastest methods I know of (for example, rejection with at most a 50% average exclusion rate instead of a 64-bit modulus operation).
Additionally, if you require repeatability, you can simply save a randomly seeded prng_state structure (a single uint64_t) and load it later to reproduce the exact same sequence. Just remember to do the skipping (generate-and-discard) only after random seeding, not after loading a saved state from a file.
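A minimal sketch of that save-and-replay idea, written against a stand-alone copy of the Xorshift64* step so that it compiles on its own. The `save_state`/`load_state` names are hypothetical, and the file format is simply the raw uint64_t in native byte order, so it is not portable between machines with different endianness:

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t state[1];
} prng_state;

/* Same Xorshift64* step as prng_u64() in xorshift64.h. */
static uint64_t prng_u64(prng_state *prng)
{
    uint64_t state = prng->state[0];
    state ^= state >> 12;
    state ^= state << 25;
    state ^= state >> 27;
    prng->state[0] = state;
    return state * UINT64_C(2685821657736338717);
}

/* Writes the raw state; returns 0 on success, -1 on error. */
static int save_state(FILE *out, const prng_state *prng)
{
    return fwrite(prng->state, sizeof prng->state, 1, out) == 1 ? 0 : -1;
}

/* Reads a state written by save_state(); rejects the invalid all-zero state.
   No skipping is done here, so the saved sequence is reproduced exactly. */
static int load_state(FILE *in, prng_state *prng)
{
    prng_state tmp;
    if (fread(tmp.state, sizeof tmp.state, 1, in) != 1 || !tmp.state[0])
        return -1;
    *prng = tmp;
    return 0;
}
```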
Converting rather copious comments into an answer.
If two programs are started in the same second, they'll both have the same sequence of random numbers.
Consider whether you need a better random number generator than the rand()/srand() duo, which is usually only barely random (better than nothing, but not by a large margin). Do NOT use them for cryptography.
I asked about platform; you responded Ubuntu 16.04 LTS.
Use /dev/urandom or /dev/random to get some random bytes for the seed.
On many Unix-like platforms, there's a device /dev/random; on Linux, there's also a slightly lower-quality device /dev/urandom which won't block, whereas /dev/random might. Systems such as macOS (BSD) have /dev/urandom as a synonym for /dev/random for Linux compatibility. You can open it, read 4 bytes (or the relevant number of bytes) of random data, and use that as a seed for the PRNG of your choice.
I often use the drand48() set of functions because they are in POSIX and were in System V Unix. They're usually adequate for my needs.
Look at the manuals across platforms; there are often other random number generators. C++11 provides high-quality PRNGs: the header <random> has a number of different ones, such as std::mt19937 (Mersenne Twister). macOS Sierra (BSD) has random(3) and arc4random(3) as alternatives to rand(), as well as drand48() et al.
Another possibility on Linux is simply to keep a connection to /dev/urandom open, reading more bytes when you need them. However, that gives up any chance of replaying a random sequence. The PRNG systems have the merit of allowing you to replay the same sequence again by recording and setting the random seed that you use. By default, grab a seed from /dev/urandom, but if the user requests it, take a seed from the command line, and report the seed used (at least on request).
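A minimal sketch of that workflow with the POSIX drand48() family: take the seed from the command line if one is given, otherwise read it from /dev/urandom (falling back to time() and getpid() if that fails), and report the seed on stderr so the run can be replayed. Seeding through srand48() with a long and the helper names `choose_seed`/`seed_drand48` are assumptions for illustration:

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seed from arg if non-NULL, else from /dev/urandom, else a time/PID mix. */
static long choose_seed(const char *arg)
{
    long seed = 0;
    if (arg)
        return strtol(arg, NULL, 0);
    FILE *f = fopen("/dev/urandom", "rb");
    if (f) {
        size_t got = fread(&seed, sizeof seed, 1, f);
        fclose(f);
        if (got == 1)
            return seed;
    }
    return (long)time(NULL) ^ (long)getpid();
}

/* Seeds drand48() and reports the seed so the same run can be replayed. */
static long seed_drand48(const char *arg)
{
    long seed = choose_seed(arg);
    srand48(seed);
    fprintf(stderr, "seed = %ld\n", seed);
    return seed;
}
```

A main() would call seed_drand48(argc > 1 ? argv[1] : NULL); rerunning with the reported seed on the command line replays the same drand48() sequence.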

Use system implementation if find, otherwise use my own implementation

I'm trying to use fls in my routine. However, not every system has this function, so I ship my own version of fls. Is there any way to make the program use the system implementation when it exists and, if it isn't found, fall back to my own?
#include <strings.h>
#include <stdio.h>
int fls(int mask);
int foo(int N)
{
int tmp = 1 << (fls(N));
return tmp;
}
/*
* Find Last Set bit
*/
int
fls(int mask)
{
int bit;
if (mask == 0)
return (0);
for (bit = 1; mask != 1; bit++)
mask = (unsigned int) mask >> 1;
return (bit);
}
You can use a weak function.
https://en.wikipedia.org/wiki/Weak_symbol
By default, without any annotation, a symbol in an object file is
strong. During linking, a strong symbol can override a weak symbol of
the same name.
The same question for C++, whose answer differs slightly from the C implementation: Can I re-define a function or check if it exists?
int __attribute__((weak)) fls(int mask){ .. }
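so if a strong fls is defined elsewhere in the link, your weak fls implementation will be overridden. Note that this relies on the strong definition being part of the same link as an object file: a weak definition already resolves the symbol, so the linker will not pull fls out of a static library, and with shared libraries (such as libc.so) the executable's own definition normally takes precedence.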
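A related technique, different from the weak definition shown above, is a weak *reference*: declare fls() weak without defining it, and test its address at run time. This is a GCC/Clang extension and the behaviour sketched here assumes an ELF platform such as Linux or the BSDs; `my_fls` and `portable_fls` are hypothetical names:

```c
/* Weak reference (declaration only, no definition): if nothing linked
   into the program defines fls(), its address is a null pointer at run
   time instead of causing a link error. GCC/Clang on ELF platforms. */
extern int fls(int mask) __attribute__((weak));

/* The question's fallback, renamed so it cannot clash with a system fls(). */
static int my_fls(int mask)
{
    int bit;
    if (mask == 0)
        return 0;
    for (bit = 1; mask != 1; bit++)
        mask = (int)((unsigned int)mask >> 1);
    return bit;
}

/* Uses the system fls() when one was linked in, otherwise my_fls(). */
static int portable_fls(int mask)
{
    return fls ? fls(mask) : my_fls(mask);
}
```

Both paths return the same values (1-based index of the highest set bit, 0 for 0), so callers don't need to know which one ran.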

C: fastest way to evaluate a function on a finite set of small integer values by using a lookup table?

I am currently working on a project where I would like to optimize some numerical computation in Python by calling C.
In short, I need to compute y[i] = f(x[i]) for each element of a huge array x (typically 10^9 entries or more). Here, x[i] is an integer between -10 and 10, and f is a function that takes x[i] and returns a double. My issue is that f takes a very long time to evaluate in a numerically stable way.
To speed things up, I would like to just hard code all 2*10 + 1 possible values of f(x[i]) into a constant array such as:
double table_of_values[] = {f(-10), ...., f(10)};
And then just evaluate f using a "lookup table" approach as follows:
for (i = 0; i < N; i++) {
y[i] = table_of_values[x[i] + 10]; //instead of y[i] = f(x[i]); x[i] == -10 maps to index 0
}
Since I am not really well-versed at writing optimized code in C, I am wondering:
Specifically, since x is really large, I'm wondering if it's worth doing second-degree optimization when evaluating the loop (e.g. by sorting x beforehand), or finding a smart way to deal with the negative indices (aside from just doing [x[i] + 10])?
Say x[i] were not between -10 and 10, but between -20 and 20. In this case, I could still use the same approach, but would need to hard code the lookup table manually. Is there a way to generate the look-up table dynamically in the code so that I make use of the same approach and allow for x[i] to belong to a variable range?
It's fairly easy to generate such a table with dynamic range values.
Here's a simple, single table method:
#include <stdlib.h>
#define VARIABLE_USED(_sym) \
do { \
if (1) \
break; \
if (!! _sym) \
break; \
} while (0)
double *table_of_values;
int table_bias;
// use the smallest of these that can contain the values the x array may have
#if 0
typedef int xval_t;
#endif
#if 0
typedef short xval_t;
#endif
#if 1
typedef signed char xval_t; // plain char may be unsigned, which would break negative x values
#endif
#define XLEN (1 << 9)
xval_t *x;
// fslow -- your original function
double
fslow(int i)
{
return 1; // whatever
}
// ftablegen -- generate variable table
void
ftablegen(double (*f)(int),int lo,int hi)
{
int len;
table_bias = -lo;
len = hi - lo;
len += 1;
// NOTE: you can do free(table_of_values) when no longer needed
table_of_values = malloc(sizeof(double) * len);
for (int i = lo; i <= hi; ++i)
table_of_values[i + table_bias] = f(i);
}
// fcached -- retrieve cached table data
double
fcached(int i)
{
return table_of_values[i + table_bias];
}
// fripper -- access x and table arrays
void
fripper(xval_t *x)
{
double *tptr;
int bias;
double val;
// ensure these go into registers to prevent needless extra memory fetches
tptr = table_of_values;
bias = table_bias;
for (int i = 0; i < XLEN; ++i) {
val = tptr[x[i] + bias];
// do stuff with val
VARIABLE_USED(val);
}
}
int
main(void)
{
ftablegen(fslow,-10,10);
x = calloc(XLEN, sizeof(xval_t)); // zero-filled, so every x[i] is a valid table index
fripper(x);
return 0;
}
Here's a slightly more complex way that allows many similar tables to be generated:
#include <stdlib.h>
#define VARIABLE_USED(_sym) \
do { \
if (1) \
break; \
if (!! _sym) \
break; \
} while (0)
// use the smallest of these that can contain the values the x array may have
#if 0
typedef int xval_t;
#endif
#if 1
typedef short xval_t;
#endif
#if 0
typedef signed char xval_t; // plain char may be unsigned, which would break negative x values
#endif
#define XLEN (1 << 9)
xval_t *x;
struct table {
int tbl_lo; // lowest index
int tbl_hi; // highest index
int tbl_bias; // bias for index
double *tbl_data; // cached data
};
struct table ftable1;
struct table ftable2;
double
fslow(int i)
{
return 1; // whatever
}
double
f2(int i)
{
return 2; // whatever
}
// ftablegen -- generate variable table
void
ftablegen(double (*f)(int),int lo,int hi,struct table *tbl)
{
int len;
tbl->tbl_bias = -lo;
len = hi - lo;
len += 1;
// NOTE: you can do free tbl_data when no longer needed
tbl->tbl_data = malloc(sizeof(double) * len);
for (int i = lo; i <= hi; ++i)
tbl->tbl_data[i + tbl->tbl_bias] = f(i);
}
// fcached -- retrieve cached table data
double
fcached(struct table *tbl,int i)
{
return tbl->tbl_data[i + tbl->tbl_bias];
}
// fripper -- access x and table arrays
void
fripper(xval_t *x,struct table *tbl)
{
double *tptr;
int bias;
double val;
// ensure these go into registers to prevent needless extra memory fetches
tptr = tbl->tbl_data;
bias = tbl->tbl_bias;
for (int i = 0; i < XLEN; ++i) {
val = tptr[x[i] + bias];
// do stuff with val
VARIABLE_USED(val);
}
}
int
main(void)
{
x = calloc(XLEN, sizeof(xval_t)); // zero-filled, so every x[i] is a valid table index
// NOTE: we could use 'char' for xval_t ...
ftablegen(fslow,-37,62,&ftable1);
fripper(x,&ftable1);
// ... but, this forces us to use a 'short' for xval_t
ftablegen(f2,-99,307,&ftable2);
return 0;
}
Notes:
fcached could/should be an inline function for speed. Notice that once the table is calculated once, fcached(x[i]) is quite fast. The index offset issue you mentioned [solved by the "bias"] is trivially small in calculation time.
While x may be a large array, the cached array for f() values is fairly small (e.g. -10 to 10). Even if it were (e.g.) -100 to 100, this is still about 200 elements. This small cached array will [probably] stay in the hardware memory cache, so access will remain quite fast.
Thus, sorting x to optimize H/W cache performance of the lookup table will have little to no [measurable] effect.
The access pattern to x is independent. You'll get the best performance if you access x in a linear manner (e.g. for (i = 0; i < 999999999; ++i) x[i]). If you access it in a semi-random fashion, it will put a strain on the H/W cache logic and its ability to keep the needed/wanted x values "cache hot".
Even with linear access, because x is so large, by the time you get to the end, the first elements will have been evicted from the H/W cache (e.g. most CPU caches are on the order of a few megabytes).
However, if x only has values in a limited range, changing the type from int x[...] to short x[...] or even char x[...] cuts the size by a factor of 2x [or 4x]. And, that can have a measurable improvement on the performance.
Update: I've added a fripper function to show the fastest way [that I know of] to access the table and x arrays in a loop. I've also added a typedef named xval_t to allow the x array to consume less space (i.e. it will have better H/W cache performance).
UPDATE #2:
Per your comments ...
fcached was coded [mostly] to illustrate simple/single access. But, it was not used in the final example.
The exact requirements for inline have varied over the years (e.g. it was extern inline). Best use now: static inline. However, if using C++, it may be, yet again, different. There are entire pages devoted to this. The reason is compilation in different .c files and what happens when optimization is on or off. Also, consider using a gcc extension. So, to force inline all the time:
__attribute__((__always_inline__)) static inline
fripper is the fastest because it avoids refetching globals table_of_values and table_bias on each loop iteration. In fripper, compiler optimizer will ensure they remain in registers. See my answer: Is accessing statically or dynamically allocated memory faster? as to why.
However, I coded an fripper variant that uses fcached and the disassembled code was the same [and optimal]. So, we can disregard that ... Or, can we? Sometimes, disassembling the code is a good cross check and the only way to know for sure. Just an extra item when creating fully optimized C code. There are many options one can give to the compiler regarding code generation, so sometimes it's just trial and error.
Because benchmarking is important, I threw in my routines for timestamping (FYI, [AFAIK] on Linux the underlying clock_gettime(CLOCK_REALTIME) call is the basis for Python's time.time()).
So, here's the updated version:
#include <stdlib.h>
#include <time.h>
typedef long long s64;
#define SUPER_INLINE \
__attribute__((__always_inline__)) static inline
#define VARIABLE_USED(_sym) \
do { \
if (1) \
break; \
if (!! _sym) \
break; \
} while (0)
#define TVSEC 1000000000LL // nanoseconds in a second
#define TVSECF 1e9 // nanoseconds in a second
// tvget -- get high resolution time of day
// RETURNS: absolute nanoseconds
s64
tvget(void)
{
struct timespec ts;
s64 nsec;
clock_gettime(CLOCK_REALTIME,&ts);
nsec = ts.tv_sec;
nsec *= TVSEC;
nsec += ts.tv_nsec;
return nsec;
}
// tvgetf -- get high resolution time of day
// RETURNS: fractional seconds
double
tvgetf(void)
{
struct timespec ts;
double sec;
clock_gettime(CLOCK_REALTIME,&ts);
sec = ts.tv_nsec;
sec /= TVSECF;
sec += ts.tv_sec;
return sec;
}
double *table_of_values;
int table_bias;
double *dummyptr;
// use the smallest of these that can contain the values the x array may have
#if 0
typedef int xval_t;
#endif
#if 0
typedef short xval_t;
#endif
#if 1
typedef signed char xval_t; // plain char may be unsigned, which would break negative x values
#endif
#define XLEN (1 << 9)
xval_t *x;
// fslow -- your original function
double
fslow(int i)
{
return 1; // whatever
}
// ftablegen -- generate variable table
void
ftablegen(double (*f)(int),int lo,int hi)
{
int len;
table_bias = -lo;
len = hi - lo;
len += 1;
// NOTE: you can do free(table_of_values) when no longer needed
table_of_values = malloc(sizeof(double) * len);
for (int i = lo; i <= hi; ++i)
table_of_values[i + table_bias] = f(i);
}
// fcached -- retrieve cached table data
SUPER_INLINE double
fcached(int i)
{
return table_of_values[i + table_bias];
}
// fripper_fcached -- access x and table arrays
void
fripper_fcached(xval_t *x)
{
double val;
double *dptr;
dptr = dummyptr;
for (int i = 0; i < XLEN; ++i) {
val = fcached(x[i]);
// do stuff with val
dptr[i] = val;
}
}
// fripper -- access x and table arrays
void
fripper(xval_t *x)
{
double *tptr;
int bias;
double val;
double *dptr;
// ensure these go into registers to prevent needless extra memory fetches
tptr = table_of_values;
bias = table_bias;
dptr = dummyptr;
for (int i = 0; i < XLEN; ++i) {
val = tptr[x[i] + bias];
// do stuff with val
dptr[i] = val;
}
}
int
main(void)
{
ftablegen(fslow,-10,10);
x = calloc(XLEN, sizeof(xval_t)); // zero-filled, so every x[i] is a valid table index
dummyptr = malloc(sizeof(double) * XLEN);
fripper(x);
fripper_fcached(x);
return 0;
}
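For the benchmark itself, a minimal sketch that times one pass of the lookup loop. It uses CLOCK_MONOTONIC rather than the CLOCK_REALTIME used by tvget() above (my suggestion, not part of the answer), because the monotonic clock is not affected by wall-clock adjustments during the measurement; `time_lookup` is a hypothetical helper and assumes a POSIX system:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double seconds_between(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Runs y[i] = table[x[i] + bias] over n elements and returns the elapsed
   time in seconds. */
static double time_lookup(const double *table, int bias,
                          const signed char *x, double *y, size_t n)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < n; i++)
        y[i] = table[x[i] + bias];
    clock_gettime(CLOCK_MONOTONIC, &end);
    return seconds_between(start, end);
}
```

Writing the results into y (rather than discarding them) keeps the optimizer from deleting the loop being measured.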
You can use negative indices on a pointer into an array. This is well-defined in C as long as the element you reach is still inside the same array. If you have the following code:
int arr[] = {1, 2 ,3, 4, 5};
int* lookupTable = arr + 3;
printf("%i", lookupTable[-2]);
it will print out 2.
This works because arrays in c are defined as pointers. And if the pointer does not point to the begin of the array, you can access the item before the pointer.
Keep in mind though that if you have to malloc() the memory for arr you probably cannot use free(lookupTable) to free it.
I think Craig Estey is on the right track with building the table automatically. I just want to add a note about looking up the table.
If you know that your code will run on a Haswell (or newer) machine with AVX2, you can have your lookups use the VGATHERDPD instruction through the _mm256_i32gather_pd intrinsic, which fetches four table entries with a single instruction. (You can even detect AVX2 at run time with the CPUID instruction, but that's another story.)
EDIT:
Let me elaborate with some code:
#include <stdint.h>
#include <stdio.h>
#include <immintrin.h>
/* the gather has no alignment requirement, so no aligned attribute is needed */
double table[8] = { 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 };
int main(void)
{
int32_t i[4] = { 0,2,4,6 };
/* unaligned load: i is not guaranteed to be 16-byte aligned */
__m128i index = _mm_loadu_si128( (const __m128i*) i );
/* scale 8 = sizeof(double): each index is multiplied by 8 bytes */
__m256d result = _mm256_i32gather_pd( table, index, 8 );
double f[4];
_mm256_storeu_pd( f, result );
printf("%f %f %f %f\n", f[0], f[1], f[2], f[3]);
return 0;
}
Compile and run:
$ gcc -std=gnu99 -mavx2 gathertest.c -o gathertest && ./gathertest
0.100000 0.300000 0.500000 0.700000
This is fast: all four table entries are fetched by one gather instruction.

c macro for setting bits

I have a program that compares fields from two structs and sets a corresponding bit in a bitmap variable when they differ. I have to compare every field of the structs. In reality each struct has many more fields, but for simplicity I used 3. I would like to know whether I can write a macro that compares the fields and sets the bit in the bitmap accordingly.
#include<stdio.h>
struct num
{
int a;
int b;
int c;
};
struct num1
{
int d;
int e;
int f;
};
enum type
{
val1 = 0,
val2 = 1,
val3 = 2,
};
int main()
{
struct num obj1 = { 1, 2, 3 };
struct num1 obj2 = { 2, 2, 4 };
int bitmap = 0;
if( obj1.a != obj2.d)
{
bitmap = bitmap | val1;
}
if (obj1.b != obj2.e)
bitmap = bitmap | val2;
printf("bitmap - %d\n",bitmap);
return 0;
}
Can I declare a macro like this?
#define CHECK(cond)
if (!(cond))
printf(" failed check at %x: %s",__LINE__, #cond);
//set the bit accordingly
#undef CHECK
With a modicum of care, you can do it fairly easily. You just need to identify what you're comparing and setting carefully, and pass them as macro parameters. Example usage:
CHECK(obj1.a, obj2.d, bitmap, val1);
CHECK(obj1.b, obj2.e, bitmap, val2);
This assumes that CHECK is defined something like:
#define STRINGIFY(expr) #expr
#define CHECK(v1, v2, bitmap, bit) do \
{ if ((v1) != (v2)) \
{ printf("failed check at %d: %s\n", __LINE__, STRINGIFY(v1 != v2)); \
(bitmap) |= (1 << (bit)); \
} \
} while (0)
You can lay the macro out however you like, of course; I'm not entirely happy with that, but it isn't too awful.
Demo Code
Compilation and test run:
$ gcc -Wall -Wextra -g -O3 -std=c99 xx.c -o xx && ./xx
failed check at 40: obj1.a != obj2.d
failed check at 42: obj1.c != obj2.f
bitmap - 5
$
Actual code:
#include <stdio.h>
struct num
{
int a;
int b;
int c;
};
struct num1
{
int d;
int e;
int f;
};
enum type
{
val1 = 0,
val2 = 1,
val3 = 2,
};
#define STRINGIFY(expr) #expr
#define CHECK(v1, v2, bitmap, bit) do \
{ if ((v1) != (v2)) \
{ printf("failed check at %d: %s\n", __LINE__, STRINGIFY(v1 != v2)); \
(bitmap) |= (1 << (bit)); \
} \
} while (0)
int main(void)
{
struct num obj1 = { 1, 2, 3 };
struct num1 obj2 = { 2, 2, 4 };
int bitmap = 0;
CHECK(obj1.a, obj2.d, bitmap, val1);
CHECK(obj1.b, obj2.e, bitmap, val2);
CHECK(obj1.c, obj2.f, bitmap, val3);
printf("bitmap - %X\n", bitmap);
return 0;
}
Clearly, this code relies on you matching the right elements and bit numbers in the invocations of the CHECK macro.
It's possible to devise more complex schemes using offsetof() etc and initialized arrays describing the data structures, etc, but you'd end up with a more complex system and little benefit. In particular, the invocations can't reduce the parameter count much. You could assume 'bitmap' is the variable. You need to identify the two objects, so you'll specify 'obj1' and 'obj2'. Somewhere along the line, you need to identify which fields are being compared and the bit to set. That could be some single value (maybe the bit number), but you've still got 3 arguments (CHECK(obj1, obj2, valN) and the assumption about bitmap) or 4 arguments (CHECK(obj1, obj2, bitmap, valN) without the assumption about bitmap), but a lot of background complexity and probably a greater chance of getting it wrong. If you can tinker with the code so that you have a single type instead of two types, etc, then you can make life easier with the hypothetical system, but it is still simpler to handle things the way shown in the working code, I think.
I concur with gbulmer that I probably wouldn't do things this way, but you did state that you had reduced the sizes of the structures dramatically (for which, thanks!) and it would become more enticing as the number of fields increases (but I'd only write out the comparisons for one pair of structure types once, in a single function).
You could also revise the macro to:
#define CHECK(cond, bitmap, bit) do \
{ if (cond) \
{ printf("failed check at %d: %s\n", __LINE__, STRINGIFY(cond)); \
(bitmap) |= (1 << (bit)); \
} \
} while (0)
CHECK(obj1.a != obj2.d, bitmap, val1);
...
CHECK((strcmp(obj3.str1, obj4.str) != 0), bitmap, val6);
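where the last line shows that this form lets you use arbitrary comparisons, even ones that contain commas. The extra parentheses around the strcmp() comparison are not strictly needed here, because the comma is already inside strcmp()'s own parentheses; they only matter for a condition with a comma at the top level.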
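You should be able to do that, except that you need a backslash at the end of every line of a multi-line macro except the last. The comment inside the macro also has to be a /* */ comment: a // comment followed by a backslash would swallow the next line.
#ifndef CHECK
#define CHECK(cond) \
if (!(cond)) { \
printf(" failed check at %d: %s\n", __LINE__, #cond); \
/* set the bit accordingly */ \
}
#endif /* CHECK */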
If you want to get really fancy (and terse), you can use the token-pasting (concatenation) operator ##. I also recommend changing your structures around a little so they share a naming convention, though without knowing what you're trying to do with them, it's hard to say. I also noticed that one of your enum values is 0; that bit can never show up in the bitmap, because ORing 0 into anything leaves it unchanged. Anyway, here's your program slightly rewritten:
#include <stdio.h>
struct num {
int x1; // formerly a/d
int x2; // formerly b/e
int x3; // formerly c/f
};
enum type {
val1 = 1, // formerly 0
val2 = 2, // formerly 1
val3 = 4, // formerly 2
};
// CHECK uses the token-pasting operator (##) to build obj1.x1, val1, etc.
// It relies on variables named obj1, obj2 and bitmap being in scope.
#define CHECK(n) do {\
if (obj1.x##n != obj2.x##n)\
bitmap |= val##n;\
} while (0)
int main(void) {
struct num obj1 = { 1, 2, 3 };
struct num obj2 = { 2, 2, 4 };
int bitmap = 0;
CHECK(1);
CHECK(2);
CHECK(3);
printf("bitmap - %d\n", bitmap);
return 0;
}
As a reasonable rule of thumb, when working with bit arrays in C, you need a number that can be used to index the bit.
You can either pass that bit number into the macro, or try to derive it.
Pretty much the only thing available at compile time or run time is the address of a field.
So you could use that.
There are a few questions to understand if it might work.
For your structs:
Are all the fields in the same order? I.e. you can compare c with f, and not c with e?
Do all of the corresponding fields have the same type?
Is the condition just equality? Each macro will have the condition wired in, so each condition needs a new macro.
If the answer to all is yes, then you could use the address:
#define CHECK(s1, f1, s2, f2) do \
{ if (((char *)&(s1).f1 - (char *)&(s1) != (char *)&(s2).f2 - (char *)&(s2)) \
|| (sizeof((s1).f1) != sizeof((s2).f2)) \
|| ((s1).f1 != (s2).f2)) \
{ printf("failed check at %d: %s\n", __LINE__, \
#s1 "." #f1 " != " #s2 "." #f2); \
/* test failed: the field's byte offset is the bit number, so it must be smaller than the width of shared_bitmap */ \
(shared_bitmap) |= (1u << ((char *)&(s1).f1 - (char *)&(s1))); \
} \
} while (0)
I'm not too clear on whether it is a bitmap for all comparisons, or one per struct pair. I've assumed it is a single bitmap, shared_bitmap, for all of them.
There is quite a lot of checking to ensure you haven't broken 'the two rules':
((char *)&(s1).f1 - (char *)&(s1) != (char *)&(s2).f2 - (char *)&(s2)) || (sizeof((s1).f1) != sizeof((s2).f2))
If you are confident that the two rules hold, just drop that part of the test.
WARNING I have not compiled that code.
This becomes much simpler if the values are an array.
I probably wouldn't use it. It seems a bit too tricky to me :-)

Resources