C macro for setting bits

I have a program that compares variables from two structs and sets a bit in a bitmap variable accordingly. I have to compare each variable of the structs. In reality each struct has more variables, but for simplicity I took 3. I wanted to know if I can create a macro for comparing the variables and setting the bit in the bitmap accordingly.
#include<stdio.h>
struct num
{
int a;
int b;
int c;
};
struct num1
{
int d;
int e;
int f;
};
enum type
{
val1 = 0,
val2 = 1,
val3 = 2,
};
int main(void)
{
struct num obj1 = { 1, 2, 3 };
struct num1 obj2 = { 2, 2, 4 };
int bitmap = 0;
if (obj1.a != obj2.d)
{
bitmap = bitmap | val1;
}
if (obj1.b != obj2.e)
bitmap = bitmap | val2;
printf("bitmap - %d\n", bitmap);
return 0;
}
Can I declare a macro like this?
#define CHECK(cond)
if (!(cond))
printf(" failed check at %x: %s",__LINE__, #cond);
//set the bit accordingly
#undef CHECK

With a modicum of care, you can do it fairly easily. You just need to identify what you're comparing and setting carefully, and pass them as macro parameters. Example usage:
CHECK(obj1.a, obj2.d, bitmap, val1);
CHECK(obj1.b, obj2.e, bitmap, val2);
This assumes that CHECK is defined something like:
#define STRINGIFY(expr) #expr
#define CHECK(v1, v2, bitmap, bit) do \
{ if ((v1) != (v2)) \
{ printf("failed check at %d: %s\n", __LINE__, STRINGIFY(v1 != v2)); \
(bitmap) |= (1 << (bit)); \
} \
} while (0)
You can lay the macro out however you like, of course; I'm not entirely happy with that, but it isn't too awful.
Demo Code
Compilation and test run:
$ gcc -Wall -Wextra -g -O3 -std=c99 xx.c -o xx && ./xx
failed check at 40: obj1.a != obj2.d
failed check at 42: obj1.c != obj2.f
bitmap - 5
$
Actual code:
#include <stdio.h>
struct num
{
int a;
int b;
int c;
};
struct num1
{
int d;
int e;
int f;
};
enum type
{
val1 = 0,
val2 = 1,
val3 = 2,
};
#define STRINGIFY(expr) #expr
#define CHECK(v1, v2, bitmap, bit) do \
{ if ((v1) != (v2)) \
{ printf("failed check at %d: %s\n", __LINE__, STRINGIFY(v1 != v2)); \
(bitmap) |= (1 << (bit)); \
} \
} while (0)
int main(void)
{
struct num obj1 = { 1, 2, 3 };
struct num1 obj2 = { 2, 2, 4 };
int bitmap = 0;
CHECK(obj1.a, obj2.d, bitmap, val1);
CHECK(obj1.b, obj2.e, bitmap, val2);
CHECK(obj1.c, obj2.f, bitmap, val3);
printf("bitmap - %X\n", bitmap);
return 0;
}
Clearly, this code relies on you matching the right elements and bit numbers in the invocations of the CHECK macro.
It's possible to devise more complex schemes using offsetof() etc and initialized arrays describing the data structures, etc, but you'd end up with a more complex system and little benefit. In particular, the invocations can't reduce the parameter count much. You could assume 'bitmap' is the variable. You need to identify the two objects, so you'll specify 'obj1' and 'obj2'. Somewhere along the line, you need to identify which fields are being compared and the bit to set. That could be some single value (maybe the bit number), but you've still got 3 arguments (CHECK(obj1, obj2, valN) and the assumption about bitmap) or 4 arguments (CHECK(obj1, obj2, bitmap, valN) without the assumption about bitmap), but a lot of background complexity and probably a greater chance of getting it wrong. If you can tinker with the code so that you have a single type instead of two types, etc, then you can make life easier with the hypothetical system, but it is still simpler to handle things the way shown in the working code, I think.
I concur with gbulmer that I probably wouldn't do things this way, but you did state that you had reduced the sizes of the structures dramatically (for which, thanks!) and it would become more enticing as the number of fields increases (but I'd only write out the comparisons for one pair of structure types once, in a single function).
You could also revise the macro to:
#define CHECK(cond, bitmap, bit) do \
{ if (cond) \
{ printf("failed check at %d: %s\n", __LINE__, STRINGIFY(cond)); \
(bitmap) |= (1 << (bit)); \
} \
} while (0)
CHECK(obj1.a != obj2.d, bitmap, val1);
...
CHECK((strcmp(obj3.str1, obj4.str) != 0), bitmap, val6);
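where the last line shows that this allows you to choose arbitrary comparisons, even ones containing commas. Note the extra set of parentheses around the strcmp() comparison: they are harmless here, because the comma inside strcmp()'s own argument list is already protected by its parentheses, but they become necessary if the condition contains a comma that is not inside any parentheses.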

You should be able to do that, except that you need a backslash at the end of every line of a multi-line macro, and a /* */ comment rather than a // comment inside it:
#ifndef CHECK
#define CHECK(cond) \
if (!(cond)) { \
printf(" failed check at %d: %s\n", __LINE__, #cond); \
/* set the bit accordingly */ \
}
#endif /* CHECK */

If you want to get really fancy (and terse), you can use the token-pasting (concatenation) operator ##. I also recommend changing your structures around a little bit to use a shared naming convention, though without knowing what you're trying to do with them, it's hard to say. I also noticed that one of your bit values in the enum is 0; that won't tell you much when you look at the bitmap, because ORing 0 into anything leaves it unchanged. Anyway, here's your program slightly rewritten:
#include <stdio.h>
struct num {
int x1; // formerly a/d
int x2; // formerly b/e
int x3; // formerly c/f
};
enum type {
val1 = 1, // formerly 0
val2 = 2, // formerly 1
val3 = 4, // formerly 2
};
// CHECK uses the token-pasting operator (##) to construct obj1.x1, val1, etc.
// It assumes that obj1, obj2 and bitmap are in scope where it is used.
#define CHECK(n) do { \
if (obj1.x##n != obj2.x##n) \
bitmap |= val##n; \
} while (0)
int main(void) {
struct num obj1 = { 1, 2, 3 };
struct num obj2 = { 2, 2, 4 };
int bitmap = 0;
CHECK(1);
CHECK(2);
CHECK(3);
printf("bitmap - %X\n", bitmap);
return 0;
}

As a reasonable rule of thumb, when trying to do bit-arrays in C, there needs to be a number that can be used to index the bit.
You can either pass that bit number into the macro, or try to derive it.
Pretty much the only thing available at compile time or run time is the address of a field.
So you could use that.
There are a few questions to answer to see whether it might work.
For your structs:
Are all the fields in the same order? I.e. do you compare c with f, and never c with e?
Do all of the corresponding fields have the same type?
Is the condition just equality? Each macro will have the condition wired in, so each condition needs a new macro.
If the answer to all is yes, then you could use the address:
#define FIELD_OFFSET(s, f) ((char *)&(s).f - (char *)&(s))
#define CHECK(s1, f1, s2, f2) do \
{ if ((FIELD_OFFSET(s1, f1) != FIELD_OFFSET(s2, f2)) || (sizeof((s1).f1) != sizeof((s2).f2)) \
|| ((s1).f1 != (s2).f2)) \
{ printf("failed check at %d: %s\n", __LINE__, #s1 "." #f1 " != " #s2 "." #f2); \
/* test failed: bit number = field index (assumes equal-sized fields) */ \
(shared_bitmap) |= 1u << (FIELD_OFFSET(s1, f1) / sizeof((s1).f1)); \
} \
} while (0)
I'm not too clear on whether it is a bitmap for all comparisons, or one per struct pair. I've assumed it is a bitmap for all.
There is quite a lot of checking to ensure you haven't broken 'the two rules':
(FIELD_OFFSET(s1, f1) != FIELD_OFFSET(s2, f2)) || (sizeof((s1).f1) != sizeof((s2).f2))
If you are confident that the tests will be correct without those constraints, just throw that part of the test away.
WARNING I have not compiled that code.
This becomes much simpler if the values are an array.
I probably wouldn't use it. It seems a bit too tricky to me :-)

Related

Function to generate the corresponding mask for a bit field

I have a 32-bit register R with various bit fields declared as follows:
typedef union {
uint32_t raw;
struct {
uint32_t F1 : 0x4;
uint32_t F2 : 0x8;
uint32_t F3 : 0x8;
uint32_t F4 : 0xC;
};
} reg1_t;
reg1_t reg1;
I also have a RegWrite macro that read-modify-writes a field in the register as follows:
#define RegWrite(Reg, Field, Addr, Val) do { \
(Reg).raw = read32(Addr); \
(Reg).Field = (Val); \
write32((Addr), (Reg).raw); \
} while (0)
Now I want to enhance the RegWrite macro so that it can optionally print a script to the console instead of actually programming the hardware, so that the script can be saved and re-run later.
For example, if I call RegWrite as follows:
RegWrite(reg1, F2, 0x12345678, 0xC);
The print output from the macro should look something like this:
set variable1 [read 32 0x12345678]
set variable1 [ ($variable1 & 0xFFFFF00F) | (0xC << 4) ]
write 32 0x12345678 variable1
How would I generate 0xFFFFF00F, and 4 within the macro? Thanks!
Well, your question lacks some important information, including:
What are you trying to achieve?
Why do you need to pass just the struct member name as an argument?
This might be an XY problem.
Anyway, from the literal requirement:
Fn(X) should print out 0xY, and Z.
You can do this with a macro:
#include <stdint.h>
#include <stdio.h>
struct F {
uint32_t F1 : 0x4;
uint32_t F2 : 0x8;
uint32_t F3 : 0x8;
uint32_t F4 : 0xC;
};
#define Fn(Fx) do { \
union { \
struct F f; \
uint32_t u; \
} v; \
v.u = 0; \
v.u = ~v.u; \
v.f.Fx = 0; \
uint32_t m = v.u; \
int b; \
for (b = 0; (v.u & 1) != 0; b++) { \
v.u >>= 1; \
} \
(void)printf("0x%0X %d\n", m, b); \
} while (0)
int main(void) {
/* Fn(F2) should print out 0xFFFFF00F, and 4. */
Fn(F2);
/* Fn(F3) should print out 0xFFF00FFF, and 12. */
Fn(F3);
return 0;
}
Some notes to this hacked "solution":
It uses do { ... } while(0) to make sure that the macro can't be used as an expression, only as a statement.
There is no interpretation of Fx until it is read by the compiler in the line v.f.Fx = 0.
The code is only for C.
Each time it is used it costs clock cycles and code space, which seems unnecessary for values that are really constant expressions.
It works by defining a union that can be used as the struct or the resulting uint32_t.
The mask is generated by setting all bits to 1, and then resetting only the given struct member to 0.
The bit offset is obtained by looking for the first 0-bit from the right.
Please be aware that the standard makes no promises about the order of bit-fields in a memory word ("unit"), not even that they are in the same memory word. For further details see the section "Structure and union specifiers" of the version of the standard your compiler complies with.
But if you need the values for other purposes, you should think about your architecture, and of course about the possibilities of the C standard. As I said, presumably you're trying to solve a completely different problem, and for that, the source shown is not the solution.
OK, I found some time to search for some more usable solution.
#include <stdint.h>
#include <stdio.h>
struct F {
uint32_t F1 : 0x4;
uint32_t F2 : 0x8;
uint32_t F3 : 0x8;
uint32_t F4 : 0xC;
};
typedef union {
struct F f;
uint32_t u;
} Fn_type;
uint32_t Fn_mask_helper(Fn_type v) {
return ~v.u;
}
#define Fn_mask(Fx) Fn_mask_helper((Fn_type){.u=0, .f.Fx=~0})
int Fn_bit_offset_helper(Fn_type v) {
v.u = ~v.u;
int b;
for (b = 0; (v.u & 1) != 0; b++) {
v.u >>= 1;
}
return b;
}
#define Fn_bit_offset(Fx) Fn_bit_offset_helper((Fn_type){.u=0, .f.Fx=~0})
int main(void) {
uint32_t m2 = Fn_mask(F2);
int b2 = Fn_bit_offset(F2);
(void)printf("0x%0X %d\n", m2, b2);
uint32_t m3 = Fn_mask(F3);
int b3 = Fn_bit_offset(F3);
(void)printf("0x%0X %d\n", m3, b3);
return 0;
}
To access the field (struct member) specified in the argument we need to use a macro: in C, a function can't take the name of a struct member as an argument. Remember, the C preprocessor knows nothing about C. It is a quite simple search'n'replace tool.
This macro expands to a call to its helper function, which takes the union as a parameter. The macro replacement text contains an initialization for this union with all bits set to 0 except the bits of the given struct member, which are set to 1.
The helper functions do the same as the macro in my other answer. In Fn_bit_offset_helper() the inversion of v.u together with the right shift ensures that the loop will not loop forever.
Note: You need a compiler in compliance with at least C99.

compile time check for enums [duplicate]

Is there a compile-time way to detect / prevent duplicate values within a C/C++ enumeration?
The catch is that there are multiple items which are initialized to explicit values.
Background:
I've inherited some C code such as the following:
#define BASE1_VAL (5)
#define BASE2_VAL (7)
typedef enum
{
MsgFoo1A = BASE1_VAL, // 5
MsgFoo1B, // 6
MsgFoo1C, // 7
MsgFoo1D, // 8
MsgFoo1E, // 9
MsgFoo2A = BASE2_VAL, // Uh oh! 7 again...
MsgFoo2B // Uh oh! 8 again...
} FOO;
The problem is that as the code grows & as developers add more messages to the MsgFoo1x group, eventually it overruns BASE2_VAL.
This code will eventually be migrated to C++, so if there is a C++-only solution (template magic?), that's OK -- but a solution that works with C and C++ is better.
There are a couple ways to check this compile time, but they might not always work for you. Start by inserting a "marker" enum value right before MsgFoo2A.
typedef enum
{
MsgFoo1A = BASE1_VAL,
MsgFoo1B,
MsgFoo1C,
MsgFoo1D,
MsgFoo1E,
MARKER_1_DONT_USE, /* Don't use this value, but leave it here. */
MsgFoo2A = BASE2_VAL,
MsgFoo2B
} FOO;
Now we need a way to ensure that MARKER_1_DONT_USE < BASE2_VAL at compile time. There are two common techniques.
Negative size arrays
It is an error to declare an array with negative size. This looks a little ugly, but it works.
extern int IGNORE_ENUM_CHECK[MARKER_1_DONT_USE > BASE2_VAL ? -1 : 1];
Almost every compiler ever written will generate an error if MARKER_1_DONT_USE is greater than BASE2_VAL. GCC spits out:
test.c:16: error: size of array ‘IGNORE_ENUM_CHECK’ is negative
Static assertions
If your compiler supports C11, you can use _Static_assert. Support for C11 is not ubiquitous, but your compiler may support _Static_assert anyway, especially since the corresponding feature in C++ is widely supported.
_Static_assert(MARKER_1_DONT_USE < BASE2_VAL, "Enum values overlap.");
GCC spits out the following message:
test.c:16:1: error: static assertion failed: "Enum values overlap."
_Static_assert(MARKER_1_DONT_USE < BASE2_VAL, "Enum values overlap.");
^
I didn't see "pretty" in your requirements, so I submit this solution implemented using the Boost Preprocessor library.
As an up-front disclaimer, I haven't used Boost.Preprocessor a whole lot and I've only tested this with the test cases presented here, so there could be bugs, and there may be an easier, cleaner way to do this. I certainly welcome comments, corrections, suggestions, insults, etc.
Here we go:
#include <boost/preprocessor.hpp>
#define EXPAND_ENUM_VALUE(r, data, i, elem) \
BOOST_PP_SEQ_ELEM(0, elem) \
BOOST_PP_IIF( \
BOOST_PP_EQUAL(BOOST_PP_SEQ_SIZE(elem), 2), \
= BOOST_PP_SEQ_ELEM(1, elem), \
BOOST_PP_EMPTY()) \
BOOST_PP_COMMA_IF(BOOST_PP_NOT_EQUAL(data, BOOST_PP_ADD(i, 1)))
#define ADD_CASE_FOR_ENUM_VALUE(r, data, elem) \
case BOOST_PP_SEQ_ELEM(0, elem) : break;
#define DEFINE_UNIQUE_ENUM(name, values) \
enum name \
{ \
BOOST_PP_SEQ_FOR_EACH_I(EXPAND_ENUM_VALUE, \
BOOST_PP_SEQ_SIZE(values), values) \
}; \
\
namespace detail \
{ \
void UniqueEnumSanityCheck##name() \
{ \
switch (name()) \
{ \
BOOST_PP_SEQ_FOR_EACH(ADD_CASE_FOR_ENUM_VALUE, name, values) \
} \
} \
}
We can then use it like so:
DEFINE_UNIQUE_ENUM(DayOfWeek, ((Monday) (1))
((Tuesday) (2))
((Wednesday) )
((Thursday) (4)))
The enumerator value is optional; this code generates an enumeration equivalent to:
enum DayOfWeek
{
Monday = 1,
Tuesday = 2,
Wednesday,
Thursday = 4
};
It also generates a sanity-check function that contains a switch statement as described in Ben Voigt's answer. If we change the enumeration declaration such that we have non-unique enumerator values, e.g.,
DEFINE_UNIQUE_ENUM(DayOfWeek, ((Monday) (1))
((Tuesday) (2))
((Wednesday) )
((Thursday) (1)))
it will not compile (Visual C++ reports the expected error C2196: case value '1' already used).
Thanks also to Matthieu M., whose answer to another question got me interested in the Boost Preprocessor library.
I don't believe there's a way to detect this with the language itself, considering there are conceivable cases where you'd want two enumeration values to be the same. You can, however, always ensure all explicitly set items are at the top of the list:
typedef enum
{
MsgFoo1A = BASE1_VAL, // 5
MsgFoo2A = BASE2_VAL, // 7
MsgFoo1B, // 8
MsgFoo1C, // 9
MsgFoo1D, // 10
MsgFoo1E, // 11
MsgFoo2B // 12
} FOO;
So long as the assigned values are at the top (and in ascending order), no collision is possible, unless for some reason the macros expand to values which are the same.
Usually this problem is overcome by giving a fixed number of bits to each MsgFooX group and ensuring each group does not overflow its allotted number of bits. The "number of bits" solution is nice because it allows a bitwise test to determine which message group something belongs to. But there's no built-in language feature to detect duplicates, because there are legitimate cases for an enum having two of the same value:
typedef enum
{
gray = 4, //Gr[ae]y should be the same
grey = 4,
color = 5, //Also makes sense in some cases
couleur = 5
} FOO;
I don't know of anything that will automatically check all enum members, but if you want to check that future changes to the initializers (or the macros they rely on) don't cause collisions:
switch (0) {
case MsgFoo1A: break;
case MsgFoo1B: break;
case MsgFoo1C: break;
case MsgFoo1D: break;
case MsgFoo1E: break;
case MsgFoo2A: break;
case MsgFoo2B: break;
}
will cause a compiler error if any of the integral values is reused, and most compilers will even tell you what value (the numeric value) was a problem.
You could roll a more robust solution for defining enums using Boost.Preprocessor; whether it's worth the time is a different matter.
If you are moving to C++ anyway, maybe the (proposed) Boost.Enum suits you (available via the Boost Vault).
Another approach might be to use something like gccxml (or more comfortably pygccxml) to identify candidates for manual inspection.
While we do not have full on reflection, you can solve this problem if you can relist the enumeration values.
Somewhere this is declared:
enum E { A = 0, B = 0 };
elsewhere, we build this machinery:
#include <type_traits>
// Primary template: no values left to compare s0 against.
template<typename S, S s0, S... s>
struct first_not_same_as_rest : std::true_type {};
// Partial specialization: compare s0 with s1, then with the rest.
template<typename S, S s0, S s1, S... s>
struct first_not_same_as_rest<S, s0, s1, s...> : std::integral_constant< bool,
(s0 != s1) && first_not_same_as_rest< S, s0, s... >::value
> {};
// Primary template: an empty list is trivially distinct.
template<typename S, S... s>
struct is_distinct : std::true_type {};
template<typename S, S s0, S... s>
struct is_distinct<S, s0, s...> : std::integral_constant< bool,
is_distinct<S, s...>::value &&
first_not_same_as_rest< S, s0, s... >::value
> {};
Once you have that machinery (which requires C++11), we can do the following:
static_assert( is_distinct< E, A, B >::value, "duplicate values in E detected" );
and at compile time we will ensure that no two elements are equal.
This requires O(n) recursion depth and O(n^2) work by the compiler at compile time, so for extremely large enums this could cause problems. An O(lg n) depth and O(n lg n) work solution, with a much larger constant factor, can be done by sorting the list of elements first, but that is much, much more work.
With the enum reflection code proposed for C++1y-C++17, this will be doable without relisting the elements.
I didn't completely like any of the answers already posted here, but they gave me some ideas. The crucial technique is Ben Voigt's answer of using a switch statement: if multiple cases in a switch share the same number, you'll get a compile error.
Most usefully to both myself and probably the original poster, this doesn't require any C++ features.
To clean things up, I used aaronps's answer at How can I avoid repeating myself when creating a C++ enum and a dependent data structure?
First, define this in some header someplace:
#define DEFINE_ENUM_VALUE(name, value) name=value,
#define CHECK_ENUM_VALUE(name, value) case name:
#define DEFINE_ENUM(enum_name, enum_values) \
typedef enum { enum_values(DEFINE_ENUM_VALUE) } enum_name;
#define CHECK_ENUM(enum_name, enum_values) \
void enum_name ## _test (void) { switch(0) { enum_values(CHECK_ENUM_VALUE); } }
Now, whenever you need to have an enumeration:
#define COLOR_VALUES(GEN) \
GEN(Red, 1) \
GEN(Green, 2) \
GEN(Blue, 2)
Finally, these lines are required to actually make the enumeration:
DEFINE_ENUM(Color, COLOR_VALUES)
CHECK_ENUM(Color, COLOR_VALUES)
DEFINE_ENUM makes the enum data type itself. CHECK_ENUM makes a test function that switches on all the enum values. Compiling CHECK_ENUM fails with a duplicate case value error if you have duplicates, as COLOR_VALUES above does (Green and Blue are both 2).
Here's a solution using an X macro without Boost. First define the X macro and its helper macros. I'm using this technique to portably give the X macro two overloads, so that you can define each enumerator with or without an explicit value. If you're using GCC or Clang, it can be made shorter.
#define COUNT_X_ARGS_IMPL2(_1, _2, count, ...) count
#define COUNT_X_ARGS_IMPL(args) COUNT_X_ARGS_IMPL2 args
#define COUNT_X_ARGS(...) COUNT_X_ARGS_IMPL((__VA_ARGS__, 2, 1, 0))
/* Pick the right X macro to invoke. */
#define X_CHOOSE_HELPER2(count) X##count
#define X_CHOOSE_HELPER1(count) X_CHOOSE_HELPER2(count)
#define X_CHOOSE_HELPER(count) X_CHOOSE_HELPER1(count)
/* The actual macro. */
#define X_GLUE(x, y) x y
#define X(...) X_GLUE(X_CHOOSE_HELPER(COUNT_X_ARGS(__VA_ARGS__)), (__VA_ARGS__))
Then define the macro and check it
#define BASE1_VAL (5)
#define BASE2_VAL (7)
// Enum values
#define MY_ENUM \
X(MsgFoo1A, BASE1_VAL) \
X(MsgFoo1B) \
X(MsgFoo1C) \
X(MsgFoo1D) \
X(MsgFoo1E) \
X(MsgFoo2A, BASE2_VAL) \
X(MsgFoo2B)
// Define the enum
#define X1(enum_name) enum_name,
#define X2(enum_name, enum_value) enum_name = enum_value,
enum foo
{
MY_ENUM
};
#undef X1
#undef X2
// Check duplicates
#define X1(enum_name) case enum_name: break;
#define X2(enum_name, enum_value) case enum_name: break;
static void check_enum_duplicate()
{
switch(0)
{
MY_ENUM
}
}
#undef X1
#undef X2
Use it:
#include <stdio.h>
int main(void)
{
// Do something with the whole enum
#define X1(enum_name) printf("%s = %d\n", #enum_name, enum_name);
#define X2(enum_name, enum_value) printf("%s = %d\n", #enum_name, enum_value);
// Print the whole enum
MY_ENUM
#undef X1
#undef X2
}

C: fastest way to evaluate a function on a finite set of small integer values by using a lookup table?

I am currently working on a project where I would like to optimize some numerical computation in Python by calling C.
In short, I need to compute y[i] = f(x[i]) for each element of a huge array x (typically 10^9 entries or more). Here, x[i] is an integer between -10 and 10, and f is a function that takes x[i] and returns a double. My issue is that f takes a very long time to evaluate in a numerically stable way.
To speed things up, I would like to just hard code all 2*10 + 1 possible values of f(x[i]) into constant array such as:
double table_of_values[] = {f(-10), ...., f(10)};
And then just evaluate f using a "lookup table" approach as follows:
for (i = 0; i < N; i++) {
y[i] = table_of_values[x[i] + 10]; // instead of y[i] = f(x[i])
}
Since I am not really well-versed at writing optimized code in C, I am wondering:
Specifically, since x is really large, I'm wondering whether it's worth doing second-level optimizations when evaluating the loop (e.g. by sorting x beforehand, or by finding a smart way to deal with the negative indices, aside from just doing [x[i] + 10]).
Say x[i] were not between -10 and 10, but between -20 and 20. In this case, I could still use the same approach, but would need to hard code the lookup table manually. Is there a way to generate the look-up table dynamically in the code so that I make use of the same approach and allow for x[i] to belong to a variable range?
It's fairly easy to generate such a table with dynamic range values.
Here's a simple, single table method:
#include <stdlib.h> // malloc(), free()
#define VARIABLE_USED(_sym) \
do { \
if (1) \
break; \
if (!! _sym) \
break; \
} while (0)
double *table_of_values;
int table_bias;
// use the smallest of these that can contain the values the x array may have
#if 0
typedef int xval_t;
#endif
#if 0
typedef short xval_t;
#endif
#if 1
typedef signed char xval_t; // plain char may be unsigned, and x holds negative values
#endif
#define XLEN (1 << 9)
xval_t *x;
// fslow -- your original function
double
fslow(int i)
{
return 1; // whatever
}
// ftablegen -- generate variable table
void
ftablegen(double (*f)(int),int lo,int hi)
{
int len;
table_bias = -lo;
len = hi - lo;
len += 1;
// NOTE: you can do free(table_of_values) when no longer needed
table_of_values = malloc(sizeof(double) * len);
for (int i = lo; i <= hi; ++i)
table_of_values[i + table_bias] = f(i);
}
// fcached -- retrieve cached table data
double
fcached(int i)
{
return table_of_values[i + table_bias];
}
// fripper -- access x and table arrays
void
fripper(xval_t *x)
{
double *tptr;
int bias;
double val;
// ensure these go into registers to prevent needless extra memory fetches
tptr = table_of_values;
bias = table_bias;
for (int i = 0; i < XLEN; ++i) {
val = tptr[x[i] + bias];
// do stuff with val
VARIABLE_USED(val);
}
}
int
main(void)
{
ftablegen(fslow,-10,10);
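x = malloc(sizeof(xval_t) * XLEN);
// fill x with values inside the table's range (-10..10) before reading it
for (int i = 0; i < XLEN; ++i)
x[i] = (xval_t) (i % 21 - 10);
fripper(x);
free(x);
free(table_of_values);
return 0;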
}
Here's a slightly more complex way that allows many similar tables to be generated:
#include <stdlib.h> // malloc(), free()
#define VARIABLE_USED(_sym) \
do { \
if (1) \
break; \
if (!! _sym) \
break; \
} while (0)
// use the smallest of these that can contain the values the x array may have
#if 0
typedef int xval_t;
#endif
#if 1
typedef short xval_t;
#endif
#if 0
typedef signed char xval_t; // plain char may be unsigned, and x holds negative values
#endif
#define XLEN (1 << 9)
xval_t *x;
struct table {
int tbl_lo; // lowest index
int tbl_hi; // highest index
int tbl_bias; // bias for index
double *tbl_data; // cached data
};
struct table ftable1;
struct table ftable2;
double
fslow(int i)
{
return 1; // whatever
}
double
f2(int i)
{
return 2; // whatever
}
// ftablegen -- generate variable table
void
ftablegen(double (*f)(int),int lo,int hi,struct table *tbl)
{
int len;
tbl->tbl_lo = lo;
tbl->tbl_hi = hi;
tbl->tbl_bias = -lo;
len = hi - lo;
len += 1;
// NOTE: you can do free(tbl->tbl_data) when no longer needed
tbl->tbl_data = malloc(sizeof(double) * len);
for (int i = lo; i <= hi; ++i)
tbl->tbl_data[i + tbl->tbl_bias] = f(i);
}
// fcached -- retrieve cached table data
double
fcached(struct table *tbl,int i)
{
return tbl->tbl_data[i + tbl->tbl_bias];
}
// fripper -- access x and table arrays
void
fripper(xval_t *x,struct table *tbl)
{
double *tptr;
int bias;
double val;
// ensure these go into registers to prevent needless extra memory fetches
tptr = tbl->tbl_data;
bias = tbl->tbl_bias;
for (int i = 0; i < XLEN; ++i) {
val = tptr[x[i] + bias];
// do stuff with val
VARIABLE_USED(val);
}
}
int
main(void)
{
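x = malloc(sizeof(xval_t) * XLEN);
// fill x with values inside ftable1's range (-37..62) before reading it
for (int i = 0; i < XLEN; ++i)
x[i] = (xval_t) (i % 100 - 37);
// NOTE: we could use 'char' for xval_t ...
ftablegen(fslow,-37,62,&ftable1);
fripper(x,&ftable1);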
// ... but, this forces us to use a 'short' for xval_t
ftablegen(f2,-99,307,&ftable2);
return 0;
}
Notes:
fcached could/should be an inline function for speed. Notice that once the table has been calculated, fcached(x[i]) is quite fast. The index offset issue you mentioned [solved by the "bias"] is trivially small in calculation time.
While x may be a large array, the cached array for f() values is fairly small (e.g. -10 to 10). Even if it were (e.g.) -100 to 100, this is still only 201 doubles (about 1.6 KB). This small cached array will [probably] stay in the hardware memory cache, so access will remain quite fast.
Thus, sorting x to optimize H/W cache performance of the lookup table will have little to no [measurable] effect.
The access pattern to x is a separate matter. You'll get the best performance if you access x in a linear manner (e.g. for (i = 0; i < 999999999; ++i) x[i]). If you access it in a semi-random fashion, it will put a strain on the H/W cache logic and its ability to keep the needed/wanted x values "cache hot".
Even with linear access, because x is so large, by the time you get to the end, the first elements will have been evicted from the H/W cache (e.g. most CPU caches are on the order of a few megabytes).
However, if x only has values in a limited range, changing the type from int x[...] to short x[...] or even char x[...] cuts the size by a factor of 2x [or 4x]. And, that can have a measurable improvement on the performance.
Update: I've added a fripper function to show the fastest way [that I know of] to access the table and x arrays in a loop. I've also added a typedef named xval_t to allow the x array to consume less space (i.e. it will have better H/W cache performance).
UPDATE #2:
Per your comments ...
fcached was coded [mostly] to illustrate simple/single access. But, it was not used in the final example.
The exact requirements for inline have varied over the years (e.g. extern inline). The best choice now is static inline. However, if you are using C++, it may be different yet again. There are entire pages devoted to this; the differences come from how the function is compiled in different .c files and whether optimization is on or off. You can also use a gcc extension to force inlining all the time:
__attribute__((__always_inline__)) static inline
fripper is the fastest because it avoids refetching globals table_of_values and table_bias on each loop iteration. In fripper, compiler optimizer will ensure they remain in registers. See my answer: Is accessing statically or dynamically allocated memory faster? as to why.
However, I coded an fripper variant that uses fcached and the disassembled code was the same [and optimal]. So, we can disregard that ... Or, can we? Sometimes, disassembling the code is a good cross check and the only way to know for sure. Just an extra item when creating fully optimized C code. There are many options one can give to the compiler regarding code generation, so sometimes it's just trial and error.
Because benchmarking is important, I threw in my routines for timestamping (FYI, [AFAIK] the underlying clock_gettime call is the basis for python's time.clock()).
So, here's the updated version:
#define _POSIX_C_SOURCE 199309L // needed for clock_gettime() under -std=c99
#include <stdlib.h> // malloc(), free()
#include <time.h>
typedef long long s64;
#define SUPER_INLINE \
__attribute__((__always_inline__)) static inline
#define VARIABLE_USED(_sym) \
do { \
if (1) \
break; \
if (!! _sym) \
break; \
} while (0)
#define TVSEC 1000000000LL // nanoseconds in a second
#define TVSECF 1e9 // nanoseconds in a second
// tvget -- get high resolution time of day
// RETURNS: absolute nanoseconds
s64
tvget(void)
{
struct timespec ts;
s64 nsec;
clock_gettime(CLOCK_REALTIME,&ts);
nsec = ts.tv_sec;
nsec *= TVSEC;
nsec += ts.tv_nsec;
return nsec;
}
// tvgetf -- get high resolution time of day
// RETURNS: fractional seconds
double
tvgetf(void)
{
struct timespec ts;
double sec;
clock_gettime(CLOCK_REALTIME,&ts);
sec = ts.tv_nsec;
sec /= TVSECF;
sec += ts.tv_sec;
return sec;
}
double *table_of_values;
int table_bias;
double *dummyptr;
// use the smallest of these that can contain the values the x array may have
#if 0
typedef int xval_t;
#endif
#if 0
typedef short xval_t;
#endif
#if 1
typedef signed char xval_t; // plain char may be unsigned, and x holds negative values
#endif
#define XLEN (1 << 9)
xval_t *x;
// fslow -- your original function
double
fslow(int i)
{
return 1; // whatever
}
// ftablegen -- generate variable table
void
ftablegen(double (*f)(int),int lo,int hi)
{
int len;
table_bias = -lo;
len = hi - lo;
len += 1;
// NOTE: you can do free(table_of_values) when no longer needed
table_of_values = malloc(sizeof(double) * len);
for (int i = lo; i <= hi; ++i)
table_of_values[i + table_bias] = f(i);
}
// fcached -- retrieve cached table data
SUPER_INLINE double
fcached(int i)
{
return table_of_values[i + table_bias];
}
// fripper_fcached -- access x and table arrays
void
fripper_fcached(xval_t *x)
{
double val;
double *dptr;
dptr = dummyptr;
for (int i = 0; i < XLEN; ++i) {
val = fcached(x[i]);
// do stuff with val
dptr[i] = val;
}
}
// fripper -- access x and table arrays
void
fripper(xval_t *x)
{
double *tptr;
int bias;
double val;
double *dptr;
// ensure these go into registers to prevent needless extra memory fetches
tptr = table_of_values;
bias = table_bias;
dptr = dummyptr;
for (int i = 0; i < XLEN; ++i) {
val = tptr[x[i] + bias];
// do stuff with val
dptr[i] = val;
}
}
int
main(void)
{
ftablegen(fslow,-10,10);
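x = malloc(sizeof(xval_t) * XLEN);
// fill x with values inside the table's range (-10..10) before reading it
for (int i = 0; i < XLEN; ++i)
x[i] = (xval_t) (i % 21 - 10);
dummyptr = malloc(sizeof(double) * XLEN);
fripper(x);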
fripper_fcached(x);
return 0;
}
You can use negative indices on a pointer into an array, as long as the element you reach is still inside that array. If you have the following code:
int arr[] = {1, 2 ,3, 4, 5};
int* lookupTable = arr + 3;
printf("%i", lookupTable[-2]);
it will print out 2.
This works because a[i] is defined as *(a + i), and an array used in an expression decays to a pointer to its first element. If the pointer does not point to the beginning of the array, you can access the elements before it.
Keep in mind, though, that if you malloc() the memory for arr, you must pass the original pointer to free(); calling free(lookupTable) is undefined behavior.
I really think Craig Estey is on the right track for building your table in an automatic way. I just want to add a note for looking up the table.
If you know that you will run the code on a Haswell machine (with AVX2), you should make sure your code uses VGATHERDPD, which is available through the _mm256_i32gather_pd intrinsic. If you do that, your table lookups will fly! (You can even detect AVX2 at run time with the CPUID instruction, but that's another story.)
EDIT:
Let me elaborate with some code:
#include <stdint.h>
#include <stdio.h>
#include <immintrin.h>
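/* gathers do not require an aligned table; the alignment attribute is harmless */
double table[8] __attribute__((aligned(16)))= { 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 };
int main(void)
{
int32_t i[4] = { 0,2,4,6 };
/* i is not guaranteed to be 16-byte aligned, so use the unaligned load */
__m128i index = _mm_loadu_si128( (__m128i*) i );
__m256d result = _mm256_i32gather_pd( table, index, 8 );
double f[4];
_mm256_storeu_pd( f, result ); /* copy the four lanes out without pointer casts */
printf("%f %f %f %f\n", f[0], f[1], f[2], f[3]);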
return 0;
}
Compile and run:
$ gcc --std=gnu99 -mavx2 gathertest.c -o gathertest && ./gathertest
0.100000 0.300000 0.500000 0.700000
This is fast!

Macro for run-once conditioning

I am trying to build a macro that runs code only once.
This is useful, for example, when code runs in a loop and you want something inside it to happen only once. The easy method is:
static int checksum;
for( ; ; )
{
if(checksum == 0) { checksum = 1; /* ... */ }
}
But it is a bit wasteful and confusing, so I wrote these macros, which test individual bits instead of the true/false state of a whole variable:
#define CHECKSUM(d) static d checksum_boolean
#define CHECKSUM_IF(x) if( ~(checksum_boolean >> x) & 1) \
{ \
checksum_boolean |= 1 << x;
#define CHECKSUM_END }1
The 1 at the end forces the user to put a semicolon after CHECKSUM_END (the result is the harmless expression statement 1;). My compiler accepts this.
The problem is figuring out how to do this without requiring the user to specify x (the bit to be checked),
so that they can write this:
CHECKSUM(char); // 7 run-once codes can be used
for( ; ; )
{
CHECKSUM_IF
// code..
CHECKSUM_END;
}
Any ideas how I can achieve this?
I guess you're saying you want the macro to somehow automatically track which bit of your bitmask contains the flag for the current test. You could do it like this:
#define CHECKSUM(d) static d checksum_boolean; \
d checksum_mask
#define CHECKSUM_START do { checksum_mask = 1; } while (0)
#define CHECKSUM_IF do { \
if (!(checksum_boolean & checksum_mask)) { \
checksum_boolean |= checksum_mask;
#define CHECKSUM_END \
} \
checksum_mask <<= 1; \
} while (0)
#define CHECKSUM_RESET(i) do { checksum_boolean &= ~((uintmax_t) 1 << (i)); } while (0)
Which you might use like this:
CHECKSUM(char); // 7 run-once codes can be used
for( ; ; )
{
CHECKSUM_START;
CHECKSUM_IF
// code..
CHECKSUM_END;
CHECKSUM_IF
// other code..
CHECKSUM_END;
}
Note, however, that that has severe limitations:
The CHECKSUM_START macro and all the corresponding CHECKSUM_IF macros must all appear in the same scope
Control must always pass through CHECKSUM_START before any of the CHECKSUM_IF blocks
Control must always reach the CHECKSUM_IF blocks in the same order. It may only skip a CHECKSUM_IF block if it also skips all subsequent ones that use the same checksum bitmask.
Those constraints arise because the preprocessor cannot count.
To put it another way, barring macro redefinitions, a macro without any arguments always expands to exactly the same text. Therefore, if you don't use a macro argument to indicate which flag bit applies in each case then that needs to be tracked at run time.

Best way to define offsets via C preprocessor

I would like to define a macro that will help me to auto generate offsets. Something like this:
#define MEM_OFFSET(name, size) ...
MEM_OFFSET(param1, 1);
MEM_OFFSET(param2, 2);
MEM_OFFSET(param3, 4);
MEM_OFFSET(param4, 1);
should generate the following code:
const int param1_offset = 0;
const int param2_offset = 1;
const int param3_offset = 3;
const int param4_offset = 7;
or
enum {
param1_offset = 0,
param2_offset = 1,
param3_offset = 3,
param4_offset = 7,
};
or even (surely not possible with the C preprocessor alone, but who knows ;)
#define param1_offset 0
#define param2_offset 1
#define param3_offset 3
#define param4_offset 7
Is it possible to do without running external awk/bash/... scripts?
I'm using Keil C51
It seems I've found a solution with enum:
#define MEM_OFFSET(name, size) \
name ## _offset, \
name ## _last = name ## _offset + (size) - 1, // last byte of this field; the next enumerator defaults to name_last + 1 (names starting with __ are reserved in C)
enum {
MEM_OFFSET(param1, 1)
MEM_OFFSET(param2, 2)
MEM_OFFSET(param3, 4)
MEM_OFFSET(param4, 1)
};
In the comments to your post you mention that you're managing an EEPROM memory map, so this answer relates to managing memory offsets rather than answering your specific question.
One way to manage EEPROM memory is with a packed struct, i.e. one with no padding between the elements. The struct is never instantiated; it is only used for offset calculations.
typedef struct {
uint8_t param1;
#ifdef FEATURE_ENABLED
uint16_t param2;
#endif
uint8_t param3;
} __packed eeprom_memory_layout_t;
You could then use code like the following (untested) to determine the offset of each element as needed. It uses the offsetof macro from <stddef.h>.
uint8_t read_param3(void) {
uint8_t buf;
eeprom_memory_layout_t * ee; /* only used inside sizeof, never dereferenced */
/* eeprom_read(offset, size, buf) */
eeprom_read(offsetof(eeprom_memory_layout_t, param3), sizeof(ee->param3), &buf);
return buf;
}
Note that the struct is never instantiated. Using a struct like this makes it easy to see your memory map at a glance, and macros can easily be used to abstract away the calls to offsetof and sizeof during access.
If you want to create several structures based on some preprocessor declarations, you could do something like:
#define OFFSET_FOREACH(MODIFIER) \
MODIFIER(1) \
MODIFIER(2) \
MODIFIER(3) \
MODIFIER(4)
#define OFFSET_MODIFIER_ENUM(NUM) param##NUM##_offset,
enum
{
OFFSET_FOREACH(OFFSET_MODIFIER_ENUM)
};
The preprocessor would then produce the following code:
enum
{
param1_offset,
param2_offset,
param3_offset,
param4_offset,
};
I'm sure somebody will figure a nice preprocessor trick to compute the offset values with the sum of its predecessors :)
If you are doing this in C code, you have to keep in mind that const int declarations do not declare constants in C: a const int object is not an integer constant expression, so it cannot be used, for example, as a case label. To declare a named constant you have to use either enum or #define.
If you need int constants specifically, then enum will work well, although the auto-generation part might be tricky in any case. Off the top of my head I can only come up with something as ugly as
#define MEM_OFFSET_BEGIN(name, size)\
enum {\
name##_OFFSET = 0,\
name##_SIZE__ = size,
#define MEM_OFFSET(name, size, prev_name)\
name##_OFFSET = prev_name##_OFFSET + prev_name##_SIZE__,\
name##_SIZE__ = size,
#define MEM_OFFSET_END()\
};
and then
MEM_OFFSET_BEGIN(param1, 1)
MEM_OFFSET(param2, 2, param1)
MEM_OFFSET(param3, 4, param2)
MEM_OFFSET(param4, 1, param3)
MEM_OFFSET_END()
Needless to say, the fact that it requires the next offset declaration to refer to the previous offset declaration by name defeats most of the purpose of this construct.
Try something like:
#define OFFSET(x) offsetof(struct {\
char param1[1], param2[2], param3[4], param4[1];\
},x)
Then you can use OFFSET(param1), etc. and it's even an integer constant expression.

Resources