How to catch bugs of the form sizeof(#define) - c

I'm sure there are sometimes good reasons for taking the sizeof() of a #define in C, but I occasionally come across bugs where someone has taken the sizeof() of a #define instead of the sizeof() of a structure (and in my codebase I never need to take the sizeof() of a #define).
For example (contrived, but hopefully illustrates the point):
typedef struct my_struct
{
fields
} MY_STRUCT;
#define MY_DEFINE 1234
my_size = sizeof(MY_DEFINE); // Should be sizeof(MY_STRUCT)
Is there any easy, automated way to catch this?
Thanks for any help.
NickB

Well, no. Macros are macros. If the result of macro substitution is a valid expression (or type), then the code will compile. The compiler does not know what you want to do.
One thing that might help you (or not), is that in this specific example you want to sizeof a type as opposed to sizeof of an expression. If your coding standard insisted on always doing it through struct tag, as in
sizeof(struct my_struct)
then accidental mistakes like the specific one in your example would be less likely. Although other mistakes would not be.
You could probably replace your sizeof with a macro that somehow requires a type (and use it everywhere in place of ordinary sizeof). For example, something like this
#define SIZE_OF_TYPE(T) ((T *) 0, sizeof(T))
would fail to compile with a non-type argument. But it will also fail to compile with some type arguments (for example an array type such as int[4], because "int[4] *" is not valid syntax).
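As a minimal sketch of how such a type-only macro behaves (the struct fields and the helper my_struct_size are hypothetical, and the (void) cast is my addition to keep compilers from warning about an unused expression):

```c
#include <assert.h>
#include <stddef.h>

typedef struct my_struct {
    int id;        /* hypothetical fields, for illustration only */
    double value;
} MY_STRUCT;

#define MY_DEFINE 1234

/* (T *)0 is only well-formed when T names a type, so a numeric macro
   such as MY_DEFINE is rejected at compile time. */
#define SIZE_OF_TYPE(T) ((void)(T *)0, sizeof(T))

static size_t my_struct_size(void)
{
    /* SIZE_OF_TYPE(MY_DEFINE) would expand to (1234 *)0 and not compile. */
    size_t n = SIZE_OF_TYPE(MY_STRUCT);
    assert(n == sizeof(MY_STRUCT));
    return n;
}
```

One caveat with this sketch: because of the comma operator the result is not an integer constant expression in C, so it cannot be used for things like file-scope array sizes.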
Actually I don't know your context, but in general the whole idea seems counterproductive. A good programming practice is actually to avoid applying sizeof to types, preferring to apply it to expressions instead, as in
int *p = malloc(n * sizeof *p); /* good */
instead of
int *p = malloc(n * sizeof(int)); /* bad */
And you seem to want to move in the opposite direction.
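To illustrate why the expression form is preferred, here is a small sketch (make_zeroed is a hypothetical helper name). If the element type is later changed from double to something else, the allocation size follows automatically because it is derived from *p:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates n doubles, all set to zero; returns NULL on failure.
   Hypothetical helper, only here to show the sizeof *p idiom. */
static double *make_zeroed(size_t n)
{
    assert(n > 0);                      /* malloc(0) may legitimately return NULL */
    double *p = malloc(n * sizeof *p);  /* size follows the type of *p */
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        p[i] = 0.0;
    return p;
}
```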

Why are you using ALL CAPS in your typedef'd name? A typedef is a C language construct, as opposed to a C preprocessor construct. It's an identifier, just like any other identifier.
If you only use all caps for MACROs, it will be pretty obvious when you're using them.


C function that returns a pointer to an array correct syntax?

In C you can declare a variable that points to an array like this:
int int_arr[4] = {1,2,3,4};
int (*ptr_to_arr)[4] = &int_arr;
Although practically it is the same as just declaring a pointer to int:
int *ptr_to_arr2 = int_arr;
But syntactically it is something different.
Now, what would a function look like that returns such a pointer to an array (of int, e.g.)?
A declaration of int is int foo;.
A declaration of an array of 4 int is int foo[4];.
A declaration of a pointer to an array of 4 int is int (*foo)[4];.
A declaration of a function returning a pointer to an array of 4 int is int (*foo())[4];. The () may be filled in with parameter declarations.
As already mentioned, the correct syntax is int (*foo(void))[4]; And as you can tell, it is very hard to read.
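For a concrete picture of that declarator in use, here is a short sketch (table, get_table and sum_table are made-up names). The array has static storage so the returned pointer stays valid after the function returns:

```c
#include <assert.h>
#include <stddef.h>

static int table[4] = {1, 2, 3, 4};   /* static storage duration */

/* Function taking no arguments, returning a pointer to an array of 4 int. */
static int (*get_table(void))[4]
{
    return &table;
}

static int sum_table(void)
{
    int (*p)[4] = get_table();
    int sum = 0;
    /* sizeof *p is the size of the whole array, not of a pointer */
    for (size_t i = 0; i < sizeof *p / sizeof (*p)[0]; ++i)
        sum += (*p)[i];
    return sum;
}
```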
Questionable solutions:
Use the syntax as C would have you write it. This is in my opinion something you should avoid, since it's incredibly hard to read, to the point where it is completely useless. This should simply be outlawed in your coding standard, just like any sensible coding standard enforces function pointers to be used with a typedef.
Oh so we just typedef this just like when using function pointers? One might indeed be tempted to hide all this goo behind a typedef, but that's problematic as well. This is because both arrays and pointers are fundamental "building blocks" in C, with a specific syntax that the programmer expects to see whenever dealing with them. The absence of that syntax suggests an object that can be addressed, "lvalue accessed" and copied like any other variable. Hiding them behind a typedef might in the end create even more confusion than the original syntax.
Take this example:
typedef int(*arr)[4];
...
arr a = create(); // calls malloc etc
...
// somewhere later, lets make a hard copy! (or so we thought)
arr b = a;
...
cleanup(a);
...
print(b); // mysterious crash here
So this "hide behind typedef" system heavily relies on us naming types somethingptr to indicate that it is a pointer. Or let's say... LPWORD... and there it is, "Hungarian notation", the heavily criticized type system of the Windows API.
A slightly more sensible work-around is to return the array through one of the parameters. This isn't exactly pretty either, but at least somewhat easier to read since the strange syntax is centralized to one parameter:
void foo (int(**result)[4])
{
...
*result = &arr;
}
That is: a pointer to a pointer-to-array of int[4].
If one is prepared to throw type safety out the window, then of course void* foo (void) solves all of these problems... but creates new ones. Very easy to read, but now the problem is type safety and uncertainty regarding what the function actually returns. Not good either.
So what to do then, if these versions are all problematic? There are a few perfectly sensible approaches.
Good solutions:
Leave allocation to the caller. This is by far the best method, if you have the option. Your function would become void foo (int arr[4]); which is readable and type safe both.
Old school C. Just return a pointer to the first item in the array and pass the size along separately. This may or may not be acceptable from case to case.
Wrap it in a struct. For example this could be a sensible implementation of some generic array type:
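#include <stdlib.h>
typedef struct
{
size_t size;
int arr[]; /* flexible array member, must be last */
} array_t;
array_t* alloc (size_t items)
{
/* one allocation holds the header plus 'items' ints */
array_t* result = malloc(sizeof *result + items * sizeof result->arr[0]);
if (result != NULL) { result->size = items; } /* record the length */
return result;
}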
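The first of the good solutions, leaving allocation to the caller, could look roughly like this (fill_squares is a hypothetical function used only for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* The caller owns the storage; the callee only fills it in.
   Note: in a parameter list, int arr[4] is adjusted to int *arr,
   so the 4 documents intent but is not enforced by the compiler. */
static void fill_squares(int arr[4])
{
    assert(arr != NULL);
    for (size_t i = 0; i < 4; ++i)
        arr[i] = (int)(i * i);
}
```

The caller simply declares int squares[4]; and passes it in, so no pointer-to-array syntax and no dynamic allocation are needed.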
The typedef keyword can make things a lot clearer/simpler in this case:
int int_arr[4] = { 1,2,3,4 };
typedef int(*arrptr)[4]; // Define a pointer to an array of 4 ints ...
arrptr func(void) // ... and use that for the function return type
{
return &int_arr;
}
Note: As pointed out in the comments and in Lundin's excellent answer, using a typedef to hide/bury a pointer is a practice that is frowned-upon by (most of) the professional C programming community – and for very good reasons. There is a good discussion about it here.
However, although, in your case, you aren't defining an actual function pointer (which is an exception to the 'rule' that most programmers will accept), you are defining a complicated (i.e. difficult to read) function return type. The discussion at the end of the linked post delves into the "too complicated" issue, which is what I would use to justify use of a typedef in a case like yours. But, if you should choose this road, then do so with caution.

Create a min() macro for any type of array

I would like to create a C macro returning the scalar minimum for any type of static array in input. For example:
float A[100];
int B[10][10];
// [...]
float minA = MACRO_MIN(A);
int minB = MACRO_MIN(B);
How can I do so?
It can probably be done with GCC extensions, but not in standard C. Other compilers might have suitable extensions, too. It will of course make the code fantastically hard to port. I would advise against it: it's quite hard to achieve, it will be "unexpected", and it will probably act as a source of confusion (or, worse, bugs) down the line.
You're going to have to declare a temporary variable to hold the max/min seen "so far" when iterating over the array, and the type of that variable is hard to formulate without extensions.
Also returning the value of the temporary is hard, but possible with GCC extensions.
To make the above more concrete, here's a sketch of what I imagine. I did not test-compile this, so it's very likely to have errors in it:
#define ARRAY_MAX(a) ({ typeof((a)[0]) tmp = (a)[0];\
for(size_t i = 1; i < sizeof (a) / sizeof tmp; ++i)\
{\
if((a)[i] > tmp)\
tmp = (a)[i];\
}\
tmp;\
})
The above uses:
({ and }) is the GCC Statement Expressions extension, allowing the macro to have a local variable which is used as the "return value".
typeof is used to compute the proper type.
Note the assumption that the array is not of zero size. This should not be a very limiting assumption.
The use of sizeof is of course standard.
As I wrote the above, I realize there might be issues with multi-dimensional arrays that I hadn't realized until trying. I'm not going to polish it further, though. Note that it starts out with "probably".
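For the minimum the question actually asks about, a sketch along the same lines might look like this. It assumes GCC or Clang (statement expressions and __typeof__ are GNU extensions), handles only one-dimensional arrays, and the names ARRAY_MIN and array_min_demo are mine:

```c
#include <assert.h>
#include <stddef.h>

/* GNU C only. The element type is taken from (a)[0], so the temporary
   is a scalar rather than an array. Requires at least one element. */
#define ARRAY_MIN(a) ({                                          \
    __typeof__((a)[0]) min_ = (a)[0];                            \
    for (size_t i_ = 1; i_ < sizeof (a) / sizeof (a)[0]; ++i_)   \
        if ((a)[i_] < min_)                                      \
            min_ = (a)[i_];                                      \
    min_;                                                        \
})

static void array_min_demo(void)
{
    float A[4] = {3.5f, -1.0f, 2.0f, 8.0f};
    int B[3] = {7, 4, 9};
    assert(ARRAY_MIN(A) == -1.0f);
    assert(ARRAY_MIN(B) == 4);
}
```

For a two-dimensional array such as int B[10][10], (a)[0] is itself an array, so this sketch does not apply as written.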

What's the benefit of encapsulating only one basic field into a struct in C?

I saw some C code like this:
// A:
typedef uint32_t in_addr_t;
struct in_addr { in_addr_t s_addr; };
And I always prefer like this:
// B:
typedef uint32_t in_addr;
So my question is: what's the difference / benefit of doing it in A from B?
It's a layer to introduce type safety, and it can be helpful 'for future expansion'.
One problem with a plain typedef (B in the question) is that it's easy to 'convert' a value of a type represented by a typedefed builtin to any of several other types or typedefed builtins.
consider:
typedef int t_millisecond;
typedef int t_second;
typedef int t_degrees;
versus:
// field notation could vary greatly here:
struct t_millisecond { int ms; };
struct t_second { int s; };
struct t_degrees { int f; };
In some cases, it makes it a little clearer to use a notation, and the compiler will also forbid erroneous conversions. Consider:
int a = millisecond * second - degree;
This is a suspicious program. Using typedefed ints, it's a valid program. Using structs, it's ill-formed -- compiler errors will require your corrections, and you can make your intent explicit.
Using typedefs, arbitrary arithmetic and conversions may be applied, and they may be assigned to each other without warning, which can become a burden to maintain.
Consider also:
t_second s = millisecond;
With the struct versions, that would also be rejected by the compiler as an invalid conversion.
It's just another tool in the toolbox -- use at your discretion.
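A small sketch of what the struct-wrapped units buy you in practice (ms_from_seconds and unit_demo are hypothetical names). The only way from one unit to the other is a named conversion, and a direct assignment between the two types does not compile:

```c
#include <assert.h>

struct t_millisecond { int ms; };
struct t_second      { int s;  };

/* Explicit, named conversion between the two unit types. */
static struct t_millisecond ms_from_seconds(struct t_second sec)
{
    struct t_millisecond result = { sec.s * 1000 };
    return result;
}

static void unit_demo(void)
{
    struct t_second timeout = { 3 };
    struct t_millisecond ms = ms_from_seconds(timeout);
    assert(ms.ms == 3000);
    /* struct t_second wrong = ms;   would not compile: incompatible types */
}
```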
Justin's answer is essentially correct, but I think some expansion is needed:
EDIT: Justin expanded his answer significantly, which makes this one somewhat redundant.
Type safety - you want to provide your users with API functions which manipulate the data, not let them just treat it as an integer. Hiding the field in a structure makes it harder to use it the wrong way, and pushes the user towards the proper API.
For future expansion - perhaps a future implementation would like to change things. Maybe add a field, or break the existing field into 4 chars. With a struct, this can be done without changing APIs.
What's your benefit? That your code won't break if implementation changes.

Which way is better for creating type-agnostic structures in C?

I'm trying to write some generic structures. Essentially, what I need for my purpose is C++ templates, but since I'm writing in C, templates are out of consideration. Currently I'm considering 2 ways of achieving what I want.
Method 1: use the preprocessor. Like so:
#define DEFINE_PAIR(T) typedef struct Pair_##T{ \
T x; \
T y; \
} Pair_##T
DEFINE_PAIR(int);
int main(){
Pair_int p;
return 0;
}
An obvious downside to it is that you have to invoke the macro before using the type. Probably there are more disadvantages, which I hope you will point out.
Method 2: just use void-pointers, like so:
typedef struct Pair{
void* x;
void* y;
} Pair;
Obviously, this approach is not type safe (I could easily pass a pair of strings to a function expecting a pair of doubles), plus the code doing deallocation gets a lot messier with this approach.
I would like to hear your thoughts on this. Which of the two methods is better/worse and why? Is there any other method I could use to write generic structures in C?
Thanks.
If you only plan on using primitive data types, then your original macro-based solution seems nifty enough. However, when you start storing pairs of pointers to opaque data types with complex structures underneath that are meant to be used by passing pointers between functions, such as:
complex_structure_type *object = complex_structure_type_init();
complex_structure_type_set_title(object, "Whatever");
complex_structure_type_free(object);
then you have to
typedef complex_structure_type *complex_structure_type_ptr;
in order to
DEFINE_PAIR(complex_structure_type_ptr);
so you can
Pair_complex_structure_type_ptr p;
and then
p.x = object;
But that's only a little bit more work, so if you feel it works for you, go for it. You might even put together your own preprocessor that goes through the code, pulls out anything like Pair_whatever, and then adds DEFINE_PAIR(whatever) for the C preprocessor. Anyway, it's definitely a neat idea that you've presented here.
Personally, I would just use void pointers and forget about strong type safety. C just doesn't have the same type safety machinery as other languages, and the more opportunities you give yourself to forget something, the more bugs you'll accidentally create.
Good luck!
Noting that templates in C++ provide a language for writing code, you might simply consider doing code generation with some tool more powerful than the C preprocessor.
Now that does add another step to your build, and makes your build depend on another tool (unless you care to write your own generator in C...), but it may provide the flexibility and type-safety you desire.
This is almost the same, but it's a bit more nimble:
#define PAIR_T(TYPE) \
struct { \
TYPE x; \
TYPE y; \
}
typedef PAIR_T(int) int_pair;
typedef PAIR_T(const char *) string_pair;
int main(void)
{
int_pair p = {1, 1};
string_pair sp = {"a", "b"};
}
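The preprocessor approach can also generate functions alongside the type, which is closer to what a template would give you. The following is only a sketch: the _swap helper and pair_demo are my own additions, and T has to be a single identifier because it is pasted into the generated names.

```c
#include <assert.h>

/* Generates the struct and a swap helper for element type T.
   The trailing "struct Pair_T" absorbs the caller's semicolon. */
#define DEFINE_PAIR(T)                                   \
    typedef struct Pair_##T { T x; T y; } Pair_##T;      \
    static Pair_##T Pair_##T##_swap(Pair_##T p)          \
    {                                                    \
        Pair_##T r = { p.y, p.x };                       \
        return r;                                        \
    }                                                    \
    struct Pair_##T

DEFINE_PAIR(int);
DEFINE_PAIR(double);

static void pair_demo(void)
{
    Pair_int p = { 1, 2 };
    Pair_int q = Pair_int_swap(p);
    assert(q.x == 2 && q.y == 1);

    Pair_double d = Pair_double_swap((Pair_double){ 0.5, 1.5 });
    assert(d.x == 1.5 && d.y == 0.5);
}
```

Unlike the void-pointer version, passing a Pair_double where a Pair_int is expected is a compile error.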

Reassemble float from bytes inline

I'm working with HiTech PICC32 on the PIC32MX series of microprocessors, but I think this question is general enough for anyone knowledgeable in C. (This is almost equivalent to C90, with sizeof(int) = sizeof(long) = sizeof(float) = 4.)
Let's say I read a 4-byte word of data that represents a float. I can quickly convert it to its actual float value with:
#define FLOAT_FROM_WORD(WORD_VALUE) (*((float*) &(WORD_VALUE)))
But this only works for lvalues. I can't, for example, use this on a function return value like:
FLOAT_FROM_WORD(eeprom_read_word(addr));
Is there a short and sweet way to do this inline, i.e. without a function call or temp variable? To be honest, there's no HUGE reason for me to avoid a function call or extra var, but it's bugging me. There must be a way I'm missing.
Added: I didn't realise that WORD was actually a common typedef. I've changed the name of the macro argument to avoid confusion.
You can run the trick the other way for return values
float fl;
*(int*)&fl = eeprom_read_word(addr);
or
#define WORD_TO_FLOAT(f) (*(int*)&(f))
WORD_TO_FLOAT(fl) = eeprom_read_word(addr);
or as R Samuel Klatchko suggests
#define ASTYPE(type, val) (*(type*)&(val))
ASTYPE(WORD,fl) = eeprom_read_word(addr);
If this were GCC, you could do this:
#define atob(original, newtype) \
(((union { typeof(original) i; newtype j; })(original)).j)
Wow. Hideous. But the usage is nice:
int i = 0xdeadbeef;
float f = atob(i, float);
I bet your compiler supports neither the typeof operator nor the union casting that GCC does, since neither is standard behavior, but on the off-chance that your compiler can do union casting, that is your answer. Modified not to use typeof:
#define atob(original, origtype, newtype) \
(((union { origtype i; newtype j; })(original)).j)
int i = 0xdeadbeef;
float f = atob(i, int, float);
Of course, this ignores the issue of what happens when you use two types of different sizes, but is closer to "what you want," i.e. a simple macro filter that returns a value, instead of taking an extra parameter. The extra parameters this version takes are just for generality.
If your compiler doesn't support union casting, which is a neat but non-portable trick, then there is no way to do this the "way you want it," and the other answers have already got it.
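If a C99 compiler is available (which the asker's may not be), a compound literal of an anonymous union gives a similar one-expression conversion without GCC extensions. This is a sketch assuming float is a 32-bit IEEE 754 type; fake_eeprom_read_word is a stand-in for the real read function:

```c
#include <assert.h>
#include <stdint.h>

/* C99 compound literal: builds a temporary union from the word and reads
   it back as a float. Works on rvalues such as function return values. */
#define FLOAT_FROM_WORD(w) (((union { uint32_t u; float f; }){ .u = (w) }).f)

/* Hypothetical stand-in for eeprom_read_word(addr); returns the bit
   pattern of 1.0f under IEEE 754. */
static uint32_t fake_eeprom_read_word(void)
{
    return 0x3F800000u;
}

static float read_float_demo(void)
{
    assert(sizeof(float) == sizeof(uint32_t));
    return FLOAT_FROM_WORD(fake_eeprom_read_word());
}
```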
you can take the address of a temporary value if you use a const reference:
#define FLOAT_FROM_WORD(w) (*(float*)&(const WORD &)(w))
but that won't work in c :(
(c doesn't have references right? works in visual c++)
as others have said, be it an inlined function or a temp in a define, the compiler will optimize it out.
Not really an answer, more a suggestion. Your FLOAT_FROM_WORD macro will be more natural to use and more flexible if it doesn't have a ; at the end
#define FLOAT_FROM_WORD(w) (*(float*)&(w))
fl = FLOAT_FROM_WORD(wd);
It may not be possible in your exact situation, but upgrading to a C99 compiler would solve your problem too.
C99 has inline functions which, while acting like normal functions in parameters and return values, get improved efficiency in exactly this case with none of the drawbacks of macros.
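A sketch of what that could look like, using memcpy to copy the bits so that no pointer cast is involved (float_from_word is a hypothetical name, and the expected values assume float is 32-bit IEEE 754). Compilers typically optimize a fixed-size memcpy like this into a plain register move:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets the bits of a 32-bit word as a float. Accepts any value,
   including function return values, since the argument is passed by value. */
static inline float float_from_word(uint32_t word)
{
    float f;
    assert(sizeof f == sizeof word);   /* the conversion only makes sense for equal sizes */
    memcpy(&f, &word, sizeof f);
    return f;
}
```

Usage is then simply float x = float_from_word(eeprom_read_word(addr)); with no temporary visible at the call site.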
