Can I force GCC to throw a warning or error at compile-time if the number of elements in a certain explicitly-initialized C array is not equal to a certain value?
Consider the following simple C program:
#include <stdio.h>
enum my_enum {
MY_ENUM_FIRST,
MY_ENUM_SECOND,
MY_ENUM_THIRD,
MY_ENUM_COUNT
};
// indexable by my_enum
const char *my_enum_names[] = {
"first",
"second",
"third",
};
int main(void) {
int i;
for (i = 0; i < MY_ENUM_COUNT; i++)
{
printf("%s\n", my_enum_names[i]);
}
}
Unless they are directly adjacent in the code, a developer might not realize that the enum and array must be kept "synchronized" with each other. A developer may add an entry to the enum but not to the array (or vice-versa) and therefore expose an out-of-bounds vulnerability.
Can I add some sort of pragma or attribute to the definition of my_enum_names so that if its size is not equal to MY_ENUM_COUNT, the compiler will throw a warning or error?
Some clarifications:
I am referring specifically to arrays which are explicitly initialized, meaning their size is known at compile-time.
I am referring specifically to the GCC compiler, including compiler extensions.
I swear I've done this before, possibly using one of GCC's __attribute__ extensions, but now I can't find any documentation on any feature that does what I want.
With C11 you can use a static assertion:
_Static_assert(sizeof my_enum_names / sizeof *my_enum_names == MY_ENUM_COUNT,
               "my_enum_names is the wrong size.");
Prior to the addition of _Static_assert to the language, you could force errors in these situations with declarations such as:
extern char my_enum_namesIsTheWrongSize[1];
extern char my_enum_namesIsTheWrongSize[sizeof my_enum_names / sizeof *my_enum_names == MY_ENUM_COUNT];
If the test in the latter is false, it attempts to declare an array with zero elements, which is an error in itself; and, in case the compiler accepts zero-size arrays as an extension, it also conflicts with the preceding declaration, so an error message should be generated either way.
How about
const char *my_enum_names[MY_ENUM_COUNT] = { ... };
Then the array will always contain enough elements, but some may be NULL, which you then need to check for instead. It's still better than risking going out of bounds.
Also, with the above, if you remove enumerators, the compiler will warn you about too many initializers if you forget to update the array initialization (as in the sketch below).
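A minimal sketch of that approach, reusing the names from the question (the NULL fallback in the loop is an addition for illustration):
#include <stdio.h>

enum my_enum { MY_ENUM_FIRST, MY_ENUM_SECOND, MY_ENUM_THIRD, MY_ENUM_COUNT };

/* Sized explicitly: a new enumerator without a matching string leaves a NULL hole,
   and an extra string triggers an "excess elements in array initializer" warning. */
const char *my_enum_names[MY_ENUM_COUNT] = {
    "first",
    "second",
    "third",
};

int main(void) {
    for (int i = 0; i < MY_ENUM_COUNT; i++)
        printf("%s\n", my_enum_names[i] ? my_enum_names[i] : "(missing)");
}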
What is the fastest way to find unused enum members?
Commenting values out one by one won't work because I have almost 700 members and want to trim off a few unused ones.
I am not aware of any compiler warning, but you could try the splint static analyzer tool. According to its documentation:
Splint detects constants, functions, parameters, variables, types,
enumerator members, and structure or union fields that are declared
but never used.
As I checked, it works as intended. Here is some example code:
#include <stdio.h>
enum Month { JAN, FEB, MAR };
int main()
{
enum Month m1 = JAN;
printf("%d\n", m1);
}
By running the splint command, you will obtain the following messages:
main.c:3:19: Enum member FEB not used
A member of an enum type is never used. (Use -enummemuse to inhibit warning)
main.c:3:24: Enum member MAR not used
Note that »unused« is a relatively dangerous term here.
#include <stdio.h>

typedef enum type_t { VALUE_A, VALUE_B, VALUE_C } type_t;

int main() {
    printf("A = %d, ", VALUE_A);
    printf("C = %d", VALUE_C);
    return 0;
}
will print A = 0, C = 2, but removing the »unused« VALUE_B changes the output to A = 0, C = 1.
If you persist such values, do arithmetic on them, or anything in that area, you might end up changing the behavior of your program.
Change the names of all the enums (by, say, adding a _ before their name). Compile. You'll get a lot of errors because it won't find the previous enum names (obviously). A bit of grep-foo and making sure the compiler / build system doesn't stop on the first error - and you'll have a list of all the enums in use!
At least, that's how I'd do it.
Which one is better to use among the below statements in C?
static const int var = 5;
or
#define var 5
or
enum { var = 5 };
It depends on what you need the value for. You (and everyone else so far) omitted the third alternative:
(1) static const int var = 5;
(2) #define var 5
(3) enum { var = 5 };
Ignoring issues about the choice of name, then:
If you need to pass a pointer around, you must use (1).
Since (2) is apparently an option, you don't need to pass pointers around.
Both (1) and (3) have a symbol in the debugger's symbol table - that makes debugging easier. It is more likely that (2) will not have a symbol, leaving you wondering what it is.
(1) cannot be used as a dimension for arrays at global scope; both (2) and (3) can.
(1) cannot be used as a dimension for static arrays at function scope; both (2) and (3) can.
Under C99, all of these can be used for local arrays. Technically, using (1) would imply the use of a VLA (variable-length array), though the dimension referenced by 'var' would of course be fixed at size 5.
(1) cannot be used in places like switch statements; both (2) and (3) can.
(1) cannot be used to initialize static variables; both (2) and (3) can.
(2) can change code that you didn't want changed because it is used by the preprocessor; both (1) and (3) will not have unexpected side-effects like that.
You can detect whether (2) has been set in the preprocessor; neither (1) nor (3) allows that.
So, in most contexts, prefer the 'enum' over the alternatives. Otherwise, the first and last bullet points are likely to be the controlling factors — and you have to think harder if you need to satisfy both at once.
If you were asking about C++, then you'd use option (1) — the static const — every time.
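To illustrate a few of the points above, here is a minimal sketch of what does and does not compile at file scope in C (the variable names are made up for the example):
#include <stdio.h>

static const int var_const = 5;   /* (1) */
#define VAR_MACRO 5               /* (2) */
enum { var_enum = 5 };            /* (3) */

/* int a1[var_const];           -- error in C: file-scope array size must be a constant expression */
int a2[VAR_MACRO];               /* OK */
int a3[var_enum];                /* OK */

/* static int s1 = var_const;   -- error in C: initializer element is not constant */
static int s2 = VAR_MACRO;       /* OK */
static int s3 = var_enum;        /* OK */

int main(void) {
    const int *p = &var_const;   /* only (1) lets you take an address */
    printf("%d %d %d\n", *p, s2, s3);
    return 0;
}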
Generally speaking:
static const
Because it respects scope and is type-safe.
The only caveat I could see: if you want the variable to be possibly defined on the command line. There is still an alternative:
#ifdef VAR // Very bad name, not long enough, too general, etc..
static int const var = VAR;
#else
static int const var = 5; // default value
#endif
Whenever possible, instead of macros / ellipsis, use a type-safe alternative.
If you really NEED to go with a macro (for example, you want __FILE__ or __LINE__), then you'd better name your macro VERY carefully: in its naming convention Boost recommends all upper-case, beginning by the name of the project (here BOOST_), while perusing the library you will notice this is (generally) followed by the name of the particular area (library) then with a meaningful name.
It generally makes for lengthy names :)
In C, specifically? In C the correct answer is: use #define (or, if appropriate, enum)
While it is beneficial to have the scoping and typing properties of a const object, in reality const objects in C (as opposed to C++) are not true constants, and that makes them far less useful in most practical cases.
So, in C the choice should be determined by how you plan to use your constant. For example, you can't use a const int object as a case label (while a macro will work). You can't use a const int object as a bit-field width (while a macro will work). In C89/90 you can't use a const object to specify an array size (while a macro will work). Even in C99 you can't use a const object to specify an array size when you need a non-VLA array.
If this is important for you, then it will determine your choice. Most of the time, you'll have no choice but to use #define in C. And don't forget another alternative that produces true constants in C: enum.
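For instance, a minimal sketch of the case-label and bit-field limitations (names are invented for the example):
#include <stdio.h>

#define OPT_MACRO 1
enum { OPT_ENUM = 2 };
static const int opt_const = 3;

struct flags {
    unsigned a : OPT_MACRO;      /* OK */
    unsigned b : OPT_ENUM;       /* OK */
    /* unsigned c : opt_const;      -- error: bit-field width must be a constant expression */
};

int main(void) {
    switch (opt_const) {
    case OPT_MACRO: puts("macro"); break;
    case OPT_ENUM:  puts("enum");  break;
    /* case opt_const:               -- error in C: case label must be a constant expression */
    default:        puts("const");  break;
    }
    return 0;
}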
In C++ const objects are true constants, so in C++ it is almost always better to prefer the const variant (no need for explicit static in C++ though).
The difference between static const and #define is that the former uses memory for storage and the latter does not. Secondly, you cannot take the address of a #define, whereas you can take the address of a static const. Which one to choose really depends on the circumstances; each is at its best in different situations. Please don't assume that one is always better than the other... :-)
If that were the case, Dennis Ritchie would have kept only the best one... hahaha... :-)
In C #define is much more popular. You can use those values for declaring array sizes for example:
#define MAXLEN 5
void foo(void) {
int bar[MAXLEN];
}
ANSI C doesn't allow you to use static consts in this context as far as I know. In C++ you should avoid macros in these cases. You can write
const int maxlen = 5;
void foo() {
int bar[maxlen];
}
and even leave out static because internal linkage is implied by const already [in C++ only].
Another drawback of const in C is that you can't use the value in initializing another const.
static int const NUMBER_OF_FINGERS_PER_HAND = 5;
static int const NUMBER_OF_HANDS = 2;
// initializer element is not constant, this does not work.
static int const NUMBER_OF_FINGERS = NUMBER_OF_FINGERS_PER_HAND
* NUMBER_OF_HANDS;
Even this does not work with a const since the compiler does not see it as a constant:
static uint8_t const ARRAY_SIZE = 16;
static int8_t const lookup_table[ARRAY_SIZE] = {
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; // ARRAY_SIZE not a constant!
I'd be happy to use typed const in these cases, otherwise...
If you can get away with it, static const has a lot of advantages. It obeys the normal scope principles, is visible in a debugger, and generally obeys the rules that variables obey.
However, at least in the original C standard, it isn't actually a constant. If you use #define var 5, you can write int foo[var]; as a declaration, but you can't do that (except as a compiler extension) with static const int var = 5;. This is not the case in C++, where the static const version can be used anywhere the #define version can, and I believe this is also the case with C99 (though there it works only at block scope, as a variable-length array).
However, never name a #define constant with a lowercase name. It will override any possible use of that name until the end of the translation unit. Macro constants should be in what is effectively their own namespace, which is traditionally all capital letters, perhaps with a prefix.
#define var 5 will cause you trouble if you have things like mystruct.var.
For example,
struct mystruct {
int var;
};
#define var 5
int main() {
struct mystruct foo;
foo.var = 1;
return 0;
}
The preprocessor will replace it and the code won't compile. For this reason, traditional coding style suggests that all constant #defines use capital letters to avoid such conflicts.
It is ALWAYS preferable to use const instead of #define. That's because const is handled by the compiler and #define by the preprocessor; it is as if the #define were not part of the code itself (roughly speaking).
Example:
#define PI 3.1416
The symbolic name PI may never be seen by compilers; it may be removed by the preprocessor before the source code even gets to a compiler. As a result, the name PI may not get entered into the symbol table. This can be confusing if you get an error during compilation involving the use of the constant, because the error message may refer to 3.1416, not PI. If PI were defined in a header file you didn’t write, you’d have no idea where that 3.1416 came from.
This problem can also crop up in a symbolic debugger, because, again, the name you’re programming with may not be in the symbol table.
Solution:
const double PI = 3.1416; //or static const...
I wrote a quick test program to demonstrate one difference:
#include <stdio.h>
enum {ENUM_DEFINED=16};
enum {ENUM_DEFINED=32};
#define DEFINED_DEFINED 16
#define DEFINED_DEFINED 32
int main(int argc, char *argv[]) {
printf("%d, %d\n", DEFINED_DEFINED, ENUM_DEFINED);
return(0);
}
This compiles with these errors and warnings:
main.c:6:7: error: redefinition of enumerator 'ENUM_DEFINED'
enum {ENUM_DEFINED=32};
^
main.c:5:7: note: previous definition is here
enum {ENUM_DEFINED=16};
^
main.c:9:9: warning: 'DEFINED_DEFINED' macro redefined [-Wmacro-redefined]
#define DEFINED_DEFINED 32
^
main.c:8:9: note: previous definition is here
#define DEFINED_DEFINED 16
^
Note that redefining the enum constant gives an error, where redefining the define only gives a warning.
The definition
const int const_value = 5;
does not always define a constant value. Some compilers (for example tcc 0.9.26) just allocate memory identified with the name "const_value". Using the identifier "const_value" you can not modify this memory. But you still could modify the memory using another identifier:
const int const_value = 5;
int *mutable_value = (int*) &const_value;
*mutable_value = 3;
printf("%i", const_value); // The output may be 5 or 3, depending on the compiler.
This means the definition
#define CONST_VALUE 5
is one reliable way to define a constant value which cannot be modified by any means.
Although the question was about integers, it's worth noting that #define and enums are useless if you need a constant structure or string. These are both usually passed to functions as pointers. (With strings it's required; with structures it's much more efficient.)
As for integers, if you're in an embedded environment with very limited memory, you might need to worry about where the constant is stored and how accesses to it are compiled. The compiler might add two consts at run time, but add two #defines at compile time. A #define constant may be converted into one or more MOV [immediate] instructions, which means the constant is effectively stored in program memory. A const constant will be stored in the .const section in data memory. In systems with a Harvard architecture, there could be differences in performance and memory usage, although they'd likely be small. They might matter for hard-core optimization of inner loops.
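If you want to see what your particular toolchain does, one rough check is to compile a tiny file like the following with gcc -S (or your embedded compiler's equivalent) and compare the output; results vary by target and optimization level, and the function names here are made up:
#define ADD_MACRO 5
static const int add_const = 5;

int use_macro(int x) { return x + ADD_MACRO; }   /* the 5 typically becomes an immediate operand */
int use_const(int x) { return x + add_const; }   /* may load from a .rodata/.const section, or be folded at -O1 and above */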
I don't think there's an answer for "which is always best", but, as Matthieu said,
static const
is type-safe. My biggest pet peeve with #define, though, is that when debugging in Visual Studio you cannot watch the value; it gives an error that the symbol cannot be found.
Incidentally, an alternative to #define, which provides proper scoping but behaves like a "real" constant, is "enum". For example:
enum { number_ten = 10 };
In many cases, it's useful to define enumerated types and create variables of those types; if that is done, debuggers may be able to display variables according to their enumeration name.
One important caveat with doing that, however: in C++, enumerated types have limited compatibility with integers. For example, the result of arithmetic on them is a plain int, which cannot be assigned back to a variable of the enum type without a cast. I find that to be a curious default behavior for enums; while it would have been nice to have a "strict enum" type, given the desire to have C++ generally compatible with C, I would think the default behavior of an "enum" type should be interchangeable with integers.
A simple difference:
At pre-processing time, the constant is replaced with its value.
So you cannot apply the address-of operator (&) to a define, but you can apply it to a variable.
As you might suppose, using a define can be marginally faster than reading a static const, since no memory access is involved.
For example, having:
#define mymax 100
you can not do printf("address of constant is %p",&mymax);.
But having
const int mymax_var = 100;
you can do printf("address of constant is %p",&mymax_var);.
To be clearer: the define is replaced by its value at the preprocessing stage, so no variable is stored in the program; there is just the code in the text segment of the program where the define was used.
However, for a static const we have a variable that is allocated somewhere. For gcc, static const data is placed in the text (read-only) segment of the program.
We looked at the produced assembler code on the MBF16X... Both variants result in the same code for arithmetic operations (ADD Immediate, for example).
So const int is preferred for the type checking, while #define is old style. But this may be compiler-specific, so check your generated assembler code.
I am not sure if I am right, but in my opinion using a #defined value can be faster than using a normally declared variable (or const value).
It's because when the program is running and needs to use a normally declared variable, it has to load that variable from its place in memory.
In contrast, when it uses a #defined value, there is no memory to access: the value was substituted into the code at compile time. If you #define myValue 7 and the program uses myValue, it behaves exactly as if it just used 7.
I am currently trying to implement a compile-time check on an array-of-structs to make sure that if someone changes it in the future, every element of the array is defined. I want to avoid a case where someone adds too many elements to the array-of-struct, which is possible if I explicitly set the array size. This does not cover the case where someone defines too few elements to the array, and the remaining elements are just zero-initialized.
#include <stdio.h>
typedef struct myStruct {
int a;
int b;
} myStruct_t;
#define ARRAY_SIZE (3)
myStruct_t sArr[] = {
{0, 0},
{1, 1},
{2, 2}
}
#define RATIO (sizeof(sArr) / sizeof(myStruct_t))
#if ARRAY_SIZE != RATIO
#error API issue
#endif
int main(void) {
printf("Testing\n");
return 0;
}
This seemed like a sound check, since sizeof() is evaluated at compile time. But the compiler reports:
test.c:15:12: error: missing binary operator before token "("
test.c:19: error: expected ',' or ';' before 'int'
How, if possible, can I implement such a check?
Thank you.
You must use features of the compiler that come after the preprocessing phase.
C11 has _Static_assert (the message argument is required in C11):
_Static_assert(ARRAY_SIZE == RATIO, "API issue");
That would be the cleanest solution. If you don't have that, you can use tricks like
typedef char something_useless[ARRAY_SIZE == RATIO];
If the comparison evaluates to 1, this is a valid typedef that does nothing. If it is 0, an error (constraint violation) will occur.
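Putting that together with the array from the question, a minimal sketch (note that the posted code is also missing a semicolon after the array initializer, which explains the second error message):
#include <stdio.h>

typedef struct myStruct {
    int a;
    int b;
} myStruct_t;

#define ARRAY_SIZE (3)

myStruct_t sArr[] = {
    {0, 0},
    {1, 1},
    {2, 2}
};

/* Checked by the compiler, not the preprocessor, so sizeof works here. */
_Static_assert(ARRAY_SIZE == sizeof(sArr) / sizeof(sArr[0]),
               "sArr has the wrong number of elements");

int main(void) {
    printf("Testing\n");
    return 0;
}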
The preprocessor does not evaluate sizeof(). That is something done by the compiler.
There are two major stages to compiling a C program: the preprocessor stage, which does text transformations only, and the second major stage, which compiles the output of the preprocessor.
This means that C variables and structs are not evaluated by the preprocessor, so your #if check is not going to work.
You may consider using an ASSERT() macro to assert specific conditions. These are evaluated at run time if the ASSERT() macro is enabled to expand into an actual check.
I have actually written my own version to put specific asserts into some functions to do run time checks on sizes of structs. With my own assert macros I can selectively turn them on and off.
With my own assert macros I have a function that will create a log of the asserts and if the build is a debug build such as is being used for designer testing the function will perform a break so that the designer will see the assert failure immediately and be able to do stack trace and take other steps to determine why the assert happened.
The basic roll your own assert macro I use is:
#define NHPOS_ASSERT(x) if (!(x)) { PifLogAbort( (UCHAR *) #x , (UCHAR *) __FILE__ , (UCHAR *) "function" , __LINE__ );}
where PifLogAbort() is a function that generates the log entry. Using this I can see the condition that asserted, along with the file name and the line number.
I have a question regarding the initialization of an array of structs in C. Googling showed me that a lot of people have had very similar questions, but they weren't quite identical.
Essentially, I have a global array of structs of type "memPermissions" shown below.
This array needs all the "address" and "ownerId" fields to be initialized to -1 upon program execution.
typedef struct memPermissions {
int address;
int ownerId;
} *test;
The problem is the array is sized using a #define, so I can't simply go:
#define numBoxes 5
struct memPermissions memPermissions[numBoxes] = {
{-1, -1},
...
{-1, -1}
};
I tried:
struct memPermissions memPermissions[numBoxes] = {-1, -1};
But naturally this only initialized the first element. (The rest were set to 0). The only solution that jumps to mind would be to initialize it with a simple loop somewhere, but because of the nature of where this code will run, I'm really hoping that's not my only option.
Is there any way to initialize all the elements of this array of structs without a loop?
Cheers,
-Josh
The C99 standard added all sorts of useful ways to initialize structures, but did not provide a repeat operator (which Fortran has had since forever - but maybe that was why it wasn't added).
If you are using a sufficiently recent version of GCC and you can afford to use a non-portable extension, then GCC provides an extension. In the GCC 8.1.0 manual (§6.27 Designated Initializers), it says:
To initialize a range of elements to the same value, write ‘[first ... last] = value’.
This is a GNU extension. For example,
int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };
If the value in it has side-effects, the side-effects will happen only once, not for each initialized field by the range initializer.
So, using this in your example:
struct memPermissions memPermissions[numBoxes] =
{
[0 ... numBoxes-1] = {-1, -1}, // GCC extension
};
I wish this were in the C Standard; it would be so helpful!
Without using that or other similar compiler-specific mechanisms, your only choice is a loop. For a complex initializer with many fields, not all the same value, you can probably use:
#include <string.h>
#include "memperm.h" // Header declaring your types and variables
static int initialized = 0;
// -2 so the initialization isn't uniform and memset() is not an option
static const struct memPermissions initPermissions = { -1, -2 };
struct memPermissions memPermissions[numBoxes];
void initialize_permissions(void)
{
if (initialized == 0)
{
for (int i = 0; i < numBoxes; i++)
memmove(&memPermissions[i], &initPermissions, sizeof(initPermissions));
initialized = 1;
}
}
You can also use memcpy() here - there is no danger of the two variables overlapping.
Now you just need to ensure that initialize_permissions() is called before the array is used - preferably just once. There may be compiler-specific mechanisms to allow that, too.
You could use a local variable in the initialize_permissions() function in place of the initialized static constant variable - just be sure your compiler doesn't initialize it every time the function is called.
If you have a C99 compiler, you can use a compound literal in place of the constant:
void initialize_permissions(void)
{
if (initialized == 0)
{
for (int i = 0; i < numBoxes; i++)
memmove(&memPermissions[i],&(struct memPermissions){ -1, -2 },
sizeof(memPermissions[0]));
initialized = 1;
}
}
You can write an external program that is passed the number of items that you want. This program should be called by your Makefile or equivalent. The program will write an include file for you with the required number of -1 values as well as the #define.
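A minimal sketch of such a generator; the program and file names here are invented, e.g. the Makefile might run ./gen_init 5 > mem_perm_init.h and the main source would #include "mem_perm_init.h":
/* gen_init.c - hypothetical generator; writes the #define and the initialized array to stdout. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s count\n", argv[0]);
        return EXIT_FAILURE;
    }
    int n = atoi(argv[1]);
    printf("#define numBoxes %d\n", n);
    printf("struct memPermissions memPermissions[numBoxes] = {\n");
    for (int i = 0; i < n; i++)
        printf("    { -1, -1 },\n");    /* trailing comma is legal in C initializers */
    printf("};\n");
    return EXIT_SUCCESS;
}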
If you have the standard library available, you can use memset combined with sizeof(struct memPermissions) * numBoxes to fill your array with any uniform byte value. Since -1 is represented as all-one bits (0xFFFFFFFF for a 32-bit int) on two's-complement platforms, this might work for you.
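A minimal sketch of that suggestion, assuming the memPermissions array and numBoxes macro from the question (it relies on -1 being all-one bits, so it is not strictly portable):
#include <string.h>

void init_permissions(void) {
    /* Every byte of every element becomes 0xFF, so each int member reads back as -1 on two's-complement machines. */
    memset(memPermissions, 0xFF, sizeof(struct memPermissions) * numBoxes);
}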
The only solution that jumps to mind would be to initialize it with a simple loop somewhere
Which is the only possibility within the language, I'm afraid. In C, you either initialize each element explicitly, initialize to all zeros or don't initialize.
However, you can sidestep the issue by using 0 for the purpose that your -1 currently serves.
If it is really important not to use a loop, you could do something rather strange and use/abuse memset, assuming that is available.
N.B. memset may itself be implemented using a loop, so the point may be moot.
memset(memPermissions, 0xFF, numBoxes*2*sizeof(int));
The times 2 is needed to cover both members of the struct (i.e. the two ints in each element).
This is poorly designed in that it depends on the struct not being padded or aligned, which the compiler is free to do as per the C specification; using sizeof(memPermissions) as the size would avoid that assumption.
(Utilizing that -1 is typically 0xFFFFFFFF for two's-complement negative integers with a 32-bit int. Kudos to @James for pointing this out.)
Though I would suspect that memset would be implemented as a small, fast, tight store loop (rep stosd/stosb on x86) in all but the most trivial cases (very small values of numBoxes).