Hidden features of C

I know there is a standard behind all C compiler implementations, so there should be no hidden features. Despite that, I am sure all C developers have hidden/secret tricks they use all the time.

More of a trick of the GCC compiler, but you can give branch indication hints to the compiler (common in the Linux kernel)
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
see: http://kerneltrap.org/node/4705
What I like about this is that it also adds some expressiveness to some functions.
void foo(int arg)
{
if (unlikely(arg == 0)) {
do_this();
return;
}
do_that();
...
}
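A minimal sketch of how these hints might be wrapped so the code still builds on compilers without `__builtin_expect`. The fallback branch and the `checked_div` helper are assumptions for illustration, not part of the kernel macros:

```c
#include <limits.h>
#include <stddef.h>

/* Use the GCC/Clang builtin when available; otherwise expand to the
   plain expression so the code stays portable. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

/* Hypothetical helper: returns 0 on success, -1 on a bad argument. */
int checked_div(int num, int den, int *out)
{
    /* Error cases are marked as the rare path. */
    if (unlikely(out == NULL || den == 0 ||
                 (num == INT_MIN && den == -1))) {
        return -1;
    }
    *out = num / den;
    return 0;
}
```

The hint only affects code layout and branch prediction; the result of the condition is unchanged.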

int8_t
int16_t
int32_t
uint8_t
uint16_t
uint32_t
These are an optional item in the standard, but it must be a hidden feature, because people are constantly redefining them. One code base I've worked on (and still do, for now) has multiple redefinitions, all with different identifiers. Most of the time it's with preprocessor macros:
#define INT16 short
#define INT32 long
And so on. It makes me want to pull my hair out. Just use the freaking standard integer typedefs!
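As a small illustration of the standard typedefs from `<stdint.h>`, here is a sketch with two hypothetical helpers (`pack_u16_pair` and `high_half` are names invented for this example):

```c
#include <stdint.h>

/* Hypothetical helper: combine two 16-bit halves into one 32-bit word. */
uint32_t pack_u16_pair(uint16_t high, uint16_t low)
{
    /* Widen before shifting so the shift happens in 32 bits. */
    return ((uint32_t)high << 16) | (uint32_t)low;
}

/* Hypothetical helper: extract the upper 16 bits again. */
uint16_t high_half(uint32_t word)
{
    return (uint16_t)(word >> 16);
}
```

Because the widths are fixed by the type names, no project-specific `INT16`/`INT32` macros are needed.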

The comma operator isn't widely used. It can certainly be abused, but it can also be very useful. This use is the most common one:
for (int i=0; i<10; i++, doSomethingElse())
{
/* whatever */
}
But you can use this operator anywhere. Observe:
int j = (printf("Assigning variable j\n"), getValueFromSomewhere());
Each operand is evaluated from left to right, but the value of the whole expression is that of the last operand.
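Another common use is stepping two loop variables at once. A minimal sketch (`reverse_ints` is a hypothetical helper for this example):

```c
#include <stddef.h>

/* Reverse an array in place using two indices that move toward each other. */
void reverse_ints(int *a, size_t n)
{
    if (n < 2) {
        return;
    }
    /* "i = 0, j = n - 1" is a declarator list, not the comma operator;
       "i++, j--" in the third clause is the comma operator. */
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```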

Initializing a structure to zero:
struct mystruct a = {0};
This will zero all structure members.

Function pointers. You can use a table of function pointers to implement, e.g., fast indirect-threaded code interpreters (FORTH) or byte-code dispatchers, or to simulate OO-like virtual methods.
Then there are hidden gems in the standard library, such as qsort(),bsearch(), strpbrk(), strcspn() [the latter two being useful for implementing a strtok() replacement].
A misfeature of C is that signed arithmetic overflow is undefined behavior (UB). So whenever you see an expression such as x+y, both being signed ints, it might potentially overflow and cause UB.
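One way to avoid the undefined behavior is to test the operands before adding. A minimal sketch, where `checked_add` is a hypothetical name and not a standard function:

```c
#include <limits.h>

/* Returns 1 and stores a + b in *sum if the result is representable;
   returns 0 (leaving *sum untouched) if the addition would overflow. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return 0;
    }
    *sum = a + b;
    return 1;
}
```

The comparisons themselves cannot overflow, because `INT_MAX - b` is only formed for positive `b` and `INT_MIN - b` only for negative `b`.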

Multi-character constants:
int x = 'ABCD';
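On many compilers this sets x to 0x41424344. The value of a multi-character constant is implementation-defined, so other compilers may give 0x44434241 or something else.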
EDIT: This technique is not portable, especially if you serialize the int.
However, it can be extremely useful to create self-documenting enums. e.g.
enum state {
stopped = 'STOP',
running = 'RUN!',
waiting = 'WAIT',
};
This makes it much simpler if you're looking at a raw memory dump and need to determine the value of an enum without having to look it up.

I never used bit fields but they sound cool for ultra-low-level stuff.
struct cat {
    unsigned int legs:3;  // 3 bits for legs (0-4 fit in 3 bits)
    unsigned int lives:4; // 4 bits for lives (0-9 fit in 4 bits)
    // ...
};
struct cat make_cat(void)
{
    struct cat kitty;
    kitty.legs = 4;
    kitty.lives = 9;
    return kitty;
}
This means that sizeof(struct cat) can be as small as sizeof(char), although the exact size is implementation-defined and many compilers round it up to sizeof(unsigned int).
Incorporated comments by Aaron and leppie, thanks guys.

C has a standard but not all C compilers are fully compliant (I've not seen any fully compliant C99 compiler yet!).
That said, the tricks I prefer are those that are non-obvious and portable across platforms because they rely only on C semantics. They usually involve macros or bit arithmetic.
For example, swapping two unsigned integers without using a temporary variable:
...
a ^= b ; b ^= a; a ^=b;
...
or "extending C" to represent finite state machines like:
FSM {
STATE(x) {
...
NEXTSTATE(y);
}
STATE(y) {
...
if (x == 0)
NEXTSTATE(y);
else
NEXTSTATE(x);
}
}
that can be achieved with the following macros:
#define FSM
#define STATE(x) s_##x :
#define NEXTSTATE(x) goto s_##x
In general, though, I don't like tricks that are clever but make the code unnecessarily complicated to read (like the swap example), and I love the ones that make the code clearer and convey the intention directly (like the FSM example).
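To show the FSM macros in a complete function, here is a sketch that counts space-separated words. The state names and `count_words` are made up for this example:

```c
#define FSM
#define STATE(x)      s_##x :
#define NEXTSTATE(x)  goto s_##x

/* Hypothetical example: count words separated by single or repeated spaces. */
int count_words(const char *s)
{
    int words = 0;

    FSM {
        STATE(between) {            /* currently between words */
            if (*s == '\0') return words;
            if (*s == ' ') { s++; NEXTSTATE(between); }
            words++;                /* first character of a new word */
            NEXTSTATE(inside);
        }
        STATE(inside) {             /* currently inside a word */
            if (*s == '\0') return words;
            if (*s == ' ') NEXTSTATE(between);
            s++;
            NEXTSTATE(inside);
        }
    }
}
```

Each `STATE` expands to a label and each `NEXTSTATE` to a `goto`, so the control flow is exactly what the state diagram would show.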

Interlacing structures like Duff's Device:
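/* Copies count bytes; count must be greater than 0. In Duff's original,
   `to` was a memory-mapped output register and was deliberately not
   incremented. This variant copies memory to memory, so both pointers
   advance. */
void duff_copy(char *to, const char *from, int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}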

I'm very fond of designated initializers, added in C99 (and supported in gcc for a long time):
#define FOO 16
#define BAR 3
myStructType_t myStuff[] = {
[FOO] = { foo1, foo2, foo3 },
[BAR] = { bar1, bar2, bar3 },
...
The array initialization is no longer position dependent. If you change the values of FOO or BAR, the array initialization will automatically correspond to their new value.
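A self-contained sketch of the same idea, using an enum as the index so a name table stays in sync with its constants. The `color` enum and `color_name` helper are invented for this example:

```c
#include <stddef.h>

enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT };

/* Entries may appear in any order; each lands at its enum index. */
static const char *const color_names[COLOR_COUNT] = {
    [COLOR_BLUE]  = "blue",
    [COLOR_RED]   = "red",
    [COLOR_GREEN] = "green",
};

/* Returns the name for c, or "unknown" for out-of-range values. */
const char *color_name(enum color c)
{
    if ((unsigned)c >= COLOR_COUNT || color_names[c] == NULL) {
        return "unknown";
    }
    return color_names[c];
}
```

Any index not named in the initializer is zero-initialized, which is why the `NULL` check is there.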

C99 has some awesome any-order structure initialization.
struct foo {
    int x;
    int y;
    const char *name;
};
int main(void) {
    struct foo f = { .y = 23, .name = "awesome", .x = -38 };
    return 0;
}

Anonymous structures and arrays (C99 compound literals) are my favourite. (cf. http://www.run.montefiore.ulg.ac.be/~martin/resources/kung-f00.html)
setsockopt(yourSocket, SOL_SOCKET, SO_REUSEADDR, (int[]){1}, sizeof(int));
or
void myFunction(type* values) {
while(*values) x=*values++;
}
myFunction((type[]){val1,val2,val3,val4,0});
It can even be used to instantiate linked lists...
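A compilable sketch of both forms, an array literal and a struct literal passed by address. All names here (`point`, `manhattan`, `sum_until_zero`, `demo_compound_literals`) are hypothetical:

```c
struct point { int x; int y; };

/* Distance from the origin along the axes. */
static int manhattan(const struct point *p)
{
    int ax = p->x < 0 ? -p->x : p->x;
    int ay = p->y < 0 ? -p->y : p->y;
    return ax + ay;
}

/* Sum the values of a zero-terminated int array. */
int sum_until_zero(const int *values)
{
    int total = 0;
    while (*values) {
        total += *values++;
    }
    return total;
}

int demo_compound_literals(void)
{
    /* Unnamed array, created in place for the call. */
    int total = sum_until_zero((int[]){ 3, 4, 5, 0 });
    /* Unnamed struct; taking its address is allowed. */
    int dist = manhattan(&(struct point){ .x = -2, .y = 7 });
    return total + dist;
}
```

A compound literal inside a function has automatic storage, so the pointers above are valid only until the enclosing block ends.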

gcc has a number of extensions to the C language that I enjoy, which can be found here. Some of my favorites are function attributes. One extremely useful example is the format attribute. This can be used if you define a custom function that takes a printf format string. If you enable this function attribute, gcc will do checks on your arguments to ensure that your format string and arguments match up and will generate warnings or errors as appropriate.
int my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));

The (hidden) feature that "shocked" me when I first saw it concerns printf: you can pass the field width or precision of a format specifier as an argument. The code shows it better:
#include <stdio.h>
int main() {
int a = 3;
float b = 6.412355;
printf("%.*f\n",a,b);
return 0;
}
the * character achieves this effect.
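The same works for the field width, and both can be combined. A minimal sketch using snprintf so the result can be inspected (`format_fixed` is a hypothetical helper):

```c
#include <stddef.h>
#include <stdio.h>

/* Format value right-aligned in `width` columns with `precision` decimals.
   Both numbers are consumed as int arguments by the two '*' characters. */
int format_fixed(char *buf, size_t size, double value, int width, int precision)
{
    return snprintf(buf, size, "%*.*f", width, precision, value);
}
```

For instance, `format_fixed(buf, sizeof buf, 6.412355, 8, 3)` writes three spaces followed by `6.412`.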

Well... I think that one of the strong points of C language is its portability and standardness, so whenever I find some "hidden trick" in the implementation I am currently using, I try not to use it because I try to keep my C code as standard and portable as possible.

Compile-time assertions, as already discussed here.
//--- size of static_assertion array is negative if condition is not met
#define STATIC_ASSERT(condition) \
typedef struct { \
char static_assertion[(condition) ? 1 : -1]; \
} static_assertion_t
//--- ensure structure fits in
STATIC_ASSERT(sizeof(mystruct_t) <= 4096);

Constant string concatenation
I was quite surprised not to see it already in the answers; all compilers I know of support it, but many programmers seem to ignore it. Sometimes it's really handy, and not only when writing macros.
Use case I have in my current code:
I have a #define PATH "/some/path/" in a configuration file (really it is set by the makefile). Now I want to build the full path, including filenames, to open resources. It just becomes:
fd = open(PATH "/file", flags);
Instead of the horrible, but very common:
char buffer[256];
snprintf(buffer, 256, "%s/file", PATH);
fd = open(buffer, flags);
Notice that the common horrible solution is:
three times as long
much harder to read
much slower
less powerful, as it is limited to an arbitrary buffer size (though avoiding that limit without constant string concatenation would take even longer code)
uses more stack space
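A small sketch of the concatenation at work. `CONFIG_DIR`, `CONFIG_FILE` and the two helper functions are hypothetical names chosen for this example:

```c
#include <stddef.h>

#define CONFIG_DIR  "/etc/myapp"                 /* assumed: no trailing slash */
#define CONFIG_FILE CONFIG_DIR "/settings.conf"  /* joined at compile time */

/* The adjacent literals have already been merged into one string. */
const char *config_file_path(void)
{
    return CONFIG_FILE;
}

/* sizeof on a string literal counts the terminating NUL, hence the - 1. */
size_t config_file_path_length(void)
{
    return sizeof(CONFIG_FILE) - 1;
}
```

Since the result is a single literal, even its length is a compile-time constant and no buffer is needed.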

Well, I've never used it, and I'm not sure whether I'd ever recommend it to anyone, but I feel this question would be incomplete without a mention of Simon Tatham's co-routine trick.

When initializing arrays or enums, you can put a comma after the last item in the initializer list. e.g:
int x[] = { 1, 2, 3, };
enum foo { bar, baz, boom, };
This was done so that if you're generating code automatically you don't need to worry about eliminating the last comma.

Struct assignment is cool. Many people don't seem to realize that structs are values too and can be assigned directly; there is no need to use memcpy() when a simple assignment does the trick.
For example, consider some imaginary 2D graphics library; it might define a type to represent an (integer) screen coordinate:
typedef struct {
int x;
int y;
} Point;
Now you can do things that might look "wrong", like writing a function that creates a point initialized from its arguments and returns it by value, like so:
Point point_new(int x, int y)
{
Point p;
p.x = x;
p.y = y;
return p;
}
This is safe, as long (of course) as the return value is copied by value using struct assignment:
Point origin;
origin = point_new(0, 0);
In this way you can write quite clean and object-oriented-ish code, all in plain standard C.
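A short sketch that builds on the same idea, with the `Point` type repeated so the block stands alone. `point_add` and `point_swap` are hypothetical additions to the imaginary library:

```c
typedef struct {
    int x;
    int y;
} Point;

/* Structs can be passed and returned by value like any scalar. */
Point point_add(Point a, Point b)
{
    Point result = { a.x + b.x, a.y + b.y };
    return result;
}

/* Plain assignment copies every member; no memcpy() is needed. */
void point_swap(Point *a, Point *b)
{
    Point tmp = *a;
    *a = *b;
    *b = tmp;
}
```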

Strange vector indexing:
int v[100]; int index = 10;
/* v[index] is the same thing as index[v] */

C compilers implement one of several standards. However, having a standard does not mean that all aspects of the language are defined. Duff's device, for example, is a favorite 'hidden' feature that has become so popular that modern compilers have special purpose recognition code to ensure that optimization techniques do not clobber the desired effect of this often used pattern.
In general hidden features or language tricks are discouraged as you are running on the razor edge of whichever C standard(s) your compiler uses. Many such tricks do not work from one compiler to another, and often these kinds of features will fail from one version of a compiler suite by a given manufacturer to another version.
Various tricks that have broken C code include:
Relying on how the compiler lays out structs in memory.
Assumptions on endianness of integers/floats.
Assumptions on function ABIs.
Assumptions on the direction that stack frames grow.
Assumptions about the order of evaluation within expressions.
Assumptions about the order of evaluation of function arguments.
Assumptions on the bit size or precision of short, int, long, float and double types.
Other problems and issues that arise whenever programmers make assumptions about execution models that are all specified in most C standards as 'compiler dependent' behavior.
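As one illustration of avoiding the endianness assumption in the list above, a value can be serialized with shifts instead of copying its in-memory bytes. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Store v as four bytes, most significant first, on any host. */
void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Read back a value written by store_be32(). */
uint32_t load_be32(const uint8_t in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

The shifts operate on values rather than on memory layout, so the byte order of the output does not depend on the machine.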

When using sscanf you can use %n to find out where you should continue to read:
sscanf ( string, "%d%n", &number, &length );
string += length;
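Put into a loop, the same pattern walks through a whole string. A sketch, where `sum_all_ints` is a hypothetical helper:

```c
#include <stdio.h>

/* Sum every whitespace-separated integer at the start of text.
   *count receives how many integers were read. */
int sum_all_ints(const char *text, int *count)
{
    int sum = 0;
    int value = 0;
    int consumed = 0;

    *count = 0;
    /* %n stores the number of characters consumed so far and does not
       count toward sscanf's return value. */
    while (sscanf(text, "%d%n", &value, &consumed) == 1) {
        sum += value;
        (*count)++;
        text += consumed;   /* continue right after the last number */
    }
    return sum;
}
```

The loop stops at the end of the string or at the first token that is not an integer.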
Apparently, you can't add another answer, so I'll include a second one here: you can use "&&" and "||" as conditionals:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    1 || puts("Hello");   /* skipped: the left operand is already true */
    0 || puts("Hi");      /* runs */
    1 && puts("ROFL");    /* runs */
    0 && puts("LOL");     /* skipped: the left operand is already false */
    exit(0);
}
This code will output:
Hi
ROFL

Using INT 3 (the x86 breakpoint instruction) to set a breakpoint in the code is my all-time favorite.

My favorite "hidden" feature of C is the use of %n in printf to write back through a pointer argument. Normally printf only reads its arguments according to the format string, but %n stores the number of characters written so far into the int that the corresponding argument points to.
Check out section 3.4.2 here. Can lead to a lot of nasty vulnerabilities.

Compile-time assumption-checking using enums:
Stupid example, but can be really useful for libraries with compile-time configurable constants.
#define D 1
#define DD 2
enum CompileTimeCheck
{
MAKE_SURE_DD_IS_TWICE_D = 1/(2*(D) == (DD)),
MAKE_SURE_DD_IS_POW2 = 1/((((DD) - 1) & (DD)) == 0)
};

GCC (for C) has some fun extensions you can enable, such as nested functions and the a ?: b form of the ?: operator, which returns a if a is nonzero and b otherwise.

I recently discovered zero-width bit-fields.
struct {
int a:3;
int b:2;
int :0;
int c:4;
int d:3;
};
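which will give a layout of
000aaabb 0ccccddd
instead of this layout without the :0;
0000aaab bccccddd
The zero-width field tells the compiler that the following bit-fields should start in the next allocation unit (shown here as a char; the actual unit and layout are implementation-defined).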

C99-style variable argument macros, aka
#define ERR(name, fmt, ...) fprintf(stderr, "ERROR " #name ": " fmt "\n", \
__VA_ARGS__)
which would be used like
ERR(errCantOpen, "File %s cannot be opened", filename);
Here I also use the stringize operator and string constant concatenation, other features I really like.
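A testable variant of the same idea that formats into a buffer instead of stderr, using the standard `__VA_ARGS__` identifier. `FORMAT_ERR` and `describe_open_failure` are hypothetical names:

```c
#include <stddef.h>
#include <stdio.h>

/* #name turns the identifier into a string literal, which is then
   concatenated with the neighbouring literals and with fmt.
   In C99 at least one variadic argument must be supplied. */
#define FORMAT_ERR(buf, size, name, fmt, ...) \
    snprintf((buf), (size), "ERROR " #name ": " fmt "\n", __VA_ARGS__)

/* Hypothetical caller showing one expansion of the macro. */
int describe_open_failure(char *buf, size_t size, const char *filename)
{
    return FORMAT_ERR(buf, size, errCantOpen,
                      "File %s cannot be opened", filename);
}
```

Note that `fmt` has to be a string literal here, because it is concatenated at compile time.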

Variable size automatic variables are also useful in some cases. These were added in C99 and have been supported in gcc for a long time.
void foo(uint32_t extraPadding) {
uint8_t commBuffer[sizeof(myProtocol_t) + extraPadding];
You end up with a buffer on the stack with room for the fixed-size protocol header plus variable size data. You can get the same effect with alloca(), but this syntax is more compact.
You have to make sure extraPadding is a reasonable value before calling this routine, or you end up blowing the stack. You'd have to sanity check the arguments before calling malloc or any other memory allocation technique, so this isn't really unusual.

Related

Using macros to implement a generic vector in C. Is this a good idea?

I am a programmer who knows both C and C++. I have used both languages in my own projects but I do not know which one I prefer.
When I program in C the feature that I miss the most from C++ is std::vector from the STL (Standard Template Library)
I still haven't figured out how I should represent growing arrays in C. Up to this point I duplicated my memory allocation code all over the place in my projects. I do not like code duplication and I know that it is bad practice, so this does not seem like a very good solution to me.
I thought about this problem some time ago and came up with the idea to implement a generic vector using preprocessor macros.
This is how the implementation looks:
#ifndef VECTOR_H_
#define VECTOR_H_
#include <stdlib.h>
#include <stdio.h>
/* Declare a vector of type `TYPE`. */
#define VECTOR_OF(TYPE) struct { \
TYPE *data; \
size_t size; \
size_t capacity; \
}
/* Initialize `VEC` with `N` capacity. */
#define VECTOR_INIT_CAPACITY(VEC, N) do { \
(VEC).data = malloc((N) * sizeof(*(VEC).data)); \
if (!(VEC).data) { \
fputs("malloc failed!\n", stderr); \
abort(); \
} \
(VEC).size = 0; \
(VEC).capacity = (N); \
} while (0)
/* Initialize `VEC` with zero elements. */
#define VECTOR_INIT(VEC) VECTOR_INIT_CAPACITY(VEC, 1)
/* Get the amount of elements in `VEC`. */
#define VECTOR_SIZE(VEC) (VEC).size
/* Get the amount of elements that are allocated for `VEC`. */
#define VECTOR_CAPACITY(VEC) (VEC).capacity
/* Test if `VEC` is empty. */
#define VECTOR_EMPTY(VEC) ((VEC).size == 0)
/* Push `VAL` at the back of the vector. This function will reallocate the buffer if
necessary. */
#define VECTOR_PUSH_BACK(VEC, VAL) do { \
if ((VEC).size + 1 > (VEC).capacity) { \
size_t n = (VEC).capacity * 2; \
void *p = realloc((VEC).data, n * sizeof(*(VEC).data)); \
if (!p) { \
fputs("realloc failed!\n", stderr); \
abort(); \
} \
(VEC).data = p; \
(VEC).capacity = n; \
} \
(VEC).data[VECTOR_SIZE(VEC)] = (VAL); \
(VEC).size += 1; \
} while (0)
/* Get the value of `VEC` at `INDEX`. */
#define VECTOR_AT(VEC, INDEX) (VEC).data[INDEX]
/* Get the value at the front of `VEC`. */
#define VECTOR_FRONT(VEC) (VEC).data[0]
/* Get the value at the back of `VEC`. */
#define VECTOR_BACK(VEC) (VEC).data[VECTOR_SIZE(VEC) - 1]
#define VECTOR_FREE(VEC) do { \
(VEC).size = 0; \
(VEC).capacity = 0; \
free((VEC).data); \
} while(0)
#endif /* !defined VECTOR_H_ */
This code goes in the header file called "vector.h".
Note that it does miss some functionality (like VECTOR_INSERT and VECTOR_ERASE) but I think that it is good enough to show my concept.
The use of the vector looks like this:
int main()
{
VECTOR_OF(int) int_vec;
VECTOR_OF(double) dbl_vec;
int i;
VECTOR_INIT(int_vec);
VECTOR_INIT(dbl_vec);
for (i = 0; i < 100000000; ++i) {
VECTOR_PUSH_BACK(int_vec, i);
VECTOR_PUSH_BACK(dbl_vec, i);
}
for (i = 0; i < 100; ++i) {
printf("int_vec[%d] = %d\n", i, VECTOR_AT(int_vec, i));
printf("dbl_vec[%d] = %f\n", i, VECTOR_AT(dbl_vec, i));
}
VECTOR_FREE(int_vec);
VECTOR_FREE(dbl_vec);
return 0;
}
It uses the same allocation rules as std::vector (the size starts as 1 and then doubles each time that is required).
To my surprise I found out that this code runs more than twice as fast as the same code written using std::vector and generates a smaller executable! (compiled with GCC and G++ using -O3 in both cases).
My questions to you are:
Are there any serious faults with this approach?
Would you recommend using this in a serious project?
If not then I would like you to explain why and tell me what a better alternative would be.
To my surprise I found out that this code runs more than twice as fast as the same code written using std::vector and generates a smaller executable! (compiled with GCC and G++ using -O3 in both cases).
There are three reasons why your C version is faster/smaller than the C++ version:
The implementation of new in the standard C++ library that is used by g++ is suboptimal. If you implement void* operator new (size_t size) as a call-through to malloc() you get better performance than with the built-in version.
If realloc() has to use a new chunk of memory, it moves the old data over in the fashion of memmove(), i.e. it ignores the logical structure of the data and simply moves the bits. That operation can easily be accelerated to the point that the memory bus is the bottleneck. std::vector<>, on the other hand, must take care of possibly calling constructors/destructors correctly; it can't just call through to memmove(). In the case of int and double that boils down to moving the data one int/double at a time, and the loop is in the code generated for the std::vector<>. That is not too bad, but it's worse than using SSE instructions, which a good memmove() implementation will do.
The realloc() function is part of the standard C library which is dynamically linked to your executable. The memory management code generated by std::vector<>, however, is precisely that: generated. As such, it must be a part of your executable.
Are there any serious faults with this approach?
This is a matter of taste, but I think the approach is smelly: Your macro definitions are far away from their uses, and they do not all behave like simple constants or inline functions. In fact, they act suspiciously like elements of a programming language (i.e. templates), which is not a good thing for preprocessor macros. It is generally a bad idea to try to modify the language by use of the preprocessor.
You also have a serious issue with your macro implementations: The VECTOR_INIT_CAPACITY(VEC, N) macro evaluates its VEC argument four times and the N argument twice. Now think about what happens if you call VECTOR_INIT_CAPACITY(foo, baz++): The size stored in the capacity field of the new vector will be larger than the size of the memory allocated for it. The line with the malloc() call will increment the baz variable, and that new value will be stored in the capacity member before baz is incremented a second time. You should write all macros in a way that they evaluate their arguments exactly once.
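One way to get single evaluation of N is to copy it into a local variable inside the do/while block. A sketch under that assumption; `VECTOR_INIT_CAPACITY_ONCE`, `vec_n_` and the demo function are hypothetical names, and VEC is still expanded several times, so it should remain a plain lvalue:

```c
#include <stdio.h>
#include <stdlib.h>

#define VECTOR_OF(TYPE) struct { TYPE *data; size_t size; size_t capacity; }

/* N is evaluated exactly once, when vec_n_ is initialized. */
#define VECTOR_INIT_CAPACITY_ONCE(VEC, N) do {                   \
        size_t vec_n_ = (N);                                     \
        (VEC).data = malloc(vec_n_ * sizeof(*(VEC).data));       \
        if (!(VEC).data) {                                       \
            fputs("malloc failed!\n", stderr);                   \
            abort();                                             \
        }                                                        \
        (VEC).size = 0;                                          \
        (VEC).capacity = vec_n_;                                 \
    } while (0)

/* Returns the stored capacity if the side effect ran once, else 0. */
size_t demo_capacity_with_side_effect(void)
{
    VECTOR_OF(int) v;
    size_t requested = 4;

    VECTOR_INIT_CAPACITY_ONCE(v, requested++);
    size_t result = (v.capacity == 4 && requested == 5) ? v.capacity : 0;
    free(v.data);
    return result;
}
```

Here the allocation and the stored capacity both use the value 4, and `requested` ends up as 5 rather than 6.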
Would you recommend using this in a serious project?
I think, I wouldn't bother. The realloc() code is trivial enough that some replications won't hurt too much. But again, your mileage may vary.
If not then I would like you to explain why and tell me what a better alternative would be.
As I said before, I wouldn't bother trying to write a general container class in the style of std::vector<>, neither by (ab)using the preprocessor, nor by (ab)using void*.
But I would take a close look at the memory handling on the system that I write for: With many kernels, it is extremely unlikely that you ever get a return value of NULL out of a malloc()/realloc() call. They over-commit their memory, making promises they cannot be certain to be able to fulfill. And when they realize that they can't back up their promises, they start shooting processes via the OOM-killer. On such a system (Linux is one of them), your error handling is simply pointless. It will never get executed. As such, you can avoid the pain of adding it and replicating it to all the places where you need a dynamically growing array.
My memory reallocation code in C usually looks something like this:
if(size == capacity) {
array = realloc(array, (capacity *= 2) * sizeof(*array));
}
array[size++] = ...;
Without the functionless error handling code, this is so short that it can safely be replicated as many times as it is needed.
Are there any serious faults with this approach?
You're reinventing templates in a way that interacts poorly with C's type system. For instance, your VECTOR types are anonymous, so I can't write a function that takes a VECTOR_OF(int) as a parameter.
Even if you do name your types somehow, I wouldn't be able to write a generic function---something that takes a VECTOR_OF(T) for arbitrary T and does something with it.
These might not be serious faults, but there's a hundred minor drawbacks like this to every generics-using-macros approach I've seen in C. This all comes up because the language doesn't try to support generic programming at all.
Would you recommend using this in a serious project?
Sure; you can develop a serious project using container types like this, and they won't even necessarily get in your face. You'll probably need to traffic in void *'s to pass these things around, and that leads to some casting that's a little bit error-prone.
My questions to you are:
Are there any serious faults with this approach?
Yes, you're trying to reinvent the wheel.
Would you recommend using this in a serious project?
No, especially since your speed-up suggests you might be missing some safety checks.
If not then I would like you to explain why and tell me what a better alternative would be.
VPool from above, or something else like that. If you search for "C growable buffer", you'll find several hints on Stack Overflow and via Google.

Why #defines instead of enums [duplicate]

Which one is better to use among the below statements in C?
static const int var = 5;
or
#define var 5
or
enum { var = 5 };
It depends on what you need the value for. You (and everyone else so far) omitted the third alternative:
(1) static const int var = 5;
(2) #define var 5
(3) enum { var = 5 };
Ignoring issues about the choice of name, then:
If you need to pass a pointer around, you must use (1).
Since (2) is apparently an option, you don't need to pass pointers around.
Both (1) and (3) have a symbol in the debugger's symbol table - that makes debugging easier. It is more likely that (2) will not have a symbol, leaving you wondering what it is.
(1) cannot be used as a dimension for arrays at global scope; both (2) and (3) can.
(1) cannot be used as a dimension for static arrays at function scope; both (2) and (3) can.
Under C99, all of these can be used for local arrays. Technically, using (1) would imply the use of a VLA (variable-length array), though the dimension referenced by 'var' would of course be fixed at size 5.
(1) cannot be used in places like switch statements; both (2) and (3) can.
(1) cannot be used to initialize static variables; both (2) and (3) can.
(2) can change code that you didn't want changed because it is used by the preprocessor; both (1) and (3) will not have unexpected side-effects like that.
You can detect whether (2) has been set in the preprocessor; neither (1) nor (3) allows that.
So, in most contexts, prefer the 'enum' over the alternatives. Otherwise, the first and last bullet points are likely to be the controlling factors — and you have to think harder if you need to satisfy both at once.
If you were asking about C++, then you'd use option (1) — the static const — every time.
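A compact sketch that exercises several of the points above in one translation unit. The identifiers are hypothetical and use distinct names so that all three forms can coexist:

```c
static const int max_items_obj = 5;   /* (1) a real object with an address */
#define MAX_ITEMS_MACRO 5              /* (2) textual replacement */
enum { MAX_ITEMS = 5 };                /* (3) integer constant expression */

/* File-scope array dimension: fine with (2) and (3), an error with (1). */
static int slots[MAX_ITEMS];

/* Only the macro is visible to the preprocessor. */
#if MAX_ITEMS_MACRO != 5
#error "MAX_ITEMS_MACRO is expected to be 5 in this sketch"
#endif

int slot_count(void)
{
    return (int)(sizeof slots / sizeof slots[0]);
}

int is_limit(int n)
{
    switch (n) {
    case MAX_ITEMS:           /* case labels need a constant expression */
        return 1;
    default:
        return 0;
    }
}

/* Only (1) can be pointed to. */
const int *limit_address(void)
{
    return &max_items_obj;
}
```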
Generally speaking:
static const
Because it respects scope and is type-safe.
The only caveat I could see: if you want the variable to be possibly defined on the command line. There is still an alternative:
#ifdef VAR // Very bad name, not long enough, too general, etc..
static int const var = VAR;
#else
static int const var = 5; // default value
#endif
Whenever possible, instead of macros / ellipsis, use a type-safe alternative.
If you really NEED to go with a macro (for example, you want __FILE__ or __LINE__), then you'd better name your macro VERY carefully: in its naming convention Boost recommends all upper-case, beginning by the name of the project (here BOOST_), while perusing the library you will notice this is (generally) followed by the name of the particular area (library) then with a meaningful name.
It generally makes for lengthy names :)
In C, specifically? In C the correct answer is: use #define (or, if appropriate, enum)
While it is beneficial to have the scoping and typing properties of a const object, in reality const objects in C (as opposed to C++) are not true constants and therefore are usually useless in most practical cases.
So, in C the choice should be determined by how you plan to use your constant. For example, you can't use a const int object as a case label (while a macro will work). You can't use a const int object as a bit-field width (while a macro will work). In C89/90 you can't use a const object to specify an array size (while a macro will work). Even in C99 you can't use a const object to specify an array size when you need a non-VLA array.
If this is important for you then it will determine your choice. Most of the time, you'll have no choice but to use #define in C. And don't forget another alternative, that produces true constants in C - enum.
In C++ const objects are true constants, so in C++ it is almost always better to prefer the const variant (no need for explicit static in C++ though).
The difference between static const and #define is that the former uses memory for storage and the latter does not. Secondly, you cannot take the address of a #define, whereas you can take the address of a static const. Which one to choose depends on the circumstances; each is at its best in different situations. Please don't assume that one is better than the other... :-)
If that would have been the case, Dennis Ritchie would have kept the best one alone... hahaha... :-)
In C #define is much more popular. You can use those values for declaring array sizes for example:
#define MAXLEN 5
void foo(void) {
int bar[MAXLEN];
}
ANSI C doesn't allow you to use static consts in this context as far as I know. In C++ you should avoid macros in these cases. You can write
const int maxlen = 5;
void foo() {
int bar[maxlen];
}
and even leave out static because internal linkage is implied by const already [in C++ only].
Another drawback of const in C is that you can't use the value in initializing another const.
static int const NUMBER_OF_FINGERS_PER_HAND = 5;
static int const NUMBER_OF_HANDS = 2;
// initializer element is not constant, this does not work.
static int const NUMBER_OF_FINGERS = NUMBER_OF_FINGERS_PER_HAND
* NUMBER_OF_HANDS;
Even this does not work with a const since the compiler does not see it as a constant:
static uint8_t const ARRAY_SIZE = 16;
static int8_t const lookup_table[ARRAY_SIZE] = {
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}; // ARRAY_SIZE not a constant!
I'd be happy to use typed const in these cases, otherwise...
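In the meantime, enum constants can stand in for both cases, since they are true integer constant expressions. A sketch with hypothetical names mirroring the ones above:

```c
#include <stdint.h>

enum {
    FINGERS_PER_HAND = 5,
    HANDS            = 2,
    FINGERS          = FINGERS_PER_HAND * HANDS,  /* allowed: constant expression */
    TABLE_SIZE       = 16
};

/* A static array dimension may be an enum constant. */
static const int8_t lookup_table[TABLE_SIZE] = {
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
};

int finger_count(void)
{
    return FINGERS;
}

/* Returns the table entry, or -1 for an out-of-range index. */
int lookup(unsigned index)
{
    return index < (unsigned)TABLE_SIZE ? lookup_table[index] : -1;
}
```

The trade-off is that enum constants always have type int, so the typed `uint8_t` size from the example above is lost.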
If you can get away with it, static const has a lot of advantages. It obeys the normal scope principles, is visible in a debugger, and generally obeys the rules that variables obey.
However, at least in the original C standard, it isn't actually a constant. If you use #define var 5, you can write int foo[var]; as a declaration, but you can't do that (except as a compiler extension) with static const int var = 5;. This is not the case in C++, where the static const version can be used anywhere the #define version can. In C99 it works only for automatic arrays, where it creates a variable-length array.
However, never name a #define constant with a lowercase name. It will override any possible use of that name until the end of the translation unit. Macro constants should be in what is effectively their own namespace, which is traditionally all capital letters, perhaps with a prefix.
#define var 5 will cause you trouble if you have things like mystruct.var.
For example,
struct mystruct {
int var;
};
#define var 5
int main() {
struct mystruct foo;
foo.var = 1;
return 0;
}
The preprocessor will replace it and the code won't compile. For this reason, traditional coding style suggests that all constant #defines use capital letters to avoid conflicts.
It is ALWAYS preferable to use const, instead of #define. That's because const is treated by the compiler and #define by the preprocessor. It is like #define itself is not part of the code (roughly speaking).
Example:
#define PI 3.1416
The symbolic name PI may never be seen by compilers; it may be removed by the preprocessor before the source code even gets to a compiler. As a result, the name PI may not get entered into the symbol table. This can be confusing if you get an error during compilation involving the use of the constant, because the error message may refer to 3.1416, not PI. If PI were defined in a header file you didn’t write, you’d have no idea where that 3.1416 came from.
This problem can also crop up in a symbolic debugger, because, again, the name you’re programming with may not be in the symbol table.
Solution:
const double PI = 3.1416; //or static const...
I wrote a quick test program to demonstrate one difference:
#include <stdio.h>
enum {ENUM_DEFINED=16};
enum {ENUM_DEFINED=32};
#define DEFINED_DEFINED 16
#define DEFINED_DEFINED 32
int main(int argc, char *argv[]) {
printf("%d, %d\n", DEFINED_DEFINED, ENUM_DEFINED);
return(0);
}
Compiling this produces these errors and warnings:
main.c:6:7: error: redefinition of enumerator 'ENUM_DEFINED'
enum {ENUM_DEFINED=32};
^
main.c:5:7: note: previous definition is here
enum {ENUM_DEFINED=16};
^
main.c:9:9: warning: 'DEFINED_DEFINED' macro redefined [-Wmacro-redefined]
#define DEFINED_DEFINED 32
^
main.c:8:9: note: previous definition is here
#define DEFINED_DEFINED 16
^
Note that enum gives an error where #define gives only a warning.
The definition
const int const_value = 5;
does not always define a constant value. Some compilers (for example tcc 0.9.26) just allocate memory identified with the name "const_value". Using the identifier "const_value" you cannot modify this memory. But you could still modify the memory through another identifier (formally this is undefined behavior, so the result is not guaranteed):
const int const_value = 5;
int *mutable_value = (int*) &const_value;
*mutable_value = 3;
printf("%i", const_value); // The output may be 5 or 3, depending on the compiler.
This means the definition
#define CONST_VALUE 5
is the only way to define a constant value which can not be modified by any means.
Although the question was about integers, it's worth noting that #define and enums are useless if you need a constant structure or string. These are both usually passed to functions as pointers. (With strings it's required; with structures it's much more efficient.)
As for integers, if you're in an embedded environment with very limited memory, you might need to worry about where the constant is stored and how accesses to it are compiled. The compiler might add two consts at run time, but add two #defines at compile time. A #define constant may be converted into one or more MOV [immediate] instructions, which means the constant is effectively stored in program memory. A const constant will be stored in the .const section in data memory. In systems with a Harvard architecture, there could be differences in performance and memory usage, although they'd likely be small. They might matter for hard-core optimization of inner loops.
Don't think there's an answer for "which is always best" but, as Matthieu said
static const
is type safe. My biggest pet peeve with #define, though, is when debugging in Visual Studio you cannot watch the variable. It gives an error that the symbol cannot be found.
Incidentally, an alternative to #define, which provides proper scoping but behaves like a "real" constant, is "enum". For example:
enum {number_ten = 10};
In many cases, it's useful to define enumerated types and create variables of those types; if that is done, debuggers may be able to display variables according to their enumeration name.
One important caveat with doing that, however: in C++, enumerated types have limited compatibility with integers. For example, by default, one cannot perform arithmetic upon them. I find that to be a curious default behavior for enums; while it would have been nice to have a "strict enum" type, given the desire to have C++ generally compatible with C, I would think the default behavior of an "enum" type should be interchangeable with integers.
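A minimal sketch of why an enum constant behaves like a "real" constant in C: it is an integer constant expression, so it can be used as a file-scope array size and as a case label. The names NUMBER_TEN, buffer and describe are made up for illustration.

```c
#include <stddef.h>

/* An enumeration constant is an integer constant expression. */
enum { NUMBER_TEN = 10 };

/* File-scope array sizes must be integer constant expressions. */
static int buffer[NUMBER_TEN];

/* Case labels must be integer constant expressions too. */
static const char *describe(int n)
{
    switch (n) {
    case NUMBER_TEN:
        return "ten";
    default:
        return "other";
    }
}

/* Number of elements in buffer, computed at compile time. */
static size_t buffer_len(void)
{
    return sizeof buffer / sizeof buffer[0];
}
```

Replacing the enum with static const int NUMBER_TEN = 10; would make both the array declaration and the case label invalid in standard C.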
A simple difference:
At pre-processing time, the constant is replaced with its value.
So you could not apply the dereference operator to a define, but you can apply the dereference operator to a variable.
As you would suppose, define is faster than static const.
For example, having:
#define mymax 100
you can not do printf("address of constant is %p",&mymax);.
But having
const int mymax_var=100;
you can do printf("address of constant is %p",&mymax_var);.
To be more clear, the define is replaced by its value at the pre-processing stage, so we do not have any variable stored in the program. We have just the code from the text segment of the program where the define was used.
However, for static const we have a variable that is allocated somewhere. For gcc, static const are allocated in the text segment of the program.
Correction: where I wrote "dereference operator" above, I meant the address-of operator (&).
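To make the difference concrete, here is a small sketch. The names MYMAX, address_of_mymax and print_address are only illustrative. The macro is replaced by the literal 100 before compilation, so there is nothing to take the address of, while the const object has storage.

```c
#include <stdio.h>

#define MYMAX 100                   /* replaced by the literal 100 by the preprocessor */
static const int mymax_var = 100;   /* a real object, so it has an address */

/* Returns the address of the const object. There is no equivalent for
   MYMAX, because &100 is not a valid C expression. */
static const int *address_of_mymax(void)
{
    return &mymax_var;
}

/* %p expects a void pointer, hence the cast. */
static void print_address(void)
{
    printf("address of constant is %p\n", (void *)address_of_mymax());
}
```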
We looked at the produced assembler code on the MBF16X... Both variants result in the same code for arithmetic operations (ADD Immediate, for example).
So const int is preferred for the type check while #define is old style. Maybe it is compiler-specific. So check your produced assembler code.
I am not sure if I am right, but in my opinion using a #defined value is much faster than using any normally declared variable (or const value).
That is because when the program is running and needs a normally declared variable, it has to go to the exact place in memory where that variable is stored.
By contrast, when it uses a #defined value, the program does not need to go to any allocated memory; it just takes the value. With #define myValue 7, a program that uses myValue behaves exactly as if it had used 7.

Create a min() macro for any type of array

I would like to create a C macro returning the scalar minimum for any type of static array in input. For example:
float A[100];
int B[10][10];
// [...]
float minA = MACRO_MIN(A);
int minB = MACRO_MIN(B);
How can I do so?
It can probably be done with GCC extensions, but not in standard C. Other compilers might have suitable extensions, too. It will of course make the code fantastically hard to port. I would advise against it: it's quite hard to achieve, it will be "unexpected", and it will probably act as a source of confusion (or, worse, bugs) down the line.
You're going to have to declare a temporary variable to hold the max/min seen "so far" when iterating over the array, and the type of that variable is hard to formulate without extensions.
Also returning the value of the temporary is hard, but possible with GCC extensions.
To make the above more concrete, here's a sketch of what I imagine. I did not test-compile this, so it's very likely to have errors in it:
#define ARRAY_MAX(a) ({ typeof((a)[0]) tmp = (a)[0];\
for(size_t i = 1; i < sizeof (a) / sizeof tmp; ++i)\
{\
if((a)[i] > tmp)\
tmp = (a)[i];\
}\
tmp;\
})
The above uses:
({ and }) is the GCC Statement Expressions extension, allowing the macro to have a local variable which is used as the "return value".
typeof is used to compute the proper type.
Note assumption that the array is not of zero size. This should not be a very limiting assumption.
The use of sizeof is of course standard.
As I wrote the above, I realize there might be issues with multi-dimensional arrays that I hadn't realized until trying. I'm not going to polish it further, though. Note that it starts out with "probably".
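For comparison, here is a sketch of a portable alternative that stays within standard C99 by giving up on a single generic macro: a macro stamps out one typed function per element type, and the caller passes the element count. DEFINE_ARRAY_MIN, ARRAY_LEN, min_float and min_int are names invented for this example, and like the macro above it assumes the array is not empty.

```c
#include <stddef.h>

/* Generates a function `name` that returns the smallest of n elements of type T.
   Assumes n >= 1. */
#define DEFINE_ARRAY_MIN(T, name)             \
    static T name(const T *p, size_t n)       \
    {                                         \
        T best = p[0];                        \
        for (size_t i = 1; i < n; ++i) {      \
            if (p[i] < best)                  \
                best = p[i];                  \
        }                                     \
        return best;                          \
    }

DEFINE_ARRAY_MIN(float, min_float)
DEFINE_ARRAY_MIN(int, min_int)

/* Element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])
```

With float A[100]; the call would be min_float(A, ARRAY_LEN(A)). This does not handle multi-dimensional arrays directly either; the caller would have to pass a pointer to the first scalar and the total element count.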

Reassemble float from bytes inline

I'm working with HiTech PICC32 on the PIC32MX series of microprocessors, but I think this question is general enough for anyone knowledgeable in C. (This is almost equivalent to C90, with sizeof(int) = sizeof(long) = sizeof(float) = 4.)
Let's say I read a 4-byte word of data that represents a float. I can quickly convert it to its actual float value with:
#define FLOAT_FROM_WORD(WORD_VALUE) (*((float*) &(WORD_VALUE)))
But this only works for lvalues. I can't, for example, use this on a function return value like:
FLOAT_FROM_WORD(eeprom_read_word(addr));
Is there a short and sweet way to do this inline, i.e. without a function call or temp variable? To be honest, there's no HUGE reason for me to avoid a function call or extra var, but it's bugging me. There must be a way I'm missing.
Added: I didn't realise that WORD was actually a common typedef. I've changed the name of the macro argument to avoid confusion.
You can run the trick the other way for return values
float fl;
*(int*)&fl = eeprom_read_word(addr);
or
#define WORD_TO_FLOAT(f) (*(int*)&(f))
WORD_TO_FLOAT(fl) = eeprom_read_word(addr);
or as R Samuel Klatchko suggests
#define ASTYPE(type, val) (*(type*)&(val))
ASTYPE(WORD,fl) = eeprom_read_word(addr);
If this were GCC, you could do this:
#define atob(original, newtype) \
(((union { typeof(original) i; newtype j; })(original)).j)
Wow. Hideous. But the usage is nice:
int i = 0xdeadbeef;
float f = atob(i, float);
I bet your compiler supports neither the typeof operator nor the union casting that GCC does, since neither is standard behavior, but on the off-chance that your compiler can do union casting, that is your answer. Modified not to use typeof:
#define atob(original, origtype, newtype) \
(((union { origtype i; newtype j; })(original)).j)
int i = 0xdeadbeef;
float f = atob(i, int, float);
Of course, this ignores the issue of what happens when you use two types of different sizes, but is closer to "what you want," i.e. a simple macro filter that returns a value, instead of taking an extra parameter. The extra parameters this version takes are just for generality.
If your compiler doesn't support union casting, which is a neat but non-portable trick, then there is no way to do this the "way you want it," and the other answers have already got it.
you can take the address of a temporary value if you use a const reference:
#define FLOAT_FROM_WORD(w) (*(float*)&(const WORD &)(w))
but that won't work in c :(
(c doesn't have references right? works in visual c++)
as others have said, be it an inlined function or a temp in a define, the compiler will optimize it out.
Not really an answer, more a suggestion. Your FLOAT_FROM_WORD macro will be more natural to use and more flexible if it doesn't have a ; at the end
#define FLOAT_FROM_WORD(w) (*(float*)&(w))
fl = FLOAT_FROM_WORD(wd);
It may not be possible in your exact situation, but upgrading to a C99 compiler would solve your problem too.
C99 has inline functions which, while acting like normal functions in parameters and return values, get improved efficiency in exactly this case with none of the drawbacks of macros.
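A minimal sketch of that inline-function approach, assuming a C99 compiler, a 32-bit word held in a uint32_t, and a 32-bit IEEE 754 float. The name float_from_word is made up here. It copies the bytes with memcpy rather than casting the pointer, which sidesteps aliasing problems, and compilers typically optimize the copy away.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the bits of a 32-bit word as a float.
   Assumption: sizeof(float) == sizeof(uint32_t) == 4. */
static inline float float_from_word(uint32_t word)
{
    float f;
    memcpy(&f, &word, sizeof f);   /* byte-wise copy, no pointer cast needed */
    return f;
}
```

Because the argument is passed by value, this works on rvalues, e.g. float_from_word(eeprom_read_word(addr)).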

Pointer to literal value

Suppose I have a constant defined in a header file
#define THIS_CONST 'A'
I want to write this constant to a stream. I do something like:
char c = THIS_CONST;
write(fd, &c, sizeof(c));
However, what I would like to do for brevity and clarity is:
write(fd, &THIS_CONST, sizeof(char)); // error
// lvalue required as unary ‘&’ operand
Does anyone know of any macro/other trick for obtaining a pointer to a literal? I would like something which can be used like this:
write(fd, PTR_TO(THIS_CONST), sizeof(char))
Note: I realise I could declare my constants as static const variables, but then I can't use them in switch/case statements. i.e.
static const char THIS_CONST = 'A';
...
switch(c) {
case THIS_CONST: // error - case label does not reduce to an integer constant
...
}
Unless there is a way to use a const variable in a case label?
There is no way to do this directly in C89. You would have to use a set of macros to create such an expression.
In C99, it is allowed to declare struct-or-union literals, and initializers to scalars can be written using a similar syntax. Therefore, there is one way to achieve the desired effect:
#include <stdio.h>
void f(const int *i) {
printf("%i\n", *i);
}
int main(void) {
f(&(int){1});
return 0;
}
These answers are all outdated, and apart from a comment nobody refers to recent language updates.
On a C99-C11-C17 compiler, using a compound literal (http://en.cppreference.com/w/c/language/compound_literal), it is possible to create a pointer to a nameless constant, as in:
int *p = &((int){10});
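Applied to the original question, the compound literal can be wrapped in the PTR_TO macro the asker wanted. This is only a sketch, assuming C99 or later and char-sized constants; read_back is a hypothetical helper that exists just to show the pointer being used.

```c
#define THIS_CONST 'A'

/* Yields a pointer to an unnamed char object initialized with x.
   The object lives until the end of the enclosing block. */
#define PTR_TO(x) (&(char){ (x) })

/* Reads the value back through the pointer, to show it is a real object. */
static char read_back(void)
{
    const char *p = PTR_TO(THIS_CONST);
    return *p;
}
```

The call from the question then becomes write(fd, PTR_TO(THIS_CONST), sizeof(char)); while THIS_CONST itself stays usable in case labels because it is still a macro.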
The only way you can obtain a pointer is by putting the literal into a variable (your first code example). You can then use the variable with write() and the literal in switch.
C simply does not allow the address of character literals like 'A'. For what it's worth, the type of character literals in C is int (char in C++ but this question is tagged C). 'A' would have an implementation defined value (such as 65 on ASCII systems). Taking the address of a value doesn't make any sense and is not possible.
Now, of course you may take the address of other kinds of literals such as string literals, for example the following is okay:
write(fd, "potato", sizeof "potato");
This is because the string literal "potato" is an array, and its value is a pointer to the 'p' at the start.
To elaborate/clarify, you may only take the address of objects. ie, the & (address-of) operator requires an object, not a value.
And to answer the other question that I missed, C doesn't allow non-constant case labels, and this includes variables declared const.
Since calling write() to write a single character to a file descriptor is almost certainly a performance killer, you probably want to just do fputc( THIS_CONST, stream ).
#define THIS_CONST 'a'
Is just a macro. The compiler basically just inserts 'a' everywhere you use THIS_CONST.
You could try:
const char THIS_CONST = 'a';
But I suspect that will not work either (don't have a c-compiler handy to try it out on, and it has been quite a few years since I've written c code).
just use a string constant, which is a pointer to a character, and then only write 1 byte:
#define MY_CONST_STRING "A"
write(fd, MY_CONST_STRING, 1);
Note that the '\0' byte at the end of the string is not written.
You can do this for all sorts of constant values, just use the appropriate hex code string, e.g.
#define MY_CONST_STRING "\x41"
will give also the character 'A'. For multiple-byte stuff, take care that you use the correct endianness.
Let's say you want to have a pointer to INT_MAX, which is e.g. 0x7FFFFFFF on a system with a 32-bit int. On a little-endian machine you can then do the following:
#define PTR_TO_INT_MAX "\xFF\xFF\xFF\x7F"
You can see that this works by passing it as a dereferenced pointer to printf:
printf ("max int value = %d\n", *(int*)PTR_TO_INT_MAX);
which should print 2147483647.
For chars, you may use extra global static variables. Maybe something like:
#define THIS_CONST 'a'
static char tmp;
#define PTR_TO(X) ((tmp = X),&tmp)
write(fd,PTR_TO(THIS_CONST),sizeof(char));
There's no reason the compiler has to put the literal into any memory location, so your question doesn't make sense. For example a statement like
int a;
a = 10;
would probably just be directly translated into "put a value ten into a register". In the assembly language output of the compiler, the value ten itself never even exists as something in memory which could be pointed at, except as part of the actual program text.
You can't take a pointer to it.
If you really want a macro to get a pointer,
#include <stdio.h>
static char getapointer[1];
#define GETAPOINTER(x) (getapointer[0] = (x), &getapointer[0])
int main ()
{
printf ("%d\n", *GETAPOINTER('A'));
}
I can see what you're trying to do here, but you're trying to use two fundamentally different things here. The crux of the matter is that case statements need to use values which are present at compile time, but pointers to data in memory are available only at run time.
When you do this:
#define THIS_CONST 'A'
char c = THIS_CONST;
write(fd, &c, sizeof(c));
you are doing two things. You are making the macro THIS_CONST available to the rest of the code at compile time, and you are creating a new char at runtime which is initialised to this value. At the point at which the line write(fd, &c, sizeof(c)) is executed, the concept of THIS_CONST no longer exists, so you have correctly identified that you can create a pointer to c, but not a pointer to THIS_CONST.
Now, when you do this:
static const char THIS_CONST = 'A';
switch(c) {
case THIS_CONST: // error - case label does not reduce to an integer constant
...
}
you are writing code where the value of the case statement needs to be evaluated at compile time. However, in this case, you have specified THIS_CONST in a way where it is a variable, and therefore its value is available only at runtime despite you "knowing" that it is going to have a particular value. Certainly, other languages allow different things to happen with case statements, but those are the rules with C.
Here's what I'd suggest:
1) Don't call a variable THIS_CONST. There's no technical reason not to, but convention suggests that this is a compile-time macro, and you don't want to confuse your readers.
2) If you want the same values to be available at compile time and runtime, find a suitable way of mapping compile-time macros into run-time variables. This may well be as simple as:
#define CONST_STAR '*'
#define CONST_NEWLINE '\n'
static const char c_star = CONST_STAR;
static const char c_newline = CONST_NEWLINE;
Then you can do:
switch(c) {
case CONST_STAR:
...
write(fd, &c_star, sizeof(c_star));
...
}
(Note also that sizeof(char) is always one, by definition. You may know that already, but it's not as widely appreciated as perhaps it should be.)
Here is another way to solve this old but still relevant question
#define THIS_CONST 'A'
//I place this in a header file
static inline ssize_t write1(int fd, char byte) // write() returns ssize_t (-1 on error)
{
return write(fd, &byte, 1);
}
//sample usage
int main(int argc, char * argv[])
{
char s[] = "Hello World!\r\n";
write(1, s, sizeof(s) - 1); // fd 1 is stdout; skip the terminating '\0'
write1(1, THIS_CONST);
return 0;
}
Ok, I've come up with a bit of a hack, for chars only - but I'll put it here to see if it inspires any better solutions from anyone else
static const char *charptr(char c) {
static char val[UCHAR_MAX + 1];
val[(unsigned char)c] = c;
return &val[(unsigned char)c];
}
...
write(fd, charptr(THIS_CONST), sizeof(char));

Resources