I'd like to write a low-level logging function that would look like:
DO_DBG("some string", val1, val2)
What I want it to do, is to store the pointer to the string rather than a copy of the string, for performance reasons. This assumes that the string is a read-only literal. To prevent people having to debug the debugger, it would be nice if the compiler could complain if the first parameter of DO_DBG was in a writable section of code vs text, etc. I'm wondering if a mechanism to do that exists. (I'm using gcc 4.9.1, ld 2.24).
You could use the automatic literal string concatenation to your advantage:
#define DO_DBG(s, a, b) _DO_DBG("" s, a, b)
And implement your real macro as _DO_DBG.
You can set up a SIGSEGV handler, and then try to do:
s[0] = s[0];
If the handler is triggered, it means that the variable is in a read-only segment, so you can save the pointer. If not, you can complain (or just make a copy of the string).
You'll need to declare the string volatile so the compiler doesn't optimize the assignment away.
Alternatively you could use the stringify macro construct:
#define DO_DBG(s, a, b) DO_DBG_INTERNAL(# s, a, b)
and call your function DO_DGB_INTERNAL
so people would do:
DO_DBG(some string, val1, val2)
PS, quite possibly a variadic function and macro would be a good idea here.
Your function gets a pointer to the start of the string. If your operating system allows it, and the compiler sets it up that way, a constant string (like in your example) could be allocated in a read-only memory area. If your compiler is smart, it will store one copy of a string constant that appears several times. It might even figure out that some strings are never used, and just not store them at all.
Other than the above, unless your program relies on undefined behaviour (getting away with writing on a constant string), you should not be able to figure out if it is in read-only memory or not. You could compare addresses of strings to see if they are only one copy, but that is it.
In any reasonable C implementation (i.e., one that doesn't go out of its's way just to screw you over), you will see no (or at most an extremely small) performance difference if the string is read-only, read-write, one copy or several.
If you write you function with a char * parameter, and store said pointer away (not a copy to the string pointed to), you should always get the same string back (unless your environment is really strange).
Here's a SIGSEGV-handler-based test to check if pointed-at memory is mapped read-only:
#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <string.h>
#include <setjmp.h>
static sigjmp_buf label;
static void segv_hndlr(int Sig) { siglongjmp(label,1); } //safe in this ctx
_Bool is_ro(char volatile *X)
{
_Bool r=0;
struct sigaction old;
X[0]; //fault now if the target isn't writable
if(sigsetjmp(label,1))
{ r=1; goto out; }
sigaction(SIGSEGV,&(struct sigaction){.sa_handler=segv_hndlr}, &old);
X[0]=X[0];
out: sigaction(SIGSEGV,&old, NULL);
return r;
}
//example:
int main()
{
#define PR_BOOL(X) printf(#X"=%d\n", X)
char bar[]="bar";
char *baz = strdup("baz");
static int const static_ro_int = 42;
int const auto_ro_int = 42;
PR_BOOL(is_ro("foo")); //1
PR_BOOL(is_ro(bar)); //0
PR_BOOL(is_ro(baz)); //0
PR_BOOL(is_ro((void*)&static_ro_int)); //1
PR_BOOL(is_ro((void*)&auto_ro_int)); //0
}
I am learning C programming, and my book says that unlike variables, constants cannot be changed during the program's execution. And that their are two types of constants Literal and Symbolic. I think that I understand Symbolic pretty well. But Literal Constants are confusing me.
The example it gave me for was
int count = 20;
I wrote this simple program, and I could change the value of the Literal Constant.
/* Demonstrates variables and constants */
#include <stdio.h>
/* Trying to figure out if literal constants are any different from variables */
int testing = 22;
int main( void )
{
/* Print testing before changing the value */
printf("\nYour int testing has a value of %d", testing);
/* Try to change the value of testing */
testing = 212345;
/* Print testing after changing the value */
printf("\nYour int testing has a value of %d", testing);
return 0;
}
It outputted this:
Your int testing has a value of 22
Your int testing has a value of 212345
RUN SUCCESSFUL (total time: 32ms)
Can someone explain how this happens, am I declaring it wrong? Or is there any difference between normal variable and literal constants?
-Thanks
The literal constant is the 20. You can change the value of count, but you cannot change the value of 20 to be, for example, 19.
(As some trivia, there are versions of FORTRAN where you could do exactly this, so it's not meaningless to talk about)
The literal in this case is the number 22 (and later the number 212345). You assign this literal to a variable testing. This variable can be changed.
This is a little trickier when it comes to strings and string literals. If you have a pointer to a string literal, you can change the actual pointer to point to some other string, but you can change what the original pointer points to.
For example:
const char *string_pointer = "Foobar";
With the above definition, you can not do e.g. string_pointer[0] = 'L';, as that is an attempt to modify the literal string that the pointer points to.
Depending on context, a symbolic constant can be either a "variable" declared as const or a preprocessor macro.
A literal constant in C is just any number itself, e.g. your 20. You cannot change it by doing, say, 20 = 50;. That is illegal.
There are also constant variables, e.g. const int blah = 42;. You cannot change this blah by doing something like blah = 100;.
But if you have a normal, non-constant variable, e.g. int foo = 123;, you can change it, e.g. foo = 456;.
Is there any way in C to find if a variable has the const qualifier? Or if it's stored in the .rodata section?
For example, if I have this function:
void foo(char* myString) {...}
different actions should be taken in these two different function calls:
char str[] = "abc";
foo(str);
foo("def");
In the first case I can modify the string, in the second one no.
Not in standard C, i.e. not portably.
myString is just a char* in foo, all other information is lost. Whatever you feed into the function is automatically converted to char*.
And C does not know about ".rodata".
Depending on your platform you could check the address in myString (if you know your address ranges).
You can't differ them using the language alone. In other words, this is not possible without recurring to features specific to the compiler you're using, which is likely not to be portable. A few important remarks though:
In the first case you COULD modify the string, but you MUST NOT. If you want a mutable string, use initialization instead of assignment.
char *str1 = "abc"; // NOT OK, should be const char *
const char *str2 = "abc"; // OK, but not mutable
char str3[] = "abc"; // OK, using initialization, you can change its contents
#include<stdio.h>
void foo(char *mystr)
{
int a;
/*code goes here*/
#ifdef CHECK
int local_var;
printf(" strings address %p\n",mystr);
printf("local variables address %p \n",&local_var);
puts("");
puts("");
#endif
return;
}
int main()
{
char a[]="hello";
char *b="hello";
foo(a);
foo(b);
foo("hello");
}
On compiling with gcc -DCHECK prog_name.c and executing on my linux machine the following output comes...
strings address 0xbfdcacf6
local variables address 0xbfdcacc8
strings address 0x8048583
local variables address 0xbfdcacc8
strings address 0x8048583
local variables address 0xbfdcacc8
for first case when string is defined and initialized in the "proper c way for mutable strings" the difference between the addresses is 0x2E.(5 bytes).
in the second case when string is defined as char *p="hello" the differences in addresses is
0xB7D82745.Thats bigger than the size of my stack.so i am pretty sure the string is not on the stack.Hence the only place where you can find it is .rodata section.
The third one is similar case
PS:As mentioned above this isn't portable but the original question hardly leaves any scope for portability by mentioning .rodata :)
GCC provides the __builtin_constant_p builtin function, which enables you to determine whether an expression is constant or not at compile-time:
Built-in Function: int __builtin_constant_p (exp)
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant-folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the `-O' option.
So I guess you should rewrite your foo function as a macro in such a case:
#define foo(x) \
(__builtin_constant_p(x) ? foo_on_const(x) : foo_on_var(x))
foo("abc") would expand to foo_on_const("abc") and foo(str) would expand to foo_on_var(str).
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I know there is a standard behind all C compiler implementations, so there should be no hidden features. Despite that, I am sure all C developers have hidden/secret tricks they use all the time.
More of a trick of the GCC compiler, but you can give branch indication hints to the compiler (common in the Linux kernel)
#define likely(x) __builtin_expect((x),1)
#define unlikely(x) __builtin_expect((x),0)
see: http://kerneltrap.org/node/4705
What I like about this is that it also adds some expressiveness to some functions.
void foo(int arg)
{
if (unlikely(arg == 0)) {
do_this();
return;
}
do_that();
...
}
int8_t
int16_t
int32_t
uint8_t
uint16_t
uint32_t
These are an optional item in the standard, but it must be a hidden feature, because people are constantly redefining them. One code base I've worked on (and still do, for now) has multiple redefinitions, all with different identifiers. Most of the time it's with preprocessor macros:
#define INT16 short
#define INT32 long
And so on. It makes me want to pull my hair out. Just use the freaking standard integer typedefs!
The comma operator isn't widely used. It can certainly be abused, but it can also be very useful. This use is the most common one:
for (int i=0; i<10; i++, doSomethingElse())
{
/* whatever */
}
But you can use this operator anywhere. Observe:
int j = (printf("Assigning variable j\n"), getValueFromSomewhere());
Each statement is evaluated, but the value of the expression will be that of the last statement evaluated.
initializing structure to zero
struct mystruct a = {0};
this will zero all stucture elements.
Function pointers. You can use a table of function pointers to implement, e.g., fast indirect-threaded code interpreters (FORTH) or byte-code dispatchers, or to simulate OO-like virtual methods.
Then there are hidden gems in the standard library, such as qsort(),bsearch(), strpbrk(), strcspn() [the latter two being useful for implementing a strtok() replacement].
A misfeature of C is that signed arithmetic overflow is undefined behavior (UB). So whenever you see an expression such as x+y, both being signed ints, it might potentially overflow and cause UB.
Multi-character constants:
int x = 'ABCD';
This sets x to 0x41424344 (or 0x44434241, depending on architecture).
EDIT: This technique is not portable, especially if you serialize the int.
However, it can be extremely useful to create self-documenting enums. e.g.
enum state {
stopped = 'STOP',
running = 'RUN!',
waiting = 'WAIT',
};
This makes it much simpler if you're looking at a raw memory dump and need to determine the value of an enum without having to look it up.
I never used bit fields but they sound cool for ultra-low-level stuff.
struct cat {
unsigned int legs:3; // 3 bits for legs (0-4 fit in 3 bits)
unsigned int lives:4; // 4 bits for lives (0-9 fit in 4 bits)
// ...
};
cat make_cat()
{
cat kitty;
kitty.legs = 4;
kitty.lives = 9;
return kitty;
}
This means that sizeof(cat) can be as small as sizeof(char).
Incorporated comments by Aaron and leppie, thanks guys.
C has a standard but not all C compilers are fully compliant (I've not seen any fully compliant C99 compiler yet!).
That said, the tricks I prefer are those that are non-obvious and portable across platforms as they rely on the C semantic. They usually are about macros or bit arithmetic.
For example: swapping two unsigned integer without using a temporary variable:
...
a ^= b ; b ^= a; a ^=b;
...
or "extending C" to represent finite state machines like:
FSM {
STATE(x) {
...
NEXTSTATE(y);
}
STATE(y) {
...
if (x == 0)
NEXTSTATE(y);
else
NEXTSTATE(x);
}
}
that can be achieved with the following macros:
#define FSM
#define STATE(x) s_##x :
#define NEXTSTATE(x) goto s_##x
In general, though, I don't like the tricks that are clever but make the code unnecessarily complicated to read (as the swap example) and I love the ones that make the code clearer and directly conveying the intention (like the FSM example).
Interlacing structures like Duff's Device:
strncpy(to, from, count)
char *to, *from;
int count;
{
int n = (count + 7) / 8;
switch (count % 8) {
case 0: do { *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
}
I'm very fond of designated initializers, added in C99 (and supported in gcc for a long time):
#define FOO 16
#define BAR 3
myStructType_t myStuff[] = {
[FOO] = { foo1, foo2, foo3 },
[BAR] = { bar1, bar2, bar3 },
...
The array initialization is no longer position dependent. If you change the values of FOO or BAR, the array initialization will automatically correspond to their new value.
C99 has some awesome any-order structure initialization.
struct foo{
int x;
int y;
char* name;
};
void main(){
struct foo f = { .y = 23, .name = "awesome", .x = -38 };
}
anonymous structures and arrays is my favourite one. (cf. http://www.run.montefiore.ulg.ac.be/~martin/resources/kung-f00.html)
setsockopt(yourSocket, SOL_SOCKET, SO_REUSEADDR, (int[]){1}, sizeof(int));
or
void myFunction(type* values) {
while(*values) x=*values++;
}
myFunction((type[]){val1,val2,val3,val4,0});
it can even be used to instanciate linked lists...
gcc has a number of extensions to the C language that I enjoy, which can be found here. Some of my favorites are function attributes. One extremely useful example is the format attribute. This can be used if you define a custom function that takes a printf format string. If you enable this function attribute, gcc will do checks on your arguments to ensure that your format string and arguments match up and will generate warnings or errors as appropriate.
int my_printf (void *my_object, const char *my_format, ...)
__attribute__ ((format (printf, 2, 3)));
the (hidden) feature that "shocked" me when I first saw is about printf. this feature allows you to use variables for formatting format specifiers themselves. look for the code, you will see better:
#include <stdio.h>
int main() {
int a = 3;
float b = 6.412355;
printf("%.*f\n",a,b);
return 0;
}
the * character achieves this effect.
Well... I think that one of the strong points of C language is its portability and standardness, so whenever I find some "hidden trick" in the implementation I am currently using, I try not to use it because I try to keep my C code as standard and portable as possible.
Compile-time assertions, as already discussed here.
//--- size of static_assertion array is negative if condition is not met
#define STATIC_ASSERT(condition) \
typedef struct { \
char static_assertion[condition ? 1 : -1]; \
} static_assertion_t
//--- ensure structure fits in
STATIC_ASSERT(sizeof(mystruct_t) <= 4096);
Constant string concatenation
I was quite surprised not seeing it allready in the answers, as all compilers I know of support it, but many programmers seems to ignore it. Sometimes it's really handy and not only when writing macros.
Use case I have in my current code:
I have a #define PATH "/some/path/" in a configuration file (really it is setted by the makefile). Now I want to build the full path including filenames to open ressources. It just goes to:
fd = open(PATH "/file", flags);
Instead of the horrible, but very common:
char buffer[256];
snprintf(buffer, 256, "%s/file", PATH);
fd = open(buffer, flags);
Notice that the common horrible solution is:
three times as long
much less easy to read
much slower
less powerfull at it set to an arbitrary buffer size limit (but you would have to use even longer code to avoid that without constant strings contatenation).
use more stack space
Well, I've never used it, and I'm not sure whether I'd ever recommend it to anyone, but I feel this question would be incomplete without a mention of Simon Tatham's co-routine trick.
When initializing arrays or enums, you can put a comma after the last item in the initializer list. e.g:
int x[] = { 1, 2, 3, };
enum foo { bar, baz, boom, };
This was done so that if you're generating code automatically you don't need to worry about eliminating the last comma.
Struct assignment is cool. Many people don't seem to realize that structs are values too, and can be assigned around, there is no need to use memcpy(), when a simple assignment does the trick.
For example, consider some imaginary 2D graphics library, it might define a type to represent an (integer) screen coordinate:
typedef struct {
int x;
int y;
} Point;
Now, you do things that might look "wrong", like write a function that creates a point initialized from function arguments, and returns it, like so:
Point point_new(int x, int y)
{
Point p;
p.x = x;
p.y = y;
return p;
}
This is safe, as long (of course) as the return value is copied by value using struct assignment:
Point origin;
origin = point_new(0, 0);
In this way you can write quite clean and object-oriented-ish code, all in plain standard C.
Strange vector indexing:
int v[100]; int index = 10;
/* v[index] it's the same thing as index[v] */
C compilers implement one of several standards. However, having a standard does not mean that all aspects of the language are defined. Duff's device, for example, is a favorite 'hidden' feature that has become so popular that modern compilers have special purpose recognition code to ensure that optimization techniques do not clobber the desired effect of this often used pattern.
In general hidden features or language tricks are discouraged as you are running on the razor edge of whichever C standard(s) your compiler uses. Many such tricks do not work from one compiler to another, and often these kinds of features will fail from one version of a compiler suite by a given manufacturer to another version.
Various tricks that have broken C code include:
Relying on how the compiler lays out structs in memory.
Assumptions on endianness of integers/floats.
Assumptions on function ABIs.
Assumptions on the direction that stack frames grow.
Assumptions about order of execution within statements.
Assumptions about order of execution of statements in function arguments.
Assumptions on the bit size or precision of short, int, long, float and double types.
Other problems and issues that arise whenever programmers make assumptions about execution models that are all specified in most C standards as 'compiler dependent' behavior.
When using sscanf you can use %n to find out where you should continue to read:
sscanf ( string, "%d%n", &number, &length );
string += length;
Apparently, you can't add another answer, so I'll include a second one here, you can use "&&" and "||" as conditionals:
#include <stdio.h>
#include <stdlib.h>
int main()
{
1 || puts("Hello\n");
0 || puts("Hi\n");
1 && puts("ROFL\n");
0 && puts("LOL\n");
exit( 0 );
}
This code will output:
Hi
ROFL
using INT(3) to set break point at the code is my all time favorite
My favorite "hidden" feature of C, is the usage of %n in printf to write back to the stack. Normally printf pops the parameter values from the stack based on the format string, but %n can write them back.
Check out section 3.4.2 here. Can lead to a lot of nasty vulnerabilities.
Compile-time assumption-checking using enums:
Stupid example, but can be really useful for libraries with compile-time configurable constants.
#define D 1
#define DD 2
enum CompileTimeCheck
{
MAKE_SURE_DD_IS_TWICE_D = 1/(2*(D) == (DD)),
MAKE_SURE_DD_IS_POW2 = 1/((((DD) - 1) & (DD)) == 0)
};
Gcc (c) has some fun features you can enable, such as nested function declarations, and the a?:b form of the ?: operator, which returns a if a is not false.
I discoverd recently 0 bitfields.
struct {
int a:3;
int b:2;
int :0;
int c:4;
int d:3;
};
which will give a layout of
000aaabb 0ccccddd
instead of without the :0;
0000aaab bccccddd
The 0 width field tells that the following bitfields should be set on the next atomic entity (char)
C99-style variable argument macros, aka
#define ERR(name, fmt, ...) fprintf(stderr, "ERROR " #name ": " fmt "\n", \
__VAR_ARGS__)
which would be used like
ERR(errCantOpen, "File %s cannot be opened", filename);
Here I also use the stringize operator and string constant concatentation, other features I really like.
Variable size automatic variables are also useful in some cases. These were added i nC99 and have been supported in gcc for a long time.
void foo(uint32_t extraPadding) {
uint8_t commBuffer[sizeof(myProtocol_t) + extraPadding];
You end up with a buffer on the stack with room for the fixed-size protocol header plus variable size data. You can get the same effect with alloca(), but this syntax is more compact.
You have to make sure extraPadding is a reasonable value before calling this routine, or you end up blowing the stack. You'd have to sanity check the arguments before calling malloc or any other memory allocation technique, so this isn't really unusual.