Variables defined and assigned at the same time - c

A coding style presentation that I attended lately in office advocated that variables should NOT be assigned (to a default value) when they are defined. Instead, they should be assigned a default value just before their use.
So, something like
int a = 0;
should be frowned upon.
Obviously, an example of 'int' is simplistic but the same follows for other types also like pointers etc.
Further, it was also mentioned that the C99 compatible compilers now throw up a warning in the above mentioned case.
The above approach looks useful to me only for structures i.e. you memset them only before use. This would be efficient if the structure is used (or filled) only in an error leg.
For all other cases, I find defining and assigning to a default value a prudent exercise as I have encountered a lot of bugs because of un-initialized pointers both while writing and maintaining code. Further, I believe C++ via constructors also advocates the same approach i.e. define and assign.
I am wondering why(if) C99 standard does not like defining & assigning. Is their any considerable merit in doing what the coding style presentation advocated?

Usually I'd recommend initialising variables when they are defined if the value they should have is known, and leave variables uninitialised if the value isn't. Either way, put them as close to their use as scoping rules allow.
Instead, they should be assigned a default value just before their use.
Usually you shouldn't use a default value at all. In C99 you can mix code and declarations, so there's no point defining the variable before you assign a value to it. If you know the value it's supposed to take, then there is no point in having a default value.
Further, it was also mentioned that the C99 compatible compilers now throw up a warning in the above mentioned case.
Not for the case you show - you don't get a warning for having int x = 0;. I strongly suspect that someone got this mixed up. Compilers warn if you use a variable without assigning a value to it, and if you have:
... some code ...
int x;
if ( a )
x = 1;
else if ( b )
x = 2;
// oops, forgot the last case else x = 3;
return x * y;
then you will get a warning that x may be used without being initialised, at least with gcc.
You won't get a warning if you assign a value to x before the if, but it is irrelevant whether the assignment is done as an initialiser or as a separate statement.
Unless you have a particular reason to assign the value twice for two of the branches, there's no point assigning the default value to x first, as it stops the compiler warning you that you've covered every branch.

There's no such requirement (or even guideline that I'm aware of) in C99, nor does the compiler warn you about it. It's simply a matter of style.
As far as coding style is concerned, I think you took things too literally. For example, your statement is right in the following case...
int i = 0;
for (; i < n; i++)
do_something(i);
... or even in ...
int i = 1;
[some code follows here]
while (i < a)
do_something(i);
... but there are other cases that, in my mind, are better handled with an early "declare and assign". Consider structures constructed on the stack or various OOP constructs, like in:
struct foo {
int bar;
void *private;
};
int my_callback(struct foo *foo)
{
struct my_struct *my_struct = foo->private;
[do something with my_struct]
return 0;
}
Or like in (C99 struct initializers):
void do_something(int a, int b, int c)
{
struct foo foo = {
.a = a,
.b = b + 1,
.c = c / 2,
};
write_foo(&foo);
}

I sort of concur with the advice, even though I'm not altogether sure the standard says anything about it, and I very much doubt the bit about compiler warnings is true.
The thing is, modern compilers can and do detect the use of uninitialised variables. If you set your variables to default values at initialisation, you lose that detection. And default values can cause bugs too; certainly in the case of your example, int a = 0;. Who says 0 is an appropriate value for a?
In the 1990s, the advice would've been wrong. Nowadays, it's correct.

I find it highly useful to pre-assign some default data to variables so that i don't have to do (as many) null checks in code.

I have seen so many bugs due to uninitialized pointers that I always advocated to declare each variable with NULL_PTR and each primitivewith some invalid/default value.
Since I work on RTOS and high performance but low resource systems, it is possible that the compilers we use do not catch non-initialized usage. Though I doubt modern compilers can also be relied on 100%.
In large projects where Macro's are extensively used, I have seen rare scenarios where even Kloclwork /Purify have failed to find non-initialized usage.
So I say stick with it as long as you are using plain old C/C++.
Modern languages like .Net can guarantee to initialize varaibles, or give a compiler error for uninitialized variable usage. Following link does a performance analysis and validates that there is a 10-20% performance hit for .NET. The analysis is in quite detail and is explained well.
http://www.codeproject.com/KB/dotnet/DontInitializeVariables.aspx

Related

Different ways of suppressing 'uninitialized variable warnings' in C

I have encountered multiple uses of the uninitialized_var() macro designed to get rid of warnings like:
warning: ‘ptr’ is used uninitialized in this function [-Wuninitialized]
For GCC (<linux/compiler-gcc.h>) it is defined such a way:
/*
* A trick to suppress uninitialized variable warning without generating any
* code
*/
#define uninitialized_var(x) x = x
But also I discovered that <linux/compiler-clang.h> has the same macro defined in a different way:
#define uninitialized_var(x) x = *(&(x))
Why we have two different definitions? For what reason the first way may be insufficient? Is the first way insufficient just for Clang or in some other cases too?
c example:
#define uninitialized_var(x) x = x
struct some {
int a;
char b;
};
int main(void) {
struct some *ptr;
struct some *uninitialized_var(ptr2);
if (1)
printf("%d %d\n", ptr->a, ptr2->a); // warning about ptr, not ptr2
}
Compilers are made to recognize certain constructs as indications that the author intended something deliberately, when the compiler would otherwise warn about it. For example, given if (b = a), GCC and Clang both warn that an assignment is being used as a conditional, but they do not warn about if ((b = a)) even though it is equivalent in terms of the C standard. This particular construct with extra parentheses has simply been set as a way to tell the compiler the author truly intends this code.
Similarly, x = x has been set as a way to tell GCC not to warn about x being uninitialized. There are times where a function may appear to have a code path in which an object is used without being initialized, but the author knows the function is intended not to be used with parameters that would ever cause that particular code path to be executed and, for reasons of efficiency, they want to silence the compiler warning rather than add an initialization that is not actually necessary for program correctness.
Clang was presumably designed not to recognize GCC’s idiom for this and needed a different method.
Why we have two different definitions?
Unclear, but I speculate that it's because Clang still produces a warning for x = x when x is uninitialized, but not for x = *(&(x)). Under almost every circumstance* in which one of those expressions has well-defined behavior, the other has the same well-defined behavior. Under other circumstances, such as when the value of x is undefined or indeterminate, both have undefined behavior, or the behavior of x = x is defined and that of x = *(&(x)) undefined, so the latter provides no advantage.
For what reason the first way may be insufficient?
Because the behavior of both is undefined in the use cases for which they seem to be intended. It is not at all surprising, then, that different compilers handle them differently.
Is the first way insufficient just for Clang or in some other cases too?
Both expressions' meaning and behavior is undefined. In one sense, then, one cannot safely conclude that either is sufficient for anything. In the empirical sense of whether using one or the other fools certain compilers into not emitting warnings that they otherwise would, and still should, emit, it is likely that there were, are, and / or will be compilers that handle the undefined behavior associated with both of those expressions differently than GCC and Clang do.
* The exception being when x is declared with register storage class, in which case the second expression has undefined behavior regardless of whether x has a well-defined value.

How does this implementation of randomization work in C?

Since I found this particular documentation on https://www.tutorialspoint.com/c_standard_library/c_function_rand.htm,I have been thinking about this particular line of code srand((unsigned)time(&t));.Whenever I had to generate some stuff,I used srand(time(NULL)) in order not to generate the same stuff everytime I run the program,but when I came across this,I have been wondering :Is there any difference between srand((unsigned)time(&t)) and srand(time(NULL))?Because to me they seem like they do the same thing.Why is a time_t variable used?And why is the adress operator used in srand()?
#include <stdio.h>
#include<stdlib.h>
int main(){
int i,n;
time_t t;
n = 5;
srand((unsigned)time(&t));
for (i = 0; i < n; i++) {
printf("%d\n", rand() % 50);
}
return(0);
}
Yes, it will yield the same result. But the example is badly written.
I would be careful reading Tutorialspoint. It's a site known for bad C code, and many bad habits you see in questions here at SO can be traced to that site. Ok, it's anecdotal evidence, but I did ask a user here why they cast the result of malloc, and they responded that they had learned that on Tutorialspoint. You can actually see (at least) four examples in this short snippet.
They cast the result from the call to time() which is completely unnecessary and just clutters the code.
For some reason they use the variable t, which is completely useless in this example. If you read the documentation for time() you'll see that just passing NULL is perfectly adequate in this example.
Why use the variable n? For this short example it's perfectly ok with a hardcoded value. And when you use variables to avoid hardcoded values, you should declare them const and give them a much more descriptive name than n. (Ok, I realize I was a bit on the edge when writing this. Omitting const isn't that big of a deal, even if it's preferable. And "n" is a common name meaning "number of iterations". And using a variable instead of a hard coded value is in general a good thing. )
Omitted #include<time.h> which would be ok if they also omitted the rest of the includes.
Using int main() instead of int main(void).
For 5, I'd say that in most cases, this does not matter for the main function, but declaring other functions as for example int foo() with empty parenthesis instead of int foo(void) could cause problems, because they mean different things. From the C standard:
The use of function declarators with empty parentheses (not prototype-format parameter type declarators) is an obsolescent feature.
Here is a question related to that: What are the semantics of function pointers with empty parentheses in each C standard?
One could also argue about a few other things, but some people would disagree about these.
Why declare i outside the for loop? Declaring it inside have been legal since C99, which is 20 years old.
Why end the function with return 0? Omitting this is also ok since C99. You only need to have a return in main if you want to return something else than 0. Personally, in general I find "it's good practice" as a complete nonsense statement unless there are some good arguments to why it should be good practice.
These are good to remember if your goal is to maintain very old C code in environments where you don't have compilers that supports C99. But how common is that?
So if I got to rewrite the example at tutorialspoint, i'd write it like this:
#include<stdio.h>
#include<stdlib.h>
#include<time.h>
int main(void){
srand(time(NULL));
for (int i = 0; i < 5; i++) {
printf("%d\n", rand() % 50);
}
}
Another horrible example can be found here: https://www.tutorialspoint.com/c_standard_library/c_function_gets.htm
The function gets is removed from standard C, because it's very dangerous. Yet, the site does not even mention that.
Also, they teach you to cast the result of malloc https://www.tutorialspoint.com/c_standard_library/c_function_malloc.htm which is completely unnecessary. Read why here: Do I cast the result of malloc?
And although they mention that malloc returns NULL on failure, they don't show in the examples how to properly error check it. Same goes for functions like scanf.

Casting function pointers with different return types

This question is about using function pointers, which are not precisely compatible, but which I hope I can use nevertheless as long as my code relies only on the compatible parts. Let's start with some code to get the idea:
typedef void (*funcp)(int* val);
static void myFuncA(int* val) {
*val *= 2;
return;
}
static int myFuncB(int* val) {
*val *= 2;
return *val;
}
int main(void) {
funcp f = NULL;
int v = 2;
f = myFuncA;
f(&v);
// now v is 4
// explicit cast so the compiler will not complain
f = (funcp)myFuncB;
f(&v);
// now v is 8
return 0;
}
While the arguments of myFuncA and myFuncB are identical and fully compatible, the return values are not and are thus just ignored by the calling code. I tried the above code and it works correctly using GCC.
What I learned so far from here and here is that the functions are incompatible by definition of the standard and may cause undefined behavior. My intuition, however, tells me that my code example will still work correctly, since it does not rely in any way on the incompatible parts (the return value). However, in the answers to this question a possible corruption of the stack has been mentioned.
So my question is: Is my example above valid C code so it will always work as intended (without any side effects), or does it depend on the compiler?
EDIT:
I want to do this in order to use a "more powerful" function with a "les powerful" interface. In my example funcp is the interface but I would like to provide additional functionality like myFuncB for optional use.
Agreed it is undefined behaviour, don't do that!
Yes the code functions, i.e. it doesn't fall over, but the value you assign after returning void is undefined.
In a very old version of "C" the return type was unspecified and int and void functions could be 'safely' intermixed. The integer value being returned in the designated accumulator register. I remember writing code using this 'feature'!
For almost anything else you might return the results are likely to be fatal.
Going forward a few years, floating-point return values are often returned using the fp coprocessor (we are still in the 80s) register, so you can't mix int and float return types, because the state of the coprocessor would be confused if the caller does not strip off the value, or strips off a value that was never placed there and causes an fp exception. Worse, if you build with fp emulation, then the fp value may be returned on the stack as described next. Also on 32-bit builds it is possible to pass 64bit objects (on 16 bit builds you can have 32 bit objects) which would be returned either using multiple registers or on the stack. If they are on the stack and allocated the wrong size, then some local stomping will occur,
Now, c supports struct return types and return value copy optimisations. All bets are off if you don't match the types correctly.
Also some function models have the caller allocate stack space for the parameters for the call, but the function itself releases the stack. Disagreement between caller and implementation on on the number or types of parameters and return values would be fatal.
By default C function names are exported and linked undecorated - just the function name defines the symbol, so different modules of your program could have different views about function signatures, which conflict when you link, and potentially generate very interesting runtime errors.
In c++ the function names are highly decorated primarily to allow overloading, but also it helps to avoid signature mismatches. This helps with keeping arguments in step, but actually, ( as noted by #Jens ) the return type is not encoded into the decorated name, primarily because the return type isn't used (wasn't, but I think occasionally can now influence) for overload resolution.
Yes, this is an undefined behaviour and you should never rely on undefined behaviour if want to write portable code.
Function with different return values can have different calling conventions. Your example will probably work for small return types, but when returning large structs (e.g. larger than 32 bits) some compilers will generate code where struct is returned in a temporary memory area which should be cleaned up the the caller.

memcpy vs assignment in C -- should be memmove?

As pointed out in an answer to this question, the compiler (in this case gcc-4.1.2, yes it's old, no I can't change it) can replace struct assignments with memcpy where it thinks it is appropriate.
I'm running some code under valgrind and got a warning about memcpy source/destination overlap. When I look at the code, I see this (paraphrasing):
struct outer
{
struct inner i;
// lots of other stuff
};
struct inner
{
int x;
// lots of other stuff
};
void frob(struct inner* i, struct outer* o)
{
o->i = *i;
}
int main()
{
struct outer o;
// assign a bunch of fields in o->i...
frob(&o.i, o);
return 0;
}
If gcc decides to replace that assignment with memcpy, then it's an invalid call because the source and dest overlap.
Obviously, if I change the assignment statement in frob to call memmove instead, then the problem goes away.
But is this a compiler bug, or is that assignment statement somehow invalid?
I think that you are mixing up the levels. gcc is perfectly correct to replace an assignment operation by a call to any library function of its liking, as long as it can guarantee the correct behavior.
It is not "calling" memcpy or whatsoever in the sense of the standard. It is just using one function it its library for which it might have additional information that guarantees correctness. The properties of memcpy as they are described in the standard are properties seen as interfaces for the programmer, not for the compiler/environment implementor.
Whether or not memcpy in that implementation in question implements a behavior that makes it valid for the assignment operation is another question. It should not be so difficult to check that or even to inspect the code.
As far as I can tell, this is a compiler bug. i is allowed to alias &o.i according to the aliasing rules, since the types match and the compiler cannot prove that the address of o.i could not have been previously taken. And of course calling memcpy with overlapping (or same) pointers invokes UB.
By the way note that, in your example, o->i is nonsense. You meant o.i I think...
I suppose that there is a typo: "&o" instead of "0".
Under this hypothesis, the "overlap" is actually a strict overwrite: memcpy(&o->i,&o->i,sizeof(o->i)). In this particular case memcpy behaves correctly.

Reassemble float from bytes inline

I'm working with HiTech PICC32 on the PIC32MX series of microprocessors, but I think this question is general enough for anyone knowledgable in C. (This is almost equivalent to C90, with sizeof(int) = sizeof(long) = sizeof(float) = 4.)
Let's say I read a 4-byte word of data that represents a float. I can quickly convert it to its actual float value with:
#define FLOAT_FROM_WORD(WORD_VALUE) (*((float*) &(WORD_VALUE)))
But this only works for lvalues. I can't, for example, use this on a function return value like:
FLOAT_FROM_WORD(eeprom_read_word(addr));
Is there a short and sweet way to do this inline, i.e. without a function call or temp variable? To be honest, there's no HUGE reason for me to avoid a function call or extra var, but it's bugging me. There must be a way I'm missing.
Added: I didn't realise that WORD was actually a common typedef. I've changed the name of the macro argument to avoid confusion.
You can run the trick the other way for return values
float fl;
*(int*)&fl = eeprom_read_word(addr);
or
#define WORD_TO_FLOAT(f) (*(int*)&(f))
WORD_TO_FLOAT(fl) = eeprom_read_word(addr);
or as R Samuel Klatchko suggests
#define ASTYPE(type, val) (*(type*)&(val))
ASTYPE(WORD,fl) = eeprom_read_word(addr);
If this were GCC, you could do this:
#define atob(original, newtype) \
(((union { typeof(original) i; newtype j })(original)).k)
Wow. Hideous. But the usage is nice:
int i = 0xdeadbeef;
float f = atob(i, float);
I bet your compiler doesn't support either the typeof operator nor the union casting that GCC does, since neither are standard behavior, but in the off-chance that your compiler can do union casting, that is your answer. Modified not to use typeof:
#define atob(original, origtype newtype) \
(((union { origtype i; newtype j })(original)).k)
int i = 0xdeadbeef;
float f = atob(i, int, float);
Of course, this ignores the issue of what happens when you use two types of different sizes, but is closer to "what you want," i.e. a simple macro filter that returns a value, instead of taking an extra parameter. The extra parameters this version takes are just for generality.
If your compiler doesn't support union casting, which is a neat but non-portable trick, then there is no way to do this the "way you want it," and the other answers have already got it.
you can take the address of a temporary value if you use a const reference:
FLOAT_FROM_WORD(w) (*(float*)&(const WORD &)(w))
but that won't work in c :(
(c doesn't have references right? works in visual c++)
as others have said, be it an inlined function or a temp in a define, the compiler will optimize it out.
Not really an answer, more a suggestion. Your FLOAT_FROM_WORD macro will be more natural to use and more flexible if it doesn't have a ; at the end
#define FLOAT_FROM_WORD(w) (*(float*)&(w))
fl = FLOAT_FROM_WORD(wd);
It may not be possible in your exact situation, but upgrading to a C99 compiler would solve your problem too.
C99 has inline functions which, while acting like normal functions in parameters and return values, get improved efficiency in exactly this case with none of the drawbacks of macros.

Resources