What is the cost of unnamed scope in C? - c

I was playing with C++ earlier, and it got me thinking. Since my C compiler refuses to let me write code such as:
for (int i = 0; i < 30; ++i)
{
...
}
I try writing something like:
#include <stdio.h>
int main(int argc, char **argv)
{
{
int i;
for (i = 0; i < 30; ++i) {
printf("%d.\n", i);
}
}
return 0;
}
The result is that i stays in scope no longer than I need it, and this frees up i to be used for other purposes in the main scope (I wouldn't do this to i, since it's an iterator by convention). So I am allowed to write silly code like:
#include <stdio.h>
int main(int argc, char **argv)
{
int i = 3;
/* my loop scope. */
{
int i;
for (i = 0; i < 30; ++i) {
printf("%d.\n", i);
}
}
printf("i remains intact! %d.\n", i);
return 0;
}
Again, I would not intentionally write real code that abuses i like this. But in many cases, especially when dealing with the temporary variables needed for libc function calls such as sem_getvalue or the Windows API FormatMessage, it is easy to clutter up the scope with many temporary variables and lose track of what's going on. By using unnamed scopes, in theory, I could reduce the complexity of my code by isolating these temporary variables.
The idea seems silly but compelling, and I am curious whether there is any cost or drawback to writing code in this style. Are there inherent issues with writing code this way, or is this a decent practice?

{
int x = 3;
{
int y = 4;
int sum = x+y;
}
}
Has no more cost than:
{
int x = 3;
int y = 4;
int sum = x+y;
}
Because braces do not translate into any machine code themselves, they are just there to tell the compiler about scope.
The fact your variables have the same name also has no effect because variable names are also just for the compiler and do not change machine code.
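To make the point concrete, here is a minimal sketch. The function name outer_after_inner_block is made up for illustration and is not from the question. The inner block declares its own i, and the outer i is untouched once the block closes:

```c
/* Hypothetical helper: sums 0..29 inside an unnamed block and stores the
   total in *loop_total. Returns the outer i to show it was never touched. */
int outer_after_inner_block(int *loop_total)
{
    int i = 3;          /* outer variable */
    int total = 0;

    {                   /* unnamed scope: only tells the compiler about names */
        int i;          /* shadows the outer i until the closing brace */
        for (i = 0; i < 30; ++i) {
            total += i;
        }
    }

    *loop_total = total;
    return i;           /* the outer i again, still 3 */
}
```

Called with a pointer to an int, it stores 435 (the sum of 0 through 29) through the pointer and returns 3.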

frees up i to be used for other purposes in the main scope
This is the flaw in your reasoning. You should not use a variable with the same name for unrelated purposes inside the same function. Apart from making the code unreadable, it opens up for all kinds of subtle bugs. Copy/paste one snippet and put it elsewhere in the function, and suddenly it is working with another variable. That is very bad practice, period.
Similarly, it is most often bad practice to have variables in different scopes but with the same name.
If you have multiple loops in the same function that all use an iterator i with the same type, simply declare it at the beginning of the function and re-use it over and over.
If you need an i with different type at different places in a function, that's a clear sign saying that you should split the function in several.
Overall, whenever you find yourself in need to use an obscure language mechanism, you need to step back and consider if you couldn't just design the program in a simpler way. This is almost always the case. Excellent programmers always strive for simplicity and never for complexity.

I don't think local variable allocation works how you think it does.
The compiler adds up the memory requirements of all the local variables and, on entry to the function, typically reserves that much stack space in a single adjustment, regardless of how many variables there are. So allocating 20 variables takes the same time as allocating 1.

Related

Using struct field as loop counter?

Some background to the issue
if I have a struct like
typedef struct {
idx_type type;
union {
char *str;
int num;
} val;
} cust_idx;
and I have loops like this
for (i = 0; i < some_get(x); i++) {
some_fun(z, NULL, i);
}
that I want to refactor to use the struct like some_fun(z, idx) where idx is one of my cust_idx structs, would it be best to keep i as the loop counter and update idx or change the for header to use idx.val.num instead of i?
For the purposes of this, assume idx_type is an enum for string and number types, and all other fields will have macros, but I'm only going to use the IDX_NUM macro here as I'm not worried about anything to do with idx.type.
To sum up my concerns:
Will it be readable? I don't want to leave behind a mess that someone will read and just shake their head...
Is it advised against?
Which of these is the best solution?
Struct field as loop counter
#define IDX_NUM(x) ((x).val.num)
...
cust_idx j;
j.type = TYPE_num;
for (IDX_NUM(j) = 0; IDX_NUM(j) < some_get(x); IDX_NUM(j)++) {
some_fun(z, j);
}
This does the same as the original, but using the struct field/macro lengthens and complicates the for loop header, in my opinion, though it's still fairly understandable.
Modify struct with original counter
cust_idx j;
j.type = TYPE_num;
for (i = 0; i < some_get(x); i++) {
IDX_NUM(j) = i;
some_fun(z, j);
}
This results in the fewest logical changes from the old code, but will end up with by far the largest amount of code due to the added assignment lines.
Pointer to struct field
cust_idx j;
int *i = &(j.val.num);
j.type = TYPE_num;
for ((*i) = 0; (*i) < some_get(x); (*i)++) {
some_fun(z, j);
}
I'm not sure how good this would be in the long run, or if it's advised against.
As to readability, I would always prefer separate loop counters.
EDIT: The following in italic is not right in this specific case as C structs by default are passed as value copies over the stack, so passing j to some_fun() in the loop is ok. But I'll leave the caveat here, as it applies to many similar situations, where the struct or array is passed by a pointer value. (aka 'passed by reference').
That is especially true in code like you posted, where you call a function with the structure as an argument inside the loop.
If I don't know what some_fun() does, I can only hope that the struct's member that I use as a loop counter is not modified. And hope is not a strategy.
So, unless there are very hard reasons for doing otherwise, I'd always place readability first. Remember: If you write code that is at the limits of your own syntactic and semantic capabilities, you will have very little fun debugging such code, as debugging is an order of magnitude harder than writing (buggy) code. ;)
Addition: You could look at the disassemblies of all variants. The compiler might do a lot of optimizations here, especially if it can 'see' some_fun().
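The pass-by-value point from the EDIT above can be checked with a small sketch. The struct is the one from the question. The enumerator TYPE_str and all four functions are assumptions added for illustration, not part of the original code.

```c
typedef enum { TYPE_str, TYPE_num } idx_type; /* TYPE_str is assumed */

typedef struct {
    idx_type type;
    union {
        char *str;
        int num;
    } val;
} cust_idx;

/* Receives a copy, so the caller's counter cannot change. */
static void clobber_copy(cust_idx idx)
{
    idx.val.num = -1;
    (void)idx;
}

/* Receives a pointer, so the caller's counter can change. */
static void clobber_original(cust_idx *idx)
{
    idx->val.num = -1;
}

int counter_after_value_call(void)
{
    cust_idx j;
    j.type = TYPE_num;
    j.val.num = 5;
    clobber_copy(j);
    return j.val.num;   /* still 5 */
}

int counter_after_pointer_call(void)
{
    cust_idx j;
    j.type = TYPE_num;
    j.val.num = 5;
    clobber_original(&j);
    return j.val.num;   /* now -1 */
}
```

If some_fun() took a cust_idx * instead of a cust_idx, a struct field used as the loop counter would be exposed in the same way as in the second case.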

Is there a downside to keeping static duration variables in arrays for easy processing?

I'm writing a program which uses a number of structures that must be cleaned up/updated/initialized and so on. I'm currently using something like the following code to deal with them:
typedef struct Thing { ... } Thing;
typedef void (*ProcessThing)(Thing * target);
enum { ThingCount = 3 }; /* an integer constant expression, so it can size a file-scope array */
Thing ListOfThings[ThingCount];
Thing * ThingA = &ListOfThings[0];
Thing * ThingB = &ListOfThings[1];
Thing * ThingC = &ListOfThings[2];
void DoStuffToThings(ProcessThing Action){
int i;
for(i = 0; i < ThingCount; i++){
(*Action)(&ListOfThings[i]); /* pass the address of each element */
}
}
ThingA, ThingB, and ThingC (and so on) are known at compile time, though the exact number/nature of them is changing as I develop the program.
The main benefit of this setup is that I can easily add more ThingN if needed by adding another definition line and incrementing ThingCount, and can easily process the list of things in new ways with more ProcessThing's.
This isn't a technique I often see used in example code, even when it seems like it would be useful. I don't have a lot of practical experience with C, and am concerned that there may be some problem with this method that I am not seeing. Is there such a problem, or am I being overly nervous?

Program for finding the minimum value in an array of integers (in C) won't compile

I'm trying to come up with a program that reads in numbers from the command line, turns the argv array into integers, and then finds the smallest integer in the array of those integers.
Below is my code for this program, can anyone help me out?
#include <stdio.h>
#include <stdlib.h>
int *integerizeArgs(int, char **);
int *findMin(int, int *);
int *integerizeArgs(int argc, char **argv)
{
int i = 0;
int *a = malloc(sizeof(int) * (argc-1));
for(i= 1; i < argc; ++i){
a[i-1] = atoi(argv[1]);
return a;
}
return 0;
}
int *findMin(int itemCount, int *a) {
int i, smallest = a[0];
for (i=0; i < itemCount; i++) {
if(a[i] < smallest) {
smallest = a[i];
return smallest;
}
return 0;
}
return 0;
}
int main(int argc, char **argv){
int *a = integerizeArgs(argc, argv);
int b = findMin(argc, a[0]);
printf("%d", b);
return 0;
}
Write proper code.
Check whether malloc() succeeded.
Watch out for typos: argv[1] should be argv[i] here.
Do not use return until you actually want to leave the function.
Use proper types. Distinguish between "normal" integers and pointers.
Watch out for off-by-one errors.
In this case, argc-1 elements are allocated, not argc elements.
Free whatever you allocated.
Corrected code:
#include <stdio.h>
#include <stdlib.h>
int *integerizeArgs(int, char **);
int findMin(int, int *);
int *integerizeArgs(int argc, char **argv)
{
int i = 0;
int *a = malloc(sizeof(int) * (argc-1));
if (a == NULL){ /* add error check */
perror("malloc");
exit(1);
}
for(i= 1; i < argc; ++i){
a[i-1] = atoi(argv[i]); /* convert each argument instead of only the first one */
/* don't return when the process is not done */
}
return a; /* return the result */
}
/* use proper return type */
int findMin(int itemCount, int *a) {
int i, smallest = a[0];
for (i=0; i < itemCount; i++) {
if(a[i] < smallest) {
smallest = a[i];
/* don't return when the process is not done */
}
/* don't return when the process is not done */
}
return smallest; /* return the result */
}
int main(int argc, char **argv){
int *a = integerizeArgs(argc, argv);
/* pass the (pointer to) the array instead of the first element of the array (&a[0] is also OK) */
/* pass correct itemCount (there are argc-1 items because the first argument typically is the command) */
int b = findMin(argc - 1, a);
printf("%d", b);
free(a); /* free whatever you allocated */
return 0;
}
There are multiple problems with this code. Such significant misunderstandings suggest that your learning resources aren't working for you. Have you considered trying other resources?
If you don't yet understand the basics of procedural programming, I recommend learning a different language first. Unfortunately I haven't come across a decent C programming book that teaches both procedural programming in general and C programming. The only books I know of seem to require that you already understand procedural programming. I'll try to help a bit with that with this post.
I can highly recommend K&R 2E, providing you've understood the procedural programming basics first; remember to do the exercises as you encounter them as they're a valuable part of the learning experience.
You seem to be quite confused about the effect upon the flow of execution caused by return, for and if. C is a procedural language, meaning it has a structure similar to a recipe in a cookbook (a procedure), for example.
Okay, that's grossly over-simplified, but if you think of steps like "preheat the oven to 180C (described in page 42)" and "caramelise the onion" as though they are separate procedures then we can establish a use for keywords like return, for and if, so bear with me.
As you turn to page 42 you might notice the procedure is simple but disjoint from the recipe, something like this:
Ensure the oven is empty, and if necessary install an oven thermometer onto the front of one of the racks.
if the oven is a gas oven:
Strike a match.
Kneel in front of the oven, and turn the knob to the appropriate temperature.
Bring the lit match into contact with the gas stream, keeping your fingers well clear of the gas stream at all times.
else turn the knob to the appropriate temperature.
Close the oven.
for the duration beginning when you close the oven and ending when the thermometer reaches the appropriate value, periodically check the thermometer
return to the previous procedure.
Here, the words if and else are clearly meant to ensure that you choose the appropriate path for your oven. The same goes for your computer. You're telling it to choose which one of the paths based on the condition in your code.
The return keyword is used to tell you that the procedure is over, and you should resume the procedure you were using before.
There is a loop embedded into the procedure, too. That could be expressed using for language, i.e. "for the duration beginning when you close the oven and ending when the thermometer reaches the appropriate value, periodically check the thermometer" would loosely translate to something like:
close(oven);
for (actual_temp = check_temp(oven); actual_temp < desired_temp; actual_temp = check_temp(oven)) {
sleep(15 minutes);
}
Recipes are easy to understand because they're written in natural language. Computer programs, however, aren't. Programming languages have idioms that aren't so commonly used in natural language, such as variable scope and memory location, so it's important to use those idioms consistently.
I recommend designing software as though it's meant to fit in with the environment. I can see you've given that some thought by forwarding argc and argv (or its translated equivalent) to each of your functions, but a deeper analysis of the environment is required.
It goes against the grain for a function to perform allocation and expect the caller to perform cleanup for that allocation. If you analyse the example set by the C standard library, the only functions that allocate memory are allocation functions, thread creation and file creation. Everything else lets the caller choose how memory is allocated. Thus, your integerizeArgs function should probably be refactored:
int *integerize_arguments(int *destination, char **source, size_t length) {
for (size_t x = 0; x < length; x++) {
if (sscanf(source[x], "%d", &destination[x]) != 1) {
return NULL;
}
}
return destination;
}
Did you notice how closely this resembles memcpy, for example? Using this pattern the caller can choose what kind of allocation the array should have, which is very nice for maintenance should you decide you don't need malloc.
Also notice how I used size_t for index variables. This is important because it doesn't usually make sense to allow negative array indices.
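As a rough illustration of the caller choosing the storage, the sketch below repeats integerize_arguments and adds a caller that uses an automatic array, so there is nothing to free. The names sum_of_arguments and MAX_VALUES, and the capacity of 16, are assumptions made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Repeated from the answer above so that this sketch compiles on its own. */
int *integerize_arguments(int *destination, char **source, size_t length) {
    for (size_t x = 0; x < length; x++) {
        if (sscanf(source[x], "%d", &destination[x]) != 1) {
            return NULL;
        }
    }
    return destination;
}

#define MAX_VALUES 16 /* arbitrary capacity, chosen for this sketch */

/* Hypothetical caller: sums the arguments using an automatic array.
   Returns 0 on success, -1 if there are too many values or a conversion fails. */
int sum_of_arguments(char **source, size_t length, int *sum)
{
    int values[MAX_VALUES];

    if (length > MAX_VALUES ||
        integerize_arguments(values, source, length) == NULL) {
        return -1;
    }

    *sum = 0;
    for (size_t x = 0; x < length; x++) {
        *sum += values[x];
    }
    return 0;
}
```

From main, this could be called as sum_of_arguments(argv + 1, (size_t)(argc - 1), &sum), which skips the program name.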
As for your findMin function, the best advice I can give there is to think about the procedural language we discussed earlier. I don't think the instructions you're giving your computer are the same instructions you have in your head. If they are, they don't make sense and you might need to run through that procedure a few times by hand to see what's going wrong.
Unless the caller (main, in this case) needs to know the address of the minimum value, findMin doesn't need to return int *; it should return int instead. If, on the other hand, main needs to know where the minimum value is located, then your algorithm needs to be adapted because you're currently not storing that position in your logic.
As covered by MikeCAT, you shouldn't be returning to the previous procedure until you've inspected the entire array. Hence your return smallest; should be below the for loop.
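For the other case mentioned above, where the caller needs to know where the minimum is located, one possible adaptation is to track a pointer instead of a value. This is only a sketch. The name find_min_position is hypothetical, and it assumes the array has at least one element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of findMin that reports where the minimum lives.
   The array must contain at least one element. */
const int *find_min_position(const int *a, size_t item_count)
{
    assert(item_count > 0);

    const int *smallest = &a[0];
    for (size_t i = 1; i < item_count; i++) {
        if (a[i] < *smallest) {
            smallest = &a[i];
        }
    }
    return smallest; /* returned only after the whole array has been inspected */
}
```

Because the comparison is strict, ties resolve to the first occurrence of the minimum.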
This post is getting quite lengthy, kind of turning into a book of its own... so I'll wrap up here, and recommend in summary that you purchase a book about programming.

Variable reuse in C

The code I'm looking at is this:
for (i = 0; i < linesToFree; ++i ){
printf("Parsing line[%d]\n", i);
memset( &line, 0x00, 65 );
strcpy( line, lines[i] );
//get Number of words:
int numWords = 0;
tok = strtok(line , " \t");
while (tok != NULL) {
++numWords;
printf("Number of words is: %d\n", numWords);
println(tok);
tok = strtok(NULL, " \t");
}
}
My question centers around the use of numWords. Does the runtime system reuse this variable or does it allocate a new int every time it runs through the for loop? If you're wondering why I'm asking this, I'm a Java programmer by trade who wants to get into HPC and am therefore trying to learn C. Typically I know you want to avoid code like this, so this question is really exploratory.
I'm aware the answer is probably reliant upon the compiler... I'm looking for a deeper explanation than that. Assume the compiler of your choice.
Your conception about how this works in Java might be misinformed - Java doesn't "allocate" a new int every time through a loop like that either. Primitive type variables like int aren't allocated on the Java heap, and the compiler will reuse the same local storage for each loop iteration.
On the other hand, if you call new anything in Java every time through a loop, then yes, a new object will be allocated every time. However, you're not doing that in this case. C also won't allocate anything from the heap unless you call malloc or similar (or in C++, new).
Please note the difference between automatic and dynamic memory allocation. In Java, objects are always allocated dynamically; in C, you can choose either kind for any type.
This is automatic allocation:
int numWords = 0;
This is dynamic allocation:
int *pNumWords = malloc(sizeof(int));
*pNumWords = 0;
The dynamic allocation in C only happens explicitly (when you call malloc or its derivatives).
In your code, only a value is assigned to the variable; no new variable is allocated.
From a performance standpoint, it's not going to matter. (Variables map to registers or memory locations, so it has to be reused.)
From a logical standpoint, yes, it will be reused within the while loop because you declared it outside that loop.
From a logical standpoint:
numWords will not be reused in the outer loop because it is declared inside it.
numWords will be reused in the inner loop because it isn't declared inside.
This is what is called "block", "automatic" or "local" scope in C. It is a form of lexical scoping, i.e., a name refers to its local environment. In C, scoping is top-down: names are resolved as the file is parsed and compiled, and a name is visible only after its declaration.
When the variable goes out of scope, the lexical name is no longer valid (visible) and the memory may be reused.
The variable is declared in a local scope or a block defined by curly braces { /* block */ }. This defines a whole group of C and C99 idioms, such as:
for(int i=0; i<10; ++i){ // C99 only. int i is local to the loop
// do something with i
} // i goes out of scope here...
There are subtleties, such as:
int x = 5;
int y = x + 10; // this works
whereas, in the other order:
int x = y + 10; // compiler error: y has not been declared yet
int y = 5;
and:
int g; // static by default and init to 0
extern int x; // defined and allocated elsewhere - resolved by the linker
int main (int argc, const char * argv[])
{
int j=0; // automatic by default
while (++j<=2) {
int i=1,j=22,k=3; // j from outer scope is lexically redefined
for (int i=0; i<10; i++){
int j=i+10,k=0;
k++; // k will always be 1 when printed below
printf("INNER: i=%i, j=%i, k=%i\n",i,j,k);
}
printf("MIDDLE: i=%i, j=%i, k=%i\n",i,j,k); // prints middle j
}
// printf("i=%i, j=%i, k=%i\n",i,j,k); compiler error
return 0;
}
There are idiosyncrasies:
In K&R C, ANSI C89, and older versions of Visual Studio, all variables must be declared at the beginning of the function or compound statement (i.e., before the first statement).
In gcc, variables may be declared anywhere in the function or compound statement and are only visible from that point on.
In C99 and C++, loop variables may be declared in the for statement and are visible until the end of the loop body.
In a loop block, the allocation is performed ONCE and the RH assignment (if any) is performed each time.
In the particular example you posted, you enquired about int numWords = 0; and if a new int is allocated each time through the loop. No, there is only one int allocated in a loop block, but the right hand side of the = is executed every time. This can be demonstrated so:
#include <stdio.h>
#include <time.h>
#include <unistd.h>
time_t ti(void){
return time(NULL);
}
void t1(void){
time_t t1=ti(); // initialized once, before the loop
for(int i=0; i<=10; i++){
time_t t2=ti(); // allocated once, assigned on every iteration
sleep(1);
printf("t1=%ld:%p t2=%ld:%p\n",(long)t1,(void *)&t1,(long)t2,(void *)&t2);
}
}
int main(void){
t1();
return 0;
}
Compile that with any gcc (clang, eclipse, etc) compatible compiler with optimizations off (-O0) or on. The address of t2 will always be the same.
Now compare with a recursive function:
int factorial(int n) {
if(n <= 1)
return 1;
printf("n=%i:%p\n",n,(void *)&n);
return n * factorial(n - 1);
}
The address of n will be different each time because a new automatic n is allocated with each recursive call.
Compare with an iterative version of factorial forced to use a loop-block allocation:
int fac2(int num) {
int r=1; // needed because 'result' goes out of scope at the end of each iteration
for (int i=1; i<=num; i++) {
int result=r*i; // the initializer is executed on every iteration
r=result;
printf("result=%i:%p\n",result,(void *)&result); // address is typically the same every time
}
return r;
}
In conclusion, you asked about int numWords = 0; inside the for loop. The variable is reused in this example.
The way the code is written, the programmer relies on the initializer in int numWords = 0; being executed on every iteration, not just the first, so that the variable is reset to 0 for the while loop that follows.
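A minimal sketch of that reset, loosely modelled on the loop in the question. The helper name count_words_per_line and the counts array are made up for illustration. If the initializer ran only once, the second count would include the words of the first line.

```c
#include <string.h>

/* Hypothetical helper: stores the word count of each line in counts[].
   Note that strtok modifies the lines it tokenizes. */
void count_words_per_line(char lines[][65], int line_count, int counts[])
{
    for (int i = 0; i < line_count; ++i) {
        int numWords = 0;   /* initializer runs again on every iteration */
        for (char *tok = strtok(lines[i], " \t");
             tok != NULL;
             tok = strtok(NULL, " \t")) {
            ++numWords;
        }
        counts[i] = numWords;
    }
}
```

For the two lines "one two three" and "four five", counts ends up holding 3 and 2.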
The scope of the numWords variable is inside the for loop. Just as in Java, you can only use the variable inside the loop, so theoretically its memory would have to be freed on exit, since it is also on the stack in your case.
Any good compiler however would use the same memory and simply re-set the variable to 0 on each iteration.
If you were using a class instead of an int, you would see the destructor being called on every iteration of the for loop.
Even consider this:
class A {}; // new needs a complete type, not just a forward declaration
A* pA = new A;
delete pA;
pA = new A;
The two objects created here will probably reside at the same memory.
It will be allocated every time through the loop (the compiler can optimize out that allocation)
for (i = 0; i < 100; i++) {
int n = 0;
printf("%d : %p\n", i, (void*)&n);
}
No guarantees all 100 lines will have the same address (though probably they will).
Edit: The C99 Standard, in 6.2.4/5 says: "[the object] lifetime extends from entry into the block with which it is associated until execution of that block ends in any way." and, in 6.8.5/5, it says that the body of a for statement is in fact a block ... so the paragraph 6.2.4/5 applies.

More efficient to use temporary variable or direct from array?

Is it more efficient to access an array each time I use a variable, or to create a temporary variable and set it to the array:
For example:
int A; int B; ...etc... int Z;
int *ints = [1000 ints in here];
for (int i = 0; i < 1000; i++) {
A = ints[i];
B = ints[i];
C = ints[i];
...etc...
Z = ints[i];
}
or
int A; int B; ...etc... int Z;
int *ints = [1000 ints in here];
for (int i = 0; i < 1000; i++) {
int temp = ints[i];
A = temp;
B = temp;
C = temp;
...etc...
Z = temp;
}
Yes, this is not something I want to do, but it is the easiest example I could think of.
So which for loop would be quicker at using the array?
It doesn't matter; the compiler will most likely produce the same code in both cases (unless you have disabled all optimizations). (The generated assembly code will likely resemble the second example - first, the array element will be loaded into a register, and then, the register will be used whenever the array element is needed.) Go with the style you find to be most readable and least prone to errors (which is probably the latter style, which avoids repeating the index).
(This assumes that you don't have any threads or volatile variables, so that the array element is guaranteed not to change in the course of a loop iteration.)
The compiler is smart enough to realize that these are equivalent and will produce the same code. You should therefore write it in the most understandable way for future people reading your code.
As Aasmund's answer states, there is likely no performance difference since the compiler will treat both in the same way. However, you might find assigning to a temporary variable gives improved code readability, and if in the future you want to use ints[i+1] throughout the loop you will only need to change one line rather than many. Never call a variable "temp" though, give it a useful name like currentInt.
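As a small illustration of the two styles, here is a sketch with a well-named local. The function names and the sum-of-squares task are invented for this example. An optimizing compiler would be expected to generate much the same code for both.

```c
#include <stddef.h>

/* Style 1: index the array every time the element is needed. */
long sum_squares_indexed(const int *ints, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += (long)ints[i] * ints[i];
    }
    return total;
}

/* Style 2: read the element once into a descriptively named local. */
long sum_squares_named(const int *ints, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        const int currentInt = ints[i];
        total += (long)currentInt * currentInt;
    }
    return total;
}
```

Switching the second version to ints[i+1] would mean changing a single line.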

Resources