#include <stdio.h>
int main()
{
    for(int i=0;i<100;i++)
    {
        int count=0;
        printf("%d ",++count);
    }
    return 0;
}
The output of the above program is: 1 1 1 1 1 1 ... 1
Please take a look at the code above. I declared the variable "int count=0" inside the for loop.
To my knowledge, the scope of the variable is the block, so the count variable will be alive only while the for loop executes.
"int count=0" executes 100 times, so either it has to create the variable 100 times or it has to give an error (re-declaration of the count variable). I don't get an error, so what may be the reason?
According to the output, the variable is initialized to zero every time.
Please help me find the reason.
Such simple code can be visualised on http://www.pythontutor.com/c.html for easy understanding.
To answer your question, count gets destroyed when it goes out of scope, that is, at the closing } of the loop body. On the next iteration, a variable of the same name is created and initialised to 0, and that is the one used by the printf.
And if counting is your goal, print i instead of count.
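A minimal sketch of both behaviours, using hypothetical function names and returning the last printed value so it can be checked: the first function mirrors the question, the second declares count outside the loop so one object persists across iterations, and the third prints i as suggested.

```c
#include <stdio.h>

/* Mirrors the question: count lives in the loop body, so each
   iteration gets a new count initialized to 0.
   Returns the value printed on the last iteration (0 if n <= 0). */
int last_value_inner(int n)
{
    int last = 0;
    for (int i = 0; i < n; i++) {
        int count = 0;          /* new lifetime on every iteration */
        last = ++count;         /* always 1 */
        printf("%d ", last);
    }
    printf("\n");
    return last;
}

/* Fix 1: declare count outside the loop so it survives between iterations. */
int last_value_outer(int n)
{
    int count = 0;              /* one object for the whole loop */
    for (int i = 0; i < n; i++)
        printf("%d ", ++count); /* prints 1 2 3 ... n */
    printf("\n");
    return count;
}

/* Fix 2: print the loop counter itself, offset by 1 to start at 1. */
void print_with_i(int n)
{
    for (int i = 0; i < n; i++)
        printf("%d ", i + 1);
    printf("\n");
}
```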
The C standard describes the C language using an abstract model of a computer. In this model, count is created each time the body of the loop is executed, and it is destroyed when execution of the body ends. By “created” and “destroyed,” we mean that memory is reserved for it and is released, and that the initialization is performed with the reservation.
The C standard does not require compilers to implement this model slavishly. Most compilers will allocate a fixed amount of stack space when the routine starts, with space for count included in this fixed amount, and then count will use that same space in each iteration. Then, if we look at the assembly code generated, we will not see any reservation or release of memory; the stack will be grown and shrunk only once for the whole routine, not grown and shrunk in each loop iteration.
Thus, the answer is twofold:
In C’s abstract model of computing, a new lifetime of count begins and ends in each loop iteration.
In most actual implementations, memory is reserved just once for count, although implementations may also allocate and release memory in each iteration.
However, even if you know your C implementation allocates stack space just once per routine when it can, you should generally think about programs in the C model in this regard. Consider this:
for (int i = 0; i < 100; ++i)
{
int count = 0;
// Do some things with count.
float x = 0;
// Do some things with x.
}
In this code, the compiler might allocate four bytes of stack space to use for both count and x, to be used for one of them at a time. The routine would grow the stack once, when it starts, including four bytes to use for count and x. In each iteration of the loop, it would use the memory first for count and then for x. This lets us see that the memory is first reserved for count, then released, then reserved for x, then released, and then that repeats in each iteration. The reservations and releases occur conceptually even though there are no instructions to grow and shrink the stack.
Another illuminating example is:
for (int i = 0; i < 100; ++i)
{
extern int baz(void);
int a[baz()], b[baz()];
extern void bar(void *, void *);
bar(a, b);
}
In this case, the compiler cannot reserve memory for a and b when the routine starts because it does not know how much memory it will need. In each iteration, it must call baz to find how much memory is needed for a and how much for b, and then it must allocate stack space (or other memory) for them. Further, since the sizes may vary from iteration to iteration, it is not possible for both a and b to start in the same place in each iteration—one of them must move to make way for the other. So this code lets us see that a new a and a new b must be created in each iteration.
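The same point can be checked with a smaller self-contained sketch. It assumes a compiler with VLA support (for example GCC or Clang; MSVC does not support VLAs), and the function name is hypothetical. Each iteration creates an array whose length is only known at run time, and sizeof applied to it is evaluated at run time.

```c
#include <stddef.h>

#if defined(__STDC_NO_VLA__)
#error "This sketch needs variable-length arrays"
#endif

/* On iteration k a fresh array of k ints is created; its size differs
   from the previous iteration's, so it cannot simply reuse a fixed slot.
   Returns the total number of elements created over all iterations. */
size_t total_vla_elements(int n)
{
    size_t total = 0;
    for (int k = 1; k <= n; k++) {
        int a[k];                          /* new array, new size, each iteration */
        for (int j = 0; j < k; j++)
            a[j] = j;                      /* give every element a value */
        total += sizeof a / sizeof a[0];   /* run-time sizeof: equals k */
    }
    return total;
}
```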
int count=0 is executing 100 times, then it has to create the variable 100 times
No, it defines the variable count once, then assigns it the value 0 100 times.
Defining a variable in C does not involve any particular step or code to "create" it (unlike for example in C++, where simply defining a variable may default-construct it). Variable definitions just associate the name with an "entity" that represents the variable internally, and definitions are tied to the scope where they appear.
Assigning a variable is a statement which gets executed during the normal program flow. It usually has "observable effects"; if it does not, the compiler is allowed to optimize it out entirely.
OP's example can be rewritten in a completely equivalent form as follows.
for(int i=0;i<100;i++)
{
int count; // definition of variable count - defined once in this {} scope
count=0; // assignment of value 0 to count - executed once per iteration, 100 times total
printf("%d ",++count);
}
Eric has it correct. In much shorter form:
Typically, compilers determine at compile time how much stack memory a function needs and the stack offsets of its variables. The actual allocation occurs on each function call, and the release on function return.
Further, when you have variables nested within {curly braces}, once execution leaves that pair of braces the compiler is free to reuse their memory for other variables in the function. There are two reasons I intentionally do this:
The variables are large but only needed for a short time, so why make the stack larger than needed? This matters especially if you need several large temporary structures or arrays at different times. The smaller the scope, the less chance of bugs.
If a variable only has a sane value for a limited amount of time, and would be dangerous or buggy to use outside that scope, add extra curly braces to limit its scope so that improper use generates an immediate compiler error. Using unique names for each variable, even if the compiler doesn't insist on it, can keep the debugger, and your mind, less confused.
Example:
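#include <stdio.h>

/* Hypothetical helpers, declared so the example is complete. */
int some_calculation(int, int);
void do_something(int, int *);
void do_something_else(int, float *);

void your_function(int a)
{
    { // limit scope of stack_1
        int stack_1 = 0;
        for ( int ii = 0; ii < a; ++ii ) { // really limit scope of ii
            stack_1 += some_calculation(ii, a);
        }
        printf("ii=%d\n", ii); // deliberate compile error: ii is out of scope here
        printf("stack_1=%d\n", stack_1); // good
    } // done with stack_1
    {
        int limited_scope_1[10000];
        do_something(a,limited_scope_1);
    }
    {
        float limited_scope_2[10000];
        do_something_else(a,limited_scope_2);
    }
}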
A compiler given code like:
void doSomething(int, int const *);
...
for (int i=0; i<100; i++)
{
int const j=(i & 1);
doSomething(i, &j);
}
could legitimately replace it with:
void doSomething(int, int const *);
...
int const __compiler_generated_0 = 0;
int const __compiler_generated_1 = 1;
for (int i=0; i<100; i+=2)
{
doSomething(i, &__compiler_generated_0);
doSomething(i+1, &__compiler_generated_1);
}
A compiler would typically allocate stack space for j once, when the function is entered, and then not reuse that storage during the loop (or even the rest of the function), so j would have the same address on every iteration of the loop. However, there is no requirement that the address remain constant. While there typically wouldn't be an advantage to having the address vary between iterations, compilers are allowed to exploit such situations should they arise.
I was reading a book which says to use local variables to eliminate unnecessary memory references. For example, the code below is not very efficient:
int gsum; //global sum variable
void foo(int num) {
for (int i = 0; i < num; i++) {
gsum += i;
}
}
It is more efficient to have the code below:
void foo(int num) {
int fsum = gsum; // start from the current global value (an uninitialized fsum would be a bug)
for (int i = 0; i < num; i++) {
fsum += i;
}
gsum = fsum;
}
I know the second case uses a local variable, which is stored in a register. That's why it is a little faster, while in the first case gsum has to be retrieved from main memory many times.
But I still have questions:
Q1- Isn't the gcc compiler smart enough to detect this and implicitly keep the global variable in a register, so that subsequent references use the register exactly as in the second case?
Q2- If, for some reason, the compiler is not able to optimize, then we still have the cache. Referencing a global variable from the cache is still very fast, but I see that some programs that use local variables are 10 times faster than ones that reference global variables. Why is this?
Q1: That register is probably going to be needed by other functions which will be called between subsequent calls to foo. That means gsum will need to be shuttled in and out of the register whenever this function is called.
Q2: It's possible that the page containing gsum will stay in cache for a while. However, depending on what else your computer is doing, that page may get written to swap space in order to make room in memory for other pages.
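A minimal sketch of the book's two versions, assuming the local accumulator is seeded from gsum so that both functions leave gsum with the same value. The names foo_global and foo_local are hypothetical. Whether the first version really reloads and stores gsum on every iteration depends on the compiler and optimization level.

```c
int gsum; /* global sum variable */

/* As written, each iteration is a read-modify-write of the global;
   an optimizer may or may not keep it in a register. */
void foo_global(int num)
{
    for (int i = 0; i < num; i++)
        gsum += i;
}

/* Accumulate in a local, which can live in a register,
   and store to the global once at the end. */
void foo_local(int num)
{
    int fsum = gsum;             /* start from the current global value */
    for (int i = 0; i < num; i++)
        fsum += i;
    gsum = fsum;                 /* single store */
}
```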
Currently I'm learning about parallel programming. I have the following loop that needs to be parallelized.
for(i=0; i<n/2; i++)
    a[i] = a[i+1] + a[2*i];
If I run this sequentially there is no problem, but if I run it in parallel, a data recurrence occurs. To avoid this, I want to store the data to be read in a separate variable, e.g. b.
So then the code would be:
b = a;
#pragma omp parallel for private(i)
for(i=0; i<n/2; i++)
a[i] = b[i+1] + b[2*i];
But here comes the part where I begin to doubt. The variable b will probably point to the same memory location as a, so the second code block will do exactly the same as the first one, including the recurrence I'm trying to avoid.
I tried something with *restrict {variable}. Unfortunately, I can't find the right documentation.
My question:
Do I avoid data recurrence by writing the code as follows?
int *restrict b;
int *restrict a;
b = a;
#pragma omp parallel for private(i)
for(i=0; i<n/2; i++)
a[i] = b[i+1] + b[2*i];
If not, what is a correct way to achieve this goal?
Thanks,
Ter
In your proposed code:
int *restrict b;
int *restrict a;
b = a;
the assignment of a to b violates the restrict requirement. That requires that a and b do not point to the same memory, yet they clearly do point to the same memory.
It is not safe.
You'd have to make a separately allocated copy of the array to be safe. You could do that with:
int *b = malloc(n * sizeof(*b));
…error check…;
memmove(b, a, n *sizeof(*b));
…revised loop using a and b…
free(b);
I always use memmove() because it is always correct, even for overlapping copies. In this case, it would be legitimate to use memcpy() because the space allocated for b is separate from the space for a. The system would be broken if the newly allocated space for b overlapped with a at all, assuming the pointer to a is valid. If there were an overlap, the trouble would be that a had been allocated and freed, so a is a dangling pointer to released memory (and should not be used at all), and b was coincidentally allocated where the old a used to be. On the whole, it's not a problem worth worrying about. (Using memmove() doesn't help if a is a dangling pointer, but it is always safe if given valid pointers, even if the areas of memory overlap.)
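Putting the pieces of this answer together, a sketch of the copy-then-update approach might look like the following. The function names are hypothetical. Compile with an OpenMP flag such as -fopenmp (GCC or Clang); without it the pragma is ignored and the loop just runs sequentially. Every iteration reads only from the snapshot b and writes only a[i], so the iterations are independent. The result also matches the original sequential loop, because that loop only reads elements it has not yet overwritten (a[i+1] and a[2*i] have indices >= i).

```c
#include <stdlib.h>
#include <string.h>

/* Performs a[i] = a[i+1] + a[2*i] for 0 <= i < n/2, reading from a
   private snapshot b so that no iteration depends on another.
   Returns 0 on success, -1 if the copy could not be allocated. */
int update_from_copy(int *a, int n)
{
    if (n <= 0)
        return 0;
    int *b = malloc((size_t)n * sizeof *b);
    if (b == NULL)
        return -1;
    memcpy(b, a, (size_t)n * sizeof *b);   /* b never overlaps a; memmove would also work */

    /* The loop variable declared in the for statement is private per thread. */
    #pragma omp parallel for
    for (int i = 0; i < n / 2; i++)
        a[i] = b[i + 1] + b[2 * i];        /* reads b only, writes a only */

    free(b);
    return 0;
}

/* Reference: the original sequential loop, for comparison. */
void update_sequential(int *a, int n)
{
    for (int i = 0; i < n / 2; i++)
        a[i] = a[i + 1] + a[2 * i];
}
```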
I have a function that needs external parameters and afterwards creates variables that are heavily used inside that function. E.g. the code could look like this:
void abc(const int dim);
void abc(const int dim) {
double arr[dim] = { 0.0 };
for (int i = 0; i != dim; ++i)
arr[i] = i;
// heavy usage of the arr
}
int main() {
const int par = 5;
abc(par);
return 0;
}
But I am getting a compiler error, because allocation on the stack needs compile-time constants. When I tried allocating manually on the stack with _malloca, the performance of the code got worse (compared to the case where I declare the constant par inside the abc() function). And I don't want the array arr to be on the heap, because it is supposed to contain only a small number of values and it is used very often inside the function. Is there some way to keep that efficiency while still passing the array size as a parameter to the function?
EDIT: I am using the MSVC compiler, and I received error C2131: expression did not evaluate to a constant in VC 2017.
If you're using a modern C compiler that implements all of C99, or C11 with the optional variable-length array feature, this would work, with one small modification:
void abc(const int dim);
void abc(const int dim) {
double arr[dim];
for (int i = 0; i != dim; ++i)
arr[i] = i;
// heavy usage of the arr
}
int main(void) {
const int par = 5;
abc(par);
return 0;
}
That is, double arr[dim] would work: it doesn't have a compile-time constant size, but it is enough to know its size at run time. However, such a VLA cannot be given an initializer list like = { 0.0 }.
Unfortunately, MSVC is not a modern C compiler; Microsoft doesn't want to implement VLAs, and I even suspect they are a big part of why VLAs were made optional in C11. So you'd need to define the array in main and pass a pointer to it to the function abc, or, if the size is globally constant, use an actual compile-time constant, i.e. a #define.
However, you're not showing the actual code that you're having performance problems with. It might very well be that the compiler can produce optimized output if it knows the number of iterations - if that is true, then the "globally defined size" might be the only way to get excellent performance.
Unfortunately the Microsoft Compiler does not support variable length arrays.
If the array is not too large, you could allocate an array of the largest size needed on the stack and pass a pointer to it, along with the dimension, to the function. This approach could help limit the number of allocations.
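A minimal sketch of that approach, which also works with MSVC because it uses no VLA: the caller owns a fixed-size stack array sized for the worst case, and abc receives a pointer plus the dimension. MAX_DIM, run_abc, and the changed abc signature are hypothetical, and the returned sum only stands in for the "heavy usage" of arr.

```c
/* Hypothetical compile-time upper bound on dim; choose one that fits
   your data and your stack size. */
#define MAX_DIM 64

/* Works on a caller-provided buffer of length dim: no VLA, no heap. */
double abc(double *arr, int dim)
{
    for (int i = 0; i != dim; ++i)
        arr[i] = i;
    double sum = 0.0;
    for (int i = 0; i != dim; ++i)   /* placeholder for the real work */
        sum += arr[i];
    return sum;
}

/* Caller: one fixed-size stack array, which, unlike a VLA, may be initialized. */
double run_abc(int par)
{
    double arr[MAX_DIM] = { 0.0 };
    if (par < 0 || par > MAX_DIM)
        return -1.0;                 /* size out of range for this sketch */
    return abc(arr, par);
}
```

Because MAX_DIM is a true constant, the compiler still knows the buffer size; whether it can also exploit a known trip count inside abc depends on inlining.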
Another option is to implement a simple heap-allocated global pool for functions of this type to use. The pool would allocate one large contiguous chunk on the heap, and you then get a pointer to your reservation in the pool. The benefit of this approach is that you do not have to worry about over-allocating on the stack and causing a segmentation fault (which can happen with variable-length arrays).
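A minimal sketch of such a pool, assuming a bump allocator with stack-like (mark/release) freeing. The Pool type, the function names, and the fixed POOL_ALIGN of 16 bytes are all hypothetical choices, and this version is not thread-safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed alignment for every reservation; 16 bytes covers double and
   the other common scalar types on mainstream platforms. */
#define POOL_ALIGN 16

typedef struct {
    unsigned char *base;   /* one contiguous heap block */
    size_t capacity;
    size_t used;
} Pool;

int pool_init(Pool *p, size_t capacity)
{
    p->base = malloc(capacity);
    p->capacity = (p->base != NULL) ? capacity : 0;
    p->used = 0;
    return (p->base != NULL) ? 0 : -1;
}

/* Returns NULL when the pool is exhausted instead of overflowing. */
void *pool_alloc(Pool *p, size_t size)
{
    if (p->base == NULL)
        return NULL;
    size_t start = (p->used + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
    if (start > p->capacity || size > p->capacity - start)
        return NULL;
    p->used = start + size;
    return p->base + start;
}

/* Save the fill level on entry and restore it before returning. */
size_t pool_mark(const Pool *p) { return p->used; }
void pool_release(Pool *p, size_t mark) { p->used = mark; }

void pool_destroy(Pool *p)
{
    free(p->base);
    p->base = NULL;
    p->capacity = p->used = 0;
}
```

A function would call pool_mark on entry, pool_alloc for its scratch array, and pool_release with the saved mark before returning, so reservations are freed in reverse order, much like stack frames.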
How can I make a parameter vector be treated as a local variable in each instance (thread) in CUDA?
__global__ void kern(char *text, int N){
// I want a change such as text[0]='0'; to affect only the current instance of the kernel, not the other threads
}
Thanks!
Every thread will receive the same input parameters, so in this case char *text is the same in every thread - that's a fundamental part of the programming model. Since the pointer points to global memory, if one thread changes data through the pointer (i.e. modifies the global memory) then the change affects all threads (ignoring hazards).
This is exactly the same as standard C, except now you have multiple threads accessing through the pointer. In other words, if you modify text[0] inside a standard C function then the changes are visible outside the function.
If I understand correctly, you're asking for every thread to have a local copy of the contents of text. Well, the solution is exactly the same as for standard C if you don't want changes visible outside the function:
__global__ void kern(char* text, int N) {
#ifdef NMAX
    // If you have a compile-time upper bound NMAX for N (assumes N <= NMAX)...
    char localtext[NMAX];
#else
    // If you don't know the range of N, allocate from the device heap
    char *localtext = (char *)malloc(N * sizeof(char));
    if (localtext == NULL)
        return; // device heap exhausted
#endif
    // Copy from text to localtext
    // Since each thread has the same value for i this will
    // broadcast from the L1 cache
    for (int i = 0 ; i < N ; i++)
        localtext[i] = text[i];
    //...
#ifndef NMAX
    free(localtext);
#endif
}
Note that I'm assuming you have sm_20 or later. Also note that while using malloc in device code is possible, you will pay a performance price.