for(i = 0; i < 5; i++){
int j;
printf("%X\n", &j);
}
How many temporary variables j are created in this loop?
Is j created 5 times, or only once?
Though the address is the same..
5. Although the compiler could certainly optimize that to 1.
Suppose you modified your code slightly:
for (i = 0; i < 5; i++)
{
int j = rand();
printf("%5d (%p)\n", j, (void *)&j);
}
You would see a different value for j in each iteration, making it clear that j is given a value each time through the loop, even if the address of j was the same in each loop. This would demonstrate more clearly that j is initialized on each iteration (and is logically created and destroyed on each iteration).
Well, there are two things to mention here,
Locality of a Variable:
The locality of the variable j is the next '}'. When the program approaches the bracket, the variable j dies.
Temporal Locality:
There is a mechanism in computers that they tend to keep the extensively used variables and prevent them from dying because they might be needed again. So it is very much possible that the computer will in real not generate the variable j 5 times.
Conclusion:
As far as the answer to your question is concerned so you can say 5 as the person who wanted asked you this question wanted you to get an idea about locality of a variable.
Related
edit - -
This code will be run with optimizations off
full transparency this is a homework assignment.
I’m having some trouble figuring out how to optimize this code...
My instructor went over unrolling and splitting but neither seems to greatly reduce the time needed to execute the code. Any help would be appreciated!
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
int j;
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
Assuming you mean same number of additions to sum at runtime (rather than same number of additions in the source code), unrolling could give you something like:
for (j = 0; j + 5 < ARRAY_SIZE; j += 5) {
sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4];
}
for (; j < ARRAY_SIZE; j++) {
sum += array[j];
}
Alternatively, since you're adding the same values each time through the outer loop, you don't need to process it N_TIMES times, just do this:
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
int j;
for (j = 0; j < ARRAY_SIZE; j++) {
sum += array[j];
}
sum *= N_TIMES;
break;
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
This requires that the initial value of sum is zero, which is likely but there's actually nothing in your question that mandates this, so I include it as a pre-condition for this method.
Except by cheating*, this inner loop is essentially non-optimizable. Because you must fetch all the array elements and perform all the additions anyway.
The body of the loop performs:
a conditional branch on j;
a fetch of array[j];
the accumulation to a scalar variable;
the incrementation of j.
As said, 2. to 4. are inescapable.Then all you can do is reducing the number of conditional branches by loop unrolling (this turns the conditional branch in an unconditional one, at the expense of the number of iterations becoming fixed).
It is no surprise that you don't see a big difference. Modern processors are "loop aware", meaning that branch prediction is well tuned to such loops so that the cost of the branches is pretty low.
Cheating:
As others said, you can completely bypass the outer loop. This is just exploiting a flaw in the exercise statement.
As optimizations must be turned off, using inline assembly, pragmas, vector instructions or intrinsics should be banned as well (not mentioning automatic parallelization).
There is a possibility to pack two ints in a long long. If the sum doesn't overflow, you will perform two additions at a time. But is this legal ?
One might think of an access pattern that favors cache utilization. But here there is no hope as the array is fully traversed on every loop and there is no possibility of reuse of the values fetched.
First of all, unless you are explicitly compiling with -O0, your compiler has already likely optimized this loop much further than you could possibly expect.
Including unrolling, and on top of unrolling also vectorization and more. Trying to optimize this by hand is something you should never, absolutely never do. At most you will successfully make the code harder to read and understand, while most likely not even being able to match the compiler in terms of performance.
As to why there is no measurable gain? Possibly because you already hit a bottleneck, even with the "non optimized" version. For ARRAY_SIZE greater than your processors cache even the compiler optimized version is already limited by memory bandwidth.
But for completeness, let's just assume you have not hit that bottleneck, and that you actually had turned optimizations almost off (so no more than -O1), and optimize for that.
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
int j;
int tmpSum[4] = {0,0,0,0};
for (j = 0; j < ARRAY_SIZE; j+=4) {
tmpSum[0] += array[j+0];
tmpSum[1] += array[j+1];
tmpSum[2] += array[j+2];
tmpSum[3] += array[j+3];
}
sum += tmpSum[0] + tmpSum[1] + tmpSum[2] + tmpSum[3];
if(ARRAY_SIZE % 4 != 0) {
j -= 4;
for (; j < ARRAY_SIZE; j++) {
sum += array[j];
}
}
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
There is pretty much only one factor left which still could have reduced the performance, for a smaller array.
Not the overhead for the loop, so plain unrolling would had been pointless with a modern processor. Don't even bother, you won't beat the branch prediction.
But the latency between two instructions, until a value written by one instruction may be read again by the next instruction still applies. In this case, sum is constantly written and read all over again, and even if sum is cached in a register, this delay still applies and the processors pipeline had to wait.
The way around that, is to have multiple independent additions going on simultaneously, and finally just combine the results. This is by the way also an optimization which most modern compilers do know how to perform.
On top of that, you could now also express the first loop with vector instructions - once again also something the compiler would have done. At this point you are running into instruction latency again, so you will likely have to introduce one more set of temporaries, so that you now have two independent addition streams each using vector instructions.
Why the requirement of at least -O1? Because otherwise the compiler won't even place tmpSum in a register, or will try to express e.g. array[j+0] as a sequence of instructions for performing the addition first, rather than just using a single instruction for that. Hardly possible to optimize in that case, without using inline assembly directly.
Or if you just feel like (legit) cheating:
const int N_TIMES = 1000;
const int ARRAY_SIZE = 1024;
const int array[1024] = {1};
int sum = 0;
__attribute__((optimize("O3")))
__attribute__((optimize("unroll-loops")))
int fastSum(const int array[]) {
int j;
int tmpSum;
for (j = 0; j < ARRAY_SIZE; j++) {
tmpSum += array[j];
}
return tmpSum;
}
int main() {
int i;
for (i = 0; i < N_TIMES; i++) {
// You can change anything between this comment ...
sum += fastSum(array);
// ... and this one. But your inner loop must do the *same
// number of additions as this one does.
}
return sum;
}
The compiler will then apply pretty much all the optimizations described above.
I want to initialize an 16-cel-long array with 0, 1, 2 and 3 by blocks of four cels. So here is my first attempt at this:
int main(void) {
int t[16];
int i;
for (i = 0; i<=15; t[i++]=i/4)
{
printf("%d \n", t[i]);
}
return 0;
}
However, here is what I get. I know I can do it differently by just getting the affectation into the for loop, but why does this not work?
EDIT: Please do note that the printf only serves to check what the loop did put in the array.
The initialization works fine; you're just printing the cell before initializing it. Remember that the loop increment is done after each iteration. If you unroll the loop, you have:
i = 0; /* i is now 0 */
print(t[i]); /* prints t[0] */
t[i++] = i/4; /* sets t[0] */
/* i is now 1 */
print(t[i]); /* prints t[1] */
t[i++] = i/4; /* sets t[1] */
/* i is now 2 */
print(t[i]); /* prints t[1] */
/* etc. */
As well as the off-by-one errors with the loop begin/end that have been mentioned in other posts, this code:
t[i++]=i/4
causes undefined behaviour because i is read and written without a sequence point. "Undefined behaviour" means anything can happen: the value could be 3, or 4, or anything else, or the program could crash, etc.
See this thread for more in-depth discussion, and welcome to C..:)
I do not understand what you are trying to accomplish, but please let me show you a similar piece of code, first.
int main(void) {
int t[16];
int i;
//edited the code; providing standard way to do the task
for (i = 0; i<=15; i++)
{
t[i]=i/4;
printf("%d \n", t[i]);
}
return 0;
}
EDIT:
The while loop should be written that way:
int i = 0;
while (i<=15){
t[i] = i%4;
i++;
}
Which means set t[i] equal to i%4 and then increment i.
Since you are a beginner, I've updated the for loop and it now provides a standard way to do your task. It's better to have a simple increment on the third for loop command; do the rest of the job inside the for loop, as described above.
#naltipar: Yeah, I just forgot to initialize the first cel, just like grawity pointed out. Actually, the version I wrote for myself was with i++ but even then, since the third expression is executed after each loop, it sent out the same result. But whatever, it is fixed now.
However, I've got another problem which I'm sure I'm missing on but still can't figure it out:
int i = 0;
while (i<=15)
t[++i] = i%4;
This was first:
for(i = 0; i<=15; t[++i] = i%4);
but it resulted with an infinite loop. So in order to make sure that's not a problem specific to for, I switched to while andthe same thing still happens. That being said, it doesn't occur if i replace ++i by i++. I unrolled the whole loop and everything seems just fine...
I'm a beginner, by the way, in case you were wondering.
A clearer way to write this would be much less error-prone:
for (i = 0; i < 16; ++i)
printf ("%d\n", (t[i] = i % 4));
Personally I'd code something that way, but I'd never recommend it. Moreover, I don't really see much benefit in condensing statements like that, especially in the most important category: execution time. It is perhaps more difficult to optimize, so performance could actually degrade when compared to simply using:
for (i = 0; i < 16; ++i)
{
t[i] = i % 4;
printf ("%d\n", t[i]);
}
Even if it is you reading your own code, you make it difficult for your future self to understand. KISS (Keep It Simple, Stupid), and you'll find code is easier to write now and just as easy to modify later if you need to.
I am having little difficulty in figuring out following piece of simple for loop code in C.
int j=20;
for(int i=0, j; i<=j ; i++, j--)
printf("i = %d and j = %d \n",i,j);
Prints output as
i=0 and j=2
i=1 and j=1
Why it does not starts with j=20 and rather prints j=2 and stops after j=1.
But when I use this code
int j=20;
for(int i=0, j=20; i<=j ; i++, j--)
printf("i = %d and j = %d \n",i,j);
It starts properly with
i=0 and j=20 upto ... i=9 and j= 11
Is there something that I am missing ?
You are. Declaring j inside of the for construct creates a new (scoped) j, which has a value different from the outer. If you fail to initialize it, you get whatever crap happened to be in memory when allocated.
Variables like this are called "automatic" variables, and are allocated on the program's stack. As you need one, more stack space is allocated. When they go out of scope (really when the function returns), they are cleaned up by popping them all back off.
When the next bit of automatic storage is needed, the same thing happens and you then get whatever bit pattern happened to be left over on the stack as your new variables value.
Note that in the first portion of the for loop you have done int i = 0, j. What this does is to declare a variable named j which has a scope local to the for loop . Therefore there is no relation between the j declared and defined before the for loop and the one which you declare and define inside the for loop scope. Referring j inside the loop will refer to the one which is the innermost block, therefore taking j initialized to zero you get the first output.
Also note that you are lucky enough that the value of j is zero. It is an automatic variable and is not guaranteed to be zero upon definition.
On the next loop you see the output you want because, as previously the j defined inside the for loop is referred, but as you have initialized the value of j local to the for loop with the same value of the j outside (which has nothing to do with the j inside the for loop), hence you get the second output in your question.
Basically this is the common confusion in for loop syntax. What happening in your case is:
int i=0, j; //create 2 int variables - i (which is initialized to 0) and uninitialized j
This looks similar to for(i,j; i<j; i++, j--). However what you did - is basically created additional uninitialized variable j.
This question is about one-line code.
when you type int i=0, j; in a line, it equals to int i = 0; int j;
however, when you type int i=0, j=20;, it will give you an error unless the j has defined before, it does not equals to int i = 0; int j = 20;
I am using Code::Blocks 10.05, and the GNU GCC Compiler.
Basically, I ran into a really strange (and for me, inexplicable) issue that arises when trying to initialize an array outside it's declared size. In words, it's this:
*There is a declared array of size [x][y].
*There is another declared array with size [y-1].
The issue comes up when trying to put values into this second, size [y-1] array, outside of the [y-1] size. When this is attempted, the first array [x][y] will no longer maintain all of its values. I simply don't understand why breaking (or attempting to break) one array would affect the contents of the other. Here is some sample code to see it happening (it is in the broken format. To see the issue vanish, simply change array2[4] to array2[5] (thus eliminating what I have pinpointed to be the problem).
#include <stdio.h>
int main(void)
{
//Declare the array/indices
char array[10][5];
int array2[4]; //to see it work (and verify the issue), change 4 to 5
int i, j;
//Set up use of an input text file to fill the array
FILE *ifp;
ifp = fopen("input.txt", "r");
//Fill the array
for (i = 0; i <= 9; i++)
{
for (j = 0; j <= 5; j++)
{
fscanf(ifp, "%c", &array[i][j]);
//printf("[%d][%d] = %c\n", i, j, array[i][j]);
}
}
for (j = 4; j >= 0; j--)
{
for (i = 0; i <= 9; i++)
{
printf("[%d][%d] = %c\n", i, j, array[i][j]);
}
//PROBLEM LINE*************
array2[j] = 5;
}
fclose(ifp);
return 0;
}
So does anyone know how or why this happens?
Because when you write outside of an array bounds, C lets you. You're just writing to somewhere else in the program.
C is known as the lowest level high level language. To understand what "low level" means, remember that each of these variables you have created you can think of as living in physical memory. An array of integers of length 16 might occupy 64 bytes if integers are size 4. Perhaps they occupy bytes 100-163 (unlikely but I'm not going to make up realistic numbers, also these are usually better thought of in hexadecimal). What occupies byte 164? Maybe another variable in your program. What happens if you write to one past your array of 16 integers? well, it might write to that byte.
C lets you do this. Why? If you can't think of any answers, then maybe you should switch languages. I'm not being pedantic - if this doesn't benefit you then you might want to program in a language in which it is a little harder for you to make weird mistakes like this. But reasons include:
It's faster and smaller. Adding bounds checking takes time and space, so if you're writing code for a microprocessor, or writing a JIT compiler, speed and size really do matter a lot.
If you want to understand machine architecture and go into hardware, e.g. if you're a student, it's a good gateway from programming into OS/hardware/electrical engineering. And much of computer science.
Being close to machine code, it's standard in a way that many other languages and systems have to, or can easily, support some degree of compatibility with.
Other reasons that I would be able to give if I ever actually had to work this close to the machine code.
The moral is: In C, be very careful. You must check your own array bounds. You must clean up your own memory. If you don't, your program often won't crash but will start just doing really weird things without telling you where or why.
for (j = 0; j <= 5; j++)
should be
for (j = 0; j <= 4; j++)
and array2 max index is 3 so
array2[j] = 5;
is also going to be a problem when j == 4.
C array indexes start from 0. So an [X] array valid indexes are from 0 to X-1, thus you get X elements in total.
You should use the < operator, instead of <=, in order to show the same number in both the array declaration [X] and in the expression < X. For instance
int array[10];
...
for (i=0 ; i < 10 ; ++i) ... // instead of `<= 9`
This is less error prone.
If you're outside the bounds of one array, there's always a possibility you'll be inside the bounds of the other.
array2[j] = 5; - This is your problem of overflow.
for (j = 0; j <= 5; j++) - This is also a problem of overflow. Here also you are trying to access 5th index, where you can access only 0th to 4th index.
In the process memory, while calling each function one activation records will be created to keep all the local variables of the function and also it will have some more memory to store the called function address location also. In your function four local variables are there, array, array2, i and j. All these four will be aligned in an order. So if overflow happens it will first tries to overwrite in the variable declared above or below which depends on architecture. If overflow happens for more bytes then it may corrupt the entire stack itself by overwriting some of the local variables of the called functions. This may leads to crash also, Sometimes it may not but it will behave indifferently as you are facing now.
I have read that use of strlen is more expensive than such testing like this:
We have a string x 100 characters long.
I think that
for (int i = 0; i < strlen(x); i++)
is more expensive than this code:
for (int i = 0; x[i] != '\0'; i++)
Is it true? Maybe the second code will not work in some situation so is it better to use the first?
Will it be better with the below?
for (char *tempptr = x; *tempptr != '\0'; tempptr++)
for (int i=0;i<strlen(x);i++)
This code is calling strlen(x) every iteration. So if x is length 100, strlen(x) will be called 100 times. This is very expensive. Also, strlen(x) is also iterating over x every time in the same way that your for loop does. This makes it O(n^2) complexity.
for (int i=0;x[i]!='\0';i++)
This code calls no functions, so will be much quicker than the previous example. Since it iterates through the loop only once, it is O(n) complexity.
The first code checks length of x in every iteration of i and it takes O(n) to find the last 0, so it takes O(n^2), the second case is O(n)
Yes your second may not work 100% percent of the time but it would be slighty quite. This is because when using strlen() you have to call the method each time. A better way would be like so
int strLength = strlen(x);
for (int i = 0; i < strLength; i++)
Hope this helps.
I can suppose that in first variant you find strlen each iteration, while in second variant you don't do that.
Try this to check:
int a = strlen(x);
for (int i = 0; i < a; i++) {...}
If no there's no compilation optimization. Yes because strlen will iterate on every byte of the string every time and the second implementation will do it only once.
I guess the compiler could optimize (check Gregory Pakosz comment) your first for loop so that you would be like this:
int len = strlen(x);
for (int i = 0; i < len; i++)
Which is still O(n).
Anyway, I think your second for loop will be faster.
It's worth noting that you may be opening yourself up to buffer overflow problems if you don't also test the index against the size of the buffer containing the string. Depending on what you're doing with this code, this may or may not be a practical issue, but it rarely hurts to have extra checks, and it's a good habit to get into.
I'd suggest:
for (i = 0; i < buf_size && x[i] != '\0'; i++)
Where:
buf_size is the predetermined size of the buffer. If x is an array, this will be sizeof(x); if x is a pointer, you should have the buffer size available.
I've removed the declaration of i from the loop because it's only valid c99. If you're only compiling under c99, feel free to put it back in.
Of course the first one will take longer than the second. They don't actually do the same thing at all--the first one is calculating the length of the string each time; the second one is (effectively) doing it only once.
The first version is equivalent to this:
for (int i=0; x[i] != 0; i++)
for (int j=0; j<i; j++)
;
Maybe you meant to say "is it more efficient to call a library strlen() than to implement my own?" The answer to that is, of course, "it depends". Your compiler may have intrinsic versions of strlen() that are optimized beyond what you would expect from source, you may be on a platform on which function calls are expensive (rare nowadays), etc.
The only answer that's both general and correct is "profile and find out".