I ran into a problem while playing around with the auto-parallelization feature of the Oracle Solaris compiler. Let's say I have the following code:
int var = -1;
int i;
for (i = 0; i < 3; i++){
bool flag = false;
// do operations to set the flag
if (flag == true)
var = i;
}
// do other operations with var
When I compile this code, the compiler complains that the loop cannot be parallelized because of unsafe dependences.
Does anyone know what might be wrong here? Is there any way to avoid this but maintain the original functionality of the code?
Any help would be appreciated, thank you!
What the compiler sees is a set of loop iterations, all of which can potentially assign to var. If writing to var happens to be atomic, then var ends up with some random value of i (whichever flagged iteration happens to write last), and one might argue that is OK. Under the assumption that writing to a variable is not atomic, most compilers see this as a data race (then what exactly ends up in var?). Hence the complaint.
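One way to keep the original behavior while removing the shared write, as a rough sketch: let every iteration store its flag in its own array slot, then pick the result in a short sequential pass. Here compute_flag, NUM_ITER and last_flagged_index are hypothetical names, compute_flag is only a stand-in for the "operations to set the flag", and whether the compiler then actually parallelizes the first loop depends on what those operations do.

```c
#include <stdbool.h>

#define NUM_ITER 3  /* hypothetical loop count, matching the question */

/* Hypothetical stand-in for "do operations to set the flag". */
static bool compute_flag(int i)
{
    return i % 2 == 0;
}

/* Returns the largest i in [0, NUM_ITER) whose flag is set, or -1 if none. */
int last_flagged_index(void)
{
    bool flags[NUM_ITER];
    int var = -1;
    int i;

    /* Each iteration writes only flags[i], so no two iterations
       touch the same memory location. */
    for (i = 0; i < NUM_ITER; i++)
        flags[i] = compute_flag(i);

    /* Cheap sequential pass: same result as the original loop,
       i.e. the last flagged index wins. */
    for (i = 0; i < NUM_ITER; i++)
        if (flags[i])
            var = i;

    return var;
}
```

The second loop still carries the dependence on var, but it does almost no work, so leaving it sequential should cost little.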
Say I have a tight loop in C, within which I use the value of a global variable to do some arithmetic, e.g.
double c;
// ... initialize c somehow ...
double f(double*a, int n) {
double sum = 0.0;
int i;
for (i = 0; i < n; i++) {
sum += a[i]*c;
}
return sum;
}
with c the global variable. Is c "read anew from global scope" in each loop iteration? After all, it could've been changed by some other thread executing some other function, right? Hence, would the code be faster if I took a local (function stack) copy of c prior to the loop and only used this copy?
double f(double*a, int n) {
double sum = 0.0;
int i;
double c_cp = c;
for (i = 0; i < n; i++) {
sum += a[i]*c_cp;
}
return sum;
}
Though I haven't specified how c is initialized, let's assume it's done in some way such that the value is unknown at compile time. Also, c is really a constant throughout runtime, i.e. I as the programmer know that its value won't change. Can I let the compiler in on this information, e.g. using static double c in the global scope? Does this change the a[i]*c vs. a[i]*c_cp question?
My own research
Reading e.g. the "Global variables" section of this, it seems clear that taking a local copy of the global variable is the way to go. However, they want to update the value of the global variable, whereas I only ever want to read its value.
Using godbolt I fail to notice any real difference in the assembly for both c vs. c_cp and double c vs. static double c.
Any decently smart compiler will optimize your code so that it behaves like your second code snippet. Using static won't change much, but if you want to ensure a read on each iteration, then use volatile.
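To illustrate the volatile remark, here is a minimal sketch; c_volatile and f_volatile are made-up names, and the initial value 2.0 is only there to make the example concrete.

```c
/* Hypothetical names; only the volatile qualifier matters here. */
volatile double c_volatile = 2.0;

double f_volatile(const double *a, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++) {
        /* Because c_volatile is volatile, the compiler has to emit
           a real load of it on every iteration. */
        sum += a[i] * c_volatile;
    }
    return sum;
}
```

Keep in mind that volatile only forces the loads; it does not make concurrent access from several threads safe.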
Great point there about changes from a different thread. The compiler will maintain the integrity of your code as far as single-threaded execution goes. That means that it can reorder your code, skip something, add something, as long as the end result is still the same.
With multiple threads it is your job to ensure that things still happen in a specific order, not just that the end result is right. The way to ensure that is with memory barriers. It's a fun topic to read about, but one that is best avoided unless you're an expert.
Once everything is translated to machine code, you will get no difference whatsoever. If c is global, any access to c will reference the address of c or, most probably in a tight loop, c will be kept in a register or, in the worst case, in the L1 cache.
On a Linux machine you can easily generate the assembly and examine the resultant code.
You can also run benchmarks.
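For the assembly, gcc -O2 -S file.c writes the generated code to file.s. For a benchmark, something along these lines could serve as a starting point. It is only a sketch: time_calls, f_global, f_local and the value 1.5 are assumptions of mine, and clock() is a coarse timer, so you need many repetitions to see anything.

```c
#include <time.h>

double c = 1.5;  /* global; imagine the value is unknown at compile time */

/* Variant reading the global inside the loop. */
double f_global(const double *a, int n)
{
    double sum = 0.0;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * c;
    return sum;
}

/* Variant using a local copy of the global. */
double f_local(const double *a, int n)
{
    double sum = 0.0;
    double c_cp = c;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * c_cp;
    return sum;
}

/* Returns the CPU seconds spent in reps calls of fn. The results are
   accumulated into *sink so the calls cannot simply be dropped. */
double time_calls(double (*fn)(const double *, int),
                  const double *a, int n, int reps, double *sink)
{
    clock_t start = clock();
    int r;
    for (r = 0; r < reps; r++)
        *sink += fn(a, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call time_calls once with f_global and once with f_local on a large array and compare the two numbers.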
I tried to set up this code to avoid a buffer overflow and I'm not sure why it doesn't work. I'm fairly new to this and help would be appreciated.
I've tried using assert to make sure it ends, but I want the assert to succeed.
void authenticate (void)
{
char buffer1[8];
int i;
for (i = 0; i < 16; i++)
{
assert (i < sizeof(buffer1));
buffer1[i] = 'x';
}
}
I expect the assert to pass, but it fails. I want to fix it without completely rewriting the loop. Thanks!
There seems to be some misunderstanding here on exactly how assert functions. The assert macro performs a runtime check of the given condition. If that condition is false it causes the program to abort.
In this case, the value of i ranges from 0 to 15 inside of the loop. On the iterations where the value of i is less than 8 the assert passes. But once i becomes 8 the assert fails, causing the program to abort. The failed assert will not cause the program to, for example, skip the next loop iteration.
The proper way to handle this is to limit the loop counter to not go out of bounds:
for (i = 0; i < sizeof(buffer1); i++)
The C language by itself doesn't perform bounds checking like some other languages. That's part of what makes it fast. That also means that the language trusts the developer to not do things like read / write out of bounds of an array. Breaking that trust results in undefined behavior. So it's up to you to make sure that doesn't happen.
There are also tools such as valgrind which will help identify mismanagement of memory.
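Put together, a bounded version could look like the sketch below. I've moved the buffer into a parameter so the result can be checked from outside; fill_with_x is a hypothetical name, not something from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Fills every element of buffer with 'x' and returns the number of
   bytes written. size must be the real size of the buffer. */
size_t fill_with_x(char *buffer, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++) {
        /* Now always true: the loop bound matches the buffer size,
           so the assert only documents the invariant. */
        assert(i < size);
        buffer[i] = 'x';
    }
    return i;
}
```

With a local array you would call it as fill_with_x(buffer1, sizeof(buffer1)), so the bound follows the array if its size ever changes.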
The assert fails as expected. Change the loop limit to 8 to make it pass.
for (i = 0; i < 8; i++)
But perhaps you really want a NUL-terminated string:
buffer1[7] = 0;
for (i = 0; i < 7; i++)
I was trying to debug my code in another function when I stumbled upon this "weird" behaviour.
#include <stdio.h>
#define MAX 20
int main(void) {
int matrix[MAX][MAX] = {{0}};
return 0;
}
If I set a breakpoint on the return 0; line and I look at the local variables with Code::Blocks the matrix is not entirely filled with zeros.
The first row is, but the rest of the array contains just random junk.
I know I can do a double for loop to initialize manually everything to zero, but wasn't the C standard supposed to fill this matrix to zero with the {{0}} initializer?
Maybe because it's been a long day and I'm tired, but I could've sworn I knew this.
I've tried to compile with the different standards (with the Code::Blocks bundled gcc compiler): -std=c89, -std=c99, -std=c11, but it's the same.
Any ideas of what's wrong? Could you explain it to me?
EDIT:
I'm specifically asking about the {{0}} initializer.
I've always thought it would fill all columns and all rows to zero.
EDIT 2:
I'm bothered specifically with Code::Blocks and its bundled GCC. Other comments say the code works on different platforms. But why wouldn't it work for me? :/
Thanks.
I've figured it out.
Even without any optimization flag on the compiler, the debugger information was just wrong.
So I printed out the values with two for loops and the matrix was initialized correctly, even though the debugger said otherwise (weird).
Thanks for the comments, though.
Your code should initialize it to zero. In fact, int matrix[MAX][MAX] = {0}; is enough to zero the whole array (the empty form = {}; is only standard C from C23 on; before that it is a GCC extension). However, int matrix[MAX][MAX] = {{1}}; will only set matrix[0][0] to 1, and everything else to 0.
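If you want to convince yourself of the initializer rules without the debugger, a small check like this should do. The helper names are made up for the example.

```c
#define MAX 20

/* Counts the cells of m that are not zero. */
static int count_nonzero(int m[MAX][MAX])
{
    int i, j, count = 0;

    for (i = 0; i < MAX; i++)
        for (j = 0; j < MAX; j++)
            if (m[i][j] != 0)
                count++;
    return count;
}

/* {{0}} sets matrix[0][0] explicitly; all remaining elements are
   zero-initialized by the language rules. */
int nonzero_after_zero_init(void)
{
    int matrix[MAX][MAX] = {{0}};
    return count_nonzero(matrix);
}

/* {{1}} sets only matrix[0][0] to 1; everything else is still zero. */
int nonzero_after_one_init(void)
{
    int matrix[MAX][MAX] = {{1}};
    return count_nonzero(matrix);
}
```

The first function should return 0 and the second 1 on any conforming compiler.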
I suspect what you are observing with Code::Blocks is that the debugger (gdb?) is not quite showing you exactly where it is breaking in the code - either that or some other side-effect from the optimizer. To test that theory, add the following loop immediately after the initialization:
```
int i, j;
for (i = 0; i < MAX; i++)
    for (j = 0; j < MAX; j++)
        printf("matrix[%d][%d] = %d\n", i, j, matrix[i][j]);
```
and see if what it prints is consistent with the output of the debugger.
I am going to guess that what might be happening is that, since you are not using matrix, the optimizer might have decided not to initialize it. To verify, disassemble your main (disass main in gdb) and see if the matrix is actually being initialized.
I'm wondering whether compilers (gcc with -O3 more specifically) can/will optimize out nested struct element dereferences (or not nested even).
For example, is there any point in doing the following code
register int i = 0;
register double multiple = struct1->struct2->element1;
for (i = 0; i < 10000; i++)
result[i] = multiple * -struct1->struct3->element3[i];
instead of
register int i = 0;
for (i = 0; i < 10000; i++)
result[i] = struct1->struct2->element1 * -struct1->struct3->element3[i];
I'm looking for the most optimized version, but I am not going to go through the code and move struct dereferences out of loops by hand if the compiler will do that for me. If it does, I think my best option is the following
register int i = 0;
register double* R = &result[0];
register double* amount = &struct1->struct3->element3[0];
for (i = 0; i < 10000; i++, R++, amount++)
*R = struct1->struct2->element1 * -*amount;
which eliminates all unnecessary dereferences etc. (I think). Would the 2 dereferences to get to element3 be optimized?
Any thoughts?
Thanks
This optimization is known as Loop-invariant code motion. Loop invariants (things that never change inside the loop) are moved outside of the loop, to avoid re-calculating the same thing over and over.
GCC supports it; it is enabled by the -fmove-loop-invariants flag:
-fmove-loop-invariants
Enables the loop invariant motion pass in the new loop optimizer. Enabled at level -O1
Today, compilers are almost always smart enough to do the "right thing" no matter how you formulate your code. Focus on writing the simplest, cleanest, easiest to read (for a human!) code you can. Let the compiler take care of the rest by enabling optimizations. -O2 is commonly used.
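As a sketch of what the transformation amounts to, here are the straightforward loop and a hand-hoisted equivalent. The struct layouts (outer, inner2, inner3) and COUNT are guesses on my part, since the question does not show them. One caveat: the compiler may only hoist a load if it can prove that the stores to result cannot change the loaded value. Since result and element1 are both double, it may have to reload element1 every iteration in the first version unless it can rule out aliasing (for example via restrict).

```c
#define COUNT 4  /* hypothetical array length; the question uses 10000 */

/* Hypothetical layouts; the question does not show the real ones. */
struct inner2 { double element1; };
struct inner3 { double element3[COUNT]; };
struct outer  { struct inner2 *struct2; struct inner3 *struct3; };

/* Straightforward version: all dereferences written inside the loop. */
void scale_naive(const struct outer *s, double *result)
{
    int i;
    for (i = 0; i < COUNT; i++)
        result[i] = s->struct2->element1 * -s->struct3->element3[i];
}

/* Hand-hoisted version: the loop-invariant loads are done once. */
void scale_hoisted(const struct outer *s, double *result)
{
    const double multiple = s->struct2->element1;
    const double *src = s->struct3->element3;
    int i;
    for (i = 0; i < COUNT; i++)
        result[i] = multiple * -src[i];
}
```

Both produce the same results as long as result does not overlap the structs; comparing the assembly of the two at -O2 shows what your compiler actually managed to hoist.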
Would it be possible to implement an if that checks for -1 and, if the result is not -1, assigns the value, but without having to call the function twice or saving the return value to a local variable? I know this is possible in assembly, but is there a C implementation?
int i, x = -10;
if( func1(x) != -1) i = func1(x);
saving the return value to a local variable
In my experience, avoiding local variables is rarely worth the clarity forfeited. Most compilers can usually avoid the corresponding loads/stores and just use registers for those locals. So don't avoid it, embrace it! The maintainer's sanity that gets preserved just might be your own.
I know this is possible in assembly, but is there a c implementation?
If it turns out your case is one where assembly is actually appropriate, make a declaration in a header file and link against the assembly routine.
Suggestion:
const int x = -10;
const int y = func1(x);
const int i = y != -1
? y
: 0 /* You didn't really want an uninitialized value here, right? */ ;
It depends on whether or not func1 generates any side-effects. Consider rand() or getchar() as examples. Calling these functions twice in a row might result in different return values, because they generate side effects; rand() advances the generator's internal state, and getchar() consumes a character from stdin. That is, rand() == rand() will usually[1] evaluate to false, and getchar() == getchar() can't be predicted reliably. Supposing func1 were to generate a side-effect, the return value might differ for consecutive calls with the same input, and hence func1(x) == func1(x) might evaluate to false.
If func1 doesn't generate any side-effect, and the output is consistent based solely on the input, then I fail to see why you wouldn't settle with int i = func1(x);, and base logic on whether or not i == -1. Writing the least repetitive code results in greater legibility and maintainability. If you're concerned about the efficiency of this, don't be. Your compiler is most likely smart enough to eliminate dead code, so it'll do a good job at transforming this into something fairly efficient.
1. ... at least in any sane standard library implementation.
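To make the side-effect point concrete, here is a small sketch with a made-up func1_counting whose result depends on how often it has been called, plus a helper that calls it exactly once and keeps the result in a local. All names and the fallback parameter are hypothetical.

```c
int call_count = 0;  /* hypothetical hidden state changed by every call */

/* Hypothetical func1 with a side effect: each call bumps the counter,
   so two calls with the same argument return different values. */
int func1_counting(int x)
{
    call_count++;
    return x + call_count;
}

/* Calls the function once; returns its result unless that is -1,
   in which case the caller-supplied fallback is returned. */
int assign_unless_minus_one(int x, int fallback)
{
    const int y = func1_counting(x);
    return (y != -1) ? y : fallback;
}
```

With a function like this, if (func1(x) != -1) i = func1(x); would test one value and assign another, which is why the single call plus local variable is the safer pattern.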
int c;
if((c = func1(x)) != -1) i = c;
The best implementation I could think of would be:
int i = 0; // initialize to something
const int x = -10;
const int y = func1(x);
if (y != -1)
{
i = y;
}
The const would let the compiler do any optimizations that it thinks are best (perhaps inline func1). Notice that func1 is only called once, which is probably best. The const y would also allow y to be kept in a register (which it would need to be anyway in order to perform the if). If you wanted to give more of a suggestion, you could do:
register const int y = func1(x);
However, the compiler is not required to honor your register keyword suggestion, so it's probably best to leave it out.
EDIT BASED ON INSPIRATION FROM BRIAN'S ANSWER:
int i = ((func1(x) + 1) ?:0) - 1;
BTW, I probably wouldn't suggest using this, but it does answer the question. Note that ?: with the middle operand omitted is a GNU C extension, not standard C. This is based on the SO question here. I'm still confused about the why behind the question; it seems more like a puzzle or job interview question than something that would be encountered in a "real" program. I'd certainly like to hear why this would be needed.