I am learning to code in C and need to get more familiar with overflow and dealing with large numbers. I need help dealing with the below code.
This isn't my desired output: when I do the calculations by hand, the negative numbers are incorrect. I know it has to do with the larger numbers I'm dealing with. How do I go about approaching this problem? I'm not too sure where to start.
Thanks!
#include <stdio.h>
#include <limits.h>

int main() {
    unsigned A = 1103624256;
    unsigned B = 11254;
    unsigned X = 1;
    unsigned max_unsigned = (long)(UINT_MAX);
    X = ((A*X)+B)%max_unsigned;
    printf("X1 =\t%d\n", X);
    X = ((A*X)+B)%max_unsigned;
    printf("X2 =\t%d\n", X);
    X = ((A*X)+B)%max_unsigned;
    printf("X3 =\t%d\n", X);
    X = ((A*X)+B)%max_unsigned;
    printf("X4 =\t%d\n", X);
    return 0;
}
my output is:
X1 = 1103635510
X2 = 823626102
X3 = -473507466
X4 = -1793402506
Program ended with exit code: 0
unsigned int is quite often 32 bits wide. That means the biggest representable value is 4294967295 (UINT_MAX), i.e. 2^32 - 1, and note: not 2^32. This is then max_unsigned in your code.
Your first calculation, X = ((A*X)+B)%max_unsigned, is therefore (1103624256*1+11254)%4294967295 = 1103635510, which is the result you are seeing; since the sum is still smaller than max_unsigned, the modulo changes nothing.
In your second calculation the intermediate product A*X is too big to fit into 32 bits. That, however, is not illegal here. The rule you may have read, for example at https://en.cppreference.com/w/cpp/language/operator_arithmetic:
When signed integer arithmetic operation overflows (the result does not fit in the result type), the behavior is undefined.
applies to signed integers only. Your variables are unsigned, and for unsigned types C defines overflow precisely: the result wraps around, i.e. it is reduced modulo UINT_MAX+1 (here 2^32). So every value of X in your program is well defined.
What is misleading you is the output. %d tells printf to interpret its argument as a signed int, so any value with the top bit set is printed as a negative number. Use %u for unsigned arguments.
One more subtlety: % max_unsigned reduces modulo 4294967295, not modulo 2^32. If you intended the usual linear-congruential reduction modulo 2^32, unsigned arithmetic already does that for free, and the explicit modulo is unnecessary. Had your variables been signed, overflow really would be undefined behaviour, anything could happen, and the only way to find out what a particular compiler does would be to inspect the generated assembler, with no guarantee the result stays the same between builds.
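If the goal was the usual linear congruential generator reduced modulo 2^32 (an assumption on my part, not stated in the question), a minimal corrected version could look like this, letting unsigned wraparound do the reduction and printing with %u:

#include <stdio.h>

int main(void) {
    unsigned A = 1103624256;
    unsigned B = 11254;
    unsigned X = 1;
    /* unsigned arithmetic wraps modulo 2^32 on a 32-bit unsigned int,
       so no explicit % is needed for the mod-2^32 reduction */
    for (int i = 1; i <= 4; i++) {
        X = A * X + B;
        printf("X%d =\t%u\n", i, X);   /* %u prints the value as unsigned */
    }
    return 0;
}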
I have the following C program:
#include <stdio.h>
#include <math.h>
#define LOG2(x) ((int)( log((double)(x)) / log(2) ))
int main() {
    int num = 64;
    int val1 = LOG2(num);
    int val2 = LOG2(64);
    printf("val1: %d, val2: %d\n", val1, val2);
    return 0;
}
Which outputs:
val1: 5, val2: 6
Why does this macro produce a different (and wrong) answer when I use it with a variable, but works correctly when I just type 64 directly?
Regardless of whether or not this is actually a good way to get the log base 2, what is causing this behavior? Is there any way I can get this macro to work properly with variables? (all my inputs will be exact powers of 2)
This is, mathematically, a fine way of computing base-2 logs, but since log(x) and log(2) are both going to be long, messy fractions, it's quite possible for the result of the division to come out as something like 5.999999, which then truncates down to 5. I recommend rounding, especially if you know the inputs will always be powers of 2.
(But why did you get different answers for constant vs. variable? That's a good question. Usually the answer is that when there are constants involved, the compiler can perform some or all of the calculation at compile time, and it often ends up using subtly or significantly different floating-point arithmetic than the run-time environment. That appears to be what's happening here: gcc, for one, treats log() as a built-in and folds calls with constant arguments at compile time, using arithmetic that can be more precise than the run-time library's.)
Also, C99 added a log2() function to <math.h>, which should give you exact answers for exact powers of 2.
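For example, a rounding version of the macro, a sketch along the lines suggested above, assuming the inputs are exact powers of 2:

#include <stdio.h>
#include <math.h>

/* round to the nearest integer instead of truncating toward zero */
#define LOG2(x)  ((int)( log((double)(x)) / log(2.0) + 0.5 ))

/* C99 alternative using log2() directly */
#define LOG2B(x) ((int)( log2((double)(x)) + 0.5 ))

int main(void) {
    int num = 64;
    printf("%d %d\n", LOG2(num), LOG2B(num));   /* prints: 6 6 */
    return 0;
}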
Apple Secure Coding Guide says the following (page 27):
Also, any bits that overflow past the length of an integer variable (whether signed or unsigned) are dropped.
However, with regard to signed integer overflow, the C standard (C89) says:
An example of undefined behavior is the behavior on integer overflow.
and
If an exception occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not representable), the behavior is undefined.
Is the Coding Guide wrong? Is there something here that I don't get? I am not convinced myself that Apple Secure Coding Guide could get this wrong.
Here is a second opinion, from a static analyzer described as detecting undefined behavior:
int x;
int main(){
x = 0x7fffffff + 1;
}
The analyzer is run so:
$ frama-c -val -machdep x86_32 t.c
And it produces:
[kernel] preprocessing with "gcc -C -E -I. t.c"
[value] Analyzing a complete application starting at main
...
t.c:4:[kernel] warning: signed overflow. assert 0x7fffffff+1 ≤ 2147483647;
...
[value] Values at end of function main:
NON TERMINATING FUNCTION
This means that the program t.c contains undefined behavior, and that no execution of it ever terminates without causing undefined behavior.
Let's take this example:
1 << 32
If we assume 32-bit int, C clearly says it is undefined behavior. Period.
But any implementation can define this undefined behavior.
gcc for example says (while not very explicit in defining the behavior):
GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change.
http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
I don't know about clang, but I suspect that, as with gcc, the evaluation of an expression like 1 << 32 would give no surprise (that is, evaluate to 0).
But even if it is defined on implementations running in Apple operating systems, a portable program should not make use of expressions that invoke undefined behavior in the C language.
EDIT: I thought the Apple sentence was dealing only with the bitwise << operator. It looks like it's more general, and in that case, for the C language, they are utterly wrong.
The two statements are not in conflict.
The standard deliberately declines to define what behaviour each implementation must provide here (so different implementations can do different things and still be standard conformant).
Apple is allowed to define the behaviour of its implementation.
You as a programmer would be well advised to treat the behaviour as undefined since your code may need to be moved to other platforms where the behaviour is different, and perhaps because Apple could, in theory, change its mind in the future and still conform to the standard.
Consider the code
#include <stdint.h>

int32_t test(int mode)
{
    int32_t a = 0x12345678;
    int32_t b = mode ? a*0x10000 : a*0x10000LL;
    return b;
}
If this method is invoked with a mode value of zero, the code will compute the long long value 0x0000123456780000 and store it into b. The behavior of this is fully defined by the C standard: if bit 31 of the result is clear, it will lop off all but the bottom 32 bits and store the resulting (positive) integer into b. If bit 31 were set and the result were being stored to a 32-bit int rather than a variable of type int32_t, the implementation would have some latitude, but implementations are only allowed to define int32_t if they would perform such narrowing conversions according to the rules of two's-complement math.
If this method were invoked with a non-zero mode value, then the numerical computation would yield a result outside the range of the temporary expression value, and as such would cause Undefined Behavior. While the rules dictate what should happen if a calculation performed on a longer type is stored into a shorter one, they do not indicate what should happen if calculations don't fit in the type with which they are performed. A rather nasty gap in the standard (which should IMHO be plugged) occurs with:
uint16_t multiply(uint16_t x, uint16_t y)
{
    return x*y;
}
For all combinations of x and y values where the Standard says anything about what this function should do, the Standard requires that it compute and return the product mod 65536. If the Standard were to mandate that for all combinations of x and y values 0-65535 this method must return the arithmetical value of (x*y) mod 65536, it would be mandating behavior with which 99.99% of standards-compliant compilers would already be in conformance. Unfortunately, on machines where int is 32 bits, the Standard presently imposes no requirements on this function's behavior in cases where the arithmetical product would be larger than 2147483647. Even though any portion of the intermediate result beyond the bottom 16 bits will be ignored, the code will try to evaluate the result using a 32-bit signed integer type, and the Standard imposes no requirements on what should happen if a compiler recognizes that the product will overflow that type.
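The usual defensive idiom (my addition, not part of the answer above) is to force the multiplication into an unsigned type before it can be performed as signed int:

#include <stdint.h>

uint16_t multiply(uint16_t x, uint16_t y)
{
    /* (uint32_t)x * y is computed with defined unsigned wraparound,
       and the conversion back to uint16_t keeps the low 16 bits */
    return (uint16_t)((uint32_t)x * y);
}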
I have the following code
void Fun2()
{
    if (X <= A)
        X = ceil(M*1.0/A*X);
    else
        X = M*1.0/(M-A)*(M-X);
}
I want to program it in a fast manner using C99, taking into account the following comments:
X and A are 32-bit values, but I declare them as uint64_t, while M is a static const uint64_t.
This function is called by another function, and the value of A is changed to a new value every n calls.
The optimization needed is in execution time; the CPU is a Core i3 and the OS is Windows 7.
The math model I want to implement is
F = ceil(M/A*X) if X <= A
F = floor(M/(M-A)*(M-X)) if X > A
For clarity and to avoid confusion, my previous post was:
I have the following code
void Fun2()
{
    if (X0 <= A)
        X0 = ceil(Max1*X0);
    else
        X0 = Max2*(Max-X0);
}
I want to program it in a fast manner using C99, taking into account the following comments:
X0, A, Max1, and Max2 are 32-bit values, but I declare them as uint64_t, while Max is a static const uint64_t.
This function is called by another function, and the values of Max1, A, and Max2 are changed to random values every n calls.
I work on Windows 7 with Code::Blocks.
Thanks
It is completely pointless and impossible to optimize code like this without a specific target in mind. In order to do so, you need the following knowledge:
Which CPU is used.
Which OS is used (if any).
In-depth knowledge of the above, to the point where you know about as much of the system as, or more than, the people who wrote the optimizer for the given compiler port.
Which kind of optimization is most important: execution speed, RAM usage, or program size.
The only kind of optimization you can do without knowing the above is on the algorithm level. There are no such algorithms in the code posted.
Thus your question cannot be answered by anyone until more information is provided.
If "fast manner" means fast execution, your first change is to declare this function as an inline one, a feature of C99.
inline void Fun2()
{
...
...
}
I recall that GNU CC has some interesting macros that may help optimize this code as well. I don't think they are C99 compliant, but they are always interesting to note. I mean: your function has an if statement. If you know in advance what probability each branch has of being taken, you can do things like:
if (likely(X0<=A)).....
if it's probable that X0 is less than or equal to A, or:
if (unlikely(X0<=A)).....
if it's not probable that X0 is less than or equal to A.
With that information, the compiler will arrange the comparison and jump so that the most probable branch is executed without a taken jump, making it faster on architectures with no branch prediction.
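These macros are not part of standard C99; they are conventionally defined on top of GCC's __builtin_expect (which clang also supports), roughly like this:

/* tell the compiler which outcome of the condition is expected */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

The !!(x) normalizes any truthy value to exactly 0 or 1, which is what __builtin_expect compares against.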
Another thing that may improve speed is to use the ?: ternary operator, as both branches assign a value to the same variable, something like this:
inline void Fun2()
{
    X0 = (X0 <= A) ? Max1*X0 : Max2*(Max-X0);
}
BTW: why use ceil()? ceil() operates on doubles and rounds up, to the nearest integer not less than its argument. If X0 and Max1 are integer values, there won't be any fractional part in the product, so ceil() won't have any effect.
I think one thing that can be improved is not to use floating point. Your code mostly deals with integers, so you want to stick to integer arithmetic.
The only floating point number is Max1. If it's always whole, it can be an integer. If not, you may be able to replace it with two integers: Max1*X0 -> X0 * Max1_num / Max1_den. If you calculate the numerator and denominator once and use them many times, this can speed things up.
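A sketch of that idea (the names num and den are mine, standing for a hypothetical precomputed fraction Max1 = num/den):

#include <stdint.h>

/* ceil((num/den) * x) in pure integer arithmetic;
   assumes den > 0 and that num * x does not overflow uint64_t */
static uint64_t scale_ceil(uint64_t x, uint64_t num, uint64_t den)
{
    return (num * x + den - 1) / den;
}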
I'd transform the math model to
Ceil (M*(X-0) / (A-0)) when X<=A
Floor (M*(X-M) / (A-M)) when X>A
with
Ceil (A / B) = Floor((A + (B-1)) / B), for positive integers A and B
Which, substituted into the first, gives:
((M * (X - m0) + c) / (A - m0))
where
c = A-1; m0 = 0, when X <= A
c = 0; m0 = M, when X > A
Everything will be performed in integer arithmetic, but it'll be quite tough to calculate the reciprocals in advance;
It may still be possible to use some form of DDA to avoid calculating the division between iterations.
Using the temporary constants c, m0 is simply for unifying the pipeline for both branches as the next step is in pursuit of parallelism.
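Putting the transformation together, an integer-only Fun2 might look like the sketch below. This is my reconstruction, not the original poster's code; it assumes 0 < A < M and X <= M, with M, A and X all fitting in 32 bits so the 64-bit products cannot overflow, and it uses the ceil(a/b) == (a + b - 1)/b identity for positive integers. The value of M is a placeholder:

#include <stdint.h>

static const uint64_t M = 4294967291u;   /* example modulus, assumed */
static uint64_t X, A;

void Fun2(void)
{
    if (X <= A)
        X = (M * X + A - 1) / A;        /* ceil(M*X/A)           */
    else
        X = (M * (M - X)) / (M - A);    /* floor(M*(M-X)/(M-A))  */
}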
I'm working on a homework assignment and am probably psyching myself out about this thing a little too much, so I am just seeking some input. Here's the basic code:
for (x = 100; x > 0; x = x + x) {
    sum = sum + x;
}
There are two versions: one where x is a float and one where it is an int. The question is whether these are infinite loops.
I am thinking that when x is an int, it will eventually overflow, making it less than zero and the loop will stop. When x is a float, x will reach infinity and the loop will be infinite.
Am I close?
The behavior when a signed integer is increased beyond its limit is undefined. So the loop may end or it may be infinite. Or it may crash (or the loop may never run at all). Or as some C gurus like to say, demons may fly out of your nose - though personally I doubt any compiler implementor would go through the trouble of implementing nasal demon functionality.
As far as floating point values are concerned, you are correct that it will be an infinite loop.
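A quick way to convince yourself of the float case (a demo of mine, not part of the assignment) is to count the doublings until the value hits infinity:

#include <stdio.h>
#include <math.h>

int main(void) {
    float x = 100.0f;
    int steps = 0;
    while (!isinf(x)) {   /* doubling a float eventually overflows to +inf */
        x = x + x;
        steps++;
    }
    /* +inf compares greater than 0, so the original loop condition stays true */
    printf("reached +inf after %d doublings; (x > 0) is %d\n", steps, x > 0);
    return 0;
}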
When signed integer overflows, the behavior is undefined. Expecting that x will become negative is naive at best.
Some compilers (like GCC) actually implement so-called strict value semantics, meaning the compiler takes advantage of that undefined behavior for optimization purposes. In your specific example, the compiler might immediately generate a straightforward infinite loop, i.e. a loop that doesn't have any termination condition at all.
You are indeed correct, integers will overflow to negative values (as long as they're signed) so the loop will end, and floats will stick to "+infinity" which is always greater than any number except NaN.
Edit: I stand corrected, the int version does loop infinitely (on some compilers due to their assumptions): http://ideone.com/HZkht
An example of unspecified behavior in the C language is the order of evaluation of arguments to a function. It might be left to right or right to left, you just don't know. This would affect how foo(c++, c) or foo(++c, c) gets evaluated.
What other unspecified behavior is there that can surprise the unaware programmer?
A language lawyer question. Hmkay.
My personal top3:
violating the strict aliasing rule
violating the strict aliasing rule
violating the strict aliasing rule
:-)
Edit: Here is a little example that does it wrong twice (assume 32-bit ints and little endian):
float funky_float_abs (float a)
{
    unsigned int temp = *(unsigned int *)&a;
    temp &= 0x7fffffff;
    return *(float *)&temp;
}
That code tries to get the absolute value of a float by bit-twiddling with the sign bit directly in the representation of a float.
However, accessing an object through a pointer obtained by casting to an incompatible pointer type is not valid C. The compiler may assume that pointers to different types don't point to the same chunk of memory. This is true for all kinds of pointers except void* and char* (signedness does not matter).
In the case above I do that twice. Once to get an int-alias for the float a, and once to convert the value back to float.
There are three valid ways to do the same thing.
Use a char or void pointer during the cast. These are allowed to alias anything, so they are safe.
float funky_float_abs (float a)
{
    float temp_float = a;
    // valid, because it's a char pointer. These are special.
    unsigned char * temp = (unsigned char *)&temp_float;
    temp[3] &= 0x7f;
    return temp_float;
}
Use memcpy. memcpy takes void pointers, so it will force aliasing as well (remember to #include <string.h>).
float funky_float_abs (float a)
{
    int i;
    float result;
    memcpy (&i, &a, sizeof (int));
    i &= 0x7fffffff;
    memcpy (&result, &i, sizeof (int));
    return result;
}
The third valid way: use unions. This is explicitly not undefined since C99:
float funky_float_abs (float a)
{
    union
    {
        unsigned int i;
        float f;
    } cast_helper;

    cast_helper.f = a;
    cast_helper.i &= 0x7fffffff;
    return cast_helper.f;
}
My personal favourite undefined behaviour is that if a non-empty source file doesn't end in a newline, behaviour is undefined.
I suspect, though, that no compiler I will ever see treats a source file differently according to whether or not it is newline-terminated, other than emitting a warning. So it's not really something that will surprise unaware programmers, except that they might be surprised by the warning.
So for genuine portability issues (which mostly are implementation-dependent rather than unspecified or undefined, but I think that falls into the spirit of the question):
char is not necessarily (un)signed.
int can be any size from 16 bits upwards.
floats are not necessarily IEEE-formatted or conformant.
integer types are not necessarily two's complement, and integer arithmetic overflow causes undefined behaviour. Modern hardware won't crash, but some compiler optimizations will produce behavior different from wraparound even though that's what the hardware does; for example, if (x+1 < x) may be optimized as always false when x has a signed type (see the -fstrict-overflow option in GCC).
"/", "." and ".." in a #include have no defined meaning and can be treated differently by different compilers (this does actually vary, and if it goes wrong it will ruin your day).
Really serious ones that can be surprising even on the platform you developed on, because behaviour is only partially undefined / unspecified:
POSIX threading and the ANSI memory model. Concurrent access to memory is not as well defined as novices think. volatile doesn't do what novices think. Order of memory accesses is not as well defined as novices think. Accesses can be moved across memory barriers in certain directions. Memory cache coherency is not required.
Profiling code is not as easy as you think. If your test loop has no effect, the compiler can remove part or all of it. inline has no defined effect.
And, as I think Nils mentioned in passing:
VIOLATING THE STRICT ALIASING RULE.
My favorite is this:
// what does this do?
x = x++;
To answer some comments: it is undefined behaviour according to the standard. Seeing this, the compiler is allowed to do anything, up to and including formatting your hard drive.
See for example this comment here. The point is not that there is some reasonable expectation of a particular behaviour. Because of the C standard and the way sequence points are defined, this line of code is actually undefined behaviour.
For example, if we had x = 1 before the line above, then what would the valid result be afterwards? Someone commented that it should be
x is incremented by 1
so we should see x == 2 afterwards. However, this is not actually true: you will find some compilers that give x == 1 afterwards, or maybe even x == 3. You would have to look closely at the generated assembly to see why, but the differences are due to the underlying problem: the compiler is allowed to evaluate the assignment and the increment in any order it likes, so it could do the x++ first or the x = first.
Dividing something by a pointer to something. Just won't compile for some reason... :-)
result = x/*y;
Another issue I encountered (which is implementation-defined, but definitely unexpected):
char is evil.
signed or unsigned, depending on what the compiler feels like
not guaranteed to be exactly 8 bits
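You can ask <limits.h> what your compiler decided, for example:

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* CHAR_MIN is 0 when plain char is unsigned, negative when signed */
    printf("char is %s and has %d bits\n",
           (CHAR_MIN < 0) ? "signed" : "unsigned", CHAR_BIT);
    return 0;
}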
I can't count the number of times I've corrected printf format specifiers to match their argument. Any mismatch is undefined behavior.
No, you must not pass an int (or long) to %x - an unsigned int is required
No, you must not pass an unsigned int to %d - an int is required
No, you must not pass a size_t to %u or %d - use %zu
No, you must not print a pointer with %d or %x - use %p and cast to a void *
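A small crib sheet of matching pairs (my examples, not from the answer above):

#include <stdio.h>

int main(void) {
    int i = -1;
    unsigned int u = 42;
    size_t n = sizeof u;
    printf("%d\n", i);            /* int          -> %d                    */
    printf("%x\n", u);            /* unsigned int -> %x                    */
    printf("%u\n", u);            /* unsigned int -> %u                    */
    printf("%zu\n", n);           /* size_t       -> %zu (C99)             */
    printf("%p\n", (void *)&u);   /* pointer      -> %p, cast to void *    */
    return 0;
}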
I've seen a lot of relatively inexperienced programmers bitten by multi-character constants.
This:
"x"
is a string literal (which is of type char[2] and decays to char* in most contexts).
This:
'x'
is an ordinary character constant (which, for historical reasons, is of type int).
This:
'xy'
is also a perfectly legal character constant, but its value (which is still of type int) is implementation-defined. It's a nearly useless language feature that serves mostly to cause confusion.
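For instance (the exact value printed on the last line is implementation-defined; gcc typically warns about it with -Wmultichar):

#include <stdio.h>

int main(void) {
    printf("%d\n", 'x');    /* 120 on ASCII systems: an int, not a char  */
    printf("%d\n", 'xy');   /* legal, but the value is up to the compiler */
    return 0;
}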
A compiler doesn't have to tell you that you're calling a function with the wrong number of parameters/wrong parameter types if the function prototype isn't available.
The clang developers posted some great examples a while back, in a post every C programmer should read. Some interesting ones not mentioned before:
Signed integer overflow - no it's not ok to wrap a signed variable past its max.
Dereferencing a NULL Pointer - yes this is undefined, and might be ignored, see part 2 of the link.
The EEs here just discovered that a >> -2 is a bit fraught. (The shift count must be non-negative and less than the width of the type; a negative count is undefined behavior.)
I nodded and told them it was not natural.
Be sure to always initialize your variables before you use them! When I had just started with C, that caused me a number of headaches.