I have an array in C that I want to address in a manner similar to a circular buffer; for example, a[-1] would return the last element of the array.
To do that I tried to use modulo arithmetic (obviously); the problem is, I'm getting quite weird results when negative numbers are involved:
-1 % 4 = -1
-1 % 4U = 3
So far, so good.
-1 % 4000 = -1
(-1+4000U) % 4000U = 3999
(-1) % 4000U = 3295
Question: The value 3295 does satisfy the requirement from the C standard (6.5.5#6) that (a/b)*b + a%b shall equal a, with the quotient truncated towards zero (for a=-1, b=4000), so it's not a bug per se, but why is the standard defined this way?! Surely, there must be some logic in this...
How do I have to write a%b to get sensible results for negative a (as (a+b)%b stops working when abs(a)>b)?
Test application:
#include <stdio.h>
int main(int argc, char **argv) {
int i=0;
#define MAX_NUM 4000U
int weird = (i-1)%MAX_NUM;
printf("%i\n", weird);
printf("%i\n", (i-1+MAX_NUM)%MAX_NUM);
printf("a: %i, b: %i, a from equation: %i\n", i-1, MAX_NUM,
((i-1)/MAX_NUM)*MAX_NUM + weird);
return 0;
}
Arithmetic in C always (short of some oddities with bit shift operators) promotes all operands to a common type before performing the operation. Thus:
(-1) % 4000U
is promoted as (assuming 32 bit ints):
0xffffffffu % 4000u
which yields 3295.
If you want to use modular arithmetic for array offsets that could be negative, you first need to abandon using unsigned arithmetic on the offsets. As such, your results will now be in the range -MAX_NUM+1 to MAX_NUM-1, due to C's ugly definition of signed integer division and remainder. If the code is not performance critical, just add if (result<0) result+=MAX_NUM; and be done with it. If you really need to avoid the branch (and you've measured to determine that you need to avoid it) then ask again how to optimize this computation and myself or someone brighter than me on SO will surely be able to help. :-)
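A minimal sketch of the signed-remainder-plus-fixup approach described above. The name wrap_index and the use of long for the index are assumptions made for illustration, not part of the original code, and len is assumed to be positive.

```c
#include <stddef.h>

/* Hypothetical helper: map any signed index i onto 0 .. len-1.
   Assumes len > 0. */
size_t wrap_index(long i, long len)
{
    long r = i % len;   /* signed remainder, in -(len-1) .. len-1 */
    if (r < 0)
        r += len;       /* shift negative remainders into range */
    return (size_t)r;
}
```

With this, a[wrap_index(-1, 4000)] addresses the last element of a 4000-element array, and indexes below -4000 still wrap correctly, which (a+b)%b does not manage.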
As 6.5.5 says, "The usual arithmetic conversions are performed on the operands." In the case of your example:
(-1) % 4000U
that means converting the -1 into an unsigned int. Your -1, therefore, is really being interpreted as 4294967295... for which the remainder is exactly what you're seeing: 3295.
The "Usual arithmetic conversions" are described in 6.3.1.8.
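To make the conversion visible, here is a small sketch. The function names are invented for this example; the only thing it relies on is that converting -1 to unsigned int yields UINT_MAX, whatever the width of int.

```c
#include <limits.h>

/* a is converted to unsigned int before the remainder is taken. */
unsigned rem_mixed(int a, unsigned b)
{
    return a % b;
}

/* The same computation with the conversion written out. */
unsigned rem_explicit(int a, unsigned b)
{
    return (unsigned)a % b;
}

/* Converting -1 to unsigned int always gives UINT_MAX. */
int minus_one_is_uint_max(void)
{
    return (unsigned)-1 == UINT_MAX;
}
```

Both functions return the same value for every input, so rem_mixed(-1, 4000U) is simply UINT_MAX % 4000U.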
Related
I'm trying to manipulate binary numbers in C. I found a strange thing with the minimal code below. Can anyone tell me what the difference is between "+" and "|" here? Thank you!
char next_byte1 = 0b11111111;
char next_byte2 = 0b11110101;
short a = (next_byte1 << 8) | next_byte2;
short b = (next_byte1 << 8) + next_byte2;
printf("a vs b is %d ~ %d.\n", a, b);
It showed: a vs b is -11 ~ -267, which is 0b11111111 11110101 and 0b11111110 11110101. I'm very confused with this result.
The problem you are seeing is because next_byte2 is sign-extended to a full int before doing the bitwise operation and thus is "corrupting" the high byte.
When doing bit manipulation it's better to use unsigned types (that is actually what unsigned are to be used for). Plain char types can be (and normally are) signed types and thus are better avoided for these uses.
You should never use char for binary/bitwise arithmetic, because it has implementation-defined signedness and might be negative. In general, use stdint.h over the default C types.
In case char is signed, then the value inside it ends up converted to -1 in two's complement during the variable initialization. This happens to next_byte1 and next_byte2 both.
Whenever you use a small integer type inside an expression, it is usually promoted to signed int. So your -1 (0xFF) gets changed to -1 (0xFFFFFFFF, assuming a 32-bit int) before you left shift.
Left shifting a negative operand is undefined behavior, meaning that any kind of bugs may rain all over your program. This happens in this case, so no results are guaranteed.
Apparently in your case, the undefined behavior manifested itself as you ending up with a large negative number with the binary representation 0xFFFFFF00.
The difference between | and + is that the latter cares about sign, so in case of + you end up adding negative numbers together, but in case of | the binary representations are simply OR:ed together.
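To illustrate, a small sketch using the values from the question as they look after promotion. The function names are made up for this example, and it assumes a two's complement int, where -256 has the bit pattern 0x...FF00 and -11 has the bit pattern 0x...FFF5.

```c
/* hi and lo are the operands as int values after promotion. */
int join_or(int hi, int lo)
{
    return hi | lo;   /* bit patterns are merged, no carries */
}

int join_add(int hi, int lo)
{
    return hi + lo;   /* ordinary signed addition */
}
```

Under that assumption, join_or(-256, -11) gives -11 and join_add(-256, -11) gives -267, matching the output in the question.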
You can fix the program in the following way:
#include <stdio.h>
#include <stdint.h>
int main(void)
{
uint8_t next_byte1 = 0xFF;
uint8_t next_byte2 = 0xF5;
uint16_t a = (next_byte1 << 8) | next_byte2;
uint16_t b = (next_byte1 << 8) + next_byte2;
printf("a vs b is %d ~ %d.\n", a, b);
}
And now | and + work identically, as intended.
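A slightly more defensive variant, as a sketch: converting the high byte to unsigned before shifting means the shift never happens on a signed int, so the code does not depend on int being wider than 16 bits. The name make_u16 is hypothetical.

```c
#include <stdint.h>

/* Combine two bytes into a 16-bit value without any signed shift. */
uint16_t make_u16(uint8_t hi, uint8_t lo)
{
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```

Because the two operands occupy disjoint bit positions, replacing | with + here would give the same result.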
Both answers should resolve your problem. I just want to add clarity around '|' and '+' operators.
'|' is the bitwise inclusive OR operator
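'+' is the addition operator; at the bit level it differs from the inclusive OR operator by propagating a CARRY between bit positions for multi-bit numbers.
They are not the same operator and one should not expect the same result in general. They agree only when the two operands have no set bits in common, so that no carry is generated; in your example the sign-extended operands do share set bits, which is why the results differ.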
Context
We are porting C code that was originally compiled using an 8-bit C compiler for the PIC microcontroller. A common idiom that was used in order to prevent unsigned global variables (for example, error counters) from rolling over back to zero is the following:
if(~counter) counter++;
The bitwise operator here inverts all the bits and the statement is only true if counter is less than the maximum value. Importantly, this works regardless of the variable size.
Problem
We are now targeting a 32-bit ARM processor using GCC. We've noticed that the same code produces different results. So far as we can tell, it looks like the bitwise complement operation returns a value that is a different size than we would expect. To reproduce this, we compile, in GCC:
uint8_t i = 0;
int sz;
sz = sizeof(i);
printf("Size of variable: %d\n", sz); // Size of variable: 1
sz = sizeof(~i);
printf("Size of result: %d\n", sz); // Size of result: 4
In the first line of output, we get what we would expect: i is 1 byte. However, the bitwise complement of i is actually four bytes which causes a problem because comparisons with this now will not give the expected results. For example, if doing (where i is a properly-initialized uint8_t):
if(~i) i++;
we will see i "wrap around" from 0xFF back to 0x00. This behaviour is different in GCC compared with when it used to work as we intended in the previous compiler and 8-bit PIC microcontroller.
We are aware that we can resolve this by casting like so:
if((uint8_t)~i) i++;
or, by
if(i < 0xFF) i++;
however in both of these workarounds, the size of the variable must be known and is error-prone for the software developer. These kinds of upper bounds checks occur throughout the codebase. There are multiple sizes of variables (eg., uint16_t and unsigned char etc.) and changing these in an otherwise working codebase is not something we're looking forward to.
Question
Is our understanding of the problem correct, and are there options available to resolving this that do not require re-visiting each case where we've used this idiom? Is our assumption correct, that an operation like bitwise complement should return a result that is the same size as the operand? It seems like this would break, depending on processor architectures. I feel like I'm taking crazy pills and that C should be a bit more portable than this. Again, our understanding of this could be wrong.
On the surface this might not seem like a huge issue but this previously-working idiom is used in hundreds of locations and we're eager to understand this before proceeding with expensive changes.
Note: There is a seemingly similar but not exact duplicate question here: Bitwise operation on char gives 32 bit result
I didn't see the actual crux of the issue discussed there, namely, the result size of a bitwise complement being different than what's passed into the operator.
What you are seeing is the result of integer promotions. In most cases where an integer value is used in an expression, if the type of the value is smaller than int the value is promoted to int. This is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or
unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
So if a variable has type uint8_t and the value 255, using any operator other than a cast or assignment on it will first convert it to type int with the value 255 before performing the operation. This is why sizeof(~i) gives you 4 instead of 1.
Section 6.5.3.3 describes that integer promotions apply to the ~ operator:
The result of the ~ operator is the bitwise complement of its
(promoted) operand (that is, each bit in the result is set if and only
if the corresponding bit in the converted operand is not set). The
integer promotions are performed on the operand, and the
result has the promoted type. If the promoted type is an unsigned
type, the expression ~E is equivalent to the maximum value
representable in that type minus E.
So assuming a 32 bit int, if counter has the 8 bit value 0xff it is converted to the 32 bit value 0x000000ff, and applying ~ to it gives you 0xffffff00.
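The effect can be checked with a short sketch. The function names are invented for illustration; the behavior shown only relies on int being wider than 8 bits, which the standard guarantees.

```c
#include <stdint.h>

/* ~v is computed on the promoted int, so the result is never 0 here. */
int complement_nonzero(uint8_t v)
{
    return ~v != 0;
}

/* Casting back to uint8_t discards the extra high bits. */
int narrowed_complement_nonzero(uint8_t v)
{
    return (uint8_t)~v != 0;
}
```

For v == 0xFF the first function returns 1 and the second returns 0, which is exactly the difference between the GCC build and the behavior the original code expected.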
Probably the simplest way to handle this without having to know the type is to check whether the value is 0 after incrementing, and if so decrement it.
if (!++counter) counter--;
The wraparound of unsigned integers works in both directions, so decrementing a value of 0 gives you the largest positive value.
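One way to apply this across a codebase is to wrap it in a macro. This is only a sketch: SAT_INC and the two wrapper functions are hypothetical names, and the macro is meant for lvalues of unsigned integer type only.

```c
#include <stdint.h>

/* Hypothetical macro: increment x, but stay at the maximum value.
   x must be an lvalue of unsigned integer type; it may be evaluated
   twice, so avoid arguments with side effects. */
#define SAT_INC(x) do { if (!++(x)) --(x); } while (0)

uint8_t sat_inc_u8(uint8_t v)
{
    SAT_INC(v);
    return v;
}

uint16_t sat_inc_u16(uint16_t v)
{
    SAT_INC(v);
    return v;
}
```

The same macro works for both widths because the result of ++ is converted back to the type of the variable before it is tested.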
In sizeof(i); you request the size of the variable i, so 1.
In sizeof(~i); you request the size of the type of the expression, which is an int, so 4 in your case.
Using
if(~i)
to test whether i is not 255 (in your case, with a uint8_t) is not very readable; just write
if (i != 255)
and you will have portable and readable code.
There are multiple sizes of variables (eg., uint16_t and unsigned char etc.)
To manage any size of unsigned :
if (i != (((uintmax_t) 2 << (sizeof(i)*CHAR_BIT-1)) - 1))
The expression is constant, so computed at compile time.
#include <limits.h> for CHAR_BIT and #include <stdint.h> for uintmax_t
Here are several options for implementing “Add 1 to x but clamp at the maximum representable value,” given that x is some unsigned integer type:
Add one if and only if x is less than the maximum value representable in its type:
x += x < Maximum(x);
See the following item for the definition of Maximum. This method
stands a good chance of being optimized by a compiler to efficient
instructions such as a compare, some form of conditional set or move,
and an add.
Compare to the largest value of the type:
if (x < ((uintmax_t) 2u << sizeof x * CHAR_BIT - 1) - 1) ++x;
(This calculates 2^N, where N is the number of bits in x, by shifting 2 by N−1 bits. We do this instead of shifting 1 by N bits because a shift by the number of bits in a type is not defined by the C standard. The CHAR_BIT macro may be unfamiliar to some; it is the number of bits in a byte, so sizeof x * CHAR_BIT is the number of bits in the type of x.)
This can be wrapped in a macro as desired for aesthetics and clarity:
#define Maximum(x) (((uintmax_t) 2u << sizeof (x) * CHAR_BIT - 1) - 1)
if (x < Maximum(x)) ++x;
Increment x and correct if it wraps to zero, using an if:
if (!++x) --x; // !++x is true if ++x wraps to zero.
Increment x and correct if it wraps to zero, using an expression:
++x; x -= !x;
This is nominally branchless (sometimes beneficial for performance), but a compiler may implement it the same as above, using a branch if needed but possibly with unconditional instructions if the target architecture has suitable instructions.
A branchless option, using the above macro, is:
x += 1 - x/Maximum(x);
If x is the maximum of its type, this evaluates to x += 1-1. Otherwise, it is x += 1-0. However, division is somewhat slow on many architectures. A compiler may optimize this to instructions without division, depending on the compiler and the target architecture.
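The options above can be tried out side by side. In this sketch the Maximum macro is the one from the text (with extra parentheses around the shift count), while the three function names are invented for the example. It assumes the unsigned types involved have no padding bits.

```c
#include <limits.h>
#include <stdint.h>

/* Largest value of the unsigned type of x, assuming no padding bits. */
#define Maximum(x) (((uintmax_t) 2u << (sizeof (x) * CHAR_BIT - 1)) - 1)

uint8_t clamp_inc_u8(uint8_t x)
{
    x += x < Maximum(x);        /* adds 1 or 0 */
    return x;
}

uint16_t clamp_inc_u16(uint16_t x)
{
    if (x < Maximum(x)) ++x;    /* explicit comparison with the maximum */
    return x;
}

uint32_t clamp_inc_u32(uint32_t x)
{
    ++x; x -= !x;               /* undo the wrap to zero */
    return x;
}
```

Each function leaves a value at the maximum of its type unchanged and otherwise adds one; which form compiles best depends on the compiler and target.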
Before stdint.h, variable sizes could vary from compiler to compiler, and the actual variable types in C are still int, long, etc., whose sizes are still chosen by the compiler author rather than by some standard or by target-specific assumptions. The compiler author(s) then need to create stdint.h to map the two worlds; that is the purpose of stdint.h: to map the uintN_t names onto int, long, short.
If you are porting code from another compiler and it uses char, short, int, long, then you have to go through each type and do the port yourself; there is no way around it. Either you end up with the right size for the variable, so that the declaration changes but the code as written works....
if(~counter) counter++;
or...supply the mask or typecast directly
if((~counter)&0xFF) counter++;
if((uint8_t)(~counter)) counter++;
At the end of the day, if you want this code to work you have to port it to the new platform. How you do it is your choice. Yes, you have to spend the time to hit each case and do it right; otherwise you are going to keep coming back to this code, which is even more expensive.
Before porting, identify the variable types in the code and what sizes they are, then isolate the variables that use this idiom (should be easy to grep) and change their declarations to stdint.h definitions, which hopefully won't change in the future. You would be surprised, but the wrong headers are sometimes used, so even put checks in so you can sleep better at night:
if(sizeof(uint8_t)!=1) return(FAIL);
And while that style of coding works (if(~counter) counter++;), for portability now and in the future it is best to use a mask to specifically limit the size (and not rely on the declaration). Do this when the code is written in the first place, or just finish the port, and then you won't have to re-port it some other day. Or, to make the code more readable, write x<0xFF or x!=0xFF or something like that; the compiler can optimize it into the same code it would produce for any of these solutions, and it is more readable and less risky...
Whether you try to find a quick solution or just touch the affected lines of code depends on how important the product is and on how many times you want to send out patches/updates, roll a truck, or walk to the lab to fix the thing. If it is only a hundred or so lines, that is not that big of a port.
6.5.3.3 Unary arithmetic operators
...
4 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
C 2011 Online Draft
The issue is that the operand of ~ is being promoted to int before the operator is applied.
Unfortunately, I don't think there's an easy way out of this. Writing
if ( counter + 1 ) counter++;
won't help because promotions apply there as well. The only thing I can suggest is creating some symbolic constants for the maximum value you want that object to represent and testing against that:
#define MAX_COUNTER 255
...
if ( counter < MAX_COUNTER ) counter++;
Consider the following program (C99):
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(void)
{
printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
intmax_t i;
if (scanf("%jd", &i) == 1)
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}
Now as I understand it, this contains easily triggerable undefined behaviour, like this:
Enter int in range -9223372036854775808 .. 9223372036854775807:
> -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808
Questions:
Is this really undefined behaviour, as in "the code is allowed to take any code path that strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)
If the result of imaxabs cannot be represented, which can happen when using two's complement, then the behavior is undefined.
7.8.2.1 The imaxabs function
The imaxabs function computes the absolute value of an integer j. If the result cannot
be represented, the behavior is undefined. 221)
221) The absolute value of the most negative number cannot be represented in two’s complement.
The check that makes no assumptions and is always defined is:
intmax_t i = ... ;
if( i < -INTMAX_MAX )
{
//handle error
}
(This if statement cannot be taken if using one's complement or sign-magnitude representation, so the compiler might give an unreachable code warning. The code itself is still defined and valid.)
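The check can be packaged together with the call, for example as in the sketch below. The function name checked_imaxabs and its bool-plus-out-parameter interface are assumptions for illustration; only imaxabs and INTMAX_MAX come from the standard library.

```c
#include <inttypes.h>
#include <stdbool.h>

/* Hypothetical wrapper: stores |i| in *out and returns true, or returns
   false without calling imaxabs when |i| is not representable. */
bool checked_imaxabs(intmax_t i, intmax_t *out)
{
    if (i < -INTMAX_MAX)
        return false;       /* only INTMAX_MIN on two's complement */
    *out = imaxabs(i);
    return true;
}
```

The caller decides how to report the failure case, so imaxabs is never reached with an argument whose absolute value cannot be represented.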
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
One method is to use unsigned integers. The overflow behaviour of unsigned integers is well-defined as is the behaviour when converting from a signed to an unsigned integer.
So I think the following should be safe (it turns out it's horribly broken on some really obscure systems; see later in the post for an improved version):
uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);
So how does this work?
uintmax_t j = i;
This converts the signed integer into an unsigned one. If it's positive the value stays the same; if it's negative the value increases by 2^n (where n is the number of bits). This converts it to a large number (larger than INTMAX_MAX).
if (j > (uintmax_t)INTMAX_MAX) {
If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.
j = -j;
The number is negated. The result of a negation is clearly negative and so cannot be represented as an unsigned integer. So it is increased by 2^n.
So algebraically the result for negative i looks like
j = -(i + 2^n) + 2^n = -i
Clever, but this solution makes assumptions. This fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by C Standard.
Hmm, let's look at this (I'm reading https://busybox.net/~landley/c99-draft.html, which is apparently the last C99 draft prior to standardisation; if anything changed in the final standard, please do tell me).
When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.
In 6.2.5 I see
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
In 6.2.6.2 I see
#1
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)
#2
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.
So yes it seems you are right, while the signed and unsigned types have to be the same size it does seem to be valid for the unsigned type to have one more padding bit than the signed type.
OK, based on the analysis above revealing a flaw in my first attempt, I've written a more paranoid variant. This has two changes from my first version.
I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means that the algorithm produces correct results for numbers greater than or equal to -INTMAX_MAX even when INTMAX_MAX == UINTMAX_MAX.
I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX -1 and i == INTMAX_MIN. This will result in j=0 inside the if condition, which we can easily test for.
It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.
uintmax_t j = i;
if (i < 0) {
j = -j;
if (j == 0) {
printf("your platform sucks\n");
exit(1);
}
}
printf("Result: |%jd| = %ju\n", i, j);
@plugwash I think 2501 is correct. For example, -UINTMAX_MAX value becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde
Umm,
assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX
uintmax_t j = i;
after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1
if (i < 0) {
i is less than zero so we run the commands inside the if
j = -j;
after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX
which is the correct answer, so no need to trap it in an error case.
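For the common case the whole idea fits in one small function. This sketch assumes UINTMAX_MAX > INTMAX_MAX, so it leaves out the j == 0 error branch of the paranoid version; the name uabs_max is hypothetical.

```c
#include <stdint.h>

/* Absolute value of i as an unsigned number.
   Assumes uintmax_t has at least one more value bit than intmax_t. */
uintmax_t uabs_max(intmax_t i)
{
    uintmax_t j = (uintmax_t)i;   /* negative i becomes i + 2^n */
    if (i < 0)
        j = -j;                   /* unsigned negation wraps, giving -i */
    return j;
}
```

Under that assumption, even INTMAX_MIN gets a correct result, because its magnitude INTMAX_MAX + 1 is representable in uintmax_t.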
On two's complement systems, getting the absolute value of the most negative number is indeed undefined behavior, as the absolute value would be out of range. And it's nothing the compiler can help you with, as the UB happens at run-time.
The only way to protect against that is to compare the input against the most negative value for the type (INTMAX_MIN in the code you show).
So calculating the absolute value of an integer invokes undefined behaviour in one single case. Actually, while the undefined behaviour can be avoided, it is impossible to give the correct result in one case.
Now consider multiplication of an integer by 3: Here we have a much more serious problem. This operation invokes undefined behaviour in 2/3rds of all cases! And for two thirds of all int values x, finding an int with the value 3x is just impossible. That's a much more serious problem than the absolute value problem.
You may want to use some bit hacks:
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
r = (v + mask) ^ mask;
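This works well when INT_MIN < v <= INT_MAX, on implementations where right-shifting a negative int is an arithmetic shift (this is implementation-defined). In the case where v == INT_MIN, v + mask overflows a signed int, which is formally undefined behavior; on typical two's complement machines the result simply remains INT_MIN.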
You can also use bitwise operation to handle this on ones' complement and sign-magnitude systems.
Reference: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
According to http://linux.die.net/man/3/imaxabs:
Notes
Trying to take the absolute value of the most negative integer is not defined.
To handle the full range you could add something like this to your code
if (i != INTMAX_MIN) {
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs( INTMAX_MIN) */
printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
Edit: As abs(INTMAX_MIN) cannot be represented on a two's complement machine, two values within the representable range are concatenated on output as a string.
Tested with gcc, though printf required %lld as %jd was not a supported format.
Is this really undefined behaviour, as in "the code is allowed to take any code path that strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
The behaviour of the program is only undefined when the bad number is successfully input and passed to imaxabs(), which on a typical two's complement system returns a negative result, as you observed.
That is the undefined behaviour in this case; the implementation would also be allowed to terminate the program with an overflow error if the ALU set status flags.
The reason for "undefined behaviour" in C is so compiler writers don't have to guard against overflow, so programs can run more efficiently. Whilst it is within the C standard for every C program using abs() to try to kill your first born just because you call it with a too-negative value, writing such code into the object file would simply be perverse.
The real problem with these undefined behaviours is that an optimising compiler can reason away naive checks in code like:
r = (i < 0) ? -i : i;
if (r < 0) { // This code may be pointless
// Do overflow recovery
doRecoveryProcessing();
} else {
printf("%jd", r);
}
As a compiler optimiser can reason that negative values are negated, it could in principle determine that (r < 0) is always false, so the attempt to trap the problem fails.
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
By far the best way, is simply to ensure that the program works on a valid range, so in this case validating the input suffices (disallow INTMAX_MIN).
Programs printing tables of abs() ought to avoid INT*_MIN and so on.
if (i != INTMAX_MIN) {
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs( INTMAX_MIN) */
printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
This appears to write out abs(INTMAX_MIN) by fakery, allowing the program to live up to its promise to the user.
Let us have
int a, b, c; // may be char or float, anything actually
c = a + b;
Let the int type be represented by 4 bytes. Let's say a+b requires 1 bit more than 4 bytes (i.e., let's say the result is 1 00....0 (32 zeroes, in binary)). This would result in c = 0, and I am sure the computer's microprocessor would set some kind of overflow flag. Is there any built-in method to check this in C?
I am actually working on building a number type that is 1024 bits long (for example, int is a built-in number type that is 32 bits long). I have attempted this using unsigned char type arrays with 128 elements. I also need to define addition and subtraction operations on these numbers. I have written the code for addition but I am having a problem with subtraction. I don't need to worry about getting negative results, because the way I will call the subtracting function always ensures that the result of the subtraction is positive, but to implement the subtraction function I need to somehow get the 2's complement of the subtrahend, which is itself my custom 1024-bit number.
I am sorry if my description is difficult to understand. If needed I will elaborate on it. I am including my code for the adding function and the incomplete subtracting function. NUM_OF_WORDS is a constant declared as
#define NUM_OF_WORDS 128
Please let me know if you did not understand my question or any part of my code.
PS: I don't see how to upload attachments in this forum so I am directing you to another website. My code may be found there
click on download in this page
Incidentally, I found this
I intend to replace INT_MAX by UCHAR_MAX as my 1024 bit numbers consist of array of char types (8-bit variable)
Is this check sufficient for all cases?
Update:
Yes I am working on Cryptography.
I need to implement a Montgomery Multiplication routine for 1024 bit size integers.
I had also considered using GMP library but couldn't find out how to use it.
I looked up a tutorial and after a few small modifications I was able to build the GMP project file in VC++ 6 which resulted in a lot of .obj files, but now I am not sure what to do with them.
Still it would be good if I can write my own data types, as it will give me complete control over how the arithmetic operations on my custom data type work, and I also need to be able to extend it from 1024 bits to larger numbers in the future.
If you're adding unsigned numbers then you can do this
c = a+b;
if (c<a) {
// you'll get here if and only if overflow has occurred
}
and you may even find that your compiler is clever enough to implement it by checking the overflow or carry flag instead of doing an extra comparison. For instance, I just fed this to gcc -O3 -S:
unsigned int foo() {
unsigned int x=g(), y=h();
unsigned int z = x+y;
return z<x ? 0 : z;
}
and got this for the key bit of the code:
movl $0, %edx
addl %ebx, %eax
cmovb %edx, %eax
where you'll notice there's no extra comparison instruction.
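For the 1024-bit type from the question, the carry has to be passed from one array element to the next. The following is only a sketch under assumptions that are not in the original code: the function name big_add, a least-significant-byte-first layout, and 8-bit bytes. It also detects the carry by doing each sum in a wider unsigned type, which is a different technique from the c < a test shown above.

```c
#include <stddef.h>

#define NUM_OF_WORDS 128

/* r = a + b for NUM_OF_WORDS-byte numbers stored least significant
   byte first (an assumption). Returns the carry out of the top byte,
   i.e. 1 if the true sum did not fit. Assumes 8-bit bytes. */
unsigned big_add(unsigned char *r,
                 const unsigned char *a,
                 const unsigned char *b)
{
    unsigned carry = 0;
    for (size_t i = 0; i < NUM_OF_WORDS; i++) {
        unsigned sum = (unsigned)a[i] + b[i] + carry;  /* at most 511 */
        r[i] = (unsigned char)(sum & 0xFFu);           /* low 8 bits */
        carry = sum >> 8;                              /* 0 or 1 */
    }
    return carry;
}
```

A nonzero return value plays the role of the processor's carry flag for the whole 1024-bit addition.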
Contrary to popular belief, an int overflow results in undefined behavior. This means that once a + b overflows, it doesn't make sense to use this value (or do anything else, for that matter). The wrap-around is just what most machines happen to do in case of overflow, but they might as well explode.
To check whether an int overflow will occur when adding two non-negative integers a and b, you can do the following:
if (INT_MAX - b < a) {
/* int overflow when evaluating a+b */
}
This is due to the fact that if a + b > INT_MAX, then INT_MAX - b < a, but INT_MAX - b can not overflow.
You will have to pay special attention to the case where b is negative, which is left as an exercise for the reader ;)
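One possible way to cover negative b as well is sketched below. The function name is hypothetical; the idea is the same as above, comparing against a bound that is itself computed without overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + b would overflow int. Nothing in here can
   overflow: INT_MAX - b is only evaluated for b > 0, and
   INT_MIN - b only for b < 0. */
bool int_add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}
```

The check has to run before the addition, since testing the result afterwards would already have triggered the undefined behavior.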
Regarding your actual goal: 1024-bit numbers suffer from exactly the same overall issues as 32-bit numbers. It might be more promising to choose a completely different approach, e.g. representing numbers as, say, linked lists of digits, using a very large base B. Usually, B is chosen such that B = sqrt(INT_MAX), so multiplication of digits doesn't overflow the machine's int type.
This way, you can represent arbitrarily large numbers, where "arbitrary" means "only limited by the amount of main memory available".
If you are working with unsigned numbers, then for a <= UINT_MAX and b <= UINT_MAX with a + b > UINT_MAX, the wrapped result c = (a + b) mod (UINT_MAX + 1) will always be smaller than both a and b. And this is the only case where this can happen.
So you can detect overflow this way.
int add_return_overflow(unsigned int a, unsigned int b, unsigned int* c) {
*c = a + b;
return *c < a && *c < b;
}
Information which may be useful on this subject:
Secure Coding in C and C++
IntSafe library
You can base a solution on a particular feature of the C language. According to the specification, when you add two unsigned ints, "the result value is congruent to the modulo 2^n of the true result" ("C - A reference manual" by Harbison and Steele). This means you can use some simple arithmetic checks to detect overflow:
#include <stdio.h>
int main() {
unsigned int a, b, c;
char *overflow;
a = (unsigned int)-1;
for (b = 0; b < 3; b++) {
c = a + b;
overflow = (c < a) ? "yes" : "no";
printf("%u + %u = %u, %s overflow\n", a, b, c, overflow);
}
return 0;
}
Just XOR the MSB of both operands and the result. The result of this operation is the overflow flag.
But that will not show you whether the result is correct or not (it might be a correct result even with overflow); for instance, 3 + (-1) is 2 with overflow.
In order to figure that out using signed arithmetic you need to check if both operands were the same sign (XOR of MSB).
Once you add 1 to INT_MAX, you end up getting INT_MIN (i.e. overflow).
In C, there's no reliable way to test for overflow, because all 32 bits are used to represent the integer (and not a state flag). You can only test to see if the number you get will be within a valid range, as in your link.
You'll get answers suggesting that you can test if (c < a), however note that you could overflow the value of a and/or b to the point where their addition forms a number greater than a (but still overflown)
I have the following code:
NSInteger index1 = (stop.timeIndex - 1); //This will be -1
index1 = index1 % [stop.schedule count]; // [stop.schedule count] = 33
So I have the expression -1 % 33. This should give me 32, but is instead giving me 3... I've double checked the values in the debugger. Does anyone have any ideas?
In C, the modulus operator doesn't work with negative numbers. (It gives a remainder, rather than doing modular arithmetic as its common name suggests.)
C99 says in Section 6.5.5 Multiplicative operators (bold mine):
The result of the / operator is the quotient from the division of the first operand by the
second; the result of the % operator is the remainder. In both operations, if the value of
the second operand is zero, the behavior is undefined.
When integers are divided, the result of the / operator is the algebraic quotient with any
fractional part discarded. If the quotient a/b is representable, the expression
(a/b)*b + a%b shall equal a.
It says that % is the remainder, and does not use the word "modulus" to describe it. In fact, the word "modulus" only occurs in three places in my copy of C99, and those all relate to the library and not to any operator.
It does not say anything that requires that the remainder be positive. If a positive remainder is required, then rewriting a%b as (a%b + b) % b will work for either sign of a and b and give a positive answer at the expense of an extra addition and division. It may be cheaper to compute it as m=a%b; if (m<0) m+=b; depending on whether missed branches or extra divisions are cheaper in your target architecture.
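Both rewrites from the paragraph above, as a sketch. The function names are invented here, and the sketch narrows the claim a little: it assumes b > 0, and for the first form also that a % b + b does not overflow int.

```c
/* Double-remainder form; assumes b > 0 and that a % b + b
   does not overflow int. */
int mod_double(int a, int b)
{
    return (a % b + b) % b;
}

/* Branching form; assumes b > 0. */
int mod_branch(int a, int b)
{
    int m = a % b;
    if (m < 0)
        m += b;
    return m;
}
```

With signed int operands both return 32 for a = -1, b = 33. If b is an unsigned count, as in the question, the operands need to be converted to a signed type first.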
Edit: I know nothing about Objective-C. Your original question was tagged C and all answers to date reflect the C language, although your example appears to be Objective-C code. I'm assuming that knowing what is true about C is helpful.
The results of using the mod operator on negative numbers are often unexpected. For example, this:
#include <stdio.h>
int main() {
int n = -1 % 33;
printf( "%d\n", n );
}
produces -1 with GCC, but I can't see why you expect the expression to evaluate to 32 - so it goes. It's normally better not to perform such operations, particularly if you want your code to be portable.