Memory management and overflow in C

I always wonder why C manages the memory the way it does.
Take a look at the following code:
#include <stdio.h>
int main() {
    int x = 10000000000;
    printf("%d", x);
}
Of course, overflow occurs and it returns the following number:
1410065408
Or:
#include <stdio.h>
int main() {
    int x = -10;
    printf("%u", x);
}
Here x is signed and I am printing it with the unsigned conversion specifier "%u".
Returns:
4294967286
Or take a look at this one:
#include <stdio.h>
int main() {
    char string_ = 'abc';
    printf("%d", string_);
}
This returns:
99
That being said, I mainly have two questions:
Why does the program return these specific numbers for these specific inputs? I don't think it is simple malfunctioning, because it produces the same result for the same input, so there must be a deterministic way for it to calculate these numbers. What is going on under the hood when I pass these obviously invalid values?
Most of these problems occur because C is not a memory-safe language. Wikipedia says:
In general, memory safety can be safely assured using tracing garbage
collection and the insertion of runtime checks on every memory access
Then, besides historical reasons, why are some languages not memory-safe? Is there any advantage to not being memory-safe?

Of course, overflow occurs and it returns the following number:
There is no overflow in int x = 10000000000;. Overflow in the C standard occurs when the result of an operation is not representable in the type. However, in int x = 10000000000;, 10,000,000,000 is converted to type int, and this conversion is defined to produce an implementation-defined result (that is implicitly representable in int) or to raise an implementation-defined signal (C 2018 6.3.1.3 3). So there is no result that is not representable in int.
You did not say which C implementation you are using (particularly the compiler), so we cannot be sure what the implementation defines for this conversion. For a 32-bit int, it is common that an implementation wraps the number modulo 2^32. The remainder of 10,000,000,000 when divided by 2^32 is 1,410,065,408, and that matches the result you observed.
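To see the arithmetic concretely, here is a minimal sketch, assuming a 32-bit int and an implementation (such as GCC or Clang) that wraps the out-of-range conversion modulo 2^32:

#include <stdio.h>
int main(void) {
    long long big = 10000000000LL;         /* too large for a 32-bit int */
    printf("%lld\n", big % 4294967296LL);  /* remainder modulo 2^32: 1410065408 */
    printf("%d\n", (int) big);             /* implementation-defined; commonly the same */
    return 0;
}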
4294967286
In this case, you passed an int where printf expected an unsigned int. The C standard does not define the behavior, but a common result is that the bits of the int are reinterpreted as an unsigned int. When two’s complement is used for a 32-bit int, the value −10 has the bits 0xFFFFFFF6. When the bits of an unsigned int have that value, they represent 4,294,967,286, and that matches the result you observed.
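For contrast, the conversion in the other direction is fully defined by the standard: converting a signed value to an unsigned type always wraps modulo one more than the unsigned type's maximum (C 2018 6.3.1.3 2). A small sketch, assuming a 32-bit unsigned int:

#include <stdio.h>
int main(void) {
    int x = -10;
    unsigned int u = (unsigned int) x;  /* defined: -10 + 2^32 = 4294967286 */
    printf("%u\n", u);
    return 0;
}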
char string_ = 'abc';
'abc' is a character constant with more than one character. Its value is implementation defined (C 2018 6.4.4.4 10). Again, since you did not tell us which implementation you are using, we cannot be sure what the definition is.
One common behavior for such constants is that 'abc' has the value ('a'*256 + 'b')*256 + 'c'. When ASCII is used, this is (97*256 + 98)*256 + 99 = 6,382,179. Then char string_ = 'abc'; converts this value to char. If char is unsigned and eight bits, the C standard defines this conversion to wrap modulo 2^8 = 256 (C 2018 6.3.1.3 2). If it is signed, the conversion is implementation-defined, and a common behavior is again to wrap modulo 256. With either of those two methods, the result is 99, as the remainder of 6,382,179 when divided by 256 is 99, and this matches the result you observed.
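A sketch that makes the two steps visible, assuming an ASCII implementation such as GCC or Clang that builds multi-character constants byte by byte (most compilers warn about such constants):

#include <stdio.h>
int main(void) {
    printf("%d\n", 'abc');          /* implementation-defined; often 6382179 */
    printf("%d\n", 6382179 % 256);  /* 99: the low byte that survives in a char */
    return 0;
}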
Most of these problems occur because C is not a memory-safe language.
None of the above has anything to do with memory safety. None of the constants or the conversions access memory, so they are not affected by memory safety.

Related

Is casting a positive double to unsigned int implementation defined or portable?

I'm trying to cast a double to an unsigned int. The double is guaranteed to be >= 0. Different compilers yield different results, though. Here is the example:
#include <stdio.h>
int main(int argc, char *argv[])
{
    double x = 5140528219;
    unsigned int y = (unsigned int) x;
    printf("%x\n", y);
    return 0;
}
Visual C seems to simply kill all bits >= 32 because it converts the double to 0x32663c5b. Gcc, however, seems to clip the whole number to UINT_MAX because the result is 0xffffffff.
Now there are some threads that mention compiler bugs in Visual C when it comes to converting double to unsigned int so I was wondering whether the behaviour I'm seeing here is a bug in Visual C as well or whether the conversion of double to unsigned int is just implementation dependent and thus undefined?
Any ideas?
My Visual C version is quite old (15.00.30729.01 for x64).
C 2018 6.3.1.4 1 says:
When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
Therefore, if the value 5,140,528,219 is representable in unsigned int, it is the result. Otherwise, the C standard does not define the behavior.
In typical current C implementations, unsigned int is 32 bits or narrower, so it cannot represent numbers greater than 4,294,967,295. In such implementations, the behavior of converting 5,140,528,219 to unsigned int is not defined by the C standard.
Behavior around code like this has been observed to differ between compile time and execution time. When compiling, if the compiler can deduce the value of the operand being converted, it may use its own arithmetic code to compute the result. If it cannot, it may generate an instruction that performs the conversion, and that instruction may produce a different result than the compiler’s built-in arithmetic. Thus, the behavior you observe may vary depending on whether optimization is enabled and on the context in which the conversion appears.
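A hypothetical way to observe this difference is to force one conversion to happen at execution time, for example with volatile, and compare it against the compile-time-folded one. Both conversions are undefined by the standard here, so the two lines may legitimately disagree:

#include <stdio.h>
int main(void) {
    double folded = 5140528219.0;            /* value visible to the compiler */
    volatile double runtime = 5140528219.0;  /* volatile blocks constant folding */
    printf("%x\n", (unsigned int) folded);   /* compiler's own arithmetic */
    printf("%x\n", (unsigned int) runtime);  /* hardware conversion instruction */
    return 0;
}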

char data type is not giving negative numbers by adding a positive number to the character which is holding maximum positive value

I am trying to add '1' to a character which is holding maximum positive value it can hold. It is giving 0 as output instead of giving -256.
#include <stdio.h>
int main() {
    signed char c = 255;
    printf("%d\n", c + 1);
}
Output: 0
c + 2 = 1;
c + 3 = 2;
As per my understanding, it should give negative numbers once it reaches the maximum limit. Is this correct? I am testing on Ubuntu.
A signed char is very often 8 bits, encoding the values [-128...127].
signed char c = 255; is attempting to initialize c to a value outside the signed char range.
What happens next is implementation-defined behavior. Very commonly, 255 is converted "mod" 256 to the value -1.
signed char c = 255;
printf("%d\n", c ); // -1 expected
printf("%d\n", c + 1 ); // 0 expected
As per my understanding, it should give negative numbers once it reaches the maximum limit. Is this correct?
No. Adding 1 to the maximum int value is undefined behavior. There is no should. It might result in a negative number, it might not, it might even trap; it is not defined.
Had code been
signed char c = 127;
printf("%d\n", c + 1 );
c + 1 would be 128 and "128\n" would be printed, since c + 1 is an int operation with an in-range int sum.
There are several implicit conversions to keep track of here:
signed char c = 255; is a conversion of the constant 255, which has type int, into the smaller type signed char. This is "lvalue conversion through assignment" (initialization follows the rules of assignment), where the right operand gets converted to the type of the left.
The actual conversion from a large signed type to a small signed type follows this rule:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
In practice, the very likely conversion to happen on a two's complement computer is that you end up with the signed char having the decimal value equivalent to 0xFF, which is -1.
c + 1 is an operation with two operands of types signed char and int respectively. For the + operator, it means that the usual arithmetic conversions are performed, see Implicit type promotion rules.
Meaning c gets converted to int and the operation is carried out on int type, which is also the type of the result.
printf("%d\n", stuff ); The functions like printf accepting a variable number of arguments undergo an oddball conversion rule called the default argument promotions. In case of integers, it means that the integer promotions (see link above) are carried out. If you pass c + 1 as parameter, then the type is int and no promotion takes place. But if you had just passed c, then it gets implicitly promoted to int as per these rules. Which is why using %d together with character type actually works, even though it's the wrong conversion specifier for printing characters.
As per my understanding, it should give negative numbers once it reaches the maximum limit. Is this correct?
If you simply do signed char c = 127; c++; then the increment is carried out in int, but storing 128 back into c is an out-of-range conversion to a signed type: implementation-defined, with no portable outcome.
If you do signed char c = 127; ... c + 1 then there's no overflow because of the implicit promotion to int.
If you do unsigned char c = 255; c++; then there is a well-defined wrap around since this is an unsigned type. c will become zero. Signed types do not have such a well-defined wrap around - they overflow instead.
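The three cases side by side, as a sketch (the signed char conversion result assumes a common 2's complement implementation such as gcc):

#include <stdio.h>
int main(void) {
    signed char c = 127;
    printf("%d\n", c + 1);  /* 128: the addition is done in int, no overflow */

    unsigned char u = 255;
    u++;                    /* well-defined wrap for unsigned types */
    printf("%d\n", u);      /* 0 */

    signed char s = 127;
    s++;                    /* 128 converted back to signed char:
                               implementation-defined, commonly -128 */
    printf("%d\n", s);
    return 0;
}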
In practice, signed number overflow is artificial nonsense invented by the C standard. All well-known computers just set an overflow and/or carry bit when you overflow at the assembly level, properly documented and well-defined by the core manual. The reason it turns into "undefined behavior" in C is mainly because C allows for nonsensical signedness formats like one's complement or signed magnitude, which may have padding bits, trap representations, or other such exotic, mostly fictional stuff.
Though nowadays, optimizing compilers take advantage of overflow not being allowed to happen, in order to generate more efficient code. Which is unfortunate, since we could have had both fast and 100% deterministic code if 2's complement was the only allowed format.

Is C integer math equivalent to unsigned math?

So can I cast the values to unsigned values, do the operation, and cast back, and get the same result? I want to do this because unsigned integers can overflow, while signed ones can't.
Unsigned integer arithmetic does not overflow in C terminology because it is defined to wrap modulo 2^N, where N is the number of bits in the unsigned type being operated on, per C 2018 6.2.5 9:
… A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
For other types, if an overflow occurs, the behavior is not defined by the C standard, per 6.5 5:
If an exceptional condition occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not in the range of representable values for its type), the behavior is undefined.

Note that not just the result is undefined; the entire behavior of the program is undefined. It could give a result you do not expect, it could trap, or it could execute entirely different code from what you expect.
Regarding your question:
So can I cast the values to unsigned values, do the operation and cast back, and get the same result?
we have two problems. First, consider a + b given int a, b;. If a + b overflows, then the behavior is not defined by the C standard. So we cannot say whether converting to unsigned, adding, and converting back to int will produce the same result because there is no defined result for a + b to start with.
Second, the conversion back is partly implementation-defined, per C 6.3.1.3. Consider int c = (unsigned) a + (unsigned) b;, which implicitly converts the unsigned sum to an int to store in c. Paragraph 1 tells us that, if the value of the sum is representable in int, it is the result of the conversion. But paragraph 3 tells us what happens if the value is not representable in int:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
GCC, for example, defines the result to be the result of wrapping modulo 2^N. So, for int c = (unsigned) a + (unsigned) b;, GCC will produce the same result as int c = a + b; would if a + b wrapped modulo 2^N. However, GCC does not guarantee the latter. When optimizing, GCC expects overflow will not occur, which can result in it eliminating any code branches where the program does allow overflow to occur. (GCC has options, such as -fwrapv, that change its treatment of signed overflow.)
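As a sketch of the unsigned detour under GCC's documented wrap-on-conversion definition (assuming a 32-bit int):

#include <stdio.h>
int main(void) {
    int a = 2000000000, b = 2000000000;
    /* a + b would overflow int (undefined behavior), but the unsigned
       sum is well defined, and GCC defines the conversion back to int
       as wrapping modulo 2^32. */
    int c = (int) ((unsigned) a + (unsigned) b);
    printf("%d\n", c);  /* -294967296 under that definition */
    return 0;
}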
Additionally, even if both signed arithmetic and unsigned arithmetic wrap, performing an operation using unsigned values and converting back does not mathematically produce the same result as doing the operation with signed values. For example, consider -3/2. The int result is −1. But if -3 is converted to 32-bit unsigned, the resulting value is 2^32 − 3, and then (int) ((unsigned) -3 / (unsigned) 2) is 2^31 − 2 = 2,147,483,646.
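The division example, runnable as a sketch assuming a 32-bit int:

#include <stdio.h>
int main(void) {
    int a = -3, b = 2;
    printf("%d\n", a / b);                               /* -1 */
    printf("%d\n", (int) ((unsigned) a / (unsigned) b)); /* 2147483646 */
    return 0;
}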

Cyclic Nature of char datatype [duplicate]

This question already has answers here:
Simple Character Interpretation In C
(9 answers)
Closed 5 years ago.
I have been learning C and came across a topic called the Cyclic Nature of Data Types in C. Here is an example:
#include <stdio.h>
int main() {
    char c = 125;
    c = c + 10;
    printf("%d", c);
}
The output is -121.
The logic given was:
125 + 1 = 126
125 + 2 = 127
125 + 3 = -128
125 + 4 = -127
125 + 5 = -126
125 + 6 = -125
125 + 7 = -124
125 + 8 = -123
125 + 9 = -122
125 + 10 = -121
This is due to the cyclic nature of the char datatype. Why does char exhibit this cyclic nature? How is it possible for char?
On your system, char is signed char. When a signed integral type overflows, the result is undefined; it may or may not be cyclic. On most machines, which perform 2's complement arithmetic, you will observe this cyclic behavior.
The char data type is a signed type in your implementation. As such, it can store values in the range -128 to 127. When you store a value greater than 127, you end up with a value that might be negative or positive, depending on how large the stored value is and what kind of platform you are working on.
Signed integer overflow is undefined behavior in C; only unsigned integers are guaranteed to wrap around.
char isn’t special in this regard (besides its implementation-defined signedness); all conversions to signed types usually exhibit this “cyclic nature”. However, there are undefined and implementation-defined aspects of signed overflow, so be careful when doing such things.
What happens here:
In the expression
c = c + 10
the operands of + are subject to the usual arithmetic conversions. These include the integer promotions, which convert all values to int if all values of their type can be represented as an int. This means the left operand of + (c) is converted to an int (an int can hold every char value 1)). The result of the addition has type int. The assignment implicitly converts this value to a char, which happens to be signed on your platform. An (8-bit) signed char cannot hold the value 135, so it is converted in an implementation-defined way 2). For gcc:
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
Your char has a width of 8, 2^8 is 256, and 135 ≡ -121 (mod 256) (cf. e.g. 2’s complement on Wikipedia).
You didn’t say which compiler you use, but the behaviour should be the same for all compilers (there aren’t really any non-2’s-complement machines anymore and with 2’s complement, that’s the only reasonable signed conversion definition I can think of).
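Making the promotion and the conversion explicit shows where the "cycle" comes from; a sketch assuming gcc's modulo-256 conversion rule for a signed 8-bit char:

#include <stdio.h>
int main(void) {
    char c = 125;
    int sum = c + 10;  /* the addition happens in int: 135 */
    c = (char) sum;    /* out-of-range conversion back to char:
                          implementation-defined, -121 with gcc */
    printf("%d %d\n", sum, c);  /* prints: 135 -121 */
    return 0;
}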
Note, that this implementation-defined behaviour only applies to conversions, not to overflows in arbitrary expressions, so e.g.
int n = INT_MAX;
n += 1;
is undefined behaviour and used for optimizations by some compilers (e.g. by optimizing such statements out), so such things should definitely be avoided.
A third case (unrelated here, but for the sake of completeness) is unsigned integer types: no overflow occurs (there are exceptions, however, e.g. bit-shifting by more than the width of the type); the result is always reduced modulo 2^N for a type with precision N.
Related:
A simple C program output is not as expected
Allowing signed integer overflows in C/C++
1) At least for 8-bit chars, signed chars, or ints with higher precision than char, so virtually always.
2) The C standard says (C99 and C11 (n1570) 6.3.1.3 p.3) “[…] either the result is implementation-defined or an implementation-defined signal is raised.” I don’t know of any implementation raising a signal in this case. But it’s probably better not to rely on that conversion without reading the compiler documentation.

Why use "strlen30()" instead of "strlen()"?

I've been reading the source code of SQLite and wondered about this function:
static int strlen30(const char *z){
    const char *z2 = z;
    while( *z2 ){ z2++; }
    return 0x3fffffff & (int)(z2 - z);
}
Why use strlen30() instead of strlen() (in string.h)?
The commit message that went in with this change states:
[793aaebd8024896c] part of check-in [c872d55493] Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007) (user: drh branch: trunk)
(This is my answer from Why reimplement strlen as loop+subtraction?, but that question was closed.)
I can't tell you the reason why they had to re-implement it, or why they chose int instead of size_t as the return type. But about the function:
/*
** Compute a string length that is limited to what can be stored in
** lower 30 bits of a 32-bit signed integer.
*/
static int strlen30(const char *z){
    const char *z2 = z;
    while( *z2 ){ z2++; }
    return 0x3fffffff & (int)(z2 - z);
}
Standard References
The standard says in (ISO/IEC 14882:2003(E)) 3.9.1 Fundamental Types, 4.:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. 41)
...
41): This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type
That part of the standard does not define overflow-behaviour for signed integers. If we look at 5. Expressions, 5.:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined, unless such an expression is a constant expression (5.19), in which case the program is ill-formed. [Note: most existing implementations of C++ ignore integer overflows. Treatment of division by zero, forming a remainder using a zero divisor, and all floating point exceptions vary among machines, and is usually adjustable by a library function.]
So much for overflow.
As for subtracting two pointers to array elements, 5.7 Additive operators, 6.:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as ptrdiff_t in the cstddef header (18.1). [...]
Looking at 18.1:
The contents are the same as the Standard C library header stddef.h
So let's look at the C standard (I only have a copy of C99, though), 7.17 Common Definitions:
The types used for size_t and ptrdiff_t should not have an integer conversion rank greater than that of signed long int unless the implementation supports objects large enough to make this necessary.
No further guarantee made about ptrdiff_t. Then, Annex E (still in ISO/IEC 9899:TC2) gives the minimum magnitude for signed long int, but not a maximum:
#define LONG_MAX +2147483647
Now what are the maxima for int, the return type for sqlite - strlen30()? Let's skip the C++ quotation that forwards us to the C-standard once again, and we'll see in C99, Annex E, the minimum maximum for int:
#define INT_MAX +32767
Summary
Usually, ptrdiff_t is not bigger than signed long, which is not smaller than 32 bits.
int is only defined to be at least 16 bits long.
Therefore, subtracting two pointers may give a result that does not fit into the int of your platform.
We remember from above that for signed types, a result that does not fit yields undefined behaviour.
strlen30() applies a bitwise AND to the pointer-subtraction result:
         | 32 bit                         |
ptr_diff |10111101111110011110111110011111| // could be even larger
       & |00111111111111111111111111111111| // == 0x3FFFFFFF
         ----------------------------------
       = |00111101111110011110111110011111| // truncated
That prevents undefined behaviour by truncating the pointer-subtraction result to a maximum value of 0x3FFFFFFF = 1,073,741,823.
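The same masking can be tried in isolation; a hypothetical illustration using the bit pattern from the diagram above:

#include <stdio.h>
int main(void) {
    long long diff = 0xBDF9EF9FLL;      /* a pointer difference larger than INT_MAX */
    int n = (int) (0x3fffffff & diff);  /* the mask clears the top two bits first */
    printf("%x\n", n);                  /* 3df9ef9f, comfortably within int range */
    return 0;
}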
I am not sure why they chose exactly that value, because on most machines only the most significant bit tells the signedness. It could have made sense, with respect to the standard, to choose the minimum INT_MAX, but 1,073,741,823 is indeed slightly strange without knowing more details (though it of course does exactly what the comment above the function says: truncate to 30 bits and prevent overflow).
The CVS commit message says:
Never use strlen(). Use our own internal sqlite3Strlen30() which is guaranteed to never overflow an integer. Additional explicit casts to avoid nuisance warning messages. (CVS 6007)
I couldn't find any further reference to this commit or explanation how they got an overflow in that place. I believe that it was an error reported by some static code analysis tool.
