When can I get away with not declaring int with signed?

In C, signed integers like -1 are supposedly meant to be declared with the keyword signed, like so:
signed int i = -1;
However, I tried this:
signed int i = -2;
unsigned int i = -2;
int i = -2;
and all 3 cases print out -2 with printf("%d", i);. Why?

Since you confirmed you are printing using:
printf("%d", i);
this is undefined behavior in the unsigned case. This is covered in the draft C99 standard, section 7.19.6.1 The fprintf function, which also covers printf's format specifiers; it says in paragraph 9:
If a conversion specification is invalid, the behavior is undefined. [...]
The standard defines undefined behavior in section 3.4.3 as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
and further notes:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Finally, we can see that int is the same as signed int by going to section 6.7.2 Type specifiers; in paragraph 2 it groups int as follows:
int, signed, or signed int
and paragraph 5 later says:
Each of the comma-separated sets designates the same type, except that for bit-field[...]
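As a quick illustration (my own minimal sketch, assuming a 32-bit int), the matching specifiers look like this:

#include <stdio.h>

int main(void)
{
    signed int s = -2;
    unsigned int u = -2;   /* -2 is converted; u holds 4294967294 with 32-bit int */

    printf("%d\n", s);     /* matching specifier: prints -2 */
    printf("%u\n", u);     /* matching specifier: prints 4294967294 */
    return 0;
}

Printing u with %d, as in the question, is where the behavior becomes undefined.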

The way an integer variable is printed is subject to the format string that you pass to printf:
If you use %d, then you'll be printing it as a signed integer.
If you use %u, then you'll be printing it as an unsigned integer.

printf has no way of knowing what you pass to it. The C compiler performs the default argument promotions when passing the arguments, and then the function itself reinterprets the values in accordance with the format specifiers that you pass, because it has no other information about the type of the value that you passed.
When you pass an unsigned int to printf in a position of %d, it is undefined behavior. Your program is incorrect, and it could print anything.
It happens that on hardware that represents negative numbers in two's complement you get back the same number that you started with. However, this is not a universal rule.

unsigned int i = -2; // i actually holds 4294967294
printf("%d", i); // printf reinterprets the bits of i as an int, which is -2, hence the same output

You've got 2 things going on:
Signed and unsigned are different ways of interpreting the same 64 (or 32, or whatever) bits.
Printf is a variadic function which accepts parameters of different types
You passed a signed value (-2) to an unsigned variable, and then asked printf to interpret it as signed.
Remember that "signed" and "unsigned" have to do with how arithmetic is done on the numbers.
The printf family internally reinterprets whatever you pass in based on the format specifiers. (This is the nature of variadic functions that accept more than one type of parameter; they cannot use traditional type-safety mechanisms.)
This is all very well, but not all things will work the same.
Addition and subtraction work the same on most architectures (as long as you're not on some oddball architecture that doesn't use 2's complement for representing negative numbers).
Multiplication and division may also work the same.
Inequality comparisons are the hardest to predict, and I have been bitten a number of times doing a comparison between signed and unsigned that I thought would be OK because the values were in the small signed-number range (see the sketch after this answer).
That's what "undefined" means. Behaviour is left to the compiler and hardware implementers and cannot be relied upon to be the same between architectures or even over time on the same architecture.
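A minimal sketch of that comparison pitfall (my own example, assuming a 32-bit int): when a signed and an unsigned operand meet, the usual arithmetic conversions turn the signed one into unsigned, so a small negative number compares as a huge value.

#include <stdio.h>

int main(void)
{
    int i = -1;
    unsigned int u = 1;

    /* i is converted to unsigned int (4294967295 with 32-bit int),
       so the comparison below is false. */
    if (i < u)
        printf("-1 < 1u\n");
    else
        printf("-1 >= 1u (surprise!)\n");   /* this branch is taken */
    return 0;
}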

Related

Effect of type casting on printf function

Here is a question from my book,
Actually, I don't know what the effect on the printf function will be, so I tried the statements on an actual C implementation. Here is my code:
#include <stdio.h>

int main(void) {
    int x = 4;
    printf("%hi\n", x);
    printf("%hu\n", x);
    printf("%i\n", x);
    printf("%u\n", x);
    printf("%li\n", x);
    printf("%lu\n", x);
}
So, the output is very simple. But is this really the solution to the above problem?
There are numerous problems in this question that make it unsuitable for teaching C.
First, to work on this problem at all, we have to assume a non-standard C implementation is used. In standard C, %x is a complete conversion specification, so %xu and %xd cannot be; the conversion specification has already ended before the u or d. And the use of z in a conversion specification interferes with its standard use for size_t.
Nonetheless, let’s assume this C variant does not have those standard conversion specifications and instead uses the ones shown in the table but that this C variant otherwise conforms to the C standard with minimal changes.
Our next problem is that, in Y num = 42;, we have a plain Y, not the signed Y or unsigned Y shown in the table. Let’s assume signed Y is intended.
Then num is a signed four-bit integer. The greatest value it can represent is 0111₂ = 7₁₀. So it cannot represent 42. Attempting to initialize it with 42 results in a conversion specified by C 2018 6.3.1.3, which says, in part:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The result is we do not know what value is in num or even whether the program continues to execute; it may trap and terminate.
Well, let's assume this implementation just takes the low bits of the value. 42 is 101010₂, so its low four bits are 1010. So if the bits in num are 1010, it is negative. The C standard permits several methods of representation for negative numbers, but we will assume the overwhelmingly most common one, two's complement, so the bits 1010 in num represent −6.
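A four-bit signed integer can be emulated with a bit-field; this small sketch (my own illustration, not part of the book problem) shows the implementation-defined truncation that commonly yields −6 on two's complement compilers:

#include <stdio.h>

struct four_bit {
    signed int v : 4;   /* four-bit signed bit-field */
};

int main(void)
{
    struct four_bit num;

    /* 42 is 101010 in binary; storing it in a 4-bit signed field is
       implementation-defined and commonly keeps only the low bits 1010,
       which is -6 in two's complement. */
    num.v = 42;
    printf("%d\n", num.v);   /* commonly prints -6 */
    return 0;
}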
Now, we get to the printf statements. Except the problem text shows Printf, which is not defined by the C standard. (Are you sure this problem relates to C code at all?) Let’s assume it means printf.
In printf("%xu",num);, if the conversion specification is supposed to work like the ones in standard C, then the corresponding argument should be an unsigned X value that has been promoted to int for the function call. As a two-bit unsigned integer, an unsigned X can represent 0, 1, 2, or 3. Passing it −6 is not defined. So we do not know what the program will print. It might take just the low two bits, 10, and print “2”. Or it might use all the bits and print “-6”. Both of those would be consistent with the requirement that the printf behave as specified for values that are in the range representable by unsigned X.
In printf("%xd",num); and printf("%yu",num);, the same problem exists.
In printf("%yd",num);, we are correctly passing a signed Y value for a signed Y conversion specification, so “-6” is printed.
Then printf("%zu",num); has the same problem with the value mismatched for the type.
Finally, in printf("%zd",num);, the value is again in the correct range, and “-6” is printed.
From all the assumptions we had to make and all the points where the behavior is undefined, you can see this is a terrible exercise. You should question the quality of the book it is in and of any school using it.

Weird results with type conversion of literals and return values in C

Two questions regarding this simple code:
#include <stdio.h>

float foo() {
    return 128.0;
}

int main() {
    char x = (char) 128;
    char y = (char) 128.0;
    char z = (char) foo();
    printf("x: %d \ny: %d \nz: %d", x, y, z);
}
My output:
x: -128
y: 127
z: -128
Why is 128.0 prevented from overflowing when converted to a char and why isn't it when it's the return-value of a function rather than a literal? I'm happy to really get into detail if an adequate answer requires it :)
(I'm using gcc-4.8.1 and don't use any options for compiling, just gcc file.c)
Your C implementation appears to have a signed eight-bit char type. 128 cannot be represented in this type.
In char x = (char) 128;, there is a conversion from the int value 128 to the char type. For this integer-to-integer conversion, C 2018 6.3.1.3 3 says:
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
“Implementation-defined” means a conforming C implementation is required to document how it behaves for this conversion and must abide by that. Your implementation apparently wraps modulo 256, with the result that converting 128 to char produces −128.
In char y = (char) 128.0;, there is a conversion from the double value 128 to the char type. For this floating-point-to-integer conversion, C 2018 6.3.1.4 2 says:
If the value being converted is outside the range of values that can be represented, the behavior is undefined.
“Undefined” means the C standard does not impose any requirements; a C implementation does not have to document its behavior and does not have to abide by any particular rules. In particular, the behavior may be different in different situations.
We see this in that char y = (char) 128.0; and char z = (char) foo(); produced different results (127 and −128). A common reason for this is that, since a constant is used, the compiler may have evaluated (char) 128.0 itself using internal software that clamps out-of-range results to the limits of the type, so the out-of-range 128 resulted in the maximum possible 127, and y was initialized to that. On the other hand, for (char) foo(), the compiler may have generated instructions to perform the conversion at run time, and those instructions behaved differently from the internal compiler evaluation and wrapped modulo 256, producing −128.
Since these behaviors are not specified by the C standard or, likely, your compiler’s documentation, you should not rely on them.
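One way to stay inside defined behavior is to check the range before converting. A minimal sketch (my own, with an assumed clamp-to-limits policy):

#include <limits.h>
#include <stdio.h>

/* Convert a double to char, clamping to the representable range so the
   floating-to-integer conversion is always in range and therefore defined. */
static char double_to_char(double d)
{
    if (d > CHAR_MAX)
        return CHAR_MAX;
    if (d < CHAR_MIN)
        return CHAR_MIN;
    return (char)d;
}

int main(void)
{
    printf("%d\n", double_to_char(128.0));   /* prints 127 where char is signed 8-bit */
    printf("%d\n", double_to_char(-300.0));  /* prints -128 on such systems */
    return 0;
}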
The result for y is undefined behaviour.
https://godbolt.org/z/3x68ze
As you see different compilers give different results.
z is also undefined behaviour, but without optimisations enabled even this very trivial example is evaluated at run time. If you enable optimisations the results will be the same (because the compiler chooses the result).
https://godbolt.org/z/WaPE4r

Why can I printf with the wrong specifier and still get output?

My question involves the memory layout and mechanics behind the C printf() function. Say I have the following code:
#include <stdio.h>

int main()
{
    short m_short;
    int m_int;

    m_int = -5339876;
    m_short = m_int;

    printf("%x\n", m_int);
    printf("%x\n", m_short);
    return 0;
}
On GCC 7.5.0 this program outputs:
ffae851c
ffff851c
My question is, where is the ffff actually coming from in the second hex number? If I'm correct, those fs should be outside the bounds of the short, but printf is getting them from somewhere.
When I properly format with specifier %hx, the output is rightly:
ffae851c
851c
As far as I have studied, the compiler simply truncates the top half of the number, as shown in the second output. So in the first output, do the first four fs come from the program actually reading memory that it shouldn't? Or does the C compiler behind the scenes still reserve a full integer even for a short, sign-extended, with the high half being undefined behavior if used?
Note: I am performing research, in a real-world application, I would never try to abuse the language.
When a char or short (including signed and unsigned versions) is used as a function argument where there is no specific type (as with the ... arguments to printf(format,...))1, it is automatically promoted to an int (assuming it is not already as wide as an int2).
So printf("%x\n", m_short); has an int argument. What is the value of that argument? In the assignment m_short = m_int;, you attempted to assign it the value −5339876 (represented with bytes 0xffae851c). However, −5339876 will not fit in this 16-bit short. In assignments, a conversion is automatically performed, and, when a conversion of an integer to a signed integer type does not fit, the result is implementation-defined. It appears your implementation, as many do, uses two’s complement and simply takes the low bits of the integer. Thus, it puts the bytes 0x851c in m_short, representing the value −31460.
Recall that this is being promoted back to int for use as the argument to printf. In this case, it fits in an int, so the result is still −31460. In a two’s complement int, that is represented with the bytes 0xffff851c.
Now we know what is being passed to printf: An int with bytes 0xffff851c representing the value −31460. However, you are printing it with %x, which is supposed to receive an unsigned int. With this mismatch, the behavior is not defined by the C standard. However, it is a relatively minor mismatch, and many C implementations let it slide. (GCC and Clang do not warn even with -Wall.)
Let’s suppose your C implementation does not treat printf as a special known function and simply generates code for the call as you have written it, and that you later link this program with a C library. In this case, the compiler must pass the argument according to the specification of the Application Binary Interface (ABI) for your platform. (The ABI specifies, among other things, how arguments are passed to functions.) To conform to the ABI, the C compiler will put the address of the format string in one place and the bits of the int in another, and then it will call printf.
The printf routine will read the format string, see %x, and look for the corresponding argument, which should be an unsigned int. In every C implementation and ABI I know of, an int and an unsigned int are passed in the same place. It may be a processor register or a place on the stack. Let’s say it is in register r13. So the compiler designed your calling routine to put the int with bytes 0xffff851c in r13, and the printf routine looked for an unsigned int in r13 and found bytes 0xffff851c.
So the result is that printf interprets the bytes 0xffff851c as if they were an unsigned int, formats them with %x, and prints “ffff851c”.
Essentially, you got away with this because (a) a short is promoted to an int, which is the same size as the unsigned int that printf was expecting, and (b) most C implementations are not strict about mismatching integer types of the same width with printf. If you had instead tried printing an int using %ld, you might have gotten different results, such as “garbage” bits in the high bits of the printed value. Or you might have a case where the argument you passed is supposed to be in a completely different place from the argument printf expected, so none of the bits are correct. In some architectures, passing arguments incorrectly could corrupt the stack and break the program in a variety of ways.
Footnotes
1 This automatic promotion happens in many other expressions too.
2 There are some technical details regarding these automatic integer promotions that need not concern us at the moment.
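As a small follow-up sketch (my own, assuming 32-bit int and 16-bit short as above), explicit casts make the truncation and the matching specifier visible:

#include <stdio.h>

int main(void)
{
    int m_int = -5339876;    /* bits 0xffae851c with a 32-bit int */
    short m_short = m_int;   /* implementation-defined narrowing:
                                commonly keeps the low bits 0x851c */

    printf("%hx\n", m_short);   /* printf converts the promoted int back to
                                   unsigned short, printing 851c */
    printf("%x\n", (unsigned int)(unsigned short)m_short);   /* also 851c */
    return 0;
}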

Use of format specifiers for conversions

I am unable to deduce the internal happenings inside the machine when we print data using format specifiers.
I was trying to understand the concept of signed and unsigned integers and found the following:
unsigned int b=-12;
printf("%d\n",b); //prints -12
printf("%u\n\n",b); //prints 4294967284
I am guessing that b actually stores the binary version of -12 as 11111111111111111111111111110100.
So, since b is unsigned, b technically stores 4294967284.
But still the format specifier %d causes the binary value of b to be printed as its signed version i,e, -12.
However,
printf("%f\n",2); //prints 0.000000
printf("%f\n",100); //prints 0.000000
printf("%d\n",3.2); //prints 2147483639
printf("%d\n",3.1); //prints 2147483637
I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.
Why does this not happen and what exactly takes place at machine level ?
Mismatching the format specifier and the argument type (like using the floating-point specifier "%f" to print an int value) leads to undefined behavior.
Remember that 2 is an integer value, and vararg functions (like printf) don't really know the types of the arguments. The printf function has to rely on the format specifier and assume the argument is of the specified type.
To better understand how you get the results you get, to understand "the internal happenings", we first must make two assumptions:
The system uses 32 bits for the int type
The system uses 64 bits for the double type
Now what happens with
printf("%f\n",2); //prints 0.000000
is that the printf function sees the "%f" specifier and fetches the next argument as a 64-bit double value. Since the int value you provided in the argument list is only 32 bits, half of the bits in the double value will be unknown. The printf function will then print the (invalid) double value. If you're unlucky, some of the unknown bits might make the value a trap value, which can cause a crash.
Similarly with
printf("%d\n",3.2); //prints 2147483639
the printf function fetches the next argument as a 32-bit int value, losing half of the bits in the 64-bit double value provided as the actual argument. Exactly which 32 bits are copied into the internal int value depends on endianness. Integers don't have trap values, so no crash happens; just an unexpected value will be printed.
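If the intent was the implicit conversion that an ordinary assignment would perform, explicit casts restore it; a small sketch (my own, not from the question):

#include <stdio.h>

int main(void)
{
    /* Explicit casts make the argument actually match the specifier. */
    printf("%f\n", (double)2);   /* prints 2.000000 */
    printf("%d\n", (int)3.2);    /* prints 3 (truncated toward zero) */
    return 0;
}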
what exactly takes place at machine level ?
The stdio.h functions are quite far from the machine level. They provide a standardized abstraction layer on top of various OS API. Whereas "machine level" would refer to the generated assembler. The behavior you experience is mostly related to details of the C language rather than the machine.
On the machine level, there are no signed numbers; everything is treated as raw binary data. The compiler can turn raw binary data into a signed number by using an instruction that tells the CPU: "use what's stored at this location and treat it as a signed number". Specifically, as a two's complement signed number on all common computers. But this is irrelevant when explaining why your code misbehaves.
The integer constant 12 is of type int. When we write -12 we apply the unary - operator on that. The result is still of type int but now of value -12.
Then you attempt to store this negative number in an unsigned int. This triggers an implicit conversion to unsigned int, which should be carried out according to the C standard:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type
The maximum value of a 32-bit unsigned int is 2^32 − 1 = 4294967295. "One more than the maximum" gives 2^32 = 4294967296. If we calculate −12 + 4294967296 we get 4294967284. This is in the range of an unsigned int and is the result you see later.
Now as it happens, the printf family of functions is very unsafe. If you provide a wrong format specifier which doesn't matches the type, they might crash or display the wrong result etc - the program invokes undefined behavior.
So when you use %d or %i, reserved for signed int, but pass an unsigned int, anything can happen. "Anything" here happens to include printf reinterpreting the bits of the value you passed as if they encoded the type named by the format specifier. That's what happened when you used %d.
When you pass values of types completely mismatching the format specifier, the program just prints gibberish though. Because you are still invoking undefined behavior.
I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.
The reason the printf family can't do anything intelligent, like assuming that 2 should be converted to 2.0, is that they are variadic (variable-argument) functions, meaning they can take any number of arguments. In order to make that possible, the parameters are essentially passed as raw binary through something called va_list, and all type information is lost. The printf implementation is therefore left with no type information but the format string you gave it. This is why variadic functions are so unsafe to use.
Unlike a regular function which has more type safety - if you declare void foo (float f) and pass the integer constant 2 (type int), it will attempt to implicitly convert from integer to float, and perhaps also give a conversion warning.
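To see why the type information disappears, here is a minimal sketch of a variadic function (a hypothetical sum_ints, my own example): the callee has no idea what was passed and must trust what the caller tells it, exactly as printf must trust the format string.

#include <stdarg.h>
#include <stdio.h>

/* Sum 'count' int arguments. The function cannot check the types itself;
   it simply assumes every variadic argument is an int. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* passing a double here would be
                                       undefined behavior, just as with printf */
    va_end(ap);
    return total;
}

int main(void)
{
    printf("%d\n", sum_ints(3, 1, 2, 3));   /* prints 6 */
    return 0;
}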
The behaviors you observe are the result of printf interpreting the bits given to it as the type specified by the format specifier. In particular, at least for your system:
The bits for an int argument and an unsigned argument in the same position within the argument list would be passed in the same place, so when you give printf one and tell it to format the other, it uses the bits you give it as if they were the bits of the other.
The bits for an int argument and a double argument would be passed in different places—possibly a general register for the int argument and a special floating-point register for the double argument, so when you give printf one and tell it to format the other, it does not get the bits for the double to use for the int; it gets completely unrelated bits that were left lying around by previous operations.
Whenever a function is called, values for its arguments must be placed in certain places. These places vary according to the software and hardware used, and they vary by the type and number of arguments. However, for any particular argument type, argument position, and specific software and hardware used, there is a specific place (or combination of places) where the bits of that argument should be stored to be passed to the function. The rules for this are part of the Application Binary Interface (ABI) for the software and hardware being used.
First, let us neglect any compiler optimization or transformation and examine what happens when the compiler implements a function call in source code directly as a function call in assembly language. The compiler will take the arguments you provide for printf and write them to the places designated for those types of arguments. When printf executes, it examines the format string. When it sees a format specifier, it figures out what type of argument it should have, and it looks for the value of that argument in the place for that type of argument.
Now, there are two things that can happen. Say you passed an unsigned but used a format specifier for int, like %d. In every ABI I have seen, an unsigned and an int argument (in the same position within the list of arguments) are passed in the same place. So, when printf looks for the bits for the int it is expected, it will get the bits for the unsigned you passed.
Then printf will interpret those bits as if they encoded the value for an int, and it will print the results. In other words, the bits of your unsigned value are reinterpreted as the bits of an int.1
This explains why you see "-12" when you pass the unsigned value 4,294,967,284 to printf to be formatted with %d. When the bits 11111111111111111111111111110100 are interpreted as an unsigned, they represent the value 4,294,967,284. When they are interpreted as an int, they represent the value −12 on your system. (This encoding system is called two's complement. Other encoding systems include one's complement and sign-and-magnitude, in which these bits would represent −11 and −2,147,483,636, respectively. Those systems are rare for plain integer types these days.)
That is the first of two things that can happen, and it is common when you pass the wrong type but it is similar to the correct type in size and nature, so it is passed in the same place the correct type would have been. The second thing that can happen is that the argument you pass is passed in a different place than the argument that is expected. For example, if you pass a double as an argument, it is, in many systems, placed in a separate set of registers for floating-point values. When printf goes looking for an int argument for %d, it will not find the bits of your double at all. Instead, what it finds in the place where it looks for an int argument might be whatever bits happened to be left in a register or memory location from previous operations, or it might be the bits of the next argument in the list of arguments. In any case, this means that the value printf prints for the %d will have nothing to do with the double value you passed, because the bits of the double are not involved in any way; a completely different set of bits is used.
This is also part of the reason the C standard says it does not define the behavior when the wrong argument type is passed for a printf conversion. Once you have messed up the argument list by passing double where an int should have been, all the following arguments may be in the wrong places too. They might be in different registers from where they are expected, or they might be in different stack locations from where they are expected. printf has no way to recover from this mistake.
As stated, all of the above neglects compiler optimization. The rules of C arose out of various needs, such as accommodating the problems above and making C portable to a variety of systems. However, once those rules are written, compilers can take advantage of them to allow optimization. The C standard permits a compiler to make any transformation of a program as long as the changed program has the same behavior as the original program under the rules of the C standard. This permission allows compilers to speed up programs tremendously in some circumstances. But a consequence is that, if your program has behavior not defined by the C standard (and not defined by any other rules the compiler follows), it is allowed to transform your program into anything. Over the years, compilers have grown increasingly aggressive about their optimizations, and they continue to grow. This means, aside from the simple behaviors described above, when you pass incorrect arguments to printf, the compiler is allowed to produce completely different results. Therefore, although you may commonly see the behaviors I describe above, you may not rely on them.
Footnote
1 Note that this is not a conversion. A conversion is an operation whose input is one type and whose output is another type but has the same value (or as nearly the same as is possible, in some sense, as when we convert a double 3.5 to an int 3). In some cases, a conversion does not require any change to the bits—an unsigned 3 and an int 3 use the same bits to represent 3, so the conversion does not change the bits, and the result is the same as a reinterpretation. But they are conceptually different.
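A small sketch of the distinction (my own, assuming 32-bit int and two's complement): the cast below is a conversion, while memcpy reinterprets the same bits without converting. On such a system the two happen to produce the same bit pattern here, but they are conceptually different operations.

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int u = 4294967284u;   /* bits 0xFFFFFFF4 */

    /* Conversion: the value does not fit in int, so the result is
       implementation-defined; it is commonly -12. */
    int converted = (int)u;

    /* Reinterpretation: copy the bytes unchanged and read them as an int;
       on two's complement this is also -12. */
    int reinterpreted;
    memcpy(&reinterpreted, &u, sizeof reinterpreted);

    printf("%d %d\n", converted, reinterpreted);   /* commonly "-12 -12" */
    return 0;
}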

output of negative integer to %u format specifier

Consider the following code
char c=125;
c+=10;
printf("%d",c); //outputs -121 which is understood.
printf("%u",c); // outputs 4294967175.
printf("%u",-121); // outputs 4294967175
%d accepts negative numbers, therefore the output is -121 in the first case.
The output in case 2 and case 3 is 4294967175. I don't understand why.
Do
2^32 − 121 = 4294967175
printf interprets the data you provide according to the % specifiers:
%d: signed integer, value from −2^31 to 2^31 − 1
%u: unsigned integer, value from 0 to 2^32 − 1
In binary, both integer values (-121 and 4294967175) are (of course) identical:
`0xFFFFFF87`
See Two's complement
printf is a function with variadic arguments. In such a case the "default argument promotions" are applied to the arguments before the function is called. In your case, c is first converted from char to int and then sent to printf. The conversion does not depend on the corresponding '%' specifier of the format. The value of this int parameter can be interpreted as 4294967175 or -121 depending on signedness. The corresponding parts in the C standard are:
6.5.2.2 Function call
6 - ... If the expression that denotes the called function has a type that does not include a
prototype, the integer promotions are performed on each argument, and arguments that
have type float are promoted to double. These are called the default argument
promotions.
7- If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, to the types of the
corresponding parameters, taking the type of each parameter to be the unqualified version
of its declared type. The ellipsis notation in a function prototype declarator causes
argument type conversion to stop after the last declared parameter. The default argument
promotions are performed on trailing arguments.
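A minimal sketch of this promotion (my own example, assuming a 32-bit int and two's complement): the char is promoted to int before printf ever sees it, so printf receives a full int, not a single byte.

#include <stdio.h>

int main(void)
{
    signed char c = -121;

    /* c is promoted to int for the call, so printf receives the 32-bit
       value -121 (bits 0xFFFFFF87), not the single byte 0x87. */
    printf("%d\n", c);              /* prints -121 */
    printf("%u\n", (unsigned)c);    /* prints 4294967175 with 32-bit int */
    return 0;
}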
If char is signed in your compiler (which is the most likely case) and is 8 bits wide (extremely likely), then c += 10 produces the value 135, which does not fit in char. The addition itself happens in int after the integer promotions, but converting 135 back to the signed char type is implementation-defined (and may raise a signal), so you can't portably reason about the results you're getting.
If char is unsigned (not very likely on most PC platforms), then see the other answers.
printf uses something called variadic arguments. If you do a bit of research about them you'll find that a function using them does not know the types of the inputs you pass to it. Therefore there must be a way to tell the function how it must interpret the input, and you do that with the format specifiers.
In your particular case, c is an 8-bit signed integer. Therefore, if you store the literal -121 in it, it will hold the bits 10000111. Then, by the integer promotion mechanism, that value is converted to an int: 11111111111111111111111110000111.
With "%d" you tell printf to interpret 11111111111111111111111110000111 as a signed integer, therefore you have -121 as output. However, with "%u" you're telling printf that 11111111111111111111111110000111 is an unsigned integer, therefore it will output 4294967175.
EDIT: As stated in the comments, the behaviour here is actually undefined in C. That's because there is more than one way to encode negative numbers (sign and magnitude, one's complement, ...) and some other aspects (such as endianness, if I'm not wrong) influence the result, so the output is effectively up to the implementation. Therefore you may get a different output than 4294967175. But the main concepts I explained, the different interpretations of the same string of bits and the loss of type information with variadic arguments, still hold.
Try to convert the number into base 10, first as a pure binary number, then knowing that it is stored as a 32-bit two's complement value: you get two different results. But if I do not tell you which interpretation to use, that binary string could represent anything (a 4-char ASCII string, a number, a small 8-bit 2x2 image, your safe combination, ...).
EDIT: you can think of "%<format_string>" as a sort of "extension" for that string of bits. You know, when you create a file, you usually give it an extension, which is actually part of the filename, to remember in which format/encoding that file has been stored. Suppose you have your favorite song saved as a song.ogg file on your PC. If you rename the file to song.txt, song.odt, song.pdf, song, or song.awkwardextension, that does not change the content of the file. But if you try to open it with the program usually associated with .txt or .whatever, it reads the bytes in the file, and when it tries to interpret the sequences of bytes it may fail (that's why if you open song.ogg with Emacs or Vim or whatever text editor you get something that looks like garbage, if you open it with, for instance, GIMP, GIMP cannot read it, and if you open it with VLC you listen to your favorite song). The extension is just a reminder for you: it tells you how to interpret that sequence of bits. As printf has no knowledge of that interpretation, you need to provide one, and if you tell printf that a signed integer is actually unsigned, well, it's like opening song.ogg with Emacs...
