How are operands promoted in C?

I have the following code in C:
int l;
short s;
l = 0xdeadbeef;
s = l;
Assuming int is 32 bits and short is 16 bits, when performing s = l, s will be promoted to 32 bits, and after the assignment only the lower 16 bits will be kept in s. My question is: when s is promoted to 32 bits, will the additional 16 bits be set to 0x0 or 0xf?
Source: http://www.phrack.com/issues.html?issue=60&id=10

Actually s is not promoted at all. Since s is signed and l is too large to fit in s, assigning l to s in this case is implementation-defined behavior.
6.3.1.3 §3:
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.

Assemblers have instructions for moving a whole register or part of it (MOV EAX, 0, MOV AX, 0, MOV AL, 0 - respectively 32 bits, 16 bits, 8 bits). Since short is a 16-bit integer, the MOV AX, 0 form would be used, although that depends on the compiler implementation.
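For illustration only, a hypothetical sketch of what the truncating store might look like on x86 (the actual instructions depend entirely on the compiler, optimization level, and target):

mov eax, dword ptr [l]   ; load all 32 bits of l
mov word ptr [s], ax     ; store only the low 16 bits into s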

I assume you're going to promote s to some wider type.
What happens at the bit level depends on the source type: whether it is signed or unsigned. Widening a signed value performs sign extension, filling the higher bits with copies of the sign bit (0 or 1, depending on the sign of the value); widening an unsigned value fills the higher bits with zeros.
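A minimal sketch demonstrating the difference (assuming two's complement and that the exact-width types exist):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    int16_t  ss = -16657;    /* bit pattern 0xbeef in two's complement */
    uint16_t us = 0xbeefu;   /* same bit pattern, unsigned value 48879 */

    /* Widening the signed value copies the sign bit into the new high
       bits; widening the unsigned value fills them with zeros. */
    printf("%08" PRIx32 "\n", (uint32_t)(int32_t)ss); /* ffffbeef */
    printf("%08" PRIx32 "\n", (uint32_t)us);          /* 0000beef */
    return 0;
}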

Related

C bitfield with assigned value 1 shows -1

I was playing with bit-fields and got stuck on something strange:
#include <stdio.h>

struct lol {
    int a;
    int b:1,
        c:1,
        d:1,
        e:1;
    char f;
};

int main(void) {
    struct lol l = {0};
    l.a = 123;
    l.c = 1; // -1 ???
    l.f = 'A';
    printf("%d %d %d %d %d %c\n", l.a, l.b, l.c, l.d, l.e, l.f);
    return 0;
}
The output is:
123 0 -1 0 0 A
Somehow the value of l.c is -1. What is the reason?
Sorry if obvious.
Use unsigned bitfields if you don't want sign-extension.
What you're getting is your 1 bit being interpreted as the sign bit in a two's complement representation. In two's complement, the sign-bit is the highest bit and it's interpreted as -(2^(width_of_the_number-1)), in your case -(2^(1-1)) == -(2^0) == -1. Normally all other bits offset this (because they're interpreted as positive) but a 1-bit number doesn't and can't have other bits, so you just get -1.
Take for example 0b10000000 as an int8_t in two's complement.
(For the record, 0b10000000 == 0x80 && 0x80 == (1<<7)). It's the highest bit so it's interpreted as -(2^7) (==-128)
and there's no positive bits to offset it, so you get printf("%d\n", (int8_t)0x80); /*-128*/
Now if you set all bits on, you get -1, because -128 + (128-1) == -1. This (all bits on == -1) holds true for any width interpreted as two's complement, even for width 1, where you get -1 + (1-1) == -1.
When such a signed integer gets extended into a wider width, it undergoes so-called sign extension.
Sign extension means that the highest bit gets copied into all the newly added higher bits.
If the highest bit is 0, then it's trivial to see that sign extension doesn't change the value (take for example 0x01 extended into 0x00000001).
When the highest bit is 1 as in (int8_t)0xff (all 8 bits 1), then sign extension copies the sign bit into all the new bits: ((int32_t)(int8_t)0xff == (int32_t)0xffffffff). ((int32_t)(int8_t)0x80 == (int32_t)0xffffff80) might be a better example as it more clearly shows the 1 bits are added at the high end (try _Static_assert-ing either of these).
This doesn't change the value either as long as you assume two's complement, because if you start at:
-(2^n) (value of sign bit) + X (all the positive bits) //^ means exponentiation here
and add one more 1-bit to the highest position, then you get:
-(2^(n+1)) + 2^(n) + X
which is
2*-(2^(n)) + 2^(n) + X == -(2^n) + X //same as original
//inductively, you can add any number of 1 bits
Sign extension normally happens when you width-extend a native signed integer into a native wider width (signed or unsigned), either with casts or implicit conversions.
For the native widths, platforms usually have an instruction for it.
Example:
int32_t signExtend8(int8_t X) { return X; }
Example's disassembly on x86_64:
signExtend8:
movsx eax, dil //the sx stands for Sign-eXtending
ret
If you want to make it work for nonstandard widths, you can usually utilize the fact that signed right-shifts normally copy the sign bit alongside the shifted range (what a signed right-shift does with the sign bit is really implementation-defined)
and so you can unsigned-left-shift into the sign bit and then back to get sign-extension artificially for non-native width such as 2:
#include <stdint.h>
#define MC_signExtendIn32(X,Width) ((int32_t)((uint32_t)(X)<<(32-(Width)))>>(32-(Width)))
_Static_assert( MC_signExtendIn32(3,2 /*width 2*/)==-1,"");
int32_t signExtend2(int8_t X) { return MC_signExtendIn32(X,2); }
Disassembly (x86_64):
signExtend2:
mov eax, edi
sal eax, 30
sar eax, 30
ret
Signed bitfields essentially make the compiler generate (hidden) macros like the above for you:
struct bits2 { int bits2:2; };
int32_t signExtend2_via_bitfield(struct bits2 X) { return X.bits2; }
Disassembly (x86_64) on clang:
signExtend2_via_bitfield: # #signExtend2_via_bitfield
mov eax, edi
shl eax, 30
sar eax, 30
ret
Example code on godbolt: https://godbolt.org/z/qxd5o8 .
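For completeness, a minimal sketch of the fix suggested at the top of this answer (make the fields explicitly unsigned, so there is no sign bit to extend):

#include <stdio.h>

struct lol {
    int a;
    unsigned int b:1,  /* unsigned 1-bit fields hold 0 or 1 */
                 c:1,
                 d:1,
                 e:1;
    char f;
};

int main(void) {
    struct lol l = {0};
    l.a = 123;
    l.c = 1;  /* now reads back as 1, not -1 */
    l.f = 'A';
    printf("%d %d %d %d %d %c\n", l.a, l.b, l.c, l.d, l.e, l.f);
    return 0;  /* prints: 123 0 1 0 0 A */
}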
Bit-fields are very poorly standardized and they are generally not guaranteed to behave predictably. The standard just vaguely states (6.7.2.1/10):
A bit-field is interpreted as having a signed or unsigned integer type consisting of the
specified number of bits.125)
Where the informative note 125) says:
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
then it is implementation-defined whether the bit-field is signed or unsigned.
So we can't know if int b:1 gives a signed type or unsigned type, it's up to the compiler. Your compiler apparently decided that it would be a great idea to have signed bits. So it treats your 1 bit as binary translated into a two's complement 1 bit number, where binary 1 is decimal -1 and zero is zero.
Furthermore, we can't know where b in your code ends up in memory, it could be anywhere and also depends on endianness. What we do know is that you save absolutely no memory from using a bit-field here, since at least 16 bits for an int will get allocated anyway.
General good advice:
Never use bit-fields for any purpose.
Use the bit-wise operators << >> | & ^ ~ and named bit-masks instead, for 100% portable and well-defined code (see the sketch below).
Use the stdint.h types, or at least unsigned ones, whenever dealing with raw binary.
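A minimal sketch of that bit-mask style (the FLAG_* names are illustrative, not from any library):

#include <stdio.h>
#include <stdint.h>

#define FLAG_B (1u << 0)  /* named masks instead of bit-fields */
#define FLAG_C (1u << 1)
#define FLAG_D (1u << 2)
#define FLAG_E (1u << 3)

int main(void) {
    uint8_t flags = 0;
    flags |= FLAG_C;                           /* set c */
    flags &= (uint8_t)~FLAG_B;                 /* clear b */
    printf("c = %d\n", (flags & FLAG_C) != 0); /* prints: c = 1 */
    return 0;
}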
You are using a signed integer, and since a 1-bit signed field consists of nothing but the sign bit, the stored 1 is interpreted as a negative value, so you get -1. As the other answers suggest, use the unsigned keyword to remove the possibility of representing negative integers.

Using Type Modifier (signed) Comparisons

This prints "signed comparison" https://onlinegdb.com/eA87wKQkU
#include <stdio.h>
#include <stdint.h>

int main()
{
    uint64_t A = -1, B = 1;
    if ((signed)A < (signed)B)
    {
        printf("signed comparison");
    }
    return 0;
}
To ensure an overall signed comparison, it looks like the (signed) type modifier must be applied to both A and B.
Is this correct?
Also, I haven't seen any C code using ((signed)A < (signed)B) and was wondering if it's valid C89/99?
Perhaps ((int64_t)A < (int64_t)B) is a better approach?
Thanks.
The answer to both questions is yes:
if you only convert A or B with (signed), which means (signed int), the comparison will still be performed as uint64_t, because the converted value will be converted back to the larger type uint64_t. Converting both A and B is hence necessary.
converting to int64_t is probably a better idea, as this signed type is larger, but it should not matter in this particular example: converting A, whose value is UINT64_MAX, to int or int64_t is implementation-defined and may or may not produce -1. The C Standard allows an implementation-defined signal to be raised by this out-of-range conversion.
on most current architectures, no signal will be raised and the conversion of A will indeed produce -1 and the code will print signed comparison. Yet you should end the output with a newline for proper operation.
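A minimal sketch contrasting the three comparisons (the out-of-range conversions are implementation-defined, as noted above; the comments show what typical two's complement targets produce):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t A = -1, B = 1;                  /* A holds UINT64_MAX */
    printf("%d\n", A < B);                   /* 0: plain unsigned comparison */
    printf("%d\n", (signed)A < (signed)B);   /* 1 on typical targets */
    printf("%d\n", (int64_t)A < (int64_t)B); /* 1 on typical targets */
    return 0;
}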
This is a slightly unusual solution, irrelevant for all practical purposes, but it does have the one advantage of avoiding the following conversion rule from the C standard:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
The idea of this solution is to invert the top bit of the unsigned type in the values being compared before the comparison.
Converting -1 to uint64_t produces the value 0xffffffffffffffff in A.
Converting 1 to uint64_t produces the value 0x0000000000000001 in B.
So the comparison A < B is false.
Denoting A and B with the top bit inverted as Ax and Bx respectively, then Ax has the value 0x7fffffffffffffff and Bx has the value 0x8000000000000001. The comparison Ax < Bx is true.
One way to invert the top bit of a uint64_t value is to add or subtract INT64_MIN and convert the result back to uint64_t. Converting INT64_MIN to uint64_t produces the value 0x8000000000000000 and adding or subtracting that to another uint64_t value will invert the top bit. This also works for any other exact-width unsigned type with the 64 changed to exact width in question.
So the following will do a "signed comparison" of the uint64_t values A and B:
if ((uint64_t)(A - INT64_MIN) < (uint64_t)(B - INT64_MIN))
printf("A signed less than B\n");
Type-casting the adjusted values back to the unsigned integer type as shown above is only necessary for unsigned types whose values can all be represented by int. For example, it is necessary when A and B are of type uint8_t. A would have the value 255, B would have the value 1, Ax would have value 383 (for subtraction) or 127 (for addition), Bx would have the value 129 (for subtraction) or -127 (for addition), and Ax < Bx would be false. The type-cast would convert Ax to 127 and Bx to 129 so the comparison Ax < Bx would be true.
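A minimal sketch checking the uint8_t numbers from the paragraph above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t A = 255, B = 1;   /* as plain unsigned values, A > B */
    /* Subtract INT8_MIN (i.e. add 128) and truncate back to uint8_t:
       A becomes 127 and B becomes 129, inverting the top bit of each. */
    if ((uint8_t)(A - INT8_MIN) < (uint8_t)(B - INT8_MIN))
        printf("A signed less than B\n"); /* prints: -1 < 1 as signed */
    return 0;
}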

Different results between a 16-bit int machine and a 32-bit int machine in a subtraction

When the code below is run on a 16-bit-int machine such as an MSP430 microcontroller, s32 yields 65446:
#include <stdint.h>

uint16_t u16c;
int32_t s32;

int main()
{
    u16c = 100U;
    s32 = 10 - u16c;
}
My understanding is that 10 - u16c gets implicit type promotion to unsigned int. Mathematically, 10 - u16c equals -90. But how is it possible to represent a negative number as an unsigned int?
When -90 gets promoted to unsigned int, does it mean that the sign of the number is ignored?
Let's suppose the sign of the number is ignored.
The binary representation of 90 is 00000000 01011010.
When this gets assigned to s32, which is a 32-bit-wide signed integer variable, how does the transformation take place?
In order for s32 to equal 65446, 90 has to be two's-complemented.
That would be 11111111 10100110.
I am not confident I understand the process of s32 becoming 65446.
On a 32-bit-int machine like an ARM Cortex, s32 is -90, which is correct.
To fix this situation on a 16-bit-int machine, u16c needs a typecast to (int16_t). How does this remedy the problem?
I've added the hex representation of s32 as shown in IAR Embedded Workbench (lower right corner).
It shows that s32 becomes 0x0000FFA6.
So for the MSP430, the machine's conversion from unsigned 16-bit to signed 32-bit simply prepends 16 zero bits.
My understanding is that 10 - u16c gets implicit type promotion to unsigned int.
This depends upon the representation of the type of 10 (int, as it were). Your understanding is correct for some systems, and we'll cover that first, but as you'll see later on in this answer, you're missing a big part of the picture.
Section 5.2.4, Environmental limits specifies that values of type int can range from -32767 to 32767; this range may be extended at the discretion of implementations, but int values must be able to represent this range.
uint16_t, however, if it exists (it's not required to), has a range from 0 to 65535. Implementations can't extend that; it's a requirement that the range be precisely [0..65535] (hence the reason this type isn't required to exist).
Section 6.3.1.3, Signed and unsigned integers tells us about the conversions to and fro. I couldn't paraphrase it better, so here's a direct quote:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
This all supports your theory that the int value 10 would get converted to a uint16_t if and only if int is a sixteen bit type. However, section 6.3.1.8, usual arithmetic conversion rules should be applied first to decide which of the three above conversions takes place, as these rules change the way you'll look at the conversion when int is greater than sixteen bits:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
So, as you can see from this, the type of the expression 10-u16c might vary from system to system. On systems where int is sixteen bits, that expression will be a uint16_t.
Mathematically, 10 - u16c equals -90. But how is it possible to represent a negative number as an unsigned int? When -90 gets promoted to unsigned int, does it mean that the sign of the number is ignored?
According to Annex H.2.2:
C's unsigned integer types are ''modulo'' in the LIA-1 sense in that overflows or out-of-bounds results silently wrap.
In other words, if 10 gets converted to a uint16_t and the subtraction is performed, the result will be a large number, in this case you can see that number by explicitly converting both operands (i.e. casting them) to a uint16_t. You could see a similar effect by using unsigned integer constants such as -90U. This is largely supported by rule #2 from the quote from 6.3.1.3 earlier.
When this gets assigned to s32, which is a 32-bit-wide signed integer variable, how does the transformation take place?
The expression 10-u16c is converted according to rule #1 in 6.3.1.3 (quoted above) to an int32_t value and stored as that value.
To fix this situation on a 16-bit-int machine, u16c needs a typecast to (int16_t). How does this remedy the problem?
The typecast adds no useful information to this discussion. Perhaps you're using a non-compliant (buggy) compiler. I suspect the manual might shed some light on this, but since I don't know which compiler you're using I can't read it...
100 = 0x0064
0x000A - 0x0064 =
0x000A + 0xFF9B + 1 =
0xFFA6 = 65446.
Note that none of the above is either signed or unsigned; addition and subtraction are blind to such things. Now that the 16-bit math is done, it can be promoted to 0xFFFFFFA6 with sign extension. In both cases, 0xFFA6 and 0xFFFFFFA6, the answer is -90 if you interpret those bits as signed; if you interpret those bits as unsigned, one is 65446 and the other 4294967206.
Take this:

#include <stdio.h>

int main()
{
    unsigned int ra, rb;
    ra = 0x00000005;
    for (rb = 0; rb < 10; rb++)
    {
        ra--;
        printf("0x%08X\n", ra);
    }
    return 0;
}
you get this
0x00000004
0x00000003
0x00000002
0x00000001
0x00000000
0xFFFFFFFF
0xFFFFFFFE
0xFFFFFFFD
0xFFFFFFFC
0xFFFFFFFB
which is exactly what you would expect: you subtract one from all zeros and you get all ones; it has nothing to do with signed or unsigned. And subtracting 100 from 10 is like doing that decrement 100 times.
Were you expecting to see:
0x00000004
0x00000003
0x00000002
0x00000001
0x00000000
0x00000000
0x00000000
0x00000000
0x00000000
0x00000000
for the above program? Would that be accurate? Would that make sense? No.
The only curious part of your code is this:
s32 = 0xFFA6;
Now the folks in the comments can jump right in, but does the standard say that your u16c gets converted from unsigned (0x0064) to signed (0x0064), or does it remain unsigned and the 10 (0x000A) is considered to be unsigned? Basically, do we get 0x000A - 0x0064 = 0xFFA6 as signed math or unsigned? (My guess is unsigned, since the one thing declared in that operation is unsigned.) Then that unsigned bit pattern gets promoted to signed: you take a 16-bit bit pattern and sign-extend it to 32 bits, so 0xFFA6 becomes 0xFFFFFFA6, which is what I get on a desktop Linux machine with gcc...
Short answer:
The type of s32 is irrelevant for the calculation.
The integer constant 10 is of type int.
If int is 16 bits, then no integer promotion takes place. The usual arithmetic conversions convert the 10 operand to type uint16_t (unsigned int). The operation is carried out on 16 bit unsigned type.
Unsigned wrap-around gives 10u - 100u = 65446u = 0xFFA6. This result fits inside an int32_t. Two's complement does not apply, since the types involved in the operation are unsigned.
If int is 32 bits, then the operand u16c is integer-promoted to type int. No further arithmetic conversions take place, since both operands are of type int after integer promotion. The operation is carried out on a 32-bit signed type, and the result is -90.
Portable code should be written either as:
10 - (int32_t)u16c; // signed arithmetic intended
or as
10u - u16c; // unsigned wrap-around intended
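A minimal sketch exercising both intents (the expected results follow from the conversions described above):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint16_t u16c = 100U;
    int32_t  s = 10 - (int32_t)u16c; /* signed arithmetic: -90 everywhere */
    uint16_t u = 10u - u16c;         /* unsigned wrap-around: 65446 (0xFFA6) */
    printf("%" PRId32 " %" PRIu16 "\n", s, u); /* prints: -90 65446 */
    return 0;
}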

How does assigning a negative number to an unsigned int work in assembly?

I learned about two's complement and unsigned and signed int, so I decided to test my knowledge. As far as I know, a negative number is stored in two's complement form so that addition and subtraction don't need different algorithms and the circuitry stays simple.
Now if I write
#include <stdio.h>

int main()
{
    int a = -1;
    unsigned int b = -1;
    printf("%d %u \n %d %u", a, a, b, b);
}
The output comes out to be -1 4294967295 -1 4294967295. Now, I looked at the bit pattern and various things, and I realized that -1 in two's complement is 11111111 11111111 11111111 11111111, so when I interpret it using %d it gives -1, but when I interpret it using %u it is treated as a positive number and gives 4294967295. The assembly of the code is:
.LC0:
        .string "%d %u \n %d %u"
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], -1
        mov     DWORD PTR [rbp-8], -1
        mov     esi, DWORD PTR [rbp-8]
        mov     ecx, DWORD PTR [rbp-8]
        mov     edx, DWORD PTR [rbp-4]
        mov     eax, DWORD PTR [rbp-4]
        mov     r8d, esi
        mov     esi, eax
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        leave
        ret
Here -1 is moved into the register both times, for the signed and the unsigned variable. What I want to know is: if reinterpretation is all that matters, why do we have the two types signed and unsigned at all? Is it only the printf format string (%d vs %u) that matters?
Further, what really happens when I assign a negative number to an unsigned integer? (I learned that the initializer converts this value from int to unsigned int.) But I did not see any such conversion in the assembly code. So what really happens?
And how does the machine know when it has to take the two's complement and when not? Does it see the negative sign and perform the two's complement?
I have read almost every question and answer this question could be considered a duplicate of, but I could not find a satisfactory answer.
Both signed and unsigned variables are just pieces of memory; how they behave depends on the operations applied to them.
It doesn't make any difference when adding or subtracting, because thanks to two's complement the operations are exactly the same.
It matters when we compare two numbers: -1 is lower than 0, while 4294967295 obviously isn't.
About conversion: for the same size, the content is simply copied from one variable to the other, so 4294967295 becomes -1. For a bigger size, the value is first sign-extended and then the content is moved.
How does the machine know? By the instructions we use. Machines either have different instructions for comparing signed and unsigned values, or they provide different flags for it (x86 has Carry for unsigned overflow and Overflow for signed overflow).
Additionally, note that C is relaxed about how signed numbers are stored; they don't have to be two's complement. But nowadays, all common architectures store signed numbers this way.
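A minimal sketch showing the same bit pattern comparing differently depending on the declared type (assuming 32-bit two's complement int):

#include <stdio.h>

int main(void)
{
    unsigned int u = 0xFFFFFFFFu; /* 4294967295 */
    int s = -1;                   /* same bit pattern */
    printf("%d\n", u < 1u); /* 0: unsigned compare, e.g. CMP + JB on x86 */
    printf("%d\n", s < 1);  /* 1: signed compare,   e.g. CMP + JL on x86 */
    return 0;
}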
There are a few differences between signed and unsigned types:
The behaviors of the operators <, <=, >, >=, /, %, and >> are all different when dealing with signed and unsigned numbers.
Compilers are not required to behave predictably if any computation on a signed value exceeds the range of its type. Even when using operators which would behave identically with signed and unsigned values in all defined cases, some compilers will behave in "interesting" fashion. For example, a compiler given x+1 > y could replace it with x>=y if x is signed, but not if x is unsigned.
As a more interesting example, on a system where "short" is 16 bits and "int" is 32 bits, a compiler given the function:
unsigned mul(unsigned short x, unsigned short y) { return x*y; }
might assume that no situation could ever arise where the product would exceed 2147483647. For example, if it saw the function invoked as unsigned x = mul(y,65535); and y was an unsigned short, it may omit code elsewhere that would only be relevant if y were greater than 32768.
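A common defensive rewrite (a sketch, not part of the original example) forces the multiplication to happen in unsigned arithmetic, where wrap-around is well defined:

unsigned mul(unsigned short x, unsigned short y)
{
    /* Without the casts, x and y promote to signed int on 32-bit-int
       platforms, and x*y can then overflow int - undefined behavior. */
    return (unsigned)x * (unsigned)y;
}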
It seems you have missed two facts: firstly, 0101 is 5 in both signed and unsigned interpretations, since non-negative values have the same representation either way; secondly, when you assign a negative number to an unsigned int, the value is converted (modulo UINT_MAX + 1) to an unsigned value, so -1 becomes 4294967295. This conversion is well defined in C; it is not an error, even though an unsigned int cannot represent values below 0.
You can understand it better when you try to assign a negative value to a larger-sized unsigned integer. The compiler generates assembly code that performs sign extension when transferring a small negative value to a larger-sized unsigned integer.
See this blog post for an assembly-level explanation.
The choice of signed integer representation is left to the platform. The representation applies to both negative and non-negative values - for example, if 1101₂ (-5) is the two's complement of 0101₂ (5), then 0101₂ (5) is also the two's complement of 1101₂ (-5).
The platform may or may not provide separate instructions for operations on signed and unsigned integers. For example, x86 provides different multiplication and division instructions for signed (idiv and imul) and unsigned (div and mul) integers, but uses the same addition (add) and subtraction (sub) instructions for both.
Similarly, x86 provides a single comparison (cmp) instruction for both signed and unsigned integers.
Arithmetic and comparison operations will set one or more status register flags (carry, overflow, zero, etc.). These can be used differently when dealing with words that are supposed to represent signed vs. unsigned values.
As far as printf is concerned, you're absolutely correct that the conversion specifier determines whether the bit pattern 0xFFFFFFFF is displayed as -1 or 4294967295, although remember that if the type of the argument does not match what the conversion specifier expects, the behavior is undefined. Using %u to display a negative signed int may or may not give you the expected equivalent unsigned value.

Unexpected output when executing left-shift by 32 bits

When I left-shift a hex constant, I get -1 as output with the following code:
unsigned int i, j = 0;
i = 0xffffffff << (32 - j);
printf("%d", i);
Similarly, when I change the shift amount to the constant 32, the output is 0, but the compiler warns (left shift count >= width of type):
unsigned int i, j = 32;
i = 0xffffffff << 32;
printf("%d", i);
I was expecting the same result in both cases (i.e., 0), but got confused: why does case #1 display the wrong output, while case #2 gives the correct result yet the compiler warns?
The result is the same on 32-bit and 64-bit x86 machines.
Can someone explain the results above?
It's undefined behavior to left-shift by 32 or more on a 32-bit integer. That's what the warning is about.
C11 6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is
that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
Shifting a 32-bit variable by 32 yields undefined behavior.
Here is the assembly generated by the VS-2013 compiler:
int n = 0;
mov dword ptr [n],0
int a = 0xFFFFFFFF << 32;
mov dword ptr [a],0
int b = 0xFFFFFFFF << (32-n);
mov ecx,20h
sub ecx,dword ptr [n]
or eax,0FFFFFFFFh
shl eax,cl
mov dword ptr [b],eax
As you can see, what happens de-facto is:
When you shift by a constant value of 32, the compiler simply sets the result to 0
When you shift by a variable (such as 32-n with n==0), the compiler uses shl
The actual result of shl depends on the implementation of this operation in the underlying architecture. On your processor, it probably takes the 2nd operand modulo 32, hence the 1st operand is shifted by 0.
Again, the description above is not dictated by the standard, so it really depends on the compiler in use.
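If you want the "shifted all the way out yields 0" behavior portably, check the count explicitly; a minimal sketch (the helper name is illustrative):

#include <stdint.h>

/* Left shift that is well defined for any count from 0 to 32. */
static inline uint32_t shl32_safe(uint32_t x, unsigned n)
{
    return (n >= 32) ? 0u : (x << n);
}

With this, shl32_safe(0xffffffffu, 32) is 0 and shl32_safe(0xffffffffu, 0) is 0xffffffff, regardless of compiler or architecture.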
