Could someone explain what can happen if I forget the suffix (postfix) on constants (literals) in ANSI C?
For example, I have seen defines like these used for bit shift operations:
#define AAR_INTENSET_NOTRESOLVED_Pos (2UL) /*!< Position of NOTRESOLVED field. */
#define AAR_INTENSET_NOTRESOLVED_Msk (0x1UL << AAR_INTENSET_NOTRESOLVED_Pos) /*!< Bit mask of NOTRESOLVED field. */
#define AAR_INTENSET_NOTRESOLVED_Disabled (0UL) /*!< Interrupt disabled. */
#define AAR_INTENSET_NOTRESOLVED_Enabled (1UL) /*!< Interrupt enabled. */
#define AAR_INTENSET_NOTRESOLVED_Set (1UL) /*!< Enable interrupt on write. */
This is used on a 32-bit architecture, but it could be ported to 16-bit or 8-bit.
What can happen if the UL suffix is omitted and I use these macros for bit shift operations as intended?
I assume that, e.g., on an 8-bit architecture (1<<30) can lead to overflow.
EDIT: I have found a nice link: http://dystopiancode.blogspot.cz/2012/08/constant-suffixes-and-prefixes-in-ansi-c.html
But is it safe to use suffixes if the code is supposed to be ported to various architectures?
For instance, the suffix U denotes unsigned int, which is usually 16 bits on an 8-bit architecture but 32 bits on a 32-bit one, so 0xFFFFAAAAU is fine for a 32-bit compiler but not for an 8-bit compiler, right?
A decimal number like -1, 1, 2, 12345678, etc. without any suffix gets the smallest type in which it fits, trying int, long, long long in that order.
An octal or hex number like 0, 0123, 0x123, 0X123 without any suffix gets the smallest type in which it fits, trying int, unsigned int, long, unsigned long, long long, unsigned long long in that order.
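A quick illustration of those two ladders (a sketch assuming 32-bit int and long, and a C11 compiler for _Generic): the same magnitude lands in different types depending on the base.
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void)
{
    /* Decimal constants climb only the signed ladder... */
    printf("2147483648 is %s\n", TYPE_NAME(2147483648)); /* long long, given 32-bit long */
    /* ...while hex constants may become unsigned on the way up. */
    printf("0x80000000 is %s\n", TYPE_NAME(0x80000000)); /* unsigned int */
    return 0;
}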
The following is a potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 31. Note: unsigned long must be at least 32 bits. Shifting by 32 or more is undefined behavior if unsigned long is 32 bits, but yields a non-zero value if unsigned long is wider.
(0x1UL << AAR_INTENSET_NOTRESOLVED_Pos)
The following is a similar potential problem should AAR_INTENSET_NOTRESOLVED_Pos exceed 15. 0x1 without a suffix is an int, which need only be 16 bits. With a 16-bit int, even AAR_INTENSET_NOTRESOLVED_Pos == 15 is a problem, since the shift puts a 1 into the sign bit; so without an explicit U, 0x1 is already a problem at AAR_INTENSET_NOTRESOLVED_Pos == 15. [#Matt McNabb]
(0x1 << AAR_INTENSET_NOTRESOLVED_Pos)
Bitwise shift operators
"The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined." C11dr §6.5.7 3
Machine width is not the key issue: an 8-bit or 16-bit machine may still use a 16-bit, 32-bit, etc. int. Again, 16 bits is the minimum int size for a compliant C compiler.
It can break.
It might be better to include the cast in the code itself, i.e.
uint32_t something = (uint32_t) AAR_INTENSET_NOTRESOLVED_Set << 30;
This makes it work even if the #define for the constant is simply an integer.
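For instance (a sketch; the wrapper name is illustrative, not from the original headers), the cast can be baked into a macro once, so every use site shifts a uint32_t:
#include <stdint.h>

#define AAR_INTENSET_NOTRESOLVED_Set (1UL)  /* from the question */

/* Hypothetical wrapper: bake the cast in once. */
#define AAR_SET_U32 ((uint32_t)AAR_INTENSET_NOTRESOLVED_Set)

uint32_t something = AAR_SET_U32 << 30;  /* safe even if the define were plain 1 */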
Here is an example from an ST CMSIS header:
#define USART_ISR_TC_Pos (6U)
#define USART_ISR_TC_Msk (0x1UL << USART_ISR_TC_Pos)
Everywhere in CMSIS headers, the bitfield positions (_Pos) are given as decimal integer constants of type unsigned int and the unshifted masks are unsigned long int.
Why is it that they both are not specified as unsigned long int?
Bit position: a position in the register cannot be more than 31, and any integer type in C can hold that. There is no reason even to make the position unsigned.
The mask: since the minimum unsigned int size required by the C standard (16 bits) is not big enough to hold a 32-bit value, it has to be declared unsigned long. The CMSIS authors do not know which compiler you are going to use, so they use the minimal sufficient type.
There's no apparent reason why. I very much doubt these libs are meant to be shared with 8- or 16-bitters, which would be the only sensible explanation.
A Cortex-M is 32-bit, with a 32-bit int and a 32-bit long, and most importantly 32-bit hardware peripheral registers. So even if long were 64 bits, it would be senseless to shift a mask outside the register bounds.
The U suffix is enough to prevent undefined-behavior left shifts on a signed operand. Other sloppy libs use the bugged 1 << n style, so maybe 1UL was a quick fix to correct that bug, where 1U would have worked just as well.
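To make the hazard concrete, a sketch assuming a compiler with 16-bit int:
/* With a 16-bit int, the unsuffixed 1 is a signed int and the shift
 * puts a 1 into its sign bit: undefined behavior. */
#define BAD_MASK  (1 << 15)
/* Suffixed forms keep the left operand unsigned; 1U suffices here,
 * and 1UL additionally guarantees at least 32 bits of room. */
#define GOOD_MASK (1UL << 15)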
Using an unsigned short, I shifted 383 left by 11 positions and, in the same expression, shifted it right by 15 positions; I expected the value to be 1, but it was 27. But when I used the two shift operations in separate statements, one after the other (first the left shift, then the right), the output was 1.
Here is a sample code:
unsigned short seed = 383;
unsigned short seedout, seed1, seed2, seed3, seedout1; /* seedout, seedout1 are used by code not shown */
printf("size of short: %zu\n", sizeof(short)); /* %zu: sizeof yields a size_t */
seed1 = (seed << 11);
seed2 = (seed1 >> 15);
seed3 = ((seed << 11) >> 15);
printf("seed1 :%d\t seed2: %d\t seed3: %d\n", seed1, seed2, seed3);
and its output was:
size of short: 2
seed1 :63488 seed2: 1 seed3: 23
seedout1: 8 seedout :382
For clarity, you compare
unsigned short seed1 = (seed<<11);
unsigned short seed2 = (seed1>>15);
on one hand and
unsigned short seed3 = ((seed<<11)>>15);
on the other hand.
The first one takes the result of the shift operation, stores it in an unsigned short variable (which apparently is 16 bit on your platform) and shifts this result right again.
The second one shifts the result immediately.
The reason for the difference, i.e. why the bits shifted out to the left are retained, is the following:
Although seed is unsigned short, seed<<11 is a signed int. Thus, those bits are not cut off as they are when the result is stored; they are kept in the intermediate signed int. Only the assignment to seed1 converts the value to unsigned short, which clips the bits.
In other words: your second example is merely equivalent to
int seed1 = (seed<<11); // instead of short
unsigned short seed2 = (seed1>>15);
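Conversely, if the truncating two-step behaviour is what you want in a single expression, a cast forces it (a sketch; the variable name seed4 is just for illustration):
unsigned short seed = 383;
/* The cast truncates the intermediate to 16 bits, exactly as storing
 * into seed1 did, so the result is 1 rather than 23: */
unsigned short seed4 = (unsigned short)(seed << 11) >> 15;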
Regarding left shifting, type signedness & implicit promotion:
Whenever something is left shifted into the sign bit of a signed integer type, we invoke undefined behavior. Similarly, we also invoke undefined behavior when left-shifting a negative value.
Therefore we must always ensure that the left operand of << is unsigned. And here's the problem: unsigned short is a small integer type, so it is subject to implicit type promotion whenever it is used in an expression. Shift operators always integer-promote the left operand:
C17 6.5.7:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand.
(This makes shifts in particular a special case, since they don't care about the type of the right operand but only look at the left one.)
So on a 16-bit system, you'll run into the case where unsigned short gets promoted to unsigned int, because a 16-bit int cannot hold all values of a 16-bit unsigned short. And that's fine; it's not a dangerous conversion.
On a 32-bit system, however, the unsigned short gets promoted to int, which is signed. Should you left-shift a value like 0x8000 (MSB set) by 16 bits or more, you end up shifting data into the sign bit of the promoted int, which is a subtle and possibly severe bug. For example, this prints "oops" on my Windows computer:
#include <stdio.h>
int main (void)
{
unsigned short x=0x8000;
if((x<<16) < 0) // undefined behavior
puts("oops");
}
But the compiler could as well have assumed that a left shift of x can never result in a value < x and removed the whole machine code upon optimization.
We need to be sure that we never end up with a signed type by accident! Meaning we must know how implicit type promotion works in C.
As for left-shifting unsigned int or larger unsigned types, that's perfectly well-defined as long as we don't shift further than the width of the (promoted) type itself (more than 31 bits on a 32-bit system). Any bits shifted out will be discarded, and if you right-shift, it will always be a logical shift where zeroes are shifted in from the right.
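One common defence (a sketch, not code from the question) is therefore to do the conversion to unsigned yourself before shifting:
#include <stdio.h>

int main (void)
{
    unsigned short x = 0x8000;
    /* Convert explicitly so the promoted left operand is unsigned: */
    unsigned int y = (unsigned int)x << 16;  /* well-defined: 0x80000000 */
    printf("%X\n", y >> 16);                 /* logical shift back: prints 8000 */
    return 0;
}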
To answer the actual question:
Your unsigned short is integer-promoted to an int on a 32-bit system. This allows shifting beyond the 16 bits of an unsigned short, but if you discard those extra bits by storing the result in an unsigned short, you end up with this:
383 = 0x17F
0x17f << 11 = 0xBF800
0xBF800 truncated to 16 bits = 0xF800 = 63488
0xF800 >> 15 = 0x1
However, if you skip the intermediate truncation to 16 bits, you have this instead:
0xBF800 >> 15 = 0x17 = 23
But again, this is only by luck since this time we didn't end up shifting data into the sign bit.
Another example, when executing this code, you might expect to get either the value 0 or the value 32768:
unsigned short x=32768;
printf("%d", x<<16>>16);
But it prints -32768 on my 2's complement PC. The x<<16 invokes undefined behavior, and the >>16 then apparently sign extended the result.
These kinds of subtle shift bugs are common, particularly in embedded systems. A frightening number of the C programs out there were written by people who didn't know about implicit promotions.
I shifted 383 by 11 positions towards left and again in the same instruction shifted it by 15 positions right, I expected the value to be 1 but it was 27
Simple math: you've shifted it a net 4 bits to the right, which is equivalent to dividing by 16.
Divide 383 by 16, and you get 23 (integer division, of course).
Note that the "simply shifted it 4 bits" part holds because:
You used an unsigned operand, which means that you did not "drag 1s" when shifting right
The left-shift operation is performed on a promoted (32-bit) int on your platform, so no data was lost during that part.
BTW, with regard to the second bullet above: when you do this in parts and store the intermediate result in an unsigned short, you do indeed lose data and get a different result.
In other words, when doing seed<<11, the compiler uses 32-bit operations, and when storing it into seed1, only the low-order 16 bits of the previous result are preserved.
EDIT:
The 27 in the quoted question should be 23; I copied it from the description without checking, but since the question itself shows 23 further down, the 27 was presumably a simple typo.
I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value. There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values. If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
My first idea was:
value & ~( MAX - 0xFF )
but this does not appear to work for signed values. My other thought was:
value = value - (value & 0xFF)
which has the disadvantage that it requires a stack operation.
Clearing the low byte without knowing the width of the integer type can result in incorrect code, so code should be careful.
Consider the below, where value is wider than int/unsigned. 0xFF is an int constant with the value 255. ~0xFF is then that value with its bits inverted; with common two's complement, that is -256, with its upper bits set: FF...FF00. -256 converted to a wider signed type retains its value and pattern FF...FF00. -256 converted to a wider unsigned type becomes Uxxx_MAX + 1 - 256, again with the bit pattern FF...FF00. In both cases, the & retains the upper bits and clears the lower 8.
value_low_8bits_cleared = value & ~0xFF;
An alternative is to do all masking operations with unsigned math, avoiding the unexpected properties of int math and int encodings.
The below has no concerns about sign extension or int overflow. An optimizing compiler will certainly emit efficient code with a simple AND mask. Further, there is no need to supply the correct MAX value matching the type of value.
value_low_8bits_cleared = (value | 0xFFu) ^ 0xFFu;
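A quick check of that idiom (a sketch; the value is illustrative and a two's complement machine is assumed):
#include <stdio.h>

int main(void)
{
    /* -123457 is ...FFFE1DBF in two's complement; clearing the low
     * byte should give ...FFFE1D00, i.e. -123648. */
    long value = -123457L;
    long cleared = (long)((value | 0xFFu) ^ 0xFFu);
    printf("%ld\n", cleared);  /* prints -123648 */
    return 0;
}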
Here is the easy way to clear the low-order 8 bits:
value &= ~0xFF;
I am writing code that may run on architectures of different word size
(32-bit, 64-bit, etc) and I want to clear the low byte of a value.
There is a macro (MAX) that is set to the maximum value of a word. So,
for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit
system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values.
Although C is designed so that implementations can take machine word size into account, the language itself has no inherent sense of machine words. C cares instead about types, and that makes a difference.
Anyway, I take you at your word that you arrange for the replacement text of the macro MAX to be one of the two alternatives you give, depending on the architecture of the machine. Note well that when that replacement text is interpreted as an integer constant, its type may vary between C implementations, perhaps even depending on compiler options.
If I have a
word-sized variable that may be signed or unsigned, how can I clear
the low byte of the variable with a single expression (no branching)?
The only reason I see for needing a single expression that cannot take the actual type of value explicitly into account is that you want to use the expression in a macro itself. In that case, you need to take great care around type conversions, especially when you have to account for signed types. This makes your MAX macro uncomfortable to work with for your purpose.
I'm inclined to suggest a different approach:
(value | 0xFF) ^ 0xFF
The constant 0xFF will be interpreted as a (signed) int with a positive value. Provided that value's type is not smaller than int, both appearances of 0xFF will be converted to that type without change in value, whether that type is signed or unsigned. Furthermore, the result of each operation and of the overall expression then has the same type as value, so no unexpected conversions occur.
How about
value & ~((intptr_t)0xFF)
First you want a mask that has all bits on except those of the low-order byte:
MAX ^ 0xFF
This converts 0xFF to the same type as MAX and then XORs the two. Because MAX has all bits set, the low-order 8 bits become 0 after the XOR and the high-order bits stay as they are, that is, 1.
Then you apply that mask to the value that interests you:
value & ( MAX ^ 0xFF )
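A quick check of this approach (a sketch assuming the question's 32-bit MAX; note that MAX must match the width of value's type, which is the question's premise):
#include <stdio.h>

#define MAX 0xFFFFFFFFu  /* the question's 32-bit case */

int main(void)
{
    unsigned long value = 0x12345678;
    printf("%lX\n", value & (MAX ^ 0xFF));  /* prints 12345600 */
    return 0;
}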
The Standard specifies that hexadecimal constants like 0x8000 (too large to fit in a signed int) are unsigned (just like octal constants), whereas decimal constants like 32768 are signed long. (The exact types assume a 16-bit int and a 32-bit long.) However, in regular C environments both will have the same representation, in binary 1000 0000 0000 0000.
Is a situation possible where this difference really produces a different outcome? In other words, is a situation possible where this difference matters at all?
Yes, it can matter. If your processor has a 16-bit int and a 32-bit long type, 32768 has the type long (since 32767 is the largest positive value fitting in a signed 16-bit int), whereas 0x8000 (since it is also considered for unsigned int) still fits in a 16-bit unsigned int.
Now consider the following program:
int main(int argc, char *argv[])
{
volatile long long_dec = ((long)~32768);
volatile long long_hex = ((long)~0x8000);
return 0;
}
When 32768 is considered long, the negation will invert 32 bits,
resulting in a representation 0xFFFF7FFF with type long; the cast is
superfluous.
When 0x8000 is considered unsigned int, the negation will invert
16 bits, resulting in a representation 0x7FFF with type unsigned int;
the cast will then zero-extend to a long value of 0x00007FFF.
Look at H&S5, section 2.7.1 page 24ff.
It is best to augment the constants with U, UL or L as appropriate.
On a 32 bit platform with 64 bit long, a and b in the following code will have different values:
int x = 2;
long a = x * 0x80000000; /* multiplication done in unsigned -> 0 */
long b = x * 2147483648; /* multiplication done in long -> 0x100000000 */
Another example not yet given: compare (with greater-than or less-than operators) -1 to both 32768 and 0x8000. Or, for that matter, try comparing each of them for equality with an int variable equal to -32768.
Assuming int is 16 bits and long is 32 bits (which is actually fairly unusual these days; int is more commonly 32 bits):
printf("%ld\n", 32768); // prints "32768"
printf("%ld\n", 0x8000); // has undefined behavior
In most contexts, a numeric expression will be implicitly converted to an appropriate type determined by the context. (That's not always the type you want, though.) This doesn't apply to non-fixed arguments to variadic functions, such as any argument to one of the *printf() functions following the format string.
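The usual fix, sketched: convert the argument yourself, since the format string cannot cause any implicit conversion:
printf("%ld\n", (long)0x8000);  /* well-defined; prints "32768" */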
The difference shows up in arithmetic: if you tried to add to the value, a 16-bit int could not take the addition without exceeding the bounds of the type, whereas a 32-bit long could have any number less than 2^16 added to it.
I have a lot of code that performs bitwise operations on unsigned integers. I wrote my code with the assumption that those operations were on integers of fixed width without any padding bits. For example an array of 32-bit unsigned integers of which all 32 bits available for each integer.
I'm looking to make my code more portable, and I'm focused on making sure I'm C89 compliant (in this case). One of the issues I've come across is possible integer padding. Take this extreme example, taken from the GMP manual:
However on Cray vector systems it may be noted that short and int are always stored in 8 bytes (and with sizeof indicating that) but use only 32 or 46 bits. The nails feature can account for this, by passing for instance 8*sizeof(int)-INT_BIT.
I've also read about this type of padding in other places. I actually read a post on SO last night (forgive me, I don't have the link and I'm going to cite something similar from memory) where, if you have, say, a double with 60 usable bits, the other 4 could be used for padding, and those padding bits could serve some internal purpose so that they cannot be modified.
So let's say, for example, my code is compiled on a platform where an unsigned int is sized at 4 bytes, each byte being 8 bits, but where the most significant 2 bits are padding bits. Would UINT_MAX in that case be 0x3FFFFFFF (1073741823)?
#include <stdio.h>
#include <stdlib.h>
/* padding bits represented by underscores */
int main( int argc, char **argv )
{
unsigned int a = 0x2AAAAAAA; /* __101010101010101010101010101010 */
unsigned int b = 0x15555555; /* __010101010101010101010101010101 */
unsigned int c = a ^ b; /* ?? __111111111111111111111111111111 */
unsigned int d = c << 5; /* ?? __111111111111111111111111100000 */
unsigned int e = d >> 5; /* ?? __000001111111111111111111111111 */
printf( "a: %X\nb: %X\nc: %X\nd: %X\ne: %X\n", a, b, c, d, e );
return 0;
}
Is it safe to XOR two integers with padding bits?
Wouldn't I XOR whatever the padding bits are?
I can't find this behavior covered in C89.
Furthermore, is the variable c guaranteed to be 0x3FFFFFFF? Or, if for example the two padding bits were both set in a or b, would c be 0xFFFFFFFF?
Same question with d and e. Am I manipulating the padding bits by shifting?
I would expect to see this below, assuming 32 bits with the 2 most significant bits used for padding, but I want to know if something like this is guaranteed:
a: 2AAAAAAA
b: 15555555
c: 3FFFFFFF
d: 3FFFFFE0
e: 01FFFFFF
Also are padding bits always the most significant bits or could they be the least significant bits?
EDIT 12/19/2010 5PM EST: Christoph has answered my question. Thanks!
I had also asked (above) whether padding bits are always the most significant bits. The answer, per the rationale for the C99 standard, is no. I am playing it safe and assuming the same for C89. Here is specifically what the C99 rationale says for §6.2.6.2 (Representation of Integer Types):
Padding bits are user-accessible in an unsigned integer type. For example, suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
Footnotes 44 and 45 mention that parity bits might be padding bits. The committee does not know of any machines with user-accessible parity bits within an integer. Therefore, the committee is not aware of any machines that treat parity bits as padding bits.
EDIT 12/28/2010 3PM EST: I found an interesting discussion on comp.lang.c from a few months ago.
Bitwise Operator Effects on Padding Bits (VelocityReviews reader)
Bitwise Operator Effects on Padding Bits (Google Groups alternate link)
One point made by Dietmar which I found interesting:
Let's note that padding bits are not necessary for the existence of trap representations; combinations of value bits which do not represent a value of the object type would also do.
Bitwise operations (like arithmetic operations) operate on values and ignore padding. The implementation may or may not modify padding bits (or use them internally, e.g. as parity bits), but portable C code will never be able to detect this. Values (including UINT_MAX) never include the padding.
Where integer padding might lead to problems is if you use things like sizeof (int) * CHAR_BIT and then try to use shifts to access all these bits. If you want to be portable, either use only (unsigned) char, fixed-size integers (a C99 addition), or determine the number of value bits programmatically. This can be done at compile time with the preprocessor by comparing UINT_MAX against powers of 2, or at runtime by using bit operations.
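A runtime sketch of that last suggestion: count the value bits of unsigned int by repeatedly halving UINT_MAX.
#include <limits.h>

static int unsigned_int_value_bits(void)
{
    int bits = 0;
    unsigned int max = UINT_MAX;
    while (max != 0) {
        max >>= 1;
        ++bits;
    }
    return bits;  /* 32 on common platforms; less than
                     sizeof(int) * CHAR_BIT if padding bits exist */
}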
edit:
C90 does not mention integer padding at all, but as far as I can tell, 'invisible' preceding or trailing integer padding bits shouldn't violate the standard (I didn't go through all relevant sections to make sure this is really the case, though); there probably are problems with mixed padding and value bits as mentioned in the C99 rationale, because otherwise the standard would not have needed to be changed.
As to the meaning of user-accessible: padding bits are accessible insofar as you can always get at any bit of foo (including padding) by using bit operations on ((unsigned char *)&foo)[…]. Be careful when modifying the padding bits, though: the result won't change the value of the integer, but might nevertheless create a trap representation. In case of C90, this is implicitly unspecified (as in not mentioned at all); in case of C99, it's implementation-defined.
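For example (a minimal sketch), the raw bytes of an unsigned int, value bits and padding alike, can be inspected like this:
#include <stdio.h>

int main(void)
{
    unsigned int foo = 0x2AAAAAAAu;
    unsigned char *p = (unsigned char *)&foo;
    size_t i;
    /* Byte order, and where any padding bits live, are
     * implementation-specific. */
    for (i = 0; i < sizeof foo; i++)
        printf("%02X ", (unsigned)p[i]);
    putchar('\n');
    return 0;
}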
This was not what the rationale quotation was about, though: the cited architecture represents 32-bit integers via two 16-bit integers. In case of unsigned types, the resulting integer has 32 value bits and a precision of 32; in case of signed integers, it has only 30 value bits and a precision of 30: one of the sign bits of the 16-bit integers is used as the sign bit of the 32-bit integer, the other one is ignored, thus creating a padding bit surrounded by value bits. Now, if you access a 32-bit signed integer as an unsigned integer (which is explicitly allowed and does not violate the C99 aliasing rules), the padding bit becomes a (user-accessible) value bit.