void times(unsigned short int time)
{
    hours = time >> 11;
    minutes = ((time << 5) >> 10);
}
Take the input time to be 24446
The output values are
hours = 11
minutes = 763
The expected values are
hours = 11
minutes = 59
What internal processing is going on in this code?
Binary of 24446 is 0101111101111110
Time>>11 gives 01011 which means 11.
((Time<<5)>>10) gives 111011 which means 59.
But what else is going on here?
If time is unsigned short, there is an important difference between
minutes=((time<<5)>>10);
and
unsigned short temp = time << 5;
minutes = temp >> 10;
In both expressions, time << 5 is computed as an int, because of integer promotion rules. [Notes 1 and 2].
In the first expression, this int result is then right-shifted by 10. In the second expression, the assignment to unsigned short temp narrows the result to a short, which is then right-shifted by 10.
So in the second expression, high-order bits are removed (by the cast to unsigned short), while in the first expression they won't be removed if int is wider than short.
There is another important caveat with the first expression. Since the integer promotions might change an unsigned short into an int, the intermediate value might be signed, in which case overflow would be possible if the left shift were large enough. (In this case, it isn't.) The right shift might then be applied to a negative number, and the result of that is implementation-defined; many implementations define right-shifting a negative number to sign-extend it. This can also lead to surprises.
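To make the difference concrete, here is a minimal sketch (assuming a 32-bit int, and reusing the value from the question):

#include <stdio.h>

int main(void)
{
    unsigned short time = 24446;

    /* Combined: (time << 5) is computed as int, so the high-order
       bits survive, and the right shift brings them back down. */
    unsigned int combined = (time << 5) >> 10;    /* 763 */

    /* Two steps: the assignment narrows to unsigned short first,
       discarding the high-order bits. */
    unsigned short temp = time << 5;              /* 0xEFC0 */
    unsigned int stepped = temp >> 10;            /* 59 */

    printf("combined = %u, stepped = %u\n", combined, stepped);
    return 0;
}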
Notes:
Assuming that int is wider than short. If unsigned int and unsigned short are the same width, no conversion will happen and you won't see the difference you describe. The "integer promotions" are described in §6.3.1.1/2 of the C standard (using the C11 draft):
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
Integer promotion rules effectively make it impossible to do any arithmetic computation directly with a type smaller than int, although compilers may use the as-if rule to emit sub-word opcodes. In that case, they have to produce the same result as would have been produced with the promoted values; that's easy for unsigned addition and multiplication, but trickier for shift.
The bitshift operators are an exception to the semantics of arithmetic operations. For most arithmetic operations, the C standard requires that "the usual arithmetic conversions" be applied before performing the operation. The usual arithmetic conversions, among other things, guarantee that the two operands have the same type, which will also be the type of the result. For bitshifts, the standard only requires that integer promotions be performed on both operands, and that the result will have the type of the left operand. That's because the shift operators are not symmetric. For almost all architectures, there is no valid right operand for a shift which will not fit in an unsigned char, and there is obviously no need for the types or even the signedness of the left and right operands to be the same.
In any event, as with all arithmetic operators, the integer promotions (at least) are going to be performed. So you will not see intermediate results narrower than an int.
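A quick sketch illustrating that last point: the result type of a shift follows the promoted left operand, no matter how wide the right operand is.

#include <stdio.h>

int main(void)
{
    unsigned short s = 1;
    long long count = 3;
    /* s is promoted to int; the result has type int regardless of
       the (wider) type of the right operand. */
    printf("%zu %zu\n", sizeof(s << count), sizeof(int));  /* equal */
    return 0;
}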
This piece of code seems to think that int is 16 bit and left shifting time would clear the top 5 bits.
Since you're most likely working on a 32- or 64-bit platform, that clearing doesn't happen, and as long as the value in time is a 16-bit value:
time >> 5 == (time << 5) >> 10
Try this:
minutes = (time >> 5) & 0x3F;
or
minutes = (time & 0x07FF) >> 5;
or
Declare time as unsigned short and cast to unsigned short after every shift operation since math is 32/64 bit.
24446 in binary is: 0101 1111 0111 1110
Bits 0-4 - unknown
Bits 5-10 minutes
Bits 11-15 hours
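Putting the suggested fixes and the bit layout together, a corrected version of the function might look like this (a sketch; hours and minutes are assumed to be the same variables as in the question):

unsigned int hours, minutes;

/* Packed layout per the question: bits 11-15 = hours,
   bits 5-10 = minutes, bits 0-4 unused. */
void times(unsigned short time)
{
    hours   = time >> 11;           /* top 5 bits */
    minutes = (time >> 5) & 0x3F;   /* next 6 bits, masked explicitly */
}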
It seems that the size of int on the platform you are working on is 32 bits.
As far as the processing is concerned, note that:
The first statement divides "time" by 2^11 (2048).
The second statement multiplies "time" by 2^5 (32) and then divides the whole by 2^10 (1024).
That answers your question as asked.
If you add what the time value actually contains (number of seconds/milliseconds/hours or something else) then you may get more help.
Edit:
As @egur pointed out, you might be porting your code from a 16-bit to a 32/64-bit platform.
A widely accepted C coding style to make code portable is something like the following:
Make a Typedef.h file and include it in every other C file.
//Typedef.h
typedef unsigned int  U16;
typedef signed int    S16;
typedef unsigned char U8;
typedef signed char   S8;
:
:
//END
Use U16, U8, etc. while declaring variables.
Now when you move to a larger, say 32-bit, processor, change your Typedef.h to:
//Typedef.h
typedef unsigned int   U32;
typedef signed int     S32;
typedef unsigned short U16;
typedef signed short   S16;
No need to change anything in the rest of the code.
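Note that modern C (C99 and later) already provides such fixed-width types in <stdint.h>, so on most platforms hand-rolled typedefs are no longer necessary:

#include <stdint.h>

uint16_t packed_time;  /* exactly 16 bits, wherever the type is provided */
int32_t  counter;      /* exactly 32 bits */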
Edit 2:
After seeing your edit:
((Time<<5)>>10) gives 111011 which means 59.
For 32 bit processors
((Time<<5)>>10) gives 0000 0010 1111 1011 which means 763.
Related
In the case of unsigned short, I shifted 383 by 11 positions to the left and, in the same instruction, shifted it by 15 positions to the right; I expected the value to be 1, but it was 27. When I used the two shift operations in separate instructions, one after the other (first the left shift and then the right), the output was 1.
Here is a sample code:
unsigned short seed = 383;
printf("size of short: %zu\n", sizeof(short));
unsigned short seedout, seed1, seed2, seed3, seedout1;
seed1 = (seed << 11);
seed2 = (seed1 >> 15);
seed3 = ((seed << 11) >> 15);
printf("seed1 :%d\t seed2: %d\t seed3: %d\n", seed1, seed2, seed3);
and its output was:
size of short: 2
seed1 :63488 seed2: 1 seed3: 23
seedout1: 8 seedout :382
Process returned 0 (0x0) execution time : 0.154 s
For clarity, you compare
unsigned short seed1 = (seed<<11);
unsigned short seed2 = (seed1>>15);
on one hand and
unsigned short seed3 = ((seed<<11)>>15);
on the other hand.
The first one takes the result of the shift operation, stores it in an unsigned short variable (which apparently is 16 bits on your platform) and shifts this result right again.
The second one shifts the result immediately.
The reason for the difference, i.e. why the bits shifted out to the left are retained, is the following:
Although seed is unsigned short, seed<<11 is a signed int. Thus, these bits are not cut off, as happens when the result is stored; they are kept in the intermediate signed int. Only the assignment to seed1 converts the value to unsigned short, which clips those bits.
In other words: your second example is merely equivalent to
int seed1 = (seed<<11); // instead of short
unsigned short seed2 = (seed1>>15);
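If the intent really is to discard the high bits within a single expression, the narrowing can be forced with an explicit cast; a sketch:

unsigned short seed = 383;
unsigned short seed3 = (unsigned short)(seed << 11) >> 15;  /* 1, not 23 */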
Regarding left shifting, type signedness & implicit promotion:
Whenever something is left shifted into the sign bit of a signed integer type, we invoke undefined behavior. Similarly, we also invoke undefined behavior when left-shifting a negative value.
Therefore we must always ensure that the left operand of << is unsigned. And here's the problem: unsigned short is a small integer type, so it is subject to implicit type promotion whenever it is used in an expression. Shift operators always integer-promote the left operand:
C17 6.5.7:
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand.
(This makes shifts in particular a special case, since they don't care about the type of the right operand but only look at the left one.)
So on a 16 bit system, you'll run into the case where unsigned short gets promoted to unsigned int, because a 16 bit int cannot hold all values of a 16 bit unsigned short. And that's fine; it's not a dangerous conversion.
On a 32 bit system however, the unsigned short gets promoted to int, which is signed. Should you left shift a value like 0x8000 (MSB set) by 16 bits or more, you end up shifting data into the sign bit of the promoted int, which is a subtle and possibly severe bug. For example, this prints "oops" on my Windows computer:
#include <stdio.h>
int main (void)
{
    unsigned short x = 0x8000;
    if ((x << 16) < 0) // undefined behavior
        puts("oops");
}
But the compiler could just as well have assumed that a left shift of x can never result in a value less than x, and removed the whole check from the machine code upon optimization.
We need to be sure that we never end up with a signed type by accident! Meaning we must know how implicit type promotion works in C.
As for left-shifting unsigned int or larger unsigned types, that's perfectly well-defined as long as we don't shift further than the width of the (promoted) type itself (no more than 31 bits on a 32 bit system). Any bits shifted out will be discarded, and if you right shift, it will always be a logical shift where zeroes are shifted in from the right.
To answer the actual question:
Your unsigned short is integer promoted to an int on a 32 bit system. This allows shifting beyond the 16 bits of an unsigned short, but if you discard those extra bits by storing the result in an unsigned short, you end up with this:
383 = 0x17F
0x17f << 11 = 0xBF800
0xBF800 truncated to 16 bits = 0xF800 = 63488
0xF800 >> 15 = 0x1
However, if you skip the middle-step truncation to 16 bits, you have this instead:
0xBF800 >> 15 = 0x17 = 23
But again, this is only by luck since this time we didn't end up shifting data into the sign bit.
Another example: when executing this code, you might expect to get either the value 0 or the value 32768:
unsigned short x=32768;
printf("%d", x<<16>>16);
But it prints -32768 on my 2's complement PC. The x<<16 invokes undefined behavior, and the >>16 then apparently sign extended the result.
These kinds of subtle shift bugs are common, particularly in embedded systems. A frightening number of the C programs out there are written by people who didn't know about implicit promotions.
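A sketch of the defensive idiom: cast before shifting, so the intermediate is unsigned and no data is ever shifted into a sign bit:

#include <stdio.h>

int main(void)
{
    unsigned short x = 32768;
    /* The cast makes the intermediate an unsigned int, so the
       left shift never touches a sign bit. */
    printf("%u\n", (unsigned int)x << 16 >> 16);  /* prints 32768 */
    return 0;
}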
I shifted 383 by 11 positions towards left and again in the same instruction shifted it by 15 positions right, I expected the value to be 1 but it was 27
Simple math, you've shifted it 4 bits to the right, which is equivalent to dividing by 16.
Divide 383 by 16, and you get 27 (integer-division, of course).
Note that the "simply shifted it 4 bits" part holds because:
You've used an unsigned operand, which means that you did not "drag 1s" in when shifting right.
The shift-left operation is performed at 32-bit int width on your platform, so no data was lost during that part.
BTW, with regard to the 2nd bullet above - when you do this in parts and store the intermediate result into an unsigned short, you do indeed lose data and get a different result.
In other words, when doing seed<<11, the compiler uses 32-bit operations, and when storing the result into seed1, only the low 16 bits of the previous result are preserved.
EDIT:
27 above should be 23. I copied that from your description without checking, though I see that you did mention 23 further down your question, so I'm assuming that 27 was a simple typo...
I'm a little confused about how the C language treats different-sized ints when you do basic arithmetic and bitwise operations. What would happen in this case:
int8_t even = 0xBB;
int16_t twice = even << 8;
twice = twice + even;
What if I went the other way and attempted to add a 16-bit int to an 8-bit one? Is the normal int declaration dynamic? Why would I want to designate a size? What happens when I add two 8-bit ints that are both 0xFF?
As I mentioned in my comment, everything gets promoted to at least an int during arithmetic.
Here's an example program to demonstrate that:
#include <stdio.h>

int main(void)
{
    struct {
        char x : 1;
    } t;

    t.x = 1;
    int i = t.x << (sizeof i * 8 - 1);
    printf("i = %x\n", i);
}
t.x is only one bit, but in this operation it is promoted all the way to an integer to give the output:
i = 80000000
On the other hand, if we add the line
long long j = t.x << (sizeof j * 8 - 1);
gcc gave me the warning:
warning: left shift count >= width of type [enabled by default]
What if I went the other way and attempted to add a 16-bit int to an 8-bit one?
All the arithmetic would be done at integer precision (probably 32 bits) and then the bottom 8 bits of the result would be stored in the 8 bit number.
Is the normal int declaration dynamic?
No. It's a fixed width for a given implementation (probably 32 bits on yours).
Why would i want to designate a size?
Maybe you have space constraints. Maybe your algorithm was made to work with a specific size integer. Maybe you really want to work on some modulo field provided by uint16_t (Z_65536).
What happens when I add two 8-bit ints that are both 0xFF?
The promotion doesn't matter to the truncated result here: the 8-bit result will be 0xFE. If you were to store the result in a 16-bit number, then the result will be 0x1FE. (A conforming int is at least 16 bits wide, so the intermediate sum always fits.)
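A quick sketch demonstrating both outcomes, using unsigned fixed-width types for clarity:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t a = 0xFF, b = 0xFF;
    uint8_t  sum8  = a + b;   /* computed as int (0x1FE), then truncated */
    uint16_t sum16 = a + b;   /* 0x1FE fits in 16 bits */
    printf("%#x %#x\n", (unsigned)sum8, (unsigned)sum16);  /* 0xfe 0x1fe */
    return 0;
}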
Edit
I wrote about the unsigned convention here because your representing the number as 0xFF seemed to refer to unsigned numbers (in K&R C, 0xFF is actually an unsigned literal). If you were actually referring to the signed 8-bit value 0xFF, that is equivalent to -1, and your problem becomes sort of trivial. No matter how big the integer, it will always be able to represent -1, as well as -1 + (-1) = -2 (you only need two bits to represent these numbers).
Normal "int" is not dynamic, but is not common across compilers; it's the size that is most convenient for the CPU you're compiling for. You would want to designate a size if you have some reason (network protocol, file format, etc) why you actually care about the representation.
Answers to your exercises are below the answer to your question:
Your answer can be found in the C11 final draft spec (was ratified in October 2011, but the "final" document is not free): http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
The first relevant section is 6.3.1.1 Conversions : Arithmetic Operands : Boolean, characters and integers, beginning on page 50
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
The integer promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.
Essentially, the integer types are all promoted to at least int, and possibly to larger types than that if any of the operands has a greater rank. The promoted values generally become signed, but remain unsigned when int cannot represent the original type's full range.
There's a bunch of stuff specifically defining how the conversions are performed, and codifying the rules of which types are "higher rank" (bigger) than others, and then we reach the real meat in 6.3.1.8: Usual arithmetic conversions:
Many operators that expect operands of arithmetic type cause conversions and yield result types in a similar way. The purpose is to determine a common real type for the operands and result. For the specified operands, each operand is converted, without change of type domain, to a type whose corresponding real type is the common real type. Unless explicitly stated otherwise, the common real type is also the corresponding real type of the result, whose type domain is the type domain of the operands if they are the same, and complex otherwise. This pattern is called the usual arithmetic conversions:
... some stuff about floats ...
Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:
If both operands have the same type, then no further conversion is needed.
Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
int8_t even = 0xBB;
0xBB is formed as a constant of type int (0x000000BB), then converted down to int8_t; that conversion is implementation-defined, typically yielding -69 on a two's complement machine
int16_t twice = even << 8;
the value of even is sign extended from int8_t to int, becoming 0xFFFFFFBB (if I assume int is 32-bits on this machine)
The value of 8 is formed as type int
These types match, so no further conversions occur
The operator << is performed (now that types match), yielding 0xFFFFBB00
This value is truncated to fit in int16_t, yielding 0xBB00
twice = twice + even;
The value of twice is sign extended from int16_t to int, yielding 0xFFFFBB00 (-17664)
The value of even is sign extended from int8_t to int, yielding 0xFFFFFFBB (-69)
These values are added, yielding 0xFFFFBABB (-17733)
This value is then truncated to fit in int16_t, yielding 0xBABB
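For what it's worth, the walkthrough can be reproduced with a short program. A sketch; note that the left shift of a negative int8_t value is formally undefined behavior and the initial conversion is implementation-defined, so the comments describe typical two's complement behavior only:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t even = 0xBB;          /* implementation-defined: typically -69 */
    int16_t twice = even << 8;   /* typically 0xBB00, i.e. -17664 */
    twice = twice + even;        /* typically 0xBABB, i.e. -17733 */
    printf("%d\n", twice);       /* prints -17733 on a typical machine */
    return 0;
}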
What happens when I add two 8-bit ints that are both 0xFF?
They are both sign extended to int yielding 0xFFFFFFFF, then added yielding 0xFFFFFFFE. This may or may not be truncated back down depending on where you store it.
The compiler is of course free to notice that much of this extending/truncating stuff is worthless busywork because the upper output bits are not kept, and optimize it out. But the result is required to be "as if" it had done things the long way.
I am reverse engineering some old C running under Win95 (yes, in production), which appears to have been compiled with a Borland compiler (I don't have the tool chain).
There is a function which does (among other things) something like this:
static void unknown(int *value)
{
    int v = *value;
    v -= 0x8000;
    *value = v;
}
I can't quite work out what this does. I assume 'int' in this context is signed 32 bit. I think 0x8000 would be unsigned 32bit int, and outside the range of a signed 32 bit int. (edit - this is wrong, it is outside of a signed 16 bit int)
I am not sure if one of these would be cast first, how the casting would handle overflows, and/or how the subtraction would handle the overflow.
I could try on a modern system, but I am also unsure if the results would be the same.
Edit for clarity:
1: 'v-=0x8000;' is straight from the original code; this is what makes little sense to me. v is defined as an int.
2: I have the code, this is not from asm.
3: The original code is very, very bad.
Edit: I have the answer! The answer below wasn't quite right, but it got me there (fix up and I'll mark it as the answer).
The data in v comes from an ambiguous source, which actually seems to be sending unsigned 16-bit data, but it is being stored as a signed int. Later on in the program, all values are converted to floats and normalised around an average zero point, so the actual values don't matter, only their order. Because we are looking at an unsigned int as a signed one, values over 32767 are incorrectly placed below 0, so this hack leaves the value signed but swaps the negative and positive halves around (without changing their order). The end result is that all numbers have the same order (but different values) as if they had been unsigned in the first place.
(...and this is not the worst code example in this program)
In Borland C 3.x, int and short were the same: 16 bits. long was 32-bits.
A hex literal has the first type in which the value can be represented: int, unsigned int, long int or unsigned long int.
In the case of Borland C, 0x8000 is a decimal value of 32768 and won't fit in an int, but will in an unsigned int. So unsigned int it is.
The statement v -= 0x8000 ; is identical to v = v - 0x8000 ;
On the right-hand side, the int value v is implicitly converted to unsigned int, per the rules; the arithmetic operation is performed, yielding an rvalue that is an unsigned int. That unsigned int is then, again per the rules, implicitly converted back to the type of the lvalue.
So, by my estimation, the net effect is to toggle the sign bit — something that could be more easily and clearly done via simple bit-twiddling: *value ^= 0x8000 ;.
There is possibly a clue on this page http://www.ousob.com/ng/borcpp/nga0e24.php - Guide to Borland C++ 2.x ( with Turbo C )
There is no such thing as a negative numeric constant. If a minus sign precedes a numeric constant it is treated as the unary minus operator, which, along with the constant, constitutes a numeric expression. This is important with -32768, which, while it can be represented as an int, actually has type long int, since 32768 has type long. To get the desired result, you could use (int) -32768, 0x8000, or 0177777.
This implies the use of two's complement for negative numbers. Interestingly, the two's complement of 0x8000 is 0x8000 itself (as the value +32768 does not fit in the range for signed 2 byte ints).
So what does this mean for your function? Bit-wise, it has the effect of toggling the sign bit; here are some examples:
f(0) = f(0x0000) = 0x8000 = -32768
f(1) = f(0x0001) = 0x8001 = -32767
f(0x8000) = 0
f(0x7fff) = 0xffff
It seems like this could be represented as val ^= 0x8000, but perhaps the XOR operator was not implemented in Borland back then?
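Both observations can be checked by emulating the 16-bit Borland int with the fixed-width types from <stdint.h>; a sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t inputs[] = { 0, 1, 0x8000, 0x7FFF };
    for (int i = 0; i < 4; i++)
    {
        uint16_t v = inputs[i];
        uint16_t sub  = v - 0x8000u;   /* modular arithmetic, like 16-bit Borland */
        uint16_t flip = v ^ 0x8000u;   /* toggling the sign bit */
        printf("%#06x -> sub %#06x, xor %#06x\n",
               (unsigned)v, (unsigned)sub, (unsigned)flip);
    }
    return 0;
}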
I have a file that I've read into an array of data type signed char. I cannot change this fact.
I would now like to do this: !((c[i] & 0xc0) & 0x80) where c[i] is one of the signed characters.
Now, I know from section 6.5.10 of the C99 standard that "Each of the operands [of the bitwise AND] shall have integral type."
And Section 6.5 of the C99 specification tells me:
Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) shall have operands that have integral type. These operators return values that depend on the internal representations of integers, and thus have implementation-defined aspects for signed types.
My question is two-fold:
Since I want to work with the original bit patterns from the file, how can I convert/cast my signed char to unsigned char so that the bit patterns remain unchanged?
Is there a list of these "implementation-defined aspects" anywhere (say for MSVC and GCC)?
Or you could take a different route and argue that this produces the same result for both signed and unsigned chars for any value of c[i].
Naturally, I will reward references to relevant standards or authoritative texts and discourage "informed" speculation.
As others point out, in all likelihood your implementation is based on two's complement and will give exactly the result you expect.
However, if you're worried about the results of an operation involving a signed value, and all you care about is the bit pattern, simply cast directly to an equivalent unsigned type. The results are defined under the standard:
6.3.1.3 Signed and unsigned integers
...
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
This is essentially specifying that the result will be the two's complement representation of the value.
Fundamental to this is that in two's complement maths the result of a calculation is modulo some power of two (i.e. the number of bits in the type), which in turn is exactly equivalent to masking off the relevant number of bits. And the complement of a number is the number subtracted from the power of two.
Thus adding a negative value is the same as adding any value which differs from the value by a multiple of that power of two.
i.e:
(0 + signed_value) mod (2^N)
==
(2^N + signed_value) mod (2^N)
==
(7 * 2^N + signed_value) mod (2^N)
etc. (if you know modular arithmetic, that should be pretty self-evident)
So if you have a negative number, adding a power of two will make it positive (-5 + 256 = 251), but the bottom N bits will be exactly the same (0b11111011), and it will not affect the outcome of a mathematical operation. As values are then truncated to fit the type, the result is exactly the binary value you expected, even if the result 'overflows' (i.e. what you might think happens if the number was positive to start with; this wrapping is also well-defined behaviour).
So in 8-bit two's complement:
-5 is the same as 251 (i.e 256 - 5) - 0b11111011
If you add 30, and 251, you get 281. But that's larger than 256, and 281 mod 256 equals 25. Exactly the same as 30 - 5.
251 * 2 = 502. 502 mod 256 = 246. 246 and -10 are both 0b11110110.
Likewise if you have:
unsigned int a;
int b;
a - b == a + (unsigned int) -b;
Under the hood, this cast is unlikely to be implemented with arithmetic and will certainly be a straight assignment from one register/value to another, or just optimised out altogether, as the maths does not make a distinction between signed and unsigned (interpretation of CPU flags is another matter, but that's an implementation detail). The standard exists to ensure that an implementation doesn't take it upon itself to do something strange instead, or, I suppose, for some weird architecture which isn't using two's complement...
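A quick sketch verifying the identity above (the values are illustrative):

#include <stdio.h>

int main(void)
{
    unsigned int a = 100u;
    int b = -5;
    /* Both expressions reduce to the same value modulo 2^32: */
    printf("%u %u\n", a - b, a + (unsigned int)-b);  /* prints 105 105 */
    return 0;
}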
unsigned char UC = *(unsigned char*)&C - this is how you can convert signed C to unsigned keeping the "bit pattern". Thus you could change your code to something like this:
!(( (*(unsigned char*)(c+i)) & 0xc0) & 0x80)
Explanation (with references):
761 When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.
1124 When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
These two imply that the unsigned char pointer points to the same byte as the original signed char pointer.
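A self-contained sketch of the same idea (the byte value is illustrative; the conversion in the initializer is implementation-defined, but the bit pattern read back through the unsigned char pointer is exactly the stored byte):

#include <stdio.h>

int main(void)
{
    signed char c = (signed char)0xBF;        /* some byte read from the file */
    unsigned char uc = *(unsigned char *)&c;  /* same byte, same bit pattern */
    printf("%#x\n", (unsigned)uc);            /* prints 0xbf */
    return 0;
}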
You appear to have something similar to:
signed char c[] = "\x7F\x80\xBF\xC0\xC1\xFF";
for (int i = 0; c[i] != '\0'; i++)
{
if (!((c[i] & 0xC0) & 0x80))
...
}
You are (correctly) concerned about sign extension of the signed char type. In practice, however, (c[i] & 0xC0) will convert the signed character to a (signed) int, but the & 0xC0 will discard any set bits in the more significant bytes; the result of the expression will be in the range 0x00 .. 0xFF. This will, I believe, apply whether you use sign-and-magnitude, one's complement or two's complement binary values. The detailed bit pattern you get for a specific signed character value varies depending on the underlying representation; but the overall conclusion that the result will be in the range 0x00 .. 0xFF is valid.
There is an easy resolution for that concern — cast the value of c[i] to an unsigned char before using it:
if (!(((unsigned char)c[i] & 0xC0) & 0x80))
The value c[i] is converted to an unsigned char before it is promoted to an int (or, the compiler might promote to int, then coerce to unsigned char, then promote the unsigned char back to int), and the unsigned value is used in the & operations.
Of course, the code is now merely redundant. Using & 0xC0 followed by & 0x80 is entirely equivalent to just & 0x80.
If you're processing UTF-8 data and looking for continuation bytes, the correct test is:
if (((unsigned char)c[i] & 0xC0) == 0x80)
"Since I want to work with the original bit patterns from the file,
how can I convert/cast my signed char to unsigned char so that the bit patterns remain unchanged?"
As someone already explained in a previous answer to your question on the same topic, any small integer type, be it signed or unsigned, will get promoted to the type int whenever used in an expression.
C11 6.3.1.1
"If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions."
Also, as explained in the same answer, small integer literals such as these are of type int.
Therefore, your expression will boil down to the pseudo code (int) & (int) & (int). The operations will be performed on three temporary int variables and the result will be of type int.
Now, if the original data contained bits that may be interpreted as sign bits for the specific signedness representation (in practice this will be two's complement on all systems), you will get problems. Because these bits will be preserved upon promotion from signed char to int.
And then the bit-wise & operator performs an AND on every single bit regardless of the contents of its integer operand (C11 6.5.10/3), be it signed or not. If you had data in the sign bits of your original signed char, it is now lost, because the integer literals (0xC0 or 0x80) have no bits set that correspond to the sign bits.
The solution is to prevent the sign bits from getting transferred to the "temporary int". One solution is to cast c[i] to unsigned char, which is completely well-defined (C11 6.3.1.3). This will tell the compiler that "the whole contents of this variable is an integer, there are no sign bits to be concerned about".
Better yet, make a habit of always using unsigned data in every form of bit manipulations. The purist, 100% safe, MISRA-C compliant way of re-writing your expression is this:
if ((((uint8_t)c[i] & 0xc0u) & 0x80u) > 0u)
The u suffix actually enforces the expression to be of unsigned int, but it is good practice to always cast to the intended type. It tells the reader of the code "I actually know what I am doing and I also understand all weird implicit promotion rules in C".
And then, if we know our hex, we see that the double mask is pointless: x & 0xC0 & 0x80 is always the same as x & 0x80. Therefore simplify the expression to:
if ( ((uint8_t)c[i] & 0x80u) > 0u)
"Is there a list of these "implementation-defined aspects" anywhere"
Yes, the C standard conveniently lists them in Appendix J.3. The only implementation-defined aspect you encounter in this case though, is the signedness implementation of integers. Which in practice is always two's complement.
EDIT:
The quoted text in the question is concerned with the fact that the various bit-wise operators can produce implementation-defined results. This is just briefly mentioned as implementation-defined even in the appendix, with no exact references. The actual chapter 6.5 doesn't say much about implementation-defined behavior of & and | etc. The only operators where it is explicitly mentioned are << and >>, where left-shifting a negative number is even undefined behavior, and right-shifting it is implementation-defined.
The number 1, right shifted by anything greater than 0, should be 0, correct? Yet I can type in this very simple program which prints 1.
#include <stdio.h>
int main()
{
    int b = 0x80000000;
    int a = 1 >> b;
    printf("%d\n", a);
}
Tested with gcc on linux.
6.5.7 Bitwise shift operators:
If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
The compiler has license to do anything, obviously, but the most common behaviors are to optimize the expression (and anything that depends on it) away entirely, or simply to let the underlying hardware do whatever it does for out-of-range shifts. Many hardware platforms (including x86 and ARM) mask some number of low-order bits to use as the shift amount. The actual hardware instruction will give the result you are observing on either of those platforms, because the shift amount is masked to zero. So in your case the compiler might have optimized away the shift, or it might simply be letting the hardware do whatever it does. Inspect the assembly if you want to know which.
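For illustration, here is a sketch that makes the hardware's masking explicit in C (which, unlike relying on the hardware, is well-defined); the & 31 models the 5-bit count field that 32-bit x86 shifts use:

#include <stdio.h>

int main(void)
{
    unsigned int count = 0x80000000u;
    /* Applying the mask explicitly: 0x80000000 & 31 == 0, so this is 1 >> 0. */
    unsigned int a = 1u >> (count & 31);
    printf("%u\n", a);  /* prints 1, matching the observed behavior */
    return 0;
}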
According to the standard, shifting by more bits than actually exist in the (promoted) type can result in undefined behavior, so we cannot blame the compiler for that.
The motivation probably lies in the "borderline" value of 0x80000000, which sits on the boundary between the maximum positive and the negative values (it reads as "negative" once the highest bit is set), and in the checks that would otherwise be needed but that the compiled program omits so as not to waste time verifying "impossible" things (do you really want the processor to shift bits 2 billion times?).
It's very probably not attempting to shift by some large number of bits.
INT_MAX on your system is probably 2**31-1, or 0x7fffffff (I'm using ** to denote exponentiation). If that's the case, then in the declaration:
int b = 0x80000000;
(which was missing a semicolon in the question; please copy-and-paste your exact code) the constant 0x80000000 is of type unsigned int, not int. The value is implicitly converted to int. Since the result is outside the bounds of int, the result is implementation-defined (or, in C99, may raise an implementation-defined signal, but I don't know of any implementation that does that).
The most common way this is done is to reinterpret the bits of the unsigned value as a 2's-complement signed value. The result in this case is -2**31, or -2147483648.
So the behavior isn't undefined because you're shifting by value that equals or exceeds the width of type int, it's undefined because you're shifting by a (very large) negative value.
Not that it matters, of course; undefined is undefined.
NOTE: The above assumes that int is 32 bits on your system. If int is wider than 32 bits, then most of it doesn't apply (but the behavior is still undefined).
If you really wanted to attempt to shift by 0x80000000 bits, you could do it like this:
unsigned long b = 0x80000000;
unsigned long a = 1 >> b; // *still* undefined
unsigned long is guaranteed to be big enough to hold the value 0x80000000, so you avoid part of the problem.
Of course, the behavior of the shift is just as undefined as it was in your original code, since 0x80000000 is greater than or equal to the width of unsigned long. (Unless your compiler has a really big unsigned long type, but no real-world compiler does that.)
The only way to avoid undefined behavior is not to do what you're trying to do.
It's possible, but vanishingly unlikely, that your original code's behavior is not undefined. That can only happen if the implementation-defined conversion of 0x80000000 from unsigned int to int yields a value in the range 0 .. 31. If int is smaller than 32 bits, the conversion is likely to yield 0.
Well, read this; maybe it can help you:
expression1 >> expression2
The >> operator masks expression2 to avoid shifting expression1 by too much. That's because if the shift amount exceeded the number of bits in the data type of expression1, all the original bits would be shifted away, giving a trivial result. Now, to ensure that each shift leaves at least one of the original bits, the shift operators use the following formula to calculate the actual shift amount: mask expression2 (using the bitwise AND operator) with one less than the number of bits in expression1.
Example
var x : byte = 15;
// A byte stores 8 bits.
// The bits stored in x are 00001111
var y : byte = x >> 10;
// Actual shift is 10 & (8-1) = 2
// The bits stored in y are 00000011
// The value of y is 3
print(y); // Prints 3
That "8-1" is because x is 8 bytes so the operacion will be with 7 bits. that void remove last bit of original chain bits