What does the combination "&" with "-" mean? - c

Does anyone know what &- does in C programming?
limit = address + (n &- sizeof(uint));

This isn't really one operator, but two:
(n) & (-sizeof(uint))
i.e. this is performing a bitwise and operation between n and -sizeof(uint).
What does this mean?
Let's assume sizeof(uint) is 4 - then, because of two's complement representation (and unsigned wrap-around, assuming a 32-bit size_t here), -sizeof(uint) is 0xFFFFFFFC, or
1111 1111 1111 1111 1111 1111 1111 1100
We can see that this bitwise AND operation will zero out the last two bits of n. This effectively rounds n down to the nearest multiple of sizeof(uint).
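For illustration, here is a minimal, self-contained sketch of the same idea (it assumes uint is a typedef for unsigned int, as implied by the question, and a 4-byte unsigned int):
#include <stdio.h>

typedef unsigned int uint;   /* assumption: the question's homebrewed type */

int main(void)
{
    size_t n = 13;
    /* -sizeof(uint) wraps around to ...11111100 for a 4-byte uint,
       so the AND clears the two lowest bits of n */
    size_t aligned = n & -sizeof(uint);
    printf("%zu\n", aligned);   /* prints 12: 13 rounded down to a multiple of 4 */
    return 0;
}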

&- is the binary bitwise AND operator written together with - which is the unary minus operator. Operator precedence (binary & having lowest precedence here) gives us the operands of & as n and -sizeof(uint).
The purpose is to create a bit mask in a very obscure way, relying on unsigned integer arithmetic. Assuming uint is 4 bytes (don't use homebrewed types either btw, use stdint.h), then the code is equivalent to this
n & -(size_t)4
size_t being the type returned by sizeof, which is guaranteed to be a large, unsigned integer type. Applying unary minus to unsigned types is of course nonsense too. Though even if it is obscure, applying minus in unsigned arithmetic results in well-defined wrap-around 1), so in the case of the value 4, we get 0xFFFFFFFFFFFFFFFC on a typical PC where size_t is 64 bits.
n & 0xFFFFFFFFFFFFFFFC will mask out everything but the 2 least significant bits.
What the relation between these 2 bits and the size of the type used is, I don't know. I guess that the purpose is to store something equivalent to the type's size in bytes in that area. Something with 4 values will fit in the two least significant bits: binary 0, 1, 10, 11. (The purpose could maybe be masking out misaligned addresses or some such?)
Assuming I guessed correct, we can write the same code without any obfuscation practices as far more readable code:
~(sizeof(uint32_t)-1)
Which gives us 4-1 = 0x3, ~0x3 = 0xFFFF...FC. Or in case of 8 byte types, 0xFFFF...F8. And so on.
So I'd rewrite the code as
#include <stdint.h>
uint32_t mask = ~(sizeof(uint32_t)-1);
limit = address + (n & mask);
1) C17 6.3.1.3
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type60)
Where the foot note 60) says:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
In this case repeatedly subtracting SIZE_MAX+1 from 4 until the value is in range of what can fit inside a size_t variable.
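As a quick check of that wrap-around (the hex value shown assumes a 64-bit size_t):
#include <stdio.h>

int main(void)
{
    size_t wrapped = -(size_t)4;              /* well-defined unsigned wrap-around */
    printf("%zx\n", wrapped);                 /* typically fffffffffffffffc */
    printf("%zu\n", (size_t)13 & wrapped);    /* 12: rounded down to a multiple of 4 */
    return 0;
}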

Related

can't shift negative numbers to the right in c

I am going through 'The C Programming Language' by K&R. Right now I am doing the bitwise section. I am having a hard time understanding the following code.
int mask = ~0 >> n;
I was planning on using this to mask off the n left-most bits of another binary number, like this:
0000 1111
1010 0101 // random number
My problem is that when I print the variable mask, it is still -1. Assuming n is 4, I thought shifting ~0 (which is -1) would give 15 (0000 1111).
Thanks for the answers.
Performing a right shift on a negative value yields an implementation defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case, however that doesn't necessarily have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Right-shifting negative signed integers is implementation-defined behavior, which usually (but not always) fills the left with ones instead of zeros. That's why no matter how many bits you've shifted, it's always -1, as the left is always filled with ones.
When you shift unsigned integers, the left will always be filled by zeros. So you can do this:
unsigned int mask = ~0U >> n;
^
You should also note that int is typically 2 or 4 bytes, meaning if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use a char instead:
unsigned char mask = ~0U;
mask >>= 4;
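Putting this together, a small sketch contrasting the two (the output comments assume a 32-bit int, an 8-bit char, and an implementation that sign-extends on right shift):
#include <stdio.h>

int main(void)
{
    int n = 4;
    int signed_mask = ~0 >> n;              /* implementation-defined: usually stays -1 */
    unsigned int unsigned_mask = ~0u >> n;  /* always shifts in zeros: 0x0fffffff */
    unsigned char byte_mask = ~0u;          /* truncated to 0xff */
    byte_mask >>= n;                        /* 0x0f, i.e. 15 */
    printf("%d %#x %d\n", signed_mask, unsigned_mask, byte_mask);
    return 0;
}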
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's complement negative number (and in this case the value).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipedia:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e. using masks, etc.) I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h> like uint32_t

How to determine size limits of variable?

I'm struggling with determining the size limits of each variable type, and I can't understand the following problem.
To get the maximum value of char, for example, I use: ~0 >> 1
Which should work like this:
1. transfer 0 to binary: 0000 0000 (I assume that char is stored on 8 bits)
2. negate it: 1111 1111 (now I'm out of char max size)
3. move one place right: 0111 1111 (I get 127, which seems to be correct)
Now I want to present this result using printf function.
Why exactly do I have to use cast like this:
printf("%d\n", (unsigned char)(~0) >> 1)?
I just don't get it. I assume that it has something to do with point 2 when I get out of char range, but I'm not sure.
I will be grateful if you present me more complex explanation to this problem.
Please don't use these kinds of tricks. They might work on ordinary machines but they are possibly unportable and hard to understand. Instead, use the symbolic constants from the header file limits.h which contains the size limits for each of the basic types. For instance, CHAR_MAX is the upper bound for a char, CHAR_MIN is the lower bound. Further limits for the numeric types declared in stddef.h and stdint.h can be found in stdint.h.
Now for your question: Arithmetic is done on values of type int by default, unless you cause the operands involved to have a different type. This happens for various reasons, like one of the variables involved having a different type or you using a literal of a different type (like 1.0 or 1L or 1U). Even more importantly, the type of an arithmetic expression propagates from the inside to the outside. Thus, in the statement
char c = 1 + 2 + 3;
The expression 1 + 2 + 3 is evaluated as type int and only converted to char immediately before assigning. Even more important is that in the C language, you can't do arithmetic on types smaller than int. For instance, in the expression c + 1 where c is of type char, the compiler inserts an implicit conversion from char to int before adding one to c. Thus, a statement like
c = c + 1;
actually behaves like this in C:
c = (char)((int)c + 1);
Thus, ~0 >> 1 actually evaluates to 0xffffffff (-1) on a usual 32-bit architecture because the type int usually has 32 bits and right-shifting a signed type usually shifts in the sign bit, so the most significant bit stays a one. Casting to unsigned char causes truncation, with the result being 0xff (255). All arguments but the first to printf are part of a variable argument list, which is a bit complicated but basically means that all types smaller than int are converted to int, float is converted to double, and all other types are left unchanged.
Now, how can we get this right? On an ordinary machine with two's complement and no padding bits one could use expressions like these to compute the largest and smallest char, assuming sizeof (char) < sizeof (int):
(1 << CHAR_BIT - 1) - 1; /* largest char */
-(1 << CHAR_BIT - 1); /* smallest char */
For other types, this is going to be slightly more difficult since we need to avoid overflow. Here is an expression that works for all signed integer types on an ordinary machine, where type is the type you want to have the limits of:
(type)(((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) - 1) /* largest */
(type)-((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) /* smallest */
For an unsigned type type, you could use this to get the maximum:
~(type)0
Please notice that all these tricks should not appear in portable code.
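As a quick illustration, here is a sketch that checks these expressions against the corresponding limits.h constants (it assumes two's complement, no padding bits, and sizeof(char) < sizeof(int), as stated above):
#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void)
{
    /* computed limits for signed char and int, next to the library constants */
    printf("%d %d\n", (1 << (CHAR_BIT - 1)) - 1, SCHAR_MAX);
    printf("%d %d\n",
           (int)(((uintmax_t)1 << (sizeof (int) * CHAR_BIT - 1)) - 1), INT_MAX);
    /* maximum of an unsigned type */
    printf("%u %u\n", ~(unsigned int)0, UINT_MAX);
    return 0;
}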
The exact effect of your actions is different from what you assumed.
0 is not 0000 0000. 0 has type int, which means that it is most likely 0000 0000 0000 0000 0000 0000 0000 0000, depending on how many bits int has on your platform. (I will assume 32-bit int.)
Now, ~0 is, expectedly, 1111 1111 1111 1111 1111 1111 1111 1111, which still has type int and is a negative value.
When you shift it to the right, the result is implementation-defined. Right-shifting negative signed integer values in C does not guarantee that you will obtain 0 in the sign bit. Quite the opposite, most platforms will actually replicate the sign bit when right-shifting. Which means that ~0 >> 1 will still give you 1111 1111 1111 1111 1111 1111 1111 1111.
Note, that even if you do this on a platform that shifts-in a 0 into the sign bit when right-shifting negative values, you will still obtain 0111 1111 1111 1111 1111 1111 1111 1111, which is in general case not the maximum value of char you were trying to obtain.
If you want to make sure that a right-shift operation shifts-in 0 bits from the left, you have to either 1) shift an unsigned bit-pattern or 2) shift a signed, but positive bit-pattern. With negative bit patterns you risk running into the sign-extending behavior, meaning that for negative values 1 bits would be shifted-in from the left instead of 0 bits.
Since C language does not have shifts that would work in the domain of [unsigned/signed] char type (the operand is promoted to int anyway before the shift), what you can do is make sure that you are shifting a positive int value and make sure that your initial bit-mask has the correct number of 1s in it. That is exactly what you achieve by using (unsigned char) ~0 as the initial mask. (unsigned char) ~0 will participate in the shift as a value of type int equal to 0000 0000 0000 0000 0000 0000 1111 1111 (assuming 8-bit char). After the shift you will obtain 0000 0000 0000 0000 0000 0000 0111 1111, which is exactly what you were trying to obtain.
That only works with unsigned integers. For signed integers, right-shifting a negative number and the behaviour of bit-wise inversion are implementation-defined. Not only does it depend on the representation of negative values, but also on what CPU instruction the compiler uses to perform the right shift (some CPUs do not have an arithmetic right shift, for instance).
So, unless you make additional constraints for your implementation, it is not possible to determine the limits of signed integers. This implies there is no completely portable way (for signed integers).
Note that whether char is signed or unsigned is also implementation-defined, and that (unsigned char)(~0) >> 1 is subject to integer promotions, so it will not yield a character result, but an int (which makes the format specifier correct - although presumably unintended).
Use limits.h to get macros for your implementation's integer limits. This file has to be provided by any standard-compliant C compiler.
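For completeness, a short example of the portable approach using those macros:
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("char:  %d .. %d\n", CHAR_MIN, CHAR_MAX);   /* char may be signed or unsigned */
    printf("short: %d .. %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:   %d .. %d\n", INT_MIN, INT_MAX);
    printf("long:  %ld .. %ld\n", LONG_MIN, LONG_MAX);
    return 0;
}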

How does unsigned subtraction work when it wraps around?

This is a macro in the lwIP source code:
#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
Which is used to check if a TCP sequence number is less than another, taking into account when the sequence numbers wrap around. It exploits the fact that unsigned arithmetic wraps around, but I am unable to understand how this works in this particular case.
Can anyone explain what happens and why the above works ?
Take a simple 4 bit integer example where a = 5 and b = 6. The binary representation of each will be
a = 0101
b = 0110
Now when we subtract these (or, equivalently, take the one's complement of b, add it to a, and add 1), we get the following
0101
1001
+ 1
-----
1111
1111 is equal to 15 (unsigned) or -1 (signed, again translated using two's complement). By casting the two numbers to unsigned, we ensure that if b > a, the difference between the two is going to be a large unsigned number and have its highest bit set. When translating this large unsigned number into its signed counterpart we will always get a negative number due to the set MSB.
As nos pointed out, when a sequence number wraps around from the max unsigned value back to the min, the macro will also return that the max value is < min using the above arithmetic, hence its usefulness.
On wrap-around a will be much greater than b. If we subtract them the result will also be very large, ie have its high-order bit set. If we then treat the result as a signed value the large difference will turn into a negative number, less than 0.
If you had 2 sequence numbers 2G apart it wouldn't work, but that's not going to happen.
Because it is first cast to a signed integer before it is compared to zero. Remember that the first bit, reading from left to right, determines the sign of a signed number, whereas in an unsigned int it is used to extend the range by an extra bit.
Example: let's say you have a 4-bit number. As an unsigned integer, 1001 is 9. But as a signed (two's complement) integer it is -7.
Now let's say we did b0010 - b0100. This ends up as b1110. Unsigned this is 14, and signed this is -2.
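A short sketch exercising the macro near the wrap-around point (the cast of an out-of-range value to int32_t is implementation-defined before C23, but behaves as shown on ordinary two's complement implementations):
#include <stdio.h>
#include <stdint.h>

#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

int main(void)
{
    uint32_t before_wrap = 0xFFFFFFF0u;  /* just below the maximum sequence number */
    uint32_t after_wrap  = 0x00000010u;  /* just past the wrap-around */
    printf("%d\n", TCP_SEQ_LT(5, 6));                    /* 1: 5 is less than 6 */
    printf("%d\n", TCP_SEQ_LT(before_wrap, after_wrap)); /* 1: 'older' despite being numerically larger */
    printf("%d\n", TCP_SEQ_LT(after_wrap, before_wrap)); /* 0 */
    return 0;
}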

Bitwise AND on signed chars

I have a file that I've read into an array of data type signed char. I cannot change this fact.
I would now like to do this: !((c[i] & 0xc0) & 0x80) where c[i] is one of the signed characters.
Now, I know from section 6.5.10 of the C99 standard that "Each of the operands [of the bitwise AND] shall have integral type."
And Section 6.5 of the C99 specification tells me:
Some operators (the unary operator ~, and the binary operators <<, >>, &, ^, and |, collectively described as bitwise operators) shall have operands that have integral type. These operators return values that depend on the internal representations of integers, and thus have implementation-defined aspects for signed types.
My question is two-fold:
Since I want to work with the original bit patterns from the file, how can I convert/cast my signed char to unsigned char so that the bit patterns remain unchanged?
Is there a list of these "implementation-defined aspects" anywhere (say for MSVC and GCC)?
Or you could take a different route and argue that this produces the same result for both signed and unsigned chars for any value of c[i].
Naturally, I will reward references to relevant standards or authoritative texts and discourage "informed" speculation.
As others point out, in all likelihood your implementation is based on two's complement, and will give exactly the result you expect.
However, if you're worried about the results of an operation involving a signed value, and all you care about is the bit pattern, simply cast directly to an equivalent unsigned type. The results are defined under the standard:
6.3.1.3 Signed and unsigned integers
...
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
This is essentially specifying that the result will be the two's complement representation of the value.
Fundamental to this is that in two's complement maths the result of a calculation is taken modulo some power of two (namely 2 raised to the number of bits in the type), which in turn is exactly equivalent to masking off the relevant number of bits. And the complement of a number is that number subtracted from the power of two.
Thus adding a negative value is the same as adding any value which differs from the value by a multiple of that power of two.
i.e:
(0 + signed_value) mod (2^N)
==
(2^N + signed_value) mod (2^N)
==
(7 * 2^N + signed_value) mod (2^N)
etc. (if you know modulo, that should be pretty self-evidently true)
So if you have a negative number, adding a power of two will make it positive (-5 + 256 = 251), but the bottom N bits will be exactly the same (0b11111011) and it will not affect the outcome of a mathematical operation. As values are then truncated to fit the type, the result is exactly the binary value you expected, even if the result 'overflows' (i.e. what you might think would happen if the number was positive to start with - this wrapping is also well-defined behaviour).
So in 8-bit two's complement:
-5 is the same as 251 (i.e 256 - 5) - 0b11111011
If you add 30, and 251, you get 281. But that's larger than 256, and 281 mod 256 equals 25. Exactly the same as 30 - 5.
251 * 2 = 502. 502 mod 256 = 246. 246 and -10 are both 0b11110110.
Likewise if you have:
unsigned int a;
int b;
a - b == a + (unsigned int) -b;
Under the hood, this cast is unlikely to be implemented with arithmetic and will certainly be a straight assignment from one register/value to another, or just optimised out altogether, as the maths does not make a distinction between signed and unsigned (interpretation of CPU flags is another matter, but that's an implementation detail). The standard exists to ensure that an implementation doesn't take it upon itself to do something strange instead, or, I suppose, for some weird architecture which isn't using two's complement...
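A small check of that identity and of the bit-pattern preservation discussed above (the assertion relies only on the well-defined modular behaviour of unsigned arithmetic):
#include <stdio.h>
#include <assert.h>

int main(void)
{
    unsigned int a = 30;
    int b = 5;
    assert(a - b == a + (unsigned int)-b);    /* both sides are 25 */

    signed char sc = -5;
    unsigned char uc = (unsigned char)sc;     /* 251, same bit pattern 0b11111011 */
    printf("%u %d\n", a - b, uc);             /* 25 251 */
    return 0;
}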
unsigned char UC = *(unsigned char*)&C - this is how you can convert signed C to unsigned keeping the "bit pattern". Thus you could change your code to something like this:
!(( (*(unsigned char*)(c+i)) & 0xc0) & 0x80)
Explanation (with references):
761 When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.
1124 When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1.
These two imply that the unsigned char pointer points to the same byte as the original signed char pointer.
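A tiny sketch showing that, on a two's complement implementation, the pointer reinterpretation and a plain value conversion land on the same byte value:
#include <stdio.h>

int main(void)
{
    signed char c = -100;                          /* bit pattern 0x9C on two's complement */
    unsigned char via_ptr  = *(unsigned char *)&c; /* reinterpret the same byte: 156 */
    unsigned char via_cast = (unsigned char)c;     /* value conversion: -100 + 256 = 156 */
    printf("%d %d\n", via_ptr, via_cast);          /* 156 156 */
    return 0;
}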
You appear to have something similar to:
signed char c[] = "\x7F\x80\xBF\xC0\xC1\xFF";
for (int i = 0; c[i] != '\0'; i++)
{
    if (!((c[i] & 0xC0) & 0x80))
        ...
}
You are (correctly) concerned about sign extension of the signed char type. In practice, however, (c[i] & 0xC0) will convert the signed character to a (signed) int, but the & 0xC0 will discard any set bits in the more significant bytes; the result of the expression will be in the range 0x00 .. 0xFF. This will, I believe, apply whether you use sign-and-magnitude, one's complement or two's complement binary values. The detailed bit pattern you get for a specific signed character value varies depending on the underlying representation; but the overall conclusion that the result will be in the range 0x00 .. 0xFF is valid.
There is an easy resolution for that concern — cast the value of c[i] to an unsigned char before using it:
if (!(((unsigned char)c[i] & 0xC0) & 0x80))
The value c[i] is converted to an unsigned char before it is promoted to an int (or, the compiler might promote to int, then coerce to unsigned char, then promote the unsigned char back to int), and the unsigned value is used in the & operations.
Of course, the code is now merely redundant. Using & 0xC0 followed by & 0x80 is entirely equivalent to just & 0x80.
If you're processing UTF-8 data and looking for continuation bytes, the correct test is:
if (((unsigned char)c[i] & 0xC0) == 0x80)
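Putting that test to use, a short sketch that counts UTF-8 continuation bytes in a signed char buffer (the sample string here is just an illustration):
#include <stdio.h>

int main(void)
{
    /* "héllo" in UTF-8: 0xC3 0xA9 is the two-byte sequence for 'é' */
    signed char c[] = "h\xC3\xA9llo";
    int continuations = 0;

    for (int i = 0; c[i] != '\0'; i++)
        if (((unsigned char)c[i] & 0xC0) == 0x80)  /* 10xxxxxx marks a continuation byte */
            continuations++;

    printf("%d\n", continuations);  /* 1: only 0xA9 is a continuation byte */
    return 0;
}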
"Since I want to work with the original bit patterns from the file,
how can I convert/cast my signed char to unsigned char so that the bit
patterns remain unchanged?"
As someone already explained in a previous answer to your question on the same topic, any small integer type, be it signed or unsigned, will get promoted to the type int whenever used in an expression.
C11 6.3.1.1
"If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is converted to
an int; otherwise, it is converted to an unsigned int. These are
called the integer promotions."
Also, as explained in the same answer, integer literals are always of the type int.
Therefore, your expression will boil down to the pseudo code (int) & (int) & (int). The operations will be performed on three temporary int variables and the result will be of type int.
Now, if the original data contained bits that may be interpreted as sign bits in the specific signedness representation (in practice this will be two's complement on all systems), you will get problems, because these bits will be preserved upon promotion from signed char to int.
And then the bit-wise & operator performs an AND on every single bit regardless of the contents of its integer operand (C11 6.5.10/3), be it signed or not. If you had data in the sign bits of your original signed char, it will now be lost, because the integer literals (0xC0 or 0x80) will have no bits set that correspond to the sign bits.
The solution is to prevent the sign bits from getting transferred to the "temporary int". One solution is to cast c[i] to unsigned char, which is completely well-defined (C11 6.3.1.3). This will tell the compiler that "the whole contents of this variable is an integer, there are no sign bits to be concerned about".
Better yet, make a habit of always using unsigned data in every form of bit manipulations. The purist, 100% safe, MISRA-C compliant way of re-writing your expression is this:
if ( (((uint8_t)c[i] & 0xc0u) & 0x80u) > 0u)
The u suffix actually enforces the expression to be of unsigned int, but it is good practice to always cast to the intended type. It tells the reader of the code "I actually know what I am doing and I also understand all weird implicit promotion rules in C".
And then, if we know our hex, masking with 0xc0 and then 0x80 is pointless: x & 0xC0 & 0x80 is always the same as x & 0x80. Therefore simplify the expression to:
if ( ((uint8_t)c[i] & 0x80u) > 0u)
"Is there a list of these "implementation-defined aspects" anywhere"
Yes, the C standard conveniently lists them in Appendix J.3. The only implementation-defined aspect you encounter in this case though, is the signedness implementation of integers. Which in practice is always two's complement.
EDIT:
The quoted text in the question is concerned with that the various bit-wise operators will produce implementation-defined results. This is just briefly mentioned as implementation-defined even in the appendix with no exact references. The actual chapter 6.5 doesn't say much regarding impl.defined behavior of & | etc. The only operators where it is explicitly mentioned is the << and >>, where left shifting a negative number is even undefined behavior, but right shifting it is implementation-defined.

Why am I getting the correct result when bitwise shifting a negative value? [duplicate]

I have C code in which I do the following.
int nPosVal = +0xFFFF; // + Added for ease of understanding
int nNegVal = -0xFFFF; // - Added for valid reason
Now when I try
printf ("%d %d", nPosVal >> 1, nNegVal >> 1);
I get
32767 -32768
Is this expected?
I am able to think something like
65535 >> 1 = (int) 32767.5 = 32767
-65535 >> 1 = (int) -32767.5 = -32768
That is, -32767.5 is rounded off to -32768.
Is this understanding correct?
It looks like your implementation is probably doing an arithmetic bit shift with two's complement numbers. In this system, it shifts all of the bits to the right and then fills in the upper bits with a copy of whatever the last bit was. So for your example, treating int as 32-bits here:
nPosVal = 00000000000000001111111111111111
nNegVal = 11111111111111110000000000000001
After the shift, you've got:
nPosVal = 00000000000000000111111111111111
nNegVal = 11111111111111111000000000000000
If you convert this back to decimal, you get 32767 and -32768 respectively.
Effectively, a right shift rounds towards negative infinity.
Edit: According to Section 6.5.7 of the latest draft standard, this behavior on negative numbers is implementation-dependent: The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Their stated rationale for this: The C89 Committee affirmed the freedom in implementation granted by K&R in not requiring the signed right shift operation to sign extend, since such a requirement might slow down fast code and since the usefulness of sign extended shifts is marginal. (Shifting a negative two's complement integer arithmetically right one place is not the same as dividing by two!)
So it's implementation dependent in theory. In practice, I've never seen an implementation not do an arithmetic shift right when the left operand is signed.
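To see both points at once (the sign extension and the "not the same as dividing by two" caveat), a small test, with output as observed on a typical two's complement implementation that shifts arithmetically:
#include <stdio.h>

int main(void)
{
    int nNegVal = -0xFFFF;           /* -65535 */
    printf("%d\n", nNegVal >> 1);    /* -32768: arithmetic shift rounds toward negative infinity */
    printf("%d\n", nNegVal / 2);     /* -32767: division truncates toward zero (C99 and later) */
    printf("%d\n", -1 >> 1);         /* -1, not 0, on such implementations */
    return 0;
}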
No, you don't get fractional numbers like 0.5 when working with integers. The results can be easily explained when you look at the binary representations of the two numbers:
65535: 00000000000000001111111111111111
-65535: 11111111111111110000000000000001
Bit shifting to the right one bit, and extending at the left (note that this is implementation-dependent, thanks Trent):
65535 >> 1: 00000000000000000111111111111111
-65535 >> 1: 11111111111111111000000000000000
Convert back to decimal:
65535 >> 1 = 32767
-65535 >> 1 = -32768
The C specification does not specify whether the sign bit is propagated or not. It is implementation-dependent.
When you right-shift, the least-significant-bit is discarded.
0xFFFF = 0 1111 1111 1111 1111, which right-shifts to give 0 0111 1111 1111 1111 = 0x7FFF
-0xFFFF = 1 0000 0000 0000 0001 (2s complement), which right-shifts to 1 1000 0000 0000 0000 = -0x8000
A-1: Yes. 0xffff >> 1 is 0x7fff or 32767. I'm not sure what -0xffff does. That's peculiar.
A-2: Shifting is not the same thing as dividing. It is bit shifting—a primitive binary operation. That it sometimes can be used for some types of division is convenient, but not always the same.
Beneath the C level, machines have a CPU core which is entirely integer or scalar. Although these days every desktop CPU has an FPU, this was not always the case and even today embedded systems are made with no floating point instructions.
Today's programming paradigms and CPU designs and languages date from the era where the FPU might not even exist.
So, CPU instructions implement fixed-point operations, generally treated as purely integer ops. Only if a program declares items of float or double will any fractions exist. (Well, you can use the CPU ops for "fixed point" with fractions, but that is, and always was, quite rare.)
Regardless of what was required by a language standard committee years ago, all reasonable machines propagate the sign bit on right shifts of signed numbers. Right shifts of unsigned values shift in zeroes on the left. The bits shifted out on the right are dropped on the floor.
To further your understanding you will need to investigate "two's complement arithmetic".

Resources