In the function outb(0x3D5, (uint8_t) (pos & 0xFF));, I am having trouble understanding the purpose of the bitwise operation. pos is a uint16_t variable.
What is the purpose of the bitwise & with simply 8 bits of 1s? The & operation would just compare this value with 8 bits of 1s, which to my knowledge, would just result in the same value. How does this operation change the value of pos?
For background if it is important, the function outb is intended to move the value of the right argument into the register indexed by the port (left argument). In this case, pos is a position in VGA adapter memory. The function outb uses inline assembly and the assembly instruction out.
The & operation would just compare this value with 8 bits of 1s, which to my knowledge, would just result in the same value.
The & operation does not “compare” values. It performs a bitwise AND. When the 16-bit pos is ANDed with 0xFF, the result is the low eight bits of pos. Thus, if pos is 0x1234u, then pos & 0xFF will produce the value of 0x34u.
The conversion to uint8_t with the cast (uint8_t) would produce the same value, because a conversion of an integer to uint8_t wraps the value modulo 2^8, which, for non-negative integers, is equivalent to taking the low eight bits.
Thus, these expressions all produce the same value:
pos & 0xFF,
(uint8_t) pos, and
(uint8_t) (pos & 0xFF).
(For negative integers, wrapping modulo 2^8 and taking the low eight bits are not equivalent if the C implementation uses sign-and-magnitude or one’s complement representations, which are allowed by the C standard but are very rare these days. Of course, no uint16_t values are negative.)
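A minimal sketch to check this equivalence (the test value is arbitrary; any hosted C compiler will do):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t pos = 0x1234u;

    /* all three expressions yield the low eight bits of pos */
    printf("%X %X %X\n",
           (unsigned)(pos & 0xFF),
           (unsigned)(uint8_t)pos,
           (unsigned)(uint8_t)(pos & 0xFF));  /* prints: 34 34 34 */
    return 0;
}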
Does anyone know what &- does in C programming?
limit = address + (n &- sizeof(uint));
This isn't really one operator, but two:
(n) & (-sizeof(uint))
i.e. this is performing a bitwise AND operation between n and -sizeof(uint).
What does this mean?
Let's assume sizeof(uint) is 4 and size_t is 32 bits wide - then, by two's complement representation (equivalently, unsigned wrap-around), -sizeof(uint) is 0xFFFFFFFC, or
1111 1111 1111 1111 1111 1111 1111 1100
We can see that this bitwise AND operation will zero out the last two bits of n. This effectively rounds n down to the nearest multiple of sizeof(uint).
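To make the rounding concrete, here is a small sketch (it assumes uint is a typedef for unsigned int, as the question implies):

#include <stdio.h>

typedef unsigned int uint;  /* assumption: the question's homebrewed type */

int main(void) {
    for (size_t n = 0; n < 8; n++) {
        /* -sizeof(uint) wraps to 0xFF...FC, so the AND clears the low 2 bits */
        printf("%zu -> %zu\n", n, n & -sizeof(uint));
    }
    return 0;   /* prints 0->0, 1->0, 2->0, 3->0, 4->4, 5->4, ... */
}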
&- is the binary bitwise AND operator written together with - which is the unary minus operator. Operator precedence (binary & having lowest precedence here) gives us the operands of & as n and -sizeof(uint).
The purpose is to create a bit mask in a very obscure way, relying on unsigned integer arithmetic. Assuming uint is 4 bytes (don't use home-brewed types either, btw; use stdint.h), the code is equivalent to this:
n & -(size_t)4
size_t being the type returned by sizeof, which is guaranteed to be a large, unsigned integer type. Applying unary minus to unsigned types is of course nonsense too. Though even if it is obscure, applying minus in unsigned arithmetic results in well-defined wrap-around 1), so in the case of the value 4, we get 0xFFFFFFFFFFFFFFFC on a typical PC where size_t is 64 bits.
n & 0xFFFFFFFFFFFFFFFC will clear the 2 least significant bits of n and keep everything else.
What the relation is between these 2 bits and the size of the type used, I don't know. My guess is that the purpose is to store something equivalent to the type's size in bytes in that area. Something with 4 values will fit in the two least significant bits: binary 00, 01, 10, 11. (The purpose could maybe be masking out misaligned addresses or some such?)
Assuming I guessed correctly, we can write the same code without any obfuscation as far more readable code:
~(sizeof(uint32_t)-1)
Which gives us 4 - 1 = 0x3 and ~0x3 = 0xFFFF...FC. Or, in the case of 8-byte types, 0xFFFF...F8. And so on.
So I'd rewrite the code as:
#include <stdint.h>
#include <stddef.h>

/* size_t, not uint32_t, so the mask stays as wide as sizeof's result */
size_t mask = ~(sizeof(uint32_t) - 1);
limit = address + (n & mask);
1) C17 6.3.1.3
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
Where the foot note 60) says:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
In this case, adding SIZE_MAX + 1 once to -4 brings the value into the range of what can fit inside a size_t variable, giving SIZE_MAX - 3.
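A one-liner to verify that wrap-around (the printed constant assumes a 64-bit size_t):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* -(size_t)4 wraps modulo SIZE_MAX + 1, giving SIZE_MAX - 3 */
    printf("%zX\n", -(size_t)4);                  /* FFFFFFFFFFFFFFFC on 64-bit */
    printf("%d\n", -(size_t)4 == SIZE_MAX - 3);   /* 1 */
    return 0;
}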
The problem is simple:
Take a 32-bit or 64-bit integer and split it up to send over a (usually) 1-byte interface like UART, SPI, or I2C.
To do this I can easily use bit masking and shifting to get what I want. However, I want this to be portable: it should work on big- and little-endian platforms, and also on platforms that don't discard shifted-out bits but rotate them through carry (masking gets rid of the excess bits, right?).
Example code:
uint32_t value;
uint8_t buffer[4];
buffer[0] = (value >> 24) & 0xFF;
buffer[1] = (value >> 16) & 0xFF;
buffer[2] = (value >> 8) & 0xFF;
buffer[3] = value & 0xFF;
I want to guarantee this works on any platform that supports 32-bit or wider integers. I don't know if this is correct.
The code you presented is the most portable way of doing it. You convert a single unsigned integer value 32 bits wide into an array of unsigned integer values exactly 8 bits wide. The resulting bytes in the buffer array are in big-endian order.
The masking is not needed. From C11 6.5.7p5:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2.
and converting to an integer 8 bits wide is (in value) equal to masking with 8 bits. So (value >> 24) & 0xff is equal (in value) to (uint8_t)(value >> 24). As you assign to a uint8_t variable, the masking is not needed. Anyway, I would safely assume it will be optimized out by a sane compiler.
I can recommend taking a look at one implementation I remember, which I believe implements, in a really safe manner, all the possible variants of splitting and composing fixed-width integers up to 64 bits into bytes and back: gpsd's bits.h.
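For the reverse direction, here is a minimal sketch in the same spirit (my own illustration, not the gpsd code; the helper name is hypothetical) that reassembles the four big-endian bytes portably:

#include <stdint.h>

/* Hypothetical helper: rebuild a uint32_t from 4 big-endian bytes.
   Widening each byte before shifting avoids int-promotion surprises. */
static uint32_t unpack_be32(const uint8_t buffer[4]) {
    return ((uint32_t)buffer[0] << 24) |
           ((uint32_t)buffer[1] << 16) |
           ((uint32_t)buffer[2] << 8)  |
           (uint32_t)buffer[3];
}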
I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value. There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values. If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
My first idea was:
value & ~( MAX - 0xFF )
but this does not appear to work for signed values. My other thought was:
value = value - (value & 0xFF)
which has the disadvantage that it requires a stack operation.
Clearing the low byte without knowing the integer type's width can result in incorrect code, so the code should be careful.
Consider the below, where value is wider than int/unsigned. 0xFF is an int constant with the value 255. ~0xFF is then that value with its bits inverted. With common two's complement, that would be -256, with its upper bits set: FF...FF00. -256 converted to a wider signed type retains its value and pattern FF...FF00. -256 converted to a wider unsigned type becomes Uxxx_MAX + 1 - 256, again with the bit pattern FF...FF00. In both cases, the & will retain the upper bits and clear the lower 8.
value_low_8bits_cleared = value & ~0xFF;
An alternative is to do all masking operations with unsigned math, to avoid the unexpected properties of int math and int encodings.
The below has no concerns about sign extension or int overflow. An optimizing compiler will certainly emit efficient code with a simple AND mask. Further, there is no need to code the correct matching max value corresponding to value.
value_low_8bits_cleared = (value | 0xFFu) ^ 0xFFu;
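A quick check (the test values are arbitrary) that the OR-then-XOR trick clears the low byte for both signed and unsigned operands:

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    int64_t  s = -0x12345678;    /* arbitrary signed test value  */
    uint64_t u = 0x12345678u;    /* arbitrary unsigned test value */

    /* OR sets the low 8 bits; XOR then clears exactly those bits */
    printf("%" PRIX64 "\n", (uint64_t)((s | 0xFFu) ^ 0xFFu));  /* FFFFFFFFEDCBA900 */
    printf("%" PRIX64 "\n", (u | 0xFFu) ^ 0xFFu);              /* 12345600 */
    return 0;
}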
Here is the easy way to clear the low-order 8 bits:
value &= ~0xFF;
I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value. There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values.
Although C is designed so that implementations can take machine word size into account, the language itself has no inherent sense of machine words. C cares instead about types, and that makes a difference.
Anyway, I take you exactly at your word that you arrange for the replacement text of macro MAX to be one of the two alternatives you give, depending on the architecture of the machine. Note well that when that replacement text is interpreted as an integer constant, its type may vary between C implementations, and maybe even depending on compiler options.
If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
The only reason I see for needing a single expression that cannot take the actual type of value explicitly into account is that you want to use the expression in a macro itself. In that case, you need to take great care around type conversions, especially when you have to account for signed types. This makes your MAX macro uncomfortable to work with for your purpose.
I'm inclined to suggest a different approach:
(value | 0xFF) ^ 0xFF
The constant 0xFF will be interpreted as a (signed) int with a positive value. Provided that value's type is not smaller than int, both appearances of 0xFF will be converted to that type without change in value, whether that type is signed or unsigned. Furthermore, the result of each operation and of the overall expression then has the same type as value, so no unexpected conversions occur.
How about
value & ~((intptr_t)0xFF)
First you want a mask that has all bits on except those of the low-order byte:
MAX ^ 0xFF
This converts 0xFF to the same type as MAX and then does the exclusive OR with that value. Because MAX has all bits set to 1, the low-order bits become 0 and the high-order bits stay as they are, that is, 1.
Then you have to pull that mask over the value that interests you
value & ( MAX ^ 0xFF )
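A quick check of this approach (assuming the 64-bit MAX from the question; the test value is arbitrary):

#include <stdio.h>
#include <inttypes.h>

#define MAX 0xFFFFFFFFFFFFFFFFu   /* the question's 64-bit word maximum */

int main(void) {
    uint64_t value = 0x12345678u;

    /* MAX ^ 0xFF flips only the low 8 bits of all-ones: FFFF...FF00 */
    printf("%" PRIX64 "\n", (uint64_t)(value & (MAX ^ 0xFF)));  /* 12345600 */
    return 0;
}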
I am going through 'The C Programming Language' by K&R. Right now I am doing the bitwise section. I am having a hard time understanding the following code.
int mask = ~0 >> n;
I was planning on using this to mask the n left-side bits of another binary number, like this:
0000 1111
1010 0101 // random number
My problem is that when I print the mask variable, it is still -1. Assume n is 4. I thought shifting ~0, which is -1, would give 15 (0000 1111).
Thanks for the answers.
Performing a right shift on a negative value yields an implementation-defined value. Most hosted implementations will shift in 1 bits on the left, as you've seen in your case; however, that doesn't necessarily have to be the case.
Unsigned types as well as positive values of signed types always shift in 0 bits on the left when shifting right. So you can get the desired behavior by using unsigned values:
unsigned int mask = ~0u >> n;
This behavior is documented in section 6.5.7 of the C standard:
5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Right-shifting negative signed integers is implementation-defined behavior, which usually (but not always) fills the left with ones instead of zeros. That's why, no matter how many bits you've shifted, it's always -1: the left is always filled with ones.
When you shift unsigned integers, the left will always be filled with zeros. So you can do this:
unsigned int mask = ~0U >> n; /* note the U suffix */
You should also note that int is typically 2 or 4 bytes, meaning if you want to get 15, you need to right-shift 12 or 28 bits instead of only 4. You can use a char instead:
unsigned char mask = ~0U;
mask >>= 4;
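A small sketch combining both points (assuming a 32-bit int):

#include <stdio.h>

int main(void) {
    unsigned int wide = ~0u >> 4;    /* 0x0FFFFFFF: still 28 one-bits */
    unsigned char narrow = ~0u;      /* conversion wraps to 0xFF */
    narrow >>= 4;                    /* 0x0F, i.e. the 15 you wanted */

    printf("%X %X\n", wide, (unsigned)narrow);  /* FFFFFFF F */
    return 0;
}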
In C, and many other languages, >> is (usually) an arithmetic right shift when performed on signed variables (like int). This means that the new bit shifted in from the left is a copy of the previous most-significant bit (MSB). This has the effect of preserving the sign of a two's complement negative number (and, in this case, the value).
This is in contrast to a logical right shift, where the MSB is always replaced with a zero bit. This is applied when your variable is unsigned (e.g. unsigned int).
From Wikipedia:
The >> operator in C and C++ is not necessarily an arithmetic shift. Usually it is only an arithmetic shift if used with a signed integer type on its left-hand side. If it is used on an unsigned integer type instead, it will be a logical shift.
In your case, if you plan to be working at a bit level (i.e., using masks, etc.), I would strongly recommend two things:
Use unsigned values.
Use types with specific sizes from <stdint.h>, like uint32_t.
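Following both recommendations, here is a small sketch contrasting the two kinds of shift (on a typical two's complement implementation):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t  s = -16;            /* bit pattern 0xFFFFFFF0 */
    uint32_t u = 0xFFFFFFF0u;    /* same bit pattern, unsigned */

    /* arithmetic shift copies the sign bit in; logical shift brings in zeros */
    printf("%d\n", (int)(s >> 4));       /* typically -1 */
    printf("%X\n", (unsigned)(u >> 4));  /* FFFFFFF */
    return 0;
}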
This is a two-fold question.
I have been reading up on the intricacies of how compilers process code, and I am running into this confusion: both processes seem to follow the same logic of sign extension for signed integers. So is conversion simply implemented as an arithmetic right shift?
One of the examples states a function as
Int Fun1(unsigned word ) {
Return (int) ((word << 24) >> 24 );
}
The argument passed is 0x87654321.
Since this would be signed when converted to binary, how would the shift happen? My logic was that the left shift should extract the last 8 bits leaving 0 as the MSB and this would then we extended while right shifting. Is this logic correct?
Edit: I understand that the downvote is probably due to unspecified info. Assume a 32 bit big endian machine with two's complement for signed integers.
Given OP's "Assume a 32 bit ... machine with two's complement for signed integers." (This implies 32-bit unsigned)
0x87654321 Since this would be signed when converted to binary
No. 0x87654321 is a hexadecimal constant. It has the type unsigned. It is not signed.
// Int Fun1(unsigned word ) {
int Fun1(unsigned word) {
    // Return (int) ((word << 24) >> 24 );
    return (int) ((word << 24) >> 24);
}
Fun1(0x87654321) results in unsigned word having the value 0x87654321. No type or value conversion occurred.
word << 24 has the value of 0x21000000 and still the type of unsigned.
(word << 24) >> 24 has the value of 0x21 and still the type of unsigned.
Casting to int retains the same value of 0x21, but now type int.
So is conversion simply implemented as an arithmetic right shift ?
Doubtful, as no signed shifting is coded. C does not specify how the compiler realizes the C code: a pair of shifts, a mask, or a multiply/divide may have been used.
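For contrast, here is a small sketch (my own illustration, using a test value whose low byte has its top bit set) showing the difference between Fun1's value-preserving extraction and a genuinely sign-extending one:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    unsigned word = 0x876543A1u;  /* low byte 0xA1 has its top bit set */

    /* Fun1's way: both shifts happen on an unsigned, so 0xA1 stays 161 */
    int zero_extended = (int)((word << 24) >> 24);

    /* sign-extending alternative: reinterpret the low byte as int8_t
       (two's complement assumed), so 0xA1 becomes -95 */
    int sign_extended = (int8_t)(word & 0xFF);

    printf("%d %d\n", zero_extended, sign_extended);  /* 161 -95 */
    return 0;
}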