(((long)*(ptr)) << 1) >> 1; - c

One of my friends asked me this question, and I do not know what the expression means. The comment above it says /* sign-extend to 32 bits */, but I want to know in detail how the expression achieves "sign-extend to 32 bits".
The code is from the Linux kernel. Thanks, all.
As @unwind said, the complete definition of the macro is this:
/* Convert a prel31 symbol to an absolute address */
#define prel31_to_addr(ptr) \
({ \
/* sign-extend to 32 bits */ \
long offset = (((long)*(ptr)) << 1) >> 1; \
(unsigned long)(ptr) + offset; \
})
and it is used in this function:
int __init unwind_init(void)
{
struct unwind_idx *idx;
/* Convert the symbol addresses to absolute values */
for (idx = __start_unwind_idx; idx < __stop_unwind_idx; idx++)
idx->addr = prel31_to_addr(&idx->addr);
pr_debug("unwind: ARM stack unwinding initialised\n");
return 0;
}

Look at where it's called here as an example:
else if ((idx->insn & 0x80000000) == 0)
/* prel31 to the unwind table */
ctrl.insn = (unsigned long *)prel31_to_addr(&idx->insn);
So we know that the ptr passed in dereferences to a value whose top bit (bit 31) is not set. That sort of ties in with the prel31 name, which implies that only the low 31 bits of this value are used.
To convert a signed 31-bit value into a signed 32-bit value, we need to fix up the top bit: there are still only 31 significant bits, but a negative value should have the top bit set. Setting the top bit to match the sign of the existing 31-bit value is sign extension.
By left-shifting one bit, the existing top bit is discarded; when we shift right again, the top bit will be filled to preserve the sign (so it will be 1 if the original 31-bit value was negative, and otherwise zero).
E.g. 0x7FFFFFFF is negative when interpreted as a 31-bit value (-1), but positive when interpreted as a 32-bit value (2,147,483,647). To get a 32-bit encoding with the same meaning as the 31-bit version, we:
shift left to discard the unused top bit: 0x7FFFFFFF << 1 => 0xFFFFFFFE (which is now a negative 32-bit value)
shift right again to restore the original pattern in the low 31 bits, filling the top bit according to the sign: 0xFFFFFFFE >> 1 => 0xFFFFFFFF = -1
Note that this (sign extension) behaviour is platform specific (in C, right-shifting a negative signed value is implementation-defined), but then so is all this code. To understand why it makes sense to do all this (rather than simply the meaning of sign extension, and what happens to the bit patterns) you'll need to research the addressing scheme being used.
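For comparison, here is a minimal sketch of the same 31-bit sign extension written without signed shifts, so it does not depend on how the platform shifts negative values. The helper name sign_extend31 is hypothetical and is not part of the kernel source; it uses the common "XOR with the sign bit, then subtract it" idiom rather than the kernel's shift pair.

```c
#include <stdint.h>

/* Sign-extend the low 31 bits of raw to a 32-bit signed value.
 * Hypothetical helper for illustration only. */
int32_t sign_extend31(uint32_t raw)
{
    const uint32_t sign = 0x40000000u;      /* bit 30: sign bit of a 31-bit value */
    uint32_t low31 = raw & 0x7FFFFFFFu;     /* discard the unused top bit */

    /* Flipping bit 30 and then subtracting it maps 0x40000000..0x7FFFFFFF
     * onto the negative range and leaves smaller values unchanged. */
    return (int32_t)(low31 ^ sign) - (int32_t)sign;
}
```

With this sketch, sign_extend31(0x7FFFFFFF) yields -1, matching the worked example above.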

I think this is what it does:
*ptr contains a signed 31-bit value, with the sign bit at bit 30 (one less than the MSB). When you shift it left, the sign bit moves to bit 31 (the MSB); when you shift it back to the right, the sign bit is 'extended' and shows up in both bit 30 and bit 31.
So in short: it copies bit 30 to bit 31.
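The "copy bit 30 to bit 31" view can also be written directly with masks. This is only a sketch on an unsigned 32-bit pattern; the function name copy_bit30_to_bit31 is made up for this illustration.

```c
#include <stdint.h>

/* Return v with bit 31 replaced by a copy of bit 30.
 * Bits 0..30 are left as they are. */
uint32_t copy_bit30_to_bit31(uint32_t v)
{
    uint32_t low31 = v & 0x7FFFFFFFu;   /* drop whatever was in bit 31 */
    uint32_t bit30 = v & 0x40000000u;   /* isolate the 31-bit sign bit */
    return low31 | (bit30 << 1);        /* move the copy up into bit 31 */
}
```

For example, the pattern 0x7FFFFFFF becomes 0xFFFFFFFF, while 0x3FFFFFFF is returned unchanged.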

Related

Unknown system bitsize for int, how to create mask

I would like to create a mask for the MSB only; however, the width of int on the system is supposed to be unknown, so you cannot assume 32 bits.
See the following:
// THE FOLLOWING FAILS BECAUSE OF SYSTEM IMPLEMENTING A LOGICAL
// RIGHT SHIFT
// Idea is
// 1. 0 inverted = all 1's
// 2. Arithmetic shift right
// 3. Then invert again to preserve MSB '1'
const int unsigned mask = ~(~0>>1); // FAIL, because of logic shift
Assuming a 16-bit system:
~0 gives FFFF
~0>>1 gives 7FFF
~(~0 >> 1) gives 8000
You should add a u suffix so that the value being shifted is unsigned; a logical right shift is then performed instead of an arithmetic one.
const int unsigned mask = ~(~0u>>1);
You can just left shift the (unsigned) value 1 by the number of bits in the type minus 1 (i.e. for a 32-bit type, the MSB will be 1 << 31). To get the number of bits, use a combination of the sizeof operator and the CHAR_BIT constant (defined in <limits.h>):
const unsigned int MSB = 1u << (sizeof(unsigned int) * CHAR_BIT - 1);
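As a rough check, the two approaches suggested so far can be put side by side. The function names below are hypothetical, and the sketch assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* MSB mask built by shifting 1u up to the top bit position. */
unsigned int msb_mask_shift(void)
{
    return 1u << (sizeof(unsigned int) * CHAR_BIT - 1);
}

/* MSB mask built by inverting "all ones shifted right by one".
 * The u suffix makes the right shift a logical one. */
unsigned int msb_mask_invert(void)
{
    return ~(~0u >> 1);
}
```

Both functions should return the same value for any width of unsigned int, for example 0x80000000 when unsigned int is 32 bits wide.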
INT_MAX is the int bit pattern of 0111...1111 (of some width)* for all implementations.
To form 1000...0000, invert those bits.
~INT_MAX
The above treads on undefined behavior (UB).
Better to look to unsigned or wider types.
unsigned mask = ~(unsigned) INT_MAX;
On rare machines, INT_MAX == UINT_MAX, so on those, look to wider types:
long long mask = ~(long long) INT_MAX;
On rarer machines (unheard of), INT_MAX == LONG_MAX is also true, then we are out of luck.
Pedantic: Rare machines use padding on int/unsigned, so it is best to drive code with (U)INT_MAX rather than sizeof.
* Maybe some padding bits too - very rare.
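Following the advice to drive the code from the limits macros rather than sizeof, one possible sketch derives the mask from UINT_MAX alone. The function name is hypothetical.

```c
#include <limits.h>

/* UINT_MAX is all value bits set; shifting it right by one clears only the
 * top value bit, so the XOR of the two leaves exactly that top bit. */
unsigned int msb_mask_from_limits(void)
{
    return UINT_MAX ^ (UINT_MAX >> 1);
}
```

On the usual implementations, where UINT_MAX == 2 * INT_MAX + 1, this gives the same result as ~(unsigned) INT_MAX.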

What does mask variable do in this CRC checksum calculation?

The question is about code in figure 14-6 in here.
The mask is calculated as:
mask = -(crc & 1)
Why do we & crc with 1 and then negate the result? Figure 14-5 does not have this mask variable; why?
Edit:
So since this point is clear, why do we have this line also:
crc = crc ^ byte;
This line is not present in Figure 14-5.
Can this program be used if the generator polynomial length is not a multiple of 8 bits?
That checks the least significant bit of crc and then negates it. The effect is that if the bit is zero the mask will be zero (all zeroes), and if the bit is one the mask will be -1 (all ones). The mask is then used to conditionally XOR with 0xEDB88320.
The other solution instead uses an if statement for that condition.
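To make the masking concrete, here is a minimal sketch of a bitwise CRC-32 loop that uses the mask in place of an if. It is not a copy of the figure from the book: the function name crc32_masked and the explicit length parameter are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise, reflected CRC-32 using the polynomial constant 0xEDB88320. */
uint32_t crc32_masked(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];                       /* fold in all 8 bits at once */
        for (int j = 0; j < 8; j++) {
            /* mask is all ones if the low bit of crc is 1, else all zeros */
            uint32_t mask = -(crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

For the standard check string "123456789" (9 bytes), this sketch returns 0xCBF43926, the usual CRC-32 check value.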
The second trick used in the second solution is to do the XOR for the bit check in one operation for all eight bits. The first example uses (int)(crc^byte) < 0 (a check on the XOR of the most significant bits, i.e. the sign bits), then shifts both crc and byte one bit to the left and does the same for the next bit. The second example does the XOR eight bits at a time and then checks each bit of the result.
To see what happens, consider if we change the first example to:
for(j=0; j<=7; j++) {
crc = crc ^ mask_sign_bit(byte);
if( (int)crc < 0 )
crc = (crc << 1) ^ 0x04C11DB7;
else
crc = crc << 1;
byte = byte << 1;
}
Here mask_sign_bit clears every bit except the sign bit. The sign of crc ^ byte is the same as the sign of crc ^ mask_sign_bit(byte), so the outcome of the if statement is the same. Then, when crc is shifted one step to the left, the bit modified by crc = crc ^ mask_sign_bit(byte) is lost.
This operation turns the least significant bit into a mask.
For example, for an 8-bit value (for simplicity) we have:
00000000 -> 00000000
00000001 -> 11111111
Using unary minus complicates the circuitry of the CRC function massively, since the function otherwise requires no addition operations. Negation can be implemented in terms of addition, as follows:
-x = ~x + 1
Some architectures might support a bit-vector "broadcast" operation that copies the least significant bit to all other bits, which could give a large performance gain.
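The identity -x = ~x + 1 can be checked on the one-bit case that matters for the CRC mask. The helper name lsb_to_mask is hypothetical.

```c
#include <stdint.h>

/* Turn the least significant bit of x into an all-zeros or all-ones mask,
 * using ~bit + 1 instead of unary minus. */
uint32_t lsb_to_mask(uint32_t x)
{
    uint32_t bit = x & 1u;            /* 0 or 1 */
    return (uint32_t)(~bit + 1u);     /* 0 -> 0x00000000, 1 -> 0xFFFFFFFF */
}
```

The unsigned addition wraps around, so an input with a clear low bit gives 0xFFFFFFFF + 1 = 0, and an input with a set low bit gives 0xFFFFFFFE + 1 = 0xFFFFFFFF.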

How to find the nth bit of an integer in C

I've got an assignment where I need to convert 8-bit sign-magnitude numbers to two's complement and then add two such numbers. I have a relatively good idea of how to do this, but I can't work out how to find the eighth bit of an integer so that I can tell what sign the number has.
The overall idea is: if the sign bit is 0, just return the number, as it is already in two's complement; if it is 1, set it to 0, invert all bits with the ~ operator, and then add 1.
Thanks in advance.
You can check if the high bit is set by creating a mask that has just that bit set and using a bitwise AND to see if the result is non-zero.
Once you know the high bit is set, you can convert to twos complement by flipping all bits and adding one.
uint8_t x = (some value)
if (x & (1 << 7)) {
printf("sign bit set\n");
x = (uint8_t)((~(x & (0x7F))) & 0xFF) + 1;
printf("converted value: %02X\n", x);
}
Then you can add this number to any other normally.
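The check and the conversion could be wrapped in one function, roughly as sketched below. The function name is hypothetical, and the sketch negates the magnitude instead of flipping bits by hand, which gives the same two's complement result.

```c
#include <stdint.h>

/* Convert an 8-bit sign-magnitude byte to a two's complement int8_t. */
int8_t sign_magnitude_to_twos(uint8_t sm)
{
    int8_t magnitude = (int8_t)(sm & 0x7Fu);   /* 0..127 always fits */

    if (sm & 0x80u)                            /* sign bit (bit 7) set */
        return (int8_t)-magnitude;
    return magnitude;
}
```

For example, 0x85 (sign bit set, magnitude 5) converts to -5, and 0x80 (negative zero) converts to 0.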
This assumes that your computer/compiler uses two's complement (almost certainly the case) and that you want the result to be in two's complement.
Use a uint8_t to hold the sign-and-magnitude number.
To check if a bit is set, use the bitwise AND operator &, together with a bit mask corresponding to the msb. To get a bit mask corresponding to bit n, left shift the value 1 n times. In C code:
#define SIGN (1 << 7)
uint8_t sm = ...;
if(sm & SIGN) // if non-zero, then the SIGN bit is set
{
}
else // it was zero, the SIGN bit is not set
{
}
To do the actual conversion, there are several ways. I simply would mask out and copy the relevant parts of the number, again with bitwise AND:
#define MAGNITUDE 0x7F
int8_t magnitude = sm & MAGNITUDE; // variable magnitude is two's compl.
EDIT complete solution (since someone already posted one):
#define SIGN (1 << 7)
#define MAGNITUDE 0x7F
uint8_t sm = ...;
int8_t twos_compl = sm & MAGNITUDE;
if(sm & SIGN) // if non-zero, then the SIGN bit is set
{
twos_compl = -twos_compl;
}
int8_t x = ...; // some other number in two's complement
int16_t result = twos_compl + x;
As a side note, be very careful when mixing the ~ operator with small integer types, because it performs an implicit integer promotion. For example, with uint8_t x = 1, the expression ~x gives you 0xFFFFFFFE (on a system with 32-bit int) and not 0xFE as you might expect.
For the above task, there is no need to use ~ at all.
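The promotion pitfall with ~ is easy to see in a small sketch. The function names are hypothetical, and the comments assume a two's complement int.

```c
#include <stdint.h>

/* ~ is applied after x has been promoted to int, so the result is an int
 * with all the upper bits set as well. */
int invert_promoted(uint8_t x)
{
    return ~x;
}

/* Casting back to uint8_t keeps only the low 8 bits of the inverted value. */
uint8_t invert_truncated(uint8_t x)
{
    return (uint8_t)~x;
}
```

With x = 1, the first function returns -2 (bit pattern 0xFFFFFFFE when int is 32 bits wide), while the second returns 0xFE.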

How to sign extend a 9-bit value when converting from an 8-bit value?

I'm implementing a relative branching function in my simple VM.
Basically, I'm given an 8-bit relative value. I then shift this left by 1 bit to make it a 9-bit value. So, for instance, "branch +127" would really mean 127 instructions, and thus would add 254 to the IP.
My current code looks like this:
uint8_t argument = 0xFF; //-1 or whatever
int16_t difference = argument << 1;
*ip += difference; //ip is a uint16_t
However, I don't believe difference will ever be detected as less than 0 with this. I'm rusty on how signed-to-unsigned conversion works. Beyond that, I'm not sure the difference would be correctly subtracted from IP when argument is, say, -1 or -2.
Basically, I'm wanting something that would satisfy these "tests"
//case 1
argument = -5
difference -> -10
ip = 20 -> 10 //ip starts at 20, but becomes 10 after applying difference
//case 2
argument = 127 (must fit in a byte)
difference -> 254
ip = 20 -> 274
Hopefully that makes it a bit more clear.
Anyway, how would I do this cheaply? I saw one "solution" to a similar problem, but it involved division. I'm working with slow embedded processors (assumed to be without efficient ways to multiply and divide), so that's a pretty big thing I'd like to avoid.
To clarify: you worry that left shifting a negative 8 bit number will make it appear like a positive nine bit number? Just pad the top 9 bits with the sign bit of the initial number before left shift:
diff = 0xFF;
int16 diff16=(diff + (diff & 0x80)*0x01FE) << 1;
Now your diff16 is signed 2*diff
As was pointed out by Richard J Ross III, you can avoid the multiplication (if that's expensive on your platform) with a conditional branch:
int16 diff16 = (diff + ((diff & 0x80)?0xFF00:0))<<1;
If you are worried about things staying in range and such ("undefined behavior"), you can do
int16 diff16 = diff;
diff16 = (diff16 | ((diff16 & 0x80)?0x7F00:0))<<1;
At no point does this produce numbers that are going out of range.
The cleanest solution, though, seems to be "cast and shift":
diff16 = (signed char)diff; // recognizes and preserves the sign of diff
diff16 = (short int)((unsigned short)diff16 << 1); // left shift on the unsigned value, then cast back
This produces the expected result, because the compiler automatically takes care of the sign bit (so no need for the mask) in the first line; and in the second line, it does a left shift on an unsigned value (for which overflow is well defined per the standard); the final cast back to short int ensures that the number is correctly interpreted as negative. Note that the shift has to be inside the parentheses of the final cast, because a cast binds more tightly than <<. I believe that in this form the construct is never "undefined".
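Written with the fixed-width types from the question, "cast and shift" might look like the sketch below. The function name branch_offset is hypothetical, and the two conversions to signed types rely on the implementation-defined behaviour that two's complement compilers normally provide (plain wrap-around).

```c
#include <stdint.h>

/* Turn an 8-bit relative branch argument into a signed 16-bit byte offset
 * (argument * 2) without multiplying or dividing. */
int16_t branch_offset(uint8_t argument)
{
    /* Reinterpret the byte as signed, then widen: 0xFF becomes -1. */
    int16_t wide = (int8_t)argument;

    /* Shift the unsigned bit pattern, truncate to 16 bits, and reinterpret
     * the result as signed again. */
    return (int16_t)(uint16_t)((uint16_t)wide << 1);
}
```

With this sketch, an argument of 0xFF gives an offset of -2 and an argument of 127 gives 254, matching the test cases in the question.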
All of my quotes come from the C standard, section 6.3.1.3. Unsigned to signed is well defined when the value is within range of the signed type:
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
Signed to unsigned is well defined:
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Unsigned to signed, when the value lies out of range isn't too well defined:
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
Unfortunately, your question lies in the realm of point 3. C doesn't guarantee any implicit mechanism to convert out-of-range values, so you'll need to explicitly provide one. The first step is to decide which representation you intend to use: ones' complement, two's complement or sign and magnitude.
The representation you use will affect the translation algorithm you use. In the example below, I'll use two's complement: If the sign bit is 1 and the value bits are all 0, this corresponds to your lowest value. Your lowest value is another choice you must make: In the case of two's complement, it'd make sense to use either of INT16_MIN (-32768) or INT8_MIN (-128). In the case of the other two, it'd make sense to use INT16_MIN + 1 or INT8_MIN + 1 due to the presence of negative zeros, which should probably be translated to be indistinguishable from regular zeros. In this example, I'll use INT8_MIN, since it makes sense that (uint8_t) -1 should translate to -1 as an int16_t.
Separate the sign bit from the value bits. The value should be the absolute value, except in the case of a two's complement minimum value when sign will be 1 and the value will be 0. Of course, the sign bit can be wherever you like it to be, though it's conventional for it to rest at the far left hand side. Hence, shifting right 7 places obtains the conventional "sign" bit:
uint8_t sign = input >> 7;
uint8_t value = input & (UINT8_MAX >> 1);
int16_t result;
If the sign bit is 1, we'll call this a negative number and add to INT8_MIN to construct the sign so we don't end up in the same conundrum we started with, or worse: undefined behaviour (which is the fate of one of the other answers).
if (sign == 1) {
result = INT8_MIN + value;
}
else {
result = value;
}
This can be shortened to:
int16_t result = (input >> 7) ? INT8_MIN + (input & (UINT8_MAX >> 1)) : input;
... or, better yet:
int16_t result = input <= INT8_MAX ? input
: INT8_MIN + (int8_t)(input % (uint8_t) INT8_MIN);
The sign test now involves checking whether the input is in the positive range. If it is, the value remains unchanged. Otherwise, we use addition and modulo to produce the correct negative value. This is fairly consistent with the C standard's language above. It works well for two's complement, because int16_t and int8_t are guaranteed to use a two's complement representation internally. However, types like int aren't required to use a two's complement representation internally. When converting unsigned int to int, for example, there needs to be another check, so that we treat values less than or equal to INT_MAX as positive, and values greater than or equal to (unsigned int) INT_MIN as negative. Any other values need to be handled as errors; in this case I treat them as zeros.
/* Generate some random input */
srand(time(NULL));
unsigned int input = rand();
for (unsigned int x = UINT_MAX / ((unsigned int) RAND_MAX + 1); x > 1; x--) {
input *= (unsigned int) RAND_MAX + 1;
input += rand();
}
int result = /* Handle positives: */ input <= INT_MAX ? input
: /* Handle negatives: */ input >= (unsigned int) INT_MIN ? INT_MIN + (int)(input % (unsigned int) INT_MIN)
: /* Handle errors: */ 0;
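For the 8-bit case from the question, the range-test approach can be packaged as a small function. This is a sketch with a hypothetical name; it avoids out-of-range conversions by only ever producing values that fit in the target type.

```c
#include <stdint.h>

/* Interpret an 8-bit pattern as a two's complement number and return it
 * as an int16_t, without relying on implementation-defined conversions. */
int16_t u8_to_signed(uint8_t input)
{
    if (input <= INT8_MAX)
        return (int16_t)input;                 /* 0..127: unchanged */

    /* 128..255: the offset from 128 is added to the lowest value, -128. */
    return (int16_t)(INT8_MIN + (input % 128));
}
```

For example, 255 maps to -1, 251 maps to -5 and 128 maps to -128.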
If the offset is in the 2's complement representation, then
convert this
uint8_t argument = 0xFF; //-1
int16_t difference = argument << 1;
*ip += difference;
into this:
uint8_t argument = 0xFF; //-1
int8_t signed_argument;
signed_argument = argument; // this relies on implementation-defined
// conversion of unsigned to signed, usually it's
// just a bit-wise copy on 2's complement systems
// OR
// memcpy(&signed_argument, &argument, sizeof argument);
*ip += signed_argument + signed_argument;
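The memcpy variant mentioned in the comment above can be turned into a complete function for checking the test cases from the question. The name apply_branch and the choice to return the new ip (instead of updating it through a pointer) are assumptions for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Apply an 8-bit two's complement branch argument, counted in 2-byte
 * instructions, to a 16-bit instruction pointer. */
uint16_t apply_branch(uint16_t ip, uint8_t argument)
{
    int8_t signed_argument;

    /* Copy the bit pattern; int8_t is always two's complement. */
    memcpy(&signed_argument, &argument, sizeof signed_argument);

    /* Adding the offset twice doubles it without a multiplication.
     * The cast wraps the result back into the 16-bit range. */
    return (uint16_t)(ip + signed_argument + signed_argument);
}
```

Starting from ip = 20, an argument of -5 (0xFB) gives 10, and an argument of 127 gives 274.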

How to create mask with least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
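The three uses of a mask just described can be written as tiny functions. The names are hypothetical; the masks 0xFF and 0x101 are the ones used in the text above.

```c
#include <stdint.h>

/* AND with a mask keeps only the bits that are 1 in the mask. */
uint32_t extract_low_byte(uint32_t v)    { return v & 0xFFu; }

/* OR with a mask forces the masked bits (here bits 0 and 8) to 1. */
uint32_t set_bits_0_and_8(uint32_t v)    { return v | 0x101u; }

/* XOR with a mask flips the masked bits and leaves the rest alone. */
uint32_t toggle_bits_0_and_8(uint32_t v) { return v ^ 0x101u; }
```

For instance, extracting the low byte of 0x12345678 gives 0x78, and toggling bits 0 and 8 of 0x001 gives 0x100.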
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
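Putting the formula and the overflow caveat together, a guarded version might look like the sketch below. The function name low_bits_mask is hypothetical, and treating "n at least as large as the type width" as "all bits set" is a design choice made for this example.

```c
#include <limits.h>

/* Return a mask with the n least significant bits set to 1. */
unsigned int low_bits_mask(unsigned int n)
{
    const unsigned int width = sizeof(unsigned int) * CHAR_BIT;

    if (n >= width)
        return ~0u;            /* shifting by the full width would be undefined */
    return (1u << n) - 1u;     /* 2^n - 1: the n lowest bits set */
}
```

For n = 6 this gives 0x3F, and for n = 17 it gives 0x1FFFF (assuming unsigned int is at least 18 bits wide).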
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x3F)
Thanks to @Benni, who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for a value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is really easy to read once you get the hang of it.
You can read more about bitwise operations in C here.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63 which is 111111 in binary, so that last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdarg.h>
#include <stdio.h>
int mask_for_n_bits(int n)
{
int mask = 0;
for (int i = 0; i < n; ++i)
mask |= 1 << i;
return mask;
}
int main (int argc, char const *argv[])
{
printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, a 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, a 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.

Resources