Get byte - how is this wrong?


I want to get the designated byte from a 32 bit integer. I am getting wrong values but I don't know why.
The restrictions to this problem are:
Must use signed bits, and I can't use multiplication.
I specifically need to know what is wrong with the function as it's below.
Here is the function:
int retrieveByteFromWord(int word, int byte)
{
    return (word >> (byte << 3)) & 0xFF;
}
ex:
(3) (2) (1) (0) ------ byte number
In word: 10010011 11001100 00110011 10101000
I want to return byte 2 (1100 1100).
retrieveByteFromWord(word, 2) ---- gives: 1100 1100
But for some cases it's wrong and it won't tell me what case.
Any ideas?
Here is the problem:
You just started working for a company that is implementing a set of procedures to operate on a data structure where 4 signed bytes are packed into a 32 bit unsigned. Bytes within the word are numbered from 0(LSB) to 3(MSB). You have been assigned the task of implementing a function for a machine using 2's complement arithmetic and arithmetic right shifts with the following prototype:
typedef unsigned packed_t;
int xbyte(packed_t word, int bytenum);
This is the previous employee's attempt, which got him fired for being wrong:
int xbyte(packed_t word, int bytenum)
{
    return (word >> (bytenum << 3)) & 0xFF;
}
A) What is wrong with the code?
B) Write a correct implementation using only left and right shifts and one subtraction.
I have done B but still don't know why A is wrong. Is it because the decimal numbers going in like 12, 15, 19, 55 and then getting packed into a word and then when I extract them they aren't the same number anymore??? It might be so I am going to run some tests real fast...

As this is homework I won't give you a full answer, but I'll point you in the right direction. Your problem statement says that:
4 signed bytes are packed into a 32 bit unsigned.
When you bitwise & a 32 bit signed integer with 0xFF the most significant bit - i.e. the sign bit - of the result is always 0, so the original function never returns a negative value regardless of the input.
By way of example...
When you say "retrieveByteFromWord(word, 2) ---- gives: 11001100" you're wrong.
Your return type is a 32 bit integer - not an 8 bit integer. You're not returning 11001100 you're returning 00000000 00000000 00000000 11001100.
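As a minimal sketch of what a sign-preserving version could look like: move the wanted byte to the top of the word, then shift it back down as a signed value so the sign bit is copied into the upper bits. This assumes the machine described in the problem statement (32-bit int, two's complement, arithmetic right shift on signed values); in standard C both the unsigned-to-int conversion and the right shift of a negative value are implementation-defined, so treat it as an illustration rather than portable code.

```c
typedef unsigned packed_t;

/* Sketch: extract byte `bytenum` (0 = LSB, 3 = MSB) as a signed value.
   Assumes 32-bit int, two's complement, arithmetic right shift on int. */
int xbyte(packed_t word, int bytenum)
{
    int left = (3 - bytenum) << 3;   /* the one subtraction: bits to shift up */
    int top  = (int)(word << left);  /* wanted byte now occupies bits 31..24 */
    return top >> 24;                /* arithmetic shift copies the sign bit down */
}
```

With the example word 0x93CC33A8, byte 2 is 0xCC, which this sketch returns as a negative number instead of 204.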

To work with numbers, use signed integer types such as int.
To work with bits, use unsigned integer types such as unsigned. I.e. let the word argument be of type unsigned. That is what the unsigned types are for.
To multiply by 8, write just *8 (this does not mean that that part of the code is technically wrong, just that it is artificially contrived and needlessly unreadable).
Even better, create a self-describing name for that magic number 8, e.g. *bitsPerByte (the standard library calls it CHAR_BIT, which is not particularly self-describing nor readable).
Finally, at the design level, think about designing your functions so that the code that uses a function of yours – each call – becomes clear and readable. E.g. like int const b = byteAt( 2, x );. That can prevent bugs by e.g. preventing wrong actual argument order, and since designing for readability makes the code easier to read, it reduces time spent on that. :-)
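A rough sketch of how these suggestions might combine. The name byteAt comes from the call shown above; its exact signature and the unsigned return type are assumptions made for illustration, and the multiplication follows this answer's advice rather than the no-multiplication restriction in the question.

```c
#include <limits.h>

/* Hypothetical helper: returns byte `index` (0 = least significant) of `word`
   as an unsigned value in the range 0..255.
   Assumes CHAR_BIT == 8, which the 0xFF mask relies on. */
unsigned byteAt(int index, unsigned word)
{
    return (word >> (index * CHAR_BIT)) & 0xFFu;
}
```

A call then reads as unsigned const b = byteAt( 2, x );, with the byte index first and the word second.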
Cheers & hth.,

Works fine for positive numbers. You may want to cast word to unsigned to make it work for integers with the MSB set.
int retrieveByteFromWord(int word, int byte)
{
    return ((unsigned)word >> (byte << 3)) & 0xFF;
}

Related

AVR uint8_t doesn't get correct value

I have a uint8_t that should contain the result of a bitwise calculation. The debugger says the variable is set correctly, but when i check the memory, the var is always at 0. The code proceeds like the var is 0, no matter what the debugger tells me. Here's the code:
temp = (path_table & (1 << current_bit)) >> current_bit;
//temp is always 0, debugger shows correct value
if (temp > 0) {
    DS18B20_send_bit(pin, 0x01);
} else {
    DS18B20_send_bit(pin, 0x00);
}
Temp's a uint8_t, path_table's a uint64_t and current_bit's a uint8_t. I've tried to make them all uint64_t but nothing changed. I've also tried using unsigned long long int instead. Nothing again.
The code always enters the else clause.
Chip's Atmega4809, and uses uint64_t in other parts of the code with no issues.
Note - If anyone knows a more efficient/compact way to extract a single bit from a variable i would really appreciate if you could share ^^

1 is an integer constant, of type int. The expression 1 << current_bit also has type int, but for 16-bit int, the result of that expression is undefined when current_bit is larger than 14. The behavior being undefined in your case, it is plausible that your debugger presents results for the overall expression that seem inconsistent with the observed behavior. If you used an unsigned int constant instead, i.e. 1u, the shift would be well defined for current_bit up to 15, but a shift count of 16 or more would still be undefined, because the count must be smaller than the width of the (16-bit) type being shifted.
Solve this problem by performing the computation in a type wide enough to hold the result. Here's a compact, correct, and pretty clear way to correct your code to do that:
DS18B20_send_bit(pin, (path_table & (((uint64_t) 1) << current_bit)) != 0);
Or if path_table has an unsigned type then I prefer this, though it's more of a departure from your original:
DS18B20_send_bit(pin, (path_table >> current_bit) & 1);
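The second form can be wrapped in a small helper so the shift is always done on the 64-bit value. This is only a sketch; the name bit_at is made up for illustration and is not part of the original code.

```c
#include <stdint.h>

/* Sketch: return bit `bit` (0..63) of `value` as 0 or 1.
   The shift happens on the uint64_t operand, so no 16-bit int is involved. */
uint8_t bit_at(uint64_t value, uint8_t bit)
{
    return (uint8_t)((value >> bit) & 1u);
}
```

The call site would then become DS18B20_send_bit(pin, bit_at(path_table, current_bit));.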

Realization #1 here is that AVR is a 1980s-1990s technology core. It is not an x64 PC that chews 64 bit numbers for breakfast, but an extremely inefficient 8-bit MCU. As such:
It likes 8 bit arithmetic.
It will struggle with 16 bit arithmetic, by doing tricks with 16 bit index registers, double accumulators or whatever 8 bit core tricks it prefers to do.
It will literally take ages to execute 32 bit arithmetic, by invoking software libraries inline.
It will probably melt through the floor if attempting 64 bit arithmetic.
Before you do anything else, you need to get rid of all 64 bit arithmetic and radically minimize the use of 32 bit arithmetic. Period. There should be no single variable of uint64_t in your code or you are doing it very very wrong.
With this revelation also comes that all 8 bit MCUs always have an int type which is 16 bits.
In the code 1<<current_bit, the integer constant 1 is of type int. Meaning that if current_bit is 15 or larger, you will shift bits into the sign bit of this temporary int. This is always a bug. Strictly speaking this is undefined behavior. In practice, you might end up with random change of sign of your numbers.
To avoid this, never use any form of bitwise operators on signed numbers. When mixing integer constants such as 1 with bitwise operators, change them to 1u to avoid bugs like the one mentioned.
If anyone knows a more efficient/compact way to extract a single bit from a variable i would really appreciate if you could share
The most efficient way in C is: uint8_t variable; ... if(variable & (1u << bits)). This should translate to the relevant "branch if bit set" instruction.
My general advice would be to find your tool chain's disassembler and see what machine code the C code actually generated. You don't have to be an assembler guru to read it; peeking at the instruction set should be enough.
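One possible way to follow the advice about avoiding 64-bit arithmetic is to keep the 64 path bits in an array of eight bytes and do all the bit work on 8-bit values. This is a sketch under that assumption; the array layout and the name path_bit are hypothetical and not taken from the question's code.

```c
#include <stdint.h>

/* Hypothetical layout: the 64 path bits stored as 8 bytes,
   with bit 0 in the lowest bit of table[0]. */
uint8_t path_bit(const uint8_t table[8], uint8_t bit)
{
    uint8_t mask = (uint8_t)(1u << (bit & 7u));  /* position inside the byte */
    return (table[bit >> 3] & mask) != 0;         /* bit >> 3 selects the byte */
}
```

Whether this is actually faster on a given chip is best checked in the disassembly, as suggested above.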

Printing actual bit representation of integers in C

I wanted to print the actual bit representation of integers in C. These are the two approaches that I found.
First:
union int_char {
    int val;
    unsigned char c[sizeof(int)];
} data;
data.val = n1;
// printf("Integer: %p\nFirst char: %p\nLast char: %p\n", &data.f, &data.c[0], &data.c[sizeof(int)-1]);
for(int i = 0; i < sizeof(int); i++)
    printf("%.2x", data.c[i]);
printf("\n");
Second:
for(int i = 0; i < 8*sizeof(int); i++) {
    int j = 8 * sizeof(int) - 1 - i;
    printf("%d", (val >> j) & 1);
}
printf("\n");
For the second approach, the outputs are 00000002 and 02000000. I also tried the other numbers and it seems that the bytes are swapped in the two. Which one is correct?

Welcome to the exotic world of endian-ness.
Because we write numbers most significant digit first, you might imagine the most significant byte is stored at the lower address.
The electrical engineers who build computers are more imaginative.
Sometimes they store the most significant byte first but on your platform it's the least significant.
There are even platforms where it's all a bit mixed up - but you'll rarely encounter those in practice.
So we talk about big-endian and little-endian for the most part. It's a joke about Gulliver's Travels where there's a pointless war about which end of a boiled egg to start at. Which is itself a satire of some disputes in the Christian Church. But I digress.
Because your first snippet looks at the value as a series of bytes, it encounters them in endian order.
But because the >> is defined as operating on bits it is implemented to work 'logically' without regard to implementation.
It's right of C to not define the byte order because hardware not supporting the model C chose would be burdened with an overhead of shuffling bytes around endlessly and pointlessly.
There sadly isn't a built-in identifier telling you what the model is - though code that does can be found.
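As a sketch of the kind of code meant here, one common approach is to store a known value and look at its first byte in memory. The function name is made up for illustration, and the check only distinguishes "least significant byte first" from everything else.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: returns 1 if the least significant byte of a 32-bit value
   is stored at the lowest address (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    const uint32_t probe = 1u;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* copy the byte at the lowest address */
    return first == 1;
}
```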
It will become relevant to you if (a) as above you want to breakdown integer types into bytes and manipulate them or (b) you receive files for other platforms containing multi-byte structures.
Unicode offers something called a BOM (Byte Order Marker) in UTF-16 and UTF-32.
In fact a good reason (among many) for using UTF-8 is the problem goes away. Because each component is a single byte.
Footnote:
It's been pointed out quite fairly in the comments that I haven't told the whole story.
The C language specification admits more than one representation of integers and particularly signed integers. Specifically signed-magnitude, twos-complement and ones-complement.
It also permits 'padding bits' that don't represent part of the value.
So in principle along with tackling endian-ness we need to consider representation.
In principle. All modern computers use twos complement and extant machines that use anything else are very rare and unless you have a genuine requirement to support such platforms, I recommend assuming you're on a twos-complement system.

The correct hex representation as a string is 00000002, the same as when you declare the integer in hex notation:
int n = 0x00000002; //n=2
or as you get when printing the integer as hex, like in:
printf("%08x", n);
But when printing the integer's bytes one after the other, you must also consider the endianness, which is the byte order in multi-byte integers:
On a big-endian system (some UNIX systems use it) the 4 bytes will be ordered in memory as:
00 00 00 02
While on a little-endian system (most platforms) the bytes will be ordered in memory as:
02 00 00 00

The first prints the bytes that represent the integer in the order they appear in memory. Platforms with different endian will print different results as they store integers in different ways.
The second prints the bits that make up the integer value most significant bit first. This result is independent of endian. The result is also independent of how the >> operator is implemented for signed ints as it does not look at the bits that may be influenced by the implementation.
The second is a better match to the question "Printing actual bit representation of integers in C". Although there is a lot of ambiguity.
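A minimal sketch of the second, endian-independent approach, written so that the shifting is done on an unsigned value. The function name and the output buffer convention are assumptions for illustration, and the code assumes unsigned has no padding bits.

```c
#include <limits.h>
#include <stddef.h>

/* Sketch: write the bits of `value`, most significant first, into `out`.
   `out` must have room for CHAR_BIT * sizeof(unsigned) + 1 characters.
   Assumes unsigned has no padding bits. */
void bits_to_string(unsigned value, char *out)
{
    size_t nbits = CHAR_BIT * sizeof value;
    for (size_t i = 0; i < nbits; i++) {
        size_t shift = nbits - 1 - i;              /* start at the top bit */
        out[i] = ((value >> shift) & 1u) ? '1' : '0';
    }
    out[nbits] = '\0';
}
```

For an int n, calling bits_to_string((unsigned)n, buf) gives the same string on big-endian and little-endian machines.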

It depends on your definition of "correct".
The first one will print the data exactly like it's laid out in memory, so I bet that's the one you're getting the maybe unexpected 02000000 for. *) IMHO, that's the correct one. It could be done simpler by just aliasing with unsigned char * directly (char pointers are always allowed to alias any other pointers, in fact, accessing representations is a usecase for char pointers mentioned in the standard):
int x = 2;
unsigned char *rep = (unsigned char *)&x;
for (int i = 0; i < sizeof x; ++i) printf("0x%hhx ", rep[i]);
The second one will print only the value bits **) and take them in the order from the most significant byte to the least significant one. I wouldn't call it correct because it also assumes that bytes have 8 bits, and because the shifting used is implementation-defined for negative numbers. ***) Furthermore, just ignoring padding bits doesn't seem correct either if you really want to see the representation.
edit: As commented by Gerhardh meanwhile, this second code doesn't print byte by byte but bit by bit. So, the output you claim to see isn't possible. Still, it's the same principle, it only prints value bits and starts at the most significant one.
*) You're on a "little endian" machine. On these machines, the least significant byte is stored first in memory. Read more about Endianness on wikipedia.
**) Representations of types in C may also have padding bits. Some types aren't allowed to include padding (like char), but int is allowed to have them. This second option doesn't alias to char, so the padding bits remain invisible.
***) A correct version of this code (for printing all the value bits) must a) correctly determine the number of value bits (8 * sizeof(int) is wrong because bytes (char) can have more than 8 bits, even CHAR_BIT * sizeof(int) is wrong, because this would also count padding bits if present) and b) avoid the implementation-defined shifting behavior by first converting to unsigned. It could look for example like this:
#include <stdio.h>

#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL *30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 + 4-12/((m)%31+3))
int main(void)
{
    int x = 2;
    for (unsigned mask = 1U << (IMAX_BITS((unsigned)-1) - 1); mask; mask >>= 1)
    {
        putchar((unsigned) x & mask ? '1' : '0');
    }
    puts("");
}
See this answer for an explanation of this strange macro.

Getting the negative integer from a two's complement value Embedded C

I know that many had similar questions over here about converting from/to two's complement format and I tried many of them but nothing seems to help in my case.
Well, I'm working on an embedded project that involves writing/reading registers of a slave device over SPI. The register concerned here is a 22-bit position register that stores the uStep value in two's complement format and it ranges from -2^21 to +2^21 -1. The problem is when I read the register, I get a big integer that has nothing to do with the actual value.
Example:
After sending a command to the slave to move 4000 steps (forward/positive), I read the position register and I get exactly 4000. However, if I send a reverse move command, say -1, and then read the register, the value I get is something like 4292928. I believe it's the negative offset of the register as the two's complement has no zero. I have no problem sending a negative integer to the device to move x number of steps, however, getting the actual negative integer from the value retrieved is something else.
I know that this involves two's complement but the question is, how to get the actual negative integer out of that strange value? I mean, if I moved the device -4000 steps, what I have to do to get the exact value for the negative steps moved so far from my register?

You need to sign-extend bit 21 through the bits to the left.
For negative values when bit 21 is set, you can do this by ORring the value with 0xFFC00000.
For positive values when bit 21 is clear, you can ensure the upper bits are zero by ANDing the value with 0x003FFFFF.

The solutions by Clifford and Weather Vane assume the target machine is two's-complement. This is very likely true, but a solution that removes this dependency is:
static const int32_t sign_bit = 0x00200000;
int32_t pos_count = (getPosRegisterValue() ^ sign_bit) - sign_bit;
It has the additional advantage of being branch-free.
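A worked sketch of this XOR-and-subtract trick as a standalone function. The function name and the explicit 22-bit mask are additions for illustration; here the raw value is converted to int32_t before the arithmetic so that no out-of-range unsigned-to-signed conversion takes place.

```c
#include <stdint.h>

/* Sketch: sign-extend a 22-bit two's complement field held in `raw`. */
int32_t sign_extend22(uint32_t raw)
{
    const int32_t sign_bit = 0x00200000;               /* bit 21 */
    int32_t value = (int32_t)(raw & 0x003FFFFFu);      /* keep the 22 field bits */
    /* Flipping bit 21 and subtracting it maps 0x200000..0x3FFFFF
       onto -2^21..-1 and leaves 0..0x1FFFFF unchanged. */
    return (value ^ sign_bit) - sign_bit;
}
```

For example, a register reading of 4194303 (all 22 bits set) comes out as -1, and 4000 stays 4000.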

The simplest method perhaps is simply to shift the position value left by 10 bits and assign to an int32_t. You will then have a 32 bit value and the position will be scaled up by 2^10 (1024), and have 32 bit resolution, but 10 bit granularity, which normally shouldn't matter since the position units are entirely arbitrary in any case, and can be converted to real-world units if necessary taking into account the scaling:
int32_t pos_count = (int32_t)(getPosRegisterValue() << 10) ;
Where getPosRegisterValue() returns a uint32_t.
If you do however want to retain 22 bit resolution then it is simply a case of dividing the value by 1024:
int32_t pos_count = ((int32_t)(getPosRegisterValue() << 10)) / 1024 ;
Both solutions rely on the implementation-defined behaviour of casting a uint32_t of value not representable in an int32_t; but on a two's complement machine any plausible implementation will not modify the bit-pattern and the result will be as required.
Another perhaps less elegant solution also retaining 22 bit resolution and single bit granularity is:
int32_t pos_count = getPosRegisterValue() ;
// If 22 bit sign bit set...
if( (pos_count & 0x00200000) != 0)
{
    // Sign-extend to 32bit
    pos_count |= 0xFFC00000 ;
}
It would be wise perhaps to wrap the solution in a function to isolate any implementation-defined behaviour:
int32_t posCount(void)
{
    return ((int32_t)(getPosRegisterValue() << 10)) / 1024 ;
}

How to copy MSB to rest of the byte?

In an interrupt subroutine (called every 5 µs), I need to check the MSB of a byte and copy it to the rest of the byte.
I need to do something like:
if(MSB == 1){byte = 0b11111111}
else{byte = 0b00000000}
I need it to make it fast as it is on an interrupt subroutine, and there is some more code on it, so efficiency is calling.
Therefore, I don't want to use any if, switch, select, or >> operators, as I have the feeling that they would slow down the process. If I'm wrong, then I'll go the "easy" way.
What I've tried:
byte = byte & 0b10000000
This gives me 0b10000000 or 0b00000000.
But I need the first to be 0b11111111.
I think I'm missing an OR somewhere (plus other gates). I don't know, my guts is telling me that this should be easy, but it isn't for me at this moment.

The trick is to use a signed type, such as int8_t, for your byte variable, and take advantage of sign extension feature of the shift-right operation:
byte = byte >> 7;
Demo.
Shifting right is very fast - a single instruction on most modern (and even not so modern) CPUs.
The reason this works is that >> on signed operands inserts the sign bit on the left to preserve the sign of its operand. This is called sign extension.
Note: Technically, this behavior is implementation-defined, and therefore is not universally portable. Thanks, Eugene Sh., for a comment and a reference.
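A small sketch of this approach as a function, for illustration only. The function name is made up, and the result for negative inputs relies on the compiler implementing >> on signed values as an arithmetic shift, as the note above says.

```c
#include <stdint.h>

/* Sketch: returns -1 (all bits set) if the MSB of `byte` is set, else 0.
   Relies on arithmetic right shift for negative values (implementation-defined). */
int8_t spread_msb(int8_t byte)
{
    return (int8_t)(byte >> 7);
}
```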

EDIT: My answer has confused people because I did not specify unsigned bytes. Here my assumption is that B is of type unsigned char. As one comment notes below, I can omit the &1. This is not as fast as the signed byte solution that the other poster put up, but this code should be portable (once it is understood that B is unsigned type).
B = -((B >> 7)&1)
Negative numbers are our friends. Shifting bits should be fast, by the way.
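The same idea written as a function on an unsigned byte, as a sketch. The name is hypothetical; the &1 is omitted because shifting an 8-bit unsigned value right by 7 can only give 0 or 1.

```c
#include <stdint.h>

/* Sketch: returns 0xFF if the MSB of `b` is set, else 0x00.
   b >> 7 is 0 or 1; negating gives 0 or -1, and converting -1
   to uint8_t is well defined and yields 0xFF. */
uint8_t spread_msb_u8(uint8_t b)
{
    return (uint8_t)-(b >> 7);
}
```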

The MSB is actually the sign bit of the number. It is 1 for a negative number and 0 for a positive number.
so the simplest way to do that is
if(byte < 0)
{
    byte = -1;
}
else
{
    byte = 0;
}
because -1 = 11111111 in binary.
If it is the case for an unsigned integer, then just simply type cast it into a signed value and then compare it again as mentioned above.

constructing key by bit shifting 3 integers in C

I want to construct a key composed of 3 values by using bit shifting operations:
According to my understanding, the C statement code I am starting from creates a hash table by constructing its keys from certain data variables:
uint64_t key = (uint64_t)c->pos<<32 | c->isize;
My interpretation is that key is a combination of the last 32 digits
of c->pos, which must be a 64 bit unsigned integer, and c->isize, also a 64bit unsigned integer.
But I am not sure if that is the case, and maybe the | pipe operator
has a different meaning when applied to bit shifting operations.
What I want to do next is to modify the way key is constructed and
include a third c->barc element into the variable. Given the number
of possibilities of c->barc and c->isize, I was thinking that instead
of building key with 32+32 bits (pos+isize), I would build it
with 32+16+16 bits (pos+isize+barc) splitting the last 32 bits between
isize and barc.
Any ideas how to do that?

What I think you need is a solid explanation of bitmasking.
For this particular case, you should use the & operator to mask out the upper 16 bits of c->isize before shifting it up, and then use the & operator again to mask the upper 48 bits of c->barc.
Let's look at some diagrams.
let
c->pos = xxxx_xxxx_....._xxxx
c->isize = yyyy_yyyy_....._yyyy
c->barc = zzzz_zzzz_....._zzzz
where
x, y, and z are bits.
note: underscores are to identify groups of 4 bits.
If I understand correctly, you want a 64-bit number like this:
xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_yyyy_yyyy_yyyy_yyyy_zzzz_zzzz_zzzz_zzzz
right?
As you already know, we get the upper 32 x's by doing
|-----32 bits of pos----|---32 0 bits--|
(uint64_t)c->pos<<32 = xxxx_xxxx_...._xxxx_xxxx_0000_...._0000
Now, we want to bitwise-or that with the following:
|----------32 0 bits----|
0000_0000_...._0000_0000_yyyy_yyyy_yyyy_yyyy_0000_0000_0000_0000
To get that number there, we do this:
((c->isize & 0xffff) << 16)
because:
c->isize & 0xffff gives
yyyy_yyyy_yyyy_yyyy_yyyy_yyyy_yyyy_yyyy
& 0000_0000_0000_0000_1111_1111_1111_1111
---------------------------------------------
0000_0000_0000_0000_yyyy_yyyy_yyyy_yyyy
and then we shift it left by 16 to get
|--------32 0 bits------|
0000_0000_...._0000_0000_yyyy_yyyy_yyyy_yyyy_0000_0000_0000_0000
Now, the final part, the
|-------48 0 bits-------|
0000_0000_...._0000_0000_zzzz_zzzz_zzzz_zzzz
is the result plain and simply of
(c->barc & 0xffff) =
zzzz_zzzz_zzzz_zzzz_zzzz_zzzz_zzzz_zzzz
& 0000_0000_0000_0000_1111_1111_1111_1111
-------------------------------------------------
0000_0000_0000_0000_zzzz_zzzz_zzzz_zzzz
So we take all of these expressions and bitwise-or them together.
uint64_t key = ((uint64_t)c->pos << 32) | ((c->isize & 0xffff) << 16)
| (c->barc & 0xffff);
if we diagram it out, we see
xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_0000_0000_0000_0000_0000_0000_0000_0000
0000_0000_0000_0000_0000_0000_0000_0000_yyyy_yyyy_yyyy_yyyy_0000_0000_0000_0000
or 0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_zzzz_zzzz_zzzz_zzzz
-----------------------------------------------------------------------------------
xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_xxxx_yyyy_yyyy_yyyy_yyyy_zzzz_zzzz_zzzz_zzzz
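Put together as a sketch, with helpers to take the key apart again. The parameter types are assumptions, since the types of c->pos, c->isize and c->barc are not shown in the question, and the function names are made up for illustration. The explicit casts to uint64_t make sure every shift is done in 64 bits regardless of the field types.

```c
#include <stdint.h>

/* Hypothetical field types: the original struct's types are not shown. */
uint64_t make_key(uint32_t pos, uint32_t isize, uint32_t barc)
{
    return ((uint64_t)pos << 32)                   /* upper 32 bits: pos */
         | ((uint64_t)(isize & 0xFFFFu) << 16)     /* next 16 bits: low half of isize */
         | (uint64_t)(barc & 0xFFFFu);             /* low 16 bits: low half of barc */
}

/* Inverse operations: recover each field from the key. */
uint32_t key_pos(uint64_t key)   { return (uint32_t)(key >> 32); }
uint16_t key_isize(uint64_t key) { return (uint16_t)(key >> 16); }
uint16_t key_barc(uint64_t key)  { return (uint16_t)key; }
```

Note that any bits of isize or barc above the low 16 are discarded, so two inputs that differ only in those bits produce the same key.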

The "pipe operator" is actually a bitwise OR operator. The code takes two (presumably) 32-bit integers, one of them shifts left by 32 bits and combines them together. Thus you get a single 64-bit number. See Wiki for more info about bitwise operations.
If you want to compose your key from three 32-bit integers, then you obviously have to manipulate them to fit them into 64 bits. You can do something like this:
uint64_t key = (uint64_t)c->pos<<32 | (c->isize & 0xFFFF0000) | (c->barc & 0xFFFF);
This code takes 32 bits from c->pos, shifts them in the higher 32 bits of the 64-bit key, then takes the higher 16 bits of c->isize and finally the lower 16 bits of c->barc. See here for more.

I wouldn't do it. It is not safe if you are not designing whole thing by yourself. But let's explain some things.
My interpretation is that key is a combination of the last 32 digits of c->pos,
Generally, yes.
which must be a 64 bit unsigned integer, and c->isize, also a 64bit unsigned integer.
No. You know nothing about the size of the types of pos and isize. pos is cast to uint64_t, so it might be any type that allows such a cast.
My bet is that both values are 32-bit. 1st value is being cast onto 64bit type, because bit shift equal to or greater than the width of the type is undefined behaviour. So to stay safe it is widened.
The code probably packs two 32-bit values into a 64-bit one, otherwise it would lose information.
Moreover, if it wanted to construct a key from values which would overlap, it would most probably use xor rather than or. Your way is not a good approach unless you know precisely what you are doing. You should find out what types your operands are and then choose a method for creating keys out of them.
