What do these macros do? - c

I have inherited some heavily obfuscated and poorly written PIC code to modify. There are two macros here:
#define TopByteInt(v) (*(((unsigned char *)(&v)+1)))
#define BottomByteInt(v) (*((unsigned char *)(&v)))
Is anyone able to explain what on earth they do and what that means please?
Thanks :)

They access a 16-bit integer one byte at a time: TopByteInt reads the most significant byte and BottomByteInt the least significant byte. Little-endian byte order is assumed.
Usage would be like this:
uint16_t v = 0xcafe;
const uint8_t v_high = TopByteInt(v);
const uint8_t v_low = BottomByteInt(v);
The above would result in v_high being 0xca and v_low being 0xfe.
It's rather scary code; it would be cleaner to do this arithmetically:
#define TopByteInt(v) (((v) >> 8) & 0xff)
#define BottomByteInt(v) ((v) & 0xff)
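As a quick sanity check (a minimal sketch; the test value 0xcafe and the printout are purely illustrative), the arithmetic macros give the expected bytes and, unlike the pointer versions, do not depend on the target's endianness:

#include <stdint.h>
#include <stdio.h>

#define TopByteInt(v) (((v) >> 8) & 0xff)
#define BottomByteInt(v) ((v) & 0xff)

int main(void)
{
    uint16_t v = 0xcafe;                                      /* illustrative value */
    printf("top = 0x%02x\n", (unsigned)TopByteInt(v));        /* prints 0xca */
    printf("bottom = 0x%02x\n", (unsigned)BottomByteInt(v));  /* prints 0xfe */
    return 0;
}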

(*((unsigned char *)(&v)))
It takes the address of v (a 16-bit integer), casts it to a pointer to unsigned char (8 bits), and dereferences it, so you read only the first byte in memory, which is the bottom byte on a little-endian machine.
(*(((unsigned char *)(&v)+1)))
This is the same, but it adds 1 to the byte pointer before dereferencing, so it reads the second byte in memory, i.e. the top byte.
It will only work as expected if v is a 16-bit integer.

Ugh.
Assuming you are on a little-endian platform, that looks like it could more cleanly be rewritten as
#define TopByteInt(v) (((v) >> 8) & 0xff)
#define BottomByteInt(v) ((v) & 0xff)
It is basically taking the variable v and extracting the least significant byte (BottomByteInt) and the next more significant byte (TopByteInt). 'TopByte' is a bit of a misnomer if v isn't a 16-bit value.

Related

Bitwise operations and endianness in C program

I am having trouble understanding how exactly my computer performs bitwise operations depending on endianness.
I've read this thread and this article, and I think I have confirmed my machine is little-endian (I have tried several programs described in both sources, and all of them indicate that it is).
I have the following macros defined that use SDL and swap 2 and 4 byte values in case needed:
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
#define HTON16(n) (n)
#define NTOH16(n) (n)
#define HTON32(n) (n)
#define NTOH32(n) (n)
#define HTON64(n) (n)
#define NTOH64(n) (n)
#else
#define HTON16(n) SDL_Swap16(n)
#define NTOH16(n) SDL_Swap16(n)
#define HTON32(n) SDL_Swap32(n)
#define NTOH32(n) SDL_Swap32(n)
#define HTON64(n) SDL_Swap64(n)
#define NTOH64(n) SDL_Swap64(n)
#endif
My problem is:
When writing a 2-byte number (in this case 43981 = 0xabcd) to a char[], say at entry 0, the following code produces the first 2 bytes of data in little-endian order, i.e. 0xcd then 0xab, when I'm trying to do the opposite:
char data[100];
int host_value = 43981; // 0xabcd
int net_value = HTON16(host_value);
data[0] = (net_value & 0xff00) >> 8;
data[1] = (net_value & 0xff);
My solution to the previous problem is just not using HTON16 on the host value, and operating with it as if my machine was big endian.
Also, on the same machine, doing the following to write the same host value to data does result in the first two bytes being 0xab 0xcd:
*((unsigned short *)&data[0]) = HTON16(host_value);
I would like to understand why these two cases work differently. Any help is appreciated.
The problem is these two lines,
data[0] = (net_value & 0xff00) >> 8;
data[1] = (net_value & 0xff);
Shifts and masks operate on the value of net_value, not on its representation in memory, so they behave the same regardless of endianness. Writing the high byte to data[0] and the low byte to data[1] already serializes the value in big-endian (network) order. Because you call HTON16 first, the bytes get swapped once by the macro and then effectively swapped again by the manual extraction, which is why the output comes out reversed. When you serialize with bit operations, work directly on the host value; there is no need to call HTON16.
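A minimal sketch of that point (the helper name put_be16 is illustrative, not from the question's code): extracting bytes with shifts yields network byte order directly on both little- and big-endian hosts, with no swap macro involved:

#include <stdint.h>

/* Serialize a 16-bit host value into big-endian (network) byte order.
   Shifts act on the value, not the memory layout, so no swap is needed. */
static void put_be16(unsigned char *out, uint16_t host_value)
{
    out[0] = (unsigned char)((host_value >> 8) & 0xff);  /* most significant byte first */
    out[1] = (unsigned char)(host_value & 0xff);         /* least significant byte second */
}

/* put_be16(buf, 0xabcd) leaves buf[0] == 0xab and buf[1] == 0xcd on any host. */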

How to get the most significant bit of an unsigned 8-bit type in C

I'm trying to get the most significant bit of an unsigned 8-bit type in C.
This is what I'm trying to do right now:
uint8_t *var = ...;
...
(*var >> 6) & 1
Is this right? If it's not, what would be?
To get the most significant bit of the byte pointed to by a uint8_t pointer, you need to shift by 7 bits.
(*var >> 7) & 1
The most standard/correct way of masking bits is to use a readable bit mask of the form 1u << bit. Any C programmer spotting 1u << n in code will know that it is a bit mask - so it is self-documenting code.
So if you want bit number 7, you would write
*var & (1u << 7)
The u suffix is important for rugged code, since you want to avoid accidental implicit promotions to signed types.
Another option is to simply apply a bit mask and check the resulting value:
*var & 0x80u // 1000 0000
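Putting the answers together, a minimal sketch (the value 0x9a is purely illustrative) showing that all three forms agree:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t x = 0x9a;                        /* 1001 1010, so the MSB is 1 */
    int by_shift = (x >> 7) & 1;             /* shift-and-mask form */
    int by_mask = (x & (1u << 7)) != 0;      /* self-documenting mask form */
    int by_const = (x & 0x80u) != 0;         /* plain constant mask */
    printf("%d %d %d\n", by_shift, by_mask, by_const);   /* prints 1 1 1 */
    return 0;
}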

C code that reads a 4-byte little-endian number from a buffer

I encountered this piece of C code that's existing. I am struggling to understand it.
It supposedly reads a 4-byte unsigned value passed in a buffer (in little-endian format) into a variable of type "long".
This code runs on a 64-bit word size, little-endian x86 machine, where sizeof(long) is 8 bytes.
My guess is that this code is intended to also run on a 32-bit x86 machine, so a variable of type long is used instead of int for the sake of storing the value from the four bytes of input data.
I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer questions below in that context
void read_Value_From_Four_Byte_Buff( char*input)
{
/* use long so on 32 bit machine, can still accommodate 4 bytes */
long intValueOfInput;
/* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB ?*/
/* This code seems to assume that assignment will store in rightmost byte - is that true on a x86 machine ?*/
intValueOfInput = 0xFF & input[0];
/*left shift byte-1 eight times, bitwise "or" places in 2nd byte frm right*/
intValueOfInput |= ((0xFF & input[1]) << 8);
/* similar left shift in mult. of 8 and bitwise "or" for next two bytes */
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in "little endian". But from the code it looks like the assumption is that it is read in as Byte 0 = MSB, Byte 1, Byte 2, Byte 3 = LSB. I thought so because the code reads bytes starting from Byte 0, and subsequent bytes (1 onwards) are placed in the target variable after left shifting. Is that how it is, or am I getting it wrong?
2) I feel this is a convoluted way of doing things. Is there a simpler alternative for copying the value from a 4-byte buffer into a long variable?
3) Does the assumption that this code will run on a 64-bit machine have any bearing on how easily I can do this alternatively? I mean, is all this trouble just to keep it agnostic to word size (I assume it is agnostic to word size now, though I'm not sure)?
Thanks for your enlightenment :-)
You have it backwards. When you left shift, you're putting bits into more significant positions. So ((0xFF & input[3]) << 24) puts Byte 3 into the MSB.
This is the way to do it in standard C. POSIX has the function ntohl() that converts from network byte order to a native 32-bit integer, so this is usually used in Unix/Linux applications.
This will not work exactly the same on a 64-bit machine, unless you use unsigned long instead of long. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a twos-complement machine), so you can get negative results. If long is 64 bits, all the results will be positive.
The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian and I think it might be undefined behavior according to the C standard.
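As an aside not in the original answer: a common way to avoid the aliasing and alignment problems of that cast is to memcpy the four bytes into the variable, though the result is still only correct on a little-endian host:

#include <stdint.h>
#include <string.h>

uint32_t read_le32_via_memcpy(const char *input)
{
    uint32_t x;
    memcpy(&x, input, sizeof x);   /* well-defined copy, no alignment or aliasing issues */
    return x;                      /* matches the buffer only on a little-endian host */
}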
No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char * input)
{
uint32_t x;
x = 0xFF & input[0];
x |= (0xFF & input[1]) << 8;
x |= (0xFF & input[2]) << 16;
x |= (uint32_t)(0xFF & input[3]) << 24;
return x;
}
From the code, Byte 0 is LSB, Byte 3 is MSB. But there are some typos. The lines should be
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
You can make the code shorter by dropping the 0xFF masks if you declare the argument with type "unsigned char *" instead.
To make the code shorter, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
intValueOfInput |= (unsigned long)(unsigned char)input[i] << shift;  /* cast before shifting so the top byte never shifts into the sign bit */

C bit array macros, could anyone explain me how these work?

I'm trying to implement the Sieve of Eratosthenes for a school project and I've decided to do so using bit arrays. While I was searching for materials, I came across these 3 macros. They work flawlessly, but I can't really read (understand) them.
#define ISBITSET(x,i) ((x[i>>3] & (1<<(i&7)))!=0)
#define SETBIT(x,i) x[i>>3]|=(1<<(i&7));
#define CLEARBIT(x,i) x[i>>3]&=(1<<(i&7))^0xFF;
Could you please explain at least one of them to me in detail? I have only very basic knowledge about bitwise operations in C (basically I know they "exist").
Will this work on another architecture using different endianness?
Thanks in advance.
x is an array of chars and i is a bit index. Since every char is 8 bits, the last 3 bits of i select the bit within a char, and the remaining bits select the char within the array.
i>>3 shifts i 3 bits to the right, giving the part that tells you which char, so x[i>>3] is the char that contains the bit indexed by i.
i&7 is the last 3 bits of i (since 7 in decimal is 111 in binary), so it is the index of the bit within that char. 1<<(i&7) is a value (strictly it's an int, but in this context you can ignore the difference) that has the bit indexed by i on and all other bits off: the mask for that bit.
char & mask is the common way to check whether a bit is on.
char |= mask is the common way to turn a bit on.
char &= ~mask is the common way to turn a bit off, and if mask is a char, then ~mask == mask^0xFF.
I don't think these macros are endianness-dependent. (If you got x by casting an int[] to char *, it's a different story.)
First off, those macros assume evilly that CHAR_BIT == 8, and i >> 3 is actually i / 8. (So really this code should say i / CHAR_BIT.) This first expression computes the byte which contains your desired bit, and is thus the array index in your array x (which should be an array of unsigned char!).
Now that we've selected the correct byte, namely x[i >> 3] (or x[i / CHAR_BIT] in your own, better code), we have to do the bit-fiddling. Again, i & 7 really wants to be i % CHAR_BIT, and it extracts only the remainder of your bit count that gives you the offset within the byte.
Example. Requesting the 44th bit with i = 43, and assuming CHAR_BIT = 8, i / CHAR_BIT is 5, so we're in the sixth byte, and i % CHAR_BIT is 3, so we're looking at the fourth bit of the sixth byte.
The actual bit-fiddling itself does the usual stuff; e.g. testing for a given bit performs bit-wise AND with the appropriate bit pattern (namely 1 << k for the kth bit); setting the bit uses bit-wise OR, and zeroing it requires something a tiny bit more involved (think about it!).
#define ISBITSET(x,i) (((x)[(i) / CHAR_BIT] & (1u << ((i) % CHAR_BIT))) != 0)
#define SETBIT(x,i) (x)[(i) / CHAR_BIT] |= (1u << ((i) % CHAR_BIT))
#define CLEARBIT(x,i) (x)[(i) / CHAR_BIT] &= ~(1u << ((i) % CHAR_BIT))
Always put parentheses around macro arguments
Always prefer unsigned types for bit operations
(1u << CHAR_BIT) is 256 for 8-bit platforms
There was an extra ; after the last macro
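For context, here is a minimal sketch of how such macros might be used for the sieve the question mentions (the limit of 100 is purely illustrative, and the macros are the corrected versions from above):

#include <limits.h>
#include <stdio.h>
#include <string.h>

#define ISBITSET(x,i) (((x)[(i) / CHAR_BIT] & (1u << ((i) % CHAR_BIT))) != 0)
#define SETBIT(x,i) ((x)[(i) / CHAR_BIT] |= (1u << ((i) % CHAR_BIT)))

#define LIMIT 100   /* illustrative upper bound */

int main(void)
{
    unsigned char composite[LIMIT / CHAR_BIT + 1];
    memset(composite, 0, sizeof composite);          /* all bits clear: assume prime */

    for (int p = 2; p * p <= LIMIT; p++)
        if (!ISBITSET(composite, p))
            for (int m = p * p; m <= LIMIT; m += p)
                SETBIT(composite, m);                /* mark multiples as composite */

    for (int n = 2; n <= LIMIT; n++)
        if (!ISBITSET(composite, n))
            printf("%d ", n);                        /* prints the primes up to LIMIT */
    return 0;
}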

Assign unsigned char to unsigned short with bit operators in ansi C

I know it is possible to assign an unsigned char to an unsigned short, but I would like to have more control over how the bits are actually assigned to the unsigned short.
unsigned char UC_8;
unsigned short US_16;
UC_8 = 0xff;
US_16 = (unsigned char) UC_8;
The bits from UC_8 are now placed in the lower bits of US_16. I need more control over the conversion, since the application I'm currently working on is safety-related. Is it possible to control the conversion with bit operators, so I can specify where the 8 bits from the unsigned char should be placed in the bigger 16-bit unsigned short variable?
My guess is that it would be possible with masking combined with some other bit-operator, maybe left/right shifting.
UC_8 = 0xff;
US_16 = (US_16 & 0x00ff) ?? UC_8; // Maybe masking?
I have tried different combinations but have not come up with a smart solution. I'm using ANSI C and, as said earlier, need more control over how the bits are actually set in the larger variable.
EDIT:
My problem or concern comes from a CRC-generating function. It will and should always return an unsigned short, since it sometimes calculates a 16-bit CRC. But sometimes it should calculate an 8-bit CRC instead and place those 8 bits in the eight LSBs of the 16-bit return variable, while the eight MSBs should contain only zeros.
I would like to say something like:
US_16(7 downto 0) = UC_8;
US_16(15 downto 8) = 0x00;
If I just typecast it, can I guarantee that the bits will always be placed in the lower bits of the larger variable (on all architectures)?
What do you mean, "control"?
The C standard unambiguously defines the unsigned binary format in terms of bit positions and significance. Certain bits of a 16-bit variable are "low", by numerical definition, and they will hold the pattern from the 8-bit variable, the other bits being set to zero. There is no ambiguity, no wiggle room, and nothing else to control.
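To illustrate that answer (a minimal sketch; the values are just for demonstration), a plain assignment already puts the 8-bit value in the low byte and zeroes the high byte on any conforming implementation:

#include <assert.h>

int main(void)
{
    unsigned char UC_8 = 0xff;
    unsigned short US_16 = UC_8;   /* value-preserving conversion */
    assert(US_16 == 0x00ff);       /* low byte holds UC_8 */
    assert((US_16 >> 8) == 0);     /* high byte is zero */
    return 0;
}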
Maybe shifting the bits will help you:
US_16 = (US_16 & 0x00ff) | ( UC_8 << 8 );
The result in bits will be:
C = UC_8 bits
S = US_16 bits
CCCC CCCC SSSS SSSS, i.e. SSSS SSSS are the original low 8 bits of US_16
But if UC_8 was 1 and US_16 was 0, then US_16 will be 256. Is that what you meant?
US_16 = (US_16 & 0xff00) | ( UC_8 & 0x00ff );
US_16=~-1|UC_8;
Is this what you want?
If it is important to use ANSI C and not be restricted to a particular implementation, then you should not assume sizeof(short) == 2. And why bother to cast an unsigned char to an unsigned char (the same thing)? It is probably safe to assume char is 8 bits nowadays, even though that's not guaranteed.
uint8_t UC_8;
uint16_t US_16;
int nbits = ...# of bits to shift...;
US_16 = UC_8 << nbits;
Obviously, if you shift more than 15 bits, it may not be what you want. If you need to actually rearrange the bits, rather than just shift them to some position, you'll have to set them individually:
int sourcebit = ...0 to 7...;
int destinationbit = ...0 to 15...;
// set
US_16 |= (UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit);
// clear
US_16 &= ~((UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit));
note: just wrote, didn't test. probably not optimal. blah blah blah. but something like that will work.
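Since that snippet is untested, here is a hedged sketch of the same idea (the copy_bit helper is illustrative, not from the answer): copying a single bit is usually done by clearing the destination bit and then setting it only if the source bit is set, which also avoids a negative shift count when destinationbit < sourcebit:

#include <stdio.h>

/* Copy bit 'sourcebit' of an 8-bit value into bit 'destinationbit' of a 16-bit value. */
static unsigned short copy_bit(unsigned short US_16, unsigned char UC_8,
                               int sourcebit, int destinationbit)
{
    US_16 &= (unsigned short)~(1u << destinationbit);     /* clear the destination bit */
    if (UC_8 & (1u << sourcebit))                         /* test the source bit */
        US_16 |= (unsigned short)(1u << destinationbit);  /* set it only if the source bit was 1 */
    return US_16;
}

int main(void)
{
    unsigned short r = copy_bit(0x0000, 0x04, 2, 10);  /* bit 2 of 0x04 is set */
    printf("0x%04x\n", r);                             /* prints 0x0400 */
    return 0;
}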

Resources