C code that reads a 4-byte little-endian number from a buffer

I encountered this piece of existing C code, and I am struggling to understand it.
It supposedly reads a 4-byte unsigned value passed in a buffer (in little-endian format) into a variable of type "long".
This code runs on a 64 bit word size, little endian x86 machine - where sizeof(long) is 8 bytes.
My guess is that this code is intended to also run on a 32 bit x86 machine - so a variable of type long is used instead of int for sake of storing value from a four byte input data.
I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer questions below in that context
void read_Value_From_Four_Byte_Buff( char*input)
{
/* use long so on 32 bit machine, can still accommodate 4 bytes */
long intValueOfInput;
/* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB ?*/
/* This code seems to assume that assignment will store in rightmost byte - is that true on a x86 machine ?*/
intValueOfInput = 0xFF & input[0];
/*left shift byte-1 eight times, bitwise "or" places in 2nd byte frm right*/
intValueOfInput |= ((0xFF & input[1]) << 8);
/* similar left shift in mult. of 8 and bitwise "or" for next two bytes */
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in little-endian format. But from the code it looks like the assumption is that it is read in as Byte 0 = MSB, Byte 1, Byte 2, Byte 3 = LSB. I thought so because the code reads bytes starting from Byte 0, and subsequent bytes (1 onwards) are placed in the target variable after left shifting. Is that how it is, or am I getting it wrong?
2) I feel this is a convoluted way of doing things. Is there a simpler alternative to copy the value from a 4-byte buffer into a long variable?
3) Will the assumption that this code runs on a 64-bit machine have any bearing on how easily I can do this alternatively? I mean, is all this trouble taken to keep it agnostic to word size (I assume it's agnostic to word size now, not sure though)?
Thanks for your enlightenment :-)

You have it backwards. When you left shift, you're putting into more significant bits. So ((0xFF & input[3]) << 24) puts Byte 3 into the MSB.
This is the way to do it in standard C. POSIX has the function ntohl() that converts from network byte order to a native 32-bit integer, so this is usually used in Unix/Linux applications.
This will not work exactly the same on a 64-bit machine, unless you use unsigned long instead of long. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a twos-complement machine), so you can get negative results. If long is 64 bits, all the results will be positive.

The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian, and it can be undefined behavior according to the C standard (the pointer may be misaligned for uint32_t, and the access can violate the strict aliasing rules).
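If you do want the "just copy the bytes" approach, memcpy is the usual way to do it without the pointer cast. The following is only a minimal sketch: the helper name load_native_u32 is made up for illustration, and like the cast it still assumes the buffer is already in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 4 bytes into a uint32_t in HOST byte order.
   memcpy places no alignment or aliasing requirements on the source. */
uint32_t load_native_u32(const char *input)
{
    uint32_t x;
    memcpy(&x, input, sizeof x);
    return x;
}

/* Illustrative check: a value stored with memcpy loads back unchanged. */
void load_native_u32_example(void)
{
    uint32_t v = 0x12345678u;
    char buf[sizeof v];
    memcpy(buf, &v, sizeof v);
    assert(load_native_u32(buf) == v);
}
```

On a big-endian host this would read a little-endian buffer incorrectly, which is why the shift-based version remains the portable choice.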
No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char * input)
{
uint32_t x;
x = 0xFF & input[0];
x |= (0xFF & input[1]) << 8;
x |= (0xFF & input[2]) << 16;
x |= (uint32_t)(0xFF & input[3]) << 24;
return x;
}
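To check the byte-order reasoning above with concrete numbers, here is a small sketch. The name read_le32 and the sample bytes are my own choices for illustration; it takes unsigned char so no 0xFF masking is needed, and it casts every byte to uint32_t before shifting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: assemble a 32-bit value from 4 little-endian bytes. */
uint32_t read_le32(const unsigned char *buf)
{
    return  (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Illustrative check: byte 0 is the least significant byte. */
void read_le32_example(void)
{
    const unsigned char buf[4] = { 0x78, 0x56, 0x34, 0x12 };
    assert(read_le32(buf) == 0x12345678u);
}
```

The bytes 78 56 34 12 in the buffer come out as 0x12345678, so byte 0 contributes the lowest 8 bits and byte 3 the highest.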

From the code, Byte 0 is LSB, Byte 3 is MSB. But there are some typos. The lines should be
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
You can make the code shorter by dropping 0xFF but using the type "unsigned char" in the argument type.
To make the code shorter, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
    /* widen to unsigned long before shifting so that << 24 cannot overflow an int */
    intValueOfInput |= (unsigned long)(unsigned char)input[i] << shift;

What does this code do? There are so many weird things

int n_b ( char *addr , int i ) {
char char_in_chain = addr [ i / 8 ] ;
return char_in_chain >> i%8 & 0x1;
}
For example, what is "i%8 & 0x1"?
Edit: Note that 0x1 is the hexadecimal notation for 1. Also note that :
0x1 = 0x01 = 0x000001 = 0x0...01
i%8 means i modulo 8, i.e. the remainder of the integer division of i by 8.
& 0x1 is a bitwise AND, it converts the number before to binary form then computes the bitwise operation. (it's already in binary but it's just so you understand)
Example : 0x1101 & 0x1001 = 0x1001
Note that any number & 0x1 is either 0 or one.
Example: 0x11111111 & 0x00000001 is 0x1 and 0x11111110 & 0x00000001 is 0x0
Essentially, it tests the last bit of the number, which is the bit that determines parity (odd or even).
Final edit:
I got the precedence wrong, thanks to the comments for pointing it out. Here is the real precedence.
First, we compute i%8.
The result could be 0, 1, 2, 3, 4, 5, 6, 7.
Then, we shift the char by the result, which is maximum 7. That means the i % 8 th bit is now the least significant bit.
Then, we check if the original i % 8 bit is set (equals one) or not. If it is, return 1. Else, return 0.
This function returns the value of a specific bit in a char array as the integer 0 or 1.
addr is the pointer to the first char.
i is the index to the bit. 8 bits are commonly stored in a char.
First, the char at the correct offset is fetched:
char char_in_chain = addr [ i / 8 ] ;
i / 8 divides i by 8, ignoring the remainder. For example, any value in the range from 24 to 31 gives 3 as the result.
This result is used as the index to the char in the array.
Next and finally, the bit is obtained and returned:
return char_in_chain >> i%8 & 0x1;
Let's just look at the expression char_in_chain >> i%8 & 0x1.
It is confusing, because it does not show which operation is done in what sequence. Therefore, I duplicate it with appropriate parentheses: (char_in_chain >> (i % 8)) & 0x1. The rules (operation precedence) are given by the C standard.
First, the remainder of the division of i by 8 is calculated. This is used to right-shift the obtained char_in_chain. Now the interesting bit is in the least significant bit. Finally, this bit is "masked" with the binary AND operator and the second operand 0x1. BTW, there is no need to mark this constant as hex.
Example:
The array contains the bytes 0x5A, 0x23, and 0x42. The index of the bit to retrieve is 13.
i as given as argument is 13.
i / 8 gives 13 / 8 = 1, remainder ignored.
addr[1] returns 0x23, which is stored in char_in_chain.
i % 8 gives 5 (13 / 8 = 1, remainder 5).
0x23 is binary 0b00100011, and right-shifted by 5 gives 0b00000001.
0b00000001 ANDed with 0b00000001 gives 0b00000001.
The value returned is 1.
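The worked example above can be checked with a few lines of C. This is just a sketch: I renamed the function to get_bit and switched to unsigned char (both my own choices), but the arithmetic is the same as in the question's n_b().

```c
#include <assert.h>

/* Same logic as the question's n_b(), with explicit parentheses and
   unsigned char so the right shift never sees a negative value. */
int get_bit(const unsigned char *addr, int i)
{
    return (addr[i / 8] >> (i % 8)) & 1;
}

/* Illustrative check using the bytes from the example. */
void get_bit_example(void)
{
    const unsigned char data[3] = { 0x5A, 0x23, 0x42 };
    assert(get_bit(data, 13) == 1);  /* byte 1 = 0x23, bit 5 is set */
    assert(get_bit(data, 12) == 0);  /* byte 1 = 0x23, bit 4 is clear */
}
```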
Note: If more is not clear, feel free to comment.
What the various operators do is explained by any C book, so I won't address that here. To instead analyse the code step by step...
The function and types used:
int as return type is an indication of the programmer being inexperienced at writing hardware-related code. We should always avoid signed types for such purposes. An experienced programmer would have used an unsigned type, like for example uint8_t. (Or in this specific case maybe even bool, depending on what the data is supposed to represent.)
n_b is a rubbish name, we should obviously never give an identifier such a nondescript name. get_bit or similar would have been a better name.
char* is, again, an indication of the programmer being inexperienced. char is particularly problematic when dealing with raw data, since we can't even know if it is signed or unsigned; it depends on which compiler is used. Had the raw data contained a value of 0x80 or larger and char been signed, we would have gotten a negative value. And then right shifting a negative value is also problematic, since that behavior too is compiler-specific.
char* is proof of the programmer lacking the fundamental knowledge of const correctness. The function does not modify this parameter so it should have been const qualified. Good code would use const uint8_t* addr.
int i is not really incorrect, the signedness doesn't really matter. But good programming practice would have used an unsigned type or even size_t.
With types unsloppified and corrected, the function might look like this:
#include <stdint.h>
uint8_t get_bit (const uint8_t* addr, size_t i ) {
uint8_t char_in_chain = addr [ i / 8 ] ;
return char_in_chain >> i%8 & 0x1;
}
This is still somewhat problematic, because the average C programmer might not remember the precedence of >> vs % vs & off the top of their head. It happens to be % over >> over &, but let's make the code a bit more readable still by making precedence explicit: (char_in_chain >> (i%8)) & 0x1.
Then I would question if the local variable really adds anything to readability. Not really, we might as well write:
uint8_t get_bit (const uint8_t* addr, size_t i ) {
return ((addr[i/8]) >> (i%8)) & 0x1;
}
As for what this code actually does: this happens to be a common design pattern for how to access a specific bit in a raw bit-field.
Any bit-field in C may be accessed as an array of bytes.
Bit number n in that bit-field, will be found at byte n/8.
Inside that byte, the bit will be located at n%8.
Bit masking in C is most readably done as data & (1u << bit). Which can be obfuscated as somewhat equivalent but less readable (data >> bit) & 1u, where the masked bit ends up in the LSB.
For example, let's assume we have 64 bits of raw data. Bits are always enumerated from 0 to 63 and bytes (just like any C array) from index 0. We want to access bit 33. Integer division gives 33/8 = 4, so the bit lives in byte[4].
Inside that byte, bit 33 will be found at position 33%8 = 1. So we can obtain the value of bit 33 with ordinary bit masking: byte[33/8] & (1u << (33%8)). Or similarly, (byte[33/8] >> (33%8)) & 1u.
An alternative, more readable version of it all:
bool is_bit_set (const uint8_t* data, size_t bit)
{
uint8_t byte = data [bit / 8u];
size_t mask = 1u << (bit % 8u);
return (byte & mask) != 0u;
}
(Strictly speaking we could as well do return byte & mask; since a boolean type is used, but it doesn't hurt to be explicit.)
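To tie this back to the "bit 33 of 64 bits" example, here is a short sketch that calls is_bit_set on an 8-byte buffer. The buffer contents are made up for illustration, and the mask is declared as unsigned instead of size_t, which is a small liberty on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

bool is_bit_set(const uint8_t *data, size_t bit)
{
    uint8_t byte = data[bit / 8u];
    unsigned mask = 1u << (bit % 8u);
    return (byte & mask) != 0u;
}

/* Illustrative check: only bit 33 (byte 4, bit 1) is set. */
void is_bit_set_example(void)
{
    const uint8_t raw[8] = { 0, 0, 0, 0, 0x02, 0, 0, 0 };
    assert(is_bit_set(raw, 33));
    assert(!is_bit_set(raw, 32));
    assert(!is_bit_set(raw, 34));
}
```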

Is reading one byte at a time endianness agnostic regardless of value size?

Say I am reading and writing uint32_t values to and from a stream. If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?
In the examples here the stream is a buffer in memory called p.
static uint32_t s_read_uint32(uint8_t** p)
{
uint32_t value;
value = (*p)[0];
value |= (((uint32_t)((*p)[1])) << 8);
value |= (((uint32_t)((*p)[2])) << 16);
value |= (((uint32_t)((*p)[3])) << 24);
*p += 4;
return value;
}
static void s_write_uint32(uint8_t** p, uint32_t value)
{
(*p)[0] = value & 0xFF;
(*p)[1] = (value >> 8 ) & 0xFF;
(*p)[2] = (value >> 16) & 0xFF;
(*p)[3] = value >> 24;
*p += 4;
}
I don't currently have access to a big-endian machine to test this out, but the idea is if each byte is written one at a time each individual byte can be independently written or read from the stream. Then the CPU can handle endianness by hiding these details behind the shifting operations. Is this true, and if not could anyone please explain why not?
If I read/write one byte at a time to/from a stream and shift each byte like the below examples, will the results be consistent regardless of machine endianness?
Yes. Your s_write_uint32() function stores the bytes of the input value in order from least significant to most significant, regardless of their order in the native representation of that value. Your s_read_uint32() correctly reverses this process, regardless of the underlying representation of uint32_t. These work because
the behavior of the shift operators (<<, >>) is defined in terms of the value of the left operand, not its representation
the & 0xff masks off all bits of the left operand but those of its least-significant byte, regardless of the value's representation (because 0xff has a matching representation), and
the |= operations just put the bytes into the result; the positions are selected, appropriately, by the preceding left shift. This might be more clear if += were used instead, but the result would be no different.
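If you have no big-endian machine at hand, a round trip through a plain buffer is still a useful sanity check. The sketch below uses simplified variants of your functions (le32_store and le32_load are names I made up, and they take a plain pointer instead of advancing a uint8_t** cursor). The byte layout it asserts holds on any host, because only values are shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified writer: least significant byte first. */
void le32_store(uint8_t *p, uint32_t value)
{
    p[0] = (uint8_t)(value & 0xFFu);
    p[1] = (uint8_t)((value >> 8) & 0xFFu);
    p[2] = (uint8_t)((value >> 16) & 0xFFu);
    p[3] = (uint8_t)(value >> 24);
}

/* Hypothetical simplified reader: reverses le32_store. */
uint32_t le32_load(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Illustrative check of both the byte layout and the round trip. */
void le32_round_trip_example(void)
{
    uint8_t buf[4];
    le32_store(buf, 0x12345678u);
    assert(buf[0] == 0x78 && buf[1] == 0x56 && buf[2] == 0x34 && buf[3] == 0x12);
    assert(le32_load(buf) == 0x12345678u);
}
```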
Note, however, that to some extent, you are reinventing the wheel. POSIX defines a function pair htonl() and ntohl() -- supported also on many non-POSIX systems -- for dealing with byte-order issues in four-byte numbers. The idea is that when sending, everyone uses htonl() to convert from host byte order (whatever that is) to network byte order (big endian) and sends the resulting four-byte buffer. On receipt, everyone accepts four bytes into one number, then uses ntohl() to convert from network to host byte order.
It'll work but a memcpy followed by a conditional byteswap will give you much better codegen for the write function.
#include <stdint.h>
#include <string.h>
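#define LE (((char*)&(uint_least32_t){1})[0]) // nonzero on a little-endian host
// Reverse n bytes in place.
static void byteswap(char *b, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        char t = b[i];
        b[i] = b[n - 1 - i];
        b[n - 1 - i] = t;
    }
}
uint32_t s2_read_uint32(uint8_t** p)
{
    uint32_t value;
    memcpy(&value,*p,sizeof(value));
    if(!LE) byteswap((char*)&value,sizeof(value));
    return *p+=4, value;
}
void s2_write_uint32(uint8_t** p, uint32_t value)
{
    memcpy(*p,&value,sizeof(value));
    if(!LE) byteswap((char*)*p,sizeof(value));
    *p+=4;
}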
GCC since version 8 (but not clang) can eliminate these shifts on little-endian platforms, but you should help it by restrict-qualifying the doubly-indirect pointer to the destination; otherwise it might assume that a write to (*p)[0] can modify *p (uint8_t is a character type and is therefore permitted to alias anything).
void s_write_uint32(uint8_t** restrict p, uint32_t value)
{
(*p)[0] = value & 0xFF;
(*p)[1] = (value >> 8 ) & 0xFF;
(*p)[2] = (value >> 16) & 0xFF;
(*p)[3] = value >> 24;
*p += 4;
}

How to get the most significant bit of an unsigned 8-bit type in C

I'm trying to get the most significant bit of an unsigned 8-bit type in C.
This is what I'm trying to do right now:
uint8_t *var = ...;
...
(*var >> 6) & 1
Is this right? If it's not, what would be?
To get the most significant bit from a memory pointed to by uint8_t pointer, you need to shift by 7 bits.
(*var >> 7) & 1
The most standard/correct way of masking bits is to use a readable bit mask of the form 1u << bit. Any C programmer spotting 1u << n in code will know that it is a bit mask - so it is self-documenting code.
So if you want bit number 7, you would write
*var & (1u << 7)
The u suffix is important for rugged code, since you want to avoid accidental implicit promotions to signed types.
Another option is to simply apply a bit mask and check the resulting value:
*var & 0x80u // 1000 0000
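A quick way to see why the shift has to be 7 and not 6 is to try a value where only bit 6 is set. This is a small sketch; the function name msb_of_u8 is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if bit 7 (the MSB) is set, else 0. */
unsigned msb_of_u8(uint8_t v)
{
    return (v >> 7) & 1u;
}

/* Illustrative check, including the value that exposes the >> 6 mistake. */
void msb_of_u8_example(void)
{
    assert(msb_of_u8(0x80) == 1u);
    assert(msb_of_u8(0x7F) == 0u);
    assert(msb_of_u8(0x40) == 0u);        /* bit 6 only: MSB is clear */
    assert(((0x40 >> 6) & 1) == 1);       /* shifting by 6 tests bit 6 instead */
}
```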

Assign unsigned char to unsigned short with bit operators in ansi C

I know it is possible to assign an unsigned char to an unsigned short, but I would like to have more control how the bits are actually assigned to the unsigned short.
unsigned char UC_8;
unsigned short US_16;
UC_8 = 0xff;
US_16 = (unsigned char) UC_8;
The bits from UC_8 are now placed in the lower bits of US_16. I need more control of the conversion since the application I'm currently working on is safety related. Is it possible to control the conversion with bit operators, so that I can specify where the 8 bits from the unsigned char should be placed in the bigger 16-bit unsigned short variable?
My guess is that it would be possible with masking combined with some other bit-operator, maybe left/right shifting.
UC_8 = 0xff;
US_16 = (US_16 & 0x00ff) ?? UC_8; // Maybe masking?
I have tried different combinations but have not come up with a smart solution. I'm using ansi C and as said earlier, need more control how the bits actually are set in the larger variable.
EDIT:
My problem or concern comes from a CRC generating function. It will and should always return an unsigned short, since it will sometimes calculate a 16-bit CRC. But sometimes it should calculate an 8-bit CRC instead, and place the 8 bits in the eight LSBs of the 16-bit return variable. The eight MSBs should then contain only zeros.
I would like to say something like:
US_16(7 downto 0) = UC_8;
US_16(15 downto 8) = 0x00;
If I just typecast it, can I guarantee that the bits always will be placed on the lower bits in the larger variable? (On all different architectures)
What do you mean, "control"?
The C standard unambiguously defines the unsigned binary format in terms of bit positions and significance. Certain bits of a 16-bit variable are "low", by numerical definition, and they will hold the pattern from the 8-bit variable, the other bits being set to zero. There is no ambiguity, no wiggle room, and nothing else to control.
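For the CRC case, that guarantee can be written down as a tiny check. This is only a sketch: widen_crc8 is a name I made up, and the explicit & 0xFFu mask is redundant when char is 8 bits wide, it is there purely to make the intent visible.

```c
#include <assert.h>

/* Hypothetical helper: place an 8-bit CRC in the low byte of a
   16-bit result; the upper bits of the result are zero. */
unsigned short widen_crc8(unsigned char crc8)
{
    return (unsigned short)(crc8 & 0xFFu);
}

/* Illustrative check: low byte carries the value, high byte is zero. */
void widen_crc8_example(void)
{
    assert(widen_crc8(0xFF) == 0x00FF);
    assert((widen_crc8(0xA5) >> 8) == 0);
}
```

The conversion is defined in terms of values, so the result is the same on every architecture regardless of how the bytes are laid out in memory.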
Maybe shifting the bits will help you:
US_16 = (US_16 & 0x00ff) | ( UC_8 << 8 );
Result in bits will be:
C - UC_8 bits
S - US_16 bits
CCCC CCCC SSSS SSSS, resp.: SSSS SSSS are last 8 bits of US_16
But if UC_8 was 1 and US_16 was 0, then US_16 will be 256. Is this what you mean?
US_16 = (US_16 & 0xff00) | ( UC_8 & 0x00ff );
US_16=~-1|UC_8;
Is this what you want?
If it is important to use ansi C, and not be restricted to a particular implementation, then you should not assume sizeof(short) == 2. And why bother to cast an unsigned char to an unsigned char (the same thing)? Although probably safe to assume char is 8 bits nowadays, even though that's not guaranteed.
uint8_t UC_8;
uint16_t US_16;
int nbits = ...# of bits to shift...;
US_16 = UC_8 << nbits;
Obviously, if you shift more than 15 bits, it may not be what you want. If you need to actually rearrange the bits, rather than just shift them to some position, you'll have to set them individually
int sourcebit = ...0 to 7...;
int destinationbit = ...0 to 15...;
// set (assumes destinationbit >= sourcebit, so the shift count is not negative)
US_16 |= (UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit);
// clear
US_16 &= ~((UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit));
note: just wrote, didn't test. probably not optimal. blah blah blah. but something like that will work.
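Since the snippet above is untested, here is one possible way to copy a single bit from any source position to any destination position without computing a shift difference that could go negative. Treat it as a sketch: the function name copy_bit and its parameter order are my own invention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copy bit `sourcebit` (0..7) of src into bit
   `destinationbit` (0..15) of dst, leaving all other bits of dst alone. */
uint16_t copy_bit(uint16_t dst, uint8_t src,
                  unsigned sourcebit, unsigned destinationbit)
{
    unsigned bit = (src >> sourcebit) & 1u;             /* isolate the source bit */
    dst = (uint16_t)(dst & ~(1u << destinationbit));    /* clear the target bit */
    dst = (uint16_t)(dst | (bit << destinationbit));    /* write the new value */
    return dst;
}

/* Illustrative check in both shift directions. */
void copy_bit_example(void)
{
    assert(copy_bit(0x0000, 0x01, 0, 15) == 0x8000);  /* move a 1 upwards */
    assert(copy_bit(0x0000, 0x80, 7, 0) == 0x0001);   /* move a 1 downwards */
    assert(copy_bit(0xFFFF, 0x00, 3, 4) == 0xFFEF);   /* copy a 0, clearing bit 4 */
}
```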

bitwise indexing in C?

I'm trying to implement a data compression idea I've had, and since I'm imagining running it against a large corpus of test data, I had thought to code it in C (I mostly have experience in scripting languages like Ruby and Tcl.)
Looking through the O'Reilly 'cow' books on C, I realize that I can't simply index the bits of a simple 'char' or 'int' type variable as I'd like to to do bitwise comparisons and operators.
Am I correct in this perception? Is it reasonable for me to use an enumerated type for representing a bit (and make an array of these, and writing functions to convert to and from char)? If so, is such a type and functions defined in a standard library already somewhere? Are there other (better?) approaches? Is there some example code somewhere that someone could point me to?
Thanks -
Following on from what Kyle has said, you can use a macro to do the hard work for you.
It is possible.
To set the nth bit, use OR:
x |= (1 << 5); // sets the 6th-from-right
To clear a bit, use AND:
x &= ~(1 << 5); // clears 6th-from-right
To flip a bit, use XOR:
x ^= (1 << 5); // flips 6th-from-right
Or...
#define GetBit(var, bit) (((var) & (1 << (bit))) != 0) // Returns true / false if bit is set
#define SetBit(var, bit) ((var) |= (1 << (bit)))
#define FlipBit(var, bit) ((var) ^= (1 << (bit)))
Then you can use it in code like:
int myVar = 0;
SetBit(myVar, 5);
if (GetBit(myVar, 5))
{
// Do something
}
It is possible.
To set the nth bit, use OR:
x |= (1 << 5); // sets bit 5 (the 6th from the right)
To clear a bit, use AND:
x &= ~(1 << 5); // clears bit 5
To flip a bit, use XOR:
x ^= (1 << 5); // flips bit 5
To get the value of a bit use shift and AND:
(x & (1 << 5)) >> 5 // gets the value (0 or 1) of bit 5
note: the shift right 5 is to ensure the value is either 0 or 1. If you're just interested in 0/not 0, you can get by without the shift.
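Put together as small functions, the four operations look roughly like this. It is a sketch with made-up helper names, using unsigned operands (1u instead of 1) so that no shift ever touches a sign bit.

```c
#include <assert.h>

/* Hypothetical helpers; n is the bit index counted from the LSB (0-based). */
unsigned set_bit(unsigned x, unsigned n)   { return x |  (1u << n); }
unsigned clear_bit(unsigned x, unsigned n) { return x & ~(1u << n); }
unsigned flip_bit(unsigned x, unsigned n)  { return x ^  (1u << n); }
unsigned get_bit(unsigned x, unsigned n)   { return (x >> n) & 1u; }

/* Illustrative walk through the operations on bit 5. */
void bit_helpers_example(void)
{
    unsigned x = 0;
    x = set_bit(x, 5);            /* x is now 0x20 */
    assert(x == 0x20u);
    assert(get_bit(x, 5) == 1u);
    x = flip_bit(x, 5);           /* back to 0 */
    assert(x == 0u);
    x = clear_bit(0xFFu, 5);      /* 0xFF with bit 5 cleared is 0xDF */
    assert(x == 0xDFu);
}
```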
Have a look at the answers to this question.
Theory
There is no C syntax for accessing or setting the n-th bit of a built-in datatype (e.g. a 'char'). However, you can access bits using a bitwise AND operation, and set bits using a bitwise OR operation.
As an example, say that you have a variable that holds 1101 and you want to check the 2nd bit from the left. Simply perform a bitwise AND with 0100:
1101
0100
---- AND
0100
If the result is non-zero, then the 2nd bit must have been set; otherwise it was not set.
If you want to set the 3rd bit from the left, then perform a bitwise OR with 0010:
1101
0010
---- OR
1111
You can use the C operators & (bitwise AND) and | (bitwise OR) to perform these tasks. Note that && and || are the logical operators; they only test whether their operands are non-zero and do not work on individual bits. You will need to construct the bit access patterns (the 0100 and 0010 in the above examples) yourself. The trick is to remember that the least significant bit (LSB) counts 1s, the next LSB counts 2s, then 4s etc. So, the bit access pattern for the n-th LSB (starting at 0) is simply the value of 2^n. The easiest way to compute this in C is to shift the binary value 0001 (in this four bit example) to the left by the required number of places. As this value is always equal to 1 in unsigned integer-like quantities, this is just '1 << n'
Example
unsigned char myVal = 0x65; /* in hex; this is 01100101 in binary. */
/* Q: is the 3-rd least significant bit set (again, the LSB is the 0th bit)? */
unsigned char pattern = 1;
pattern <<= 3; /* Shift pattern left by three places.*/
if (myVal & pattern) {printf("Yes!\n");} /* Perform the test; bit 3 of 0x65 is clear, so nothing is printed. */
/* Set the most significant bit. */
myVal |= (unsigned char)(1u << 7);
This example hasn't been tested, but should serve to illustrate the general idea.
To query state of bit with specific index:
int index_state = variable & ( 1 << bit_index );
To set bit:
variable |= 1 << bit_index;
To reset (clear) a bit:
variable &= ~( 1 << bit_index );
Try using bitfields. Be careful the implementation can vary by compiler.
http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html
If you want to index a bit, you could write:
bit = (c & 0x80) >> 7;
which gets the MSB of a char variable c (assuming an 8-bit char). You could even leave out the right shift and test against 0:
bit = c & 0x80;
If the bit is set, the result will be non-zero.
Obviously, you need to change the mask to get different bits (NB: 0x80 is the bit mask, in case that is unclear). It is possible to define numerous masks, e.g.
#define BIT_0 0x1 // or 1 << 0
#define BIT_1 0x2 // or 1 << 1
#define BIT_2 0x4 // or 1 << 2
#define BIT_3 0x8 // or 1 << 3
etc...
This gives you:
bit = c & BIT_1;
You can use these definitions in the above code to successfully index a bit within either a macro or a function.
To set a bit:
c |= BIT_2;
To clear a bit:
c &= ~BIT_3;
To toggle a bit:
c ^= BIT_4;
This help?
Individual bits can be indexed as follows.
Define a struct like this one:
struct
{
unsigned bit0 : 1;
unsigned bit1 : 1;
unsigned bit2 : 1;
unsigned bit3 : 1;
unsigned reserved : 28;
} bitPattern;
Now, to read the individual bit values of a variable named "value", copy it into the struct:
CopyMemory( &bitPattern, &value, sizeof(value) );
To see if bit 2 is high or low:
int state = bitPattern.bit2;
Hope this helps.
There is a standard library container for bits: std::vector<bool>. It is specialised in the library to be space efficient. There is also a boost dynamic_bitset class.
These will let you perform operations on a set of boolean values, using one bit per value of underlying storage.
Boost dynamic bitset documentation
For the STL documentation, see your compiler documentation.
Of course, you can also address the individual bits in other integral types by hand. If you do that, you should use unsigned types so that you don't get undefined behaviour if decide to do a right shift on a value with the high bit set. However, it sounds like you want the containers.
To the commenter who claimed this takes 32x more space than necessary: boost::dynamic_bitset and std::vector<bool> are specialised to use one bit per entry, and so there is not a space penalty, assuming that you actually want more than the number of bits in a primitive type. These classes allow you to address individual bits in a large container with efficient underlying storage. If you just want (say) 32 bits, by all means, use an int. If you want some large number of bits, you can use a library container.
