I am writing a socket client-server application where the server needs to send a large buffer to a client and all buffers should be processed separately, so I want to put the buffer length in the buffer so that the client can read the length of data from the buffer and process accordingly.
To store the length, I need to split an integer into individual bytes and store them in a buffer to be sent over the socket. I am able to break the integer into four parts, but when joining them I cannot retrieve the correct value. To demonstrate my problem I have written a sample program that splits an int into four char variables and then joins them back into another integer. The goal is that after joining I should get the same value.
Here is my small program.
#include <stdio.h>
int main ()
{
int inVal = 0, outVal =0;
char buf[5] = {0};
inVal = 67502978;
printf ("inVal: %d\n", inVal);
buf[0] = inVal & 0xff;
buf[1] = (inVal >> 8) & 0xff;
buf[2] = (inVal >> 16) & 0xff;
buf[3] = (inVal >> 24) & 0xff;
outVal = buf[3];
outVal = outVal << 8;
outVal |= buf[2];
outVal = outVal << 8;
outVal |= buf[1];
outVal = outVal << 8;
outVal |= buf[0];
printf ("outVal: %d\n",outVal);
return 0;
}
Output
inVal: 67502978
outVal: -126
What am I doing wrong?
One problem is that you are using bit-wise operators on signed numbers. This is always a bad idea and almost always incorrect. Please note that char has implementation-defined signedness, unlike int which is always signed.
Therefore you should replace int with uint32_t and char with uint8_t (both declared in <stdint.h>). With such unsigned types you eliminate the possibility of using bit shifts on negative numbers, which would be a bug. Similarly, if you shift data into the sign bits of a signed number, you will get bugs.
And needless to say, the code will not work if integers are not 4 bytes large.
Your method has potential implementation defined behavior as well as undefined behavior:
storing values into the array of type char beyond the range of type char has implementation defined behavior: buf[0] = inVal & 0xff; and the next 3 statements (inVal & 0xff might be larger than CHAR_MAX if char type is signed by default).
left shifting negative values invokes undefined behavior: if buf[3], buf[2], or buf[1] becomes negative as the implementation defined result of storing a value larger than CHAR_MAX into it, outVal becomes negative, and left shifting it is undefined.
In your specific example, your architecture uses 2's complement representation for negative values and the type char is signed. The value stored into buf[0] is 67502978 & 0xff = 130, which becomes -126. The last statement outVal |= buf[0]; sets bits 7 through 31 of outVal and the result is -126.
You can avoid these issues by using an array of unsigned char and values of type unsigned int:
#include <stdio.h>
int main(void) {
unsigned int inVal = 0, outVal = 0;
unsigned char buf[4] = { 0 };
inVal = 67502978;
printf("inVal: %u\n", inVal);
buf[0] = inVal & 0xff;
buf[1] = (inVal >> 8) & 0xff;
buf[2] = (inVal >> 16) & 0xff;
buf[3] = (inVal >> 24) & 0xff;
outVal = buf[3];
outVal <<= 8;
outVal |= buf[2];
outVal <<= 8;
outVal |= buf[1];
outVal <<= 8;
outVal |= buf[0];
printf("outVal: %u\n", outVal);
return 0;
}
Note that the above code still assumes 32-bit ints.
While bit shifts of signed values can be a problem, this is not the case here (all left hand values are positive, and all results are within the range of a 32 bit unsigned int).
The problematic expression with somewhat unintuitive semantics is the last bitwise OR:
outVal |= buf[0];
buf[0] is a (on your and my architecture) signed char with the value -126, simply because the most significant bit in the least significant byte of 67502978 is set. In C all operands in an arithmetic expression are subject to the arithmetic conversions. Specifically, they undergo integer promotion which states: "If an int can represent all values of the original type [...], the value is converted to an int". Accordingly, the signed character buf[0] is converted to a (signed) int, preserving its value of -126. A negative signed int has the sign bit set. ORing that with another signed int sets the result's sign bit as well, making that value negative. That is exactly what we are seeing.
Making the bytes unsigned chars fixes the issue because the value of the temporary integer to which the unsigned char is converted is then a simple 8 bit value of 130.
The C++ standard draft N3936 says the following about shift operators:
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are zero-filled.
If E1 has an unsigned type,
the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value
representable in the result type.
Otherwise, if E1 has a signed type and non-negative value,
and E1 × 2^E2 is representable in the
corresponding unsigned type of the result type, then that value,
converted to the result type, is the resulting value; otherwise, the
behavior is undefined.
So, to avoid undefined behaviour, it is recommended to use unsigned data types, and to ensure the data type is at least 32 bits wide.
Use unsigned char buf[5] = {0}; and unsigned int for inVal and outVal, and it should work.
When using signed integral types, there arise two sorts of problems:
First, if buf[3] is negative, then due to outVal = buf[3] variable outVal becomes negative; subsequent bit shift operations on outVal are then undefined behaviour. From cppreference.com concerning bit shift operators:
For signed and positive a, the value of a << b is a * 2^b if it is
representable in the return type, otherwise the behavior is
undefined. (until C++14), the value of a << b is a * 2^b if it is
representable in the unsigned version of the return type (which is
then converted to signed: this makes it legal to create INT_MIN as
1<<31), otherwise the behavior is undefined. (since C++14)
For negative a, the behavior of a << b is undefined.
Note that with OP's inVal = 67502978 this does not occur, since buf[3]=4; But for other inVals it may occur and then may bring problems due to "undefined behaviour".
The second problem is that with operation outVal |= buf[0] with buf[0]=-126, the value (char)-126, which in binary format is 10000010, is converted to (int)-126, which in binary format is 11111111111111111111111110000010 before operator |= is applied, and this then will fill up outVal with a lot of 1-bits. The reason for conversion is defined at conversion rules for arithmetic operations (cppreference.com):
If both operands are signed or both are unsigned, the operand with
lesser conversion rank is converted to the operand with the greater
integer conversion rank
So the problem in OP's case is actually not because of any undefined behaviour, but because character buf[0] has a negative value, which is converted to int before the |= operation.
Note, however, that if either buf[2] or buf[1] had been negative, this would have made outVal negative and would have led to undefined behaviour on subsequent shift operations, too.
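This may be a terrible idea but I'll post it here for interest - you can use a union. Note that which struct member ends up holding which byte of the integer depends on the host's endianness: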
union my_data
{
uint32_t one_int;
struct
{
uint8_t byte3;
uint8_t byte2;
uint8_t byte1;
uint8_t byte0;
}bytes;
};
// Your original code modified to use union my_data
// (the union definition above must come after these includes)
#include <stdint.h>
#include <stdio.h>
int main(void) {
union my_data data;
uint32_t inVal = 0, outVal = 0;
uint8_t buf[4] = {0};
inVal = 67502978;
printf("inVal: %u\n", inVal);
data.one_int = inVal;
// Populate bytes into buff
buf[3] = data.bytes.byte3;
buf[2] = data.bytes.byte2;
buf[1] = data.bytes.byte1;
buf[0] = data.bytes.byte0;
return 0;
}
I don't know if this would also work, can't see why not:
union my_data
{
uint32_t one_int;
uint8_t bytes[4];
};
Because of endian differences between architectures, it is best practice to convert numeric values to network order, which is big-endian. On receipt, they can then be converted to the native host order. We can do this in a portable way by using htonl() (host to network "long" = uint32_t), and convert to host order on receipt with ntohl(). Example:
#include <stdio.h>
#include <arpa/inet.h>
int main(int argc, char **argv) {
uint32_t inval = 67502978, outval, backinval;
outval = htonl(inval);
printf("outval: %d\n", outval);
backinval = ntohl(outval);
printf("backinval: %d\n", backinval);
return 0;
}
This gives the following result on my 64 bit x86 which is little endian:
$ gcc -Wall example.c
$ ./a.out
outval: -2113731068
backinval: 67502978
$
I'm seeing strange behavior when I try to apply a right bit-shift within a variable declaration/assignment:
unsigned int i = ~0 >> 1;
The result I'm getting is 0xffffffff, as if the >> 1 simply wasn't there. It seems to be something about the ~0, because if I instead do:
unsigned int i = 0xffffffff >> 1;
I get 0x7fffffff as expected. I thought I might be tripping over an operator precedence issue, so tried:
unsigned int i = (~0) >> 1;
but it made no difference. I could just perform the shift in a separate statement, like
unsigned int i = ~0;
i >>= 1;
but I'd like to know what's going on.
Update: Thanks to merlin2011 for pointing me towards an answer. It turns out it was performing an arithmetic shift because it was interpreting ~0 as a signed (negative) value. The simplest fix seems to be:
unsigned int i = ~0u >> 1;
Now I'm wondering why 0xffffffff wasn't also interpreted as a signed value.
That is how the C compiler treats signed values. An unsuffixed integer literal such as 0 has type int (on a typical 32-bit machine, a 32-bit signed int).
You may want to change it to:
unsigned int i = ~(unsigned int)0 >> 1;
The reason is that, for a negative signed value, the compiler treats the operator >> as an arithmetic shift (or signed shift). Strictly speaking this is implementation-defined, but it is what most compilers do.
Or, more shortly (pointed out by M.M),
unsigned int i = ~0u >> 1;
Test:
printf("%x", i);
Result (with a 32-bit unsigned int):
7fffffff
In unsigned int i = ~0;, ~0 is seen as a signed integer (the compiler should warn about that).
Try this instead:
unsigned int i = (unsigned int)~0 >> 1;
I wanted to try to get only the four bits from the right in a byte by using only bit shift operations but it sometimes worked and sometimes not, but I don't understand why.
Here's an example:
unsigned char b = foo; //say foo is 1000 1010
unsigned char temp=0u;
temp |= ((b << 4) >> 4);//I want this to be 00001010
PS: I know I can use a mask of 0x0F and do temp = b & mask.
The shift operators only work on integral types. Using << causes implicit integral promotion, converting b to an int and "protecting" the higher bits.
To solve, use temp = ((unsigned char)(b << 4)) >> 4;
I want to convert an unsigned int and break it into 2 chars. For example: If the integer is 1, its binary representation would be 0000 0001. I want the 0000 part in one char variable and the 0001 part in another binary variable. How do I achieve this in C?
If you insist that you have a sizeof(int)==2 then:
unsigned int x = (unsigned int)2; //or any other value it happens to be
unsigned char high = (unsigned char)(x>>8);
unsigned char low = x & 0xff;
If you have eight bits total (one byte) and you are breaking it into two 4-bit values:
unsigned char x=2;// or whatever
unsigned char high = (x>>4);
unsigned char low = x & 0xf;
Shift and mask off the part of the number you want. Unsigned ints are probably four bytes, and if you wanted all four bytes, you'd just shift by 16 and 24 for the higher order bytes.
unsigned char low = myuint & 0xff;
unsigned char high = (myuint >> 8) & 0xff;
This assumes 16-bit ints; check with sizeof! On my platform ints are 32 bits, so I will use a short in this code example. Mine wins the award for most disgusting in terms of pulling apart the pointer, but it is also the clearest for me to understand.
unsigned short number = 1;
unsigned char a;
a = *((unsigned char*)(&number)); // Grab char from first byte of the pointer to the int
unsigned char b;
b = *((unsigned char*)(&number) + 1); // Offset one byte from the pointer and grab second char
One method that works is as follows:
typedef union
{
unsigned char c[sizeof(int)];
int i;
} intchar__t;
intchar__t x;
x.i = 2;
Now x.c[] (an array) will reference the integer as a series of characters, although you will have byte endian issues. Those can be addressed with appropriate #define values for the platform you are programming on. This is similar to the answer that Justin Meiners provided, but a bit cleaner.
unsigned short s = 0xFFEE;
unsigned char b1 = (s >> 8)&0xFF;
unsigned char b2 = (((s << 8)>> 8) & 0xFF);
Simplest I could think of.
int i = 1; // 2-byte integer value 0x0001
unsigned char byteLow = (i & 0x00FF);
unsigned char byteHigh = ((i & 0xFF00) >> 8);
value in byteLow is 0x01 and value in byteHigh is 0x00
I have a char array that is really used as a byte array and not for storing text. In the array, there are two specific bytes that represent a numeric value that I need to store into an unsigned int value. The code below explains the setup.
char bytes[2];
bytes[0] = 0x0C; // For the sake of this example, I'm
bytes[1] = 0x88; // assigning random values to the char array.
unsigned int val = ???; // This needs to be the actual numeric
// value of the two bytes in the char array.
// In other words, the value should equal 0x0C88;
I can not figure out how to do this. I would assume it would involve some casting and recasting of the pointers, but I can not get this to work. How can I accomplish my end goal?
UPDATE
Thank you Martin B for the quick response, however this doesn't work. Specifically, in my case the two bytes are 0x00 and 0xbc. Obviously what I want is 0x000000bc. But what I'm getting in my unsigned int is 0xffffffbc.
The code that was posted by Martin was my actual, original code and works fine so long as all of the bytes are less than 128 (i.e. positive signed char values).
unsigned int val = (unsigned char)bytes[0] << CHAR_BIT | (unsigned char)bytes[1];
This works if sizeof(unsigned int) >= 2 * sizeof(unsigned char) (not something guaranteed by the C standard).
Now... The interesting thing here is the operator precedence (after many years I can still remember only +, -, * and /... shame on me :-), so I always add as many brackets as I can). [] is king. Second is the (cast). Third is << and fourth is | (if you use + instead of |, remember that + has higher precedence than <<, so you'll need brackets).
We don't need to upcast the two (unsigned char) values to (unsigned int): integral promotion does it for us for one, and for the other it should be an automatic arithmetic conversion.
I'll add that if you want less headaches:
unsigned int val = (unsigned char)bytes[0] << CHAR_BIT;
val |= (unsigned char)bytes[1];
unsigned int val = (unsigned char) bytes[0]<<8 | (unsigned char) bytes[1];
The byte ordering depends on the endianness of your processor. You can do this, which will work on big- or little-endian machines (without ntohs it would only work on big-endian; ntohs is declared in <arpa/inet.h>):
unsigned int val = ntohs(*(uint16_t*)bytes);
unsigned int val = ((unsigned char)bytes[0] << 8) + (unsigned char)bytes[1];
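I think this is a better way to go about it than relying on pointer aliasing, although the result still depends on the host's byte order (on a little-endian machine the code below yields 0x880C rather than 0x0C88):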
union {unsigned asInt; char asChars[2];} conversion;
conversion.asInt = 0;
conversion.asChars[0] = 0x0C;
conversion.asChars[1] = 0x88;
unsigned val = conversion.asInt;