Convert a uint16_t to char[2] to be sent over socket (unix) - c

I know that there are things out there roughly on this, but my brain's hurting and I can't find anything to make this work.
I am trying to send a 16-bit unsigned integer over a Unix socket. To do so I need to convert a uint16_t into two chars, then read them in on the other end of the connection and convert them back into either an unsigned int or a uint16_t; at that point it doesn't matter if it uses 2 bytes or 4 bytes (I'm running 64-bit, that's why I can't use unsigned int :)
I'm doing this in C, btw.

Why not just break it up into bytes with mask and shift?
uint16_t value = 12345;
char lo = value & 0xFF;
char hi = value >> 8;
On the other end, you assemble with the reverse:
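uint16_t value = (unsigned char)lo | (uint16_t)(unsigned char)hi << 8;
Off the top of my head, not sure if the uint16_t cast is required. The casts to unsigned char do matter where plain char is signed, since a byte of 0x80 or above would otherwise be sign-extended.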
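To make the round trip concrete, here is a minimal sketch of the mask-and-shift approach. The helper names u16_to_bytes and bytes_to_u16 are made up for illustration, and unsigned char is used for the bytes so that sign extension cannot interfere.

```c
#include <stdint.h>

/* Split a 16-bit value into two bytes, low byte first. */
void u16_to_bytes(uint16_t value, unsigned char out[2])
{
    out[0] = (unsigned char)(value & 0xFF);        /* low 8 bits  */
    out[1] = (unsigned char)((value >> 8) & 0xFF); /* high 8 bits */
}

/* Rebuild the 16-bit value from the same two bytes. */
uint16_t bytes_to_u16(const unsigned char in[2])
{
    return (uint16_t)(in[0] | ((uint16_t)in[1] << 8));
}
```

The sender would fill a 2-byte buffer with u16_to_bytes and write it to the socket; the receiver reads 2 bytes and passes them to bytes_to_u16. Because the byte order is fixed by the code rather than by the CPU, both ends agree regardless of their endianness.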

char* pUint16 = (char*)&u16;
ie Cast the address of the uint16_t.
char c16[2];
uint16_t ui16 = 0xdead;
memcpy( c16, &ui16, 2 );
c16 now contains the 2 bytes of ui16. At the far end you can simply reverse the process.
char* pC16 = /*blah*/;
uint16_t ui16;
memcpy( &ui16, pC16, 2 );
Interestingly, although there is a call to memcpy, nearly every compiler will optimise it out because it's of a fixed size.
As Steven sudt points out, you may get problems with endianness. To get round this you can use the htons (host-to-network short) function.
uint16_t ui16correct = htons( 0xdead );
and at the far end use ntohs (network-to-host short)
uint16_t ui16correct = ntohs( ui16 );
On a little-endian machine this will convert the short to big-endian and then at the far end convert back from big-endian. On a big-endian machine the 2 functions do nothing.
Of course if you know that the architecture of both machines on the network use the same endian-ness then you can avoid this step.
Look up ntohl and htonl for handling 32-bit integers. Some platforms also provide ntohll and htonll for 64 bits as well.
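Putting the memcpy and htons/ntohs pieces together might look roughly like this. It is only a sketch: put_u16_net and get_u16_net are hypothetical helper names, and it assumes a POSIX system where <arpa/inet.h> is available.

```c
#include <arpa/inet.h> /* htons, ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store value in buf in network (big-endian) byte order. */
void put_u16_net(unsigned char buf[2], uint16_t value)
{
    uint16_t net = htons(value);   /* host order -> network order */
    memcpy(buf, &net, sizeof net); /* copy the 2 raw bytes out    */
}

/* Read a value that was stored by put_u16_net. */
uint16_t get_u16_net(const unsigned char buf[2])
{
    uint16_t net;
    memcpy(&net, buf, sizeof net); /* copy the 2 raw bytes in     */
    return ntohs(net);             /* network order -> host order */
}
```

Whatever the host's endianness, buf[0] ends up holding the high byte, so 0xdead goes on the wire as 0xde followed by 0xad.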

Sounds like you need to use the bit mask and shift operators.
To split up a 16-bit number into two 8-bit numbers:
you mask the lower 8 bits using the bitwise AND operator (& in C) so that the upper 8 bits all become 0, and then assign that result to one char.
you shift the upper 8 bits to the right using the right shift operator (>> in C) so that the lower 8 bits are all pushed out of the integer, leaving only the top 8 bits, and assign that to another char.
Then when you receive these two chars at the other end of the connection, you do the reverse: you shift what used to be the top 8 bits to the left by 8 bits, and then use bitwise OR to combine that with the other 8 bits.

Basically you are sending 2 bytes over the socket, that's all the socket needs to know, regardless of endianness, signedness and so on... just decompose your uint16 into 2 bytes and send them over the socket.
char byte0 = u16 & 0xFF;
char byte1 = u16 >> 8;
At the other end do the conversion in the opposite way


Converting 32 bit number to four 8bit numbers

I am trying to convert the input from a device (always integer between 1 and 600000) to four 8-bit integers.
For example,
If the input is 32700, I want 188 127 00 00.
I achieved this by using:
32700 % 256
32700 / 256
The above works till 32700. From 32800 onward, I start getting incorrect conversions.
I am totally new to this and would like some help to understand how this can be done properly.
Major edit following clarifications:
Given that someone has already mentioned the shift-and-mask approach (which is undeniably the right one), I'll give another approach, which, to be pedantic, is not portable, machine-dependent, and possibly exhibits undefined behavior. It is nevertheless a good learning exercise, IMO.
For various reasons, your computer represents integers as groups of 8-bit values (called bytes); note that, although extremely common, this is not always the case (see CHAR_BIT). For this reason, values that are represented using more than 8 bits use multiple bytes (hence those using a number of bits which is a multiple of 8). For a 32-bit value, you use 4 bytes and, in memory, those bytes always follow each other.
We call a pointer a value containing the address in memory of another value. In that context, a byte is defined as the smallest (in terms of bit count) value that can be referred to by a pointer. For example, your 32-bit value, covering 4 bytes, will have 4 "addressable" cells (one per byte) and its address is defined as the first of those addresses:
| ... | x-1 | <== Pointer to byte before
| BYTE 0 | x | <== Pointer to first byte (also pointer to 32-bit value)
| BYTE 1 | x+1 | <== Pointer to second byte
| BYTE 2 | x+2 | <== Pointer to third byte
| BYTE 3 | x+3 | <== Pointer to fourth byte
| ... | x+4 | <== Pointer to byte after
So what you want to do (split the 32-bit word into 8-bits word) has already been done by your computer, as it is imposed onto it by its processor and/or memory architecture. To reap the benefits of this almost-coincidence, we are going to find where your 32-bit value is stored and read its memory byte-by-byte (instead of 32 bits at a time).
As all serious SO answers seem to do so, let me cite the Standard (ISO/IEC 9899:2018, 6.2.5-20) to define the last thing I need (emphasis mine):
Any number of derived types can be constructed from the object and function types, as follows:
An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. [...] Array types are characterized by their element type and by the number of elements in the array. [...]
So, as elements in an array are defined to be contiguous, a 32-bit value in memory, on a machine with 8-bit bytes, really is nothing more, in its machine representation, than an array of 4 bytes!
Given a 32-bit signed value:
int32_t value;
its address is given by &value. Meanwhile, an array of 4 8-bit bytes may be represented by:
uint8_t arr[4];
notice that I use the unsigned variant because those bytes don't really represent a number per se so interpreting them as "signed" would not make sense. Now, a pointer-to-array-of-4-uint8_t is defined as:
uint8_t (*ptr)[4];
and if I assign the address of our 32-bit value to such an array, I will be able to index each byte individually, which means that I will be reading the byte directly, avoiding any pesky shifting-and-masking operations!
uint8_t (*bytes)[4] = (void *) &value;
I need to cast the pointer ("(void *)") because &value's type is "pointer-to-int32_t" while I'm assigning it to a "pointer-to-array-of-4-uint8_t"; this type mismatch is caught by the compiler and pedantically warned against by the Standard. This is a first warning that what we're doing is not ideal!
Finally, we can access each byte individually by reading it directly from memory through indexing: (*bytes)[n] reads the n-th byte of value!
To put it all together, given a send_can(uint8_t) function:
for (size_t i = 0; i < sizeof(*bytes); i++)
    send_can((*bytes)[i]);
and, for testing purposes, we define:
void send_can(uint8_t b)
{
    printf("%hhu\n", b);
}
which prints, on my (little-endian) machine, when value is 32700:
188
127
0
0
Lastly, this shows yet another reason why this method is platform-dependent: the order in which the bytes of the 32-bit word is stored isn't always what you would expect from a theoretical discussion of binary representation i.e:
byte 0 contains bits 31-24
byte 1 contains bits 23-16
byte 2 contains bits 15-8
byte 3 contains bits 7-0
actually, AFAIK, the C Language permits any of the 24 possibilities for ordering those 4 bytes (this is called endianness). Meanwhile, shifting and masking will always get you the n-th "logical" byte.
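To see the difference between the "logical" byte and the byte as stored in memory, a small sketch along these lines can help. The names logical_byte and memory_byte are invented for this example; reading the object through an unsigned char pointer is used because that kind of access is always permitted in C.

```c
#include <stddef.h>
#include <stdint.h>

/* n-th "logical" byte (0 = least significant); same result on every machine. */
uint8_t logical_byte(uint32_t value, unsigned n)
{
    return (uint8_t)((value >> (8 * n)) & 0xFF);
}

/* n-th byte as stored in memory; the order depends on the machine's endianness. */
uint8_t memory_byte(const uint32_t *value, size_t n)
{
    const unsigned char *p = (const unsigned char *)value; /* char access is always allowed */
    return p[n];
}
```

For 32700, logical_byte gives 188, 127, 0, 0 for n = 0..3 everywhere, whereas memory_byte gives that sequence only on a little-endian machine and the reverse on a big-endian one.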
It really depends on how your architecture stores an int. For example
8 or 16 bit system short=16, int=16, long=32
32 bit system, short=16, int=32, long=32
64 bit system, short=16, int=32, long=64
This is not a hard and fast rule - you need to check your architecture first. There is also a long long but some compilers do not recognize it and the size varies according to architecture.
Some compilers have uint8_t etc defined so you can actually specify how many bits your number is instead of worrying about ints and longs.
Having said that you wish to convert a number into 4 8 bit ints. You could have something like
unsigned long x = 600000UL; // you need UL to indicate it is unsigned long
unsigned int b1 = (unsigned int)(x & 0xff);
unsigned int b2 = (unsigned int)(x >> 8) & 0xff;
unsigned int b3 = (unsigned int)(x >> 16) & 0xff;
unsigned int b4 = (unsigned int)(x >> 24);
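Shifts are typically at least as fast as multiplication, division or mod (and compilers usually turn unsigned division by a power of two into a shift anyway). Which byte you treat as the first one depends on the endianness you wish to achieve: you could reverse the assignments, using the formula for b4 with b1, etc.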
You could do some bit masking.
600000 is 0x927C0
600000 / (256 * 256) gets you the 9, no masking yet.
(600000 & (255 * 256)) >> 8 gets you the 0x27 == 39. This uses an 8-bit-shifted mask of 8 set bits (256 * 255) and a right shift by 8 bits, the >> 8, which would also be possible as a / 256.
600000 % 256 gets you the 0xC0 == 192 as you did it. Masking would be 600000 & 255.
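The arithmetic and the bitwise versions give the same bytes for unsigned values, which a short sketch can confirm. The function names here are made up for the example.

```c
#include <stdint.h>

/* Byte n (0 = least significant) via division and remainder. */
uint8_t byte_by_division(uint32_t v, unsigned n)
{
    while (n-- > 0)
        v /= 256;              /* drop one low byte per step */
    return (uint8_t)(v % 256); /* keep the lowest remaining byte */
}

/* The same byte via shift and mask. */
uint8_t byte_by_shift(uint32_t v, unsigned n)
{
    return (uint8_t)((v >> (8 * n)) & 0xFF);
}
```

For 600000 (0x927C0) both functions return 192, 39, 9 and 0 for n = 0, 1, 2, 3.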
I ended up doing this:
unsigned char bytes[4];
unsigned long n;
n = (unsigned long) sensore1 * 100;
bytes[0] = n & 0xFF;
bytes[1] = (n >> 8) & 0xFF;
bytes[2] = (n >> 16) & 0xFF;
bytes[3] = (n >> 24) & 0xFF;
I have been in a similar kind of situation while packing and unpacking huge custom packets of data to be transmitted/received, I suggest you try below approach:
union
{
    uint32_t u4_input;
    uint8_t u1_byte_arr[4];
} un_t_mode_reg;
un_t_mode_reg.u4_input = input; /* your 32 bit input */
// which byte comes first in memory depends on the machine's endianness
// 1st byte = un_t_mode_reg.u1_byte_arr[0];
// 2nd byte = un_t_mode_reg.u1_byte_arr[1];
// 3rd byte = un_t_mode_reg.u1_byte_arr[2];
// 4th byte = un_t_mode_reg.u1_byte_arr[3];
The largest positive value you can store in a 16-bit signed int is 32767. If you force a number bigger than that, you'll get a negative number as a result, hence unexpected values returned by % and /.
Use either unsigned 16-bit int for a range up to 65535 or a 32-bit integer type.
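A rough way to reproduce the failure on a desktop machine is to force the value through a 16-bit signed type, which is what effectively happens where int is 16 bits. This is only a sketch with invented function names; the out-of-range conversion to int16_t is implementation-defined, and the comments assume the usual wrap-around behaviour.

```c
#include <stdint.h>

/* Low byte computed through a 16-bit signed intermediate. */
int low_byte_signed16(long input)
{
    int16_t v = (int16_t)input; /* out of range above 32767; wraps to a negative value on common compilers */
    return v % 256;             /* remainder of a negative number is negative (or zero) */
}

/* Low byte computed through a 32-bit unsigned intermediate. */
int low_byte_unsigned32(long input)
{
    uint32_t v = (uint32_t)input;
    return (int)(v % 256);
}
```

With 32700 both functions return 188. With 32800 the unsigned version returns 32 (32800 is 0x8020), while the signed 16-bit version returns a negative number.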

Why would someone bitwise AND an 8-bit value with a 16-bit mask in C?

I am trying to replicate Javidx9's NES/MOS6502 CPU code in C# as an academic exercise and I am having trouble understanding the logic behind the implementation of the Zero-Page Addressing Mode. Specifically, I am looking at this code:
// Address Mode: Zero Page
// To save program bytes, zero page addressing allows you to absolutely address
// a location in first 0xFF bytes of address range. Clearly this only requires
// one byte instead of the usual two.
uint8_t olc6502::ZP0()
{
    addr_abs = read(pc);
    addr_abs &= 0x00FF;
    return 0;
}
I struggle to understand why addr_abs &= 0x00FF; is there. uint16_t addr_abs is 16 bits, but uint8_t read(uint16_t a); returns an 8-bit value anyway, so wouldn't the upper 8 bits (MOS6502 is little-endian) be zeroed out by default? Am I missing something about how the C compiler/x86 ISA works?
You're correct: addr_abs &= 0x00ff isn't needed.
With uint16_t x = n, where n is an unsigned 8-bit number (which is the case here), x would have its upper 8 bits cleared. As @tadman stated, there might have been a different method used previously to store the value into addr_abs which didn't clear the upper 8 bits.
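The zero-extension can be checked with a tiny sketch in plain C. The function widen is a made-up name; it just performs the same kind of assignment as addr_abs = read(pc).

```c
#include <stdint.h>

/* Assign an unsigned 8-bit value to a 16-bit variable. */
uint16_t widen(uint8_t n)
{
    uint16_t x = n; /* value-preserving conversion: the upper 8 bits are zero */
    return x;
}
```

For any n, widen(n) & 0xFF00 is 0, so a following &= 0x00FF changes nothing.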

c Code that reads a 4 byte little endian number from a buffer

I encountered this piece of C code that's existing. I am struggling to understand it.
It supposedly reads a 4 byte unsigned value passed in a buffer (in little endian format) into a variable of type "long".
This code runs on a 64 bit word size, little endian x86 machine - where sizeof(long) is 8 bytes.
My guess is that this code is intended to also run on a 32 bit x86 machine - so a variable of type long is used instead of int for sake of storing value from a four byte input data.
I am having some doubts and have put comments in the code to express what I understand, or what I don't :-)
Please answer questions below in that context
void read_Value_From_Four_Byte_Buff( char*input)
{
    /* use long so on 32 bit machine, can still accommodate 4 bytes */
    long intValueOfInput;
    /* Bitwise and of input buffer's byte 0 with 0xFF gives MSB or LSB ?*/
    /* This code seems to assume that assignment will store in rightmost byte - is that true on a x86 machine ?*/
    intValueOfInput = 0xFF & input[0];
    /*left shift byte-1 eight times, bitwise "or" places in 2nd byte from right*/
    intValueOfInput |= ((0xFF & input[1]) << 8);
    /* similar left shift in mult. of 8 and bitwise "or" for next two bytes */
    intValueOfInput |= ((0xFF & input[2]) << 16);
    intValueOfInput |= ((0xFF & input[3]) << 24);
}
My questions
1) The input buffer is expected to be in "Little endian". But from code looks like assumption here is that it read in as Byte 0 = MSB, Byte 1, Byte 2, Byte 3= LSB. I thought so because code reads bytes starting from Byte 0, and subsequent bytes ( 1 onwards) are placed in the target variable after left shifting. Is that how it is or am I getting it wrong ?
2) I feel this is a convoluted way of doing things - is there a simpler alternative to copy value from 4 byte buffer into a long variable ?
3) Will the assumption "that this code will run on a 64 bit machine" have any bearing on how easily I can do this alternatively? I mean, is all this trouble to keep it agnostic to word size (I assume it's agnostic to word size now - not sure though)?
Thanks for your enlightenment :-)
1. You have it backwards. When you left shift, you're putting into more significant bits. So ((0xFF & input[3]) << 24) puts Byte 3 into the MSB.
2. This is the way to do it in standard C. POSIX has the function ntohl(), which converts a 32-bit value from network byte order (big-endian) to the host's native order, so it is usually used in Unix/Linux applications; note that the buffer here is little-endian, so ntohl() alone would not do the job.
3. This will not work exactly the same on a 64-bit machine, unless the shift is done in an unsigned type. As currently written, the highest bit of input[3] will be put into the sign bit of the result (assuming a twos-complement machine), so you can get negative results. If long is 64 bits, the results would all be positive only if the byte were widened to long (or an unsigned type) before shifting; the expression as written is evaluated in int first, so the sign bit can still end up extended into the long.
1. The code you are using does indeed treat the input buffer as little endian. Look how it takes the first byte of the buffer and just assigns it to the variable without any shifting. If the first byte increases by 1, the value of your result increases by 1, so it is the least-significant byte (LSB). Left-shifting makes a byte more significant, not less. Left-shifting by 8 is generally the same as multiplying by 256.
2. I don't think you can get much simpler than this unless you use an external function, or make assumptions about the machine this code is running on, or invoke undefined behavior. In most instances, it would work to just write uint32_t x = *(uint32_t *)input; but this assumes your machine is little endian, and it can be undefined behavior according to the C standard (the pointer may be misaligned, and the access can violate the aliasing rules).
3. No, running on a 64-bit machine is not a problem. I recommend using types like uint32_t and int32_t to make it easier to reason about whether your code will work on different architectures. You just need to include the stdint.h header from C99 to use those types.
The right-hand side of the last line of this function might exhibit undefined behavior depending on the data in the input:
((0xFF & input[3]) << 24)
The problem is that (0xFF & input[3]) will be a signed int (because of integer promotion). The int will probably be 32-bit, and you are shifting it so far to the left that the resulting value might not be representable in an int. The C standard says this is undefined behavior, and you should really try to avoid that because it gives the compiler a license to do whatever it wants and you won't be able to predict the result.
A solution is to convert it from an int to a uint32_t before shifting it, using a cast.
Finally, the variable intValueOfInput is written to but never used. Shouldn't you return it or store it somewhere?
Taking all this into account, I would rewrite the function like this:
uint32_t read_value_from_four_byte_buff(char * input)
{
    uint32_t x;
    x = 0xFF & input[0];
    x |= (0xFF & input[1]) << 8;
    x |= (0xFF & input[2]) << 16;
    x |= (uint32_t)(0xFF & input[3]) << 24;
    return x;
}
From the code, Byte 0 is LSB, Byte 3 is MSB. But there are some typos. The lines should be
intValueOfInput |= ((0xFF & input[2]) << 16);
intValueOfInput |= ((0xFF & input[3]) << 24);
You can make the code shorter by dropping 0xFF but using the type "unsigned char" in the argument type.
To make the code shorter, you can do:
long intValueOfInput = 0;
for (int i = 0, shift = 0; i < 4; i++, shift += 8)
    intValueOfInput |= (unsigned long)(unsigned char)input[i] << shift; /* widen before shifting so the shift by 24 cannot overflow int */
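Combining the suggestions in this thread (unsigned char input, a fixed-width result, and widening before the shift), a compact version could look something like the following. The name read_u32_le is hypothetical, not something from the original code.

```c
#include <stdint.h>

/* Read a 4-byte little-endian value from input, independent of host endianness. */
uint32_t read_u32_le(const unsigned char *input)
{
    uint32_t x = 0;
    for (int i = 0; i < 4; i++)
        x |= (uint32_t)input[i] << (8 * i); /* widen first so the shift cannot overflow int */
    return x;
}
```

Because the parameter is unsigned char, no 0xFF mask is needed, and because the result is uint32_t it behaves the same whether long is 32 or 64 bits.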

Assign unsigned char to unsigned short with bit operators in ansi C

I know it is possible to assign an unsigned char to an unsigned short, but I would like to have more control how the bits are actually assigned to the unsigned short.
unsigned char UC_8;
unsigned short US_16;
UC_8 = 0xff;
US_16 = (unsigned char) UC_8;
The bits from UC_8 are now placed in the lower bits of US_16. I need more control of the conversion since the application I'm currently working on are safety related. Is it possible to control the conversion with bit operators? So I can specify where the 8 bits from the unsigned char should be placed in the bigger 16 bit unsigned short variable.
My guess is that it would be possible with masking combined with some other bit-operator, maybe left/right shifting.
UC_8 = 0xff;
US_16 = (US_16 & 0x00ff) ?? UC_8; // Maybe masking?
I have tried different combinations but have not come up with a smart solution. I'm using ansi C and as said earlier, need more control how the bits actually are set in the larger variable.
My problem or concern comes from a CRC generating function. It will and should always return an unsigned short, since it will sometimes calculate a 16-bit CRC. But sometimes it should calculate an 8-bit CRC instead, and place the 8 bits in the eight LSBs of the 16-bit return variable. The eight MSBs should then contain only zeros.
I would like to say something like:
US_16(7 downto 0) = UC_8;
US_16(15 downto 8) = 0x00;
If I just typecast it, can I guarantee that the bits always will be placed on the lower bits in the larger variable? (On all different architectures)
What do you mean, "control"?
The C standard unambiguously defines the unsigned binary format in terms of bit positions and significance. Certain bits of a 16-bit variable are "low", by numerical definition, and they will hold the pattern from the 8-bit variable, the other bits being set to zero. There is no ambiguity, no wiggle room, and nothing else to control.
Maybe shifting the bits will help you:
US_16 = (US_16 & 0x00ff) | ( UC_8 << 8 );
The resulting bits will be:
C - UC_8 bits
S - US_16 bits
CCCC CCCC SSSS SSSS, where SSSS SSSS are the last 8 bits of US_16
But if UC_8 was 1 and US_16 was 0, then US_16 will be 256. Is this what you mean? Or:
US_16 = (US_16 & 0xff00) | ( UC_8 & 0x00ff );
Is this what you want?
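Wrapped up as functions, the two placements might be sketched like this. The names set_low_byte and set_high_byte are invented for the example, and fixed-width types are used instead of unsigned short so the widths are explicit.

```c
#include <stdint.h>

/* Replace the low byte of word with b, keeping the high byte. */
uint16_t set_low_byte(uint16_t word, uint8_t b)
{
    return (uint16_t)((word & 0xFF00u) | b);
}

/* Replace the high byte of word with b, keeping the low byte. */
uint16_t set_high_byte(uint16_t word, uint8_t b)
{
    return (uint16_t)((word & 0x00FFu) | ((uint16_t)b << 8));
}
```

For the 8-bit CRC case, set_low_byte(0, crc8) yields a value whose upper eight bits are zero, which is also exactly what a plain assignment produces.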
If it is important to use ansi C, and not be restricted to a particular implementation, then you should not assume sizeof(short) == 2. And why bother to cast an unsigned char to an unsigned char (the same thing)? Although probably safe to assume char is 8 bits nowadays, even though that's not guaranteed.
uint8_t UC_8;
uint16_t US_16;
int nbits = ...# of bits to shift...;
US_16 = UC_8 << nbits;
Obviously, if you shift more than 15 bits, it may not be what you want. If you need to actually rearrange the bits, rather than just shift them to some position, you'll have to set them individually
int sourcebit = ...0 to 7...;
int destinationbit = ...0 to 15...;
// set (assumes destinationbit >= sourcebit; a negative shift count is undefined)
US_16 |= (UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit);
// clear
US_16 &= ~((UC_8 & (1<<sourcebit)) << (destinationbit - sourcebit));
note: just wrote, didn't test. probably not optimal. blah blah blah. but something like that will work.
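A variant that works for any pair of positions, including a destination below the source, is to reduce the source bit to 0 or 1 first and then shift it up. This is a sketch with a made-up function name, not a tested drop-in for the lines above.

```c
#include <stdint.h>

/* Copy bit src_bit (0..7) of src into bit dst_bit (0..15) of dst. */
uint16_t copy_bit(uint16_t dst, uint8_t src, unsigned src_bit, unsigned dst_bit)
{
    unsigned bit = (src >> src_bit) & 1u;      /* isolate the source bit as 0 or 1 */
    dst = (uint16_t)(dst & ~(1u << dst_bit));  /* clear the destination bit */
    return (uint16_t)(dst | (bit << dst_bit)); /* set it again if the source bit was 1 */
}
```

For example, copy_bit(0, 0x80, 7, 15) gives 0x8000, and copy_bit(0xFFFF, 0x00, 0, 3) gives 0xFFF7.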

Bit Shifting, Masking or a Bit Field Struct?

I'm new to working with bits. I'm trying to work with an existing protocol, which can send three different types of messages.
Type 1 is a 16-bit structure:
struct digital
{
    unsigned int type:2;
    unsigned int highlow:1;
    unsigned int sig1:5;
    unsigned int :1;
    unsigned int sig2:7;
};
Type 2 is a 32-bit structure. It has a 2-bit type, a 10-bit index and a 16-bit value, interspersed with 0's at positions 27, 23, 15 & 7. A bit-field struct representation would look something like this:
struct analog
{
    unsigned int type:2;
    unsigned int val1:2;
    unsigned int :1;
    unsigned int sig1:3;
    unsigned int :1;
    unsigned int sig2:7;
    unsigned int :1;
    unsigned int val2:7;
    unsigned int :1;
    unsigned int val3:7;
};
sig1 & sig2 together form the 10-bit index. val1 + val2 + val3 together form the 16-bit value of the signal at the 10-bit index.
If I understand how to work with the first two structs, I think I can figure out the third.
My question is, is there a way to assign a single value and have the program work out the bits that need to go into val1, val2 and val3?
I've read about bit shifting, bit-field structs and padding with 0's. The struct seems like the way to go, but I'm not sure how to implement it. None of the examples of bit-packing that I've seen have values that are split the way these are. Ultimately, I'd like to be able to create an analog struct, assign an index (i = 252) and a value (v = 32768) and be done with it.
If someone could suggest the appropriate method or provide a link to a similar sample, I'd greatly appreciate it. If it matters, this code will be incorporated into a larger Objective-C app.
You can do it with a series of shifts, ands, and ors. I have done the 10-bit index part for Type 2:
unsigned int i = 252;
uint32_t a = ((i << 16) & 0x7f0000) | ((i << 17) & 0x7000000);
Essentially, this code shifts the 10 bits of interest in i to the range 16 - 25, then ands the result with the bitmask 0x7f0000 so that only bits 16 - 22 survive (the low 7 bits of the index). It also shifts another copy of the 10 bits to the range 17 - 26, then ands it with the bitmask 0x7000000 so that only bits 24 - 26 survive (the top 3 bits of the index). Then it ors the two values together to create your desired zero-separated value. A plain 32-bit integer is used here because C cannot cast an integer to a struct.
.. I'm not absolutely sure that I counted the bitmasks correctly, but I hope you've got the idea. Just shift, and-mask, and or-merge.
Edit: Method 2:
struct analog a;
a.sig1 = (i >> 7) & 0x7; // top 3 bits of the 10-bit index
a.sig2 = i & 0x7f;       // low 7 bits of the 10-bit index
On second thought method 2 is more readable, so you should probably use this.
You don't have to do this, this is where the union keyword comes in - you can specify all the bits out at the same time, or by referring to the same bits with a different name, set them all at once.
You shouldn't use C structure bitfields for this because the physical layout of bitfields is implementation-defined. While you could figure out what your compiler is doing and get your layout to match the underlying data, the code may not work if you switch to a different compiler or even update your compiler.
I know it's a pain, but do the bit manipulation yourself.
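Doing the bit manipulation by hand for the Type 2 message could be sketched as follows. Several things here are assumptions, not facts from the protocol: bit 31 is taken to be the first bit of the message, the pieces are taken to be most-significant first (val1 holds the top 2 bits of the value, sig1 the top 3 bits of the index), and the type code is passed in as a parameter because the question only gives it for Type 1. The name pack_analog is made up.

```c
#include <stdint.h>

/* Pack a Type 2 message into a 32-bit word, leaving bits 27, 23, 15 and 7 zero. */
uint32_t pack_analog(uint32_t type, uint32_t index, uint32_t value)
{
    uint32_t word = 0;
    word |= (type & 0x3u) << 30;          /* type: bits 31-30 */
    word |= ((value >> 14) & 0x3u) << 28; /* val1: bits 29-28 (top 2 bits of value) */
    word |= ((index >> 7) & 0x7u) << 24;  /* sig1: bits 26-24 (top 3 bits of index) */
    word |= (index & 0x7Fu) << 16;        /* sig2: bits 22-16 (low 7 bits of index) */
    word |= ((value >> 7) & 0x7Fu) << 8;  /* val2: bits 14-8  (middle 7 bits of value) */
    word |= value & 0x7Fu;                /* val3: bits 6-0   (low 7 bits of value) */
    return word;
}
```

With these assumptions, an index of 252 and a value of 32768 (and a type of 0) pack to 0x217C0000. To send the word, split it into bytes with shifts in the order the protocol expects, rather than relying on how the struct or integer happens to be laid out in memory.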
