I'm considering how to XOR two byte arrays efficiently.
I have these byte arrays defined as unsigned char *.
I think that XORing them as uint64_t will be much faster. Is that true?
How can I efficiently convert unsigned char * to uint64_t *, preferably inside the XOR loop? And how do I pad the last bytes if the length of the byte array isn't a multiple of 8?
Here is my current code, which XORs the byte arrays one byte (unsigned char) at a time:
#include <stdlib.h>
unsigned char *bitwise_xor(const unsigned char *A_Bytes_Array, const unsigned char *B_Bytes_Array, const size_t length) {
unsigned char *XOR_Bytes_Array;
// allocate XORed bytes array
XOR_Bytes_Array = malloc(length);
if (XOR_Bytes_Array == NULL)
return NULL;
// perform bitwise XOR operation on bytes arrays A and B, one byte at a time
for (size_t i = 0; i < length; i++)
XOR_Bytes_Array[i] = (unsigned char)(A_Bytes_Array[i] ^ B_Bytes_Array[i]);
return XOR_Bytes_Array;
}
OK, in the meantime I have tried this approach. My byte arrays are rather large (RGBA bitmaps, 4*1440*900 bytes?).
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
// Load 8 bytes in native byte order; memcpy avoids alignment and aliasing problems.
// (Assembling a big-endian value with shifts and then storing it with memcpy in native
// order would reverse the bytes of every 8-byte group on a little-endian machine.)
static uint64_t next64bitsFromBytesArray(const unsigned char *bytesArray, const size_t i) {
uint64_t next64bits;
memcpy(&next64bits, bytesArray + i, sizeof next64bits);
return next64bits;
}
unsigned char *bitwise_xor64(const unsigned char *A_Bytes_Array, const unsigned char *B_Bytes_Array, const size_t length) {
unsigned char *XOR_Bytes_Array;
// allocate XORed bytes array
XOR_Bytes_Array = malloc(length);
if (XOR_Bytes_Array == NULL)
return NULL;
// perform bitwise XOR operation on bytes arrays A and B using uint64_t, whole 8-byte chunks only
size_t i = 0;
for (; i + 8 <= length; i += 8) {
uint64_t A_Bytes = next64bitsFromBytesArray(A_Bytes_Array, i);
uint64_t B_Bytes = next64bitsFromBytesArray(B_Bytes_Array, i);
uint64_t XOR_Bytes = A_Bytes ^ B_Bytes;
memcpy(XOR_Bytes_Array + i, &XOR_Bytes, 8); // store in the same byte order used to load
}
// XOR the remaining 0-7 bytes one at a time, so no padding is needed
for (; i < length; i++)
XOR_Bytes_Array[i] = A_Bytes_Array[i] ^ B_Bytes_Array[i];
return XOR_Bytes_Array;
}
UPDATE (second approach to this problem):
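unsigned char *bitwise_xor64(const unsigned char *A_Bytes_Array, const unsigned char *B_Bytes_Array, const size_t length) {
// NOTE: these casts assume both arrays are suitably aligned for uint64_t, and
// reading the bytes through a uint64_t * breaks C's strict aliasing rules
const uint64_t *aBytes = (const uint64_t *) A_Bytes_Array;
const uint64_t *bBytes = (const uint64_t *) B_Bytes_Array;
unsigned char *xorBytes = malloc(length);
if (xorBytes == NULL)
return NULL;
size_t i = 0;
for (size_t j = 0; i + 8 <= length; i += 8, j++) {
uint64_t aXORbBytes = aBytes[j] ^ bBytes[j];
//printf("a XOR b = 0x%" PRIx64 "\n", aXORbBytes);
memcpy(xorBytes + i, &aXORbBytes, 8);
}
// XOR the remaining 0-7 bytes individually instead of reading past the end
for (; i < length; i++)
xorBytes[i] = A_Bytes_Array[i] ^ B_Bytes_Array[i];
return xorBytes;
}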
So I did an experiment:
#include <stdlib.h>
#include <stdint.h>
#ifndef TYPE
#define TYPE uint64_t
#endif
TYPE *
xor(const void *va, const void *vb, size_t l)
{
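const TYPE *a = va;
const TYPE *b = vb;
TYPE *res = malloc(l);
TYPE *r = res; /* r is advanced by the loop; res keeps the start of the buffer */
size_t i;
if (res == NULL)
return NULL;
/* only whole TYPE elements are XORed; the last l % sizeof(TYPE) bytes are not handled */
for (i = 0; i < l / sizeof(TYPE); i++) {
*r++ = *a++ ^ *b++;
}
return res;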
}
Compiled both for uint64_t and uint8_t with clang with basic optimizations. In both cases the compiler vectorized the hell out of this. The difference was that the uint8_t version had code to handle l not being a multiple of 8. So if we add code to handle the size not being a multiple of 8, you'll probably end up with equivalent generated code. Also, the 64-bit version unrolled the loop a few times and had code to handle that, so for big enough arrays you might gain a few percent here. On the other hand, on big enough arrays you'll be memory-bound and the xor operation won't matter a bit.
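If you want to lean on the compiler as described above, a minimal sketch is to keep the plain byte loop, write into a caller-provided buffer, and mark the pointers restrict so the compiler knows they don't overlap. The name xor_bytes is made up for illustration, and whether the loop actually gets vectorized depends on your compiler, its version and the optimization flags, so check the generated code or measure.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: plain byte-by-byte XOR into a caller-provided buffer.
// `restrict` promises that out, a and b do not overlap, which removes one
// obstacle to auto-vectorization. The loop is correct for any n, so there is
// no tail to pad; if the compiler vectorizes it, it generates the remainder
// handling itself.
void xor_bytes(unsigned char *restrict out, const unsigned char *restrict a,
               const unsigned char *restrict b, size_t n)
{
    assert(n == 0 || (out != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] ^ b[i];
}
```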
Are you sure your compiler won't deal with this already? This is the kind of micro-optimization that only makes sense when you're measuring things, and then you wouldn't need to ask which one is faster; you'd know.
In an arbitrary-sized array of bytes in C, I want to store 14-bit numbers (0-16,383) tightly packed. In other words, in the sequence:
0000000000000100000000000001
there are two numbers that I wish to be able to store and retrieve arbitrarily as 16-bit integers (in this case both of them are 1, but they could be anything in the given range). If I were to have the functions uint16_t 14bitarr_get(unsigned char* arr, unsigned int index) and void 14bitarr_set(unsigned char* arr, unsigned int index, uint16_t value), how would I implement those functions?
This is not for a homework project, merely my own curiosity. I have a specific project that this would be used for, and it is the key/center of the entire project.
I do not want an array of structs that have 14-bit values in them, as that wastes bits for every struct that is stored. I want to be able to pack as many 14-bit values as I possibly can tightly into an array of bytes. (E.g., as I mentioned in a comment, fitting as many 14-bit values as possible into a 64-byte chunk, with no wasted bits, is desirable. Those 64 bytes are used completely tightly packed for a specific use case, such that even a single wasted bit would take away the ability to store another 14-bit value.)
Well, this is bit fiddling at its best. Doing it with an array of bytes makes it more complicated than it would be with larger elements, because a single 14-bit quantity can span 3 bytes, where uint16_t or anything bigger would require no more than two. But I'll take you at your word that this is what you want (no pun intended). This code will actually work with the constant W set to anything 8 or larger (but not over the size of an int; for that, additional type casts are needed). Of course, the value type must be adjusted if W is larger than 16.
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#define W 14
uint16_t arr_get(unsigned char* arr, size_t index) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
uint16_t result = arr[byte_index] >> bit_in_byte_index;
for (unsigned n_bits = 8 - bit_in_byte_index; n_bits < W; n_bits += 8)
result |= arr[++byte_index] << n_bits;
return result & ~(~0u << W);
}
void arr_set(unsigned char* arr, size_t index, uint16_t value) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
arr[byte_index] &= ~(0xff << bit_in_byte_index);
arr[byte_index++] |= value << bit_in_byte_index;
unsigned n_bits = 8 - bit_in_byte_index;
value >>= n_bits;
while (n_bits < W - 8) {
arr[byte_index++] = value;
value >>= 8;
n_bits += 8;
}
arr[byte_index] &= 0xff << (W - n_bits);
arr[byte_index] |= value;
}
int main(void) {
int mod = 1 << W;
int n = 50000;
unsigned x[n];
unsigned char b[2 * n];
for (int tries = 0; tries < 10000; tries++) {
for (int i = 0; i < n; i++) {
x[i] = rand() % mod;
arr_set(b, i, x[i]);
}
for (int i = 0; i < n; i++)
if (arr_get(b, i) != x[i])
printf("Err #%d: %u should be %u\n", i, (unsigned) arr_get(b, i), x[i]);
}
return 0;
}
Faster versions
Since you said in comments that performance is an issue: open coding the loops gives a roughly 10% speed improvement on my machine on the little test driver included in the original. This includes random number generation and testing, so perhaps the primitives are 20% faster. I'm confident that 16- or 32-bit array elements would give further improvements because byte access is expensive:
uint16_t arr_get(unsigned char* a, size_t i) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
return (a[iy] | (a[iy+1] << 8)) & 0x3fff;
case 2:
return ((a[iy] >> 2) | (a[iy+1] << 6)) & 0x3fff;
case 4:
return ((a[iy] >> 4) | (a[iy+1] << 4) | (a[iy+2] << 12)) & 0x3fff;
}
return ((a[iy] >> 6) | (a[iy+1] << 2) | (a[iy+2] << 10)) & 0x3fff;
}
#define M(IB) (~0u << (IB))
#define SETLO(IY, IB, V) a[IY] = (a[IY] & M(IB)) | ((V) >> (14 - (IB)))
#define SETHI(IY, IB, V) a[IY] = (a[IY] & ~M(IB)) | ((V) << (IB))
void arr_set(unsigned char* a, size_t i, uint16_t val) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
a[iy] = val;
SETLO(iy+1, 6, val);
return;
case 2:
SETHI(iy, 2, val);
a[iy+1] = val >> 6;
return;
case 4:
SETHI(iy, 4, val);
a[iy+1] = val >> 4;
SETLO(iy+2, 2, val);
return;
}
SETHI(iy, 6, val);
a[iy+1] = val >> 2;
SETLO(iy+2, 4, val);
}
Another variation
This is quite a bit faster yet on my machine, about 20% better than above:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> (ib % 8)) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
unsigned io = ib % 8;
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
Note that for this code to be safe you should allocate one extra byte at the end of the packed array. It always reads and writes 3 bytes even when the desired 14 bits are in the first 2.
One more variation
Finally, this runs just a bit slower than the one above (again on my machine; YMMV), but you don't need the extra byte. It uses one comparison per operation:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
unsigned buf = ib % 8 <= 2
? a[iy] | (a[iy+1] << 8)
: a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> io) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
if (io <= 2) {
unsigned buf = a[iy] | (a[iy+1] << 8);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
} else {
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
}
The easiest solution is to use a struct of eight bitfields:
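#include <stdint.h>
// __attribute__((__packed__)) is a GCC/Clang extension; it lets the 14-bit fields straddle uint16_t boundaries
typedef struct __attribute__((__packed__)) EightValues {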
uint16_t v0 : 14,
v1 : 14,
v2 : 14,
v3 : 14,
v4 : 14,
v5 : 14,
v6 : 14,
v7 : 14;
} EightValues;
This struct has a size of 14*8 = 112 bits, which is 14 bytes (seven uint16_t). Now all you need is to use index >> 3 to select the structure and the low three bits of the index to select the right bitfield:
uint16_t bitarr14_get(unsigned char* arr, unsigned int index) { // renamed: a C identifier cannot start with a digit
EightValues* accessPointer = (EightValues*)arr;
accessPointer += index >> 3; //select the right structure in the array
switch(index & 7) { //use the last three bits of the index to access the right bitfield
case 0: return accessPointer->v0;
case 1: return accessPointer->v1;
case 2: return accessPointer->v2;
case 3: return accessPointer->v3;
case 4: return accessPointer->v4;
case 5: return accessPointer->v5;
case 6: return accessPointer->v6;
case 7: return accessPointer->v7;
}
return 0; // not reached: index & 7 is always in 0..7
}
Your compiler will do the bit-fiddling for you.
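The question also asks for a setter. A minimal sketch using the same struct layout, assuming GCC or Clang for the packed attribute; the name bitarr14_set is hypothetical, chosen to mirror the getter:

```c
#include <assert.h>
#include <stdint.h>

// Same layout as above: eight 14-bit fields packed into 14 bytes
// (__attribute__((__packed__)) is a GCC/Clang extension).
typedef struct __attribute__((__packed__)) EightValues {
    uint16_t v0 : 14, v1 : 14, v2 : 14, v3 : 14,
             v4 : 14, v5 : 14, v6 : 14, v7 : 14;
} EightValues;

// Hypothetical setter: store value at position index. With NDEBUG defined,
// values above 0x3fff are silently truncated to their low 14 bits.
void bitarr14_set(unsigned char *arr, unsigned int index, uint16_t value)
{
    EightValues *p = (EightValues *)arr + (index >> 3); // select the structure
    assert(value <= 0x3fff);
    switch (index & 7) {                                 // select the bit-field
    case 0: p->v0 = value; break;
    case 1: p->v1 = value; break;
    case 2: p->v2 = value; break;
    case 3: p->v3 = value; break;
    case 4: p->v4 = value; break;
    case 5: p->v5 = value; break;
    case 6: p->v6 = value; break;
    case 7: p->v7 = value; break;
    }
}
```

As with the getter, the compiler generates the masking and shifting for each case.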
The Basis for Storage Issue
The biggest issue you are facing is the fundamental question of "What is my basis for storage going to be?" You know the basics: what you have available to you is char, short, int, etc., the smallest being 8 bits. No matter how you slice your storage scheme, it will ultimately have to rest in memory in units based on this 8-bits-per-byte layout.
The only optimal (no bits wasted) memory allocation would be to declare an array of char sized to the least common multiple of 14 bits and 8 bits, which is 112 bits in this case (7 shorts or 14 chars). This may be the best option. Here, declaring an array of 7 shorts or 14 chars would allow the exact storage of eight 14-bit values. Of course, if you have no need for 8 of them, it wouldn't be of much use anyway, as it would waste more than the 4 bits lost by storing a single value in a 16-bit unsigned short.
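The arithmetic can be sketched as two small helpers (the names and the PACKED_BITS macro are made up for illustration): n values need ceil(14n/8) bytes, so 8 values take exactly 14 bytes, and a 64-byte chunk holds floor(512/14) = 36 values with 8 bits left over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKED_BITS 14  // width of each packed value (hypothetical macro)

// Bytes needed to hold n tightly packed 14-bit values, rounded up to a whole byte.
size_t packed14_bytes(size_t n)
{
    assert(n <= (SIZE_MAX - 7) / PACKED_BITS);  // avoid overflow in n * 14
    return (n * PACKED_BITS + 7) / 8;
}

// How many whole 14-bit values fit in a buffer of `bytes` bytes.
size_t packed14_capacity(size_t bytes)
{
    assert(bytes <= SIZE_MAX / 8);  // avoid overflow in bytes * 8
    return bytes * 8 / PACKED_BITS;
}
```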
Let me know if this is something you would like to further explore. If it is, I'm happy to help with the implementation.
Bitfield Struct
The comments regarding bit-field packing (or bit packing in general) describe exactly what you need to do. This can involve a structure alone or in combination with a union, or manually right/left shifting values as needed.
A short example applicable to your situation (if I understood correctly, you want two 14-bit areas in memory) would be:
#include <stdio.h>
typedef struct bitarr14 {
unsigned n1 : 14,
n2 : 14;
} bitarr14;
char *binstr (unsigned long n, size_t sz);
int main (void) {
bitarr14 mybitfield;
mybitfield.n1 = 1;
mybitfield.n2 = 1;
printf ("\n mybitfield in memory : %s\n\n",
binstr (*(unsigned *)&mybitfield, 28));
return 0;
}
char *binstr (unsigned long n, size_t sz)
{
static char s[64 + 1] = {0};
char *p = s + 64;
register size_t i = 0;
for (i = 0; i < sz; i++) {
p--;
*p = (n >> i & 1) ? '1' : '0';
}
return p;
}
Output
$ ./bin/bitfield14
mybitfield in memory : 0000000000000100000000000001
Note: dereferencing mybitfield through an unsigned * to print its value in memory breaks strict aliasing; this is intentional, purely for the purpose of the output example.
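If you'd rather avoid the aliasing violation even in the printing example, one option is to copy the struct's bytes into an unsigned with memcpy. This is a sketch with a hypothetical helper name; the bit layout of bit-fields is still implementation-defined, and on a typical little-endian GCC/Clang target it matches the output shown above.

```c
#include <assert.h>
#include <string.h>

typedef struct bitarr14 {
    unsigned n1 : 14,
             n2 : 14;
} bitarr14;

// Hypothetical helper: copy the struct's object representation into an
// unsigned with memcpy instead of dereferencing a cast pointer, so strict
// aliasing is not violated.
unsigned bitarr14_raw(const bitarr14 *bf)
{
    unsigned raw = 0;
    assert(sizeof raw >= sizeof *bf);
    memcpy(&raw, bf, sizeof *bf);
    return raw;
}
```

The result can be printed as before, e.g. binstr(bitarr14_raw(&mybitfield), 28).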
The beauty of, and purpose for, using a struct in this manner is that it gives you direct access to each 14-bit part of the struct, without having to shift manually.
Update: assuming you want big-endian bit packing, here is code meant for a fixed-size code word. It's based on code I've used for data compression algorithms. The switch case and fixed logic help with performance.
#include <stdint.h> /* uint16_t; typedef'ing it yourself clashes with the standard header */
void bit14arr_set(unsigned char* arr, unsigned int index, uint16_t value)
{
unsigned int bitofs = (index*14)%8;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
*arr++ = (unsigned char)(value >> 6);
*arr &= 0x03;
*arr |= (unsigned char)(value << 2);
break;
case 2: /* bit offset == 2 */
*arr &= 0xc0;
*arr++ |= (unsigned char)(value >> 8);
*arr = (unsigned char)(value << 0);
break;
case 4: /* bit offset == 4 */
*arr &= 0xf0;
*arr++ |= (unsigned char)(value >> 10);
*arr++ = (unsigned char)(value >> 2);
*arr &= 0x3f;
*arr |= (unsigned char)(value << 6);
break;
case 6: /* bit offset == 6 */
*arr &= 0xfc;
*arr++ |= (unsigned char)(value >> 12);
*arr++ = (unsigned char)(value >> 4);
*arr &= 0x0f;
*arr |= (unsigned char)(value << 4);
break;
}
}
uint16_t bit14arr_get(unsigned char* arr, unsigned int index)
{
unsigned int bitofs = (index*14)%8;
unsigned short value;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
value = ((unsigned int)(*arr++) ) << 6;
value |= ((unsigned int)(*arr ) ) >> 2;
break;
case 2: /* bit offset == 2 */
value = ((unsigned int)(*arr++)&0x3f) << 8;
value |= ((unsigned int)(*arr ) ) >> 0;
break;
case 4: /* bit offset == 4 */
value = ((unsigned int)(*arr++)&0x0f) << 10;
value |= ((unsigned int)(*arr++) ) << 2;
value |= ((unsigned int)(*arr ) ) >> 6;
break;
case 6: /* bit offset == 6 */
value = ((unsigned int)(*arr++)&0x03) << 12;
value |= ((unsigned int)(*arr++) ) << 4;
value |= ((unsigned int)(*arr ) ) >> 4;
break;
}
return value;
}
Here's my version (updated to fix bugs):
#define PACKWID 14 // number of bits in packed number
#define PACKMSK ((1 << PACKWID) - 1)
#ifndef ARCHBYTEALIGN
#define ARCHBYTEALIGN 1 // align to 1=bytes, 2=words
#endif
#define ARCHBITALIGN (ARCHBYTEALIGN * 8)
typedef unsigned char byte;
typedef unsigned short u16;
typedef unsigned int u32;
typedef long long s64;
typedef u16 pcknum_t; // container for packed number
typedef u32 acc_t; // working accumulator
#ifndef ARYOFF
#define ARYOFF long
#endif
#define PRT(_val) ((unsigned long) _val)
typedef unsigned ARYOFF aryoff_t; // bit offset
// packary -- access array of packed numbers
// RETURNS: old value
extern inline pcknum_t
packary(byte *ary,aryoff_t idx,int setflg,pcknum_t newval)
// ary -- byte array pointer
// idx -- index into array (packed number relative)
// setflg -- 1=set new value, 0=just get old value
// newval -- new value to set (if setflg set)
{
aryoff_t absbitoff;
aryoff_t bytoff;
aryoff_t absbitlhs;
acc_t acc;
acc_t nval;
int shf;
acc_t curmsk;
pcknum_t oldval;
// get the absolute bit number for the given array index
absbitoff = idx * PACKWID;
// get the byte offset of the lowest byte containing the number
bytoff = absbitoff / ARCHBITALIGN;
// get absolute bit offset of first containing byte
absbitlhs = bytoff * ARCHBITALIGN;
// get amount we need to shift things by:
// (1) our accumulator
// (2) values to set/get
shf = absbitoff - absbitlhs;
#ifdef MODSHOW
do {
static int modshow;
if (modshow > 50)
break;
++modshow;
// requires #include <stdio.h>; PRT() yields unsigned long, hence %lu
printf("packary: MODSHOW idx=%lu shf=%d bytoff=%lu absbitlhs=%lu absbitoff=%lu\n",
PRT(idx),shf,PRT(bytoff),PRT(absbitlhs),PRT(absbitoff));
} while (0);
#endif
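// adjust array pointer to the portion we want (guaranteed to span)
ary += bytoff * ARCHBYTEALIGN;
// fetch the number + some other bits
// NOTE: this loads (and, below, stores) sizeof(acc_t) bytes through an acc_t pointer:
// it assumes the CPU tolerates unaligned access (e.g. x86), ignores strict aliasing,
// and needs spare bytes after the last packed number (sizeof(acc_t) - 1 is enough)
acc = *(acc_t *) ary;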
// get the old value
oldval = (acc >> shf) & PACKMSK;
// set the new value
if (setflg) {
// get shifted mask for packed number
curmsk = PACKMSK << shf;
// remove the old value
acc &= ~curmsk;
// ensure caller doesn't pass us a bad value
nval = newval;
#if 0
nval &= PACKMSK;
#endif
nval <<= shf;
// add in the value
acc |= nval;
*(acc_t *) ary = acc;
}
return oldval;
}
pcknum_t
int_get(byte *ary,aryoff_t idx)
{
return packary(ary,idx,0,0);
}
void
int_set(byte *ary,aryoff_t idx,pcknum_t newval)
{
packary(ary,idx,1,newval);
}
Here are benchmarks:
set: 354740751 7.095 -- gene
set: 203407176 4.068 -- rcgldr
set: 298946533 5.979 -- craig
get: 268574627 5.371 -- gene
get: 166839767 3.337 -- rcgldr
get: 207764612 4.155 -- craig
What is the FASTEST way, using bit operators, to return the number represented by 3 different unsigned char variables?
unsigned char byte1 = 200;
unsigned char byte2 = 40;
unsigned char byte3 = 33;
unsigned long number = byte1 + byte2 * 256 + byte3 * 256 * 256;
is the slowest way possible.
Just shift each one into place, and OR them together:
#include <inttypes.h> /* uint8_t, uint32_t and the PRIX32 format macro */
#include <stdio.h>
int main(void)
{
uint8_t a = 0xAB, b = 0xCD, c = 0xEF;
/*
* 'a' and 'b' must be cast to uint32_t first because of the implicit conversion
* to int, which is only guaranteed to be at least 16 bits (so b << 8 could overflow).
* (Thanks Matt McNabb and Tim Čas.)
*/
uint32_t i = ((uint32_t)a << 16) | ((uint32_t)b << 8) | c;
printf("0x%" PRIX32 "\n", i); /* prints 0xABCDEF */
return 0;
}
Do note, however, that almost any modern compiler will replace a multiplication by a power of two with a bit shift of the appropriate amount.
The fastest way would be writing directly to memory, assuming you know the endianness of your system (here the assumption is little-endian):
unsigned char byte1 = 200;
unsigned char byte2 = 40;
unsigned char byte3 = 33;
unsigned long number = 0;
((unsigned char*)&number)[0] = byte1;
((unsigned char*)&number)[1] = byte2;
((unsigned char*)&number)[2] = byte3;
Or, if you don't mind doing some exercise, you can do something like:
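union
{
unsigned long ulongVal;
unsigned char chars[sizeof(unsigned long)]; // one element per byte, whether long is 32 or 64 bits
} a;
and then by assigning:
a.ulongVal = 0; // clear every byte first, including those beyond chars[2]
a.chars[0] = byte1;
a.chars[1] = byte2;
a.chars[2] = byte3;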
you can then read the final value from a.ulongVal (again assuming a little-endian system). This will spare extra memory operations.
I am trying to create a 48-bit integer value. I understand it may be possible to use a char array or struct, but I want to be able to do bit masking/manipulation and I'm not sure how that can be done.
Currently the program uses a 16-bit uint and I need to change it to 48. It is a bytecode interpreter and I want to expand the memory addressing to 4GB. I could just use 64-bit, but that would waste a lot of space.
Here is a sample of the code:
unsigned int program[] = { 0x1064, 0x11C8, 0x2201, 0x0000 };
void decode( )
{
instrNum = (program[i] & 0xF000) >> 12; //the instruction
reg1 = (program[i] & 0xF00 ) >> 8; //registers
reg2 = (program[i] & 0xF0 ) >> 4;
reg3 = (program[i] & 0xF );
imm = (program[i] & 0xFF ); //pointer to data
}
full program: http://en.wikibooks.org/wiki/Creating_a_Virtual_Machine/Register_VM_in_C
You can use bit-fields, which are often used to represent integral types of a known, fixed bit width. A well-known use of bit-fields is to represent a set of bits and/or series of bits, known as flags. You can apply bit operations to them.
#include <stdio.h>
#include <stdint.h>
struct uint48 {
uint64_t x:48;
} __attribute__((packed));
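As a sketch of the bit manipulation the question asks about, assuming GCC or Clang (both for the packed attribute and for uint64_t as a bit-field type, which the standard leaves implementation-defined), the field can be shifted and masked like an ordinary unsigned integer, and stores into it wrap modulo 2^48. The helper uint48_bits is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

// GCC/Clang: packed struct holding a 48-bit unsigned bit-field
struct uint48 {
    uint64_t x : 48;
} __attribute__((packed));

// Hypothetical helper: extract `width` bits starting at bit `lo`.
uint64_t uint48_bits(struct uint48 v, unsigned lo, unsigned width)
{
    assert(width > 0 && lo + width <= 48);
    return ((uint64_t)v.x >> lo) & ((UINT64_C(1) << width) - 1);
}
```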
Use a structure or uint16_t array with special functions for an array of uint48.
For individual instances, use uint64_t or unsigned long long. uint64_t will work fine for an individual int48, but you may want to mask off the results of operations like * or << to keep the upper bits cleared. Only some space-saving routines are needed for arrays.
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
typedef uint64_t uint48;
const uint48 uint48mask = 0xFFFFFFFFFFFFull; // the low 48 bits set
uint48 uint48_get(const uint48 *a48, size_t index) {
const uint16_t *a16 = (const uint16_t *) a48;
index *= 3;
return a16[index] | (uint32_t) a16[index + 1] << 16
| (uint64_t) a16[index + 2] << 32;
}
void uint48_set(uint48 *a48, size_t index, uint48 value) {
uint16_t *a16 = (uint16_t *) a48;
index *= 3;
a16[index] = (uint16_t) value;
a16[++index] = (uint16_t) (value >> 16);
a16[++index] = (uint16_t) (value >> 32);
}
uint48 *uint48_new(size_t n) {
size_t size = n * 3 * sizeof(uint16_t);
// Ensure size allocated is a multiple of `sizeof(uint64_t)`
// Not fully certain this is needed - but doesn't hurt.
if (size % sizeof(uint64_t)) {
size += sizeof(uint64_t) - size % sizeof(uint64_t);
}
return malloc(size);
}
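A minimal sketch of the masking mentioned above for individual values; UINT48_MASK, uint48_mul and uint48_shl are hypothetical names, and the inputs are assumed to already fit in 48 bits:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t uint48;
#define UINT48_MASK UINT64_C(0xFFFFFFFFFFFF)  // hypothetical: the low 48 bits set

// Multiply, then mask off bits 48..63 so the result wraps modulo 2^48.
uint48 uint48_mul(uint48 a, uint48 b)
{
    return (a * b) & UINT48_MASK;
}

// Shift left, then mask; shifting a uint64_t by 64 or more is undefined.
uint48 uint48_shl(uint48 a, unsigned n)
{
    assert(n < 64);
    return (a << n) & UINT48_MASK;
}
```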
I need to be able to send a numeric value to a remote socket server, so I need to encode possible numbers as bytes.
The numbers are up to 64 bits, i.e. requiring up to 8 bytes. The very first byte is the type, and it is always a number under 255, so it fits in 1 byte.
For example, if the number was 8 and the type was a 32 bit unsigned integer then the type would be 7 which would be copied to the first (leftmost) byte and then the next 4 bytes would be encoded with the actual number (8 in this case).
So in terms of bytes:
byte1: 7
byte2: 0
byte3: 0
byte4: 0
byte5: 8
I hope this is making sense.
Does this code to perform this encoding look like a reasonable approach?
int type = 7;
uint32_t number = 8;
unsigned char* msg7 = (unsigned char*)malloc(5);
unsigned char* p = msg7;
*p++ = type;
for (int i = sizeof(uint32_t) - 1; i >= 0; --i)
*p++ = number & 0xFF << (i * 8);
You'll want to explicitly cast type to avoid a warning:
*p++ = (unsigned char) type;
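You want to encode the number with the most significant byte first, but you're shifting the wrong thing: << binds tighter than &, so your expression is number & (0xFF << (i * 8)), which shifts the mask left instead of shifting the number right. The loop should be: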
for (int i = sizeof(uint32_t) - 1; i >= 0; --i)
*p++ = (unsigned char) ((number >> (i * 8)) & 0xFF);
It looks good otherwise.
Your code is reasonable (although I'd use uint8_t, since you are not using the bytes as “characters”, and Peter is of course right with respect to the typo), and unlike the commonly found alternatives like
uint32_t number = 8;
uint8_t* p = (uint8_t *) &number;
or
union {
uint32_t number;
uint8_t bytes[4];
} val;
val.number = 8;
// access val.bytes[0] .. val.bytes[3]
is guaranteed to produce the same bytes on every platform. Both alternatives expose the host's native byte order, so on a little-endian machine the bytes come out reversed. On top of that, the first one relies on uint8_t being a character type (it almost always is, and only character types may alias other objects), and the second one is permitted in C but undefined behavior in C++.
I would drop the loop and use a "caller allocates" interface, like
#include <stddef.h>
#include <stdint.h>
int convert_32 (unsigned char *target, size_t size, uint32_t val)
{
if (size < 5) return -1;
target[0] = 7;
target[1] = (val >> 24) & 0xff;
target[2] = (val >> 16) & 0xff;
target[3] = (val >> 8) & 0xff;
target[4] = (val) & 0xff;
return 5;
}
This makes it easier for the caller to concatenate multiple fragments into one big binary packet and keep track of the used/needed buffer size.
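The same caller-allocates idea extends to the other sizes in the question (numbers up to 64 bits). A sketch, where encode_be and its nbytes parameter are hypothetical and the type byte is passed in rather than hard-coded:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: writes `type`, then the low `nbytes` bytes of `val`,
// most significant byte first. Returns the number of bytes written, or -1 if
// nbytes is not 1..8 or the buffer is too small.
int encode_be(unsigned char *target, size_t size, uint8_t type,
              uint64_t val, unsigned nbytes)
{
    assert(target != NULL);
    if (nbytes == 0 || nbytes > 8 || size < 1 + (size_t)nbytes)
        return -1;
    target[0] = type;
    for (unsigned i = 0; i < nbytes; i++)
        target[1 + i] = (unsigned char)(val >> (8 * (nbytes - 1 - i)));
    return (int)(1 + nbytes);
}
```

For the question's example, encode_be(buf, sizeof buf, 7, 8, 4) writes 7, 0, 0, 0, 8 and returns 5.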
Do you mean this?
for (int i = sizeof(uint32_t) - 1; i >= 0; --i)
*p++ = (number >> (i * 8)) & 0xFF;
Another option might be:
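// this would work on big-endian systems, e.g. SPARC, provided the struct has no padding
// (typical ABIs insert 3 padding bytes after `type`, so sizeof(struct unsignedMsg) is usually 8, not 5)
struct unsignedMsg {
unsigned char type;
uint32_t value;
};
struct unsignedMsg msg;
msg.type = 7;
msg.value = number;
unsigned char *p = (unsigned char *) &msg;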
or
unsigned char* p =
p[0] = 7;
*((uint32_t *) &(p[1])) = number;