Endian-independent way of using memcpy() from smaller to larger integer pointer - c

Suppose I have two arrays.
uint8_t src[SIZE] = { 0 };
uint32_t dst[SIZE] = { 0 };
uint8_t* srcPtr; // Points to current src value
uint32_t* dstPtr; // Points to current dst value
src holds values that sometimes need to be put into dst. Importantly, the values from src may be 8-bit, 16-bit, or 32-bit, and aren't necessarily properly aligned. So, suppose I wish to use memcpy() as below to copy a 16-bit value:
memcpy(dstPtr, srcPtr, 2);
Will I run into an endianness issue here? This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00 the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
But if I were on a big-endian system, srcPtr would be 00 then 08, and the bytes at dstPtr will be 00 08 00 00 (I presume), which would take on a value of 524288.
What would be an endian-independent way to write this copy?

Will I run into an endianness issue here?
Not necessarily endianness issues per se, but yes, the specific approach you describe will run into issues with integer representation.
This works fine on little-endian systems, since if I want to copy 8, then srcPtr has 08 then 00 the bytes at dstPtr will be 08 00 00 00 and the value will be 8, as expected.
You seem to be making an assumption there, either
that more bytes of the destination will be modified than you actually copy, or perhaps
that relevant parts of the destination are pre-set to all zero bytes.
But you need to understand that memcpy() will copy exactly the number of bytes requested. No more than that will be read from the specified source, and no more than that will be modified in the destination. In particular, the data types of the objects to which the source and destination pointers point have no effect on the operation of memcpy().
What would be an endian-independent way to write this copy?
The most natural way to do it would be via simple assignment, relying on the compiler to perform the necessary conversion:
*dstPtr = *srcPtr;
However, I take your emphasis on the possibility that the arrays might not be aligned as a concern that it may be unsafe to dereference the source and/or destination pointer. That will not, in fact, be the case for pointers to char, but it might be the case for pointers to other types. For cases where you take memcpy as the only safe way to read from the arrays, the most portable method for converting value representations is still to rely on the implementation. For example:
uint8_t* srcPtr = /* ... */;
uint32_t* dstPtr = /* ... */;
uint16_t srcVal;
uint32_t dstVal;
memcpy(&srcVal, srcPtr, sizeof(srcVal)); // read 2 bytes; srcPtr need not be aligned
dstVal = srcVal; // conversion is automatically performed
memcpy(dstPtr, &dstVal, sizeof(dstVal)); // write 4 bytes; dstPtr need not be aligned
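To see the difference concretely, here is a small sketch of both approaches. It assumes the source bytes hold a 16-bit value in the host's own byte order (as they would if the value had been written there with memcpy); raw_copy_u16 and widen_u16 are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Raw approach from the question: copy 2 bytes into a zeroed uint32_t.
   The result depends on which end of the 32-bit object the 2 bytes land in. */
uint32_t raw_copy_u16(const uint8_t *srcPtr)
{
    uint32_t dst = 0;
    memcpy(&dst, srcPtr, 2);
    return dst;
}

/* Approach from the answer: read into an object of the matching width,
   then let ordinary assignment perform the value conversion. */
uint32_t widen_u16(const uint8_t *srcPtr)
{
    uint16_t srcVal;
    memcpy(&srcVal, srcPtr, sizeof srcVal); /* no alignment requirement on srcPtr */
    uint32_t dstVal = srcVal;               /* value-preserving conversion */
    return dstVal;
}
```

On a little-endian machine both functions return 8 for the bytes of a stored 8; on a big-endian machine raw_copy_u16 returns 524288 while widen_u16 still returns 8.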

Will I run into an endianness issue here?
Yes. You're not copying, you're converting from one format to another (packing several unsigned integers into a single larger unsigned integer).
What would be an endian-independent way to write this copy?
The simple way is to make the conversion explicit, like:
for(int i = 0; i < something; i++) {
dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4+1] << 8) |
((uint32_t)src[i*4+2] << 16) | ((uint32_t)src[i*4+3] << 24);
}
However, where using memcpy() gives the right answer (i.e. on little-endian targets) it is likely to be faster, and since the target's byte order is fixed at compile time, you could choose between the two at compile time:
#ifdef BIG_ENDIAN
for(int i = 0; i < something; i++) {
dest[i] = (uint32_t)src[i*4] | ((uint32_t)src[i*4+1] << 8) |
((uint32_t)src[i*4+2] << 16) | ((uint32_t)src[i*4+3] << 24);
}
#else
memcpy(dest, src, something*4);
#endif
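Note: you'd also have to define the BIG_ENDIAN macro when appropriate, e.g. by passing -D BIG_ENDIAN on the compiler command line when you know the target architecture needs it. (Be aware that some system headers, such as glibc's <endian.h>, may already define a BIG_ENDIAN macro regardless of the target, so a project-specific name is safer. GCC and Clang also predefine __BYTE_ORDER__, which can be compared against __ORDER_BIG_ENDIAN__.)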
I'm storing 16-bit values in src which aren't 16-bit-aligned which then need to be put into a 64-bit integer
That adds another problem - some architectures do not allow misaligned accesses. You need to use explicit conversion (read 2 separate uint8_t, not a misaligned uint16_t) to avoid this problem too.
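A sketch of that explicit conversion for the 16-bit-into-64-bit case from the comment. It assumes the 16-bit values were stored in src least significant byte first; if they were stored in host order or big-endian instead, the shift order has to change accordingly. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a 16-bit value stored least significant byte first at an
   arbitrary (possibly misaligned) address. Only single-byte loads are
   used, so neither alignment nor host byte order matters. */
uint16_t load_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Widens the value into a 64-bit destination. This is a value
   assignment, so the bytes cannot end up in the "wrong" half. */
void store_u16_into_u64(uint64_t *dst, const uint8_t *src, size_t byteOffset)
{
    *dst = load_le16(src + byteOffset);
}
```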

Related

Data of a struct into a union

I have declared the following union:
typedef union
{
struct
{
uint32_t data;
};
uint8_t w[4];
} xxx_data_t;
I am trying to access a memory chip over SPI, which only accepts 1 byte at a time.
I want to write the variable data, and I have thought of decomposing that 32-bit data variable into four 8-bit variables (1 byte each), thus forming the w[4] array.
My question is: is this valid? Does this create a decomposition of my 32-bit variable?
EXAMPLE
I declare xxx_data_t my_variable.
my_variable.data = 300, which in hexadecimal is 0x12C. Will the array be my_variable.w[4] = [0,0,1,44]?
data (32 bits) = 300 = 0x 00 00 01 2C
w (4 bytes)           = [0] [0] [1] [44]
Thanks all.
I think it is right. You can write a test program to test it.
#include<stdio.h>
#include<stdint.h>
typedef union
{
struct
{
uint32_t data;
};
uint8_t w[4];
}data_t;
int main(){
data_t d = {.data = 0x12c};
for(int i = 0; i < 4; i++){
printf("%d\n", d.w[i]);
}
return 0;
}
Output (on a little-endian machine):
44
1
0
0
My question is: is this valid?
Yes, you may do type punning between different types using union. Type punning to/from a character type (which uint8_t almost certainly is) is always safe. Alignment/padding shouldn't be an issue either, in this specific case.
Please note that you can only do this in C; in C++, reading a union member other than the one most recently written is undefined behavior. So think twice before using C++ for hardware-related programming.
which in hexadecimal is 0x12C. Will the array be my_variable.w[4]=[0,0,1,44]??
It depends on CPU endianness (see "What is CPU endianness?").
So you can get either 00 00 01 2C on a big-endian computer such as PowerPC, or 2C 01 00 00 on a little-endian computer such as x86.
As for which endianness you actually want, it is the network endianness. In the case of SPI, network endianness is the byte order that the part you are communicating with expects. You have to look that up in the datasheet if it's a "dumb" part like an ADC, display or similar. If it's a "smart" part like another MCU, then you can probably specify the network endianness yourself.
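If you would rather not depend on the CPU's byte order at all, you can build the byte array with shifts instead of a union; shifts operate on values, not on memory, so the result is the same on every CPU. A sketch, assuming the part expects the most significant byte first (check the datasheet); u32_to_be_bytes is a hypothetical name.

```c
#include <stdint.h>

/* Splits a 32-bit value into bytes, most significant byte first
   (big-endian / network order). Reverse the indices if the datasheet
   asks for the least significant byte first. */
void u32_to_be_bytes(uint32_t value, uint8_t out[4])
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}
```

For 300 this yields [0, 0, 1, 44] on any host, which you can then clock out over SPI one byte at a time.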

writing a byte with the "write" system call in C

Using the system call write, I am trying to write a number to a file. I want the file referred to by fileid to contain 4 as the single byte '04' (expected outcome).
unsigned int g = 4;
if (write(fileid, &g, (size_t) sizeof(int) ) == -1)
{
perror("Error"); exit(1);
}
I get the output '0000 0004' in my file. If I put one instead of sizeof(int) I get 00.
Is there a specific type that I missed?
PS. I also have to read this value back from the file, so if there isn't such a type I'm not quite sure how I would go about doing that.
Whether writing 1 byte of g produces 00 or 04 depends on the architecture. On most common platforms (e.g. x86), 32-bit integers are stored in memory in little-endian order, meaning the least significant byte comes first; therefore the 32-bit int 4 is stored as 04 00 00 00 and the first byte is 04.
But this is not always true. Some architectures store values in big-endian order, so the byte order in memory is the same as the value reads in 32-bit hexadecimal: 00 00 00 04.
See the Wikipedia article on endianness.
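A small sketch to check which case applies on your machine: it inspects the byte at the lowest address of an unsigned int, which is exactly the byte that write(fd, &g, 1) would emit. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the byte stored at the lowest address of g. */
uint8_t first_byte_in_memory(unsigned int g)
{
    uint8_t first;
    memcpy(&first, &g, 1);
    return first;
}

/* 1 on little-endian hosts (the first byte is the least significant one),
   0 on big-endian hosts. */
int host_is_little_endian(void)
{
    return first_byte_in_memory(1u) == 1u;
}
```

first_byte_in_memory(4u) returns 4 on x86 and 0 on a big-endian host.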
sizeof(int) is 4 on most platforms, so the code is actually writing four bytes.
Change the type of g from unsigned int to unsigned char, and change sizeof(int) to sizeof(unsigned char) (or sizeof(g)).
Then you should see that only one byte, 04, is written.
In this circumstance I would recommend using uint8_t, which is defined in <stdint.h>. On basically all systems you will ever encounter, this is a typedef for unsigned char, but using this name makes it clearer that the value in the variable is being treated as a number, not a character.
uint8_t g = 4;
if (write(fileid, &g, 1) != 1) {
perror("write");
exit(1);
}
(sizeof(char) == 1 by definition, and therefore so is sizeof(uint8_t).)
To understand why your original code did not behave as you expected, read up on endianness.
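For the PS in the question, reading the byte back is symmetrical. A minimal sketch using only POSIX write() and read() on a file descriptor (write_byte and read_byte are hypothetical names). A single byte has no internal byte order, so the file content is the same on every architecture.

```c
#include <stdint.h>
#include <unistd.h>

/* Writes exactly one byte to fd. Returns 0 on success, -1 on failure. */
int write_byte(int fd, uint8_t value)
{
    return write(fd, &value, 1) == 1 ? 0 : -1;
}

/* Reads exactly one byte from fd into *value.
   Returns 0 on success, -1 on error or end of file. */
int read_byte(int fd, uint8_t *value)
{
    return read(fd, value, 1) == 1 ? 0 : -1;
}
```

Usage: write_byte(fileid, 4) produces the single byte 04; after reopening the file (or seeking back with lseek), read_byte(fileid, &g) restores it.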
If you want to save only one byte, it will be more appropriate to create a variable that is of size one byte and save it using write.
unsigned int g = 4;
unsigned char c = (unsigned char)g;
if (write(fileid, &c, 1 ) == -1)
{
perror("Error"); exit(1);
}
That way, if any data is lost (for example, values above 255), it happens in the explicit conversion inside the program, not on the way into or out of the file.

Integer Conversion for Char Array

I've been trying to brush up on my C recently and was writing a program to manually parse through a PNG file.
I viewed the PNG file in a hex editor and noticed a stream of bytes that looked like
00 00 00 0D
in hex format.
This string supposedly represents a length that I am interested in.
I used getc(file) to pull in the bytes of the PNG file.
I created a char array as
char example[8];
to store the characters retrieved from getc.
Now, I have populated example and printing it with
printf("%#x, %#x, %#x, %#x", example[0]....
shows 0, 0, 0, 0xd which is exactly what I want.
However when I use
int x = atoi(example)
or
int x = strtol(example, NULL, 16)
I get back zero in both cases (I was expecting 13). Am I missing something fundamental?
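atoi converts strings like "0" to their numeric equivalent, in this case 0. What you have instead are raw bytes such as \0 \0 \0 \r (0x0D), which are nowhere near numeric characters; since the very first byte is '\0', both atoi and strtol see an empty string and return 0.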
If you want to interpret your bytes as a number you could do something like
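char example[4] = {0, 0, 0, 0xd};
/* Reinterprets the 4 bytes as a host-order integer; this cast is only for
   illustration (it ignores alignment and strict-aliasing rules). */
printf("%u\n", (unsigned)*(uint32_t *)example);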
You will notice (if you're using an x86 CPU) that you get 218103808 (0x0D000000) instead of 13,
due to little endianness: the further right a byte is, the more significant it is.
As PNG uses big endian, you can simply use be32toh (big endian to host endianness; a glibc/BSD extension declared in <endian.h> on glibc, not part of standard C):
uint32_t n;
memcpy(&n, example, sizeof n);        /* copy the 4 bytes without an unaligned cast */
printf("%u\n", (unsigned)be32toh(n)); /* prints 13 */
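atoi and strtol expect text strings, while you have an array of binary values. To combine the individual bytes of an array into a larger integer, try something like:
/* casts avoid sign extension of plain char and overflow of int for bytes >= 0x80 */
uint32_t x = ((uint32_t)(unsigned char)a[0] << 24) | ((uint32_t)(unsigned char)a[1] << 16) | ((uint32_t)(unsigned char)a[2] << 8) | (unsigned char)a[3];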
atoi etc. operate on (ASCII) text strings.
You would get 123 for "123", which is the bytes 49 50 51 0.
What you have instead is binary 00 00 00 7B ... (well, endianness matters too).
A simple, but in this case wrong, solution (ignoring endianness):
cast the array address to int* and then get a value with *.
As integers in PNG are always big endian,
the pointer cast would only work on big-endian machines.
As a portable solution, shifting the bytes by 24, 16, 8 and 0 and OR-ing them together will do.
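The shift-and-OR approach from these answers, wrapped in a small helper (read_be32 is a hypothetical name). Working on unsigned char and converting each byte to uint32_t before shifting avoids sign extension of plain char and overflow of int for bytes of 0x80 and above.

```c
#include <stdint.h>

/* Combines four bytes stored most significant first (PNG's byte order)
   into a host integer, independent of host endianness and alignment. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

With the question's buffer, read_be32((const unsigned char *)example) yields 13 on any host.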

Structure and pointer

I'm having a problem getting the memory address of a member variable of a structure. I've tried two ways, one of which didn't work properly. It would be very good if you could give me some advice.
First, I defined a structure named BITMAP_HEADER.
struct BITMAP_HEADER
{
WORD bfType ;
DWORD bfSize ;
WORD bfReserved1 ;
WORD bfReserved2 ;
DWORD bfOffBits ;
} ;
Second, I defined and initialized some variables. Please look at the code below before you read the next line. In case you ask why I used a character pointer: I needed to access each byte of the integer bfSize.
struct BITMAP_HEADER bitmap_header ;
char* pSize = (char*)&bitmap_header.bfSize;
Third, I obtained the address of bfSize in two different ways and printed the byte found there.
1. printf("%X\n", *pSize) ;
2. printf("%X\n", (unsigned char)*(((char*)&bitmap_header)+2)) ;
(1) takes the address of bitmap_header.bfSize directly.
(2) takes the address of the whole BITMAP_HEADER structure and advances the pointer by 2 bytes.
Finally, here is the result.
2D
F6
For your information, here is the hex data of the structure BITMAP_HEADER.
42 4D / F6 C6 2D 00 / 00 00 / 00 00 / 36 00 00 00
Why didn't the first method work? I thought the two methods were exactly the same.
You're running into structure padding here. The compiler is inserting two bytes of padding between the bfType and bfSize fields to align bfSize on a 4-byte boundary, since bfSize is a DWORD.
Generally speaking, you cannot rely on being able to calculate exact offsets within a structure, since the compiler might add padding between members. You can control this to some degree using compiler-specific features (for example, the pack pragma on MSVC), but I would not recommend it. Structure padding exists to satisfy member alignment requirements, and some architectures will fault on unaligned accesses. (Others might fix up unaligned accesses transparently, but typically do so rather slowly.)
See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
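A quick way to see the padding on your own compiler is offsetof from <stddef.h>. A sketch, assuming WORD is 16 bits and DWORD is 32 bits (as on Windows), with fixed-width types substituted so it compiles anywhere; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same member order as BITMAP_HEADER, using fixed-width types. */
struct bitmap_header_demo {
    uint16_t bfType;
    uint32_t bfSize;
    uint16_t bfReserved1;
    uint16_t bfReserved2;
    uint32_t bfOffBits;
};

/* Number of padding bytes the compiler placed between bfType and bfSize. */
size_t padding_after_bfType(void)
{
    return offsetof(struct bitmap_header_demo, bfSize) - sizeof(uint16_t);
}
```

On typical compilers padding_after_bfType() returns 2, so bfSize starts at offset 4, which is why *pSize and the "base + 2" pointer in the question see different bytes.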
For raw data whose layout is known in advance, it is usually better to read it into a byte array and use defined offsets to access the required fields. This way you won't have to worry about the compiler's behaviour (which is often not what you expect). Your code would look like:
#define FIELD_TYPE 0
#define FIELD_SIZE 2
#define FIELD_RES1 6
#define FIELD_RES2 8
#define FIELD_OFF 10
#define SIZE_HEADER 14
static uint8_t header[SIZE_HEADER];
<...>
uint8_t * pheader = header;
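/* Reads all 4 bytes of bfOffBits; assumes a little-endian host that
   tolerates unaligned loads (see the notes below). */
DWORD offset_bits = *(DWORD *)(pheader + FIELD_OFF);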
P.S. To make this code portable, the sizes of WORD and DWORD and the host endianness must be considered; a few #ifdef ... #else ... #endif blocks should help with that.
P.P.S. It would be even better to assemble the value manually with shifts and bitwise OR instead of casting, but I left it this way for the sake of brevity.
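Following the P.P.S., a sketch that assembles the little-endian BMP fields with shifts, using the same offsets as the #defines above. It needs no casts, works on unaligned data, and gives the same result on big- and little-endian hosts; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Byte offsets of the BMP file header fields (same values as above). */
#define FIELD_TYPE 0
#define FIELD_SIZE 2
#define FIELD_OFF 10
#define SIZE_HEADER 14

/* BMP stores multi-byte fields least significant byte first. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint16_t type;
    uint32_t size;
    uint32_t offset_bits;
} bmp_file_header;

bmp_file_header parse_bmp_file_header(const uint8_t header[SIZE_HEADER])
{
    bmp_file_header h;
    h.type = get_le16(header + FIELD_TYPE);
    h.size = get_le32(header + FIELD_SIZE);
    h.offset_bits = get_le32(header + FIELD_OFF);
    return h;
}
```

For the header bytes shown in the question (42 4D F6 C6 2D 00 ...), this yields type 0x4D42 ("BM"), size 0x002DC6F6 and offset_bits 54.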

Handling xor with different key sizes and endianess

I am playing around with xor decoding via a small C file, and am running into issues with endianness; I am a bit stuck on how to work around them. This is really the first time I've played this deeply with bitwise operations in C.
If I use a one-byte xor key and pick up several xor-encoded values into a uint8_t pointer, my basic code works fine. Walk each byte, xor it against the key, and store the result in a decoded byte array/buffer and then print it back to the console.
However, if I try a two-byte xor key, then endianness starts to get in the way. I currently stick the key into a uint32_t, because I don't plan on dealing with xor keys greater than 32 bits. On a little-endian system, an xor key of 0xc39f gets stored in memory as the bytes 9f c3. The bytes to be decoded are big-endian if I play them back one byte at a time, but they too get flipped to little-endian if I try to play them back two bytes at a time (same size as the xor key).
I am tempted to #include <byteswap.h> and then call bswap_32(). But while this will work on little endian, it might have the opposite effect on big endian. I assume then I'd need ugly #ifdefs to only use bswap_32() for little-endian archs. I figure there has got to be a more portable way for this to work.
Random sample string:
g e n e r a t e
67 65 6e 65 72 61 74 65
Xor 0xc39f
a4 fa ad fa b1 fe b7 fa
If I play back the xor-encoded buffer with two-byte (uint16_t) pointers, I get this (via a basic printf):
0xfaa4 0xfaad 0xfeb1 0xfab7
And with four-byte pointers (uint32_t):
0xfaadfaa4 0xfab7feb1
I would expect for the above, to get instead for two-byte pointers:
0xa4fa 0xadfa 0xb1fe 0xb7fa
And four-byte pointers:
0xa4faadfa 0xb1feb7fa
Thoughts?
Edit: Any takers? Current answers aren't adequate to my needs.
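You're overthinking this: just treat your xor key as an endianness-free binary blob, and convert it to a native uint32_t for performance. Because the data is loaded into native uint32_t words the same way, the byte orders of key and data always match: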
void xor_encrypt_slow(uint8_t *data, size_t len, uint8_t key[4])
{
// key is a 4-byte xor key
size_t i;
for(i = 0; i < len; i++)
data[i] ^= key[i % 4];
}
void xor_encrypt_fast(uint8_t *data, size_t len, uint8_t key[4])
{
// Convert key to a native 32-bit value (memcpy avoids aliasing/alignment issues)
uint32_t key32;
memcpy(&key32, key, sizeof key32);
// Load each 4-byte chunk of data the same way, so data and key share
// the same byte order whatever the host endianness; no alignment needed
size_t i;
for(i = 0; i + 3 < len; i += 4) {
uint32_t word;
memcpy(&word, data + i, sizeof word);
word ^= key32;
memcpy(data + i, &word, sizeof word);
}
// Handle the remainder, if len is not a multiple of 4
for( ; i < len; i++)
data[i] ^= key[i % 4];
}
Try using the htonl() function (often implemented as a macro), which is designed exactly for this purpose. It stands for "host to network long" and is defined to swap (or not swap) bytes so that the resulting values are big-endian, as required before transmitting them over the network.
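Following the htonl() suggestion, a sketch that uses its counterparts ntohs() and ntohl() from <arpa/inet.h> (POSIX) to read the encoded buffer most significant byte first, which reproduces the values the question expects. Keeping the key as a byte array rather than a uint32_t sidesteps the ordering problem for the key itself. The helper names are hypothetical.

```c
#include <arpa/inet.h>  /* ntohs, ntohl (POSIX) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR each byte with the key bytes in the order they are written
   (0xc3 then 0x9f for the key 0xc39f); only single-byte accesses. */
void xor_bytes(uint8_t *data, size_t len, const uint8_t *key, size_t keylen)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % keylen];
}

/* Load 2 or 4 bytes and convert from big-endian (network) order to host
   order, so the printed value matches the byte order in the buffer. */
uint16_t load_be16(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return ntohs(v);
}

uint32_t load_be32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return ntohl(v);
}
```

For "generate" xored with the key bytes c3 9f, load_be16 on consecutive pairs gives 0xa4fa, 0xadfa, 0xb1fe, 0xb7fa, and load_be32 gives 0xa4faadfa and 0xb1feb7fa, on either kind of host.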
