Typecasting Arrays for Variable Width Access

Typecasting Arrays for Variable Width Access - c

Sorry, I am not sure if I wrote the title accurately.
But first, here are my constraints:
Array[], used as a register map, is declared as an unsigned 8-bit array (uint8_t),
this is so that indexing(offset) is per byte.
Data to be read/written into the array has varying width (8-bit, 16-bit, 32-bit and 64-bit).
Very Limited Memory and Speed is a must.
What are the caveats in doing the following
uint8_t some_function(uint16_t offset_addr) //16bit address
{
uint8_t Array[0x100];
uint8_t data_byte = 0xAA;
uint16_t data_word;
uint32_t data_double = 0xBEEFFACE;
\\ A. Storing wider-data into the array
*((uint32_t *) &Array[offset_addr]) = data_double;
\\ B. Reading multiple-bytes from the array
data_word = *((uint16_t *) &Array[offset_addr]);
return 0;
}
I know i could try writing the data per byte, but that would be slow due to bit shifting.
Is there going to be a significant problem with this usage?
I have run this on my hardware and have not seen any problems so far, but I want to take note of potential problems this implementation might cause.

Is there going to be a significant problem with this usage?
It produces undefined behavior. Therefore, even if in practice that manifests as you intend on your current C implementation, hardware, program, and data, you might find that it breaks unexpectedly when something (anything) changes.
Even if the compiler implements the cast and dereference in the obvious way (which it is not obligated to do, because UB) misaligned accesses resulting from your approach will at least slow many CPUs, and will produce traps on some.
The standard-conforming way to do what you want is this:
uint8_t some_function(uint16_t offset_addr) {
uint8_t Array[0x100];
uint8_t data_byte = 0xAA;
uint16_t data_word;
uint32_t data_double = 0xBEEFFACE;
\\ A. Storing wider-data into the array
memcpy(Array + offset_addr, &data_double, sizeof data_double);
\\ B. Reading multiple-bytes from the array
memcpy(&data_word, Array + offset_addr, sizeof data_word);
return 0;
}
This is not necessarily any slower than your version, and it has defined behavior as long as you do not overrun the bounds of your array.

This is probably fine. Many have done things like this. C performs well with this kind of thing.
Two things to watch out for:
Buffer overruns. You know those zero-days like Eternal Blue and hacks like WannaCry? Many of them exploited bugs in code like yours. Malicious input caused the code to write too much stuff into data structures like your uint8_t Array[0x100]. Be careful. Avoid allocating buffers on the stack (as function-local variables) as you have done because clobbering the stack is exploitable. Make them big enough. Check that you don't overrun them.
Machine byte ordering vs. network byte ordering, aka endianness. If these data structures move from machine to machine over the net you may get into trouble.

Related

Why its not recommended to use pointer for array access in C

I'm learning C programming and I cam across this tutorial online, which state that you should always prefer using [] operator over pointer arithmetic as much as possible.
https://www.cs.swarthmore.edu/~newhall/unixhelp/C_arrays.html#dynamic
you can use pointer arithmetic (but in general don't)
consider the following code in C
int *p_array;
p_array = (int *)malloc(sizeof(int)*50);
for(i=0; i < 50; i++) {
p_array[i] = 0;
}
What is the difference in doing it using pointer arithmetic like the following code (and why its not recommended)?
int *p_array;
p_array = (int *)malloc(sizeof(int)*50); // allocate 50 ints
int *dptr = p_array;
for(i=0; i < 50; i++) {
*dptr = 0;
dptr++;
}
What are the cases where using pointer arithmetic can cause issues in the software? is it bad practice or is it inexperienced engineer can be not paying attention?

Since there seems to be all out confusion on this:
In the old days, we had 16bit CPU's think 8088, 268 etc.
To formulate an address you had to load your segment register (16 bit register) and your address register. if accessing an array, you could load your array base into the segment register and the address register would be the index.
C compilers for these platforms did exist but pointer arithmetic involved checking the address for overruns and bumping the segment register if necessary (inefficient) Flat addressed pointers simply weren't possible in hardware.
Fast forward to the 80386 Now we have a full 32 bit space. Hardware pointers are possible Index + base addressing incurs a 1 clock cycle penalty. The segments are also 32 bit though, so arrays can be loaded using segments avoiding this penalty even if you are running 32 bit mode. The 368 also increases the number of segment registers by 2. (No idea why Intel thought that was a good idea) There was still a lot of 16bit code around though
These days, segment registers are disabled in 64 bit mode, Base+Index addressing is free.
Is there any platform where a flat pointer can outperform array addressing in hardware ? Well yes. the Motorola 68000 released in 1979 has a flat 32 bit address space, no segments and the Base + Index addressing mode incurs an 8 clock cycle penalty over immediate addressing. So if you're programming a early 80's era Sun station, Apple Lisa etc. this might be relevant.
In short. If you want an array, use an array. If you want a pointer use a pointer. Don't try and out smart your compiler. Convoluted code to turn arrays into pointers is exceedingly unlikely to provide any benefit, and may be slower.

This code is not recommended:
int *p_array;
p_array = (int *)malloc(sizeof(int)*50); // allocate 50 ints
int *dptr = p_array;
for(i=0; i < 50; i++) {
*dptr = 0;
dptr++;
}
because 1) for no reason you have two different pointers that point to the same place, 2) you don't check the result of malloc() -- it's known to return NULL occasionally, 3) the code is not easy to read and 4) it's easy to make a silly mistake very hard to spot later on.
All in all, I'd recommend to use this instead:
int array[50] = { 0 }; // make sure it's zero-initialized
int* p_array = array; // if you must =)

In your example, without compiler optimizations, pointer arithmetic may be more efficient, because it is easier to just increment a pointer than to calculate a new offset in every single loop iteration. However, most modern CPUs are optimized in such a way that accessing memory with an offset does not incur a (significant) performance penalty.
Even if you happen to be programming on a platform in which pointer arithmetic is faster, then it is likely that, if you activate compiler optimizations ("-O3" on most compilers), the compiler will use whatever method is fastest.
Therefore, it is mostly a matter of personal preference whether you use pointer arithmetic or not.
Code using array indexing instead of pointer arithmetic is generally easier to understand and less prone to errors.
Another advantage of not using pointer arithmetic is that pointer aliasing may be less of an issue (because you are using less pointers). That way, the compiler may have more freedom in optimizing your code (making your code faster).

Replacing Bitfields with Bitshifting in an Embedded Register Struct

I'm trying to get a little fancier with how I write my drivers for peripherals in embedded applications.
Naturally, reading and writing to predefined memory mapped areas is a common task, so I try to wrap as much stuff up in a struct as I can.
Sometimes, I want to write to the whole register, and sometimes I want to manipulate a subset of bits in this register. Lately, I've read some stuff that suggests making a union that contains a single uintX type that's big enough to hold the whole register (usually 8 or 16 bits), as well as a struct that has a collection of bitfields in it to represent the specific bits of that register.
After reading a few comments on some of these posts that have this strategy outlined for managing multiple control/status registers for a peripheral, I concluded that most people with experience in this level of embedded development dislike bitfields largely due to lack of portability and eideness issues between different compilers...Not to mention that debugging can be confounded by bitfields as well.
The alternative that most people seem to recommend is to use bit shifting to ensure that the driver will be portable between platforms, compilers and environments, but I have had a hard time seeing this in action.
My question is:
How do I take something like this:
typedef union data_port
{
uint16_t CCR1;
struct
{
data1 : 5;
data2 : 3;
data3 : 4;
data4 : 4;
}
}
And get rid of the bitfields and convert to a bit-shifting scheme in a sane way?
Part 3 of this guys post here describes what I'm talking about in general...Notice at the end, he puts all the registers (wrapped up as unions) in a struct and then suggests to do the following:
define a pointer to refer to the can base address and cast it as a pointer to the (CAN) register file like the following.
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
What the hell is this cute little move all about? CAN0 is a pointer to a pointer to a function of a...number that's #defined as CAN_BASE_ADDRESS? I don't know...He lost me on that one.

The C standard does not specify how much memory a sequence of bit-fields occupies or what order the bit-fields are in. In your example, some compilers might decide to use 32 bits for the bit-fields, even though you clearly expect it to cover 16 bits. So using bit-fields locks you down to a specific compiler and specific compilation flags.
Using types larger than unsigned char also has implementation-defined effects, but in practice it is a lot more portable. In the real world, there are just two choices for an uintNN_t: big-endian or little-endian, and usually for a given CPU everybody uses the same order because that's the order that the CPU uses natively. (Some architectures such as mips and arm support both endiannesses, but usually people stick to one endianness across a large range of CPU models.) If you're accessing a CPU's own registers, its endianness may be part of the CPU anyway. On the other hand, if you're accessing a peripheral, you need to take care.
The documentation of the device that you're accessing will tell you how big a memory unit to address at once (apparently 2 bytes in your example) and how the bits are arranged. For example, it might state that the register is a 16-bit register accessed with a 16-bit load/store instructions whatever the CPU's endianness is, that data1 encompasses the 5 low-order bits, data2 encompasses the next 3, data3 the next 4 and data4 the next 4. In this case, you would declare the register as a uint16_t.
typedef volatile uint16_t data_port_t;
data_port_t *port = GET_DATA_PORT_ADDRESS();
Memory addresses in devices almost always need to be declared volatile, because it matters that the compiler reads and writes to them at the right time.
To access the parts of the register, use bit-shift and bit-mask operators. For example:
#define DATA2_WIDTH 3
#define DATA2_OFFSET 5
#define DATA2_MAX (((uint16_t)1 << DATA2_WIDTH) - 1) // in binary: 0000000000000111
#define DATA2_MASK (DATA2_MAX << DATA2_OFFSET) // in binary: 0000000011100000
void set_data2(data_port_t *port, unsigned new_field_value)
{
assert(new_field_value <= DATA2_MAX);
uint16_t old_register_value = *port;
// First, mask out the data2 bits from the current register value.
uint16_t new_register_value = (old_register_value & ~DATA2_MASK);
// Then mask in the new value for data2.
new_register_value |= (new_field_value << DATA2_OFFSET);
*port = new_register_value;
}
Obviously you can make the code a lot shorter. I separated it out into individual tiny steps so that the logic should be easy to follow. I include a shorter version below. Any compiler worth its salt should compile to the same code except in non-optimizing mode. Note that above, I used an intermediate variable instead of doing two assignments to *port because doing two assignments to *port would change the behavior: it would cause the device to see the intermediate value (and another read, since |= is both a read and a write). Here's the shorter version, and a read function:
void set_data2(data_port_t *port, unsigned new_field_value)
{
assert(new_field_value <= DATA2_MAX);
*port = (*port & ~(((uint16_t)1 << DATA2_WIDTH) - 1) << DATA2_OFFSET))
| (new_field_value << DATA2_OFFSET);
}
unsigned get_data2(data_port *port)
{
return (*port >> DATA2_OFFSET) & DATA2_MASK;
}
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
There is no function here. A function declaration would have a return type followed by an argument list in parentheses. This takes the value CAN_BASE_ADDRESS, which is presumably a pointer of some type, then casts the pointer to a pointer to CAN_REG_FILE, and finally dereferences the pointer. In other words, it accesses the CAN register file at the address given by CAN_BASE_ADDRESS. For example, there may be declarations like
void *CAN_BASE_ADDRESS = (void*)0x12345678;
typedef struct {
const volatile uint32_t status;
volatile uint16_t foo;
volatile uint16_t bar;
} CAN_REG_FILE;
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
and then you can do things like
CAN0.foo = 42;
printf("CAN0 status: %d\n", (int)CAN0.status);

1.
The problem when getting rid of bitfields is that you can no more use simple assignment statements, but you must shift the value to write, create a mask, make an AND to wipe out the previous bits, and use an OR to write the new bits. Reading is similar reversed. For example, let's take an 8-bit register defined like this:
val2.val1
0000.0000
val1 is the lower 4 bits, and val2 is the upper 4. The whole register is named REG.
To read val1 into tmp, one should issue:
tmp = REG & 0x0F;
and to read val2:
tmp = (REG >> 4) & 0xF; // AND redundant in this particular case
or
tmp = (REG & 0xF0) >> 4;
But to write tmp to val2, for example, you need to do:
REG = (REG & 0x0F) | (tmp << 4);
Of course some macro can be used to facilitate this, but the problem, for me, is that reading and writing require two different macros.
I think that bitfield is the best way, and a serious compiler should have options to define endiannes and bit ordering of such bitfields. Anyway, this is the future, even if, for now, maybe not every compiler has full support.
2.
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
This macro defines CAN0 as a dereferenced pointer to the base address of the CAN register(s), no function declaration is involved. Suppose you have an 8-bit register at address 0x800. You could do:
#define REG_BASE 0x800 // address of the register
#define REG (*(uint8_t *) REG_BASE)
REG = 0; // becomes *REG_BASE = 0
tmp = REG; // tmp=*REG_BASE
Instead of uint_t you can use a struct type, and all the bits, and probably all the bytes or words, go magically to their correct place, with the right semantics. Using a good compiler of course - but who doesn't want to deploy a good compiler?
Some compilers have/had extensions to assign a given address to a variable; for example old turbo pascal had the ABSOLUTE keyword:
var CAN: byte absolute 0x800:0000; // seg:ofs...!
The semantic is the same as before, only more straightforward because no pointer is involved, but this is managed by the macro and the compiler automatically.

Can I cast pointers like this?

Code:
unsigned char array_add[8]={0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};
...
if ((*((uint32_t*)array_add)!=0)||(*((uint32_t*)array_add+1)!=0))
{
...
}
I want to check if the array is all zero. So naturally I thought of casting the address of an array, which also happens to be the address of the first member, to an unsigned int 32 type, so I'll only need to do this twice, since it's a 64 bit, 8 byte array. Problem is, it was successfully compiled but the program crashes every time around here.
I'm running my program on an 8bit microcontroller, cortex-M0.
How wrong am I?

In theory this could work but in practice there is a thing you aren't considering: aligned memory accesses.
If a uint32_t requires aligned memory access (eg to 4 bytes), then casting an array of unsigned char which has 1 byte alignment requirement to an uint32_t* produces a pointer to an unaligned array of uint32_t.
According to documentation:
There is no support for unaligned accesses on the Cortex-M0 processor. Any attempt to perform an unaligned memory access operation results in a HardFault exception.
In practice this is just dangerous and fragile code which invokes undefined behavior in certain circumstances, as pointed out by Olaf and better explained here.

To test multiple bytes as once code could use memcmp().
How speedy this is depends more on the compiler as a optimizing compiler may simple emit code that does a quick 8 byte at once (or 2 4-byte) compare. Even the memcmp() might not be too slow on an 8-bit processor. Profiling code helps.
Take care in micro-optimizations, as they too often are not efficient use of coders` time for significant optimizations.
unsigned char array_add[8] = ...
const unsigned char array_zero[8]={0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};
if (memcmp(array_zero, array_add, 8) == 0) ...
Another method uses a union. Be careful not to assume if add.arr8[0] is the most or least significant byte.
union {
uint8_t array8[8];
uint64_t array64;
} add;
// below code will check all 8 of the add.array8[] is they are zero.
if (add.array64 == 0)
In general, focus on writing clear code and reserve such small optimizations to very select cases.

I am not sure but if your array has 8 bytes then just assign base address to a long long variable and compare it to 0. That should solve your problem of checking if the array is all 0.
Edit 1: After Olaf's comment I would say that replace long long with int64_t. However, why do you not a simple loop for iterating the array and checking. 8 chars is all you need to compare.
Edit 2: The other approach could be to OR all elements of array and then compare with 0. If all are 0 then OR will be zero. I do not know whether CMP will be fast or OR. Please refer to Cortex-M0 docs for exact CPU cycles requirement, however, I would expect CMP to be slower.

How do I force the program to use unaligned addresses?

I've heard reads and writes of aligned int's are atomic and safe, I wonder when does the system make non malloc'd globals unaligned other than packed structures and casting/pointer arithmetic byte buffers?
[X86-64 linux] In all of my normal cases, the system always chooses integer locations that don't get word torn, for example, two byte on one word and the other two bytes on the other word. Can any one post a program/snip (C or assembly) that forces the global variable to unaligned address such that the integer gets torn and the system has to use two reads to load one integer value ?
When I print the below program, the addresses are close to each other such that multiple variables are within 64bits but never once word tearing is seen (smartness in the system or compiler ?)
#include <stdio.h>
int a;
char b;
char c;
int d;
int e = 0;
int isaligned(void *p, int N)
{
if (((int)p % N) == 0)
return 1;
else
return 0;
}
int main()
{
printf("processor is %d byte mode \n", sizeof(int *));
printf ( "a=%p/b=%p/c=%p/d=%p/f=%p\n", &a, &b, &c, &d, &e );
printf ( " check for 64bit alignment of test result of 0x80 = %d \n", isaligned( 0x80, 64 ));
printf ( " check for 64bit alignment of a result = %d \n", isaligned( &a, 64 ));
printf ( " check for 64bit alignment of d result = %d \n", isaligned( &e, 64 ));
return 0;}
Output:
processor is 8 byte mode
a=0x601038/b=0x60103c/c=0x60103d/d=0x601034/f=0x601030
check for 64bit alignment of test result of 0x80 = 1
check for 64bit alignment of a result = 0
check for 64bit alignment of d result = 0
How does a read of a char happen in the above case ? Does it read from 8 byte aligned boundary (in my case 0x601030 ) and then go to 0x60103c ?
Memory access granularity is always word size isn't it ?
Thx.

1) Yes, there is no guarantee that unaligned accesses are atomic, because [at least sometimes, on certain types of processors] the data may be written as two separate writes - for example if you cross over a memory page boundary [I'm not talking about 4KB pages for virtual memory, I'm talking about DDR2/3/4 pages, which is some fraction of the total memory size, typically 16Kbits times whatever the width is of the actual memory chip - which will vary depending on the memory stick itself]. Equally, on other processors than x86, you get a trap for reading unaligned memory, which would either cause the program to abort, or the read be emulated in software as multiple reads to "fix" the unaligned read.
2) You could always make an unaligned memory region by something like this:
char *ptr = malloc(sizeof(long long) * number+1);
long long *unaligned = (long long *)&ptr[2];
for(i = 0; i < number; i++)
temp = unaligned[i];
By the way, your alignment check checks if the address is aligned to 64 bytes, not 64 bits. You'll have to divide by 8 to check that it's aligned to 64 bits.
3) A char is a single byte read, and the address will be on the actual address of the byte itself. The actual memory read performed is probably for a full cache-line, starting at the target address, and then cycling around, so for example:
0x60103d is the target address, so the processor will read a cache line of 32 bytes, starting at the 64-bit word we want: 0x601038 (and as soon as that's completed the processor goes on to the next instruction - meanwhile the next read will be performed to fill the cacheline), then cacheline is filled with 0x601020, 0x601028, 0x601030. But should we turn the cache off [if you want your 3GHz latest x86 processor to be slightly slower than a 66MHz 486, disabling the cache is a good way to achieve that], the processor would just read one byte at 0x60103d.
4) Not on x86 processors, they have byte addressing - but for normal memory, reads are done on a cacheline basis, as explained above.
Note also that "may not be atomic" is not at all the same as "will not be atomic" - so you'll probably have a hard time making it go wrong by will - you really need to get all the timings of two different threads just right, and straddle cachelines, straddle memory page boundaries, and so on to make it go wrong - this will happen if you don't want it to happen, but trying to make it go wrong can be darn hard [trust me, I've been there, done that].

It probably doesn't, outside of those cases.
In assembly it's trivial. Something like:
.org 0x2
myglobal:
.word SOME_NUMBER
But on Intel, the processor can safely read unaligned memory. It might not be atomic, but that might not be apparent from the generated code.
Intel, right? The Intel ISA has single-byte read/write opcodes. Disassemble your program and see what it's using.
Not necessarily - you might have a mismatch between memory word size and processor word size.

1) This answer is platform-specific. In general, though, the compiler will align variables unless you force it to do otherwise.
2) The following will require two reads to load one variable when run on a 32-bit CPU:
uint64_t huge_variable;
The variable is larger than a register, so it will require multiple operations to access. You can also do something similar by using packed structures:
struct unaligned __attribute__ ((packed))
{
char buffer[2];
int unaligned;
char buffer2[2];
} sample_struct;
3) This answer is platform-specific. Some platforms may behave like you describe. Some platforms have instructions capable of fetching a half-register or quarter-register of data. I recommend examining the assembly emitted by your compiler for more details (make sure you turn off all compiler optimizations first).
4) The C language allows you to access memory with byte-sized granularity. How this is implemented under the hood and how much data your CPU fetches to read a single byte is platform-specific. For many CPUs, this is the same as the size of a general-purpose register.

The C standards guarantee that malloc(3) returns a memory area that complies to the strictest alignment requirements, so this just can't happen in that case. If there are unaligned data, it is probably read/written by pieces (that depends on the exact guarantees the architecture provides).
On some architectures unaligned access is allowed, on others it is a fatal error. When allowed, it is normally much slower than aligned access; when not allowed the compiler must take the pieces and splice them together, and that is even much slower.
Characters (really bytes) are normally allowed to have any byte address. The instructions working with bytes just get/store the individual byte in that case.
No, memory access is according to the width of the data. But real memory access is in terms of cache lines (read up on CPU cache for this).

Non-aligned objects can never come into existence without you invoking undefined behavior. In other words, there is no sequence of actions, all having well-defined behavior, which a program can take that will result in a non-aligned pointer coming into existence. In particular, there is no portable way to get the compiler to give you misaligned objects. The closest thing is the "packed structure" many compilers have, but that only applies to structure members, not independent objects.
Further, there is no way to test alignedness in portable C. You can use the implementation-defined conversions of pointers to integers and inspect the low bits, but there is no fundamental requirement that "aligned" pointers have zeros in the low bits, or that the low bits after conversion to integer even correspond to the "least significant" bits of the pointer, whatever that would mean. In other words, conversions between pointers and integers are not required to commute with arithmetic operations.
If you really want to make some misaligned pointers, the easiest way to do it, assuming alignof(int)>1, is something like:
char buf[2*sizeof(int)+1];
int *p1 = (int *)buf, *p2 = (int *)(buf+sizeof(int)+1);
It's impossible for both buf and buf+sizeof(int)+1 to be simultaneously aligned for int if alignof(int) is greater than 1. Thus at least one of the two (int *) casts gets applied to a misaligned pointer, invoking undefined behavior, and the typical result is a misaligned pointer.

Safely punning char* to double in C

In an Open Source program I
wrote, I'm reading binary data (written by another program) from a file and outputting ints, doubles,
and other assorted data types. One of the challenges is that it needs to
run on 32-bit and 64-bit machines of both endiannesses, which means that I
end up having to do quite a bit of low-level bit-twiddling. I know a (very)
little bit about type punning and strict aliasing and want to make sure I'm
doing things the right way.
Basically, it's easy to convert from a char* to an int of various sizes:
int64_t snativeint64_t(const char *buf)
{
/* Interpret the first 8 bytes of buf as a 64-bit int */
return *(int64_t *) buf;
}
and I have a cast of support functions to swap byte orders as needed, such
as:
int64_t swappedint64_t(const int64_t wrongend)
{
/* Change the endianness of a 64-bit integer */
return (((wrongend & 0xff00000000000000LL) >> 56) |
((wrongend & 0x00ff000000000000LL) >> 40) |
((wrongend & 0x0000ff0000000000LL) >> 24) |
((wrongend & 0x000000ff00000000LL) >> 8) |
((wrongend & 0x00000000ff000000LL) << 8) |
((wrongend & 0x0000000000ff0000LL) << 24) |
((wrongend & 0x000000000000ff00LL) << 40) |
((wrongend & 0x00000000000000ffLL) << 56));
}
At runtime, the program detects the endianness of the machine and assigns
one of the above to a function pointer:
int64_t (*slittleint64_t)(const char *);
if(littleendian) {
slittleint64_t = snativeint64_t;
} else {
slittleint64_t = sswappedint64_t;
}
Now, the tricky part comes when I'm trying to cast a char* to a double. I'd
like to re-use the endian-swapping code like so:
union
{
double d;
int64_t i;
} int64todouble;
int64todouble.i = slittleint64_t(bufoffset);
printf("%lf", int64todouble.d);
However, some compilers could optimize away the "int64todouble.i" assignment
and break the program. Is there a safer way to do this, while considering
that this program must stay optimized for performance, and also that I'd
prefer not to write a parallel set of transformations to cast char* to
double directly? If the union method of punning is safe, should I be
re-writing my functions like snativeint64_t to use it?
I ended up using Steve Jessop's answer because the conversion functions re-written to use memcpy, like so:
int64_t snativeint64_t(const char *buf)
{
/* Interpret the first 8 bytes of buf as a 64-bit int */
int64_t output;
memcpy(&output, buf, 8);
return output;
}
compiled into the exact same assembler as my original code:
snativeint64_t:
movq (%rdi), %rax
ret
Of the two, the memcpy version more explicitly expresses what I'm trying to do and should work on even the most naive compilers.
Adam, your answer was also wonderful and I learned a lot from it. Thanks for posting!

I highly suggest you read Understanding Strict Aliasing. Specifically, see the sections labeled "Casting through a union". It has a number of very good examples. While the article is on a website about the Cell processor and uses PPC assembly examples, almost all of it is equally applicable to other architectures, including x86.

Since you seem to know enough about your implementation to be sure that int64_t and double are the same size, and have suitable storage representations, you might hazard a memcpy. Then you don't even have to think about aliasing.
Since you're using a function pointer for a function that might easily be inlined if you were willing to release multiple binaries, performance must not be a huge issue anyway, but you might like to know that some compilers can be quite fiendish optimising memcpy - for small integer sizes a set of loads and stores can be inlined, and you might even find the variables are optimised away entirely and the compiler does the "copy" simply be reassigning the stack slots it's using for the variables, just like a union.
int64_t i = slittleint64_t(buffoffset);
double d;
memcpy(&d,&i,8); /* might emit no code if you're lucky */
printf("%lf", d);
Examine the resulting code, or just profile it. Chances are even in the worst case it will not be slow.
In general, though, doing anything too clever with byteswapping results in portability issues. There exist ABIs with middle-endian doubles, where each word is little-endian, but the big word comes first.
Normally you could consider storing your doubles using sprintf and sscanf, but for your project the file formats aren't under your control. But if your application is just shovelling IEEE doubles from an input file in one format to an output file in another format (not sure if it is, since I don't know the database formats in question, but if so), then perhaps you can forget about the fact that it's a double, since you aren't using it for arithmetic anyway. Just treat it as an opaque char[8], requiring byteswapping only if the file formats differ.

The standard says that writing to one field of a union and reading from it immediately is undefined behaviour. So if you go by the rule book, the union based method won't work.
Macros are usually a bad idea, but this might be an exception to the rule. It should be possible to get template-like behaviour in C using a set of macros using the input and output types as parameters.

As a very small sub-suggestion, I suggest you investigate if you can swap the masking and the shifting, in the 64-bit case. Since the operation is swapping bytes, you should be able to always get away with a mask of just 0xff. This should lead to faster, more compact code, unless the compiler is smart enough to figure that one out itself.
In brief, changing this:
(((wrongend & 0xff00000000000000LL) >> 56)
into this:
((wrongend >> 56) & 0xff)
should generate the same result.

Edit:
Removed comments regarding how to effectively store data always big endian and swapping to machine endianess, as questioner hasn't mentioned another program writes his data (which is important information).Still if the data needs conversion from any endian to big and from big to host endian, ntohs/ntohl/htons/htonl are the best methods, most elegant and unbeatable in speed (as they will perform task in hardware if CPU supports that, you can't beat that).
Regarding double/float, just store them to ints by memory casting:
double d = 3.1234;
printf("Double %f\n", d);
int64_t i = *(int64_t *)&d;
// Now i contains the double value as int
double d2 = *(double *)&i;
printf("Double2 %f\n", d2);
Wrap it into a function
int64_t doubleToInt64(double d)
{
return *(int64_t *)&d;
}
double int64ToDouble(int64_t i)
{
return *(double *)&i;
}
Questioner provided this link:
http://cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html
as a prove that casting is bad... unfortunately I can only strongly disagree with most of this page. Quotes and comments:
As common as casting through a pointer
is, it is actually bad practice and
potentially risky code. Casting
through a pointer has the potential to
create bugs because of type punning.
It is not risky at all and it is also not bad practice. It has only a potential to cause bugs if you do it incorrectly, just like programming in C has the potential to cause bugs if you do it incorrectly, so does any programming in any language. By that argument you must stop programming altogether.
Type punning A form of pointer
aliasing where two pointers and refer
to the same location in memory but
represent that location as different
types. The compiler will treat both
"puns" as unrelated pointers. Type
punning has the potential to cause
dependency problems for any data
accessed through both pointers.
This is true, but unfortunately totally unrelated to my code.
What he refers to is code like this:
int64_t * intPointer;
:
// Init intPointer somehow
:
double * doublePointer = (double *)intPointer;
Now doublePointer and intPointer both point to the same memory location, but treating this as the same type. This is the situation you should solve with a union indeed, anything else is pretty bad. Bad that is not what my code does!
My code copies by value, not by reference. I cast a double to int64 pointer (or the other way round) and immediately deference it. Once the functions return, there is no pointer held to anything. There is a int64 and a double and these are totally unrelated to the input parameter of the functions. I never copy any pointer to a pointer of a different type (if you saw this in my code sample, you strongly misread the C code I wrote), I just transfer the value to a variable of different type (in an own memory location). So the definition of type punning does not apply at all, as it says "refer to the same location in memory" and nothing here refers to the same memory location.
int64_t intValue = 12345;
double doubleValue = int64ToDouble(intValue);
// The statement below will not change the value of doubleValue!
// Both are not pointing to the same memory location, both have their
// own storage space on stack and are totally unreleated.
intValue = 5678;
My code is nothing more than a memory copy, just written in C without an external function.
int64_t doubleToInt64(double d)
{
return *(int64_t *)&d;
}
Could be written as
int64_t doubleToInt64(double d)
{
int64_t result;
memcpy(&result, &d, sizeof(d));
return result;
}
It's nothing more than that, so there is no type punning even in sight anywhere. And this operation is also totally safe, as safe as an operation can be in C. A double is defined to always be 64 Bit (unlike int it does not vary in size, it is fixed at 64 bit), hence it will always fit into a int64_t sized variable.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight