I was wondering whether it is possible to return the same result with only one explicit cast?
void *begin(void *pt, size_t size)
{
return (void*)((size_t)pt & -size);
}
Every time I tried, I got a BAD_ACCESS code 1.
Example:
void *begin(void *pt, size_t size)
{
size_t *tmp = pt;
size_t res = *tmp & -size;
return (void*)(res);
}
It can be done with zero casts, with implementation-dependent code. However, this is akin to writing code without using the letter “e”: It may be a challenge, but it serves no purpose in production code. If it is posed as an academic exercise, it can be useful because artificial constraints can induce a student to think about things they might not otherwise think about so much, like alternative ways to do things or the technical rules of the language. However, in practice, this is generally pointless.
Your sample code uses size_t, but the preferred type for working with addresses as integers is uintptr_t, if it is defined. If it is defined, it is defined in <stdint.h>, and any normal C implementation of even modest quality should define it.
Your sample code assumes that converting an address to size_t yields a plain integer address in memory. (The address & -size operation is a common way of finding an address aligned to a multiple of size, which must be a power of two, by clearing the low bits, and so we recognize that your (size_t) pt must be a plain address, at least in its low bits.) Instead, let us assume that a pointer is represented in memory using a plain integer for the address and is the same size as uintptr_t. In any C implementation in which either of these is true, the other is likely true too. Before using the following code, you should confirm this for your target C implementations.
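One way to check that assumption on a given target is a small test like the one below. This is only a sketch: the helper name pointer_looks_like_plain_integer is made up for illustration, and passing the check for a few pointers suggests, but does not prove, that pointers are stored as plain integers.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the bytes of p, read as a uintptr_t,
   equal the value obtained by converting p to uintptr_t, and 0 otherwise. */
int pointer_looks_like_plain_integer(void *p)
{
    uintptr_t from_bytes;

    if (sizeof from_bytes != sizeof p)
        return 0;                               /* sizes differ: assumption fails */
    memcpy(&from_bytes, &p, sizeof from_bytes); /* copy the representation */
    return from_bytes == (uintptr_t)p;          /* compare with the conversion */
}
```

Calling it with the addresses of a few objects of different storage durations (a local, a static, a malloc'd block) gives a rough sanity check before relying on the memcpy approach.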
Given that assumption, we can implement your begin routine with no casts:
#include <stdint.h>
#include <string.h>
void *begin(void *pt, size_t size)
{
uintptr_t us = size; // Convert size to uintptr_t to ensure it is at least as wide.
uintptr_t x; // Make space to copy pointer.
memcpy(&x, &pt, sizeof x); // Copy bytes.
x &= -us; // Zero low bits.
memcpy(&pt, &x, sizeof pt); // Copy bytes back.
return pt;
}
If the assumption is not true, it is nonetheless possible to implement begin for any chosen C implementation by setting an unsigned char pointer with unsigned char *p = (unsigned char *) &pt; and then using p to examine and manipulate the bytes of pt. The C standard requires each implementation to document its representation of types, so the meanings of the bytes in the void * pt must be documented, which enables writing code to compute with them as desired.
That uses one cast. It could be reduced to zero with void *v = &pt; unsigned char *p = v;.
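As a rough illustration of the byte-wise approach, the sketch below assumes one particular representation: the pointer is stored as a plain integer address in little-endian byte order. That assumption, and the name begin_bytewise, are mine and not something the C standard provides; a different target would need different byte handling.

```c
#include <limits.h>
#include <stddef.h>

/* Sketch only. Assumes pt is stored as a plain little-endian integer
   address. size must be a power of two. Uses no casts. */
void *begin_bytewise(void *pt, size_t size)
{
    void *v = &pt;              /* any object pointer converts to void * */
    unsigned char *p = v;       /* and void * converts to unsigned char * */
    size_t mask = -size;        /* ones everywhere except the low bits */

    for (size_t i = 0; i < sizeof pt && i < sizeof mask; i++) {
        unsigned char m = (mask >> (CHAR_BIT * i)) & UCHAR_MAX;
        p[i] &= m;              /* clear the low bits held in byte i */
    }
    return pt;
}
```

If the pointer is wider than size_t, the bytes beyond sizeof mask are left untouched, which matches a mask whose high bits are all ones.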
I'm trying to further my knowledge and experience in C, so I'm writing some small utilities.
I'm copying memory, and according to the man page for memcpy(3):
NOTES
Failure to observe the requirement that the memory areas do not overlap has been
the source of real bugs. (POSIX and the C standards are explicit that employing
memcpy() with overlapping areas produces undefined behavior.) Most notably, in
glibc 2.13 a performance optimization of memcpy() on some platforms (including
x86-64) included changing the order in which bytes were copied from src to dest.
Clearly, overlapping memory regions passed to memcpy(3) can cause a lot of problems.
I'm trying to write a safe wrapper as part of learning C to make sure that these memory regions don't overlap:
int safe_memcpy(void *dest, void *src, size_t length);
The logic I'm trying to implement is:
Check both the source and destination pointers for NULL.
Establish the pointer "range" for both source and dest with the length parameter.
Determine if the source range intersects with the destination range, and vice versa.
My implementation so far:
#define SAFE_MEMCPY_ERR_NULL 1
#define SAFE_MEMCPY_ERR_SRC_OVERLAP 2
#define SAFE_MEMCPY_ERR_DEST_OVERLAP 3
int safe_memcpy(void *dest, void *src, size_t length) {
if (src == NULL || dest == NULL) {
return SAFE_MEMCPY_ERR_NULL;
}
void *dest_end = &dest[length - 1];
void *src_end = &src[length - 1];
if ((&src >= &dest && &src <= &dest_end) ||
(&src_end >= &dest && &src_end <= &dest_end)) {
// the start of src falls within dest..dest_end OR
// the end of src falls within dest..dest_end
return SAFE_MEMCPY_ERR_SRC_OVERLAP;
}
if ((&dest >= &src && &dest <= &src_end) ||
(&dest_end >= &src && &dest_end <= &src_end)) {
// the start of dest falls within src..src_end
// the end of dest falls within src..src_end
return SAFE_MEMCPY_ERR_DEST_OVERLAP;
}
// do the thing
memcpy(dest, src, length);
return 0;
}
There's probably a better way to do errors, but this is what I've got for now.
I'm pretty sure I'm triggering some undefined behavior in this code, as I'm hitting SAFE_MEMCPY_ERR_DEST_OVERLAP on memory regions that do not overlap. When I examine the state using a debugger, I see (for instance) the following values:
src: 0x7ffc0b75c5fb
src_end: 0x7ffc0b75c617
dest: 0x1d05420
dest_end: 0x1d0543c
Clearly, these addresses do not even remotely overlap, hence why I'm thinking I'm triggering UB, and compiler warnings indicate as such:
piper.c:68:27: warning: dereferencing ‘void *’ pointer
void *dest_end = &dest[length - 1];
It seems that I need to cast the pointers as a different type, but I'm not sure which type to use: the memory is untyped so should I use a char * to "look at" the memory as bytes? If so, should I cast everything as a char *? Should I instead use intptr_t or uintptr_t?
Given two pointers and a length for each of them, how can I safely check if these regions overlap one another?
In the first place, a conforming program cannot perform pointer arithmetic on a pointer of type void *, nor (relatedly) apply the indexing operator to it, not even with index 0. void is an incomplete type, and unique among those in that it cannot be completed. The most relevant implication of that is that that type does not convey any information about the size of the thing to which it points, and pointer arithmetic is defined in terms of the pointed-to object.
So yes, expressions such as your &dest[length - 1] have undefined behavior with respect to the C standard. Some implementations provide extensions affecting that, and others reject such code at compile time. In principle, an implementation could accept the code and do something bizarre with it, but that's relatively unlikely.
In the second place, you propose to
write a safe wrapper as part of learning C to make sure that these memory regions don't overlap
, but there is no conforming way to do that for general pointers. Relational pointer comparisons (<, <=, >, >=) and pointer differences are defined only for pointers into the same array (or to one element past the end of the array), where a pointer to a scalar is considered in that regard as a pointer to the first element of a one-element array.
Converting to a different pointer type, perhaps char *, would resolve the pointer arithmetic issue, but not, in the general case, the pointer comparability issue. It might get exactly the behavior you want out of some implementations, reliably even, but it is not a conforming approach to the problem, and the ensuing undefined behavior might produce genuine bugs in other implementations.
Relatively often, you can know statically that pointers do not point to overlapping regions. In particular, if one pointer in question is a pointer to an in-scope local variable or to a block of memory allocated by the current function, then you can usually be sure whether there is an overlap. For cases where you do not know, or where you know that there definitely is overlap, the correct approach is to use memmove() instead of memcpy().
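A minimal sketch of a case where overlap is known to exist, so memmove() is the right call. The helper name shift_right_by_one is hypothetical, and the caller is assumed to provide a buffer of at least n + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: shifts the first n bytes of buf one position to the
   right. Source [buf, buf+n) and destination [buf+1, buf+1+n) overlap,
   so memcpy() would be undefined here; memmove() is well defined. */
void shift_right_by_one(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);
}
```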
This "safe" memcpy is not safe either, as it copies nothing in cases where the program expects a copy. Use memmove() to be safe.
You should not use &src and &dest: those are not the beginning of the data or buffer but the addresses of the parameters src and dest themselves.
The same applies to src_end and dest_end.
Given two pointers and a length for each of them, how can I safely check if these regions overlap one another?
<, <=, >=, > are not defined when 2 pointers are not related to the same object.
A tedious approach checks the endpoints of one against all the other's elements and takes advantage that the length of the source and destination are the same.
int safe_memcpy(void *dest, const void *src, size_t length) {
if (length > 0) {
unsigned char *d = dest;
const unsigned char *s = src;
const unsigned char *s_last = s + length - 1;
for (size_t i = 0; i < length; i++) {
if (s == &d[i]) return 1; // not safe
if (s_last == &d[i]) return 1; // not safe
}
memcpy(dest, src, length);
}
return 0;
}
If the buffer lengths differ, check the shorter one's endpoints against the addresses of the longer one's elements.
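A sketch of that differing-lengths variant, using only equality comparisons between pointers. The function name regions_overlap and its return convention are assumptions made for illustration; like the loop above, it is linear in the length of the longer region.

```c
#include <stddef.h>

/* Returns 1 if [a, a+alen) and [b, b+blen) share at least one byte, else 0.
   Only == is applied to the pointers. */
int regions_overlap(const void *a, size_t alen, const void *b, size_t blen)
{
    if (alen == 0 || blen == 0)
        return 0;                       /* an empty region overlaps nothing */

    const unsigned char *s = a;         /* shorter region */
    const unsigned char *l = b;         /* longer region */
    size_t slen = alen, llen = blen;
    if (alen > blen) {
        s = b; slen = blen;
        l = a; llen = alen;
    }

    const unsigned char *s_last = s + slen - 1;
    for (size_t i = 0; i < llen; i++) {
        /* If the regions overlap, an endpoint of the shorter one must
           coincide with some element of the longer one. */
        if (s == &l[i] || s_last == &l[i])
            return 1;
    }
    return 0;
}
```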
should I cast everything as a char *
Use unsigned char *.
mem...(), str...() behave as if each array element was unsigned char.
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3
On rare non-two's-complement implementations, unsigned char is important to avoid signed char trap representations and to keep -0 and +0 distinct. Strings stop only on +0.
With functions like strcmp() and memcmp(), which return an int computed from the element values, treating the elements as unsigned char is important for returning a correctly signed result when elements fall outside the range [0...CHAR_MAX].
Even if void * indexing was allowed, void *dest_end = &dest[length - 1]; is very bad when length == 0 as that is like &dest[SIZE_MAX];
&src >= &dest should be src >= dest for even a chance at working.
The addresses of src, dest are irrelevant to the copy, only their values are important.
I suspect this errant code leads to UB in OP's other code.
Should I instead use intptr_t or uintptr_t?
Note that (u)intptr_t are optional types - they might not exist in a conforming compiler.
Even when the types exist, math on the pointers is not defined to be related to math on the integer values.
Clearly, these addresses do not even remotely overlap, hence why I'm thinking I'm triggering UB,
"Clearly" if one assumes a linear mapping of addresses to integers, something not specified in C.
The memory is untyped so should I use a char * to "look at" the memory as bytes? If so, should I cast everything as a char *?
Use unsigned char* if you need to dereference the data, or just char* when you want to increment/decrement the pointer value by count of bytes.
It's common to do:
void a_function_that_takes_void(void *x, void *y) {
char *a = x;
char *b = y;
/* uses a and b throughout here */
}
If so, should I cast everything as a char *?
Yes. It's also common to do:
void_pointer = (char*)void_pointer + 1;
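The same idea as a small function, as a sketch (the name advance_bytes is hypothetical). Assigning the void * to a char * first avoids the cast altogether:

```c
#include <stddef.h>

/* Returns a pointer n bytes past p. The arithmetic is done on char *,
   because arithmetic on void * is not defined by the C standard. */
void *advance_bytes(void *p, size_t n)
{
    char *c = p;
    return c + n;
}
```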
Should I instead use intptr_t or uintptr_t?
You could, but that would be the same as using char*, except for a char* to intptr_t conversion.
how can I safely check if these regions overlap one another?
It's good to do some research, for example by searching for "how to implement overlap-checking memcpy in C".
I've got a function which essentially produces a hash value over an arbitrary memory region.
The input argument uses const void* type, as a way to say "this can be anything". So essentially :
unsigned hash(const void* ptr, size_t size);
so far, so good.
The blob of bytes can be anything, and its start address can be anywhere. Meaning that sometimes, it's aligned on 32-bit boundaries, and sometimes it's not.
On some platforms (armv6 for example, or mips), reading from unaligned memory results in a huge performance penalty. It's actually not possible to read 32 bits from unaligned memory directly, so the compiler tends to settle for a safer byte-by-byte recombination algorithm (the exact implementation details are hidden behind a memcpy()).
The safe access method is of course a lot slower than a direct 32-bit access, which is itself only possible when input data is properly aligned on 32-bit boundary.
This leads to a design trying to separate the 2 cases : when input is unaligned, use the safe access code path, when input is aligned (effectively quite often) use the direct 32-bit access code path.
The difference in performance is huge, we are not talking of a few % here, this translates into a 5x performance increase, sometimes even more. So it's not just "nice", it actually makes the function competitive or not, useful or not.
This design has been working fine so far, in a decent number of scenarios.
Enter inlining.
Now, with the function implementation accessible at compilation time, a clever compiler can peel off all the indirection layers, and reduce the implementation to its essential elements. In the case where it can prove that the input is necessarily aligned, such as a struct with defined members, it can simplify the code, remove all the const void* indirections, and get down to the barebone implementation where the memory area is effectively read using a const u32* pointer.
And now strict-aliasing can kick in, as the input area is written using a struct S* ptr, and read using a different const u32* ptr, therefore allowing the compiler to consider these 2 operations as completely independent, eventually re-ordering them, leading to an incorrect outcome.
This is essentially the interpretation I received from a user.
It may be worth noting that I was unable to reproduce the issue, but strict-aliasing issues uncovered through inlining are a known topic. It is also known that strict-aliasing issues can be difficult to reproduce, because tiny implementation details lead to different optimization choices depending on compiler version. I therefore consider the report credible, but can't study it directly in the absence of a reproduction case.
Anyway, now comes the question. How to handle this case correctly ?
A "safe" solution is to always use the memcpy() path, but it craters the performance so much that it makes the function just no longer useful. Plus, it's obviously a terrible waste of energy.
The easy escape is to not inline, though it leads to its own function call overhead (to be fair, not that large), and more importantly just "hides" the problem rather than solve it.
But I've yet to find a solution to it. I've been told that, no matter what kind of intermediate pointer is used, even if a const char* is part of the cast chain, this will not prevent the final const u32* read operation from violating strict aliasing (just repeating, I can't test it, because I can't reproduce the case).
Described this way, this feels almost hopeless.
But I can't help noting that memcpy() properly avoids this kind of reordering risk, even though its interface also uses const void*. Its exact implementation varies a lot, but we can be certain that it is not just reading byte-by-byte through a const char*, as its performance is excellent and it doesn't hesitate to use vector code when that is faster. Also, memcpy() is a function which is definitely inlined a lot.
So I guess there must be a solution to this problem.
(unsigned) char is exempt from strict aliasing rules. No matter what, the following is safe and sane as long as sizeof(uint32_t) == 4:
unsigned hash(const void* ptr, size_t size) {
const unsigned char* bytes = ptr;
while (size >= 4) {
uint32_t x;
memcpy(&x, bytes, 4);
bytes += 4;
size -= 4;
// Use x.
}
// Size leftover bytes.
}
Do note that the values of x will be dependent on the endianness of your machine. If you require cross-platform consistent hashes you will need to convert to your preferred endianness.
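One common way to get endianness-independent values is to assemble each word from its bytes with shifts instead of copying the raw representation. The sketch below picks little-endian as the "preferred" order; the name load_le32 is an assumption for illustration.

```c
#include <stdint.h>

/* Reads 4 bytes starting at p and combines them as a little-endian 32-bit
   value. The result is the same on little- and big-endian hosts, and p
   does not need to be aligned. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Replacing memcpy(&x, bytes, 4) with x = load_le32(bytes) in the loop above would make the hash independent of host byte order; whether the compiler folds this into a single load depends on the compiler and target.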
Note that if you force alignment you can make the compiler generate fast path code even with memcpy:
void* align(void* p, size_t n) {
// n must be power of two.
uintptr_t pi = (uintptr_t) p;
return (unsigned char*) ((pi + (n - 1)) & -n);
}
static inline uint32_t update_hash(uint32_t h, uint32_t x) {
h += x;
return h;
}
unsigned hash(const void* ptr, size_t size) {
const unsigned char* bytes = (unsigned char*) ptr;
const unsigned char* aligned_bytes = align((void*) bytes, 4);
uint32_t h = 0;
uint32_t x;
if (bytes == aligned_bytes) {
// Aligned fast path.
while (size >= 4) {
memcpy(&x, bytes, 4);
h = update_hash(h, x);
size -= 4;
bytes += 4;
}
} else {
// Slower unaligned path, copy to aligned buffer.
while (size >= 4) {
uint32_t buffer[32];
size_t bufsize = size < 4*32 ? size / 4 : 32;
memcpy(buffer, bytes, 4*bufsize);
bytes += 4*bufsize; // Advance past the bytes just consumed.
size -= 4*bufsize;
for (size_t i = 0; i < bufsize; ++i) {
h = update_hash(h, buffer[i]);
}
}
}
if (size) {
// Assuming little endian.
x = 0;
memcpy(&x, bytes, size);
h = update_hash(h, x);
}
return h;
}
I can't explain the execution behavior of this program:
#include <string>
#include <cstdlib>
#include <stdio.h>
typedef char u8;
typedef unsigned short u16;
size_t f(u8 *keyc, size_t len)
{
u16 *key2 = (u16 *) (keyc + 1);
size_t hash = len;
len = len / 2;
for (size_t i = 0; i < len; ++i)
hash += key2[i];
return hash;
}
int main()
{
srand(time(NULL));
size_t len;
scanf("%lu", &len);
u8 x[len];
for (size_t i = 0; i < len; i++)
x[i] = rand();
printf("out %lu\n", f(x, len));
}
So, when it is compiled with -O3 with gcc, and run with argument 25, it raises a segfault. Without optimizations it works fine. I've disassembled it: it is being vectorized, and the compiler assumes that the key2 array is aligned at 16 bytes, so it uses movdqa. Obviously it is UB, although I can't explain it. I know about the strict aliasing rule and it is not this case (I hope), because, as far as I know, the strict aliasing rule doesn't work with chars. Why does gcc assume that this pointer is aligned? Clang works fine too, even with optimizations.
EDIT
I changed unsigned char to char, and removed const, it still segfaults.
EDIT2
I know that this code is not good, but it should work ok, as far as I know about the strict aliasing rule. Where exactly is the violation?
The code indeed breaks the strict aliasing rule. However, there is not only an aliasing violation, and the crash doesn't happen because of the aliasing violation. It happens because the unsigned short pointer is incorrectly aligned; even the pointer conversion itself is undefined if the result is not suitably aligned.
C11 (draft n1570) Appendix J.2:
1 The behavior is undefined in the following circumstances:
....
Conversion between two pointer types produces a result that is incorrectly aligned (6.3.2.3).
With 6.3.2.3p7 saying
[...] If the resulting pointer is not correctly aligned [68] for the referenced type, the behavior is undefined. [...]
unsigned short has alignment requirement of 2 on your implementation (x86-32 and x86-64), which you can test with
_Static_assert(_Alignof(unsigned short) == 2, "alignof(unsigned short) == 2");
However, you're forcing the u16 *key2 to point to an unaligned address:
u16 *key2 = (u16 *) (keyc + 1); // we've already got undefined behaviour *here*!
There are countless programmers that insist that unaligned access is guaranteed to work in practice on x86-32 and x86-64 everywhere, and there wouldn't be any problems in practice - well, they're all wrong.
Basically what happens is that the compiler notices that
for (size_t i = 0; i < len; ++i)
hash += key2[i];
can be executed more efficiently using the SIMD instructions if suitably aligned. The values are loaded into the SSE registers using MOVDQA, which requires that the argument is aligned to 16 bytes:
When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general-protection exception (#GP) will be generated.
For cases where the pointer is not suitably aligned at start, the compiler will generate code that will sum the first 1-7 unsigned shorts one by one, until the pointer is aligned to 16 bytes.
Of course if you start with a pointer that points to an odd address, not even adding 7 times 2 will land one to an address that is aligned to 16 bytes. Of course the compiler will not even generate code that will detect this case, as "the behaviour is undefined, if conversion between two pointer types produces a result that is incorrectly aligned" - and ignores the situation completely with unpredictable results, which here means that the operand to MOVDQA will not be properly aligned, which will then crash the program.
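The arithmetic behind that claim can be checked with a small sketch. It models only the address arithmetic described above, not the code GCC actually emits, and the function name steps_to_align16 is made up for illustration.

```c
#include <stdint.h>

/* Returns how many 2-byte steps (0 to 7) are needed before addr becomes a
   multiple of 16, or -1 if none of those step counts gets there. */
int steps_to_align16(uintptr_t addr)
{
    for (int k = 0; k <= 7; k++) {
        if ((addr + 2u * (uintptr_t)k) % 16 == 0)
            return k;
    }
    return -1;
}
```

For any even address the function finds a step count, while for any odd address it returns -1, because adding multiples of 2 to an odd number always yields an odd number.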
It can be easily proven that this can happen even without violating any strict aliasing rules. Consider the following program that consists of 2 translation units (if both f and its caller are placed into one translation unit, my GCC is smart enough to notice that we're using a packed structure here, and doesn't generate code with MOVDQA):
translation unit 1:
#include <stdlib.h>
#include <stdint.h>
size_t f(uint16_t *keyc, size_t len)
{
size_t hash = len;
len = len / 2;
for (size_t i = 0; i < len; ++i)
hash += keyc[i];
return hash;
}
translation unit 2
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <inttypes.h>
size_t f(uint16_t *keyc, size_t len);
struct mystruct {
uint8_t padding;
uint16_t contents[100];
} __attribute__ ((packed));
int main(void)
{
struct mystruct s;
size_t len;
srand(time(NULL));
scanf("%zu", &len);
char *initializer = (char *)s.contents;
for (size_t i = 0; i < len; i++)
initializer[i] = rand();
printf("out %zu\n", f(s.contents, len));
}
Now compile and link them together:
% gcc -O3 unit1.c unit2.c
% ./a.out
25
zsh: segmentation fault (core dumped) ./a.out
Notice that there is no aliasing violation there. The only problem is the unaligned uint16_t *keyc.
With -fsanitize=undefined the following error is produced:
unit1.c:10:21: runtime error: load of misaligned address 0x7ffefc2d54f1 for type 'uint16_t', which requires 2 byte alignment
0x7ffefc2d54f1: note: pointer points here
00 00 00 01 4e 02 c4 e9 dd b9 00 83 d9 1f 35 0e 46 0f 59 85 9b a4 d7 26 95 94 06 15 bb ca b3 c7
^
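A minimal sketch of one way to avoid the misaligned pointer entirely: keep the data behind an unsigned char pointer and copy each 16-bit value out with memcpy(). The function name sum_u16_unaligned is hypothetical, and values are read in the host's byte order, as in the original f.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same computation as f above, but valid for any alignment of key:
   no uint16_t pointer to the buffer is ever formed. */
size_t sum_u16_unaligned(const unsigned char *key, size_t len)
{
    size_t hash = len;
    size_t count = len / 2;

    for (size_t i = 0; i < count; ++i) {
        uint16_t v;
        memcpy(&v, key + 2 * i, sizeof v);  /* alignment-safe load */
        hash += v;
    }
    return hash;
}
```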
It is legal to alias a pointer to an object to a pointer to a char, and then iterate all bytes from the original object.
When a pointer to char actually points to an object (has been obtained through previous operation), it is legal to convert it back to a pointer to the original type, and the standard requires that you get back the original value.
But converting an arbitrary pointer to a char to a pointer to object and dereferencing the obtained pointer violates the strict aliasing rule and invokes undefined behaviour.
So in your code, the following line is UB:
const u16 *key2 = (const u16 *) (keyc + 1);
// keyc + 1 did not originally point to a u16: UB
To provide some more info and common pitfalls to the excellent answer from @Antti Haapala:
TLDR: Access to unaligned data is undefined behavior (UB) in C/C++. Unaligned data is data at an address (aka pointer value) that is not evenly divisible by its alignment (which is usually its size). In (pseudo-)code: bool isAligned(T* ptr){ return (ptr % alignof(T)) == 0; }
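In real C, the pseudo-code check could look roughly like the sketch below. It assumes that converting a pointer to uintptr_t yields the numeric address, which is implementation-defined but holds on common platforms; the name is_aligned and its size_t parameter are choices made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if ptr is a multiple of alignment (alignment must be
   nonzero). Assumes the uintptr_t value of a pointer is its address. */
int is_aligned(const void *ptr, size_t alignment)
{
    return (uintptr_t)ptr % alignment == 0;
}
```

A typical call would be is_aligned(p, _Alignof(uint32_t)) before choosing a fast path that reads through a uint32_t pointer.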
This issue arises often when parsing file formats or data sent over network: You have a densely packed struct of different data types. Example would be a protocol like this: struct Packet{ uint16_t len; int32_t data[]; }; (Read as: A 16 bit length followed by len times a 32 bit int as a value). You could now do:
char* raw = receiveData();
int32_t sum = 0;
uint16_t len = *((uint16_t*)raw);
int32_t* data = (int32_t*)(raw + 2);
for(size_t i=0; i<len; ++i) sum += data[i];
This does not work! If you assume that raw is aligned (in your mind you could set raw = 0 which is aligned to any size as 0 % n == 0 for all n) then data cannot possibly be aligned (assuming alignment == type size): len is at address 0, so data is at address 2 and 2 % 4 != 0. But the cast tells the compiler "This data is properly aligned" ("... because otherwise it is UB and we never run into UB"). So during optimization the compiler will use SIMD/SSE instructions for faster calculation of the sum and those do crash when given unaligned data.
Sidenote: There are unaligned SSE instructions but they are slower and as the compiler assumes the alignment you promised they are not used here.
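A sketch of how the same packet could be summed without any misaligned pointer, by copying each field out with memcpy(). The function name sum_packet is hypothetical, and the fields are assumed to be in the host's byte order, as in the snippet above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout assumed: a 16-bit length, then len 32-bit integers, tightly
   packed. raw may have any alignment. */
int32_t sum_packet(const unsigned char *raw)
{
    uint16_t len;
    int32_t sum = 0;

    memcpy(&len, raw, sizeof len);          /* read the length field */
    for (size_t i = 0; i < len; ++i) {
        int32_t value;
        memcpy(&value, raw + sizeof len + i * sizeof value, sizeof value);
        sum += value;
    }
    return sum;
}
```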
You can see this in the example from @Antti Haapala which I shortened and put at godbolt for you to play around with: https://godbolt.org/z/KOfi6V. Watch the "program returned: 255" aka "crashed".
This problem is also pretty common in deserialization routines which look like this:
char* raw = receiveData();
int32_t a = readInt(raw); raw+=4;
bool b = readBool(raw); raw+=1;
int16_t c = readShort(raw); raw+=2;
...
The read* functions take care of endianness and are often implemented like this:
int32_t readInt(char* ptr){
int32_t result = *((int32_t*) ptr);
#if BIG_ENDIAN
result = byteswap(result);
#endif
return result;
}
Note how this code dereferences a pointer which pointed to a smaller type which might have a different alignment, and you run into the exact same problem.
This problem is so common that even Boost suffered from it through many versions. There is Boost.Endian, which provides easy endian types. The C code from godbolt can easily be written like this:
#include <cstdint>
#include <boost/endian/arithmetic.hpp>
__attribute__ ((noinline)) size_t f(boost::endian::little_uint16_t *keyc, size_t len)
{
size_t hash = 0;
for (size_t i = 0; i < len; ++i)
hash += keyc[i];
return hash;
}
struct mystruct {
uint8_t padding;
boost::endian::little_uint16_t contents[100];
};
int main(int argc, char** argv)
{
mystruct s;
size_t len = argc*25;
for (size_t i = 0; i < len; i++)
s.contents[i] = i * argc;
return f(s.contents, len) != 300;
}
The type little_uint16_t is basically just some chars with an implicit conversion from/to uint16_t, with a byteswap if the current machine's endianness is BIG_ENDIAN. Under the hood, the code used by Boost.Endian was similar to this:
class little_uint16_t{
char buffer[2];
uint16_t value(){
#if IS_x86
uint16_t value = *reinterpret_cast<uint16_t*>(buffer);
#else
...
#endif
#if BIG_ENDIAN
swapbytes(value);
#endif
return value;
}
};
It used the knowledge that on x86 architectures unaligned access is possible. A load from an unaligned address was just a bit slower, but even on assembler level the same as the load from an aligned address.
However, "possible" doesn't mean valid. If the compiler replaces the "standard" load with an SSE instruction, then this fails, as can be seen on godbolt. This went unnoticed for a long time because those SSE instructions are only used when processing large chunks of data with the same operation, e.g. adding an array of values, which is what I did for this example. This was fixed in Boost 1.69 by using memcpy, which can be translated to a "standard" load instruction in ASM that supports aligned and unaligned data on x86, so there is no slowdown compared to the cast version. But it cannot be translated into aligned SSE instructions without further checks.
Takeaway: Don't use shortcuts with casts. Be suspicious of every cast especially when casting from a smaller type and check that the alignment cannot be wrong or use the safe memcpy.
Unless code does something to ensure that an array of character type is aligned, it should not particularly expect that it will be.
If alignment is taken care of, code takes its address once, converts it to a pointer of another type, and never accesses the storage via any means not derived from the latter pointer, then an implementation designed for low-level programming should have no particular difficulty treating the storage as an abstract buffer. Since such treatment would not be difficult and would be necessary for some kinds of low-level programming (e.g. implementing memory pools in contexts where malloc() may be unavailable), an implementation which doesn't support such constructs should not claim to be suitable for low-level programming.
Consequently, on implementations which are designed for low-level programming, constructs such as you describe would allow suitably-aligned arrays to be treated as untyped storage. Unfortunately, there is no easy way to recognize such implementations, since implementations which are designed primarily for low-level programming often fail to list all of the cases where the authors would think it obvious that such implementations behave in a fashion characteristic of the environment (and where they consequently do precisely that), while those whose design is focused on other purposes may claim to be suitable for low-level programming even if they behave inappropriately for that purpose.
The authors of the Standard recognize that C is a useful language for non-portable programs, and specifically stated they did not wish to preclude its use as a "high-level assembler". They expected, however, that implementations intended for various purposes would support popular extensions to facilitate those purposes without regard for whether the Standard requires them to do so, and thus there was no need to have the Standard address such things. Because such intention was relegated to the Rationale rather than the Standard, however, some compiler writers regard the Standard as a full description of everything that programmers should ever expect from an implementation, and thus may not support low-level concepts like the use of static- or automatic-duration objects as effectively-untyped buffers.
I believe I've found a way to achieve something like the well-known "struct hack" in portable C89. I'm curious if this really strictly conforms to C89.
The main idea is: I allocate memory large enough to hold an initial struct and the elements of the array. The exact size is (K + N) * sizeof(array_base_type), where K is chosen so that K * sizeof(array_base_type) >= sizeof(the_struct) and N is the number of array elements.
First, I dereference the pointer that malloc() returned to store the_struct, then I use pointer arithmetic to obtain a pointer to the beginning of the array following the struct.
One line of code is worth more than a thousand words, so here is a minimal implementation:
typedef struct Header {
size_t length;
/* other members follow */
} Header;
typedef struct Value {
int type;
union {
int intval;
double fltval;
} v;
} Value;
/* round up to nearest multiple of sizeof(Value) so that a Header struct fits in */
size_t n_hdr = (sizeof(Header) + sizeof(Value) - 1) / sizeof(Value);
size_t n_arr = 42; /* arbitrary array size here */
void *frame = malloc((n_hdr + n_arr) * sizeof(Value));
if (!frame)
return NULL;
Header *hdr = frame;
Value *stack_bottom = (Value *)frame + n_hdr;
My main concern is that the last two assignments (using frame as both a pointer to Header and a pointer to Value) may violate the strict aliasing rule. I do not, however, dereference hdr as a pointer to Value - it's only pointer arithmetic that is performed on frame in order to access the first element of the value array, so I don't effectively access the same object using pointers of different types.
So, is this approach any better than the classic struct hack (which has been officially deemed UB), or is it UB too?
The "obvious" (well... not exactly obvious, but it's what comes to my mind anyway :-) ) way to cause this to break is to use a vectorizing compiler that somehow decides it's OK to load, say, 64 Headers into a vector register from the 42-rounded-up-to-64+ area at hdr which comes from malloc which always allocates enough to vectorize. Storing the vector register back to memory might overwrite one of the Values.
I think this vectorizing compiler could point to the standard (well, if a compiler has fingers...) and claim conformance.
In practice, though, I'd expect this code to work. If you come across a vectorizing compiler, add even more space (do the rounding up with a machine-dependent macro that can insert a minimum) and charge on. :-)
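A sketch of what "rounding up with a minimum" might look like, written as a function rather than a macro for readability. The name header_slots and the min_bytes parameter (for instance a vector width chosen per target) are assumptions, not something the question's code defines.

```c
#include <stddef.h>

/* Returns how many elements of elem_size bytes are needed to cover the
   header, reserving at least min_bytes for it. elem_size must be nonzero. */
size_t header_slots(size_t header_size, size_t elem_size, size_t min_bytes)
{
    size_t bytes = header_size > min_bytes ? header_size : min_bytes;
    return (bytes + elem_size - 1) / elem_size;   /* round up */
}
```

With min_bytes set to 0 this reduces to the n_hdr computation in the question; a larger min_bytes simply leaves more room between the header and the first array element.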
Purpose
I am writing a small library for a larger project which supplies malloc/realloc/free wrapper-functions as well as a function which can tell you whether or not its parameter (of type void *) corresponds to live (not yet freed) memory allocated and managed by the library's wrapper-functions. Let's refer to this function as isgood_memory.
Internally, the library maintains a hash-table to ensure that the search performed by isgood_memory is reasonably fast. The hash-table maintains pointer values (elements of type void *) to make the search possible. Clearly, values are added and removed from the hash-table to keep it up-to-date with what has been allocated and what has been freed, respectively.
The portability of the library is my biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more.
Question
Since portability is my biggest concern, I couldn't assume that sizeof(void *) == sizeof(X) for the hash-function. Therefore, I have resorted to treating the value byte-by-byte as if it were a string. To accomplish this, the hash function looks a little like:
static size_t hashit(void *ptrval)
{
size_t i = 0, h = 0;
union {
void *ptrval;
unsigned char string[sizeof(void *)];
} ptrstr;
ptrstr.ptrval = ptrval;
for (; i < sizeof(void *); ++i) {
size_t byte = ptrstr.string[i];
/* Crazy operations here... */
}
return (h);
}
What portability concerns do any of you have with this particular fragment? Will I encounter any funky alignment issues by accessing ptrval byte-by-byte?
You are allowed to access a data type as an array of unsigned char, as you do here. The major portability issue that I see could occur on platforms where the bit pattern identifying a particular location is not unique. In that case, two pointers that compare equal could hash to different values because their bit patterns differ.
Why could they be different? Well, for one thing, most C data types are allowed to contain padding bits that don't participate in the value. A platform where pointers contained such padding bits could have two pointers that differed only in the padding bits point to the same location. (For example, the OS might use some pointer bits to indicate capabilities of the pointer, not just physical address.) Another example is the far memory model from the early days of DOS, where far pointers consisted of segment:offset, and the adjacent segments overlapped, so that segment:offset could point to the same location as segment+1:offset-x.
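To make the segment:offset case concrete, here is a small sketch of real-mode address translation, where the linear address is segment * 16 + offset. The function name linear_address is made up for illustration.

```c
/* Real-mode 8086 address translation: linear = segment * 16 + offset.
   The name linear_address is hypothetical. */
static unsigned long linear_address(unsigned int segment, unsigned int offset)
{
    return ((unsigned long)segment << 4) + offset;
}
```

With this, 0x1234:0x0010 and 0x1235:0x0000 both translate to linear address 0x12350, although the two far pointers have different bit patterns and would therefore hash differently byte by byte.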
All that said, on most platforms in common use today, the bit pattern pointing to a given location is indeed unique. So your code will be widely portable, even though it is unlikely to be strictly conforming.
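For reference, a byte-wise pointer hash along those lines might look like the sketch below. The question elides the actual mixing step, so the multiply-by-31 step here is only a placeholder assumption, and hash_pointer_bytes is a hypothetical name. It copies the pointer's representation with memcpy instead of going through a union, and sticks to C90.

```c
#include <stddef.h>
#include <string.h>

/* Hash a pointer by its object representation, one byte at a time.
   The mixing step (h * 31 + byte) is a placeholder, not the asker's. */
static size_t hash_pointer_bytes(const void *ptrval)
{
    unsigned char bytes[sizeof(void *)];
    size_t i, h = 0;

    /* Copy the representation of the pointer itself, not what it points to. */
    memcpy(bytes, &ptrval, sizeof bytes);
    for (i = 0; i < sizeof bytes; ++i)
        h = h * 31u + bytes[i];
    return h;
}
```

Two pointers with identical bit patterns always hash the same here. Pointers that compare equal but have different representations may not, which is exactly the caveat above.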
Looks pretty clean. If you can rely on the <inttypes.h> header from C99 (it is often available elsewhere), then consider using uintptr_t - but if you want to hash the value byte-wise, you end up breaking things down to bytes and there is no real advantage to it.
Mostly correct. There's one potential problem, though. You assign:
size_t byte = ptrstr.string[i];
*string is defined as char, not unsigned char. On a platform that has signed chars and an unsigned size_t, this will give you a result that you may or may not expect. Just change your char to unsigned char; that will be cleaner.
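A quick sketch of the difference, using signed char explicitly to model a platform where plain char is signed (the function names are made up for illustration):

```c
#include <stddef.h>

/* A negative signed char is converted to size_t modulo SIZE_MAX + 1,
   so -1 becomes the largest size_t value. */
static size_t widen_signed(signed char c)
{
    return (size_t)c;
}

/* An unsigned char always converts to a value in the range 0..UCHAR_MAX. */
static size_t widen_unsigned(unsigned char c)
{
    return c;
}
```

So a byte read as -1 through a signed char turns into (size_t)-1, while the same byte read through an unsigned char (on the usual 8-bit, two's complement platforms) is simply 255.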
If you don't need the pointer values for any reason besides keeping track of allocated memory, why not get rid of the hash table altogether and just store a magic number along with the allocated memory, as in the example below? A magic number present alongside the allocated memory indicates that the memory is still "alive". When freeing the memory, you clear the stored magic number first.
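#include <stdlib.h> /* malloc, free, NULL */
typedef unsigned char byte; /* byte is not a standard C type */
#pragma pack(1) /* compiler-specific; keeps firstByte directly after magic */
struct sMemHdl
{
unsigned long magic; /* unsigned and at least 32 bits, so it can hold MAGIC */
byte firstByte;
};
#pragma pack()
#define MAGIC 0xDEADDEADUL
#define MAGIC_SIZE sizeof(((struct sMemHdl *)0)->magic)
int isgood_memory ( void *Mem ); /* declared here because free_memory calls it */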
void *get_memory( size_t request )
{
struct sMemHdl *pMemHdl = (struct sMemHdl *)malloc(MAGIC_SIZE + request);
if ( pMemHdl == NULL )
{
return NULL; /* allocation failed */
}
pMemHdl->magic = MAGIC;
return (void *)&pMemHdl->firstByte;
}
void free_memory ( void *mem )
{
if ( isgood_memory(mem) != 0 )
{
struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)mem - MAGIC_SIZE);
pMemHdl->magic = 0;
free(pMemHdl);
}
}
int isgood_memory ( void *Mem )
{
struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)Mem - MAGIC_SIZE);
if ( pMemHdl->magic == MAGIC )
{
return 1; /* mem is good */
}
else
{
return 0; /* mem already freed */
}
}
This may be a bit hackish, but I guess I'm in a hackish mood...
Accessing variables such as integers or pointers as chars or unsigned chars is not a problem from a portability point of view. The reverse is not true, though, because it is hardware dependent.
I have one question: why are you hashing the pointer as a string instead of using the pointer itself as the hash value (using uintptr_t)?
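For comparison, here is a minimal sketch of the uintptr_t approach. It assumes a C99 <stdint.h> that actually defines uintptr_t (the type is optional), so it does not fit a strict C90 target. The name hash_pointer_value, the n_buckets parameter and the shift-and-xor step are all illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hash a pointer via its integer value. n_buckets must be nonzero. */
static size_t hash_pointer_value(const void *p, size_t n_buckets)
{
    uintptr_t v = (uintptr_t)p;

    /* Allocated blocks are usually aligned, so the lowest bits tend to be
       zero; fold some higher bits in before reducing. */
    v ^= v >> 4;
    return (size_t)(v % n_buckets);
}
```

The result of converting a pointer to uintptr_t is implementation-defined, so this carries the same caveat as the byte-wise version: it is only a good hash where equal pointers convert to equal integers.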