I have a piece of memory I am "guarding", defined by
typedef unsigned char byte;
byte * guardArea;
size_t guardSize;
guardArea = getGuardArea();
guardSize = getGuardSize();
An acceptable implementation for the sake of this would be:
size_t glGuardSize = 1024; /* protect an area of 1 KB */

byte * getGuardArea()
{
    return malloc( glGuardSize );
}

size_t getGuardSize()
{
    return glGuardSize;
}
Can the following snippet return true for any pointer (from a different malloc, from the stack etc)?
if ( ptr >= guardArea && ptr < (guardArea + guardSize)) {
    return true;
}
The standard states that:
pointers to values within the area will return true (when ptr points into the area, everything works correctly).
pointers will be distinct (a == b only if they are the same).
all addresses within the byte array can be accessed by incrementing the base.
any pointer can be converted to and from a char *, without damage.
So I can't understand how the result could be true for any pointer from a different object (as it would break the distinct rule for one of the pointers within the area).
Edit:
What is the use case?
The ability to detect whether a pointer is within a region is really important; at some point, code like this gets written:
if ( isInMyAreaOfInterest( unknownPointer ) ) {
    doMySpecialThing( unknownPointer );
} else {
    doSomethingElse( unknownPointer );
}
I think the language needs to support the developer by making such constructs simple and obvious. Our interpretation of the standard is that the developer has to cast to an integer type, because of the "undefined behavior" of pointer comparisons between distinct objects.
I was hoping for some clarity on why I can't do what I would like (my snippet), as all the posts on SO I found say that the standard calls it undefined behavior, without any explanation or example of why the standard's way is better than how I would like it to work.
At the moment we have a rule, and we neither understand why the rule exists nor question whether the rule is helping us.
Example posts:
SO: checking if a pointer is in a malloced area
SO: C compare pointers
It is still possible for an allocation to generate a pointer that satisfies the condition despite the pointer not pointing into the region. This will happen, for example, on an 80286 in protected mode, which is used by Windows 3.x in Standard mode and OS/2 1.x.
In this system, pointers are 32-bit values, split into two 16-bit parts, traditionally written as XXXX:YYYY. The first 16-bit part (XXXX) is the "selector", which chooses a bank of 64KB. The second 16-bit part (YYYY) is the "offset", which chooses a byte within that 64KB bank. (It's more complicated than this, but let's just leave it at that for the purpose of this discussion.)
Memory blocks larger than 64KB are broken up into 64KB chunks. To move from one chunk to the next, you add 8 to the selector. For example, the byte after 0101:FFFF is 0109:0000.
But why do you add 8 to move to the next selector? Why not just increment the selector? Because the bottom three bits of the selector are used for other things.
In particular, the bottom bit of the selector is used to choose the selector table. (Let's ignore bits 1 and 2 since they are not relevant to the discussion. Assume for convenience that they are always zero.)
There are two selector tables, the Global Selector Table (for memory shared across all processes) and the Local Selector Table (for memory private to a single process). Therefore, the selectors available for process private memory are 0001, 0009, 0011, 0019, etc. Meanwhile, the selectors available for global memory are 0008, 0010, 0018, 0020, etc. (Selector 0000 is reserved.)
Okay, now we can set up our counter-example. Suppose guardArea = 0101:0000 and guardSize = 0x00020000. This means that the guarded addresses are 0101:0000 through 0101:FFFF and 0109:0000 through 0109:FFFF. Furthermore, guardArea + guardSize = 0111:0000.
Meanwhile, suppose there is some global memory that happens to be allocated at 0108:0000. This is a global memory allocation because the selector is an even number.
Observe that the global memory allocation is not part of the guarded region, but its pointer value does satisfy the numeric inequality 0101:0000 <= 0108:0000 < 0111:0000.
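To make the arithmetic concrete, here is a minimal sketch that checks the counter-example numerically, treating each pointer as a flat 32-bit value with the selector in the high 16 bits (an encoding assumed purely for illustration):
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t guard_lo = 0x01010000;  /* guardArea             = 0101:0000 */
    uint32_t guard_hi = 0x01110000;  /* guardArea + guardSize = 0111:0000 */
    uint32_t global   = 0x01080000;  /* unrelated global allocation 0108:0000 */

    /* The numeric inequality holds even though 0108:0000 lies in
       neither guarded chunk (0101:xxxx or 0109:xxxx). */
    assert(guard_lo <= global && global < guard_hi);
    return 0;
}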
Bonus chatter: Even on CPU architectures with a flat memory model, the test can fail. Modern compilers take advantage of undefined behavior and optimize accordingly. If they see a relational comparison between pointers, they are permitted to assume that the pointers point into the same array (or one past the last element of that array). Specifically, the only pointers that can legally be compared with guardArea are the ones of the form guardArea, guardArea+1, guardArea+2, ..., guardArea + guardSize. For all of these pointers, the condition ptr >= guardArea is true and can therefore be optimized out, reducing your test to
if (ptr < (guardArea + guardSize))
which will now be satisfied for pointers that are numerically less than guardArea.
Moral of the story: This code is not safe, not even on flat architectures.
But all is not lost: The pointer-to-integer conversion is implementation-defined, which means that your implementation must document how it works. If your implementation defines the pointer-to-integer conversion as producing the numeric value of the pointer, and you know that you are on a flat architecture, then what you can do is compare integers rather than pointers. Integer comparisons are not constrained in the same way that pointer comparisons are.
if ((uintptr_t)ptr >= (uintptr_t)guardArea &&
(uintptr_t)ptr < (uintptr_t)guardArea + (uintptr_t)guardSize)
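For example, the whole test could be wrapped up as below. This is a sketch only, valid under the stated assumption that the implementation documents pointer-to-integer conversion as yielding the numeric address; the function name in_guard_area is illustrative:
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool in_guard_area(const void *ptr,
                          const void *guardArea, size_t guardSize)
{
    uintptr_t p  = (uintptr_t)ptr;
    uintptr_t lo = (uintptr_t)guardArea;

    /* Integer comparisons are not constrained the way pointer
       comparisons are, so the compiler cannot "optimize" this test
       away; the result is meaningful only on a flat architecture. */
    return p >= lo && p < lo + (uintptr_t)guardSize;
}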
Yes.
void foo(void) {}
void (*a)(void) = foo;
void *b = malloc(69);
uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
ua and ub are in fact permitted to have the same value. This occurred frequently on segmented systems (like MS-DOS) which might put code and data in separate segments.
Related
I was recently answering a question on the undefined behaviour of doing p < q in C when p and q are pointers into different objects/arrays. That got me thinking: C++ has the same (undefined) behaviour of < in this case, but also offers the standard library template std::less which is guaranteed to return the same thing as < when the pointers can be compared, and return some consistent ordering when they cannot.
Does C offer something with similar functionality which would allow safely comparing arbitrary pointers (to the same type)? I tried looking through the C11 standard and didn't find anything, but my experience in C is orders of magnitude smaller than in C++, so I could have easily missed something.
On implementations with a flat memory model (basically everything), casting to uintptr_t will Just Work.
(But see Should pointer comparisons be signed or unsigned in 64-bit x86? for discussion of whether you should treat pointers as signed or not, including issues of forming pointers outside of objects which is UB in C.)
But systems with non-flat memory models do exist, and thinking about them can help explain the current situation, like C++ having different specs for < vs. std::less.
Part of the point of < on pointers to separate objects being UB in C (or at least unspecified in some C++ revisions) is to allow for weird machines, including non-flat memory models.
A well-known example is x86-16 real mode where pointers are segment:offset, forming a 20-bit linear address via (segment << 4) + offset. The same linear address can be represented by multiple different seg:off combinations.
C++ std::less on pointers on weird ISAs might need to be expensive, e.g. "normalize" a segment:offset on x86-16 to have offset <= 15. However, there's no portable way to implement this. The manipulation required to normalize a uintptr_t (or the object-representation of a pointer object) is implementation-specific.
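As a concrete, non-portable illustration: on x86-16 real mode, where linear = (seg << 4) + off, a seg:off pair could be normalized so that off <= 15 with a helper like this sketch (the function name is illustrative):
#include <stdint.h>

/* Minimize the offset while keeping the linear address unchanged. */
static void normalize(uint16_t *seg, uint16_t *off)
{
    uint32_t linear = ((uint32_t)*seg << 4) + *off;
    *seg = (uint16_t)(linear >> 4);
    *off = (uint16_t)(linear & 0xF);
}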
But even on systems where C++ std::less has to be expensive, < doesn't have to be. For example, assuming a "large" memory model where an object fits within one segment, < can just compare the offset part and not even bother with the segment part. (Pointers inside the same object will have the same segment, and otherwise it's UB in C. C++17 changed to merely "unspecified", which might still allow skipping normalization and just comparing offsets.) This is assuming all pointers to any part of an object always use the same seg value, never normalizing. This is what you'd expect an ABI to require for a "large" as opposed to "huge" memory model. (See discussion in comments).
(Such a memory model might have a max object size of 64kiB for example, but a much larger max total address space that has room for many such max-sized objects. ISO C allows implementations to have a limit on object size that's lower than the max value (unsigned) size_t can represent, SIZE_MAX. For example even on flat memory model systems, GNU C limits max object size to PTRDIFF_MAX so size calculation can ignore signed overflow.) See this answer and discussion in comments.
If you want to allow objects larger than a segment, you need a "huge" memory model that has to worry about overflowing the offset part of a pointer when doing p++ to loop through an array, or when doing indexing / pointer arithmetic. This leads to slower code everywhere, but would probably mean that p < q would happen to work for pointers to different objects, because an implementation targeting a "huge" memory model would normally choose to keep all pointers normalized all the time. See What are near, far and huge pointers? - some real C compilers for x86 real mode did have an option to compile for the "huge" model where all pointers defaulted to "huge" unless declared otherwise.
x86 real-mode segmentation isn't the only non-flat memory model possible, it's merely a useful concrete example to illustrate how it's been handled by C/C++ implementations. In real life, implementations extended ISO C with the concept of far vs. near pointers, allowing programmers to choose when they can get away with just storing / passing around the 16-bit offset part, relative to some common data segment.
But a pure ISO C implementation would have to choose between a small memory model (everything except code in the same 64kiB with 16-bit pointers) or large or huge with all pointers being 32-bit. Some loops could optimize by incrementing just the offset part, but pointer objects couldn't be optimized to be smaller.
If you knew what the magic manipulation was for any given implementation, you could implement it in pure C. The problem is that different systems use different addressing and the details aren't parameterized by any portable macros.
Or maybe not: it might involve looking something up from a special segment table or something, e.g. like x86 protected mode instead of real mode where the segment part of the address is an index, not a value to be left shifted. You could set up partially-overlapping segments in protected mode, and the segment selector parts of addresses wouldn't necessarily even be ordered in the same order as the corresponding segment base addresses. Getting a linear address from a seg:off pointer in x86 protected mode might involve a system call, if the GDT and/or LDT aren't mapped into readable pages in your process.
(Of course mainstream OSes for x86 use a flat memory model so the segment base is always 0 (except for thread-local storage using fs or gs segments), and only the 32-bit or 64-bit "offset" part is used as a pointer.)
You could manually add code for various specific platforms, e.g. by default assume flat, or #ifdef something to detect x86 real mode and split uintptr_t into 16-bit halves for seg += off>>4; off &= 0xf; then combine those parts back into a 32-bit number.
I once tried to find a way around this and I did find a solution that works for overlapping objects and in most other cases assuming the compiler does the "usual" thing.
You can first implement the suggestion in How to implement memmove in standard C without an intermediate copy? and then if that doesn't work cast to uintptr (a wrapper type for either uintptr_t or unsigned long long depending on whether uintptr_t is available) and get a most-likely accurate result (although it probably wouldn't matter anyway):
#include <stddef.h>
#include <stdint.h>
#ifndef UINTPTR_MAX
typedef unsigned long long uintptr;
#else
typedef uintptr_t uintptr;
#endif
int pcmp(const void *p1, const void *p2, size_t len)
{
    const unsigned char *s1 = p1;
    const unsigned char *s2 = p2;
    size_t l;

    /* Check for overlap */
    for( l = 0; l < len; l++ )
    {
        if( s1 + l == s2 || s1 + l == s2 + len - 1 )
        {
            /* The two objects overlap, so we're allowed to
               use comparison operators. */
            if(s1 > s2)
                return 1;
            else if (s1 < s2)
                return -1;
            else
                return 0;
        }
    }

    /* No overlap so the result probably won't really matter.
       Cast the result to `uintptr` and hope the compiler
       does the "usual" thing */
    if((uintptr)s1 > (uintptr)s2)
        return 1;
    else if ((uintptr)s1 < (uintptr)s2)
        return -1;
    else
        return 0;
}
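For example, calling it on two overlapping halves of the same buffer exercises the well-defined branch (a usage sketch):
char buf[32];
/* [buf, buf+16) and [buf+8, buf+24) overlap, so pcmp takes the
   same-object branch and the comparison is fully defined; r == -1. */
int r = pcmp(buf, buf + 8, 16);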
Does C offer something with similar functionality which would allow safely comparing arbitrary pointers?
No.
First let us only consider object pointers. Function pointers bring in a whole other set of concerns.
Two pointers p1, p2 can have different encodings yet point to the same address, so p1 == p2 even though memcmp(&p1, &p2, sizeof p1) is not 0. Such architectures are rare.
Yet conversion of these pointers to uintptr_t does not require the same integer result, leading to (uintptr_t)p1 != (uintptr_t)p2.
(uintptr_t)p1 < (uintptr_t)p2 is itself legal code, but may not provide the hoped-for functionality.
If code truly needs to compare unrelated pointers, write a helper function less(const void *p1, const void *p2) and put the platform-specific code there.
Perhaps:
// return -1,0,1 for <,==,>
int ptrcmp(const void *c1, const void *c2) {
    // Equivalence test works on all platforms
    if (c1 == c2) {
        return 0;
    }
    // At this point, we know pointers are not equivalent.
#ifdef UINTPTR_MAX
    uintptr_t u1 = (uintptr_t)c1;
    uintptr_t u2 = (uintptr_t)c2;
    // Below code "works" in that the computation is legal,
    // but does it function as desired?
    // Likely, but strange systems lurk out in the wild.
    // Check the implementation before using.
  #if tbd
    return (u1 > u2) - (u1 < u2);
  #else
    #error TBD code
  #endif
#else
    #error TBD code
#endif
}
The C Standard explicitly allows implementations to behave "in a documented manner characteristic of the environment" when an action invokes "Undefined Behavior". When the Standard was written, it would have been obvious to everyone that implementations intended for low-level programming on platforms with a flat memory model should do precisely that when processing relational operators between arbitrary pointers. It also would have been obvious that implementations targeting platforms whose natural means of pointer comparisons would never have side effects should perform comparisons between arbitrary pointers in ways that don't have side effects.
There are three general circumstances where programmers might apply relational operators to pointers:
1. Code will never compare pointers to unrelated objects.
2. Code may compare pointers within an object in cases where the results would matter, or between unrelated objects in cases where the results wouldn't matter. A simple example of this would be an operation that can act upon possibly-overlapping array segments in either ascending or descending order (see the sketch after this list). The choice of ascending or descending order would matter in cases where the objects overlap, but either order would be equally valid when acting upon array segments in unrelated objects.
3. Code relies upon comparisons yielding a transitive ordering consistent with pointer equality.
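A sketch of the second circumstance, as a memmove-style copy (the function name is illustrative): the direction of the copy is chosen by a pointer comparison whose result only matters when the two regions overlap, and overlapping regions lie within the same object, where < is well-defined.
#include <stddef.h>

void *copy_either_direction(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) {                     /* result only matters if regions overlap */
        while (n--) *d++ = *s++;     /* ascending copy */
    } else {
        d += n;
        s += n;
        while (n--) *--d = *--s;     /* descending copy */
    }
    return dst;
}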
The third type of usage would seldom occur outside of platform-specific code, which would either know that relational operators would simply work, or would know a platform-specific alternative. The second type of usage could occur in code which should be mostly portable, but almost all implementations could support the second type of usage just as cheaply as the first and there would be no reasons for them to do otherwise. The only people who should have any reason to care about whether the second usage was defined would be people writing compilers for platforms where such comparisons would be expensive or those seeking to ensure that their programs would be compatible with such platforms. Such people would be better placed than the Committee to judge the pros and cons of upholding a "no side effects" guarantee, and thus the Committee leaves the question open.
To be sure, the fact that there would be no reason for a compiler not to process a construct usefully is no guarantee that a "Gratuitously Clever Compiler" won't use the Standard as an excuse to do otherwise, but the reason the C Standard doesn't define a "less" operator is that the Committee expected that "<" would be adequate for almost all programs on almost all platforms.
I was looking at the GNU implementation of obstacks, and I noticed that the obstack_free subroutine uses pointer comparisons with the beginnings and ends of the previous links of the linked list to find which block the pointer to be freed belongs to.
https://code.woboq.org/userspace/glibc/malloc/obstack.c.html
while (lp != 0 && ((void *) lp >= obj || (void *) (lp)->limit < obj))
  {
    plp = lp->prev;
    CALL_FREEFUN (h, lp);
    lp = plp;
    h->maybe_empty_object = 1;
  } //...
Such comparison appears to be undefined as per http://port70.net/~nsz/c/c11/n1570.html#6.5.8p5:
When two pointers are compared, the result depends on the relative
locations in the address space of the objects pointed to. If two
pointers to object types both point to the same object, or both point
one past the last element of the same array object, they compare
equal. If the objects pointed to are members of the same aggregate
object, pointers to structure members declared later compare greater
than pointers to members declared earlier in the structure, and
pointers to array elements with larger subscript values compare
greater than pointers to elements of the same array with lower
subscript values. All pointers to members of the same union object
compare equal. If the expression P points to an element of an array
object and the expression Q points to the last element of the same
array object, the pointer expression Q+1 compares greater than P. In
all other cases, the behavior is undefined.
Is there a fully standards-compliant way to implement obstacks? If not, what platforms could such a comparison practically break on?
I am not a language-lawyer, so I don't know how to answer OP's question, except that a plain reading of the standard does not describe the entire picture.
While the standard says that comparing unrelated pointers yields undefined results, the behaviour of a standards-compliant C compiler is much more restricted.
The first sentence in the section concerning pointer comparison is
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to.
and for a very good reason.
If we examine the possibilities how the pointer comparison code may be used, we find that unless the compiler can determine which objects the compared pointers belong to at compile time, all pointers in the same address space must compare arithmetically, according to the addresses they refer to.
(If we prove that a standards-compliant C compiler is required by the standard to provide specific results when a plain reading of the C standard itself says the results are undefined, is such code standards-compliant or not? I don't know. I only know such code works in practice.)
A literal interpretation of the standard may lead to one believing that there is absolutely no way of determining whether a pointer refers to an array element or not. In particular, observing
int is_within(const char *arr, const size_t len, const char *ptr)
{
    return (ptr >= arr) && (ptr < (arr + len));
}
a standards compliant C compiler could decide that because comparison between unrelated pointers is undefined, it is justified in optimizing the above function into
int is_within(const char *arr, const size_t len, const char *ptr)
{
    if (len)
        return ptr != (arr + len);
    else
        return 0;
}
which returns 1 for pointers within array const char arr[len], and zero at the element just past the end of the array, just like the standard requires; and 1 for all undefined cases.
The problem in that line of thinking arises when a caller, in a separate compilation unit, does e.g.
char buffer[1024];
char *p = buffer + 768;

if (is_within(buffer, (sizeof buffer) / 2, p)) {
    /* bug */
} else {
    /* correct */
}
Obviously, if the is_within() function was declared static (or static inline), the compiler could examine all call chains that end up in is_within(), and produce correct code.
However, when is_within() is in a separate compilation unit compared to its callers, the compiler can no longer make such assumptions: it simply does not, and cannot, know the object boundaries beforehand. Instead, the only way it can be implemented by a standards-compliant C compiler is to rely blindly on the addresses the pointers refer to; something like
int is_within(const char *arr, const size_t len, const char *ptr)
{
    const uintptr_t start = POINTER_TO_UINTPTR(arr);
    const uintptr_t limit = POINTER_TO_UINTPTR(arr + len);
    const uintptr_t thing = POINTER_TO_UINTPTR(ptr);

    return (thing >= start) && (thing < limit);
}
where the POINTER_TO_UINTPTR() would be a compiler-internal macro or function, that converts the pointer losslessly to an unsigned integer value (with the intent that there would be a corresponding UINTPTR_TO_POINTER() that could recover the exact same pointer from the unsigned integer value), without consideration for any optimizations or rules allowed by the C standard.
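On a typical flat-memory implementation, these hypothetical macros could plausibly be nothing more than the standard conversions, as in this sketch (ISO C does not promise the conversion is lossless on every platform, which is exactly the point):
#include <stdint.h>

#define POINTER_TO_UINTPTR(p) ((uintptr_t)(const void *)(p))
#define UINTPTR_TO_POINTER(u) ((void *)(uintptr_t)(u))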
So, if we assume that the code is compiled in a separate compilation unit from its callers, the compiler is forced to generate code that provides more guarantees than a simple reading of the C standard would indicate.
In particular, if arr and ptr are in the same address space, the C compiler must generate code that compares the addresses the pointers point to, even if the C standard says that comparison of unrelated pointers yields undefined results; simply because it is at least theoretically possible for an array of objects to occupy any subregion of the address space. The compiler just cannot make assumptions that break conforming C code later on.
In the GNU obstack implementation, the obstacks all exist in the same address space (because of how they are obtained from the OS/kernel). The code assumes that the pointers supplied to it refer to these objects. Although the code does return an error if it detects that a pointer is invalid, it does not guarantee it always detects invalid pointers; thus, we can ignore the invalid pointer cases, and simply assume that because all obstacks are from the same address space, so are all the user-supplied pointers.
There are many architectures with multiple address spaces. x86 with a segmented memory model is one of these. Many microcontrollers have Harvard architecture, with separate address spaces for code and data. Some microcontrollers have a separate address space (different machine instructions) for accessing RAM and flash memory (but capable of executing from both), and so on.
It is even possible for there to be an architecture where each pointer has not only its memory address, but some kind of unique object ID associated with it. This is nothing special; it just means that on such an architecture, each object has their own address space.
I'm building a garbage collector for a compiler. I work with "physical" and "virtual" addresses. The virtual addresses have type value_t and are 32 bits. The physical addresses have type value_t* and are 32 or 64 bits pointers depending on the host machine. The conversion between the two is as follows:
static void* addr_v_to_p(value_t v_addr) {
assert(0 <= v_addr);
return (char*)memory_start + v_addr;
}
static value_t addr_p_to_v(void* p_addr) {
assert(memory_start <= p_addr && p_addr <= memory_end);
return (value_t)((char*)p_addr - (char*)memory_start);
}
Then I set up a memory layout similar to this (figure omitted), where the bitmap has to reference values of the heap.
Problem
I want to compute the index in the bitmap of an address in a machine-independent way. (For that purpose it becomes handy to set VALUE_BITS = sizeof(value_t) * CHAR_BITS as the number of bits of the value_t type.) I would write:
(ptr, heap_start, bitmap_start of type value_t*)
size_t index = ptr - heap_start;
size_t word_index = index / VALUE_BITS;
bitmap_start[word_index] = ...;
But I'm not sure this is going to work.
As I understand the question, your "bitmap" is intended to be sequence of bits, with a unique bit corresponding to each heap address. Presumably you also want to minimize the number of unused bits. You are asking about your proposed approach to mapping between heap addresses and bits.
Furthermore, the conversion functions you present between virtual and physical addresses suggest that your memory model is byte-addressable, as opposed, say, to being addressable only with the granularity of a value_t.
Since your bitmap is apparently accessed in units of type value_t, which I'm taking to be unsigned and without padding bits, the number of usable bits in each unit is sizeof(value_t) * CHAR_BIT. That matches your VALUE_BITS, modulo spelling. Still, if bitmap_start is going to be (or reasonably can be made to be) visible wherever VALUE_BITS is defined (if it is a variable) or used (if it is a macro), then I would be inclined to write its initializer / replacement text as (sizeof(*bitmap_start) * CHAR_BIT). That's clearer to me and adapts automatically if ever you change the type to which bitmap_start points.
Now let's consider your code starting with this:
size_t index = ptr - heap_start;
There's nothing inherently wrong with that, but remember that pointer arithmetic is defined in terms of units of the pointed-to type. Thus, that gives the number of units of type value_t in the half-open interval defined by the two pointers, supposing ptr points into or just past the end of the heap and is properly aligned. That alignment caveat matters because your model is byte-addressable, therefore there are valid values that ptr can take that are misaligned. In fact, the majority of valid ptr values are misaligned. If you want the index in terms of a byte offset into the heap -- and it appears you do -- then you want something more like this:
ptrdiff_t index = (char *) ptr - (char *) heap_start;
Let's move on to the next part:
size_t word_index = index / VALUE_BITS;
It seems you're trying to determine the storage unit in the bitset that contains the indexth bit. If we stipulate that each unit contains VALUE_BITS usable bits, and that you want every bit in each unit to correspond to a heap address, then this is fine.
But you seem to have run out of steam there, as this ...
bitmap_start[word_index] = ...;
... is slightly lacking in detail. You're going to need to use bit masking to select the appropriate bit of bitmap_start[word_index] to examine or set, and in doing so you'll need to take care to avoid modifying the other bits in the same unit. That's not hard, but I'm not going to do it for you.
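For illustration only, a typical masking pattern might look like this sketch, reusing the names from the question and the corrected index above (index, word_index, VALUE_BITS, bitmap_start):
size_t bit = (size_t)index % VALUE_BITS;     /* bit position within the unit */
value_t mask = (value_t)1 << bit;

bitmap_start[word_index] |= mask;            /* set the bit */
bitmap_start[word_index] &= (value_t)~mask;  /* clear the bit */
int is_set = (bitmap_start[word_index] & mask) != 0;  /* test the bit */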
Is the following correct code?
Assume it is known that all object pointer types have equal size and alignment, with size not greater than 8.
// allocate some space to A, and set *A and **A to different regions of that space
char*** A = malloc(92);
*A = (char**)( (char*)A + 2*sizeof(char**) );
**A = (char*)*A + 4*sizeof(char*);
// initialize the second char** object
A[1] = *A + 2;
// write four strings further out in the space
strcpy(A[0][0],"string-0-0");
A[0][1] = A[0][0] + strlen(A[0][0]) + 1;
strcpy(A[0][1],"string-0-1");
A[1][0] = A[0][1] + strlen(A[0][1]) + 1;
strcpy(A[1][0],"string-1-0");
A[1][1] = A[1][0] + strlen(A[1][0]) + 1;
strcpy(A[1][1],"string-1-1");
I find stuff like this useful in situations where it may not be straightforward how to deallocate the object. For example, say A[1][1] may or may not be reassigned to the address of a string literal. Either way you just free A. Also, the number of calls to malloc is minimized.
My concern that this may not be correct code is based on the following. Using a draft version of the standard, I have:
7.22.3 Memory management functions
...The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated)...
So I am guaranteed to be able to use the space as an array of a single type. I cannot find any guarantee that I can use it as an array of two distinct types (char* and char**). Note that the use of some of the space as char arrays is unique, given that any object can be accessed as a character type array.
The rules for effective type are consistent with this approach, as no individual byte is ever used as part of two different types.
While the above indicates there seems to be no explicit violation of the standard, the standard does not explicitly permit the behavior either, and we have paragraph 2 of chapter 4 (emphasis added):
If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe ‘‘behavior that is undefined’’.
That is a little vague when the memory model itself is so vague. The use of space returned by malloc() to store an array of any (one) type is apparently in need of explicit allowance, which allowance I quoted above. So one could argue that the use of that space for disjoint arrays of distinct types also requires explicit allowance and, without it, is left as undefined behavior per chapter 4.
So, to be specific, if the code example is correct, what is wrong with the argument that it is not explicitly defined and therefore undefined, per the part of chapter 4 from the standard that is above quoted?
Provided that the offset of any object from the start of the allocated region is a multiple of the alignment, and provided that no piece of the memory is used as more than one type within the lifetime of the allocation, there will be no problem.
One nasty gotcha is that while there are some algorithms (e.g. for hash tables) that will work very nicely with a table that is initially filled with arbitrary values (given a value that may or may not be correct, code may be able to determine whether the value is correct much more quickly--O(1) vs O(N)--than it could find the correct value without an initial guess), such behavior may not be reliable when using gcc or clang. The way they interpret the Standard, writing memory as one type and reading it as another non-character type yields Undefined Behavior even if the destination type would have no trap representations, even if pointers which are converted to a new type are never used (in their original form) after that, and even if the code would work correctly for any value the read might have yielded in cases where the data had not yet been written as the new type.
Given:
float *fp;
uint32_t *ip;
*fp = 1.0f;
*ip = 23;
Behavior would be defined if fp and ip identify the same storage, and the storage would be required to hold 23 afterward. On the other hand, given:
float *fp;
uint32_t *ip;
*fp = 1.0f;
uint32_t temp = *ip;
*ip = 23;
a compiler could take advantage of the Undefined Behavior to reorder the operations so the write to *fp occurred after the write to *ip. If the data is getting reused for something like a hash table it's unlikely that a compiler would be able to usefully apply such optimization, but a "smart" compiler might reorder writes of useless data past the writes of useful data. Such "optimizations" aren't terribly likely to break things, but there is no Standard-defined way to prevent them except by either freeing and reallocating the storage (if one is using a hosted system and can tolerate the performance hit and possible fragmentation) or else clearing the storage before re-use (which may turn O(1) operations into O(N)).
I am writing a library for various mathematical computations in C. Several of these need some "scratch" space -- memory that is used for intermediate calculations. The space required depends on the size of the inputs, so it cannot be statically allocated. The library will typically be used to perform many iterations of the same type of calculation with the same size inputs, so I'd prefer not to malloc and free inside the library for each call; it would be much more efficient to allocate a large enough block once, re-use it for all the calculations, then free it.
My intended strategy is to request a void pointer to a single block of memory, perhaps with an accompanying allocation function. Say, something like this:
void *allocateScratch(size_t rows, size_t columns);
void doCalculation(size_t rows, size_t columns, double *data, void *scratch);
The idea is that if the user intends to do several calculations of the same size, he may use the allocate function to grab a block that is large enough, then use that same block of memory to perform the calculation for each of the inputs. The allocate function is not strictly necessary, but it simplifies the interface and makes it easier to change the storage requirements in the future, without each user of the library needing to know exactly how much space is required.
In many cases, the block of memory I need is just a large array of type double, no problems there. But in some cases I need mixed data types -- say a block of doubles AND a block of integers. My code needs to be portable and should conform to the ANSI standard. I know that it is OK to cast a void pointer to any other pointer type, but I'm concerned about alignment issues if I try to use the same block for two types.
So, specific example. Say I need a block of 3 doubles and 5 ints. Can I implement my functions like this:
void *allocateScratch(...) {
    return malloc(3 * sizeof(double) + 5 * sizeof(int));
}

void doCalculation(..., void *scratch) {
    double *dblArray = scratch;
    int *intArray = (int *)((unsigned char *)scratch + 3 * sizeof(double));
}
Is this legal? The alignment probably works out OK in this example, but what if I switch it around and take the int block first and the double block second? That will shift the alignment of the doubles (assuming 64-bit doubles and 32-bit ints). Is there a better way to do this? Or a more standard approach I should consider?
My biggest goals are as follows:
I'd like to use a single block if possible so the user doesn't have to deal with multiple blocks or a changing number of blocks required.
I'd like the block to be a valid block obtained by malloc so the user can call free when finished. This means I don't want to do something like creating a small struct that has pointers to each block and then allocating each block separately, which would require a special destroy function; I'm willing to do that if that's the "only" way.
The algorithms and memory requirements may change, so I'm trying to use the allocate function so that future versions can get different amounts of memory for potentially different types of data without breaking backward compatibility.
Maybe this issue is addressed in the C standard, but I haven't been able to find it.
The memory of a single malloc can be partitioned for use in multiple arrays as shown below.
Suppose we want arrays of types A, B, and C with NA, NB, and NC elements. We do this:
size_t Offset = 0;

ptrdiff_t OffsetA = Offset;              // Put array at current offset.
Offset += NA * sizeof(A);                // Move offset to end of array.

Offset = RoundUp(Offset, sizeof(B));     // Align sufficiently for type.
ptrdiff_t OffsetB = Offset;              // Put array at current offset.
Offset += NB * sizeof(B);                // Move offset to end of array.

Offset = RoundUp(Offset, sizeof(C));     // Align sufficiently for type.
ptrdiff_t OffsetC = Offset;              // Put array at current offset.
Offset += NC * sizeof(C);                // Move offset to end of array.

unsigned char *Memory = malloc(Offset);  // Allocate memory.

// Set pointers for arrays.
A *pA = (A *) (Memory + OffsetA);
B *pB = (B *) (Memory + OffsetB);
C *pC = (C *) (Memory + OffsetC);
where RoundUp is:
// Return Offset rounded up to a multiple of Size.
size_t RoundUp(size_t Offset, size_t Size)
{
    size_t x = Offset + Size - 1;
    return x - x % Size;
}
This uses the fact, as noted by R.., that the size of a type must be a multiple of the alignment requirement for that type. In C 2011, sizeof in the RoundUp calls can be changed to _Alignof, and this may save a small amount of space when the alignment requirement of a type is less than its size.
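For example, the C 2011 variant of one of the alignment lines would be the following substitution:
Offset = RoundUp(Offset, _Alignof(B));   // Align sufficiently for type.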
If the user is calling your library's allocation function, then they should call your library's freeing function. This is very typical (and good) interface design.
So I would say just go with the struct of pointers to different pools for your different types. That's clean, simple, and portable, and anybody who reads your code will see exactly what you are up to.
If you do not mind wasting memory and insist on a single block, you could create a union with all of your types and then allocate an array of those...
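A minimal sketch of that union approach, assuming the double-and-int example from the question (the type and variable names here are illustrative):
#include <stdlib.h>

union scratch_elem {
    double d;   /* every element is aligned and sized for the largest member */
    int i;
};

/* One malloc, one free; wastes space whenever the member sizes differ. */
union scratch_elem *block = malloc(8 * sizeof *block);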
Trying to find appropriately aligned memory in a massive block is just a mess. I am not even sure you can do it portably. What's the plan? Cast pointers to intptr_t, do some rounding, then cast back to a pointer?
The latest C11 standard has the max_align_t type (and _Alignas specifier and _Alignof operator and <stdalign.h> header).
The GCC compiler has a __BIGGEST_ALIGNMENT__ macro (giving the largest alignment ever used on the target). It also provides some extensions related to alignment.
Often, using 2*sizeof(void*) (as the biggest relevant alignment) is in practice quite safe (at least on most of the systems I heard about these days; but one could imagine weird processors and systems where it is not the case, perhaps some DSPs). To be sure, study the details of the ABI and calling conventions of your particular implementation, e.g. the x86-64 ABI and x86 calling conventions...
And the system malloc is guaranteed to return a sufficiently aligned pointer (for all purposes).
On some systems and targets and some processors giving a larger alignment might give performance benefit (notably when asking the compiler to optimize). You may have to (or want to) tell the compiler about that, e.g. on GCC using variable attributes...
Don't forget that according to Fulton
there is no such thing as portable software, only software that has been ported.
but intptr_t and max_align_t are here to help you....
Note that the required alignment for any type must evenly divide the size of the type; this is a consequence of the representation of array types. Thus, in the absence of C11 features to determine the required alignment for a type, you can just estimate conservatively and use the type's size. In other words, if you want to carve up part of an allocation from malloc for use storing doubles, make sure it starts at an offset that's a multiple of sizeof(double).
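As a quick sketch of that rule, with a raw block from malloc and an arbitrary running byte offset (variable names are illustrative):
#include <stdlib.h>

unsigned char *block = malloc(1024);
size_t off = 5;  /* arbitrary running byte offset into the block */

/* Round up to a multiple of sizeof(double); sufficient because the
   required alignment of a type evenly divides its size. */
off = (off + sizeof(double) - 1) / sizeof(double) * sizeof(double);
double *d = (double *)(block + off);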