R_alloc and alignment considerations

R_alloc and alignment considerations - c

I'm writing an R extension that requires me to allocate memory for an array of structs. The structs contain integers, SEXPs, character pointers, e.g. something like:
struct my_struct {
int a;
SEXP b;
const char * d;
const char * e[3];
};
I'm hoping to allocate memory for the array using something like:
struct my_struct * arr = (struct my_struct *) R_alloc(10, sizeof(my_struct));
This works in the strictest sense of the word, but I'm given pause by the following comment from WRE:
The memory returned is only guaranteed to be aligned as required for double pointers: take precautions if casting to a pointer which needs more.
I am not concerned about speed or space as I don't expect my arrays to be large or accessed frequently. I do however want to avoid crashes. My understanding is that misaligned memory access is really only a performance issue for x86 architectures. Furthermore, since it seems nowadays R is primarily for x86 architectures (based on CRAN tests), and I'm not concerned about performance, I shouldn't have any problems doing this.
Am I setting myself up for trouble?
EDIT: I'm assuming here that "double pointers" means pointers to doubles, and not pointers to pointers as seems to be the informal convention in some places. FWIW the code in R_alloc (src/main/memory.c#~2700) wants to allocate in multiples of sizeof(VECREC), where VECREC is union(SEXP, double) (src/include/Defn.h#~410), plus header offset. Presumably this is where the alignment guarantee for doubles comes from, but I'm not sure why this would be a problem for larger structs. Granted, I'm not particularly experienced at this.

I'm still not an expert on the matter, but according to at least this site:
In general, a struct instance will have the alignment of its widest scalar member. Compilers do this as the easiest way to ensure that all the members are self-aligned for fast access.
So in this case, the struct should have an alignment compatible with what R_alloc allocates, since it's largest member will be the SEXP (probably, I guess it might be possible for char * to be bigger on some systems, but seems unlikely), and R_alloc allocates in multiples of union(SEXP, double), as documented in the question.
One other thing perhaps worth pointing out is that the recommended ordering of the elements in a struct is from largest alignment requirement to lowest, and right now we start with an int, which is the smallest.

Related

Single malloc call for multiple data structures

Given the following structure:
struct myStructure {
float myField;
};
struct mySecondStructure {
int myField;
struct myStructure *mySecondField;
};
How can I efficiently allocate the struct mySecondStructure on the heap? (With the variable mySecondField initialized to a valid memory address)
I came up with two ways doing this:
First approach: use two calls to malloc:
struct mySecondStructure *structure = malloc(sizeof(* structure));
structure->mySecondField = malloc(sizeof(* structure->mySecondField));
Second approach: use one call to malloc with enough space for both structures:
struct mySecondStructure *structure = malloc(sizeof(struct myStructure) + sizeof(struct mySecondStructure));
structure->mySecondField = (struct myStructure *)(((unsigned char *)structure) + sizeof(struct mySecondStructure)); // the pointer points now to the end of the first structure
But the first one is in my opinion inefficient since it uses two calls to malloc. However I'm not sure if the second one will work properly because of data alignment.
Can someone enlighten me about this?
I'm thankful for any given help.

The second approach should work; structs are padded so that they align properly (for example, consider if you had an array of them). One downside to the second approach would be if you wanted to reclaim the second structure (there is a reason its not just embedded in the first structure, I presume).
But the first is a lot cleaner, and what benefits there are to the second are minimal at best.

Both your approaches would work.
The difference between one malloc() call and two depends on how malloc() itself is implemented - which varies between implementations, is affected by operating system and hardware characteristics.
Without measuring the difference for your implementation, in the context of your program, and a typical execution environment (operating system, hardware, etc) all you are doing is indulging in premature optimisation.
Generally speaking, it would be better to do things in a way that are easier to understand (less complex code), easier to get right, and therefore easier to maintain. Then, if performance is a concern, do measurements to work out the bottlenecks.
In practice, you might also want to consider having a struct myStructure within your struct mySecondStructure, rather than a pointer [which is what Joachim's comment is about]. That way, you can allocate with a single malloc() call, without any need for additional bookkeeping to initialise pointers. Simpler to implement, easier to understand, and - if number of malloc() calls really matters - minimises that number.

Depending on the members of the structures, it's possible that they have different alignment requirements. With C11, these requirements can be inspected with _Alignof (also related: _Alignas, max_align_t, <stdalign.h>).
A portable solution (also for C99) would be to add padding manually, something along the following:
struct mySecondStructure *structure;
size_t secondAlign = sizeof *structure->mySecondField;
size_t offset = (sizeof *structure + secondAlign - 1) / secondAlign
* secondAlign;
size_t total = offset + sizeof *structure->mySecondField;
structure = malloc(total);
structure->mySecondField = (void *)((char *)structure + offset);
With C11,
size_t secondAlign = _Alignof(struct myStructure);
also yields a valid (possibly smaller) result.
Otherwise, I agree with the others' comments and answers that this looks like premature optimization. What perhaps could be done now, however, is thinking about how changes in the allocation strategy may influence other peaces of the code and trying to keep such inter-dependencies to a minimum in order to allow later changes if this turns out to be worth it (e.g. if it should be possible to partially release memory, as covered in Scott's answer).

Structure Elements On The Heap vs The Stack

So, I am creating a structure that currently needs a lot of memory. I hope to reduce it in the future, but for now, it is what it is. Hence, I need to allocate some of its elements on the heap because I get a stack overflow if they are put on the stack. And yes, I increased the stack size but on the target platform I only have so much.
In this case, would it be 'better' to allocate every structure element on the heap, or put some on the stack and the big stuff on the heap? For instance:
typedef struct my_structure_s{
int bounds[2];
int num_values;
int* values; //needs to be very large
} my_structure_t;
Vs:
typedef struct my_structure_s{
int* bounds;
int* num_values;
int* values;
} my_structure_t;
I know 'better' is largely subjective, and could quite possibly incite a riot here. So, what are the pros and cons of both examples? What do you usually do? Why?
Also, forgive the _s, _t stuff...I know some of you may find it in bad taste but that is the convention for the legacy codebase this will be integrated into.
Thanks everyone!

It is better to keep the simple members as direct values, and allocate just the array. Using the extra two pointers just slows down access for no benefit.
One other option to consider if you have C99 or C11 is to use a flexible array member (FAM).
You'd define your structure using the notation:
typedef struct my_structure_s
{
int bounds[2];
int num_values;
int values[];
} my_structure_t;
You'd allocate enough memory for the structure and an N-element array in values all in a single operation, using:
my_structure_t *np = malloc(sizeof(*np) + N * sizeof(np->values[0]));
This then means you only have to free one block of memory to free.
You can find references to the 'struct hack' if you search. This notation is effectively the standardized form of the struct hack.
In comments, the discussion continued:
This is an interesting approach; however, I can't guarantee I will have C99.
If need be, you can use the 'struct hack' version of the code, which would look like:
typedef struct my_structure_s
{
int bounds[2];
int num_values;
int values[1];
} my_structure_t;
The rest of the code remains unchanged. This uses slightly more memory (4-8 bytes more) than the FAM solution, and isn't strictly supported by the standard, but it was used extensively before the C99 standard so it is unlikely that a compiler would invalidate such code.
Okay, but how about:
typedef struct my_structure_s
{
int bounds[2];
int num_values;
int values[MAX_SIZE];
} my_structure_t;
And then: my_structure_t *the_structure = malloc(sizeof(my_structure_t));
This will also give me a fixed block size on the heap right? (Except here, my block size will be bigger than it needs to be, in some instances, because I won't always get to MAX_SIZE).
If there is not too much wasted space on average, then the fixed-size array in the structure is simpler still. Further, it means that if the MAX_SIZE is not too huge, you can allocate on the stack or on the heap, whereas the FAM approach mandates dynamic (heap) allocation. The issue is whether the wasted space is enough of a problem, and what you do if MAX_SIZE isn't big enough after all. Otherwise, this is much the simplest approach; I simply assumed you'd already ruled it out.
Note that every one of the suggested solutions avoids the pointers to bounds and num_values suggested in option 2 in the question.

do the first one. It is simpler and less error prone (you have to remember to allocate and release more things in the second one)
BTW - not that the first example will not put num_values on the stack. IT will go wherever you allocate the struct, stack, heap of static

What alignment issues limit the use of a block of memory created by malloc?

I am writing a library for various mathematical computations in C. Several of these need some "scratch" space -- memory that is used for intermediate calculations. The space required depends on the size of the inputs, so it cannot be statically allocated. The library will typically be used to perform many iterations of the same type of calculation with the same size inputs, so I'd prefer not to malloc and free inside the library for each call; it would be much more efficient to allocate a large enough block once, re-use it for all the calculations, then free it.
My intended strategy is to request a void pointer to a single block of memory, perhaps with an accompanying allocation function. Say, something like this:
void *allocateScratch(size_t rows, size_t columns);
void doCalculation(size_t rows, size_t columns, double *data, void *scratch);
The idea is that if the user intends to do several calculations of the same size, he may use the allocate function to grab a block that is large enough, then use that same block of memory to perform the calculation for each of the inputs. The allocate function is not strictly necessary, but it simplifies the interface and makes it easier to change the storage requirements in the future, without each user of the library needing to know exactly how much space is required.
In many cases, the block of memory I need is just a large array of type double, no problems there. But in some cases I need mixed data types -- say a block of doubles AND a block of integers. My code needs to be portable and should conform to the ANSI standard. I know that it is OK to cast a void pointer to any other pointer type, but I'm concerned about alignment issues if I try to use the same block for two types.
So, specific example. Say I need a block of 3 doubles and 5 ints. Can I implement my functions like this:
void *allocateScratch(...) {
return malloc(3 * sizeof(double) + 5 * sizeof(int));
}
void doCalculation(..., void *scratch) {
double *dblArray = scratch;
int *intArray = ((unsigned char*)scratch) + 3 * sizeof(double);
}
Is this legal? The alignment probably works out OK in this example, but what if I switch it around and take the int block first and the double block second, that will shift the alignment of the double's (assuming 64-bit doubles and 32-bit ints). Is there a better way to do this? Or a more standard approach I should consider?
My biggest goals are as follows:
I'd like to use a single block if possible so the user doesn't have to deal with multiple blocks or a changing number of blocks required.
I'd like the block to be a valid block obtained by malloc so the user can call free when finished. This means I don't want to do something like creating a small struct that has pointers to each block and then allocating each block separately, which would require a special destroy function; I'm willing to do that if that's the "only" way.
The algorithms and memory requirements may change, so I'm trying to use the allocate function so that future versions can get different amounts of memory for potentially different types of data without breaking backward compatibility.
Maybe this issue is addressed in the C standard, but I haven't been able to find it.

The memory of a single malloc can be partitioned for use in multiple arrays as shown below.
Suppose we want arrays of types A, B, and C with NA, NB, and NC elements. We do this:
size_t Offset = 0;
ptrdiff_t OffsetA = Offset; // Put array at current offset.
Offset += NA * sizeof(A); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(B)); // Align sufficiently for type.
ptrdiff_t OffsetB = Offset; // Put array at current offset.
Offset += NB * sizeof(B); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(C)); // Align sufficiently for type.
ptrdiff_t OffsetC = Offset; // Put array at current offset.
Offset += NC * sizeof(C); // Move offset to end of array.
unsigned char *Memory = malloc(Offset); // Allocate memory.
// Set pointers for arrays.
A *pA = Memory + OffsetA;
B *pB = Memory + OffsetB;
C *pC = Memory + OffsetC;
where RoundUp is:
// Return Offset rounded up to a multiple of Size.
size_t RoundUp(size_t Offset, size_t Size)
{
size_t x = Offset + Size - 1;
return x - x % Size;
}
This uses the fact, as noted by R.., that the size of a type must be a multiple of the alignment requirement for that type. In C 2011, sizeof in the RoundUp calls can be changed to _Alignof, and this may save a small amount of space when the alignment requirement of a type is less than its size.

If the user is calling your library's allocation function, then they should call your library's freeing function. This is very typical (and good) interface design.
So I would say just go with the struct of pointers to different pools for your different types. That's clean, simple, and portable, and anybody who reads your code will see exactly what you are up to.
If you do not mind wasting memory and insist on a single block, you could create a union with all of your types and then allocate an array of those...
Trying to find appropriately aligned memory in a massive block is just a mess. I am not even sure you can do it portably. What's the plan? Cast pointers to intptr_t, do some rounding, then cast back to a pointer?

The latest C11 standard has the max_align_t type (and _Alignas specifier and _Alignof operator and <stdalign.h> header).
GCC compiler has a __BIGGEST_ALIGNMENT__ macro (giving the maximal size alignment). It also proves some extensions related to alignment.
Often, using 2*sizeof(void*) (as the biggest relevant alignment) is in practice quite safe (at least on most of the systems I heard about these days; but one could imagine weird processors and systems where it is not the case, perhaps some DSP-s). To be sure, study the details of the ABI and calling conventions of your particular implementation, e.g. x86-64 ABI and x86 calling conventions...
And the system malloc is guaranteed to return a sufficiently aligned pointer (for all purposes).
On some systems and targets and some processors giving a larger alignment might give performance benefit (notably when asking the compiler to optimize). You may have to (or want to) tell the compiler about that, e.g. on GCC using variable attributes...
Don't forget that according to Fulton
there is no such thing as portable software, only software that has been ported.
but intptr_t and max_align_t is here to help you....

Note that the required alignment for any type must evenly divide the size of the type; this is a consequence of the representation of array types. Thus, in the absence of C11 features to determine the required alignment for a type, you can just estimate conservatively and use the type's size. In other words, if you want to carve up part of an allocation from malloc for use storing doubles, make sure it starts at an offset that's a multiple of sizeof(double).

What is there to be gained by deterministic field ordering in the memory layout?

Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
}
The padding of a structure is non-deterministic (that is, is implementation-dependent), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-structs solves this in gcc, but bears other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?

The compiler is limited by several traditional and practical limitations.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has been often used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements that you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.

If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code use a trick where it assumes that the last member of the structure is the highest-addressed in memory to signify the start of a much larger data block (of unknown size at compile time).

Reordering fields of structures can sometime yield good gains in data size and often also in code size, especially in 64 bit memory model. Here an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
If you have an array of these structures you gain 50% in the data but also in code size, as indexing on a power of 2 is simpler than on other sizes.
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
As for the other point of your question. Why doesn't the compiler do this reordering of fields, it is because in that case, it would be difficult to implement unions of structures that use a common pattern. Like for example.
struct header {
enum type;
int len;
};
struct a {
enum type;
int len;
bool whatever1;
};
struct b {
enum type;
int len;
long whatever2;
long whatever4;
};
struct c {
enum type;
int len;
float fl;
};
union u {
struct h header;
struct a a;
struct b b;
struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
If I remember correctly the standard even mandates this behaviour.

sizeof sideeffect and allocation location [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why isn’t sizeof for a struct equal to the sum of sizeof of each member?
I can not understand why is it like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
char b;
int a;
} A;
typedef struct
{
char b;
} B;
int main() {
A object;
printf("sizeof char is: %d\n",sizeof(char));
printf("sizeof int is: %d\n",sizeof(int));
printf("==> the sizeof both are: %d\n",sizeof(int)+sizeof(char));
printf("and yet the sizeof struct A is: %d\n",sizeof(object));
printf("why?\n");
B secondObject;
printf("pay attention that the sizeof struct B is: %d which is equal to the "
"sizeof char\n",sizeof(secondObject));
return 0;
}
I think I explained my question in the code and there is no more need to explain. besides I have another question:
I know there is allocation on the: heap/static heap/stack, but what is that means that the allocation location is unknown, How could it be ?
I am talking about this example:
typedef struct
{
char *_name;
int _id;
} Entry;
int main()
{
Entry ** vec = (Entry**) malloc(sizeof(Entry*)*2);
vec[0] = (Entry *) malloc(sizeof (Entry));
vec[0]->_name = (char*)malloc(6);
strcpy (vec[0]->_name, "name");
vec[0]->_id = 0;
return 0;
}
I know that:
vec is on the stack.
*vec is on the heap.
*vec[0] is on the heap.
vec[0]->id is on the heap.
but :
vec[0]->_name is unknown
why ?

There is an unspecified amount of padding between the members of a structure and at the end of a structure. In C the size of a structure object is greater than or equal to the sum of the size of its members.

Take a look at this question as well as this one and many others if you search for CPU and memory alignment. In short, CPUs are happier if they access the memory aligned to the size of the data they are reading. For example, if you are reading a uint16_t, then it would be more efficient (on most CPUs) if you read at an address that is a multiple of 2. The details of why CPUs are designed in such a way is whole other story.
This is why compilers come to the rescue and pad the fields of the structures in such a way that would be most comfortable for the CPU to access them, at the cost of extra storage space. In your case, you are probably given 3 byte of padding between your char and int, assuming int is 4 bytes.
If you look at the C standard (which I don't have nearby right now), or the man page of malloc, you will see such a phrase:
The malloc() and calloc() functions return a pointer to the allocated memory
that is suitably aligned for any kind of variable.
This behavior is exactly due to the same reason I mentioned above. So in short, memory alignment is something to care about, and that's what compilers do for you in struct layout and other places, such as layout of local variables etc.

You're running into structure padding here. The compiler is inserting likely inserting three bytes' worth of padding after the b field in struct A, so that the a field is 4-byte aligned. You can control this padding to some degree using compiler-specific bits; for example, on MSVC, the pack pragma, or the aligned attribute on GCC, but I would not recommend this. Structure padding is there to specify member alignment restrictions, and some architectures will fault on unaligned accesses. (Others might fixup the alignment manually, but typically do this rather slowly.)
See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
As to your second question, I'm unsure what you mean by the name is "unknown". Care to elaborate?

The compiler is free to add padding in structures to ensure that datatypes are aligned properly. For example, an int will be aligned to sizeof(int) bytes. So I expect the output for the size of your A struct is 8. The compiler does this, because fetching an int from an unaligned address is at best inefficient, and at worst doesn't work at all - that depends on the processor that the computer uses. x86 will fetch happily from unaligned addresses for most data types, but will take about twice as long for the fetch operation.
In your second code-snippet, you haven't declared i.
So vec[0]->_name is not unknown - it is on the heap, just like anything else you get from "malloc" (and malloc's siblings).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

R_alloc and alignment considerations - c

Related

Single malloc call for multiple data structures

Structure Elements On The Heap vs The Stack

What alignment issues limit the use of a block of memory created by malloc?

What is there to be gained by deterministic field ordering in the memory layout?

sizeof sideeffect and allocation location [duplicate]

Categories

Resources