Better memory management for GSL gsl_vector - c

When I write linear algebra programs in C++, I use the Armadillo library. It is based on templates, which provide me a way to define vectors of any length that don't necessarily require additional memory allocation, since they are statically assigned an appropriate memory buffer at compile-time. When I use arma::Col<double>::fixed<3> the compiler creates a "new type" on the fly so that the vector contains a buffer of exactly 3 doubles.
Now I'm working on a linear algebra program in C, and I'm using the GNU Scientific Library (GSL). In order to instantiate a 3D vector, I do: gsl_vector_alloc(3) that returns a gsl_vector*. The problem is that this operation causes the dynamic allocation of a small portions of memory, and this happens millions of times during program runtime. My program is wasting a lot of resources to perform tens of millions of malloc/free operations.
I've inspected the internal structure of gsl_vector:
typedef struct
{
size_t size;
size_t stride;
double * data;
gsl_block * block;
int owner;
} gsl_vector;
For the library to work correctly, data should point to the first element of the vector, usually inside a gsl_block structure like this:
typedef struct
{
size_t size;
double * data;
} gsl_block;
which contains another data pointer. So, for instantiating a simple 3D vector, this sequence of mallocs happen:
A gsl_vector structure is malloc'd (something around 40 bytes on x86_64).
A gsl_block structure is malloc'd (16 bytes) and the block pointer of the gsl_vector is set to the memory address of the gsl_block just allocated
An array of 3 doubles is malloc'd and its memory address is assigned to both data pointers (the one in gsl_block and the one in gsl_vector).
I obtained a 40% performance gain by removing two mallocs. I created my custom gsl_vector creation routine, that allocates an array of 3 doubles and sets the data pointer of the gsl_vector to the address of this array. Then I return a gsl_vector (not a pointer).
But doing so, I still get millions of malloc(3 * sizeof(double)) operations.
I didn't manage to "embed" the array of 3 doubles inside the gsl_vector struct, since if the data pointer points to something which is inside the struct itself (hacky!), then the pointer is no longer valid when the vector is copied elsewhere!
Do you have any ideas (apart switching to C++ or rolling my own linear algebra library)? I am open to any suggestion.

It looks to me as if you are misunderstanding the purpose of the gls_block data structure. It seems to me that you should just use it to allocate a large chunk of data in an gsl_block data structure and then split that chunk up for the use in multiple gsl_vector. If you'd allocate all your gsl_vector in one go by allocating an array of them, you are almost there. You just have two calls to malloc and some bookkeeping during initialization.
What this would impose on you, is that you'd have to think precisely in advance which gsl_vector you need. But that is the "price" to pay when you use a language that has no builtin garbage collection. If you invest in that, most of the times it has the advantage of structuring your code, you'll probably learn a lot on how you could organize your computation.

C is a bit too primitive to do this properly.
If you need to use a function from GSL, you might be able to still use C++ Armadillo vectors and matrices with it.
For example, you can obtain a pointer to the memory used by a vector or matrix via the .memptr() member function. This also works for fixed size matrices/vectors.
Alternatively, you can tell Armadillo to use an already allocated block of memory, by giving it a pointer during vector or matrix construction.

Related

Memory Management with pointers

I am developing a C library which has three functions. Init function takes a (void*) pointer to a memory chunk. I need to develop functions to allocate and deallocate memory blocks from that said chunk. What this means is that I have to keep track of which parts of the memory chunk I have allocated and which parts are free. Problem is, the structure I will implement to track the memory also has to be part of said memory chunk. I am not allowed to allocate new memory for my management structure.
And I have no idea how to do that.
Currently, I am planning to designate first few hundred bytes as header space and divide the rest into frames of equal size. I will use header space to create an array which will keep track of which frames are allocated. To do that, I need a way to convert memory address into long int so I can save them into the array and my search so far yielded nothing.
Is there any way to accomplish that?
Failing that is there any other way to implement a management structure in this situation.
To do that, I need a way to convert memory address into long int so I can save them into the array and my search so far yielded nothing.
Is there any way to accomplish that?
Generally, you do not need to convert memory addresses to an integer type merely to keep track of them. Options include:
Work with pointers within the memory chunk you are given, using char * to perform arithmetic.
Subtract the base address of the memory (again with char *) from pointers within it to get offsets of type ptrdiff_t (defined in <stddef.h>) and use those.
Convert the addresses to the integer type uintptr_t (defined in <stdint.h>). Unlike the other options, this has implementation-dependent behavior. In common C implementations, the result of conversion will be a simple memory address that you can perform arithmetic on as expected. But, in some C implementations, the result will be more complicated, so the code will not be fully portable.

C efficient way to manage function temporary dynamic memory

I'm making an Nondeterministic Finite Automata (NFA). An NFA has a set of states, I need four array with the same size(number of states in the NFA) to record temporary information about states during the simulation of the NFA.
However, different NFAs has different number of states, thus the size of arrays varies with different NFAs.
Using C I have come up with three ways to deal with the memory allocation:
Use a very big fix-sized array.
malloc memroy dynamically every time the function is called, and before the function finishes, free the allocated memory
Use malloc but not allocate memory on every function call, use four static pointer variables, and one static int arraySize to record allocated array size, for the first time when the function is called, allocate arrays of size NFA->numStates and assign to each of the static pointers, assign NFA->numStates to static int arraySize; for the second time, if NFA->numStates is less than or equal to arraySize, no memory allocation; if NFA->numStates is greater than arraySize, free previous memory and realloc arrays of size NFA->numStates.
Method 1 uses fix-sized array, which means when the input NFA->numStates is greater than the hard-coded array size, the function will not work.
Method 2 is adaptable, but need to allocate memory every time the function is called, not so efficient ?
Method 3 is adaptable too, but is complicated, and is not thread safe ?
Suggestions or other alternatives ?
How about option 2 but with alloca(), i.e. allocate on the stack? It's much (much) faster than malloc(), and automatically de-allocates when you leave the scope which sounds like what you want.
Of course, it's not standard or portable, so it might not be applicable for you.
Failing that, a big fixed-size array seems easy enough and doesn't use more memory.
Variable-Length Arrays (VLAs) have been available since C99. I'd be impressed if you are still working with an implementation that does not support such a thing. VLAs work like regular arrays, except that the size you use to declare them is determined at runtime. That seems to be what you're looking for.
As you are dealing with arrays of arbitrary size I suggest you to implement simple linked list structure that at initialization will hold an array of predefined size and can be extended to hold additional chunks of memory. This will be an abstraction over contiguous memory space. You just need to store the current size of this structure in parent container.
#include <stddef.h>
struct contmem
{
size_t size;
struct memchunk
{
void *data;
struct memchunk *next, *prev;
} head;
}
So, you can extend this structure when you have to and store information by iterating over elements of linked list. This is similar to what happens inside lists from standard C++ library and differs from straight memory reallocation.

Array of pointer in CUDA

Is it possible to pass an array of pointers to a cuda kernel?
i am looking for something like this:
__global__ void Kernel(int **arr)
{
int *temp = arr[blockDim.x];
temp[blockIdx.x] = blockIdx.x;
}
How can i allocate cuda memory for such structure?
Memory allocation for such array is not a problem, you'll do this by cudaMalloc(sizeof(void*)*SIZE). However, writing correct values into it is main problem. Only way to change values in device memory from host function is actually copying information from host memory to device memory (cudaMemcpy() or cudaMemcpyToSymbol()). Thus, to write device pointers into device memory, we must have pointer to device memory in host memory, which I don't think is possible. (pointer which is stored in host variables allocated by cudaMalloc() isn't actual pointer in device memory). So, the only way to write correct values in the array is from kernel, which makes array of pointers unconvenient.
I suggest using indexes instead of pointers, it is much better. Basically if in your array of indexes you have written {4,3,0,1,2} it means that first element points to some array in index 4, second one - to the 3rd element and so on. If you want to point multiple arrays, you should make indexing by some rule, in which you will fill the array of indexes and in which you will access memory from kernel.
I'm doing some image processing work in CUDA currently, and I recommend that you just allocate a linear memory buffer and use an indexing scheme rather than dealing with arrays of pointers. It's way, way simpler in my experience. My 2c.

structures containing structures vs. structures containing pointers

The following question is in regards to C programming. I am using Microchip C30 compiler (because I know someone will ask)
What is the difference between having a structure which contains several other structures vs a structure which contains several pointers to other structures? Does one make for faster code execution than the other? Does one technique use more or less memory? Does the memory get allocated at the same time in both cases?
If I use the following code does memory automatically get allocated for the subStruct?
// Header file...
typedef struct{
int a;
subStruct * b;
} mainStruct;
typedef struct{
int c;
int d;
}subStruct;
extern mainStruct myMainStruct;
// source file...
mainStruct myMainStruct;
int main(void)
{
//...
{
If you use a pointer, you have to allocate the memory yourself. If you use a substructure, you can allocate the entire thing in one go, either using malloc or on the stack.
What you need depends on your use case:
Pointers will give you smaller struct's
Substructures provide better locality of reference
A pointer may point to either a single struct or the first member in an array of them, while substructures are self-documenting: there's always one of them unless you use an array explicitly
Pointers take up some space, for the pointer itself + overhead from extra memory allocations
And no, it doesn't matter which compiler you use :)
Memory for pointers doesn't get automatically allocated, but when you contain whole structure in your struct, it does.
Also - with pointers you are likely to have fragmented memory - each pointed part of tructure could be in other part of memory.
But with poniters you can share the same substructures across many structs (but this makes changing and deleting them later harder).
Memory for a pointer wouldn't be automatically allocated. You would need to run:
myMainStruct.b=malloc(sizeof(*myMainStruct.b));
In terms of performance, there is likely a small hit to going from one structure to another via the pointer.
As far as speed goes, it varies. Generally, including structs, rather than pointers, will be faster, because the CPU doesn't have to dereference the pointer for every member access. However, if some of the members aren't used very often, and the sub-struct's size is massive, the structure might not fit in the cache and this can slow down your code quite a bit.
Using pointers will use /slightly/ more memory (but only the size of the pointers themselves) than the direct approach.
Usually with pointers to sub-structs you'll allocate the sub-structs separately, but you can write some kind of initialization function which abstracts all the allocation out to "the same time." In your code, memory is allocated for myMainStruct on the stack, but the b member will be garbage. You need to call malloc to allocate heap memory for b, or create a subStruct object on the stack and point myMainStruct.b to it.
What is the difference between having a structure which contains several other structures vs a structure which contains several pointers to other structures?
In the first case what you have is essentially one big structure in contiguous memory. In the "pointers to structures" case your master structure just contains the addresses to the sub-structures which are allocated separately.
Does one make for faster code execution than the other?
The difference should be negligible, but pointers method is will be slightly slower. This is because you must dereference the pointer with each access to the substructure.
Does one technique use more or less memory?
The pointer method uses number_of_pointers * sizeof(void*) more memory. sizeof(void*) will be 4 for 32-bit and 8 for 64-bit.
Does the memory get allocated at the same time in both cases?
No, you need to go through each pointer in your master struct and allocate memory for the sub-structs via malloc().
Conclusion
The pointers add a layer of indirection to the code, which is useful for switching out the sub-structs or having more than one pointer point to the same sub-struct. Having different master-structs pointing to common sub-structs in particular could save quite a bit of memory and allocation time.

Why does C need arrays if it has pointers?

If we can use pointers and malloc to create and use arrays, why does the array type exist in C? Isn't it unnecessary if we can use pointers instead?
Arrays are faster than dynamic memory allocation.
Arrays are "allocated" at "compile time" whereas malloc allocates at run time. Allocating takes time.
Also, C does not mandate that malloc() and friends are available in free-standing implementations.
Edit
Example of array
#define DECK_SIZE 52
int main(void) {
int deck[DECK_SIZE];
play(deck, DECK_SIZE);
return 0;
}
Example of malloc()
int main(void) {
size_t len = 52;
int *deck = malloc(len * sizeof *deck);
if (deck) {
play(deck, len);
}
free(deck);
return 0;
}
In the array version, the space for the deck array was reserved by the compiler when the program was created (but, of course, the memory is only reserved/occupied when the program is being run), in the malloc() version, space for the deck array has to be requested at every run of the program.
Arrays can never change size, malloc'd memory can grow when needed.
If you only need a fixed number of elements, use an array (within the limits of your implementation).
If you need memory that can grow or shrink during the running of the program, use malloc() and friends.
It's not a bad question. In fact, early C had no array types.
Global and static arrays are allocated at compile time (very fast). Other arrays are allocated on the stack at runtime (fast). Allocating memory with malloc (to be used for an array or otherwise) is much slower. A similar thing is seen in deallocation: dynamically allocated memory is slower to deallocate.
Speed is not the only issue. Array types are automatically deallocated when they go out of scope, so they cannot be "leaked" by mistake. You don't need to worry about accidentally freeing something twice, and so on. They also make it easier for static analysis tools to detect bugs.
You may argue that there is the function _alloca() which lets you allocate memory from the stack. Yes, there is no technical reason why arrays are needed over _alloca(). However, I think arrays are more convenient to use. Also, it is easier for the compiler to optimise the use of an array than a pointer with an _alloca() return value in it, since it's obvious what a stack-allocated array's offset from the stack pointer is, whereas if _alloca() is treated like a black-box function call, the compiler can't tell this value in advance.
EDIT, since tsubasa has asked for more details on how this allocation occurs:
On x86 architectures, the ebp register normally refers to the current function's stack frame, and is used to reference stack-allocated variables. For instance, you may have an int located at [ebp - 8] and a char array stretching from [ebp - 24] to [ebp - 9]. And perhaps more variables and arrays on the stack. (The compiler decides how to use the stack frame at compile time. C99 compilers allow variable-size arrays to be stack allocated, this is just a matter of doing a tiny bit of work at runtime.)
In x86 code, pointer offsets (such as [ebp - 16]) can be represented in a single instruction. Pretty efficient.
Now, an important point is that all stack-allocated variables and arrays in the current context are retrieved via offsets from a single register. If you call malloc there is (as I have said) some processing overhead in actually finding some memory for you. But also, malloc gives you a new memory address. Let's say it is stored in the ebx register. You can't use an offset from ebp anymore, because you can't tell what that offset will be at compile time. So you are basically "wasting" an extra register that you would not need if you used a normal array instead. If you malloc more arrays, you have more "unpredictable" pointer values that magnify this problem.
Arrays have their uses, and should be used when you can, as static allocation will help make programs more stable, and are a necessity at times due to the need to ensure memory leaks don't happen.
They exist because some requirements require them.
In a language such as BASIC, you have certain commands that are allowed, and this is known, due to the language construct. So, what is the benefit of using malloc to create the arrays, and then fill them in from strings?
If I have to define the names of the operations anyway, why not put them into an array?
C was written as a general purpose language, which means that it should be useful in any situation, so they had to ensure that it had the constructs to be useful for writing operating systems as well as embedded systems.
An array is a shorthand way to specify pointing to the beginning of a malloc for example.
But, imagine trying to do matrix math by using pointer manipulations rather than vec[x] * vec[y]. It would be very prone to difficult to find errors.
See this question discussing space hardening and C. Sometimes dynamic memory allocation is just a bad idea, I have worked with C libraries that are completely devoid of malloc() and friends.
You don't want a satellite dereferencing a NULL pointer any more than you want air traffic control software forgetting to zero out heap blocks.
Its also important (as others have pointed out) to understand what is part of C and what extends it into various uniform standards (i.e. POSIX).
Arrays are a nice syntax improvement compared to dealing with pointers. You can make all sorts of mistakes unknowingly when dealing with pointers. What if you move too many spaces across the memory because you're using the wrong byte size?
Explanation by Dennis Ritchie about C history:
Embryonic C
NB existed so briefly that no full description of it was written. It supplied the types int and char, arrays of them, and pointers to them, declared in a style typified by
int i, j;
char c, d;
int iarray[10];
int ipointer[];
char carray[10];
char cpointer[];
The semantics of arrays remained exactly as in B and BCPL: the declarations of iarray and carray create cells dynamically initialized with a value pointing to the first of a sequence of 10 integers and characters respectively. The declarations for ipointer and cpointer omit the size, to assert that no storage should be allocated automatically. Within procedures, the language's interpretation of the pointers was identical to that of the array variables: a pointer declaration created a cell differing from an array declaration only in that the programmer was expected to assign a referent, instead of letting the compiler allocate the space and initialize the cell.
Values stored in the cells bound to array and pointer names were the machine addresses, measured in bytes, of the corresponding storage area. Therefore, indirection through a pointer implied no run-time overhead to scale the pointer from word to byte offset. On the other hand, the machine code for array subscripting and pointer arithmetic now depended on the type of the array or the pointer: to compute iarray[i] or ipointer+i implied scaling the addend i by the size of the object referred to.
These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
int inumber;
char name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?
The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.
To summarize in my own words - if name above were just a pointer, any of that struct would contain an additional pointer, destroying the perfect mapping of it to an external object (like an directory entry).

Resources