Heap management with Eigen::Map: is it the user's or Eigen::Matrix's responsibility? - heap-memory

Who handles the heap deallocation when Eigen::Map is used with a heap memory segment to create a Matrix?
I couldn't find any info about how the internal Matrix data memory segment is managed when Eigen::Map is invoked to build a Matrix.
Here is the doc I went through : https://eigen.tuxfamily.org/dox/group__TutorialMapClass.html
Should I handle the memory segment deletion when I'm done with my Matrix "mf" in the code below?
int rows(3), cols(3);
scomplex *m1Data = new scomplex[rows*cols];
for (int i = 0; i < cols*rows; i++)
{
    m1Data[i] = scomplex(i, 2*i);
}
Map<MatrixXcf> mf(m1Data, rows, cols);
Now, if I set a breakpoint on the function (in ./Eigen/src/Core/util/Memory.h):
EIGEN_DEVICE_FUNC inline void aligned_free(void *ptr)
it is not triggered when main exits.
May I ask whether I should consider that I must delete the memory segment myself when I no longer use my matrix?
Cheers
Sylvain

The Map object does not take ownership/responsibility of the memory that is passed to it. It could just be a view into another matrix. In that case, you definitely would not want it to release the memory.
To quote the tutorial page you linked:
Occasionally you may have a pre-defined array of numbers that you want to use within Eigen as a vector or matrix. While one option is to make a copy of the data, most commonly you probably want to re-use this memory as an Eigen type.
So, bottom line, you have to delete the memory you allocated and used with the Map.
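For example, continuing the snippet from the question (a minimal sketch; the only point is that the Map never owns m1Data):

MatrixXcf copy = mf;   // optional: a deep copy into Eigen-owned storage
// ... use mf (a view over m1Data) and/or copy ...

// The Map allocates nothing of its own, so once mf and every other view
// into the buffer are done with, release it yourself, matching the new[]:
delete[] m1Data;

Alternatively, keeping the data in a std::vector<scomplex> and mapping vector.data() (or just building a plain MatrixXcf) avoids the manual delete[] entirely.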

Related

Examples or documentation for custom storage allocator in C?

I allocate a big region of memory, let's say x, of 1000 bytes.
// I am using the C language and all of this is just pseudocode (function prototypes, mostly) so far.
pointer = malloc( size(1000 units) ); // this pointer points to the region of memory we created.
Now we select this region by a pointer and allocate memory inside it in smaller blocks, like:
void *allocate_from_region( size_of_block1(300) ); //1000-300=700 (left free)
void *allocate_from_region( size_of_block2(100) ); //700-100 =600
void *allocate_from_region( size_of_block3(300) ); //600-300 =300
void *allocate_from_region( size_of_block4(100) ); //300-100 =200
void *allocate_from_region( size_of_block5(150) ); //200-150 =50
// here we almost finished space we have in region (only 50 is left free in region)
boolean free_from_region(pointer_to_block2); //free 100 more
//total free = 100+50 but are not contiguous in memory
void *allocate_from_region( size_of_block6(150) ); // this one will fail and gives null as it cant find 150 units memory(contiguous) in region.
boolean free_from_region(pointer_to_block3); // this free 300 more so total free = 100+300+50 but contiguous free is 100+300 (from block 2 and 3)
void *allocate_from_region( size_of_block6(150) ); // this time it is successful
Are there any examples about how to manage memory like this?
So far I have only done examples where I allocate blocks next to each other in a region of memory and stop once I run out of memory inside the region.
But how do I search for blocks that are free inside the region and then check whether enough contiguous memory is available?
I am sure there must be some documentation or examples in C which show how to do it.
Sure. What you are proposing is more-or-less exactly what some malloc implementations do. They maintain a "free list". Initially the single large block is on this list. When you make a request, the algorithm to allocate n bytes is:
1. Search the free list to find a block at B of size m >= n.
2. Remove B from the free list.
3. Return the block from B+n through B+m-1 (size m-n) to the free list (unless m-n == 0).
4. Return a pointer to B.
To free a block at B of size n, we must put it back on the free list. However, this isn't the end: we must also "coalesce" it with adjacent free blocks, if any, above or below or both. This is the algorithm:
1. Let p = B; m = n. // pointer to the base and size of the block being freed
2. If there is a block of size x on the free list at address B + n, remove it and set m = m + x. // coalescing the block above
3. If there is a block of size y on the free list at address B - y, remove it and set p = B - y; m = m + y. // coalescing the block below
4. Return the block at p of size m to the free list.
The remaining question is how to set up the free list to make it quick to find blocks of the right size during allocation and to find adjacent blocks for coalescing during free operations. The simplest way is a singly linked list. But there are many possible alternatives that can yield better speed, usually at some cost of additional space for data structures.
Additionally there is the choice of which block to allocate when more than one is big enough. The usual choices are "first fit" and "best fit". For first fit, just take the first one discovered. Often the best technique is (rather than starting at the lowest addresses every time) to remember the free block after the one just allocated and use this as a starting point for the next search. This is called "rotating first fit."
For best fit, traverse as many blocks as necessary to find the one that most closely matches the requested size.
If allocations are random, first fit actually performs a bit better than best fit in terms of memory fragmentation. Fragmentation is the bane of all non-compacting allocators.
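A compact sketch of such an allocator, kept C-compatible (the Header layout, the region_* names, and the address-ordered singly linked free list are illustrative choices, not something prescribed by the answer above):

#include <stddef.h>

typedef struct header {           /* header kept in front of every block         */
    struct header *next;          /* next free block (list kept address-ordered) */
    size_t size;                  /* block size, counted in header-sized units   */
} Header;

static Header *freelist = NULL;

/* Hand an existing region over to the allocator. */
static void region_init(void *region, size_t bytes)
{
    Header *h = (Header *)region;
    h->size = bytes / sizeof(Header);
    h->next = NULL;
    freelist = h;
}

/* First-fit allocation out of the region. */
static void *region_alloc(size_t bytes)
{
    size_t units = (bytes + sizeof(Header) - 1) / sizeof(Header) + 1;  /* +1 unit for the header */
    Header *prev = NULL;
    for (Header *p = freelist; p != NULL; prev = p, p = p->next) {
        if (p->size >= units) {
            if (p->size == units) {              /* exact fit: unlink the whole block */
                if (prev) prev->next = p->next;
                else      freelist  = p->next;
            } else {                             /* split: hand out the tail end      */
                p->size -= units;
                p += p->size;
                p->size = units;
            }
            return (void *)(p + 1);              /* usable memory starts after the header */
        }
    }
    return NULL;                                 /* no contiguous block is large enough */
}

/* Put a block back on the free list and coalesce with adjacent free blocks. */
static void region_free(void *ap)
{
    Header *bp = (Header *)ap - 1;               /* step back to the block's header */
    Header *below = NULL;                        /* last free block before bp       */
    Header *p = freelist;
    while (p != NULL && p < bp) {                /* keep the list address-ordered   */
        below = p;
        p = p->next;
    }
    if (p != NULL && bp + bp->size == p) {       /* coalesce with the block above   */
        bp->size += p->size;
        bp->next = p->next;
    } else {
        bp->next = p;
    }
    if (below == NULL) {
        freelist = bp;
    } else if (below + below->size == bp) {      /* coalesce with the block below   */
        below->size += bp->size;
        below->next = bp->next;
    } else {
        below->next = bp;
    }
}

Each block carries a small header, so a 1000-byte region holds slightly less than 1000 bytes of payload; apart from that, the allocate/free/coalesce behaviour follows the sequence sketched in the question.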

Allocate from buffer in C

I am building a simple particle system and want to use a single array buffer of structs to manage my particles. That said, I can't find a C function that allows me to malloc() and free() from an arbitrary buffer. Here is some pseudocode to show my intent:
Particle* particles = (Particle*) malloc( sizeof(Particle) * numParticles );
Particle* firstParticle = <buffer_alloc>( particles );
initialize_particle( firstParticle );
// ... Some more stuff
if (firstParticle->life < 0)
<buffer_free>( firstParticle );
// # program's end
free(particles);
Where <buffer_alloc> and <buffer_free> are functions that allocate and free memory chunks from arbitrary pointers (possibly with additional metadata such as buffer length, etc.). Do such functions exist and/or is there a better way to do this? Thank you!
Yeah, you’d have to write your own. It’s so simple it’s really silly, but its performance will scream in comparison to simply using malloc() and free() all the time....
enum { maxParticles = 1000 };              // in C, the array size must be a constant expression
static Particle particleBuf[maxParticles]; // global static array
static Particle* headParticle;

void initParticleAllocator()
{
    Particle* p = particleBuf;
    Particle* pEnd = &particleBuf[maxParticles-1];
    // create a linked list of unallocated Particles
    while (p != pEnd)
    {
        *((Particle**)p) = p+1;
        ++p;
    }
    *((Particle**)p) = NULL;    // terminate the end of the list
    headParticle = particleBuf; // point 'head' at the 1st unalloc'ed one
}
Particle* ParticleAlloc()
{
    // grab the next unalloc'ed Particle from the list
    Particle* ret = headParticle;
    if (ret)
        headParticle = *(Particle**)ret;
    return ret; // will return NULL if no more available
}

void ParticleFree(Particle* p)
{
    // return p to the list of unalloc'ed Particles
    *((Particle**)p) = headParticle;
    headParticle = p;
}
You could modify the approach above to not start with any global static array at all, and use malloc() the first time ParticleAlloc() is called. When Particles are returned, don't call free(); instead, add the returned ones to the linked list of unalloc'ed Particles. The next caller to ParticleAlloc() then gets one off the list of free Particles rather than a fresh malloc(). Any time there are none left on the free list, your ParticleAlloc() function can fall back on malloc().

Or use a mix of the two strategies, which would really be the best of both worlds: if you know that your user will almost certainly use at least 1000 Particles but occasionally might need more, you could start with a static array of 1000 and fall back on calling malloc() if you run out. If you do it that way, the malloc()'ed ones do not need special handling; just add them to your list of unalloc'ed Particles when they come back to ParticleFree(). You do NOT need to bother calling free() on them when your program exits; the OS will free the process's entire memory space, so any leaked memory will be cleared up at that point.
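A minimal sketch of that fallback, assuming the same list-threaded-through-the-free-objects layout as above (only ParticleAlloc() changes; ParticleFree() already just pushes onto the list, whether the Particle came from the pool or from malloc()):

#include <stdlib.h>

Particle* ParticleAlloc()
{
    Particle* ret = headParticle;
    if (ret)
        headParticle = *(Particle**)ret;            // reuse a pooled or previously freed Particle
    else
        ret = (Particle*)malloc(sizeof(Particle));  // pool exhausted: fall back on the heap
    return ret;  // NULL only if malloc() itself fails
}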
I should mention that since your question was tagged "C" and not "C++", I answered it in the form of a C solution. In C++, the best way to implement the same thing would be to add "operator new" and "operator delete" methods to your Particle class. They would contain basically the same code as I showed above, but they replace the global 'new' and 'delete' for the Particle class only, defining a specialized allocator for it. The cool thing is that users of Particle objects don't even have to know that there's a special allocator; they simply use 'new' and 'delete' as normal and remain blissfully unaware that their Particle objects are coming from a special pre-allocated pool.
Oh, sorry. This question is C only I see. Not C++. Well, if it was C++ the following would help you out.
Look at Boost's pool allocation library.
It sounds to me that each of your allocations is the same size? The size of a particle, correct? If so the pool allocation functions from Boost will work really well and you don't have to write your own.
You would have to write your own, or find someone who has already written them and reuse what they wrote. There isn't a standard C library to manage that scenario, AFAIK.
You'd probably need 4 functions for your 'buffer allocation' code:
typedef struct ba_handle ba_handle;
ba_handle *ba_create(size_t element_size, size_t initial_space);
void ba_destroy(ba_handle *ba);
void *ba_alloc(ba_handle *ba);
void ba_free(ba_handle *ba, void *space);
The create function would do the initial allocation of space, and arrange to parcel out the information in units of the element_size. The returned handle allows you to have separate buffer allocations for different types (or even for the same type several times). The destroy function forcibly releases all the space associated with the handle.
The allocate function provides you with a new unit of space for use. The free function releases that for reuse.
Behind the scenes, the code keeps track of which units are in use (a bitmap, perhaps) and might allocate extra space as needed, or might deny space when the initial allocation is used up. You could arrange for it to fail more or less dramatically when it runs out of space (so the allocator never returns a null pointer).

Clearly, the free function can validate that the pointer it is given was one supplied by the buffer allocator handle that is currently in use. This allows it to detect some errors that regular free() does not normally detect (though the GNU C library version of malloc() et al. does seem to do some sanity checking that others do not necessarily do).
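A minimal sketch of one way to fill in that interface, completing the ba_handle type declared above and threading a free list through the unused elements (the internals are just one possible layout, not something the answer prescribes):

#include <stdlib.h>

struct ba_handle {
    size_t element_size;   /* size of each unit handed out            */
    void  *pool;           /* the single big allocation backing it    */
    void  *free_head;      /* head of the list of unused elements     */
};

ba_handle *ba_create(size_t element_size, size_t initial_space)   /* assumes initial_space >= 1 */
{
    /* round up so the list links embedded in free elements stay aligned */
    element_size = (element_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);
    ba_handle *ba = (ba_handle *)malloc(sizeof *ba);
    if (!ba) return NULL;
    ba->element_size = element_size;
    ba->pool = malloc(element_size * initial_space);
    if (!ba->pool) { free(ba); return NULL; }
    char *p = (char *)ba->pool;
    for (size_t i = 0; i + 1 < initial_space; i++, p += element_size)
        *(void **)p = p + element_size;     /* each free element points at the next one */
    *(void **)p = NULL;
    ba->free_head = ba->pool;
    return ba;
}

void *ba_alloc(ba_handle *ba)
{
    void *unit = ba->free_head;
    if (unit)
        ba->free_head = *(void **)unit;     /* pop the next free element */
    return unit;                            /* NULL once the pool is exhausted */
}

void ba_free(ba_handle *ba, void *space)
{
    *(void **)space = ba->free_head;        /* push the element back onto the free list */
    ba->free_head = space;
}

void ba_destroy(ba_handle *ba)
{
    free(ba->pool);
    free(ba);
}

The validation mentioned above could be added in ba_free() by range-checking space against the pool bounds; growing on demand would mean allocating additional pools and chaining them from the handle.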
Maybe try something like this instead...
Particle *particles[numParticles];
particles[0] = malloc(sizeof(Particle));
initialize_particle( particles[0] );
// ... Some more stuff
if (particles[0]->life < 0)
    free( particles[0] );
// # program's end
// don't free(particles);
I am building a simple particle system and want to use a single array buffer of structs to manage my particles.
I think you answered it:
static Particle myParticleArray[numParticles];
It gets allocated at the start of the program and deallocated at the end; simple. Or do as in your pseudocode and malloc the whole array at once. You might ask yourself why you'd allocate a single particle rather than the whole system. Write your API functions to take a pointer to the particle array and an index.
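For instance, a hypothetical sketch of that array-plus-index style (the field names are assumptions, since only life appears in the question):

/* A hypothetical Particle; only the life field is taken from the question. */
typedef struct {
    float x, y;      /* assumed position fields       */
    float life;      /* <= 0 means the slot is unused */
} Particle;

void particle_spawn(Particle *arr, int i)   /* (re)initialize slot i */
{
    arr[i].x = arr[i].y = 0.0f;
    arr[i].life = 1.0f;
}

void particles_update(Particle *arr, int count, float dt)
{
    for (int i = 0; i < count; i++) {
        if (arr[i].life <= 0.0f)
            continue;            /* dead slot: skip it; particle_spawn() can reuse it later */
        arr[i].life -= dt;
    }
}

No per-particle malloc() or free() is needed; "allocation" is just finding a slot whose life is not positive.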

When is CUDA's __shared__ memory useful?

Can someone please help me with a very simple example on how to use shared memory? The example included in the Cuda C programming guide seems cluttered by irrelevant details.
For example, if I copy a large array to the device global memory and want to square each element, how can shared memory be used to speed this up? Or is it not useful in this case?
In the specific case you mention, shared memory is not useful, for the following reason: each data element is used only once. For shared memory to be useful, you must reuse the data transferred into shared memory several times, with good access patterns. The reason is simple: just reading from global memory requires 1 global memory read and zero shared memory reads; reading it into shared memory first would require 1 global memory read and 1 shared memory read, which takes longer.
Here's a simple example, where each thread in the block computes the corresponding value, squared, plus the average of both its left and right neighbors, squared:
__global__ void compute_it(float *data)
{
    int tid = threadIdx.x;
    __shared__ float myblock[1024];
    float tmp;

    // load the thread's data element into shared memory
    myblock[tid] = data[tid];

    // ensure that all threads have loaded their values into
    // shared memory; otherwise, one thread might be computing
    // on uninitialized data.
    __syncthreads();

    // compute the average of this thread's left and right neighbors
    tmp = (myblock[tid > 0 ? tid - 1 : 1023] + myblock[tid < 1023 ? tid + 1 : 0]) * 0.5f;
    // square the previous result and add my value, squared
    tmp = tmp * tmp + myblock[tid] * myblock[tid];

    // write the result back to global memory
    data[tid] = tmp;
}
Note that this is envisioned to work using only one block. The extension to more blocks should be straightforward. Assumes block dimension (1024, 1, 1) and grid dimension (1, 1, 1).
Think of shared memory as an explicitly managed cache - it's only useful if you need to access data more than once, either within the same thread or from different threads within the same block. If you're only accessing data once then shared memory isn't going to help you.

How to properly free memory used by matrices?

I'm having a conceptual problem in OpenCV
I have the following function:
void project_on_subspace(CvMat * projectionResult_img)
{
    [...]
    projectionResult_img = cvReshape( projectionResult_line_normalised_centered, projectionResult_img, 0, 100 );
}
Basically I'm returning a square matrix as a result of my function.
The problem is that the actual data of my matrix is stored in "projectionResult_line_normalised_centered" (if I understood how OpenCV works), which means that trying to use cvReleaseMat(projectionResult_img) later in my code to free the memory will not work, as the real matrix data is elsewhere.
Is there any proper way to release the actual matrix data WITHOUT also dealing with a pointer to "projectionResult_line_normalised_centered" ?
Thanks
No, there is no other way than to keep a pointer to the result matrix (projectionResult_line_normalised_centered) around in a variable or struct member.

how is dynamic memory allocation better than array?

int *numbers;
numbers = malloc ( sizeof(int) * 10 );
I want to know how this is dynamic memory allocation if I can store just 10 int items in the memory block. I could just use an array and store elements dynamically using an index. Why is the above approach better?
I am new to C, this is my 2nd day, and I may sound stupid, so please bear with me.
In this case you could replace 10 with a variable that is assigned at run time. That way you can decide how much memory space you need. But with arrays, you have to specify an integer constant in the declaration, so you cannot know whether the user will actually need as many locations as were declared or, even worse, whether that will not be enough.
With a dynamic allocation like this, you could assign a larger memory location and copy the contents of the first location to the new one to give the impression that the array has grown as needed.
This helps to ensure optimum memory utilization.
The main reason why malloc() is useful is not because the size of the array can be determined at runtime - modern versions of C allow that with normal arrays too. There are two reasons:
1. Objects allocated with malloc() have flexible lifetimes.
That is, you get runtime control over when to create the object, and when to destroy it. The array allocated with malloc() exists from the time of the malloc() call until the corresponding free() call; in contrast, declared arrays either exist until the function they're declared in exits, or until the program finishes.
2. malloc() reports failure, allowing the program to handle it in a graceful way.
On a failure to allocate the requested memory, malloc() can return NULL, which allows your program to detect and handle the condition. There is no such mechanism for declared arrays - on a failure to allocate sufficient space, either the program crashes at runtime, or fails to load altogether.
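For instance, a small sketch of the second point:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1000000;                               /* decided at run time in a real program */
    int *numbers = (int *)malloc(sizeof *numbers * n);
    if (numbers == NULL) {                            /* allocation failed: we choose how to react */
        fprintf(stderr, "out of memory for %zu ints\n", n);
        return EXIT_FAILURE;
    }
    /* ... use numbers[0] .. numbers[n-1] ... */
    free(numbers);                                    /* the object's lifetime ends when we say so */
    return 0;
}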
There is also a difference in where the memory is allocated. Using the array syntax, the memory is allocated on the stack (assuming you are in a function), while malloc'ed arrays/bytes are allocated on the heap.
/* Allocates 4*1000 bytes on the stack (which might be a bit much depending on your system) */
int a[1000];
/* Allocates 4*1000 bytes on the heap */
int *b = malloc(1000 * sizeof(int));
Stack allocations are fast - and often preferred when:
"Small" amount of memory is required
Pointer to the array is not to be returned from the function
Heap allocations are slower, but have the following advantages:
Available heap memory is (normally) >> than available stack memory
You can freely pass the pointer to the allocated bytes around, e.g. returning it from a function -- just remember to free it at some point.
A third option is to use statically initialized arrays if you have some common task, that always requires an array of some max size. Given you can spare the memory statically consumed by the array, you avoid the hit for heap memory allocation, gain the flexibility to pass the pointer around, and avoid having to keep track of ownership of the pointer to ensure the memory is freed.
Edit: If you are using C99 (the default with the GNU C compiler, I think?), you can do variable-length stack arrays like
int a = 4;
int b[a*a];
In the example you gave
int *numbers;
numbers = malloc ( sizeof(int) * 10 );
there are no explicit benefits. Though, imagine 10 is a value that changes at runtime (e.g. user input), and that you need to return this array from a function. E.g.
int *aFunction(size_t howMany, ...)
{
    int *r = malloc(sizeof(int) * howMany);
    // do something, fill the array...
    return r;
}
The malloc takes room from the heap, while something like
int *aFunction(size_t howMany, ...)
{
    int r[howMany];
    // do something, fill the array...
    // you can't return r unless you make it static, but this is in general
    // not good
    return somethingElse;
}
would consume the stack, which is not as big as the heap.
More complex examples exist too. E.g. if you have to build a binary tree that grows according to some computation done at runtime, you basically have no choice but to use dynamic memory allocation.
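For example, a hypothetical sketch of such a tree (the node layout and the insert function are just an illustration, not part of the answer):

#include <stdlib.h>

typedef struct node {
    int value;
    struct node *left, *right;
} node;

/* Insert into a binary search tree. Each node is created only when the
   computation actually produces a value, so the final size is unknowable
   at compile time. */
node *insert(node *root, int value)
{
    if (root == NULL) {
        node *n = (node *)malloc(sizeof *n);
        if (n == NULL)
            return NULL;                 /* allocation failure is visible to the caller */
        n->value = value;
        n->left = n->right = NULL;
        return n;
    }
    if (value < root->value)
        root->left = insert(root->left, value);
    else
        root->right = insert(root->right, value);
    return root;
}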
Array size is defined at compilation time whereas dynamic allocation is done at run time.
Thus, in your case, you can use your pointer as an array: numbers[5] is valid.
If you don't know the size of your array when writing the program, you have no choice but to use runtime allocation. Otherwise, you're free to use an array; it can be simpler (less risk of forgetting to free the memory, for example).
Example:
to store a 3-D position, you might want to use an array, as it's always 3 coordinates
to create a sieve to calculate prime numbers, you might want to use a parameter to give the max value and thus use dynamic allocation to create the memory area
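For the second case, a small sketch (the function name and the idea of returning the flags to the caller are assumptions):

#include <stdlib.h>
#include <string.h>

/* Sieve of Eratosthenes up to n; the buffer size is only known at runtime. */
char *make_sieve(size_t n)
{
    char *is_prime = (char *)malloc(n + 1);
    if (is_prime == NULL)
        return NULL;
    memset(is_prime, 1, n + 1);
    is_prime[0] = 0;
    if (n >= 1)
        is_prime[1] = 0;
    for (size_t i = 2; i * i <= n; i++)
        if (is_prime[i])
            for (size_t j = i * i; j <= n; j += i)
                is_prime[j] = 0;          /* mark every multiple of i as composite */
    return is_prime;                      /* caller frees the n+1 flags */
}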
An array is used to allocate memory statically, with a size fixed at compile time, and in one go.
To allocate memory dynamically, malloc is required.
e.g. int numbers[10];
This will allocate memory statically, and it will be contiguous.
If you don't know the count of the numbers in advance, then use a variable such as count:
int count;
int *numbers;
scanf("%d", count);
numbers = malloc ( sizeof(int) * count );
This is not possible with arrays.
Dynamic does not refer to the access; dynamic refers to the size passed to malloc. If you just use a constant number, like the 10 in your example, it is no better than an array. The advantage comes when you don't know in advance how big it must be, e.g. because the user can enter the size at runtime. Then you can allocate with a variable, e.g. malloc(sizeof(int) * userEnteredNumber). This is not possible with an array, as you have to know its (maximum) size at compile time.
