How to manipulate a dynamic multidimensional array in C

I need to allocate a large multidimensional array as char a[x][32][y], where x*32*y is about 6-12 GB. (x and y are determined at runtime.)
One way I have come up with is to do char *a = malloc(x*32*y) and then use *(a + 32*y*i + y*j + k) in place of a[i][j][k].
However, this is not as convenient as writing a[i][j][k].
Is there a better way?
Added:
It is a[x][32][datlen], where datlen is determined at runtime and x is chosen with the available memory in mind.
All the data in the array will be newly generated, and I have machines with 16 or 32 GB of memory to run it on.

INCORRECT: You should still be able to use a[i][j][k] syntax when referencing dynamically allocated memory.
CORRECT: Use a macro to at least make the job easier
#define A(i,j,k) (*((a) + (size_t)32*(y)*(i) + (size_t)(y)*(j) + (k)))
A(1,2,3) would then do the right thing.
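For example, a rough sketch of the flat-malloc-plus-macro approach (the x and y values are only placeholders so the snippet compiles; the size_t casts are my addition to avoid integer overflow at these sizes):

#include <stdlib.h>

/* Same macro as above: flat indexing into one malloc'd block.
   The size_t casts keep the index arithmetic from overflowing int,
   which matters when the whole block is several gigabytes. */
#define A(i,j,k) (*((a) + (size_t)32*(y)*(i) + (size_t)(y)*(j) + (k)))

int main(void)
{
    size_t x = 1000, y = 200000;      /* example values; the real ones come at runtime */
    char *a = malloc(x * 32 * y);     /* about 6.4 GB here */
    if (a == NULL)
        return 1;                     /* an allocation this large can easily fail */

    A(1, 2, 3) = 'q';                 /* reads and writes like a[1][2][3] */

    free(a);
    return 0;
}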

I doubt you'll find a system which will allocate you contiguous memory that large*. You're going to have to utilize a chunking strategy of some kind.
You need to ask, "What is your data access pattern?"
If it is some stride (be it 1D or 2D), use that to choose an appropriate allocation of memory for each chunk. Use a data structure to represent each stride (might just be a struct containing your character arrays).
Edit: I didn't notice your second "question" about accessing your newly found 12G contiguous chunk of memory using a[i][j][k] syntax. That isn't going to happen in any consumer grade C distribution I'm aware of.
(*) and 640k ought to be enough memory for anyone.

Since this is C you cannot wrap everything up into a handy C++ object.
But I would do something similar. Design a series of functions that allocate, manipulate and destroy this new data type of yours.
To read or write a piece of the data, call a function. Never touch the data directly. In fact, if you can use a void* handle to your data and not even put the real data types in an included header file, that's the best thing to do.
With this, you can define the functions as operating on one very large memory block, a set of large memory blocks or even an on-disk database of blocks.
Now that I wrote that, let me partly take it back. If you need more performance you might want to define all of the functions in the included header file as inline definitions. That will let your compiler remove almost all the function call overhead and optimize aggressively.
I admit that matrix_set(x, y, z, value) is not as pretty as matrix[x][y][z] = value, but it will work just as well.
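As a rough sketch of what such an interface could look like (the names matrix_create, matrix_set, matrix_get and matrix_destroy and the single-block layout are illustrative assumptions; in a real project the struct definition would live in the .c file so callers only ever see the opaque handle):

#include <stdlib.h>

/* Opaque handle: callers never touch the storage directly. */
typedef struct matrix matrix;

struct matrix {
    size_t x, y, z;   /* dimensions */
    char  *data;      /* one large block; could be swapped for chunks or a file later */
};

matrix *matrix_create(size_t x, size_t y, size_t z)
{
    matrix *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->x = x; m->y = y; m->z = z;
    m->data = malloc(x * y * z);
    if (m->data == NULL) {
        free(m);
        return NULL;
    }
    return m;
}

/* These could be 'static inline' in the header for performance, as noted above. */
void matrix_set(matrix *m, size_t i, size_t j, size_t k, char value)
{
    m->data[(i * m->y + j) * m->z + k] = value;
}

char matrix_get(const matrix *m, size_t i, size_t j, size_t k)
{
    return m->data[(i * m->y + j) * m->z + k];
}

void matrix_destroy(matrix *m)
{
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}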

Related

c malloc functionality for custom memory region

Is there any malloc/realloc/free-like implementation where I can specify a memory region in which to manage the allocations?
I mean, the regular malloc (etc.) functions manage only the heap memory region.
What if I need to allocate some space in a shared memory segment or in a memory mapped file?
Not 100%. As per your question, you want to manage your own memory region, so you need to write your own my_malloc, my_realloc and my_free.
Implementing your own my_malloc may help you:
#include <stdlib.h>
#include <string.h>

void *my_malloc(int size)
{
    char *ptr = malloc(size + sizeof(int));   /* extra room for a size header */
    if (ptr == NULL)
        return NULL;
    memcpy(ptr, &size, sizeof(int));          /* remember the requested size */
    return ptr + sizeof(int);                 /* hand out memory just past the header */
}
This is just a small idea; a full implementation will take you the rest of the way to the answer.
Refer to this question, and use the same method to implement my_realloc and my_free.
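Building on that header idea, a matching my_free and my_realloc might look roughly like this (only a sketch: like the my_malloc above it still delegates to the standard heap rather than managing a private region):

#include <stdlib.h>
#include <string.h>

void *my_malloc(int size);   /* the function sketched above */

void my_free(void *ptr)
{
    if (ptr != NULL)
        free((char *)ptr - sizeof(int));   /* step back over the size header */
}

void *my_realloc(void *ptr, int new_size)
{
    int old_size;
    void *new_ptr;

    if (ptr == NULL)
        return my_malloc(new_size);

    memcpy(&old_size, (char *)ptr - sizeof(int), sizeof(int));  /* read the stored size */

    new_ptr = my_malloc(new_size);
    if (new_ptr == NULL)
        return NULL;
    memcpy(new_ptr, ptr, old_size < new_size ? old_size : new_size);
    my_free(ptr);
    return new_ptr;
}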
I asked myself this question recently too, because I wanted a malloc implementation for my security programs that could safely wipe a static memory region just before exit (the region holds sensitive data such as encryption keys and passwords).
First, I found this. I thought it could be very good for my purpose, but I really could not understand its code completely. The licensing status was also unclear, which matters a lot for one of my projects.
I ended up writing my own.
My own implementation supports multiple heaps at the same time (operated on through a pool descriptor structure), automatic zeroing of freed blocks, undefined-behavior and OOM handlers, querying the exact usable size of allocated objects, and testing whether a pointer is still allocated; that is quite sufficient for me. It's not very fast and it is more educational grade than professional, but I wanted one in a hurry.
Note that it does not (yet) know about alignment requirements, but at least it returns an address suitable for storing a 32-bit integer.
I am using Tasking and I can store data at a specific location in memory. For example I can use:
testVar _at(0x200000);
I'm not sure if this is what you are looking for, but I use it, for example, to store data in external RAM. As far as I know, it only works for global variables.
It is not very hard to implement your own my_alloc and my_free and have them use a preferred memory range. The region is a simple chain of blocks: block size, a free/in-use flag, and the block data, plus a final end-of-region marker (e.g. block size = 0). At the start you have one large free block whose address you know. Note that my_alloc returns the address of the block data; the block size and flag sit a few bytes before it.
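A bare-bones sketch of that layout, under the stated assumptions (first-fit search, no merging of adjacent free blocks, alignment ignored for brevity; all names are mine):

#include <stddef.h>

typedef struct {
    size_t size;    /* size of the data area that follows this header */
    int    in_use;  /* 0 = free, 1 = in use */
} block_hdr;

static char *region;   /* start of the preferred memory range */

void my_init(void *mem, size_t total)
{
    region = mem;
    block_hdr *first = (block_hdr *)region;
    first->size = total - 2 * sizeof(block_hdr);  /* one big free block... */
    first->in_use = 0;
    block_hdr *end = (block_hdr *)(region + total - sizeof(block_hdr));
    end->size = 0;                                /* ...plus the final-block marker */
}

void *my_alloc(size_t want)
{
    block_hdr *h = (block_hdr *)region;
    while (h->size != 0) {                        /* walk the chain until the end marker */
        if (!h->in_use && h->size >= want) {
            if (h->size >= want + sizeof(block_hdr) + 1) {   /* split off the remainder */
                block_hdr *rest = (block_hdr *)((char *)h + sizeof *h + want);
                rest->size = h->size - want - sizeof *rest;
                rest->in_use = 0;
                h->size = want;
            }
            h->in_use = 1;
            return (char *)h + sizeof *h;         /* block data sits right after the header */
        }
        h = (block_hdr *)((char *)h + sizeof *h + h->size);
    }
    return NULL;                                  /* no suitable free block */
}

void my_free(void *p)
{
    if (p != NULL)
        ((block_hdr *)((char *)p - sizeof(block_hdr)))->in_use = 0;
}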

Pointer Meta Information

An interesting feature of realloc() is that it somehow knows how long your data is when it is copying or extending your allocated memory.
I read that what happens is that, behind the scenes, some metadata is stored about a pointer (including its allocated size), usually immediately before the address the pointer points at (but of course this is subject to the implementation).
So my question is: if such data is stored, why isn't it exposed via an API, so that things like C strings, for example, wouldn't have to look for a \0 to know where the end of the string is?
There could be hundreds of other uses as well.
As you said yourself, it's subject to the implementation of the standard library's memory manager.
C doesn't have a standard for this type of functionality, probably because generally speaking C was designed to be simple (but also provide a lot of capabilities).
With that said, there isn't really much use for this type of functionality outside of features for debugging memory allocation.
Some compilers do provide this type of functionality (specifically memory block size), but it's made pretty clear that it's only for the purpose of debugging.
It's not uncommon to write your own memory manager, allowing you to have full control over allocations, and even make assumptions for other library components. As you mentioned, your implementation can store the size of an allocation somewhere in a header, and a string implementation can reference that value rather than walking the bytes for \0 termination.
This information is not always accurate.
On Windows, the size is rounded up to a multiple of at least 8.
realloc does fine with this because it cares only about the allocated size, but you won't get the requested size.
Here it's worth understanding how allocators work a bit. For example, if you take a buddy allocator, it allocates chunks in predetermined sizes. For simplicity, let's say powers of 2 (real world allocators usually use tighter sizes).
In this kind of simplified case, if you call malloc and request 19 bytes of memory for your personal use, then the allocator is actually going to give you a 32-byte chunk worth of memory. It doesn't care that it gave you more than you needed, as that still satisfies your basic request by giving you something as large or larger than what you needed.
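For instance, rounding the example's 19-byte request up to a power-of-two chunk could be done like this (a toy illustration of the simplified scheme, not how a production allocator computes its size classes):

#include <stdio.h>

/* Round a request up to the next power of two, mimicking the simplified
   buddy-allocator sizing described above. */
static size_t chunk_size(size_t request)
{
    size_t chunk = 1;
    while (chunk < request)
        chunk *= 2;
    return chunk;
}

int main(void)
{
    printf("%zu\n", chunk_size(19));   /* prints 32: a 19-byte request gets a 32-byte chunk */
    return 0;
}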
These chunk sizes are usually stored somewhere, somehow, so that a general-purpose allocator that handles variable-sized requests can free chunks and do things like merge free chunks together and split them apart. Yet they identify the chunk size, not the size of your request. So you can't use that chunk size to see where, say, a string should end, since according to you, you wanted one 19 bytes long, not 32.
realloc also doesn't care about your requested 19-byte size. It's working at the level of bits and bytes, and so it copies the whole chunk's worth of memory to a new, larger chunk if it has to.
So these kinds of chunk sizes generally aren't useful for implementing things like data structures. For those, you want to work with the size you actually requested, which is something many allocators don't even bother to store (they don't need it to do their job efficiently). So it's often up to you to keep track of that size, in whatever form fits the logic of your software (a sentinel like a null terminator is one such form).
As for why querying these sizes is not available in the standard, it's probably because the need to know it would be a rather obscure, low-level kind of need given that it has nothing to do with the amount of memory you actually requested to work with. I have often wished for it for some very low-level debugging needs, but there are platform/compiler-specific alternatives that I have used instead, and I found they weren't quite as handy as I thought they would be (even for just low-level debugging).
You asked:
if such data is stored, why isn't it exposed via an API, so that things like C strings, for example, wouldn't have to look for a \0 to know where the end of the string is.
You can allocate enough memory to hold 100 characters but then use that memory to hold a string that is only 5 characters long. If you relied on the pointer metadata, you would get the wrong result:
char* cp = malloc(100);
strcpy(cp, "hello");
The second reason you would need the terminating null character is when you use stack memory to create a string.
char str[100] = "hello";
Without the terminating null character, you won't be able to determine the length of the string held in str.
You said:
There could be hundreds of other uses as well.
I don't have a good response to that other than to say that custom memory allocators often provide access to the metadata. The reason for not including such APIs in the standard library is not obvious to me.

Best way to expand dynamic memory in C

I'm looking for a way to allocate additional memory (in C) at runtime, for an existing structure (that already had its memory assigned initially). I have a feeling I might be able to use memmove or something similar but that's still just a copy operation, and doesn't increase the amount of memory available to a structure at runtime. Also I don't want to have to copy the entire structure every time I need to do this, which will be many hundreds of times during the program (the structure is already huge). Can anyone help?
UPDATE: Thanks everyone for the replies. To give more detail, what I am trying to do is run an MPI-parallelised code that creates many instances of the structure (call it 'S') initially. Each instance of the structure contains an array 'T' which records the time of a particular event happening as the code is run. These events occur at runtime, and the number of events differs for each instance of S. For example, S[0] might see 100 events (and therefore need an array of 100 elements in length) but S[1] might see only 1 event (and S[2] 30 events, etc.) Therefore it would be very wasteful to allocate huge amounts of memory at the start for every instance of S (for which there are millions) since some might fill the array but others would not even come close. Indeed I have tried this and it is too much for the machine I am running it on.
I will try some of the ideas here and post my progress. Many thanks!
You could probably use realloc().
There is no way to do what you describe, because there is no way to guarantee that there will be available memory next to the one that your structure is currently occupying.
The standard thing to do is to allocate more memory and copy your data.
Of course if you can know (an estimate of) the size of the memory allocation that you need you can preallocate it and avoid copying.
Note, however, that the structures in C have a fixed size once they are declared, so it seems you don't really need to allocate more memory for an existing structure...
realloc is the only way to expand existing dynamically allocated memory. realloc tries to expand the existing buffer; if that fails, it allocates a new buffer of the total size required and copies the data over from the old buffer. If you don't want to call realloc every time (which internally will memmove most of the time), you can reallocate more memory than you actually need:
new_buf = realloc(buf_ptr, (actual_size + additional_size) * 2);
if (new_buf != NULL) buf_ptr = new_buf;   /* keep the old buffer if realloc fails */
This reduces how often realloc (and the internal memmove) is called.
Note: the implementation of realloc differs on some platforms; some never try to expand the memory in place and always allocate a new buffer of the total size. On those platforms the data is copied on every call to realloc.
It sounds like you are looking for the C feature called flexible array member (example). It is only well-defined for C standard C99 or later.
The last member of the struct will have to be declared as a flexible array member, which you initially malloc, and later realloc (and of course memcpy to do the actual copying).
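A sketch of how the questioner's struct S and its per-instance event-time array could look with a flexible array member (the field and function names are invented for illustration):

#include <stdlib.h>

struct S {
    size_t n_events;     /* how many entries of t[] are in use */
    size_t capacity;     /* how many entries are allocated */
    double t[];          /* flexible array member: must be last (C99) */
};

struct S *s_create(size_t initial_capacity)
{
    struct S *s = malloc(sizeof *s + initial_capacity * sizeof s->t[0]);
    if (s == NULL)
        return NULL;
    s->n_events = 0;
    s->capacity = initial_capacity;
    return s;
}

/* Append one event time, growing the whole struct with realloc when full.
   Returns the (possibly moved) struct, or NULL on failure. */
struct S *s_add_event(struct S *s, double time)
{
    if (s->n_events == s->capacity) {
        size_t new_cap = s->capacity ? s->capacity * 2 : 4;
        struct S *tmp = realloc(s, sizeof *s + new_cap * sizeof s->t[0]);
        if (tmp == NULL)
            return NULL;              /* old block is still valid on failure */
        s = tmp;
        s->capacity = new_cap;
    }
    s->t[s->n_events++] = time;
    return s;
}

On success the returned pointer must replace the caller's old one, since realloc may move the whole struct.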

malloc and other associated functions

I have an array named 'ArrayA' and it is full of ints, but I want to add another 5 cells to the end of the array every time a condition is met. How would I do this? (The internet is not being very helpful.)
If this is a static array, you will have to create a new one with more space and copy the data yourself. If it was allocated with malloc(), as the title to your question suggests, then you can use realloc() to do this more-or-less automatically. Note that the address of your array will, in general, have changed.
It is precisely because of the need for "dynamic" arrays that grow (and shrink) as needed, that languages like C++ introduced vectors. They do the management under the covers.
You need the realloc function.
Also note that growing by 5 cells at a time is not the best solution for performance.
It is better to double the capacity of the array every time it needs to grow.
Keep two variables, one for the size (the number of integers in use) and one for the capacity (the number of elements actually allocated), as sketched below.
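A sketch of that size/capacity scheme (the type and function names are just for illustration; note that the realloc result is checked before overwriting the old pointer):

#include <stdlib.h>

/* Growable int array: 'size' counts the used elements,
   'capacity' the elements actually allocated. */
typedef struct {
    int   *data;
    size_t size;
    size_t capacity;
} int_array;

int push_int(int_array *a, int value)
{
    if (a->size == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 8;  /* double instead of +5 */
        int *tmp = realloc(a->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;           /* original array is untouched on failure */
        a->data = tmp;
        a->capacity = new_cap;
    }
    a->data[a->size++] = value;
    return 0;
}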
In a modern OS it is generally safe to assume that if you allocate a lot of memory that you don't use then it will not actually consume physical RAM, but only exist as virtual mappings. The OS will provide physical RAM as soon as a page (today generally in chunks of 4Kb) is used for the first time.
You can enforce this behavior explicitly by using mmap to create a large anonymous mapping (MAP_PRIVATE | MAP_ANONYMOUS), e.g. as large as you ever intend to hold. On modern x64 systems virtual mappings can be up to 64 TB. The memory is logically available to your program, but in practice pages are only added to it as you start using them.
realloc, as described by the other posters, is the naive way to resize a malloc'd block, but make sure the realloc was successful. It can fail!
Problems with memory arise when you use memory once, don't deallocate it, and stop using it. In contrast, allocated but untouched memory generally does not use resources other than VM table entries.
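On a POSIX system, reserving a large range up front as described could look roughly like this (a Linux-flavoured sketch; the 1 TiB figure is only an example of an oversized reservation):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Reserve a large anonymous mapping up front; physical pages are only
       committed as they are first touched. Depending on the kernel's
       overcommit settings you may also want MAP_NORESERVE. */
    size_t reserve = (size_t)1 << 40;   /* 1 TiB of address space, just as an example */
    char *base = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    base[0] = 1;       /* first touch: the kernel now backs this page with RAM */
    base[4096] = 2;    /* another page gets backed here */

    munmap(base, reserve);
    return 0;
}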

Efficient memory reallocation question

Let's say I have a program (C++, for example) that allocates multiple objects, never bigger than a given size (let's call it MAX_OBJECT_SIZE).
I also have a region (I'll call it a "page") on the heap (allocated with, say, malloc(REGION_SIZE), where REGION_SIZE >= MAX_OBJECT_SIZE).
I keep reserving space in that page until the filled space equals PAGE_SIZE (or at least gets > PAGE_SIZE - MAX_OBJECT_SIZE).
Now, I want to allocate more memory. Obviously my previous "page" won't be enough. So I have at least two options:
Use realloc(page, NEW_SIZE), where NEW_SIZE > PAGE_SIZE;
Allocate a new "page"(page2) and put the new object there.
If I wanted to have a custom allocate function, then:
Using the first method, I'd see how much I had filled, and then put my new object there(and add the size of the object to my filled memory variable).
Using the second method, I'd have a list(vector? array?) of pages, then look for the current page, and then use a method similar to 1 on the selected page.
Eventually, I'd need a method to free memory too, but I can figure out that part.
So my question is: What is the most efficient way to solve a problem like this? Is it option 1, option 2 or some other option I haven't considered here? Is a small benchmark needed/enough to draw conclusions for real-world situations?
I understand that different operations may perform differently, but I'm looking for an overall metric.
In my experience option 2 is much easier to work with and has minimal overhead. realloc does not guarantee it will grow the existing memory in place, and in practice it almost never does. If you use it you will need to go back and fix up all references to the old objects, which requires remembering where every allocated object was... That can be a ton of overhead.
But it's hard to quantify "most efficient" without knowing exactly what metrics you use.
This is the memory manager I always use. It works for the entire application, not just one object.
allocs:
For every allocation, determine the size of the object being allocated, then:
1. Look at a linked list of freed objects of that size to see if anything has been freed; if so, take the first free one.
2. Otherwise look the size up in a lookup table, and if no pool exists for it yet:
2.1 Allocate an array of N objects of the size being allocated.
3. Return the next free object of the desired size.
3.1 If the array is full, add a new page.
N can be tuned by the programmer. If you know you have a million 16-byte objects you might want N to be somewhat higher.
For objects over some size X, do not keep an array; simply allocate a new object.
frees:
Determine the size of the object and add it to the linked list of frees.
As long as the size of the object allocated is at least the size of a pointer, the linked list does not need to incur any memory overhead: simply reuse the already allocated memory to store the list nodes.
The problem with this method is that memory is never returned to the operating system until the application exits or the programmer decides to defragment the memory. Defragmenting is a topic for another post, but it can be done.
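A stripped-down sketch of one such fixed-size pool, with the free list stored inside the freed slots as described (illustrative names; no per-size lookup table or defragmentation):

#include <stdlib.h>

/* One fixed-size pool, with the free list threaded through the freed slots
   themselves (so slot_size must be at least sizeof(void *), as noted above).
   Old pages are not tracked here; a real version would keep a list of pages. */
typedef struct {
    size_t slot_size;   /* size of each object */
    size_t n_slots;     /* the tunable N */
    size_t used;        /* slots handed out from the current page */
    char  *page;        /* current array of N slots */
    void  *free_list;   /* chain of returned slots */
} pool;

void *pool_alloc(pool *p)
{
    if (p->free_list != NULL) {                      /* step 1: reuse a freed slot */
        void *slot = p->free_list;
        p->free_list = *(void **)slot;               /* the next pointer lives in the slot */
        return slot;
    }
    if (p->page == NULL || p->used == p->n_slots) {  /* step 3.1: current array is full */
        p->page = malloc(p->slot_size * p->n_slots);
        if (p->page == NULL)
            return NULL;
        p->used = 0;
    }
    return p->page + p->slot_size * p->used++;       /* step 3: next free slot */
}

void pool_free(pool *p, void *obj)
{
    *(void **)obj = p->free_list;                    /* push onto the intrusive free list */
    p->free_list = obj;
}

A pool is set up with its slot size and N, e.g. pool p = { 32, 1024, 0, NULL, NULL };, with one pool per object size.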
It is not clear from your question why you need to allocate a big block of memory in advance rather than allocating memory for each object as needed. I'm assuming you are using it as a contiguous array. Otherwise, it would make more sense to malloc the memory of each object as it is needed.
If it is indeed acting as an array, malloc-ing another block gives you another chunk of memory that you have to access via another pointer (in your case page2). It is then no longer one contiguous block, and you cannot use the two blocks as part of one array.
realloc, on the other hand, allocates one contiguous block of memory. You can use it as a single array and do all sorts of pointer arithmetic not possible if there are separate blocks. realloc is also useful when you actually want to shrink the block you are working with, but that is probably not what you are seeking to do here.
So, if you are using this as an array, realloc is basically the better option. Otherwise, there is nothing wrong with malloc. Actually, you might want to use malloc for each object you create rather than having to keep track of and micro-manage blocks of memory.
You have not given any details about the platform you are experimenting on. There are some performance differences for realloc between Linux and Windows, for example.
Depending on the situation, realloc might have to allocate a new memory block if it can't grow the current one and copy the old memory to the new one, which is expensive.
If you don't really need a contiguous block of memory you should avoid using realloc.
My suggestion would be to use the second approach, or use a custom allocator (you could implement a simple buddy allocator [2]).
You could also use more advanced memory allocators, like
APR memory pools
Google's TCMalloc
In the worst case, option 1 can cause a "move" of the original memory, which is extra work. Even when the memory is not moved, the "extra" size gets initialized, which is more work too. So realloc would be "defeated" by the malloc approach, but to say by how much you should run tests (and I think there is a bias from the state of the system at the time the memory requests are made).
Depending on how many times you expect realloc/malloc to be performed, this may or may not be a useful idea. I would use malloc anyway.
The free strategy depends on the implementation. To free all the pages as a whole, it is enough to traverse them; instead of an array, I would use linked "pages": add sizeof(void *) to the "page" size, and use the extra bytes to store the pointer to the next page (a sketch of this is shown below).
If you have to free a single object located anywhere in one of the pages, it becomes a little more complex. My idea is to keep a list of non-contiguous free "blocks"/"slots" (each big enough to hold any object). When a new "block" is requested, first pop a value from this list; if it is empty, take the next "slot" in the page currently in use, allocating a new page when necessary. Freeing an object then just means pushing the empty slot's address onto that stack/list (whichever you prefer to use).
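A sketch of the linked-pages idea mentioned above (PAGE_SIZE and the function names are placeholders):

#include <stdlib.h>

/* Linked "pages": each page starts with a pointer to the next one, so freeing
   everything is a single walk down the chain. */
#define PAGE_SIZE (64 * 1024)

typedef struct page {
    struct page *next;    /* the extra sizeof(void *) bytes */
    char         data[];  /* PAGE_SIZE bytes of object storage follow */
} page;

static page *page_list = NULL;   /* most recently allocated page first */

void *new_page(void)
{
    page *p = malloc(sizeof *p + PAGE_SIZE);
    if (p == NULL)
        return NULL;
    p->next = page_list;
    page_list = p;
    return p->data;      /* callers carve their objects out of this area */
}

void free_all_pages(void)
{
    while (page_list != NULL) {
        page *next = page_list->next;
        free(page_list);
        page_list = next;
    }
}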
On Linux (and probably other POSIX systems) there is a third possibility: use a memory-mapped region created with shm_open. Such a region is zero-initialized as you access it, but AFAIK pages you never touch cost nothing beyond the address range reserved in virtual memory. So you could reserve a large chunk of memory (more than you will ever need) at the beginning of your execution and then fill it incrementally from the start.
What is the most efficient way to solve a problem like this? Is it option 1, option 2 or some other option I haven't considered here? Is a small benchmark needed/enough to draw conclusions for real-world situations?
Option 1. For it to be efficient, NEW_SIZE has to grow with the old size rather than by a fixed increment; otherwise you risk running into O(n^2) performance of realloc() due to the redundant copying. I generally do new_size = old_size + old_size/4 (a 25% increase), as the theoretically better new_size = old_size*2 might in the worst case reserve too much unused memory.
Option 2. It should be closer to optimal, as most modern systems (thanks in part to C++'s STL) already have allocators well optimized for a flood of small memory allocations. Small allocations also have a lesser chance of causing memory fragmentation.
In the end it all depends on how often you allocate new objects and how you handle freeing. If you allocate a lot with #1 you get some redundant copying when expanding, but freeing is dead simple since all objects are in the same page. If you need to free or reuse individual objects, with #2 you spend some time walking through the list of pages.
In my experience #2 is better, as moving large memory blocks around can increase the rate of heap fragmentation. #2 also lets you keep plain pointers, since objects never change their location in memory (though for some applications I prefer pool_id/index pairs over raw pointers). If walking through the pages becomes a problem later, that too can be optimized.
In the end you should also consider option #3: libc. I think libc's malloc() is efficient enough for very many tasks; please test it before investing more of your time. Unless you are stuck on some backward *NIX, there should be no problem using malloc() for every smallish object. I have used custom memory management only when I needed to put objects in exotic places (e.g. shm or mmap). Keep multi-threading in mind too: malloc()/realloc()/free() are generally already optimized and MT-ready, and you would have to reimplement those optimizations to keep threads from constantly colliding over memory management. And if you want memory pools or zones, there are already a bunch of libraries for that too.
