How to allocate more memory to your program (GCC) [closed] - c

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I want to allocate more memory to my program. What is the gcc flag that allows me to do so?
FYI, what I am trying to do is create a very large matrix (really large) which will go through compression algorithms later. So there is no way I can avoid creating such a large matrix to store the data.

Your question is very unclear, but I suspect that you are trying to create a large multidimensional array (matrix) as a local variable (auto variable) to some function (possibly main) and this is failing.
int foo(int boo, int doo) {
    int big_array[REALLY_BIG];
    ...
}
This would fail because C compilers try to make room for variables like this on the program's system stack. A compiler may simply refuse to place something that big on the stack (especially with alignment issues that might make it bigger), or it may generate code that either the CPU can't run (because stack-pointer-relative indexing is limited) or the OS won't permit (because it has placed limits on the size of the program's system stack).
There may be ways to change OS limits, but if it is a CPU limit you are just going to have to do things differently.
For some things, the simplest approach is just to use global or static variables for large data such as this. Doing this, you end up allocating the space for the data either at compile time or at program load time (just prior to run time), but it limits your ability to have more than one copy, since you have to plan ahead and declare enough global variables to hold everything you want live at the same time.
You could also try using malloc or calloc to allocate the memory for you.
A third option is (if you are using a *nix system) to memory map a file containing the matrix. Look into the mmap system call for this.
An added benefit of using mmap or static or global variables is that under most operating systems the virtual memory manager can use the original file (the file containing the matrix for mmap, or the executable file for static or global) as swap space for the memory that the data uses. This makes it so that your program may be able to run without putting too much of a strain on the physical memory or virtual memory manager.

If the matrix is really large you might have to allocate memory in smaller segments so it can find room in the virtual memory space. On 32-bit Windows I have found you simply cannot get anything bigger than about 980 MB in a single allocation. On Linux it is pushing it to try to get more than about 1.5 GB.
In a 64-bit system you can get a lot more.
But in any case, I would recommend using a matrix library that can handle the memory and algorithms for you. There are many subtle tricks to making fast matrix computations. Tricks with threads, computing in cache-sized blocks, prefetching data, SSE vector ops, etc.
You might want to look into using the math libraries from either Intel or AMD.

You don't need any special gcc flags.
Use malloc to allocate your array dynamically at runtime.
If you are somehow forced to use a static array, or if your environment is set up by default to limit your program's access to virtual memory, you may need to use the ulimit command.
ulimit -v unlimited
ulimit -d unlimited
Otherwise, you need to specify more clearly the error you are getting that prevents you from getting sufficient memory, and probably also tell us how big your matrix is.

Use the heap! malloc() and friends are your friends.

Related

how to allocate the dynamic memory with our own function (without using malloc) [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
How can I allocate dynamic memory in C with my own function, without using malloc()?
You cannot get fresh heap memory without some support from the underlying operating system. I am assuming you have a POSIX operating system, e.g. Linux.
You could define your own malloc, but (in a hosted C implementation) most library functions assume it has the traditional semantics: two successive and successful calls to malloc with no intervening free produce two unaliased pointers to distinct, non-overlapping memory zones.
In practice, your system's malloc is generally implemented by requesting fresh segments (in multiples of 4-Kbyte pages) of virtual memory in your address space with a system call like mmap(2). But your standard C library's malloc tries hard to reuse previously freed memory zones before calling mmap; it allocates some "large" (e.g. 128-Kbyte or 1-Mbyte) chunks using mmap and organizes them as a set of memory zones (the details are complex, since most malloc implementations are optimized for common use cases). Quite often, malloc handles small allocations differently from large ones.
Occasionally (but usually not) a malloc implementation might release memory back to the kernel using e.g. munmap, but this does not happen often in practice. So a process that has malloc-ed a lot of memory in many small zones and has freed almost all of them still keeps a lot of memory (so it can reuse it without another mmap).
If you do not want to use the malloc() provided by the library, you will have to implement your own memory manager, but I don't see any reason for doing so. That code is thoroughly tested and has been in use for a long time.
You can implement your own sample memory manager based on the sbrk() system call.
Assuming you are on Linux, this article is a good walkthrough:
http://www.ibm.com/developerworks/linux/library/l-memory/
There's no reason not to use malloc to dynamically allocate memory.
That said, you could define a global char array with a very large size and write a series of functions to hand out space from that array. However, you need to keep track of what is and isn't available and be aware of alignment issues. And that's just scratching the surface.
Bottom line: use malloc.

Force memory allocation always to the same virtual address [duplicate]

This question already has answers here:
disable the randomness in malloc
(6 answers)
Closed 9 years ago.
I'm experimenting with Pin, an instrumentation tool, which I use to compute some statistics based on memory address of my variables. I want to re-run my program with the information gathered by my instrumentation tool, but for that it's crucial that virtual memory addresses remain the same through different runs.
In general, I should let the OS handle memory allocation, but in this case I need some kind of way to force it to always allocate to the same virtual address. In particular, I'm interested in a very long array, which I'm currently allocating with numa_alloc_onnode(), though I could use something else.
What would be the correct way to proceed?
Thanks
You could try mmap(2).
The instrumented version of your program will use a different memory layout than the original program, because Pin needs memory for the dynamic translation etc. and will change the memory layout (if I recall correctly).
With the exception of address space layout randomization, most memory allocators, loaders, and system routines for assigning virtual memory addresses will return the same results given the same calls and data (not by deliberate design, but as a natural consequence of how software works). So, you need to:
Disable address space layout randomization.
Ensure your program executes in the same way each time.
Address space layout randomization deliberately changes the address space layout to foil attackers: if the addresses change on each program execution, it is more difficult for attacks to use various exploits to control the code that is executed. It should be disabled only temporarily and only for debugging purposes. This answer shows one method of doing that and links to more information, but the exact method may depend on the version of Linux you are using.
Your program may execute differently for a variety of reasons, such as using threads or using asynchronous signals or interprocess communication. It will be up to you to control that in your program.
Generally, memory allocation is not guaranteed to be reproducible. The results you get may be on an as-is basis.

Determine total memory usage of embedded C program

I would like to be able to debug how much total memory is being used by a C program in a limited-resource environment of 256 KB memory (currently I am testing in an emulator program).
I have the ability to print debug statements to a screen, but what method should I use to calculate how much my C program is using (including globals, local variables [from perspective of my main function loop], the program code itself etc..)?
A secondary aspect would be to display the location/ranges of specific variables as opposed to just their size.
Edit: The CPU is a Hitachi SH2; I don't have an IDE that lets me put breakpoints into the program.
In your IDE's build options, enable the setting (probably a checkbox) so that the build process (namely, the linker) generates a map file.
A map file for an embedded system will normally give you the information you need in a detailed fashion: the memory segments, their sizes, how much memory is utilized in each one, program memory, data memory, etc. There is usually a lot of data supplied by the map file, and you might need to write a script to calculate exactly what you need, or copy it into Excel. The map file might also contain summary information for you.
The stack is a bit trickier. If the map file gives that, then there you have it. If not, you need to find it yourself. Embedded compilers usually let you define the stack location and size. Put a breakpoint at the start of your program. When the application stops there, zero the entire stack. Resume the application and let it work for a while. Finally, stop it and inspect the stack memory. You will see non-zero values instead of zeros; the used stack extends to where the zeros start again.
Generally you will have different sections in the linker-generated map file showing where data goes, like:
.intvect
.intvect_end
.rozdata
.robase
.rosdata
.rodata
.text ... and so on,
with other attributes like Base, Size (hex), Size (dec), etc. for each section.
While at any time local variables may take up more or less space (as they go in and out of scope), they are instantiated on the stack. In a single-threaded environment, the stack will be a fixed allocation known at link time. The same is true of all statically allocated data. The only run-time variable part is dynamically allocated data, but even then such data is allocated from the heap, which in most bare-metal, single-threaded environments is a fixed link-time allocation.
Consequently all the information you need about memory allocation is probably already provided by your linker. Often (depending on your tool-chain and linker parameters used) basic information is output when the linker runs. You can usually request that a full linker map file is generated and this will give you detailed information. Some linkers can perform stack usage analysis that will give you worst case stack usage for any particular function. In a single threaded environment, the stack usage from main() will give worst case overall usage (although interrupt handlers need consideration, the linker is not thread or interrupt aware, and some architectures have separate interrupt stacks, some are shared).
Although the heap itself is typically a fixed allocation (often all the available memory after the linker has performed static allocation of stack and static data), if you are using dynamic memory allocation, it may be useful at run-time to know how much memory has been allocated from the heap, as well as information about the number of allocations, average size of allocation, and the number of free blocks and their sizes also. Because dynamic memory allocation is implemented by your system's standard library any such analysis facility will be specific to your library, and may not be provided at all. If you have the library source you could implement such facilities yourself.
In a multi-threaded environment, thread stacks may be allocated statically or from the heap, but either way the same analysis methods described above apply. For stack usage analysis, the worst-case for each thread is measured from the entry point of each thread rather than from main().

What is the size limit for automatic variables in C? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Checking available stack size in C
EDIT2: My question is Duplicate of Checking available stack size in C Please delete.
EDIT: I'm looking for information on size limit, not general info on variables use.
Can the size limit be determined? Is it System dependent?
e.g. textbooks often write char string[1024];
but if one happily writes char string[99999999] he may get a crash.
This will depend on a lot of factors (I am writing from a Unix machine point of view):
The ulimit of the stack segment, which determines how much stack space can be allocated to a process.
The bitness of the process. Even if the ulimit for the stack is unlimited, there is a maximum: 32-bit and 64-bit processes have different maximum stack sizes, depending on the OS architecture and runtime environment.
Free memory in the machine. There are paging algorithms where space in the paging device is reserved while allocating actual memory; if there is no space, the process won't even start.
A huge automatic variable can lead to a collision between the stack and the heap.
There could be more, but it depends completely on the OS architecture and runtime environment.
Since auto variables are located on the stack, it depends on how the stack size is configured and how many nested calls you have. To allocate MBs you should consider using the heap (malloc).
For basic types of variables, go to: C Variables
Otherwise use dynamic structures such as a linked list, as the requirement dictates.
EDIT: in that case just go with #stracker
Size depends upon the free memory at the time you run the application.
I don't know if this will help you, but you can try looking at limits.h. You might get something from this.

What's a good C memory allocator for embedded systems? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 6 years ago.
I have a single-threaded, embedded application that allocates and deallocates lots and lots of small blocks (32-64 bytes). The perfect scenario for a cache-based allocator. And although I could try to write one, it would likely be a waste of time, and not as well tested and tuned as some solution that has already been on the front lines.
So what would be the best allocator I could use for this scenario?
Note: I'm using a Lua Virtual Machine in the system (which is the culprit of 80+% of the allocations), so I can't trivially refactor my code to use stack allocations to increase allocation performance.
I'm a bit late to the party, but I just want to share a very efficient memory allocator for embedded systems that I've recently found and tested: https://github.com/dimonomid/umm_malloc
This is a memory management library specifically designed to work with the ARM7; personally I use it on a PIC32 device, but it should work on any 16- and 8-bit device (I have plans to test it on a 16-bit PIC24, but I haven't tested it yet).
I was seriously beaten by fragmentation with the default allocator: my project often allocates blocks of various sizes, from several bytes to several hundred bytes, and sometimes I faced an 'out of memory' error. My PIC32 device has 32K of RAM in total, and 8192 bytes are used for the heap. At a particular moment there was more than 5K of free memory, but the default allocator's largest non-fragmented block was only about 700 bytes, because of fragmentation. This is too bad, so I decided to look for a more efficient solution.
I was already aware of some allocators, but all of them had limitations (such as the block size having to be a power of 2, starting not from 2 but from, say, 128 bytes), or were just buggy. Every time before, I had to switch back to the default allocator.
But this time, I'm lucky: I've found this one: http://hempeldesigngroup.com/embedded/stories/memorymanager/
When I tried this memory allocator, in exactly the same situation with 5K of free memory, it had a contiguous block of more than 3800 bytes! That was so unbelievable to me (compared to 700 bytes) that I performed a hard test: the device worked heavily for more than 30 hours. No memory leaks; everything works as it should.
I also found this allocator in the FreeRTOS repository: http://svnmios.midibox.org/listing.php?repname=svn.mios32&path=%2Ftrunk%2FFreeRTOS%2FSource%2Fportable%2FMemMang%2F&rev=1041&peg=1041# , and this fact is additional evidence of the stability of umm_malloc.
So I completely switched to umm_malloc, and I'm quite happy with it.
I just had to change it a bit: the configuration was a bit buggy when the macro UMM_TEST_MAIN is not defined, so I've created the github repository (the link is at the top of this post). Now user-dependent configuration is stored in a separate file, umm_malloc_cfg.h.
I haven't yet dug deeply into the algorithms applied in this allocator, but it has a very detailed explanation of the algorithms, so anyone who is interested can look at the top of the file umm_malloc.c. At the least, the "binning" approach should greatly reduce fragmentation: http://g.oswego.edu/dl/html/malloc.html
I believe that anyone who needs an efficient memory allocator for microcontrollers should at least try this one.
In a past C project I worked on, we went down the road of implementing our own memory management routines for a library that ran on a wide range of platforms, including embedded systems. The library also allocated and freed a large number of small buffers. It ran relatively well and didn't take a large amount of code to implement. I can give you a bit of background on that implementation in case you want to develop something yourself.
The basic implementation included a set of routines that managed buffers of a set size. The routines were used as wrappers around malloc() and free(). We used these routines to manage allocation of structures that we frequently used and also to manage generic buffers of set sizes. A structure was used to describe each type of buffer being managed. When a buffer of a specific type was allocated, we'd malloc() the memory in blocks (if the list of free buffers was empty). I.e., if we were managing 10-byte buffers, we might make a single malloc() that contained space for 100 of these buffers, to reduce fragmentation and the number of underlying mallocs needed.
At the front of each buffer would be a pointer that would be used to chain the buffers in a free list. When the 100 buffers were allocated, each buffer would be chained together in the free list. When the buffer was in use, the pointer would be set to null. We also maintained a list of the "blocks" of buffers, so that we could do a simple cleanup by calling free() on each of the actual malloc'd buffers.
For management of dynamic buffer sizes, we also added a size_t variable at the beginning of each buffer telling the size of the buffer. This was then used to identify which buffer block to put the buffer back into when it was freed. We had replacement routines for malloc() and free() that did pointer arithmetic to get the buffer size and then put the buffer into the free list. We also had a limit on how large a buffer we managed; buffers larger than this limit were simply malloc'd and passed to the user. For structures that we managed, we created wrapper routines for allocation and freeing of the specific structures.
Eventually we also evolved the system to include garbage collection when requested by the user to clean up unused memory. Since we had control over the whole system, there were various optimizations we were able to make over time to increase performance of the system. As I mentioned, it did work quite well.
I did some research on this very topic recently, as we had an issue with memory fragmentation. In the end we decided to stay with GNU libc's implementation and add some application-level memory pools where necessary. There were other allocators that had better fragmentation behavior, but we weren't comfortable enough with them to replace malloc globally. GNU's has the benefit of a long history behind it.
In your case it seems justified; assuming you can't fix the VM, those tiny allocations are very wasteful. I don't know what your whole environment is, but you might consider wrapping the calls to malloc/realloc/free on just the VM so that you can pass it off to a handler designed for small pools.
Although it's been some time since I asked this, my final solution was to use Loki's SmallObjectAllocator, and it works great. It got rid of all the OS calls and improved the performance of my Lua engine for embedded devices. Very nice and simple, and just about 5 minutes' worth of work!
Since version 5.1, Lua has allowed a custom allocator to be set when creating new states.
I'd also like to add to this even though it's an old thread. In an embedded application, if you can analyze your application's memory usage and come up with a maximum number of allocations of the varying sizes, the fastest type of allocator is usually one using memory pools. In our embedded apps we can determine all allocation sizes that will ever be needed at run time. If you can do this, you can completely eliminate heap fragmentation and have very fast allocations. Most of these implementations have an overflow pool that does a regular malloc for the special cases, which will hopefully be few and far between if you did your analysis right.
I have used the 'binary buddy' system to good effect under VxWorks. Basically, you portion out your heap by cutting blocks in half to get the smallest power-of-two-sized block that holds your request, and when blocks are freed, you can make a pass up the tree to merge blocks back together to mitigate fragmentation. A Google search should turn up all the info you need.
I am writing a C memory allocator called tinymem that is intended to be able to defragment the heap, and re-use memory. Check it out:
https://github.com/vitiral/tinymem
Note: this project has been discontinued in favor of the Rust implementation:
https://github.com/vitiral/defrag-rs
Also, I had not heard of umm_malloc before. Unfortunately, it doesn't seem to be able to deal with fragmentation, but it definitely looks useful. I will have to check it out.
