I am looking for a malloc alternative for C that will only ever be used as a stack. Something more like alloca but not limited in space by the stack size. It is for coding a math algorithm.
I will work with large amounts of memory (possibly hundreds of megabytes in use in the middle of the algorithm)
memory is accessed in a stack-like order. What I mean is that the next memory to be freed is always the memory that was most recently allocated.
would like to be able to run on a variety of systems (Windows and Unix-like)
as an extra, something that can be used with threading, where the stack-like allocate and free order applies just to individual threads. (ie ideally each thread has its own "pool" for memory allocation)
My question is, is there anything like this, or is this something that would be easy to implement?
This sounds like a perfect use for Obstack.
I've never used it myself since the API is really confusing, and I can't dig up an example right now. But it supports all the operations you want, and additionally supports streaming creation of the "current" object.
Edit: whipped up a quick example. The Obstack API shows signs of age, but the principle is sound at least.
You will probably want to look into tuning the align/block settings and likely use obstack_next_free and obstack_object_size if you do any fancy growing.
#include <obstack.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void *xmalloc(size_t size)
{
void *rv = malloc(size);
if (rv == NULL)
abort();
return rv;
}
#define obstack_chunk_alloc xmalloc
#define obstack_chunk_free free
const char *cat(struct obstack *obstack_ptr, const char *dir, const char *file)
{
obstack_grow(obstack_ptr, dir, strlen(dir));
obstack_1grow(obstack_ptr, '/');
obstack_grow0(obstack_ptr, file, strlen(file));
return obstack_finish(obstack_ptr);
}
int main()
{
struct obstack main_stack;
obstack_init(&main_stack);
const char *cat1 = cat(&main_stack, "dir1", "file1");
const char *cat2 = cat(&main_stack, "dir1", "file2");
const char *cat3 = cat(&main_stack, "dir2", "file3");
puts(cat1);
puts(cat2);
puts(cat3);
obstack_free(&main_stack, cat2);
// cat2 and cat3 both freed, cat1 still valid
}
As you already found out, as long as it works with malloc you should use it and only come back when you need to squeeze out the last bit of performance.
An idea that fits that case: you could use a list of blocks that you allocate when needed. Using a list makes it possible to eventually swap out data in case you hit the virtual memory limit.
struct block {
size_t size;
char * memory; // char * so that pointer arithmetic is well defined
struct block * next;
};
struct stacklike {
struct block * top;
char * next_alloc; // first free byte in the top block
};
void * allocate (struct stacklike * a, size_t s) {
// add null check for top
if (a->top->size - (size_t)(a->next_alloc - a->top->memory) < s + sizeof(size_t)) {
// not enough memory left in top block, allocate new one
struct block * nb = malloc(sizeof(*nb));
nb->next = a->top;
a->top = nb;
nb->memory = malloc(/* some size large enough to hold multiple data entities */);
// also set nb size to that size
a->next_alloc = nb->memory;
}
void * place = a->next_alloc;
a->next_alloc += s;
// store size to be able to free (assumes s keeps next_alloc aligned for size_t)
*((size_t *) a->next_alloc) = s;
a->next_alloc += sizeof (size_t);
return place;
}
I hope this shows the general idea, for an actual implementation there's much more to consider.
To swap out stuff you change that to a doubly linked list and keep track of the total allocated bytes. If you hit a limit, write the end to some file.
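To make the block-list idea a bit more concrete, here is a minimal sketch of how allocation and stack-order freeing could fit together. Everything in it is an assumption made for illustration: the names stack_alloc, stack_free and stack_destroy, the 1 MiB BLOCK_SIZE, and the align_unit union used to keep returned pointers aligned. It records the previous fill level in front of each allocation instead of the size behind it, and it does no swapping to disk.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: this union is aligned strictly enough for the data stored. */
union align_unit { long long ll; long double ld; void *p; void (*fp)(void); };

#define ALIGN      sizeof(union align_unit)
#define BLOCK_SIZE ((size_t)1 << 20)   /* assumed payload per block: 1 MiB */

struct block {
    struct block *next;        /* block below this one on the stack */
    size_t size;               /* payload capacity in bytes */
    size_t used;               /* payload bytes currently in use */
    union align_unit data[];   /* payload */
};

struct stacklike {
    struct block *top;         /* NULL while the stack is empty */
};

static size_t round_up(size_t n)
{
    return (n + ALIGN - 1) / ALIGN * ALIGN;
}

/* Allocate s bytes on top of the stack; returns NULL on failure. */
void *stack_alloc(struct stacklike *a, size_t s)
{
    if (s > SIZE_MAX - 2 * ALIGN)
        return NULL;
    /* One ALIGN-sized slot in front of the data stores the old fill level. */
    size_t need = ALIGN + round_up(s);
    if (a->top == NULL || a->top->size - a->top->used < need) {
        size_t payload = need > BLOCK_SIZE ? need : BLOCK_SIZE;
        struct block *nb = malloc(sizeof *nb + payload);
        if (nb == NULL)
            return NULL;
        nb->next = a->top;
        nb->size = payload;
        nb->used = 0;
        a->top = nb;
    }
    struct block *b = a->top;
    char *base = (char *)b->data + b->used;
    *(size_t *)base = b->used;   /* fill level to restore when freeing */
    b->used += need;
    return base + ALIGN;
}

/* Free p, which must be the most recent allocation that is still live. */
void stack_free(struct stacklike *a, void *p)
{
    struct block *b = a->top;
    if (b == NULL || p == NULL)
        return;
    b->used = *(size_t *)((char *)p - ALIGN);
    if (b->used == 0) {          /* block is empty again: give it back */
        a->top = b->next;
        free(b);
    }
}

/* Release every block, regardless of what is still allocated. */
void stack_destroy(struct stacklike *a)
{
    while (a->top != NULL) {
        struct block *b = a->top;
        a->top = b->next;
        free(b);
    }
}
```

For the threading case, each thread would simply own its own struct stacklike, so no locking is needed. Releasing a block as soon as it is empty keeps the code short, but it can cause repeated malloc/free calls when allocations hover around a block boundary; keeping one spare block would be a possible refinement.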
I have seen a strategy used in an old FORTRAN program that might be what you are looking for. The strategy involves use of a global array that is passed down to each function from main.
char global_buffer[SOME_LARGE_SIZE];
void foo1(char* buffer, ...);
void foo2(char* buffer, ...);
void foo3(char* buffer, ...);
int main()
{
foo1(global_buffer, ....);
}
void foo1(char* buffer, ...)
{
// This function needs to use SIZE1 characters of buffer.
// It can let the functions that it calls use buffer+SIZE1
foo2(buffer+SIZE1, ...);
// When foo2 returns, everything from buffer+SIZE1 is assumed
// to be free for re-use.
}
void foo2(char* buffer, ...)
{
// This function needs to use SIZE2 characters of buffer.
// It can let the functions that it calls use buffer+SIZE2
foo3(buffer+SIZE2, ...);
}
void foo3(char* buffer, ...)
{
// This function needs to use SIZE3 characters of buffer.
// It can let the functions that it calls use buffer+SIZE3
bar1(buffer+SIZE3, ...);
}
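The same pattern can be written with a small bookkeeping struct instead of raw pointer offsets. The sketch below is only an illustration of that variation: struct arena, arena_alloc, ARENA_ALIGN and sum_of_squares are hypothetical names, and the buffer handed to the arena is assumed to be suitably aligned for the data stored in it.

```c
#include <stddef.h>

#define ARENA_ALIGN 16   /* assumed alignment granularity in bytes */

struct arena {
    char  *base;  /* start of the caller-provided buffer (assumed aligned) */
    size_t cap;   /* total bytes available */
    size_t top;   /* bytes handed out so far */
};

/* Hand out n bytes from the buffer, or NULL when it is exhausted. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t need = (n + ARENA_ALIGN - 1) / ARENA_ALIGN * ARENA_ALIGN;
    if (need < n || need > a->cap - a->top)
        return NULL;
    void *p = a->base + a->top;
    a->top += need;
    return p;
}

/* Example worker: takes scratch space on entry and releases it on exit. */
static double sum_of_squares(struct arena *a, const double *x, size_t n)
{
    size_t mark = a->top;                 /* remember the level on entry */
    double *tmp = arena_alloc(a, n * sizeof *tmp);
    double sum = 0.0;
    if (tmp != NULL) {
        for (size_t i = 0; i < n; i++)
            tmp[i] = x[i] * x[i];
        for (size_t i = 0; i < n; i++)
            sum += tmp[i];
    }
    a->top = mark;                        /* everything taken here is free again */
    return sum;
}
```

Saving top on entry and restoring it on exit plays the role of passing buffer+SIZE down the call chain: whatever a callee allocates is released automatically when the caller restores its mark.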
I'm creating a C library with .h and .c files for a ring buffer. Ideally, you would initialize this ring buffer library in the main project with something like ringbuff_init(int buff_size); and the size that is passed in will be the size of the buffer. How can I do this when arrays in C need to be sized statically?
I have already tried dynamically allocating arrays, but I did not get it to work. Surely this task is possible somehow?
What I would like to do is something like this:
int buffSize[];
int main(void)
{
ringbuffer_init(100); // initialize buffer size to 100
}
void ringbuffer_init(int buff_size)
{
buffSize[buff_size];
}
This obviously doesn't compile because the array should have been initialized at the declaration. So my question is really, when you make a library for something like a buffer, how can you initialize it in the main program (so that in the .h/.c files of the buffer library) the buffer size is set to the wanted size?
You want to use dynamic memory allocation. A direct translation of your initial attempt would look like this:
size_t buffSize;
int * buffer;
int main(void)
{
ringbuffer_init(100); // initialize buffer size to 100
}
void ringbuffer_init(size_t buff_size)
{
buffSize = buff_size;
buffer = malloc(buff_size * sizeof(int));
}
This solution here is however extremely bad. Let me list the problems here:
There is no check of the result of malloc. It could return NULL if the allocation fails.
Buffer size needs to be stored along with the buffer, otherwise there's no way to know its size from your library code. It isn't exactly clean to keep these global variables around.
Speaking of which, these global variables are absolutely not thread-safe. If several threads call functions of your library, results are unpredictable. You might want to store your buffer and its size in a struct that would be returned from your init function.
Nothing keeps you from calling the init function several times in a row, meaning that the buffer pointer will be overwritten each time, causing memory leaks.
Allocated memory must be eventually freed using the free function.
In conclusion, you need to think very carefully about the API you expose in your library, and the implementation, while not extremely complicated, will not be trivial.
Something more correct would look like:
typedef struct {
size_t buffSize;
int * buffer;
} RingBuffer;
int ringbuffer_init(size_t buff_size, RingBuffer * buf)
{
if (buf == NULL)
return 0;
buf->buffSize = buff_size;
buf->buffer = malloc(buff_size * sizeof(int));
return buf->buffer != NULL;
}
void ringbuffer_free(RingBuffer * buf)
{
free(buf->buffer);
}
int main(void)
{
RingBuffer buf;
int ok = ringbuffer_init(100, &buf); // initialize buffer size to 100
// ...
ringbuffer_free(&buf);
}
Even this is not without problems, as there is still a potential memory leak if the init function is called several times for the same buffer, and the client of your library must not forget to call the free function.
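One way to reduce the double-init risk is to make "uninitialized" a recognizable state. The sketch below is an assumption-laden illustration, not the one true design: the head and count fields, the RINGBUFFER_EMPTY initializer and the ringbuffer_put / ringbuffer_get functions are all hypothetical additions, and the guard only works if the client starts from RINGBUFFER_EMPTY.

```c
#include <stdlib.h>

typedef struct {
    size_t buffSize;  /* capacity in elements */
    size_t head;      /* index of the oldest element */
    size_t count;     /* number of stored elements */
    int   *buffer;    /* NULL while uninitialized */
} RingBuffer;

#define RINGBUFFER_EMPTY { 0, 0, 0, NULL }

/* Returns 1 on success, 0 on failure or if buf is already initialized. */
int ringbuffer_init(size_t buff_size, RingBuffer *buf)
{
    if (buf == NULL || buf->buffer != NULL || buff_size == 0)
        return 0;                     /* refuse to overwrite a live buffer */
    buf->buffer = calloc(buff_size, sizeof *buf->buffer);
    if (buf->buffer == NULL)
        return 0;
    buf->buffSize = buff_size;
    buf->head = 0;
    buf->count = 0;
    return 1;
}

/* Safe to call more than once; leaves buf in the uninitialized state. */
void ringbuffer_free(RingBuffer *buf)
{
    if (buf == NULL)
        return;
    free(buf->buffer);
    buf->buffer = NULL;
    buf->buffSize = buf->head = buf->count = 0;
}

/* Append a value; returns 0 when the buffer is full or uninitialized. */
int ringbuffer_put(RingBuffer *buf, int value)
{
    if (buf->count == buf->buffSize)
        return 0;
    buf->buffer[(buf->head + buf->count) % buf->buffSize] = value;
    buf->count++;
    return 1;
}

/* Remove the oldest value; returns 0 when the buffer is empty. */
int ringbuffer_get(RingBuffer *buf, int *value)
{
    if (buf->count == 0)
        return 0;
    *value = buf->buffer[buf->head];
    buf->head = (buf->head + 1) % buf->buffSize;
    buf->count--;
    return 1;
}
```

A client would declare RingBuffer buf = RINGBUFFER_EMPTY; before calling ringbuffer_init(100, &buf). Forgetting ringbuffer_free is still possible; C offers no automatic cleanup for that.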
Static/global arrays can't have dynamic sizes.
If you must have a global dynamic array, declare a global pointer instead and initialize it with a malloc/calloc/realloc call.
You might want to also store its size in an accompanying integer variable as sizeof applied to a pointer won't give you the size of the block the pointer might be pointing to.
int *buffer;
int buffer_nelems;
int *ringbuffer_init(int buff_size)
{
assert(buff_size > 0);
if ( (buffer = malloc(buff_size*sizeof(*buffer)) ) )
buffer_nelems = buff_size;
return buffer;
}
You should use the malloc function for dynamic memory allocation.
It dynamically allocates a single block of memory of the specified size. It returns a pointer of type void *, which can be converted to any object pointer type.
Example:
// Dynamically allocate memory using malloc()
buffSize= (int*)malloc(n * sizeof(int));
// Initialize the elements of the array
for (i = 0; i < n; ++i) {
buffSize[i] = i + 1;
}
// Print the elements of the array
for (i = 0; i < n; ++i) {
printf("%d, ", buffSize[i]);
}
I know I'm three years late to the party, but I feel I have an acceptable solution without using dynamic allocation.
If you need to do this without dynamic allocation for whatever reason (I have a similar issue in an embedded environment, and would like to avoid it), you can do the following:
Library:
int * buffSize;
int buffSizeLength;
void ringbuffer_init(int buff_size, int * bufferAddress)
{
buffSize = bufferAddress;
buffSizeLength = buff_size;
}
Main :
#define BUFFER_SIZE 100
int LibraryBuffer[BUFFER_SIZE];
int main(void)
{
ringbuffer_init(BUFFER_SIZE, LibraryBuffer); // initialize buffer size to 100
}
I have been using this trick for a while now, and it's greatly simplified some parts of working with a library.
One drawback: you can technically mess with the variable in your own code, breaking the library. I don't have a solution to that yet. If anyone has a solution to that I would love to hear it. Basically good discipline is required for now.
You can also combine this with @SirDarius's typedef for the ring buffer above. I would in fact recommend it.
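Combining the two ideas could look roughly like the sketch below. The function name ringbuffer_init_static is made up for this example; the struct mirrors the typedef from the earlier answer, except that the storage is owned by the caller and nothing is ever freed by the library.

```c
#include <stddef.h>

typedef struct {
    size_t buffSize;  /* capacity in elements */
    int   *buffer;    /* points at storage owned by the caller */
} RingBuffer;

/* Attach caller-provided storage; returns 1 on success, 0 on bad arguments. */
int ringbuffer_init_static(RingBuffer *buf, int *storage, size_t buff_size)
{
    if (buf == NULL || storage == NULL || buff_size == 0)
        return 0;
    buf->buffer = storage;
    buf->buffSize = buff_size;
    return 1;
}
```

Because the state lives in a struct rather than in library globals, the main program can keep several independent buffers, each backed by its own static array.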
My objective is to optimize memory usage... I've never seen it in any tutorial which leads me to think that this isn't the right way to do it
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct Player {
char* username;
int hp;
int mp;
};
int main(void) {
struct Player test, *p = &test;
p->username = (char*)malloc(50 * sizeof(char));
scanf("%s", p->username);
p->username = realloc(p->username, (strlen(p->username) + 1) * sizeof(char));
printf("%s", p->username);
return 0;
}
right way to optimize memory usage?
Temporary, re-used buffers can often be generous and fixed in size.
Allocating a right-sized amount of memory makes sense for the .username member, since the code could be handling millions of struct Player.
In other words, use allocation for the variable-size aspects of code. If struct Player was for 2-player chess, a fixed char username[50] makes sense. For a multi-player universe, char * makes sense.
Rather than call *alloc() twice consider a single right-sized call.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
// Reasonable upper bound
#define USERNAME_SIZEMAX 50
struct Player {
char* username;
int hp;
int mp;
};
int main(void) {
puts("Enter user name");
// Recommend 2x - useful for leading/trailing spaces & detecting excessively long inputs.
char buf[USERNAME_SIZEMAX * 2];
if (fgets(buf, sizeof buf, stdin) == NULL) {
puts("No input");
} else {
trim(buf); // TBD code to lop off leading/trailing spaces
if (!valid_name(buf)) { // TBD code to validate the `name`
printf("Bad input \"%s\"\n", buf);
} else {
struct Player test = { 0 }; // fully populate
test.username = malloc(strlen(buf) + 1);
// Maybe add NULL check here
strcpy(test.username, buf);
// Oh happy day!
printf("%s", test.username);
return EXIT_SUCCESS;
}
}
return EXIT_FAILURE;
}
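The trim and valid_name helpers are left as TBD above. One possible way to fill them in is sketched here; the rule that a name consists of letters, digits and underscores and is shorter than USERNAME_SIZEMAX is purely an assumption for the example.

```c
#include <ctype.h>
#include <string.h>

#define USERNAME_SIZEMAX 50

/* Remove leading and trailing whitespace in place,
   including the '\n' that fgets leaves behind. */
void trim(char *s)
{
    size_t len = strlen(s);
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
    size_t start = 0;
    while (isspace((unsigned char)s[start]))
        start++;
    memmove(s, s + start, len - start + 1);   /* +1 copies the terminator */
}

/* Assumed policy: non-empty, shorter than USERNAME_SIZEMAX,
   only letters, digits and '_'. Returns 1 when the name is acceptable. */
int valid_name(const char *s)
{
    size_t len = strlen(s);
    if (len == 0 || len >= USERNAME_SIZEMAX)
        return 0;
    for (size_t i = 0; i < len; i++) {
        if (!isalnum((unsigned char)s[i]) && s[i] != '_')
            return 0;
    }
    return 1;
}
```

The casts to unsigned char matter: passing a negative char value to the ctype functions is undefined behavior.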
Some tips:
a) the example code is too small to matter
b) never use malloc() for something that you will always want one of. Instead, pre-allocate (e.g. as a global variable) or (if it's small enough) use a local variable to avoid the overhead of malloc(). E.g.:
int main(void) {
struct Player test, *p = &test;
char userName[50];
p->username = userName;
c) Don't spread data all over the place. You want all the data in the same place (in the least number of cache lines, with pieces of data that are used at the same time as close to each other as possible). One way to do that is to combine multiple items. E.g.:
struct Player {
char username[50];
int hp;
int mp;
};
int main(void) {
struct Player test, *p = &test;
d) If something takes (at most) 50 chars of memory; don't bother using realloc() to waste CPU time and potentially waste more memory. Don't forget that the internal code for malloc() and realloc() will add meta-data to each allocated piece of memory that is likely to cost an extra 16 bytes or more.
In general; for performance, malloc() and realloc() (and new() and ...) should be completely avoided (especially for larger programs). They spread data "randomly" everywhere and destroy any hope of getting good locality (which is important for minimising multiple very expensive things - cache misses, TLB misses, page faults, swap space usage, ...).
Note: gets() and scanf("%s") without a field width should also be banned. They provide no way to prevent buffer overflows (e.g. the user providing more than 50 characters when there's only enough memory allocated for 50 characters, for the purpose of deliberately trashing/corrupting other data), which results in huge gaping security holes.
Primer: This question is quite long, because I want to give an overview of my current understanding of the inner mechanisms of MRI and how I came to my conclusions. I want to understand the code better, so please correct me if any assumption I'm making is wrong.
I'm trying to find out where MRI Ruby stores the data part (aka the contents) of a String, because I'd like to create String objects which reuse memory allocated by another binary (same allocator of course).
Here's what I know so far:
RString: internal representation of a String.
struct RString {
struct RBasic basic;
union {
struct {
long len;
char *ptr;
union {
long capa;
VALUE shared;
} aux;
} heap;
char ary[RSTRING_EMBED_LEN_MAX + 1];
} as;
};
reference
From the above snippet I conclude that there are 2 ways the data can be stored:
on the heap via the heap struct (ptr points to data)
in the ary char array directly (probably some optimization)
I'm only interested in the heap case.
str_new0() seems to be the most common way to create a String from a pointer to some string data and a length.
static VALUE
str_new0(VALUE klass, const char *ptr, long len, int termlen)
{
VALUE str;
if (len < 0) {
rb_raise(rb_eArgError, "negative string size (or size too big)");
}
RUBY_DTRACE_CREATE_HOOK(STRING, len);
str = str_alloc(klass);
if (len > RSTRING_EMBED_LEN_MAX) {
RSTRING(str)->as.heap.aux.capa = len;
RSTRING(str)->as.heap.ptr = ALLOC_N(char, len + termlen);
STR_SET_NOEMBED(str);
}
else if (len == 0) {
ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT);
}
if (ptr) {
memcpy(RSTRING_PTR(str), ptr, len);
}
STR_SET_LEN(str, len);
TERM_FILL(RSTRING_PTR(str) + len, termlen);
return str;
}
reference
Memory is allocated with the macro ALLOC_N which is an alias for RB_ALLOC_N which expands to ruby_xmalloc2() which calls objspace_xmalloc2() which calls objspace_xmalloc0().
Phew
static void *
objspace_xmalloc0(rb_objspace_t *objspace, size_t size)
{
void *mem;
size = objspace_malloc_prepare(objspace, size);
TRY_WITH_GC(mem = malloc(size));
size = objspace_malloc_size(objspace, mem, size);
objspace_malloc_increase(objspace, mem, size, 0, MEMOP_TYPE_MALLOC);
return objspace_malloc_fixup(objspace, mem, size);
}
reference
So here we are. TRY_WITH_GC seems to check if the allocation mem = malloc(size) succeeds and if not it tries again after a GC run I think.
#define TRY_WITH_GC(alloc) do { \
objspace_malloc_gc_stress(objspace); \
if (!(alloc) && \
(!garbage_collect_with_gvl(objspace, TRUE, TRUE, TRUE, GPR_FLAG_MALLOC) || /* full/immediate mark && immediate sweep */ \
!(alloc))) { \
ruby_memerror(); \
} \
} while (0)
reference
Here's the first thing I'm unsure about: It seems to malloc just some memory (important: not in objspace). Is this the case? I don't know if they overwrote malloc somewhere to allocate GC friendly or whatever.
OK after that they mutate objspace with objspace_malloc_increase() and friends. I don't understand what these functions do. They do not seem to store the pointer mem in objspace, but maybe I overlooked it. I need clarification here.
As noted in the beginning I want to write code that creates a Ruby String, which uses memory allocated by some other binary, eg. C via FFI, of course with the system allocator. Do I have to register my "foreign" memory via the objspace_* functions? If yes, how does that exactly work? And are there subtleties when it comes to freeing the memory again? (I guess the GC does that, but what conditions must be true for this to work?)
I hope my question is not too vague, I can ask more precisely if necessary!
Thanks in advance!
I have a small program that creates a semver struct with some variables in it:
typedef struct {
unsigned major;
unsigned minor;
unsigned patch;
char * note;
char * tag;
} semver;
Then, I would like to create a function which creates a semver struct and returns it to the caller. Basically, a Factory.
That factory would call an initialize function to set the default values of the semver struct:
void init_semver(semver * s) {
s->major = 0;
s->minor = 0;
s->patch = 0;
s->note = "alpha";
generate_semver(s->tag, s);
}
And on top of that, I would like a function to generate a string of the complete semver tag.
void generate_semver(char * tag, semver * s) {
sprintf( tag, "v%d.%d.%d-%s",
s->major, s->minor, s->patch, s->note);
}
My problem appears to lie in this function. I have tried returning a string, but have heard that mallocing some space is bad unless you explicitly free it later ;) In order to avoid this problem, I decided to try to pass a string to the function to have it be changed within the function with no return value. I'm trying to loosely follow something like DI practices, even though I'd really like to separate the concerns of these functions and have the generate_semver function return a string that I can use like so:
char * generate_semver(semver * s) {
char * full_semver;
sprintf( full_semver, "v%d.%d.%d-%s",
s->major, s->minor, s->patch, s->note);
return full_semver; // I know this won't work because it is defined in the local stack and not outside.
}
semver->tag = generate_semver(semver);
How can I do this?
My problem appears to lie in this function. I have tried returning a string, but have heard that mallocing some space is bad unless you explicitly free it later.
Explicitly freeing dynamically allocated memory is required to avoid memory leaks. However, it is not necessarily a task that the end users need to perform directly: an API often provides a function to deal with this.
In your case, you should provide a deinit_semver function that does the clean up of memory that init_semver has allocated dynamically. These two functions behave in a way that is similar to constructor and destructor; init_semver is not a factory function, because it expects the semver struct to be allocated, rather than allocating it internally.
Here is one way of doing it:
void init_semver(semver * s, int major, int minor, int patch, const char * note) {
s->major = major;
s->minor = minor;
s->patch = patch;
size_t len = strlen(note);
s->note = malloc(len+1);
strcpy(s->note, note);
s->tag = malloc(40 + len);
sprintf(s->tag, "v%d.%d.%d-%s", major, minor, patch, note);
}
void deinit_semver(semver *s) {
free(s->note);
free(s->tag);
}
Note the changes above: rather than using fixed values for the components of struct semver, this code takes the values as parameters. In addition, the code copies the note into a dynamically allocated buffer, rather than pointing to it directly.
The deinit function does the clean-up by free-ing both fields that were allocated dynamically.
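If guessing a buffer size such as 40 + len feels fragile, the tag can be measured first and then allocated exactly. The sketch below assumes a C99-conforming snprintf, and make_tag is a hypothetical helper name, not part of the code above. The caller owns the returned string and releases it with free, which is what deinit_semver already does for s->tag.

```c
#include <stdio.h>
#include <stdlib.h>

/* Build "vMAJOR.MINOR.PATCH-NOTE" in a buffer of exactly the right size.
   Returns NULL on failure; the caller frees the result. */
char *make_tag(unsigned major, unsigned minor, unsigned patch, const char *note)
{
    /* First pass: a NULL buffer with size 0 only reports the needed length. */
    int n = snprintf(NULL, 0, "v%u.%u.%u-%s", major, minor, patch, note);
    if (n < 0)
        return NULL;
    char *tag = malloc((size_t)n + 1);
    if (tag == NULL)
        return NULL;
    /* Second pass: actually write the string, terminator included. */
    snprintf(tag, (size_t)n + 1, "v%u.%u.%u-%s", major, minor, patch, note);
    return tag;
}
```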
A char * on its own is just a pointer to memory. To accomplish what you want you will either need to instead use a fixed size field, i.e. char[33], or you can dynamically allocate the memory as needed.
As it is, your generate_semver function is attempting to print to an unknown address. Let's look at one solution.
typedef struct {
unsigned major;
unsigned minor;
unsigned patch;
char note[32];
char tag[32];
} semver;
Now, in your init_semver function, the line that was previously s->note = "alpha"; becomes a string copy, because arrays cannot be assigned to.
strncpy(s->note, "alpha", 31);
s->note[31] = '\0';
strncpy will copy a string from the second parameter to the first up to the number of bytes in the third parameter. The second line ensures that a trailing null terminator is in place.
Similarly, in the generate_semver function, it would directly work in the buffer:
void generate_semver(semver * s) {
snprintf( s->tag, sizeof s->tag, "v%u.%u.%u-%s",
s->major, s->minor, s->patch, s->note);
}
This will directly print to the array in the structure, with a maximum character limit. snprintf does append a trailing null terminator (unlike strncpy), so we don't need to worry about adding it ourselves.
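With a fixed-size field, a long note or large version numbers are silently cut off. The return value of snprintf makes it possible to detect that. This is a small sketch; generate_semver_checked is a hypothetical variant of the function above, and the struct is repeated only to keep the example self-contained.

```c
#include <stdio.h>

typedef struct {
    unsigned major;
    unsigned minor;
    unsigned patch;
    char note[32];
    char tag[32];
} semver;

/* Returns 1 when the whole tag fit into s->tag,
   0 when it was truncated or an output error occurred. */
int generate_semver_checked(semver *s)
{
    int n = snprintf(s->tag, sizeof s->tag, "v%u.%u.%u-%s",
                     s->major, s->minor, s->patch, s->note);
    /* snprintf returns the length the full string would have had. */
    return n >= 0 && (size_t)n < sizeof s->tag;
}
```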
You mention having to free allocated memory, and then say: "In order to avoid this problem". Well, it's not so much a problem, but rather a necessity of the C language. It's common to have functions that allocate memory, and require the caller to free it again.
The idiomatic way is to have a pair of "create" and "destroy" functions. So I'd suggest doing it like this:
// Your factory function
semver* create_semver() {
semver* instance = malloc(sizeof(*instance));
init_semver(instance); // will also allocate instance->tag and ->note
return instance;
}
// Your destruction function
void free_semver(semver* s) {
free(s->tag);
free(s->note);
free(s);
}
I have a function in C that dynamically allocates a buffer, which is passed to another function to store its return value. Something like the following dummy example:
void other_function(float in, float *out, int out_len) {
/* Fills 'out' with 'out_len' values calculated from 'in' */
}
void function(float *data, int data_len, float *out) {
float *buf;
int buf_len = 2 * data_len, i;
buf = malloc(sizeof(float) * buf_len);
for (i = 0; i < data_len; i++, data++, out++) {
other_function(*data, buf, buf_len);
/* Do some other stuff with the contents of buf and write to *out */
}
free(buf);
}
function is called by an iterator over a multi-dimensional array (it's a NumPy gufunc kernel, to be precise), so it gets called millions of times with the same value for data_len. It seems wasteful to be creating and destroying the buffer over and over again. I would normally move allocation of the buffer to the function that calls function, and pass a pointer to it, but I don't directly control that, so that is not possible. Instead, I am considering doing the following:
void function(float *data, int data_len, float *out) {
static float *buf = NULL;
static int buf_len = 0;
int i;
if (buf_len != 2 * data_len) {
buf_len = 2 * data_len;
buf = realloc(buf, sizeof(float) * buf_len); /* same as malloc if buf == NULL */
}
for (i = 0; i < data_len; i++, data++, out++) {
other_function(*data, buf, buf_len);
/* Do some other stuff with the contents of buf and write to *out */
}
}
That means that I never directly free the memory I allocate: it gets reused in subsequent calls, and then lingers there until my program exits. It doesn't seem like the right thing to do, but not too bad either, as the amount of memory allocated is always going to be small. Am I over worrying? Is there a better approach to this?
This approach is legitimate (but see below), although tools like valgrind will incorrectly flag it as a "leak". (It's not a leak, as a leak is an unbounded increase in memory usage.) You might want to benchmark exactly how much time is lost on malloc and free compared to other things the function is doing.
If you can use C99 or gcc, and if your buffer is not overly large, you should also consider variable-length arrays, which are as fast (or faster than) a static buffer, and create no fragmentation. If you're on another compiler, you can look into the non-standard (but widely supported) alloca extension.
You do need to be aware that using a static buffer makes your function:
Thread-unsafe - if it is called from multiple threads simultaneously, it will destroy the data of the other instance. If the function is called from Python through NumPy, this is probably not a problem, as threads will be effectively serialized by the GIL.
Non-reentrant - if other_function calls some Python code which ends up calling function - for whatever reason - before function finishes, your function will again destroy its own data.
If you don't need true parallel execution and reentrancy, this use of static variables is fine, and a lot of C code uses them that way.
This is a fine approach, and something like this is likely used internally by many libraries. The memory will be freed automatically when the program exits.
You might want to round buf_len up to a multiple of some block size, so you don't realloc() every time data_len changes a small bit. But if data_len is almost always the same size, this isn't necessary.
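The rounding suggestion could be sketched like this. BUF_BLOCK, round_up_block and scratch_floats are names invented for the example, and the buffer here only ever grows, which is one possible policy rather than the only one. Like the static-buffer version in the question, it is neither thread-safe nor reentrant.

```c
#include <stdlib.h>

#define BUF_BLOCK 64   /* assumed granularity, in floats */

/* Round n up to the next multiple of BUF_BLOCK. */
static size_t round_up_block(size_t n)
{
    return (n + BUF_BLOCK - 1) / BUF_BLOCK * BUF_BLOCK;
}

/* Return a scratch buffer with room for at least n floats, or NULL if
   growing failed. The buffer is reused across calls and never shrinks. */
float *scratch_floats(size_t n)
{
    static float *buf = NULL;
    static size_t cap = 0;           /* current capacity in floats */
    if (n > cap) {
        size_t new_cap = round_up_block(n);
        float *p = realloc(buf, new_cap * sizeof *p);
        if (p == NULL)
            return NULL;             /* the old buffer is still valid */
        buf = p;
        cap = new_cap;
    }
    return buf;
}
```

Assigning the result of realloc to a temporary first avoids losing the old pointer when the call fails, which the direct buf = realloc(buf, ...) form would do.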