How to schedule collection cycles for custom mark-sweep collector? - c

I've written a simple garbage collector for a Postscript virtual machine, and I'm having difficulty designing a decent set of rules for when to do a collection (when the free list is too short?) and when to allocate new space (when there's a lot of space to use?).
I've written bottom-up so far, but this question involves top-level design. So I feel I'm on shaky ground.
All objects are managed and access is only through operator functions, so this is a collector in C, not for C.
The primary allocator function is called gballoc:
unsigned gballoc(mfile *mem, unsigned sz) {
unsigned z = adrent(mem, FREE);
unsigned e;
memcpy(&e, mem->base+z, sizeof(e));
while (e) {
if (szent(mem,e) >= sz) {
memcpy(mem->base+z, mem->base+adrent(mem,e), sizeof(unsigned));
return e;
}
z = adrent(mem,e);
memcpy(&e, mem->base+z, sizeof(e));
}
return mtalloc(mem, 0, sz);
}
I'm sure it's gibberish without knowing what all the types and functions mean, so here's pseudocode of the same function:
gballoc
load free list head into ptr
while ptr is not NULL
if free element size is large enough
return element, removed from list
next ptr
fallback to allocating new space
So it's a simple "first-fit" algorithm with no carving (but allocations retain their size; so a large space reused for a small object can be reused for a large object again, later).
But when should I call collect()?
Edit:
The rest of the code and related modules have been posted in comp.lang.postscript, in the thread:
http://groups.google.com/group/comp.lang.postscript/browse_thread/thread/56c1734709ee33f1#

There are several applicable philosophies:
Do garbage collection as last-ditch avoidance of expanding the heap during an allocation. This is probably the most common strategy.
Do garbage collection periodically, like every hundredth allocation or deallocation. In some situations, this might decrease the overall effort of garbage collection by not letting fragmentation get out of hand.
Don't do any garbage collection. Always a possible strategy, especially for short-lived or simple programs.
As a developer of garbage collection, it might be desirable to give the choice of strategy to the application since it might know which will be most effective. Of course, if it doesn't have a preference, you should choose a default.
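To make the first strategy concrete, here is a minimal sketch of "collect only as a last resort before growing the heap". It does not use the question's mfile API; toy_heap, toy_collect, toy_find_free and toy_alloc are hypothetical names, slots are fixed-size, and the reachable flags stand in for a real mark phase.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy heap: fixed-size slots, one "reachable" flag per slot. */
enum { TOY_MAX = 64 };

typedef struct {
    unsigned char in_use[TOY_MAX];    /* slot is currently allocated */
    unsigned char reachable[TOY_MAX]; /* set by the program; stands in for marking */
    size_t capacity;                  /* number of slots the heap spans right now */
    unsigned collections;             /* how many times toy_collect has run */
} toy_heap;

/* Sweep: release every allocated slot that is not reachable. */
static size_t toy_collect(toy_heap *h) {
    size_t freed = 0;
    for (size_t i = 0; i < h->capacity; i++) {
        if (h->in_use[i] && !h->reachable[i]) {
            h->in_use[i] = 0;
            freed++;
        }
    }
    h->collections++;
    return freed;
}

/* First free slot inside the current heap, or -1 if there is none. */
static long toy_find_free(const toy_heap *h) {
    for (size_t i = 0; i < h->capacity; i++)
        if (!h->in_use[i]) return (long)i;
    return -1;
}

/* Last-ditch policy: reuse a free slot, else collect, else grow the heap. */
static long toy_alloc(toy_heap *h) {
    assert(h->capacity <= TOY_MAX);
    long slot = toy_find_free(h);
    if (slot < 0 && toy_collect(h) > 0)
        slot = toy_find_free(h);          /* the collection freed something */
    if (slot < 0) {
        if (h->capacity == TOY_MAX) return -1; /* cannot grow any further */
        slot = (long)h->capacity++;       /* expand only when collection did not help */
    }
    h->in_use[slot] = 1;
    h->reachable[slot] = 0;
    return slot;
}
```

A periodic policy would differ only in where toy_collect is called: on a counter instead of on a failed search.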

Here is the periodic collection strategy incorporated into the original code:
enum { PERIOD = 10 };
unsigned gballoc(mfile *mem, unsigned sz) {
unsigned z;
unsigned e;
static unsigned period = PERIOD;
try_again:
/* reload the free-list head so the search restarts after a collection */
z = adrent(mem, FREE);
memcpy(&e, mem->base+z, sizeof(e));
while (e) {
if (szent(mem,e) >= sz) {
memcpy(mem->base+z, mem->base+adrent(mem,e), sizeof(unsigned));
return e;
}
z = adrent(mem,e);
memcpy(&e, mem->base+z, sizeof(e));
}
if (--period == 0) {
period = PERIOD;
collect(mem, 0);
goto try_again;
}
return mtalloc(mem, 0, sz);
}

Related

Trick to avoid needing to initialize an array

Normally if I want to allocate a zero initialized array I would do something like this:
int size = 1000;
int* i = (int*)calloc(size, sizeof(int));
And later my code can do this to check if an element in the array has been initialized:
if(!i[10]) {
// i[10] has not been initialized
}
However, in this case I don't want to pay the upfront cost of zero-initializing the array because the array may be quite large (i.e. gigabytes). I can, however, afford to use as much memory as I want.
I think I remember that there is a technique to keep track of the elements in the array that have been initialized, without paying any upfront cost, that also allows O(1) cost per access (not amortized, as with a hash table). My recollection is that the technique requires an extra array of the same size.
I think it was something like this:
int size = 1000;
int* i = (int*)malloc(size*sizeof(int));
int** i_markers = (int**)malloc(size*sizeof(int*));
If an entry in the array is used it is recorded like this:
i_markers[10] = &i[10];
And then its use can be checked later like this:
if(i_markers[10] != &i[10]) {
// i[10] has not been initialized
}
Of course this isn't quite right because i_markers[10] could have been randomly set to &i[10].
Can anyone out there remind me of the technique?
Thank you!
I think I remembered it.
Is this right? Is there a better way or are there variations on this?
Thanks again.
(This was updated to be the right answer)
struct lazy_array {
    int size;
    int* values;
    int* used;            /* used[i]: position of i in back_references, or garbage */
    int* back_references; /* back_references[k]: the k-th index that was stored */
    int num_used;
};
struct lazy_array* create_lazy_array(int size) {
    struct lazy_array* lazy = (struct lazy_array*)malloc(sizeof(struct lazy_array));
    lazy->size = size;
    lazy->values = (int*)malloc(size*sizeof(int));
    lazy->used = (int*)malloc(size*sizeof(int));
    lazy->back_references = (int*)malloc(size*sizeof(int));
    lazy->num_used = 0;
    return lazy;
}
int is_index_used(struct lazy_array* lazy, int index) {
    int slot = lazy->used[index]; /* may be garbage if index was never stored */
    return slot >= 0 && slot < lazy->num_used &&
        lazy->back_references[slot] == index;
}
void use_index(struct lazy_array* lazy, int index, int value) {
    lazy->values[index] = value;
    if(is_index_used(lazy, index))
        return;
    lazy->used[index] = lazy->num_used;
    lazy->back_references[lazy->num_used] = index;
    ++lazy->num_used;
}
On most compilers/standard libraries I know of, large calloc requests (and malloc for that matter) are implemented in terms of the OS's bulk memory request logic. On Linux, that means a copy-on-write mmap-ing of the zero page, and on Windows it means VirtualAlloc. In both cases, the OS gives you memory that is already zero, and calloc recognizes this; it only explicitly zeroes the memory if it was doing a small calloc from the small allocation heap. So until you write to any given page in the allocation, it's zero "for free". No need to be explicitly lazy; the allocator is being lazy for you.
For small allocations it does need to memset to clear the memory, but it's fairly cheap to memset a few thousand (or even tens of thousands of) bytes. For the really large allocations where zeroing would be costly, you're getting OS provided memory that's zero-ed for free (separate from the rest of the heap); e.g. for dlmalloc in typical configuration, allocations beyond 256 KB will always be freshly mmap-ed and munmap-ed, which means you're getting freshly mapped copy-on-write mappings of the zero page (the cost to zero them being deferred until you perform a write somewhere in the page, and paid whether you got the 256 KB via malloc or calloc).
If you want better guarantees about zeroing, or to get free zeroing on smaller allocations (though it's more wasteful the closer to one page you get), you can just explicitly do what malloc/calloc do implicitly and use the OS provided zero-ed memory, e.g. replace:
sometype *x = calloc(num, sizeof(*x)); // Or the similar malloc(num * sizeof(*x));
if (!x) { ... do error handling stuff ... }
...
free(x);
with either:
sometype *x = mmap(NULL, num * sizeof(*x), PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (x == MAP_FAILED) { ... do error handling stuff ... }
...
munmap(x, num * sizeof(*x));
or on Windows:
sometype *x = VirtualAlloc(NULL, num * sizeof(*x), MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
if (!x) { ... do error handling stuff ... }
...
VirtualFree(x, 0, MEM_RELEASE); // VirtualFree with MEM_RELEASE only takes size of 0
It gets you the same lazy initialization (though on Windows, this may mean that the pages have simply been lazily zero-ed in the background between requests, so they'd be "real" zeroes when you got them, vs. *NIX where they'd be CoW-ed from the zero page, so they get zero-ed live when you write to them).
This can be done, although it relies on undefined behavior. It is called a lazy array.
The trick is to use a reverse lookup table. Every time you store a value, you store its index in the lazy array:
void store(int value)
{
if (is_stored(value)) return;
lazy_array[value] = next_index;
table[next_index] = value;
++next_index;
}
int is_stored(int value)
{
if (lazy_array[value]<0) return 0;
if (lazy_array[value]>=next_index) return 0;
if (table[lazy_array[value]]!=value) return 0;
return 1;
}
The idea is that if the value has not been stored in the lazy array, then the lazy_array[value] will be garbage. Its value will either be an invalid index or a valid index into your reverse lookup table. If it is an invalid index, then you immediately know nothing has been stored there. If it is a valid index, then you check your table. If you have a match then the value was stored, otherwise it wasn't.
The downside is that reading from uninitialized memory is undefined behavior. Based on my experience, it will probably work, but there are no guarantees.
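As a self-contained sketch of the same reverse-lookup idea, the version below keeps the two arrays in a small struct and lets the caller supply the buffers. The names sparse_set, sparse_set_contains and so on are hypothetical and not from the answer. Passing buffers that already hold arbitrary (but initialized) values shows that correctness does not depend on their contents; with truly uninitialized memory the caveat above still applies.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t *sparse;   /* sparse[i]: claimed position of i in dense; may be junk */
    size_t *dense;    /* dense[k]: the k-th index that was inserted */
    size_t count;     /* number of valid entries in dense */
    size_t capacity;  /* both buffers hold this many entries */
} sparse_set;

/* Attach caller-provided buffers; nothing is cleared, so this is O(1). */
static void sparse_set_init(sparse_set *s, size_t *sparse_buf,
                            size_t *dense_buf, size_t capacity) {
    s->sparse = sparse_buf;
    s->dense = dense_buf;
    s->count = 0;
    s->capacity = capacity;
}

/* i is a member only if sparse[i] points into the valid part of dense
   and that dense entry points back at i. */
static int sparse_set_contains(const sparse_set *s, size_t i) {
    assert(i < s->capacity);
    size_t k = s->sparse[i];
    return k < s->count && s->dense[k] == i;
}

static void sparse_set_insert(sparse_set *s, size_t i) {
    if (sparse_set_contains(s, i)) return;
    s->sparse[i] = s->count;
    s->dense[s->count] = i;
    s->count++;
}

/* Forgetting every member is also O(1). */
static void sparse_set_clear(sparse_set *s) {
    s->count = 0;
}
```

Using an unsigned type for the positions removes the need for the separate "less than zero" test.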
There are many possible techniques; everything depends on your task. For instance, you can remember the highest initialized index, max, of your array. That is, if your algorithm can guarantee that all elements from 0 to max are initialized, you can use a simple check such as if (0 <= i && i <= max).
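A minimal sketch of that high-water-mark idea, assuming elements are only ever initialized in order. The names prefix_array, prefix_is_set and prefix_append are made up for illustration, and the count of initialized elements is stored instead of the highest index so that an empty array needs no special case.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int *data;          /* caller-provided storage, never cleared */
    size_t capacity;    /* number of elements data can hold */
    size_t initialized; /* elements [0, initialized) hold valid values */
} prefix_array;

/* O(1) check: an element is valid only if it lies below the mark. */
static int prefix_is_set(const prefix_array *a, size_t i) {
    return i < a->initialized;
}

/* Initialize the next element in order; returns 0 when the array is full. */
static int prefix_append(prefix_array *a, int value) {
    assert(a->initialized <= a->capacity);
    if (a->initialized == a->capacity) return 0;
    a->data[a->initialized++] = value;
    return 1;
}
```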
But if your algorithm needs to initialize arbitrary elements (i.e. random access), you need a general solution, for instance a more suitable data structure (not a plain array, but a sparse array or something similar).
So please add more details about your task; I expect we can then find the best solution for it.

Use program stack in a DFS implementation

I have a standard DFS implementation in my code that uses a dynamically allocated stack on each call.
I call that function a lot. Often on just small runs (200-1000) nodes, but on occasion there is a large connected component with a million nodes or more.
A profiler shows that a significant amount of computing time is wasted on allocating the stack. I want to try to reuse existing memory (e.g. the call stack). However the function has to remain thread-safe.
Is there an efficient way to use the call stack dynamically without making the function recursive?
My best idea so far was to make the function recursive with an extra argument that doubles the automatic stack size on each subsequent invocation.
Pseudo C:
void dfs(size_t stack_length, void * graph, graphnode_t start_node) {
graphnode_t stack[stack_length];
size_t stack_size = 0;
for (all nodes) {
// do something useful
if (stack_size < stack_length) {
stack[stack_size++] = new_node;
} else {
dfs(stack_length * 2, graph, new_node);
}
}
}
It sounds like you're describing that your algorithm would work fine with just a single graphnode_t array for the system (though you're calling it a stack, I don't think that really applies here), and the only real problem is you're not certain how large it should be when you begin.
If that is the case, I would suggest first that you do not make this (potentially huge) array a local variable, because that will cause problems with your actual program stack. Instead, keep a pointer (one per thread, so the function stays thread-safe) to dynamically sized memory which you expand when needed.
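/* Grow the array when it is full; returns 0 on allocation failure. */
int ensure_size(graphnode_t **not_a_stack_ptr, unsigned long *length_ptr, unsigned long used)
{
    if (!*not_a_stack_ptr)
    {
        *not_a_stack_ptr = malloc(sizeof(graphnode_t) * MINIMUM_ENTRY_COUNT);
        if (!*not_a_stack_ptr) return 0;
        *length_ptr = MINIMUM_ENTRY_COUNT;
    }
    else if (used == *length_ptr) /* full: double the capacity */
    {
        graphnode_t *grown = realloc(*not_a_stack_ptr, sizeof(graphnode_t) * (*length_ptr) * 2);
        if (!grown) return 0; /* the old block is still valid */
        *not_a_stack_ptr = grown;
        *length_ptr *= 2;
    }
    return 1;
}
struct thread_arguments {
    void * graph;
    graphnode_t start_node;
};
void *dfs_thread(void *void_thread_args)
{
    struct thread_arguments *thread_args = void_thread_args;
    graphnode_t *not_a_stack = NULL;
    unsigned long not_a_stack_length = 0;
    unsigned long not_a_stack_used = 0;
    for (all nodes) /* pseudocode, as in the question */
    {
        if (!ensure_size(&not_a_stack, &not_a_stack_length, not_a_stack_used)) break;
        not_a_stack[not_a_stack_used++] = new_node;
    }
    free(not_a_stack); /* free(NULL) is a no-op */
    return NULL;
}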
Note: your pseudo-code suggests that the maximum size could be determined based on the number of nodes you have. You would get the most performance gain by using this to perform just a single full-sized malloc up front.
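To illustrate the single up-front allocation, here is a rough sketch of an iterative DFS whose work array is sized by the node count and supplied by the caller, so it can be allocated once per thread and reused across calls. The graph layout (graph_t with offset/adj arrays) and the function name dfs_iterative are assumptions for the example, not the question's types. Because a node is marked when it is pushed, each node is pushed at most once and the array can never overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compact graph: the neighbours of node v are
   adj[offset[v]] .. adj[offset[v + 1] - 1]. */
typedef struct {
    size_t n;              /* number of nodes */
    const size_t *offset;  /* n + 1 entries */
    const size_t *adj;     /* concatenated neighbour lists */
} graph_t;

/* Visits every node reachable from start and returns how many were visited.
   stack and visited are caller-provided and hold n entries each; visited
   must be all zero on entry. No allocation happens inside the function. */
static size_t dfs_iterative(const graph_t *g, size_t start,
                            size_t *stack, unsigned char *visited) {
    assert(start < g->n);
    size_t top = 0, count = 0;
    visited[start] = 1;
    stack[top++] = start;
    while (top > 0) {
        size_t v = stack[--top];
        count++;                          /* "do something useful" goes here */
        for (size_t e = g->offset[v]; e < g->offset[v + 1]; e++) {
            size_t w = g->adj[e];
            if (!visited[w]) {            /* mark on push: at most n pushes total */
                visited[w] = 1;
                stack[top++] = w;
            }
        }
    }
    return count;
}
```

Each thread would own its own stack and visited buffers, which keeps the function thread-safe without any locking.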

Structure initialization performance

I am trying to improve the performance of my program (running on an ARC platform, compiled with arc-gcc; having said that, I am NOT expecting a platform-specific answer).
I want to know which of the following methods is more optimal and why.
typedef struct _MY_STRUCT
{
int my_height;
int my_weight;
char my_data_buffer[1024];
}MY_STRUCT;
int some_function(MY_STRUCT *px_my_struct)
{
/*Many operations with the structure members done here*/
return 0;
}
void poorly_performing_function_method_1()
{
while(1)
{
MY_STRUCT x_struct_instance = {0}; /*x_struct_instance is automatic variable under WHILE LOOP SCOPE*/
x_struct_instance.my_height = rand();
x_struct_instance.my_weight = rand();
if(x_struct_instance.my_weight > 100)
{
memcpy(&(x_struct_instance.my_data_buffer),"this is just an example string, there could be some binary data here.",sizeof(x_struct_instance.my_data_buffer));
}
some_function(&x_struct_instance);
/******************************************************/
/* No need for memset as it is initialized before use.*/
/* memset(&x_struct_instance,0,sizeof(x_struct_instance));*/
/******************************************************/
}
}
void poorly_performing_function_method_2()
{
MY_STRUCT x_struct_instance = {0}; /*x_struct_instance is automatic variable under FUNCTION SCOPE*/
while(1)
{
x_struct_instance.my_height = rand();
x_struct_instance.my_weight = rand();
if(x_struct_instance.my_weight > 100)
{
memcpy(&(x_struct_instance.my_data_buffer),"this is just an example string, there could be some binary data here.",sizeof(x_struct_instance.my_data_buffer));
}
some_function(&x_struct_instance);
memset(&x_struct_instance,0,sizeof(x_struct_instance));
}
}
In the above code, will poorly_performing_function_method_1() perform better or will poorly_performing_function_method_2() perform better? Why?
A few things to think about:
In method #1, can deallocation, reallocation of structure memory add more overhead?
In method #1, during initialization, is there any optimization happening? Like calloc (Optimistic memory allocation and allocating memory in zero filled pages)?
I want to clarify that my question is more about WHICH method is more optimal and less about HOW to make this code more optimal. This code is just an example.
As for making the above code more optimal, @Skizz has given the right answer.
Generally, not doing something is going to be faster than doing something.
In your code, you're clearing a structure, and then initialising it with data. You're doing two memory writes, the second is just overwriting the first.
Try this:-
void function_to_try()
{
MY_STRUCT x_struct_instance;
while(1)
{
x_struct_instance.my_height = rand();
x_struct_instance.my_weight = rand();
x_struct_instance.my_data_buffer[0]='\0';
if(x_struct_instance.my_weight > 100)
{
strlcpy(x_struct_instance.my_data_buffer,"Fatty",sizeof(x_struct_instance.my_data_buffer)); /* strlcpy is a BSD extension; snprintf is a portable alternative */
}
some_function(&x_struct_instance);
}
}
Update
To answer the question, which is more optimal, I would suggest method #1, but it is probably marginal and dependent on the compiler and other factors. My reasoning is that there isn't any allocation / deallocation going on, the data is on the stack and the function preamble created by the compiler will allocate a big enough stack frame for the function such that it doesn't need to resize it. In any case, allocating on the stack is just moving the stack pointer so it's not a big overhead.
Also, memset is a general purpose method for setting memory and might have extra logic in it that copes with edge conditions such as unaligned memory. The compiler can implement an initialiser more intelligently than a general purpose algorithm (at least, one would hope so).

What's the best c implementation of the C++ vector?

I've been looking into using C over C++ as I find it cleaner and the main thing I find it to lack is a vector like array.
What is the best implementation of this?
I want to just be able to call something like vector_create, vector_at, vector_add, etc.
EDIT
This answer is from a million years ago, but at some point, I actually implemented a macro-based, efficient, type-safe vector work-alike in C that covers all the typical features and needs. You can find it here:
https://github.com/eteran/c-vector
Original answer below.
What about a vector are you looking to replicate? I mean in the end, it all boils down to something like this:
int *create_vector(size_t n) {
return malloc(n * sizeof(int));
}
void delete_vector(int *v) {
free(v);
}
int *resize_vector(int *v, size_t n) {
return realloc(v, n * sizeof(int));
/* returns NULL on failure here */
}
You could wrap this all up in a struct, so it "knows its size" too, but you'd have to do it for every type (macros here?), and that seems a little unnecessary... Perhaps something like this:
typedef struct {
size_t size;
int *data;
} int_vector;
int_vector *create_vector(size_t n) {
int_vector *p = malloc(sizeof(int_vector));
if(p) {
p->data = malloc(n * sizeof(int));
p->size = n;
}
return p;
}
void delete_vector(int_vector *v) {
if(v) {
free(v->data);
free(v);
}
}
size_t resize_vector(int_vector *v, size_t n) {
if(v) {
int *p = realloc(v->data, n * sizeof(int));
if(p) {
v->data = p;
v->size = n;
}
return v->size;
}
return 0;
}
int get_vector(int_vector *v, size_t n) {
if(v && n < v->size) {
return v->data[n];
}
/* return some error value, i'm doing -1 here,
* std::vector would throw an exception if using at()
* or have UB if using [] */
return -1;
}
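void set_vector(int_vector *v, size_t n, int x) {
if(v) {
if(n >= v->size) {
/* index n is only valid once the vector holds n + 1 elements */
if(resize_vector(v, n + 1) <= n) {
return; /* resize failed; leave the vector unchanged */
}
}
v->data[n] = x;
}
}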
After which, you could do:
int_vector *v = create_vector(10);
set_vector(v, 0, 123);
I dunno, it just doesn't seem worth the effort.
The most complete effort I know of to create a comprehensive set of utility types in C is GLib. For your specific needs it provides g_array_new, g_array_append_val and so on. See GLib Array Documentation.
Rather than going off on a tangent in the comments to @EvanTeran's answer I figured I'd submit a longer reply here.
As various comments allude to there's really not much point in trying to replicate the exact behavior of std::vector since C lacks templates and RAII.
What can however be useful is a dynamic array implementation that just works with bytes. This can obviously be used directly for char* strings, but can also easily be adapted for usage with any other types as long as you're careful to multiply the size parameter by sizeof(the_type).
Apache Portable Runtime has a decent set of array functions and is all C.
See the tutorial for a quick intro.
If you can multiply, there's really no need for a vector_create() function when you have malloc() or even calloc(). You just have to keep track of two values, the pointer and the allocated size, and send two values instead of one to whatever function you pass the "vector" to (if the function actually needs both the pointer and the size, that is). malloc() guarantees that the memory chunk is addressable as any type, so assign its void * return value to e.g. a struct car * and index it with []. Most processors access array[index] almost as fast as a plain variable, while a vector_at() function can be many times slower. If you store the pointer and size together in a struct, only do it in non-time-critical code, or you'll have to index with vector.ptr[index]. Delete the space with free().
Focus on writing a good wrapper around realloc() instead, that only reallocates on every power of e.g. 2 or 1.5. See user786653's Wikipedia link.
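A rough sketch of such a realloc() wrapper, growing the capacity by powers of two. The names int_vec, int_vec_reserve, int_vec_push and int_vec_free are made up for the example, and the starting capacity of 8 is an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int *data;
    size_t size;      /* elements in use */
    size_t capacity;  /* elements allocated */
} int_vec;

/* Make room for at least min_capacity elements, doubling the capacity
   so that n pushes cause only O(log n) calls to realloc().
   Returns 1 on success, 0 on failure (the vector is left untouched). */
static int int_vec_reserve(int_vec *v, size_t min_capacity) {
    if (min_capacity <= v->capacity) return 1;
    size_t cap = v->capacity ? v->capacity : 8;
    while (cap < min_capacity) {
        if (cap > SIZE_MAX / 2) return 0;            /* doubling would overflow */
        cap *= 2;
    }
    if (cap > SIZE_MAX / sizeof *v->data) return 0;  /* byte count would overflow */
    int *p = realloc(v->data, cap * sizeof *p);
    if (!p) return 0;                                /* old block is still valid */
    v->data = p;
    v->capacity = cap;
    return 1;
}

/* Append one element; returns 0 if memory could not be obtained. */
static int int_vec_push(int_vec *v, int x) {
    assert(v->size <= v->capacity);
    if (!int_vec_reserve(v, v->size + 1)) return 0;
    v->data[v->size++] = x;
    return 1;
}

static void int_vec_free(int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->size = v->capacity = 0;
}
```

A zero-initialized int_vec is a valid empty vector, since realloc(NULL, n) behaves like malloc(n).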
Of course, calloc(), malloc() and realloc() can fail if you run out of memory, and that's another possible reason for wanting a vector type. C++ has exceptions that automatically terminate the program if you don't catch them; C doesn't. But that's another discussion.
Lack of template functionality in C makes it impossible to support a vector like structure. The best you can do is to define a 'generic' structure with some help of the preprocessor, and then 'instantiate' for each of the types you want to support.
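The preprocessor approach could look roughly like this. DEFINE_VECTOR and the generated names are hypothetical; the macro stamps out one struct and a pair of functions per element type.

```c
#include <assert.h>
#include <stdlib.h>

/* "Instantiate" a vector of T under the type name NAME. */
#define DEFINE_VECTOR(T, NAME)                                        \
    typedef struct { T *data; size_t size, capacity; } NAME;          \
    static int NAME##_push(NAME *v, T x) {                            \
        assert(v != NULL);                                            \
        if (v->size == v->capacity) {                                 \
            size_t cap = v->capacity ? v->capacity * 2 : 4;           \
            T *p = realloc(v->data, cap * sizeof *p);                 \
            if (!p) return 0;        /* old storage is still valid */ \
            v->data = p;                                              \
            v->capacity = cap;                                        \
        }                                                             \
        v->data[v->size++] = x;                                       \
        return 1;                                                     \
    }                                                                 \
    static void NAME##_free(NAME *v) {                                \
        free(v->data);                                                \
        v->data = NULL;                                               \
        v->size = v->capacity = 0;                                    \
    }

/* One instantiation per element type that is needed. */
DEFINE_VECTOR(double, double_vec)
DEFINE_VECTOR(char, char_vec)
```

Each instantiation is type-checked by the compiler, at the cost of duplicated code and harder-to-read error messages.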

malloc code in C

I have a code block that seems to be the code behind malloc. But as I go through the code, I get the feeling that parts of the code are missing. Does anyone know if there is a part of the function that's missing? Does malloc always combine adjacent chunks together?
int heap[10000];
void* malloc(int size) {
int sz = (size + 3) / 4;
int chunk = 0;
if(heap[chunk] > sz) {
int my_size = heap[chunk];
if (my_size < 0) {
my_size = -my_size
}
chunk = chunk + my_size + 2;
if (chunk == heap_size) {
return 0;
}
}
The code behind malloc is certainly much more complex than that. There are several strategies. One popular code is the dlmalloc library. A simpler one is described in K&R.
The code is obviously incomplete (not all paths return a value). But in any case this is not a "real" malloc. This is probably an attempt to implement a highly simplified "model" of 'malloc'. The approach chosen by the author of the code can't really lead to a useful practical implementation.
(And BTW, standard 'malloc's parameter has type 'size_t', not 'int').
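For comparison, a complete toy allocator in the same spirit as the question's snippet might look like the sketch below. The layout is my own assumption, not a reconstruction of the missing code: one header word per chunk, holding the payload length in words, positive when the chunk is free and negative when it is allocated. The names arena, arena_malloc and arena_free are made up. It merges a freed chunk only with a free chunk that directly follows it, and it only guarantees int alignment, so it is far from a real malloc.

```c
#include <assert.h>
#include <stddef.h>

enum { ARENA_WORDS = 1024 };
static int arena[ARENA_WORDS]; /* each chunk: one header word, then payload */
static int arena_ready;

/* First fit with splitting; returns NULL when no free chunk is big enough. */
static void *arena_malloc(size_t size) {
    if (!arena_ready) {
        arena[0] = ARENA_WORDS - 1;   /* one free chunk spanning the array */
        arena_ready = 1;
    }
    if (size == 0 || size > (ARENA_WORDS - 1) * sizeof(int)) return NULL;
    int need = (int)((size + sizeof(int) - 1) / sizeof(int)); /* round up to words */
    int chunk = 0;
    while (chunk < ARENA_WORDS) {
        int len = arena[chunk];
        if (len >= need) {                        /* free and large enough */
            if (len - need >= 2) {                /* split off the remainder */
                arena[chunk + 1 + need] = len - need - 1;
                len = need;
            }
            arena[chunk] = -len;                  /* negative marks "allocated" */
            return &arena[chunk + 1];             /* pointer to the payload */
        }
        chunk += (len < 0 ? -len : len) + 1;      /* skip header and payload */
    }
    return NULL;
}

/* Mark the chunk free and merge it with a free chunk right after it. */
static void arena_free(void *p) {
    if (!p) return;
    int *header = (int *)p - 1;
    assert(*header < 0);                          /* must be an allocated chunk */
    *header = -*header;
    int next = (int)(header - arena) + *header + 1;
    if (next < ARENA_WORDS && arena[next] > 0)
        *header += arena[next] + 1;               /* absorb the neighbour's header too */
}
```

Merging with the preceding chunk as well would need either a footer word per chunk or a walk from the start of the array.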
Well, one error in that code is that it doesn't return a pointer to the data.
I suspect the best approach to that code is [delete].
When possible, I expect that malloc will try to put different requests close to each other, as it will have a block of memory that is available for allocations until it has to get a new block.
But that also depends on the requirements imposed by the OS and hardware architecture. If you are only allowed to request memory in units of a certain minimum size, then allocations may not end up near each other.
As others mentioned, there are problems with the code snippet.
You can find various open-source projects that have their own malloc function, and it may be best to look at one of those, in order to get an idea what is missing.
malloc is for dynamically allocated memory. And this involves sbrk, mmap, or maybe some other system functions for Windows and/or other architectures. I am not sure what your int heap[10000] is for, as the code is too incomplete.
Effo's version makes a little more sense, but it introduces another black-box function, get_block, so it doesn't help much.
The code seems to be meant for a bare-metal machine; such systems normally have no virtual address mapping and use the physical address space directly.
Here is my understanding, assuming a 32-bit system where sizeof(ptr) is 4 bytes:
extern block_t *block_head; // the real heap, and its address
// is >= 0x80000000, see below "my_size < 0"
extern void *get_block(int index); // get a block from the heap
// (lead by block_head)
int heap[10000]; // just the indicators, not the real heap
void* malloc(int size)
{
int sz = (size + 3) / 4; // make the size aligns with 4 bytes,
// you know, allocated size would be aligned.
int chunk = 0; // the first check point
if(heap[chunk] > sz) { // the value is either a valid free-block size
// which meets my requirement, or an
// address of an allocated block
int my_size = heap[chunk]; // verify size or address
if (my_size < 0) { // it is an address, say a 32-bit value which
// is >0x8000...., not a size.
my_size = -my_size; // the algo, convert it
}
chunk = chunk + my_size + 2; // the algo too, get available
// block index
if (chunk == heap_size) { // no free chunks left
return NULL; // Out of Memory
}
void *block = get_block(chunk);
heap[chunk] = (int)block;
return block;
}
// my blocks is too small initially, none of the blocks
// will meet the requirement
return NULL;
}
EDIT: Could somebody help explain the algorithm, that is, the conversion address -> my_size -> chunk? When reclaiming a block, say in free(void *addr), the same address -> my_size -> chunk conversion would be needed to update heap[chunk] after the block is returned to the heap.
Too small to be a whole malloc implementation.
Take a look at the sources of the C library of Visual Studio 6.0; there you will find the implementation of malloc, if I remember correctly.

Resources