Implementing a mark-sweep garbage collector in C

I have a problem in C where I have to implement a garbage collector. I was given four functions to complete, and I'm not sure how they connect to one another. This is what I have so far:
void mygc() {
//int **max = (int **) 0xbfffffffUL; // the address of the top of the stack
unsigned long stack_bottom;
int **max = (int **) GC_init(); // get the address of the bottom of the stack
int* q;
int **p = &q; // the address of the bottom of the stack
while (p < max) {
//printf("0. p: %u, *p: %u max: %u\n",p,*p,max);
mark(*p);
p++;
}
//utilize sweep and coalesce (coalesce function already written)
}
void mark(int *p) {
int i;
int *ptr;
//code here
}
void sweep(int *ptr) {
// code here
}
int *isPtr(int *p) {
//return the pointer or NULL
int *ptr = start;
//code here
}

If you don't even understand the question perhaps it's best to speak to your teaching staff. To get you started here's the general idea.
mygc is obviously the top level function that does the GC.
mark is called to mark memory location/object as in use. It also needs to mark all memory referenced by that location/object as in use (recursive).
sweep is called to unmark all the previously marked memory and to claim back (garbage collect) those locations that are not marked.
isPtr is called to determine whether a memory location is a pointer (as opposed to being any other data). This is used by mark to know whether a memory location needs to be marked or not.
So putting that all together the general pseudo code is:
mygc()
{
loc_list = get stack extents and global variables
foreach (p in loc_list) {
if (isPtr(p)) {
mark(p)
}
}
foreach (p in heap) {
sweep(p)
}
}
There are obviously lots of details not dealt with in that pseudo code. But it should hopefully be enough to answer your original question, which is how the four functions fit together.
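To make the way the pieces fit together more concrete, here is a minimal sketch of the same mark/sweep cycle on a toy heap. It is not the assignment's allocator: the Object layout, the fixed-size heap array, and the names gc_alloc, is_ptr and collect are all assumptions made for illustration, and the roots are passed in explicitly instead of being found by scanning the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_OBJECTS 8 /* hypothetical fixed-size toy heap */
#define MAX_REFS 2     /* hypothetical number of reference slots per object */

typedef struct Object {
    bool allocated;                /* slot is currently in use */
    bool marked;                   /* set by mark, cleared again by sweep */
    struct Object *refs[MAX_REFS]; /* outgoing references, NULL if unused */
} Object;

static Object heap[HEAP_OBJECTS];

/* Hand out a free slot, or NULL when the toy heap is full. */
Object *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_OBJECTS; i++) {
        if (!heap[i].allocated) {
            heap[i] = (Object){ .allocated = true };
            return &heap[i];
        }
    }
    return NULL;
}

/* Return the object if p is the address of a live heap slot, else NULL. */
Object *is_ptr(const void *p) {
    for (size_t i = 0; i < HEAP_OBJECTS; i++) {
        if (p == (const void *)&heap[i] && heap[i].allocated) {
            return &heap[i];
        }
    }
    return NULL;
}

/* Mark obj and, recursively, everything reachable from it. */
void mark(Object *obj) {
    if (obj == NULL || obj->marked) {
        return; /* not a heap object, or already visited (handles cycles) */
    }
    obj->marked = true;
    for (size_t i = 0; i < MAX_REFS; i++) {
        mark(is_ptr(obj->refs[i]));
    }
}

/* Reclaim every unmarked object, unmark the survivors, return the count freed. */
size_t sweep(void) {
    size_t freed = 0;
    for (size_t i = 0; i < HEAP_OBJECTS; i++) {
        if (!heap[i].allocated) {
            continue;
        }
        if (heap[i].marked) {
            heap[i].marked = false; /* survivor: reset for the next cycle */
        } else {
            heap[i].allocated = false; /* unreachable: reclaim the slot */
            freed++;
        }
    }
    return freed;
}

/* Top level: mark everything reachable from the roots, then sweep the heap. */
size_t collect(Object *const *roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++) {
        mark(is_ptr(roots[i]));
    }
    return sweep();
}
```

In the real assignment the roots would come from walking the stack (and globals) word by word, and is_ptr would have to check whether a word falls inside an allocated heap block rather than compare against slot addresses.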

Related

How to remove duplicates from stack? [closed]

TASK: Let's say there is an integer stack S with M elements. Give the algorithm that will remove all those numbers from stack S that appear two or more times. (write the task using C/C++)
NOTE: We are not allowed to use std::stack to solve this task.
First of all I decided to use C language, and this is stack implementation I use.
int* stack = (int*)malloc(10 * sizeof(int));
int size = 10;
int sp = -1;
bool isempty() {
return (sp == -1);
}
bool isfull() {
return (sp == size - 1);
}
void push(int x) {
if (isfull()) {
printf("Full!");
}
else {
sp++;
stack[sp] = x;
}
}
int pop() {
int x;
if (isempty()) {
printf("Empty!");
}
else {
x = stack[sp];
sp--;
}
return x;
}
void peek() {
if (!isempty()) {
printf("%d", stack[sp]);
}
}
void clear() {
while (!isempty()) {
pop();
}
}
void print() {
if (!isempty()) {
for (int i = 0; i < sp+1; i++) {
printf("%d ", stack[i]);
}
}
printf("\n");
}
My idea for solving this task was to make another temporary stack and copy the main stack into it, then use two for loops to compare all the elements. Inside the loops I use an if statement to check whether two elements are the same; if they are not, I push the element back onto the main stack, which was cleared beforehand. This way I'm supposed to skip all duplicate elements, but for some reason the code does not work properly: it keeps spamming me with the "Full!" message.
void removeDuplicates() {
int* temp = (int*)malloc(10 * sizeof(int));
int temp_size = 10;
int temp_sp = -1;
for (int i = 0; i < sp + 1; i++) {
temp[i] = stack[i];
}
temp_sp = sp;
clear();
for (int i = 0; i < temp_sp+1; i++) {
for (int j = i + 1; j < temp_sp+1; i++) {
if (!(temp[i] == temp[j])) {
push(temp[i]);
}
}
}
}
This is main function that I used to test out functions:
int main() {
push(1);
push(2);
push(3);
push(4);
push(3);
push(5);
removeDuplicates();
print();
return 0;
}
If there is simpler way to solve this by using C++ (not std::stack), let me know.
This code is supposed to work for a normal array, but I'm not sure if it's right for a stack, as we might be using dynamic memory.
Whether your code is correct for stacks is nothing to do with dynamic allocation, and everything to do with the interface of a stack. Do you know what that is? It's absolutely essential to solving your problem, and I don't see any hint that you either know how a stack behaves, or tried to research it.
Here you are, the stack abstract datatype:
preserves last-in first-out order
allows you to push a new element onto the top of the stack
allows you to pop the most recently pushed element (that wasn't already popped) from the top of the stack.
That's everything, and there is no random access (ie, stack[j] will never be a valid expression), so it is obviously impossible for the algorithm you showed to work.
If you don't have a stack implementation already - write one! You're going to need a stack to compile and test your algorithm anyway. The definitions you show describe the storage, but not the interface.
There are only two functions to code (plus the two to create and destroy a stack, and optionally one to query the size).
Now for the algorithm - you can only ever access the top element of a stack, so you need to think about what to do with the elements you pop that aren't duplicates. They have to go somewhere, because you can't see below them while they're on your main stack, and you mustn't lose them.
Your edit shows you do have a stack datatype, sort of: it uses three global variables which you have to take care not to break, and you can't reuse any of the functions for your temporary stack, because they operate on those globals.
Even in C, I'd expect to see something like this (untested, un-compiled sample code based on yours above)
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
struct Stack {
int size;
int sp;
int data[];
};
struct Stack* stack_create(int elements) {
struct Stack *s = malloc(sizeof(*s) + elements * sizeof(int));
s->size = elements;
s->sp = -1;
return s;
}
bool stack_isEmpty(struct Stack *s) { return s->sp == -1; }
bool stack_isFull(struct Stack *s) { return s->sp == s->size - 1; }
void stack_push(struct Stack *s, int x)
{
assert(!stack_isFull(s));
s->data[++s->sp] = x;
}
int stack_pop(struct Stack *s)
{
assert(!stack_isEmpty(s));
return s->data[(s->sp)--];
}
because then you can use the same operations on your main and temporary stacks.
If the removeDuplicates function is supposed to be implemented in terms of the stack abstraction, you need an algorithm you can implement in terms of stack_push, stack_pop etc.
If the removeDuplicates function is supposed to be an internal function operating directly on the stack implementation, rather than being implemented in terms of the stack abstraction - then your basic approach is probably OK (if very far from optimal), and you just need to learn to debug your code.
I still don't know which one of those is true (so I won't vote to re-open yet), but they are completely different questions.
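For the first case, here is one possible sketch that touches the stacks only through push, pop and the empty check. It assumes that "removing duplicates" means keeping one copy of each value (the copy nearest the top), which is only one reading of the task. The remove_duplicates name and the two helper stacks are illustrative choices, and the stack type is repeated so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct Stack {
    int size;   /* capacity */
    int sp;     /* index of the top element, -1 when empty */
    int data[]; /* flexible array member */
};

struct Stack *stack_create(int elements) {
    struct Stack *s = malloc(sizeof(*s) + (size_t)elements * sizeof(int));
    assert(s != NULL);
    s->size = elements;
    s->sp = -1;
    return s;
}

bool stack_isEmpty(const struct Stack *s) { return s->sp == -1; }
bool stack_isFull(const struct Stack *s) { return s->sp == s->size - 1; }

void stack_push(struct Stack *s, int x) {
    assert(!stack_isFull(s));
    s->data[++s->sp] = x;
}

int stack_pop(struct Stack *s) {
    assert(!stack_isEmpty(s));
    return s->data[s->sp--];
}

/* Keep one copy of each value, using only the stack interface. O(n^2) time. */
void remove_duplicates(struct Stack *s) {
    struct Stack *kept = stack_create(s->size);    /* values seen so far */
    struct Stack *scratch = stack_create(s->size); /* temporary holding area */

    while (!stack_isEmpty(s)) {
        int x = stack_pop(s);
        bool seen = false;

        /* Look through "kept" by moving it onto "scratch" and back again. */
        while (!stack_isEmpty(kept)) {
            int y = stack_pop(kept);
            if (y == x) {
                seen = true;
            }
            stack_push(scratch, y);
        }
        while (!stack_isEmpty(scratch)) {
            stack_push(kept, stack_pop(scratch));
        }
        if (!seen) {
            stack_push(kept, x);
        }
    }

    /* "kept" holds the survivors in reverse order; pour them back. */
    while (!stack_isEmpty(kept)) {
        stack_push(s, stack_pop(kept));
    }
    free(kept);
    free(scratch);
}
```

With the values 1 2 3 4 3 5 pushed in that order, the stack ends up holding 1 2 4 3 5 from bottom to top: the lower 3 is dropped and the relative order of the rest is preserved.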
I see a few problems with your current code:
In the loop
for (k = j; k < size; k++)
{
stack[k] = stack[k + 1];
}
you go out of bounds because you use stack[k+1]. How would you fix that?
But then after you have moved all the elements down by 1, the new stack[j] may be another duplicate of stack[i]. How would you fix that? You might consider using a while loop.
You use a global variable size which is the stack size. But there is also a variable sp that is the stack pointer and indicates the part of the stack in use. So instead of looping over size you should loop over sp.
Note what the stack pointer points at: the value -1 means stack empty, so any other value points at the current value at the top of the stack. (This is important because the other interpretation of the stack pointer is that it points at the next free element of the stack.)
This sp of course decreases with every duplicate you remove from the stack.
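Putting those points together, a minimal sketch of the in-place version might look like the following. It assumes the "sp is the index of the top element" convention described above and that one copy of each value should be kept; the function name and the choice to pass the array and sp as parameters (instead of using the globals) are mine.

```c
#include <assert.h>

/* Remove repeated values from data[0..sp], keeping the first occurrence of
 * each. Returns the new stack pointer (index of the top element). */
int remove_duplicates_in_place(int *data, int sp) {
    assert(sp >= -1);
    for (int i = 0; i <= sp; i++) {
        int j = i + 1;
        while (j <= sp) {
            if (data[j] == data[i]) {
                /* Shift the tail down by one; k stops at sp - 1, so
                 * data[k + 1] never reads past the top of the stack. */
                for (int k = j; k < sp; k++) {
                    data[k] = data[k + 1];
                }
                sp--; /* one element fewer; re-check the same j */
            } else {
                j++;
            }
        }
    }
    return sp;
}
```

The while loop only advances j when no element was removed, so a run of several duplicates in a row is handled correctly. For 1 2 3 4 3 5 this leaves 1 2 3 4 5 and returns 4.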

How to save stack and heap

How can I save (and restore) the stack and the heap of a program at a specific point in the program?
Consider a program like this:
int main()
{
int a;
int *b;
b = (int*)malloc(sizeof(int));
a = 1;
*b = 2;
save_stack_and_heap(); // looking for this...
a = 3;
*b = 4;
restore_stack_and_heap(); // ... and this
printf("%d %d\n",a,*b);
return 0;
}
The output should be:
1 2
It boils down to (am I right?): How do I get the pointers to the stack and to the heap and their sizes?
edit
I want to use this for a number of things. One of them is to write code that can handle hardware failure by checkpointing and being able to restart at a checkpointed state.
Let's focus on the stack, as heap allocations can be tracked otherwise (good old malloc preload for instance).
The code should be reusable. There can be any possible number and type of variables on the stack.
The best would be standard C99. Next best Posix conform. Next best Linux conform.
I am usually using GCC but I would prefer not to use built ins...
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main()
{
int a = 1;
int *b = malloc(sizeof(int));
*b = 2;
if (fork() == 0) {
a = 3;
*b = 4;
return 0;
}
int status;
wait(&status);
printf("%d %d\n",a,*b);
return 0;
}
So you haven't given a lot of scope of what you are trying to achieve but I will try and tackle a few perspectives and at least something that can get you started.
It boils down to (am I right?): How do I get the pointers to the stack
and to the heap and their sizes?
The stack is a large thing, and often expandable in size. I'm going to skip the heap bit as you are going to struggle to save all the heaps (that kinda doesn't make any sense). Getting a pointer to the stack is as easy as declaring a variable and taking a reference to it like so.
int a = 5;
void *stack_ptr = &a;
void *another_stack_ptr = &stack_ptr;
// We could go on forever with this....
That is not the base address of the stack, however. If you want to find that, there may be many methods, and even APIs (I think there is one on Windows). You can even just walk in both directions from an address on the stack until you get a page fault. That is likely to mark the beginning and end of the stack. The following might work, no guarantees. You will need to set up an exception handler to handle the page fault so your app doesn't crash.
int variable = 5;
int *stack_start = &variable;
int *stack_end = stack_start;
int *last_good_address = NULL;
// Setup an exception handler
...
// Try accessing addresses lower than the variable address
for(;;)
{
int try_read = stack_start[0];
// The read didn't trigger an exception, so store the address
last_good_address = stack_start;
stack_start--;
}
// Catch exception
... stack_start = last_good_address
// Setup an exception handler
...
// Try accessing addresses higher than the variable address
for(;;)
{
int try_read = stack_end[0];
// The read didn't trigger an exception, so store the address
last_good_address = stack_end;
stack_end++;
}
// Catch exception
... stack_end = last_good_address
So if you have the base and end address of the stack you can now memcpy it into some memory (I'd advise against the stack though!).
If you just want to copy a few variables, because copying the entire stack would be crazy, the conventional method would be to save them prior to a call
int a = 5;
int b = 6;
int c = 7;
// save old values
int a_old = a;
int b_old = b;
int c_old = c;
some_call(&a, &b, &c);
// do whatever with old values
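Applied to the a and b from the question, that idea could be sketched like this. It does not capture the whole stack or heap; it only saves and restores state that is named explicitly. The State and Checkpoint types and the save_state/restore_state functions are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical grouping of the program state we care about. */
typedef struct State {
    int a;  /* lives on the stack in the question */
    int *b; /* points at a heap-allocated int */
} State;

/* A snapshot stores the pointed-to value, not the pointer itself. */
typedef struct Checkpoint {
    int a;
    int b_value;
} Checkpoint;

Checkpoint save_state(const State *s) {
    assert(s != NULL && s->b != NULL);
    Checkpoint c = { s->a, *s->b };
    return c;
}

void restore_state(State *s, const Checkpoint *c) {
    assert(s != NULL && s->b != NULL && c != NULL);
    s->a = c->a;
    *s->b = c->b_value;
}
```

Calling save_state when a is 1 and *b is 2, changing both, and then calling restore_state brings back 1 and 2, which is the output the question asks for. The cost is that every piece of state has to be listed by hand.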
I'll assume that you have written a function that has 10,000 variables on the stack, and you don't want to have to save them all manually. The following should work in this case. It uses _AddressOfReturnAddress (an MSVC intrinsic) to get the highest possible address for the current function's stack frame and allocates some stack memory to get the lowest current address. It then copies everything in between.
Disclaimer: This has not been compiled, and is unlikely to work out of the box, but I believe the theory is sound.
// Get the address of the return address, this is the highest address in the current stack frame.
// If you over-write this you are in trouble
char *end_of_function_stack = _AddressOfReturnAddress();
// Allocate some fresh memory on the stack
char *start_of_function_stack = alloca(16);
// Calculate the difference between our freshly allocated memory and the address of the return address
// Remember to subtract the size of our allocation from this to not include it in the stack size.
ptrdiff_t stack_size = (end_of_function_stack - start_of_function_stack) - 16;
// Calculation should not be negative
assert(stack_size > 0);
// Allocate some memory to save stack variables
void *save_the_stack = malloc(stack_size);
// Copy the variables
memcpy(save_the_stack, &start_of_function_stack[16], stack_size);
That's about all I can offer you with the limited information in your question.
I think you're looking to reuse the variable names a and b in this case? You can declare new variables with the same names in a nested scope!
int main()
{
int a=1;
int *b = (int*)malloc(sizeof(int));
*b=2;
{
int a=3;
int *b = (int*)malloc(sizeof(int));
*b=4;
}//beware, other lang such as C# may persist stack variables after this point
//old a,b should be reachable here
}

C using malloc and duplicating array

I am supposed to follow the following criteria:
Implement function answer4 (pointer parameter and n):
Prepare an array of student_record using malloc() of n items.
Duplicate the student record from the parameter to the array n
times.
Return the array.
And I came with the code below, but it's obviously not correct. What's the correct way to implement this?
student_record *answer4(student_record* p, unsigned int n)
{
int i;
student_record* q = malloc(sizeof(student_record)*n);
for(i = 0; i < n ; i++){
q[i] = p[i];
}
free(q);
return q;
};
p = malloc(sizeof(student_record)*n);
This is problematic: you're overwriting the p input argument, so you can't reference the data you were handed after that line.
Which means that your inner loop reads uninitialized data.
This:
return a;
is problematic too - it would return a pointer to a local variable, and that's not good - that pointer becomes invalid as soon as the function returns.
What you need is something like:
student_record* ret = malloc(...);
for (int i=...) {
// copy p[i] to ret[i]
}
return ret;
1) You reassigned p, the array you were supposed to copy, by calling malloc().
2) You can't return the address of a local stack variable (a). Change a to a pointer, malloc it to the size of p, and copy p into it. Malloc'd memory is heap memory, so you can return such an address.
a[] is a local automatic array. Once you return from the function, it is erased from memory, so the calling function can't use the array you returned.
What you probably wanted to do is to malloc a new array (ie, not p), into which you should assign the duplicates and return its values w/o freeing the malloced memory.
Try to use better names, it might help in avoiding the obvious mix-up errors you have in your code.
For instance, start the function with:
student_record * answer4(const student_record *template, size_t n)
{
...
}
It also makes the code clearer. Note that I added const to make it clearer that the first argument is input-only, and made the type of the second one size_t which is good when dealing with "counts" and sizes of things.
The code in this question is evolving quite quickly but at the time of this answer it contains these two lines:
free(q);
return q;
This is guaranteed to be wrong - after the call to free its argument points to invalid memory and anything could happen subsequently upon using the value of q. i.e. you're returning an invalid pointer. Since you're returning q, don't free it yet! It becomes a "caller-owned" variable and it becomes the caller's responsibility to free it.
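As a rough illustration of that ownership rule, here is a sketch of answer4 that copies the one record n times and hands the array to the caller. The student_record layout is not shown in the question, so the struct below is a made-up placeholder, and returning NULL for n == 0 or a failed allocation is my own choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder layout: the real student_record is not shown in the question. */
typedef struct {
    char name[32];
    int id;
} student_record;

/* Return a malloc'd array holding n copies of *p, or NULL if n is 0 or the
 * allocation fails. The caller owns the result and must free() it. */
student_record *answer4(const student_record *p, size_t n) {
    assert(p != NULL);
    if (n == 0) {
        return NULL;
    }
    student_record *copies = malloc(n * sizeof *copies);
    if (copies == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        copies[i] = *p; /* struct assignment copies the whole record */
    }
    return copies; /* not freed here: ownership passes to the caller */
}
```

The caller then uses the array and releases it when done, for example with free(records) after the last access.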
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
student_record* answer4(student_record* p, unsigned int n)
{
uint8_t *data, *pos;
size_t size = sizeof(student_record);
data = malloc(size*n);
pos = data;
for(unsigned int i = 0; i < n ; i++, pos=&pos[size])
memcpy(pos,p,size);
return (student_record *)data;
};
You could do it like this.
This compiles and, I think, does what you want:
student_record *answer4(const student_record *const p, const unsigned int n)
{
unsigned int i;
student_record *const a = malloc(sizeof(student_record)*n);
for(i = 0; i < n; ++i)
{
a[i] = p[i];
}
return a;
};
Several points:
The existing array is identified as p. You want to copy from it. You probably do not want to free it (to free it is probably the caller's job).
The new array is a. You want to copy to it. The function cannot free it, because the caller will need it. Therefore, the caller must take the responsibility to free it, once the caller has done with it.
The array has n elements, indexed 0 through n-1. The usual way to express the upper bound on the index thus is i < n.
The consts I have added are not required, but well-written code will probably include them.
Although there are already good answers to this question, I couldn't resist adding my own. Since I learned Pascal programming in college, I am used to doing this in C-like programming languages:
void* AnyFunction(int AnyParameter)
{
void* Result = NULL;
DoSomethingWith(Result);
return Result;
}
This helps me debug more easily and avoid bugs like the one mentioned by @ysap, related to pointers.
Something important to remember is that the question asks you to return a SINGLE pointer. This is a common caveat, because a pointer can be used to address a single item or a consecutive array!
This question suggests using an array as A CONCEPT, with pointers, NOT USING ARRAY SYNTAX.
// returns a single pointer to an array:
student_record* answer4(student_record* student, unsigned int n)
{
// empty result variable for this function:
student_record* Result = NULL;
// the result will allocate a conceptual array, even if it is a single pointer:
Result = malloc(sizeof(student_record)*n);
// a copy of the destination result, will move for each item
student_record* dest = Result;
int i;
for(i = 0; i < n ; i++){
// copy contents, not address:
*dest = *student;
// move to next item of "Result"
dest++;
}
// the data referenced by "Result", was changed using "dest"
return Result;
} // student_record* answer4(...)
Note that there is no subscript operator here, because the addressing is done with pointers.
Please don't start a Pascal vs. C flame war; this is just a suggestion.

How to realloc an array inside a function with no lost data? (in C )

I have a dynamic array of structures, so I thought I could store the information about the array in the first structure.
So one attribute will represent the amount of memory allocated for the array and another one representing number of the structures actually stored in the array.
The trouble is that when I put it inside a function that fills it with these structures and tries to allocate more memory if needed, the original array somehow gets corrupted.
Can someone explain why this happens and how to get past it?
Here is my code
#define INIT 3
typedef struct point{
int x;
int y;
int c;
int d;
}Point;
Point empty(){
Point p;
p.x=1;
p.y=10;
p.c=100;
p.d=1000; //if you put different values it will act differently - weird
return p;
}
void printArray(Point * r){
int i;
int total = r[0].y+1;
for(i=0;i<total;i++){
printf("%2d | P [%2d,%2d][%4d,%4d]\n",i,r[i].x,r[i].y,r[i].c,r[i].d);
}
}
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
}
void enter(Point* r,int c){
int i;
for(i=1;i<c;i++){
r[r[0].y+1]=empty();
r[0].y++;
if( (r[0].y+2) >= r[0].x ){ /*when the amount of Points is near
*the end of allocated memory.
reallocate the array*/
reallocFunction(r);
}
}
}
int main(int argc, char** argv) {
Point * r=(Point *) malloc ( sizeof ( Point ) * INIT );
r[0]=empty();
r[0].x=INIT; /*so here I store for how many "Points" is there memory
//in r[0].y theres how many Points there are.*/
enter(r,5);
printArray(r);
return (0);
}
Your code does not look clean to me for other reasons, but...
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
}
The problem here is that r in this function is the parameter, hence any modifications to it are lost when the function returns. You need some way to change the caller's version of r. I suggest:
Point * // Note new return type...
reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
return r; // Note: now we return r back to the caller..
}
Then later:
r = reallocFunction(r);
Now... Another thing to consider is that realloc can fail. A common pattern for realloc that accounts for this is:
Point *reallocFunction(Point * r){
void *new_buffer = realloc(r, r[0].x*2*sizeof(Point));
if (!new_buffer)
{
// realloc failed, pass the error up to the caller..
return NULL;
}
r = new_buffer;
r[0].x*=2;
r[0].y++;
return r;
}
This ensures that you don't leak r when the memory allocation fails, and the caller then has to decide what happens when your function returns NULL...
But, some other things I'd point out about this code (I don't mean to sound like I'm nitpicking about things and trying to tear them apart; this is meant as constructive design feedback):
The names of variables and members don't make it very clear what you're doing.
You've got a lot of magic constants. There's no explanation for what they mean or why they exist.
reallocFunction doesn't seem to really make sense. Perhaps the name and interface can be clearer. When do you need to realloc? Why do you double the X member? Why do you increment Y? Can the caller make these decisions instead? I would make that clearer.
Similarly it's not clear what enter() is supposed to be doing. Maybe the names could be clearer.
It's a good thing to do your allocations and manipulation of member variables in a consistent place, so it's easy to spot (and later, potentially change) how you're supposed to create, destroy and manipulate one of these objects. Here it seems in particular like main() has a lot of knowledge of your structure's internals. That seems bad.
Use of the multiplication operator in parameters to realloc in the way that you do is sometimes a red flag... It's a corner case, but the multiplication can overflow and you can end up shrinking the buffer instead of growing it. This would make you crash and in writing production code it would be important to avoid this for security reasons.
You also do not seem to initialize r[0].y. As far as I understood, you should have a r[0].y=0 somewhere.
Anyway, you using the first element of the array to do something different is definitely a bad idea. It makes your code horribly complex to understand. Just create a new structure, holding the array size, the capacity, and the pointer.
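A minimal sketch of such a structure is below. It keeps the bookkeeping out of the Point elements, grows through a temporary pointer so a failed realloc leaves the array intact, and checks the size multiplication for overflow. The PointArray type, the function names and the initial capacity of 4 are assumptions made for this example, not something taken from the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Point {
    int x, y, c, d;
} Point;

/* Hypothetical container: bookkeeping lives here, not in items[0]. */
typedef struct PointArray {
    Point *items;    /* heap buffer, NULL while empty */
    size_t size;     /* number of Points stored */
    size_t capacity; /* number of Points the buffer can hold */
} PointArray;

/* Append p, doubling the capacity when full. Returns false (leaving the
 * array unchanged) if the new size would overflow or realloc fails. */
bool point_array_push(PointArray *arr, Point p) {
    assert(arr != NULL);
    if (arr->size == arr->capacity) {
        if (arr->capacity > SIZE_MAX / (2 * sizeof(Point))) {
            return false; /* doubling would overflow the byte count */
        }
        size_t new_capacity = arr->capacity ? arr->capacity * 2 : 4;
        Point *grown = realloc(arr->items, new_capacity * sizeof(Point));
        if (grown == NULL) {
            return false; /* arr->items is still valid and still owned */
        }
        arr->items = grown;
        arr->capacity = new_capacity;
    }
    arr->items[arr->size++] = p;
    return true;
}

/* Release the buffer and reset the array to the empty state. */
void point_array_free(PointArray *arr) {
    assert(arr != NULL);
    free(arr->items);
    arr->items = NULL;
    arr->size = 0;
    arr->capacity = 0;
}
```

Because the function receives a pointer to the PointArray, the updated items pointer is visible to the caller after a reallocation, which is exactly what the original reallocFunction could not achieve with a by-value Point * parameter.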

What's the difference between initializing a struct as a pointer or not?

I have the following for my HashTable structure:
typedef char *HashKey;
typedef int HashValue;
typedef struct sHashElement {
HashKey key;
HashValue value;
} HashElement;
typedef struct sHashTable {
HashElement *items;
float loadFactor;
} HashTable;
I never really thought about it until now, but I just realized there are two ways I can use this:
Alternative 1:
void hashInitialize(HashTable *table, int tabSize) {
table->items = malloc(sizeof(HashElement) * tabSize);
if(!table->items) {
perror("malloc");
exit(1);
}
table->items[0].key = "AAA";
table->items[0].value = 45;
table->items[1].key = "BBB";
table->items[1].value = 82;
table->loadFactor = (float)2 / tabSize;
}
int main(void) {
HashTable t1;
int i;
hashInitialize(&t1, HASHSIZE);
for(i = 0; i < HASHSIZE - 1; i++) {
printf("PAIR(%d): %s, %d\n", i+1, t1.items[i].key, t1.items[i].value);
}
printf("LOAD FACTOR: %.2f\n", t1.loadFactor);
return 0;
}
Alternative 2:
void hashInitialize(HashTable **table, int tabSize) {
*table = malloc(sizeof(HashTable));
if(!*table) {
perror("malloc");
exit(1);
}
(*table)->items = malloc(sizeof(HashElement) * tabSize);
if(!(*table)->items) {
perror("malloc");
exit(1);
}
(*table)->items[0].key = "AAA";
(*table)->items[0].value = 45;
(*table)->items[1].key = "BBB";
(*table)->items[1].value = 82;
(*table)->loadFactor = (float)2 / tabSize;
}
int main(void) {
HashTable *t1 = NULL;
int i;
hashInitialize(&t1, HASHSIZE);
for(i = 0; i < HASHSIZE - 1; i++) {
printf("PAIR(%d): %s, %d\n", i+1, t1->items[i].key, t1->items[i].value);
}
printf("LOAD FACTOR: %.2f\n", t1->loadFactor);
return 0;
}
Question 1: They both seem to produce the same result. In main, both examples print the right key/value pairs. So, what exactly is the difference between them besides the syntax change (using (*table) instead of just table), the extra code to allocate memory for the HashTable structure and the declaration of the HashTable pointer?
I've been writing a few data structures lately like stacks, linked lists, binary search trees and now hash tables. And for all of them, I've always used alternative 2. But now I'm wondering whether I could have used alternative 1 and simplified the code, removing most of the * and & that are all over the place.
But I'm asking this question to understand the differences between the two methods and if, and also why, I should use one over the other.
Question 2: As you can see in the structures code, HashKey is a pointer. However, I'm not using strdup nor malloc to allocate space for that string. How and why is this working? Is this OK to do? I've always used malloc or strdup where appropriate when handling dynamic strings or I would get lots of segmentation faults. But this code is not giving me any segmentation faults and I don't understand why and if I should do it like this.
First, both solutions are perfectly valid!
Alternative 1:
Your HashTable is declared in main, which means the struct lives on the call stack. The struct is destroyed when you leave its scope. Note: in your case that can't happen early, because the declaration is in main, so the scope ends when the process exits.
Alternative 2:
You've got a HashTable* (pointer) on the call stack, so you need to allocate the memory for the struct. To do so you use malloc.
In both cases your struct is correctly allocated. The main difference is performance. Allocating on the stack is much cheaper, but you can't do dynamic allocation that way; for that you need malloc.
So sometimes you have to use malloc, but try to avoid calling it a lot if you want a high-performance application.
Is that clear enough? :)
In alternative 1, the caller would allocate table but your function would allocate the contents thereof, which is not always a good idea in terms of memory management. Alternative 2 keeps all allocations in the same place.
As answered previously, the differences between the two alternatives is memory management. In alternative 1 you expect the caller to allocate the memory for table prior to the call; whereas, in alternative 2 just a pointer declaration is required to give you a place to put the memory after you've created it.
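The two ownership models can be written side by side so that each allocation has a matching release. This is only a sketch: hashInit, hashClear, hashCreate and hashDestroy are names invented here, the functions report failure through their return values instead of calling exit, and the sample key/value entries from the question are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef char *HashKey;
typedef int HashValue;
typedef struct sHashElement {
    HashKey key;
    HashValue value;
} HashElement;
typedef struct sHashTable {
    HashElement *items;
    float loadFactor;
} HashTable;

/* Alternative 1: the caller provides the HashTable storage (stack, global,
 * or a field of another struct); only the items array is heap-allocated. */
bool hashInit(HashTable *table, size_t tabSize) {
    assert(table != NULL && tabSize > 0);
    table->items = calloc(tabSize, sizeof(HashElement));
    table->loadFactor = 0.0f;
    return table->items != NULL;
}

void hashClear(HashTable *table) {
    free(table->items);
    table->items = NULL;
}

/* Alternative 2: the function allocates the HashTable itself as well, so the
 * table can outlive the scope that created it. */
HashTable *hashCreate(size_t tabSize) {
    HashTable *table = malloc(sizeof *table);
    if (table == NULL) {
        return NULL;
    }
    if (!hashInit(table, tabSize)) {
        free(table); /* do not leak the struct if the array fails */
        return NULL;
    }
    return table;
}

void hashDestroy(HashTable *table) {
    if (table != NULL) {
        hashClear(table);
        free(table);
    }
}
```

A table set up with hashInit is cleaned up with hashClear and disappears with its enclosing scope, while a table from hashCreate stays alive until hashDestroy is called on it.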
To question 2, the simple answer is that you are assigning a constant to the string. According to the following site the assignment is set up at compile time, not runtime.
http://publications.gbdirect.co.uk/c_book/chapter6/initialization.html
for question 2:
(*table)->items[0].key = "AAA";
actually puts "AAA" in a (typically read-only) part of memory, and char *key points to it; the contents pointed to by key must not be changed.
(*table)->items[0].key[0]='a' is undefined behavior and typically crashes the program.
Here you can find further discussion about it.
What is the difference between char s[] and char *s?
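If a key does need to be modified later, it has to live in writable memory. The sketch below copies a string into a heap buffer, much like POSIX strdup does; copy_string is a hypothetical helper name, and the caller is assumed to free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated, writable copy of s, or NULL if malloc fails.
 * The caller owns the copy and must free() it. */
char *copy_string(const char *s) {
    assert(s != NULL);
    size_t len = strlen(s) + 1; /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}
```

Storing copy_string("AAA") in key makes key[0] = 'a' well defined, at the cost of one allocation per key that must be freed when the table is destroyed. Pointing key directly at the literal stays fine as long as the string is only read.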
The only difference is where the memory comes from -- local variables are typically on the stack whereas mallocs typically come from the heap.
