Say I have the following struct to define list nodes:
struct node {
    int data;
    struct node* next;
};
And I have this function to get the length of a list:
int Length(struct node* head) {
    struct node* current = head;
    int count = 0;
    while (current != NULL) {
        count++;
        current = current->next;
    }
    return count;
}
Why would I want to do this: struct node* current = head; instead of just iterating over the head?
So, why would this not be ok:
int Length(struct node* head) {
    int count = 0;
    while (head != NULL) {
        count++;
        head = head->next;
    }
    return count;
}
Doesn't head become a local copy once it's inside the Length function, so even if we do head = head->next, the pointer outside the function won't be affected?
Thanks
Your two code snippets are equivalent.
However, there's a school of thought that says that you should never modify function arguments, in order to avoid potential programming errors, and to enhance readability (you're not really modifying the head). To that end, you will often see people defining as many arguments as possible as const.
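As a small sketch of that const style (mine, not from the original answer): const-qualifying the parameter makes the compiler enforce the rule, which in turn forces the separate iterator.
int Length(struct node* const head) {
    int count = 0;
    /* head = head->next;  -- error: assignment of read-only parameter 'head' */
    for (struct node* current = head; current != NULL; current = current->next) {
        count++;
    }
    return count;
}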
A smart compiler will optimize away the extra variable anyway. Some people do it for clarity: to them, head means the head of the list and current is just the iterator. It's purely for readability.
The programmers I know all intuitively assume that the value of an argument which is passed by value (such as the address referenced by a pointer) remains unchanged throughout the function. Because of this assumption, it's easy to introduce little bugs when extending the function. Imagine I wanted to add a little bit of debug output to your Length function:
int Length(struct node* head) {
    int count = 0;
    while (head != NULL) {
        count++;
        head = head->next;
    }
    printf( "Length of list at %p is %d\n", head, count );
    return count;
}
The larger the function gets (or the more contrived the logic is, or the less attention the person making the modification is paying...), the more easily this kind of issue creeps in. Here, head is already NULL by the time the printf runs, so the message reports the wrong address.
For short functions, such as Length, I personally consider it to be fine (I do it as well).
Related
I'm curious what base case can be used to recursively free a circular linked list, passing the head of the linked list as the only parameter. I originally thought a base case of if (head->next == head) { return NULL; } could be sufficient to prevent head->next from pointing to itself, but this doesn't seem to be the case (literally and figuratively). The last node is not being freed by free(head) after the recursive calls here.
typedef struct node
{
    int data;
    struct node *next;
} node;
// temp stores the original head of the list
node *recursive_destroyer(node *head, node *temp)
{
    if (head->next == temp)
        return NULL;

    recursive_destroyer(head->next, temp);
    free(head);
    head = NULL;
    return NULL;
}
You asked about passing in one parameter
I think most people skipped your first sentence, and jumped to your code. You ask in your post:
I'm curious what base case can be used to recursively free a circular linked list, passing the head of the linked list as the only parameter. ...
You go on to explain the approach you tried:
I originally thought a base case of if (head->next == head) { return NULL; } could be sufficient to prevent head->next from pointing to itself, but this doesn't seem to be the case ...
You provide a code example, but it passes in two parameters.
Remove head->next, not head
This answer is addressing the question in your first sentence. A short comparison with your approach will follow.
Checking to see if head->next points to head is a fine stopping case, but it means your recursive function needs to remove and destroy head->next at each iteration, and then recursively process the same list.
If head->next and head are the same, then destroy head, and you are done.
I don't see any point of returning a value from this function, so I removed it.
void recursive_destroyer(node *head) {
    if (head == NULL) return;
    if (head->next == head) {
        destroy(head);
        return;
    }
    node *tmp = head->next;
    head->next = head->next->next;
    destroy(tmp);
    recursive_destroyer(head);
}
Notice there is no longer any need for a second parameter to the recursive function.
Comparison with your approach
There are some issues in your sample code that caused erroneous behavior. There are other answers that have addressed them with some depth. But, I did want to point out that you should prefer tail recursion whenever possible.
Tail recursion is a special case of a sibling call. A sibling call is when a function calls another function and then immediately returns. In the example below, function_A() makes a sibling call to function_B().
#include <stdio.h>
#include <stdbool.h>

void function_B (void) { puts(__func__); }

void function_A (bool flag) {
    if (flag) {
        function_B();
        return;
    }
    puts(__func__);
}
A sibling call can be optimized by the compiler to reuse the stack frame of the current function to make the sibling call. This is because none of the current function state of the caller is needed after the sibling returns.
A tail recursive call can be optimized in the same way. Thus, the tail recursive call when optimized has the same memory footprint as an ordinary loop. And in fact, if the optimizer detects the sibling call is a recursive call, instead of performing a function call to itself, the tail recursion is converted into a jump to the start of the function. Most C compilers can perform this optimization. You can manually perform this optimization yourself, and easily convert a tail recursive function into a loop.
If you are using the optimization features of your C compiler, and it supports tail recursion optimization, then there is no technical reason to prefer a loop over tail recursion. If your software team finds reading recursive code confusing, then loops are preferred just to make the code easier to comprehend.
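As a rough illustration (my sketch, not part of the original answer), manually applying that conversion to recursive_destroyer() above yields an equivalent loop; destroy() is the same single-node helper assumed by the answer.
/* Hand-converted loop version of the tail-recursive destroyer above. */
void iterative_destroyer(node *head) {
    if (head == NULL) return;
    while (head->next != head) {      /* more than one node left */
        node *tmp = head->next;
        head->next = head->next->next;
        destroy(tmp);
    }
    destroy(head);                    /* the last remaining node */
}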
I wasn't assigning the result of recursive_destroyer back to the head inside another function that frees the entire list. Here is that function:
LinkedList *destroy_list(LinkedList *list)
{
    node *temp;

    if (list == NULL)
        return NULL;
    if (list->head == NULL)
        return NULL;

    temp = list->head;
    // was not setting list->head equal to this function,
    // causing the original head to never become NULL
    list->head = recursive_destroyer(list->head, temp);
    if (list->head != NULL)
        printf("Error, head not freed.. \n");
    free(list);
    return NULL;
}
I could have also passed a pointer to list->head to avoid assigning the function's return value to list->head.
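For what it's worth, here is a minimal sketch of that alternative (the function name and the iterative body are mine, not from the original code): the callee receives &list->head, frees the ring, and clears the caller's pointer itself, so destroy_list() needs no assignment.
#include <stdlib.h>

/* Sketch of the pointer-to-pointer variant: frees every node of the
 * circular list and nulls out the caller's head through the extra level
 * of indirection. */
static void destroy_ring(node **headp)
{
    node *head = *headp;
    if (head != NULL) {
        node *cur = head->next;
        while (cur != head) {      /* free every node except the head */
            node *next = cur->next;
            free(cur);
            cur = next;
        }
        free(head);                /* free the original head last */
    }
    *headp = NULL;                 /* the caller's list->head is cleared here */
}

/* usage inside destroy_list:  destroy_ring(&list->head);  */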
Your code does not work. It will leave a single allocation intact.
Consider the circular linked list [1]. If you call recursive_destroyer(head, head), it will not free anything. The correct recursive code would be
void destroy_helper(node* const current, node* const original) {
    if (current->next != original) destroy_helper(current->next, original);
    free(current);
}

void destroy(node* const list) {
    // null-check necessary since otherwise current->next is UB in destroy_helper
    if (list) destroy_helper(list, list);
}
If we want to turn this into iterative code, we must first modify the destroy_helper function to be tail-recursive:
void destroy_helper(node* const current, node* const original) {
    node* const next = current->next;
    free(current);
    if (next != original) destroy_helper(next, original);
}
which we can then rewrite to a loop:
void destroy(node* const list) {
    if (list) {
        node* current = list;
        do {
            node* next = current->next;
            free(current);
            current = next;
        } while (current != list);
    }
}
Edit:
To prove that my code is actually freeing everything, we can replace free with the following function:
void free_with_print(node* ptr) {
    printf("Freeing node with value %d\n", ptr->data);
    free(ptr);
}
A simple example:
int main() {
    node* node1 = malloc(sizeof *node1);
    node1->data = 1;
    node1->next = node1;

    node* node2 = malloc(sizeof *node2);
    node2->data = 2;
    node2->next = node1;
    node1->next = node2;

    destroy(node1);
}
using the iterative version prints
Freeing node with value 1
Freeing node with value 2
as expected. Trying the same thing with your original code prints
Freeing node with value 1
As expected, your code does not free one of the two nodes, while mine frees both.
For code like this you can (and should) do "single step debugging" in your head to convince yourself it should work as intended. This is a very important skill to learn.
Let's try 3 cases:
a) Imagine that the list is empty (head and temp are NULL). In this case it would crash at if (head->next == temp) due to dereferencing a NULL pointer in head->next.
b) Imagine that the list has one item. In this case if (head->next == temp) is true because it's a circular linked list, so it returns from the first invocation without freeing anything.
c) Imagine the list has 2 items. In this case if (head->next == temp) is false for the first invocation and true for the second invocation; so the second invocation will free nothing and the first invocation will free the original head of the list.
We can extrapolate from that and say that the last item on the list is never freed (but the first item at the original head of the list will be freed if it's not also the last item).
To fix that you could always free the item, like:
if (head->next == temp) {
    free(head);
    return NULL;
}
This is messy because you're duplicating code (you could invert the condition to avoid that). It would also be easier to read if head always pointed to the original head and temp were the temporary iterator. Also (as mentioned in comments) there's no point returning NULL when it finishes. Refactoring the code gives you something like:
void recursive_destroyer(node *head, node *temp)
{
    /* temp is the last node once temp->next wraps back around to head */
    if (temp->next != head) {
        recursive_destroyer(head, temp->next);
    }
    free(temp);
}
However, this still crashes if the list is originally empty. To fix that I'd do a wrapper function, like:
static void recursive_destroyer_internal(node *head, node *temp);

void recursive_destroyer(node *head) {
    if (head != NULL) {
        recursive_destroyer_internal(head, head);
    }
}

static void recursive_destroyer_internal(node *head, node *temp)
{
    if (temp->next != head) {
        recursive_destroyer_internal(head, temp->next);
    }
    free(temp);
}
The final problem is that recursion is bad: it tends to be slower due to all the extra function calls, it risks crashing when you run out of stack space, and it often ends up harder for people to read; especially if/when the compiler can't do "tail call optimization" itself to convert it into a non-recursive loop. To fix that, don't use recursion at all. E.g.:
void destroy(node *head) {
    node *original_head = head;
    node *temp;

    if (head != NULL) {
        do {
            temp = head;
            head = head->next;
            free(temp);
        } while (head != original_head);
    }
}
I'm just learning C, and I have a question about pointer parameters. My code is the following:
int Length(node *head)
{
    int length = 0;
    while (head) {
        length++;
        head = head->next;
    }
    return length;
}
The code in the book I'm reading says to do this though:
int Length(struct node* head)
{
    struct node* current = head;
    int count = 0;
    while (current != NULL) {
        count++;
        current = current->next;
    }
    return count;
}
Is there really a difference? The way I'm reading my code is that I get a pointer to a node struct as a parameter. The pointer itself, however, is a local variable that I am free to mutate as long as I don't dereference it first. Therefore, I can change the value of the pointer to point to a different node (the next node, as it may be).
Will this cause a memory leak or is there some other difference I'm not seeing?
This code is for a linked list implementation. The node struct is defined as:
// Define our linked list node type
typedef struct node {
    int data;
    struct node *next;
} node;
Yes, they both do the same thing. But in the second example it is clearer from the code what the author is trying to do. In your first example, you're using the pointer head to reference nodes other than the head. That can be confusing.
You could write your function like this and your intent would be clear:
int GetLength(node* current)
{
    int length = 0;
    while (current != NULL)
    {
        length += 1;
        current = current->next;
    }
    return length;
}
Your solution and reasoning are correct. The head argument is a local variable: a copy of the pointer passed to your function, allocated on the stack. That's why you can modify it from within the function.
There is no difference between the two solutions, at least not in functionality; modern compilers will most likely optimize away the extra variable in the book's solution. The only slight difference is in style: many people prefer to treat arguments as unmodifiable values, just to avoid mistakes.
Your understanding of the argument-passing mechanics is correct. Some people simply prefer not to modify argument values, the reasoning being that modifying an argument tends to be bug-prone. There's a strong expectation that at any point in the function, if you want to get the value the caller passed as head, you can just write head. If you modify the argument and then don't pay attention, or if you're maintaining the code 6 months later, you might write head expecting the original value and get some other thing. This is true regardless of the type of the argument.
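To see that pass-by-value behavior concretely, here is a small sketch of my own (not from the answers) that reuses the node typedef and the first Length() from the question; the caller's head is unchanged after the call.
#include <stdio.h>

/* Assumes the node typedef and the Length() function shown in the question. */
int main(void) {
    node third  = { 3, NULL };
    node second = { 2, &third };
    node first  = { 1, &second };
    node *head  = &first;

    printf("length = %d\n", Length(head));   /* prints: length = 3 */
    printf("head->data = %d\n", head->data); /* still 1: Length modified only its own copy */
    return 0;
}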
In a lot of examples I've read, a simple getListLength() function would look something like this:
int getListLength(struct node *head)
{
    struct node *temp = head;
    int iCount = 0;
    while (temp)
    {
        ++iCount;
        temp = temp->next;
    }
    return iCount;
}
What strikes me as unneeded is the declaration of a local pointer (in this case *temp) that copies the passed parameter. If I recall correctly, passed parameters obtain their own copies. Thus, there won't be a need for a local pointer that copies the *head just because the *head is a copy itself, right?
In other words, would it be correct to discard the *temp pointer and use head everywhere instead?
Yes, it's a copy, so yes, it would be correct.
int getListLength(struct node* head)
{
    int iCount = 0;
    while (head)
    {
        ++iCount;
        head = head->next;
    }
    return iCount;
}
Why don't you execute it and see for yourself?
While it's true that you don't need the local copy since the pointer is passed by value, it's probably there for stylistic reasons. Some consider it bad form to modify arguments passed in (though I do find it useful in some scenarios), but perhaps more importantly, you lose some of the self-documentation in the code; specifically, head no longer always points to the true head of the linked list. This isn't that confusing in your short piece of code, but having inaccurately-named variables can be much more confusing when the code is longer and more complex.
Often, the reason to make a local copy of a passed-in pointer is to reduce the side-effects of a function (by not modifying the function parameter).
If a function is only using the pointer to read (not write), and has no other interaction with the outside world, the function could be annotated as 'pure' in GCC and would be open for some nice optimizations.
Example:
__attribute__((pure)) int getListLength(struct node *head)
{
    struct node *temp = head;
    int iCount = 0;
    while (temp)
    {
        ++iCount;
        temp = temp->next;
    }
    return iCount;
}
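As a hedged illustration of what that can buy you (my sketch, not from the original answer): a pure function's result depends only on its arguments and the memory it reads, so a compiler that honors the attribute may reuse a previous result instead of walking the list again.
#include <stdio.h>

/* Hypothetical caller: with the pure annotation, the compiler is allowed to
 * evaluate getListLength(head) once and reuse the value, provided it can see
 * that nothing in the loop writes to the list. */
void print_indices(struct node *head)
{
    for (int i = 0; i < getListLength(head); ++i) {
        printf("node %d of %d\n", i + 1, getListLength(head)); /* calls may be merged */
    }
}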
If you aren't familiar with what side effects are, try reading the Side Effects and Functional Programming Wikipedia articles to get more information on the subject.
I was trying to reverse a linked list; however, whenever I execute the following function, I get only the last element. For example, if the list contained 11, 12, 13 earlier, after executing the function it contains only 13. Kindly point out the bug in my code.
void reverselist() {
    struct node *a, *b, *c;
    a = NULL;
    b = c = start;
    while (c != NULL) {
        c = b->next;
        b->next = a;
        a = b;
        b = c;
    }
    start = c;
}
Doesn't your loop guard ensure that start is null?
If you aren't using start to identify the first element of the list, then the variable you ARE using is still pointing to what WAS the first element, which is now the last.
c is a helper pointer.
void reverselist()
{
    struct node *a, *b, *c;
    a = NULL;
    b = start;
    while (b != NULL)
    {
        c = b->next;
        b->next = a;
        a = b;
        b = c;
    }
    start = a;
}
// You should assume that Node has a Node* called next that
// points to the next item in a list
// Returns the head of the reversed list if successful, else NULL / 0
Node *reverse( Node *head )
{
    Node *prev = NULL;
    while( head != NULL )
    {
        // Save next since we will destroy it
        Node *next = head->next;
        // next and previous are now reversed
        head->next = prev;
        // Advance through the list
        prev = head;
        head = next;
    }
    return prev;
}
I would have made a prepend function, and done the following:
struct node* prepend(struct node* root, int value)
{
    struct node* new_root = malloc(sizeof(struct node));
    new_root->value = value;   /* copy the value into the new node */
    new_root->next = root;
    return new_root;
}

struct node* reverselist(struct node* inlist)
{
    struct node* outlist = NULL;
    while (inlist != NULL) {
        struct node* new_root = prepend(outlist, inlist->value);
        outlist = new_root;
        inlist = inlist->next;
    }
    return outlist;
}
I haven't tested this, but I guess you grasp the idea. It might just be your variable names, which don't describe anything, but I think this approach is cleaner and makes it easier to understand what actually happens.
EDIT:
Got a question about why I don't do it in place, so I'll answer it here:
Can you do it in place? Are you sure you don't wish to keep the
original list?
Do you need to do it in place? Is the malloc too time consuming / is this a performance-critical part of your code? Remember: premature optimization is the root of all evil.
Thing is, this is a first implementation. It should work, and not be optimized. It should also have a test written before this implementation is even thought of, and you should keep this slow, unoptimized implementation until the test passes and you have proved that it's too slow for your use!
When you have a passing unit test, and have proven the implementation to be too slow, you should optimize the code and make sure it still passes the test, without changing the test.
Also, is an in-place operation really what you need? What about allocating all the memory up front before reversing? That way you only have one allocation call, and should hopefully get a nice performance boost.
This way everyone is happy, you have a cleaner code and avoid the risk of having Uncle Bob showing up at your door with a shotgun.
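For what that last idea might look like, here is a rough sketch of my own (assuming the same struct node with the value and next members used above): count the nodes, make a single allocation for the whole reversed copy, and link the copies back-to-front.
#include <stdlib.h>

/* Single-allocation reversed copy: one malloc holds every node of the copy.
 * The returned head is &block[0], so a single free(head) releases the whole
 * copy; the individual nodes must not be freed separately. */
struct node* reverse_copy_single_alloc(struct node* inlist)
{
    size_t count = 0;
    for (struct node* p = inlist; p != NULL; p = p->next)
        count++;
    if (count == 0)
        return NULL;

    struct node* block = malloc(count * sizeof *block);
    if (block == NULL)
        return NULL;

    /* Fill the block back-to-front so the first input node ends up last. */
    size_t i = count;
    struct node* head = NULL;
    for (struct node* p = inlist; p != NULL; p = p->next) {
        struct node* n = &block[--i];
        n->value = p->value;
        n->next = head;
        head = n;
    }
    return head;
}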
Is there any difference between these two functions? I mean in terms of the result returned?
int Length(struct node* head) {
    struct node* current = head;
    int count = 0;
    while (current != NULL) {
        count++;
        current = current->next;
    }
    return count;
}
and this function
int Length(struct node* head) {
    int count = 0;
    while (head != NULL) {
        count++;
        head = head->next;
    }
    return count;
}
They are the same. One uses a local 'current' variable to iterate over the list, while the other one uses the same variable that was received through the function arguments.
The returned value will be the same.
The former is the kind of code that would be written by a programmer subscribing to the style rule that says "it is bad practice to modify parameters because they are passed by value and would give the reader a false sense that the function would modify the corresponding argument."
Not necessarily bad advice. It makes the code a little longer, but it reads better. Many readers look at the second version and have an initial reaction of "wait, what? Changing the head? Oh... okay, no, it's safe...."
No difference. The second version simply uses the function argument itself as a variable to work with in the body, and that's perfectly legitimate. In fact, it may even be slightly more efficient than the first version, which makes a gratuitous copy.
You couldn't use the second version if the argument were declared const, i.e. int Length(struct node* const head) -- but since it isn't, you're free to use the argument variable for your own purposes.