sorting a linked list containing strings - c

I want to sort a linked list containing only strings. To do so, I have 2 options.
Option 1 - dynamically allocate an array with the same size as the linked list and the strings containing it also with the same size, copy the contents of the linked list into the array and sort it using qsort.
Option 2 - implement a merge sort algorithm in order to sort it.
My first question is: will option 2 cost more memory and time than option 1, or is it the better option?
My second problem is that I'm trying to do option 1, and to do so I have a header file which contains the code of the linked list.
After allocating memory for the array of strings, I get a segmentation fault when I try to copy the contents.
Program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "Listas_ligadas_char.h"
int main() {
link_char head = NULL;
char **strings;
head = insertEnd_char(head, "fcb");
head = insertEnd_char(head, "bvb");
head = insertEnd_char(head, "slb");
head = insertEnd_char(head, "fcp");
int len = length_char(head);
int i = 0, j;
strings = (char **)malloc(sizeof(char *) * len);
link_char t;
t = head;
while (t != NULL && i <= len) {
strings[i] = (char *)malloc(sizeof(char) * (strlen(t->str) + 1));
strcpy(strings[i++], t->v.str)
t = t->next;
}
for (t = head; t != NULL; t = t->next) {
printf("* %s\n", strings[i]);
}
}
Header file:
#ifndef _Listas_ligadas_char_
#define _Listas_ligadas_char_
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
typedef struct node_char {
char *str;
struct node_char *next;
} *link_char;
link_char lookup_str(link_char head, char *str) {
link_char t;
for (t = head; t != NULL; t = t->next)
if (strcmp(t->str, str) == 0)
return t;
return NULL;
}
link_char NEW_str(char *str) {
int i;
link_char x = (link_char)malloc(sizeof(struct node_char));
x->str = (char *)malloc(sizeof(char) * (strlen(str) + 1));
strcpy(x->str, str);
x->next = NULL;
return x;
}
link_char insertEnd_char(link_char head, char *str) {
link_char x;
if (head == NULL)
return NEW_str(str);
for (x = head; x->next != NULL; x = x->next)
;
x->next = NEW_str(str);
return head;
}
int length_char(link_char head) {
int count = 0;
link_char x;
for (x = head; x != NULL; x = x->next)
count++;
return count;
}
void print_lista_char(link_char head, int NL) {
link_char t;
for (t = head; t != NULL; t = t->next) {
printf("%d * %s\n", NL, t->str);
}
}
void FREEnode_str(link_char t) {
free(t->str);
free(t);
}
link_char delete_el_char(link_char head, char *str) {
link_char t, prev;
for (t = head, prev = NULL; t != NULL;
prev = t, t = t->next) {
if (strcmp(t->str, str) == 0) {
if (t == head)
head = t->next;
else
prev->next = t->next;
FREEnode_str(t);
break;
}
}
return head;
}
#endif
By the way, in case you are wondering what NL is: it is a variable that counts the current line of stdin. All I want is to print the sorted array; I don't need to keep its elements.
So if you can tell me which option you think is best, I would appreciate it a lot.

Option 1 - dynamically allocate an array with the same size as the linked list and the strings containing it also with the same size, copy the contents of the linked list into the array and sort it using qsort.
It is not necessary to convert the linked list to an array. The quicksort algorithm can also be applied to linked lists.
However, since your linked list is only singly-linked, you cannot use the (generally more efficient) Hoare partition scheme, but must use the Lomuto partition scheme instead. This is because the Hoare partition scheme requires the ability to traverse the linked list backwards (which requires a doubly-linked list).
Even if it is not necessary to convert the linked list to an array for the quicksort algorithm, this may still be meaningful, as a linked list has worse spatial locality than an array. Either way, the average time complexity of the algorithm will be O(n*log n) and the worst-case time complexity will be O(n^2).
But since your nodes only contain pointers to strings, you will have bad spatial locality anyway when dereferencing these pointers. So in this case, it may not be very helpful to convert the linked list to an array, because that would only improve the spatial locality of the pointers to the strings, but not of the strings themselves.
One of the problems is: will it cost more memory and time if I do option 2 over option 1, or is it the better option?
Merge-sort is ideal for linked lists.
Another advantage of merge-sort is its worst-case time complexity, which is O(n*log n), whereas it is O(n^2) with quicksort.
Merge-sort has a space complexity of O(1) for linked lists when implemented bottom-up (a recursive top-down version uses O(log n) stack space), whereas quicksort has a space complexity of O(log n). However, if you decide to convert the list to an array for quicksort, the space complexity of your algorithm will increase to O(n).
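As a rough illustration of option 2, here is a minimal sketch of a merge sort on the list itself. It assumes the node layout from the question's header (repeated here so the block stands alone); the names merge_char and merge_sort_char are made up for this example. It is the recursive top-down variant, so it uses O(log n) stack space rather than the O(1) of a bottom-up implementation.

```c
#include <stddef.h>
#include <string.h>

/* Same layout as the node in the question's header. */
typedef struct node_char {
    char *str;
    struct node_char *next;
} *link_char;

/* Merge two sorted lists; ties take the node from a, which keeps the sort stable. */
static link_char merge_char(link_char a, link_char b) {
    struct node_char dummy;   /* placeholder head, only its next member is used */
    link_char tail = &dummy;
    dummy.next = NULL;
    while (a != NULL && b != NULL) {
        if (strcmp(a->str, b->str) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;   /* append whatever is left */
    return dummy.next;
}

/* Top-down merge sort: split in the middle, sort both halves, merge them. */
link_char merge_sort_char(link_char head) {
    link_char slow, fast, second;
    if (head == NULL || head->next == NULL)
        return head;
    slow = head;
    fast = head->next;
    while (fast != NULL && fast->next != NULL) {   /* slow stops at the midpoint */
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;
    return merge_char(merge_sort_char(head), merge_sort_char(second));
}
```

With the list from the question, head = merge_sort_char(head); would relink the existing nodes in sorted order without allocating anything.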
My second problem is that I'm trying to do option 1, and to do so I have a header file which contains the code of the linked lists. The problem is that after allocating memory for the array of strings, I get a segmentation fault when I try to copy the contents.
I can only help you if you provide a minimal reproducible example of your problem. The code you posted does not reproduce the problem. It does not even compile. The following line contains several errors:
strcpy(strings[i++],t->v.str)

You indeed have 2 sensible options:
option 1 will usually provide the best performance but requires additional space of sizeof(link_char) * N.
option 2 will only require O(log(N)) stack space for pending sublists using bottom-up mergesort or similar space complexity for recursive top-down mergesort. The drawback is you have to write the sorting function yourself and it is easy to make mistakes.
Note that for option 1, you should not make a copy of the strings, but just allocate an array of pointers and initialize it to point to the nodes themselves. This way you can preserve the node structures that could contain other information and avoid extra allocations.
Note also that once you have the array of node pointers and a comparison function, you can use qsort or other sorting functions such as timsort or mergesort which may be more appropriate in terms of worst case time complexity.
There are multiple problems in your implementation:
the loop test while (t != NULL && i <= len) is incorrect. The two tests should be redundant, but if you insist on testing i, it should be i < len, or you might write beyond the end of the strings array if length_char returned an incorrect count.
strcpy(strings[i++], t->v.str) does not compile: the node has no member v and the statement lacks its terminating semicolon. You probably mean strcpy(strings[i++], t->str);
the printing loop has undefined behavior because you do not reset i to 0 nor do you increment i in the loop body, so you pass strings[i] for all calls to printf and i should be len, so strings[i] accesses beyond the end of the allocated array. You might get a crash or an invalid pointer or by chance a null pointer that printf might ignore... It should be:
for (i = 0; i < len; i++) {
printf("* %s\n", strings[i]);
}
Here is a modified version:
#include <stdio.h>
#include <stdlib.h>
#include "Listas_ligadas_char.h"
int cmp_char(const void *aa, const void *bb) {
link_char a = *(const link_char *)aa;
link_char b = *(const link_char *)bb;
return strcmp(a->str, b->str);
}
link_char sort_char(link_char head) {
if (head != NULL && head->next != NULL) {
size_t i, len = length_char(head);
link_char *array = malloc(sizeof(*array) * len);
link_char t = head;
for (i = 0; i < len; i++, t = t->next)
array[i] = t;
qsort(array, len, sizeof(*array), cmp_char);
head = t = array[0];
for (i = 1; i < len; i++)
t = t->next = array[i];
t->next = NULL;
free(array);
}
return head;
}
int main() {
link_char head = NULL;
head = insertEnd_char(head, "fcb");
head = insertEnd_char(head, "bvb");
head = insertEnd_char(head, "slb");
head = insertEnd_char(head, "fcp");
head = sort_char(head);
for (link_char t = head; t != NULL; t = t->next) {
printf("* %s\n", t->str);
}
return 0;
}
Notes:
it is error prone to hide pointers behind typedefs. You should define node_char as typedef struct node_char node_char and use node_char * everywhere.
it is unconventional to define the list functions in the header file. You might do this for static inline functions, but the global functions should not be defined in the header file as this will cause name clashes if multiple modules include this header file and get linked together.

Related

CS50 - LOAD - Get random character from no where when trying to execute load

I am new to C programming. I am trying to do the pset5 in CS50 while trying to understand the concepts of memory, linked list and hashtable. I wrote the code and it compiled but there seems to be something wrong because every time I tried to execute the code it returns some garbage value. Could anyone please help me with that? Many thanks.
#include<stdio.h>
#include<stdlib.h>
#include<ctype.h>
#include<string.h>
#include "dictionary.h"
#define DICTIONARY "dictionaries/small"
typedef struct node
{
char WORD[LENGTH + 1];
struct node *next;
}
node;
int hash(char *word);
int main(void)
{
node **HASHTABLE = malloc(sizeof(node) * 26);
//open the dictionary
FILE *dic = fopen(DICTIONARY, "r");
if (dic == NULL)
{
fprintf(stderr, "Could not open the library\n");
return 1;
}
int index = 0;
char word[LENGTH + 1];
for (int c = fgetc(dic); c != EOF; c = fgetc(dic))
{
word[index] = c;
index++;
if (c == '\n')
{
int table = hash(word);
printf("%d\n", table);
//create a newnode
node *newnode = malloc(sizeof(node));
strcpy(newnode->WORD, word);
newnode->next = NULL;
printf("Node: %s\n", newnode->WORD);
index = 0;
//add new node to hash table
if (HASHTABLE[table] == NULL)
{
HASHTABLE[table] = newnode;
}
else
{
HASHTABLE[table]->next = newnode;
}
}
}
for(int i = 0; i < 26; i++)
{
node *p = HASHTABLE[i];
while (p != NULL)
{
printf("%s", p->WORD);
p = p->next;
}
}
//free memory
for(int i = 0; i < 26; i++)
{
node *p = HASHTABLE[i];
while (p != NULL)
{
node *temp = p->next;
free(p);
p = temp;
}
}
free(HASHTABLE);
}
int hash(char *word)
{
int i = 0;
if (islower(word[0]))
return i = word[0] - 'a';
if (isupper(word[0]))
return i = word[0] - 'A';
return 0;
}
Your code has serious problems that result in undefined behavior.
Two of them are the result of this line:
node **HASHTABLE = malloc(sizeof(node) * 26);
That allocates space for 26 node structures, but HASHTABLE is declared as node **, so it should point to an array of node * pointers (that's the ** in the node **HASHTABLE declaration).
So, you should replace it with something like:
node **HASHTABLE = malloc( 26 * sizeof( *HASHTABLE ) );
Note that I used the dereferenced value of the variable being assigned to - HASHTABLE. In this case that means a node * (one less * than in the declaration). So if the type of HASHTABLE changes, you don't need to make any other changes to the malloc() statement.
That mistake only over-allocates memory, since a node is larger than a node *, so on its own it likely wouldn't cause any visible problems.
However, there's still a problem with
node **HASHTABLE = malloc( 26 * sizeof( *HASHTABLE ) );
that will cause problems - and serious ones.
That array of 26 pointers isn't initialized - you don't know what's in them. They can point anywhere. So this won't work well, if at all:
if (HASHTABLE[table] == NULL)
Meaning this points off to somewhere unknown:
HASHTABLE[table]->next = newnode;
And that will cause all kinds of problems.
The simplest fix? Initialize the values all to zero by using calloc() instead of malloc():
node **HASHTABLE = calloc( 26, sizeof( *HASHTABLE ) );
Until that's fixed, any results from your entire program are questionable, at best.
The reason for the garbage is that you didn't null-terminate the string:
strcpy(newnode->WORD, word);
strcpy expects src to point to a null-terminated string, but word is never terminated. Simply terminate it with
word[index] = 0;
before the strcpy.
Other than that, the problems pointed out in Andrew Henle's answer should be addressed too, but I am not going to repeat them here.
BTW, next you will notice that
HASHTABLE[table]->next = newnode;
wouldn't work properly - that code always inserts the node as the 2nd one. But you want to always insert the new node unconditionally as the head, with
newnode->next = HASHTABLE[table];
HASHTABLE[table] = newnode;
There need not be any special condition for inserting the first node to a bucket.
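Putting the fixes from both answers together, a minimal sketch of the insertion step could look like the following. The names bucket_of and table_insert are invented for this example, and LENGTH is set to 45 only as an assumption, since the real value comes from dictionary.h, which is not shown.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45    /* assumed here; the real value comes from dictionary.h */
#define BUCKETS 26

typedef struct node {
    char word[LENGTH + 1];
    struct node *next;
} node;

/* Bucket index from the first letter; anything that is not a letter goes to bucket 0. */
static int bucket_of(const char *word) {
    unsigned char c = (unsigned char)word[0];
    return isalpha(c) ? tolower(c) - 'a' : 0;
}

/* Copy word into a new node and push it on the front of its bucket.
   Returns 0 on success, -1 if the word is too long or malloc fails. */
int table_insert(node **table, const char *word) {
    node *n;
    int b = bucket_of(word);
    if (strlen(word) > LENGTH)
        return -1;
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    strcpy(n->word, word);   /* word must already be null-terminated */
    n->next = table[b];      /* works for an empty bucket too: next becomes NULL */
    table[b] = n;
    return 0;
}
```

The table itself would be created with node **table = calloc(BUCKETS, sizeof *table); so that every bucket starts out as NULL.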

Sorting and merging multiple linked lists with sorted sub-sections

I have an array of numlists linked lists. Nodes in the lists are of the form:
struct Edge
{
int64_t blocknum;
int64_t location;
struct Edge *next;
};
typedef struct Edge edge;
I need to merge all the lists into a single linked list which is sorted by location in ascending order. Each list consists of blocks, i.e. runs of nodes with equal blocknum, and each of these blocks is already sorted. Blocks with larger values of blocknum have all of their location values larger than blocks with smaller blocknum. Within each list, the blocks are already sorted in order of blocknum. In practice, this boils down to sorting blocks by blocknum in ascending order, and I don't have to worry too much about location since that will take care of itself. You may assume that the next member of a node is either valid and allocated, or explicitly set to NULL.
Here is the function I came up with
edge *sort_edges(edge **unsorted, int numlists)
{
edge *sorted_head = NULL;
edge *sorted_current = NULL;
edge *current_edge = NULL;
edge *temp = NULL;
int64_t blocknum;
int i;
int64_t minblock;
int remaining = numlists;
int first = 1;
int minblock_index;
while(remaining) //while there are still more lists to process
{
minblock = LLONG_MAX;
temp = NULL;
minblock_index = INT_MAX;
remaining = numlists;
for (i=0; i<numlists; i++) //loop over the list of head nodes to find the one with the smallest blocknum
{
if (!unsorted[i]) //when a lists is exhausted the lead node becomes NULL, and we decrement the counter
{
remaining--;
}
else //a simple minimum finding algorithm
{
current_edge = unsorted[i];
if (current_edge->blocknum < minblock)
{
temp = current_edge;
minblock = current_edge->blocknum;
minblock_index = i;
}
}
}
if (remaining == 0)
{
break;
}
if (first) //if we have not yet set up the head of the list, we have to save a pointer to the head
{
sorted_head = temp;
sorted_current = sorted_head;
first = 0;
}
else
{
sorted_current->next = temp;
}
blocknum = sorted_current->blocknum;
while (sorted_current->blocknum == blocknum && sorted_current->next) //skip through to the end of the block so that the next section we append will go on the end
{
sorted_current = sorted_current->next;
}
unsorted[minblock_index] = sorted_current->next; //reset the head of the unsorted list to the node after the block
}
return sorted_head;
}
This works. My question is:
Can I do better in terms of an efficient sorting algorithm? (Almost certainly yes, I'm just curious what people come up with for a sorting problem with the given assumptions).
If by "block" you mean the list hanging off from each pointer in the pointer array, then
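int compare_edge_blocknum(const void *p1, const void *p2)
{
/* qsort() passes pointers to the array elements, which are edge pointers themselves */
const edge *e1 = *(edge *const *)p1;
const edge *e2 = *(edge *const *)p2;
if (!e1 && !e2)
return 0;
else
if (!e1)
return +1;
else
if (!e2)
return -1;
else {
const int64_t b1 = e1->blocknum;
const int64_t b2 = e2->blocknum;
return (b1 < b2) ? -1 :
(b1 > b2) ? +1 : 0;
}
}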
edge *last_in_list(edge *list)
{
if (list)
while (list->next)
list = list->next;
return list;
}
edge *sort_edges(edge **array, size_t count)
{
edge root = { 0, 0, NULL };
edge *tail = &root;
size_t i;
if (!array || count < 1)
return NULL;
if (count == 1)
return array[0];
qsort(array, count, sizeof *array, compare_edge_blocknum);
for (i = 0; i < count; i++)
if (array[i]) {
tail->next = array[i];
tail = last_in_list(array[i]);
}
return root.next;
}
The above uses qsort() to sort the array of pointers, according to blocknum. We use root as a handle to the resulting list. We loop over the array of pointers, appending each non-NULL pointer to the tail of the result list, with tail always updated to point to the final element of the list.
Traversing each list to find the tail element is probably the slow part here, but unfortunately I don't see any way to avoid it. (If the list elements are not consecutive in memory, the list traversal tends to require many cache loads from RAM. The access patterns when the array is sorted are much easier for the CPU to predict (on current architectures), so the array sort part is probably not the slowest part -- but of course you can profile the code with a practical data set, and consider whether you need a faster sort implementation than the C library qsort().)
OP clarified that each individual list hanging off a pointer in the pointer array may contain one or more "blocks", i.e. consecutive sorted runs. These can be detected by the changing blocknum.
If additional memory use is not an issue, I'd create an array of
typedef struct {
int64_t blocknum;
edge *head;
edge *tail;
} edge_block;
which then gets sorted by blocknum, and finally chained. Saving pointers to both the first (head) and last (tail) element means we only scan the lists once. After the edge_block array is sorted, a simple linear pass over it is enough to chain all the sublists into a final list.
For example (only compile-tested):
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>
typedef struct Edge edge;
struct Edge {
int64_t blocknum;
int64_t location;
struct Edge *next;
};
typedef struct {
int64_t blocknum;
struct Edge *head;
struct Edge *tail;
} edge_block;
static int cmp_edge_block(const void *ptr1, const void *ptr2)
{
const int64_t b1 = ((const edge_block *)ptr1)->blocknum;
const int64_t b2 = ((const edge_block *)ptr2)->blocknum;
return (b1 < b2) ? -1 :
(b1 > b2) ? +1 : 0;
}
edge *sort_edges(edge **array, size_t count)
{
edge_block *block = NULL;
size_t blocks = 0;
size_t blocks_max = 0;
edge *root, *curr;
size_t i;
if (count < 1) {
errno = 0;
return NULL;
}
if (!array) {
errno = EINVAL;
return NULL;
}
for (i = 0; i < count; i++) {
curr = array[i];
while (curr) {
if (blocks >= blocks_max) {
edge_block *old = block;
if (blocks < 512)
blocks_max = 1024;
else
if (blocks < 1048576)
blocks_max = ((blocks * 3 / 2) | 1023) + 1;
else
blocks_max = (blocks | 1048575) + 1048577;
block = realloc(block, blocks_max * sizeof block[0]);
if (!block) {
free(old);
errno = ENOMEM;
return NULL;
}
}
block[blocks].blocknum = curr->blocknum;
block[blocks].head = curr;
while (curr->next && curr->next->blocknum == block[blocks].blocknum)
curr = curr->next;
block[blocks].tail = curr;
blocks++;
curr = curr->next;
}
}
if (blocks < 1) {
/* Note: block==NULL here, so no free(block) needed. */
errno = 0;
return NULL;
}
qsort(block, blocks, sizeof block[0], cmp_edge_block);
root = block[0].head;
curr = block[0].tail;
for (i = 1; i < blocks; i++) {
curr->next = block[i].head;
curr = block[i].tail;
}
free(block);
errno = 0;
return root;
}
If there are potentially very many blocknums, or you need to limit the amount of memory used, then I'd use a small min-heap of
typedef struct {
size_t count;
edge *head;
edge *tail;
} edge_block;
elements, keyed by count, the number of elements in that sublist.
The idea is that whenever you extract a block from the input, you add it to the min-heap if there is room; otherwise, you merge it with the root list in the min-heap. Note that according to OP's rules, this "merging" is actually a single insert, as each block is consecutive; only the insertion point needs to be found first. The count is updated to reflect the number of elements in the root list, and thus you re-heapify the min-heap.
The purpose of the heap is to ensure that you merge the two shortest blocks, keeping the traversal of the lists to find the insertion point to a minimum.
When all blocks have been inserted, you take the root, merge that list with the new root list, and re-heapify, decrementing the size of the heap by one each time, until you have a single list left. That is the final result list.
So as I understand it you have multiple sorted lists and you want to merge them together to create a single sorted list.
A common way to do this is to create a queue of lists and continually merge pairs, adding the result back to the queue, and repeating until there is only one list left. For example:
listQueue = queue of lists to be merged
while listQueue.count > 1
{
list1 = listQueue.dequeue
list2 = listQueue.dequeue
newList = new list
// do standard merge here
while (list1 != null && list2 != null)
{
if (list1.item <= list2.item)
{
newList.append(list1.item)
list1 = list1.next
}
else
{
newList.append(list2.item)
list2 = list2.next
}
}
// clean up the stragglers, if any
while (list1 != null)
{
newList.append(list1.item)
list1 = list1.next
}
while (list2 != null)
{
newList.append(list2.item)
list2 = list2.next
}
listQueue.enqueue(newList)
}
mergedList = listQueue.dequeue
This is an attractive option because it's simple and requires very little additional memory, and it's reasonably efficient.
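The pseudocode above could be sketched in C roughly as follows. It assumes the edge type from the question and that each input list is already sorted by location; merge_two and merge_all are hypothetical names, and the input array doubles as the queue, so its contents are overwritten. Because the nodes are relinked instead of copied, no new list has to be allocated.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Edge {
    int64_t blocknum;
    int64_t location;
    struct Edge *next;
} edge;

/* Standard merge of two lists that are already sorted by location. */
static edge *merge_two(edge *a, edge *b) {
    edge head = { 0, 0, NULL };   /* placeholder node in front of the result */
    edge *tail = &head;
    while (a != NULL && b != NULL) {
        if (a->location <= b->location) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = (a != NULL) ? a : b;   /* append the stragglers, if any */
    return head.next;
}

/* Merge the lists pairwise, round after round, until one list remains. */
edge *merge_all(edge **lists, size_t count) {
    if (count == 0)
        return NULL;
    while (count > 1) {
        size_t out = 0;
        for (size_t i = 0; i + 1 < count; i += 2)
            lists[out++] = merge_two(lists[i], lists[i + 1]);
        if (count % 2 != 0)
            lists[out++] = lists[count - 1];   /* odd list waits for the next round */
        count = out;
    }
    return lists[0];
}
```

Each round halves the number of lists, which is where the log k factor in the running time comes from.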
There is a potentially faster way that requires a little more memory (O(k), where k is the number of lists, since the heap holds one item per list), and requires a bit more coding. It involves creating a min-heap that contains the first item from each list. You remove the lowest item from the heap, add it to the new list, and then take the next item from the list that the lowest item was from, and insert it into the heap.
Both of those algorithms are O(n log k) complexity, but the second is probably faster because it doesn't move data around as much. Which algorithm you want to use will depend on how large your lists are and how often you do the merge.

How to make my hashing algorithm faster

My question is connected with a task from CS50, pset5. For those who don't know it, I'll try to explain. Nothing very special: I need to write a function which takes a dictionary file (it was written beforehand, and all of the words in that file are uppercase) containing over 20K words, and sorts them somehow. I've made a simple and naive algorithm that builds a hash table and sorts the words depending on their first letters. I've passed all the CS50 checks, so my program works. But compared to the course's implementation it is too slow: the staff's version runs in 0.1s, while mine takes 5.0s - 7.0s. What can I improve in this code to make it faster? Or should I change everything completely? I have no experience in optimization, because I've just started learning. It would be great to learn from any of you =) Thanks in advance!
// Some constant values, which are declared before the function
#define LENGTH 46
#define ALPHALENGTH 26
/* Definition of node struct. Nothing special, in fact =) */
typedef struct node {
char word[LENGTH +1];
struct node *next;
} node;
node *hashTable[ALPHALENGTH];
bool load(const char *dictionary) {
FILE *f = fopen(dictionary, "r");
if (f == NULL) {
return false;
}
char word[LENGTH + 1];
int hash = 0;
for (int i = 0; i < ALPHALENGTH; i++) {
hashTable[i] = NULL;
}
// 46 - is LENGTH, but for some reason it is impossible
// to put variable`s name between quotation marks
while (fscanf(f, "%46s", word) == 1) {
// make every letter lowercase to get result from 0 - 25
hash = tolower(word[0]) - 'a';
node *new_node = malloc(sizeof(node));
strcpy(new_node->word, word);
// check if element is first in the list
if (hashTable[hash] == NULL) {
new_node->next = NULL;
hashTable[hash] = new_node;
} else {
node *ptr = hashTable[hash];
do {
if (ptr->next == NULL) {
break;
}
ptr = ptr->next;
} while (true);
ptr->next = new_node;
new_node->next = NULL;
}
}
fclose(f);
return true;
}
Your problem isn't your hash function; it's that your hash table is way too small.
From the sound of things, you have about 26 hash buckets for over 20,000 words. This places between 750 and 1000 words in each bucket. (Probably much more in some, as the hash function you're using is not uniform. There are very few words that start with x or q, for instance.)
Try expanding the hash table to 1000 entries (for instance), so that there are around 20 entries in each bucket. You will need a new hash function to do this; anything will work, but to work well it will need to generate values up to the size of the table. (Adding together the values of all the letters won't work, for instance, as it'll almost never reach 1000.)
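As one possible shape for such a function, here is a minimal sketch using the well-known multiply-by-33 string hash (often called djb2). The name hash_word and the TABLE_SIZE constant are assumptions made for this example, and the word is lowercased first so that lookups are case-insensitive.

```c
#include <ctype.h>
#include <stddef.h>

#define TABLE_SIZE 1000   /* the size suggested above, not a tuned value */

/* djb2-style hash over the lowercased word, reduced to a bucket index. */
size_t hash_word(const char *word) {
    unsigned long h = 5381;
    for (; *word != '\0'; word++)
        h = h * 33 + (unsigned long)tolower((unsigned char)*word);
    return (size_t)(h % TABLE_SIZE);
}
```

Every character contributes to the result, so words that share a first letter no longer pile up in the same bucket. The table would then be declared as node *hashTable[TABLE_SIZE].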
The problem is not in your hash function, nor in the size of your hash table, it is in your list management: your method for appending words to the corresponding lists has a complexity of O(N^2).
By the way, your hash function is not used for hashing, but for dispatching. You are sorting the table only on the first letter of each word, keeping the words with the same initial in the same order. If you meant to sort the dictionary completely, you would still need to sort each list.
You can drastically improve the performance while keeping the same semantics by prepending to the lists and reversing the lists at the end of the parsing phase.
For a dictionary with 20 thousand words, the code below runs 50 times faster (as expected by the CS50 site):
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LENGTH 46
#define ALPHALENGTH 26
typedef struct node {
struct node *next;
char word[LENGTH +1];
} node;
node *hashTable[ALPHALENGTH];
bool load(const char *dictionary) {
FILE *f = fopen(dictionary, "r");
if (f == NULL) {
return false;
}
char word[LENGTH + 1];
int hash = 0;
for (int i = 0; i < ALPHALENGTH; i++) {
hashTable[i] = NULL;
}
while (fscanf(f, "%46s", word) == 1) {
node *new_node = malloc(sizeof(node));
if (new_node == NULL) {
fclose(f); /* do not leak the file handle on failure */
return false;
}
// make every letter lowercase to get result from 0 - 25
hash = tolower(word[0]) - 'a';
strcpy(new_node->word, word);
/* prepending to the list */
new_node->next = hashTable[hash];
hashTable[hash] = new_node;
}
for (int i = 0; i < ALPHALENGTH; i++) {
node *n, *prev, *next;
/* reverse list */
for (prev = NULL, n = hashTable[i]; n != NULL; ) {
next = n->next;
n->next = prev;
prev = n;
n = next;
}
hashTable[i] = prev;
}
fclose(f);
return true;
}
void save(void) {
for (int i = 0; i < ALPHALENGTH; i++) {
for (node *n = hashTable[i]; n != NULL; n = n->next) {
puts(n->word);
}
}
}
int main(int argc, char *argv[]) {
if (argc > 1) {
if (load(argv[1]))
save();
}
}
Changing the fscanf() to a simpler fgets() might provide a marginal performance improvement, at the cost of more restrictive semantics for the dictionary format.

Selection sort with linked lists

I have the following data structure:
struct scoreentry_node {
struct scoreentry_node *next;
int score;
char name[1];
};
typedef struct scoreentry_node *score_entry;
I am trying to create a function that consumes my structure in order and arranges them in ascending order based on the name. I want to modify the input without allocating any memory or freeing anything:
I've tried your suggestions:
void selectionsort(score_entry *a) {
for (; *a != NULL; *a = (*a)->next) {
score_entry *minafteri = a;
// find position of minimal element
for (score_entry j = (*a)->next; j != NULL; j = j->next) {
if (strcmp(j->name, (*minafteri)->name) == -1) {
*minafteri = j;
}
}
// swap minimal element to front
score_entry tmp = *a;
a = minafteri;
*minafteri = tmp;
}
}
I'm testing the above code with the following:
score_entry x = add(8, "bob", (add( 8 , "jill", (add (2, "alfred", NULL)))));
iprint("",x);
selectionsort(&x);
iprint("", x);
clear(x); //Frees the whole list
iprint() prints the score and name fields in the struct. My add function is as follows:
score_entry add(int in, char *n, score_entry en) {
score_entry r = malloc(sizeof(struct scoreentry_node) + strlen(n));
r->score = in;
strcpy(r->name, n);
r->next = en;
return r;
}
I'm getting heap errors and my second print doesn't print the sorted list, it prints nothing. What am I doing wrong, and what can I do to fix it?
Besides passing the pointer by address (see comments below), you also need to fix the way you swap elements:
void selectionsort(score_entry *a) {
for (; *a != NULL; *a = (*a)->next)
{
score_entry *minafteri = a;
// find position of minimal element
for (score_entry j = (*a)->next; j != NULL; j = j->next) {
if (strcmp(j->name, (*minafteri)->name) == -1) {
*minafteri = j;
}
}
// swap minimal element to front
score_entry tmp = *a;
a = minafteri; // put the minimal node to current position
tmp->next = (*a)->next ; //fix the links
(*minafteri)->next=tmp; //fix the links
}
}
You have to pass the argument to selectionsort by reference:
void selectionsort(score_entry *a) {
for (; *a != NULL; *a = (*a)->next)
{
score_entry *minafteri = a;
// find position of minimal element
for (score_entry j = (*a)->next; j != NULL; j = j->next) {
if (strcmp(j->name, (*minafteri)->name) == -1) {
*minafteri = j;
}
}
// swap minimal element to front
score_entry tmp = *a;
a = minafteri;
*minafteri = tmp;
}
}
This code is hideous! Not only have you not provided us with all of the necessities to reproduce your problem (we can't compile this!), but you've hidden pointer abstractions behind typedef (also a nightmare for us). Generally speaking, one shouldn't even use linked lists in C anymore let alone sort them...
Nonetheless, there are two answers here.
*minafteri = j; found within your find loop actually modifies your list! Why should your find loop be modifying your list?
Answer: it shouldn't! By instead assigning minafteri = &j->next, you won't be modifying the list with your find loop...
Alternatively, you could perform the swap inside of that loop.
*minafteri = j; would need to swap the following, in this order:
(*minafteri)->next and j->next
*minafteri and j
Do you think that single line of code is capable of performing those two swaps? Well, it gets half way through one of them... and removes a heap of elements from your list in the process!
The following appears to be a faulty attempt at swapping elements:
score_entry *minafteri = a; // half of assigning `a` to `a`
/* SNIP!
* Nothing assigns to `minafteri` in this snippet.
* To assign to `minafteri` write something like `minafteri = fubar;` */
score_entry tmp = *a; // half of assigning `*a` to `*a`
a = minafteri; // rest of assigning `a` to `a`
*minafteri = tmp; // rest of assigning `*a` to `*a`
It's really just assigning *a to *a and a to a... Do you think you need to do that?
I'd have thought you'd notice that when you were creating your MCVE... Ohh, wait a minute! Shame on you!
Focus on swapping two nodes within a list as a smaller task. Once you've done that, consider taking on this task.
There are multiple problems in your code:
if (strcmp(j->name, (*minafteri)->name) == -1) { is incorrect: strcmp() does not necessarily return -1 when the first string is less than the second, it can return any negative value.
The way you adjust the links in order to move the lower entry is incorrect: you cannot update the link from the previous node to the one you move to the start. The list is corrupted.
Here is an improved version:
void selectionsort(score_entry *a) {
for (; *a != NULL; a = &(*a)->next) {
// find position of minimal element
score_entry *least = a;
for (score_entry *b = &(*a)->next; *b != NULL; b = &(*b)->next) {
if (strcmp((*b)->name, (*least)->name) < 0) {
least = b;
}
}
if (least != a) {
// swap minimal element to front
score_entry n = *least;
*least = n->next; /* unlink node */
n->next = *a; /* insert node at start */
*a = n;
}
}
}
Here is the Java Implementation of Selection Sort on Linked List:
Time Complexity: O(n^2)
Space Complexity: O(1) - Selection sort is In-Place sorting algorithm
class Solution
{
public ListNode selectionSortList(ListNode head)
{
if(head != null)
{
swap(head, findMinimumNode(head));
selectionSortList(head.next);
}
return head;
}
private void swap(ListNode x, ListNode y)
{
if(x != y)
{
int temp = x.val;
x.val = y.val;
y.val = temp;
}
}
private ListNode findMinimumNode(ListNode head)
{
if(head.next == null)
return head;
ListNode minimumNode = head;
for(ListNode current = head.next; current != null; current = current.next)
{
if(minimumNode.val > current.val)
minimumNode = current;
}
return minimumNode;
}
}

How do I sort a linked list of structures by one of the fields?

Wow now i know I dont. Lol.
I've got my structure like this:
struct Medico{
int Id_Doctor;
int Estado;
char Nombre[60]; ////focus on this part of the structure, this is name.
char Clave_Acceso[20];
char Especialidad[40];
struct Medico *next;
};
And I want to organize the structure depending on the name(alphabetical order..) any ideas on how to tackle this problem?
for example
Albert Haynesworth
Bob Marley
Carl Johnson
Thank you very much in advance. :)(C, Unix)
Implementing a mergesort over a linked list in C is quite easy:
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
struct node {
struct node *next;
char *data;
};
struct node *
divlist (struct node *n) {
int i = 0;
if (n) {
struct node *tail, *n2 = n;
while (1) {
n2 = n2->next;
if (!n2) break;
if (i++ & 1) n = n->next;
}
tail = n->next;
n->next = NULL;
return tail;
}
return NULL;
}
struct node *
mergelists(struct node *a, struct node *b) {
struct node *n;
struct node **last = &n;
if (!a) return b;
if (!b) return a;
while (1) {
if (strcmp(a->data, b->data) > 0) {
*last = b;
last = &b->next;
b = b->next;
if (!b) {
*last = a;
break;
}
}
else {
*last = a;
last = &a->next;
a = a->next;
if (!a) {
*last = b;
break;
}
}
}
return n;
}
struct node *
sortlist (struct node *n) {
struct node *tail = divlist(n);
if (!tail) return n;
return mergelists(sortlist(n), sortlist(tail));
}
int main(int argc, char *argv[]) {
int i;
struct node *n1, *n = NULL;
for (i = argc; --i >= 1;) {
n1 = (struct node *)malloc(sizeof(*n1));
n1->data = argv[i];
n1->next = n;
n = n1;
}
n1 = n = sortlist(n);
while (n1) {
printf("%s\n", n1->data);
n1 = n1->next;
}
return 0;
}
Note that you will have to modify this code to use your data structure and the right comparison!
C can't sort for you, nor maintain a sorted data structure. As others have suggested, you need to sort it yourself. I would do this when you create a new Medico, since inserting into a linked list is easy, and you can just find where it belongs as you iterate.
If Medico's order needs to be different, then you will need to sort the list whenever you display it. You'll probably want to iterate to pull out every name, then sort the resultant array using any of a number of techniques (depending on the size).
Assuming the list order is otherwise of no concern, keep it in order.
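Keeping the list in order at insertion time could be sketched like this. It assumes the struct from the question; insert_sorted is a hypothetical name, and the caller is expected to have allocated and filled in the new record already.

```c
#include <stddef.h>
#include <string.h>

struct Medico {
    int Id_Doctor;
    int Estado;
    char Nombre[60];
    char Clave_Acceso[20];
    char Especialidad[40];
    struct Medico *next;
};

/* Insert nuevo so that the list stays ordered by Nombre; returns the new head. */
struct Medico *insert_sorted(struct Medico *head, struct Medico *nuevo) {
    struct Medico **link = &head;
    /* Skip every node whose name sorts before or equal to the new one. */
    while (*link != NULL && strcmp((*link)->Nombre, nuevo->Nombre) <= 0)
        link = &(*link)->next;
    nuevo->next = *link;   /* splice the new node in front of the first larger name */
    *link = nuevo;
    return head;
}
```

Each insertion walks the list once, so building a list of n records this way costs O(n^2) comparisons in the worst case. That is fine for a small list, but sorting once with mergesort scales better for a large one.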
Sounds like you want to look at implementations of either quicksort or mergesort. The C standard library's qsort takes an array and not a linked list, so you may need to implement your own (although I'm pretty sure that you could find a readily available implementation on the interwebz if you did a quick search).
If you want to sort an array of structures, you can use the qsort function, see man qsort. It takes a base address of the array, number of elements, element size and comparing function:
int compare(const void *a, const void *b) {
const struct Medico *medA = a;
const struct Medico *medB = b;
return strcmp(medA->Nombre, medB->Nombre); /* order by name */
}
struct Medico *medicos = /* initialize */;
qsort(medicos, numberOfMedicos, sizeof(struct Medico), compare);
D’oh, just now I noticed the next-record pointer that probably makes this answer useless. (I’ve changed the question title to make the linked list apparent.) To make at least something from this answer, you can always copy the list into an array:
struct Medico *medicos = calloc(numberOfMedicos, sizeof(struct Medico));
struct Medico *current = /* first record in your linked list */;
int i = 0;
assert(current);
do {
medicos[i++] = *current;
current = current->next;
} while (current);
// Here you can sort the array.
free(medicos);
Of course, it depends on the number of records and other variables.
(My C is a bit rusty, feel free to fix.)
