Sorting a list with qsort? - c

I'm writing a program in which you enter words via the keyboard or a file, and then they come out sorted by length. I was told I should use linked lists, because the length of the words and their number aren't fixed.
Should I use linked lists to represent words, like this?
struct node{
char c;
struct node *next;
};
And then how can I use qsort to sort the words by length? Doesn't qsort work with arrays?
I'm pretty new to programming.
Thank you.

I think there is a bigger issue than which sorting algorithm you should pick. The first is that the struct you're defining will not actually hold a list of words, but rather a list of single letters (in other words, a single word). Strings in C are represented as null-terminated arrays of characters, laid out like so:
| A | n | t | h | o | n | y | \0 |
This array would ideally be declared as char[8]: one slot for each letter, plus one slot for the null byte (a single byte of zeros in memory).
You probably know this already, but it's worth pointing out for clarity. With an array, you can jump straight to any character, and functions that operate on arrays can look at multiple bytes at a time to speed things up. With a linked list, you can only step from one character to the next, and every character also costs a pointer. This matters when you're trying to do something quickly on strings.
The more appropriate way to hold this information is in a style that is very C-like, and is what C++ uses for std::vector: automatically resized blocks of contiguous memory managed with malloc and realloc.
First, we set up a struct like this:
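struct sstring {
    char *data;    /* null-terminated character buffer */
    int logLen;    /* characters in use, not counting the '\0' */
    int allocLen;  /* bytes allocated for data */
};
typedef struct sstring string;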
And we provide some functions for these:
// mallocs a block of memory and stores its size in allocLen; returns 0 on success
int string_create(string *input);
// appends a character and moves the null terminator up;
// if there is no room left for the terminator (logLen + 1 == allocLen), realloc twice as much
int string_addchar(string *input, char c);
// frees the block of memory
void string_delete(string *input);
Now, this isn't great because you can't just read into an easy buffer using scanf, but you can use getchar() (or a similar function) to read single characters and place them into the string with string_addchar(), which avoids using a linked list. Because the capacity doubles each time it runs out, the string is reallocated only about log2(n) times for n characters, and since data is always null-terminated you can still pass it to the functions from the C string library. This helps a LOT with implementing your sorts.
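Here is a minimal sketch of those three functions, assuming the `string` typedef and the function names from the prototypes above. The starting capacity of 8, the `int` error codes, and the `string_readword()` helper (which reads one whitespace-separated word with `getc()`) are illustrative choices, not something the answer specifies.

```c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

struct sstring {
    char *data;    /* null-terminated character buffer */
    int logLen;    /* characters in use, not counting the '\0' */
    int allocLen;  /* bytes allocated for data */
};
typedef struct sstring string;

/* Allocate an empty string; returns 0 on success, -1 on failure. */
int string_create(string *input)
{
    input->allocLen = 8;  /* illustrative starting capacity */
    input->logLen = 0;
    input->data = malloc(input->allocLen);
    if (input->data == NULL)
        return -1;
    input->data[0] = '\0';
    return 0;
}

/* Append one character, doubling the buffer when it is full. */
int string_addchar(string *input, char c)
{
    /* always keep one byte free for the terminating '\0' */
    if (input->logLen + 1 == input->allocLen) {
        char *bigger = realloc(input->data, 2 * (size_t)input->allocLen);
        if (bigger == NULL)
            return -1;  /* the old buffer is still valid */
        input->data = bigger;
        input->allocLen *= 2;
    }
    input->data[input->logLen++] = c;
    input->data[input->logLen] = '\0';
    return 0;
}

void string_delete(string *input)
{
    free(input->data);
    input->data = NULL;
    input->logLen = input->allocLen = 0;
}

/* Hypothetical helper: read one whitespace-separated word into a new
   string. Returns 1 if a word was read, 0 at end of input or if the
   string could not be created. */
int string_readword(FILE *in, string *word)
{
    int ch;
    while ((ch = getc(in)) != EOF && isspace(ch))
        ;  /* skip leading whitespace */
    if (ch == EOF || string_create(word) != 0)
        return 0;
    do {
        if (string_addchar(word, (char)ch) != 0)
            break;
    } while ((ch = getc(in)) != EOF && !isspace(ch));
    return 1;
}
```

Each word read this way can then be appended to the array of strings described next.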
So now how do I actually implement a sort with this? You can create a similar type intended to hold entire strings in a similar manner, growing as necessary, to hold the input strings from the console. Either way, all your data now lives in contiguous blocks of memory that can be accessed as an array - because it is an array! For example, say we've got this:
struct stringarray {
string *data;
int logLen;
int allocLen;
};
typedef struct stringarray cVector;
cVector myData;
And similar functions as before: create, delete, insert.
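The key here is that you can implement your comparison functions on top of the data element, since it's JUST a C string: strcmp() for alphabetical order, or logLen (or strlen()) for the by-length order you want. Since the C standard library's qsort() takes the comparison function through a function pointer, all we have to do is write a small wrapper with the signature int (*)(const void *, const void *) for these types and pass its address in.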
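A sketch of such a wrapper, assuming the `string` and `cVector` types above (repeated here so the block compiles on its own); `compare_by_length` and `sort_by_length` are hypothetical names. It orders by `logLen`, which is the by-length order the question asks for, and falls back to `strcmp()` so words of equal length come out alphabetically.

```c
#include <stdlib.h>
#include <string.h>

struct sstring {
    char *data;
    int logLen;
    int allocLen;
};
typedef struct sstring string;

struct stringarray {
    string *data;   /* contiguous array of string structs */
    int logLen;     /* number of strings stored */
    int allocLen;
};
typedef struct stringarray cVector;

/* qsort passes pointers to two elements of the array, i.e. two string *. */
static int compare_by_length(const void *pa, const void *pb)
{
    const string *a = pa;
    const string *b = pb;
    if (a->logLen < b->logLen)
        return -1;
    if (a->logLen > b->logLen)
        return 1;
    return strcmp(a->data, b->data);  /* tie-break: alphabetical */
}

void sort_by_length(cVector *v)
{
    qsort(v->data, (size_t)v->logLen, sizeof(string), compare_by_length);
}
```

Only the `string` structs are moved by qsort; the character buffers they point to stay where they are.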

If you know how you want the items sorted, you should use an insertion sort when reading the data, so that once all the input has been entered, all you have to do is write the output. Using a linked list would be OK, though you'll find that it has O(N^2) performance, because each insertion may have to walk the whole list. If you store the input in a binary tree ordered by length (a balanced tree would be best), then your algorithm will have O(N log N) performance. If you're only going to do it once, then go for simplicity of implementation over efficiency.
Pseudocode:
list = new list
read line
while not end of file
len = length(line)
elem = head(list)
while (elem != null and len > length(elem->value))
elem = elem->next
end
insert line in list before elem (at the end of the list if elem is null)
read line
end
// at this point the list's elements are sorted from shortest to longest
// so just write it out in order
elem = head(list)
while (elem != null)
output elem->value
elem = elem->next
end
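The pseudocode above could be rendered in C roughly like this. It is a sketch that assumes each input line fits in a 256-byte buffer (`LINE_MAX_LEN` is an illustrative limit) and is read with `fgets()`; `insert_by_length`, `free_list`, and `sort_lines_by_length` are hypothetical names. Walking a pointer to the `next` link means inserting before the head needs no special case.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* illustrative limit for one input line */

struct node {
    char *value;
    struct node *next;
};

/* Insert a copy of line before the first element that is at least as long,
   so the list stays sorted from shortest to longest. Returns 0 on success. */
int insert_by_length(struct node **head, const char *line)
{
    size_t len = strlen(line);
    struct node **link = head;
    while (*link != NULL && len > strlen((*link)->value))
        link = &(*link)->next;

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = malloc(len + 1);
    if (n->value == NULL) {
        free(n);
        return -1;
    }
    memcpy(n->value, line, len + 1);  /* copies the '\0' too */
    n->next = *link;
    *link = n;
    return 0;
}

void free_list(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->value);
        free(head);
        head = next;
    }
}

/* Read lines from in, insert each in order, then print shortest first. */
void sort_lines_by_length(FILE *in, FILE *out)
{
    char buffer[LINE_MAX_LEN];
    struct node *list = NULL;
    while (fgets(buffer, sizeof buffer, in) != NULL) {
        buffer[strcspn(buffer, "\n")] = '\0';  /* drop the newline */
        if (insert_by_length(&list, buffer) != 0)
            break;
    }
    for (struct node *elem = list; elem != NULL; elem = elem->next)
        fprintf(out, "%s\n", elem->value);
    free_list(list);
}
```

Calling `sort_lines_by_length(stdin, stdout)` from `main` gives the read-then-print program the pseudocode describes.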

Yes, the standard C library function qsort() only works on an array, that is, a contiguous block of equally sized elements in memory.
Tvanfosson's advice is pretty good: as you build the linked list, you can insert elements at the correct position. That way, the list is always sorted.
I think the comment you made that you were told to use a linked list is interesting. A list can indeed be a good data structure in many instances, but it does have drawbacks; for example, it must be traversed to find elements.
Depending on your application, you may want to use a hash table. In C++ you could use std::unordered_set or std::unordered_map (or the older, non-standard hash_set and hash_map).
I would recommend you spend some time studying basic data structures. Time spent here will serve you well and put you in a better position to evaluate advice such as "use a linked list".

There are lots of ways to handle it... You can use arrays, via dynamic memory allocation, with realloc, if you feel brave enough to try.
The standard implementation of qsort, though, needs each element to be a fixed length, which would mean having an array-of-pointers-to-strings.
Implementing a linked list, though, should be easy, compared to using pointers to pointers.
I think what you were told was not to store each word as a list of characters, but to store the words in a linked list:
struct node {
    char *string;
    struct node *next;
};
Then, every time you read a string, all you have to do is add a new node into the list in its ordered place. (Walk the list until the current node's string is longer than the string you just read, and insert the new node before it.)
The problem of words not having a fixed length is common, and it's usually handled by storing the word temporarily in a buffer, and then copying it into an array of the proper length (dynamically allocated, of course).
Edit:
In pseudo code:
array = malloc(sizeof(char *))
array_size = 1
array_count = 0
while ((buffer = read word) != EOF):
    if (array_count == array_size)
        array_size = array_size * 2
        array = realloc(array, array_size * sizeof(char *))
    string_temp = malloc(strlen(buffer) + 1)
    strcpy(string_temp, buffer)
    array[array_count] = string_temp
    array_count++
qsort(array, array_count, sizeof(char *), comparison)
print array
Of course, that still needs a TON of polishing. Remember that array is of type char **, i.e. "a pointer to a pointer to char" (which you handle as an array of pointers). Since you're storing pointers, you can't just put buffer itself into the array: it is reused for every word, so each word has to be copied into its own allocation first.
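A compilable version of that pseudocode might look like the sketch below, sorting by length as the question asks. It assumes whitespace-separated words of at most 63 characters read with `fscanf("%63s", ...)`; the buffer size and the names `read_words`, `compare_length`, and `free_words` are illustrative, not from the answer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort hands the comparator pointers to array elements, i.e. char **. */
static int compare_length(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    size_t la = strlen(a), lb = strlen(b);
    return (la > lb) - (la < lb);
}

/* Read words from in into a growing char ** array, sort them by length,
   and store the count in *count. Returns NULL on allocation failure
   (anything read so far is freed). */
char **read_words(FILE *in, size_t *count)
{
    char buffer[64];
    size_t array_size = 1, array_count = 0;
    char **array = malloc(array_size * sizeof *array);
    if (array == NULL)
        return NULL;

    while (fscanf(in, "%63s", buffer) == 1) {
        if (array_count == array_size) {
            char **bigger = realloc(array, 2 * array_size * sizeof *array);
            if (bigger == NULL)
                goto fail;
            array = bigger;
            array_size *= 2;
        }
        char *copy = malloc(strlen(buffer) + 1);  /* +1 for the '\0' */
        if (copy == NULL)
            goto fail;
        strcpy(copy, buffer);
        array[array_count++] = copy;
    }
    qsort(array, array_count, sizeof *array, compare_length);
    *count = array_count;
    return array;

fail:
    while (array_count > 0)
        free(array[--array_count]);
    free(array);
    return NULL;
}

void free_words(char **array, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(array[i]);
    free(array);
}
```

Note that qsort() is not stable, so words of equal length may come out in any order; add a strcmp() tie-break in the comparator if that matters.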

You qsort a linked list by allocating an array of pointers, one per list element.
You then sort that array, where in the compare function you are of course receiving pointers to your list elements.
This then gives you a sorted list of pointers.
You then traverse the array of pointers and relink each element in turn, rearranging the order of the list to match the order of your array of pointers.
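A sketch of that technique for a singly linked list of words sorted by length; the `struct node` layout and the names `compare_nodes` and `sort_list_by_length` are assumptions for illustration. No string data moves: only the `next` links are rewritten to follow the sorted pointer array.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    char *string;
    struct node *next;
};

/* Each array element is a struct node *, so qsort passes struct node **. */
static int compare_nodes(const void *pa, const void *pb)
{
    const struct node *a = *(struct node *const *)pa;
    const struct node *b = *(struct node *const *)pb;
    size_t la = strlen(a->string), lb = strlen(b->string);
    return (la > lb) - (la < lb);
}

/* Sort the list by string length and return the new head. If the
   temporary array cannot be allocated, the list is returned unchanged. */
struct node *sort_list_by_length(struct node *head)
{
    size_t n = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        n++;
    if (n < 2)
        return head;

    struct node **arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return head;
    size_t i = 0;
    for (struct node *p = head; p != NULL; p = p->next)
        arr[i++] = p;

    qsort(arr, n, sizeof *arr, compare_nodes);

    /* relink the nodes in the order of the sorted pointer array */
    for (i = 0; i + 1 < n; i++)
        arr[i]->next = arr[i + 1];
    arr[n - 1]->next = NULL;

    head = arr[0];
    free(arr);
    return head;
}
```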

Related

Insertion Sort of an array of strings in C

For one of my assignments I have to have a user input a list of names and while they are inputting them, they have to be sorted alphabetically as they go. I was wondering (1) when declaring an array of strings which is best to use:
char test[10][10];
or
char *test[10];
and (2) the best way to write an insertion method. I know how to write an insertion sort, and there are many examples of it online, but they mainly deal with plain 1D arrays, so I'm a little lost on how to do this. Thanks!
The declarations you show are very different. The first is an array of arrays of char, and the second is an array of pointers to char (also known as a jagged array).
Both could be treated similarly, like arrays of strings, but there are quite a few semantic differences. For example, in the first your strings are limited to nine characters (plus the terminator), while in the second the strings could be of any length (that fits in memory).
There's also a difference in how the two arrays decay (what happens when you use plain test where a pointer is expected). The first will decay to a pointer to an array of char, i.e. char (*)[10]. The second will decay to a pointer to pointer to char, i.e. char **.
Now for the big question: which should you use? Well, that really depends on the use case. Will you have only fixed-size strings where the size is known from the start (and the total size is small enough to fit on the stack, where local variables are normally stored)? Then you can use the first. If you don't know the length of the strings, or if they could differ by more than a few characters, then the second is probably a better choice.
The second question depends a lot on the choice of arrays. If you have arrays of arrays (the first declaration), then you need to copy strings around using strcpy(). If you choose an array of pointers (the second declaration), you can just assign the pointers around.
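For the array-of-pointers declaration, the insertion step is the same shifting loop as a 1D insertion sort, except that the elements being moved are pointers and the comparison is `strcmp()`. A minimal sketch, assuming the caller keeps the element count and the strings outlive the array; `insert_sorted` is a hypothetical name:

```c
#include <string.h>

/* Insert name into names[0..*count-1], keeping alphabetical order.
   Returns 0 on success, or -1 if the array is already full. */
int insert_sorted(char *names[], size_t *count, size_t capacity, char *name)
{
    if (*count == capacity)
        return -1;
    size_t i = *count;
    /* shift greater entries one slot right, as in insertion sort */
    while (i > 0 && strcmp(names[i - 1], name) > 0) {
        names[i] = names[i - 1];
        i--;
    }
    names[i] = name;  /* only the pointer is stored; no copying */
    (*count)++;
    return 0;
}
```

With `char test[10][10]` the same loop works, but each move becomes `strcpy(test[i], test[i - 1])` and the final store becomes a `strcpy()` of the new name.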
I don't want to solve assignments here, so I'll just give a brief push into the right direction:
What you want is a linked list; then, whenever the user enters a new name, you can insert the new entry directly into it at the correct position.
A first start could look like this:
#include <stdio.h>
#include <stdlib.h>
struct entry { char name[10]; struct entry *next; };
struct entry *root = NULL;
void addname(const char *na) {
    if (root == NULL) {
        root = malloc(sizeof(struct entry));
        if (root == NULL)
            return;
        snprintf(root->name, sizeof root->name, "%s", na);
        root->next = NULL;
    } else {
        // HERE, walk through all entries! Once you reach one whose next entry is
        // lexicographically greater than na, create a new entry and link it
        // into that position of the chain
    }
}

How to store a sequence of numbers whose size is not known in advance in C?

I am working on a standard deviation program in C and am having difficulty with the intended input.
I must accept an unknown number of floats and I am not sure how to go about storing them and allocating memory for them.
Sample input:
82.5
1000.6699
10
11.11
-45
#
Any advice is appreciated.
New user, sorry for little mistakes
You can allocate an array to hold your values, and you can use realloc() to grow that array.
Because realloc() has some overhead, I would probably allocate enough memory for, maybe, 16 values. And when you fill it, then resize it to hold up to an additional 16 values and so on. This way, your code doesn't resize the memory for every value.
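A sketch of that approach for the sample input, assuming whitespace-separated values terminated by `#` or end of file. It grows the array 16 values at a time as suggested; `read_values` is a hypothetical name, and it uses `double` where `float` with `%f` would work the same way. The read loop stops at `#` because `fscanf()` cannot parse it as a number.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 16  /* grow the array 16 values at a time */

/* Read numbers from in until something that is not a number (such as '#')
   or end of file. Returns the array (the caller frees it) and stores the
   count in *count. Returns NULL if nothing was read or allocation failed. */
double *read_values(FILE *in, size_t *count)
{
    size_t n = 0, capacity = 0;
    double *values = NULL;
    double x;

    *count = 0;
    while (fscanf(in, "%lf", &x) == 1) {
        if (n == capacity) {
            /* realloc(NULL, size) behaves like malloc(size) */
            double *bigger = realloc(values, (capacity + CHUNK) * sizeof *values);
            if (bigger == NULL) {
                free(values);
                return NULL;
            }
            values = bigger;
            capacity += CHUNK;
        }
        values[n++] = x;
    }
    *count = n;
    return values;
}
```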
There are really only two ways. The first is to define a struct:
typedef struct element {
    float value;
    struct element *next;
} element;
Then you have what's called a linked list. You can access the nth element by following the next pointers through the element structs, and you know you've reached the end when an element's next member is a null pointer (dereferencing it would be undefined behavior, typically a segfault or nonsense).
The second way is to "play it safe" and define a fixed-length array that is the maximum size you will need. Something like:
float my_array[65535];
This is advantageous because arrays are much faster than linked lists (you don't have to iterate to access the nth element) but if the array length is highly variable this can allocate much more memory than is necessary. It's up to you which you prefer.

Simulating array_pop for various structures in C

I am new to C and am working on a project where I can't work out how to do what is needed.
I have 2 different struct arrays; they are defined completely differently, and I am trying to do the same thing PHP's array_pop would do, i.e. remove the last element of the array.
I know I could create 2 separate functions, one for each structure type, but that is obviously not the best idea, so I am wondering whether I can pass either structure type to one function, possibly with a flag that determines which type of structure it should be cast to.
My structures are defined as follows:
typedef struct CallLogSearchResultStruct
{
long date;
int dRowIndex;
} callLogSearchResultStruct;
typedef struct CallLogSearchDataStruct
{
char * date;
char * time;
char * bParty;
char * aParty;
float duration;
char * cleardownCause;
struct CallLogSearchOutboundStruct * outboundLegs;
} callLogSearchDataStruct;
Below is how the structures are initialised
callLogSearchData = calloc(numRows, sizeof(callLogSearchDataStruct));
callLogSearch = calloc(numRows, sizeof(callLogSearchResultStruct));
numRows being the number of structs to contain within the array.
Below is how I am using the structures
callLogSearchData[dataRow].aParty = NULL;
callLogSearchData[dataRow].bParty = NULL;
callLogSearchData[dataRow].cleardownCause = NULL;
callLogSearchData[dataRow].date = NULL;
callLogSearchData[dataRow].time = NULL;
callLogSearchData[dataRow].outboundLegs = NULL;
Apologies if this has a simple, straightforward answer. I can't find anything on Google, although I'm not entirely sure what this would be called, so maybe I'm using the wrong keywords.
Thanks for any help you can provide.
What do you mean by "remove"? How are the arrays allocated?
If you have an array created by a declaration such as:
struct foo my_foos[123];
there is nothing you can do to change the fact that my_foos is 123 elements long. You can of course select to ignore some of them by having a separate size_t foo_count variable that you maintain.
Arrays in C are not generally dynamic (unlike lists/arrays in many more high-level languages). You can implement a dynamic array using malloc(), which is not too hard but it's unclear if that's what you've done.
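One way to get a single pop function for both struct types, sketched under the assumption that each array has a separately maintained element count (as suggested above): work on raw bytes through `void *` and pass the element size instead of a type flag. `array_pop` is a hypothetical name, and only the smaller struct is repeated here for the example. The memory is not shrunk; the last slot simply stops being counted.

```c
#include <stddef.h>
#include <string.h>

/* Copy the last element into out (if out is not NULL) and reduce the
   logical count by one. Works for any element type because it only needs
   the element size. Returns 0, or -1 if the array is empty. */
int array_pop(void *base, size_t *count, size_t elem_size, void *out)
{
    if (*count == 0)
        return -1;
    (*count)--;
    if (out != NULL)
        memcpy(out, (char *)base + *count * elem_size, elem_size);
    return 0;
}

typedef struct CallLogSearchResultStruct
{
    long date;
    int dRowIndex;
} callLogSearchResultStruct;
```

Assuming the count is kept in a `size_t`, the call for the other array is `array_pop(callLogSearchData, &count, sizeof *callLogSearchData, &last)`. Popping a `callLogSearchDataStruct` copies its `char *` members into `last`, so free them through `last` (or free them before popping with `out` set to NULL) to avoid a leak.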
If you're open to using external files, have a look at utarray:
It's a collection of macros stored in one header that allow what you're searching for. No need to link an additional library, just #include the file and you have what you need.
You'd have to implement a custom UT_icd providing functions to init, copy and free the elements stored in the array.
What you want is actually a linked list. It is a collection of structures, each one holding one element and pointing to the next structure in the list. That way you can easily remove any element by unlinking it from the chain. You can google for a linked list library in C, or implement one yourself (it's a good exercise).
Arrays in C are static memory ranges with only enough space for your elements, nothing more. In general you cannot remove one element. You can, however, use the realloc() function to resize an array that was allocated with malloc() or calloc(), as yours are.
For what you're trying to do I'd go for a linked list.

Check if an index in a struct array is empty or not in C

I'm not really a fan of C, but I did my homework for this exercise, though. So far, what I've found is that initializing an array in C is not like in JavaScript: C arrays have a fixed size, and their elements are not initialized to any particular value. So NULL checking won't work in this case.
I have an array of structures. How would I know if that index in an array is empty or not (filled with a struct or not)?
#define LIST_LENGTH 30
//This is the struct that is inserted in the array
typedef struct node{
char fName[30];
char mName[30];
char lName[30];
char id[8];
} NODE;
typedef struct {
int size; //size is the struct's total capacity (at 30)
int length; //tracks how many elements are added, but not where
NODE nodes[LIST_LENGTH]; //This is the array in question
} List;
//somewhere in my code, I have to insert a value to the array at a specific position.
//if that position is occupied, I have to find the nearest empty position
//to the right, and shift the values rightward for that spot to be empty
Also, we are constrained to using arrays for this exercise. If we were allowed to use linked lists, this would be a walk in the park, since we already know how to use dynamic lists.
How do I go about it? Or am I looking at the problem at the wrong angle (besides having to use arrays instead of linked-lists)?
One option would be to use some kind of sentinel value in your struct. For example, you could check if the id field is zero length, which would indicate an unoccupied spot in the array.
The downside is that you have to initialize all the elements properly when you create the array. You would also have to reset the sentinel value if you "remove" an element from the array.
As mentioned in one of the other answers, you could also change to have an array of pointers to the structures, in which case you could directly check for NULL.
Arrays in C do not have positions that are empty. If the array exists, all the elements in it exist.
An element might not be initialized, but there is no general way to determine that, except by tracking it yourself in your program. E.g., as soon as the array is allocated, initialize everything in it. Or maintain a number N indicating that the first N elements of the array have been initialized.
If you want to know whether each individual element has been initialized or not, you must maintain that information yourself, either in a separate array or by adding a flag to the structure, so that each element has its own flag saying whether the rest of the structure in that element has been initialized. You will, of course, need to initialize these flags.
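A sketch of the insertion the question describes, using a separate array of flags as suggested here. The `used` member, `list_init`, and `list_insert_at` are additions for illustration, not part of the original `List`. If the target slot is taken, the function finds the nearest free slot to the right and shifts the occupied run one place right to make room.

```c
#include <string.h>

#define LIST_LENGTH 30

typedef struct node {
    char fName[30];
    char mName[30];
    char lName[30];
    char id[8];
} NODE;

typedef struct {
    int size;                /* total capacity (LIST_LENGTH) */
    int length;              /* number of occupied slots */
    NODE nodes[LIST_LENGTH];
    int used[LIST_LENGTH];   /* added: 1 if nodes[i] holds data, else 0 */
} List;

void list_init(List *l)
{
    memset(l, 0, sizeof *l);  /* every used[i] starts at 0 */
    l->size = LIST_LENGTH;
}

/* Insert item at pos. If pos is occupied, shift pos..free_slot-1 one slot
   right into the nearest free slot. Returns 0, or -1 if pos is out of
   range or there is no free slot at or to the right of pos. */
int list_insert_at(List *l, int pos, const NODE *item)
{
    if (pos < 0 || pos >= l->size)
        return -1;
    int free_slot = pos;
    while (free_slot < l->size && l->used[free_slot])
        free_slot++;
    if (free_slot == l->size)
        return -1;
    for (int i = free_slot; i > pos; i--) {
        l->nodes[i] = l->nodes[i - 1];  /* struct assignment copies all fields */
        l->used[i] = 1;
    }
    l->nodes[pos] = *item;
    l->used[pos] = 1;
    l->length++;
    return 0;
}
```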
Add a 'set' or 'valid' field to your NODE typedef, and each time you insert a NODE into the List, set that field to 1, for example. This way you can always tell whether an array element is valid.
I have an array of structures. How would I know if that index in an array is empty (not filled with a struct)?
What you can do is either add a flag to the structure, isInitialized, to store whether it has been filled or not
//This is the struct that is inserted in the array
typedef struct node{
char fName[30];
char mName[30];
char lName[30];
char id[8];
int isInitialized;
} NODE;
and initialize all its instances within the array to 0.
Or you can initialize the structure with an illegal or "useless" value (e.g. all strings to length zero, or a special ID).
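int isInitialized(const NODE *s)
{
    /* Since C strings are zero-terminated, char id[8] holds at most a
       seven-char string followed by a binary zero. It can never
       normally be a sequence of eight 0xFF bytes. */
    static const unsigned char unused[8] = {
        0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF
    };
    return memcmp(s->id, unused, sizeof unused) != 0;
}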
// You still have to manually mark nodes free at the beginning.
void initialize(NODE *s)
{
memset(s->id, 0xFF, 8);
}
if (isInitialized(&(myList->nodes[15])))
{
...
}
One caveat to the above code is that "id" can no longer safely be printed without an initialization check: otherwise printf() could fail to find the terminating zero and keep reading, and in the case of the last structure it could run past the boundaries of accessible memory and cause a protection-fault crash. One could argue, however, that since it does not make sense to print an uninitialized structure (where the terminating zero might be missing anyway), such a check would have had to be performed regardless.
Or you could keep a counter of how many structures have been used so far (this assumes that you never mark as available a structure "in the middle" of the array).
If you have an array of pointers to structures, then you will be able to store NULL in the pointers to not-yet-initialized structures (i.e., the pointer array is allocated, the structures it points to are not yet necessarily so); but here you preallocate the structures, so you have to do it differently.

Space efficient trie

I'm trying to implement a space efficient trie in C. This is my struct:
struct node {
char val; //character stored in node
int key; //key value if this character is an end of word
struct node* children[256];
};
When I add a node, its index is the unsigned char cast of the character. For example, if I want to add "c", then
children[(unsigned char)'c']
is the pointer to the newly added node. However, this implementation requires me to declare a node* array of 256 elements. What I want to do is:
struct node** children;
and then when adding a node, just malloc space for the node and have
children[(unsigned char)'c']
point to the new node. The issue is that if I don't malloc space for children first, then I obviously can't reference any index or else that's a big error.
So my question is: how do I implement a trie such that it only stores the non-null pointers to its children?
You could try using a de la Briandais trie, where you only have one child pointer for each node, and every node also has a pointer to a "sibling", so that all siblings are effectively stored as a linked list rather than directly pointed to by the parent.
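A minimal sketch of a de la Briandais trie, keeping the `val` and `key` fields from the question; the `NO_KEY` marker and the names `find_child`, `trie_insert`, and `trie_find` are illustrative assumptions. A node with three children uses three sibling links instead of 256 pointers, at the cost of a linear scan of the siblings at each level.

```c
#include <stdlib.h>

#define NO_KEY (-1)  /* illustrative marker: this node ends no word */

struct node {
    char val;              /* character stored in node */
    int key;               /* key value if this character ends a word */
    struct node *child;    /* first child */
    struct node *sibling;  /* next node at the same level */
};

/* Find the node for c in the sibling list starting at *link, creating it
   at the front of that list if create is nonzero. */
static struct node *find_child(struct node **link, char c, int create)
{
    for (struct node *p = *link; p != NULL; p = p->sibling)
        if (p->val == c)
            return p;
    if (!create)
        return NULL;
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->val = c;
    n->key = NO_KEY;
    n->child = NULL;
    n->sibling = *link;
    *link = n;
    return n;
}

/* *root is the first top-level node (NULL for an empty trie).
   Returns 0 on success, -1 on allocation failure or an empty word. */
int trie_insert(struct node **root, const char *word, int key)
{
    struct node **link = root;
    struct node *n = NULL;
    for (; *word != '\0'; word++) {
        n = find_child(link, *word, 1);
        if (n == NULL)
            return -1;
        link = &n->child;
    }
    if (n == NULL)
        return -1;  /* empty word */
    n->key = key;
    return 0;
}

/* Returns the key stored for word, or NO_KEY if it is not present. */
int trie_find(struct node *root, const char *word)
{
    struct node *n = NULL;
    for (; *word != '\0'; word++) {
        n = find_child(&root, *word, 0);
        if (n == NULL)
            return NO_KEY;
        root = n->child;
    }
    return n != NULL ? n->key : NO_KEY;
}
```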
You can't really have it both ways and be both space efficient and have O(1) lookup in the children nodes.
When you only allocate space for the entries that are actually added, and not the null pointers, you can no longer do
children[(unsigned char)'c']
because you can no longer index directly into the array.
One alternative is to simply do a linear search through the children and store an additional count of how many entries the children array has, i.e.
children[(unsigned char)'c'] = ...;
has to become
for (i = 0; i < len; i++) {
    if (children[i]->val == 'c')
        break;
}
if (i == len) {
    // ...reallocate children with room for one more item, then len++
}
children[i] = ...;
If your tree ends up with a lot of non-empty entries at one level, you might insert the children in sorted order and do a binary search. Or you might store the children as a linked list instead of an array.
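A sketch of the linear-search variant, assuming each node keeps a `children` array of exactly `count` pointers that grows with `realloc()` one slot at a time, as described above; `get_child` is a hypothetical name. Keeping `children` sorted by `val` would allow a binary search instead.

```c
#include <stdlib.h>

struct node {
    char val;               /* character stored in node */
    int key;                /* key value if this character ends a word */
    int count;              /* number of entries in children */
    struct node **children; /* only the non-null child pointers */
};

/* Return the child of parent holding c, appending a new child if it is
   absent and create is nonzero. Returns NULL if not found or on
   allocation failure. */
struct node *get_child(struct node *parent, char c, int create)
{
    for (int i = 0; i < parent->count; i++)
        if (parent->children[i]->val == c)
            return parent->children[i];
    if (!create)
        return NULL;

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    struct node **bigger = realloc(parent->children,
        (size_t)(parent->count + 1) * sizeof *parent->children);
    if (bigger == NULL) {
        free(n);
        return NULL;
    }
    n->val = c;
    n->key = 0;
    n->count = 0;
    n->children = NULL;
    parent->children = bigger;
    parent->children[parent->count++] = n;
    return n;
}
```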
If you just want to do an English keyword search, I think you can reduce the children array from 256 entries to just 26, enough to cover the 26 letters a-z.
Furthermore, you can store the children in a linked list, so that each node only holds the children it actually has, which also makes iterating over them cheaper.
I haven't gone through the libraries yet, but I think an existing trie implementation will help.
You can be both space efficient and keep (expected) constant lookup time by making the children of every node a hash table of nodes. Especially when Unicode characters are involved and the set of characters in your dictionary is not limited to 52 plus a few, this becomes more of a requirement than a nicety. This way you keep the advantages of a trie and are time and space efficient at the same time.
I must also add that if the character set you are using is approaching unbounded, chances are having a linked list of nodes may just do fine. If you like an unmanageable nightmare, you can opt for a hybrid approach where first few levels keep their children in hash tables while the lower levels have a linked list of them. For a true bug farm, opt for a dynamic one where as each linked list passes a threshold, you convert it to a hash table on the fly. You could easily amortize the cost.
Possibilities are endless!
