I'm trying to implement quicksort in C.
I've done it before in Python, but I'm new to C and trying it out (please don't suggest I just use qsort()!)
What I don't understand is that since C doesn't handle arrays in the same way as Python, i.e. it can't pass them to and return them from functions, can only pass a pointer to one (or rather, the start of a space) in memory - how then can an array be used in a recursive function?
If my first call takes float array[], chooses a pivot, and sorts it, how can I then make successive calls for the lower and upper partitions, and glue them back together?!
Unless I'm mistaken, the glueing together requires an iteration through, since you can't assign to an array. But we can't do that, because we don't know how much memory we need on each call - and the spaces need to be different, because we still need the one higher (on earlier call)...
I've tried code, I've tried pen and paper, I just can't make this work - I understand recursion conceptually (and practically, in Python), I just can't see how to do this in C. I expect there's some functionality or syntax I just don't know about.
Grateful as ever.
As far as I know, most implementations of the Quicksort algorithm in C sort a given array
"in-situ", and do not return a new sorted array.
A very simple implementation that might be useful to understand the method is shown here:
http://en.wikibooks.org/wiki/Algorithm_Implementation/Sorting/Quicksort#C
As you can see, the function just passes the same array with modified begin/end
index to the recursively called function.
Arrays decay into pointers to the first element when passed to a function,
so all recursively called functions operate on (a part of)
the same original array.
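To make the "same array, different part" idea concrete, here is a minimal sketch for a float array. It is not the code from the linked page; the names quicksort_float and swap_float are made up for this illustration, and instead of begin/end indices it passes a pointer to the first element of the part to sort plus the number of elements in that part. Nothing is ever copied or glued back together: every call works directly on the caller's memory.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two floats through pointers into the caller's array. */
static void swap_float(float *a, float *b)
{
    float tmp = *a;
    *a = *b;
    *b = tmp;
}

/* Sort the n floats starting at a, in place. */
void quicksort_float(float *a, size_t n)
{
    size_t i, last = 0;

    assert(a != NULL || n == 0);
    if (n < 2)
        return;                        /* 0 or 1 element: nothing to do */

    swap_float(&a[0], &a[n / 2]);      /* move the pivot to the front */
    for (i = 1; i < n; i++)            /* partition around the pivot */
        if (a[i] < a[0])
            swap_float(&a[++last], &a[i]);
    swap_float(&a[0], &a[last]);       /* pivot lands in its final slot */

    quicksort_float(a, last);                     /* part before the pivot */
    quicksort_float(a + last + 1, n - last - 1);  /* part after the pivot */
}
```

A call such as quicksort_float(array, length) sorts the whole array; the recursive calls simply receive a different start address and a smaller count.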
Other implementations might replace the recursion by iteration or tail-recursion,
see for example
Is stdlib's qsort recursive?
which has links to real-world implementations.
Python variables are accessed using a handle to the variable allowing Python to manage variables by knowing their size and the data type and whether the variable is in scope or not or in use or not for garbage collection. In C source code a variable represents an actual memory location.
An array variable in Python is a handle to an array which Python stores in memory along with information about the array such as its size (number of elements), the type of the data stored in the array, etc. In C an array variable is basically a constant pointer to some memory location that contains the array elements. However there is no management data stored along with the array. The only thing in the array's memory location is the data for the array elements. The information about the array's size, data type, etc. is lost after compiling the C source code and is not available at run time.
In Python you can cut and splice and copy array and array pieces because the information about the array is available at run time. In C it is not so simple because the information about the array is not available after the source code is compiled.
The problem faced by the designer of a generalized qsort() function is that it must have an interface that allows the function to be used with a wide variety of arrays. And since the information about the array is not available at run time with C, the programmer must ask for the minimum information needed to make the qsort() function work for a wide variety of arrays.
What the programmer needs to know is the following basic information: (1) where does the array start, (2) what is the size of each element of the array, (3) how many elements are in the array, and (4) what is the comparison function to be used to determine the collating sequence to determine the order of two elements of the array.
The array is sorted in place. What that means is that you pass to the qsort() function the array and when qsort() returns the elements of the array have been sorted. This sorting is done by selecting two elements, comparing them with the comparison function provided, and then if needed swapping the two array elements. Remember that in C an array is basically a constant pointer to an area of memory. So the qsort() function is provided that address, where the array starts, and the caller expects that the array starting at that memory location is sorted when qsort() returns.
Since the comparison function is provided by the user of qsort(), the person writing the comparison function knows what the array elements looks like. The programmer uses the pointers provided by the qsort() function when it calls the comparison function to access the two array elements and decide the order of those two elements returning an indication as to which is lower in the collating sequence.
However, swapping array elements requires a temporary data area, which runs back into the same problem: when the qsort() function was written, its author did not know the size of the array elements. So the swap is usually done a byte at a time.
The end result of all of the various constraints due to the memory model of C is that a qsort() function will use recursion by specifying array index or array offset from the beginning of the array. So the qsort() function performs the sort on a subsection of the array by specifying the beginning and ending indices of the subsection. What is changing during the recursive function calls is just the index values.
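As a rough sketch of the points above (start address, element size, element count, comparison function, byte-wise swap, recursion on index ranges), something like the following would work. This is not how any particular library implements qsort(); generic_qsort, sort_range, swap_bytes and compare_doubles are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swap_bytes(char *a, char *b, size_t size)
{
    while (size-- > 0) {
        char tmp = *a;
        *a++ = *b;
        *b++ = tmp;
    }
}

/* Sort the elements with indices lo .. hi-1 of the array at base. */
static void sort_range(char *base, size_t size, size_t lo, size_t hi,
                       int (*cmp)(const void *, const void *))
{
    size_t i, last;

    if (hi - lo < 2)
        return;                          /* fewer than two elements */
    swap_bytes(base + lo * size, base + (lo + (hi - lo) / 2) * size, size);
    last = lo;
    for (i = lo + 1; i < hi; i++) {
        if (cmp(base + i * size, base + lo * size) < 0) {
            last++;
            swap_bytes(base + last * size, base + i * size, size);
        }
    }
    swap_bytes(base + lo * size, base + last * size, size);
    sort_range(base, size, lo, last, cmp);       /* only indices change */
    sort_range(base, size, last + 1, hi, cmp);
}

/* Same parameter list as the standard qsort(). */
void generic_qsort(void *base, size_t nmemb, size_t size,
                   int (*cmp)(const void *, const void *))
{
    assert(base != NULL || nmemb == 0);
    sort_range((char *)base, size, 0, nmemb, cmp);
}

/* Example comparison function for arrays of double. */
int compare_doubles(const void *pa, const void *pb)
{
    double a = *(const double *)pa;
    double b = *(const double *)pb;
    return (a > b) - (a < b);
}
```

For an array double v[5], the call would be generic_qsort(v, 5, sizeof v[0], compare_doubles).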
"The C Programming Language" has an implementation of qsort. I'll copy and paste the code directly:
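/* qsort: sort v[left]...v[right] into increasing order */
void qsort(int v[], int left, int right) {
    int i, last;
    void swap(int v[], int i, int j);   /* interchanges v[i] and v[j] */

    if (left >= right)                  /* fewer than two elements: done */
        return;
    swap(v, left, (left+right)/2);      /* move partition element to v[left] */
    last = left;
    for (i = left+1; i <= right; i++)   /* partition */
        if (v[i] < v[left])
            swap(v, ++last, i);
    swap(v, left, last);                /* restore partition element */
    qsort(v, left, last-1);
    qsort(v, last+1, right);
}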
The key here is that you specify the left and right bounds of the array with int left and int right that you want to process. That's how you deal with the "lower and upper partitions, and glue them back together" part you're confused with.
You call qsort initially with qsort(array,0,length-1), where length is the number of elements in array.
before you mark this as a duplicate please notice that I'm looking for a more general solution for arrays of arbitrary dimensions. I have read many posts here or in forums about making 2D or 3D arrays of integers but these are specific solutions for specific dimensions. I want a general solution for an array of any dimension.
First I need to have a type of intlist as defined below:
typedef struct{
int l; // length of the list
int * e; // pointer to the first element of the array
}intlist;
This actually fills the gap left by C treating arrays just as pointers. Using this type I can pass arrays to functions without worrying about losing the size.
Then, in the next step, I want to have an mdintlist for multidimensional dynamically allocated arrays. The type definition should be something like this:
typedef struct Mdintlist{
intlist d; // dimension of the array
/* second part */
}mdintlist;
There are several options for the second part. One option is to have a pointer to an mdintlist of lower dimension, like
struct Mdintlist * c;
The other option is to use void pointers:
void * c;
I don't know how to continue it from here.
P.S. one solution could be to allocate just one block of memory and then call the elements using a function. However I would like to call the elements in array form. something like tmpmdintlist.c[1][2][3]...
Hope I have explained clearly what I want.
P.S. This is an ancient post, but for those who may end up here some of my efforts can be seen in the Cplus repo.
You can't! You can only use the function option in C, because there is no way to alter the language semantics. In C++, however, you can overload the [] operator, and even though I would never do such an ugly thing (x[1][2][3] is already ugly, and if you continue adding "dimensions" it gets really ugly), I think it would be possible.
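For what it's worth, the "one block of memory plus an access function" option could look roughly like the sketch below. The type mdint and the functions mdint_init, mdint_at and mdint_free are hypothetical names for this illustration, not part of the question's code. Overflow of the total element count is not checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical N-dimensional int array stored in one contiguous block. */
typedef struct {
    size_t  ndim;   /* number of dimensions */
    size_t *dim;    /* dim[k] is the extent of dimension k */
    int    *data;   /* dim[0] * dim[1] * ... * dim[ndim-1] elements */
} mdint;

/* Allocate a zero-filled array; returns 0 on success, -1 on failure. */
int mdint_init(mdint *m, size_t ndim, const size_t *dim)
{
    size_t k, total = 1;

    assert(m != NULL && dim != NULL && ndim > 0);
    m->dim = malloc(ndim * sizeof *m->dim);
    if (m->dim == NULL)
        return -1;
    for (k = 0; k < ndim; k++) {
        assert(dim[k] > 0);
        m->dim[k] = dim[k];
        total *= dim[k];
    }
    m->data = calloc(total, sizeof *m->data);
    if (m->data == NULL) {
        free(m->dim);
        return -1;
    }
    m->ndim = ndim;
    return 0;
}

/* Pointer to the element at idx[0], idx[1], ..., idx[ndim-1]. */
int *mdint_at(const mdint *m, const size_t *idx)
{
    size_t k, offset = 0;

    for (k = 0; k < m->ndim; k++) {
        assert(idx[k] < m->dim[k]);
        offset = offset * m->dim[k] + idx[k];   /* row-major order */
    }
    return &m->data[offset];
}

/* Release the storage and reset the fields. */
void mdint_free(mdint *m)
{
    free(m->data);
    free(m->dim);
    m->data = NULL;
    m->dim = NULL;
    m->ndim = 0;
}
```

With size_t idx[3] = {1, 2, 3}, the expression *mdint_at(&m, idx) then plays the role of the m.c[1][2][3] notation asked for, and it works for any number of dimensions.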
Well, if you separate the pointers and the array lengths, you end up with much less code.
int *one_dem_array;
size_t one_dem_count[1];
int **two_dem_array;
size_t two_dem_count[2];
int ***three_dem_array;
size_t three_dem_count[3];
This way you can still use your preferred notation.
int num_at_pos = three_dem_array[4][2][3];
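The declarations above only give you the pointers; the memory behind them still has to be allocated level by level. A minimal sketch of that, assuming hypothetical helpers alloc_3d and free_3d (not part of the answer above), might be:

```c
#include <assert.h>
#include <stdlib.h>

/* Free everything alloc_3d() managed to allocate (NULL-safe). */
void free_3d(int ***a, const size_t count[3])
{
    size_t i, j;

    if (a == NULL)
        return;
    for (i = 0; i < count[0]; i++) {
        if (a[i] == NULL)
            continue;
        for (j = 0; j < count[1]; j++)
            free(a[i][j]);
        free(a[i]);
    }
    free(a);
}

/* Allocate a zero-filled count[0] x count[1] x count[2] array of int. */
int ***alloc_3d(const size_t count[3])
{
    size_t i, j;
    int ***a;

    assert(count[0] > 0 && count[1] > 0 && count[2] > 0);
    a = malloc(count[0] * sizeof *a);
    if (a == NULL)
        return NULL;
    for (i = 0; i < count[0]; i++)
        a[i] = NULL;                   /* so free_3d can clean up safely */

    for (i = 0; i < count[0]; i++) {
        a[i] = malloc(count[1] * sizeof *a[i]);
        if (a[i] == NULL)
            goto fail;
        for (j = 0; j < count[1]; j++)
            a[i][j] = NULL;
        for (j = 0; j < count[1]; j++) {
            a[i][j] = calloc(count[2], sizeof *a[i][j]);
            if (a[i][j] == NULL)
                goto fail;
        }
    }
    return a;

fail:
    free_3d(a, count);
    return NULL;
}
```

Note that this is a tree of separate allocations rather than one contiguous block, and that a separate pair of functions is needed for each number of dimensions.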
I am looking at a basic comparator function which arranges element in a specific order and I came across the qsort().
I see that the qsort() requires the length of an array and size of array element as an argument. Why are those two values size_t length and size_t item_size necessary?
void qsort(void *array, size_t length, size_t item_size,
int (*compar)(const void*, const void*));
Without the length of the array qsort wouldn't know how many elements to sort; without the length, in bytes, of a single element it wouldn't know how to construct a pointer to element at arbitrary position, and move elements around during the sort.
One simple way to understand why you need a particular parameter is to see how it is used in an implementation. Multiple implementations of qsort.c are available. You can pick one, and see how the parameters in question are used.
Look at qsort's prototype: it accepts a void* as the parameter for its data array. So the data could really be anything, and qsort has no way of knowing the size of the elements pointed to, as it would if you were able to pass a distinctly typed pointer like int*, where the elements could simply be referenced like a normal array.
This means you need to help it along by telling it how large those elements are, along side with the usual length parameter.
qsort's implementation will thus be able to calculate the offset from the address array to each of the length elements in the array, provide them to your comparator function and shuffle them around as applicable.
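A small example of how the two size arguments are typically supplied. The struct point type and the functions compare_weight and sort_points are made up for this illustration; only qsort itself is the standard library function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type, used only for this illustration. */
struct point {
    int    id;
    double weight;
};

/* Order points by ascending weight. */
static int compare_weight(const void *pa, const void *pb)
{
    const struct point *a = pa;
    const struct point *b = pb;
    return (a->weight > b->weight) - (a->weight < b->weight);
}

/* Sort `length` points in place. */
void sort_points(struct point *pts, size_t length)
{
    assert(pts != NULL || length == 0);
    if (length > 0)
        qsort(pts, length, sizeof pts[0], compare_weight);
}
```

Here length tells qsort how many elements there are, and sizeof pts[0] tells it how many bytes to step from one element to the next, which it cannot work out from the void* alone.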
I've written up a large collection of abstract data types (e.g. hash tables, heaps, etc.) and simple algorithms (e.g. search, sort, list manipulation, etc.) that all work on arrays of int. I've modified some of my algorithms to be like what I have below so I can generalize the code without re-writing the same algorithm for each and every data type I want to sort/compare:
void* bubbleSort(void* inputArray, void (*functionPtr)(void*,void*), size_t dataTypeSize, size_t numElements)
The idea is that I want to be able to sort an array of any arbitrary data type (e.g. custom structs), and to accommodate this, I cast the input array as a pointer-to-void, and the sorting algorithm requires a function pointer to a specific comparison function so it knows how to compare any two elements of the same data/struct type. So far so good.
The one thing I can't figure out is how to properly cast the array within the function so I can access a single element at a time. I'm trying to accomplish something like:
someRandomDataType* tempArray = (someRandomDataType*)inputArray;
However, I can't find any means of doing this at run time without the use of macros (which I'd like to avoid in this case if possible). It seems in my case, all I really need to be able to do is cast inputArray so that it is seen as some array with elements of an arbitrary size. Is there some way to cast a pointer-to-array so that, dynamically, it equates to:
(typeThatIsDataTypeSize*) tempArray = (typeThatIsDataTypeSize*)inputArray;
Where "dataTypeSize" refers to the size_t input value passed to the sorting function? Thanks!
Your bubble sort function already has all the information it needs: the size of each element (dataTypeSize, equal to sizeof the element type you are storing, and called size below) and the count of elements (numElements).
inputArray points to the very first element. When you want to move to the next element you simply increase the pointer by the size of the element.
void* second_element = ( char* )inputArray + size ;
size_t n = 123 ;
void* nth_element = ( char* )inputArray + size * n ;
This is how you index your array.
Moving elements is done with memcpy, with size as the last parameter. A swap is done by declaring temporary memory of size size and using memcpy to move the memory around.
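Putting those pieces together, a type-agnostic bubble sort might look like the sketch below. It is only an illustration under a few assumptions: the comparison function returns an int like qsort's comparator (negative, zero or positive) rather than void, and the names bubble_sort and compare_ints are made up.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sort numElements items of dataTypeSize bytes each, in place.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int bubble_sort(void *inputArray, size_t numElements, size_t dataTypeSize,
                int (*compare)(const void *, const void *))
{
    char *base = inputArray;       /* char* so arithmetic is in bytes */
    void *tmp;
    size_t pass, i;

    assert(inputArray != NULL && compare != NULL && dataTypeSize > 0);
    tmp = malloc(dataTypeSize);    /* scratch space for one element */
    if (tmp == NULL)
        return -1;

    for (pass = 1; pass < numElements; pass++) {
        for (i = 0; i + pass < numElements; i++) {
            char *a = base + i * dataTypeSize;   /* element i */
            char *b = a + dataTypeSize;          /* element i + 1 */
            if (compare(a, b) > 0) {             /* out of order: swap */
                memcpy(tmp, a, dataTypeSize);
                memcpy(a, b, dataTypeSize);
                memcpy(b, tmp, dataTypeSize);
            }
        }
    }
    free(tmp);
    return 0;
}

/* Example comparison function for arrays of int. */
int compare_ints(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    return (a > b) - (a < b);
}
```

For int v[6], the call would be bubble_sort(v, 6, sizeof v[0], compare_ints); no cast to a concrete element type is ever needed inside the sort.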
given the following function signature:
void readFileData(FILE* fp, double inputMatrix[][], int parameters[])
this doesn't compile.
and the corrected one:
void readFileData(FILE* fp, double inputMatrix[][NUM], int parameters[])
My question is: why does the compiler demand that the number of columns be defined when handling a 2D array in C? Is there a way to pass a 2D array with unknown dimensions to a function?
thank you
Built-in multi-dimensional arrays in C (and in C++) are implemented using the "index-translation" approach. That means that a 2D (3D, 4D, etc.) array is laid out in memory as an ordinary 1D array of sufficient size, and access to the elements of such an array is implemented by recalculating the multi-dimensional indices into a corresponding 1D index. For example, if you define a 2D array of size M x N
double inputMatrix[M][N]
in reality, under the hood the compiler creates an array of size M * N
double inputMatrix_[M * N];
Every time you access the element of your array
inputMatrix[i][j]
the compiler translates it into
inputMatrix_[i * N + j]
As you can see, in order to perform the translation the compiler has to know N, but doesn't really need to know M. This translation formula can easily be generalized for arrays with any number of dimensions. It will involve all sizes of the multi-dimensional array except the first one. This is why every time you declare an array, you are required to specify all sizes except the first one.
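If you want to check the translation formula yourself, a small helper like the one below can measure where an element actually sits. The sizes M = 3 and N = 4 and the function name element_offset are arbitrary choices for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum { M = 3, N = 4 };   /* hypothetical sizes for this illustration */

/* Distance of m[i][j] from the start of the matrix, counted in doubles.
   Note that the parameter type needs N but not M. */
size_t element_offset(double m[][N], size_t i, size_t j)
{
    const char *start = (const char *)m;
    const char *elem  = (const char *)&m[i][j];

    assert(j < N);
    return (size_t)(elem - start) / sizeof(double);
}
```

For a matrix declared as double a[M][N], element_offset(a, i, j) comes out as i * N + j, which is exactly the index the compiler computes.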
As an array in C is purely memory without any meta-information about its dimensions, the compiler needs to know how to apply the row and column indices when addressing an element of your matrix.
inputMatrix[i][j] is internally translated to something equivalent to *((double *)inputMatrix + i * NUM + j)
and here you see that NUM is needed.
C doesn't have any specific support for multidimensional arrays. A two-dimensional array such as double inputMatrix[N][M] is just an array of length N whose elements are arrays of length M of doubles.
There are circumstances where you can leave off the number of elements in an array type. This results in an incomplete type — a type whose storage requirements are not known. So you can declare double vector[], which is an array of unspecified size of doubles. However, you can't put objects of incomplete types in an array, because the compiler needs to know the element size when you access elements.
For example, you can write double inputMatrix[][M], which declares an array of unspecified length whose elements are arrays of length M of doubles. The compiler then knows that the address of inputMatrix[i] is i*sizeof(double[M]) bytes beyond the address of inputMatrix[0] (and therefore the address of inputMatrix[i][j] is i*sizeof(double[M])+j*sizeof(double) bytes). Note that it needs to know the value of M; this is why you can't leave off M in the declaration of inputMatrix.
A theoretical consequence of how arrays are laid out is that inputMatrix[i][j] denotes the same address as inputMatrix + M * i + j.¹
A practical consequence of this layout is that for efficient code, you should arrange your arrays so that the dimension that varies most often comes last. For example, if you have a pair of nested loops, you will make better use of the cache with for (i=0; i<N; i++) for (j=0; j<M; j++) ... than with loops nested the other way round. If you need to switch between row access and column access mid-program, it can be beneficial to transpose the matrix (which is better done block by block rather than in columns or in lines).
C89 references: §3.5.4.2 (array types), §3.3.2.1 (array subscript expressions)
C99 references: §6.7.5.2 (array types), §6.5.2.1-3 (array subscript expressions).
¹ Proving that this expression is well-defined is left as an exercise for the reader. Whether inputMatrix[0][M] is a valid way of accessing inputMatrix[1][0] is not so clear, though it would be extremely hard for an implementation to make a difference.
This is because in memory, this is just a contiguous area, a single-dimension array if you will. And to get the real offset of inputMatrix[x][y] the compiler has to calculate (x * elementsPerRow) + y, where elementsPerRow is the number of columns. So it needs to know elementsPerRow, and that in turn means you need to tell it.
No, there's not. The situation's pretty simple really: what the function receives is really just a single, linear block of memory. Telling it the number of columns tells it how to translate something like block[x][y] into a linear address in the block (i.e., it needs to do something like address = row * column_count + column).
Other people have explained why, but the way to pass a 2D array with unknown dimensions is to pass a pointer. The compiler demotes array parameters to pointers anyway. Just make sure it's clear what you expect in your API docs.
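A minimal sketch of the pointer approach, assuming the matrix is stored row by row in one block of rows * cols doubles. The function row_sum and its parameter names are hypothetical; the point is only that the function does the index arithmetic itself, using dimensions passed in at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of row r of a rows x cols matrix stored row by row in one block. */
double row_sum(const double *matrix, size_t rows, size_t cols, size_t r)
{
    size_t c;
    double sum = 0.0;

    assert(matrix != NULL && r < rows);
    for (c = 0; c < cols; c++)
        sum += matrix[r * cols + c];   /* manual [r][c] translation */
    return sum;
}
```

The caller allocates (or declares) a flat array of rows * cols doubles and passes it together with both dimensions, for example row_sum(data, 2, 3, 1) for the second row of a 2 x 3 matrix.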
I want to pass an array of arbitrary struct pointers and a comparison function to a generic sorting algorithm. Is this possible in C?
The internals of the structs would only be accessed within the comparison function; the sorting function would only need to call the comparison function and swap pointers, but I can't figure out how to declare it.
function sorter( struct arbitrary ** Array, int Length, int cmp(struct arbitrary * a, struct arbitrary * b))
{
    /* pseudocode: one pass over adjacent pairs */
    for (int i=0; i+1<Length; i++){
        if (cmp(Array[i],Array[i+1])){
            swap(Array[i],Array[i+1]);
        }
    }
}
You can declare the function as:
void sorter(void** the_array, size_t array_length, int (*comparison_function)(void*, void*));
Inside of the comparison function, you will then need to cast the two pointers being compared to pointers to whatever struct type the comparison function compares.
Actually, this function already exists... it is called qsort. See some documentation here. It's also more efficient than your implementation, which is O(n^2).
Maybe you need to pass just void pointers?
void sorter(void ** Array, int Length, int cmp(void * a, void * b))
It's always possible in C simply because you can convert every pointer to a void*. But, if you want to be able to convert that back to a pointer to an arbitrary structure, you'll need some sort of type identification.
You could do that by using a function specific to the type (if the things you are comparing are the same), or encode the type into the structure somehow. This can be done by having an extra field in the structure or by changing the cmp() function itself to take a type identifier.
But you should be aware that C already has a qsort() function that's usually reasonably efficient (although there's nothing in the standard that dictates what algorithm it uses - it could use bubble sort and still be conforming). Unless you're implementing one for homework, or have a different algorithm in mind, you should probably just use that.
Your algorithm, as it stands, looks like the inner loop of a bubble sort and, as such, won't actually sort correctly. Bubble sort consists of two nested loops and is generally only suitable for small data sets or those with specific characteristics (such as mostly sorted already).
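For illustration, here is a sketch of the complete two-loop version operating on an array of void pointers, using the void** style of declaration suggested in the answers. It assumes the comparison function returns a positive value when its first argument should come after its second; struct node and compare_nodes are made-up examples.

```c
#include <assert.h>
#include <stddef.h>

/* Bubble sort an array of pointers; only the pointers are moved. */
void sorter(void **the_array, size_t array_length,
            int (*comparison_function)(void *, void *))
{
    size_t pass, i;

    assert(the_array != NULL || array_length == 0);
    for (pass = 1; pass < array_length; pass++) {
        for (i = 0; i + pass < array_length; i++) {
            if (comparison_function(the_array[i], the_array[i + 1]) > 0) {
                void *tmp = the_array[i];        /* swap the two pointers */
                the_array[i] = the_array[i + 1];
                the_array[i + 1] = tmp;
            }
        }
    }
}

/* Hypothetical struct; only the comparison function looks inside it. */
struct node {
    int key;
};

/* Order nodes by ascending key. */
int compare_nodes(void *a, void *b)
{
    const struct node *na = a;
    const struct node *nb = b;
    return (na->key > nb->key) - (na->key < nb->key);
}
```

The array passed in should really be an array of void* (for example void *ptrs[] = {&n1, &n2, &n3};); casting a struct node ** to void ** usually works in practice but is not guaranteed by the standard.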