Time-complexity counter for mergesort is always constant - C

I wanted to try a C implementation, but the results were strange. For the mergesort algorithm, the complexity count was exactly the same every time the program finished running for a given array length, even though the elements were pseudo-random thanks to rand(). I know it is not easy to understand the "problem" this way, so this is the code that I have written/copied from the internet:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define DIM 1000000
void merge(int*, int, int, int);
void mergesort(int*, int, int);
int complexity=0; //Complexity is the variable that counts how many times the program goes through the relevant cycles
int main(int argc, char *argv[]){
static int v[DIM]; //static: a 4 MB local array can overflow the default stack
int i;
time_t t;
srand((unsigned)time(&t));
for(i=0; i<DIM; i++){
v[i]=rand()%100;
printf("%d\n", v[i]);
}
mergesort(v, 0, DIM-1);
for(i=0; i<DIM; i++)
printf("%d\t", v[i]);
printf("\n");
printf("iterations: %d\n", complexity);
return 0;
}
void mergesort(int v[], int lo, int hi){
int mid;
if(lo<hi){
mid=(lo+hi)/2;
mergesort(v, lo, mid);
mergesort(v, mid+1, hi);
merge(v, lo, mid, hi);
}
return;
}
//This implementation is not actually mine, I got it from a site because I couldn't figure out why mine wouldn't run and order the input
void merge(int v[], int p, int q, int r) {
int n1, n2, i, j, k, *L, *M;
n1 = q - p + 1;
n2 = r - q;
//creation and initialization of the left and right vectors
L=(int*)malloc(n1*sizeof(int));
M=(int*)malloc(n2*sizeof(int));
for (i = 0; i < n1; i++){
L[i] = v[p + i];
complexity++;
}
for (j = 0; j < n2; j++){
M[j] = v[q + 1 + j];
complexity++;
}
//merging section
i = 0;
j = 0;
k = p;
while (i < n1 && j < n2) {
if (L[i] <= M[j]) {
v[k] = L[i];
i++;
complexity++;
} else {
v[k] = M[j];
j++;
complexity++;
}
k++;
}
//from what I understood this should be the section where what is left out gets copied inside the remaining spots
while (i < n1) {
v[k] = L[i];
i++;
k++;
complexity++;
}
while (j < n2) {
v[k] = M[j];
j++;
k++;
complexity++;
}
free(L); //release the temporary arrays, otherwise every call leaks memory
free(M);
return;
}
I'll also leave an image of a few trials I did with various sorting algorithms below.
Here is the question: is it normal for the variable that counts the time complexity to be constant? My initial thought was that it was due to a bad implementation of the counter, but I have no idea how to prove it, simply because my knowledge of the algorithm is not that strong.
If that ends up being the correct answer, can you point me towards an implementation (not too complicated but still functional) of a counter that evaluates the time complexity more precisely?
Edit: Columns A to I of the Excel screenshot I uploaded correspond to the lengths of the randomly generated arrays; the values are: 100, 500, 1000, 5000, 10000, 50000, 1000000.

No matter the contents of the arrays, the merge function increments complexity by 2(r-p+1). The merge function is the only part of the code that depends on the contents of the array; mergesort itself only splits index ranges. This shows that complexity is incremented a fixed number of times overall, determined only by the length of the array.
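Summing that per-call count over the whole recursion gives the fixed total. As a sketch, let T(n) be the amount added to complexity when sorting n elements (T is notation introduced here, not in the question). mergesort splits n elements into halves of ⌈n/2⌉ and ⌊n/2⌋ elements, and the final merge adds 2n:

```latex
T(1) = 0, \qquad T(n) = T\left(\lceil n/2 \rceil\right) + T\left(\lfloor n/2 \rfloor\right) + 2n \quad (n \ge 2)

T(n) = 2\left(n \lceil \log_2 n \rceil - 2^{\lceil \log_2 n \rceil} + n\right), \qquad T(2^k) = 2 \cdot 2^k \cdot k
```

Nothing in this recurrence depends on the values stored in v. That is why the counter never changes for a fixed DIM.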
Here's a sketch of why the merge function increments complexity by 2(r-p+1):
This block increments it by n1:
for (i = 0; i < n1; i++){
L[i] = v[p + i];
complexity++;
}
This block by n2:
for (j = 0; j < n2; j++){
M[j] = v[q + 1 + j];
complexity++;
}
In the remaining code, i and j start at 0, and at each step either i or j is incremented by 1 and complexity increased. When the function returns, i is n1 and j is n2, so this adds another n1+n2 to complexity.
Since n1 = q-p+1 and n2 = r-q, overall the merge function increments complexity by 2*n1 + 2*n2 = 2(q-p+1+r-q) = 2(r-p+1).

Related

What is the better way to write merge sort: a recursive function or a non-recursive one?

I'm reading about merge sort and I found two kinds of functions.
The first one uses recursion, like this:
#include <stdio.h>
void merge(array, low, mid, high) {
int temp[MAX];
int i = low;
int j = mid + 1;
int k = low;
while ((i <= mid) && (j <= high)) {
if (array[i] <= array[j])
temp[k++] = array[i++];
else
temp[k++] = array[j++];
}/*End of while*/
while (i <= mid)
temp[k++] = array[i++];
while (j <= high)
temp[k++] = array[j++];
for (i = low; i <= high; i++)
array[i] = temp[i];
}/*End of merge()*/
void merge_sort(int low, int high) {
int mid;
if (low != high) {
mid = (low + high) / 2;
merge_sort(low, mid);
merge_sort(mid + 1, high);
merge(low, mid, high);
}
}/*End of merge_sort*/
But then I thought that a recursive function is not good for large arrays, since it makes a lot of recursive calls in that case. I think this is a bad way of programming. (Actually, I don't like recursion.)
So I found another way, a merge sort function without recursion:
#include <stdio.h>
#define MAX 30
int main() {
int arr[MAX], temp[MAX], i, j, k, n, size, l1, h1, l2, h2;
printf("Enter the number of elements : ");
if (scanf("%d", &n) != 1 || n < 0 || n > MAX) return 1; /* arr and temp only hold MAX elements */
for (i = 0; i < n; i++) {
printf("Enter element %d : ", i + 1);
scanf("%d", &arr[i]);
}
printf("Unsorted list is : ");
for (i = 0; i < n; i++)
printf("%d ", arr[i]);
/* l1 lower bound of first pair and so on */
for (size = 1; size < n; size = size * 2) {
l1 = 0;
k = 0; /* Index for temp array */
while (l1 + size < n) {
h1 = l1 + size - 1;
l2 = h1 + 1;
h2 = l2 + size - 1;
/* h2 exceeds the limit of arr */
if (h2 >= n)
h2 = n - 1;
/* Merge the two pairs with lower limits l1 and l2 */
i = l1;
j = l2;
while (i <= h1 && j <= h2) {
if (arr[i] <= arr[j])
temp[k++] = arr[i++];
else
temp[k++] = arr[j++];
}
while (i <= h1)
temp[k++] = arr[i++];
while (j <= h2)
temp[k++] = arr[j++];
/** Merging completed **/
/*Take the next two pairs for merging */
l1 = h2 + 1;
}/*End of while*/
/*any pair left */
for (i = l1; k < n; i++)
temp[k++] = arr[i];
for (i = 0; i < n; i++)
arr[i] = temp[i];
printf("\nSize=%d \nElements are : ", size);
for (i = 0; i < n; i++)
printf("%d ", arr[i]);
}/*End of for loop */
printf("Sorted list is :\n");
for (i = 0; i < n; i++)
printf("%d ", arr[i]);
printf("\n");
return 0;
}/*End of main()*/
I think this is better than using recursion. This function reduced the recursion to a series of for and while loops! Of course, they behave differently. I also think a recursive function is not good for the compiler. Am I right?
Assuming optimized implementations, iterative bottom-up merge sort is somewhat faster than recursive top-down merge sort, since it skips the recursive generation of indexes. For larger arrays, top-down's additional overhead is relatively small. It is O(n) for the roughly 2n recursive calls, compared to the overall time of O(n log(n)). In both cases, most of the time is spent doing the merge, which can be identical for bottom-up and top-down. Top-down merge sort uses O(log(n)) stack space, while both use O(n) working space. In practice, almost all library implementations of stable sort are some variation of iterative bottom-up merge sort, such as a hybrid of insertion sort and bottom-up merge sort.
Link to an answer showing an optimized top down merge sort, using a pair of mutually recursive functions to control the direction of merge to avoid copying of data:
Mergesort implementation is slow
Link to an answer that includes a quick sort, a 2-way bottom-up merge sort, and a 4-way bottom-up merge sort:
Optimized merge sort faster than quicksort
You are somewhat right: iterative bottom-up merge sort is faster than recursive top-down merge sort. Both methods are fine for the compiler ;) but the recursive method takes more time to run because of the extra function calls.
Your code for the recursive approach to merge sort has problems:
the prototype for merge does not have the argument types.
the array is missing from the arguments list of merge_sort
passing high as the index of the last element is error prone and does not allow for empty arrays. You should instead pass the index to the first element beyond the end of the array, such that high - low is the number of elements in the slice to sort. This way the first call to merge_sort can take 0 and the array size.
it is both wasteful and incorrect to allocate a full array int temp[MAX]; for each recursive call. Wasteful because the size might be much larger than needed, leading to potential stack overflow, and incorrect if high - low + 1 is larger than MAX leading to writing beyond the end of the array, causing undefined behavior.
The recursive calls of this merge_sort function nest at most about log2(high - low) levels deep, and each call to merge allocates a temporary local array. The recursion depth is not the problem (only 30 levels for 1 billion records), but the local array is! If you try to sort a large array, there might not be enough space on the stack for even one copy of it. That leads to undefined behavior, most likely a crash.
Note however that the iterative approach that you found has the same problem as it allocates temp[MAX] with automatic storage as well.
The solution is to allocate a temporary array from the heap at the top and pass it to the recursive function.
Here is an improved version:
#include <stdio.h>
#include <stdlib.h> /* malloc, free */
static void merge(int *array, int *temp, int low, int mid, int high) {
int i = low;
int j = mid;
int k = 0;
while (i < mid && j < high) {
if (array[i] <= array[j])
temp[k++] = array[i++];
else
temp[k++] = array[j++];
}
while (i < mid)
temp[k++] = array[i++];
while (j < high)
temp[k++] = array[j++];
for (i = low, k = 0; i < high; i++, k++)
array[i] = temp[k];
}
static void merge_sort_aux(int *array, int *temp, int low, int high) {
if (high - low > 1) {
int mid = (low + high) / 2;
merge_sort_aux(array, temp, low, mid);
merge_sort_aux(array, temp, mid, high);
merge(array, temp, low, mid, high);
}
}
int merge_sort(int *array, int size) {
if (size > 1) {
int *temp = malloc(size * sizeof(*temp));
if (temp == NULL)
return -1;
merge_sort_aux(array, temp, 0, size);
free(temp);
}
return 0;
}
// call from main as merge_sort(arr, MAX)

Why does my quicksort crash with large, reverse-sorted arrays?

I'm learning C and I tried out a recursive quicksort algorithm. At small input sizes, it works as expected, and with randomly generated arrays it had no problems at any tested size (up to 100,000). With a descending array, it somehow breaks at a certain array size (32,506): Windows tells me that the program has stopped working. Is there an error in my code (for example, a wrong memory allocation; I'm not sure I got this right)? Or does C have a limit on recursive calls, or is it something else?
Edit:
I know that my Quicksort implementation is rather naive and that it behaves terribly with this sort of Input, but I didn’t expect it to crash.
I am using GCC with MinGW on the command prompt on Windows 10. I’m not sure how to find out exactly what happens, because I’m not getting any specific error message apart from Windows telling me that my program has stopped working.
#include <stdio.h>
#include <stdlib.h>
int partition(int *a, int lo, int hi) {
int i = lo; int j = hi+1; int v,t;
v = a[lo]; //partition element
while (1) {
while (a[++i] < v) {if (i == hi) break;}
while (v < a[--j]) {if (j == lo) break;}
if (i >= j) break;
t = a[j]; a[j] = a[i]; a[i]= t; //swap
}
t = a[lo]; a[lo] = a[j]; a[j]= t;//swap
return j;
}
void quicksort(int a[], int lo, int hi) {
int j;
if (hi <= lo) return;
j = partition(a, lo, hi);
quicksort(a, lo, j-1);
quicksort(a, j+1, hi);
}
int main() {
int len;
for (len = 32000;len < 40000;len+=100) {
printf("New Arr with len = %d\n",len);
int *arr;
arr = (int*) calloc(len,sizeof(int));
int j;
//create descending Array
for (j = 0; j < len; ++j) {
arr[j] = len-j;
}
printf("start sorting\n");
quicksort(arr,0,len-1);
free(arr);
}
}
For me, your code fails at much larger sizes (c. 370,000 elements). You are likely running into a platform limit, probably a stack overflow caused by the recursion depth. Without the exact error message, it's hard to be sure, of course.
Your input set is likely a pathological case for your implementation - see What makes for a bad case for quick sort?
You can reduce the recursion depth by a better choice of pivot - a common technique is to take the median of the first, central and last elements. Something like this:
int v0 = a[lo], v1 = a[(lo+hi+1)/2], v2 = a[hi];
/* pivot: median of v0,v1,v2 */
int v = v0 < v1 ? v1 < v2 ? v1 : v0 < v2 ? v2 : v0 : v0 < v2 ? v0 : v1 < v2 ? v2 : v1;
You can also reduce the recursion depth by recursing only for the smaller of the partitions, and using iteration to process the larger one. You may be able to get your compiler's tail-call eliminator to convert the recursion to iteration, but if that doesn't work, you'll need to write it yourself. Something like:
void quicksort(int a[], int lo, int hi) {
while (lo < hi) {
int j = partition(a, lo, hi);
if (j - lo < hi -j) {
quicksort(a, lo, j-1);
lo = j+1;
} else {
quicksort(a, j+1, hi);
hi = j-1;
}
}
}
With the above changes, I can sort arrays of over a billion elements without crashing (I had to make some performance improvements - see below - and even then, it took 17 seconds).
You may also want to return early when you find a sub-array is already sorted. I'll leave that as an exercise.
P.S. A couple of issues in your main():
You don't test the result of calloc() - and you probably should be using malloc() instead, as you will write every element anyway:
int *arr = malloc(len * sizeof *arr);
if (!arr) return fprintf(stderr, "allocation failed\n"), EXIT_FAILURE;
Full listing
Here's the code I ended up with:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int partition(int *a, int i, int j) {
int v0 = a[i], v1 = a[(i+j+1)/2], v2 = a[j];
/* pivot: median of v0,v1,v2 */
int v = v0 < v1 ? v1 < v2 ? v1 : v0 < v2 ? v2 : v0 : v0 < v2 ? v0 : v1 < v2 ? v2 : v1;
while (i < j) { /* assumes distinct keys: if a[i] == v == a[j], neither scan moves and this loop never ends */
while (a[i] < v && ++i < j)
;
while (v < a[j] && i < --j)
;
int t = a[j]; a[j] = a[i]; a[i]= t; //swap
}
/* i == j; that's where the pivot belongs */
a[i] = v;
return j;
}
void quicksort(int a[], int lo, int hi) {
while (lo < hi) {
int j = partition(a, lo, hi);
if (j - lo < hi -j) {
quicksort(a, lo, j-1);
lo = j+1;
} else {
quicksort(a, j+1, hi);
hi = j-1;
}
}
}
int main() {
int len = INT_MAX/2+1;
printf("New Arr with len = %d\n",len);
int *arr = malloc(len * sizeof *arr);
if (!arr) return fprintf(stderr, "allocation failed\n"), EXIT_FAILURE;
/* populate pessimal array */
for (int j = 0; j < len; ++j) {
arr[j] = len-j;
}
printf("start sorting\n");
quicksort(arr, 0, len-1);
/* test - is it sorted? */
for (int i = 0; i+1 < len; ++i)
if (arr[i] >= arr[i+1])
return fprintf(stderr, "not sorted\n"), EXIT_FAILURE;
free(arr);
}
The recursion is too deep to fit on the stack.
Each level has to store int j = partition(..), plus the return address and the other locals of the call.
There are declarative techniques to minimize recursive stack usage,
for example carrying the results as an argument, but this case is far too complicated for me to give an example.

Mergesort output changes when printf is used

While trying to write a mergesort in C as an exercise, I ran into a weird issue: the output of the algorithm changes depending on whether a printf is present. Let me elaborate. I have a "MergeSort.h" file containing my mergesort algorithm:
#ifndef _MSORT_H_
#define _MSORT_H_
void merge(int a[],int lo,int mi,int hi){
int n1 = (mi - lo) +1;
int n2 = hi - mi;
int left[n1];
int right[n2];
int i;
int j;
int k;
for(i = 0;i < n1;i++){
left[i] = a[lo+i];
}
for(j = 0;j < n2;j++){
right[j] = a[mi+j+1];
}
i = 0;
j = 0;
for(k = lo;k <= hi;k++){
if(left[i] < right[j]){
a[k] = left[i];
i = i + 1;
}else{
a[k] = right[j];
j = j+1;
}
}
}
void msorthelper(int a[],int p,int r){
if(p < r){
int mid = (p + r)/2;
msorthelper(a,0,mid);
msorthelper(a,mid+1,r);
merge(a,p,mid,r);
}
}
void msortex(int a[],int size){
msorthelper(a,0,size-1);
int counter;
for(counter = 0;counter < size;counter++){
printf(" %d ",a[counter]);
}
}
#endif
I have my corresponding sort tester in sorttester.c:
#include <stdio.h>
#include "mergesort.h"
int main(){
int out[] = {8,1,6};
msortex(out,3);
}
The output from running the sorttester is as follows:
1 0 0
Now here's the interesting part, when I put any printf statement to the beginning of the merge, the correct output is generated. Here's an example:
#ifndef _MSORT_H_
#define _MSORT_H_
void merge(int a[],int lo,int mi,int hi){
printf("Hello\n");
int n1 = (mi - lo) +1;
int n2 = hi - mi;
int left[n1];
int right[n2];
.................Rest of the code.....
The output now is:
HelloHello 1 6 8
This is the correct sorted order for the array 8 1 6.
Can somebody tell me what I might be doing wrong that makes the output change so drastically depending on the presence of a printf?
Thanks
Have a look at your merging part of code:
for(k = lo;k <= hi;k++){
if(left[i] < right[j]){
a[k] = left[i];
i = i + 1;
}else{
a[k] = right[j];
j = j+1;
}
}
Let's assume that
left[] = {1, 2}
right[] = {3, 4, 5}
After 2 iterations of your merging loop (k) the value of "i" will be 2. From now on, you will try to compare
left[2] < right[j]
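This is invalid, because left[2] refers to memory past the end of the array (left has only 2 elements, so only indices 0 and 1 exist). Reading it is undefined behavior. The result depends on whatever happens to be stored there, which is why adding a printf can change the output.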
So, if you add guards for the i and j values, for example by changing first if condition to:
if(i != n1 && (j == n2 || left[i] < right[j])){
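You should be fine. Note also that msorthelper calls msorthelper(a,0,mid); it should be msorthelper(a,p,mid). As written, the left recursion always starts at index 0, so it re-sorts elements that lie before p and does a lot of unnecessary work.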
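I should also point out that you should be careful with declaring arrays whose sizes are not compile-time constants, like:
int left[n1];
int right[n2];
These are variable-length arrays. They are standard in C99 but only optional since C11, they are not allowed in C++, and for large n they can overflow the stack. You should either allocate them dynamically, use vectors (if C++), or just declare them global with a large enough size (the maximum n value).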

Divide and conquer paradigm and recursion in C - Merge sort example

I can't understand how divide and conquer algorithms are implemented in C.
By that I mean that I understand the algorithm but don't understand why and how it works when written in C.
What are the instructions executed and what determines the order of their execution in the following example?
In other words, why do "merge_sort initialization", "merge_sort first", "merge_sort second" and "merging" appear the way they do?
#include<stdio.h>
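int arr[8]={1, 2, 3, 4, 5, 6, 7, 8};
int merge_sort(int arr[], int low, int high); /* prototypes: both are used before their definitions */
int merge(int arr[], int l, int m, int h);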
int main()
{
int i;
merge_sort(arr, 0, 7);
printf("Sorted array:");
for(i = 0; i < 8; i++)
printf("%d", arr[i]);
return 0;
}
int merge_sort(int arr[],int low,int high)
{
printf("\nmerge_sort initialization\n");
int mid;
if(low < high)
{
mid = (low + high) / 2;
// Divide and Conquer
merge_sort(arr, low, mid);
printf("\n merge_sort first\n");
merge_sort(arr, mid + 1, high);
printf("\n merge_sort second\n");
// Combine
merge(arr, low, mid, high);
printf("\nmerging\n");
}
return 0;
}
int merge(int arr[], int l, int m, int h)
{
int arr1[10], arr2[10];
int n1, n2, i, j, k;
n1 = m - l + 1;
n2 = h - m;
for(i = 0; i < n1; i++)
arr1[i] = arr[l + i];
for(j = 0; j < n2; j++)
arr2[j] = arr[m + j + 1];
arr1[i] = 9999;
arr2[j] = 9999;
i = 0;
j = 0;
for(k = l; k <= h; k++)
{
if(arr1[i] <= arr2[j])
arr[k] = arr1[i++];
else
arr[k] = arr2[j++];
}
return 0;
}
Thanks in advance for your answers.
I am not sure why you are asking people to make you understand the algorithm. I can assist you, but you have to go through it yourself.
In merge sort you have to cut the array into two pieces. Suppose you have 10 elements; then low = 0 and high = 10 - 1 = 9, so mid = (0 + 9) / 2 = 4. You have now divided the main array into 2 parts: from the 1st element to the 5th (indices 0..4) and from the 6th element to the 10th (indices 5..9). Whenever a piece has more than one element, you cut it into two again. At the end you merge them, i.e. combine the pieces again in ascending order, and finally you get a single sorted array. It's very tough to explain every piece; I think this Wikipedia article can help: http://en.wikipedia.org/wiki/Merge_sort
Now comes the function calling. The main function calls merge_sort and passes low and high (the full range of the array). Within that function, merge_sort calls itself twice: first with the first half of the range (from the start to the midpoint), then with the second half (from after the midpoint to the last element). Each call does the same again, dividing its range and calling itself on the first half and then the second half. This process goes on until a range has only one element. A call only continues to its next statement after the recursive call it made has completely finished, which is why the messages appear in that order. The merge function combines the two sorted halves into one sorted range. That's all you need. If it is still not clear, print the values of the parameters or step through the code in a debugger.

Merge Sort on an array of floating point numbers of fixed size 25 (C programming language)

My code is as follows:
void mergeSort(float a[], int first, int last) {
//Function performs a mergeSort on an array for indices first to last
int mid;
//if more than 1 element in subarray
if(first < last){
mid =(first + last) / 2;
//mergeSort left half of subarray
mergeSort(a, first, mid);
//mergeSort right half of subarray
mergeSort(a, mid+1, last);
//merge the 2 subarrays
merge(a, first, mid, last);
}
}
void merge(float a[], int first, int mid, int last){
//Function to merge sorted subarrays a[first -> mid] and
//a[(mid+1)-> last] into a sorted subarray a[first->last]
int ndx1;
int ndx2;
int last1;
int last2;
int i;
ndx1 = first;
last1 = mid;
ndx2 = mid + 1;
last2 = last;
i = 0;
//Allocate temporary array with same size as a[first, last]
float temp[SIZE];
while((ndx1 <= last1) && (ndx2 <= last2)){
if(a[ndx1] < a[ndx2]) {
temp[i] = a[ndx1];
ndx1 = ndx1 + 1;
i++;
}
else{
temp[i] = a[ndx2];
ndx2 = ndx2 + 1;
i++;
}
}
while(ndx1 <= last1){
temp[i] = a[ndx1];
ndx1 = ndx1 + 1;
i++;
}
while(ndx2 <= last2){
temp[i] = a[ndx2];
ndx2 = ndx2+1;
i++;
}
int j;
i = 0;
for(j = 0; (last-first) ;j++){
a[j] = temp[i];
i++;
}
}
It runs a few times, but then it locks up and says it has stopped working. There are no errors, and I wrote this right from an algorithm sheet. I don't understand why it is locking up. Any help would be appreciated greatly.
When you (try to) copy the values back from temp into the array,
for(j = 0; (last-first) ;j++){
a[j] = temp[i];
i++;
}
For first != last, you have an infinite loop that makes the reads and writes go outside the array bounds, and sooner or later this leads to a segmentation fault or access violation. You meant to write j <= (last - first) as the loop condition. But you are also copying back to the wrong part of the array: you always start at a[0], but it should start at a[first], so the loop should be
for(j = first; j <= last; ++j) {
a[j] = temp[i++];
}
