Heapsort heapify - c

I found code for a heapsort from: http://rosettacode.org/wiki/Sorting_algorithms/Heapsort#C
The way I understand it (which is wrong somewhere along the lines) is that the heapsort() function has two loops. The first loop is to create the heap structure (either min or max) and the second loop is to actually sort the doubles. But I think I have the first loop wrong.
The entire code goes like this
#include <stdio.h>
#include <stdlib.h>
#define ValType double
#define IS_LESS(v1, v2) (v1 < v2)
void siftDown( ValType *a, int start, int count);
#define SWAP(r,s) do{ValType t=r; r=s; s=t; } while(0)
void heapsort( ValType *a, int count)
{
    int start, end;
    /* heapify */
    for (start = (count-2)/2; start >=0; start--) {
        siftDown( a, start, count);
    }
    for (end=count-1; end > 0; end--) {
        SWAP(a[end],a[0]);
        siftDown(a, 0, end);
    }
}
void siftDown( ValType *a, int start, int end)
{
    int root = start;
    while ( root*2+1 < end ) {
        int child = 2*root + 1;
        if ((child + 1 < end) && IS_LESS(a[child],a[child+1])) {
            child += 1;
        }
        if (IS_LESS(a[root], a[child])) {
            SWAP( a[child], a[root] );
            root = child;
        }
        else
            return;
    }
}
int main()
{
    int ix;
    double valsToSort[] = {
        1.4, 50.2, 5.11, -1.55, 301.521, 0.3301, 40.17,
        -18.0, 88.1, 30.44, -37.2, 3012.0, 49.2};
#define VSIZE (sizeof(valsToSort)/sizeof(valsToSort[0]))
    heapsort(valsToSort, VSIZE);
    printf("{");
    for (ix=0; ix<VSIZE; ix++) printf(" %.3f ", valsToSort[ix]);
    printf("}\n");
    return 0;
}
My question is, why does the /* heapify */ loop start at (count-2)/2?
The snippet from heapsort() is here:
/* heapify */
for (start = (count-2)/2; start >=0; start--) {
    siftDown( a, start, count);
}
UPDATE
I think I may have just answered my own question, but is it because we have to establish a heap structure, where the loop's partial focus is on creating a balanced tree? That is, heaps have to have every level filled except for the last one. Is this correct thinking?

For odd count, the first child pair for heapify is a[((count-2)/2)*2 + 1] and a[((count-2)/2)*2 + 2], the last two elements of the array. For even count, the solo child is at a[((count-2)/2)*2 + 1], the last element of the array. In both cases (count-2)/2 is the index of the last node that has a child, so it is the starting point to heapify the entire array; every later index is a leaf and needs no sifting. The second loop only has to re-heapify a mostly heapified array[0 to end] as end decreases.
Wiki article:
http://en.wikipedia.org/wiki/Heapsort

Related

Implementing Binary search tree using array in C

I am trying to implement a binary search tree using a 1-D array. I'm familiar with the fact that the left node will be parent*2+1 and right node will be parent*2+2. I am using this concept to write the insertion function here:
int* insert(int tree[], int element){
int parent=0;
if(tree[parent]=='\0'){
tree[parent]=element;
return tree;
}
else{
if(element<tree[parent]){
parent=parent*2+1;
tree[parent]=insert(tree[parent], element);
}
else{
parent=parent*2+2;
tree[parent]=insert(tree[parent], element);
}
}
return tree;
}
However, I'm quite sure this won't work, since I'm passing an element of the array into the insert() function while recursing, when it actually needs an array. I'm not sure how to go about it. Do I replace the return type from int* to int? Any help is appreciated
the fact that the left node will be parent*2+1 and right node will be parent*2+2
That's correct. You want to use the array index like
          0
         / \
        /   \
       /     \
      /       \
     1         2
    / \       / \
   /   \     /   \
  3     4   5     6
But your recursion is not doing that.
You always do int parent=0; so you have no knowledge of the real array index and consequently, you access the wrong array elements.
For instance:
When you pass tree[2] you really want the function to use either tree[5] or tree[6] in the next recursive call. However, since you start by setting parent to zero, your next recursive call will use either tree[3] or tree[4].
Conclusion: if you want to use recursion, you need to track the actual array index of the current node. Simply passing a pointer to the current node is not sufficient.
Instead your code could be:
/* ARRAY_SIZE is assumed to be the capacity of tree[], defined elsewhere */
int insert(int* tree, unsigned current_idx, int element){
    if (current_idx >= ARRAY_SIZE) return -1;
    if(tree[current_idx]=='\0'){
        tree[current_idx]=element;
        return 0;
    }
    if(element<tree[current_idx]){
        return insert(tree, 2*current_idx + 1, element);
    }
    return insert(tree, 2*current_idx + 2, element);
}
That said - you should also spend some time considering whether recursion is really a good solution for this task.
Without recursion you can do:
int insert(int* tree, int element){
    unsigned current_idx = 0;
    while (1)
    {
        if (current_idx >= ARRAY_SIZE) return -1;
        if(tree[current_idx]=='\0'){
            tree[current_idx]=element;
            return 0;
        }
        if(element<tree[current_idx]){
            current_idx = 2*current_idx + 1;
        }
        else {
            current_idx = 2*current_idx + 2;
        }
    }
}
As you can see the recursive approach didn't give you anything. Instead it made things worse...
You can avoid recursion and do it iteratively. Note tree is actually an integer array of size size. In the function insert() we pass a pointer to the array. Assuming the array is initialized with 0s:
void insert(int* tree, int size, int element)
{
    if (tree == NULL)
        return;
    int pos = 0;
    while (pos < size)
    {
        if (tree[pos])
        {
            if (tree[pos] < element)
                pos = 2 * pos + 2; // right
            else if (tree[pos] > element)
                pos = 2 * pos + 1; // left
            else
                return; // duplicate: already stored, avoid looping forever
        }
        else
        {
            tree[pos] = element;
            return;
        }
    }
}
Full code:
#include <stdio.h>
#include <stdlib.h>
void insert(int* tree, int size, int element)
{
    if (tree == NULL)
        return;
    int pos = 0;
    while (pos < size)
    {
        if (tree[pos])
        {
            if (tree[pos] < element)
                pos = 2 * pos + 2; // right
            else if (tree[pos] > element)
                pos = 2 * pos + 1; // left
            else
                return; // duplicate: already stored, avoid looping forever
        }
        else
        {
            tree[pos] = element;
            return;
        }
    }
}
void print(int* tree, int size)
{
    for (int i = 0; i < size; i++)
        printf("%d ", tree[i]);
    printf("\n");
}
int main()
{
    int tree[100] = {0};
    const int tsize = 100;
    // print first 20 elements
    print(tree, 20);
    insert(tree, tsize, 2);
    insert(tree, tsize, 3);
    insert(tree, tsize, 5);
    insert(tree, tsize, 1);
    insert(tree, tsize, 4);
    print(tree, 20);
    return 0;
}

Hoare quicksort in c

Similarly to other users I am using this wikipedia algorithm. However I have tried to reimplement the algorithm using pointer arithmetic. However I'm having difficulty finding where I've gone wrong.
I think that this if statement is probably the cause but I'm not sure.
...
if (left >= right) {
ret = (right - ptr);
return ret;
}
temp = *left;
*left = *right;
*right = temp;
/* sortstuff.h */
extern void quicksort(const size_t n, int * ptr);
/* sortstuff.c */
size_t quicksortpartition(const size_t n, int * ptr);
void quicksort(const size_t n, int * ptr) {
int* end = ptr + n - 1;
// for debug purposes
if (original_ptr == NULL) {
original_ptr = ptr;
original_count = n;
}
if (n > 1) {
size_t index = quicksortpartition(n, ptr);
quicksort(index, ptr);
quicksort(n - index - 1, ptr + index + 1);
}
return;
}
size_t quicksortpartition(const size_t n, int * ptr) {
int* right = ptr + n - 1;
int* pivot = ptr + (n - 1) / 2;
int* left = ptr;
int temp;
size_t ret = NULL;
while (1) {
while (*left <= *pivot && left < pivot) {
++left;
}
while (*right > *pivot) {
--right;
}
if (left >= right) {
ret = (right - ptr);
return ret;
}
temp = *left;
*left = *right;
*right = temp;
//print_arr();
}
}
int main(void) {
}
/* main.c */
int array0[] = {5, 22, 16, 3, 1, 14, 9, 5};
const size_t array0_count = sizeof(array0) / sizeof(array0[0]);
int main(void) {
quicksort(array0_count, array0);
printf("array out: ");
for (size_t i = 0; i != array0_count; ++i) {
printf("%d ", array0[i]);
}
puts("");
}
I don't think there are any off by one errors
The code you have presented does not accurately implement the algorithm you referenced. Consider in particular this loop:
while (*left <= *pivot && left < pivot) {
++left;
}
The corresponding loop in the algorithm description has no analog of the left < pivot loop-exit criterion, and its analog of *left <= *pivot uses strict less-than (<), not (<=).
It's easy to see that the former discrepancy must constitute an implementation error. The final sorted position of the pivot is where the left and right pointers meet, but the left < pivot condition prevents the left pointer from ever advancing past the initial position of the pivot. Thus, if the correct position is rightward of the initial position, the partition function certainly cannot return the correct value. It takes a more thoughtful analysis to realize that the partition function is, moreover, prone to looping infinitely in that case, though I think that's somewhat data-dependent.
The latter discrepancy constitutes a provisional error. It risks overrunning the end of the array in the event that the selected pivot value happens to be the largest value in the array, but that's based in part on the fact that the left < pivot condition is erroneous and must be removed. You could replace the latter with left < right to resolve that issue, but although you could form a working sort that way, it probably would not be an improvement on the logic details presented in the algorithm description.
Note, however, that with the <= variation, either quicksortpartition() needs to do extra work (not presently provided for) to ensure that the pivot value ends up at the computed pivot position, or else the quicksort function needs to give up its assumption that that will happen. The former is more practical, supposing you want your sort to be robust.
Pivot needs to be an int, not a pointer. Also to more closely follow the Wiki algorithm, the parameters should be two pointers, not a count and a pointer. I moved the partition logic into the quick sort function.
void QuickSort(int *lo, int *hi)
{
    int *i, *j;
    int p, t;
    if(lo >= hi)
        return;
    p = *(lo + (hi-lo)/2);
    i = lo - 1;
    j = hi + 1;
    while (1){
        while (*(++i) < p);
        while (*(--j) > p);
        if (i >= j)
            break;
        t = *i;
        *i = *j;
        *j = t;
    }
    QuickSort(lo, j);
    QuickSort(j+1, hi);
}
The call would be:
QuickSort(array0, array0+array0_count-1);

C Sort Algorithm Not Outputting Correct Values?

new to C here. I am making a program that will sort and search a list of random ints for learning purposes, and trying to implement Bubble sort, but am getting odd results in my console during debugging.
I have an array like so:
arr[0] = 3
arr[1] = 2
arr[2] = 1
So if I was to sort this list from least to greatest, it should be in the reverse order. Instead, my sort function seems to be logically flawed, and is outputting the following.
arr[0] = 0
arr[1] = 1
arr[2] = 2
Obviously I am new because someone that knows better will probably spot my mistake very quickly.
find.c
/**
* Prompts user for as many as MAX values until EOF is reached,
* then proceeds to search that "haystack" of values for given needle.
*
* Usage: ./find needle
*
* where needle is the value to find in a haystack of values
*/
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include "helpers.h"
// maximum amount of hay
const int MAX = 65536;
int main(int argc, string argv[])
{
// ensure proper usage
if (argc != 2)
{
printf("Usage: ./find needle\n");
return -1;
}
// remember needle
int needle = atoi(argv[1]);
// fill haystack
int size;
int haystack[MAX];
for (size = 0; size < MAX; size++)
{
// wait for hay until EOF
printf("\nhaystack[%i] = ", size);
int straw = get_int();
if (straw == INT_MAX)
{
break;
}
// add hay to stack
haystack[size] = straw;
}
printf("\n");
// sort the haystack
sort(haystack, size);
// try to find needle in haystack
if (search(needle, haystack, size))
{
printf("\nFound needle in haystack!\n\n");
return 0;
}
else
{
printf("\nDidn't find needle in haystack.\n\n");
return 1;
}
}
helpers.c
#include <cs50.h>
#include "helpers.h"
#include <stdio.h>
/**
* Returns true if value is in array of n values, else false.
*/
bool search(int value, int values[], int n)
{
// TODO: implement a searching algorithm
return false;
}
/**
* Sorts array of n values.
*/
void sort(int values[], int n)
{
// TODO: implement an O(n^2) sorting algorithm
int tmp = 0;
int i = 0;
bool swapped = false;
bool sorted = false;
for (int i = 0; i < n; i++)
{
printf("%i\n", values[i]);
}
while (!sorted)
{
//check if number on left is greater than number on right in sequential order of the array.
if (values[i] > values[i+1])
{
tmp = values[i];
values[i] = values[i+1];
values[i+1] = tmp;
swapped = true;
}
if (i >= n - 1)
{
if (!swapped)
{
//No swaps occured, meaning I can assume the list is sorted.
for (int i = 0; i < n; i++)
{
printf("%i\n", values[i]);
}
sorted = true;
break;
} else {
//A swap occured on this pass through of the array. Set the flag back to false for the next pass through, repeating until no swaps are detected. (Meaning every number is in its proper place.)
i = 0;
swapped = false;
}
} else {
i++;
}
}
}
The problem is that you do the comparison and swap before you do the test if (i >= n - 1). This means that it will compare values[i] > values[i+1] when i == n-1, so it will access outside the array bounds, which is undefined behavior. In your case, there happens to be 0 in the memory after the array, so this is getting swapped into the array, and then it gets sorted to the beginning of the array.
Change
if (values[i] > values[i+1])
to
if (i < n-1 && values[i] > values[i+1])
The highest pair of entries you can swap in an array 0..n-1 is n-2 and n-1. So i must not be larger than n-2 when values[i+1] is read, because i+1 must not go past n-1.
Therefore your check, placed before the comparison, must be:
if (i > n - 2)

Max in array and its frequency

How do you write a function that finds max value in an array as well as the number of times the value appears in the array?
We have to use recursion to solve this problem.
So far i am thinking it should be something like this:
int findMax(int[] a, int head, int last)
{
int max = 0;
if (head == last) {
return a[head];
}
else if (a[head] < a[last]) {
count ++;
return findMax(a, head + 1, last);
}
}
i am not sure if this will return the absolute highest value though, and im not exactly sure how to change what i have
Setting the initial value of max to INT_MIN solves a number of issues (credit: @Rerito).
But the approach OP uses iterates through each member of the array and incurs a recursive call for each element. So if the array had 1000 int there would be about 1000 nested calls.
A divide and conquer approach:
If the array length is 0 or 1, handle it directly. Otherwise find the max answer for the first and second halves, then combine the results as appropriate. Because the length is halved at each level, the stack depth for a 1000 element array stays at about 10 nested calls.
Note: In either approach, the total number of calls is of the same order (proportional to the array length). The difference lies in the maximum degree of nesting. Using recursion where a simple for() loop would suffice is questionable. Recursion's strength is in conquering more complex problems, hence this approach.
To find the max and its frequency using O(log2(length)) stack depth usage:
#include <limits.h>  // INT_MIN
#include <stddef.h>
typedef struct {
    int value;
    size_t frequency;  // `size_t` better to use than `int` for large arrays.
} value_freq;
value_freq findMax(const int *a, size_t length) {
    value_freq vf;
    if (length <= 1) {
        if (length == 0) {
            vf.value = INT_MIN;  // Degenerate value if the array was size 0.
            vf.frequency = 0;
        } else {
            vf.value = *a;
            vf.frequency = 1;
        }
    } else {
        size_t length1sthalf = length / 2;
        vf = findMax(a, length1sthalf);
        value_freq vf1 = findMax(&a[length1sthalf], length - length1sthalf);
        if (vf1.value > vf.value)
            return vf1;
        if (vf.value == vf1.value)
            vf.frequency += vf1.frequency;
    }
    return vf;
}
You are not that far off.
In order to save the frequency and the max you can keep a pointer to a structure, then just pass the pointer to the start of your array, the length you want to go through, and a pointer to this struct.
Keep in mind that you should use INT_MIN in limits.h as your initial max (see reset(maxfreq *) in the code below), as int can carry negative values.
The following code does the job recursively:
#include <limits.h>
typedef struct {
    int max;
    int freq;
} maxfreq;
void reset(maxfreq *mfreq){
    mfreq->max = INT_MIN;
    mfreq->freq = 0;
}
void findMax(int* a, int length, maxfreq *mfreq){
    if(length>0){
        if(*a == mfreq->max)
            mfreq->freq++;
        else if(*a > mfreq->max){
            mfreq->freq = 1;
            mfreq->max = *a;
        }
        findMax(a+1, length - 1, mfreq);
    }
}
A call to findMax results in length + 1 invocations in total (the initial call plus one recursive call per element), each time incrementing the provided pointer and processing the corresponding element. So this basically just goes through all of the elements of a once, with no weird splitting.
This works fine for me:
#include <stdio.h>
#include <string.h>
// define a struct that contains the (max, freq) information
struct arrInfo
{
int max;
int count;
};
struct arrInfo maxArr(int * arr, int max, int size, int count)
{
int maxF;
struct arrInfo myArr;
if(size == 0) // to return from recursion we check the size left
{
myArr.max = max; // prepare the struct to output
myArr.count = count;
return(myArr);
}
if(*arr > max) // new maximum found
{
maxF = *arr; // update the max
count = 1; // initialize the frequency
}
else if (*arr == max) // same max encountered another time
{
maxF = max; // keep track of same max
count ++; // increase frequency
}
else // nothing changes
maxF = max; // keep track of max
arr++; // move the pointer to next element
size --; // decrease size by 1
return(maxArr(arr, maxF, size, count)); // recursion
}
int main()
{
struct arrInfo info; // return of the recursive function
// define an array
int arr[] = {8, 4, 8, 3, 7};
info = maxArr(arr, 0, 5, 1); // call with max=0 size=5 freq=1
printf("max = %d count = %d\n", info.max, info.count);
return 0;
}
When run, it outputs:
max = 8 count = 2
Notice
In my code example I assumed the numbers to be positive (initializing max to 0), I don't know your requirements but you can elaborate.
The requirements in your assignment are at least questionable. Just for reference, here is how this should be done in real code (to solve your assignment, refer to the other answers):
#include <limits.h>  // INT_MIN
int findMax(int length, int* array, int* maxCount) {
    int trash;
    if(!maxCount) maxCount = &trash; //make sure we ignore it when a NULL pointer is passed in
    *maxCount = 0;
    int result = INT_MIN;
    for(int i = 0; i < length; i++) {
        if(array[i] > result) {
            *maxCount = 1;
            result = array[i];
        } else if(array[i] == result) {
            (*maxCount)++;
        }
    }
    return result;
}
Always do things as straightforwardly as you can.

Sorting an array of coordinates by their distance from origin

The code should take an array of coordinates from the user, then sort that array, putting the coordinates in order of their distance from the origin. I believe my problem lies in the sorting function (I have used a quicksort).
I am trying to write the function myself to get a better understanding of it, which is why I'm not using qsort().
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define MAX_SIZE 64
typedef struct
{
double x, y;
}POINT;
double distance(POINT p1, POINT p2);
void sortpoints(double distances[MAX_SIZE], int firstindex, int lastindex, POINT data[MAX_SIZE]);
void printpoints(POINT data[], int n_points);
int main()
{
int n_points, i;
POINT data[MAX_SIZE], origin = { 0, 0 };
double distances[MAX_SIZE];
printf("How many values would you like to enter?\n");
scanf("%d", &n_points);
printf("enter your coordinates\n");
for (i = 0; i < n_points; i++)
{
scanf("%lf %lf", &data[i].x, &data[i].y);
distances[i] = distance(data[i], origin); //data and distances is linked by their index number in both arrays
}
sortpoints(distances, 0, i, data);
return 0;
}
double distance(POINT p1, POINT p2)
{
return sqrt(pow((p1.x - p2.x), 2) + pow((p1.y - p2.y), 2));
}
void printpoints(POINT *data, int n_points)
{
int i;
printf("Sorted points (according to distance from the origin):\n");
for (i = 0; i < n_points; i++)
{
printf("%.2lf %.2lf\n", data[i].x, data[i].y);
}
}
//quicksort
void sortpoints(double distances[MAX_SIZE], int firstindex, int lastindex, POINT data[MAX_SIZE])
{
int indexleft = firstindex;
int indexright = lastindex;
int indexpivot = (int)((lastindex + 1) / 2);
int n_points = lastindex + 1;
double left = distances[indexleft];
double right = distances[indexright];
double pivot = distances[indexpivot];
POINT temp;
if (firstindex < lastindex) //this will halt the recursion of the sorting function once all the arrays are 1-size
{
while (indexleft < indexpivot || indexright > indexpivot) //this will stop the sorting once both selectors reach the pivot position
{
//reset the values of left and right for the iterations of this loop
left = distances[indexleft];
right = distances[indexright];
while (left < pivot)
{
indexleft++;
left = distances[indexleft];
}
while (right > pivot)
{
indexright--;
right = distances[indexright];
}
distances[indexright] = left;
distances[indexleft] = right;
temp = data[indexleft];
data[indexleft] = data[indexright];
data[indexright] = temp;
}
//recursive sorting to sort the sublists
sortpoints(distances, firstindex, indexpivot - 1, data);
sortpoints(distances, indexpivot + 1, lastindex, data);
}
printpoints(data, n_points);
}
Thanks for your help, I have been trying to debug this for hours, even using a debugger.
Ouch! You call sortpoints() with i as argument. That argument, according to your prototype and code, should be the last index, and i is not the last index, but the last index + 1.
int indexleft = firstindex;
int indexright = lastindex; // indexright is pointing to a non-existent element.
int indexpivot = (int)((lastindex + 1) / 2);
int n_points = lastindex + 1;
double left = distances[indexleft];
double right = distances[indexright]; // now right is an undefined value, or segfault.
To fix that, call your sortpoints() function as:
sortpoints(distances, 0, n_points - 1, data);
The problem is in your sortpoints function: the first while loop is looping infinitely. To check whether it really is an infinite loop, place a printf statement such as
printf("Testing first while loop\n");
in your first while loop. You have to fix that.
There are quite a number of problems, but one of them is:
int indexpivot = (int)((lastindex + 1) / 2);
The cast is unnecessary, but that's trivia. Much more fundamental is that if you are sorting a segment from, say, 48..63, you will be pivoting on element 32, which is not in the range you are supposed to be working on. You need to use:
int indexpivot = (lastindex + firstindex) / 2;
or perhaps:
int indexpivot = (lastindex + firstindex + 1) / 2;
For the example range, these will pivot on element 55 or 56, which is at least within the range.
I strongly recommend:
- Creating a print function similar to printpoints() but with the following differences:
  - Takes a 'tag' string to identify what it is printing.
  - Takes and prints the distance array too.
  - Takes the arrays and a pair of offsets.
- Use this function inside the sort function before recursing.
- Use this function inside the sort function before returning.
- Use this function in the main function after you've read the data.
- Use this function in the main function after the data is sorted.
- Print key values (the pivot distance, the pivot index) at appropriate points.
This allows you to check that your partitioning is working correctly (it isn't at the moment).
Then, when you've got the code working, you can remove or disable (comment out) the printing code in the sort function.

Resources