Array percentage algorithm implementation - c

So I just started programming in C a few days ago and I have this program which takes an unsorted file full of integers, sorts it using quicksort
1st algorithm
Any suggestions on what I have done wrong in this?

From what you have described, it sounds like you are almost there. You are attempting to get the first element of a collection that has a value equal to (or just greater than) 90% of all the other members of the collection. You have already done the sort. The rest should simply be a matter of following these steps (if I have understood your question):
1) sort the collection into an array (you've already done this, I think)
2) count numbers in collection, store in float n; //number of elements in collection
3) index through sorted array to the 0.9*n th element, (pick first one beyond that point not a duplicate of previous)
4) display results
Here is an implementation (sort of, I did not store n) of what I have described: (ignore the random number generator, et al., it is just a fast way to get an array)
#include <stdio.h>   // printf, getchar
#include <stdlib.h>  // qsort, rand, srand
#include <time.h>    // clock
#include <windows.h>
int randomGenerator(int min, int max);
int NotUsedRecently (int number);
int cmpfunc (const void * a, const void * b);
int main(void)
{
int array[1000];
int i;
for(i=0;i<1000;i++)
{
array[i]=randomGenerator(1, 1000);
Sleep(1);
}
//sort array
qsort(array, 1000, sizeof(int), cmpfunc);
//pick the first non repeat 90th percent and print
for(i=900;i<999;i++)
{
if(array[i+1] != array[i])
{
printf("this is the first number meeting criteria: %d", array[i+1]);
break;
}
}
getchar();
return 0;
}
int cmpfunc (const void * a, const void * b)
{
return (*(const int*)a > *(const int*)b) - (*(const int*)a < *(const int*)b); // avoids the overflow that a - b can cause
}
int randomGenerator(int min, int max)
{
int random=0, trying=0;
trying = 1;
srand(clock());
while(trying)
{
random = (int)(rand() / (RAND_MAX + 1.0) * (max + 1)); // scale to 0..max without assuming RAND_MAX is 32767
(random >= min) ? (trying = 0) : (trying = 1);
}
return random;
}
And here is the output for my first randomly generated array (centering around the 90th percentile), compared against what the algorithm selected: Column on left is the element number, on the right is the sorted list of randomly generated integers. (notice it skips the repeats to ensure smallest value past 90%)
In summary: As I said, I think you are almost there already. Notice how similar this section of my code is to yours:
You already have something very similar. Just modify it to start looking at the 90% index of the array (whatever that is), then pick the first value that is not equal to the previous one.

One issue in your code is that you need a break in your second algorithm once you find the output. Also, declaring a variable inside the for statement is only allowed in C99 and later; a compiler running in C89 mode rejects it, so check which standard you are compiling against.

Regarding this part:
int output = array[(int)(floor(0.9*count)) + 1];
int x = (floor(0.9*count) + 1);
while (array[x] == array[x + 1])
{
x = x + 1;
}
printf(" %d ", output);
In the while loop you do not check whether x has exceeded count... (What if all of the top 10% of the numbers are equal?)
You set output in the first line and print it in the last, but do not do anything with it in the meantime. (So all the lines in between have no effect on the result.)
You definitely are on the right track.
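Putting the two remarks above together, a minimal sketch of the lookup with the missing bounds check could look like this. It assumes the array is already sorted in ascending order; the function name first_above_90th and the -1 "not found" convention are made up for illustration and are not from the original code.

```c
#include <stddef.h>

/* Returns the index of the first element past the 90% mark whose value
 * differs from its predecessor, or -1 if there is none (for example when
 * the whole top 10% holds one repeated value).
 * Assumption: `array` is sorted ascending and holds `count` elements. */
long first_above_90th(const int *array, size_t count)
{
    size_t start = count * 9 / 10;   /* floor(0.9 * count) in integer arithmetic */
    size_t i;

    /* i + 1 < count keeps array[i + 1] inside the array */
    for (i = start; i + 1 < count; i++) {
        if (array[i + 1] != array[i])
            return (long)(i + 1);
    }
    return -1;
}
```

The caller then prints array[index] only when the returned index is not -1, so the value that is printed is the one the loop actually found.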

Related

How can I test sorting algorithms for stability in C?

I have three sorting algorithms: bubble, insertion, selection.
I need to experimentally test them for stability. I understand that I need to somehow compare the same numbers before sorting and after, but I don't know how to do it in C.
I need some advice on how this can be implemented.
As chqrlie has rightly pointed out, the following approach does NOT prove the absence of stability mistakes for all inputs. It is however something of a smoke test, which actually has a chance of detecting implementation mistakes. This is at least a small step ahead of trying to test based on plain sequences of numbers, which do NOT allow detecting ANY stability mistake.
In order to test stability you therefore need to do sorting on elements which have the same sorting key AND can still be told apart outside of the sorting.
One way would be to sort elements with two keys. One is non-unique, i.e. there are pairs of elements with the same first key. The other, second key is unique, i.e. no two elements have the same second key.
Then create several pairs of elements with same first key and different second key.
Then sort.
Then verify that the relative positions of elements with the same first key have not changed; use the non-identical second key to do so.
Then sort again and verify that the elements with same first key have again not changed (relative) location. This is for paranoia, the algorithm (strictly speaking the tested implementation) might be stable from an unsorted list, but unstable on a sorted one.
Then restart with unsorted list/array, similar to before, but with reversed relative location of elements with identical first key.
Then sort and verify unchanged relative location AND reverse relative location compared to first sorting.
Then sort again and verify for paranoia.
As mentioned this will not detect ALL stability mistakes, but at least there are two improvements over a simple "do and verify" test here.
test unsorted and sorted (described as paranoia); this will detect all toggle-based stability mistakes, i.e. any mechanism that always changes the relative order of same-keyed pairs
test both "directions" of same-keyed pairs; this at least ensures detection of all stability mistakes which are caused by some kind of "preferred order", by making sure that the non-preferred order is among the test inputs and would get sorted into the preferred order
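The two-key idea above can be sketched roughly as follows. Everything here is illustrative: the struct item type, the insertion_sort stand-in for the sort under test, and the looks_stable checker are assumed names, and the check only works if id values are assigned in ascending order of the original positions.

```c
#include <stddef.h>

/* Hypothetical record: `key` is what the sort compares (non-unique),
 * `id` is unique and records the element's original position. */
struct item {
    int key;
    int id;
};

/* Plain insertion sort on `key` only; stands in for the sort under test. */
void insertion_sort(struct item *v, size_t n)
{
    size_t i, j;
    for (i = 1; i < n; i++) {
        struct item tmp = v[i];
        /* strict > keeps equal keys in their original order */
        for (j = i; j > 0 && v[j - 1].key > tmp.key; j--)
            v[j] = v[j - 1];
        v[j] = tmp;
    }
}

/* Returns 1 if v is ordered by key and, among equal keys, ids are still
 * ascending; returns 0 otherwise. */
int looks_stable(const struct item *v, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (v[i - 1].key > v[i].key)
            return 0;   /* not even sorted */
        if (v[i - 1].key == v[i].key && v[i - 1].id > v[i].id)
            return 0;   /* a same-keyed pair changed its relative order */
    }
    return 1;
}
```

For the second "direction", fill the input so that same-keyed elements appear in the opposite arrangement, assign ids by position again, and repeat the check. Running the sort a second time on the already sorted array covers the paranoia step.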
If one is going to test it probabilistically, one needs to have, at minimum, a key and a monotonic value to test whether the elements are in the same order before and after. There are several ways that one can do this. I would think that one of the simpler, more effective ways is to sort them by oddness: the key is { even, odd } and the value is the integer. Having only two key values increases the chance that a non-stable sort algorithm puts an item in the wrong spot. Here, the qsort function is tested.
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
static int cmp(const int *const a, const int *const b)
{ return (*a & 1) - (*b & 1); }
static int void_cmp(const void *a, const void *b) { return cmp(a, b); }
int main(void) {
unsigned a[10], counter = 0;
size_t i;
const size_t a_size = sizeof a / sizeof *a;
const unsigned seed = (unsigned)clock();
/* http://c-faq.com/lib/randrange.html */
printf("Seed %u.\n", seed), srand(seed);
for(i = 0; i < a_size; i++)
a[i] = (counter += 1 + rand() / (RAND_MAX / (2 - 1 + 1) + 1));
qsort(a, a_size, sizeof *a, &void_cmp);
for(i = 0; i < a_size; i++) printf("%s%u", i ? ", " : "", a[i]);
printf("\n");
/* Even numbers. */
for(counter = 0, i = 0; i < a_size && !(a[i] & 1) && counter < a[i];
counter = a[i], i++);
if(i == a_size) goto pass;
if(!(a[i] & 1)) goto fail; /* Not stable by `counter >= a[i]`. */
/* Odd numbers. */
for(counter = 0; i < a_size && counter < a[i]; counter = a[i], i++);
if(i == a_size) goto pass;
fail:
printf("Not stable.\n");
return EXIT_FAILURE;
pass:
printf("Possibly stable.\n");
return EXIT_SUCCESS;
}
Note that this doesn't prove that an algorithm is stable, but after sorting a large number of elements, the chance that it gets it right by chance is exponentially small.

Need help solving a problem with an array

Task: Calculate the 25 values of the function y = ax^2 + bx + c on the interval [e, f], save them in the array Y and find the minimum and maximum values in this array.
#include <stdio.h>
#include <math.h>
int main()
{
float Y[25];
int i;
int x=3,a,b,c;
double y = a*pow(x,2)+b*x+c;
printf("a = ", b);
scanf("%d", &a);
printf("b = ", a);
scanf("%d", &b);
printf("c = ", c);
scanf("%d", &c);
for(i=0;i<25;i++)
{
printf("%f",y); //output results, not needed
x++;
}
system("pause");
}
Problems:
Cant understand how can I use "interval [e,f]" here
Cant understand how to save values to array using C libraries
Cant understand how to write/make a cycle, which will find the minimum and maximum values
Finally, dont know what exactly i need to do to solve task
You must first ask the user for the values of a, b, c (or initialize those variables), and do the same for the interval bounds e and f.
Now calculate the step between points, double interval = (f - e)/25.0. (Divide by 24.0 instead if both e and f must be among the 25 points.)
Then you need a loop that calculates each value of the function. C does not allow declaring an int and a double in the same for initializer, so declare double x = e before the loop and write for (int i = 0; i < 25; i++, x += interval). You can choose to store the results in an array (declare one at the top) or print them directly.
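As a rough illustration of the sampling step, here is a small helper that fills an array with the function values. The name sample_quadratic and its parameter list are assumptions for this sketch, and it uses n - 1 steps so that both e and f are sampled, which is one possible reading of the task.

```c
#include <stddef.h>

/* Fills y[0..n-1] with a*x*x + b*x + c at n evenly spaced points of [e, f].
 * Assumption: n >= 2. Using n - 1 steps includes both endpoints; dividing
 * by n instead would stop one step short of f. */
void sample_quadratic(double a, double b, double c,
                      double e, double f, double *y, size_t n)
{
    double step = (f - e) / (double)(n - 1);
    size_t i;

    for (i = 0; i < n; i++) {
        /* x is recomputed from i so rounding errors do not accumulate */
        double x = e + (double)i * step;
        y[i] = a * x * x + b * x + c;
    }
}
```

For the task as stated, declare double Y[25]; and call sample_quadratic(a, b, c, e, f, Y, 25).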
Problems:
Cant understand how can I use "interval [e,f]" here
(f-e) / 25(interval steps)
Cant understand how to save values to array using C libraries
You need to use some form of loop to traverse the array and save the result of your calculation at every interval step. Something like this:
for(int i = 0; i < SIZE; i++)
// SIZE in this case 25, so you traverse from 0-24 since arrays start 0
Cant understand how to write/make a cycle, which will find the minimum and maximum values
For both cases:
traverse the array with some form of loop and check every item e.g. (again) something like this: for(int i = 0; i < SIZE; i++)
For min:
Initialize a double value(key) with the first element of your array
Loop through your array searching for elements smaller than your initial key value.
if your array at position i is smaller than key, save key = array[i];
For max:
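Initialize a double value(key) with the first element of your array (starting from 0 would give the wrong answer if every value is negative)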
Loop through your array searching for elements bigger than your initial key value.
if your array at position i is bigger than key, save key = array[i];
Finally, dont know what exactly i need to do to solve task
Initialize your variables(yourself or through user input)
Create a function that calculates a*x^2 + b*x + c n times for every step of your interval.
Create a function for min & max that loops through your array and returns the smallest/biggest value.
That's pretty much it. I will refrain from posting code (for now), since this looks like an assignment to me and I am confident that you can write the code yourself with the information @Paul Ogilvie and I have provided. Good luck.
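#include <stdio.h>
#define N 25 /* number of sample points */
int main(void)
{
    double y[N];
    double a, b, c, e, f, min, max;
    int i;
    printf("Enter a, b, c: ");
    if (scanf("%lf %lf %lf", &a, &b, &c) != 3) return 1;
    printf("Enter the interval bounds e and f: ");
    if (scanf("%lf %lf", &e, &f) != 2) return 1;
    /* N points evenly spaced over [e, f], both endpoints included */
    for (i = 0; i < N; i++)
    {
        double x = e + i * (f - e) / (N - 1);
        y[i] = a * x * x + b * x + c;
    }
    /* a parabola is not monotonic in general, so scan for min and max
       instead of assuming they sit at the ends of the array */
    min = max = y[0];
    for (i = 1; i < N; i++)
    {
        if (y[i] < min) min = y[i];
        if (y[i] > max) max = y[i];
    }
    printf("The maximum value in the given interval is %f\n", max);
    printf("The minimum value in the given interval is %f\n", min);
    return 0;
}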
Good LUCK!

Is it cheating to use 'static' when writing a recursive algorithm?

As part of a programming assignment, I'm required to write a recursive function which determines the largest integer in an array. To quote the exact task:
Write a recursive function that finds the largest number in a given list of
integers.
I have come up with two solutions, the first of which makes two recursive calls:
int largest(int arr[], int length){
if(length == 0)
return 0;
else if(arr[length - 1] > largest(arr,length -1))
return arr[length];
else return largest(arr,length -1);
}
The second one makes only one, however it uses a static variable n:
int largest(int arr[], int length){
static int n = -1;
if(length == 0)
return n;
else if (arr[length - 1] > n)
n = arr[length - 1];
return largest(arr, length - 1);
}
I was wondering whether it would be considered cheating use static variables for such a task. Either way, which one is considered better form? Is there a recursive method which tops both?
I wouldn't say that it's cheating to use static variables this way - I'd say that it's incorrect. :-)
Imagine that you call this function multiple times on a number of different arrays. With the static variable introduced, the value of n never resets between calls, so you may end up returning the wrong value. Generally speaking, it's usually poor coding style to set things up like this, since it makes it really easy to get the wrong answer. Additionally, if your array contains only negative values, you may return -1 as the answer even though -1 is actually bigger than everything in the array.
I do think that the second version has one nice advantage over the first - it's much, much faster because it makes only one recursive call rather than two. Consider using the first version, but updating it so that you cache the value returned by the recursive call so that you don't make two calls. This will exponentially speed up the code; the initial version takes time Θ(2^n) in the worst case, while the updated version would take time Θ(n).
There is nothing cheating using a static inside function, recursive or otherwise.
There can be many good reasons to do so, but in your case I suspect that you are coming up with a wrong solution, in that largest will only work once in the lifetime of the program running it.
consider the following (pseudo) code;
main() {
largest([ 9, 8, 7]) // would return 9 -- OK
largest([ 1, 2, 3]) // would return 9 ?? bad
}
The reason being that your largest cannot tell the difference between the two calls, but if that is what you want then that is fine.
Edit:
In answer to your comment, something like this will have a better big-O complexity than your initial code:
#include <limits.h>
int largest(int arr[], int length){
    int split, lower, upper;
    switch (length) {
    case 1: return arr[0];
    case 2: if (arr[1]>arr[0]) return arr[1]; else return arr[0];
    default:
        /* C has no exceptions; INT_MIN is an assumed error value for an empty array */
        if (length <= 0) return INT_MIN;
        split = length/2;
        lower = largest(arr,split);
        upper = largest(arr+split,length-split);
        if (lower > upper) return lower; else return upper;
    }
}
Alternatively, the obvious solution is:
int largest(int arr[], int length){
    if (length <= 0) return INT_MIN; /* assumed error value for an empty array; needs <limits.h> */
    int max = arr[0];
    for (int i=1; i<length; i++)
        if (arr[i] > max) max = arr[i];
    return max;
}
which has no recursion at all.
It is actually a terrible design, because the second execution of the function does not return a correct result.
I don't think you need to debate whether it is cheating, if it is wrong.
The first version is also incorrect, because you return arr[length] instead of arr[length-1]. You can eliminate the second recursive call. What can you do instead of calling the same function (with no side-effects) twice with the same arguments?
In addition to the excellent points in the three prior answers, you should practice having more of a recursion-based mind. (1) Handle the trivial case. (2) For a non-trivial case, make a trivial reduction in the task and recur on the (smaller) remaining problem.
I propose that your proper base case is a list of one item: return that item. An empty list has no largest element.
For the recursion case, check one element (here the last) against the max of the rest of the list; return the larger. In code form, this looks like the below. It makes only one recursive call, and has only one explicit local variable -- and that is to serve as an alias for the recursion result.
int largest(int arr[], int length){
    int n;
    if(length == 1)
        // if only one element, return it
        return arr[0];
    n = largest(arr, length-1);
    // return the larger of the last element or the largest of the rest.
    return arr[length-1] > n ? arr[length-1] : n;
}
Is there a recursive method which tops both?
Recursion gets a bad name when N elements cause a recursion depth of N, as with return largest(arr,length -1);
To avoid this, ensure the length is halved on each recursion.
The maximum recursive depth is then O(log2(N))
int largest(int arr[], int length) {
if (length <= 0) return INT_MIN;
int big = arr[0];
while (length > 1) {
int length_r = length / 2;
int length_l = length - length_r;
int big_r = largest(&arr[length_l], length_r);
if (big_r > big) big = big_r;
length = length_l;
}
return big;
}
A sneaky and fast method that barely uses recursion as finding the max is trivial with a loop.
int largest(int arr[], int length) {
if (length <= 0) return INT_MIN;
int max = largest(NULL, -1);
while (length) {
length--;
if (arr[length] > max) max = arr[length];
}
return max;
}

Array index of smallest number greater than a given number in a sorted array using C

Need to find the index of a number, that may or may not be present in the array. I tried the below code:
#include <stdio.h>
#include <stdlib.h>
int cmp(const void *lhs, const void *rhs){
return ( *(long long*)lhs - *(long long*)rhs );
}
int main(){
int size = 9;
long long a[] = {16426799,16850699,17802287,18007499,18690047,18870191,18870191,19142027,19783871};
long long x = 17802287;
long long *p = (long long *)bsearch(&x, a, size, sizeof(long long), cmp);
if (p != NULL)
printf("%lld\n", p - a);
return 0;
}
The above code works if the number, in this case 17802287, is present in the array a, but fails if the number is not present in a; e.g. it doesn't give any output for x=18802288. I would like to get the index i=5 in that case, since from the 5th element onwards the elements are greater than 18802288.
Also the actual array size will have number of elements more than 4 million, would the same code work?
Thanks for the help.
From the man page for bsearch:
The bsearch() function returns a pointer to a matching member of
the array, or NULL if no match is found. If there are multiple
elements that match the key, the element returned is unspecified.
So the function will return NULL if the element in question is not found. If you want to find the first element greater than or equal to the number in question, you'll need to roll your own function to do that.
One possible solution is a linear scan that stops at the first element greater than or equal to x:
int i, outcome = -1;
for( i = 0; i < size; i++ )
{
if( a[i] >= x )
{
outcome = i;
break;
}
}
printf("%d\n", outcome);
You need to write a function that does approximately this:
bsearch_geq (number array low high)
if low is equal to high return high
let halfway be average of low and high
if array[halfway] is equal to number then return halfway
if array[halfway] is greater than number then
return result of "bsearch_geq number array low halfway"
else
return result of "bsearch_geq number array halfway high"
That'll get you 99% of the way, I think, but I'll leave it as an exercise to the reader to figure out the corner cases. The main one I can see is what happens when you get down to just two numbers because the naive "average" may cause infinite recursion.
If you can have multiple occurrences of the same number in the array then you'll need to drop the "if array[halfway] is equal" line.
You should ensure your solution uses tail-recursion for efficiency, but it's not too critical as 4m data-entries only amounts to about 22 recursive calls (log2 of 4,000,000 is roughly 22).
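An iterative version sidesteps both the recursion and the two-element corner case mentioned above. The sketch below is one way to write it; the name lower_bound_ll and the convention of returning n when no element qualifies are assumptions, not part of bsearch or the C standard library.

```c
#include <stddef.h>

/* Returns the index of the first element >= x in the ascending array
 * a[0..n-1], or n if every element is smaller than x. */
size_t lower_bound_ll(const long long *a, size_t n, long long x)
{
    size_t low = 0, high = n;   /* the answer always lies in [low, high] */

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        if (a[mid] < x)
            low = mid + 1;      /* a[mid] is too small, discard it */
        else
            high = mid;         /* a[mid] qualifies, but an earlier one may too */
    }
    return low;
}
```

With the array from the question, x = 18802288 gives index 5 and x = 17802287 gives index 2. Because the window shrinks on every iteration, the loop terminates even when only one or two candidates remain, and duplicates resolve to the first occurrence.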

How to improve execution speed on large data sort in C

I managed to roll off an insertion sort routine as shown:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct{
int n;
char l;
char z;
} dat;
void sortx(dat* y){
char tmp[sizeof(dat)+1];
dat *sp=y;
while(y->l){
dat *ip=y;
while(ip>sp && ip->n < (ip-1)->n){
memcpy(tmp,ip,sizeof(dat));
memcpy(ip,ip-1,sizeof(dat));
memcpy(ip-1,tmp,sizeof(dat));
ip--;
}
y++;
}
}
void printa(dat* y){
while(y->l){printf("%c %d,",y->l,y->n);y++;}
printf("\n");
}
int main(int argc,char* argv[]){
const long sz=10000;
dat* new=calloc(sz+2,sizeof(dat));
dat* randx=new;
//fill struct array with random values
int i;
for (i = 0 ; i < sz ; i++) {
randx->l = (unsigned char)(65+(rand() % 25));
randx->n = (rand() % 1000);randx++;
}
//sort - takes forever
sortx(new);
printa(new);
free(new);
return 0;
}
My sorting routine was partly derived from: http://www.programmingsimplified.com/c/source-code/c-program-insertion-sort
but because I am dealing with sorting the array based on the numeric value in the struct, memcpy works for me so far.
The computer I'm using to execute this code has a Pentium 1.6Ghz Processor and when I change sz in the main function to at least 20000, I notice I have to wait two seconds to see the results on the screen.
The reason why I'm testing large numbers is because I want to process server logs in C and will be sorting information by timestamps and sometimes the logs can become very large, and I don't want to put too much strain on the CPU as it is running other processes already such as apache.
Is there any way I can improve this code so I don't have to wait two seconds to see 20000 structs sorted?
There is already a function that does this, and it's built into the C standard library: qsort. You just have to provide a suitable comparison function.
This function has to return a negative value if the item taken as the left argument should be put earlier in the desired order, a positive value if it should be put later, or 0 if the items are to be considered equal by qsort.
int dat_sorter(const void* l, const void* r)
{
const dat* left = (const dat*)l;
const dat* right = (const dat*)r;
if(left->n > right->n)
return 1;
else if(left->n < right->n)
return -1;
else
return 0;
}
void sortx(dat* y)
{
/* find the length */
dat* it = y;
size_t count = 0;
while(it->l)
{
count++;
it++;
}
/* do the sorting */
qsort(y, count, sizeof(dat), dat_sorter);
}
If you want to speed it up even more, you can make sortx function take length of the array, so the function won't need to figure it out on its own.
Use quick sort, heap sort, or bottom up merge sort. Wiki has examples of these in their articles, and typically have more complete examples on each article's talk page.
Insertion sort has O(n^2) time complexity, and there are other algorithms out there that will give you O(n log n) time complexity, like mergesort, quicksort (on average), and heapsort. It looks like you are sorting by an integer, so you also might want to consider using LSD radix sort, which runs in O(n*k) time for keys of k digits, i.e. effectively linear for fixed-width integers.
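For reference, an LSD radix sort over the n field might be sketched like this. It repeats the dat typedef from the question so the block stands alone; the function name radix_sort_dat, the one-byte-per-pass digit size, and the assumption that n is never negative are choices made for this illustration.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int n;
    char l;
    char z;
} dat;

/* Sorts v[0..count-1] by the field n, one byte per pass, least significant
 * byte first. Assumption: n is non-negative (negative values would sort
 * after positive ones). Returns 0 on success, -1 if the scratch buffer
 * cannot be allocated. */
int radix_sort_dat(dat *v, size_t count)
{
    dat *tmp;
    unsigned shift;

    if (count < 2)
        return 0;
    tmp = malloc(count * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (shift = 0; shift < sizeof(int) * CHAR_BIT; shift += 8) {
        size_t bucket[257] = {0};
        size_t i;

        /* count how often each byte value occurs */
        for (i = 0; i < count; i++)
            bucket[(((unsigned)v[i].n >> shift) & 0xFF) + 1]++;
        /* prefix sums turn the counts into start offsets */
        for (i = 1; i < 257; i++)
            bucket[i] += bucket[i - 1];
        /* scatter in input order, which keeps each pass stable */
        for (i = 0; i < count; i++)
            tmp[bucket[((unsigned)v[i].n >> shift) & 0xFF]++] = v[i];
        memcpy(v, tmp, count * sizeof *v);
    }
    free(tmp);
    return 0;
}
```

Each pass is linear in count, and the number of passes is fixed by the width of int. The cost is an extra buffer the size of the input, so for a few tens of thousands of records qsort is likely the simpler choice.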

Resources