As part of a programming assignment, I'm required to write a recursive function which determines the largest integer in an array. To quote the exact task:
Write a recursive function that finds the largest number in a given list of
integers.
I have come up with two solutions, the first of which makes two recursive calls:
int largest(int arr[], int length){
if(length == 0)
return 0;
else if(arr[length - 1] > largest(arr,length -1))
return arr[length];
else return largest(arr,length -1);
}
The second one makes only one, however it uses a static variable n:
int largest(int arr[], int length){
static int n = -1;
if(length == 0)
return n;
else if (arr[length - 1] > n)
n = arr[length - 1];
return largest(arr, length - 1);
}
I was wondering whether it would be considered cheating to use static variables for such a task. Either way, which one is considered better form? Is there a recursive method which tops both?
I wouldn't say that it's cheating to use static variables this way - I'd say that it's incorrect. :-)
Imagine that you call this function multiple times on a number of different arrays. With the static variable introduced, the value of n never resets between calls, so you may end up returning the wrong value. Generally speaking, it's usually poor coding style to set things up like this, since it makes it really easy to get the wrong answer. Additionally, if your array contains only negative values, you may return -1 as the answer even though -1 is actually bigger than everything in the array.
I do think that the second version has one nice advantage over the first: it's much, much faster because it makes only one recursive call rather than two. Consider using the first version, but updating it so that you cache the value returned by the recursive call so that you don't make two calls. This will exponentially speed up the code; the initial version takes time Θ(2^n) in the worst case, while the updated version would take time Θ(n).
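To make the exponential-versus-linear difference concrete, here is a minimal sketch that counts calls for both variants. The names `largest_naive`, `largest_cached` and the two counters are made up for this illustration, the base case is a one-element array (so it assumes `length >= 1`), and the naive variant uses the corrected index `arr[length - 1]`.

```c
/* Hypothetical call counters, for illustration only. */
static long calls_naive = 0;
static long calls_cached = 0;

/* Two-call variant: may recurse twice on the same sub-array.
   Assumes length >= 1. */
int largest_naive(const int arr[], int length) {
    calls_naive++;
    if (length == 1)
        return arr[0];
    if (arr[length - 1] > largest_naive(arr, length - 1))
        return arr[length - 1];
    return largest_naive(arr, length - 1);   /* second, redundant call */
}

/* Cached variant: the recursive result is stored once and reused.
   Assumes length >= 1. */
int largest_cached(const int arr[], int length) {
    calls_cached++;
    if (length == 1)
        return arr[0];
    int rest = largest_cached(arr, length - 1);
    return arr[length - 1] > rest ? arr[length - 1] : rest;
}
```

On a descending 10-element array (the worst case for the two-call variant, since the last element never beats the rest), `largest_naive` makes 2^10 - 1 = 1023 calls while `largest_cached` makes 10.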
There is nothing cheating about using a static inside a function, recursive or otherwise.
There can be many good reasons to do so, but in your case I suspect that you have come up with a wrong solution, in that largest will only work once in the lifetime of the program running it.
Consider the following (pseudo) code:
main() {
largest([ 9, 8, 7]) // would return 9 -- OK
largest([ 1, 2, 3]) // would return 9 ?? bad
}
The reason being that your largest cannot tell the difference between the two calls, but if that is what you want then that is fine.
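The pseudo-code above can be reproduced with the static version from the question (renamed `largest_static` here). As one possible fix, the sketch also shows the same single-call recursion with the running maximum passed as a parameter instead of kept in a static; `largest_from` and `largest_param` are hypothetical names, and `largest_param` assumes a non-empty array.

```c
/* The static-variable version from the question, renamed for this sketch.
   The value of n survives between top-level calls. */
int largest_static(int arr[], int length) {
    static int n = -1;
    if (length == 0)
        return n;
    else if (arr[length - 1] > n)
        n = arr[length - 1];
    return largest_static(arr, length - 1);
}

/* Hypothetical helper: the running maximum travels as an argument,
   so nothing is remembered between top-level calls. */
static int largest_from(const int arr[], int length, int best) {
    if (length == 0)
        return best;
    if (arr[length - 1] > best)
        best = arr[length - 1];
    return largest_from(arr, length - 1, best);
}

/* Assumes length >= 1; seeds the maximum with a real element,
   so all-negative arrays are handled too. */
int largest_param(const int arr[], int length) {
    return largest_from(arr, length - 1, arr[length - 1]);
}
```

Calling `largest_static` on {9, 8, 7} and then on {1, 2, 3} returns 9 both times, whereas `largest_param` returns 9 and then 3.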
Edit:
In answer to your comment, something like this will have better big-O complexity than your initial code:
int largest(int arr[], int length){
int split, lower,upper;
switch (length) {
case 1: return arr[0];
case 2: if (arr[1]>arr[0]) return arr[1]; else return arr[0];
default:
if (length <= 0) abort(); /* an empty array has no largest element; abort() needs <stdlib.h> */
split = length/2;
lower = largest(arr,split);
upper = largest(arr+split,length-split);
if (lower > upper) return lower; else return upper;
}
}
Alternatively, the obvious solution is:
int largest(int arr[], int length){
if (length <= 0) abort(); /* an empty array has no largest element; abort() needs <stdlib.h> */
int max = arr[0];
for (int i=1; i<length; i++)
if (arr[i] > max) max = arr[i];
return max;
}
which has no recursion at all.
It is actually a terrible design, because the second execution of the function does not return a correct result.
I don't think you need to debate whether it is cheating if it is wrong.
The first version is also incorrect, because you return arr[length] instead of arr[length-1]. You can eliminate the second recursive call. What can you do instead of calling the same function (with no side-effects) twice with the same arguments?
In addition to the excellent points in the three prior answers, you should practice having more of a recursion-based mind. (1) Handle the trivial case. (2) For a non-trivial case, make a trivial reduction in the task and recur on the (smaller) remaining problem.
I propose that your proper base case is a list of one item: return that item. An empty list has no largest element.
For the recursion case, check one element (here, the last) against the max of the rest of the list; return the larger. In near-code form, this looks like the below. It makes only one recursive call, and has only one explicit local variable -- and that is to serve as an alias for the recursion result.
int largest(int arr[], int length){
if(length == 1)
// if only one element, return it
return arr[0];
int n = largest(arr, length-1);
// return the larger of the last element or the largest of the rest.
return arr[length-1] > n ? arr[length-1] : n;
}
Is there a recursive method which tops both?
Recursion gets a bad name when N elements cause a recursion depth of N, as with return largest(arr,length -1);
To avoid this, ensure the length is halved on each recursion.
The maximum recursion depth is then O(log2(N)).
int largest(int arr[], int length) {
if (length <= 0) return INT_MIN;
int big = arr[0];
while (length > 1) {
int length_r = length / 2;
int length_l = length - length_r;
int big_r = largest(&arr[length_l], length_r);
if (big_r > big) big = big_r;
length = length_l;
}
return big;
}
A sneaky and fast method barely uses recursion at all, since finding the max is trivial with a loop.
int largest(int arr[], int length) {
if (length <= 0) return INT_MIN;
int max = largest(NULL, -1);
while (length) {
length--;
if (arr[length] > max) max = arr[length];
}
return max;
}
I'm trying to create some code to fish out records from a list of about 200k to 1 million records. Obviously, I would want this process to be as fast as possible. The basic idea is as follows: the records in the large list are a combination of numbers which are to be kept together. For example:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400076,400097,800076,800097
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,200032,200078,500032,500078
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300043,300083,600043,600083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,600026,600077,900026,900077
0,0,0,0,0,0,0,0,0,0,0,0,0,0,100008,100028,400028,400056,600008,600056
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400042,400098,500042,500098
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,86,500015,500086
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400013,400076,800013,800076
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,700024,700083,900024,900083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100003,100047,800003,800047
The maximum length of the record is 20 which is why the additional zeroes. Let's not worry about these for a moment. So, I want to "fish" out some records such that no repetitions are observed. If one repetition is there, I can discard that record and no longer look at it further. Thus, I must compile a list which looks like this:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400076,400097,800076,800097
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,200032,200078,500032,500078
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,300043,300083,600043,600083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,600026,600077,900026,900077
0,0,0,0,0,0,0,0,0,0,0,0,0,0,100008,100028,400028,400056,600008,600056
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,400042,400098,500042,500098
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,86,500015,500086
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,700024,700083,900024,900083
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,100003,100047,800003,800047
Note how in the above list, record no. 8 is missing because the number 400076 already exists in a previous record.
The code I am using to do this is as follows:
void Make_List(ConfigList *pathgroups, ConfigList *configlist)
{
int i,j,k,l,flag,pg_num=0,len,p_num=0;
for(i = 0;i<configlist->num_total;i++)
{
flag = 0;
for(j = configlist->configsize-1;j>=0;j--)
{
if(configlist->pathid[i][j])
{
for(k = 0;k<pg_num;k++)
{
for(l = pathgroups->configsize-1;l>=0;l--)
{
if(pathgroups->pathid[k][l])
{
if(configlist->pathid[i][j]==pathgroups->pathid[k][l])
{
flag++;
break;
}
}
else
{
break;
}
}
if(flag)
{
break;
}
}
}
else
{
break;
}
if(flag)
{
break;
}
}
if(!flag)
{
len = 0;
for(j = configlist->configsize-1;j>=0;j--)
{
pathgroups->pathid[pg_num][j]=configlist->pathid[i][j];
if(configlist->pathid[i][j])
{
len++;
}
}
pg_num++;
p_num+=len;
if(p_num>=totpaths)
{
break;
}
}
}
Print_ConfigList(stderr,pathgroups);
}
The structure ConfigList basically stores the 2D array along with other things used in different parts of the program.
num_total tells us the number of rows in the array whereas configsize tells us the number of columns in the array.
totpaths is a breakpoint which terminates the loop early in case assignment is completely finished.
Checking if each element is repeated for each new element analyzed has a computational cost of O(N^2) which, given your large input set, is far too much.
Basically, what you need is a fast access data-structure where you can keep a count of how many times your record has appeared or at least a boolean flag.
The easiest way to do this is to have an array where each position represents a possible value and the array entry at that position holds the number of times that value has appeared (or a boolean flag for its existence). However, if your data range is too large you cannot do this, because the memory used to store the array is proportional to the range size.
The alternative to avoid that is to use Hash tables or sets.
As you have established in your comments above, your integer range is [0,99999999], so if you wanted to use an array with one byte per value to keep track of whether each value is present, you would need approximately 96 MB to store it in memory.
This is an example using byte arrays:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#define MAX_IN_RANGE 99999999
int main()
{
char * isInInput = (char*)malloc(MAX_IN_RANGE+1);
memset(isInInput,0,MAX_IN_RANGE+1);
size_t i;
int inputExample[] = {1,3,5,2,1,5};
for(i = 0; i < 6; i++)
{
int value = inputExample[i];
printf("%d\n",value);
if(!isInInput[value])
{
printf("Add value %d to your collection\n", value);
isInInput[value] = 1;
}
else
{
printf("%d is repeated\n", value);
}
}
free(isInInput);
}
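If 96 MB is more than you want to spend, the same presence array can be packed to one bit per value, which comes to about 12 MB for this range. This is only a sketch of that variant; the `SeenBits` type and the `seen_*` functions are hypothetical names, not part of any library.

```c
#include <limits.h>
#include <stdlib.h>

#define MAX_IN_RANGE 99999999u

/* Hypothetical bit-packed presence array: one bit per possible value. */
typedef struct {
    unsigned char *bits;
} SeenBits;

/* Allocates a zeroed bit array; returns 1 on success, 0 on failure. */
int seen_init(SeenBits *s) {
    size_t nbytes = (MAX_IN_RANGE / CHAR_BIT) + 1;
    s->bits = calloc(nbytes, 1);
    return s->bits != NULL;
}

/* Returns 1 if value was marked before, 0 otherwise. */
int seen_test(const SeenBits *s, unsigned value) {
    return (s->bits[value / CHAR_BIT] >> (value % CHAR_BIT)) & 1u;
}

/* Marks value as present. Assumes value <= MAX_IN_RANGE. */
void seen_set(SeenBits *s, unsigned value) {
    s->bits[value / CHAR_BIT] |= (unsigned char)(1u << (value % CHAR_BIT));
}

void seen_free(SeenBits *s) {
    free(s->bits);
    s->bits = NULL;
}
```

The trade-off is a few extra shift and mask operations per lookup in exchange for an eightfold smaller table.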
To use hash tables instead you can rely on libraries such as Judy in order to avoid implementing your own hash table.
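If pulling in a library is not an option, a small hand-rolled set may be enough for this job. The following is a rough sketch, not a tuned implementation: `IntSet`, the `intset_*` functions and `keep_record` are hypothetical names. It assumes ids are nonzero (0 is the padding value, so it doubles as the empty-slot marker), that the capacity is a power of two, and that the capacity is chosen larger than the number of distinct ids ever inserted, since the table never grows.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity open-addressing set of nonzero ids. */
typedef struct {
    uint32_t *slots;     /* 0 means "empty slot" */
    size_t capacity;     /* must be a power of two */
} IntSet;

/* Returns 1 on success, 0 if allocation fails. */
int intset_init(IntSet *s, size_t capacity_pow2) {
    s->slots = calloc(capacity_pow2, sizeof *s->slots);
    s->capacity = capacity_pow2;
    return s->slots != NULL;
}

void intset_free(IntSet *s) {
    free(s->slots);
    s->slots = NULL;
}

/* Linear probing: returns the slot holding key, or the empty slot
   where key would be stored. Loops forever if the table is full. */
static size_t intset_slot(const IntSet *s, uint32_t key) {
    uint32_t h = key * 2654435761u;   /* multiplicative hash */
    h ^= h >> 16;
    size_t i = h & (s->capacity - 1);
    while (s->slots[i] != 0 && s->slots[i] != key)
        i = (i + 1) & (s->capacity - 1);
    return i;
}

int intset_contains(const IntSet *s, uint32_t key) {
    return s->slots[intset_slot(s, key)] == key;
}

void intset_add(IntSet *s, uint32_t key) {
    s->slots[intset_slot(s, key)] = key;
}

/* Returns 1 and remembers the record's ids if none of its nonzero ids
   was seen in an earlier kept record; returns 0 and stores nothing otherwise. */
int keep_record(IntSet *seen, const int *record, int size) {
    for (int j = 0; j < size; j++)
        if (record[j] != 0 && intset_contains(seen, (uint32_t)record[j]))
            return 0;
    for (int j = 0; j < size; j++)
        if (record[j] != 0)
            intset_add(seen, (uint32_t)record[j]);
    return 1;
}
```

With a set like this, each record costs time proportional to its length on average, so the whole pass over the list is roughly linear in the number of records rather than quadratic.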