Cycling through an interval in C efficiently

I have a dynamically allocated array consisting of a lot of numbers (200,000+) and I have to find out if (and how many of) these numbers are contained in a given interval. There can be duplicates, and all the numbers are in random order.
Example of numbers I get at the beginning:
{1,2,3,1484984,48941651,489416,1816,168189161,6484,8169181,9681916,121,231,684979,795641,231484891,...}
Given interval:
<2;150000>
I created a simple algorithm with 2 for loops cycling through all numbers:
for (int j = 0; j <= numberOfRepeats; j++) {   /* one pass per value in the interval */
    for (int i = 0; i < arraySize; i++) {
        if (currentNumber == array[i]) {
            counter++;
        }
    }
    currentNumber++;
}
printf(" -> %d\n", counter);
This algorithm is too slow for my task. Is there a more efficient way to implement my solution? Could sorting the array by value help in this case, or would that be too slow?
Example of working program:
{ 1, 7, 22, 4, 7, 5, 11, 9, 1 }
<4;7>
-> 4

The problem was simple, as the single comment on my question pointed out: there was no reason for the second loop. A single loop could do it alone.
My changed code:
for (int i = 0; i < arraySize; i++) {
    if (array[i] >= startOfInterval && array[i] <= endOfInterval) {
        counter++;
    }
}
This algorithm is too slow for my task. Is there a more efficient way to implement my solution? Could sorting the array by value help in this case, or would that be too slow?
Of course it is slow. A single-pass algorithm should suffice: just count each element as you visit it if it passes the test (n[i] >= lower_bound && n[i] <= upper_bound, or a similar predicate).
Only if you need special handling for duplicates (e.g. not counting them) do you need to track whether you have already seen a value. In that case, the sorting solution will be faster: a qsort(3) call is O(n log(n)) against the O(n^2) your double loop is doing, so it runs in almost linear time; then you make a second pass over the data, giving a total complexity of O(n log(n) + n), still lower than O(n^2) for the large amount of data you have.
Sorting has the advantage that it puts all repeated values together, so you only have to check whether the element you are processing now is the same as the last one you read; if it is different, count it only if it is in the specified range.
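If you take that route, a minimal sketch might look like this (the function and parameter names are mine, not from the question):

#include <stdlib.h>

/* comparison callback for qsort(3) */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* count the distinct values of arr[0..n-1] that lie in [lo, hi] */
size_t count_distinct_in_range(int *arr, size_t n, int lo, int hi)
{
    size_t count = 0;
    qsort(arr, n, sizeof arr[0], cmp_int);   /* O(n log n) */
    for (size_t i = 0; i < n; i++) {         /* second, linear pass */
        if (i > 0 && arr[i] == arr[i - 1])
            continue;                        /* duplicates are adjacent now */
        if (arr[i] >= lo && arr[i] <= hi)
            count++;
    }
    return count;
}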
One final note: reading a set of 200,000 integers into an array just to filter them on some criterion is normally a bad, non-scalable way to solve a problem. Your problem (select the elements that belong to a given interval) allows for a scalable and better solution by streaming it: you read a number, check whether it is in the interval, then output it, count it, or do whatever you like with it, without using a large amount of memory to hold all the numbers before starting. That is a far better way to solve the problem, as it allows you to read a truly unbounded set of numbers (coming e.g. from a file) and produce an output based on that:
#include <stdio.h>

#define A (2)
#define B (150000)

int main()
{
    int the_number;
    size_t count = 0;
    int res;
    while ((res = scanf("%d", &the_number)) > 0) {
        if (the_number >= A && the_number <= B)
            count++;
    }
    printf("%zu numbers fitted in the range\n", count);
}
With this approach you can give the program 1.0E26 numbers (assuming you have a file system large enough to hold a file of that size) and your program will still be able to handle them (you cannot create an array with the capacity to hold 10^26 values).

Related

Given an array of integers of size n+1 consisting of the elements [1,n]. All elements are unique except one which is duplicated k times

I have been attempting to solve the following problem:
You are given an array of n+1 integers where all the elements lie in [1,n]. You are also given that one of the elements is duplicated a certain number of times, while the others are distinct. Develop an algorithm to find both the duplicated number and the number of times it is duplicated.
Here is my solution where I let k = number of duplications:
struct LatticePoint { // to hold duplicate and k
    int a;
    int b;
    LatticePoint(int a_, int b_) : a(a_), b(b_) {}
};

LatticePoint findDuplicateAndK(const std::vector<int>& A) {
    int n = A.size() - 1;
    std::vector<int> Numbers(n);
    for (int i = 0; i < n + 1; ++i) {
        ++Numbers[A[i] - 1]; // A[i] is in [1,n], so no out-of-bounds access
    }
    int i = 0;
    while (i < n) {
        if (Numbers[i] > 1) {
            int duplicate = i + 1;
            int k = Numbers[i] - 1;
            return LatticePoint(duplicate, k);
        }
        ++i;
    }
    return LatticePoint(-1, -1); // unreachable for valid input
}
So, the basic idea is this: we go along the array and each time we see the number A[i] we increment the value of Numbers[A[i] - 1]. Since only the duplicate appears more than once, the index of the entry of Numbers with value greater than 1 identifies the duplicate number, and that entry's value minus 1 is the number of duplications. This algorithm is O(n) in time complexity and O(n) in space.
I was wondering if someone had a solution that is better in time and/or space? (or indeed if there are any errors in my solution...)
You can reduce the scratch space to n bits instead of n ints, provided you either have or are willing to write a bitset with run-time specified size (see boost::dynamic_bitset).
You don't need to collect duplicate counts until you know which element is duplicated, and then you only need to keep that count. So all you need to track is whether you have previously seen the value (hence, n bits). Once you find the duplicated value, set count to 2 and run through the rest of the vector, incrementing count each time you hit an instance of the value. (You initialise count to 2, since by the time you get there, you will have seen exactly two of them.)
That's still O(n) space, but the constant factor is a lot smaller.
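A rough C sketch of that scheme, using a plain byte array as the bit set (the function name and signature are mine; it assumes the input really does contain a duplicate):

#include <limits.h>
#include <stdlib.h>

/* a has n+1 elements, all in [1,n]; returns the duplicate value in *dup
 * and the total number of times it occurs in *count */
void find_duplicate(const int *a, int n, int *dup, int *count)
{
    unsigned char *seen = calloc((size_t)n / CHAR_BIT + 1, 1);
    int i;
    for (i = 0; i <= n; i++) {
        unsigned byte = (unsigned)a[i] / CHAR_BIT;
        unsigned bit  = 1u << ((unsigned)a[i] % CHAR_BIT);
        if (seen[byte] & bit)
            break;              /* a[i] has been seen before: the duplicate */
        seen[byte] |= bit;
    }
    *dup = a[i];
    *count = 2;                 /* we have seen exactly two instances so far */
    for (++i; i <= n; i++)
        if (a[i] == *dup)
            ++*count;
    free(seen);
}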
The idea of your code works.
But, thanks to the n+1 elements, we can achieve other tradeoffs of time and space.
If we have some number of buckets we're dividing numbers between, putting n+1 numbers in means that some bucket has to wind up with more than expected. This is a variant on the well-known pigeonhole principle.
So we use 2 buckets, one for the range 1..floor(n/2) and one for floor(n/2)+1..n. After one pass through the array, we know which half the answer is in. We then divide that half into halves, make another pass, and so on. This leads to a binary search which will get the answer with O(1) data, and with ceil(log_2(n)) passes, each taking time O(n). Therefore we get the answer in time O(n log(n)).
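A sketch of that two-bucket binary search (the function name is mine; it returns just the duplicated value, after which one extra O(n) pass can count k):

/* a has n+1 elements, all in [1,n] */
int find_duplicate_binary(const int *a, int n)
{
    int lo = 1, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int in_lower = 0;                  /* bucket for the range lo..mid */
        for (int i = 0; i <= n; i++)
            if (a[i] >= lo && a[i] <= mid)
                in_lower++;
        /* lo..mid can hold only mid-lo+1 distinct values, so an overfull
         * lower bucket pins the duplicate there (pigeonhole) */
        if (in_lower > mid - lo + 1)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}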
Now we don't need to use 2 buckets. If we used 3, we'd take ceil(log_3(n)) passes. So as we increased the fixed number of buckets, we take more space and save time. Are there other tradeoffs?
Well, you showed how to do it in 1 pass with n buckets. How many buckets do you need to do it in 2 passes? The answer turns out to be at least sqrt(n) buckets. And 3 passes is possible with the cube root. And so on.
So you get a whole family of tradeoffs where the more buckets you have, the more space you need, but the fewer passes. And your solution is merely at the extreme end, taking the most space and the least time.
Here's a cheekier algorithm, which requires only constant space but rearranges the input vector. (It only reorders; all the original elements are still present at the end.)
It's still O(n) time, although that might not be completely obvious.
The idea is to try to rearrange the array so that A[i] is i, until we find the duplicate. The duplicate will show up when we try to put an element at the right index and it turns out that that index already holds that element. With that, we've found the duplicate; we have a value we want to move to A[j] but the same value is already at A[j]. We then scan through the rest of the array, incrementing the count every time we find another instance.
#include <utility>
#include <vector>

std::pair<int, int> count_dup(std::vector<int> A) {
    /* Try to put each element in its "home" position (that is,
     * where the value is the same as the index). Since the
     * values start at 1, A[0] isn't home to anyone, so we start
     * the loop at 1.
     */
    int n = A.size();
    for (int i = 1; i < n; ++i) {
        while (A[i] != i) {
            int j = A[i];
            if (A[j] == j) {
                /* j is the duplicate. Now we need to count them.
                 * We have one at i. There's one at j, too, but we only
                 * need to add it if we're not going to run into it in
                 * the scan. And there might be one at position 0. After that,
                 * we just scan through the rest of the array.
                 */
                int count = 1;
                if (A[0] == j) ++count;
                if (j < i) ++count;
                for (++i; i < n; ++i) {
                    if (A[i] == j) ++count;
                }
                return std::make_pair(j, count);
            }
            /* This swap can only happen once per element. */
            std::swap(A[i], A[j]);
        }
    }
    /* If we get here, every element from 1 to n is at home.
     * So the duplicate must be A[0], and the duplicate count
     * must be 2.
     */
    return std::make_pair(A[0], 2);
}
A parallel solution with O(1) complexity is possible.
Introduce an array of atomic booleans and two atomic integers called duplicate and count. First set count to 1. Then access the array in parallel at the index positions of the numbers and perform a test-and-set operation on the boolean. If a boolean is set already, assign the number to duplicate and increment count.
This solution may not always perform better than the suggested sequential alternatives. Certainly not if all numbers are duplicates. Still, it has constant complexity in theory. Or maybe linear complexity in the number of duplicates. I am not quite sure. However, it should perform well when using many cores and especially if the test-and-set and increment operations are lock-free.
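For concreteness, here is a rough sketch of that idea using POSIX threads and C11 atomics (all names and the thread-striping scheme are mine; this illustrates the test-and-set logic rather than a tuned implementation):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NTHREADS 4

static const int *g_a;       /* input: n+1 values, all in [1,n] */
static int g_len;            /* n + 1 */
static atomic_bool *g_seen;  /* one flag per value; index 0 unused */
static atomic_int g_dup;     /* the duplicated value */
static atomic_int g_count;   /* total occurrences of the duplicate, starts at 1 */

static void *worker(void *arg)
{
    int id = (int)(size_t)arg;
    for (int i = id; i < g_len; i += NTHREADS) {
        /* test-and-set: atomic_exchange returns the previous value */
        if (atomic_exchange(&g_seen[g_a[i]], true)) {
            atomic_store(&g_dup, g_a[i]);
            atomic_fetch_add(&g_count, 1);
        }
    }
    return NULL;
}

int main(void)
{
    int a[] = {1, 4, 2, 4, 3, 4};     /* n = 5; the value 4 appears 3 times */
    atomic_bool seen[6] = {false};
    g_a = a;
    g_len = 6;
    g_seen = seen;
    atomic_init(&g_count, 1);

    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)(size_t)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("duplicate %d occurs %d times\n",
           atomic_load(&g_dup), atomic_load(&g_count));
    return 0;
}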

Making a character array rotate its cells left/right n times

I'm totally new here, but I heard a lot about this site, and now that I've been accepted to a 7-month software development 'bootcamp' I'm sharpening my C knowledge for an upcoming test.
I've been assigned a question on a test that I've passed already, but I did not finish that question and it bothers me quite a lot.
The question was a task to write a program in C that moves a character (char) array's cells by 1 to the left (it doesn't quite matter for me in which direction, but the question specified left). I also took it upon myself NOT to use a temporary array/stack or any other structure to hold the entire array data during execution.
So a 'string' or array of chars containing '0' '1' '2' 'A' 'B' 'C' will become
'1' '2' 'A' 'B' 'C' '0' after using the function once.
Writing this was no problem, I believe I ended up with something similar to:
void ArrayCharMoveLeft(char arr[], int arrsize, int times) {
    int i;
    for (i = 0; i < arrsize - 1; i++) {
        ArraySwap2CellsChar(arr, i, i + 1);
    }
}
As you can see the function is somewhat modular since it allows to input how many times the cells need to move or shift to the left. I did not implement it, but that was the idea.
As far as I know there are 3 ways to do this:
Loop ArrayCharMoveLeft times times. This feels instinctively inefficient.
Use recursion in ArrayCharMoveLeft. This should resemble the first solution, but I'm not 100% sure how to implement it.
This is the way I'm trying to figure out: no loop within a loop, no recursion, no temporary array; the program will know how to move the cells x times to the left/right without any issues.
The problem is that after swapping cells say N times, the remaining arrsize - times cells are sometimes not in order. For example:
Using ArrayCharMoveLeft with 3 as times with our given array mentioned above will yield
ABC021 instead of the expected value of ABC012.
I've run the following function for this:
int i;
char* lastcell;
// Input check: if the user inputs a multiple of the array size, there is nothing to do
if (!(times % arrsize))
{
    printf("Nothing to move!\n");
    return;
}
times = times % arrsize;   // reduce times to the remainder modulo the array size
for (i = 0; i < arrsize-times; i++) {
    printf("I = %d ", i);
    PrintArray(arr, arrsize);
    ArraySwap2CellsChar(arr, i, i+times);
}
As you can see, the for runs from 0 to arrsize - times. If this function is used, say, with an array containing 14 chars and times = 5, the for runs from 0 to 9, so cells 10 - 14 are NOT in order (but the rest are).
The worst thing about this is that the remaining cells always maintain their sequence, but at a different position. Meaning instead of 0123 they could be 3012 or 2301... etc.
I've run different arrays with different times values and didn't find a particular pattern, such as "if remaining cells = 3, then use ArrayCharMoveLeft on the remaining cells with times = 1".
It always seems to be 1 of 2 options: the remaining cells are in order, or shifted by different values. It seems to be something similar to this:
times    shift+direction to align
1        0
2        0
3        0
4        1R
5        3R
6        5R
7        3R
8        1R
The numbers change with different times values and arrays. Anyone got an idea for this?
Even if you use recursion or loops within loops, I'd like to hear a possible solution. The only firm rule is not to use a temporary array.
Thanks in advance!
If, irrespective of efficiency or simplicity, you want for the purpose of studying to use only exchanges of two array elements with ArraySwap2CellsChar, you can keep your loop with some adjustment. As you noted, the given for (i = 0; i < arrsize-times; i++) loop leaves the last times elements out of place. In order to correctly place all elements, the loop condition has to be i < arrsize-1 (one less suffices because if every element but the last is correct, the last one must be right, too). Of course, when i runs nearly up to arrsize, i+times can't be kept as the other swap index; instead, the correct index j of the element which is to be put at index i has to be computed. This computation turns out to be somewhat tricky, due to the element having been swapped already from its original place. Here's a modified variant of your loop:
for (i = 0; i < arrsize-1; i++)
{
    printf("i = %d ", i);
    int j = i+times;
    while (arrsize <= j) j %= arrsize, j += (i-j+times-1)/times*times;
    printf("j = %d ", j);
    PrintArray(arr, arrsize);
    ArraySwap2CellsChar(arr, i, j);
}
Use the standard library functions memcpy, memmove, etc., as they are heavily optimized for your platform.
Use the correct type for sizes: size_t, not int.
char *ArrayCharMoveLeft(char *arr, const size_t arrsize, size_t ntimes)
{
    ntimes %= arrsize;
    if (ntimes)
    {
        char temp[ntimes];
        memcpy(temp, arr, ntimes);
        memmove(arr, arr + ntimes, arrsize - ntimes);
        memcpy(arr + arrsize - ntimes, temp, ntimes);
    }
    return arr;
}
But you want it without the temporary array (more memory efficient, very bad performance-wise):
char *ArrayCharMoveLeft(char *arr, size_t arrsize, size_t ntimes)
{
    ntimes %= arrsize;
    while (ntimes--)
    {
        char temp = arr[0];
        memmove(arr, arr + 1, arrsize - 1);
        arr[arrsize - 1] = temp;
    }
    return arr;
}
https://godbolt.org/z/od68dKTWq
https://godbolt.org/z/noah9zdYY
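For completeness, a classic third option for this exact task (not used in the answer above) is the three-reversal trick: reverse the first ntimes cells, reverse the rest, then reverse the whole array. It rotates in O(n) time with O(1) extra space and no nested passes. A sketch, keeping the signature used above:

static void reverse_range(char *s, size_t lo, size_t hi) /* inclusive bounds */
{
    while (lo < hi) {
        char t = s[lo];
        s[lo++] = s[hi];
        s[hi--] = t;
    }
}

char *ArrayCharMoveLeftByReversal(char *arr, size_t arrsize, size_t ntimes)
{
    ntimes %= arrsize;
    if (ntimes) {
        reverse_range(arr, 0, ntimes - 1);        /* "012ABC" -> "210ABC" */
        reverse_range(arr, ntimes, arrsize - 1);  /* "210ABC" -> "210CBA" */
        reverse_range(arr, 0, arrsize - 1);       /* "210CBA" -> "ABC012" */
    }
    return arr;
}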
Disclaimer: I'm not sure if it's common to share full working code here or not, since this is literally my first question asked here, so I'll refrain from doing so, assuming the idea is to answer specific questions rather than to provide an example solution for grabs (which might defeat the purpose of studying and exploring C). This is backed by the fact that this specific task comes from a programming test used by a programming course, whose purpose is to filter out applicants who aren't fit for an intense 7-month training in software development. If you still wish to see my code, message me privately.
So, with a great amount of help from @Armali, I'm happy to announce the question is answered! Together we came up with a function that takes an array of characters in C (a string) and, without using any previously written libraries (such as strings.h) or even a temporary array, rotates all the cells in the array N times to the left.
Example: using ArrayCharMoveLeft() on the following array with N = 5:
Original array: 0123456789ABCDEF
Updated array: 56789ABCDEF01234
As you can see, the first cell (0) is now the sixth cell (5), the 2nd cell is the 7th cell, and so on. So each cell was moved to the left 5 times. The first 5 cells 'overflow' to the end of the array and now appear as the last 5 cells, while maintaining their order.
The function works with various array lengths and N values.
This is not any sort of achievement, but rather an attempt to execute the task with as few variables as possible (only 4 ints besides the char array, also counting the sub-function used to swap the cells).
It was achieved using a nested loop, so by no means is it efficient runtime-wise, just memory-wise, while still being self-coded, with no external libraries used (except stdio.h).
Refer to Armali's posted solution; it should get you the answer for this question.

What is the most efficient (fastest) way to find an N number of the largest integers in an array in C?

Let's have an array of size 8
Let's have N be 3
With an array:
1 3 2 17 19 23 0 2
Our output should be:
23, 19, 17
Explanation: The three largest numbers from the array, listed in descending order.
I have tried this:
int array[SIZE_OF_ARRAY];
int largest[N] = {0, 0, 0};
for (int i = 0; i < N; i++) {
    int max = 0;
    for (int j = 0; j < SIZE_OF_ARRAY; j++) {
        if (array[j] > array[max]) {
            max = j;
        }
    }
    largest[i] = array[max];
    array[max] = 0; /* clear it so the next pass finds the next largest */
}
Additionally, let the constraints be as follows:
integers in the array should be 0 <= i <= 1 000
N should be 1 <= N <= SIZE_OF_ARRAY - 1
SIZE_OF_ARRAY should be 2 <= SIZE_OF_ARRAY <= 1 000 000
My way of implementing it is very inefficient, as it scrubs the entire array N times. With huge arrays, this can take several minutes.
What would be the fastest and most efficient way to implement this in C?
You should look at the histogram algorithm. Since the values have to be between 0 and 1000, you can just allocate a counter array with one slot for each of those values:
#define MAX_VALUE 1000

int occurrences[MAX_VALUE+1];
int largest[N];
int i, j;

for (i = 0; i < N; i++)
    largest[i] = -1;
for (i = 0; i <= MAX_VALUE; i++)
    occurrences[i] = 0;
for (i = 0; i < SIZE_OF_ARRAY; i++)
    occurrences[array[i]]++;

// Step through the occurrences array backward to find the N largest values.
for (i = MAX_VALUE, j = 0; i >= 0 && j < N; i--)
    if (occurrences[i] > 0)
        largest[j++] = i;
Note that this will yield only one element in largest for each unique value. Modify the insertion accordingly if you want all occurrences to appear in largest. Because of that, you may get values of -1 for some elements if there weren't enough unique large numbers to fill the largest array. Finally, the results in largest will be sorted from largest to smallest. That will be easy to fix if you want to: just fill the largest array from right to left.
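For instance, the final loop could be modified like this if you want every occurrence of a value to appear in largest (a sketch of the variant mentioned above, reusing the same names):

for (i = MAX_VALUE, j = 0; i >= 0 && j < N; i--) {
    int c = occurrences[i];
    while (c-- > 0 && j < N)   /* emit each value once per occurrence */
        largest[j++] = i;
}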
The fastest way is to recognize that data doesn't just appear: it either exists at compile time or arrives by IO (from files, from the network, etc.). Therefore you can find the 3 highest values when the data is created, at compile time, or when you're parsing, sanity-checking, and storing the data received by IO. This is likely to be the fastest possible way, because you're either doing nothing at run-time or avoiding the need to look at all the data a second time.
However, in this case, if the data is modified after it was created, then you need to update the "3 highest values" at the same time as the data is modified. That is easy if a lower value is replaced by a higher value (you just check whether the new value becomes one of the 3 highest values), but it involves a search if a "previously highest" value is being replaced with a lower value.
If you need to search, it can be done with a single loop, like:
firstHighest = INT_MIN;
secondHighest = INT_MIN;
thirdHighest = INT_MIN;
for (int i = 0; i < SIZE_OF_ARRAY; i++) {
    if (array[i] > thirdHighest) {
        if (array[i] > secondHighest) {
            if (array[i] > firstHighest) {
                thirdHighest = secondHighest;
                secondHighest = firstHighest;
                firstHighest = array[i];
            } else {
                thirdHighest = secondHighest;
                secondHighest = array[i];
            }
        } else {
            thirdHighest = array[i];
        }
    }
}
Note: The exact code will depend on what you want to do with duplicates (you may need to replace if(array[i] > secondHighest) { with if(array[i] >= secondHighest) { and if(array[i] > firstHighest) { with if(array[i] >= firstHighest) { if you want the numbers 1, 2, 3, 4, 4, 4, 4 to give the answer 4, 4, 4 instead of 2, 3, 4).
For large amounts of data it can be accelerated with SIMD and/or multiple threads. For example, if SIMD can do "bundles of 8 integers" and you have 4 CPUs (and 4 threads), then you can split the data into quarters, treat each quarter as columns of 8 elements, find the highest 3 values in each column of each quarter, and then determine the overall highest 3 values from the "highest 3 values in each column in each quarter". In this case you will probably want to pad the end of the array (with dummy values set to INT_MIN) to ensure that the array's total size is a multiple of the SIMD width and the number of CPUs.
For small amounts of data the extra overhead of setting up SIMD and/or coordinating multiple threads is going to cost more than it saves, and the "simple loop" version is likely to be as fast as it gets.
For unknown/variable amounts of data you could provide multiple alternatives (simple loop, SIMD with a single thread, and SIMD with a variable number of threads) and decide which method to use (and how many threads to use) at run-time based on the amount of data.
One method I can think of is to just sort the array and return the first N numbers. Since the array is sorted, the N numbers we return will be the N largest numbers of the array. This method takes O(n log n) time, where n is the number of elements in the given array, which is probably a very good time complexity for this problem.
Another approach with similar time complexity would be to use a max-heap. Build a max-heap from the given array and, N times, use pop() (or extract, or whatever you call it) to take the top-most element, which is the max element remaining in the heap after each pop.
The time complexity of this approach could be considered even better than the first one's: O(n + N log n), where n is the number of elements in the array and N is the number of largest elements to be found. Here, O(n) is required to build the heap, and each pop of the top-most element costs O(log n); doing that N times sums to O(n + N log n), slightly better than O(n log n).
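If you go the sorting route, a minimal sketch with the standard library's qsort might look like this (sorting in descending order so the first N elements are the answer; the comparator name is mine):

#include <stdio.h>
#include <stdlib.h>

/* descending order; written to avoid overflow in the comparison */
static int desc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x < y) - (x > y);
}

int main(void)
{
    int array[] = {1, 3, 2, 17, 19, 23, 0, 2};
    size_t n = sizeof array / sizeof array[0];
    qsort(array, n, sizeof array[0], desc);
    for (int i = 0; i < 3; i++)            /* N = 3 */
        printf("%d%s", array[i], i < 2 ? ", " : "\n");
    return 0;
}

This prints 23, 19, 17 for the example in the question.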

What is the bug in this code?

Based on logic given as an answer on SO to a different (similar) question, to remove repeated numbers from an array in O(N) time complexity, I implemented that logic in C, as shown below. But my code does not return unique numbers. I tried debugging, but could not work out the logic behind it to fix this.
#include <stdio.h>

int remove_repeat(int *a, int n)
{
    int i, k;
    k = 0;
    for (i = 1; i < n; i++)
    {
        if (a[k] != a[i])
        {
            a[k+1] = a[i];
            k++;
        }
    }
    return (k+1);
}

int main()
{
    int a[] = {1, 4, 1, 2, 3, 3, 3, 1, 5};
    int n;
    int i;
    n = remove_repeat(a, 9);
    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
}
1] What is incorrect in the above code to remove duplicates?
2] Is there any other O(N) or O(N log N) solution for this problem? What is its logic?
Heap sort in O(n log n) time.
Iterate through in O(n) time replacing repeating elements with a sentinel value (such as INT_MAX).
Heap sort again in O(n log n) to distil out the repeating elements.
Still bounded by O(n log n).
Your code only checks whether an item in the array is the same as its immediate predecessor.
If your array starts out sorted, that will work, because all instances of a particular number will be contiguous.
If your array isn't sorted to start with, that won't work because instances of a particular number may not be contiguous, so you have to look through all the preceding numbers to determine whether one has been seen yet.
To do the job in O(N log N) time, you can sort the array, then use the logic you already have to remove duplicates from the sorted array. Obviously enough, this is only useful if you're all right with rearranging the numbers.
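A minimal sketch of that route, reusing the question's remove_repeat after a qsort(3) call (the wrapper name is mine):

#include <stdlib.h>

/* ascending order comparison for qsort */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int remove_repeat_any_order(int *a, int n)
{
    qsort(a, n, sizeof a[0], cmp_int);  /* makes duplicates adjacent */
    return remove_repeat(a, n);         /* the function from the question */
}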
If you want to retain the original order, you can use something like a hash table or bit set to track whether a number has been seen yet or not, and only copy each number to the output when/if it has not yet been seen. To do this, we change your current:
if (a[k] != a[i])
    a[k+1] = a[i];
to something like:
if (!hash_find(hash_table, a[i])) {
    hash_insert(hash_table, a[i]);
    a[k+1] = a[i];
    k++;                       /* advance the output index, as before */
}
If your numbers all fall within fairly narrow bounds or you expect the values to be dense (i.e., most values are present) you might want to use a bit-set instead of a hash table. This would be just an array of bits, set to zero or one to indicate whether a particular number has been seen yet.
On the other hand, if you're more concerned with the upper bound on complexity than the average case, you could use a balanced tree-based collection instead of a hash table. This will typically use more memory and run more slowly, but its expected complexity and worst-case complexity are essentially identical (O(N log N)). A typical hash table degenerates from constant complexity to linear complexity in the worst case, which would change your overall complexity from O(N) to O(N^2).
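And a sketch of the bit-set variant mentioned above, assuming the values are known to lie in [0, 1000] (the bound and function name are mine):

#include <limits.h>

#define MAX_VAL 1000

int remove_repeat_bitset(int *a, int n)
{
    unsigned char seen[MAX_VAL / CHAR_BIT + 1] = {0};
    int k = 0;
    for (int i = 0; i < n; i++) {
        unsigned byte = (unsigned)a[i] / CHAR_BIT;
        unsigned bit  = 1u << ((unsigned)a[i] % CHAR_BIT);
        if (!(seen[byte] & bit)) {      /* first time we see this value */
            seen[byte] |= bit;
            a[k++] = a[i];              /* keep it, preserving order */
        }
    }
    return k;
}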
Your code appears to require that the input be sorted. With unsorted input such as you are testing with, your code will not remove all duplicates (only adjacent ones).
You are able to get an O(N) solution if the range of possible integers is known up front and smaller than the amount of memory you have :). Make one pass to determine the unique integers you have using auxiliary storage, then another to output the unique values.
Code below is in Java, but hopefully you get the idea.
int[] removeRepeats(int[] a) {
    // Assume these are the integers between 0 and 1000
    Boolean[] v = new Boolean[1001]; // A lazy way of getting a tri-state var (false, true, null)
    for (int i = 0; i < a.length; ++i) {
        v[a[i]] = Boolean.TRUE;
    }
    // v[i] = null => number not seen
    // v[i] = true => number seen
    int[] out = new int[a.length];
    int ptr = 0;
    for (int i = 0; i < a.length; ++i) {
        if (v[a[i]] != null && v[a[i]].equals(Boolean.TRUE)) {
            out[ptr++] = a[i];
            v[a[i]] = Boolean.FALSE;
        }
    }
    // out now contains no duplicates, order is preserved, and ptr is how
    // many elements are set.
    return out;
}
You are going to need two loops, one to go through the source and one to check each item in the destination array.
You are not going to get O(N).
[EDIT]
The article you linked to suggests a sorted output array, which means the search for duplicates in the output array can be a binary search... which is O(log N).
Your logic is just wrong, so the code is wrong too. Work your logic out by yourself before coding it.
I suggest an O(N ln N) way with a modification of heapsort.
With heapsort, we go from a[i] to a[n], find the minimum and replace it with a[i], right?
So now comes the modification: if the minimum is the same as a[i-1], then swap the minimum with a[n] and reduce the array's item count by 1.
That should do the trick in O(N ln N).
Your code will work only in particular cases. Clearly, you're checking adjacent values, but duplicate values can occur anywhere in the array. Hence, it's totally wrong.

Linear Search Algorithm Optimization

I just finished a homework problem for Computer Science 1 (yes, it's homework, but hear me out!). Now, the assignment is 100% complete and working, so I don't need help on it. My question involves the efficiency of an algorithm I'm using (we aren't graded on algorithmic efficiency yet, I'm just really curious).
The function I'm about to present currently uses a modified version of the linear search algorithm (that I came up with, all by myself!) in order to check how many numbers on a given lottery ticket match the winning numbers, assuming that both the numbers on the ticket and the numbers drawn are in ascending order. I was wondering, is there any way to make this algorithm more efficient?
/*
 * Function: ticketCheck
 *
 * @param struct ticket
 * @param array winningNums[6]
 *
 * Takes in a ticket, counts how many numbers
 * in the ticket match, and returns the number
 * of matches.
 *
 * Uses a modified linear search algorithm,
 * in which the index of the successor to the
 * last matched number is used as the index of
 * the first number tested for the next ticket value.
 *
 * @return int numMatches
 */
int ticketCheck( struct ticket ticket, int winningNums[6] )
{
int numMatches = 0;
int offset = 0;
int i;
int j;
for( i = 0; i < 6; i++ )
{
for( j = 0 + offset; j < 6; j++ )
{
if( ticket.ticketNum[i] == winningNums[j] )
{
numMatches++;
offset = j + 1;
break;
}
if( ticket.ticketNum[i] < winningNums[j] )
{
i++;
j--;
continue;
}
}
}
return numMatches;
}
It's more or less there, but not quite. In most situations, it's O(n), but it's O(n^2) if every ticketNum is greater than every winningNum. (This is because the inner j loop doesn't break when j==6 like it should, but runs the next i iteration instead.)
You want your algorithm to increment either i or j at each step, and to terminate when i==6 or j==6. [Your algorithm almost satisfies this, as stated above.] As a result, you only need one loop:
for (i=0, j=0; i<6 && j<6; /* no increment step here */) {
    if (ticketNum[i] == winningNum[j]) {
        numMatches++;
        i++;
        j++;
    }
    else if (ticketNum[i] < winningNum[j]) {
        /* ticketNum[i] won't match any winningNum, discard it */
        i++;
    }
    else { /* ticketNum[i] > winningNum[j] */
        /* discard winningNum[j] similarly */
        j++;
    }
}
Clearly this is O(n); at each stage, it either increments i or j, so the most steps it can do is 2*n-1. This has almost the same behaviour as your algorithm, but is easier to follow and easier to see that it's correct.
You're basically looking for the size of the intersection of two sets. Given that most lottos use around 50 balls (or so), you could store the numbers as bits that are set in an unsigned long long. Finding the common numbers is then a simple matter of ANDing the two together: commonNums = TicketNums & winningNums;.
Finding the size of the intersection is a matter of counting the one bits in the resulting number, a subject that's been covered previously (though in this case, you'd use 64-bit numbers, or a pair of 32-bit numbers, instead of a single 32-bit number).
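A sketch of that idea, assuming balls numbered 1 through 64 so a single uint64_t suffices (helper names are mine; __builtin_popcountll is a GCC/Clang builtin, so substitute a portable bit-count loop if needed):

#include <stdint.h>

static uint64_t to_mask(const int nums[6])
{
    uint64_t mask = 0;
    for (int i = 0; i < 6; i++)
        mask |= 1ULL << (nums[i] - 1);    /* assumes 1 <= nums[i] <= 64 */
    return mask;
}

int countMatches(const int ticketNums[6], const int winningNums[6])
{
    uint64_t commonNums = to_mask(ticketNums) & to_mask(winningNums);
    return __builtin_popcountll(commonNums);  /* size of the intersection */
}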
Yes, there is something faster, though it probably uses more memory. Make an array full of 0s, sized to the range of possible numbers; put a 1 at the index of every drawn number. For every ticket number, add the value at the index of that number.
int NumsArray[MAX_NUMBER+1];
memset(NumsArray, 0, sizeof NumsArray);

for (i = 0; i < 6; i++)
    NumsArray[winningNums[i]] = 1;

for (i = 0; i < 6; i++)
    numMatches += NumsArray[ticket.ticketNum[i]];
12 loop rounds instead of up to 36
The surrounding code left as an exercise.
EDIT: It also has the advantage of not needing to sort both sets of values.
This is really only a minor change on a scale like this, but if the second loop reaches a number bigger than the current ticket number, it is already allowed to break. Furthermore, if your second loop traverses numbers lower than your ticket number, it may update the offset even if no match is found within that iteration.
PS:
Not to forget, general results on efficiency make more sense if we treat the number of balls or the size of the ticket as variable. Otherwise the result depends too much on the machine.
If instead of comparing the arrays of lottery numbers you were to create two bit arrays of flags -- each flag being set if its index is in that array -- then you could perform a bitwise AND on the two bit arrays (the lottery ticket and the winning number sets) and produce another bit array whose bits are flags for matching numbers only. Then count the bits set.
For many lotteries 64 bits would be enough, so a uint64_t should be big enough to cover this. Also, some architectures have instructions to count the bits set in a register, which some compilers might be able to recognize and optimize for.
The efficiency of this algorithm depends both on the range of lottery numbers (M) and on the number of lottery numbers per ticket (N). The setting of the flags is O(N), while the AND-ing of the two bit arrays and the counting of the bits could be O(M), depending on whether your M (lotto number range) is larger than the size on which the target CPU can perform these operations directly. Most likely, though, M will be small and its impact on performance will be less than that of N.
