Frequency of a number in array faster than linear time - arrays

Find the frequency of a number in an array in less than O(n) time.
Array 1,2,2,3,4,5,5,5,2
Input 5
Output 3
Array 1,1,1,1
Input 1
Output 4

If the only information you have is an unsorted array (as your test data seems to indicate), you cannot do better than O(n) in finding the frequency of a given value. There's no getting around that.
There are a variety of ways to achieve a better time complexity.
One would be to keep the array sorted (or a parallel sorted array if you didn't want to change the order). This way, you could use a binary search to find the first item with the given value, then sequentially scan that portion to get a count. While the worst case (all items the same, and that value is what you're looking for) is still O(n), the cost is O(log n) plus the size of the count, so it tends toward O(log n) when matches are rare.
Note that sorting the data each time before looking for a value will not work since that will almost certainly push you above the O(n) limit. The idea would be to sort only on item insertion.
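For illustration, a minimal C sketch of that sorted-array lookup (the names lower_bound and frequency_sorted are mine, not from the question): binary-search for the first matching element, then scan forward to count.

#include <stddef.h>

/* Index of the first element >= key in the sorted array a[0..n-1]. */
static size_t lower_bound(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Frequency of key in a sorted array: binary-search the first match,
   then scan forward while the value repeats. O(log n + count). */
size_t frequency_sorted(const int *a, size_t n, int key)
{
    size_t i = lower_bound(a, n, key);
    size_t count = 0;
    while (i + count < n && a[i + count] == key)
        count++;
    return count;
}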
Another method, provided your domain (possible values) is limited, is to maintain the actual frequencies of those values separately. For example, if the domain is limited to the numbers one through a hundred, have a separate array containing the frequency of each value.
When the list is empty, all frequencies are zero. Whenever you add or remove an item, increment or decrement the frequency for that value. This would make frequency extraction a quick O(1) operation.
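A minimal sketch in C, assuming the one-to-a-hundred domain from the example (the function names are illustrative):

/* Domain limited to 1..100, as in the example above. */
#define DOMAIN_MIN 1
#define DOMAIN_MAX 100

static int freq[DOMAIN_MAX - DOMAIN_MIN + 1];   /* all zeros while the list is empty */

void insert_value(int v) { freq[v - DOMAIN_MIN]++; }      /* call when adding an item   */
void remove_value(int v) { freq[v - DOMAIN_MIN]--; }      /* call when removing an item */
int  frequency(int v)    { return freq[v - DOMAIN_MIN]; } /* O(1) frequency lookup      */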
But, as stated, both these solutions require extra/modified data to be maintained. Without that, you cannot do better than O(n) since you will need to examine every item in the array to see if it matches the value you're looking for.

Related

Fast way to count smaller/equal/larger elements in array

I need to optimize my algorithm for counting how many numbers in an (unsorted) array are larger than, smaller than, or equal to a given number.
I have to do this many times, and the array can have thousands of elements.
The array doesn't change; only the number changes.
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==: 1
First thought:
Iterate through the array and check whether each element is <, >, or == n.
O(n*k)
Possible optimization:
O((n+k) * log n)
First sort the array (I'm using C's qsort), then use binary search to find an equal number, and then somehow count the smaller and larger values. But how do I do that?
If the element exists (bsearch returns a pointer to it), I also need to account for possible duplicates of it in the array (so I need to check before and after the element while the values are equal to it), and then use some pointer arithmetic to count the larger and smaller values.
How do I get the number of larger/smaller values from a pointer to the equal element?
And what do I do if I don't find the value (bsearch returns NULL)?
If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat an O(n) approach of walking the array once, and counting items in the three buckets.
Sorting the array followed by a binary search would be no better than O(n), assuming that you employ a sort algorithm that is linear in time (e.g. a radix sort). For comparison-based sorts, such as quicksort, the timing would increase to O(n log₂ n).
On the other hand, sorting would help if you need to run multiple queries against the same set of numbers. The timing for k queries against n numbers would go from O(n*k) for k linear searches to O(n + k log₂ n) assuming a linear-time sort, or O((n+k) log₂ n) with comparison-based sort. Given a sufficiently large k, the average query time would go down.
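As a sketch of that presort-then-query idea (hand-rolled lower/upper bound helpers rather than bsearch, and the question's example data), two binary searches give all three counts at once, and they work whether or not the value is present, which also covers the bsearch-returns-NULL case:

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* First index with a[i] >= key. */
static size_t lower_bound(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* First index with a[i] > key. */
static size_t upper_bound(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

int main(void)
{
    int a[] = {1, 2, 3, 4, 5};
    size_t n = sizeof a / sizeof a[0];
    int key = 3;

    qsort(a, n, sizeof a[0], cmp_int);      /* presort once              */
    size_t lo = upper = 0;                  /* placeholder, see below    */
    lo = lower_bound(a, n, key);            /* count of values <  key    */
    size_t hi = upper_bound(a, n, key);     /* first index with value > key */

    printf("<: %zu  ==: %zu  >: %zu\n", lo, hi - lo, n - hi);
    return 0;
}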
Since the array is (apparently?) not changing, presort it. This allows a binary search (O(log n)).
a.) implement your own version of bsearch (it will be less code anyhow)
you can do it inline using indices vs. pointers
you won't need function pointers to a specialized function
b.) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need to do a linear scan for the beginning and end of the run of "n"s.
From which you can calculate the number less than n and greater than n.
It appears that you have some unwritten algorithm for choosing these (for n=3 you look for count of values greater and less than 2 and equal to 1, so there is no way to give specific code)
c.) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that holds not just the value, but also the count and the number of values before and after each value. It may not use more memory at all if you have a lot of repeat values, but it is hard to tell without the dataset.
That's as much as I can help without code that describes your hidden algorithms and data or at least a sufficient description (aside from recommending a course or courses in data structures and algorithms).

Algorithm - What is the best algorithm for detecting duplicate numbers in small array?

What is the best algorithm for detecting duplicate numbers in an array, the best in speed, memory, and avoiding overhead?
A small array like [5,9,13,3,2,5,6,7,1]. Note that 5 is duplicated.
After searching and reading about sorting algorithms, I realized that I will use one of these algorithms, Quick Sort, Insertion Sort or Merge Sort.
But actually I am really confused about what to use in my case, which is a small array.
Thanks in advance.
To be honest, with that size of array, you may as well choose the O(n^2) solution (checking every element against every other element).
You'll generally only need to worry about performance if/when the array gets larger. For small data sets like this, you could well have found the duplicate with an 'inefficient' solution before the sort phase of an efficient solution will have finished :-)
In other words, you can use something like (pseudo-code):
for idx1 = 0 to nums.len - 2 inclusive:
    for idx2 = idx1 + 1 to nums.len - 1 inclusive:
        if nums[idx1] == nums[idx2]:
            return nums[idx1]
return no dups found
This finds the first value in the array which has a duplicate.
If you want an exhaustive list of duplicates, then just add the duplicate value to another (initially empty) array (once only per value) and keep going.
You can sort it using any half-decent algorithm though; for a data set of the size you're discussing, even a bubble sort would probably be adequate. Then you just process the sorted items sequentially, looking for runs of values, but it's probably overkill in your case.
There are two good approaches, and which to pick depends on whether or not you know the range from which the numbers are drawn.
Case 1: the range is known.
Suppose you know that all numbers are in the range [a, b[, thus the length of the range is l=b-a.
You can create an array A of length l, fill it with 0s, then iterate over the original array and for each element e increment the value of A[e-a] (here we are actually mapping the range onto [0,l[).
Once finished, you can iterate over A and find the duplicate numbers. In fact, if there exists i such that A[i] is greater than 1, it implies that i+a is a repeated number.
The same idea is behind counting sort, and it works fine also for your problem.
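A small C sketch of Case 1, assuming the caller knows the range [a, b[ (the function name is illustrative):

#include <stdio.h>
#include <stdlib.h>

/* Report duplicates in values[0..n-1], assuming every value lies in [a, b[. */
void report_duplicates(const int *values, size_t n, int a, int b)
{
    size_t range = (size_t)(b - a);
    int *counts = calloc(range, sizeof *counts);   /* the array A, all zeros */
    if (!counts)
        return;

    for (size_t i = 0; i < n; i++)
        counts[values[i] - a]++;                   /* map [a,b[ onto [0,range[ */

    for (size_t i = 0; i < range; i++)
        if (counts[i] > 1)
            printf("%d appears %d times\n", (int)i + a, counts[i]);

    free(counts);
}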
Case 2: the range is not known.
Quite simple: slightly modify the approach mentioned above. Instead of an array, use a map whose keys are the numbers from your original array and whose values are the number of times you find them. At the end, iterate over the set of keys and look for those that have been found more than once.
Note.
In both of the cases mentioned above, the complexity should be O(N), and you cannot do better, since you have to visit all the stored values at least once.
Look at the first example: we iterate over two arrays, the lengths of which are N and l<=N, thus the complexity is at most 2*N, that is, O(N).
The second example is indeed a bit more complex and dependent on the implementation of the map, but for the sake of simplicity we can safely assume that it is O(N).
In memory, you are constructing data structures the sizes of which are proportional to the number of different values contained in the original array.
As usually happens, memory occupancy and performance drive the choice: the greater the former, the better the latter, and vice versa. As suggested in another response, if you know that the array is small, you can safely rely on an algorithm whose complexity is O(N^2) but which requires no extra memory at all.
Which is the best choice? Well, it depends on your problem, we cannot say.

Sort the t smallest integers of an unsorted array of length n

I am trying to find the most efficient way to sort the t smallest integers of an unsorted array of length n.
I am trying to get O(n) runtime but keep getting stuck.
The best I can think of is just sorting the entire array and taking the first t. In all other cases, I keep hitting the chance that the smallest is left behind, and if I check them all, it has the same time complexity as sorting the entire array.
Can anyone give me some ideas?
Run something like quickselect to find the t-th element and then partition the data to extract the t smallest elements. This can be done in O(n) time (average case).
Quickselect is:
An algorithm, similar to quicksort, which repeatedly picks a 'pivot' and partitions the data according to that pivot (leaving the pivot in the middle, with smaller elements on the left and larger elements on the right). It then recurses into the side that contains the target element (which it can easily determine by just counting the number of elements on either side).
Then you'll still need to sort the t elements, which can be done with, for example, quicksort or mergesort, giving a running time of O(t log t).
The total running time will be O(n + t log t) - you probably can't do much better than that.
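A rough C sketch of this quickselect-then-sort idea (the function names are mine, and the Lomuto partition with a fixed pivot is used only to keep the code short; its average case is O(n), but a randomized pivot would be needed to make bad inputs unlikely):

#include <stdlib.h>

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition on a[lo..hi]: move the pivot a[hi] to its final
   sorted position and return that index. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    return i;
}

/* Quickselect so that a[0..t-1] holds the t smallest values, then sort
   just that prefix: O(n) average for the selection, O(t log t) for the sort. */
void sorted_t_smallest(int *a, size_t n, size_t t)
{
    if (t == 0 || t > n)
        return;
    if (t < n) {
        size_t lo = 0, hi = n - 1, k = t - 1;   /* index of the t-th smallest */
        while (lo < hi) {
            size_t p = partition(a, lo, hi);
            if (p == k) break;
            if (p < k) lo = p + 1;
            else       hi = p - 1;
        }
    }
    qsort(a, t, sizeof *a, cmp_int);
}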
If t is considerably smaller than n, you can find those t elements in one traversal of the array, always keeping the t smallest items seen so far and discarding bigger integers; many data structures are available for this, a BST for example.
The running time is then O(n log t), which is effectively linear in n when t is small.
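The answer above mentions a BST; a bounded max-heap of size t is another common way to do the same one-pass selection. A sketch under the assumptions that t <= n and the caller supplies an out buffer of t ints (the names are illustrative):

#include <stddef.h>

/* Move the new element at index i of a max-heap up into place. */
static void sift_up(int *heap, size_t i)
{
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (heap[parent] >= heap[i]) break;
        int tmp = heap[i]; heap[i] = heap[parent]; heap[parent] = tmp;
        i = parent;
    }
}

/* Move the root of the max-heap heap[0..len-1] down into place. */
static void sift_down(int *heap, size_t len, size_t i)
{
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < len && heap[l] > heap[largest]) largest = l;
        if (r < len && heap[r] > heap[largest]) largest = r;
        if (largest == i) break;
        int tmp = heap[i]; heap[i] = heap[largest]; heap[largest] = tmp;
        i = largest;
    }
}

/* Copy the t smallest values of a[0..n-1] into out[0..t-1] (unordered),
   keeping a max-heap of the best t seen so far: O(n log t). */
void t_smallest(const int *a, size_t n, int *out, size_t t)
{
    size_t filled = 0;
    for (size_t i = 0; i < n; i++) {
        if (filled < t) {
            out[filled] = a[i];
            sift_up(out, filled++);
        } else if (t > 0 && a[i] < out[0]) {
            out[0] = a[i];                 /* replace the current maximum */
            sift_down(out, t, 0);
        }
    }
}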

Comparing two String array in most efficient way

This problem is about searching for strings in a master array (which contains the list of all UIDs). The second array contains all the strings to be searched for.
For example:
First array(Master List) contains: UID1 UID2 UID3... UID99
Second array contains: UID3 UID144 UID50
If a match is found in the first array then 1 is returned, otherwise 0 is returned. So the output for the above example should be 101.
What could be the most efficient approach (targeting C) to solve the above, keeping in mind that the traditional way of dealing with this would be O(n^2)?
Sort the master string array and do a binary search.
Efficient in terms of what?
I would go with @Trying's suggestion as a good compromise between decent running speed, low memory usage, and very (very!) low implementation complexity.
Just use qsort() to sort the first master array in place, then use bsearch() to search it.
Assuming n elements in the master array and m in the second array, this should give O(m*log n) time complexity which seems decent.
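A compact sketch of that qsort()/bsearch() approach, using a made-up subset of the UID data from the question:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for an array of char*: qsort/bsearch pass pointers to the
   elements, i.e. char**, so dereference once before comparing. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

int main(void)
{
    const char *master[]  = {"UID1", "UID2", "UID3", "UID50", "UID99"};
    const char *queries[] = {"UID3", "UID144", "UID50"};
    size_t n = sizeof master / sizeof master[0];
    size_t m = sizeof queries / sizeof queries[0];

    qsort(master, n, sizeof master[0], cmp_str);       /* O(n log n), done once */

    for (size_t i = 0; i < m; i++) {                   /* O(m log n) in total   */
        const void *hit = bsearch(&queries[i], master, n,
                                  sizeof master[0], cmp_str);
        putchar(hit ? '1' : '0');
    }
    putchar('\n');                                      /* prints 101 for this data */
    return 0;
}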
Another option is to build a hash table for the strings in the master list. Building it is a single O(M) pass (assuming the string lengths are O(1)); then, assuming the hash is distributed evenly, searching for a single element should take on average O(M/S), with S being the size of the hash table (even distribution means that on average this is the number of elements mapping to the same hash entry). You can further control the size to fine-tune the trade-off between space and efficiency.
There are mainly two good approaches for this problem:
Use a binary search: a binary search requires the UIDs in the first array to be sorted and allows you to find a solution in O(log n) where n is the number of elements in the master array. The total complexity would be O(m log n) with m the number of elements to be searched.
Use a hashmap: You can store the elements of the master array in a hashmap (O(n)) and then check whether your elements of the second array are in the hashmap (O(m)). The total complexity would be O(n+m).
While the complexity of the second approach looks better, you must keep in mind that if your hash is bad, it could be O(m*n) in the worst case (though that is very unlikely). You would also use more memory, and the individual operations are slower. In your case, I would use the first approach.

How do I find common elements from n arrays

I am thinking of sorting and then doing binary search. Is that the best way?
I advocate hashes in such cases: you'll have time proportional to the total size of the arrays.
Since most major languages offer a hashtable in their standard libraries, I hardly need to show you how to implement such a solution.
Iterate through each one and use a hash table to store counts. The key is the value of the integer and the value is the count of appearances.
It depends. If one set is substantially smaller than the other, or for some other reason you expect the intersection to be quite sparse, then a binary search may be justified. Otherwise, it's probably easiest to step through both at once. If the current element in one is smaller than in the other, advance to the next item in that array. When/if you get to equal elements, you send that as output, and advance to the next item in both arrays. (This assumes, that as you advocated, you've already sorted both, of course).
This is an O(N+M) operation, where N is the size of one array and M the size of the other. Using a binary search, you get O(N log₂ M) instead, which can be lower complexity if one array is a lot smaller than the other, but is likely to be a net loss if they're close to the same size.
Depending on what you need/want, the versions that attempt to just count occurrences can cause a pretty substantial problem: if there are multiple occurrences of a single item in one array, they will still be counted as two occurrences of that item, indicating an intersection that doesn't really exist. You can prevent this, but doing so renders the job somewhat less trivial: you insert items from one array into your hash table, but always set the count to 1. When that's finished, you process the second array, setting the count to 2 if and only if the item is already present in the table.
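For the merge-style walk over two presorted arrays described above, here is a short C sketch (the function name is mine) that also avoids the duplicate-counting pitfall by skipping runs of repeated values:

#include <stdio.h>

/* Print the intersection of two sorted arrays, emitting each common
   value once even if it repeats in either input. */
void print_intersection(const int *a, size_t na, const int *b, size_t nb)
{
    size_t i = 0, j = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (b[j] < a[i]) {
            j++;
        } else {
            int v = a[i];
            printf("%d\n", v);
            while (i < na && a[i] == v) i++;   /* skip duplicates in a */
            while (j < nb && b[j] == v) j++;   /* skip duplicates in b */
        }
    }
}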
Define "best".
If you want to do it fast, you can do it in O(n) by iterating through each array and keeping a count for each unique element. The details of how to count the unique elements depend on the alphabet of things that can be in the array, e.g., is it sparse or dense?
Note that this is O(n) in the number of arrays, but O(nm) for arrays of length m.
The best way is probably to hash all the values and keep a count of occurrences, culling all that have not occurred i times when you examine array i, where i = 1, 2, ..., n. Unfortunately, no deterministic algorithm can get below an O(n*m) running time, since it's impossible to do this without examining all the values in all the arrays if they're unsorted.
A faster algorithm would need to either accept some probability of error (Monte Carlo), or rely on some known property of the lists to examine only a subset of elements (i.e. you only care about elements that have occurred in all i-1 previous lists when considering the ith list, but in an unsorted list it's non-trivial to search for elements).

Resources