I am new here. As a grad student I have been working on algorithm problems for a while now, and I would appreciate any help with the problem below. I have searched quite a bit and could not find a solution close to this problem.
We have an array of sorted distinct numbers that is infinitely long. The first n numbers are fractions that are greater than 0 but less than 1. All the remaining elements are 1's, and you are not given the value of n. You need to develop an algorithm to check whether a user-given fraction F occurs in that array, and analyze the time complexity of your algorithm as a function of n. (For example, with n = 8 the 1's begin at index 8 of the array, counting from 0.)
My approach:
I am guessing that the best way to solve this is binary search: each step halves the search range until we either find the fraction or determine that it is absent. Let us assume that there are m elements in the array, including the 1's, and that the number of fractional elements is n.
The time complexity of binary search on the whole array is O(log m). Since I am asked to express the time complexity in terms of n, write m = n + k, where k is the number of 1's in the array.
So the time complexity of this problem is O(log(n + k)).
Please throw in your thoughts. Thanks
You can indeed solve this for an infinite array, i.e. without knowing m, by exponential search.
Probe the first element and keep doubling the index until you read a 1. This takes O(log n) steps. Then switch to binary search on the bracketed range and get the answer in another O(log n) steps.
The value of k is irrelevant.
This approach can also make sense in the real world, i.e. with an array of finite but unknown size, provided that at least half of the array is filled with 1's, so that the doubling probe stays in bounds.
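A minimal sketch of the two phases in C, assuming the infinite array is reachable through a hypothetical accessor get(i) and that the sentinel elements compare exactly equal to 1.0 (here get() is faked with an n = 8 demo array):

```c
#include <stdio.h>

/* Demo stand-in for the conceptually infinite array: n = 8 fractions,
   everything past them reads as 1.0. (Hypothetical; any accessor works.) */
static double get(long i)
{
    static const double frac[] = {0.05, 0.1, 0.25, 0.3, 0.4, 0.6, 0.75, 0.9};
    return i < 8 ? frac[i] : 1.0;
}

/* Exponential search: double the index until we hit a 1 (O(log n) probes),
   then binary-search the bracketed range (another O(log n) probes). */
static int contains(double f)
{
    long hi = 1;
    while (get(hi) < 1.0)
        hi *= 2;

    long lo = 0;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        double v = get(mid);
        if (v == f)     return 1;
        else if (v < f) lo = mid + 1;
        else            hi = mid - 1;
    }
    return 0;
}

int main(void)
{
    printf("%d %d\n", contains(0.25), contains(0.5));  /* prints "1 0" */
    return 0;
}
```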
Binary search works on sorted arrays. Since the fractions are all between 0 and 1, your array is sorted, so you can binary-search the whole array, which has complexity O(log(n + k)), where n and k are the values stated in your question. However, your array can be very big while the number of fractions is rather small, so regardless of the asymptotic complexity, a sequential search may be faster in that case.
So, in your case, I suggest a simple sequential search, which has complexity O(n).
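For comparison, a sketch of that sequential search, reusing the hypothetical get(i) accessor from the snippet above:

```c
/* Linear scan using the same get(i) accessor as above. The array is
   sorted, so we can stop as soon as we pass f; the first 1 is > f,
   which guarantees termination after at most n + 1 probes: O(n). */
int contains_linear(double f)
{
    for (long i = 0; ; i++) {
        double v = get(i);
        if (v == f) return 1;
        if (v > f)  return 0;  /* sorted: f cannot appear further right */
    }
}
```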
Related
Hey so I'm just really stuck on this question.
I need to devise an algorithm (no need for code) that sorts a certain partially sorted array into a fully sorted array. The array has N real numbers, and the first N - [N/sqrt(N)] elements (the [] denotes the floor of this number) are sorted, while the rest are not. There are no special properties to the unsorted numbers at the end; in fact I'm told nothing about them other than that they're real numbers like the rest.
The kicker is that the time complexity of the algorithm needs to be O(N).
My first thought was to sort only the unsorted numbers and then use a merge algorithm, but I can't figure out a sorting algorithm that would work here in O(N). So I'm thinking about this all wrong; any ideas?
This is not possible in the general case using a comparison-based sorting algorithm. You are most likely missing something from the question.
Imagine the partially sorted array [1, 2, 3, 448788, 8481, 4564, 145, 86411, 23477]. It contains 9 elements, the first 3 of which are sorted (note that floor(N/sqrt(N)) = floor(sqrt(N)), assuming you meant N/sqrt(N), and floor(sqrt(9)) = 3). The problem is that the unsorted elements all lie in a range that does not contain the sorted elements. That makes the sorted part of the array useless to any sorting algorithm, since those elements will stay where they are anyway (or be moved to the very end if they were greater than the unsorted elements).
With this kind of input, you still need to sort, independently, N - floor(sqrt(N)) elements. And N - floor(sqrt(N)) ~ N (the ~ basically means "has the same order of growth as"). So you are left with roughly N elements to sort, which takes O(N log N) time in the general case.
Now, I specified "using a comparison-based sorting algorithm", because sorting real numbers (in some range, like the usual floating-point numbers stored in computers) can be done in amortized O(N) time using a hash sort (similar to a counting sort), or maybe even a modified radix sort if done properly. But the fact that a part of the array is already sorted doesn't help.
In other words, the problem statement means there are only about sqrt(N) unsorted elements at the end of the array. You can sort them with an O(k^2) algorithm such as insertion sort, which gives a time of O(sqrt(N)^2) = O(N); then do the merge you mentioned, which also runs in O(N). Both steps together therefore take just O(N).
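A sketch of that two-step plan in C, under the assumption that the input is an array of doubles; the function name and the insertion-sort/merge choices here are mine, not from the question:

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Sort a[] of length n, where only the last floor(sqrt(n)) elements
   may be out of order. Total time: O(n). */
void sort_partially_sorted(double *a, size_t n)
{
    size_t tail = (size_t)floor(sqrt((double)n));  /* unsorted suffix length */
    size_t head = n - tail;                        /* sorted prefix length   */

    /* Insertion sort on the suffix alone: O(tail^2) = O(n). */
    for (size_t i = head + 1; i < n; i++) {
        double key = a[i];
        size_t j = i;
        while (j > head && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }

    /* Standard merge of the two sorted runs: O(n). */
    double *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return;  /* allocation failure: bail out */
    size_t i = 0, j = head, k = 0;
    while (i < head && j < n)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < head) tmp[k++] = a[i++];
    while (j < n)    tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
    free(tmp);
}
```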
I need to optimize my algorithm for counting how many numbers in an (unsorted) array are larger than, smaller than, or equal to a given number.
I have to do this many times, and the array can have thousands of elements.
The array doesn't change; only the query number changes.
Example:
array: 1,2,3,4,5
n = 3
Number of <: 2
Number of >: 2
Number of ==: 1
First thought:
Iterate through the array and check whether each element is >, <, or == n. Over k queries this costs:
O(n*k)
Possible optimization:
O((n + k) * log n)
First sort the array (I'm using C's qsort), then use binary search to find an equal element, and then somehow count the smaller and larger values. But how do I do that?
If the element exists (bsearch returns a pointer to it), I also need to check whether the array contains duplicates of that element (so I need to scan before and after it while the values compare equal), and then use some pointer arithmetic to count the larger and smaller values.
How do I get the number of values larger/smaller from a pointer to the equal element?
And what do I do if I don't find the value (bsearch returns NULL)?
If the array is unsorted, and the numbers in it have no other useful properties, there is no way to beat an O(n) approach of walking the array once, and counting items in the three buckets.
Sorting the array followed by a binary search would be no better than O(n), assuming you employ a sort algorithm that is linear in time (e.g. a radix sort). For comparison-based sorts such as quicksort, the cost increases to O(n log n).
On the other hand, sorting helps if you need to run multiple queries against the same set of numbers. The cost of k queries against n numbers goes from O(n*k) for k linear scans to O(n + k*log n) with a linear-time sort, or O((n + k)*log n) with a comparison-based sort. Given a sufficiently large k, the average query time goes down.
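To make the presort-plus-binary-search scheme concrete, here is a hedged C sketch. Using lower-bound/upper-bound searches instead of bsearch also answers the asker's two sub-questions: duplicates are counted by the gap between the two bounds, and an absent value simply yields a gap of zero (all names here are illustrative):

```c
#include <stdlib.h>

/* Index of the first element >= x in sorted a[0..n). */
static size_t lower_bound(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < x) lo = mid + 1;
        else            hi = mid;
    }
    return lo;
}

/* Index of the first element > x in sorted a[0..n). */
static size_t upper_bound(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= x) lo = mid + 1;
        else             hi = mid;
    }
    return lo;
}

static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* Answer one query in O(log n) against a presorted array. */
void count_rel(const int *sorted, size_t n, int x,
               size_t *less, size_t *equal, size_t *greater)
{
    size_t lb = lower_bound(sorted, n, x);
    size_t ub = upper_bound(sorted, n, x);
    *less    = lb;       /* elements strictly below x     */
    *equal   = ub - lb;  /* duplicates of x; 0 if absent  */
    *greater = n - ub;   /* elements strictly above x     */
}
```

Sort once with qsort(a, n, sizeof *a, cmp_int); each count_rel call is then O(log n), giving the O(n log n + k log n) total described above.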
Since the array is (apparently?) not changing, presort it. This allows binary search, which is O(log n).
a.) implement your own version of bsearch (it will be less code anyhow)
you can do it inline using indices vs. pointers
you won't need function pointers to a specialized function
b.) Since you say that you want to count the number of matches, you imply that the array can contain multiple entries with the same value (otherwise you would have used a boolean has_n).
This means you'll need to do a linear search for the beginning and the end of the run of n's.
From those two boundaries you can calculate the number of elements less than n and greater than n.
It appears that you have some unwritten algorithm for choosing what to report (for n = 3 you report the counts of values greater than, less than, and equal to n), so there is no way to give specific code.
c.) For further optimization (at the expense of memory) you can sort the data into a binary search tree of structs that hold not just the value, but also its count and the number of values before and after it. It may not use more memory at all if you have many repeated values, but that is hard to tell without the dataset.
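As an illustration of (c), here is one possible shape for that structure, sketched as a flat sorted table rather than a tree; the Entry fields are my naming, not anything from the question:

```c
#include <stdlib.h>

/* One entry per distinct value, as suggested in c. above. */
typedef struct {
    int    value;
    size_t count;   /* occurrences of this value   */
    size_t before;  /* elements strictly smaller   */
} Entry;

/* Build the table from an already-sorted array: O(n) after the sort.
   Returns the number of distinct values written to t. */
size_t build_table(const int *sorted, size_t n, Entry *t)
{
    size_t d = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && sorted[j] == sorted[i]) j++;
        t[d].value  = sorted[i];
        t[d].count  = j - i;   /* length of this run of equal values */
        t[d].before = i;       /* everything to the left is smaller  */
        d++;
        i = j;
    }
    return d;
}
```

A query then binary-searches the table: less = before, equal = count, greater = n - before - count; if the value is absent, the search's insertion point still yields the two outer counts.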
That's as much as I can help without code that describes your hidden algorithms and data or at least a sufficient description (aside from recommending a course or courses in data structures and algorithms).
A number is popular if it appears N/4 times or more (N is the length of the array). The interviewer wanted me to make use of the sorted property of the array and come up with a solution better than O(n) time complexity.
I would do it with an approach similar to binary search.
First I would look at the values at positions 0/8, 1/8, 2/8, ..., 8/8 of the array. These are the same probes a binary search visits first.
Since a popular number fills at least N/4 consecutive positions of the sorted array, its run must reach at least two of those boundaries in a row; e.g. the values at 2/8 and 3/8 are the same.
Therefore identifying the candidate numbers, and roughly where they sit, is done in constant time.
Then you just continue to find where each candidate's run starts and where it ends, which is a typical binary search.
Complexity: O(log n)
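A sketch of this in C, under the assumption that the input is a sorted int array; it probes the nine eighth-point positions and verifies each candidate's run length with two binary searches:

```c
#include <stddef.h>

/* First index with a[i] >= x. */
static size_t lb(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t m = lo + (hi - lo) / 2;
        if (a[m] < x) lo = m + 1; else hi = m;
    }
    return lo;
}

/* First index with a[i] > x. */
static size_t ub(const int *a, size_t n, int x)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t m = lo + (hi - lo) / 2;
        if (a[m] <= x) lo = m + 1; else hi = m;
    }
    return lo;
}

/* Probe the nine eighth-points of the sorted array; any run of length
   >= n/4 must contain at least one probe. Verifying a candidate's run
   takes two binary searches, so the whole check is O(log n).
   Returns 1 and stores a popular value in *out, or returns 0. */
int find_popular(const int *a, size_t n, int *out)
{
    if (n == 0) return 0;
    for (int i = 0; i <= 8; i++) {
        size_t pos = (size_t)i * (n - 1) / 8;
        size_t run = ub(a, n, a[pos]) - lb(a, n, a[pos]);
        if (4 * run >= n) {          /* run length >= n/4 */
            if (out) *out = a[pos];
            return 1;
        }
    }
    return 0;
}
```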
I was reading some practice interview questions and I have a question about this one: given a list of random integers, each between 1 and 100, compute the sum of the k largest integers. Discuss space and time complexity, and whether the approach changes if each integer is between 1 and m, where m varies.
My first thought is to sort the array and compute the sum of the largest k numbers. Then I thought about using a binary tree structure where I could start looking from the bottom right of the tree. I am not sure whether my approach would change depending on whether the numbers are 1 to 100 or 1 to m. Any thoughts on the most efficient approach?
The most efficient way is probably something like randomized quickselect. It doesn't run the sorting step to completion; it only does the partition step from quicksort. If you don't need the k largest integers in any particular order, this is the way I'd go. It takes expected linear time, though the analysis is not very straightforward, and m has little impact on it. You can also write the code so that the sum is computed as you partition the array.
Time: O(n)
Space: O(1)
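A hedged sketch of that in C; the partitioning details (Lomuto scheme, random pivot) are one common choice, not the only one:

```c
#include <stdlib.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition of a[lo..hi] around a randomly chosen pivot;
   returns the pivot's final index. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    size_t p = lo + (size_t)rand() % (hi - lo + 1);
    swap_int(&a[p], &a[hi]);
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot)
            swap_int(&a[i++], &a[j]);
    swap_int(&a[i], &a[hi]);
    return i;
}

/* Rearrange a[0..n) so that a[n-k..n) holds the k largest elements
   (in no particular order), then sum them.
   Expected O(n) time, O(1) extra space. */
long sum_k_largest(int *a, size_t n, size_t k)
{
    if (k == 0 || n == 0) return 0;
    if (k > n) k = n;

    size_t target = n - k;        /* index where the k largest begin */
    size_t lo = 0, hi = n - 1;
    while (lo < hi) {             /* quickselect: descend into one side only */
        size_t p = partition(a, lo, hi);
        if (p < target)      lo = p + 1;
        else if (p > target) hi = p - 1;
        else                 break;
    }

    long sum = 0;
    for (size_t i = target; i < n; i++)
        sum += a[i];
    return sum;
}
```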
The alternative is sorting with something like counting sort, which has a linear-time guarantee. Since the values are integers in a fixed range, it would work quite well. As m increases the space requirement goes up, but computing the sum within the buckets is quite efficient.
Time: O(n + m) in the worst case
Space: O(m)
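A possible realization in C, assuming the values are known to lie in 1..m; note the O(m) space for the histogram:

```c
#include <stdlib.h>

/* Sum the k largest of a[0..n), where each a[i] is in 1..m.
   Build a histogram (O(n)), then walk it from the top (O(m)). */
long sum_k_largest_counting(const int *a, size_t n, size_t k, int m)
{
    size_t *cnt = calloc((size_t)m + 1, sizeof *cnt);
    if (!cnt) return -1;  /* allocation failure */

    for (size_t i = 0; i < n; i++)
        cnt[a[i]]++;

    long sum = 0;
    for (int v = m; v >= 1 && k > 0; v--) {
        size_t take = cnt[v] < k ? cnt[v] : k;  /* take at most k of value v */
        sum += (long)v * (long)take;
        k -= take;
    }
    free(cnt);
    return sum;
}
```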
I'd say sorting is probably unnecessary. If k is small, all you need to do is maintain a sorted list truncated to the k largest elements seen so far.
Each step should be O(k) in the worst case, where the incoming element is large and must be inserted near the front. The average case is much better, though: after a certain number of elements, most candidates will simply be smaller than the last element in the list, and the check costs only O(log k).
One way is to use a min-heap (implemented as a binary tree) of maximum size k. Checking whether a new element belongs in the heap is only O(1), since retrieving the minimum of a min-heap is a constant-time operation. Each insertion step (or non-insertion, for an element too small to be inserted) across the n-element list is O(log k). The final tree traversal and summation step is O(k).
Total complexity:
O(n log k + k) = O(n log k)
Unless you have multiple cores, in which case parallel computing is an option, summation should only be done at the end. On-the-fly computation adds extra steps without actually reducing your time complexity (you will in fact have more computations to do). You will always have to sum k elements at the end anyway, so why not avoid the additional addition and subtraction steps?
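For reference, a sketch of the size-k min-heap approach in C, with the summation done once at the end as suggested (the array-backed heap layout is a standard choice, not anything mandated by the question):

```c
#include <stdlib.h>

/* Sift the element at index i of min-heap h[0..size) down to its place. */
static void sift_down(int *h, size_t size, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, min = i;
        if (l < size && h[l] < h[min]) min = l;
        if (r < size && h[r] < h[min]) min = r;
        if (min == i) break;
        int t = h[i]; h[i] = h[min]; h[min] = t;
        i = min;
    }
}

/* Sum of the k largest of a[0..n) via a size-k min-heap:
   O(n log k) time, O(k) extra space. */
long sum_k_largest_heap(const int *a, size_t n, size_t k)
{
    if (k == 0 || n == 0) return 0;
    if (k > n) k = n;

    int *h = malloc(k * sizeof *h);
    if (!h) return -1;  /* allocation failure */

    /* Seed the heap with the first k elements, then heapify: O(k). */
    for (size_t i = 0; i < k; i++) h[i] = a[i];
    for (size_t i = k / 2; i-- > 0; ) sift_down(h, k, i);

    /* For each remaining element: O(1) peek, O(log k) replace. */
    for (size_t i = k; i < n; i++) {
        if (a[i] > h[0]) {      /* belongs among the k largest so far */
            h[0] = a[i];
            sift_down(h, k, 0);
        }
    }

    long sum = 0;               /* final O(k) summation */
    for (size_t i = 0; i < k; i++) sum += h[i];
    free(h);
    return sum;
}
```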
I am having trouble with an assignment regarding running time.
The problem statement is:
"Isabel has an interesting way of summing up the values in an array A of n integers, where n is a power of two. She creates an array B of half the size of A, and sets B[i] = A[2i] + A[2i+1], for i=0,1,…,(n/2)-1. If B has size 1, then she outputs B[0]. Otherwise, she replaces A with B, and repeats the process. What is the running time of her algorithm?"
Would this be considered O(log n) or O(n)? I am thinking O(log n), because you keep dividing the array in half until you get the final result, and I believe the basis of O(log n) is that you do not traverse the entire data structure. However, in order to compute the sum you have to access each element of the array, which makes me think it could be O(n). Any help in understanding this would be greatly appreciated.
"I believe the basis of O(log n) is that you do not traverse the entire data structure."
There's no basis for beliefs or guesses. Run through the algorithm mentally.
How many recursions are there going to be for array A of size n?
How many summations are there going to be for each recursion (when array A is of size n)?
First run: n/2 summations, n accesses to elements of A
...
Last run: 1 summation, 2 accesses to elements of A
How many runs are there total? When you sum this up, what is the highest power of n?
As you figured out yourself, you do need to access all elements to compute the sum. So your proposition:
I believe the basis of O(log n) is that you do not traverse the entire data structure
does not hold. You can safely disregard the possibility of the algorithm being O(log n) then.
As for being O(n) or something different, you need to think about how many operations are done as a whole. George Skoptsov's answer gives a good hint at that. I'd just like to call attention to the fact (from my own experience) that to determine "the running time" you need to take everything into account: memory accesses, operations, input and output, etc. In your simple case, looking only at the accesses (or at the number of sums) might be enough, but in practice you can get very skewed results if you don't look at the problem from every angle.
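To make the counting concrete, here is a small C sketch of Isabel's procedure that tallies the additions; the adds counter is instrumentation I added, not part of the original algorithm:

```c
#include <stdlib.h>

/* Isabel's summation: repeatedly replace A with the half-size array B,
   where B[i] = A[2i] + A[2i+1]. n must be a power of two. */
long isabel_sum(const int *a, size_t n, size_t *adds)
{
    *adds = 0;
    if (n == 0) return 0;

    long *cur = malloc(n * sizeof *cur);
    if (!cur) return 0;  /* allocation failure */
    for (size_t i = 0; i < n; i++) cur[i] = a[i];

    while (n > 1) {
        for (size_t i = 0; i < n / 2; i++) {
            cur[i] = cur[2 * i] + cur[2 * i + 1];  /* B[i] = A[2i] + A[2i+1] */
            ++*adds;
        }
        n /= 2;  /* "replace A with B" */
    }

    long result = cur[0];
    free(cur);
    return result;
}
/* adds comes out to n/2 + n/4 + ... + 1 = n - 1: a geometric series.
   There are O(log n) rounds, but their total work is O(n). */
```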