Sorting when comparisons cost no time - arrays

I want to sort an array, A, under this cost model:
For any value x, an assignment of the form A[i] = x has a cost of 1. Furthermore, A[i] = A[j] also has a cost of 1.
Other operations, such as comparisons and assignments of the form x = A[i] (where x is not a location in the array), have a cost of 0.
Questions:
Give a lower bound on the worst-case time required to sort the array A. Your answer should be an exact expression in terms of n and not use asymptotic notation.
Describe a sorting algorithm that uses O(n) space. The runtime should exactly match the lower bound given in 1 (exactly, not asymptotically).
Describe an in-place sorting algorithm that is optimal under this cost model. The runtime should exactly match the bound given in 1 (exactly, not asymptotically).
My attempts:
n. This is because, in the worst case, all n elements of the array are at an index where they do not belong, so it takes n assignments to get the array into sorted order.
My algorithm in pseudocode:
def weird_sort(A):
    B = [None] * len(A)                 # scratch array, the same size as A
    C = [True] * len(A)                 # True = this position of A has not been taken yet
    for i in range(len(A)):
        min = C.index(True)             # first index in C that is still True
        for j in range(len(A)):
            if C[j] and A[j] < A[min]:
                min = j
        B[i] = A[min]
        C[min] = False                  # mark that position of A as used
    A[:] = B                            # copy the contents of B back into A
I believe this takes exactly n time to run since the only time we are assigning anything into A is in the last line, where we copy the contents of B into A.
No idea where to start. It appears to me that in order to keep everything in place we have to swap elements within A, but I can't figure out how to sort an array with n/2 swaps. Can someone get me moving in the right direction? Can you also scrutinize my answers for 1 and 2?

I consider in-place to allow O(1) additional variables, since otherwise I don't think it's possible.
First let's solve a subproblem: given i, find the number that should end up at the i-th position. This can be done by brute force, since comparisons are free.
Now copy the 1st element (to an additional variable), find the smallest element and put it in position 1. Suppose the smallest element was at position i. Now find the element that should be at position i and copy it there (suppose it was at position j), then find the element that belongs at position j, and so on. Eventually we find the element we initially copied out, and we put it back. In this way we move k elements to their places using k assignments (following a cycle structure).
Now do the same for all the other elements. You can't remember for each element whether you have already put it in its place, but you can check whether it is in its place for free.
If there are equal elements in A this has to be done more carefully, but it should still work.
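Here is a minimal Python sketch of this cycle-following idea (it is essentially the classic cycle sort; the function name and exact details are my own, not the answerer's):

def cycle_sort(A):
    # Sort A in place. Comparisons are free under the cost model above;
    # the only paid operations are the writes into A, one per moved element.
    n = len(A)
    for start in range(n - 1):
        item = A[start]                    # cost 0: copy into an extra variable
        pos = start
        for i in range(start + 1, n):      # count how many elements are smaller
            if A[i] < item:
                pos += 1
        if pos == start:                   # already in its place: no write needed
            continue
        while item == A[pos]:              # skip over equal elements (duplicates)
            pos += 1
        A[pos], item = item, A[pos]        # one write into A
        while pos != start:                # rotate the rest of the cycle
            pos = start
            for i in range(start + 1, n):
                if A[i] < item:
                    pos += 1
            while item == A[pos]:
                pos += 1
            A[pos], item = item, A[pos]    # one write per element in the cycle

Each cycle of length k costs exactly k writes and elements already in place cost nothing, which matches the k-assignments-per-cycle argument above.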

Although when we talk about efficient sorting algorithms we usually mention quicksort, that kind of algorithm is optimized for the number of comparisons.
Other algorithms, however, try to optimize the number of memory accesses instead (as in your case). Some of them are called cache-oblivious (they make no assumptions about the specific memory-hierarchy parameters) and others cache-aware (they are tuned for a specific memory hierarchy). There are multiple algorithms of this kind, so you might be interested in giving them a look.
As an example, Harald Prokop's PhD thesis discusses cache-oblivious algorithms and proposes distribution sort, which partially sorts the data into subgroups that potentially fit in the lower levels of the memory hierarchy.
Distribution sort uses O(n log n) work and incurs O(1 + (n/L)(1 + log_Z n)) cache misses to sort n elements,
where L is the length of a cache line and Z is the size of the cache itself. The performance model assumes a single cache level, but the algorithm adapts to all cache levels thanks to the oblivious property.
The fundamental concept is that the assignment cost changes depending on where an element is placed in the memory hierarchy.

Related

What is the algorithm to find K for finding the median of two sorted arrays (LeetCode)?

The solution implementing "find the median of two sorted arrays" is awesome. However, I am still very confused about the code that calculates K:
var aMid = aLength * k / (aLength + bLength)
var bMid = k - aMid - 1
I guess this is the key part of the algorithm, and I really don't know why it is calculated like this. To explain more clearly what I mean: the core logic is divide and conquer, taking into account that lists of different sizes should be divided differently. I wonder why this formula works so well.
Can someone give me some insight into it? I searched lots of online documents and it is very hard to find material that explains this part well.
Many thanks in advance
The link shows two different ways of computing the comparison points in each array: one always uses k/2, even if the array doesn't have that many elements; the other (which you quote) tries to distribute the comparison points based on the size of the arrays.
As can be seen from these two examples, neither of which is optimal, it doesn't make much difference how you compute the comparison points, as long as the sizes of the two components are roughly linear in K. (Using a fixed size of 5 for one of the comparison points won't work, for example.)
The algorithm effectively reduces the problem size by either aMid or bMid on each iteration. Ideally, the problem size would be reduced by k/2, and that is the computation you should use if both arrays have at least k/2 members. If one has too few members, you can set the comparison point for that array to its last element and compute the other comparison point so that the total is k - 1. If you end up discarding all of the elements from one array, you can then immediately return element k of the other array.
That strategy will usually perform fewer iterations than either of the proposals in your link, but it is still O(log k).
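For concreteness, here is a rough Python sketch of that k/2 strategy (my own illustration, not the code from the link; the function and variable names are made up):

def kth_of_two_sorted(a, b, k):
    # Return the k-th smallest (1-based) element of the merged sorted lists a and b.
    lo_a = lo_b = 0                          # everything before lo_a / lo_b has been discarded
    while True:
        if lo_a == len(a):                   # a is exhausted: the answer lies in b
            return b[lo_b + k - 1]
        if lo_b == len(b):                   # b is exhausted: the answer lies in a
            return a[lo_a + k - 1]
        if k == 1:
            return min(a[lo_a], b[lo_b])
        # Ideally probe k//2 elements into each array; clamp when an array is short.
        i = min(k // 2, len(a) - lo_a)
        j = min(k - i, len(b) - lo_b)
        if a[lo_a + i - 1] <= b[lo_b + j - 1]:
            lo_a += i                        # those i elements cannot be the k-th smallest
            k -= i
        else:
            lo_b += j
            k -= j

For the median itself: with a combined length t, the median is the element of rank (t + 1) // 2 when t is odd, and the average of the elements of rank t // 2 and t // 2 + 1 when t is even.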

Count distinct array entries [with no additional memory nor array changes]

The task is to count the unique numbers in a given array. I saw numerous similar questions on SO, but here we have additional requirements which weren't stated in the other questions:
The amount of additional memory allowed is O(1)
Changes to the array are prohibited
I was able to write a quadratic algorithm which respects the given constraints. But I keep wondering: could one do better on such a problem? Thank you for your time.
An algorithm running in O(n^2):
def count(a):
    unique = len(a)
    ind = 0
    while ind < len(a):
        x = a[ind]
        i = ind + 1
        while i < len(a):
            if a[i] == x:
                unique -= 1
                break
            i += 1
        ind += 1
    print("Total uniques: ", unique)
This is a very similar problem to a follow-up question in chapter 1 (Arrays and Strings) from Cracking the Coding Interview:
Implement an algorithm to determine if a string has all unique
characters. What if you cannot use additional data structures?
The answer (to the follow-up question) is that if you can't assume anything about the array (namely, it is not sorted, you don't know its size, etc.), then there is no algorithm better than what you showed.
That being said, you may think about relaxing the constraints a little bit to make the problem more interesting. For example, if you have an upper bound on the values in the array, you could use a bit vector to keep track of which values you've already seen while traversing the array, although this is not strictly an O(1) solution in terms of memory usage (one could argue that with a known bound on the values the memory usage is constant, and thus O(1), but that is a little bit of cheating). Similarly, if the array were sorted, you could solve it in O(n) by going through the elements one at a time and checking whether each element differs from its neighbors.
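For instance, the sorted-array relaxation mentioned above comes down to a single pass (a small sketch of my own, not code from the question):

def count_distinct_sorted(a):
    # Count distinct values in an already-sorted list: O(n) time, O(1) extra space.
    if not a:
        return 0
    count = 1
    for i in range(1, len(a)):
        if a[i] != a[i - 1]:        # a new value starts here
            count += 1
    return count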
Because there is no underlying structure in the given array (sorted, etc.), you are forced to brute-force every value in the array...
There is a more complicated approach that I believe would work. It entails keeping your array of unique numbers sorted. This means it takes more time when inserting into the array, but it lets you look up values much more quickly. You should be able to find the insertion point in log n time by looking at the value directly in the middle of the array and checking whether it's larger or smaller; you then eliminate half of the array as a valid insertion location and repeat. You would use a similar approach to look up values in the array. The only issue with this is that it requires more memory than I believe you are allowed (O(1)).
That being said, I think the constraints on the task restrict any algorithm to O(n^2).

fastest way to find if all the elements of an array are distinct?

I am looking for a faster way to find whether an array contains only distinct elements. The worst thing to do is to take each element and compare it to every other element in the array. The next best thing would be to sort the list and then compare, which still does not improve things much. Is there any other way to do this?
Brute-force:
Brute-force (checking every element against every other element) takes O(n^2).
Sorting:
Sorting takes O(n log n), which is generally considered to be a fairly decent running time.
Sorting has the advantage over the hash-table approach below in that it can be done in place (O(1) extra space), whereas the hash table takes O(n) extra space.
Hash table:
An alternative is to use a hash table.
For each item:
Check whether that item already exists in the hash table (if it does, the items are not all distinct), and
Insert that item into the hash table
Since insert and contains queries run in expected O(1) on a hash table, the overall running time would be expected O(n), and, as mentioned above, O(n) extra space.
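A minimal illustration of the hash-table approach in Python, using the built-in set (which is hash-based):

def all_distinct(items):
    # Expected O(n) time, O(n) extra space.
    seen = set()
    for x in items:
        if x in seen:               # duplicate found: the items are not all distinct
            return False
        seen.add(x)
    return True

If an early exit is not important, len(set(items)) == len(items) expresses the same check in one line.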
Bit array:
Another alternative, if the elements are all integers in some given range, is to have a bit array with size equal to the range of integers.
Similarly to what was done for the hash table approach, for each element, you'd check whether the applicable bit is set, and then set it.
This takes O(m + n) time and O(m) extra space where m is the range of integers and n is the size of the array (unless you consider allocating the array to be free, in which case it just takes O(n) time).
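A sketch of the bit-array variant, assuming every element is an integer in the range [0, m); a bytearray is used here for simplicity, whereas a real bit array would pack eight flags per byte:

def all_distinct_bounded(items, m):
    # O(m + n) time, O(m) extra space; assumes 0 <= x < m for every item x.
    seen = bytearray(m)             # one flag per possible value
    for x in items:
        if seen[x]:
            return False
        seen[x] = 1
    return True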
Create a red-black tree with the elements as keys and their number of occurrences as values. You can then navigate the tree. The time complexity is O(n log n) and the space complexity is O(n), where n is the number of elements. Key benefits of using a red-black tree include consistent performance and simple memory management, which make it an excellent choice for a distributed environment. Perspectives welcome.
Alternative solution (interesting only from theoretic point of view):
I think you can adapt the Quickselect algorithm. In short, this algorithm runs in the same way as quicksort, but it only splits the array into two groups according to some chosen pivot (less than and greater than the pivot, respectively); the sorting itself is omitted. Its average-case performance is O(n).
My idea is to look for elements equal to the chosen pivot on each step. This way, whenever there are more than two elements, we compare the pivot to each element. If we have found a duplicate, we have the answer. Otherwise we split the problem into two similar ones of smaller size and run the algorithm on them.
Disclaimer: The worst case performance of Quickselect is O(n^2). Therefore, using a hash table is way more time efficient.
However, as Quickselect is an in-place algorithm, it requires only constant memory overhead as opposed to linear additional memory for a hash table (not that it matters nowadays).
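A rough Python sketch of that Quickselect-style idea (my own interpretation: three-way partition around a random pivot, report a duplicate as soon as two or more elements equal the pivot, then recurse on both sides; note that it rearranges the list):

import random

def has_duplicate(a, lo=0, hi=None):
    if hi is None:
        hi = len(a)
    if hi - lo < 2:
        return False
    pivot = a[random.randrange(lo, hi)]
    # Three-way partition: a[lo:lt] < pivot, a[lt:i] == pivot, a[gt:hi] > pivot.
    lt, i, gt = lo, lo, hi
    while i < gt:
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            gt -= 1
            a[gt], a[i] = a[i], a[gt]
        else:
            i += 1
    if gt - lt > 1:                 # at least two elements equal the pivot
        return True
    return has_duplicate(a, lo, lt) or has_duplicate(a, gt, hi)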
Here is an O(1) space complexity approach. The idea is that we simply keep the unique elements at the beginning of the array itself.
The time complexity is O(n log n), since we want to avoid extra space usage and can therefore use Python's in-place sort method for lists.
It may feel like C, but it worked for me:
a.sort()
i = 0    # scan position
k = 0    # index of the last unique element written so far
while i < len(a) - 1:
    if a[i] == a[i+1]:
        # Skip over the run of equal values starting at i.
        j = i
        while j < len(a) - 1 and a[j] == a[j+1]:
            j += 1
        if j < len(a) - 1:
            a[k+1] = a[j+1]    # pull the next distinct value forward
            i = j + 1
            k += 1
        else:
            break              # the run of duplicates reaches the end of the array
    else:
        a[k+1] = a[i+1]        # no duplicate here, but keep the unique prefix packed
        i += 1
        k += 1
a = a[:k+1]                    # keep only the unique prefix

Find common elements in two sorted arrays [duplicate]

Possible Duplicate:
The intersection of two sorted arrays
We have two sorted arrays A and B. Apart from comparing each element of one array with all the elements of the other, how can we design a better algorithm to find their common elements?
Hold two pointers: one for each array.
i <- 0, j <- 0
repeat while i < length(arr1) and j < length(arr2):
    if arr1[i] > arr2[j]: increase j
    else if arr1[i] < arr2[j]: increase i
    else: output arr1[i], increase both pointers
The idea is that, since the data is sorted, if the current element of one array is smaller than the current element of the other, it is also smaller than all the remaining elements of the other array, so it cannot be a common element and we can safely move past it.
This solution requires a single traversal on the data. O(n) (with good constants as well).
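In Python, the same two-pointer walk looks roughly like this:

def common_elements(arr1, arr2):
    # Intersection of two sorted lists in a single O(n + m) pass.
    i = j = 0
    out = []
    while i < len(arr1) and j < len(arr2):
        if arr1[i] > arr2[j]:
            j += 1
        elif arr1[i] < arr2[j]:
            i += 1
        else:
            out.append(arr1[i])     # equal: part of the intersection
            i += 1
            j += 1
    return out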
If the lengths of the two arrays (say, A has N elements and B has M elements) are similar, then the best approach is to perform a linear search of one array's elements in the other array. Of course, since the arrays are sorted, the next search should begin where the previous search stopped. This is the classic principle used in the "sorted array merge" algorithm. The complexity is O(N + M).
If the lengths are significantly different (say, M << N), then a much more optimal approach would be to iterate through elements of the shorter array and use binary search to look for these values in the longer array. The complexity is O(M * log N) in that case.
As you can see O(M * log N) is better than O(N + M) if M is much smaller than N, and worse otherwise.
The difference in array sizes that should trigger the switch from one approach to the other depends on practical considerations. It should be chosen based on experiments with your data.
These two approaches (linear and binary searches) can be "blended" into a single algorithm. Let's assume M <= N. In that case let's choose step value S = [N / M]. You take first element from array A and perform a straddled linear search for that element in array B with step S, meaning that you check elements B[0], B[S], B[2*S], B[3*S], ... and so on. Once you find the index range [S*i, S*(i+1)] that potentially contains the element you are searching for, you switch to binary search inside that segment of array B. Done. The straddled linear search for the next element of A begins where the previous search left off. (As a side note, it might make sense to choose the value of S equal to a power of 2).
This "blended" algorithm is the most asymptotically optimal search/merge algorithm for two sorted arrays in existence. However, in practice the more simple approach with choosing either binary or linear search depending on relative sizes of the arrays works perfectly well.
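A hedged Python sketch of that blended strategy (the names and the exact clamping details are mine; the idea is the straddled linear search with step S followed by a binary search inside the located segment):

from bisect import bisect_left

def common_elements_blended(a, b):
    # Assumes a and b are sorted; the shorter list is scanned, the longer one is searched.
    if not a or not b:
        return []
    if len(a) > len(b):
        a, b = b, a
    s = max(1, len(b) // len(a))        # step S, roughly N / M
    out = []
    start = 0                           # each search resumes where the previous one stopped
    for x in a:
        # Straddled linear search: jump through b in steps of s until b[hi] >= x.
        hi = start
        while hi < len(b) and b[hi] < x:
            hi += s
        lo = max(start, hi - s)
        hi = min(hi, len(b))
        # Binary search inside the located segment [lo, hi).
        pos = bisect_left(b, x, lo, hi)
        if pos < len(b) and b[pos] == x:
            out.append(x)
        start = pos
    return out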
besides compare one with all the elements in other array
You will have to compare A[] to B[] in order to know that they are the same -- unless you know a lot about what kind of data they can hold. The nature of the comparison probably has many solutions and can be optimized as required.
If the arrays are very strictly created, i.e. they contain only sequential values following a known pattern and always start from a known point, you could just look at the length of each array and know whether or not all items are common.
Unfortunately, this doesn't sound like a very realistic or useful kind of array, so you are back to checking for A[i] in B[].

How can we find the i'th greatest element of the array?

Algorithm for finding the nth smallest/largest element in an array using a self-balancing binary search tree as the data structure.
I read the post Find kth smallest element in a binary search tree in Optimum way, but the correct answer is not clear to me, as I was not able to work it out for an example that I tried. Please, a bit more explanation is required.
C.A.R. Hoare's select algorithm is designed for precisely this purpose. It executes in [expected] linear time, with logarithmic extra storage.
Edit: the obvious alternative of sorting, then picking the right element has O(N log N) complexity instead of O(N). Storing the i largest elements in sorted order requires O(i) auxiliary storage, and roughly O(N * i log i) complexity. This can be a win if i is known a priori to be quite small (e.g. 1 or 2). For more general use, select is usually better.
Edit2: offhand, I don't have a good reference for it, but I described the idea in a previous answer.
First sort the array descending, then take the ith element.
Create a sorted data structure to hold i elements and set the initial count to 0.
Process each element in the source array, adding it to that new structure until the new structure is full.
Then process the rest of the source array. For each element that is larger than the smallest in the sorted data structure, remove the smallest from that structure and put the new one in.
Once you've processed all elements in the source array, your structure will hold the i greatest elements. Just grab the smallest of these and you have your i-th greatest element.
Voila!
Alternatively, sort it then just grab the i'th element directly.
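In Python the bounded-structure idea is a few lines with heapq, keeping a min-heap of the i greatest elements seen so far (a sketch; it assumes the input has at least i elements):

import heapq

def ith_greatest(values, i):
    # O(n log i) time, O(i) extra space; heap[0] is always the smallest of the current top i.
    heap = []
    for x in values:
        if len(heap) < i:
            heapq.heappush(heap, x)
        elif x > heap[0]:               # beats the smallest of the current top i
            heapq.heapreplace(heap, x)
    return heap[0]                      # smallest of the i greatest = the i-th greatest

The standard library also offers heapq.nlargest(i, values), whose last element is the same answer.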
This is a fitting task for heaps, which feature very low insert and delete_min costs, e.g. pairing heaps. It would have worst-case O(n log n) performance. But since they are non-trivial to implement, it is better to first check the selection algorithms suggested elsewhere.
There are many strategies available for your task (if you don't focus on the self-balancing tree to begin with).
It's usually a tradeoff between speed and memory. Most algorithms require either modifying the array in place or O(N) additional storage.
The solution with a self-balancing tree is in the latter category, but it's not the right choice here. The issue is that building the tree itself takes O(N log N), which dominates the later search term and gives a final complexity of O(N log N). Therefore you're no better off than simply sorting the array, and you are using a complex data structure on top of it...
In general, the issue largely depends on the magnitude of i relative to N. If you think about it for a minute, for i == 1 it's trivial, right? It's called finding the maximum.
Well, the same strategy obviously works for i == 2 (carrying the 2 maximum elements around) in linear time. And it's also trivially symmetric: i.e., if you need to find the (N-1)-th greatest element, just carry the 2 minimum elements around instead.
However, it loses efficiency when i is about N/2 or N/4. Carrying the i maximum elements around then means maintaining a sorted collection of size i... and thus we fall back onto the N log N wall.
Jerry Coffin pointed out a simple solution, which works well for this case. Here is the reference on Wikipedia. The full article also describes the Median of Median method: it's more reliable, but involves more work and is thus generally slower.
Create an empty list L
For each element x in the original list:
    add x in sorted position to L
    if L has more than i elements:
        pop the smallest one off L
If L has i elements:
    return the i-th element of L
else:
    return failure
This should take O(N log i). If i is assumed to be a constant, then it is O(N).
Build a heap from the elements and call MIN i times.
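With Python's heapq (a min-heap), one way to express this for the i-th greatest element is to heapify the negated values and pop i - 1 times (a small sketch):

import heapq

def ith_greatest_via_heapify(values, i):
    # O(n) to build the heap plus O(i log n) for the pops.
    heap = [-x for x in values]         # negate to simulate a max-heap
    heapq.heapify(heap)
    for _ in range(i - 1):
        heapq.heappop(heap)
    return -heap[0]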
