Cache Optimization - Hashmap vs QuickSort? - arrays

Suppose that I have N unsorted arrays, of integers. I'd like to find the intersection of those arrays.
There are two good ways to approach this problem.
One, I can sort the arrays in place with an nlogn sort, like QuickSort or MergeSort. Then I can put a pointer at the start of each array. Compare each array to the one below it, iterating the pointer of whichever array[pointer] is smaller, or if they're all equal, you've found an intersection.
This is an O(nlogn) solution, with constant memory (since everything is done in-place).
The second solution is to use a hash map, putting in the values that appear in the first array as keys, and then incrementing those values as you traverse through the remaining arrays (and then grabbing everything that had a value of N). This is an O(n) solution, with O(n) memory, where n is the total size of all of the arrays.
Theoretically, the former solution is o(nlogn), and the latter is O(n). However, hash maps do not have great locality, due to the way that items can be randomly scattered through the map, due to collisions. The other solution, although o(nlogn), traverses through the array one at a time, exhibiting excellent locality. Since a CPU will tend to pull the array values from memory that are next to the current index into the cache, the O(nlogn) solution will be hitting the cache much more often than the hash map solution.
Therefore, given a significantly large array size (as number of elements goes to infinity), is it feasible that the o(nlogn) solution is actually faster than the O(n) solution?

For integers you can use a non-comparison sort (see counting, radix sort). A large set might be encoded, e.g. sequential runs into ranges. That would compress the data set and allow for skipping past large blocks (see RoaringBitmaps). There is the potential to be hardware friendly and have O(n) complexity.
Complexity theory does not account for constants. As you suspect there is always the potential for an algorithm with a higher complexity to be faster than the alternative, due to the hidden constants. By exploiting the nature of the problem, e.g. limiting the solution to integers, there are potential optimizations not available to general purpose approach. Good algorithm design often requires understanding and leveraging those constraints.

Related

Is O(cn) at least as fast as O(n) in a non asymptotically way?

So first of all let me talk about the motivation for this question. Let's supose you have to find the minimum and the maximum values in an array. In this case, you wave two ways of doing so.
The first one consists in iterating over the array and finding the maximum value, then doing the same thing to find the minimum value. This solution is O(2n).
The second one consists in iterating over the array just one time and finding both the minimum and maximum value at the same time. This solution is O(n).
Even though the time complexity has been halved, for each iteration of the O(n) solution you now have twice as many instructions (ignoring how the compiler can possibly optmize these instructions) so I believe they should take the same amount of time to execute.
Let me give you a second example. Now you need to reverse an array. Again, you have two ways of doing so.
The first one is to create an empty array, iterate over the data array filling the empty array. This solution is O(n).
The second one is to iterate over the data array, swapping the 0th and n-1th elements, then the 1th and n-2th elements and so on (using this strategy) until you reach the middle of the array. This solution is O((1/2)n).
Again, even though the time complexity has been cutted in half, you have three times more instructions per iteration. You're iterating over (1/2)n elements, but for each iteration you have to perform three XOR instructions. If you were not to use XOR, but an auxiliary variable you would still need 2 more instructions to perform the variable swapping, so now I believe that o((1/2)n) should actually be worse than o(n).
Having said these things, my question is the following:
Ignoring space complexity, garbage collecting and the compiler possible optimizations, can I assume that having O(c1*n) and O(c2*n) algorithms so that c1 > c2, can I be sure that the algorithm that gives me O(c1*n) is as fast or faster than the one that gives me O(c2*n)?
This question is cool because it can make a difference on how I start writing code from here and on. If the "more complex" (c1) way is as fast as the "less complex" (c2) but more readable, i'm sticking with the "more complex" one.
c1 > c2, can I be sure that the algorithm that gives me O(c1n) is as fast or faster than the one that gives me O(c2n)?
The whole issue lies within the words "fast" or "faster". Computational complexity doesn't strictly measure what we intuitively understand as "fast". Without going into mathematical details (although it's a good idea: https://en.wikipedia.org/wiki/Big_O_notation), it answers the question "how fast it will go slower when my input grows". So if you have O(n^2) complexity you can roughly expect that doubling the size of the input will make your algorithm take 4 times more time. Whereas for linear complexity, 2 times bigger input gives only doubles the time. As you can see, it's relative, so any constants cancel out.
To sum up: from the way you ask your question, it doesn't seem the big-O notation is the correct tool here.
By definition, if c1 and c2 are constants, O(c1*n) === O(c2*c) === O(n). That is, the number of operations per element of your array of length n is completely irrelevant in this kind of complexity analysis.
All that it will tell you is that "it's linear". That is, if you have 1 bazillion operations for an array of length n, then you'll have 2 bazillion operations for an array of length 2*n (plus or minus something that grows slower than linear).
can I assume that having O(c1n) and O(c2n) algorithms so that c1 > c2, can I be sure that the algorithm that gives me O(c1n) is as fast or faster than the one that gives me O(c2n)?
Nope, not at all.
First, because the constants there are meaningless in that analysis. There's no way to put it: it is absolutely irrelevant whatever restrictions you put in c1 and c2 for big-O analysis. The whole idea is that it will discard those restrictions.
Second, because they don't tell you anything that would enable you to compare the two algorithms runtime for a specific value of n.
Such complexity analysis only enables you to compare the asymptotic behavior of algorithms. Real-world problems in general don't care about where the asymptotes are.
Assume that A1(n) is the number of operations Algorithm 1 needs for an input of length n, and A2(n) is the same for Algorithm 2. You could have:
A1(n) = 10n + 900
A2(n) = 100n
The complexity of both is O(A1) = O(A2) = O(n). For small inputs, A2 is faster. For large inputs, A1 is faster. The point where they change is n == 10.
This question is cool because it can make a difference on how I start writing code from here and on. If the "more complex" (c1) way is as fast as the "less complex" (c2) but more readable, i'm sticking with the "more complex" one.
Not only that, but also there's the fact that when you have 2 different algorithms that are really of different complexity classes (e.g., linear vs quadratic), it might still make sense to use the one of higher complexity as it may still be faster.
For example:
A3(n) = n^2
A4(n) = n + 10^20.
E.g., Algorithm 3 is quadratic, while Algorithm 4 is linear but it has a constant huge initialization time.
For inputs of size of up to around n == 10^10, it will be faster to use the quadratic algorithm.
It may very well be the case that all relevant inputs for your specific problem fall within that range, meaning that the quadratic algorithm would be the better, faster choice.
The bottom line is: for analyzing the actual time it will take to run an algorithm on a given input (or a given bounded range of inputs, as nearly all real-world problems are) and compare it with another algorithm, big-O analysis is meaningless.
Another way to put it: you're asking a practical "engineering" question (i.e., which option is better / faster) but trying to answer the question with a tool that's only useful for "theoretical" analysis. That tool is important, yes. But it has no chance of giving you the answer you're looking for, by design.
By definition, time complexity ignores constants. So O((1/2)n) == O(n) == O(2n) == O(cn).
Your example of O((1/2)n) shows why this is the case, because the constants can measure units of anything, so comparing them is meaningless.
You can never tell which algorithm is faster based only on the time complexity. But, you can tell which one would be faster as n approaches infinity. Since constants are removed from the time complexity, they would be considered equal and therefore with O(c1n) and O(c2n) you still would not be able to tell which one is faster even as n approaches infinity.
(my theoretical computer science courses are a couple of decades ago)
O(cn) is O(n).
It's still a linear search over the array.

If 1D and 2D array always have equivalent content will time complexity differ?

If, for example, I have a set of numbers and I populate a copy in a 1d array and a copy in a 2d array. So essentially I have, and will always have an equivalent amount of elements in each array. In this case does the time complexity actually differ, holding in mind that the number of elements will will be always equivalent?
No, the time complexity of the same algorithm operating on both types of inputs will be the same. Intuitively, the time complexity of an algorithm will not change just because the input data is arranged in a different way.
That being said, apparently the notion of input size depends a bit on the context, which can be puzzling. When discussing sorting algorithms, the input consists of n elements, which means that a time complexity of e.g. O(n) (which however is impossible for comparison-based sorting) would be termed linear. In contrast, when discussing algorithms for matrix multiplication, the input is usually imagined as an n*n matrix - which has not n, but n^2 elements. In this case, an algorithm of complexity of O(n*n) (which however is unlikely again) would again be termed linear, although the expression describing it is actually a square term.
To put it all in a nutshell, the time complexity refers to the actual input size, not some technical parameter which might be different from it.

what is more expensive: compare or accessing an index of array

basically i saw i video on youtube that visualized sorting algorithms and they provided the program so that we can play with it .. and the program counts two main things (comparisons , array accesses) .. i wanted to see which one of (merge & quick) sort is the fastest ..
for 100 random numbers
quick sort:
comparisons 1000
array accesses 1400
merge sort:
comparisons 540
array accesses 1900
so quick sort uses less array access while merge sort uses less comparisons and the difference increases with the number of the indexes .. so which one of those is harder for computer to do?
The numbers are off. Results from actual runs with 100 random numbers. Note that quick sort compare count is affected by the implementation, Hoare uses less compares than Lomuto.
quick sort (Hoare partition scheme)
pivot reads 87 (average)
compares 401 (average)
array accesses 854 (average)
merge sort:
compares 307 (average)
array accesses 1400 (best, average, worst)
Since numbers are being sorted, I'm assuming they fit in registers, which reduces the array accesses.
For quick sort, the compares are done versus a pivot value, which should be read just once per recursive instance of quick sort and placed in a register, then one read for each value compared. An optimizing compiler may keep the values used for compare in registers so that swaps already have the two values in registers and only need to do two writes.
For merge sort, the compares add almost zero overhead to the array accesses, since the compared values will be read into registers, compared, then written from the registers instead of reading from memory again.
Sorting performance depends on many conditions, I think answering your exact question won't lead to a helpful answer (you can benchmark it easily yourself).
Sorting a small number of elements is usually not time critical, benchmarking makes sense for larger lists.
Also it is a rare case to sort an array of integers, it is much more common to sort a list of objects, comparing one or more of their properties.
If you head for performance, think about multi threading.
MergeSort is stable (equal elements keep their relative position), QuickSort is not, so you are comparing different results.
In your example, the quicksort algorithm is probably faster most of the time. If the comparison is more complex, e.g. string instead of int or multiple fields, MergeSort will become more and more effective because it needs fewer (expensive) comparisons. If you want to parallize the sorting, MergeSort is predestined because of the algorithm itself.

In which cases do we use heapsort?

In which cases heap sort can be used? As we know, heap sort has a complexity of n×lg(n). But it's used far less often than quick and merge sort. So when do we use this heap sort exactly and what are its drawbacks?
Characteristics of Heapsort
O(nlogn) time best, average, worst case performance
O(1) extra memory
Where to use it?
Guaranteed O(nlogn) performance. When you don't necessarily need very fast performance, but guaranteed O(nlogn) performance (e.g. in a game), because Quicksort's O(n^2) can be painfully slow. Why not use Mergesort then? Because it takes O(n) extra memory.
To avoid Quicksort's worst case. C++'s std::sort routine generally uses a varation of Quicksort called Introsort, which uses Heapsort to sort the current partition if the Quicksort recursion goes too deep, indicating that a worst case has occurred.
Partially sorted array even if stopped abruptly. We get a partially sorted array if Heapsort is somehow stopped abruptly. Might be useful, who knows?
Disadvantages
Relatively slow as compared to Quicksort
Cache inefficient
Not stable
Not really adaptive (Doesn't get faster if given somewhat sorted array)
Based on the wikipedia article for sorting algorithms, it appears that the Heapsort and Mergesort all have identical time complexity O(n log n) for best, average and worst case.
Quicksort has a disadvantage there as its worst case time complexity of O(n2) (a).
Mergesort has the disadvantage that its memory complexity is O(n) whereas Heapsort is O(1). On the other hand, Mergesort is a stable sort and Heapsort is not.
So, based on that, I would choose Heapsort in preference to Mergesort if I didn't care about the stability of the sort, so as to minimise memory usage. If stability was required, I would choose MergeSort.
Or, more correctly, if I had huge amounts of data to sort, and I had to code my own algorithms to do it, I'd do that. For the vast majority of cases, the difference between the two is irrelevant, until your data sets get massive.
In fact, I've even used bubble sort in real production environments where no other sort was provided, because:
it's incredibly easy to write (even the optimised version);
it's more than efficient enough if the data has certain properties (either small datsets or datasets that were already mostly sorted before you added a couple of items).
Like goto and multiple return points, even seemingly bad algorithms have their place :-)
(a) And, before you wonder why C uses a less efficient algorithm, it doesn't (necessarily). Despite the qsort name, there's no mandate that it use Quicksort under the covers - that's a common misconception. It may well use one of the other algorithms.
Kindly note that the running time complexity of heap sort is the same as O(n log n) irrespective of whether the array is already partially sorted in either ascending or descending order.
Kindly refer to below link for further clarification on big O calculation for the same :
https://ita.skanev.com/06/04/03.html

A Memory-Adaptive Merge Algorithm?

Many algorithms work by using the merge algorithm to merge two different sorted arrays into a single sorted array. For example, given as input the arrays
1 3 4 5 8
and
2 6 7 9
The merge of these arrays would be the array
1 2 3 4 5 6 7 8 9
Traditionally, there seem to be two different approaches to merging sorted arrays (note that the case for merging linked lists is quite different). First, there are out-of-place merge algorithms that work by allocating a temporary buffer for storage, then storing the result of the merge in the temporary buffer. Second, if the two arrays happen to be part of the same input array, there are in-place merge algorithms that use only O(1) auxiliary storage space and rearrange the two contiguous sequences into one sorted sequence. These two classes of algorithms both run in O(n) time, but the out-of-place merge algorithm tends to have a much lower constant factor because it does not have such stringent memory requirements.
My question is whether there is a known merging algorithm that can "interpolate" between these two approaches. That is, the algorithm would use somewhere between O(1) and O(n) memory, but the more memory it has available to it, the faster it runs. For example, if we were to measure the absolute number of array reads/writes performed by the algorithm, it might have a runtime of the form n g(s) + f(s), where s is the amount of space available to it and g(s) and f(s) are functions derivable from that amount of space available. The advantage of this function is that it could try to merge together two arrays in the most efficient way possible given memory constraints - the more memory available on the system, the more memory it would use and (ideally) the better the performance it would have.
More formally, the algorithm should work as follows. Given as input an array A consisting of two adjacent, sorted ranges, rearrange the elements in the array so that the elements are completely in sorted order. The algorithm is allowed to use external space, and its performance should be worst-case O(n) in all cases, but should run progressively more quickly given a greater amount of auxiliary space to use.
Is anyone familiar with an algorithm of this sort (or know where to look to find a description of one?)
at least according to the documentation, the in-place merge function in SGI STL is adaptive and "its run-time complexity depends on how much memory is available". The source code is available of course you could at least check this one.
EDIT: STL has inplace_merge, which will adapt to the size of the temporary buffer available. If the temporary buffer is at least as big as one of the sub-arrays, it's O(N). Otherwise, it splits the merge into two sub-merges and recurses. The split takes O(log N) to find the right part of the other sub array to rotate in (binary search).
So it goes from O(N) to O(N log N) depending on how much memory you have available.

Resources