What happens when Data.Vector.unfoldr doesn't fuse?

Suppose I create a Vector using unfoldr, as opposed to unfoldrN, and it doesn't fuse, so the vector actually needs to be created. How does the system decide how large to make it? I haven't been able to find anything about this in the documentation. The source code shows that it calls unstream, which has a lot of complicated code I can't make head or tail of.

I am not entirely sure, but I chased the source code from unfoldr until Data.Vector.Generic.Mutable.unstream. Its documentation states:
Create a new mutable vector and fill it with elements from the
'Stream'. The vector will grow exponentially if the maximum size of
the 'Stream' is unknown.
So my guess is that it starts with a small size (say 10 or so) and begins filling the vector. As soon as the vector is full, it doubles its size (or grows it by 50%, or by some other ratio) and copies the old elements into the new vector. Exponential growth ensures that filling the vector with n elements takes at most O(log(n)) copies, hence the overall complexity would be O(n log(n)), which is "close enough" to linear time.
The actual ratio seems to be 2, as per the enlarge_delta function, which just returns max 1 (length v); that value is passed to grow, which adds that many elements to the vector.
As Carl notes, exponential copying is actually O(n), not just O(n log(n)). Indeed, with ratio 2 the number of copied elements is (up to rounding) 2^0 + 2^1 + ... + 2^(log(n)) = 2^(log(n)+1) - 1 = 2n - 1, hence O(n).
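For intuition, here is a minimal sketch in Python (not the library's actual code; it assumes a starting capacity of 1 and a growth ratio of 2) of the strategy described above: fill a buffer from a stream whose length is unknown in advance, and count how many elements get copied along the way.

    # Sketch of the growth strategy described above (not Data.Vector's code):
    # fill a buffer from a stream of unknown length, doubling the capacity
    # whenever it runs out, and count the elements copied along the way.
    def fill_from_stream(stream):
        capacity = 1              # assumed starting capacity
        buf = [None] * capacity
        size = 0
        copies = 0
        for x in stream:
            if size == capacity:
                capacity *= 2                 # grow exponentially (ratio 2)
                new_buf = [None] * capacity
                new_buf[:size] = buf          # copy the old elements over
                copies += size
                buf = new_buf
            buf[size] = x
            size += 1
        return buf[:size], copies

    _, copies = fill_from_stream(iter(range(1000)))
    print(copies)  # 1023 copies for n = 1000: linear in n, within the 2n - 1 bound above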

Related

If 1D and 2D array always have equivalent content will time complexity differ?

If, for example, I have a set of numbers and I populate one copy of it in a 1D array and another copy in a 2D array, so that both arrays always hold exactly the same elements, does the time complexity actually differ, keeping in mind that the number of elements is always the same?
No, the time complexity of the same algorithm operating on both types of inputs will be the same. Intuitively, the time complexity of an algorithm will not change just because the input data is arranged in a different way.
That said, the notion of "input size" depends on the context, which can be puzzling. When discussing sorting algorithms, the input consists of n elements, so a time complexity of e.g. O(n) (which is impossible for comparison-based sorting, but serves as an example) would be termed linear. In contrast, when discussing algorithms for matrix multiplication, the input is usually imagined as an n*n matrix, which has not n but n^2 elements. In that case an algorithm of complexity O(n*n) (again unlikely) would also be termed linear in the input size, even though the expression describing it is a square term in n.
In a nutshell: time complexity refers to the actual input size, not to some technical parameter that may differ from it.
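As a minimal illustration (a Python sketch, not tied to any particular language from the question), summing every element takes time linear in the total number of elements whether they sit in a 1D list or a 2D nested list:

    # Summing every element is linear in the total number of elements,
    # whether they are laid out in a 1D list or a 2D (nested) list.
    def sum_1d(a):
        total = 0
        for x in a:          # n iterations for n elements
            total += x
        return total

    def sum_2d(m):
        total = 0
        for row in m:        # the two loops together still visit
            for x in row:    # each element exactly once
                total += x
        return total

    data = list(range(12))
    grid = [data[i:i + 4] for i in range(0, 12, 4)]  # same 12 elements, 3x4 layout
    assert sum_1d(data) == sum_2d(grid)              # same result, same O(total) work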

Cache Optimization - Hashmap vs QuickSort?

Suppose that I have N unsorted arrays, of integers. I'd like to find the intersection of those arrays.
There are two good ways to approach this problem.
One, I can sort each array in place with an O(n log n) sort, like quicksort or mergesort. Then I put a pointer at the start of each array and compare the pointed-to values across the arrays, advancing the pointer of whichever array[pointer] is smallest; if they're all equal, you've found an element of the intersection.
This is an O(n log n) solution with constant extra memory (since everything is done in place).
The second solution is to use a hash map, putting in the values that appear in the first array as keys, and then incrementing those values as you traverse through the remaining arrays (and then grabbing everything that had a value of N). This is an O(n) solution, with O(n) memory, where n is the total size of all of the arrays.
Theoretically, the former solution is O(n log n), and the latter is O(n). However, hash maps do not have great locality, because collisions can scatter items seemingly at random through the map. The other solution, although O(n log n), traverses each array sequentially, exhibiting excellent locality. Since a CPU tends to pull the array values adjacent to the current index into the cache, the O(n log n) solution will hit the cache much more often than the hash-map solution.
Therefore, given a sufficiently large array size (as the number of elements goes to infinity), is it feasible that the O(n log n) solution is actually faster than the O(n) solution?
For integers you can use a non-comparison sort (see counting sort, radix sort). A large set might also be encoded more compactly, e.g. by collapsing sequential runs into ranges. That compresses the data set and allows skipping past large blocks (see RoaringBitmaps). This has the potential to be hardware friendly and to have O(n) complexity.
Complexity theory does not account for constants. As you suspect, there is always the potential for an algorithm with a higher complexity to be faster than the alternative, due to those hidden constants. By exploiting the nature of the problem, e.g. limiting the solution to integers, there are optimizations available that a general-purpose approach cannot use. Good algorithm design often requires understanding and leveraging those constraints.
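For concreteness, here is a small Python sketch of the two approaches from the question (it assumes, for simplicity, that each array contains distinct integers):

    from collections import Counter

    def intersect_sorted(arrays):
        # Sort each array, then sweep one pointer per array.
        arrays = [sorted(a) for a in arrays]        # O(n log n) total
        ptrs = [0] * len(arrays)
        out = []
        while all(p < len(a) for p, a in zip(ptrs, arrays)):
            vals = [a[p] for p, a in zip(ptrs, arrays)]
            if len(set(vals)) == 1:                 # all pointers agree
                out.append(vals[0])
                ptrs = [p + 1 for p in ptrs]
            else:                                   # advance a pointer at the smallest value
                i = vals.index(min(vals))
                ptrs[i] += 1
        return out

    def intersect_hash(arrays):
        # Count occurrences across arrays; keep values seen in all of them.
        counts = Counter(arrays[0])
        for a in arrays[1:]:
            for x in set(a):                        # set() guards against duplicates
                if x in counts:
                    counts[x] += 1
        return [x for x, c in counts.items() if c == len(arrays)]

    arrays = [[5, 1, 9, 3], [3, 7, 5, 2], [5, 3, 8, 6]]
    print(sorted(intersect_sorted(arrays)))  # [3, 5]
    print(sorted(intersect_hash(arrays)))    # [3, 5]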

dynamic array's time complexity of putting an element

In a written examination, I met a question like this:
When a dynamic array is full, it extends to double the space, e.g. from 2 to 4, from 16 to 32, and so on. What is the time complexity of putting an element into the array?
I think that the space extension should not be considered, so I wrote O(n), but I am not sure.
What's the answer?
It depends on the question that was asked.
If the question asked for the time required for one insertion, then the answer is O(n) because big-O implies "worst case." In the worst case, you need to grow the array. Growing an array requires allocating a bigger memory block (as you say often 2 times as big, but other factors bigger than 1 may be used) and then copying the entire contents, which is the n existing elements. In some languages like Java, the extra space must also be initialized.
If the question asked for amortized time, then the answer is O(1). Another way of saying this is that the cost of n adds is O(n).
How can this be? A single addition is O(n) in the worst case, yet n of them together require only O(n). This is the beauty of amortization. For simplicity, say the array starts with size 1 and grows by a factor of 2 every time it fills, so we're always copying a power of 2 elements. This means the cost of growing is 1 the first time, 2 the second time, and so on. In general, the total cost of growing to n elements is TC = 1 + 2 + 4 + ... + n. It's not hard to see that TC = 2n - 1. E.g. if n = 8, then TC = 1 + 2 + 4 + 8 = 15 = 2*8 - 1. So TC is proportional to n, i.e. O(n).
This analysis works no matter the initial array size or the factor of growth, so long as the factor is greater than 1.
If your teacher is good, he or she asked this question in an ambiguous manner to see if you could discuss both answers.
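A quick way to see the amortized argument is to simulate it. This is a Python sketch that uses the starting capacity of 1 and growth factor of 2 from the analysis above as assumptions:

    # Append n elements to a dynamic array that doubles when full,
    # and compare the total work to n.
    def total_append_cost(n):
        capacity, size = 1, 0
        total = 0
        for _ in range(n):
            if size == capacity:
                total += size        # growing copies all existing elements
                capacity *= 2
            total += 1               # writing the new element itself
            size += 1
        return total

    for n in (8, 1024, 10**6):
        print(n, total_append_cost(n), total_append_cost(n) / n)
    # The cost per append stays below 3, i.e. amortized O(1).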
In order to grow the array you cannot simply "add more to the end", because you would most likely get a "segmentation fault" type of error. So even though on average an insertion takes Θ(1) steps while there is still enough space, in O notation it is O(n), because you have to copy the old array into a newly allocated, bigger array, which generally takes n steps. On the other hand, copying arrays is fast in practice because it is just a memory copy of a contiguous block, and in the best scenario, where the OS can move whole pages at once, it can feel like a single step. In the end, though, even if the copy proceeds in 4 KB pages, i.e. roughly n / 4096 steps, it is still O(n) complexity.

How to find the kth smallest element of a list without sorting the list?

I need to find the median of an array without sorting or copying the array.
The array is stored in the shared memory of a cuda program. Copying it to global memory would slow the program down and there is not enough space in shared memory to make an additional copy of it there.
I could use two 'for' loops, iterate over every possible value, and count how many values are smaller than it, but this would be O(n^2). Not ideal.
Does anybody know of an O(n) or O(n log n) algorithm which solves my problem?
Thanks.
If your input consists of integers with absolute value smaller than C, there's a simple O(n log C) algorithm that needs only constant additional memory: just binary search for the answer, i.e. find the smallest number x such that x is larger than or equal to at least k elements in the array. It's easily parallelizable too, via a parallel prefix scan to do the counting.
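Here is a sequential Python sketch of that idea (the function name and the bound C are illustrative; a CUDA version would parallelize the counting step, e.g. with the prefix scan mentioned above):

    def kth_smallest(arr, k, C):
        # k-th smallest (1-based) of integers with absolute value < C,
        # using O(n log C) time and O(1) extra memory.
        lo, hi = -C, C
        while lo < hi:
            mid = (lo + hi) // 2
            count = sum(1 for x in arr if x <= mid)  # how many elements are <= mid
            if count >= k:
                hi = mid          # the answer is <= mid
            else:
                lo = mid + 1      # the answer is > mid
        return lo

    arr = [7, -3, 2, 9, 2, -8, 4]
    print(kth_smallest(arr, k=(len(arr) + 1) // 2, C=10))  # median: 2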
Your time and especially memory constraints make this problem difficult. It becomes easy, however, if you're able to use an approximate median.
Say an element y is an ε-approximate median if
m/2 − εm < rank(y) < m/2 + εm
where m is the size of the array and rank(y) is y's position in sorted order.
Then all you need to do is sample t = 7 ε^(-2) log(2 δ^(-1)) elements, and find their median any way you want.
Note that the number of samples you need is independent of your array's size - it is just a function of ε and δ.
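A Python sketch of the sampling approach follows; the particular ε and δ values, the sampling with replacement, and the helper name approx_median are illustrative choices, not prescribed by the bound above:

    import math
    import random
    import statistics

    def approx_median(arr, eps=0.05, delta=0.01):
        # Draw t random samples (with replacement) and return their median.
        t = math.ceil(7 * eps**-2 * math.log(2 / delta))
        sample = [random.choice(arr) for _ in range(t)]
        return statistics.median(sample)

    arr = list(range(100_000))
    random.shuffle(arr)
    # With probability >= 1 - delta, the result's rank is within eps*m
    # of the true median (here, within ~5,000 ranks of ~50,000).
    print(approx_median(arr))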

Why double stack capacity instead of just increasing it by fixed amount?

I'm using an array implementation of a stack; if the stack is full, instead of throwing an error I double the array size, copy over the elements, change the stack reference, and add the new element to the stack. (I'm following a book to teach myself this stuff.)
What I don't fully understand is why I should double it. Why not increase it by a fixed amount, or why not increase it by 3 times?
I assume it has something to do with the time complexity or something?
An explanation would be greatly appreciated!
Doubling has just become the standard for generic implementations of things like array lists ("dynamically" sized arrays that really just do what you're doing in the background) and really most dynamically sized data types that are backed by arrays. If you knew your scenario and had the time and willpower to write a custom stack/array list implementation you could certainly write a more optimal solution.
If you knew in your software that items would be added incredibly infrequently after the initial array was built, you could initialise it with a specific size then only increase it by the size of what was being added to preserve memory.
On the other hand, if you knew the list would be expanded very frequently, you might choose to increase the list size by 3 times or more when it runs out of space.
For a generic implementation that's part of a common library, your implementation specifics and requirements aren't known so doubling is just a happy medium.
In theory, you indeed arrive at different time complexities. If you increase by a constant amount, you divide the number of re-allocations (and thus O(n) copies) by a constant, but you still get O(n) time complexity for appending. If you double instead, you get a better time complexity for appending (amortized O(1), IIRC), and since you consume at most twice as much memory as needed, you still get the same space complexity.
In practice it's less severe, but the effect is still real. Copies are expensive, while a bit of extra memory usually doesn't hurt. It's a tradeoff, but you'd have to be quite low on memory to choose another strategy. Often, you don't know beforehand (or can't tell the stack, due to API limits) how much space you'll actually need. For instance, if you build a 1024-element stack starting with one element, doubling gets you down to about 10 re-allocations (I may be off by one), versus roughly 1024/K with a fixed increment of K; assuming K=3, that would be roughly 34 times as many re-allocations, only to save a bit of memory.
The same holds for any other factor. 2 is nice because you never end up with non-integer sizes and it's still quite small, limiting the wasted space to 50%. Specific use cases may be better-served by other factors, but usually the ROI is too small to justify re-implementing and optimizing what's already available in some library.
The problem with a fixed amount is choosing that fixed amount - if you (say) choose 100 items as your fixed amount, that makes sense if your stack is currently ~100 items in size. However, if your stack is already 10,000 items in size, it's likely to grow to 11,000 items. You don't want to do 10 reallocations / moves to grow the size of your stack by 10%.
As for 2x versus 3x, that's pretty arbitrary - nothing wrong with choosing 3x; which is "better" will depend on your exact use case and how you define "better".
Scaling by 2x is easy, and will ensure that on average items get copied no more than twice [an expansion will copy half the items for the first time, a quarter for the second, an eighth for the third, etc.] If things instead grew by a fixed amount, then when e.g. the twentieth expansion was performed, half the items will be copied for the tenth time.
Growing by a factor of more than 2x will increase the average "permanent" slack space; growing by a smaller factor will increase the amount of storage that is allocated and abandoned. Depending upon the relative perceived "costs" of permanent and abandoned allocations, the optimal growth factor may be larger or smaller, but growth factors which are anywhere close to optimum will generally not perform too much worse than would optimum growth factors. Regardless of what the optimum growth factor would be, a growth factor of 2x will be close enough to yield decent performance.
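To put some illustrative numbers on the tradeoff, here is a small Python sketch that counts re-allocations and copied elements for doubling versus a fixed increment (the increment of 100 and n = 10,000 are arbitrary choices):

    def growth_cost(n, grow):
        # grow(capacity) -> new capacity; returns (re-allocations, elements copied).
        capacity, size = 1, 0
        reallocs = copied = 0
        for _ in range(n):
            if size == capacity:
                reallocs += 1
                copied += size
                capacity = grow(capacity)
            size += 1
        return reallocs, copied

    n = 10_000
    print(growth_cost(n, lambda c: c * 2))    # doubling:   14 re-allocations, ~16k copies
    print(growth_cost(n, lambda c: c + 100))  # fixed +100: 100 re-allocations, ~495k copies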
