Haskell: List v. Array, difference in performance - arrays

Another question from a Haskell n00b.
I'm comparing the efficiency of various methods used to solve Problem #14 on the Project Euler website. In particular, I'm hoping to better understand the factors driving the difference in evaluation time for four (slightly) different approaches to solving the problem.
(Descriptions of problem #14 and the various approaches are below.)
First, a quick overview of Problem #14. It has to do with "Collatz numbers" (i.e., same programming exercise as my previous post which explored a different aspect of Haskell). A Collatz number for a given integer is equal to the length of the Collatz sequence for that integer. A Collatz sequence for an integer is calculated as follows: the first number ("n0") in the sequence is that integer itself; if n0 is even, the next number in the sequence ("n1") is equal to n0 / 2; if n0 is odd, then n1 is equal to 3 * n0 + 1. We continue recursively extending the sequence until we arrive at 1, at which point the sequence is finished. For example, the Collatz sequence for 5 is: {5, 16, 8, 4, 2, 1} (because 16 = 3 * 5 + 1, 8 = 16 / 2, 4 = 8 / 2,...).
Problem 14 asks us to find the integer below 1,000,000 which has the largest Collatz number. To that effect, we can consider a function "collatz" which, when passed an integer "n" as an argument, returns the integer below n with the largest Collatz number. In other words, collatz 1000000 gives us the answer to Problem #14.
For the purposes of this exercise (i.e., understanding differences in evaluation time), we can consider Haskell versions of 'collatz' which vary across two dimensions:
(1) Implementation: Do we store the dataset of Collatz numbers (which will be generated for all integers 1..n) as a list or an array? I call this the "implementation" dimension, i.e., a function's implementation is either "list" or "array".
(2) Algorithm: do we calculate the Collatz number for any given integer n by extending out the Collatz sequence until it is complete (i.e., until we reach 1)? Or do we only extend out the sequence until we reach a number k which is smaller than n (at which point we can simply use the Collatz number of k, which we've already calculated)? I call this the "algorithm" dimension, i.e., a function's algorithm is either "complete" (calculation of Collatz number for each integer) or "partial". The latter obviously requires fewer operations.
Below are the four possible versions of the "collatz" function: array / partial, list / partial, array / complete and list / complete:
import Data.Array ( (!) , listArray , assocs )
import Data.Ord ( comparing )
import Data.List ( maximumBy )
--array implementation; partial algorithm (FEWEST OPERATIONS)
collatzAP x = maximumBy (comparing snd) $ assocs a where
    a = listArray (0,x) (0:1:[c n n | n <- [2..x]])
    c n i = let z = if even i then div i 2 else 3*i+1
            in if i < n then a ! i else 1 + c n z
--list implementation; partial algorithm
collatzLP x = maximum a where
    a = zip (0:1:[c n n | n <- [2..x]]) [0..x]
    c n i = let z = if even i then div i 2 else 3*i+1
            in if i < n then fst (a!!i) else 1 + c n z
--array implementation, complete algorithm
collatzAC x = maximumBy (comparing snd) $ assocs a where
    a = listArray (0,x) (0:1:[c n n | n <- [2..x]])
    c n i = let z = if even i then div i 2 else 3*i+1
            in if i == 1 then 1 else 1 + c n z
--list implementation, complete algorithm (MOST OPERATIONS)
collatzLC x = maximum a where
    a = zip (0:1:[c n n | n <- [2..x]]) [0..x]
    c n i = let z = if even i then div i 2 else 3*i+1
            in if i == 1 then 1 else 1 + c n z
Regarding speed of evaluation: I know that arrays are far faster to access than lists (i.e., O(1) vs. O(n) access time for a given index n) so I expected the 'array' implementation of "collatz" to be faster than the 'list' implementation, ceteris paribus. Also, I expected the 'partial' algorithm to be faster than the 'complete' algorithm (ceteris paribus), given it needs to perform fewer operations in order to construct the dataset of Collatz numbers.
Testing our four functions across inputs of varying size, we observe the following evaluation times (comments below):
It's indeed the case that the 'array/partial' version is the fastest version of "collatz" (by a good margin). However, I find it a bit counter-intuitive that 'list/complete' isn't the slowest version. That honor goes to 'list/partial', which is more than 20x slower than 'list/complete'!
My question: Is the difference in evaluation time between 'list/partial' and 'list/complete' (as compared to that between 'array/partial' and 'array/complete') entirely due to the difference in access efficiency between lists and arrays in Haskell? Or am I not performing a "controlled experiment" (i.e., are there other factors at play)?

I do not understand how a question about the relative performance of two algorithms that work with lists is related to arrays at all... but here is my take:
Try to avoid indexing lists, especially long lists, if performance is of any concern. Indexing is really a traversal (as you know). "List/partial" indexes (and therefore traverses) the list a lot; "list/complete" never indexes it. Hence the difference between "array/complete" and "list/complete" is negligible, and the difference between "list/partial" and the rest is huge.
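To make the "partial" algorithm concrete, here is a minimal sketch in Python rather than Haskell. It is not the asker's code: the name collatz_partial is made up for illustration, and a Python list has O(1) indexing, so it plays the role of the Haskell array, not of the Haskell linked list. Each lookup of an already computed value is a single index operation, which is exactly the step that costs a traversal with `!!`.

```python
def collatz_partial(limit):
    """Return (n, length) for the n in [0, limit] with the longest Collatz sequence.

    Hypothetical helper: follows each sequence only until it drops below
    its starting value, then reuses the length stored for that smaller number.
    """
    lengths = [0] * (limit + 1)
    if limit >= 1:
        lengths[1] = 1  # the sequence for 1 is just {1}
    for n in range(2, limit + 1):
        i, steps = n, 0
        while i >= n:  # stop as soon as we reach an already solved value
            i = i // 2 if i % 2 == 0 else 3 * i + 1
            steps += 1
        lengths[n] = steps + lengths[i]  # O(1) lookup; a linked list would traverse i cells
    best = max(range(limit + 1), key=lengths.__getitem__)
    return best, lengths[best]
```

With a linked list in place of `lengths`, the lookup on the marked line would cost time proportional to `i`, and that cost is paid once per starting value.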

Related

Is it possible to do 3-sum/4-sum...k-sum better than O(n^2) with these conditions? - Tech Interview

this is a classic problem, but I am curious if it is possible to do better with these conditions.
Problem: Suppose we have a sorted array of length 4*N, that is, each element is repeated 4 times. Note that N can be any natural number. Also, each element in the array is subject to the constraint 0 < A[i] < 190*N. Are there 4 elements in the array such that A[i] + A[j] + A[k] + A[m] = V, where V can be any positive integer; note we must use exactly 4 elements and they can be repeated. It is not necessarily a requirement to find the 4 elements that satisfy the condition, rather, just showing it can be done for a given array and V is enough.
Ex : A = [1,1,1,1,4,4,4,4,5,5,5,5,11,11,11,11]
V = 22
This is true because, 11 + 5 + 5 + 1 = 22.
My attempt:
Instead of "4sum" I first tried k-sum, but this proved pretty difficult so I instead went for this variation. The first solution I came to was rather naive O(n^2). However, given these constraints, I imagine that we can do better. I tried some dynamic programming methods and divide and conquer, but that didn't quite get me anywhere. To be specific, I am not sure how to cleverly approach this in a way where I can "eliminate" portions of the array without having to explicitly check values against all or almost all permutations.
Make a vector S0 of length 256N where S0[x]=1 if x appears in A.
Perform a convolution of S0 with itself to produce a new vector S1 of length 512N. S1[x] is nonzero iff x is the sum of 2 numbers in A.
Perform a convolution of S1 with itself to make a new vector S2. S2[x] is nonzero iff x is the sum of 4 numbers in A.
Check S2[V] to get your answer.
Convolution can be performed in O(N log N) time using FFT convolution (http://www.dspguide.com/ch18/2.htm) or similar techniques.
Since at most 4 such convolutions are performed, the total complexity is O(N log N)
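The steps above can be sketched in Python as follows. This is an illustration under assumptions, not the answerer's code: the function names are invented, the FFT is a plain recursive radix-2 version written out so the block has no dependencies, and each convolution result is thresholded back to 0/1 so that rounding errors stay harmless. It relies on every value appearing 4 times in A, so any value may be reused up to 4 times in a sum.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def convolve_indicator(x, y):
    """Return z with z[s] = 1 iff s = i + j for some x[i] = y[j] = 1."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size *= 2
    fx = fft(list(x) + [0] * (size - len(x)))
    fy = fft(list(y) + [0] * (size - len(y)))
    raw = fft([p * q for p, q in zip(fx, fy)], inverse=True)
    # The unnormalised inverse transform is scaled by `size`; round to 0/1.
    return [1 if r.real / size > 0.5 else 0 for r in raw[:out_len]]

def has_four_sum(a, v):
    """True iff v is a sum of exactly four values occurring in a."""
    s0 = [0] * (max(a) + 1)
    for value in a:
        s0[value] = 1
    s1 = convolve_indicator(s0, s0)   # sums of two values
    s2 = convolve_indicator(s1, s1)   # sums of four values
    return 0 <= v < len(s2) and s2[v] == 1
```

For the example array in the question, `has_four_sum(A, 22)` returns True, since 11 + 5 + 5 + 1 = 22.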

Finding all combinations of elements from two sets such that their geometric mean falls into third set

I have the integers from 1 to n. I randomly allot every integer into one of three sets A, B and C (A ∩ B = B ∩ C = C ∩ A = Ø). Every integer belongs to exactly one set. I need to find all combinations of elements (a,b) such that a ∈ A, b ∈ B, and the geometric mean of a and b belongs to C. Basically sqrt(a*b) ∈ C.
My solution is to first mark on an array of size n whether every element went into set A, B or C. Then I loop through the array for all elements that belong to A. When I encounter one, I again loop through for all elements that belong to B. If array[sqrt(a*b)] == C, then I add (a, b, sqrt(a*b)) as one possible combination. I do the same for the entire array, which is O(n^2).
Is there a more optimal solution possible?
It can be done with better complexity than O(n^2). The solution sketched here is in O(n * sqrt(n) * log(n)).
The main idea is the following:
let (a, b, c) be a good solution, i.e. one with sqrt(a * b) = c. We can write a as a = s * t^2, where s is the product of the prime numbers that have odd exponents in a's prime factorization. It's guaranteed that the remaining part of a is a perfect square. Since a * b is a perfect square, then b must be of the form s * k^2. For each a (there are O(n) such numbers), after finding s from the decomposition above (this can be done in O(log(n)), as it will be described next), we can restrict our search for the number b to those of the form b = s * k^2, but there are only O(sqrt(n)) numbers like this smaller than n. For each pair a, b enumerated like this we can test in O(1) whether there is a good c, using the representation you used in the question.
One critical part in the idea above is decomposing a into s * t^2, i.e. finding the primes that have odd power in a's factorization.
This can be done using a pre-processing step, that finds the prime factors (but not also their powers) of every number in {1, 2, .. n}, using a slightly modified sieve of Eratosthenes. This modified version would not only mark a number as "not prime" when iterating over the multiples of a prime, but would also append the current prime number to the list of the factors of the current multiple. The time complexity of this pre-processing step is n * sum{for each prime p < n}(1/p) = n * log(log(n)) -- see this for details.
Using the result of the pre-processing, which is the list of primes which divide a, we can find those primes with odd power in O(log(n)). This is achieved by dividing a by each prime in the list until it is no more divisible by that prime. If we made an odd number of divisions, then we use the current prime in s. After all divisions are done, the result will be equal to 1. The complexity of this is O(log(n)) because in the worst case we always divide the initial number by 2 (the smallest prime number), thus it will take at most log2(a) steps to reach value 1.
The complexity of the main step dominates the complexity of the preprocessing, thus the overall complexity of this approach is O(n * sqrt(n) * log(n)).
Remark: in the decomposition a = s * t^2, s is the product of the prime numbers in a with odd exponents, but their exponent is not used in s (i.e. s is just the product of those primes, with exponent 1). Only in this situation it is guaranteed that b should be of the form s * k^2. Indeed, since a * b = c * c, the prime factorization of the right hand side uses only even exponents, thus all primes from s should also appear in b with odd exponents, and all other primes from b's factorization should have even exponents.
Expanding on the following line: "we can restrict our search for the number b to those of the form b = s * k^2, but there are only O(sqrt(n)) numbers like this smaller than n".
Let's consider an example. Imagine that we have something like n = 10,000 and we are currently looking for solutions having a = 360 = 2^3 * 3^2 * 5. The primes with odd exponent in a's factorization are 2 and 5 (thus s = 2 * 5; a = 10 * 6^2).
Since a * b is a perfect square, it means that all primes in the prime factorization of a * b have even exponents . This implies that those two primes (2 and 5) need to also appear in b's factorization with odd exponents, and the rest of the exponents in b's prime factorization need to be even. Thus b is of the form s * k^2 = 10 * k ^ 2.
So we proved that b = 10 * k ^ 2. This is helpful, because we can now enumerate all the b values of this form quickly (in O(sqrt(n))). We only need to consider k = 1, k = 2, ..., k = (int)sqrt(n / 10). Larger values of k result in values of b larger than n. Each of these k values determines one b value, which we need to verify. Note that when verifying one of these b values, it should be first checked whether it indeed is in set B, which can be done in O(1), and whether sqrt(a * b) is in the set C, which can also be done in O(1).
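A minimal Python sketch of the enumeration described above. Two things here are assumptions for illustration and not part of the answer: the set membership is passed as a string `label` with `label[i]` in {'A', 'B', 'C'} (index 0 unused), and the square-free part s is computed by dividing out squares in a sieve-like loop, which is a simpler substitute for the modified sieve of Eratosthenes the answer describes.

```python
from math import isqrt

def squarefree_parts(n):
    """s[a] = product of the primes with odd exponent in a, for 1 <= a <= n."""
    s = list(range(n + 1))
    for p in range(2, isqrt(n) + 1):
        sq = p * p
        for m in range(sq, n + 1, sq):
            while s[m] % sq == 0:   # strip every factor p^2 (composite p is harmless)
                s[m] //= sq
    return s

def geometric_mean_triples(n, label):
    """All (a, b, c) with a in A, b in B, c in C and c * c == a * b."""
    s = squarefree_parts(n)
    triples = []
    for a in range(1, n + 1):
        if label[a] != 'A':
            continue
        k = 1
        while s[a] * k * k <= n:        # only O(sqrt(n)) candidates b = s * k^2
            b = s[a] * k * k
            if label[b] == 'B':
                c = isqrt(a * b)        # exact, because a * b is a perfect square
                if label[c] == 'C':
                    triples.append((a, b, c))
            k += 1
    return triples
```

For instance, with n = 9 and label = "?ABCCACAAB" (so A = {1, 5, 7, 8}, B = {2, 9}, C = {3, 4, 6}), the result is [(1, 9, 3), (8, 2, 4)].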

How to find the most frequent number and its frequency in an array in range L,R most efficiently?

Let's say we are given an array A[] of length N and we have to answer Q queries, each consisting of two integers L,R. We have to find the number from A[L] to A[R] whose frequency is at least (R-L+1)/2. If such a number doesn't exist, we have to print "No such number".
I could think of only O(Q*(R-L)) approach of running a frequency counter and first obtaining the most frequent number in the array from L to R. Then count its frequency.
But more optimization is needed.
Constraints: 1<= N <= 3*10^5, ,1<=Q<=10^5 ,1<=L<=R<=N
I know an O((N + Q) * sqrt(N)) solution:
Let's call a number heavy if it occurs at least B times in the array. There are at most N / B heavy numbers in the array.
If the query segment is "short" (R - L + 1 < 2 * B), we can answer it in O(B) time (by simply iterating over all elements of the range).
If the query segment is "long" (R - L + 1 >= 2 * B), a frequent element must be heavy. We can iterate over all heavy numbers and check if at least one of them fits (to do that, we can precompute prefix sums of the number of occurrences for each heavy element and find the number of its occurrences in a [L, R] segment in constant time).
If we set B = C * sqrt(N) for some constant C, this solution runs in O((N + Q) * sqrt(N)) time and uses O(N * sqrt(N)) memory. With a properly chosen C, it may fit into the time and memory limits.
There is also a randomized solution which runs in O(N + Q * log N * k) time.
Let's store a vector of position of occurrences for each unique element in the array. Now we can find the number of occurrences of a fixed element in a fixed range in O(log N) time (two binary searches over the vector of occurrences).
For each query, we'll do the following:
pick a random element from the segment
Check the number of its occurrences in O(log N) time as described above
If it's frequent enough, we are done. Otherwise, we pick another random element and do the same
If a frequent element exists, the probability not to pick it is no more than 1 / 2 for each trial. If we do it k times, the probability not to find it is (1 / 2) ^ k
With a proper choice of k (so that O(k * log N) per query is fast enough and (1 / 2) ^ k is reasonably small), this solution should pass.
Both solutions are easy to code (the first just needs prefix sums, the second only uses a vector of occurrences and binary search). If I had to code one of them, I'd pick the latter (the former can be more painful to squeeze into the time and memory limits).
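Here is a rough Python sketch of the randomized solution. The function names, the 0-based inclusive indices, and the default of 30 trials are my own choices for illustration; "frequent" is taken to mean that twice the count is at least the segment length, following the question's (R-L+1)/2 threshold.

```python
import random
from bisect import bisect_left, bisect_right
from collections import defaultdict

def build_positions(arr):
    """Map each value to the sorted list of indices where it occurs."""
    positions = defaultdict(list)
    for index, value in enumerate(arr):
        positions[value].append(index)
    return positions

def count_in_range(positions, value, lo, hi):
    """Occurrences of value in arr[lo..hi] (inclusive), via two binary searches."""
    where = positions.get(value, [])
    return bisect_right(where, hi) - bisect_left(where, lo)

def frequent_in_range(arr, positions, lo, hi, trials=30):
    """Return a value with frequency >= (hi - lo + 1) / 2 in arr[lo..hi], or None.

    Randomized: a frequent value that exists is missed with probability
    at most (1/2) ** trials.
    """
    length = hi - lo + 1
    for _ in range(trials):
        candidate = arr[random.randint(lo, hi)]   # random element of the segment
        if 2 * count_in_range(positions, candidate, lo, hi) >= length:
            return candidate
    return None
```

Usage: build `positions = build_positions(arr)` once, then call `frequent_in_range(arr, positions, L - 1, R - 1)` per query and print "No such number" when it returns None.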

Probability mass of summing two discrete random variables, in linearithmic time

Given two discrete random variables, their (arbitrary) probability mass functions a and b and a natural-number N such that both of the variables have the domain [0..N] (therefore the functions can be represented as arrays), the probability that the functions' corresponding random variables have a given sum (i.e. P(A+B==target)) can be computed, in O(N) time, by treating the arrays as vectors and using their dot product, albeit with one of the inputs reversed and both inputs re-sliced in order to align them and eliminate bounds errors; thus each position i of a is matched with a position j of b such that i+j==target. Such an algorithm looks something like this:
-- same runtime as dotProduct and sum; other components are O(1)
P :: Vector Int -> Vector Int -> Int -> Ratio Int
P a b target | length a /= length b = undefined
             | 0 <= target && target <= 2 * length a
                 = (dotProduct (shift target a) (reverse (shift target b)))
                   %
                   (sum a * sum b) -- O(length a)
                   -- == sum $ map (\x -> (a!x)*(b!(target-x))) [0..length a]
             | otherwise = 0
  where
    -- O(1)
    shift t v = slice start' len' v
      where start = t - length v - 1
            len = length v - abs start
            -- unlike `drop ... $ take ... v`,
            -- slice does not simply `id` when given out-of-bounds indices
            start' = min (V.length v) (max 0 start)
            len' = min (V.length v) (max 0 len)
    -- usual linear-algebra definition
    -- O(length a); length inequality already guarded-away by caller
    dotProduct a b = sum $ zipWith (*) a b
Given the same information, one might treat the variables' sum as its own discrete random variable, albeit one whose probability mass function is unknown. Evaluating the entirety of this probability mass function (and thereby producing the array that corresponds thereto) can be done in O(N²) time by performing N dot-products, with each product having its operands differently-shifted; i.e.:
pm :: Vector Int -> Vector Int -> Vector (Ratio Int)
pm a b = map (P a b) $ generate (2 * length a + 1) id
I am told, however, that producing such a table of values of this probability mass function can actually be done in O(N*log(N)) time. As far as I can tell, no two of the multiplications across all of the involved dot-products share the same ordered pair of indices, and I do not think that I can, e.g., combine two dot-subproducts in any useful way to form a T(n)=2T(n/2)+O(n)-type recursion; therefore I am curious as to how and why, exactly, such a runtime is possible.
In a nutshell, you have a transformation F (called the discrete Fourier transform) that maps the set of vectors of size N onto itself and such that
F(a*b) = F(a).F(b)
where * is the convolution operator you just described and . is the element-wise (pointwise) product of two vectors, not the dot product that yields a scalar.
Moreover, F is invertible, and you can therefore recover a*b as
a*b = F^{-1}(F(a).F(b))
Now this is all very nice, but the key point is that F (and F^{-1}) can be computed in O(N log(N)) time using something called the Fast Fourier Transform (FFT). Because the element-wise product . can be computed in O(N), you obtain an O(N log(N)) algorithm for computing the convolution of two distributions. (The inputs are zero-padded to length at least 2N+1 so that the sums do not wrap around.)
I therefore suggest you look up this and that.
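As an illustration of the identity above, here is a small self-contained Python sketch. It assumes the two mass functions are given as lists of probabilities (already normalised, unlike the integer counts in the question), and the names fft, pmf_of_sum and pmf_of_sum_direct are invented for this example. The FFT is a textbook recursive radix-2 version; a real program would more likely call a library routine.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def pmf_of_sum(a, b):
    """PMF of A + B in O(N log N): transform, multiply element-wise, invert."""
    out_len = len(a) + len(b) - 1
    size = 1
    while size < out_len:          # zero-pad so the convolution does not wrap
        size *= 2
    fa = fft(list(a) + [0.0] * (size - len(a)))
    fb = fft(list(b) + [0.0] * (size - len(b)))
    product = [x * y for x, y in zip(fa, fb)]
    result = fft(product, inverse=True)
    return [value.real / size for value in result[:out_len]]

def pmf_of_sum_direct(a, b):
    """Reference O(N^2) computation, for comparison."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, pa in enumerate(a):
        for j, pb in enumerate(b):
            out[i + j] += pa * pb
    return out
```

Both functions should agree up to floating-point rounding; for a = [0.5, 0.5] and b = [0.25, 0.75] the sum has mass 0.125, 0.5 and 0.375 on the values 0, 1 and 2.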

Generating random number in sorted order

I want to generate random numbers in sorted order.
I wrote below code:
void CreateSortedNode(pNode head)
{
    int size = 10, last = 0;
    pNode temp;
    while (size-- > 0) {
        temp = (pNode)malloc(sizeof(struct node));
        last += (rand() % 10);
        temp->data = last; //randomly generate number in sorted order
        list_add(temp);
    }
}
[EDIT:]
I expect the numbers to be generated in increasing or decreasing order, e.g. {2, 5, 9, 23, 45, 68}:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int size = 10, last = 0;
    while (size-- > 0) {
        last += (rand() % 10); /* add a non-negative random step */
        printf("%4d", last);
    }
    return 0;
}
Any better idea?
Solved back in 1979 (by Bentley and Saxe at Carnegie-Mellon):
https://apps.dtic.mil/dtic/tr/fulltext/u2/a066739.pdf
The solution is ridiculously compact in terms of code too!
Their paper uses Pascal; I converted the code to Python, and it should port easily to any language:
from random import random
cur_max=100 #desired maximum random number
n=100 #size of the array to fill
x=[0]*(n) #generate an array x of size n
for i in range(n,0,-1):
    cur_max=cur_max*random()**(1/i) #the magic formula
    x[i-1]=cur_max
print(x) #the results
Enjoy your sorted random numbers...
Without any information about sample size or sample universe, it's not easy to know if the following is interesting but irrelevant or a solution, but since it is in any case interesting, here goes.
The problem:
In O(1) space, produce an unbiased ordered random sample of size n from an ordered set S of size N: <S1,S2,…SN>, such that the elements in the sample are in the same order as the elements in the ordered set.
The solution:
1. With probability n/|S|, add S1 to the sample and decrement n.
2. Remove S1 from S.
3. Repeat steps 1 and 2, each time with the new first element (and size) of S, until n is 0, at which point the sample will have the desired number of elements.
The solution in python:
from random import randrange
# select n random integers in order from range(N)
def sample(n, N):
    # insist that 0 <= n <= N
    for i in range(N):
        if randrange(N - i) < n:
            yield i
            n -= 1
            if n <= 0:
                break
The problem with the solution:
It takes O(N) time. We'd really like to take O(n) time, since n is likely to be much smaller than N. On the other hand, we'd like to retain the O(1) space, in case n is also quite large.
A better solution (outline only)
(The following is adapted from a 1987 paper by Jeffrey Scott Vitter, "An Efficient Algorithm for Sequential Random Sampling". See Dr. Vitter's publications page. Please read the paper for the details.)
Instead of incrementing i and selecting a random number, as in the above python code, it would be cool if we could generate a random number according to some distribution which would be the number of times that i will be incremented without any element being yielded. All we need is the distribution (which will obviously depend on the current values of n and N.)
Of course, we can derive the distribution precisely from an examination of the algorithm. That doesn't help much, though, because the resulting formula requires a lot of time to compute accurately, and the end result is still O(N).
However, we don't always have to compute it accurately. Suppose we have some easily computable, reasonably good approximation which consistently underestimates the probabilities (with the consequence that it will sometimes not make a prediction). If that approximation works, we can use it; if not, we'll need to fall back to the accurate computation. If that happens sufficiently rarely, we might be able to achieve O(n) on average. And indeed, Dr. Vitter's paper shows how to do this. (With code.)
Suppose you wanted to generate just three random numbers, x, y, and z so that they are in sorted order x <= y <= z. You will place these in some C++ container, which I'll just denote as a list like D = [x, y, z], so we can also say that x is component 0 of D, or D_0 and so on.
Consider any sequential algorithm that first draws a random value for x. Say it comes up with 2.5; this tells us something about what y has to be, namely y >= 2.5.
So, conditional on the value of x, your desired random number algorithm has to satisfy the property that p(y >= x | x) = 1. If the distribution you are drawing from is anything like a common distribution, such as uniform or Gaussian, then it's clear that usually p(y >= x) would be some other expression involving the density for that distribution. (In fact, only a pathological distribution like a Dirac delta at "infinity" could be independent, and it would be nonsense for your application.)
So what we can speculate with great confidence is that p(y >= t | x) for various values of t is not equal to p(y >= t). That's the definition for dependent random variables. So now you know that the random variable y (second in your eventual list) is not statistically independent of x.
Another way to state it is that in your output data D, the components of D are not statistically independent observations. And in fact they must be positively correlated since if we learn that x is bigger than we thought, we also automatically learn that y is bigger than or equal to what we thought.
In this sense, a sequential algorithm that provides this kind of output is an example of a Markov Chain. The probability distribution of a given number in the sequence is conditionally dependent on the previous number.
If you really want a Markov Chain like that (I suspect that you don't), then you could instead draw a first number at random (for x) and then draw positive deltas, which you will add to each successive number, like this:
Draw a value for x, say 2.5
Draw a strictly positive value for y-x, say 13.7, so y is 2.5 + 13.7 = 16.2
Draw a strictly positive value for z-y, say 0.001, so z is 16.201
and so on...
You just have to acknowledge that the components of your result are not statistically independent, and so you cannot use them in an application that relies on statistical independence assumptions.
