Parallel algorithms O(log p)

Parallel algorithms O(log p) - theory

First off this isn't for any homework question, it's just on a general type of algorithm. In a parallel computing course I'm taking I'm having trouble wrapping my head around a style of algorithm that has runtime O( something + ... log p). For example we've looked at sequence reduction algorithms that are O(n/p + log p) where p = #procs and n is problem size. Log base 2.
The problem I have is the idea of log(p). For one I'm used to seeing log(n) everywhere in reducing problems to two subproblems of size n/2 etc. The second is just the idea of having the step complexity of an algorithm as log(p). Because that would imply that for a problem of fixed size if I increase the number of processors then I am increasing the number of steps in the algorithm? I have always thought of the step complexity of an algorithm as the sort of inherent sequential aspect of the algorithm and hence increasing or decreasing the number of processors shouldn't have any effect on this. Is this a bad way to think of it?
I guess what would be helpful is some pseudocode of algorithms that have log(p) running time somewhere in them.

Consider computing the sum of n numbers. Each processor can be assigned n/p numbers, but how do you add up the results from the individual processors? You could pass all p results to one processor, for a runtime O(n/p+p), but you can combine the sums faster in a tree-like fashion.
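The tree-like combination can be sketched as a sequential simulation. This is only an illustration: the names tree_reduce and parallel_sum are made up here, the "processors" are just list slots, and a real implementation would use message passing or shared memory. The point is that the number of combining rounds is about log2(p), because the number of active partial sums halves each round.

```python
def tree_reduce(partial_sums):
    """Combine one partial sum per (simulated) processor in a tree.

    Returns (total, rounds). In each round, slot i absorbs the value held
    by slot i + stride. The additions within a round are independent, so
    on real hardware they could all happen at the same time.
    """
    values = list(partial_sums)
    p = len(values)  # assumed to be at least 1
    rounds = 0
    stride = 1
    while stride < p:
        for i in range(0, p - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds


def parallel_sum(data, p):
    """Simulate summing `data` on p processors: local sums, then a tree."""
    chunk = (len(data) + p - 1) // p  # ceil(n / p) numbers per processor
    partial = [sum(data[k * chunk:(k + 1) * chunk]) for k in range(p)]
    return tree_reduce(partial)
```

With 100 numbers and p = 8, each simulated processor adds at most 13 numbers locally (the n/p term), and the partial sums are then combined in 3 rounds (the log p term).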

I think that O(n/p + log(p)) does make sense: as long as p is small compared to n, n/p + log(p) decreases as p increases, so the running time goes down as you add processors. A running time of just log(p), on the other hand, would not be natural, because it would increase with the number of processors.
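To get a feel for how the bound behaves for a fixed n, one can simply evaluate the expression. The sketch below treats n/p + log2(p) as if it were an exact step count, which is an assumption: the real O() bound hides constant factors, and model_time is a made-up name.

```python
import math


def model_time(n, p):
    """Idealized step count: n/p local additions plus log2(p) combine rounds."""
    return n / p + math.log2(p)


if __name__ == "__main__":
    n = 1024
    for p in (1, 2, 4, 64, 512, 1024, 2048):
        print(p, model_time(n, p))
```

Under this model and n = 1024, the value drops from 1024 steps at p = 1 to 11 steps at p = 512 or p = 1024, and starts to creep up again at p = 2048 (11.5 steps), where the log(p) term outweighs the shrinking n/p term.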

Related

Is O(cn) at least as fast as O(n) in a non asymptotically way?

So first of all let me talk about the motivation for this question. Let's suppose you have to find the minimum and the maximum values in an array. In this case, you have two ways of doing so.
The first one consists in iterating over the array and finding the maximum value, then doing the same thing to find the minimum value. This solution is O(2n).
The second one consists in iterating over the array just one time and finding both the minimum and maximum value at the same time. This solution is O(n).
Even though the time complexity has been halved, for each iteration of the O(n) solution you now have twice as many instructions (ignoring how the compiler can possibly optimize these instructions), so I believe they should take the same amount of time to execute.
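The two min/max strategies can be sketched side by side with a comparison counter. The function names are made up for this illustration, and counting comparisons is only a rough stand-in for actual running time.

```python
def min_max_two_passes(values):
    """Two separate passes; returns (minimum, maximum, comparisons)."""
    comparisons = 0
    lo = values[0]
    for v in values[1:]:
        comparisons += 1
        if v < lo:
            lo = v
    hi = values[0]
    for v in values[1:]:
        comparisons += 1
        if v > hi:
            hi = v
    return lo, hi, comparisons


def min_max_one_pass(values):
    """One pass doing both checks per element; same return shape."""
    comparisons = 0
    lo = hi = values[0]
    for v in values[1:]:
        comparisons += 2  # one check against lo, one against hi
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi, comparisons
```

Both versions perform 2(n-1) element comparisons on an input of length n, which matches the intuition that halving the number of iterations while doubling the work per iteration changes nothing in this count.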
Let me give you a second example. Now you need to reverse an array. Again, you have two ways of doing so.
The first one is to create an empty array, iterate over the data array filling the empty array. This solution is O(n).
The second one is to iterate over the data array, swapping the 0th and (n-1)th elements, then the 1st and (n-2)th elements and so on (using this strategy) until you reach the middle of the array. This solution is O((1/2)n).
Again, even though the time complexity has been cut in half, you have three times more instructions per iteration. You're iterating over (1/2)n elements, but for each iteration you have to perform three XOR instructions. If you were not to use XOR, but an auxiliary variable, you would still need 2 more instructions to perform the variable swapping, so now I believe that O((1/2)n) should actually be worse than O(n).
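A rough sketch of the two reversal strategies, counting element writes rather than machine instructions (the function names and the choice of "writes" as the cost measure are assumptions made for this illustration):

```python
def reverse_copy(data):
    """Fill a new list back to front; returns (reversed_list, writes)."""
    out = [None] * len(data)
    writes = 0
    for i, v in enumerate(data):
        out[len(data) - 1 - i] = v
        writes += 1
    return out, writes


def reverse_in_place(data):
    """Swap mirrored pairs up to the middle; returns (data, writes)."""
    writes = 0
    i, j = 0, len(data) - 1
    while i < j:
        data[i], data[j] = data[j], data[i]
        writes += 2  # each swap stores two elements
        i += 1
        j -= 1
    return data, writes
```

For an even-length input both versions store n elements: the in-place version loops half as often but does twice the stores per iteration.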
Having said these things, my question is the following:
Ignoring space complexity, garbage collection, and possible compiler optimizations: given O(c1*n) and O(c2*n) algorithms with c1 > c2, can I be sure that the algorithm that gives me O(c1*n) is as fast or faster than the one that gives me O(c2*n)?
This question is cool because it can make a difference in how I write code from here on. If the "more complex" (c1) way is as fast as the "less complex" (c2) but more readable, I'm sticking with the "more complex" one.

c1 > c2, can I be sure that the algorithm that gives me O(c1*n) is as fast or faster than the one that gives me O(c2*n)?
The whole issue lies within the words "fast" or "faster". Computational complexity doesn't strictly measure what we intuitively understand as "fast". Without going into mathematical details (although it's a good idea: https://en.wikipedia.org/wiki/Big_O_notation), it answers the question "how quickly does it slow down as my input grows". So if you have O(n^2) complexity you can roughly expect that doubling the size of the input will make your algorithm take 4 times as long, whereas for linear complexity, an input twice as big only doubles the time. As you can see, it's relative, so any constants cancel out.
To sum up: from the way you ask your question, it doesn't seem the big-O notation is the correct tool here.

By definition, if c1 and c2 are constants, O(c1*n) === O(c2*n) === O(n). That is, the number of operations per element of your array of length n is completely irrelevant in this kind of complexity analysis.
All that it will tell you is that "it's linear". That is, if you have 1 bazillion operations for an array of length n, then you'll have 2 bazillion operations for an array of length 2*n (plus or minus something that grows slower than linear).
can I assume that having O(c1*n) and O(c2*n) algorithms so that c1 > c2, can I be sure that the algorithm that gives me O(c1*n) is as fast or faster than the one that gives me O(c2*n)?
Nope, not at all.
First, because the constants there are meaningless in that analysis. There's no other way to put it: whatever restrictions you put on c1 and c2 are absolutely irrelevant for big-O analysis. The whole idea is that it will discard those restrictions.
Second, because they don't tell you anything that would enable you to compare the two algorithms runtime for a specific value of n.
Such complexity analysis only enables you to compare the asymptotic behavior of algorithms. Real-world problems in general don't care about where the asymptotes are.
Assume that A1(n) is the number of operations Algorithm 1 needs for an input of length n, and A2(n) is the same for Algorithm 2. You could have:
A1(n) = 10n + 900
A2(n) = 100n
The complexity of both is O(A1) = O(A2) = O(n). For small inputs, A2 is faster. For large inputs, A1 is faster. The point where they change is n == 10.
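The crossover can be checked directly from the two operation counts. The helper first_n_where is a made-up name for this sketch; it just scans n upward until one cost function becomes strictly cheaper than the other.

```python
def a1(n):
    """Operation count of Algorithm 1 from the example above."""
    return 10 * n + 900


def a2(n):
    """Operation count of Algorithm 2 from the example above."""
    return 100 * n


def first_n_where(cheaper, other, limit):
    """Smallest n in 1..limit with cheaper(n) < other(n), or None."""
    for n in range(1, limit + 1):
        if cheaper(n) < other(n):
            return n
    return None
```

Here first_n_where(a1, a2, 1000) returns 11: the two counts are equal at n = 10 (1000 operations each), A2 is cheaper below that, and A1 is strictly cheaper from n = 11 on.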
This question is cool because it can make a difference in how I write code from here on. If the "more complex" (c1) way is as fast as the "less complex" (c2) but more readable, I'm sticking with the "more complex" one.
Not only that, but also there's the fact that when you have 2 different algorithms that are really of different complexity classes (e.g., linear vs quadratic), it might still make sense to use the one of higher complexity as it may still be faster.
For example:
A3(n) = n^2
A4(n) = n + 10^20.
E.g., Algorithm 3 is quadratic, while Algorithm 4 is linear but it has a constant huge initialization time.
For inputs of size of up to around n == 10^10, it will be faster to use the quadratic algorithm.
It may very well be the case that all relevant inputs for your specific problem fall within that range, meaning that the quadratic algorithm would be the better, faster choice.
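The size of that range can be located without scanning, using bisection on exact integers. This is a sketch under the assumption that A3 and A4 are exactly the formulas above; the function name and the search bound of 10^12 are arbitrary choices for the illustration.

```python
def a3(n):
    """Operation count of the quadratic Algorithm 3."""
    return n * n


def a4(n):
    """Operation count of the linear Algorithm 4 with huge setup cost."""
    return n + 10**20


def last_n_where_quadratic_wins(lo=1, hi=10**12):
    """Largest n in [lo, hi] with a3(n) <= a4(n).

    Bisection is valid because a3(n) - a4(n) = n*n - n - 10**20 is
    increasing for n >= 1, so the condition flips exactly once.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if a3(mid) <= a4(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo
```

The search returns 10**10, which agrees with the "around n == 10^10" estimate given in the answer.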
The bottom line is: for analyzing the actual time it will take to run an algorithm on a given input (or a given bounded range of inputs, as nearly all real-world problems are) and compare it with another algorithm, big-O analysis is meaningless.
Another way to put it: you're asking a practical "engineering" question (i.e., which option is better / faster) but trying to answer the question with a tool that's only useful for "theoretical" analysis. That tool is important, yes. But it has no chance of giving you the answer you're looking for, by design.

By definition, time complexity ignores constants. So O((1/2)n) == O(n) == O(2n) == O(cn).
Your example of O((1/2)n) shows why this is the case, because the constants can measure units of anything, so comparing them is meaningless.
You can never tell which algorithm is faster based only on the time complexity. But, you can tell which one would be faster as n approaches infinity. Since constants are removed from the time complexity, they would be considered equal and therefore with O(c1n) and O(c2n) you still would not be able to tell which one is faster even as n approaches infinity.

(my theoretical computer science courses are a couple of decades ago)
O(cn) is O(n).
It's still a linear search over the array.

Temporal complexity of primary instructions in C

I have a question about algorithmic complexity.
Do the basic instructions in C have an equivalent complexity, if not, in what order are they:
if, write/read a single cell of a matrix, a+b, a*b, a = b ...
Thanks

No. The basic instructions in C cannot be ordered by any kind of wall-time or theoretic complexity. This is not specified and probably cannot be specified by the Standard; rather, these properties arise from the interaction of the code, the OS, and the underlying architecture.
I think you're looking for information on cycles per instruction.
However, even this is not the whole story. Modern CPUs have hierarchical caches. If your algorithm operates on data which is primarily in a fast cache, then it will run much faster than a program which operates on data that must be repeatedly accessed from RAM, the hard drive, or over a network. The amount of calculation done per load is an application's arithmetic intensity. Roofline models provide a tool for thinking about this. You can achieve better cache utilization via blocking and other techniques; the subfield of communication-avoiding algorithms explores this in depth.
Ultimately, the C language is a high-level abstraction of what a processor actually does. In standard cost models we think of all instructions as taking the same amount of time. In more accurate, but potentially more difficult to use, cache-aware cost models, data movement is treated as being more expensive.

Complexity is not about the time it takes to execute "basic" code lines like addition, multiplication, division and so on.
Even if these expressions have different execution time they all have complexity O(1).
Complexity is about what happens when some variable figure changes. That variable figure can be many different things. Some examples could be "the number of elements in an array", "the number of elements in a linked list", "the size of a file", "the size of a matrix".
For instance - if you write code that has to find the largest value in an array of integers, the execution time depends on the number of elements in the array. The code will have to visit every array element to check if it's larger than the previous elements. Consequently, the complexity is O(N), where N is the number of elements. From that we can't say how much time it will take to find the largest element, but we can say that it will take roughly 10 times longer to execute on a 1000 element array than on a 100 element array.
Now if you did the same with a linked list (i.e. find largest element) the complexity would again be O(N). However, this does not say that a linked list performs just the same as an array. It only says that it scales in the same way as an array.
A simplified way to say it - if there are no loops involved, the complexity is always O(1).
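A small sketch of both cases, counting how many elements get visited. The Node class, build_list, and the two function names are hypothetical helpers written for this illustration.

```python
class Node:
    """Singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def build_list(values):
    """Build a linked list holding `values` in order; returns the head."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def largest_in_array(values):
    """Return (largest, elements_visited) for a non-empty array."""
    largest, visited = values[0], 1
    for v in values[1:]:
        visited += 1
        if v > largest:
            largest = v
    return largest, visited


def largest_in_list(head):
    """Return (largest, nodes_visited) for a non-empty linked list."""
    largest, visited = head.value, 1
    node = head.next_node
    while node is not None:
        visited += 1
        if node.value > largest:
            largest = node.value
        node = node.next_node
    return largest, visited
```

Both functions visit every element exactly once, so the visit count grows linearly with N in either case. The count says nothing about how long a single visit takes, which is where an array and a linked list can differ.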

Algorithm Complexity vs Running Time

I have an algorithm used for signal quantization. For the algorithm I have an equation to calculate its complexity with different values of parameters. This algorithm is implemented in C. Sometimes according to the equation I have less complexity but the running time is higher. I'm not 100% sure about the equation.
My question is: do running time and algorithm complexity always have a direct relationship? That is, does higher complexity always mean a higher running time? Or does it differ from one algorithm to another?

Time complexity is more a measure of how time varies with input size than an absolute measure.
(This is an extreme simplification, but it will do for explaining the phenomenon you're seeing.)
If n is your problem size and your actual running time is 1000000000 * n, it has linear complexity, while 0.000000001*n^2 would be quadratic.
If you plot them against each other, you'll see that 0.000000001*n^2 is smaller than 1000000000 * n all the way up to around n = 1e18, despite its "greater complexity".
(0.000000001*n^2 + 1000000000 * n would also be quadratic, but always have worse execution time than both.)
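The crossover quoted above follows from a one-line calculation, assuming the two running times are exactly the expressions given and n > 0:

```latex
10^{-9}\, n^{2} < 10^{9}\, n
\;\Longleftrightarrow\;
n < \frac{10^{9}}{10^{-9}} = 10^{18}
```

So the quadratic expression is the smaller one for every n below 10^18, and the linear one only wins beyond that.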

No, running time and algorithmic complexity do not have a simple relationship.
Estimating or comparing run times can easily get very complicated and detailed. There are many variables that vary even with the same program and input data - that's why benchmarks do multiple runs and process them statistically.
If you're looking for big differences, generally the two most significant factors are algorithmic complexity ("big O()") and start up time. Frequently, the lower "big O()" algorithm requires more complex startup; that is, it takes more initial setup in the program before entering the actual loop. If it takes longer to do that initial setup than run the rest of the algorithm for small data sets, the larger O() rated algorithm will run faster for those small data sets. For large data sets, the lower O() algorithm will be faster. There will be a data set size where the total time is equal, called the "crossover" size.
For performance, you'd want to check if most of your data was above or below that crossover as part of picking the algorithm to implement.
Getting more and more detail and accuracy in runtime predictions gets much more complex very quickly.

Searching missing number - simple example

A little task on searching algorithms and complexity in C. I just want to make sure I'm right.
I have n natural numbers from 1 to n+1 ordered from small to big, and I need to find the missing one.
For example: 1 2 3 5 6 7 8 9 10 11 - ans: 4
The fastest and simplest answer is to do one loop and check every number against the number that comes after it. The complexity of that is O(n) in the worst case.
I thought maybe I'm missing something and I can find it using binary search. Can anybody think of a more efficient algorithm for this simple example?
Like O(log(n)) or something?

There's obviously two answers:
If your problem is a purely theoretical problem, especially for large n, you'd do something like a binary search: check whether the element halfway between the current lower and upper bounds has the value you'd expect there if no number were missing below it, then continue in the half that contains the gap.
However, if this is a practical question, for modern systems executing programs written in C and compiled by a modern, highly optimizing compiler for n << 10000, I'd assume that the linear search approach is much, much faster, simply because it can be vectorized so easily. In fact, modern CPUs have instructions to take e.g. 4 integers at once, subtract four other integers, compare the result to [4 4 4 4], increment the counter by 4, load the next 4 integers, and so on. This fits very neatly with the fact that CPUs and memory controllers prefetch linear memory, whereas jumping around in logarithmically descending step sizes can have an enormous performance impact.
So: For large n, where linear search would be impractical, go for the binary search approach; for n where that is questionable, go for the linear search. If you not only have SIMD capabilities but also multiple cores, you will want to split your problem. If your problem is not actually exactly 1 missing number, you might want to use a completely different approach ... The whole O(n) business is generally more of a benchmark usable purely for theoretical constructs, and unless the difference is immensely large, is rarely the sole reason to pick a specific algorithm in a real-world implementation.
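A minimal sketch of the binary search variant, assuming the input is exactly as described in the question (n distinct integers from 1..n+1, sorted, exactly one missing). The function name is made up, and 0-based indexing is used, so an undisturbed prefix satisfies values[i] == i + 1.

```python
def find_missing(sorted_values):
    """Return the one number from 1..n+1 absent from sorted_values."""
    lo, hi = 0, len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == mid + 1:
            lo = mid + 1  # everything up to mid is intact; gap is to the right
        else:
            hi = mid      # values are already shifted here; gap is at or before mid
    # lo is now the first index whose value is shifted by one (or n if the
    # missing number is n+1), so the missing value is lo + 1.
    return lo + 1
```

On the example from the question, find_missing([1, 2, 3, 5, 6, 7, 8, 9, 10, 11]) returns 4 after inspecting four elements instead of scanning all ten.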

For a comparison-based algorithm, you can't beat Lg(N) comparisons in the worst case. This is simply because the answer is a number between 1 and N and it takes Lg(N) bits of information to represent such a number. (And a comparison gives you a single bit.)
Unless the distribution of the answers is very skewed, you can't do much better than Lg(N) on average.
Now I don't see how a non-comparison-based method could exploit the fact that the sequence is ordered, and do better than O(N).

What is the running time of comb sort?

According to me, comb sort should also run in subquadratic time, just like shell sort. This is because comb sort is to bubble sort what shell sort is to insertion sort. Shell sort sorts the array according to gap sequences applying insertion sort, and similarly comb sort sorts the array according to gap sequences applying bubble sort. So what is the running time of comb sort?

(This question has been unanswered for a while, so I'm converting my comment into an answer.)
Although there are similarities between shell sort and comb sort, the average-case runtime of comb sort is O(n^2). Proving this is a bit tricky, and the technique that I've seen used to prove it is the incompressibility method, an information-theoretic technique involving Kolmogorov complexity.
Hope this helps!

With what sequence of increments?
If the increments are chosen to be: the set of all numbers of the form (2^p * 3^q), that are less than N, then, yes, the running time is better than quadratic (it's proportional to N times the square of the logarithm of N). With that set of increments, Combsort performs exactly the same exchanges as a Shellsort using the same increments (the "Pratt sequence"). But that's not what people usually have in mind when they're talking about Combsort.
In theory...
With increments that are decreasing geometrically (e.g. on each pass over the input the increment is, say, about 80% of the previous increment), which is what people usually mean when they talk about Combsort... yes, asymptotically, it is quadratic in both the worst-case and the average case. But...
In practice...
So long as the increments are relatively prime and the ratio between one increment and the next is sensible (80% is fine), n has to be astronomically large before the average running time will be much more than n.log(n). I've sorted hundreds of millions of records at a time with Combsort, and I've only ever seen quadratic running times when I've deliberately engineered them by constructing "killer inputs". In practice, with relatively prime increments (and a ratio between adjacent increments of 1.25:1), even for millions of records, Combsort requires, on average, about 3 times as many comparisons as a mergesort and typically takes between 2 and 3 times as long to run.
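For reference, a plain Combsort with geometrically shrinking increments can be sketched as follows. This is a textbook-style version, not the implementation described above: it uses the 1.25:1 ratio mentioned in the answer but makes no attempt to keep the increments relatively prime, and the comparison counter is added only for experimentation.

```python
def comb_sort(items, shrink=1.25):
    """Return (sorted_copy, comparisons) using Combsort.

    The gap starts near len(items) and is divided by `shrink` on each
    pass. Once it reaches 1, passes continue (as in bubble sort) until
    a full pass makes no swap.
    """
    data = list(items)
    gap = len(data)
    comparisons = 0
    swapped = True
    while gap > 1 or swapped:
        gap = max(1, int(gap / shrink))
        swapped = False
        for i in range(len(data) - gap):
            comparisons += 1
            if data[i] > data[i + gap]:
                data[i], data[i + gap] = data[i + gap], data[i]
                swapped = True
    return data, comparisons
```

Running it on inputs of increasing size and plotting the comparison count against n.log(n) is one way to look for the behaviour described above, though results from this simplified gap sequence may differ from those of a tuned implementation.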