Complexity for nested for loops [closed] - c

I want to find the complexity of the code below.
I used this code to find the second highest element in the array using sorting.
for (i = 0; i < 2; i++)
{
    for (j = 0; j < n; j++)
    {
        // some code
    }
}
Is the complexity O(2n) or O(n^2)?

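It is a very vast topic. I am just making an effort to explain it to you; for the rest, refer to some good books. My recommendation is Cormen (Introduction to Algorithms).
Complexity:
The basic structure of a for loop is
for(initialization;condition;updation)
The update step changes the loop variable, so the loop keeps iterating until the condition fails.
When the inner loop's bound depends on the outer index (the inner loop runs 1, 2, ..., n times), the total count is
n*(n+1)/2
which is O(n^2). In your two for loop case, however, the outer loop runs a fixed 2 times and the inner loop runs n times, so the body executes 2n times, which is O(n).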
Estimation of Complexity:
Sometimes it is not easy to get a formula for the complexity of an algorithm. In such cases it may be possible to estimate it by experiment. Counting-variables can be added to the program, incremented when some critical operation is carried out and the final totals printed. The running time can also be measured, either by a stop-watch or better by calling a routine to print the computer system's clock. The complexity might be inferred by examining how such measures vary with the problem size.
The accuracy of timing a program or an operation can be improved by timing a number of executions, perhaps in a loop, and dividing the total time taken by that number. A time-shared computer is used by many people simultaneously. The elapsed time taken by a program depends on the system load. Therefore any timing done on a shared machine must be based on the central processor time devoted to the particular program under study and not on the elapsed time.
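A minimal Python sketch of the timing advice above: it repeats a call several times and divides the total CPU time by the repeat count. `time.process_time` is the standard-library CPU clock; the helper name `cpu_time_per_call` and the workload `linear_work` are illustrative assumptions, not part of the original answer.

```python
import time

def cpu_time_per_call(func, arg, repeats=100):
    """Average CPU seconds per call of func(arg), measured over `repeats` calls."""
    start = time.process_time()  # CPU time of this process, not elapsed wall-clock time
    for _ in range(repeats):
        func(arg)
    return (time.process_time() - start) / repeats

def linear_work(n):
    """Hypothetical workload that performs n basic steps."""
    total = 0
    for k in range(n):
        total += k
    return total

def timing_table(sizes, repeats=100):
    """Pairs (n, average CPU seconds) for each problem size in `sizes`."""
    return [(n, cpu_time_per_call(linear_work, n, repeats)) for n in sizes]
```

Calling `timing_table([1000, 2000, 4000, 8000])` and checking whether the times roughly double along with n is one way to see linear growth; the absolute numbers depend on the machine.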
Examining differences between adjacent terms in a series can indicate the form of the underlying function that defines the series. A linear function, T(n)=a*n+b gives rise to constant difference between T(n) and T(n-1):
D1(n) = T(n)-T(n-1) = a*n+b-a*(n-1)-b = a
A quadratic function T(n)=a*n^2+b*n+c gives rise to linear first-order differences:
D1(n) = T(n)-T(n-1) = a*n^2+b*n+c-a*(n-1)^2-b*(n-1)-c = 2a*n-a+b
which gives rise to constant second-order differences D2(n) = D1(n)-D1(n-1). In general, a polynomial of degree d is revealed by constant dth-order differences.
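The difference test can be sketched in a few lines of Python. This assumes exact operation counts (integers) for consecutive problem sizes; measured timings are noisy and would need a tolerance instead of exact equality. The function names are illustrative.

```python
def differences(values):
    """First-order differences D(n) = T(n) - T(n-1) of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]

def polynomial_degree(values, max_order=10):
    """Smallest d whose d-th order differences are constant, or None if not found."""
    current = list(values)
    for order in range(max_order + 1):
        if len(current) < 2:
            return None  # too few terms left to judge
        if all(v == current[0] for v in current):
            return order
        current = differences(current)
    return None

# Counts for the question's loops (2n steps) and for a triangular loop (n(n+1)/2 steps).
linear_counts = [2 * n for n in range(1, 9)]
triangular_counts = [n * (n + 1) // 2 for n in range(1, 9)]
```

Here `polynomial_degree(linear_counts)` gives 1 and `polynomial_degree(triangular_counts)` gives 2, matching the constant first-order and second-order differences described above.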

The best way to know the solution is to draw a table:
Iteration | i   | j
----------+-----+-------
0         | 0   | 0
0         | 0   | 1
0         | 0   | 2
0         | ... | ...
0         | ... | ...
0         | ... | n - 1
1         | 1   | 0
1         | 1   | 1
1         | ... | ...
1         | ... | ...
1         | ... | n - 1
How many times is the inner body executed? That's the answer.
If you want to build an intuition, pick some n and run an example. Then choose another n and see what you get; from that you can conclude what the answer is.
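A small Python sketch of that experiment, mirroring the question's loops (outer loop fixed at 2, inner loop up to n) and counting how often the inner body runs. The function name is an illustrative assumption.

```python
def count_inner_executions(n):
    """Count executions of the inner body for the question's two nested loops."""
    count = 0
    for i in range(2):       # outer loop: always 2 iterations
        for j in range(n):   # inner loop: n iterations
            count += 1       # stands in for "some code"
    return count

# One row per problem size: (n, number of executions).
rows = [(n, count_inner_executions(n)) for n in (1, 10, 100, 1000)]
```

The counts come out as 2, 20, 200 and 2000: doubling n doubles the work, which is the signature of linear growth.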

If "some code" takes O(1) time, then the complexity of this code is O(2n), which is the same as O(n).
That's because the inner loop is O(n) and we run it 2 times, giving 2n steps.

Big-O notation gives order-of-magnitude estimates; constant factors do not affect the order of growth of an algorithm, so O(2n) = 2O(n) = O(n).
It is similar to saying 1000 >> 10 and 1000 >> 5: 1000 is much bigger than 10, and it is "much bigger" than 5 in just the same way, even though 10 is twice the value of 5.

Related

Questions about time and space complexity nuances

Let's say I have an algorithm, spiral(), that takes an integer n and returns an array containing the integers 1 to n^2 in a spiral pattern. i.e.,
spiral(n);
iterates over the integers 1 to n^2 and inserts them into a 2d array which it creates and returns. e.g.
spiral(3);
returns
[[1,2,3],
[8,9,4],
[7,6,5]]
Obviously, the time complexity is O(n^2), but what is the space complexity? I would say also O(n^2), as we're allocating that much space as a result of calling the function, but in most places, like LeetCode, for questions like this they say that the space complexity is O(1). Do necessary return values not count?
Another smaller question I have is about functions like spiral() above. Let's say there's a function like
returnmbyn(m,n);
which returns an array of size m*n, but uses no other memory, and just iterates through and inserts each value in the array. Do we consider this function to have a space complexity of O(m*n), or of O(1)?
Generally, when people say "space complexity" they mean auxiliary space. Strictly speaking, space complexity is the total space required for solving a problem, including the input and output memory. But it usually doesn't make sense to include the input and output spaces when analyzing an algorithm, and hence we talk about auxiliary space and call it the "space complexity" of that algorithm.
Let me give you an example. Suppose you are given this problem:
Given 0 <= i, j < n, find M[i][j], where M is a nxn matrix that looks like this:
| 1 1 1 1 ... 1 1 |
| 1 2 2 2 ... 2 2 |
| 1 2 3 3 ... 3 3 |
| 1 2 3 ⋱       ⋮ |
| ⋮ ⋮ ⋮    ⋱    ⋮ |
| 1 2 3 ....... n |
How would you do that? Of course you could easily write such a matrix to memory, which would take you O(n^2) time and space, since you need n^2 operations to fill the matrix and n^2 slots to hold it in memory.
The point is: do you really need to store a matrix to know what is inside the matrix? In this case, this matrix can be easily represented as a function:
f(i, j) = min(i, j) + 1
Thus, with O(1) time and space, you can tell what is in M[i][j] without storing the matrix itself.
The trick with these kinds of questions is to think about the function that gives you what is inside each slot of the matrix without having to actually fill it.
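The contrast can be sketched in Python as follows. `build_matrix` fills the whole matrix row by row (O(n^2) time and space), while `entry` answers a single lookup in O(1) with the closed form from the answer. Both function names are illustrative.

```python
def build_matrix(n):
    """Materialise the n x n matrix row by row: O(n^2) time and space."""
    # Row i starts 1, 2, ..., i and is then padded with the value i + 1.
    return [list(range(1, i + 1)) + [i + 1] * (n - i) for i in range(n)]

def entry(i, j):
    """Value of M[i][j] (0-based indices) in O(1) time and space."""
    return min(i, j) + 1

matrix = build_matrix(5)
```

For n = 5 the last row of `matrix` is `[1, 2, 3, 4, 5]`, and `entry(i, j)` agrees with `matrix[i][j]` for every cell without ever storing the matrix.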

Can we use binary search to find most frequently occurring integer in sorted array? [closed]

Problem:
Given a sorted array of integers find the most frequently occurring integer. If there are multiple integers that satisfy this condition, return any one of them.
My basic solution:
Scan through the array and keep track of how many times you've seen each integer. Since it's sorted, you know that once you see a different integer, you've gotten the frequency of the previous integer. Keep track of which integer had the highest frequency.
This is O(N) time, O(1) space solution.
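A minimal Python sketch of this linear scan, for reference. It returns the value together with its count; the function name is an illustrative assumption.

```python
def most_frequent_linear(a):
    """Most frequent value in a sorted list, as (value, count): O(N) time, O(1) space."""
    if not a:
        raise ValueError("empty array")
    best_value, best_count = a[0], 0
    current_value, current_count = a[0], 0
    for x in a:
        if x == current_value:
            current_count += 1                     # still inside the same run
        else:
            current_value, current_count = x, 1    # a new run starts here
        if current_count > best_count:
            best_value, best_count = current_value, current_count
    return best_value, best_count
```

When several values tie for the highest frequency, this version returns the first one encountered, which the problem statement allows.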
I am wondering if there's a more efficient algorithm that uses some form of binary search. It will still be O(N) time, but it should be faster for the average case.
Asymptotically (big-oh wise), you cannot use binary search to improve the worst case, for the reasons the other answers have presented. However, here are some ideas that may or may not help you in practice.
For each integer, binary search for its last occurrence. Once you find it, you know how many times it appears in the array, and can update your counts accordingly. Then, continue your search from the position you found.
This is advantageous if you have only a few elements that repeat a lot of times, for example:
1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Because you will only do 3 binary searches. If, however, you have many distinct elements:
1 2 3 4 5 6
Then you will do O(n) binary searches, resulting in O(n log n) complexity, so worse.
This gives you a better best case and a worse worst case than your initial algorithm.
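A sketch of this idea in Python, using the standard `bisect_right` to find the end of each run. The function name and the extra `searches` counter (returned only to show how many binary searches were needed) are illustrative assumptions.

```python
from bisect import bisect_right

def most_frequent_bisect(a):
    """Return (value, count, searches) for a sorted list, one binary search per distinct value."""
    if not a:
        raise ValueError("empty array")
    best_value, best_count = a[0], 0
    searches = 0
    start = 0
    while start < len(a):
        # First index past the run of a[start], searching only from `start` onwards.
        end = bisect_right(a, a[start], start)
        searches += 1
        if end - start > best_count:
            best_value, best_count = a[start], end - start
        start = end  # continue from the position just found
    return best_value, best_count, searches
```

On the first example above this performs 3 searches, and on `1 2 3 4 5 6` it performs 6, one per element.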
Can we do better? We could improve the worst case by finding the last occurrence of the number at position i like this: look at position 2i, then at 4i, and so on, as long as the values at those positions are the same as the value at i. When one is not, bisect between the last matching position and the first non-matching one, e.g. look at (i + 2i) / 2, and so on.
For example, consider the array:
i
1 2 3 4 5 6 7 ...
1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
We look at 2i = 2, it has the same value. We look at 4i = 4, same value. We look at 8i = 8, different value. We backtrack to (4 + 8) / 2 = 6. Different value. Backtrack to (4 + 6) / 2 = 5. Same value. Try (5 + 6) / 2 = 5, same value. We search no more, because our window has width 1, so we're done. Continue the search from position 6.
This should improve the best case, while keeping the worst case as fast as possible.
Asymptotically, nothing is improved. To see if it actually works better on average in practice, you'll have to test it.
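A Python sketch of the doubling search. It is a variant of the description above, not a literal transcription: it doubles the offset from the start of the current run (so it also works for runs that do not begin at position 1), then finishes with a bounded `bisect_right`. Indices are 0-based and the function names are illustrative.

```python
from bisect import bisect_right

def run_end_galloping(a, start):
    """Index one past the run of a[start]: doubling steps first, then a bounded bisect."""
    n = len(a)
    value = a[start]
    lo = start   # last position known to hold `value`
    step = 1
    while start + step < n and a[start + step] == value:
        lo = start + step
        step *= 2
    hi = min(start + step, n)  # first position known to differ, or the end of the list
    return bisect_right(a, value, lo, hi)

def most_frequent_galloping(a):
    """Most frequent value in a sorted list, as (value, count)."""
    if not a:
        raise ValueError("empty array")
    best_value, best_count = a[0], 0
    start = 0
    while start < len(a):
        end = run_end_galloping(a, start)
        if end - start > best_count:
            best_value, best_count = a[start], end - start
        start = end
    return best_value, best_count
```

A run of length k costs O(log k) probes this way, so an array of distinct elements costs O(1) per element rather than O(log n).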
Binary search, which eliminates half of the remaining candidates, probably wouldn't work. There are some techniques you could use to avoid reading every element in the array. Unless your array is extremely long or you're solving a problem for curiosity, the naive (linear scan) solution is probably good enough.
Here's why I think binary search wouldn't work: given only the value of the middle item of the array, you do not have enough information to eliminate the lower or upper half from the search.
However, we can scan the array in multiple passes, each time checking twice as many elements. When we find two elements that are the same, make one final pass. If no other elements were repeated, you've found the longest element run (without even knowing how many of that element are in the sorted list).
Otherwise, investigate the two (or more) longer sequences to determine which is longest.
Consider a sorted list.
Index  0 1 2 3 4 5 6 7 8 9 a b c d e f
List   1 2 3 3 3 3 3 3 3 4 5 5 6 6 6 7
Pass1  1 . . . . . . 3 . . . . . . . 7
Pass2  1 . . 3 . . . 3 . . . 5 . . . 7
Pass3  1 2 . 3 . x . 3 . 4 . 5 . 6 . 7
After pass 3, we know that the run of 3's must be at least 5, while the longest run of any other number is at most 3. Therefore, 3 is the most frequently occurring number in the list.
Using the right data structures and algorithms (use binary-tree-style indexing), you can avoid reading values more than once. You can also avoid reading the 3 (marked as an x in pass 3) since you already know its value.
This solution has running time O(n/k) which degrades to O(n) for k=1 for a list with n elements and a longest run of k elements. For small k, the naive solution will perform better due to simpler logic, data structures, and higher RAM cache hits.
If you need to determine the frequency of the most common number, it would take O((n/k) log k) as indicated by David to find the first and last position of the longest run of numbers using binary search on up to n/k groups of size k.
The worst case cannot be better than O(n) time. Consider the case where each element exists once, except for one element which exists twice. In order to find that element, you'd need to look at every element in the array until you find it. This is because knowing the value of any array element does not give you any information regarding the location of the duplicate element, until it's actually found. This is in contrast to binary search, where the value of an array element allows you to rule out many other elements.
No, in the worst case we have to scan at least n - 2 elements, but see
below for an algorithm that exploits inputs with many duplicates.
Consider an adversary that, for the first n - 3 distinct probes into the
n-element array, returns m for the value at index m. Now the algorithm
knows that the array looks like
1 2 3 ... i-1 ??? i+1 ... j-1 ??? j+1 ... k-1 ??? k+1 ... n-2 n-1 n.
Depending on what the ???s are, the sole correct answer could be j-1
or j+1, so the algorithm isn’t done yet.
This example involved an array where there were very few duplicates. In
fact, we can design an algorithm that, if the most frequent element
occurs k times out of n, uses O((n/k) log k) probes into the array. For
j from ceil(log2(n)) - 1 down to 0, examine the subarray consisting of
every (2**j)th element. Stop if we find a duplicate. The cost so far
is O(n/k). Now, for each element in the subarray, use binary search to
find its extent (O(n/k) searches in subarrays of size O(k), for a total
of O((n/k) log k)).
It can be shown that all algorithms have a worst case of Omega((n/k) log
k), making this one optimal in the worst case up to constant factors.
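A Python sketch of the sampling algorithm described above, under some simplifying assumptions of mine: strides are powers of two starting from the largest one below n, each sample's run is measured with `bisect_left`/`bisect_right` restricted to a window of 4 strides on either side (no run can be longer than that once the coarser stride showed no duplicate), and the function name is illustrative.

```python
from bisect import bisect_left, bisect_right

def most_frequent_sampled(a):
    """Return (value, count) of a longest run in the sorted list `a`."""
    n = len(a)
    if n == 0:
        raise ValueError("empty array")
    # Largest power-of-two stride strictly below n (1 for very short lists).
    stride = 1
    while stride * 2 < n:
        stride *= 2
    # Halve the stride until two neighbouring samples are equal, or stride reaches 1.
    while stride > 1:
        samples = a[::stride]
        if any(x == y for x, y in zip(samples, samples[1:])):
            break
        stride //= 2
    # The coarser stride had no duplicate, so every run is shorter than 4 * stride,
    # and the longest run is long enough to contain a sample at the current stride.
    window = 4 * stride
    best_value, best_count = a[0], 0
    for p in range(0, n, stride):
        if p > 0 and a[p] == a[p - stride]:
            continue  # same run as the previous sample, already measured
        lo = bisect_left(a, a[p], max(0, p - window), p + 1)
        hi = bisect_right(a, a[p], p, min(n, p + window + 1))
        if hi - lo > best_count:
            best_value, best_count = a[p], hi - lo
    return best_value, best_count
```

On the 16-element list `1 2 3 3 3 3 3 3 3 4 5 5 6 6 6 7` this stops at stride 4 and reports the value 3 with a count of 7.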

what is convergence in k Means?

I have a very small question related to unsupervised learning, because my teacher has not used this word in any lecture; I came across it while reading tutorials. Does it mean that if the cluster values in the last iteration are the same as the initial values, the algorithm is said to have converged? For example:
       | c1    | c2    | cluster
       | (1,0) | (2,1) |
-------|-------|-------|-------------------
A(1,0) | ..    | ..    | get smallest value
B(0,1) | ..    | ..    |
C(2,1) | ..    | ..    |
D(2,1) | ..    | ..    |
Now, after performing n iterations, if the values of c1 and c2 come out the same, that is (1,0) and (2,1), in the n-th iteration (taking the average when a cluster holds more than a single point), is that convergence?
Ideally, if the values in the last two consecutive iterations are the same, the algorithm is said to have converged. But people often use a less strict criterion for convergence, e.g. that the difference between the values of the last two iterations is less than a particular threshold.
In the case of k-means clustering, convergence means the algorithm has finished grouping the data points into k clusters. The algorithm is considered to have settled on its final clusters once the centroids (the k mean values) stay at the same points for two consecutive iterations.
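A minimal pure-Python sketch of this stopping rule, assuming 2-D points, squared Euclidean distance for the assignment step, and a threshold `tol` on the largest centroid movement (`tol=0.0` is the strict "centroids did not change" criterion). The function name and parameters are illustrative; the data are the four points and two starting centroids from the question.

```python
import math

def kmeans(points, centroids, tol=0.0, max_iter=100):
    """Run k-means until no centroid moves by more than `tol`.

    Returns (centroids, iterations_used).
    """
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: each centroid becomes the average of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # empty cluster: keep the old centroid
        # Convergence test: largest distance any centroid moved in this iteration.
        shift = max(math.dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            return centroids, iteration
    return centroids, max_iter

points = [(1, 0), (0, 1), (2, 1), (2, 1)]              # A, B, C, D
final_centroids, iterations = kmeans(points, [(1, 0), (2, 1)])
```

With these inputs the first iteration moves c1 from (1,0) to (0.5,0.5), the second iteration changes nothing, and the run stops there. Note that the converged centroids need not equal the initial ones; what matters is that they stop changing between iterations.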

Stata: Observation-pairwise calculation

input X group
21 1
62 1
98 1
12 2
87 2
end
Now I try to calculate a measure as follows:
$$ \sum_{g} \left | X_{ig}-X_{jg} \right | $$
where $i$ and $j$ ($i \neq j$) index observations and $g$ corresponds to the group variable (here, 1 and 2).
How can I calculate this number using loops?
Looks like a Gini mean difference, apart from a scaling factor. There are numerous user-written commands already in this territory. There is (unusually) a summary within the Stata manual at [R] inequality.
In addition, this is related to the second L-moment. See the lmoments command from SSC.
You need not calculate this through a double loop over indexes. It collapses to a linear combination of the order statistics.
LATER: See David's 1998 paper which is open-access at
https://doi.org/10.1214/ss/1028905831
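To illustrate the point about order statistics, here is a Python sketch (not Stata) that computes the measure both ways for the example data. It assumes the sum runs over unordered pairs i < j within each group; if ordered pairs are intended, the result simply doubles. Function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_abs_sum_naive(xs):
    """Sum of |x_i - x_j| over unordered pairs, via the double loop: O(m^2)."""
    return sum(abs(a - b) for a, b in combinations(xs, 2))

def pairwise_abs_sum_sorted(xs):
    """Same sum as a linear combination of order statistics: O(m log m)."""
    s = sorted(xs)
    m = len(s)
    # The k-th smallest value (0-based) is the larger element in k pairs and
    # the smaller one in m - 1 - k pairs, so its weight is 2k - m + 1.
    return sum((2 * k - m + 1) * x for k, x in enumerate(s))

data = [(21, 1), (62, 1), (98, 1), (12, 2), (87, 2)]   # (X, group)
groups = defaultdict(list)
for x, g in data:
    groups[g].append(x)

total = sum(pairwise_abs_sum_sorted(values) for values in groups.values())
```

For the example data, group 1 contributes 41 + 77 + 36 = 154 and group 2 contributes 75, so `total` is 229 under the unordered-pairs assumption.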

Find all possible row-wise sums in a 2D array

Ideally I'm looking for a C# solution, but any help on the algorithm will do.
I have a 2-dimension array (x,y). The max columns (max x) varies between 2 and 10 but can be determined before the array is actually populated. Max rows (y) is fixed at 5, but each column can have a varying number of values, something like:
1 2 3 4 5 6 7...10
A 1 1 7 9 1 1
B 2 2 5 2 2
C 3 3
D 4
E 5
I need to come up with the total of all possible row-wise sums for the purpose of looking for a specific total. That is, a row-wise total could be the cells A1 + B2 + A3 + B5 + D6 + A7 (any combination of one value from each column).
This process will be repeated several hundred times with different cell values each time, so I'm looking for a somewhat elegant solution (better than what I've been able to come up with). Thanks for your help.
The Problem Size
Let's first consider the worst case:
You have 10 columns and 5 (full) rows per column. It should be clear that you will be able to get (with the appropriate number population for each place) up to 5^10 ≅ 10^7 different results (solution space).
For example, the following matrix will give you the worst case for 3 columns:
| 1 10 100 |
| 2 20 200 |
| 3 30 300 |
| 4 40 400 |
| 5 50 500 |
resulting in 5^3 = 125 different results. Each result is in the form {a1 a2 a3} with ai ∈ {1, ..., 5}
It's quite easy to show that such a matrix will always exist for any number n of columns.
Now, to get each numerical result, you will need to do n-1 sums, adding up to a problem size of O(n 5^n). So, that's the worst case and I think nothing can be done about it, because to know the possible results you NEED to effectively perform the sums.
More benign incarnations:
The problem complexity may be cut down in two ways:
Fewer numbers (i.e. not all columns are full)
Repeated results (i.e. several partial sums give the same result, and you can join them in one thread). Much more on this later.
Let's see a simplified example of the latter with three rows:
| 7 6 100 |
| 3 4 200 |
| 1 2 200 |
At first sight you will need to do 2 * 3^3 sums. But that's not really the case. As you add up the first two columns you don't get the expected 9 different results, but only 6 ({13,11,9,7,5,3}).
So you don't have to carry nine results up to the third column, but only 6.
Of course, that comes at the expense of deleting the repeated numbers from the list. The "Removal of Repeated Integer Elements" was posted before in SO and I'll not repeat the discussion here, but just cite that doing a mergesort, O(m log m) in the list size (m), will let you remove the duplicates. If you want something easier, a double loop O(m^2) will do.
Anyway, I'll not try to calculate the size of the (mean) problem in this way for several reasons. One of them is that the "m" in the sort merge is not the size of the problem, but the size of the vector of results after adding up any two columns, and that operation is repeated (n-1) times ... and I really don't want to do the math :(.
The other reason is that, since I implemented the algorithm, we will be able to use some experimental results and be spared my surely leaky theoretical considerations.
The Algorithm
With what we said before, it is clear that we should optimize for the benign cases, as the worst case is a lost one.
For doing so, we need to use lists (or variable dim vectors, or whatever can emulate those) for the columns and do a merge after every column add.
The merge may be replaced by several other algorithms (such as an insertion on a BTree) without modifying the results.
So the algorithm (procedural pseudocode) is something like:
Set result_vector to Column 1
For column i in (2 to n)
    Remove repeated integers in the result_vector
    Add every element of result_vector to every element of column i,
        giving a new result_vector
Next column
Remove repeated integers in the result_vector
Or as you asked for it, a recursive version may work as follows:
function genResVector(a:list, b:list): returns list
local c:list
{
    Set c = CartesianProduct (a x b)
    Set c = Sum up each element {a[i],b[j]} of c
    Drop repeated elements of c
    Return(c)
}
function RecursiveAdd(a:matrix, i integer): returns list
{
    Return(genResVector[Column i from a, RecursiveAdd[a, i-1]]);
}
function RecursiveAdd(a:matrix, i==0 integer): returns list={0}
Algorithm Implementation (Recursive)
I chose a functional language (Mathematica); I guess it's no big deal to translate it to any procedural one.
Our program has two functions:
genResVector, which sums two lists giving all possible results with repeated elements removed, and
recursiveAdd, which recurses on the matrix columns adding up all of them.
The code is:
genResVector[x__, y__] :=       (* Header: a function that takes two lists as input *)
  Union[                        (* remove duplicates from the resulting list *)
    Apply[                      (* distribute the following function on the lists *)
      Plus,                     (* "Plus" is the function to be distributed *)
      Tuples[{x, y}],           (* generate all combinations of the two lists *)
      2]];
recursiveAdd[t_, i_] := genResVector[t[[i]], recursiveAdd[t, i - 1]];  (* recursive add function *)
recursiveAdd[t_, 0] := {0};     (* with its base case *)
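For readers who prefer a procedural language, here is a rough Python equivalent of the same idea, written iteratively. It is a sketch rather than a translation of the author's program: each column is assumed to be given as a list of its values (columns may have different lengths), Python sets take care of removing the repeated sums, and the function names are illustrative.

```python
def gen_res_vector(a, b):
    """All distinct sums x + y with x taken from a and y taken from b."""
    return {x + y for x in a for y in b}

def all_column_sums(columns):
    """Distinct totals obtainable by picking exactly one value from every column."""
    result = {0}                      # same starting point as the recursive base case
    for column in columns:
        result = gen_res_vector(result, column)
    return result

# The simplified example from the text, given column by column.
small = [[7, 3, 1], [6, 4, 2], [100, 200, 200]]
# The worst-case 3-column matrix from the text: every combination gives a new sum.
worst = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
```

On `small`, the first two columns collapse to the 6 partial sums {3, 5, 7, 9, 11, 13} and the full result has 12 values; on `worst`, nothing collapses and all 125 totals are distinct. Checking for a specific target is then a membership test such as `109 in all_column_sums(small)`.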
Test
If we take your example list
| 1 1 7 9 1 1 |
| 2 2 5 2 2 |
| 3 3 |
| 4 |
| 5 |
And run the program the result is:
{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
The maximum and minimum are very easy to verify since they correspond to taking the Min or Max from each column.
Some interesting results
Let's consider what happens when the numbers in each position of the matrix are bounded. For that we will take a full (10 x 5) matrix and populate it with random integers.
In the extreme case where the integers are only zeros or ones, we may expect two things:
A very small result set
Fast execution, since there will be a lot of duplicate intermediate results
If we increase the Range of our Random Integers we may expect increasing result sets and execution times.
Experiment 1: 5x10 matrix populated with varying range random integers
It's clear enough that for a result set near the maximum result set size (5^10 ≅ 10^7) the calculation time and the number of distinct results have an asymptote. The fact that we see increasing functions just denotes that we are still far from that point.
Moral: The smaller your elements are, the better your chances of getting it fast. This is because you are likely to have a lot of repetitions!
Note that our MAX calculation time is near 20 secs for the worst case tested
Experiment 2: Optimizations that aren't
Having a lot of memory available, we can calculate by brute force, not removing the repeated results.
The result is interesting ... 10.6 secs! ... Wait! What happened? Our little "remove repeated integers" trick is eating up a lot of time, and when there are not a lot of results to remove there is no gain, only losses from trying to get rid of the repetitions.
But we may get a lot of benefit from the optimization when the Max numbers in the matrix are well under 5 10^5. Remember that I'm doing these tests with the 5x10 matrix fully loaded.
The moral of this experiment is: The repeated integer removal algorithm is critical.
HTH!
PS: I have a few more experiments to post, if I get the time to edit them.
