So, I have a 2D array, int a[X][Y];
X can go up to 10 000 000 and Y is maximum 6.
Given an array int v[Z] (Z <= Y), I have to see if I find a line in a that contains all the elements from v.
What would be the fastest algorithm for this matter and how would you implement this?
I have already tried the classic method of taking line by line and then with the 2 fors search, one for v elements and one for a elements but it takes too long.
What would be the best (fastest) approach ?
int check()
{
int nrfound;
for (int l = 0; l < lines_counter; l++) // for each line in a array
{
nrfound = 0;
for (int i = 0; i < n; i++) { // for each element in v array
for (int j = 0; j < m; j++) // for each element in a[l] line
if (v[i] == a[l][j])
nrfound++;
if (nrfound == Z)
return 0;
}
}
return 1;
}
I see three things to consider:
Using threads.
If it's possible, when constructing int a[X][Y] table I would create additional array int[6][Y] which will contain:
List of indexes which contain 1, 2, 3 .. 6 elements. This allows you to narrow the search.
For each X, compute a hash of its values. Then compute a hash of the v values.
Compare Hash code, instead of each separate value.
For the case of reusing the same array a[] with multiple different v[]:
Sort every line of a[][] as preliminary step (executed once)
Sort v[]
Use single loop (instead of two) to get intersection of ordered v[] and every ordered line of a[] - with approach like merge procedure of merge sort
index_v = 0
index_a = 0
while index_v < length_v and index_a < length_a:
    if v[index_v] == a[index_a]
        index_v++, index_a++
    else if v[index_v] < a[index_a]
        return FAIL, v[index_v] is missing from this a[] line
    else
        index_a++
if index_v == length_v:
    return OK, a[] line contains all v elements
Sorting 1e7 arrays of size 6 can be easily parallelized using a fixed sorting network, with or without SIMD/multithreading.
Sort v and compare it against each sorted row with the same principle as merging two sorted lists.
The overall worst-case complexity is between 13e7 and 24e7 comparisons (a sorting network for 6 elements requires 12 conditional swaps, and merging v with a[n] requires 1 to 12 comparisons).
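The sort-then-merge idea can be sketched as follows. This is a minimal illustration, not a tuned implementation: the names contains_all and find_row are hypothetical, and for brevity each row is sorted on the fly, whereas with many queries you would sort the rows once up front.

```python
def contains_all(sorted_row, sorted_v):
    """Return True if every element of sorted_v occurs in sorted_row.

    Duplicates in v must be matched by separate duplicates in the row.
    """
    i = 0  # current position in sorted_row
    for x in sorted_v:
        # Skip row elements that are smaller than x.
        while i < len(sorted_row) and sorted_row[i] < x:
            i += 1
        if i == len(sorted_row) or sorted_row[i] != x:
            return False  # x is missing from this row
        i += 1  # consume the matched element
    return True


def find_row(a, v):
    """Return the index of the first row of a containing all of v, or -1."""
    sorted_v = sorted(v)
    for index, row in enumerate(a):
        if contains_all(sorted(row), sorted_v):
            return index
    return -1
```

Each row costs at most len(row) + len(v) comparisons once it is sorted, which is where the per-row bound of 12 comparisons for Y = 6 comes from.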
As you're working in C, it limits available data structures:
I would suggest :
Initialize N threads, divide matrix rows X in N buckets, and run searching for each bucket in parallel.
Depending on the type of the 2D input array, you can save some time with boundary conditions if the elements of the query array must keep their order. You can also make use of Z <= Y: a line can only match if it holds at least Z elements, so check the length first.
Sorting the array will add complexity to it. So better to avoid it.
Your algorithm has a flaw if there are duplicate elements in the a[i][] subarrays. A matching element of v will be counted multiple times and the count may equal Z by coincidence.
Here is a corrected version:
int check(int X, int Y, int Z, int a[X][Y], int v[Z]) {
    for (int x = 0; x < X; x++) {
        // for each line in array a
        int mask = 0;   // bit y is set once a[x][y] has been matched
        int z;          // declared here so it is still visible after the loop
        for (z = 0; z < Z; z++) {
            // for each element in array v
            int y, m;
            for (y = 0, m = 1; y < Y; y++, m <<= 1) {
                // for each element in line a[x]
                if (v[z] == a[x][y] && !(mask & m)) {
                    mask |= m;
                    break;
                }
            }
            if (y == Y)
                break;  // v[z] was not found in a[x]
        }
        if (z == Z)
            return 0; // found a match
    }
    return 1; // no match
}
Unfortunately, the above code might be even slower than the posted one, but it is worth testing, as the inner loop is exited as soon as an element from v is not found in a[x].
Given an array of non-negative integers, you are initially positioned at the first index of the array.
Each element in the array represents your maximum jump length at that position.
Your goal is to reach the last index in the minimum number of jumps.
For example:
Given array A = [2,3,1,1,4]
The minimum number of jumps to reach the last index is 2. (Jump 1 step from index 0 to 1, then 3 steps to the last index.)
I have built a dp[] array from left to right such that dp[i] indicates the minimum number of jumps needed to reach arr[i] from arr[0]. Finally, we return dp[n-1].
Worst case time complexity of my code is O(n^2).
Can this be done with a better time complexity?
This question is copied from leetcode.
int jump(vector<int>& a) {
int i,j,k,n,jumps,ladder,stairs;
n = a.size();
if(n==0 || n==1)return 0;
jumps = 1, ladder = stairs = a[0];
for(i = 1; i<n; i++){
if(i + a[i] > ladder)
ladder = i+a[i];
stairs --;
if(stairs + i >= n-1)
return jumps;
if(stairs == 0){
jumps++;
stairs = ladder - i;
}
}
return jumps;
}
You can use a range-minimum segment tree to solve this problem. A segment tree is a data structure which allows you to maintain an array of values and also query aggregate operations on subsegments of the array. More information can be found here: https://cses.fi/book/book.pdf (section 9.3)
You will store values d[i] in the segment tree, d[i] is the minimum number of steps needed to reach the last index if you start from index i. Clearly, d[n-1] = 0. In general:
d[i] = 1 + min(d[i+1], ..., d[min(n-1, i+a[i])]),
so you can find all the values in d by computing them backwards, updating the segment tree after each step. The final solution is d[0]. Since both updates and queries on segment trees work in O(log n), the whole algorithm works in O(n log n).
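As a rough sketch of that recurrence, here is one way to compute d backwards with an iterative range-minimum segment tree. The function name min_jumps and the choice to return None when the end is unreachable are assumptions made for this illustration.

```python
def min_jumps(a):
    """Minimum number of jumps from index 0 to index n-1, or None if unreachable."""
    n = len(a)
    if n == 0:
        return None
    INF = float("inf")
    size = 1
    while size < n:
        size *= 2
    tree = [INF] * (2 * size)  # leaves live at tree[size + i]

    def update(pos, value):
        # Set d[pos] = value and refresh the minima on the path to the root.
        pos += size
        tree[pos] = value
        pos //= 2
        while pos >= 1:
            tree[pos] = min(tree[2 * pos], tree[2 * pos + 1])
            pos //= 2

    def query(lo, hi):
        # Minimum of d[lo..hi], both ends inclusive.
        result = INF
        lo += size
        hi += size + 1  # switch to a half-open interval [lo, hi)
        while lo < hi:
            if lo & 1:
                result = min(result, tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                result = min(result, tree[hi])
            lo //= 2
            hi //= 2
        return result

    update(n - 1, 0)  # d[n-1] = 0
    for i in range(n - 2, -1, -1):
        reach = min(n - 1, i + a[i])
        if reach > i:  # a[i] == 0 leaves d[i] at infinity
            update(i, 1 + query(i + 1, reach))
    best = tree[size]  # d[0]
    return None if best == INF else best
```

Each index triggers one query and one update, both O(log n), which gives the O(n log n) total.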
I think you can speed up the DP with this technique:
Right now you spend O(N) to compute each d[i]. Instead, you can keep a set of the values d[j],
where j = 0..i - 1. Then all you need is a binary search to find
the d[j] that is minimal among all j in 0..i-1 from which position i is reachable.
That gives an O(n log n) solution.
That is a simple exercise in dynamic programming. As you have tagged it already, I wonder why you're not trying to apply it.
Let V[k] be the minimum number of steps to get from position k to the end of the list a = (a[0], a[1], ...., a[n-1]).
Then obviously V[n-1]=0. Now loop backwards:
for(int k=n-2;k>=0;--k)
{
int minStep = n + 1;
for(int j=k+1;j<=std::min(n-1,k+a[k]);++j)
{
minStep = std::min(minStep, V[j]);
}
V[k]= minStep + 1;
}
Demo in C++
After the loop, which takes O(a[0]+a[1]+...+a[n-1]) time, V[0] contains the minimum number of steps to reach the end of the list.
In order to find the way through the list, you can then choose the action greedily. That is, from position k you always go to an allowed position l where V[l] is minimal.
(Note that I've assumed positive entries of the list here, not non-negative ones. Possible zeros can easily be removed from the problem, as it is never optimal to go there.)
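The backward loop and the greedy path recovery can be sketched together like this. The function name min_steps_with_path is hypothetical, and unlike the loop above this version also tolerates zeros by capping unreachable positions at n + 1 and returning None when the end cannot be reached.

```python
def min_steps_with_path(a):
    """Return (minimum steps, one optimal path of indices), or None if unreachable."""
    n = len(a)
    if n == 0:
        return None
    INF = n + 1  # larger than any real step count
    V = [INF] * n
    V[n - 1] = 0
    for k in range(n - 2, -1, -1):
        last = min(n - 1, k + a[k])
        best = min(V[k + 1:last + 1], default=INF)
        V[k] = min(INF, best + 1)
    if V[0] >= INF:
        return None
    # Greedy walk: always move to a reachable position with the smallest V.
    path = [0]
    k = 0
    while k != n - 1:
        last = min(n - 1, k + a[k])
        k = min(range(k + 1, last + 1), key=lambda j: V[j])
        path.append(k)
    return V[0], path
```

For a = [2, 3, 1, 1, 4] this yields 2 steps along the path 0, 1, 4, matching the example in the question.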
https://leetcode.com/problems/jump-game-ii
class Solution {
public int jump(int[] nums) {
int n = nums.length;
if(n < 2){
return 0;
}
int ans = 1;
int rightBoundaryCovered = nums[0];
for(int i=1;i<n;i++){
if(rightBoundaryCovered >= n-1){
return ans;
}
int currMax = i+ nums[i];
while(rightBoundaryCovered>=i){
currMax = Math.max(currMax, i+nums[i]);
i++;
}
//missed this decrement statement and faced multiple WA's
i--;
ans++;
if(currMax>rightBoundaryCovered){
rightBoundaryCovered = currMax;
}
}
return ans;
}
}
Java solution (From Elements of Programming Interviews):
public boolean canJump(int[] nums) {
int maximumReach = 0;
for(int i = 0; i < nums.length; i++) {
// Return false if you jump more.
if(i > maximumReach) { return false; }
// Logic is we need to keep checking every index the
// farthest we can travel
// Update the maxReach accordingly.
maximumReach = Math.max(i + nums[i], maximumReach);
}
return true;
}
For example: int A[] = {3,2,1,2,3,2,1,3,1,2,3};
How to sort this array efficiently?
This is for a job interview, I need just a pseudo-code.
A promising way to sort it seems to be counting sort. It is worth having a look at this lecture by Richard Buckland, especially the part from 15:20.
Analogous to counting sort, but even better, would be to create an array representing the domain, initialize all its elements to 0, and then iterate through your array and count these values. Once you know the counts of the domain values, you can rewrite the values of your array accordingly. The complexity of such an algorithm would be O(n).
Here's the C++ code with the behaviour as I described it. It makes two passes over the data (roughly 2n steps), which is still O(n):
int A[] = {3,2,1,2,3,2,1,3,1,2,3};
int domain[4] = {0};
// count occurrences of domain values - O(n):
int size = sizeof(A) / sizeof(int);
for (int i = 0; i < size; ++i)
domain[A[i]]++;
// rewrite values of the array A accordingly - O(n):
for (int k = 0, i = 1; i < 4; ++i)
for (int j = 0; j < domain[i]; ++j)
A[k++] = i;
Note that if there is a big difference between domain values, storing the domain as an array is inefficient. In that case it is a much better idea to use a map (thanks abhinav for pointing it out). Here's the C++ code that uses std::map for storing domain value - occurrences count pairs:
int A[] = {2000,10000,7,10000,10000,2000,10000,7,7,10000};
std::map<int, int> domain;
// count occurrences of domain values:
int size = sizeof(A) / sizeof(int);
for (int i = 0; i < size; ++i)
{
std::map<int, int>::iterator keyItr = domain.lower_bound(A[i]);
if (keyItr != domain.end() && !domain.key_comp()(A[i], keyItr->first))
keyItr->second++; // next occurrence
else
domain.insert(keyItr, std::pair<int,int>(A[i],1)); // first occurrence
}
// rewrite values of the array A accordingly:
int k = 0;
for (auto i = domain.begin(); i != domain.end(); ++i)
for (int j = 0; j < i->second; ++j)
A[k++] = i->first;
(if there is a way how to use std::map in above code more efficient, let me know)
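For comparison, the same count-then-rewrite idea for a sparse domain can be sketched in Python with collections.Counter. The function name counting_sort_sparse is made up for this example. With k distinct values the cost is O(n + k log k), since the distinct keys still have to be ordered.

```python
from collections import Counter


def counting_sort_sparse(values):
    """Sort by counting occurrences of each distinct value."""
    counts = Counter(values)          # value -> number of occurrences
    result = []
    for key in sorted(counts):        # distinct values in ascending order
        result.extend([key] * counts[key])
    return result
```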
It's a standard problem in computer science: the Dutch national flag problem.
See the link.
count each number and then create new array based on their counts...time complexity in O(n)
int counts[3] = {0,0,0};
for(int a in A)
counts[a-1]++;
for(int i = 0; i < counts[0]; i++)
A[i] = 1;
for(int i = counts[0]; i < counts[0] + counts[1]; i++)
A[i] = 2;
for(int i = counts[0] + counts[1]; i < counts[0] + counts[1] + counts[2]; i++)
A[i] = 3;
Problem description: You have n buckets, each bucket contains one coin, and the value of the coin can be 5, 10 or 20. You have to sort the buckets under these limitations:
1. You can use these 2 functions only: SwitchBaskets (Basket1, Basket2), which switches 2 baskets, and GetCoinValue (Basket1), which returns the coin value in the selected basket.
2. You can't define an array of size n.
3. Use the switch function as little as possible.
My simple pseudo-code solution, which can be implemented in any language with O(n) complexity.
I will pick coin from basket
1) if it is 5 - push it to be the first,
2)if it is 20- push it to be the last,
3)If 10 - leave it where it is.
4) and look at the next bucket in line.
Edit: if you can't push elements to the first or last position, then merge sort would be ideal for a practical implementation. Here is how it will work:
Merge sort takes advantage of the ease of merging already sorted lists into a new sorted list. It starts by comparing every two elements (i.e., 1 with 2, then 3 with 4...) and swapping them if the first should come after the second. It then merges each of the resulting lists of two into lists of four, then merges those lists of four, and so on; until at last two lists are merged into the final sorted list. Of the algorithms described here, this is the first that scales well to very large lists, because its worst-case running time is O(n log n). Merge sort has seen a relatively recent surge in popularity for practical implementations, being used for the standard sort routine in the programming languages
I think the question is intending for you to use bucket sort. In cases where there are a small number of values bucket sort can be much faster than the more commonly used quicksort or mergesort.
As robert mentioned, bucket sort is the best in this situation.
I would also add the following algorithm (it's actually very similar to bucket sort):
[pseudocode is java-style]
Create a HashMap<Integer, Integer> map and cycle through your array:
for (Integer i : array) {
Integer value = map.get(i);
if (value == null) {
map.put(i, 1);
} else {
map.put(i, value + 1);
}
}
I think I understand the question - you can use only O(1) space, and you can change the array only by swapping cells. (So you can use 2 operations on the array - swap and get)
My solution:
Use 2 index pointers - one for the position of the last 1, and one for the position of the last 2.
In stage i, you assume that the array is already sorted from 1 to i-1,
then you check the i-th cell:
If A[i] == 3
you do nothing.
If A[i] == 2
you swap it with the cell after the last 2 index.
If A[i] == 1
you swap it with the cell after the last 2 index, and then swap the cell
after the last 2 index (that contains 1) with the cell after the last 1 index.
This is the main idea, you need to take care of the little details.
Overall O(n) complexity.
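One possible way to fill in those details is sketched below. The names sort_123, ones and twos are chosen for this example: ones is the index just after the last 1, and twos is the index just after the last 2. The loop keeps a[:ones] all 1s, a[ones:twos] all 2s and a[twos:i] all 3s.

```python
def sort_123(a):
    """Sort a list containing only 1, 2 and 3 in place, using swaps only."""
    ones = 0  # index just after the last 1
    twos = 0  # index just after the last 2
    for i in range(len(a)):
        if a[i] == 2:
            # Move the 2 to the cell after the last 2.
            a[i], a[twos] = a[twos], a[i]
            twos += 1
        elif a[i] == 1:
            # First move the 1 to the cell after the last 2 ...
            a[i], a[twos] = a[twos], a[i]
            # ... then exchange it with the first 2, so it lands after the last 1.
            a[twos], a[ones] = a[ones], a[twos]
            ones += 1
            twos += 1
        # a[i] == 3 needs no work
    return a
```

It uses O(1) extra space and at most two swaps per element.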
Here is the groovy solution, based on #ElYusubov but instead of pushing Bucket(5) to beginning & Bucket(15) to end. Use sifting so that 5's move toward beginning and 15 towards end.
Whenever we swap a bucket from end to current position, we decrement end, do not increment current counter as we need to check for the element again.
array = [15,5,10,5,10,10,15,5,15,10,5]
def swapBucket(int a, int b) {
if (a == b) return;
array[a] = array[a] + array[b]
array[b] = array[a] - array[b]
array[a] = array[a] - array[b]
}
def getBucketValue(int a) {
return array[a];
}
def start = 0, end = array.size() -1, counter = 0;
// we can probably do away with this start,end but it helps when already sorted.
// start - first bucket from left which is not 5
while (start < end) {
if (getBucketValue(start) != 5) break;
start++;
}
// end - first bucket from right which is not 15
while (end > start) {
if (getBucketValue(end) != 15) break;
end--;
}
// already sorted when end = 1 { 1...size-1 are Buck(15) } or start = end-1
for (counter = start; counter < end;) {
def value = getBucketValue(counter)
if (value == 5) { swapBucket(start, counter); start++; counter++;}
else if (value == 15) { swapBucket(end, counter); end--; } // do not inc counter
else { counter++; }
}
for (key in array) { print " ${key} " }
This can be done very easily using-->
Dutch national Flag algorithm http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Flag/
instead of using 1,2,3 take it as 0,1,2
Have you tried to look at wiki for example? - http://en.wikipedia.org/wiki/Sorting_algorithm
This code is for c#:
However, you have to consider the algorithms to implement it in a non-language/framework specific way. As suggested, bucket sort might be the most efficient one to go with. If you provide detailed information on the problem, I will try to look for the best solution.
Good Luck...
Here is a code sample in C# .NET
int[] intArray = new int[9] {3,2,1,2,3,2,1,3,1 };
Array.Sort(intArray);
// write array
foreach (int i in intArray) Console.Write("{0}, ", i.ToString());
Just for fun, here's how you would implement "pushing values to the far edge", as ElYusubub suggested:
sort(array) {
    a = 0
    b = array.length - 1
    # a is the first item which isn't a 1
    while array[a] == 1
        a++
    # b is the last item which isn't a 3
    while array[b] == 3
        b--
    # go over all the items from the first non-1 to the last non-3
    for (i = a; i <= b; i++)
        # the while loop is because the swap could result in a 3 or a 1
        while array[i] != 2
            if array[i] == 1
                swap(i, a)
                while array[a] == 1
                    a++
            else # array[i] == 3
                swap(i, b)
                while array[b] == 3
                    b--
}
This could actually be an optimal solution. I'm not sure.
Let's break the problem down: suppose we have just two numbers in the array, e.g. [1,2,1,2,2,2,1,1].
We can sort in one pass, O(n), with the minimum number of swaps if:
We start two pointers from the left and the right and move them until they meet each other.
We swap the left element with the right one if the left element is bigger (sorting ascending).
For three numbers we can do another pass (k-1 passes for k distinct values). In pass one we moved the 1's to their final position, and in pass 2 we move the 2's.
def start = 0, end = array.size() - 1;
// Pass 1, move lowest order element (1) to their final position
while (start < end) {
// first element from left which is not 1
for ( ; Array[start] == 1 && start < end ; start++);
// first element from right which IS 1
for ( ; Array[end] != 1 && start < end ; end--);
if (start < end) swap(start, end);
}
// In second pass we can do 10,15
// We can extend this using recursion: for sorting a domain of size k, we need k-1 recursions
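The k-1 pass idea can be sketched with a plain loop instead of recursion. The function name sort_small_domain and the domain parameter (the distinct values in ascending order) are assumptions made for this illustration.

```python
def sort_small_domain(a, domain):
    """Sort a in place with one two-pointer pass per value in domain[:-1]."""
    start = 0  # everything before start is already in its final position
    for value in domain[:-1]:
        end = len(a) - 1
        while start < end:
            # First element from the left which is not `value`.
            while start < end and a[start] == value:
                start += 1
            # First element from the right which is `value`.
            while start < end and a[end] != value:
                end -= 1
            if start < end:
                a[start], a[end] = a[end], a[start]
        # The pointers met; step over the meeting cell if it holds `value`.
        if start < len(a) and a[start] == value:
            start += 1
    return a
```

The last value of the domain needs no pass of its own, because once every smaller value is in place the remaining cells can only hold the largest one.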
def DNF(input,length):
    high = length - 1
    p = 0
    i = 0
    while i <= high:
        if input[i] == 0:
            input[i],input[p]=input[p],input[i]
            p = p+1
            i = i+1
        elif input[i] == 2:
            input[i],input[high]=input[high],input[i]
            high = high-1
        else:
            i = i+1
input = [0,1,2,2,1,0]
print("input: ", input)
DNF(input,len(input))
print("output: ", input)
I would use a recursive approach over here
fun sortNums(smallestIndex,largestIndex,array,currentIndex){
if(currentIndex >= array.size)
return
if (array[currentIndex] == 1){
// You have found the smallest element, now increase the smallestIndex
//You need to put this element to left side of the array at the smallestIndex position.
//You can simply swap(smallestIndex, currentIndex)
// The catch here is you should not swap it if it's already on the left side
//recursive call
sortNums(smallestIndex,largestIndex,array,currentIndex or currentIndex+1)// Now the task of incrementing current Index in recursive call depends on the element at currentIndex. if it's 3, then you might want to let the fate of currentIndex decided by recursive function else simply increment by 1 and move further
} else if (array[currentIndex]==3){
// same logic but you need to add it at end
}
}
You can start the recursive function by sortNums(smallestIndex=-1,largestIndex=array.size,array,currentIndex=0)
//Bubble sort for unsorted array - algorithm
public void bubbleSort(int arr[], int n) { //n is the length of an array
    int temp;
    for(int i = 0; i <= n-2; i++){
        for(int j = 0; j <= (n-2-i); j++){
            if(arr[j] > arr[j +1]){
                // swap adjacent elements that are out of order
                temp = arr[j];
                arr[j] = arr[j +1];
                arr[j + 1] = temp;
            }
        }
    }
}
Given a snippet of code, how will you determine the complexities in general? I find myself getting very confused with Big O questions. For example, a very simple question:
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
System.out.println("*");
}
}
The TA explained this with something like combinations. Like this is n choose 2 = (n(n-1))/2 = n^2 + 0.5, then remove the constant so it becomes n^2. I can put int test values and try but how does this combination thing come in?
What if theres an if statement? How is the complexity determined?
for (int i = 0; i < n; i++) {
if (i % 2 ==0) {
for (int j = i; j < n; j++) { ... }
} else {
for (int j = 0; j < i; j++) { ... }
}
}
Then what about recursion ...
int fib(int a, int b, int n) {
if (n == 3) {
return a + b;
} else {
return fib(b, a+b, n-1);
}
}
In general, there is no way to determine the complexity of a given function
Warning! Wall of text incoming!
1. There are very simple algorithms that no one knows whether they even halt or not.
There is no algorithm that can decide whether a given program halts or not, if given a certain input. Calculating the computational complexity is an even harder problem since not only do we need to prove that the algorithm halts but we need to prove how fast it does so.
//The Collatz conjecture states that the sequence generated by the following
// algorithm always reaches 1, for any initial positive integer. It has been
// an open problem for 70+ years now.
function col(n){
if (n == 1){
return 0;
}else if (n % 2 == 0){ //even
return 1 + col(n/2);
}else{ //odd
return 1 + col(3*n + 1);
}
}
2. Some algorithms have weird and off-beat complexities
A general "complexity determining scheme" would easily get too complicated because of these guys
//The Ackermann function. One of the first examples of a non-primitive-recursive algorithm.
function ack(m, n){
if(m == 0){
return n + 1;
}else if( n == 0 ){
return ack(m-1, 1);
}else{
return ack(m-1, ack(m, n-1));
}
}
function f(n){ return ack(n, n); }
//f(1) = 3
//f(2) = 7
//f(3) = 61
//f(4) takes longer than your wildest dreams to terminate.
3. Some functions are very simple but will confuse lots of kinds of static analysis attempts
//McCarthy's 91 function. Try guessing what it does without
// running it or reading the Wikipedia page ;)
function f91(n){
if(n > 100){
return n - 10;
}else{
return f91(f91(n + 11));
}
}
That said, we still need a way to find the complexity of stuff, right? For loops are a simple and common pattern. Take your initial example:
for(i=0; i<N; i++){
for(j=0; j<i; j++){
print something
}
}
Since each print something is O(1), the time complexity of the algorithm will be determined by how many times we run that line. Well, as your TA mentioned, we do this by looking at the combinations in this case. The inner loop will run (0 + 1 + ... + (N-1)) times, for a total of N*(N-1)/2.
Since we disregard constants and lower-order terms we get O(N^2).
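A quick empirical check of the triangular sum, as a sketch (count_inner is a hypothetical helper): with the inner bound j < i the body runs N(N-1)/2 times, and with j <= i it runs N(N+1)/2 times. Either way the count grows as O(N^2).

```python
def count_inner(n, inclusive=False):
    """Count executions of the inner body for j < i (or j <= i if inclusive)."""
    count = 0
    for i in range(n):
        upper = i + 1 if inclusive else i
        for _ in range(upper):
            count += 1  # stands in for "print something"
    return count
```

For n = 10 this gives 45 with the strict bound and 55 with the inclusive one.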
Now for the more tricky cases we can get more mathematical. Try to create a function whose value represents how long the algorithm takes to run, given the size N of the input. Often we can construct a recursive version of this function directly from the algorithm itself and so calculating the complexity becomes the problem of putting bounds on that function. We call this function a recurrence.
For example:
function fib_like(n){
if(n <= 1){
return 17;
}else{
return 42 + fib_like(n-1) + fib_like(n-2);
}
}
it is easy to see that the running time, in terms of N, will be given by
T(N) = 1 if (N <= 1)
T(N) = T(N-1) + T(N-2) otherwise
Well, T(N) is just the good-old Fibonacci function. We can use induction to put some bounds on that.
For example, let's prove, by induction, that T(N) <= 2^n for all N (i.e., T(N) is O(2^n))
base case: n = 0 or n = 1
T(0) = 1 <= 1 = 2^0
T(1) = 1 <= 2 = 2^1
inductive case (n > 1):
T(N) = T(n-1) + T(n-2)
applying the inductive hypothesis in T(n-1) and T(n-2)...
T(N) <= 2^(n-1) + 2^(n-2)
so..
T(N) <= 2^(n-1) + 2^(n-1)
<= 2^n
(we can try doing something similar to prove the lower bound too)
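The bound can also be sanity-checked numerically. The sketch below evaluates the recurrence exactly as written above, with memoization so it runs quickly. It is only a spot check over a finite range, not a replacement for the induction proof.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """The recurrence T(N) = 1 for N <= 1, otherwise T(N-1) + T(N-2)."""
    if n <= 1:
        return 1
    return T(n - 1) + T(n - 2)


def bound_holds(limit):
    """Check T(n) <= 2^n for every n below limit."""
    return all(T(n) <= 2 ** n for n in range(limit))
```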
In most cases, having a good guess on the final runtime of the function will allow you to easily solve recurrence problems with an induction proof. Of course, this requires you to be able to guess first - only lots of practice can help you here.
And as a final note, I would like to point out the Master theorem, the only commonly used rule for more difficult recurrence problems that I can think of right now. Use it when you have to deal with a tricky divide and conquer algorithm.
Also, in your "if case" example, I would solve that by cheating and splitting it into two separate loops that don't have an if inside.
for (int i = 0; i < n; i++) {
if (i % 2 ==0) {
for (int j = i; j < n; j++) { ... }
} else {
for (int j = 0; j < i; j++) { ... }
}
}
Has the same runtime as
for (int i = 0; i < n; i += 2) {
for (int j = i; j < n; j++) { ... }
}
for (int i = 1; i < n; i+=2) {
for (int j = 0; j < i; j++) { ... }
}
And each of the two parts can be easily seen to be O(N^2) for a total that is also O(N^2).
Note that I used a good trick to get rid of the "if" here. There is no general rule for doing so, as shown by the Collatz algorithm example.
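To convince yourself that the split really does the same amount of work, you can count the inner iterations of both forms. This is a small sketch with made-up function names, where each `...` body is treated as one unit of work.

```python
def count_with_if(n):
    """Inner iterations of the original loop with the if inside."""
    count = 0
    for i in range(n):
        if i % 2 == 0:
            for _ in range(i, n):
                count += 1
        else:
            for _ in range(0, i):
                count += 1
    return count


def count_split(n):
    """Inner iterations of the two separate loops, in closed form per row."""
    count = 0
    for i in range(0, n, 2):   # even i: j runs from i to n-1
        count += n - i
    for i in range(1, n, 2):   # odd i: j runs from 0 to i-1
        count += i
    return count
```

Both functions return the same value for every n, and that value grows quadratically.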
In general, deciding algorithm complexity is theoretically impossible.
However, one cool and code-centric method for doing it is to actually just think in terms of programs directly. Take your example:
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
System.out.println("*");
}
}
Now we want to analyze its complexity, so let's add a simple counter that counts the number of executions of the inner line:
int counter = 0;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
System.out.println("*");
counter++;
}
}
Because the System.out.println line doesn't really matter, let's remove it:
int counter = 0;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
counter++;
}
}
Now that we have only the counter left, we can obviously simplify the inner loop out:
int counter = 0;
for (int i = 0; i < n; i++) {
counter += n;
}
... because we know that the increment is run exactly n times. And now we see that counter is incremented by n exactly n times, so we simplify this to:
int counter = 0;
counter += n * n;
And we emerged with the (correct) O(n^2) complexity :) It's there in the code :)
Let's look how this works for a recursive Fibonacci calculator:
int fib(int n) {
if (n < 2) return 1;
return fib(n - 1) + fib(n - 2);
}
Change the routine so that it returns the number of iterations spent inside it instead of the actual Fibonacci numbers:
int fib_count(int n) {
if (n < 2) return 1;
return fib_count(n - 1) + fib_count(n - 2);
}
It's still Fibonacci! :) So we know now that the recursive Fibonacci calculator is of complexity O(F(n)) where F is the Fibonacci number itself.
Ok, let's look at something more interesting, say simple (and inefficient) mergesort:
void mergesort(Array a, int from, int to) {
if (from >= to - 1) return;
int m = (from + to) / 2;
/* Recursively sort halves */
mergesort(a, from, m);
mergesort(a, m, to);
/* Then merge */
Array b = new Array(to - from);
int i = from;
int j = m;
int ptr = 0;
while (i < m || j < to) {
if (i == m || (j < to && a[j] < a[i])) {
b[ptr] = a[j++];
} else {
b[ptr] = a[i++];
}
ptr++;
}
for (i = from; i < to; i++)
a[i] = b[i - from];
}
Because we are not interested in the actual result but the complexity, we change the routine so that it actually returns the number of units of work carried out:
int mergesort(Array a, int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
/* Recursively sort halves */
int count = 0;
count += mergesort(a, from, m);
count += mergesort(a, m, to);
/* Then merge */
Array b = new Array(to - from);
int i = from;
int j = m;
int ptr = 0;
while (i < m || j < to) {
if (i == m || (j < to && a[j] < a[i])) {
b[ptr] = a[j++];
} else {
b[ptr] = a[i++];
}
ptr++;
count++;
}
for (i = from; i < to; i++) {
count++;
a[i] = b[i - from];
}
return count;
}
Then we remove those lines that do not actually impact the counts and simplify:
int mergesort(Array a, int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
/* Recursively sort halves */
int count = 0;
count += mergesort(a, from, m);
count += mergesort(a, m, to);
/* Then merge */
count += to - from;
/* Copy the array */
count += to - from;
return count;
}
Still simplifying a bit:
int mergesort(Array a, int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
int count = 0;
count += mergesort(a, from, m);
count += mergesort(a, m, to);
count += (to - from) * 2;
return count;
}
We can now actually dispense with the array:
int mergesort(int from, int to) {
if (from >= to - 1) return 1;
int m = (from + to) / 2;
int count = 0;
count += mergesort(from, m);
count += mergesort(m, to);
count += (to - from) * 2;
return count;
}
We can now see that actually the absolute values of from and to do not matter any more, but only their distance, so we modify this to:
int mergesort(int d) {
if (d <= 1) return 1;
int count = 0;
count += mergesort(d / 2);
count += mergesort(d / 2);
count += d * 2;
return count;
}
And then we get to:
int mergesort(int d) {
if (d <= 1) return 1;
return 2 * mergesort(d / 2) + d * 2;
}
Here obviously d on the first call is the size of the array to be sorted, so you have the recurrence for the complexity M(x) (this is in plain sight on the second line :)
M(x) = 2(M(x/2) + x)
and this you need to solve in order to get to a closed form solution. This you do easiest by guessing the solution M(x) = x log x, and verify for the right side:
2 (x/2 log x/2 + x)
= x log x/2 + 2x
= x (log x - log 2 + 2)
= x (log x - C)
and verify it is asymptotically equivalent to the left side:
x log x - Cx
------------ = 1 - [Cx / (x log x)] = 1 - [C / log x] --> 1 - 0 = 1.
x log x
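The recurrence can also be evaluated directly to see the x log x growth. The sketch below uses the final simplified form (return 2 * mergesort(d / 2) + d * 2, with integer division) and compares it with d * log2(d). The names M and ratio are chosen for this example. For powers of two the value works out to d * (2 * log2(d) + 1), so the ratio settles near the constant 2, which Big-O ignores.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def M(d):
    """Work units of the simplified mergesort counter for an input of size d."""
    if d <= 1:
        return 1
    return 2 * M(d // 2) + 2 * d


def ratio(d):
    """M(d) divided by d * log2(d); it stays bounded as d grows."""
    return M(d) / (d * math.log2(d))
```

For example, ratio(2**10) is 2.1 and ratio(2**20) is 2.05, so M(d) is within a constant factor of d log d.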
Even though this is an over generalization, I like to think of Big-O in terms of lists, where the length of the list is N items.
Thus, if you have a for-loop that iterates over everything in the list, it is O(N). In your code, you have one line that (in isolation all by itself) is O(N).
for (int i = 0; i < n; i++) {
If you have a for loop nested inside another for loop, and you perform an operation on each item in the list that requires you to look at every item in the list, then you are doing an operation N times for each of N items, thus O(N^2). In your example above you do, in fact, have another for loop nested inside your for loop. So you can think about it as if each for loop is O(N), and then because they are nested, multiply them together for a total value of O(N^2).
Conversely, if you are just doing a quick operation on a single item then that would be O(1). There is no 'list of length n' to go over, just a single one-time operation. To put this in context, in your example above, the operation:
if (i % 2 ==0)
is O(1). What is important isn't the 'if', but the fact that checking to see if a single item is equal to another item is a quick operation on a single item. Like before, the if statement is nested inside your external for loop. However, because it is O(1), you are multiplying everything by '1', and so there is no 'noticeable' effect on your final calculation for the run time of the entire function.
For logs, and dealing with more complex situations (like this business of counting up to j or i, and not just n again), I would point you towards a more elegant explanation here.
I like to use two things for Big-O notation: standard Big-O, which is worst case scenario, and average Big-O, which is what normally ends up happening. It also helps me to remember that Big-O notation is trying to approximate run-time as a function of N, the number of inputs.
The TA explained this with something like combinations. Like this is n choose 2 = (n(n-1))/2 = n^2 + 0.5, then remove the constant so it becomes n^2. I can put int test values and try but how does this combination thing come in?
As I said, normal big-O is worst case scenario. You can try to count the number of times that each line gets executed, but it is simpler to just look at the first example and say that there are two loops over the length of n, one embedded in the other, so it is n * n. If they were one after another, it'd be n + n, equaling 2n. Since it's an approximation, you just say n or linear.
What if theres an if statement? How is the complexity determined?
This is where having average case and best case helps me a lot in organizing my thoughts. In the worst case, you ignore the if and say n^2. In the average case, for your example, you have a loop over n, with another loop over part of n that happens half of the time. This gives you n * n/x/2 (where x is whatever fraction of n gets looped over in your embedded loops). This gives you n^2/(2x), so you'd get n^2 just the same. This is because it's an approximation.
I know this isn't a complete answer to your question, but hopefully it sheds some kind of light on approximating complexities in code.
As has been said in the answers above mine, it is clearly not possible to determine this for all snippets of code; I just wanted to add the idea of using average case Big-O to the discussion.
For the first snippet, it's just n^2 because you perform n operations n times. If j was initialized to i, or went up to i, the explanation you posted would be more appropriate but as it stands it is not.
For the second snippet, you can easily see that half of the time the first one will be executed, and the second will be executed the other half of the time. Depending on what's in there (hopefully it's dependent on n), you can rewrite the equation as a recursive one.
The recursive equations (including the third snippet) can be written as such: the third one would appear as
T(n) = T(n-1) + 1
Which we can easily see is O(n).
Big-O is just an approximation, it doesn't say how long an algorithm takes to execute, it just says something about how much longer it takes when the size of its input grows.
So if the input is size N and the algorithm evaluates an expression of constant complexity: O(1) N times, the complexity of the algorithm is linear: O(N). If the expression has linear complexity, the algorithm has quadratic complexity: O(N*N).
Some expressions have exponential complexity: O(2^N), or logarithmic complexity: O(log N). For an algorithm with loops and recursion, multiply the complexities of each level of loop and/or recursion. In terms of complexity, looping and recursion are equivalent. For an algorithm that has different complexities at different stages, choose the highest complexity and ignore the rest. And finally, all constant complexities are considered equivalent: O(5) is the same as O(1), O(5*N) is the same as O(N).
I recently came across a question somewhere:
Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?
What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?
Just add them all up, and subtract from that the total you would expect if each of the numbers 1 to 1000 appeared exactly once.
Eg:
Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10
Input - Expected => 2
Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)
Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)
printf("Answer : %d\n",
array[0] ^
array[1] ^
array[2] ^
// continue typing...
array[999] ^
array[1000] ^
1 ^
2 ^
// continue typing...
999^
1000
);
Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.
And if the array element values are known at compile time as well, the compiler will optimize the whole statement to a constant. :-)
Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.
Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.
Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.
for (int i = 1; i < 1001; i++)
{
array[i] = array[i] ^ array[i-1] ^ i;
}
printf("Answer : %d\n", array[1000]);
I will leave the exercise of figuring out why this works to you, with a simple hint :-):
a ^ a = 0
0 ^ a = a
A non destructive version of solution by Franci Penov.
This can be done by making use of the XOR operator.
Let's say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the indices: 0, 1, 2, 3, 4
Now XOR all the elements and all the indices together. We get 2, which is the duplicate element. This happens because 0 plays no role in the XORing. The remaining n-1 indices pair up with the same n-1 values in the array and cancel out, so the only unpaired element left is the duplicate.
int i;
int dupe = 0;
for(i = 0; i < N; i++) {
dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.
The best feature of this solution is that it does not suffer from the overflow problems seen in the addition-based solution.
Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)
This makes use of an additional variable so does not meet the requirements in the question completely.
Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.
To paraphrase Franci Penov's solution.
The (usual) problem is: given an array of integers of arbitrary length in which every value is repeated an even number of times except for one value, which is repeated an odd number of times, find this value.
The solution is:
acc = 0
for i in array: acc = acc ^ i
Your current problem is an adaptation. The trick is that you have to find the one element that appears twice, so you need to adapt the solution to compensate for this quirk: XOR in the indices as well.
acc = 0
for i in range(len(array)): acc = acc ^ i ^ array[i]
Which is what Franci's solution does in the end, although it destroys the whole array (by the way, it could destroy only the first or last element...)
But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.
It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).
Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.
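As a minimal sketch of this addition-based approach, assuming the input holds every integer from 1 to n once plus a single repeat (the function name `find_duplicate_by_sum` is only illustrative):

```python
def find_duplicate_by_sum(values, n):
    """Return the repeated value, given that `values` holds 1..n plus one repeat."""
    expected = n * (n + 1) // 2  # sum of 1..n
    return sum(values) - expected


print(find_duplicate_by_sum([1, 2, 3, 2, 4], 4))  # 12 - 10 = 2
```

Python integers do not overflow, so the overflow concern raised for fixed-width integer types does not show up in this sketch.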
One line solution in Python
from functools import reduce
arr = [1,3,2,4,2]
print(reduce(lambda acc, ix: acc ^ ix[0] ^ ix[1], enumerate(arr), 0))
# -> 2
An explanation of why it works is in @Matthieu M.'s answer.
If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.
Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once except for the number that is repeated.... so, the sum from 1....1000 is 500500. So, the algorithm is:
sum = 0
for each element of the array:
sum += that element of the array
number_that_occurred_twice = sum - 500500
public static void main(String[] args) {
int start = 1;
int end = 10;
int arr[] = {1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10};
System.out.println(findDuplicate(arr, start, end));
}
static int findDuplicate(int arr[], int start, int end) {
int sumAll = 0;
for(int i = start; i <= end; i++) {
sumAll += i;
}
System.out.println(sumAll);
int sumArrElem = 0;
for(int e : arr) {
sumArrElem += e;
}
System.out.println(sumArrElem);
return sumArrElem - sumAll;
}
No extra storage requirement (apart from loop variable).
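int length = (sizeof array) / (sizeof array[0]);
// Accumulate the sum of all elements in array[0] (this overwrites array[0]).
for(int i = 1; i < length; i++) {
array[0] += array[i];
}
// The array holds 1..(length - 1) plus one duplicate, so subtract T(length - 1).
printf(
"Answer : %d\n",
( array[0] - ((length - 1) * length) / 2 )
);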
Do arguments and callstacks count as auxiliary storage?
int sumRemaining(int* remaining, int count) {
if (!count) {
return 0;
}
return remaining[0] + sumRemaining(remaining + 1, count - 1);
}
printf("duplicate is %d", sumRemaining(array, 1001) - 500500);
Edit: tail call version
int sumRemaining(int* remaining, int count, int sumSoFar) {
if (!count) {
return sumSoFar;
}
return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);
public int duplicateNumber(int[] A) {
int count = 0;
for(int k = 0; k < A.Length; k++)
count += A[k];
return count - (A.Length * (A.Length - 1) >> 1);
}
A triangular number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among the given 1001 natural numbers one and only one number is duplicated, you can simply sum all the given numbers and subtract T(1000). The result is the duplicate.
For a triangular number T(n) where n is a power of 10 (10, 100, 1000, ...), there is also a neat way of finding T(n) from its base-10 representation: write n/2 twice in a row.
n = 1000
s = sum(GivenList)
r = str(n // 2)           # "500"
duplicate = s - int(r + r)  # s - 500500
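The digit trick can be checked against the usual formula n(n+1)/2 with a short sketch. The helper names are hypothetical, and the digit version is only meant for powers of 10 greater than 1.

```python
def triangular(n):
    """T(n) = 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def triangular_by_digits(n):
    """Write n/2 twice in a row; assumes n is 10, 100, 1000, ..."""
    half = str(n // 2)
    return int(half + half)


for n in (10, 100, 1000):
    print(n, triangular(n), triangular_by_digits(n))
```

It works because T(n) = (n/2) * n + n/2 when n is even, and multiplying n/2 by a power of 10 shifts it left by exactly as many digits as n/2 has.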
I support adding all the elements and then subtracting the sum of all the indices, but this won't work if the number of elements is very large, i.e. it will cause an integer overflow! So I have devised this algorithm, which may reduce the chance of an integer overflow to a large extent.
dup = 0
for i=0 to n-1
begin:
diff = a[i]-i;
dup = dup + diff;
end
// where dup is the duplicate element..
But by this method I won't be able to find out the index at which the duplicate element is present!
For that I need to traverse the array another time which is not desirable.
Improvement of Franci's answer based on the property of XORing consecutive values:
int result = xor_sum(N);
for (int i = 0; i < N+1; i++)
{
result = result ^ array[i];
}
Where:
// Compute (((1 xor 2) xor 3) .. xor value)
int xor_sum(int value)
{
int modulo = value % 4;
if (modulo == 0)
return value;
else if (modulo == 1)
return 1;
else if (modulo == 2)
return value + 1;
else
return 0;
}
Or in pseudocode/math lang f(n) defined as (optimized):
if n mod 4 = 0 then X = n
if n mod 4 = 1 then X = 1
if n mod 4 = 2 then X = n+1
if n mod 4 = 3 then X = 0
And in canonical form f(n) is:
f(0) = 0
f(n) = f(n-1) xor n
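A quick way to gain confidence in the closed form is to compare it with the recurrence for many values of n. The sketch below does that and then uses the closed form to find the duplicate; all function names are illustrative.

```python
def xor_upto_closed_form(n):
    """1 xor 2 xor ... xor n, using the n mod 4 pattern."""
    return (n, 1, n + 1, 0)[n % 4]


def xor_upto_recurrence(n):
    """Same value computed directly from f(n) = f(n-1) xor n, f(0) = 0."""
    acc = 0
    for k in range(1, n + 1):
        acc ^= k
    return acc


def find_duplicate_by_xor(values, n):
    """`values` holds 1..n plus one repeat; everything else cancels out."""
    acc = xor_upto_closed_form(n)
    for v in values:
        acc ^= v
    return acc


# The two definitions agree on every n tested here.
assert all(xor_upto_closed_form(n) == xor_upto_recurrence(n) for n in range(2000))
print(find_duplicate_by_xor([4, 3, 1, 2, 2], 4))
```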
My answer to question 2:
Find the sum and product of the numbers from 1 to N, say SUM and PROD.
Find the sum and product of the numbers from 1 to N with x and y removed (assume x and y are the missing ones), say mySum and myProd.
Thus:
SUM = mySum + x + y;
PROD = myProd * x * y;
Thus:
x*y = PROD/myProd; x+y = SUM - mySum;
We can find x and y by solving this pair of equations.
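One way to solve the pair of equations is to note that x and y are the two roots of t^2 - (x+y)t + x*y = 0. The sketch below follows that route; the function name is hypothetical, and it relies on Python's arbitrary-precision integers because the product of 1..N grows far too quickly for fixed-width integer types.

```python
import math


def find_two_missing(present, n):
    """`present` holds the numbers 1..n with two distinct values removed."""
    s = n * (n + 1) // 2 - sum(present)           # x + y
    p = math.factorial(n) // math.prod(present)   # x * y
    root = math.isqrt(s * s - 4 * p)              # |x - y|
    return (s - root) // 2, (s + root) // 2


print(find_two_missing([1, 2, 4, 5, 6, 8, 9, 10], 10))  # (3, 7)
```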
In the aux version, you first set all the values of the aux array to -1 and, as you iterate, check whether you have already inserted the current value into it. If not (the slot must still be -1), insert it. If the slot is already filled, you have a duplicate, and that is your solution!
In the version without aux, you take an element from the list and check whether the rest of the list contains that value. If it does, you have found the duplicate.
private static int findDuplicated(int[] array) {
if (array == null || array.length < 2) {
System.out.println("invalid");
return -1;
}
int[] checker = new int[array.length];
Arrays.fill(checker, -1);
for (int i = 0; i < array.length; i++) {
int value = array[i];
int checked = checker[value];
if (checked == -1) {
checker[value] = value;
} else {
return value;
}
}
return -1;
}
private static int findDuplicatedWithoutAux(int[] array) {
if (array == null || array.length < 2) {
System.out.println("invalid");
return -1;
}
for (int i = 0; i < array.length; i++) {
int value = array[i];
for (int j = i + 1; j < array.length; j++) {
int toCompare = array[j];
if (value == toCompare) {
return array[i];
}
}
}
return -1;
}