Sum of distance between every pair of same element in an array - arrays

I have an array [a0, a1, ..., an], and I want to calculate the sum of the distances between every pair of equal elements.
1) The first element of the array will always be zero.
2) The second element of the array will be greater than zero.
3) No two consecutive elements can be the same.
4) The size of the array can be up to 10^5+1, and the elements of the array can be from 0 to 10^7.
For example, if the array is [0,2,5,0,5,7,0], then the distance between the first 0 and the second 0 is 2 (the number of elements strictly between them), the distance between the first 0 and the third 0 is 5, and the distance between the second 0 and the third 0 is 2. The distance between the first 5 and the second 5 is 1. Hence the sum of distances between equal elements is 2 + 5 + 2 + 1 = 10.
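To pin down the definition of distance used in the example, here is a minimal brute-force sketch. It is O(n^2), so it is only meant as a reference for checking faster approaches on small inputs; the function name is made up for illustration.

```python
def brute_force_sum(arr):
    """Sum, over all pairs of equal elements, of the number of
    elements strictly between the two positions."""
    total = 0
    for i in range(len(arr)):
        for j in range(i + 1, len(arr)):
            if arr[i] == arr[j]:
                total += j - i - 1
    return total


print(brute_force_sum([0, 2, 5, 0, 5, 7, 0]))  # 10
```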
For this I tried to build a formula: for every element occurring more than once (0-based indexing, and the first element is always zero), sum = sum + (lastIndex - firstIndex - 1) * (NumberOfOccurrences - 1).
If the number of occurrences of the element is odd, subtract 1 from sum; otherwise leave it as it is.
This approach works if the array is [0,5,7,0] or [0,2,5,0,5,7,0,1,2,3,0], but it does not work in every case.
Can you suggest another efficient approach or formula?
Edit :- This problem is not a part of any coding contest, it's just a little part of a bigger problem

My method requires space that scales with the number of possible values for elements, but has O(n) time complexity.
I've made no effort to check that the sum doesn't overflow an unsigned long, I just assume that it won't. Same for checking that any input values are in fact no more than max_val. These are details that would have to be addressed.
For each possible value, total_distance holds the sum of the distances from the most recent instance of that value to all of its earlier instances, which is exactly what that instance adds to the sum. instances_so_far holds how many instances of a value have already been seen; this is how much would be added to total_distance for that value with each step forward in the array. To make this more efficient, the last index at which a value was encountered is tracked, so that total_distance need only be updated when that particular value is encountered again, instead of having nested loops that update every value at every step.
#include <stdio.h>
#include <stddef.h>

// An enum constant (unlike a const variable) is a constant expression in C,
// which is what the sizes of file-scope arrays require.
// enum { max_val = 15 };
enum { max_val = 10000000 };

unsigned long instances_so_far[max_val + 1] = {0};
unsigned long total_distance[max_val + 1] = {0};
unsigned long last_index_encountered[max_val + 1];

// void print_array(unsigned long *array, size_t len) {
//     printf("{");
//     for (size_t i = 0; i < len; ++i) {
//         printf("%lu,", array[i]);
//     }
//     printf("}\n");
// }

unsigned long get_sum(unsigned long *array, size_t len) {
    unsigned long sum = 0;
    for (size_t i = 0; i < len; ++i) {
        if (instances_so_far[array[i]] >= 1) {
            // Every earlier instance is (i - last index) further away than it was
            // from the previous instance; the previous instance itself adds one more
            // pair whose distance is (i - last index - 1), hence the single "- 1".
            total_distance[array[i]] += (i - last_index_encountered[array[i]]) * instances_so_far[array[i]] - 1;
        }
        sum += total_distance[array[i]];
        instances_so_far[array[i]] += 1;
        last_index_encountered[array[i]] = i;
        // printf("inst ");
        // print_array(instances_so_far, max_val + 1);
        // printf("totd ");
        // print_array(total_distance, max_val + 1);
        // printf("encn ");
        // print_array(last_index_encountered, max_val + 1);
        // printf("sums %lu\n", sum);
        // printf("\n");
    }
    return sum;
}

unsigned long test[] = {0,1,0,2,0,3,0,4,5,6,7,8,9,10,0};

int main(void) {
    printf("%lu\n", get_sum(test, sizeof(test) / sizeof(test[0])));
    return 0;
}
I've tested it with a few of the examples here, and gotten the answers I expected.
I had to use static storage for the arrays because they overflowed the stack if put there.
I've left in the commented-out code I used for debugging, it's helpful to understand what's going on, if you reduce max_val to a smaller number.
Please let me know if you find a counter-example that fails.
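For comparison, the same running-total idea can be sketched in Python with dictionaries, so that memory scales with the number of distinct values actually present rather than with the largest possible value. This is only an illustration of the technique under that assumption, not a replacement for the C program above; the names are hypothetical.

```python
def sum_pair_gaps(arr):
    count = {}    # value -> number of instances seen so far
    last = {}     # value -> index of the most recent instance
    running = {}  # value -> summed gaps from the most recent instance to all earlier ones
    total = 0
    for i, v in enumerate(arr):
        c = count.get(v, 0)
        if c:
            # Each of the c earlier instances is (i - last[v]) further away than it
            # was from the previous instance; the previous instance itself adds one
            # new pair with gap (i - last[v] - 1), hence the single "- 1".
            running[v] = running.get(v, 0) + (i - last[v]) * c - 1
            total += running[v]
        count[v] = c + 1
        last[v] = i
    return total


print(sum_pair_gaps([0, 2, 5, 0, 5, 7, 0]))  # 10
```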

Here is Python 3 code for your problem. This works on all the examples given in your question and in the comments--I included the test code.
This works by looking at how each consecutive pair of repeated elements adds to the overall sum of distances. If the list has 6 elements, the pair distances are:
x  x  x  x  x  x    The repeated element's locations in the array
 --                 First, consecutive pairs
    --
       --
          --
             --
 -----              Now, pairs that have one element inside
    -----
       -----
          -----
 --------           Now, pairs that have two elements inside
    --------
       --------
 -----------        Now, pairs that have three elements inside
    -----------
 --------------     Now, pairs that have four elements inside
If we look down between each consecutive pair, we see that it adds to the overall sum of all pairs:
5 8 9 8 5
And if we look at the differences between those values we get
3 1 -1 -3
Now if we use my preferred definition of "distance" for a pair, namely the difference of their indices, we can use those multiplicities for consecutive pairs to calculate the overall sum of distances for all pairs. But since your definition is not mine, we calculate the sum for my definition and then adjust it for your definition: each pair's distance is one less, so we subtract the number of pairs.
This code makes one pass through the original array to get the occurrences for each element value in the array, then another pass through those distinct element values. (I used the pairwise routine to avoid another pass through the array.) That makes my algorithm O(n) in time complexity, where n is the length of the array. This is much better than the naive O(n^2). Since my code builds an array of the repeated elements, once per unique element value, this has space complexity of at worst O(n).
import collections
import itertools


def pairwise(iterable):
    """s -> (s0,s1), (s1,s2), (s2, s3), ..."""
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


def sum_distances_of_pairs(alist):
    # Make a dictionary giving the indices for each element of the list.
    element_ndxs = collections.defaultdict(list)
    for ndx, element in enumerate(alist):
        element_ndxs[element].append(ndx)

    # Sum the distances of pairs for each element, using my def of distance
    sum_of_all_pair_distances = 0
    for element, ndx_list in element_ndxs.items():
        # Filter out elements not occurring more than once and count the rest
        if len(ndx_list) < 2:
            continue
        # Sum the distances of pairs for this element, using my def of distance
        sum_of_pair_distances = 0
        multiplicity = len(ndx_list) - 1
        delta_multiplicity = multiplicity - 2
        for ndx1, ndx2 in pairwise(ndx_list):
            # Update the contribution of this consecutive pair to the sum
            sum_of_pair_distances += multiplicity * (ndx2 - ndx1)
            # Prepare for the next consecutive pair
            multiplicity += delta_multiplicity
            delta_multiplicity -= 2
        # Adjust that sum of distances for the desired definition of distance
        cnt_all_pairs = len(ndx_list) * (len(ndx_list) - 1) // 2
        sum_of_pair_distances -= cnt_all_pairs
        # Add that sum for this element into the overall sum
        sum_of_all_pair_distances += sum_of_pair_distances
    return sum_of_all_pair_distances


assert sum_distances_of_pairs([0, 2, 5, 0, 5, 7, 0]) == 10
assert sum_distances_of_pairs([0, 5, 7, 0]) == 2
assert sum_distances_of_pairs([0, 2, 5, 0, 5, 7, 0, 1, 2, 3, 0]) == 34
assert sum_distances_of_pairs([0, 0, 0, 0, 1, 2, 0]) == 18
assert sum_distances_of_pairs([0, 1, 0, 2, 0, 3, 4, 5, 6, 7, 8, 9, 0, 10, 0]) == 66
assert sum_distances_of_pairs([0, 1, 0, 2, 0, 3, 0, 4, 5, 6, 7, 8, 9, 10, 0]) == 54
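The multiplicities 5 8 9 8 5 from the diagram have a closed form that may make the counting argument easier to check: with m occurrences, the gap between the i-th and (i+1)-th occurrence is spanned by every pair whose left end is one of the first i occurrences and whose right end is one of the remaining m - i, so it is counted i * (m - i) times. The sketch below uses hypothetical helper names and the index-difference definition of distance.

```python
def gap_multiplicities(m):
    """How many pairs span each of the m - 1 consecutive gaps."""
    return [i * (m - i) for i in range(1, m)]


def sum_index_differences(positions):
    """Sum of (later index - earlier index) over all pairs of the given sorted positions."""
    weights = gap_multiplicities(len(positions))
    return sum(w * (b - a) for w, a, b in zip(weights, positions, positions[1:]))


print(gap_multiplicities(6))  # [5, 8, 9, 8, 5]
```

Subtracting the number of pairs, m * (m - 1) / 2, from sum_index_differences converts the result to the question's definition of distance.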

Related

Minimum of maximums of non-contiguous subsequences of size k

Before I start: I hope this question is not a duplicate. I've found a couple of similar ones, but none of them seems to describe exactly the same problem. But if it is a duplicate, I will be glad to see a solution (even if it is different from my algorithm.)
I have been trying to answer this question. After a couple of attempts I managed to implement an algorithm that seems to be correct (in C). I've prepared a couple of tests and they all pass.
Now, initially I thought that the task would be easier. Therefore, I would be certain of my solution, and would publish it right after I would see it works. But I'd rather not publish an answer that presents a solution that only seems to be correct. So, I wrote a "proof of correctness", or at least something that looks like that. (I don't remember if I have ever written any proof of correctness for a program, so I'm rather certain its quality can be improved.)
So, I have two questions:
Is the algorithm that I wrote correct?
Is the "proof" that I wrote correct?
Also, I'd love to know if you have any tips on how to improve both the algorithm and the "proof" beside correctness, and maybe even the implementation (though I know C, I can always make a mistake). If either the algorithm formulations, the proof, or the C code seems too complicated to read or check, please give me some tips, and I'll try to simplify them.
And please, don't hesitate to point out that I misunderstood the problem completely if that is the case. After all, it is most important to present the right solution for the author of the original question.
I'm going to wait some time for an answer here before I publish an answer to the original question. But eventually, if there won't be any, I think I will publish it anyway.
The problem
To quote the author of the original question:
Suppose I have an array, arr = [2, 3, 5, 9] and k = 2. I am supposed to find subsequences of length k such that no two elements in each subsequence are adjacent. Then find the maximums of those sequences. Finally, find the minimum of the maximums. For example, for arr, the valid subsequences are [2,5], [3,9], [2,9] with maximums 5, 9, and 9 respectively. The expected output would be the minimum of the maximums, which is 5.
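As a reference for the problem statement just quoted, here is a minimal brute-force sketch that enumerates every index combination of size k with no two adjacent indices. It is exponential in general and only meant for cross-checking small cases; the function name is made up for illustration.

```python
from itertools import combinations


def min_of_maxes_brute_force(arr, k):
    best = None
    for idxs in combinations(range(len(arr)), k):
        # Keep only index sets in which no two chosen positions are adjacent.
        if all(b - a >= 2 for a, b in zip(idxs, idxs[1:])):
            current_max = max(arr[i] for i in idxs)
            if best is None or current_max < best:
                best = current_max
    return best  # None if no valid subsequence exists


print(min_of_maxes_brute_force([2, 3, 5, 9], 2))  # 5
```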
My algorithm
I've made two assumptions not stated in the original question:
the elements of the input sequence are unique,
for the input sequence S and the subsequence length k, 2 <= k <= (length(S) + 1) / 2.
They may look a bit arbitrary, but I think that they simplify the problem a bit. When it comes to the uniqueness, I think I could remove this assumption (so that the algorithm suits more use cases). But first, I need to know whether the current solution is correct.
Pseudocode, version 1
find_k_length_sequence_maxes_min (S, k)
    if k < 2 or length(S) < 2 * k - 1
        return NO_SUCH_MINIMUM
    sorted = copy(S)
    sort_ascending(sorted)
    for t from 1 to length(S)
        current_length = 1    // sorted[t] itself is already part of the subsequence
        index = find_index(S, sorted[t])
        last_index = index
        for u descending from index to 1
            if u < last_index - 1 and S[u] <= sorted[t]
                current_length += 1
                last_index = u
            if current_length >= k
                return sorted[t]
        last_index = index
        for u ascending from index to length(S)
            if u > last_index + 1 and S[u] <= sorted[t]
                current_length += 1
                last_index = u
            if current_length >= k
                return sorted[t]
Pseudocode, version 2
(This is the same algorithm as in version 1, only written using more natural language.)
(1) Let S be a sequence of integers such that all of its elements are unique.
(2) Let a "non-contiguous subsequence of S" mean such a subsequence of S that any two elements of it are non-adjacent in S.
(3) Let k be an integer such that 2 <= k <= (length(S) + 1) / 2.
(4) Find the minimum of maximums of all the non-contiguous subsequences of S of length k.
(4.1) Find the minimal element of S such that it is the maximum of a non-contiguous subsequence of S of size k.
(4.1.1) Let sorted be a permutation of S such that its elements are sorted in ascending order.
(4.1.2) For every element e of sorted, check whether it is a maximum of a non-contiguous subsequence of S of length k. If it is, return it.
(4.1.2.1) Let x and y be integers such that 1 <= x <= index(e) and index(e) <= y <= length(S).
(4.1.2.2) Let all(x, y) be the set of all the non-contiguous subsequences of S between S[x] (including) and S[y] (including) such that e is the maximum of each of them.
(4.1.2.3) Check whether the length of the longest sequence of all(1, index(e)) is greater than or equal to k. If it is, return e.
(4.1.2.4) Check whether the sum of the lengths of the longest subsequence of all(1, index(e)) and the length of the longest subsequence of all(index(e), length(S)) is greater than or equal to k. If it is, return e.
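To make the steps above easier to experiment with, here is a rough Python port of the algorithm as I read it from the description: candidates are tried in ascending order, and for each one a greedy walk to the left and then to the right collects non-adjacent elements that do not exceed it. The function name and the use of None for "no such minimum" are assumptions of this sketch, and it inherits the assumption that the elements are unique.

```python
def min_of_maxes_greedy(seq, k):
    n = len(seq)
    if k < 2 or n < 2 * k - 1:
        return None
    for e in sorted(seq):
        index = seq.index(e)
        size = 1  # e itself is the first element of the subsequence
        last = index
        # Walk left, taking every element that is <= e and not adjacent to the last taken one.
        for u in range(index - 1, -1, -1):
            if u < last - 1 and seq[u] <= e:
                size += 1
                last = u
        last = index
        # Walk right in the same way, starting again from e.
        for u in range(index + 1, n):
            if u > last + 1 and seq[u] <= e:
                size += 1
                last = u
        if size >= k:
            return e
    return None


print(min_of_maxes_greedy([2, 3, 5, 9], 2))  # 5
```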
Proof of correctness
(1) Glossary:
by "observation" I mean a statement not derived from any observation or conclusion, not demanding a proof,
by "conclusion" I mean a statement derived from at least one observation or conclusion, not demanding a proof,
by "theorem" I mean a statement not derived from any observation or conclusion, demanding a proof.
(2) Let S be a sequence of integers such that all of its elements are unique.
(3) Let a "non-contiguous subsequence of S" mean such a subsequence of S that any two elements of it are non-adjacent in S.
(4) Let k be an integer such that 2 <= k <= (length(S) + 1) / 2.
(5) Let minmax(k) be an element of S such that it is the minimum of maximums of all the non-contiguous subsequences of S of length k.
(6) (Theorem) minmax(k) is a minimal element of S such that it is a maximum of a non-contiguous subsequence of S of length k.
(7) In other words, there is no element in S less than minmax(k) that is a maximum of a non-contiguous subsequence of S of length k.
(8) (Proof of (6)) (Observation) Since minmax(k) is the minimum of maximums of all the non-contiguous subsequences of S of length k, there is no non-contiguous subsequence of S of length k such that its maximum is less than minmax(k).
(9) (Proof of (6)) (Conclusion) If (8), then any element of S less than minmax(k) cannot be a maximum of any non-contiguous subsequence of S of length k.
(10) (Proof of (6)) QED
(11) Let x and y be integers such that 1 <= x <= index(minmax(k)) and index(minmax(k)) <= y <= length(S).
(12) Let all(x, y) be the set of all the non-contiguous subsequences of S between S[x] (including) and S[y] (including) such that minmax(k) is the maximum of each of them.
(13) (Observation) minmax(k) is the maximum of the longest sequence of all(1, length(S)).
(14) This observation may seem too trivial to note. But, apparently it was easier for me to write the algorithm, and prove it, with the longest subsequence in mind, instead of a subsequence of length k. Therefore I think this observation is worth noting.
(15) (Theorem) One can produce the longest sequence of all(1, index(minmax(k))) by:
starting from minmax(k),
moving to S[1],
taking always the next element that is both less than or equal to minmax(k), and non-adjacent to the last taken one.
(16) (Proof of (15)) Let a "possible element" of S mean an element that is both less than or equal to minmax(k), and non-adjacent to the last taken one.
(16a) (Proof of (15)) Let C be the subsequence produced in (15).
(17) (Proof of (15)) (Observation)
before the first taken element, there are exactly 0 possible elements,
between any two taken elements (excluding them), there are exactly 0 or 1 possible elements,
after the last taken element, there are exactly 0 or 1 possible elements.
(18) (Proof of (15)) Let D be a sequence of all(1, index(minmax(k))) such that length(D) > length(C).
(19) (Proof of (15)) At least one of the following conditions is fulfilled:
before the first taken element, there are fewer than 0 possible elements in D,
between two taken elements (excluding them) such that there is 1 possible element between them in C, there are 0 possible elements in D,
after the last taken element, there is less than 1 possible element in D.
(20) (Proof of (15)) (Observation)
there cannot be fewer than 0 possible elements before the first taken element,
if there is less than 1 possible element between two taken elements (excluding them) in D, where in C there is 1, it means that we have taken either an element greater than minmax(k), or an element adjacent to the last taken one, which contradicts (12),
if there is less than 1 possible element after the last taken element in D, where in C there is 1, it means that we have taken either an element greater than minmax(k), or an element adjacent to the last taken one, which contradicts (12).
(21) (Proof of (15)) QED
(22) (Observation) (15) applies also to all(index(minmax(k)), length(S)).
(23) (Observation) length(all(1, length(S))) = length(all(1, index(minmax(k)))) + length(all(index(minmax(k)), length(S))).
Implementation
All the tests pass if none of the assert calls aborts the program.
#include <limits.h> // For INT_MAX
#include <assert.h> // For assert
#include <string.h> // For memcpy
#include <stdlib.h> // For qsort

int compar (const void * first, const void * second) {
    if (* (int *)first < * (int *)second) return -1;
    else if (* (int *)first == * (int *)second) return 0;
    else return 1;
}

void find_k_size_sequence_maxes_min (int array_length, int array[], int k, int * result_min) {
    if (k < 2 || array_length < 2 * k - 1) return;
    int sorted[array_length];
    memcpy(sorted, array, sizeof (int) * array_length);
    qsort(sorted, array_length, sizeof (int), compar);
    for (int t = 0; t < array_length; ++t) {
        // Find the position of the current candidate in the original array
        int index = -1;
        while (array[++index] != sorted[t]);
        int size = 1;
        int last_index = index;
        // Greedy walk to the left of the candidate
        for (int u = index; u >= 0; --u) {
            if (u < last_index - 1 && array[u] <= sorted[t]) {
                ++size;
                last_index = u;
            }
            if (size >= k) {
                * result_min = sorted[t];
                return;
            }
        }
        last_index = index;
        // Greedy walk to the right of the candidate
        for (int u = index; u < array_length; ++u) {
            if (u > last_index + 1 && array[u] <= sorted[t]) {
                ++size;
                last_index = u;
            }
            if (size >= k) {
                * result_min = sorted[t];
                return;
            }
        }
    }
}
int main (void) {
    // Test case 1
    int array1[] = { 6, 3, 5, 8, 1, 0, 9, 7, 4, 2, };
    int array1_length = (int)(sizeof array1 / sizeof array1[0]);
    int k = 2;
    int min = INT_MAX;
    find_k_size_sequence_maxes_min(array1_length, array1, k, & min);
    assert(min == 2);

    // Test case 2
    int array2[] = { 1, 7, 2, 3, 9, 11, 8, 14, };
    int array2_length = (int)(sizeof array2 / sizeof array2[0]);
    k = 2;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 2);

    // Test case 3
    k = 3;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 8);

    // Test case 4
    k = 4;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array2_length, array2, k, & min);
    assert(min == 9);

    // Test case 5
    int array3[] = { 3, 5, 4, 0, 8, 2, };
    int array3_length = (int)(sizeof array3 / sizeof array3[0]);
    k = 3;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array3_length, array3, k, & min);
    assert(min == 3);

    // Test case 6
    int array4[] = { 18, 21, 20, 6 };
    int array4_length = (int)(sizeof array4 / sizeof array4[0]);
    k = 2;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array4_length, array4, k, & min);
    assert(min == 18);

    // Test case 7
    int array5_length = 1000000;
    int array5[array5_length];
    for (int m = array5_length - 1; m >= 0; --m) array5[m] = m;
    k = 100;
    min = INT_MAX;
    find_k_size_sequence_maxes_min(array5_length, array5, k, & min);
    assert(min == 198);
}
Edit: Thanks to @user3386109, the number of iterations on sorted may be reduced in some cases. There need to be at least k - 1 elements less than sorted[t] to form a subsequence of size k or greater together with sorted[t]. Therefore, in the for loop, it should be int t = k - 1 instead of int t = 0.
Edit: Now that a week has passed, I have published my solution as an answer to the original question: Minimum of maximums for k-size nonconsecutive subsequence of array. If you happen to have any further tips on how to improve it, you can share them either here or in the original question (as comments on my answer).

Codility: MaxZeroProduct - complexity issues

My solution scored 100% correctness, but 0% Performance.
I just can't figure out how to minimize time complexity.
Problem:
Write a function:
int solution(int A[], int N);
that, given an array of N positive integers, returns the maximum number of trailing zeros of the number obtained by multiplying three different elements from the array. Numbers are considered different if they are at different positions in the array.
For example, given A = [7, 15, 6, 20, 5, 10], the function should return 3 (you can obtain three trailing zeros by taking the product of numbers 15, 20 and 10 or 20, 5 and 10).
For another example, given A = [25, 10, 25, 10, 32], the function should return 4 (you can obtain four trailing zeros by taking the product of numbers 25, 25 and 32).
Assume that:
N is an integer within the range [3..100,000];
each element of array A is an integer within the range [1..1,000,000,000].
Complexity:
expected worst-case time complexity is O(N*log(max(A)));
expected worst-case space complexity is O(N) (not counting the storage required for input arguments).
Solution:
the idea:
factorize each element into a pair (number of 5's, number of 2's)
sum every 3 pairs into one pair - this costs O(N^3)
find the pair whose minimum coordinate value is the biggest
return that minimum coordinate value
the code:
#include <stdlib.h> // For calloc, free

// min and max were not shown in the original snippet; these are assumed helpers
static int min(int a, int b) { return a < b ? a : b; }
static int max(int a, int b) { return a > b ? a : b; }

void factorize(int val, int* fives, int* twos);

int solution(int A[], int N) {
    int fives = 0, twos = 0, max_zeros = 0;
    // each item (x, y) holds the number of 5's and 2's of the corresponding item in A
    int (*factors)[2] = calloc(N, sizeof(int[2]));
    for (int i = 0; i < N; i++) {
        factorize(A[i], &fives, &twos);
        factors[i][0] = fives;
        factors[i][1] = twos;
    }
    // O(N^3)
    for (int i = 0; i < N; i++) {
        for (int j = i + 1; j < N; j++) {
            for (int k = j + 1; k < N; k++) {
                int x = factors[i][0] + factors[j][0] + factors[k][0];
                int y = factors[i][1] + factors[j][1] + factors[k][1];
                max_zeros = max(max_zeros, min(x, y));
            }
        }
    }
    free(factors);
    return max_zeros;
}

void factorize(int val, int* fives, int* twos) {
    *fives = 0, *twos = 0;
    if (val == 0) return;
    while (val % 5 == 0) { // factors of 5
        val /= 5;
        (*fives)++;
    }
    while (val % 2 == 0) { // factors of 2
        val /= 2;
        (*twos)++;
    }
}
I can't figure out how else I can iterate over the N-sized array in order to find the optimal 3 items in O(N*log(max(A))) time.
Since 2^30 > 1e9 and 5^13 > 1e9, there's a limit of 30 * 13 = 390 different pairs of factors of 2 and 5 in the array, no matter how large the array. This is an upper bound (the actual number of pairs (a, b) with 2^a * 5^b <= 1e9 is 214, counting 1e9 = 2^9 * 5^9 itself).
Discard all but three representatives from the array for each pair, and then your O(N^3) algorithm is probably fast enough.
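A minimal sketch of that pruning idea, assuming the input bounds from the problem statement: each element is reduced to its (twos, fives) pair, at most three elements are kept per distinct pair, and the cubic search then runs over the survivors only. The helper names are invented for illustration; count_factor_pairs simply counts the pairs (a, b) with 2^a * 5^b <= 10^9, which by this count comes to 214 when 10^9 itself is included.

```python
from collections import defaultdict
from itertools import combinations


def count_factor_pairs(limit=10**9):
    """Number of (a, b) with 2**a * 5**b <= limit."""
    count = 0
    power_of_five = 1
    while power_of_five <= limit:
        product = power_of_five
        while product <= limit:
            count += 1
            product *= 2
        power_of_five *= 5
    return count


def factor_counts(value):
    twos = fives = 0
    while value % 2 == 0:
        value //= 2
        twos += 1
    while value % 5 == 0:
        value //= 5
        fives += 1
    return twos, fives


def max_trailing_zeros(a):
    kept = defaultdict(int)
    representatives = []
    for value in a:
        pair = factor_counts(value)
        if kept[pair] < 3:  # a triple can use at most three elements with the same pair
            kept[pair] += 1
            representatives.append(pair)
    best = 0
    for (t1, f1), (t2, f2), (t3, f3) in combinations(representatives, 3):
        best = max(best, min(t1 + t2 + t3, f1 + f2 + f3))
    return best


print(max_trailing_zeros([7, 15, 6, 20, 5, 10]))  # 3
```

In the worst case a few hundred representatives survive, so the triple loop stays bounded regardless of N, although in pure Python it may still take a noticeable amount of time.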
If it's still not fast enough, you can continue by applying dynamic programming, computing P[i,j], the largest product of factors of 2s and 5s of pairs of elements with index <= i of the form x * 2^y * 5^(y+j) (where x is divisible by neither 2 nor 5). This table can then be used in a second dynamic programming pass to find the product of three numbers with the most 0's.
In the real world I don't like such meta-thinking, but still, we are faced with an artificial problem with artificial restrictions...
Since space complexity is O(N), we can't afford dynamic programming based on the initial input. We can't even make a map of N*factors. Well, we can afford a map of N*2, but that's about all we can afford.
Since time complexity is O(N*log(max(A))), we can allow ourselves to factorize the items and do some simple one-way reduction. Probably we can sort the items with counting sort - it's a bit more like N*log^2(max(A)) for 2-index sorting, but big O will even it out.
If my spider sense is right, we should simply pick something out of this counts array and polish it with one run through the array. Something like the best count for 2, then the best for 5, and then we can enumerate the rest of the elements, finding the best overall product. It's just a heuristic, but dimensions don't lie!
Just my 2 cents

Is it possible to invert an array with constant extra space?

Let's say I have an array A with n unique elements on the range [0, n). In other words, I have a permutation of the integers [0, n).
Is it possible to transform A into B using O(1) extra space (AKA in-place) such that B[A[i]] = i?
For example:
       A                    B
[3, 1, 0, 2, 4] -> [2, 1, 3, 0, 4]
Yes, it is possible, with O(n^2) time algorithm:
Take element at index 0, then write 0 to the cell indexed by that element. Then use just overwritten element to get next index and write previous index there. Continue until you go back to index 0. This is cycle leader algorithm.
Then do the same starting from index 1, 2, ... But before doing any changes perform cycle leader algorithm without any modifications starting from this index. If this cycle contains any index below the starting index, just skip it.
Or this O(n^3) time algorithm:
Take element at index 0, then write 0 to the cell indexed by that element. Then use just overwritten element to get next index and write previous index there. Continue until you go back to index 0.
Then do the same starting from index 1, 2, ... But before doing any changes perform cycle leader algorithm without any modifications starting from all preceding indexes. If current index is present in any preceding cycle, just skip it.
I have written (slightly optimized) implementation of O(n^2) algorithm in C++11 to determine how many additional accesses are needed for each element on average if random permutation is inverted. Here are the results:
size    accesses
2^10    2.76172
2^12    4.77271
2^14    6.36212
2^16    7.10641
2^18    9.05811
2^20    10.3053
2^22    11.6851
2^24    12.6975
2^26    14.6125
2^28    16.0617
While size grows exponentially, number of element accesses grows almost linearly, so expected time complexity for random permutations is something like O(n log n).
Inverting an array A requires us to find a permutation B which fulfills the requirement A[B[i]] == i for all i.
To build the inverse in-place, we have to swap elements and indices by setting A[A[i]] = i for each element A[i]. Obviously, if we would simply iterate through A and perform aforementioned replacement, we might override upcoming elements in A and our computation would fail.
Therefore, we have to swap elements and indices along cycles of A by following c = A[c] until we reach our cycle's starting index c = i.
Every element of A belongs to one such cycle. Since we have no space to store whether or not an element A[i] has already been processed and needs to be skipped, we have to follow its cycle: If we reach an index c < i we would know that this element is part of a previously processed cycle.
This algorithm has a worst-case run-time complexity of O(n²), an average run-time complexity of O(n log n) and a best-case run-time complexity of O(n).
function invert(array) {
  main:
  for (var i = 0, length = array.length; i < length; ++i) {

    // check if this cycle has already been traversed before:
    for (var c = array[i]; c != i; c = array[c]) {
      if (c <= i) continue main;
    }

    // Replacing each cycle element with its predecessors index:
    var c_index = i,
        c = array[i];
    do {
      var tmp = array[c];
      array[c] = c_index; // replace
      c_index = c; // move forward
      c = tmp;
    } while (i != c_index)

  }
  return array;
}

console.log(invert([3, 1, 0, 2, 4])); // [2, 1, 3, 0, 4]
Example for A = [1, 2, 3, 0] :
The first element 1 at index 0 belongs to the cycle of elements 1 - 2 - 3 - 0. Once we shift indices 0, 1, 2 and 3 along this cycle, we have completed the first step.
The next element 0 at index 1 belongs to the same cycle and our check tells us so in only one step (since it is a backwards step).
The same holds for the remaining elements 1 and 2.
In total, we perform 4 + 1 + 1 + 1 'operations'. This is the best-case scenario.
Implementation of this explanation in Python:
def inverse_permutation_zero_based(A):
    """
    Swap elements and indices along cycles of A by following `c = A[c]` until we reach
    our cycle's starting index `c = i`.
    Every element of A belongs to one such cycle. Since we have no space to store
    whether or not an element A[i] has already been processed and needs to be skipped,
    we have to follow its cycle: If we reach an index c < i we would know that this
    element is part of a previously processed cycle.
    Time Complexity: O(n*n), Space Complexity: O(1)
    """
    def cycle(i, A):
        """
        Replacing each cycle element with its predecessors index
        """
        c_index = i
        c = A[i]
        while True:
            temp = A[c]
            A[c] = c_index  # replace
            c_index = c     # move forward
            c = temp
            if i == c_index:
                break

    for i in range(len(A)):
        # check if this cycle has already been traversed before
        j = A[i]
        while j != i:
            if j <= i:
                break
            j = A[j]
        else:
            cycle(i, A)

    return A

>>> inverse_permutation_zero_based([3, 1, 0, 2, 4])
[2, 1, 3, 0, 4]
This can be done in O(n) time complexity and O(1) space if we try to store 2 numbers at a single position.
First, let's see how we can get 2 values from a single variable. Suppose we have a variable x and we want to get two values from it, 2 and 1. So,
x = n*1 + 2 , suppose n = 5 here.
x = 5*1 + 2 = 7
Now for 2, we can take remainder of x, ie, x%5. And for 1, we can take quotient of x, ie , x/5
and if we take n = 3
x = 3*1 + 2 = 5
x%3 = 5%3 = 2
x/3 = 5/3 = 1
We know here that the array contains values in range [0, n-1], so we can take the divisor as n, size of array. So, we will use the above concept to store 2 numbers at every index, one will represent old value and other will represent the new value.
       A                    B
 0  1  2  3  4       0  1  2  3  4
[3, 1, 0, 2, 4] -> [2, 1, 3, 0, 4]
a[0] = 3, that means, a[3] = 0 in our answer.
a[a[0]] = 2 //old
a[a[0]] = 0 //new
a[a[0]] = n* new + old = 5*0 + 2 = 2
a[a[i]] = n*i + a[a[i]]
And during array traversal, a[i] value can be greater than n because we are modifying it. So we will use a[i]%n to get the old value.
So the logic should be
a[a[i]%n] = n*i + a[a[i]%n]
Array -> 13 6 15 2 24
Now, to get the older values, take the remainder on dividing each value by n, and to get the new values, just divide each value by n, in this case, n=5.
Array -> 2 1 3 0 4
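The two passes described above can be sketched as follows. This is only an illustration of the encoding trick, under the assumption that every slot can temporarily hold a value as large as n*n - 1; Python integers make that trivially true, while a fixed-width integer type would need checking. The function name is hypothetical.

```python
def invert_with_encoding(a):
    n = len(a)
    # Pass 1: store the new value in the quotient and keep the old value in the remainder.
    for i in range(n):
        target = a[i] % n      # old value of a[i], even if a[i] was already modified
        a[target] += n * i     # new value for slot `target` is i
    # Pass 2: drop the old values, keeping only the new ones.
    for i in range(n):
        a[i] //= n
    return a


print(invert_with_encoding([3, 1, 0, 2, 4]))  # [2, 1, 3, 0, 4]
```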
The following approach skips the cycle walk if the cycle has already been handled. Note that each element is 1-based, so convert accordingly when accessing the elements of the given array.
#include <cstdlib>   // For abs
#include <iostream>
#include <vector>
using namespace std;

// helper function to traverse cycles; already-processed entries are marked with a negative sign
void cycle(int i, vector<int>& A) {
    int cur_index = i+1, next_index = A[i];
    while (next_index > 0) {
        int temp = A[next_index-1];
        A[next_index-1] = -(cur_index);
        cur_index = next_index;
        next_index = temp;
        if (i+1 == abs(cur_index)) {
            break;
        }
    }
}

void inverse_permutation(vector<int>& A) {
    int n = static_cast<int>(A.size());
    for (int i = 0; i < n; i++) {
        cycle(i, A);
    }
    // remove the markers
    for (int i = 0; i < n; i++) {
        A[i] = abs(A[i]);
    }
    for (int i = 0; i < n; i++) {
        cout<<A[i]<<" ";
    }
}

int main(){
    // vector<int> perm = {4,0,3,1,2,5,6,7,8};
    vector<int> perm = {5,1,4,2,3,6,7,9,8};
    //vector<int> perm = { 17,2,15,19,3,7,12,4,18,20,5,14,13,6,11,10,1,9,8,16};
    // vector<int> perm = {4, 1, 2, 3};
    // { 6,17,9,23,2,10,20,7,11,5,14,13,4,1,25,22,8,24,21,18,19,12,15,16,3 } =
    // { 14,5,25,13,10,1,8,17,3,6,9,22,12,11,23,24,2,20,21,7,19,16,4,18,15 }
    // vector<int> perm = {6, 17, 9, 23, 2, 10, 20, 7, 11, 5, 14, 13, 4, 1, 25, 22, 8, 24, 21, 18, 19, 12, 15, 16, 3};
    inverse_permutation(perm);
    return 0;
}

How to tell if an array is a permutation in O(n)?

Input: A read-only array of N elements containing integer values from 1 to N (some integer values can appear more than once!). And a memory zone of a fixed size (10, 100, 1000 etc - not depending on N).
How to tell in O(n) if the array represents a permutation?
What I achieved so far (an answer proved that this was not good):
1. I use the limited memory area to store the sum and the product of the array.
2. I compare the sum with N*(N+1)/2 and the product with N!
I know that if condition (2) is true I might have a permutation. I'm wondering if there's a way to prove that condition (2) is sufficient to tell if I have a permutation. So far I haven't figured this out ...
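A concrete counterexample may help show why the sum-and-product condition is not sufficient. For N = 9, replacing 3, 6, 8 (sum 17, product 144) with 4, 4, 9 (also sum 17, product 144) preserves both the sum and the product while breaking the permutation. The sketch below just checks that arithmetic; the function name is made up for illustration.

```python
from math import factorial, prod


def passes_sum_and_product_check(a):
    n = len(a)
    return sum(a) == n * (n + 1) // 2 and prod(a) == factorial(n)


candidate = [1, 2, 4, 4, 4, 5, 7, 9, 9]  # values in 1..9, but 3, 6 and 8 are missing
print(passes_sum_and_product_check(candidate))      # True
print(sorted(candidate) == list(range(1, 10)))      # False
```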
I'm very slightly skeptical that there is a solution. Your problem seems to be very close to one posed several years ago in the mathematical literature, with a summary given here ("The Duplicate Detection Problem", S. Kamal Abdali, 2003) that uses cycle-detection -- the idea being the following:
If there is a duplicate, there exists a number j between 1 and N such that the following would lead to an infinite loop:
x := j;
do
{
    x := a[x];
}
while (x != j);
because a permutation consists of one or more subsets S of distinct elements s0, s1, ... sk-1 where sj = a[sj-1] for all j between 1 and k-1, and s0 = a[sk-1], so all elements are involved in cycles -- one of the duplicates would not be part of such a subset.
e.g. if the array = [2, 1, 4, 6, 8, 7, 9, 3, 8]
then the element 8 at position 5 (counting from 1) is a duplicate because all the other elements form cycles: { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 3}. Whereas the arrays [2, 1, 4, 6, 5, 7, 9, 3, 8] and [2, 1, 4, 6, 3, 7, 9, 5, 8] are valid permutations (with cycles { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 3, 5 } and { 2 -> 1, 4 -> 6 -> 7 -> 9 -> 8 -> 5 -> 3 } respectively).
Abdali goes into a way of finding duplicates. Basically the following algorithm (using Floyd's cycle-finding algorithm) works if you happen across one of the duplicates in question:
function is_duplicate(a, N, j)
{
/* assume we've already scanned the array to make sure all elements
are integers between 1 and N */
x1 := j;
x2 := j;
do
{
x1 := a[x1];
x2 := a[x2];
x2 := a[x2];
} while (x1 != x2);
/* stops when it finds a cycle; x2 has gone around it twice,
x1 has gone around it once.
If j is part of that cycle, both will be equal to j. */
return (x1 != j);
}
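For concreteness, here is a minimal Python sketch of the pseudocode above, run on the example arrays from this answer. The function name `starts_off_cycle` and the 1-based lookup helper are my own naming, not Abdali's, and the sketch assumes every element has already been checked to lie in 1..N.

```python
def starts_off_cycle(a, j):
    """Floyd's cycle-finding on the map x -> a[x], using 1-based positions.

    Returns True if the walk that starts at position j ends up in a cycle
    that does not contain j, which can only happen when some value is
    duplicated. Assumes every element of `a` is in 1..len(a).
    """
    def at(x):
        # 1-based lookup into a 0-based Python list
        return a[x - 1]

    slow = fast = j
    while True:
        slow = at(slow)        # one step
        fast = at(at(fast))    # two steps
        if slow == fast:
            break
    return slow != j


with_dup = [2, 1, 4, 6, 8, 7, 9, 3, 8]   # position 5 is not on any cycle
perm = [2, 1, 4, 6, 5, 7, 9, 3, 8]       # a valid permutation

print(starts_off_cycle(with_dup, 5))
print(starts_off_cycle(perm, 5))
```

Note that this only reports a problem when j itself is one of the positions left off the cycles; starting from any position that is on a cycle says nothing about the rest of the array.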
The difficulty is I'm not sure your problem as stated matches the one in his paper, and I'm also not sure if the method he describes runs in O(N) or uses a fixed amount of space. A potential counterexample is the following array:
[3, 4, 5, 6, 7, 8, 9, 10, ... N-10, N-9, N-8, N-7, N-2, N-5, N-5, N-3, N-5, N-1, N, 1, 2]
which is basically the identity permutation shifted by 2, with the elements [N-6, N-4, and N-2] replaced by [N-2, N-5, N-5]. This has the correct sum (not the correct product, but I reject taking the product as a possible detection method since the space requirements for computing N! with arbitrary precision arithmetic are O(N) which violates the spirit of the "fixed memory space" requirement), and if you try to find cycles, you will get cycles { 3 -> 5 -> 7 -> 9 -> ... N-7 -> N-5 -> N-1 } and { 4 -> 6 -> 8 -> ... N-10 -> N-8 -> N-2 -> N -> 2}. The problem is that there could be up to N cycles, (identity permutation has N cycles) each taking up to O(N) to find a duplicate, and you have to keep track somehow of which cycles have been traced and which have not. I'm skeptical that it is possible to do this in a fixed amount of space. But maybe it is.
This is a heavy enough problem that it's worth asking on mathoverflow.net (despite the fact that most of the time mathoverflow.net is cited on stackoverflow it's for problems which are too easy)
edit: I did ask on mathoverflow, there's some interesting discussion there.
This is impossible to do in O(1) space, at least with a single-scan algorithm.
Proof
Suppose you have processed N/2 of the N elements. Assuming the sequence is a permutation then, given the state of the algorithm, you should be able to figure out the set of N/2 remaining elements. If you can't figure out the remaining elements, then the algorithm can be fooled by repeating some of the old elements.
There are N choose N/2 possible remaining sets. Each of them must be represented by a distinct internal state of the algorithm, because otherwise you couldn't figure out the remaining elements. However, it takes logarithmic space to store X states, so it takes BigTheta(log(N choose N/2)) space to store N choose N/2 states. That value grows with N, and therefore the algorithm's internal state cannot fit in O(1) space.
More Formal Proof
You want to create a program P which, given the final N/2 elements and the internal state of the linear-time-constant-space algorithm after it has processed N/2 elements, determines if the entire sequence is a permutation of 1..N. There is no time or space bound on this secondary program.
Assuming P exists we can create a program Q, taking only the internal state of the linear-time-constant-space algorithm, which determines the necessary final N/2 elements of the sequence (if it was a permutation). Q works by passing P every possible final N/2 elements and returning the set for which P returns true.
However, because Q has N choose N/2 possible outputs, it must have at least N choose N/2 possible inputs. That means the internal state of the original algorithm must store at least N choose N/2 states, requiring BigTheta(log(N choose N/2)) space, which is greater than constant size.
Therefore the original algorithm, which does have time and space bounds, also can't work correctly if it has constant-size internal state.
[I think this idea can be generalized, but thinking isn't proving.]
Consequences
BigTheta(log(N choose N/2)) is equal to BigTheta(N). Therefore just using a boolean array and ticking values as you encounter them is (probably) space-optimal, and time-optimal too since it takes linear time.
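The "boolean array and ticking values" approach could look roughly like the Python sketch below. The function name `is_permutation` is just my choice for illustration; it uses O(N) extra space, so it does not meet the fixed-memory requirement of the question, it only shows the linear-time baseline.

```python
def is_permutation(a):
    """Return True if `a` contains each of 1..len(a) exactly once.

    Linear time, but O(N) extra space for the `seen` flags.
    """
    n = len(a)
    seen = [False] * (n + 1)   # index 0 is unused
    for x in a:
        if not 1 <= x <= n:
            return False       # value out of range
        if seen[x]:
            return False       # value already ticked: duplicate
        seen[x] = True
    return True
```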
I doubt you would be able to prove that ;)
(1, 2, 4, 4, 4, 5, 7, 9, 9)
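To check this counterexample by machine, a small Python sketch is enough (the helper name `sum_and_product_match` is made up for this example, and `math.prod` needs Python 3.8 or later):

```python
import math


def sum_and_product_match(a):
    """True if sum(a) == N*(N+1)/2 and product(a) == N!, with N = len(a)."""
    n = len(a)
    return sum(a) == n * (n + 1) // 2 and math.prod(a) == math.factorial(n)


candidate = (1, 2, 4, 4, 4, 5, 7, 9, 9)
print(sum_and_product_match(candidate))           # passes both checks
print(sorted(candidate) == list(range(1, 10)))    # but is it really 1..9?
```

Both the sum and the product agree with those of 1..9, yet the tuple is clearly not a permutation, so the two conditions together are not sufficient.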
I think that more generally, this problem isn't solvable by processing the numbers in order. Suppose you are processing the elements in order and you are halfway through the array. Now the state of your program has to somehow reflect which numbers you've encountered so far. This requires at least O(n) bits to store.
This isn't going to work due to the complexity being given as a function of N rather than M, implying that N >> M
This was my shot at it, but for a bloom filter to be useful, you need a big M, at which point you may as well use simple bit toggling for something like integers
http://en.wikipedia.org/wiki/Bloom_filter
For each element in the array
Run the k hash functions
Check for inclusion in the bloom filter
If it is there, there is a probability you've seen the element before
If it isn't, add it
When you are done, you may as well compare it to the results of a 1..N array in order, as that'll only cost you another N.
Now if I haven't put enough caveats in. It isn't 100%, or even close since you specified complexity in N, which implies that N >> M, so fundamentally it won't work as you have specified it.
BTW, the false positive rate for an individual item, with the optimal number of hash functions k = (m/n) * ln 2, should be
e = 2^(-(m/n) * ln 2)
Which monkeying around with will give you an idea how big M would need to be to be acceptable.
I don't know how to do it in O(N), or even if it can be done in O(N). I know that it can be done in O(N log N) if you (use an appropriate) sort and compare.
That being said, there are many O(N) techniques that can be done to show that one is NOT a permutation of the other.
Check the length. If unequal, obviously not a permutation.
Create an XOR fingerprint. If the value of all the elements XOR'ed together does not match, then it can not be a permutation. A match would however be inconclusive.
Find the sum of all elements. Although the result may overflow, that should not be a worry when matching this 'fingerprint'. If however, you did a checksum that involved multiplying then overflow would be an issue.
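A rough Python sketch of these three cheap rejection tests follows. The function name `may_be_permutation_of` is mine, and since Python integers do not overflow, the remark about overflow only matters in fixed-width languages.

```python
from functools import reduce
from operator import xor


def may_be_permutation_of(a, b):
    """Cheap O(N) filters: False means definitely not a permutation,
    True only means 'not ruled out'."""
    if len(a) != len(b):
        return False
    if reduce(xor, a, 0) != reduce(xor, b, 0):
        return False
    if sum(a) != sum(b):
        return False
    return True   # inconclusive: the fingerprints merely agree


print(may_be_permutation_of([3, 1, 2], [1, 2, 3]))
print(may_be_permutation_of([1, 2, 3], [1, 2, 4]))
print(may_be_permutation_of([0, 3], [1, 2]))   # fingerprints collide
```

The last call shows why a match is inconclusive: [0, 3] and [1, 2] have the same length, XOR and sum, but are not permutations of each other.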
Hope this helps.
You might be able to do this in randomized O(n) time and constant space by computing sum(x_i) and product(x_i) modulo a bunch of different randomly chosen constants C of size O(n). This basically gets you around the problem that product(x_i) gets too large.
There's still a lot of open questions, though, like if sum(x_i)=N(N+1)/2 and product(x_i)=N! are sufficient conditions to guarantee a permutation, and what is the chance that a non-permutation generates a false positive (I would hope ~1/C for each C you try, but maybe not).
it's a permutation if and only if there are no duplicate values in the array, should be easy to check that in O(N)
Depending on how much space you have, relative to N, you might try using hashing and buckets.
That is, iterate over the entire list, hash each element, and store it in a bucket. You'll need to find a way to reduce bucket collisions from the hashes, but that is a solved problem.
If an element tries to go into a bucket that already holds an item identical to it, you have found a duplicate, so the array is not a permutation.
This type of solution would be O(N) as you touch each element only once.
However, the problem with this is whether space M is larger than N or not. If M > N, this solution will be fine, but if M < N, then you will not be able to solve the problem with 100% accuracy.
First, an information theoretic reason why this may be possible. We can trivially check that the numbers in the array are in bounds in O(N) time and O(1) space. To specify any such array of in-bounds numbers requires N log N bits of information. But to specify a permutation requires approximately (N log N) - N bits of information (Stirling's approximation). Thus, if we could acquire N bits of information during testing, we might be able to know the answer. This is trivial to do in N time (in fact, with M static space we can pretty easily acquire log M information per step, and under special circumstances we can acquire log N information).
On the other hand, we only get to store something like M log N bits of information in our static storage space, which is presumably much less than N, so it depends greatly what the shape of the decision surface is between "permutation" and "not".
I think that this is almost possible but not quite given the problem setup. I think one is "supposed" to use the cycling trick (as in the link that Iulian mentioned), but the key assumption of having a tail in hand fails here because you can index the last element of the array with a permutation.
The sum and the product will not guarantee the correct answer, since these hashes are subject to collisions, i.e. different inputs might potentially produce identical results. If you want a perfect hash, a single-number result that actually fully describes the numerical composition of the array, it might be the following.
Imagine that for any number i in [1, N] range you can produce a unique prime number P(i) (for example, P(i) is the i-th prime number). Now all you need to do is calculate the product of all P(i) for all numbers in your array. The product will fully and unambiguously describe the composition of your array, disregarding the ordering of values in it. All you need to do is to precalculate the "perfect" value (for a permutation) and compare it with the result for a given input :)
Of course, the algorithm like this does not immediately satisfy the posted requirements. But at the same time it is intuitively too generic: it allows you to detect a permutation of absolutely any numerical combination in an array. In your case you need to detect a permutation of a specific combination 1, 2, ..., N. Maybe this can somehow be used to simplify things... Probably not.
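As an illustration only, the prime-product fingerprint might be sketched in Python like this. The names `first_primes`, `prime_fingerprint` and `is_permutation_by_primes` are invented for the example, the prime table is built by naive trial division, and both the table and the product take far more than constant space, so this does not satisfy the posted requirements.

```python
def first_primes(k):
    """Return the first k primes by trial division against earlier primes."""
    primes = []
    candidate = 2
    while len(primes) < k:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_fingerprint(values, primes):
    """Product of P(i) over all values i, where P(i) is the i-th prime."""
    result = 1
    for x in values:
        result *= primes[x - 1]
    return result


def is_permutation_by_primes(a):
    n = len(a)
    if any(not 1 <= x <= n for x in a):
        return False
    primes = first_primes(n)
    # Unique factorization: equal products mean equal multisets of values.
    return prime_fingerprint(a, primes) == prime_fingerprint(range(1, n + 1), primes)
```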
Alright, this is different, but it appears to work!
I ran this test program (C#):
static void Main(string[] args) {
for (int j = 3; j < 100; j++) {
int x = 0;
for (int i = 1; i <= j; i++) {
x ^= i;
}
Console.WriteLine("j: " + j + "\tx: " + x + "\tj%4: " + (j % 4));
}
}
Short explanation: x is the result of all the XORs for a single list, i is the element in a particular list, and j is the size of the list. Since all I'm doing is XOR, the order of the elements doesn't matter. But I'm looking at what correct permutations look like when this is applied.
If you look at j%4, you can do a switch on that value and get something like this:
bool IsPermutation = false;
switch (j % 4) {
case 0:
IsPermutation = (x == j);
break;
case 1:
IsPermutation = (x == 1);
break;
case 2:
IsPermutation = (x == j + 1);
break;
case 3:
IsPermutation = (x == 0);
break;
}
Now I acknowledge that this probably requires some fine tuning. It's not 100%, but it's a good easy way to get started. Maybe with some small checks running throughout the XOR loop, this could be perfected. Try starting somewhere around there.
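The same j % 4 pattern, written as a Python sketch so it can be checked against a brute-force XOR (the names `xor_1_to`, `xor_brute` and `passes_xor_check` are made up for this illustration):

```python
def xor_1_to(j):
    """Closed form for 1 ^ 2 ^ ... ^ j, following the j % 4 pattern."""
    return (j, 1, j + 1, 0)[j % 4]


def xor_brute(j):
    """Direct computation of 1 ^ 2 ^ ... ^ j, for comparison."""
    x = 0
    for i in range(1, j + 1):
        x ^= i
    return x


def passes_xor_check(a):
    """True if the XOR of all elements matches that of 1..len(a)."""
    x = 0
    for v in a:
        x ^= v
    return x == xor_1_to(len(a))


print(passes_xor_check([3, 1, 2, 5, 4]))   # a real permutation of 1..5
print(passes_xor_check([1, 2, 2, 3, 3]))   # not a permutation of 1..5
```

Both calls pass the check, because the repeated 2s and 3s cancel each other out, which is why the XOR test alone can only rule arrays out, never confirm them.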
It looks like this is asking to find a duplicate in an array with a stack machine.
It sounds impossible to know the full history of the stack while you extract each number and have only limited knowledge of the numbers that were already taken out.
Here's proof it can't be done:
Suppose by some artifice you have detected no duplicates in all but the last cell. Then the problem reduces to checking if that last cell contains a duplicate.
If you have no structured representation of the problem state so far, then you are reduced to performing a linear search over the entire previous input, for EACH cell. It's easy to see how this leaves you with a quadratic-time algorithm.
Now, suppose through some clever data structure that you actually know which number you expect to see last. Then certainly that knowledge takes at least enough bits to store the number you seek -- perhaps one memory cell? But there is a second-to-last number and a second-to-last sub-problem: then you must similarly represent a set of two possible numbers yet-to-be-seen. This certainly requires more storage than encoding only for one remaining number. By a progression of similar arguments, the size of the state must grow with the size of the problem, unless you're willing to accept a quadratic-time worst-case.
This is the time-space trade-off. You can have quadratic time and constant space, or linear time and linear space. You cannot have linear time and constant space.
Check out the following solution. It uses O(1) additional space.
It alters the array during the checking process, but returns it back to its initial state at the end.
The idea is:
Check if any of the elements is out of the range [1, n] => O(n).
Go over the numbers in order (all of them are now assured to be in the range [1, n]), and for each number x (e.g. 3):
go to the x'th cell (e.g. a[3]), if it's negative, then someone already visited it before you => Not permutation. Otherwise (a[3] is positive), multiply it by -1.
=> O(n).
Go over the array and negate all negative numbers.
This way, we know for sure that all elements are in the range [1, n], and that there are no duplicates => The array is a permutation.
int is_permutation_linear(int a[], int n) {
int i, is_permutation = 1;
// Step 1.
for (i = 0; i < n; ++i) {
if (a[i] < 1 || a[i] > n) {
return 0;
}
}
// Step 2.
for (i = 0; i < n; ++i) {
if (a[abs(a[i]) - 1] < 0) {
is_permutation = 0;
break;
}
a[abs(a[i]) - 1] *= -1;
}
// Step 3.
for (i = 0; i < n; ++i) {
if (a[i] < 0) {
a[i] *= -1;
}
}
return is_permutation;
}
Here is the complete program that tests it:
/*
* is_permutation_linear.c
*
* Created on: Dec 27, 2011
* Author: Anis
*/
#include <stdio.h>
int abs(int x) {
return x >= 0 ? x : -x;
}
int is_permutation_linear(int a[], int n) {
int i, is_permutation = 1;
for (i = 0; i < n; ++i) {
if (a[i] < 1 || a[i] > n) {
return 0;
}
}
for (i = 0; i < n; ++i) {
if (a[abs(a[i]) - 1] < 0) {
is_permutation = 0;
break;
}
a[abs(a[i]) - 1] *= -1;
}
for (i = 0; i < n; ++i) {
if (a[i] < 0) {
a[i] *= -1;
}
}
return is_permutation;
}
void print_array(int a[], int n) {
int i;
for (i = 0; i < n; i++) {
printf("%2d ", a[i]);
}
}
int main() {
int arrays[9][8] = { { 1, 2, 3, 4, 5, 6, 7, 8 },
{ 8, 6, 7, 2, 5, 4, 1, 3 },
{ 0, 1, 2, 3, 4, 5, 6, 7 },
{ 1, 1, 2, 3, 4, 5, 6, 7 },
{ 8, 7, 6, 5, 4, 3, 2, 1 },
{ 3, 5, 1, 6, 8, 4, 7, 2 },
{ 8, 3, 2, 1, 4, 5, 6, 7 },
{ 1, 1, 1, 1, 1, 1, 1, 1 },
{ 1, 8, 4, 2, 1, 3, 5, 6 } };
int i;
for (i = 0; i < 9; i++) {
printf("array: ");
print_array(arrays[i], 8);
printf("is %spermutation.\n",
is_permutation_linear(arrays[i], 8) ? "" : "not ");
printf("after: ");
print_array(arrays[i], 8);
printf("\n\n");
}
return 0;
}
And its output:
array: 1 2 3 4 5 6 7 8 is permutation.
after: 1 2 3 4 5 6 7 8
array: 8 6 7 2 5 4 1 3 is permutation.
after: 8 6 7 2 5 4 1 3
array: 0 1 2 3 4 5 6 7 is not permutation.
after: 0 1 2 3 4 5 6 7
array: 1 1 2 3 4 5 6 7 is not permutation.
after: 1 1 2 3 4 5 6 7
array: 8 7 6 5 4 3 2 1 is permutation.
after: 8 7 6 5 4 3 2 1
array: 3 5 1 6 8 4 7 2 is permutation.
after: 3 5 1 6 8 4 7 2
array: 8 3 2 1 4 5 6 7 is permutation.
after: 8 3 2 1 4 5 6 7
array: 1 1 1 1 1 1 1 1 is not permutation.
after: 1 1 1 1 1 1 1 1
array: 1 8 4 2 1 3 5 6 is not permutation.
after: 1 8 4 2 1 3 5 6
The Java solution below answers the question partly. I believed the time complexity was O(n), since the solution doesn't contain nested loops, but the call to Arrays.sort makes it O(n log n) overall. About memory -- not sure. This question appears first on relevant requests in Google, so it can probably be useful for somebody.
public static boolean isPermutation(int[] array) {
boolean result = true;
array = removeDuplicates(array);
int startValue = 1;
for (int i = 0; i < array.length; i++) {
if (startValue + i != array[i]){
return false;
}
}
return result;
}
public static int[] removeDuplicates(int[] input){
Arrays.sort(input);
List<Integer> result = new ArrayList<Integer>();
int current = input[0];
boolean found = false;
for (int i = 0; i < input.length; i++) {
if (current == input[i] && !found) {
found = true;
} else if (current != input[i]) {
result.add(current);
current = input[i];
found = false;
}
}
result.add(current);
int[] array = new int[result.size()];
for (int i = 0; i < array.length ; i ++){
array[i] = result.get(i);
}
return array;
}
public static void main (String ... args){
int[] input = new int[] { 4,2,3,4,1};
System.out.println(isPermutation(input));
//output true
input = new int[] { 4,2,4,1};
System.out.println(isPermutation(input));
//output false
}
int solution(int A[], int N) {
int i,j,count=0, d=0, temp=0,max;
for(i=0;i<N-1;i++) {
for(j=0;j<N-i-1;j++) {
if(A[j]>A[j+1]) {
temp = A[j+1];
A[j+1] = A[j];
A[j] = temp;
}
}
}
max = A[N-1];
for(i=N-1;i>=0;i--) {
if(A[i]==max) {
count++;
}
else {
d++;
}
max = max-1;
}
if(d!=0) {
return 0;
}
else {
return 1;
}
}

Algorithm to determine if array contains n...n+m?

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions:
Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true. {1,3,1} would return false, {1,2,4} would return false.)
The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out.
Along with your solutions please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...)
edit: I am now of the opinion that there does not exist a linear in time and constant in space algorithm that handles duplicates. Can anyone verify this?
The duplicate problem boils down to testing to see if the array contains duplicates in O(n) time, O(1) space. If this can be done you can simply test first and if there are no duplicates run the algorithms posted. So can you test for dupes in O(n) time O(1) space?
Under the assumption numbers less than one are not allowed and there are no duplicates, there is a simple summation identity for this - the sum of numbers from 1 to m in increments of 1 is (m * (m + 1)) / 2. You can then sum the array and use this identity.
You can find out if there is a dupe under the above guarantees, plus the guarantee no number is above m or less than n (which can be checked in O(N))
The idea in pseudo-code:
0) Start at N = 0
1) Take the N-th element in the list.
2) If it is not in the right place if the list had been sorted, check where it should be.
3) If the place where it should be already has the same number, you have a dupe - RETURN TRUE
4) Otherwise, swap the numbers (to put the first number in the right place).
5) With the number you just swapped with, is it in the right place?
6) If no, go back to step two.
7) Otherwise, start at step one with N = N + 1. If this would be past the end of the list, you have no dupes.
And, yes, that runs in O(N) although it may look like O(N ^ 2)
Note to everyone (stuff collected from comments)
This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).
Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)
New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):
So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that it only becomes an issue when (x + y) = (n + m), such as the fact 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know this can't happen for n^3. But we also don't know if there is no x + y + z = n for this (unless we do and I don't know it). So no guarantee this, too, doesn't break - and if we continue down this path we quickly run out of bits.
In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.
I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:
[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6]
[2, 2, 3, 3, 4, 7, 7]
I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:
[1, 3, 3, 4, 4, 5, 8, 8]
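These counterexamples are easy to re-check with a few lines of Python (the helper name `sum_and_squares` is only for this illustration):

```python
def sum_and_squares(a):
    """Return (sum of elements, sum of squared elements)."""
    return sum(a), sum(x * x for x in a)


for seq in ([1, 2, 3, 4, 5, 6, 7],
            [1, 1, 4, 5, 5, 6, 6],
            [2, 2, 3, 3, 4, 7, 7]):
    print(seq, sum_and_squares(seq))

# The length-8 example against the true sequence 1..8:
print(sum_and_squares([1, 3, 3, 4, 4, 5, 8, 8]), sum_and_squares(range(1, 9)))
```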
Simpler approach (modifying hazzen's idea):
An integer array of length m contains all the numbers from n to n+m-1 exactly once iff
every array element is between n and n+m-1
there are no duplicates
(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)
If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):
For all array indexes i from 0 to m-1 do
  If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
  Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
  While j is not equal to i
    If array[i] is equal to array[j] => RETURN FALSE ("duplicate found")
    Swap array[i] with array[j]
    If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
    Recalculate j = array[i] - n
RETURN TRUE
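A minimal Python sketch of this swap-into-place check, assuming the caller is allowed to modify the list. The name `is_consecutive_run` is my own, and the sketch re-checks the range after every swap, because the value swapped into position i has not been examined yet.

```python
def is_consecutive_run(arr, n):
    """True if arr holds n, n+1, ..., n+len(arr)-1, each exactly once.

    Reorders `arr` in place; uses O(1) extra space.
    """
    m = len(arr)
    for i in range(m):
        while True:
            if not n <= arr[i] < n + m:
                return False              # value out of range
            j = arr[i] - n                # where arr[i] belongs
            if j == i:
                break                     # already in its place
            if arr[i] == arr[j]:
                return False              # target slot holds the same value
            arr[i], arr[j] = arr[j], arr[i]
    return True


print(is_consecutive_run([2, 3, 4], 2))
print(is_consecutive_run([1, 3, 1], 1))
print(is_consecutive_run([1, 2, 4], 1))
```

Each swap puts one value into its final slot, so the total number of swaps is at most m and the whole thing stays linear even though the loops are nested.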
I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.
By working with a[i] % a.length instead of a[i] you reduce the problem to needing to determine that you've got the numbers 0 to a.length - 1.
We take this observation for granted and try to check if the array contains [0,m).
Find the first node that's not in its correct position, e.g.
0 1 2 3 7 5 6 8 4 ; the original dataset (after the renaming we discussed)
^
`---this is position 4 and the 7 shouldn't be here
Swap that number into where it should be. i.e. swap the 7 with the 8:
0 1 2 3 8 5 6 7 4 ;
| `--------- 7 is in the right place.
`--------------- this is now the 'current' position
Now we repeat this. Looking again at our current position we ask:
"is this the correct number for here?"
If not, we swap it into its correct place.
If it is in the right place, we move right and do this again.
Following this rule again, we get:
0 1 2 3 4 5 6 7 8 ; 4 and 8 were just swapped
This will gradually build up the list correctly from left to right, and each number will be moved at most once, and hence this is O(n).
If there are dupes, we'll notice it as soon as there is an attempt to swap a number backwards in the list.
Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.
Simpler method:
Step 1, figure out if there are any duplicates. I'm not sure if this is possible in O(1) space. Anyway, return false if there are duplicates.
Step 2, iterate through the list, keep track of the lowest and highest items.
Step 3, Does (highest - lowest) equal m - 1? If so, return true.
Any one-pass algorithm requires Omega(n) bits of storage.
Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs
A A' - yes
B A' - no
since it cannot distinguish the first case from the second.
Q.E.D.
Vote me down if I'm wrong, but I think we can determine if there are duplicates or not using variance. Because we know the mean beforehand (n + (m-1)/2 or something like that) we can just sum up the numbers and square of difference to mean to see if the sum matches the equation (mn + m(m-1)/2) and the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.
EDIT: variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half is greater than the mean.
If there is a duplicate, a term in the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in mean. So the function returns true only if both the sum and the variance match the desired values, which we can compute beforehand.
Here's a working solution in O(n)
This is using the pseudocode suggested by Hazzen plus some of my own ideas. It works for negative numbers as well and doesn't require any sum-of-the-squares stuff.
function testArray($nums, $n, $m) {
// check the sum. PHP offers this array_sum() method, but it's
// trivial to write your own. O(n) here.
if (array_sum($nums) != ($m * ($m + 2 * $n - 1) / 2)) {
return false; // checksum failed.
}
for ($i = 0; $i < $m; ++$i) {
// check if the number is in the proper range
if ($nums[$i] < $n || $nums[$i] >= $n + $m) {
return false; // value out of range.
}
while (($shouldBe = $nums[$i] - $n) != $i) {
if ($nums[$shouldBe] == $nums[$i]) {
return false; // duplicate
}
$temp = $nums[$i];
$nums[$i] = $nums[$shouldBe];
$nums[$shouldBe] = $temp;
}
}
return true; // huzzah!
}
var_dump(testArray(array(1, 2, 3, 4, 5), 1, 5)); // true
var_dump(testArray(array(5, 4, 3, 2, 1), 1, 5)); // true
var_dump(testArray(array(6, 4, 3, 2, 0), 1, 5)); // false - out of range
var_dump(testArray(array(5, 5, 3, 2, 1), 1, 5)); // false - checksum fail
var_dump(testArray(array(1, 1, 3, 5, 5), 1, 5)); // false - dupe (checksum passes)
var_dump(testArray(array(-2, -1, 0, 1, 2), -2, 5)); // true
Awhile back I heard about a very clever sorting algorithm from someone who worked for the phone company. They had to sort a massive number of phone numbers. After going through a bunch of different sort strategies, they finally hit on a very elegant solution: they just created a bit array and treated the offset into the bit array as the phone number. They then swept through their database with a single pass, changing the bit for each number to 1. After that, they swept through the bit array once, spitting out the phone numbers for entries that had the bit set high.
Along those lines, I believe that you can use the data in the array itself as a meta data structure to look for duplicates. Worst case, you could have a separate array, but I'm pretty sure you can use the input array if you don't mind a bit of swapping.
I'm going to leave out the n parameter for time being, b/c that just confuses things - adding in an index offset is pretty easy to do.
Consider:
for i = 0 to m
if (a[a[i]]==a[i]) return false; // we have a duplicate
while (a[a[i]] > a[i]) swapArrayIndexes(a[i], i)
sum = sum + a[i]
next
if sum = (n+m-1)*m return true else return false
This isn't O(n) - probably closer to O(n Log n) - but it does provide for constant space and may provide a different vector of attack for the problem.
If we want O(n), then using an array of bytes and some bit operations will provide the duplication check with an extra n/8 bytes of memory used (that is n/32 words, assuming 32 bit ints, of course).
EDIT: The above algorithm could be improved further by adding the sum check to the inside of the loop, and check for:
if sum > (n+m-1)*m return false
that way it will fail fast.
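For the bit-array variant mentioned above, a Python sketch could look like this. The function name `has_all_values` is made up; it uses one bit per possible value, so the extra memory is linear in m and not constant.

```python
def has_all_values(a, n):
    """True if `a` holds n..n+len(a)-1 with no repeats, using a bit array."""
    m = len(a)
    bits = bytearray((m + 7) // 8)        # one bit per value in the range
    for x in a:
        k = x - n                         # offset of x inside the range
        if not 0 <= k < m:
            return False                  # value out of range
        byte, mask = k >> 3, 1 << (k & 7)
        if bits[byte] & mask:
            return False                  # bit already set: duplicate
        bits[byte] |= mask
    # m in-range values with no repeats must cover the whole range
    return True
```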
Assuming you know only the length of the array and you are allowed to modify the array it can be done in O(1) space and O(n) time.
The process has two straightforward steps.
1. "modulo sort" the array. [5,3,2,4] => [4,5,2,3] (O(2n))
2. Check that each value's neighbor is one higher than itself (modulo) (O(n))
All told you need at most 3 passes through the array.
The modulo sort is the 'tricky' part, but the objective is simple. Take each value in the array and store it at its own address (modulo length). This requires one pass through the array, looping over each location 'evicting' its value by swapping it to its correct location and moving in the value at its destination. If you ever move in a value which is congruent to the value you just evicted, you have a duplicate and can exit early.
Worst case, it's O(2n).
The check is a single pass through the array examining each value with its next highest neighbor. Always O(n).
Combined algorithm is O(n)+O(2n) = O(3n) = O(n)
Pseudocode from my solution:
foreach(values[])
while(values[i] not congruent to i)
to-be-evicted = values[i]
evict(values[i]) // swap to its 'proper' location
if(values[i]%length == to-be-evicted%length)
return false; // a 'duplicate' arrived when we evicted that number
end while
end foreach
foreach(values[])
if((values[i]+1)%length != values[i+1]%length)
return false
end foreach
I've included the java code proof of concept below, it's not pretty, but it passes all the unit tests I made for it. I call these a 'StraightArray' because they correspond to the poker hand of a straight (contiguous sequence ignoring suit).
public class StraightArray {
static int evict(int[] a, int i) {
int t = a[i];
a[i] = a[t%a.length];
a[t%a.length] = t;
return t;
}
static boolean isStraight(int[] values) {
for(int i = 0; i < values.length; i++) {
while(values[i]%values.length != i) {
int evicted = evict(values, i);
if(evicted%values.length == values[i]%values.length) {
return false;
}
}
}
for(int i = 0; i < values.length-1; i++) {
int n = (values[i]%values.length)+1;
int m = values[(i+1)]%values.length;
if(n != m) {
return false;
}
}
return true;
}
}
Hazzen's algorithm implementation in C
#include<stdio.h>
#define swapxor(a,i,j) a[i]^=a[j];a[j]^=a[i];a[i]^=a[j];
int check_ntom(int a[], int n, int m) {
int i = 0, j = 0;
for(i = 0; i < m; i++) {
if(a[i] < n || a[i] >= n+m) return 0; //invalid entry
j = a[i] - n;
while(j != i) {
if(a[i]==a[j]) return -1; //bucket already occupied. Dupe.
swapxor(a, i, j); //faster bitwise swap
j = a[i] - n;
if(a[i] < n || a[i] >= n+m) return 0; //[NEW] invalid entry
}
}
return 200; //OK
}
int main() {
int n=5, m=5;
int a[] = {6, 5, 7, 9, 8};
int r = check_ntom(a, n, m);
printf("%d", r);
return 0;
}
Edit: change made to the code to eliminate illegal memory access.
boolean determineContinuousArray(int *arr, int len)
{
// Suppose the array is like below:
//int arr[10] = {7,11,14,9,8,100,12,5,13,6};
//int len = sizeof(arr)/sizeof(int);
int n = arr[0];
int *result = new int[len];
for(int i=0; i< len; i++)
result[i] = -1;
for (int i=0; i < len; i++)
{
int cur = arr[i];
int hold ;
if ( arr[i] < n){
n = arr[i];
}
while(true){
if ( cur - n >= len){
cout << "array index out of range: meaning this is not a valid array" << endl;
return false;
}
else if ( result[cur - n] != cur){
hold = result[cur - n];
result[cur - n] = cur;
if (hold == -1) break;
cur = hold;
}else{
cout << "found duplicate number " << cur << endl;
return false;
}
}
}
cout << "this is a valid array" << endl;
for(int j=0 ; j< len; j++)
cout << result[j] << "," ;
cout << endl;
return true;
}
def test(a, n, m):
seen = [False] * m
for x in a:
if x < n or x >= n+m:
return False
if seen[x-n]:
return False
seen[x-n] = True
return False not in seen
print test([2, 3, 1], 1, 3)
print test([1, 3, 1], 1, 3)
print test([1, 2, 4], 1, 3)
Note that this only makes one pass through the first array, not considering the linear search involved in not in. :)
I also could have used a python set, but I opted for the straightforward solution where the performance characteristics of set need not be considered.
Update: Smashery pointed out that I had misparsed "constant amount of memory" and this solution doesn't actually solve the problem.
If you want to know the sum of the numbers [n ... n + m - 1] just use this equation.
var sum = m * (m + 2 * n - 1) / 2;
That works for any number, positive or negative, even if n is a decimal.
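As a quick sanity check, the closed form can be compared with a brute-force summation. This is only a minimal Python sketch for integer n; the function name `range_sum` is made up for illustration.

```python
def range_sum(n, m):
    """Closed-form sum of the m consecutive integers n, n+1, ..., n+m-1."""
    # m * (m + 2n - 1) is always even for integers, so the division is exact.
    return m * (m + 2 * n - 1) // 2

# Compare against a direct summation for a few ranges, including a negative n.
for n, m in [(1, 5), (5, 5), (-3, 13), (0, 1)]:
    assert range_sum(n, m) == sum(range(n, n + m))
```

Note that the closed form avoids the O(m) loop, but the result can still overflow a fixed-width integer in languages that have them.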
Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.
O(1) indicates constant space which does not change by the number of n. It does not matter if it is 1 or 2 variables as long as it is a constant number. Why are you saying it is more than O(1) space? If you are calculating the sum of n numbers by accumulating it in a temporary variable, you would be using exactly 1 variable anyway.
Commenting in an answer because the system does not allow me to write comments yet.
Update (in reply to comments): in this answer I meant O(1) space wherever "space" or "time" was omitted. The quoted text is part of an earlier answer to which this is a reply.
Given this -
Write a method that takes an int array of size m ...
I suppose it is fair to conclude there is an upper limit for m, equal to the value of the largest int (2^32 being typical). In other words, even though m is not specified as an int, the fact that the array can't have duplicates implies there can't be more than the number of values you can form out of 32 bits, which in turn implies m is limited to be an int also.
If such a conclusion is acceptable, then I propose to use a fixed space of (2^33 + 2) * 4 bytes = 34,359,738,376 bytes = 34.4GB to handle all possible cases. (Not counting the space required by the input array and its loop).
Of course, for optimization, I would first take m into account, and allocate only the actual amount needed, (2m+2) * 4 bytes.
If this is acceptable for the O(1) space constraint - for the stated problem - then let me proceed to an algorithmic proposal... :)
Assumptions: array of m ints, positive or negative, none greater than what 4 bytes can hold. Duplicates are handled. First value can be any valid int. Restrict m as above.
First, create an int array of length 2m-1, ary, and provide three int variables: left, diff, and right. Notice that makes 2m+2...
Second, take the first value from the input array and copy it to position m-1 in the new array. Initialize the three variables.
set ary[m-1] = nthVal // n=0
set left = diff = right = 0
Third, loop through the remaining values in the input array and do the following for each iteration:
set diff = nthVal - ary[m-1]
if (diff > m-1 + right || diff < 1-m + left) return false // out of bounds
if (ary[m-1+diff] != null) return false // duplicate
set ary[m-1+diff] = nthVal
if (diff>left) left = diff // constrains left bound further right
if (diff<right) right = diff // constrains right bound further left
I decided to put this in code, and it worked.
Here is a working sample using C#:
using System;
using System.Linq; // needed for inAry.Count()

public class Program
{
static bool puzzle(int[] inAry)
{
var m = inAry.Count();
var outAry = new int?[2 * m - 1];
int diff = 0;
int left = 0;
int right = 0;
outAry[m - 1] = inAry[0];
for (var i = 1; i < m; i += 1)
{
diff = inAry[i] - inAry[0];
if (diff > m - 1 + right || diff < 1 - m + left) return false;
if (outAry[m - 1 + diff] != null) return false;
outAry[m - 1 + diff] = inAry[i];
if (diff > left) left = diff;
if (diff < right) right = diff;
}
return true;
}
static void Main(string[] args)
{
var inAry = new int[3]{ 2, 3, 4 };
Console.WriteLine(puzzle(inAry));
inAry = new int[13] { -3, 5, -1, -2, 9, 8, 2, 3, 0, 6, 4, 7, 1 };
Console.WriteLine(puzzle(inAry));
inAry = new int[3] { 21, 31, 41 };
Console.WriteLine(puzzle(inAry));
Console.ReadLine();
}
}
note: this comment is based on the original text of the question (it has been corrected since)
If the question is posed exactly as written above (and it is not just a typo) and for array of size n the function should return (True/False) if the array consists of the numbers 1...n+1,
... then the answer will always be false because the array with all the numbers 1...n+1 will be of size n+1 and not n. hence the question can be answered in O(1). :)
Counter-example for XOR algorithm.
(can't post it as a comment)
@popopome
For a = {0, 2, 7, 5} it returns true (meaning that a is a permutation of the range [0, 4)), but it must return false in this case (a is obviously not a permutation of [0, 4)).
Another counter example: {0, 0, 1, 3, 5, 6, 6} -- all values are in range but there are duplicates.
I may have implemented popopome's idea (or the tests) incorrectly, so here is the code:
bool isperm_popopome(int m; int a[m], int m, int n)
{
/** O(m) in time (single pass), O(1) in space,
no restrictions on n,
no overflow,
a[] may be readonly
*/
int even_xor = 0;
int odd_xor = 0;
for (int i = 0; i < m; ++i)
{
if (a[i] % 2 == 0) // is even
even_xor ^= a[i];
else
odd_xor ^= a[i];
const int b = i + n;
if (b % 2 == 0) // is even
even_xor ^= b;
else
odd_xor ^= b;
}
return (even_xor == 0) && (odd_xor == 0);
}
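To make the two counter-examples easy to reproduce, here is a small Python port of the function above. It is only a sketch for checking the claim; the name `isperm_xor` is made up, and the logic is copied from the C version rather than from popopome's original description.

```python
def isperm_xor(a, n):
    """Python port of the even/odd XOR check above (illustrative only)."""
    even_xor = odd_xor = 0
    for i, v in enumerate(a):
        # Fold the array value into the accumulator matching its parity.
        if v % 2 == 0:
            even_xor ^= v
        else:
            odd_xor ^= v
        # Fold the expected value n + i in the same way.
        b = i + n
        if b % 2 == 0:
            even_xor ^= b
        else:
            odd_xor ^= b
    return even_xor == 0 and odd_xor == 0

print(isperm_xor([0, 2, 7, 5], 0))           # True, although 7 and 5 are out of range
print(isperm_xor([0, 0, 1, 3, 5, 6, 6], 0))  # True, although 0 and 6 are duplicated
print(isperm_xor([3, 1, 0, 2], 0))           # True, a genuine permutation of [0, 4)
```

The first two calls return the same answer as the genuine permutation, which is exactly why the XOR check alone cannot be trusted.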
A C version of b3's pseudo-code
(to avoid misinterpretation of the pseudo-code)
Counter example: {1, 1, 2, 4, 6, 7, 7}.
int pow_minus_one(int power)
{
return (power % 2 == 0) ? 1 : -1;
}
int ceil_half(int n)
{
return n / 2 + (n % 2);
}
bool isperm_b3_3(int m; int a[m], int m, int n)
{
/**
O(m) in time (single pass), O(1) in space,
doesn't use n
possible overflow in sum
a[] may be readonly
*/
int altsum = 0;
int mina = INT_MAX;
int maxa = INT_MIN;
for (int i = 0; i < m; ++i)
{
const int v = a[i] - n + 1; // [n, n+m-1] -> [1, m] to deal with n=0
if (mina > v)
mina = v;
if (maxa < v)
maxa = v;
altsum += pow_minus_one(v) * v;
}
return ((maxa-mina == m-1)
and ((pow_minus_one(mina + m-1) * ceil_half(mina + m-1)
- pow_minus_one(mina-1) * ceil_half(mina-1)) == altsum));
}
In Python:
def ispermutation(iterable, m, n):
"""Whether iterable and the range [n, n+m) have the same elements.
pre-condition: there are no duplicates in the iterable
"""
for i, elem in enumerate(iterable):
if not n <= elem < n+m:
return False
return i == m-1
print(ispermutation([1, 42], 2, 1) == False)
print(ispermutation(range(10), 10, 0) == True)
print(ispermutation((2, 1, 3), 3, 1) == True)
print(ispermutation((2, 1, 3), 3, 0) == False)
print(ispermutation((2, 1, 3), 4, 1) == False)
print(ispermutation((2, 1, 3), 2, 1) == False)
It is O(m) in time and O(1) in space. It does not take into account duplicates.
Alternate solution:
def ispermutation(iterable, m, n):
"""Same as above.
pre-condition: assert(len(list(iterable)) == m)
"""
return all(n <= elem < n+m for elem in iterable)
MY CURRENT BEST OPTION
def uniqueSet( array )
check_index = 0;
check_value = 0;
min = array[0];
array.each_with_index{ |value,index|
check_index = check_index ^ ( 1 << index );
check_value = check_value ^ ( 1 << value );
min = value if value < min
}
check_index = check_index << min;
return check_index == check_value;
end
O(n) time and O(1) space
I wrote a script to brute force combinations that could fail that and it didn't find any.
If you have an array which contravenes this function do tell. :)
@J.F. Sebastian
It's not a true hashing algorithm. Technically, it's a highly efficient packed boolean array of "seen" values.
ci = 0, cv = 0
[5,4,3]{
i = 0
v = 5
1 << 0 == 000001
1 << 5 == 100000
0 ^ 000001 = 000001
0 ^ 100000 = 100000
i = 1
v = 4
1 << 1 == 000010
1 << 4 == 010000
000001 ^ 000010 = 000011
100000 ^ 010000 = 110000
i = 2
v = 3
1 << 2 == 000100
1 << 3 == 001000
000011 ^ 000100 = 000111
110000 ^ 001000 = 111000
}
min = 3
000111 << 3 == 111000
111000 === 111000
The point of this is mostly that, in order to "fake" most of the problem cases, one uses duplicates. In this system, XOR penalises you for using the same value twice and treats it as if you had used it 0 times.
The caveats here being of course:
both the input array length and the maximum array value are limited by the maximum value for $x in ( 1 << $x > 0 )
ultimate effectiveness depends on how your underlying system implements the abilities to:
shift 1 bit n places left.
xor 2 registers. ( where 'registers' may, depending on implementation, span several registers )
edit
Noted: the statements above seem confusing. They assume a perfect machine, where an "integer" is a register with infinite precision that can still perform a ^ b in O(1) time.
But failing these assumptions, one has to start asking about the algorithmic complexity of simple math.
How complex is 1 == 1? Surely that should be O(1) every time, right?
What about 2^32 == 2^32?
O(1)? 2^33 == 2^33? Now you've got a question of register size and the underlying implementation.
Fortunately XOR and == can be done in parallel, so if one assumes infinite precision and a machine designed to cope with infinite precision, it is safe to assume XOR and == take constant time regardless of their value (because it's infinite width, it will have infinite 0 padding). Obviously this doesn't exist. But also, changing 000000 to 000100 is not increasing memory usage.
Yet on some machines, ( 1 << 32 ) << 1 will consume more memory, but how much is uncertain.
A C version of Kent Fredric's Ruby solution
(to facilitate testing)
Counter-example (for C version): {8, 33, 27, 30, 9, 2, 35, 7, 26, 32, 2, 23, 0, 13, 1, 6, 31, 3, 28, 4, 5, 18, 12, 2, 9, 14, 17, 21, 19, 22, 15, 20, 24, 11, 10, 16, 25}. Here n=0, m=35. This sequence misses 34 and has two 2.
It is an O(m) in time and O(1) in space solution.
Out-of-range values are easily detected in O(n) in time and O(1) in space, therefore tests are concentrated on in-range (means all values are in the valid range [n, n+m)) sequences. Otherwise {1, 34} is a counter example (for C version, sizeof(int)==4, standard binary representation of numbers).
The main difference between C and Ruby version:
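<< with a shift count of at least the width of int is undefined behaviour in C; on x86 the count is typically masked to its low bits, so the value appears to rotate,
but in Ruby numbers will grow to accommodate the result, e.g.,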
Ruby: 1 << 100 # -> 1267650600228229401496703205376
C: int n = 100; 1 << n // -> 16
In Ruby: check_index ^= 1 << i; is equivalent to check_index.setbit(i). The same effect could be implemented in C++: vector<bool> v(m); v[i] = true;
bool isperm_fredric(int m; int a[m], int m, int n)
{
/**
O(m) in time (single pass), O(1) in space,
no restriction on n,
?overflow?
a[] may be readonly
*/
int check_index = 0;
int check_value = 0;
int min = a[0];
for (int i = 0; i < m; ++i) {
check_index ^= 1 << i;
check_value ^= 1 << (a[i] - n); //
if (a[i] < min)
min = a[i];
}
check_index <<= min - n; // min and n may differ e.g.,
// {1, 1}: min=1, but n may be 0.
return check_index == check_value;
}
Values of the above function were tested against the following code:
bool *seen_isperm_trusted = NULL;
bool isperm_trusted(int m; int a[m], int m, int n)
{
/** O(m) in time, O(m) in space */
for (int i = 0; i < m; ++i) // could be memset(s_i_t, 0, m*sizeof(*s_i_t));
seen_isperm_trusted[i] = false;
for (int i = 0; i < m; ++i) {
if (a[i] < n or a[i] >= n + m)
return false; // out of range
if (seen_isperm_trusted[a[i]-n])
return false; // duplicates
else
seen_isperm_trusted[a[i]-n] = true;
}
return true; // a[] is a permutation of the range: [n, n+m)
}
Input arrays are generated with:
void backtrack(int m; int a[m], int m, int nitems)
{
/** generate all permutations with repetition for the range [0, m) */
if (nitems == m) {
(void)test_array(a, nitems, 0); // {0, 0}, {0, 1}, {1, 0}, {1, 1}
}
else for (int i = 0; i < m; ++i) {
a[nitems] = i;
backtrack(a, m, nitems + 1);
}
}
The answer from "nickf" does not work if the array is unsorted
var_dump(testArray(array(5, 3, 1, 2, 4), 1, 5)); //gives "duplicates" !!!!
Also your formula to compute sum([n...n+m-1]) looks incorrect....
the correct formula is (m(m+1)/2 - n(n-1)/2)
An array contains N numbers, and you want to determine whether two of the
numbers sum to a given number K. For instance, if the input is 8, 4, 1, 6 and K is 10,
the answer is yes (4 and 6). A number may be used twice. Do the following.
a. Give an O(N^2) algorithm to solve this problem.
b. Give an O(N log N) algorithm to solve this problem. (Hint: Sort the items first.
After doing so, you can solve the problem in linear time.)
c. Code both solutions and compare the running times of your algorithms.
Product of m consecutive numbers is divisible by m! [ m factorial ]
so in one pass you can compute the product of the m numbers, also compute m! and see if the product modulo m ! is zero at the end of the pass
I might be missing something but this is what comes to my mind ...
something like this in python
my_list1 = [9,5,8,7,6]
my_list2 = [3,5,4,7]
def consecutive(my_list):
count = 0
prod = fact = 1
for num in my_list:
prod *= num
count +=1
fact *= count
if not prod % fact:
return 1
else:
return 0
print consecutive(my_list1)
print consecutive(my_list2)
HotPotato ~$ python m_consecutive.py
1
0
I propose the following:
Choose a finite set of prime numbers P_1,P_2,...,P_K, and compute the occurrences of the elements in the input sequence (minus the minimum) modulo each P_i. The pattern of a valid sequence is known.
For example for a sequence of 17 elements, modulo 2 we must have the profile: [9 8], modulo 3: [6 6 5], modulo 5: [4 4 3 3 3], etc.
Combining the test using several bases we obtain a more and more precise probabilistic test. Since the entries are bounded by the integer size, there exists a finite base providing an exact test. This is similar to probabilistic pseudo primality tests.
S_i is an int array of size P_i, initially filled with 0, i=1..K
M is the length of the input sequence
Mn = INT_MAX
Mx = INT_MIN
for x in the input sequence:
for i in 1..K: S_i[x % P_i]++ // count occurrences mod Pi
Mn = min(Mn,x) // update min
Mx = max(Mx,x) // and max
if Mx-Mn != M-1: return False // Check bounds
for i in 1..K:
// Check profile mod P_i
Q = M / P_i
R = M % P_i
Check S_i[(Mn+j) % P_i] is Q+1 for j=0..R-1 and Q for j=R..P_i-1
if this test fails, return False
return True
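The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `profile_test` and the default choice of primes are assumptions, and with a small prime set the test stays probabilistic, as described above.

```python
def profile_test(seq, primes=(2, 3, 5, 7, 11, 13)):
    """Probabilistic check that seq is a permutation of consecutive integers."""
    m = len(seq)
    if m == 0:
        return False
    counts = [[0] * p for p in primes]  # one residue histogram per prime
    lo = hi = seq[0]
    for x in seq:
        for c, p in zip(counts, primes):
            c[x % p] += 1               # count occurrences mod p
        lo = min(lo, x)
        hi = max(hi, x)
    if hi - lo != m - 1:                # check bounds
        return False
    for c, p in zip(counts, primes):
        q, r = divmod(m, p)
        # Starting at lo, the first r residues occur q+1 times, the rest q times.
        for j in range(p):
            expected = q + 1 if j < r else q
            if c[(lo + j) % p] != expected:
                return False
    return True

print(profile_test(list(range(5, 22))))  # 17 consecutive values
print(profile_test([1, 1, 3]))           # duplicate caught by the mod-2 profile
```

Memory use depends only on the chosen primes, not on the length of the input, which is what keeps the extra space constant.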
Any contiguous array [ n, n+1, ..., n+m-1 ] can be mapped onto a 'base' interval [ 0, 1, ..., m-1 ] using the modulo operator. For each i in the array, there is exactly one i%m in the base interval and vice versa.
Any contiguous array also has a 'span' m (maximum - minimum + 1) equal to its size.
Using these facts, you can create an "encountered" boolean array of same size containing all falses initially, and while visiting the input array, put their related "encountered" elements to true.
This algorithm is O(n) in space, O(n) in time, and checks for duplicates.
def contiguous( values )
#initialization
encountered = Array.new( values.size, false )
min, max = nil, nil
visited = 0
values.each do |v|
index = v % encountered.size
if( encountered[ index ] )
return "duplicates";
end
encountered[ index ] = true
min = v if min == nil or v < min
max = v if max == nil or v > max
visited += 1
end
if ( max - min + 1 != values.size ) or visited != values.size
return "hole"
else
return "contiguous"
end
end
tests = [
[ false, [ 2,4,5,6 ] ],
[ false, [ 10,11,13,14 ] ] ,
[ true , [ 20,21,22,23 ] ] ,
[ true , [ 19,20,21,22,23 ] ] ,
[ true , [ 20,21,22,23,24 ] ] ,
[ false, [ 20,21,22,23,24+5 ] ] ,
[ false, [ 2,2,3,4,5 ] ]
]
tests.each do |t|
result = contiguous( t[1] )
if( t[0] != ( result == "contiguous" ) )
puts "Failed Test : " + t[1].to_s + " returned " + result
end
end
I like Greg Hewgill's idea of Radix sorting. To find duplicates, you can sort in O(N) time given the constraints on the values in this array.
For an in-place O(1) space O(N) time that restores the original ordering of the list, you don't have to do an actual swap on that number; you can just mark it with a flag:
//Java: assumes all numbers in arr > 0
boolean checkArrayConsecutiveRange(int[] arr) {
// find min/max
int min = arr[0]; int max = arr[0];
for (int i=1; i<arr.length; i++) {
min = (arr[i] < min ? arr[i] : min);
max = (arr[i] > max ? arr[i] : max);
}
// arr.length consecutive values span exactly arr.length-1
if (max-min != arr.length-1) return false;
// flag and check
boolean ret = true;
for (int i=0; i<arr.length; i++) {
int targetI = Math.abs(arr[i])-min;
if (arr[targetI] < 0) {
ret = false;
break;
}
arr[targetI] = -arr[targetI];
}
for (int i=0; i<arr.length; i++) {
arr[i] = Math.abs(arr[i]);
}
return ret;
}
Storing the flags inside the given array is kind of cheating, and doesn't play well with parallelization. I'm still trying to think of a way to do it without touching the array in O(N) time and O(log N) space. Checking against the sum and against the sum of least squares (arr[i] - arr.length/2.0)^2 feels like it might work. The one defining characteristic we know about a 0...m array with no duplicates is that it's uniformly distributed; we should just check that.
Now if only I could prove it.
I'd like to note that the solution above involving factorial takes O(N) space to store the factorial itself. N! > 2^N (for N >= 4), which takes more than N bits to store.
Oops! I got caught up in a duplicate question and did not see the already identical solutions here. And I thought I'd finally done something original! Here is a historical archive of when I was slightly more pleased:
Well, I have no certainty if this algorithm satisfies all conditions. In fact, I haven't even validated that it works beyond a couple test cases I have tried. Even if my algorithm does have problems, hopefully my approach sparks some solutions.
This algorithm, to my knowledge, works in constant memory and scans the array three times. Perhaps an added bonus is that it works for the full range of integers, if that wasn't part of the original problem.
I am not much of a pseudo-code person, and I really think the code might simply make more sense than words. Here is an implementation I wrote in PHP. Take heed of the comments.
function is_permutation($ints) {
/* Gather some meta-data. These scans can
be done simultaneously */
$lowest = min($ints);
$length = count($ints);
$max_index = $length - 1;
$sort_run_count = 0;
/* I do not have any proof that running this sort twice
will always completely sort the array (of course only
intentionally happening if the array is a permutation) */
while ($sort_run_count < 2) {
for ($i = 0; $i < $length; ++$i) {
$dest_index = $ints[$i] - $lowest;
if ($i == $dest_index) {
continue;
}
if ($dest_index > $max_index) {
return false;
}
if ($ints[$i] == $ints[$dest_index]) {
return false;
}
$temp = $ints[$dest_index];
$ints[$dest_index] = $ints[$i];
$ints[$i] = $temp;
}
++$sort_run_count;
}
return true;
}
So there is an algorithm that takes O(n^2) time, does not require modifying the input array, and takes constant space.
First, assume that you know n and m. This is a linear operation, so it does not add any additional complexity. Next, assume there exists one element equal to n and one element equal to n+m-1 and all the rest are in [n, n+m). Given that, we can reduce the problem to having an array with elements in [0, m).
Now, since we know that the elements are bounded by the size of the array, we can treat each element as a node with a single link to another element; in other words, the array describes a directed graph. In this directed graph, if there are no duplicate elements, every node belongs to a cycle, that is, a node is reachable from itself in m or less steps. If there is a duplicate element, then there exists one node that is not reachable from itself at all.
So, to detect this, you walk the entire array from start to finish and determine if each element returns to itself in <=m steps. If any element is not reachable in <=m steps, then you have a duplicate and can return false. Otherwise, when you finish visiting all elements, you can return true:
for (int start_index= 0; start_index<m; ++start_index)
{
int steps= 1;
int current_element_index= arr[start_index];
while (steps<m+1 && current_element_index!=start_index)
{
current_element_index= arr[current_element_index];
++steps;
}
if (steps>m)
{
return false;
}
}
return true;
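For anyone who wants to try the cycle walk directly, here is the same loop as a self-contained Python function. It is a sketch under the assumption stated above, namely that the values have already been reduced to [0, m); the function name is made up.

```python
def has_no_duplicates(arr):
    """arr must contain only values in [0, len(arr)); arr is not modified."""
    m = len(arr)
    for start in range(m):
        steps = 1
        current = arr[start]
        # Follow the links until we return to start or run out of steps.
        while steps <= m and current != start:
            current = arr[current]
            steps += 1
        if steps > m:
            return False  # start is on no cycle, so some value repeats
    return True

print(has_no_duplicates([1, 2, 3, 0, 5, 6, 7, 4]))  # two cycles of length 4
print(has_no_duplicates([1, 1, 0]))                 # index 0 never returns to itself
```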
You can optimize this by storing additional information:
Record sum of the length of the cycle from each element, unless the cycle visits an element before that element, call it sum_of_steps.
For every element, only step m-sum_of_steps nodes out. If you don't return to the starting element and you don't visit an element before the starting element, you have found a loop containing duplicate elements and can return false.
This is still O(n^2), e.g. {1, 2, 3, 0, 5, 6, 7, 4}, but it's a little bit faster.
ciphwn has it right: it is all to do with statistics. What the question is asking, in statistical terms, is whether or not the sequence of numbers forms a discrete uniform distribution. A discrete uniform distribution is one where all values of a finite set of possible values are equally probable. Fortunately there are some useful formulas to determine if a discrete set is uniform. Firstly, the mean of the set (a..b) is (a+b)/2 and the variance is (n^2 - 1)/12, where n = b - a + 1. Next, determine the variance of the given set:
variance = sum [i=1..n] (f(i) - mean)^2 / n
and then compare with the expected variance. This will require two passes over the data, once to determine the mean and again to calculate the variance.
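A minimal Python sketch of this comparison is given below. It uses exact fractions to avoid floating-point rounding; the function name `matches_uniform_moments` is made up. Matching the range, mean and variance is a necessary condition for a permutation of consecutive integers, and the sketch makes no attempt to show that it is also sufficient.

```python
from fractions import Fraction

def matches_uniform_moments(values):
    """Compare range, mean and variance with those of n consecutive integers."""
    n = len(values)
    if n == 0:
        return False
    lo, hi = min(values), max(values)
    if hi - lo != n - 1:                 # span must equal the size
        return False
    mean = Fraction(sum(values), n)      # first pass: mean
    if mean != Fraction(lo + hi, 2):
        return False
    # Second pass: population variance, computed exactly.
    variance = sum((x - mean) ** 2 for x in values) / n
    return variance == Fraction(n * n - 1, 12)

print(matches_uniform_moments([3, 1, 2]))  # mean 2, variance 2/3
print(matches_uniform_moments([1, 1, 3]))  # mean 5/3 does not match
```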
References:
uniform discrete distribution
variance
Here is a solution in O(N) time and O(1) extra space for finding duplicates :-
public static boolean check_range(int arr[],int n,int m) {
for(int i=0;i<m;i++) {
arr[i] = arr[i] - n;
if(arr[i]<0 || arr[i]>=m) // values below n are out of range too
return(false);
}
System.out.println("In range");
int j=0;
while(j<m) {
System.out.println(j);
if(arr[j]<m) {
if(arr[arr[j]]<m) {
int t = arr[arr[j]];
arr[arr[j]] = arr[j] + m;
arr[j] = t;
if(j==arr[j]) {
arr[j] = arr[j] + m;
j++;
}
}
else return(false);
}
else j++;
}
return(true); // every slot was marked exactly once
}
Explanation:-
Bring each number into the range [0, m-1] by arr[i] = arr[i] - n; if it is out of range, return false.
For each j, check whether arr[arr[j]] is unoccupied, that is, whether it has a value less than m.
If so, swap(arr[j], arr[arr[j]]) and set arr[arr[j]] = arr[arr[j]] + m to signal that it is occupied.
If arr[j] == j, simply add m and increment j.
If arr[arr[j]] >= m, the slot is occupied, hence the current value is a duplicate, so return false.
If arr[j] >= m, skip it.

Resources