Tips for finding patterns in an array - arrays

I have an array of 256 values. Those 256 values were calculated in some mysterious way, and range from 0-3 inclusive. To increase the efficiency of my program, I can calculate the results of the array given an index, rather than actually looking up in the array.
Basically, the program gives me an index, which would be looked up in the array, but I know that I can actually calculate what will be in that index using the index number itself.
For example
a[0] = 3, a[1] = 2, a[2] = 1, ... , a[254] = 1, a[255] = 1
I'm not actually asking for the calculation here, but looking at every number in the array, what are some tips on figuring out the pattern? I apologize if this is poorly worded, I'll attempt to clear up any questions.

There likely isn't a general approach to solving this problem without having some idea about the function that generated the data. You mentioned "efficiency" — if there really are only 256 values and the function to generate the data has any kind of computational complexity, it's probably more efficient to just keep it as an array.

Related

algorithm to find missing values in an array .. in place?

I had a job interview today in which I was given an array of size n and the objective was to find the missing values in the array.
Input:
arr = [9,7,1,3,4,2,5]
Output:
6, 8
The input arr contains elements only from 1 till n. Note that the array is not sorted.
I got the "naive" version of the program done quickly, but it had O(n^2) time complexity. I came up with a solution using a hash table (a Python dictionary in my case) with both time and space complexity of O(n).
But then I was asked to do it in place with no extra space; the extra arrays and dictionary I used in the solutions above were not allowed.
Any ideas on how to do this with O(n) time complexity and O(1) space complexity?
It could have been a trick question, given that "array like this" is not a sufficient specification, allowing for short cuts. And without short cuts (which make assumptions on the values in the array or other restrictions), the problem should be of sorting complexity (O(n log n)).
[2,2,2,2,2,2,2] is also an array "like this"? Or
[2,-2,2,-2,1000] ?
Depending on the "like-this-ishness" of what they imagine, there might or might not be short cuts.
So maybe, this job interview question was about how you query for missing information/requirements.
Assumption: The array consists of only integers in range [1,n] (there could be no other way to solve this question for a random number of missing values in O(N) without extra space).
The algorithm would be as follows:
for i in range 1 to n:
    index = abs(array[i])
    if array[index] > 0:
        array[index] *= -1
for i in range 1 to n:
    if array[i] > 0:
        print(i)
We use a simple observation here: for each element in the array, we make the item at index element negative. Then, we traverse the array again and see if we get any positive values, which means that index was never modified. If the index wasn't modified, then it was one of the missing values, so we print it.
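A minimal Python sketch of the marking idea described above, under the same assumption that the list has length n and every element is an integer in [1, n]. The function name `find_missing` is made up for illustration. Because Python lists are 0-indexed, the value v is marked at position v - 1, and the marks are undone before returning so the caller's list is left as it was.

```python
def find_missing(arr):
    """Return the values in 1..n that never occur in arr, where n = len(arr).

    Assumption: every element is an integer in [1, n]. The sign of
    arr[v - 1] is used as an "already seen v" flag, so no extra array
    or dictionary is needed apart from the returned result.
    """
    n = len(arr)
    for i in range(n):
        value = abs(arr[i])            # abs() because this slot may already be flagged
        if arr[value - 1] > 0:
            arr[value - 1] *= -1       # flag "value was seen"
    # Any slot that is still positive was never flagged, so its index + 1 is missing.
    missing = [i + 1 for i in range(n) if arr[i] > 0]
    for i in range(n):                 # restore the original values
        arr[i] = abs(arr[i])
    return missing


print(find_missing([2, 3, 2, 5, 5]))   # [1, 4]
```

If any value falls outside [1, n], the indexing breaks, which is exactly why the assumption has to be stated up front.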

What is algorithm to find K for finding medians in two sorted array in leetcode

The solution implementing find medians in two sorted array is awesome. However, I am still very confused about code to calculate K
var aMid = aLength * k / (aLength + bLength)
var bMid = k - aMid - 1
I guess this is the key part of the algorithm, and I really don't know why it is calculated like this. To explain more clearly what I mean: the core logic is divide and conquer, taking into account that lists of different sizes should be divided differently. I wonder why this formula works so well.
Can someone give me some insight about it. I searched lots of online documents and it is very hard to find materials to explain this part well.
Many thanks in advance
The link shows two different ways of computing the comparison points in each array: one always uses k/2, even if the array doesn't have that many elements; the other (which you quote) tries to distribute the comparison points based on the size of the arrays.
As can be seen from these two examples, neither of which is optimal, it doesn't make much difference how you compute the comparison points, as long as the size of the two components is generally linear in k (using a fixed size of 5 for one of the comparison points won't work, for example).
The algorithm effectively reduces the problem size by either aMid or bMid on each iteration. Ideally, the problem size would be reduced by k/2, and that's the computation you should use if both arrays have at least k/2 members. If one has too few members, you can set the comparison point for that array to its last element, and compute the other comparison point so that the total is k - 1. If you end up discarding all of the elements from some array, you can then immediately return element k of the other array.
That strategy will usually perform fewer iterations than either of the proposals in your link, but it is still O(log k).
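Here is a rough Python sketch of the k/2 strategy, not the code from the linked solution. It is a simplified variant: when one array has fewer than k/2 elements left, its comparison point is clamped to its last element, but the other comparison point is left at k/2 rather than being enlarged. The names `kth_smallest` and `median` are illustrative, and k is taken to be 1-based.

```python
def kth_smallest(a, b, k):
    """Return the k-th smallest (1-based) element of two sorted lists."""
    if not 1 <= k <= len(a) + len(b):
        raise ValueError("k out of range")
    i = j = 0                          # how many elements of a and b were discarded
    while True:
        if i == len(a):                # a is used up: answer is element k of what is left of b
            return b[j + k - 1]
        if j == len(b):
            return a[i + k - 1]
        if k == 1:
            return min(a[i], b[j])
        # Comparison points: k // 2 into each list, clamped to what remains.
        step_a = min(k // 2, len(a) - i)
        step_b = min(k // 2, len(b) - j)
        if a[i + step_a - 1] <= b[j + step_b - 1]:
            i += step_a                # these elements all rank below k, drop them
            k -= step_a
        else:
            j += step_b
            k -= step_b


def median(a, b):
    """Median of the combined contents of two sorted lists."""
    n = len(a) + len(b)
    if n % 2:
        return kth_smallest(a, b, n // 2 + 1)
    return (kth_smallest(a, b, n // 2) + kth_smallest(a, b, n // 2 + 1)) / 2


print(kth_smallest([1, 3, 5], [2, 4, 6], 4))   # 4
print(median([1, 2], [3, 4]))                  # 2.5
```

Each iteration discards at least one element and, when both lists are long enough, about k/2 of them, which is where the logarithmic bound comes from.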

Algorithm - What is the best algorithm for detecting duplicate numbers in small array?

What is the best algorithm for detecting duplicate numbers in an array, the best in terms of speed, memory, and avoiding overhead?
A small array like [5,9,13,3,2,5,6,7,1]. Note that 5 is duplicated.
After searching and reading about sorting algorithms, I realized that I will use one of these algorithms, Quick Sort, Insertion Sort or Merge Sort.
But actually I am really confused about what to use in my case which is a small array.
Thanks in advance.
To be honest, with that size of array, you may as well choose the O(n^2) solution (checking every element against every other element).
You'll generally only need to worry about performance if/when the array gets larger. For small data sets like this, you could well have found the duplicate with an 'inefficient' solution before the sort phase of an efficient solution will have finished :-)
In other words, you can use something like (pseudo-code):
for idx1 = 0 to nums.len - 2 inclusive:
    for idx2 = idx1 + 1 to nums.len - 1 inclusive:
        if nums[idx1] == nums[idx2]:
            return nums[idx1]
return no dups found
This finds the first value in the array which has a duplicate.
If you want an exhaustive list of duplicates, then just add the duplicate value to another (initially empty) array (once only per value) and keep going.
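As a sketch of that exhaustive variant in Python (the function name `find_duplicates` is made up for illustration), the same nested loop can collect each duplicated value once instead of returning at the first hit:

```python
def find_duplicates(nums):
    """Return every value that occurs more than once, each listed once."""
    dups = []
    for i in range(len(nums) - 1):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                if nums[i] not in dups:    # record each duplicated value only once
                    dups.append(nums[i])
                break                      # no need to look further for nums[i]
    return dups


print(find_duplicates([5, 9, 13, 3, 2, 5, 6, 7, 1]))   # [5]
```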
You can also sort it using any half-decent algorithm; for a data set of the size you're discussing, even a bubble sort would probably be adequate. Then you just process the sorted items sequentially, looking for runs of values, but that's probably overkill in your case.
There are two good approaches, and which one applies depends on whether or not you know the range from which the numbers are drawn.
Case 1: the range is known.
Suppose you know that all numbers are in the range [a, b[, thus the length of the range is l=b-a.
You can create an array A of length l filled with 0s, then iterate over the original array and, for each element e, increment the value of A[e-a] (here we are actually mapping the range onto [0,l[).
Once finished, you can iterate over A and find the duplicate numbers. In fact, if there exists i such that A[i] is greater than 1, it implies that i+a is a repeated number.
The same idea is behind counting sort, and it works fine also for your problem.
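A short Python sketch of Case 1, assuming all values are integers in the half-open range [a, b[. The function name `duplicates_in_range` is illustrative only.

```python
def duplicates_in_range(nums, a, b):
    """Return the duplicated values, assuming every value lies in [a, b)."""
    counts = [0] * (b - a)             # one counter per possible value
    for e in nums:
        counts[e - a] += 1             # map value e to slot e - a
    return [i + a for i, c in enumerate(counts) if c > 1]


print(duplicates_in_range([5, 9, 13, 3, 2, 5, 6, 7, 1], 1, 14))   # [5]
```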
Case 2: the range is not known.
Quite simple. Slightly modify the approach mentioned above: instead of an array, use a map where the keys are the numbers from your original array and the values are the number of times you find them. At the end, iterate over the set of keys and look for those that have been found more than once.
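In Python, Case 2 could look roughly like this, using `collections.Counter` from the standard library as the map (the function name is illustrative):

```python
from collections import Counter


def duplicates_unknown_range(nums):
    """Return the values that occur more than once, using a hash map of counts."""
    counts = Counter(nums)             # value -> number of occurrences
    return [value for value, c in counts.items() if c > 1]


print(duplicates_unknown_range([5, 9, 13, 3, 2, 5, 6, 7, 1]))   # [5]
```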
Note.
In both the cases above mentioned, the complexity should be O(N) and you cannot do better, for you have at least to visit all the stored values.
Look at the first example: we iterate over two arrays, the lengths of which are N and l<=N, thus the complexity is at max 2*N, that is O(N).
The second example is indeed a bit more complex and dependent on the implementation of the map, but for the sake of simplicity we can safely assume that it is O(N).
In memory, you are constructing data structures the sizes of which are proportional to the number of different values contained in the original array.
As usual, the choice is a trade-off between memory use and performance: the more memory you spend, the faster the algorithm, and vice versa. As suggested in another response, if you know that the array is small, you can safely rely on an O(N^2) algorithm that requires no additional memory at all.
Which is the best choice? Well, it depends on your problem, we cannot say.

Count distinct array entries [with no add memory nor array changes]

The task is to count the unique numbers in a given array. I saw numerous similar questions on SO, but here we have additional requirements that weren't stated in the other questions:
The amount of additional memory allowed is O(1)
Changes to the array are prohibited
I was able to write a quadratic algorithm that satisfies the given constraints. But I keep wondering: could one do better on such a problem? Thank you for your time.
Algorithm working with O(n^2)
def count(a):
    unique = len(a)
    ind = 0
    while ind < len(a):
        x = a[ind]
        i = ind+1
        while i < len(a):
            if a[i] == x:
                unique -= 1
                break
            i += 1
        ind += 1
    print("Total uniques: ", unique)
This is a very similar problem to a follow-up question in chapter 1 (Arrays and Strings) from Cracking the Coding Interview:
Implement an algorithm to determine if a string has all unique
characters. What if you cannot use additional data structures?
The answer (to the follow-up question) is that if you can't assume anything about the array (namely, it is not sorted, you don't know its size, etc.), then there is no algorithm better than what you showed.
That being said, you may think about relaxing the constraints a little bit, to make it more interesting. For example, if you have an upper bound on the array size, you could use a bit vector to keep track of which values you've read before while traversing the array, although this is not strictly an O(1) solution when it comes to memory usage (one could argue that by knowing the maximum array size, the memory usage is constant, and thus O(1), but that is a little bit of cheating). Similarly, if the array was sorted, you could also solve it in O(n) by going through each element at a time and check if its neighbors are different numbers.
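To illustrate the relaxed case where the array is already sorted, a single pass with O(1) extra memory is enough. This is only a sketch under that assumption, and `count_distinct_sorted` is a made-up name:

```python
def count_distinct_sorted(a):
    """Count distinct values in a list that is assumed to be sorted."""
    if not a:
        return 0
    distinct = 1                       # the first element always starts a new run
    for i in range(1, len(a)):
        if a[i] != a[i - 1]:           # a new run of equal values begins here
            distinct += 1
    return distinct


print(count_distinct_sorted([1, 1, 2, 3, 3, 3]))   # 3
```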
Because there is no underlying structure in the array given (sorted, etc.) you are forced to brute force every value in the array...
There is a more complicated approach that I believe would work. It entails keeping your array of unique numbers sorted. This means that it would take more time when inserting into the array but would allow you to look up values much more quickly. You should be able to find the insertion point in O(log n) time by looking at the value directly in the middle of the array and checking whether it's larger or smaller. You'd then eliminate half the array as a valid insertion location and repeat. You would use a similar approach to look up values in the array. The only issue with this is that it requires more memory than I believe you are allowed (O(1)).
That being said, I think the constraints on the task restrict the algorithm to O(n^2).

determine if an array has the numbers a to b each once [duplicate]

Given an array A of size n, and two numbers a and b with b-a+1=n, I need to determine whether or not A contains each of the numbers between a and b (exactly once).
For example, if n=4 and a=1,b=4, then I'm looking to see if A is a rearrangement of [1,2,3,4].
In practice, I need to do this with O(1) space (no hash table).
My first idea was to sort A, but I have to do this without rearranging A, so that's out.
My second idea is to run through A once, adding up the entries and checking that they are in the correct range. At the end, I have to get the right sum (for a=1,b=n, this is n(n+1)/2), but this doesn't always catch everything, e.g. [1,1,4,4,5] passes the test for n=5,a=1,b=5, but shouldn't.
The only idea of mine that works is to pass through the array n times making sure to see each number once and only once. Is there a faster solution?
You can do this with a single pass through the array, using only a minor modification of the n(n+1)/2 method you already mentioned.
To do so, walk through the array, ignoring elements outside the a..b range. For numbers that are in the correct range, you want to track three values: the sum of the numbers, the sum of the squares of the numbers, and the count of the numbers.
You can pre-figure the correct values for both the sum of numbers and the sum of the squares (and, trivially, the count).
Then compare your result to the expected results. Consider, for example, if you're searching for 1, 2, 3, 4. If you used only the sums of the numbers, then [1, 1, 4, 4] would produce the correct result (1+2+3+4 = 10, 1+1+4+4 = 10), but if you also add the sums of the squares, the problem is obvious: 1+4+9+16 = 30 but 1+1+16+16 = 34.
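A sketch of this single-pass check in Python, with an illustrative function name. Matching the count, the sum, and the sum of squares is a necessary condition only; some non-permutations may still pass.

```python
def passes_sum_checks(A, a, b):
    """Single pass over A comparing count, sum and sum of squares with a..b.

    Returns False whenever A is certainly not a permutation of a..b.
    A True result only means these three checks agree.
    """
    expected_sum = sum(range(a, b + 1))
    expected_squares = sum(x * x for x in range(a, b + 1))
    total = squares = count = 0
    for x in A:
        if a <= x <= b:                # ignore values outside the range
            total += x
            squares += x * x
            count += 1
    return (count == b - a + 1
            and total == expected_sum
            and squares == expected_squares)


print(passes_sum_checks([3, 1, 4, 2], 1, 4))   # True
print(passes_sum_checks([1, 1, 4, 4], 1, 4))   # False: squares sum to 34, not 30
```

The expected values are computed with generator expressions here, so the extra memory stays constant; closed-form formulas would work as well.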
This is essentially applying (something at least very similar to) a Bloom filter to the problem. Given a sufficiently large group and a fixed pair of functions, there's going to be some set of incorrect inputs that will produce the correct output. You can reduce that possibility to an arbitrarily low value by increasing the number of filters you apply. Alternatively, you can probably design an adaptive algorithm that can't be fooled--offhand, it seems like if your range of inputs is N, then raising each number to the power N+1 will probably assure that you can only get the correct result with exactly the correct inputs (but I'll admit, I'm not absolutely certain that's correct).
Here is an O(1) space and O(n) time solution that might help:
Find the mean and standard deviation in range (a,b)
Scan the array and find mean and standard deviation.
if any number is outside (a,b) return false
if(mean1!=mean2 || sd1!=sd2) return false else true.
Note : I might not be 100% accurate.
Here's a solution that fails with the probability of a hash collision.
Take an excellent (for example cryptographic) hash function H.
Compute: xor(H(x) for x in a...b)
Compute: xor(H(A[i]) for i in 1...n)
If the two are different, then for sure you don't have a permutation. If the two are the same, then you've almost certainly got a permutation. You can make this immune to input that's been picked to produce a hash collision by including a random seed in the hash.
This is obviously O(b-a) in running time, needs O(1) external storage, and is trivial to implement.
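A possible Python sketch of the XOR-of-hashes idea. The choice of keyed BLAKE2b from `hashlib` as H, the 16-byte random key from `secrets` as the seed, and the function name are all assumptions made for illustration; the answer does not prescribe a particular hash.

```python
import hashlib
import secrets


def is_probably_permutation(A, a, b):
    """Compare the XOR of seeded hashes of A with that of a..b.

    False means A is definitely not a permutation of a..b.
    True means it almost certainly is (barring a hash collision).
    """
    if len(A) != b - a + 1:
        return False
    seed = secrets.token_bytes(16)     # fresh random key for every call

    def h(x):
        digest = hashlib.blake2b(str(x).encode(), key=seed, digest_size=16).digest()
        return int.from_bytes(digest, "big")

    expected = 0
    for x in range(a, b + 1):
        expected ^= h(x)
    actual = 0
    for x in A:
        actual ^= h(x)
    return expected == actual


print(is_probably_permutation([3, 1, 4, 2], 1, 4))   # True
```

Only two running XOR values and the seed are stored, so the extra memory does not grow with n.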

Resources