Why does linear probing work with a relatively prime step? - arrays

I was reading about linear probing in a hash table tutorial and came upon this:
The step size is almost always 1 with linear probing, but it is acceptable to use other step sizes as long as the step size is relatively prime to the table size so that every index is eventually visited. If this restriction isn't met, all of the indices may not be visited...
(The basic problem is: You need to visit every index in an array starting at an arbitrary index and skipping ahead a fixed number of indices [the skip] to the next index, wrapping to the beginning of the array if necessary with modulo.)
I understand why not all indices could be visited if the step size isn't relatively prime to the table size, but I don't understand why the converse is true: that all the indices will be visited if the step size is relatively prime to the array size.
I've observed this relatively prime property working in several examples that I've worked out by hand, but I don't understand why it works in every case.
In short, my question is: Why is every index of an array visited with a step that is relatively prime to the array size? Is there a proof of this?
Thanks!

Wikipedia about Cyclic Groups
The units of the ring Z/nZ are the numbers coprime to n.
Also:
[If two numbers are co-prime] There exist integers x and y such that ax + by = 1
So, if "a" is your step length and "b" the length of the array, multiply both sides by z: you can reach any index "z" because
a(xz) + b(yz) = z
=>
a(xz) ≡ z (mod b)
i.e. stepping "xz" times (and wrapping over the array "yz" times).
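The Bézout argument can be checked directly; here is a minimal sketch in Python (`modinv` is an illustrative helper, not from the answer, that finds x with a·x ≡ 1 (mod b) via the extended Euclidean algorithm):

```python
from math import gcd

def modinv(a, b):
    # Extended Euclid: find x with a*x + b*y == 1, i.e. a*x ≡ 1 (mod b)
    old_r, r = a, b
    old_x, x = 1, 0
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
    return old_x % b

a, b = 3, 7  # step length a, table size b, gcd(a, b) == 1
assert gcd(a, b) == 1
x = modinv(a, b)
for z in range(b):
    # stepping x*z times from index 0 lands exactly on index z
    assert (a * (x * z)) % b == z
```

The loop confirms that, once a·x ≡ 1 (mod b), every index z is reachable by taking x·z steps.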

The number of distinct indices visited before the probe sequence repeats is lcm(A,P)/P = A/gcd(A,P), where A is the array size and P is this magic coprime.
So if gcd(A,P) != 1, then the number of distinct indices visited will be less than A.
On the contrary, if gcd(A,P) == 1 (coprime), then the number will be A and all indices will be visited.
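A quick sketch confirming the A/gcd(A,P) count (`visited` is an illustrative helper name, not from the answer):

```python
from math import gcd

def visited(A, P, start=0):
    # follow index -> (index + P) % A until the cycle closes
    seen = set()
    i = start
    while i not in seen:
        seen.add(i)
        i = (i + P) % A
    return seen

A = 12
for P in (1, 4, 5, 7, 8, 9):
    # cycle length is always A // gcd(A, P), regardless of start
    assert len(visited(A, P)) == A // gcd(A, P)

assert len(visited(12, 5)) == 12  # coprime step: every index visited
assert len(visited(12, 8)) == 3   # gcd(12, 8) = 4: only 12/4 = 3 indices
```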

Related

Finding the Average case complexity of an Algorithm

I have an algorithm for Sequential search of an unsorted array:
SequentialSearch(A[0..n-1], K)
    i = 0
    while i < n and A[i] != K do
        i = i + 1
    if i < n then return i
    else return -1
Where we have an input array A[0...n-1] and a search key K
I know that the worst case is n, because we would have to search the entire array, hence n comparisons: O(n).
I know that the best case is 1, since that would mean the first item we check is the one we want (or the array has all the same items); either way it's O(1).
But I have no idea on how to calculate the average case. The answer my textbook gives is:
= (p/n)[1+2+...+i+...+n] + n(1-p)
is there a general formula I can follow for when I see an algorithm like this one, to calculate it?
Textbook example
= (p/n)[1+2+...+i+...+n] + n(1-p)
p here is the probability that the search key is found in the array. Since we have n equally likely positions, p/n is the probability of finding the key at any particular index among the n. We are essentially taking a weighted average: each position weighs in 1 comparison, 2 comparisons, and so on up to n comparisons. Because we have to take all inputs into account, the second term n(1-p) covers the inputs that don't exist in the array (probability 1-p); those take n comparisons, as we search through the entire array.
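The weighted average above also has a closed form, p(n+1)/2 + n(1-p), since 1 + 2 + ... + n = n(n+1)/2. A minimal sketch checking the two against each other (`avg_comparisons` is an illustrative name):

```python
def avg_comparisons(n, p):
    # direct weighted average: key equally likely at each of the n
    # positions (total probability p), absent with probability 1 - p
    hit = sum((p / n) * c for c in range(1, n + 1))
    miss = n * (1 - p)
    return hit + miss

n, p = 1000, 0.5
closed_form = p * (n + 1) / 2 + n * (1 - p)
assert abs(avg_comparisons(n, p) - closed_form) < 1e-9

# p = 1: key always present, average is (n+1)/2 comparisons
assert abs(avg_comparisons(n, 1.0) - (n + 1) / 2) < 1e-9
# p = 0: key never present, always n comparisons
assert avg_comparisons(n, 0.0) == n
```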
You'd need to consider the input cases, something like equivalence classes of input, which depends on the context of the algorithm. If none of those things are known, then assuming that the input is an array of random integers, the average case would probably be O(n). This is because, roughly, you have no way of proving to a useful extent how often your query will be found in an array of N integer values in the range of ~-32k to ~32k.
More formally, let X be a discrete random variable denoting the number of elements of the array A that need to be scanned. There are n elements, and since all positions are equally likely for randomly generated inputs, X ~ Uniform(1,n) where X = 1,..,n, given that the search key is found in the array (with probability p); otherwise all the elements need to be scanned, with X = n (with probability 1-p).
Hence, P(X=x)=(1/n).p.I{x<n}+((1/n).p+(1-p)).I{x=n} for x = 1,..,n, where I{x=n} is the indicator function and will have value 1 iff x=n otherwise 0.
Average time complexity of the algorithm is the expected time taken to execute the algorithm when the input is an arbitrary sequence. By definition,
E[X] = Σ x·P(X=x) over x = 1,..,n = (p/n)(1 + 2 + ... + n) + n(1-p) = p(n+1)/2 + n(1-p)
(The original answer includes a figure showing how the time taken for searching the array changes with n and p.)

Probability, expected number

In an unsorted array, an element is a local maximum if it is larger than
both of the two adjacent elements. The first and last elements of the array are considered local
maxima if they are larger than the only adjacent element. If we create an array by randomly
permuting the numbers from 1 to n, what is the expected number of local maxima? Prove
your answer correct using additivity of expectations.
I'm stuck with this question, I have no clue how to solve this...
You've got an unsorted array with n elements. There are two kinds of positions where a local maximum can occur: at either end of the array, or strictly between the first and last elements.
Case 1:
If you're looking at the element at either the first or last index (array[0] or array[n-1]), what's the probability that the element is a local maximum? In other words, what's the probability that its value is greater than its single neighbour? The array is a random permutation of distinct values, so of the two values involved, each is equally likely to be the larger one. Therefore there is a 1/2 chance on average that the element at the first index is greater than the element at the second index (array[0] > array[1]), and likewise at the other end.
Case 2:
If you're looking at any element that ISN'T the first or last element of the array (n-2 elements), then what's the probability that each one is the local max? Similarly to the first case, among any three consecutive values of a random permutation, each is equally likely to be the largest, so there is a 1/3 chance on average that the element we choose is greater than both the one before it and the one after it.
Putting it all together:
There are 2 positions with a 1/2 probability of being local maxima and n-2 positions with a 1/3 probability of being local maxima (2 + n-2 = n, all possible positions). By additivity of expectations: (2)(1/2) + (n-2)(1/3) = (n+1)/3.
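The (n+1)/3 result can be verified exhaustively for small n by enumerating all permutations; a minimal sketch (function names are illustrative):

```python
from itertools import permutations
from fractions import Fraction

def count_local_maxima(a):
    # an element is a local maximum if it beats every adjacent element;
    # the two ends only have one neighbour each
    n = len(a)
    count = 0
    for i in range(n):
        left_ok = (i == 0) or a[i] > a[i - 1]
        right_ok = (i == n - 1) or a[i] > a[i + 1]
        if left_ok and right_ok:
            count += 1
    return count

def expected_local_maxima(n):
    # exact expectation by averaging over all n! permutations of 1..n
    perms = list(permutations(range(1, n + 1)))
    total = sum(count_local_maxima(p) for p in perms)
    return Fraction(total, len(perms))

for n in range(2, 7):
    assert expected_local_maxima(n) == Fraction(n + 1, 3)
```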
Solvable of course, but I won't deprive you of the fun of doing it yourself. I will give you a tip: sketch the cases for a few small n and see what the picture represents. If you figure this out, you will know that a pattern is available to discover for any n, odd and even. Good luck. If still stuck, I will tip you more.

efficient algorithms with array of increasing integers

I've been teaching myself data structures in Python and don't know if I'm overthinking (or underthinking!) the following question:
My goal is to come up with an efficient algorithm.
With the algorithm, my goal is to determine whether an integer i exists such that A[i] = i in an array of increasing integers.
I then want to find the running time in big-O notation as a function of n, the length of A.
So wouldn't this just be a slightly modified O(log n) binary search on the function f(i) = A[i] - i? Am I reading this problem wrong? Any help would be greatly appreciated!
Note 1: because you say the integers are increasing, you have ruled out duplicates in the array (otherwise you would say monotonically increasing). So there is a quick check that can rule out a solution: for there to be any chance of a match, the first element has to be <= 1.
Note 2: similar to Note 1, if the last element is < the length of the array, then there is no solution.
In general, I think the best you can do is binary search. You trap the answer between low and high indices, and then check the middle index between low and high. If array[middle] equals middle, return yes. If it is less than middle, then set low to middle + 1. Otherwise, set high to middle - 1. If low becomes > high, return no.
Running time is O( log n ).
Edit: the algorithm does NOT work if you allow monotonically increasing (i.e. duplicate) values. Exercise: explain why. :-)
You're correct. Finding such an element i in your array of size A is indeed O(log A).
However, you can do much better: O(log A) -> O(1) if you trade memory complexity for time complexity, which is what "optimizers" tend to do.
What I mean is: if you insert the array elements into an "efficient" hash table, you can achieve the find function in constant time O(1).
This depends a lot on the elements you're inserting:
Are they unique? Think of collisions
How often do you insert?
This is an interesting problem :-)
You can use bisection to locate the place where a[i] == i:
indices: 0, 1, 2, 3, 4, 5, 6
a = [-10, -5, 2, 5, 12, 20, 100]
When i = 3, i < a[i], so bisect down
When i = 1, i > a[i], so bisect up
When i = 2, i == a[i]: you found the match
The running time is O(log n).
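The bisection can be sketched as follows (`fixed_point` is an illustrative name; this assumes strictly increasing values, so a[i] - i is non-decreasing):

```python
def fixed_point(a):
    # binary search for an index i with a[i] == i; returns -1 if none
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == mid:
            return mid
        elif a[mid] < mid:
            lo = mid + 1  # a[i] - i is non-decreasing, so look right
        else:
            hi = mid - 1  # a[mid] > mid, so look left
    return -1

a = [-10, -5, 2, 5, 12, 20, 100]  # the example array above
assert fixed_point(a) == 2
assert fixed_point([1, 2, 3]) == -1
```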

maximum no of equal elements in the array after n transformation [closed]

Closed 9 years ago: this question does not appear to be about programming within the scope defined in the help center, and it is not accepting answers.
You have an array containing n elements. At any move, you choose two indices i and j (i != j) and increment the value at one index while decrementing the value at the other. You can make this move any number of times. We need the maximum number of elements that can have the same value (after any number of moves).
For example, for 1,2,3,4 the answer is 3, since we can have at most 3 equal elements after applying the moves any number of times. But I am searching for an algorithm to do that, so I need help.
As stated, this doesn't take much of an algorithm. If you can do it as many times as you want, you should always be able to get a result with either N or N-1 elements equal (where N is the size of the input array).
Take the sum of the input. For example, in your case: sum(1,2,3,4) = 10.
If sum % N == 0, the answer is N. Any time before that, you'll have at least one element higher than sum/N and at least one lower. Increment the low, decrement the high.
Else the answer is N-1. The final set can have N-1 elements equal to (int)sum/N and the last element will be the remainder from the original sum. You can use that last element as a "spare" to increment/decrement whichever other elements you want.
Since you don't have to actually find the transformations, the end result is O(N). You just take the sum and mod it by N to check which answer to give. There's no use recursing or searching for "averaging pairs" unless you want to find the sequence of steps that lead to the answer..
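The whole check fits in a couple of lines; a minimal sketch (`max_equal` is an illustrative name):

```python
def max_equal(a):
    # moves preserve the sum, so all n elements can equal sum/n only
    # if n divides the sum; otherwise n-1 equal elements are achievable
    n = len(a)
    return n if sum(a) % n == 0 else n - 1

assert max_equal([1, 2, 3, 4]) == 3  # sum 10, 10 % 4 != 0
assert max_equal([1, 2, 3]) == 3     # sum 6, 6 % 3 == 0
```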
This might be a very inefficient algorithm, but you could try some sort of dynamic programming (possibilities() and num_in_common() are left as sketched helpers; arrays are assumed to be tuples so they can be stored in a set):
def elems(array, previous_attempts=None):
    # Get the set of all arrays reachable by one transformation of the
    # current array, and remove the ones we have already seen.
    if previous_attempts is None:
        previous_attempts = set()
    possible_arrays = possibilities(array) - previous_attempts
    # If there are none left, return the number of equal elements
    # this array has.
    if not possible_arrays:
        return num_in_common(array)
    # Otherwise, for all the possibilities, find the one that creates
    # the maximum number of equal elements.
    best = 0
    for a in possible_arrays:
        # Recursive call that scores this possibility; pass along the
        # current state so we don't revisit it.
        score = elems(a, previous_attempts | {array})
        best = max(best, score)
    return best
You can count the number of occurrences of each value in the array (let's call the array A, of size N). Let the maximal number of occurrences of any value in A be max (there may be several such values); you are only interested in the values that appear max times, as other values can't surpass max+1 appearances by the suggested method.
Extreme cases:
if max=N the answer is N
if max=N-1 the answer is N-1
In all the other cases, for each value V that appears max times, you try to find two other values that have an average of V but don't equal V. If they exist, the answer is max+2 (you can increment one and decrement the other until both equal V). If no such indices i and j exist, the answer is max+1 (you can increment or decrement any two other values until one of them equals V).
EDIT:
This answer assumes that you only choose i and j once and then increase/decrease them as much as you like (I guess I misinterpreted).

How to identify the duplicated number in an array with minimum complexity?

There is an array of size 10,000. It stores the numbers 1 to 10,000 in random order.
Each number occurs exactly once.
Now suppose one number is removed from the array and some other number is duplicated in its place.
How can we identify which number is duplicated, with minimum complexity?
NOTE : We can not use another array.
The fastest way is an O(N) in-place pigeonhole sort.
Start at the first location of the array, a[0]. Say it has the value 5. You know that 5 belongs at a[4], so swap locations 0 and 4. Now a new value is in a[0]. Swap it to where it needs to go.
Repeat until a[0] == 1, then move on to a[1] and swap until a[1] == 2, etc.
If at any point you end up attempting to swap two identical values, then you have found the duplicate!
Runtime: O(N) with a very low coefficient and early exit. Storage required: zero.
Bonus optimization: count how many swaps have occurred and exit early if n_swaps == array_size. This resulted in a 15% improvement when I implemented a similar algorithm for permuting a sequence.
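The swap-until-collision idea can be sketched like this (`find_duplicate` is an illustrative name; it assumes exactly one value was removed and one duplicated, as in the question):

```python
def find_duplicate(a):
    # a holds 1..n with one value removed and another duplicated;
    # value v belongs at index v-1, so swap each value to its home
    n = len(a)
    for i in range(n):
        while a[i] != i + 1:
            target = a[i] - 1
            if a[target] == a[i]:
                # trying to swap two identical values: the duplicate
                return a[i]
            a[i], a[target] = a[target], a[i]
    return None  # only reached on malformed input

assert find_duplicate([3, 1, 3, 4, 2]) == 3
```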
Compute the sum and the sum of the squares of the elements (you will need 64 bit values for the sum of the squares). From these you can recover which element was modified:
Subtract the expected values for the unmodified array.
If x was removed and y duplicated, you get the difference y - x for the sum and y² - x² = (y + x)(y - x) for the sum of squares.
From that it is easy to recover x and y.
Edit: Note that this may be faster than pigeonhole sort, because it runs linearly over the array and is thus more cache friendly.
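Recovering x and y from the two differences can be sketched as follows (`find_removed_and_duplicated` is an illustrative name; this assumes the removed and duplicated values differ, since otherwise both differences are zero):

```python
def find_removed_and_duplicated(a):
    # a is 1..n with x removed and y duplicated (x != y assumed)
    n = len(a)
    # difference from the expected sum n(n+1)/2 gives y - x
    d1 = sum(a) - n * (n + 1) // 2
    # difference from the expected sum of squares n(n+1)(2n+1)/6
    # gives y^2 - x^2 = (y + x)(y - x)
    d2 = sum(v * v for v in a) - n * (n + 1) * (2 * n + 1) // 6
    s = d2 // d1          # y + x
    y = (s + d1) // 2
    x = y - d1
    return x, y

# 5 was removed, 3 was duplicated
assert find_removed_and_duplicated([3, 1, 3, 4, 2]) == (5, 3)
```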
Why not simply use a second array, or another data structure like a hash table (depending on the memory/performance tradeoff)? This second array would simply store the count of each number in the original array. Now just add a +/- count update to the access function of the original array, and you have your information immediately.
PS: when you wrote "we can not use another array", I assume you mean you can not change the ORIGINAL data structure; the use of additional data structures, however, is possible...
Sort the array, then iterate through until you hit two of the same number in a row.
