Generating also non-unique (duplicated) permutations - c

I've written a basic permutation program in C.
The user types a number, and it prints all the permutations of that number.
Basically, this is how it works (the main algorithm is the one used to find the next higher permutation):
int currentPerm = toAscending(num);
int lastPerm = toDescending(num);
int counter = 1;
printf("%d\n", currentPerm);
while (currentPerm != lastPerm)
{
    counter++;
    currentPerm = nextHigherPerm(currentPerm);
    printf("%d\n", currentPerm); /* one permutation per line */
}
However, when the input number includes repeated digits, some permutations are not generated because they are duplicates of others. The counter then shows a smaller number than expected: instead of the factorial of the number of digits, it shows only the number of unique permutations.
For example:
num = 1234567
counter = 5040 (7! - all unique)
num = 1123456
counter = 2520
num = 1112345
counter = 840
I want it to treat repeated/duplicated digits as if they were different. I don't want to generate only the unique permutations, but rather all the permutations, regardless of whether some of them are duplicates of others.

Uhm... why not just calculate the factorial of the length of the input string then? ;)

I want it to treat repeated/duplicated digits as if they were
different - I don't want to calculate only the number of unique
permutations.
If the only information that nextHigherPerm() uses is the number that's passed in, you're out of luck. Consider nextHigherPerm(122). How can the function know how many versions of 122 it has already seen? Should nextHigherPerm(122) return 122 or 212? There's no way to know unless you keep track of the current state of the generator separately.
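One way to keep that state separately is to run the next-permutation step on the positions of the digits rather than on the digits themselves. The following is only a sketch in Python (not the asker's C code), and the names next_permutation and all_permutations are made up for illustration. Because the positions are always distinct, repeated digits no longer collapse.

```python
def next_permutation(idx):
    """Advance the list idx to its next lexicographic ordering in place.

    Returns False when idx was already the last (descending) ordering.
    """
    i = len(idx) - 2
    while i >= 0 and idx[i] >= idx[i + 1]:
        i -= 1
    if i < 0:
        return False
    j = len(idx) - 1
    while idx[j] <= idx[i]:
        j -= 1
    idx[i], idx[j] = idx[j], idx[i]
    idx[i + 1:] = reversed(idx[i + 1:])
    return True


def all_permutations(digits):
    """Return every ordering of digits, treating equal digits as different."""
    idx = list(range(len(digits)))  # the separate generator state
    out = []
    while True:
        out.append("".join(digits[i] for i in idx))
        if not next_permutation(idx):
            break
    return out


perms = all_permutations("122")
print(len(perms))       # 6 (3!, duplicates included)
print(len(set(perms)))  # 3 (unique permutations only)
```

The same idea should carry over to C by keeping an int array of positions next to the digit array.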

When you have 3 distinct letters, for example ABC, you can make ABC, ACB, BAC, BCA, CAB, CBA: 6 arrangements (3!). If 2 of those letters repeat, as in AAB, you can make only AAB, ABA, BAA. That is not 3!, so what is it, and where does it come from? When a digit or letter is repeated, the count is given by combinations: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$
Here is another illustrative example: for AAAB, the possible arrangements are AAAB, AABA, ABAA, BAAA, only four, and the formula gives 4C3 = 4.
The procedure to generate all of these lists is:
1. Store the digits in an array. Example: ABCD.
2. Set element 0 of the array as the pivot element and exclude it from the temp array: A {BCD}
3. Since you want all the combinations (even the repeated ones), rotate the elements of the temp array to the right or left (whichever you like) until you reach the n-th element.
A{BCD}------------A{CDB}------------A{DBC}
4. Do the second step again, but with the temp array.
A{B{CD}}------------A{C{DB}}------------A{D{BC}}
5. Do the third step again, but inside the second temp array.
A{B{CD}}------------A{C{DB}}------------A{D{BC}}
A{B{DC}}------------A{C{BD}}------------A{D{CB}}
6. Go back to the first array and rotate it to BCDA, set B as the pivot, and repeat until you have found all combinations.
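The pivot-and-rotate procedure above can be sketched recursively. This is a Python illustration of one reading of those steps, not the answerer's own code, and the function name rotations_permute is hypothetical. Each rotation puts a different position in the pivot slot, so repeated letters are still counted separately.

```python
def rotations_permute(items):
    """Return all orderings of items, built by rotating and fixing a pivot."""
    items = list(items)
    if len(items) <= 1:
        return [items]
    result = []
    for r in range(len(items)):
        rotated = items[r:] + items[:r]      # rotate left by r positions
        pivot, rest = rotated[0], rotated[1:]
        for tail in rotations_permute(rest):  # same procedure on the temp array
            result.append([pivot] + tail)
    return result


for letters in ("ABCD", "AAB"):
    perms = ["".join(p) for p in rotations_permute(letters)]
    print(letters, len(perms), len(set(perms)))
```

For ABCD this yields 24 orderings, all different. For AAB it yields 6 orderings, of which only 3 are distinct strings.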

Why not convert it to a string then treat your program like an anagram generator?

Related

Define a vector with random steps

I want to create an array that has incremental random steps, I've used this simple code.
t_inici=(0:10*rand:100);
The problem is that the random number stays the same between steps. Is there any simple way to draw a new random number for each step?
If you have a set number of points, say nPts, then you could do the following
nPts = 10; % Could use 'randi' here for random number of points
lims = [0, 10]; % Start and end points
x = rand(1, nPts); % Create random numbers
% Sort and scale x to fit your limits and be ordered
x = diff(lims) * ( sort(x) - min(x) ) / ( max(x) - min(x) ) + lims(1);
This approach always includes your end point, which a 0:dx:10 approach would not necessarily.
If you had some maximum number of points, say nPtsMax, then you could do the following
nPtsMax = 1000; % Max number of points
lims = [0,10]; % Start and end points
% Could do 10* or any other multiplier as in your example in front of 'rand'
x = lims(1) + [0 cumsum(rand(1, nPtsMax))];
x(x > lims(2)) = []; % remove values above maximum limit
This approach may be slower, but is still fairly quick and better represents the behaviour in your question.
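For readers who want to try the cumulative-sum idea outside MATLAB, here is a rough Python equivalent using only the standard library. The function name random_steps and its parameters are assumptions made for this sketch. Note that itertools.accumulate needs Python 3.8 or later for the initial argument.

```python
import random
from itertools import accumulate


def random_steps(start, stop, max_step, max_points=1000, rng=random):
    """Return increasing points from start, each step uniform in [0, max_step]."""
    steps = (rng.uniform(0, max_step) for _ in range(max_points))
    points = accumulate(steps, initial=start)  # running sum, like cumsum
    return [p for p in points if p <= stop]    # drop values above the limit


t_inici = random_steps(0, 100, 10)
print(len(t_inici), t_inici[:5])
```

As in the MATLAB version, the number of points is not known in advance, and the end point is generally not included.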
My first approach to this would be to generate N-2 samples, where N is the desired amount of samples randomly, sort them, and add the extrema:
N=50;
endpoint=100;
initpoint=0;
randsamples=sort(rand(1, N-2)*(endpoint-initpoint)+initpoint);
t_inici=[initpoint randsamples endpoint];
However, I am not sure how "uniformly random" this is, as you are "faking" the last 2 data points in order to include the extrema. This will somehow distort pure randomness (I think). If you are not necessarily interested in including the extrema, then just remove the last line and generate N points. That will make sure that they are indeed random (or as random as MATLAB can create them).
Here is an alternative solution with "uniformly random"
[initpoint,endpoint,coef]=deal(0,100,10);
t_inici(1)=initpoint;
while(t_inici(end)<endpoint)
t_inici(end+1)=t_inici(end)+rand()*coef;
end
t_inici(end)=[];
In my point of view, it fits your attempts well with unknown steps, start from 0, but not necessarily end at 100.
From your code it seems you want a uniformly random step that varies between each two entries. This implies that the number of entries that the vector will have is unknown in advance.
A way to do that is as follows. This is similar to Hunter Jiang's answer but adds entries in batches instead of one by one, in order to reduce the number of loop iterations.
1. Guess a number of required entries, n. Any value will do, but a large value will result in fewer iterations and will probably be more efficient.
2. Initialize result to the first value.
3. Generate n entries and concatenate them to the (temporary) result.
4. See if the current entries are already too many.
5. If they are, cut as needed and output the (final) result. Else go back to step 3.
Code:
lower_value = 0;
upper_value = 100;
step_scale = 10;
n = 5*(upper_value-lower_value)/step_scale*2; % STEP 1. The number 5 here is arbitrary.
% It's probably more efficient to err with too many than with too few
result = lower_value; % STEP 2
done = false;
while ~done
    result = [result result(end)+cumsum(step_scale*rand(1,n))]; % STEP 3. Include
    % n new entries
    ind_final = find(result>upper_value,1)-1; % STEP 4. Index of last entry not
    % exceeding upper_value, if any entry exceeds it
    if ind_final % STEP 5. If non-empty, we're done
        result = result(1:ind_final);
        done = true;
    end
end

Count number of permutations of a string with two distinct digits

How can I calculate how many numbers there are between 000000 and 999999 that contain only two distinct digits?
For example 000001 can be counted as one. The same goes for 002200, 112211, 100000. However 112233 contains three distinct digits so it can't be counted.
Thanks
Let's simplify the problem.
Suppose we need to find all the numbers made of just 0 and 1. The possible numbers look like 000011, 000001, 001110, etc. Since both digits must appear, the counts can be split as follows:
[Zeroes, Ones]: {1,5},{2,4},{3,3},{4,2},{5,1}
For example, 5 zeroes and 1 one give: 000001, 000010, 000100, 001000, 010000, 100000
So if there are Z zeroes, there will be 6CZ numbers with Z zeroes and 6 - Z ones.
Since Z can take a value from 1 to 5, there are $\sum_{Z=1}^{5} \binom{6}{Z} = 62$ possible numbers made of 0 and 1 with at least one zero and one one.
Now, coming back to the original problem: there are 10 digits and we need two distinct ones, so there are 10C2, i.e. 45, pairs of digits, e.g. {0,1}, {0,2} ..... {1,2} ....
So the answer is $\binom{10}{2} \cdot \sum_{Z=1}^{5} \binom{6}{Z} = 45 \cdot 62 = 2790$
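The formula can be checked against a brute-force count. The sketch below is in Python, and the helper name count_exactly_two is made up for illustration. It pads every number to six digits so that leading zeros count as digits, as in the question's examples.

```python
from math import comb


def count_exactly_two(length=6):
    """Count digit strings of the given length with exactly two distinct digits."""
    return sum(
        1
        for value in range(10 ** length)
        if len(set(f"{value:0{length}d}")) == 2  # zero-padded, e.g. 000001
    )


formula = comb(10, 2) * sum(comb(6, z) for z in range(1, 6))
print(formula)              # 2790
print(count_exactly_two())  # 2790
```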
As you haven't specified a programming language, I used JavaScript with comments. Hope it helps you.
var counter = 0; // counts the numbers that contain exactly two distinct digits
for (var i = 0; i <= 999999; i++) { // change the loop bounds as you need
    var x = i.toString().padStart(6, '0'); // convert to a string, keeping leading zeros
    var chars = x.split(''); // split the characters into an array
    var uniqueChars = Array.from(new Set(chars)); // get the distinct characters
    if (uniqueChars.length == 2) { // check that there are exactly two of them
        counter++;
    }
}
console.log(counter);

How do I check to see if two (or more) elements of an array/vector are the same?

For one of my homework problems, we had to write a function that creates an array containing n random numbers between 1 and 365. (Done). Then, check if any of these n birthdays are identical. Is there a shorter way to do this than doing several loops or several logical expressions?
Thank you!
CODE SO FAR, NOT DONE YET!!
function = [prob] bdayprob(N,n)
N = input('Please enter the number of experiments performed: N = ');
n = input('Please enter the sample size: n = ');
count = 0;
for(i=1:n)
x(i) = randi(365);
if(x(i)== x)
count = count + 1
end
return
If I'm interpreting your question properly, you want to check to see if generating n integers or days results in n unique numbers. Given your current knowledge in MATLAB, it's as simple as doing:
n = 30; %// Define sample size
N = 10; %// Define number of trials
%// Define logical array where each location tells you whether
%// birthdays were repeated for a trial
check = false(1, N);
%// For each trial...
for idx = 1 : N
    %// Generate sample size random numbers
    days = randi(365, n, 1);
    %// Check to see if the total number of unique birthdays
    %// are equal to the sample size
    check(idx) = numel(unique(days)) == n;
end
Woah! Let's go through the code slowly, shall we? We first define the sample size and the number of trials. We then specify a logical array where each location tells you whether or not there were repeated birthdays generated for that trial. Now, we start a loop where, for each trial, we generate n random numbers from 1 to 365, n being the sample size. We then use unique to find all the unique integers among them. If all of the birthdays are unique, then the total number of unique birthdays should equal the sample size. If it doesn't, then we have repeats. For example, if we generated a sample of [1 1 1 2 2], the output of unique would be [1 2], and the total number of unique elements is 2. Since this doesn't equal 5, the sample size, we know that the birthdays generated weren't unique. However, if we had [1 3 4 6 7], unique would return the same array, and since the output length is the same as the sample size, we know that all of the days are unique.
So, we check to see if this number is equal to the sample size for each iteration. If it is, then we output true. If not, we output false. When I run this code on my end, this is what I get for check. I set the sample size to 30 and the number of trials to be 10.
check =
0 0 1 1 0 0 0 0 1 0
Take note that if you increase the sample size, there is a higher probability that you will get duplicates, because randi can be considered as sampling with replacement. Therefore, the larger the sample size, the higher the chance of getting duplicate values. I made the sample size small on purpose so that we can see that it's possible to get unique days. However, if you set it to something like 100, or 200, you will most likely get check to be all false as there will most likely be duplicates per trial.
Here are some more approaches that avoid loops. Let
n = 20; %// define sample size
x = randi(365,n,1); %// generate n values between 1 and 365
Any of the following code snippets returns true (or 1) if there are two identical values in x, and false (or 0) otherwise:
Sort and then check if any two consecutive elements are the same:
result = any(diff(sort(x))==0);
Do all pairwise comparisons manually; remove self-pairs and duplicate pairs; and check if any of the remaining comparisons is true:
result = nnz(tril(bsxfun(@eq, x, x.'),-1))>0;
Compute the distance between distinct values, considering each pair just once, and then check if any distance is 0:
result = any(pdist(x(:))==0);
Find the number of occurrences of the most common value (mode):
[~, occurs] = mode(x);
result = occurs>1;
I don't know if I'm supposed to solve the problem for you, but perhaps a few hints may lead you in the right direction (besides I'm not a matlab expert so it will be in general terms):
Maybe not, but you have to ask yourself what they expect of you. The solution you propose requires you to loop through the array in two nested loops which will mean n*(n-1)/2 times through the loop (ie quadratic time complexity).
There are a number of ways you can improve the time complexity of the problem. The most straightforward would be to have a 365-element table where you keep track of whether a particular number has been seen yet, which would require only a single loop (i.e. linear time complexity), but perhaps that's not what they're looking for either. Maybe that solution is a little bit ad hoc? What we're basically looking for is a fast lookup of whether a particular number has been seen before. There exist more memory-efficient structures that allow lookup in O(1) or O(log n) time (if you know these, you have an arsenal of tools to use).
Then of course you could use the pigeonhole principle to provide the answer much faster in some special cases (remember that you were only asked to determine whether two or more numbers are equal or not).
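The two hints above (a 365-element "seen" table and the pigeonhole shortcut) can be combined as follows. This is a general sketch in Python rather than MATLAB, and the function name has_shared_birthday is an assumption.

```python
def has_shared_birthday(days, num_days=365):
    """Return True if any value in days (each in 1..num_days) occurs twice."""
    if len(days) > num_days:
        return True                   # pigeonhole: more people than days
    seen = [False] * (num_days + 1)   # index 0 is unused
    for day in days:
        if seen[day]:
            return True               # second occurrence found, stop early
        seen[day] = True
    return False


print(has_shared_birthday([1, 3, 4, 6, 7]))  # False
print(has_shared_birthday([1, 1, 1, 2, 2]))  # True
```

The loop visits each birthday at most once, so the check is linear in the sample size.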

Filling arrays with random int number

I am pretty new to Matlab and am trying to create an m-by-n matrix containing numbers within a specified range (ie. between 0 and 250) with a specified step (ie. [0:10:250]).
Is there an equivalent function, or a workaround, to:
r = randi ( [a b], m , n )
where I can specify the step myself?
claudiop definitely had a good suggestion, but an alternative, in case you have a set of numbers that you want to randomly select from, is to choose random indices into a vector containing your desired numbers
nums = 0:10:250;
randIdxs = randi([1,numel(nums)],m,n);
nums(randIdxs);
This randomly selects numbers from the nums vector, which contains the numbers you want to choose between. The solution becomes more useful when there is no clear pattern between the numbers, such as [3,5,7,13,23,37]. There is no clear spacing or function between primes, so if you wanted to randomly choose prime numbers, this solution would be good.
Updated
Edited the code above to use numel instead of length in case the numbers were in a multidimensional array. Not a huge change at all
Think about a transformation from a set of random integers over a fixed range ([a b]) to another set with a skip. Multiplication should take care of the skip... then just be careful of the end condition:
r = 10*randi([0 25], m, n);
I leave the general formula to you... if you want, you can wrap this in another function randi_skip().
You can use the randi function with the numbers between 0 and 25 and then multiply the result by 10.
r = mystep * randi ( [a b] , m , n )
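For comparison, the same "random integers on a grid" idea looks like this in Python, where the standard library's random.randrange accepts a step directly. The function name randi_skip echoes the wrapper suggested above, and its signature is an assumption made for this sketch.

```python
import random


def randi_skip(a, b, step, m, n, rng=random):
    """Return an m-by-n nested list of random values from a, a+step, ..., up to b."""
    return [[rng.randrange(a, b + 1, step) for _ in range(n)] for _ in range(m)]


r = randi_skip(0, 250, 10, 3, 4)
for row in r:
    print(row)
```

Every entry is a multiple of 10 between 0 and 250 inclusive, matching 10*randi([0 25], m, n).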

Find the Element Occurring b times in an array of size n*k+b

Description
Given an Array of size (n*k+b) where n elements occur k times and one element occurs b times, in other words there are n+1 distinct Elements. Given that 0 < b < k find the element occurring b times.
My Attempted solutions
The obvious solution is to use hashing, but it will not work if the numbers are very large. Complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity will be O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. The hint he gave was to make the height of the tree in which the frequencies are stored constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: Find the median element of the current array slice, as in Quick Sort, in linear time. Let the median element be M.
After the recursive step you have an array where all elements smaller than M occur to the left of the first occurrence of M. All M elements are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M and calculate whether S<M or S>=M. Recurse either on the left slice or the right slice.
So you are doing a quick sort but delving into only one part of the divisions at any time. You will recurse O(logN) times but each time with 1/2, 1/4, 1/8, .. sizes of the original array, so the total time will still be O(n).
Clarification: Let's say n=20 and k = 10. Then, there are 21 distinct elements in the array, 20 of which occur 10 times and the last occurs, let's say, 7 times. I find the median element, let's say it is 1111. If S<1111, then the index of the leftmost occurrence of 1111 will be less than 11*10. If S>=1111, then the index will be equal to 11*10.
Full example: n = 4. k = 3. Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step I find the median element is 3 and the array is something like: {1,2,1,2,1,2,3,3,3,5,4,5,5,4} There are 6 elements to the left of 3. 6 is a multiple of k=3. So each element must be occurring 3 times there. So S>=3. Recurse on the right side. And so on.
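A compact way to experiment with this partition-and-recurse idea is sketched below in Python. It makes two simplifications that are not part of the answer above: the pivot is picked at random instead of being the exact median (so the running time is linear only in expectation), and the partition is written with list comprehensions rather than in place. The function name find_b_times is hypothetical.

```python
import random


def find_b_times(arr, k):
    """Return the one element of arr whose count is not k (assumes 0 < b < k)."""
    items = list(arr)
    while items:
        pivot = random.choice(items)  # assumption: random pivot, not the true median
        less = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        greater = [x for x in items if x > pivot]
        if len(less) % k != 0:
            items = less              # S is smaller than the pivot
        elif equal_count != k:
            return pivot              # the pivot itself occurs b times
        else:
            items = greater           # everything up to the pivot is complete
    raise ValueError("every element occurs a multiple of k times")


print(find_b_times([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 5], 3))  # 4
print(find_b_times([2, 2, 2, 3, 3, 3, 1, 1], 3))                    # 1
```

The test "is the number of smaller elements a multiple of k" is the same check as looking at the index of the leftmost M in the partitioned array.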
An idea using cyclic groups.
To determine the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set, and store the result as cnt
If cnt % k is non-zero, then the i-th bit of the answer is set. Otherwise it is clear.
To determine the whole number, repeat the above for every bit.
This solution is technically O((n*k+b)*log max N), where max N is the maximal value in the table, but because the number of bits is usually constant, this solution is linear in the array size.
No hashing, memory usage is O(log k * log max N).
Example implementation:
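from functools import reduce
from random import sample, shuffle

def generate_test_data(n, k, b):
    # Draw n + 1 distinct values so the element occurring b times is unambiguous
    values = sample(range(1001), n + 1)
    k_rep, b_rep = values[:n], values[n:]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    # 10 bits are enough for values up to 1000
    cnts = [0] * 10
    for number in data:
        bits = [number >> b & 1 for b in range(10)]
        cnts = [cnts[i] + bits[i] for i in range(10)]
    # Bit i of the answer is set when cnts[i] is not a multiple of k
    return reduce(lambda a, b: 2 * a + (b % k > 0), reversed(cnts), 0)

# Pass the same k that was used to generate the data
print("Answer: ", solve(generate_test_data(10, 15, 13), 15))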
In order to have a constant height B-tree containing n distinct elements, with height h constant, you need z=n^(1/h) children per nodes: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).
Example, with n=1000000, h=10, z=3.98, that is z=4.
The time to reach a node in that case is O(h.log(z)). Assuming h and z to be "constant" (since N=n.k, log(z)=log(n^(1/h))=log((N/k)^(1/h))=constant by properly choosing h based on k), you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (set will remove the duplicates):
result = (sum(set(arr))*k - sum(arr)) / (k - b)
If 'k' is even and 'b' is odd, then XOR will do. :)
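The XOR remark works because x ^ x = 0: every value that occurs an even number of times cancels out, and a value that occurs an odd number of times leaves exactly one copy of itself. A minimal Python sketch, with the function name find_by_xor chosen for illustration:

```python
from functools import reduce
from operator import xor


def find_by_xor(arr):
    """Return the XOR of all elements; valid only when k is even and b is odd."""
    return reduce(xor, arr, 0)


print(find_by_xor([2, 2, 3, 3, 1]))          # 1 (k = 2, b = 1)
print(find_by_xor([7] * 4 + [5] * 4 + [9] * 3))  # 9 (k = 4, b = 3)
```

This needs no extra memory beyond one accumulator, but it gives a wrong answer for any other parity of k and b, such as the sample input with k=3 and b=2.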

Resources