Count number of permutations of a string with two distinct digits - permutation

How can I calculate how many numbers between 000000 and 999999 contain only two distinct digits?
For example 000001 can be counted as one. The same goes for 002200, 112211, 100000. However 112233 contains three distinct digits so it can't be counted.

Let's simplify the problem.
Suppose we need to find all the numbers that use only the digits 0 and 1, for example 000011, 000001, 001110, etc. Since both digits must appear, the possible digit counts are:
[Zeroes, Ones]: {1,5},{2,4},{3,3},{4,2},{5,1}
For example, 5 zeroes and 1 one give: 000001, 000010, 000100, 001000, 010000, 100000
So if there are Z zeroes, there are $\binom{6}{Z}$ numbers with Z zeroes and 6 - Z ones.
Since Z can take any value from 1 to 5, there are $\sum_{Z=1}^{5} \binom{6}{Z} = 2^6 - 2 = 62$ numbers made of 0 and 1 that contain at least one zero and one one.
Now, coming back to the original problem: there are 10 digits and we need two distinct ones, so there are $\binom{10}{2} = 45$ pairs, e.g. {0,1}, {0,2}, ..., {1,2}, ...
So the answer is $\binom{10}{2} \cdot \sum_{Z=1}^{5} \binom{6}{Z} = 45 \cdot 62 = 2790$.
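The counting argument above can be cross-checked by brute force. This is a minimal Python 3 sketch, not part of the original answer; the function names `count_by_formula` and `count_by_brute_force` are made up for illustration.

```python
from math import comb


def count_by_formula(length=6, alphabet=10):
    """Pairs of digits times the strings that use both digits of a pair."""
    per_pair = sum(comb(length, z) for z in range(1, length))
    return comb(alphabet, 2) * per_pair


def count_by_brute_force(length=6):
    """Enumerate every zero-padded number and count those with two distinct digits."""
    return sum(
        1
        for n in range(10 ** length)
        if len(set(str(n).zfill(length))) == 2
    )


if __name__ == "__main__":
    print(count_by_formula(), count_by_brute_force())
```

The zero-padding matters: without it, a number such as 000001 would be seen as the single character "1".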

Since you haven't specified a programming language, I used JavaScript, with comments. Hope it helps.
var counter = 0; // counts the numbers that contain exactly two distinct digits
for (var i = 10000; i < 10005; i++) { // change the loop bounds as you need
  var x = i.toString().padStart(6, '0'); // convert to a string, keeping leading zeros, so it is easy to split
  var chars = x.split(''); // split into an array of characters
  var uniqueChars = Array.from(new Set(chars)); // keep the distinct characters only
  if (uniqueChars.length == 2) { // exactly two distinct digits
    counter++;
  }
}
console.log(counter);


matlab: how to speed up the count of consecutive values in a cell array

I have the 137x19 cell array Location(1,4).loc and I want to find the number of times that horizontal consecutive values are present in Location(1,4).loc. I have used this code:
for ii=1:137
for ii=1:137
for ii=1:137
for ii=1:137
... continue for all the columns. This code runs and gives me the correct result, but it's not automated and it's slow. Can you give me ideas to automate and speed up the code?
I think I will write an answer to this since I've not done so for a while.
First convert your cell array to a matrix; this will make the following steps a lot easier. Then diff is the way to go:
A = randi(5,[137,19]);
DiffA = diff(A')'; %// diff along each row: DiffA is 137 by 18, each entry being a value minus the value to its left.
So a 0 in DiffA means that 2 consecutive numbers in A are equal, and 2 consecutive 0s mean that 3 consecutive numbers in A are equal.
idx = DiffA==0;
cnt(:,1) = sum(idx,2);
To do 3 consecutive number counts, you could do something like:
idx2 = abs(DiffA(:,1:end-1))+abs(DiffA(:,2:end)) == 0;
cnt(:,2) = sum(idx2,2);
Or use another diff. The abs is used so that a negative and a positive difference cannot cancel out to 0; with it, only 0 + 0 gives you a 0. You can now continue this pattern:
idx3 = abs(DiffA(:,1:end-2))+abs(DiffA(:,2:end-1))+abs(DiffA(:,3:end)) == 0;
cnt(:,3) = sum(idx3,2);
In loop format, where W is the number of run lengths you want to count:
absDiffA = abs(DiffA);
for ii = 1:W
    idx = (absDiffA == 0);
    cnt(:,ii) = sum(idx,2);
    %// Combine neighbouring columns: an entry stays 0 only if the next difference is 0 as well.
    absDiffA = absDiffA(:,1:end-1) + absDiffA(:,2:end);
end
NOTE: this method counts [0,0,0] twice when evaluating 2 consecutives, and once when evaluating 3 consecutives.
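To make the counting convention in the note concrete, here is a small pure-Python sketch of the same idea for a single row. It is only an illustration, not a translation of the MATLAB code; the function name `count_equal_windows` is made up.

```python
def count_equal_windows(row, k):
    """Count the positions where k consecutive entries of row are all equal.

    As with the diff approach, k equal entries correspond to k - 1
    consecutive zeros in the list of differences.
    """
    diffs = [b - a for a, b in zip(row, row[1:])]
    need = k - 1
    return sum(
        1
        for i in range(len(diffs) - need + 1)
        if all(d == 0 for d in diffs[i:i + need])
    )


if __name__ == "__main__":
    row = [0, 0, 0]
    print(count_equal_windows(row, 2), count_equal_windows(row, 3))
```

Overlapping windows are counted separately, which is why a run of three equal values contributes two pairs.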

How do I check to see if two (or more) elements of an array/vector are the same?

For one of my homework problems, we had to write a function that creates an array containing n random numbers between 1 and 365. (Done). Then, check if any of these n birthdays are identical. Is there a shorter way to do this than doing several loops or several logical expressions?
Thank you!
function = [prob] bdayprob(N,n)
N = input('Please enter the number of experiments performed: N = ');
n = input('Please enter the sample size: n = ');
count = 0;
x(i) = randi(365);
if(x(i)== x)
count = count + 1
If I'm interpreting your question properly, you want to check to see if generating n integers or days results in n unique numbers. Given your current knowledge in MATLAB, it's as simple as doing:
n = 30; %// Define sample size
N = 10; %// Define number of trials
%// Define logical array where each location tells you whether
%// all birthdays were unique for a trial
check = false(1, N);
%// For each trial...
for idx = 1 : N
    %// Generate sample size random numbers
    days = randi(365, n, 1);
    %// Check to see if the total number of unique birthdays
    %// is equal to the sample size
    check(idx) = numel(unique(days)) == n;
end
Whoa! Let's go through the code slowly, shall we? We first define the sample size and the number of trials. We then create a logical array where each location tells you whether all of the birthdays generated for that trial were unique. Next comes a loop where, for each trial, we generate n random numbers from 1 to 365, n being the sample size. We then use unique to find all of the unique integers that were generated. If all of the birthdays are unique, the total number of unique birthdays should equal the sample size. If it doesn't, we have repeats. For example, if we generated a sample of [1 1 1 2 2], the output of unique would be [1 2], and the total number of unique elements is 2. Since this doesn't equal 5, the sample size, we know that the birthdays generated weren't unique. However, if we had [1 3 4 6 7], unique would return the input unchanged, and since the output length is the same as the sample size, we know that all of the days are unique.
So, we check to see if this number is equal to the sample size for each iteration. If it is, then we output true. If not, we output false. When I run this code on my end, this is what I get for check. I set the sample size to 30 and the number of trials to be 10.
check =
0 0 1 1 0 0 0 0 1 0
Take note that if you increase the sample size, there is a higher probability that you will get duplicates, because randi can be considered as sampling with replacement. Therefore, the larger the sample size, the higher the chance of getting duplicate values. I made the sample size small on purpose so that we can see that it's possible to get unique days. However, if you set it to something like 100, or 200, you will most likely get check to be all false as there will most likely be duplicates per trial.
Here are some more approaches that avoid loops. Let
n = 20; %// define sample size
x = randi(365,n,1); %// generate n values between 1 and 365
Any of the following code snippets returns true (or 1) if there are two identical values in x, and false (or 0) otherwise:
Sort and then check if any two consecutive elements are the same:
result = any(diff(sort(x))==0);
Do all pairwise comparisons manually; remove self-pairs and duplicate pairs; and check if any of the remaining comparisons is true:
result = nnz(tril(bsxfun(@eq, x, x.'),-1))>0;
Compute the distance between distinct values, considering each pair just once, and then check if any distance is 0:
result = any(pdist(x(:))==0);
Find the number of occurrences of the most common value (mode):
[~, occurs] = mode(x);
result = occurs>1;
I don't know if I'm supposed to solve the problem for you, but perhaps a few hints may lead you in the right direction (besides, I'm not a MATLAB expert, so it will be in general terms):
Maybe not, but you have to ask yourself what they expect of you. The solution you propose requires you to loop through the array in two nested loops, which means n*(n-1)/2 passes through the loop body (i.e. quadratic time complexity).
There are a number of ways you can improve the time complexity. The most straightforward would be to keep a 365-element table that tracks whether a particular number has been seen yet, which requires only a single loop (i.e. linear time complexity), but perhaps that's not what they're looking for either. Maybe that solution is a little bit ad hoc? What we're basically looking for is a fast lookup of whether a particular number has been seen before. There exist more memory-efficient structures that allow lookup in O(1) or O(log n) time (if you know these, you have an arsenal of tools to use).
Then of course you could use the pigeonhole principle to provide the answer much faster in some special cases (remember that you only asked to determine whether two or more numbers are equal or not).
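The lookup-table hint and the pigeonhole shortcut can be sketched as follows. This is a rough Python 3 illustration rather than MATLAB; the names `has_repeat` and `estimate_probability` are made up, and the 1..365 range is taken from the question.

```python
import random


def has_repeat(days, n_days=365):
    """Return True if any value in days (each in 1..n_days) occurs twice."""
    if len(days) > n_days:
        # Pigeonhole principle: more birthdays than days forces a repeat.
        return True
    seen = [False] * (n_days + 1)  # one slot per possible day
    for d in days:
        if seen[d]:
            return True
        seen[d] = True
    return False


def estimate_probability(n, trials, rng=None):
    """Fraction of trials in which n random birthdays contain a repeat."""
    rng = rng or random.Random()
    hits = sum(
        has_repeat([rng.randint(1, 365) for _ in range(n)])
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    print(estimate_probability(30, 1000))
```

Each sample is scanned once, so a trial costs time linear in the sample size.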

Using binary strings on storing ordered items in database

In this post, @boisvert mentioned that if a string is used as the value of the order field, this is best illustrated with a binary string, and then gave an algorithm to calculate the average of two binary strings as follows:
Avalue = 1+0*(1/2)+1*(1/4)+1*(1/8)
Bvalue = 1+1*(1/2)+0*(1/4)+0*(1/8)
average, new value = 1+0*(1/2)+1*(1/4)+1*(1/8)+1*(1/16) new string = "10111"
content  order
A        '1011'
new!     '10111'
B        '1100'
C        '1101'
I couldn't understand this very well. What is the value of the first item put into the DB, and of the items inserted before or after it? How do I calculate the average between '1011' and the new value '10111', or between '111' and '1000'?
Any help is much appreciated.
The binary strings are fractions, not integers; the decimal point is always at the beginning (or after the first digit, in @boisvert's answer; it doesn't make any difference as long as the position of the point is fixed). Of course, it's actually a binary point, since these are binary numbers.
To find the average:
If the strings differ in length, put enough 0s at the end of the shorter string so that it is the same length as the longer string.
Add the two strings together, using binary addition, always putting the last carry at the beginning, even if it is '0'. [See algorithm below].
Remove any 0s at the end.
Example 1: 1011 and 10111
Extend the first string with a 0: 10110 and 10111
Find the sum:
A: 10110
B: 10111
Carry: 101100
Sum: 101101
No trailing zeros, so the result is 101101
Example 2: 111 and 1000
1. 1110 1000
2. 10110
3. 1011
Starting off and insertion at the end:
The first item put into the database has the label 1. If at any point you need to add an item at the very beginning, use the first label with a 0 before it. Similarly, if you need to add an item at the end, use the first label with a 1 before it.
Binary addition:
Since the strings are the same length, this is easy; set Carry to 0, and scan both strings from back to front. (The output is also produced back-to-front.)
At each position:
* If the sum of Carry and the two digits is 1 or 3, output a 1, otherwise output a 0.
* If the sum of Carry and the two digits is 2 or 3, set Carry to 1, otherwise set it to 0.
When you've finished all the digits, output the value of Carry.
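The three averaging steps and the addition rule above can be put together in a few lines. This is a minimal Python 3 sketch under the answer's conventions (labels are binary fractions with a fixed point); the function name `average_labels` is made up for illustration.

```python
def average_labels(a: str, b: str) -> str:
    """Return a binary label that sorts between labels a and b."""
    # Step 1: pad the shorter label with trailing zeros.
    width = max(len(a), len(b))
    a = a.ljust(width, "0")
    b = b.ljust(width, "0")

    # Step 2: binary addition from back to front, keeping the final carry.
    carry = 0
    out = []
    for x, y in zip(reversed(a), reversed(b)):
        total = carry + int(x) + int(y)
        out.append("1" if total in (1, 3) else "0")
        carry = 1 if total >= 2 else 0
    out.append(str(carry))

    # Step 3: the output was built back to front; drop trailing zeros.
    return "".join(reversed(out)).rstrip("0")


if __name__ == "__main__":
    print(average_labels("1011", "10111"))  # Example 1
    print(average_labels("111", "1000"))    # Example 2
```

Keeping the final carry as an extra leading digit is what turns the sum into the average: the digits stay the same while the point effectively moves one place.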
Practical implementation:
In practice, you wouldn't use binary strings; you'd use some fairly large base, the only requirement being that it is even. But the algorithms are the same. When constructing the representation of your numbers, you need to assign digits to characters in alphabetical order, so that the resulting strings can be sorted alphabetically without converting them to numbers; the database doesn't know how to convert to numbers, but it knows how to sort strings alphabetically.

Algorithm to split an array into P subarrays of balanced sum

I have a big array of length N, let's say something like:
2 4 6 7 6 3 3 3 4 3 4 4 4 3 3 1
I need to split this array into P subarrays (in this example, P=4 would be reasonable), such that the sum of the elements in each subarray is as close as possible to sigma, being:
sigma=(sum of all elements in original array)/P
In this example, sigma=15.
For the sake of clarity, one possible result would be:
2 4 6 | 7 6 3 3 | 3 4 3 4 | 4 4 3 3 1
(sums: 12,19,14,15)
I have written a very naive algorithm based on how I would do the divisions by hand, but I don't know how to impose the condition that a division whose sums are (14,14,14,14,19) is worse than one whose sums are (15,14,16,14,16).
Thank you in advance.
First, let’s formalize your optimization problem by specifying the input, output, and the measure for each possible solution (I hope this is in your interest):
Given an array A of positive integers and a positive integer P, separate the array A into P non-overlapping subarrays such that the difference between the sum of each subarray and the perfect sum of the subarrays (sum(A)/P) is minimal.
Input: Array A of positive integers; P is a positive integer.
Output: Array SA of P non-negative integers representing the length of each subarray of A where the sum of these subarray lengths is equal to the length of A.
Measure: $\left|\operatorname{sum}(sa_j) - \operatorname{sum}(A)/P\right|$ is minimal for each subarray $sa_j = (A_{i_j}, \ldots, A_{i_j + SA_j - 1})$, where $i_j = \sum_{k<j} SA_k$ and $j$ runs from 0 to $P-1$.
The input and output define the set of valid solutions. The measure defines a measure to compare multiple valid solutions. And since we’re looking for a solution with the least difference to the perfect solution (minimization problem), measure should also be minimal.
With this information, it is quite easy to implement the measure function (here in Python):
def measure(a, sa):
    sigma = sum(a)/len(sa)
    diff = 0
    i = 0
    for j in xrange(0, len(sa)):
        diff += abs(sum(a[i:i+sa[j]])-sigma)
        i += sa[j]
    return diff
print measure([2,4,6,7,6,3,3,3,4,3,4,4,4,3,3,1], [3,4,4,5]) # prints 8
Now finding an optimal solution is a little harder.
We can use the Backtracking algorithm for finding valid solutions and use the measure function to rate them. We basically try all possible combinations of P non-negative integer numbers that sum up to length(A) to represent all possible valid solutions. Although this ensures not to miss a valid solution, it is basically a brute-force approach with the benefit that we can omit some branches that cannot be any better than our yet best solution. E.g. in the example above, we wouldn’t need to test solutions with [9,…] (measure > 38) if we already have a solution with measure ≤ 38.
Following the pseudocode pattern from Wikipedia, our bt function looks as follows:
def bt(c):
    global P, optimum, optimum_diff
    if reject(P,c):
        return
    if accept(P,c):
        print "%r with %d" % (c, measure(P,c))
        if measure(P,c) < optimum_diff:
            optimum = c
            optimum_diff = measure(P,c)
        return
    s = first(P,c)
    while s is not None:
        bt(list(s))
        s = next(P,s)
The global variables P, optimum, and optimum_diff represent the problem instance holding the values for A, P, and sigma, as well as the optimal solution and its measure:
class MinimalSumOfSubArraySumsProblem:
    def __init__(self, a, p):
        self.a = a
        self.p = p
        self.sigma = sum(a)/p
Next we specify the reject and accept functions, which are quite straightforward:
def reject(P,c):
    return optimum_diff < measure(P,c)
def accept(P,c):
    return None not in c
This simply rejects any candidate whose measure is already more than our yet optimal solution. And we’re accepting any valid solution.
The measure function is also slightly changed due to the fact that c can now contain None values:
def measure(P, c):
    diff = 0
    i = 0
    for j in xrange(0, P.p):
        if c[j] is None:
            break
        diff += abs(sum(P.a[i:i+c[j]])-P.sigma)
        i += c[j]
    return diff
The remaining two functions, first and next, are a little more complicated:
def first(P,c):
    t = 0
    is_complete = True
    for i in xrange(0, len(c)):
        if c[i] is None:
            if i+1 < len(c):
                c[i] = 0
            else:
                c[i] = len(P.a) - t
            is_complete = False
            break
        else:
            t += c[i]
    if is_complete:
        return None
    return c
def next(P,s):
    t = 0
    for i in xrange(0, len(s)):
        t += s[i]
        if i+1 >= len(s) or s[i+1] is None:
            if t+1 > len(P.a):
                return None
            else:
                s[i] += 1
            return s
Basically, first replaces the next None value in the list: with 0 if it is not the last value in the list, or with the remainder if it is the last value, so that the list represents a valid solution (a little optimization here). It returns None if there is no None value in the list. next simply increments the rightmost integer by one, or returns None if an increment would breach the total limit.
Now all you need is to create a problem instance, initialize the global variables and call bt with the root:
P = MinimalSumOfSubArraySumsProblem([2,4,6,7,6,3,3,3,4,3,4,4,4,3,3,1], 4)
optimum = None
optimum_diff = float("inf")
bt([None]*P.p)
If I am not mistaken here, one more approach is dynamic programming.
You can define P[ pos, n ] as the smallest possible "penalty" accumulated up to position pos if n subarrays were created. Obviously there is some position pos' such that
P[pos', n-1] + penalty(pos', pos) = P[pos, n]
You can just minimize over pos' = 1..pos.
The naive implementation will run in O(N^2 * M), where N - size of the original array and M - number of divisions.
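A rough Python 3 sketch of that recurrence follows. Several things here are assumptions rather than part of the answer: the function name `balanced_split`, the choice of penalty (absolute deviation of a subarray's sum from sigma, borrowed from the measure used earlier on this page), and the restriction to non-empty subarrays.

```python
def balanced_split(a, p):
    """Split a into p contiguous non-empty parts, minimizing the sum of |part sum - sigma|.

    Returns (total_penalty, lengths).
    """
    n = len(a)
    if not 1 <= p <= n:
        raise ValueError("need 1 <= p <= len(a)")
    sigma = sum(a) / p

    prefix = [0]
    for value in a:
        prefix.append(prefix[-1] + value)

    def penalty(start, end):
        # Penalty of the subarray a[start:end] (assumed: absolute deviation).
        return abs(prefix[end] - prefix[start] - sigma)

    inf = float("inf")
    # best[pos][k]: smallest penalty for splitting a[:pos] into k parts.
    best = [[inf] * (p + 1) for _ in range(n + 1)]
    cut = [[0] * (p + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for k in range(1, p + 1):
        for pos in range(k, n + 1):
            for prev in range(k - 1, pos):
                cand = best[prev][k - 1] + penalty(prev, pos)
                if cand < best[pos][k]:
                    best[pos][k] = cand
                    cut[pos][k] = prev

    # Walk the stored cut positions backwards to recover the part lengths.
    lengths = []
    pos = n
    for k in range(p, 0, -1):
        prev = cut[pos][k]
        lengths.append(pos - prev)
        pos = prev
    lengths.reverse()
    return best[n][p], lengths


if __name__ == "__main__":
    print(balanced_split([2, 4, 6, 7, 6, 3, 3, 3, 4, 3, 4, 4, 4, 3, 3, 1], 4))
```

The three nested loops give the O(N^2 * M) running time mentioned above; the prefix sums make each penalty lookup constant time.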
@Gumbo's answer is clear and actionable, but it takes a lot of time when length(A) is bigger than 400 and P is bigger than 8. This is because that algorithm is, as he said, a kind of brute force with benefits.
In fact, a very fast solution is using dynamic programming.
Given an array A of positive integers and a positive integer P, separate the array A into P non-overlapping subarrays such that the difference between the sum of each subarray and the perfect sum of the subarrays (sum(A)/P) is minimal.
Measure: $\sum_{k=1}^{P} (S_k - \bar{S})^2$, where $S_k$ is the sum of the elements of subarray $k$ and $\bar{S}$ is the average of the P subarray sums.
This keeps the sums balanced, because it uses the definition of the standard deviation.
Presuming that array A has N elements: Q(i,j) is the minimum Measure value when splitting the last i elements of A into j subarrays. D(i,j) is (sum(B)-sum(A)/P)^2, where array B consists of the i-th to j-th elements of A (0<=i<=j<N).
The minimum measure of the question is Q(N,P), and we find that:
Q(N,P)=MIN{Q(N-1,P-1)+D(0,0); Q(N-2,P-1)+D(0,1); ...; Q(P-1,P-1)+D(0,N-P)}
So it can be solved by dynamic programming:
Q(i,1) = D(N-i,N-1)
Q(i,j) = MIN{ Q(i-1,j-1)+D(N-i,N-i); Q(i-2,j-1)+D(N-i,N-i+1); ...; Q(j-1,j-1)+D(N-i,N-j) }
So the algorithm steps are:
1. Cal j=1:
Q(1,1), Q(2,1), ..., Q(N,1)
2. Cal j=2:
Q(2,2) = MIN{Q(1,1)+D(N-2,N-2)};
Q(3,2) = MIN{Q(2,1)+D(N-3,N-3); Q(1,1)+D(N-3,N-2)}
Q(4,2) = MIN{Q(3,1)+D(N-4,N-4); Q(2,1)+D(N-4,N-3); Q(1,1)+D(N-4,N-2)}
... Cal j=...
P. Cal j=P:
Q(P,P), Q(P+1,P)...Q(N,P)
The final minimum Measure value is stored as Q(N,P)!
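To trace each subarray's length, you can store the MIN choice made when calculating Q(i,j)=MIN{Q+D...}.
This needs O(N^2) space for D(i,j) and O(N^2 * P) time to calculate Q(N,P), whereas the pure brute-force algorithm has to examine a number of candidate splits that grows combinatorially with N and P.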
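Working code below (I used PHP). This code decides the number of parts itself:
$main = array(2,4,6,1,6,3,2,3,4,3,4,1,4,7,3,1,2,1,3,4,1,7,2,4,1,2,3,1,1,1,1,4,5,7,8,9,8,0);
$pa = 0;         // index of the part being filled
$pi = array(0);  // start offsets of the parts found so far
$p = array();    // elements of the current part
for ($i = 0; $i < count($main); $i++) {
    $p[] = $main[$i];
    // Close the part at the end of the input, or once the next element would move its sum further from 15.
    if ($i + 1 == count($main) || abs(15 - array_sum($p)) < abs(15 - (array_sum($p) + $main[$i+1]))) {
        $pi[] = $i + 1;
        $pc = count($pi);
        $ba = $pi[$pc-2];
        $part[$pa] = array_slice($main, $ba, count($p));
        $pa++;
        $p = array();
    }
}
for ($s = 0; $s < count($part); $s++) {
    echo '<br>';
    echo array_sum($part[$s]);
}
The code outputs the sum of each part.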
I'm wondering whether the following would work:
Go from the left, as soon as sum > sigma, branch into two, one including the value that pushes it over, and one that doesn't. Recursively process data to the right with rightSum = totalSum-leftSum and rightP = P-1.
So, at the start, sum = 60
2 4 6 7 6 3 3 3 4 3 4 4 4 3 3 1
Then for 2 4 6 7, sum = 19 > sigma, so split into:
2 4 6 | 7 6 3 3 3 4 3 4 4 4 3 3 1
2 4 6 7 | 6 3 3 3 4 3 4 4 4 3 3 1
Then we process 7 6 3 3 3 4 3 4 4 4 3 3 1 and 6 3 3 3 4 3 4 4 4 3 3 1 with P = 4-1 and sum = 60-12 and sum = 60-19 respectively.
This results in, I think, O(P*n).
It might be a problem when 1 or 2 values are by far the largest, but for any value >= sigma we can probably just put it in its own partition (preprocessing the array to find these, and reducing sum appropriately, might be the best idea).
If it works, it should hopefully minimise sum-of-squared-error (or close to that), which seems like the desired measure.
I propose an algorithm based on backtracking. The main function randomly selects an element from the original array and adds it to one of the partition arrays. After each addition it checks whether the result is a better solution than the previous one, using a function that calculates the deviation each time a new element is added. I also thought it would be good to limit the number of loop iterations, so that the program is forced to end if it cannot reach the desired solution. By desired solution I mean one where all elements have been added while respecting the condition in the if statement.
Read P
initialize P vectors, with names vector_partition[i], i=1..P
list_vector initialize a list what pointed this P vectors
initialize a diferences_vector with dimension of P
//that can easy visualize like a vector of vectors
//construct a non-recursive backtracking algorithm
function Deviation(vector) //function for calculate deviation of elements from a vector
for i=0 to Size(vector)-1 do
return dev
//fix some maximum number of iteration for while loop
Read max_iteration
//as the number of iterations will be higher the more it will get
//a more accurate solution
for i=1 to Size(list_vector) do
if(IsEmpty(vector)) break from while loop
el=SelectElement(vector) //you can implement that function using a randomized
//choice of element
PutOnBackVector(vector_list[i], el)
ExtractFromBackVectorAndPutOnSecondVector(list_vector, vector)
//prevent to enter in some infinite loop
if (iteration>max_iteration) break from while loop
You can change this by adding some code to the first if which increments the calculated deviation by some amount.
ExtractFromBackVectorAndPutOnSecondVector(list_vector, vector)
//delete second if from first version
Your problem is very similar to, or the same as, the minimum makespan scheduling problem, depending on how you define your objective. In the case that you want to minimize the maximum |sum_i - sigma|, it is exactly that problem.
As referenced in the Wikipedia article, this problem is NP-complete for p > 2. Graham's list scheduling algorithm is optimal for p <= 3, and provides an approximation ratio of 2 - 1/p. You can check out the Wikipedia article for other algorithms and their approximation.
All the algorithms given on this page are either solving for a different objective, incorrect/suboptimal, or can be used to solve any problem in NP :)
This is very similar to the one-dimensional bin packing problem. In The Algorithm Design Manual, Skiena suggests a first-fit decreasing approach, i.e. figure out your bin size (mean = sum / N), and then allocate the largest remaining object into the first bin that has room for it. You either get to a point where you have to start over-filling a bin, or if you're lucky you get a perfect fit. As Skiena states, "First-fit decreasing has an intuitive appeal to it, for we pack the bulky objects first and hope that little objects can fill up the cracks."
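A minimal Python 3 sketch of first-fit decreasing as described above. Note that, like bin packing itself, it ignores the original order of the elements, so the groups are not contiguous subarrays. The function name `first_fit_decreasing` is made up, and the rule for what to do once no bin has room (overfill the least loaded bin) is an assumption, since the text leaves that choice open.

```python
def first_fit_decreasing(items, bin_count):
    """Distribute items over bin_count bins, largest items first."""
    capacity = sum(items) / bin_count  # the "mean" bin size
    bins = [[] for _ in range(bin_count)]
    loads = [0] * bin_count
    for item in sorted(items, reverse=True):
        # First bin that still has room for the item, if any.
        target = next(
            (i for i, load in enumerate(loads) if load + item <= capacity),
            None,
        )
        if target is None:
            # Assumption: when nothing fits, overfill the least loaded bin.
            target = min(range(bin_count), key=loads.__getitem__)
        bins[target].append(item)
        loads[target] += item
    return bins


if __name__ == "__main__":
    data = [2, 4, 6, 7, 6, 3, 3, 3, 4, 3, 4, 4, 4, 3, 3, 1]
    for group in first_fit_decreasing(data, 4):
        print(group, sum(group))
```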
As a previous poster said, the problem looks like it's NP-complete, so you're not going to solve it perfectly in reasonable time, and you need to look for heuristics.
I recently needed this and did the following:
1. Create an initial array of sub-arrays whose length is the given sub-array count. Each sub-array should also have a sum property, i.e. [[sum:0],[sum:0]...[sum:0]].
2. Sort the main array in descending order.
3. Find the sub-array with the smallest sum, insert one item from the main array into it, and increase that sub-array's sum property by the inserted item's value.
4. Repeat step 3 until the end of the main array is reached.
5. Return the initial array.
This is the code in JS.
function groupTasks(tasks,groupCount){
  var initial = [...Array(groupCount)].map(sa => (sa = [], sa.sum = 0, sa));
  return tasks.sort((a,b) => b-a)
              .reduce((groups,task) => { var group = groups.reduce((p,c) => p.sum < c.sum ? p : c);
                                         group.push(task);
                                         group.sum += task;
                                         return groups;
                                       },initial);
}
var tasks = [...Array(50)].map(_ => ~~(Math.random()*10)+1), // create an array of 50 random elements from 1 to 10
    result = groupTasks(tasks,7); // distribute them into 7 sub-arrays with closest sums
console.log("input array:", JSON.stringify(tasks));
console.log(result.map(r => [JSON.stringify(r),"sum: " + r.sum]));
You can use Max Flow algorithm.

Generating also non-unique (duplicated) permutations

I've written a basic permutation program in C.
The user types a number, and it prints all the permutations of that number.
Basically, this is how it works (the main algorithm is the one used to find the next higher permutation):
int currentPerm = toAscending(num);
int lastPerm = toDescending(num);
int counter = 1;
printf("%d", currentPerm);
while (currentPerm != lastPerm)
{
    counter++;
    currentPerm = nextHigherPerm(currentPerm);
    printf("%d", currentPerm);
}
However, when the number input includes repeated digits - duplicates - some permutations are not being generated, since they're duplicates. The counter shows a different number than it's supposed to - Instead of showing the factorial of the number of digits in the number, it shows a smaller number, of only unique permutations.
For example:
num = 1234567
counter = 5040 (7! - all unique)
num = 1123456
counter = 2520
num = 1112345
counter = 840
I want it to treat repeated/duplicated digits as if they were different. I don't want to generate only the unique permutations, but rather all the permutations, regardless of whether they are duplicates of others.
Uhm... why not just calculate the factorial of the length of the input string then? ;)
I want to it to treat repeated/duplicated digits as if they were
different - I don't want to calculate only the number of unique
If the only information that nextHigherPerm() uses is the number that's passed in, you're out of luck. Consider nextHigherPerm(122). How can the function know how many versions of 122 it has already seen? Should nextHigherPerm(122) return 122 or 212? There's no way to know unless you keep track of the current state of the generator separately.
When you have 3 distinct letters, for example ABC, you can make ABC, ACB, BAC, BCA, CAB, CBA: 6 permutations (3!). If 2 of those letters repeat, as in AAB, you can make only AAB, ABA, BAA. That is 3, not 3!, so where does it come from? When a digit or letter is repeated, as in these examples, you can count the arrangements with combinations: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, here by choosing which k of the n positions hold the repeated letter.
Let's make another illustrative example: AAAB. The possible arrangements are AAAB, AABA, ABAA, BAAA, only four, and if you calculate them with the formula, 4C3 = 4.
The procedure to generate all these lists is:
1. Store the digits in an array. Example: ABCD.
2. Set element 0 of the array as the pivot element, and exclude it from the temp array: A {BCD}.
3. Then, as you want all the combinations (even the repeated ones), rotate the elements of the temp array to the right or left (whichever you like) until you reach the n-th element.
4. Do the second step again, but with the temp array.
5. Do the third step again, but inside the second temp array.
6. Go back to the first array and rotate it (BCDA), set B as the pivot, and repeat until you have found all combinations.
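For comparison, here is what "all permutations, duplicates included" looks like in a short Python 3 sketch. It does not follow the pivot procedure above step by step; it relies on `itertools.permutations`, which treats elements as distinct by position rather than by value. The function name `all_permutations` is made up.

```python
from itertools import permutations


def all_permutations(digits):
    """Return every ordering of digits, treating repeated digits as distinct."""
    return ["".join(p) for p in permutations(digits)]


if __name__ == "__main__":
    perms = all_permutations("112")
    print(len(perms), len(set(perms)))
    print(perms)
```

The length of the list is always the factorial of the number of digits, while the size of the set gives the number of unique permutations that the original program was counting.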
Why not convert it to a string then treat your program like an anagram generator?
