I want to write a function that takes an array of letters as an argument and a number of those letters to select.
Say you provide an array of 8 letters and want to select 3 letters from that. Then you should get:
8! / ((8 - 3)! * 3!) = 56
Arrays (or words) in return consisting of 3 letters each.
Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.
Gray Codes
An issue that you will come across is of course memory and pretty quickly, you'll have problems by 20 elements in your set -- 20C3 = 1140. And if you want to iterate over the set it's best to use a modified gray code algorithm so you aren't holding all of them in memory. These generate the next combination from the previous and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? minimize? et cetera.
Some of the original papers describing gray codes:
Some Hamilton Paths and a Minimal Change Algorithm
Adjacent Interchange Combination Generation Algorithm
Here are some other papers covering the topic:
An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
Combination Generators
Survey of Combinatorial Gray Codes (PostScript)
An Algorithm for Gray Codes
Chase's Twiddle (algorithm)
Phillip J Chase, `Algorithm 382: Combinations of M out of N Objects' (1970)
The algorithm in C...
Index of Combinations in Lexicographical Order (Buckles Algorithm 515)
You can also reference a combination by its index (in lexicographical order). Realizing that the index should be some amount of change from right to left based on the index we can construct something that should recover a combination.
So, we have a set {1,2,3,4,5,6}... and we want three elements. Let's say {1,2,3} we can say that the difference between the elements is one and in order and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of 'changes' in the last place accounts for one change in the lexicographical ordering. The second place, with one change {1,3,4} has one change but accounts for more change since it's in the second place (proportional to the number of elements in the original set).
The method I've described is a deconstruction, as it seems, from set to the index, we need to do the reverse – which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes – I used the index of the sets rather than a number range to represent the set, so we are always working from 0...n.
Note:
Since combinations are unordered, {1,3,2} = {1,2,3} --we order them to be lexicographical.
This method has an implicit 0 to start the set for the first difference.
Index of Combinations in Lexicographical Order (McCaffrey)
There is another way:, its concept is easier to grasp and program but it's without the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:
The set that maximizes , where .
For an example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1). So, the 27th lexicographical combination of four things is: {1,2,5,6}, those are the indexes of whatever set you want to look at. Example below (OCaml), requires choose function, left to reader:
(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
(* maximize function -- maximize a that is aCb *)
(* return largest c where c < i and choose(c,i) <= z *)
let rec maximize a b x =
if (choose a b ) <= x then a else maximize (a-1) b x
in
let rec iterate n x i = match i with
| 0 -> []
| i ->
let max = maximize n i x in
max :: iterate n (x - (choose max i)) (i-1)
in
if x < 0 then failwith "errors" else
let idxs = iterate (List.length set) x k in
List.map (List.nth set) (List.sort (-) idxs)
A small and simple combinations iterator
The following two algorithms are provided for didactic purposes. They implement an iterator and (a more general) folder overall combinations.
They are as fast as possible, having the complexity O(nCk). The memory consumption is bound by k.
We will start with the iterator, which will call a user provided function for each combination
let iter_combs n k f =
let rec iter v s j =
if j = k then f v
else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
iter [] 0 0
A more general version will call the user provided function along with the state variable, starting from the initial state. Since we need to pass the state between different states we won't use the for-loop, but instead, use recursion,
let fold_combs n k f x =
let rec loop i s c x =
if i < n then
loop (i+1) s c ##
let c = i::c and s = s + 1 and i = i + 1 in
if s < k then loop i s c x else f c x
else x in
loop 0 0 [] x
In C#:
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
return k == 0 ? new[] { new T[0] } :
elements.SelectMany((e, i) =>
elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] {e}).Concat(c)));
}
Usage:
var result = Combinations(new[] { 1, 2, 3, 4, 5 }, 3);
Result:
123
124
125
134
135
145
234
235
245
345
Short java solution:
import java.util.Arrays;
public class Combination {
public static void main(String[] args){
String[] arr = {"A","B","C","D","E","F"};
combinations2(arr, 3, 0, new String[3]);
}
static void combinations2(String[] arr, int len, int startPosition, String[] result){
if (len == 0){
System.out.println(Arrays.toString(result));
return;
}
for (int i = startPosition; i <= arr.length-len; i++){
result[result.length - len] = arr[i];
combinations2(arr, len-1, i+1, result);
}
}
}
Result will be
[A, B, C]
[A, B, D]
[A, B, E]
[A, B, F]
[A, C, D]
[A, C, E]
[A, C, F]
[A, D, E]
[A, D, F]
[A, E, F]
[B, C, D]
[B, C, E]
[B, C, F]
[B, D, E]
[B, D, F]
[B, E, F]
[C, D, E]
[C, D, F]
[C, E, F]
[D, E, F]
May I present my recursive Python solution to this problem?
def choose_iter(elements, length):
for i in xrange(len(elements)):
if length == 1:
yield (elements[i],)
else:
for next in choose_iter(elements[i+1:], length-1):
yield (elements[i],) + next
def choose(l, k):
return list(choose_iter(l, k))
Example usage:
>>> len(list(choose_iter("abcdefgh",3)))
56
I like it for its simplicity.
Lets say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word, You start with:
A B C D E F G H
^ ^ ^
i j k
First you vary k, so the next step looks like that:
A B C D E F G H
^ ^ ^
i j k
If you reached the end you go on and vary j and then k again.
A B C D E F G H
^ ^ ^
i j k
A B C D E F G H
^ ^ ^
i j k
Once you j reached G you start also to vary i.
A B C D E F G H
^ ^ ^
i j k
A B C D E F G H
^ ^ ^
i j k
...
Written in code this look something like that
void print_combinations(const char *string)
{
int i, j, k;
int len = strlen(string);
for (i = 0; i < len - 2; i++)
{
for (j = i + 1; j < len - 1; j++)
{
for (k = j + 1; k < len; k++)
printf("%c%c%c\n", string[i], string[j], string[k]);
}
}
}
The following recursive algorithm picks all of the k-element combinations from an ordered set:
choose the first element i of your combination
combine i with each of the combinations of k-1 elements chosen recursively from the set of elements larger than i.
Iterate the above for each i in the set.
It is essential that you pick the rest of the elements as larger than i, to avoid repetition. This way [3,5] will be picked only once, as [3] combined with [5], instead of twice (the condition eliminates [5] + [3]). Without this condition you get variations instead of combinations.
Short example in Python:
def comb(sofar, rest, n):
if n == 0:
print sofar
else:
for i in range(len(rest)):
comb(sofar + rest[i], rest[i+1:], n-1)
>>> comb("", "abcde", 3)
abc
abd
abe
acd
ace
ade
bcd
bce
bde
cde
For explanation, the recursive method is described with the following example:
Example: A B C D E
All combinations of 3 would be:
A with all combinations of 2 from the rest (B C D E)
B with all combinations of 2 from the rest (C D E)
C with all combinations of 2 from the rest (D E)
I found this thread useful and thought I would add a Javascript solution that you can pop into Firebug. Depending on your JS engine, it could take a little time if the starting string is large.
function string_recurse(active, rest) {
if (rest.length == 0) {
console.log(active);
} else {
string_recurse(active + rest.charAt(0), rest.substring(1, rest.length));
string_recurse(active, rest.substring(1, rest.length));
}
}
string_recurse("", "abc");
The output should be as follows:
abc
ab
ac
a
bc
b
c
In C++ the following routine will produce all combinations of length distance(first,k) between the range [first,last):
#include <algorithm>
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
/* Credits: Mark Nelson http://marknelson.us */
if ((first == last) || (first == k) || (last == k))
return false;
Iterator i1 = first;
Iterator i2 = last;
++i1;
if (last == i1)
return false;
i1 = last;
--i1;
i1 = k;
--i2;
while (first != i1)
{
if (*--i1 < *i2)
{
Iterator j = k;
while (!(*i1 < *j)) ++j;
std::iter_swap(i1,j);
++i1;
++j;
i2 = k;
std::rotate(i1,j,last);
while (last != j)
{
++j;
++i2;
}
std::rotate(k,i2,last);
return true;
}
}
std::rotate(first,k,last);
return false;
}
It can be used like this:
#include <string>
#include <iostream>
int main()
{
std::string s = "12345";
std::size_t comb_size = 3;
do
{
std::cout << std::string(s.begin(), s.begin() + comb_size) << std::endl;
} while (next_combination(s.begin(), s.begin() + comb_size, s.end()));
return 0;
}
This will print the following:
123
124
125
134
135
145
234
235
245
345
static IEnumerable<string> Combinations(List<string> characters, int length)
{
for (int i = 0; i < characters.Count; i++)
{
// only want 1 character, just return this one
if (length == 1)
yield return characters[i];
// want more than one character, return this one plus all combinations one shorter
// only use characters after the current one for the rest of the combinations
else
foreach (string next in Combinations(characters.GetRange(i + 1, characters.Count - (i + 1)), length - 1))
yield return characters[i] + next;
}
}
Simple recursive algorithm in Haskell
import Data.List
combinations 0 lst = [[]]
combinations n lst = do
(x:xs) <- tails lst
rest <- combinations (n-1) xs
return $ x : rest
We first define the special case, i.e. selecting zero elements. It produces a single result, which is an empty list (i.e. a list that contains an empty list).
For n > 0, x goes through every element of the list and xs is every element after x.
rest picks n - 1 elements from xs using a recursive call to combinations. The final result of the function is a list where each element is x : rest (i.e. a list which has x as head and rest as tail) for every different value of x and rest.
> combinations 3 "abcde"
["abc","abd","abe","acd","ace","ade","bcd","bce","bde","cde"]
And of course, since Haskell is lazy, the list is gradually generated as needed, so you can partially evaluate exponentially large combinations.
> let c = combinations 8 "abcdefghijklmnopqrstuvwxyz"
> take 10 c
["abcdefgh","abcdefgi","abcdefgj","abcdefgk","abcdefgl","abcdefgm","abcdefgn",
"abcdefgo","abcdefgp","abcdefgq"]
And here comes granddaddy COBOL, the much maligned language.
Let's assume an array of 34 elements of 8 bytes each (purely arbitrary selection.) The idea is to enumerate all possible 4-element combinations and load them into an array.
We use 4 indices, one each for each position in the group of 4
The array is processed like this:
idx1 = 1
idx2 = 2
idx3 = 3
idx4 = 4
We vary idx4 from 4 to the end. For each idx4 we get a unique combination
of groups of four. When idx4 comes to the end of the array, we increment idx3 by 1 and set idx4 to idx3+1. Then we run idx4 to the end again. We proceed in this manner, augmenting idx3,idx2, and idx1 respectively until the position of idx1 is less than 4 from the end of the array. That finishes the algorithm.
1 --- pos.1
2 --- pos 2
3 --- pos 3
4 --- pos 4
5
6
7
etc.
First iterations:
1234
1235
1236
1237
1245
1246
1247
1256
1257
1267
etc.
A COBOL example:
01 DATA_ARAY.
05 FILLER PIC X(8) VALUE "VALUE_01".
05 FILLER PIC X(8) VALUE "VALUE_02".
etc.
01 ARAY_DATA OCCURS 34.
05 ARAY_ITEM PIC X(8).
01 OUTPUT_ARAY OCCURS 50000 PIC X(32).
01 MAX_NUM PIC 99 COMP VALUE 34.
01 INDEXXES COMP.
05 IDX1 PIC 99.
05 IDX2 PIC 99.
05 IDX3 PIC 99.
05 IDX4 PIC 99.
05 OUT_IDX PIC 9(9).
01 WHERE_TO_STOP_SEARCH PIC 99 COMP.
* Stop the search when IDX1 is on the third last array element:
COMPUTE WHERE_TO_STOP_SEARCH = MAX_VALUE - 3
MOVE 1 TO IDX1
PERFORM UNTIL IDX1 > WHERE_TO_STOP_SEARCH
COMPUTE IDX2 = IDX1 + 1
PERFORM UNTIL IDX2 > MAX_NUM
COMPUTE IDX3 = IDX2 + 1
PERFORM UNTIL IDX3 > MAX_NUM
COMPUTE IDX4 = IDX3 + 1
PERFORM UNTIL IDX4 > MAX_NUM
ADD 1 TO OUT_IDX
STRING ARAY_ITEM(IDX1)
ARAY_ITEM(IDX2)
ARAY_ITEM(IDX3)
ARAY_ITEM(IDX4)
INTO OUTPUT_ARAY(OUT_IDX)
ADD 1 TO IDX4
END-PERFORM
ADD 1 TO IDX3
END-PERFORM
ADD 1 TO IDX2
END_PERFORM
ADD 1 TO IDX1
END-PERFORM.
Another C# version with lazy generation of the combination indices. This version maintains a single array of indices to define a mapping between the list of all values and the values for the current combination, i.e. constantly uses O(k) additional space during the entire runtime. The code generates individual combinations, including the first one, in O(k) time.
public static IEnumerable<T[]> Combinations<T>(this T[] values, int k)
{
if (k < 0 || values.Length < k)
yield break; // invalid parameters, no combinations possible
// generate the initial combination indices
var combIndices = new int[k];
for (var i = 0; i < k; i++)
{
combIndices[i] = i;
}
while (true)
{
// return next combination
var combination = new T[k];
for (var i = 0; i < k; i++)
{
combination[i] = values[combIndices[i]];
}
yield return combination;
// find first index to update
var indexToUpdate = k - 1;
while (indexToUpdate >= 0 && combIndices[indexToUpdate] >= values.Length - k + indexToUpdate)
{
indexToUpdate--;
}
if (indexToUpdate < 0)
yield break; // done
// update combination indices
for (var combIndex = combIndices[indexToUpdate] + 1; indexToUpdate < k; indexToUpdate++, combIndex++)
{
combIndices[indexToUpdate] = combIndex;
}
}
}
Test code:
foreach (var combination in new[] {'a', 'b', 'c', 'd', 'e'}.Combinations(3))
{
System.Console.WriteLine(String.Join(" ", combination));
}
Output:
a b c
a b d
a b e
a c d
a c e
a d e
b c d
b c e
b d e
c d e
Here is an elegant, generic implementation in Scala, as described on 99 Scala Problems.
object P26 {
def flatMapSublists[A,B](ls: List[A])(f: (List[A]) => List[B]): List[B] =
ls match {
case Nil => Nil
case sublist#(_ :: tail) => f(sublist) ::: flatMapSublists(tail)(f)
}
def combinations[A](n: Int, ls: List[A]): List[List[A]] =
if (n == 0) List(Nil)
else flatMapSublists(ls) { sl =>
combinations(n - 1, sl.tail) map {sl.head :: _}
}
}
If you can use SQL syntax - say, if you're using LINQ to access fields of an structure or array, or directly accessing a database that has a table called "Alphabet" with just one char field "Letter", you can adapt following code:
SELECT A.Letter, B.Letter, C.Letter
FROM Alphabet AS A, Alphabet AS B, Alphabet AS C
WHERE A.Letter<>B.Letter AND A.Letter<>C.Letter AND B.Letter<>C.Letter
AND A.Letter<B.Letter AND B.Letter<C.Letter
This will return all combinations of 3 letters, notwithstanding how many letters you have in table "Alphabet" (it can be 3, 8, 10, 27, etc.).
If what you want is all permutations, rather than combinations (i.e. you want "ACB" and "ABC" to count as different, rather than appear just once) just delete the last line (the AND one) and it's done.
Post-Edit: After re-reading the question, I realise what's needed is the general algorithm, not just a specific one for the case of selecting 3 items. Adam Hughes' answer is the complete one, unfortunately I cannot vote it up (yet). This answer's simple but works only for when you want exactly 3 items.
I had a permutation algorithm I used for project euler, in python:
def missing(miss,src):
"Returns the list of items in src not present in miss"
return [i for i in src if i not in miss]
def permutation_gen(n,l):
"Generates all the permutations of n items of the l list"
for i in l:
if n<=1: yield [i]
r = [i]
for j in permutation_gen(n-1,missing([i],l)): yield r+j
If
n<len(l)
you should have all combination you need without repetition, do you need it?
It is a generator, so you use it in something like this:
for comb in permutation_gen(3,list("ABCDEFGH")):
print comb
https://gist.github.com/3118596
There is an implementation for JavaScript. It has functions to get k-combinations and all combinations of an array of any objects. Examples:
k_combinations([1,2,3], 2)
-> [[1,2], [1,3], [2,3]]
combinations([1,2,3])
-> [[1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]]
Lets say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word, You start with:
A B C D E F G H
^ ^ ^
i j k
First you vary k, so the next step looks like that:
A B C D E F G H
^ ^ ^
i j k
If you reached the end you go on and vary j and then k again.
A B C D E F G H
^ ^ ^
i j k
A B C D E F G H
^ ^ ^
i j k
Once you j reached G you start also to vary i.
A B C D E F G H
^ ^ ^
i j k
A B C D E F G H
^ ^ ^
i j k
...
function initializePointers($cnt) {
$pointers = [];
for($i=0; $i<$cnt; $i++) {
$pointers[] = $i;
}
return $pointers;
}
function incrementPointers(&$pointers, &$arrLength) {
for($i=0; $i<count($pointers); $i++) {
$currentPointerIndex = count($pointers) - $i - 1;
$currentPointer = $pointers[$currentPointerIndex];
if($currentPointer < $arrLength - $i - 1) {
++$pointers[$currentPointerIndex];
for($j=1; ($currentPointerIndex+$j)<count($pointers); $j++) {
$pointers[$currentPointerIndex+$j] = $pointers[$currentPointerIndex]+$j;
}
return true;
}
}
return false;
}
function getDataByPointers(&$arr, &$pointers) {
$data = [];
for($i=0; $i<count($pointers); $i++) {
$data[] = $arr[$pointers[$i]];
}
return $data;
}
function getCombinations($arr, $cnt)
{
$len = count($arr);
$result = [];
$pointers = initializePointers($cnt);
do {
$result[] = getDataByPointers($arr, $pointers);
} while(incrementPointers($pointers, count($arr)));
return $result;
}
$result = getCombinations([0, 1, 2, 3, 4, 5], 3);
print_r($result);
Based on https://stackoverflow.com/a/127898/2628125, but more abstract, for any size of pointers.
Here you have a lazy evaluated version of that algorithm coded in C#:
static bool nextCombination(int[] num, int n, int k)
{
bool finished, changed;
changed = finished = false;
if (k > 0)
{
for (int i = k - 1; !finished && !changed; i--)
{
if (num[i] < (n - 1) - (k - 1) + i)
{
num[i]++;
if (i < k - 1)
{
for (int j = i + 1; j < k; j++)
{
num[j] = num[j - 1] + 1;
}
}
changed = true;
}
finished = (i == 0);
}
}
return changed;
}
static IEnumerable Combinations<T>(IEnumerable<T> elements, int k)
{
T[] elem = elements.ToArray();
int size = elem.Length;
if (k <= size)
{
int[] numbers = new int[k];
for (int i = 0; i < k; i++)
{
numbers[i] = i;
}
do
{
yield return numbers.Select(n => elem[n]);
}
while (nextCombination(numbers, size, k));
}
}
And test part:
static void Main(string[] args)
{
int k = 3;
var t = new[] { "dog", "cat", "mouse", "zebra"};
foreach (IEnumerable<string> i in Combinations(t, k))
{
Console.WriteLine(string.Join(",", i));
}
}
Hope this help you!
Another version, that forces all the first k to appear firstly, then all the first k+1 combinations, then all the first k+2 etc.. It means that if you have sorted array, the most important on the top, it would take them and expand gradually to the next ones - only when it is must do so.
private static bool NextCombinationFirstsAlwaysFirst(int[] num, int n, int k)
{
if (k > 1 && NextCombinationFirstsAlwaysFirst(num, num[k - 1], k - 1))
return true;
if (num[k - 1] + 1 == n)
return false;
++num[k - 1];
for (int i = 0; i < k - 1; ++i)
num[i] = i;
return true;
}
For instance, if you run the first method ("nextCombination") on k=3, n=5 you'll get:
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
But if you'll run
int[] nums = new int[k];
for (int i = 0; i < k; ++i)
nums[i] = i;
do
{
Console.WriteLine(string.Join(" ", nums));
}
while (NextCombinationFirstsAlwaysFirst(nums, n, k));
You'll get this (I added empty lines for clarity):
0 1 2
0 1 3
0 2 3
1 2 3
0 1 4
0 2 4
1 2 4
0 3 4
1 3 4
2 3 4
It's adding "4" only when must to, and also after "4" was added it adds "3" again only when it must to (after doing 01, 02, 12).
Array.prototype.combs = function(num) {
var str = this,
length = str.length,
of = Math.pow(2, length) - 1,
out, combinations = [];
while(of) {
out = [];
for(var i = 0, y; i < length; i++) {
y = (1 << i);
if(y & of && (y !== of))
out.push(str[i]);
}
if (out.length >= num) {
combinations.push(out);
}
of--;
}
return combinations;
}
Clojure version:
(defn comb [k l]
(if (= 1 k) (map vector l)
(apply concat
(map-indexed
#(map (fn [x] (conj x %2))
(comb (dec k) (drop (inc %1) l)))
l))))
Algorithm:
Count from 1 to 2^n.
Convert each digit to its binary representation.
Translate each 'on' bit to elements of your set, based on position.
In C#:
void Main()
{
var set = new [] {"A", "B", "C", "D" }; //, "E", "F", "G", "H", "I", "J" };
var kElement = 2;
for(var i = 1; i < Math.Pow(2, set.Length); i++) {
var result = Convert.ToString(i, 2).PadLeft(set.Length, '0');
var cnt = Regex.Matches(Regex.Escape(result), "1").Count;
if (cnt == kElement) {
for(int j = 0; j < set.Length; j++)
if ( Char.GetNumericValue(result[j]) == 1)
Console.Write(set[j]);
Console.WriteLine();
}
}
}
Why does it work?
There is a bijection between the subsets of an n-element set and n-bit sequences.
That means we can figure out how many subsets there are by counting sequences.
e.g., the four element set below can be represented by {0,1} X {0, 1} X {0, 1} X {0, 1} (or 2^4) different sequences.
So - all we have to do is count from 1 to 2^n to find all the combinations. (We ignore the empty set.) Next, translate the digits to their binary representation. Then substitute elements of your set for 'on' bits.
If you want only k element results, only print when k bits are 'on'.
(If you want all subsets instead of k length subsets, remove the cnt/kElement part.)
(For proof, see MIT free courseware Mathematics for Computer Science, Lehman et al, section 11.2.2. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/readings/ )
short python code, yielding index positions
def yield_combos(n,k):
# n is set size, k is combo size
i = 0
a = [0]*k
while i > -1:
for j in range(i+1, k):
a[j] = a[j-1]+1
i=j
yield a
while a[i] == i + n - k:
i -= 1
a[i] += 1
All said and and done here comes the O'caml code for that.
Algorithm is evident from the code..
let combi n lst =
let rec comb l c =
if( List.length c = n) then [c] else
match l with
[] -> []
| (h::t) -> (combi t (h::c))#(combi t c)
in
combi lst []
;;
Here is a method which gives you all combinations of specified size from a random length string. Similar to quinmars' solution, but works for varied input and k.
The code can be changed to wrap around, ie 'dab' from input 'abcd' w k=3.
public void run(String data, int howMany){
choose(data, howMany, new StringBuffer(), 0);
}
//n choose k
private void choose(String data, int k, StringBuffer result, int startIndex){
if (result.length()==k){
System.out.println(result.toString());
return;
}
for (int i=startIndex; i<data.length(); i++){
result.append(data.charAt(i));
choose(data,k,result, i+1);
result.setLength(result.length()-1);
}
}
Output for "abcde":
abc abd abe acd ace ade bcd bce bde cde
Short javascript version (ES 5)
let combine = (list, n) =>
n == 0 ?
[[]] :
list.flatMap((e, i) =>
combine(
list.slice(i + 1),
n - 1
).map(c => [e].concat(c))
);
let res = combine([1,2,3,4], 3);
res.forEach(e => console.log(e.join()));
Another python recusive solution.
def combination_indicies(n, k, j = 0, stack = []):
if len(stack) == k:
yield list(stack)
return
for i in range(j, n):
stack.append(i)
for x in combination_indicies(n, k, i + 1, stack):
yield x
stack.pop()
list(combination_indicies(5, 3))
Output:
[[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
I created a solution in SQL Server 2005 for this, and posted it on my website: http://www.jessemclain.com/downloads/code/sql/fn_GetMChooseNCombos.sql.htm
Here is an example to show usage:
SELECT * FROM dbo.fn_GetMChooseNCombos('ABCD', 2, '')
results:
Word
----
AB
AC
AD
BC
BD
CD
(6 row(s) affected)
Here is my proposition in C++
I tried to impose as little restriction on the iterator type as i could so this solution assumes just forward iterator, and it can be a const_iterator. This should work with any standard container. In cases where arguments don't make sense it throws std::invalid_argumnent
#include <vector>
#include <stdexcept>
template <typename Fci> // Fci - forward const iterator
std::vector<std::vector<Fci> >
enumerate_combinations(Fci begin, Fci end, unsigned int combination_size)
{
if(begin == end && combination_size > 0u)
throw std::invalid_argument("empty set and positive combination size!");
std::vector<std::vector<Fci> > result; // empty set of combinations
if(combination_size == 0u) return result; // there is exactly one combination of
// size 0 - emty set
std::vector<Fci> current_combination;
current_combination.reserve(combination_size + 1u); // I reserve one aditional slot
// in my vector to store
// the end sentinel there.
// The code is cleaner thanks to that
for(unsigned int i = 0u; i < combination_size && begin != end; ++i, ++begin)
{
current_combination.push_back(begin); // Construction of the first combination
}
// Since I assume the itarators support only incrementing, I have to iterate over
// the set to get its size, which is expensive. Here I had to itrate anyway to
// produce the first cobination, so I use the loop to also check the size.
if(current_combination.size() < combination_size)
throw std::invalid_argument("combination size > set size!");
result.push_back(current_combination); // Store the first combination in the results set
current_combination.push_back(end); // Here I add mentioned earlier sentinel to
// simplyfy rest of the code. If I did it
// earlier, previous statement would get ugly.
while(true)
{
unsigned int i = combination_size;
Fci tmp; // Thanks to the sentinel I can find first
do // iterator to change, simply by scaning
{ // from right to left and looking for the
tmp = current_combination[--i]; // first "bubble". The fact, that it's
++tmp; // a forward iterator makes it ugly but I
} // can't help it.
while(i > 0u && tmp == current_combination[i + 1u]);
// Here is probably my most obfuscated expression.
// Loop above looks for a "bubble". If there is no "bubble", that means, that
// current_combination is the last combination, Expression in the if statement
// below evaluates to true and the function exits returning result.
// If the "bubble" is found however, the ststement below has a sideeffect of
// incrementing the first iterator to the left of the "bubble".
if(++current_combination[i] == current_combination[i + 1u])
return result;
// Rest of the code sets posiotons of the rest of the iterstors
// (if there are any), that are to the right of the incremented one,
// to form next combination
while(++i < combination_size)
{
current_combination[i] = current_combination[i - 1u];
++current_combination[i];
}
// Below is the ugly side of using the sentinel. Well it had to haave some
// disadvantage. Try without it.
result.push_back(std::vector<Fci>(current_combination.begin(),
current_combination.end() - 1));
}
}
Here is a code I recently wrote in Java, which calculates and returns all the combination of "num" elements from "outOf" elements.
// author: Sourabh Bhat (heySourabh#gmail.com)
public class Testing
{
public static void main(String[] args)
{
// Test case num = 5, outOf = 8.
int num = 5;
int outOf = 8;
int[][] combinations = getCombinations(num, outOf);
for (int i = 0; i < combinations.length; i++)
{
for (int j = 0; j < combinations[i].length; j++)
{
System.out.print(combinations[i][j] + " ");
}
System.out.println();
}
}
private static int[][] getCombinations(int num, int outOf)
{
int possibilities = get_nCr(outOf, num);
int[][] combinations = new int[possibilities][num];
int arrayPointer = 0;
int[] counter = new int[num];
for (int i = 0; i < num; i++)
{
counter[i] = i;
}
breakLoop: while (true)
{
// Initializing part
for (int i = 1; i < num; i++)
{
if (counter[i] >= outOf - (num - 1 - i))
counter[i] = counter[i - 1] + 1;
}
// Testing part
for (int i = 0; i < num; i++)
{
if (counter[i] < outOf)
{
continue;
} else
{
break breakLoop;
}
}
// Innermost part
combinations[arrayPointer] = counter.clone();
arrayPointer++;
// Incrementing part
counter[num - 1]++;
for (int i = num - 1; i >= 1; i--)
{
if (counter[i] >= outOf - (num - 1 - i))
counter[i - 1]++;
}
}
return combinations;
}
private static int get_nCr(int n, int r)
{
if(r > n)
{
throw new ArithmeticException("r is greater then n");
}
long numerator = 1;
long denominator = 1;
for (int i = n; i >= r + 1; i--)
{
numerator *= i;
}
for (int i = 2; i <= n - r; i++)
{
denominator *= i;
}
return (int) (numerator / denominator);
}
}
This is the code that I had tried to find the consecutive zero which are in the order of 5 or more.
a=[0,0,0,0,0,0,0,0,9,8,5,6,0,0,0,0,0,0,3,4,6,8,0,0,9,8,4,0,0,7,8,9,5,0,0,0,0,0,8,9,0,5,8,7,0,0,0,0,0];
[x,y]=size(a);
for i=0:y
i+1;
k=1;
l=0;
n=i;
count=0;
while (a==0)
count+1;
break;
n+1;
end
if(count>=5)
v([]);
for l=k:l<n
v(m)=l+1;
m+1;
end
end
count=1;
i=n;
end
for i = o : i<m
i+1;
fprintf('index of continous zero more than 5 or equal=%d',v(i));
end
If you want to find the starting indices of runs of n or more zeros:
v = find(conv(double(a==0),ones(1,n),'valid')==n); %// find n zeros
v = v([true diff(v)>n]); %// remove similar indices, indicating n+1, n+2... zeros
In your example, this gives
v =
1 13 34 45
One-liner strfind approach to find the starting indices of 5 consecutive zeros -
out = strfind(['0' num2str(a==0,'%1d')],'011111')
Output -
out =
1 13 34 45
The above code could be generalised like this -
n = 5 %// number of consecutive matches
match = 0 %// match to be used
out = strfind(['0' num2str(a==match,'%1d')],['0' repmat('1',1,n)]) %// starting indices of n consecutive matches
If you are looking to find all the indices where the n consecutive matches were found, you can add this code -
outb = strfind([num2str(a==match,'%1d'),'0'],[repmat('1',1,n) '0'])+n-1
allind = find(any(bsxfun(#ge,1:numel(a),out') & bsxfun(#le,1:numel(a),outb')))
If you want to find the general case of a "run of n or more values x in vector V", you could do the following:
% your particular case:
n = 5;
x = 0;
V = [0,0,0,0,0,0,0,0,9,8,5,6,0,0,0,0, ...
0,0,3,4,6,8,0,0,9,8,4,0,0,7,8,9, ...
5,0,0,0,0,0,8,9,0,5,8,7,0,0,0,0,0];
b = (V == x); % create boolean array: ones and zeros
d = diff( [0 b 0] ); % turn the start and end of a run into +1 and -1
startRun = find( d==1 );
endRun = find( d==-1 );
runlength = endRun - startRun;
answer = find(runlength > n);
runs = runlength(answer);
disp([answer(:) runs(:)]);
This will display the start of the run, and its length, for all runs > n of value x.
given an array of 0s and 1s, find maximum subarray such that number of zeros and 1s are equal.
This needs to be done in O(n) time and O(1) space.
I have an algo which does it in O(n) time and O(n) space. It uses a prefix sum array and exploits the fact that if the number of 0s and 1s are same then
sumOfSubarray = lengthOfSubarray/2
#include<iostream>
#define M 15
using namespace std;
void getSum(int arr[],int prefixsum[],int size) {
int i;
prefixsum[0]=arr[0]=0;
prefixsum[1]=arr[1];
for (i=2;i<=size;i++) {
prefixsum[i]=prefixsum[i-1]+arr[i];
}
}
void find(int a[],int &start,int &end) {
while(start < end) {
int mid = (start +end )/2;
if((end-start+1) == 2 * (a[end] - a[start-1]))
break;
if((end-start+1) > 2 * (a[end] - a[start-1])) {
if(a[start]==0 && a[end]==1)
start++; else
end--;
} else {
if(a[start]==1 && a[end]==0)
start++; else
end--;
}
}
}
int main() {
int size,arr[M],ps[M],start=1,end,width;
;
cin>>size;
arr[0]=0;
end=size;
for (int i=1;i<=size;i++)
cin>>arr[i];
getSum(arr,ps,size);
find(ps,start,end);
if(start!=end)
cout<<(start-1)<<" "<<(end-1)<<endl; else cout<<"No soln\n";
return 0;
}
Now my algorithm is O(n) time and O(Dn) space where Dn is the total imblance in the list.
This solution doesn't modify the list.
let D be the difference of 1s and 0s found in the list.
First, let's step linearily through the list and calculate D, just to see how it works:
I'm gonna use this list as an example : l=1100111100001110
Element D
null 0
1 1
1 2 <-
0 1
0 0
1 1
1 2
1 3
1 4
0 3
0 2
0 1
0 0
1 1
1 2
1 3
0 2 <-
Finding the longest balanced subarray is equivalent to finding 2 equal elements in D that are the more far appart. (in this example the 2 2s marked with arrows.)
The longest balanced subarray is between first occurence of element +1 and last occurence of element. (first arrow +1 and last arrow : 00111100001110)
Remark:
The longest subarray will always be between 2 elements of D that are
between [0,Dn] where Dn is the last element of D. (Dn = 2 in the
previous example) Dn is the total imbalance between 1s and 0s in the
list. (or [Dn,0] if Dn is negative)
In this example it means that I don't need to "look" at 3s or 4s
Proof:
Let Dn > 0 .
If there is a subarray delimited by P (P > Dn). Since 0 < Dn < P,
before reaching the first element of D which is equal to P we reach one
element equal to Dn. Thus, since the last element of the list is equal to Dn, there is a longest subarray delimited by Dns than the one delimited by Ps.And therefore we don't need to look at Ps
P cannot be less than 0 for the same reasons
the proof is the same for Dn <0
Now let's work on D, D isn't random, the difference between 2 consecutive element is always 1 or -1. Ans there is an easy bijection between D and the initial list. Therefore I have 2 solutions for this problem:
the first one is to keep track of first and last appearance of each
element in D that are between 0 and Dn (cf remark).
second is to transform the list into D, and then work on D.
FIRST SOLUTION
For the time being I cannot find a better approach than the first one:
First calculate Dn (in O(n)) . Dn=2
Second instead of creating D, create a dictionnary where the keys are the value of D (between [0 and Dn]) and the value of each keys is a couple (a,b) where a is the first occurence of the key and b the last.
Element D DICTIONNARY
null 0 {0:(0,0)}
1 1 {0:(0,0) 1:(1,1)}
1 2 {0:(0,0) 1:(1,1) 2:(2,2)}
0 1 {0:(0,0) 1:(1,3) 2:(2,2)}
0 0 {0:(0,4) 1:(1,3) 2:(2,2)}
1 1 {0:(0,4) 1:(1,5) 2:(2,2)}
1 2 {0:(0,4) 1:(1,5) 2:(2,6)}
1 3 { 0:(0,4) 1:(1,5) 2:(2,6)}
1 4 {0:(0,4) 1:(1,5) 2:(2,6)}
0 3{0:(0,4) 1:(1,5) 2:(2,6) }
0 2 {0:(0,4) 1:(1,5) 2:(2,9) }
0 1 {0:(0,4) 1:(1,10) 2:(2,9) }
0 0 {0:(0,11) 1:(1,10) 2:(2,9) }
1 1 {0:(0,11) 1:(1,12) 2:(2,9) }
1 2 {0:(0,11) 1:(1,12) 2:(2,13)}
1 3 {0:(0,11) 1:(1,12) 2:(2,13)}
0 2 {0:(0,11) 1:(1,12) 2:(2,15)}
and you chose the element with the largest difference : 2:(2,15) and is l[3:15]=00111100001110 (with l=1100111100001110).
Time complexity :
2 passes, the first one to caclulate Dn, the second one to build the
dictionnary.
find the max in the dictionnary.
Total is O(n)
Space complexity:
the current element in D : O(1) the dictionnary O(Dn)
I don't take 3 and 4 in the dictionnary because of the remark
The complexity is O(n) time and O(Dn) space (in average case Dn <<
n).
I guess there is may be a better way than a dictionnary for this approach.
Any suggestion is welcome.
Hope it helps
SECOND SOLUTION (JUST AN IDEA NOT THE REAL SOLUTION)
The second way to proceed would be to transform your list into D. (since it's easy to go back from D to the list it's ok). (O(n) time and O(1) space, since I transform the list in place, even though it might not be a "valid" O(1) )
Then from D you need to find the 2 equal element that are the more far appart.
it looks like finding the longest cycle in a linked list, A modification of Richard Brent algorithm might return the longest cycle but I don't know how to do it, and it would take O(n) time and O(1) space.
Once you find the longest cycle, go back to the first list and print it.
This algorithm would take O(n) time and O(1) space complexity.
Different approach but still O(n) time and memory. Start with Neil's suggestion, treat 0 as -1.
Notation: A[0, …, N-1] - your array of size N, f(0)=0, f(x)=A[x-1]+f(x-1) - a function
If you'd plot f, you'll see, that what you look for are points for which f(m)=f(n), m=n-2k where k-positive natural. More precisely, only for x such that A[x]!=A[x+1] (and the last element in an array) you must check whether f(x) already occurred. Unfortunately, now I see no improvement over having array B[-N+1…N-1] where such information would be stored.
To complete my thought: B[x]=-1 initially, B[x]=p when p = min k: f(k)=x . And the algorithm is (double-check it, as I'm very tired):
fx = 0
B = new array[-N+1, …, N-1]
maxlen = 0
B[0]=0
for i=1…N-1 :
fx = fx + A[i-1]
if B[fx]==-1 :
B[fx]=i
else if ((i==N-1) or (A[i-1]!=A[i])) and (maxlen < i-B[fx]):
We found that A[B[fx], …, i] is best than what we found so far
maxlen = i-B[fx]
Edit: Two bed-thoughts (= figured out while laying in bed :P ):
1) You could binary search the result by the length of subarray, which would give O(n log n) time and O(1) memory algorithm. Let's use function g(x)=x - x mod 2 (because subarrays which sum to 0 are always of even length). Start by checking, if the whole array sums to 0. If yes -- we're done, otherwise continue. We now assume 0 as starting point (we know there's subarray of such length and "summing-to-zero property") and g(N-1) as ending point (we know there's no such subarray). Let's do
a = 0
b = g(N-1)
while a<b :
c = g((a+b)/2)
check if there is such subarray in O(n) time
if yes:
a = c
if no:
b = c
return the result: a (length of maximum subarray)
Checking for subarray with "summing-to-zero property" of some given length L is simple:
a = 0
b = L
fa = fb = 0
for i=0…L-1:
fb = fb + A[i]
while (fa != fb) and (b<N) :
fa = fa + A[a]
fb = fb + A[b]
a = a + 1
b = b + 1
if b==N:
not found
found, starts at a and stops at b
2) …can you modify input array? If yes and if O(1) memory means exactly, that you use no additional space (except for constant number of elements), then just store your prefix table values in your input array. No more space used (except for some variables) :D
And again, double check my algorithms as I'm veeery tired and could've done off-by-one errors.
Like Neil, I find it useful to consider the alphabet {±1} instead of {0, 1}. Assume without loss of generality that there are at least as many +1s as -1s. The following algorithm, which uses O(sqrt(n log n)) bits and runs in time O(n), is due to "A.F."
Note: this solution does not cheat by assuming the input is modifiable and/or has wasted bits. As of this edit, this solution is the only one posted that is both O(n) time and o(n) space.
A easier version, which uses O(n) bits, streams the array of prefix sums and marks the first occurrence of each value. It then scans backward, considering for each height between 0 and sum(arr) the maximal subarray at that height. Some thought reveals that the optimum is among these (remember the assumption). In Python:
sum = 0
min_so_far = 0
max_so_far = 0
is_first = [True] * (1 + len(arr))
for i, x in enumerate(arr):
sum += x
if sum < min_so_far:
min_so_far = sum
elif sum > max_so_far:
max_so_far = sum
else:
is_first[1 + i] = False
sum_i = 0
i = 0
while sum_i != sum:
sum_i += arr[i]
i += 1
sum_j = sum
j = len(arr)
longest = j - i
for h in xrange(sum - 1, -1, -1):
while sum_i != h or not is_first[i]:
i -= 1
sum_i -= arr[i]
while sum_j != h:
j -= 1
sum_j -= arr[j]
longest = max(longest, j - i)
The trick to get the space down comes from noticing that we're scanning is_first sequentially, albeit in reverse order relative to its construction. Since the loop variables fit in O(log n) bits, we'll compute, instead of is_first, a checkpoint of the loop variables after each O(√(n log n)) steps. This is O(n/√(n log n)) = O(√(n/log n)) checkpoints, for a total of O(√(n log n)) bits. By restarting the loop from a checkpoint, we compute on demand each O(√(n log n))-bit section of is_first.
(P.S.: it may or may not be my fault that the problem statement asks for O(1) space. I sincerely apologize if it was I who pulled a Fermat and suggested that I had a solution to a problem much harder than I thought it was.)
If indeed your algorithm is valid in all cases (see my comment to your question noting some corrections to it), notice that the prefix array is the only obstruction to your constant memory goal.
Examining the find function reveals that this array can be replaced with two integers, thereby eliminating the dependence on the length of the input and solving your problem. Consider the following:
You only depend on two values in the prefix array in the find function. These are a[start - 1] and a[end]. Yes, start and end change, but does this merit the array?
Look at the progression of your loop. At the end, start is incremented or end is decremented only by one.
Considering the previous statement, if you were to replace the value of a[start - 1] by an integer, how would you update its value? Put another way, for each transition in the loop that changes the value of start, what could you do to update the integer accordingly to reflect the new value of a[start - 1]?
Can this process can be repeated with a[end]?
If, in fact, the values of a[start - 1] and a[end] can be reflected with two integers, doesn't the whole prefix array no longer serve a purpose? Can't it therefore be removed?
With no need for the prefix array and all storage dependencies on the length of the input removed, your algorithm will use a constant amount of memory to achieve its goal, thereby making it O(n) time and O(1) space.
I would prefer you solve this yourself based on the insights above, as this is homework. Nevertheless, I have included a solution below for reference:
#include <iostream>
using namespace std;
void find( int *data, int &start, int &end )
{
// reflects the prefix sum until start - 1
int sumStart = 0;
// reflects the prefix sum until end
int sumEnd = 0;
for( int i = start; i <= end; i++ )
sumEnd += data[i];
while( start < end )
{
int length = end - start + 1;
int sum = 2 * ( sumEnd - sumStart );
if( sum == length )
break;
else if( sum < length )
{
// sum needs to increase; get rid of the lower endpoint
if( data[ start ] == 0 && data[ end ] == 1 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
else
{
// sum needs to decrease; get rid of the higher endpoint
if( data[ start ] == 1 && data[ end ] == 0 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
}
}
int main() {
int length;
cin >> length;
// get the data
int data[length];
for( int i = 0; i < length; i++ )
cin >> data[i];
// solve and print the solution
int start = 0, end = length - 1;
find( data, start, end );
if( start == end )
puts( "No soln" );
else
printf( "%d %d\n", start, end );
return 0;
}
This algorithm is O(n) time and O(1) space. It may modify the source array, but it restores all the information back. So it is not working with const arrays. If this puzzle has several solutions, this algorithm picks the solution nearest to the array beginning. Or it might be modified to provide all solutions.
Algorithm
Variables:
p1 - subarray start
p2 - subarray end
d - difference of 1s and 0s in the subarray
Calculate d, if d==0, stop. If d<0, invert the array and after balanced subarray is found invert it back.
While d > 0 advance p2: if the array element is 1, just decrement both p2 and d. Otherwise p2 should pass subarray of the form 11*0, where * is some balanced subarray. To make backtracking possible, 11*0? is changed to 0?*00 (where ? is the value next to the subarray). Then d is decremented.
Store p1 and p2.
Backtrack p2: if the array element is 1, just increment p2. Otherwise we found element, changed on step 2. Revert the changes and pass subarray of the form 11*0.
Advance p1: if the array element is 1, just increment p1. Otherwise p1 should pass subarray of the form 0*11.
Store p1 and p2, if p2 - p1 improved.
If p2 is at the end of the array, stop. Otherwise continue with step 4.
How does it work
Algorithm iterates through all possible positions of the balanced subarray in the input array. For each subarray position p1 and p2 are kept as far from each other as possible, providing locally longest subarray. Subarray with maximum length is chosen between all these subarrays.
To determine the next best position for p1, it is advanced to the first position where the balance between 1s and 0s is changed by one. (Step 5).
To determine the next best position for p2, it is advanced to the last position where the balance between 1s and 0s is changed by one. To make it possible, step 2 detects all such positions (starting from the array's end) and modifies the array in such a way, that it is possible to iterate through these positions with linear search. (Step 4).
While performing step 2, two possible conditions may be met. Simple one: when value '1' is found; pointer p2 is just advanced to the next value, no special treatment needed. But when value '0' is found, balance is going in wrong direction, it is necessary to pass through several bits until correct balance is found. All these bits are of no interest to the algorithm, stopping p2 there will give either a balanced subarray, which is too short, or a disbalanced subarray. As a result, p2 should pass subarray of the form 11*0 (from right to left, * means any balanced subarray). There is no chance to go the same way in other direction. But it is possible to temporary use some bits from the pattern 11*0 to allow backtracking. If we change first '1' to '0', second '1' to the value next to the rightmost '0', and clear the value next to the rightmost '0': 11*0? -> 0?*00, then we get the possibility to (first) notice the pattern on the way back, since it starts with '0', and (second) find the next good position for p2.
C++ code:
#include <cstddef>
#include <bitset>
static const size_t N = 270;
void findLargestBalanced(std::bitset<N>& a, size_t& p1s, size_t& p2s)
{
// Step 1
size_t p1 = 0;
size_t p2 = N;
int d = 2 * a.count() - N;
bool flip = false;
if (d == 0) {
p1s = 0;
p2s = N;
return;
}
if (d < 0) {
flip = true;
d = -d;
a.flip();
}
// Step 2
bool next = true;
while (d > 0) {
if (p2 < N) {
next = a[p2];
}
--d;
--p2;
if (a[p2] == false) {
if (p2+1 < N) {
a[p2+1] = false;
}
int dd = 2;
while (dd > 0) {
dd += (a[--p2]? -1: 1);
}
a[p2+1] = next;
a[p2] = false;
}
}
// Step 3
p2s = p2;
p1s = p1;
do {
// Step 4
if (a[p2] == false) {
a[p2++] = true;
bool nextToRestore = a[p2];
a[p2++] = true;
int dd = 2;
while (dd > 0 && p2 < N) {
dd += (a[p2++]? 1: -1);
}
if (dd == 0) {
a[--p2] = nextToRestore;
}
}
else {
++p2;
}
// Step 5
if (a[p1++] == false) {
int dd = 2;
while (dd > 0) {
dd += (a[p1++]? -1: 1);
}
}
// Step 6
if (p2 - p1 > p2s - p1s) {
p2s = p2;
p1s = p1;
}
} while (p2 < N);
if (flip) {
a.flip();
}
}
Sum all elements in the array, then diff = (array.length - sum) will be the difference in number of 0s and 1s.
If diff is equal to array.length/2, then the maximum subarray = array.
If diff is less than array.length/2 then there are more 1s than 0s.
If diff is greater than array.length/2 then there are more 0s than 1s.
For cases 2 & 3, initialize two pointers, start & end pointing to beginning and end of array. If we have more 1s, then move the pointers inward (start++ or end--) based on whether array[start] = 1 or array[end] = 1, and update sum accordingly. At each step check if sum = (end - start) / 2. If this condition is true, then start and end represent the bounds of your maximum subarray.
Here we end up doing two passes of the array, once to calculate sum, and once which moving the pointers inward. And we are using constant space as we just need to store sum and two index values.
If anyone wants to knock up some pseudocode, you're more than welcome :)
Here's an actionscript solution that looked like it was scaling O(n). Though it might be more like O(n log n). It definitely uses only O(1) memory.
Warning I haven't checked how complete it is. I could be missing some cases.
protected function findLongest(array:Array, start:int = 0, end:int = -1):int {
if (end < start) {
end = array.length-1;
}
var startDiff:int = 0;
var endDiff:int = 0;
var diff:int = 0;
var length:int = end-start;
for (var i:int = 0; i <= length; i++) {
if (array[i+start] == '1') {
startDiff++;
} else {
startDiff--;
}
if (array[end-i] == '1') {
endDiff++;
} else {
endDiff--;
}
//We can stop when there's no chance of equalizing anymore.
if (Math.abs(startDiff) > length - i) {
diff = endDiff;
start = end - i;
break;
} else if (Math.abs(endDiff) > length - i) {
diff = startDiff;
end = i+start;
break;
}
}
var bit:String = diff > 0 ? '1': '0';
var diffAdjustment:int = diff > 0 ? -1: 1;
//Strip off the bad vars off the ends.
while (diff != 0 && array[start] == bit) {
start++;
diff += diffAdjustment;
}
while(diff != 0 && array[end] == bit) {
end--;
diff += diffAdjustment;
}
//If we have equalized end. Otherwise recurse within the sub-array.
if (diff == 0)
return end-start+1;
else
return findLongest(array, start, end);
}
I would argue that it is impossible, that an algorithm with O(1) exists, in the following way. Assume you iterate ONCE over every bit. This requires a counter which needs the space of O(log n). Possibly one could argue that n itself is part of the problem instance, then you have as input length for a binary string of the length k: k + 2-log k. Regardless how you look over them you need an additional variable, on case you need an index into that array, that already makes it non O(1).
Usually you dont have this problem, because you have for an problem of the size n, an input of n numbers of the size log k, which adds up to nlog k. Here a variable of length log k is just O(1). But here our log k is just 1. So we can only introduce a help variable that has constant length (and I mean really constant, it must be limited regardless how big the n is).
Here one problem is the description of the problem comes visible. In computer theory you have to be very careful about your encoding. E.g. you can make NP problems polynomial if you switch to unary encoding (because then input size is exponential bigger than in a n-ary (n>1) encoding.
As for n the input has just the size 2-log n, one must be careful. When you speak in this case of O(n) - this is really an algorithm that is O(2^n) (This is no point we need to discuss about - because one can argue whether the n itself is part of the description or not).
I have this algorithm running in O(n) time and O(1) space.
It makes use of simple "shrink-then-expand" trick. Comments in codes.
public static void longestSubArrayWithSameZerosAndOnes() {
// You are given an array of 1's and 0's only.
// Find the longest subarray which contains equal number of 1's and 0's
int[] A = new int[] {1, 0, 1, 1, 1, 0, 0,0,1};
int num0 = 0, num1 = 0;
// First, calculate how many 0s and 1s in the array
for(int i = 0; i < A.length; i++) {
if(A[i] == 0) {
num0++;
}
else {
num1++;
}
}
if(num0 == 0 || num1 == 0) {
System.out.println("The length of the sub-array is 0");
return;
}
// Second, check the array to find a continuous "block" that has
// the same number of 0s and 1s, starting from the HEAD and the
// TAIL of the array, and moving the 2 "pointer" (HEAD and TAIL)
// towards the CENTER of the array
int start = 0, end = A.length - 1;
while(num0 != num1 && start < end) {
if(num1 > num0) {
if(A[start] == 1) {
num1--; start++;
}
else if(A[end] == 1) {
num1--; end--;
}
else {
num0--; start++;
num0--; end--;
}
}
else if(num1 < num0) {
if(A[start] == 0) {
num0--; start++;
}
else if(A[end] == 0) {
num0--; end--;
}
else {
num1--; start++;
num1--; end--;
}
}
}
if(num0 == 0 || num1 == 0) {
start = end;
end++;
}
// Third, expand the continuous "block" just found at step #2 by
// moving "HEAD" to head of the array and "TAIL" to the end of
// the array, while still keeping the "block" balanced(containing
// the same number of 0s and 1s
while(0 < start && end < A.length - 1) {
if(A[start - 1] == 0 && A[end + 1] == 0 || A[start - 1] == 1 && A[end + 1] == 1) {
break;
}
start--;
end++;
}
System.out.println("The length of the sub-array is " + (end - start + 1) + ", starting from #" + start + " to #" + end);
}
linear time, constant space. Let me know if there is any bug I missed.
tested in python3.
def longestBalancedSubarray(A):
lo,hi = 0,len(A)-1
ones = sum(A);zeros = len(A) - ones
while lo < hi:
if ones == zeros: break
else:
if ones > zeros:
if A[lo] == 1: lo+=1; ones-=1
elif A[hi] == 1: hi+=1; ones-=1
else: lo+=1; zeros -=1
else:
if A[lo] == 0: lo+=1; zeros-=1
elif A[hi] == 0: hi+=1; zeros-=1
else: lo+=1; ones -=1
return(A[lo:hi+1])