How to find a repeating sequence of characters in a given array?

My problem is to find the repeating sequence of characters in the given array. Simply put, I want to identify the pattern in which the characters are appearing.
1: | J | A | M | E | S | O | N | J | A | M | E | S | O | N |
2: | R | O | N | R | O | N | R | O | N | R | O | N | R | O | N |
3: | S | H | A | M | I | L | S | H | A | M | I | L |
4: | C | A | R | P | E | N | T | E | R | C | A | R | P | E | N | T | E | R |
Given the previous data, the result should be:
1: JAMESON
2: RON
3: SHAMIL
4: CARPENTER
How to deal with this problem efficiently?

Tongue-in-cheek O(NlogN) solution
Perform an FFT on your string (treating characters as numeric values). Every peak in the resulting graph corresponds to a substring periodicity.

For your examples, my first approach would be to:
1. get the first character of the array (for your last example, that would be C);
2. get the index of the next appearance of that character in the array (e.g. 9);
3. if it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER);
4. if that is found, you're done (and the result is this substring).
Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category - and I prefer the simplest solution which could possibly work :-)
If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).
Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.
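A minimal C sketch of the basic steps, assuming (as in the question's examples) that the array holds nothing but back-to-back copies of the word. The name first_char_period is hypothetical, and step 3 is simplified to check only that the second copy starts right after the first:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the "next occurrence of the first character" heuristic.
   Returns the length of the repeated word, or 0 if the heuristic fails. */
static size_t first_char_period(const char *s)
{
    if (s[0] == '\0')
        return 0;

    /* Steps 1-2: find the next appearance of the first character. */
    const char *next = strchr(s + 1, s[0]);
    if (next == NULL)
        return 0;
    size_t p = (size_t)(next - s);

    /* Steps 3-4: the candidate word s[0..p) must appear again at position p.
       strncmp reports a mismatch if fewer than p characters remain. */
    if (strncmp(next, s, p) == 0)
        return p;
    return 0;
}
```

With this sketch, "CARPENTERCARPENTER" gives 9 and "RONRONRONRONRON" gives 3, but "CACTUSCACTUS" gives 0 because the second C inside CACTUS is found first, which is the limitation the extension described above is meant to handle.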

In Python, you can leverage regexes thus:
def recurrence(text):
    import re
    for i in range(1, len(text) // 2 + 1):
        m = re.match(r'^(.{%d})\1+$' % i, text)
        if m:
            return m.group(1)

recurrence('abcabc')  # Returns 'abc'
I'm not sure how this would translate to Java or C. (That's one of the reasons I like Python, I guess. :-)

First write a method that checks whether the container string consists of the substring sub repeated, as below:
boolean findSubRepeating(String sub, String container);
Now keep calling this method with increasingly long substrings of the container: first try a 1-character substring, then 2 characters, and so on, up to container.length/2.
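A minimal C sketch of that search. The function names are hypothetical, and it assumes the whole string must consist of copies of the candidate with no stray characters. Since the candidate is always a prefix of the container, this sketch passes its length instead of a separate sub string:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C counterpart of findSubRepeating: true when `container`
   is two or more back-to-back copies of its first `sublen` characters. */
static bool is_repetition(const char *container, size_t sublen)
{
    size_t n = strlen(container);
    if (sublen == 0 || n % sublen != 0 || n / sublen < 2)
        return false;
    /* every block of sublen characters must equal the first one */
    for (size_t off = sublen; off < n; off += sublen)
        if (memcmp(container + off, container, sublen) != 0)
            return false;
    return true;
}

/* Try candidate lengths 1, 2, ..., n/2; return the first that works, or 0. */
static size_t shortest_repeating_unit(const char *container)
{
    size_t n = strlen(container);
    for (size_t len = 1; len <= n / 2; len++)
        if (is_repetition(container, len))
            return len;
    return 0;
}
```

Trying lengths in increasing order means the first hit is the shortest unit, so "RONRONRONRONRON" yields 3 (RON) rather than 6 (RONRON).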

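static String shortestUnit(String str) {
    int len = str.length();
    for (int i = 1; i <= len; i++) {
        String unit = str.substring(0, i);
        if (len % i == 0 && str.equals(unit.repeat(len / i))) {
            return unit;
        }
    }
    return str; // only reached for the empty string
}
Note: String.repeat has been part of Java's String class since Java 11; "abc".repeat(2) returns "abcabc". On older Java versions you would have to write such a helper yourself.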

Using C++:
#include <iostream>
#include <set>
#include <string>
using namespace std;

// Splits the string into fragments of the given size
// Returns the set of distinct fragments
set<string> split(const string& s, int frag)
{
    set<string> uni;
    int len = s.length();
    for (int i = 0; i < len; i += frag)
        uni.insert(s.substr(i, frag));
    return uni;
}

int main()
{
    string out;
    string s = "carpentercarpenter";
    int len = s.length();
    // Optimistic approach: hope there are only 2 repeated strings.
    // If that fails, try to break the string into fragments with fewer characters;
    // the last size that works (down to 1) gives the shortest repeating unit.
    for (int i = len / 2; i >= 1; --i) {
        set<string> uni = split(s, i);
        if (uni.size() == 1)
            out = *uni.begin();
    }
    cout << out << endl;
    return 0;
}

The first idea that comes to my mind is trying all repeating sequences of lengths that divide length(S) = N. There is a maximum of N/2 such lengths, so this results in a O(N^2) algorithm.
But I'm sure it can be improved...
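One standard way to improve on the O(N^2) bound, not taken from this answer, is the Knuth-Morris-Pratt failure function: if f is the length of the longest proper prefix of S that is also a suffix of S, then N - f is the smallest period of S, and S is a whole repetition exactly when that period is smaller than N and divides N. A C sketch under those assumptions (shortest_unit_kmp is a made-up name):

```c
#include <stdlib.h>
#include <string.h>

/* Length of the shortest unit u such that s == u repeated k >= 2 times,
   or 0 if s is not such a repetition (or allocation fails). O(N) time. */
static size_t shortest_unit_kmp(const char *s)
{
    size_t n = strlen(s);
    if (n < 2)
        return 0;
    size_t *fail = malloc(n * sizeof *fail);
    if (fail == NULL)
        return 0;

    /* fail[i] = length of the longest proper prefix of s[0..i]
       that is also a suffix of s[0..i] */
    fail[0] = 0;
    for (size_t i = 1; i < n; i++) {
        size_t k = fail[i - 1];
        while (k > 0 && s[i] != s[k])
            k = fail[k - 1];
        if (s[i] == s[k])
            k++;
        fail[i] = k;
    }

    size_t period = n - fail[n - 1];   /* smallest period of s */
    free(fail);
    return (period < n && n % period == 0) ? period : 0;
}
```

For example, "ababa" has smallest period 2, but 2 does not divide 5, so the sketch reports 0 for it.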

Here is a more general solution to the problem, which will find repeating subsequences within a sequence (of anything), where the subsequences do not have to start at the beginning, nor immediately follow each other.
Given a sequence b[0..n-1] containing the data in question, and a threshold t being the minimum subsequence length to find:
size_t l_max = 0, i_max = 0, j_max = 0;
for (size_t i = 0; i + 2 * t < n; i++) {
    for (size_t j = i + t; j + t < n; j++) {
        size_t l = 0; /* length of the common run starting at i and at j */
        while (i + l < j && j + l < n && b[i + l] == b[j + l])
            l++;
        if (l > t) {
            printf("Sequence of length %zu found at %zu and %zu\n", l, i, j);
            if (l > l_max) {
                l_max = l;
                i_max = i;
                j_max = j;
            }
        }
    }
}
if (l_max > t)
    printf("longest repeated sequence found at %zu and %zu (%zu long)\n", i_max, j_max, l_max);
Start at the beginning of the data, iterate until within 2*t of the end (no possible way to have two distinct subsequences of length t in less than 2*t of space!)
For the second subsequence, start at least t bytes beyond where the first sequence begins.
Then, reset the length of the discovered subsequence to 0, and check to see if you have a common character at i+l and j+l. As long as you do, increment l.
When you no longer have a common character, you have reached the end of your common subsequence.
If the subsequence is longer than your threshold, print the result.

Just figured this out myself and wrote some code for this (written in C#) with a lot of comments. Hope this helps someone:
// Check whether the string contains a repeating sequence.
public static bool ContainsRepeatingSequence(string str)
{
    if (string.IsNullOrEmpty(str)) return false;

    for (int i = 0; i < str.Length; i++)
    {
        // Every iteration, cut down the string from i to the end.
        string toCheck = str.Substring(i);

        // Set N equal to half the length of the substring. At most, we have to compare half the string to half the string. If the string length is odd, the last character will not be checked against, but it will be checked in the next iteration.
        int N = toCheck.Length / 2;

        // Check strings of all lengths from 1 to N against the subsequent string of length 1 to N.
        for (int j = 1; j <= N; j++)
        {
            // Check from beginning to j-1, compare against j to j+j.
            if (toCheck.Substring(0, j) == toCheck.Substring(j, j)) return true;
        }
    }

    return false;
}
Feel free to ask any questions if it's unclear why it works.

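And here is a concrete working example:
#include <stdio.h>
#include <string.h>
/* find greatest repeated substring; its length is stored in *l */
const char *fgrs(const char *s, size_t *l)
{
    const char *r = 0, *a = s;
    *l = 0;
    while (*a) {
        /* e: last later occurrence of the current character */
        const char *e = strrchr(a + 1, *a);
        if (!e) { ++a; continue; }
        do {
            size_t t = 1;
            /* extend the match, but stop before the first copy reaches e */
            for (; &a[t] != e && a[t] == e[t]; ++t);
            if (t > *l) { *l = t; r = a; }
            /* step e back to the previous occurrence of *a */
            while (--e != a && *e != *a);
        } while (e != a && *e == *a);
        ++a;
    }
    return r;
}
int main(void)
{
    /* example inputs from the question */
    const char *inputs[] = { "JAMESONJAMESON", "CARPENTERCARPENTER" };
    for (size_t i = 0; i < 2; i++) {
        size_t t;
        const char *p = fgrs(inputs[i], &t);
        while (t--) putchar(*p++);
        putchar('\n');
    }
    return 0;
}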

Not sure how you define "efficiently". For easy/fast implementation you could do this in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static String findSequence(String text) {
    Pattern pattern = Pattern.compile("(.+?)\\1+");
    Matcher matcher = pattern.matcher(text);
    return matcher.matches() ? matcher.group(1) : null;
}
It tries to find the shortest string (.+?) that must be repeated at least once (\1+) to match the entire input text.

This is a solution I came up with using a queue; it passed all the test cases of a similar problem on Codeforces (problem 745A).
#include <iostream>
#include <queue>
#include <string>
using namespace std;
typedef long long ll;
int main()
{
    string s, s1, s2; cin >> s; queue<char> qu; qu.push(s[0]); bool flag = true; int ind = -1;
    s1 = s.substr(0, s.size() / 2);
    s2 = s.substr(s.size() / 2);
    if(s1 == s2)
    {
        for(int i=0; i<s1.size(); i++)
            s += s1[i];
    }
    for(int i=1; i<s.size(); i++)
    {
        // rotate the front character to the back, or grow the candidate cycle
        if(qu.front() == s[i]) {qu.pop();}
        qu.push(s[i]);
    }
    int cycle = qu.size();
    // verify that the string starts with the candidate cycle
    while(!qu.empty())
    {
        if(s[++ind] != qu.front()) {flag = false; break;}
        qu.pop();
    }
    flag == true ? cout << cycle : cout << s.size();
    return 0;
}

I'd convert the array to a String object and use regex

Put all your characters in an array, e.g. a[]:
i=0; j=0;
for( 0 < i < count )
if (a[i] == a[i+j+1])
Then the ratio of (i/j) = repeat count in your array.
You must pay attention to the limits of i and j, but it is a simple solution.


How to count the frequency of the first two letters of words in a dictionary?

I have a 143k-word lowercase dictionary, and I want to count the frequency of the first two letters
(i.e. aa* = 14, ab* = 534, ac* = 714, ..., za* = 65, ..., zz* = 0) and put the counts in a two-dimensional array.
However, I have no idea how to even go about iterating over them without switches or a bunch of if/elses. I tried looking on Google for a solution, but I could only find ways to count the letters in a whole word, and mostly only in Python.
I've sat here for a while thinking about how I could do this, and my brain keeps blocking. This was what I came up with, but I really don't know where to head:
int main (void) {
char *line = NULL;
size_t len = 0;
ssize_t read;
char *arr[143091];
FILE *fp = fopen("large", “r”);
if (*fp == NULL)
return 1;
int i = 0;
while ((read = getline(&line, &len, fp)) != -1)
arr[i] = line;
char c1 = 'a';
char c2 = 'a';
i = 0;
int j = 0;
while (c1 <= 'z')
while (arr[k][0] == c1)
while (arr[k][1] == c2)
if (line)
return 0;
Am I being an idiot, or am I just missing something really basic? How can I go about this problem?
Edit: I forgot to mention that the dictionary is only lowercase and has some edge cases, like just an a or an e, and some words have ' (like e'er and e's). There are no accented Latin characters, and they are all ASCII lowercase.
The code assumes that the input has one word per line without leading spaces and will count all words that start with two ASCII letters from 'a'..'z'. As the statement in the question is not fully clear, I further assume that the character encoding is ASCII or at least ASCII compatible. (The question states: "There are no accented Latin characters, and they are all ASCII lowercase.")
If you want to include words that consist of only one letter or words that contain ', the calculation of the index values from the characters would be a bit more complicated. In this case I would add a function to calculate the index from the character value.
Also for non-ASCII letters the simple calculation of the array index would not work.
The program reads the input line by line without storing all lines, checks the input as defined above and converts the first two characters from range 'a'..'z' to index values in range 0..'z'-'a' to count the occurrence in a two-dimensional array.
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <stdio.h>
#include <stdlib.h>
int main (void) {
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    /* Counter array, initialized with 0. The highest possible index will
     * be 'z'-'a', so the size in each dimension is 1 more */
    unsigned long count['z'-'a'+1]['z'-'a'+1] = {0};
    FILE *fp = fopen("large", "r");
    if (fp == NULL)
        return 1;
    while ((read = getline(&line, &len, fp)) != -1) {
        /* ignore short input */
        if (read >= 2) {
            /* ignore other characters */
            if ((line[0] >= 'a') && (line[0] <= 'z') &&
                (line[1] >= 'a') && (line[1] <= 'z')) {
                /* convert first 2 characters to array index range and count */
                count[line[0]-'a'][line[1]-'a']++;
            }
        }
    }
    fclose(fp);
    if (line)
        free(line);
    /* example output */
    for (int i = 'a'-'a'; i <= 'z'-'a'; i++)
        for (int j = 'a'-'a'; j <= 'z'-'a'; j++)
            /* only print combinations that actually occurred */
            if (count[i][j] > 0)
                printf("%c%c %lu\n", i+'a', j+'a', count[i][j]);
    return 0;
}
The example input
results in
ba 2
fi 1
fo 3
The idea is to have a two-dimensional array, each dimension holding one of the first two characters of each line. The clever bit is that in C, even a string whose length as reported by strlen() is 1 has two chars: the character and the trailing 0 at the end, so you don't need to special-case lines like "a". Its frequency is tracked in counts['a'][0].
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
/* Reads input on stdin, outputs to stdout. Using a multibyte
 * character encoding will likely cause unusual output; don't do
 * that. But it will work with encodings other than ASCII. Also handles
 * mixed-cased input.
 */
int main(void) {
    int *counts[UCHAR_MAX + 1] = { NULL };
    char *line = NULL;
    size_t bufsize = 0;
    ssize_t len;
    // Populate the frequency counts
    while ((len = getline(&line, &bufsize, stdin)) > 0) {
        if (line[len - 1] == '\n') { // Get rid of newline
            line[len - 1] = 0;
        }
        if (line[0] == 0) { // Skip empty lines
            continue;
        }
        // Go through unsigned char so bytes above 127 cannot index out of range
        unsigned fc = (unsigned char)line[0];
        unsigned sc = (unsigned char)line[1];
        if (!counts[fc]) { // Allocate the second dimension if needed
            counts[fc] = calloc(UCHAR_MAX + 1, sizeof(int));
            if (!counts[fc])
                return 1;
        }
        counts[fc][sc] += 1;
    }
    free(line);
    // Print out the frequency table.
    for (int fc = 1; fc <= UCHAR_MAX; fc += 1) {
        if (!counts[fc]) { // Skip unused first characters
            continue;
        }
        if (counts[fc][0]) { // Single-character line count
            printf("%c\t%d\n", fc, counts[fc][0]);
        }
        for (int sc = 1; sc <= UCHAR_MAX; sc += 1) {
            if (counts[fc][sc]) {
                printf("%c%c\t%d\n", fc, sc, counts[fc][sc]);
            }
        }
        free(counts[fc]);
    }
    return 0;
}
$ perl -Ci -ne 'print if /^[[:ascii:]]+$/ && /^[[:lower:]]+$/' /usr/share/dict/american-english-large | ./freqs
a 1
aa 6
ab 483
ac 651
ad 497
ae 112
af 198
ag 235
ah 7
ai 161
Use a simple state machine to read one character at a time, detect when the first 2 letters of a word have been read, then increment a 26x26 table. Words do not need to be on separate lines. Any word length is allowed.
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

int main(void) {
  unsigned long long frequency[26][26] = { 0 }; // Set all to 0
  FILE *fp = fopen("large", "r");
  if (fp == NULL) return 1;
  int ch;
  // Below 2 objects are the state machine
  int word[2];
  int word_length = 0;
  while ((ch = fgetc(fp)) != EOF) {
    if (isalpha(ch)) {
      if (word_length < 2) {
        word[word_length++] = tolower(ch);
        if (word_length == 2) { // 2nd letter just arrived
          assert(word[0] >= 'a' && word[0] <= 'z'); // Note 1
          assert(word[1] >= 'a' && word[1] <= 'z');
          frequency[word[0] - 'a'][word[1] - 'a']++;
        }
      }
    } else {
      word_length = 0; // Make ready for a new word
    }
  }
  fclose(fp);
  for (int L0 = 'a'; L0 <= 'z'; L0++) {
    for (int L1 = 'a'; L1 <= 'z'; L1++) {
      unsigned long long sum = frequency[L0 - 'a'][L1 - 'a'];
      if (sum) {
        printf("%c%c %llu\n", L0, L1, sum);
      }
    }
  }
  return 0;
}
Note 1: in locales that have letters beyond a-z, like á, é, í, ó, ú, ü, ñ, additional code is needed. A simple approach is to use a frequency[256][256] table, which is somewhat memory-hungry.
There is no need to read the entire dictionary into memory, or even to buffer lines. The dictionary consists of words, one per line. This means it has this structure:
The first two characters of the file are the first digraph. The other interesting digraphs are the first two characters after each newline.
This can be read by a state machine, which we can code into a loop like this. Suppose f is the FILE * handle to the stream reading from the dictionary file:
for (;;) {
    /* Read two characters from the dictionary file. */
    int ch0 = getc(f);
    int ch1 = getc(f);

    /* Is the ch0 newline? That means we read an empty line,
       and one character after that. So, let us move that character
       into ch0, and read another ch1. Keep doing this until
       ch0 is not a newline, and bail at EOF. */
    while (ch0 == '\n' && ch1 != EOF) {
        ch0 = ch1;
        ch1 = getc(f);
    }

    /* After the above, if we have EOF, we are done: bail the loop */
    if (ch0 == EOF || ch1 == EOF)
        break;

    /* We know that ch0 isn't newline. But ch1 could be newline;
       i.e. we found a one-letter-long dictionary entry. We don't
       process those, only two or more letters. */
    if (ch1 != '\n') {
        /* Here we put the code which looks up the ch0-ch1 pair
           in our frequency table and increments the count. */
    }

    /* Now drop characters until the end of the line. If ch1
       is newline, we are already there. If not, let's just use
       ch1 for reading more characters until we get a newline. */
    while (ch1 != '\n' && ch1 != EOF)
        ch1 = getc(f);

    /* Watch out for EOF in the middle of a line that isn't
       newline-terminated. */
    if (ch1 == EOF)
        break;
}
I would do this with a state machine:
enum { begin, have_ch0, scan_eol } state = begin;
int ch0, ch1;

for (;;) {
    int c = getc(f);

    if (c == EOF)
        break;

    switch (state) {
    case begin:
        /* stay in begin state if newline seen */
        if (c != '\n') {
            /* otherwise accumulate ch0,
               and switch to have_ch0 state */
            ch0 = c;
            state = have_ch0;
        }
        break;
    case have_ch0:
        if (c == '\n') {
            /* newline in ch0 state: back to begin */
            state = begin;
        } else {
            /* we got a second character! */
            ch1 = c;
            /* code for processing ch0 and ch1 goes here! */
            state = scan_eol; /* switch to scanning for EOL. */
        }
        break;
    case scan_eol:
        if (c == '\n') {
            /* We got the newline we are looking for; go
               to begin state. */
            state = begin;
        }
        break;
    }
}
Now we have a tidy loop around a single call to getc. EOF is checked in one place where we bail out of the loop. The state machine recognizes the situation when we have the first two characters of a line which is at least two characters long; there is a single place in the code where to put the logic for dealing with the two characters.
We are not allocating any buffers; we are not malloc-ing lines, so there is nothing to free. There is no limit on the dictionary size we can scan (we just have to watch out for overflowing the frequency counters).
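To make the single placeholder concrete, here is a sketch that wraps the same state machine in a function and fills in the processing step. The name count_digraphs, the 26x26 table, and the rule that only pairs of ASCII lowercase letters are counted are assumptions for illustration, not part of the answer:

```c
#include <stdio.h>

/* Hypothetical wrapper: run the begin/have_ch0/scan_eol state machine over f
   and count lines whose first two characters are both in 'a'..'z'. */
static void count_digraphs(FILE *f, unsigned long freq[26][26])
{
    enum { begin, have_ch0, scan_eol } state = begin;
    int ch0 = 0, c;

    while ((c = getc(f)) != EOF) {
        switch (state) {
        case begin:
            if (c != '\n') {
                ch0 = c;
                state = have_ch0;
            }
            break;
        case have_ch0:
            if (c == '\n') {
                state = begin;          /* one-character line: skipped */
            } else {
                /* ch0 and c are the first two characters of the line */
                if (ch0 >= 'a' && ch0 <= 'z' && c >= 'a' && c <= 'z')
                    freq[ch0 - 'a'][c - 'a']++;
                state = scan_eol;
            }
            break;
        case scan_eol:
            if (c == '\n')
                state = begin;
            break;
        }
    }
}
```

A caller would pass stdin or an opened dictionary file and then print the non-zero entries of freq, as in the other answers. Lines shorter than two characters never reach the counting step, so no separate length check is needed.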
You have started in the right direction. You do need a 2D array, 27 x 27 for a single case (e.g. lowercase or uppercase), not including digits. To handle digits, just add another 11 x 11 array and map 2-digit frequencies there. The reason you can't use a flat 1D array and map to it without serious indexing gymnastics is that the ASCII sum of "ab" and "ba" would be the same.
The 2D array solves that problem allowing the map of the 1st character ASCII value to the first index, and the map of the ASCII of the 2nd character to the 2nd index or after in that row.
An easy way to think of it is to just take a lowercase example. Let's look at the word "accent". You have your 2D array:
| a | a | b | c | d | e | ...
| b | a | b | c | d | e | ...
| c | a | b | c | d | e | ...
The first column tracks the first letter and then the remaining columns (the next 'a' - 'z' characters) track the 2nd character that follows the first character. (You can do this with an array of struct holding the 1st char and a 26 char array as well; it's up to you.) This way, you remove the ambiguity of which combination, "ab" or "ba", was seen.
Now note -- you do not actually need a 27 x 27 array with the 1st column repeated. Recall, by mapping the ASCII value to the first index, it designates the first character associated with the row on its own, e.g. row[0][..] indicates the first character was 'a'. So a 26 x 26 array is fine (and the same for digits). So you simply need:
| a | b | c | d | e | ...
| a | b | c | d | e | ...
| a | b | c | d | e | ...
So the remainder of the approach is simple. Open the file, read the word into a buffer, validate there is a 1st character (e.g. not the nul-character), then validate the 2nd character (continue to get the next word if either validation fails). Convert both to lowercase (or add the additional arrays if tracking both cases -- that gets ugly). Now just map the ASCII value for each character to an index in the array, e.g.
int ltrfreq[ALPHABET][ALPHABET] = {{0}};
while (fgets (buf, SZBUF, fp)) {            /* read each line into buf */
    int ch1 = (unsigned char)*buf, ch2;     /* initialize ch1 with 1st char */
    if (!ch1 || !isalpha(ch1))              /* validate 1st char or get next word */
        continue;
    ch2 = (unsigned char)buf[1];            /* assign 2nd char */
    if (!ch2 || !isalpha(ch2))              /* validate 2nd char or get next word */
        continue;
    ch1 = tolower (ch1);                    /* convert to lower to eliminate case */
    ch2 = tolower (ch2);
    ltrfreq[ch1-'a'][ch2-'a']++;            /* map ASCII to index, increment */
}
With our example word "accent", that would increment the array element [0][2], so that corresponds to row 0 and column 2 for "ac" in:
| a | b | c | d | e | ...
... ^ [0][2]
Where you increment the value at that index. So ltrfreq[0][2]++ now holds the value 1 for the combination "ac" having been seen once. When encountered again, the element would be incremented to 2 and so on... Since the value is incremented, it is imperative that the array be initialized to all zeros when declared.
When you output the results, you just have to map the i and j indexes back to ASCII by adding 'a' to each, e.g.
for (int i = 0; i < ALPHABET; i++) /* loop over all 1st char index */
for (int j = 0; j < ALPHABET; j++) /* loop over all 2nd char index */
if (ltrfreq[i][j]) /* map i, j back to ASCII, output freq */
printf ("%c%c = %d\n", i + 'a', j + 'a', ltrfreq[i][j]);
That's it. Putting it all together in an example that takes the filename to read as the first argument to the program (or reads from stdin if no argument is given), you would have:
#include <stdio.h>
#include <ctype.h>
#define ALPHABET 26
#define SZBUF 1024
int main (int argc, char **argv) {
    char buf[SZBUF] = "";
    int ltrfreq[ALPHABET][ALPHABET] = {{0}};
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }
    while (fgets (buf, SZBUF, fp)) {        /* read each line into buf */
        int ch1 = (unsigned char)*buf, ch2; /* initialize ch1 with 1st char */
        if (!ch1 || !isalpha(ch1))          /* validate 1st char or get next word */
            continue;
        ch2 = (unsigned char)buf[1];        /* assign 2nd char */
        if (!ch2 || !isalpha(ch2))          /* validate 2nd char or get next word */
            continue;
        ch1 = tolower (ch1);                /* convert to lower to eliminate case */
        ch2 = tolower (ch2);
        ltrfreq[ch1-'a'][ch2-'a']++;        /* map ASCII to index, increment */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);
    for (int i = 0; i < ALPHABET; i++)      /* loop over all 1st char index */
        for (int j = 0; j < ALPHABET; j++)  /* loop over all 2nd char index */
            if (ltrfreq[i][j])              /* map i, j back to ASCII, output freq */
                printf ("%c%c = %d\n", i + 'a', j + 'a', ltrfreq[i][j]);
    return 0;
}
Example Input Dictionary
In the file dat/ltrfreq2.txt:
$ cat dat/ltrfreq2.txt
Example Use/Output
$ ./bin/ltrfreq2 dat/ltrfreq2.txt
an = 1
ca = 2
do = 1
fl = 1
ha = 2
lu = 1
my = 2
no = 1
Where both "cat" words accurately account for ca = 2, both "has" for ha = 2 and "My" and "my" for my = 2. The rest are just the 2 character prefixes for words that appear once in the dictionary.
Or with the entire 307993-word dictionary that comes with SuSE, timed to show the efficiency of the approach (all within 15 ms):
$ time ./bin/ltrfreq2 /var/lib/dict/words
aa = 40
ab = 990
ac = 1391
ad = 1032
ae = 338
af = 411
ag = 608
ah = 68
ai = 369
aj = 18
ak = 70
al = 2029
zn = 2
zo = 434
zr = 2
zs = 2
zu = 57
zw = 25
zy = 135
zz = 1
real 0m0.015s
user 0m0.015s
sys 0m0.001s
A bit about the array type. Since you have 143K words, that rules out using a short or unsigned short type -- just in case you have a bad dictionary with all 143K words being "aardvark".... The int type is more than capable of handling all words -- even if you have a bad dictionary containing only "aardvark".
Look things over and let me know if this is what you need, if not let me know where I misunderstood. Also, let me know if you have further questions.
Such a job is more suitable for languages like Python, Perl or Ruby than for C. I suggest at least trying C++.
If you don't have to write it in C, here is my Python version: (since you didn't mention it in the question - are you working on an embedded system or something where C/ASM are the only options?)
FILENAME = '/etc/dictionaries-common/words'

with open(FILENAME) as f:
    flattened = [line[:2] for line in f]

dic = {
    key: flattened.count(key)
    for key in sorted(frozenset(flattened))
}

for k, v in dic.items():
    print(f'{k} = {v}')
A' = 1
AM = 2
AO = 2
AW = 2
Aa = 6
Ab = 44
Ac = 37
Ad = 68
Ae = 18
Af = 22
Ag = 36
Ah = 12
Ai = 17
Aj = 2
Ak = 14
Al = 284
Am = 91
An = 223
Ap = 44
Aq = 13
Ar = 185
As = 88
At = 56
Au = 81
Av = 28
Ax = 2
... ...

Maximum binary number a binary string will result to if only one operation is allowed i.e. Right-Rotate By K-Bits where K = [0, Length of String]

Suppose you have a binary string S and you are only allowed to do one operation, i.e. Right-Rotate by K bits where K = [0, Length of the string]. Write an algorithm that will print the maximum binary number you can create by this process.
For Example:
S = [00101] then maximum value I can get from the process is 10100 i.e. 20.
S = [011010] then maximum value I can get from the process is 110100 i.e. 52.
S = [1100] then maximum value I can get from the process is 1100 i.e. 12.
The length of the string S has an upper limit of 5*(10^5).
The idea I thought of is very naive: when you right-rotate a binary string by 1 bit, you get the same string back after m rotations, where m is the length of the string.
So, I right-rotate by 1 until I get to the number with which I started with and during the process, I keep track of the max-value I encountered and in last I print the max-value.
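For reference, the naive approach described above can be sketched in C as follows (max_rotation_naive is a hypothetical name). It relies on the fact that, for binary strings of equal length, comparing them with strcmp orders them the same way as the numbers they encode:

```c
#include <stdlib.h>
#include <string.h>

/* Right-rotate by one bit, len times, remembering the largest string seen.
   Returns a malloc'ed copy of the best rotation (caller frees), or NULL. */
static char *max_rotation_naive(const char *s)
{
    size_t len = strlen(s);
    char *cur = malloc(len + 1), *best = malloc(len + 1);
    if (cur == NULL || best == NULL) {
        free(cur);
        free(best);
        return NULL;
    }
    memcpy(cur, s, len + 1);
    memcpy(best, s, len + 1);
    for (size_t k = 1; k < len; k++) {
        char last = cur[len - 1];          /* right-rotate by one bit */
        memmove(cur + 1, cur, len - 1);
        cur[0] = last;
        if (strcmp(cur, best) > 0)         /* same order as the numeric value */
            memcpy(best, cur, len + 1);
    }
    free(cur);
    return best;
}
```

This takes O(n^2) time, so it is only useful for checking a faster solution on small inputs, not for strings of length 5*10^5.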
Is there an efficient approach to solve the problem?
UPD1: This is the source of the problem, One-Zero; it all boils down to the statement I have described above.
UPD2: As the answer can be huge, the program will print the answer modulo 10^9 + 7.
You want to find the largest number expressed in a binary encoded string with wrap around.
Here are steps for a solution:
1. Let len be the length of the string.
2. Allocate an array of size 2 * len and copy the string into it twice.
3. Using a linear search, find the position pos of the largest substring of length len in this array (lexicographical order can be used for that, since all candidates have the same length).
4. Compute res, the converted number modulo 10^9+7, reading len bits starting at pos.
5. Free the array and return res.
Here is a simple implementation as a function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
long max_bin(const char *S) {
    size_t i, pos, len;
    char *a;
    // long has at least 31 value bits, enough for numbers up to 2 * 1000000007
    long res;

    if ((len = strlen(S)) == 0)
        return 0;
    if ((a = malloc(len + len)) == NULL)
        return -1;
    memcpy(a, S, len);
    memcpy(a + len, S, len);
    // find the smallest right rotation for the greatest substring
    for (pos = i = len; --i > 0;) {
        if (memcmp(a + i, a + pos, len) > 0)
            pos = i;
    }
    res = 0;
    for (i = 0; i < len; i++) {
        res = res + res + a[pos + i] - '0';
        if (res >= 1000000007)
            res -= 1000000007;
    }
    free(a);
    return res;
}

int main(int argc, char *argv[]) {
    for (int i = 1; i < argc; i++) {
        printf("[%s] -> %ld\n", argv[i], max_bin(argv[i]));
    }
    return 0;
}
It is feasible to avoid memory allocation if it is a requirement.
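A minimal sketch of that allocation-free variant, assuming the same O(n^2) worst-case linear search is acceptable: instead of building the doubled array, character k of the rotation starting at i is read as S[(i + k) % len]. The names cmp_rot and max_bin_noalloc are hypothetical:

```c
#include <stddef.h>
#include <string.h>

#define MOD 1000000007L

/* Compare the rotations of s (length len) starting at i and at j.
   Returns <0, 0 or >0 like memcmp, without copying anything. */
static int cmp_rot(const char *s, size_t len, size_t i, size_t j)
{
    for (size_t k = 0; k < len; k++) {
        char a = s[(i + k) % len], b = s[(j + k) % len];
        if (a != b)
            return a < b ? -1 : 1;
    }
    return 0;
}

static long max_bin_noalloc(const char *s)
{
    size_t len = strlen(s), pos = 0;
    /* ties keep the earlier start index; any of them gives the same value */
    for (size_t i = 1; i < len; i++)
        if (cmp_rot(s, len, i, pos) > 0)
            pos = i;
    long res = 0;
    for (size_t k = 0; k < len; k++)   /* 2 * res + 1 still fits in 31 bits */
        res = (2 * res + (s[(pos + k) % len] - '0')) % MOD;
    return res;
}
```

max_bin_noalloc("00101") returns 20 and max_bin_noalloc("011010") returns 52, matching the examples in the question.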
It's me again.
I got to thinking a bit more about your problem in the shower this morning, and it occurred to me that you could do a QuickSelect (if you're familiar with that) over an array of the start indexes of the input string and determine the index of the most "valuable" rotate based on that.
What I show here does not concern itself with presenting the result the way you are required to, only with determining what the best offset for rotation is.
This is not a textbook QuickSelect implementation but rather a simplified method that does the same thing while taking into account that it's a string of zeros and ones that we are dealing with.
Main driver logic:
static void Main(string[] args)
{
    Console.WriteLine(FindBestIndex(""));                    // exp -1
    Console.WriteLine(FindBestIndex("1"));                   // exp 0
    Console.WriteLine(FindBestIndex("0"));                   // exp 0
    Console.WriteLine(FindBestIndex("110100"));              // exp 0
    Console.WriteLine(FindBestIndex("100110"));              // exp 3
    Console.WriteLine(FindBestIndex("01101110"));            // exp 4
    Console.WriteLine(FindBestIndex("11001110011"));         // exp 9
    Console.WriteLine(FindBestIndex("1110100111110000011")); // exp 17
}
Set up the index array that we'll be sorting, then call FindHighest to do the actual work:
static int FindBestIndex(string input)
{
    if (string.IsNullOrEmpty(input))
        return -1;

    int[] indexes = new int[input.Length];
    for (int i = 0; i < indexes.Length; i++)
        indexes[i] = i;

    return FindHighest(input, indexes, 0, input.Length);
}
Partition the index array into two halves depending on whether each index points to a string that starts with zero or one at this offset within the string.
Once that's done, if we have just one element that started with one, we have the best string, else if we have more, partition those based on the next index. If none started with one, proceed with zero in the same way.
static int FindHighest(string s, int[] p, int index, int len)
{
    // once index reaches the string length, all remaining candidates are the same
    // rotation (periodic input), so any of them is best; this also ends the recursion
    if (index >= s.Length)
        return p[0];

    // allocate two new arrays,
    // one for the elements of p that have zero at this index, and
    // one for the elements of p that have one at this index
    int[] zero = new int[len];
    int[] one = new int[len];
    int count_zero = 0;
    int count_one = 0;

    // run through p and distribute its elements to 'zero' and 'one'
    for (int i = 0; i < len; i++)
    {
        int ix = p[i]; // index into string
        int pos = (ix + index) % s.Length; // wrap for string length
        if (s[pos] == '0')
            zero[count_zero++] = ix;
        else
            one[count_one++] = ix;
    }

    // if we have a one in this position, use that, else proceed with zero (below)
    if (count_one > 1)
    {
        // if we have more than one, sort on the next position (index)
        return FindHighest(s, one, index + 1, count_one);
    }
    else if (count_one == 1)
    {
        // we have exactly one entry left in ones, so that's the best one we have overall
        return one[0];
    }

    if (count_zero > 1)
    {
        // if we have more than one, sort on the next position (index)
        return FindHighest(s, zero, index + 1, count_zero);
    }
    else if (count_zero == 1)
    {
        // we have exactly one entry left in zeroes, and we didn't have any in ones,
        // so this is the best one we have overall
        return zero[0];
    }

    return -1;
}
Note that this can be optimized further by expanding the logic: If the input string has any ones at all, there's no point in adding indexes where the string starts with zero to the index array in FindBestIndex since those will be inferior. Also, if an index does start with a one but the previous index also did, you can omit the current one as well because the previous index will always be better since that string flows into this character.
If you like, you can also refactor to remove the recursion in favor of a loop.
If I were tackling this I would do so as follows.
I think it's all to do with counting alternating runs of '1' and runs of '0', treating a run of '1's followed by a run of '0's as a pair, then bashing a list of those pairs.
Let us start by scanning to the first '1', and setting start position s. Then count each run of '1's c1 and the following run of '0's c0, creating pairs (c1,c0).
The scan then proceeds forwards to the end, wrapping round as required. Depending on how the string starts and ends, there are four cases:
1. The string starts with a '1' and ends neatly with a run of '0's: the initial start position s is used as it is.
2. The string starts and ends with a '1': the first run of '1's and the following run of '0's are rotated (left), to be appended to the last run of '1's. Note that this changes our start position s, which becomes the start of the second run of '1's.
3. The string starts with a '0' and ends with a '1': s is the initial start position (the first '1'), and the first run of '0's is rotated (left), to follow the last run of '1's.
4. The string starts and ends with a '0': s is the initial start position, and the first run of '0's is rotated (left), to be appended to the last run of '0's.
NB: if the string both starts and ends with a '1', there are, initially, n runs of '0's and n+1 runs of '1's, but the rotation reduces that to n runs of '1's. And similarly, but conversely, if the string both starts and ends with a '0'.
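A sketch of the pair-building scan in C, under one simplifying assumption that is not from the answer: choosing s as the first '1' that is preceded by a '0' when the string is read cyclically covers all four start/end cases in a single rule. The function name run_pairs and its array-based interface are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Fill c1[]/c0[] with the (run of '1's, run of '0's) pairs of binary string s,
   read cyclically from start position *start. Returns the number of pairs;
   the arrays need room for strlen(s) / 2 entries. */
static size_t run_pairs(const char *s, size_t c1[], size_t c0[], size_t *start)
{
    size_t n = strlen(s), st = n, count = 0;

    /* s: first '1' whose cyclic predecessor is a '0' */
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '1' && s[(i + n - 1) % n] == '0') {
            st = i;
            break;
        }
    }
    *start = st;
    if (st == n)            /* empty, all '0's or all '1's: no pairs */
        return 0;

    /* each pass consumes one run of '1's followed by one run of '0's */
    for (size_t k = 0; k < n; count++) {
        size_t ones = 0, zeros = 0;
        while (k < n && s[(st + k) % n] == '1') { ones++; k++; }
        while (k < n && s[(st + k) % n] == '0') { zeros++; k++; }
        c1[count] = ones;
        c0[count] = zeros;
    }
    return count;
}
```

For "011010", s is 1 and the pairs are (2,1) and (1,2). A string with no '0' or no '1' yields no pairs, which is the trivial case where every rotation gives the same value.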
Let us use A as shorthand for the pair (a1,a0). Suppose we have another pair, X -- (x1,x0) -- then we can compare the two pairs, thus:
if a1 > x1 or (a1 = x1 and a0 < x0) => A > X -- A is better start
if a1 = x1 and a0 = x0 => A = X
if a1 < x1 or (a1 = x1 and a0 > x0) => A < X -- X is better start
The trick is probably to pack each pair into an integer -- say (x1 * N) - x0, where N is at least the maximum allowed length of the string -- for ease of comparison.
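A small sketch of that packing, assuming a run of '0's can never reach the constant used. The name pack_pair and the constant N_PACK are hypothetical; N_PACK is chosen just above the 5*10^5 limit from the question:

```c
#include <stdint.h>

/* N_PACK must exceed every possible run length. */
#define N_PACK 500001u

/* Pack a run pair (c1 '1's followed by c0 '0's) so that a larger value
   means a better start. c1 >= 1 and c0 < N_PACK, so this cannot underflow. */
static uint64_t pack_pair(uint64_t c1, uint64_t c0)
{
    return c1 * N_PACK - c0;
}
```

Because c0 is always smaller than N_PACK, a larger c1 always wins, and for equal c1 the pair with fewer '0's wins, which is the ordering given by the three comparison rules.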
During the scan of the string (described above) let us construct a vector of pairs. During that process, collect the largest pair value A, and a list of the positions, s, of each appearance of A. Each s on the list is a potential best start position. (The recorded s needs to be the index in the vector of pairs and the offset in the original string.)
[If the input string is truly vast, then constructing the entire vector of all pairs will eat memory. In which case the vector would need to be handled as a "virtual" vector, and when an item in that vector is required, it would have to be created by reading the respective portion of the actual string.]
Let us simplify groups of two or more contiguous A. Clearly the second and subsequent A's in such a group cannot be the best start, since there is a better one immediately to the left. So, in the scan we need to record only the s for the first A of such groups.
If the string starts with one or more A's and ends with one or more A's, we need to "rotate" to collect those as a single group, and record the s only for the leftmost A in that group (in the usual way).
If there is only one s on the list, our work is done. If the string is end-to-end A, that will be spotted here.
Otherwise, we need to consider the pairs which follow each of the s for our (initial) A's -- where when we say 'follow' we include wrapping round from the end to the start of the string (and, equivalently, the list of pairs).
NB: at this point we know that all the (initial) A's on our list are followed by zero or more A's and then at least one x, where x < A.
So, set i = 1, and consider all the pairs at s+i for our list of s. Keep only the s for the instances of the largest pair found. So for i = 1, in this example we are considering pairs x, y and z:
And if x is the largest, this pass discards Ay and both Az. If only one s remains -- in the example, y is the largest -- our work is done. Otherwise, repeat for i = i+1.
There is one last (I think) wrinkle. Suppose after finding z to be the largest of the ith pairs, we have:
where the two runs === are the same as each other. By the same logic that told us to ignore second and subsequent A's in runs of same, we can now discard the second A===z. Indeed we can discard third, fourth, etc. contiguous A===z. Happily that deals with the extreme case of (say):
where the string is a sequence of A===z !
I dunno, that all seems more complicated than I expected when I set out with my pencil and paper :-(
I imagine someone much cleverer than I can reduce this to some standard greatest prefix-string problem.
I was bored today, so I knocked out some code (and revised it on 10-Apr-2020).
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>

typedef unsigned int uint ;     // assume that's uint32_t or similar
enum { max_string_len = 5 * 100000 } ;   // 5 * 10^5
typedef uint64_t pair_t ;

// Returns the start position s of the best rotation of str[0..N-1] (a string
// of '0's and '1's), or UINT_MAX if N is out of range or memory runs out.
static uint
one_zero(const char* str, uint N)
{
  enum { debug = false } ;
  void*   mem ;
  size_t  nmx ;
  uint    s1, s0 ;      // initial run of '1'/'0's to be rotated
  uint    s ;
  pair_t* pv ;          // pair vector
  uint*   psi ;         // pair string index
  uint*   spv ;         // starts vector -- pair indexes
  uint    pn ;          // count of pairs
  uint    sn ;          // count of starts
  pair_t  bp ;          // current best start pair value
  uint    bpi ;         // current best start pair index
  uint    bsc ;         // count of further best starts
  char    ch ;

  if (N > max_string_len)
    {
      printf("*** N = %u > max_string_len (%u)\n", N, (uint)max_string_len) ;
      return UINT_MAX ;
    } ;
  if (N < 1)
    {
      printf("*** N = 0 !!\n") ;
      return UINT_MAX ;
    } ;

  // Decide on initial start position.
  s = s1 = s0 = 0 ;
  if (str[0] == '0')
    {
      // Start at first '1' after initial run of '0's
      do
        {
          s += 1 ;
          if (s == N)
            return 0 ;          // String is all '0's !!
        }
      while (str[s] == '0') ;
      s0 = s ;                  // rotate initial run of '0's
    }
  else
    {
      // First digit is '1', but if last digit is also '1', need to rotate.
      if (str[N-1] == '1')
        {
          // Step past the leading run of '1's and set length s1.
          // This run will be appended to the last run of '1's in the string
          do
            {
              s += 1 ;
              if (s == N)
                return 0 ;      // String is all '1's !!
            }
          while (str[s] == '1') ;
          s1 = s ;              // rotate initial run of '1's

          // Step past the (now) leading run of '0's and set length s0.
          // This run will follow the last run of '1's in the string.
          // NB: we know there is at least one '0' and at least one '1' before
          //     the end of the string
          do { s += 1 ; } while (str[s] == '0') ;
          s0 = s - s1 ;
        } ;
    } ;

  // Scan the string to construct the vector of pairs and the list of potential
  // starts.
  nmx = (((N / 2) + 64) / 64) * 64 ;
  mem = malloc(nmx * (sizeof(pair_t) + sizeof(uint) + sizeof(uint))) ;
  if (mem == NULL)
    return UINT_MAX ;
  pv  = (pair_t*)mem ;
  spv = (uint*)(pv + nmx) ;
  psi = (uint*)(spv + nmx) ;

  pn  = 0 ;
  bp  = 0 ;                     // no pair is 0 !
  bpi = 0 ;
  bsc = 0 ;                     // no best yet
  do
    {
      uint   x1, x0 ;
      pair_t xp ;

      psi[pn] = s ;
      x1 = x0 = 0 ;
      do                        // count the run of '1's
        {
          x1 += 1 ;
          s  += 1 ;
          ch = (s < N) ? str[s] : '\0' ;
        }
      while (ch == '1') ;

      if (ch == '\0')
        {
          // String ends in '1': add the rotated leading runs.
          x1 += s1 ;
          x0  = s0 ;
        }
      else
        {
          do                    // count the following run of '0's
            {
              x0 += 1 ;
              s  += 1 ;
              ch = (s < N) ? str[s] : '\0' ;
            }
          while (ch == '0') ;
          if (ch == '\0')
            x0 += s0 ;
        } ;

      // Register pair (x1,x0)
      pv[pn] = xp = ((uint64_t)x1 << 32) - x0 ;
      if (debug && (N == 264))
        printf("si=%u, pn=%u, xp=%" PRIx64 " bp=%" PRIx64 "\n",
                                                      psi[pn], pn, xp, bp) ;
      if (xp > bp)
        {
          // New best start.
          bpi = pn ;
          bsc = 0 ;
          bp  = xp ;
        }
      else
        bsc += (xp == bp) ;     // another instance of the best so far

      pn += 1 ;
    }
  while (ch != '\0') ;

  // If there are 2 or more best starts, need to find them all, but discard
  // second and subsequent contiguous ones.
  spv[0] = bpi ;
  sn = 1 ;
  if (bsc != 0)
    {
      uint pi ;
      bool rp ;                 // true <=> previous pair was also a best pair

      pi = bpi ;
      rp = true ;
      do
        {
          pi += 1 ;
          if (pv[pi] != bp)
            rp = false ;
          else
            {
              bsc -= 1 ;
              if (!rp)
                {
                  spv[sn++] = pi ;
                  rp = true ;
                } ;
            } ;
        }
      while (bsc != 0) ;
    } ;

  // We have: pn pairs in pv[]
  //          sn start positions in spv[]
  for (uint i = 1 ; sn > 1 ; ++i)
    {
      uint   sc ;
      uint   pi ;
      pair_t bp ;

      if (debug && (N == 264))
        {
          printf("i=%u, sn=%u, pv:", i, sn) ;
          for (uint s = 0 ; s < sn ; ++s)
            printf(" %u", psi[spv[s]]) ;
          printf("\n") ;
        } ;

      pi = spv[0] + i ;         // index of first s+i pair
      if (pi >= pn) { pi -= pn ; } ;
      bp = pv[pi] ;             // best value, so far.
      sc = 1 ;                  // new count of best starts

      for (uint sj = 1 ; sj < sn ; ++sj)
        {
          // Consider the ith pair ahead of sj -- compare with best so far.
          uint   pb, pj ;
          pair_t xp ;

          pb = spv[sj] ;
          pj = pb + i ;         // index of next s+i pair
          if (pj >= pn) { pj -= pn ; } ;
          xp = pv[pj] ;

          if (xp == bp)
            {
              // sj is equal to the best so far
              // So we keep both, unless we have the 'A==zA==z' case,
              // where 'z == xp == bp', the current 'ith' position.
              uint pa ;

              pa = pi + 1 ;
              if (pa == pn) { pa = 0 ; } ;   // position after first 'z'

              // If this is not an A==zA==z case, keep sj
              // Otherwise, drop sj (by not putting it back into the list),
              // but update pi so can spot A==zA==zA===z etc. cases.
              if (pa != pb)
                spv[sc++] = spv[sj] ;        // keep sj
              pi = pj ;                      // for further repeats
            }
          else if (xp < bp)
            {
              // sj is less than best -- do not put it back into the list
            }
          else
            {
              // sj is better than best -- discard everything so far, and
              // set new best.
              sc     = 1 ;                   // back to one start
              spv[0] = spv[sj] ;             // new best
              pi     = pj ;                  // new index of ith pair
              bp     = xp ;                  // new best pair
            } ;
        } ;
      sn = sc ;
    } ;

  s = psi[spv[0]] ;
  free(mem) ;
  return s ;
}
I have tested this against the brute force method given elsewhere, and as far as I can see this is (now, on 10-Apr-2020) working code.
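The brute-force method referred to is not shown in this excerpt. For anyone who wants a reference to test against, here is a hypothetical O(N^2) Python version, assuming the goal is the start position of the rotation with the largest binary value and that ties go to the smallest start (the tie rule of the original brute force is not known). For equal-length '0'/'1' strings, string comparison and numeric comparison agree, so `max` over the rotations is enough:

```python
def best_rotation_brute(bits):
    """Smallest s such that bits[s:] + bits[:s] is the largest rotation.

    Assumes bits is a non-empty str of '0'/'1'. Building each rotation costs
    O(N), so the whole search is O(N^2).
    """
    n = len(bits)
    # max() returns the first maximal s, i.e. the smallest start on ties.
    return max(range(n), key=lambda s: bits[s:] + bits[:s])
```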
When I timed this on my machine, for 100,000 random strings of 400,000..500,000 characters (at random) I got:
Brute Force: 281.8 secs CPU
My method: 130.3 secs CPU
and that's excluding the 8.3 secs to construct the random string and run an empty test. (That may sound a lot, but for 100,000 strings of 450,000 characters, on average, that's a touch less than 1 CPU cycle per character.)
So for random strings, my complicated method is a little over twice as fast as brute-force. But it uses ~N*16 bytes of memory, where the brute-force method uses N*2 bytes. Given the effort involved, the result is not hugely gratifying.
However, I also tried two pathological cases, (1) repeated "10" and (2) repeated "10100010" and for just 1000 (not 100000) strings of 400,000..500,000 characters (at random) I got:
Brute Force: (1) 1730.9 secs CPU, (2) 319.0 secs CPU
My method: (1) 0.7 secs CPU, (2) 0.7 secs CPU
That O(n^2) will kill you every time !
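#include <iostream>
#include <string>
#include <climits>
using namespace std;

// Value of the N-character binary string S, read as an unsigned number.
int convt(int N, string S)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
    {
        int num = S[i];
        sum += (num - '0') << (N - 1 - i);   // same as pow(2, N-1-i) * (num - 48), without floating point
    }
    return sum;
}

// Rotate S left by one character.
string rot(int N, string S)
{
    char temp = S[0];
    for (int i = 0; i < N - 1; i++)
        S[i] = S[i + 1];
    S[N - 1] = temp;
    return S;
}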
int main() {
int t;
int N,K;
char S[N];
for(int i=0; i<N; i++)
string SS= S;
int mx_val=INT_MIN;
for(int i=0;i<N;i++)
string S1=rot(N,SS);
SS= S1;
int k_val=convt(N,SS);
if (k_val>mx_val)
int ki=0;
int j=0;
string S2=S;
if (convt(N,S2)==mx_val)

Find the maximum possible summation of differences of consecutive elements

Array A contains the elements A1, A2, ..., AN, and array B contains the elements B1, B2, ..., BN. There is a relationship between Ai and Bi, for 1 <= i <= N, i.e.,
any element Ai lies between 1 and Bi (1 <= Ai <= Bi).
Let the cost S of an array A be defined as:
S = |A2 - A1| + |A3 - A2| + ... + |AN - A(N-1)|
You have to print the largest possible value of S.
The Link to the problem is Problem
size of array:5
array: 10 1 10 1 10
output : 36 (since the max value can be derived as |10 - 1| + |1 - 10| + |10 - 1| + |1 - 10|)
Approach :
The only approach I could think of was brute force. I thought I would make an overlapping recursive equation so that I could memoize it, but I was not able to.
static long max = 0;   // best sum found so far

/*
 * pos is current index in the arr
 * arr is array
 * aux is temp array which will store one possible combination.
 * n is size of the array.
 */
public static void func(int pos, int[] arr, int[] aux, int n)
{
    // if reached at the end, check the summation of differences
    if (pos == n)
    {
        long sum = 0;
        for (int i = 1; i < n; i++)
            sum += Math.abs(aux[i] - aux[i - 1]);
        if (sum > max)
            max = sum;
        return;
    }
    // else try every combination possible.
    for (int i = 1; i <= arr[pos]; i++)
    {
        aux[pos] = i;
        func(pos + 1, arr, aux, n);
    }
}
The complexity of this is O(n * B1 * B2 * ... * BN), since every value 1..Bi is tried at every position, i.e. exponential in n.
First, there is no reason that a[i] should be equal to any number besides 1 and b[i]. With that observation we can write down a simple recurrence:
fmax(1) = fone(1) = 0
fmax(i) = max(fone(i-1) + b[i] - 1, fmax(i-1) + abs(b[i]-b[i-1]))
fone(i) = max(fone(i-1), fmax(i-1) + b[i-1] - 1)
answer = max(fmax(N), fone(N))
Where fmax(i) is a maximal sum for a[1..i] elements that end with b[i], fone(i) is a maximal sum for a[1..i] elements that end with 1.
With dynamic programming approach, the complexity is O(N).
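A minimal Python sketch of the recurrence above, using 0-based lists (`max_cost` and `max_cost_brute` are hypothetical names). `f_max` and `f_one` play the roles of fmax(i) and fone(i); they are updated together so that each new value is computed from the previous pair. The brute-force helper is only there to cross-check small inputs:

```python
from itertools import product

def max_cost(b):
    """Largest sum of |a[i] - a[i-1]| subject to 1 <= a[i] <= b[i] (b non-empty)."""
    f_max, f_one = 0, 0          # best sums so far ending in a[i] == b[i] / a[i] == 1
    for prev, cur in zip(b, b[1:]):
        f_max, f_one = (max(f_one + cur - 1, f_max + abs(cur - prev)),
                        max(f_one, f_max + prev - 1))
    return max(f_max, f_one)

def max_cost_brute(b):
    """Exhaustive check for small inputs: tries every a with 1 <= a[i] <= b[i]."""
    return max(sum(abs(x - y) for x, y in zip(a, a[1:]))
               for a in product(*(range(1, v + 1) for v in b)))
```

For the sample input, `max_cost([10, 1, 10, 1, 10])` gives 36 in O(N) time.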

Combination (mathematical) of structs [duplicate]

I want to write a function that takes an array of letters as an argument and a number of those letters to select.
Say you provide an array of 8 letters and want to select 3 letters from that. Then you should get:
8! / ((8 - 3)! * 3!) = 56
arrays (or words) in return, consisting of 3 letters each.
Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.
Gray Codes
An issue that you will of course come across is memory, and pretty quickly: you'll have problems by 20 elements in your set -- 20C3 = 1140. And if you want to iterate over the set, it's best to use a modified Gray code algorithm so you aren't holding all of them in memory. These generate the next combination from the previous one and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? Minimize them? Et cetera.
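As a small illustration of the minimal-change idea (not necessarily any of the specific algorithms in the papers listed here), this Python sketch generates the classic recursive "revolving door" order: the list for (n, k) is the list for (n-1, k) followed by the reversed list for (n-1, k-1) with n-1 appended. Consecutive combinations differ by removing one element and adding another. The name `revolving_door` and the `reverse` flag are illustrative; the flag lets the reversed half be produced directly, so only the recursion stack is held in memory:

```python
def revolving_door(n, k, reverse=False):
    """Yield the k-subsets of range(n), as sorted lists, in revolving-door order."""
    if k < 0 or k > n:
        return                                   # no such subsets
    if k == 0:
        yield []
    elif k == n:
        yield list(range(n))
    elif not reverse:
        # R(n, k) = R(n-1, k) ++ reversed(R(n-1, k-1)) with n-1 added
        yield from revolving_door(n - 1, k)
        for c in revolving_door(n - 1, k - 1, reverse=True):
            yield c + [n - 1]
    else:
        # The same sequence backwards: reverse each half and swap their order.
        for c in revolving_door(n - 1, k - 1):
            yield c + [n - 1]
        yield from revolving_door(n - 1, k, reverse=True)
```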
Some of the original papers describing gray codes:
Some Hamilton Paths and a Minimal Change Algorithm
Adjacent Interchange Combination Generation Algorithm
Here are some other papers covering the topic:
An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
Combination Generators
Survey of Combinatorial Gray Codes (PostScript)
An Algorithm for Gray Codes
Chase's Twiddle (algorithm)
Phillip J Chase, "Algorithm 382: Combinations of M out of N Objects" (1970)
The algorithm in C...
Index of Combinations in Lexicographical Order (Buckles Algorithm 515)
You can also reference a combination by its index (in lexicographical order). Realizing that the index should be some amount of change from right to left based on the index we can construct something that should recover a combination.
So, we have a set {1,2,3,4,5,6}... and we want three elements. Let's say {1,2,3}: we can say that the difference between the elements is one, in order, and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of 'changes' in the last place accounts for one change in the lexicographical ordering. The second place, with one change, {1,3,4}, also has one change but accounts for more change since it's in the second place (proportional to the number of elements in the original set).
The method I've described is a deconstruction, as it seems, from set to the index, we need to do the reverse – which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes – I used the index of the sets rather than a number range to represent the set, so we are always working from 0...n.
Since combinations are unordered, {1,3,2} = {1,2,3} --we order them to be lexicographical.
This method has an implicit 0 to start the set for the first difference.
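A minimal Python sketch of the reverse direction (index to combination in lexicographic order) using plain counting rather than Buckles' exact code; it uses 0-based ranks and elements, `math.comb` (Python 3.8+), and a hypothetical name `unrank_lex`. At each position it skips whole blocks of combinations that start with a smaller element until the index falls inside a block:

```python
from math import comb

def unrank_lex(n, k, index):
    """Return the index-th (0-based) k-combination of range(n) in lexicographic order."""
    if not 0 <= index < comb(n, k):
        raise ValueError("index out of range")
    result, x = [], 0                 # x = smallest element still available
    for remaining in range(k, 0, -1):
        # comb(n - x - 1, remaining - 1) combinations start with x here.
        while comb(n - x - 1, remaining - 1) <= index:
            index -= comb(n - x - 1, remaining - 1)
            x += 1
        result.append(x)
        x += 1
    return result
```

For the {1,...,6} example above (shifted to 0-based), index 0 gives [0, 1, 2] and index 1 gives [0, 1, 3].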
Index of Combinations in Lexicographical Order (McCaffrey)
There is another way: its concept is easier to grasp and program, but it's without the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:
For index x, take each position i from k down to 1, pick the largest c_i such that C(c_i, i) <= x, and subtract C(c_i, i) from x before moving on, so that x = C(c_k, k) + C(c_(k-1), k-1) + ... + C(c_1, 1), where c_k > c_(k-1) > ... > c_1 >= 0.
For an example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1). So, the 27th lexicographical combination of four things is {1,2,5,6}; those are the indexes of whatever set you want to look at. Example below (OCaml), with a simple choose function filled in:
(* binomial coefficient C(n, k); 0 when k > n *)
let choose n k =
  if k < 0 || k > n then 0
  else
    let rec go acc i = if i > k then acc else go (acc * (n - k + i) / i) (i + 1) in
    go 1 1

(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
  (* maximize function -- maximize a that is aCb              *)
  (* return largest c where c <= a and choose c b <= x         *)
  let rec maximize a b x =
    if (choose a b) <= x then a else maximize (a-1) b x
  in
  let rec iterate n x i = match i with
    | 0 -> []
    | i ->
      let max = maximize n i x in
      max :: iterate n (x - (choose max i)) (i-1)
  in
  if x < 0 then failwith "errors" else
    let idxs = iterate (List.length set) x k in
    List.map (List.nth set) (List.sort compare idxs)
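The same greedy step can be sketched in Python with `math.comb` (Python 3.8+); the name `combinadic` is hypothetical. It returns the c_i largest first, so sort the result to get the set shown in the example:

```python
from math import comb

def combinadic(x, k):
    """Greedy decomposition x = C(c_k, k) + ... + C(c_1, 1) with c_k > ... > c_1 >= 0."""
    if x < 0:
        raise ValueError("x must be non-negative")
    result = []
    for i in range(k, 0, -1):
        c = i - 1                      # C(i - 1, i) == 0, so this always fits
        while comb(c + 1, i) <= x:     # grow c while the next value still fits
            c += 1
        result.append(c)
        x -= comb(c, i)
    return result
```

For example, `combinadic(27, 4)` returns [6, 5, 2, 1], i.e. the set {1, 2, 5, 6}.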
A small and simple combinations iterator
The following two algorithms are provided for didactic purposes. They implement an iterator and a (more general) folder over all combinations.
They are as fast as possible, having the complexity O(nCk). The memory consumption is bound by k.
We will start with the iterator, which will call a user provided function for each combination
let iter_combs n k f =
let rec iter v s j =
if j = k then f v
else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
iter [] 0 0
A more general version will call the user-provided function along with the state variable, starting from the initial state. Since we need to pass the state between different states, we won't use the for-loop, but instead use recursion:
let fold_combs n k f x =
  let rec loop i s c x =
    if i < n then
      loop (i+1) s c @@
      let c = i::c and s = s + 1 and i = i + 1 in
      if s < k then loop i s c x else f c x
    else x in
  loop 0 0 [] x
In C#:
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
    return k == 0 ? new[] { new T[0] } :
        elements.SelectMany((e, i) =>
            elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] {e}).Concat(c)));
}
var result = Combinations(new[] { 1, 2, 3, 4, 5 }, 3);
Short java solution:
import java.util.Arrays;

public class Combination {
    public static void main(String[] args){
        String[] arr = {"A","B","C","D","E","F"};
        combinations2(arr, 3, 0, new String[3]);
    }

    static void combinations2(String[] arr, int len, int startPosition, String[] result){
        if (len == 0){
            System.out.println(Arrays.toString(result));   // one finished combination
            return;
        }
        for (int i = startPosition; i <= arr.length-len; i++){
            result[result.length - len] = arr[i];
            combinations2(arr, len-1, i+1, result);
        }
    }
}
Result will be
[A, B, C]
[A, B, D]
[A, B, E]
[A, B, F]
[A, C, D]
[A, C, E]
[A, C, F]
[A, D, E]
[A, D, F]
[A, E, F]
[B, C, D]
[B, C, E]
[B, C, F]
[B, D, E]
[B, D, F]
[B, E, F]
[C, D, E]
[C, D, F]
[C, E, F]
[D, E, F]
May I present my recursive Python solution to this problem?
def choose_iter(elements, length):
    for i in range(len(elements)):
        if length == 1:
            yield (elements[i],)
        else:
            # combine elements[i] with every shorter combination of what follows it
            for rest in choose_iter(elements[i+1:], length-1):
                yield (elements[i],) + rest

def choose(l, k):
    return list(choose_iter(l, k))
Example usage:
>>> len(list(choose_iter("abcdefgh",3)))
56
I like it for its simplicity.
Let's say your array of letters looks like this: "ABCDEFGH". You have three indices (i, j, k) indicating which letters you are going to use for the current word. You start with i, j and k pointing at A, B and C (the word ABC).
First you vary k, so the next step gives ABD (k now points at D).
When k reaches the end, you go on and vary j and then k again: ACD, ACE, and so on.
Once j has reached G, you also start to vary i: BCD, BCE, and so on.
Written in code, this looks something like this:
#include <stdio.h>
#include <string.h>

void print_combinations(const char *string)
{
    int i, j, k;
    int len = strlen(string);

    for (i = 0; i < len - 2; i++)
        for (j = i + 1; j < len - 1; j++)
            for (k = j + 1; k < len; k++)
                printf("%c%c%c\n", string[i], string[j], string[k]);
}
The following recursive algorithm picks all of the k-element combinations from an ordered set:
choose the first element i of your combination
combine i with each of the combinations of k-1 elements chosen recursively from the set of elements larger than i.
Iterate the above for each i in the set.
It is essential that you pick the rest of the elements as larger than i, to avoid repetition. This way [3,5] will be picked only once, as [3] combined with [5], instead of twice (the condition eliminates [5] + [3]). Without this condition you get variations instead of combinations.
Short example in Python:
def comb(sofar, rest, n):
    if n == 0:
        print(sofar)
    else:
        for i in range(len(rest)):
            comb(sofar + rest[i], rest[i+1:], n-1)
>>> comb("", "abcde", 3)
For explanation, the recursive method is described with the following example:
Example: A B C D E
All combinations of 3 would be:
A with all combinations of 2 from the rest (B C D E)
B with all combinations of 2 from the rest (C D E)
C with all combinations of 2 from the rest (D E)
I found this thread useful and thought I would add a Javascript solution that you can pop into Firebug. Depending on your JS engine, it could take a little time if the starting string is large.
function string_recurse(active, rest) {
    if (rest.length == 0) {
        console.log(active);   // print the finished subset
    } else {
        string_recurse(active + rest.charAt(0), rest.substring(1, rest.length));
        string_recurse(active, rest.substring(1, rest.length));
    }
}
string_recurse("", "abc");
The output should be as follows:
In C++ the following routine will produce all combinations of length distance(first,k) between the range [first,last):
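#include <algorithm>

// Expects [first, last) to be sorted; returns false (after restoring the
// sorted order) when there are no more combinations.
template <typename Iterator>
bool next_combination(const Iterator first, Iterator k, const Iterator last)
{
   /* Credits: Mark Nelson */
   if ((first == last) || (first == k) || (last == k))
      return false;
   Iterator i1 = first;
   Iterator i2 = last;
   ++i1;
   if (last == i1)
      return false;
   i1 = last;
   --i1;
   i1 = k;
   --i2;
   while (first != i1)
   {
      if (*--i1 < *i2)
      {
         Iterator j = k;
         while (!(*i1 < *j)) ++j;
         std::iter_swap(i1, j);
         ++i1;
         ++j;
         i2 = k;
         std::rotate(i1, j, last);
         while (last != j)
         {
            ++j;
            ++i2;
         }
         std::rotate(k, i2, last);
         return true;
      }
   }
   std::rotate(first, k, last);
   return false;
}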
It can be used like this:
#include <string>
#include <iostream>

int main()
{
    std::string s = "12345";
    std::size_t comb_size = 3;
    do
    {
        std::cout << std::string(s.begin(), s.begin() + comb_size) << std::endl;
    } while (next_combination(s.begin(), s.begin() + comb_size, s.end()));
    return 0;
}
This will print the following:
static IEnumerable<string> Combinations(List<string> characters, int length)
{
    for (int i = 0; i < characters.Count; i++)
    {
        // only want 1 character, just return this one
        if (length == 1)
            yield return characters[i];

        // want more than one character, return this one plus all combinations one shorter
        // only use characters after the current one for the rest of the combinations
        else
            foreach (string next in Combinations(characters.GetRange(i + 1, characters.Count - (i + 1)), length - 1))
                yield return characters[i] + next;
    }
}
Simple recursive algorithm in Haskell
import Data.List
combinations 0 lst = [[]]
combinations n lst = do
(x:xs) <- tails lst
rest <- combinations (n-1) xs
return $ x : rest
We first define the special case, i.e. selecting zero elements. It produces a single result, which is an empty list (i.e. a list that contains an empty list).
For n > 0, x goes through every element of the list and xs is every element after x.
rest picks n - 1 elements from xs using a recursive call to combinations. The final result of the function is a list where each element is x : rest (i.e. a list which has x as head and rest as tail) for every different value of x and rest.
> combinations 3 "abcde"
And of course, since Haskell is lazy, the list is gradually generated as needed, so you can partially evaluate exponentially large combinations.
> let c = combinations 8 "abcdefghijklmnopqrstuvwxyz"
> take 10 c
And here comes granddaddy COBOL, the much maligned language.
Let's assume an array of 34 elements of 8 bytes each (purely arbitrary selection.) The idea is to enumerate all possible 4-element combinations and load them into an array.
We use 4 indices, one for each position in the group of 4.
The array is processed like this:
idx1 = 1
idx2 = 2
idx3 = 3
idx4 = 4
We vary idx4 from 4 to the end. For each idx4 we get a unique combination of groups of four. When idx4 comes to the end of the array, we increment idx3 by 1 and set idx4 to idx3+1. Then we run idx4 to the end again. We proceed in this manner, augmenting idx3, idx2, and idx1 respectively until the position of idx1 is less than 4 from the end of the array. That finishes the algorithm.
1 --- pos 1
2 --- pos 2
3 --- pos 3
4 --- pos 4
First iterations:
A COBOL example:
05 IDX1 PIC 99.
05 IDX2 PIC 99.
05 IDX3 PIC 99.
05 IDX4 PIC 99.
05 OUT_IDX PIC 9(9).
* Stop the search when IDX1 is on the third last array element:
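For comparison, a Python sketch of the same four-index scheme (0-based indices rather than COBOL's 1-based ones; `four_combinations` is a hypothetical name). Each index starts one past the index to its left and stops early enough to leave room for the indices to its right, and, like the COBOL version, all results are loaded into one list:

```python
def four_combinations(items):
    """All 4-element combinations of items, built with four nested indices."""
    n = len(items)
    out = []
    for idx1 in range(n - 3):                      # leave room for idx2..idx4
        for idx2 in range(idx1 + 1, n - 2):
            for idx3 in range(idx2 + 1, n - 1):
                for idx4 in range(idx3 + 1, n):    # idx4 runs to the end
                    out.append((items[idx1], items[idx2], items[idx3], items[idx4]))
    return out
```

With 34 elements this produces C(34,4) = 46376 combinations.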
Another C# version with lazy generation of the combination indices. This version maintains a single array of indices to define a mapping between the list of all values and the values for the current combination, i.e. constantly uses O(k) additional space during the entire runtime. The code generates individual combinations, including the first one, in O(k) time.
public static IEnumerable<T[]> Combinations<T>(this T[] values, int k)
{
    if (k < 0 || values.Length < k)
        yield break; // invalid parameters, no combinations possible

    // generate the initial combination indices
    var combIndices = new int[k];
    for (var i = 0; i < k; i++)
        combIndices[i] = i;

    while (true)
    {
        // return next combination
        var combination = new T[k];
        for (var i = 0; i < k; i++)
            combination[i] = values[combIndices[i]];
        yield return combination;

        // find first index to update
        var indexToUpdate = k - 1;
        while (indexToUpdate >= 0 && combIndices[indexToUpdate] >= values.Length - k + indexToUpdate)
            indexToUpdate--;

        if (indexToUpdate < 0)
            yield break; // done

        // update combination indices
        for (var combIndex = combIndices[indexToUpdate] + 1; indexToUpdate < k; indexToUpdate++, combIndex++)
            combIndices[indexToUpdate] = combIndex;
    }
}
Test code:
foreach (var combination in new[] {'a', 'b', 'c', 'd', 'e'}.Combinations(3))
System.Console.WriteLine(String.Join(" ", combination));
a b c
a b d
a b e
a c d
a c e
a d e
b c d
b c e
b d e
c d e
Here is an elegant, generic implementation in Scala, as described on 99 Scala Problems.
object P26 {
  def flatMapSublists[A,B](ls: List[A])(f: (List[A]) => List[B]): List[B] =
    ls match {
      case Nil => Nil
      case sublist @ (_ :: tail) => f(sublist) ::: flatMapSublists(tail)(f)
    }

  def combinations[A](n: Int, ls: List[A]): List[List[A]] =
    if (n == 0) List(Nil)
    else flatMapSublists(ls) { sl =>
      combinations(n - 1, sl.tail) map {sl.head :: _}
    }
}
If you can use SQL syntax (say, if you're using LINQ to access fields of a structure or array, or directly accessing a database that has a table called "Alphabet" with just one char field "Letter"), you can adapt the following code:
SELECT A.Letter, B.Letter, C.Letter
FROM Alphabet AS A, Alphabet AS B, Alphabet AS C
WHERE A.Letter<>B.Letter AND A.Letter<>C.Letter AND B.Letter<>C.Letter
AND A.Letter<B.Letter AND B.Letter<C.Letter
This will return all combinations of 3 letters, regardless of how many letters you have in table "Alphabet" (it can be 3, 8, 10, 27, etc.).
If what you want is all permutations, rather than combinations (i.e. you want "ACB" and "ABC" to count as different, rather than appear just once) just delete the last line (the AND one) and it's done.
Post-Edit: After re-reading the question, I realise what's needed is the general algorithm, not just a specific one for the case of selecting 3 items. Adam Hughes' answer is the complete one, unfortunately I cannot vote it up (yet). This answer's simple but works only for when you want exactly 3 items.
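To see why deleting the last line turns combinations into permutations, here is a small Python model of the same three-way self-join (the function name `triples` and the `ordered` flag are illustrative only). With the ordering condition each 3-letter set appears once; without it, every arrangement of distinct letters appears:

```python
def triples(letters, ordered=True):
    """Mimic SELECT A, B, C FROM Alphabet AS A, Alphabet AS B, Alphabet AS C WHERE ..."""
    rows = []
    for a in letters:
        for b in letters:
            for c in letters:
                if a != b and a != c and b != c:             # the first WHERE line
                    if not ordered or (a < b and b < c):     # the "AND A<B AND B<C" line
                        rows.append((a, b, c))
    return rows
```

With 8 letters this gives 56 rows when ordered and 8 * 7 * 6 = 336 rows otherwise.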
I had a permutation algorithm I used for Project Euler, in Python:
def missing(miss, src):
    "Returns the list of items in src not present in miss"
    return [i for i in src if i not in miss]

def permutation_gen(n, l):
    "Generates all the permutations of n items of the l list"
    for i in l:
        if n <= 1:
            yield [i]
        else:
            r = [i]
            for j in permutation_gen(n-1, missing([i], l)):
                yield r + j
This gives you all selections without repeated items; note that they are permutations, so ['A','B','C'] and ['B','A','C'] both appear. Is that what you need?
It is a generator, so you use it in something like this:
for comb in permutation_gen(3, list("ABCDEFGH")):
    print(comb)
There is an implementation for JavaScript. It has functions to get k-combinations and all combinations of an array of any objects. Examples:
k_combinations([1,2,3], 2)
-> [[1,2], [1,3], [2,3]]
-> [[1],[2],[3],[1,2],[1,3],[2,3],[1,2,3]]
function initializePointers($cnt) {
    $pointers = [];
    for($i=0; $i<$cnt; $i++) {
        $pointers[] = $i;
    }
    return $pointers;
}

// Advance $pointers to the next combination; returns false when there is none.
function incrementPointers(&$pointers, $arrLength) {
    for($i=0; $i<count($pointers); $i++) {
        $currentPointerIndex = count($pointers) - $i - 1;
        $currentPointer = $pointers[$currentPointerIndex];
        if($currentPointer < $arrLength - $i - 1) {
            ++$pointers[$currentPointerIndex];
            // every pointer to the right follows on consecutively
            for($j=1; ($currentPointerIndex+$j)<count($pointers); $j++) {
                $pointers[$currentPointerIndex+$j] = $pointers[$currentPointerIndex]+$j;
            }
            return true;
        }
    }
    return false;
}

function getDataByPointers(&$arr, &$pointers) {
    $data = [];
    for($i=0; $i<count($pointers); $i++) {
        $data[] = $arr[$pointers[$i]];
    }
    return $data;
}

function getCombinations($arr, $cnt)
{
    $len = count($arr);
    $result = [];
    $pointers = initializePointers($cnt);
    do {
        $result[] = getDataByPointers($arr, $pointers);
    } while(incrementPointers($pointers, $len));
    return $result;
}

$result = getCombinations([0, 1, 2, 3, 4, 5], 3);
Based on the three-index (i, j, k) approach described above, but more abstract: it works for any number of pointers.
Here you have a lazy evaluated version of that algorithm coded in C#:
using System;
using System.Collections.Generic;
using System.Linq;

static bool nextCombination(int[] num, int n, int k)
{
    bool finished, changed;

    changed = finished = false;

    if (k > 0)
    {
        for (int i = k - 1; !finished && !changed; i--)
        {
            if (num[i] < (n - 1) - (k - 1) + i)
            {
                num[i]++;
                if (i < k - 1)
                {
                    for (int j = i + 1; j < k; j++)
                        num[j] = num[j - 1] + 1;
                }
                changed = true;
            }
            finished = (i == 0);
        }
    }

    return changed;
}

static IEnumerable<IEnumerable<T>> Combinations<T>(IEnumerable<T> elements, int k)
{
    T[] elem = elements.ToArray();
    int size = elem.Length;

    if (k <= size)
    {
        int[] numbers = new int[k];
        for (int i = 0; i < k; i++)
            numbers[i] = i;

        do
        {
            // copy, because numbers is modified by the next call
            yield return numbers.Select(n => elem[n]).ToArray();
        } while (nextCombination(numbers, size, k));
    }
}
And test part:
static void Main(string[] args)
{
    int k = 3;
    var t = new[] { "dog", "cat", "mouse", "zebra"};

    foreach (IEnumerable<string> i in Combinations(t, k))
        Console.WriteLine(string.Join(",", i));
}
Hope this helps!
Another version, which forces all the combinations of the first k elements to appear first, then all those that need the first k+1 elements, then the first k+2, etc. This means that if you have a sorted array with the most important items at the top, it takes them first and expands gradually to the next ones, only when it must.
private static bool NextCombinationFirstsAlwaysFirst(int[] num, int n, int k)
{
    if (k > 1 && NextCombinationFirstsAlwaysFirst(num, num[k - 1], k - 1))
        return true;

    if (num[k - 1] + 1 == n)
        return false;

    ++num[k - 1];
    for (int i = 0; i < k - 1; ++i)
        num[i] = i;

    return true;
}
For instance, if you run the first method ("nextCombination") on k=3, n=5 you'll get:
0 1 2
0 1 3
0 1 4
0 2 3
0 2 4
0 3 4
1 2 3
1 2 4
1 3 4
2 3 4
But if you run
int[] nums = new int[k];
for (int i = 0; i < k; ++i)
    nums[i] = i;
do
{
    Console.WriteLine(string.Join(" ", nums));
}
while (NextCombinationFirstsAlwaysFirst(nums, n, k));
You'll get this (I added empty lines for clarity):
0 1 2
0 1 3
0 2 3
1 2 3
0 1 4
0 2 4
1 2 4
0 3 4
1 3 4
2 3 4
It adds "4" only when it must, and after "4" has been added it adds "3" again only when it must (after doing 01, 02, 12).
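The order produced by NextCombinationFirstsAlwaysFirst appears to be what is usually called colexicographic (colex) order: combinations are compared by their largest element first, then the next largest, and so on. As a quick cross-check (not the author's method, and it sorts everything in memory), sorting `itertools.combinations` by the reversed tuple reproduces the k=3, n=5 listing; `colex_combinations` is a hypothetical name:

```python
from itertools import combinations

def colex_combinations(n, k):
    """k-combinations of range(n), ordered by largest element, then next largest, ..."""
    return sorted(combinations(range(n), k), key=lambda c: c[::-1])
```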
Array.prototype.combs = function(num) {
var str = this,
length = str.length,
of = Math.pow(2, length) - 1,
out, combinations = [];
while(of) {
out = [];
for(var i = 0, y; i < length; i++) {
y = (1 << i);
if(y & of && (y !== of))
if (out.length >= num) {
return combinations;
Clojure version:
(defn comb [k l]
  (if (= 1 k) (map vector l)
      (apply concat
             (map-indexed
              #(map (fn [x] (conj x %2))
                    (comb (dec k) (drop (inc %1) l)))
              l))))
Count from 1 to 2^n - 1.
Convert each number to its binary representation.
Translate each 'on' bit to elements of your set, based on position.
In C#:
using System;
using System.Text.RegularExpressions;

void Main()
{
    var set = new [] {"A", "B", "C", "D" }; //, "E", "F", "G", "H", "I", "J" };
    var kElement = 2;

    for(var i = 1; i < Math.Pow(2, set.Length); i++) {
        var result = Convert.ToString(i, 2).PadLeft(set.Length, '0');
        var cnt = Regex.Matches(Regex.Escape(result), "1").Count;
        if (cnt == kElement) {
            for(int j = 0; j < set.Length; j++)
                if ( Char.GetNumericValue(result[j]) == 1)
                    Console.Write(set[j]);   // print the element for each 'on' bit
            Console.WriteLine();
        }
    }
}
Why does it work?
There is a bijection between the subsets of an n-element set and n-bit sequences.
That means we can figure out how many subsets there are by counting sequences.
e.g., a four-element set can be represented by {0,1} X {0,1} X {0,1} X {0,1}, i.e. 2^4 different sequences.
So all we have to do is count from 1 to 2^n - 1 to find all the combinations. (We ignore the empty set, i.e. 0.) Next, translate the numbers to their binary representation. Then substitute elements of your set for 'on' bits.
If you want only k element results, only print when k bits are 'on'.
(If you want all subsets instead of k length subsets, remove the cnt/kElement part.)
(For proof, see MIT free courseware Mathematics for Computer Science, Lehman et al, section 11.2.2. )
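The same counting idea in a short Python sketch (the name `k_subsets_by_bitmask` is hypothetical). It counts the 'on' bits with `bin(mask).count("1")` instead of a regex and, like the C# version, maps the leftmost bit of the padded binary string to the first element:

```python
def k_subsets_by_bitmask(items, k):
    """Yield the k-element subsets of items by scanning masks 1 .. 2**n - 1."""
    n = len(items)
    for mask in range(1, 2 ** n):
        if bin(mask).count("1") == k:        # keep only masks with k 'on' bits
            # bit (n - 1 - j) stands for items[j]
            yield [items[j] for j in range(n) if (mask >> (n - 1 - j)) & 1]
```

Dropping the `if` (or the k check) yields all non-empty subsets instead.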
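Short Python code, yielding index positions:
def yield_combos(n, k):
    # n is set size, k is combo size
    i = 0
    a = [0] * k
    while i > -1:
        # every position after i follows on consecutively
        for j in range(i + 1, k):
            a[j] = a[j - 1] + 1
        i = k - 1
        yield list(a)              # copy, since a is modified afterwards
        # find the rightmost position that can still be incremented
        while a[i] == i + n - k:
            i -= 1
        a[i] += 1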
All said and done, here is the OCaml code for that.
The algorithm is evident from the code.
let combi n lst =
  let rec comb l c =
    if List.length c = n then [c] else
      match l with
      | [] -> []
      | h :: t -> (comb t (h :: c)) @ (comb t c)
  in
  comb lst []
Here is a method which gives you all combinations of a specified size from a string of any length. It is similar to quinmars' solution, but works for varied input and k.
The code can be changed to wrap around, i.e. 'dab' from input 'abcd' with k=3.
public void run(String data, int howMany){
    choose(data, howMany, new StringBuffer(), 0);
}

//n choose k
private void choose(String data, int k, StringBuffer result, int startIndex){
    if (result.length()==k){
        System.out.print(result + " ");   // one finished combination
        return;
    }
    for (int i=startIndex; i<data.length(); i++){
        result.append(data.charAt(i));
        choose(data,k,result, i+1);
        result.setLength(result.length() - 1);   // undo the append
    }
}
Output for "abcde":
abc abd abe acd ace ade bcd bce bde cde
Short JavaScript version (ES2019, since it uses Array.prototype.flatMap)
let combine = (list, n) =>
    n == 0 ?
        [[]] :
        list.flatMap((e, i) =>
            combine(
                list.slice(i + 1),
                n - 1
            ).map(c => [e].concat(c))
        );

let res = combine([1,2,3,4], 3);
res.forEach(e => console.log(e.join()));
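Another Python recursive solution.
def combination_indicies(n, k, j=0, stack=None):
    if stack is None:
        stack = []
    if len(stack) == k:
        yield list(stack)
        return
    for i in range(j, n):
        stack.append(i)             # choose i, then recurse on larger indices
        for x in combination_indicies(n, k, i + 1, stack):
            yield x
        stack.pop()                 # undo the choice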
list(combination_indicies(5, 3))
[[0, 1, 2],
[0, 1, 3],
[0, 1, 4],
[0, 2, 3],
[0, 2, 4],
[0, 3, 4],
[1, 2, 3],
[1, 2, 4],
[1, 3, 4],
[2, 3, 4]]
I created a solution in SQL Server 2005 for this, and posted it on my website:
Here is an example to show usage:
SELECT * FROM dbo.fn_GetMChooseNCombos('ABCD', 2, '')
(6 row(s) affected)
Here is my proposition in C++.
I tried to impose as few restrictions on the iterator type as I could, so this solution assumes just a forward iterator, and it can be a const_iterator. This should work with any standard container. In cases where the arguments don't make sense, it throws std::invalid_argument.
#include <vector>
#include <stdexcept>

template <typename Fci> // Fci - forward const iterator
std::vector<std::vector<Fci> >
enumerate_combinations(Fci begin, Fci end, unsigned int combination_size)
{
    if(begin == end && combination_size > 0u)
        throw std::invalid_argument("empty set and positive combination size!");
    std::vector<std::vector<Fci> > result; // empty set of combinations
    if(combination_size == 0u)
    {
        // There is exactly one combination of size 0: the empty set.
        result.push_back(std::vector<Fci>());
        return result;
    }
    std::vector<Fci> current_combination;
    current_combination.reserve(combination_size + 1u); // I reserve one additional slot
                                                        // in my vector to store
                                                        // the end sentinel there.
                                                        // The code is cleaner thanks to that
    for(unsigned int i = 0u; i < combination_size && begin != end; ++i, ++begin)
        current_combination.push_back(begin); // Construction of the first combination
    // Since I assume the iterators support only incrementing, I have to iterate over
    // the set to get its size, which is expensive. Here I had to iterate anyway to
    // produce the first combination, so I use the loop to also check the size.
    if(current_combination.size() < combination_size)
        throw std::invalid_argument("combination size > set size!");
    result.push_back(current_combination); // Store the first combination in the results set
    current_combination.push_back(end); // Here I add the sentinel mentioned earlier to
                                        // simplify the rest of the code. If I did it
                                        // earlier, the previous statement would get ugly.
    while(true)
    {
        unsigned int i = combination_size;
        Fci tmp;                            // Thanks to the sentinel I can find the first
        do                                  // iterator to change simply by scanning
        {                                   // from right to left and looking for the
            tmp = current_combination[--i]; // first "bubble". The fact that it's
            ++tmp;                          // a forward iterator makes it ugly, but I
        }                                   // can't help it.
        while(i > 0u && tmp == current_combination[i + 1u]);

        // Here is probably my most obfuscated expression.
        // The loop above looks for a "bubble". If there is no "bubble", that means
        // current_combination is the last combination; the expression in the if
        // statement below evaluates to true and the function exits, returning result.
        // If the "bubble" is found, however, the statement below has the side effect
        // of incrementing the first iterator to the left of the "bubble".
        if(++current_combination[i] == current_combination[i + 1u])
            return result;
        // The rest of the code sets the positions of the remaining iterators
        // (if there are any) that are to the right of the incremented one,
        // to form the next combination.
        while(++i < combination_size)
        {
            current_combination[i] = current_combination[i - 1u];
            ++current_combination[i];
        }
        // Below is the ugly side of using the sentinel. Well, it had to have some
        // disadvantage. Try without it.
        result.push_back(std::vector<Fci>(current_combination.begin(),
                                          current_combination.end() - 1));
    }
}
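The C++ version appends end as a sentinel so that the right-to-left scan can compare every position with its right neighbour, including the last one. A minimal Python sketch of that scan on index lists, assuming the indices 0..n-1 stand in for the iterators and the value n plays the role of end (enumerate_index_combinations is a hypothetical name):

```python
def enumerate_index_combinations(n, k):
    """All k-subsets of range(n) as tuples, using an end sentinel like the C++ code."""
    if k == 0:
        return [()]                      # one combination of size 0: the empty set
    if k > n:
        raise ValueError("combination size > set size")
    current = list(range(k)) + [n]       # first combination plus the sentinel n ("end")
    result = [tuple(current[:k])]
    while True:
        i = k
        # Scan right to left for the first "bubble": a slot whose successor is free.
        while True:
            i -= 1
            if i == 0 or current[i] + 1 != current[i + 1]:
                break
        current[i] += 1
        if current[i] == current[i + 1]:
            return result                # no bubble left: that was the last combination
        # Pack the slots to the right of i directly after it.
        for j in range(i + 1, k):
            current[j] = current[j - 1] + 1
        result.append(tuple(current[:k]))

print(enumerate_index_combinations(4, 2))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```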
Here is some code I recently wrote in Java, which calculates and returns all the combinations of "num" elements from "outOf" elements.
// author: Sourabh Bhat (
public class Testing
{
    public static void main(String[] args)
    {
        // Test case num = 5, outOf = 8.
        int num = 5;
        int outOf = 8;
        int[][] combinations = getCombinations(num, outOf);
        for (int i = 0; i < combinations.length; i++)
        {
            for (int j = 0; j < combinations[i].length; j++)
            {
                System.out.print(combinations[i][j] + " ");
            }
            System.out.println();
        }
    }

    private static int[][] getCombinations(int num, int outOf)
    {
        int possibilities = get_nCr(outOf, num);
        int[][] combinations = new int[possibilities][num];
        int arrayPointer = 0;
        int[] counter = new int[num];
        for (int i = 0; i < num; i++)
        {
            counter[i] = i;
        }
        breakLoop: while (true)
        {
            // Initializing part
            for (int i = 1; i < num; i++)
            {
                if (counter[i] >= outOf - (num - 1 - i))
                    counter[i] = counter[i - 1] + 1;
            }
            // Testing part
            for (int i = 0; i < num; i++)
            {
                if (counter[i] < outOf)
                {
                    continue;
                } else
                {
                    break breakLoop;
                }
            }
            // Innermost part
            combinations[arrayPointer] = counter.clone();
            arrayPointer++;
            // Incrementing part
            counter[num - 1]++;
            for (int i = num - 1; i >= 1; i--)
            {
                if (counter[i] >= outOf - (num - 1 - i))
                    counter[i - 1]++;
            }
        }
        return combinations;
    }

    private static int get_nCr(int n, int r)
    {
        if(r > n)
        {
            throw new ArithmeticException("r is greater than n");
        }
        long numerator = 1;
        long denominator = 1;
        for (int i = n; i >= r + 1; i--)
        {
            numerator *= i;
        }
        for (int i = 2; i <= n - r; i++)
        {
            denominator *= i;
        }
        return (int) (numerator / denominator);
    }
}
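get_nCr multiplies out n!/r! in a long before dividing, so the intermediate product overflows once n grows (with n = 21 and r = 1, for example, the numerator is 21!, which is larger than Long.MAX_VALUE). A common alternative is the multiplicative formula, where every intermediate value is itself an exact binomial coefficient and the product before each division never exceeds r times the final result. A minimal Python sketch (n_choose_r is a hypothetical name; Python integers do not overflow, so the point here is the order of operations, which carries over to Java):

```python
def n_choose_r(n, r):
    """Binomial coefficient C(n, r) via the multiplicative formula."""
    if r < 0 or r > n:
        raise ValueError("r must satisfy 0 <= r <= n")
    r = min(r, n - r)                   # C(n, r) == C(n, n - r); fewer steps
    result = 1
    for i in range(1, r + 1):
        # After this step result == C(n - r + i, i), so the division is exact.
        result = result * (n - r + i) // i
    return result

print(n_choose_r(8, 5))   # 56
```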

I need to add string characters in C. A + B must = C. Literally

I am writing a program that is due tonight at midnight, and I am utterly stuck. The program is written in C, and takes input from the user in the form SOS, where S = a string of characters and O = an operator (i.e. '+', '-', '*', '/'). The example input and output in the book are the following:
Input> abc+aab
Output: abc + aab => bce
And that's literal, not variables: a + a must equal b.
What is the code to do this operation? I will post the code I have so far; however, all it does is take the input and split it into its parts.
#include <stdio.h>
#include <string.h>

int main() {
    char in[20], s1[10], s2[10], o[2], ans[15];

    while(1) {
        printf("\nInput> ");
        scanf("%19s", in);   // read at most 19 characters so in[] cannot overflow

        if (in[0] == 'q' && in[1] == 'u' && in[2] == 'i' && in[3] == 't') {
            return 0;
        }

        int i, hold, breakNum;

        for (i = 0; i < 20; i++) {
            if (in[i] == '+' || in[i] == '-' || in[i] == '/' || in[i] == '*') {
                hold = i;           // position of the operator
            }
            if (in[i] == '\0') {
                breakNum = i;       // position of the terminating null
                break;
            }
        }

        int j;
        for (j = 0; j < hold; j++) {
            s1[j] = in[j];
        }
        s1[hold] = '\0';

        o[0] = in[hold];
        o[1] = '\0';

        int k;
        int l = 0;
        for (k = (hold + 1); k < breakNum; k++) {
            s2[l] = in[k];
            l++;
        }
        s2[l] = '\0';               // terminate s2 right after its own last character

        printf("%s %s %s =>\n", s1, o, s2);
    }
}
Since this is homework, let's focus on how to solve this, rather than providing a bunch of code which I suspect your instructor would frown upon.
First, don't do everything from within the main() function. Break it up into smaller functions, each of which does part of the task.
Second, break the task into its component pieces and write out the pseudocode:
while ( 1 )
// read input "abc + def"
// convert input into tokens "abc", "+", "def"
// evaluate tokens 1 and 3 as operands ("abc" -> 123, "def" -> 456)
// perform the operation indicated by token 2
// format the result as a series of characters (579 -> "egi")
Finally, write each of the functions. Of course, if you stumble upon roadblocks along the way, be sure to come back to ask your specific questions.
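The pipeline above can be sketched end to end. This is a minimal Python sketch under assumptions the answer leaves open: each letter a through i is one decimal digit 1 through 9 (so "abc" reads as 123, matching the example), operands contain only those letters, division is integer division, and a result that is not positive or contains a 0 digit raises an error because no letter stands for it. All function names are hypothetical.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}

def tokenize(text):
    # "abc+aab" -> ("abc", "+", "aab"); spaces around the operator are allowed.
    m = re.fullmatch(r"\s*([a-i]+)\s*([-+*/])\s*([a-i]+)\s*", text)
    if m is None:
        raise ValueError(f"cannot parse {text!r}")
    return m.groups()

def letters_to_number(word):
    # Each letter is one decimal digit: a -> 1, b -> 2, ..., i -> 9.
    return int("".join(str(ord(ch) - ord("a") + 1) for ch in word))

def number_to_letters(value):
    digits = str(value)
    if value <= 0 or "0" in digits:
        raise ValueError(f"{value} has no letter representation")
    return "".join(chr(ord("a") + int(d) - 1) for d in digits)

def evaluate(text):
    left, op, right = tokenize(text)
    result = OPS[op](letters_to_number(left), letters_to_number(right))
    return f"{left} {op} {right} => {number_to_letters(result)}"

print(evaluate("abc+aab"))    # abc + aab => bce
print(evaluate("abc + def"))  # abc + def => egi
```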
Based on your examples, it appears “a” acts like 1, “b” acts like 2, and so on. Given this, you can perform the arithmetic on individual characters like this:
// Map character from first string to an integer.
int c1 = s1[j] - 'a' + 1;
// Map character from second string to an integer.
int c2 = s2[j] - 'a' + 1;
// Perform operation.
int result = c1 + c2;
// Map result to a character.
char c = result - 1 + 'a';
There are some things you have to add to this:
You have to put this in a loop, to do it for each character in the strings.
You have to vary the operation according to the operator specified in the input.
You have to do something with each result, likely printing it.
You have to do something about results that extend beyond the alphabet, like “y+y”, “a-b”, or “a/b”.
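The four points above can be combined into a short sketch. This one is in Python rather than C, and it assumes both strings have the same length, that a per-character result must stay within 1..26 (a..z), and that division is integer division; anything else raises an error. The name char_arith is hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.floordiv}

def char_arith(s1, op, s2):
    """Apply op position by position, treating 'a' as 1, 'b' as 2, ..., 'z' as 26."""
    if len(s1) != len(s2):
        raise ValueError("operands must have the same length")
    out = []
    for x, y in zip(s1, s2):
        c1 = ord(x) - ord("a") + 1             # map each character to an integer
        c2 = ord(y) - ord("a") + 1
        value = OPS[op](c1, c2)                # vary the operation by the operator
        if not 1 <= value <= 26:               # e.g. "y+y", "a-b", "a/b"
            raise ValueError(f"{x}{op}{y} = {value} falls outside a..z")
        out.append(chr(value - 1 + ord("a")))  # map the result back to a letter
    return "".join(out)

print(char_arith("abc", "+", "aab"))  # bce
```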
If we assume, from your example answer, that a represents 1, then you can find the character code of every other letter and subtract the character code of a from it to get that letter's value.
for (i = 0; i < str_len; i++) {
    int s1Int = (int)s1[i];
    int s2Int = (int)s2[i];                     // second operand comes from s2, not s1
    int addAmount = 1 + abs((int)'a' - s2Int);  // abs() needs #include <stdlib.h>
    output[i] = (char)(s1Int + addAmount);
}
1) Loop over the length of s1 (s1 and s2 are assumed to have the same length)
2) Retrieve the decimal value of the first char
3) Retrieve the decimal value of the second char
4) Find the difference between the letter a (97) and the second char, plus 1 (assuming a represents 1)
5) Add that amount to the s1 char and convert the decimal value back to a character.
Example 1:
if S1 char is a, S2 char is b:
s1Int = 97
s2Int = 98
addAmount = 1 + abs((int)'a' - s2Int) = 1 + abs(97 - 98) = 2
output = s1Int + addAmount = 97 + 2 = 99 = c
Example 2:
if S1 char is c, S2 char is a:
s1Int = 99
s2Int = 97
addAmount = 1 + abs((int)'a' - s2Int) = 1 + abs(97 - 97) = 1
output = s1Int + addAmount = 99 + 1 = 100 = d
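Because every lowercase letter has a code of at least 97, abs(97 - s2Int) equals s2Int - 97, so the rule reduces to shifting the first character by the value of the second (with a = 1). A minimal Python check of the two worked examples and of the book's abc + aab (the helper names add_chars and add_strings are hypothetical, and lowercase input is assumed):

```python
def add_chars(c1, c2):
    """Shift c1 by the value of c2, where 'a' counts as 1 (the addAmount rule)."""
    add_amount = 1 + abs(ord("a") - ord(c2))
    return chr(ord(c1) + add_amount)

def add_strings(s1, s2):
    # Apply the rule position by position.
    return "".join(add_chars(x, y) for x, y in zip(s1, s2))

print(add_chars("a", "b"))          # c  (Example 1)
print(add_chars("c", "a"))          # d  (Example 2)
print(add_strings("abc", "aab"))    # bce
```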
