Fastest permutation algorithm for unique permutations using the smallest memory

For example, string "AAABBB" will have permutations:
"ABAABB",
"BBAABA",
"ABABAB",
etc.
What's a good algorithm for generating the permutations? (And what's its time complexity?)

For a multiset, you can solve recursively by position (JavaScript code):
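function f(multiset, counters, result) {
  // Base case: every letter has been used up, so result is one permutation.
  if (counters.every(x => x === 0)) {
    console.log(result);
    return;
  }
  // Try each distinct letter that still has copies left at this position.
  for (let i = 0; i < counters.length; i++) {
    if (counters[i] > 0) {
      const nextCounters = counters.slice(); // copy so sibling calls are unaffected
      nextCounters[i]--;
      f(multiset, nextCounters, result + multiset[i]);
    }
  }
}
f(['A', 'B'], [3, 3], '');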
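The recursion above emits each distinct arrangement exactly once, so its running time is governed by how many arrangements there are. For a multiset of n letters with letter counts c1, c2, ..., that number is the multinomial coefficient n!/(c1!·c2!·...). A small Python sketch to compute it (the helper name `count_unique_permutations` is just illustrative), handy for checking output sizes before generating anything:

```python
from collections import Counter
from math import factorial


def count_unique_permutations(s):
    """Number of distinct arrangements of s: n! / (c1! * c2! * ...)."""
    total = factorial(len(s))
    for count in Counter(s).values():
        # Each intermediate quotient is itself an integer, so // is exact.
        total //= factorial(count)
    return total


print(count_unique_permutations("AAABBB"))  # 20
print(count_unique_permutations("banana"))  # 60
```

Since every emitted string has length n, any generator that writes them all out does at least n times this many character writes; the "banana" figure matches the 60 strings in the Java output further down the thread.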

This is not a full answer, just an idea.
If your strings have a fixed number of only two letters, I'd go with a binary tree and a good recursive function.
Each node is an object that contains a name (its parent's name plus the suffix A or B) and the number of A and B letters in that name.
The node constructor gets the parent's name and the parent's A and B counts, so it only needs to add 1 to the A or B count and append one letter to the name.
It doesn't construct the next node if that would give more than three As (for an A node) or more than three Bs (for a B node), or if the total count already equals the length of the starting string.
Now you can collect the leaves of the two trees (rooted at A and B), i.e. their names, and you have all the permutations you need.
Scala or some other functional language (with object-like features) would be perfect for implementing this algorithm. Hope this helps or just sparks some ideas.
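A minimal Python sketch of that tree idea, assuming exactly two letters. Instead of node objects it keeps (name, A count, B count) tuples on an explicit stack, and a single empty root stands in for the two trees rooted at A and B; the function name `two_letter_arrangements` is hypothetical:

```python
def two_letter_arrangements(n_a, n_b):
    """Leaf names of the implicit tree: every string with n_a 'A's and n_b 'B's."""
    target = n_a + n_b
    leaves = []
    # Each node is (name, count of A, count of B).
    stack = [("", 0, 0)]
    while stack:
        name, a, b = stack.pop()
        if a + b == target:
            leaves.append(name)  # full length reached: this node is a leaf
            continue
        # A child is only built while its letter stays within its limit.
        # "B" is pushed first so the "A" subtree is explored first,
        # which yields the leaves in lexicographic order.
        if b < n_b:
            stack.append((name + "B", a, b + 1))
        if a < n_a:
            stack.append((name + "A", a + 1, b))
    return leaves


print(len(two_letter_arrangements(3, 3)))  # 20
```

Because a child is never created once its letter is exhausted, every branch ends in a valid leaf and no work is wasted on dead ends.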

Since you actually want to generate the permutations instead of just counting them, the best complexity you can hope for is O(size_of_output).
Here's a good solution in Java that meets that bound and runs very quickly, while consuming negligible space. It first sorts the letters to find the lexicographically smallest permutation, and then generates all permutations in lexicographic order.
It's known as the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
import java.util.Arrays;
import java.util.function.Consumer;
public class UniquePermutations
{
    static void generateUniquePermutations(String s, Consumer<String> consumer)
    {
        char[] array = s.toCharArray();
        Arrays.sort(array); // start from the lexicographically smallest permutation
        for (;;)
        {
            consumer.accept(String.valueOf(array));
            // Find the rightmost position whose element is smaller than its successor.
            int changePos = array.length - 2;
            while (changePos >= 0 && array[changePos] >= array[changePos + 1])
                --changePos;
            if (changePos < 0)
                break; // all done: the array is in descending (last) order
            // The suffix after changePos is non-increasing; find its rightmost
            // element that is still greater than array[changePos].
            int swapPos = changePos + 1;
            while (swapPos + 1 < array.length && array[swapPos + 1] > array[changePos])
                ++swapPos;
            char t = array[changePos];
            array[changePos] = array[swapPos];
            array[swapPos] = t;
            // Reverse the suffix so it becomes its smallest arrangement again.
            for (int i = changePos + 1, j = array.length - 1; i < j; ++i, --j)
            {
                t = array[i];
                array[i] = array[j];
                array[j] = t;
            }
        }
    }
    public static void main(String[] args) throws java.lang.Exception
    {
        StringBuilder line = new StringBuilder();
        generateUniquePermutations("banana", s -> {
            if (line.length() > 0)
            {
                if (line.length() + s.length() >= 75)
                {
                    System.out.println(line.toString());
                    line.setLength(0);
                }
                else
                    line.append(" ");
            }
            line.append(s);
        });
        System.out.println(line);
    }
}
Here is the output:
aaabnn aaanbn aaannb aabann aabnan aabnna aanabn aananb aanban aanbna
aannab aannba abaann abanan abanna abnaan abnana abnnaa anaabn anaanb
anaban anabna ananab ananba anbaan anbana anbnaa annaab annaba annbaa
baaann baanan baanna banaan banana bannaa bnaaan bnaana bnanaa bnnaaa
naaabn naaanb naaban naabna naanab naanba nabaan nabana nabnaa nanaab
nanaba nanbaa nbaaan nbaana nbanaa nbnaaa nnaaab nnaaba nnabaa nnbaaa


Shifting array data structure C#

I was curious to know if there is an efficient way to store data in a container that holds a maximum number of values and, once that limit is reached, starts removing the oldest values to make room for new ones. All of this should stay ordered (new data should always come after the previously added data).
I know I could achieve this using a queue:
q.Enqueue(1);
q.Enqueue(2);
q.Enqueue(3); // 1 2 3
q.Dequeue(); // 2 3
q.Enqueue(4); // 2 3 4
but iterating through the data afterwards requires transforming the queue into an array, and I'm not sure how efficient that is.
Maybe it's better to have a fixed-size array with an index that wraps around to the start when the array is full, and use some modulo magic to always iterate backwards, querying the data from most recent to least recent. This would be less readable, but it would work and, I guess, be more efficient.
So my question is: is there a better, more readable and efficient way?
And also, what is the efficiency of using ToArray() with other data structures (e.g. List, Queue, Stack...)? When should it be avoided?
In the end I decided to implement my idea. Not sure if this is the best way, but it works well for my needs:
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
public class DataHolder<T> : IEnumerable
{
    private T[] _data;
    private int _maxSize;
    private int _currIdx;   // slot of the most recently added element
    private int _currSize;  // number of valid elements (<= _maxSize)
    public DataHolder(int size)
    {
        _maxSize = size;
        _data = new T[_maxSize];
        _currIdx = -1;
        _currSize = 0;
    }
    public void Add(T data)
    {
        if (_currSize < _maxSize) _currSize++;
        // Advance the write position, wrapping to 0 and overwriting the oldest slot.
        _currIdx = NioUtils.PositiveMod((_currIdx + 1), _maxSize);
        _data[_currIdx] = data;
    }
    ///<summary>
    /// Gets the element at index. The 0 element is the last one added.
    ///</summary>
    public T GetElementAt(int index)
    {
        if (index < 0 || index >= _currSize)
        {
            throw new System.ArgumentOutOfRangeException("index");
        }
        int shiftIndex = NioUtils.PositiveMod((_currIdx - index), _maxSize);
        return _data[shiftIndex];
    }
    /// Implement interface IEnumerable in order to iterate through this object
    /// (from the most recent element to the oldest).
    public IEnumerator GetEnumerator()
    {
        int count = 0;
        int index = _currIdx;
        while (count < _currSize)
        {
            count++;
            yield return _data[index];
            index = NioUtils.PositiveMod((index - 1), _maxSize);
        }
    }
}
Where PositiveMod (called as NioUtils.PositiveMod above) is:
public static int PositiveMod(int value, int n)
{
    int v = value % n;
    return v >= 0 ? v : n + v;
}
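For comparison, here is the same wrap-around idea as a small Python sketch. The class name `RingBuffer` is hypothetical; it mirrors DataHolder's behaviour (index 0 is the newest item, iteration runs newest to oldest) and does not copy anything into a new array to iterate:

```python
class RingBuffer:
    """Fixed-capacity buffer that overwrites its oldest item once full.

    Index 0 is the most recently added item, as in GetElementAt above.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._capacity = capacity
        self._head = -1  # slot holding the newest item; -1 while empty
        self._size = 0

    def add(self, item):
        # Advance and wrap around; when full this overwrites the oldest slot.
        self._head = (self._head + 1) % self._capacity
        self._data[self._head] = item
        if self._size < self._capacity:
            self._size += 1

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        # Python's % is already non-negative for a positive modulus,
        # so no PositiveMod helper is needed here.
        return self._data[(self._head - index) % self._capacity]

    def __iter__(self):
        # Newest to oldest, reading in place.
        for i in range(self._size):
            yield self[i]


buf = RingBuffer(3)
for value in [1, 2, 3, 4]:
    buf.add(value)
print(list(buf))  # [4, 3, 2]
```

If index access is not needed, Python's standard library already covers this case: `collections.deque(maxlen=3)` drops items from the opposite end when you append to a full deque, and `reversed()` iterates it from newest to oldest.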

Order of init calls in Kotlin Array initialization

In the constructor of an Array, is there a guarantee that the init function will be called for the indexes in increasing order?
It would make sense but I did not find any such information in the docs:
https://kotlinlang.org/api/latest/jvm/stdlib/kotlin/-array/-init-.html#kotlin.Array%24%28kotlin.Int%2C+kotlin.Function1%28%28kotlin.Int%2C+kotlin.Array.T%29%29%29%2Finit
There is no guarantee for this in the API.
TL;DR: If you need sequential execution because you have some state that changes, see the solution at the bottom.
First, let's look at the implementations of the initializer:
Native: It is implemented in increasing order for Kotlin/Native.
@InlineConstructor
public constructor(size: Int, init: (Int) -> Char): this(size) {
    for (i in 0..size - 1) {
        this[i] = init(i)
    }
}
JVM: Decompiling the Kotlin byte code for
class test {
    val intArray = IntArray(100) { it * 2 }
}
to Java in Android Studio yields:
public final class test {
    @NotNull
    private final int[] intArray;
    @NotNull
    public final int[] getIntArray() {
        return this.intArray;
    }
    public test() {
        int size$iv = 100;
        int[] result$iv = new int[size$iv];
        int i$iv = 0;
        for(int var4 = result$iv.length; i$iv < var4; ++i$iv) {
            int var6 = false;
            int var11 = i$iv * 2;
            result$iv[i$iv] = var11;
        }
        this.intArray = result$iv;
    }
}
which supports the claim that it is initialized in ascending order.
Conclusion: It is commonly implemented to execute in ascending order.
BUT: You cannot rely on the execution order, as it is not guaranteed by the API. It can change, and it can differ between platforms (although both are unlikely).
Solution: You can initialize the array manually in a loop; then you have control over the execution order.
The following example outlines a possible implementation that has a stable initialisation with random values, e.g. for tests.
import kotlin.random.Random
val intArray = IntArray(100).also {
    val random = Random(0) // fixed seed, so the values are reproducible
    for (index in it.indices) {
        it[index] = index * random.nextInt()
    }
}
Starting from version 1.3.50, Kotlin guarantees the sequential array initialization order in its API documentation: https://kotlinlang.org/api/latest/jvm/stdlib/kotlin/-array/-init-.html
The function init is called for each array element sequentially starting from the first one. It should return the value for an array element given its index.

How to find all sequences of three in an array of values

First question ever here...
I am coding a simple 3-card poker hand evaluator and am having problems finding/extracting multiple "straights" (sequential series of values) from an array of values.
I need to extract and return EVERY straight the array possibly has. Here's an example:
(assume array is first sorted numerically incrementing)
myArray = [1h,2h,3c,3h,4c]
Possible three-value sequences are:
[1h,2h,3c]
[1h,2h,3h]
[2h,3c,4c]
[2h,3h,4c]
Here is my original code to find sequences of 3, where the array contains card objects with .value and .suit. For simplicity in this question I just put "2h" etc here:
private var _pokerHand = [1h,2h,3c,3h,4c];
private function getAllStraights(): Array
{
    var foundStraights:Array = new Array();
    for (var i: int = 0; i < (_pokerHand.length - 2); i++)
    {
        // The hand is sorted ascending, so each next card must be exactly one higher.
        if ((_pokerHand[i + 1].value - _pokerHand[i].value) == 1 && (_pokerHand[i + 2].value - _pokerHand[i + 1].value) == 1)
        {
            trace("found a straight!");
            foundStraights.push(new Array(_pokerHand[i], _pokerHand[i + 1], _pokerHand[i + 2]));
        }
    }
    return foundStraights;
}
but it of course fails when there are value duplicates (like the 3's above). I cannot discard duplicates because they could be of different suits. I need every possible straight as in the example above. This allows me to run the straights through a "Flush" function to find "straight flush".
What array iteration technique am I missing?
This is an interesting problem. Given the popularity of poker games (and Flash) I'm sure this has been solved many times before, but I couldn't find an example online. Here's how I would approach it:
Look at it like a path finding problem.
Begin with every card in the hand as the start of a possible path (straight).
While there are possible straights:
Remove one from the list.
Find all the next valid steps (there could be none, or up to 4 following cards with the same value), and for each next valid step:
If it reaches the goal (completes a straight), add it to a list of found straights.
Otherwise, add the extended possible straight back to the list of possible straights.
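A Python sketch of those steps, assuming cards are (value, suit) tuples already sorted by value and a straight length of at least 2. The function name `all_straights` is illustrative; partial straights are stored as lists of indices, so the scan can resume right after the last card without searching for it again:

```python
from collections import deque


def all_straights(cards, length=3):
    """Return every run of `length` cards with consecutive values.

    Assumes `cards` is sorted by value, each card is a (value, suit) tuple,
    and length >= 2. Partial straights are kept as lists of indices.
    """
    found = []
    # Every card starts a possible path (straight).
    pending = deque([i] for i in range(len(cards)))
    while pending:
        path = pending.popleft()
        last_value = cards[path[-1]][0]
        # Scan forward for valid next steps: cards exactly one value higher.
        for j in range(path[-1] + 1, len(cards)):
            value = cards[j][0]
            if value == last_value:
                continue  # same value, different suit: look past it
            if value != last_value + 1:
                break  # input is sorted, so nothing later can fit
            extended = path + [j]
            if len(extended) == length:
                found.append([cards[k] for k in extended])  # goal reached
            else:
                pending.append(extended)  # keep extending this path later
    return found


hand = [(1, "h"), (2, "h"), (3, "c"), (3, "h"), (4, "c")]
for straight in all_straights(hand):
    print(straight)
```

The queue makes this a breadth-first search; a plain list used as a stack would find the same straights, only in a different order.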
This seems to do what you want (Card object has .value as int):
private function getAllStraights(cards:Vector.<Card>, straightLength:uint = 3):Vector.<Vector.<Card>> {
    var foundStraights:Vector.<Vector.<Card>> = new <Vector.<Card>>[];
    var possibleStraights:Vector.<Vector.<Card>> = new <Vector.<Card>>[];
    for each (var startingCard:Card in cards) {
        possibleStraights.push(new <Card>[startingCard]);
    }
    while (possibleStraights.length) {
        var possibleStraight:Vector.<Card> = possibleStraights.shift();
        var lastCard:Card = possibleStraight[possibleStraight.length - 1];
        var possibleNextCards:Vector.<Card> = new <Card>[];
        for (var i:int = cards.indexOf(lastCard) + 1; i < cards.length; i++) {
            var nextCard:Card = cards[i];
            if (nextCard.value == lastCard.value)
                continue;
            if (nextCard.value == lastCard.value + 1)
                possibleNextCards.push(nextCard);
            else
                break;
        }
        for each (var possibleNextCard:Card in possibleNextCards) {
            var possibleNextStraight:Vector.<Card> = possibleStraight.slice().concat(new <Card>[possibleNextCard]);
            if (possibleNextStraight.length == straightLength)
                foundStraights.push(possibleNextStraight);
            else
                possibleStraights.push(possibleNextStraight);
        }
    }
    return foundStraights;
}
Given [1♥,2♥,3♣,3♥,4♣] you get: [1♥,2♥,3♣], [1♥,2♥,3♥], [2♥,3♣,4♣], [2♥,3♥,4♣]
It gets really interesting when you have a lot of duplicates, like [1♥,1♣,1♦,1♠,2♥,2♣,3♦,3♠,4♣,4♦,4♥]. This gives you:
[1♥,2♥,3♦], [1♥,2♥,3♠], [1♥,2♣,3♦], [1♥,2♣,3♠], [1♣,2♥,3♦], [1♣,2♥,3♠], [1♣,2♣,3♦], [1♣,2♣,3♠], [1♦,2♥,3♦], [1♦,2♥,3♠], [1♦,2♣,3♦], [1♦,2♣,3♠], [1♠,2♥,3♦], [1♠,2♥,3♠], [1♠,2♣,3♦], [1♠,2♣,3♠], [2♥,3♦,4♣], [2♥,3♦,4♦], [2♥,3♦,4♥], [2♥,3♠,4♣], [2♥,3♠,4♦], [2♥,3♠,4♥], [2♣,3♦,4♣], [2♣,3♦,4♦], [2♣,3♦,4♥], [2♣,3♠,4♣], [2♣,3♠,4♦], [2♣,3♠,4♥]
I haven't checked this thoroughly but it looks right at a glance.

Given an array, find combinations of n numbers that are less than c

This is a tough one, at least for my minimal C skills.
Basically, the user enters a list of prices into an array, then the desired number of items to purchase, and finally a maximum cost not to exceed.
I need to check how many combinations of the desired number of items are less than or equal to the cost given.
If the problem was a fixed number of items in the combination, say 3, it would be much easier with just three loops selecting each price and adding them to test.
Where I get stumped is the requirement that the user enter any number of items, up to the number of items in the array.
This is what I decided on at first, before realizing that the user could specify combinations of any number, not just three. It was created with help from a similar topic on here, but again it only works if the user specifies he wants 3 items per combination. Otherwise it doesn't work.
// test if any combinations of items can be made
for (one = 0; one < (count-2); one++) // count -2 to account for the two other variables
{
    for (two = one + 1; two < (count-1); two++) // count -1 to account for the last variable
    {
        for (three = two + 1; three < count; three++)
        {
            total = itemCosts[one] + itemCosts[two] + itemCosts[three];
            if (total <= funds)
            {
                // DEBUG printf("\nMatch found! %d + %d + %d, total: %d.", itemCosts[one], itemCosts[two], itemCosts[three], total);
                combos++;
            }
        }
    }
}
As far as I can tell there's no easy way to adapt this to be flexible based on the user's desired number of items per combination.
I would really appreciate any help given.
One trick to flattening nested iterations is to use recursion.
Make a function that takes an array of items that you have selected so far, and the number of items you've picked up to this point. The algorithm should go like this:
If you have picked the number of items equal to your target of N, compute the sum and check it against the limit
If you have not picked enough items, add one more item to your list, and make a recursive call.
To ensure that you do not pick the same item twice, pass the smallest index from which the function may pick. The declaration of the function may look like this:
#include <stddef.h> /* size_t */
int count_combinations(
    int itemCosts[]
  , size_t costCount
  , int pickedItems[]
  , size_t pickedCount
  , size_t pickedTargetCount
  , size_t minIndex
  , int funds
) {
    if (pickedCount == pickedTargetCount) {
        // This is the base case. It has the code similar to
        // the "if" statement from your code, but the number of items
        // is not fixed.
        int sum = 0;
        for (size_t i = 0 ; i != pickedCount ; i++) {
            sum += pickedItems[i];
        }
        // The following line will return 0 or 1,
        // depending on the result of the comparison.
        return sum <= funds;
    } else {
        // This is the recursive case. It is similar to one of your "for"
        // loops, but instead of setting "one", "two", or "three"
        // it sets pickedItems[0], pickedItems[1], etc.
        int res = 0;
        for (size_t i = minIndex ; i != costCount ; i++) {
            pickedItems[pickedCount] = itemCosts[i];
            res += count_combinations(
                itemCosts
              , costCount
              , pickedItems
              , pickedCount+1
              , pickedTargetCount
              , i+1
              , funds
            );
        }
        return res;
    }
}
You call this function like this:
int itemCosts[C] = {...}; // The costs
int pickedItems[N]; // No need to initialize this array
int res = count_combinations(itemCosts, C, pickedItems, 0, N, 0, funds);
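The same recursion as a Python sketch, for readers who want to experiment with it. The name `count_combinations` mirrors the C function, but this version is an illustrative assumption: it passes the running sum down instead of filling a pickedItems array, and keeps the minimum-index rule so no item is picked twice:

```python
def count_combinations(costs, k, funds):
    """Count the k-item selections (by position) whose total is <= funds."""

    def pick(running_sum, picked, min_index):
        if picked == k:
            # Base case: k items chosen, test them against the limit.
            return 1 if running_sum <= funds else 0
        total = 0
        # Only indices >= min_index may be picked, so each selection is built
        # in increasing index order and counted exactly once.
        for i in range(min_index, len(costs)):
            total += pick(running_sum + costs[i], picked + 1, i + 1)
        return total

    return pick(0, 0, 0)


print(count_combinations([5, 1, 4, 2, 3], 3, 8))  # 4
```

If prices are non-negative, sorting them first would let you break out of the loop as soon as running_sum + costs[i] exceeds funds, since every later choice can only be larger.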
This can be done by using a backtracking algorithm. This is equivalent to implementing a list of nested for loops, and it can be better understood by looking at the execution pattern of a sequence of nested for loops.
For example, let's say you have, as you presented, a sequence of 3 for loops, and execution has reached the third (innermost) level. After it goes through all its iterations, you return to the second-level loop, which moves on to its next iteration and enters the third-level loop again. Similarly, when the second level finishes all its iterations, you jump back to the first-level loop, which continues with its next iteration, enters the second level, and from there the third.
So, at a given level you try to go deeper (if there is a deeper level), and when there are no more iterations you go back a level (backtrack).
Using backtracking, you represent the nested for loops by an array where each element is an index variable: array[0] is the index for the level-0 loop, and so on.
Here is a sample implementation for your problem. Note that because it marks values as used instead of starting each level after the previous level's index, it visits every ordering of the selected items, so each combination is reached FORLOOP_DEPTH! times (since the sum does not depend on order, you can divide the final count by that):
#include <stdbool.h>
#define NUMBER_OF_OBJECTS 10
#define FORLOOP_DEPTH 4 // This is equivalent to the number
                        // of nested fors and in the problem is
                        // the number of requested objects
#define FORLOOP_ARRAY_INIT -1 // This is an init value for each "forloop" variable
int main(void)
{
    int object_prices[NUMBER_OF_OBJECTS];
    int forLoopsArray[FORLOOP_DEPTH];
    bool isLoopVariableValueUsed[NUMBER_OF_OBJECTS];
    int forLoopLevel = 0;
    for (int i = 0; i < FORLOOP_DEPTH; i++)
    {
        forLoopsArray[i] = FORLOOP_ARRAY_INIT;
    }
    for (int i = 0; i < NUMBER_OF_OBJECTS; i++)
    {
        isLoopVariableValueUsed[i] = false;
    }
    forLoopLevel = 0; // Start from level zero
    while (forLoopLevel >= 0)
    {
        bool isOkVal = false;
        if (forLoopsArray[forLoopLevel] != FORLOOP_ARRAY_INIT)
        {
            // We'll mark the loop variable value from the last iteration unused
            // since we'll use a new one (in this iteration)
            isLoopVariableValueUsed[forLoopsArray[forLoopLevel]] = false;
        }
        /* All iterations (in all levels) start basically from zero.
         * Because of that, here I check that the loop variable for this level
         * is different from the previous ones, or try the next value otherwise
         */
        while (isOkVal == false
               && forLoopsArray[forLoopLevel] < (NUMBER_OF_OBJECTS - 1))
        {
            forLoopsArray[forLoopLevel]++; // Try a new value
            if (isLoopVariableValueUsed[forLoopsArray[forLoopLevel]] == false)
            {
                isLoopVariableValueUsed[forLoopsArray[forLoopLevel]] = true;
                isOkVal = true;
            }
        }
        if (isOkVal == true) // Have we found a different item at this level?
        {
            if (forLoopLevel == FORLOOP_DEPTH - 1) // Is it the innermost?
            {
                /* Here is the innermost level where you can test
                 * if the sum of all selected items is smaller than
                 * the target
                 */
            }
            else // Nope, go a level deeper
            {
                forLoopLevel++;
            }
        }
        else // We've run out of values in this level, go back
        {
            forLoopsArray[forLoopLevel] = FORLOOP_ARRAY_INIT;
            forLoopLevel--;
        }
    }
}
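A Python sketch of the same "array of loop indices" backtracking, adapted (as an assumption, not the answer's exact code) so that each level starts one past the previous level's index. That removes the need for a used-flag array and visits every combination exactly once; `count_combinations_iterative` is a hypothetical name:

```python
def count_combinations_iterative(costs, k, funds):
    """Backtracking with an explicit array of loop indices.

    idx[level] plays the role of the loop variable of the level-th nested
    for loop; each level starts one past the previous level's index, so
    every combination is visited exactly once.
    """
    n = len(costs)
    if k == 0:
        return 1 if funds >= 0 else 0  # the empty selection sums to 0
    if k > n:
        return 0
    idx = [-1] * k  # -1 means "this loop has not started yet"
    level = 0
    count = 0
    while level >= 0:
        # Advance this level's loop variable; a fresh loop starts just
        # after the index chosen one level up.
        if idx[level] == -1:
            idx[level] = idx[level - 1] + 1 if level > 0 else 0
        else:
            idx[level] += 1
        # Leave room for the deeper levels: this level may go up to n - (k - level).
        if idx[level] > n - (k - level):
            idx[level] = -1  # loop exhausted: reset it and backtrack
            level -= 1
        elif level == k - 1:
            # Innermost level: test the selected combination.
            if sum(costs[i] for i in idx) <= funds:
                count += 1
        else:
            level += 1  # go one loop deeper
    return count


print(count_combinations_iterative([5, 1, 4, 2, 3], 3, 8))  # 4
```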

Algorithm for joining e.g. an array of strings

I have wondered for some time what a nice, clean solution for joining an array of strings might look like.
Example: I have ["Alpha", "Beta", "Gamma"] and want to join the strings into one, separated by commas – "Alpha, Beta, Gamma".
Now I know that most programming languages offer some kind of join method for this. I just wonder how these might be implemented.
When I took introductory courses, I often tried to go it alone, but never found a satisfactory algorithm. Everything seemed rather messy, the problem being that you cannot just loop through the array concatenating the strings, as you would add one comma too many (either before the first string or after the last one).
I don’t want to check conditions in the loop. I don’t really want to add the first or the last string before/after the loop (I guess this is maybe the best way?).
Can someone show me an elegant solution? Or tell me exactly why there can’t be anything more elegant?
The most elegant solution I found for problems like this is something like this (in pseudocode):
separator = ""
foreach(item in stringCollection)
{
concatenatedString += separator + item
separator = ","
}
You just run the loop, and the separator is only set after the first pass, so the first item doesn't get one. It's not as clean as I'd like, so I'd still add comments, but it's better than an if statement or adding the first or last item outside the loop.
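The same trick as a runnable Python sketch (the function name `join_items` is illustrative; in real Python code you would simply call `", ".join(items)`):

```python
def join_items(items, separator=", "):
    """Join strings with the 'empty separator on the first pass' trick."""
    result = ""
    sep = ""  # nothing is prepended before the first item
    for item in items:
        result += sep + item
        sep = separator  # from the second pass on, prepend the real separator
    return result


print(join_items(["Alpha", "Beta", "Gamma"]))  # Alpha, Beta, Gamma
```

Because the decision depends on the loop state rather than on whether the accumulated string is still empty, an empty first item is still correctly followed by a separator.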
All of these solutions are decent ones, but for an underlying library, both independence of separator and decent speed are important. Here is a function that fits the requirement assuming the language has some form of string builder.
public static String join(String[] strings, String sep) {
    if (strings.length == 0) return "";
    if (strings.length == 1) return strings[0];
    StringBuilder sb = new StringBuilder();
    sb.append(strings[0]);
    for (int i = 1; i < strings.length; i++) {
        sb.append(sep);
        sb.append(strings[i]);
    }
    return sb.toString();
}
EDIT: I suppose I should mention why this would be faster. The main reason is that any time you write c = a + b;, the underlying construct is usually c = (new StringBuilder()).append(a).append(b).toString();. By reusing the same StringBuilder object, we reduce the number of allocations and the amount of garbage we produce.
And before someone chimes in with "optimization is evil": we're talking about implementing a common library function. Acceptable, scalable performance is one of the requirements then. A join that takes a long time is one that isn't going to be used often.
Most languages nowadays, e.g. Perl (mentioned by Jon Ericson), PHP and JavaScript, have a join() function or method, and this is by far the most elegant solution. Less code is better code.
In response to Mendelt Siebenga, if you do require a hand-rolled solution, I'd go with the ternary operator for something like:
separator = ","
foreach (item in stringCollection)
{
concatenatedString += concatenatedString ? separator + item : item
}
I usually go with something like...
list = ["Alpha", "Beta", "Gamma"];
output = "";
separator = "";
for (int i = 0; i < list.length ; i++) {
output = output + separator;
output = output + list[i];
separator = ", ";
}
This works because on the first pass separator is empty (so you don't get a comma at the start), but on every subsequent pass you add a comma before adding the next element.
You could certainly unroll this a little to make it a bit faster (assigning to the separator over and over isn't ideal), though I suspect that's something the compiler could do for you automatically.
In the end, though, I suspect this is pretty much what most language-level join functions come down to. Nothing more than syntactic sugar, but it sure is sweet.
For pure elegance, a typical recursive functional-language solution is quite nice. This isn't in an actual language syntax but you get the idea (it's also hardcoded to use comma separator):
join([]) = ""
join([x]) = x
join([x | rest]) = x + "," + join(rest)
In reality you would write this in a more generic way, to reuse the same algorithm but abstract away the data type (doesn't have to be strings) and the operation (doesn't have to be concatenation with a comma in the middle). Then it usually gets called 'reduce', and many functional languages have this built in, e.g. multiplying all numbers in a list, in Lisp:
(reduce #'* '(1 2 3 4 5)) => 120
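Here is that recursive definition transcribed into Python as a sketch (the name `join_recursive` is hypothetical), with one branch per pseudocode case and the separator made a parameter:

```python
def join_recursive(items, sep=","):
    """Recursive join following the three cases of the pseudocode."""
    if not items:
        return ""  # join([]) = ""
    if len(items) == 1:
        return items[0]  # join([x]) = x
    # join([x | rest]) = x + sep + join(rest)
    return items[0] + sep + join_recursive(items[1:], sep)


print(join_recursive(["Alpha", "Beta", "Gamma"]))  # Alpha,Beta,Gamma
```

This is for illustration only: slicing copies the rest of the list on every call and CPython does not eliminate tail calls, so it is quadratic and limited by the recursion limit (1000 by default).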
@Mendelt Siebenga
Strings are corner-stone objects in programming languages. Different languages implement strings differently. An implementation of join() strongly depends on underlying implementation of strings. Pseudocode doesn't reflect underlying implementation.
Consider join() in Python. It can be easily used:
print(", ".join(["Alpha", "Beta", "Gamma"]))
# Alpha, Beta, Gamma
It could easily be implemented as follows:
from functools import reduce
def join(seq, sep=" "):
    if not seq: return ""
    elif len(seq) == 1: return seq[0]
    return reduce(lambda x, y: x + sep + y, seq)
print(join(["Alpha", "Beta", "Gamma"], ", "))
# Alpha, Beta, Gamma
And here is how the join() method is implemented in C (taken from the Python 2 trunk, where str objects are PyStringObject):
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> string\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");
static PyObject *
string_join(PyStringObject *self, PyObject *orig)
{
char *sep = PyString_AS_STRING(self);
const Py_ssize_t seplen = PyString_GET_SIZE(self);
PyObject *res = NULL;
char *p;
Py_ssize_t seqlen = 0;
size_t sz = 0;
Py_ssize_t i;
PyObject *seq, *item;
seq = PySequence_Fast(orig, "");
if (seq == NULL) {
return NULL;
}
seqlen = PySequence_Size(seq);
if (seqlen == 0) {
Py_DECREF(seq);
return PyString_FromString("");
}
if (seqlen == 1) {
item = PySequence_Fast_GET_ITEM(seq, 0);
if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) {
Py_INCREF(item);
Py_DECREF(seq);
return item;
}
}
/* There are at least two things to join, or else we have a subclass
* of the builtin types in the sequence.
* Do a pre-pass to figure out the total amount of space we'll
* need (sz), see whether any argument is absurd, and defer to
* the Unicode join if appropriate.
*/
for (i = 0; i < seqlen; i++) {
const size_t old_sz = sz;
item = PySequence_Fast_GET_ITEM(seq, i);
if (!PyString_Check(item)){
#ifdef Py_USING_UNICODE
if (PyUnicode_Check(item)) {
/* Defer to Unicode join.
* CAUTION: There's no gurantee that the
* original sequence can be iterated over
* again, so we must pass seq here.
*/
PyObject *result;
result = PyUnicode_Join((PyObject *)self, seq);
Py_DECREF(seq);
return result;
}
#endif
PyErr_Format(PyExc_TypeError,
"sequence item %zd: expected string,"
" %.80s found",
i, Py_TYPE(item)->tp_name);
Py_DECREF(seq);
return NULL;
}
sz += PyString_GET_SIZE(item);
if (i != 0)
sz += seplen;
if (sz < old_sz || sz > PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"join() result is too long for a Python string");
Py_DECREF(seq);
return NULL;
}
}
/* Allocate result space. */
res = PyString_FromStringAndSize((char*)NULL, sz);
if (res == NULL) {
Py_DECREF(seq);
return NULL;
}
/* Catenate everything. */
p = PyString_AS_STRING(res);
for (i = 0; i < seqlen; ++i) {
size_t n;
item = PySequence_Fast_GET_ITEM(seq, i);
n = PyString_GET_SIZE(item);
Py_MEMCPY(p, PyString_AS_STRING(item), n);
p += n;
if (i < seqlen - 1) {
Py_MEMCPY(p, sep, seplen);
p += seplen;
}
}
Py_DECREF(seq);
return res;
}
Note that the code after the /* Catenate everything. */ comment is only a small part of the whole function.
In pseudocode:
/* Catenate everything. */
for each item in sequence
copy-assign item
if not last item
copy-assign separator
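To make the two-pass structure of the C code concrete, here is a Python sketch over bytes (the C function works on raw char buffers). The name `join_preallocated` is hypothetical and the sketch only illustrates the technique: first compute the exact size, allocate once, then copy each item and a separator after every item but the last. Real Python code should just call `bytes.join` or `str.join`:

```python
def join_preallocated(items, sep):
    """Two passes, as in the C code: size everything first, then copy."""
    if not items:
        return b""
    # Pass 1: total size = all item lengths + one separator between each pair.
    size = sum(len(item) for item in items) + len(sep) * (len(items) - 1)
    out = bytearray(size)  # single allocation
    pos = 0
    # Pass 2: copy each item, and a separator after every item but the last.
    for i, item in enumerate(items):
        out[pos:pos + len(item)] = item
        pos += len(item)
        if i < len(items) - 1:
            out[pos:pos + len(sep)] = sep
            pos += len(sep)
    return bytes(out)


print(join_preallocated([b"Alpha", b"Beta", b"Gamma"], b", "))  # b'Alpha, Beta, Gamma'
```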
' Pseudo code Assume zero based
ResultString = InputArray[0]
n = 1
while n (is less than) Number_Of_Strings
ResultString (concatenate) ", "
ResultString (concatenate) InputArray[n]
n = n + 1
loop
In Perl, I just use the join command:
$ echo "Alpha
Beta
Gamma" | perl -e 'print(join(", ", map {chomp; $_} <> ))'
Alpha, Beta, Gamma
(The map stuff is mostly there to create a list.)
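In languages that don't have a built-in, like C, I use simple iteration (untested):
s[0] = '\0'; /* start from an empty string */
for (i = 0; i < N-1; i++){
    strcat(s, a[i]);
    strcat(s, ", ");
}
strcat(s, a[N-1]); /* the last element gets no trailing separator */
Of course, you'd need to check the size of s before you add more bytes to it (and N must be at least 1).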
You either have to special case the first entry or the last.
Collecting implementations in different languages?
Here is, for your amusement, a Smalltalk version:
join:collectionOfStrings separatedBy:sep
|buffer|
buffer := WriteStream on:''.
collectionOfStrings
do:[:each | buffer nextPutAll:each ]
separatedBy:[ buffer nextPutAll:sep ].
^ buffer contents.
Of course, the above code is already in the standard library found as:
Collection >> asStringWith:
so, using that, you'd write:
#('A' 'B' 'C') asStringWith:','
But here's my main point:
I would like to put more emphasis on the fact that using a StringBuilder (or what is called "WriteStream" in Smalltalk) is highly recommended. Do not concatenate strings using "+" in a loop: the result will be many, many intermediate throw-away strings. If you have a good garbage collector, that's fine. But some are not, and a lot of memory needs to be reclaimed. StringBuilder (and WriteStream, which is its great-grandfather) uses a buffer-doubling or even adaptive growing algorithm, which needs MUCH less scratch memory.
However, if it's only a few small strings you are concatenating, don't worry and just "+" them; the extra work of using a StringBuilder might actually be counter-productive, up to an implementation- and language-dependent number of strings.
The following is no longer language-agnostic (but that doesn't matter for the discussion, because the implementation is easily portable to other languages). I tried to implement Luke's (theoretically best) solution in an imperative programming language. Take your pick; mine's C#. Not very elegant at all. However (without any testing whatsoever), I could imagine that its performance is quite decent because the recursion is in fact tail recursive, although the C# compiler does not guarantee tail-call elimination, so very long sequences could still overflow the stack.
My challenge: give a better recursive implementation (in an imperative language). You say what “better” means: less code, faster, I'm open for suggestions.
private static StringBuilder RecJoin(IEnumerator<string> xs, string sep, StringBuilder result) {
    result.Append(xs.Current);
    if (xs.MoveNext()) {
        result.Append(sep);
        return RecJoin(xs, sep, result);
    } else
        return result;
}
public static string Join(this IEnumerable<string> xs, string separator) {
    var i = xs.GetEnumerator();
    if (!i.MoveNext())
        return string.Empty;
    else
        return RecJoin(i, separator, new StringBuilder()).ToString();
}
join() function in Ruby:
def join(seq, sep)
  # + builds new strings; << would append to (mutate) seq's first element
  seq.inject { |total, item| total + sep + item } || ""
end
join(["a", "b", "c"], ", ")
# => "a, b, c"
join() in Perl:
use feature 'say';
use List::Util qw(reduce);
sub mjoin($@) {my $sep = shift; reduce {$a.$sep.$b} @_ or ''}
say mjoin(', ', qw(Alpha Beta Gamma));
# Alpha, Beta, Gamma
Or without reduce:
sub mjoin($@)
{
    my ($sep, $sum) = (shift, shift);
    $sum .= $sep.$_ for (@_);
    $sum or ''
}
Perl 6
sub join( $separator, @strings ){
    my $return = shift @strings;
    for @strings -> ( $string ){
        $return ~= $separator ~ $string;
    }
    return $return;
}
Yes I know it is pointless because Perl 6 already has a join function.
I wrote a recursive version of the solution in Lisp. If the length of the list is greater than 2, it splits the list in half as best it can and then merges the results for the sublists.
(defun concatenate-string (list)
  (cond ((null list) "")
        ((= (length list) 1) (car list))
        ((= (length list) 2) (concatenate 'string (first list) "," (second list)))
        (t (let ((mid-point (floor (/ (- (length list) 1) 2))))
             (concatenate 'string
                          (concatenate-string (subseq list 0 mid-point))
                          ","
                          (concatenate-string (subseq list mid-point (length list))))))))
(concatenate-string '("a" "b"))
I tried applying the divide-and-conquer strategy to the problem, but I guess it does not give a better result than plain iteration. Please let me know if this could have been done better.
I have also performed an analysis of the recursion obtained by the algorithm.
Use the String.join method in C#
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx
In Java 5, with unit test:
import org.junit.Assert;
import org.junit.Test;
public class StringUtil
{
    public static String join(String delim, String... strings)
    {
        StringBuilder builder = new StringBuilder();
        if (strings != null)
        {
            boolean first = true;
            for (String str : strings)
            {
                // Track the first element explicitly: checking builder.length()
                // would drop the delimiter after an empty first string.
                if (!first)
                {
                    builder.append(delim);
                }
                builder.append(str);
                first = false;
            }
        }
        return builder.toString();
    }
    @Test
    public void joinTest()
    {
        Assert.assertEquals("", StringUtil.join(", ", (String[]) null));
        Assert.assertEquals("", StringUtil.join(", ", ""));
        Assert.assertEquals("", StringUtil.join(", ", new String[0]));
        Assert.assertEquals("test", StringUtil.join(", ", "test"));
        Assert.assertEquals("foo, bar", StringUtil.join(", ", "foo", "bar"));
        Assert.assertEquals("foo, bar, baz", StringUtil.join(", ", "foo", "bar", "baz"));
        Assert.assertEquals(", foo", StringUtil.join(", ", "", "foo"));
    }
}
