C fprintf %g correspondence in Scala/Java - c

I have to automatically compare text output from a C program and a conversion to Scala. Unfortunately, I cannot find a way to format double values the same way. In C I have for example:
double diff = -1.0; // just an example
fprintf(stdout, "diff %g\n", diff);
This is written as
diff -1
(no decimal point).
In Scala I have
val diff = -1.0; // just an example
println(f"diff $diff%g\n")
This is written as
diff -1.00000
In other words, the %g formatter doesn't do the same. I am looking at this document, but can't a way to have 100% compatible fprintf format.
Edit: Without using JNI or writing page long special methods.

Scala uses the the Java Formatter class documented here. To get what you want:
scala> "%.0g".format(-1.0)
res0: String = -1

Ok, so this is an identical question with a great answer by #john-douthat, which needs some tiny adjustments:
def ¬(x: Double) = {
val obj = x.toFloat.asInstanceOf[AnyRef]
String.format("%.6g", obj).replaceFirst("\\.?0+(e|$)", "$1")
}
val diff = -1.0
println(s"diff ${¬(diff)}")
println(s"Pi ${¬(math.Pi)}")
Output:
diff -1
Pi 3.14159
There are still some problems, though. For example -3.59847e-10 gets output as -3.59847e-1. This is bug in the regexp for replaceFirst. If someone can fix it better than this, I'd be very happy (regexps have a mental mismatch with me):
def ¬(x: Double) = {
val obj = x.toFloat.asInstanceOf[AnyRef]
val s0 = String.format("%.6g", obj)
val i0 = s0.lastIndexOf('e')
val i = if (i0 < 0) s0.length else i0
s0.substring(0, i).replaceFirst("\\.?0+($)", "$1") + s0.substring(i)
}

Check this
val diff = -1.0 //> diff : Double = -1.0
"diff %g\n".format(diff) //> res6: String = "diff -1.00000
//| "

Related

Scala - convert Array[String] to Array[Double]

I am a newbie to functional programming language and I am learning it in Scala for a University project.
This may seem simple, but I am unable to find enough help online for this or a straightforward way of doing this - how can I convert an Array[String] to Array[Double]? I have a CSV file which, when read into the REPL is interpreted as String values (each line of the file has a mix of integer and string values) which would return a type Array[String]. I want to encode the string values with a double/int values to return Array[Double] in order to make the array homogeneous. Is there a straightforward way of doing this? Any guidance will be much appreciated.
What I have done until now is:
def retrieveExamplesFromFile(fileName : String) : Array[Array[String]] = {
val items = for {
line <- Source.fromFile(fileName).getLines()
entries = line.split(",")
} yield entries
return items.toArray
}
The format of each line (returned as String[]) is so:
[[1.0, 2.0, item1], [5, 8.9, item2],....]
And to convert each line in the CSV file into double array, I only have a psuedo definition drafted so:
def generateNumbersForStringValues(values : Array[String]) : Array[Double] = {
val line = for(item <- values)
{
//correct way?
item.replace("item1", "1.0")
item.replace("item2", "1.0")
}
return //unable to typecast/convert
}
Any ideas are welcome. Thank you for your time.
You probably want to use map along with toDouble:
values.map(x => x.toDouble)
Or more concisely:
values.map(_.toDouble)
And for the fallback for non-double strings, you might consider using the Try monad (in scala.util):
values.map(x => Try(x.toDouble).getOrElse(1.0))
If you know what each line will look like, you could also do pattern matching:
values map {
case Array(a, b, c) => Array(a.toDouble, b.toDouble, 1.0)
}
Expanding on #DaunnC's comment, you can use the Try utility to do this and pattern match on the result so you can avoid calling get or wrapping your result in an Option:
import scala.util.{Try, Success, Failure}
def main = {
val maybeDoubles = Array("5", "1.0", "8.5", "10.0", "item1", "item2")
val convertDoubles = maybeDoubles.map { x =>
Try(x.toDouble)
}
val convertedArray = convertDoubles.map {
_ match {
case Success(res) => res
case Failure(f) => 1.0
}
}
convertedArray
}
This allows you to pattern match on the result of Try, which is always either a Success or Failure, without having to call get or otherwise wrap your results.
Here is some more information on Try courtesy of Mauricio Linhares: https://mauricio.github.io/2014/02/17/scala-either-try-and-the-m-word.html
You mean to convert all strings to double with a fallback to 1.0 for all inconvertible strings? That would be:
val x = Array(
Array("1.0", "2.0", "item1"),
Array("5", "8.9", "item2"))
x.map( _.map { y =>
try {
y.toDouble
} catch {
case _: NumberFormatException => 1.0
}
})
Scala 2.13 introduced String::toDoubleOption which used within a map transformation, can be associated with Option::getOrElse to safely cast Strings into Doubles:
Array("5", "1.0", ".", "8.5", "int").map(_.toDoubleOption.getOrElse(1d))
// Array[Double] = Array(5.0, 1.0, 1.0, 8.5, 1.0)

Ruby native C big integer segfault

I'm working on a Ruby native C method: power mod. Here's what I got:
#define TO_BIGNUM(x) (FIXNUM_P(x) ? rb_int2big(FIX2LONG(x)) : x)
#define CONST2BIGNUM(x) (TO_BIGNUM(INT2NUM(x)))
VALUE method_big_power_mod(VALUE self, VALUE base, VALUE exp, VALUE mod){
VALUE res = TO_BIGNUM(INT2NUM(1));
base = TO_BIGNUM(base);
exp = TO_BIGNUM(exp);
mod = TO_BIGNUM(mod);
while (rb_big_cmp(exp, CONST2BIGNUM(0))) {
if (rb_big_modulo(exp, CONST2BIGNUM(2))) {
VALUE mul = rb_big_mul(res, base);
res = rb_big_modulo(mul, mod);
}
base = rb_big_modulo(rb_big_pow(base, CONST2BIGNUM(2)), mod);
exp = rb_big_div(exp, CONST2BIGNUM(2));
}
return res;
}
It segfaults every time. I isolated the problem to rb_big_modulo calls. gdb stacktrace says that it crashes in the bigdivrem method after calling rb_big_modulo. I tried to look through the source of bignum.c, but I can't figure out what's causing the crash. Am I doing something wrong?
There are two problems that are causing the segfault:
1 - The functions rb_big_* sometimes doesn't return a Bignum object, but when you call then the first arg must be a Bignum object. For example:
if (rb_big_modulo(exp, CONST2BIGNUM(2))) {
VALUE mul = rb_big_mul(res, base); // This maybe return a Fixnum
res = rb_big_modulo(mul, mod); // This will cause a segfault :(
}
2 - The function rb_big_pow when you call it with both args Bignum, it will warn you and will return a Float object where you can't convert easily to a Bignum object. So, you should replace the line where you call it by:
VALUE x = TO_BIGNUM(rb_big_pow(base, INT2NUM(2))); // Power by a Fixnum instead a Bignum
base = TO_BIGNUM(rb_big_modulo(x , mod));
The final implementation will be:
#define TO_BIGNUM(x) (FIXNUM_P(x) ? rb_int2big(FIX2LONG(x)) : x)
#define CONST2BIGNUM(x) (TO_BIGNUM(INT2NUM(x)))
VALUE method_big_power_mod(VALUE self, VALUE base, VALUE exp, VALUE mod){
VALUE res = TO_BIGNUM(INT2NUM(1));
base = TO_BIGNUM(base);
exp = TO_BIGNUM(exp);
mod = TO_BIGNUM(mod);
while (rb_big_cmp(exp, CONST2BIGNUM(0))) {
if (rb_big_modulo(exp, CONST2BIGNUM(2))) {
VALUE mul = TO_BIGNUM(rb_big_mul(res, base));
res = TO_BIGNUM(rb_big_modulo(mul, mod));
}
VALUE x = TO_BIGNUM(rb_big_pow(base, INT2NUM(2)));
base = TO_BIGNUM(rb_big_modulo(x , mod));
exp = TO_BIGNUM(rb_big_div(exp, CONST2BIGNUM(2)));
}
return res;
}
I don't know the performance impact with all these conversions. Maybe, you should test when it is a Fixnum or a Bignumand calculate it using the proper function or benchmark both approaches.
When I ran it, I went thought an infinite loop, but I don't know if I call it with the correct values.

Convert a string in char to array in MATLAB

I have a list of strings in a char array:
'gvs(0.000000000000000e+000, 1.601985139535780e+002)'
'gvs(-5.000000000000000e-005, 1.365231866954370e+002)'
'gvs(-1.000000000000000e-004, 1.169431404340180e+002)'
'gvs(-5.000000000000000e-004, 3.187711314514890e+001)'
'gvs(-2.000000000000000e-004, 8.589930648472340e+001)'
Which I am trying to convert to an array of just the numbers (ignoring gvs, the comma and the brackets), but I can't quite work out what I'm doing wrong?
cols = length(Variables) + length(Parameters);
% currently unused
rows = length(Results);
for a = 1:rows;
Res(a,:) = sscanf ((Results{a,1}(1,:)),'%*s %f %f');
end
I've also tried textscan, but I can't get that to work right either
for a = 1:rows;
Res = cell (textscan ((Results{a,1}(1,:)),'%*s %f %f','Delimiter', {'(',' '},'MultipleDelimsAsOne',1));
end
Any help much appreciated!
Thanks
Replace
Res(a,:) = sscanf ((Results{a,1}(1,:)),'%*s %f %f');
with
Res(a,:) = sscanf ((Results{a,1}(1,:)),'gvs(%f, %f)');
Assuming you have a char array (not a cellstring):
s = ['gvs( 0.000000000000000e+000, 1.601985139535780e+002)'
'gvs(-5.000000000000000e-005, 1.365231866954370e+002)'
'gvs(-1.000000000000000e-004, 1.169431404340180e+002)'
'gvs(-5.000000000000000e-004, 3.187711314514890e+001)'
'gvs(-2.000000000000000e-004, 8.589930648472340e+001)']
Then you can simply textscan():
data = textscan(s','gvs(%f%f)','CollectOutput',1,'Delimiter',',');
data = data{1}
data =
0 160.1985
-0.0001 136.5232
-0.0001 116.9431
-0.0005 31.8771
-0.0002 85.8993
If s is a cellstring, then before calling textscan, convert to char():
s = char(s);
Considering that your string is almost in MATLAB-array compatible format, you could commit a sin and use evalc:
>> s = {
'gvs( 0.000000000000000e+000, 1.601985139535780e+002)'
'gvs(-5.000000000000000e-005, 1.365231866954370e+002)'
'gvs(-1.000000000000000e-004, 1.169431404340180e+002)'
'gvs(-5.000000000000000e-004, 3.187711314514890e+001)'
'gvs(-2.000000000000000e-004, 8.589930648472340e+001)'};
>> C = evalc(['[' regexprep([s{:}], {'gvs\(' '\)'}, {'' ';'}) ']'])
ans =
0 1.601985139535780e+002
-5.000000000000000e-005 1.365231866954370e+002
-1.000000000000000e-004 1.169431404340180e+002
-5.000000000000000e-004 3.187711314514890e+001
-2.000000000000000e-004 8.589930648472340e+001

How would I return a value from a function which iterates over a for loop in F#

I am trying loop over an array and return a value as shown below. But this gives me an error on the line after the if statement. It says "This expression was expected to have type unit but has type int"
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
for i = inputBits.Length - 1 to 0 do
if inputBits.[i] then
i
done
How would I do this? I am in the middle of recoding this with a recursive loop, as it seems to be the more accepted way of doing such loops in functional languages, but I still want to know what I was doing wrong above.
for loops are not supposed to return values, they only do an operation a fixed number of times then return () (unit). If you want to iterate and finally return something, you may :
have outside the loop a reference where you put the final result when you get it, then after the loop return the reference content
use a recursive function directly
use a higher-order function that will encapsulate the traversal for you, and let you concentrate on the application logic
The higher-function is nice if your data structure supports it. Simple traversal functions such as fold_left, however, don't support stopping the iteration prematurely. If you wish to support this (and clearly it would be interesting in your use case), you must use a traversal with premature exit support. For easy functions such as yours, a simple recursive function is probably the simplest.
In F# it should also be possible to write your function in imperative style, using yield to turn it into a generator, then finally forcing the generator to get the result. This could be seen as a counterpart of the OCaml technique of using an exception to jump out of the loop.
Edit: A nice solution to avoid the "premature stop" questions is to use a lazy intermediate data structure, which will only be built up to the first satisfying result. This is elegant and good scripting style, but still less efficient than direct exit support or simple recursion. I guess it depends on your needs; is this function to be used in a critical path?
Edit: following are some code sample. They're OCaml and the data structures are different (some of them use libraries from Batteries), but the ideas are the same.
(* using a reference as accumulator *)
let most_significant_bit input_bits =
let result = ref None in
for i = Array.length input_bits - 1 downto 0 do
if input_bits.(i) then
if !result = None then
result := Some i
done;
!result
let most_significant_bit input_bits =
let result = ref None in
for i = 0 to Array.length input_bits - 1 do
if input_bits.(i) then
(* only the last one will be kept *)
result := Some i
done;
!result
(* simple recursive version *)
let most_significant_bit input_bits =
let rec loop = function
| -1 -> None
| i ->
if input_bits.(i) then Some i
else loop (i - 1)
in
loop (Array.length input_bits - 1)
(* higher-order traversal *)
open Batteries_uni
let most_significant_bit input_bits =
Array.fold_lefti
(fun result i ->
if input_bits.(i) && result = None then Some i else result)
None input_bits
(* traversal using an intermediate lazy data structure
(a --- b) is the decreasing enumeration of integers in [b; a] *)
open Batteries_uni
let most_significant_bit input_bits =
(Array.length input_bits - 1) --- 0
|> Enum.Exceptionless.find (fun i -> input_bits.(i))
(* using an exception to break out of the loop; if I understand
correctly, exceptions are rather discouraged in F# for efficiency
reasons. I proposed to use `yield` instead and then force the
generator, but this has no direct OCaml equivalent. *)
exception Result of int
let most_significant_bit input_bits =
try
for i = Array.length input_bits - 1 downto 0 do
if input_bits.(i) then raise (Result i)
done;
None
with Result i -> Some i
Why using a loop when you can use high-order functions?
I would write:
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
Seq.cast<bool> inputBits |> Seq.tryFindIndex id
Seq module contains many functions for manipulating collections. It is often a good alternative to using imperative loops.
but I still want to know what I was
doing wrong above.
The body of a for loop is an expression of type unit. The only thing you can do from there is doing side-effects (modifying a mutable value, printing...).
In F#, a if then else is similar to ? : from C languages. The then and the else parts must have the same type, otherwise it doesn't make sense in a language with static typing. When the else is missing, the compiler assumes it is else (). Thus, the then must have type unit. Putting a value in a for loop doesn't mean return, because everything is a value in F# (including a if then).
+1 for gasche
Here are some examples in F#. I added one (the second) to show how yield works with for within a sequence expression, as gasche mentioned.
(* using a mutable variable as accumulator as per gasche's example *)
let findMostSignificantBitPosition (inputBits: BitArray) =
let mutable ret = None // 0
for i = inputBits.Length - 1 downto 0 do
if inputBits.[i] then ret <- i
ret
(* transforming to a Seq of integers with a for, then taking the first element *)
let findMostSignificantBitPosition2 (inputBits: BitArray) =
seq {
for i = 0 to inputBits.Length - 1 do
if inputBits.[i] then yield i
} |> Seq.head
(* casting to a sequence of bools then taking the index of the first "true" *)
let findMostSignificantBitPosition3 (inputBits: BitArray) =
inputBits|> Seq.cast<bool> |> Seq.findIndex(fun f -> f)
Edit: versions returning an Option
let findMostSignificantBitPosition (inputBits: BitArray) =
let mutable ret = None
for i = inputBits.Length - 1 downto 0 do
if inputBits.[i] then ret <- Some i
ret
let findMostSignificantBitPosition2 (inputBits: BitArray) =
seq {
for i = 0 to inputBits.Length - 1 do
if inputBits.[i] then yield Some(i)
else yield None
} |> Seq.tryPick id
let findMostSignificantBitPosition3 (inputBits: BitArray) =
inputBits|> Seq.cast<bool> |> Seq.tryFindIndex(fun f -> f)
I would recommend using a higher-order function (as mentioned by Laurent) or writing a recursive function explicitly (which is a general approach to replace loops in F#).
If you want to see some fancy F# solution (which is probably better version of using some temporary lazy data structure), then you can take a look at my article which defines imperative computation builder for F#. This allows you to write something like:
let findMostSignificantBitPosition (inputBits:BitArray) = imperative {
for b in Seq.cast<bool> inputBits do
if b then return true
return false }
There is some overhead (as with using other temporary lazy data structures), but it looks just like C# :-).
EDIT I also posted the samples on F# Snippets: http://fssnip.net/40
I think the reason your having issues with how to write this code is that you're not handling the failure case of not finding a set bit. Others have posted many ways of finding the bit. Here are a few ways of handling the failure case.
failure case by Option
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
let rec loop i =
if i = -1 then
None
elif inputBits.[i] then
Some i
else
loop (i - 1)
loop (inputBits.Length - 1)
let test = new BitArray(1)
match findMostSignificantBitPosition test with
| Some i -> printf "Most Significant Bit: %i" i
| None -> printf "Most Significant Bit Not Found"
failure case by Exception
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
let rec loop i =
if i = -1 then
failwith "Most Significant Bit Not Found"
elif inputBits.[i] then
i
else
loop (i - 1)
loop (inputBits.Length - 1)
let test = new BitArray(1)
try
let i = findMostSignificantBitPosition test
printf "Most Significant Bit: %i" i
with
| Failure msg -> printf "%s" msg
failure case by -1
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
let rec loop i =
if i = -1 then
i
elif inputBits.[i] then
i
else
loop (i - 1)
loop (inputBits.Length - 1)
let test = new BitArray(1)
let i = findMostSignificantBitPosition test
if i <> -1 then
printf "Most Significant Bit: %i" i
else
printf "Most Significant Bit Not Found"
One of the options is to use seq and findIndex method as:
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
seq {
for i = inputBits.Length - 1 to 0 do
yield inputBits.[i]
} |> Seq.findIndex(fun e -> e)

Algorithm for joining e.g. an array of strings

I have wondered for some time, what a nice, clean solution for joining an array of strings might look like.
Example: I have ["Alpha", "Beta", "Gamma"] and want to join the strings into one, separated by commas – "Alpha, Beta, Gamma".
Now I know that most programming languages offer some kind of join method for this. I just wonder how these might be implemented.
When I took introductory courses, I often tried to go it alone, but never found a satisfactory algorithm. Everything seemed rather messy, the problem being that you can not just loop through the array, concatenating the strings, as you would add one too many commas (either before or after the last string).
I don’t want to check conditions in the loop. I don’t really want to add the first or the last string before/after the loop (I guess this is maybe the best way?).
Can someone show me an elegant solution? Or tell me exactly why there can’t be anything more elegant?
The most elegant solution i found for problems like this is something like this (in pseudocode)
separator = ""
foreach(item in stringCollection)
{
concatenatedString += separator + item
separator = ","
}
You just run the loop and only after the second time around the separator is set. So the first time it won't get added. It's not as clean as I'd like it to be so I'd still add comments but it's better than an if statement or adding the first or last item outside the loop.
All of these solutions are decent ones, but for an underlying library, both independence of separator and decent speed are important. Here is a function that fits the requirement assuming the language has some form of string builder.
public static string join(String[] strings, String sep) {
if(strings.length == 0) return "";
if(strings.length == 1) return strings[0];
StringBuilder sb = new StringBuilder();
sb.append(strings[0]);
for(int i = 1; i < strings.length; i++) {
sb.append(sep);
sb.append(strings[i]);
}
return sb.toString();
}
EDIT: I suppose I should mention why this would be speedier. The main reason would be because any time you call c = a + b; the underlying construct is usually c = (new StringBuilder()).append(a).append(b).toString();. By reusing the same string builder object, we can reduce the amount of allocations and garbage we produce.
And before someone chimes in with optimization is evil, we're talking about implementing a common library function. Acceptable, scalable performance is one of the requirements them. A join that takes a long time is one that's going to be not oft used.
Most languages nowadays - e.g. perl (mention by Jon Ericson), php, javascript - have a join() function or method, and this is by far the most elegant solution. Less code is better code.
In response to Mendelt Siebenga, if you do require a hand-rolled solution, I'd go with the ternary operator for something like:
separator = ","
foreach (item in stringCollection)
{
concatenatedString += concatenatedString ? separator + item : item
}
I usually go with something like...
list = ["Alpha", "Beta", "Gamma"];
output = "";
separator = "";
for (int i = 0; i < list.length ; i++) {
output = output + separator;
output = output + list[i];
separator = ", ";
}
This works because on the first pass, separator is empty (so you don't get a comma at the start, but on every subsequent pass, you add a comma before adding the next element.
You could certainly unroll this a little to make it a bit faster (assigning to the separator over and over isn't ideal), though I suspect that's something the compiler could do for you automatically.
In the end though, I suspect pretty this is what most language level join functions come down to. Nothing more than syntax sugar, but it sure is sweet.
For pure elegance, a typical recursive functional-language solution is quite nice. This isn't in an actual language syntax but you get the idea (it's also hardcoded to use comma separator):
join([]) = ""
join([x]) = "x"
join([x, rest]) = "x," + join(rest)
In reality you would write this in a more generic way, to reuse the same algorithm but abstract away the data type (doesn't have to be strings) and the operation (doesn't have to be concatenation with a comma in the middle). Then it usually gets called 'reduce', and many functional languages have this built in, e.g. multiplying all numbers in a list, in Lisp:
(reduce #'* '(1 2 3 4 5)) => 120
#Mendelt Siebenga
Strings are corner-stone objects in programming languages. Different languages implement strings differently. An implementation of join() strongly depends on underlying implementation of strings. Pseudocode doesn't reflect underlying implementation.
Consider join() in Python. It can be easily used:
print ", ".join(["Alpha", "Beta", "Gamma"])
# Alpha, Beta, Gamma
It could be easily implemented as follow:
def join(seq, sep=" "):
if not seq: return ""
elif len(seq) == 1: return seq[0]
return reduce(lambda x, y: x + sep + y, seq)
print join(["Alpha", "Beta", "Gamma"], ", ")
# Alpha, Beta, Gamma
And here how join() method is implemented in C (taken from trunk):
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> string\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");
static PyObject *
string_join(PyStringObject *self, PyObject *orig)
{
char *sep = PyString_AS_STRING(self);
const Py_ssize_t seplen = PyString_GET_SIZE(self);
PyObject *res = NULL;
char *p;
Py_ssize_t seqlen = 0;
size_t sz = 0;
Py_ssize_t i;
PyObject *seq, *item;
seq = PySequence_Fast(orig, "");
if (seq == NULL) {
return NULL;
}
seqlen = PySequence_Size(seq);
if (seqlen == 0) {
Py_DECREF(seq);
return PyString_FromString("");
}
if (seqlen == 1) {
item = PySequence_Fast_GET_ITEM(seq, 0);
if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) {
Py_INCREF(item);
Py_DECREF(seq);
return item;
}
}
/* There are at least two things to join, or else we have a subclass
* of the builtin types in the sequence.
* Do a pre-pass to figure out the total amount of space we'll
* need (sz), see whether any argument is absurd, and defer to
* the Unicode join if appropriate.
*/
for (i = 0; i < seqlen; i++) {
const size_t old_sz = sz;
item = PySequence_Fast_GET_ITEM(seq, i);
if (!PyString_Check(item)){
#ifdef Py_USING_UNICODE
if (PyUnicode_Check(item)) {
/* Defer to Unicode join.
* CAUTION: There's no gurantee that the
* original sequence can be iterated over
* again, so we must pass seq here.
*/
PyObject *result;
result = PyUnicode_Join((PyObject *)self, seq);
Py_DECREF(seq);
return result;
}
#endif
PyErr_Format(PyExc_TypeError,
"sequence item %zd: expected string,"
" %.80s found",
i, Py_TYPE(item)->tp_name);
Py_DECREF(seq);
return NULL;
}
sz += PyString_GET_SIZE(item);
if (i != 0)
sz += seplen;
if (sz < old_sz || sz > PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"join() result is too long for a Python string");
Py_DECREF(seq);
return NULL;
}
}
/* Allocate result space. */
res = PyString_FromStringAndSize((char*)NULL, sz);
if (res == NULL) {
Py_DECREF(seq);
return NULL;
}
/* Catenate everything. */
p = PyString_AS_STRING(res);
for (i = 0; i < seqlen; ++i) {
size_t n;
item = PySequence_Fast_GET_ITEM(seq, i);
n = PyString_GET_SIZE(item);
Py_MEMCPY(p, PyString_AS_STRING(item), n);
p += n;
if (i < seqlen - 1) {
Py_MEMCPY(p, sep, seplen);
p += seplen;
}
}
Py_DECREF(seq);
return res;
}
Note that the above Catenate everything. code is a small part of the whole function.
In pseudocode:
/* Catenate everything. */
for each item in sequence
copy-assign item
if not last item
copy-assign separator
' Pseudo code Assume zero based
ResultString = InputArray[0]
n = 1
while n (is less than) Number_Of_Strings
ResultString (concatenate) ", "
ResultString (concatenate) InputArray[n]
n = n + 1
loop
In Perl, I just use the join command:
$ echo "Alpha
Beta
Gamma" | perl -e 'print(join(", ", map {chomp; $_} <> ))'
Alpha, Beta, Gamma
(The map stuff is mostly there to create a list.)
In languages that don't have a built in, like C, I use simple iteration (untested):
for (i = 0; i < N-1; i++){
strcat(s, a[i]);
strcat(s, ", ");
}
strcat(s, a[N]);
Of course, you'd need to check the size of s before you add more bytes to it.
You either have to special case the first entry or the last.
collecting different language implementations ?
Here is, for your amusement, a Smalltalk version:
join:collectionOfStrings separatedBy:sep
|buffer|
buffer := WriteStream on:''.
collectionOfStrings
do:[:each | buffer nextPutAll:each ]
separatedBy:[ buffer nextPutAll:sep ].
^ buffer contents.
Of course, the above code is already in the standard library found as:
Collection >> asStringWith:
so, using that, you'd write:
#('A' 'B' 'C') asStringWith:','
But here's my main point:
I would like to put more emphasis on the fact that using a StringBuilder (or what is called "WriteStream" in Smalltalk) is highly recommended. Do not concatenate strings using "+" in a loop - the result will be many many intermediate throw-away strings. If you have a good Garbage Collector, thats fine. But some are not and a lot of memory needs to be reclaimed. StringBuilder (and WriteStream, which is its grand-grand-father) use a buffer-doubling or even adaptive growing algorithm, which needs MUCH less scratch memory.
However, if its only a few small strings you are concatenating, dont care, and "+" them; the extra work using a StringBuilder might be actually counter-productive, up to an implementation- and language-dependent number of strings.
The following is no longer language-agnostic (but that doesn't matter for the discussion because the implementation is easily portable to other languages). I tried to implement Luke's (theretically best) solution in an imperative programming language. Take your pick; mine's C#. Not very elegant at all. However, (without any testing whatsoever) I could imagine that its performance is quite decent because the recursion is in fact tail recursive.
My challenge: give a better recursive implementation (in an imperative language). You say what “better” means: less code, faster, I'm open for suggestions.
private static StringBuilder RecJoin(IEnumerator<string> xs, string sep, StringBuilder result) {
result.Append(xs.Current);
if (xs.MoveNext()) {
result.Append(sep);
return RecJoin(xs, sep, result);
} else
return result;
}
public static string Join(this IEnumerable<string> xs, string separator) {
var i = xs.GetEnumerator();
if (!i.MoveNext())
return string.Empty;
else
return RecJoin(i, separator, new StringBuilder()).ToString();
}
join() function in Ruby:
def join(seq, sep)
seq.inject { |total, item| total << sep << item } or ""
end
join(["a", "b", "c"], ", ")
# => "a, b, c"
join() in Perl:
use List::Util qw(reduce);
sub mjoin($#) {$sep = shift; reduce {$a.$sep.$b} #_ or ''}
say mjoin(', ', qw(Alpha Beta Gamma));
# Alpha, Beta, Gamma
Or without reduce:
sub mjoin($#)
{
my ($sep, $sum) = (shift, shift);
$sum .= $sep.$_ for (#_);
$sum or ''
}
Perl 6
sub join( $separator, #strings ){
my $return = shift #strings;
for #strings -> ( $string ){
$return ~= $separator ~ $string;
}
return $return;
}
Yes I know it is pointless because Perl 6 already has a join function.
I wrote a recursive version of the solution in lisp. If the length of the list is greater that 2 it splits the list in half as best as it can and then tries merging the sublists
(defun concatenate-string(list)
(cond ((= (length list) 1) (car list))
((= (length list) 2) (concatenate 'string (first list) "," (second list)))
(t (let ((mid-point (floor (/ (- (length list) 1) 2))))
(concatenate 'string
(concatenate-string (subseq list 0 mid-point))
","
(concatenate-string (subseq list mid-point (length list))))))))
(concatenate-string '("a" "b"))
I tried applying the divide and conquer strategy to the problem, but I guess that does not give a better result than plain iteration. Please let me know if this could have been done better.
I have also performed an analysis of the recursion obtained by the algorithm, it is available here.
Use the String.join method in C#
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx
In Java 5, with unit test:
import junit.framework.Assert;
import org.junit.Test;
public class StringUtil
{
public static String join(String delim, String... strings)
{
StringBuilder builder = new StringBuilder();
if (strings != null)
{
for (String str : strings)
{
if (builder.length() > 0)
{
builder.append(delim);
}
builder.append(str);
}
}
return builder.toString();
}
#Test
public void joinTest()
{
Assert.assertEquals("", StringUtil.join(", ", null));
Assert.assertEquals("", StringUtil.join(", ", ""));
Assert.assertEquals("", StringUtil.join(", ", new String[0]));
Assert.assertEquals("test", StringUtil.join(", ", "test"));
Assert.assertEquals("foo, bar", StringUtil.join(", ", "foo", "bar"));
Assert.assertEquals("foo, bar, baz", StringUtil.join(", ", "foo", "bar", "baz"));
}
}

Resources