I have wondered for some time what a nice, clean solution for joining an array of strings might look like.
Example: I have ["Alpha", "Beta", "Gamma"] and want to join the strings into one, separated by commas – "Alpha, Beta, Gamma".
Now I know that most programming languages offer some kind of join method for this. I just wonder how these might be implemented.
When I took introductory courses, I often tried to go it alone, but never found a satisfactory algorithm. Everything seemed rather messy; the problem is that you cannot just loop through the array concatenating the strings, as you would add one comma too many (either before the first string or after the last).
I don't want to check conditions in the loop, and I don't really want to add the first or the last string outside the loop (though I guess that may be the best way?).
Can someone show me an elegant solution? Or tell me exactly why there can’t be anything more elegant?
The most elegant solution I have found for problems like this is something like the following (in pseudocode):
separator = ""
foreach(item in stringCollection)
{
concatenatedString += separator + item
separator = ","
}
You just run the loop; the separator is only set at the end of the first pass, so it never gets added before the first item. It's not as clean as I'd like it to be, so I'd still add a comment, but it's better than an if statement or than adding the first or last item outside the loop.
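For what it's worth, here is the same trick as runnable Python (function and variable names are mine):

def join_strings(items, sep=","):
    # separator is empty on the first pass, so nothing is prepended
    # before the first item; from then on it holds the real separator.
    result = ""
    separator = ""
    for item in items:
        result += separator + item
        separator = sep
    return result

print(join_strings(["Alpha", "Beta", "Gamma"], ", "))  # Alpha, Beta, Gamma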
All of these solutions are decent, but for an underlying library, both independence from the separator and decent speed are important. Here is a function that fits that requirement, assuming the language has some form of string builder.
public static String join(String[] strings, String sep) {
    if (strings.length == 0) return "";
    if (strings.length == 1) return strings[0];
    StringBuilder sb = new StringBuilder();
    sb.append(strings[0]);
    for (int i = 1; i < strings.length; i++) {
        sb.append(sep);
        sb.append(strings[i]);
    }
    return sb.toString();
}
EDIT: I suppose I should mention why this would be speedier. The main reason is that any time you write c = a + b; the underlying construct is usually c = (new StringBuilder()).append(a).append(b).toString();. By reusing the same StringBuilder object, we can reduce the number of allocations and the amount of garbage we produce.
And before someone chimes in with "optimization is evil": we're talking about implementing a common library function, and acceptable, scalable performance is one of the requirements there. A join that takes a long time is one that won't get used often.
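To make the allocation argument concrete, here is a rough Python sketch of the shape of the cost (not the actual Java semantics): naive += concatenation may copy everything accumulated so far on each pass, while collecting the pieces and joining once copies each character roughly one time.

strings = ["Alpha", "Beta", "Gamma"]

# Potentially quadratic: each += may copy the whole accumulated string.
out = ""
for s in strings:
    out += s

# Linear: collect references, then copy each piece once at the end.
parts = []
for s in strings:
    parts.append(s)
out = "".join(parts)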
Most languages nowadays - e.g. Perl (mentioned by Jon Ericson), PHP, JavaScript - have a join() function or method, and this is by far the most elegant solution. Less code is better code.
In response to Mendelt Siebenga, if you do require a hand-rolled solution, I'd go with the ternary operator for something like:
separator = ","
foreach (item in stringCollection)
{
concatenatedString += concatenatedString ? separator + item : item
}
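The same accumulator test carries over to any language with a conditional expression; a quick Python sketch:

concatenated = ""
for item in ["Alpha", "Beta", "Gamma"]:
    # add a separator only if something has been accumulated already
    concatenated = concatenated + ", " + item if concatenated else item
print(concatenated)  # Alpha, Beta, Gamma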
I usually go with something like...
list = ["Alpha", "Beta", "Gamma"];
output = "";
separator = "";
for (int i = 0; i < list.length; i++) {
    output = output + separator;
    output = output + list[i];
    separator = ", ";
}
This works because on the first pass separator is empty (so you don't get a comma at the start), but on every subsequent pass you add a comma before adding the next element.
You could certainly unroll this a little to make it a bit faster (assigning to the separator over and over isn't ideal), though I suspect that's something the compiler could do for you automatically.
In the end, though, I suspect this is pretty much what most language-level join functions come down to. Nothing more than syntax sugar, but it sure is sweet.
For pure elegance, a typical recursive functional-language solution is quite nice. This isn't in an actual language syntax, but you get the idea (it's also hardcoded to use a comma separator):
join([]) = ""
join([x]) = "x"
join([x, rest]) = "x," + join(rest)
In reality you would write this in a more generic way, to reuse the same algorithm but abstract away the data type (doesn't have to be strings) and the operation (doesn't have to be concatenation with a comma in the middle). Then it usually gets called 'reduce', and many functional languages have this built in, e.g. multiplying all numbers in a list, in Lisp:
(reduce #'* '(1 2 3 4 5)) => 120
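Python's functools.reduce is the same idea, so the Lisp example translates directly:

from functools import reduce
import operator

print(reduce(operator.mul, [1, 2, 3, 4, 5]))  # => 120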
@Mendelt Siebenga
Strings are cornerstone objects in programming languages, and different languages implement strings differently. An implementation of join() depends strongly on the underlying implementation of strings, which pseudocode doesn't reflect.
Consider join() in Python. It can be easily used:
print ", ".join(["Alpha", "Beta", "Gamma"])
# Alpha, Beta, Gamma
It could easily be implemented as follows:
def join(seq, sep=" "):
    if not seq:
        return ""
    elif len(seq) == 1:
        return seq[0]
    # reduce is a builtin in Python 2; in Python 3 it lives in functools
    return reduce(lambda x, y: x + sep + y, seq)
print join(["Alpha", "Beta", "Gamma"], ", ")
# Alpha, Beta, Gamma
And here is how the join() method is implemented in C (taken from trunk):
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> string\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");

static PyObject *
string_join(PyStringObject *self, PyObject *orig)
{
    char *sep = PyString_AS_STRING(self);
    const Py_ssize_t seplen = PyString_GET_SIZE(self);
    PyObject *res = NULL;
    char *p;
    Py_ssize_t seqlen = 0;
    size_t sz = 0;
    Py_ssize_t i;
    PyObject *seq, *item;

    seq = PySequence_Fast(orig, "");
    if (seq == NULL) {
        return NULL;
    }

    seqlen = PySequence_Size(seq);
    if (seqlen == 0) {
        Py_DECREF(seq);
        return PyString_FromString("");
    }
    if (seqlen == 1) {
        item = PySequence_Fast_GET_ITEM(seq, 0);
        if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) {
            Py_INCREF(item);
            Py_DECREF(seq);
            return item;
        }
    }

    /* There are at least two things to join, or else we have a subclass
     * of the builtin types in the sequence.
     * Do a pre-pass to figure out the total amount of space we'll
     * need (sz), see whether any argument is absurd, and defer to
     * the Unicode join if appropriate.
     */
    for (i = 0; i < seqlen; i++) {
        const size_t old_sz = sz;
        item = PySequence_Fast_GET_ITEM(seq, i);
        if (!PyString_Check(item)) {
#ifdef Py_USING_UNICODE
            if (PyUnicode_Check(item)) {
                /* Defer to Unicode join.
                 * CAUTION: There's no guarantee that the
                 * original sequence can be iterated over
                 * again, so we must pass seq here.
                 */
                PyObject *result;
                result = PyUnicode_Join((PyObject *)self, seq);
                Py_DECREF(seq);
                return result;
            }
#endif
            PyErr_Format(PyExc_TypeError,
                         "sequence item %zd: expected string,"
                         " %.80s found",
                         i, Py_TYPE(item)->tp_name);
            Py_DECREF(seq);
            return NULL;
        }
        sz += PyString_GET_SIZE(item);
        if (i != 0)
            sz += seplen;
        if (sz < old_sz || sz > PY_SSIZE_T_MAX) {
            PyErr_SetString(PyExc_OverflowError,
                            "join() result is too long for a Python string");
            Py_DECREF(seq);
            return NULL;
        }
    }

    /* Allocate result space. */
    res = PyString_FromStringAndSize((char*)NULL, sz);
    if (res == NULL) {
        Py_DECREF(seq);
        return NULL;
    }

    /* Catenate everything. */
    p = PyString_AS_STRING(res);
    for (i = 0; i < seqlen; ++i) {
        size_t n;
        item = PySequence_Fast_GET_ITEM(seq, i);
        n = PyString_GET_SIZE(item);
        Py_MEMCPY(p, PyString_AS_STRING(item), n);
        p += n;
        if (i < seqlen - 1) {
            Py_MEMCPY(p, sep, seplen);
            p += seplen;
        }
    }

    Py_DECREF(seq);
    return res;
}
Note that the "Catenate everything." code above is only a small part of the whole function.
In pseudocode:
/* Catenate everything. */
for each item in sequence
    copy-assign item
    if not last item
        copy-assign separator
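The structural point worth copying is the two passes: measure, allocate once, then copy. Here is a sketch of that approach in Python using a preallocated bytearray (names are mine; it assumes ASCII strings so that character counts equal byte counts):

def join_prealloc(strings, sep):
    if not strings:
        return ""
    # Pass 1: compute the exact size of the result.
    total = sum(len(s) for s in strings) + len(sep) * (len(strings) - 1)
    buf = bytearray(total)  # allocate result space once
    # Pass 2: copy everything into place.
    pos = 0
    for i, s in enumerate(strings):
        if i > 0:
            buf[pos:pos + len(sep)] = sep.encode("ascii")
            pos += len(sep)
        buf[pos:pos + len(s)] = s.encode("ascii")
        pos += len(s)
    return buf.decode("ascii")

print(join_prealloc(["Alpha", "Beta", "Gamma"], ", "))  # Alpha, Beta, Gamma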
' Pseudocode; assume zero-based arrays
ResultString = InputArray[0]
n = 1
while n < Number_Of_Strings
    ResultString = ResultString & ", "
    ResultString = ResultString & InputArray[n]
    n = n + 1
loop
In Perl, I just use the join command:
$ echo "Alpha
Beta
Gamma" | perl -e 'print(join(", ", map {chomp; $_} <> ))'
Alpha, Beta, Gamma
(The map stuff is mostly there to create a list.)
In languages that don't have a built-in, like C, I use simple iteration (untested):
for (i = 0; i < N-1; i++) {
    strcat(s, a[i]);
    strcat(s, ", ");
}
strcat(s, a[N-1]);
Of course, you'd need to check the size of s before you add more bytes to it.
You either have to special-case the first entry or the last.
Collecting different-language implementations?
Here is, for your amusement, a Smalltalk version:
join:collectionOfStrings separatedBy:sep
    |buffer|

    buffer := WriteStream on:''.
    collectionOfStrings
        do:[:each | buffer nextPutAll:each ]
        separatedBy:[ buffer nextPutAll:sep ].
    ^ buffer contents.
Of course, the above code is already in the standard library found as:
Collection >> asStringWith:
so, using that, you'd write:
#('A' 'B' 'C') asStringWith:','
But here's my main point:
I would like to put more emphasis on the fact that using a StringBuilder (or what is called a WriteStream in Smalltalk) is highly recommended. Do not concatenate strings using "+" in a loop: the result will be many, many intermediate throw-away strings. If you have a good garbage collector, that's fine; but some are not so good, and a lot of memory needs to be reclaimed. StringBuilder (and WriteStream, which is its great-grandfather) use a buffer-doubling or even adaptive growth algorithm, which needs MUCH less scratch memory.
However, if it's only a few small strings you are concatenating, don't care and "+" them; the extra work of using a StringBuilder can actually be counterproductive, up to an implementation- and language-dependent number of strings.
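A toy sketch of the buffer-doubling idea in Python (real StringBuilder/WriteStream implementations are more careful, but the amortized-growth principle is the same):

class TinyBuilder:
    def __init__(self):
        self.buf = bytearray(16)  # small scratch buffer to start
        self.used = 0

    def append(self, s):
        data = s.encode("ascii")
        # Double the capacity until the new piece fits; doubling keeps
        # the total number of bytes ever copied linear (amortized).
        while self.used + len(data) > len(self.buf):
            self.buf.extend(bytes(len(self.buf)))
        self.buf[self.used:self.used + len(data)] = data
        self.used += len(data)
        return self

    def to_string(self):
        return self.buf[:self.used].decode("ascii")

print(TinyBuilder().append("Alpha").append(", ").append("Beta").to_string())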
The following is no longer language-agnostic (but that doesn't matter for the discussion, because the implementation is easily portable to other languages). I tried to implement Luke's (theoretically best) solution in an imperative programming language. Take your pick; mine's C#. Not very elegant at all. However, (without any testing whatsoever) I could imagine that its performance is quite decent, because the recursion is in fact tail recursive.
My challenge: give a better recursive implementation (in an imperative language). You say what “better” means: less code, faster, I'm open for suggestions.
private static StringBuilder RecJoin(IEnumerator<string> xs, string sep, StringBuilder result) {
    result.Append(xs.Current);
    if (xs.MoveNext()) {
        result.Append(sep);
        return RecJoin(xs, sep, result);
    } else
        return result;
}

public static string Join(this IEnumerable<string> xs, string separator) {
    var i = xs.GetEnumerator();
    if (!i.MoveNext())
        return string.Empty;
    else
        return RecJoin(i, separator, new StringBuilder()).ToString();
}
join() function in Ruby:
def join(seq, sep)
  # note: << appends in place, so this mutates seq's first element
  seq.inject { |total, item| total << sep << item } or ""
end
join(["a", "b", "c"], ", ")
# => "a, b, c"
join() in Perl:
use List::Util qw(reduce);
sub mjoin($@) { $sep = shift; reduce { $a . $sep . $b } @_ or '' }
say mjoin(', ', qw(Alpha Beta Gamma));
# Alpha, Beta, Gamma
Or without reduce:
sub mjoin($@)
{
    my ($sep, $sum) = (shift, shift);
    $sum .= $sep.$_ for (@_);
    $sum or ''
}
Perl 6
sub join( $separator, @strings ) {
    my $return = shift @strings;
    for @strings -> $string {
        $return ~= $separator ~ $string;
    }
    return $return;
}
Yes I know it is pointless because Perl 6 already has a join function.
I wrote a recursive version of the solution in Lisp. If the length of the list is greater than 2, it splits the list in half as best as it can and then tries merging the sublists:
(defun concatenate-string (list)
  (cond ((= (length list) 1) (car list))
        ((= (length list) 2)
         (concatenate 'string (first list) "," (second list)))
        (t (let ((mid-point (floor (/ (- (length list) 1) 2))))
             (concatenate 'string
                          (concatenate-string (subseq list 0 mid-point))
                          ","
                          (concatenate-string (subseq list mid-point (length list))))))))

(concatenate-string '("a" "b"))
I tried applying the divide-and-conquer strategy to the problem, but I guess it does not give a better result than plain iteration. Please let me know if this could have been done better.
I have also performed an analysis of the recursion produced by the algorithm; it is available here.
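For comparison, here is the same split translated to Python (my translation, comma still hardcoded). One reason it can't beat plain iteration with a builder: every level of the recursion copies the whole partial result, so total copying is roughly O(n log n) instead of O(n).

def concatenate_string(strings):
    # Divide and conquer: join each half recursively, then
    # concatenate the halves with one separator in between.
    if len(strings) == 1:
        return strings[0]
    if len(strings) == 2:
        return strings[0] + "," + strings[1]
    mid = (len(strings) - 1) // 2
    return concatenate_string(strings[:mid]) + "," + concatenate_string(strings[mid:])

print(concatenate_string(["a", "b", "c", "d"]))  # a,b,c,d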
Use the String.Join method in C#:
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx
In Java 5, with unit test:
import junit.framework.Assert;
import org.junit.Test;

public class StringUtil
{
    public static String join(String delim, String... strings)
    {
        StringBuilder builder = new StringBuilder();
        if (strings != null)
        {
            for (String str : strings)
            {
                if (builder.length() > 0)
                {
                    builder.append(delim);
                }
                builder.append(str);
            }
        }
        return builder.toString();
    }

    @Test
    public void joinTest()
    {
        Assert.assertEquals("", StringUtil.join(", ", null));
        Assert.assertEquals("", StringUtil.join(", ", ""));
        Assert.assertEquals("", StringUtil.join(", ", new String[0]));
        Assert.assertEquals("test", StringUtil.join(", ", "test"));
        Assert.assertEquals("foo, bar", StringUtil.join(", ", "foo", "bar"));
        Assert.assertEquals("foo, bar, baz", StringUtil.join(", ", "foo", "bar", "baz"));
    }
}
For example, the string "AAABBB" will have permutations:
"ABAABB",
"BBAABA",
"ABABAB",
etc.
What's a good algorithm for generating the permutations? (And what's its time complexity?)
For a multiset, you can solve recursively by position (JavaScript code):
function f(multiset, counters, result){
    if (counters.every(x => x === 0)){
        console.log(result);
        return;
    }
    for (var i = 0; i < counters.length; i++){
        if (counters[i] > 0){
            var _counters = counters.slice();
            _counters[i]--;
            f(multiset, _counters, result + multiset[i]);
        }
    }
}

f(['A','B'], [3,3], '');
This is not a full answer, just an idea.
If your strings have a fixed number of only two letters, I'd go with a binary tree and a good recursive function.
Each node is an object whose name is its parent's name plus an A or B suffix; furthermore, it holds the counts of A and B letters in that name.
The node constructor gets the parent's name and its A and B counts, so it only needs to add 1 to the count of A or B and append one letter to the name.
It doesn't construct the next node if there would be more than three A's (in the case of an A node) or B's respectively, or if the name's length would reach the length of the starting string.
Now you can collect the leaves of the two trees (their names) and you have all the permutations you need.
Scala or some functional language (with object-like features) would be perfect for implementing this algorithm. I hope this helps or at least sparks some ideas.
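Here is a rough Python sketch of that idea for the 3-A/3-B case (all names are mine, and the counts are hardcoded the way the description hardcodes them):

def leaves(name, a, b, max_a=3, max_b=3):
    # Each node's name is its parent's name plus an 'A' or a 'B';
    # a and b count how many of each letter have been used so far.
    if a == max_a and b == max_b:
        return [name]  # a leaf: one complete permutation
    result = []
    if a < max_a:
        result += leaves(name + "A", a + 1, b, max_a, max_b)
    if b < max_b:
        result += leaves(name + "B", a, b + 1, max_a, max_b)
    return result

print(len(leaves("", 0, 0)))  # 20 == 6! / (3! * 3!)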
Since you actually want to generate the permutations instead of just counting them, the best complexity you can hope for is O(size_of_output).
Here's a good solution in Java that meets that bound and runs very quickly while consuming negligible space. It first sorts the letters to find the lexicographically smallest permutation, and then generates all permutations in lexicographic order.
It's known as the Pandita algorithm: https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographic_order
import java.util.Arrays;
import java.util.function.Consumer;

public class UniquePermutations
{
    static void generateUniquePermutations(String s, Consumer<String> consumer)
    {
        char[] array = s.toCharArray();
        Arrays.sort(array);
        for (;;)
        {
            consumer.accept(String.valueOf(array));
            int changePos = array.length - 2;
            while (changePos >= 0 && array[changePos] >= array[changePos + 1])
                --changePos;
            if (changePos < 0)
                break; // all done
            int swapPos = changePos + 1;
            while (swapPos + 1 < array.length && array[swapPos + 1] > array[changePos])
                ++swapPos;
            char t = array[changePos];
            array[changePos] = array[swapPos];
            array[swapPos] = t;
            for (int i = changePos + 1, j = array.length - 1; i < j; ++i, --j)
            {
                t = array[i];
                array[i] = array[j];
                array[j] = t;
            }
        }
    }

    public static void main (String[] args) throws java.lang.Exception
    {
        StringBuilder line = new StringBuilder();
        generateUniquePermutations("banana", s -> {
            if (line.length() > 0)
            {
                if (line.length() + s.length() >= 75)
                {
                    System.out.println(line.toString());
                    line.setLength(0);
                }
                else
                    line.append(" ");
            }
            line.append(s);
        });
        System.out.println(line);
    }
}
Here is the output:
aaabnn aaanbn aaannb aabann aabnan aabnna aanabn aananb aanban aanbna
aannab aannba abaann abanan abanna abnaan abnana abnnaa anaabn anaanb
anaban anabna ananab ananba anbaan anbana anbnaa annaab annaba annbaa
baaann baanan baanna banaan banana bannaa bnaaan bnaana bnanaa bnnaaa
naaabn naaanb naaban naabna naanab naanba nabaan nabana nabnaa nanaab
nanaba nanbaa nbaaan nbaana nbanaa nbnaaa nnaaab nnaaba nnabaa nnbaaa
I'm working on LeetCode's "Move Zeroes" in Scala: https://leetcode.com/problems/move-zeroes/description/
Given an array nums, write a function to move all 0's to the end of it while maintaining the relative order of the non-zero elements. You must do this in-place without making a copy of the array.
I have a solution that works well in IntelliJ, but when I run it on LeetCode I get the same array back as the input. I'm also not sure whether it is done in place... Is something wrong with my code?
Thanks
def moveZeroes(nums: Array[Int]): Array[Int] = {
  val lengthOrig = nums.length
  val lengthFilter = nums.filter(_ != 0).length
  var numsWithoutZero = nums.filter(_ != 0)
  var numZero = lengthOrig - lengthFilter
  while (numZero > 0) {
    numsWithoutZero = numsWithoutZero :+ 0
    numZero = numZero - 1
  }
  numsWithoutZero
}
And one more thing: the template code given by LeetCode returns Unit, but mine returns Array.
def moveZeroes(nums: Array[Int]): Unit = {
}
While I agree with @ayush, LeetCode is explicitly asking you to use mutable state: you need to update the input array in place so that it contains the changes. Also, they ask you to do that in a minimal number of operations.
So, while it is not idiomatic Scala code, I suggest a solution along these lines:
def moveZeroes(nums: Array[Int]): Unit = {
  var i = 0
  var lastNonZeroFoundAt = 0
  while (i < nums.size) {
    if (nums(i) != 0) {
      nums(lastNonZeroFoundAt) = nums(i)
      lastNonZeroFoundAt += 1
    }
    i += 1
  }
  i = lastNonZeroFoundAt
  while (i < nums.size) {
    nums(i) = 0
    i += 1
  }
}
As this is non-idiomatic Scala, writing such code is discouraged and it is thus a little bit difficult to read. The C++ version that is shown in the solutions may actually be easier to read and help you understand my code above:
void moveZeroes(vector<int>& nums) {
    int lastNonZeroFoundAt = 0;
    // If the current element is not 0, then we need to
    // append it just in front of last non 0 element we found.
    for (int i = 0; i < nums.size(); i++) {
        if (nums[i] != 0) {
            nums[lastNonZeroFoundAt++] = nums[i];
        }
    }
    // After we have finished processing new elements,
    // all the non-zero elements are already at beginning of array.
    // We just need to fill remaining array with 0's.
    for (int i = lastNonZeroFoundAt; i < nums.size(); i++) {
        nums[i] = 0;
    }
}
Your answer gives a TLE (Time Limit Exceeded) error on LeetCode. I do not know what the exact criteria are for that to occur; however, I see a few things in your code that are not ideal.
Pure functional programming discourages the use of any mutable state and instead focuses on using val for everything.
I would try it this way:
def moveZeroes(nums: Array[Int]): Array[Int] = {
  val nonZero = nums.filter(_ != 0)
  val numZero = nums.length - nonZero.length
  val zeros = Array.fill(numZero){0}
  nonZero ++ zeros
}
P.S. This also gives TLE on LeetCode, but I guess it's still better in terms of being functional. Open for reviews, though.
I have an NDIS 6 filter driver, and I need to compare two NDIS_STRINGs in a case-insensitive manner at IRQL = DISPATCH_LEVEL. I know the RtlEqualUnicodeString function can compare strings case-insensitively, but it's only callable at PASSIVE_LEVEL.
So I had to write my own function using basic memory comparison. And I found that my function doesn't work correctly: some of my users complain that it returns FALSE when it's supposed to return TRUE. So there must be a bug in my code, but I haven't found it myself.
BOOLEAN
NPF_EqualAdapterName(
    PNDIS_STRING s1,
    PNDIS_STRING s2
    )
{
    int i;
    BOOLEAN bResult;
    TRACE_ENTER();

    if (s1->Length != s2->Length)
    {
        IF_LOUD(DbgPrint("NPF_EqualAdapterName: length not the same, s1->Length = %d, s2->Length = %d\n", s1->Length, s2->Length);)
        TRACE_EXIT();
        return FALSE;
    }

    for (i = 0; i < s2->Length / 2; i++)
    {
        if (s1->Buffer[i] >= L'A' && s1->Buffer[i] <= L'Z')
        {
            s1->Buffer[i] += (L'a' - L'A');
        }
        if (s2->Buffer[i] >= L'A' && s2->Buffer[i] <= L'Z')
        {
            s2->Buffer[i] += (L'a' - L'A');
        }
    }

    bResult = RtlEqualMemory(s1->Buffer, s2->Buffer, s2->Length);
    IF_LOUD(DbgPrint("NPF_EqualAdapterName: bResult = %d, s1 = %ws, s2 = %ws\n", i, bResult, s1->Buffer, s2->Buffer);)
    TRACE_EXIT();
    return bResult;
}
The entire code is here, if you want to see it: https://github.com/nmap/npcap/blob/master/packetWin7/npf/npf/Openclos.c
So my question is simply: are there any bugs in the above code? Thanks!
UPDATE:
The adapter names (e.g., s1 and s2) are GUIDs like {1CC605D7-B674-440B-9D58-3F68E0529B54}. They can be upper-case or lower-case, so they are definitely plain English characters.
One idea, using an index or key, would be to store the names in a GUID structure instead of strings. I noticed that Windows provides the RtlStringFromGUID and RtlGUIDFromString functions; however, these two functions also work only at IRQL = PASSIVE_LEVEL.
And much of my code runs at DISPATCH_LEVEL, so I'm afraid storing them as GUIDs is not a good idea.
I was recently dealing with a hash that I wanted to print in a nice manner.
To simplify, it is just an array with two fields, a["name"]="me" and a["age"]=77, and I want to print the data like key1=value1,key2=value2,... ending with a newline. That is:
name=me,age=77
Since it is not an array whose indices are auto-incremented values, I do not know how to loop through it and know when I am processing the last entry.
This matters because it would let me use a different separator for the last entry: a newline can be printed in that case instead of the comma that follows all the other fields.
I ended up using a counter to compare to the length of the array:
awk 'BEGIN {a["name"]="me"; a["age"]=77;
    n = length(a);
    for (i in a) {
        count++;
        printf "%s=%s%s", i, a[i], (count<n?",":ORS)
    }
}'
This works well. However, is there a better way to handle this? I don't like having to add the extra count++.
In general when you know the end point of the loop you put the OFS or ORS after each field:
for (i=1; i<=n; i++) {
    printf "%s%s", $i, (i<n?OFS:ORS)
}
but if you don't then you put the OFS before the second and subsequent fields and print the ORS after the loop:
for (idx in array) {
    printf "%s%s", (++i>1?OFS:""), array[idx]
}
print ""
I do like the:
n = length(array)
for (idx in array) {
    printf "%s%s", array[idx], (++i<n?OFS:ORS)
}
idea to get the end of the loop too, but length(array) is gawk-specific, and the resulting code isn't any more concise or efficient than the 2nd loop above:
$ cat tst.awk
BEGIN {
    OFS = ","
    array["name"] = "me"
    array["age"] = 77
    for (idx in array) {
        printf "%s%s=%s", (++i>1?OFS:""), idx, array[idx]
    }
    print ""
}
vs
$ cat tst.awk
BEGIN {
    OFS = ","
    array["name"] = "me"
    array["age"] = 77
    n = length(array) # or non-gawk: for (idx in array) n++
    for (idx in array) {
        printf "%s=%s%s", idx, array[idx], (++i<n?OFS:ORS)
    }
}
Relative Performance of Symbol#to_proc in Popular Ruby Implementations states that in MRI Ruby 1.8.7, Symbol#to_proc is slower than the alternative in their benchmark by 30% to 130%, but that this isn't the case in YARV Ruby 1.9.2.
Why is this the case? The creators of 1.8.7 didn't write Symbol#to_proc in pure Ruby.
Also, are there any gems that provide faster Symbol#to_proc performance for 1.8?
(Symbol#to_proc is starting to appear when I use ruby-prof, so I don't think I'm guilty of premature optimization)
The to_proc implementation in 1.8.7 looks like this (see object.c):
static VALUE
sym_to_proc(VALUE sym)
{
    return rb_proc_new(sym_call, (VALUE)SYM2ID(sym));
}
Whereas the 1.9.2 implementation (see string.c) looks like this:
static VALUE
sym_to_proc(VALUE sym)
{
    static VALUE sym_proc_cache = Qfalse;
    enum {SYM_PROC_CACHE_SIZE = 67};
    VALUE proc;
    long id, index;
    VALUE *aryp;

    if (!sym_proc_cache) {
        sym_proc_cache = rb_ary_tmp_new(SYM_PROC_CACHE_SIZE * 2);
        rb_gc_register_mark_object(sym_proc_cache);
        rb_ary_store(sym_proc_cache, SYM_PROC_CACHE_SIZE*2 - 1, Qnil);
    }

    id = SYM2ID(sym);
    index = (id % SYM_PROC_CACHE_SIZE) << 1;

    aryp = RARRAY_PTR(sym_proc_cache);
    if (aryp[index] == sym) {
        return aryp[index + 1];
    }
    else {
        proc = rb_proc_new(sym_call, (VALUE)id);
        aryp[index] = sym;
        aryp[index + 1] = proc;
        return proc;
    }
}
If you strip away all the busy work of initializing sym_proc_cache, then you're left with (more or less) this:
aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
    return aryp[index + 1];
}
else {
    proc = rb_proc_new(sym_call, (VALUE)id);
    aryp[index] = sym;
    aryp[index + 1] = proc;
    return proc;
}
So the real difference is the 1.9.2's to_proc caches the generated Procs while 1.8.7 generates a brand new one every single time you call to_proc. The performance difference between these two will be magnified by any benchmarking you do unless each iteration is done in a separate process; however, one iteration per-process would mask what you're trying to benchmark with the start-up cost.
The guts of rb_proc_new look pretty much the same (see eval.c for 1.8.7 or proc.c for 1.9.2) but 1.9.2 might benefit slightly from any performance improvements in rb_iterate. The caching is probably the big performance difference.
It is worth noting that the symbol-to-proc cache is a fixed size (67 entries; I'm not sure where 67 comes from, but it's probably related to the number of operators and other symbols commonly used for symbol-to-proc conversions):
id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
/* ... */
if (aryp[index] == sym) {
If you use more than 67 symbols as procs or if your symbol IDs overlap (mod 67) then you won't get the full benefit of the caching.
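The eviction behavior is easy to model. A small Python sketch of a fixed-size modulo cache in the same spirit (size and names are mine): two hot symbols whose IDs collide mod 67 will keep evicting each other, so each lookup falls back to building a fresh proc.

CACHE_SIZE = 67
cache = [None] * CACHE_SIZE  # each slot holds one (key, value) pair

def cached(key, build):
    index = hash(key) % CACHE_SIZE
    slot = cache[index]
    if slot is not None and slot[0] == key:
        return slot[1]               # hit: reuse the cached value
    value = build(key)               # miss or collision: rebuild...
    cache[index] = (key, value)      # ...and evict the previous occupant
    return value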
The Rails and 1.9 programming style involves a lot of shorthands like:
ints = strings.collect(&:to_i)
sum = ints.inject(0, &:+)
rather than the longer explicit block forms:
ints = strings.collect { |s| s.to_i }
sum = ints.inject(0) { |s,i| s += i }
Given that (popular) programming style, it makes sense to trade memory for speed by caching the lookup.
You're not likely to get a faster implementation from a gem as the gem would have to replace a chunk of the core Ruby functionality. You could patch the 1.9.2 caching into your 1.8.7 source though.
The following ordinary Ruby code:
if defined?(RUBY_ENGINE).nil? # No RUBY_ENGINE means it's MRI 1.8.7
  class Symbol
    alias_method :old_to_proc, :to_proc

    # Class variables are considered harmful, but I don't think
    # anyone will subclass Symbol
    @@proc_cache = {}

    def to_proc
      @@proc_cache[self] ||= old_to_proc
    end
  end
end
Will make Ruby MRI 1.8.7 Symbol#to_proc slightly less slow than before, but not as fast as an ordinary block or a pre-existing proc.
However, it'll make YARV, Rubinius and JRuby slower, hence the if around the monkeypatch.
The slowness of using Symbol#to_proc isn't solely due to MRI 1.8.7 creating a proc each time - even if you re-use an existing one, it's still slower than using a block.
Using Ruby 1.8 head
Size    Block    Pre-existing proc    New Symbol#to_proc    Old Symbol#to_proc
   0     0.36                 0.39                  0.62                  1.49
   1     0.50                 0.60                  0.87                  1.73
  10     1.65                 2.47                  2.76                  3.52
 100    13.28                21.12                 21.53                 22.29
For the full benchmark and code, see https://gist.github.com/1053502
In addition to not caching procs, 1.8.7 also creates (approximately) one array each time a proc is called. I suspect it's because the generated proc creates an array to accept the arguments - this happens even with an empty proc that takes no arguments.
Here's a script to demonstrate the 1.8.7 behavior. Only the :diff value is significant here, which shows the increase in array count.
# this should really be called count_arrays
def count_objects(&block)
  GC.disable
  ct1 = ct2 = 0
  ObjectSpace.each_object(Array) { ct1 += 1 }
  yield
  ObjectSpace.each_object(Array) { ct2 += 1 }
  {:count1 => ct1, :count2 => ct2, :diff => ct2-ct1}
ensure
  GC.enable
end

to_i = :to_i.to_proc
range = 1..1000

puts "map(&to_i)"
p count_objects {
  range.map(&to_i)
}

puts "map {|e| to_i[e] }"
p count_objects {
  range.map {|e| to_i[e] }
}

puts "map {|e| e.to_i }"
p count_objects {
  range.map {|e| e.to_i }
}
Sample output:
map(&to_i)
{:count1=>6, :count2=>1007, :diff=>1001}
map {|e| to_i[e] }
{:count1=>1008, :count2=>2009, :diff=>1001}
map {|e| e.to_i }
{:count1=>2009, :count2=>2010, :diff=>1}
It seems that merely calling a proc will create the array for every iteration, but a literal block only seems to create an array once.
But multi-arg blocks may still suffer from the problem:
plus = :+.to_proc

puts "inject(&plus)"
p count_objects {
  range.inject(&plus)
}

puts "inject{|sum, e| plus.call(sum, e) }"
p count_objects {
  range.inject{|sum, e| plus.call(sum, e) }
}

puts "inject{|sum, e| sum + e }"
p count_objects {
  range.inject{|sum, e| sum + e }
}
Sample output. Note how we incur a double penalty in case #2, because we use a multi-arg block, and also call the proc.
inject(&plus)
{:count1=>2010, :count2=>3009, :diff=>999}
inject{|sum, e| plus.call(sum, e) }
{:count1=>3009, :count2=>5007, :diff=>1998}
inject{|sum, e| sum + e }
{:count1=>5007, :count2=>6006, :diff=>999}