Why is Symbol#to_proc slower in Ruby 1.8.7? - c

Relative Performance of Symbol#to_proc in Popular Ruby Implementations states that in MRI Ruby 1.8.7, Symbol#to_proc is slower than the alternative in their benchmark by 30% to 130%, but that this isn't the case in YARV Ruby 1.9.2.
Why is this the case? The creators of 1.8.7 didn't write Symbol#to_proc in pure Ruby.
Also, are there any gems that provide faster Symbol#to_proc performance for 1.8?
(Symbol#to_proc is starting to appear when I use ruby-prof, so I don't think I'm guilty of premature optimization)

The to_proc implementation in 1.8.7 looks like this (see object.c):
static VALUE
sym_to_proc(VALUE sym)
{
return rb_proc_new(sym_call, (VALUE)SYM2ID(sym));
}
Whereas the 1.9.2 implementation (see string.c) looks like this:
static VALUE
sym_to_proc(VALUE sym)
{
static VALUE sym_proc_cache = Qfalse;
enum {SYM_PROC_CACHE_SIZE = 67};
VALUE proc;
long id, index;
VALUE *aryp;
if (!sym_proc_cache) {
sym_proc_cache = rb_ary_tmp_new(SYM_PROC_CACHE_SIZE * 2);
rb_gc_register_mark_object(sym_proc_cache);
rb_ary_store(sym_proc_cache, SYM_PROC_CACHE_SIZE*2 - 1, Qnil);
}
id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
return aryp[index + 1];
}
else {
proc = rb_proc_new(sym_call, (VALUE)id);
aryp[index] = sym;
aryp[index + 1] = proc;
return proc;
}
}
If you strip away all the busy work of initializing sym_proc_cache, then you're left with (more or less) this:
aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
return aryp[index + 1];
}
else {
proc = rb_proc_new(sym_call, (VALUE)id);
aryp[index] = sym;
aryp[index + 1] = proc;
return proc;
}
So the real difference is that 1.9.2's to_proc caches the generated Procs, while 1.8.7 generates a brand new one every single time you call to_proc. The performance difference between these two will be magnified by any benchmarking you do unless each iteration is done in a separate process; however, one iteration per process would mask what you're trying to benchmark with the start-up cost.
The guts of rb_proc_new look pretty much the same (see eval.c for 1.8.7 or proc.c for 1.9.2) but 1.9.2 might benefit slightly from any performance improvements in rb_iterate. The caching is probably the big performance difference.
It is worth noting that the symbol-to-proc cache is a fixed size (67 entries; I'm not sure where 67 comes from, but it is probably related to the number of operators and such that are commonly used for symbol-to-proc conversions):
id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
/* ... */
if (aryp[index] == sym) {
If you use more than 67 symbols as procs or if your symbol IDs overlap (mod 67) then you won't get the full benefit of the caching.
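To make the collision point concrete, here is a minimal Python sketch of a direct-mapped cache with 67 slots. It is only loosely modelled on the C snippet above: the `ProcCache` class, the plain integer IDs standing in for symbol IDs, and the `object()` standing in for a proc are all assumptions made for illustration.

```python
# Sketch of a direct-mapped cache with 67 slots, loosely modelled on the
# 1.9.2 snippet above. Integer IDs stand in for Ruby symbol IDs (assumption).
CACHE_SIZE = 67


class ProcCache:
    def __init__(self):
        # One (key, value) slot per residue class mod CACHE_SIZE.
        self.slots = [None] * CACHE_SIZE
        self.created = 0  # how many stand-in "procs" were built

    def fetch(self, sym_id):
        index = sym_id % CACHE_SIZE
        slot = self.slots[index]
        if slot is not None and slot[0] == sym_id:
            return slot[1]  # hit: reuse the cached object
        self.created += 1  # miss: build a new object, evicting the old one
        proc = object()
        self.slots[index] = (sym_id, proc)
        return proc


cache = ProcCache()
# IDs 1 and 2 land in different slots, so repeated lookups hit the cache.
for _ in range(100):
    cache.fetch(1)
    cache.fetch(2)
no_collision = cache.created

cache = ProcCache()
# IDs 1 and 68 share a slot (68 % 67 == 1), so they keep evicting each other.
for _ in range(100):
    cache.fetch(1)
    cache.fetch(68)
with_collision = cache.created

print(no_collision, with_collision)
```

In this toy model, two IDs that collide never get a cache hit when used alternately, which is the worst case the paragraph above warns about.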
The Rails and 1.9 programming style involves a lot of shorthands like:
ints = strings.collect(&:to_i)
sum = ints.inject(0, &:+)
rather than the longer explicit block forms:
ints = strings.collect { |s| s.to_i }
sum = ints.inject(0) { |s,i| s += i }
Given that (popular) programming style, it makes sense to trade memory for speed by caching the lookup.
You're not likely to get a faster implementation from a gem as the gem would have to replace a chunk of the core Ruby functionality. You could patch the 1.9.2 caching into your 1.8.7 source though.

The following ordinary Ruby code:
if defined?(RUBY_ENGINE).nil? # No RUBY_ENGINE means it's MRI 1.8.7
class Symbol
alias_method :old_to_proc, :to_proc
# Class variables are considered harmful, but I don't think
# anyone will subclass Symbol
@@proc_cache = {}
def to_proc
@@proc_cache[self] ||= old_to_proc
end
end
end
Will make Ruby MRI 1.8.7 Symbol#to_proc slightly less slow than before, but not as fast as an ordinary block or a pre-existing proc.
However, it'll make YARV, Rubinius and JRuby slower, hence the if around the monkeypatch.
The slowness of using Symbol#to_proc isn't solely due to MRI 1.8.7 creating a proc each time - even if you re-use an existing one, it's still slower than using a block.
Using Ruby 1.8 head

Size   Block   Pre-existing proc   New Symbol#to_proc   Old Symbol#to_proc
0       0.36                0.39                 0.62                 1.49
1       0.50                0.60                 0.87                 1.73
10      1.65                2.47                 2.76                 3.52
100    13.28               21.12                21.53                22.29
For the full benchmark and code, see https://gist.github.com/1053502

In addition to not caching procs, 1.8.7 also creates (approximately) one array each time a proc is called. I suspect it's because the generated proc creates an array to accept the arguments - this happens even with an empty proc that takes no arguments.
Here's a script to demonstrate the 1.8.7 behavior. Only the :diff value is significant here, which shows the increase in array count.
# this should really be called count_arrays
def count_objects(&block)
GC.disable
ct1 = ct2 = 0
ObjectSpace.each_object(Array) { ct1 += 1 }
yield
ObjectSpace.each_object(Array) { ct2 += 1 }
{:count1 => ct1, :count2 => ct2, :diff => ct2-ct1}
ensure
GC.enable
end
to_i = :to_i.to_proc
range = 1..1000
puts "map(&to_i)"
p count_objects {
range.map(&to_i)
}
puts "map {|e| to_i[e] }"
p count_objects {
range.map {|e| to_i[e] }
}
puts "map {|e| e.to_i }"
p count_objects {
range.map {|e| e.to_i }
}
Sample output:
map(&to_i)
{:count1=>6, :count2=>1007, :diff=>1001}
map {|e| to_i[e] }
{:count1=>1008, :count2=>2009, :diff=>1001}
map {|e| e.to_i }
{:count1=>2009, :count2=>2010, :diff=>1}
It seems that merely calling a proc will create the array for every iteration, but a literal block only seems to create an array once.
But multi-arg blocks may still suffer from the problem:
plus = :+.to_proc
puts "inject(&plus)"
p count_objects {
range.inject(&plus)
}
puts "inject{|sum, e| plus.call(sum, e) }"
p count_objects {
range.inject{|sum, e| plus.call(sum, e) }
}
puts "inject{|sum, e| sum + e }"
p count_objects {
range.inject{|sum, e| sum + e }
}
Sample output. Note how we incur a double penalty in case #2, because we use a multi-arg block, and also call the proc.
inject(&plus)
{:count1=>2010, :count2=>3009, :diff=>999}
inject{|sum, e| plus.call(sum, e) }
{:count1=>3009, :count2=>5007, :diff=>1998}
inject{|sum, e| sum + e }
{:count1=>5007, :count2=>6006, :diff=>999}

Related

Integer vs Boolean array Swift Performance

I tried executing Sieve Of Eratosthenes algorithm using a large Integer array and a large Bool array.
The integer version seems to execute MUCH faster than the boolean one. What is the possible reason for this?
import Foundation
var n : Int = 100000000;
var prime = [Bool](repeating: true, count: n+1)
var p = 2
let start = DispatchTime.now()
while((p*p)<=n)
{
if(prime[p] == true)
{
var i = p*2
while (i<=n)
{
prime[i] = false
i = i + p
}
}
p = p+1
}
let stop = DispatchTime.now()
let time = (Double)(stop.uptimeNanoseconds - start.uptimeNanoseconds) / 1000000.0
print("Time = \(time) ms")
Boolean array execution time : 78223.342295 ms
import Foundation
var n : Int = 100000000;
var prime = [Int](repeating: 1, count: n+1)
var p = 2
let start = DispatchTime.now()
while((p*p)<=n)
{
if(prime[p] == 1)
{
var i = p*2
while (i<=n)
{
prime[i] = 0
i = i + p
}
}
p = p+1
}
let stop = DispatchTime.now()
let time = (Double)(stop.uptimeNanoseconds - start.uptimeNanoseconds) / 1000000.0
print("Time = \(time) ms")
Integer array execution time : 8535.54546 ms
TL, DR:
Do not attempt to optimize your code in a Debug build. Always run it through the Profiler. Int was faster than Bool in Debug, but the opposite was true when run through the Profiler.
Heap allocation is expensive. Use your memory judiciously. (This question discusses the complications in C, but also applicable to Swift)
Long answer
First, let's refactor your code for easier execution:
func useBoolArray(n: Int) {
var prime = [Bool](repeating: true, count: n+1)
var p = 2
while((p*p)<=n)
{
if(prime[p] == true)
{
var i = p*2
while (i<=n)
{
prime[i] = false
i = i + p
}
}
p = p+1
}
}
func useIntArray(n: Int) {
var prime = [Int](repeating: 1, count: n+1)
var p = 2
while((p*p)<=n)
{
if(prime[p] == 1)
{
var i = p*2
while (i<=n)
{
prime[i] = 0
i = i + p
}
}
p = p+1
}
}
Now, run it in the Debug build:
let count = 100_000_000
let start = DispatchTime.now()
useBoolArray(n: count)
let boolStop = DispatchTime.now()
useIntArray(n: count)
let intStop = DispatchTime.now()
print("Bool array:", Double(boolStop.uptimeNanoseconds - start.uptimeNanoseconds) / Double(NSEC_PER_SEC))
print("Int array:", Double(intStop.uptimeNanoseconds - boolStop.uptimeNanoseconds) / Double(NSEC_PER_SEC))
// Bool array: 70.097249517
// Int array: 8.439799614
So Bool is a lot slower than Int right? Let's run it through the Profiler by pressing Cmd + I and choose the Time Profile template. (Somehow the Profiler wasn't able to separate these functions, probably because they were inlined so I had to run only 1 function per attempt):
let count = 100_000_000
useBoolArray(n: count)
// useIntArray(n: count)
// Bool: 1.15ms
// Int: 2.36ms
Not only are they an order of magnitude faster than in Debug, but the results are reversed: Bool is now faster than Int! The Profiler doesn't tell us why, so we must go on a witch hunt. Let's check the memory allocation by adding an Allocation instrument:
Ha! Now the differences are laid bare. The Bool array uses only one-eighth as much memory as the Int array. Swift array uses the same internals as NSArray so it's allocated on the heap and heap allocation is slow.
When you think even more about it: a Bool value only takes up 1 bit, while an Int takes 64 bits on a 64-bit machine. Swift may have chosen to represent a Bool with a single byte, while an Int takes 8 bytes, hence the memory ratio. In Debug, this difference may have caused all the difference, as the runtime must do all kinds of checks to ensure that it's actually dealing with a Bool value, so the Bool array method takes significantly longer.
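The memory ratio can be checked with simple arithmetic. The sketch below is Python, not Swift, and assumes (as the answer does) that each Bool element occupies one byte and each Int element eight bytes; the `array` type codes are used only to obtain those two element sizes.

```python
from array import array

n = 100_000_000

# 'b' is a signed char (1 byte); 'q' is a signed 64-bit integer (8 bytes).
byte_item = array("b").itemsize
int64_item = array("q").itemsize

# Both sieves allocate n + 1 elements.
bool_bytes = (n + 1) * byte_item
int_bytes = (n + 1) * int64_item

print(f"1-byte elements: {bool_bytes / 1e6:.0f} MB")
print(f"8-byte elements: {int_bytes / 1e6:.0f} MB")
print("ratio:", int_bytes // bool_bytes)
```

Under these assumptions the buffers come to roughly 100 MB versus 800 MB, which matches the one-eighth figure reported by the Allocations instrument.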
Moral of the lesson: don't optimize your code in Debug mode. It can be misleading!
(A partial answer ...)
As @MartinR mentions in his comments to the question, there is no such major difference between the two cases if you build for release mode (with optimizations); the Bool case is slightly faster due to its smaller memory footprint (but equally fast as e.g. UInt8, which has the same footprint).
Running Instruments to profile the (non-optimized) debug build, we clearly see that the array element access and assignment is the culprit for the Bool case (and, as far as my brief testing has seen, for all types except the integer ones: Int, UInt16, and so on).
We can further ascertain that it's not the writing part in particular that yields the overhead, but rather the repeated accessing of the i:th element.
The same explicit read-access tests for an array of integer elements show no such large overhead.
It would almost seem as if the random element access is, for some reason, not working as it should (for non-integer types) when compiling with debug build config.

Math::Complex screwing up my array references

I'm trying to optimize some code here, and wrote two different simple subroutines that will subtract one vector from another. I pass a pair of vectors to these subroutines and the subtraction is then performed. The first subroutine uses an intermediary variable to store the result whereas the second one does an inline operation using the '-=' operator. The full code is located at the bottom of this question.
When I use purely real numbers, the program works fine and there are no issues. However, if I am using complex operands, then the original vectors (the ones originally passed to the subroutines) are modified! Why does this program work fine for purely real numbers but do this sort of data modification when using complex numbers?
Note my process:
Generate random vectors (either real or complex depending on the commented out code)
Print the main vectors to the screen
Perform the first subroutine subtraction (using the third variable intermediary within the subroutine)
Print the main vectors to the screen again to prove that they have not changed, no matter the use of real or complex vectors
Perform the second subroutine subtraction (using the inline computation method)
Print the main vectors to the screen again, showing that @main_v1 has changed when using complex vectors, but will not change when using real vectors (@main_v2 is unaffected)
Print the final answers to the subtraction, which are always the correct answers, regardless of real or complex vectors
The issue arises because in the case of the second subroutine (which is quite a bit faster), I don't want the @main_v1 vector changed. I need that vector to do further calculations down the road, so I need it to stay the same.
Any idea on how to fix this, or what I'm doing wrong? My entire code is below, and should be functional. I've been using the CLI syntax shown below to run the program. I choose 5 just to keep everything easy for me to read.
c:\> bench.pl 5 REAL
or
c:\> bench.pl 5 IMAG
#!/usr/local/bin/perl
# when debugging: add -w option above
#
use strict;
use warnings;
use Benchmark qw (:all);
use Math::Complex;
use Math::Trig;
use Time::HiRes qw (gettimeofday);
system('cls');
my $dimension = $ARGV[0];
my $type = $ARGV[1];
if(!$dimension || !$type){
print "bench.pl <n> <REAL | IMAG>\n";
print " <n> indicates the dimension of the vector to generate\n";
print " <REAL | IMAG> dictates to use real or complex vectors\n";
exit(0);
}
my @main_v1;
my @main_v2;
my @vector_sum1;
my @vector_sum2;
for($a=1;$a<=$dimension;$a++){
my $r1 = sprintf("%.0f", 9*rand)+1;
my $r2 = sprintf("%.0f", 9*rand)+1;
my $i1 = sprintf("%.0f", 9*rand)+1;
my $i2 = sprintf("%.0f", 9*rand)+1;
if(uc($type) eq "IMAG"){
# Using complex vectors has the issue
$main_v1[$a] = cplx($r1,$i1);
$main_v2[$a] = cplx($r2,$i2);
}elsif(uc($type) eq "REAL"){
# Using real vectors shows no issue
$main_v1[$a] = $r1;
$main_v2[$a] = $r2;
}else {
print "bench.pl <n> <REAL | IMAG>\n";
print " <n> indicates the dimension of the vector to generate\n";
print " <REAL | IMAG> dictates to use real or complex vectors\n";
exit(0);
}
}
# cmpthese(-5, {
# v1 => sub {@vector_sum1 = vector_subtract(\@main_v1, \@main_v2)},
# v2 => sub {@vector_sum2 = vector_subtract_v2(\@main_v1, \@main_v2)},
# });
# print "\n";
print "main vectors as defined initially\n";
print_vector_matlab(@main_v1);
print_vector_matlab(@main_v2);
print "\n";
@vector_sum1 = vector_subtract(\@main_v1, \@main_v2);
print "main vectors after the subtraction using 3rd variable\n";
print_vector_matlab(@main_v1);
print_vector_matlab(@main_v2);
print "\n";
@vector_sum2 = vector_subtract_v2(\@main_v1, \@main_v2);
print "main vectors after the inline subtraction\n";
print_vector_matlab(@main_v1);
print_vector_matlab(@main_v2);
print "\n";
print "subtracted vectors from both subroutines\n";
print_vector_matlab(@vector_sum1);
print_vector_matlab(@vector_sum2);
sub vector_subtract {
# subroutine to subtract one [n x 1] vector from another
# result = vector1 - vector2
#
my @vector1 = @{$_[0]};
my @vector2 = @{$_[1]};
my @result;
my $row = 0;
my $dim1 = @vector1 - 1;
my $dim2 = @vector2 - 1;
if($dim1 != $dim2){
syswrite STDOUT, "ERROR: attempting to subtract vectors of mismatched dimensions\n";
exit;
}
for($row=1;$row<=$dim1;$row++){$result[$row] = $vector1[$row] - $vector2[$row]}
return(@result);
}
sub vector_subtract_v2 {
# subroutine to subtract one [n x 1] vector from another
# implements the inline subtraction method for alleged speedup
# result = vector1 - vector2
#
my @vector1 = @{$_[0]};
my @vector2 = @{$_[1]};
my $row = 0;
my $dim1 = @vector1 - 1;
my $dim2 = @vector2 - 1;
if($dim1 != $dim2){
syswrite STDOUT, "ERROR: attempting to subtract vectors of mismatched dimensions\n";
exit;
}
for($row=1;$row<=$dim1;$row++){$vector1[$row] -= $vector2[$row]} # subtract inline
return(@vector1);
}
sub print_vector_matlab { # for use with outputting square matrices only
my (@junk) = (@_);
my $dimension = @junk - 1;
print "V=[";
for($b=1;$b<=$dimension;$b++){
# $temp_real = sprintf("%.3f", Re($junk[$b][$c]));
# $temp_imag = sprintf("%.3f", Im($junk[$b][$c]));
# $temp_cplx = cplx($temp_real,$temp_imag);
print "$junk[$b];";
# print "$temp_cplx,";
}
print "];\n";
}
I've even tried modifying the second subroutine so that it has the following lines, and it STILL alters the @main_v1 vector when using complex numbers... I am completely confused as to what's going on.
@result = @vector1;
for($row=1;$row<=$dim1;$row++){$result[$row] -= $vector2[$row]}
return(@result);
and I've tried this too... it still modifies @main_v1 with complex numbers
for($row-1;$row<=$dim1;$row++){$result[$row] = $vector1[$row]}
for($row=1;$row<=$dim1;$row++){$result[$row] -= $vector2[$row]}
return(@result);
Upgrade Math::Complex to at least version 1.57. As the changelog explains, one of the changes in that version was:
Add copy constructor and arrange for it to be called appropriately, problem found by David Madore and Alexandr Ciornii.
In Perl, an object is a blessed reference; so an array of Math::Complexes is an array of references. This is not true of real numbers, which are just ordinary scalars.
If you change this:
$vector1[$row] -= $vector2[$row]
to this:
$vector1[$row] = $vector1[$row] - $vector2[$row]
you'll be good to go: that will set $vector1[$row] to refer to a new object, rather than modifying the existing one.
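The same aliasing effect can be reproduced outside Perl. The following Python sketch uses a hypothetical `Cplx` class as a stand-in for a mutable complex-number object whose `-=` operator modifies the object in place (roughly what happens without a copy constructor); it is an illustration of the mechanism, not of Math::Complex itself.

```python
class Cplx:
    """Tiny mutable stand-in for a complex-number object (hypothetical)."""

    def __init__(self, re, im):
        self.re, self.im = re, im

    def __sub__(self, other):
        # Returns a brand-new object; neither operand is touched.
        return Cplx(self.re - other.re, self.im - other.im)

    def __isub__(self, other):
        # Mutates self in place, like a `-=` overload that does not copy.
        self.re -= other.re
        self.im -= other.im
        return self


main_v1 = [Cplx(5, 5), Cplx(7, 1)]
main_v2 = [Cplx(1, 2), Cplx(3, 4)]

# A shallow copy duplicates the list, not the objects inside it.
copy_a = list(main_v1)
for i in range(len(copy_a)):
    copy_a[i] = copy_a[i] - main_v2[i]  # rebinding: originals untouched
after_rebind = (main_v1[0].re, main_v1[0].im)

copy_b = list(main_v1)
for i in range(len(copy_b)):
    copy_b[i] -= main_v2[i]  # in place: the shared objects change
after_inplace = (main_v1[0].re, main_v1[0].im)

print(after_rebind, after_inplace)
```

Copying the array copies only the references, so an in-place operator reaches through the copy to the original objects, while the explicit `a = a - b` form replaces the reference in the copy with a new object.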

How would I return a value from a function which iterates over a for loop in F#

I am trying to loop over an array and return a value as shown below. But this gives me an error on the line after the if statement. It says "This expression was expected to have type unit but has type int"
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
for i = inputBits.Length - 1 to 0 do
if inputBits.[i] then
i
done
How would I do this? I am in the middle of recoding this with a recursive loop, as it seems to be the more accepted way of doing such loops in functional languages, but I still want to know what I was doing wrong above.
for loops are not supposed to return values; they only perform an operation a fixed number of times and then return () (unit). If you want to iterate and finally return something, you may:
have outside the loop a reference where you put the final result when you get it, then after the loop return the reference content
use a recursive function directly
use a higher-order function that will encapsulate the traversal for you, and let you concentrate on the application logic
The higher-order function is nice if your data structure supports it. Simple traversal functions such as fold_left, however, don't support stopping the iteration prematurely. If you wish to support this (and clearly it would be interesting in your use case), you must use a traversal with premature exit support. For easy functions such as yours, a simple recursive function is probably the simplest.
In F# it should also be possible to write your function in imperative style, using yield to turn it into a generator, then finally forcing the generator to get the result. This could be seen as a counterpart of the OCaml technique of using an exception to jump out of the loop.
Edit: A nice solution to avoid the "premature stop" questions is to use a lazy intermediate data structure, which will only be built up to the first satisfying result. This is elegant and good scripting style, but still less efficient than direct exit support or simple recursion. I guess it depends on your needs; is this function to be used in a critical path?
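As a rough illustration of the lazy-traversal idea, here is a small Python sketch (Python rather than F# or OCaml, purely for brevity). The function name `most_significant_bit` and the use of a plain list of booleans in place of a BitArray are assumptions.

```python
def most_significant_bit(bits):
    """Return the highest index holding True, or None if no bit is set."""
    # The generator is lazy: indices are produced from the top down, and
    # next() stops pulling from it at the first match.
    candidates = (i for i in range(len(bits) - 1, -1, -1) if bits[i])
    return next(candidates, None)


print(most_significant_bit([True, False, True, False]))
print(most_significant_bit([False, False]))
```

The second argument to `next` covers the case where no bit is set, which plays the same role as returning an option value.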
Edit: the following are some code samples. They're OCaml and the data structures are different (some of them use libraries from Batteries), but the ideas are the same.
(* using a reference as accumulator *)
let most_significant_bit input_bits =
let result = ref None in
for i = Array.length input_bits - 1 downto 0 do
if input_bits.(i) then
if !result = None then
result := Some i
done;
!result
let most_significant_bit input_bits =
let result = ref None in
for i = 0 to Array.length input_bits - 1 do
if input_bits.(i) then
(* only the last one will be kept *)
result := Some i
done;
!result
(* simple recursive version *)
let most_significant_bit input_bits =
let rec loop = function
| -1 -> None
| i ->
if input_bits.(i) then Some i
else loop (i - 1)
in
loop (Array.length input_bits - 1)
(* higher-order traversal *)
open Batteries_uni
let most_significant_bit input_bits =
Array.fold_lefti
(fun result i bit ->
(* indices increase, so the last set bit seen is the most significant *)
if bit then Some i else result)
None input_bits
(* traversal using an intermediate lazy data structure
(a --- b) is the decreasing enumeration of integers in [b; a] *)
open Batteries_uni
let most_significant_bit input_bits =
(Array.length input_bits - 1) --- 0
|> Enum.Exceptionless.find (fun i -> input_bits.(i))
(* using an exception to break out of the loop; if I understand
correctly, exceptions are rather discouraged in F# for efficiency
reasons. I proposed to use `yield` instead and then force the
generator, but this has no direct OCaml equivalent. *)
exception Result of int
let most_significant_bit input_bits =
try
for i = Array.length input_bits - 1 downto 0 do
if input_bits.(i) then raise (Result i)
done;
None
with Result i -> Some i
Why use a loop when you can use higher-order functions?
I would write:
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
Seq.cast<bool> inputBits |> Seq.tryFindIndex id
The Seq module contains many functions for manipulating collections. It is often a good alternative to using imperative loops.
"but I still want to know what I was doing wrong above."
The body of a for loop is an expression of type unit. The only thing you can do from there is perform side effects (modifying a mutable value, printing...).
In F#, an if then else is similar to ? : from C languages. The then and the else parts must have the same type, otherwise it doesn't make sense in a language with static typing. When the else is missing, the compiler assumes it is else (). Thus, the then must have type unit. Putting a value in a for loop doesn't mean return, because everything is a value in F# (including an if then).
+1 for gasche
Here are some examples in F#. I added one (the second) to show how yield works with for within a sequence expression, as gasche mentioned.
(* using a mutable variable as accumulator as per gasche's example *)
let findMostSignificantBitPosition (inputBits: BitArray) =
let mutable ret = 0 // an int here, since the loop assigns an int; the Option versions are further down
for i = inputBits.Length - 1 downto 0 do
if inputBits.[i] then ret <- i
ret
(* transforming to a Seq of integers with a for, then taking the first element *)
let findMostSignificantBitPosition2 (inputBits: BitArray) =
seq {
for i = 0 to inputBits.Length - 1 do
if inputBits.[i] then yield i
} |> Seq.head
(* casting to a sequence of bools then taking the index of the first "true" *)
let findMostSignificantBitPosition3 (inputBits: BitArray) =
inputBits|> Seq.cast<bool> |> Seq.findIndex(fun f -> f)
Edit: versions returning an Option
let findMostSignificantBitPosition (inputBits: BitArray) =
let mutable ret = None
for i = inputBits.Length - 1 downto 0 do
if inputBits.[i] then ret <- Some i
ret
let findMostSignificantBitPosition2 (inputBits: BitArray) =
seq {
for i = 0 to inputBits.Length - 1 do
if inputBits.[i] then yield Some(i)
else yield None
} |> Seq.tryPick id
let findMostSignificantBitPosition3 (inputBits: BitArray) =
inputBits|> Seq.cast<bool> |> Seq.tryFindIndex(fun f -> f)
I would recommend using a higher-order function (as mentioned by Laurent) or writing a recursive function explicitly (which is a general approach to replace loops in F#).
If you want to see some fancy F# solution (which is probably better version of using some temporary lazy data structure), then you can take a look at my article which defines imperative computation builder for F#. This allows you to write something like:
let findMostSignificantBitPosition (inputBits:BitArray) = imperative {
for b in Seq.cast<bool> inputBits do
if b then return true
return false }
There is some overhead (as with using other temporary lazy data structures), but it looks just like C# :-).
EDIT I also posted the samples on F# Snippets: http://fssnip.net/40
I think the reason you're having issues with how to write this code is that you're not handling the failure case of not finding a set bit. Others have posted many ways of finding the bit. Here are a few ways of handling the failure case.
failure case by Option
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
let rec loop i =
if i = -1 then
None
elif inputBits.[i] then
Some i
else
loop (i - 1)
loop (inputBits.Length - 1)
let test = new BitArray(1)
match findMostSignificantBitPosition test with
| Some i -> printf "Most Significant Bit: %i" i
| None -> printf "Most Significant Bit Not Found"
failure case by Exception
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
let rec loop i =
if i = -1 then
failwith "Most Significant Bit Not Found"
elif inputBits.[i] then
i
else
loop (i - 1)
loop (inputBits.Length - 1)
let test = new BitArray(1)
try
let i = findMostSignificantBitPosition test
printf "Most Significant Bit: %i" i
with
| Failure msg -> printf "%s" msg
failure case by -1
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
let rec loop i =
if i = -1 then
i
elif inputBits.[i] then
i
else
loop (i - 1)
loop (inputBits.Length - 1)
let test = new BitArray(1)
let i = findMostSignificantBitPosition test
if i <> -1 then
printf "Most Significant Bit: %i" i
else
printf "Most Significant Bit Not Found"
One of the options is to use seq and the findIndex method:
let findMostSignificantBitPosition (inputBits:System.Collections.BitArray) =
seq {
// downto is needed to count down; the index found is an offset from the most significant end
for i = inputBits.Length - 1 downto 0 do
yield inputBits.[i]
} |> Seq.findIndex(fun e -> e)

How to efficiently merge two hashes in Ruby C API?

I am writing a C extension for Ruby that really needs to merge two hashes, however the rb_hash_merge() function is STATIC in Ruby 1.8.6. I have tried instead to use:
rb_funcall(hash1, rb_intern("merge"), 1, hash2);
but this is much too slow, and performance is very critical in this application.
Does anyone know how to go about performing this merge with efficiency and speed in mind?
(Note I have tried simply looking at the source for rb_hash_merge() and replicating it but it is RIDDLED with other static functions, which are themselves riddled with yet more static functions so it seems almost impossible to disentangle...i need another way)
OK, it looks like it might not be possible to optimize this within the published API.
Test code:
#extconf.rb
require 'mkmf'
dir_config("hello")
create_makefile("hello")
// hello.c
#include "ruby.h"
static VALUE rb_mHello;
static VALUE rb_cMyCalc;
static void calc_mark(void *f) { }
static void calc_free(void *f) { }
static VALUE calc_alloc(VALUE klass) { return Data_Wrap_Struct(klass, calc_mark, calc_free, NULL); }
static VALUE calc_init(VALUE obj) { return Qnil; }
static VALUE calc_merge(VALUE obj, VALUE h1, VALUE h2) {
return rb_funcall(h1, rb_intern("merge"), 1, h2);
}
static VALUE
calc_merge2(VALUE obj, VALUE h1, VALUE h2)
{
VALUE h3 = rb_hash_new();
VALUE keys;
VALUE akey;
keys = rb_funcall(h1, rb_intern("keys"), 0);
while (akey = rb_each(keys)) {
rb_hash_aset(h3, akey, rb_hash_aref(h1, akey));
}
keys = rb_funcall(h2, rb_intern("keys"), 0);
while (akey = rb_each(keys)) {
rb_hash_aset(h3, akey, rb_hash_aref(h2, akey));
}
return h3;
}
static VALUE
calc_merge3(VALUE obj, VALUE h1, VALUE h2)
{
VALUE keys;
VALUE akey;
keys = rb_funcall(h1, rb_intern("keys"), 0);
while (akey = rb_each(keys)) {
rb_hash_aset(h2, akey, rb_hash_aref(h1, akey));
}
return h2;
}
void
Init_hello()
{
rb_mHello = rb_define_module("Hello");
rb_cMyCalc = rb_define_class_under(rb_mHello, "Calculator", rb_cObject);
rb_define_alloc_func(rb_cMyCalc, calc_alloc);
rb_define_method(rb_cMyCalc, "initialize", calc_init, 0);
rb_define_method(rb_cMyCalc, "merge", calc_merge, 2);
rb_define_method(rb_cMyCalc, "merge2", calc_merge, 2);
rb_define_method(rb_cMyCalc, "merge3", calc_merge, 2);
}
# test.rb
require "hello"
h1 = Hash.new()
h2 = Hash.new()
1.upto(100000) { |x| h1[x] = x+1; }
1.upto(100000) { |x| h2["#{x}-12"] = x+1; }
c = Hello::Calculator.new()
puts c.merge(h1, h2).keys.length if ARGV[0] == "1"
puts c.merge2(h1, h2).keys.length if ARGV[0] == "2"
puts c.merge3(h1, h2).keys.length if ARGV[0] == "3"
Now the test results:
$ time ruby test.rb
real 0m1.021s
user 0m0.940s
sys 0m0.080s
$ time ruby test.rb 1
200000
real 0m1.224s
user 0m1.148s
sys 0m0.076s
$ time ruby test.rb 2
200000
real 0m1.219s
user 0m1.132s
sys 0m0.084s
$ time ruby test.rb 3
200000
real 0m1.220s
user 0m1.128s
sys 0m0.092s
So it looks like we might shave off at maximum ~0.004s on a 0.2s operation.
Given that there's probably not much going on besides setting the values, there might not be much room for further optimization. Maybe try to hack the Ruby source itself, but at that point you are no longer really developing an "extension" but rather changing the language, so it probably won't work.
If the join of hashes is something that you need to do many times in the C part - then probably using the internal data structures and only exporting them into Ruby hash in the final pass would be the only way of optimizing things.
p.s. The initial skeleton for the code borrowed from this excellent tutorial

Algorithm for joining e.g. an array of strings

I have wondered for some time, what a nice, clean solution for joining an array of strings might look like.
Example: I have ["Alpha", "Beta", "Gamma"] and want to join the strings into one, separated by commas – "Alpha, Beta, Gamma".
Now I know that most programming languages offer some kind of join method for this. I just wonder how these might be implemented.
When I took introductory courses, I often tried to go it alone, but never found a satisfactory algorithm. Everything seemed rather messy, the problem being that you can not just loop through the array, concatenating the strings, as you would add one too many commas (either before or after the last string).
I don’t want to check conditions in the loop. I don’t really want to add the first or the last string before/after the loop (I guess this is maybe the best way?).
Can someone show me an elegant solution? Or tell me exactly why there can’t be anything more elegant?
The most elegant solution I found for problems like this is something like this (in pseudocode)
separator = ""
foreach(item in stringCollection)
{
concatenatedString += separator + item
separator = ","
}
You just run the loop, and the separator is only set at the end of the first pass, so the first time around nothing gets prepended. It's not as clean as I'd like it to be, so I'd still add comments, but it's better than an if statement or adding the first or last item outside the loop.
All of these solutions are decent ones, but for an underlying library, both independence of separator and decent speed are important. Here is a function that fits the requirement assuming the language has some form of string builder.
public static String join(String[] strings, String sep) {
if(strings.length == 0) return "";
if(strings.length == 1) return strings[0];
StringBuilder sb = new StringBuilder();
sb.append(strings[0]);
for(int i = 1; i < strings.length; i++) {
sb.append(sep);
sb.append(strings[i]);
}
return sb.toString();
}
EDIT: I suppose I should mention why this would be speedier. The main reason would be because any time you call c = a + b; the underlying construct is usually c = (new StringBuilder()).append(a).append(b).toString();. By reusing the same string builder object, we can reduce the amount of allocations and garbage we produce.
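The single-buffer idea carries over to other languages. As a hedged sketch in Python, `io.StringIO` can play the role of the string builder; the `join` function below mirrors the structure of the Java version above and is an illustration, not how any particular standard library implements it.

```python
import io


def join(strings, sep):
    """Join strings with sep, writing every piece into one shared buffer."""
    if not strings:
        return ""
    buf = io.StringIO()
    buf.write(strings[0])
    # Every later element is preceded by the separator, so no trailing
    # or leading separator is ever produced.
    for s in strings[1:]:
        buf.write(sep)
        buf.write(s)
    return buf.getvalue()


print(join(["Alpha", "Beta", "Gamma"], ", "))
```

Writing into one buffer avoids creating a fresh intermediate string for every concatenation, which is the saving the EDIT above describes.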
And before someone chimes in with "optimization is evil": we're talking about implementing a common library function. Acceptable, scalable performance is one of the requirements there. A join that takes a long time is one that's not going to be used often.
Most languages nowadays - e.g. perl (mentioned by Jon Ericson), php, javascript - have a join() function or method, and this is by far the most elegant solution. Less code is better code.
In response to Mendelt Siebenga, if you do require a hand-rolled solution, I'd go with the ternary operator for something like:
separator = ","
foreach (item in stringCollection)
{
concatenatedString += concatenatedString ? separator + item : item
}
I usually go with something like...
list = ["Alpha", "Beta", "Gamma"];
output = "";
separator = "";
for (int i = 0; i < list.length ; i++) {
output = output + separator;
output = output + list[i];
separator = ", ";
}
This works because on the first pass, separator is empty (so you don't get a comma at the start), but on every subsequent pass, you add a comma before adding the next element.
You could certainly unroll this a little to make it a bit faster (assigning to the separator over and over isn't ideal), though I suspect that's something the compiler could do for you automatically.
In the end though, I suspect this is pretty much what most language-level join functions come down to. Nothing more than syntactic sugar, but it sure is sweet.
For pure elegance, a typical recursive functional-language solution is quite nice. This isn't in an actual language syntax but you get the idea (it's also hardcoded to use comma separator):
join([]) = ""
join([x]) = "x"
join([x, rest]) = "x," + join(rest)
In reality you would write this in a more generic way, to reuse the same algorithm but abstract away the data type (doesn't have to be strings) and the operation (doesn't have to be concatenation with a comma in the middle). Then it usually gets called 'reduce', and many functional languages have this built in, e.g. multiplying all numbers in a list, in Lisp:
(reduce #'* '(1 2 3 4 5)) => 120
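As a rough illustration, the recursive definition earlier in this answer could be transliterated into Python like this. The name join_recursive and the sep parameter are my additions (the pseudocode hardcodes the comma), so treat it as a sketch rather than a canonical version.

```python
def join_recursive(items, sep=","):
    """Recursive join: empty list, single item, or head + sep + join(rest)."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    # Peel off the first item and recurse on the remainder.
    return items[0] + sep + join_recursive(items[1:], sep)


print(join_recursive(["Alpha", "Beta", "Gamma"], ", "))
# Alpha, Beta, Gamma
```

Python does not optimise recursion, so this is for clarity only; a long list would hit the recursion limit.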
@Mendelt Siebenga
Strings are corner-stone objects in programming languages. Different languages implement strings differently. An implementation of join() strongly depends on underlying implementation of strings. Pseudocode doesn't reflect underlying implementation.
Consider join() in Python. It can be easily used:
print ", ".join(["Alpha", "Beta", "Gamma"])
# Alpha, Beta, Gamma
It could be easily implemented as follows:
def join(seq, sep=" "):
    if not seq: return ""
    elif len(seq) == 1: return seq[0]
    return reduce(lambda x, y: x + sep + y, seq)
print join(["Alpha", "Beta", "Gamma"], ", ")
# Alpha, Beta, Gamma
And here is how the join() method is implemented in C (taken from trunk):
PyDoc_STRVAR(join__doc__,
"S.join(sequence) -> string\n\
\n\
Return a string which is the concatenation of the strings in the\n\
sequence. The separator between elements is S.");
static PyObject *
string_join(PyStringObject *self, PyObject *orig)
{
char *sep = PyString_AS_STRING(self);
const Py_ssize_t seplen = PyString_GET_SIZE(self);
PyObject *res = NULL;
char *p;
Py_ssize_t seqlen = 0;
size_t sz = 0;
Py_ssize_t i;
PyObject *seq, *item;
seq = PySequence_Fast(orig, "");
if (seq == NULL) {
return NULL;
}
seqlen = PySequence_Size(seq);
if (seqlen == 0) {
Py_DECREF(seq);
return PyString_FromString("");
}
if (seqlen == 1) {
item = PySequence_Fast_GET_ITEM(seq, 0);
if (PyString_CheckExact(item) || PyUnicode_CheckExact(item)) {
Py_INCREF(item);
Py_DECREF(seq);
return item;
}
}
/* There are at least two things to join, or else we have a subclass
* of the builtin types in the sequence.
* Do a pre-pass to figure out the total amount of space we'll
* need (sz), see whether any argument is absurd, and defer to
* the Unicode join if appropriate.
*/
for (i = 0; i < seqlen; i++) {
const size_t old_sz = sz;
item = PySequence_Fast_GET_ITEM(seq, i);
if (!PyString_Check(item)){
#ifdef Py_USING_UNICODE
if (PyUnicode_Check(item)) {
/* Defer to Unicode join.
* CAUTION: There's no gurantee that the
* original sequence can be iterated over
* again, so we must pass seq here.
*/
PyObject *result;
result = PyUnicode_Join((PyObject *)self, seq);
Py_DECREF(seq);
return result;
}
#endif
PyErr_Format(PyExc_TypeError,
"sequence item %zd: expected string,"
" %.80s found",
i, Py_TYPE(item)->tp_name);
Py_DECREF(seq);
return NULL;
}
sz += PyString_GET_SIZE(item);
if (i != 0)
sz += seplen;
if (sz < old_sz || sz > PY_SSIZE_T_MAX) {
PyErr_SetString(PyExc_OverflowError,
"join() result is too long for a Python string");
Py_DECREF(seq);
return NULL;
}
}
/* Allocate result space. */
res = PyString_FromStringAndSize((char*)NULL, sz);
if (res == NULL) {
Py_DECREF(seq);
return NULL;
}
/* Catenate everything. */
p = PyString_AS_STRING(res);
for (i = 0; i < seqlen; ++i) {
size_t n;
item = PySequence_Fast_GET_ITEM(seq, i);
n = PyString_GET_SIZE(item);
Py_MEMCPY(p, PyString_AS_STRING(item), n);
p += n;
if (i < seqlen - 1) {
Py_MEMCPY(p, sep, seplen);
p += seplen;
}
}
Py_DECREF(seq);
return res;
}
Note that the "Catenate everything" code above is only a small part of the whole function.
In pseudocode:
/* Catenate everything. */
for each item in sequence
copy-assign item
if not last item
copy-assign separator
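To make the two-pass idea from the C function easier to see, here is a small Python sketch over bytes: a first pass computes the total size, then a single buffer is allocated and filled. The name join_preallocated is hypothetical, and this only mirrors the shape of the C code; it is not how you would normally join in Python.

```python
def join_preallocated(sep, parts):
    """Two-pass join over bytes: size the buffer first, then copy into it."""
    parts = list(parts)                  # the input may be a one-shot iterable
    if not parts:
        return b""
    # Pass 1: total size is all items plus one separator between neighbours.
    total = sum(len(p) for p in parts) + len(sep) * (len(parts) - 1)
    buf = bytearray(total)               # single allocation
    pos = 0
    last = len(parts) - 1
    # Pass 2: copy each item, then the separator unless it is the last item.
    for i, part in enumerate(parts):
        buf[pos:pos + len(part)] = part
        pos += len(part)
        if i < last:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
    return bytes(buf)


print(join_preallocated(b", ", [b"Alpha", b"Beta", b"Gamma"]))
```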
' Pseudo code Assume zero based
ResultString = InputArray[0]
n = 1
while n (is less than) Number_Of_Strings
ResultString (concatenate) ", "
ResultString (concatenate) InputArray[n]
n = n + 1
loop
In Perl, I just use the join command:
$ echo "Alpha
Beta
Gamma" | perl -e 'print(join(", ", map {chomp; $_} <> ))'
Alpha, Beta, Gamma
(The map stuff is mostly there to create a list.)
In languages that don't have a built in, like C, I use simple iteration (untested):
for (i = 0; i < N-1; i++){
strcat(s, a[i]);
strcat(s, ", ");
}
strcat(s, a[N-1]);
Of course, you'd need to check the size of s before you add more bytes to it.
You either have to special case the first entry or the last.
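A short Python sketch of special-casing the first entry, assuming an arbitrary iterable of strings (the function name join_first_special is just for illustration):

```python
def join_first_special(items, sep=", "):
    """Take the first item on its own, then prefix every later item with sep."""
    it = iter(items)
    try:
        result = next(it)      # the special-cased first entry
    except StopIteration:
        return ""              # nothing to join
    for item in it:
        result += sep + item
    return result


print(join_first_special(["Alpha", "Beta", "Gamma"]))
# Alpha, Beta, Gamma
```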
Collecting different language implementations?
Here is, for your amusement, a Smalltalk version:
join:collectionOfStrings separatedBy:sep
|buffer|
buffer := WriteStream on:''.
collectionOfStrings
do:[:each | buffer nextPutAll:each ]
separatedBy:[ buffer nextPutAll:sep ].
^ buffer contents.
Of course, the above code is already in the standard library found as:
Collection >> asStringWith:
so, using that, you'd write:
#('A' 'B' 'C') asStringWith:','
But here's my main point:
I would like to put more emphasis on the fact that using a StringBuilder (or what is called "WriteStream" in Smalltalk) is highly recommended. Do not concatenate strings using "+" in a loop: the result will be many, many intermediate throw-away strings. If you have a good garbage collector, that's fine. But some are not good, and then a lot of memory needs to be reclaimed. StringBuilder (and WriteStream, which is its great-grandfather) uses a buffer-doubling or even adaptive growing algorithm, which needs MUCH less scratch memory.
However, if it's only a few small strings you are concatenating, don't worry about it and just "+" them; the extra work of using a StringBuilder might actually be counter-productive, up to an implementation- and language-dependent number of strings.
The following is no longer language-agnostic (but that doesn't matter for the discussion because the implementation is easily portable to other languages). I tried to implement Luke's (theoretically best) solution in an imperative programming language. Take your pick; mine's C#. Not very elegant at all. However, (without any testing whatsoever) I could imagine that its performance is quite decent because the recursion is in fact tail recursive.
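For readers outside Smalltalk or Java, the same buffer-based approach might look like this in Python, with io.StringIO standing in for the StringBuilder/WriteStream. The function name join_buffered is my own, and this is only a sketch of the idea.

```python
import io


def join_buffered(items, sep=","):
    """Write the pieces into one growing buffer instead of using '+' in a loop."""
    buffer = io.StringIO()
    first = True
    for item in items:
        if not first:
            buffer.write(sep)  # separator goes between items only
        buffer.write(item)
        first = False
    return buffer.getvalue()


print(join_buffered(["A", "B", "C"], ","))
# A,B,C
```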
My challenge: give a better recursive implementation (in an imperative language). You say what “better” means: less code, faster, I'm open for suggestions.
private static StringBuilder RecJoin(IEnumerator<string> xs, string sep, StringBuilder result) {
result.Append(xs.Current);
if (xs.MoveNext()) {
result.Append(sep);
return RecJoin(xs, sep, result);
} else
return result;
}
public static string Join(this IEnumerable<string> xs, string separator) {
var i = xs.GetEnumerator();
if (!i.MoveNext())
return string.Empty;
else
return RecJoin(i, separator, new StringBuilder()).ToString();
}
join() function in Ruby:
def join(seq, sep)
seq.inject { |total, item| total + sep + item } or ""
end
join(["a", "b", "c"], ", ")
# => "a, b, c"
join() in Perl:
use List::Util qw(reduce);
sub mjoin($@) {$sep = shift; reduce {$a.$sep.$b} @_ or ''}
say mjoin(', ', qw(Alpha Beta Gamma));
# Alpha, Beta, Gamma
Or without reduce:
sub mjoin($@)
{
my ($sep, $sum) = (shift, shift);
$sum .= $sep.$_ for (@_);
$sum or ''
}
Perl 6
sub join( $separator, @strings ){
my $return = shift @strings;
for @strings -> ( $string ){
$return ~= $separator ~ $string;
}
return $return;
}
Yes I know it is pointless because Perl 6 already has a join function.
I wrote a recursive version of the solution in Lisp. If the length of the list is greater than 2, it splits the list in half as best as it can and then merges the joined sublists.
(defun concatenate-string(list)
(cond ((= (length list) 1) (car list))
((= (length list) 2) (concatenate 'string (first list) "," (second list)))
(t (let ((mid-point (floor (/ (- (length list) 1) 2))))
(concatenate 'string
(concatenate-string (subseq list 0 mid-point))
","
(concatenate-string (subseq list mid-point (length list))))))))
(concatenate-string '("a" "b"))
I tried applying the divide and conquer strategy to the problem, but I guess that does not give a better result than plain iteration. Please let me know if this could have been done better.
I have also performed an analysis of the recursion obtained by the algorithm, it is available here.
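The same divide-and-conquer idea can be sketched in Python for comparison. The function name join_divide and the empty-list case are my additions, and the split point differs slightly from the Lisp version (it uses the integer midpoint).

```python
def join_divide(items, sep=","):
    """Divide-and-conquer join: split in half, join each half, glue with sep."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2      # both halves are non-empty when len >= 2
    return join_divide(items[:mid], sep) + sep + join_divide(items[mid:], sep)


print(join_divide(["a", "b", "c", "d", "e"]))
# a,b,c,d,e
```

The recursion depth is only logarithmic in the list length, but every level still copies the strings it glues together, so plain iteration into a buffer is unlikely to be beaten this way.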
Use the String.join method in C#
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx
In Java 5, with unit test:
import junit.framework.Assert;
import org.junit.Test;
public class StringUtil
{
public static String join(String delim, String... strings)
{
StringBuilder builder = new StringBuilder();
if (strings != null)
{
for (String str : strings)
{
if (builder.length() > 0)
{
builder.append(delim);
}
builder.append(str);
}
}
return builder.toString();
}
@Test
public void joinTest()
{
Assert.assertEquals("", StringUtil.join(", ", null));
Assert.assertEquals("", StringUtil.join(", ", ""));
Assert.assertEquals("", StringUtil.join(", ", new String[0]));
Assert.assertEquals("test", StringUtil.join(", ", "test"));
Assert.assertEquals("foo, bar", StringUtil.join(", ", "foo", "bar"));
Assert.assertEquals("foo, bar, baz", StringUtil.join(", ", "foo", "bar", "baz"));
}
}
