I'm trying to optimize some code here, and wrote two different simple subroutines that will subtract one vector from another. I pass a pair of vectors to these subroutines and the subtraction is then performed. The first subroutine uses an intermediary variable to store the result whereas the second one does an inline operation using the '-=' operator. The full code is located at the bottom of this question.
When I use purely real numbers, the program works fine and there are no issues. However, if I am using complex operands, then the original vectors (the ones originally passed to the subroutines) are modified! Why does this program work fine for purely real numbers but do this sort of data modification when using complex numbers?
Note my process:
Generate random vectors (either real or complex depending on the commented out code)
Print the main vectors to the screen
Perform the first subroutine subtraction (using the third variable intermediary within the subroutine)
Print the main vectors to the screen again to prove that they have not changed, no matter the use of real or complex vectors
Perform the second subroutine subtraction (using the inline computation method)
Print the main vectors to the screen again, showing that @main_v1 has changed when using complex vectors, but will not change when using real vectors (@main_v2 is unaffected)
Print the final answers to the subtraction, which are always the correct answers, regardless of real or complex vectors
The issue arises because in the case of the second subroutine (which is quite a bit faster), I don't want the @main_v1 vector changed. I need that vector to do further calculations down the road, so I need it to stay the same.
Any idea on how to fix this, or what I'm doing wrong? My entire code is below, and should be functional. I've been using the CLI syntax shown below to run the program. I choose 5 just to keep everything easy for me to read.
c:\> bench.pl 5 REAL
or
c:\> bench.pl 5 IMAG
#!/usr/local/bin/perl
# when debugging: add -w option above
#
use strict;
use warnings;
use Benchmark qw (:all);
use Math::Complex;
use Math::Trig;
use Time::HiRes qw (gettimeofday);
system('cls');
my $dimension = $ARGV[0];
my $type = $ARGV[1];
if(!$dimension || !$type){
print "bench.pl <n> <REAL | IMAG>\n";
print " <n> indicates the dimension of the vector to generate\n";
print " <REAL | IMAG> dictates to use real or complex vectors\n";
exit(0);
}
my @main_v1;
my @main_v2;
my @vector_sum1;
my @vector_sum2;
for($a=1;$a<=$dimension;$a++){
my $r1 = sprintf("%.0f", 9*rand)+1;
my $r2 = sprintf("%.0f", 9*rand)+1;
my $i1 = sprintf("%.0f", 9*rand)+1;
my $i2 = sprintf("%.0f", 9*rand)+1;
if(uc($type) eq "IMAG"){
# Using complex vectors has the issue
$main_v1[$a] = cplx($r1,$i1);
$main_v2[$a] = cplx($r2,$i2);
}elsif(uc($type) eq "REAL"){
# Using real vectors shows no issue
$main_v1[$a] = $r1;
$main_v2[$a] = $r2;
}else {
print "bench.pl <n> <REAL | IMAG>\n";
print " <n> indicates the dimension of the vector to generate\n";
print " <REAL | IMAG> dictates to use real or complex vectors\n";
exit(0);
}
}
# cmpthese(-5, {
# v1 => sub {@vector_sum1 = vector_subtract(\@main_v1, \@main_v2)},
# v2 => sub {@vector_sum2 = vector_subtract_v2(\@main_v1, \@main_v2)},
# });
# print "\n";
print "main vectors as defined initially\n";
print_vector_matlab(@main_v1);
print_vector_matlab(@main_v2);
print "\n";
@vector_sum1 = vector_subtract(\@main_v1, \@main_v2);
print "main vectors after the subtraction using 3rd variable\n";
print_vector_matlab(@main_v1);
print_vector_matlab(@main_v2);
print "\n";
@vector_sum2 = vector_subtract_v2(\@main_v1, \@main_v2);
print "main vectors after the inline subtraction\n";
print_vector_matlab(@main_v1);
print_vector_matlab(@main_v2);
print "\n";
print "subtracted vectors from both subroutines\n";
print_vector_matlab(@vector_sum1);
print_vector_matlab(@vector_sum2);
sub vector_subtract {
# subroutine to subtract one [n x 1] vector from another
# result = vector1 - vector2
#
my @vector1 = @{$_[0]};
my @vector2 = @{$_[1]};
my @result;
my $row = 0;
my $dim1 = @vector1 - 1;
my $dim2 = @vector2 - 1;
if($dim1 != $dim2){
syswrite STDOUT, "ERROR: attempting to subtract vectors of mismatched dimensions\n";
exit;
}
for($row=1;$row<=$dim1;$row++){$result[$row] = $vector1[$row] - $vector2[$row]}
return(@result);
}
sub vector_subtract_v2 {
# subroutine to subtract one [n x 1] vector from another
# implements the inline subtraction method for alleged speedup
# result = vector1 - vector2
#
my @vector1 = @{$_[0]};
my @vector2 = @{$_[1]};
my $row = 0;
my $dim1 = @vector1 - 1;
my $dim2 = @vector2 - 1;
if($dim1 != $dim2){
syswrite STDOUT, "ERROR: attempting to subtract vectors of mismatched dimensions\n";
exit;
}
for($row=1;$row<=$dim1;$row++){$vector1[$row] -= $vector2[$row]} # subtract inline
return(@vector1);
}
sub print_vector_matlab { # for use with outputting square matrices only
my (@junk) = (@_);
my $dimension = @junk - 1;
print "V=[";
for($b=1;$b<=$dimension;$b++){
# $temp_real = sprintf("%.3f", Re($junk[$b][$c]));
# $temp_imag = sprintf("%.3f", Im($junk[$b][$c]));
# $temp_cplx = cplx($temp_real,$temp_imag);
print "$junk[$b];";
# print "$temp_cplx,";
}
print "];\n";
}
I've even tried modifying the second subroutine so that it has the following lines, and it STILL alters the @main_v1 vector when using complex numbers... I am completely confused as to what's going on.
@result = @vector1;
for($row=1;$row<=$dim1;$row++){$result[$row] -= $vector2[$row]}
return(@result);
and I've tried this too... still modifies @main_v1 with complex numbers
for($row=1;$row<=$dim1;$row++){$result[$row] = $vector1[$row]}
for($row=1;$row<=$dim1;$row++){$result[$row] -= $vector2[$row]}
return(@result);
Upgrade Math::Complex to at least version 1.57. As the changelog explains, one of the changes in that version was:
Add copy constructor and arrange for it to be called appropriately, problem found by David Madore and Alexandr Ciornii.
In Perl, an object is a blessed reference; so an array of Math::Complex objects is an array of references. This is not true of real numbers, which are just ordinary scalars. Copying such an array only copies the references, so the copy's elements still point at the same underlying objects, and without a copy constructor an in-place operator like -= modifies those shared objects.
If you change this:
$vector1[$row] -= $vector2[$row]
to this:
$vector1[$row] = $vector1[$row] - $vector2[$row]
you'll be good to go: that will set $vector1[$row] to refer to a new object, rather than modifying the existing one.
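For reference, here is a minimal sketch of the second subroutine with that one-line change applied (same names as in the question, untested against your exact setup; upgrading Math::Complex as described above makes the original -= version safe as well):
sub vector_subtract_v2 {
    # result = vector1 - vector2, computed without mutating the caller's objects
    my @vector1 = @{$_[0]};
    my @vector2 = @{$_[1]};
    my $dim1 = @vector1 - 1;
    my $dim2 = @vector2 - 1;
    if($dim1 != $dim2){
        syswrite STDOUT, "ERROR: attempting to subtract vectors of mismatched dimensions\n";
        exit;
    }
    # plain '=' binds each slot to the brand-new object returned by the
    # overloaded '-', so the objects shared with @main_v1 stay untouched
    for my $row (1 .. $dim1){ $vector1[$row] = $vector1[$row] - $vector2[$row] }
    return(@vector1);
}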
Related
I'm new to Perl. I need to figure out how to read from a file whose fields are separated by colons (:) into an array. Then I can manipulate the data.
Here is a sample of the file 'serverFile.txt' (Just threw in random #'s)
The fields are Name : CPU Utilization: avgMemory Usage : disk free
Server1:8:6:2225410
Server2:75:68:64392
Server3:95:90:12806
Server4:14:7:1548700
I would like to figure out how to get each field into its appropriate array to then perform functions on. For instance, find the server with the least amount of free disk space.
The way I have it set up now, I do not think will work. So how do I put each element in each line into an array?
#!/usr/bin/perl
use warnings;
use diagnostics;
use v5.26.1;
#Opens serverFile.txt or reports and error
open (my $fh, "<", "/root//Perl/serverFile.txt")
or die "System cannot find the file specified. $!";
#Prints out the details of the file format
sub header(){
print "Server ** CPU Util% ** Avg Mem Usage ** Free Disk\n";
print "-------------------------------------------------\n";
}
# Creates our variables
my ($name, $cpuUtil, $avgMemUsage, $diskFree);
my $count = 0;
my $totalMem = 0;
header();
# Loops through the program looking to see if CPU Utilization is greater than 90%
# If it is, it will print out the Server details
while(<$fh>) {
# Puts the file contents into the variables
($name, $cpuUtil, $avgMemUsage, $diskFree) = split(":", $_);
print "$name ** $cpuUtil% ** $avgMemUsage% ** $diskFree% ", "\n\n", if $cpuUtil > 90;
$totalMem = $avgMemUsage + $totalMem;
$count++;
}
print "The average memory usage for all servers is: ", $totalMem / $count. "%\n";
# Closes the file
close $fh;
For this use case, a hash is much better than an array.
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use List::Util qw{ min };
my %server;
while (<>) {
chomp;
my ($name, $cpu_utilization, $avg_memory, $disk_free)
= split /:/;
@{ $server{$name} }{qw{ cpu_utilization avg_memory disk_free }}
= ($cpu_utilization, $avg_memory, $disk_free);
}
my $least_disk = min(map $server{$_}{disk_free}, keys %server);
say for grep $server{$_}{disk_free} == $least_disk, keys %server;
choroba's answer is ideal, but I think your own code could be improved
Don't use v5.26.1 unless you need a specific feature that is available only in the given version of Perl. Note that it also enables use strict, which should be at the top of every Perl program you write
die "System cannot find the file specified. $!" is wrong: there are multiple reasons why an open may fail, beyond that it "cannot be found". Your die string should include the path to the file you're trying to open; the reason for the failure is in $!
Don't use subroutine prototypes: they don't do what you think they do. sub header() { ... } should be just sub header { ... }
There's no point in declaring a subroutine only to call it a few lines later. Put your code for header in line
You have clearly come from another language. Declare your variables with my as late as possible. In this case only $count and $totalMem must be declared outside the while loop
perl will close all open file handles when the program exits. There is rarely a need for an explicit close call, which just makes your code more noisy
$totalMem = $avgMemUsage + $totalMem is commonly written $totalMem += $avgMemUsage
I hope that helps
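Purely as an illustrative sketch of those points taken together (the file path and output format are copied from the question, so treat this as a starting point rather than a drop-in replacement):
#!/usr/bin/perl
use strict;
use warnings;

my $file = '/root/Perl/serverFile.txt';
open my $fh, '<', $file or die "Unable to open '$file': $!";

print "Server ** CPU Util% ** Avg Mem Usage ** Free Disk\n";
print "-------------------------------------------------\n";

my ($count, $total_mem) = (0, 0);

while ( <$fh> ) {
    chomp;
    # declare the per-record variables inside the loop, as late as possible
    my ($name, $cpu_util, $avg_mem_usage, $disk_free) = split /:/;
    print "$name ** $cpu_util% ** $avg_mem_usage% ** $disk_free\n\n" if $cpu_util > 90;
    $total_mem += $avg_mem_usage;
    ++$count;
}

print "The average memory usage for all servers is: ", $total_mem / $count, "%\n";
# no explicit close needed; perl closes the handle when the program exits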
To your original question about how to store the data in an array...
First, initialize an empty array outside the file read loop:
my @servers = ();
Then, within the loop, after you have your data pieces parsed out, you can store them in your array as sub-arrays (the resulting data structure is a two dimensional array):
$servers[$count] = [ $name, $cpuUtil, $avgMemUsage, $diskFree ];
Note, the square brackets on the right create the sub-array for the server's data pieces and return a reference to this new array. Also, on the left side we just use the current value of $count as an index within the @servers array, and as the value increases, the size of the @servers array will grow automatically (this is called autovivification of new elements). Alternatively, you can push new elements onto the @servers array inside the loop, like this:
push @servers, [ $name, $cpuUtil, $avgMemUsage, $diskFree ];
This way, you explicitly ask for a new element to be added to the array and the square brackets still do the same creation of the sub-array.
In any case, the end result is that after you are finished with the file read loop, you now have a 2D array where you can access the first server and its disk free field (the 4-th field at index 3) like this:
my $df = $servers[0][3];
Or inspect all the servers in a loop to find the minimum disk free:
my $min_s = 0;
for ( my $s = 0; $s < @servers; $s++ ) {
$min_s = $s if ( $servers[$s][3] < $servers[$min_s][3] );
}
print "Server $min_s has least disk free: $servers[$min_s][3]\n";
Like @choroba suggested, you can store the server data pieces/fields in hashes, so that your code will be more readable. You can still store your list of servers in an array, but the second dimension can be a hash:
$servers[$count] = {
name => $name,
cpu_util => $cpuUtil,
avg_mem_usage => $avgMemUsage,
disk_free => $diskFree
};
So, your resulting structure will be an array of hashes. Here, the curly braces on the right create a new hash and return the reference to it. So, you can later refer to:
my $df = $servers[0]{disk_free};
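And as a small follow-on sketch (assuming @servers has been filled as shown above), finding the server with the least free disk space from this array of hashes could look like:
use List::Util qw( min );

my $least = min( map { $_->{disk_free} } @servers );
for my $server ( grep { $_->{disk_free} == $least } @servers ) {
    print "$server->{name} has the least disk free: $server->{disk_free}\n";
}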
I know the newer version is better, but my company does not allow me to upgrade. So the question relates to AutoHotKey, version 1.0.47.06.
I am trying to refactor my 400-line program by separating it into functions.
CaseNumberArray := "" ; The array to store all the case numbers
CaseNumberArrayCount := 0
; Helper function to load the case number into the array
ReadInputFile() {
Loop, Read, U:\case.txt
{
global CaseNumberArrayCount
CaseNumberArrayCount += 1 ; Increment the ArrayCount
CaseNumberArray%CaseNumberArrayCount% := A_LoopReadLine
current := CaseNumberArray%CaseNumberArrayCount%
}
}
CreateOutputHeader()
ReadInputFile()
MsgBox, There are %CaseNumberArrayCount% case(s) in the file.
Loop, %CaseNumberArrayCount%
{
case_number := CaseNumberArray%A_Index%
MsgBox, %case_number%
}
The last part of the code tests whether I can retrieve the case numbers I loaded into the array named CaseNumberArray, but the output is currently all blank.
I studied this question, in which the author user1944441 wrote:
Important: YourArray must not be global and the counter in
YourArray%counter% must not be global, the rest doesn't matter.
I experimented by placing the global declarations in different locations, but it still does not work. I know that CaseNumberArrayCount is correctly stored, and the Read loop works as well (when it is outside of a function). Is it possible to separate this code into a function?
Usually, global/local declarations are placed right below the function header, not somewhere in a subsequent code block. After all, these declarations apply to the entire function.
You have to distinguish between simple loop counter variables and variables holding the actual size of the array. In your code, CaseNumberArrayCount describes the size of CaseNumberArray whereas in the answer to which you're referring, it's a counter only used to iterate over the array, which might as well be local.
But you don't have to use two "variables" anyway. Your pseudo array (which can be accessed like CaseNumberArray1, CaseNumberArray2, CaseNumberArray3, ...) has an unused CaseNumberArray0, so why not store the size there?
A pseudo array is actually a collection of sequentially numbered variables. global CaseNumberArray (which by the way you didn't seem to try) will only allow access to the variable named CaseNumberArray, but not CaseNumberArray1 or CaseNumberArray2 and so on.
One solution would be to use Assume-global mode which makes every global variable accessible by default:
; Now, CaseNumberArray0 will hold the array length,
; rendering CaseNumberArrayCount unnecessary
CaseNumberArray0 := 0
; Helper function to load the case number into the array
ReadInputFile() {
; We want to access every global variable we have,
; beware of name conflicts within your function!
global
Loop, Read, test.txt
{
CaseNumberArray0 += 1
CaseNumberArray%CaseNumberArray0% := A_LoopReadLine
}
}
; Here's an alternative: Let AHK build the pseudo array!
ReadInputFileAlternative() {
global caseAlt0
FileRead, fileCont, test.txt
StringSplit, caseAlt, fileCont, `n, `r
}
ReadInputFile()
out := ""
Loop, %CaseNumberArray0%
{
out .= CaseNumberArray%A_Index% "`n"
}
MsgBox, There are %CaseNumberArray0% case(s) in the file:`n`n%out%
; Now, let's test the alternative!
ReadInputFileAlternative()
out := ""
Loop, %caseAlt0%
{
out .= caseAlt%A_Index% "`n"
}
MsgBox, There are %caseAlt0% case(s) in the alternative pseudo-array:`n`n%out%
Edit: "Real Arrays"
As suggested in the comments, here's what I would do instead: I would convince my boss to allow the use of an up-to-date version of AHK and then work with real arrays. This comes with several benefits:
Real arrays are fully managed by AHK, which means that things like inserting, removing, iterating and indexing can all automagically be done by AHK.
A real array resides in one real variable, meaning that you can pass it along functions and anywhere you want, without having to worry about the current scope and whether you can access it in the first place.
The array syntax is very similar to most other languages, making your code intuitive and easier to read. And maybe it helps you in the future when dealing with another language.
Primitive n-dimensional arrays (and primitive AHK objects in general) can be expressed using JSON. This provides you with an easy way to (de-)serialize AHK objects.
The following code snippet shows the two methods used above (reading loop and splitting), but with real arrays. You will notice that we don't need any global declarations anymore, since we now can declare the array inside our function, and simply pass it back to the caller. In my opinion, this is what functions should really look like: A "black box" that doesn't affect its surroundings.
; Method 1: Line by line
ReadLineByLine(file) {
out := []
Loop, Read, % file
{
out.Insert(A_LoopReadLine)
}
return out
}
; Method 2: StrSplit
ReadAndSplit(file) {
FileRead, fileCont, % file
return StrSplit(fileCont, "`n", "`r")
}
caseNumbers := ReadLineByLine("test.txt")
out := "ReadLineByLine() yields " caseNumbers.MaxIndex() " entries:`n`n"
; using the for loop
for idx, caseNumber in caseNumbers
{
out .= caseNumber "`n"
}
MsgBox % out
caseNumbers := ReadAndSplit("test.txt")
out := "ReadAndSplit() yields " caseNumbers.MaxIndex() " entries:`n`n"
; using the normal loop
Loop % caseNumbers.MaxIndex()
{
out .= caseNumbers[A_Index] "`n"
}
MsgBox % out
MsgBox % "The second item is " caseNumbers[2]
I think I'm incorrectly accessing and assigning variables from a matrix. I understand that arrays, matrices, and tables are different in R. What I want to end up with is an array of values called "c" that holds either a 1 or a 2, assigning each element of input to either Mew (number 1) or Mewtwo (number 2). I also want the distances from Mew to all other points in an array called dMew, as well as dMewtwo, an array of the distances from Mewtwo to all other elements in input. What I end up with is NA_real_ for all variables except input. There is a lot of great information on accessing rows or columns of various data structures in R, but I'm interested in accessing single elements. Any advice would be most helpful. I apologize if this has been answered before, but I couldn't find it anywhere.
#Read input from a csv data file
input = read.csv("~/Desktop/Engineering/iris.csv",head=FALSE)
input = input[c(0:3)]
input = as.matrix(input)
#set random centroids
Mew = input[1,1]
Mewtwo = input[nrow(input),ncol(input)]
#Determine Distance
dist <- function(x, y) {
return(sqrt((x - y)^2))
}
#Determine the clusters
dMew = matrix(,nrow(input), ncol(input))
dMewtwo = matrix(,nrow(input), ncol(input))
c = matrix(,nrow(input), ncol(input))
for (i in 1:nrow(input)) {
for (j in 1:ncol(input)) {
dMew[i,j] = dist(Mew, input[i,j])
dMewtwo[i,j] = dist(Mewtwo, input[i,j])
if (dMew[i,j] > dMewtwo[i,j]) {
c[i,j] = 2
} else {
c[i,j] = 1
}
}
}
#Update the centroids
Mew = mean(dMew)
Mewtwo = mean(dMewtwo)
I have no problem running the code with the following input:
input = data.frame(V1=1:5,V2=1:5,V3=1:5)
so it seems to be a problem related to your data. Also, you should avoid using "c" as a variable name, and note that dist() is already a function in the stats package. You can also avoid the for-loops by using apply() and ifelse():
#Read input from a csv data file
input = data.frame(V1=1:5,V2=1:5,V3=1:5)
input = input[c(1:3)]
input = as.matrix(input)
#set random centroids
Mew = input[1,1]
Mewtwo = input[nrow(input),ncol(input)]
#Determine Distance
dist.eu <- function(x, y) {
return(sqrt((x - y)^2))
}
dMew<-apply(input,c(1,2),dist.eu,Mew)
dMewtwo<-apply(input,c(1,2),dist.eu,Mewtwo)
c.mat<-ifelse(dMew > dMewtwo,2,1)
c.mat
I have a Perl script that executes a recursive function. Within it, two elements of a two-dimensional array are compared:
I call the routine with a 2D array @data and 0 as a starting value. First I load the parameters into a separate 2D array @test.
Then I want to check whether the array contains only one element --> compare whether the last element == the first. And this is where the error occurs: Modification of non-creatable array value attempted, subscript -1.
You tried to make an array value spring into existence, and the subscript was probably negative, even counting from end of the array backwards.
This didn't help me much... I'm pretty sure it has to do with the "$counter-1" subscript in the if clause, but I don't know what exactly. I hope you guys can help me!
routine(@data,0);
sub routine {
my @test; #(2d-Array)
my $counter = $_[-1];
for(my $c=0; $_[$c] ne $_[-1]; $c++){
for (my $j=0; $j<13;$j++){ #Each element has 13 other elements
$test[$c][$j] = $_[$c][$j];
}
}
if ($test[$counter-1][1] eq $test[-1][1]){
$puffertime = $test[$counter][4];
}
else{
for (my $l=0; $l<=$counter;$l++){
$puffertime+= $test[$l][4]
}
}
}
#
#
#
if ($puffertime <90){
if($test[$counter][8]==0){
$counter++;
routine(#test,$counter);
}
else{ return (print"false");}
}
else{return (print "true");}
The weird thing is that I tried it out this morning and it worked. After running for a short time, it came up with this error message again. It might be that I haven't caught some error constellation, which could happen with the dynamic database entries.
Your routine() function would be easier to read if it starts off like this:
sub routine {
my @data = @_;
my $counter = pop(@data);
my @test;
for(my $c=0; $c <= $#data; $c++){
for (my $j=0; $j<13;$j++){ #Each element has 13 other elements
$test[$c][$j] = $data[$c][$j];
}
}
You can check whether @data has only one element by doing scalar(@data) == 1 or $#data == 0. From your code snippet, I do not see why you need to copy the data passed to routine() into @test; it seems superfluous. You can just as well skip all this copying if you are not going to modify any of the data passed to your routine.
Your next code might look like this:
if ($#test == 0) {
$puffertime = $test[0][4];
} else {
for (my $l=0; $l <= $counter; $l++) {
$puffertime += $test[$l][4];
}
}
But if your global variable $puffertime was initialized to zero then you can replace this code with:
for (my $l=0; $l <= $counter; $l++) {
$puffertime += $test[$l][4];
}
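Finally, a hedged sketch of the reference-based variant mentioned above: if routine() is not meant to modify the data, passing an array reference avoids the copy entirely. (The names, the 0-based indexing, and returning the value instead of using the global $puffertime are my assumptions, not code from the question.)
routine(\@data, 0);   # pass a reference instead of flattening the array

sub routine {
    my ($data, $counter) = @_;   # $data is an array reference; nothing is copied
    my $puffertime = 0;

    if (@$data == 1) {                     # the array holds only one element
        $puffertime = $data->[0][4];
    }
    else {
        $puffertime += $data->[$_][4] for 0 .. $counter;
    }
    return $puffertime;
}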
Relative Performance of Symbol#to_proc in Popular Ruby Implementations states that in MRI Ruby 1.8.7, Symbol#to_proc is slower than the alternative in their benchmark by 30% to 130%, but that this isn't the case in YARV Ruby 1.9.2.
Why is this the case? The creators of 1.8.7 didn't write Symbol#to_proc in pure Ruby.
Also, are there any gems that provide faster Symbol#to_proc performance for 1.8?
(Symbol#to_proc is starting to appear when I use ruby-prof, so I don't think I'm guilty of premature optimization)
The to_proc implementation in 1.8.7 looks like this (see object.c):
static VALUE
sym_to_proc(VALUE sym)
{
return rb_proc_new(sym_call, (VALUE)SYM2ID(sym));
}
Whereas the 1.9.2 implementation (see string.c) looks like this:
static VALUE
sym_to_proc(VALUE sym)
{
static VALUE sym_proc_cache = Qfalse;
enum {SYM_PROC_CACHE_SIZE = 67};
VALUE proc;
long id, index;
VALUE *aryp;
if (!sym_proc_cache) {
sym_proc_cache = rb_ary_tmp_new(SYM_PROC_CACHE_SIZE * 2);
rb_gc_register_mark_object(sym_proc_cache);
rb_ary_store(sym_proc_cache, SYM_PROC_CACHE_SIZE*2 - 1, Qnil);
}
id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
return aryp[index + 1];
}
else {
proc = rb_proc_new(sym_call, (VALUE)id);
aryp[index] = sym;
aryp[index + 1] = proc;
return proc;
}
}
If you strip away all the busy work of initializing sym_proc_cache, then you're left with (more or less) this:
aryp = RARRAY_PTR(sym_proc_cache);
if (aryp[index] == sym) {
return aryp[index + 1];
}
else {
proc = rb_proc_new(sym_call, (VALUE)id);
aryp[index] = sym;
aryp[index + 1] = proc;
return proc;
}
So the real difference is the 1.9.2's to_proc caches the generated Procs while 1.8.7 generates a brand new one every single time you call to_proc. The performance difference between these two will be magnified by any benchmarking you do unless each iteration is done in a separate process; however, one iteration per-process would mask what you're trying to benchmark with the start-up cost.
The guts of rb_proc_new look pretty much the same (see eval.c for 1.8.7 or proc.c for 1.9.2) but 1.9.2 might benefit slightly from any performance improvements in rb_iterate. The caching is probably the big performance difference.
It is worth noting that the symbol-to-proc cache is a fixed size (67 entries, but I'm not sure where 67 comes from; probably related to the number of operators and such that are commonly used for symbol-to-proc conversions):
id = SYM2ID(sym);
index = (id % SYM_PROC_CACHE_SIZE) << 1;
/* ... */
if (aryp[index] == sym) {
If you use more than 67 symbols as procs or if your symbol IDs overlap (mod 67) then you won't get the full benefit of the caching.
The Rails and 1.9 programming style involves a lot of shorthands like:
ints = strings.collect(&:to_i)
sum  = ints.inject(&:+)
rather than the longer explicit block forms:
ints = strings.collect { |s| s.to_i }
sum = ints.inject(0) { |s,i| s += i }
Given that (popular) programming style, it makes sense to trade memory for speed by caching the lookup.
You're not likely to get a faster implementation from a gem as the gem would have to replace a chunk of the core Ruby functionality. You could patch the 1.9.2 caching into your 1.8.7 source though.
The following ordinary Ruby code:
if defined?(RUBY_ENGINE).nil? # No RUBY_ENGINE means it's MRI 1.8.7
class Symbol
alias_method :old_to_proc, :to_proc
# Class variables are considered harmful, but I don't think
# anyone will subclass Symbol
@@proc_cache = {}
def to_proc
@@proc_cache[self] ||= old_to_proc
end
end
end
Will make Ruby MRI 1.8.7 Symbol#to_proc slightly less slow than before, but not as fast as an ordinary block or a pre-existing proc.
However, it'll make YARV, Rubinius and JRuby slower, hence the if around the monkeypatch.
The slowness of using Symbol#to_proc isn't solely due to MRI 1.8.7 creating a proc each time - even if you re-use an existing one, it's still slower than using a block.
Using Ruby 1.8 head
Size Block Pre-existing proc New Symbol#to_proc Old Symbol#to_proc
0 0.36 0.39 0.62 1.49
1 0.50 0.60 0.87 1.73
10 1.65 2.47 2.76 3.52
100 13.28 21.12 21.53 22.29
For the full benchmark and code, see https://gist.github.com/1053502
In addition to not caching procs, 1.8.7 also creates (approximately) one array each time a proc is called. I suspect it's because the generated proc creates an array to accept the arguments - this happens even with an empty proc that takes no arguments.
Here's a script to demonstrate the 1.8.7 behavior. Only the :diff value is significant here, which shows the increase in array count.
# this should really be called count_arrays
def count_objects(&block)
GC.disable
ct1 = ct2 = 0
ObjectSpace.each_object(Array) { ct1 += 1 }
yield
ObjectSpace.each_object(Array) { ct2 += 1 }
{:count1 => ct1, :count2 => ct2, :diff => ct2-ct1}
ensure
GC.enable
end
to_i = :to_i.to_proc
range = 1..1000
puts "map(&to_i)"
p count_objects {
range.map(&to_i)
}
puts "map {|e| to_i[e] }"
p count_objects {
range.map {|e| to_i[e] }
}
puts "map {|e| e.to_i }"
p count_objects {
range.map {|e| e.to_i }
}
Sample output:
map(&to_i)
{:count1=>6, :count2=>1007, :diff=>1001}
map {|e| to_i[e] }
{:count1=>1008, :count2=>2009, :diff=>1001}
map {|e| e.to_i }
{:count1=>2009, :count2=>2010, :diff=>1}
It seems that merely calling a proc will create the array for every iteration, but a literal block only seems to create an array once.
But multi-arg blocks may still suffer from the problem:
plus = :+.to_proc
puts "inject(&plus)"
p count_objects {
range.inject(&plus)
}
puts "inject{|sum, e| plus.call(sum, e) }"
p count_objects {
range.inject{|sum, e| plus.call(sum, e) }
}
puts "inject{|sum, e| sum + e }"
p count_objects {
range.inject{|sum, e| sum + e }
}
Sample output. Note how we incur a double penalty in case #2, because we use a multi-arg block, and also call the proc.
inject(&plus)
{:count1=>2010, :count2=>3009, :diff=>999}
inject{|sum, e| plus.call(sum, e) }
{:count1=>3009, :count2=>5007, :diff=>1998}
inject{|sum, e| sum + e }
{:count1=>5007, :count2=>6006, :diff=>999}