I'm playing around with Haskell and dynamic programming. I have implemented a lot of problems, but in the Fibonacci case I'm getting results that seem to be PC-dependent, and I would like to confirm what's going on.
Assume the following implementations:
1) List:
memoized_fib_list n = fibTab !! n
  where fibTab = map fibm [0 ..]
        fibm 0 = 0
        fibm 1 = 1
        fibm n = fibTab !! (n-2) + fibTab !! (n-1)
2) Array:
memoized_fib_array n = fibTab ! n   -- requires: import Data.Array
  where fibTab = listArray (0, n) [mfib x | x <- [0..n]]
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab ! (x - 1) + fibTab ! (x - 2)
Results (with Criterion):
N = 15,000:
  List implementation:  171.5 μs
  Array implementation: 8.782 ms
N = 100,000:
  List implementation:  2.289 ms
  Array implementation: 195.7 ms
N = 130,000:
  List implementation:  3.708 ms
  Array implementation: 410.4 ms
The tests were run on a notebook with a Core i7 Skylake, 8 GB of DDR4 and an SSD (Ubuntu).
I was expecting the array implementation to be much better; this is the only problem where the list implementation comes out ahead.
Could it be because of the sequential access? On some hardware with lower specs, the list implementation performs worse.
Note: I'm using the latest version of GHC.
Thanks.
Edit:
benchmark n = defaultMain [
    bgroup "fibonacci" [
        bench "memoized_fib_list"  $ whnf memoized_fib_list  n
      , bench "memoized_fib_array" $ whnf memoized_fib_array n
      ]
  ]

main = do
  putStrLn "--------------RUNNING BENCHMARK N=40------------------"
  benchmark 40
  putStrLn "--------------RUNNING BENCHMARK N=15000---------------"
  benchmark 15000
  putStrLn "--------------RUNNING BENCHMARK N=50000---------------"
  benchmark 50000
  putStrLn "--------------RUNNING BENCHMARK N=100000--------------"
  benchmark 100000
  putStrLn "--------------RUNNING BENCHMARK N=130000--------------"
  benchmark 130000
Edit 2: I installed Haskell Platform 8.2.2 on my Windows 10 PC and got very similar results.
Intel i5 6600K, 16 GB DDR4, SSD.
-------------------RUNNING BENCHMARK N=130000------------------------
benchmarking best algo/memoized_fib_list
time 1.818 ms (1.774 ms .. 1.855 ms)
0.993 R² (0.985 R² .. 0.998 R²)
mean 1.853 ms (1.826 ms .. 1.904 ms)
std dev 119.2 μs (84.15 μs .. 191.3 μs)
variance introduced by outliers: 48% (moderately inflated)
benchmarking best algo/memoized_fib_array
time 139.8 ms (63.05 ms .. 221.8 ms)
0.884 R² (0.623 R² .. 1.000 R²)
mean 287.0 ms (221.4 ms .. 353.0 ms)
std dev 83.83 ms (64.91 ms .. 101.6 ms)
variance introduced by outliers: 78% (severely inflated)
Edit 3: Some additional information after running criterion with linear regression. All values correspond to the run with N = 130000.
-Number of garbage collections:
List implementation:
numGcs: NaN R² (NaN R² .. NaN R²)
iters 0.000 (0.000 .. 0.000)
y 0.000 (0.000 .. 0.000)
Array Implementation:
numGcs: 1.000 R² (1.000 R² .. 1.000 R²)
iters 739.000 (739.000 .. 739.000)
y 2.040e-12 (-3.841e-12 .. 2.130e-12)
-Bytes allocated:
List implementation:
allocated: 0.001 R² (0.000 R² .. 0.089 R²)
iters 1.285 (-9.751 .. 13.730)
y 2344.014 (1748.809 .. 2995.439)
Array Implementation:
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.586e8 (7.586e8 .. 7.586e8)
y 1648.000 (1648.000 .. NaN)
-CPU cycles:
List implementation:
cycles: 0.992 R² (0.984 R² .. 0.997 R²)
iters 6759303.406 (6579945.392 .. 6962148.091)
y -141047.582 (-4701325.840 .. 4674847.149)
Array Implementation:
cycles: 1.000 R² (NaN R² .. 1.000 R²)
iters 1.729e9 (1.680e9 .. 1.757e9)
y -3311041.000 (NaN .. 6.513e7)
What's happening here is quite simple: with -O2, GHC decides to make the memoisation list in memoized_fib_list global.
$ ghc -fforce-recomp wtmpf-file4545.hs -O2 -ddump-prep
...
Main.memoizedFib_list :: GHC.Types.Int -> GHC.Integer.Type.Integer
[GblId, Arity=1, Str=<S(S),1*U(U)>, Unf=OtherCon []]
Main.memoizedFib_list
= \ (n_sc61 [Occ=Once] :: GHC.Types.Int) ->
GHC.List.!! # GHC.Integer.Type.Integer Main.main_fibTab n_sc61
...
Main.main_fibTab :: [GHC.Integer.Type.Integer]
[GblId]
Main.main_fibTab
= case Main.$wgo 0# of
{ (# ww1_sc5M [Occ=Once], ww2_sc5N [Occ=Once] #) ->
GHC.Types.: # GHC.Integer.Type.Integer ww1_sc5M ww2_sc5N
}
...
That means your criterion benchmark doesn't actually evaluate the Fibonacci function repeatedly – it just performs repeated lookups in the same global list. Averaged over many evaluations, this gives a very good score, which is however not representative of how fast the calculation actually is.
GHC performs this optimisation in the list implementation because you never need lists of different lengths – it's always the one infinite list of all Fibonacci numbers. That's not possible in the array implementation, so the array can't keep up here.
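To illustrate the effect with a minimal sketch (hypothetical names, not the code GHC actually generates): since the where-bound list never mentions n, the program behaves as if the table had been written as a top-level constant, shared by every call and hence by every benchmark iteration.

-- Roughly what the floated-out version behaves like (illustrative only):
fibTabGlobal :: [Integer]
fibTabGlobal = map fibm [0 ..]
  where fibm 0 = 0
        fibm 1 = 1
        fibm k = fibTabGlobal !! (k-2) + fibTabGlobal !! (k-1)

memoizedFib_global :: Int -> Integer
memoizedFib_global n = fibTabGlobal !! n  -- every call shares the one list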
The simplest way to prevent this globalisation is to make the table explicitly dependent on n, by just trimming it to the needed finite length, like the arrays already are:
memoizedFib_list :: Int -> Integer
memoizedFib_list n = fibTab !! n
  where fibTab = map fibm [0..n]
        fibm 0 = 0
        fibm 1 = 1
        fibm x = fibTab !! (x-2) + fibTab !! (x-1)
With this, the list implementation becomes much slower than the array one, as one would expect seeing as memo lookup is O(n) for lists:
$ ghc -fforce-recomp wtmpf-file4545.hs -O2 && ./wtmpf-file4545
[1 of 1] Compiling Main ( wtmpf-file4545.hs, wtmpf-file4545.o )
Linking wtmpf-file4545 ...
--------------RUNNING BENCHMARK N=40------------------
benchmarking fibonacci/memoizedFib_list
time 10.47 μs (10.42 μs .. 10.51 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 10.40 μs (10.35 μs .. 10.44 μs)
std dev 163.3 ns (122.2 ns .. 225.8 ns)
variance introduced by outliers: 13% (moderately inflated)
benchmarking fibonacci/memoizedFib_array
time 1.618 μs (1.617 μs .. 1.620 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.620 μs (1.618 μs .. 1.623 μs)
std dev 7.521 ns (4.079 ns .. 12.48 ns)
benchmarking fibonacci/memoizedFib_vector
time 1.573 μs (1.572 μs .. 1.574 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.572 μs (1.571 μs .. 1.573 μs)
std dev 2.351 ns (1.417 ns .. 4.040 ns)
--------------RUNNING BENCHMARK N=1500----------------
benchmarking fibonacci/memoizedFib_list
time 18.52 ms (18.41 ms .. 18.68 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 18.65 ms (18.53 ms .. 18.84 ms)
std dev 355.1 μs (204.8 μs .. 592.1 μs)
benchmarking fibonacci/memoizedFib_array
time 135.2 μs (131.2 μs .. 140.1 μs)
0.996 R² (0.991 R² .. 1.000 R²)
mean 132.7 μs (131.9 μs .. 135.0 μs)
std dev 4.463 μs (2.024 μs .. 8.327 μs)
variance introduced by outliers: 32% (moderately inflated)
benchmarking fibonacci/memoizedFib_vector
time 131.8 μs (130.6 μs .. 133.2 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 132.5 μs (131.4 μs .. 134.1 μs)
std dev 4.383 μs (3.463 μs .. 5.952 μs)
variance introduced by outliers: 31% (moderately inflated)
Vector, which I also tested here, performs a bit faster still, but not significantly so. I think as soon as you use a container with O(1) lookup, the performance is dominated by the additions of the pretty huge numbers, so you're really benchmarking GMP rather than anything Haskell-specific.
import qualified Data.Vector as V

memoizedFib_vector :: Int -> Integer
memoizedFib_vector n = fibTab V.! n
  where fibTab = V.generate (n+1) mfib
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab V.! (x - 1) + fibTab V.! (x - 2)
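One way to sanity-check the "it's mostly GMP" claim (my own sketch, not part of the original benchmarks) is to swap Integer for fixed-width Int: the results overflow long before n = 130000, so only the timing is meaningful, but each addition becomes constant-cost.

-- Hypothetical GMP-check variant: Int additions instead of
-- arbitrary-precision Integer (overflows for large n; timing only).
memoizedFibInt_vector :: Int -> Int
memoizedFibInt_vector n = fibTab V.! n
  where fibTab = V.generate (n+1) mfib
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab V.! (x-1) + fibTab V.! (x-2)

If the list/array/vector gap collapses while this version runs far faster overall, that supports the conclusion that bignum arithmetic dominates.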
Related
I want to turn an array of arrays into a matrix. To illustrate, let the array of arrays be:
[ [1,2,3], [4,5,6], [7,8,9]]
I would like to turn this into the 3x3 matrix:
[1 2 3
4 5 6
7 8 9]
How would you do this in Julia?
There are several ways of doing this. For instance, something along the lines of vcat(transpose.(a)...) will work as a one-liner:
julia> a = [[1,2,3], [4,5,6], [7,8,9]]
3-element Vector{Vector{Int64}}:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
julia> vcat(transpose.(a)...)
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
though note that:
1) since your inner arrays are column vectors as written, you need to transpose them all before you can vertically concatenate (vcat) them (either that, or horizontally concatenate and then transpose the whole result, i.e. transpose(hcat(a...))), and
2) the splatting operator ..., which makes this one-liner work, will not be very efficient when applied to Arrays in general, and especially not when applied to larger arrays-of-arrays.
Performance-wise for larger arrays-of-arrays, it will likely be hard to beat preallocating a result of the right size and then simply filling it with a loop, e.g.:
result = similar(first(a), length(a), length(first(a)))
for i = 1:length(a)
    result[i, :] = a[i]  # aside: `=` is slightly faster than `.=` here, though both give the same result in this case
end
Some quick benchmarks for reference:
julia> using BenchmarkTools
julia> @benchmark vcat(transpose.($a)...)
BenchmarkTools.Trial: 10000 samples with 405 evaluations.
Range (min … max): 241.289 ns … 3.994 μs ┊ GC (min … max): 0.00% … 92.59%
Time (median): 262.836 ns ┊ GC (median): 0.00%
Time (mean ± σ): 289.105 ns ± 125.940 ns ┊ GC (mean ± σ): 2.06% ± 4.61%
▁▆▇█▇▆▅▅▅▄▄▄▄▃▂▂▂▃▃▂▂▁▁▁▂▄▃▁▁ ▁ ▁ ▂
████████████████████████████████▇▆▅▆▆▄▆▆▆▄▄▃▅▅▃▄▆▄▁▃▃▃▅▄▁▃▅██ █
241 ns Histogram: log(frequency) by time 534 ns <
Memory estimate: 320 bytes, allocs estimate: 5.
julia> @benchmark for i = 1:length($a)
           $result[i, :] = $a[i]
       end
BenchmarkTools.Trial: 10000 samples with 993 evaluations.
Range (min … max): 33.966 ns … 124.918 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 36.710 ns ┊ GC (median): 0.00%
Time (mean ± σ): 39.795 ns ± 7.566 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▄▄██▄▅▃ ▅▃ ▄▁▂ ▂▁▂▅▂▁ ▄▂▁ ▂
██████████████▇██████▆█▇▆███▆▇███▇▆▆▅▆▅▅▄▄▅▄▆▆▆▄▁▃▄▁▃▄▅▅▃▁▄█ █
34 ns Histogram: log(frequency) by time 77.7 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
In general, filling column-by-column (if possible) will be faster than filling row-by-row as we have done here, since Julia is column-major.
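If the orientation of the result is up to you, a column-wise fill would look something like the following (a sketch of the idea, assuming as above that all inner vectors have equal length; result_t is my own name):

# Fill the transposed layout column-by-column (contiguous writes in
# column-major storage), then transpose lazily once at the end.
result_t = similar(first(a), length(first(a)), length(a))
for j = 1:length(a)
    result_t[:, j] = a[j]
end
result = transpose(result_t)  # rows of `result` are the original inner arrays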
Expanding on @cbk's answer, another (slightly more efficient) one-liner is:
julia> transpose(reduce(hcat, a))
3×3 transpose(::Matrix{Int64}) with eltype Int64:
1 2 3
4 5 6
7 8 9
For this particular example you can also write the matrix down directly:
[1 2 3; 4 5 6; 7 8 9]
# or
reshape(1:9, 3, 3)'  # remember that ' takes the transpose of a Matrix
* (defparameter lst (make-list 1000))
LST
* (time (loop for x in lst
              for i from 0
              unless (= i 500)
              collect x))
Evaluation took:
0.000 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
100.00% CPU
47,292 processor cycles
0 bytes consed
How does SBCL build the return list with 0 bytes consed?
Your test case is too small for time to register the allocation. Try (defparameter lst (make-list 100000)):
Evaluation took:
0.003 seconds of real time
0.003150 seconds of total run time (0.002126 user, 0.001024 system)
100.00% CPU
8,518,420 processor cycles
1,579,472 bytes consed
I have a function which returns a 2-element Array{Float64,1}:
2-element Array{Float64,1}:
0.809919
2.00754
I now want to sample it efficiently and store all the results in an array with 2 rows and n columns. The problem is that I get a vector of vectors. How can I flatten it, or construct the matrix directly?
A toy example is the following:
julia> [rand(2) for i=1:3]
3-element Array{Array{Float64,1},1}:
[0.906644, 0.614673]
[0.426492, 0.67645]
[0.473704, 0.726284]
julia> [rand(2)' for i=1:3]
3-element Array{RowVector{Float64,Array{Float64,1}},1}:
[0.403384 0.431918]
[0.410625 0.546614]
[0.224933 0.118778]
And I would like to have the result in a form like this:
julia> [rand(2) rand(2) rand(2)]
2×3 Array{Float64,2}:
0.360833 0.205969 0.209643
0.507417 0.317295 0.588516
Actually my dream would be:
julia> [rand(2) rand(2) rand(2)]'
3×2 Array{Float64,2}:
0.0320955 0.821869
0.358808 0.26685
0.230355 0.31273
Any ideas? I know that I could construct it via a for loop, but was looking for a more efficient way.
Thanks!
RecursiveArrayTools.jl has a VectorOfArray type which dispatches in the way you'd want:
julia> using RecursiveArrayTools
julia> A = [rand(2) for i=1:3]
3-element Array{Array{Float64,1},1}:
[0.957228, 0.104218]
[0.293985, 0.83882]
[0.788157, 0.454772]
julia> VectorOfArray(A)'
3×2 Array{Float64,2}:
0.957228 0.104218
0.293985 0.83882
0.788157 0.454772
As for timing:
julia> @benchmark VectorOfArray(A)'
BenchmarkTools.Trial:
memory estimate: 144 bytes
allocs estimate: 2
--------------
minimum time: 100.658 ns (0.00% GC)
median time: 111.740 ns (0.00% GC)
mean time: 127.159 ns (3.29% GC)
maximum time: 1.360 μs (82.71% GC)
--------------
samples: 10000
evals/sample: 951
VectorOfArray itself adds almost no overhead, and the ' uses Cartesian indexing to be fast.
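If a plain Matrix is needed downstream, the lazy wrapper can also be materialised; assuming the convert method RecursiveArrayTools provides, something like:

using RecursiveArrayTools

A = [rand(2) for i = 1:3]
M = convert(Array, VectorOfArray(A))  # 2×3 Matrix; copies the data once
M'                                    # the 3×2 layout shown above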
Something along these lines
using BenchmarkTools

function createSample!(vec::AbstractVector)
    vec .= randn(length(vec))
    return vec
end

function createSamples!(A::Matrix)
    for row in indices(A, 1)   # in Julia ≥ 0.7 this is spelled axes(A, 1)
        createSample!(view(A, row, :))
    end
    return A
end

A = zeros(10, 2)
@benchmark createSamples!(A)
might help. The timing on my laptop gives:
Main> @benchmark createSamples!(A)
BenchmarkTools.Trial:
memory estimate: 1.41 KiB
allocs estimate: 20
--------------
minimum time: 539.104 ns (0.00% GC)
median time: 581.194 ns (0.00% GC)
mean time: 694.601 ns (13.34% GC)
maximum time: 10.324 μs (90.10% GC)
--------------
samples: 10000
evals/sample: 193
I have a data structure that I have loaded in from json that resembles the below
json_in =
[ Dict("customer" => "cust1", "transactions" => 1:10^6)
, Dict("customer" => "cust2", "transactions" => 1:10^6)
, Dict("customer" => "cust3", "transactions" => 1:10^6)]
I know of two methods to collapse the transactions into one array
@time methodA = reduce(vcat, [cust["transactions"] for cust in json_in])
@time methodB = vcat(json_in[1]["transactions"], json_in[2]["transactions"], json_in[3]["transactions"])
However, methodA takes ~0.22 s vs ~0.02 s for methodB on my computer. I intend to perform this thousands of times, so 10x faster performance is a big deal.
I see methodB is not very robust, as it can only deal with exactly 3 Dicts (customers), so even though it's performant it doesn't generalise.
What would be the most efficient way to concatenate arrays that are elements in an array of Dicts?
As @Gnimuc states in his comment, you should not benchmark in global scope, and benchmarks are best done using BenchmarkTools.jl - here are the timings done right:
julia> methodA(json_in) = reduce(vcat,[cust["transactions"] for cust in json_in])
methodA (generic function with 1 method)
julia> methodB(json_in) = vcat(json_in[1]["transactions"],json_in[2]["transactions"],json_in[3]["transactions"])
methodB (generic function with 1 method)
# @Gnimuc's syntax from his comment
julia> methodC(json_in) = mapreduce(x->x["transactions"], vcat, json_in)
methodC (generic function with 1 method)
julia> using BenchmarkTools
julia> @benchmark methodA(json_in)
BenchmarkTools.Trial:
memory estimate: 38.15 MiB
allocs estimate: 15
--------------
minimum time: 10.584 ms (3.10% GC)
median time: 14.781 ms (32.02% GC)
mean time: 15.112 ms (32.19% GC)
maximum time: 69.341 ms (85.28% GC)
--------------
samples: 331
evals/sample: 1
julia> @benchmark methodB(json_in)
BenchmarkTools.Trial:
memory estimate: 22.89 MiB
allocs estimate: 2
--------------
minimum time: 5.921 ms (5.92% GC)
median time: 8.402 ms (32.48% GC)
mean time: 8.701 ms (33.46% GC)
maximum time: 69.268 ms (91.09% GC)
--------------
samples: 574
evals/sample: 1
julia> @benchmark methodC(json_in)
BenchmarkTools.Trial:
memory estimate: 38.15 MiB
allocs estimate: 12
--------------
minimum time: 10.599 ms (3.37% GC)
median time: 14.843 ms (32.12% GC)
mean time: 15.228 ms (32.24% GC)
maximum time: 71.954 ms (85.95% GC)
--------------
samples: 328
evals/sample: 1
Method B is still about twice as fast. That is exactly because it is more specialised: it works on an array with exactly three elements.
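If you want method B's single-vcat behaviour without hard-coding three customers, one middle ground (my own sketch, not among the benchmarks above) is to splat a generator, so vcat still sees all the arrays in one call:

# Generalises methodB to any number of customers; note that splatting
# very many arguments has its own compilation/runtime cost.
methodD(json_in) = vcat((cust["transactions"] for cust in json_in)...)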
An alternative solution that might work well here is to use a MappedArray, which creates a lazy view into the original array:
using MappedArrays
method4(json_in) = mappedarray(x->x["transactions"], json_in)
Of course this doesn't concatenate the arrays, but you can concatenate views using the CatView package:
using CatViews
julia> method5(json_in) = reduce(CatView, mappedarray(x->x["transactions"], json_in))
method5 (generic function with 1 method)
julia> @benchmark method5(json_in)
BenchmarkTools.Trial:
memory estimate: 1.73 KiB
allocs estimate: 46
--------------
minimum time: 23.320 μs (0.00% GC)
median time: 23.916 μs (0.00% GC)
mean time: 25.466 μs (0.00% GC)
maximum time: 179.092 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
Because it doesn't allocate, it is about 300x faster than method B (though it's possible the result is slower to use because of non-locality - worth benchmarking).
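If downstream code ends up needing a contiguous vector after all, the lazy view can simply be collected - which of course reintroduces the copy, so the allocation advantage disappears. A sketch:

# collect walks the CatView once and copies it into a plain Vector
method5_eager(json_in) = collect(method5(json_in))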
Thanks for the help. After some research I came up with the idea of inline-expanding the code using macros; see the code below. It performs pretty well in the benchmarks (run on JuliaBox.com, 21 Sep 2017):
macro inline_vcat(a)
    quote
        astr = $(string(a))
        s = reduce(string, string(astr, "[", aa, "][\"transactions\"],") for aa in 1:length($a))
        string("vcat(", s[1:(end-1)], ")")
    end
end

methodE(json_in) = (@inline_vcat json_in) |> parse |> eval

using BenchmarkTools
@benchmark methodE(json_in)
One shortcoming of this method is that with a large number (~1 million) of customers in the JSON, the generated code will be long, and parsing it would take a long time as well, I assume. Hence it's probably not a good idea for large datasets.
Is it possible to interleave two arrays in Julia? For example, if a = [1:10] and b = [11:20], I want to be able to return:
20-element Array{Int64,1}:
1
11
2
12
3
13
4
14
.
.
.
Somewhat similar to what Ruby can do: Merge and interleave two arrays in Ruby.
There is a straightforward way to do this without needing to use the reshape() function. In particular, we can just bind the vectors into a matrix and then use [:] on the transpose of that matrix. For example:
julia> a = 1:10
julia> b = 11:20
julia> [a b]'[:]
20-element Array{Int64,1}:
1
11
2
12
3
13
.
.
.
20
Taking the transpose of the matrix [a b] gives us a 2-by-10 matrix, and then [:] returns all of its elements as a vector. The reason [:] works so nicely here is that Julia uses column-major ordering.
Figured it out!
reshape([a b]',20,1)
and for something more general:
reshape([a b].',size(a,1)+size(b,1),1)
and we can use a hack to get a true 1-D vector instead of an n×1 matrix:
reshape([a b].',size(a,1)+size(b,1),1)[:]
You could just use
reshape([a b].', length(a)+length(b))
to get a vector.
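Note that the .' syntax used above was removed in Julia 1.0; a modern spelling of the same trick (interleave is my own name here) is:

# permutedims materialises the transpose; vec then reads the elements
# out in column-major order, which interleaves the two inputs.
interleave(a, b) = vec(permutedims([a b]))

interleave(1:10, 11:20)  # => [1, 11, 2, 12, …, 10, 20]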
julia> #benchmark collect(Iterators.flatten(zip(a,b))) setup = begin a=rand(100); b=rand(100) end
BenchmarkTools.Trial: 10000 samples with 714 evaluations.
Range (min … max): 190.895 ns … 1.548 μs ┊ GC (min … max): 0.00% … 65.85%
Time (median): 238.843 ns ┊ GC (median): 0.00%
Time (mean ± σ): 265.428 ns ± 148.757 ns ┊ GC (mean ± σ): 8.40% ± 11.75%
▅▅██▅▃▂▂▁ ▁▁ ▂
██████████▇█▇▇▅▄▄▃▁▁▁▃▄█▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▃▃▁▁▁▁▁▃▁▃▄▆▇███ █
191 ns Histogram: log(frequency) by time 1.11 μs <
Memory estimate: 1.77 KiB, allocs estimate: 1.
it seems that
collect(Iterators.flatten(zip(a,b)))
is much faster
For completeness, expanding @bdeonovic's solution to 2-dimensional arrays.
julia> a
2×2 Array{Int64,2}:
1 2
3 4
julia> b
2×2 Array{Int64,2}:
6 7
8 9
Interweaving rows:
julia> reshape([a[:] b[:]]', 4, 2)
4×2 Array{Int64,2}:
1 2
6 7
3 4
8 9
Interweaving columns:
julia> reshape( [a' b']', 2, 4 )
2×4 Array{Int64,2}:
1 6 2 7
3 8 4 9
Interweaving arrays (stacking/vcatting):
julia> reshape([a' b']', 4, 2)
4×2 Array{Int64,2}:
1 2
3 4
6 7
8 9