Consider the following Vector:
numbers = Int32[1,2,3,4,5,6,7,8,9,10]
If I want to create a 2x5 matrix with the result:
1 2 3 4 5
6 7 8 9 10
I can't use reshape(numbers,2,5) or else I'll get:
1 3 5 7 9
2 4 6 8 10
Using slicing or view(), you can extract the top and bottom rows, convert each one to a matrix row, and then combine them with vcat().
I'm not saying slicing or view() is the only or best way to do it; perhaps there is a faster way using reshape() that I just haven't figured out.
numbers = Int32[1,2,3,4,5,6,7,8,9,10]
println("Using Slice:")
@time numbers_slice_matrix_top = permutedims(numbers[1:5])
@time numbers_slice_matrix_bottom = permutedims(numbers[6:10])
@time vcat(numbers_slice_matrix_top,numbers_slice_matrix_bottom)
println("Using view():")
@time numbers_view_matrix_top = permutedims(view(numbers,1:5))
@time numbers_view_matrix_bottom = permutedims(view(numbers,6:10))
@time vcat(numbers_view_matrix_top,numbers_view_matrix_bottom)
Output:
Using Slice:
0.026763 seconds (5.48 k allocations: 329.155 KiB, 99.78% compilation time)
0.000015 seconds (3 allocations: 208 bytes)
0.301833 seconds (177.09 k allocations: 10.976 MiB, 93.30% compilation time)
Using view():
0.103084 seconds (72.25 k allocations: 4.370 MiB, 99.90% compilation time)
0.000011 seconds (2 allocations: 112 bytes)
0.503787 seconds (246.63 k allocations: 14.537 MiB, 99.85% compilation time)
Why is slice faster? In a few rare cases view() was faster, but not by much.
From view() documentation:
For example, if x is an array and v = @view x[1:10], then v acts like
a 10-element array, but its data is actually accessing the first 10
elements of x. Writing to a view, e.g. v[3] = 2, writes directly to
the underlying array x (in this case modifying x[3]).
I don't know enough, but from my understanding, because view() has to convert the Vector to a matrix row (the original Vector) through another array (the view()), it's slower. Using slice we create a copy and don't have to worry about manipulating the original Vector.
Your results actually show that view is faster, not slicing. Only the second measurement in each group reflects the time to run the code; measurements 1 and 3 are dominated by compilation time (note the 93% to 99% compilation time they report).
This is a common misunderstanding about how to run benchmarks in Julia. When a Julia function is run for the first time, it has to be compiled to native code. In production code compile times normally do not matter, because you compile once, for a fraction of a second, and then run computations for many minutes, hours or days.
Moreover, your code uses a global variable, so in such a microbenchmark you are also measuring how long it takes to resolve the type of a global variable, which is slow in Julia and is not something production code should rely on.
Here is the correct way to run the benchmark using BenchmarkTools:
julia> @btime vcat(permutedims($numbers[1:5]),permutedims($numbers[6:10]));
202.326 ns (7 allocations: 448 bytes)
julia> @btime vcat(permutedims(view($numbers,1:5)),permutedims(view($numbers,6:10)));
88.736 ns (1 allocation: 96 bytes)
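Note the interpolation symbol $, which makes BenchmarkTools treat numbers as a value of known type instead of an untyped global.
reshape(numbers, 5, 2)' can also be used to create the desired 2x5 matrix. The result is a lazy Adjoint wrapper around the reshaped data; permutedims(reshape(numbers, 5, 2)) gives a plain Matrix instead.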
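For readers coming from row-major languages, here is a small Python sketch of why reshaping to 5x2 and then transposing works. It only mimics Julia's column-major fill order on plain lists; the helper names reshape_column_major and transpose are made up for this illustration and are not Julia or library functions.

```python
def reshape_column_major(values, rows, cols):
    """Fill a rows x cols matrix column by column, the way Julia's reshape does."""
    if rows * cols != len(values):
        raise ValueError("dimensions do not match the number of elements")
    return [[values[r + c * rows] for c in range(cols)] for r in range(rows)]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


numbers = list(range(1, 11))

# Direct 2x5 reshape interleaves the values, as in the question.
interleaved = reshape_column_major(numbers, 2, 5)

# Reshaping to 5x2 first and transposing yields the row-wise layout.
row_wise = transpose(reshape_column_major(numbers, 5, 2))
```

Here interleaved reproduces the unwanted 1 3 5 7 9 / 2 4 6 8 10 layout, while row_wise holds 1..5 in the first row and 6..10 in the second.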
I have computed the intersection of two lists using intersection in Prolog. How can I do this manually, without using 'intersection'?
intersection([],[],[]).
intersection(_,[], []).
% Output must be a variable (capitalised); a lowercase 'output' is an atom and never unifies with a list.
intersection(List1, [Head2|Tail2], Output) :-
    \+ member(Head2, List1), intersection(List1, Tail2, Output).
intersection(List1, [Head2|Tail2], [Head2|Output]) :-
    member(Head2, List1), intersection(List1, Tail2, Output).
In fact, depending on the compiler used, it is possible to create a predicate to compute set intersection that is more efficient than the predefined predicate intersection/3. In SWI-Prolog, version 8.2.1, I've obtained the following results:
intersection1(A, B, I) :-
setup_call_cleanup(
dynamic('$elem'/1),
( maplist([X] >> assertz('$elem'(X)), A),
convlist([X,X] >> retract('$elem'(X)), B, I)),
abolish('$elem'/1)).
intersection2(A, B, I) :-
setup_call_cleanup(
dynamic('$elem'/1),
( forall(member(X,A), assertz('$elem'(X))),
findall(X, (member(X,B), retract('$elem'(X))), I)),
abolish('$elem'/1)).
test_intersection(N) :-
M is 10*N,
randseq(N, M, A),
randseq(N, M, B),
time(intersection( A, B, I0)),
time(intersection1(A, B, I1)),
time(intersection2(A, B, I2)),
sort(I0, S),
sort(I1, S),
sort(I2, S).
Some tests:
?- intersection([1,3,4,7,8,9], [0,2,3,4,6,9], I).
I = [3, 4, 9].
?- intersection1([1,3,4,7,8,9], [0,2,3,4,6,9], I).
I = [3, 4, 9].
?- intersection2([1,3,4,7,8,9], [0,2,3,4,6,9], I).
I = [3, 4, 9].
Execution time for bigger sets:
?- test_intersection(10000).
% 20,992 inferences, 2.875 CPU in 2.875 seconds (100% CPU, 7302 Lips)
% 70,013 inferences, 0.016 CPU in 0.016 seconds (100% CPU, 4480832 Lips)
% 41,014 inferences, 0.016 CPU in 0.016 seconds (100% CPU, 2624896 Lips)
true.
?- test_intersection(20000).
% 42,053 inferences, 11.203 CPU in 11.196 seconds (100% CPU, 3754 Lips)
% 140,013 inferences, 0.031 CPU in 0.031 seconds (100% CPU, 4480416 Lips)
% 82,075 inferences, 0.031 CPU in 0.031 seconds (100% CPU, 2626400 Lips)
true.
?- test_intersection(40000).
% 83,948 inferences, 47.563 CPU in 47.645 seconds (100% CPU, 1765 Lips)
% 280,013 inferences, 0.078 CPU in 0.070 seconds (112% CPU, 3584166 Lips)
% 163,970 inferences, 0.078 CPU in 0.078 seconds (100% CPU, 2098816 Lips)
true.
As we can see, doubling the size of the sets, the execution times of intersection1/3 and intersection2/3 almost double as well [i.e., time complexity is O(n)], while the execution time of the predefined predicate intersection/3 is approximately four times larger [i.e., time complexity is O(n^2)].
Of course, as already said, that will depend on the compiler used.
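The same linear-versus-quadratic effect can be sketched outside Prolog. The Python snippet below is only an illustration, assuming hashable elements and duplicate-free input lists; both function names are hypothetical. The first version scans the second list for every element, like a member/2-based definition; the second builds a hash set once, which plays the role of the asserted '$elem'/1 facts.

```python
def intersection_quadratic(a, b):
    """Keep the elements of a that occur in b; each 'in' test scans the list b."""
    return [x for x in a if x in b]


def intersection_hashed(a, b):
    """Same result, but membership is tested against a hash set built once."""
    seen = set(b)  # assumes the elements are hashable
    return [x for x in a if x in seen]


a = [1, 3, 4, 7, 8, 9]
b = [0, 2, 3, 4, 6, 9]
slow = intersection_quadratic(a, b)
fast = intersection_hashed(a, b)
```

Both calls return the same list for the example sets used above; only the amount of work per membership test differs.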
Here's my approach:
The idea is to compare each element of one list with each element of the other list. If H1=H2 then the element belongs to the intersection and is put in the new list; otherwise it is skipped. Because each element of the first list contributes its own (possibly empty) sublist, we use flatten to return the final answer without the empty brackets.
intersection([H|T],[H2|T2],List):-
intersection1([H|T],[H2|T2],R),
flatten(R,List).
intersection1([],_,[]).
intersection1([H|T],[H2|T2],[R|List]):-
intersection2(H,[H2|T2],R),
intersection1(T,[H2|T2],List).
intersection2(_,[],[]).
intersection2(H1,[H2|T2],[H1|L]):-
H1=H2,
intersection2(H1,T2,L).
intersection2(H1,[H2|T2],L):-
H1\=H2,
intersection2(H1,T2,L).
Examples:
?-intersection([12, 3, 9,4,1],[6,7,4,8],List).
List = [4]
?-intersection([6,4,8,2],[6,7,4,8],List).
List = [6, 4, 8]
?-intersection([12,6,4],[5,2,1],List).
List = []
?-intersection([12,6,4,7,88],[88,5,7,2,1],List).
List = [7, 88]
I'm playing around with Haskell and dynamic programming.
I have implemented a lot of problems, but in the Fibonacci case I'm getting some results that seem to be machine-dependent, and I would like to confirm them.
Assume the following implementations:
1)- List:
memoized_fib_list n = fibTab !! n
  where fibTab = map fibm [0 ..]
        fibm 0 = 0
        fibm 1 = 1
        fibm n = fibTab !! (n-2) + fibTab !! (n-1)
2)- Array:
memoized_fib_array n = fibTab ! n
  where fibTab = listArray (0, n) [mfib x | x <- [0..n]]
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab ! (x - 1) + fibTab ! (x - 2)
Result (with Criterion):
N = 15.000:
List implementation: 171.5 μs
Array implementation: 8.782 ms
N = 100.000:
List implementation: 2.289 ms
Array implementation: 195.7 ms
N = 130.000:
List implementation: 3.708 ms
Array implementation: 410.4 ms
The tests were run on a Notebook with a Core i7 Skylake, 8gb DDR4 and SSD (Ubuntu).
I was expecting the array implementation to be much better, and this was the only problem where the list implementation is better.
Could it be because of the sequential access? On some hardware with lower specs the list implementation has worse performance.
Note: I'm using the last (edit: latest) version of GHC.
Thanks.
Edit:
benchmark n = defaultMain [
bgroup "fibonacci" [
bench "memoized_fib_list" $ whnf (memoized_fib_list) n
, bench "memoized_fib_array" $ whnf (memoized_fib_array) n
]
]
main = do
{
putStrLn "--------------EJECUTANDO BENCHMARK N=40------------------";
benchmark 40;
putStrLn "--------------EJECUTANDO BENCHMARK N=15000---------------";
benchmark 15000;
putStrLn "--------------EJECUTANDO BENCHMARK N=50000---------------";
benchmark 50000;
putStrLn "--------------EJECUTANDO BENCHMARK N=100000--------------";
benchmark 100000;
putStrLn "--------------EJECUTANDO BENCHMARK N=130000--------------";
benchmark 130000;
}
Edit2: I installed Haskell Platform 8.2.2 on my windows 10 PC and got very similar results.
Intel i5 6600K, 16gb DDR4, SSD.
-------------------EJECUTANDO BENCHMARK N=130000------------------------
benchmarking best algo/memoized_fib_list
time 1.818 ms (1.774 ms .. 1.855 ms)
0.993 R² (0.985 R² .. 0.998 R²)
mean 1.853 ms (1.826 ms .. 1.904 ms)
std dev 119.2 μs (84.15 μs .. 191.3 μs)
variance introduced by outliers: 48% (moderately inflated)
benchmarking best algo/memoized_fib_array
time 139.8 ms (63.05 ms .. 221.8 ms)
0.884 R² (0.623 R² .. 1.000 R²)
mean 287.0 ms (221.4 ms .. 353.0 ms)
std dev 83.83 ms (64.91 ms .. 101.6 ms)
variance introduced by outliers: 78% (severely inflated)
Edit3: Some additional information after running criterion with Linear Regression. All the values correspond to the execution with N = 130000.
-Number of garbage collections:
List implementation:
numGcs: NaN R² (NaN R² .. NaN R²)
iters 0.000 (0.000 .. 0.000)
y 0.000 (0.000 .. 0.000)
Array Implementation:
numGcs: 1.000 R² (1.000 R² .. 1.000 R²)
iters 739.000 (739.000 .. 739.000)
y 2.040e-12 (-3.841e-12 .. 2.130e-12)
-Bytes allocated:
List implementation:
allocated: 0.001 R² (0.000 R² .. 0.089 R²)
iters 1.285 (-9.751 .. 13.730)
y 2344.014 (1748.809 .. 2995.439)
Array Implementation:
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.586e8 (7.586e8 .. 7.586e8)
y 1648.000 (1648.000 .. NaN)
-CPU cycles:
List implementation:
cycles: 0.992 R² (0.984 R² .. 0.997 R²)
iters 6759303.406 (6579945.392 .. 6962148.091)
y -141047.582 (-4701325.840 .. 4674847.149)
Array Implementation:
cycles: 1.000 R² (NaN R² .. 1.000 R²)
iters 1.729e9 (1.680e9 .. 1.757e9)
y -3311041.000 (NaN .. 6.513e7)
What's happening here is quite simple: with -O2, GHC decides to make the memoisation-list in memoized_fib_list global.
$ ghc -fforce-recomp wtmpf-file4545.hs -O2 -ddump-prep
...
Main.memoizedFib_list :: GHC.Types.Int -> GHC.Integer.Type.Integer
[GblId, Arity=1, Str=<S(S),1*U(U)>, Unf=OtherCon []]
Main.memoizedFib_list
= \ (n_sc61 [Occ=Once] :: GHC.Types.Int) ->
GHC.List.!! # GHC.Integer.Type.Integer Main.main_fibTab n_sc61
...
Main.main_fibTab :: [GHC.Integer.Type.Integer]
[GblId]
Main.main_fibTab
= case Main.$wgo 0# of
{ (# ww1_sc5M [Occ=Once], ww2_sc5N [Occ=Once] #) ->
GHC.Types.: # GHC.Integer.Type.Integer ww1_sc5M ww2_sc5N
}
...
That means your criterion benchmark doesn't actually evaluate the Fibonacci function repeatedly; it just performs repeated lookups in the same global list. Averaged over many evaluations, this gives a very good score, which is however not representative of how fast the calculation is.
GHC performs this optimisation in the list implementation because you don't need lists of different lengths: it's always the same infinite list of all Fibonacci numbers. That's not possible in the array implementation, so it can't keep up here.
The simplest way to prevent this globalisation is to make fibTab explicitly dependent on n, by trimming it to the needed finite length, just like the arrays are.
memoizedFib_list :: Int -> Integer
memoizedFib_list n = fibTab !! n
  where fibTab = map fibm [0 .. n]
        fibm 0 = 0
        fibm 1 = 1
        fibm n = fibTab !! (n-2) + fibTab !! (n-1)
With this, the list implementation becomes much slower than the array one, as one would expect seeing memo-lookup is O(n) for lists:
$ ghc -fforce-recomp wtmpf-file4545.hs -O2 && ./wtmpf-file4545
[1 of 1] Compiling Main ( wtmpf-file4545.hs, wtmpf-file4545.o )
Linking wtmpf-file4545 ...
--------------EJECUTANDO BENCHMARK N=40------------------
benchmarking fibonacci/memoizedFib_list
time 10.47 μs (10.42 μs .. 10.51 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 10.40 μs (10.35 μs .. 10.44 μs)
std dev 163.3 ns (122.2 ns .. 225.8 ns)
variance introduced by outliers: 13% (moderately inflated)
benchmarking fibonacci/memoizedFib_array
time 1.618 μs (1.617 μs .. 1.620 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.620 μs (1.618 μs .. 1.623 μs)
std dev 7.521 ns (4.079 ns .. 12.48 ns)
benchmarking fibonacci/memoizedFib_vector
time 1.573 μs (1.572 μs .. 1.574 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.572 μs (1.571 μs .. 1.573 μs)
std dev 2.351 ns (1.417 ns .. 4.040 ns)
--------------EJECUTANDO BENCHMARK N=1500----------------
benchmarking fibonacci/memoizedFib_list
time 18.52 ms (18.41 ms .. 18.68 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 18.65 ms (18.53 ms .. 18.84 ms)
std dev 355.1 μs (204.8 μs .. 592.1 μs)
benchmarking fibonacci/memoizedFib_array
time 135.2 μs (131.2 μs .. 140.1 μs)
0.996 R² (0.991 R² .. 1.000 R²)
mean 132.7 μs (131.9 μs .. 135.0 μs)
std dev 4.463 μs (2.024 μs .. 8.327 μs)
variance introduced by outliers: 32% (moderately inflated)
benchmarking fibonacci/memoizedFib_vector
time 131.8 μs (130.6 μs .. 133.2 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 132.5 μs (131.4 μs .. 134.1 μs)
std dev 4.383 μs (3.463 μs .. 5.952 μs)
variance introduced by outliers: 31% (moderately inflated)
Vector, which I also tested here, is a bit faster still, but not significantly. I think that as soon as you use a container with O(1) lookup, the performance is dominated by the additions of the pretty huge numbers, so you're really benchmarking GMP rather than anything to do with Haskell.
import qualified Data.Vector as V
memoizedFib_vector :: Int -> Integer
memoizedFib_vector n = fibTab V.! n
  where fibTab = V.generate (n+1) mfib
        mfib 0 = 0
        mfib 1 = 1
        mfib x = fibTab V.! (x - 1) + fibTab V.! (x - 2)
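The benchmarking pitfall described in this answer is not specific to GHC. As a rough analogy only (Python does no such floating-out by itself), the sketch below contrasts a memo table that survives between calls with one that is rebuilt on every call, and reports how many additions each call really performs. The names make_shared_fib and fib_fresh are hypothetical.

```python
def make_shared_fib():
    """Return a fib function whose memo table is shared across calls."""
    table = [0, 1]

    def fib(n):
        work = 0
        # Only extend the table if this n has not been reached before.
        while len(table) <= n:
            table.append(table[-1] + table[-2])
            work += 1
        return table[n], work

    return fib


def fib_fresh(n):
    """Rebuild the memo table from scratch on every call."""
    table = [0, 1]
    work = 0
    for _ in range(2, n + 1):
        table.append(table[-1] + table[-2])
        work += 1
    return table[n], work


fib_shared = make_shared_fib()
first_call = fib_shared(30)   # builds the table
second_call = fib_shared(30)  # pure lookup, no additions
fresh_call = fib_fresh(30)    # always does the full work
```

A benchmark loop that calls fib_shared repeatedly mostly measures the lookup in second_call, which is the situation the list version ended up in once its table became a global.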
I have a data structure that I have loaded in from json that resembles the below
json_in =
[ Dict("customer" => "cust1", "transactions" => 1:10^6)
, Dict("customer" => "cust2", "transactions" => 1:10^6)
, Dict("customer" => "cust3", "transactions" => 1:10^6)]
I know of two methods to collapse the transactions into one array
@time methodA = reduce(vcat,[cust["transactions"] for cust in json_in])
@time methodB = vcat(json_in[1]["transactions"],json_in[2]["transactions"],json_in[3]["transactions"])
However the timing of methodA is ~0.22s vs ~0.02s for methodB on my computer. I intend to perform this thousands of times so 10x quicker performance is a big deal.
I see methodB is not very robust as it can only deal with 3 Dicts (customers) so even though it's performant it doesn't generalise.
What would be the most efficient way to concatenate arrays that are stored as elements of an array of Dicts?
As @Gnimuc states in his comment, you should not benchmark in global scope, and benchmarks are best done using BenchmarkTools.jl. Here are the timings done right:
julia> methodA(json_in) = reduce(vcat,[cust["transactions"] for cust in json_in])
methodA (generic function with 1 method)
julia> methodB(json_in) = vcat(json_in[1]["transactions"],json_in[2]["transactions"],json_in[3]["transactions"])
methodB (generic function with 1 method)
# @Gnimuc's syntax from his comment
julia> methodC(json_in) = mapreduce(x->x["transactions"], vcat, json_in)
methodC (generic function with 1 method)
julia> using BenchmarkTools
julia> @benchmark methodA(json_in)
BenchmarkTools.Trial:
memory estimate: 38.15 MiB
allocs estimate: 15
--------------
minimum time: 10.584 ms (3.10% GC)
median time: 14.781 ms (32.02% GC)
mean time: 15.112 ms (32.19% GC)
maximum time: 69.341 ms (85.28% GC)
--------------
samples: 331
evals/sample: 1
julia> @benchmark methodB(json_in)
BenchmarkTools.Trial:
memory estimate: 22.89 MiB
allocs estimate: 2
--------------
minimum time: 5.921 ms (5.92% GC)
median time: 8.402 ms (32.48% GC)
mean time: 8.701 ms (33.46% GC)
maximum time: 69.268 ms (91.09% GC)
--------------
samples: 574
evals/sample: 1
julia> @benchmark methodC(json_in)
BenchmarkTools.Trial:
memory estimate: 38.15 MiB
allocs estimate: 12
--------------
minimum time: 10.599 ms (3.37% GC)
median time: 14.843 ms (32.12% GC)
mean time: 15.228 ms (32.24% GC)
maximum time: 71.954 ms (85.95% GC)
--------------
samples: 328
evals/sample: 1
Method B is still like twice as fast. That is exactly because it is more specialized, on an array with exactly three elements.
An alternative solution that might work well here is to use a MappedArray, which creates a lazy view into the original array:
using MappedArrays
method4(json_in) = mappedarray(x->x["transactions"], json_in)
Of course this doesn't concatenate the arrays, but you can concatenate views using the CatViews package:
using CatViews
julia> method5(json_in) = reduce(CatView, mappedarray(x->x["transactions"], json_in))
method5 (generic function with 1 method)
julia> @benchmark method5(json_in)
BenchmarkTools.Trial:
memory estimate: 1.73 KiB
allocs estimate: 46
--------------
minimum time: 23.320 μs (0.00% GC)
median time: 23.916 μs (0.00% GC)
mean time: 25.466 μs (0.00% GC)
maximum time: 179.092 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 1
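Because it avoids copying the data (1.73 KiB allocated versus 22.89 MiB), it is roughly 250x faster than method B by minimum time (about 23 μs versus 5.9 ms). It's possible that the result is slower to use because of nonlocality, so that is worth benchmarking.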
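To make the idea of a concatenated view concrete, here is a minimal Python sketch. It is not how CatViews.jl is implemented; the class ConcatView is hypothetical and only shows the principle: keep references to the parts, store cumulative offsets, and translate each index on access instead of copying the elements.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatView:
    """Read-only view over several sequences, indexed as if concatenated."""

    def __init__(self, parts):
        self.parts = list(parts)
        # offsets[k] is the global index at which parts[k] starts.
        self.offsets = [0] + list(accumulate(len(p) for p in self.parts))

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        k = bisect_right(self.offsets, i) - 1  # which part holds index i
        return self.parts[k][i - self.offsets[k]]


# Data shaped like the question's json_in (ranges are cheap stand-ins).
json_in = [
    {"customer": "cust1", "transactions": range(1, 10**6 + 1)},
    {"customer": "cust2", "transactions": range(1, 10**6 + 1)},
    {"customer": "cust3", "transactions": range(1, 10**6 + 1)},
]
view = ConcatView(cust["transactions"] for cust in json_in)
```

Building the view costs time proportional to the number of customers, not the number of transactions; the price is a binary search over the offsets on every element access.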
Thanks for the help, after some research I came up with this idea to inline expand the code using macros, see code below, and it performs pretty well on the benchmarks (on Juliabox.com 21Sep2017)
macro inline_vcat(a)
quote
astr = $(string(a))
s = reduce(string, string(astr,"[",aa,"][\"transactions\"],") for aa in 1:length($a))
string("vcat(", s[1:(end-1)],")")
end
end
methodE(json_in) = (@inline_vcat json_in) |> parse |> eval
using BenchmarkTools
@benchmark methodE(json_in)
One shortcoming of this method is that if there is a large number of customers (~1 million) in the JSON, the generated code will be long, and I assume parsing it will take a long time as well. Hence it's probably not a good idea for large datasets.
Does anyone know how I can interpolate a linearly spaced energy spectrum matrix onto a matrix where one of the axes is logarithmically spaced instead?
The size of my energy spectrum matrix is 64x165. The original x axis represents the energy variation in terms of directions and the original y axis represents the energy variation in terms of frequencies. Both vectors are linearly spaced (the same interval between consecutive positions). I want to interpolate this matrix to a 24x25 format where the x axis (directions) remains linearly spaced (now a vector with 24 positions instead of 64) but the y axis (frequency) is no longer linearly spaced; it is a vector with growing intervals between positions (the interval between positions 1 and 2 is smaller than the interval between positions 2 and 3, and so on up to position 25).
It is important to point out that all vectors (including the new logarithmically spaced frequency vector) are known (I don't want to generate them).
I tried the functions interp2 and griddata. Both gave the same result, but that result is completely different from the original spectrum (which I would not expect, since I only did an interpolation). Could anyone help? I'm using Matlab 2011 for Windows.
Small example:
freq_input=[0.038592 0.042451 0.046311 0.05017 0.054029 0.057888 0.061747 0.065607 0.069466 0.073325]; %Linearly spaced
dir_input=[0 45 90 135 180 225 270 315]; %Linearly spaced
matrix_input=[0.004 0.006 1.31E-06 0.011 0.032 0.0007 0.010 0.013 0.001 0.008
0.007 0.0147 3.95E-05 0.023 0.142 0.003 0.022 0.022 0.003 0.017
0.0122 0.0312 0.0012 0.0351 0.285 0.024 0.048 0.036 0.015 0.036
0.0154 0.0530 0.0185 0.0381 0.242 0.102 0.089 0.058 0.060 0.075
0.0148 0.0661 0.1209 0.0345 0.095 0.219 0.132 0.087 0.188 0.140
0.0111 0.0618 0.2232 0.0382 0.027 0.233 0.156 0.119 0.370 0.187
0.0069 0.0470 0.1547 0.0534 0.010 0.157 0.154 0.147 0.436 0.168
0.0041 0.0334 0.0627 0.0646 0.009 0.096 0.136 0.163 0.313 0.112]; %8 lines (directions) and 10 columns (frequencies)
freq_output=[0.412E-01 0.453E-01 0.498E-01 0.548E-01 0.603E-01]; %Logarithmically spaced
dir_output=[0 45 90 135 180 225 270 315]; %The same as dir_input
After building one meshgrid from the freq_input and dir_input vectors and another from freq_output and dir_output, I tried interp2(freq_input,dir_input,matrix,freq_output,dir_output) and griddata(freq_input,dir_input,matrix,freq_output,dir_output), and the results seem wrong.
The course of action you described should work fine, so it's possible that you misinterpreted your results after interpolation when you said "the result seems wrong".
Here's what I mean, assuming your dummy data from the question:
% interpolate using griddata
matrix_output = griddata(freq_input,dir_input,matrix_input,freq_output.',dir_output);
% need 2d arrays later for scatter plotting the result
[freq_2d,dir_2d] = meshgrid(freq_output,dir_output);
figure;
% plot the original data
surf(freq_input,dir_input,matrix_input);
hold on;
scatter3(freq_2d(:),dir_2d(:),matrix_output(:),'rs');
The result shows the surface plot (based on the original input data) with the interpolated values superimposed on it as red squares.
You can see that the linearly interpolated data values follow the bilinear surface drawn by surf perfectly (rotating the figure around in 3d makes this even more obvious). In other words, the interpolation and subsequent plotting is fine.
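A quick way to sanity-check such a result without plotting is to redo the interpolation by hand. The Python sketch below is an independent illustration, not what interp2 or griddata do internally. It assumes, as in the small example, that the directions are unchanged, so only the frequency axis needs linear interpolation for each direction row; interp_linear and interp_rows are hypothetical helper names.

```python
from bisect import bisect_right


def interp_linear(xs, ys, x):
    """Linearly interpolate ys (sampled at strictly ascending xs) at x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    j = min(bisect_right(xs, x), len(xs) - 1)  # index of the right neighbour
    x0, x1 = xs[j - 1], xs[j]
    t = (x - x0) / (x1 - x0)
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def interp_rows(freq_in, matrix, freq_out):
    """Resample every row (one direction) from freq_in onto freq_out."""
    return [[interp_linear(freq_in, row, f) for f in freq_out] for row in matrix]


# Tiny made-up grid: 2 directions x 3 linearly spaced frequencies.
freq_in = [0.0, 1.0, 2.0]
matrix = [[0.0, 10.0, 20.0],
          [1.0, 3.0, 5.0]]
freq_out = [0.5, 1.0, 1.75]  # unevenly spaced target frequencies
resampled = interp_rows(freq_in, matrix, freq_out)
```

Every resampled value lies between its two neighbouring input samples, which is the same property the red squares show by sitting on the surface.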