Julia count() struct array from function

How can I get the following Julia code to work (counting adults in a house) using count() instead of the for loop?
mutable struct Person
age
end
mutable struct House
people::Array{Person}
end
function Adults(h::House)
numAdults = 0
for n in 1:length(h.people)
if h.people[n].age > 18; numAdults = numAdults + 1; end
end
numAdults
# count(h.people.age > 18, h.people) is there some variant of this that works?
end
p1 = Person(10)
p2 = Person(40)
h1 = House([p1, p2])
Adults(h1)

There's nothing wrong with a for loop in Julia! It's often just as fast (if not faster) than the equivalent "vectorized" version. That said, it can be nice to use higher order functions at times to make your code more concise. In this case, you want to pass an anonymous function to count that computes the comparison you want for a single element.
julia> f = (x->x.age > 18)
#7 (generic function with 1 method)
julia> f(p1)
false
julia> f(p2)
true
You can pass this to any of Julia's higher order functions and it'll apply it to each element as it does its operations:
julia> count(x->x.age > 18, h1.people)
1
julia> map(x->x.age > 18, h1.people)
2-element Array{Bool,1}:
0
1
julia> filter(x->x.age > 18, h1.people)
1-element Array{Person,1}:
Person(40)
(As an aside, you may want to ensure your struct fields are concretely typed for the best performance; that'll similarly affect performance for both the for loop and count.)
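For instance, a concretely-typed version of the structs above might look like this (just a sketch; using Int for age is an assumption on my part):
mutable struct Person
    age::Int                   # concrete field type
end
mutable struct House
    people::Vector{Person}     # concrete element type and dimensionality
end
With these definitions the compiler can generate specialized code for both the for loop and the count call.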

It's only syntactic sugar for an anonymous function, but you can use a do block:
function adults(h::House)
return count(h.people) do person
person.age > 18
end
end
Closer to what you wrote in the comment is
adults(h::House) = count(getproperty.(h.people, :age) .> 18)
But this is somewhat less readable (there's no sugar for property broadcasting), and it constructs an unnecessary intermediate array.
There's an intermediate form using a generator, which avoids allocating that intermediate array:
adults(h::House) = count(person.age > 18 for person in h.people)
This is probably what I'd go for.
Finally, let it be said that, of all these versions, the one you wrote is not really less idiomatic, and it will most likely be the fastest in a micro-benchmark, although I'd write it like this:
function adults(h::House)
count = 0
for i in eachindex(h.people)
count += Int(h.people[i].age > 18)
end
return count
end
Finally finally: this function is a natural map-reduce task, opening more possibilities if you go for purely functional approaches (like using Transducers or @distributed for).
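As a sketch of that map-reduce framing (not from the answer above), a mapreduce version could look like:
adults(h::House) = mapreduce(p -> p.age > 18, +, h.people; init=0)
Here init=0 makes the result an Int even for an empty house.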

Related

Create Enumerable In Place Slice Of Array in Ruby

I'm looking to find a way to take an array in ruby, two indices in that array and return an enumerable object which will yield, in order, all the elements between and including the two indices. But for performance reasons, I want to do this subject to the following two conditions:
This slice-to-enum does not create a copy of the subarray I want to return an enum over. This rules out array[i..j].to_enum, for example, because array[i..j] creates a new array.
It's not necessary to loop over the entire array to create the enum.
I'm wondering if there's a way to do this using the standard library's enumerable or array functionality without having to explicitly create my own custom enumerator.
What I'm looking for is a cleaner way to create the below enumerator:
def enum_slice(array, i, j)
Enumerator.new do |y|
while i <= j
y << array[i] # this is confusing syntax for yield (see here: https://ruby-doc.org/core-2.6/Enumerator.html#method-c-new)
i += 1
end
end
end
That seems pretty reasonable, and could even be turned into an extension to Array itself:
module EnumSlice
def enum_slice(i, j)
Enumerator.new do |y|
while i <= j
y << self[i]
i += 1
end
end
end
end
Now within the Enumerator block, y represents a yielder object you call when you have more data. If that block ends, it's presumed you're done enumerating. There's no requirement to ever terminate; an infinite Enumerator is allowed, and in that case it's up to the caller to stop iterating.
In other words, the y block argument can be called zero or more times, and each time it's called a value is "emitted" from the enumerator. When that block exits, the enumerator is considered done and is closed out; y is invalid at that point.
All y << x does is call the << method on Enumerator::Yielder, which is a bit of syntactical sugar to avoid having to do y.call(x) or y[x], both of which look kind of ugly.
Now you can add this to Array:
Array.include(EnumSlice)
Where now you can do stuff like this:
[ 1, 2, 3, 4, 5, 6 ].enum_slice(2, 4).each do |v|
p v
end
Giving you the correct output.
It's worth noting that despite having gone through all this work, this really doesn't save you any time. There are already built-in methods for this. Your enum_slice(a, i, j) method is equivalent to:
a.drop(i).take(j - i + 1)
Is that close in terms of performance? A quick benchmark can help test that theory:
require 'benchmark'
Benchmark.bm do |bm|
count = 10000
a = (0..100_000).to_a
bm.report(:enum_slice) do
count.times do
a.enum_slice(50_000, 25_000).each do
end
end
end
bm.report(:drop_take) do
count.times do
a.drop(50_000).take(25_000).each do
end
end
end
end
The results are:
user system total real
enum_slice 0.020536 0.000200 0.020736 ( 0.020751)
drop_take 7.682218 0.019815 7.702033 ( 7.720876)
So your approach is about 374x faster. Not bad!

Updating a static array, in a nested function without making a temporary array? <Julia>

I have been banging my head against a wall trying to use static arrays in julia.
https://github.com/JuliaArrays/StaticArrays.jl
They are fast but updating them is a pain. This is no surprise, they are meant to be immutable!
But it is continually recommended to me that I use static arrays even though I have to update them. In my case, the static arrays are small, just length 3, and I have a vector of them, but I only update one length-3 SVector at a time.
Option 1
There is a really neat package called Setfield that allows you to do inplace updates of SVectors in Julia.
https://github.com/jw3126/Setfield.jl
The catch... it updates the local copy. So if you are in a nested function, it updates only the local copy, which comes with some bookkeeping: you have to update the local copy, then return that copy and update the actual array of interest. You can't pass in your desired array and update it in place, at least not that I can figure out! Now, I do not mind bookkeeping, but I feel like updating a local copy, then returning the value, updating another local copy, and then returning the values and finally updating the actual array must come with a speed penalty. I could be wrong.
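A minimal sketch of the bookkeeping pattern described above (the function and variable names here are made up for illustration):
using Setfield, StaticArrays

# @set builds a new SVector rather than mutating, so the updated value
# only exists in the local binding and must be returned to the caller.
function bump_first(v::SVector{3,Float64})
    v = @set v[1] = v[1] + 1.0   # rebinds the local name only
    return v
end

vecs = [zero(SVector{3,Float64}) for _ in 1:3]
vecs[2] = bump_first(vecs[2])    # caller writes the returned copy back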
Option 2
It bugs me that in order to update a static array I must write
exampleSVector::SVector{3,Float64} <-- just to make clear its type and size
exampleSVector = [value1, value2, value3]
This will update the desired array even if it is inside a function, which is nice and the goal, but if you do this inside a function it creates a temporary array. And this kills me because my function is in a loop that gets called 4+ million times, so this creates a ton of allocations and slows things down.
How do I update an SVector for the Option 2 scenario without creating a temporary array?
For the Option 1 scenario, can I update the actual array of interest rather than the local copy?
If this requires a simple example code, please say so in the comments, and I will make one. My thinking is that it is answerable without one, but I will make one if it is needed.
EDIT:
MCVE code - Option 1 works, option 2 does not.
using Setfield
using StaticArrays
struct Keep
dreaming::Vector{SVector{3,Float64}}
end
function INNER!(vec::SVector{3,Float64},pre::SVector{3,Float64})
# pretend series of calculations
for i = 1:3 # illustrate use of Setfield (used in real code for this)
pre = @set pre[i] = rand() * i * 1000
end
# more pretend calculations
x = 25.0 # assume more calculations equals x
################## OPTION 1 ########################
vec = @set vec = x * [ pre[1], pre[2], pre[3] ] # UNCOMMENT FOR OPTION 1
return vec # UNCOMMENT FOR OPTION 1
################## OPTION 2 ########################
#vec = x * [ pre[1], pre[2], pre[3] ] # UNCOMMENT FOR OPTION 2
#nothing # UNCOMMENT FOR OPTION 2
end
function OUTER!(always::Keep)
preAllocate = SVector{3}(0.0,0.0,0.0)
for i=1:length(always.dreaming)
always.dreaming[i] = INNER!(always.dreaming[i], preAllocate) # UNCOMMENT FOR OPTION 1
#INNER!(always.dreaming[i], preAllocate) # UNCOMMENT FOR OPTION 2
end
end
code = Keep([zero(SVector{3}) for i=1:5])
OUTER!(code)
println(code.dreaming)
I hope that I have understood your question correctly. It's a bit hard with a MWE like this, that does a lot of things that are mostly redundant and a bit confusing.
There seem to be two alternative interpretations here: either you really need to update ('mutate') an SVector, but your MWE fails to demonstrate why; or you have convinced yourself that you need to mutate, but you actually don't.
I have decided to focus on alternative 2: You don't really need to 'mutate'. Rewriting your code from that point of view simplifies it greatly.
I couldn't find any reason for you to mutate any static vectors here, so I just removed that. The behaviour of the INNER! function with the inputs was very confusing. You provide two inputs but don't use either of them, so I removed those inputs.
function inner()
pre = @SVector [rand() * 1000i for i in 1:3]
x = 25
return pre .* x
end
function outer!(always::Keep)
always.dreaming .= inner.() # notice the dot in inner.()
end
code = Keep([zero(SVector{3}) for i in 1:5])
outer!(code)
display(code.dreaming)
This runs fast and with zero allocations. In general with StaticArrays, don't try to mutate things, just create new instances.
Even though it's not clear from your MWE, there may be some legitimate reason why you may want to 'mutate' an SVector. In that case you can use the setindex method of StaticArrays, you don't need Setfield.jl:
julia> v = rand(SVector{3})
3-element SArray{Tuple{3},Float64,1,3}:
0.4730258499237898
0.23658547518737905
0.9140206579322541
julia> v = setindex(v, -3.1, 2)
3-element SArray{Tuple{3},Float64,1,3}:
0.4730258499237898
-3.1
0.9140206579322541
To clarify: setindex (without a !) does not mutate its input, but creates a new instance with one index value changed.
If you really do need to 'mutate', perhaps you can make a new MWE that shows this. I would recommend that you try to simplify it a bit, because it is quite confusing now. For example, the inclusion of the type Keep seems entirely unnecessary and distracting. Just make a Vector of SVectors and show what you want to do with that.
Edit: Here's an attempt based on the comments below. As far as I understand it now, the question is about modifying a vector of SVectors. You cannot really mutate the SVectors, but you can replace them using a convenient syntax, setindex, where you can keep some of the elements and change some of the others:
oldvec = [zero(SVector{3}) for _ in 1:5]
replacevec = [rand(SVector{3}) for _ in 1:5]
Now we replace the second element of each element of oldvec with the corresponding one in replacevec. First a one-liner:
oldvec .= setindex.(oldvec, getindex.(replacevec, 2), 2)
Then an even faster one with a loop:
for i in eachindex(oldvec, replacevec)
@inbounds oldvec[i] = setindex(oldvec[i], replacevec[i][2], 2)
end
There are two types of static arrays - mutable ones (type names starting with M) and immutable ones (starting with S) - just use the mutable ones! Have a look at the example below:
julia> mut = MVector{3,Int64}(1:3);
julia> mut[1]=55
55
julia> mut
3-element MArray{Tuple{3},Int64,1,3}:
55
2
3
julia> immut = SVector{3,Int64}(1:3);
julia> immut[1]=55
ERROR: setindex!(::SArray{Tuple{3},Int64,1,3}, value, ::Int) is not defined.
Let us see some simple benchmark (ordinary array, vs mutable static vs immutable static):
using BenchmarkTools
julia> ord = [1,2,3];
julia> @btime $ord.*$ord
39.680 ns (1 allocation: 112 bytes)
3-element Array{Int64,1}:
1
4
9
julia> @btime $mut.*$mut
8.533 ns (1 allocation: 32 bytes)
3-element MArray{Tuple{3},Int64,1,3}:
3025
4
9
julia> @btime $immut.*$immut
2.133 ns (0 allocations: 0 bytes)
3-element SArray{Tuple{3},Int64,1,3}:
1
4
9
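Connecting this back to the question's setup (a sketch, not part of the answer above): a Vector of MVectors lets each length-3 element be filled in place, at the cost of each MVector being a separately heap-allocated object:
using StaticArrays

dreaming = [zero(MVector{3,Float64}) for _ in 1:5]
for v in dreaming
    for i in 1:3
        v[i] = rand() * i * 1000   # setindex! mutates the MVector in place
    end
end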

merge two sorted arrays in julia

Is there a neat function in julia which will merge two sorted arrays and return the sorted array for me? I have written:
c=1
p=1
i=1
n=length(tc)+length(tp)
t=Array{Float64}(n)
while(c<=length(tc) && p<=length(tp))
if(tp[p]<tc[c])
t[i]=tp[p]
p=p+1;
i=i+1;
else
t[i]=tc[c]
c=c+1;
i=i+1;
end
end
while(p<=length(tp))
t[i]=tp[p]
i=i+1
p=p+1
end
while(c<=length(tc))
t[i]=tc[c]
i=i+1
c=c+1
end
but is there no native function in base julia to do this?
Contrary to the other answers, there is in fact a method to do this in base Julia. BUT, it only works for arrays of integers, AND it will only work if the arrays are unique (in the sense that no integer is repeated in either array). Simply use the IntSet type as follows:
a = [2, 3, 4, 8]
b = [1, 5]
union(IntSet(a), IntSet(b))
If you run the above code, you'll note that the union function removes duplicates from the output, which is why I stated initially that your arrays must be unique (or else you must be happy to have duplicates removed in the output). You'll also notice that the union operation on the IntSet works much faster than union on a sorted Vector{Int}, since the former exploits the fact that an IntSet is pre-sorted.
Of course, the above is not really in the spirit of the question, which more concerns a solution for any type for which the lt operator is defined, as well as allowing for duplicates.
Here is a function that efficiently finds the union of two pre-sorted unique vectors. I've never had a need for the non-unique case myself so have not written a function that covers that case I'm afraid:
"union <- Return the union of the inputs as a new sorted vector"
function union_vec(x::Vector{T}, y::Vector{T})::Vector{T} where {T}
(nx, ny) = (1, 1)
z = T[]
while nx <= length(x) && ny <= length(y)
if x[nx] < y[ny]
push!(z, x[nx])
nx += 1
elseif y[ny] < x[nx]
push!(z, y[ny])
ny += 1
else
push!(z, x[nx])
nx += 1
ny += 1
end
end
if nx <= length(x)
[ push!(z, x[n]) for n = nx:length(x) ]
elseif ny <= length(y)
[ push!(z, y[n]) for n = ny:length(y) ]
end
return z
end
Another option is to look at sorted dictionaries, available in the DataStructures.jl package. I haven't done it myself, but a method that just inserts all observations into a sorted dictionary (checking for key duplication as you go) and then iterates over (keys, values) should also be a fairly efficient way to attack this problem.
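A minimal sketch of that idea, assuming the SortedDict type from DataStructures.jl (note that duplicates across the two inputs are collapsed, matching the union semantics above):
using DataStructures

function union_sorteddict(x::AbstractVector{T}, y::AbstractVector{T}) where {T}
    d = SortedDict{T,Nothing}()
    for v in x
        d[v] = nothing           # duplicate keys are simply overwritten
    end
    for v in y
        d[v] = nothing
    end
    return collect(keys(d))      # keys come back in sorted order
end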
Although an explicit function to merge two sorted vectors seems to be missing, one can be constructed easily from the existing building blocks (the question actually demonstrated this, but it doesn't define a function).
The following method tries to leverage the existing sort code and still remain efficient.
In code:
mergesorted(a,b) = sort!(vcat(a,b))
The following is an example:
julia> a = [1:2:11...];
julia> b = [2:3:20...];
julia> show(a)
[1,3,5,7,9,11]
julia> show(b)
[2,5,8,11,14,17,20]
julia> show(mergesorted(a,b))
[1,2,3,5,5,7,8,9,11,11,14,17,20]
I didn't benchmark the function, but QuickSort (the default sort algorithm) usually performs well on pre-sorted arrays, so it should be OK, and the allocation of a result vector is required in any implementation.
I keep coming across this in different projects, so I made a package MergeSorted (https://github.com/vvjn/MergeSorted.jl). You can use it as follows.
using MergeSorted
a = sort!(rand(1000))
b = sort!(rand(1000))
c = mergesorted(a,b)
sort!(vcat(a,b)) == c
Or without allocating new memory.
mergesorted!(c, a, b)
You can also use all of the sort options.
a = sort!(rand(1000), order=Base.Reverse)
b = sort!(rand(1000), order=Base.Reverse)
c = mergesorted(a,b, order=Base.Reverse)
sort!(vcat(a,b), order=Base.Reverse) == c
It's around 4-6 times faster than sort!(vcat(a,b)), which uses QuickSort by default, and twice as fast as sort!(vcat(a,b), alg=MergeSort) but MergeSort uses more memory.
No, such a function does not exist. And actually I have not seen a language which has such a function out of the box.
To do this, you have to maintain a pointer into each of the arrays, compare the values, and advance the one pointing at the smaller value (based on what I see, this is exactly what you do).
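A minimal sketch of that two-pointer merge, wrapped as a function that keeps duplicates (illustrative code, not a library function):
function mergesorted_twoptr(a::Vector{T}, b::Vector{T}) where {T}
    out = Vector{T}(undef, length(a) + length(b))
    i = j = 1
    for k in eachindex(out)
        # take from a while b is exhausted, or while a's current value is <= b's
        if j > length(b) || (i <= length(a) && a[i] <= b[j])
            out[k] = a[i]
            i += 1
        else
            out[k] = b[j]
            j += 1
        end
    end
    return out
end

mergesorted_twoptr([1, 3, 5], [2, 3, 4])   # [1, 2, 3, 3, 4, 5]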

Basic operations combining two SharedArrays

I've spent the last month or so learning Julia and I'm very impressed. In particular I'm analysing a large amount of climate model output; I put all this into SharedArrays and adjust and plot it all in parallel. So far it's very quick and efficient, and I've got quite a library of code. My current problem is in creating a function that can do basic operations on two shared arrays. I've successfully written a function that takes two arrays and a function saying how you want to process them. The code is based around the example in the parallel section of the Julia docs and uses the myrange function shown there:
function myrange(q::SharedArray)
idx = indexpids(q)
#@show (idx)
if idx == 0
# This worker is not assigned a piece
return 1:0, 1:0
print("NO WORKERS ASSIGNED")
end
nchunks = length(procs(q))
splits = [round(Int, s) for s in linspace(0,length(q),nchunks+1)]
splits[idx]+1:splits[idx+1]
end
function combine_arrays_chunk!(array_1,array_2,output_array,func, length_range);
#@show (length_range)
for i in length_range
output_array[i] = func(array_1[i], array_2[i]);
#hardwired example for func = +
#output_array[i] = +(array_1[i], array_2[i]);
end
output_array
end
combine_arrays_shared_chunk!(array_1,array_2,output_array,func) = combine_arrays_chunk!(array_1,array_2,output_array,func, myrange(array_1));
function combine_arrays_shared(array_1::SharedArray,array_2::SharedArray,func)
if size(array_1)!=size(array_2)
return print("inputs not of the same size")
end
output_array=SharedArray(Float64,size(array_1));
@sync begin
for p in procs(array_1)
@async remotecall_wait(p, combine_arrays_shared_chunk!, array_1,array_2,output_array,func)
end
end
output_array
end
This works, so one can do
strain_div = combine_arrays_shared(eps_1,eps_2,+);
strain_tot = combine_arrays_shared(eps_1,eps_2,hypot);
with the correct results and the output as a shared array, as required. But ... it's quite slow. It's actually quicker to combine the SharedArrays as normal arrays on one processor, calculate, and then convert back to a SharedArray (for my test cases anyway, with each array approx 200MB; when I move up to GBs I guess not). I can hardwire the combine_arrays_shared function to only do addition (or some other function), and then you get the speed increase, but with the function being passed into combine_arrays_shared the whole thing is slow (10 times slower than the hardwired addition).
I've looked at the FastAnonymous.jl package but I can't see how it would work in this case. I tried, and failed. Any ideas?
I might just resort to writing a different combine_arrays_... function for each basic function I use, or having the func argument as a option and call different functions from within combine_arrays_shared, but I want it to be more elegant! Also this is good way to learn more about Julia.
Harry
This question actually has nothing to do with SharedArrays, and is just "how do I pass functions-as-arguments and get better performance?"
The way FastAnonymous works (and the way closures will soon work in Julia) is to create a type with a call method. If you're having trouble with FastAnonymous for some reason, you can always do it manually:
julia> immutable Foo end
julia> Base.call(f::Foo, x, y) = x*y
call (generic function with 1036 methods)
julia> function applyf(f, X)
s = zero(eltype(X))
for x in X
s += f(x, x)
end
s
end
applyf (generic function with 1 method)
julia> X = rand(10^6);
julia> f = Foo()
Foo()
# Run the function once with each type of argument to JIT-compile
julia> applyf(f, X)
333375.63216645207
julia> applyf(*, X)
333375.63216645207
# Compile anything used by @time
julia> @time 1
0.000004 seconds (148 allocations: 10.151 KB)
1
# Now let's benchmark
julia> @time applyf(f, X)
0.002860 seconds (5 allocations: 176 bytes)
333433.439233112
julia> @time applyf(*, X)
0.142411 seconds (4.00 M allocations: 61.035 MB, 19.24% gc time)
333433.439233112
Note the big increase in speed and greatly-reduced memory consumption.
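As a side note (not part of the original answer, and assuming a more recent Julia version): since Julia 0.5 every function has its own type, and higher-order functions specialize on it automatically, so the plain version is fast without FastAnonymous or a hand-written callable type:
function applyf2(f, X)
    s = zero(eltype(X))
    for x in X
        s += f(x, x)   # f's concrete type is part of applyf2's specialization
    end
    return s
end

X = rand(10^6)
applyf2(*, X)   # compiles a specialized method for typeof(*); no per-element allocation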

Dynamically creating and naming an array

Consider the following code snippet
for i = 1:100
Yi= x(i:i + 3); % i in Yi is not an index but subscript,
% x is some array having sufficient values
i = i + 3
end
Basically I want that each time the for loop runs, the subscript changes from 1 to 2, 3, ..., 100. So in effect, after 100 iterations I will have 100 arrays, from Y1 to Y100.
What could be the simplest way to implement this in MATLAB?
UPDATE
This is to be run 15 times
Y1 = 64;
fft_x = 2 * abs(Y1(5));
For simplicity I have taken constant inputs.
Now I am trying to use cell based on Marc's answer:
Y1 = cell(15,1);
fft_x = cell(15,1);
for i = 1:15
Y1{i,1} = 64;
fft_x{i,1} = 2 * abs(Y1(5));
end
I think I need to do some changes in abs(). Please suggest.
It is impossible to make variably-named variables in MATLAB. The common solution is to use a cell array for Y:
Y=cell(100,1);
for i =1:100
Y{i,1}= x(i:i+3);
i=i+3;
end
Note that the line i=i+3 inside the for-loop has no effect. You can just remove it.
Y=cell(100,1);
for i =1:100
Y{i,1}= x(i:i+3);
end
It is possible to make variably-named variables in MATLAB. If you really want this, do something like this:
for i = 1:4:100
eval(['Y', num2str((i+3)/4), '=x(i:i+3);']);
end
How you organize your indexing depends on what you plan to do with x of course...
Yes, you can dynamically name variables. However, it's almost never a good idea and there are much better/safer/faster alternatives, e.g. cell arrays as demonstrated by @Marc Claesen.
Look at the assignin function (and the related eval). You could do what asked for with:
for i = 1:100
assignin('caller',['Y' int2str(i)],rand(1,i))
end
Another related function is genvarname. Don't use these unless you really need them.
