Create Enumerable In Place Slice Of Array in Ruby - arrays

I'm looking to find a way to take an array in ruby, two indices in that array and return an enumerable object which will yield, in order, all the elements between and including the two indices. But for performance reasons, I want to do this subject to the following two conditions:
The slice-to-enum conversion does not create a copy of the subarray I want to return an enum over. This rules out array[i..j].to_enum, for example, because array[i..j] creates a new array.
It's not necessary to loop over the entire array to create the enum.
I'm wondering if there's a way to do this using the standard library's enumerable or array functionality without having to explicitly create my own custom enumerator.
What I'm looking for is a cleaner way to create the below enumerator:
def enum_slice(array, i, j)
Enumerator.new do |y|
while i <= j
y << array[i] # this is confusing syntax for yield (see here: https://ruby-doc.org/core-2.6/Enumerator.html#method-c-new)
i += 1
end
end
end

That seems pretty reasonable, and could even be turned into an extension to Array itself:
module EnumSlice
  def enum_slice(i, j)
    Enumerator.new do |y|
      k = i # local index, so the enumerator can be iterated more than once
      while k <= j
        y << self[k]
        k += 1
      end
    end
  end
end
Within the Enumerator block, y is an Enumerator::Yielder: you call it each time you have another value to emit. When the block ends, enumeration is done. The block is not required to ever terminate; an infinite Enumerator is allowed, and in that case it's up to the caller to stop iterating.
In other words, the yielder can be called zero or more times, and each call "emits" one value from the enumerator. When the block exits, the enumerator is finished and y is no longer valid.
All y << x does is call the << method on Enumerator::Yielder, which is a bit of syntactic sugar for y.yield(x).
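To make the yielder behaviour concrete, here is a minimal sketch. The helper names evens and two_symbols are made up for illustration. It shows that y << x and y.yield(x) both emit a value, and that a block which never ends is fine as long as the caller stops asking for values.

```ruby
# Hypothetical helper: an endless enumerator of even numbers.
def evens
  Enumerator.new do |y|
    n = 0
    loop do
      y << n        # emit one value with the << shorthand
      n += 2
    end
  end
end

# Hypothetical helper: a finite enumerator using the explicit method.
def two_symbols
  Enumerator.new do |y|
    y.yield(:a)     # same effect as y << :a
    y.yield(:b)
  end               # the block ends here, so enumeration is finished
end

p evens.take(3)     # => [0, 2, 4]  (the caller stops after three values)
p two_symbols.to_a  # => [:a, :b]
```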
Now you can add this to Array:
Array.include(EnumSlice)
Where now you can do stuff like this:
[ 1, 2, 3, 4, 5, 6 ].enum_slice(2, 4).each do |v|
p v
end
This prints 3, 4 and 5, the elements at indices 2 through 4.
It's worth noting that, despite all this work, you can get the same elements from built-in methods. Your enum_slice(a, i, j) method is equivalent to:
a.drop(i).take(j - i + 1)
Is that close in terms of performance? A quick benchmark can help test that theory:
require 'benchmark'
Benchmark.bm do |bm|
count = 10000
a = (0..100_000).to_a
bm.report(:enum_slice) do
count.times do
a.enum_slice(50_000, 25_000).each do
end
end
end
bm.report(:drop_take) do
count.times do
a.drop(50_000).take(25_000).each do
end
end
end
end
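The results are:
                 user     system      total        real
enum_slice   0.020536   0.000200   0.020736 (  0.020751)
drop_take    7.682218   0.019815   7.702033 (  7.720876)
On these numbers your approach looks about 374x faster, but the comparison is not like for like: enum_slice(50_000, 25_000) has i > j, so the enumerator yields nothing, while drop(50_000).take(25_000) copies and walks 25,000 elements. For a fair comparison, call a.enum_slice(50_000, 74_999) and re-run the benchmark.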
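If the aim is to avoid a hand-written Enumerator altogether, one option worth trying is to enumerate the index range lazily and look each element up on demand. This is only a sketch and has not been benchmarked; lazy_slice is a hypothetical name. It copies nothing and never touches elements outside i..j.

```ruby
# Hypothetical helper: lazily enumerate array[i], ..., array[j]
# without building an intermediate array.
def lazy_slice(array, i, j)
  (i..j).lazy.map { |k| array[k] }
end

a = (0..100_000).to_a
slice = lazy_slice(a, 2, 4)

p slice.to_a                        # => [2, 3, 4]
p slice.first(2)                    # => [2, 3]
p slice.to_a == a.drop(2).take(3)   # => true
```

The result is an Enumerator::Lazy, so each, map, first and the other Enumerable methods are available on it. Indices past the end of the array would yield nil, just as array[k] does.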

Related

Julia For-loops used for accessing 1D Arrays

I am trying to run two nested for loops to access two elements within an array, e.g.:
x = 100
for i in eachindex(x-1)
for j in 2:x
doSomething = Array[i] + Array[j]
end
end
And often (not always) I get this error or similar:
LoadError: BoundsError: attempt to access 36-element Array{Any,1} at index [64]
I understand there are proper ways to run these loops to avoid bounds errors, hence my use of eachindex(x-1) == 1:x, how would I do this for 2:x?
I am relatively new to Julia, if this is not the reason for the bounds error, what could it be?
- Thanks
EDIT: shortened version of what I am trying to run (Also yes, vector-arrays)
all_people = Vector{type_person}() # 1D Vector of type person
size = length(all_people)
... fill vector array to create an initial population of people ...
# Now add to the array using existing parent objects
for i in 1:size-1
for j in 2:size
if all_people[i].age >= all_people[j].age # oldest parent object forms child variable
child = type_person(all_people[i])
else
child = type_person(all_people[j])
end
push!(all_people, child) # add to the group of people
end
end
I made a few guesses, but perhaps this is the code you want:
struct Person
parent::Union{Nothing,Person}
age::Int
end
Person(parent::Person) = Person(parent,0)
N = 100
population = Person.(nothing, rand(20:50,N))
for i in 1:(N-1)
for j in (i+1):N
parent = population[population[i].age >= population[j].age ? i : j]
push!(population, Person(parent))
end
end
Notes:
For this type of code, also have a look at the Parameters.jl package; it is very convenient for agent constructors.
Notice how the constructor has been vectorized.
Types in Julia are named starting with capital letters (hence Person).
I assume that for children you wanted to try each pair of parents, hence this is how to construct the loop. You do not need to evaluate the same pair of parents as (i,j) and then later as (j,i).
eachindex should be used on arrays; it does not make sense on scalars.

Julia count() struct array from function

How can I get the following Julia code to work (counting adults in a house) using count() instead of the for loop?
mutable struct Person
age
end
mutable struct House
people::Array{Person}
end
function Adults(h::House)
numAdults = 0
for n in 1:length(h.people)
if h.people[n].age > 18; numAdults = numAdults + 1; end
end
numAdults
# count(h.people.age > 18, h.people) is there some variant of this that works?
end
p1 = Person(10)
p2 = Person(40)
h1 = House([p1, p2])
Adults(h1)
There's nothing wrong with a for loop in Julia! It's often just as fast (if not faster) than the equivalent "vectorized" version. That said, it can be nice to use higher order functions at times to make your code more concise. In this case, you want to pass an anonymous function to count that computes the comparison you want for a single element.
julia> f = (x->x.age > 18)
#7 (generic function with 1 method)
julia> f(p1)
false
julia> f(p2)
true
You can pass this to any of Julia's higher order functions and it'll apply it to each element as it does its operations:
julia> count(x->x.age > 18, h1.people)
1
julia> map(x->x.age > 18, h1.people)
2-element Array{Bool,1}:
0
1
julia> filter(x->x.age > 18, h1.people)
1-element Array{Person,1}:
Person(40)
(As an aside, you may want to ensure your struct fields are concretely typed for the best performance; that'll similarly affect performance for both the for loop and count.)
It's only syntactic sugar for an anonymous function, but you can use a do block:
function adults(h::House)
return count(h.people) do person
person.age > 18
end
end
Closer to what you wrote in the comment is
adults(h::House) = count(getproperty.(h.people, :age) .> 18)
But this is somewhat less readable (there's no sugar for property broadcasting), and it constructs an unnecessary intermediate array.
There's a somewhat intermediate form using a generator, which doesn't add excessive memory:
adults(h::House) = count(person.age > 18 for person in h.people)
This is probably what I'd go for.
Finally, let it be said that of all versions, the one you wrote is not really less idiomatic, and will most likely be the fastest of all in a micro-benchmark, although I'd write it like this:
function adults(h::House)
count = 0
for i in eachindex(h.people)
count += Int(h.people[i].age > 18)
end
return count
end
Finally finally: this function is a natural map-reduce task, opening more possibilities if you go for purely functional approaches (like using Transducers or @distributed for).

accessing boundaries of array without duplicating lots of code

When I try to compile my code using -fcheck=all I get a runtime error since it seems I step out of bounds of my array dimension size. It comes from the part of my code shown below. I think it is because my loops over i,j only run from -ny to ny, -nx to nx but I try to use points at i+1,j+1,i-1,j-1 which takes me out of bounds in my arrays. When the loop over j starts at -ny, it needs j-1, so it immediately takes me out of bounds since I'm trying to access -ny-1. Similarly when j=ny, i=-nx,nx.
My question is, how can I fix this problem efficiently using minimal code?
I need the array grad(1,i,j) correctly defined on the boundary, and it needs to be defined exactly as on the right hand side of the equality below, I just don't know an efficient way of doing this. I can explicitly define grad(1,nx,j), grad(1,-nx,j), etc, separately and only loop over i=-nx+1,nx-1,j=-ny+1,ny-1 but this causes lots of duplicated code and I have many of these arrays so I don't think this is the logical/efficient approach. If I do this, I just end up with hundreds of lines of duplicated code that makes it very hard to debug. Thanks.
integer :: i,j
integer, parameter :: nx = 50, ny = 50
complex, dimension (3,-nx:nx,-ny:ny) :: grad,psi
real, parameter :: h = 0.1
do j = -ny,ny
do i = -nx,nx
psi(1,i,j) = sin(i*h)+sin(j*h)
psi(2,i,j) = sin(i*h)+sin(j*h)
psi(3,i,j) = sin(i*h)+sin(j*h)
end do
end do
do j = -ny,ny
do i = -nx,nx
grad(1,i,j) = (psi(1,i+1,j)+psi(1,i-1,j)+psi(1,i,j+1)+psi(1,i,j-1)-4*psi(1,i,j))/h**2 &
- (psi(2,i+1,j)-psi(2,i,j))*psi(1,i,j)/h &
- (psi(3,i,j+1)-psi(3,i,j))*psi(1,i,j)/h &
- psi(2,i,j)*(psi(1,i+1,j)-psi(1,i,j))/h &
- psi(3,i,j)*(psi(1,i,j+1)-psi(1,i,j))/h
end do
end do
If I was to do this directly for grad(1,nx,j), grad(1,-nx,j), it would be given by
do j = -ny+1,ny-1
grad(1,nx,j) = (psi(1,nx,j)+psi(1,nx-2,j)+psi(1,nx,j+1)+psi(1,nx,j-1)-2*psi(1,nx-1,j)-2*psi(1,nx,j))/h**2 &
- (psi(2,nx,j)-psi(2,nx-1,j))*psi(1,nx,j)/h &
- (psi(3,nx,j+1)-psi(3,nx,j))*psi(1,nx,j)/h &
- psi(2,nx,j)*(psi(1,nx,j)-psi(1,nx-1,j))/h &
- psi(3,nx,j)*(psi(1,nx,j+1)-psi(1,nx,j))/h
grad(1,-nx,j) = (psi(1,-nx+2,j)+psi(1,-nx,j)+psi(1,-nx,j+1)+psi(1,-nx,j-1)-2*psi(1,-nx+1,j)-2*psi(1,-nx,j))/h**2 &
- (psi(2,-nx+1,j)-psi(2,-nx,j))*psi(1,-nx,j)/h &
- (psi(3,-nx,j+1)-psi(3,-nx,j))*psi(1,-nx,j)/h &
- psi(2,-nx,j)*(psi(1,-nx+1,j)-psi(1,-nx,j))/h &
- psi(3,-nx,j)*(psi(1,-nx,j+1)-psi(1,-nx,j))/h
end do
One possible approach is to use additional index variables for the boundaries, clamped from the original indices so that they never go out of bounds. Something like this:
do j = -ny,ny
jj = max(min(j, ny-1), -ny+1)
do i = -nx,nx
ii = max(min(i, nx-1), -nx+1)
grad(1,i,j) = (psi(1,ii+1,j)+psi(1,ii-1,j)+psi(1,i,jj+1)+psi(1,i,jj-1)-4*psi(1,i,j))/h**2 &
- (psi(2,ii+1,j)-psi(2,ii,j))*psi(1,i,j)/h &
- (psi(3,i,jj+1)-psi(3,i,jj))*psi(1,i,j)/h &
- psi(2,i,j)*(psi(1,ii+1,j)-psi(1,ii,j))/h &
- psi(3,i,j)*(psi(1,i,jj+1)-psi(1,i,jj))/h
end do
end do
It's hard for me to write proper code because it seems you trimmed part of the original expression in the code you presented in the question, but I hope you understand the idea and can apply it correctly to your logic.
Opinions:
Even though this is what you are asking for (as far as I understand), I would not recommend doing it before profiling and checking whether assigning the boundary values manually after a whole-array operation would be more efficient. The extra index calculations on each iteration could affect performance (arguably less than if conditionals or function calls would). Using "ghost cells", as suggested by @evets, could be even more performant. You should profile and compare.
I'd recommend declaring your arrays as dimension(-nx:nx,-ny:ny,3) instead. Fortran stores arrays in column-major order, so the leftmost index varies fastest in memory. With the component index leftmost, the x and y neighbours you access for a fixed component are not in contiguous memory locations, which could mean fewer cache hits.
In somewhat pseudo-code, you can do
do j = -ny, ny
if (j == -ny) then
p1jm1 = XXXXX ! Some boundary condition
else
p1jm1 = psi(1,i,j-1)
end if
if (j == ny) then
p1jp1 = YYYYY ! Some other boundary condition
else
p1jp1 = psi(1,i,j+1)
end if
do i = -nx, nx
grad(1,i,j) = ... term involving p1jm1 ... term involving p1jp1 ...
...
end do
end do
The j-loop isn't bad in that you are adding 2*2*ny conditionals. The inner i-loop adds 2*2*nx conditionals for each j iteration (or 2*2*ny * 2*2*nx conditionals in total). Note that you need a temporary for each psi element whose index triplet is unique, i.e., psi(1,i,j+1), psi(1,i,j-1), and psi(3,i,j+1).

Yielding a modified Ruby array to a block

I'm trying to turn 2 lines of ruby code into 1. For example:
def average(numbers)
result = numbers.compact
numbers.reduce(+) / numbers.length
end
I've been looking through array methods and can't find an appropriate one to turn this function into a one-liner. I had hoped something like this would work:
def average(numbers)
numbers.compact.<tap or another method> { |arr| arr.reduce(+) / arr.length }
end
Basically, I'm modifying the array (in the example I have to call compact to rid nil values), so I don't have access to the array variable, and I don't want an iterator, because I don't want to call reduce(+) and length on individual elements of the array.
Does anyone have an idea of methods I could look into?
I believe you mean for your method to be the following (reduce(:+), not reduce(+) and use result rather than numbers in the second line).
def average(numbers)
result = numbers.compact
result.reduce(:+) / result.length
end
average [1,2,3]
#=> 2
If you wish the average to be a float, change the second line to
result.reduce(0.0, :+) / result.length
There are various ways to combine the two lines of the method, but I don't prefer any of them to the above. Here are a few. (I don't see how Object#tap could be used here.)
numbers.compact.reduce(:+) / numbers.compact.length
(result = numbers.compact).reduce(:+) / result.length
numbers.map(&:to_i).reduce(:+) / numbers.compact.length
Note that, even if numbers can be mutated, one cannot write
numbers.compact!.reduce(:+) / numbers.length
because numbers.compact! returns nil if numbers contains no nil elements.
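The difference between compact and compact! is easy to check in irb. A quick sketch (the helper names are made up for illustration):

```ruby
# Hypothetical helpers returning fresh arrays, since compact! mutates.
def with_nil
  [1, nil, 3]
end

def without_nil
  [1, 2, 3]
end

p with_nil.compact!      # => [1, 3]     (nils removed, receiver returned)
p without_nil.compact!   # => nil        (nothing was removed)
p without_nil.compact    # => [1, 2, 3]  (non-bang version always returns an array)
```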
In Ruby v2.4+ you can use Array#sum:
result.sum / result.length
You could change the way you call average
def average(numbers)
numbers.reduce(:+) / numbers.length
end
average(num_array.compact)
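For the one-liner the question asks for, Object#then (Ruby 2.6+, also available as yield_self from 2.5) may be what was wanted: unlike tap, it returns the block's result rather than the receiver. A minimal sketch, assuming Ruby 2.6 or later:

```ruby
# Pass the compacted array to the block and return the block's value.
def average(numbers)
  numbers.compact.then { |arr| arr.sum / arr.length }
end

p average([1, nil, 2, 3])   # => 2
```

As with the two-line version, this does integer division for integer input and raises ZeroDivisionError when the compacted array is empty.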

Are there any languages that have a do-until loop?

Is there any programming language that has a do-until loop?
Example:
do
{
<statements>
}
until (<condition>);
which is basically equivalent to:
do
{
<statements>
}
while (<negated condition>);
NOTE: I'm looking for post-test loops.
Ruby has until.
i=0
begin
puts i
i += 1
end until i==5
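The begin ... end until form is a post-test loop: the body runs once before the condition is checked. A small sketch contrasting it with the ordinary pre-test until (the method names are made up for illustration):

```ruby
# Post-test: the body runs before the condition is looked at.
def post_test_runs(done)
  runs = 0
  begin
    runs += 1
  end until done
  runs
end

# Pre-test: the condition is checked first, so the body may never run.
def pre_test_runs(done)
  runs = 0
  until done
    runs += 1
  end
  runs
end

p post_test_runs(true)   # => 1
p pre_test_runs(true)    # => 0
```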
VBA!
Do-Until-Loop
Do-Loop-Until
Although I think quite a number of people here would doubt if it is a real language at all, but well, BASIC is how Microsoft started (quite weak argument for many, I know)...
It is possible in VB.Net
bExitFromLoop = False
Do
'Executes the following Statement
Loop Until bExitFromLoop
It is also possible in SDF-P on BS2000 (Fujitsu/Siemens Operating System)
/ DECLARE-VARIABLE A
/ DECLARE-VARIABLE SWITCH-1(TYPE=*BOOLEAN)
/ SET-VARIABLE A = 5
/ SET-VARIABLE SWITCH-1 = ON
/ REPEAT
/ A = A + 10
/ IF (A > 50)
/ SET-VARIABLE SWITCH-1 = OFF
/ END-IF
/ UNTIL (SWITCH-1 = OFF)
/ SHOW-VARIABLE A
A = 55
It is also possible in C or C++ using a macro that defines until.
Example (definition):
#define until(cond) while (!(cond))
Example (utilisation):
int i = 0;
do {
cout << i << "\n";
i++;
} until(i == 5);
In VB we can find something like:
Reponse = InputBox("Please Enter Pwd")
Do Until Reponse = "Bob-pwr148" ...
Eiffel offers you an until loop.
from
x := 1
until
x > 100
loop
...
end
There is also an "across" loop as well. Both are very powerful and expressive.
The design of this loop has more to offer. There are two more parts to its grammar that will help us resolve two important "correctness" problems.
Endless loop protection.
Iteration failure detection.
Endless Loop Protection
Let's modify our loop code a little by adding a loop variant.
from
x := 1
v := 1_000
until
x > 100
variant
v
loop
...
v := v - 1
end
The loop variant is (essentially) a count-down variable, but not just any old variable. By using the variant keyword, we are telling the compiler to pay attention to v. Specifically, the compiler is going to generate code that watchdogs the v variable for two conditions:
Does v decrease with each iteration of the loop (are we counting down). It does no good to try and use a count-down variable if it is (in fact) not counting down, right? If the loop variant is not counting down (decreasing by any amount), then we throw an exception.
Does v ever reach a condition of less than zero? If so, then we throw an exception.
Both of these work together through the compiler and variant variable to detect when and if our iterating loop fails to iterate or iterates too many times.
In the example above, our code is communicating to us a story that it expects to iterate zero to 1_000 times, but not more. If it is more, then we stop the loop, which leaves us to wonder: Do we really have cases where we iterate more than 1_000 times, or is there something wrong that our condition is failing to become True?
Loop Invariant
Now that we know what a loop variant is, we need to understand what a loop invariant is.
The invariant is a set of one or more Boolean conditions that must hold True after each iteration through the loop. Why do we want these?
Imagine you have 1_000_000 iterations and one of them fails. You don't have time to walk through each iteration, examining it to see whether it is okay. So, you create a set of one or more conditions that are tested upon completion of each iteration. If any of the conditions fails, then you know precisely which iteration (and its deterministic state) is causing the problem!
The loop invariant might look something like:
from
x := 1
y := 0
v := 1_000
invariant
y = x - 1
until
x > 100
variant
v
loop
...
x := x + 1
y := y + 1
v := v - 1
end
In the example above, y is trailing x by 1. We expect that after each iteration, y will always be x - 1. So, we create a loop invariant using the invariant keyword that states our Boolean assertion. If y fails to be x - 1, the loop will immediately throw an exception and let us know precisely which iteration has failed to keep the assertion True.
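Languages without variant and invariant keywords can approximate the same checks by hand. The following Ruby sketch is only an emulation of the idea, not Eiffel's actual semantics, and checked_loop is a hypothetical name. It raises if the variant stops decreasing, drops below zero, or if the invariant y == x - 1 is violated.

```ruby
# Hypothetical helper emulating Eiffel-style loop checks by hand.
# limit plays the role of the initial variant value.
def checked_loop(limit)
  x = 1
  y = 0
  v = limit
  until x > 100
    previous_v = v
    x += 1
    y += 1
    v -= 1
    raise "variant did not decrease" unless v < previous_v
    raise "variant went negative (too many iterations)" if v < 0
    raise "invariant broken at x=#{x}" unless y == x - 1
  end
  x
end

p checked_loop(1_000)   # => 101

begin
  checked_loop(50)      # the loop needs 100 iterations, only 50 allowed
rescue RuntimeError => e
  p e.message           # => "variant went negative (too many iterations)"
end
```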
CONCLUSION
Our loop is now quite tight and secure—well guarded against failure (bugs, errors).
