Splitting arrays at specific sizes - arrays

Let's say I have an array with 10 elements:
arr = [1,2,3,4,5,6,7,8,9,10]
Then I want to define a function that takes this arr as a parameter and performs a calculation. For this example, let's say the calculation is the difference of means, for example:
If N=2 (that is, group the elements of arr sequentially into groups of size 2):
results=[]
result_1 = (1+2)/2 - (3+4)/2
result_2 = (3+4)/2 - (5+6)/2
result_3 = (5+6)/2 - (7+8)/2
result_4 = (7+8)/2 - (9+10)/2
The output would be:
results = [-2,-2,-2,-2]
If N=3 (that is, group the elements of arr sequentially into groups of size 3):
results=[]
result_1 = (1+2+3)/3 - (4+5+6)/3
result_2 = (4+5+6)/3 - (7+8+9)/3
The output would be:
results = [-3,-3]
I want to do this by defining two functions:
Function 1 - Creates the arrays that will be used as input for the second function:
Parameters: array, N
returns: k groups of arrays, where k = length(arr) ÷ N (so there are k - 1 = (length(arr) ÷ N) - 1 consecutive pairs to compare)
Function 2 - The function that takes the arrays (two at a time) and performs the calculation, in this case the difference of means.
Parameters: array1, array2, ...
returns: list of the results
Important Note
My idea is to apply these functions to a stream of data, where the calculation will be the PSI (Population Stability Index).
So, if my stream has 10k samples and I call the first function with N=1000, then the input to the second function will be 1k samples plus the next 1k samples.
The process will be repeated until the end of the data stream.
I was trying to do this in Python (I already have the PSI code ready), but now I have decided to use Julia for it, and I am pretty new to Julia. So if anyone can shed some light here, it will be very helpful.

In Julia, if you have a big Vector and you want to calculate some statistics on groups of 3 elements, you could do:
julia> using Statistics # mean lives in the Statistics standard library
julia> a = collect(1:15); # creates a Vector [1,2,...,15]
julia> mean.(eachcol(reshape(a,3,length(a)÷3)))
5-element Vector{Float64}:
2.0
5.0
8.0
11.0
14.0
Note that both reshape and eachcol are non-allocating, so no data gets copied in the process (only the resulting vector of means is allocated).
If the length of a is not divisible by 3, you could truncate it before reshaping; to avoid allocation, use view for that:
julia> a = collect(1:16);
julia> mean.(eachcol(reshape(view(a,1:(length(a)÷3)*3),3,length(a)÷3)))
5-element Vector{Float64}:
2.0
5.0
8.0
11.0
14.0
Depending on what you actually want to do, you might also want to take a look at OnlineStats.jl: https://github.com/joshday/OnlineStats.jl
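For comparison, here is a rough Python sketch of the same truncate-then-group idea, using only the standard library (no NumPy assumed). `group_means` is a hypothetical helper name, not part of any library. Unlike Julia's `reshape` and `view`, the list slices below do copy data, so this only illustrates the indexing logic, not the non-allocating behaviour. Taking differences of consecutive group means then gives the results asked for in the question.

```python
from statistics import fmean
from typing import List, Sequence


def group_means(a: Sequence[float], n: int) -> List[float]:
    """Mean of each consecutive group of n elements; leftover elements are ignored."""
    if n <= 0:
        raise ValueError("group size must be positive")
    usable = len(a) - len(a) % n  # same truncation as view(a, 1:(length(a)÷3)*3) for n = 3
    return [fmean(a[i:i + n]) for i in range(0, usable, n)]


print(group_means(list(range(1, 16)), 3))  # [2.0, 5.0, 8.0, 11.0, 14.0]
print(group_means(list(range(1, 17)), 3))  # [2.0, 5.0, 8.0, 11.0, 14.0] (16 is dropped)

# Difference of means between consecutive groups, as in the question (N = 2):
m = group_means(list(range(1, 11)), 2)
print([x - y for x, y in zip(m, m[1:])])  # [-2.0, -2.0, -2.0, -2.0]
```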

I use JavaScript instead of Python, but it would be the same thing in Python.
You need a chunks function that takes an array and a chunk size (N), e.g. chunks([1,2,3,4], 2) -> [[1,2], [3,4]].
We also need a sum method that adds all elements of an array, e.g. sum([1,2]) -> 3.
Both JavaScript and Python support generator functions, a form of coroutine that you can use for lazy evaluation: a generator can pause its execution and resume on demand, which is useful for processing a stream of data.
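let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// JavaScript doesn't have a built-in `chunks` method yet, so we create one.
// It yields consecutive chunks of size N; a trailing partial chunk is dropped.
Array.prototype.chunks = function* (N) {
  let chunk = [];
  for (let value of this) {
    chunk.push(value);
    if (chunk.length == N) {
      yield chunk;
      chunk = [];
    }
  }
};
Array.prototype.sum = function () {
  return this.reduce((a, b) => a + b, 0);
};
// Yields mean(chunk k) - mean(chunk k+1) for each pair of consecutive chunks.
function* fnName(arr, N) {
  let chunks = arr.chunks(N);
  let first = chunks.next();
  if (first.done) return; // fewer than N elements: nothing to compare
  let a = first.value.sum();
  for (let b of chunks) {
    yield (a / N) - ((a = b.sum()) / N); // a/N is read before a is updated
  }
}
console.log([...fnName(arr, 2)]); // [-2, -2, -2, -2]
console.log([...fnName(arr, 3)]); // [-3, -3]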
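Since the answer notes it would be the same thing in Python, here is a rough Python sketch of the generator approach, split into the two functions the question asks for. The names `chunks`, `diff_of_means` and `compare_consecutive` are illustrative only. Making the metric a parameter is a design choice of this sketch, on the assumption that the asker's existing PSI code could be wrapped as a function of two sequences and passed in as `metric`.

```python
from itertools import islice
from statistics import fmean
from typing import Callable, Iterable, Iterator, List, Sequence


def chunks(stream: Iterable[float], n: int) -> Iterator[List[float]]:
    """Yield consecutive, non-overlapping chunks of size n; a trailing partial chunk is dropped."""
    if n <= 0:
        raise ValueError("chunk size must be positive")
    it = iter(stream)
    while True:
        chunk = list(islice(it, n))  # pull at most n items from the stream
        if len(chunk) < n:
            return
        yield chunk


def diff_of_means(a: Sequence[float], b: Sequence[float]) -> float:
    """Example metric: mean of the earlier chunk minus mean of the later chunk."""
    return fmean(a) - fmean(b)


def compare_consecutive(
    groups: Iterable[List[float]],
    metric: Callable[[Sequence[float], Sequence[float]], float] = diff_of_means,
) -> Iterator[float]:
    """Apply metric to each pair of consecutive groups: (g1, g2), (g2, g3), ..."""
    it = iter(groups)
    prev = next(it, None)
    if prev is None:
        return
    for cur in it:
        yield metric(prev, cur)
        prev = cur


arr = list(range(1, 11))
print(list(compare_consecutive(chunks(arr, 2))))  # [-2.0, -2.0, -2.0, -2.0]
print(list(compare_consecutive(chunks(arr, 3))))  # [-3.0, -3.0]
```

Because both functions are lazy, the input can be any iterable, including a stream that is read piece by piece; only the current and previous chunk are held in memory.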

Related

Error in swift playground: Cannot assign value of type 'Int' to type '[Int]'

var number: [Int] = [1,2,3,4]
var newArray: [Int] = []
for i in 0...number.count-1{
newArray = number[i] * number[i+1]
}
print(newArray)
I want output like this: [1 * 2, 2 * 3, 3 * 4].
I just don't see where the problem is.
var number: [Int] = [1,2,3,4]
let things = zip(number, number.dropFirst()).map(*)
Whenever you need to turn something like [1, 2, 3, 4] into pairs (1, 2), (2, 3), etc., the adjacentPairs() method from the Swift Algorithms package is useful: https://github.com/apple/swift-algorithms/blob/main/Sources/Algorithms/AdjacentPairs.swift
Or you can zip a collection with its dropFirst for the same result.
And whenever you need to turn an [A] into a [B], map with a function that turns an A into a B. In this example you want to turn an array of Int tuples, like [(1,2), (2,3), (3,4)], into an array of Int, like [2, 6, 12], by multiplying the two Ints together, so map with *.
The benefit of writing it this way is you would avoid the issues with your array mutation, getting index values wrong, running off the ends of arrays etc, and it's often easier to read and think about if you express it without the indices and assignments.
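The zip-with-dropFirst idea carries over directly to other languages. As a rough Python sketch (the helper name `adjacent_products` is made up for illustration), slicing off the first element plays the role of dropFirst(), and zip stops at the shorter sequence, so no index can run past the end. One difference: the Python slice copies the list, whereas Swift's dropFirst() returns a slice of the original array.

```python
from typing import List, Sequence


def adjacent_products(numbers: Sequence[int]) -> List[int]:
    """Multiply each element by its successor: [a, b, c, d] -> [a*b, b*c, c*d]."""
    # zip stops at the shorter input, so the last element is never paired
    # with a non-existent successor.
    return [x * y for x, y in zip(numbers, numbers[1:])]


print(adjacent_products([1, 2, 3, 4]))  # [2, 6, 12]
```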
The problem the compiler flags is that you assign a single Int value to an array of Int. The following line resolves that immediate issue:
newArray.append(number[i] * number[i+1])
This should compile, but it will then cause a runtime error. The reason is that when you fetch number[i+1] with i == number.count-1, you actually fetch number[number.count], which does not exist with 0-based indices. To get 3 products out of 4 array entries, your loop should iterate 3 times:
for i in 0 ..< number.count-1 {
Or, if you prefer closed ranges:
for i in 0 ... number.count-2 {
A more Swifty way would be to use map, as @Dris suggested. The return type of map is inferred from the result of the multiplication, so you can write:
let newArray = number.indices.dropLast().map { i in
number[i] * number[i+1]
}
You can use map() on enumerated(); dropping the last element ensures every element has a successor to multiply with:
let numbers = [1,2,3,4]
let newArray = numbers.enumerated().dropLast().map { $1 * numbers[$0 + 1] }
Maybe you should not loop up to count-1 but stop just before it, and append each result to the array:
for i in 0..<number.count-1 {
newArray.append(number[i] * number[i+1])
}

Array subsetting in Julia

In Julia, I defined a function to sample points uniformly inside the sphere of radius π (≈ 3.14) using rejection sampling as follows:
function spherical_sample(N::Int64)
# generate N points uniformly distributed inside sphere
# using rejection sampling:
points = pi*(2*rand(5*N,3).-1.0)
ind = sum(points.^2,dims=2) .<= pi^2
## ideally I wouldn't have to do this:
ind_ = dropdims(ind,dims=2)
return points[ind_,:][1:N,:]
end
I found a hack for subsetting arrays:
ind = sum(points.^2,dims=2) .<= pi^2
## ideally I wouldn't have to do this:
ind_ = dropdims(ind,dims=2)
But, in principle array indexing should be a one-liner. How could I do this better in Julia?
The problem is that you are creating a 2-dimensional index array (an N×1 BitMatrix). You can avoid it by using eachrow:
ind = sum.(eachrow(points.^2)) .<= pi^2
So your full function would be:
function spherical_sample(N::Int64)
points = pi*(2*rand(5*N,3).-1.0)
ind = sum.(eachrow(points.^2)) .<= pi^2
return points[ind,:][1:N,:]
end
Here is a one-liner:
points[(sum(points.^2,dims=2) .<= pi^2)[:],:][1:N, :]
Note that [:] flattens the N×1 BitMatrix into a BitVector, so it can be used to index the rows.
This does not answer your question directly (you already got two suggestions), but I thought I would hint at how you could implement the whole procedure differently if you want it to be efficient.
The first point is to avoid generating 5*N rows of data: it is wasteful, and it still does not guarantee N valid samples. The probability that a point drawn uniformly from the cube falls inside the sphere is π/6 ≈ 52%, so while 5*N rows are usually enough, it is possible that there will not be enough valid points to choose from, and the [1:N, :] selection will then throw an error.
Below is the code I would use to avoid this problem:
using Random # for rand!
function spherical_sample(N::Integer) # no need to require Int64 only here
    points = pi .* (2 .* rand(N, 3) .- 1.0) # uniform in [-pi, pi]^3; fused broadcasting allocates only once
    while N > 0 # we will run the code until we have N valid rows
        v = @view points[N, :] # use a view to avoid allocating
        if sum(x -> x^2, v) <= pi^2 # sum accepts a transformation function as its first argument
            N -= 1 # row is valid - move to the previous one
        else
            rand!(v) # row is invalid - resample it in place
            @. v = pi * (2 * v - 1.0) # rescale to [-pi, pi] in place via broadcasting
        end
    end
    return points
end
This one is pretty fast and uses StaticArrays (you could probably also implement something similar with ordinary tuples):
using StaticArrays
function sphsample(N)
    T = SVector{3, Float64}
    v = Vector{T}(undef, N)
    n = 1
    while n <= N
        p = rand(T) .- 0.5 # uniform in the cube [-0.5, 0.5)^3
        @inbounds v[n] = p .* 2π # scale to [-pi, pi)^3; overwritten later if rejected
        n += (sum(abs2, p) <= 0.25) # advance only if p lies inside the ball of radius 0.5
    end
    return v
end
On my laptop it is ~9x faster than the solution with views.
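As a rough illustration of the "redraw until valid" idea (instead of oversampling 5*N rows), here is a plain-Python sketch. It is only a toy under stated assumptions: pure Python is far slower than the vectorized Julia versions above, and it returns a list of (x, y, z) tuples rather than an N×3 matrix. Because rejected candidates are simply redrawn, it always returns exactly n points; with an acceptance rate of π/6 it needs about 6/π ≈ 1.9 candidates per accepted point on average.

```python
import math
import random
from typing import List, Optional, Tuple


def spherical_sample(
    n: int, radius: float = math.pi, rng: Optional[random.Random] = None
) -> List[Tuple[float, float, float]]:
    """Draw n points uniformly inside a ball of the given radius by rejection sampling."""
    if rng is None:
        rng = random.Random()
    r2 = radius * radius
    points: List[Tuple[float, float, float]] = []
    while len(points) < n:
        # Candidate drawn uniformly from the cube [-radius, radius)^3.
        x, y, z = (radius * (2.0 * rng.random() - 1.0) for _ in range(3))
        if x * x + y * y + z * z <= r2:  # keep it only if it lies inside the ball
            points.append((x, y, z))
        # Otherwise the candidate is discarded and the loop draws another one.
    return points


pts = spherical_sample(1000, rng=random.Random(42))
print(len(pts))  # 1000
```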

In MATLAB how can I write out a multidimensional array as a string that looks like a raw numpy array?

The Goal
(Forgive me for the length of this; it's mostly background and detail.)
I'm contributing to a TOML encoder/decoder for MATLAB and I'm working with numerical arrays right now. I want to input (and then be able to write out) the numerical array in the same format. This format is the nested square-bracket format that is used by numpy.array. For example, to make multi-dimensional arrays in numpy:
The following is in python, just to be clear. It is a useful example though my work is in MATLAB.
2D arrays
>> x = np.array([1,2])
>> x
array([1, 2])
>> x = np.array([[1],[2]])
>> x
array([[1],
[2]])
3D array
>> x = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>> x
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
4D array
>> x = np.array([[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]])
>> x
array([[[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]]],
[[[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16]]]])
The input is a logical construction of the dimensions by nested brackets. It turns out this works pretty well with the TOML array structure. I can already successfully parse and decode a numeric array of any size and any number of dimensions in this format from TOML to the MATLAB numerical array data type.
Now, I want to encode that MATLAB numerical array back into this char/string structure to write back out to TOML (or whatever string).
So I have the following 4D array in MATLAB (same 4D array as with numpy):
>> x = permute(reshape([1:16],2,2,2,2),[2,1,3,4])
x(:,:,1,1) =
1 2
3 4
x(:,:,2,1) =
5 6
7 8
x(:,:,1,2) =
9 10
11 12
x(:,:,2,2) =
13 14
15 16
And I want to turn that into a string that has the same format as the 4D numpy input (with some function named bracketarray or something):
>> str = bracketarray(x)
str =
'[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]'
I can then write out the string to a file.
EDIT: I should add that numpy.array2string() does essentially what I want, though it adds some extra whitespace characters. I can't use it as part of the solution, but it is basically the functionality I'm looking for.
The Problem
Here's my problem. I have successfully solved this problem for up to 3 dimensions using the following function, but I cannot for the life of me figure out how to extend it to N-dimensions. I feel like it's an issue of the right kind of counting for each dimension, making sure to not skip any and to nest the brackets correctly.
Current bracketarray.m that works up to 3D
function out = bracketarray(in, internal)
in_size = size(in);
in_dims = ndims(in);
% if array has only 2 dimensions, create the string
if in_dims == 2
storage = cell(in_size(1), 1);
for jj = 1:in_size(1)
storage{jj} = strcat('[', strjoin(split(num2str(in(jj, :)))', ','), ']');
end
if exist('internal', 'var') || in_size(1) > 1 || (in_size(1) == 1 && in_dims >= 3)
out = {strcat('[', strjoin(storage, ','), ']')};
else
out = storage;
end
return
% if array has more than 2 dimensions, recursively send planes of 2 dimensions for encoding
else
out = cell(in_size(end), 1);
for ii = 1:in_size(end) %<--- this doesn't track dimensions or counts of them
out(ii) = bracketarray(in(:,:,ii), 'internal'); %<--- this is limited to 3 dimensions atm. and out(indexing) need help
end
end
% bracket the final bit together
if in_size(1) > 1 || (in_size(1) == 1 && in_dims >= 3)
out = {strcat('[', strjoin(out, ','), ']')};
end
end
Help me Obi-wan Kenobis, y'all are my only hope!
EDIT 2: Added test suite below and modified current code a bit.
Test Suite
Here is a test suite to check whether the output is what it should be. Just copy and paste it into the MATLAB command window. With my current posted code, they all return true except the ones with more than 3 dimensions. My current code outputs a cell. If your solution outputs something different (like a string), you'll have to remove the curly brackets from the test suite.
isequal(bracketarray(ones(1,1)), {'[1]'})
isequal(bracketarray(ones(2,1)), {'[[1],[1]]'})
isequal(bracketarray(ones(1,2)), {'[1,1]'})
isequal(bracketarray(ones(2,2)), {'[[1,1],[1,1]]'})
isequal(bracketarray(ones(3,2)), {'[[1,1],[1,1],[1,1]]'})
isequal(bracketarray(ones(2,3)), {'[[1,1,1],[1,1,1]]'})
isequal(bracketarray(ones(1,1,2)), {'[[[1]],[[1]]]'})
isequal(bracketarray(ones(2,1,2)), {'[[[1],[1]],[[1],[1]]]'})
isequal(bracketarray(ones(1,2,2)), {'[[[1,1]],[[1,1]]]'})
isequal(bracketarray(ones(2,2,2)), {'[[[1,1],[1,1]],[[1,1],[1,1]]]'})
isequal(bracketarray(ones(1,1,1,2)), {'[[[[1]]],[[[1]]]]'})
isequal(bracketarray(ones(2,1,1,2)), {'[[[[1],[1]]],[[[1],[1]]]]'})
isequal(bracketarray(ones(1,2,1,2)), {'[[[[1,1]]],[[[1,1]]]]'})
isequal(bracketarray(ones(1,1,2,2)), {'[[[[1]],[[1]]],[[[1]],[[1]]]]'})
isequal(bracketarray(ones(2,1,2,2)), {'[[[[1],[1]],[[1],[1]]],[[[1],[1]],[[1],[1]]]]'})
isequal(bracketarray(ones(1,2,2,2)), {'[[[[1,1]],[[1,1]]],[[[1,1]],[[1,1]]]]'})
isequal(bracketarray(ones(2,2,2,2)), {'[[[[1,1],[1,1]],[[1,1],[1,1]]],[[[1,1],[1,1]],[[1,1],[1,1]]]]'})
isequal(bracketarray(permute(reshape([1:16],2,2,2,2),[2,1,3,4])), {'[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]'})
isequal(bracketarray(ones(1,1,1,1,2)), {'[[[[[1]]]],[[[[1]]]]]'})
I think it would be easier to just loop and use join. Your test cases pass.
function out = bracketarray_matlabbit(in)
out = permute(in, [2 1 3:ndims(in)]);
out = string(out);
dimsToCat = ndims(out);
if iscolumn(out)
dimsToCat = dimsToCat-1;
end
for i = 1:dimsToCat
out = "[" + join(out, ",", i) + "]";
end
end
This also seems to be faster than the route you were pursuing:
>> x = permute(reshape([1:16],2,2,2,2),[2,1,3,4]);
>> tic; for i = 1:1e4; bracketarray_matlabbit(x); end; toc
Elapsed time is 0.187955 seconds.
>> tic; for i = 1:1e4; bracketarray_cris_luengo(x); end; toc
Elapsed time is 5.859952 seconds.
The recursive function is almost complete. What is missing is a way to index the last dimension. There are several ways to do this, the neatest, I find, is as follows:
n = ndims(x);
index = cell(n-1, 1);
index(:) = {':'};
y = x(index{:}, ii);
It's a little tricky at first, but this is what happens: index is a set of n-1 strings ':'. index{:} is a comma-separated list of these strings. When we index x(index{:},ii) we actually do x(:,:,:,ii) (if n is 4).
The completed recursive function is:
function out = bracketarray(in)
n = ndims(in);
if n == 2
% Fill in your n==2 code here
else
% if array has more than 2 dimensions, recursively send planes of 2 dimensions for encoding
index = cell(n-1, 1);
index(:) = {':'};
storage = cell(size(in, n), 1);
for ii = 1:size(in, n)
storage(ii) = bracketarray(in(index{:}, ii)); % last dimension automatically removed
end
end
out = { strcat('[', strjoin(storage, ','), ']') };
Note that I have preallocated the storage cell array to prevent it from being resized in every loop iteration. You should do the same in your 2D case code. Preallocating is important in MATLAB for performance reasons, and the MATLAB Editor should warn you about this too.
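The recursive idea both answers build on (split off one dimension, encode each sub-array, join the pieces with commas, wrap the result in brackets) can also be sketched outside MATLAB. Below is a rough Python version; `bracket_array` is a hypothetical name. It assumes the data arrive as a flat list in row-major (C) order, which is NumPy's default; MATLAB stores arrays in column-major order, so that ordering would have to be handled separately. It also treats the shape literally, as NumPy does, so shape (1, 2) gives "[[1,1]]", whereas the question's test suite expects '[1,1]' for ones(1,2).

```python
from math import prod
from typing import Sequence


def bracket_array(flat: Sequence[int], shape: Sequence[int]) -> str:
    """Encode row-major flat data with the given shape as a nested-bracket string."""
    if not shape:
        raise ValueError("shape must have at least one dimension")
    if len(flat) != prod(shape):
        raise ValueError("flat data does not match shape")
    if len(shape) == 1:
        # Innermost dimension: just the comma-separated values.
        return "[" + ",".join(str(v) for v in flat) + "]"
    step = prod(shape[1:])  # number of elements in each sub-array along the first axis
    parts = [
        bracket_array(flat[i * step:(i + 1) * step], shape[1:])
        for i in range(shape[0])
    ]
    return "[" + ",".join(parts) + "]"


print(bracket_array(list(range(1, 17)), (2, 2, 2, 2)))
# [[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]
```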

How to get average of values in array between two given indexes in Swift

I'm trying to get the average of the values between two indexes in an array. The solution I first came to reduces the array to the required range, before taking the sum of values divided by the number of values. A simplified version looks like this:
let array = [0, 2, 4, 6, 8, 10, 12]
// The aim is to take the average of the values between array[n] and array[.count - 1].
I attempted with the following code:
func avgOf(x: Int) throws -> String {
let avgforx = solveList.count - x
// Error handling to check if x in average of x does not overstep bounds
guard avgforx > 0 else {
throw FuncError.avgNotPossible
}
solveList.removeSubrange(ClosedRange(uncheckedBounds: (lower: 0, upper: avgforx - 1)))
let avgx = (solveList.reduce(0, +)) / Double(x)
// Rounding
let roundedAvgOfX = (avgx * 1000).rounded() / 1000
print(roundedAvgOfX)
return "\(roundedAvgOfX)"
}
where avgforx is used to represent the lower bound: array[(.count - 1) - x]
The guard statement makes sure that if the index is out of range, the error is handled properly.
solveList.removeSubrange was my initial solution, as it removes the values outside of the needed index range (and subsequently delivers the needed result), but this has proved to be problematic as the values not taken in the average should remain.
The line in removeSubrange basically takes a needed index field (e.g. array[5] to array[10]), removes all the values from array[0] to array[4], and then takes the sum of the resulting array divided by the number of elements.
Instead, the values in array[0] to array[4] should remain.
I would appreciate any help.
(Swift 4, Xcode 10)
Apart from the fact that the original array is modified, the error in your code is that it divides the sum of the remaining elements by the count of the removed elements (x) instead of dividing by the count of remaining elements.
A better approach might be to define a function which computes the average of a collection of integers:
func average<C: Collection>(of c: C) -> Double where C.Element == Int {
precondition(!c.isEmpty, "Cannot compute average of empty collection")
return Double(c.reduce(0, +))/Double(c.count)
}
Now you can use that with slices, without modifying the original array:
let array = [0, 2, 4, 6, 8, 10, 12]
let avg1 = average(of: array[3...]) // Average from index 3 to the end
let avg2 = average(of: array[2...4]) // Average from index 2 to 4
let avg3 = average(of: array[..<5]) // Average of first 5 elements
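For readers coming from Python, the same "average a slice, leave the original alone" idea looks roughly like this. It is a sketch using `statistics.fmean` (Python 3.8+); as with the precondition in the Swift version, an empty slice is an error (`fmean` raises `statistics.StatisticsError`). Note that Python slices use half-open ranges, so Swift's closed range `2...4` corresponds to `2:5`.

```python
from statistics import fmean

array = [0, 2, 4, 6, 8, 10, 12]

# Each slice is a new list; `array` itself is never modified.
avg1 = fmean(array[3:])   # index 3 to the end, like array[3...]
avg2 = fmean(array[2:5])  # indices 2 to 4, like array[2...4] (the stop index is exclusive)
avg3 = fmean(array[:5])   # first 5 elements, like array[..<5]

print(avg1, avg2, avg3)  # 9.0 6.0 4.0
```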

matlab maximum of array with unknown dimension

I would like to compute the maximum of an N-by-N-by-...-by-N array and, more importantly, its coordinates, without specifying the number of dimensions.
For example, let's take:
A = [2 3];
B = [2 3; 3 4];
The function (let's call it MAXI) should return the following values for matrix A:
[fmax, coor] = MAXI(A)
fmax =
3
coor =
2
and for matrix B:
[fmax, coor] = MAXI(B)
fmax =
4
coor=
2 2
The main problem is not to write code that works for one particular case, but code that works, as quickly as possible, for any input (including higher dimensions).
To find the absolute maximum, you'll have to convert your input matrix into a column vector first and find the linear index of the greatest element, and then convert it to the coordinates with ind2sub. This can be a little bit tricky though, because ind2sub requires specifying a known number of output variables. For that purpose we can employ cell arrays and comma-separated lists, like so:
[fmax, coor] = max(A(:));
if ~isvector(A)
    C = cell(1, ndims(A));
    [C{:}] = ind2sub(size(A), coor);
    coor = cell2mat(C);
end
EDIT: I've added an additional if statement that checks if the input is a matrix or a vector, and in case of the latter it returns the linear index itself as is.
In a function, it looks like so:
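function [fmax, coor] = maxi(A)
    [fmax, coor] = max(A(:));       % maximum value and its linear index
    if ~isvector(A)                 % vectors: keep the linear index as is
        C = cell(1, ndims(A));      % one output slot per dimension
        [C{:}] = ind2sub(size(A), coor);
        coor = cell2mat(C);
    end
end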
Example
A = [2 3; 3 4];
[fmax, coor] = maxi(A)
fmax =
4
coor =
2 2
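The same "find the linear index, then convert it to subscripts" approach can be sketched in Python. `ind2sub` and `maxi` below are hypothetical helpers written to mimic MATLAB's conventions (1-based, column-major linear indices, first occurrence of the maximum); the input is assumed to be the data flattened in the order `A(:)` would produce, plus the array size. The conversion loops over however many dimensions the size has, so nothing is hard-coded to 2-D. Unlike the MATLAB version, this sketch always returns subscripts, even for vectors.

```python
from math import prod
from typing import Sequence, Tuple


def ind2sub(shape: Sequence[int], index: int) -> Tuple[int, ...]:
    """Convert a 1-based, column-major linear index into 1-based subscripts."""
    if not 1 <= index <= prod(shape):
        raise IndexError("linear index out of range")
    i = index - 1
    subs = []
    for dim in shape:  # the first dimension varies fastest in column-major order
        i, r = divmod(i, dim)
        subs.append(r + 1)
    return tuple(subs)


def maxi(flat: Sequence[float], shape: Sequence[int]) -> Tuple[float, Tuple[int, ...]]:
    """Return the maximum and its subscripts; flat is the data in A(:) order."""
    fmax = max(flat)
    linear = flat.index(fmax) + 1  # first occurrence, 1-based
    return fmax, ind2sub(shape, linear)


# B = [2 3; 3 4] in MATLAB; B(:) is [2; 3; 3; 4]
print(maxi([2, 3, 3, 4], (2, 2)))  # (4, (2, 2))
```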
