Optimizing custom fill of a 2d array in Julia - arrays

I'm fairly new to Julia and am trying to use the fill! method to improve the performance of my code. Currently, I read a 2d array from a file (say read_array) and perform row operations on it to get a processed_array as follows:
function preprocess(matrix)
    # Initialise
    processed_array = Array{Float64,2}(undef, size(matrix));
    # first row of processed_array is the difference of first two rows of matrix
    processed_array[1,:] = (matrix[2,:] .- matrix[1,:]);
    # last row of processed_array is difference of last two rows of matrix
    processed_array[end,:] = (matrix[end,:] .- matrix[end-1,:]);
    # all other rows of processed_array is the mean-difference of other two rows
    processed_array[2:end-1,:] = (matrix[3:end,:] .- matrix[1:end-2,:]) .* 0.5;
    return processed_array
end
However, when I try using the fill! method I get a MethodError.
processed_array = copy(matrix)
fill!(processed_array [1,:],d[2,:]-d[1,:])
MethodError: Cannot convert an object of type Matrix{Float64} to an object of type Float64
I'll be glad if someone can tell me what I'm missing and also suggest a method to optimize the code. Thanks in advance!

fill!(A, x) is used to fill the array A with a unique value x, so it's not what you want anyway.
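To make the difference concrete, here is a minimal sketch (the names A and row are made up for illustration). It shows that fill! writes one scalar into every element, that indexing with a slice such as A[1, :] produces a copy, and that a dotted assignment is one way to write a vector into a row:

```julia
A = zeros(2, 3)

# fill! stores the same scalar in every element of A
fill!(A, 7.0)

# Indexing with a slice makes a copy, so this fill! changes the copy, not A
fill!(A[1, :], 0.0)

# To write a vector into a row, use broadcast assignment instead
row = [1.0, 2.0, 3.0]
A[1, :] .= row
```

After these lines the first row of A holds row and the second row still holds the filled value.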
What you could do for a little performance gain is to broadcast the assignments. That is, use .= instead of =. If you want, you can also use the @. macro to automatically add dots everywhere for you (for maybe cleaner/easier-to-read code):
function preprocess(matrix)
    out = Array{Float64,2}(undef, size(matrix))
    @views @. out[1,:] = matrix[2,:] - matrix[1,:]
    @views @. out[end,:] = matrix[end,:] - matrix[end-1,:]
    @views @. out[2:end-1,:] = 0.5 * (matrix[3:end,:] - matrix[1:end-2,:])
    return out
end
For optimal performance, I think you probably want to write the loops explicitly and use multithreading with a package like LoopVectorization.jl for example.
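As a rough idea of what the explicit loops might look like, here is a sketch that uses only Base Julia. The function name preprocess_loops and the argument check are assumptions for illustration, and no LoopVectorization.jl macros are applied here:

```julia
# Hypothetical loop-based variant of preprocess; the name is an assumption.
function preprocess_loops(matrix::Matrix{<:Real})
    nrows, ncols = size(matrix)
    nrows >= 2 || throw(ArgumentError("need at least two rows"))
    out = Matrix{Float64}(undef, nrows, ncols)
    # Julia arrays are column-major, so the inner loop runs down a column
    @inbounds for j in 1:ncols
        # first row: forward difference
        out[1, j] = matrix[2, j] - matrix[1, j]
        # interior rows: half the difference of the two neighbouring rows
        for i in 2:nrows-1
            out[i, j] = 0.5 * (matrix[i+1, j] - matrix[i-1, j])
        end
        # last row: backward difference
        out[nrows, j] = matrix[nrows, j] - matrix[nrows-1, j]
    end
    return out
end
```

Because every column is processed independently, the outer loop is the natural place to try threading. Whether that pays off depends on the matrix size, so it is worth benchmarking.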
PS: Note that in your code comments you wrote "cols" instead of "rows", and you wrote "mean" but take a difference. (Not sure it was intentional.)

Related

Julia: How to efficiently sort subarrays of 2 large arrays in parallel?

I have large 1D arrays a and b, and an array of pointers I that separates them into subarrays. My a and b barely fit into RAM and are of different dtypes (one contains UInt32s, the other Rational{Int64}s), so I don’t want to join them into a 2D array, to avoid changing dtypes.
For each i in I[2:end], I wish to sort the subarray a[I[i-1],I[i]-1] and apply the same permutation to the corresponding subarray b[I[i-1],I[i]-1]. My attempt at this is:
function sort!(a,b)
p=sortperm(a);
a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in I[2:end]
    sort!( a[I[i-1], I[i]-1], b[I[i-1], I[i]-1] )
end
However, already on a small example, I see that sort! does not alter the view of a subarray:
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a,b); println(a,"\n",b) # works like it should
a, b = rand(1:10,10), rand(-1000:1000,10) .//1
sort!(a[1:5],b[1:5]); println(a,"\n",b) # does nothing!!!
Any help on how to write such a sort! function (as efficiently as possible) is welcome.
Background: I am dealing with data coming from sparse arrays:
using SparseArrays, Random
n=10^6; x=sprand(n,n,1000/n); #random matrix with 1000 entries per column on average
x = SparseMatrixCSC(n,n,x.colptr,x.rowval,rand(-99:99,nnz(x)).//1); #changing entries to rationals
U = randperm(n) #permutation of rows of matrix x
a, b, I = U[x.rowval], x.nzval, x.colptr;
Thus these a,b,I serve as good examples to my posted problem. What I am trying to do is sort the row indices (and corresponding matrix values) of entries in each column.
Note: I already asked this question on Julia discourse here, but received no replies nor comments. If I can improve on the quality of the question, don't hesitate to tell me.
The problem is that a[1:5] is not a view, it's just a copy. Instead, make the view like this:
function sort!(a,b)
    p=sortperm(a);
    a[:], b[:] = a[p], b[p]
end
Threads.@threads for i in 2:length(I)  # iterate over positions in I, not over its values
    sort!(view(a, I[i-1]:I[i]-1), view(b, I[i-1]:I[i]-1))
end
That is what you are looking for.
P.S. @view a[2:3], @view(a[2:3]) or the @views macro can help make things more readable.
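Putting the pieces together, a self-contained sketch could look like the following. The names sortboth! and sort_segments! are hypothetical and chosen so that Base.sort! is not shadowed, and I is assumed to hold segment start positions the way a CSC colptr does:

```julia
# Hypothetical helper names; chosen so that Base.sort! is not shadowed.
function sortboth!(a::AbstractVector, b::AbstractVector)
    p = sortperm(a)
    a .= a[p]   # a[p] is a fresh copy, so overwriting a in place is safe
    b .= b[p]
    return a, b
end

function sort_segments!(a, b, I)
    # Segment i-1 covers indices I[i-1] through I[i]-1, as in a CSC colptr
    Threads.@threads for i in 2:length(I)
        r = I[i-1]:(I[i]-1)
        sortboth!(view(a, r), view(b, r))
    end
    return a, b
end

# Small usage example with two segments, 1:3 and 4:6
a = [3, 1, 2, 9, 7, 8]
b = [30, 10, 20, 90, 70, 80] .// 1
sort_segments!(a, b, [1, 4, 7])
```

Each segment is sorted independently through a view, so a and b are modified in place and the segments can be processed on different threads without overlapping writes. The permutation vector and the temporary copies a[p] and b[p] are still allocated per segment.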
First of all, you shouldn't redefine Base.sort! like this. Now, sort! will shadow Base.sort! and you'll get errors if you call sort!(a).
Also, a[I[i-1], I[i]-1] and b[I[i-1], I[i]-1] are not slices, they are just single elements, so nothing should happen if you sort them either with views or not. And sorting arrays in a moving-window way like this is not correct.
What you want to do here, since your vectors are huge, is call p = partialsortperm(a[i:end], i:i+block_size-1) repeatedly in a loop, choosing a block_size that fits into memory, and modify both a and b according to p, then continue to the remaining part of a and find next p and repeat until nothing remains in a to be sorted. I'll leave the implementation as an exercise for you, but you can come back if you get stuck on something.

Looping over array values in Lua

I have a variable as follows
local armies = {
[1] = "ARMY_1",
[2] = "ARMY_3",
[3] = "ARMY_6",
[4] = "ARMY_7",
}
Now I want to do an action for each value. What is the best way to loop over the values? The typical thing I'm finding on the internet is this:
for i, armyName in pairs(armies) do
doStuffWithArmyName(armyName)
end
I don't like that as it results in an unused variable i. The following approach avoids that and is what I am currently using:
for i in pairs(armies) do
doStuffWithArmyName(armies[i])
end
However, this is still not as readable and simple as I'd like, since it iterates over the keys and then gets the value using the key (rather imperatively). Another issue I have with both approaches is that pairs is needed. The value being looped over here is one I have control over, and I'd prefer that it can be looped over as easily as possible.
Is there a better way to do such a loop if I only care about the values? Is there a way to address the concerns I listed?
I'm using Lua 5.0 (and am quite new to the language)
The idiomatic way to iterate over an array is:
for _, armyName in ipairs(armies) do
doStuffWithArmyName(armyName)
end
Note that:
Use ipairs rather than pairs for arrays.
If you aren't interested in the key, use _ as a placeholder.
If, for some reason, that _ placeholder still concerns you, make your own iterator. Programming in Lua provides it as an example:
function values(t)
local i = 0
return function() i = i + 1; return t[i] end
end
Usage:
for v in values(armies) do
print(v)
end

Concise way to create an array filled within a range in Matlab

I need to create an array filled within a range in Matlab
e.g.
from=2
to=6
increment=1
result
[2,3,4,5,6]
e.g.
from=15
to=25
increment=2
result
[15,17,19,21,23,25]
Obviously I can write a loop to do this from scratch, but I'm wondering if there is a concise and efficient way to do it with built-in MATLAB commands, since it seems like a very common operation.
EDIT
If I use linspace the operation is weird since the spacing between the points is (x2-x1)/(n-1).
This can be handled simply by the : operator in the following notation
array = from:increment:to
Note that the increment defaults to 1 if written with only one colon separator
array = from:to
Example
array1 = 2:6 %Produces [2,3,4,5,6]
array2 = 15:2:25 %Produces [15,17,19,21,23,25]

Reducing the size of an array by averaging points within the array (IDL)

While I am sure there is an answer, and this question is very low-level (but it's always the easy things that trip you up), my main issue is trying to word the question.
Say I have the following arrays:
time=[0,1,2,3,4,5,6,7,8,9,10,11] ;in seconds
data=[0,1,2,3,4,5,6,7,8,9,10,11]
The 'time' array is in bins of '1s', but instead I would like the array to be in bins of '2s' where the data is then the mean:
time=[0,2,4,6,8,10] ;in seconds
data=[0.5,2.5,4.5,6.5,8.5,10.5]
Is there (and I am sure there is) an IDL function to implement this in IDL?
my actual data array is:
DATA DOUBLE = Array[15286473]
so I would rather use an existing, efficient, solution than unnecessarily creating my own.
Cheers,
Paul
NB: I can change the time array to what I want by interpolating the data (INTERPOL)
IDL> x=[0,1,2,3,4,5,6,7,8,9,10]
IDL> x_new=interpol(x,(n_elements(x)/2)+1.)
IDL> print, x_new
0.00000 2.00000 4.00000 6.00000 8.00000 10.0000
The issue is just with the data array
I think you need rebin: http://www.exelisvis.com/docs/REBIN.html
congrid provides similar functionality. If rebin does not solve your problem, this should work:
step = 2
select = step * lindgen(floor(n_elements(time)/step)) ; lindgen gives long indices, needed for large arrays
new_time = (smooth(time, step))[select]
new_data = (smooth(data, step))[select]
You might want to set /edge_truncate for smooth, based on your requirements. Also, won't interpol work for you?
I can think of a few ways to do this, but the easiest would be the following:
nd = N_ELEMENTS(data)
ind = LINDGEN(nd)
upi = ind[1:(nd - 1L):2]
dni = ind[0:(nd - 1L):2]
where the form of indexing I have used is described here. One can write an array as ind[s0:s1:n] where s0 is the starting element, s1 is the ending element, and n is the stride.
Now that we have the indices for the adjacent elements, we can define the averages in a vectorized format as:
avg_data = (data[upi] + data[dni])/2.0
You can do something similar for your time stamps or use INTERPOL if you wish.
There are more complicated methods (e.g., the trapezoid rule) for doing this, but the above is a quick and simple solution.

Distributing a function over a single dimension of an array in MATLAB?

I often find myself wanting to collapse an n-dimensional matrix across one dimension using a custom function, and can't figure out if there is a concise incantation I can use to do this.
For example, when parsing an image, I often want to do something like this. (Note! Illustrative example only. I know about rgb2gray for this specific case.)
img = imread('whatever.jpg');
s = size(img);
for i=1:s(1)
for j=1:s(2)
bw_img(i,j) = mean(img(i,j,:));
end
end
I would love to express this as something like:
bw = on(color, 3, @mean);
or
bw(:,:,1) = mean(color);
Is there a short way to do this?
EDIT: Apparently mean already does this; I want to be able to do this for any function I've written. E.g.,
...
filtered_img(i,j) = reddish_tint(img(i,j,:));
...
where
function out = reddish_tint(in)
out = in(1) * 0.5 + in(2) * 0.25 + in(3) * 0.25;
end
Many basic MATLAB functions, like MEAN, MAX, MIN, SUM, etc., are designed to operate across a specific dimension:
bw = mean(img,3); %# Mean across dimension 3
You can also take advantage of the fact that MATLAB arithmetic operators are designed to operate in an element-wise fashion on matrices. For example, the operation in your function reddish_tint can be applied to all pixels of your image with this single line:
filtered_img = 0.5.*img(:,:,1)+0.25.*img(:,:,2)+0.25.*img(:,:,3);
To handle a more general case where you want to apply a function to an arbitrary dimension of an N-dimensional matrix, you will probably want to write your function such that it accepts an additional input argument for which dimension to operate over (like the above-mentioned MATLAB functions do) and then uses some simple logic (i.e. if-else statements) and element-wise matrix operations to apply its computations to the proper dimension of the matrix.
Although I would not suggest using it, there is a quick-and-dirty solution, but it's rather ugly and computationally more expensive. You can use the function NUM2CELL to collect values along a dimension of your array into cells of a cell array, then apply your function to each cell using the function CELLFUN:
cellArray = num2cell(img,3); %# Collect values in dimension 3 into cells
filtered_img = cellfun(@reddish_tint,cellArray); %# Apply function to each cell
I wrote a helper function called 'vecfun' that might be useful for this, if it's what you're trying to achieve?
link
You could use BSXFUN for at least some of your tasks. It performs an element-wise operation between two arrays by expanding the dimensions of size 1 to match the size in the other array. The 'reddish tint' function would become
reddish_image = bsxfun(@times,img,cat(3,0.5,0.25,0.25));
filtered_img = sum(reddish_image,3);
All the above statements require in order to work is that the third dimension of img has size 1 or 3. The number and size of the other dimensions can be chosen freely.
If you are consistently trying to apply a function to the vector formed by the third dimension in a block of images, I recommend using a pair of reshapes, for instance:
Img = rand(480,640,3);
sz = size(Img);
output = reshape(myFavoriteFunction(reshape(Img,[prod(sz(1:2)),sz(3)])'),sz);
This way you can swap in any function that operates on matrices along their first dimension.
edit.
The above code will crash if you input an image which has only one layer. The function below fixes that.
function o = nLayerImage2MatrixOfPixels(i)
%function o = nLayerImage2MatrixOfPixels(i)
s = size(i);
if(length(s) == 2)
    s3 = 1; % a 2-D input has a single layer
else
    s3 = s(3);
end
o = reshape(i,[s(1)*s(2),s3])';
Well, if you are only concerned with multiplying vectors together you could just use the dot product, like this:
bw(:,:,1)*[0.3;0.2;0.5]
taking care that the shapes of your vectors conform.
