equivalent of numpy.c_ in julia - arrays

Hi, I am going through the book https://nnfs.io/ but using JuliaLang (it's a self-challenge to get to know the language better and use it more often, rather than doing the same old thing in Python).
I have come across a part of the book in which they have written a custom function, and I need to recreate it in JuliaLang...
source: https://cs231n.github.io/neural-networks-case-study/
Python:
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
    ix = range(N*j,N*(j+1))
    r = np.linspace(0.0,1,N) # radius
    t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
# lets visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()
my julia version so far....
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros((N*K, D))
y = zeros(UInt8, N*K)
# See https://docs.julialang.org/en/v1/base/math/#Base.range
for j in range(0,length=K)
    ix = range(N*(j), length = N+1)
    radius = LinRange(0.0, 1, N)
    theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
    X[ix] = ????????
end
Notice the ???????? area: I am now trying to work out whether Julia has an equivalent of this NumPy function:
https://numpy.org/doc/stable/reference/generated/numpy.c_.html
Any help is appreciated, or just tell me if I need to write something myself.

This is a special object that provides nice syntax for column concatenation. In Julia this is built into the language, hence you can do:
julia> a=[1,2,3];
julia> b=[4,5,6];
julia> [a b]
3×2 Matrix{Int64}:
 1  4
 2  5
 3  6
For your case the Julian equivalent of np.c_[r*np.sin(t), r*np.cos(t)] should be:
[r .* sin.(t) r .* cos.(t)]
To understand Python's motivation you can also have a look at:
numpy.r_ is not a function. What is it?

The equivalent of numpy.c_ would seem to be horizontal concatenation, which you can do either with the hcat function or simply with (e.g.) [a b]. Fixing a few other issues with the translation so far, we end up with:
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros(N*K, D)
y = zeros(UInt8, N*K)
for j in range(0,length=K)
    ix = (N*j+1):(N*(j+1))
    radius = LinRange(0.0, 1, N)
    theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
    X[ix,:] .= [radius.*sin.(theta) radius.*cos.(theta)]
    y[ix] .= j
end
# visualize the data:
using Plots
scatter(X[:,1], X[:,2], zcolor=y, framestyle=:box)

Related

numpy binned mean, conserving extra axes

It seems I am stuck on the following problem with numpy.
I have an array X with shape: X.shape = (nexp, ntime, ndim, npart)
I need to compute binned statistics on this array along the npart dimension, according to the values in binvals (and some bins), while keeping all the other dimensions, because I have to use the binned statistic to remove some bias from the original array X. The binning values have shape binvals.shape = (nexp, ntime, npart).
A complete, minimal example to explain what I am trying to do. Note that, in reality, I am working on large arrays and with several hundreds of bins (so this implementation takes forever):
import numpy as np
np.random.seed(12345)
X = np.random.randn(24).reshape(1,2,3,4)
binvals = np.random.randn(8).reshape(1,2,4)
bins = [-np.inf, 0, np.inf]
nexp, ntime, ndim, npart = X.shape
cleanX = np.zeros_like(X)
for ne in range(nexp):
    for nt in range(ntime):
        indices = np.digitize(binvals[ne, nt, :], bins)
        for nd in range(ndim):
            for nb in range(1, len(bins)):
                inds = indices==nb
                cleanX[ne, nt, nd, inds] = X[ne, nt, nd, inds] - \
                    np.mean(X[ne, nt, nd, inds], axis = -1)
Looking at the results of this may make it clearer?
In [8]: X
Out[8]:
array([[[[-0.20470766,  0.47894334, -0.51943872, -0.5557303 ],
         [ 1.96578057,  1.39340583,  0.09290788,  0.28174615],
         [ 0.76902257,  1.24643474,  1.00718936, -1.29622111]],

        [[ 0.27499163,  0.22891288,  1.35291684,  0.88642934],
         [-2.00163731, -0.37184254,  1.66902531, -0.43856974],
         [-0.53974145,  0.47698501,  3.24894392, -1.02122752]]]])
In [10]: cleanX
Out[10]:
array([[[[ 0.        ,  0.67768523, -0.32069682, -0.35698841],
         [ 0.        ,  0.80405255, -0.49644541, -0.30760713],
         [ 0.        ,  0.92730041,  0.68805503, -1.61535544]],

        [[ 0.02303938, -0.02303938,  0.23324375, -0.23324375],
         [-0.81489739,  0.81489739,  1.05379752, -1.05379752],
         [-0.50836323,  0.50836323,  2.13508572, -2.13508572]]]])
In [12]: binvals
Out[12]:
array([[[ -5.77087303e-01,   1.24121276e-01,   3.02613562e-01,   5.23772068e-01],
        [  9.40277775e-04,   1.34380979e+00,  -7.13543985e-01,  -8.31153539e-01]]])
Is there a vectorized solution? I thought of using scipy.stats.binned_statistic, but I seem to be unable to understand how to use it for this aim. Thanks!
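For reference, scipy.stats.binned_statistic computes per-bin statistics along a single axis only, which is essentially why it does not directly handle the extra nexp/ntime/ndim dimensions here. A minimal 1-D sketch (variable names are illustrative only):
import numpy as np
from scipy import stats

rng = np.random.RandomState(12345)
vals = rng.randn(100)   # values that define the binning (like binvals for one (ne, nt))
x = rng.randn(100)      # values to de-bias (like one row X[ne, nt, nd, :])

# mean of x in each bin of vals; binnumber[i] is the (1-based) bin of sample i
mean_per_bin, edges, binnumber = stats.binned_statistic(vals, x, statistic='mean', bins=4)
x_debiased = x - mean_per_bin[binnumber - 1]  # subtract each sample's own bin mean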
import numpy as np
np.random.seed(100)
nexp = 3
ntime = 4
ndim = 5
npart = 100
nbins = 4
binvals = np.random.rand(nexp, ntime, npart)
X = np.random.rand(nexp, ntime, ndim, npart)
bins = np.linspace(0, 1, nbins + 1)
# bin index of every particle, shape (nexp, ntime, 1, npart); values run 1..nbins here
d = np.digitize(binvals, bins)[:, :, np.newaxis, :]
# the possible bin labels 1..nbins, one per leading "layer"
r = np.arange(1, len(bins)).reshape((-1, 1, 1, 1, 1))
# boolean mask of shape (nbins, nexp, ntime, 1, npart): m[b-1] marks particles in bin b
m = d[np.newaxis, ...] == r
# per-bin particle counts (clipped so empty bins do not divide by zero)
counts = np.sum(m, axis=-1, keepdims=True).clip(min=1)
# per-bin means of X along the particle axis, shape (nbins, nexp, ntime, ndim, 1)
means = np.sum(X[np.newaxis, ...] * m, axis=-1, keepdims=True) / counts
# subtract from every element the mean of its own bin
cleanX = X - np.choose(d - 1, means)
Ok, I think I got it, mainly based on the answer by @jdehesa.
clean2 = np.zeros_like(X)
d = np.digitize(binvals, bins)
for i in range(1, len(bins)):
    m = d == i
    minds = np.where(m)
    sl = (*minds[:2], slice(None), minds[2])  # tuple index (a plain list here is deprecated in newer NumPy)
    msum = m.sum(axis=-1)
    clean2[sl] = (X - \
        (np.sum(X * m[..., np.newaxis, :], axis=-1) /
         msum[..., np.newaxis])[..., np.newaxis])[sl]
This gives the same results as my original code.
On the small arrays I have in the example here, this solution is approximately three times as fast as the original code. I expect it to be way faster on larger arrays.
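A quick way to check that claim, assuming cleanX from the loop version and clean2 were computed from the same X, binvals and bins:
print(np.allclose(cleanX, clean2))  # expected: True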
Update:
Indeed it's faster on larger arrays (I didn't do any formal test), but despite this it only just reaches an acceptable level of performance... any further suggestions on extra vectorizations would be very welcome.

Change diagonals of an array of matrices

I have an application with an array of matrices. I have to manipulate the diagonals several times. The other elements are unchanged. I want to do things like:
for j=1:nj
    for i=1:n
        g(i,i,j) = gd(i,j)
    end
end
I have seen how to do this with a single matrix using logical(eye(n)) as a single index, but this does not work with an array of matrices. Surely there is a way around this problem. Thanks
Use a linear index as follows:
g = rand(3,3,2); % example data
gd = [1 4; 2 5; 3 6]; % example data. Each column will go to a diagonal
s = size(g); % size of g
ind = bsxfun(@plus, 1:s(1)+1:s(1)*s(2), (0:s(3)-1).'*s(1)*s(2)); % linear index
g(ind) = gd.'; % write values
Result:
>> g
g(:,:,1) =
1.000000000000000 0.483437118939645 0.814179952862505
0.154841697368116 2.000000000000000 0.989922194103104
0.195709075365218 0.356349047562417 3.000000000000000
g(:,:,2) =
4.000000000000000 0.585604389346560 0.279862618046844
0.802492555607293 5.000000000000000 0.610960767605581
0.272602365429990 0.551583664885735 6.000000000000000
Based on Luis Mendo's answer, a version that may perhaps be easier to modify depending on one's specific purposes. No doubt his version will be more computationally efficient, though.
g = rand(3,3,2); % example data
gd = [1 4; 2 5; 3 6]; % example data. Each column will go to a diagonal
sz = size(g); % Get size of data
sub = find(eye(sz(1))); % Find indices for 2d matrix
% Add number depending on location in third dimension.
sub = repmat(sub,sz(3),1);
dim3 = repmat(0:sz(1)^2:prod(sz)-1, sz(1),1);
idx = sub + dim3(:);
% Replace elements.
g(idx) = gd;
Are we playing code golf already? Another slightly smaller and more readable solution:
g = rand(3,3,2);
gd = [1 4; 2 5; 3 6];
s = size(g);
g(find(repmat(eye(s(1)),1,1,s(3))))=gd(:)
g =
ans(:,:,1) =
1.00000 0.35565 0.69742
0.85690 2.00000 0.71275
0.87536 0.13130 3.00000
ans(:,:,2) =
4.00000 0.63031 0.32666
0.33063 5.00000 0.28597
0.80829 0.52401 6.00000

Matlab array multiplication after linear indexing

I have 2 matrices defined as follows:
A=[1 2;3 4]
B=[1 4; 5 3]
Then I define Aensem, Bensem and Gensem like this:
Arow=A(:);
Brow=B(:);
Aensem=repmat(Arow,1,10);
Bensem=repmat(Brow,1,10);
G=A*B;
Grow=G(:);
Gensem=repmat(Grow,1,10);
I need to create a function that can calculate any Gensem-like arrays directly from Aensem and Bensem. I only have knowledge of Aensem and Bensem. I tried the following method, but it's not working:
function ret = mat_mult(v1, v2, r)
    ret = zeros(size(v1));
    for i = 1:2*r.c.M
        for j = 1:2*r.c.M
            sum = 0;
            for k = 1:2*r.c.M
                sum = sum + ...
                    v1(idx1(i,k,r),:) .* v2(idx1(k,j,r),:);
                ret(idx1(i,j,r),:)=sum;
            end
        end
    end
end
If I understand your question correctly, you want to calculate Gensem directly from Aensem and Bensem. This can be done as follows:
A_ = reshape(Aensem(:, 1), 2, 2); % extract A from Aensem
B_ = reshape(Bensem(:, 1), 2, 2); % extract B from Bensem
G_ = A_*B_; % calculate G based on the extracted A and B
Gensem_ = repmat(G_(:),1,10); % build Gensem

How do I combine the coordinate pairs of an array into a single index?

I have an array
A = [3, 4; 5, 6; 4, 1];
Is there a way I could convert all coordinate pairs of the array into linear indices such that:
A = [1, 2, 3]'
whereby (3,4), (5,6), and (4,1) are represented by 1, 2, and 3, respectively.
Many thanks!
The reason I need this is that I have to loop through array A so that I can make use of each coordinate pair (3,4), (5,6), and (4,1). This is because I will need to feed each of these pairs into a function to make another computation. See the pseudo code below:
for ii = 1: length(A);
    [x, y] = function_obtain_coord_pairs(A);
    B = function_obtain_fit(x, y, I);
end
whereby, at ii = 1, x=3 and y=4. The next iteration takes the pair x=5, y=6, etc.
Basically what will happen is that my kx2 array will be converted to a kx1 array. Thanks for your help.
Adapting your code, what you want was suggested by @Ander in the comments...
Your code
for ii = 1:length(A);
    [x, y] = function_obtain_coord_pairs(A);
    B = function_obtain_fit(x, y, I);
end
Adapted code
for ii = 1:size(A,1);
    x = A(ii, 1);
    y = A(ii, 2);
    B = function_obtain_fit(x, y, I); % is I here supposed to be ii? I not defined...
end
Your unfamiliarity with indexing makes me think your function_obtain_fit function could probably be vectorised to accept the entire matrix A, but that's a matter for another day!
For instance, you really don't need to define x or y at all...
Better code
for ii = 1:size(A,1);
    B = function_obtain_fit(A(ii, 1), A(ii, 2), I);
end
Here is a corrected version for your code:
A = [3, 4; 5, 6; 4, 1];
for k = A.'
    B = function_obtain_fit(k(1),k(2),I)
end
By iterating directly over A you iterate over the columns of A. Because you want to iterate over the rows, we need to take A.'. So if we just display k:
for k = A.'
    k
end
the output is:
k =
3
4
k =
5
6
k =
4
1

Given two arrays A and B, how to get B values which are the closest to A

Suppose I have two arrays ordered in an ascending order, i.e.:
A = [1 5 7], B = [1 2 3 6 9 10]
I would like to create from B a new vector B', which contains only the closest values to A values (one for each).
I also need the indexes. So, in my example I would like to get:
B' = [1 6 9], Idx = [1 4 5]
Note that the third value is 9. Indeed 6 is closer to 7, but it is already 'taken' since it is closest to 5.
Any idea for suitable code?
Note: my true arrays are much larger and contain real (not integer) values.
Also, it is given that B is longer than A.
Thanks!
Assuming you want to minimize the overall discrepancies between elements of A and matched elements in B, the problem can be written as an assignment problem of assigning to every row (element of A) a column (element of B) given a cost matrix C. The Hungarian (or Munkres') algorithm solves the assignment problem.
I assume that you want to minimize cumulative squared distance between A and matched elements in B, and use the function [assignment,cost] = munkres(costMat) by Yi Cao from https://www.mathworks.com/matlabcentral/fileexchange/20652-hungarian-algorithm-for-linear-assignment-problems--v2-3-:
A = [1 5 7];
B = [1 2 3 6 9 10];
[Bprime,matches] = matching(A,B)
function [Bprime,matches] = matching(A,B)
    C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
    [matches,~] = munkres(C);
    Bprime = B(matches);
end
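For reference, the same assignment-problem formulation can be sketched with SciPy's linear_sum_assignment, which solves this kind of linear assignment problem (illustrative only; ties between equally good matchings may be resolved differently):
import numpy as np
from scipy.optimize import linear_sum_assignment

A = np.array([1.0, 5.0, 7.0])
B = np.array([1.0, 2.0, 3.0, 6.0, 9.0, 10.0])

C = (A[:, None] - B[None, :]) ** 2     # squared-distance cost matrix, numel(A) x numel(B)
rows, cols = linear_sum_assignment(C)  # assigns each element of A to a distinct element of B
Bprime = B[cols]                       # matched values, one per element of A
idx = cols + 1                         # 1-based indices, comparable to the MATLAB output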
Assuming instead you want to find matches recursively, as suggested by your question, you could either walk through A, for each element in A find the closest remaining element in B and discard it (sortedmatching below); or you could iteratively form and discard the distance-minimizing match between remaining elements in A and B until all elements in A are matched (greedymatching):
A = [1 5 7];
B = [1 2 3 6 9 10];
[~,~,Bprime,matches] = sortedmatching(A,B,[],[])
[~,~,Bprime,matches] = greedymatching(A,B,[],[])
function [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches)
    [~,ix] = min((A(1) - B).^2);
    matches = [matches ix];
    Bprime = [Bprime B(ix)];
    A = A(2:end);
    B(ix) = Inf;
    if(not(isempty(A)))
        [A,B,Bprime,matches] = sortedmatching(A,B,Bprime,matches);
    end
end
function [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches)
    C = (repmat(A',1,length(B)) - repmat(B,length(A),1)).^2;
    [minrows,ixrows] = min(C);
    [~,ixcol] = min(minrows);
    ixrow = ixrows(ixcol);
    matches(ixrow) = ixcol;
    Bprime(ixrow) = B(ixcol);
    A(ixrow) = -Inf;
    B(ixcol) = Inf;
    if(max(A) > -Inf)
        [A,B,Bprime,matches] = greedymatching(A,B,Bprime,matches);
    end
end
While producing the same results in your example, all three methods potentially give different answers on the same data.
Normally I would run screaming from for and while loops in Matlab, but in this case I cannot see how the solution could be vectorized. At least it is O(N) (or near enough, depending on how many equally-close matches to each A(i) there are in B). It would be pretty simple to code the following in C and compile it into a mex file, to make it run at optimal speed, but here's a pure-Matlab solution:
function [out, ind] = greedy_nearest(A, B)
if nargin < 1, A = [1 5 7]; end
if nargin < 2, B = [1 2 3 6 9 10]; end
ind = A * 0;
walk = 1;
for i = 1:numel(A)
    match = 0;
    lastDelta = inf;
    while walk <= numel(B) % <= so the last element of B is also considered
        delta = abs(B(walk) - A(i));
        if delta < lastDelta, match = walk; end
        if delta > lastDelta, break, end
        lastDelta = delta;
        walk = walk + 1;
    end
    ind(i) = match;
    walk = match + 1;
end
out = B(ind);
You could first get the absolute distance from each value in A to each value in B, sort the indices by distance, and then, looking down each column, take the first index that has not already been used.
% Get distance from each value in A to each value in B
[~, minIdx] = sort(abs(bsxfun(@minus, A, B.')));
% Get first unique sequence looking down each column
idx = zeros(size(A));
for iCol = 1:numel(A)
    for iRow = 1:iCol
        if ~ismember(idx, minIdx(iRow,iCol))
            idx(iCol) = minIdx(iRow,iCol);
            break
        end
    end
end
The result when applying idx to B
>> idx
1 4 5
>> B(idx)
1 6 9
