numpy binned mean, conserving extra axes

It seems I am stuck on the following problem with numpy.
I have an array X with shape: X.shape = (nexp, ntime, ndim, npart)
I need to compute binned statistics on this array along the npart dimension, according to the values in binvals (and some bins), while keeping all the other dimensions, because I have to use the binned statistic to remove some bias in the original array X. The binning values have shape binvals.shape = (nexp, ntime, npart).
A complete, minimal example to explain what I am trying to do. Note that, in reality, I am working on large arrays and with several hundred bins (so this implementation takes forever):
import numpy as np
np.random.seed(12345)
X = np.random.randn(24).reshape(1,2,3,4)
binvals = np.random.randn(8).reshape(1,2,4)
bins = [-np.inf, 0, np.inf]
nexp, ntime, ndim, npart = X.shape
cleanX = np.zeros_like(X)
for ne in range(nexp):
    for nt in range(ntime):
        indices = np.digitize(binvals[ne, nt, :], bins)
        for nd in range(ndim):
            for nb in range(1, len(bins)):
                inds = indices == nb
                cleanX[ne, nt, nd, inds] = X[ne, nt, nd, inds] - \
                    np.mean(X[ne, nt, nd, inds], axis=-1)
Looking at the results of this may make it clearer?
In [8]: X
Out[8]:
array([[[[-0.20470766, 0.47894334, -0.51943872, -0.5557303 ],
[ 1.96578057, 1.39340583, 0.09290788, 0.28174615],
[ 0.76902257, 1.24643474, 1.00718936, -1.29622111]],
[[ 0.27499163, 0.22891288, 1.35291684, 0.88642934],
[-2.00163731, -0.37184254, 1.66902531, -0.43856974],
[-0.53974145, 0.47698501, 3.24894392, -1.02122752]]]])
In [10]: cleanX
Out[10]:
array([[[[ 0. , 0.67768523, -0.32069682, -0.35698841],
[ 0. , 0.80405255, -0.49644541, -0.30760713],
[ 0. , 0.92730041, 0.68805503, -1.61535544]],
[[ 0.02303938, -0.02303938, 0.23324375, -0.23324375],
[-0.81489739, 0.81489739, 1.05379752, -1.05379752],
[-0.50836323, 0.50836323, 2.13508572, -2.13508572]]]])
In [12]: binvals
Out[12]:
array([[[ -5.77087303e-01, 1.24121276e-01, 3.02613562e-01,
5.23772068e-01],
[ 9.40277775e-04, 1.34380979e+00, -7.13543985e-01,
-8.31153539e-01]]])
Is there a vectorized solution? I thought of using scipy.stats.binned_statistic, but I can't seem to work out how to use it for this purpose. Thanks!
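For reference, an editorial sketch (not part of the original question): scipy.stats.binned_statistic can compute the per-bin means for a single (ne, nt) slice by passing the (ndim, npart) slice of X as a set of value sequences, but it does not broadcast over the leading nexp and ntime axes, so the outer loops would remain. This assumes binned_statistic accepts the infinite outer bin edges used here (finite edges certainly work).
from scipy.stats import binned_statistic
# Per-bin means for one (ne, nt) slice: binvals[0, 0] has shape (npart,) and
# X[0, 0] has shape (ndim, npart); slice_means then has shape (ndim, nbins).
slice_means, edges, binnumber = binned_statistic(
    binvals[0, 0], X[0, 0], statistic='mean', bins=bins)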

import numpy as np
np.random.seed(100)
nexp = 3
ntime = 4
ndim = 5
npart = 100
nbins = 4
binvals = np.random.rand(nexp, ntime, npart)
X = np.random.rand(nexp, ntime, ndim, npart)
bins = np.linspace(0, 1, nbins + 1)
# Bin index of each particle, with a length-1 axis inserted so it broadcasts
# against the ndim axis of X: shape (nexp, ntime, 1, npart)
d = np.digitize(binvals, bins)[:, :, np.newaxis, :]
# Bin labels 1 .. len(bins)-1 along a new leading axis
r = np.arange(1, len(bins)).reshape((-1, 1, 1, 1, 1))
# Boolean membership mask per bin: shape (nbins, nexp, ntime, 1, npart)
m = d[np.newaxis, ...] == r
# Per-bin particle counts, clipped to avoid division by zero for empty bins
counts = np.sum(m, axis=-1, keepdims=True).clip(min=1)
# Per-bin means along the particle axis, broadcast over the ndim axis
means = np.sum(X[np.newaxis, ...] * m, axis=-1, keepdims=True) / counts
# Subtract from each particle the mean of the bin it falls into
cleanX = X - np.choose(d - 1, means)
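One practical caveat, as an editorial note rather than part of the original answer: np.choose only accepts a limited number of choice arrays (historically 32, tied to NumPy's NPY_MAXARGS), which could matter with the several hundred bins mentioned in the question. A sketch of an equivalent gather with np.take_along_axis (NumPy >= 1.15), reusing the means and d arrays from the block above:
# Move the bin axis of `means` to the end, then pick, for every particle,
# the mean of its own bin; this avoids np.choose's limit on choice arrays.
means_last = np.moveaxis(means[..., 0], 0, -1)   # (nexp, ntime, ndim, nbins)
cleanX_alt = X - np.take_along_axis(means_last, d - 1, axis=-1)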

OK, I think I got it, mainly based on the answer by @jdehesa.
clean2 = np.zeros_like(X)
d = np.digitize(binvals, bins)
for i in range(1, len(bins)):
    m = d == i
    minds = np.where(m)
    # Index as a tuple: (exp indices, time indices, all of ndim, particle indices).
    # (Indexing with a plain list here is deprecated/removed in newer NumPy.)
    sl = (*minds[:2], slice(None), minds[2])
    msum = m.sum(axis=-1)
    clean2[sl] = (X -
                  (np.sum(X * m[..., np.newaxis, :], axis=-1) /
                   msum[..., np.newaxis])[..., np.newaxis])[sl]
Which gives the same results as my original code.
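A quick check, as a sketch reusing the arrays computed above:
# The loop-based cleanX and the vectorized clean2 should agree to
# floating-point precision.
print(np.allclose(cleanX, clean2))  # expected: True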
On the small arrays I have in the example here, this solution is approximately three times as fast as the original code. I expect it to be way faster on larger arrays.
Update:
Indeed it's faster on larger arrays (I didn't do any formal test), but despite this it only just reaches an acceptable level of performance... any further suggestions on extra vectorizations would be very welcome.

Related

Quantum walk on 3D grid

I am trying to apply the quantum coin walk on a 3D grid, with 3 Hadamard coins. However, I can't seem to get symmetric results after 3 steps. Is it simply not possible to have a probability distribution that is symmetric with such a coin?
Thank you
PS: the implementation is based on http://susan-stepney.blogspot.com/2014/02/mathjax.html and the position vector captures a 3D grid.
PPS: Has this been attempted in Qiskit? I couldn't use the hard-coded matrix to get a perfectly symmetric result for some reason...
Not sure I answered your question, but starting from the code reference you mentioned, I only changed line 30 to ax = fig.add_subplot(111, projection = '3d') and line 3 to from mpl_toolkits.mplot3d import Axes3D:
from numpy import *
from matplotlib.pyplot import *
from mpl_toolkits.mplot3d import Axes3D
N = 100 # number of random steps
P = 2*N+1 # number of positions
coin0 = array([1, 0]) # |0>
coin1 = array([0, 1]) # |1>
C00 = outer(coin0, coin0) # |0><0|
C01 = outer(coin0, coin1) # |0><1|
C10 = outer(coin1, coin0) # |1><0|
C11 = outer(coin1, coin1) # |1><1|
C_hat = (C00 + C01 + C10 - C11)/sqrt(2.)
ShiftPlus = roll(eye(P), 1, axis=0)
ShiftMinus = roll(eye(P), -1, axis=0)
S_hat = kron(ShiftPlus, C00) + kron(ShiftMinus, C11)
U = S_hat.dot(kron(eye(P), C_hat))
posn0 = zeros(P)
posn0[N] = 1 # array indexing starts from 0, so index N is the central posn
psi0 = kron(posn0,(coin0+coin1*1j)/sqrt(2.))
psiN = linalg.matrix_power(U, N).dot(psi0)
prob = empty(P)
for k in range(P):
    posn = zeros(P)
    posn[k] = 1
    M_hat_k = kron(outer(posn, posn), eye(2))
    proj = M_hat_k.dot(psiN)
    prob[k] = proj.dot(proj.conjugate()).real
fig = figure()
ax = fig.add_subplot(111, projection = '3d')
plot(arange(P), prob)
plot(arange(P), prob, 'o')
loc = range(0, P, P // 10) #Location of ticks
xticks(loc)
xlim(0, P)
ax.set_xticklabels(range(-N, N+1, P // 10))
show()
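As a small editorial addition on the symmetry question: the 1D distribution produced above should be mirror-symmetric about the central position, because the initial coin state (coin0 + coin1*1j)/sqrt(2) is the balanced one. A sketch that can be appended after the code:
# With the balanced initial coin state and a Hadamard coin, the 1D walk
# distribution is expected to be symmetric about the centre.
print(allclose(prob, prob[::-1]))  # expected: True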

equivalent of numpy.c_ in julia

Hi, I am going through the book https://nnfs.io/ but using JuliaLang (it's a self-challenge to get to know the language better and use it more often, rather than doing the same old thing in Python).
I have come across a part of the book in which they have written a custom function, and I need to recreate it in JuliaLang...
source: https://cs231n.github.io/neural-networks-case-study/
Python:
import numpy as np
import matplotlib.pyplot as plt

N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
    ix = range(N*j,N*(j+1))
    r = np.linspace(0.0,1,N) # radius
    t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
# lets visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()
My Julia version so far:
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros((N*K, D))
y = zeros(UInt8, N*K)
# See https://docs.julialang.org/en/v1/base/math/#Base.range
for j in range(0,length=K)
    ix = range(N*(j), length = N+1)
    radius = LinRange(0.0, 1, N)
    theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
    X[ix] = ????????
end
Notice the ???????? area: I am now trying to work out whether Julia has an equivalent for this numpy function:
https://numpy.org/doc/stable/reference/generated/numpy.c_.html
Any help is appreciated, or just tell me if I need to write something myself.
This is a special object to provide nice syntax for column concatenation. In Julia this is just built into the language, hence you can do:
julia> a=[1,2,3];
julia> b=[4,5,6];
julia> [a b]
3×2 Matrix{Int64}:
1 4
2 5
3 6
For your case the Julian equivalent of np.c_[r*np.sin(t), r*np.cos(t)] should be:
[r .* sin.(t) r .* cos.(t)]
To understand Python's motivation you can also have a look at:
numpy.r_ is not a function. What is it?
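For reference, a minimal Python snippet (an editorial illustration, not from the original posts) showing what np.c_ produces for two 1-D arrays, which is exactly what the horizontal concatenation above reproduces in Julia:
import numpy as np
r = np.array([1.0, 2.0, 3.0])
t = np.array([0.1, 0.2, 0.3])
# np.c_ stacks the 1-D arrays as columns of a 2-D array, shape (3, 2)
print(np.c_[r, t])  # [[1.  0.1] [2.  0.2] [3.  0.3]]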
The equivalent of numpy.c_ would seem to be horizontal concatenation, which you can do with either the hcat function or with (e.g.) simply [a b]. Fixing a few other issues with the translation so far, we end up with:
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros(N*K, D)
y = zeros(UInt8, N*K)
for j in range(0,length=K)
    ix = (N*j+1):(N*(j+1))
    radius = LinRange(0.0, 1, N)
    theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
    X[ix,:] .= [radius.*sin.(theta) radius.*cos.(theta)]
    y[ix] .= j
end
# visualize the data:
using Plots
scatter(X[:,1], X[:,2], zcolor=y, framestyle=:box)

Reshaping tensors in a 3D numpy matrix

I'm essentially trying to accomplish this and then this, but with a 3D volume of such tensors, say of shape (128,128,60,6). The 4th dimension is a vector that represents the diffusion tensor at that voxel, e.g.:
d[30,30,30,:] = [dxx, dxy, dxz, dyy, dyz, dzz] = D_array
Where dxx etc. are the diffusion values for a particular direction. D_array can also be seen as the upper triangle of a symmetric matrix (since dxy == dyx etc.). So I can use those 2 other answers to get from D_array to D_square, e.g.:
D_square = [[dxx, dxy, dxz], [dyx, dyy, dyz],[dzx, dzy, dzz]]
However, I can't seem to figure out the next step: how to apply that per-voxel transformation from D_array to D_square to the whole 3D volume.
Here's the code snippet that works on a single tensor:
# this solves a linear eq. that provides us with diffusion arrays at each voxel in a 3D space
D = np.einsum('ijkt,tl->ijkl',X,bi_plus)
# our issue at this point is we have a vector that represents a triangular matrix.
# first make a tri matrix from the vector, testing on a unit tensor first
D_tri = np.zeros((3,3))
D_array = D[30][30][30]
D_tri[np.triu_indices(3)] = D_array
# then getting the full sqr matrix
D_square = D_tri.T + D_tri
np.fill_diagonal(D_square, np.diag(D_tri))
So what would be the numpy way of applying that per-voxel transformation of the diffusion tensor to the whole 3D volume all at once?
Approach #1
Here's one using row, col indices from triu_indices for indexing along last two axes into an initialized output array -
def squareformnd_rowcol_integer(ar, n=3):
    out_shp = ar.shape[:-1] + (n,n)
    out = np.empty(out_shp, dtype=ar.dtype)
    row,col = np.triu_indices(n)

    # Get a "rolled-axis" view with which the last two axes come to the front
    # so that we could index into them just like for a 2D case
    out_rolledaxes_view = out.transpose(np.roll(range(out.ndim),2,0))

    # Assign permuted version of input array into rolled output version
    arT = np.moveaxis(ar,-1,0)
    out_rolledaxes_view[row,col] = arT
    out_rolledaxes_view[col,row] = arT
    return out
Approach #2
Another one with the last two axes merged into one and then indexing with linear indices -
def squareformnd_linear_integer(ar, n=3):
    out_shp = ar.shape[:-1] + (n,n)
    out = np.empty(out_shp, dtype=ar.dtype)
    row,col = np.triu_indices(n)
    idx0 = row*n+col
    idx1 = col*n+row

    ar2D = ar.reshape(-1,ar.shape[-1])
    out.reshape(-1,n**2)[:,idx0] = ar2D
    out.reshape(-1,n**2)[:,idx1] = ar2D
    return out
Approach #3
Finally, a new method altogether using masking, which should be better performance-wise, as most masking-based approaches are when it comes to indexing -
def squareformnd_masking(ar, n=3):
    out = np.empty((n,n) + ar.shape[:-1], dtype=ar.dtype)

    r = np.arange(n)
    m = r[:,None]<=r

    arT = np.moveaxis(ar,-1,0)
    out[m] = arT
    out.swapaxes(0,1)[m] = arT

    # list(...) so the concatenation also works on Python 3, where a range
    # object cannot be added to a list
    new_axes = list(range(out.ndim))[2:] + [0,1]
    return out.transpose(new_axes)
Timings on (128,128,60,6) shaped random array -
In [635]: ar = np.random.rand(128,128,60,6)
In [636]: %timeit squareformnd_linear_integer(ar, n=3)
...: %timeit squareformnd_rowcol_integer(ar, n=3)
...: %timeit squareformnd_masking(ar, n=3)
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 103 ms per loop
10 loops, best of 3: 53.6 ms per loop
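As a usage sketch (editorial, with a random stand-in for the D array from the question), any of the functions above can be applied to the full volume and the result checked for symmetry in the last two axes:
import numpy as np
D = np.random.rand(128, 128, 60, 6)       # stand-in for the diffusion array
D_square = squareformnd_masking(D, n=3)   # -> shape (128, 128, 60, 3, 3)
# each 3x3 block should be symmetric
print(np.allclose(D_square, D_square.swapaxes(-1, -2)))  # expected: True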
A vectorized way to do it:
# Gets the triangle matrix
d_tensor = np.zeros((128, 128, 60, 3, 3))
triu_idx = np.triu_indices(3)
d_tensor[:, :, :, triu_idx[0], triu_idx[1]] = d
# Make it symmetric
diagonal = np.zeros((128, 128, 60, 3, 3))
idx = np.arange(3)
diagonal[:, :, :, idx, idx] = d_tensor[:, :, :, idx, idx]
d_tensor = np.transpose(d_tensor, (0, 1, 2, 4, 3)) + d_tensor - diagonal

How to generate matrices directly into an array with a function?

I have a formula that creates matrices. Later, with every single matrix of the set, I have to do some time-consuming stuff. So far, I'm bundling these matrices into a list with lapply(). Now, I assume operating with an array of matrices would be much faster. The thing is, I don't know how to have the matrices generated into an array the way lapply() builds a list.
I give you this example:
# matrix generating function
mxSim <- function(X, n) {
  mx = matrix(NA, nrow = n, ncol = 3,
              dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[,1] = rbinom(n, 1, .375)
  mx[,2] = rnorm(n, 0, 2)
  mx[,3] = .42 * rnorm(n, 0, 6)
  return(mx)
}
# bundle matrices together
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)
# some stuff to be done after, like e. g.:
lapply(mx.lst, function(m) lm(d ~ alpha + beta, as.data.frame(m)))
Could anybody give me some advice on how to do this with an array?
I've been looking into this answer, but there the matrices have to be generated already, and I could only make it work by putting them into a list first again.
Enough with the hooha. Let's time it.
library(microbenchmark)
# matrix generating function
mxSim <- function(X, n) {
  mx = matrix(NA, nrow = n, ncol = 3,
              dimnames = list(NULL, c("d", "alpha", "beta")))
  mx[,1] = rbinom(n, 1, .375)
  mx[,2] = rnorm(n, 0, 2)
  mx[,3] = .42 * rnorm(n, 0, 6)
  return(mx)
}
# bundle matrices together
mx.lst <- lapply(1:1e1, mxSim, n = 1e4)
mx.array <- array(mx.lst,dim=c(2,5))
# some stuff to be done after, like e. g.:
#Timing...
some.fnc<-function(m)lm(d ~ alpha + beta, as.data.frame(m))
list.test<-microbenchmark(lapply(mx.lst, some.fnc))
array.test<-microbenchmark(apply(mx.array, MARGIN=c(1,2), some.fnc))
    expr     min       lq     mean   median       uq      max neval
 lapply: 74.8953 101.9424 173.8733 146.7186 234.7577 397.2494   100
  apply: 77.2362 101.0338 174.4178  137.153 264.6854 418.7297   100
Naively applying a function over a list, as opposed to an array, gives almost identical performance.
For the sake of completeness I just made some other benchmarks with n=1e3, as stated in the comment on @SeldomSeenSlim's answer. In addition I made them with a list of data.frames, and this was a bit surprising.
Here is the function for data.frames, for matrix function see above.
dfSim <- function(X, n) {
  d <- rbinom(n, 1, .375)
  alpha <- rnorm(n, 0, 2)
  beta <- .42 * rnorm(n, 0, 6)
  data.frame(d, alpha, beta)
}
Bundling:
mx.lst <- lapply(1:1e3, mxSim, n = 1e4)
mx.array <- array(mx.lst, dim = c(2, 500))
df.lst <- lapply(1:1e3, dfSim, n = 1e4)
And the microbenchmarks:
some.fnc <- function(m) lm(d ~ alpha + beta, as.data.frame(m))
list.test <- microbenchmark(lapply(mx.lst, some.fnc))
array.test <- microbenchmark(apply(mx.array, MARGIN = c(1, 2), some.fnc))
df.list.test <- microbenchmark(lapply(df.lst, some.fnc))
Results
Unit: seconds
       expr      min       lq     mean   median       uq      max neval
     lapply 9.658568 9.742613 9.831577 9.784711 9.911466 10.30035   100
      apply 9.727057 9.951213 9.994986 10.00614 10.06847 10.22178   100
 lapply(df) 9.121293 9.229912 9.286592 9.277967 9.327829 10.12548   100
Now, what does this tell us?
But, okay, as a bold sidenote:
microbenchmark((lapply(1:1e3, mxSim, n = 1e4)), (lapply(1:1e3, dfSim, n = 1e4)))
            expr      min       lq     mean   median       uq      max neval cld
 (lapply(mxSim)) 2.533466 2.551199 2.563864 2.555421 2.559234 2.693383   100  a
 (lapply(dfSim)) 2.676869 2.695826 2.718454 2.701161 2.706249 3.293431   100  b

How do I combine the coordinate pairs of an array into a single index?

I have an array
A = [3, 4; 5, 6; 4, 1];
Is there a way I could convert all coordinate pairs of the array into linear indices such that:
A = [1, 2, 3]'
whereby (3,4), (5,6), and (4,1) are represented by 1, 2, and 3, respectively.
Many thanks!
The reason I need this is that I need to loop through array A so that I can make use of each coordinate pair (3,4), (5,6), and (4,1) at the same time. This is because I will need to feed each of these pairs into a function to make another computation. See the pseudocode below:
for ii = 1:length(A);
    [x, y] = function_obtain_coord_pairs(A);
    B = function_obtain_fit(x, y, I);
end
whereby, at ii = 1, x=3 and y=4. The next iteration takes the pair x=5, y=6, etc.
Basically what will happen is that my kx2 array will be converted to a kx1 array. Thanks for your help.
Adapting your code, what you want was suggested by @Ander in the comments...
Your code
for ii = 1:length(A);
    [x, y] = function_obtain_coord_pairs(A);
    B = function_obtain_fit(x, y, I);
end
Adapted code
for ii = 1:size(A,1);
    x = A(ii, 1);
    y = A(ii, 2);
    B = function_obtain_fit(x, y, I); % is I here supposed to be ii? I not defined...
end
Your unfamiliarity with indexing makes me think your function_obtain_fit function could probably be vectorised to accept the entire matrix A, but that's a matter for another day!
For instance, you really don't need to define x or y at all...
Better code
for ii = 1:size(A,1);
    B = function_obtain_fit(A(ii, 1), A(ii, 2), I);
end
Here is a corrected version for your code:
A = [3, 4; 5, 6; 4, 1];
for k = A.'
    B = function_obtain_fit(k(1),k(2),I)
end
By iterating directly over A you iterate over the columns of A. Because you want to iterate over the rows, we need to take A.'. So if we just display k:
for k = A.'
    k
end
the output is:
k =
3
4
k =
5
6
k =
4
1
