For general code that should be able to take scalars as well as arrays, I would like "@" to work with scalars, e.g.
a = 4
b = 3
as well as for arrays
a = np.array([[1, 2],[3, 4]])
b = np.array([1, 2])
The goal would be that, for both cases, I can write
a @ b
Does anyone know if this is possible?
Update 2021-06-17
@hpaulj, thanks for the response.
I updated the question above to hopefully clarify. Currently I have an if-check to distinguish scalars from arrays, with separate calculations using * and @, respectively. I was thinking it would simplify things if there were just one operation.
Maybe a stupid follow-up:
Would it make sense to extend @/np.matmul to be able to do scalar multiplication? Or is there a reason that would make this a bad idea?
It looks like numpy.dot is the most general multiplication routine available.
@ is numpy.matmul, not numpy.dot, and the two are indeed not the same (see https://numpy.org/doc/stable/reference/generated/numpy.matmul.html):
matmul differs from dot in two important ways:
Multiplication by scalars is not allowed, use * instead.
Stacks of matrices are broadcast together as if the matrices were elements, respecting the signature (n,k),(k,m)->(n,m):
I still have to work out the consequences of this aspect for my code.
Here is some simple code to show what is going on:
import numpy as np

# with matrix (only up to 2D)
a = np.matrix([[1, 2], [3, 4]])
b = np.matrix([[5, 6]]).T
c1 = a * b
c2 = a @ b
c3 = np.dot(a, b)

# with array (general)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6]]).T
C1 = A * B
C2 = A @ B
C3 = np.dot(A, B)

# scalar
aS = 2
bS = 3
cS1 = aS * bS
# cS2 = aS @ bS  # does not work
cS3 = np.dot(aS, bS)
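For now, I keep a single operation at the call sites by wrapping the scalar/array dispatch in a small helper. A minimal sketch, assuming the scalar case only ever needs plain multiplication (the helper name mul is my own, not a NumPy function):
import numpy as np

def mul(a, b):
    # fall back to elementwise * when either operand is a scalar,
    # otherwise use matrix multiplication
    if np.isscalar(a) or np.isscalar(b):
        return a * b
    return a @ b

mul(4, 3)                                           # 12
mul(np.array([[1, 2], [3, 4]]), np.array([1, 2]))   # array([ 5, 11])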
numpy experts,
I'm using numpy. I want to compare two arrays: for each element of B, find the largest element of A that is smaller than or equal to it, and calculate the difference between the two.
For example,
A = np.array([3, 5, 7, 12, 13, 18])
B = np.array([4, 7, 17, 20])
In this case, I want [1, 0, 4, 2] (that is, 4-3, 7-7, 17-13, 20-18).
The problem is that the A and B arrays are so large that doing this by simple means takes a very long time. I could try to split them into chunks of some size, but I wonder if there is a simple numpy function to solve this problem.
Or can I use numba?
For your information, this is my current (very stupid) code.
delta = np.zeros_like(B)
for i in range(len(B)):
    index_A = (A <= B[i]).argmin() - 1
    delta[i] = B[i] - A[index_A]
I agree with @tarlen555 that the problem is mostly the for-loop. I guess this one is already much faster:
diff = B-A[:,np.newaxis]
diff[diff<0] = max(A.max(), B.max())
diff.min(axis=0)
In the second line, I fill all negative entries with something ridiculously large. Since your numbers are integers, np.inf doesn't work, but something along those lines could be more elegant.
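Applied to the example arrays from the question, this reproduces the expected result:
import numpy as np

A = np.array([3, 5, 7, 12, 13, 18])
B = np.array([4, 7, 17, 20])

diff = B - A[:, np.newaxis]
diff[diff < 0] = max(A.max(), B.max())
print(diff.min(axis=0))  # [1 0 4 2]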
EDIT:
Another way:
from scipy.spatial import cKDTree

tree = cKDTree(A.reshape(-1, 1))
k = 2
large_value = max(A.max(), B.max())
while True:
    indices = tree.query(B.reshape(-1, 1), k=k)[1]
    diff = B[:, np.newaxis] - A[indices]
    if np.all(diff.max(axis=-1) >= 0):
        break
    k += 1
diff[diff < 0] = large_value
diff.min(axis=1)
This solution could be more memory-efficient, but frankly I'm not sure by how much.
A \ b in Matlab gives a special solution, while numpy.linalg.lstsq doesn't.
A = [1 2 0; 0 4 3];
b = [8; 18];
c_mldivide = A \ b
c_mldivide =
0
4
0.66666666666667
c_lstsq = np.linalg.lstsq([[1, 2, 0], [0, 4, 3]], [[8], [18]])
print(c_lstsq)
c_lstsq = (array([[ 0.91803279],
       [ 3.54098361],
       [ 1.27868852]]), array([], dtype=float64), 2, array([ 5.27316304,  1.48113184]))
How does mldivide (A \ b) in Matlab give a special solution?
Is this solution useful for achieving computational accuracy?
Why is this solution special, and how might you implement it in numpy?
For under-determined systems such as yours (the rank is less than the number of variables), mldivide returns a solution with as many zero values as possible. Which of the variables are set to zero is an arbitrary choice.
In contrast, the lstsq method returns the solution of minimal norm in such cases: that is, among the infinite family of exact solutions it will pick the one that has the smallest sum of squares of the variables.
So, the "special" solution of Matlab is somewhat arbitrary: one can set any of the three variables to zero in this problem. The solution given by NumPy is in fact more special: there is a unique minimal-norm solution
Which solution is better for your purpose depends on what your purpose is. The non-uniqueness of solution is usually a reason to rethink your approach to the equations. But since you asked, here is NumPy code that produces Matlab-type solutions.
import numpy as np
from itertools import combinations

A = np.matrix([[1, 2, 0], [0, 4, 3]])
b = np.matrix([[8], [18]])
num_vars = A.shape[1]
rank = np.linalg.matrix_rank(A)
if rank == num_vars:
    sol = np.linalg.lstsq(A, b)[0]  # not under-determined
else:
    for nz in combinations(range(num_vars), rank):  # the variables not set to zero
        try:
            sol = np.zeros((num_vars, 1))
            sol[nz, :] = np.asarray(np.linalg.solve(A[:, nz], b))
            print(sol)
        except np.linalg.LinAlgError:
            pass  # picked bad variables, can't solve
For your example it outputs three "special" solutions, the last of which is what Matlab chooses.
[[-1. ]
[ 4.5]
[ 0. ]]
[[ 8.]
[ 0.]
[ 6.]]
[[ 0. ]
[ 4. ]
[ 0.66666667]]
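For comparison, the minimal-norm solution that lstsq returned above can also be computed directly with the Moore-Penrose pseudoinverse; a minimal sketch using the same A and b:
import numpy as np

A = np.array([[1, 2, 0], [0, 4, 3]])
b = np.array([[8], [18]])
# np.linalg.pinv yields the unique minimal-norm solution,
# matching the lstsq output above
print(np.linalg.pinv(A) @ b)  # [[0.91803279], [3.54098361], [1.27868852]]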
I have two numpy arrays of shapes arr1 ~ (140000, 3) and arr2 ~ (450000, 10). The first 3 elements of each row, for both arrays, are coordinates (z, y, x). I want to find the rows of arr2 that have the same coordinates as rows of arr1 (arr1 can be considered a subgroup of arr2).
for example:
arr1 = [[1,2,3],[1,2,5],[1,7,8],[5,6,7]]
arr2 = [[1,2,3,7,66,4,3,44,8,9],[1,3,9,6,7,8,3,4,5,2],[1,5,8,68,7,8,13,4,53,2],[5,6,7,6,67,8,63,4,5,20], ...]
I want to find common coordinates (same first 3 elements):
list_arr = [[1,2,3,7,66,4,3,44,8,9], [5,6,7,6,67,8,63,4,5,20], ...]
At the moment I'm doing this double loop, which is extremely slow:
list_arr = []
for i in arr1:
    for j in arr2:
        if i[0] == j[0] and i[1] == j[1] and i[2] == j[2]:
            list_arr.append(j)
I also tried to create (after the 1st loop) a subarray of arr2, filtering it on the value of i[0] (arr2_filt = [el for el in arr2 if el[0] == i[0]]). This speeds the operation up a bit, but it still remains really slow.
Can you help me with this?
Approach #1
Here's a vectorized one with views -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

a, b = view1D(arr1, arr2[:, :3])
out = arr2[np.in1d(b, a)]
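A quick sanity check on small toy arrays (my own, not the question's data):
import numpy as np

arr1 = np.array([[1, 2, 3], [5, 6, 7]])
arr2 = np.array([[1, 2, 3, 7, 66, 4, 3, 44, 8, 9],
                 [1, 3, 9, 6, 7, 8, 3, 4, 5, 2],
                 [5, 6, 7, 6, 67, 8, 63, 4, 5, 20]])

a, b = view1D(arr1, arr2[:, :3])
print(arr2[np.in1d(b, a)])  # keeps the rows whose first 3 columns appear in arr1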
Approach #2
Another with dimensionality-reduction for ints -
d = np.maximum(arr2[:, :3].max(0), arr1.max(0)) + 1  # +1 so each coordinate triple maps to a unique integer
s = np.r_[1, d[:-1].cumprod()]
a, b = arr1.dot(s), arr2[:, :3].dot(s)
out = arr2[np.in1d(b, a)]
Improvement #1
We could use np.searchsorted to replace np.in1d for both of the approaches listed earlier -
unq_a = np.unique(a)
idx = np.searchsorted(unq_a, b)
idx[idx == len(unq_a)] = 0  # clip insertion points past the end of unq_a
out = arr2[unq_a[idx] == b]
Improvement #2
For the last improvement, which used np.searchsorted together with np.unique, we could use argsort instead -
sidx = a.argsort()
idx = np.searchsorted(a,b,sorter=sidx)
idx[idx==len(a)] = 0
out = arr2[a[sidx[idx]]==b]
You can do it with the help of sets:
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[7,8,9,11,14,34],[23,12,11,10,12,13],[1,2,3,4,5,6]])
# create a list from arr2 with only the first 3 columns
temp = [i[:3] for i in arr2]
aset = set([tuple(x) for x in arr])
bset = set([tuple(x) for x in temp])
np.array([x for x in aset & bset])
Output
array([[7, 8, 9],
[1, 2, 3]])
Edit
Use a list comprehension. Note that i[:3] in arr only tests elementwise membership on a numpy array, so the row comparison should be explicit:
l = [list(i) for i in arr2 if (arr == i[:3]).all(axis=1).any()]
print(l)
Output:
[[7, 8, 9, 11, 14, 34], [1, 2, 3, 4, 5, 6]]
For integers, Divakar has already given an excellent answer. If you want to compare floats, you have to consider e.g. the following:
>>> 1. + 1e-15 == 1.
False
>>> 1. + 1e-16 == 1.
True
If this behaviour could lead to problems in your code, I would recommend performing a nearest-neighbour search, and probably checking that the distances are within a specified threshold.
import numpy as np
from scipy import spatial

def get_indices_of_nearest_neighbours(arr1, arr2):
    tree = spatial.cKDTree(arr2[:, 0:3])
    # You could check here whether the distance is small enough and otherwise raise an error
    dist, ind = tree.query(arr1, k=1)
    return ind
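Usage would look roughly like this (a sketch, assuming arr1 and arr2 have the shapes described in the question):
ind = get_indices_of_nearest_neighbours(arr1, arr2)
nearest_rows = arr2[ind]  # for each row of arr1, the row of arr2 with the closest coordinates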
I am currently looking for an efficient way to slice multidimensional matrices in MATLAB. As an example, say I have a multidimensional matrix such as
A = rand(10,10,10)
I would like to obtain a subset of this matrix (let's call it B) at certain indices along each dimension. To do this, I have access to the index vectors along each dimension:
ind_1 = [1,4,5]
ind_2 = [1,2]
ind_3 = [1,2]
Right now, I am doing this rather inefficiently as follows:
N1 = length(ind_1);
N2 = length(ind_2);
N3 = length(ind_3);
B = NaN(N1,N2,N3);
for i = 1:N1
    for j = 1:N2
        for k = 1:N3
            B(i,j,k) = A(ind_1(i),ind_2(j),ind_3(k));
        end
    end
end
I suspect there is a smarter way to do this. Ideally, I'm looking for a solution that does not use for loops and could be used for an arbitrary N-dimensional matrix.
Actually it's very simple:
B = A(ind_1, ind_2, ind_3);
As you see, Matlab indices can be vectors, and then the result is the Cartesian product of those vector indices. More information about Matlab indexing can be found here.
If the number of dimensions is unknown at programming time, you can define the indices in a cell array and then expand it into a comma-separated list:
ind = {[1 4 5], [1 2], [1 2]};
B = A(ind{:});
You can reference data in matrices by simply specifying the indices, like in the following example:
B = A(start:stop, :, 2);
In the example:
start:stop gets a range of data between two points
: gets all entries
2 gets only one entry
In your case, since all your indices are 1D, you could just simply use:
C = A(x_index, y_index, z_index);
I just started tinkering with Julia and I'm really getting to like it. However, I am running into a road block. In Python (although not very efficient or pythonic), I would create an empty list, append lists of a known size and type to it, and then convert it to a NumPy array:
Python Snippet
a = []
for ....
    a.append([1.,2.,3.,4.])
b = numpy.array(a)
I want to be able to do something similar in Julia, but I can't seem to figure it out. This is what I have so far:
Julia snippet
a = Array{Float64}[]
for .....
    push!(a,[1.,2.,3.,4.])
end
The result is an n-element Array{Array{Float64,N},1} of size (n,), but I would like it to be an nx4 Array{Float64,2}.
Any suggestions or better way of doing this?
The literal translation of your code would be
# Building up as rows
a = [1. 2. 3. 4.]
for i in 1:3
    a = vcat(a, [1. 2. 3. 4.])
end

# Building up as columns
b = [1.,2.,3.,4.]
for i in 1:3
    b = hcat(b, [1.,2.,3.,4.])
end
But this isn't a natural pattern in Julia; you'd do something like
A = zeros(4,4)
for i in 1:4, j in 1:4
    A[i,j] = j
end
or even
A = Float64[j for i in 1:4, j in 1:4]
Basically allocating all the memory at once.
Does this do what you want?
julia> a = Array{Float64}[]
0-element Array{Array{Float64,N},1}
julia> for i=1:3
           push!(a,[1.,2.,3.,4.])
       end
julia> a
3-element Array{Array{Float64,N},1}:
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
julia> b = hcat(a...)'
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
It seems to match the python output:
In [9]: a = []
In [10]: for i in range(3):
   ....:     a.append([1, 2, 3, 4])
   ....:

In [11]: b = numpy.array(a); b
Out[11]:
array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])
I should add that this is probably not what you actually want to do, as hcat(a...)' can be expensive if a has many elements. Is there a reason not to use a 2D array from the beginning? Perhaps more context to the question (i.e. the code you are actually trying to write) would help.
The other answers don't work if the number of loop iterations isn't known in advance, or they assume that the underlying arrays being merged are one-dimensional. It seems Julia lacks a built-in function to "take this list of N-dimensional arrays and return a new (N+1)-dimensional array".
Julia requires a different concatenation call depending on the dimension of the underlying data. For example, if the elements of a are vectors, one can use hcat(a...) or cat(a..., dims=2). But if the elements of a are e.g. 2D arrays, one must use cat(a..., dims=3), etc. The dims argument to cat is not optional, and there is no default value that means "the last dimension".
Here is a helper function that mimics the np.array functionality for this use case. (I called it collapse instead of array, because it doesn't behave quite the same way as np.array.)
function collapse(x)
    return cat(x..., dims=length(size(x[1])) + 1)
end
One would use this as
a = []
for ...
    ... compute new_a ...
    push!(a, new_a)
end
a = collapse(a)