If I want to create a 5x5 zero matrix with values of 10, 20, 30, 40 just above the diagonal I can do the following:
import numpy as np
np.diag((1+np.arange(4))*10,k=1)
But how can I replace the elements just above the diagonal in a 5x5 random matrix with the same array 10, 20, 30, 40? I have tried the numpy where function, which works with 1D arrays like this:
import numpy as np
array1 = np.array([2, 2, 2, 0, 2, 0, 2])
print(np.where(array1==0, 1, array1))
but I cannot make it work in higher dimensions. I can assign the values manually, but I am looking for a better solution.
You can try advanced indexing:
a = np.arange(25).reshape(5,5)
s = np.arange(len(a))
a[s[:-1], s[1:]] = [10,20,30,40]
Output:
array([[ 0, 10, 2, 3, 4],
[ 5, 6, 20, 8, 9],
[10, 11, 12, 30, 14],
[15, 16, 17, 18, 40],
[20, 21, 22, 23, 24]])
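The same indexing carries over to the random matrix from the question. A minimal sketch, assuming a float matrix built with np.random.rand; the names n, rows, cols and original are illustrative and not part of the answer above:

```python
import numpy as np

n = 5
arr = np.random.rand(n, n)       # random 5x5 matrix, as in the question
original = arr.copy()            # kept only to show what changed

rows = np.arange(n - 1)          # 0, 1, 2, 3
cols = rows + 1                  # 1, 2, 3, 4 -> first superdiagonal
arr[rows, cols] = [10, 20, 30, 40]

print(np.diag(arr, k=1))         # the superdiagonal now holds the new values
```

Only the four entries at (0,1), (1,2), (2,3) and (3,4) are written; everything else keeps its random value.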
Maybe this works. For example, for this array:
arr = np.random.rand(5,5)
print(arr)
[[0.63267449 0.81436882 0.49014052 0.85241815 0.39175126]
[0.79926876 0.46784356 0.64146423 0.24392249 0.70449611]
[0.28667995 0.58503395 0.80665148 0.84331471 0.10687276]
[0.59349235 0.23448985 0.25971096 0.60335227 0.31760505]
[0.10723313 0.44694671 0.99660858 0.31529209 0.42713487]]
With np.diag(arr, k=1) you get the diagonal just above the main diagonal.
diag = np.diag(arr, k=1)
You can get the indices of the elements in diag using np.isin(...) and then replace those entries with [10, 20, 30, 40]. Note that this matches by value, so it only works if none of those values also occurs elsewhere in arr.
idxs = np.isin(arr, diag).nonzero()
arr[idxs] = np.array([10, 20, 30, 40], dtype=float)
arr
array([[ 0.63267449, 10. , 0.49014052, 0.85241815, 0.39175126],
[ 0.79926876, 0.46784356, 20. , 0.24392249, 0.70449611],
[ 0.28667995, 0.58503395, 0.80665148, 30. , 0.10687276],
[ 0.59349235, 0.23448985, 0.25971096, 0.60335227, 40. ],
[ 0.10723313, 0.44694671, 0.99660858, 0.31529209, 0.42713487]])
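A position-based alternative that does not depend on the values stored in the matrix may be more robust. This is only a sketch, assuming a boolean mask built with np.eye and k=1; the name mask is illustrative:

```python
import numpy as np

arr = np.random.rand(5, 5)
mask = np.eye(5, k=1, dtype=bool)     # True only on the first superdiagonal
arr[mask] = [10, 20, 30, 40]          # masked entries are filled in row-major order

print(arr)
```

Because the mask selects by position, duplicate values elsewhere in arr cannot be overwritten by accident.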
I have a Spark (Python) dataframe with two columns: a user ID and then an array of arrays, which is represented in Spark as a wrapped array like so:
[WrappedArray(9, 10, 11, 12), WrappedArray(20, 21, 22, 23, 24, 25, 26)]
In its usual representation this would look like this:
[[9, 10, 11, 12], [20, 21, 22, 23, 24, 25, 26]]
I want to perform operations on each of the subarrays, for example take a third list and check whether any of its values is in the first sub-array, but I can't seem to find solutions for pyspark 2.0 (only Scala-specific older solutions like this and this).
How does one access (and in general work with) wrapped arrays? What is an efficient way to do what I described above?
You can treat each wrapped array as an individual list. In your example, if you want to know which elements of the second wrapped array are present in the first array, you could do something like this:
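# Prepare data (sc is assumed to be an existing SparkContext)
data = [[10001, [9, 10, 11, 12], [20, 10, 9, 23, 24, 25, 26]],
        [10002, [8, 1, 2, 3], [49, 3, 6, 5, 6]],
       ]
rdd = sc.parallelize(data)
# Append a fourth field holding the elements of array2 that also occur in array1
df = rdd.map(
    lambda row: row + [
        [x for x in row[2] if x in row[1]]
    ]
).toDF(["userID", "array1", "array2", "commonElements"])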
df.show()
output :
+------+---------------+--------------------+--------------+
|userID| array1| array2|commonElements|
+------+---------------+--------------------+--------------+
| 10001|[9, 10, 11, 12]|[20, 10, 9, 23, 2...| [10, 9]|
| 10002| [8, 1, 2, 3]| [49, 3, 6, 5, 6]| [3]|
+------+---------------+--------------------+--------------+
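The per-row logic inside the lambda can be tried without a Spark session. The sketch below pulls it out into a plain function; common_elements is a hypothetical helper name, and using a set for the lookup is an assumption that the array items are hashable (true for integers):

```python
def common_elements(array1, array2):
    """Return the items of array2 that also occur in array1, in array2's order."""
    lookup = set(array1)              # set membership avoids rescanning array1
    return [x for x in array2 if x in lookup]

rows = [
    [10001, [9, 10, 11, 12], [20, 10, 9, 23, 24, 25, 26]],
    [10002, [8, 1, 2, 3], [49, 3, 6, 5, 6]],
]

# Same shape as the rows produced by the rdd.map call above
result = [row + [common_elements(row[1], row[2])] for row in rows]
for row in result:
    print(row)
```

A function like this could be passed to rdd.map in place of the inline list comprehension.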
I am looking for the most efficient and pythonic algorithm for doing an array calculation. Here is the problem:
I have an array of shape (5,2,3) and its sum along the axis=0 as follows:
import numpy as np
A = np.array([[[ 6, 15, 89],
[49, 62, 12]],
[[92, 8, 34],
[93, 81, 35]],
[[ 8, 35, 63],
[68, 89, 5]],
[[27, 20, 85],
[87, 42, 90]],
[[99, 64, 12],
[90, 93, 87]]])
B = A.sum(axis=0)
So B is basically equal to A[0]+A[1]+A[2]+A[3]+A[4] which is:
array([[232, 142, 283],
[387, 367, 229]])
I want to know at which step of the summation each of the 6 elements of B first exceeds 100. For example, element B[0,0] goes above 100 after 3 steps (A[0]+A[1]+A[2]), and B[1,1] goes above 100 after 2 steps (A[0]+A[1]).
So the final output of the algorithm should be this array:
array([[3, 5, 2],
[2, 2, 4]])
I know I can do the calculation for each element separately but I was wondering if anyone could come up with a creative and faster algorithm.
Cheers,
Use cumsum to get a cumulative summation, compare it against the threshold and finally use argmax to catch it as the first instance of crossing that threshold -
(A.cumsum(axis=0) > 100).argmax(axis=0)+1
Sample run -
In [228]: A
Out[228]:
array([[[ 6, 15, 89],
[49, 62, 12]],
[[92, 8, 34],
[93, 81, 35]],
[[ 8, 35, 63],
[68, 89, 5]],
[[27, 20, 85],
[87, 42, 90]],
[[99, 64, 12],
[90, 93, 87]]])
In [229]: (A.cumsum(0) > 100).argmax(0)+1
Out[229]:
array([[3, 5, 2],
[2, 2, 4]])
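One caveat: argmax returns 0 when no entry is True, so an element whose running sum never exceeds the threshold would be reported as step 1. A hedged sketch that guards against this; first_step_above is a hypothetical helper name, and returning 0 for "never exceeded" is an assumed convention:

```python
import numpy as np

def first_step_above(A, threshold):
    """Step (1-based) at which the running sum along axis 0 first exceeds
    threshold; 0 where it never does."""
    crossed = A.cumsum(axis=0) > threshold
    steps = crossed.argmax(axis=0) + 1
    return np.where(crossed.any(axis=0), steps, 0)

A = np.array([[[ 6, 15, 89], [49, 62, 12]],
              [[92,  8, 34], [93, 81, 35]],
              [[ 8, 35, 63], [68, 89,  5]],
              [[27, 20, 85], [87, 42, 90]],
              [[99, 64, 12], [90, 93, 87]]])

print(first_step_above(A, 100))
print(first_step_above(A, 1000))   # no running sum reaches 1000
```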
USING IDLE/Python 3.5.1
May I first of all begin by saying I am a reasonably experienced programmer in VBA but am on day 2 of Python. I assure you I have conducted many searches on this question but the 30 or so documents I have read do not seem to explain my problem.
May I also please request that any answers given are properly formatted code for Python 3.5.1 rather than helpful pointers to other documentation or links?
The Problem
I am running a report and outputting results as I go. I need to store the results (presumably in an array) during this so that I can refer to them afterwards. The report (and the populating of the array) can be rerun multiple times so please bear that in mind if using concepts like 'append' when building the array. The array has dimensions [25,4] - a maximum of 25 records with four items in each.
Day X Y Z Total
1 2 3 4 9
2 3 4 5 12 ...
(Purists: The total needs to be recorded rather than calculated because of rounding.)
I could solve the problem myself if someone could translate this bit of code into Python (from VBA for illustration purposes). I do not want to import the arrays module unless it's the only way. Note: Variable l is a loop that makes the array get built twice to demonstrate that the array needs to be capable of rebuilding from scratch rather than being created just the once.
Sub sArray()
Dim a(25, 4)
For l = 1 To 2
For i = 1 To 25
For j = 1 To 4
a(i, j) = Int(100 * Rnd(1)) + 1
Debug.Print a(i, j);
Next j
Next i
Next l
End Sub
Thanks,
Tom
I am not sure I got your question correctly...
If you want to make an array (list is a better term in this case) of size [25,4], this is one way to go:
import random
a = [[int(100*random.random())+1 for j in range(4)] for i in range(25)]
>>> print(a)
[[74, 17, 36, 75],
[1, 79, 33, 90],
[58, 66, 47, 95],
[35, 40, 87, 38],
[43, 46, 34, 66],
[69, 34, 26, 49],
[56, 83, 44, 14],
[2, 44, 54, 97],
[50, 21, 39, 60],
[13, 94, 12, 48],
[36, 13, 2, 71],
[77, 44, 31, 11],
[56, 26, 30, 39],
[17, 13, 83, 84],
[54, 37, 34, 18],
[5, 54, 88, 100],
[22, 77, 70, 21],
[51, 88, 26, 97],
[69, 33, 86, 48],
[42, 66, 38, 78],
[71, 43, 96, 23],
[6, 46, 100, 29],
[32, 86, 15, 48],
[96, 84, 8, 56],
[29, 64, 69, 79]]
If you want to show that "the array needs to be capable of rebuilding from scratch rather than being created just the once" (why would you need this??):
for l in range(2):
    a = [[int(100*random.random())+1 for j in range(4)] for i in range(25)]
Also, the way of generating random numbers is odd (I have translated your method). To get the same result in Python, just use random.randint(1,100) to generate random integers from 1 (I think you do not want to have 0 there) up to whatever number you like.
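Putting the randint suggestion together with the rebuild loop might look like the sketch below. build_table is a hypothetical helper name; the defaults of 25 rows and 4 columns come from the question:

```python
import random

def build_table(rows=25, cols=4):
    """Return a fresh rows x cols list of lists of random integers in 1..100."""
    return [[random.randint(1, 100) for _ in range(cols)] for _ in range(rows)]

# Each pass rebinds a to a brand-new table, so nothing from the
# previous run is left behind.
for run in range(2):
    a = build_table()

print(len(a), len(a[0]))
```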
If I have correctly understood from your comments, this is what you want:
import random

def report(g=25):
    array = []
    for _ in range(g):
        x = random.randint(1,10)
        y = random.randint(1,10)
        z = random.randint(1,10)
        total = x+y+z
        row = [x,y,z,total]
        print(row)  # output each row as it is computed
        array.append(row)
    return array

>>> result = report()  # prints all the rows while computing; result stores the "array"
[8, 4, 3, 15]
[10, 7, 4, 21]
[2, 4, 5, 11]
[8, 5, 8, 21]
[9, 7, 2, 18]
[2, 2, 3, 7]
[5, 8, 6, 19]
[8, 6, 1, 15]
[7, 6, 4, 17]
[7, 2, 10, 19]
[6, 5, 9, 20]
[3, 8, 8, 19]
[9, 1, 9, 19]
[1, 7, 7, 15]
[6, 6, 2, 14]
[9, 10, 1, 20]
[4, 6, 2, 12]
[6, 1, 6, 13]
[4, 1, 3, 8]
[5, 3, 5, 13]
[7, 5, 2, 14]
[9, 5, 7, 21]
[2, 5, 8, 15]
[3, 10, 4, 17]
[5, 6, 5, 16]
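Once the rows are stored, referring back to them is plain list indexing. A small sketch using the two sample rows from the question's table (X, Y, Z, Total); the variable names are illustrative:

```python
# Rows for day 1 and day 2, taken from the table in the question
results = [
    [2, 3, 4, 9],
    [3, 4, 5, 12],
]

day2_total = results[1][3]               # list indices start at 0, so day 2 is index 1
totals = [row[3] for row in results]     # the recorded totals, one per day

print(day2_total)
print(totals)

# Before a rerun, start again from an empty list instead of appending
# to the old one.
results = []
```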
I have 100 3x3x3 matrices that I would like to multiply with another large matrix of size 3x5x5 (similar to convolving one image with multiple filters, but not quite).
For the sake of explanation, this is what my large matrix looks like:
>>> x = np.arange(75).reshape(3, 5, 5)
>>> x
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]],
[[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]],
[[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59],
[60, 61, 62, 63, 64],
[65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]])
In memory, I assume all sub-matrices in the large matrix are stored in contiguous locations (please correct me if I'm wrong). From this 3x5x5 matrix, I want to extract the first 3 columns of each sub-matrix (a 5x3 block from each) and then join the three blocks horizontally to get a 5x9 matrix (I apologise if this part is not clear, I can explain in more detail if need be). If I were using numpy, I'd do:
>>> k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))
>>> k
array([[ 0, 1, 2, 25, 26, 27, 50, 51, 52],
[ 5, 6, 7, 30, 31, 32, 55, 56, 57],
[10, 11, 12, 35, 36, 37, 60, 61, 62],
[15, 16, 17, 40, 41, 42, 65, 66, 67],
[20, 21, 22, 45, 46, 47, 70, 71, 72]])
However, I'm not using python so I do not have any access to the numpy functions that I need in order to reshape the data blocks into a form I want to carry out multiplication... I can only directly call the cblas_sgemm function (from the BLAS library) in C, where k corresponds to input B.
Here's my call to cblas_sgemm:
cblas_sgemm( CblasRowMajor, CblasNoTrans, CblasTrans,
100, 5, 9,
1.0,
A, 9,
B, 9, // this is actually wrong, since I don't know how to specify the right parameter
0.0,
result, 5);
Basically, the ldb attribute is the offender here, because my data is not blocked the way I need it to be. I have tried different things, but I am not able to get cblas_sgemm to understand how I want it to read and understand my data.
In short, I don't know how to tell cblas_sgemm to read x like k. Is there a way I can smartly reshape my data in Python before sending it to C, so that cblas_sgemm can work the way I want it to?
I will transpose k by setting CblasTrans, so during multiplication, B is 9x5. My matrix A is of shape 100x9. Hope that helps.
Any help would be appreciated. Thanks!
"In short, I don't know how to tell cblas_sgemm to read x like k."
You can't. You'll have to make a copy.
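If the data can be prepared in Python first, one way to make that copy is sketched below. The float32 dtype is an assumption based on cblas_sgemm being the single-precision routine. With such a buffer each row of B is 9 floats long, which would suggest ldb = 9 for the row-major call in the question, but check that against your BLAS documentation.

```python
import numpy as np

x = np.arange(75, dtype=np.float32).reshape(3, 5, 5)

# x[:, :, :3] has axes (sub-matrix, row, column). Move "row" to the front,
# then flatten (sub-matrix, column) into 9 values per row. The reshape has
# to copy, and ascontiguousarray guarantees a C-contiguous result.
k = np.ascontiguousarray(x[:, :, :3].transpose(1, 0, 2).reshape(5, 9))

print(k.shape, k.dtype, k.flags['C_CONTIGUOUS'])
```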
Consider k:
In [20]: k
Out[20]:
array([[ 0, 1, 2, 25, 26, 27, 50, 51, 52],
[ 5, 6, 7, 30, 31, 32, 55, 56, 57],
[10, 11, 12, 35, 36, 37, 60, 61, 62],
[15, 16, 17, 40, 41, 42, 65, 66, 67],
[20, 21, 22, 45, 46, 47, 70, 71, 72]])
In a two-dimensional array, the spacing between elements in memory must be constant along each axis. You know from how x was created that the consecutive elements in memory are 0, 1, 2, 3, 4, ..., but the first row of k contains 0, 1, 2, 25, 26, .... There is no gap between 1 and 2 (i.e. the memory address increases by the size of one element of the array), but there is a large jump in memory between 2 and 25. So you'll have to make a copy to create k.
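The uneven spacing can be checked directly. In this sketch the variable names offsets, steps and x2 are illustrative; it relies on x being built with arange, so every value equals its own offset in x's memory:

```python
import numpy as np

x = np.arange(75).reshape(3, 5, 5)
k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))

offsets = k[0]               # memory offsets (in elements) of k's first row within x
steps = np.diff(offsets)     # distance between neighbouring entries of that row
print(steps)                 # not constant, so no single stride can describe the row

x2 = x[:, :, :3]             # a plain slice, by contrast, is a view
print(np.shares_memory(x, x2))
print(np.shares_memory(x, k))
```

A slice such as x2 only changes the shape and keeps x's strides, which is why it needs no copy, whereas k does not share memory with x at all.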
Having said that, there is an alternative method to efficiently achieve your desired final result using a bit of reshaping (without copying) and numpy's einsum function.
Here's an example. First define x and A:
In [52]: x = np.arange(75).reshape(3, 5, 5)
In [53]: A = np.arange(90).reshape(10, 9)
Here's my understanding of what you want to achieve; A.dot(k.T) is the desired result:
In [54]: k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))
In [55]: A.dot(k.T)
Out[55]:
array([[ 1392, 1572, 1752, 1932, 2112],
[ 3498, 4083, 4668, 5253, 5838],
[ 5604, 6594, 7584, 8574, 9564],
[ 7710, 9105, 10500, 11895, 13290],
[ 9816, 11616, 13416, 15216, 17016],
[11922, 14127, 16332, 18537, 20742],
[14028, 16638, 19248, 21858, 24468],
[16134, 19149, 22164, 25179, 28194],
[18240, 21660, 25080, 28500, 31920],
[20346, 24171, 27996, 31821, 35646]])
Here's how you can get the same result by slicing x and reshaping A:
In [56]: x2 = x[:,:,:3]
In [57]: A2 = A.reshape(-1, 3, 3)
In [58]: np.einsum('ijk,jlk', A2, x2)
Out[58]:
array([[ 1392, 1572, 1752, 1932, 2112],
[ 3498, 4083, 4668, 5253, 5838],
[ 5604, 6594, 7584, 8574, 9564],
[ 7710, 9105, 10500, 11895, 13290],
[ 9816, 11616, 13416, 15216, 17016],
[11922, 14127, 16332, 18537, 20742],
[14028, 16638, 19248, 21858, 24468],
[16134, 19149, 22164, 25179, 28194],
[18240, 21660, 25080, 28500, 31920],
[20346, 24171, 27996, 31821, 35646]])
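As a self-contained check of that equivalence, the sketch below compares both routes. The explicit '->il' output and the np.tensordot variant are additions for illustration, not part of the session above:

```python
import numpy as np

x = np.arange(75).reshape(3, 5, 5)
A = np.arange(90).reshape(10, 9)

# Reference result: build k with a copy, then multiply
k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))
expected = A.dot(k.T)

# Copy-free inputs: a slice of x and a reshaped A
x2 = x[:, :, :3]             # axes (j, l, k)
A2 = A.reshape(-1, 3, 3)     # axes (i, j, k)

# Sum over j and k, keeping i (rows of A) and l (rows of each sub-matrix)
via_einsum = np.einsum('ijk,jlk->il', A2, x2)
via_tensordot = np.tensordot(A2, x2, axes=([1, 2], [0, 2]))

print(np.array_equal(via_einsum, expected))
print(np.array_equal(via_tensordot, expected))
```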