I have a formula that I use multiple times in my subroutine, but my processor (M0) does not have a division instruction, so division is handled by a software library. To speed up this operation, I am considering a lookup table that stores the result of the inverse. However, that would still take up 2 KB of space (1024 entries at 2 bytes per value). How can I optimize it further?
The formula is as follows, where k is a constant known at compile time in the range [10, 100] and x is in the range [0, 1023]:
(1000 * k) * ((1023/x) - 1)
EDIT: Clarification about precision. Since the formula contains the factor 1000, I am considering using the result of the multiplication by 1000 to increase precision.
Assuming / is integer division
You don't need to store 1024 values, because many values of x result in the same value of 1023/x.
Specifically:
x: [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 39, 40, 42, 44, 46, 48, 51, 53, 56, 60, 63, 68, 73, 78, 85, 93, 102, 113, 127, 146, 170, 204, 255, 341, 511, 1023]
1023/x: [1023, 511, 341, 255, 204, 170, 146, 127, 113, 102, 93, 85, 78, 73, 68, 63, 60, 56, 53, 51, 48, 46, 44, 42, 40, 39, 37, 36, 35, 34, 33, 31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
You need only to store these 62 values of x and the 62 results of 1023/x.
As a bonus: if you look carefully, you'll notice those values are symmetric. The values for x are the exact mirror of the values for 1023/x. So you only need to store one of these two arrays.
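The count and the symmetry are easy to check by brute force. Here is a minimal Python sketch, assuming integer division as above; the names `distinct_quotient_table` and `lookup` are made up for this illustration. It builds the 62-entry array, and then recovers 1023/x from that single array with a binary search.

```python
from bisect import bisect_left

N = 1023

def distinct_quotient_table(n=N):
    """Return, in ascending order, the largest x for each distinct value of n // x."""
    largest_x = {}
    for x in range(1, n + 1):
        largest_x[n // x] = x  # a later (larger) x overwrites an earlier one
    return sorted(largest_x.values())

XS = distinct_quotient_table()

def lookup(x, xs=XS):
    """Return 1023 // x for 1 <= x <= 1023 using only the stored xs array."""
    i = bisect_left(xs, x)        # first stored x that is >= the query
    return xs[len(xs) - 1 - i]    # mirror symmetry: the quotient sits at the mirrored index

if __name__ == "__main__":
    print(len(XS))
    print(all(lookup(x) == N // x for x in range(1, N + 1)))
```

The price of the small table is the search: with 62 entries a binary search needs about six comparisons per lookup, so it is worth measuring against the plain table on the target.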
You can easily shrink the lookup table to 256*2 bytes
static inline uint16_t get1023divxminus1(uint16_t x)
{
    static const uint16_t table[256] = {0, 1022, 510, ....., 3};
    if (x >= 512) return 0;
    if (x >= 342) return 1;
    if (x >= 256) return 2;
    return table[x];
}
You could shrink the table even further, but I think it isn't worth the additional ifs.
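To check the thresholds and to generate the elided table entries, a small Python model of the function can help. This is only a sketch: `build_table` is a helper name invented here, and entry 0 is assumed to be a placeholder because 1023/0 is undefined.

```python
def build_table():
    """Entries for x = 0..255; index 0 is a placeholder because 1023/0 is undefined."""
    return [0] + [1023 // x - 1 for x in range(1, 256)]

TABLE = build_table()

def get1023divxminus1(x):
    """Python model of the C function: table lookup below 256, comparisons above."""
    if x >= 512:
        return 0
    if x >= 342:
        return 1
    if x >= 256:
        return 2
    return TABLE[x]

if __name__ == "__main__":
    # Print the values in a form that can be pasted into a C initializer.
    print(", ".join(str(v) for v in TABLE))
```

Comparing the model against `1023 // x - 1` for every x from 1 to 1023 confirms that the three comparisons cover everything above the table.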
You could compress the data in the table.
For example, store full 2-byte values for every N-th value of x and store difference values for the xs in between. The difference should fit in 1 byte in many cases.
If N were 4, you'd store full values for x: 0, 4, 8, ... and difference values for x: 1, 2, 3, 5, 6, 7, 9, ...
To get the result for, say, x == 3, start with the 2-byte value stored for x == 0 and add the 1-byte difference values stored for x == 1, 2 and 3 (assuming each difference is taken relative to the previous x).
There will for sure be other 'tricks' to play if you'd have a close look at the data and think in the direction of data compression.
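One possible reading of the difference scheme, sketched in Python. Everything here is an assumption made for illustration: the spacing `N_STEP = 4`, the placeholder stored for x == 0, storing 1023 // x as the value, and storing each difference as the drop from the previous x.

```python
N_STEP = 4  # hypothetical spacing between full 2-byte entries

# Values to compress: 1023 // x for x = 1..1023; index 0 holds a placeholder.
values = [1023] + [1023 // x for x in range(1, 1024)]

# Full values at x = 0, 4, 8, ...; elsewhere the drop from the previous x.
anchors = values[::N_STEP]
drops = [0 if x % N_STEP == 0 else values[x - 1] - values[x]
         for x in range(len(values))]

def decode(x):
    """Rebuild values[x] from the nearest full entry at or below x."""
    base = x - x % N_STEP
    result = anchors[base // N_STEP]
    for i in range(base + 1, x + 1):
        result -= drops[i]
    return result

# Positions whose difference does not fit in one unsigned byte.
too_big = [x for x, d in enumerate(drops) if d > 255]

if __name__ == "__main__":
    print(too_big)
```

With this layout there are 256 two-byte full entries and 768 one-byte differences, 1280 bytes in total, and the positions listed in `too_big` would need special handling.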
Accessing RAM is probably going to be slower than calculating long division, as long as your values fit within a register. In principle, long division takes time linear in the number of bits. Implement both and profile, but I strongly expect that long division will be faster.
The algorithm is:
Left-shift the divisor until the most significant digit (MSD) of the divisor lines up with the MSD of the dividend.
If the shifted divisor is not larger than what remains of the dividend, subtract it and write a 1; otherwise write a 0. Right-shift the divisor by one. Repeat until the least significant digit (LSD) of the divisor lines up with the LSD of the dividend again.
Here is an explicit implementation:
https://codegolf.stackexchange.com/questions/24541/divide-two-numbers-using-long-division
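For reference, the shift-and-subtract steps can be sketched in a few lines of Python. This is not the code from the link; `long_divide` is a name chosen here, and unsigned operands are assumed.

```python
def long_divide(dividend, divisor):
    """Unsigned binary long division using only shifts, compares and subtracts."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    # Align: shift the divisor left for as long as it still fits in the dividend.
    shift = 0
    while (divisor << (shift + 1)) <= dividend:
        shift += 1
    quotient = 0
    remainder = dividend
    # Walk back down one bit position at a time, producing one quotient bit each.
    for s in range(shift, -1, -1):
        quotient <<= 1
        if (divisor << s) <= remainder:
            remainder -= divisor << s
            quotient |= 1
    return quotient, remainder

if __name__ == "__main__":
    print(long_divide(1023, 3))
```

The loop runs once per quotient bit, which is where the "linear in the number of bits" estimate comes from; for a 10-bit dividend that is at most ten iterations.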
I want to update only one element of a 1-D array on each iteration and then start over fresh. Viewed as a matrix, I just want the entries with i = j to be changed.
My code so far:
import numpy as np
a = np.array([10, 20, 30, 40, 50])
for i, j in enumerate(a):
    b = a
    b[i] = j + 1
    print(b)
I want each iteration of the for loop to only change one element and keep everything else the same.
the output I want looks like this:
[11, 20, 30, 40, 50]
[10, 21, 30, 40, 50]
[10, 20, 31, 40, 50]
[10, 20, 30, 41, 50]
[10, 20, 30, 40, 51]
But I'm getting the output below, because b is not resetting even though I am (or at least I think I am) restoring the original array at the start of each loop:
[11, 20, 30, 40, 50]
[11, 21, 30, 40, 50]
[11, 21, 31, 40, 50]
[11, 21, 31, 41, 50]
[11, 21, 31, 41, 51]
any ideas where I went wrong? TIA
Try replacing b = a with b = a.copy().
b = a does not copy anything; it makes b a second name for the same array in memory, so every write through b also changes a. b = a.copy() creates a copy of a and stores it as b in a different memory location.
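A short sketch of the difference, using the array from the question. The variable names `alias`, `clone`, `result` and `vectorised` are made up here, and the loop-free variant at the end is an optional alternative, not part of the fix itself.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

alias = a            # same array object, no data copied
clone = a.copy()     # independent buffer with the same contents

rows = []
for i, value in enumerate(a):
    b = a.copy()     # start from the untouched original on every iteration
    b[i] = value + 1
    rows.append(b)
result = np.array(rows)

# Loop-free alternative: broadcasting a against the identity adds 1 on the diagonal.
vectorised = a + np.eye(len(a), dtype=a.dtype)

if __name__ == "__main__":
    print(np.shares_memory(alias, a), np.shares_memory(clone, a))
    print(result)
```

Because every iteration works on its own copy, `a` itself is never modified and each row differs from the original in exactly one position.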
I have a list of lists of integers, as shown below:
flst = [[19],
[21, 31],
[22],
[23],
[9, 25],
[26],
[27, 29],
[28],
[27, 29],
[2, 8, 30],
[21, 31],
[5, 11, 32],
[33]]
I want to get the list of integers in increasing order as shown below:
out = [19, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33]
I want to compare each list's item with the items in the next list and select the item that is greater than the preceding item and nearest to it:
For example:
The first item in the list is [19] and the next list's items are [21, 31]. Both elements are greater than 19, but 21 is nearer to 19, so it should be selected.
I'm learning python and tried the following code:
for i in range(len(flst)-2):
    for j in flst[i+1]:
        if j in range(flst[j], flst[j+2]):
            print(j)
I went through many code samples for increasing order on Stack Overflow, but was unable to find a solution.
Try this:
flst[0]=flst[0][0]
for c in range(len(flst)-1):
    flst[c+1]=sorted([n for n in flst[c+1] if n>flst[c]],key=lambda x: x-flst[c])[0]
Output (in flst): [19, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33]
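If you would rather not overwrite flst, the same selection rule can be written as a function that returns a new list. A minimal sketch: `pick_increasing` is a name invented here, and it assumes that "nearest" means the smallest value greater than the previous pick, and that an empty candidate list should be reported as an error.

```python
def pick_increasing(groups):
    """From each sublist pick the smallest value greater than the previous pick."""
    picked = []
    previous = float("-inf")
    for group in groups:
        candidates = [n for n in group if n > previous]
        if not candidates:
            raise ValueError(f"no value in {group} exceeds {previous}")
        previous = min(candidates)
        picked.append(previous)
    return picked

flst = [[19], [21, 31], [22], [23], [9, 25], [26], [27, 29],
        [28], [27, 29], [2, 8, 30], [21, 31], [5, 11, 32], [33]]

if __name__ == "__main__":
    print(pick_increasing(flst))
```

The input list is left untouched, so the function can be called again on the same data.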
As close to one line as I could get:
func = lambda x, t=[]: ([t.append(min([i for i in c if i > max([0]+t)])) for (index, c) in enumerate(x)], sorted(t))[1]
func(flst)
[19, 21, 22, 23, 25, 26, 27, 28, 29, 30, 31, 32, 33]
I am looking for the most efficient and pythonic algorithm for doing an array calculation. Here is the problem:
I have an array of shape (5,2,3) and its sum along the axis=0 as follows:
import numpy as np
A = np.array([[[ 6, 15, 89],
[49, 62, 12]],
[[92, 8, 34],
[93, 81, 35]],
[[ 8, 35, 63],
[68, 89, 5]],
[[27, 20, 85],
[87, 42, 90]],
[[99, 64, 12],
[90, 93, 87]]])
B = A.sum(axis=0)
So B is basically equal to A[0]+A[1]+A[2]+A[3]+A[4] which is:
array([[232, 142, 283],
[387, 367, 229]])
I want to know at which step of the summation each of the 6 elements of B first exceeds 100. For example, element B[0,0] goes above 100 after 3 steps (A[0]+A[1]+A[2]), and B[1,1] goes above 100 after 2 steps (A[0]+A[1]).
So the final output of the algorithm should be this array:
array([[3, 5, 2],
[2, 2, 4]])
I know I can do the calculation for each element separately but I was wondering if anyone could come up with a creative and faster algorithm.
Cheers,
Use cumsum to get a cumulative summation, compare it against the threshold and finally use argmax to catch it as the first instance of crossing that threshold -
(A.cumsum(axis=0) > 100).argmax(axis=0)+1
Sample run -
In [228]: A
Out[228]:
array([[[ 6, 15, 89],
[49, 62, 12]],
[[92, 8, 34],
[93, 81, 35]],
[[ 8, 35, 63],
[68, 89, 5]],
[[27, 20, 85],
[87, 42, 90]],
[[99, 64, 12],
[90, 93, 87]]])
In [229]: (A.cumsum(0) > 100).argmax(0)+1
Out[229]:
array([[3, 5, 2],
[2, 2, 4]])
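One caveat: argmax returns 0 when no entry is True, so an element whose running sum never exceeds the threshold would be reported as step 1. A hedged sketch of a guarded version follows; the function name `first_crossing` and the choice of 0 as the "never crossed" marker are assumptions made here.

```python
import numpy as np

def first_crossing(A, threshold=100):
    """1-based step at which the running sum along axis 0 first exceeds threshold, 0 if never."""
    crossed = A.cumsum(axis=0) > threshold
    steps = crossed.argmax(axis=0) + 1
    return np.where(crossed.any(axis=0), steps, 0)

# Small sample: the first column never exceeds 100, the second does at step 2.
sample = np.array([[1, 60],
                   [2, 50]])

if __name__ == "__main__":
    print(first_crossing(sample))
```

For the array A in the question every element does cross 100, so the guarded version gives the same result as the one-liner.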
I have 100 3x3x3 matrices that I would like to multiply with another large matrix of size 3x5x5 (similar to convolving one image with multiple filters, but not quite).
For the sake of explanation, this is what my large matrix looks like:
>>> x = np.arange(75).reshape(3, 5, 5)
>>> x
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]],
[[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39],
[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49]],
[[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59],
[60, 61, 62, 63, 64],
[65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]])
In memory, I assume all sub-matrices of the large matrix are stored in contiguous locations (please correct me if I'm wrong). From this 3x5x5 matrix, I want to extract the first 3 columns of each 5x5 sub-matrix (giving three 5x3 blocks) and then join them horizontally to get a 5x9 matrix (I apologise if this part is not clear; I can explain in more detail if need be). If I were using numpy, I'd do:
>>> k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))
>>> k
array([[ 0, 1, 2, 25, 26, 27, 50, 51, 52],
[ 5, 6, 7, 30, 31, 32, 55, 56, 57],
[10, 11, 12, 35, 36, 37, 60, 61, 62],
[15, 16, 17, 40, 41, 42, 65, 66, 67],
[20, 21, 22, 45, 46, 47, 70, 71, 72]])
However, I'm not using python so I do not have any access to the numpy functions that I need in order to reshape the data blocks into a form I want to carry out multiplication... I can only directly call the cblas_sgemm function (from the BLAS library) in C, where k corresponds to input B.
Here's my call to cblas_sgemm:
cblas_sgemm( CblasRowMajor, CblasNoTrans, CblasTrans,
100, 5, 9,
1.0,
A, 9,
B, 9, // this is actually wrong, since I don't know how to specify the right parameter
0.0,
result, 5);
Basically, the ldb attribute is the offender here, because my data is not blocked the way I need it to be. I have tried different things, but I am not able to get cblas_sgemm to understand how I want it to read and understand my data.
In short, I don't know how to tell cblas_sgemm to read x like k. Is there a way I can smartly reshape my data in python before sending it to C, so that cblas_sgemm can work the way I want it to?
I will transpose k by setting CblasTrans, so during multiplication, B is 9x5. My matrix A is of shape 100x9. Hope that helps.
Any help would be appreciated. Thanks!
In short, I don't know how to tell cblas_sgemm to read x like k.
You can't. You'll have to make a copy.
Consider k:
In [20]: k
Out[20]:
array([[ 0, 1, 2, 25, 26, 27, 50, 51, 52],
[ 5, 6, 7, 30, 31, 32, 55, 56, 57],
[10, 11, 12, 35, 36, 37, 60, 61, 62],
[15, 16, 17, 40, 41, 42, 65, 66, 67],
[20, 21, 22, 45, 46, 47, 70, 71, 72]])
In a two-dimensional array, the spacing of the elements in memory must be constant along each axis. You know from how x was created that the consecutive elements in memory are 0, 1, 2, 3, 4, ..., but your first row of k contains 0, 1, 2, 25, 26, .... There is no gap between 1 and 2 (i.e. the memory address increases by the size of one element of the array), but there is a large jump in memory between 2 and 25. So you'll have to make a copy to create k.
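The stride argument can be checked directly in numpy. The sketch below uses the x and k from the question; the names `element_strides`, `steps` and `view` are introduced here for illustration only.

```python
import numpy as np

x = np.arange(75).reshape(3, 5, 5)
k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))

# Strides of x in units of elements rather than bytes.
element_strides = tuple(s // x.itemsize for s in x.strides)

# Because x was built with arange, each value in k is also its offset in x's buffer.
# A strided 2-D view would need a constant step along the row.
steps = np.diff(k[0]).tolist()

# Slicing alone does give a view, because it keeps a constant stride per axis.
view = x[:, :, :3]

if __name__ == "__main__":
    print(element_strides)
    print(steps)
    print(np.shares_memory(view, x), np.shares_memory(k, x))
```

The steps along the first row of k alternate between 1 and 23, so no single row stride can describe it, whereas the slice `x[:, :, :3]` shares memory with x.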
Having said that, there is an alternative method to efficiently achieve your desired final result using a bit of reshaping (without copying) and numpy's einsum function.
Here's an example. First define x and A:
In [52]: x = np.arange(75).reshape(3, 5, 5)
In [53]: A = np.arange(90).reshape(10, 9)
Here's my understanding of what you want to achieve; A.dot(k.T) is the desired result:
In [54]: k = np.hstack(np.vstack(x)[:, 0:3].reshape(3, 5, 3))
In [55]: A.dot(k.T)
Out[55]:
array([[ 1392, 1572, 1752, 1932, 2112],
[ 3498, 4083, 4668, 5253, 5838],
[ 5604, 6594, 7584, 8574, 9564],
[ 7710, 9105, 10500, 11895, 13290],
[ 9816, 11616, 13416, 15216, 17016],
[11922, 14127, 16332, 18537, 20742],
[14028, 16638, 19248, 21858, 24468],
[16134, 19149, 22164, 25179, 28194],
[18240, 21660, 25080, 28500, 31920],
[20346, 24171, 27996, 31821, 35646]])
Here's how you can get the same result by slicing x and reshaping A:
In [56]: x2 = x[:,:,:3]
In [57]: A2 = A.reshape(-1, 3, 3)
In [58]: np.einsum('ijk,jlk', A2, x2)
Out[58]:
array([[ 1392, 1572, 1752, 1932, 2112],
[ 3498, 4083, 4668, 5253, 5838],
[ 5604, 6594, 7584, 8574, 9564],
[ 7710, 9105, 10500, 11895, 13290],
[ 9816, 11616, 13416, 15216, 17016],
[11922, 14127, 16332, 18537, 20742],
[14028, 16638, 19248, 21858, 24468],
[16134, 19149, 22164, 25179, 28194],
[18240, 21660, 25080, 28500, 31920],
[20346, 24171, 27996, 31821, 35646]])
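If making the copy on the Python side is acceptable, one way to prepare it is sketched below. The assumptions: float32 data (cblas_sgemm is the single-precision routine), the 10x9 A from this answer rather than the 100x9 one in the question, and the variable names `via_dot` and `via_einsum`, which are made up here. How the buffer is handed to C is left out.

```python
import numpy as np

x = np.arange(75, dtype=np.float32).reshape(3, 5, 5)
A = np.arange(90, dtype=np.float32).reshape(10, 9)

# Option 1: materialise k as a C-contiguous float32 buffer (this makes a copy).
# transpose(1, 0, 2) puts the row index first, so each row of k holds the
# first three columns of all three sub-matrices side by side.
k = np.ascontiguousarray(x[:, :, :3].transpose(1, 0, 2).reshape(5, 9))
via_dot = A @ k.T

# Option 2: contract directly over the sliced view, with the output axes spelled out.
via_einsum = np.einsum('ijk,jlk->il', A.reshape(-1, 3, 3), x[:, :, :3])

if __name__ == "__main__":
    print(k.flags['C_CONTIGUOUS'], k.dtype)
    print(np.array_equal(via_dot, via_einsum))
```

With k stored as a contiguous row-major 5x9 block, each row is 9 elements apart, which is what a leading dimension of 9 for B with CblasTrans describes.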