Filling empty list of lists with zeros to get a fixed size list of 5 tuples - arrays

I have a sample of 1000 examples. Each example contains a list of 18 lists, which are of variable length, and some of the lists are empty.
Here is a sample:
len(My_list)
18
print(My_list)
array([list([(17, 163, 0.11258018, 15),(78, 193, 0.99713018, 17),(478, 94, 0.7299528, 2), (63, 268, 0.77531445, 3), (169, 279, 0.7947326, 4),(456, 140, 0.65013665, 7), (61, 301, 0.7433308, 8)]),
list([]),
list([]),
list([]),
list([]),
list([]),
list([]),
list([]),
list([(63, 176, 0.18713018, 0),(199, 185, 0.88743243, 79), (282, 75, 0.752135, 84)]),
list([(62, 185, 0.13743243, 1)]),
list([]),
list([(67, 156, 0.14346971, 2)]),
list([(2, 15, 0.00639179, 3)]),
list([]),
list([]),
list([]),
list([]),
list([])],
dtype=object)
What would I like to do?
For each list:
1- keep the first 5 tuples
2- if a list is empty, then create a list of five tuples as follows:
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)])
3- if a list is not empty but doesn't contain 5 elements, then pad it out to five elements. For example, My_list[11] contains only one element, list([(67, 156, 0.14346971, 2)]), hence:
My_list[11] = list([(67, 156, 0.14346971, 2),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)])
The expected output:
array([list([(17, 163, 0.11258018, 15),(78, 193, 0.99713018, 17),(478, 94, 0.7299528, 2), (63, 268, 0.77531445, 3), (169, 279, 0.7947326, 4)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(63, 176, 0.18713018, 0),(199, 185, 0.88743243, 79), (282, 75, 0.752135, 84),(0,0,0,0),(0,0,0,0)]),
list([(62, 185, 0.13743243, 1),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(67, 156, 0.14346971, 2),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(2, 15, 0.00639179, 3),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)]),
list([(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0),(0,0,0,0)])],
dtype=object)
What have I tried?
My_list=np.asarray(My_list)
My_list = [joint if len(joint) != 0 else [(0, 0, 0,0)] for joint in My_list]
However, it doesn't do the job. It only replaces empty lists with a single (0,0,0,0), and it skips lists that already have one or more elements. What I expect is for every empty list, or list with fewer than five elements, to be padded with (0,0,0,0) tuples so that each list ends up with exactly five elements.
Any clue?

Here is one way: glue 5 zero tuples to everything and trim later:
>>> ml
array([list([(17, 163, 0.11258018, 15), (78, 193, 0.99713018, 17), (478, 94, 0.7299528, 2), (63, 268, 0.77531445, 3), (169, 279, 0.7947326, 4), (456, 140, 0.65013665, 7), (61, 301, 0.7433308, 8)]),
list([]), list([]), list([]), list([]), list([]), list([]),
list([]),
list([(63, 176, 0.18713018, 0), (199, 185, 0.88743243, 79), (282, 75, 0.752135, 84)]),
list([(62, 185, 0.13743243, 1)]), list([]),
list([(67, 156, 0.14346971, 2)]), list([(2, 15, 0.00639179, 3)]),
list([]), list([]), list([]), list([]), list([])], dtype=object)
>>>
>>> z = np.array([None, 5*[4*(0,)]])[[1]]
>>> z
array([list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)])],
dtype=object)
>>>
>>> res = np.frompyfunc(list.__getitem__, 2, 1)(ml + z, slice(5))
>>> res
array([list([(17, 163, 0.11258018, 15), (78, 193, 0.99713018, 17), (478, 94, 0.7299528, 2), (63, 268, 0.77531445, 3), (169, 279, 0.7947326, 4)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(63, 176, 0.18713018, 0), (199, 185, 0.88743243, 79), (282, 75, 0.752135, 84), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(62, 185, 0.13743243, 1), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(67, 156, 0.14346971, 2), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(2, 15, 0.00639179, 3), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)])],
dtype=object)
Explanation: arrays of object dtype delegate operations like addition to their elements. Therefore ml + z concatenates each original list with the 5x4 zero padding. (The np.array([None, 5*[4*(0,)]])[[1]] construction is just a trick to obtain a one-element object array holding the padding list; passing the nested list to np.array directly would create a (5, 4) integer array instead.)
Next we only need to cut every list back to 5 elements. The operation somelist[:5] can be written as somelist.__getitem__(slice(5)) or even as list.__getitem__(somelist, slice(5)). This last form is what we "vectorize" using np.frompyfunc.
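As a quick sanity check of that equivalence (somelist here is just a made-up example list):
>>> somelist = [10, 20, 30, 40, 50, 60, 70]
>>> somelist[:5] == somelist.__getitem__(slice(5)) == list.__getitem__(somelist, slice(5))
True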

This is a variant on @PaulP's answer (and @Eir's comment). It's close enough that I wouldn't post it, except that it is faster (and possibly clearer).
Define a function that operates on one list at a time, using that same idea of adding the pad and stripping off the unneeded elements:
In [209]: z = [4*(0,) for _ in range(5)]
In [210]: def foo(alist):
     ...:     return (alist + z)[:5]
This can be applied to each list via list comprehension:
In [211]: [foo(row) for row in arr]
Out[211]:
[[(17, 163, 0.11258018, 15),
  (78, 193, 0.99713018, 17),
  (478, 94, 0.7299528, 2),
  (63, 268, 0.77531445, 3),
  (169, 279, 0.7947326, 4)],
 [(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)],
 ....
 [(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]]
But if you want an object array, @Paul's approach using frompyfunc works nicely:
In [212]: np.frompyfunc(foo,1,1)(arr)
Out[212]:
array([list([(17, 163, 0.11258018, 15), (78, 193, 0.99713018, 17), (478, 94, 0.7299528, 2), (63, 268, 0.77531445, 3), (169, 279, 0.7947326, 4)]),
list([(0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 0, 0)]),
.... dtype=object)
Timings:
In [176]: timeit np.frompyfunc(list.__getitem__, 2, 1)(arr + z, slice(5))
14.8 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [184]: timeit [foo(row) for row in arr]
7.6 µs ± 26.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [213]: timeit np.frompyfunc(foo,1,1)(arr)
8.49 µs ± 27.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Related

How to get the mean of specific values from an ndarray region?

Given an ndarray:
np.array(
    (
        (1, 2, 3, 3, 2),
        (4, 5, 4, 3, 2),
        (1, 1, 1, 1, 1),
        (0, 0, 0, 0, 0),
        (0, 2, 3, 4, 0),
    )
)
extract the mean of the values bounded by a rectangle with coordinates: (1, 1), (3, 1), (1, 3), (3, 3).
The extracted region of the array would be:
5, 4, 3,
1, 1, 1,
0, 0, 0,
And the mean would be ~1.666666667
import numpy as np
arr = np.array(
    (
        (1, 2, 3, 3, 2),
        (4, 5, 4, 3, 2),
        (1, 1, 1, 1, 1),
        (0, 0, 0, 0, 0),
        (0, 2, 3, 4, 0),
    )
)
mean = arr[1:4, 1:4].mean()
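Note that NumPy slices are end-exclusive, which is why the rectangle corner at (3, 3) becomes the slice 1:4 on each axis. Printing the intermediate slice confirms the region and mean from the example:
print(arr[1:4, 1:4])
# [[5 4 3]
#  [1 1 1]
#  [0 0 0]]
print(mean)  # 1.6666666666666667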

How to convert RGB data into an image

I'm doing an easy cybersecurity challenge where I'm given a data file containing only RGB codes like the following:
[(0, 0, 2), (0, 0, 2), (0, 0, 2), (0, 0, 2), (0, 0, 2), (0, 0, 2), (0, 0, 2), (0, 0, 2), (0, 0, 2), (1, 1, 3), (1, 1, 3), (1, 1, 3), (0, 0, 2), (0, 0, 2), (0, 0, 2), (1, 1, 3), (0, 0, 2), (0, 0, 2), (1, 1, 3), (1, 1, 3), (1, 1, 3), (1, 1, 3), (0, 0, 2), (0, 0, 2), (1, 1, 3)]
As you may guess, the real file is much wider and longer, and I don't know how I should convert all this RGB data into an image. I tried using a Python script on Lubuntu 12.04, but it kept giving me errors and I didn't get the image I was looking for.
Thanks for the help in advance!
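One possible approach is a minimal sketch using Pillow. The 5x5 size below matches the 25 sample tuples shown above, but the real width and height are assumptions you would have to determine from the challenge, since putdata simply fills the image row by row:
from PIL import Image

# The 25 sample RGB tuples from above, assumed to form a 5x5 image.
pixels = ([(0, 0, 2)] * 9 + [(1, 1, 3)] * 3 + [(0, 0, 2)] * 3 + [(1, 1, 3)]
          + [(0, 0, 2)] * 2 + [(1, 1, 3)] * 4 + [(0, 0, 2)] * 2 + [(1, 1, 3)])

width, height = 5, 5            # assumed dimensions
img = Image.new("RGB", (width, height))
img.putdata(pixels)             # fills pixels row by row, left to right
img.save("decoded.png")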

python: vectorized cumulative counting

I have a numpy array and would like to count the number of occurrences of each value, however, in a cumulative way:
in = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0, ...]
out = [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4, ...]
I'm wondering if it is best to create a (sparse) matrix with ones at col = i and row = in[i]
1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0
Then we could compute the cumsums along the rows and extract the numbers from the locations where the cumsums increment.
However, if we cumsum a sparse matrix, doesn't it become dense? Is there an efficient way of doing this?
Here's one vectorized approach using sorting -
import numpy as np

def cumcount(a):
    # Store length of array
    n = len(a)

    # Get sorted indices (used again later) and store the sorted array
    sidx = a.argsort()
    b = a[sidx]

    # Mask of shifts/groups
    m = b[1:] != b[:-1]

    # Get indices of those shifts
    idx = np.flatnonzero(m)

    # ID array that will hold the cumulative counts at the very end
    id_arr = np.ones(n, dtype=int)
    id_arr[idx[1:]+1] = -np.diff(idx)+1
    id_arr[idx[0]+1] = -idx[0]
    id_arr[0] = 0
    c = id_arr.cumsum()

    # Finally, re-arrange those cumulative values back into the original order
    out = np.empty(n, dtype=int)
    out[sidx] = c
    return out
Sample run -
In [66]: a
Out[66]: array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])
In [67]: cumcount(a)
Out[67]: array([0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4])
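For comparison, if pandas is available, its groupby machinery has a built-in cumcount that gives the same result in one line (offered for readability; speed will vary, and for small arrays the pandas overhead usually makes it slower than the sorting approach above):
import numpy as np
import pandas as pd

a = np.array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])
s = pd.Series(a)
print(s.groupby(s).cumcount().values)
# [0 0 1 1 0 0 2 3 1 2 3 1 2 4]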

I want to use Bilinear interpolation to calculate the summation of vectors

I have individual vectors from the last stage of code, which I have implemented.
The next stage of the algorithm is to calculate the summation of these vectors,
as mentioned in the paper:
"The vectors from the previous stage were summed together spatially by bilinearly weighting"
I think the bilinear weighting means bilinear interpolation.
Can anyone tell me, or give me an example of, how I can use bilinear interpolation
to calculate the summation of these vectors:
V1 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2]
V2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11]
V3 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 0, 0]
V4 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 19, 19, 0, 0]
V5 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0]
V6 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 0, 0]
V7 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 18, 0, 0]
V8 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 23, 23, 0, 0, 0]
V9 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
V10 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0]
V11 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 0, 0]
V12 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 11, 11, 0, 0, 0]
V13 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
V14 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
V15 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0]
V16 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0]
I googled it but didn't understand the equations.
Regards, and thanks in advance!
Sadly I'm having trouble understanding the paper as well. The idea, as you've said, is to weight the vectors based on their distance from the pooling centres, so that vectors farther from the pooling centres have less of an impact. The paper compares this to what is done in the famous SIFT feature, which you can read about in this tutorial.
Below is my best guess as to what the meaning is. Since this is related to machine learning, you could also ask the people over at Cross Validated for their opinion, or consider contacting the author of the paper.
If I understand correctly, this amounts to a process similar to bilinear interpolation, except in reverse.
With bilinear interpolation, we are given a set of function values arranged in a grid, and we want to find a good guess for what the function values are between the gridpoints. We do this by taking a weighted average of the four surrounding function values, with the weights being the relative area of the opposite rectangle in the image below. (By "relative" I mean the area is normalized by the area of the whole grid rectangle, so the weights sum to 1.) Note how the point to be interpolated is the closest to the (x1,y2) gridpoint, so we weight it with the largest weight (the relative area of the yellow rectangle).
f(x,y) = w_11*f(x1,y1) + w_21*f(x2,y1) + w_12*f(x1,y2) + w_22*f(x2,y2)
w_ij = area of rectangle opposite (xi,yj) / total area of grid square
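For a concrete (made-up) example: with corner values f(x1,y1)=1, f(x2,y1)=2, f(x1,y2)=3, f(x2,y2)=4 and weights w_11=0.16, w_21=0.04, w_12=0.64, w_22=0.16 (which sum to 1), the interpolated value is 0.16*1 + 0.04*2 + 0.64*3 + 0.16*4 = 2.8.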
The "bilinear weighing" described in the paper seems to be doing the opposite: we have values (or vectors in this case) scattered throughout 2D space, and we want to "pool" their values at a set of gridpoints that we choose.
We do this by adding a fraction of each vector to the four surrounding pooling gridpoints. This fraction would again be the relative area of the opposite rectangle.
In the above image... pooling point (xi,yj) would get w_ij * f(x,y) summed along with the appropriate fraction of any other points we have in the region.
As the paper states, the spacing of the grid points is up to you. I assume it would need to be big enough that most pooling points have at least one vector in their neighbourhood.
EDIT: Here is an example of what I mean.
(0,1) . _ _ _ _ _ . (1,1)
      |           |
      |  v        |
      |           |
      |           |
(0,0) . _ _ _ _ _ . (1,0)
Let's say the vector v=[10,5] is at point (0.2,0.8)
point (0,0) gets weight 0.8*0.2=0.16, so we add 0.16*v = [1.6,0.8] to that pool
point (1,0) gets weight 0.2*0.2=0.04, so we add 0.04*v = [0.4,0.2] to that pool
point (0,1) gets weight 0.8*0.8=0.64, so we add 0.64*v = [6.4,3.2] to that pool
point (1,1) gets weight 0.2*0.8=0.16, so we add 0.16*v = [1.6,0.8] to that pool
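Here is a small Python sketch of that worked example (the helper name pool_bilinear and the unit grid cell are my own; pooling many vectors would just accumulate these contributions at each grid point):
import numpy as np

def pool_bilinear(v, x, y):
    # Distribute vector v, located at (x, y) inside the unit cell,
    # onto the four corner pooling points (reverse bilinear weighting).
    v = np.asarray(v, dtype=float)
    return {
        (0, 0): (1 - x) * (1 - y) * v,
        (1, 0): x * (1 - y) * v,
        (0, 1): (1 - x) * y * v,
        (1, 1): x * y * v,
    }

for corner, contribution in pool_bilinear([10, 5], 0.2, 0.8).items():
    print(corner, contribution)
# (0, 0) [1.6 0.8]
# (1, 0) [0.4 0.2]
# (0, 1) [6.4 3.2]
# (1, 1) [1.6 0.8]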

Leading zeros calculation with intrinsic function

I'm trying to optimize some code working in an embedded system (FLAC decoding, Windows CE, ARM 926 MCU).
The default implementation uses a macro and a lookup table:
/* counts the # of zero MSBs in a word */
#define COUNT_ZERO_MSBS(word) ( \
    (word) <= 0xffff ? \
        ( (word) <= 0xff ? byte_to_unary_table[word] + 24 : \
                           byte_to_unary_table[(word) >> 8] + 16 ) : \
        ( (word) <= 0xffffff ? byte_to_unary_table[(word) >> 16] + 8 : \
                               byte_to_unary_table[(word) >> 24] ) \
)
static const unsigned char byte_to_unary_table[] = {
8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
However, most CPUs already have a dedicated instruction for this, bsr on x86 and clz on ARM (http://www.devmaster.net/articles/fixed-point-optimizations/), which should be more efficient.
On Windows CE we have the intrinsic function _CountLeadingZeros, which should simply emit that instruction. However, it is 4 times slower than the macro (measured over 10 million iterations).
How is it possible that an intrinsic function, which (should) rely on a dedicated ASM instruction, is 4 times slower?
Check the disassembly. Are you sure that the compiler inserted the instruction? In the Remarks section there is this text:
This function can be implemented by calling a runtime function.
I suspect that's what's happening in your case.
Note that the CLZ instruction is only available in ARMv5 and later. You need to tell the compiler if you want ARMv5 code:
/QRarch5 ARM5 Architecture
/QRarch5T ARM5T Architecture
(Microsoft incorrectly uses "ARM5" instead of "ARMv5")
