Setting values of NumPy array when indexing an indexed array

I'm trying to index some matrix, y, and then reindex that result with some boolean statement and set the corresponding elements in y to 0. The dummy code I'm using to test this indexing scheme is shown below.
import numpy as np
x = np.zeros([5, 4]) + 0.1
y = x  # y is another name for the same array, not a copy
print(x)
m = np.array([0, 2, 3])
y[0:4, m][y[0:4, m] < 0.5] = 0
print(y)
I'm not sure why it does not work. The output I want:
[[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]]
[[ 0. 0.1 0. 0. ]
[ 0. 0.1 0. 0. ]
[ 0. 0.1 0. 0. ]
[ 0. 0.1 0. 0. ]
[ 0.1 0.1 0.1 0.1]]
But what I actually get:
[[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]]
[[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]
[ 0.1 0.1 0.1 0.1]]
I'm sure I'm missing some under-the-hood details that explain why this does not work. Interestingly, if you replace m with :, then the assignment works. For some reason, selecting a subset of the columns does not let me assign the zeros.
If someone could explain what's going on and help me find an alternative solution (hopefully one that does not involve generating a temporary numpy array since my actual y will be really huge), I would really appreciate it! Thank you!
EDIT:
y[0:4,:][y[0:4,:]<0.5]=0;
y[0:4,0:3][y[0:4,0:3]<0.5]=0;
etc.
all work as expected. It seems the issue is when you index with a list of some kind.

Make an array (this is one of my favorites because the values differ):
In [845]: x=np.arange(12).reshape(3,4)
In [846]: x
Out[846]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [847]: m=np.array([0,2,3])
In [848]: x[:,m]
Out[848]:
array([[ 0, 2, 3],
[ 4, 6, 7],
[ 8, 10, 11]])
In [849]: x[:,m][:2,:]=0
In [850]: x
Out[850]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
No change. But if I do the indexing in one step, it changes.
In [851]: x[:2,m]=0
In [852]: x
Out[852]:
array([[ 0, 1, 0, 0],
[ 0, 5, 0, 0],
[ 8, 9, 10, 11]])
It also works if I reverse the order:
In [853]: x[:2,:][:,m]=10
In [854]: x
Out[854]:
array([[10, 1, 10, 10],
[10, 5, 10, 10],
[ 8, 9, 10, 11]])
x[i,j] is executed as x.__getitem__((i,j)). x[i,j]=v as x.__setitem__((i,j),v).
x[i,j][k,l]=v is x.__getitem__((i,j)).__setitem__((k,l),v).
The set applies to the value produced by the get. If the get returns a view, then the change affects x. But if it produces a copy, the change does not affect x.
With array m, y[0:4,m] produces a copy (do I need to demonstrate that?). y[0:4,:] produces a view.
So in short, if the first indexing produces a view, the second indexed assignment works. But if it produces a copy, the second has no effect on the original array.
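The view/copy distinction can be checked directly with np.shares_memory. The sketch below also shows one possible way to get the result the question asks for: modify the copy, then write it back with a single indexing step. It assumes a temporary the size of the selected block (not of the whole array) is acceptable; the name sub is only illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
m = np.array([0, 2, 3])

# Basic slicing returns a view that shares memory with x.
print(np.shares_memory(x, x[:2, :]))
# Indexing with an integer array returns a copy.
print(np.shares_memory(x, x[:, m]))

# The asker's case: work on the copy, then assign it back in one step.
y = np.zeros((5, 4)) + 0.1
sub = y[0:4, m]        # copy of rows 0..3, columns 0, 2, 3
sub[sub < 0.5] = 0     # masked assignment on the copy
y[0:4, m] = sub        # a single __setitem__ on y, so y itself changes
print(y)
```

The first check prints True and the second prints False, which is why the chained assignment in the question never reaches y.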

Related

What function in DolphinDB corresponds to choice in Python numpy?

For example, I want to generate a sample of 100 elements from the array a = [1, 2, 3, 4] with the probabilities p = [0.1, 0.1, 0.3, 0.5] associated with each element in a. In Python I can use np.random.choice(a=[1, 2, 3, 4], size=100, p=[0.1, 0.1, 0.3, 0.5]).
Does DolphinDB have a built-in function for this?
You can use a user-defined function:
def choice(v, n, p){
        cump = removeTail!([0.0].join(cumsum(p\p.sum())), 1)
        return v[cump.asof(rand(1.0, n))]
}
a=[1, 2, 3, 4]
n=100000
p=[0.1, 0.1, 0.3, 0.5]
r = choice(a, n, p)
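For readers coming from NumPy, the same inverse-CDF idea can be sketched in Python. This is only an illustration of the technique used by the function above, not DolphinDB code; np.searchsorted with side="right" minus one stands in for asof, and the rng parameter is an addition for reproducibility.

```python
import numpy as np

def choice(values, n, p, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    p = np.asarray(p, dtype=float)
    # Left edge of each element's probability interval, e.g. [0, 0.1, 0.2, 0.5].
    cump = np.concatenate(([0.0], np.cumsum(p / p.sum())[:-1]))
    u = rng.random(n)  # uniform draws in [0, 1)
    # Index of the last edge that is <= u, mirroring asof.
    idx = np.searchsorted(cump, u, side="right") - 1
    return values[idx]

r = choice([1, 2, 3, 4], 100000, [0.1, 0.1, 0.3, 0.5], rng=np.random.default_rng(0))
freq = np.bincount(r, minlength=5)[1:] / r.size
print(freq)  # should be close to the requested probabilities
```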
Starting from version 1.30.19/2.00.7, you can use the built-in function randDiscrete directly:
randDiscrete(1 2 3 4, [0.1, 0.1, 0.3, 0.5], 100)
output:

appending values to coordinates in array of zeros

I am generating two parameters e.g.
s1 = [0, 0.25, 0.5, 0.75, 1.0]
s2 = [0, 0.25, 0.5, 0.75, 1.0]
based on dimensions of both these lists above, i am creating a grid of zeros:
np.zeros((5,5))
I then pair up each of the numbers in each list so they form coordinate locations in my empty grid e.g. (0,0), (0,0.25), (0,0.5) etc. (25 combinations to fit into 5x5 grid).
my issue is i am not too sure how to append values into the grid based on each of the coordinates generated. e.g. if i want to append the number 5 to grid location (0,0) etc so the grid fills up.
Any help is greatly appreciated.
The easiest way is probably to construct a meshgrid and transpose it so that the axes are the way you want them:
np.array(np.meshgrid(s1, s2)).transpose(1, 2, 0)
I'm not sure this is the fastest way to get it, so check that the output is what you were expecting:
import numpy as np
s1 = [0, 0.25, 0.5, 0.75, 1.0]
s2 = [0, 0.25, 0.5, 0.75, 1.0]
arr = np.zeros((5,5,2), dtype=float)
print (arr.shape, arr.size, arr.ndim)
for i in range(len(s1)):
    for j in range(len(s2)):
        arr[i,j] = s1[i], s2[j]
print(arr)
output :
(5, 5, 2) 50 3
[[[0. 0. ]
[0. 0.25]
[0. 0.5 ]
[0. 0.75]
[0. 1. ]]
[[0.25 0. ]
[0.25 0.25]
[0.25 0.5 ]
[0.25 0.75]
[0.25 1. ]]
[[0.5 0. ]
[0.5 0.25]
[0.5 0.5 ]
[0.5 0.75]
[0.5 1. ]]
[[0.75 0. ]
[0.75 0.25]
[0.75 0.5 ]
[0.75 0.75]
[0.75 1. ]]
[[1. 0. ]
[1. 0.25]
[1. 0.5 ]
[1. 0.75]
[1. 1. ]]]
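If the goal is to store one value per coordinate pair in the 5x5 grid, a minimal sketch is to use the positions of the entries in s1 and s2 as the grid indices. The function f below is a hypothetical stand-in for whatever produces the value at each coordinate.

```python
import numpy as np

s1 = [0, 0.25, 0.5, 0.75, 1.0]
s2 = [0, 0.25, 0.5, 0.75, 1.0]
grid = np.zeros((len(s1), len(s2)))

def f(a, b):
    # Hypothetical value for coordinate (a, b); replace with the real computation.
    return a + 10 * b

# Position i in s1 and position j in s2 select the grid cell for (s1[i], s2[j]).
for i, a in enumerate(s1):
    for j, b in enumerate(s2):
        grid[i, j] = f(a, b)

# Vectorized equivalent: indexing="ij" keeps s1 on the first axis.
A, B = np.meshgrid(s1, s2, indexing="ij")
grid_vec = f(A, B)
print(np.allclose(grid, grid_vec))
```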

Is it possible to round to an integer AND remove the decimal point in an array in python

After rounding to an integer the result of operations between lists that produce an array is there a way to remove the decimal point? I am using python in Jupyter notebooks.
Should I use something other than 'np.round'?
'FoodSpent' and 'Income' are simply two lists of data that I created. The initial rounding attempt left the decimal point.
>>> PercentFood = np.around((FoodSpent / Income) * 100, 0)
>>> PercentFood
array([[ 10., 7., 11., 10., 6., 10., 10., 12., 11., 9., 11., 14.]])
Thanks to advice given I ran the following, which rounded down to the integer without giving the decimal point.
>>> PercentFood = ((FoodSpent / Income) * 100)
>>> PercentFood.astype(int)
array([[ 9, 6, 11, 9, 6, 9, 10, 11, 10, 9, 11, 13]])
I'm not sure exactly how your code works with this little context, but you can put this after rounding to get rid of the decimal point.
PercentFood = [round(x) for x in PercentFood]
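If PercentFood is a NumPy array, as the output in the question suggests, another option is to round first and then convert the dtype. The sketch below uses made-up numbers in place of FoodSpent and Income. Note that astype(int) alone truncates toward zero, while np.rint rounds to the nearest integer (halves go to the nearest even value).

```python
import numpy as np

# Hypothetical data standing in for FoodSpent and Income.
food_spent = np.array([[970.0, 620.0, 1160.0]])
income = np.array([[10000.0, 10000.0, 10000.0]])

percent = food_spent / income * 100      # roughly 9.7, 6.2, 11.6

rounded = np.rint(percent).astype(int)   # round to nearest, then drop the decimal point
truncated = percent.astype(int)          # truncates toward zero, no rounding

print(rounded)
print(truncated)
```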

Split an array into bins of equal numbers

I have an array (not sorted) of N elements. I'd like to keep the original order of N, but instead of the actual elements, I'd like them to have their bin numbers, where N is split into m bins of equal (if N is divisible by m) or nearly equal (N not divisible by m) values. I need a vectorized solution (since N is fairly large, so standard python methods won't be efficient). Is there anything in scipy or numpy that can do this?
e.g.
N = [0.2, 1.5, 0.3, 1.7, 0.5]
m = 2
Desired output: [0, 1, 0, 1, 0]
I've looked at numpy.histogram, but it doesn't give me unequally spaced bins.
Listed in this post is a NumPy based vectorized approach with the idea of creating equally spaced indices for the length of the input array using np.searchsorted -
Here's the implementation -
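def equal_bin(N, m):
    # Bin boundaries in rank space: N.size/m, 2*N.size/m, ..., N.size
    sep = (N.size/float(m))*np.arange(1,m+1)
    # side='right' puts a rank that falls exactly on a boundary into the next bin,
    # which keeps the bins equal when N.size is divisible by m
    idx = sep.searchsorted(np.arange(N.size), side='right')
    return idx[N.argsort().argsort()]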
Sample runs with bin-counting for each bin to verify results -
In [442]: N = np.arange(1,94)
In [443]: np.bincount(equal_bin(N, 4))
Out[443]: array([24, 23, 23, 23])
In [444]: np.bincount(equal_bin(N, 5))
Out[444]: array([19, 19, 18, 19, 18])
In [445]: np.bincount(equal_bin(N, 10))
Out[445]: array([10, 9, 9, 10, 9, 9, 10, 9, 9, 9])
Here's another approach using linspace to create those equally spaced numbers that could be used as indices, like so -
def equal_bin_v2(N, m):
    # num must be an integer; truncating the evenly spaced values gives the bin per rank
    idx = np.linspace(0, m, N.size, endpoint=False).astype(int)
    return idx[N.argsort().argsort()]
Sample run -
In [689]: N
Out[689]: array([ 0.2, 1.5, 0.3, 1.7, 0.5])
In [690]: equal_bin_v2(N,2)
Out[690]: array([0, 1, 0, 1, 0])
In [691]: equal_bin_v2(N,3)
Out[691]: array([0, 1, 0, 2, 1])
In [692]: equal_bin_v2(N,4)
Out[692]: array([0, 2, 0, 3, 1])
In [693]: equal_bin_v2(N,5)
Out[693]: array([0, 3, 1, 4, 2])
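Both functions above end with idx[N.argsort().argsort()]. The double argsort turns each element into its rank, so a bin number assigned per rank can be mapped back to the original order. Here is a small step-by-step sketch of that idea; the integer formula for bins_by_rank is an illustrative variant, not the code from the answer.

```python
import numpy as np

N = np.array([0.2, 1.5, 0.3, 1.7, 0.5])
m = 2

order = N.argsort()       # indices that would sort N
ranks = order.argsort()   # rank of each element in its original position

# Bin number for each rank 0..N.size-1, using integer arithmetic.
bins_by_rank = np.arange(N.size) * m // N.size

# Look up each element's bin through its rank.
result = bins_by_rank[ranks]
print(order, ranks, bins_by_rank, result)
```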
pandas.qcut
Another good alternative is the pd.qcut from pandas. For example:
In [6]: import pandas as pd
In [7]: N = [0.2, 1.5, 0.3, 1.7, 0.5]
...: m = 2
In [8]: pd.qcut(N, m, labels=False)
Out[8]: array([0, 1, 0, 1, 0], dtype=int64)
Tip for getting the bin middle points
If you want the bin intervals rather than integer codes, leave labels at its default (None). This will allow you to get the bin middle points with:
In [26]: intervals = pd.qcut(N, 2)
In [27]: [i.mid for i in intervals]
Out[27]: [0.34950000000000003, 1.1, 0.34950000000000003, 1.1, 0.34950000000000003]
Here intervals is a Categorical whose elements are pandas.Interval objects (when labels is left at its default).
See also: pd.cut, if you would like to make the bin width (not bin count) equal

strange behavior when updating matrix

import numpy as np
X_mini=np.array([[ 4, 2104, 1],
                 [ 1, 1600, 3],
                 [ 3, 2400, 100]])
def feature_normalization(X):
    row_length=len(X[0:1][0])
    for i in range(0, row_length):
        if not X[:,i].std()==0:
            temp=(X[:,i]-X[:,i].mean())/X[:,i].std()
            print(temp)
            X[:,i]=temp
feature_normalization(X_mini)
print(X_mini)
outputs:
[ 1.06904497 -1.33630621 0.26726124]
[ 0.209937 -1.31614348 1.10620649]
[-0.72863911 -0.68535362 1.41399274]
[[ 1 0 0]
[-1 -1 0]
[ 0 1 1]]
my question is, why does not X_mini (after applying feature_normalization) correspond to what is being printed out?
Your array holds values of integer type (probably int64).
When fractions are inserted into it, they're converted to int.
You can explicitly specify the type of an array you create:
X_mini = np.array([[ 4.0, 2104.0, 1.0],
                   [ 1.0, 1600.0, 3.0],
                   [ 3.0, 2400.0, 100.0]], dtype=np.float64)
You can also convert an array to another type using numpy.ndarray.astype.
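As a rough sketch of both points, the snippet below first shows the truncation on an integer array and then a variant of the normalization that converts with astype up front. Returning a new array instead of modifying the input is a design choice made here for illustration, not something taken from the question.

```python
import numpy as np

X_int = np.array([[4, 2104, 1],
                  [1, 1600, 3],
                  [3, 2400, 100]])

# Writing floats into an integer array truncates them toward zero.
demo = X_int.copy()
demo[:, 0] = np.array([1.069, -1.336, 0.267])
print(demo[:, 0])

def feature_normalization(X):
    X = X.astype(np.float64)   # astype returns a new float array
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    nonzero = std != 0         # leave constant columns unchanged, as in the question
    X[:, nonzero] = (X[:, nonzero] - mean[nonzero]) / std[nonzero]
    return X

X_norm = feature_normalization(X_int)
print(X_norm)
```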
