Python '==' operator gives wrong result - arrays

I am comparing two elements of a numpy array. The memory addresses obtained by the id() function for the two elements are different. The is operator also reports that the two elements are not the same.
However, if I compare the memory addresses of the two array elements using the == operator, it reports that the two elements are the same.
I am not able to understand how the == operator gives True when the two memory addresses are different.
Below is my code.
import numpy as np
a = np.arange(8)
newarray = a[np.array([3,4,2])]
print("Initial array : ", a)
print("New array : ", newarray)
# comparison of two element using 'is' operator
print("\ncomparison using is operator : ",a[3] is newarray[0])
# comparison of memory address of two element using '==' operator
print("comparison using == opertor : ", id(a[3]) == id(newarray[0]))
# memory address of both elements of array
print("\nMemory address of a : ", id(a[3]))
print("Memory address of newarray : ", id(newarray[0]))
Output:
Initial array : [0 1 2 3 4 5 6 7]
New array : [3 4 2]
comparison using is operator : False
comparison using == operator : True
Memory address of a : 2807046101296
Memory address of newarray : 2808566470576

This is probably due to a combination of Python's integer caching and obscure implementation details of numpy.
If you change the code slightly, you will see that the ids are not consistent across the flow of the code, but they are the same within each line:
import numpy as np
a = np.arange(8)
newarray = a[np.array([3,4,2])]
print(id(a[3]), id(newarray[0]))
print(id(a[3]), id(newarray[0]))
outputs
276651376 276651376
20168608 20168608
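To see that the matching ids come from object recycling rather than shared storage, keep both boxed elements alive at the same time (a minimal sketch; the exact id values will differ from run to run):
x = a[3]              # keep the boxed scalar alive
y = newarray[0]
print(id(x), id(y))   # two different ids now
print(x == y)         # True: the values are still equal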

A numpy array does not store references to objects the way a list does (unless it has object dtype). It has a 1d data buffer with the numeric values, which it may access in various ways.
In [17]: a = np.arange(8)
...: newarray = a[np.array([3,4,2])]
In [18]: a
Out[18]: array([0, 1, 2, 3, 4, 5, 6, 7])
In [21]: newarray
Out[21]: array([3, 4, 2])
newarray, produced with advanced indexing, is not a view. It has its own data buffer and values.
Let's 'unbox' elements of these arrays, assigning them to variables.
In [22]: x = a[3]; y = newarray[0]
In [23]: x
Out[23]: 3
In [24]: y
Out[24]: 3
In [25]: id(x),id(y)
Out[25]: (139768142922768, 139768142925584)
The ids are different (the assignments prevent the possibly confusing recycling of ids).
Since the ids are different, is is False:
In [26]: x is y
Out[26]: False
but the values are the same (by the == test):
In [27]: x == y
Out[27]: True
Another 'unboxing', different id:
In [28]: w = a[3]
In [29]: w
Out[29]: 3
In [30]: id(w)
Out[30]: 139768133495504
These integers are actually np.int64 objects. Python does 'cache' small integers, but that does not apply here.
In [33]: type(x)
Out[33]: numpy.int64
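As a rough illustration of why the caching argument does not apply (the True for the plain ints is a CPython implementation detail, not guaranteed behaviour):
m = 3
n = 3
print(m is n)          # usually True: CPython caches small plain ints
print(a[3] is a[3])    # False: each indexing operation boxes a fresh np.int64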
We can see "where" the arrays store their data:
In [31]: a.__array_interface__['data']
Out[31]: (33696480, False)
In [32]: newarray.__array_interface__['data']
Out[32]: (33838848, False)
These are totally different buffers. If newarray were a view, the buffer pointers would be the same or nearby.
If we don't hang on to the indexed object, ids may be reused:
In [34]: id(newarray[0]), id(newarray[0])
Out[34]: (139768133493520, 139768133493520)
In general is and id are not useful when working with numpy arrays.
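If the real question is whether two arrays share storage, np.shares_memory is a more reliable tool than is or id (a minimal sketch, using the same a and newarray as above):
basic = a[3:6]                        # basic slicing returns a view
print(np.shares_memory(a, basic))     # True: the slice reuses a's data buffer
print(np.shares_memory(a, newarray))  # False: advanced indexing made a copy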


Most computationally efficient way to batch alter values in each array of a 2d array, based on conditions for particular values by indices

Say that I have a batch of arrays, and I would like to alter them based on conditions of particular values located by indices.
For example, say that I would like to increase and decrease particular values if the difference between those values are less than two.
For a single 1D array it can be done like this
import numpy as np
single2 = np.array([8, 8, 9, 10])
if abs(single2[1]-single2[2])<2:
    single2[1] = single2[1] - 1
    single2[2] = single2[2] + 1
single2
array([ 8, 7, 10, 10])
But I do not know how to do it for batch of arrays. This is my initial attempt
import numpy as np
single1 = np.array([6, 0, 3, 7])
single2 = np.array([8, 8, 9, 10])
single3 = np.array([2, 15, 15, 20])
batch = np.array([
    np.copy(single1),
    np.copy(single2),
    np.copy(single3),
])
if abs(batch[:,1]-batch[:,2])<2:
    batch[:,1] = batch[:,1] - 1
    batch[:,2] = batch[:,2] + 1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Looking at np.any and np.all, they are used to create an array of booleans values, and I am not sure how they could be used in the code snippet above.
My second attempt uses np.where, following the method described here for comparing particular values of a batch of arrays: new versions of the arrays are created with padding values added to the front/back of the arrays.
https://stackoverflow.com/a/71297663/3259896
In this example I am comparing values that are right next to each other, so I created copies that shift the arrays forwards and backwards by 1. I also use only the particular slice of the array that I am comparing, since otherwise the other numbers would also take part in the np.where comparison.
batch_ap = np.concatenate(
    (batch[:, 1:2+1], np.repeat(-999, 3).reshape(3,1)),
    axis=1
)
batch_pr = np.concatenate(
    (np.repeat(-999, 3).reshape(3,1), batch[:, 1:2+1]),
    axis=1
)
Finally, I do the comparisons, and adjust the values
batch[:, 1:2+1] = np.where(
    abs(batch_ap[:,1:]-batch_ap[:,:-1])<2,
    batch[:, 1:2+1]-1,
    batch[:, 1:2+1]
)
batch[:, 1:2+1] = np.where(
    abs(batch_pr[:,1:]-batch_pr[:,:-1])<2,
    batch[:, 1:2+1]+1,
    batch[:, 1:2+1]
)
print(batch)
[[ 6  0  3  7]
 [ 8  7 10 10]
 [ 2 14 16 20]]
However, I am not sure whether this is the most computationally efficient or the most elegant way to do this. It seems like a lot of operations and code for the task, but I do not have a strong enough mastery of numpy to be certain.
This works
mask = abs(batch[:,1]-batch[:,2])<2
batch[mask,1] -= 1
batch[mask,2] += 1
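Applied to the batch from the question, a rough sketch of the full flow (same data as above; the result matches the expected output):
import numpy as np

batch = np.array([[6, 0, 3, 7],
                  [8, 8, 9, 10],
                  [2, 15, 15, 20]])

mask = abs(batch[:, 1] - batch[:, 2]) < 2   # rows where the two values are close
batch[mask, 1] -= 1
batch[mask, 2] += 1
print(batch)
# [[ 6  0  3  7]
#  [ 8  7 10 10]
#  [ 2 14 16 20]]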

Filling the array with array does not work as I expected

I want to make an array whose entries are themselves arrays, and push some arrays one by one into those entries.
For example, I made a 2 x 3 array named arr and tried to fill the [1,1] and [1,2] entries with 4 x 4 matrices generated by randn(4,4).
arr = fill(Matrix{Float64}[], 2, 3)
push!(arr[1,1],randn(4,4))
push!(arr[1,2],randn(4,4))
println(arr[1,1])
println(arr[1,2])
println(arr[1,3])
However, the result is that all the entries of arr (not only [1,1] and [1,2]) were filled with the same matrices, instead of just [1,1] and [1,2] receiving the pushed randn(4,4) matrices:
[[-0.15122805007483328 0.6132236453930502 -0.9090110366765862 1.2589924202099898; -1.120611384326006 -0.9083935218058066 0.7252290006516056 1.0970416725786256; -0.19173238706933265 1.3610525411901113 -0.05258697093572793 0.7776085390912448; 0.18491459001855373 -2.0537142669734934 0.3482557186126859 0.0047622478008474845], [0.23422967703060255 -0.51986351753462 0.45947166573674303 0.31316899298864387; 0.3704450103622709 -0.8186574197233013 -0.9990329964554037 -0.8345957519924763; 0.56641529964098 -0.8393435538481216 -0.6379336546939682 1.1843452368116358; 0.9435767553275002 0.0033471181565433127 -1.191611491619908 1.3970554854927264]]
[[-0.15122805007483328 0.6132236453930502 -0.9090110366765862 1.2589924202099898; -1.120611384326006 -0.9083935218058066 0.7252290006516056 1.0970416725786256; -0.19173238706933265 1.3610525411901113 -0.05258697093572793 0.7776085390912448; 0.18491459001855373 -2.0537142669734934 0.3482557186126859 0.0047622478008474845], [0.23422967703060255 -0.51986351753462 0.45947166573674303 0.31316899298864387; 0.3704450103622709 -0.8186574197233013 -0.9990329964554037 -0.8345957519924763; 0.56641529964098 -0.8393435538481216 -0.6379336546939682 1.1843452368116358; 0.9435767553275002 0.0033471181565433127 -1.191611491619908 1.3970554854927264]]
[[-0.15122805007483328 0.6132236453930502 -0.9090110366765862 1.2589924202099898; -1.120611384326006 -0.9083935218058066 0.7252290006516056 1.0970416725786256; -0.19173238706933265 1.3610525411901113 -0.05258697093572793 0.7776085390912448; 0.18491459001855373 -2.0537142669734934 0.3482557186126859 0.0047622478008474845], [0.23422967703060255 -0.51986351753462 0.45947166573674303 0.31316899298864387; 0.3704450103622709 -0.8186574197233013 -0.9990329964554037 -0.8345957519924763; 0.56641529964098 -0.8393435538481216 -0.6379336546939682 1.1843452368116358; 0.9435767553275002 0.0033471181565433127 -1.191611491619908 1.3970554854927264]]
What is wrong?
Any information would be appreciated.
When you do arr = fill(Matrix{Float64}[], 2, 3), all 6 elements point to exactly the same location in memory, because fill does not make a deep copy - it just copies the reference. Basically, using fill when the first argument is mutable usually turns out not to be a good idea.
Hence what you actually want is:
arr = [Matrix{Float64}[] for i in 1:2, j in 1:3]
Now each of the 6 slots will have its own address in memory.
This way of creating the array implies that each element will be Float64, i.e. a scalar. You need to fix the type signature. So for instance you could do
D = Matrix{Array{Float64, 2}}(undef, 2, 3)
if you want it to have 2-dimensional arrays as elements (the Array{Float64, 2} part does that)
and then allocate
D[1,1] = rand(4,4)
D[1,2] = rand(4,4)
to give you (or rather, me!):
julia> D[1,1]
4×4 Matrix{Float64}:
0.210019 0.528594 0.0566622 0.0547953
0.729212 0.40829 0.816365 0.804139
0.39524 0.940286 0.976152 0.128008
0.886597 0.379621 0.153302 0.798803
julia> D[1,2]
4×4 Matrix{Float64}:
0.640809 0.821668 0.627057 0.382058
0.532567 0.262311 0.916391 0.200024
0.0599815 0.17594 0.698521 0.517822
0.965279 0.804067 0.39408 0.105774

Haskell - Reproduce numpy's reshape

Getting into Haskell, I'm trying to reproduce something like numpy's reshape with lists. Specifically, given a flat list, reshape it into an n-dimensional list:
import numpy as np
a = np.arange(1, 19)
b = a.reshape([-1, 2, 3])
# b =
#
# array([[[ 1, 2, 3],
# [ 4, 5, 6]],
#
# [[ 7, 8, 9],
# [10, 11, 12]],
#
# [[13, 14, 15],
# [16, 17, 18]]])
I was able to reproduce the behaviour with fixed indices, e.g.:
*Main> reshape23 [1..18]
[[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]],[[13,14,15],[16,17,18]]]
My code is:
takeWithRemainder :: (Integral n) => n -> [a] -> ([a], [a])
takeWithRemainder _ [] = ([], [])
takeWithRemainder 0 xs = ([], xs)
takeWithRemainder n (x:xs) = (x : taken, remaining)
    where (taken, remaining) = takeWithRemainder (n-1) xs

chunks :: (Integral n) => n -> [a] -> [[a]]
chunks _ [] = []
chunks chunkSize xs = chunk : chunks chunkSize remainderOfList
    where (chunk, remainderOfList) = takeWithRemainder chunkSize xs

reshape23 = chunks 2 . chunks 3
Now, I can't seem to find a way to generalise this to an arbitrary shape. My original idea was doing a fold:
reshape :: (Integral n) => [n] -> [a] -> [b]
reshape ns list = foldr (\n acc -> (chunks n) . acc) id ns list
But, no matter how I go about it, I always get a type error from the compiler. As far as I understand it, the problem is that at some point the type of acc is inferred to be id's, i.e. a -> a, and the compiler doesn't like that the functions being folded over all have different (although composable) type signatures. I run into the same problem when I try to implement this with recursion myself instead of a fold.
This confused me because originally I had intended for the [b] in reshape's type signature to be a stand-in for "another, dissociated type" that could be anything from [[a]] to [[[[[a]]]]].
How am I going wrong about this? Is there a way to actually achieve the behaviour I intended, or is it just plain wrong to want this kind of "dynamic" behaviour in the first place?
There are two details here that are qualitatively different from Python, ultimately stemming from dynamic vs. static typing.
The first one you have noticed yourself: at each chunking step the resulting type is different from the input type. This means you cannot use foldr, because it expects a function of one specific type. You could do it via recursion though.
The second problem is a bit less obvious: the return type of your reshape function depends on what the first argument is. Like, if the first argument is [2], the return type is [[a]], but if the first argument is [2, 3], then the return type is [[[a]]]. In Haskell, all types must be known at compile time. And this means that your reshape function cannot take the first argument that is defined at runtime. In other words, the first argument must be at the type level.
Type-level values may be computed via type functions (aka "type families"), but because it's not just the type (i.e. you also have a value to compute), the natural (or the only?) mechanism for that is a type class.
So, first let's define our type class:
class Reshape (dimensions :: [Nat]) from to | dimensions from -> to where
    reshape :: from -> to
The class has three parameters: dimensions of kind [Nat] is a type-level array of numbers, representing the desired dimensions. from is the argument type, and to is the result type. Note that, even though it is known that the argument type is always [a], we have to have it as a type variable here, because otherwise our class instances won't be able to correctly match the same a between argument and result.
Plus, the class has a functional dependency dimensions from -> to to indicate that if I know both dimensions and from, I can unambiguously determine to.
Next, the base case: when dimensions is an empty list, the function just degrades to id:
instance Reshape '[] [a] [a] where
    reshape = id
And now the meat: the recursive case.
instance (KnownNat n, Reshape tail [a] [b]) => Reshape (n:tail) [a] [[b]] where
    reshape = chunksOf n . reshape @tail
        where n = fromInteger . natVal $ Proxy @n
First it makes the recursive call reshape @tail to chunk out the previous dimension, and then it chunks out the result of that using the value of the current dimension as chunk size.
Note also that I'm using the chunksOf function from the library split. No need to redefine it yourself.
Let's test it out:
λ reshape @'[1] [1,2,3]
[[1],[2],[3]]
λ reshape @'[1,2] [1,2,3,4]
[[[1,2]],[[3,4]]]
λ reshape @'[2,3] [1..12]
[[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]]
λ reshape @'[2,3,4] [1..24]
[[[[1,2,3,4],[5,6,7,8],[9,10,11,12]],[[13,14,15,16],[17,18,19,20],[21,22,23,24]]]]
For reference, here's the full program with all imports and extensions:
{-# LANGUAGE
    MultiParamTypeClasses, FunctionalDependencies, TypeApplications,
    ScopedTypeVariables, DataKinds, TypeOperators, KindSignatures,
    FlexibleInstances, FlexibleContexts, UndecidableInstances,
    AllowAmbiguousTypes
#-}

import Data.Proxy (Proxy(..))
import Data.List.Split (chunksOf)
import GHC.TypeLits (Nat, KnownNat, natVal)

class Reshape (dimensions :: [Nat]) from to | dimensions from -> to where
    reshape :: from -> to

instance Reshape '[] [a] [a] where
    reshape = id

instance (KnownNat n, Reshape tail [a] [b]) => Reshape (n:tail) [a] [[b]] where
    reshape = chunksOf n . reshape @tail
        where n = fromInteger . natVal $ Proxy @n
@Fyodor Soikin's answer is perfect with respect to the actual question. But there is a bit of a problem with the question itself: lists of lists are not the same thing as an array. It is a common misconception that Haskell doesn't have arrays and that you are forced to deal with lists, which could not be further from the truth.
Because the question is tagged with array and there is a comparison to numpy, I would like to add a proper answer that handles this situation for multidimensional arrays. There are a couple of array libraries in the Haskell ecosystem, one of which is massiv.
A reshape-like functionality from numpy can be achieved with the resize' function:
λ> 1 ... (18 :: Int)
Array D Seq (Sz1 18)
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 ]
λ> resize' (Sz (3 :> 2 :. 3)) (1 ... (18 :: Int))
Array D Seq (Sz (3 :> 2 :. 3))
[ [ [ 1, 2, 3 ]
, [ 4, 5, 6 ]
]
, [ [ 7, 8, 9 ]
, [ 10, 11, 12 ]
]
, [ [ 13, 14, 15 ]
, [ 16, 17, 18 ]
]
]

Find indices of zero array into an array

I have a numpy array
my_array = np.array([[1,2,3,4],[5,6,7,8],[0,0,0,0],[1,2,3,4],[0,0,0,0],[0,0,0,1]])
and I would like to get the indices of all rows that contain only zero values:
index 2 -> [0,0,0,0]
index 4 -> [0,0,0,0]
A discussion of a similar problem exists: Find indices of elements equal to zero in a NumPy array
but that solution finds individual values equal to zero, instead of the rows that are entirely zero, as I want.
Thanks for your help.
You can use np.argwhere with np.all to get indices of rows where all elements == 0:
In [11]: np.argwhere((my_array == 0).all(axis=1))
Out[11]:
array([[2],
       [4]], dtype=int64)
Or np.nonzero instead of np.argwhere gives slightly nicer output:
In [12]: np.nonzero((my_array == 0).all(axis=1))
Out[12]: (array([2, 4], dtype=int64),)
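If you then want the row indices as a plain 1-D array, for example to pull the all-zero rows back out, a minimal sketch might look like this:
rows = np.nonzero((my_array == 0).all(axis=1))[0]
print(rows)            # [2 4]
print(my_array[rows])  # the all-zero rows themselves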

How to remove all occurrences of an element from NumPy array? [duplicate]

This question already has answers here:
Deleting certain elements from numpy array using conditional checks
(3 answers)
Remove all occurrences of a value from a list?
(26 answers)
Closed 4 years ago.
The title is pretty self-explanatory: I have a numpy array like (let's say ints)
[ 1 2 10 2 12 2 ] and I would like to remove all occurrences of 2, so that the resulting array is [ 1 10 12 ]. Preferably I would like to do this as fast as possible, because I am using relatively large arrays.
NumPy has a function called numpy.delete() but it takes the indexes as an argument, which I do not have.
Edit: The question is indeed different from Deleting certain elements from numpy array using conditional checks, which is I guess a more "general" case. However, the idea of removing occurrences from an array is fundamental enough to merit its own explicit question, so I am keeping the question.
You can use indexing:
arr = np.array([1, 2, 10, 2, 12, 2])
print(arr[arr != 2])
# [ 1 10 12]
Timing is pretty good:
from timeit import Timer
arr = np.array(range(5000))
print(min(Timer(lambda: arr[arr != 4999]).repeat(500, 500)))
# 0.004942436999999522
You can use another numpy function: numpy.setdiff1d(ar1, ar2, assume_unique=False).
This function finds the set difference of two arrays.
import numpy as np
a = np.array([1, 2, 10, 2, 12, 2])
b = np.array([2])
c = np.setdiff1d(a,b,True)
print(c)
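One caveat worth noting: with assume_unique=False (the default), np.setdiff1d also sorts the result and drops duplicates of the values it keeps, so it is not always a drop-in replacement for the boolean-mask approach. A quick sketch of the difference, on hypothetical data chosen to show the effect:
a = np.array([10, 1, 2, 1, 12, 2])
print(a[a != 2])             # [10  1  1 12]  - order and duplicates preserved
print(np.setdiff1d(a, [2]))  # [ 1 10 12]     - sorted, duplicates removed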
There are several ways to do this. I suggest you use a mask:
import numpy as np
a = np.array([ 1, 2 ,10, 2, 12, 2 ])
a[~np.isin(a, 2)]
>> array([ 1, 10, 12])
np.isin is convenient because you can apply the filter to multiple elements at once if you need to:
a[~np.isin(a, (1,2))]
>> array([ 10, 12])
Also note that indexing with a boolean mask returns a new array (boolean indexing is a form of advanced indexing, which copies), so the original array is left untouched. If you want to make that explicit, you can still call .copy, e.g.:
b = a[~np.isin(a, (1,2))].copy()
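A quick way to convince yourself that the mask result is already independent of the original (a minimal sketch using np.shares_memory):
import numpy as np

a = np.array([1, 2, 10, 2, 12, 2])
b = a[~np.isin(a, 2)]
print(np.shares_memory(a, b))  # False: boolean indexing already produced a copy
b[0] = 99
print(a)                       # the original array is unchanged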
