Find indices of all-zero rows in an array - arrays

I have a numpy array
my_array = np.array([[1,2,3,4],[5,6,7,8],[0,0,0,0],[1,2,3,4],[0,0,0,0],[0,0,0,1]])
and I would like to get the index of every row that contains only zeros:
index 2 -> [0,0,0,0]
index 4 -> [0,0,0,0]
A discussion of a similar problem exists: Find indices of elements equal to zero in a NumPy array
but that solution returns the positions of individual elements equal to zero, rather than the rows made up entirely of zeros, which is what I want.
Thanks for your help.

You can use np.argwhere with np.all to get the indices of rows where all elements == 0:
In [11]: np.argwhere((my_array == 0).all(axis=1))
Out[11]:
array([[2],
       [4]], dtype=int64)
Or use np.nonzero instead of np.argwhere, which gives slightly nicer output:
In [12]: np.nonzero((my_array == 0).all(axis=1))
Out[12]: (array([2, 4], dtype=int64),)
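An equivalent way to express the same test, shown here only as a sketch: a row is all zeros exactly when no element in it is non-zero, so ~my_array.any(axis=1) yields the same boolean mask, and np.flatnonzero returns the matching row indices as a flat array.

```python
import numpy as np

my_array = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 0, 0],
                     [1, 2, 3, 4], [0, 0, 0, 0], [0, 0, 0, 1]])

# True for rows in which no element is non-zero, i.e. all-zero rows.
zero_row_mask = ~my_array.any(axis=1)

# flatnonzero returns the indices of the True entries as a 1-D array.
zero_row_indices = np.flatnonzero(zero_row_mask)
print(zero_row_indices)  # [2 4]
```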

Related

Python '==' operator gives wrong result

I am comparing two elements of a numpy array. The memory addresses returned by the id() function for the two elements are different, and the is operator reports that the two elements are not the same object.
However, if I compare the memory addresses of the two array elements using the == operator, it reports that they are the same.
I do not understand how the == operator can return True when the two memory addresses are different.
Below is my code.
import numpy as np
a = np.arange(8)
newarray = a[np.array([3,4,2])]
print("Initial array : ", a)
print("New array : ", newarray)
# comparison of two elements using the 'is' operator
print("\ncomparison using is operator : ",a[3] is newarray[0])
# comparison of the memory addresses of the two elements using the '==' operator
print("comparison using == operator : ", id(a[3]) == id(newarray[0]))
# memory addresses of both array elements
print("\nMemory address of a : ", id(a[3]))
print("Memory address of newarray : ", id(newarray[0]))
Output:
Initial array : [0 1 2 3 4 5 6 7]
New array : [3 4 2]
comparison using is operator : False
comparison using == operator : True
Memory address of a : 2807046101296
Memory address of newarray : 2808566470576
This is probably due to a combination of Python's integer caching and obscure implementation details of numpy.
If you slightly change the code you will see that the ids are not consistent during the flow of the code, but they are actually the same on each line:
import numpy as np
a = np.arange(8)
newarray = a[np.array([3,4,2])]
print(id(a[3]), id(newarray[0]))
print(id(a[3]), id(newarray[0]))
outputs
276651376 276651376
20168608 20168608
A numpy array does not store references to objects like a list (unless it is object dtype). It has a 1d databuffer with the numeric values, which it may access in various ways.
In [17]: a = np.arange(8)
...: newarray = a[np.array([3,4,2])]
In [18]: a
Out[18]: array([0, 1, 2, 3, 4, 5, 6, 7])
In [21]: newarray
Out[21]: array([3, 4, 2])
newarray, produced with advanced indexing, is not a view. It has its own databuffer and values.
Let's 'unbox' elements of these arrays, assigning them to variables.
In [22]: x = a[3]; y = newarray[0]
In [23]: x
Out[23]: 3
In [24]: y
Out[24]: 3
In [25]: id(x),id(y)
Out[25]: (139768142922768, 139768142925584)
The ids are different (the assignment prevents the possibly confusing recycling of ids), so is is False:
In [26]: x is y
Out[26]: False
but the values are the same (by the == test):
In [27]: x == y
Out[27]: True
Another 'unboxing', different id:
In [28]: w = a[3]
In [29]: w
Out[29]: 3
In [30]: id(w)
Out[30]: 139768133495504
These integers are actually np.int64 objects. Python does 'cache' small integers, but that does not apply here.
In [33]: type(x)
Out[33]: numpy.int64
We can see "where" the arrays store their data:
In [31]: a.__array_interface__['data']
Out[31]: (33696480, False)
In [32]: newarray.__array_interface__['data']
Out[32]: (33838848, False)
These are totally different buffers. If newarray were a view, the buffer pointers would be the same or nearby.
If we don't hang on to the indexed object, ids may be reused:
In [34]: id(newarray[0]), id(newarray[0])
Out[34]: (139768133493520, 139768133493520)
In general is and id are not useful when working with numpy arrays.
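As a sketch of what to use instead: compare values with ==, and ask numpy directly whether two arrays share a buffer with np.shares_memory. The name view below is introduced only for this illustration.

```python
import numpy as np

a = np.arange(8)
newarray = a[np.array([3, 4, 2])]   # advanced indexing: a copy
view = a[2:5]                       # basic slicing: a view (illustrative name)

# Compare values, not object identity.
print(a[3] == newarray[0])               # True

# Ask numpy whether the arrays overlap in memory.
print(np.shares_memory(a, newarray))     # False
print(np.shares_memory(a, view))         # True

# Writing through the view changes a; the copy is unaffected.
view[1] = 99
print(a[3], newarray[0])                 # 99 3
```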

How to initialize a particular numpy array element value with a set of elements?

I have code in which I want to create a multidimensional numpy array where each element is itself a row vector of 3 elements. Here is how it looks:
a1=np.ndarray([4,4])
for i in range(4):
    for j in range(4):
        a1[i,j]=[2,2,2]
When I try this, I get an error:
ValueError: setting an array element with a sequence.
Please tell me where I went wrong.
Basically, my aim is to create a numpy ndarray (and not asarray or array) like this:
This is just a rough example of what I want to do.
[[1,1,1],[2,2,2],[3,3,3]
[4,4,4],[5,5,5],[6,6,6]
[1,2,3],[4,5,6],[1,2,4]]
The 3 element vector at every i, j location forms a third dimension. Thus the shape of the array should be [4, 4, 3] - the third dimension contains 3 elements.
a1 = np.ndarray([4, 4, 3])
...
Your final array will have shape (4,4,3), so you must reserve that room up front:
a1=np.empty((4,4,3),dtype=int)
# or np.ndarray((4,4,3),int)
for i in range(4):
    for j in range(4):
        a1[i,j]=[i,j,i+j] # for example
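The same array can also be built without Python loops. The following is a minimal sketch that assumes the illustrative [i, j, i+j] fill used above: np.indices produces the row and column index grids, and np.stack joins the layers along a new last axis.

```python
import numpy as np

# i[r, c] == r and j[r, c] == c for a 4x4 grid.
i, j = np.indices((4, 4))

# Stack the three 4x4 layers along a new last axis -> shape (4, 4, 3).
a1 = np.stack([i, j, i + j], axis=-1)

print(a1.shape)   # (4, 4, 3)
print(a1[2, 3])   # [2 3 5]
```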

Determine Size of Multidimensional Array in Swift

I am new to Swift and am struggling to work out how to determine the size of a multidimensional array.
I can use count for single arrays; however, when I create a matrix/multidimensional array, the count call just gives a single value.
var a = [[1,2,3],[3,4,5]]
var c: Int
c = a.count
print(c)
2
The above matrix 'a' clearly has 2 rows and 3 columns. Is there any way to output this size?
In Matlab this is a simple task with the following line of code,
a = [1,2,3;3,4,5]
size(a)
ans =
2 3
Is there a simple equivalent in Swift?
I have looked high and low for a solution and can't seem to find exactly what I am after.
Thanks
- HB
That is because 2D arrays in Swift can have subarrays with different lengths. There is no "matrix" type.
let arr = [
[1,2,3,4,5],
[1,2,3],
[2,3,4,5],
]
So the concept of "rows" and "columns" does not exist. There's only count.
If you want to count all the elements in the subarrays, (in the above case, 12), you can flat map it and then count:
arr.flatMap { $0 }.count
If you are sure that your array is a matrix, you can do this:
let rows = arr.count
let columns = arr[0].count // 0 is an arbitrary value
You must ask for the size of a specific row of your array to get the number of columns:
print("\(a.count) \(a[0].count)")
If you are trying to find the length of a 2D array, which in this case is the number of rows (the number of subarrays such as [1,2,3]), you may use this trick: divide the total number of elements, found using:
a.flatMap { $0 }.count //a is the array name
by the number of elements in one row, found using:
a[0].count //so the number of elements has to be equal in each subarray
So the code to get the length of a 2D array with an equal number of elements in each subarray, stored in the constant arrayLength, is:
let arrayLength = a.flatMap { $0 }.count / a[0].count //a is the array name

Why is there an extra comma in the shape of a one-dimensional numpy array?

A numpy array a
a = numpy.arange(12)
has shape
a.shape = (12,)
Why do we need the comma? Is shape (12) reserved for something else?
The reason we don't use (12) for a one-element tuple (like [12] for one-element list) is that round parentheses also appear in formulas. E.g., in x = 2*(5+7) the part (5+7) is just a number, not a tuple. But what if we actually meant it to be a one-element tuple? The trailing comma is a way to indicate that. Compare:
>>> 2*(5+7)
24
>>> 2*(5+7,)
(12, 12)
With lists, the trailing comma is not needed although some style guides recommend it for consistency.
>>> 2*[5+7]
[12, 12]
>>> 2*[5+7,]
[12, 12]
A numpy array's shape property always returns a tuple.
The number of dimensions and items in an array is defined by its shape, which is a tuple of N positive integers that specify the sizes of each dimension.
(12,) is just a one-element tuple, so this indicates that you have a one-dimensional array (because the tuple has length 1) with a size of 12.
Documented here.
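A short sketch tying the two answers together; the variable b is introduced only for illustration. It shows that parentheses alone do not make a tuple, and that the length of the shape tuple is the number of dimensions.

```python
import numpy as np

a = np.arange(12)

print(type((12)))               # <class 'int'>: parentheses alone only group
print(type((12,)))              # <class 'tuple'>: the comma makes the tuple

print(a.shape)                  # (12,)
print(len(a.shape) == a.ndim)   # True: one entry per dimension

b = a.reshape(3, 4)             # illustrative 2-D reshape of the same data
print(b.shape)                  # (3, 4)
```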

Number of Distinct Subarrays

I want to find an algorithm to count the number of distinct subarrays of an array.
For example, in the case of A = [1,2,1,2],
the number of distinct subarrays is 7:
{ [1] , [2] , [1,2] , [2,1] , [1,2,1] , [2,1,2], [1,2,1,2]}
and in the case of B = [1,1,1], the number of distinct subarrays is 3:
{ [1] , [1,1] , [1,1,1] }
A sub-array is a contiguous subsequence, or slice, of an array. Distinct means different contents; for example:
[1] from A[0:1] and [1] from A[2:3] are not distinct.
and similarly:
B[0:1], B[1:2], B[2:3] are not distinct.
Construct suffix tree for this array. Then add together lengths of all edges in this tree.
Time needed to construct suffix tree is O(n) with proper algorithm (Ukkonen's or McCreight's algorithms). Time needed to traverse the tree and add together lengths is also O(n).
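Ukkonen's and McCreight's algorithms are too long to show here. As a rough illustration of why the edge lengths give the answer, the sketch below builds an uncompressed suffix trie instead. This is a swapped-in O(n^2) simplification, not the linear-time construction. Every non-root trie node corresponds to exactly one distinct sub-array, and the number of such nodes equals the total edge length of the compressed suffix tree. The function name is hypothetical.

```python
def count_distinct_subarrays_trie(arr):
    """Count distinct contiguous sub-arrays by counting suffix-trie nodes."""
    root = {}          # each node is a dict: value -> child node
    node_count = 0     # number of non-root nodes created so far
    for start in range(len(arr)):
        node = root
        # Insert the suffix arr[start:] one element at a time.
        for value in arr[start:]:
            if value not in node:
                node[value] = {}
                node_count += 1   # a new node is a sub-array not seen before
            node = node[value]
    return node_count


print(count_distinct_subarrays_trie([1, 2, 1, 2]))  # 7
print(count_distinct_subarrays_trie([1, 1, 1]))     # 3
```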
Edit: I thought about how to reduce the number of iterations/comparisons.
I found a way to do it: if you retrieve a sub-array of size n, then each sub-array of size less than n will already be added.
Here is the updated code.
List<Integer> A = new ArrayList<Integer>();
A.add(1);
A.add(2);
A.add(1);
A.add(2);
System.out.println("global list to study: " + A);
// global list of unique sub-lists
List<List<Integer>> listOfUniqueList = new ArrayList<List<Integer>>();
// iterate on the start position in the list, beginning at 0
for (int initialPos=0; initialPos<A.size(); initialPos++) {
    // iterate on the sub-list size, starting with the longest and then decreasing
    for (int currentListSize=A.size()-initialPos; currentListSize>0; currentListSize--) {
        // initialize the current sub-list
        List<Integer> currentList = new ArrayList<Integer>();
        // copy each corresponding int of the global list
        for ( int i = 0; i<currentListSize; i++) {
            currentList.add(A.get(initialPos+i));
        }
        // ensure uniqueness
        if (!listOfUniqueList.contains(currentList)){
            listOfUniqueList.add(currentList);
        } else {
            continue;
        }
    }
}
System.out.println("list retrieved: " + listOfUniqueList);
System.out.println("size of list retrieved: " + listOfUniqueList.size());
global list to study: [1, 2, 1, 2]
list retrieved: [[1, 2, 1, 2], [1, 2, 1], [1, 2], [1], [2, 1, 2], [2, 1], [2]]
size of list retrieved: 7
With a list containing the same pattern many times, the number of iterations and comparisons will be quite low.
For your example [1, 2, 1, 2], the line if (!listOfUniqueList.contains(currentList)){ is executed 10 times. It only rises to 36 for the input [1, 2, 1, 2, 1, 2, 1, 2], which contains 15 different sub-arrays.
You could trivially make a set of the subsequences and count them, but I'm not certain it is the most efficient way, as it is O(n^2).
In Python that would be something like:
subs = [tuple(A[i:j]) for i in range(0, len(A)) for j in range(i + 1, len(A) + 1)]
uniqSubs = set(subs)
which gives you :
set([(1, 2), (1, 2, 1), (1,), (1, 2, 1, 2), (2,), (2, 1), (2, 1, 2)])
The double loop in the comprehension clearly states the O(n²) number of sub-arrays.
Edit
Apparently there is some discussion about the complexity. Creating subs produces O(n^2) items.
Creating a set from a list takes O(m) insertions, where m is the size of the list (n^2 in this case), as adding to a set is amortized O(1).
The overall number of set operations is therefore O(n^2). Note that slicing and hashing each tuple takes time proportional to its length, so the total work is O(n^3) in the worst case.
Right my first answer was a bit of a blonde moment.
I guess the answer would be to generate them all and then remove duplicates. Or, if you are using a language like Java with a Set type, build all the sub-arrays and add them to a set. Use List<Integer> elements rather than int[], because Java arrays are compared by identity and equal contents would not be treated as duplicates. Sets only contain one instance of each element and automatically remove duplicates, so you can just get the size of the set at the end.
I can think of 2 ways...
The first is to compute some sort of hash and then add it to a set.
If, on adding, a hash is the same as that of an existing array, then do a verbose comparison, and log it so that you know your hash algorithm isn't good enough...
The second is to use some sort of probable match and then drill down from there...
If the number of elements is the same and the sum of the elements is the same, then check verbosely.
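Create an array of pairs, where each pair stores the value of an element of the array and its index.
pair[i] = (A[i],i);
Sort the pairs in increasing order of A[i] and then decreasing order of i. (In general, the pairs must end up ordered so that the suffixes starting at their indices are in lexicographic order; sorting by the first element alone is not always enough, although for the example below it already produces that order.)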
Consider example A = [1,3,6,3,6,3,1,3];
pair array after sorting will be pair = [(1,6),(1,0),(3,7),(3,5),(3,3),(3,1),(6,4),(6,2)]
pair[0] has element of index 6. From index 6 we can have two sub-arrays [1] and [1,3]. So ANS = 2;
Now take each consecutive pair one by one.
Taking pair[0] and pair[1],
pair[1] has index 0. We can have 8 sub-arrays beginning from index 0. But two subarrays [1] and [1,3] are already counted. So to remove them, we need to compare longest common prefix of sub-array for pair[0] and pair[1]. So longest common prefix length for indices beginning from 0 and 6 is 2 i.e [1,3].
So now new distinct sub-arrays will be [1,3,6] .. to [1,3,6,3,6,3,1,3] i.e. 6 sub-arrays.
So new value of ANS is 2+6 = 8;
So for pair[i] and pair[i+1]
ANS = ANS + Number of sub-arrays beginning from pair[i+1] - Length of longest common prefix.
The sorting part takes O(n logn).
Iterating over the consecutive pairs is O(n), and finding the longest common prefix in each iteration takes O(n), making the whole iteration part O(n^2). It's the best I could get.
You can see that we don't need the pairs for this. The first value of each pair, the value of the element, was not required. I used it for better understanding. You can always skip that.
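The procedure above can be sketched as follows. This is a minimal illustration, not the answer's own code: the function and helper names are hypothetical, and it assumes the start indices are sorted by comparing the full suffixes lexicographically (a naive sort on slices), which is the order the prefix subtraction relies on.

```python
def count_distinct_subarrays_lcp(arr):
    """Count distinct contiguous sub-arrays via sorted suffixes and LCPs."""
    n = len(arr)

    # Naive suffix sort: order start indices by the suffix beginning there.
    order = sorted(range(n), key=lambda i: arr[i:])

    def common_prefix_len(i, j):
        # Length of the longest common prefix of arr[i:] and arr[j:].
        k = 0
        while i + k < n and j + k < n and arr[i + k] == arr[j + k]:
            k += 1
        return k

    total = 0
    prev = None
    for start in order:
        total += n - start            # sub-arrays beginning at this index
        if prev is not None:
            # Prefixes shared with the previous suffix were already counted.
            total -= common_prefix_len(prev, start)
        prev = start
    return total


print(count_distinct_subarrays_lcp([1, 3, 6, 3, 6, 3, 1, 3]))  # 27
print(count_distinct_subarrays_lcp([1, 2, 1, 2]))              # 7
```

For the example A = [1,3,6,3,6,3,1,3], the sorted start indices are 6, 0, 7, 5, 3, 1, 4, 2, matching the pair array shown earlier.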
