comparing two numpy arrays and adding same rows - arrays

I have two large data files, one with two columns and one with three columns. I want to select all the rows from the second file that are contained in the fist array. My idea was to compare the numpy arrays.
Let's say I have:
a = np.array([[1, 2, 3], [3, 4, 5], [1, 4, 6]])
b = np.array([[1, 2], [3, 4]])
and the result should look like this:
[[1, 2, 3], [3, 4, 5]]
Any advice on that?
EDIT:
So in the end this works. Not very handy but it works.
for ii in range(a.shape[0]):
u, v, w = a[ii,:]
for jj in range(b.shape[0]):
if (u == b[jj, 0] and v == b[jj, 1]):
print [u, v, w]

The numpy_indexed package (disclaimer: I am its author) contains functionality to solve such problems efficiently, without using any python loops:
import numpy_indexed as npi
a[npi.contains(b, a[:, :2])]

If you prefer to not use another library but want to do this in numpy only, you can do something similar to what is suggested here and here, namely to use np.in1d (see docs) which does provide you with a mask indicating if an element in one 1D array exists in another 1D array. As the name indicates, this function only works for 1D arrays. But you can use a structured array view (using np.view) to cheat numpy into thinking you have 1D arrays. One caveat is though, that you need a deep copy of the first array a since np.view doesn't mix with slices, well. But if that is not too big of an issue for you, something along the lines of:
a_cp = a[:, :2].copy()
a[np.in1d(a_cp.view((np.void, a_cp.dtype.itemsize*a_cp.shape[1])).ravel(),
b.view((np.void, b.dtype.itemsize*b.shape[1])).ravel())]
might work for you.
This directly uses the masked array to return the correct values from your array a.

Check this, #Ernie. It may help you to get to the solution. ;D
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.in1d.html

Related

Deleting and adding numpy array rows in a for loop to create a dynamic subarray from larger numpy array,

Summary of problem
Ultimate goal
I would like to take a sub-array from a large input numpy array. This sub array is dynamic, and every iteration through the larger numpy input array will change the sub array so that I can perform a set of calculations that depend on previous iterations of the array. This involves nested for loops, which I realize is not very pythonic, but I don't know of another way.
Problem
The problem arises when I add to the existing dynamic sub-array, it seems to grow extra bracketing. This seems simple to fix, but I am having trouble adapting my Matlab knowledge of array indexing to numpy indexing. I have not even started implementing my calculations yet, but I cannot seem to get the structure of this loop correct.
What I've tried
I have [tried this originally in Pandas][1]. Originally, I thought I could write a pretty simply program to do this using pandas indexing and column naming. But it was SLOW! So I trying to streamline this by
changing the architecture and
relying on numpy instead of Pandas.
Below is a simple program that emulates what I want to do. I am sure I will have other questions, but this is the start. I have a simple (5, 2) array that I loop through the rows of. With each row after row 0, I add the new row to the top of the temp sub-array and delete the last row of the array, maintaining a (2, 2) array throughout. However, as you will see when you run this code, it results in some strange behavior that results in not being able to write the results into the output array. You will also see that I have tried several ways to add and delete columns. Whether these are optimal is besides the point - the current code is the closest I have gotten to running this program!
Some Example code
This code 'works' in the sense that it doesn't trow errors. However, it doesnt' produce the desired results. In this case it would be an output array with the same values as the inputs (because I am not doing any calculations- this is just to get the architecture correct). The desired result would be that each loop creates a sub array in this order:
n=1 [1 1]
n=2 [[1,1], [2,2]]
n=3 [[2, 2], [3, 3]]
n=4 [[3, 3], [4, 4]]
...
N [[N-1, N-1], [N, N]].
This does not need to be limited to 2 items (if list) or rows (if array), and the length will be set by an input variable. Thus, the size of this array must be dynamic (set during the call of the function). Furthermore, I supply a simple example here, but each loop will basically need to add a row from the input. It will be a little more advanced than simply a 2 member NDarray. Lists have the advantage of being able to use .append and .pop attributes, but as far as I can tell, arrays do not. I present the following code example using only arrays.
import numpy as np
a = np.array([[1, 1], [2, 2], [3, 3], [4,4], [5,5]])
print('Original a array: ', a)
out = np.empty_like(a)
b = np.empty(len(a[0,:]))
for ii, rr in enumerate(a):
if ii == 0:
c = [a[ii]]
else:
print('Before: ', c)
#Add next row from array a to the temp array for calculations
c = np.insert(c, 1, [rr], axis=0)
print('During: ', c)
#Remove the last row of the temp array prior to calculations
#indices_to_remove = [0]
#d = c[~np.isin(np.arange(c.size), [indices_to_remove])]
d = c[1::]
c = [d]
print('After: ', c)
#Add the temp array to the output array after calculations
#THIS THROWS ERRORS, AND I THINK IT IS DUE TO THE INCREASING NUMBERS OF BRACKETS.
#out[ii, :] = c
#print(c)
[1]: https://stackoverflow.com/questions/70186681/nested-loops-altering-rows-in-pandas-avoiding-a-value-is-trying-to-be-set-on?noredirect=1#comment124076103_70186681
MATLAB is 1-base indexing whereas Python uses 0-base indexing.
let's say we have a 2D array like this:
a= [[1, 2],
[3, 4],
[5, 6]]
In MATLAB if you do a(1, 1) you will get 1 in python a[1, 1] you will get 4. Also as you know in MATLAB you can do natural indexing which works if you do a(6) = 6 in python if you do that you will get IndexError: . it is almost the same, except in python it starts from 0.
Here is the working example with your desired results.
import numpy as np
a = np.array([[1, 1], [2, 2], [3, 3], [4,4], [5,5]])
test = []
for idx in range(len(a)):
if idx == 0:
test.append(a[idx])
test.append(a[idx:idx+2, :])
# remove the last [5, 5]
test.pop(-1)
for i in test:
print(i, end=',\n')
output
[1 1],
[[1 1]
[2 2]],
[[2 2]
[3 3]],
[[3 3]
[4 4]],
[[4 4]
[5 5]],

How to append np.arrays to a file and use them for plotting purposes later

I'm creating multiple np.arrays of the shape (1,2) inside a for-loop. The arrays contain the x- and the y-coordinates of some positions. In each iteration I overwrite these arrays. I want to store these arrays inside a file, so that I can use them later on to make plots. I don't care, if the file is readable for humans.
For now I only tried it for two (1,2) arrays, but it is really important, that I can store an arbitrary number of arrays in the file.
I tried writing the arrays to a .txt file with filename.write('{}\n'.format(arr)), but I failed to reuse them for plotting. Also I don't see yet how I will be able to write multiple arrays to the file like this.
import numpy as np
arr = np.array([[-1, 0], [1, 0]])
# write to a .txt file:
with open('file.dat', 'w+') as f:
f.write('{}\t{}\n'.format(arr[0], arr[1]))
for i in range(3):
arr = arr + 2
f.write('{}\t{}\n'.format(arr[0], arr[1]))
The file looks like this:
[2 3] [4 3]
[5 6] [7 6]
[8 9] [10 9]
With this code I get one column of vectors on the left and one column of vectors on the right.
What I want to do now is to plot arr[0][0] against arr[0][1] for every vector in the first column and the same for the second column of vectors. So I want to load this information from the file and get again arrays.
Please keep in mind, that this example only shows my problem for two (1,2) arrays, but I need to be able to do the same for an arbitrary amount of vectors i.e. it should also work for arr = np.array([[-1, 0], [1, 0], [1, 2]) and even more vectors .
Use numpy.savez or numpy.savez_compressed to save one or multiple arrays (they can have different data types and shapes) into a file. Later, you can restore them using numpy.load
e.g:
my_arrays = [ np.ones([2, 2])*k for k in range(0, 4) ]
np.savez('myfile.npz', *my_arrays)
# Now restore it back
my_arrays = list(np.load('myfile.npz').values())

Removing 'nil's from a column using transpose

I have a nested array:
arr = [[1,nil,2,3,4], [2,nil,4,5,6], [6,nil,3,3,5]]
Any elements at the same index in the subarrays that are nil across the array must be removed. The second index in all subarrays have nil.
I did this:
collection = arr.transpose.select(&:any?).transpose
# => [[1, 2, 3, 4], [2, 4, 5, 6], [6, 3, 3, 5]]
It works for me, albiet I am using transpose twice. Could this technique lead to data getting mixed up? It looks fool proof to me.
With the nil-vs-false caveat that #CarySwoveland noted in a comment, yes, your double-transpose is safe: it will only work on data that's rectangular to begin with, and it will produce equally-rectangular data as output. You're filtering out whole rows, so nothing can get misaligned.
While it's not super efficient, it's not too bad, and is far more expressive & readable than more direct looping & manipulation.

collect all elements and indices of an array in two separate arrays in Ruby

Suppose I have an array array = [1,2,3,4,5]
I want to collect all the elements and indices of the array in 2 separate arrays like
[[1,2,3,4,5], [0,1,2,3,4]]
How do I do this using a single Ruby collect statement?
I am trying to do it using this code
array.each_with_index.collect do |v,k|
# code
end
What should go in the code section to get the desired output?
Or even simpler:
[array, array.each_index.to_a]
I like the first answer that was posted a while ago. Don't know why the guy deleted it.
array.each_with_index.collect { |value, index| [value,index] }.transpose
Actually I am using an custom vector class on which I am calling the each_with_index method.
Here's one simple way:
array = [1,2,3,4,5]
indexes = *array.size.times
p [ array, indexes ]
# => [[1, 2, 3, 4, 5], [0, 1, 2, 3, 4]]
See it on repl.it: https://repl.it/FmWg

How to vectorize NumPy polyder function?

I would like to vectorize the NumPy function polyder, which computes derivatives of polynomials. Is there a simple way or a built-in function to do it?
With vectorize, I mean that if the input is an array of polynomials, the output would be the array with the derivative of the polynomials.
An example:
p = np.array([[3,4,5], [1,2]])
the output should be something like
np.array([[6, 4], [1]])
Since your subarrays, both input and output, can have different lengths, you are better off treating both as lists.
In [97]: [np.polyder(d) for d in [[3,4,5],[1,2]]]
Out[97]: [array([6, 4]), array([1])]
Your p is just a list in an expensive (timewise) array wrapper.
In [100]: p=np.array([[3,4,5],[1,2]])
In [101]: p
Out[101]: array([[3, 4, 5], [1, 2]], dtype=object)
There is little that you can do with such an array that you can't do just as well with a list. Do some time tests. You probably will find that iterating over the arrays of objects is slower than iteration over equivalent lists, especially if you take into account the time it takes convert a list to array.
It can also be tricky to create such arrays. If all the sublists are the same length the result will be a 2d array. Forcing them to be an object array takes special initiation.
A general rull of thumb is - if individual steps work with arrays or lists of different length, you probably can't vectorize. That is, you can't form a rectangular 2d array and apply vector operations.
If the polynomial lists were all the same length, then p could be 2d, and the result could also be that:
In [107]: p=np.array([[3,4,5],[0,1,2]])
In [108]: p
Out[108]:
array([[3, 4, 5],
[0, 1, 2]])
In [109]: np.array([np.polyder(i) for i in p])
Out[109]:
array([[6, 4],
[0, 1]])
In effect it is iterating over the rows of p, and then reassembling the result into an array. There are some numpy functions that streamline iteration (but don't speed it up much), but I see little need for those here.
Looking at the code of this function, the core is:
p = NX.asarray(p)
n = len(p) - 1
y = p[:-1] * NX.arange(n, 0, -1)
which for this 2d array, (len 3) is:
In [117]: p[:,:-1]*np.arange(2,0,-1)
Out[117]:
array([[6, 4],
[0, 1]])
So if the number of polynomials are all the same, this simple multiplication gives the 1st order derivative coefficients. And of course the rows can be padded so they are all the same. So 'vectorization' is easier than I initially thought.
import numpy as np
p = np.array([[3,4,5], [1,2]])
np.array([np.polyder(coefficients) for coefficients in p]) # array([[6 4], [1]], dtype=object)
would fulfill your interface for your specific example. But as hpaulj mentions, there's little sense in working with NumPy arrays instead of normal python lists here, and no actual (hardware-level) vectorization will happen. (Though, as with list comprehensions in general, the interpreter would be free to employ other means of parallelism to compute them.)

Resources