Want to concatenate elements of array into single element - arrays

I have an array of arrays that looks like
time
array([array([ 0, 1, 0, 10, 12, 2011], dtype=int16),
array([ 0, 1, 0, 10, 12, 2011], dtype=int16),
array([ 0, 1, 0, 10, 12, 2011], dtype=int16), ...,
array([ 0, 59, 23, 10, 12, 2011], dtype=int16),
array([ 0, 59, 23, 10, 12, 2011], dtype=int16),
array([ 0, 59, 23, 10, 12, 2011], dtype=int16)],
dtype=object)
and I would like to transform this into something like
time
array([0:1:0 10-12-2011,
etc
0:59:23 10-12-2011])
I feel like I should be able to do this for the whole structure without having to loop through each individual row/column.

I would say you cannot avoid loops, but you can get a pretty decent result by looping through the outer array and converting your data into datetime objects. Let's say a is your array:
from datetime import datetime
import numpy as np

results = np.array([datetime(*row[::-1]) for row in a])
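For example, assuming each row is ordered [second, minute, hour, day, month, year] (which is the order the reversal implies), a single row maps to a datetime like this:
>>> from datetime import datetime
>>> row = [0, 59, 23, 10, 12, 2011]  # assumed: sec, min, hour, day, month, year
>>> datetime(*row[::-1])
datetime.datetime(2011, 12, 10, 23, 59)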

Related

Numpy: Get indices of boundaries in an array where starts of boundaries always start with a particular number; non-boundaries by a particular number

Problem:
I'm looking for the most computationally efficient way to get the indices of boundaries in an array, where the start of each boundary is always marked by one particular number and non-boundary positions are marked by a different particular number.
Differences between this question and other boundary-based numpy questions on SO:
Here are some other boundary-based numpy questions:
Numpy 1D array - find indices of boundaries of subsequences of the same number
Getting the boundary of numpy array shape with a hole
Extracting boundary of a numpy array
The difference between my question and the posts above is that in those, boundaries are indicated by a jump in value or by a 'hole' of values.
What seems to be unique to my case is the starts of boundaries always start with a particular number.
Motivation:
This problem is inspired by IOB tagging in natural language processing. In IOB tagging, B [beginning] tags the first token of an entity, I [inside] tags the remaining tokens of that entity, and O [outside] tags all non-entity tokens.
Example:
import numpy as np

a = np.array(
    [0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1]
)
1 marks the start of each boundary. If a boundary has a length greater than one, 2 makes up the rest of the boundary. 0 marks non-boundary positions.
The boundary segments are therefore [1, 2, 2, 2], [1], [1, 2], [1], [1], [1], [1], [1].
So the desired solution, the start and end indices of each boundary in a, is
desired = [[3, 6], [10, 10], [13, 14], [15, 15], [16, 16], [19, 19], [20, 20], [21, 21]]
Current Solution:
If flattened, the numbers in the desired solution are in ascending order, so the raw index values can be calculated separately, then sorted and reshaped at the end.
I can get the start indices using
starts = np.where(a==1)[0]
starts
array([ 3, 10, 13, 15, 16, 19, 20, 21])
So what's left is 6, 10, 14, 15, 16, 19, 20, 21.
I can get all but the last of these using three different conditions, comparing a shifted copy of the array against the original for decreases in value and checking the values at the shifted positions.
first = np.where(a[:-1] - 2 == a[1:])[0]
first
array([6])
second = np.where((a[:-1] - 1 == a[1:]) &
                  ((a[1:] == 1) | (a[1:] == 0)))[0]
second
array([10, 14, 16])
third = np.where(
    (a[:-1] == a[1:]) &
    (a[1:] == 1)
)[0]
third
array([15, 19, 20])
The last number I need is 21, but since I needed to shorten the length of the array by 1 to do the shifted comparisons, I'm not sure how to get that particular value using logic, so I just used a simple if statement for that.
Using the rest of the retrieved values for the indices, I can concatenate all the values and reshape them.
if (a[-1] == 1) | (a[-1] == 2):
    pen = np.concatenate((
        starts, first, second, third, np.array([a.shape[0] - 1])
    ))
else:
    pen = np.concatenate((
        starts, first, second, third,
    ))
np.sort(pen).reshape(-1,2)
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
Is this the most computationally efficient solution for my problem? I realize the four where statements could be combined with | operators, but I kept them separate so the reader can see each intermediate result. Still, I wonder whether there is a more computationally efficient solution, since I have not mastered all of numpy's functions and am unsure of the computational efficiency of each.
A standard trick for this type of problem is to pad the input appropriately. In this case, it is helpful to append a 0 to the end of the array:
In [55]: a1 = np.concatenate((a, [0]))
In [56]: a1
Out[56]:
array([0, 0, 0, 1, 2, 2, 2, 0, 0, 0, 1, 0, 0, 1, 2, 1, 1, 0, 0, 1, 1, 1,
0])
Then your starts calculation still works:
In [57]: starts = np.where(a1 == 1)[0]
In [58]: starts
Out[58]: array([ 3, 10, 13, 15, 16, 19, 20, 21])
The condition for the end is that the value is a 1 or a 2 followed by a value that is not 2. You've already figured out that to handle the "followed by" condition, you can use a shifted version of the array. To implement the and and or conditions, use the bitwise binary operators & and |, respectively. In code, it looks like:
In [61]: ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
In [62]: ends
Out[62]: array([ 6, 10, 14, 15, 16, 19, 20, 21])
Finally, put starts and ends into a single array:
In [63]: np.column_stack((starts, ends))
Out[63]:
array([[ 3, 6],
[10, 10],
[13, 14],
[15, 15],
[16, 16],
[19, 19],
[20, 20],
[21, 21]])
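Putting the pieces together, the whole answer fits in a small helper (the name boundary_spans is mine; np and a are the import and data from above):
def boundary_spans(a):
    # Pad with a trailing 0 so a boundary ending at the last element closes.
    a1 = np.concatenate((a, [0]))
    starts = np.where(a1 == 1)[0]
    # An end is a nonzero value that is not followed by a 2.
    ends = np.where((a1[:-1] != 0) & (a1[1:] != 2))[0]
    return np.column_stack((starts, ends))
Calling boundary_spans(a) reproduces the desired array above.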

Fastest way to (arg)sort a flattened nD-array that is sorted along each dimension?

The question itself is language-agnostic. I will use Python for my example, mainly because I think it demonstrates the point nicely.
I have an N-dimensional array of shape (n1, n2, ..., nN) that is contiguous in memory (c-order) and filled with numbers. For each dimension by itself, the numbers are ordered in ascending order. A 2D example of such an array is:
>>> import numpy as np
>>> n1 = np.arange(5)[:, None]
>>> n2 = np.arange(7)[None, :]
>>> n1+n2
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 1, 2, 3, 4, 5, 6, 7],
[ 2, 3, 4, 5, 6, 7, 8],
[ 3, 4, 5, 6, 7, 8, 9],
[ 4, 5, 6, 7, 8, 9, 10]])
In this case, the values in each row are ascending, and the values in each column are ascending, too. A 1D example array is
>>> n1 = np.arange(10)
>>> n1*n1
array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
I would like to obtain a list/array containing the indices that would sort the flattened version of the nD array in ascending order. By the flattened array I mean that I interpret the nD-array as a 1D array of equivalent size. The sorting doesn't have to preserve order, i.e., the order of indices indexing equal numbers doesn't matter. For example
>>> n1 = np.arange(5)[:, None]
>>> n2 = np.arange(7)[None, :]
>>> arr = n1*n2
>>> arr
array([[ 0, 0, 0, 0, 0, 0, 0],
[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 2, 4, 6, 8, 10, 12],
[ 0, 3, 6, 9, 12, 15, 18],
[ 0, 4, 8, 12, 16, 20, 24]])
>>> np.argsort(arr.ravel())
array([ 0, 28, 14, 7, 6, 21, 4, 3, 2, 1, 5, 8, 9, 15, 22, 10, 11,
29, 16, 12, 23, 17, 13, 18, 30, 24, 19, 25, 31, 20, 26, 32, 27, 33,
34], dtype=int64)
Standard sorting on the flattened array can accomplish this; however, it doesn't exploit the fact that the array is already partially sorted, so I suspect there exists a more efficient solution. What is the most efficient way to do so?
A comment asked what my use-case is, and if I could provide some more realistic test data for benchmarking. Here is how I encountered this problem:
Given an image and a binary mask for that image (which selects pixels), find the largest sub-image which contains only selected pixels.
In my case, I applied a perspective transformation to an image, and want to crop it so that there is no black background while preserving as much of the image as possible.
import numpy as np
from skimage import data
from skimage import transform
from skimage import img_as_float

tform = transform.EuclideanTransform(
    rotation=np.pi / 12.,
    translation=(10, -10)
)
img = img_as_float(data.chelsea())[50:100, 150:200]
tf_img = transform.warp(img, tform.inverse)
tf_mask = transform.warp(np.ones_like(img), tform.inverse)[..., 0]
y = np.arange(tf_mask.shape[0])
x = np.arange(tf_mask.shape[1])
y1 = y[:, None, None, None]
y2 = y[None, None, :, None]
x1 = x[None, :, None, None]
x2 = x[None, None, None, :]
y_padded, x_padded = np.where(tf_mask==0.0)
y_padded = y_padded[None, None, None, None, :]
x_padded = x_padded[None, None, None, None, :]
y_inside = np.logical_and(y1[..., None] <= y_padded, y_padded <= y2[..., None])
x_inside = np.logical_and(x1[..., None] <= x_padded, x_padded <= x2[..., None])
contains_padding = np.any(np.logical_and(y_inside, x_inside), axis=-1)
# size of the sub-image
height = np.clip(y2 - y1 + 1, 0, None)
width = np.clip(x2 - x1 + 1, 0, None)
img_size = width * height
# find all largest sub-images
img_size[contains_padding] = 0
y_low, x_low, y_high, x_high = np.where(img_size == np.max(img_size))
cropped_img = tf_img[y_low[0]:y_high[0]+1, x_low[0]:x_high[0]+1]
The algorithm is quite inefficient; I am aware. What is interesting for this question is img_size, which is a (50,50,50,50) 4D-array that is ordered as described above. Currently I do:
img_size[contains_padding] = 0
y_low, x_low, y_high, x_high = np.where(img_size == np.max(img_size))
but with a proper argsort algorithm (that I can interrupt early) this could potentially be made much better.
I would do it using parts of mergesort and a divide and conquer approach.
You start with the first two arrays.
[0, 1, 2, 3, 4, 5, 6],  // <- this
[1, 2, 3, 4, 5, 6, 7],  // <- this
...
Then you can merge them like this (Java-like syntax):
List<Integer> merged = new ArrayList<>();
List<Integer> firstRow = ...  // the same would work with arrays
List<Integer> secondRow = ...
int firstCnter = 0;
int secondCnter = 0;
while (firstCnter < firstRow.size() || secondCnter < secondRow.size()) {
    if (firstCnter == firstRow.size()) {
        // First row exhausted: take the rest of the second row unconditionally.
        merged.add(secondRow.get(secondCnter++));
    } else if (secondCnter == secondRow.size()) {
        // Second row exhausted: take the rest of the first row.
        merged.add(firstRow.get(firstCnter++));
    } else {
        // Add the smaller of the two current values.
        int firstValue = firstRow.get(firstCnter);
        int secondValue = secondRow.get(secondCnter);
        merged.add(Math.min(firstValue, secondValue));
        if (firstValue <= secondValue)
            firstCnter++;
        else
            secondCnter++;
    }
}
After that you can merge the next two rows, until you have:
[0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7]
[2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9]
[4, 5, 6, 7, 8, 9, 10]  // not merged yet
Continue to merge again:
[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9]
[4, 5, 6, 7, 8, 9, 10]
After that, the last merge:
[0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10]
I don't know the exact time complexity, but it should be a viable solution.
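In Python, this pairwise merging comes built in: heapq.merge performs a k-way merge of sorted iterables in one call. A minimal sketch that pairs each value with its flat index, so the merge yields an argsort rather than just the sorted values:
import heapq
import numpy as np

arr = np.arange(5)[:, None] + np.arange(7)[None, :]
n_rows, n_cols = arr.shape
# Each row is already sorted; attach flat indices before merging.
rows = ([(int(arr[r, c]), r * n_cols + c) for c in range(n_cols)]
        for r in range(n_rows))
order = [idx for _, idx in heapq.merge(*rows)]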
Another idea: use a min-heap holding just the current candidates for the next-smallest value. Start with the value at the origin (index 0 in all dimensions), since that is the smallest. Then repeatedly pop the smallest value from the heap and push its not-yet-added neighbors.
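A minimal sketch of that heap idea (the helper name sorted_flat_indices is mine): because the array is sorted along every axis, each popped cell only needs to push its immediate successor along each dimension, so the heap stays small.
import heapq
import numpy as np

def sorted_flat_indices(arr):
    # Flat indices of arr in ascending value order, assuming arr is
    # sorted along every axis.
    origin = (0,) * arr.ndim
    heap = [(arr[origin], origin)]
    seen = {origin}
    order = []
    while heap:
        val, idx = heapq.heappop(heap)
        order.append(np.ravel_multi_index(idx, arr.shape))
        for axis in range(arr.ndim):
            nxt = idx[:axis] + (idx[axis] + 1,) + idx[axis + 1:]
            if nxt[axis] < arr.shape[axis] and nxt not in seen:
                seen.add(nxt)
                heapq.heappush(heap, (arr[nxt], nxt))
    return np.array(order)
Turning the loop into a generator would also allow the early interruption mentioned in the question.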

Converting string of Integers to bytearray for pyserial

I'm trying to convert a string from a file into a byte array to send over pyserial in Python 3.
I have a string like:
[66, 1, 32, 1, 3, 0, 0, 11, 0, 1, 4, 102, 198]
which I need to send over the line as:
\x42\x01\x20\x01\x03\x00\x00\x0b\x00\x01\x04\x66\xc6
I've tried many things without much success.
Any help is appreciated
You might do
data = [66, 1, 32, 1, 3, 0, 0, 11, 0, 1, 4, 102, 198]

a = bytearray()
for item in data:
    a.append(item)
byte_str = bytes(a)
Output:
b'B\x01 \x01\x03\x00\x00\x0b\x00\x01\x04f\xc6'
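Since bytes() accepts any iterable of integers in range(256), the loop can be collapsed to a single call. And if the data really arrives as a bracketed string read from a file (an assumption about the file format), ast.literal_eval parses it safely:
import ast

line = "[66, 1, 32, 1, 3, 0, 0, 11, 0, 1, 4, 102, 198]"
data = ast.literal_eval(line)  # -> list of ints
payload = bytes(data)          # b'B\x01 \x01\x03\x00\x00\x0b\x00\x01\x04f\xc6'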

Why the original numpy array get changed while changing another array created from it?

I have a NumPy array r. When I create another array r2 from it and then set r2 to zero, the original array r also changes.
I have searched through similar questions but did not find a satisfying answer, so please consider suggesting an appropriate one.
Original Array:
>>> r
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
I create another array r2 from the original as follows:
>>> r2 = r[:3, :3]
>>> r2
array([[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14]])
So, when I set the new array r2 to zero:
>>> r2[:] = 0
>>> r2
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0]])
So when I look at the original array afterwards, it has also been changed:
>>> r
array([[ 0, 0, 0, 3, 4, 5],
[ 0, 0, 0, 9, 10, 11],
[ 0, 0, 0, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
Happy New Year in advance, guys!
Explanation
r2 = r[:3, :3]
doesn't create a new array; it creates a view that shares the same underlying data, so writing through r2 writes into r. What you need is known as a 'deep copy'. Use numpy.copy() (or the .copy() method) to do that.
>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> y = x            # another name for the same array
>>> z = np.copy(x)   # independent copy
>>> x[0] = 10
>>> x[0] == y[0]
True
>>> x[0] == z[0]
False
Read more:
https://het.as.utexas.edu/HET/Software/Numpy/reference/generated/numpy.copy.html
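Applied to the arrays in the question, a sketch of the fix:
r2 = r[:3, :3].copy()  # independent data, not a view into r
r2[:] = 0              # r stays unchanged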

Appending rows to numpy array using less memory

I have the following problem. I need to change the shape of one Numpy array to match the shape of another Numpy array by adding rows and columns.
Let's say this is the array that needs to be changed:
import numpy as np

change_array = np.random.rand(150, 120)
And this is the reference array:
reference_array = np.random.rand(200, 170)
To match the shapes I'm adding rows and columns containing zeros, using the following function:
def match_arrays(change_array, reference_array):
    cols = np.zeros((change_array.shape[0],
                     reference_array.shape[1] - change_array.shape[1]), dtype=np.int8)
    change_array = np.append(change_array, cols, axis=1)
    rows = np.zeros((reference_array.shape[0] - change_array.shape[0],
                     reference_array.shape[1]), dtype=np.int8)
    change_array = np.append(change_array, rows, axis=0)
    return change_array
This works perfectly and changes the shape of change_array to that of reference_array. However, with this method the array is copied twice in memory: I understand that NumPy must make a copy of the array to have room to append the rows and columns.
As my arrays can get very large, I am looking for another, more memory-efficient method that achieves the same result. Thanks!
Here are a couple ways. In the code examples, I'll use the following arrays:
In [190]: a
Out[190]:
array([[12, 11, 15],
[16, 15, 10],
[16, 12, 13],
[11, 19, 10],
[12, 12, 11]])
In [191]: b
Out[191]:
array([[70, 82, 83, 93, 97, 55],
[50, 86, 53, 75, 75, 69],
[60, 50, 76, 52, 72, 88],
[72, 79, 66, 93, 58, 58],
[57, 92, 71, 97, 91, 50],
[60, 77, 67, 91, 91, 63],
[60, 90, 91, 50, 86, 71]])
Use numpy.pad:
In [192]: np.pad(a, [(0, b.shape[0] - a.shape[0]), (0, b.shape[1] - a.shape[1])], 'constant')
Out[192]:
array([[12, 11, 15, 0, 0, 0],
[16, 15, 10, 0, 0, 0],
[16, 12, 13, 0, 0, 0],
[11, 19, 10, 0, 0, 0],
[12, 12, 11, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
Or, use a more efficient version of your function, in which the result is preallocated as an array of zeros with the same shape as reference_array, and then the values in change_array are copied into the result:
In [193]: def match_arrays(change_array, reference_array):
...: result = np.zeros(reference_array.shape, dtype=change_array.dtype)
...: nrows, ncols = change_array.shape
...: result[:nrows, :ncols] = change_array
...: return result
...:
In [194]: match_arrays(a, b)
Out[194]:
array([[12, 11, 15, 0, 0, 0],
[16, 15, 10, 0, 0, 0],
[16, 12, 13, 0, 0, 0],
[11, 19, 10, 0, 0, 0],
[12, 12, 11, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0]])
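As a quick sanity check, both approaches agree on the a and b above (np.pad keeps a's dtype, matching the preallocated result):
np.array_equal(
    np.pad(a, [(0, b.shape[0] - a.shape[0]), (0, b.shape[1] - a.shape[1])], 'constant'),
    match_arrays(a, b)
)  # -> True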
