I have a large dask array containing approximately 300 million records and 3 numeric columns.
It looks roughly like this (first few records):
2345 947 23
12 234 924
9 8 0
349 276 345
etc...
I would like to add, say, 100 to all the values in column 2 so that I get the dask array below. Any ideas?
2345 1047 23
12 334 924
9 108 0
349 376 345
etc...
The easiest way might be to just switch it over to a DataFrame and do the assignment there before switching back to an array:
df = darr.to_dask_dataframe(columns=["a", "b", "c"])
df["b"] += 100
darr = df.to_dask_array()
darr.compute()
This also has the benefit of making it fairly obvious what is happening.
I also took a shot at this using a generalized ufunc. I couldn't get da.apply_gufunc to work for me in combination with np.add.at, and I'm still working to grok ufuncs myself, so there's likely a faster or more compact way to do it, but this appears to work:
import numpy as np
import dask.array as da
darr = da.array([
[2345, 947, 23],
[12, 234, 924],
[9, 8, 0],
[349, 276, 345]])
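def add_at(arr, at, val):
    # Work on a copy: the row passed in by the vectorized gufunc may be a read-only view
    arr = arr.copy()
    # Add val at index `at` in place (unbuffered, so repeated indices accumulate)
    np.add.at(arr, at, val)
    return arr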
gufunc_add_at = da.gufunc(add_at,
signature="(i),(),()->(i)",
output_dtypes=darr.dtype,
vectorize=True)
gufunc_add_at(darr, 1, 100).compute()
This is a bit clunky, but it seems to work:
import dask.array as da
darr = da.array([
[2345, 947, 23],
[12, 234, 924],
[9, 8, 0],
[349, 276, 345]])
print(darr.compute())
x=darr[:,0].reshape(4,1).compute()
y=(darr[:,1] + 100).reshape(4,1).compute()
z=darr[:,2].reshape(4,1).compute()
t= da.stack([x,y,z], axis=1).reshape(4,3)
t.compute()
Output:
[[2345 947 23]
[ 12 234 924]
[ 9 8 0]
[ 349 276 345]]
array([[2345, 1047, 23],
[ 12, 334, 924],
[ 9, 108, 0],
[ 349, 376, 345]])
This is possibly an improvement on my first answer:
import dask.array as da
from dask.array import from_array, add
from numpy import array
darr = da.array([
[2345, 947, 23],
[12, 234, 924],
[9, 8, 0],
[349, 276, 345]])
# Column vector of per-column offsets: shape (3, 1) broadcasts against darr.T, shape (3, 4)
vector = from_array(array([[0],[100],[0]]))
add(darr.T, vector).T.compute()
Output:
array([[2345, 1047, 23],
[ 12, 334, 924],
[ 9, 108, 0],
[ 349, 376, 345]])
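The double transpose is not strictly needed: broadcasting lines a 1-D vector up with the last axis, so an offset of length 3 is added to every row directly. A minimal sketch; `add_to_column` is a hypothetical helper name, and the demo uses a NumPy array so it runs without dask. Because it only relies on `+` with a NumPy operand, passing the dask array from the question instead should give a lazy dask result.

```python
import numpy as np

def add_to_column(arr, col, value):
    """Return arr with `value` added to column `col` (hypothetical helper)."""
    # One offset per column, zero everywhere except the target column.
    offset = np.zeros(arr.shape[-1], dtype=arr.dtype)
    offset[col] = value
    # Broadcasting pairs the length-3 offset with the last axis of every row.
    return arr + offset

arr = np.array([[2345, 947, 23],
                [12, 234, 924],
                [9, 8, 0],
                [349, 276, 345]])
result = add_to_column(arr, 1, 100)
print(result)
```

With the dask array from the question, add_to_column(darr, 1, 100).compute() should produce the same output as above. Note that the offset takes the array's dtype, so adding a fractional value to an integer array would be truncated.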
Related
I have 3 arrays below: a and b are interleaved to make a_and_b. a is multiplied by a_multiplier and b is multiplied by b_multiplier. How can I build a_and_b after the multipliers have been applied?
Code:
import numpy as np
a_multiplier = 3
b_multiplier = 5
a = np.array([5,32,1,4])
b = np.array([1,5,11,3])
a_and_b = np.array([5,1,32,5,1,11,4,3])
Expected Output:
[15, 5, 96, 25, 3, 55, 12, 15]
First, learn how to use multiplication:
In [187]: a = np.array([5,32,1,4])
In [188]: a*3
Out[188]: array([15, 96, 3, 12])
In [189]: b = np.array([1,5,11,3])
In [190]: b*5
Out[190]: array([ 5, 25, 55, 15])
One way to combine the 2 arrays:
In [191]: np.stack((a*3, b*5),axis=1)
Out[191]:
array([[15, 5],
[96, 25],
[ 3, 55],
[12, 15]])
which can be easily turned into the desired 1d array:
In [192]: np.stack((a*3, b*5),axis=1).ravel()
Out[192]: array([15, 5, 96, 25, 3, 55, 12, 15])
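Since a_and_b interleaves the two arrays, the a values sit at even positions and the b values at odd positions. As an alternative to stacking, here is a sketch that scales the existing a_and_b in place using strided slices; it assumes a_and_b is always strictly interleaved (a, b, a, b, ...).

```python
import numpy as np

a_multiplier = 3
b_multiplier = 5
a_and_b = np.array([5, 1, 32, 5, 1, 11, 4, 3])

# Even positions (0, 2, 4, ...) hold the values from a.
a_and_b[0::2] *= a_multiplier
# Odd positions (1, 3, 5, ...) hold the values from b.
a_and_b[1::2] *= b_multiplier

print(a_and_b)  # [15  5 96 25  3 55 12 15]
```

The slices are views, so no temporary arrays are stacked or raveled; the trade-off is that the original a_and_b is overwritten.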
I have the following array
import numpy as np
single_array =
[[ 1 80 80 80]
[ 2 80 80 89]
[ 3 52 50 90]
[ 4 39 34 54]
[ 5 37 47 32]
[ 6 42 42 27]
[ 7 42 52 27]
[ 8 38 33 28]
[ 9 42 37 42]]
and want to create another array with all unique sums of 2 rows within this single_array so that 1+2 and 2+1 are treated as duplicates and are only included once.
First, I would like to update the 0th column by multiplying each value by 10 (so I can identify which pair of rows each sum came from); then I want to add up every 2 rows and append the results to the new array.
Output should look like this:
double_array=
[[12 160 160 169]
[13 132 130 170]
[14 119 114 134]
...
[89 80 70 70]]
Can I use itertools.combinations to get a 3D array with two unique combinations and then add the rows on the corresponding 3rd axis?
This
import numpy as np
from itertools import combinations
single_array = np.array(
[[ 1, 80, 80, 80],
[ 2, 80, 80, 89],
[ 3, 52, 50, 90],
[ 4, 39, 34, 54],
[ 5, 37, 47, 32],
[ 6, 42, 42, 27],
[ 7, 42, 52, 27],
[ 8, 38, 33, 28],
[ 9, 42, 37, 42]]
)
np.vstack([single_array[i] * np.array([10, 1, 1, 1]) + single_array[j]
for i, j in combinations(range(single_array.shape[0]), 2)])
does what you ask for in terms of specified input and output; I'm not sure if it's what you actually need. I don't think it will scale to big inputs.
A 3D array to find this sum would be ragged (the first "layer" would be 8 deep, the next one 7, etc.); you could maybe get around this with NaNs or masking. It also wouldn't scale well for big inputs: you'd be allocating roughly twice as much memory as you need, and then you'd have to index out the ragged layers to get your final output.
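A vectorized alternative that sidesteps the ragged 3D array entirely: np.triu_indices(n, k=1) returns the row and column indices of the strict upper triangle, i.e. every pair i < j exactly once, in the same order itertools.combinations produces. This is a sketch that swaps in a different technique from the ones above and assumes the same [10, 1, 1, 1] scaling; it still allocates n(n-1)/2 output rows, which is the size of the result anyway.

```python
import numpy as np

single_array = np.array(
    [[1, 80, 80, 80],
     [2, 80, 80, 89],
     [3, 52, 50, 90],
     [4, 39, 34, 54],
     [5, 37, 47, 32],
     [6, 42, 42, 27],
     [7, 42, 52, 27],
     [8, 38, 33, 28],
     [9, 42, 37, 42]])

# Index pairs (i, j) with i < j, row-major: (0,1), (0,2), ..., (7,8).
i, j = np.triu_indices(single_array.shape[0], k=1)

# Scale column 0 of the first row of each pair by 10 so it encodes the pair, then add.
scale = np.array([10, 1, 1, 1])
double_array = single_array[i] * scale + single_array[j]

print(double_array[:3])
print(double_array[-1])  # [89 80 70 70]
```

Unlike the Numba version, the result keeps the integer dtype of the input.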
If you have to do this fast for big arrays, I suggest a pre-allocated output array and a for-loop with Numba:
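import numpy as np
from numba import jit

@jit(nopython=True)
def unique_row_sums(a):
    n = a.shape[0]
    # One output row per unordered pair (i, j) with i < j
    b = np.empty((n*(n-1)//2, a.shape[1]))
    # Scale column 0 of the first row by 10 so it encodes the pair
    s = np.array([10, 1, 1, 1])
    k = 0
    for i in range(n):
        for j in range(i+1, n):
            b[k] = s * a[i] + a[j]
            k += 1
    return b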
In my not-too-careful testing with IPython's %timeit, this took about 4µs versus 152µs for the itertools-based version with your data, and should scale better.
I'm trying to fetch the rows that contain even numbers from the array below:
mat1 = np.array([[23,45,63],[22,78,43],[12,77,47],[53,47,33]]).reshape(4,3)
mat1
array([[23, 45, 63],
[22, 78, 43],
[12, 77, 47],
[53, 47, 33]])
But the code below returns only the values:
mat1[mat1%2==0]
array([22, 78, 12])
Is there any way to fetch the entire row/column having the even numbers?
You can do that like this:
import numpy as np
mat1 = np.array([[23,45,63],[22,78,43],[12,77,47],[53,47,33]])
is_even = (mat1 % 2 == 0)
# Rows
print(mat1[is_even.any(1)])
# [[22 78 43]
# [12 77 47]]
# Columns
print(mat1[:, is_even.any(0)])
# [[23 45]
# [22 78]
# [12 77]
# [53 47]]
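If you need the positions rather than the values, the same boolean reduction can be turned into indices with np.flatnonzero. A small sketch; axis=1 and axis=0 are just the keyword spellings of the any(1) and any(0) calls above, and .all() is shown as the stricter variant.

```python
import numpy as np

mat1 = np.array([[23, 45, 63], [22, 78, 43], [12, 77, 47], [53, 47, 33]])
is_even = mat1 % 2 == 0

# Indices of rows / columns containing at least one even number.
even_rows = np.flatnonzero(is_even.any(axis=1))
even_cols = np.flatnonzero(is_even.any(axis=0))
print(even_rows)  # [1 2]
print(even_cols)  # [0 1]

# Use .all() instead of .any() to keep only rows where every entry is even
# (none in this example, so this selects an empty (0, 3) array).
all_even_rows = mat1[is_even.all(axis=1)]
print(all_even_rows)
```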
Imagine we have the following array of 3 arrays, covering the range 1 to 150:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ... 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
[51, 52, 53, 54, 55, 56, 57, 58, 59, 60 ... 92, 93, 94, 95, 96, 97, 98, 99, 100, 107]
[71, 73, 84, 101, 102, 103, 104, 105, 106, 108 ... 141, 142, 143, 144, 145, 146, 147, 148, 149, 150]
I want to build an array that stores in which array we find the values 1 to 150. The result must be then:
[1 1 1 ... 1 2 2 2 ... 2 3 2 3 2 ... 3 3 3 ... 3],
where the elements correspond to the values 1, 2, 3, ..., 150. The resulting array then gives the array membership of each value from 1 to 150. The code must work for any number of arrays (not only 3).
You can use an array comprehension. Here is an example with three vectors that together cover the range 1:10:
A = [1, 3, 4, 5, 7]
B = [2, 8, 9]
C = [6, 10]
Now we can write a comprehension using the in operator:
julia> [x in A ? 1 : x in B ? 2 : 3 for x in 1:10]
10-element Array{Int64,1}:
1
⋮
3
Perhaps also include a fallback error, in case the input is wrong:
julia> [x in A ? 1 : x in B ? 2 : x in C ? 3 : error("not found") for x in 1:10]
10-element Array{Int64,1}:
1
⋮
3
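The chained ternary needs one branch per array, while the question asks for any number of arrays. Here is a sketch of the same search-based idea in Python, looping over a list of arrays and raising an error when a value is not found; which_array is a hypothetical helper name. Each in test is a linear scan, so the cost grows with the number of values times the total length of the arrays.

```python
def which_array(x, arrays):
    """Return the 1-based index of the first array containing x (hypothetical helper)."""
    for k, arr in enumerate(arrays, start=1):
        if x in arr:
            return k
    # Fallback error, like error("not found") in the Julia version.
    raise ValueError(f"{x} not found in any array")

A = [1, 3, 4, 5, 7]
B = [2, 8, 9]
C = [6, 10]
membership = [which_array(x, [A, B, C]) for x in range(1, 11)]
print(membership)  # [1, 2, 1, 1, 1, 3, 1, 2, 2, 3]
```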
Trade memory for search in this case:
Make an array to record which array each value is in.
# example arrays
N = 100; A = rand(1:N, 30);
B = rand(1:N, 40);
C = rand(1:N, 35);
# record which array contains each value:
# A => 1, B => 2, C => 3, not found => 0
arrayin = zeros(Int32, max(maximum(A), maximum(B), maximum(C)));
arrayin[A] .= 1;
arrayin[B] .= 2;   # a value present in several arrays keeps the last assignment
arrayin[C] .= 3;
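The same lookup-table idea in Python/NumPy, generalized to any number of arrays: fancy-index assignment writes each array's 1-based number at the positions of its values, and 0 marks values found in no array. A sketch assuming positive integer values no larger than n; as in the Julia code, if a value occurs in several arrays the last one wins. membership_table is a hypothetical name.

```python
import numpy as np

def membership_table(arrays, n):
    """table[v - 1] = 1-based index of the array holding v, or 0 if none (hypothetical helper)."""
    table = np.zeros(n, dtype=np.int32)
    for k, arr in enumerate(arrays, start=1):
        # Values are 1-based, NumPy indices are 0-based.
        table[np.asarray(arr) - 1] = k
    return table

A = [1, 3, 4, 5, 7]
B = [2, 8, 9]
C = [6, 10]
print(membership_table([A, B, C], 10))  # [1 2 1 1 1 3 1 2 2 3]
```

For the 1 to 150 question, membership_table(list_of_arrays, 150) builds the whole result in one pass per array, at the cost of one table entry per possible value.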
I have an array of arrays that represent matrices, and I need to transpose each matrix, ideally without a loop. When I use array.T, it reverses all the axes instead of transposing each matrix individually. Is it possible to just transpose each matrix?
INPUT: np.arange(27).reshape(3, 3, 3).T
OUTPUT:
[[[ 0 9 18]
[ 3 12 21]
[ 6 15 24]]
[[ 1 10 19]
[ 4 13 22]
[ 7 16 25]]
[[ 2 11 20]
[ 5 14 23]
[ 8 17 26]]]
What I want is for the arrays to look like this:
[[[ 0 3 6]
[ 1 4 7]
[ 2 5 8]]
[[ 9 12 15]
[ 10 13 16]
[ 11 14 17]]
[[ 18 21 24]
[ 19 22 25]
[ 20 23 26]]]
In [11]: A = np.arange(27).reshape(3, 3, 3)
In [12]: A
Out[12]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
transpose the last 2 dimensions:
In [13]: A.transpose(0,2,1)
Out[13]:
array([[[ 0, 3, 6],
[ 1, 4, 7],
[ 2, 5, 8]],
[[ 9, 12, 15],
[10, 13, 16],
[11, 14, 17]],
[[18, 21, 24],
[19, 22, 25],
[20, 23, 26]]])
A.swapaxes(1,2) also does it.
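To transpose the last two axes regardless of how many leading "batch" axes there are, negative axis numbers avoid hard-coding the dimensionality. A short sketch; np.einsum is shown as an equivalent, more explicit spelling of the same index swap.

```python
import numpy as np

A = np.arange(27).reshape(3, 3, 3)

# Swap the last two axes; works for any shape (..., m, n).
B = np.swapaxes(A, -1, -2)

# Equivalent explicit relabelling of indices: C[i, k, j] = A[i, j, k].
C = np.einsum("ijk->ikj", A)

# The same call handles a 4-D stack without changes.
D = np.arange(24).reshape(2, 3, 2, 2)
D_t = np.swapaxes(D, -1, -2)

print(B[1])
```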