Julia: iteratively make a 3D array from 2D arrays

I am trying to make a 3D array from many 2D arrays.
Each image becomes a 2D array.
Image files: https://drive.google.com/drive/folders/1xBucvqhKFjAfbRIhq5wjr40kSjNor_0t?usp=sharing
using Images, Colors
paths = readdir("/Users/me/Downloads/ct_scans", join = true)

images_3d = []
for p = paths
    img = load(p)
    gray = Gray.(img)
    arr = convert(Array{Float64}, gray) # <----- 2D array
    append!(images_3d, arr)
end
>>> size(images_3d)
(1536000,) # <--- 1D view?
>>> 1536000 == 80*160*120
true
>>> reshaped_3d = reshape(images_3d, (80,160,120))
>>> Gray.(reshaped_3d[1,:,:])
# 160x120 scrambled mess of pixels not rearranged as expected
append! produces a flat 1D array, which does not reshape back into images as expected.
push!, on the other hand, creates an array of arrays that each keep their shape. That is not technically 3D, just an 80-element vector of matrices.
When I tried to initialize an empty 3D array and then overwrite each 2D slice with one of my images, I got Matrix{Float64}-to-Float64 type conversion failures.
I also could not iteratively vcat the 2D arrays, because I could not overwrite the accumulator variable inside the loop.
Part of the reason for posting this is to see how Julia programmers approach multi-dimensional arrays.

There are multiple ways to do this; you'll have to try them and test which one is best in your case.
with append! and reshape
Julia arrays are column-major, so the first index varies fastest in memory; the image index should therefore be the last dimension. If 80 is the number of images, the reshape should be
reshape(images_3d, (160,120,80))
(maybe exchange 120 and 160, not sure about this one).
And then to get the first image, it's reshaped_3d[:,:,1]
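As a concrete sketch (assuming images_3d was built with append! exactly as in the question, and that each image is 160x120; swap the first two sizes if it is actually 120x160):
reshaped_3d = reshape(images_3d, (160, 120, 80))  # image index goes last
first_image = Gray.(reshaped_3d[:, :, 1])         # first slice, no longer scrambled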
with push!
push!ing the matrices and then creating the 3D array with cat would work too:
julia> A = [rand(3,4) for i in 1:2];
julia> cat(A..., dims=3)
3×4×2 Array{Float64, 3}:
[:, :, 1] =
0.372747 0.17654 0.398272 0.231992
0.514789 0.342374 0.399816 0.277959
0.908909 0.864676 0.9788 0.585375
[:, :, 2] =
0.358169 0.816448 0.0558052 0.404178
0.747453 0.80815 0.384903 0.447053
0.314895 0.46264 0.947465 0.170982
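One caveat: splatting with cat(A..., dims=3) can become slow when A holds many matrices. If you are on Julia 1.9 or newer, Base's stack does the same stacking without splatting (a small sketch, reusing the A from above):
julia> B = stack(A);   # stacks along a new last dimension

julia> size(B)
(3, 4, 2)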
initialize the 3D Array (probably the best one)
and fill it up progressively
julia> A = Array{Float64}(undef,3,4,2);
julia> for i in 1:2
           A[:,:,i] = rand(3,4)
       end
julia> A
3×4×2 Array{Float64, 3}:
[:, :, 1] =
0.478106 0.829818 0.526572 0.644238
0.714812 0.781246 0.93239 0.759864
0.523958 0.955136 0.70079 0.193489
[:, :, 2] =
0.481405 0.561407 0.184557 0.449584
0.547769 0.170311 0.371797 0.538843
0.0285712 0.731686 0.00126473 0.452273
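Applied to the images in the question, a sketch of that last approach might look like this (the 160x120 slice size and the Float64.(Gray.(...)) conversion are assumptions based on the question's code; adjust to the real image size):

paths = readdir("/Users/me/Downloads/ct_scans", join = true)
volume = Array{Float64}(undef, 160, 120, length(paths))  # rows x cols x number of images
for (i, p) in enumerate(paths)
    volume[:, :, i] = Float64.(Gray.(load(p)))            # each slice keeps its 2D shape
end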

Just to add to the accepted answer: the order in which you slice matters for speed. Julia stores arrays in column-major order, so a slice like A[:, :, i] is one contiguous block of memory, while A[i, :, :] is strided. Consider the following two functions; test2() is faster to run because its loop is over the last index.
aa_stack1 = zeros(3, 10000, 10000);
aa_stack3 = zeros(10000, 10000, 3);
function test1()
    for ii = 1:3
        aa_stack1[ii, :, :] = rand(10000, 10000)
    end
end
function test2()
    for ii = 1:3
        aa_stack3[:, :, ii] = rand(10000, 10000)
    end
end
@time test1()
@time test2()
Writing contiguous slices maximizes memory locality and reduces cache misses: each assignment in test2() touches one unbroken run of memory, whereas each assignment in test1() jumps through the array with a stride of 3.
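Worth noting: both test functions use non-const globals, which by itself slows Julia code down and muddies the timing. A fairer micro-benchmark (a sketch, with smaller sizes so it runs quickly) passes the array in as an argument:

function fill_slices!(A)
    for i in axes(A, 3)
        A[:, :, i] = rand(size(A, 1), size(A, 2))  # contiguous writes
    end
    return A
end

@time fill_slices!(zeros(1000, 1000, 3));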

Related

Filling an array with arrays does not work as I expected

I want to make an array whose entries are themselves arrays, and push arrays into those entries one by one.
For example, I made a 2 x 3 Matrix named arr and tried to fill the [1,1] and [1,2] entries with 4 x 4 matrices produced by randn(4,4).
arr = fill(Matrix{Float64}[], 2, 3)
push!(arr[1,1],randn(4,4))
push!(arr[1,2],randn(4,4))
println(arr[1,1])
println(arr[1,2])
println(arr[1,3])
However, all the entries of arr, not just [1,1] and [1,2], ended up filled with the same two randn(4,4) matrices:
[[-0.15122805007483328 0.6132236453930502 -0.9090110366765862 1.2589924202099898; -1.120611384326006 -0.9083935218058066 0.7252290006516056 1.0970416725786256; -0.19173238706933265 1.3610525411901113 -0.05258697093572793 0.7776085390912448; 0.18491459001855373 -2.0537142669734934 0.3482557186126859 0.0047622478008474845], [0.23422967703060255 -0.51986351753462 0.45947166573674303 0.31316899298864387; 0.3704450103622709 -0.8186574197233013 -0.9990329964554037 -0.8345957519924763; 0.56641529964098 -0.8393435538481216 -0.6379336546939682 1.1843452368116358; 0.9435767553275002 0.0033471181565433127 -1.191611491619908 1.3970554854927264]]
[[-0.15122805007483328 0.6132236453930502 -0.9090110366765862 1.2589924202099898; -1.120611384326006 -0.9083935218058066 0.7252290006516056 1.0970416725786256; -0.19173238706933265 1.3610525411901113 -0.05258697093572793 0.7776085390912448; 0.18491459001855373 -2.0537142669734934 0.3482557186126859 0.0047622478008474845], [0.23422967703060255 -0.51986351753462 0.45947166573674303 0.31316899298864387; 0.3704450103622709 -0.8186574197233013 -0.9990329964554037 -0.8345957519924763; 0.56641529964098 -0.8393435538481216 -0.6379336546939682 1.1843452368116358; 0.9435767553275002 0.0033471181565433127 -1.191611491619908 1.3970554854927264]]
[[-0.15122805007483328 0.6132236453930502 -0.9090110366765862 1.2589924202099898; -1.120611384326006 -0.9083935218058066 0.7252290006516056 1.0970416725786256; -0.19173238706933265 1.3610525411901113 -0.05258697093572793 0.7776085390912448; 0.18491459001855373 -2.0537142669734934 0.3482557186126859 0.0047622478008474845], [0.23422967703060255 -0.51986351753462 0.45947166573674303 0.31316899298864387; 0.3704450103622709 -0.8186574197233013 -0.9990329964554037 -0.8345957519924763; 0.56641529964098 -0.8393435538481216 -0.6379336546939682 1.1843452368116358; 0.9435767553275002 0.0033471181565433127 -1.191611491619908 1.3970554854927264]]
What is wrong?
Any information would be appreciated.
When you do arr = fill(Matrix{Float64}[], 2, 3), all 6 elements point to exactly the same object in memory, because fill does not make copies; every slot holds a reference to the one array you passed in. Using fill with a mutable first argument is usually not a good idea for this reason.
Hence what you actually want is:
arr = [Matrix{Float64}[] for i in 1:2, j in 1:3]
Now each of the 6 slots holds its own, independent array.
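A quick sanity check (sketch) that the entries really are independent now:

arr = [Matrix{Float64}[] for i in 1:2, j in 1:3]
push!(arr[1,1], randn(4,4))
push!(arr[1,2], randn(4,4))
length(arr[1,1]), length(arr[1,2]), length(arr[1,3])  # (1, 1, 0): arr[1,3] stays empty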
This way of creating the array implies that each element will be Float64, i.e. a scalar. You need to fix the type signature. So for instance you could do
D = Matrix{Array{Float64, 2}}(undef, 2, 3)
if you want it to have 2-dimensional arrays as elements (the Array{Float64, 2} does that)
and then allocate
D[1,1] = rand(4,4)
D[1,2] = rand(4,4)
to give you (or rather, me!):
julia> D[1,1]
4×4 Matrix{Float64}:
0.210019 0.528594 0.0566622 0.0547953
0.729212 0.40829 0.816365 0.804139
0.39524 0.940286 0.976152 0.128008
0.886597 0.379621 0.153302 0.798803
julia> D[1,2]
4×4 Matrix{Float64}:
0.640809 0.821668 0.627057 0.382058
0.532567 0.262311 0.916391 0.200024
0.0599815 0.17594 0.698521 0.517822
0.965279 0.804067 0.39408 0.105774

Inconsistent Results - Jupyter Numpy & Transpose

I am getting odd behavior with Jupyter/NumPy/transpose() and 1D arrays.
I found another post where transpose() will not transpose a 1D array, but in previous Jupyter notebooks, it does.
I have an example where it is inconsistent, and I do not understand:
Please see the attached picture of my Jupyter notebook showing 2 more or less identical arrays with 2 different outputs.
It seems it both IS and IS NOT transposing the 1D array, which is inconsistent.
The outputs are (1000,) and (1, 1000); why does this occur?
# GENERATE WAVEFORM:
#---------------------------------------------------------------------------------------------------
N = 1000
fxc = []
fxn = []
for t in range(0, N):
    fxc.append(A1*m.sin(2.0*pi*50.0*dt*t) + A2*m.sin(2.0*pi*120.0*dt*t))
    fxn.append(A1*m.sin(2.0*pi*50.0*dt*t) + A2*m.sin(2.0*pi*120.0*dt*t) + 5*np.random.normal(u,std,size=1))
#---------------------------------------------------------------------------------------------------
# TAKE TRANSPOSE:
#---------------------------------
fc = np.transpose(np.array(fxc))
fn = np.transpose(np.array(fxn))
#---------------------------------
# PRINT DIMENSION:
#---------------------------------
print(fc.shape)
print(fn.shape)
#---------------------------------
Remove size=1 from your call to numpy.random.normal. Then it will return a scalar instead of a 1-d array of length 1.
For example,
In [2]: np.random.normal(0, 3, size=1)
Out[2]: array([0.47058288])
In [3]: np.random.normal(0, 3)
Out[3]: 4.350733438283539
Using size=1 in your code is a problem, because it results in fxn being a list of 1-d arrays (e.g. something like [[0.123], [-.4123], [0.9455], ...]). When NumPy converts that to an array, it has shape (N, 1). Transposing such an array results in the shape (1, N).
fxc, on the other hand, is a list of scalars (e.g. something like [0.123, 0.456, ...]). When converted to a NumPy array, it will be a 1-d array with shape (N,). NumPy's transpose operation swaps dimensions, but it does not create new dimensions, so transposing a 1-d array does nothing.

MATLAB: How to subset a multidimensional matrix using 1-D vector indices without for loops?

I am currently looking for an efficient way to slice multidimensional matrices in MATLAB. As an example, say I have a multidimensional matrix such as
A = rand(10,10,10)
I would like obtain a subset of this matrix (let's call it B) at certain indices along each dimension. To do this, I have access to the index vectors along each dimension:
ind_1 = [1,4,5]
ind_2 = [1,2]
ind_3 = [1,2]
Right now, I am doing this rather inefficiently as follows:
N1 = length(ind_1)
N2 = length(ind_2)
N3 = length(ind_3)
B = NaN(N1,N2,N3)
for i = 1:N1
    for j = 1:N2
        for k = 1:N3
            B(i,j,k) = A(ind_1(i),ind_2(j),ind_3(k))
        end
    end
end
I suspect there is a smarter way to do this. Ideally, I'm looking for a solution that does not use for loops and could be used for an arbitrary N dimensional matrix.
Actually it's very simple:
B = A(ind_1, ind_2, ind_3);
As you can see, MATLAB indices can be vectors, and then the result is the Cartesian product of those vector indices. More information about MATLAB indexing can be found in the MATLAB documentation.
If the number of dimensions is unknown at programming time, you can define the indices in a cell array and then expand them into a comma-separated list:
ind = {[1 4 5], [1 2], [1 2]};
B = A(ind{:});
You can reference data in matrices by simply specifying the indices, like in the following example:
B = A(start:stop, :, 2);
In the example:
start:stop gets a range of data between two points
: gets all entries
2 gets only one entry
In your case, since all your indices are 1D, you could simply use:
C = A(x_index, y_index, z_index);

Julia approach to the Python equivalent of a list of lists

I just started tinkering with Julia and I'm really getting to like it. However, I am running into a road block. For example, in Python (although not very efficient or pythonic), I would create an empty list and append a list of a known size and type, and then convert to a NumPy array:
Python Snippet
a = []
for ....
    a.append([1.,2.,3.,4.])
b = numpy.array(a)
I want to be able to do something similar in Julia, but I can't seem to figure it out. This is what I have so far:
Julia snippet
a = Array{Float64}[]
for .....
    push!(a,[1.,2.,3.,4.])
end
The result is an n-element Array{Array{Float64,N},1} of size (n,), but I would like it to be an nx4 Array{Float64,2}.
Any suggestions or better way of doing this?
The literal translation of your code would be
# Building up as rows
a = [1. 2. 3. 4.]
for i in 1:3
    a = vcat(a, [1. 2. 3. 4.])
end
# Building up as columns
b = [1.,2.,3.,4.]
for i in 1:3
    b = hcat(b, [1.,2.,3.,4.])
end
But this isn't a natural pattern in Julia; you'd do something like
A = zeros(4,4)
for i in 1:4, j in 1:4
    A[i,j] = j
end
or even
A = Float64[j for i in 1:4, j in 1:4]
Basically allocating all the memory at once.
Does this do what you want?
julia> a = Array{Float64}[]
0-element Array{Array{Float64,N},1}
julia> for i=1:3
           push!(a,[1.,2.,3.,4.])
       end
julia> a
3-element Array{Array{Float64,N},1}:
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
[1.0,2.0,3.0,4.0]
julia> b = hcat(a...)'
3x4 Array{Float64,2}:
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
1.0 2.0 3.0 4.0
It seems to match the python output:
In [9]: a = []
In [10]: for i in range(3):
   ....:     a.append([1, 2, 3, 4])
   ....:
In [11]: b = numpy.array(a); b
Out[11]:
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
I should add that this is probably not what you actually want to be doing as the hcat(a...)' can be expensive if a has many elements. Is there a reason not to use a 2d array from the beginning? Perhaps more context to the question (i.e. the code you are actually trying to write) would help.
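If a does end up with many elements, one way to avoid the splatting cost (a sketch using only Base functions) is to let reduce build the matrix and then flip it:

b = permutedims(reduce(hcat, a))   # n x 4 Matrix{Float64}, no splatting of a long list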
The other answers don't work if the number of loop iterations isn't known in advance, or assume that the underlying arrays being merged are one-dimensional. It seems Julia lacks a built-in function for "take this list of N-D arrays and return me a new (N+1)-D array".
Julia requires a different concatenation call depending on the dimension of the underlying data. So, for example, if the elements of a are vectors, one can use hcat(a...) or cat(a..., dims=2). But if the elements of a are e.g. 2D arrays, one must use cat(a..., dims=3), etc. The dims argument to cat is not optional, and there is no default value to indicate "the last dimension".
Here is a helper function that mimics the np.array functionality for this use case. (I called it collapse instead of array, because it doesn't behave quite the same way as np.array)
function collapse(x)
    return cat(x..., dims = length(size(x[1])) + 1)
end
One would use this as
a = []
for ...
    ... compute new_a...
    push!(a, new_a)
end
a = collapse(a)
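For example, with 2D elements (sizes picked arbitrarily for this sketch):

A = [rand(3,4) for i in 1:2]
collapse(A)   # 3x4x2 Array{Float64, 3}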

Despite many examples online, I cannot get my MATLAB repmat equivalent working in python

I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
    newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
    sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
    for j in range(0, sm.shape[1]):
        for i in range(0, sm.shape[0]):
            dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
    return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a numpy array with 3 dimensions: [:,:,:]. It also expects a tuple with 2 elements as its mapSize argument, of course making sure the function passing the arguments takes into account that in numpy arrays the rows and columns are swapped (not: x, y, but: y, x).
newSize then contains a tuple of fractions that are used to resize the input image to a specific scale, and sm becomes a resized version of the input image. This all works fine.
This is my goal:
The following line:
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))]),
should function equivalent to the MATLAB expression:
repmat(sm(j,i,:),[size(sm,1) size(sm,2)]),
This is my problem:
Testing this, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and (64, 48) (a tuple) is passed as the mapSize argument.
However when testing this, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]),
len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with
shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match and numpy has a problem with that. But my question is what? And how do I get this working?
These 2 calculations match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
[7, 7]])
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in xrange(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent of this loop?
for i in ...:
    for j in ...:
        data = ...
You'll only get results from the last iteration. Did you want data += ...? If so, this might work (for a (N,M,K) shaped sm)
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)])
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in xrange(N) for j in xrange(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. They are the lengths of the first and second row of sm, and should both return the same value. You probably want to replace them with sm.shape[0] and sm.shape[1], which are equivalent to your Matlab code, although I am not sure that it will work as you expect it to.
