Cannot Flatten NumPy ndarray / How to read binary file more intelligently

I am writing a numpy-based .PLY importer. I am only interested in binary files, and in vertices, faces and vertex colors. My target data format is a flattened list of x, y, z floats for the vertex data and r, g, b, a integers for the color data:
[x0,y0,z0,x1,y1,z1....xn,yn,zn]
[r0,g0,b0,a0,r1,g1,b1,a1....rn,gn,bn,an]
This allows me to use fast built-in C++ methods to construct the mesh in the target program (Blender).
I am using a modified version of this example code to read the data into numpy arrays:
import numpy as np

valid_formats = {'binary_big_endian': '>', 'binary_little_endian': '<'}
ply = open(filename, 'rb')
# get binary_little/big or ascii from the format line
fmt = ply.readline().split()[1].decode()
# get the endianness prefix for building the numpy dtypes
ext = valid_formats[fmt]
ply.seek(end_header)  # end_header = byte offset found while parsing the header
v_dtype = [('x', ext + 'f4'), ('y', ext + 'f4'), ('z', ext + 'f4'),
           ('red', 'u1'), ('green', 'u1'), ('blue', 'u1'), ('alpha', 'u1')]
# points_size = vertex count previously read in from the header
points_np = np.fromfile(ply, dtype=v_dtype, count=points_size)
The results are:
print(points_np.shape)
print(points_np[0:3])
print(points_np.ravel()[0:3])
>>>(158561,)
>>>[ (20.781816482543945, 11.767952919006348, 15.565438270568848, 206, 216, 186, 255)
(20.679922103881836, 11.754084587097168, 15.560364723205566, 189, 196, 157, 255)
(20.72969627380371, 11.823691368103027, 15.51106071472168, 192, 193, 157, 255)]
>>>[ (20.781816482543945, 11.767952919006348, 15.565438270568848, 206, 216, 186, 255)
(20.679922103881836, 11.754084587097168, 15.560364723205566, 189, 196, 157, 255)
(20.72969627380371, 11.823691368103027, 15.51106071472168, 192, 193, 157, 255)]
So the ravel (I've also tried flatten, reshape, etc.) does not work, and I presume that is because the dtype is structured: (float, float, float, int, int, int, int).
What I have tried
- I've tried vectorizing a function that just pulls out the xyz and rgb separately into a new array.
- I've tried stack, vstack, etc.
- List comprehension (yuck).
- Things like these take 1 to 10 seconds to execute, compared to hundredths of a second to read in the data.
- I have tried using astype on the verts data, but that seems to return only the first element.
Related questions I've looked at:
- convert to structured array
- accessing first element of each element
- Most efficient way to map function over numpy array
What I want to try / would like to know
Is there a better way to read the data in the first place so I don't lose all this time reshaping, flattening, etc.? Perhaps by telling np.fromfile to skip over the color data on one pass and then come back and read it again?
Is there a numpy trick I don't know for reshaping/flattening data of this kind?
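One approach that seems to fit here (a sketch, not from the original thread; it reuses ext, ply, end_header and points_size from the code above): declare the fields as sub-arrays, so the coordinates and colors come back as ordinary (n, 3) and (n, 4) arrays that ravel with a single fast copy.

import numpy as np

# group the seven scalar fields into two sub-array fields
v_dtype = np.dtype([('xyz', ext + 'f4', (3,)),
                    ('rgba', 'u1', (4,))])
ply.seek(end_header)
points = np.fromfile(ply, dtype=v_dtype, count=points_size)

verts = points['xyz'].ravel()    # -> [x0, y0, z0, x1, y1, z1, ...]
colors = points['rgba'].ravel()  # -> [r0, g0, b0, a0, r1, g1, b1, a1, ...]

points['xyz'] is a strided view into the record array, so the only copy is the one ravel makes; on newer numpy you can get the same effect from the original seven-field dtype with numpy.lib.recfunctions.structured_to_unstructured.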

Related

ValueError: could not broadcast input array from shape (180,180,3) into shape (1,3,180,180)

I'm trying to get an image shape of (1,3,180,180) from the original shape, which is (1224, 1842, 3). I've tried specifying the shape like this:
im_cv = cv.imread('test.jpg')
im_cv = cv.resize(im_cv, (1,3,180,180))
But I get the error:
File "/Users/lucasjacaruso/Desktop/hawknet-openvino/experiment.py", line 47, in <module>
im_cv = cv.resize(im_cv, (1,3,180,180))
cv2.error: OpenCV(4.5.3-openvino) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
> - Can't parse 'dsize'. Expected sequence length 2, got 4
> - Can't parse 'dsize'. Expected sequence length 2, got 4
However, the model will not accept anything other than (1,3,180,180). If I simply specify the shape as (180,180), it's not accepted by the model:
ValueError: could not broadcast input array from shape (180,180,3) into shape (1,3,180,180)
How can I get the shape to be (1,3,180,180)?
Many thanks.
resize changes the spatial image dimensions, e.g. from (180,180,3) to say (300,300,3); its dsize argument expects only a (width, height) pair, which is why the four-element tuple fails.
You need to add a new dimension with np.newaxis. Further, as imread returns the depth (color) dimension as axis 2, you need to move that axis:
im_cv = np.moveaxis(im_cv, 2, 0)[np.newaxis]
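Putting the pieces together, a minimal sketch (assuming the model really wants 180x180 spatial resolution; the channel order and any normalization the model expects are left out):

import cv2 as cv
import numpy as np

im_cv = cv.imread('test.jpg')         # (1224, 1842, 3), channels last
im_cv = cv.resize(im_cv, (180, 180))  # dsize is (width, height) -> (180, 180, 3)
im_cv = np.moveaxis(im_cv, 2, 0)      # channels first -> (3, 180, 180)
im_cv = im_cv[np.newaxis]             # add a batch axis -> (1, 3, 180, 180)
print(im_cv.shape)                    # (1, 3, 180, 180)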

IndexError: tuple index out of range using Partial Correlation function

I'm using the partial correlation function developed by Fabian Pedregosa-Izquierdo (a Python clone of MATLAB's partialcorr).
However, when I try to apply it to my data I keep getting the following error:
Traceback (most recent call last):
File "atd.py", line 280, in <module>
partialcorr = partial_corr(values_outliers)
File "/Users/dingo/Desktop/ATD/MiniProjATD/partial_corr.py", line 50, in partial_corr
p = C.shape[1]
IndexError: tuple index out of range
My values_outliers is an np.array as follows: https://pastebin.com/AHhwmpTg
The implementation of the partial correlation code can be found here: https://gist.github.com/fabianp/9396204419c7b638d38f
Thank you very much!
The function you posted expects to receive an n x m matrix as an argument. You are passing it an array of length n. To get your data into the right shape, you can do something like:
my_data = [1.234, 5.6789, -32.101]
C = np.array(my_data).reshape((-1,1))
partial_corr(C)
The (-1, 1) argument to reshape says to put all the data in the first column of an n x 1 array (the -1 tells numpy to infer that dimension from the data length).
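Note that partial correlation only becomes meaningful with several variables, so in practice you would usually stack multiple series as columns; a small illustration (the series here are made up):

import numpy as np

# three hypothetical measured series of equal length
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.1, 5.9, 8.2])
c = np.array([0.5, 0.4, 0.7, 0.6])

C = np.column_stack((a, b, c))  # shape (4, 3): one variable per column
print(C.shape[1])               # 3 -- what partial_corr reads as p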

Match the values of 2 arrays

I am trying to create a program that will rate a bunch of arrays to find the one that most closely matches a given array. So say the given array is
[1, 80, 120, 155, 281, 301]
And one of the array to compare against is
[-6, 78, 108, 121, 157, 182, 218, 256, 310, 408, 410]
How can I match up the values in the first array to their closest values in the second array in a way that gives the lowest total difference?
In case this is unclear
1 => -6, 80 => 78, 120 => 121, 155 => 157
The tricky part is 281 and 301: matching each value to its nearest neighbour in isolation can force the other into a much worse match. The best overall match would be
281 => 256 and 301 => 310
Then the program would simply calculate a rating by doing
abs(-6 - 1) + abs(78 - 80) etc. for all matches, and the array with the lowest rating is the best match.
*******NOTE*******
The given array will be the same size or smaller than the matching array and will only have positive values. The matching array can have negative values.
I was thinking of using cosine similarity but I am unsure how to implement that for this problem.
In general, a computed distance works better here. There are different approaches, each with advantages and disadvantages. In your example you compute the sum of one-dimensional Euclidean distances, but there are more general comparisons, such as dynamic time warping (DTW): an algorithm which finds the best alignment between two 'arrays' and computes the optimal distance.
You can install and use this package. Here you can see a visual example. One other advantage of DTW is that the lengths of the arrays don't have to match.
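If you would rather not pull in a package, the core of DTW is a small dynamic program; a minimal sketch (using the arrays from the question):

import numpy as np

def dtw_distance(a, b):
    # classic DTW: D[i, j] = local cost + min of the three predecessor cells
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

given = np.array([1, 80, 120, 155, 281, 301], dtype=float)
candidate = np.array([-6, 78, 108, 121, 157, 182, 218, 256, 310, 408, 410], dtype=float)
print(dtw_distance(given, candidate))  # lower = better match

The lowest-scoring candidate array is then your best match. One caveat: DTW lets a value align with several neighbours, which is slightly looser than the strict one-to-one matching described in the question.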

Finding [index of] the minimal value in array which satisfies a condition in Fortran

I am looking for the minimal value in an array which is larger than a certain number. I found this discussion, which I don't understand. There is MINLOC, but it looks like it does not do as much as I would like on its own, though I didn't parse the arguments passed to it in the given examples. (It is also possible to do this using a loop, but that could be clumsy.)
You probably want MINVAL.
If your array is say,
array = (/ 21, 52, 831, 46, 125, 68, 7, 8, 549, 10 /)
And you want to find the minimum value greater than say 65,
variable = minval(array, mask=(array > 65))
which would obviously give 68.
It sounds like MINVAL is what you want.
You just need to do something like:
min_above_cutoff = MINVAL(a, MASK=(a > cutoff))
The optional parameter MASK should be a logical array with the same size as a. It tells MINVAL which elements in a to consider when searching for the minimum value.
Take a look at the documentation here: MINVAL
If you would instead like to get the index of the minimum value, rather than the value itself, you can use MINLOC. In this case the code would look like:
index = MINLOC(a, MASK=(a > cutoff))
Note that for a rank-one array MINLOC returns a rank-one result of size one, so index must be declared as INTEGER, DIMENSION(1) (or pass DIM=1 to MINLOC to get a scalar).
Documentation can be found here: MINLOC

making histogram from a csv file

I am trying to read a column of data from a csv file and create a histogram for it. I could read the data into an array but was not able to make the histogram. Here is what I did:
import csv
import numpy as np
import matplotlib.pyplot as plt

thimar = csv.reader(open('thimar.csv', 'rb'))
thimar_list = []
thimar_list.extend(thimar)
z = []
for data in thimar_list:
    z.append(data[7])
zz = np.array(z)
n, bins, patches = plt.hist(zz, 50, normed=1)
which gives me the error:
TypeError: cannot perform reduce with flexible type
Any idea what is going on?
The csv module gives you strings, so zz is an array of strings (a "flexible" dtype), which hist cannot reduce. Cast the strings to numbers when you append them:
z.append(float(data[7]))
With this I got a plot from my made-up data.
Here are two options. This one will work if all your columns are made up of numbers:
array = np.loadtxt('thimar.csv', dtype=float, delimiter=',')
n, bins, patches = plt.hist(array[:, 7], 50, normed=1)
This one is better if you have non-numeric columns in your file (e.g. Name, Gender, ...):
thimar = csv.reader(open('thimar.csv', 'rb'))
thimar_list = list(thimar)
zz = np.array([float(row[7]) for row in thimar_list])
n, bins, patches = plt.hist(zz, 50, normed=1)
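On Python 3 with a current matplotlib, the same idea would look like this (a sketch; the normed argument was removed in favour of density):

import csv
import numpy as np
import matplotlib.pyplot as plt

with open('thimar.csv', newline='') as f:  # csv.reader wants text mode on Python 3
    zz = np.array([float(row[7]) for row in csv.reader(f)])
n, bins, patches = plt.hist(zz, 50, density=True)
plt.show()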
