Pytables Value Error (rank of the appended object and "..."EArray differ) - arrays

I am trying to use pytables to store my image dataset. I am using an EArray to append each image as it is read. The dimensions of my EArray and image are the same (except for the first, along which appending is done). I am using the following code:
atom = Atom.from_dtype(np.dtype(np.uint32,(278,278,1)))
i=0
for <read each image from folder using os into img>:
    im = cv2.imread(img.path,0)
    im = np.expand_dims(im,2) #this is because keras needs 3d images and grayscale images are 2d
    if not i:
        X = data.create_earray(dataGroup,"X",atom,(0,)+im.shape,chunkshape=(20,20,20,1))
    X.append(np.expand_dims(im,0)) #as appending require same dim.
    i=1
But when I run the code, it still gives me a ValueError saying that my object's rank is 1 and X's rank is 4. How is that possible when I am setting X's shape from im? I even tried printing the shape of im; it gives (278,278,1). So, what is the problem? I am using Pytables for the first time, so I don't know it in depth.

Adding a second answer with a more complicated write method plus an EArray.read example. Frankly, I prefer the simpler method from my other answer: create the EArray with obj= defined, and let Pytables handle the data structures. However, if you prefer to manage this yourself, see example 2 (below). Key items to note:
The shape passed to create_earray has 4 dimensions, with axis 0 set to zero (this defines the direction that will be extended). The atom only describes the element type.
im = np.expand_dims(im,0) is not done until AFTER im.shape is referenced in the definition of the EArray shape at creation.
[UPDATED CODE BELOW]
import tables as tb, numpy as np
data = tb.open_file("image_data1.h5", mode='w')
dataGroup = data.create_group(data.root, 'MyData')
MyAtom = tb.Atom.from_dtype(np.dtype(np.uint32))  # scalar uint32 atom; the array shape is passed to create_earray
im = np.arange(278*278).reshape((278,278))
im = np.expand_dims(im,2)
X = data.create_earray(dataGroup,"X", MyAtom, (0,)+im.shape)
im = np.expand_dims(im,0)
X.append( im )
print ('flavor =', X.flavor )
print ('dim=', X.ndim, ', rows = ', X.nrows)
im = np.arange(278*278,278*278+278*278).reshape((278,278))
im = np.expand_dims(im,2)
im = np.expand_dims(im,0)
X.append( im )
print ('dim=', X.ndim, ', rows = ', X.nrows)
data.close()
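As a quick check of the shape bookkeeping described in the key items, here is a minimal NumPy-only sketch (no Pytables involved; earray_shape and row are just illustrative names). It shows that the shape used to create the EArray and the object passed to append have the same rank, and differ only along axis 0:

```python
import numpy as np

im = np.arange(278 * 278).reshape((278, 278))  # stand-in for a grayscale image
im = np.expand_dims(im, 2)                     # now (278, 278, 1)

earray_shape = (0,) + im.shape                 # shape to pass to create_earray
row = np.expand_dims(im, 0)                    # object to pass to append

print(earray_shape)                            # (0, 278, 278, 1)
print(row.shape)                               # (1, 278, 278, 1)

# Same rank, and every axis except the extendable one matches.
print(row.ndim == len(earray_shape))           # True
print(row.shape[1:] == earray_shape[1:])       # True
```

If the object you append prints a different rank here, that is the mismatch the ValueError is complaining about.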
Here are the lines you need to read the data from EArray X (with a couple of print statements to verify values in the corners). Run them while the file is still open, i.e. before data.close(). This should work so long as the EArray flavor is Numpy (as it is in my example). You can also use the out= parameter to specify a NumPy array to receive the output data. There are other methods to access EArray data, including .iterrows() to iterate, and .__getitem__() to slice with fancy indexing. Read the Pytables documentation if you want to do any of these.
Y_1 = X.read( 0 )
print (Y_1[0,0,0])
print (Y_1[-1,-1,-1])
Y_2 = X.read( 1 )
print (Y_2[0,0,0])
print (Y_2[-1,-1,-1])

First, note that you don't have to create the EArray before you load the first image dataset. Pytables is smart enough to determine the atom and shape definition from the first object.
It was hard for me to exercise your code without a complete example and your data. So, I created a very simple example that uses np.arange() to create a couple of (278,278) image arrays, then extends them in the 2 and 0 directions. Hopefully this mimics the data you are trying to load to the EArray. The 2 Pytables functions (file.create_earray and earray.append) create 2 rows of data, 1 for each "image". After running, open image_data1.h5 with HDFView and inspect the data.
Maybe this will help you understand how to load your images to HDF5 Earrays:
import tables as tb, numpy as np
data = tb.open_file("image_data1.h5", mode='w')
dataGroup = data.create_group(data.root, 'MyData')
im = np.arange(278*278).reshape((278,278))
im = np.expand_dims(im,2)
im = np.expand_dims(im,0)
X = data.create_earray( dataGroup,"X",obj=im )
print ('dim=', X.ndim, ', rows = ', X.nrows)
im = np.arange(278*278, 278*278+278*278).reshape((278,278))
im = np.expand_dims(im,2)
im = np.expand_dims(im,0)
X.append( im )
print ('dim=', X.ndim, ', rows = ', X.nrows)
data.close()

Related

Write into re-opend NetCDF4 file

I'm trying to write numbers into an array with an unlimited dimension. The file I've created is structured like this:
import netCDF4 as nc4
rootgrp = nc4.Dataset("test.nc",'a',format="NETCDF4")
mgrp= rootgrp.createGroup('Flex')
mgrp.createDimension('pv',None)
mgrp.createDimension('s',4)
a = mgrp.createVariable('fill',"f8",('pv','s'))
rootgrp.close()
Now I'm trying to fill this array like this :
while i<10:
    f = nc4.Dataset("test.nc",'r+',format="NETCDF4")
    fgrp= f.groups['Flex']
    fgrp['fill'][i][0] = i
    print(fgrp['fill'][i][:])
    f.groups['Flex'].variables['fill'][i][3] = i
    f.close()
    i=i+1
But I'm always getting a 'dimension out of bounds' error, even though it tells me that the dimension has no limit. Even if I use an array with fixed 100x4 dimensions, I still get the same error.
Would appreciate any kind of help.
This line is the problem:
fgrp['fill'][i][0] = i
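fgrp['fill'][i] gets the ith row from the 'fill' variable, and the [0] that follows then indexes into that result. Reading row i fails because nothing has been written to it yet. Even when the row does exist, the read returns an in-memory copy, so assigning into it would not change the file. To solve this problem, do the indexing in one step instead: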
fgrp['fill'][i, 0] = i
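To illustrate the difference between the two forms without needing a NetCDF file, here is a toy sketch in plain Python. FileBackedVariable is a hypothetical class written for this example; it is not part of netCDF4 and only mimics two behaviours: reads return copies, and a single-step write grows the unlimited dimension.

```python
class FileBackedVariable:
    """Toy model of a 2-D variable stored on disk (not the netCDF4 API)."""

    def __init__(self, ncols):
        self.ncols = ncols
        self._rows = []  # stands in for data on disk; first axis is unlimited

    def __getitem__(self, i):
        # Reading returns a copy; a row that was never written raises IndexError.
        return list(self._rows[i])

    def __setitem__(self, key, value):
        i, j = key
        # Writing past the end grows the unlimited dimension first.
        while len(self._rows) <= i:
            self._rows.append([None] * self.ncols)
        self._rows[i][j] = value


var = FileBackedVariable(ncols=4)

try:
    var[0][0] = 0.0  # chained: reads row 0 first, which does not exist yet
except IndexError:
    print("chained indexing failed")

var[0, 0] = 0.0      # single step: one write, the variable grows as needed
var[0, 3] = 0.0

row = var[0]
row[1] = 99.0        # changes only the in-memory copy
print(var[0])        # [0.0, None, None, 0.0]
```

The chained form never reaches the write, while the single-step form is one write operation that the variable itself handles.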

Extract Data From NetCDF4 File Using List

I am using a list of integers corresponding to an x,y index of a gridded NetCDF array to extract specific values; the initial code was derived from here. My NetCDF file has a single dimension at a single timestep, which is named TMAX2M. My code to do this is as follows (please note that I have not shown the import of netCDF4 at the top of the script):
# grid point lists
lat = [914]
lon = [2141]
# Open netCDF File
fh = Dataset('/pathtofile/temperaturedataset.nc', mode='r')
# Variable Extraction
point_list = zip(lat,lon)
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][i,j])
print(dataset_list)
The code executes, and the result is as follows:
[masked_array(data=73,mask=False,fill_value=999999,dtype=int16)]
The data value here is correct, however I would like the output to only contain the integer contained in "data". The goal is to pass a number of x,y points as seen in the example linked above and join them into a single list.
Any suggestions on what to add to the code to make this achievable would be great.
To get the value at each x,y point from a dataset with a single timestep, index the variable as follows:
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][:][i,j])
The linked example used [0,16] to index the variable; in this case [:] can be used instead.
I suggest converting to a NumPy array like this:
for i, j in point_list:
    dataset_list.append(np.array(fh.variables['TMAX2M'][i,j]))
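If the goal is a list of plain Python integers rather than array objects, one option is to wrap each extracted value in int(). The sketch below is only an illustration: it builds zero-dimensional masked arrays by hand as stand-ins for what fh.variables['TMAX2M'][i,j] returned in the question, and the second value (68) is made up.

```python
import numpy as np

# Stand-ins for the values extracted from the file: zero-dimensional
# masked arrays like the one printed in the question (68 is hypothetical).
extracted = [
    np.ma.masked_array(73, mask=False, dtype=np.int16),
    np.ma.masked_array(68, mask=False, dtype=np.int16),
]

# int() unwraps each zero-dimensional value into a plain Python integer.
dataset_list = [int(value) for value in extracted]
print(dataset_list)  # [73, 68]
```

Note that int() raises an error for a value that is actually masked, so this assumes the grid points you request hold valid data.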

Move N elements in array from back to front

I have a text file that contains 2 columns. I need to select one of the columns as an array,
which contains 200000 elements, cut N elements from this array, and move them from back to front.
I used the following code:
import numpy as np
import glob
files = glob.glob("input/*.txt")
for file in files:
    data_file = np.loadtxt(file)
    2nd_columns = data_file [:,1]
    2nd_columns_array = np.array(2nd_columns)
cut = 62859 # number of elements to cut
remain_points = 2nd_columns_array[:cut]
cut_points = 2nd_columns_array[cut:]
new_array = cut_points + remain_points
It doesn't work and gave me the following error:
ValueError: operands could not be broadcast together with shapes (137141,) (62859,)
any help, please??
It doesn't work because + on NumPy arrays adds the values element-wise instead of concatenating, and your two arrays have different shapes.
One way to join them is numpy.hstack:
new_array = np.hstack((second_column_array[cut:], second_column_array[:cut]))
Side notes:
with your code you will reorder only the 2nd column of the last file, since the reordering is outside of the for loop
you don't need to store cut_points or remain_points in separate variables; you can operate directly on the array
Python names cannot start with a digit, so 2nd_columns_array is written as second_column_array above
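A simple method for this is numpy.roll (the array is named second_column_array here, since Python names cannot start with a digit). np.roll(arr, cut) moves the last cut elements to the front; to get the same result as np.hstack((arr[cut:], arr[:cut])), which moves the first cut elements to the back, use -cut instead.
new_array = np.roll(second_column_array, cut)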
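A minimal sketch comparing the two approaches on a small array, so the direction of the shift is easy to see. The 10-element array and cut = 3 are stand-ins for the 200000-element column and the real cut value.

```python
import numpy as np

values = np.arange(10)  # small stand-in for the 200000-element column
cut = 3

# First `cut` elements moved to the back (the slicing used with hstack).
front_to_back = np.hstack((values[cut:], values[:cut]))
# Last `cut` elements moved to the front.
back_to_front = np.hstack((values[-cut:], values[:-cut]))

print(front_to_back)  # [3 4 5 6 7 8 9 0 1 2]
print(back_to_front)  # [7 8 9 0 1 2 3 4 5 6]

# np.roll reproduces both, depending on the sign of the shift.
print(np.array_equal(front_to_back, np.roll(values, -cut)))  # True
print(np.array_equal(back_to_front, np.roll(values, cut)))   # True
```

Which sign you want depends on whether the N elements should leave the front or the back of the array.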

MATLAB cell array indexing and looping

I'm trying to create a script that reads data from a text file, and plots the data onto a scatter plot.
For example, say the file name is prices.txt and contains:
Pens 2 4
Pencils 1.5 3
Rulers 3 3.5
Sharpeners 1 3
Highlighters 3 4
Where columns 2 and 3 are prices of the items for two different stores.
What my script should do is read the prices, calculate (using another function) future prices of the stores and plot these prices onto a scatter plot where x is one store and y is another. This is a silly example I know but it fits the description.
Don't worry too much about the other function that does the calculation, just assume it does what it's supposed to.
Basically, I've come up with the following:
pricesfile = fopen('Prices.txt');
prices = textscan(pricesfile, '%s %d d');
fclose(pricesfile);
count = 1;
while count <= length(prices{1})
for item = constants{1}
name = constants{1}{count};
store_A = prices{2}{count};
store_B = prices{3}{count};
(...other function goes here...)
end
end
After doing this I'm completely stuck. My thought process behind this was to go through each item name, and create a vector that's assigned to this name with its two corresponding prices as items in the vector eg:
pens = [2 4]
pencils = [1.5 3]
etc. Then, I would somehow plot those items in the vector on a scatter plot and use the name of the vector as a label.
I'm not too sure how to carry out the rest of my code or even if what I've written will get me to the solution.
Please help and thanks in advance.
pricesfile = fopen('Prices.txt');
data = textscan(pricesfile, '%s %f %f');  % %f reads decimal prices such as 1.5
fclose(pricesfile);
You were on the right track but after this (through a bit of hackery) you don't actually need a loop:
plot(repmat(data{2},1,2)', repmat(data{3},1,2)', '.')
legend(data{1})
What you DO NOT want to do is create variables named after strings. Rather store them in an array with an array of the names (which is basically what your textscan code gives you). Matlab is very good at handling matrices/arrays.
You could also split your data up, for example:
names = data{1};
prices = [data{2:3}];
Now you can perform calculations on prices quite easily, like
prices_cents = prices*100;
plot(prices_cents(:,[1,1])', prices_cents(:,[2,2])', '.')
legend(names)
Note that the [1,1] etc. above is just using indexing as a shorthand to achieve what repmat does.

How to write a random array (with no spatial reference) to geotiff format?

The following MATLAB script generates random locations within a 300x400 array and codes those locations with values from 1-12. How can I convert this non-spatial array to a geotiff? I hope to use the geotiff output to perform some trial analyses. Any projected coordinate system (e.g. UTM) would do for this analysis.
I have tried using geotiffwrite() without success using the following implementation:
out = geotiffwrite('C:\path\to\file\test.tif', m)
Which yields the following error:
>> test
Error using geotiffwrite
Too many output arguments.
EDIT:
The main problem I am encountering is a lack of inputs into the geotiffwrite() function. I am unsure how to deal with this problem. For example, I have no A or R variable because the array has no spatial reference. As long as each pixel is georeferenced somewhere, I do not care what the spatial reference is. The purpose of this is to create a sample dataset that I can experiment with using MATLAB spatial functions.
% Generate a totally black image to start with.
m = zeros(300, 400, 'uint8');
% Generate 1000 random locations.
numRandom = 1000;
linearIndices = randi(numel(m), 1, numRandom);
% Set those locations to be "white".
m(linearIndices) = randi(12, [numel(linearIndices) 1]);
% Display it. Random locations will appear white.
image(m);
colormap(gray);
I believe your question has a very simple answer. Skip the out-variable when you call geotiffwrite. That is, use:
geotiffwrite('C:\path\to\file\test.tif', m)
Instead of
out = geotiffwrite('C:\path\to\file\test.tif', m)
This is an example of working code using geotiffwrite, taken from the documentation. As you can see, there is no output variable there:
basename = 'boston_ovr';
imagefile = [basename '.jpg'];
RGB = imread(imagefile);
worldfile = getworldfilename(imagefile);
R = worldfileread(worldfile, 'geographic', size(RGB));
filename = [basename '.tif'];
geotiffwrite(filename, RGB, R)
figure
usamap(RGB, R)
geoshow(filename)
Update:
According to the documentation, you need at least 3 input parameters. The correct syntax is:
geotiffwrite(filename,A,R)
geotiffwrite(filename,X,cmap,R)
geotiffwrite(...,Name,Value)
From documentation:
geotiffwrite(filename,A,R) writes a georeferenced image or data grid,
A, spatially referenced by R, into an output file, filename.
Please visit this link to see how to use the function.
