I have a text file that contains 2 columns. I need to select one of the columns as an array of 200000 elements, cut N elements from this array, and move them from the back to the front.
I used the following code:
import numpy as np
import glob

files = glob.glob("input/*.txt")
for file in files:
    data_file = np.loadtxt(file)
    2nd_columns = data_file[:,1]
    2nd_columns_array = np.array(2nd_columns)
    cut = 62859  # number of elements to cut
    remain_points = 2nd_columns_array[:cut]
    cut_points = 2nd_columns_array[cut:]
    new_array = cut_points + remain_points
It doesn't work and gives me the following error:
ValueError: operands could not be broadcast together with shapes (137141,) (62859,)
Any help, please?
It doesn't work because you are trying to add the values stored in the two arrays element-wise, and they have different shapes.
One of the ways is to use numpy.hstack:
new_array = np.hstack((2nd_columns_array[cut:], 2nd_columns_array[:cut]))
Side notes:
with your code you will reorder only the 2nd column of the last file, since the reordering is outside of the for loop
you don't need to store cut_points or remain_points in separate variables; you can operate directly on 2nd_columns_array
you shouldn't name variables starting with a number (2nd_columns is actually a syntax error in Python)
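Putting those notes together, a minimal sketch of the fixed logic (using a small toy column instead of a 200000-element file, and a valid identifier) might look like this:

```python
import numpy as np

def move_tail_to_front(arr, cut):
    """Return a copy with arr[cut:] placed before arr[:cut]."""
    return np.hstack((arr[cut:], arr[:cut]))

# Small demonstration; in the real loop you would call this per file
# and write out the result inside the for loop:
col = np.arange(10)
print(move_tail_to_front(col, 7))  # -> [7 8 9 0 1 2 3 4 5 6]
```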
A simple method for this process is numpy.roll. Note the negative shift, which moves the elements from index cut onward to the front:
new_array = np.roll(2nd_columns_array, -cut)
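A quick sanity check on a toy array shows that np.roll with a negative shift matches the hstack result (moving the tail, from index cut onward, to the front):

```python
import numpy as np

arr = np.arange(10)
cut = 7

# A negative shift rolls arr[cut:] around to the front
rolled = np.roll(arr, -cut)
stacked = np.hstack((arr[cut:], arr[:cut]))

print(rolled)                           # -> [7 8 9 0 1 2 3 4 5 6]
print(np.array_equal(rolled, stacked))  # -> True
```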
I am using a list of integers corresponding to an x,y index of a gridded NetCDF array to extract specific values; the initial code was derived from here. My NetCDF file has a single variable, named TMAX2M, at a single timestep. My code written to execute this is as follows (please note that I have not shown the netCDF4 import at the top of the script):
# grid point lists
lat = [914]
lon = [2141]
# Open netCDF File
fh = Dataset('/pathtofile/temperaturedataset.nc', mode='r')
# Variable Extraction
point_list = zip(lat,lon)
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][i,j])
print(dataset_list)
The code executes, and the result is as follows:
masked_array(data=73, mask=False, fill_value=999999, dtype=int16)
The data value here is correct; however, I would like the output to contain only the integer in "data". The goal is to pass a number of x,y points, as seen in the example linked above, and join them into a single list.
Any suggestions on what to add to the code to make this achievable would be great.
Calling the particular value from the x,y list at a single timestep within the dataset can be done as follows:
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][:][i,j])
The previously linked example contained [0,16] for the indexed variables; [:] can be used in this case.
I suggest converting to a NumPy array (assuming numpy is imported as np), like this:
for i, j in point_list:
    dataset_list.append(np.array(fh.variables['TMAX2M'][i,j]))
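Since the asker wants the plain integer out of the masked array, the extraction can be exercised without any NetCDF file at all: a hand-built numpy masked array like the one printed above unwraps to a scalar with .item() (or int()).

```python
import numpy as np

# Build a masked scalar like the one netCDF4 returned in the question
value = np.ma.masked_array(data=73, mask=False, fill_value=999999, dtype=np.int16)

# .item() (or int()) unwraps the underlying Python integer
print(value.item())  # -> 73
print(int(value))    # -> 73
```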
I am using the NetCDF package in Julia 0.5.0 to read in the same multidimensional variable from ~10 different netcdf files. Is there a better way to loop through the files and consolidate them into one overarching multidimensional array rather than creating an array of arrays like I currently have?
Currently, my code is set up like:
files = ["file1", "file2", "file3", ... , "file10"]
#length(files) = 10
var = Array{Array}(10)
for i in collect(1:1:10)
    var[i] = ncread(files[i], "x")
end
where
size(var) = 10
size(var[1]) = (192,59,193) #from file1
.
.
.
size(var[10]) = (192,59,193) #from file10
which works, but is not the desired format because I later want to take averages along Dimension Y in what are currently sub-arrays. Ideally, I would like to use ncread() to read x into one multidimensional array var, such that the size looks like
size(var) = (10,192,59,193)
where
var[1,:,:,:] #from file1
.
.
.
var[10,:,:,:] #from file10
I think that hcat() or push!() might be needed, but I'm not sure how to initialize the multidimensional array before the for-loop to account for the ncread() output. I have to do this for ~8 variables in the files, and I don't know the dimensions or lengths of the different variables prior to calling ncread().
filenames = ["file$i" for i = 1:10]; # make some filenames
ncread(filename) = rand(2,3,4) # define a dummy function similar to yours
a = ncread(filenames[1]) # read the first file to get the size
output = Array{Float64}(length(filenames),size(a)...); #preallocate the full array, lookup splatting to see how this works
output[1,:,:,:] = a # assign the data we already read
for i in 2:length(filenames) # read and assign the rest
    output[i,:,:,:] = ncread(filenames[i])
end
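For readers following the Python questions above, the same preallocate-and-fill pattern translates directly to NumPy; here read_var is a stand-in for ncread that returns dummy data of a fixed (hypothetical) shape:

```python
import numpy as np

def read_var(_filename):
    # Stand-in for ncread(file, "x"): returns a dummy array of fixed shape
    return np.random.rand(2, 3, 4)

filenames = [f"file{i}" for i in range(1, 11)]

# Read the first file to learn the shape, preallocate, then fill the rest
first = read_var(filenames[0])
output = np.empty((len(filenames),) + first.shape)
output[0] = first
for i, name in enumerate(filenames[1:], start=1):
    output[i] = read_var(name)

print(output.shape)  # -> (10, 2, 3, 4)
```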
NCO uses ncecat for this:
ncecat in*.nc out.nc
However, doing so much with so little code may cause your head to explode.
A Fourier analysis I'm doing outputs 5 data fields, each of which I've collected into a 1-D numpy array: freq bin #, amplitude, wavelength, normalized amplitude, and %power.
How best to structure the data so I can sort by descending amplitude?
When testing with just one data field, I was able to use a dict as follows:
fourier_tuples = zip(range(len(fourier)), fourier)
fourier_map = dict(fourier_tuples)
import operator
fourier_sorted = sorted(fourier_map.items(), key=operator.itemgetter(1))
fourier_sorted = np.argsort(-fourier)[:3]
My intent was to add the other arrays to line 1, but this doesn't work, since a dict maps each key to only a single value. (That's why this post doesn't solve my issue.)
Stepping back, is this a reasonable approach, or are there better ways to combine & sort separate arrays? Ultimately, I want to take the data values from the top 3 freqs and associated other data, and write them to an output data file.
Here's a snippet of my data:
fourier = np.array([1.77635684e-14, 4.49872050e+01, 1.05094837e+01, 8.24322470e+00, 2.36715913e+01])
freqs = np.array([0. , 0.00246951, 0.00493902, 0.00740854, 0.00987805])
wavelengths = np.array([inf, 404.93827165, 202.46913583, 134.97942388, 101.23456791])
amps = np.array([4.33257766e-16, 1.09724890e+00, 2.56328871e-01, 2.01054261e-01, 5.77355886e-01])
powers_pct = np.array([4.8508237956526163e-32, 0.31112370227749603, 0.016979224022185751, 0.010445983875848858, 0.086141014686372669])  # %power; renamed since "powers%" is not a valid identifier
The last 4 arrays are other fields corresponding to 'fourier'. (Actual array lengths are 42, but pared down to 5 for simplicity.)
You appear to be using numpy, so here is the numpy way of doing this. You have the right function np.argsort in your post, but you don't seem to use it correctly:
order = np.argsort(amplitudes)
This is similar to your dictionary trick, only it computes the inverse shuffling compared to your procedure. By the way, why go through a dictionary and not simply a list of tuples?
The contents of order are now indices into amplitudes: the first cell of order contains the position of the smallest element of amplitudes, the second cell contains the position of the next-smallest, and so on. Therefore
top5 = order[:-6:-1]
Provided your data are 1-D numpy arrays, you can use top5 to extract the elements corresponding to the top 5 amplitudes using advanced indexing:
freq_bin[top5]
amplitudes[top5]
wavelength[top5]
If you want, you can group them together as columns and apply top5 to the rows of the resulting array:
np.c_[freq_bin, amplitudes, wavelength, ...][top5, :]
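Using an abbreviated version of the amplitude and frequency values from the question, the whole argsort-and-slice pipeline (here for the top 3, as the asker wants) looks like:

```python
import numpy as np

# Abbreviated values from the question's snippet
amplitudes = np.array([4.33e-16, 1.097, 0.256, 0.201, 0.577])
freqs = np.array([0.0, 0.00247, 0.00494, 0.00741, 0.00988])

order = np.argsort(amplitudes)  # indices, smallest amplitude first
top3 = order[:-4:-1]            # indices of the 3 largest, descending
print(top3)                     # -> [1 4 2]

# Advanced indexing pulls the matching entries from every field
print(amplitudes[top3])
print(freqs[top3])

# Or group the fields as columns and select the same rows
table = np.c_[freqs, amplitudes]
print(table[top3, :])
```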
If I understand correctly, you have 5 separate lists of the same length and you are trying to sort all of them based on one of them. To do that you can either use numpy or do it with vanilla Python. Here are two examples off the top of my head (sorting is based on the 2nd list).
a = [11,13,10,14,15]
b = [2,4,1,0,3]
c = [22,20,23,25,24]
#numpy solution
import numpy as np
my_array = np.array([a,b,c])
my_sorted_array = my_array[:,my_array[1,:].argsort()]
#vanilla python solution
from operator import itemgetter
my_list = zip(a,b,c)
my_sorted_list = sorted(my_list,key=itemgetter(1))
You can then flip the array with my_sorted_array = np.fliplr(my_sorted_array) if you wish or if you are working with lists you can reverse it in place with my_sorted_list.reverse()
EDIT:
To get only the first n values, you simply have to slice the array, similarly to what @Paul is suggesting. Slicing is done in a manner similar to classic list slicing, by specifying start:stop:step (you can omit the step). In your case, for the top 5 columns it would be [:,-5:]. So in the example above you can take the top 2 columns from each row like this:
my_sliced_sorted_array = my_sorted_array[:,-2:]
The result will be:
array([[15, 13],
[ 3, 4],
[24, 20]])
Hope it helps.
I am trying to create a string array which will be filled with string values read from a text file this way:
labels = textread(file_name, '%s');
Basically, for each string in each line of the text file file_name, I want to put this string in 10 positions of a final string array, which will later be saved in another text file.
What I do in my code is: for each string in file_name, I put the string in 10 positions of a temporary cell array and then concatenate this array with a final array, this way:
final_vector = '';
for i=1:size(labels)
    temp_vector = cell(1,10);
    temp_vector{1:10} = labels{i};
    final_vector = horzcat(final_vector, temp_vector);
end
But when I run the code the following error appears:
The right hand side of this assignment has too few values to satisfy the left hand side.
Error in my_example_code (line 16)
temp_vector{1:10}=labels{i};
I am too much of a rookie with cell arrays in MATLAB and I don't really know what is happening. Do you know what is happening, or do you have a better solution to my problem?
Use deal and put the left hand side in square brackets:
labels{1} = 'Hello World!'
temp_vector = cell(10,1)
[temp_vector{1:10}] = deal(labels{1});
This works because deal can distribute one value to multiple outputs [a,b,c,...]. temp_vector{1:10} alone creates a comma-separated list and putting them into [] creates the output array [temp_vector{1}, temp_vector{2}, ...] which can then be populated by deal.
It is happening because you want to distribute one value to 10 cells, but MATLAB is expecting you to assign 10 values to 10 cells. An alternative approach, maybe more logical but slower, would be:
n = 10;
temp_vector(1:n) = repmat(labels(1),n,1);
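For comparison with the Python questions above, the same replicate-and-concatenate idea is a one-liner per label in Python; here labels is a hypothetical stand-in for the strings read from file_name:

```python
# Hypothetical labels standing in for the strings read from the text file
labels = ["cat", "dog"]

final_vector = []
for label in labels:
    # Like repmat(labels(i), 10, 1): repeat each label 10 times, then append
    final_vector.extend([label] * 10)

print(len(final_vector))  # -> 20
print(final_vector[:3])   # -> ['cat', 'cat', 'cat']
```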
I also found another solution:
final_vector = '';
for i=1:size(labels)
    temp_vector = cell(1,10);
    temp_vector(:,1:10) = cellstr(labels{i});
    final_vector = horzcat(final_vector, temp_vector);
end
I want to use a list to save two arrays of the same length. One array (folders) contains names of folders, and the other (files) contains arrays of filenames, which might be of different lengths.
mvExp = list(
folders = NULL,
files = NULL
)
mvExp$folders[1] = "../data_america/"
mvExp$files[1] = c("file1.dat")
mvExp$folders[2] = "../data_europe"
mvExp$files[2] = c("file1.dat", "file2.dat", "file3.dat")
When I try to add the array of filenames to the second field of "files", I receive a warning saying that the number of elements I want to add is too long; "file2.dat" and "file3.dat" are not saved to mvExp$files[2].
How can I save arrays of different length into a list?
I also tried to use a data.frame (since my two arrays have the same length), but I was not able to add elements to the data.frame.
Whereas mvExp$folders can be a simple character vector (containing one string for each folder), mvExp$files needs to be a list, so that some of its elements can themselves contain several elements (i.e. the files in a directory).
To make it work, your code needs two changes:
files needs to be 'initialized' as a list.
To assign new elements to the list, use the "[[<-" operator rather than the "[<-" operator.
mvExp = list(
folders = character(),
files = list()
)
mvExp$folders[1] <- "../data_america/"
mvExp$files[[1]] <- c("file1.dat")
mvExp$folders[2] <- "../data_europe"
mvExp$files[[2]] <- c("file1.dat", "file2.dat", "file3.dat")
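For readers coming from the Python questions above, the equivalent ragged structure (hypothetical paths, mirroring the R code) is a flat list of folders plus a list of lists for the files:

```python
# Python analogue of the R structure: parallel containers where each
# "files" entry can itself hold several filenames
mv_exp = {"folders": [], "files": []}

mv_exp["folders"].append("../data_america/")
mv_exp["files"].append(["file1.dat"])
mv_exp["folders"].append("../data_europe")
mv_exp["files"].append(["file1.dat", "file2.dat", "file3.dat"])

print(mv_exp["files"][1])  # -> ['file1.dat', 'file2.dat', 'file3.dat']
```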
You can store everything in a single list.
myExp <- list(
`../data_america` = "file.dat",
`../data_europe` = c("file1.dat", "file2.dat", "file3.dat")
)
Retrieve the folder names like this:
names(myExp)
and the files for a given folder with, e.g.,
myExp[["../data_america"]]