NumPy arrays: best way to handle data

I have a set of files, one per temperature, and I am having trouble storing the data I need in NumPy arrays. Let's say I have a range of temperatures temperatures = [8,10,12,...]
and need to store the first and second columns of each file in a NumPy array. The files look like these:
The code I have so far looks like this:
import numpy as np
import sys
start_position = 84000
stop_position = 86500
step = 10
temperature = [8,10]
rootfile = 'C:root\\temperature_MnTe2__'
length_data = np.arange(250)
positions = np.zeros(shape=len(temperature))
# print(positions)
inphase = np.zeros(len(temperature))
for t in temperature:
    # positions[t],inphase[t] = np.genfromtxt(rootfile + str(t) + '.0K.tsv', delimiter=' ', skip_header=23, unpack='True')
    data = np.genfromtxt(rootfile + str(t) + '.0K.tsv', delimiter=' ', skip_header=23, unpack='True')
    # print(data[0])
    # sys.exit()
    for column in data:
        print(column)
        for l in length_data:
            positions[l] = column[0]
print(positions)
The total number of rows for each column is 250. Do you have any ideas on how to create the arrays for each temperature so that at the end, I can more easily access the 1st and 2nd columns to plot them for each temperature?
I'd like to know also how to save all values of the first column in an array called positions and that when I go over the second temperature, it stores it in the same array in a different column, for instance. Currently, my positions array contains only the first value of the column of one temperature.
Thank you so much in advance,
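One possible approach, sketched under assumptions: read each file once with np.genfromtxt, then stack the first columns side by side so that column j of positions belongs to temperature[j] (and likewise for the second columns). The helper name load_columns and the in-memory sample data are hypothetical stand-ins; with the real files, the sources would be the paths built from rootfile and skip_header would be 23. The sketch also assumes whitespace-delimited files.

```python
import io
import numpy as np

def load_columns(sources, skip_header=0):
    """Read columns 0 and 1 from each source and stack them per temperature.

    `sources` is a list of file paths or file-like objects, one per temperature.
    Returns two arrays of shape (n_rows, n_temperatures).
    """
    first, second = [], []
    for src in sources:
        # usecols keeps only the two columns of interest; unpack=True
        # returns them as separate 1-D arrays.
        col0, col1 = np.genfromtxt(src, skip_header=skip_header,
                                   usecols=(0, 1), unpack=True)
        first.append(col0)
        second.append(col1)
    # column_stack turns each list of 1-D arrays into one 2-D array.
    return np.column_stack(first), np.column_stack(second)

# Hypothetical stand-ins for two temperature files.
sample_8K = io.StringIO("84000 0.1\n84010 0.2\n84020 0.3\n")
sample_10K = io.StringIO("84000 1.1\n84010 1.2\n84020 1.3\n")

positions, inphase = load_columns([sample_8K, sample_10K])
print(positions.shape)  # (3, 2): 3 rows, one column per temperature
```

With this layout, positions[:, j] and inphase[:, j] give the two columns for the j-th temperature, which can be passed directly to a plotting call inside a loop over the temperatures.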

Related

Move N elements in array from back to front

I have a text file containing 2 columns. I need to select one of them as an array,
which contains 200000 elements, cut N elements from this array, and move them from back to front.
I used the following code:
import numpy as np
import glob
files = glob.glob("input/*.txt")
for file in files:
    data_file = np.loadtxt(file)
    2nd_columns = data_file [:,1]
    2nd_columns_array = np.array(2nd_columns)
cut = 62859 # number of elements to cut
remain_points = 2nd_columns_array[:cut]
cut_points = 2nd_columns_array[cut:]
new_array = cut_points + remain_points
It doesn't work and gave me the following error:
ValueError: operands could not be broadcast together with shapes (137141,) (62859,)
any help, please??
It doesn't work because, for NumPy arrays, + adds the values elementwise instead of concatenating, and the two slices have different shapes.
One of the ways is to use numpy.hstack:
new_array = np.hstack((2nd_columns_array[cut:], 2nd_columns_array[:cut]))
Side notes:
With your code, only the 2nd column of the last file is reordered, because the reordering happens outside the for loop.
You don't need to store cut_points or remain_points in separate variables. You can operate directly on 2nd_columns_array.
Variable names cannot start with a digit in Python (2nd_columns is a SyntaxError), so use a name such as second_column instead.
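A simpler method for this is numpy.roll. Mind the sign of the shift: np.roll(a, cut) moves the last cut elements to the front, while np.roll(a, -cut) moves the first cut elements to the back, which is what the hstack line above does.
new_array = np.roll(2nd_columns_array, cut)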
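The difference between concatenation and the two roll directions can be checked on a small array. This is a minimal sketch: the 10-element array and cut = 3 are stand-ins for the 200000-element column and the real cut value.

```python
import numpy as np

a = np.arange(10)   # stand-in for the 200000-element column
cut = 3

# Concatenation joins the two slices end to end (unlike +, which adds elementwise).
front_to_back = np.concatenate((a[cut:], a[:cut]))

# np.roll with a negative shift gives the same result.
assert np.array_equal(front_to_back, np.roll(a, -cut))

# Moving the last `cut` elements to the front is a positive shift.
back_to_front = np.roll(a, cut)
assert np.array_equal(back_to_front, np.concatenate((a[-cut:], a[:-cut])))

print(front_to_back)  # [3 4 5 6 7 8 9 0 1 2]
print(back_to_front)  # [7 8 9 0 1 2 3 4 5 6]
```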

How to read data from .txt files and reduce it into a Numpy array : Python 3

So far, here is my code. I am trying to make a Numpy array by reading the data from a file and reducing the size of the data by saving it into an array format so that it is easy for me to plot it on a graph.
Basically, I have a folder containing 115 files. Each file has around 80K rows of data. The columns are structured like this:
1 2 3 ... 14
Location | Location | Month Average | Month Average
When I make this array, it needs to have
Location
Min/max value
Average over the whole year in that specific location
Total in the specific location.
When I run my code, though, I get a TypeError: cannot perform reduce with flexible type.
row = 80000
path = "Folder/files." #the file names have the same root but different years after
array=np.zeros([row,6,115], dtype=np.float) #I am supposed to save as a 3D numpy array.
np.array([array]).astype(np.float)
for i in range(1900,2015):#the amount of years in the folder
    input_file=open(path+str(i),'r')
    y=0
    for line in input_file:
        values=line.split()
        array[i-1900][y][0] = values[0] #the first number in the array is location
        array[i-1900][y][1] = values[1] #second number is also location
        array[i-1900][y][2] = np.min(values[2:13])
        array[i-1900][y][3] = np.max(values[2:13])
        array[i-1900][y][4] = np.mean(values[2:13])
        array[i-1900][y][5] = np.sum(values[2:13])
        y+=1
a= array
np.save('array.npy', a)
print (a)
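The TypeError most likely arises because line.split() returns strings, and np.min, np.max, np.mean and np.sum cannot reduce a sequence of strings. Converting the values to floats first avoids it. Note also that the array is allocated with shape (row, 6, 115) but indexed as [year][row][field], so the index order would need to match the shape. The sketch below uses a hypothetical helper, summarize_lines, and assumes 12 monthly values in columns 3 to 14 (slice 2:14); the slice 2:13 in the code above covers only 11 values.

```python
import numpy as np

def summarize_lines(lines, n_months=12):
    """Reduce each text row to [loc1, loc2, min, max, mean, sum]."""
    out = []
    for line in lines:
        # Convert the split strings to floats before any reduction.
        values = np.array(line.split(), dtype=float)
        months = values[2:2 + n_months]
        out.append([values[0], values[1],
                    months.min(), months.max(),
                    months.mean(), months.sum()])
    return np.array(out)   # shape (n_rows, 6)

# Hypothetical two-row file: 2 location columns followed by 12 monthly values.
sample = [
    "10.5 20.5 " + " ".join(str(m) for m in range(1, 13)),
    "11.5 21.5 " + " ".join(str(m) for m in range(2, 14)),
]
summary = summarize_lines(sample)
print(summary[0])
```

One such (n_rows, 6) array per year could then be collected in a list and combined with np.stack(list_of_summaries, axis=-1) to obtain the (n_rows, 6, n_years) layout, provided every file has the same number of rows.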

cell array to numeric array for plotting

I have a cell array containing historic gold price data and a cell array with the associated dates. I want to plot dates against prices for simple analysis, but I am having difficulty converting the cell array of prices into doubles.
My code is:
figure
plot(Date,USDAM,'b')
title('{\bf US Gold Daily Market Price, 2010 to 2015}')
datetick
axis tight
When I try to convert the gold prices (USDAM) into a double using cell2mat(USDAM), it throws the following error:
Error using cat
Dimensions of matrices being concatenated are not consistent.
Error in cell2mat (line 83)
m{n} = cat(1,c{:,n});
I use the following code to import the data:
filename = 'goldPriceData.csv';
delimiter = ',';
startRow = 2;
endRow = 759;
formatSpec = '%s%s%*s%*s%*s%*s%*s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'EmptyValue' ,NaN,'HeaderLines', startRow-1, 'ReturnOnError', false);
fclose(fileID);
Date = dataArray{:, 1};
USDAM = dataArray{:, 2};
For your problem, cell2mat is the wrong function. Consider the cell array of strings:
>> S = {'1.2';'3.14159';'2.718'}
S =
'1.2'
'3.14159'
'2.718'
>> cell2mat(S)
Error using cat
Dimensions of matrices being concatenated are not consistent.
Error in cell2mat (line 83)
m{n} = cat(1,c{:,n});
That's because each row of the output needs to have the same number of columns, and the strings are of different lengths. You can use strvcat(S) to get a rectangular character matrix padded with spaces, but that isn't what you want. You want numeric data.
>> str2double(S)
ans =
1.2000
3.1416
2.7180
Just because it puts a number in scientific notation, doesn't mean it's not the same number. That is, 1.199250000000000e+03 is 1199.25. To get the tick labels looking the way you want them, set YTickLabel property of the axis with formatted strings, the ones in USD.
Regarding your dates, you'll need numeric data to plot on each axis so convert the dates with datenum.
>> dateVals = datenum({'2014-12-30', '2014-12-31'})
dateVals =
735963
735964
Then, to get the dates displayed on the x axis correctly, set the XTickLabel property of the axis to get it looking how you want it (using the strings in Date). Note that when setting the tick labels, you must ensure that you have the same number of ticks as labels that you intend to have. But that is a different question, I think.

MATLAB cell array indexing and looping

I'm trying to create a script that reads data from a text file, and plots the data onto a scatter plot.
For example, say the file name is prices.txt and contains:
Pens 2 4
Pencils 1.5 3
Rulers 3 3.5
Sharpeners 1 3
Highlighters 3 4
Where columns 2 and 3 are prices of the items for two different stores.
What my script should do is read the prices, calculate (using another function) future prices of the stores, and plot these prices on a scatter plot where x is one store and y is another. This is a silly example, I know, but it fits the description.
Don't worry too much about the other function that does the calculation; just assume it does what it's supposed to.
Basically, I've come up with the following:
pricesfile = fopen('Prices.txt');
prices = textscan(pricesfile, '%s %d d');
fclose(pricesfile);
count = 1;
while count <= length(prices{1})
    for item = constants{1}
        name = constants{1}{count};
        store_A = prices{2}{count};
        store_B = prices{3}{count};
        (...other function goes here...)
    end
end
After doing this I'm completely stuck. My thought process behind this was to go through each item name, and create a vector that's assigned to this name with its two corresponding prices as items in the vector eg:
pens = [2 4]
pencils = [1.5 3]
etc. Then, I would somehow plot those items in the vector on a scatter plot and use the name of the vector as a label.
I'm not too sure how to carry out the rest of my code or even if what I've written will get me to the solution.
Please help and thanks in advance.
pricesfile = fopen('Prices.txt');
% Use %f (not %d) so that decimal prices such as 1.5 are read as doubles
data = textscan(pricesfile, '%s %f %f');
fclose(pricesfile);
You were on the right track but after this (through a bit of hackery) you don't actually need a loop:
plot(repmat(data{2},1,2)', repmat(data{3},1,2)', '.')
legend(data{1})
What you DO NOT want to do is create variables named after strings. Rather store them in an array with an array of the names (which is basically what your textscan code gives you). Matlab is very good at handling matrices/arrays.
You could also split your data up, for example:
names = data{1};
prices = [data{2:3}];
Now you can perform calculations on prices quite easily, like
prices_cents = prices*100;
plot(prices_cents(:,[1,1])', prices_cents(:,[2,2])', '.')
legend(names)
Note that the [1,1] etc above is just using indexing as a short hand to achieve what repmat does...

How to sort structure arrays in MATLAB?

I'm working with an image retrieval system using color histogram intersection in MATLAB. This method gives me the following data: a real number which represents the histogram intersection distance, and the image file name. Because they are different data types, I store them in structure array with two fields, and then I save this structure in a .mat file. Now I need to sort this structure according to the histogram intersection distance in descending order in order to retrieve the image with the highest histogram intersection distance. I've tried many methods to sort this data but without result. Please can you help me solve this problem?
It's also possible to sort the entire structure.
To build off of gnovice's example...
% Create a structure array
s = struct('value',{1 7 4},'file',{'img1.jpg' 'img2.jpg' 'img3.jpg'});
% Sort the structure according to values in descending order
% We are only interested in the second output from the sort command
[blah, order] = sort([s(:).value],'descend');
% Save the sorted output
sortedStruct = s(order);
Here's one example of how you could do this, using the function MAX instead of having to sort:
%# First, create a sample structure array:
s = struct('value',{1 7 4},'file',{'img1.jpg' 'img2.jpg' 'img3.jpg'});
%# Next concatenate the "value" fields and find the index of the maximum value:
[maxValue,index] = max([s.value]);
%# Finally, get the file corresponding to the maximum value:
maxFile = s(index).file;
EDIT : If you would like to get the N highest values, and not just the maximum, you can use SORT instead of MAX (as Shaka suggested). For example (using the above structure):
>> N = 2; %# Get two highest values
>> [values,index] = sort([s.value],'descend'); %# Sort all values, largest first
>> topNFiles = {s(index(1:N)).file} %# Get N files with the largest values
topNFiles =
'img2.jpg' 'img3.jpg'
