making histogram from a csv file - arrays

I am trying to read a column of data from a csv file and create a histogram for it. I could read the data into an array but was not able to make the histogram. Here is what I did:
thimar=csv.reader(open('thimar.csv', 'rb'))
thimar_list=[]
thimar_list.extend(thimar)
z=[]
for data in thimar_list:
    z.append(data[7])
zz=np.array(z)
n, bins, patches = plt.hist(zz, 50, normed=1)
which gives me the error:
TypeError: cannot perform reduce with flexible type
Any idea what is going on?

Modify the z.append(data[7]) line so it casts the string to a number:
z.append(float(data[7]))
With this change I got a plot from my made-up data.
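The TypeError comes from the array's dtype. csv.reader always yields strings, so np.array(z) gets a string ("flexible") dtype, and plt.hist cannot reduce over that dtype. The sketch below shows the effect. It uses a hypothetical three-row sample in place of thimar.csv. It also uses np.histogram, which computes the bin counts that plt.hist draws, so the example runs without a display.

```python
import csv
import io

import numpy as np

# Hypothetical stand-in for thimar.csv: the values of interest are in column index 7.
sample = io.StringIO(
    "a,b,c,d,e,f,g,1.5\n"
    "a,b,c,d,e,f,g,2.5\n"
    "a,b,c,d,e,f,g,2.0\n"
)
rows = list(csv.reader(sample))

# csv.reader yields strings, so this array gets a Unicode ("flexible") dtype.
as_strings = np.array([row[7] for row in rows])
print(as_strings.dtype.kind)  # 'U' means Unicode string

# Converting to float (per element with float(), or all at once with astype)
# gives a numeric array that can be binned.
values = as_strings.astype(float)
counts, edges = np.histogram(values, bins=2)
print(counts)  # [1 2]
```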

Here are two options. The first works if all your columns contain numbers:
import numpy as np
import matplotlib.pyplot as plt
array = np.loadtxt('thimar.csv', dtype=float, delimiter=',')
n, bins, patches = plt.hist(array[:, 7], 50, density=True)  # normed= was removed in Matplotlib 3.1
The second is better if your file has non-numeric columns (e.g. Name, Gender, ...):
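import csv
import numpy as np
import matplotlib.pyplot as plt
with open('thimar.csv', newline='') as f:  # Python 3: text mode with newline='' instead of 'rb'
    thimar_list = list(csv.reader(f))
zz = np.array([float(row[7]) for row in thimar_list])
n, bins, patches = plt.hist(zz, 50, density=True)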
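The following variant of the first option also copes with non-numeric columns. It is an addition, not one of the answers above. np.loadtxt with usecols converts only the listed columns, so text in the other columns is never parsed. The in-memory sample, including its header row, is hypothetical.

```python
import io

import numpy as np

# Hypothetical stand-in for thimar.csv: a header row, text columns, numbers in column 7.
sample = io.StringIO(
    "name,gender,c,d,e,f,g,value\n"
    "Ann,F,c,d,e,f,g,1.5\n"
    "Bob,M,c,d,e,f,g,2.5\n"
    "Cy,M,c,d,e,f,g,2.0\n"
)

# skiprows=1 skips the header; usecols=(7,) converts only the numeric column.
values = np.loadtxt(sample, delimiter=",", skiprows=1, usecols=(7,))
```

The resulting 1-D array can be passed straight to plt.hist(values, 50, density=True).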

Related

NumPy arrays: best way to handle data

I have a set of files for different temperatures and have been having trouble deciding how to store the data I need in NumPy arrays. Let's say I have a range of temperatures, temperatures = [8,10,12,...],
and need to store each file's first and second columns in a NumPy array (one file per temperature).
The code I have so far looks like this:
import numpy as np
import sys
start_position = 84000
stop_position = 86500
step = 10
temperature = [8,10]
rootfile = 'C:root\\temperature_MnTe2__'
length_data = np.arange(250)
positions = np.zeros(shape=len(temperature))
# print(positions)
inphase = np.zeros(len(temperature))
for t in temperature:
    # positions[t],inphase[t] = np.genfromtxt(rootfile + str(t) + '.0K.tsv', delimiter=' ', skip_header=23, unpack='True')
    data = np.genfromtxt(rootfile + str(t) + '.0K.tsv', delimiter=' ', skip_header=23, unpack='True')
    # print(data[0])
    # sys.exit()
    for column in data:
        print(column)
        for l in length_data:
            positions[l] = column[0]
    print(positions)
The total number of rows for each column is 250. Do you have any ideas on how to create the arrays for each temperature so that at the end, I can more easily access the 1st and 2nd columns to plot them for each temperature?
I'd also like to know how to save all values of the first column in an array called positions so that, when I move on to the second temperature, its values go into a different column of the same array. Currently, my positions array contains only the first value of the column for one temperature.
Thank you so much in advance,
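One way to get a positions array with one column per temperature is to preallocate a (rows × temperatures) array and fill column k from the k-th file. This is a minimal sketch under the question's own assumptions: 250 rows, 23 header lines, and space-delimited columns. The fake_file helper is hypothetical and stands in for opening rootfile + str(t) + '.0K.tsv'.

```python
import io

import numpy as np

N_ROWS = 250          # rows per file, as stated in the question
HEADER_LINES = 23     # matches skip_header=23 in the question's genfromtxt call
temperatures = [8, 10]

def fake_file(t):
    """Hypothetical stand-in for rootfile + str(t) + '.0K.tsv'."""
    header = "".join(f"header line {i}\n" for i in range(HEADER_LINES))
    rows = "".join(f"{84000 + 10 * i} {t * 0.001 * i}\n" for i in range(N_ROWS))
    return io.StringIO(header + rows)

# One column per temperature: positions[:, k] and inphase[:, k] belong to temperatures[k].
positions = np.zeros((N_ROWS, len(temperatures)))
inphase = np.zeros((N_ROWS, len(temperatures)))

for k, t in enumerate(temperatures):
    # unpack=True transposes the result, so col0 and col1 are whole file columns.
    col0, col1 = np.genfromtxt(fake_file(t), delimiter=" ", skip_header=HEADER_LINES,
                               usecols=(0, 1), unpack=True)
    positions[:, k] = col0
    inphase[:, k] = col1
```

Plotting temperature k is then plt.plot(positions[:, k], inphase[:, k]). A dict keyed by temperature, such as {t: (col0, col1)}, is an equally reasonable choice if the files can differ in length.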

cell array to numeric array for plotting

I have a cell array containing historic gold price data and a cell array with the associated dates. I want to plot dates against prices for simple analysis, but I am having difficulty converting the cell array of prices into doubles.
My code is:
figure
plot(Date,USDAM,'b')
title('{\bf US Gold Daily Market Price, 2010 to 2015}')
datetick
axis tight
When I try to convert the gold prices (USDAM) into a double using cell2mat(USDAM), it throws the following error:
Error using cat
Dimensions of matrices being concatenated are not consistent.
Error in cell2mat (line 83)
m{n} = cat(1,c{:,n});
I use the following code to import the data:
filename = 'goldPriceData.csv';
delimiter = ',';
startRow = 2;
endRow = 759;
formatSpec = '%s%s%*s%*s%*s%*s%*s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, endRow-startRow+1, 'Delimiter', delimiter, 'EmptyValue' ,NaN,'HeaderLines', startRow-1, 'ReturnOnError', false);
fclose(fileID);
Date = dataArray{:, 1};
USDAM = dataArray{:, 2};
For your problem, cell2mat is the wrong function. Consider the cell array of strings:
>> S = {'1.2';'3.14159';'2.718'}
S =
'1.2'
'3.14159'
'2.718'
>> cell2mat(S)
Error using cat
Dimensions of matrices being concatenated are not consistent.
Error in cell2mat (line 83)
m{n} = cat(1,c{:,n});
That's because each row of the output must have the same number of columns, but the strings have different lengths. You can use char(S) (or the older strvcat(S)) to get a rectangular character matrix padded with spaces, but that isn't what you want. You want numeric data.
>> str2double(S)
ans =
1.2000
3.1416
2.7180
Displaying a number in scientific notation doesn't change its value: 1.199250000000000e+03 is 1199.25. To format the tick labels the way you want, set the YTickLabel property of the axis to formatted strings, such as the original USD price strings.
Your dates also need to be numeric to be plotted, so convert them with datenum:
>> dateVals = datenum({'2014-12-30', '2014-12-31'})
dateVals =
735963
735964
Then, to display the dates on the x axis the way you want, set the XTickLabel property of the axis (using the strings in Date). When setting tick labels, make sure you have the same number of ticks as the labels you intend to show. But that is a different question, I think.
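The serial numbers above can be cross-checked outside MATLAB. datenum counts days from year 0, which it treats as a leap year. Python's date.toordinal() counts 0001-01-01 as day 1. The two therefore differ by a constant 366. Below is a small Python sketch; the helper names are hypothetical.

```python
from datetime import date

MATLAB_OFFSET = 366  # datenum(1,1,1) == 367, while date(1, 1, 1).toordinal() == 1

def to_datenum(d: date) -> int:
    """Hypothetical helper: MATLAB-style serial day number for a calendar date."""
    return d.toordinal() + MATLAB_OFFSET

def from_datenum(n: int) -> date:
    """Inverse of to_datenum for whole-day values."""
    return date.fromordinal(n - MATLAB_OFFSET)

print(to_datenum(date(2014, 12, 30)), to_datenum(date(2014, 12, 31)))  # 735963 735964
```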

Best way to compare data from file to data in array in Matlab

I'm having some trouble with a specific file I/O task in MATLAB. I'm still fairly new to it, so some things are a bit of a mystery to me. The input file is structured as follows:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents: 6 header lines, then:
1, 2, 3
1, 2, 3
Basically a matrix of about [1443,3] with varying values.
Here is the matrix I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Here is my problem. First, I need to know how to read the file in a way that lets me compare the ID number in the filename (0123456) with the ID value in the matrix, so that I can compare the other columns of both. I don't know how to do this in MATLAB. Furthermore, I need to be able to loop over every point in the matrix that matches a specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', I want to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know whether accumarray can be used for this, but as I said, I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than shown, but they go on for about 1000 lines, so it is impractical to post them all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename)
C = textscan(fid, '%f,%f,%f') % Use as many %f specifiers as you have entries per line in the file
textscan also works on strings. So, for example, if your file contents were:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
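You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
C_full = [];
files = dir(['Processed_' file_id_str '_*']); % all files with the same ID
for k = 1:numel(files)
    fid = fopen(files(k).name);
    C = cell2mat(textscan(fid, '%f,%f,%f'));
    fclose(fid);
    C_full = [C_full; C];
end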
Then you can look for what you want in C_full.
Update based on the OP's edit (Dec 12, 2013)
Here's code to read the values from a single file. Wrap it all in the loop I mentioned above to loop over all your files and read them into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
fclose(fid);
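Here is the same idea (skip the header, then parse the numeric rows) sketched in Python as an analog rather than MATLAB. It uses the first lines of the sample posted in the question. That sample shows six header lines, while the answer above skips 12, so check the count against the real file. The bracketed Kepler ID on the second line is extracted with a regular expression so it can be compared with the first column of the comparison matrix. read_kepler_file is a hypothetical name.

```python
import csv
import io
import re
from itertools import islice

# First lines of the sample file posted in the question.
SAMPLE = """Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
"""
HEADER_LINES = 6  # as in the posted sample

def read_kepler_file(f):
    """Return (kepler_id, rows) from an open text file in the posted format."""
    header = list(islice(f, HEADER_LINES))  # consumes exactly the header lines
    # The first bracketed number on line 2 is the Kepler ID; int() drops leading zeros.
    kepler_id = int(re.search(r"\[(\d+)\]", header[1]).group(1))
    # The remaining lines are comma-separated numbers; skip any blank lines.
    rows = [[float(x) for x in row] for row in csv.reader(f) if row]
    return kepler_id, rows

kepler_id, rows = read_kepler_file(io.StringIO(SAMPLE))
```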
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r'); % 'r' opens for reading; 'w' would erase the file
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1}); % C2{2} is a cell, so index into it to get the string
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the 0-based (row, column) offset from the start of the file. You may need something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The filename is an attribute of the structure that's returned:
dirlist = dir;
for i = 1:length(dirlist)
filename = dirlist(i).name
%process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
Hope that helps!
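Steps 4 and 5 can be combined as a rough Python analog (not MATLAB). The sketch lists the files with glob, extracts the ID from each name with a regular expression, reads the numbers, and selects the comparison rows that share that ID. The Processed_<ID>_<n>.txt naming follows the question's example. The temporary directory, the file contents, the helper names, and the "operation" (a row count per file) are hypothetical placeholders.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

NAME_RE = re.compile(r"Processed_(\d+)_(\d+)\.txt$")

def load_rows(path):
    """Read comma-separated numeric rows (no header assumed here)."""
    rows = []
    for line in path.read_text().splitlines():
        if line.strip():
            rows.append([float(x) for x in line.split(",")])
    return rows

def process_directory(directory, comparison):
    """Pair each file's rows with the comparison rows that share its ID."""
    # Index comparison rows by ID (first column) once, instead of scanning per file.
    by_id = defaultdict(list)
    for row in comparison:
        by_id[int(row[0])].append(row[1:])

    results = []
    for path in sorted(Path(directory).glob("Processed_*_*.txt")):
        match = NAME_RE.search(path.name)
        if not match:
            continue
        file_id = int(match.group(1))  # int() so '0123456' matches 123456
        rows = load_rows(path)
        # Placeholder "operation": record how many rows each side contributes.
        results.append((file_id, int(match.group(2)), len(rows), len(by_id[file_id])))
    return results

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Processed_0123456_1.txt").write_text("1, 2, 3\n1, 3, 3\n")
    Path(tmp, "Processed_0123456_2.txt").write_text("2, 2, 2\n")
    comparison = [(123456, 1, 2, 3), (123456, 2, 3, 4), (999, 0, 0, 0)]
    results = process_directory(tmp, comparison)
```

sorted() orders names as strings, so with 15 files "_10" would come before "_2". Sort by int(match.group(2)) if the order matters.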

Matlab - Importing a .dat file into an array

I'm still fairly new to Matlab but for some reason the documentation hasn't been all that helpful with this.
I've got a .dat file that I want to turn into an N-row by 6-column array (the number of rows changes depending on the program that's generating the .dat file). I need to get the dimensions of the image this array will be used to make from the 1st row, 2nd column (x dimension) and the 1st row, 4th column (y dimension). The Import Data tool in MATLAB does this properly.
However I need the program to do it automatically. If the first line wasn't there, I'm pretty sure I could just use fscanf to put the data in the array, but the image dimensions are necessary.
Any idea what I need to use instead?
You may use textscan. The first call reads only the first line (i.e. gets the dimensions), and the second call reads the rest of the file. The second call uses repmat to build the format spec: %f, meaning double, repeated nb_col times. The CollectOutput option concatenates all the columns into a single array. Note that textscan can read the entire file without you specifying the number of rows.
The code would be
fileID = fopen('yourfile.dat'); %declare a file id
C1 = textscan(fileID,'%s%f%s%f',1); %read only the first line (repeat count 1)
nb_col = C1{4}; %get the number of columns (could be set by user too)
%read the remaining of the file
C2 = textscan(fileID, repmat('%f',1,nb_col), 'CollectOutput',1);
fclose(fileID); %close the connection
In the case where the number of columns is fixed, you can simply do
fileID = fopen('yourfile.dat');
C1 = textscan(fileID,'%s%f%s%f',1); %read only the first line
im_x = C1{2}; %get the x dimension
im_y = C1{4}; %get the y dimension
C2 = textscan(fileID,'%f%f%f%f%f%f%*[^\n]', 'CollectOutput',1);
fclose(fileID);
The format specification %*[^\n] skips the rest of each line.
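The fixed-six-column case has a Python analog, sketched here. It reads the first line for the dimensions, then hands the same open file to np.loadtxt, which continues from the current position. The first-line layout (label, x, label, y) mirrors the '%s%f%s%f' format above. The sample contents are hypothetical.

```python
import io

import numpy as np

# Hypothetical .dat contents: "label x label y" on line 1, then rows of >= 6 numbers.
SAMPLE = io.StringIO(
    "width 4 height 3\n"
    "1 2 3 4 5 6 7\n"
    "7 8 9 10 11 12 13\n"
)

first = SAMPLE.readline().split()
im_x, im_y = int(float(first[1])), int(float(first[3]))  # 2nd and 4th fields

# usecols keeps the first 6 columns (like %*[^\n] discarding the rest of each line);
# ndmin=2 keeps a 2-D shape even if only one data row is present.
data = np.loadtxt(SAMPLE, usecols=range(6), ndmin=2)
print(im_x, im_y, data.shape)  # 4 3 (2, 6)
```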

How to save a structure array to a text file

In MATLAB how do I save a structure array to a text file so that it displays everything the structure array shows in the command window?
I know this thread is old, but I hope this still helps someone.
I think this is a shorter solution (with the constraint that each struct field contains a scalar, an array, or a string):
%assume that your struct array is named data
temp_table = struct2table(data);
writetable(temp_table,'data.csv')
Now your struct array is stored in data.csv. The column names are the struct's field names, and each row is one element of the struct array.
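For comparison, here is a rough Python analog of struct2table plus writetable. It is an assumption, not part of the MATLAB answer. A list of dicts plays the role of the struct array: the dict keys become the header row, and each element becomes one row. The records are hypothetical.

```python
import csv
import io

# Hypothetical "struct array": one dict per element, same fields in each.
data = [
    {"name": "a", "value": 1.5, "count": 3},
    {"name": "b", "value": 2.0, "count": 7},
]

out = io.StringIO()  # use open("data.csv", "w", newline="") for a real file
writer = csv.DictWriter(out, fieldnames=list(data[0]), lineterminator="\n")
writer.writeheader()     # column names = field names
writer.writerows(data)   # one row per element
print(out.getvalue())
```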
You have to define a format for your file first.
Saving to a MATLAB workspace file (.MAT)
If you don't care about the format, and simply want to be able to load the data at a later time, use save, for example:
save 'myfile.mat' structarr
That stores struct array structarr in a binary MAT file named "myfile.mat". To read it back into your workspace, use load:
load 'myfile.mat'
Saving as comma-separated values (.CSV)
If you want to save your struct array in a text file as comma-separated pairs, where each pair contains a field name and its value, you can do something along these lines:
%// Extract field data
fields = repmat(fieldnames(structarr), numel(structarr), 1);
values = struct2cell(structarr);
%// Convert all numerical values to strings
idx = cellfun(@isnumeric, values);
values(idx) = cellfun(@num2str, values(idx), 'UniformOutput', 0);
%// Combine field names and values in the same array
C = {fields{:}; values{:}};
%// Write fields to CSV file
fid = fopen('myfile.csv', 'wt');
fprintf(fid, '%s,%s\n', C{:}); %// one "field,value" pair per line
fclose(fid);
This solution assumes that each field contains a scalar value or a string, but you can extend it as you see fit, of course.
To convert any data type to a character vector as displayed in the MATLAB command window, use the function
str = matlab.unittest.diagnostics.ConstraintDiagnostic.getDisplayableString(yourArray);
You can then write the contents to a file
fid = fopen('myFile.txt', 'w');
fprintf(fid, '%s', str);
fclose(fid);
