How can I write strings to an h5 file in MATLAB?

I've managed to answer my own question. This code will write cell arrays of any shape containing strings. The datasets can be modified/overwritten by simply calling again with a different input.
https://www.mathworks.com/matlabcentral/fileexchange/24091-hdf5-read-write-cellstr-example
%Okay, Matlab's h5write(filename, dataset, data) function doesn't work for
%strings. It hasn't worked with strings for years. The forum post that
%comes up first in Google about it is from 2009. Yeah. This is terrible,
%and evidently it's not getting fixed. So, low level functions. Fun fun.
%
%What I've done here is adapt examples, one from the hdf group's website
%https://support.hdfgroup.org/HDF5/examples/api18-m.html called
%"Read / Write String Datatype (Dataset)", the other by Jason Kaeding.
%
%I added functionality to check whether the file exists and either create
%it anew or open it accordingly. I wanted to be able to likewise check the
%existence of a dataset, but it looks like this functionality doesn't exist
%in the API, so I'm doing a try-catch to achieve the same end. Note that it
%appears you can't just create a dataset or group deep in a hierarchy: You
%have to create each level. Since I wanted to accept dataset names in the
%same format as h5read(), in the event the dataset doesn't exist, I loop
%over the parts of the dataset's path and try to create all levels. If they
%already exist, then this action throws errors too; hence a second
%try-catch.
%
%I've made it more advanced than h5create()/h5write() in that it all
%happens in one call and can accept data inputs of variable size. I take
%care of updating the dataset's extent to accommodate changing data array
%sizes. This is important for applications like adding a new timestamp
%every time the file is modified.
%
%#author Pavel Komarov pavel#gatech.edu 941-545-7573
function h5createwritestr(filename, dataset, str)
%"The class of input data must be cellstring instead of char when the
%HDF5 class is VARIABLE LENGTH H5T_STRING.", but also I don't want to
%force the user to put braces around single strings, so this.
if ischar(str)
str = {str};
end
%check whether the specified .h5 exists and either create or open
%accordingly
if ~exist(filename, 'file')
file = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
else
file = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
end
%set variable length string type
vlstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(vlstr_type,'H5T_VARIABLE');
% There is no way to check whether a dataset exists, so just try to
% open it, and if that fails, create it.
try
dset = H5D.open(file, dataset);
H5D.set_extent(dset, fliplr(size(str)));
catch
%create the intermediate groups one at a time because evidently the
%API's functions aren't smart enough to be able to do this themselves.
slashes = strfind(dataset, '/');
for i = 2:length(slashes)
url = dataset(1:(slashes(i)-1));%pull out the url of the next level
try
%1024 "specifies the number of bytes to reserve for the names that
%will appear in the group"
H5G.create(file, url, 1024);
catch
end
end
%create a dataspace for cellstr
H5S_UNLIMITED = H5ML.get_constant_value('H5S_UNLIMITED');
spacerank = max(1, sum(size(str) > 1));
dspace = H5S.create_simple(spacerank, fliplr(size(str)), ones(1, spacerank)*H5S_UNLIMITED);
%create a dataset plist for chunking. (A dataset can't be unlimited
%unless the chunk size is defined.)
plist = H5P.create('H5P_DATASET_CREATE');
chunksize = ones(1, spacerank);
chunksize(1) = 2;
H5P.set_chunk(plist, chunksize);% 2 strings per chunk
dset = H5D.create(file, dataset, vlstr_type, dspace, plist);
%close things
H5P.close(plist);
H5S.close(dspace);
end
%write data
H5D.write(dset, vlstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', str);
%close file & resources
H5T.close(vlstr_type);
H5D.close(dset);
H5F.close(file);
end

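I found a bug! The rank has to count every dimension of str, otherwise it does not match the length of fliplr(size(str)). The line that computes spacerank in the catch block should instead be:
spacerank = length(size(str));
Now it works flawlessly as far as I can tell.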
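For readers on a recent release, the high-level functions may be enough. Below is a minimal sketch, assuming MATLAB R2020b or later, where h5create is documented to accept 'string' as a Datatype. The file name and dataset path are made up for illustration. Unlike h5createwritestr above, this creates a fixed-size dataset and does not handle overwriting with a different size.

```matlab
% Sketch only: assumes MATLAB R2020b or later (string support in h5create/h5write).
fname = fullfile(tempdir, 'strings_example.h5');   % hypothetical file name
if exist(fname, 'file'), delete(fname); end        % h5create errors if the dataset already exists

stamps = ["2016-01-01", "2016-01-02", "2016-01-03"];   % 1x3 string array

% h5create also creates the intermediate group /meta.
h5create(fname, '/meta/timestamps', size(stamps), 'Datatype', 'string');
h5write(fname, '/meta/timestamps', stamps);

readBack = h5read(fname, '/meta/timestamps');
disp(readBack)
```

On older releases this call is expected to fail, in which case the low-level approach above still applies.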

Related

Structuring a for loop to output classifier predictions in python

I have an existing .py file that prints a classifier.predict for a SVC model. I would like to loop through each row in the X feature set to return a prediction.
I am trying to define the loop variable so that it can be used to select the test feature set X_1 from X.
The test statistic feature set X is written in code as:
X_1 = xspace.iloc[testval-1:testval, 0:5]
testval is the loop variable, as in this loop:
for testval in X.T.iterrows():
    print(testval)
I am having trouble returning a basic set of index values for X (X is the pandas dataframe)
I have tested the following with no success.
for index in X.T.iterrows():
    print(index)
for index in X.T.iteritems():
    print(index)
I am looking for the set of positional index values, 1-based if possible: 1, 2, 3, ..., n.
This seems simple, but I haven't found an existing question on Stack Overflow or Google.
ALSO, the individual dataframes I used as the basis for X were refined with the line:
df1.set_index('Date', inplace = True)
Because dates were used as the basis for concatenating the individual dataframes, the loops as written above return date values rather than the positional values I need for:
X_1 = xspace.iloc[testval-1:testval, 0:5]
where iloc indexes by integer position.
Please ask for additional code if you'd like to see more.
The loops I've written so far return date values; I would like the positions of the rows instead, to accommodate the line:
X_1 = xspace.iloc[testval-1:testval, 0:5]
The loop structure below seems to be working for my application.
i = 1
j = list(range(1, len(X), 1))
for i in j:

Pass Array to Subroutine IDL

I have a very long lookup table (~40,000 lines) that I am using for my code. Currently, I have it set to grab 4 arrays from my lookup table in the subroutine that uses it, but I call that subroutine ~3,000 times. I would rather not waste processing time grabbing this table as arrays repeatedly. Is there a way to grab them in my main program, store them, and source them later in my subroutine?
My current code grabs the lookup table in 4 separate arrays of 39,760 lines, and I am currently calling it like this:
READCOL, 'LookupTable2.txt', F='D,D,D,D',Albedo, Inertia, NightT, DayT
EDIT: I should probably note I have IDL 6.2, but if there is a way to do it in a newer version, I would still appreciate knowing how.
EDIT 2: My current program has a function which saves 4 arrays and executes the main function. Can I call my function with arrays as an argument? That way I wouldn't have to keep creating the same array
Something like:
Pro
FUNC(Array1, Array2, Var1, Var2, Var3)
END
There are several ways you can do this.
It looks like you have four columns and 40,000 lines, correct?
Then you can do the following. First, I will assume there is no header data in the ASCII file for the following commands.
FUNCTION read_my_file,file_name
;; Assume FILE_NAME is full path to and including file name with extension
fname = file_name[0]
;; One could also find the file with the following
;; fname = FILE_SEARCH([path to file],[file name with extension])
;; Define the number of lines in the file
nl = FILE_LINES(fname[0])
;; Define empty arrays to fill
col1 = DBLARR(nl[0])
col2 = DBLARR(nl[0])
col3 = DBLARR(nl[0])
col4 = DBLARR(nl[0])
dumb = DBLARR(4)
;; Open file
OPENR,gunit,fname[0],ERROR=err,/GET_LUN
IF (err NE 0) THEN PRINTF, -2, !ERROR_STATE.MSG ;; Prints the error message to stderr (unit -2)
FOR n=0L, nl[0] - 1L DO BEGIN
;; Read in file data
READF,gunit,FORMAT='(4d)',dumb
;; Fill arrays
col1[n] = dumb[0]
col2[n] = dumb[1]
col3[n] = dumb[2]
col4[n] = dumb[3]
ENDFOR
;; Close file
FREE_LUN,gunit
;; Define output
output = [[col1],[col2],[col3],[col4]]
;; Return to calling routine
RETURN,output
END
Note that this will work better if you provide an explicit width for the format statement, e.g., '(4d15.5)', which means a 15 character input with 5 decimal places.
This will return col1 through col4 to the user or calling routine as an [N,4]-element array, e.g., col1 = output[*,0]. You could use a structure where each tag contains one of the colj arrays or you could return them through keywords.
Then you can pass these arrays to another function/program in the following way:
PRO my_algorithm_wrapper,file_name,RESULTS=results
;; Get data from files
columns = read_my_file(file_name)
;; Pass data to [algorithm] function
results = my_algorithm(columns[*,0],columns[*,1],columns[*,2],columns[*,3])
;; Return to user
RETURN
END
To call this from the command line (after making sure both routines are compiled), you would do something like the following:
IDL> my_algorithm_wrapper,file_name,RESULTS=results
IDL> HELP,results ;; see what the function my_algorithm.pro returned
The above code should work with IDL 6.2.
General Notes
Try to avoid using uppercase letters in IDL routine names as it can cause issues when IDL searches for the routine during a call or compilation statement.
You need to name the program/function in the line with the PRO/FUNCTION statement at the beginning of the file. The name must come immediately after the PRO/FUNCTION statement.
It is generally wise to use explicit formatting statements to avoid ambiguities/errors when reading data files.
You can pass any variable type (e.g., scalar integer, array, structure, object, etc.) to programs/functions so long as they are handled appropriately within the program/function.

How to save dynamic variable from workspace in a separate file in matlab?

I'm working on a problem where I have an array A of 100 elements.
All these 100 elements are changing with time.
So in my workspace, I only get the final values of all these elements after the entire time cycle has run.
I'm trying to save the values with time in a separate file (.txt or .mat) so that I can access that file in order to check how the variable varies with time.
I'm trying the following command:
save('file.mat','A','-append');
But this command overwrites the existing values in my file.
Please suggest a way to save these values without overwriting them, and explain how to access them in MATLAB.
You can also change the output filename to be unique for each iteration:
for iter=1:n
A = rand(10);
save(sprintf('file%d.mat',iter), 'A');
end
That way each iteration creates one file.
The reason that saving to a file (even with the -append flag) didn't work is that the variable A already exists in the file and is overwritten each time through the loop. To avoid this, you would need to use a new file or a new variable name on every iteration.
Saving the results in a file is probably not the best way to store the time-varying values of A. You would be better off using a cell array to store all intermediate values of A.
A_over_time = cell(n, 1); %// preallocate one cell per time step
for k = 1:n
%// Get A somehow
A_over_time{k} = A;
end
Depending on what A is, you could also store the values of A in a numeric array or matrix.
%// Using an array
A_over_time = zeros(N, 1);
for k = 1:N
A_over_time(k) = A;
end
%// Using a matrix
A_over_time = zeros(N, numel(A));
for k = 1:N
A_over_time(k,:) = A;
end
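If the history does need to live on disk while the loop runs, one option is matfile, which allows indexed writes into a variable inside a MAT-file. This is a minimal sketch, assuming a release that provides matfile (R2011b or later) and that A is a 1-by-100 row vector. The file name, nSteps and the rand call are placeholders rather than anything from the question.

```matlab
fname = fullfile(tempdir, 'A_history.mat');    % hypothetical file name
if exist(fname, 'file'), delete(fname); end    % start from a clean file

m = matfile(fname, 'Writable', true);          % file is created on first write
nSteps = 5;                                    % placeholder number of time steps
nElem  = 100;                                  % number of elements in A

for k = 1:nSteps
    A = rand(1, nElem);                        % stand-in for the real update of A
    m.A_over_time(k, 1:nElem) = A;             % row k holds A at time step k
end

% Read the whole history back; rows are time steps, columns are elements of A.
history = m.A_over_time;
disp(size(history))
```

Afterwards load(fname) returns the same A_over_time matrix, so history(:, 3) is how the third element of A changed over time.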

Apply a string value to several positions of a cell array

I am trying to create a string array which will be fed with string values read from a text file this way:
labels = textread(file_name, '%s');
Basically, for each string in each line of the text file file_name I want to put this string in 10 positions of a final string array, which will be later saved in another text file.
What I do in my code is, for each string in file_name I put this string in 10 positions of a temporary cell array and then concatenate this array with a final array this way:
final_vector='';
for i=1:size(labels)
temp_vector=cell(1,10);
temp_vector{1:10}=labels{i};
final_vector=horzcat(final_vector,temp_vector);
end
But when I run the code the following error appears:
The right hand side of this assignment has too few values to satisfy the left hand side.
Error in my_example_code (line 16)
temp_vector{1:10}=labels{i};
I am too rookie in cell strings in matlab and I don't really know what is happening. Do you know what is happening or even have a better solution to my problem?
Use deal and put the left hand side in square brackets:
labels{1} = 'Hello World!'
temp_vector = cell(10,1)
[temp_vector{1:10}] = deal(labels{1});
This works because deal can distribute one value to multiple outputs [a,b,c,...]. temp_vector{1:10} alone creates a comma-separated list and putting them into [] creates the output array [temp_vector{1}, temp_vector{2}, ...] which can then be populated by deal.
The error occurs because you want to distribute one value to 10 cells, but MATLAB expects 10 values on the right-hand side to match the 10 cells on the left. An alternative approach, perhaps more intuitive but slower, would be:
n = 10;
temp_vector(1:n) = repmat(labels(1),n,1);
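Extending the repmat idea to the whole task, here is a minimal sketch that repeats every label 10 times without a loop. The labels are made-up stand-ins for the output of textread.

```matlab
labels = {'cat'; 'dog'; 'bird'};    % hypothetical stand-in for textread(file_name, '%s')
n = 10;

% Build an n-by-numel(labels) cell array in which each column holds one label
% repeated n times, then read it out column by column.
block = repmat(labels(:).', n, 1);
final_vector = block(:).';          % 1-by-(n*numel(labels)) row cell array

disp(final_vector(1:12))            % ten times 'cat', then 'dog' twice
```

Because MATLAB stores arrays column by column, block(:) lists all copies of the first label before moving on to the second, which is the order asked for.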
I also found another solution
final_vector='';
for i=1:size(labels)
temp_vector=cell(1,10);
temp_vector(:,1:10)=cellstr(labels{i});
final_vector=horzcat(final_vector,temp_vector);
end

append char array to cell array in matlab

I am a Matlab beginner, as will soon be very obvious. I am trying to assemble a cell array that has a single column of filenames.
I have multiple sessions. Each session should have 56 filenames (but some could be short or long, so I'd honestly prefer a solution that wouldn't break on encountering a short session). I need to loop over sessions and append the names in each subsequent session to my cell array, so that after two sessions the dimensions are 112, 1.
In other words, I'd like an array that went:
P =
/data/session1/dvol1.img
/data/session1/dvol2.img
...
/data/session1/dvol56.img
/data/session2/dvol1.img
/data/session2/dvol2.img
...
/data/session2/dvol56.img
and so on if there are more than two sessions.
The function I have that finds the filenames in the session is spm_select. It returns a char array of all the files in a directory that match a regular expression, in this case, 56 files for each session directory.
(I recognize my question is very similar to the question here: Using loops to get multiple values into a cell but I couldn't figure out an answer to my question since that person is only trying to append a single value at a time.)
I have tried a lot of things that haven't worked.
This:
data_path = '/foo/bar/';
subjects = {'test1'};
sessions = {'session1' 'session2' };
for i=1:numel(subjects)
clear P
P=cell(56*numel(sessions),1);
for j=1:numel(sessions)
P{(j-1)*56+1} = spm_select('FPList', fullfile(data_path,subjects{i}, sessions{j}), '^d.*\.img$');
end
end
generated a cell array that was 112x1, but had a first element that was 56x57 char array, that is, the filenames of all files in my first session directory, and none of them from the second.
I'm not sure how useful it would be to recapitulate every wrong-headed thing I've done, so I won't.
Thanks in advance for your help.
Editing to include sample output from spm_select by request:
>> output = spm_select('FPList', fullfile(data_path,subjects{i}, sessions{j}), '^d.*\.img$')
output =
/home/katie/Desktop/sample/test1/run_1L3/draghf000001.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000035.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000069.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000103.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000137.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000171.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000205.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000239.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000273.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000307.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000341.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000375.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000409.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000443.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000477.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000511.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000545.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000579.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000613.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000647.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000681.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000715.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000749.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000783.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000817.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000851.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000885.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000919.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000953.img
/home/katie/Desktop/sample/test1/run_1L3/draghf000987.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001021.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001055.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001089.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001123.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001157.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001191.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001225.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001259.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001293.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001327.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001361.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001395.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001429.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001463.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001497.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001531.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001565.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001599.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001633.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001667.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001701.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001735.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001769.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001803.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001837.img
/home/katie/Desktop/sample/test1/run_1L3/draghf001871.img
>> class(output)
ans =
char
>> size(output)
ans =
56 57
>>
Edit: Ok, problem solved. Here is the code I eventually used:
data_path = '/foo/bar';
subjects = {'test1'};
sessions = {'session1' 'session2' };
output={};
for i=1:numel(subjects)
for j=1:numel(sessions)
files=spm_select('FPList', fullfile(data_path,subjects{i},sessions{j}), '^d.*\.img$')
f_c=cellstr(files);
output=vertcat(output,f_c);
end
end
I think the answer to "how do you get a char array to append to a column cell array vertically" is: convert it to a cell array with cellstr and use vertcat.
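The same idea in a form that runs without SPM, as a minimal sketch: the two char matrices below are made-up stand-ins for what spm_select returns, with different numbers of rows to mimic a short session.

```matlab
% Hypothetical stand-ins for spm_select output: padded char matrices.
session1 = char('/data/session1/dvol1.img', ...
                '/data/session1/dvol2.img', ...
                '/data/session1/dvol10.img');   % 3 files, rows padded to equal width
session2 = char('/data/session2/dvol1.img', ...
                '/data/session2/dvol2.img');    % 2 files (a short session)

P = {};
for files = {session1, session2}
    % cellstr turns each row into one cell and strips the trailing padding,
    % so sessions of any length stack cleanly into a single column.
    P = vertcat(P, cellstr(files{1}));
end

disp(size(P))
disp(P)
```

Since nothing depends on a fixed count of 56, a session with more or fewer files just contributes more or fewer rows.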
You can try this code:
function output=file_list(path)
output={};
subjects=dir(path);
for a=3:length(subjects)
sessions=dir(fullfile(path,subjects(a).name));
for b=3:length(sessions)
files=dir(fullfile(path,subjects(a).name,'/',sessions(b).name,'/*.img'));
f_c=struct2cell(files);
f=f_c(1,:)';
output=vertcat(output,fullfile(path,subjects(a).name,'/',sessions(b).name,'/',f));
end
end
One drawback of this code is that the size of output grows inside the loop. Here is an example:
path='/home/naveen/Desktop/example/'; % path is the main directory in which the data of
% subjects is stored in sub directories.
output=file_list(path)
The output is:
output =
'/home/naveen/Desktop/example/subject_1/session_1/lipo2.png'
'/home/naveen/Desktop/example/subject_1/session_1/lipo_6.png'
'/home/naveen/Desktop/example/subject_1/session_1/lps_4.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_2.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_2_1.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_2_3.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_4.png'
'/home/naveen/Desktop/example/subject_1/session_2/ltx_6.png'
'/home/naveen/Desktop/example/subject_2/session_1/lipo2.png'
'/home/naveen/Desktop/example/subject_2/session_1/lipo_6.png'
'/home/naveen/Desktop/example/subject_2/session_1/lps_4.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_2.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_2_1.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_2_3.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_4.png'
'/home/naveen/Desktop/example/subject_2/session_2/ltx_6.png'
Hope this works for you. Please note that you have to change the file extension in the innermost for loop to suit your purpose.
