I have a very long lookup table (~40,000 lines) that I am using for my code. Currently, I have it set to grab 4 arrays from my lookup table in the subroutine that uses it, but I call that subroutine ~3,000 times. I would rather not waste processing time grabbing this table as arrays repeatedly. Is there a way to grab them in my main program, store them, and source them later in my subroutine?
My current code grabs the lookup table in 4 separate arrays of 39,760 lines, and I am currently calling it like this:
READCOL, 'LookupTable2.txt', F='D,D,D,D',Albedo, Inertia, NightT, DayT
EDIT: I should probably note I have IDL 6.2, but if there is a way to do it in a newer version, I would still appreciate knowing how.
EDIT 2: My current program has a function that saves 4 arrays and executes the main function. Can I call my function with arrays as arguments? That way I wouldn't have to keep creating the same arrays.
Something like:
PRO main
;; Array1, Array2, Var1, Var2, Var3 are created earlier in the program
result = FUNC(Array1, Array2, Var1, Var2, Var3)
END
There are several ways you can do this.
It looks like you have four columns and 40,000 lines, correct?
Then you can do the following. First, I will assume there is no header data in the ASCII file for the following commands.
FUNCTION read_my_file,file_name
;; Assume FILE_NAME is full path to and including file name with extension
fname = file_name[0]
;; One could also find the file with the following
;; fname = FILE_SEARCH([path to file],[file name with extension])
;; Define the number of lines in the file
nl = FILE_LINES(fname[0])
;; Define empty arrays to fill
col1 = DBLARR(nl[0])
col2 = DBLARR(nl[0])
col3 = DBLARR(nl[0])
col4 = DBLARR(nl[0])
dumb = DBLARR(4)
;; Open file
OPENR,gunit,fname[0],ERROR=err,/GET_LUN
IF (err NE 0) THEN MESSAGE, !ERROR_STATE.MSG ;; Stop with the error message instead of reading from an unopened unit
FOR n=0L, nl[0] - 1L DO BEGIN
;; Read in file data
READF,gunit,FORMAT='(4d)',dumb
;; Fill arrays
col1[n] = dumb[0]
col2[n] = dumb[1]
col3[n] = dumb[2]
col4[n] = dumb[3]
ENDFOR
;; Close file
FREE_LUN,gunit
;; Define output
output = [[col1],[col2],[col3],[col4]]
;; Return to calling routine
RETURN,output
END
Note that this will work better if you provide an explicit width in the format statement, e.g., '(4d15.5)', which means four 15-character fields, each with 5 decimal places.
This will return col1 through col4 to the user or calling routine as an [N,4]-element array, e.g., col1 = output[*,0]. You could use a structure where each tag contains one of the colj arrays or you could return them through keywords.
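For comparison, here is a minimal Python sketch of the same read-once idea. It assumes a headerless file with four whitespace-separated numeric columns per line; the table contents in the demo are made up. Like read_my_file, it hands back the four columns, but Python's open raises an error if the file is missing instead of carrying on.

```python
import os
import tempfile


def read_my_file(file_name):
    """Read a headerless file with four whitespace-separated numeric columns.

    Returns (col1, col2, col3, col4), analogous to output[*,0] .. output[*,3].
    """
    cols = ([], [], [], [])
    with open(file_name) as f:  # raises OSError if the file cannot be opened
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            values = [float(x) for x in line.split()]
            if len(values) != 4:
                raise ValueError(f"expected 4 columns, got {len(values)}: {line!r}")
            for col, value in zip(cols, values):
                col.append(value)
    return cols


# Demo with a small, made-up table written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0.10 50.0 180.0 300.0\n0.20 60.0 185.0 295.0\n")
    path = tmp.name
albedo, inertia, night_t, day_t = read_my_file(path)
os.remove(path)
```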
Then you can pass these arrays to another function/program in the following way:
PRO my_algorithm_wrapper,file_name,RESULTS=results
;; Get data from files
columns = read_my_file(file_name)
;; Pass data to [algorithm] function
results = my_algorithm(columns[*,0],columns[*,1],columns[*,2],columns[*,3])
;; Return to user
RETURN
END
To call this from the command line (after making sure both routines are compiled), you would do something like the following:
IDL> my_algorithm_wrapper,file_name,RESULTS=results
IDL> HELP,results ;; see what the function my_algorithm.pro returned
The above code should work with IDL 6.2.
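The point of the wrapper is that the table is read once and the arrays are then passed as ordinary arguments. A minimal Python sketch of that pattern follows. The routine my_algorithm and its arithmetic are hypothetical stand-ins, the table rows are made up, and the counter exists only to show that the parse happens once even though the routine runs about 3,000 times.

```python
parse_count = 0  # how many times the table text has been parsed


def load_columns(text):
    """Parse whitespace-separated rows into four column lists (done once)."""
    global parse_count
    parse_count += 1
    rows = [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]
    return tuple(list(col) for col in zip(*rows))


def my_algorithm(albedo, inertia, night_t, day_t, i):
    """Hypothetical per-call routine: it only reads the arrays it is given."""
    return night_t[i] - day_t[i]


table_text = "0.10 50.0 180.0 300.0\n0.20 60.0 185.0 295.0\n"  # made-up rows
albedo, inertia, night_t, day_t = load_columns(table_text)  # read once
results = [my_algorithm(albedo, inertia, night_t, day_t, k % len(albedo))
           for k in range(3000)]  # many calls, no re-reading
```

Python passes the list objects themselves, so no copy of the table is made per call.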
General Notes
Try to avoid uppercase letters in IDL routine and file names: when IDL searches for a routine's .pro file during a call or compilation, it looks for the lowercase file name, which can fail on case-sensitive file systems.
You need to name the program/function in the line with the PRO/FUNCTION statement at the beginning of the file. The name must come immediately after the PRO/FUNCTION statement.
It is generally wise to use explicit formatting statements to avoid ambiguities/errors when reading data files.
You can pass any variable type (e.g., scalar integer, array, structure, object, etc.) to programs/functions so long as they are handled appropriately within the program/function.
I've managed to answer my own question. This code will write cell arrays of any shape containing strings. The datasets can be modified/overwritten by simply calling again with a different input.
https://www.mathworks.com/matlabcentral/fileexchange/24091-hdf5-read-write-cellstr-example
%Okay, Matlab's h5write(filename, dataset, data) function doesn't work for
%strings. It hasn't worked with strings for years. The forum post that
%comes up first in Google about it is from 2009. Yeah. This is terrible,
%and evidently it's not getting fixed. So, low level functions. Fun fun.
%
%What I've done here is adapt examples, one from the hdf group's website
%https://support.hdfgroup.org/HDF5/examples/api18-m.html called
%"Read / Write String Datatype (Dataset)", the other by Jason Kaeding.
%
%I added functionality to check whether the file exists and either create
%it anew or open it accordingly. I wanted to be able to likewise check the
%existence of a dataset, but it looks like this functionality doesn't exist
%in the API, so I'm doing a try-catch to achieve the same end. Note that it
%appears you can't just create a dataset or group deep in a hierarchy: You
%have to create each level. Since I wanted to accept dataset names in the
%same format as h5read(), in the event the dataset doesn't exist, I loop
%over the parts of the dataset's path and try to create all levels. If they
%already exist, then this action throws errors too; hence a second
%try-catch.
%
%I've made it more advanced than h5create()/h5write() in that it all
%happens in one call and can accept data inputs of variable size. I take
%care of updating the dataset's extent to accommodate changing data array
%sizes. This is important for applications like adding a new timestamp
%every time the file is modified.
%
%#author Pavel Komarov pavel#gatech.edu 941-545-7573
function h5createwritestr(filename, dataset, str)
%"The class of input data must be cellstring instead of char when the
%HDF5 class is VARIABLE LENGTH H5T_STRING.", but also I don't want to
%force the user to put braces around single strings, so this.
if ischar(str)
str = {str};
end
%check whether the specified .h5 exists and either create or open
%accordingly
if ~exist(filename, 'file')
file = H5F.create(filename, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
else
file = H5F.open(filename, 'H5F_ACC_RDWR', 'H5P_DEFAULT');
end
%set variable length string type
vlstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(vlstr_type,'H5T_VARIABLE');
% There is no way to check whether a dataset exists, so just try to
% open it, and if that fails, create it.
try
dset = H5D.open(file, dataset);
H5D.set_extent(dset, fliplr(size(str)));
catch
%create the intermediate groups one at a time because evidently the
%API's functions aren't smart enough to be able to do this themselves.
slashes = strfind(dataset, '/');
for i = 2:length(slashes)
url = dataset(1:(slashes(i)-1));%pull out the url of the next level
try
H5G.create(file, url, 1024);%1024 "specifies the number of bytes to reserve for the names that will appear in the group"
catch %the group already exists, so ignore the error
end
end
%create a dataspace for cellstr
H5S_UNLIMITED = H5ML.get_constant_value('H5S_UNLIMITED');
spacerank = max(1, sum(size(str) > 1));
dspace = H5S.create_simple(spacerank, fliplr(size(str)), ones(1, spacerank)*H5S_UNLIMITED);
%create a dataset plist for chunking. (A dataset can't be unlimited
%unless the chunk size is defined.)
plist = H5P.create('H5P_DATASET_CREATE');
chunksize = ones(1, spacerank);
chunksize(1) = 2;
H5P.set_chunk(plist, chunksize);% 2 strings per chunk
dset = H5D.create(file, dataset, vlstr_type, dspace, plist);
%close things
H5P.close(plist);
H5S.close(dspace);
end
%write data
H5D.write(dset, vlstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', str);
%close file & resources
H5T.close(vlstr_type);
H5D.close(dset);
H5F.close(file);
end
I found a bug! The spacerank line should read:
spacerank = length(size(str));
Now it works flawlessly as far as I can tell.
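The intermediate-group loop in h5createwritestr (strfind on '/', then cutting the path just before each later slash) is easy to get off by one. Here is a Python sketch of the same prefix computation with 0-based indices, assuming an absolute dataset path such as '/a/b/c':

```python
def intermediate_groups(dataset):
    """Return each parent group of an absolute path, outermost first.

    '/a/b/c' -> ['/a', '/a/b'], the same levels the MATLAB loop creates.
    """
    slashes = [i for i, ch in enumerate(dataset) if ch == "/"]
    # Skip the leading '/', then cut the path just before each later slash.
    return [dataset[:i] for i in slashes[1:]]
```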
I am trying to convert the following vector:
A =
[
02376R102 ;
21871B206 ;
81765M106 ;
G3156P103]
into strings.
It should give the following result:
['02376R102' ;
'21871B206' ;
'81765M106' ;
'G3156P103']
I cannot use functions such as num2str because my vector is composed of both letters and numbers...
The ultimate goal is that I want to use the function mkdir to create directories with names in the vector A.
for i = 1:numel(A)
mkdir('mypath', A(i))
end
but mkdir needs the elements of A to be strings...
Thank you a lot for your help
EDIT:
Sorry for the misspecification. The arrays I am working with are CUSIPs (firm codes from the CRSP database), which I imported from Excel. When I copy and paste the array, it looks exactly like this:
'02376R102'
'21871B206'
'81765M106'
'G3156P103'
Which looks like strings... But when I try the loop to create the directories
for i = 1:numel(A)
mkdir('mypath', A(i))
end
MATLAB says that the argument must contain a string.
The full code is the following :
CUSIP_list = unique(CUSIP)
for i = 1:length(CUSIP_list)
mkdir('C:\Users\Marc-Aurèle\Desktop\MASTER THESIS\DATAS',CUSIP_list(i))
end
You are looking for cells instead of strings. First, I wonder how you loaded your data, because data that mixes numbers and letters should be loaded as a cell array or a char array:
A = {
'02376R102' ;
'21871B206' ;
'81765M106' ;
'G3156P103'}
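This creates a 4-by-1 cell array. Index it with braces, A{i}, to get each element as a string; parentheses, A(i), return a 1-by-1 cell, which is why mkdir complains.
Alternatively, you can use vertcat to concatenate equal-length strings into a single char array:
A = vertcat('02376R102','21871B206')
which gives you a 2-by-9 char array. Each row is one string, so index it with A(i,:).
I believe you might want to use the second method, after which you can run your loop to create the directories.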
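A minimal Python sketch of the end goal, one directory per unique CUSIP, assuming the IDs are already plain strings (the equivalent of indexing the MATLAB cell array with braces). The base directory here is a temporary one standing in for your real output folder.

```python
import os
import tempfile


def make_id_dirs(base, ids):
    """Create one subdirectory of `base` for each unique ID string."""
    created = []
    for cusip in sorted(set(ids)):  # unique and sorted, like unique(CUSIP)
        path = os.path.join(base, cusip)  # cusip is a plain str, not a container
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


base = tempfile.mkdtemp()  # stand-in for your real output folder
cusips = ["02376R102", "21871B206", "81765M106", "G3156P103", "02376R102"]
made = make_id_dirs(base, cusips)
```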
I am having a bit of trouble with a specific file I/O task in MATLAB. I am still fairly new to it, so some things are still a bit of a mystery to me. The input file is structured as follows:
File Name: Processed_kplr003942670-2010174085026_llc.fits.txt
File contents: 6 header lines, then:
1, 2, 3
1, 2, 3
basically a matrix of about 1443-by-3 values.
now here is the matrix that I'm comparing it to:
[(0123456, 1, 2, 3), (0123456, 2, 3, 4), (etc..)]
Now here is my problem: first, I need to know how to read the file in a way that lets me compare the ID number (0123456) in the file name with the ID value in the matrix, so that I can compare the other columns of both. I do not know how to achieve this in MATLAB. Furthermore, I need to be able to loop over every row in the matrix that matches the specific file, for example:
If I have 15 files ranging from 'Processed_0123456_1' to 'Processed_0123456_15', then I want to be able to read in the values contained in 'Processed_0123456_1' and compare them to ANY row in the matrix that corresponds to that ID (0123456). I don't know if accumarray can be used for this, but as I said, I'm not sure.
So the code must:
-Read in file
-Compare file to any point in the matrix with corresponding ID
-Do operations
-Loop over until full list of files in the directory are read in and processed, and output a matrix with the results.
Thanks for any help.
EDIT: Exact File Sample--
Kepler I.D.-----Channel
[1161345]--------[84]
-TTYPE1--------TTYPE8------------TTYPE4
['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']
['BJD - 2454833']--['e-/s']--------['e-/s']
CROWDSAP --- 0.9791
630.195880143,277165.0,268233.0
630.216312946,277214.0,268270.0
630.23674585,277239.0,268293.0
630.257178554,277296.0,268355.0
630.277611357,277294.0,268364.0
630.29804426,277365.0,268441.0
630.318476962,277337.0,268419.0
630.338909764,277403.0,268481.0
630.359342667,277389.0,268463.0
630.379775369,277441.0,268508.0
630.40020817,277545.0,268604.0
There are more entries than what was just posted but they go for about 1000 lines so it is impractical to post that all here.
To get the file ID, use regular expressions, e.g.:
filename = 'Processed_0123456_1';
file_id_str = regexprep(filename, 'Processed_(\d+)_\d+', '$1');
file_num_str = regexprep(filename, 'Processed_\d+_(\d+)', '$1')
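The same extraction with Python's re module, as a sketch assuming names of the form Processed_<ID>_<n> like the question's example (the real file name shown in the question, Processed_kplr..._llc.fits.txt, would need a different pattern). Keeping the ID as a string preserves the leading zero.

```python
import re

# Names like 'Processed_0123456_1': group 1 is the ID, group 2 the file number.
NAME_RE = re.compile(r"Processed_(\d+)_(\d+)")


def parse_name(filename):
    """Return (ID string, file number) or raise ValueError for other names."""
    m = NAME_RE.fullmatch(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return m.group(1), int(m.group(2))


file_id_str, file_num = parse_name("Processed_0123456_1")
```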
To read in the file contents, assuming that it's all comma-separated values without a header, use textscan, e.g.,
fid = fopen(filename);
C = textscan(fid, '%f,%f,%f'); % Use as many %f specifiers as you have entries per line in the file
fclose(fid);
textscan also works on strings. So, for example, if your file contents were:
filestr = sprintf('1, 2, 3\n1, 3, 3')
Then running textscan on filestr works like this:
C = textscan(filestr, '%f,%f,%f')
C =
[2x1 double] [2x1 double] [2x1 double]
You can convert that to a matrix using cell2mat:
cell2mat(C)
ans =
1 2 3
1 3 3
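For reference, a stdlib-only Python sketch of what textscan followed by cell2mat does here, assuming comma-separated numbers with optional spaces and no header:

```python
def parse_csv_numbers(text):
    """Parse lines like '1, 2, 3' into a list of float rows."""
    rows = []
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            rows.append([float(x) for x in line.split(",")])  # float() ignores spaces
    return rows


C = parse_csv_numbers("1, 2, 3\n1, 3, 3")
```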
You could then repeat this procedure for all files with the same ID and concatenate them into a single matrix, e.g.,
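C_full = [];
files = dir('Processed_0123456_*'); % all files with the same ID, assumed to be in the current folder
for k = 1:numel(files)
    fid = fopen(files(k).name);
    C = cell2mat(textscan(fid, '%f,%f,%f'));
    fclose(fid);
    C_full = [C_full; C]; % append this file's rows
end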
Then you can look for what you want in C_full.
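The concatenation loop can be sketched in Python as follows. It assumes files named Processed_<ID>_<n> with no extension and no header, all in one directory (a temporary directory with made-up contents in the demo), and sorts by the numeric suffix so that _10 comes after _2.

```python
import glob
import os
import tempfile


def read_rows(path):
    """Parse one headerless comma-separated file into float rows."""
    with open(path) as f:
        return [[float(x) for x in line.split(",")] for line in f if line.strip()]


def concat_id(directory, file_id):
    """Concatenate rows from all Processed_<file_id>_<n> files in `directory`."""
    paths = glob.glob(os.path.join(directory, f"Processed_{file_id}_*"))
    paths.sort(key=lambda p: int(p.rsplit("_", 1)[1]))  # numeric order: 2 before 10
    full = []
    for p in paths:
        full.extend(read_rows(p))  # like C_full = [C_full; C]
    return full


# Demo: three small made-up files for one ID.
demo_dir = tempfile.mkdtemp()
for n, body in [(1, "1, 2, 3\n"), (10, "7, 8, 9\n"), (2, "4, 5, 6\n")]:
    with open(os.path.join(demo_dir, f"Processed_0123456_{n}"), "w") as f:
        f.write(body)
C_full = concat_id(demo_dir, "0123456")
```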
Update based on the updated question (Dec 12, 2013)
Here's code to read the values from a single file. Wrap all of this in the loop that I mentioned above to loop over all your files and read them into a single matrix.
fid = fopen('/path/to/file');
% Skip over 12 header lines
for kk = 1:12
fgetl(fid);
end
% Read in values to a matrix
C = textscan(fid, '%f,%f,%f');
C = cell2mat(C);
fclose(fid);
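A Python sketch of the same skip-then-parse step. The number of header lines is a parameter: the answer above skips 12, while the sample posted in the question shows six, which is what the demo uses (together with the first two data rows from that sample).

```python
import io


def read_after_header(f, n_header):
    """Skip n_header lines of the open file f, then parse comma-separated rows."""
    for _ in range(n_header):
        f.readline()  # discard one header line, like fgetl(fid)
    return [[float(x) for x in line.split(",")] for line in f if line.strip()]


sample = io.StringIO(
    "Kepler I.D.-----Channel\n"
    "[1161345]--------[84]\n"
    "-TTYPE1--------TTYPE8------------TTYPE4\n"
    "['TIME']---['PDCSAP_FLUX']---['SAP_FLUX']\n"
    "['BJD - 2454833']--['e-/s']--------['e-/s']\n"
    "CROWDSAP --- 0.9791\n"
    "630.195880143,277165.0,268233.0\n"
    "630.216312946,277214.0,268270.0\n"
)
rows = read_after_header(sample, 6)
```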
I think your requirements are too complicated to write the whole script here. Nonetheless, I will try to give some pointers to help. Disclaimer: None of this is tested, just my best guess. Please expect syntax errors, etc. I hope you can figure them out :-)
1) You can use the textscan function with the delimiter option to get data from the lines of your file. Since your format varies as it does, we will probably want to use...
2) ... fgetl to read the first two lines into strings and process them separately using textscan. Such an operation might look like:
fid = fopen('file.txt','r'); % 'r' opens for reading; 'w' would erase the file
tline1 = fgetl(fid);
tline2 = fgetl(fid);
fclose(fid);
C1 = textscan(tline1,'%s %d %s','delimiter','_'); %C1{2} will be the integer we want
C2 = textscan(tline2,'%s %s','delimiter',':'); %C2{2} will be the values we want, but they're still a string so...
mat = str2num(C2{2}{1}); % C2{2} is a cell array, so take its first string
3) Then, for the rest of the lines, we can use something like dlmread:
mat2 = dlmread('file.txt',',',2,0);
The 2,0 specifies the offset in 0-based rows,columns from the start of the file. You may need to look at something like vertcat to stitch mat and mat2 together.
4) The list of files in the directory can be found with the dir command. The file name is the name field of each element of the returned structure array:
dirlist = dir;
for i = 1:length(dirlist)
    if dirlist(i).isdir, continue; end % skip '.', '..', and subdirectories
    filename = dirlist(i).name;
    %process your files
end
You can also pass matching strings to dir, like so:
dirlist = dir('*.txt');
which will find all of the files with extension .txt.
5) You can very easily loop through the comparison matrix:
sze = size(comparisonmatrix);
for i = 1:sze(1)
%compare comparisonmatrix(i,1) to C1{2}
%Perform whatever operations you need
end
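A Python sketch of picking out the comparison rows for one file's ID, rather than testing every row inside the loop. It assumes the first column of the comparison matrix holds the numeric ID, so '0123456' from the file name is compared as the number 123456; the third row of the demo matrix is made up.

```python
def rows_for_id(comparison, file_id_str):
    """Return the rows of `comparison` whose first column equals the file's ID."""
    target = int(file_id_str)  # '0123456' -> 123456, matching a numeric column
    return [row for row in comparison if row[0] == target]


comparison = [
    (123456, 1, 2, 3),
    (123456, 2, 3, 4),
    (7654321, 5, 6, 7),  # made-up row with a different ID
]
matches = rows_for_id(comparison, "0123456")
```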
Hope that helps!
I've embedded Lua into my C application, and am trying to figure out why a table created in my C code via:
lua_createtable(L, 0, numObjects);
and returned to Lua, will produce a result of zero when I call the following:
print("Num entries", table.getn(data))
(Where "data" is the table created by lua_createtable above)
There's clearly data in the table, as I can walk over each entry (string : userdata) pair via:
for key, val in pairs(data) do
...
end
But why does table.getn(data) return zero? Do I need to set something in the table's metatable when I create it with lua_createtable? I've been looking at examples of lua_createtable use, and I haven't seen this done anywhere....
table.getn (which you shouldn't be using in Lua 5.1+; use the length operator # instead) returns the number of elements in the array part of the table.
The array part consists of the consecutive integer keys 1, 2, 3, ... up to the first key whose value is nil (not present). If all of your keys are strings, the size of the array part of your table is 0.
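To make the rule concrete, here is a simplified model in Python that treats a Lua table as a dict, with a missing key or None playing the role of nil. It counts consecutive integer keys from 1 and never counts string keys, which is why a table holding only string keys reports 0. It does not model the border semantics of tables with holes.

```python
def sequence_length(t):
    """Count keys 1, 2, 3, ... present (and not None) in dict t.

    Simplified model of Lua's # / table.getn on a table without holes;
    string keys are never counted.
    """
    n = 0
    while t.get(n + 1) is not None:
        n += 1
    return n


string_keyed = {"spam": "data1", "egg": "data2"}
array_like = {1: "a", 2: "b", 3: "c"}
```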
Although it's costly (O(n) vs O(1) for simple lists), you can also add a function that counts the elements of your map:
>> function table.map_length(t)
local c = 0
for k,v in pairs(t) do
c = c+1
end
return c
end
>> a = {spam="data1",egg='data2'}
>> table.map_length(a)
2
If you have such requirements, and if your environment allows it, consider using Penlight, which provides that kind of feature and much more.
The # operator (and table.getn) effectively returns the size of the array section (though when you have a holey table, the semantics are more complex).
It does not count anything in the hash part of the table (eg, string keys)
local count = 0
for k,v in pairs(tbl) do count = count + 1 end
I've made an M-file that outputs data into my MATLAB command window in the form below (I've deleted the extra spaces). Is there an easy way to convert this output into an array, without having to type all the data into the array editor as I'm currently doing? (Or even run it straight from the M-file into an array?)
T = 0.3000
price = 24.8020
T = 0.4000
price = 28.3453
T = 0.5000
price = 31.3934
T = 0.6000
price = 34.0880
Organize your data into arrays.
For example:
T=0.3:0.1:0.6;
Price=yourfunction(T);
Then if you want a price vs T graph,
plot(T,Price)
If you've got a large amount of data, try to avoid for loops, as they're slower than vectorized code.
At some point in your M-file you are printing each line of data to the command window, presumably using DISP or FPRINTF. You can replace that line with the following:
data = [data; T price];
Where T and price are the variables holding your data. Every time you call the above line (say, in a loop) it will append your data as a new row to the variable data. At some point at the beginning of your M-file, you would therefore have to add the following initialization:
data = []; %# An empty array
Appending values to an array like this can sometimes be inefficient, so if you already know ahead of time how many rows of data you will collect you can instead preallocate data with a given size. For example, if you know you will have 4 pairs of values for T and price, you can initialize data in the following way:
data = zeros(4,2); %# A 4-by-2 array of zeros
Then, when you add data to the array you would instead do the following:
data(i,:) = [T price]; %# Fill row i with data
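A Python sketch of the preallocate-then-fill pattern, assuming a hypothetical price function (here the T.^2 placeholder from the last answer) and the four T values 0.3 to 0.6:

```python
def collect(price_fn, t_values):
    """Preallocate an n-by-2 table, then fill row i with [T, price]."""
    data = [[0.0, 0.0] for _ in t_values]  # like zeros(n, 2); one list per row
    for i, t in enumerate(t_values):
        data[i] = [t, price_fn(t)]  # like data(i,:) = [T price]
    return data


t_values = [0.3 + 0.1 * k for k in range(4)]  # 0.3, 0.4, 0.5, 0.6 (up to rounding)
data = collect(lambda t: t ** 2, t_values)  # placeholder price function
```

In Python, appending to a list is cheap, so preallocation matters much less than in MATLAB; the sketch only mirrors the structure of the MATLAB code.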
Another issue to consider is whether your M-file is a script or a function. A function M-file has a line like function output = file_name(input) at the top, whereas a script M-file does not. Running a script M-file is equivalent to typing the entire contents of the file directly into the MATLAB command window, so all of the variables created in the M-file are available in the workspace.
If you are using a function M-file, all variables created are local to the function, and any you want to use in the workspace will have to be passed as outputs from the function. For example, the top line of your function M-file could look like:
function data = your_file
where your_file is the name of the M-file and data is a variable being returned. When you call this function from the workspace you would then have to capture the output in a variable:
outputData = your_file();
Now you have the contents of the variable data from your_file stored as a new variable outputData in the workspace.
Why do you print out data instead of collecting it in an array?
M = [];
for T = 0.3:0.1:0.6 % or whatever your T values are
    price = T^2; % replace with your price calculation
    M(end+1, :) = [T, price];
end
or, more idiomatically,
M = 0.3:0.1:0.6; % or whatever your T values should be
M = [M' (M'.^2)] % replace the .^2 by your price function