I am trying to convert the following vector:
A =
[
02376R102 ;
21871B206 ;
81765M106 ;
G3156P103]
into strings.
It should give the following result:
['02376R102' ;
'21871B206' ;
'81765M106' ;
'G3156P103']
I cannot use functions such as num2str because my vector contains both letters and numbers.
My ultimate goal is to use mkdir to create directories named after the entries of A:
for i = 1:numel(A)
mkdir('mypath', A(i))
end
but mkdir needs the entries of A to be strings.
Edit:
Sorry for the misspecification. The array I am working with contains CUSIPs (firm identifiers, as used in the CRSP database), which I imported from Excel. When I copy and paste the array, it is exactly:
'02376R102'
'21871B206'
'81765M106'
'G3156P103'
which looks like strings. But when I run the loop that creates the directories:
for i = 1:numel(A)
mkdir('mypath', A(i))
end
MATLAB says that the argument must contain a string.
The full code is:
CUSIP_list = unique(CUSIP)
for i = 1:length(CUSIP_list)
mkdir('C:\Users\Marc-Aurèle\Desktop\MASTER THESIS\DATAS',CUSIP_list(i))
end
You are looking for cells rather than strings. First, I wonder how you loaded your data, because data that mix numbers and letters should be loaded as a cell array or a char array:
A = {
'02376R102' ;
'21871B206' ;
'81765M106' ;
'G3156P103'}
This generates a 4 x 1 cell array.
Alternatively, you can use vertcat to concatenate them into a single char array:
A = vertcat( '02376R102','21871B206')
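which gives a 2 x 9 char array.
I believe you might want to use the second method, after which you can run your loop to create the directories. With the char array, pass one row at a time, mkdir('mypath', A(i,:)). If you keep A (or CUSIP_list) as a cell array, index it with curly braces, mkdir('mypath', A{i}): A(i) returns a 1 x 1 cell rather than the text itself, which is why mkdir complains that the argument must contain a string.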
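For comparison, the same task (one directory per unique CUSIP) is a short loop in Python. This is a minimal sketch, not MATLAB code: the helper name make_cusip_dirs and the base directory are hypothetical, and os.makedirs(..., exist_ok=True) is used so that rerunning it does not fail on folders that already exist.

```python
import os
import tempfile

def make_cusip_dirs(base_dir, cusips):
    """Create one subdirectory of base_dir per unique CUSIP string.

    The helper name and base_dir are illustrative assumptions."""
    created = []
    for cusip in sorted(set(cusips)):        # unique + sorted, like MATLAB's unique()
        path = os.path.join(base_dir, str(cusip).strip())
        os.makedirs(path, exist_ok=True)     # no error if the folder already exists
        created.append(path)
    return created

# Usage: build the folders under a throwaway temporary directory
with tempfile.TemporaryDirectory() as base:
    paths = make_cusip_dirs(base, ["02376R102", "21871B206", "81765M106", "G3156P103"])
    print([os.path.basename(p) for p in paths])
```

Each CUSIP is already a plain string here, which is exactly what MATLAB's A{i} gives you.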
I've got a character variable which holds a delimited list of strings, like so:
data lists;
format list_val $75.;
list_val = "PDC; QRS; OLN; ABC";
run;
I need to alphabetize the elements of each list, so the desired result for the string above is "ABC; OLN; PDC; QRS".
I adapted the solution here for my purposes as follows:
data lists_sorted;
set lists;
array_size = count(list_val,";") + 1; /* Cannot be used as array length must be specified at creation */
array t(50) $ 8 _TEMPORARY_;
call missing(of t(*));
do _n_=1 to array_size;
t(_n_)=scan(list_val,_n_,";");
end;
call sortc(of t(*));
new_list_val =catx("; ", of t(*));
put "original: " list_val " new: " new_list_val;
run;
When I run this code I get the following output:
original: PDC; QRS; OLN; ABC new: ABC; OLN; QRS; PDC
This was neither expected nor desired. In general, applying the code above to any list produces a list that is sorted alphabetically, except that the first element of the original list ends up last, regardless of its alphabetical position.
I can't find anything in the documentation of sortc which would explain this behavior, so I'm wondering if the issue is somehow the way I've set up the temporary array (I don't have much experience with these).
Does anyone know why sortc behaves this way? Side question: is there any way to determine the size of the array dynamically, rather than hard-coding a value such as 50?
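This happens because you included the leading spaces when assigning the values to the array elements. Every item after the first begins with the blank that follows its semicolon, and a blank sorts before any letter, so those items all land ahead of "PDC"; CATX then strips the blanks, which hides the cause. Remove the leading spaces with LEFT():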
t[_n_]=left(scan(list_val,_n_,";"));
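The effect is easy to reproduce outside SAS. The Python sketch below is an analogy, not SAS code (split_like_scan and sort_list are hypothetical helpers): it splits on ";" roughly the way SCAN does, shows that the untrimmed pieces sort with the blank-prefixed items first, and then strips each piece before sorting, the same job LEFT() does in the data step.

```python
def split_like_scan(text, sep=";"):
    """Roughly mimic SCAN: split on sep, drop empty pieces, keep leading blanks."""
    return [piece for piece in text.split(sep) if piece.strip()]

def sort_list(text, sep=";", strip=True):
    pieces = split_like_scan(text, sep)
    if strip:
        pieces = [p.strip() for p in pieces]   # the LEFT() fix
    # Join like CATX, which trims each item before inserting the delimiter
    return "; ".join(p.strip() for p in sorted(pieces))

raw = "PDC; QRS; OLN; ABC"
print(sort_list(raw, strip=False))  # untrimmed: blank-prefixed items sort first
print(sort_list(raw))               # trimmed: fully alphabetical
```

Because the final join trims every item, the output looks clean even in the broken case, which is why the stray blanks are easy to miss.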
To find the minimum array size your data step needs, you would have to process the dataset twice: first compute the maximum number of items, then use it to size the array.
proc sql ;
select max(count(list_val,";") + 1) into :max_size trimmed from lists;
quit;
....
array t[&max_size] $ 8 _temporary_;
But there is probably not much harm in just using some large constant value.
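The two-pass idea (first find the largest item count, then size every array to it) can be sketched in Python as follows. This is only an analogy to the PROC SQL step; the helper names and sample rows are made up.

```python
def max_list_size(rows, sep=";"):
    """First pass: the largest number of sep-delimited items in any row,
    mirroring max(count(list_val, ";") + 1) in the PROC SQL step."""
    return max(row.count(sep) + 1 for row in rows)

def fill_fixed_arrays(rows, sep=";"):
    """Second pass: give every row exactly max_size slots,
    padding unused slots with None (like missing values in t[*])."""
    size = max_list_size(rows, sep)
    out = []
    for row in rows:
        items = [p.strip() for p in row.split(sep)]
        out.append(items + [None] * (size - len(items)))
    return out

rows = ["PDC; QRS; OLN; ABC", "XYZ; DEF"]   # hypothetical sample data
print(max_list_size(rows))
print(fill_fixed_arrays(rows))
```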
I am using a list of integers corresponding to x,y indices of a gridded NetCDF array to extract specific values; the initial code was derived from here. My NetCDF file has a single variable, named TMAX2M, at a single timestep. My code is as follows (note that the netCDF4 import at the top of the script is not shown):
# grid point lists
lat = [914]
lon = [2141]
# Open netCDF File
fh = Dataset('/pathtofile/temperaturedataset.nc', mode='r')
# Variable Extraction
point_list = zip(lat,lon)
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][i,j])
print(dataset_list)
The code executes, and the result is as follows:
[masked_array(data=73, mask=False, fill_value=999999, dtype=int16)]
The data value is correct; however, I would like the output to contain only the integer stored in "data". The goal is to pass a number of x,y points, as in the example linked above, and join the results into a single list.
Any suggestions on what to add to the code to make this achievable would be great.
To extract the value at each x,y point from the single timestep in the dataset, you can do the following:
dataset_list = []
for i, j in point_list:
    dataset_list.append(fh.variables['TMAX2M'][:][i,j])
The linked example used [0,16] to index the variable; in this case, [:] can be used instead.
I suggest converting each value to a NumPy array like this:
import numpy as np

for i, j in point_list:
    dataset_list.append(np.array(fh.variables['TMAX2M'][i,j]))
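If you want plain Python numbers in the list rather than array reprs, you can convert each 0-d masked array explicitly. The sketch below does not open a file: the two np.ma.masked_array values are hypothetical stand-ins for what indexing the TMAX2M variable at one point returns, and mapping masked (missing) points to None is an assumption you may want to change.

```python
import numpy as np

def to_plain_values(values):
    """Convert 0-d (possibly masked) arrays into plain Python scalars.

    Masked points become None; that choice is an assumption."""
    plain = []
    for v in values:
        arr = np.ma.asarray(v)
        if np.ma.is_masked(arr):
            plain.append(None)
        else:
            plain.append(arr.data.item())   # e.g. numpy int16 -> Python int
    return plain

# Hypothetical stand-ins for fh.variables['TMAX2M'][i, j] at two grid points
fake_reads = [
    np.ma.masked_array(73, mask=False, dtype=np.int16),
    np.ma.masked_array(80, mask=True, dtype=np.int16),
]
print(to_plain_values(fake_reads))
```

In the loop from the question, you would append fh.variables['TMAX2M'][i,j] to a list as before and pass that list through to_plain_values.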
I am using the NetCDF package in Julia 0.5.0 to read in the same multidimensional variable from ~10 different netcdf files. Is there a better way to loop through the files and consolidate them into one overarching multidimensional array rather than creating an array of arrays like I currently have?
Currently, my code is set up like:
files = ["file1", "file2", "file3", ... , "file10"]
#length(files) = 10
var = Array{Array}(10)
for i in 1:length(files)
    var[i] = ncread(files[i], "x")
end
where
size(var) = (10,)
size(var[1]) = (192,59,193) #from file1
...
size(var[10]) = (192,59,193) #from file10
which works, but is not the desired format because I later want to take averages along Dimension Y in what are currently sub-arrays. Ideally, I would like to use ncread() to read x into one multidimensional array var, such that the size looks like
size(var) = (10,192,59,193)
where
var[1,:,:,:] #from file1
...
var[10,:,:,:] #from file10
I think hcat() or push!() might be needed, but I'm not sure how to initialize the multidimensional array before the for-loop so that it can hold the ncread() output. I have to do this for ~8 variables in the files, and I don't know the dimensions or lengths of the different variables before calling ncread().
filenames = ["file$i" for i = 1:10]; # make some filenames
ncread(filename) = rand(2,3,4) # define a dummy function similar to yours
a = ncread(filenames[1]) # read the first file to get the size
output = Array{Float64}(length(filenames),size(a)...); # preallocate the full array; look up "splatting" to see how size(a)... works
output[1,:,:,:] = a # assign the data we already read
for i in 2:length(filenames) # read and assign the rest
    output[i,:,:,:] = ncread(filenames[i])
end
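The same read-once-then-preallocate pattern looks like this in Python with NumPy. It is a sketch under assumptions: fake_read is a hypothetical stand-in for a per-file reader such as ncread, and every file is assumed to hold an array of the same shape. The file index becomes the first axis, so averaging over one of the original dimensions is a single call.

```python
import numpy as np

def fake_read(filename):
    """Hypothetical reader: returns a (2, 3, 4) array filled with the file number."""
    number = float(filename[len("file"):])
    return np.full((2, 3, 4), number)

def read_stacked(filenames, reader):
    """Read the first file to learn shape/dtype, preallocate, then fill the rest."""
    first = reader(filenames[0])
    out = np.empty((len(filenames),) + first.shape, dtype=first.dtype)
    out[0] = first
    for k, name in enumerate(filenames[1:], start=1):
        arr = reader(name)
        if arr.shape != first.shape:
            raise ValueError(f"{name}: shape {arr.shape} != {first.shape}")
        out[k] = arr
    return out

filenames = [f"file{i}" for i in range(1, 11)]
stacked = read_stacked(filenames, fake_read)   # shape (10, 2, 3, 4)
y_means = stacked.mean(axis=2)                 # e.g. average over the 2nd axis of each original array
```

If memory is not a concern, np.stack([reader(f) for f in filenames]) builds the same array in one line; the explicit version above makes the preallocation and the shape check visible.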
With NCO, ncecat does this in one command, concatenating the files along a new record dimension:
ncecat in*.nc out.nc
However, doing so much with so little code may cause your head to explode.
I have a very long lookup table (~40,000 lines) that I am using for my code. Currently, I have it set to grab 4 arrays from my lookup table in the subroutine that uses it, but I call that subroutine ~3,000 times. I would rather not waste processing time grabbing this table as arrays repeatedly. Is there a way to grab them in my main program, store them, and source them later in my subroutine?
My current code grabs the lookup table in 4 separate arrays of 39,760 lines, and I am currently calling it like this:
READCOL, 'LookupTable2.txt', F='D,D,D,D',Albedo, Inertia, NightT, DayT
EDIT: I should probably note I have IDL 6.2, but if there is a way to do it in a newer version, I would still appreciate knowing how.
EDIT 2: My current program has a function that saves the 4 arrays and then executes the main function. Can I call my function with arrays as arguments? That way I wouldn't have to keep re-creating the same arrays.
Something like:
Pro
FUNC(Array1, Array2, Var1, Var2, Var3)
END
There are several ways you can do this.
It looks like you have four columns and 40,000 lines, correct?
In that case, you can do the following. For these commands, I assume the ASCII file has no header lines.
FUNCTION read_my_file,file_name
;; Assume FILE_NAME is full path to and including file name with extension
fname = file_name[0]
;; One could also find the file with the following
;; fname = FILE_SEARCH([path to file],[file name with extension])
;; Define the number of lines in the file
nl = FILE_LINES(fname[0])
;; Define empty arrays to fill
col1 = DBLARR(nl[0])
col2 = DBLARR(nl[0])
col3 = DBLARR(nl[0])
col4 = DBLARR(nl[0])
dumb = DBLARR(4)
;; Open file
OPENR,gunit,fname[0],ERROR=err,/GET_LUN
IF (err NE 0) THEN BEGIN
  PRINT, !ERROR_STATE.MSG ;; Print the error message
  RETURN, -1              ;; Stop here: the file could not be opened
ENDIF
FOR n=0L, nl[0] - 1L DO BEGIN
;; Read in file data
READF,gunit,FORMAT='(4d)',dumb
;; Fill arrays
col1[n] = dumb[0]
col2[n] = dumb[1]
col3[n] = dumb[2]
col4[n] = dumb[3]
ENDFOR
;; Close file
FREE_LUN,gunit
;; Define output
output = [[col1],[col2],[col3],[col4]]
;; Return to calling routine
RETURN,output
END
Note that this works more reliably if you give an explicit width in the format statement, e.g., '(4d15.5)', which reads four double-precision fields, each 15 characters wide with 5 decimal places.
This returns col1 through col4 to the user or calling routine as an [N,4]-element array, e.g., col1 = output[*,0]. Alternatively, you could return a structure with one tag per column, or return the columns through keywords.
Then you can pass these arrays to another function/program in the following way:
PRO my_algorithm_wrapper,file_name,RESULTS=results
;; Get data from files
columns = read_my_file(file_name)
;; Pass data to [algorithm] function
results = my_algorithm(columns[*,0],columns[*,1],columns[*,2],columns[*,3])
;; Return to user
RETURN
END
To call this from the command line (after making sure both routines are compiled), you would do something like the following:
IDL> my_algorithm_wrapper,file_name,RESULTS=results
IDL> HELP,results ;; see what the function my_algorithm.pro returned
The above code should work with IDL 6.2.
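The underlying idea (read the table once, then hand the arrays to the routine that is called thousands of times) is language-independent. Here is a Python sketch of it; the sample table values, the column order and the routine day_temperature are hypothetical, and np.interp assumes the albedo column is increasing.

```python
import io
import numpy as np

# Hypothetical 4-column sample standing in for LookupTable2.txt
# (columns: Albedo, Inertia, NightT, DayT; values are made up).
SAMPLE_TABLE = io.StringIO(
    "0.10 100.0 180.0 250.0\n"
    "0.20 200.0 185.0 245.0\n"
    "0.30 300.0 190.0 240.0\n"
)

def load_lookup_table(source):
    """Read the whole table once; return the four columns as 1-D arrays."""
    data = np.loadtxt(source, ndmin=2)     # shape (n_rows, 4)
    return tuple(data[:, k] for k in range(4))

def day_temperature(albedo, albedo_col, day_col):
    """Hypothetical per-call routine: it receives the already-loaded columns
    as arguments and linearly interpolates DayT at the requested albedo."""
    return float(np.interp(albedo, albedo_col, day_col))

albedo, inertia, night_t, day_t = load_lookup_table(SAMPLE_TABLE)
# Call the routine many times without touching the file again
results = [day_temperature(a, albedo, day_t) for a in (0.10, 0.25, 0.30)]
```

Passing the arrays as arguments costs nothing extra per call, since the routine only receives references to data that was loaded once.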
General Notes
Avoid uppercase letters in IDL routine names (and the matching .pro file names): when IDL searches the path for an uncompiled routine, it looks for a lowercase file name, so mixed-case names can fail to resolve on case-sensitive file systems.
You need to name the program/function in the line with the PRO/FUNCTION statement at the beginning of the file. The name must come immediately after the PRO/FUNCTION statement.
It is generally wise to use explicit formatting statements to avoid ambiguities/errors when reading data files.
You can pass any variable type (e.g., scalar integer, array, structure, object, etc.) to programs/functions so long as they are handled appropriately within the program/function.
I am trying to create a string array filled with strings read from a text file like this:
labels = textread(file_name, '%s');
For each string in file_name, I want to put that string into 10 consecutive positions of a final string array, which I will later save to another text file.
In my code, for each string in file_name, I put the string into 10 positions of a temporary cell array and then concatenate that array with the final array:
final_vector='';
for i=1:size(labels)
temp_vector=cell(1,10);
temp_vector{1:10}=labels{i};
final_vector=horzcat(final_vector,temp_vector);
end
But when I run the code the following error appears:
The right hand side of this assignment has too few values to satisfy the left hand side.
Error in my_example_code (line 16)
temp_vector{1:10}=labels{i};
I am new to cell strings in MATLAB and don't really understand what is happening. Do you know what is going on, or do you have a better solution to my problem?
Use deal and put the left hand side in square brackets:
labels{1} = 'Hello World!'
temp_vector = cell(10,1)
[temp_vector{1:10}] = deal(labels{1});
This works because deal can distribute one value to multiple outputs [a,b,c,...]. temp_vector{1:10} alone creates a comma-separated list and putting them into [] creates the output array [temp_vector{1}, temp_vector{2}, ...] which can then be populated by deal.
This happens because you want to distribute one value to 10 cells, but MATLAB expects you to assign 10 values to 10 cells. An alternative approach, perhaps more logical but slower, is:
n = 10;
temp_vector(1:n) = repmat(labels(1),n,1);
I also found another solution:
final_vector='';
for i=1:size(labels)
temp_vector=cell(1,10);
temp_vector(:,1:10)=cellstr(labels{i});
final_vector=horzcat(final_vector,temp_vector);
end
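For comparison, here is the whole operation (repeat every label 10 times and collect the results in one flat list) as a Python sketch. repeat_labels is a hypothetical helper, and reading the labels from the file is left out; building the result in one pass avoids growing the array inside the loop.

```python
from itertools import chain, repeat

def repeat_labels(labels, times=10):
    """Return a flat list in which each label appears `times` times in a row."""
    return list(chain.from_iterable(repeat(label, times) for label in labels))

labels = ["Hello World!", "foo"]   # hypothetical stand-in for the textread output
final_vector = repeat_labels(labels)
print(len(final_vector))
```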