Looping through a folder and all its subdirectories - loops

Ok. Many troubleshooting hours and many error "dings" later, I'm still having the same problem. Because of my beginner skills, I can't get the following part of my project to work.
I will be as detailed as possible so I can hopefully nail it this time:
On my computer I have a folder C:\data which contains many different subfolders.
The subfolders are named by date in MMDDYY format, for example "040312".
Each subfolder contains Excel files named after baseball teams. Each subfolder may contain a different combination of xls files.
I am trying to write code that achieves the following objectives:
1.) Loop through all the subfolders of the C:\data folder looking for xls files with the filenames Angels.xls, Diamondbacks.xls, etc.
2.) If a file is found in a subfolder, import the spreadsheet data and generate a plot of the data columns titled "Score" and "Allow".
3.) If a file is not found in a given subfolder, skip it and continue to the next file to be located.
4.) Then save the generated plot, as a .fig and a .bmp file, in the same folder that the spreadsheet was imported from.
I've gotten hints to use various functions like genpath and dir, but the code I've been fumbling through isn't able to achieve my goals:
a) the script doesn't import the Excel files from all the subfolders
b) the script won't save the .fig or .bmp file in the associated subfolder
Here is the code I have been fumbling through:
%I know all of this is wrong wrong wrong. Please help to adjust my code to
%achieve the objectives outlined above!
addpath(genpath('c:\data'))
folder = 'c:\data';
subdirs = dir(folder);
subdirs(~[subdirs.isdir]) = [] ;
numberOfFolders = length(subdirs);
if numberOfFolders <= 0
    uiwait(warndlg('Number of folders = 0!'))
end
wantedfiles = {'Angels' 'Diamondbacks' 'Orioles' 'Royals' 'Yankees' 'Mets' 'Giants'};
for K = 1 : numberOfFolders
    thissubdir = subdirs(K).name;
    if strcmp(thissubdir, '.') || strcmp(thissubdir, '..')
        continue;
    end
    subdirpath = [folder '\' thissubdir];
    for L = 1 : length(wantedfiles)
        for wantedfiles = {'Angels' 'Diamondbacks' 'Orioles' 'Royals' 'Yankees' 'Mets' 'Giants'};
            folder = '';
            fileToRead1 = [wantedfiles{1} '.xls'];
            sheetName='Sheet1';
            if exist(fileToRead1, 'file') == 0
                % File does not exist
                % Skip to bottom of loop and continue with the loop
                continue;
            end
            %This is to import the data and organize it
            % All of this code I had auto-generated from importing files manually
            [numbers, strings, raw] = xlsread(fileToRead1, sheetName);
            if ~isempty(numbers)
                newData1.data = numbers;
            end
            if ~isempty(strings) && ~isempty(numbers)
                [strRows, strCols] = size(strings);
                [numRows, numCols] = size(numbers);
                likelyRow = size(raw,1) - numRows;
                % Break the data up into a new structure with one field per column.
                if strCols == numCols && likelyRow > 0 && strRows >= likelyRow
                    newData1.colheaders = strings(likelyRow, :);
                end
            end
            % Create new variables in the base workspace from those fields.
            for i = 1:size(newData1.colheaders, 2)
                assignin('base', genvarname(newData1.colheaders{i}), newData1.data(:,i));
            end
            % Now I execute the plotting of data
            subplot (2,1,1), plot(Score,Allow)
            title([wantedfiles{1} 'Testing to see if it works']);
            subplot (2,1,2), plot(Allow,Score)
            title('Well, did it?');
            % here I save the generated plots, but they don't save where I want them to
            saveas(gcf,[wantedfiles{1} ' did it work.fig']);
            saveas(gcf,[wantedfiles{1} ' did it work.bmp']);
        end
    end
end
%At the end of the script I still was unable to loop over the files that I wanted
rmpath(genpath('c:\data'));

Related

Change ID in multiple FASTA files

I need to rename multiple sequences in multiple FASTA files, and I found this script that does so for a single ID:
from Bio import SeqIO

original_file = "./original.fasta"
corrected_file = "./corrected.fasta"

with open(original_file) as original, open(corrected_file, 'w') as corrected:
    records = SeqIO.parse(original, 'fasta')
    for record in records:
        print(record.id)
        if record.id == 'foo':
            record.id = 'bar'
            record.description = 'bar' # <- Add this line
            print(record.id)
        # Write every record, renamed or not, to the corrected file
        SeqIO.write(record, corrected, 'fasta')
Each FASTA file corresponds to a single organism, but the organism is not specified in the IDs. I also have the original FASTA files (mine are translated versions of them), which have the same filenames but live in different directories, and their IDs include the name of each organism.
I want to figure out how to loop through all these FASTA files and rename each ID in each file with the corresponding organism name.
OK, here is my effort. I had to use my own input folders/files since they were not specified in the question.
/old folder contains files :
MW628877.1.fasta :
>MW628877.1 Streptococcus agalactiae strain RYG82 DNA gyrase subunit A (gyrA) gene, complete cds
ATGCAAGATAAAAATTTAGTAGATGTTAATCTAACTAGTGAAATGAAAACGAGTTTTATCGATTACGCCA
TGAGTGTCATTGTTGCTCGTGCACTTCCAGATGTTAGAGATGGTTTAAAACCTGTTCATCGTCGTATTTT
>KY347969.1 Neisseria gonorrhoeae strain 1448 DNA gyrase subunit A (gyrA) gene, partial cds
CGGCGCGTACCGTACGCGATGCACGAGCTGAAAAATAACTGGAATGCCGCCTACAAAAAATCGGCGCGCA
TCGTCGGCGACGTCATCGGTAAATACCACCCCCACGGCGATTTCGCAGTTTACGGCACCATCGTCCGTAT
MG995190.1.fasta :
>MG995190.1 Mycobacterium tuberculosis strain UKR100 GyrA (gyrA) gene, complete cds
ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCCAGCAGGAGA
TGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCGGAGGTGCGCGACGG
and an /empty folder.
/new folder contains files :
MW628877.1.fasta :
>MW628877.1
MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRI
>KY347969.1
RRVPYAMHELKNNWNAAYKKSARIVGDVIGKYHPHGDFAVYGTIVR
MG995190.1.fasta :
>MG995190.1
MTDTTLPPDDSLDRIEPVDIQQEMQRSYIDYAMSVIVGRALPEVRD
My code is:
from Bio import SeqIO
from os import scandir

old = './old'
new = './new'

# Map each record ID to the organism name (words 2 and 3 of the header)
old_ids_dict = {}

for filename in scandir(old):
    if filename.is_file():
        print(filename)
        for seq_record in SeqIO.parse(filename, "fasta"):
            old_ids_dict[seq_record.id] = ' '.join(seq_record.description.split(' ')[1:3])

print('_____________________')
print('old ids ---> ',old_ids_dict)
print('_____________________')

for filename in scandir(new):
    if filename.is_file():
        sequences = []
        for seq_record in SeqIO.parse(filename, "fasta"):
            if seq_record.id in old_ids_dict.keys():
                print('### ', seq_record.id,' ', old_ids_dict[seq_record.id])
                seq_record.id += '.'+old_ids_dict[seq_record.id]
                seq_record.description = ''
            print('-->', seq_record.id)
            print(seq_record)
            sequences.append(seq_record)
        # Overwrites the file in the new folder with the renamed records
        SeqIO.write(sequences, filename, 'fasta')
Check how it works; note that it actually overwrites both files in the new folder.
As pointed out by @Vovin in his comment, it needs to be adapted to your own file template (from-to).
I am sure there is more than one way to do this, probably better and more Pythonic than my way; I am learning too. Let us know.
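For comparison, here is a minimal sketch of the same two-pass idea using only the standard library, in case Biopython is not available. It makes several assumptions that are not in the question: the files end in .fasta, the organism name is the second and third word of each header in the old folder, and the renamed files go to a separate output folder instead of overwriting the input. The function names organism_by_id and rename_ids are made up for illustration, and the organism words are joined with an underscore so the new ID contains no spaces.

```python
from pathlib import Path


def organism_by_id(old_dir):
    """Map each record ID to 'Genus_species', read from the headers in old_dir."""
    mapping = {}
    for path in sorted(Path(old_dir).glob("*.fasta")):
        for line in path.read_text().splitlines():
            if line.startswith(">"):
                fields = line[1:].split()
                # Assumption: header looks like '>ID Genus species ...'
                if len(fields) >= 3:
                    mapping[fields[0]] = "_".join(fields[1:3])
    return mapping


def rename_ids(new_dir, out_dir, mapping):
    """Copy every FASTA file from new_dir to out_dir, appending the organism to known IDs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(new_dir).glob("*.fasta")):
        renamed = []
        for line in path.read_text().splitlines():
            if line.startswith(">"):
                fields = line[1:].split()
                record_id = fields[0] if fields else ""
                if record_id in mapping:
                    line = f">{record_id}.{mapping[record_id]}"
            # Sequence lines and headers with unknown IDs pass through unchanged
            renamed.append(line)
        (out / path.name).write_text("\n".join(renamed) + "\n")
```

With the folders from this answer, a call could look like rename_ids('./new', './renamed', organism_by_id('./old')), where ./renamed is a hypothetical output folder. Writing to a separate folder keeps the input files intact, so the run can be repeated safely.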

Error loading physionet ECG database on MATLAB

I'm using this code to load the ECG-ID database into MATLAB:
%% Initialization
clear all; close all; clc
%% read files from folder A
% Specify the folder where the files live.
myFolder = 'Databases\ECG_ID';
% Check to make sure that folder actually exists. Warn user if it doesn't.
if ~isfolder(myFolder)
    errorMessage = sprintf('Error: The following folder does not exist:\n%s\nPlease specify a new folder.', myFolder);
    uiwait(warndlg(errorMessage));
    myFolder = uigetdir(); % Ask for a new one.
    if myFolder == 0
        % User clicked Cancel
        return;
    end
end
% Get a list of all files in the folder with the desired file name pattern.
filePattern = fullfile(myFolder, '**/rec_*'); % Change to whatever pattern you need.
theFiles = dir(filePattern);
for k = 1 : length(theFiles)
    baseFileName = theFiles(k).name;
    fullFileName = fullfile(theFiles(k).folder, baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    % Now do whatever you want with this file name,
    % such as reading it in as an image array with imread()
    [sig, Fs, tm] = rdsamp(fullFileName, [1],[],[],[],1);
end
But I keep getting this error message:
Now reading C:\Users\******\Documents\MATLAB\Databases\ECG_ID\Person_01\rec_1.atr
Error using rdsamp (line 203)
Could not find record: C:\Users\******\Documents\MATLAB\Databases\ECG_ID\Person_01\rec_1.atr. Search path is set to: '.
C:\Users\******\Documents\MATLAB\mcode\..\database\ http://physionet.org/physiobank/database/'
I can successfully load one signal at a time (but I can't load the entire database using the above code) using this command:
[sig, Fs, tm] = rdsamp('Databases\ECG_ID\Person_01\rec_1');
How do I solve this problem? How can I load all the files in MATLAB?
Thanks in advance.

How to make an array of .mat files and append it with a for

I have this code:
h1 = dir('C:\Users\John\Documents\MATLAB\code for yannis\anger(W)'); %angry
h2 = dir('C:\Users\John\Documents\MATLAB\code for yannis\neutral(N)');%neutral
h3 = dir('C:\Users\John\Documents\MATLAB\code for yannis\happiness(F)');%happy
%fprintf('%s', filename);%filename h(i,1).name
fprintf('%d \n',numel(h1));
fprintf('%d \n',numel(h2));
fprintf('%d \n',numel(h3));
%fprintf('%d', max(numel(h1),numel(h2),numel(h3)));
A= [numel(h1) numel(h2) numel(h3)];
fprintf('%d \n \n \n', max(A));
fprintf('%s \n', h1(2).name);
%load('C:\Users\John\Documents\MATLAB\code for yannis\anger(W)\*.mat');
fprintf('%d \n \n \n', length(h1));
resultsdir = 'results';
addpath('C:\Users\John\Documents\MATLAB\code for yannis\anger(W)\');
array = [h1(2)];
for i=3:max(A)+3
    %s= 'C:\Users\John\Documents\MATLAB\code for yannis\anger(W)';
    thisfile = h1(i).name;
    destfile = fullfile(resultsdir, thisfile);
    thisdata = load(thisfile);
    %array[thisdata];
    %strcmp(h(i,1).name(1:2), s);
    %cat(1, array, h1(i,1).M);
    fprintf('%s \n', h1(i,1).name);
end
I want to keep all the .mat files in an array so that I can compare them with other .mat files.
Thanks in advance
You can select the top level folder and utilize your system's dir call to recursively search for all files that match your criteria (*.mat, in this case)
For example:
mypath = uigetdir('', 'Select Top Level Folder');
oldpath = cd(mypath); % cd to data directory for simpler dir call
[~, cmdout] = system('dir /S /B *.mat');
cd(oldpath); % Return to previous path
mymatfiles = regexp(cmdout, '(.:\\[\w\-\\. ]+\.\w+)', 'match');
The system call returns one long string with all of the absolute paths to your *.mat files. I utilize a regexp to split this into a cell array, where each cell is an absolute path to a single *.mat file.
Note that this is Windows only because it uses the MS-DOS dir command. This function can easily be adapted to other operating systems, I just don't know them.
Assuming you have .mat files in all of those directories (ending in ".mat"), simply get the directory listings of each of those directories and store them.
% All of the folders where the *.mat files live
folders = {'C:\folder1'; 'C:\folder2'};
allfiles = cell();
for k = 1:numel(folders)
    % Find all the .mat files
    files = dir(fullfile(folders{k}, '*.mat'));
    % Convert to absolute paths
    fullpaths = cellfun(@(x)fullfile(folders{k}, x), {files.name}, 'uniform', 0);
    % Store in our global cell array
    allfiles = cat(1, allfiles, fullpaths(:));
end
Now allfiles contains an entry for every .mat file contained within those paths. You can then loop through this cell array to perform any operations you need.
alldata = cellfun(@load, allfiles, 'uniform', 0);

Printing multiple figures in a single PDF file

I have a folder containing multiple CSV (Excel) files. I wrote the code below to plot a figure for each file (each representing a data set) and then print it, but each figure is printed to a different PDF file.
How can I modify this m-file so that all figures end up in a single PDF?
directory_name=uigetdir(pwd,'Select data directory');
directory_name=([directory_name '\']);
files=dir([directory_name,'*csv']);
if isempty(files)
msgbox('No raw files in this directory')
end
for i_files = 1 : length(files);
    filename=files(i_files).name;disp(filename)
    titles(i_files) = {filename};
    cc=hsv(33);
    figure
    X=[];IND=33;
    X=load([directory_name,filename]);
    plot(X)
    xlabel('time')
    ylabel('angle(degree)')
    legend('repetitive lifting')
    title(filename)
    set(gcf,'Units','inches');
    screenposition = get(gcf,'Position');
    set(gcf,...
        'PaperPosition',[0 0 screenposition(3:4)],...
        'PaperSize',[screenposition(3:4)]);
    print -dpdf
end

matlab, how could i read from multiple folders

I want to read multiple pictures from multiple folders. Assume that I have an animal folder on drive D with cat, dog, and koala subfolders in it, and each subfolder has 5 pictures of animals. How can I read these pictures and process them? Please explain in detail.
I wrote this code for one folder:
cd dog
tasavir = dir('*.jpg');
n = length(tasavir);
figure;
for i=1:n
    esm = tasavir(i).name;
    t = imread(esm);
    ss{i} = t;
    subplot(5,2,i),imshow(ss{i})
end
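Welcome to Stack Overflow! Given your structure you can do something like this:
workDir = cd;
cd('D:\'); % start in parent directory (use 'D:\animal' for the layout in the question)
dirs = dir();
for dIdx = 1:length(dirs)
    curDir = dirs(dIdx).name;
    % Only enter real subfolders; skip '.' and '..' so that cd('..') below
    % always returns to the parent directory
    if dirs(dIdx).isdir && ~ismember(curDir, {'.', '..'})
        cd(curDir);
        % RUN YOUR CODE FOR A SINGLE DIRECTORY
        cd('..');
    end
end
cd(workDir);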
