I read multiple .dta files in a folder and export them in CSV format. I should only export the .dta files whose sample size is greater than 30.
cd D:\myfolder\
/* There are many dta files in myfolder */
fs *.dta
foreach f in `r(files)' {
    use `f', clear
    export delimited using "D:\csvfolder\mycsvfile_`f'.csv", novarnames replace
}
How can I prevent exporting datasets that contain 30 or fewer observations?
Use the if command:
clear
set obs 29
gen t = "should not be here"
tempfile file1
save "`file1'"

clear
set obs 31
gen t = "should be here"
tempfile file2
save "`file2'"

foreach f in file1 file2 {
    use "``f''", clear
    if _N > 30 {
        export excel using "~/Desktop/mycsvfile_`f'.xls", replace
    }
}
See http://www.stata.com/support/faqs/data-management/multiple-operations/ for the different uses of if in Stata.
Alternatively, test c(N), the number of observations in the dataset in memory, directly:
if c(N)>30 export delimited using ...
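For comparison, the same "export only if there are more than 30 observations" rule can be sketched outside Stata with Python's standard csv module. This is a hedged illustration, not Stata code: the in-memory `datasets` dictionary stands in for the .dta files, and `export_if_large` is a hypothetical helper name.

```python
import csv
import os
import tempfile

MIN_OBS = 30  # export only datasets with strictly more than this many rows


def export_if_large(datasets, out_dir, min_obs=MIN_OBS):
    """Write each dataset to <out_dir>/mycsvfile_<name>.csv if it has > min_obs rows.

    `datasets` maps a name to a list of rows (each row a list of values).
    Returns the names that were exported, in input order.
    """
    exported = []
    for name, rows in datasets.items():
        if len(rows) > min_obs:  # same test as `if _N > 30` in Stata
            path = os.path.join(out_dir, f"mycsvfile_{name}.csv")
            with open(path, "w", newline="") as f:
                csv.writer(f).writerows(rows)  # no header row, like novarnames
            exported.append(name)
    return exported


if __name__ == "__main__":
    data = {
        "file1": [["should not be here"]] * 29,
        "file2": [["should be here"]] * 31,
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(export_if_large(data, tmp))
```

The comparison `len(rows) > min_obs` plays the role of `_N > 30` (or `c(N) > 30`), so a dataset with exactly 30 observations is skipped, as the question requires.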
I need to rename multiple sequences in multiple FASTA files, and I found this script that does it for a single ID:
from Bio import SeqIO

original_file = "./original.fasta"
corrected_file = "./corrected.fasta"

with open(original_file) as original, open(corrected_file, 'w') as corrected:
    records = SeqIO.parse(original, 'fasta')
    for record in records:
        print(record.id)
        if record.id == 'foo':
            record.id = 'bar'
            record.description = 'bar'  # <- Add this line
        print(record.id)
        SeqIO.write(record, corrected, 'fasta')
Each FASTA file corresponds to a single organism, but the organism is not specified in the IDs. I also have the original FASTA files (the ones above have been translated); they have the same filenames but live in a different directory, and their IDs include the name of each organism.
I want to loop through all these FASTA files and rename each ID in each file with the corresponding organism name.
OK, here is my attempt. I had to use my own input folders and files since they were not specified in the question.
The ./old folder contains these files:
MW628877.1.fasta :
>MW628877.1 Streptococcus agalactiae strain RYG82 DNA gyrase subunit A (gyrA) gene, complete cds
>KY347969.1 Neisseria gonorrhoeae strain 1448 DNA gyrase subunit A (gyrA) gene, partial cds
MG995190.1.fasta :
>MG995190.1 Mycobacterium tuberculosis strain UKR100 GyrA (gyrA) gene, complete cds
and an /empty folder.
The ./new folder contains these files:
MW628877.1.fasta :
MG995190.1.fasta :
My code is:
from Bio import SeqIO
from os import scandir

old = './old'
new = './new'

# Map each record ID in the old files to its organism name
# (the 2nd and 3rd words of the description, e.g. "Streptococcus agalactiae").
old_ids_dict = {}
for filename in scandir(old):
    if filename.is_file():
        for seq_record in SeqIO.parse(filename.path, "fasta"):
            old_ids_dict[seq_record.id] = ' '.join(seq_record.description.split(' ')[1:3])
print('old ids ---> ', old_ids_dict)

# Rewrite each file in the new folder with the organism name appended to each ID.
for filename in scandir(new):
    if filename.is_file():
        sequences = []
        for seq_record in SeqIO.parse(filename.path, "fasta"):
            if seq_record.id in old_ids_dict:
                print('### ', seq_record.id, ' ', old_ids_dict[seq_record.id])
                seq_record.id += '.' + old_ids_dict[seq_record.id]
                seq_record.description = ''
                print('-->', seq_record.id)
            sequences.append(seq_record)  # keep every record, renamed or not
        SeqIO.write(sequences, filename.path, 'fasta')
Check how it works: it overwrites both files in the new folder in place, so, as @Vovin pointed out in a comment, it needs to be adapted to your own input/output file layout.
I am sure there is more than one way to do this, probably better and more Pythonic than mine; I am learning too. Let us know.
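For readers without Biopython, the same two-pass idea (build an ID-to-organism map from the old files, then rewrite the IDs in the new files) can be sketched with plain string handling. This is a simplified sketch under assumptions: it works on FASTA text already read into strings, assumes space-separated headers with genus and species as the 2nd and 3rd words (as in the example headers above), writes each sequence on a single line, and the function names are hypothetical. Biopython's SeqIO remains the more robust choice for real files.

```python
def parse_fasta(text):
    """Return (header, sequence) pairs from FASTA text; header excludes '>'."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def organism_map(old_text):
    """Map each record ID to its organism: words 2-3 of the header, as in the answer."""
    mapping = {}
    for header, _ in parse_fasta(old_text):
        words = header.split(" ")
        mapping[words[0]] = " ".join(words[1:3])
    return mapping


def rename_records(new_text, mapping):
    """Append '.<organism>' to matching IDs (dropping the description); keep others."""
    out = []
    for header, seq in parse_fasta(new_text):
        rec_id = header.split(" ")[0]
        if rec_id in mapping:
            header = rec_id + "." + mapping[rec_id]
        out.append(f">{header}\n{seq}")
    return "\n".join(out) + "\n"
```

In use, you would read each file in ./old into one combined mapping, then for each file in ./new write `rename_records(text, mapping)` to your chosen output location.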
I have a very large folder containing 19 subfolders, each holding the images for a single class. I would like to split them into training/validation/testing sets, and at the same time I need to create an annotation file for the validation and testing sets together to train the model. How can I do this?
For the split:
import splitfolders

# Split with a ratio of (train, val, test).
# To only split into training and validation sets, pass a 2-tuple to `ratio`, e.g. `(.8, .2)`.
splitfolders.ratio("/home/marouane/dev/mdl-py-classification/model/data",
                   output="/home/marouane/dev/mdl-py-classification/model/data_s",
                   seed=1337, ratio=(.4, .1, .5), group_prefix=None,
                   move=False)  # seed, group_prefix and move are left at their defaults
For the annotation file:
import os

def train_test_split(name):
    """Write an annotation file for one split folder.

    parameter : name of the split folder (e.g. "train", "val" or "test")
    return    : None; writes <name>.txt with one "class/file label" line per image
    """
    classes_dir = ["advert", "box_start_horse", "end_carriage", "end_horse", "group_heat_carriage",
                   "group_heat_horse", "heat_carriage", "heat_horse", "interview",
                   # ... remaining class names are truncated in the original post
                   ]
    src_dir = '/home/marouane/dev/mdl-py-classification/model/data_s/' + name + "/"
    destFile = '/home/marouane/dev/mdl-py-classification/model/data_s/' + name + ".txt"
    # Open the file once; 'w' avoids duplicate lines if the script is re-run.
    with open(destFile, 'w') as f:
        for label, cls in enumerate(classes_dir):
            for file in os.listdir(src_dir + cls):
                f.write(cls + "/" + file + " " + str(label) + "\n")
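If you prefer not to depend on split-folders, the split and the annotation lines can be produced together in plain Python. The sketch below is an illustration under assumptions, not split-folders' own algorithm: `split_and_annotate` is a hypothetical name, it works on an in-memory mapping of class name to file names, it shuffles each class with `random.Random(seed)`, and it labels classes by their sorted order rather than by a hand-written list like `classes_dir`.

```python
import random


def split_and_annotate(files_by_class, ratio=(0.4, 0.1, 0.5), seed=1337):
    """Split each class's files into train/val/test and build annotation lines.

    files_by_class: dict mapping class name -> list of file names.
    Returns a dict mapping split name -> list of "class/file label" strings,
    where label is the class's index in sorted class order.
    """
    if abs(sum(ratio) - 1.0) > 1e-9:
        raise ValueError("ratio must sum to 1")
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    splits = {"train": [], "val": [], "test": []}
    for label, cls in enumerate(sorted(files_by_class)):
        files = sorted(files_by_class[cls])
        rng.shuffle(files)
        n_train = int(len(files) * ratio[0])
        n_val = int(len(files) * ratio[1])
        parts = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],  # remainder goes to test
        }
        for split, names in parts.items():
            splits[split].extend(f"{cls}/{name} {label}" for name in names)
    return splits
```

Because each class is split separately, every class appears in each split in roughly the requested proportion. For a single annotation file covering validation and testing together, write `splits["val"] + splits["test"]` to one text file, one entry per line.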
I have a problem loading data from a text file in Octave.
My text file looks like this:
# Created by Octave 5.2.0, Wed May 05 16:07:02 2021 GMT <unknown#DESKTOP-HEVT6O6>
# name: x
# type: matrix
# rows: 1
# columns: 3600
4.8899999999999997 4.9000000000000004 4.9000000000000004 4.9100000000000001 4.9299999999999997 4.9249999999999998 ...
I need to load those float numbers into one matrix and plot them in the time domain.
My code so far:
fs = 360;                 % sampling rate (Hz)
Ts = 1/fs;                % sampling period (s)
d = fileread('ecg.txt');
data = regexp(d(1,136:62328),' ','split');
data = str2double(data);
ed = length(data);
t = (0:ed-1) * Ts;        % one time stamp per sample
plot(t, data)
So my question is: is there another, better way to do this?
Your file is in Octave’s text data format. This is the default file format when saving variables to file with save. That is, that text file was saved in Octave using save ecg.txt x. The Octave command load ecg.txt will load the file, and re-create the x variable just like it was when it was saved.
Thus, to plot your data, just do:
load ecg.txt
plot((0:numel(x)-1)/360, x)   % time axis in seconds, fs = 360 Hz as in the question
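To make the file format concrete, here is a minimal Python sketch that parses the header lines shown above (`# name:`, `# rows:`, `# columns:`) and the numeric data, and builds the time axis. It is an illustration under assumptions: it handles only a single variable of `# type: matrix`, and `load_octave_text_matrix` and `time_axis` are hypothetical names. The time axis uses t[k] = k/fs, i.e. one time stamp per sample, spaced Ts = 1/fs apart.

```python
def load_octave_text_matrix(text):
    """Parse one matrix saved in Octave's text format.

    Returns (name, rows), where rows is a list of lists of floats.
    Only the single-variable "# type: matrix" case is handled.
    """
    name, n_rows, n_cols = None, None, None
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            # Header lines look like "# key: value"; unknown keys are ignored.
            key, _, val = line[1:].partition(":")
            key, val = key.strip(), val.strip()
            if key == "name":
                name = val
            elif key == "rows":
                n_rows = int(val)
            elif key == "columns":
                n_cols = int(val)
        elif line:
            values.append([float(v) for v in line.split()])
    if n_rows is not None and len(values) != n_rows:
        raise ValueError("row count does not match header")
    if n_cols is not None and any(len(r) != n_cols for r in values):
        raise ValueError("column count does not match header")
    return name, values


def time_axis(n_samples, fs):
    """Sample times t[k] = k / fs for k = 0 .. n_samples - 1."""
    return [k / fs for k in range(n_samples)]
```

For the ECG file, `time_axis(3600, 360)` covers 0 to just under 10 seconds.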
How can I get Python to loop through a directory and find a specific string in each file located within that directory, then output a summary of what it found?
I want to search the log files for the following string:
However, the firmware version can differ between files, so I want the script to report whatever version it finds in each one.
import glob
import os

print("The following list contains the firmware version of each server.\n")
os.chdir("LOGS\\")
for file in glob.glob('*.log'):
    with open(file) as f:
        contents = f.read()
    if 'FIRMWARE_VERSION = "' in contents:
        print(file + " = ???")
I was thinking I could use something like the following to return the extra characters but it's not working.
I want the output to look something like this:
server1.web.com = FIRMWARE_VERSION = "2.16"
server2.web.com = FIRMWARE_VERSION = "3.01"
server3.web.com = FIRMWARE_VERSION = "1.26"
server4.web.com = FIRMWARE_VERSION = "4.1"
server5.web.com = FIRMWARE_VERSION = "3.50"
Any suggestions on how I can do this?
You can use a regex to grab the text:
import glob
import re

for file in glob.glob('*.log'):
    with open(file) as f:
        contents = f.read()
    match = re.search(r'FIRMWARE_VERSION = "([\d.]+)"', contents)
    if match:
        print(file + ' = FIRMWARE_VERSION = "' + match.group(1) + '"')
Here re.search does the job by searching the file contents for the following pattern:
r'FIRMWARE_VERSION = "([\d.]+)"'
which captures a version number (digits and dots) between two double quotes. You can also use a pattern that matches anything between the double quotes right after FIRMWARE_VERSION.
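A slightly more tolerant variant, sketched under assumptions: the pattern below allows optional whitespace around `=` and captures anything between the quotes, and the helper names `firmware_summary` and `summarize_log_dir` are hypothetical. Assuming the logs are named like `server1.web.com.log`, using `Path.stem` drops the `.log` extension so each line starts with the server name, as in the desired output.

```python
import re
from pathlib import Path

# Capture everything between the double quotes after FIRMWARE_VERSION,
# allowing optional whitespace around "=".
VERSION_RE = re.compile(r'FIRMWARE_VERSION\s*=\s*"([^"]*)"')


def firmware_summary(files):
    """Return 'name = FIRMWARE_VERSION = "x"' lines for each file with a version.

    files: dict mapping a display name to the file's text contents.
    Files without a match are skipped.
    """
    lines = []
    for name, contents in files.items():
        match = VERSION_RE.search(contents)
        if match:
            lines.append(f'{name} = FIRMWARE_VERSION = "{match.group(1)}"')
    return lines


def summarize_log_dir(log_dir="LOGS"):
    """Read every *.log file in log_dir and summarize its firmware version."""
    files = {p.stem: p.read_text() for p in sorted(Path(log_dir).glob("*.log"))}
    return firmware_summary(files)


if __name__ == "__main__":
    print("The following list contains the firmware version of each server.\n")
    for line in summarize_log_dir():
        print(line)
```

Checking the match object directly, instead of a separate `in` test, avoids calling `.group()` on `None` when the line is present but formatted differently.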
OK. Many troubleshooting hours and many error "dings" later, I'm still having the same problem. As a beginner, I'm struggling with the following part of my project:
I will be as detailed as possible so I can hopefully nail it this time:
On my computer I have a folder C:\data which contains many different subfolders.
The subfolders are named by date in MMDDYY format, for example "040312".
Each subfolder contains Excel files named after baseball teams, and each subfolder may contain a different combination of .xls files.
I am trying to write code that achieves the following objectives:
1.) Loop through all the subfolders of C:\data looking for .xls files named Angels.xls, Diamondbacks.xls, etc.
2.) If a file is found in a subfolder, import the spreadsheet data and plot the columns titled "Score" and "Allow".
3.) If a file is not found in a given subfolder, skip it and continue with the next file.
4.) Then save the generated plot as a .fig and a .bmp file in the same folder the spreadsheet was imported from.
I've been given hints to use functions like genpath and dir, but the code I've been fumbling through doesn't achieve my goals:
a) the script doesn't import the Excel files from all the subfolders
b) the script won't save the .fig or .bmp file in the associated subfolder
Here is the code I have been fumbling through:
%I know all of this is wrong wrong wrong. Please help to adjust my code to achieve the objectives outlined above!
folder = 'c:\data';
subdirs = dir(folder);
subdirs(~[subdirs.isdir]) = [];
numberOfFolders = length(subdirs);
if numberOfFolders <= 0
    uiwait(warndlg('Number of folders = 0!'))
end
wantedfiles = {'Angels' 'Diamondbacks' 'Orioles' 'Royals' 'Yankees' 'Mets' 'Giants'};
for K = 1 : numberOfFolders
    thissubdir = subdirs(K).name;
    if strcmp(thissubdir, '.') || strcmp(thissubdir, '..')
        continue
    end
    subdirpath = [folder '\' thissubdir];
    for L = 1 : length(wantedfiles)
        for wantedfiles = {'Angels' 'Diamondbacks' 'Orioles' 'Royals' 'Yankees' 'Mets' 'Giants'};
            folder = '';
            fileToRead1 = [wantedfiles{1} '.xls'];
            if exist(fileToRead1, 'file') == 0
                % File does not exist
                % Skip to bottom of loop and continue with the loop
                continue
            end
            %This is to import the data and organize it
            % All of this code I had auto-generated from importing files manually
            [numbers, strings, raw] = xlsread(fileToRead1, sheetName);
            if ~isempty(numbers)
                newData1.data = numbers;
            end
            if ~isempty(strings) && ~isempty(numbers)
                [strRows, strCols] = size(strings);
                [numRows, numCols] = size(numbers);
                likelyRow = size(raw,1) - numRows;
                % Break the data up into a new structure with one field per column.
                if strCols == numCols && likelyRow > 0 && strRows >= likelyRow
                    newData1.colheaders = strings(likelyRow, :);
                end
            end
            % Create new variables in the base workspace from those fields.
            for i = 1:size(newData1.colheaders, 2)
                assignin('base', genvarname(newData1.colheaders{i}), newData1.data(:,i));
            end
            % Now I execute the plotting of data
            subplot(2,1,1), plot(Score,Allow)
            title([wantedfiles{1} 'Testing to see if it works']);
            subplot(2,1,2), plot(Allow,Score)
            title('Well, did it?');
            % here I save the generated plots, but they don't save where I want them to
            saveas(gcf,[wantedfiles{1} ' did it work.fig']);
            saveas(gcf,[wantedfiles{1} ' did it work.bmp']);
        end
    end
end
%At the end of the script I still was unable to loop over the files that I wanted
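Objectives 1, 3 and 4 all come down to building every file path from the current subfolder instead of relying on the current working directory. The loop structure can be sketched in Python (not MATLAB) as follows. `plan_plots` is a hypothetical name, the `<team>.fig`/`<team>.bmp` output names are an assumption, and the sketch only locates the spreadsheets and computes where the figures should go, leaving import and plotting out. In MATLAB, the same pieces would be dir for the subfolders, fullfile to join folder and file names, exist(..., 'file') to test for each team file, continue to skip missing ones, and saveas with a full path.

```python
import tempfile
from pathlib import Path

# Team names from the question; files are expected as <team>.xls.
WANTED = ["Angels", "Diamondbacks", "Orioles", "Royals", "Yankees", "Mets", "Giants"]


def plan_plots(root, wanted=WANTED):
    """Return (xls, fig, bmp) path tuples for every wanted file that exists.

    Subfolders of `root` (e.g. date folders like 040312) are visited in sorted
    order; the .fig/.bmp paths sit in the same subfolder as the spreadsheet.
    """
    jobs = []
    subfolders = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for sub in subfolders:  # iterdir() never yields '.' or '..'
        for team in wanted:
            xls = sub / f"{team}.xls"
            if not xls.is_file():
                continue  # not in this subfolder: go on to the next team
            jobs.append((xls, sub / f"{team}.fig", sub / f"{team}.bmp"))
    return jobs


if __name__ == "__main__":
    # Tiny demo tree standing in for C:\data
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "040312").mkdir()
        (Path(d) / "040312" / "Angels.xls").write_text("")
        for xls, fig, bmp in plan_plots(d):
            print(xls.name, "->", fig.name, bmp.name)
```

Keeping a single loop variable per level (one over subfolders, one over team names) avoids redefining the list being iterated, and joining each name with its subfolder makes both the existence check and the save location refer to the same directory.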