Handling Hebrew files and folders with Python 3.4

Handling Hebrew files and folders with Python 3.4 - filesystems

I used Python 3.4 to create a programm that goes through E-mails and saves specific attachments to a file server.
Each file is saved to a specific destination depending on the sender's E-mail's address.
My problem is that the destination folders and the attachments are both in Hebrew and for a few attachments I get an error that the path does not exsist.
Now that's not possible because It can fail for one attachment but not for the others on the same Mail (the destination folder is decided by the sender's address).
I want to debug the issue but I cannot get python to display the file path it is trying to save correctly. (it's mixed hebrew and english and it always displays the path in a big mess, although it works correctly 95% of the time when the file is being saved to the file server)
So my questions are:
what should I add to this code so that it will proccess Hewbrew correctly?
Should I encode or decode somthing?
Are there characters I should avoid when proccessing the files?
here's the main piece of code that fails:
try:
found_attachments = False
for att in msg.Attachments:
_, extension = split_filename(str(att))
# check if attachment is not inline
if str(att) not in msg.HTMLBody:
if extension in database[sender][TYPES]:
file = create_file(str(att), database[sender][PATH], database[sender][FORMAT], time_stamp)
# This is where the program fails:
att.SaveAsFile(file)
print("Created:", file)
found_attachments = True
if found_attachments:
items_processed.append(msg)
else:
items_no_att.append(msg)
except:
print("Error with attachment: " + str(att) + " , in: " + str(msg))
and the create file function:
def create_file(att, location, format, timestamp):
"""
process an attachment to make it a file
:param att: the name of the attachment
:param location: the path to the file
:param format: the format of the file
:param timestamp: the time and date the attachment was created
:return: return the file created
"""
# create the file by the given format
if format == "":
output_file = location + "\\" + att
else:
# split file to name and type
filename, extension = split_filename(att)
# extract and format the time sent on
time = str(timestamp.time()).replace(":", ".")[:-3]
# extract and format the date sent on
day = str(timestamp.date())
day = day[-2:] + day[4:-2] + day[:4]
# initiate the output file
output_file = format
# add the original file name where needed
output_file = output_file.replace(FILENAME, filename)
# add the sent date where needed
output_file = output_file.replace(DATE, day)
# add the time sent where needed
output_file = output_file.replace(TIME, time)
# add the path and type
output_file = location + "\\" + output_file + "." + extension
print(output_file)
# add an index to the file if necessary and return it
index = get_file_index(output_file)
if index:
filename, extension = split_filename(output_file)
return filename + "(" + str(index) + ")." + extension
else:
return output_file
Thanks in advance, I would be happy to explain more or supply more code if needed.

I found out that the promlem was not using Hebrew. I found that there's a limit on the number of chars that the (path + filename) can hold (255 chars).
The files that failed excided that limit and that caused the problem

Related

Change ID in multiple FASTA files

I need to rename multiple sequences in multiple fasta files and I found this script in order to do so for a single ID:
original_file = "./original.fasta"
corrected_file = "./corrected.fasta"
with open(original_file) as original, open(corrected_file, 'w') as corrected:
records = SeqIO.parse(original_file, 'fasta')
for record in records:
print record.id
if record.id == 'foo':
record.id = 'bar'
record.description = 'bar' # <- Add this line
print record.id
SeqIO.write(record, corrected, 'fasta')
Each fasta file corresponds to a single organism, but it is not specified in the IDs. I have the original fasta files (because these have been translated) with the same filenames but different directories and include in their IDs the name of each organism.
I wanted to figure out how to loop through all these fasta files and rename each ID in each file with the corresponding organism name.

ok my effort, got to use my own input folders/files since they where not specified in question
/old folder contains files :
MW628877.1.fasta :
>MW628877.1 Streptococcus agalactiae strain RYG82 DNA gyrase subunit A (gyrA) gene, complete cds
ATGCAAGATAAAAATTTAGTAGATGTTAATCTAACTAGTGAAATGAAAACGAGTTTTATCGATTACGCCA
TGAGTGTCATTGTTGCTCGTGCACTTCCAGATGTTAGAGATGGTTTAAAACCTGTTCATCGTCGTATTTT
>KY347969.1 Neisseria gonorrhoeae strain 1448 DNA gyrase subunit A (gyrA) gene, partial cds
CGGCGCGTACCGTACGCGATGCACGAGCTGAAAAATAACTGGAATGCCGCCTACAAAAAATCGGCGCGCA
TCGTCGGCGACGTCATCGGTAAATACCACCCCCACGGCGATTTCGCAGTTTACGGCACCATCGTCCGTAT
MG995190.1.fasta :
>MG995190.1 Mycobacterium tuberculosis strain UKR100 GyrA (gyrA) gene, complete cds
ATGACAGACACGACGTTGCCGCCTGACGACTCGCTCGACCGGATCGAACCGGTTGACATCCAGCAGGAGA
TGCAGCGCAGCTACATCGACTATGCGATGAGCGTGATCGTCGGCCGCGCGCTGCCGGAGGTGCGCGACGG
and an /empty folder.
/new folder contains files :
MW628877.1.fasta :
>MW628877.1
MQDKNLVDVNLTSEMKTSFIDYAMSVIVARALPDVRDGLKPVHRRI
>KY347969.1
RRVPYAMHELKNNWNAAYKKSARIVGDVIGKYHPHGDFAVYGTIVR
MG995190.1.fasta :
>MG995190.1
MTDTTLPPDDSLDRIEPVDIQQEMQRSYIDYAMSVIVGRALPEVRD
my code is :
from Bio import SeqIO
from os import scandir
old = './old'
new = './new'
old_ids_dict = {}
for filename in scandir(old):
if filename.is_file():
print(filename)
for seq_record in SeqIO.parse(filename, "fasta"):
old_ids_dict[seq_record.id] = ' '.join(seq_record.description.split(' ')[1:3])
print('_____________________')
print('old ids ---> ',old_ids_dict)
print('_____________________')
for filename in scandir(new):
if filename.is_file():
sequences = []
for seq_record in SeqIO.parse(filename, "fasta"):
if seq_record.id in old_ids_dict.keys():
print('### ', seq_record.id,' ', old_ids_dict[seq_record.id])
seq_record.id += '.'+old_ids_dict[seq_record.id]
seq_record.description = ''
print('-->', seq_record.id)
print(seq_record)
sequences.append(seq_record)
SeqIO.write(sequences, filename, 'fasta')
check how it works, it actually overwrites both files in new folder,
as pointed out by #Vovin in his comment it needs to be adapted per your files template from-to.
I am sure there is more than a way to do this, probably better and more pythonic than may way, I am learning too. Let us know

Error loading physionet ECG database on MATLAB

I'm using this code to load the ECG-ID database into MATLAB:
%% Initialization
clear all; close all; clc
%% read files from folder A
% Specify the folder where the files live.
myFolder = 'Databases\ECG_ID';
% Check to make sure that folder actually exists. Warn user if it doesn't.
if ~isfolder(myFolder)
errorMessage = sprintf('Error: The following folder does not exist:\n%s\nPlease specify a new folder.', myFolder;)
uiwait(warndlg(errorMessage);)
myFolder = uigetdir(; % Ask for a new one.)
if myFolder == 0
% User clicked Cancel
return;
end
end
% Get a list of all files in the folder with the desired file name pattern.
filePattern = fullfile(myFolder, '**/rec_*'; % Change to whatever pattern you need.)
theFiles = dir(filePattern;)
for k = 1 : length(theFiles)
baseFileName = theFiles(k.name;)
fullFileName = fullfile(theFiles(k.folder, baseFileName);)
fprintf(1, 'Now reading %s\n', fullFileName;)
% Now do whatever you want with this file name,
% such as reading it in as an image array with imread()
[sig, Fs, tm] = rdsamp(fullFileName, [1],[],[],[],1;)
end
But I keep getting this error message:
Now reading C:\Users\******\Documents\MATLAB\Databases\ECG_ID\Person_01\rec_1.atr
Error using rdsamp (line 203)
Could not find record: C:\Users\******\Documents\MATLAB\Databases\ECG_ID\Person_01\rec_1.atr. Search path is set to: '.
C:\Users\******\Documents\MATLAB\mcode\..\database\ http://physionet.org/physiobank/database/'
I can successfully load one signal at a time (but I can't load the entire database using the above code) using this command:
[sig, Fs, tm] = rdsamp('Databases\ECG_ID\Person_01\rec_1');
How do I solve this problem? How can I load all the files in MATLAB?
Thanks in advance.

renaming files in a directory by adding string to a list of string using Python 2.7.8

Im trying to add the file extension back to a file path and filename after deleting the last 10 characters of the file's name.
Code so far:
rootDir = "C\\Users\\Documents\\New"
for root, dirs, filenames in os.walk(rootDir):
for fileys in filenames:
fullpath = os.path.join(root, fileys)
filesplit = os.path.splitext(fullpath)
fileext = filesplit
os.rename(fullpath, fullpath[:-5] + fileext) #line that has error
What my error is:
Traceback (most recent call last):
File "C:/Users/Student/Downloads/App.py", line 28, in
os.rename(fullpath, fullpath[:-10] + (s + fileext for s in fullpath))
TypeError: cannot concatenate 'str' and 'generator' objects
I see that the issue is that I am trying to concatenate a string and generator, but when i tried:
os.rename (fullpath, fullpath[:10] + fileext)
The error i receive is that I cannot concatenate a tuple and a string.
So how do I add on the file extension to the fullpath after i remove the last 10 characters of the filename while renaming the file.
Thanks

Find string in log files and return extra characters

How can I get Python to loop through a directory and find a specific string in each file located within that directory, then output a summary of what it found?
I want to search the long files for the following string:
FIRMWARE_VERSION = "2.15"
Only, the firmware version can be different in each file. So I want the log file to report back with whatever version it finds.
import glob
import os
print("The following list contains the firmware version of each server.\n")
os.chdir( "LOGS\\" )
for file in glob.glob('*.log'):
with open(file) as f:
contents = f.read()
if 'FIRMWARE_VERSION = "' in contents:
print (file + " = ???)
I was thinking I could use something like the following to return the extra characters but it's not working.
file[:+5]
I want the output to look something like this:
server1.web.com = FIRMWARE_VERSION = "2.16"
server2.web.com = FIRMWARE_VERSION = "3.01"
server3.web.com = FIRMWARE_VERSION = "1.26"
server4.web.com = FIRMWARE_VERSION = "4.1"
server5.web.com = FIRMWARE_VERSION = "3.50"
Any suggestions on how I can do this?

You can use regex for grub the text :
import re
for file in glob.glob('*.log'):
with open(file) as f:
contents = f.read()
if 'FIRMWARE_VERSION = "' in contents:
print (file + '='+ re.search(r'FIRMWARE_VERSION ="([\d.]+)"',contents).group(1))
In this case re.search will do the job! with searching the file content based on the following pattern :
r'FIRMWARE_VERSION ="([\d.]+)"'
that find a float number between two double quote!also you can use the following that match anything right after FIRMWARE_VERSIONbetween two double quote.
r'FIRMWARE_VERSION =(".*")'

Groovy - create file issue: The filename, directory name or volume label syntax is incorrect

I'm running a script made in Groovy from Soap UI and the script needs to generate lots of files.
Those files have also in the name two numbers from a list (all the combinations in that list are different), and there are 1303 combinations
available and the script generates just 1235 files.
A part of the code is:
filename = groovyUtils.projectPath + "\\" + "$file"+"_OK.txt";
targetFile = new File(filename);
targetFile.createNewFile();
where $file is actually that part of the file name which include those 2 combinations from that list:
file = "abc" + "-$firstNumer"+"_$secondNumber"
For those file which are not created is a message returned:"The filename, directory name or volume label syntax is incorrect".
I've tried puting another path:
filename = "D:\\rez\\" + "\\" + "$file"+"_OK.txt";
targetFile = new File(filename);
targetFile.createNewFile();
and also:
File parentFolder = new File("D:\\rez\\");
File targetFile = new File(parentFolder, "$file"+"_OK.txt");
targetFile.createNewFile();
(which I've found here: What are possible reasons for java.io.IOException: "The filename, directory name, or volume label syntax is incorrect")
but nothing worked.
I have no ideea where the problem is. Is strange that 1235 files are created ok, and the rest of them, 68 aren't created at all.
Thanks,

My guess is that some of the files have illegal characters in their paths. Exactly which characters are illegal is platform specific, e.g. on Windows they are
\ / : * ? " < > |
Why don't you log the full path of the file before targetFile.createNewFile(); is called and also log whether this method succeeded or not, e.g.
filename = groovyUtils.projectPath + "\\" + "$file"+"_OK.txt";
targetFile = new File(filename);
println "attempting to create file: $targetFile"
if (targetFile.createNewFile()) {
println "Successfully created file $targetFile"
} else {
println "Failed to create file $targetFile"
}
When the process is finished, check the logs and I suspect you'll see a common pattern in the ""Failed to create file...." messages

File.createNewFile() returns false when a file or directory with that name already exists. In all other failure cases (security, I/O) it throws an exception.
Evaluate createNewFile()'s return value or, additionally, use the File.exists() method:
File file = new File("foo")
// works the first time
createNewFile(file)
// prints an error message
createNewFile(file)
void createNewFile(File file) {
if (!file.createNewFile()) {
assert file.exists()
println file.getPath() + " already exists."
}
}

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Handling Hebrew files and folders with Python 3.4 - filesystems

I found out that the promlem was not using Hebrew. I found that there's a limit on the number of chars that the (path + filename) can hold (255 chars). The files that failed excided that limit and that caused the problem

Related

Change ID in multiple FASTA files

Error loading physionet ECG database on MATLAB

renaming files in a directory by adding string to a list of string using Python 2.7.8

Find string in log files and return extra characters

Groovy - create file issue: The filename, directory name or volume label syntax is incorrect

Categories

Resources