File names with space, sending quoted commands to shell? - file

I have a few files with spaces in their names. I first strip the spaces and collect the new names in a list, then build a command from the old file name (in quotes) and the new file name, and rename the files using mv through subprocess.run. But Python is spitting out errors.
Here's the code:
import os
import subprocess
file_path = "/home/emil/import/"
files = os.listdir(file_path)
new_files = []
#Create new names
for each in files:
    print ("Taking file: ", each)
    new_name = each.replace(" ", "")
    final_name = file_path + new_name
    new_files.append(final_name)
    cmd = "mv \"" + file_path + each + "\" " + final_name
    print (cmd)
    subprocess.run([cmd])
And here is the output:
emil@TITAN:~/programmering$ ./call.py
Taking file: test file1.pdf
mv "/home/emil/import/test file1.pdf" /home/emil/import/testfile1.pdf
Traceback (most recent call last):
File "./call.py", line 22, in <module>
subprocess.run([cmd])
File "/usr/lib/python3.7/subprocess.py", line 472, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'mv "/home/emil/import/test file1.pdf" /home/emil/import/testfile1.pdf': 'mv "/home/emil/import/test file1.pdf" /home/emil/import/testfile1.pdf'
As you can see I print the variable cmd which I am sending to subprocess, and if I run that command individually it works great:
emil@TITAN:~/programmering$ mv "/home/emil/import/test file1.pdf" /home/emil/import/testfile1.pdf
emil@TITAN:~/programmering$ ls /home/emil/import/
'test file2.pdf' 'test file3.pdf' testfile1.pdf
So where am I going wrong?
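One likely explanation, offered as a sketch rather than a definitive answer: subprocess.run([cmd]) receives a one-element list, so Python looks for an executable whose name is the entire command string. That matches the FileNotFoundError in the traceback, which quotes the whole string as the missing file. Passing each argument as its own list element avoids both the shell and the manual quoting. The helper names strip_spaces and rename_without_spaces are made up for illustration, and the code assumes mv is available on PATH.

```python
import os
import subprocess


def strip_spaces(name):
    """Return the file name with all spaces removed."""
    return name.replace(" ", "")


def rename_without_spaces(directory):
    """Rename every entry in `directory` whose name contains spaces.

    Returns the list of new paths. Hypothetical helper, not from the question.
    """
    new_files = []
    for each in os.listdir(directory):
        new_name = strip_spaces(each)
        if new_name == each:
            continue  # no spaces in this name, nothing to rename
        old_path = os.path.join(directory, each)
        new_path = os.path.join(directory, new_name)
        # Each argument is its own list element, so no quoting is needed.
        subprocess.run(["mv", old_path, new_path], check=True)
        new_files.append(new_path)
    return new_files
```

Calling os.rename(old_path, new_path) would do the same job without spawning a process. If the single command string has to be kept, subprocess.run(cmd, shell=True) hands it to a shell, which is what happens when the command is typed at the prompt.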

Related

Parsing unique data and renaming files

I have been trying to write a Perl script to rename hundreds of files with different names, but without success. I first need to find the unique number in each file name and then rename the file to something more human-readable. Because the file names are not sequential, this is difficult.
Examples of file names (the number that matters comes right after the index sequence):
# vv-- this number
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
lane8-s250-index--TCCGGAGA-AGGCGAAG-13_S250_L008_R1_001.fastq
lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
lane7-s0008-index--ATTACTCG-ATAGAGGC-105_S8_L007_R1_001.fastq
lane7-s0009-index--ATTACTCG-CCTATCCT-195_S9_L007_R1_001.fastq
lane7-s0010-index--ATTACTCG-GGCTCTGA-106_S10_L007_R1_001.fastq
lane7-s0011-index--ATTACTCG-AGGCGAAG-197_S11_L007_R1_001.fastq
lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.fastq
I have created a file called RENAMING_parse_data.sh that calls RENAMING_parse_data.pl.
In theory, the script parses each file name to find the sample number in the middle of the name, then uses that unique ID to rename the file. But I don't think it even enters the if block.
Any ideas?
Here is the .sh file that calls the Perl script:
#!/bin/bash
#first part is the program
#second is the directory path
#third and fourth items are the names of the output files
#./parse_data.pl /ACTF/Course/PATHTODIRECTORY Tabsummary.txt Strucsummary.txt
#WHERE ./parse_data.pl = name of the program
#WHERE /ACTF/Course/PATHTODIRECTORY = directory path where your files are saved AND is referred to as $dir_in = $ARGV[0] in the Perl script;
#new file you are creating with the extracted data AND is referred to as $indv_list = $ARGV[1];
./RENAMING_parse_data.pl ./Test/ FishList.txt
Here is the Perl script:
#!/usr/bin/perl
print (":)\n");
#Processing files in a directory
$dir_in = $ARGV[0];
$indv_list = $ARGV[1];
#open directory to access those files, the folder where you have the files
opendir(DIR, $dir_in) || die ("Cannot open $dir_in");
@files = readdir(DIR);
#set all variables = 0 to avoid chaos
$j=0;
#open the output file and print the header line for the tab-delimited file
open(OUTFILETAB, ">", $indv_list);
print(OUTFILETAB "\t Fish ID", "\t");
#open each file
foreach (@files){
    #restart all arrays to avoid chaos
    print("in loop [$j]");
    @acc_ID=();
    #find FISH name
    #EXAMPLE FISH NAMES: (length of fish name varies)
    #lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq.gz
    #lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.final.fastq
    #NOTE: what is between () is the ID that is printed. NOTE that its length can change from 2 to 3 digits depending on the sample #
    #Trials:
    #lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})[a-z]{1}[0-9]{2}_[A-Z]{1}[0-9]{3}_[a-z]{1}[0-9]{1}_[0-9]{3}.fastq
    #lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})*.fastq
    #lane*([0-9]{3})*.fastq
    #lane.*-([0-9]{2})_.*.fastq
    #lane.*-([0-9]{2})_*.fastq
    #lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq
    $string_FISH = @files;
    if ($string_FISH =~ /^lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq/){
        $FISH_ID =$1;
        @acc_ID[$j] = $FISH_ID;
        #print ("FISH. = |$FISH_ID[$j]| \n");
        rename($string_FISH, "FISH. = |$FISH_ID[$j]|");
        #print ($acc_ID[$j], "\n");
        print(OUTFILETAB "FISH. = |$FISH_ID[$j]| \n");
    }
    $j= $j+1;
}
IDEAL END RESULT
In the end I would like the script to take each file name, find the unique identifier, and rename the file
from:
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
to:
Fish.01.fastq
Fish.193.fastq
Any ideas or suggestions on how to fix this, or on whether it needs to change completely, are greatly appreciated.
At the core of a Perl solution, you could use
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa
For example,
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ rename 's/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa' *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
(There are two similar tools named rename. This one is also known as prename.)
It's pretty simple to implement yourself:
#!/usr/bin/perl
use strict;
use warnings;
my $errors = 0;
for (@ARGV) {
    my $old = $_;
    s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa;
    my $new = $_;
    next if $new eq $old;
    if ( -e $new ) {
        warn( "Can't rename \"$old\" to \"$new\": Already exists\n" );
        ++$errors;
    }
    elsif ( !rename( $old, $new ) ) {
        warn( "Can't rename \"$old\" to \"$new\": $!\n" );
        ++$errors;
    }
}
exit( !!$errors );
Provide the files to rename as arguments (e.g. using *.fastq from the shell).
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ ./a *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
The existence check (-e) is to prevent accidentally renaming a bunch of files to the same name and therefore losing all but one of them.
The above is a cleaned-up version of a one-liner pattern I often use (shown here for the Windows command prompt):
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n"
Adapted to sh:
\ls ... | perl -nle'$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n'
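For readers more at home in Python, the same idea (substitute, skip unchanged names, refuse to overwrite an existing file) can be sketched as below. This is only an illustration under assumptions: the names fish_name and rename_fish are invented here, and like the Perl script it expects bare file names in the current directory, since the leading .* would also swallow any directory part.

```python
import os
import re
import sys

# Same pattern as the Perl substitution; re.ASCII and re.DOTALL mirror /a and /s,
# and \Z anchors at the very end of the string like Perl's \z.
PATTERN = re.compile(r"^.*-(\d+)_[^-]+(?=\.fastq\Z)", re.ASCII | re.DOTALL)


def fish_name(old):
    """Return the new name for `old`, or `old` unchanged if it does not match."""
    return PATTERN.sub(r"Fish.\1", old, count=1)


def rename_fish(names):
    """Rename each file name in `names`; return the number of errors."""
    errors = 0
    for old in names:
        new = fish_name(old)
        if new == old:
            continue  # pattern did not match, leave the file alone
        if os.path.exists(new):
            # Refuse to clobber a file that already has the target name.
            print(f"Can't rename {old!r} to {new!r}: already exists", file=sys.stderr)
            errors += 1
            continue
        try:
            os.rename(old, new)
        except OSError as exc:
            print(f"Can't rename {old!r} to {new!r}: {exc}", file=sys.stderr)
            errors += 1
    return errors
```

It could be driven from the command line with something like sys.exit(1 if rename_fish(sys.argv[1:]) else 0), which mirrors the exit status of the Perl version.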

Create a text file with each iteration of a python loop

I have a python script that runs a C program :
cmd = ["/Users/stordd/Desktop/StageI2M/C/forestenostre/grezza_foresta", "-w", "/Users/stordd/Desktop/StageI2M/Leiden/text_file/USA.txt", "-m", "5", "-e", "-0"]
# Open/Create the output file
outFile = open("/Users/stordd/Desktop/StageI2M/Leiden/text_file/Output.txt", 'wb')
result = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = result.stdout.read()
outFile.write(out)
outFile.close()
It takes a text file as input and creates a text file as output. I want to write a loop that creates a text file at each iteration without replacing the previous one. How can I do that?
In the open() call, use "a" (append) in the mode so that each run adds to the file instead of overwriting it:
outFile = open("/Users/stordd/Desktop/StageI2M/Leiden/text_file/Output.txt", 'ab')
https://docs.python.org/3/library/functions.html#open
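Append mode keeps the earlier output, but everything still ends up in one file. If a separate file per iteration is what is wanted, one option is to put the loop counter into the file name. A minimal sketch under assumptions: the helper run_to_numbered_files and the Output_<i>.txt naming scheme are invented for illustration and are not part of the question.

```python
import os
import subprocess


def run_to_numbered_files(cmd, out_dir, iterations):
    """Run `cmd` once per iteration, saving stdout to Output_<i>.txt.

    Hypothetical helper; the naming scheme is an assumption.
    Returns the list of files that were written.
    """
    paths = []
    for i in range(1, iterations + 1):
        out_path = os.path.join(out_dir, f"Output_{i}.txt")
        # "wb" is safe here because every iteration gets its own file name.
        with open(out_path, "wb") as out_file:
            subprocess.run(cmd, stdout=out_file, check=True)
        paths.append(out_path)
    return paths
```

With the cmd list from the question, run_to_numbered_files(cmd, "/Users/stordd/Desktop/StageI2M/Leiden/text_file", 3) would leave Output_1.txt to Output_3.txt in that directory.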

How to open multiple files in a directory

I need to build a simple script in Python 3 that opens several files inside a directory and checks whether they contain a keyword.
All the files inside the directory are named like this: "f*.formatoffile" (* stands for a random number)
Example:
f9993546.txt
f10916138.txt
f6325802.txt
Obviously, I only need to open the .txt files.
Thanks in advance!
Final script:
import os
Path = "path of files"
filelist = os.listdir(Path)
for x in filelist:
    if x.endswith(".txt"):
        try:
            with open(Path + x, "r", encoding="ascii") as y:
                for line in y:
                    if "firefox" in line:
                        print ("Found in %s !" % (x))
        except:
            pass
This should do the trick:
import os
Path = "path of the txt files"
filelist = os.listdir(Path)
for i in filelist:
    if i.endswith(".txt"):  # You could also add "and i.startswith('f')"
        with open(os.path.join(Path, i), 'r') as f:
            for line in f:
                # Here you can check (with a regex, an if, or whatever) whether the keyword is in the line.
                pass
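As a variation on the loop above, the file-name filter and the keyword check can be folded into one small function. This is a sketch, not the only way to do it: files_containing is a hypothetical name, the "f*.txt" glob pattern is taken from the naming described in the question, and decoding as UTF-8 with errors="replace" is an assumption made so that odd bytes do not abort the scan.

```python
from pathlib import Path


def files_containing(directory, keyword, pattern="f*.txt"):
    """Return the sorted names of files matching `pattern` that contain `keyword`."""
    matches = []
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue  # skip directories that happen to match the pattern
        # errors="replace" keeps undecodable bytes from raising an exception.
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            if any(keyword in line for line in handle):
                matches.append(path.name)
    return matches
```

For example, files_containing("path of the txt files", "firefox") would return the names of the matching files instead of printing them.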

How to create a dynamic variable array name and fill it with multi-line text in bash script

I need to create a dynamic variable name for some arrays and fill it with multi-line text.
What I actually have is this :
#!/bin/bash
IFS=$'\n'
# Set an array with 1 item
ARRAY=("Item1")
# Get a description for the last item from a simple text file
DESCRIPTION=("$(cat test1.txt)")
# Get the number of items in the array
ARRAY_ITEMS_COUNT=${#ARRAY[@]}
# Create a variable name containing the number of items in the array as identifier
# and fill it with the description
eval ARRAY_ITEM${ARRAY_ITEMS_COUNT}_DESCRIPTION="(\"${DESCRIPTION[@]}\")"
# Display some results
echo "ARRAY_ITEM1_DESCRIPTION[@] = \"${ARRAY_ITEM1_DESCRIPTION[@]}\""
echo "ARRAY_ITEM1_DESCRIPTION[0] = \"${ARRAY_ITEM1_DESCRIPTION[0]}\""
echo "ARRAY_ITEM1_DESCRIPTION[1] = \"${ARRAY_ITEM1_DESCRIPTION[1]}\""
echo
# Same as above with a different text file
ARRAY=("Item1" "Item2")
DESCRIPTION=("$(cat test2.txt)")
ARRAY_ITEMS_COUNT=${#ARRAY[@]}
# Get an error here due to the ' character used in the text file
eval ARRAY_ITEM${ARRAY_ITEMS_COUNT}_DESCRIPTION="(\"${DESCRIPTION[@]}\")"
echo "ARRAY_ITEM2_DESCRIPTION[@] = \"${ARRAY_ITEM2_DESCRIPTION[@]}\""
echo "ARRAY_ITEM2_DESCRIPTION[0] = \"${ARRAY_ITEM2_DESCRIPTION[0]}\""
echo "ARRAY_ITEM2_DESCRIPTION[1] = \"${ARRAY_ITEM2_DESCRIPTION[1]}\""
The files "test1.txt" and "test2.txt" are as follow :
test1.txt
Simple text file with multi-lines used as
a test without special characters inside.
test2.txt
Simple text file with multi-lines used as
a test with single ' and double " quotes.
Expected result :
ARRAY_ITEM1_DESCRIPTION[@] = "Simple text file with multi-lines used as
a test without special characters inside."
ARRAY_ITEM1_DESCRIPTION[0] = "Simple text file with multi-lines used as"
ARRAY_ITEM1_DESCRIPTION[1] = "a test without special characters inside."
ARRAY_ITEM2_DESCRIPTION[@] = "Simple text file with multi-lines used as
a test with single ' and double " quotes."
ARRAY_ITEM2_DESCRIPTION[0] = "Simple text file with multi-lines used as"
ARRAY_ITEM2_DESCRIPTION[1] = "a test with single ' and double " quotes."
Current result :
ARRAY_ITEM1_DESCRIPTION[@] = "Simple text file with multi-lines used as
a test without special characters inside."
ARRAY_ITEM1_DESCRIPTION[0] = "Simple text file with multi-lines used as
a test without special characters inside."
ARRAY_ITEM1_DESCRIPTION[1] = ""
./test.sh: eval: line 28: unexpected EOF while looking for matching `"'
./test.sh: eval: line 29: syntax error: unexpected end of file
ARRAY_ITEM2_DESCRIPTION[@] = ""
ARRAY_ITEM2_DESCRIPTION[0] = ""
ARRAY_ITEM2_DESCRIPTION[1] = ""
I have tried a lot of things, but nothing gives the expected result. Can someone please help me solve these two issues:
How to get proper array containing each line of the text file on each index
How to make it work even with quotes characters in the texts
EDIT: Working solution (Bash version 4 or later):
#!/bin/bash
# Set an array with 1 item
ARRAY=("Item1")
# Get the number of items in the array
ARRAY_ITEMS_COUNT=${#ARRAY[@]}
# Create a variable name containing the number of items in the array as identifier
# and fill it with the description
readarray -t "ARRAY_ITEM${ARRAY_ITEMS_COUNT}_DESCRIPTION" < test1.txt
# Display some results
echo "ARRAY_ITEM1_DESCRIPTION[@] = \"${ARRAY_ITEM1_DESCRIPTION[@]}\""
echo "ARRAY_ITEM1_DESCRIPTION[0] = \"${ARRAY_ITEM1_DESCRIPTION[0]}\""
echo "ARRAY_ITEM1_DESCRIPTION[1] = \"${ARRAY_ITEM1_DESCRIPTION[1]}\""
echo
# Same as above with a different text file
ARRAY=("Item1" "Item2")
ARRAY_ITEMS_COUNT=${#ARRAY[@]}
# No error here anymore, even with the ' character used in the text file
readarray -t "ARRAY_ITEM${ARRAY_ITEMS_COUNT}_DESCRIPTION" < test2.txt
echo "ARRAY_ITEM2_DESCRIPTION[@] = \"${ARRAY_ITEM2_DESCRIPTION[@]}\""
echo "ARRAY_ITEM2_DESCRIPTION[0] = \"${ARRAY_ITEM2_DESCRIPTION[0]}\""
echo "ARRAY_ITEM2_DESCRIPTION[1] = \"${ARRAY_ITEM2_DESCRIPTION[1]}\""
Thanks for your help, have a nice day.
Slander
When you call cat in an array assignment, you shouldn't quote the command substitution if you want the file to be read line by line. With the quotes, the whole content of the file is treated as a single string, so the array ends up with one element. Without them, the output is split on IFS (which your script sets to a newline). Just try:
DESCRIPTION=($(cat test1.txt))
Also, if you are using Bash version 4 or later, you can use the builtin command readarray to fill an array:
readarray -t DESCRIPTION < "test1.txt"
For Bash versions before 4, this could be an alternative to cat and readarray:
IFS=$'\n' read -r -d '' -a DESCRIPTION < "test1.txt"

KeyProperty object has no attribute get

I placed a _post_put_hook into one of my NDB model types so that whenever an entity of that type is put, it invalidates a memcache key. This key is built from the urlsafe version of the settings key. However, when this code runs, it fails with:
Traceback (most recent call last):
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1536, in __call__
rv = self.handle_exception(request, response, e)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1530, in __call__
rv = self.router.dispatch(request, response)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 1102, in __call__
return handler.dispatch()
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "C:\Program Files (x86)\Google\google_appengine\lib\webapp2\webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "U:\Hefner\Dropbox\Public\Projects\GHI\dev\rpc.py", line 68, in get
result = func(*args)
File "U:\Hefner\Dropbox\Public\Projects\GHI\dev\rpc.py", line 154, in pub_refreshSandbox
team_key = s.create.team("Cool Group")
File "U:\Hefner\Dropbox\Public\Projects\GHI\dev\GlobalUtilities.py", line 534, in team
new_team.put()
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\model.py", line 2902, in _put
return self._put_async(**ctx_options).get_result()
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\tasklets.py", line 320, in get_result
self.check_success()
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\tasklets.py", line 315, in check_success
self.wait()
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\tasklets.py", line 299, in wait
if not ev.run1():
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\eventloop.py", line 219, in run1
delay = self.run0()
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\eventloop.py", line 181, in run0
callback(*args, **kwds)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\tasklets.py", line 454, in _on_future_completion
self._help_tasklet_along(gen, val)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\tasklets.py", line 368, in _help_tasklet_along
self.set_result(result)
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\tasklets.py", line 264, in set_result
callback(*args, **kwds)
File "U:\Hefner\Dropbox\Public\Projects\GHI\dev\DataModels.py", line 182, in _post_put_hook
tools.expireCache('allteams-' + self.settings.get().websafe)
AttributeError: 'KeyProperty' object has no attribute 'get'
Here is the relevant model class:
class Team(ndb.Expando):
    name = ndb.StringProperty()
    show_team = ndb.BooleanProperty()
    settings = ndb.KeyProperty()
    @classmethod
    def _post_put_hook(self, future):
        memcache.delete('allteams-' + self.settings.get().websafe)
Ideas?
In this case self.settings is not the actual key but the model's property, because this is a classmethod and not an instance method. You need to work with the future object.
Here are the docs: https://developers.google.com/appengine/docs/python/ndb/futureclass
In this case:
@classmethod
def _post_put_hook(self, future):
    entitykey = future.get_result()
    entity = entitykey.get()
    memcache.delete('allteams-' + entity.settings.get().websafe)
I'm not sure what websafe does for you. Maybe you mean entity.settings.urlsafe()?
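To see why the error mentions a KeyProperty object rather than a key, here is a small plain-Python illustration. It does not use NDB at all: FakeKeyProperty is a hypothetical stand-in that only mimics the usual descriptor behaviour of returning the property object itself when it is looked up on the class, which is what a classmethod does.

```python
class FakeKeyProperty:
    """Hypothetical stand-in for a model property such as ndb.KeyProperty."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # looked up on the class: you get the property object
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Team:
    settings = FakeKeyProperty()

    @classmethod
    def seen_from_classmethod(cls):
        # The first argument is the class, whatever it is named.
        return cls.settings

    def seen_from_instance(self):
        # The first argument is the entity, so the stored value comes back.
        return self.settings
```

With team = Team() and team.settings = "some-key", team.seen_from_instance() returns the stored value, while team.seen_from_classmethod() returns the FakeKeyProperty object, which has no get method. That is the same shape as the AttributeError in the traceback.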

Resources