Parsing unique data and renaming files - loops

I have been trying to write a Perl script to rename files (hundreds of them, all with different names), without success. For each file I first need to find its unique number and then rename the file to something more human-readable. Because the file names are not sequential, this is difficult.
Examples of file names (the number of interest comes right after the index sequences, e.g. 01 in ...-TATAGCCT-01_S244_...):
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
lane8-s250-index--TCCGGAGA-AGGCGAAG-13_S250_L008_R1_001.fastq
lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
lane7-s0008-index--ATTACTCG-ATAGAGGC-105_S8_L007_R1_001.fastq
lane7-s0009-index--ATTACTCG-CCTATCCT-195_S9_L007_R1_001.fastq
lane7-s0010-index--ATTACTCG-GGCTCTGA-106_S10_L007_R1_001.fastq
lane7-s0011-index--ATTACTCG-AGGCGAAG-197_S11_L007_R1_001.fastq
lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.fastq
I have created a file called RENAMING_parse_data.sh that calls RENAMING_parse_data.pl.
In theory, the script parses each file name to find the sample # in the middle of the name, then uses that unique ID to rename the file. But I don't think it's even entering the if block.
Any ideas?
HERE IS THE .sh file that calls the Perl script:
#!/bin/bash
#first item is the program
#second is the directory path
#third and fourth items are the names of the output files
#./parse_data.pl /ACTF/Course/PATHTODIRECTORY Tabsummary.txt Strucsummary.txt
#WHERE ./parse_data.pl = name of the program
#WHERE /ACTF/Course/PATHTODIRECTORY = directory path where your files are saved AND is referred to as $dir_in = $ARGV[0] in the perl script;
#the new file you are creating with the extracted data is referred to as $indv_list = $ARGV[1];
./RENAMING_parse_data.pl ./Test/ FishList.txt
HERE IS THE PERL SCRIPT:
#!/usr/bin/perl
print (":)\n");
#Processing files in a directory
$dir_in = $ARGV[0];
$indv_list = $ARGV[1];
#open the directory to access the files (the folder where you have the files)
opendir(DIR, $dir_in) || die ("Cannot open $dir_in");
@files = readdir(DIR);
#set all variables = 0 to avoid chaos
$j=0;
#open output header line for output file and print header line for tab delimited file
open(OUTFILETAB, ">", $indv_list);
print(OUTFILETAB "\t Fish ID", "\t");
#open each file
foreach (@files){
#restart all arrays to avoid chaos
print("in loop [$j]");
@acc_ID=();
#find FISH name
#EXAMPLE FISH NAMES: (length of fish name varies)
#lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq.gz
#lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.final.fastq
#NOTE: what is between () is the ID that is printed. NOTE that the value can be 2-3 digits long depending on the sample #
#Trials:
#lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})[a-z]{1}[0-9]{2}_[A-Z]{1}[0-9]{3}_[a-z]{1}[0-9]{1}_[0-9]{3}.fastq
#lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})*.fastq
#lane*([0-9]{3})*.fastq
#lane.*-([0-9]{2})_.*.fastq
#lane.*-([0-9]{2})_*.fastq
#lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq
$string_FISH = @files;
if ($string_FISH =~ /^lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq/){
$FISH_ID =$1;
#acc_ID[$j] = $FISH_ID;
#print ("FISH. = |$FISH_ID[$j]| \n");
rename($string_FISH, "FISH. = |$FISH_ID[$j]|");
#print ($acc_ID[$j], "\n");
print(OUTFILETAB "FISH. = |$FISH_ID[$j]| \n");
}
$j= $j+1;
}
IDEAL END RESULT
So in the end I would like it to take the file name, find the unique identifier and rename it
from :
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
to:
Fish.01.fastq
Fish.193.fastq
Any ideas or suggestions on how to fix this, or on whether it needs to change completely, are greatly appreciated.

At the core of a Perl solution, you could use
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa
For example,
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ rename 's/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa' *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
(There are two similar tools named rename. This one is also known as prename.)
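To make the substitution above easier to adapt, here is the same pattern written out with the /x modifier and wrapped in a small function. This is a sketch: the name fish_name is made up, and it returns undef for names that don't fit, so a caller can skip those files instead of renaming them blindly.

```perl
use strict;
use warnings;

# Illustrative helper: turn one original name into its "Fish.<id>.fastq"
# form, or return undef if the name does not have the expected shape.
sub fish_name {
    my ($old) = @_;
    (my $new = $old) =~ s{
        ^ .* -             # everything up to the last dash before the ID
        ( \d+ )            # capture 1: the sample ID, e.g. 01 or 193
        _ [^-]+            # the "_S244_L008_R1_001" part (no dashes allowed)
        (?= \.fastq \z )   # must be followed by ".fastq" at the very end
    }{Fish.$1}xsa or return undef;
    return $new;
}

# Print the planned new name for each file given on the command line.
for my $old (@ARGV) {
    my $new = fish_name($old);
    print defined $new ? "$old -> $new\n" : "skipping $old\n";
}
```

Because the lookahead only requires the name to end in .fastq, a name like ...-287_S96_L007_R1_001.final.fastq also becomes Fish.287.fastq, while .fastq.gz files are left alone.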
It's pretty simple to implement yourself:
#!/usr/bin/perl
use strict;
use warnings;
my $errors = 0;
for (@ARGV) {
my $old = $_;
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa;
my $new = $_;
next if $new eq $old;
if ( -e $new ) {
warn( "Can't rename \"$old\" to \"$new\": Already exists\n" );
++$errors;
}
elsif ( !rename( $old, $new ) ) {
warn( "Can't rename \"$old\" to \"$new\": $!\n" );
++$errors;
}
}
exit( !!$errors );
Provide the files to rename as arguments (e.g. using *.fastq from the shell).
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ ./a *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
The existence check (-e) is to prevent accidentally renaming a bunch of files to the same name and therefore losing all but one of them.
The above is a cleaned-up version of a one-liner pattern I often use (in Windows cmd):
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n"
Adapted to sh:
\ls ... | perl -nle'$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n'
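If you would rather keep the question's original structure (a directory argument instead of a shell glob), the same substitution can be driven from opendir/readdir. A sketch, with assumptions: plan_renames is a made-up helper, the script only prints the plan unless given a second argument --doit (my own convention, not part of either answer), and it refuses a batch in which two files would map to the same new name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hashref of old name => new name for every
# .fastq file in $dir whose name fits the pattern. Dies if two files would
# be given the same new name.
sub plan_renames {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    my @entries = readdir $dh;
    closedir $dh;
    my (%plan, %taken);
    for my $old (sort @entries) {
        next unless -f "$dir/$old";    # skips ".", ".." and subdirectories
        (my $new = $old) =~ s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa
            or next;                   # name doesn't fit: leave it alone
        die "Both $taken{$new} and $old would become $new\n"
            if exists $taken{$new};
        $taken{$new} = $old;
        $plan{$old}  = $new;
    }
    return \%plan;
}

# Usage: perl plan.pl DIR [--doit]   (without --doit it only prints the plan)
if (@ARGV) {
    my ($dir, $flag) = @ARGV;
    my $doit = defined $flag && $flag eq '--doit';
    my $plan = plan_renames($dir);
    for my $old (sort keys %$plan) {
        my ($from, $to) = ("$dir/$old", "$dir/$plan->{$old}");
        print "$from -> $to\n";
        next unless $doit;
        if (-e $to) { warn "Skipping: $to already exists\n"; next }
        rename($from, $to) or warn "Can't rename $from: $!\n";
    }
}
```

Running it once without --doit is a cheap way to check the regex against the real file names before anything is touched.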

Related

Creating array of file names using grep

I'm having difficulty outputting file names as an array using grep. Specifically, I want to create an array of file names (plant photos) formatted like this:
Ilex_verticillata= Ilex_verticillata1.png, Ilex_verticillata2.png, Ilex_verticillata3.png
Asarum_canadense= Asarum_canadense1.png, Asarum_canadense2.png
Ageratina_altissima= Ageratina_altissima1.png, Ageratina_altissima2.png
Here's my original Perl script that I'm attempting to modify. It returns, as intended, ONE file name per plant as "Genus_species", printing a list of those plants:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files =
map { s/1\.png\z/.png/r } # Removes "1" from end of file names
grep { /^[^2-9]*\.png\z/i && /_/ } # Finds "Genus_species.png" & "Genus_species1.png" and returns one file name per plant as "Genus_species.png"
readdir $dfh;
foreach my $file (@files) {
$file =~s/\.png//; # Removes ".png" extension
print "$file\n"; #Prints list of file names (plant names)
}
Here's the output:
Ilex_verticillata
Asarum_canadense
Ageratina_altissima
However, since each plant often has MULTIPLE photos (e.g. "Genus_species1.png", "Genus_species2.png", etc.), I need to re-grep the directory using the above output to find their file names, then output the results in the form of an array as previously illustrated.
I know the solution likely involves modifying the "foreach" statement, using grep to return ALL file names with "Genus_species" in their name. Here's what I tried:
foreach my $file (@files) {
$file =~s/\.png//;
grep ($file,readdir(DIR));
print "$file = $file\n";
}
But the output was this:
Ilex_verticillata = Ilex_verticillata
Asarum_canadense = Asarum_canadense
Ageratina_altissima = Ageratina_altissima
Again, I want to output an array formatted as:
"Genus_species= Genus_species1.png, Genus_species2.png, etc.," meaning I want it to look like this:
Ilex_verticillata= Ilex_verticillata1.png, Ilex_verticillata2.png, Ilex_verticillata3.png
Asarum_canadense= Asarum_canadense1.png, Asarum_canadense2.png
Ageratina_altissima= Ageratina_altissima1.png, Ageratina_altissima2.png
Notice that I also want to add back the ".png" extension ONLY to the file names to the right of the equals sign.
Please advise. Thanks.
readdir returns a list of the entries in the directory. You've processed them all in one chained map/grep expression, which is compact, but if you loop over them you can do more with each item.
#!/usr/bin/perl
use strict;
use warnings;
use English; ## use names rather than symbols for special variables
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir my $dfh, $dir or die "Can't open $dir: $OS_ERROR";
my %genus_species; ## store matching entries in a hash
for my $file (readdir $dfh)
{
next unless $file =~ /\d+\.png$/; ## skip entry unless it is a png file whose name ends with a number
my $genus = $file =~ s/\d+\.png$//r; ## strip the whole number and the extension (s///r needs Perl 5.14+)
push(@{$genus_species{$genus}}, $file); ## push to the array; @{} dereferences the hash value as an array, which is created on first use
}
for my $genus (sort keys %genus_species)
{
print "$genus= ", join(', ', sort @{$genus_species{$genus}}), "\n"; ## "Genus_species= a.png, b.png", as requested
}
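To check the grouping without touching the real images folder, the logic can be moved into a function that takes a list of names. This is a sketch: format_groups is a made-up name, it assumes photos are named Genus_species<N>.png as in the question, and it sorts each group on the trailing number so that ...10.png comes after ...9.png (a plain string sort would not).

```perl
use strict;
use warnings;

# Hypothetical helper: group "Genus_species<N>.png" names by plant and
# return one "Genus_species= a.png, b.png" line per plant.
sub format_groups {
    my %by_plant;
    for my $file (@_) {
        # capture 1: plant name (must contain "_"); capture 2: photo number
        next unless $file =~ /^(.+_.+?)(\d+)\.png\z/i;
        push @{ $by_plant{$1} }, { name => $file, num => $2 };
    }
    my @lines;
    for my $plant (sort keys %by_plant) {
        # numeric sort on the photo number, so ...10.png comes after ...9.png
        my @names = map { $_->{name} }
                    sort { $a->{num} <=> $b->{num} } @{ $by_plant{$plant} };
        push @lines, "$plant= " . join(', ', @names);
    }
    return @lines;
}

print "$_\n" for format_groups(
    'Ilex_verticillata2.png', 'Ilex_verticillata1.png',
    'Asarum_canadense1.png',  'readme.txt',
);
```

In the real script you would pass it the output of readdir $dfh instead of a hard-coded list.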

Regexp to Compare partial filenames then moving to another directory perl

I am working on a script to compare non-running files within a directory to the running files reported by a command. I have to use a regex to strip the front half of the file names from the directory, then another regex to extract the file names from the command's output; the unmatched names are then recorded in an array.
The part I cannot figure out is how I can move the filenames from the old dir into a new directory for future deletion.
In order to move the files I will need to enclose the names in * wildcards, because of the random numbers in front of the file names and the extension.
example filenames before and after:
within dir:
13209811124300209156562070_cake_872_trucks.rts
within command:
{"file 872","cake_872_trucks.rts",running}
in @events array:
cake_872_trucks
My code:
#!/usr/bin/perl -w
use strict;
use warnings;
use File::Copy qw(move);
use Data::Dumper;
use List::Util 'max';
my $orig_dir = "/var/user/data/";
my $dest_dir = "/var/user/data/DeleteMe/";
my $dir = "/var/user/data";
opendir(DIR, $dir) or die "Could not open $dir: $!\n";
my @allfiles = readdir DIR;
closedir DIR;
my %files;
foreach my $allfiles(@allfiles) {
$allfiles =~ m/^(13{2}638752056463{2}635181_|1[0-9]{22}_|1[0-9]{23}_|1[0-9]{24}_|1[0-9]{25}_)([0-9a-z]{4}_8[0-9a-z]{2}_[0-9a-z]{2}[a-z][0-9a-z]0[0-9]\.rts|[a-z][0-9a-z]{3}_[0-9a-z]{4}_8[0-9a-z]{2}_[0-9a-z]{2}[a-z]{2}0[0-9]\.rts|[a-z]{2}[0-9a-z][0-9]\N[0-9a-z]\N[0-9]\N[0-9]\N[0-9a-z]{4}\N[0-9]\.rts|[a-z]{2}[0-9a-z]{2}\N{2}[0-9a-z]{2}\N{2}[0-9][0-9a-z]{2}\N[0-9]{2}\.rts|S0{2}2_86F_JATD_01ZF\.rts)$/im;
$files{$2} = [$1];
}
my @stripfiles = keys %files;
my $cmd = "*****";
my @runEvents = `$cmd`;
chomp @runEvents;
foreach my $running(@runEvents) {
$running =~ s/^\{"blah 8[0-9a-z]{2}","(?<field2>CBE1_D{3}1_8EC_J6TG0{2}\.rts|[0-9a-z]{4}_8[0-9a-z]{2}_[0-9a-z]{2}[a-z][0-9a-z]0[0-9]\.rts|[a-z]{2}[0-9a-z]{2}\N{2}[0-9a-z]{2}\N{2}[0-9][0-9a-z]{2}\N[0-9]{2}\.rts)(?:",\{239,20,93,5\},310{2},20{3},run{2}ing\}|",\{239,20,93,5\},310{2},[0-9]{2}0{3},run{2}ing\}|",\{239,20,93,5\},310{2},[0-9]{3}0{4},run{2}ing\}|",\{239,20,93,5\},3[0-9]0{2},[0-9]{2}0{4},run{2}ing\})$/$+{field2}/img;
}
my @events = grep {my $x = $_; not grep {$x =~/\Q$_/i}@runEvents}@stripfiles;
foreach my $name (@events) {
my ($randnum, $fnames) = { $files{$name}};
my $combined = $randnum . $fnames;
print "Move $file from $orig_dir to $dest_dir";
move ("$orig_dir/$files{$name}", $dest_dir)
or warn "Can't move $file: $!";
}
#print scalar(grep $_, @stripfiles), "\n";
#returned 1626
#print scalar(grep $_, @runEvents), "\n";
#returned 102
#print scalar(grep $_, @allfiles), "\n";
#returned 1906
Since you are already parsing file names with a regex, you can capture all of their parts, so that you can later reassemble whatever parts of the file name you need.
I assume that that overly long (and incomplete) regex does what it is meant to.
I am not sure how the files to move relate to the original files in @allfiles, since those are fetched from /var/user/data while your moving attempt uses /home/user/RunBackup. So the code snippets below are more generic.
If what gets moved are precisely the files from @allfiles, then just keep the whole file name:
my %files;
foreach my $oldfile (@allfiles) {
next unless $oldfile =~ m/...(...).../; # your regex, but capture the name; skip non-matches
$files{$1} = $oldfile;
}
where by /...(...).../ I mean that you use your regex, but add parentheses around the part of the pattern that matches the name itself.
Then you can later retrieve the filename from the "name" of interest (cake_872_trucks).
If, however, the filename components may be needed to patch a different (while related) filename then capture and store the individual components
my %files;
foreach my $oldfile (@allfiles) {
next unless $oldfile =~ m/(...)(...)(...)/; # your regex, just with capture groups
$files{$2} = [$1, $3]; # add to %files: name => [number, ext]
}
The regex only matches (why change names in @allfiles with s///?), and captures.
The first set of parentheses captures that long leading factor (number) into $1, the second one gets the name (cake_872_trucks) into $2, and the third one has the extension, in $3.
So you end up with a hash with keys that are names of interest, with their values being arrayrefs with all other needed components of the filename. Please adjust as needed as I don't know what that regex does and may have missed some parts.
Now, once you go through @events, you can rebuild the name:
use feature 'say';
use File::Copy qw(move);
foreach my $name (@events) {
my ($num, $ext) = @{ $files{$name} };
my $file = $num . $name . $ext;
say "Move $file from $orig_dir to $dest_dir";
move("$orig_dir/$file", $dest_dir) or warn "Can't move $file: $!";
}
But if the files to move are indeed from @allfiles (as would be the case in this example), then use the first version above to store the file names as values in %files, and now retrieve them:
foreach my $name (@events) {
move ("$orig_dir/$files{$name}", $dest_dir)
or warn "Can't move $files{$name}: $!";
}
I use the core module File::Copy, instead of going out to the system for the move command.
You can also rebuild the name by going through the directory again, now with the names of interest on hand. But that'd be very expensive, since you have to try to match every name in @events against every file read from the directory (O(mn) complexity).
What you asked about can be accomplished with glob (and note File::Glob's version)
my @files = glob "$dir/*${name}*";
but you'd have to do this for every $name -- a huge and needless waste of resources.
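Putting the capture-and-rebuild approach from this answer together in runnable form needs a concrete pattern, so the sketch below substitutes a deliberately simple stand-in regex, ^(\d+_)(.+)(\.rts)\z, for the question's long one. index_files and full_name are made-up helper names, and the "running" list is hard-coded where the real script would parse the command's output.

```perl
use strict;
use warnings;

# Stand-in pattern (an assumption, NOT the question's regex):
# capture 1 = numeric prefix with its "_", 2 = name, 3 = extension.
my $file_re = qr/^(\d+_)(.+)(\.rts)\z/;

# Build name => [number prefix, extension] from directory entries.
sub index_files {
    my %files;
    for my $oldfile (@_) {
        next unless $oldfile =~ $file_re;   # skips ".", "..", other files
        $files{$2} = [$1, $3];
    }
    return \%files;
}

# Reassemble the full file name for one name of interest.
sub full_name {
    my ($files, $name) = @_;
    my ($num, $ext) = @{ $files->{$name} };
    return $num . $name . $ext;
}

my $files = index_files('.', '..', '13209811124300209156562070_cake_872_trucks.rts',
                        '13209811124300209156562071_cars_900_boats.rts');
my @running_names = ('cars_900_boats');          # would come from the command
my %running = map { $_ => 1 } @running_names;
my @events = grep { !$running{$_} } sort keys %$files;
print full_name($files, $_), "\n" for @events;   # only the non-running file
```

Each name is looked up once in a hash, so there is no need to rescan the directory or glob once per name.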
If that regex really must spell out specific numbers, here is a way to organize it for easier digestion (and debugging!): break it into reasonable parts, with a separate variable for each.
Ideally each part of the alternation would be its own variable:
my $p1 = qr/.../;
my $p2 = qr/.../;
...
my $re_alt = join '|', $p1, $p2, ...;
my $re_other = qr/.../;
$var =~ m/^($re_alt)($re_other)(.*)$/; # adjust anchors, captures, etc
where the qr operator builds a regex pattern.
Adjust those capturing parentheses, anchors, etc. to your actual needs. Breaking the regex up sensibly into variables will go a long way toward readability, and thus correctness.
Assuming that there is a good reason to seek those specific numbers in filenames, this is also a good way to document any such fixed factors.
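As an illustration of that layout, here are a few made-up sub-patterns (not the real ones from the question) composed with qr// and matched against the example file name. The /x modifier on each piece lets you space out and comment the parts individually.

```perl
use strict;
use warnings;

# Made-up pieces for illustration; substitute the real alternatives.
my $p_num  = qr/ \d{20,30} _ /x;               # long numeric prefix
my $p1     = qr/ [a-z]+ _ \d{3} _ [a-z]+ /x;   # e.g. cake_872_trucks
my $p2     = qr/ S \d+ _ [A-Z0-9_]+ /x;        # some other name shape
my $re_alt = join '|', $p1, $p2;               # one alternative per variable
my $re_ext = qr/ \.rts /x;

my $file = '13209811124300209156562070_cake_872_trucks.rts';
if ($file =~ m/^ ($p_num) ($re_alt) ($re_ext) \z/x) {
    print "number=$1 name=$2 ext=$3\n";
}
```

Each interpolated qr// keeps its own flags, and the capture group around $re_alt keeps the | from splitting the whole pattern.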
I guess you need something like this:
my $path = '/home/user/RunBackup/';
my @files = map {$path."*$_*"} @events;
system(join " ", "mv", @files, "/home/user/RunBackup/files/");
If there are lots of files you might need to move them one by one:
system(join " ", "mv", $_, "/home/user/RunBackup/files/") for @files;
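Joining everything into one string for system sends it through the shell, so a file name containing spaces or shell metacharacters can break the command. A sketch of the same idea without a shell, using File::Glob's bsd_glob (which, unlike glob, does not split its pattern on whitespace) and File::Copy's move; move_matching is a made-up name, and it assumes the names contain no glob metacharacters.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Glob qw(bsd_glob);

# Hypothetical helper: move files in $src whose names contain any of @names
# into $dest, without a shell. Assumes the names contain no glob
# metacharacters (* ? [ ] { } ~ \). Returns how many files were moved.
sub move_matching {
    my ($src, $dest, @names) = @_;
    my $moved = 0;
    for my $name (@names) {
        # bsd_glob does not split the pattern on whitespace, unlike glob()
        for my $file (bsd_glob("$src/*$name*")) {
            next unless -f $file;                    # ignore directories
            if (move($file, $dest)) { $moved++ }
            else { warn "Can't move $file: $!\n" }
        }
    }
    return $moved;
}

# Example call with the paths from the answer above:
# move_matching('/home/user/RunBackup', '/home/user/RunBackup/files', @events);
```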

comparing two filename arrays for differences

Below is my attempt at loading all file names in a text file into an array and comparing that array to the file names in a separate directory. I would like to identify the file names that are in the directory but not in the file, so I can then process those files. I am able to load both lists successfully, but the compare operation outputs all the files, not just the difference.
Thank you in advance for the assistance.
use File::Copy;
use Net::SMTP;
use POSIX;
use constant DATETIME => strftime("%Y%m%d", localtime);
use Array::Utils qw(:all);
use strict;
use warnings;
my $currentdate = DATETIME;
my $count;
my $ErrorMsg = "";
my $MailMsg = "";
my $MstrTransferLogFile = ">>//CFVFTP/Users/ssi/Transfer_Logs/Artiva/ARTIVA_Mstr_Transfer_Log.txt";
my $DailyLogFile = ">//CFVFTP/Users/ssi/Transfer_Logs/Artiva/ARTIVA_Daily_Transfer_Log_" . DATETIME . ".txt";
my $InputDir = "//CFVFTP/Users/ssi/Transfer_Logs/folder1/";
my $MoveDir = "//CFVFTP/Users/ssi/Transfer_Logs/folder2/";
my $filetouse;
my @filetouse;
my $diff;
my $file1;
my $file2;
my %diff;
open (MSTRTRANSFERLOGFILE, $MstrTransferLogFile) or $ErrorMsg = $ErrorMsg . "ERROR: Could not open master transfer log file!\n";
open (DAILYLOGFILE, $DailyLogFile) or $ErrorMsg = $ErrorMsg . "ERROR: Could not open daily log file!\n";
#insert all files in master transfer log into array for cross reference
open (FH, "<//CFVFTP/Users/ssi/Transfer_Logs/Artiva/ARTIVA_Mstr_Transfer_Log.txt") or $ErrorMsg = $ErrorMsg . "ERROR: Could not open master log file!\n";
my @master = <FH>;
close FH;
print "filenames in text file:\n";
foreach $file1 (@master) { print "$file1\n"; }
print "\n";
#insert all 835 files in Input directory into array for cross reference
opendir (DIR, $InputDir) or $ErrorMsg = $ErrorMsg . "ERROR: Could not open input directory $InputDir!\n";
my @list = grep { $_ ne '.' && $_ ne '..' && /\.835$/ } readdir DIR;
closedir(DIR);
print "filenames in folder\n";
foreach $file2 (@list) { print "$file2\n"; }
print "\n";
#get all files in the Input directory that are NOT in the master transfer log and place them into the @filetouse array
@diff{ @master } = ();
@filetouse = grep !exists($diff{$_}), @list;
print "difference:\n";
foreach my $file3 (@filetouse) { print "$file3\n"; }
print DAILYLOGFILE "$ErrorMsg\n";
print DAILYLOGFILE "$MailMsg\n";
close(MSTRTRANSFERLOGFILE);
close(DAILYLOGFILE);
this is what the output looks like:
filenames in text file:
160411h00448car0007.835
filenames in folder
160411h00448car0007.835
160411h00448car0008.835
160418h00001com0001.835
difference:
160411h00448car0007.835
160411h00448car0008.835
160418h00001com0001.835
This should help you to do what you need. It stores the names of all of the files in INPUT_DIR as keys in the hash %files, and then deletes all the names found in LOG_FILE. The remainder are printed.
This program uses autodie so that the success of IO operations needn't be checked explicitly. autodie has been part of the Perl 5 core since v5.10.1.
use strict;
use warnings 'all';
use v5.10.1;
use autodie;
use feature 'say';
use constant LOG_FILE => '//CFVFTP/Users/ssi/Transfer_Logs/Artiva/ARTIVA_Mstr_Transfer_Log.txt';
use constant INPUT_DIR => '//CFVFTP/Users/ssi/Transfer_Logs/folder1/';
chdir INPUT_DIR;
my %files = do {
opendir my $dh, '.';
my @files = grep -f, readdir $dh;
map { $_ => 1 } @files;
};
my @logged_files = do {
open my $fh, '<', LOG_FILE;
<$fh>;
};
chomp @logged_files;
delete @files{@logged_files};
say for sort keys %files;
Update
After stripping away a lot of extraneous code, I found this underneath your original:
use strict;
use warnings 'all';
use v5.10.1;
use autodie;
use feature 'say';
use Time::Piece 'localtime';
use constant DATETIME => localtime()->ymd('');
use constant XFR_LOG => '//CFVFTP/Users/ssi/Transfer_Logs/Artiva/ARTIVA_Mstr_Transfer_Log.txt';
use constant DAILY_LOG => '//CFVFTP/Users/ssi/Transfer_Logs/Artiva/ARTIVA_Daily_Transfer_Log_' . DATETIME . '.txt';
use constant INPUT_DIR => '//CFVFTP/Users/ssi/Transfer_Logs/folder1/';
use constant MOVE_DIR => '//CFVFTP/Users/ssi/Transfer_Logs/folder2/';
chdir INPUT_DIR;
my @master = do {
open my $fh, '<', XFR_LOG;
<$fh>;
};
chomp @master;
my @list = do {
opendir my $dh, '.';
grep -f, readdir $dh;
};
my %diff;
@diff{ @master } = ();
my @filetouse = grep { not exists $diff{$_} } @list;
say for @filetouse;
As you can see, it's very similar to my solution. Here are some notes about your original:
Always use lexical file handles. With open FH, ... the file handle is global and will never be closed unless you do it explicitly or until the program terminates. Instead, open my $fh, ... lets perl close the file handle automatically when $fh goes out of scope at the end of the current block.
Always use the three-parameter form of open, so that the open mode is separate from the file name, and never put an open mode as part of a file name. You opened the same file twice: once as $MstrTransferLogFile, which begins with >>, and once explicitly because you needed read access.
It is very rare for a program to be able to recover from an IO error. Unless you are writing fail-safe software, a failure to open or read from a file or directory means the program won't be able to fulfill its purpose. That means there's little reason to accumulate a list of error messages: the code should just die when it can't succeed.
The output from readdir is very messy if you need to process directories, because it includes the pseudo-directories . and .. as well. But if you only want files, then a simple grep -f, readdir $dh will throw those out for you. Note that -f tests each name relative to the current directory, which is why the code does chdir INPUT_DIR first.
The block form of grep is often more readable, and not is much more visible than !. So grep !exists($diff{$_}), @list is clearer as grep { not exists $diff{$_} } @list.
Unless your code is really weird, comments usually just add noise and confusion and obscure the structure. Make your code look like what it does, so you don't have to explain it.
Oh, and don't throw in all the things you might need at the start "just in case". Write your code as if it was all there and the compiler will tell you what's missing.
I hope that helps
First, use a hash to store your already-processed files. Then it's just a matter of checking if a file exists in the hash.
(I've changed some variable names to make the answer a bit clearer.)
foreach my $file (@dir_list) {
push @to_process, $file unless ($already_processed{$file});
}
(Which could be a one-liner, but get it working in its most expanded form first.)
If you insist on your array, it looks like this, but it is much less efficient, since every file triggers a scan of the whole array:
foreach my $file (@dir_list) {
push @to_process, $file unless (grep { $_ eq $file } @already_processed);
}
(Again could be a one-liner, but...)
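The reason the original prints every file is that lines read with <FH> keep their trailing newline, so "160411h00448car0007.835\n" never equals the directory entry "160411h00448car0007.835"; that is why the first answer chomps the log lines. A sketch of just the comparison step as a testable function (unlogged is a made-up name). It strips all trailing whitespace rather than using chomp, on the assumption that a log on a Windows share may have \r\n line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: return the directory entries that do not appear in
# the log. Each log line still carries its line ending, so strip it first.
sub unlogged {
    my ($log_lines, $dir_files) = @_;
    my %logged;
    for my $line (@$log_lines) {
        (my $name = $line) =~ s/\s+\z//;   # removes "\n" and any "\r" before it
        $logged{$name} = 1;
    }
    return grep { not exists $logged{$_} } @$dir_files;
}

my @log = ("160411h00448car0007.835\n");   # as read with <FH>, newline included
my @dir = qw(160411h00448car0007.835 160411h00448car0008.835 160418h00001com0001.835);
print "$_\n" for unlogged(\@log, \@dir);
```

With the question's data this prints only the two files that are not yet in the log.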

Perl: Rename all files in a directory

So I am brand new to Perl and this is my first program (excluding a few basic tutorials to get to grips with very basic syntax).
What I want to do is rename all files within a specified directory to "File 1", "File 2", "File 3" etc
This is the code I have got so far:
use 5.16.3;
use strict;
print "Enter Directory: ";
my $directoryPath = <>;
chdir('$directoryPath') or die "Cant chdir to $directoryPath$!";
@files = readdir(DIR); #Array of file names
closedir(DIR);
my $i = 1; #counting integer for file names
my $j = 0; #counting integer for array values
my $fileName = File;
for (@files)
{
rename (@files[j], $fileName + i) or die "Cant rename file @files[j]$!";
i++;
j++;
}
chdir; #return to home directory
I have a number of issues:
1: Whenever I try to change directory I get the 'or die' message. I am wondering if this is to do with the working directory I start from, do I need to go up to the C: directory by doing something like '..\' before traversing down through a different directory path?
2: Error message 'Bareword "File" not allowed while "strict subs" in use'
3: Same as point 2. but for "i" and for "j"
4: Error message 'Global symbol "@files" requires explicit package name'
Note: I can obviously only get error one if I comment out everything after line else the program won't compile.
No. You probably need to chomp($directoryPath) first to remove the newline, though. And remove the single quotes, since they do not allow interpolation. You never need to quote a single variable like that.
File should be "File". Otherwise it is a bareword.
j should be $j, and same for i
You must declare with my @files, just like you did with the other variables.
You should also know that + is not a concatenation operator in Perl. You should use . for that purpose. But you can also just interpolate it in a double quoted string. When referring to a single array element, you should also use the scalar sigil $, and not @:
rename($files[$j], "$fileName$i") or die ...
You have also forgotten to opendir before you readdir.
You are using a for loop, but not using the iterator value $_, instead using your own counter. You are using two counters where only one is needed. So you might as well do:
for my $i (0 .. $#files) { # $#files is the index of the last element
rename($files[$i], $fileName . ($i+1)) or die ...
}
Change chdir('$directoryPath') to chdir("$directoryPath") (double quotes interpolate the variable; plain chdir($directoryPath) works too), and chomp it first
File -> "File"
i -> $i
Declare my @files
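Putting the fixes from both answers together, a corrected version might look like the sketch below. It differs from the question in a few deliberate ways: the directory comes from the command line rather than a prompt (which sidesteps the chomp issue), only plain files are renamed (so . and .. are skipped), the order is alphabetical, and it refuses to overwrite an existing file. rename_sequential is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: rename every plain file in $dir to "$prefix 1",
# "$prefix 2", ... in alphabetical order. Dies instead of overwriting.
sub rename_sequential {
    my ($dir, $prefix) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!\n";
    my @files = grep { -f "$dir/$_" } readdir $dh;   # skips ".", "..", subdirs
    closedir $dh;
    @files = sort @files;
    for my $i (0 .. $#files) {
        my $new = "$dir/$prefix " . ($i + 1);
        die "Refusing to overwrite $new\n" if -e $new;
        rename("$dir/$files[$i]", $new)
            or die "Can't rename $files[$i]: $!\n";
    }
    return scalar @files;
}

# Usage: perl rename_files.pl DIRECTORY
if (@ARGV) {
    my $count = rename_sequential($ARGV[0], 'File');
    print "Renamed $count files\n";
}
```

Note that file extensions are dropped, matching the "File 1", "File 2" names asked for.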

uniquely rename each of many files using perl

I have a folder containing 96 files that I want to rename. The problem is that each file name needs a unique change, not something like adding a zero to the front of each name or changing extensions, so it isn't practical to do a search and replace.
Here's a sample of the names I want to change:
newSEACODI-sww2320H-sww24_07A_CP.9_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07B_CP.10_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07C_CP.11_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07D_CP.12_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07E_R.1_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07F_R.3_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07G_R.4_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07H_R.5_sww2320H_sww2403F.fsa
I'd like to use perl to change the above names to the below names, respectively:
SEACODI_07A_A.2_sww2320H_2403F.fsa
SEACODI_07B_A.4_sww2320H_2403F.fsa
SEACODI_07C_H.1_sww2320H_2403F.fsa
SEACODI_07D_H.3_sww2320H_2403F.fsa
SEACODI_07E_H.6_sww2320H_2403F.fsa
SEACODI_07F_H.7_sww2320H_2403F.fsa
SEACODI_07G_Rb.4_sww2320H_2403F.fsa
SEACODI_07H_Rb.9_sww2320H_2403F.fsa
Can such a thing be done? I have a vague idea that I might make a text file with a list of the new names and call that list @newnames. I would make another array out of the current file names and call it @oldnames. I'd then do some kind of for loop where each element $i in @oldnames is replaced by the corresponding $i in @newnames.
I don't know how to make an array out of my current file names, though, and so I'm not sure if this vague idea is on the right track. I keep my files with the messed-up names in a directory called 'oldnames'. The below is my attempt to make an array out of the file names in that directory:
#!/usr/bin/perl -w
use strict; use warnings;
my $dir = 'oldnames';
opendir ('oldnames', $dir) or die "cannot open dir $dir: $!";
my @file = readdir 'oldnames';
closedir 'oldnames';
print "@file\n";
The above didn't seem to do anything. I'm lost. Help?
Here:
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use File::Copy;
# capture the script name, in case we are running the script from the
# same directory we are working on.
my $this_file = (split(/\//, $0))[-1];
print "skipping file: $this_file\n";
my $oldnames = "/some/path/to/oldnames";
my $newnames = "/some/path/to/newnames";
# open the directory
opendir(my $dh, $oldnames);
# grep out all directories and possibly this script; -d needs the full
# path because we have not chdir'd into $oldnames yet.
my @files_to_rename = grep { !-d "$oldnames/$_" && $_ ne $this_file } readdir $dh;
closedir $dh;
### UPDATED ###
# create hash of file names from lists:
my @old_filenames = qw(file1 file2 file3 file4);
my @new_filenames = qw(onefile twofile threefile fourfile);
my $filenames = create_hash_of_filenames(\@old_filenames, \@new_filenames);
my @missing_new_file = ();
# change directory, so we don't have to worry about pathing
# of files to rename and move...
chdir($oldnames);
mkdir($newnames) if !-e $newnames;
### UPDATED ###
for my $file (@files_to_rename) {
# Check that current file exists in the hash,
# if true, copy old file to new location with new name
if( exists($filenames->{$file}) ) {
copy($file, "$newnames/$filenames->{$file}") or die "Copy of $file failed: $!"; # autodie does not cover File::Copy
} else {
push @missing_new_file, $file;
}
}
if( @missing_new_file ) {
print "Could not map files:\n",
join("\n", @missing_new_file), "\n";
}
# create_hash_of_filenames: creates a hash, where
# key = oldname, value = newname
# input: two array refs
# output: hash ref
sub create_hash_of_filenames {
my ($oldnames, $newnames) = @_;
my %filenames = ();
for my $i ( 0 .. (scalar(@$oldnames) - 1) ) {
$filenames{$$oldnames[$i]} = $$newnames[$i];
}
# see Dumper output below, to see data structure
return \%filenames;
}
Dumper result:
$VAR1 = {
'file2' => 'twofile',
'file1' => 'onefile',
'file4' => 'fourfile',
'file3' => 'threefile'
};
Running script:
$ ./test.pl
skipping file: test.pl
Could not map files:
a_file.txt
b_file.txt
c_file.txt
File result:
$ ls oldnames/
a_file.txt
b_file.txt
c_file.txt
file1
file2
file3
file4
$ ls newnames/
fourfile
onefile
threefile
twofile
Your code is a little odd, but it should work. Are you running it in the directory "oldnames" or in the directory above it? You should be in the directory above it. A more standard way of writing it would be like this:
#!/usr/bin/perl -w
use strict; use warnings;
my $dir = 'oldnames';
opendir ( my $oldnames, $dir) or die "cannot open dir $dir: $!";
my @file = readdir $oldnames;
closedir $oldnames;
print "@file\n";
This would populate @file with all the entries in oldnames, including '.' and '..'. You might need to filter those out depending on how you do your renaming.
Can you do this with the rename utility? If I recall correctly, it lets you use Perl code and expressions as arguments.
The real answer is the one by @chrsblck: it does some checks and doesn't make a mess.
For comparison, here is a messy one-liner that may suffice. It relies on you providing a list of corresponding new file names that will rename your list of old files in the correct order. For your situation (where you don't want to do any programmatic transformation of the file names), you could also just use a shell loop (see the end of this post) that reads lists of old and new names from files. A better Perl solution would be to put both file name lists into two columns of a single file, read that file using the -a switch and @F, and then use File::Copy to copy the files around.
Anyway, below are some suggestions.
First, set things up:
% vim newfilenames.txt # list new names one per line corresponding to old names.
% wc -l newfilenames.txt # the same number of new names as files in ./oldfiles/
8 newfilenames.txt
% ls -1 oldfiles # 8 files rename these in order to list from newfilenames.txt
newSEACODI-sww2320H-sww24_07A_CP.9_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07B_CP.10_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07C_CP.11_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07D_CP.12_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07E_R.1_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07F_R.3_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07G_R.4_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07H_R.5_sww2320H_sww2403F.fsa
With files arranged as above, copy everything over:
perl -MFile::Copy -E 'opendir($dh , oldfiles); @newfiles=`cat newfilenames.txt`; chomp @newfiles; @oldfiles = sort grep(/^.+\..+$/, readdir $dh); END {for $i (0..$#oldfiles){copy("oldfiles/$oldfiles[$i]", "newfiles/$newfiles[$i]"); }}'
Not pretty: you have to grep and sort @oldfiles to get rid of . and .. and to put the array elements in order. And there's always the risk that a typo could make a mess that would be hard to figure out.
If you put the old and new names in two files (one name per line, in matching order), you could do this with a shell loop that reads both lists in step:
while read -r old && read -r new <&3; do cp -- "$old" "$new"; done < ../oldfilenames.txt 3< ../newfilenames.txt
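or just cd into the directory with the old files and do the following (the shell expands * in sorted order, so newfilenames.txt must list the new names in that same order):
set -- *
mkdir new
while read -r name && [ $# -gt 0 ]; do cp -- "$1" "new/$name"; shift; done < ../newfilenames.txt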
Good luck!
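The two-column approach suggested above could be sketched like this. Assumptions: the mapping file has one old<TAB>new pair per line (the tab separator is my choice), files are copied rather than renamed so the originals stay untouched, and missing sources or existing targets are reported instead of overwritten. copy_by_map is a made-up name.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: read "old<TAB>new" lines from $map_file and copy
# $src/old to $dest/new. Returns the number of files copied.
sub copy_by_map {
    my ($map_file, $src, $dest) = @_;
    open(my $fh, '<', $map_file) or die "Can't read $map_file: $!\n";
    my $copied = 0;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;                       # strip the line ending
        next if $line eq '';                      # ignore blank lines
        my ($old, $new) = split /\t/, $line;
        if (!defined $new)   { warn "No new name on line $. of $map_file\n"; next }
        if (!-f "$src/$old") { warn "Missing source: $old\n"; next }
        if (-e "$dest/$new") { warn "Target exists: $new\n"; next }
        if (copy("$src/$old", "$dest/$new")) { $copied++ }
        else { warn "Copy of $old failed: $!\n" }
    }
    close $fh;
    return $copied;
}

# Usage: copy_by_map('names.tsv', 'oldnames', 'newnames');
```

Keeping each old name on the same line as its new name makes the pairing explicit, so a missing or extra line cannot silently shift every later file.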
