The input to my Perl program is a text file that contains one 'item' per line. For example, a file with the following contents:
item1 abc yyy anything blabla whatever
item2 efg dod whatever blabla mmm
item3 hij naa anything gagaga whatever gooo 1,2,3
item4 fff ahh whatever blabla whatever
item5 noo kir whatever bbbbbb hhhhhh
item6 123 kkk anything blabla whatever
item7 555 yyy anything ghghgh a,b,c
item8 777 yxy whatever blabla whatever
This input file is read into an array (in reality it is read into an array of hashes, but for this example let's assume an array of strings):
my @items;
Associated with each item is a file on the network. For each item in the input file, the program searches the network for its associated file:
my $associated_files_filehandle;
open( $associated_files_filehandle, "> $associated_files_filename" );
my $associated_file;
my @items_whose_associated_file_was_found;
my @items_whose_associated_file_was_not_found;
foreach my $item (@items) {
    # sub search_associated_file returns:
    # - If the associated file is found: the path to the associated file, or
    # - If the associated file is not found: undef
    $associated_file = search_associated_file( $item );
    if (defined $associated_file) {
        print $associated_files_filehandle "$associated_file\n";
        push @items_whose_associated_file_was_found, $item;
    } else {
        print $associated_files_filehandle "\n";
        push @items_whose_associated_file_was_not_found, $item;
    }
}
Let's assume the associated files for items 1, 2, 3, 6, and 8 were found. After the above loop finishes, the output file will contain:
<path to item1's associated file>
<path to item2's associated file>
<path to item3's associated file>
<path to item6's associated file>
<path to item8's associated file>
The program now invokes an external process to generate the associated files for items 4, 5, and 7:
my @missing_associated_files = generate_associated_files_of( \@items_whose_associated_file_was_not_found );
Now I have the associated files of all items and want to update the output file to contain:
<path to item1's associated file>
<path to item2's associated file>
<path to item3's associated file>
<path to item4's associated file>
<path to item5's associated file>
<path to item6's associated file>
<path to item7's associated file>
<path to item8's associated file>
And my question is what is the best way to do it?
One way is to discard the output file and re-run my program (since I generated the missing associated files at the end of my first run, I know all associated files will be found in the new run), but this is very undesirable due to long runtimes.
I am looking for a way to keep track of the index of the missing files so I can insert them in their correct position after they are generated.
My software skills are limited to what I learned in Programming 101 some 40 years ago so I would really appreciate an answer in the form of Perl code that I can integrate into my program despite my limited S/W literacy.
Thank you in advance.
Here is an example of how you can build the list of files in advance before you start writing to the output file:
use feature qw(say);
use strict;
use warnings;
use experimental qw(signatures);
{
    my @items = map { "item_" . $_ } 1..10;
    my @files;              # all files
    my @items_not_found;    # items whose associated file was not found
    my @idx_not_found;      # indexes of items not found
    for my $i (0..$#items) {
        my $item = $items[$i];
        my $associated_file = search_associated_file( $item );
        if (defined $associated_file) {
            $files[$i] = $associated_file;
        } else {
            push @items_not_found, $item;
            push @idx_not_found, $i;
        }
    }
    # The program now invokes an external process to generate the associated files
    # for the missing items:
    @files[@idx_not_found] = generate_associated_files_of( \@items_not_found );
    my $fn = 'associated_files.txt';
    open ( my $fh, '>', $fn ) or die "Could not open file '$fn': $!";
    for my $file (@files) {
        say $fh "$file";
    }
    close $fh;
}

sub generate_associated_files_of($items) {
    my @files;
    for my $item (@$items) {
        my ( $idx ) = $item =~ /item_(\d+)/;
        die "Unexpected index" if !defined $idx;
        push @files, "file_$idx";
    }
    return @files;
}

sub search_associated_file($item) {
    my ( $idx ) = $item =~ /item_(\d+)/;
    my %not_ok_idx = map { $_ => 1 } 4,5,7;
    return undef if $not_ok_idx{$idx};
    return "file_" . $idx;
}
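If the array slice assignment in the middle of the block looks unfamiliar, here is a minimal, self-contained sketch of the same idea, with made-up data purely for illustration:

use strict;
use warnings;
use feature qw(say);

# Pretend five items were searched and positions 1 and 3 came back undef.
my @files         = ( 'path0', undef, 'path2', undef, 'path4' );
my @idx_not_found = ( 1, 3 );

# Assigning to an array slice fills exactly those positions, in order.
@files[@idx_not_found] = ( 'generated1', 'generated3' );

say for @files;    # path0, generated1, path2, generated3, path4

Because the slice is assigned in the same order as @idx_not_found, each generated file lands in the position of the item it belongs to, so the output file ends up in the original item order.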
I was trying to create a Perl script to rename the files (hundreds of files with different names), but I have not had any success. I first need to find the unique file number and then rename the file to something more human-readable. Since the file numbers are not sequential, this makes it difficult.
Examples of file names (the number of importance is after the sequence):
# vv-- this number
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
lane8-s250-index--TCCGGAGA-AGGCGAAG-13_S250_L008_R1_001.fastq
lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
lane7-s0008-index--ATTACTCG-ATAGAGGC-105_S8_L007_R1_001.fastq
lane7-s0009-index--ATTACTCG-CCTATCCT-195_S9_L007_R1_001.fastq
lane7-s0010-index--ATTACTCG-GGCTCTGA-106_S10_L007_R1_001.fastq
lane7-s0011-index--ATTACTCG-AGGCGAAG-197_S11_L007_R1_001.fastq
lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.fastq
I have created a file called RENAMING_parse_data.sh that references RENAMING_parse_data.pl.
So in theory the idea is that it parses the data to find the sample # that is in the middle of the name, takes that unique ID, and renames the file. But I don't think it's even getting into the if block.
Any ideas?
HERE IS THE .sh file that calls the Perl script:
#!/bin/bash
#first part is the program
#second is the directory path
#third and fourth arguments are the names of the output files
#./parse_data.pl /ACTF/Course/PATHTODIRECTORY Tabsummary.txt Strucsummary.txt
#WHERE ./parse_data.pl = name of the program
#WHERE /ACTF/Course/PATHTODIRECTORY = directory path where your files are saved AND is referred to as $dir_in = $ARGV[0] in the perl script;
#new file you are creating with the extracted data AND is referred to as $indv_list = $ARGV[1] in the perl script;
./RENAMING_parse_data.pl ./Test/ FishList.txt
HERE IS THE PERL SCRIPT:
#!/usr/bin/perl
print (":)\n");
#Processing files in a directory
$dir_in = $ARGV[0];
$indv_list = $ARGV[1];
#open directory to access those files, the folder where you have the files
opendir(DIR, $dir_in) || die ("Cannot open $dir_in");
@files = readdir(DIR);
#set all variables = 0 to avoid chaos
$j=0;
#open output header line for output file and print header line for tab delimited file
open(OUTFILETAB, ">", $indv_list);
print(OUTFILETAB "\t Fish ID", "\t");
#open each file
foreach (@files){
#restart all arrays to avoid chaos
print("in loop [$j]");
@acc_ID=();
#find FISH name
#EXAMPLE FISH NAMES: (length of fish name varies)
#lane8-s251-index--TCCGGAGA-TAATCTTA-14_S251_L008_R1_001.fastq.gz
#lane7-s0096-index--AGCGATAG-CAGGACGT-287_S96_L007_R1_001.final.fastq
#NOTE: what is in between () is the ID that is printed; NOTE that value can change from 2 to 3 digits depending on Sample #
#Trials:
#lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})[a-z]{1}[0-9]{2}_[A-Z]{1}[0-9]{3}_[a-z]{1}[0-9]{1}_[0-9]{3}.fastq
#lane[0-9]{1}-[a-z]{1}[0-9]{4}-index--[A-Z]{8}[A-Z]{8}-([0-9]{3})*.fastq
#lane*([0-9]{3})*.fastq
#lane.*-([0-9]{2})_.*.fastq
#lane.*-([0-9]{2})_*.fastq
#lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq
$string_FISH = @files;
if ($string_FISH =~ /^lane[0-9]{1}-[a-z]{1}[0-9]{3}-index--[A-Z]{8}[A-Z]{8}-([0-9]{2})_[A-Z]{1}[0-9]{3}_L008_R1_001.fastq/){
$FISH_ID =$1;
#acc_ID[$j] = $FISH_ID;
#print ("FISH. = |$FISH_ID[$j]| \n");
rename($string_FISH, "FISH. = |$FISH_ID[$j]|");
#print ($acc_ID[$j], "\n");
print(OUTFILETAB "FISH. = |$FISH_ID[$j]| \n");
}
$j= $j+1;
}
IDEAL END RESULT
So in the end I would like it to take the file name, find the unique identifier and rename it
from :
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane7-s0007-index--ATTACTCG-TATAGCCT-193_S7_L007_R1_001.fastq
to:
Fish.01.fastq
Fish.193.fastq
Any ideas or suggestions on how to fix this, or whether it needs to change completely, are greatly appreciated.
At the core of a Perl solution, you could use
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa
For example,
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ rename 's/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa' *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
(There are two similar tools named rename. This one is also known as prename.)
It's pretty simple to implement yourself:
#!/usr/bin/perl
use strict;
use warnings;
my $errors = 0;
for (#ARGV) {
my $old = $_;
s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa;
my $new = $_;
next if $new eq $old;
if ( -e $new ) {
warn( "Can't rename \"$old\" to \"$new\": Already exists\n" );
++$errors;
}
elsif ( !rename( $old, $new ) ) {
warn( "Can't rename \"$old\" to \"$new\": $!\n" );
++$errors;
}
}
exit( !!$errors );
Provide the files to rename as arguments (e.g. using *.fastq from the shell).
$ ls -1 *.fastq
lane8-s244-index--ATTACTCG-TATAGCCT-01_S244_L008_R1_001.fastq
lane8-s245-index--ATTACTCG-ATAGAGGC-02_S245_L008_R1_001.fastq
lane8-s246-index--TCCGGAGA-TATAGCCT-09_S246_L008_R1_001.fastq
lane8-s247-index--TCCGGAGA-ATAGAGGC-10_S247_L008_R1_001.fastq
lane8-s248-index--TCCGGAGA-CCTATCCT-11_S248_L008_R1_001.fastq
lane8-s249-index--TCCGGAGA-GGCTCTGA-12_S249_L008_R1_001.fastq
$ ./a *.fastq
$ ls -1 *.fastq
Fish.01.fastq
Fish.02.fastq
Fish.09.fastq
Fish.10.fastq
Fish.11.fastq
Fish.12.fastq
The existence check (-e) is to prevent accidentally renaming a bunch of files to the same name and therefore losing all but one of them.
The above is a cleaned-up version of a one-liner pattern I often use.
dir /b ... | perl -nle"$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n"
Adapted to sh:
\ls ... | perl -nle'$o=$_; s/.../.../; $n=$_; rename$o,$n if!-e$n'
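For example, assuming the .fastq files are in the current directory, the sh variant could be filled in with the substitution from above (shown here only as an illustration):

\ls *.fastq | perl -nle'$o=$_; s/^.*-(\d+)_[^-]+(?=\.fastq\z)/Fish.$1/sa; $n=$_; rename $o, $n if !-e $n'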
I'm having difficulty outputting file names as an array using grep. Specifically, I want to create an array of file names (plant photos) formatted like this:
Ilex_verticillata= Ilex_verticillata1.png, Ilex_verticillata2.png, Ilex_verticillata3.png
Asarum_canadense= Asarum_canadense1.png, Asarum_canadense2.png
Ageratina_altissi= Ageratina_altissi1.png, Ageratina_altissi2.png
Here's my original Perl script that I'm attempting to modify. It returns, as intended, ONE file name per plant as "Genus_species", printing a list of those plants:
#!/usr/bin/perl
use strict;
use warnings;
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir my $dfh, $dir or die "Can't open $dir: $!";
my @files =
map { s/1\.png\z/.png/r } # Removes "1" from end of file names
grep { /^[^2-9]*\.png\z/i && /_/ } # Finds "Genus_species.png" & "Genus_species1.png" and returns one file name per plant as "Genus_species.png"
readdir $dfh;
foreach my $file (@files) {
$file =~ s/\.png//; # Removes ".png" extension
print "$file\n"; #Prints list of file names (plant names)
}
Here's the output:
Ilex_verticillata
Asarum_canadense
Ageratina_altissima
However, since each plant often has MULTIPLE photos (e.g.-- "Genus_species1.png, Genus_species2.png, etc.), I need to re-grep the directory using the above output to find their file names, then output the results in the form of an array as previously illustrated.
I know the solution likely involves modifying the "foreach" statement, using grep to return ALL file names with "Genus_species" in their name. Here's what I tried:
foreach my $file (@files) {
$file =~ s/\.png//;
grep ($file,readdir(DIR));
print "$file = $file\n";
But the output was this:
Ilex_verticillata = Ilex_verticillata
Asarum_canadense = Asarum_canadense
Ageratina_altissima = Ageratina_altissima
Again, I want to output an array formatted as:
"Genus_species= Genus_species1.png, Genus_species2.png, etc.," meaning I want it to look like this:
Ilex_verticillata= Ilex_verticillata1.png, Ilex_verticillata2.png, Ilex_verticillata3.png
Asarum_canadense= Asarum_canadense1.png, Asarum_canadense2.png
Ageratina_altissi= Ageratina_altissi1.png, Ageratina_altissi2.png
Notice that I also want to add back the ".png" extension ONLY to the file names to the right of the equals sign.
Please advise. Thanks.
readdir returns a list of the files in the directory. You've processed them all in one chained expression, which is compact. However, if you loop over them you can process each entry further.
#!/usr/bin/perl
use strict;
use warnings;
use English; ## use names rather than symbols for special variables
my $dir = '/Users/jdm/Desktop/xampp/htdocs/cnc/images/plants';
opendir my $dfh, $dir or die "Can't open $dir: $OS_ERROR";
my %genus_species; ## store matching entries in a hash
for my $file (readdir $dfh)
{
    next unless $file =~ /\d\.png$/;   ## skip entry if not a png file ending with a number
    my $genus = $file =~ s/\d\.png$//r;
    push(@{$genus_species{$genus}}, $file);   ## push onto the array; the @{} dereferences the array reference stored in the hash
}
for my $genus (keys %genus_species)
{
    print "$genus = ";
    print "$_ " for sort @{$genus_species{$genus}};   # sort and loop through the entries in the array reference
    print "\n";
}
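If you want the exact format shown in the question (a comma-separated list, with the .png extension kept only on the right-hand side of the equals sign), the final loop could use join instead; a minimal, self-contained sketch with hard-coded sample data standing in for %genus_species:

use strict;
use warnings;

# %genus_species as built by the readdir loop above; hard-coded here for illustration.
my %genus_species = (
    Ilex_verticillata => [ 'Ilex_verticillata2.png', 'Ilex_verticillata1.png' ],
    Asarum_canadense  => [ 'Asarum_canadense1.png',  'Asarum_canadense2.png'  ],
);

for my $genus (sort keys %genus_species) {
    # join gives "Genus_species= file1.png, file2.png"
    print "$genus= ", join( ', ', sort @{ $genus_species{$genus} } ), "\n";
}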
I have some problems regarding my script.
The problems are:
The value of $str or @matchedPath is sometimes blank when I print it out. It is not random; it happens only for certain paths in the table.txt file, and I can't figure out why.
How do I print output like the expected outcome below? I can't pair each unmatched path with the correct table.txt file location, because I have put all the path locations in one array, filtered it, and compared it with the matched table.txt file locations; because of this, some locations are missing when printed out.
Example paths that the /home/is/latest/table.txt file contains (the wanted path here is b/c/d/g):
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
Example paths that the /home/are/latest/table.txt file contains (the wanted path here is b/c/d/j):
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
e.g. the list.txt file contains:
rty/b
uio/b/c
qwe/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
Expected outcome:
Unmatched Path : b/c/d/g
table.txt file location: /home/is/latest/table.txt
Unmatched Path : b/c/d/j
table.txt file location: /home/are/latest/table.txt
Below is my detailed script:
#!/usr/perl/5.14.1/bin/perl
# I want to make a script that automatically compares the paths in table.txt with list.txt
#table.txt files are located under a parent directory and differ in the subdirectory.
#There are about 10 table.txt files and each one needs to be compared with list.txt
#The objective is to print out the path that are not in the list.txt
use strict;
use warnings;
use Switch;
use Getopt::Std;
use Getopt::Long;
use Term::ANSIColor qw(:constants);
use File::Find::Rule;
use File::Find;
use File::Copy;
use Cwd;
use Term::ANSIColor;
my $path1='/home'; #Automatically search all table.txt file in this directory even in subdirectory
my $version='latest'; #search the file specified subdirectory e.g. /home/is/latest/table.txt and /home/are/latest/table.txt
my $path2='/list.text'; #there is about 10 table.txt files which contain specified paths in it.
$path1 =~ s/^\s+|\s+$//g;
$version =~ s/^\s+|\s+$//g;
$path2 =~ s/^\s+|\s+$//g;
my #files = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my @symlink_dirs = File::Find::Rule->directory->symlink->in($path1); #If the directory is a symlink, in my case 'latest' is a symlink directory
print colored (sprintf ("\n\n\tSUMMARY REPORT"),'bold','magenta');
print "\n\n_______________________________________________________________________________________________________________________________________________________\n\n";
if ($version eq "latest")
{
foreach my $dir (@symlink_dirs)
{
my #filess = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my $symDir=($dir."/"."table.txt");
$symDir =~ s/^\s+|\s+$//g;
my $wantedPath=$symDir;
my $path_1 = $wantedPath;
function($path_1);
}
}
else
{
for my $file (@files)
{
if ($file =~ m/.*$version.*/)
{
my $wantedPath=$file;
my $path_1 = $wantedPath;
function($path_1);
}
}
}
sub function
{
my $path_1 = $_[0];
open DATA, '<', $path_1 or die "Could not open $path_1: $!";
my $path_2 = "$path2";
open DATA1, '<', $path_2 or die "Could not open $path_2: $!";
################# FOCUSED PROBLEM AREA ##############################
my @matchedPath;
my @matched_File_Path;
my @unmatchedPath;
my @unmatched_File_Path;
my @s2 = <DATA1>;
while(<DATA>)
{
my $s1 = $_;
if ($s1 =~ /^#.*/)
{
next;
}
if ($s1 =~ /(.*)\s+(.*)\s+(.*)\s+/)
{
my $str=($2);
$str =~ s/\s+//g;
for my $s2 (@s2)
{
if ($s2 =~ /.*$str/)
{
push @matchedPath, $str;
push @matched_File_Path, $path_1;
print "matched Path: $str\n\t$path_1\n"; #I don't understand, sometimes I get empty $str value in this. Can anyone help me?
last;
}
else
{
#print "unmatch:$str\n\t$path_1\n";
push @unmatchedPath, $str;
@unmatched_File_Path,$path_1;
}
}
}
}
foreach (@unmatchedPath)
{print "unmatch path: $_\n";}
foreach (@matchedPath)
{print "\nmatch path: $_\n\n";}
foreach (@unmatched_File_Path)
{print "unmatch File Path: $_\n";}
foreach (@matched_File_Path)
{print "match File Path: $_\n";}
my @filteredUnmatchedPath = uniq(@unmatchedPath);
my @filteredUnmatched_IP_File_Path =uniq(@unmatched_IP_File_Path);
@filteredUnmatchedPath = grep {my $filteredPath = $_; not grep $_ eq $filteredPath, @matchedPath} @filteredUnmatchedPath;
}
print "#filteredUnmatchedPath\n";
print "#filteredUnmatched_IP_File_Path\n";
sub uniq
{
my %seen;
grep !$seen{$_}++, @_;
}
close(DATA);
close(DATA1);
print "_________________________________________________________________________________________________________________________________________________________\n\n";
I think using hashes is much simpler here. Here's what I tried:
You will have to replace @all_path with your array containing every path where a table.txt is present (see the sketch after the sample output below).
use strict;
use warnings;
my @all_path = ("some/location/table.txt", "some/location_2/table.txt");
my %table_paths;
my %list_paths;
foreach my $path (@all_path)
{
    open (my $table, "<", $path) or die ("error opening file");
    # we create a hash where each key is a path
    while (<$table>)
    {
        chomp;
        # only process lines starting with "a", as that seems to be the format of this file
        $table_paths{(split)[1]} = $path if (/^a/); # taking the 2nd element of each line
    }
    close $table;
}
open (my $list, "<", "list.txt") or die ("error opening file");
# we create a hash where each key is a path
while (<$list>)
{
    chomp;
    $list_paths{$_} = 1;
}
close $list;
# now we delete from %table_paths the keys that also appear in the list; what is left is unmatched
foreach my $key (keys %table_paths)
{
    delete $table_paths{$key} if (grep {$_ =~ /$key$/} (keys %list_paths));
}
# printing the unmatched keys
print "unmatched :$_\nlocation: $table_paths{$_}\n\n" foreach keys %table_paths;
inputs
in some/location/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
in some/location_2/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
in list.txt
rty/b
uio/b/c
qwe/dummyName/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
output:
unmatched: b/c/d/g
location: some/location/table.txt
unmatched: b/c/d/j
location: some/location_2/table.txt
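As noted above, @all_path has to hold the real table.txt locations. One possible way to build it, reusing the File::Find::Rule call from your own script, is sketched below; the /latest/ filter is an assumption based on your directory layout, and symlinked directories may need extra handling that is left out here:

use strict;
use warnings;
use File::Find::Rule;

# Collect every table.txt under /home, keeping only the copies whose path
# goes through a "latest" directory (assumption based on the question).
my @all_path = grep { m{/latest/} }
               File::Find::Rule->file()
                               ->name('table.txt')
                               ->in('/home');

print "$_\n" for @all_path;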
Below is my script.
I have attempted many print statements to work out why it is only accessing the first array element. The pattern match works. The array holds a minimum 40 elements. I have checked and it is full.
I have printed each line, and each line prints.
my $index = 0;
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
foreach my $barcode (@barcode) {
my @line = <FILE>;
foreach $_ (@line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
}
$index++;
}
}
Okay, let's say the input was:
File:
Barcode001 001 abc
Barcode002 002 def
Barcode003 003 ghi
@barcode holds:
001
002
003
@otherarray holds:
abc
def
ghi
The output this script currently prints is only:
Barcode001_001 abc
It should be printing:
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
In the real data it should be printing a whole load, up to ~40 lines.
Any ideas? There must be something wrong with the way I am accessing the array elements? Or incrementing? Hoping this isn't something too silly!
Thanks in advance.
It needs the index because I am trying to match arrays in parallel, as they are ordered. Each line needs to match the corresponding indices of the arrays to each line in the file.
It's a little hard to answer with certainty without more information about the contents of @barcode and FILE, but there is something odd in your code which makes me think that it might be the problem.
The construct while (<FILE>) { ... } will, until end of file, read a line from FILE into $_ and then execute the contents of the loop. In your code, you also read all the lines from FILE from within the loop that iterates over @barcode. I think it is likely that you intended to check each line from FILE against all the elements of @barcode, which would make the loop look like the following:
while (my $line = <FILE>) {
foreach my $barcode (@barcode) {
if ($line =~ /Barcode([0-9]*)\t$barcode/) {
my $bar = $1;
print OUT ">Barcode$bar"."_"."$barcode\n";
}
else {
print OUT $line;
}
}
}
I've taken the liberty of doing a bit of code tidying, but I may have made some unwarranted assumptions.
Your core problem in the above is that in your first iteration you slurp all of your file into @line. But because it's lexically scoped to the loop, it disappears when that loop completes.
Furthermore:
I would strongly suggest that you don't use $_ like that.
$_ is a special variable that's set implicitly in loops. I'd strongly suggest that you need to replace that with something that isn't a special variable, because that's a sure way to cause yourself pain.
turn on use strict; and use warnings;
use 3 argument open with a lexical filehandle.
perltidy your code, so the bracketing looks right.
you've a search and replace pattern on $_ that's emptying it completely, but then you're trying to print it. You may well not be printing what you think you're printing.
You're accessing <FILE> outside and inside your loop. This will cause you problems.
Barcode([0-9]*) - with a '*' there you're saying 'zero or more' is valid. You may want to consider \d+ - one or more digits.
referencing multiple arrays by index is messy. I'd suggest coalescing them into a hash lookup (lookup by key - barcode)
This line:
my #line = <FILE>;
reads your whole file into @line. But you do this inside the while loop that iterates... each line in <FILE>. Don't do that, it's horrible.
Is this something like what you wanted?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @barcode = qw (
001
002
003
);
my @otherarray = qw (
abc
def
ghi
);
my %lookup;
@lookup{@barcode} = @otherarray;
print Dumper \%lookup;
#commented because I don't have your source data
#my $file = "input_file_name";
#my $output = "output_file_name";
#open( my $input, "<", $file ) or die "\nNot opening $file for reading\n\n";
#open( my $output, ">", $final ) or die "Did not open $final\n";
#while ( my $line = <$input> )
while ( my $line = <DATA> ) {
foreach my $barcode (@barcode) {
if ( my ($bar) = ( $line =~ /Barcode(\d+)\s+$barcode/ ) ) {
print ">Barcode$bar" . "_" . "$barcode $lookup{$barcode}\n";
#print {$output} ">Barcode$bar" . "_" . "$lookup{$barcode}\n";
}
}
}
__DATA__
Barcode001 001
Barcode002 002
Barcode003 003
Prints:
$VAR1 = {
'001' => 'abc',
'002' => 'def',
'003' => 'ghi'
};
>Barcode001_001 abc
>Barcode002_002 def
>Barcode003_003 ghi
It turns out it was a simple issue, as I had suspected (it being a Monday). I had a colleague go through it with me, and it was the placement of the index:
#my $index = 0; #This means the index is iterated through,
#but for each barcode for one line, then it continues
#counting up and misses the other values, therefore
#repeatedly printing just the first element of the array.
open(FILE, "$file") or die "\nNot opening $file for reading\n\n";
open(OUT, ">$final") or die "Did not open $final\n";
while (<FILE>) {
$index = 0; #New placement of $index for initialising.
foreach my $barcode (@barcode) {
my @line = <FILE>;
foreach $_ (@line) {
if ($_ =~ /Barcode([0-9]*)\t$barcode[$index]\t$otherarray[$index]/) {
my $bar = $1;
$_ =~ s/.*//;
print OUT ">Barcode$bar"."_"."$barcode[$index]\t$otherarray[$index]";
}
print OUT $_;
$index++; #Increment here
}
#$index++;
}
}
Thanks to everyone for their responses; for my original and poorly worded question they would have worked and may be more efficient, but for the purpose of the script and my edited question, it needs to be this way.
I have a folder containing 96 files that I want to rename. The problem is that each file name needs a unique change...not like adding a zero to the front of each name or changing extensions. It isn't practical to do a search and replace.
Here's a sample of the names I want to change:
newSEACODI-sww2320H-sww24_07A_CP.9_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07B_CP.10_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07C_CP.11_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07D_CP.12_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07E_R.1_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07F_R.3_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07G_R.4_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07H_R.5_sww2320H_sww2403F.fsa
I'd like to use perl to change the above names to the below names, respectively:
SEACODI_07A_A.2_sww2320H_2403F.fsa
SEACODI_07B_A.4_sww2320H_2403F.fsa
SEACODI_07C_H.1_sww2320H_2403F.fsa
SEACODI_07D_H.3_sww2320H_2403F.fsa
SEACODI_07E_H.6_sww2320H_2403F.fsa
SEACODI_07F_H.7_sww2320H_2403F.fsa
SEACODI_07G_Rb.4_sww2320H_2403F.fsa
SEACODI_07H_Rb.9_sww2320H_2403F.fsa
Can such a thing be done? I have a vague idea that I might make a text file with a list of the new names and call that list @newnames. I would make another array out of the current file names, and call it @oldnames. I'd then do some kind of for loop where each element $i in @oldnames is replaced by the corresponding $i in @newnames.
I don't know how to make an array out of my current file names, though, and so I'm not sure if this vague idea is on the right track. I keep my files with the messed-up names in a directory called 'oldnames'. The below is my attempt to make an array out of the file names in that directory:
#!/usr/bin/perl -w
use strict; use warnings;
my $dir = 'oldnames';
opendir ('oldnames', $dir) or die "cannot open dir $dir: $!";
my @file = readdir 'oldnames';
closedir 'oldnames';
print "#file\n";
The above didn't seem to do anything. I'm lost. Help?
Here:
#!/usr/bin/perl
use warnings;
use strict;
use autodie;
use File::Copy;
# capture script name, in case we are running the script from the
# same directory we are working on.
my $this_file = (split(/\//, $0))[-1];
print "skipping file: $this_file\n";
my $oldnames = "/some/path/to/oldnames";
my $newnames = "/some/path/to/newnames";
# open the directory
opendir(my $dh, $oldnames);
# grep out all directories and possibly this script.
my @files_to_rename = grep { !-d && $_ ne $this_file } readdir $dh;
closedir $dh;
### UPDATED ###
# create hash of file names from lists:
my @old_filenames = qw(file1 file2 file3 file4);
my @new_filenames = qw(onefile twofile threefile fourfile);
my $filenames = create_hash_of_filenames(\@old_filenames, \@new_filenames);
my @missing_new_file = ();
# change directory, so we don't have to worry about pathing
# of files to rename and move...
chdir($oldnames);
mkdir($newnames) if !-e $newnames;
### UPDATED ###
for my $file (@files_to_rename) {
# Check that current file exists in the hash,
# if true, copy old file to new location with new name
if( exists($filenames->{$file}) ) {
copy($file, "$newnames/$filenames->{$file}");
} else {
push @missing_new_file, $file;
}
}
if( @missing_new_file ) {
print "Could not map files:\n",
join("\n", @missing_new_file), "\n";
}
# create_hash_of_filenames: creates a hash, where
# key = oldname, value = newname
# input: two array refs
# output: hash ref
sub create_hash_of_filenames {
my ($oldnames, $newnames) = @_;
my %filenames = ();
for my $i ( 0 .. (scalar(@$oldnames) - 1) ) {
$filenames{$$oldnames[$i]} = $$newnames[$i];
}
# see Dumper output below, to see data structure
return \%filenames;
}
Dumper result:
$VAR1 = {
'file2' => 'twofile',
'file1' => 'onefile',
'file4' => 'fourfile',
'file3' => 'threefile'
};
Running script:
$ ./test.pl
skipping file: test.pl
Could not map files:
a_file.txt
b_file.txt
c_file.txt
File result:
$ ls oldnames/
a_file.txt
b_file.txt
c_file.txt
file1
file2
file3
file4
$ ls newnames/
fourfile
onefile
threefile
twofile
Your code is a little odd, but it should work. Are you running it in the directory "oldnames" or in the directory above it? You should be in the directory above it. A more standard way of writing it would be like this:
#!/usr/bin/perl -w
use strict; use warnings;
my $dir = 'oldnames';
opendir ( my $oldnames, $dir) or die "cannot open dir $dir: $!";
my @file = readdir $oldnames;
closedir $oldnames;
print "#file\n";
This would populate @file with all the files in oldnames, including '.' and '..'. You might need to filter those out depending on how you do your renaming.
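For instance, a minimal sketch of that filtering (skipping the two special entries before using the list):

use strict;
use warnings;

my $dir = 'oldnames';
opendir( my $oldnames, $dir ) or die "cannot open dir $dir: $!";

# readdir always returns '.' and '..'; drop them before doing anything else.
my @file = grep { $_ ne '.' && $_ ne '..' } readdir $oldnames;
closedir $oldnames;

print "$_\n" for @file;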
Can you do this with rename? It does allow you to use perl code and expressions as arguments if I recall.
The real answer is the one by @chrsblck; it does some checks and doesn't make a mess.
For comparison, here is a messy one-liner that may suffice. It relies on you providing a list of equivalent new file names that will rename your list of old files in the correct order. Perhaps for your situation (where you don't want to do any programmatic transformation of the file names) you could just use a shell loop (see the end of this post) reading lists of new and old names from a file. A better Perl solution would be to put both of these file name lists into two columns, read that file using the -a switch and @F, and then use File::Copy to copy the files around (a rough sketch of that idea is at the end of this post).
Anyway, below are some suggestions.
First, set things up:
% vim newfilenames.txt # list new names one per line corresponding to old names.
% wc -l newfilenames.txt # the same number of new names as files in ./oldfiles/
8 newfilenames.txt
% ls -1 oldfiles # 8 files; rename these in the order given by newfilenames.txt
newSEACODI-sww2320H-sww24_07A_CP.9_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07B_CP.10_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07C_CP.11_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07D_CP.12_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07E_R.1_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07F_R.3_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07G_R.4_sww2320H_sww2403F.fsa
newSEACODI-sww2320H-sww24_07H_R.5_sww2320H_sww2403F.fsa
With files arranged as above, copy everything over:
perl -MFile::Copy -E 'opendir($dh , oldfiles); @newfiles=`cat newfilenames.txt`; chomp @newfiles; @oldfiles = sort grep(/^.+\..+$/, readdir $dh); END {for $i (0..$#oldfiles){copy("oldfiles/$oldfiles[$i]", "newfiles/$newfiles[$i]"); }}'
Not pretty: you have to grep and sort on @oldfiles to get rid of . and .. and put the array elements in order. And there's always the risk that a typo could make a mess that would be hard to figure out.
If you put the old and new names in a couple of files you could just do this with a shell script:
paste ../oldfilenames.txt ../newfilenames.txt | while read i n; do cp "$i" "$n"; done
or just cd into the directory with the old files and do:
mkdir new
ls *.fsa | paste - ../newfilenames.txt | while read i n; do cp "$i" "new/$n"; done
Good luck!
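As a rough, hypothetical sketch of the two-column idea mentioned above: assume a file renames.txt with the old name and the new name separated by a tab on each line, plus the oldfiles/ and newfiles/ directories from the earlier setup. A plain script with File::Copy could then do the copying:

#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;

# renames.txt is a hypothetical file: "oldname<TAB>newname" on each line.
open my $map, '<', 'renames.txt' or die "Can't open renames.txt: $!";
while ( my $line = <$map> ) {
    chomp $line;
    my ( $old, $new ) = split /\t/, $line;
    next unless defined $new;    # skip malformed lines
    copy( "oldfiles/$old", "newfiles/$new" )
        or warn "Could not copy $old to $new: $!";
}
close $map;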