array elements gets deleted when looping files - loops

I have a problem with looping through file names, my input array elements gets deleted.
CODE:
use Data::Dumper;
use warnings;
use strict;
my #files = ("file1", "file2", "file3");
print Dumper(\#files);
for (#files) {
my $filename = $_ . '.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while(<$fh>) {
print "$filename read line \n";
}
}
print Dumper(\#files);
OUTPUT:
$VAR1 = [
'file1',
'file2',
'file3'
];
file1.txt read line
file2.txt read line
file3.txt read line
$VAR1 = [
undef,
undef,
undef
];
FILE CONTENTS:
cat file1.txt
asdfsdfs
cat file2.txt
iasdfasdsf
cat file3.txt
sadflkjasdlfj
Why does the array contents get deleted?
(I have 2 different workarounds for the problem, but I would like to understand what's the problem with this code.)

while (<$fh>)
is short for
while ($_ = <$fh>)
so you are clobbering $_ which is aliased to an element of #files. You need to protect $_ as follows:
while (local $_ = <$fh>)
Better yet, use a different variable name.
while (my $line = <$fh>)

You're using $_ in two different ways inside of the loop (as the current filename and as the current line) and they're clobbering each other. Don't do this. Name your variables, e.g.:
for my $file (#files) {
...
while(my $line = <$fh>) {
...
}
}
You can imagine that your current code does this after reading each file:
for (#files) {
undef $_;
}

Related

Save all Data in array, Filter out duplicated Data, Compare Data between arrays and Removed the matched Data

I have some problems regarding my script.
The problems are:
The value of $str or #matchedPath sometimes blank when I print out. It is not random, it happen only to certain Path in the table.txt file, which I can't figure it out, why?
How to print like the outcome, because I can't find the correct file location or directory of table.txt file because I have put all the path location in an array, filtered it and compared with the matched correct file location of table.txt, because of this, some location is missing when printed out.
Example path that the /home/is/latest/table.txt files contain, the bold texts is the wanted path in table.txt,
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
Example path that the /home/are/latest/table.txt files contain, the middle texts is the wanted path in table.txt,
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
e.g. list.txt file contains,
rty/b
uio/b/c
qwe/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
Expected outcome:
Unmatched Path : b/c/d/g
table.txt file location: /home/is/latest/table.txt
Unmatched Path : b/c/d/j
table.txt file location: /home/are/latest/table.txt
Below is my detailed script,
#!/usr/perl/5.14.1/bin/perl
# I want to make a script that automatically compare the path in table.txt with list.txt
#table.txt files is located under a parent directory and it differs in the subdirectory.
#There is about 10 table.txt files and each one of it need to compare with list.txt
#The objective is to print out the path that are not in the list.txt
use strict;
use warnings;
use Switch;
use Getopt::Std;
use Getopt::Long;
use Term::ANSIColor qw(:constants);
use File::Find::Rule;
use File::Find;
use File::Copy;
use Cwd;
use Term::ANSIColor;
my $path1='/home'; #Automatically search all table.txt file in this directory even in subdirectory
my $version='latest'; #search the file specified subdirectory e.g. /home/is/latest/table.txt and /home/are/latest/table.txt
my $path2='/list.text'; #there is about 10 table.txt files which contain specified paths in it.
$path1 =~ s/^\s+|\s+$//g;
$version =~ s/^\s+|\s+$//g;
$path2 =~ s/^\s+|\s+$//g;
my #files = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my #symlink_dirs = File::Find::Rule->directory->symlink->in($path1); #If the directory is a symlink, in my case 'latest' is a symlink directory
print colored (sprintf ("\n\n\tSUMMARY REPORT"),'bold','magenta');
print "\n\n_______________________________________________________________________________________________________________________________________________________\n\n";
if ($version eq "latest")
{
foreach my $dir (#symlink_dirs)
{
my #filess = File::Find::Rule->file()
->name( 'table.txt' )
->in( "$path1" );
my $symDir=($dir."/"."table.txt");
$symDir =~ s/^\s+|\s+$//g;
my $wantedPath=$symDir;
my $path_1 = $wantedPath;
function($path_1);
}
}
else
{
for my $file (#files)
{
if ($file =~ m/.*$version.*/)
{
my $wantedPath=$file;
my $path_1 = $wantedPath;
function($path_1);
}
}
}
sub function
{
my $path_1 = $_[0];
open DATA, '<', $path_1 or die "Could not open $path_1: $!";
my $path_2 = "$path2";
open DATA1, '<', $path_2 or die "Could not open $path_2: $!";
################# FOCUSED PROBLEM AREA ##############################
my #matchedPath;
my #matched_File_Path;
my #unmatchedPath;
my #unmatched_File_Path;
my #s2 = <DATA1>;
while(<DATA>)
{
my $s1 = $_;
if ($s1 =~ /^#.*/)
{
next;
}
if ($s1 =~ /(.*)\s+(.*)\s+(.*)\s+/)
{
my $str=($2);
$str =~ s/\s+//g;
for my $s2 (#s2)
{
if ($s2 =~ /.*$str/)
{
push #matchedPath,$str;
push #matched_File_Path,$path_1;
print "matched Path: $str\n\t$path_1\n"; #I don't understand, sometimes I get empty $str value in this. Can anyone help me?
last;
}
else
{
#print "unmatch:$str\n\t$path_1\n";
push #unmatchedPath,$str;
#unmatched_File_Path,$path_1;
}
}
}
}
foreach (#unmatchedPath)
{print "unmatch path: $_\n";}
foreach (#matchedPath)
{print "\nmatch path: $_\n\n";}
foreach (#unmatched_File_Path)
{print "unmatch File Path: $_\n";}
foreach (#matched_File_Path)
{print "match File Path: $_\n";}
my #filteredUnmatchedPath = uniq(#unmatchedPath);
my #filteredUnmatched_IP_File_Path =uniq(#unmatched_IP_File_Path);
#filteredUnmatchedPath = grep {my $filteredPath = $_; not grep $_ eq $filteredPath, #matchedPath} #filteredUnmatchedPath;
}
print "#filteredUnmatchedPath\n";
print "#filteredUnmatched_IP_File_Path\n";
sub uniq
{
my %seen;
grep !$seen{$_}++, #_;
}
close(DATA);
close(DATA1);
print "_________________________________________________________________________________________________________________________________________________________\n\n";
I think using hashes is much simpler here
here's what I tried:
you will have to replace #all_path with your array containing every path where table is present
use strict;
use warnings;
my #all_path =("some/location/table.txt","some/location_2/table.txt");
my %table_paths;
my %list_paths;
foreach my $path (#all_path)
{
open (my $table, "<", $path) or die ("error opening file");
#we create hash, each key is a path
while (<$table>)
{
chomp;
#only process lines starting with "a" as it seems to be the format of this file
$table_paths{(split)[1]}=$path if (/^a/); #taking the 2nd element in each line
}
close $table;
}
open (my $list, "<", "list.txt") or die ("error opening file");
#we create hash, each key is a path
while (<$list>)
{
chomp;
$list_paths{$_}=1;
}
close $list;
#now we delete from table_paths common keys with list, that lefts unmathed
foreach my $key (keys %table_paths)
{
delete $table_paths{$key} if (grep {$_ =~ /$key$/} (keys %list_paths));
}
#printing unmatched keys
print "unmatched :$_\nlocation: $table_paths{$_}\n\n" foreach keys %table_paths;
inputs
in some/location/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d B
a b/c/d/e B
a b/c/d/e/f B
a b/c/d/g B
in some/location_2/table.txt
##WHAT PATH IS_THAT,Backup
a b/c/d/j B
in list.txt
rty/b
uio/b/c
qwe/dummyName/b/c/d
asd/b/c/d/e
zxc/b/c/d/e/f
vbn/c/d/e
fgh/j/k/l
output:
unmatched: b/c/d/g
location: some/location/table.txt
unmatched: b/c/d/j
location: some/location_2/table.txt

How to get the data of each line from a file?

Here, I want to print the data in each line as 3 separate values with ":" as separator. The file BatmanFile.txt has the following details:
Bruce:Batman:bat#bat.com
Santosh:Bhaskar:santosh#santosh.com
And the output I expected was:
Bruce
Batman
bat#bat.com
Santosh
Bhaskar
santosh#santosh.com
The output after executing the script was:
Bruce
Batman
bat#bat.com
Bruce
Batman
bat#bat.com
Please explain me what I am missing here:
use strict;
use warnings;
my $file = 'BatmanFile.txt';
open my $info, $file or die "Could not open $file: $!";
my #resultarray;
while( my $line = <$info>) {
#print $line;
chomp $line;
my #linearray = split(":", $line);
push(#resultarray, #linearray);
print join("\n",$resultarray[0]),"\n";
print join("\n",$resultarray[1]),"\n";
print join("\n",$resultarray[2]),"\n";
}
close $info;
You are looping through file line by line. You have stored all lines (after splitting) in an array. Once the loop finishes you have all data in resultarray array, just print whole array after the loop (instead of printing just first 3 indexes which are you doing at the moment).
#!/usr/bin/perl
use strict;
use warnings;
my #resultarray;
while( my $line = <DATA>){
chomp $line;
my #linearray = split(":", $line);
push #resultarray, #linearray;
}
print "$_\n" foreach #resultarray;
__DATA__
Bruce:Batman:bat#bat.com
Santosh:Bhaskar:santosh#santosh.com
Demo
You can avoid all variables and do something like below
while(<DATA>){
chomp;
print "$_\n" foreach split(":");
}
One liner:
perl -e 'while(<>){chomp; push #result, split(":",$_);} print "$_\n" foreach #result' testdata.txt
When you do:
push(#resultarray, #linearray);
you're pushing #linearray into #resultarray at the end, so index 0 through 2 is still the items from the first time you pushed #linearray.
To overwrite #resultarray with the values from the second iteration, do:
#resultarray = #linearray;
instead.
Alternatively, use unshift to place #linearray at the start of #resultarray, as suggested by Sobrique:
unshift #resultarray, #linearray;
So, you just want to transliterate : to \n?
$ perl -pe 'tr/:/\n/' data.txt
Output:
Bruce
Batman
bat#bat.com
Santosh
Bhaskar
santosh#santosh.com
use strict;
use warnings;
my $file = 'BatmanFile.txt';
open my $info, $file or die "Could not open $file: $!";
my #resultarray;
while( my $line = <$info>) {
#print $line;
chomp $line;
my #linearray = split(":", $line);
#push(#resultarray, #linearray);
print join("\n",$linearray[0]),"\n";
print join("\n",$linearray[1]),"\n";
print join("\n",$linearray[2]),"\n";
}
close $info;

I need help around file access and modification in Perl

I have the folder "segmentation" where i need the use of ".purseg" files(x.purseg,y.purseg,z.purseg). They are kind of text files.
Their form is:
'0.1 4.5 speech_L1'
'4.7 9.2 speech_L2'
etc.
I also have the folder audio where i have the "audio": x.wav,y.wav,z.wav.
Each ".purseg" file matches a ".wav" file,they both have the same name.
For my script i have to get the information from the ".purseg" file and based on it i have to cut from the wav file the part that i need(get the speaker mentioned as speech_L2).I made a script that works if i have both ".purseg" and ".wav" file in the same folder but because i am working with a lot of data i need to fix my script in order to work with folders. Here is the script:
#! /usr/bin/perl -w
use List::MoreUtils qw(uniq);
use File::Path qw(make_path);
use File::Copy "cp";
use warnings;
my $directory = '/home/taurete/Desktop/diar_fem_fin/segmentation/';
opendir (DIR, $directory) or die $!;
while (my $file = readdir(DIR))
{
next unless ($file =~ m/\.purseg$/);
$file =~ s{\.[^.]+$}{};
push (#list1, $file);
# print "$file\n";
}
my $list=#list1;
# print "$list";
$i=0;
while ($i<$list)
{
my $nume1=$list[$i];
open my $fh, "$nume1.purseg" or die $!;
my #file_array;
while (my $line = <$fh>)
{
chomp $line;
my #line_array = split(/\s+/, $line);
push (#file_array, \#line_array);
}
my #arr=#file_array;
$cont1=0;
my $s1= #arr;
for (my $i=0;$i < $s1;$i++)
{
$directory="$nume1";
make_path($directory);
if ("speech_L2" eq "$arr[$i][2]")
{
my $directory = '/home/taurete/Desktop/data/audio/';
opendir (DIR, $directory) or die $!;
$interval = $arr[$i][1] - $arr[$i][0];
$speakername=$nume1._.$cont1;
`sox $nume1.wav ./$directory/$speakername.wav trim $arr[$i][0] $interval`;
$cont1++;
}
}
$i++;
}
Here is what i get:
Name "main::list" used only once: possible typo at ./spkfinal.pl line
23. Use of uninitialized value $nume1 in concatenation (.) or string at ./spkfinal.pl line 27. No such file or directory at ./spkfinal.pl
line 27.
To answer your question about Name "main::list" used only once: possible typo at ./spkfinal.pl line 23., change:
my $nume1=$list[$i];
to:
my $nume1=$list1[$i];
You do not have an array #list, but you do have an array #list1.
I think that will clear up your subsequent warnings, too.

Compare MD5 from files in a directory against an array (perl)

I was checking out this link here: How could I write a Perl script to calculate the MD5 sum of every file in a directory?
It gets the md5 of each file in a specified directory. What i want to do is take those md5's and compare them against an array. This is what i have so far.
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
my $dirname = "./";
opendir( DIR, $dirname );
my #files = readdir(DIR);
closedir(DIR);
print "#files\n";
foreach my $file (#files) {
if ( -d $file || !-r $file ) { next; }
open( my $FILE, $file );
binmode($FILE);
print Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
my #array = ('667fc8db8e5519cacbf8f9f2af2e0b08');
if (#array ~~ $FILE) {
print "matches array", "\n";
} else {
print "doesnt match array", "\n";
}
}
system ( 'pause' )
But with this, i always get doesnt match array no matter if it does match the array perfectly. I can print #array and it will even show the same md5 values of the file. But like i said it just always says "doesnt match array". ive never got it to say "matches array" on any file. Thank you for looking :)
EDIT:
This is what i have now.
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
my $dirname = "./";
opendir( DIR, $dirname );
my #files = readdir(DIR);
closedir(DIR);
print "#files\n";
foreach my $file (#files) {
next if -d $file || !-r $file;
open( my $FILE, $file );
binmode($FILE);
#print digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
Sdigest = Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
my #array = ('667fc8db8e5519cacbf8f9f2af2e0b08');
if($digest eq $array[0]) {
print "matches array", "\n";
} else {
print "doesnt match array", "\n";
}
}
system ( 'pause' );
Thanks to all for your help. You guys are awesome ;)
Please do not use smartmatch ~~. It was declared experimental in the latest release of Perl, and the semantics are likely to change in the future.
The best solution is to create a hash of the fingerprints you know:
my %fingerprints;
$fingerprints{"667fc8db8e5519cacbf8f9f2af2e0b08"} = undef;
If you want to load a whole array of fingerprints into the hash so that we can easily test for existence, you can use a hash slice:
#fingerprints{#array} = ();
Next, we store the fingerprint of the current file in a variable:
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest;
Then we test if that $digest exists in the hash of fingerprints:
if (exists $fingerprints{$digest}) {
print "$digest for <$file> -- FOUND\n";
}
else {
print "$digest for <$file>\n";
}
Using a hash is usually faster than looping through an array (If you do multiple lookups).
Suggested complete program:
use strict;
use warnings;
use feature qw< say >;
use autodie; # automatic error handling
use Digest::MD5;
my ($dirname, $fingerprint_file) = #ARGV; # takes two command line arguments
length $dirname or die "First argument must be a directory name\n";
length $fingerprint_file or die "Second argument must be a file with fingerprints\n";
# load the fingerprints
my %fingerprints;
open my $fingerprints_fh, "<", $fingerprint_file;
while (<$fingerprints_fh>) {
chomp;
$fingerprints{$_} = undef;
}
close $fingerprints_fh;
opendir my $directory, $dirname;
while(my $file = readdir $directory) {
next if not -f $file;
open my $fh, "<:raw", "$dirname/$file";
my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh;
if (exists $fingerprints{$digest}) {
say qq($digest "$file" -- FOUND);
}
else {
say qq($digest "$file");
}
}
closedir $directory;
Example invocation
> perl script.pl . digests.txt
Perhaps the following will be helpful:
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
use File::Basename;
my $dirname = './';
my %MD5s = (
'667fc8db8e5519cacbf8f9f2af2e0b08' => 1,
'8c0452b597bc2c261ded598a65b043b9' => 1
);
for my $file ( grep { !-d and -r } <$dirname*> ) {
open my $FILE, '<', $file or die $!;
binmode $FILE;
my $md5hexdigest = Digest::MD5->new->addfile($FILE)->hexdigest;
close $FILE;
print basename ($file), " md5hexdigest $md5hexdigest ";
if ( $MD5s{$md5hexdigest} ) {
print "matches hash", "\n";
}
else {
print "doesn't match hash", "\n";
}
}
Sample output:
XOR_String_Match.pl md5hexdigest 8c0452b597bc2c261ded598a65b043b9 matches hash
zipped.txt md5hexdigest d41d8cd98f00b204e9800998ecf8427e doesn't match hash
Like this:
my $digest = Digest::MD5->new->addfile($FILE)->hexdigest, " $file\n";
then
if($digest eq $array[0])
By the way, it would maybe be slightly more idiomatic to say (earlier on in your code):
next if -d $file || !-r $file;

problem with the code in perl

My problem is that I am not able to figure out that why my code is taking each of the line from the file as one element of an array instead of taking the whole record starting from AD to SS as one element of the array. As you can see that my file is starting from AD and ending at SS which is same for all the followed lines in the data. But I want to make the array having elements starting from AD to SS which will be having all the lines in between AD to SS that is BC....,EG...., FA.....etc.not each line as an element. I tried my way and get the same file as such.Could anyone check my code. Thanks in advance.
AD uuu23
BC jjj
EG iii
FA vvv
SS
AD hhh25
BC kkk
EG ppp
FA aaa
SS
AD ttt26
BC xxx
FA rrr
SS
#!/usr/bin/env perl
use strict;
use warnings;
my $ifh;
my $line = '';
my #data;
my $ifn = "fac.txt";
open ($ifh, "<$ifn") || die "can't open $ifn";
my $a = "AD ";
my $b = "SS ";
my $_ = " ";
while ($line = <$ifh>)
{
chomp
if ($line =~ m/$a/g); {
$line = $_;
push #data, $line;
while ($line = <$ifh>)
{
$line .= $_;
push #data, $line;
last if
($line =~ m/$b/g);
}
}
push #data, $line; }
print #data;
If I understand correctly your problem, the fact is that the way you are reading the file:
while ($line = <$ifh>)
is inherently a line-by-line approach. It uses the content of the "line termination variable" ($/) to understand where to split lines. One easy way to change this behavior is un-defining the $/:
my $oldTerminator = $/;
undef $/;
....... <your processing here>
$/ = $oldTerminator;
so, your file would be just one line, but I am not sure what would happen of your code.
Another approach is the following (keeping in mind what I said about the fact that you are reading the file line-by-line): instead of doing
`push #data, $line;`
at each iteration of your loop, just accumulate the lines you read in a variable
$line .= $_;
(like you already do), and do the push only at the end, just once. Actually, this second approach will be more easily applicable to your code (you only have to remove the two push statements you have and put one outside of the loop).
I believe part of your problem is here
chomp
if ($line =~ m/$a/g);
it should be
chomp;
if ($line =~ m/$a/g)
otherwise the if statement is always executed. Please update your question if this has helped you advance
Here's a way to accomplish reading the records into an array, with newlines removed:
Code:
use strict;
use warnings;
use autodie;
my #data;
my $record;
my $file = "fac.txt";
open my $fh, '<', $file;
while (<$fh>) {
chomp;
if (/^AD /) { # new record starts
$record = $_;
while (<$fh>) {
chomp;
$record .= $_;
last if /^SS\s*/;
}
push #data, $record;
} else { die "Data outside record: $_" }
}
use Data::Dumper;
print Dumper \#data;
Output:
$VAR1 = [
'AD uuu23BC jjjEG iiiFA vvvSS',
'AD hhh25BC kkkEG pppFA aaaSS',
'AD ttt26BC xxxFA rrrSS'
];
This is another version, using the input record separator $/:
use strict;
use warnings;
use autodie;
my $file = "fac.txt";
open my $fh, '<', $file;
my #data;
$/ = "\nSS";
while (<$fh>) {
s/\n//g;
push #data, $_;
}
use Data::Dumper;
print Dumper \#data;
Produces the same output with this data. It does not care about the record start characters, only the end, which is SS at the beginning of a line.

Resources