Perl: open files from array and read one by one - arrays

I have an array in Perl that looks like this:
my @dynfiles = ('dyn.abc.transcript', 'dyn.def.transcript', 'dyn.ghi.transcript', 'dyn.jkl.transcript');
I'm trying to open these files and read them one by one. For this I have a code that looks like this:
foreach my $dynfile (@dynfiles) {
    print "$dynfile\n";
    open my $fh , '<', $dynfile or die "Could not open file\n";
    my %data;
    $data{$dynfile} = do {
        local $/ = undef;
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ m/Errors:\s+0/) {
                print "Dyn run status: PASS\n";
            } else {
                print "Dyn Run status : FAIL\n";
            }
        }
        close $fh;
    }
}
And I get this error as output:
dyn.bxt.transcript
Dyn run status: FAIL
dyn.cnl.transcript
17:25:19 : -E- Could not open dyn.cnl.transcript
So my concern is that it isn't reading the files in the array at all. Also, this file dyn.bxt.transcript had this string Errors : 0 in it, but I still get Dyn run status: FAIL in the output.
Am I doing anything wrong here? I'm using a simple pattern match and I'm not sure where the problem is. Kindly help.
Thanks in advance!

After reading your code and debugging in chat, I would probably go with something like this:
sub dynamo_check {
    # Assumes $log_file (the log directory) and %data are declared elsewhere in the script.
    opendir(my $dh, $log_file) or die "can't opendir $log_file: $!";
    my @dynfiles = grep { /^dynamo.*transcript$/ && -f "$log_file/$_" } readdir($dh);
    closedir $dh;
    foreach my $dynamofile (@dynfiles) {
        print "Checking file: $dynamofile\n";
        # Prepend the directory, with a separator, to the bare name from readdir.
        open my $fh, '<', "$log_file/$dynamofile" or die "$!\n";
        my $passed = 0;
        while (my $line = <$fh>) {
            if ($line =~ m/Errors\s*:\s*0/i) {
                $passed = 1;
                last;
            }
        }
        close $fh;
        if ($passed == 1) {
            print "Dynamo run status: PASS\n";
            $data{$dynamofile} = "pass";
        } else {
            print "Dynamo run status: FAIL\n";
            $data{$dynamofile} = "fail";
        }
    }
    print Dumper(\%data);    # Dumper needs "use Data::Dumper;" at the top of the script
}
Summary of changes:
Add $! to the die message to get a better error message; see perlvar for reference.
Use grep and readdir to find the files you want to read instead of hard coding it.
Prepend the directory path to the file name when we open the files.
Remove the do block.
Set the values in %data to pass or fail.
No need to use chomp here.
No need to set local $/ = undef;, we can go through the lines one by one and break out of the while loop with last when we find the Errors line.

First, let Perl tell you why it couldn't open a file:
open my $fh , '<', $dynfile or die "Could not open file $!\n";
I notice that your error message references dyn.bxt.transcript and dyn.cnl.transcript, which are not in your @dynfiles. It helps if you build a complete and minimal script with sample inputs.
Then you are undefining the input record separator, after which you use a while loop that will only ever read a single record containing the entire file. That's typically a bad thing.
Next, it looks as if your pattern doesn't match the string Errors : 0, which has a space before the colon.
if ($line =~ m/Errors\s*:\s+0/) {
I'm not sure what you're doing with the do. That returns the last evaluated expression, which in your case is close $fh. But, that %data hash disappears at the end of each iteration of the block. Again, strip out everything that isn't part of investigating this problem.
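To make the point about the input record separator concrete, here is a minimal sketch. It reads from an in-memory filehandle rather than a real transcript, and the sample text and the helper names count_records and run_status are made up for illustration. It shows that a while loop sees three records in line mode but only one when $/ is undefined, and that a pattern allowing whitespace before the colon matches Errors : 0.

```perl
use strict;
use warnings;

# Sample transcript text; the contents are made up for illustration.
my $text = "Starting run\nErrors : 0\nDone\n";

# Count how many records a while loop reads from a string-backed filehandle.
sub count_records {
    my ($content, $slurp) = @_;
    open my $fh, '<', \$content or die "Could not open in-memory file: $!";
    local $/ = $slurp ? undef : "\n";    # undef means "read everything at once"
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Return PASS if any line matches, allowing optional whitespace around the colon.
sub run_status {
    my ($content) = @_;
    open my $fh, '<', \$content or die "Could not open in-memory file: $!";
    my $status = 'FAIL';
    while (my $line = <$fh>) {
        if ($line =~ /Errors\s*:\s*0\b/) {
            $status = 'PASS';
            last;
        }
    }
    close $fh;
    return $status;
}

print "line mode:  ", count_records($text, 0), " records\n";
print "slurp mode: ", count_records($text, 1), " record\n";
print "status: ", run_status($text), "\n";
```

The \b after the 0 is an addition of this sketch; it keeps a line such as Errors : 01 from being reported as a pass.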

You can reduce the code by using some well-tested modules. For example, using one of my favourites, Path::Tiny, you could write:
use 5.014;
use warnings;
use Path::Tiny;
my @dynfiles = map { "dyn.$_.transcript" } qw(abc def ghi jkl);
say "Dyn run status: ",
    (path($_)->slurp =~ /errors\s*:\s*0\b/i)
        ? "PASS"
        : "FAIL"
    for (grep {-f} @dynfiles);

Related

Perl: Adding an exception to a foreach loop

I'm new to Stack Overflow and I would like to ask for some advice with regard to a minor problem I have with my Perl code.
In short, I have written a small programme that opens text files from a pre-defined array, then searches for certain strings in them and finally prints out the line containing the string.
my @S1A_SING_Files = (
    'S1A-001_SING_annotated.txt',
    'S1A-002_SING_annotated.txt',
    'S1A-003_SING_annotated.txt',
    'S1A-004_SING_annotated.txt',
    'S1A-005_SING_annotated.txt'
);
foreach (@S1A_SING_Files) {
    print ("\n");
    print ("Search results for $_:\n\n");
    open (F, $_) or die("Can't open file!\n");
    while ($line = <F>) {
        if ($line =~ /\$(voc)?[R|L]D|\$Rep|\/\//) {
            print ($line);
        }
    }
}
close (F);
I was wondering whether it is possible to create an exception to the foreach loop, so that the line containing
print ("\n");
not be executed if the file is $S1A_SING_Files[0]. It should then be normally executed if the file is any of the following ones. Do you think this could be accomplished?
Thank you very much in advance!
Yes. Just add a check for the first file. Change:
print ("\n");
to:
print ("\n") if $_ ne $S1A_SING_Files[0];
If the array contains unique strings, you can use the following:
print("\n") if $_ ne $S1A_SING_Files[0]; # Different stringification than 1st element?
The following will work even if the array contains non-strings or duplicate values (and it's faster too):
print("\n") if \$_ != \$S1A_SING_Files[0]; # Different scalar than 1st element?
Both of the above could fail for magical arrays. The most reliable solution is to iterate over the indexes.
for my $i (0..$#S1A_SING_Files) {
    my $file = $S1A_SING_Files[$i];
    print("\n") if $i; # Different index than 1st element?
    ...
}
Your code can be written in the following form:
use strict;
use warnings;
my @S1A_SING_Files = (
    'S1A-001_SING_annotated.txt',
    'S1A-002_SING_annotated.txt',
    'S1A-003_SING_annotated.txt',
    'S1A-004_SING_annotated.txt',
    'S1A-005_SING_annotated.txt'
);
foreach (@S1A_SING_Files) {
    # Skip the leading blank line for the first file only.
    print "\n" unless $_ eq $S1A_SING_Files[0];
    print "Search results for $_:\n\n";
    open my $fh, '<', $_ or die("Can't open file $_: $!\n");
    m!\$(voc)?[R|L]D|\$Rep|//! && print while <$fh>;
    close $fh;
}

Perl user input file searching

I am currently trying to take user input for a file name and then search for that file. The program has to terminate gracefully if the file isn't found and continue if it is. For some reason, despite what I found in my research, the "-e" function isn't working for me. I am on a Mac, if that makes a difference, although I have the shebang.
#!/usr/bin/perl
use strict;
print "Enter the name of a file: ";
my $userInput = <STDIN>;
my $fileName = '/' . $userInput;
if(-e $fileName) {
    print "File exist.\n";
    die();
} else {
    print "File doesnt exist.\n";
    die();
}
It never finds the file, whether it is named correctly or not.
The problem is that you are also getting the newline as part of the file name when you hit the Enter key. You can see that if you print $fileName.
You can get rid of it by using the chomp function after getting the input:
chomp($userInput);
Also, I'm not sure whether you actually want to check for the file in the root directory or in the current directory. If it is in the current directory, maybe you missed a dot before the slash:
'./' . $userInput;
With these two changes your code should look like this:
#!/usr/bin/perl
use strict;
print "Enter the name of a file: ";
my $userInput = <STDIN>;
chomp($userInput);
my $fileName = './' . $userInput;
if(-e $fileName) {
    print "File '$fileName' exist.\n";
    die();
} else {
    print "File '$fileName' doesnt exist.\n";
    die();
}
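As a small sketch of why the trailing newline matters, the snippet below simulates the string that <STDIN> would return instead of reading from the terminal. The helper name input_to_path and the file name notes.txt are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw user input into a path in the current directory.
sub input_to_path {
    my ($raw) = @_;
    chomp(my $name = $raw);    # remove the trailing newline left by <STDIN>
    return "./$name";
}

# What <STDIN> would return after typing notes.txt and pressing Enter.
my $raw  = "notes.txt\n";
my $path = input_to_path($raw);

my $message = (-e $path) ? "File '$path' exists." : "File '$path' does not exist.";
print "$message\n";
```

Without the chomp, the path tested by -e would end in a newline, which is almost never the name of a real file.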

Print regex match Perl

Hello I am in the process of making a program that matches a given set of keywords to a file.
I want to output the matched data to a text file and include the regex keyword that triggered the match.
Below is my code related to my issue:
my $counter = 0;
foreach($words)
{
    while($line = <FILE>)
    {
        if($line =~ /$words/)
        {
            print "@array[$counter] $line\n";
            print OUTPUT $line;
        }
    }
    $counter ++;
}
This does not produce the expected outcome. It works perfectly for the first element in the array but for the rest it just simply prints the first one again. I believe the counter is not being incremented.
Is there a better / easier way to get the current element being used in the loop? or even get the current regex match?
The problem is that <FILE> exhausts the file for the first word. For the next word, <FILE> tries to read at the end of the file, which means the whole loop is skipped.
You can iterate over the words inside the loop over the file, or you can seek back to the beginning of the file at the end of the loop.
Here is what you should do:
use strict;
use warnings;
use 5.016;
my $fname = 'data.txt';
my @patterns = (
    'do.',
    '.at',
    '.ir.',
);
open my $INFILE, '<', $fname
    or die "Couldn't read from $fname: $!";
while (my $line = <$INFILE>) {
    for my $pattern (@patterns) {
        if ($line =~ /($pattern)/) {
            print "$pattern --> $1";
        }
    }
}
close $INFILE;
Putting parentheses around parts of the regex causes perl to set the match variables $1, $2, $3, etc., which contain the match for each parenthesized group.
$line will have a newline at the end of the line, so if you write print "$line\n", you will add another newline, so your output file will have blank lines between every line you print.
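The other option mentioned earlier, rewinding the file with seek for each word, could look roughly like the sketch below. It uses an in-memory filehandle with made-up sample data in place of data.txt, and the helper name matches_with_seek is hypothetical.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the contents of data.txt.
my $content  = "dog\ncat\nbird\n";
my @patterns = ('do.', '.at', '.ir.');

# Return "pattern --> match" strings, rewinding the handle before each pattern.
sub matches_with_seek {
    my ($fh, @wanted) = @_;
    my @found;
    for my $pattern (@wanted) {
        seek $fh, 0, 0 or die "Couldn't rewind: $!";    # back to the start of the file
        while (my $line = <$fh>) {
            push @found, "$pattern --> $1" if $line =~ /($pattern)/;
        }
    }
    return @found;
}

open my $fh, '<', \$content or die "Couldn't open in-memory file: $!";
print "$_\n" for matches_with_seek($fh, @patterns);
close $fh;
```

This reads the file once per pattern, so for large files the single pass with the inner loop over patterns is usually the cheaper choice.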

Perl script problems

The purpose of the script is to process all words from a file and output ALL words that occur the most. So if there are 3 words that each occur 10 times, the program should output all the words.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (i.e. the New Testament). I'm not sure if that is a fault of mine or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
    #Make sure the argument is actually a file
    if (-f $ARGV[0]){
        %wordHash = (); #New hash to match words with word counts
        $file=$ARGV[0]; #Stores value of argument
        open(FILE, $file) or die "File not opened correctly.";
        #Process through each line of the file
        while (<FILE>){
            chomp;
            #Delimits on any non-alphanumeric
            @words=split(/[^a-zA-Z0-9]/,$_);
            $wordSize = @words;
            #Put all words to lowercase, removes case sensitivity
            for($x=0; $x<$wordSize; $x++){
                $words[$x]=lc($words[$x]);
            }
            #Puts each occurrence of word into hash
            foreach $word(@words){
                $wordHash{$word}++;
            }
        }
        close FILE;
        #$wordHash{$b} <=> $wordHash{$a};
        $wordList="";
        $max=0;
        while (($key, $value) = each(%wordHash)){
            if($value>$max){
                $max=$value;
            }
        }
        while (($key, $value) = each(%wordHash)){
            if($value==$max && $key ne "s"){
                $wordList.=" " . $key;
            }
        }
        #Print solution
        print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
    }
    else {
        print "Error. Your argument is not a file.\n";
    }
}
else {
    print "Error. Use exactly one argument.\n";
}
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array indexes can only be numbers, and since your keys are words, that won't work. What happens here is that any random string is coerced into a number, and for any string that does not begin with a number, that will be 0.
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
    chomp;
    my @words = grep $_,               # disallow empty strings
                map lc,                # make everything lower case
                split /[^a-zA-Z0-9]/;  # your original split
    foreach my $word (@words) {
        $wordHash{$word}++;
    }
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
    printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = @ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
    chomp $line;
    my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
    $words{ $_ } += 1 for @tokens;
}
close $input;
my $max = max values %words;
my @argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (@argmax) {
    printf "%s: %d\n", $word, $max;
}
Why not just get the keys from the hash sorted by their value and extract the first X?
This should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
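A rough sketch of that suggestion follows. The helper name top_words and the sample counts are made up for illustration, and the alphabetical tie-break is an assumption added here to make the order predictable.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n most frequent words, ties broken alphabetically.
sub top_words {
    my ($counts, $n) = @_;
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
    $n = @sorted if $n > @sorted;    # don't run past the end of the list
    return @sorted[0 .. $n - 1];
}

my %wordHash = (the => 10, and => 10, of => 7, a => 3);    # made-up counts
printf "%-6d %s\n", $wordHash{$_}, $_ for top_words(\%wordHash, 2);
```

Note that a fixed X can cut off words tied with the last one kept, so when every word sharing the maximum count is wanted, filtering on the maximum as in the answers above is the safer approach.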

How do I read files from an array in perl?

I have read the directory with the files into an array. Now the problem is I would like to open the files and display their contents with a line in between them.
For example, I would like to open file1.txt and file2.txt, and display their contents like this:
Hello nice to meet you -- file 1
How are you? -- file 2
The code is :
sub openFile{
    opendir(fol, "folder/details");
    my @files= readdir(fol);
    my @FilesSorted = sort(@files);
    foreach my $EachFile (@FilesSorted) {
        print $EachFile . "\n";
    }
}
If you just want to display all the lines in the files (as your code seems to be trying to do), without any indication of which line(s) came from which file, there's a trick involving the pre-defined variable @ARGV:
sub openFile {
    my $dir = "folder/details";
    opendir(fol, $dir) or die "Can't open $dir: $!";
    # readdir returns bare names, so prepend the directory and keep plain files only
    @ARGV = sort grep { -f } map { "$dir/$_" } readdir(fol);
    closedir fol;
    while (<>) {
        print "$_\n";
    }
}
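If you need to print the file names, you'll have to open each file explicitly:
sub openFile {
    my $dir = "folder/details";
    opendir(fol, $dir) or die "Can't open $dir: $!";
    my @files = sort grep { -f "$dir/$_" } readdir(fol);    # plain files only
    closedir fol;
    while (defined(my $file = shift @files)) {
        open(FILE, '<', "$dir/$file") or die "Can't open $dir/$file: $!";
        while (<FILE>) {
            print "$_\n";
        }
        close FILE;
    }
}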
Another example, with some error checking, and skipping sub dirs:
sub print_all_files {
    my $dir = shift;
    opendir(my $dh, $dir) || die "Can't read [$dir]: $!";
    while(defined(my $file = readdir $dh)) {
        next unless -f "$dir/$file"; # Ignore subdirs and . and ..
        open(my $fh, "<", "$dir/$file") || die "Can't read [$dir/$file]: $!";
        print while readline($fh);
        print "\n"; # add an extra line
    }
}
print_all_files("folder/details");
Try
sub openFile{
    my $dir = "folder/details";
    opendir(fol, $dir) or die "unable to open $dir: $!\n";
    my @files= readdir(fol);
    closedir(fol);
    my @FilesSorted = sort(@files);
    foreach my $EachFile (@FilesSorted) {
        if($EachFile ne "." && $EachFile ne ".." && !(-d "$dir/$EachFile")) { #important to skip the directories
            open(IT, '<', "$dir/$EachFile") or die "unable to read ".$EachFile.": $!\n"; # open file
            while(my $line = <IT>) { # print content in file
                print "$line\n";
            }
            close(IT); # close file
            print "-->$EachFile\n"; # print file name
        }
    }
}
Can this be a stand-alone program?
#! /usr/bin/env perl
use strict;
use warnings;
die "$0: no arguments allowed\n" if @ARGV;
my $dir = "folder/details";
opendir my $dh, $dir or die "$0: opendir $dir: $!";
while (defined(my $file = readdir $dh)) {
    my $path = $dir . "/" . $file;
    push @ARGV, $path if -f $path;
}
@ARGV = sort @ARGV;
while (<>) {
    print;
}
continue {
    print "\n" if eof && @ARGV;
}
Notes
The defined check on each value returned from readdir is necessary to handle a file whose name is a false value in Perl, e.g., 0.
Your intent is to write the contents of the files in folder/details, so it's also necessary to filter for plain files with the -f test.
Perl's built-in processing in while (<>) { ... } shifts arguments off the front of @ARGV. The eof test in the continue clause (without parentheses!) detects the end of each file and prints newlines as separators between files, not terminators.
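The notes above can be tried out without a real folder/details directory. The sketch below is only an illustration: it writes two made-up files into a temporary directory created with File::Temp, and the helper name join_files is hypothetical. It collects the output in a string so the separator behaviour is easy to inspect.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create two small files in a temporary directory (names and contents are made up).
my $dir = tempdir(CLEANUP => 1);
my %content = ('a.txt' => "first\n", 'b.txt' => "second\n");
for my $name (sort keys %content) {
    open my $out, '>', "$dir/$name" or die "Can't write $dir/$name: $!";
    print {$out} $content{$name};
    close $out or die "Can't close $dir/$name: $!";
}

# Concatenate the given files with a blank line between them, but not after the last.
sub join_files {
    local @ARGV = @_;    # restored automatically when the sub returns
    my $joined = '';
    while (my $line = <>) {
        $joined .= $line;
    }
    continue {
        # eof without parentheses tests the file just read;
        # @ARGV is non-empty only while more files remain.
        $joined .= "\n" if eof && @ARGV;
    }
    return $joined;
}

print join_files(map { "$dir/$_" } sort keys %content);
```

Because the separator is added only while @ARGV still holds unread files, the output ends with the last line of the last file rather than with an extra blank line.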
You can simplify the directory reading code by using a module like File::Util. It, or some other module like it, will provide several conveniences: error checking; filtering out unwanted directory contents (like the . and .. subdirs); selecting contents by type (for example, just files or dirs); and attaching the root path to the contents.
use strict;
use warnings;
use File::Util qw();
my $dir = 'folder/details';
my $file_util = File::Util->new;
my @files = $file_util->list_dir($dir, qw(--with-paths --no-fsdots --files-only));
for my $f (@files){
    local @ARGV = ($f);
    print while <>;
    print "\n";
}
If you prefer to avoid using other modules, you can get the file names like this:
opendir(my $dh, $dir) or die $!;
my @files = grep -f, map("$dir/$_", readdir $dh);
Following up Greg Bacon's answer (re: "Can this be a stand-alone program?"), if you still want this in an openFile subroutine, you could use the same loop there:
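sub openFile{
    my $dir = "folder/details";
    opendir(fol, $dir) or die "opendir $dir: $!";
    # readdir returns bare names, so build full paths and keep plain files only
    my @files= grep { -f } map { "$dir/$_" } readdir(fol);
    closedir(fol);
    my @FilesSorted = sort(@files);
    local @ARGV = @FilesSorted;
    while (<>) {
        print;
    }
    continue {
        print "\n" if eof && @ARGV;
    }
}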
Note use of local @ARGV (as in FMc's answer): this preserves any global argument list from outside the subroutine and restores that value on exit.

Resources