Merging variables in Perl - arrays

I'm very new to Perl. I have a very annoying problem when trying to concatenate two variables.
When I do it in a simple script like:
$arg1 = "hello";
$arg2 = " world";
print $arg1.$arg2;
It seems to work fine.
But when I make it a bit more complex (reading a file into an array and then appending the variable), instead of appending the second variable it seems to replace the first characters of the first variable.
Here's the code:
#!/usr/bin/perl -w
use LWP::Simple;
use Parallel::ForkManager;
use vars qw( $PROG );
( $PROG = $0 ) =~ s/^.*[\/\\]//;
if ( @ARGV == 0 ) {
print "Usage: ./$PROG [TARGET] [THREADS] [LIST] [TIMEOUT]\n";
exit;
}
my $host = $ARGV[0];
my $threads = $ARGV[1];
my $weblist = $ARGV[2];
my $timeout = $ARGV[3];
my $pm = Parallel::ForkManager->new($threads);
alarm($timeout);
open(my $handle, "<", $weblist);
chomp(my @webservers = <$handle>);
close $handle;
repeat:
for $target (@webservers) {
my $pid = $pm->start and next;
print "$target$host\n";
get($target.$host);
$pm->finish;
}
$pm->wait_all_children;
goto repeat;
So if the text file (list) looks like this:
www.site.com?url=
www.site.net?url=
etc
And the host variable is domain.com.
So instead of getting something like www.site.com?url=domain.com, I keep getting this: domain.comsite.com?url=.
It's replacing the first characters of the first variable with the second variable. I can't get my head around it.
Any kind of help would be appreciated, as I'm sure I'm just missing a small thing that will make me feel stupid later.
Thanks in advance, have a good day!

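Your input file probably contains carriage return (\r) characters, i.e. it has Windows (CRLF) line endings. chomp removes only the trailing \n, so each $target still ends in \r. When the concatenated string is printed, the \r moves the cursor back to the start of the line and $host is written over the beginning of $target on screen. The URL passed to get() is also broken, since it contains the \r. You can remove it with s/\r//g, or convert the input from MS Windows to *nix line endings with dos2unix or fromdos.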
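A minimal sketch of the cleanup step, assuming the list is read into @webservers as in the question. Instead of chomp it strips any trailing \r and \n characters with s/[\r\n]+\z//, a slight variation on the s/\r//g above. read_clean_lines is a hypothetical helper name, and an in-memory filehandle stands in for the real list file:

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines from a filehandle and strip any
# trailing CR/LF characters, so CRLF files concatenate cleanly.
sub read_clean_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;   # removes "\n" as well as "\r\n" endings
        push @lines, $line;
    }
    return @lines;
}

# Simulate a list file saved with Windows line endings (in-memory filehandle).
my $list = "www.site.com?url=\r\nwww.site.net?url=\r\n";
open my $handle, '<', \$list or die "Cannot open in-memory file: $!";
my @webservers = read_clean_lines($handle);
close $handle;

my $host = 'domain.com';
print "$_$host\n" for @webservers;
```

With the \r gone, each line prints as www.site.com?url=domain.com, and the same clean string is what get() receives.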

Related

Perl: Adding an exception to a foreach loop

I'm new to Stack Overflow and I would like to ask for some advice with regard to a minor problem I have with my Perl code.
In short, I have written a small programme that opens text files from a pre-defined array, then searches for certain strings in them and finally prints out the line containing the string.
my @S1A_SING_Files = (
'S1A-001_SING_annotated.txt',
'S1A-002_SING_annotated.txt',
'S1A-003_SING_annotated.txt',
'S1A-004_SING_annotated.txt',
'S1A-005_SING_annotated.txt'
);
foreach (@S1A_SING_Files) {
print ("\n");
print ("Search results for $_:\n\n");
open (F, $_) or die("Can't open file!\n");
while ($line = <F>) {
if ($line =~ /\$(voc)?[R|L]D|\$Rep|\/\//) {
print ($line);
}
}
}
close (F);
I was wondering whether it is possible to add an exception to the foreach loop, so that the line
print ("\n");
is not executed if the file is $S1A_SING_Files[0], but is executed normally for every file after it. Do you think this could be done?
Thank you very much in advance!
Yes. Just add a check for the first file. Change:
print ("\n");
to:
print ("\n") if $_ ne $S1A_SING_Files[0];
If the array contains unique strings, you can use the following:
print("\n") if $_ ne $S1A_SING_Files[0]; # Different stringification than 1st element?
The following will work even if the array contains non-strings or duplicate values (and it's faster too):
print("\n") if \$_ != \$S1A_SING_Files[0]; # Different scalar than 1st element?
Both of the above could fail for magical (e.g. tied) arrays. The most reliable solution is to iterate over the indexes.
for my $i (0..$#S1A_SING_Files) {
my $file = $S1A_SING_Files[$i];
print("\n") if $i; # Different index than 1st element?
...
}
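A self-contained version of the index-based loop, with the file opening left out so it can run on its own. build_headers is a hypothetical name; it just collects the headers into a string:

```perl
use strict;
use warnings;

# Hypothetical helper: build the report headers for a list of file names,
# with a blank line before every header except the first one.
sub build_headers {
    my @files = @_;
    my $out = '';
    for my $i (0 .. $#files) {
        $out .= "\n" if $i;   # index 0 is the first file: no leading blank line
        $out .= "Search results for $files[$i]:\n\n";
    }
    return $out;
}

print build_headers('S1A-001_SING_annotated.txt', 'S1A-002_SING_annotated.txt');
```

Because the test is on the index rather than on the file name, it also works when the array contains duplicate names.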
Your code can be written in the following form:
use strict;
use warnings;
my @S1A_SING_Files = (
'S1A-001_SING_annotated.txt',
'S1A-002_SING_annotated.txt',
'S1A-003_SING_annotated.txt',
'S1A-004_SING_annotated.txt',
'S1A-005_SING_annotated.txt'
);
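for my $file (@S1A_SING_Files) {    # named variable: the while below assigns to $_, which would clobber the array
    print "\n" unless $file eq $S1A_SING_Files[0];
    print "Search results for $file:\n\n";
    open my $fh, '<', $file or die("Can't open file $file: $!\n");
    m!\$(voc)?[RL]D|\$Rep|//! && print while <$fh>;    # [RL]: a | inside [] would match a literal |
    close $fh;
}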

Split an array into two. Predetermined sizes, might not always be equal

I need to iterate over many files in a directory and split each file into two parts. I need to keep lines intact (I can't split on byte size), and I can't assume that each file has an even number of lines. I could use the Unix split command, but I'm looking for a faster way to go through my files that also avoids the default output names "xaa" and "xab" it generates.
The easiest way would be to take two consecutive slices of an array with the specified sizes ($number_of_group_one and $number_of_group_two), but I can't figure out how to do this. Instead, I am trying to push the lines into different arrays: filling one up until a certain number of lines and then "spilling over" into the other array until there are no more lines left to push. However, this approach yields two output arrays that both have exactly double the number of input lines. Here is my code:
#!/usr/bin/perl
use warnings;
use strict;
my ($directory) = @ARGV;
my $dir = "$directory";
my @arrayoffiles = glob "$dir/*";
my @arrayoflines_one;
my @arrayoflines_two;
my $counter = 0;
foreach my $filename (@arrayoffiles) {
    my @arrayoflines_one;
    my @arrayoflines_two;
    my @lines = read_lines($filename);
    my $NumberofLines = @lines;
    my $number_of_group_one = int($NumberofLines/2);
    my $number_of_group_two = ($NumberofLines - $number_of_group_one);
    foreach my $line (@lines) {
        $counter++;
        push (@arrayoflines_one, $line, "\n");
        if ($counter == $number_of_group_one) {
            push (@arrayoflines_two, $line, "\n");
        }
    }
}
sub read_lines {
    my ($file) = @_;
    open my $in, '<', $file or die $!;
    local $/ = undef; #slurps the whole file in as one
    my $content = <$in>;
    return split /\s/, $content;
    close $in;
}
I hope this is clear. Thanks for your help!
This is a good use case for splice:
my #lines = read_lines($filename);
my @lines1 = splice @lines, 0, @lines/2;
will put (about) half of your lines from @lines into @lines1, removing them from @lines (which keeps the remaining half).
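In the question's loop, push (@arrayoflines_one, $line, "\n") adds two elements per line (the line and a separate "\n"), $counter is never reset between files, and read_lines splits on single whitespace characters rather than on line breaks (the close after return never runs). A minimal sketch that avoids those issues by reading lines in list context and using splice; split_in_two and split_file are hypothetical names, and the <file>.part1/<file>.part2 output names are only an example:

```perl
use strict;
use warnings;

# Split a list of lines into two parts: the first holds int(N/2) lines,
# the second holds the rest (one line more when N is odd).
sub split_in_two {
    my @lines = @_;
    my @first = splice @lines, 0, int(@lines / 2);
    return (\@first, \@lines);
}

# Hypothetical file-level wrapper: write <file>.part1 and <file>.part2
# next to the input instead of split's default xaa/xab names.
sub split_file {
    my ($filename) = @_;
    open my $in, '<', $filename or die "Can't read $filename: $!";
    my @lines = <$in>;    # list context: one element per line, newlines kept
    close $in;
    my ($one, $two) = split_in_two(@lines);
    for my $part ([1, $one], [2, $two]) {
        my ($n, $chunk) = @$part;
        open my $out, '>', "$filename.part$n" or die "Can't write $filename.part$n: $!";
        print {$out} @$chunk;
        close $out;
    }
}

# Usage: perl split_files.pl DIRECTORY
if (@ARGV) {
    my ($dir) = @ARGV;
    split_file($_) for grep { -f } glob "$dir/*";
}
```

Since the lines keep their own newlines, the two parts can be printed as-is without pushing extra "\n" elements.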

Perl: open files from array and read one by one

I have an array in Perl that looks like this:
my @dynfiles = ('dyn.abc.transcript', 'dyn.def.transcript', 'dyn.ghi.transcript', 'dyn.jkl.transcript');
I'm trying to open these files and read them one by one. For this I have a code that looks like this:
foreach my $dynfile (@dynfiles) {
print "$dynfile\n";
open my $fh , '<', $dynfile or die "Could not open file\n";
my %data;
$data{$dynfile} = do {
local $/ = undef;
while (my $line = <$fh>) {
chomp $line;
if ($line =~ m/Errors:\s+0/) {
print "Dyn run status: PASS\n";
} else {
print "Dyn Run status : FAIL\n";
}
}
close $fh;
}
}
And I get this error as output:
dyn.bxt.transcript
Dyn run status: FAIL
dyn.cnl.transcript
17:25:19 : -E- Could not open dyn.cnl.transcript
So my concern is that it isn't reading the files in the array at all. Also, this file dyn.bxt.transcript had this string Errors : 0 in it, but I still get Dyn run status: FAIL in the output.
Am I doing anything wrong here? I'm using a simple pattern match, so I'm not sure where the problem is. Kindly help.
Thanks in advance!
After reading your code and debugging in chat, I would probably go with something like this:
# Assumes $log_file (the directory holding the transcripts) and %data are
# declared elsewhere, and that Data::Dumper is loaded for Dumper().
sub dynamo_check {
    opendir(my $dh, $log_file) or die "can't opendir $log_file: $!";
    my @dynfiles = grep { /^dynamo.*transcript$/ && -f "$log_file/$_" } readdir($dh);
    closedir $dh;
    foreach my $dynamofile (@dynfiles) {
        print "Checking file: $dynamofile\n";
        open my $fh, '<', "$log_file/$dynamofile" or die "$log_file/$dynamofile: $!\n";
        my $passed = 0;
        while (my $line = <$fh>) {
            if ($line =~ m/Errors\s*:\s*0/i) {
                $passed = 1;
                last;
            }
        }
        close $fh;
        if ( $passed == 1 ) {
            print "Dynamo run status: PASS\n";
            $data{$dynamofile} = "pass";
        } else {
            print "Dynamo run status: FAIL\n";
            $data{$dynamofile} = "fail";
        }
    }
    print Dumper(\%data);
}
Summary of changes:
Add $! to the error messages to find out why something failed (see $! in perlvar).
Use grep and readdir to find the files you want to read instead of hard-coding them.
Prepend the directory path to the file name when opening the files.
Remove the do block.
Set the values in %data to pass or fail.
Allow optional whitespace around the colon and match case-insensitively (m/Errors\s*:\s*0/i), so Errors : 0 is found.
No need to use chomp here.
No need to set local $/ = undef; we can go through the lines one by one and break out of the while loop with last when we find the Errors line.
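To check the pass/fail logic without real transcripts, it can be pulled into a small subroutine and fed in-memory filehandles. This is a sketch, not the answer's exact code: transcript_passed is a hypothetical name, and the \b after the 0 is an addition so that something like Errors: 05 is not treated as zero errors:

```perl
use strict;
use warnings;

# Return 1 as soon as a line like "Errors: 0" or "Errors : 0" is seen, else 0.
# The \b stops "Errors: 05" from counting as zero errors.
sub transcript_passed {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return 1 if $line =~ /Errors\s*:\s*0\b/i;
    }
    return 0;
}

# In-memory filehandles stand in for real transcript files here.
for my $text ("run done\nErrors : 0\n", "run done\nErrors : 3\n") {
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    print transcript_passed($fh) ? "PASS\n" : "FAIL\n";
    close $fh;
}
```

Returning early plays the same role as last in the loop above: the rest of the file is not read once the line is found.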
First, let Perl tell you why it couldn't open a file:
open my $fh , '<', $dynfile or die "Could not open file $dynfile: $!\n";
I notice that your output mentions dyn.bxt.transcript and dyn.cnl.transcript, which are not in your @dynfiles. It helps if you build a complete, minimal script with sample inputs.
Then you are undefining the input record separator, after which your while loop can only ever read a single "line" containing the entire file. That's typically not what you want.
Next, it looks as if your pattern doesn't match the string Errors : 0, which has a space before the colon. Try:
if ($line =~ m/Errors\s*:\s+0/) {
I'm not sure what you're doing with the do. It returns the last evaluated expression, which in your case is close $fh. Also, the %data hash disappears at the end of each iteration of the loop. Again, strip out everything that isn't part of investigating this problem.
You can reduce the code by using some well-tested modules. For example, using one of my favourites, Path::Tiny, you could write:
use 5.014;
use warnings;
use Path::Tiny;
my @dynfiles = map { "dyn.$_.transcript" } qw(abc def ghi jkl);
say "Dyn run status: ",
(path($_)->slurp =~ /error\s*:\s*0\b/i)
? "PASS"
: "FAIL"
for (grep {-f} @dynfiles);

Print regex match Perl

Hello, I am in the process of writing a program that matches a given set of keywords against a file.
I want to output the matched data to a text file and include the regex keyword that triggered the match.
Below is my code related to my issue:
my $counter = 0;
foreach($words)
{
while($line = <FILE>)
{
if($line =~ /$words/)
{
print "@array[$counter] $line\n";
print OUTPUT $line;
}
}
$counter ++;
}
This does not produce the expected outcome. It works perfectly for the first element in the array but for the rest it just simply prints the first one again. I believe the counter is not being incremented.
Is there a better / easier way to get the current element being used in the loop? or even get the current regex match?
The problem is that <FILE> exhausts the file for the first word. For the next word, <FILE> tries to read at the end of the file, which means the whole loop is skipped.
You can iterate over the words inside the loop over the file, or you can seek back to the beginning of the file (seek FILE, 0, 0) before reading it again for the next word.
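For the second option, a minimal sketch of rewinding with seek before each word. The word list and text are illustrative, and an in-memory filehandle stands in for FILE:

```perl
use strict;
use warnings;

# Sketch of the seek approach: rewind the handle before scanning for each word.
my @words = ('do.', '.at');
my $text  = "dog\ncat\nbird\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @hits;
for my $word (@words) {
    seek $fh, 0, 0 or die "Cannot rewind: $!";   # back to the start for every word
    while (my $line = <$fh>) {
        push @hits, "$word --> $line" if $line =~ /$word/;
    }
}
close $fh;
print @hits;
```

Rewinding rereads the whole file once per word, so for many words or large files the loop-inside-the-file-loop approach is usually the better choice.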
Here is what you should do:
use strict;
use warnings;
use 5.016;
my $fname = 'data.txt';
my @patterns = (
'do.',
'.at',
'.ir.',
);
open my $INFILE, '<', $fname
or die "Couldn't read from $fname: $!";
while (my $line = <$INFILE>) {
    for my $pattern (@patterns) {
        if ($line =~ /($pattern)/) {
            print "$pattern --> $1\n";   # $1 holds only the matched text, so add a newline
        }
    }
}
close $INFILE;
Putting parentheses around parts of the regex causes Perl to set the match variables $1, $2, $3, etc., which contain the text matched by each parenthesized group.
$line will have a newline at the end of the line, so if you write print "$line\n", you will add another newline, so your output file will have blank lines between every line you print.

Perl - can't use string (...) as an array ref

I'm practicing Perl with a challenge from codeeval.com, and I'm getting an unexpected error. The goal is to iterate through a file line by line, where each line has a string and a character separated by a comma, and to find the right-most occurrence of that character in the string. I was getting wrong answers, so I changed the code to print out just the variable values, and that's when I got the following error:
Can't use string ("Hello world") as an ARRAY ref while "strict refs" in use at char_pos.pl line 20, <FILE> line 1.
My code is below. You can see a sample of the input file in the comment header. You can also see the original output code, which was incorrectly displaying only the right-most character of each string.
#CodeEval challenge: https://www.codeeval.com/open_challenges/31/
#Call with $> char_pos.pl numbers
##Hello world, d
##Hola mundo, H
##Keyboard, b
##Connecticut, n
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
open FILE, $path or die $!;
my $len;
while(<FILE>)
{
my @args = split(/,/,$_);
$len = length($args[0]) - 1;
print "$len\n";
for(;$len >= 0; $len--)
{
last if $args[0][$len] == $args[1];
}
#if($len > -1)
#{
# print $len, "\n";
#}else
#{
# print "not found\n";
#}
}
EDIT:
Based on the answers below, here's the code that I got to work:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
open my $fh,"<",shift;
while(my $line = <$fh>)
{
chomp $line;
my @args = split(/,\s*/,$line);   # \s* also drops any space after the comma
my $index = rindex($args[0],$args[1]);
print $index>-1 ? "$index\n" : "Not found\n";
}
close $fh;
It looks like you need to know a bit about Perl functions. Perl has many functions for strings and scalars and it's not always possible to know them all right off the top of your head.
However, Perl has a great function called rindex that does exactly what you want. You give it a string and a substring (in this case, a single character), and it returns the position of the last occurrence of that substring, i.e. it searches from the right side of the string. index does the same thing from the left-hand side.
Since you're learning Perl, it may be a good idea to get a few books on Modern Perl and standard coding practices. This way, you'll learn newer coding techniques as well as the standard coding practices.
Modern Perl - Gives you newer programming help.
Learning Perl - An old standard.
Perl Best Practices - The standard coding practices.
Here's a sample program:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use feature qw(say);
open my $fh, "<", shift;
while ( my $line = <$fh> ) {
    chomp $line;
    my ($string, $char) = split /,\s*/, $line, 2;
    if ( not defined $char or length $char != 1 ) {
        say qq(Invalid line "$line".);
        next;
    }
    my $location = rindex $string, $char;
    if ( $location != -1 ) {
        say qq(The right most "$char" is at position $location in "$string".);
    }
    else {
        say qq(The character "$char" wasn't found in line "$line".);
    }
}
close $fh;
A few suggestions:
use autodie makes your program die automatically when open fails, so there is no need to check it yourself.
The three-argument form of open is now considered de rigueur.
Use scalar variables for filehandles. They're easier to pass into subroutines.
Use lexically scoped variables for loops. Try to avoid using $_.
Always do a chomp after a read.
And most importantly, check for errors! I check the format of the line to make sure there is only a single comma and that the thing I'm searching for is a single character. I also check the return value of rindex to make sure it found the character: if rindex doesn't find it, it returns -1.
Also note that the first character in a string is at position 0, not 1. You may need to adjust for this depending on what output you expect.
Strings in Perl are a basic scalar type, not subscriptable arrays. Use the substr function to get individual characters (which are also just strings) or substrings from them.
Also note that string comparison is done with eq; == is numeric comparison.
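A sketch of what the question's loop looks like with substr and eq instead of array subscripts and ==. rightmost_pos is a hypothetical name, and in practice rindex does the same job in one call:

```perl
use strict;
use warnings;

# Scan from the right with substr, comparing characters with eq (not ==).
# Returns the 0-based position of the right-most $char in $string, or -1.
sub rightmost_pos {
    my ($string, $char) = @_;
    for (my $i = length($string) - 1; $i >= 0; $i--) {
        return $i if substr($string, $i, 1) eq $char;
    }
    return -1;
}

print rightmost_pos('Hello world', 'd'), "\n";   # prints 10
```

Returning -1 when nothing is found mirrors rindex, so the two can be swapped without changing the calling code.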
while($i=<DATA>){
($string,$char)=split(",",$i);
push(@str,$string);}
@join=split("",$_), print "$join[-1]\n",foreach(@str);
__DATA__
Hello world, d
Hola mundo, H
Keyboard, b
Connecticut, n
