Using Parallel::ForkManager in foreach loop - arrays

I am just learning Perl as a fourth language.
My wish is to use Parallel::ForkManager to speed up a foreach loop using an array whose members are taken from a text file.
Basically I am testing a .txt file of URLs, and I want to test multiple members of the array at once (five at a time in this instance) rather than one at a time, without hitting the same URL repeatedly and inadvertently DoSing it.
Would something like this do the trick?
$limit = new Parallel::ForkManager(5);
foreach (@lines) {
    $limit->start and next;
    $lines = $_;
    ... do processing here ...
    $limit->finish;
}
or would it be the equivalent of running that loop 5 times making a small multithreaded DoS script?

It isn't too clear from the documentation, but
A call to start will block in the parent process until there are fewer children running than the limit specified. Then it will return the (non-zero) child PID in the parent, and zero in the child
A child process can see all the data in the parent process as it was when the start was called. The data is presumably copy-on-write, as the child may modify it but the changes aren't reflected in any other process's workspace
The $pm->start and next idiom may seem a little obscure. Essentially it skips the rest of the loop if the start method returns a true value. I prefer something like my $pid = $fm->start; next if $pid; or the if construct in the code below. Both do the same thing, but I think more legibly
I recommend that you experiment with this simpler application, which uses a pool of up to five child processes to print the numbers from zero to nine.
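use strict;
use warnings;
use Parallel::ForkManager;
STDOUT->autoflush;
my $fm = Parallel::ForkManager->new(5);
for my $i (0 .. 9) {
    my $pid = $fm->start;
    if ($pid == 0) {
        # Child process: do the work, then exit via finish
        print "$i\n";
        sleep 2;
        $fm->finish;
    }
}
# Parent: wait for the remaining children before exiting
$fm->wait_all_children;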
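
The second point above (a child sees the parent's data as it was at the time of the fork, and its changes stay private) can be checked without Parallel::ForkManager at all. This is a minimal sketch using Perl's built-in fork; the @items array and its contents are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my @items = (1, 2, 3);

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: fork returned 0. This push changes only the child's copy.
    push @items, 4;
    exit 0;
}

# Parent: fork returned the child's (non-zero) PID.
waitpid($pid, 0);
my $count = scalar @items;
print "parent still sees $count items\n";
```

Each URL in the original loop is therefore handled by exactly one child; the loop body is not run five times per element.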

To test, use a safe local operation such as print or a file write, so that you avoid spamming the URLs. Here's a working snippet from a program I wrote that uses the fork manager.
my $pm = new Parallel::ForkManager(20);
foreach $add (@adds) {
    $pm->start and next;
    # if email is invalid move on
    if (!defined(Email::Valid::Loose->address($add))) {
        writeaddr(*BADADDR, $add);    # address is bad
        $pm->finish;
    }
    # if email is valid get domain name
    $is_valid = Email::Valid::Loose->address($add);
    if ($is_valid =~ m/\@(.*)$/) {
        $host = $1;
    }
    $is_valid = "";
    # perform DNS lookup to check domain
    @mx = mx($resolver, $host);
    if (@mx) {
        writeaddr(*GOODADDR, $add);   # address is good
    } else {
        writeaddr(*BADADDR, $add);    # address is bad
    }
    $pm->finish;
}

Related

Changing an array in Perl

I am trying to use Perl to parse output from a (C-based) program.
Every output line is a (1D) Perl array, which I sometimes want to store (based on certain conditions).
I now wish to (deep) copy an array when its first element has a certain keyword,
and print that same copied array if another keyword matches in a later line-array.
So far, I have attempted the following:
#!/usr/bin/env perl
use strict; # recommended
use Storable qw(dclone);
...
while(1) # loop over the lines
{
    # subsequent calls to tbse_line contain
    # (references to) arrays of data
    my $la = $population->tbse_line();
    my @copy;
    my $header = shift @$la;
    # break out of the loop:
    last if ($header eq 'fin');
    if($header eq 'keyword')
    {
        @copy = @{ dclone \@$la };
    }
    if($header eq 'other_keyword')
    {
        print "second condition met, print first line:\n";
        print "@copy\n";
    }
}
However, this prints an empty line to the screen, instead of the contents of the copied array. I don't have a lot of Perl experience, and I can't figure out what I am doing wrong.
Any idea on how to go about this?
my @copy allocates a new Perl array named @copy in the current scope. It looks like you want to set @copy during one iteration of your while loop and print it in a different iteration. In order for your array not to be erased each time a new while loop iteration starts, you should move the my @copy declaration outside of the loop.
my @copy;
while (1) { ... }
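
As a self-contained illustration of that fix, the sketch below replaces tbse_line() with a hard-coded list of line arrays (the @lines data is invented for the example). Because each line is a flat list of scalars, a plain list copy is assumed to be enough here; dclone would only matter for nested structures.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the arrays returned by tbse_line().
my @lines = (
    [ 'keyword', 1, 2, 3 ],
    [ 'noise', 9 ],
    [ 'other_keyword', 7 ],
    [ 'fin' ],
);

my @copy;       # declared once, so it survives between iterations
my @printed;    # record of what was printed, kept for inspection

for my $la (@lines) {
    my ($header, @rest) = @$la;
    last if $header eq 'fin';
    @copy = @rest if $header eq 'keyword';
    if ($header eq 'other_keyword') {
        print "second condition met, print first line:\n";
        print "@copy\n";
        push @printed, "@copy";
    }
}
```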

Reading data from file into an array to manipulate within Perl script

New to Perl. I need to figure out how to read from a file, separated by (:), into an array. Then I can manipulate the data.
Here is a sample of the file 'serverFile.txt' (Just threw in random #'s)
The fields are Name : CPU Utilization: avgMemory Usage : disk free
Server1:8:6:2225410
Server2:75:68:64392
Server3:95:90:12806
Server4:14:7:1548700
I would like to figure out how to get each field into its appropriate array to then perform functions on. For instance, find the server with the least amount of free disk space.
The way I have it set up now, I do not think will work. So how do I put each element in each line into an array?
#!usr/bin/perl
use warnings;
use diagnostics;
use v5.26.1;
#Opens serverFile.txt or reports an error
open (my $fh, "<", "/root//Perl/serverFile.txt")
or die "System cannot find the file specified. $!";
#Prints out the details of the file format
sub header(){
print "Server ** CPU Util% ** Avg Mem Usage ** Free Disk\n";
print "-------------------------------------------------\n";
}
# Creates our variables
my ($name, $cpuUtil, $avgMemUsage, $diskFree);
my $count = 0;
my $totalMem = 0;
header();
# Loops through the program looking to see if CPU Utilization is greater than 90%
# If it is, it will print out the Server details
while(<$fh>) {
# Puts the file contents into the variables
($name, $cpuUtil, $avgMemUsage, $diskFree) = split(":", $_);
print "$name ** $cpuUtil% ** $avgMemUsage% ** $diskFree% ", "\n\n", if $cpuUtil > 90;
$totalMem = $avgMemUsage + $totalMem;
$count++;
}
print "The average memory usage for all servers is: ", $totalMem / $count. "%\n";
# Closes the file
close $fh;
For this use case, a hash is much better than an array.
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use List::Util qw{ min };
my %server;
while (<>) {
chomp;
my ($name, $cpu_utilization, $avg_memory, $disk_free)
= split /:/;
@{ $server{$name} }{qw{ cpu_utilization avg_memory disk_free }}
= ($cpu_utilization, $avg_memory, $disk_free);
}
my $least_disk = min(map $server{$_}{disk_free}, keys %server);
say for grep $server{$_}{disk_free} == $least_disk, keys %server;
choroba's answer is ideal, but I think your own code could be improved:
Don't use v5.26.1 unless you need a specific feature that is available only in the given version of Perl. Note that it also enables use strict, which should be at the top of every Perl program you write
die "System cannot find the file specified. $!" is wrong: there are multiple reasons why an open may fail, beyond that it "cannot be found". Your die string should include the path to the file you're trying to open; the reason for the failure is in $!
Don't use subroutine prototypes: they don't do what you think they do. sub header() { ... } should be just sub header { ... }
There's no point in declaring a subroutine only to call it a few lines later. Put your code for header in line
You have clearly come from another language. Declare your variables with my as late as possible. In this case only $count and $totalMem must be declared outside the while loop
perl will close all open file handles when the program exits. There is rarely a need for an explicit close call, which just makes your code more noisy
$totalMem = $avgMemUsage + $totalMem is commonly written $totalMem += $avgMemUsage
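
Put together, the suggestions above might look like the following sketch. The sample data is inlined through an in-memory filehandle so that the example runs on its own; the $path label and the @busy array are additions for illustration, and a real script would open the actual path to serverFile.txt instead.

```perl
use strict;
use warnings;

# Sample data inlined for the sketch; normally this comes from serverFile.txt.
my $sample = "Server1:8:6:2225410\nServer2:75:68:64392\n"
           . "Server3:95:90:12806\nServer4:14:7:1548700\n";
my $path = 'in-memory sample';    # hypothetical label for the error message

open(my $fh, '<', \$sample)
    or die "Cannot open $path: $!";

print "Server ** CPU Util% ** Avg Mem Usage ** Free Disk\n";
print "-------------------------------------------------\n";

my $count    = 0;
my $totalMem = 0;
my @busy;    # names of servers above 90% CPU utilization

while (my $line = <$fh>) {
    chomp $line;
    # Variables are declared where they are first needed
    my ($name, $cpuUtil, $avgMemUsage, $diskFree) = split /:/, $line;
    if ($cpuUtil > 90) {
        print "$name ** $cpuUtil% ** $avgMemUsage% ** $diskFree\n";
        push @busy, $name;
    }
    $totalMem += $avgMemUsage;
    $count++;
}

my $average = $count ? $totalMem / $count : 0;
print "The average memory usage for all servers is: $average%\n";
```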
I hope that helps
To your original question about how to store the data in an array...
First, initialize an empty array outside the file read loop:
my @servers = ();
Then, within the loop, after you have your data pieces parsed out, you can store them in your array as sub-arrays (the resulting data structure is a two dimensional array):
$servers[$count] = [ $name, $cpuUtil, $avgMemUsage, $diskFree ];
Note that the square brackets on the right create the sub-array for the server's data pieces and return a reference to this new array. On the left side we use the current value of $count as an index into the @servers array; as the value increases, Perl extends the @servers array automatically. Alternatively, you can push new elements onto the @servers array inside the loop, like this:
push @servers, [ $name, $cpuUtil, $avgMemUsage, $diskFree ];
This way, you explicitly ask for a new element to be added to the array and the square brackets still do the same creation of the sub-array.
In any case, the end result is that after you are finished with the file read loop, you now have a 2D array where you can access the first server and its disk free field (the 4-th field at index 3) like this:
my $df = $servers[0][3];
Or inspect all the servers in a loop to find the minimum disk free:
my $min_s = 0;
for ( my $s = 0; $s < @servers; $s++ ) {
$min_s = $s if ( $servers[$s][3] < $servers[$min_s][3] );
}
print "Server $min_s has least disk free: $servers[$min_s][3]\n";
Like @choroba suggested, you can store the server data pieces/fields in hashes, so that your code will be more readable. You can still store your list of servers in an array, but the second dimension can be a hash:
$servers[$count] = {
name => $name,
cpu_util => $cpuUtil,
avg_mem_usage => $avgMemUsage,
disk_free => $diskFree
};
So, your resulting structure will be an array of hashes. Here, the curly braces on the right create a new hash and return the reference to it. So, you can later refer to:
my $df = $servers[0]{disk_free};
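
For example, the minimum-disk-free search shown earlier could be written against the array of hashes roughly as follows. This is a sketch that hard-codes the sample lines from the question instead of reading the file; the $least variable is introduced here for illustration.

```perl
use strict;
use warnings;

my @servers;
for my $line ('Server1:8:6:2225410', 'Server2:75:68:64392',
              'Server3:95:90:12806', 'Server4:14:7:1548700') {
    my ($name, $cpuUtil, $avgMemUsage, $diskFree) = split /:/, $line;
    # Curly braces build an anonymous hash and return a reference to it
    push @servers, {
        name          => $name,
        cpu_util      => $cpuUtil,
        avg_mem_usage => $avgMemUsage,
        disk_free     => $diskFree,
    };
}

# Track the hash reference of the server with the least free disk
my $least = $servers[0];
for my $server (@servers) {
    $least = $server if $server->{disk_free} < $least->{disk_free};
}
print "$least->{name} has least disk free: $least->{disk_free}\n";
```

Named fields such as disk_free avoid having to remember that free disk space lives at index 3.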

how to get a particular block in an array get copied in perl

I have details like the ones below in an array. In the actual case there will be many testbed entries. I want to grep for a particular testbed (TESTBED = vApp_eprapot_icr) and copy its information, as shown below, to another array. How can I do this in Perl? The end of a testbed's info is marked by a closing curly brace }.
TESTBED = vApp_eprapot_icr {
DEVICE = vApp_eprapot_icr-ipos1
DEVICE = vApp_eprapot_icr-ipos2
DEVICE = vApp_eprapot_icr-ipos3
DEVICE = vApp_eprapot_icr-ipos5
CARDS=1GIGE,ETHFAST
CARDS=3GIGE,ETHFAST
CARDS=10PGIGE,ETHFAST
CARDS=20PGIGE,ETHFAST
CARDS=40PGIGE,ETHFAST
CARDS=ETHFAST,ETHFAST
CARDS=10GIGE,ETHFAST
CARDS=ETH,ETHFAST
CARDS=10P10GIGE,ETHFAST
CARDS=PPA2GIGE,ETHFAST
CARDS=ETH,ETHFAST,ETHGIGE
}
I will make it simpler, please see the below array
@array("
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
");
Consider an array like this, with many such entries. I need to grep for, as an example, student=Akshaya's data: everything from the opening '{' to the closing '}' should get copied to another array. This is what I'm looking for.
while (<>) {
print if /TESTBED = vApp_eprapot_icr/../\}/;
}
As a side note, <> reads from the file named on the command line. So if the data is stored in a file, you can run the script from the command line like this:
perl scriptname.pl filename.txt
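
The same range (flip-flop) operator can also copy the matching block into another array instead of printing it, which is what the question asks for. The sketch below assumes the data is already in @array with one line per element; the shortened student records and the @block name are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical, shortened version of the student data, one line per element.
my @array = (
    "student=Amit {\n",    "Age=20\n", "}\n",
    "student=Akshaya {\n", "Age=24\n", "sex=female\n", "}\n",
    "student=Akash {\n",   "Age=23\n", "}\n",
);

my @block;
for (@array) {
    # True from the line matching the first pattern through the next "}" line
    push @block, $_ if /^student=Akshaya\b/ .. /^\}/;
}
print @block;
```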
Ok. We finally have enough information to come up with an answer. Or, at least, to produce two answers which will work on slightly different versions of your input file.
In a comment you say that you are creating your array like this:
@array = `cat $file`;
That's not a very good idea for a couple of reasons. Firstly, why run an external command like cat when Perl will read the file for you? And secondly, this gives you one element in your array for each line in your input file. Things become far easier if you arrange it so that each of your TESTBED = foo { ... } records is a single array element.
Let's get rid of the cat first. The easiest way to read a single file into an array is to use the file input operator - <>. That will read data from the file whose name is given on the command line. So if you call your program filter_records, you can call it like this:
$ ./filter_records your_input_data.txt
And then read it into an array like this:
@array = <>;
That's good, but we still have each line of the input file in its own array element. How we fix that depends on the exact format of your input file. It's easiest if there's a blank line between each record in the input file, so it looks like this:
student=Amit {
Age=20
sex=male
rollno=201
}

student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}

student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
Perl has a special variable called $/ which controls how it reads records from input files. If we set it to be an empty string then Perl goes into "paragraph" mode and it uses blank lines to delimit records. So we can write code like this:
{
local $/ = '';
    @array = <>;
}
Note that it's always a good idea to localise changes to Perl's special variables, which is why I have enclosed the whole thing in a naked block.
If there are no blank lines, then things get slightly harder. We'll read the whole file in and then split it.
Here's our example file with no blank lines:
student=Amit {
Age=20
sex=male
rollno=201
}
student=Akshaya {
Age=24
phone:88665544
sex=female
rollno=407
}
student=Akash {
Age=23
sex=male
rollno=356
address=na
phone=88456789
}
And here's the code we use to read that data into an array.
{
local $/;
$data = <>;
}
@array = split /(?<=^})\n/m, $data;
This time, we've set $/ to undef, which means that the whole file is read in a single operation. We then split the data wherever we find a newline that is preceded by a } on a line by itself.
Whichever of the two solutions above that we use, we end up with an array which (for our sample data) has three elements - one for each of the records in our data file. It's then simple to use Perl's grep to filter that array in various ways:
# All students whose names start with 'Ak'
@filtered_array = grep { /student=Ak/ } @array;
If you use similar techniques on your original data file, then you can get the records that you are interested in with code like this:
@filtered_array = grep { /TESTBED = vApp_eprapot_icr/ } @array;
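
To try both reading strategies without creating a file, the sketch below feeds shortened sample data through an in-memory filehandle. That is an assumption made for the example; the answer itself reads from <>. The variable names $with_blanks, $without_blanks and @split_records are likewise invented here.

```perl
use strict;
use warnings;

# Records separated by blank lines: paragraph mode applies.
my $with_blanks = "student=Amit {\nAge=20\n}\n\n"
                . "student=Akshaya {\nAge=24\n}\n\n"
                . "student=Akash {\nAge=23\n}\n";
open(my $fh, '<', \$with_blanks) or die "Cannot open in-memory data: $!";

my @array;
{
    local $/ = '';    # paragraph mode: blank lines delimit records
    @array = <$fh>;
}
my @filtered_array = grep { /^student=Ak/ } @array;

# No blank lines: split after each "}" that sits on a line by itself.
my $without_blanks = "student=Amit {\nAge=20\n}\nstudent=Akshaya {\nAge=24\n}\n";
my @split_records  = split /(?<=^\})\n/m, $without_blanks;

print scalar(@array), " records, ", scalar(@filtered_array), " match\n";
print scalar(@split_records), " records from split\n";
```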

Breadth first c

How would I implement a breadth first traversal of a directory(depth unknown and not necessarily symmetrical).
My first thought was to use fork, but I am not sure how to implement it. I was thinking of a loop that would first get the parent(s), then get the number of children of those parents, then fork once per child and chdir into that child, so that, since multiple processes were created, every child directory has a process chdir'd into it. The children would then be returned as the parents for the next round of forking.
I feel like there are possible holes in this, and I am looking for input on possible flaws, or on whether this is a terrible approach. I have heard about people using fork with breadth first, but never found any examples, so if you have any I would gladly look at them.
Your code will look like:
Initialize the todo queue with the base directory.
While the todo queue isn't empty,
    Assign the head of the queue to path.
    Remove the head of the queue.
    If path references a directory,
        Append the paths of the files in path to the todo queue.
    Perform whatever action you want to perform with path.
I don't see why you think fork would help.
For example, actual Perl implementation:
sub dir_contents {
    my ($path) = @_;
    my $dh;
    # Open into the $dh declared above (a second "my" here would shadow it)
    if (!opendir($dh, $path)) {
        warn("Can't open dir \"$path\": $!\n");
        return;
    }
    # Skip only the "." and ".." entries, not every dotfile
    return map { "$path/$_" } grep { !/^\.\.?\z/ } readdir($dh);
}
my @todo = 'some path';
while ( my $path = shift(@todo) ) {
    if (!stat($path)) {
        warn("Can't stat \"$path\": $!\n");
        next;
    }
    # "_" reuses the result of the stat call above
    push @todo, dir_contents($path) if -d _;
    print("$path\n");
}

Perl IO::Pipe does not work within arrays

I'm trying the following:
I want to fork multiple processes and use multiple pipes (child -> parent) simultaneously.
My approach is to use IO::Pipe.
#!/usr/bin/perl
use strict;
use IO::Pipe;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my @ua_processes = (0..9);
my $url = "http://<some-sample-textfile>";
my @ua_pipe;
my @ua_process;
$ua_pipe[0] = IO::Pipe->new();
$ua_process[0] = fork();
if( $ua_process[0] == 0 ) {
my $response = $ua->get($url);
$ua_pipe[0]->writer();
print $ua_pipe[0] $response->decoded_content;
exit 0;
}
$ua_pipe[0]->reader();
while (<$ua_pipe[0]>) {
print $_;
}
In future I want to use multiple "$ua_process"s in an array.
After execution I got the following errors:
Scalar found where operator expected at ./forked.pl line 18, near "] $response"
(Missing operator before $response?)
syntax error at ./forked.pl line 18, near "] $response"
BEGIN not safe after errors--compilation aborted at ./forked.pl line 23.
If I don't use arrays, the same code works perfectly. It seems that only $ua_pipe[0] doesn't work as expected (when combined with an array).
I really don't know why. Does anyone know a solution? Help would be very appreciated!
Your problem is here:
print $ua_pipe[0] $response->decoded_content;
The print and say builtins use the indirect syntax to specify the file handle. This allows only for a single scalar variable or a bareword:
print STDOUT "foo";
or
print $file "foo";
If you want to specify the file handle via a more complex expression, you have to enclose that expression in curlies; this is called a dative block:
print { $ua_pipe[0] } $response->decoded_content;
This should now work fine.
Edit
I overlooked the <$ua_pipe[0]>. The readline operator <> also doubles as the glob operator (i.e. does shell expansion for patterns like *.txt). Here, the same rules as for say and print apply: It'll only use the file handle if it is a bareword or a simple scalar variable. Otherwise, it will be interpreted as a glob pattern (implying stringification of the argument). To disambiguate:
For the readline <>, we have to resort to the readline builtin:
while (readline $ua_pipe[0]) { ... }
To force globbing <>, pass it a string: <"some*.pattern">, or preferably use the glob builtin.
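
Both fixes can be seen together in the following sketch, which keeps several child-to-parent pipes in arrays. Note the swap: it uses Perl's built-in pipe and fork rather than IO::Pipe and LWP::UserAgent so that it runs with core Perl only, and the message text and the names @reader, @writer and @received are invented for the example.

```perl
use strict;
use warnings;

my (@reader, @writer, @pid);

for my $i (0 .. 2) {
    pipe($reader[$i], $writer[$i]) or die "pipe failed: $!";
    $pid[$i] = fork();
    die "fork failed: $!" unless defined $pid[$i];
    if ($pid[$i] == 0) {
        # Child: write to the handle stored in the array element.
        close $reader[$i];
        print { $writer[$i] } "child $i reporting\n";   # block form of print
        close $writer[$i];
        exit 0;
    }
    # Parent: close its copy of the write end so it sees EOF later.
    close $writer[$i];
}

my @received;
for my $i (0 .. 2) {
    # readline() is needed because <$reader[$i]> would be parsed as a glob
    while (defined(my $line = readline($reader[$i]))) {
        push @received, $line;
    }
    waitpid($pid[$i], 0);
}
print @received;
```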
