How to loop through a file and count specific values in Perl?

Let's say I have a file with the lines such as:
*some numbers* :00: *somenumbers*
*somenumbers* :21: *somenumbers*
For every number between the colons, I need to count how many times it occurs in the file.
while (<>){
chomp($_);
my ($nebitno,$bitno,$opetnebitno) = split /:/, $_;
$count{$bitno}++;
}
foreach $bitno(sort keys %count){
print $bitno," ",$count{$bitno}, "\n";
}

What you produced was not bad code — it did the job for a single file at a time. Adapting the code shown in the question to handle multiple files, resetting the counts after each file:
#!/usr/bin/perl
use strict;
use warnings;
my %count = ();
while (<>) {
my ($nebitno, $bitno, $opetnebitno) = split /:/, $_;
$count{$bitno}++;
}
continue
{
if (eof) {
print "$ARGV:\n";
foreach my $bitno (sort keys %count) {
print "$bitno $count{$bitno}\n";
}
%count = ();
}
}
The key here is the continue block and the if (eof) test. You can use close ARGV in a continue block to reset $. (the line number) when the file changes; that is a common use for it. This sort of per-file summary is another use. The other changes are cosmetic. You don't need to chomp the line (though there's no particular harm done if you do); I print whole strings rather than using comma-separated lists (it works well here and very often). I use a few more spaces. I left it with the 1TBS format for the blocks of code, though I don't use that myself (I use Allman).
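The line-number reset mentioned here can be sketched as follows. This is only an illustration, not part of the answer's script: label_lines is a hypothetical helper, and the idiom shown closes the ARGV filehandle (the name of the current file is in $ARGV).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a "file:line" label for every line read,
# with the line number restarting at 1 for each file.
sub label_lines {
    my @files = @_;
    local @ARGV = @files;      # files for the diamond operator to read
    my @labels;
    while (my $line = <>) {
        push @labels, "$ARGV:$.";
    }
    continue {
        close ARGV if eof;     # end of the current file: resets $.
    }
    return @labels;
}

# Only runs when file names are given on the command line.
if (@ARGV) {
    print "$_\n" for label_lines(@ARGV);
}
```

Without the close, $. would keep counting across files, so the second file's first line would not be labelled 1.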
My draft solution used practically the same printing code as shown above, but the main while loop was slightly different:
#!/usr/bin/env perl
use strict;
use warnings;
my %counts = ();
while (<>)
{
$counts{$1}++ if (m/.*:(\d+):/);
}
continue
{
if (eof)
{
print "$ARGV:\n";
foreach my $number (sort { $a <=> $b } keys %counts)
{
print ":$number: $counts{$number}\n"
}
%counts = ();
}
}
The only advantage over what you used is that if some line doesn't contain a colon-surrounded number, it ignores the line, whereas yours doesn't consider that possibility. I'm not sure the comparison code in the sort is necessary — it ensures that the comparisons are numeric, though. If the numbers are all the same length and zero-padded on the left when necessary, there's no problem. If they're more generally formatted, the 'forced numeric' comparison might make a difference.
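To see when the numeric comparison matters, here is a small sketch. The keys are made-up values of different lengths, not data from the question.

```perl
use strict;
use warnings;

# Made-up keys of varying length, as they might come out of the hash.
my @keys = qw(9 10 100 21);

my @as_strings = sort @keys;                 # default sort compares as strings
my @as_numbers = sort { $a <=> $b } @keys;   # forced numeric comparison

print "string order:  @as_strings\n";   # 10 100 21 9
print "numeric order: @as_numbers\n";   # 9 10 21 100
```

With zero-padded keys of equal length (00, 09, 21) both orders come out the same, which is why the comparison block is optional for that kind of data.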
Remember: this is Perl, so TMTOWTDI (There's More Than One Way To Do It). Someone else might come up with a simpler solution.

The desired output can be achieved with the following code snippet, which works in three steps:
look for the pattern :\d+: in each line
increment the %count entry for the captured number
output the result to the console
use strict;
use warnings;
use feature 'say';
my %count;
/:(\d+):/ && $count{$1}++ for <>;
say "$_ = $count{$_}" for sort keys %count;

Related

Comparison of two arrays in perl

I am trying to compare the content of two arrays and I need the final output as "Matched" or "Not Matched"
I have written the below code and it is giving the expected output. However, can anyone suggest me any other simple way of doing it
#!/usr/bin/perl
use strict;
use warnings;
#Numeric scalar
my @array_1= (10,20,40,19);
my @array_2= (10,30,23,19);
print "@array_1\n";
my $count=0;
while ($count < scalar @array_1){
for (@array_2) {
if ($array_1[$count] == $array_2[$count]) {
print "matched\n";
$count++;}
else {
print "Not matched\n";
$count++;
}
}
}
The above solution is good. You can also use the Array::Compare module: https://metacpan.org/pod/Array::Compare
Array::Compare - Perl extension for comparing arrays. If you have two arrays and you want to know if they are the same or different, then Array::Compare will be useful to you.
All comparisons are carried out via a comparator object.
use strict;
use warnings;
use Array::Compare;
my @array_1= (10,20,40,19);
my @array_2= (10,30,23,19,66);
my $comp = Array::Compare->new;
if ($comp->compare(\@array_1, \@array_2)) {
print "Arrays are the same (Matched)\n";
} else {
print "Arrays are different (Not Matched)\n";
}
Output
Arrays are different (Not Matched)
It's easy to write down all the conditions that don't match first, and then show it as a match at the end.
#!/usr/bin/perl
use strict;
use warnings;
my @array_1 = (10,20,40,19);
my @array_2 = (10,30,23,19);
if (scalar @array_1 != scalar @array_2) {
print "Not matched\n";
exit 0;
}
while (my ($index, $elem) = each @array_1) {
if ($elem != $array_2[$index]) {
print "Not matched\n";
exit 0;
}
}
print "matched\n";
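The same "rule out mismatches first" idea can also be written as a subroutine that returns a true or false value instead of exiting. This is a sketch only; arrays_match is a hypothetical name, and it assumes numeric elements, as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both arrays have the same length and
# hold the same numbers at every index.
sub arrays_match {
    my ($first, $second) = @_;          # two array references
    return 0 if @$first != @$second;    # different lengths can never match
    for my $i (0 .. $#$first) {
        return 0 if $first->[$i] != $second->[$i];   # first mismatch decides
    }
    return 1;                           # nothing disagreed
}

my @array_1 = (10, 20, 40, 19);
my @array_2 = (10, 30, 23, 19);
print arrays_match(\@array_1, \@array_2) ? "matched\n" : "Not matched\n";
```

For string elements, the comparison would need ne instead of !=.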

Reading data from file into an array to manipulate within Perl script

New to Perl. I need to figure out how to read from a file, separated by (:), into an array. Then I can manipulate the data.
Here is a sample of the file 'serverFile.txt' (Just threw in random #'s)
The fields are Name : CPU Utilization: avgMemory Usage : disk free
Server1:8:6:2225410
Server2:75:68:64392
Server3:95:90:12806
Server4:14:7:1548700
I would like to figure out how to get each field into its appropriate array to then perform functions on. For instance, find the server with the least amount of free disk space.
The way I have it set up now, I do not think will work. So how do I put each element in each line into an array?
#!usr/bin/perl
use warnings;
use diagnostics;
use v5.26.1;
#Opens serverFile.txt or reports an error
open (my $fh, "<", "/root//Perl/serverFile.txt")
or die "System cannot find the file specified. $!";
#Prints out the details of the file format
sub header(){
print "Server ** CPU Util% ** Avg Mem Usage ** Free Disk\n";
print "-------------------------------------------------\n";
}
# Creates our variables
my ($name, $cpuUtil, $avgMemUsage, $diskFree);
my $count = 0;
my $totalMem = 0;
header();
# Loops through the program looking to see if CPU Utilization is greater than 90%
# If it is, it will print out the Server details
while(<$fh>) {
# Puts the file contents into the variables
($name, $cpuUtil, $avgMemUsage, $diskFree) = split(":", $_);
print "$name ** $cpuUtil% ** $avgMemUsage% ** $diskFree% ", "\n\n", if $cpuUtil > 90;
$totalMem = $avgMemUsage + $totalMem;
$count++;
}
print "The average memory usage for all servers is: ", $totalMem / $count. "%\n";
# Closes the file
close $fh;
For this use case, a hash is much better than an array.
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use List::Util qw{ min };
my %server;
while (<>) {
chomp;
my ($name, $cpu_utilization, $avg_memory, $disk_free)
= split /:/;
@{ $server{$name} }{qw{ cpu_utilization avg_memory disk_free }}
= ($cpu_utilization, $avg_memory, $disk_free);
}
my $least_disk = min(map $server{$_}{disk_free}, keys %server);
say for grep $server{$_}{disk_free} == $least_disk, keys %server;
choroba's answer is ideal, but I think your own code could be improved:
Don't use v5.26.1 unless you need a specific feature that is available only in the given version of Perl. Note that it also enables use strict, which should be at the top of every Perl program you write
die "System cannot find the file specified. $!" is wrong: there are multiple reasons why an open may fail, beyond that it "cannot be found". Your die string should include the path to the file you're trying to open; the reason for the failure is in $!
Don't use subroutine prototypes: they don't do what you think they do. sub header() { ... } should be just sub header { ... }
There's no point in declaring a subroutine only to call it a few lines later. Put your code for header in line
You have clearly come from another language. Declare your variables with my as late as possible. In this case only $count and $totalMem must be declared outside the while loop
perl will close all open file handles when the program exits. There is rarely a need for an explicit close call, which just makes your code more noisy
$totalMem = $avgMemUsage + $totalMem is commonly written $totalMem += $avgMemUsage
I hope that helps
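Put together, the suggestions above might look something like this. It is a sketch under one assumption that is not in the question: the sample lines are held in a string and read through an in-memory filehandle, so that the code runs without serverFile.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: sample data from the question, kept in a string.
# With a real file, the open would be:
#   open my $fh, '<', $path or die "Cannot open '$path': $!";
my $data = "Server1:8:6:2225410\nServer2:75:68:64392\n"
         . "Server3:95:90:12806\nServer4:14:7:1548700\n";
open my $fh, '<', \$data or die "Cannot open in-memory sample data: $!";

# Header printed in line rather than through a one-off subroutine.
print "Server ** CPU Util% ** Avg Mem Usage ** Free Disk\n";
print "-------------------------------------------------\n";

my $count    = 0;
my $totalMem = 0;
while (my $line = <$fh>) {
    chomp $line;
    # Declared as late as possible: these exist only inside the loop.
    my ($name, $cpuUtil, $avgMemUsage, $diskFree) = split /:/, $line;
    print "$name ** $cpuUtil% ** $avgMemUsage% ** $diskFree\n" if $cpuUtil > 90;
    $totalMem += $avgMemUsage;
    $count++;
}

# Guard against an empty file before dividing.
my $average = $count ? $totalMem / $count : 0;
print "The average memory usage for all servers is: $average%\n";
```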
To your original question about how to store the data in an array...
First, initialize an empty array outside the file read loop:
my @servers = ();
Then, within the loop, after you have your data pieces parsed out, you can store them in your array as sub-arrays (the resulting data structure is a two dimensional array):
$servers[$count] = [ $name, $cpuUtil, $avgMemUsage, $diskFree ];
Note, the square brackets on the right create the sub-array for the server's data pieces and return a reference to this new array. Also, on the left side we just use the current value of $count as an index within the @servers array and as the value increases, the size of the @servers array will grow automatically (this is called autovivification of new elements). Alternatively, you can push new elements onto the @servers array inside the loop, like this:
push @servers, [ $name, $cpuUtil, $avgMemUsage, $diskFree ];
This way, you explicitly ask for a new element to be added to the array and the square brackets still do the same creation of the sub-array.
In any case, the end result is that after you are finished with the file read loop, you now have a 2D array where you can access the first server and its disk free field (the 4-th field at index 3) like this:
my $df = $servers[0][3];
Or inspect all the servers in a loop to find the minimum disk free:
my $min_s = 0;
for ( my $s = 0; $s < @servers; $s++ ) {
$min_s = $s if ( $servers[$s][3] < $servers[$min_s][3] );
}
print "Server $min_s has least disk free: $servers[$min_s][3]\n";
Like @choroba suggested, you can store the server data pieces/fields in hashes, so that your code will be more readable. You can still store your list of servers in an array but the second dimension can be hash:
$servers[$count] = {
name => $name,
cpu_util => $cpuUtil,
avg_mem_usage => $avgMemUsage,
disk_free => $diskFree
};
So, your resulting structure will be an array of hashes. Here, the curly braces on the right create a new hash and return the reference to it. So, you can later refer to:
my $df = $servers[0]{disk_free};
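As a rough end-to-end sketch of the array-of-hashes approach, the following builds the structure and finds the server with the least free disk. The sample lines are the ones from the question, kept in a list here (an assumption, so the code runs without the file).

```perl
use strict;
use warnings;

# Sample lines from the question, held in a list instead of read from a file.
my @lines = ("Server1:8:6:2225410", "Server2:75:68:64392",
             "Server3:95:90:12806", "Server4:14:7:1548700");

my @servers;
for my $line (@lines) {
    my ($name, $cpuUtil, $avgMemUsage, $diskFree) = split /:/, $line;
    # Curly braces build an anonymous hash; push stores the reference to it.
    push @servers, {
        name          => $name,
        cpu_util      => $cpuUtil,
        avg_mem_usage => $avgMemUsage,
        disk_free     => $diskFree,
    };
}

# Keep the hash with the smallest disk_free seen so far.
my $least = $servers[0];
for my $server (@servers) {
    $least = $server if $server->{disk_free} < $least->{disk_free};
}
print "$least->{name} has least disk free: $least->{disk_free}\n";
```

Because each element is a hash, the field is addressed by name (disk_free) rather than by position (index 3).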

Split an array into two. Predetermined sizes, might not always be equal

I need to iterate over many files in a directory and split each file into two parts. I need to keep lines intact (I can't split on byte size). I also can't always assume that the file has an even number of lines. I could use the "split" command, but I am looking for a faster way of going through my files, and I want to avoid the standard output names "xaa" and "xab" that it generates.
The easiest would be to make two subsequent substrings of an array in the sizes specified ($number_of_group_one and $number_of_group_two). I can't find out how to do this. Instead I am trying to push the lines into different arrays- filling one up until a certain number of lines and then "spill over" into the other array until there are no more lines left to push. However, this approach yields two output arrays that both have exactly double the number of input lines. Here is my code:
#!/usr/bin/perl
use warnings;
use strict;
my ($directory) = @ARGV;
my $dir = "$directory";
my @arrayoffiles = glob "$dir/*";
my @arrayoflines_one;
my @arrayoflines_two;
my $counter = 0;
foreach my $filename(@arrayoffiles){
my @arrayoflines_one;
my @arrayoflines_two;
my @lines = read_lines($filename);
my $NumberofLines = @lines;
my $number_of_group_one = int($NumberofLines/2);
my $number_of_group_two = ($NumberofLines - $number_of_group_one);
foreach my $line (@lines){
$counter++;
push (@arrayoflines_one, $line, "\n");
if ($counter == $number_of_group_one){
push (@arrayoflines_two, $line, "\n");
}
}
}
sub read_lines {
my ($file) = @_;
open my $in, '<', $file or die $!;
local $/ = undef; #slurps the whole file in as one
my $content = <$in>;
return split /\s/, $content;
close $in;
}
I hope this is clear. Thanks for your help!
This is a good use case for splice:
my #lines = read_lines($filename);
my @lines1 = splice @lines, 0, @lines/2;
will put (about) half of your lines from @lines into @lines1, removing them (and leaving about half of the lines) from @lines.
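A small self-contained sketch of the splice approach, using an odd number of made-up lines to show where the extra one ends up. split_in_two is a hypothetical helper name, and int() is added here only to make the rounding explicit.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of lines into a first half and the rest.
# Returns two array references; the second gets the extra line when the
# count is odd.
sub split_in_two {
    my @lines = @_;
    my @first = splice @lines, 0, int(@lines / 2);   # removes them from @lines
    return (\@first, \@lines);
}

my ($one, $two) = split_in_two(qw(a b c d e));
print "first:  @$one\n";   # a b
print "second: @$two\n";   # c d e
```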

Perl - can't use string (...) as an array ref

I'm practicing Perl with a challenge from codeeval.com, and I'm getting an unexpected error. The goal is to iterate through a file line-by-line, in which each line has a string and a character separated by a comma, and to find the right-most occurrence of that character in the string. I was getting wrong answers back, so I altered the code to print out just variable values, when I got the following error:
Can't use string ("Hello world") as an ARRAY ref while "strict refs" in use at char_pos.pl line 20, <FILE> line 1.
My code is below. You can see a sample from the file in the header. You can also see the original output code, which was incorrectly only displaying the right-most character in each string.
#CodeEval challenge: https://www.codeeval.com/open_challenges/31/
#Call with $> char_pos.pl numbers
##Hello world, d
##Hola mundo, H
##Keyboard, b
##Connecticut, n
#!/usr/bin/perl
use strict;
use warnings;
my $path = $ARGV[0];
open FILE, $path or die $!;
my $len;
while(<FILE>)
{
my @args = split(/,/,$_);
$len = length($args[0]) - 1;
print "$len\n";
for(;$len >= 0; $len--)
{
last if $args[0][$len] == $args[1];
}
#if($len > -1)
#{
# print $len, "\n";
#}else
#{
# print "not found\n";
#}
}
EDIT:
Based on the answers below, here's the code that I got to work:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
open my $fh,"<",shift;
while(my $line = <$fh>)
{
chomp $line;
my @args = split(/,/,$line);
my $index = rindex($args[0],$args[1]);
print $index>-1 ? "$index\n" : "Not found\n";
}
close $fh;
It looks like you need to know a bit about Perl functions. Perl has many functions for strings and scalars and it's not always possible to know them all right off the top of your head.
However, Perl has a great function called rindex that does exactly what you want. You give it a string, a substring (in this case, a single character), and it looks for the first position of that substring from the right side of the string (the index does the same thing from the left hand side.)
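A quick illustration of both functions on one of the sample strings (the search characters are chosen here just for the demonstration):

```perl
use strict;
use warnings;

my $string = "Hello world";

print rindex($string, "o"), "\n";   # 7: last "o", counting from 0
print index($string, "o"), "\n";    # 4: first "o"
print rindex($string, "z"), "\n";   # -1: not found
```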
Since you're learning Perl, it may be a good idea to get a few books on Modern Perl and standard coding practices. This way, you know newer coding techniques and the standard coding practices.
Modern Perl - Gives you newer programming help.
Learning Perl - An old standard.
Perl Best Practices - The standard coding practices.
Here's a sample program:
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
use feature qw(say);
open my $fh, "<", shift;
while ( my $line = <$fh> ) {
chomp $line;
my ($string, $char) = split /,/, $line, 2;
if ( length $char != 1 or not defined $string ) {
say qq(Invalid line "$line".);
next;
}
my $location = rindex $string, $char;
if ( $location != -1 ) {
say qq(The right most "$char" is at position $location in "$string".);
}
else {
say qq(The character "$char" wasn't found in line "$line".);
}
}
close $fh;
A few suggestions:
use autodie allows your program to automatically die on bad open. No need to check.
Three parameter open statement is now considered de rigueur.
Use scalar variables for file handles. They're easier to pass into subroutines.
Use lexically scoped variables for loops. Try to avoid using $_.
Always do a chomp after a read.
And most importantly, error check! I check the format of the line to make sure that there is only a single comma, and that the character I'm searching for is a single character. I also check the return value of rindex to make sure it found the character. If rindex doesn't find the character, it returns -1.
Also know that the first character in a line is 0 and not 1. You may need to adjust for this depending what output you're expecting.
Strings in perl are a basic type, not subscriptable arrays. You would use the substr function to get individual characters (which are also just strings) or substrings from them.
Also note that string comparison is done with eq; == is numeric comparison.
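For illustration, the loop from the question could be written with substr and eq roughly as follows. rightmost_index is a hypothetical name; in practice rindex does the same job in one call.

```perl
use strict;
use warnings;

# Hypothetical helper: index of the right-most occurrence of $char in
# $string, or -1 if it does not occur.
sub rightmost_index {
    my ($string, $char) = @_;
    for (my $pos = length($string) - 1; $pos >= 0; $pos--) {
        # substr(STRING, OFFSET, 1) is the single character at OFFSET;
        # eq compares it as a string, where == would compare numerically.
        return $pos if substr($string, $pos, 1) eq $char;
    }
    return -1;
}

print rightmost_index("Hello world", "d"), "\n";   # 10
print rightmost_index("Hola mundo", "H"), "\n";    # 0
```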
while($i=<DATA>){
($string,$char)=split(",",$i);
push(@str,$string);}
@join=split("",$_), print "$join[-1]\n",foreach(@str);
__DATA__
Hello world, d
Hola mundo, H
Keyboard, b
Connecticut, n

Perl script problems

The purpose of the script is to process all words from a file and output ALL words that occur the most. So if there are 3 words that each occur 10 times, the program should output all the words.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (i.e. the New Testament). I'm not sure if that is a fault of mine or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
#Make sure the argument is actually a file
if (-f $ARGV[0]){
%wordHash = (); #New hash to match words with word counts
$file=$ARGV[0]; #Stores value of argument
open(FILE, $file) or die "File not opened correctly.";
#Process through each line of the file
while (<FILE>){
chomp;
#Delimits on any non-alphanumeric
@words=split(/[^a-zA-Z0-9]/,$_);
$wordSize = @words;
#Put all words to lowercase, removes case sensitivty
for($x=0; $x<$wordSize; $x++){
$words[$x]=lc($words[$x]);
}
#Puts each occurence of word into hash
foreach $word(@words){
$wordHash{$word}++;
}
}
close FILE;
#$wordHash{$b} <=> $wordHash{$a};
$wordList="";
$max=0;
while (($key, $value) = each(%wordHash)){
if($value>$max){
$max=$value;
}
}
while (($key, $value) = each(%wordHash)){
if($value==$max && $key ne "s"){
$wordList.=" " . $key;
}
}
#Print solution
print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
}
else {
print "Error. Your argument is not a file.\n";
}
}
else {
print "Error. Use exactly one argument.\n";
}
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array indexes can only be numbers, and since your keys are words, that won't work. What happens here is that any string used as an index is coerced into a number, and any string that does not begin with a number becomes 0.
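The coercion can be seen in a small experiment. The strings below are arbitrary examples, and the numeric warning is switched off locally only so the demonstration runs quietly.

```perl
use strict;
use warnings;

my @list;
{
    no warnings 'numeric';   # silence "Argument ... isn't numeric" for the demo
    $list["make"]  = 3;      # "make" is treated as 0, so this sets $list[0]
    $list["other"] = 5;      # also index 0: overwrites the previous value
    $list["2nd"]   = 7;      # leading digits count: this is index 2
}

print scalar(@list), "\n";   # 3 elements: indexes 0, 1 (undef) and 2
print "$list[0]\n";          # 5
```

So with word keys, nearly every assignment lands on index 0 and the counts overwrite each other.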
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
chomp;
my @words = grep $_, # disallow empty strings
map lc, # make everything lower case
split /[^a-zA-Z0-9]/; # your original split
foreach my $word (@words) {
$wordHash{$word}++;
}
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = @ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
chomp $line;
my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
$words{ $_ } += 1 for @tokens;
}
close $input;
my $max = max values %words;
my @argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (@argmax) {
printf "%s: %d\n", $word, $max;
}
Why not just get the keys from the hash sorted by their value and extract the first X?
This should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
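A minimal sketch of that idea, with made-up counts. top_words is a hypothetical helper; note that cutting off at a fixed X can drop words that are tied with the last one kept, which matters if all the most frequent words are wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: the $n most frequent words, most frequent first.
# Ties are ordered alphabetically.
sub top_words {
    my ($counts, $n) = @_;   # hash reference of word => count
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                 keys %$counts;
    $n = @sorted if $n > @sorted;   # never ask for more than there are
    return @sorted[0 .. $n - 1];
}

# Made-up counts, not taken from any real text.
my %wordHash = (the => 10, and => 10, of => 7, a => 3);
print join(" ", top_words(\%wordHash, 2)), "\n";   # and the
```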
