Comparing two associative arrays - arrays

I started learning Perl last week.
I have an associative array from a file containing 'tokens' - Just a bunch of numbers.
I have another associative array from an SQL Database containing 'tokens'.
I'm wanting to see if any tokens in the file are NOT in the database. However anything I do doesn't seem to work and I've come to the conclusion that I'm just confusing myself.
I'm not sure I fully understand associative arrays yet but this is a snippet of my code for the file hash:
while($row = <FILE>){
if($row =~ /^000\E/){
#tmp=split(/\s+/,$row);
if($tmp[1] ne "Unassigned"){
$tokenfile{$tmp[0]} = $tmp[1] . " " . $tmp[2];
}
}
}
$tmp[1] + $tmp[2] are the first and second names. I compare names later on to see if they equal each other. However I want to compare $tmp[0] - The token. This is the SQL hash:
while(#rows = $sth->fetchrow_array){
($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell) = getpwnam("\L$rows[1]\E");
$gcos =~ s/,.*//;
if(!defined($gcos)){
$missing++;
$tokendb{$rows[0]} = $rows[1];
}
else{
$tokendb{$rows[0]} = $gcos;
}
}
$rows[0] is the token.
I assumed I would use two foreach loops such as this:
foreach $token (keys(%tokendb)) {
foreach $token2(keys(%tokenfile)){
if($token ne $token2){
print "$token2 NOT IN DATABASE\n";
}
}
}
But that gives me the result of a lot of values that are still in the database.
I'd love some hints as to why this isn't working. Very frustrating as I know it's something so simple but my brain isn't working so well today (Even though it's my 21st Birthday :|).

foreach $token (keys(%tokenfile)) {
if (! exists $tokendb{$token}) {
print "$token NOT IN DATABASE\n";
}
}
Your nested loop failed because even if a key exists, it doesn't match all the other keys. To do it with a nested loop, it should be:
foreach $token (keys(%tokenfile)) {
$found = 0;
foreach $token2 (keys(%tokendb)) {
if ($token eq $token2) {
$found = 1;
last;
}
}
if (!found) {
print "$token NOT IN DATABASE\n";
}
}
Of course, there's no reason to write it this way, this is just to help you understand how your logic failed.

If you're iterating over a hash and testing every key individually to see whether one of them is a target value, then you're not taking advantage of the power of hashes: Lookups. Try something like
foreach $token (keys(%tokenfile)) {
unless (exists $tokendb{$token}) {
print "$token NOT IN DATABASE\n";
}
}
instead.

Related

How to loop through file and count specific values in perl?

Let's say I have a file with the lines such as:
*some numbers* :00: *somenumbers*
*somenumbers* :21: *somenumbers*
And for every number between :: I need to count how many times it repeats in the file?
while (<>){
chomp($_);
my ($nebitno,$bitno,$opetnebitno) = split /:/, $_;
$count{$bitno}++;
}
foreach $bitno(sort keys %count){
print $bitno," ",$count{bitno}, "\n";
}
What you produced was not bad code — it did the job for a single file at a time. Adapting the code shown in the question to handle multiple files, resetting the counts after each file:
#!/usr/bin/perl
use strict;
use warnings;
my %count = ();
while (<>) {
my ($nebitno, $bitno, $opetnebitno) = split /:/, $_;
$count{$bitno}++;
}
continue
{
if (eof) {
print "$ARGV:\n";
foreach $bitno (sort keys %count) {
print "$bitno $count{bitno}\n";
}
%count = ();
}
}
The key here is the continue block, and the if (eof) test. You can use close $ARGV in a continue block to reset $. (the line number) when the file changes; it is a common use for it. This sort of per-file summary is another use. The other changes are cosmetic. You don't need to chomp the line (though there's no particular harm done if you do); I print whole strings rather than using comma-separated lists (it works well here and very often). I use a few more spaces. I left it with the 1TBS format for the blocks of code, though I don't use that myself (I use Allman).
My draft solution used practically the same printing code as shown above, but the main while loop was slightly different:
#!/usr/bin/env perl
use strict;
use warnings;
my %counts = ();
while (<>)
{
$counts{$1}++ if (m/.*:(\d+):/);
}
continue
{
if (eof)
{
print "$ARGV:\n";
foreach my $number (sort { $a <=> $b } keys %counts)
{
print ":$number: $counts{$number}\n"
}
%counts = ();
}
}
The only advantage over what you used is that if some line doesn't contain a colon-surrounded number, it ignores the line, whereas yours doesn't consider that possibility. I'm not sure the comparison code in the sort is necessary — it ensures that the comparisons are numeric, though. If the numbers are all the same length and zero-padded on the left when necessary, there's no problem. If they're more generally formatted, the 'forced numeric' comparison might make a difference.
Remember: this is Perl, so TMTOWDTI (There's More Than One Way To Do It). Someone else might come up with a simpler solution.
Desired output can be achieved with following code snippet
look for pattern :\d+: in a line
increment hash %count for the digit
output result to console
use strict;
use warnings;
use feature 'say';
my %count;
/:(\d+):/ && $count{$1}++ for <>;
say "$_ = $count{$_}" for sort keys %count;

Perl Program to Count Two Character Frequencies

I am trying to find two character strings in a text file and print them and their frequencies out.
#!/usr/bin/perl
#digram finder
use strict; use warnings;
#finds digrams in a file and prints them and their frequencies out
die "Must input file\n" if (#ARGV != 1);
my ($file) = #ARGV;
my %wordcount;
open (my $in, "<$file") or die "Can't open $file\n";
while (my $words = <$in>){
chomp $words;
my $length = length($words);
for (my $i = 0; $i<$length; $i++){
my $duo = substr($words, $i; 2);
if (not exists $wordcount{$duo}){
$wordcount{$duo} = 1;
}
else {
$wordcount{$duo}++;
}
}
}
foreach my $word (sort {$wordcount{$b} cmp $wordcount{$a}} keys %wordcount){
print "$word\t$wordcount{$duo}\n";
}
close($in);
First I set the text file to a string $words.
Then, I run a for loop and create a substring $duo at each position along $words
If $duo doesn't exist within the hash %wordcount, then the program creates the key $duo
If $duo does exist, then the count for that key goes up by 1
Then the program prints out the digrams and their frequencies, in order of decreasing frequency
When I try to run the code, I get the error message that I forgot to declare $word on line 17 but I do not even have the string $word. I am not sure where this error message is coming from. Can someone help me find where the error is coming from?
Thank you
My best guess is that you actually have $word instead of $words; a typo. If the compilation found the symbol $word in the text then it's probably there.
However, I'd also like to comment on the code. A cleaned up version
while (my $words = <$in>) {
chomp $words;
my $last_duo_idx = length($words) - 2;
for my $i (0 .. $last_duo_idx) {
my $duo = substr($words, $i, 2);
++$wordcount{$duo};
}
}
my #skeys = sort { $wordcount{$b} <=> $wordcount{$a} } keys %wordcount;
foreach my $word (#skeys) {
print "$word\t$wordcount{$word}\n";
}
This runs correctly on a made-up file. (I sort separately only so to not run off of the page.)
Comments
Need to stop one before last in the line, and substr starts from 0; thus -2
One almost never needs a C-style loop
There is no need here to test for existence of a key. If it doesn't exist it is autovivified (created), then incremented to 1 with ++; otherwise the count is incremented.
To sort numerically use <=>, not cmp
Typos:
substr($words, $i; 2) needs a , not ;, so substr($words, $i, 2)
$wordcount{$duo} in print should be $wordcount{$word}.
I am not sure about naming: why is a line of text called $words?

Working with Email::MIME and byte strings

I'm a perl novice and trying to process the lines of an email using Email::MIME. I've managed to isolate the individual parts of the message, but the body parts appear to be available as "byte strings", which appear to be just one big string, and not a series of lines.
my $parser = MIME::Parser->new;
$parser->extract_uuencode(1);
$parser->extract_nested_messages(1);
$parser->output_to_core(0);
my $buf;
while(<STDIN> ){
$buf .= $_;
}
my $entity = $parser->parse_data($buf);
my #mailData;
my $msg = Email::MIME->new($buf);
foreach my $part ( $msg->subparts ) {
foreach my $sub_part ($part->subparts) {
print $sub_part->body;
}
}
How can I write the $sub_part->body into an array such that I can process it line-by-line outside of those foreach() loops? I'd like to be able to do something like:
foreach my $line (#mailData) {
print $line;
}

Perl Modification of non creatable array value attempted, subscript -1

I have a Perl-Script, which executes a recursive function. Within it compares two elements of a 2dimensional Array:
I call the routine with a 2D-Array "#data" and "0" as a starting value. First I load the parameters into a separate 2D-Array "#test"
Then I want to see, if the array contains only one Element --> Compare if the last Element == the first. And this is where the Error occurs: Modification of non creatable array value attempted, subscript -1.
You tried to make an array value spring into existence, and the subscript was probably negative, even counting from end of the array backwards.
This didn't help me much...I'm pretty sure it has to do with the if-clause "$counter-1". But I don't know what, hope you guys can help me!
routine(#data,0);
sub routine {
my #test #(2d-Array)
my $counter = $_[-1]
for(my $c=0; $_[$c] ne $_[-1]; $c++){
for (my $j=0; $j<13;$j++){ #Each element has 13 other elements
$test[$c][$j] = $_[$c][$j];
}
}
if ($test[$counter-1][1] eq $test[-1][1]{
$puffertime = $test[$counter][4];
}
else{
for (my $l=0; $l<=$counter;$l++){
$puffertime+= $test[$l][4]
}
}
}
#
#
#
if ($puffertime <90){
if($test[$counter][8]==0){
$counter++;
routine(#test,$counter);
}
else{ return (print"false");}
}
else{return (print "true");}
Weird thing is that I tried it out this morning, and it worked. After a short time of running he again came up with this error message. Might be that I didn't catch up a error constellation, which could happen by the dynamic database-entries.
Your routine() function would be easier to read if it starts off like this:
sub routine {
my #data = #_;
my $counter = pop(#data);
my #test;
for(my $c=0; $c <= $#data; $c++){
for (my $j=0; $j<13;$j++){ #Each element has 13 other elements
$test[$c][$j] = $data[$c][$j];
}
}
You can check to see if #data only has one element by doing scalar(#data) == 1 or $#data == 0. From your code snippet, I do not see why you need to copy the data to passed to routine() to #test. Seems superfluous. You can just as well skip all this copying if you are not going to modify any of the data passed to your routine.
Your next code might look like this:
if ($#test == 0) {
$puffertime = $test[0][4];
} else {
for (my $l=0; $l <= $counter; $l++) {
$puffertime += $test[$l][4];
}
}
But if your global variable $puffertime was initialized to zero then you can replace this code with:
for (my $l=0; $l <= $counter; $l++) {
$puffertime += $test[$l][4];
}

Why does substr work differently when passed directly into a method?

I've a question about perl that I used to not bother about in the past, but it's bugging me now.
I have a method call saveItems which takes in a value from a text log and parses the input.
so I have this few lines in the method.
$intime = $_[1];
$timeHr = substr($intime, 0,2);
$timeMin = substr($intime, 2,2);
$timeSec = substr($intime, 5,2);
$object[$_[0]]->hr($timeHr);
$object[$_[0]]->min($timeMin);
$object[$_[0]]->sec($timeSec);
$intime being the value of the time passed into this method.
Sample of $intime: 0431:12
My question is that why does the above not give me any error but when I try to shorten the lines like so :
$object[$_[0]]->hr(substr($intime, 0,2));
$object[$_[0]]->min(substr($intime, 2,2));
$object[$_[0]]->sec(substr($intime, 5,2));
Only the first one works while the rest gives me an out of string error.
I am relatively new to perl, as you can see, but can anyone give me an answer to this?
EDIT
Sample HR:
sub hr {
my $self = shift;
if (#_) { $self->{HR} = shift }
return $self->{HR};
}
EDIT
Case Closed.. Read my answer post
From the comments above, adding .'' after each substr solved your problem. The reason for this is that the ->hr, ->min, and ->sec methods are modifying their argument in some way. Without seeing it further I can't say for certain what is happening.
The substr function returns a value that is a valid lvalue. This means that it can be assigned to. So when something in those methods assigns to the slice from substr, it is interfering with the other methods.
Appending an empty string fixes the problem by breaking the alias between the slice and the original string (stored in $intime).
If you wrote the hr, min and sec methods, you should figure out why they are modifying their arguments. Adding print "[$intime]\n"; statements between each method call should be revealing.
Can you come up with self-contained runnable code that demonstrates the problem? The problem you describe doesn't quite match up with the code you show, though I don't understand #object's role in your code.
The following works just fine:
use strict;
use warnings;
package Class;
sub new { bless {} }
sub saveItems {
my $intime = $_[1];
$_[0]->hr(substr($intime, 0,2));
$_[0]->min(substr($intime, 2,2));
$_[0]->sec(substr($intime, 5,2));
}
sub hr {
my $self = shift;
if (#_) { $self->{HR} = shift }
return $self->{HR};
}
sub min {
my $self = shift;
if (#_) { $self->{MIN} = shift }
return $self->{MIN};
}
sub sec {
my $self = shift;
if (#_) { $self->{SEC} = shift }
return $self->{SEC};
}
package main;
my $object = Class->new();
$object->saveItems( '0431:12' );
print "hr: ", $object->hr(), " min: ", $object->min(), " sec: ", $object->sec(), "\n";
This matter has been resolved.
The way of using substr as follows, are able to perform normally, without errors.
$object[$_[0]]->hr(substr($intime, 0,2));
$object[$_[0]]->min(substr($intime, 2,2));
$object[$_[0]]->sec(substr($intime, 5,2));
However, it is the log file that has trailing blank lines that got this script to fail.
Thanks to #ysth for asking me to reproduce the problem, when I realized that the problem actually lies with the log file instead of the script.
Lesson learnt: Check the codes AND the source before raising an issue

Resources