Trying to print one element of an array somehow prints all elements - arrays

I am writing a script to take a list of integers from file aryData, sort them, print the sorted array, the highest value, and the lowest value.
aryData
89 62 11 75 8 33 95 4
But when printing the highest or lowest value, all elements of the array are printed.
This is my Perl code
#!/bin/perl
use strict;
use warnings;
print "Enter filename to be sorted: ";
my $filename = <STDIN>;
chomp( $filename );
open( INFILE, "<$filename" );
my @nums = <INFILE>;
close INFILE;
my @sorted = sort { $a cmp $b } @nums;
open my $outfile, '>', "HighLow.txt";
print $outfile "Sorted numbers: @sorted";
print $outfile "Smallest number: $sorted[0] \n";
print $outfile "Largest number: $sorted[-1] \n";
output HighLow.txt
Sorted numbers: 89 62 11 75 8 33 95 4
Smallest number: 89 62 11 75 8 33 95 4
Largest number: 89 62 11 75 8 33 95 4

This answer will have a large portion of code review and explain concepts that are not directly related to the question.
Let's look at the part of your code that reads in the array.
open(INFILE, "<$filename");
my @nums = <INFILE>;
close INFILE;
This code is fine for what you are doing, but it has a few security and style issues that I will get into further below.
So you have a file name, and you read in a file line by line. Each line goes into one element in the array @nums. Since your stuff is not working the way you want, the first step you need to take to debug this is to try to look at the array.
Your attempt to do this was not a bad idea.
print "Sorted numbers: @sorted";
Interpolating an array in a double-quoted "" string in Perl joins the elements of the array with the variable $", which is also known as the list separator. By default, it's a single space. (The similarly named $, is the output field separator, which print puts between its arguments and which is empty by default.)
my @foo = (1, 2, 3);
print "@foo";
This will give the following output
1 2 3
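To make the separator behaviour concrete, here is a minimal sketch. The variable names are made up for illustration; it only shows that interpolation uses $" while a bare list passed to print does not.

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);

# Interpolation in a double-quoted string joins with $" (a space by default).
my $default = "@foo";

# Changing $" changes how the array is interpolated. `local` restores the
# old value at the end of the enclosing block.
my $custom;
{
    local $" = ', ';
    $custom = "@foo";
}

print "$default\n";    # 1 2 3
print "$custom\n";     # 1, 2, 3

# Without quotes, print receives a list and joins it with $, (empty by default).
print @foo, "\n";      # 123
```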
Unfortunately your input file already had spaces as separators, and all numbers were on one line. So you couldn't really see that the array wasn't properly set up. That's one of those facepalm moments when you notice it yourself. You could have noticed it by looking at the sorted numbers. You did sort them, but they were not sorted.
Sorted numbers: 89 62 11 75 8 33 95 4
A better way to figure out what's in the array would be to use Data::Dumper, which lets you serialize data structures. It's included with Perl.
use Data::Dumper;
my @foo = (1, 2, 3);
print Dumper \@foo;
The module gives you a Dumper function. It works better on references, so you need to add the backslash to create a reference to @foo. What that means exactly is not relevant at this point. Just remember that if your variable does not start with a $, you put a backslash in front.
$VAR1 = [
1,
2,
3
];
This is useful. It tells us the three elements. Now let's look at your code. Instead of an actual file, I am using the pseudo-filehandle DATA, which reads from the __DATA__ section at the end of the program. This is great for testing and for examples.
use Data::Dumper;
my @nums = <DATA>;
my @sorted = sort { $a cmp $b } @nums;
print Dumper \@sorted;
__DATA__
89 62 11 75 8 33 95 4
This prints
$VAR1 = [
'89 62 11 75 8 33 95 4
'
];
We can see two things here. First, all numbers are on one line, and thus they went into the first element. Second, the line has a newline at the end. You already know that you can remove that with chomp.
So let's try to fix this. We now know that we need to split the line of numbers. There are many different ways to accomplish this task. I will go with a very verbose one to explain the steps involved.
use Data::Dumper;
my $line = <DATA>; # only read one line
chomp $line; # remove the line ending
my @nums = split / /, $line;
my @sorted = sort { $a cmp $b } @nums;
print Dumper \@sorted;
__DATA__
89 62 11 75 8 33 95 4
We use split with the pattern / /, which matches a single space, to turn the string of numbers into a list of numbers, and put that in an array. Then we sort.
$VAR1 = [
'11',
'33',
'4',
'62',
'75',
'8',
'89',
'95'
];
As you can see, we now have a sorted list of numbers. But they are not sorted numerically. Instead, they are sorted asciibetically. That's because cmp is the operator that sorts by ASCII character number. It's also the default behavior of Perl's sort, so you could have omitted that whole { $a cmp $b } block. It's the same as just saying sort @nums.
But we want to sort numbers by their numerical value, so we need to use the <=> sorting operator.
use Data::Dumper;
my $line = <DATA>; # only read one line
chomp $line; # remove the line ending
my @nums = split / /, $line;
my @sorted = sort { $a <=> $b } @nums;
print Dumper \@sorted;
__DATA__
89 62 11 75 8 33 95 4
Now the program prints the right output.
$VAR1 = [
'4',
'8',
'11',
'33',
'62',
'75',
'89',
'95'
];
I'll leave it to you to put this back into your actual program.
Finally, a word about your open. You are using what's called glob filehandles. Those things like INFILE are global identifiers. They are valid throughout your program, even in other modules that you might load. While in this tiny program that doesn't really make a difference, it might cause problems in the future. If for example the Data::Dumper module was to open a file and use the same identifier INFILE, and you had not called close INFILE, your program might either crash or do very weird things, because it would reuse the same handle.
Instead, you can use a lexical file handle. A lexical variable is only valid inside of a certain scope, like a function or the body of a loop. It's just a regular variable, declared with my. It will automatically call close for you when it goes out of scope.
open my $fh, "<foo";
my @nums = <$fh>;
close $fh;
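The automatic close can be sketched like this. The temporary file and its contents are assumptions made only so that the example runs on its own; File::Temp ships with Perl.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "89 62 11 75 8 33 95 4\n";
close $tmp_fh or die "Could not close $tmp_name: $!";

my @nums;
{
    open my $fh, '<', $tmp_name or die "Could not read $tmp_name: $!";
    @nums = <$fh>;
}   # $fh goes out of scope here, so Perl closes the file for us

print scalar(@nums), " line(s) read\n";
```

An explicit close is still a good habit when you want to check its return value, for example after writing.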
You are calling open with two arguments. That's also not a good idea. Right now you have the mode <, but if you leave that out and do open my $fh, "$file" and read the $file from the user, they might pass in bad things, like | rm -rf slash. Perl will then treat the pipe | as the mode, open a pipe and delete all your stuff. Instead, use three-argument open.
open my $fh, '<', 'foo';
Now that you explicitly set the mode, you're safe.
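A harmless way to see the difference, as a sketch: the hostile-looking file name below is hypothetical, and the example assumes no file with that literal name exists in the current directory.

```perl
use strict;
use warnings;

# Hypothetical hostile input. With two-argument open, the trailing pipe would
# make Perl run `echo pwned` and read its output.
my $filename = 'echo pwned |';

# With three-argument open the mode is fixed to '<', so the whole string is
# treated as a file name. No such file exists, so open simply fails.
my $opened = open my $fh, '<', $filename;

print $opened ? "opened\n" : "refused: $!\n";
```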
The last point is that you should always check if open worked. That's easy.
open my $fh, '<', 'foo' or die $!;
The variable $! contains the error that open encountered. The or will only trigger if the return value of the open call was false. And die makes the program terminate. The error you might receive could look like this.
No such file or directory at /home/foo/code/scratch.pl line 6154.
So the full file reading should look something like this.
open my $fh, '<', $filename or die "Could not read $filename: $!";
my @nums = <$fh>;
close $fh;

As you've seen from the comments, the problem here is that you don't populate your array correctly. You end up with only one element in @nums - it's a single element that contains all of your data.
You could confirm that by using something like Data::Dumper, which... er... dumps your data :-)
At the top of your program, just after the use warnings; you can add this:
use Data::Dumper;
Then after you have loaded up @nums, try dumping it:
print Dumper(\@nums), "\n";
You'll see this:
$VAR1 = [
'89 62 11 75 8 33 95 4
'
];
Compare that to what you see when we fix your problem and you'll see an obvious difference.
So we have a line of data that contains the numbers you're interested in separated by spaces. To convert that into a list of numbers which we can store in your array, we can use the split() function. split() takes two arguments - a regular expression to split the string on, and the string to split.
You have this code to read from the file and assign to your array:
my @nums = <INFILE>;
You can replace that with:
my @nums = split / /, <INFILE>;
Now our data dump looks like this:
$VAR1 = [
'89',
'62',
'11',
'75',
'8',
'33',
'95',
'4
'
];
I hope the difference is obvious. Your program basically works at this point, but we can clean things up a bit by dealing with the new-line at the end of the record in the file (you can see it after the 4 above).
We'll need to split that statement into two.
chomp(my $input = <INFILE>);
my @nums = split / /, $input;
Now our data dump looks like this:
$VAR1 = [
'89',
'62',
'11',
'75',
'8',
'33',
'95',
'4'
];
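One caveat worth a quick sketch: / / splits on every single space, so input with doubled or leading spaces produces empty elements. The input string below is made up to show this; split ' ' is Perl's special whitespace mode.

```perl
use strict;
use warnings;

# Hypothetical input with a leading space and a double space.
my $input = " 89  62 11";

# / / splits on every single space, so extra spaces produce empty strings.
my @strict = split / /, $input;

# ' ' is a special case: leading whitespace is skipped and any run of
# whitespace counts as one separator.
my @loose = split ' ', $input;

print scalar(@strict), " fields with / /\n";    # 5
print scalar(@loose),  " fields with ' '\n";    # 3
```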
At this point, your program still has a bug left in it. I'm going to leave that for you to investigate (hint: what does sort() actually do? Read the documentation) - if you have more problems, please ask another question.
But I'd like to finish by suggesting some improvements to your general coding style. I'm not sure where you're learning Perl from, but some of the stuff you're doing looks pretty dated.
When you open a file in Perl, you should always check the results from your call to open and take appropriate action if it fails. In many cases, killing the program is the appropriate action, so I'd use die() in your open statement.
open( INFILE, "<$filename" )
or die "Can't open $filename: $!\n";
The $! in the error message will tell you why Perl couldn't open the file.
It's also regarded as best practice these days to avoid "bareword filehandles" (like your INFILE) and also split the file name from the mode indicators (> or <). Putting that all together, your file handling code becomes:
open( my $in_fh, '<', $filename )
or die "Can't open $filename: $!\n";
chomp(my $input = <$in_fh>);
my @nums = split / /, $input;
close $in_fh;
I see you are already using this style for the output file. Seems strange to mix the styles within the same program.

Maybe you can try this to find the max and min:
@a=qw(1 3 2 8 7 5 4 10 9);
@a=sort {$a<=>$b}@a;
print "the min number=$a[0]\nthe max number=$a[$#a]\n";
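If only the extremes are needed, sorting is optional. A minimal alternative sketch, using the core List::Util module rather than the approach above:

```perl
use strict;
use warnings;
use List::Util qw(min max);

my @nums = qw(1 3 2 8 7 5 4 10 9);

# min and max compare numerically and do not need a sorted list.
my $smallest = min @nums;
my $largest  = max @nums;

print "the min number=$smallest\nthe max number=$largest\n";
```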

Related

How to read a .txt file and store it into an array

I know this is a fairly simple question, but I cannot figure out how to store all of the values in my array the way I want to.
Here is a small portion what the .txt file looks like:
0 A R N D
A 2 -2 0 0
R -2 6 0 -1
N 0 0 2 2
D 0 -1 2 4
Each value is delimited by either two spaces - if the next value is positive - or a space and a '-' - if the next value is negative
Here is the code:
use strict;
use warnings;
open my $infile, '<', 'PAM250.txt' or die $!;
my $line;
my @array;
while($line = <$infile>)
{
$line =~ /^$/ and die "Blank line detected at $.\n";
$line =~ /^#/ and next; #skips the commented lines at the beginning
@array = $line;
print "@array"; #Prints the array after each line is read
};
print "\n\n@array"; #only prints the last line of the array ?
I understand that @array only holds the last line that was passed to it. Is there a way where I can get @array to hold all of the lines?
You are looking for push.
push @array, $line;
You undoubtedly want to precede this with chomp to snip any newlines, first.
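Put together, the loop might look like the sketch below. The in-memory string stands in for PAM250.txt and is an assumption made so the example runs on its own.

```perl
use strict;
use warnings;

# Stand-in for PAM250.txt so the sketch runs on its own.
my $text = "# comment line\n0 A R N D\nA 2 -2 0 0\nR -2 6 0 -1\n";
open my $infile, '<', \$text or die "Could not open in-memory file: $!";

my @array;
while ( my $line = <$infile> ) {
    next if $line =~ /^#/;    # skip the commented lines
    chomp $line;              # drop the trailing newline
    push @array, $line;       # append, keeping the earlier lines
}
close $infile;

print scalar(@array), " lines stored\n";    # 3
print "$_\n" for @array;
```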
If the file is small compared to the available memory of your machine, you can simply read the whole file into an array like this:
open my $infile, '<', 'PAM250.txt' or die $!;
my @array = <$infile>;
close $infile;
If you are going to read a very large file, it is better to read it line by line as you are doing, but use push to add each line to the end of the array.
push(@array,$line);
I suggest you also read about the other array-manipulating functions in Perl.
It's unclear what you want to achieve.
Is every line an element of your array?
Is every line an array in your array and your "words" are the elements of this array?
Anyhow.
Here is how you can achieve both:
use strict;
use warnings;
use Data::Dumper;
# Read all lines into your array, after removing the \n
my @array= map { chomp; $_ } <>;
# show it
print Dumper \@array;
# Make each line an array so that you have an array of arrays
$_= [ split ] foreach @array;
# show it
print Dumper \@array;
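Once the data is an array of arrays, individual cells are reached with two indexes. A small sketch, with the sample rows inlined instead of read from <> (the variable names are illustrative):

```perl
use strict;
use warnings;

# The same idea with the sample rows inlined instead of read from <>.
my @lines = ( '0 A R N D', 'A 2 -2 0 0', 'R -2 6 0 -1' );

# Turn each line into a reference to an array of its fields.
my @rows = map { [ split ' ', $_ ] } @lines;

# $rows[ROW][COLUMN]: row 2 is the "R" row, column 1 is the "A" column.
my $score = $rows[2][1];
print "R/A score: $score\n";        # -2

# A whole row is dereferenced with @{ ... }.
print "Header: @{ $rows[0] }\n";    # 0 A R N D
```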
try this...
sub room
{
my $result = "";
open(FILE, '<', $_[0]) or die "Can't open $_[0]: $!";
while (<FILE>) { $result .= $_; }
close(FILE);
return $result;
}
So you have the basic functionality without much ceremony: the whole file comes back as a single string. The earlier suggestions may run into trouble on large files; this is a simple and safe way to do it. Call it however you like:
my @array = &room('/etc/passwd');
print room('/etc/passwd');
You can shorten or rename it as you see fit.
To the doubters nearby: here the push is replaced by simple string concatenation. A text file contains line breaks, and the string keeps them, so what you get back is one string with the line breaks still in it.
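For reference, a common idiom for reading a whole file into one string is to undefine the record separator $/ locally. The sketch below uses a hypothetical helper named slurp and a temporary file created only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper name; reads a whole file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not read $path: $!";
    local $/;              # undef record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Throwaway file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "Could not close $tmp_name: $!";

my $content = slurp($tmp_name);
my @lines   = split /^/, $content;    # split into lines, newlines kept

print scalar(@lines), " lines\n";
```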

Changing element's positions in Perl

So I have a problem and I can't solve it. I read some words from a file in Perl. In that file the words aren't in order, but each has a number (as its first character) that gives the element's position in the sentence. The 0 means that position is correct, 1 means that the word should be in position [1], etc.
The file looks like: 0This 3a 4sentence 2be 1should, and the solution should look like 0This 1should 2be 3a 4sentence.
In a for loop I go through the words array that I get from the file, and this is how I get the first character (the number): $firstCharacter = substr $words[$i], 0, 1;, but I don't know how to properly change the array.
Here's the code that I use
#!/usr/bin/perl -w
$arg = $ARGV[0];
open FILE, "< $arg" or die "Can't open file: $!\n";
$/ = ".\n";
while($row = <FILE>)
{
chomp $row;
@words = split(' ',$row);
}
for($i = 0; $i < scalar @words; $i++)
{
$firstCharacter = substr $words[$i], 0, 1;
if($firstCharacter != 0)
{
}
}
Just use sort. You can use a match in list context to extract the numbers, using \d+ will work even for numbers > 9:
#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @words = qw( 0This 3a 4sentence 2be 1should );
say join ' ', sort { ($a =~ /\d+/g)[0] <=> ($b =~ /\d+/g)[0] } @words;
If you don't mind the warnings, or you are willing to turn them off, you can use numeric comparison directly on the words, Perl will extract the numeric prefixes itself:
no warnings 'numeric';
say join ' ', sort { $a <=> $b } @words;
Assuming you have an array like this:
my @words = ('0This', '3a', '4sentence', '2be', '1should');
And you want it sorted like so:
('0This', '1should', '2be', '3a', '4sentence');
There's two steps to this. First is extracting the leading number. Then sorting by that number.
You can't use substr, because you don't know how long the number might be. For example, ('9Second', '12345First'). If you only looked at the first character you'd get 9 and 1 and sort them incorrectly.
Instead, you'd use a regex to capture the number.
my($num) = $word =~ /^(\d+)/;
See perlretut for more on how that works, particularly Extracting Matches.
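The parentheses around $num matter, because they put the match in list context. A short sketch of the difference (the sample word is taken from the example above):

```perl
use strict;
use warnings;

my $word = '12345First';

# List context (note the parentheses): the match returns the captured text.
my ($num) = $word =~ /^(\d+)/;

# Scalar context: the match returns only true or false.
my $matched = $word =~ /^(\d+)/;

print "captured: $num\n";        # 12345
print "matched:  $matched\n";    # 1
```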
Now that you can capture the numbers, you can sort by them. Rather than doing it in a loop yourself, sort handles the sorting for you. All you have to do is supply the criterion for the sorting. In this case we capture the number from each word (assigned to $a and $b by sort) and compare them as numbers.
@words = sort {
# Capture the number from each word.
my($anum) = $a =~ /^(\d+)/;
my($bnum) = $b =~ /^(\d+)/;
# Compare the numbers.
$anum <=> $bnum
} @words;
There are various ways to make this more efficient, in particular the Schwartzian Transform.
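A minimal sketch of the Schwartzian Transform for this data: each word is paired with its number once, the pairs are sorted, and the words are pulled back out. Treating words without a leading number as 0 is an assumption made for this example.

```perl
use strict;
use warnings;

my @words = ('0This', '3a', '4sentence', '2be', '1should');

my @sorted =
    map  { $_->[1] }                     # 3. keep only the word
    sort { $a->[0] <=> $b->[0] }         # 2. sort by the cached number
    map  { [ /^(\d+)/ ? $1 : 0, $_ ] }   # 1. pair each word with its number
    @words;

print "@sorted\n";    # 0This 1should 2be 3a 4sentence
```

The chain reads bottom to top. The regex runs once per word instead of once per comparison.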
You can also cheat a bit.
If you ask Perl to treat something as a number, it will do its damnedest to comply. If the string starts with a number, it will use that and ignore the rest, though it will complain.
$ perl -wle 'print "23foo" + "42bar"'
Argument "42bar" isn't numeric in addition (+) at -e line 1.
Argument "23foo" isn't numeric in addition (+) at -e line 1.
65
We can take advantage of that to simplify the sort by just comparing the words as numbers directly.
{
no warnings 'numeric';
@words = sort { $a <=> $b } @words;
}
Note that I turned off the warning about using a word as a number. use warnings and no warnings only has effect within the current block, so by putting the no warnings 'numeric' and the sort in their own block I've only turned off the warning for that one sort statement.
Finally, if the words are in a file you can use the Unix sort utility from the command line. Use -n for "numeric sorting" and it will do the same trick as above.
$ cat test.data
00This
3a
123sentence
2be
1should
$ sort -n test.data
00This
1should
2be
3a
123sentence
You should be able to split on the spaces, which will make the numbers the first character of the word. With that assumption, you can simply compare using the numerical comparison operator (<=>) as opposed to the string comparison (cmp).
The operators are important because if you compare strings, the first character is used, meaning 10, 11, and 12 would be out of order, and listed near the 1 (1,10,11,12,2,3,4… instead of 1,2,3,4…10,11,12).
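The ordering difference is easy to reproduce. A small sketch with made-up numbers:

```perl
use strict;
use warnings;

my @nums = (1, 2, 3, 4, 10, 11, 12);

my @as_strings = sort { $a cmp $b } @nums;    # string comparison
my @as_numbers = sort { $a <=> $b } @nums;    # numeric comparison

print "cmp: @as_strings\n";    # 1 10 11 12 2 3 4
print "<=>: @as_numbers\n";    # 1 2 3 4 10 11 12
```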
Split, Then Sort
Note: @schwern commented an important point. If you use warnings -- and you should -- you will receive warnings. This is because the values of the internal comparison variables, $a and $b, aren't numbers, but strings (e.g., "0This", "3a"). I've updated the following Codepad and provided more suitable alternatives to avoid this issue.
http://codepad.org/xs2GH9xT
use strict;
use warnings;
my $line = q{0This 3a 4sentence 2be 1should};
my @words = split /\s/,$line;
my @sorted = sort {$a <=> $b} @words;
print qq{
Line: $line
Words: @words
Sorted: @sorted
};
Alternatives
One method is to ignore the warning using no warnings 'numeric' as in Schwern's answer. As he has shown, turning off the warnings in a block will re-enable them afterwards, which may be a little more foolproof than Choroba's answer, which applies it to the broader scope.
Choroba's solution works by parsing the digits from those values inside the sort block. This is much fewer lines of code, but I would generally advise against it for performance reasons. The regex isn't run just once per word, but many times over the course of the sort.
Another method is to strip the numbers out and use them for the sort comparison. I attempt to do this below by creating a hash, where the key will be the number and the value will be the word.
Hash Mapping / Key Sort
Once you have an array where the values are the words prefixed by the numbers, you could just as easily split those number/word combo into a hash that has the key as the number and value as the word. This is accomplished by using split.
The important thing to note about the split statement is that a limit is passed (in this case 2), which limits the maximum number of fields the string is split into.
The two values are then used in the map to build the key/value assignment. Thus "0This" is split into "0" and "This" to be used in the hash as "0"=>"This"
http://codepad.org/kY8wwajc
use strict;
use warnings;
my $line = q{0This 3a 4sentence 2be 1should};
my @words = split /\s/, $line; # [ '0This', '3a', ... ]
my %mapped = map { split /(?=\D)/, $_, 2 } @words; # { '0'=>'This', '3'=>'a', ... }
my @sorted = @mapped{ sort { $a <=> $b } keys %mapped }; # [ 'This', 'should', 'be', ... ]
print qq{
Line: $line
Words: @words
Sorted: @sorted
};
This also can be further optimized, but uses multiple variables to illustrate the steps in the process.

perl split delimiter from file line by line

I have a text file named 'dataexample' with multiple line like this:
a|30|40
b|50|70
then I split the delimiter with this code:
open(FILE, 'dataexample') or die "File not exist";
while(<FILE>){
my @record = split(/\|/, $_);
print "$record[0]";
}
close FILE;
when I print "$record[0]" , this is what I got:
ab
what I expect :
a 30 40
so when I do print "$record[0][0]" I expect the output to be: a
Where I got it wrong?
Your loop while ( <FILE> ) { ... } reads a single line at a time from the file handle and puts it into $_
my @record = split(/\|/, $_) splits that line on pipe characters |, so since the first line is "a|30|40\n", @record will now be 'a', '30', "40\n". The newline read from the file remains, and you should use chomp to remove it if you don't want it there
So now $record[0] is a, which you print, and then go on to read the next line in the file, setting @record to 'b', '50', "70\n" this time. Now $record[0] is b, which you also print, showing ab on the console
You've now reached the end of the file, so the while loop terminates
It sounds like you're expecting a two-dimensional array. You can do that by pushing each array onto a main array each time you read a record, like this
use strict;
use warnings 'all';
open my $fh, '<', 'dataexample' or die qq{Unable to open "dataexample" for input: $!};
my @data;
while ( <$fh> ) {
chomp;
my @record = split /\|/;
push @data, \@record;
}
print "@{$data[0]}\n";
print "$data[0][0]\n";
output
a 30 40
a
Or, more concisely, like this, which produces exactly the same result but may be a little advanced for you
use strict;
use warnings 'all';
open my $fh, '<', 'dataexample' or die qq{Unable to open "dataexample" for input: $!};
my @data = map { chomp; [ split /\|/ ] } <$fh>;
print "@{$data[0]}\n";
print "$data[0][0]\n";
Some points to know about your own code
You must always use strict and use warnings 'all' at the top of every Perl program you write. It's a measure that will uncover many simple mistakes that you may not otherwise notice
You should use lexical filehandles together with the three-parameter form of open. And an open may fail for many reasons other than the file not existing, so you should include the built-in $! variable in your die string to say why it failed
Don't forget to chomp each record read from a file unless you want to keep the trailing newline or it doesn't matter to you
You will be able to write more concise code if you get used to using the default variable $_. For instance, the second parameter to split is $_ by default, so split(/\|/, $_) may be written as just split /\|/
You can use Data::Dumper to display the contents of your variables, which will help you to debug your code. Data::Dump is superior, but it isn't a core module so you will probably have to install it before you can use it in your code
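As a hedged illustration of the debugging point, Data::Dumper's Useqq setting prints strings with escapes visible, which makes a forgotten chomp easy to spot. The record below is the first sample line, left unchomped on purpose.

```perl
use strict;
use warnings;
use Data::Dumper;

# Print strings in double quotes so that escapes such as \n are visible.
$Data::Dumper::Useqq = 1;
# Compact output without extra indentation.
$Data::Dumper::Indent = 0;

my @record = split /\|/, "a|30|40\n";    # no chomp, on purpose
my $dump   = Dumper(\@record);

# The last element shows up with a visible \n at its end.
print "$dump\n";
```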
You have to use
print "$record[1]";
print "$record[2]";
As they are stored in consecutive index values.
or
If you want to print the entire thing you can just do
print "@record\n";
You are printing the value at the first index in the array each time through the loop, and without the new line. So you get the first value from each line, right next to each other on the same line, thus ab.
Print the whole array, under quotes, with the new line. with your program changed a bit
use strict;
use warnings;
my $file = 'dataexample';
open my $fh, '<', $file or die "Error opening $file: $!";
while (<$fh>) {
chomp;
my @record = split(/\|/, $_);
print "@record\n";
}
close $fh;
With the quotes the elements are printed with spaces added between them so you get
a 30 40
b 50 70
If you print without quotes the elements get printed without extra spaces, so
this
print @record, "\n";
over the whole loop prints
a3040
b5070
If you don't have the new line "\n" either, it is all printed on one line so this
print @record;
altogether prints
a3040b5070
As for $record[0][0], this is not valid for the array you have. This would print from a two-dimensional array. Take, for example
my @data = ( [1.1, 2.2], [10, 20] );
This array @data has at its first index a reference to an array -- more precisely, an anonymous array [1.1, 2.2]. Its second element is an anonymous array [10, 20]. So $data[0][0] is: the first element of @data (so the first of the two anonymous arrays inside), and then the first element of that array, thus 1.1. Likewise $data[1][1] is 20.
Thanks to Sobrique for the comment.
But you don't have this in your program. When you split data into an array
while(<FILE>){
my @record = split(/\|/, $_);
# ...
}
it creates a new array named @record every time through the loop. So @record is a normal array, not two-dimensional. For that the syntax $record[0][0] doesn't mean much.
I think you're trying to create a 2d array, whereby each element contains all the pipe delimited items from each line of your input:
my @record;
while(<DATA>){
chomp;
my @split = split(/\|/);
push @record, [@split];
}
print "@{$record[0]}\n";
a 30 40
$record[0] has the contents of column 1 - 'a' on the first iteration of the loop, 'b' on the second. $record[1] has column 2 and so on. You put the print statement, print "$record[0]", in the loop, so you get 'a' printed in the first iteration and 'b' in the second.
To get what you wanted you need to replace your print statement with:
print join " ", @record, "\n";

Validate | delimited txt file while reading from an array in perl

I have a text file with delimiter as | and it has some 10 columns. I have put this file in an array and before reading i want to put check for file being valid meaning non empty and file having fixed 10 columns. If number of columns < 10 then an error message should be displayed
You can just count the number of pipe characters in each line. Ten columns means at least nine pipes, so you can say
perl -ne '($n = tr/|// + 1) < 10 and die "Only $n fields on line $.\n"' myfile.txt
my $file = 'path/to/file.txt';
open my $fh, '<', $file or die $!;
my $line = <$fh>;
if ($line =~ /^(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*?)\|(.*)/) {
# $10 holds everything after the ninth pipe
if($10=~/\|/){print "more than 10 columns!\n";}
}
else {print "You have fewer than 10 columns!\n";}
close $fh;
EDIT:
There's many ways and this is just a quick slop answer. It will ensure at least 10 as stated, although you technically requested 9 or less with the > sign.
EDIT EDIT:
This will ensure at least 10, it could be more. I'd recommend counting with global regex, but it's been a while for me. Alternatively, perhaps another method to ensure exactly 10:
$line = <$fh>;
my @check=split(/\|/, $line);
if (scalar @check !=10){print "bad file delimitation!\n";}
EDIT x3:
if($10=~/\|/){print "more than 10 columns!\n";} to ensure no more than 10, but try doing a global regex to count or the array method
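A sketch of a whole-file check along the same lines, covering both the empty-file case and the column count. The helper name check_columns and the in-memory sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of problems, empty if the file is valid.
sub check_columns {
    my ($fh, $expected) = @_;
    my ($count, @errors) = (0);
    while ( my $line = <$fh> ) {
        chomp $line;
        $count++;
        # Limit -1 keeps trailing empty fields, so "a|b|" counts as 3 columns.
        my @fields = split /\|/, $line, -1;
        push @errors, "line $count: " . scalar(@fields) . " columns"
            if @fields != $expected;
    }
    push @errors, "file is empty" if $count == 0;
    return @errors;
}

# In-memory stand-in for the real file: one good line, one short line.
my $text = join('|', 1 .. 10) . "\n" . join('|', 1 .. 7) . "\n";
open my $fh, '<', \$text or die "Could not open in-memory file: $!";
my @errors = check_columns($fh, 10);
close $fh;

print @errors ? join("\n", @errors) . "\n" : "file looks valid\n";
```

Without the -1 limit, split drops trailing empty fields, so a line ending in empty columns would be reported as too short.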

Perl push function gives index values instead of array elements

I am reading a text file named, mention-freq, which has data in the following format:
1
1
13
2
I want to read the lines and store the values in an array like this: @a=(1, 1, 13, 2). The Perl push function gives the index values/line numbers, i.e., 1,2,3,4, instead of my desired output. Could you please point out the error? Here is what I have done:
use strict;
use warnings;
open(FH, "<mention-freq") || die "$!";
my @a;
my $line;
while ($line = <FH>)
{
$line =~ s/\n//;
push @a, $line;
print @a."\n";
}
close FH;
The bug is that you are printing the concatenation of @a and a newline. When you concatenate, that forces scalar context. The scalar sense of an array is not its contents but rather its element count.
You just want
print "@a\n";
instead.
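The three behaviours side by side, as a small sketch using the values from the question:

```perl
use strict;
use warnings;

my @a = (1, 1, 13, 2);

my $concatenated = @a . "\n";     # scalar context: the element count, "4\n"
my $interpolated = "@a\n";        # elements joined with spaces, "1 1 13 2\n"
my $count        = scalar @a;     # the explicit way to ask for the count

print $concatenated;
print $interpolated;
print "count: $count\n";
```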
Also, while it will not affect your code here, the normal way to remove the record terminator read in by the <> readline operator is using chomp:
chomp $line;
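A brief sketch of what chomp does, using one of the sample values. It removes a trailing record separator (normally a newline) if there is one and returns how many characters it removed.

```perl
use strict;
use warnings;

my $line = "13\n";

my $removed = chomp $line;    # removes the trailing newline, returns 1
my $again   = chomp $line;    # nothing left to remove, returns 0

print "line='$line' removed=$removed again=$again\n";
```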

Resources