How to sort numbers in Perl - arrays

print "@_\n";
4109 4121 6823 12967 12971 14003 20186
How do I sort it in Perl?
Using @sorted = sort(@_); gives me an alphabetical ordering:
13041 13045 14003 20186 4109 4121 6823
How do I get a numerical ordering? Does Perl have built-in functions for merge-sort, insertion-sort, etc.?

You can pass a custom comparison function to Perl's sort routine. Just use:
@sorted = sort { $a <=> $b } @unsorted;
The sort function accepts a custom comparison function as its first argument, in the form of a code block. The {...} part is just this code block (see http://perldoc.perl.org/functions/sort.html ).
sort will call this custom comparison function whenever it needs to compare two elements from the array to be sorted. sort always passes in the two values to compare as $a, $b, and the comparison function has to return the result of the comparison. In this case it just uses the operator for numeric comparison (see http://perldoc.perl.org/perlop.html#Equality-Operators ), which was probably created just for this purpose :-).
Solution shamelessly stolen from "Perl Cookbook", Chapter 04 Sub-chapter 15 (buy the book - it's worth it!)

Supply a comparison function to sort():
# sort numerically ascending
my @articles = sort {$a <=> $b} @files;
# sort numerically descending
my @articles = sort {$b <=> $a} @files;
The default comparison is cmp (string comparison), which would sort (1, 2, 10) into (1, 10, 2). The <=> operator used above is the numerical comparison operator.
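A small script makes the difference concrete. This is a sketch using the (1, 2, 10) list from the paragraph above; the variable names are illustrative.

```perl
use strict;
use warnings;

my @nums = (1, 2, 10);

# Default sort compares elements as strings (like cmp), so "10" lt "2"
my @default    = sort @nums;
# Numeric ascending and descending with <=>
my @ascending  = sort { $a <=> $b } @nums;
my @descending = sort { $b <=> $a } @nums;

print "default:    @default\n";     # 1 10 2
print "ascending:  @ascending\n";   # 1 2 10
print "descending: @descending\n";  # 10 2 1
```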

By default, Perl's sort compares elements as strings, which for ASCII data means ASCII order. To sort numerically you can use:
@sorted = sort { $a <=> $b } @_;

This is a Perl FAQ. From the command line:
perldoc -q sort
perlfaq4: How do I sort an array by (anything)?

@l = (4109, 4121, 6823, 12967, 12971, 14003, 20186, 1, 3, 4);
@l = sort { $a <=> $b } @l;
print "@l\n"; # 1 3 4 4109 4121 6823 12967 12971 14003 20186
You have to supply your own sorting subroutine, { $a <=> $b }.

You can predefine a function to be used to compare the values in your array.
perldoc -f sort gives you an example:
# Sort using explicit subroutine name
sub byage {
$age{$a} <=> $age{$b}; # Presuming numeric
}
@sortedclass = sort byage @class;
The <=> operator is used to sort numerically.
@sorted = sort {$a <=> $b} @unsorted;

You find here (and in a lot of other places) that the way to sort a numeric array is:
@sorted_array = sort { $a <=> $b } @unsorted_array;
Now you try it, and you get an error: "Can't use "my $a" in sort comparison"! (This is because you have already declared $a as a lexical with my, perhaps because use strict pushed you to declare every variable.) It seems you can't use undeclared variables either, since strict rejects them. So, you might feel trapped in an impasse.
The way out is that $a and $b are special package variables reserved for sort, and strict 'vars' exempts them, so you simply should not declare them. perlvar does say this, but many tutorials don't mention it. (That is quite awkward, because 'a' and 'b' are among the most common short variable names used in programming, and logically so!)
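A minimal sketch of that pattern, assuming a script under use strict: your own variables get other names (the hypothetical $first and $second here), and $a and $b are used undeclared inside the sort block.

```perl
use strict;
use warnings;

# Your own variables: pick names other than $a and $b
my ($first, $second) = (10, 9);

# $a and $b are package variables set by sort; strict 'vars' exempts them,
# so no declaration is needed here
my @sorted = sort { $a <=> $b } ($first, $second, 100);
print "@sorted\n";   # 9 10 100
```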

Related

Why Perl Sort function cannot arrange array's element in my expected incremental manner?

Perl Sort function unable to arrange array elements in my expected incremental manner
@array_sort = sort { $a <=> $b } @array
@array = ("BE_10", "BE_110", "BE_111", "BE_23", "BE_34", "BE_220", "BE_335");
@array_sort = sort { $a <=> $b } @array;
print "array_sort = @array_sort\n";
Expected result:
array_sort = BE_10 BE_23 BE_34 BE_110 BE_111 BE_220 BE_335
Actual result:
array_sort = BE_10 BE_110 BE_111 BE_23 BE_34 BE_220 BE_335
Always start your scripts with use strict; use warnings;. Warnings would have found your problem, which is that all your strings have the numerical value of zero. Since all strings are numerically identical, the sort function you provided always returns zero. Because of this, and because Perl's sort is stable, the order of the strings remained unchanged.
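To see that behaviour, here is a sketch (assuming a Perl whose default sort is the stable mergesort, as it has been since 5.8) that prints the numeric value of one of the strings and repeats the question's sort with the numeric warnings silenced.

```perl
use strict;
use warnings;

my @array = ("BE_10", "BE_110", "BE_111", "BE_23", "BE_34", "BE_220", "BE_335");

# A string with no leading digits is 0 in numeric context
# (under 'use warnings' this would normally emit "isn't numeric")
my $str   = "BE_10";
my $value = do { no warnings 'numeric'; $str + 0 };
print "numeric value of $str: $value\n";  # 0

# Every comparison returns 0, so a stable sort leaves the order unchanged
my @array_sort = do { no warnings 'numeric'; sort { $a <=> $b } @array };
print "array_sort = @array_sort\n";
```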
You wish to perform a "natural sort", and there are modules such as Sort::Key::Natural that will do that.
use Sort::Key::Natural qw( natsort );
my @sorted = natsort @unsorted;
Sounds like a good case for a Schwartzian transform.
If the prefix is always going to be the same and it's just the numbers after the underscore that differ:
my @array = ("BE_10", "BE_110", "BE_111", "BE_23", "BE_34", "BE_220", "BE_335");
my @array_sort = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { [ $_, (split /_/, $_)[1] ] } @array;
print "array_sort = @array_sort\n";
And if it might be different:
my @array = ("BE_10", "BE_110", "BE_111", "BE_23", "CE_34", "BE_220", "CE_335");
my @array_sort = map { $_->[0] }
sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
map { [ $_, split(/_/, $_) ] } @array;
print "array_sort = @array_sort\n";
Basic idea is that you decompose the original array into a list of array refs holding the original element and the transformed bit(s) you want to sort on, do the sort, and then extract the original elements in the new sorted order.

Changing element's positions in Perl

So I have a problem and I can't solve it. I read some words from a file in Perl. In that file the words aren't in order, but each has a number (as its first character) that should be the word's position in the sentence. The 0 means that the position is correct, 1 means that the word should be in position [1], etc.
The file looks like: 0This 3a 4sentence 2be 1should, and the solution should look like 0This 1should 2be 3a 4sentence.
In a for loop I go through the words array that I get from the file, and this is how I get the first character (the number): $firstCharacter = substr $words[$i], 0, 1;. But I don't know how to properly change the array.
Here's the code that I use
#!/usr/bin/perl -w
$arg = $ARGV[0];
open FILE, "< $arg" or die "Can't open file: $!\n";
$/ = ".\n";
while($row = <FILE>)
{
chomp $row;
@words = split(' ',$row);
}
for($i = 0; $i < scalar @words; $i++)
{
$firstCharacter = substr $words[$i], 0, 1;
if($firstCharacter != 0)
{
}
}
Just use sort. You can use a match in list context to extract the numbers; using \d+ works even for numbers > 9:
#! /usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my @words = qw( 0This 3a 4sentence 2be 1should );
say join ' ', sort { ($a =~ /\d+/g)[0] <=> ($b =~ /\d+/g)[0] } @words;
If you don't mind the warnings, or you are willing to turn them off, you can use numeric comparison directly on the words; Perl will extract the numeric prefixes itself:
no warnings 'numeric';
say join ' ', sort { $a <=> $b } #words;
Assuming you have an array like this:
my #words = ('0This', '3a', '4sentence', '2be', '1should');
And you want it sorted like so:
('0This', '1should', '2be', '3a', '4sentence');
There are two steps to this: first, extract the leading number; then sort by that number.
You can't use substr, because you don't know how long the number might be. For example, ('9Second', '12345First'). If you only looked at the first character you'd get 9 and 1 and sort them incorrectly.
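A short comparison of the two approaches on the ('9Second', '12345First') example; the array names are illustrative.

```perl
use strict;
use warnings;

my @words = ('9Second', '12345First');

# First character only: compares 9 with 1, so '12345First' comes first
my @by_first_char = sort { substr($a, 0, 1) <=> substr($b, 0, 1) } @words;

# Whole leading number via a regex capture: compares 9 with 12345,
# so '9Second' comes first
my @by_number = sort {
    my ($anum) = $a =~ /^(\d+)/;
    my ($bnum) = $b =~ /^(\d+)/;
    $anum <=> $bnum
} @words;

print "by first char: @by_first_char\n";  # 12345First 9Second
print "by number:     @by_number\n";      # 9Second 12345First
```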
Instead, you'd use a regex to capture the number.
my($num) = $word =~ /^(\d+)/;
See perlretut for more on how that works, particularly Extracting Matches.
Now that you can capture the numbers, you can sort by them. Rather than doing it in a loop yourself, let sort handle the sorting for you. All you have to do is supply the criterion for the sorting. In this case we capture the number from each word (assigned to $a and $b by sort) and compare them as numbers.
@words = sort {
# Capture the number from each word.
my($anum) = $a =~ /^(\d+)/;
my($bnum) = $b =~ /^(\d+)/;
# Compare the numbers.
$anum <=> $bnum
} @words;
There are various ways to make this more efficient, in particular the Schwartzian Transform.
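One way to apply the Schwartzian Transform here is sketched below: each word's leading number is extracted once, the pairs are sorted on it, and the words are pulled back out. Treating a word without a leading number as 0 is an assumption added for the sketch.

```perl
use strict;
use warnings;

my @words = ('0This', '3a', '4sentence', '2be', '1should');

# Schwartzian Transform: extract each leading number once,
# sort on it, then recover the original words
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, /^(\d+)/ ? $1 : 0 ] }   # no leading number: treated as 0 (assumption)
             @words;

print "@sorted\n";   # 0This 1should 2be 3a 4sentence
```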
You can also cheat a bit.
If you ask Perl to treat something as a number, it will do its damnedest to comply. If the string starts with a number, it will use that and ignore the rest, though it will complain.
$ perl -wle 'print "23foo" + "42bar"'
Argument "42bar" isn't numeric in addition (+) at -e line 1.
Argument "23foo" isn't numeric in addition (+) at -e line 1.
65
We can take advantage of that to simplify the sort by just comparing the words as numbers directly.
{
no warnings 'numeric';
@words = sort { $a <=> $b } @words;
}
Note that I turned off the warning about using a word as a number. use warnings and no warnings only have effect within the current block, so by putting the no warnings 'numeric' and the sort in their own block, I've only turned off the warning for that one sort statement.
Finally, if the words are in a file you can use the Unix sort utility from the command line. Use -n for "numeric sorting" and it will do the same trick as above.
$ cat test.data
00This
3a
123sentence
2be
1should
$ sort -n test.data
00This
1should
2be
3a
123sentence
You should be able to split on the spaces, which will make the numbers the first character of each word. With that assumption, you can simply compare using the numerical comparison operator (<=>) as opposed to the string comparison operator (cmp).
The operators are important because strings are compared character by character, so 10, 11, and 12 would be out of order and listed right after 1 (1,10,11,12,2,3,4… instead of 1,2,3,4…10,11,12).
Split, Then Sort
Note: @schwern commented an important point. If you use warnings (and you should), you will receive warnings. This is because the values of the internal comparison variables, $a and $b, aren't numbers but strings (e.g., "0This", "3a"). I've updated the following Codepad and provided more suitable alternatives to avoid this issue.
http://codepad.org/xs2GH9xT
use strict;
use warnings;
my $line = q{0This 3a 4sentence 2be 1should};
my @words = split /\s/,$line;
my @sorted = sort {$a <=> $b} @words;
print qq{
Line: $line
Words: @words
Sorted: @sorted
};
Alternatives
One method is to ignore the warning using no warnings 'numeric', as in Schwern's answer. As he has shown, turning off the warnings in a block re-enables them afterwards, which may be a little more foolproof than Choroba's answer, which applies it to the broader scope.
Choroba's solution works by extracting the digits from those values inside the comparison block. This takes fewer lines of code, but I would generally advise against it for performance reasons: the regex isn't run just once per word, but every time a word takes part in a comparison during the sort.
Another method is to strip the numbers out and use them for the sort comparison. I attempt to do this below by creating a hash, where the key will be the number and the value will be the word.
Hash Mapping / Key Sort
Once you have an array where the values are the words prefixed by the numbers, you could just as easily split each number/word combination into a hash that has the number as the key and the word as the value. This is accomplished by using split.
The important thing to note about the split statement is that a limit is passed (in this case 2), which limits the maximum number of fields the string is split into.
The two values are then used in the map to build the key/value assignment. Thus "0This" is split into "0" and "This" to be used in the hash as "0"=>"This"
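Why the limit matters can be seen by splitting one word with and without it. This sketch uses the same /(?=\D)/ pattern as the code below; the variable names are illustrative.

```perl
use strict;
use warnings;

# (?=\D) is a zero-width match at each position just before a non-digit
my @limited   = split /(?=\D)/, '4sentence', 2;   # at most 2 fields
my @unlimited = split /(?=\D)/, '4sentence';      # splits before every letter

print join('|', @limited), "\n";    # 4|sentence
print join('|', @unlimited), "\n";  # 4|s|e|n|t|e|n|c|e
```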
http://codepad.org/kY8wwajc
use strict;
use warnings;
my $line = q{0This 3a 4sentence 2be 1should};
my @words = split /\s/, $line; # [ '0This', '3a', ... ]
my %mapped = map { split /(?=\D)/, $_, 2 } @words; # { '0'=>'This', '3'=>'a', ... }
my @sorted = @mapped{ sort { $a <=> $b } keys %mapped }; # [ 'This', 'should', 'be', ... ]
print qq{
Line: $line
Words: @words
Sorted: @sorted
};
This also can be further optimized, but uses multiple variables to illustrate the steps in the process.

How to use Perl `sort` and `pairwise` if I already have variables `$a` and `$b`

I want to write a Perl subroutine like this:
use List::MoreUtils qw{pairwise};
sub multiply {
my ($a, $b) = @_;
return [ pairwise {$a * $b} @$a, @$b ];
}
(multiplication is just an example, I'm gonna do something else)
However, this gives me nonsensical results, because Perl gets confused and tries to use the outer $a and $b instead of the list items.
If I try this code in a REPL (such as reply):
0> use List::MoreUtils qw{pairwise};
1> my $a = [1, 2, 3];
…
2> my $b = [3, 2, 1];
…
3> pairwise {$a * $b} @$a, @$b;
Can't use lexical $a or $b in pairwise code block at reply input line 1.
4> sort {$a <=> $b} @$a;
"my $a" used in sort comparison at reply input line 1.
"my $b" used in sort comparison at reply input line 1.
$res[2] = [
1,
3,
2
]
So far, my solution has been replacing my ($a, $b) = @_; with my ($a_, $b_) = @_; (i.e. renaming the troublesome variables).
Is there any other solution?
Well, first off - $a is an awful variable name. Why are you doing that? Single letters are almost always a bad choice, and just about ok if it's just an iterator or some other simple use.
So the solution would be 'don't call them $a and $b'.
I mean, if you really don't want to use a different name:
sub multiply {
return [ pairwise { $a * $b } @{$_[0]}, @{$_[1]} ];
}
But what you've done is really highlighted why namespace clashes are a bad idea, and thus using $a and $b in your code as actual variables is asking for trouble.
I don't think there's a way to make it work the way you're trying to, and even if you could, you'd end up with some code that's really confusing.
I mean, something like this should work:
sub multiply {
my ( $a, $b ) = @_;
return [ pairwise { $::a * $::b } @$a, @$b ];
}
Because then you're explicitly using the package $a and $b not the lexical one. But note - that won't work the same way if they're imported from another package, and it just generally gets messy.
But it's pretty filthy. perlvar tells you outright that you shouldn't do it:
$a
$b
Special package variables when using sort(), see sort. Because of this specialness $a and $b don't need to be declared (using use vars, or our()) even when using the strict 'vars' pragma. Don't lexicalize them with my $a or my $b if you want to be able to use them in the sort() comparison block or function.
And that's before you get into the territory of 'single letter variable names are pretty much always a bad idea'.
So seriously. Is:
my ( $first, $second ) = @_;
Actually so bad?
pairwise sets the $a and $b found in its caller's package, so you could use fully qualified variable names.
Assuming this code is in package main,
use List::MoreUtils qw{pairwise};
sub multiply {
my ($a, $b) = @_;
return [ pairwise { $main::a * $main::b } @$a, @$b ];
}
Alternatively, our creates a lexical variable that is aliased to the current package's variable with the same name, and it will override the my declaration since the most recent lexical-variable declaration wins out.
use List::MoreUtils qw{pairwise};
sub multiply {
my ($a, $b) = @_;
return [ pairwise { our $a * our $b } @$a, @$b ];
}
The second approach is obviously much less fragile than the first one, but you know what would be even less fragile? Not declaring $a and $b as lexical vars in the first place! :) It would be far simpler just to use different variables. Even $A and $B or $x and $y would be better.
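A sketch of the renaming approach, assuming you don't need pairwise itself: it swaps in an index-based map (not List::MoreUtils) so it runs without non-core modules, and stopping at the shorter list is a choice made for this sketch. The names $xs and $ys are just examples of non-clashing names.

```perl
use strict;
use warnings;

# Element-wise product of two array refs. The lexicals are named $xs and $ys
# so they never shadow the package $a/$b that sort() (and pairwise) rely on.
# Index-based map instead of List::MoreUtils::pairwise (swapped-in technique).
sub multiply {
    my ($xs, $ys) = @_;
    my $last = $#$xs < $#$ys ? $#$xs : $#$ys;   # stop at the shorter list
    return [ map { $xs->[$_] * $ys->[$_] } 0 .. $last ];
}

my $product = multiply([1, 2, 3], [3, 2, 1]);
print "@$product\n";   # 3 4 3

# sort still works in the same file because $a/$b were never lexicalized
my @ordered = sort { $a <=> $b } @$product;
print "@ordered\n";    # 3 3 4
```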

Comparing the values between two hash of arrays

I have two hash of arrays. I want to compare whether the keys in both hash of arrays contain the same values.
#!/usr/bin/perl
use warnings; use strict;
my %h1 = (
w => ['3','1','2'],
e => ['6','2','4'],
r => ['8', '1'],
);
my %h2 = (
w => ['1','2','3'],
e => ['4','2','6'],
r => ['4','1'],
);
foreach ( sort {$a <=> $b} (keys %h2) ){
if (join(",", sort @{$h1{$_}})
eq join(",", sort @{$h1{$_}})) {
print join(",", sort @{$h1{$_}})."\n";
print join(",", sort @{$h2{$_}})."\n\n";
} else{
print "no match\n"
}
}
if ("1,8" eq "1,4"){
print "true\n";
} else{
print "false\n";
}
The output should be:
2,4,6
2,4,6
1,2,3
1,2,3
no match
false
But for some reason my if-statement isn't working. Thanks.
Smart match is an interesting solution; available from 5.010 onward:
if ([sort @{$h1{$_}}] ~~ [sort @{$h2{$_}}]) { ... }
The smart match on array references returns true when the corresponding elements of each array smartmatch themselves. For strings, smart matching tests for string equality.
This may be better than joining the members of an array, as smart matching works for arbitrary data*. On the other hand, smart matching is quite complex, has hidden gotchas, and has been marked experimental since Perl 5.18.
*On arbitrary data: if you can guarantee all your strings only contain numbers, then everything is all right. However, then you could just have used numbers instead:
%h1 = (w => [3, 1, 2], ...);
# sort defaults to alphabetic sorting. This is undesirable here
if ([sort {$a <=> $b} @{$h1{$_}}] ~~ [sort {$a <=> $b} @{$h2{$_}}]) { ... }
If your data may contain arbitrary strings, especially strings containing commas, then your comparison isn't safe. Consider the arrays
["1foo,2bar", "3baz"], ["1foo", "2bar,3baz"] # would compare equal per your method
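If the values may contain commas, a comparison that never joins them avoids that false match. A sketch with a hypothetical helper, same_elements, that sorts copies of both arrays and compares them element by element with eq:

```perl
use strict;
use warnings;

# Compare two array refs as collections of strings: sort copies, then check
# the lengths and each element with eq (no join, so commas inside elements
# cannot produce a false match)
sub same_elements {
    my ($xs, $ys) = @_;
    return 0 unless @$xs == @$ys;
    my @x = sort @$xs;
    my @y = sort @$ys;
    for my $i (0 .. $#x) {
        return 0 unless $x[$i] eq $y[$i];
    }
    return 1;
}

print same_elements(["1foo,2bar", "3baz"], ["1foo", "2bar,3baz"]) ? "match\n" : "no match\n";  # no match
print same_elements(['3', '1', '2'], ['1', '2', '3']) ? "match\n" : "no match\n";              # match
```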
if (join(",", sort @{$h1{$_}})
eq join(",", sort @{$h1{$_}})) {
Should be:
if (join(",", sort @{$h1{$_}})
eq join(",", sort @{$h2{$_}})) {
Note the $h2. You're comparing one hash to itself.
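Putting the fix together, a corrected version of the question's loop might look like this sketch. Besides using $h2 on the right-hand side, it sorts the letter keys with the default string sort, since <=> on letters only triggers "isn't numeric" warnings; the @report array is added only so the result can be checked.

```perl
use strict;
use warnings;

my %h1 = (w => ['3','1','2'], e => ['6','2','4'], r => ['8','1']);
my %h2 = (w => ['1','2','3'], e => ['4','2','6'], r => ['4','1']);

my @report;
# Keys are letters, so sort them as strings (default sort), not with <=>
for my $key (sort keys %h2) {
    my $left  = join ',', sort @{ $h1{$key} };
    my $right = join ',', sort @{ $h2{$key} };   # $h2 here, not $h1
    push @report, $left eq $right ? "$left\n$right\n" : "no match\n";
}
print join("\n", @report);
```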
Try this; it compares the two hashes line by line, exactly:
if ( join(",", sort @{ $h1{$_}})
eq join(",", sort @{ $h2{$_}}) ) # Compares two lines exactly

Perl hash of array - numerical sort of alphanumeric keys

I understand that the default sort in Perl is an ASCII sort, not numerical. But how can I numerically sort strings that have numbers?
For example, I have a hash of arrays, like so:
myhash{ANN20021015_0101_XML_71.9} = ("anta", "hunna", "huma");
myhash{ANN20021115_0049_XML_14.1} = ("lqd", "qAl", "arrajul");
myhash{ANN20021115_0049_XML_14.2} = ("anna", "alwalada");
I just need the keys to be sorted...but the sorting is numerical within strings. I can't do a string sort, because I end up with "10" following "1", but I can't do a numerical sort either!
First of all your code isn't valid Perl and may not do what you think it does. Always
use strict;
use warnings;
at the head of your program to resolve any simple mistakes. The code should look like
$myhash{'ANN20021015_0101_XML_71.9'} = ["anta", "hunna", "huma"];
$myhash{'ANN20021115_0049_XML_14.1'} = ["lqd", "qAl", "arrajul"];
$myhash{'ANN20021115_0049_XML_14.2'} = ["anna", "alwalada"];
To sort on something other than the entire value, you can transform $a and $b within the sort block, and sort the result numerically (<=>) instead of stringwise (cmp). This code does what you need:
my @sorted = sort {
my ($aa) = $a =~ /.*_(.+)/;
my ($bb) = $b =~ /.*_(.+)/;
$aa <=> $bb;
} keys %myhash;
But if you have a large amount of data it may be profitable to use the Schwartzian Transform, which avoids extracting the numeric part of your strings every time they are compared:
my @sorted = map { $_->[0] }
sort { $a->[1] <=> $b->[1] }
map { /.*_(.+)/ and [$_, $1] }
keys %myhash;
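A self-contained sketch of that Schwartzian Transform, comparing numerically with <=>, run against the question's keys. The fourth key (..._XML_9.5) is hypothetical, added to show that 9.5 sorts before 14.1 numerically even though it would come after it stringwise; in this version a key without an underscore would get 0 as its sort value.

```perl
use strict;
use warnings;

my %myhash;
$myhash{'ANN20021015_0101_XML_71.9'} = ["anta", "hunna", "huma"];
$myhash{'ANN20021115_0049_XML_14.1'} = ["lqd", "qAl", "arrajul"];
$myhash{'ANN20021115_0049_XML_14.2'} = ["anna", "alwalada"];
$myhash{'ANN20021115_0049_XML_9.5'}  = ["example"];   # hypothetical extra key

# Capture the text after the last underscore once per key,
# compare it numerically, then keep only the original keys
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, /.*_(.+)/ ? $1 : 0 ] }
             keys %myhash;

print "$_\n" for @sorted;
```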
You need to do a custom sort: cut your strings into parts you know are literals/numbers and compare those as needed.
From your example it looks like you want literal.digits, but you can change the regular expression so that it fits your data.
my $cut = qr/(.*?\.)(\d+)(.*)/;
my @sorted = sort {
my @a = $a =~ $cut; my @b = $b =~ $cut;
$a[0] cmp $b[0] || $a[1] <=> $b[1] || $a[2] cmp $b[2]
} keys %myhash;
See also Borodin's answer.

Resources