Random flling the array - perl - arrays

I need to know how can I fill the array in perl randomly. For example: I want declare an array containing 10 elements smaller than 60. Can someone explain me how can I do it or send me any guide? I would be so grateful.

I'm assuming you meant "ten non-negative integers less than 60".
With possibility of repeats:
my #rands = map { int(rand(60)) } 1..10;
For example,
$ perl -E'say join ",", map { int(rand(60)) } 1..10;'
0,28,6,49,26,19,56,32,56,16 <-- 56 is repeated
$ perl -E'say join ",", map { int(rand(60)) } 1..10;'
15,57,50,16,51,58,46,7,17,53
$ perl -E'say join ",", map { int(rand(60)) } 1..10;'
13,57,26,47,30,14,47,55,39,39 <-- 47 and 39 are repeated
Without possibility of repeats:
use List::Util qw( shuffle );
my #rands = (shuffle 0..59)[0..9];
For example,
$ perl -MList::Util=shuffle -E'say join ",", (shuffle 0..59)[0..9];'
13,50,8,21,11,24,28,51,55,38
$ perl -MList::Util=shuffle -E'say join ",", (shuffle 0..59)[0..9];'
1,0,58,46,47,49,52,33,5,13
$ perl -MList::Util=shuffle -E'say join ",", (shuffle 0..59)[0..9];'
19,43,45,49,23,53,2,38,59,35

You can simply do:
my #r = map int(rand(60)), 0..9;
say Dumper\#r;

You could take advantage of perl's random sorting of hash keys. This will fill an array of 10 elements randomly each time you run it:
use warnings;
use strict;
my #nums = (1 .. 60);
my %data;
$data{$_}++ foreach #nums;
my $count = 0;
my #random;
foreach (keys %data){
$count++;
push #random, $_ if $count <= 10;
}

Related

create a hash of occurrences in an array with map

I seem to recall that there was a "clever" way to create a hash from an array with Perl with map such that the keys are the elements of the the array and the values are the number of times the element appears. Something like this, although this does not work:
$ perl -e '#a = ('a','a','b','c'); %h = map { $_ => $_ + 1 } #a ; foreach $k (keys (%h)) { print "$k -> $h{$k}\n"}'
c -> 1
b -> 1
a -> 2
$
Am I imagining things? How can I do this?
You can write map {$h{$_}++} #a ignoring its return value, but why would you do this? for (#a){$h{$_}++} is easy enough to type.
So why would you not do it?
map is meant for transforming lists. It takes an input list and generates an output list. It can confuse a reader if you use it in a different way using side effect instead of output.
Also, although map is optimized to not create the output list when called in void context, it is slower:
use warnings;
use strict;
use Benchmark qw/cmpthese/;
my #in = map {chr(int(rand(127)+1))} 1..10000;
my %out;
cmpthese(10000,
{stmtfor => sub{%out = (); $out{$_}++ for #in},
voidmap => sub{%out = (); map {$out{$_}++} #in;},
}
);
__END__
Rate voidmap stmtfor
voidmap 2075/s -- -17%
stmtfor 2513/s 21% --
I'm not sure this is what you are looking for, but you can easily do that with a for rather than a map:
$ perl -e '#a = ('a','a','b','c'); $h{$_}++ for #a; foreach $k (keys (%h)) { print "$k -> $h{$k}\n"}'
c -> 1
b -> 1
a -> 2

Read space delimited text file into array of hashes [Perl]

I have text file that matches the following format:
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
And I need to read it into a data structure in perl that will allow me to sort and print according to any of those columns.
From left to right the columns are process_id, memory_size, cpu_time and program_name.
How can I read a text file with formatting like that in a way that allows me to sort the data structure and print it according to the sort?
My attempt so far:
my %tasks;
sub open_file{
if (open (my $input, "task_file" || die "$!\n")){
print "Success!\n";
while( my $line = <$input> ) {
chomp($line);
($process_id, $memory_size, $cpu_time, $program_name) = split( /\s/, $line, 4);
$tasks{$process_id} = $process_id;
$tasks{$memory_size} = $memory_size;
$tasks{$cpu_time} = $cpu_time;
$tasks{$program_name} = $program_name;
print "$tasks{$process_id} $tasks{$memory_size} $tasks{$cpu_time} $tasks{$program_name}\n";
}
This does print the output correctly, however I can't figure out how to then sort my resulting %tasks hash by a specific column (i.e. process_id, or any other column) and print the whole data structure in a sorted format.
You're storing the values under keys that are equal to the values. Use Data::Dumper to inspect the structure:
use Data::Dumper;
# ...
print Dumper(\%tasks);
You can store the pids in a hash of hashes, using the value of each column as the inner key.
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
my #COLUMNS = qw( memory cpu program );
my %sort_strings = ( program => sub { $a cmp $b } );
my (%process_details, %sort);
while (<DATA>) {
my ($process_id, $memory_size, $cpu_time, $program_name) = split;
$process_details{$process_id} = { memory => $memory_size,
cpu => $cpu_time,
program => $program_name };
undef $sort{memory}{$memory_size}{$process_id};
undef $sort{cpu}{$cpu_time}{$process_id};
undef $sort{program}{$program_name}{$process_id};
}
say 'By pid:';
say join ', ', $_, #{ $process_details{$_} }{#COLUMNS}
for sort { $a <=> $b } keys %process_details;
for my $column (#COLUMNS) {
say "\nBy $column:";
my $cmp = $sort_strings{$column} || sub { $a <=> $b };
for my $value (sort $cmp keys %{ $sort{$column} }
) {
my #pids = keys %{ $sort{$column}{$value} };
say join ', ', $_, #{ $process_details{$_} }{#COLUMNS}
for #pids;
}
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
But if the data aren't really large and the sorting isn't time critical, just sorting the whole array of arrays by a given column is much easier to write and read:
#!/usr/bin/perl
use strict;
use feature qw{ say };
use warnings;
use enum qw( PID MEMORY CPU PROGRAM );
my #COLUMN_NAMES = qw( pid memory cpu program );
my %sort_strings = ((PROGRAM) => 1);
my #tasks;
push #tasks, [ split ] while <DATA>;
for my $column_index (0 .. $#COLUMN_NAMES) {
say "\nBy $COLUMN_NAMES[$column_index]:";
my $sort = $sort_strings{$column_index}
? sub { $a->[$column_index] cmp $b->[$column_index] }
: sub { $a->[$column_index] <=> $b->[$column_index] };
say "#$_" for sort $sort #tasks;
}
__DATA__
...
You need to install the enum distribution.
I can't figure out how to then sort my resulting %tasks hash by a specific column
You can't sort a hash. You need to convert each of your input rows in a hash (which you're doing successfully) and then store all of those hashes in an array. You can then print the contents of the array in a sorted order.
This seems to do what you want:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my #cols = qw[process_id memory_size cpu_time program_name];
#ARGV or die "Usage: $0 [sort_order]\n";
my $sort = lc shift;
if (! grep { $_ eq $sort } #cols ) {
die "$sort is not a valid sort order.\n"
. "Valid sort orders are: ", join('/', #cols), "\n";
}
my #data;
while (<DATA>) {
chomp;
my %rec;
#rec{#cols} = split;
push #data, \%rec;
}
if ($sort eq $cols[-1]) {
# Do a string sort
for (sort { $a->{$sort} cmp $b->{$sort} } #data) {
say join ' ', #{$_}{#cols};
}
} else {
# Do a numeric sort
for (sort { $a->{$sort} <=> $b->{$sort} } #data) {
say join ' ', #{$_}{#cols};
}
}
__DATA__
1 4730 1031782 init
4 0 6 events
2190 450 0 top
21413 5928 1 sshd
22355 1970 2009 find
I've used the built-in DATA filehandle to make the code simpler. You would need to replace that with some code to read from an external file.
I've used a hash slice to simplify reading the data into a hash.
The column that you want to sort by is passed into the program as a command-line argument.
Note that you have to sort the last column (the program name) using string comparison and all other columns using numeric comparison.
This decides how to sort using the first argument the script receives.
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
open my $fh, '<', 'task_file';
my #tasks;
my %sort_by = (
process_id=>0,
memory_size=>1,
cpu_time=>2,
program_name=>3
);
my $sort_by = defined $sort_by{defined $ARGV[0]?$ARGV[0]:0} ? $sort_by{$ARGV[0]} : 0;
while (<$fh>) {
push #tasks, [split /\s+/, $_];
}
#tasks = sort {
if ($b->[$sort_by] =~ /^[0-9]+$/ ) {
$b->[$sort_by] <=> $a->[$sort_by];
} else {
$a->[$sort_by] cmp $b->[$sort_by];
}
} #tasks;
for (#tasks) {
say join ' ', #{$_};
}

"Use of uninitialized value" when indexing an array

I get the following error from Perl when trying to run the code below
Use of uninitialized value within #words in concatenation (.) or string...
It references the line where I try to create an array made up of three-word sequences (the line that starts with $trigrams). Can anyone help me figure out the problem?
my %hash;
my #words;
my $word;
my #trigrams;
my $i = 0;
while (<>) {
#words = split;
foreach $word (#words) {
$hash{$word}++;
# Now trying to create the distinct six-grams in the 10-K.
$trigrams[$i] = join " ", $words[$i], $words[$i + 1], $words[$i + 2];
print "$words[$i]\n";
$i++;
}
}
All that is happening is that you are falling off the end of the array #words. You are executing the loop for each element of #words, so the value of $i goes from 0 to $#words, or the index of the final element of the array. So the line
join " ", $words[$i], $words[$i + 1], $words[$i + 2];
accesses the last element of the array $words[$i] and two elements beyond that which don't exist.
In this case, as with any loop which uses the current index of an array, it is easiest to iterate over the array indices instead of the contents. For the join to be valid you need to start at zero and stop at two elements before the end, so 0 .. $#words-2.
It is also neater to use an array slice to select the three elements for the trigram, and use the fact that interpolating an array into a string, as in "#array", will do the same as join ' ', #array. (More precisely, it does join $", #array, and $" is set to a single space by default.)
I suggest this fix. It is essential to use strict and use warnings at the start of every Perl program, and you should declare all your variables using my as late as possible.
use strict;
use warnings;
my %hash;
while (<>) {
my #words = split;
my #trigrams;
for my $i (0 .. $#words - 2) {
my $word = $words[$i];
++$hash{$word};
$trigrams[$i] = "#words[$i,$i+1,$i+2]";
print "$word\n";
}
}
Update
You may prefer this if it isn't too terse for you
use strict;
use warnings;
my %hash;
while (<>) {
my #words = split;
my #trigrams = map "#words[$_,$_+1,$_+2]", 0 .. $#words-2;
}

Remove elements from an array which have a substring that is itself an element of the array

In Perl, I'd like to remove all elements from an array where another element of the same array is a non-empty substring of said element.
Say I have the array
#itemlist = ("abcde", "ab", "khi", "jklm");
In this instance I would like to have the element "abcde" removed, because "ab" is a substring of "abcde".
I could make a copy of the array (maybe as a hash?), iterate over it, try to index with every element of the original array and remove it, but there has to be a more elegant way, no?
Thanks for your help!
Edited for clarity a bit.
You could construct a regex from all the items and throw out anything that matches:
$alternation = join('|', map(quotemeta, #itemlist));
#itemlist = grep !/($alternation).|.($alternation)/, #itemlist;
The ().|.() thing just ensures that an item doesn't match itself.
Well, I wouldn't call this elegant, but here goes:
#!usr/bin/perl
use strict;
use warnings;
my #itemlist = ("abcde", "ab", "khi", "jklm");
#itemlist = grep {
#itemlist ~~ sub {$_ !~ /(?:.\Q$_[0]\E|\Q$_[0]\E.)/}
} #itemlist;
print "#itemlist";
It relies on a rather obscure behavior of smart match: if the left argument is an array and the right argument a sub, it calls the sub for each element, and the final result is true only if the sub returns true for each element.
Explanation: for each element of the array, it checks that no other element is a substring of that element (requiring at least one additional character so that elements won't match themselves).
Note: wdebeaum's answer is probably the one I would prefer in the real world. Still, it is kind of interesting the strange things one can do with smart match.
wdebeaum's answer is the solution to use, not the one below, but I learned something by doing it and perhaps someone else will too. After I had written mine I decided to test it on lists of several thousand elements.
b.pl:
#!/usr/bin/perl
use strict;
use warnings;
my #itemlist = <>;
for(#itemlist) { chomp; }
my $regex;
if(defined $ENV{wdebeaum}) {
# wdebeaum's solution
my $alternation = join('|', map(quotemeta, #itemlist));
$regex = qr/(?:$alternation).|.(?:$alternation)/;
} else {
# my solution
$regex = join "|", map {qq{(?:\Q$_\E.)|(?:.\Q$_\E)}} #itemlist;
}
my #result = grep !/$regex/, #itemlist;
print scalar #itemlist, "\t", scalar #result, "\n";
I generated a list of 5000 random words.
sort -R /usr/share/dict/american-english|head -5000 > some-words
For small lists both solutions seem fine.
$ time head -200 some-words | wdebeaum=1 ./b.pl
200 198
real 0m0.012s
user 0m0.004s
sys 0m0.004s
$ time head -200 some-words | ./b.pl
200 198
real 0m0.068s
user 0m0.060s
sys 0m0.004s
But for larger lists, wdebeaum's is clearly better.
$ time cat some-words | wdebeaum=1 ./b.pl
5000 1947
real 0m0.068s
user 0m0.064s
sys 0m0.000s
$ time cat some-words | ./b.pl
5000 1947
real 0m8.305s
user 0m8.277s
sys 0m0.012s
I think the reason for the difference, is that even though both regular expressions have the same number of possible paths, my regex has more paths that have to be tried, since it has the same number of .s as paths, while wdebebaum's has only two.
You can use a hash to count substrings of all the words. Any word in the list that has a higher count than one is then a substring of another word. The minimum length of the substrings is two in this example:
use strict;
use warnings;
use feature 'say';
my #list = qw(abcde ab foo foobar de oba cd xs);
my %count;
for my $word (#list) {
my $len = length $word;
$count{$word}++;
for my $start (0 .. $len - 2) {
for my $long (2 .. $len - 2) {
my $sub = substr($word, $start, $long);
$count{$sub}++;
}
}
}
say for grep $count{$_} == 1, #list;
Output:
abcde
foobar
xs
The following will remove the substring from the array.
#!/usr/bin/perl
use strict;
use warnings;
my #ar=("asl","pwe","jsl","nxu","sl","baks","ak");
foreach my $i (#ar){
my $p = grep /$i/, #ar;
if ( $p == 1 ){
print "$i" , "\n";
}
}
I had the inverse problem: removing from the list strings which are substrings of other strings. Here is my not-too-elegant solution.
sub remove_substrings_from_list {
my #list = #_;
my #vals_without_superstrings;
my %hash_of_others;
for ( 0 .. $#list ) {
my $a = shift #list;
$hash_of_others{$a} = [ #list ];
push #list, $a;
}
foreach my $k ( keys %hash_of_others ) {
push #vals_without_superstrings, $k unless grep { index( $_, $k ) != -1 } #{ $hash_of_others{$k} };
}
return #vals_without_superstrings;
}

regex matching I think

First sorry if I should have added this to my earlier question today, but I now have the below code and am having problems getting things to add up to 100...
use strict;
use warnings;
my #arr = map {int( rand(49) + 1) } ( 1..100 ); # build an array of 100 random numbers between 1 and 49
my #count2;
foreach my $i (1..49) {
my #count = join(',', #arr) =~ m/,$i,/g; # ???
my $count1 = scalar(#count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
# push(#count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
push(#count2, [$count1, $i]);
}
#sort #count2 and print the top 7
my #sorted = sort { $b->[0] <=> $a->[0] } #count2;
my $sum = 0;
foreach my $i (0..$#sorted) { # (0..6)
printf "%d times for %d\n", $sorted[$i][0], $sorted[$i][1];
$sum += $sorted[$i][0]; # try to add up/sum all numbers in the first coloum to make sure they == 100
}
print "Generated $sum random numbers.\n"; # doesn't add up to 100, I think it is because of the regex and because the first number doesn't have a "," in front of it
# seem to be always 96 or 97, 93...
Replace these two lines:
my #count = join(',', #arr) =~ m/,$i,/g; # ???
my $count1 = scalar(#count); # I want this $count1 to be the number of times each of the numbers($i) was found within the string/array.
with this:
my $count1 = grep { $i == $_ } #arr;
grep will return a list of elements where only the expression in {} evaluates to true. This is less error-prone and much more efficient than joining the entire array and using a a regex. Also note that scalar is not necessary since the variable $count1 is scalar, so perl will return the result of grep in scalar context.
You can also get rid of this line:
push(#count2, $count1 ." times for ". $i); # pushing a "number then text and a number / scalar, string, scalar" to an array.
since you are already printing the same information in your last foreach loop.
#!/usr/bin/perl
use strict; use warnings;
use YAML;
my #arr;
$#arr = 99;
my %counts;
for my $i (0 .. 99) {
my $n = int(rand(49) + 1);
$arr[ $i ] = $n;
++$counts{ $n };
}
my #result = map [$_, $counts{$_}],
sort {$counts{$a} <=> $counts{$b} }
keys %counts;
my $sum;
$sum += $_->[1] for #result;
print "Number of draws: $sum\n";
You can probably reuse some well-tested code from List::MoreUtils.
use List::MoreUtils qw/ indexes /;
...
foreach my $i (1..49) {
my #indexes = indexes { $_ == $i } #arr;
my $count1 = scalar( #indexes );
push( #count2, [ $count1, $i ] );
}
If you don't need the warns in the sum loop, then I'd recommend using sum from List:Util.
use List::Util qw/ sum /;
...
my $sum = sum map { $_->[0] } #sorted;
If you insist on the loop, rewrite it as:
foreach my $i ( #sorted ) {

Resources