Remove even amounts of duplicates from an array - arrays

I have an array
[ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ]
I need to delete even amounts of duplicates.
That means, if a value appears an even number of times in the array then remove them all, but if it appears an odd number of times then keep just one.
The result from the array above should be
[ 1, 0, 2, 4 ]
How can I do that?

Removing duplicates is usually done as follows:
use List::Util 1.44 qw( uniqnum );
#a = uniqnum #a;
or
my %seen;
#a = grep { !$seen{$_}++ } #a;
To achieve what you want, we simply need chain grep that removes the other undesired elements.
use List::Util 1.44 qw( uniqnum );
#a = uniqnum grep { $counts{$_} % 2 } #a;
or
my %seen;
#a = grep { !$seen{$_}++ } grep { $counts{$_} % 2 } #a;
or
my %seen;
#a = grep { ( $counts{$_} % 2 ) && !$seen{$_}++ } #a;
The above solutions rely on having the count of each value. To obtain that, we can use the following:
my %counts;
++$counts{$_} for #a;
All together:
my ( %counts, %seen );
++$counts{$_} for #a;
#a = grep { ( $counts{$_} % 2 ) && !$seen{$_}++ } #a;
Note that these methods of removing duplicates preserve the order of the elements (keeping the first duplicate). This is more efficient (O(N)) then involving sort (O(N log N)) to avoid producing something non-deterministic.

This really isn't difficult, and it is very bad form to show no attempt at all at solving it yourself. I would like someone who posted questions like this to describe how they feel comfortable getting someone else to do their work for them. Even difficult crosswords don't get this flood of requests for a solution, but in this case presumably you are being paid for a solution written by someone else? Why is that not a problem to you?
Build a hash to calculate the current count for each value
use $_ % 2 do determine the new final count
Deconstruct the hash to a new array
my $array = [ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ];
my #new_array = do {
my %counts;
++$counts{$_} for #$array;
map {
( $_ ) x ( $counts{$_} % 2 )
} sort { $a <=> $b } keys %counts;
};
use Data::Dump;
dd \#new_array;
output
[0, 1, 2, 4]

See the comments, to see how this possible solution does it.
#!/usr/bin/perl
use strict;
use warnings;
my #a = qw(1 0 0 0 5 2 4 5 2 2);
# Move through the array.
for (my $i = 0; $i < scalar(#a); ) {
# Move through the positions at and ahead of current position $i
# and collect all positions $j, that share the value at the
# current position $i.
my #indexes;
for (my $j = $i; $j < scalar(#a); $j++) {
if ($a[$j] == $a[$i]) {
push(#indexes, $j);
}
}
if (scalar(#indexes) % 2) {
# If the number of positions collected is odd remove the first
# position from the collection. The number of positions in the
# collection is then even afterwards.
shift(#indexes);
# As we will keep the value at the current position $i no new
# value will move into that position. Hence we have to advance
# the current position.
$i++;
}
# Move through the collected positions.
for (my $k = 0; $k < scalar(#indexes); $k++) {
# Remove the element at the position as indicated by the
# $k'th element of the collect positions.
# We have to subtract $k from the collected position, to
# compensate for the movement of the remaining elements to the
# left.
splice(#a, $indexes[$k] - $k, 1);
}
}
print("#a");

You have a bunch of answers, here's another:
use strict;
use warnings;
use Data::Dumper;
my $input = [ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ];
my $output = dedupe_evens($input);
print Data::Dumper->Dump([$input, $output], ['$input', '$output']);
exit;
sub dedupe_evens {
my($input) = #_;
my %seen;
$seen{$_}++ foreach #$input;
my #output = grep {
my $count = delete $seen{$_}; # only want first occurrence
$count && $count % 2;
} #$input;
return \#output;
}
Which produces this output (reformatted for brevity):
$input = [ 1, 0, 0, 0, 5, 2, 4, 5, 2, 2 ];
$output = [ 1, 0, 2, 4 ];

Related

Is it possible to assign two variables in Perl foreach loop?

Is it possible to assign two variables the same data from an array in a Perl foreach loop?
I am using Perl 5, I think I came across something in Perl 6.
Something like this:
my $var1;
my $var2;
foreach $var1,$var2 (#array){...}
It's not in the Perl 5 core language, but List::Util has a pairs function which should be close enough (and a number of other pair... functions which may be more convenient, depending on what you're doing inside the loop):
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use List::Util 'pairs';
my #list = qw(a 1 b 2 c 3);
for my $pair (pairs #list) {
my ($first, $second) = #$pair;
say "$first => $second";
}
Output:
a => 1
b => 2
c => 3
The easiest way to use this is with a while loop that calls splice on the first two elements of the array each time,
while (my($var1, $var2) = splice(#array, 0, 2)) {
...
}
However, unlike foreach, this continually does a double-shift on the original array, so when you’re done, the array is empty. Also, the variables assigned are copies, not aliases as with foreach.
If you don’t like that, you can use a C-style for loop:
for (my $i = 0; $i < #array; $i += 2) {
my($var1, $var2) = #array[$i, $i+1];
...
}
That leaves the array in place but does not allow you to update it the way foreach does. To do that, you need to address the array directly.
my #pairlist = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
for (my $i = 0; $i < #pairlist; $i += 2) {
$pairlist[ $i + 0 ] x= 2;
$pairlist[ $i + 1 ] *= 2;
}
print "Array is #pairlist\n";
That prints out:
Array is feefee 2 fiefie 4 foefoe 6 fumfum 8
You can get those into aliased variables if you try hard enough, but it’s probably not worth it:
my #kvlist = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
for (my $i = 0; $i < #kvlist; $i += 2) {
our ($key, $value);
local(*key, $value) = \#kvlist[ $i, $i + 1 ];
$key x= 2;
$value *= 2;
}
print "Array is #kvlist\n";
Which prints out the expected changed array:
Array is feefee 2 fiefie 4 foefoe 6 fumfum 8
Note that the pairs offered by the List::Pairwise module, which were but very recently added to the core List::Util module (and so you probably cannot use it), are still not giving you aliases:
use List::Util 1.29 qw(pairs);
my #pairlist = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
for my $pref (pairs(#pairlist)) {
$pref->[0] x= 2;
$pref->[1] *= 2;
}
print "Array is #pairlist\n";
That prints out only:
Array is fee 1 fie 2 foe 3 fum 4
So it didn’t change the array at all. Oops. :(
Of course, if this were a real hash, you could double the values trivially:
for my $value (values %hash) { $value *= 2 }
The reasons that works is because those are aliases into the actual hash values.
You cannot change the keys, since they’re immutable. However, you can make a new hash that’s an updated copy of the old one easily enough:
my %old_hash = (
fee => 1,
fie => 2,
foe => 3,
fum => 4,
);
my %new_hash;
#new_hash{ map { $_ x 2 } keys %old_hash } =
map { $_ * 2 } values %old_hash;
print "Old hash is: ", join(" " => %old_hash), "\n";
print "New hash is: ", join(" " => %new_hash), "\n";
That outputs
Old hash is: foe 3 fee 1 fum 4 fie 2
New hash is: foefoe 6 fiefie 4 fumfum 8 feefee 2
A general algorithm for more than 2 variables:
while( #array ){
my $var1 = shift #array;
my $var2 = shift #array;
my $var3 = shift #array;
# other variables from #array
# do things with $var1, $var2, $var3, ...
}
PS: Using a working copy of the array to that it is preserved for use later:
if( my #working_copy = #array ){
while( #working_copy ){
my $var1 = shift #working_copy;
my $var2 = shift #working_copy;
my $var3 = shift #working_copy;
# other variables from #working_copy
# do things with $var1, $var2, $var3, ...
}
}
PPS: another way is to use indexing. Of course, that is a sure sign that the data structure is wrong. It should be an array of arrays (AoA) or an array of hashes (AoH). See perldoc perldsc and perldoc perllol.
my $i = 0;
while( $i < #array ){
my $var1 = $array[ $i++ ];
my $var2 = $array[ $i++ ];
my $var3 = $array[ $i++ ];
# other variables from #array
# do things with $var1, $var2, $var3, ...
}
PPPS: I've been asked to clarify why the data structure is wrong. It is a flatten set of tuples (aka records aka datasets). The tuples are recreated by counting of the number of data for each. But what is the reader constructing the set has a bug and doesn't always get the number right? If, for a missing value, it just skips adding anything? Then all the remaining tuples are shifted by one, causing the following tuples to be grouped incorrectly and therefore, invalid. That is why an AoA is better; only the tuple with the missing data would be invalid.
But an better structure would be an AoH. Each datum would access by a key. Then new or optional data can be added without breaking the code downstream.
While I'm at it, I'll add some code examples:
# example code for AoA
for my $tuple ( #aoa ){
my $var1 = $tuple->[0];
my $var2 = $tuple->[1];
my $var3 = $tuple->[2];
# etc
}
# example code for AoH
for my $tuple ( #aoh ){
my $var1 = $tuple->{keyname1};
my $var2 = $tuple->{key_name_2};
my $var3 = $tuple->{'key name with spaces'};
my $var4 = $tuple->{$key_name_in_scalar_variable};
# etc
}
Here is a module-less way to "loop" by an arbitrary value ($by) and output the resulting group of elements using an array slice:
#!perl -l
#array = "1".."6";
$by = 3; $by--;
for (my $i = 0 ; $i < #array ; $i += $by ) {
print "#array[$i..$i+$by]";
$i++ ;
}
As a one-liner to test (cut and paste to a Unix shell):
perl -E '#array = "1".."6"; $by = 3; $by--;
for (my $i = 0 ; $i < #array ; $i += $by ) {
say "#array[$i..$i+$by]"; $i++ }'
Output:
1 2 3
4 5 6
If you make $by = 2; it will print pairs of numbers. To get at specific elements of the resulting slice access it as an anonymous array: (e.g. [#array[$i..$i+$by]]->[1]).
See also:
How do I read two items at a time in a Perl foreach loop?
Perl way of iterating over 2 arrays in parallel
Some good responses there, including reference to natatime which is quite easy to use. It's easy to implement too - it is essentially a wrapper around the splice solutions mentioned in the responses here.
The following is not the nicest example, but I've been using autobox::Core and made an #array->natatime() "method" ;-) like this:
use autobox::Core ;
sub autobox::Core::ARRAY::natatime {
my ($self, $by) = #_;
my #copy = #$self ;
my #array ;
push #array, [splice (#copy, 0, $by) ] while #copy ;
if ( not defined wantarray ) {
print "#{ $_ } \n" for #array ;
}
return wantarray ? #array : \#array;
}
The #copy array is spliced destructively, but $self (which is how the #array in front of the autobox method -> arrow gets passed to the function) is still there. So I can do:
my #dozen = "1" .. "12" ; # cakes to eat
#dozen->natatime(4) ; # eat 4 at time
my $arr_ref = #dozen->natatime(4) ; # make a reference
say "Group 3: #{ $arr_ref->[2] }" ; # prints a group of elements
say scalar #dozen , " cakes left" ; # eat cake; still have it
Output:
1 2 3 4
5 6 7 8
9 10 11 12
Group 3: 9 10 11 12
12 cakes left
One other approach that also uses a CPAN module (I gave this answer elsewhere but it is worth repeating). This can also be done non-destructively, with Eric Strom's excellent List::Gen module:
perl -MList::Gen=":all" -E '#n = "1".."6"; say "#$_" for every 2 => #n'
1 2
3 4
5 6
Each group of elements you grab is returned in an anonymous array so the individual values are in: $_->[0] $_->[1] ... etc.
You mentioned Perl6, which handles multiple looping values nicely:
my #qarr = 1 .. 6;
my ($x, $y, $z) ;
for #qarr -> $x , $y , $z { say $x/$y ; say "z = " ~ $z }
Output:
0.5
z = 3
0.8
z = 6
For more on the Perl6 approach see: Looping for Fun and Profit from the 2009 Perl6 Advent Calendar, or the Blocks and Statements Synopsis for details. Perhaps Perl 5 will have a similar "loop by multliple values" construct one day - à la perl5i's foreach :-)

Perl array element manipulation

I've been trying and trying with this one, but it just doesn't seem to click.
If I have an array with let's say 6 numbers:
#a = (1,2,3,4,5,6)
How do I get every second index ( 2, 4, 6) in this case?
how do I compute the difference of every two elements, so
the output here would be:
1 1 1 (because 2-1 =1 and 4-3 =1 and so on..)
Note: don't ever use $a or $b, they're special (sort uses them) ... it's generally better to give your variables a descriptive name, name it as to what's in there rather than what type of variable it is.
for ( my $index = 0; $index < scalar( #pairs ); $index += 2 ) {
my $first = $pairs[ $index + 0 ];
my $second = $pairs[ $index + 1 ];
my $pair = $index / 2;
my $difference = $second - $first;
print "the difference of pair $pair is $difference\n";
}
I think you should post your earlier attempts. In my opinion, the best way to learn is to learn from your mistakes, not being presented a correct solution.
For this problem, I think I would use a C-style for-loop for the first part, simply because it is straightforward, and can easily be tweaked if some new requirement comes up.
The second problem can easily be solved using a regular Perl-style for-loop.
use strict;
use warnings; # always use these two pragmas
my #nums = 1..6;
my #idx;
for (my $n = 0; $n <= $#nums; $n += 2) { # loop from 0 to max index, step 2
push #idx, $n; # store number in #idx
}
print "Indexes: #idx\n";
my #diff;
for my $n (0 .. $#nums - 1) { # loop from 0 to max index minus 1
push #diff, $nums[$n + 1] - $nums[$n]; # store diff in #diff
}
print "Diff: #diff\n";
Output:
Indexes: 0 2 4
Diff: 1 1 1 1 1
Try this:
use strict;
use warnings;
my $index = 1;
my #a = (1,2,3,4,5,6);
for (#a) {
if ($index % 2 == 0) {
my $diff = $_ - $a[$index-2];
print $diff;
}
$index++;
}
You likely want to use the new List::Util pair functions.
For your first question:
use List::Util 'pairvalues';
my #seconds = pairvalues #list; # yields (2, 4, 6)
For your second question:
use List::Util 'pairmap';
my #diffs = pairmap { $b-$a } #list; # yields (1, 1, 1)
You can use map:
my #a = 1 .. 6;
print join ' ', 'Every second:', map $a[ 1 + $_ * 2 ], 0 .. $#a / 2;
print "\n";
print join ' ', 'Differences:', map $a[ 1 + $_ * 2 ] - $a[ $_ * 2 ], 0 .. $#a / 2;
print "\n";
First: Don't use variables a and b. $a and $b are special variables used in sorting. Just be a bit more descriptive of your variables (even if it's merely #my_array) and you should be fine.
You can loop through your array any which way you like. However, I prefer to use a while loop instead of the thee part for because the three part for loop is a bit misleading. It is a while loop in disguise and the promised indexing of the loop can be misleading.
#! /usr/bin/env perl
use warnings;
use strict;
use feature qw(say);
my #array = qw( 1 2 3 4 5 6 );
my $index = 1; # Remember Perl indexes start at zero!
while ( $index <= $#array ) {
say "Item is $array[$index]";
say "The difference is " . ($array[$index] - $array[$index-1]);
$index += 2;
}
You said every second element. Indexes of arrays start at 0, so you want the odd number elements. Most of the answers use map which is a very nice little command, but does an awful lot in a single line which can make it confusing for a beginner. Plus, I don't think the Perldoc on it is very clear. There should be more simple examples.
The say is a newer version of print. However say always adds a \n at the end. You should always use strict; and use warnings;. These will catch about 90% of your programming bugs.
The qw( ... ) is a quick way to make an array. Each word becomes an array element. You don't need quotes or commas.
#!/usr/bin/perl
use strict;
use warnings;
my #ar = (1, 2, 3, 4, 5, 6);
# 1. How do I get every second index ( 2, 4, 6) in this case?
my #even = map { $_ & 1 ? $ar[$_] : () } 0 .. $#ar;
# 2. how do I compute the difference of every two elements?
my (#c, #diff) = #ar;
push #diff, -1 * (shift(#c) - shift(#c)) while #c;
use Data::Dumper;
print Dumper \#even;
print Dumper \#diff;
1;
__END__
$VAR1 = [
2,
4,
6
];
$VAR1 = [
1,
1,
1
];

Loop over one dimension of a multi-dimensional array in Perl using for each

* UPDATED* for typos
Another PERL question.... I am trying to loop through a 2D array. I am positive about the size of one dimension but unsure on the second. The code snippet:
foreach my $value (#surfaces[1])
{
my $sum = 0;
my $smallest = 9999;
my $limit_surface = 0;
for (my $i = 0; $i < 3; $i++)
{
$sum += $surfaces[$i][$counter];
if ($surfaces[$i][$counter] <= $smallest)
{
$smallest = $surfaces[$i][$counter];
$limit_surface = $subchannel_number[$i];
}
}
$counter++;
push(#avg_value,$sum/#rodsurface_number);
push(#limiting_schan,$limit_surface);
push(#limiting_value,$smallest);
}
I am compiled but $value variable is failing to initialize.
Repeat after me:
Perl does not have multidimensional arrays
Perl does not have multidimensional arrays
Perl does not have multidimensional arrays
What Perl does have is have are arrays that contain references pointing to other arrays. You can emulate multidimensional arrays in Perl, but they are not true multidimensional arrays. For example:
my #array;
$array[0] = [ 1, 2, 3, 4, 5 ];
$array[1] = [ 1, 2, 3 ];
$array[2] = [ 1, 2 ];
I can talk about $array[0][1], and $array[2][1], but while $array[0][3] exists, $array[2][3] doesn't exist.
If you don't understand references, read the tutorial on references.
What you need to do is go through your array and then find out the size of each subarray and go through each of those. There's no guarantee that
The reference contained in your primary array actually points to another array:
That your sub-array contains only scalar data.
You can use the $# operator to find the size of your array. For example $#array is the number of items in your array. You an use ( 0..$#array ) to go through each item of your array, and this way, you have the index to play around with.
use strict;
use warnings;
my #array;
$array[0] = [ 1, 2, 3, 4, 5 ];
$array[1] = [ 1, 2, 3 ];
$array[2] = [ 1, 2, ];
#
# Here's my loop for the primary array.
#
for my $row ( 0..$#array ) {
printf "Row %3d: ", $row ;
#
# My assumption is that this is another array that contains nothing
# but scalar data...
#
my #columns = #{ $array[$row] }; # Dereferencing my array reference
for my $column ( #columns ) {
printf "%3d ", $column;
}
print "\n";
}
Note I did my #columns = #{ $array[$row] }; to convert my reference back into an array. This is an extra step. I could have simply done the dereferencing in my for loop and saved a step.
This prints out:
Row 0: 1 2 3 4 5
Row 1: 1 2 3
Row 2: 1 2
I could put some safety checks in here. For example, I might want to verify the size of each row, and if one row doesn't match the other, complain:
my $row_size = $array[0];
for my $row ( 1..$#array ) {
my #columns = #{ $array[$row] };
if ( $#columns ne $array_size ) {
die qq(This is not a 2D array. Not all rows are equal);
}
}
You do not describe your data structure, nor explain exactly what you want to do with it. This limits the advice that we can give to just the general variety.
If you're trying to iterate over an array of arrays, I would advise you to do it based off of element instead of index.
For example, below I have a 4 by 5 matrix of integers. I would like to find the average of these values. One way to do this is to simply iterate over each row and then column, and add up the values:
use strict;
use warnings;
my #AoA = (
[11, 12, 13, 14, 15],
[21, 22, 23, 24, 25],
[31, 32, 33, 34, 35],
[41, 42, 43, 44, 45],
);
my $sum = 0;
my $count = 0;
for my $row (#AoA) {
for my $element (#$row) { # <-- dereference the array ref
$sum += $element;
$count++;
}
}
print "Average of Matrix is " . ($sum / $count) . "\n";
Outputs:
Average of Matrix is 28
For more information on complex data structures, check out: Perl Data Structures Cookbook
I've set up some dummy variables and changed a few things around. This compiles and produces the results I show below.
This might not answer your question, but should allow you to copy and paste the code, run it yourself, edit the input and see how the output compares to what you want.
use warnings;
use strict;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my #surfaces = ( ['1','2','3'],
['10','20','30'],
['100','200','400'],
);
my #subchannel_number = ( ['1','2','3'],
['10','20','30'],
['100','200','400'],
);
my #rodsurface_number = (1 .. 10);
my $counter = 0;
my (#avg_value, #limiting_schan, #limiting_value);
foreach my $value ($surfaces[1]){
my $sum = 0;
my $smallest = 9999;
my $limit_surface = 0;
for (my $i = 0; $i < 3; $i++) {
$sum += $surfaces[$i][$counter];
if ($surfaces[$i][$counter] <= $smallest){
$smallest = $surfaces[$i][$counter];
$limit_surface = $subchannel_number[$i];
}
}
$counter++;
push(#avg_value,$sum/#rodsurface_number);
push(#limiting_schan,$limit_surface);
push(#limiting_value,$smallest);
}
print Dumper (\#avg_value, \#limiting_schan, \#limiting_value);
$VAR1 = [
'11.1'
];
$VAR2 = [
[
'1',
'2',
'3'
]
];
$VAR3 = [
1
];

split numeric array at specific positions

I am trying to split a numerical array into smaller arrays such that each of the smaller arrays cannot contain any numbers that differ.
Example: The array (2,2,2,2,2,9,3,3,3,3) should be split into the three arrays (2,2,2,2,2), (9) and (3,3,3,3).
Here is what I tried:
my #arr = (2,2,2,2,2,9,3,3,3,3);
my #result = ();
my $last = -1;
my #newarr = ();
for my $i (0 .. $#arr){
if ( ($i>0 && $arr[$i] != $last) || $i == $#arr ){
push #result, \#newarr;
#newarr = ();
}
$last = $arr[$i];
push #newarr, $arr[$i];
}
Firstly, this code does not give me the desired result. I think my mistake is when I push the reference to #newarr into #result, but then I re-initialize #newarr.
Secondly, aren't there more elegant ways to do this? I looked at the functions split and splice, but could not think of a good solution.
List::MoreUtils has the "part" function:
use Data::Dumper;
use feature 'state';
use List::MoreUtils 'part';
my #array = ( 2,2,2,2,2, 9, 3,3,3,3 );
my #part = part {
state $prev;
state $i = -1;
$i++ if !defined($prev) || $_ ne $prev;
$prev = $_;
$i
} #array;
print Dumper #part;
With 'part', the value that the code block returns dictates the top level array index where the current value will be pushed into an anonymous array. $prev starts out undefined, so the first element in the input will trigger $i to increment to 0, so all of the '2's will end up in #{$part[0]}. As soon as an element in #array doesn't match $prev, the index will be incremented, and subsequent elements end up in #{$part[1]}. Each time a change is detected, a new grouping starts.
Update:
If this segment of code might be used more than once, the 'state' variables will persist their values across calls. In such a case, state is more trouble than it's worth, and one should just use lexicals in a subroutine:
use Data::Dumper;
use List::MoreUtils 'part';
my #array = ( 2,2,2,2,2, 9, 3,3,3,3 );
my #part = partition(#array);
print Dumper \#part;
sub partition {
my( $prev, $i ) = ( undef, -1 );
return part {
$i++ if ! defined($prev) || $_ ne $prev;
$prev = $_;
$i;
} #_;
}
Creating an array of arrays grouped by like elements.
For a refresher on complex data structures, check out perldsc.
use strict;
use warnings;
my #array = (2,2,2,2,2,9,3,3,3,3);
my #grouped;
for (#array) {
if (! #grouped || $grouped[-1][0] != $_) {
push #grouped, [];
}
push #{$grouped[-1]}, $_;
}
use Data::Dump;
dd #grouped;
Outputs:
([2, 2, 2, 2, 2], [9], [3, 3, 3, 3])
use List::Util 'reduce';
my #arr = (2,2,2,2,2,9,3,3,3,3);
my $result = reduce {
if ( #$a && $b == $a->[-1][0] ) {
push #{ $a->[-1] }, $b
}
else {
push #$a, [ $b ]
}
$a
} [], #arr;
Simpler but maybe more confusing to read:
my $result = reduce {
push #{ $a->[ #$a && $b == $a->[-1][0] ? -1 : #$a ] }, $b;
$a
} [], #arr;
my #arr = (2,2,2,2,2,9,3,3,3,3);
my %h;
my #newarr = map {
my $ok = !$h{$_};
push #{$h{$_}}, $_;
$ok ? $h{$_} : ();
}
#arr;
use Data::Dumper; print Dumper \#newarr;
or
my #arr = (2,2,2,2,2,9,3,3,3,3);
my %h;
my #newarr;
for my $v (#arr) {
if (!$h{$v}) {
push #newarr, ($h{$v} = []);
}
push #{$h{$v}}, $v;
}
output
$VAR1 = [
[
2,
2,
2,
2,
2
],
[
9
],
[
3,
3,
3,
3
]
];
Mandatory regex answer:
my #result = map [ (ord) x length ], grep --$|, join( '', map chr, #arr ) =~ /((.)\2*)/sg;
(under no warnings "non_unicode";).
This will do what you ask. It works by packing the contents of the data as a set of digits and counts and then unpacks it in the required format. The output data is in #num. I have used Data::Dump only to display the resulting data structure.
use strict;
use warnings;
my #arr = (2,2,2,2,2,9,3,3,3,3);
my (%rep, #num);
$rep{$_}++ or push #num, $_ for #arr;
#num = map [ ($_) x $rep{$_} ], #num;
use Data::Dump;
dd \#num;
output
[[2, 2, 2, 2, 2], [9], [3, 3, 3, 3]]
Update
The above solution collects all the elements with the same value into one group, even if they came from separate sequences. If you need the output arrays to be split at every change of value then this will do what you need.
use strict;
use warnings;
my #arr = (2,2,2,2,2,9,9,9,2,2,2,9,9,9);
my #groups;
for (#arr) {
push #groups, [ ] unless #groups and $_ == $groups[-1][-1];
push #{ $groups[-1] }, $_;
}
use Data::Dump;
dd \#groups;
output
[[2, 2, 2, 2, 2], [9, 9, 9], [2, 2, 2], [9, 9, 9]]
Update 2
Here's another version in view of your answer to ikegami's comment, which revealed that a list of values and their associated counts is probably closer to what you need.
use strict;
use warnings;
my #arr = (2,2,2,2,2,9,9,9,2,2,2,9,9,9);
my #groups;
for (#arr) {
if (#groups and $_ == $groups[-1][0]) {
$groups[-1][1] += 1;
}
else {
push #groups, [ $_, 1 ];
}
}
use Data::Dump;
dd \#groups;
output
[[2, 5], [9, 3], [2, 3], [9, 3]]
You can create a hash of arrays where by the key of the has will be a number. and for each time you encounter that number you can push it on to the array reference of the hash. thus all numbers will be split into arrays as you expected. you can then just iterate through the hash to print the arrays or access each array by its number.
use strict;
use Data::Dumper;
my #arr = (2,2,2,2,2,9,3,3,3,3);
my %hash;
push(#{$hash{$_}},$_) foreach (#arr);
print Dumper(\%hash);
output
$VAR1 = {
'3' => [
3,
3,
3,
3
],
'9' => [
9
],
'2' => [
2,
2,
2,
2,
2
]
};

Finding common elements in arrays

I have a hash whose values are arrays. I need to find the common elements of those arrays,
ie. the elements that are present in all the arrays. So I extracted the values of the hash into
a multidimensional array whose each row corresponds to an array in the hash. Then I took the first row
of this matrix into another array (#arr1) and iterated through it to find if there was any element
in arr1 that was also in the rest of the rows of the matrix. If such an element is found, it is
pushed onto another array that contains the final list of all the elements. The code is as follows
(I hope it is clear enough):
sub construct_arr(my %records) {
my $len = keys %records;
my #matrix;
my $i = 0;
# Extract the values of the hash into a matrix
foreach my $key (keys %records) {
$matrix[$i] = $records{$key};
$i++;
}
my #arr1 = $matrix[0];
my #final;
# Iterate through each element of arr1
for my $j (0..$#{$arr1[0]}) {
my $count = 1;
# Iterate through each row of the matrix, starting from the second
for ( my $i = 1; $i < $len ; $i++ ) {
my $flag = 0;
# Iterate through each element of the row
for my $k (0..$#{$matrix[$i]}) {
if ($arr1[0][$j] eq $matrix[$i][$k]) {
$flag = 1;
$count++;
}
}
# On finding the first instance of the element in a row, go to the next row
if (!$flag == 1) {
last;
}
}
# If element is in all the rows, push it on to the final array
if ($count == $len) {
push(#final, $arr1[0][$j]);
}
}
return #final;
}
I know that the above works, but I would like to know if there is any other (perlish) way to do this.
I am starting to learn perl and I am very interested in knowing things that could make my work easier
in perl as compared to other languages. If my code is the best that can be done, please let me know that
too. Any guidance would be appreciated. Thanks!
Take a look at Chris Charley's link for calculating the intersection of arrays.
Hashes are the clear way to go for problems like this. Together with map and grep a solution can be reduced to just a few lines.
This program uses sundar's data for want of anything better, and seems to do what you need.
use strict;
use warnings;
my %records = (
a => [ qw/ A B C / ],
b => [ qw/ C D E A / ],
c => [ qw/ A C E / ],
);
print "$_\n" for construct_arr(\%records);
sub construct_arr {
my $records = shift;
my %seen;
$seen{$_}++ for map #$_, values %$records;
grep $seen{$_} == keys %$records, keys %seen;
}
output
A
C
Edit
I thought it may help to see a more Perlish, tidied version of your own solution.
use strict;
use warnings;
my %records = (
a => [ qw/ A B C / ],
b => [ qw/ C D E A / ],
c => [ qw/ A C E / ],
);
print "$_\n" for construct_arr(\%records);
sub construct_arr {
my $records = shift;
my #matrix = values %$records;
my #final;
# iterate through each element the first row
for my $i ( 0 .. $#{$matrix[0]} ) {
my $count = 1;
# look for this value in all the rest of the rows, dropping
# out to the next row as soon as a match is found
ROW:
for my $j ( 1 .. $#matrix ) {
for my $k (0 .. $#{$matrix[$j]}) {
next unless $matrix[0][$i] eq $matrix[$j][$k];
$count++;
next ROW;
}
}
# If element is in all the rows, push it on to the final array
push #final, $matrix[0][$i] if $count == #matrix;
}
return #final;
}
The output is the same as for my own program, but the functionality is slightly different as mine assumes the values in each row are unique. If the sama value appears more than once my solution will break (the same applies to sundar's). Please let me know if that is acceptable.
Although the poster explained there aren't duplicates within a single array, here is my attempt which handles that case too (notice the slightly modified test data - "5" should not be printed):
#!/usr/bin/env perl
use warnings;
use strict;
my %records = (
a => [1, 2, 3],
b => [3, 4, 5, 1],
c => [1, 3, 5, 5]
);
my %seen;
while (my ($key, $vals) = each %records) {
$seen{$_}{$key} = 1 for #$vals;
}
print "$_\n" for grep { keys %{$seen{$_}} == keys %records } keys %seen;
You can find the size of the hash easily using scalar(keys %hash);
Here's an example code that does what you need:
#!/usr/bin/perl
use strict;
use warnings;
my %records = ( a => [1, 2, 3],
b => [3, 4, 5, 1],
c => [1, 3, 5]
);
my %count;
foreach my $arr_ref (values %records) {
foreach my $elem (#$arr_ref) {
$count{$elem}++;
}
}
my #intersection;
my $num_arrays = scalar(keys %records);
foreach my $elem (keys %count) {
#If all the arrays contained this element,
#allowing for multiple entries per array
if ($count{$elem} >= $num_arrays) {
push #intersection, $elem;
}
}
Feel free to comment if you need any clarification in this code. And the second foreach that constructs the #intersection array is written this way only for clarity - if you're learning Perl, I'd suggest you study and rewrite it using the map construct, since that's arguably more idiomatic Perl.

Resources