grep multiple pattern in perl array at a time

grep multiple pattern in perl array at a time - arrays

Below is the code which actually finds a pattern in a perl array.
my $isAvailable = grep { $_->[0] eq '12345' } {$filteredTableEntriesMap{$REPORT_PART1}} ;
But i would like to search for two patterns in two indexes at a time
my $isWiuAvailable = grep { $_->[0] eq '12345' } #{$filteredTableEntriesMap{$REPORT_PART1}} ;
my $isBsAvailable = grep { $_->[1] eq '6789' } #{$filteredTableEntriesMap{$REPORT_PART1}} ;
This is how the map is represented
$VAR1 = {
'REPORT PART2' => [],
'REPORT PART1' => [
[
'12345',
'6789',
],
[
'343435',
'315',
],
[
'00103',
'000315',
],
]
And i would want to match an array which has these two entries in index 1 and index 2
Thanks

You can combine the two conditions into one expression.
my #found = grep { $_->[0] eq '12345' && $_->[1] eq '6789' }
#{$filteredTableEntriesMap{$REPORT_PART1}};
The stuff inside the {} for grep is basically a subroutine. You can do as much as you want in there as long as you return a true value if you want to keep $_ in your #found result.

Related

Perl sorting arrayref

So I have this Perl code:
$array->[0][0] = "cc";
$array->[0][1] = "3";
$array->[1][0] = "aaaa";
$array->[1][1] = "2";
$array->[2][0] = "bb";
$array->[2][1] = "1";
And I need it sorted in alphabetical order (second column) so that $array->[0][0] is "aaaa" and $array->[0][1] is "2"
I must have been asleep during Programming 101 in the 90's. I've spent hours trawling code and tutorials on the net and just can't get it. Can someone provide me with some sample code please. thanks!

Just sort the dereferenced array by the first element:
$array = [ sort { $a->[0] cmp $b->[0] } #$array ];
or
#$array = sort { $a->[0] cmp $b->[0] } #$array;
Returns:
[ [ 'aaaa', '2' ],
[ 'bb', '1' ],
[ 'cc', '3' ] ]

If you can reach into CPAN, use the sort_by function provided by List::UtilsBy (or via List::AllUtils)
use List::AllUtils 'sort_by';
$array = [ sort_by { $_->[0] } #$array ];
... or alternatively using Sort::Key
use Sort::Key 'keysort';
$array = [ keysort { $_->[0] } #$array ];
Both achieve the same thing, but you should really try to get a modern version of List::AllUtils as it will save you from reinventing a lot of wheels.

Clean way to access a nested data structure

I have a segment of code that, although it works, does not look like a clean way to do things.
I build the structure using:
foreach my $n (#node_list)
{
chomp ($n);
foreach my $c (#cpes)
{
my #returned; #Interfaces to CPEs with MED settings
my #creturned; #General Customer Interfaces
my ($cust) = $c =~ /([a-zA-Z]+)[_-][a-zA-Z0-9]+/s;
print "\n\t\tCustomer is $cust\n";
chomp($c);
$c = uc $c;
my ($search) = $c;
(#returned) = `cat /curr/$n | grep "$search"`;
if (#returned)
{
my $cust_match = 'interface \"' . $cust;
(#creturned) = `cat /curr/$n | egrep -i "$cust_match" | grep -v "$search"`;
}
if (#creturned) #Have we found other CPEs on the same router
{
my ($nf) = $n =~ /([a-zA-Z0-9-]+).cfg/s;
my (#interfaces) = map { /([A-Z0-9_]+)/s } #creturned;
#interfaces = uniq(#interfaces);
unshift (#interfaces, $c);
push (#new_out, {$nf => {$cust => [#interfaces]}});
}
}
This will return:
$VAR1 = [
{
'router-xx-xx' => {
'50000' => [
[
'THXXXXVF_NLXXXX40_1121_2',
'10x.xx.x.50'
],
[
'THXXXPVF_NLXXXX66_1121_1',
'10x.xx.x.70'
],
[
'THXXXXVF_NLXXXX67_1121_2',
'10x.xx.x.78'
],
}
},
Each router can have a number of VPRNs and each VPRN can contain multiple interfaces. In the example above I've shown one router with one VPRN.
However, when it comes to accessing elements in the above, I've written the following convoluted (but working) code:
foreach my $candidate (#nodes)
{
my %node = %{ $candidate };
foreach my $n (keys %node)
{
print "\nRouter is $n\n";
foreach my $cust (keys %{ $node{$n} })
{
print "Customer on $n is \n" . Dumper $cust;
my #intlist = #{$node{$n}{$cust}};
my $med_cpe = $intlist[0]; #the CPE that was used to find node
{truncated}
}
}
}
}

You don't explain exactly what you find "convoluted" about the traversal code, but you have made it unnecessarily complex by duplicating data into #intlist and %node. The excessive and inconsistent indentation also makes it ungainly
I would write something closer to this
for my $node ( #nodes ) {
for my $n ( keys %$node ) {
print "\nRouter is $n\n";
for my $cust ( keys %{ $node->{$n} } ) {
print "Customer on $n is \n" . Dumper \$cust;
my $med_cpe = $node->{$n}{$cust}[0];
}
}
}
If you don't need the values of $node and $n except to access $med_cpe then you don't need a nested data structure at all: a simple array is fine. On the face of it, an array like this will do what you need
[
[
'router-xx-xx',
'50000',
'THXXXXVF_NLXXXX40_1121_2',
'10x.xx.x.50',
],
[
'router-xx-xx',
'50000',
'THXXXPVF_NLXXXX66_1121_1',
'10x.xx.x.70',
],
...
]

Convert an array of strings into a array of arrays of strings

My goal is to convert this
my #array=("red", "blue", "green", ["purple", "orange"]);
into this
my #array = ( ["red"], ["blue"], ["green"], ["purple", "orange"]);
Current test code:
my #array = ("red", "blue", "green", ["purple", "orange"] );
foreach $item ( #array ) {
#if this was Python, it would be as simple as:
# if is not instance(item, array):
# # item wasn't a list
# item = [item]
if(ref($item) ne 'ARRAY'){
#It's an array reference...
#you can read it with $item->[1]
#or dereference it uisng #newarray = #{$item}
#print "We've got an array!!\n";
print $item, "\n";
# keep a copy of our string
$temp = $item;
# re-use the variable but convert to an empty list
#item = ();
# add the temp-copy as first list item
#item[0] = $temp;
# print each list item (should be just one item)
print "$_\n" for $item;
}else{
#not an array in any way...
print "ALREADY an array!!\n";
}
}
# EXPECTED my #array=(["red"], ["blue"], ["green"], ["purple", "orange"]);
print #array , "\n";
foreach $item (#array){
if(ref($item) ne 'ARRAY'){
#
#say for $item;
print "didn't convert properly to array\n";
}
}

The comment about python maps pretty directly to perl.
my #array = ("red", "blue", "green", ["purple", "orange"] );
foreach $item ( #array ) {
#if this was Python, it would be as simple as:
# if is not instance(item, array):
# # item wasn't a list
# item = [item]
if (ref $item ne 'ARRAY') {
$item = [ $item ];
}
}
though using map as in Borodin's answer would be more natural.

I'm wondering why you want to do this, but it's
#array = map { ref ? $_ : [ $_ ] } #array
And please don't call arrays #array; that's what the # is for.
Your comment is ridiculous
#if this was Python, it would be as simple as:
# if is not instance(item, array):
# # item wasn't a list
# item = [item]
If you were familiar with Perl then you wouldn't need to ask the question. You must be aware that there is no one-to-one translation from Python to Perl. Python is much less expressive than either Perl or C, but I can't imagine you demanding a simple conversion to C.
Please get over your bigotry.

If you push the values to a new array, you don't need to do more than evaluate whether or not $item is an arrayref:
#! perl
use strict;
use warnings;
use Data::Dumper;
my #array=("red", "blue", "green", ["purple", "orange"]);
my #new_array;
foreach my $item (#array) {
if ( ref($item) eq 'ARRAY' ) {
push #new_array, $item;
}
else {
push #new_array, [$item];
}
}
print Dumper \#new_array;
Output from Dumper:
$VAR1 = [
[
'red'
],
[
'blue'
],
[
'green'
],
[
'purple',
'orange'
]
];

After a long day of learning more Perl than I ever thought/wanted to learn... here's what I think is a workable solution:
#! perl
use strict;
use warnings;
use Data::Dumper;
my %the_dict = (duts =>
{dut_a => {UDF => 'hamnet'},
dut_b => {UDF => [ '1', '2', '3', ]},
dut_c => {UDF => [ 'Z' ], }});
print Dumper \%the_dict;
foreach my $dut (keys %{$the_dict{duts}}) {
# convert the dut's UDF value to an array if it wasn't already
if ( 'ARRAY' ne ref $the_dict{duts}->{$dut}{UDF} ) {
$the_dict{duts}->{$dut}{UDF} = [ $the_dict{duts}->{$dut}{UDF} ];
}
# now append a new value to the array
push(#{$the_dict{duts}{$dut}{UDF}}, 'works');
}
print Dumper \%the_dict;
when run we see these print-outs:
$VAR1 = {
'duts' => {
'dut_a' => {
'UDF' => 'hamnet'
},
'dut_c' => {
'UDF' => [
'Z'
]
},
'dut_b' => {
'UDF' => [
'1',
'2',
'3'
]
}
}
};
$VAR1 = {
'duts' => {
'dut_a' => {
'UDF' => [
'hamnet',
'works'
]
},
'dut_c' => {
'UDF' => [
'Z',
'works'
]
},
'dut_b' => {
'UDF' => [
'1',
'2',
'3',
'works'
]
}
}
};

PERL dynamically match arrays based on unique key

I'm trying to compare 2 huge arrays and want to use map. I am using unique key concept here to match the indexes.
My Arrays:
my #array1 = ( ['a','b','c','d'], ['e','f','g','h'], ['i','j','k','l'], ['m','n','o','p'], ['q','r','s','t']);
my #array2 = ( ['r','q','s','t'], ['b','a','c','d'], ['n','m','o','p'], ['f','e','g','h'], ['j','i','k','l']);
My unique Keys:
my #uk1 = (0,2,3);
my #uk2 = (1,2,3);
These arrays will be huge in size, over 30,000 indexes in each with over 20 elements in each index.
So effectively i create a map where
for ( my $j = 0; $j <= $#array1 ; $j++ )
{
my searchString;
for ( my $k = 0; $k <= $#uk1; $k++ )
{
if ( $k != 0 )
{
$searchString .= ","
}
$my searchString .= $array1[$j][$uk[$k];
}
my #result = map { $_ }
grep { join (",",$array2[$_][1],$array2[$_][2],$array2[$_][3]) ) =~ join(",",$array1[$j][0],$array1[$j][1],$array1[$j][2]) }
0 .. $#array;
}
returns matched indexes.
My problem is, how do i make this dependant on the unique keys? as the length of the unique key will keep changing and as far as i know i cannot dynamically create the $array2[$_] join part.
Hope my question is clear enough.
I want to have the logic that compares
$array1[$uk1[0]],$array1[$uk1[1]],$array1[$uk1[2]] and so on (depending on the number of keys in UK) with
$array2[$uk2[0]],$array2[$uk2[1]],$array2[$uk2[2]].......

Perhaps,
my #array1 = ( ['a','b','c','d'], ['e','f','g','h'], ['i','j','k','l'], ['m','n','o','p'], ['q','r','s','t']);
my #array2 = ( ['r','q','s','t'], ['b','a','c','d'], ['n','m','o','p'], ['f','e','g','h'], ['j','i','k','l']);
my #result;
for my $i (0 .. $#array1) {
push #result,
map { [$i, $_] }
grep {
"#{ $array1[$i] }[1,2,3]" eq "#{ $array2[$_] }[0,2,3]"
}
0 .. $#array2;
}
use Data::Dumper; print Dumper \#result;
output
$VAR1 = [
[
0,
1
],
[
1,
3
],
[
2,
4
],
[
3,
2
],
[
4,
0
]
];

What you want to use is an array slice:
But lets also make life easier:
for my $sample ( #array1 )
{
my $test= join(",", #$sample[#uk1]) ;
my #result = grep { $_ eq $test } map { join(",", #$_[#uk2] ) } #array2 ;
say "huzzah" if #result ;
}
Perl lets you specify multiple elements from an array via the "array slice":
my #list= ('a', 'b', 'c', 'd') ;
my #pieces= #list[1,3] ;

HoHOA behaves differently when values are pushed

I have the following data:
eya XLOC_000445_Change:10.3_q:0.003 atonal1
six XLOC_00099_Change:70.0_q:0.095 atonal1
six-eya XLOC_0234324_Change:19.8_q:0.05 atonal1
eya XLOC_00010_Change:6.5_q:0.22 c-myc
six XLOC_025437_Change:1.1_q:0.018 c-myc
six-eya XLOC_001045_Change:2.3_q:0.0001 c-myc
eya XLOC_000115_Change:7.3_q:0.03 ezrin
six XLOC_000001_Change:7.9_q:0.00006 ezrin
six-eya XLOC_0234322_Change:9.0_q:0.0225 ezrin
six-eya XLOC_091345_Change:9.3_q:0.005 slc12a2
eya XLOC_000445_Change:9.9_q:0.3 atonal1
six XLOC_00099_Change:7.0_q:0.95 atonal1
six-eya XLOC_0234324_Change:9.8_q:0.5 atonal1
And have tried building a HoHoA as follows:
#!/usr/bin/perl
use warnings;
use strict;
Method 1: Pushing array values onto HoH:
while (<$input>) {
chomp;
push #xloc, $1 if ($_ =~ /(XLOC_\d+)/);
push #change_val, $1 if ($_ =~ /Change:(-?\d+\.\d+|-?inf)/);
push #q_value, $1 if ($_ =~ /q:(\d+\.\d+)/);
my #split = split('\t');
push #condition, $split[0];
push #gene, $split[2];
}
push #{ $experiment{$gene[$_]}{$condition[$_]} }, [ $xloc[$_], $change_val[$_], $q_value[$_] ] for 0 .. $#change_val;
Method 2: Assigning values to HoHoA on the fly:
while (<$input>) {
chomp;
my $xloc = $1 if ($_ =~ /(XLOC_\d+)/);
my $change = $1 if ($_ =~ /Change:(-?\d+\.\d+|-?inf)/);
my $q_value = $1 if ($_ =~ /q:(\d+\.\d+)/);
my #split = split('\t');
my $condition = $split[0];
my $gene = $split[2];
$experiment{$gene}{$condition} = [ $xloc, $change, $q_value ];
}
Both work fine - insofar as I get the data structure I want. However, only the first method (pushing) ensures that genes that exist as duplicates (in this case atonal1) are represented twice in the HoHoA.
My downstream code was originally made to handle HoHoA built in the second fashion, and I can't for the life of me work out why both approaches are handled differently in the follwing code:
Downstream code:
my (%change, %seen, $xloc, $change_val, $q_value);
for my $gene (sort keys %experiment) {
for my $condition (sort keys %{$experiment{$gene}}) {
$seen{$gene}++; # Counts for each occurrence of gene
if ( (not exists $change{$gene}) || (abs $change{$gene} < abs $experiment{$gene}{$condition}[1]) ) { # Has a larger change value
$change{$gene} = $experiment{$gene}{$condition}[1];
}
}
}
print Dumper \%change;
When I run the above code on either approach I get:
Output for method 1:
$VAR1 = {
'atonal1' => [
'XLOC_0234324',
'9.8',
'0.5'
],
'c-myc' => undef,
'ezrin' => undef,
'slc12a2' => undef,
};
Output for method 2:
$VAR1 = {
'atonal1' => '9.9', # i.e. the largest change value for each condition/gene
'c-myc' => '6.5',
'ezrin' => '9.0',
'slc12a2' => '9.3',
};
What I want is:
$VAR1 = {
'atonal1' => [
'9.9',
'70.0' # This is the difference - i.e the both values are added to the hash `%change`
],
'c-myc' => '6.5',
'ezrin' => '9.0',
'slc12a2' => '9.3',
};
I have no idea what's creating the difference
UPDATE
I'll post the Dumper output for %experiment after values have been pushed on using Method 1:
$VAR1 = {
'atonal1' => {
'eya' => [
[
'XLOC_000445',
'10.3',
'0.003'
],
[
'XLOC_000445',
'9.9',
'0.3'
]
],
'six' => [
[
'XLOC_00099',
'70.0',
'0.095'
],
[
'XLOC_00099',
'7.0',
'0.95'
]
],
'six-eya' => [
[
'XLOC_0234324',
'19.8',
'0.05'
],
[
'XLOC_0234324',
'9.8',
'0.5'
]
]
},
'c-myc' => {
'eya' => [
[
'XLOC_00010',
'6.5',
'0.22'
]
],
'six' => [
[
'XLOC_025437',
'1.1',
'0.018'
]
],
'six-eya' => [
[
'XLOC_001045',
'2.3',
'0.0001'
]
]
},
'ezrin' => {
'eya' => [
[
'XLOC_000115',
'7.3',
'0.03'
]
],
'six' => [
[
'XLOC_000001',
'7.9',
'0.00006'
]
],
'six-eya' => [
[
'XLOC_0234322',
'9.0',
'0.0225'
]
]
},
'slc12a2' => {
'six-eya' => [
[
'XLOC_091345',
'9.3',
'0.005'
]
]
},
};

Let's take your data and reformat it a bit. I'm not saying this is the way you need to format your data. I'm just doing it this way to get a better understanding of what it represents:
GENE XLOC CHANGE Q VALUE CONDITION
======== ==================== ======= ======== ==========
eya XLOC_000445_Change: 10.3_q: 0.003 atonal1
six XLOC_00099_Change: 70.0_q: 0.095 atonal1
six-eya XLOC_0234324_Change: 19.8_q: 0.05 atonal1
eya XLOC_00010_Change: 6.5_q: 0.22 c-myc
six XLOC_025437_Change: 1.1_q: 0.018 c-myc
six-eya XLOC_001045_Change: 2.3_q: 0.0001 c-myc
eya XLOC_000115_Change: 7.3_q: 0.03 ezrin
six XLOC_000001_Change: 7.9_q: 0.00006 ezrin
six-eya XLOC_0234322_Change: 9.0_q: 0.0225 ezrin
six-eya XLOC_091345_Change: 9.3_q: 0.005 slc12a2
eya XLOC_000445_Change: 9.9_q: 0.3 atonal1
six XLOC_00099_Change: 7.0_q: 0.95 atonal1
six-eya XLOC_0234324_Change: 9.8_q: 0.5 atonal1
Are my column assumptions correct?
First, I recommend that you use split to split up your values instead of regular expressions. Do this on a line-by-line basis. Perl is pretty efficient at optimization. 90% of programming is debugging and supporting your program. Trying to be efficient by compressing multiple steps into a single step just make things harder to understand with very little value returned in optimization.
Let's take each line, and break it out:
while ( my $line = <$input> ) {
chomp $line;
my ( $gene, $more_data, $condition ) = split /\s+/, $line;
At this point:
$gene = 'eye'
$more_data = 'XLOC_000445_Change:10.3_q:0.003';
$condition = 'atonal1` # I'm not sure what this is...
Now, we can split out $more_data:
my ( $xloc, $change, $q_value ) = split /:/, $more_data;
$xloc =~ s/^XLOC_//;
$change =~ s/_q$//;
Now we have:
$xloc = '000445';
$change = '10.3';
$q_value = '0.003';
Does this make more sense?
One of your problems is you're attempting to store data in a very, very complex structure without really thinking about what that data represents.
Let's say your data is this:
a gene may contain multiple conditions,
Each gene-condition combination can have a result.
This result contains an xloc, q_value, and change.
That means your data should look like this:
$experiment{$gene}->{$condition}->{XLOC} = $xloc;
$experiment{$gene}->{$condition}->{Q_VALUE} = $q_value;
$experiment{$gene}->{$condition}->{CHANGE} = $change;
However, I see gene = eya, condition = atonal1 twice in your list. Maybe you need something more along the lines of this:
a gene may contain multiple conditions,
Each gene-condition combination can have multiple results.
Each result contains an xloc, q_value, and change.
If that's the case, your data structure should look something like this:
$experment{$gene}->{$condition}->[0]->{XLOC} = $xloc;
$experment{$gene}->{$condition}->[0]->{Q_VALUE} = $q_value;
$experment{$gene}->{$condition}->[0]->{CHANGE} = $change;
This isn't an answer. I'm just trying to get a handle on what your data is and what you are trying to store in that data. Once we have that settled, I can help you with the rest of your program.
Let me know if my understanding of what your data represents is accurate. Add a comment to this answer, and update your question a bit.
Once I know I'm on the right track, I'll show you how you can more easily manage this structure and keep track of everything in it.
Now that I knew I was on the right track, the solution was fairly simple: Object Oriented Programming!
Let me explain: Each experiment consists of a Gene-Condition pair. This is what I key my experiments on.
I create a Local::Condition for each of these Gene-Condition pairs. Inside of this I store my array of results.
My results contain three item.
Behold: The Answer!
What I decided to do is to create a results object. This object contains the XLoc, Change, and Q Value of that result. By packing my results into an object, I have fewer issues trying to keep track of it.
So what we have is this:
We have experiments.
Each _experiment consists of a gene/condition pair which we key our experiments on.
Each gene/condition experiment consists of an array of results.
Now, it's a lot easier to keep track of what is going on. For each line, I create a Local::Result type of object that contains the set of results of that gene/condition pair.
So, all I have to do is push my results onto that gene/condition array which represents my set of results
#
# Create a Result for this experiment
#
my $result = Local::Result->new( $xloc, $change, $q_value );
#
# Push this result onto your $gene/condition experiment
#
push #{ $experiments{$gene}->{$condition} }, $result;
Note my syntax here is very spelled out. I have a hash called %experiments that are keyed by Genes. Each gene contains Conditions for that gene. This Gene/Condition pair is an array of results.
Object Oriented syntax can be a bit complex to get use to, but there is an excellent tutorial in the Perl documentation. By using object oriented programming, you group together details that you otherwise must track.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
use autodie;
use Data::Dumper;
my %experiments;
while ( my $line = <DATA> ) {
my ($condition, $more_data, $gene) = split /\s+/, $line;
my ($xloc, $change, $q_value) = split /:/, $more_data;
$xloc =~ s/^XLOC_(.*)_Change/$1/;
$change =~ s/_q$//;
my $result = Local::Result->new( $xloc, $change, $q_value );
push #{ $experiments{$gene}->{$condition} }, $result;
}
printf "%-10.10s %-10.10s %10.10s %-4s %-7s\n\n",
"Gene", "Condition", "XLoc", "Chng", "Q Value";
for my $gene ( sort keys %experiments ) {
for my $condition ( sort keys %{ $experiments{$gene} } ) {
for my $result ( #{ $experiments{$gene}->{$condition} } ) {
printf "%-10.10s %-10.10s %10.10s %-4.1f %-7.1f\n",
$gene, $condition, $result->xloc, $result->change, $result->q_value;
}
}
}
package Local::Result;
sub new {
my $class = shift;
my $xloc = shift;
my $change = shift;
my $q_value = shift;
my $self = {};
bless $self, $class;
$self->xloc($xloc);
$self->change($change);
$self->q_value($q_value);
return $self;
}
sub xloc {
my $self = shift;
my $xloc = shift;
if ( defined $xloc ) {
$self->{XLOC} = $xloc;
}
return $self->{XLOC};
}
sub change {
my $self = shift;
my $change = shift;
if ( defined $change ) {
$self->{CHANGE} = $change;
}
return $self->{CHANGE};
}
sub q_value {
my $self = shift;
my $q_value = shift;
if ( defined $q_value ) {
$self->{Q_VALUE} = $q_value;
}
return $self->{Q_VALUE};
}
package main;
__DATA__
eya XLOC_000445_Change:10.3_q:0.003 atonal1
six XLOC_00099_Change:70.0_q:0.095 atonal1
six-eya XLOC_0234324_Change:19.8_q:0.05 atonal1
eya XLOC_00010_Change:6.5_q:0.22 c-myc
six XLOC_025437_Change:1.1_q:0.018 c-myc
six-eya XLOC_001045_Change:2.3_q:0.0001 c-myc
eya XLOC_000115_Change:7.3_q:0.03 ezrin
six XLOC_000001_Change:7.9_q:0.00006 ezrin
six-eya XLOC_0234322_Change:9.0_q:0.0225 ezrin
six-eya XLOC_091345_Change:9.3_q:0.005 slc12a2
eya XLOC_000445_Change:9.9_q:0.3 atonal1
six XLOC_00099_Change:7.0_q:0.95 atonal1
six-eya XLOC_0234324_Change:9.8_q:0.5 atonal1