I have a function which extracts Excel data into an array of hashes like so:
sub set_exceldata {
my $excel_file_or = '.\Excel\ORDERS.csv';
if (-e $excel_file_or) {
open (EXCEL_OR, $excel_file_or) || die("\n can't open $excel_file_or: $!\n");
while () {
chomp;
my ( $id, $date, $product, $batchid, $address, $cost ) = split ",";
my %a = ( id => $id
, date => $date
, product => $product
, batchid => $batchid
, address => $address
, cost => $cost
);
push ( #array_data_or, \%a );
}
close EXCEL_OR;
}
}
Populating the array of hashes is fine. However, the difficult part is searching for a particular item (hash) in the array. I can't seem to locate items that might have an id or 21, or a batchid of 15, or a cost > $20 etc.
How would I go about implementing such a search facility?
Thanks to all,
With the power of grep
my #matching_items = grep {
$_->{id} == 21
} #array_data_or;
If you know there will be only one item returned you can just do this:
my ($item) = grep {
$_->{id} == 21
} #array_data_or;
(Untested, and I haven't written one of these in a while, but this should work)
If you're sure that the search always returns only one occurence or if you're interested in only the first match then you could use the 'first' subroutine found in List::Util
use List::Util;
my %matching_hash = %{ first { $_->{id} == 21 } #array_data_or };
I enclosed the subroutine call in the %{ } block to ensure that the RHS evaluates to a hash.
Related
In SilverStripe 3.1 I have a function that loops through an array and outputs its contents.
The output it gives me is:
Layout: John
Strategy: John
Management: Martin
In this example John has more than one job.
I would like to group the jobs if a person has more than one job.
This is my desired Output:
Layout and Strategy: John
Management: Martin
//$InfoFieldArray = array('Layout' => 'John', 'Strategy' => 'John', 'Management' => 'Martin');
public function createInfoFields($InfoFieldArray){
$Info = ArrayList::create();
foreach($InfoFieldArray as $key => $value ){
$fields = new ArrayData(array('FieldName' => $key, 'Value' => $value));
$Info->push($fields);
}
return $Info;
}
How do I alter my function to achieve my desired output?
One possible solution to that is by restructuring the data before adding it to the ArrayList.
public function createInfoFields($InfoFieldArray)
{
$info = array();
foreach ($InfoFieldArray as $job => $person)
{
if (!isset($info[$person]))
{
$info[$person] = array();
}
$info[$person][] = $job;
}
$result = ArrayList::create();
foreach ($info as $person => $jobs)
{
$fields = new ArrayData(array('FieldName' => implode(' and ', $jobs), 'Value' => $person));
$result->push($fields);
}
return $result;
}
What I have done is go over the array of jobs and the person assigned and flipped it the other way around, so I have an array of people with a list of jobs. This allows me to then just call implode in PHP, joining the various jobs by the word and.
There are some potential drawbacks, if there are two people named "John", they will be treated as one as I am using the name as the array key.
Also, if there are three jobs for a person, it will list it like "Layout and Strategy and Management". To avoid that, we need to modify the second foreach loop in my code to something like this:
foreach ($info as $person => $jobs)
{
$jobString = null;
if (count($jobs) > 1)
{
$jobString = implode(', ', array_slice($jobs, 0, -1)) . ' and ' . array_pop($jobs);
}
else
{
$jobString = $jobs[0];
}
$fields = new ArrayData(array('FieldName' => $jobString, 'Value' => $person));
$result->push($fields);
}
When there is more than 1 job for a person, we want to implode (glue together) the array pieces for the $jobs array however we don't want the last element at this point. Once array is glued together, we append with with and along with the last item.
I am writing a dedicated ICS (iCalendar file) parser.
I pass an array to a subroutine. All variables are single values apart from $notdates which is a comma-separated list of dates.
#entryl = ($dtstart, $dtend, $attendee, $lastmod, $uid, $notdates);
&entrytoarray(#entryl);
sub entrytoarray {
# print Dumper #_;
my $shiftdur = (&stamptoepoc($_[1]) - &stamptoepoc($_[0])) / 60 / 60;
my $attendee = $_[2];
my $deleted = $_[5];
$attendee =~ /ATTENDEE;USER-KEY=([^;]*);CN=([^;]*);.*:(.*)/;
my %ehash = (
"STARTDATE" , &stamptodate($_[0]),
"ENDDATE" , &stamptodate($_[1]),
"STARTSTAMP" , $_[0],
"ENDSTAMP" , $_[1],
"USERKEY" , $1,
"CN" , $2,
"EMAIL" , $3,
"LASTMOD" , $_[3],
"UID" , $_[4],
"DURATION" , $shiftdur
);
# Only keep current data
my $fdays = 4;
my $tdays = 7;
chomp(my $curstamp = `TZ="UTC" date -d "$fdays days" +"%Y%m%d%H%M00"`);
chomp(my $stpstamp = `TZ="UTC" date -d "$tdays days" +"%Y%m%d%H%M00"`);
if (($_[0] > $curstamp) && ($_[1] < $stpstamp)) {
if (defined($deleted)) {
my #deleted = split /,/, $deleted;
foreach (#deleted) {
if ($_ ne $_[0]) {
push(#entry, \%ehash);
}
}
}
else {
push(#entry, \%ehash);
}
}
print Dumper #entry;
This works mostly as expected:
$VAR1 = {
'DURATION' => '5',
'STARTSTAMP' => '20141122230000',
'UID' => '20141114T010539Z--1092363482',
'LASTMOD' => '20141118214419',
'STARTDATE' => '2014-11-22 23:00:00',
'EMAIL' => 'xxxxxxxxxxxxx',
'ENDDATE' => '2014-11-23 04:00:00',
'CN' => 'xxxxxxxxxxx',
'ENDSTAMP' => '20141123040000',
'KEY' => 'xxxxxxxxxxxxxxxxxx'
};
$VAR2 = {
'EMAIL' => 'xxxxxxxxxxxxx',
'ENDDATE' => '2014-11-23 23:00:00',
'ENDSTAMP' => '20141123230000',
'KEY' => 'xxxxxxxxxxx',
'CN' => 'xxxxxxxxxxxxxx',
'STARTDATE' => '2014-11-23 19:00:00',
'LASTMOD' => '20141118205901',
'UID' => '20141114T010456Z--1092363482',
'DURATION' => '4',
'STARTSTAMP' => '20141123190000'
};
$VAR3 = $VAR2;
Where is the $VAR3 = $VAR2 coming from?
My guess is that this section is the culprit:
foreach (#deleted) {
if ($_ ne $_[0]) {
push(#entry, \%ehash);
}
}
If you have several values in the array, the if-statement can be true twice, and thus push a value twice. Unless this is wanted behaviour, I would make sure that only one value is pushed. You can do this by using grep instead:
if (grep { $_ ne $_[0] } #deleted) {
push #entry, \%ehash;
}
Note that this replaces the foreach loop.
Your array #entry contains hash references. Data::Dumper is saying that the first and second elements of the array refer to two different hashes, while the third refers to the same hash as the second.
You don't show where #entry comes from, but I would expect all three elements to be references to %ehash.
The problem is that, if you keep pushing a reference to %ehash onto #entry, they all point to the same data item, and the intermediate states of the hash won't be recorded.
Unless you mean entrytoarray to push only one copy of %ehash (in which case there's a separate problem that we can't see) you need to fix it by either writing
push #entry, { %ehash }
which copies the hash and returns a reference to the copy, or you can declare and populate %ehash inside the foreach loop, which will create a new hash each time around the loop.
GENERAL IDEA
Here is a snippet of what I'm working with:
my $url_temp;
my $page_temp;
my $p_temp;
my #temp_stuff;
my #collector;
foreach (#blarg_links) {
$url_temp = $_;
$page_temp = get( $url_temp ) or die $!;
$p_temp = HTML::TreeBuilder->new_from_content( $page_temp );
#temp_stuff = $p_temp->look_down(
_tag => 'foo',
class => 'bar'
);
foreach (#temp_stuff) {
push(#collector, "http://www.foobar.sx" . $1) if $_->as_HTML =~ m/href="(.*?)"/;
};
};
Hopefully it is clear that what I'm hopelessly trying to do is push the link endings found in each of a list of links into an array called #temp_stuff. So the first link in #blarg_links, when visited, has greater than or equal to 1 foo tag with an associated bar class that when acted on by as_HTML will match something I want in the href equality to then pump into an array of links which have the data I'm really after... Does that make sense?
ACTUAL DATA
my $url2 = 'http://www.chemistry.ucla.edu/calendar-node-field-date/year';
my $page2 = get( $url2 ) or die $!;
my $p2 = HTML::TreeBuilder->new_from_content( $page2 );
my #stuff2 = $p2->look_down(
_tag => 'div',
class => 'year mini-day-on'
);
my #chem_links;
foreach (#stuff2) {
push(#chem_links, $1) if $_->as_HTML =~ m/(http:\/\/www\.chemistry\.ucla\.edu\/calendar-node-field-date\/day\/[0-9]{4}-[0-9]{2}-[0-9]{2})/;
};
my $url_temp;
my $page_temp;
my $p_temp;
my #temp_stuff;
my #collector;
foreach (#chem_links) {
$url_temp = $_;
$page_temp = get( $url_temp ) or die $!;
$p_temp = HTML::TreeBuilder->new_from_content( $page_temp );
#temp_stuff = $p_temp->look_down(
_tag => 'span',
class => 'field-content'
);
};
foreach (#temp_stuff) {
push(#collector, "http://www.chemistry.ucla.edu" . $1) if $_->as_HTML =~ m/href="(.*?)"/;
};
n.b. - I want to use HTML::TreeBuilder. I'm aware of alternatives.
This is a rough attempt at what I think you want.
It fetches all the links on the first page and visits each of them in turn, printing the link in each <span class="field-content"> element.
use strict;
use warnings;
use 5.010;
use HTML::TreeBuilder;
STDOUT->autoflush;
my $url = 'http://www.chemistry.ucla.edu/calendar-node-field-date/year';
my $tree = HTML::TreeBuilder->new_from_url($url);
my #chem_links;
for my $div ( $tree->look_down( _tag => 'div', class => qr{\bmini-day-on\b} ) ) {
my ($anchor)= $div->look_down(_tag => 'a', href => qr{http://www\.chemistry\.ucla\.edu});
push #chem_links, $anchor->attr('href');
};
my #collector;
for my $url (#chem_links) {
say $url;
my $tree = HTML::TreeBuilder->new_from_url($url);
my #seminars;
for my $span ( $tree->look_down( _tag => 'span', class => 'field-content' ) ) {
my ($anchor) = $span->look_down(_tag => 'a', href => qr{/});
push #seminars, 'http://www.chemistry.ucla.edu'.$anchor->attr('href');
}
say " $_" for #seminars;
say '';
push #collector, #seminars;
};
For a more modern framework for parsing webpages, I would suggest you take a look at Mojo::UserAgent and Mojo::DOM. Instead of having to manually march through each section of your html tree, you can use the power of css selectors to zero in on the specific data that you want. There's a nice 8 minute introductory video on the framework at Mojocast Episode 5.
# Parses the UCLA Chemistry Calendar and displays all seminar links
use strict;
use warnings;
use Mojo::UserAgent;
use URI;
my $url = 'http://www.chemistry.ucla.edu/calendar-node-field-date/year';
my $ua = Mojo::UserAgent->new;
my $dom = $ua->get($url)->res->dom;
for my $dayhref ($dom->find('div.mini-day-on > a[href*="/day/"]')->attr('href')->each) {
my $dayurl = URI->new($dayhref)->abs($url);
print $dayurl, "\n";
my $daydom = $ua->get($dayurl->as_string)->res->dom;
for my $seminarhref ($daydom->find('span.field-content > a[href]')->attr('href')->each) {
my $seminarurl = URI->new($seminarhref)->abs($dayurl);
print " $seminarurl\n";
}
print "\n";
}
Output is identical to that of Borodin's solution using HTML::TreeBuilder:
http://www.chemistry.ucla.edu/calendar-node-field-date/day/2014-01-06
http://www.chemistry.ucla.edu/seminars/nano-rheology-enzymes
http://www.chemistry.ucla.edu/calendar-node-field-date/day/2014-01-09
http://www.chemistry.ucla.edu/seminars/imaging-approach-biology-disease-through-chemistry
http://www.chemistry.ucla.edu/calendar-node-field-date/day/2014-01-10
http://www.chemistry.ucla.edu/seminars/arginine-methylation-%E2%80%93-substrates-binders-function
http://www.chemistry.ucla.edu/seminars/special-inorganic-chemistry-seminar
http://www.chemistry.ucla.edu/calendar-node-field-date/day/2014-01-13
http://www.chemistry.ucla.edu/events/robert-l-scott-lecture-0
...
I have written the code below in Perl but it's not giving the desirable output. I am dealing with the comparison between one array and two hash of arrays.
Given sample input files:
1) file1.txt
A6416 A2318
A84665 A88
2) hashone.pl
%hash1=(
A6416=>['E65559', 'C11162.1', 'c002gnj.3',],
A88=>['E77522', 'M001103', 'C1613.1', 'c001hyf.2',],
A84665=>['E138347', 'M032578', 'C7275.1', 'c009xpt.3',],
A2318=>['E128591', 'C43644.1', 'C47705.1', 'c003vnz.4',],
);
3) hashtwo.pl
%hash2=(
15580=>['C7275.1', 'E138347', 'M032578', 'c001jnm.3', 'c009xpt.2'],
3178=>['C1613.1', 'E77522','M001103', 'c001hyf.2', 'c001hyg.2'],
24406=>['C11162.1', 'E65559', 'M003010', 'c002gnj.2'],
12352=>['C43644.1', 'C47705.1', 'E128591','M001458', 'c003vnz.3'],
);
My aim is to achieve the task described:
From file1.txt, I have to locate the corresponding ID in %hash1. For instance,A6416 (file1.txt) is the key in %hash1. Next, I have to find the values of A6416 ['E65559', 'C11162.1', 'c002gnj.3',] in %hash2. If majority (more than 50%) of the values are found in %hash2, I replace A6416 with corresponding key from %hash2.
Example:
A6416 A2318
A84665 A88
Output:
24406 12352
15580 3178
Please note that the keys for %hash1 and %hash2 are different (they don't overlap). But the values are the same (they overlap).
#!/usr/bin/perl -w
use strict;
use warnings;
open FH, "file1.txt" || die "Error\n";
my %hash1 = do 'hashone.pl';
my %hash2 = do 'hashtwo.pl';
chomp(my #array=<FH>);
foreach my $amp (#array)
{
if ($amp =~ /(\d+)(\s?)/)
{
if (exists ($hash1{$1}))
{
for my $key (keys %hash2)
{
for my $i ( 0 .. $#{ $hash2{$key} } )
{
if ((#{$hash1{$1}}) eq ($hash2{$key}[$i]))
{
print "$key";
}
}
}
}
}
}
close FH;
1;
Any guidance on this problem is highly appreciated. Thank you!
I think you should invert %hash2 into this structure:
$hash2{'C7275.1'} = $hash2{'E138347'} = $hash2{'M032578'}
= $hash2{'c001jnm.3'} = $hash2{'c009xpt.2'} = 15580;
$hash2{'C1613.1'} = $hash2{'E77522'} = $hash2{'M001103'}
= $hash2{'c001hyf.2'} = $hash2{'c001hyg.2'} = 3178;
$hash2{'C11162.1'} = $hash2{'E65559'}
= $hash2{'M003010'} = $hash2{'c002gnj.2'} = 24406;
$hash2{'C43644.1'} = $hash2{'C47705.1'} = $hash2{'E128591'}
= $hash2{'M001458'} = $hash2{'c003vnz.3'} = 3178;
So that you can perform these look-ups much more effectively, rather than having to iterate over every element of every element of %hash2.
Building on the responses from ruakh and zock here is the code you need to build the look-up table for hash2
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my %hash2=(
15580=>['C7275.1', 'E138347', 'M032578', 'c001jnm.3', 'c009xpt.2'],
3178=>['C1613.1', 'E77522','M001103', 'c001hyf.2', 'c001hyg.2'],
24406=>['C11162.1', 'E65559', 'M003010', 'c002gnj.2'],
12352=>['C43644.1', 'C47705.1', 'E128591','M001458', 'c003vnz.3'],
);
# Build LUT for hash2
my %hash2_lut;
foreach my $key (keys %hash2)
{
foreach my $val (#{$hash2{$key}})
{
$hash2_lut{$val} = $key
}
}
print Dumper(\%hash2_lut);
Please select ruakh's post as the answer, just trying to clarify the code for you. Use Data::Dumper...it is your friend.
Here is the output:
$VAR1 = {
'C47705.1' => '12352',
'M032578' => '15580',
'E138347' => '15580',
'E77522' => '3178',
'C7275.1' => '15580',
'c001jnm.3' => '15580',
'E65559' => '24406',
'C1613.1' => '3178',
'M001458' => '12352',
'c002gnj.2' => '24406',
'c009xpt.2' => '15580',
'c001hyf.2' => '3178',
'C43644.1' => '12352',
'E128591' => '12352',
'c001hyg.2' => '3178',
'M003010' => '24406',
'c003vnz.3' => '12352',
'C11162.1' => '24406',
'M001103' => '3178'
};
I am new to complex data structures. I kind of understand the idea behind them but am having some difficulty getting the data out. I found out the structure of my current problem child by using Data::Dumper....
$VAR1 = {
'4' => {
'engine_coded' => 0,
'name' => 'FILTER_1',
'filter_actions' => {
'X_Override_Queue_Level' => 'Value'
},
'filter_criteria' => [
[
'X_Charge',
'=',
'X_CHARGE_1'
]
]
}
};
What I am needing to do is make sure that given a filter name ("4" in this case") that "name" has a value, as well as "filter_actions" and "filter_criteria".
Anyone have an idea how to best accomplish this? Many thanks!
Janie
Let's break this down into pieces...
First, create a function which validates a structure:
sub validate
{
my ($data) = #_;
# always return true for now
return 1;
}
Now let's start filling in the bits... you want to use a filter name as part of the validation checks, so let's add that as an argument:
sub validate
{
my ($data, $filter_name) = #_;
# always return true for now
return 1;
}
Before doing anything else, it would make sense to check if that filter name exists as a key; if it doesn't, validation has failed:
sub validate
{
my ($data, $filter_name) = #_;
return if not exists $data->{$filter_name};
# otherwise, return true
return 1;
}
Now also check that there is a value. Since definedness in a hash key is a superset of 'exists' (any value that is defined must also exist, but not every value that exists needs to be defined - as undef could be the value), the first check can be omitted:
sub validate
{
my ($data, $filter_name) = #_;
return if not defined $data->{$filter_name};
# otherwise, return true
return 1;
}
We've checked that the filter_name key is present in the data and it is defined, but before looking one level deeper, we need to confirm that it really is a hashref:
sub validate
{
my ($data, $filter_name) = #_;
return if not defined $data->{$filter_name};
return if ref $data->{$filter_name} ne 'HASH';
# otherwise, return true
return 1;
}
Now look for the 'filter_actions' and 'filter_criteria' keys under the filter name:
sub validate
{
my ($data, $filter_name) = #_;
return if not defined $data->{$filter_name};
return if ref $data->{$filter_name} ne 'HASH';
return if not defined $data->{$filter_name}{filter_actions};
return if not defined $data->{$filter_name}{filter_actions};
# otherwise, return true
return 1;
}
That's it! Be sure to read up on using perl data structures in perldoc perlreftoot, perldoc perlref, and perldoc perldsc.
You can access filter_actions/etc by checking $var->{4}->{filter_actions}. You might look at perldsc for a comprehensive overview of perl's data structures.
Here's my take. You are just checking to see if the data is there in the filter. If you wanted more structural validation that part comes later.
use List::Util qw<first>;
sub validate_filter {
my ( $filters_ref, $filter_name ) = #_;
my $filter = $filter_name ? $filters_ref->{$filter_name} : $filters_ref;
return 1 unless
my $missing
= first { !!$filter->{ $_ } }
qw<name filter_actions filter_criteria>
;
if ( $missing ) {
Carp::croak( '"Missing '$missing' in filter!" );
}
}
Okay, my first thought was that it would accept the structure and the name, but of course is you know enough when you call
validate_filter( $filters, 4 );
you know enough to pass:
validate_filter( $filters->{4} );
So the parameter processing might not be the easiest to understand, but it makes sense in terms of parameter passing.
If you're after validating structure, you might choose this route. Based on your data, I show an example of a validation failure if a given filter_criteria cluster does not have a '=' operator every 3rd slot.
Like so:
use Carp qw<croak>;
use List::Util qw<first>;
use Params::Util ();
sub _test {
return 1 if shift->( $_ );
local $Carp::CarpLevel = $Carp::CarpLevel + 2;
Carp::croak( shift );
}
my $validators
= { filter_actions => sub {
croak 'filter_actions is not deinfed!' unless defined;
_test( \&Params::Util::_HASH, 'filter_actions must be hash!' );
}
, filter_criters => sub {
croak 'filter_criteria is not defined!' unless defined $crit;
_test( \&Params::Util::_ARRAY, 'filter_criteria must be non-empty ARRAY!' );
foreach ( #$crit ) {
_test( \&Params::Util::_ARRAY, 'criteria must be non-empty ARRAYs' );
_test( sub {
my $arr = shift;
return if #$arr % 3;
# return whether any slot in sequence is not '='
return !first { $arr->[$_] ne '=' }
# every 3 beginning at 1
grep { $_ % 3 == 1 } (1..$#$arr)
;
}
, 'criteria must be key-value pairs separated by equal sign!'
);
}
}
};
And this would change the validate_filter sub like so:
sub validate_filter {
my ( $filters_ref, $filter_name ) = #_;
my $filter = $filter_name ? $filters_ref->{$filter_name} : $filters_ref;
return 1 unless
my $missing
= first {
return 1 unless $filter->{ $_ };
return unless my $validator = $validators->{ $_ };
local $_ = $filter->{ $_ };
return 1 if $validator->( $_ );
}
qw<name filter_actions filter_criteria>
;
if ( $missing ) {
Carp::croak( "Missing '$missing' in filter!" );
}
}