Related
I need to evaluate a series of strings that I want to convert into a hash of arrays. Take into account that, in this case, I want to add an array to an entry in a hash that it is already storing an array. I need to get the following hash:
ConfigurationHash{'Editor'} = (John, Mary, Jane, Peter)
I have stripped down my code to this example:
use strict;
use warnings;
my %ConfigurationHash;
my $String1 = "Editor=John,Mary";
my $String2 = "Editor=Jane,Peter";
my #Line1 = split ("=", $String1);
my #Line2 = split ("=", $String2);
my $Variable1 = #Line1[0];
my $Value1 = #Line1[1];
my $Variable2 = #Line2[0];
my $Value2 = #Line2[1];
my #Values1Array = split(",", $Value1);
my #Values2Array = split(",", $Value2);
if ( ! exists $ConfigurationHash{$Variable1} ) {
$ConfigurationHash{$Variable1} = #Values1Array;
}
else {
push (#ConfigurationHash{$Variable1}, #Values1Array);
}
which produces the following error:
Experimental push on scalar is now forbidden at ./test.pl line 25, near "#Values1Array)"
Execution of ./test.pl aborted due to compilation errors.
I know that the problem lies in references/dereferences, but my knowledge of perl is so basic that I'm not able to figure how to get there by myself.
Could anybody show me how to do it? I would also appreciate if you could show me how to iterate the values of the array in the hash once it is created.
It's unclear why you have $String2 and its derivatives in your code as they are never used. This code processes both strings
You simply need to push the list of values to the array corresponding to $Variable1 (dreadful choice of identifier) in the hash. Accomplish this by dereferencing the array element
use strict;
use warnings;
my %config;
my $s1 = 'Editor=John,Mary';
my $s2 = 'Editor=Jane,Peter';
for ( $s1, $s2 ) {
my ($key, #values) = split /[=,]/;
push #{ $config{$key} }, #values;
}
use Data::Dumper;
print Dumper \%config;
output
$VAR1 = {
'Editor' => [
'John',
'Mary',
'Jane',
'Peter'
]
};
This line isn't doing what you think it does.
$ConfigurationHash{$Variable1} = #Values1Array;
If you printed out what $ConfigurationHash{$Variable1} contains you'll find it only contains the size of #Values1Array.
You should be fine to use push but with a slight modification to what you've written.
push #{$ConfigurationHash{$Variable1}}, #Values1Array;
I've also removed the brackets as you don't need them.
As for iterating over the array, it is no different to iterating over a regular array. You were likely having problems iterating over it before as you didn't have an array
foreach my $whatever (#{$ConfigurationHash{$Variable1}})
{
# Code
}
Thank you to all who posted answers. #Borodin, you're right, I missed a second block that used $String2 and its derivatives, but I think it was obvious it was at the end and was similar to the if-else block in my original code.
Thank you, #chris-turner, for giving me the hint on how to use push the right way and pointing out the error in the $ConfigurationHash{$Variable1} = #Values1Array;
With all these contributions I figured out that the right code I was expecting is:
use strict;
use warnings;
my %ConfigurationHash;
my $String1 = "Editor=John,Mary";
my $String2 = "Editor=Jane,Peter";
my #Line1 = split ("=", $String1);
my #Line2 = split ("=", $String2);
my $Variable1 = $Line1[0];
my $Value1 = $Line1[1];
my $Variable2 = $Line2[0];
my $Value2 = $Line2[1];
my #Values1Array = split(",", $Value1);
my #Values2Array = split(",", $Value2);
if ( ! exists $ConfigurationHash{$Variable1} ) {
$ConfigurationHash{$Variable1} = \#Values1Array;
}
else {
#push (#ConfigurationHash{$Variable1}, #Values1Array);
push #{$ConfigurationHash{$Variable1}}, #Values1Array;
}
if ( ! exists $ConfigurationHash{$Variable2} ) {
$ConfigurationHash{$Variable2} = \#Values2Array;
}
else {
#push (#ConfigurationHash{$Variable2}, #Values2Array);
push #{$ConfigurationHash{$Variable2}}, #Values2Array;
}
use Data::Dumper;
print Dumper \%ConfigurationHash;
Which outputs the following:
$VAR1 = {
'Editor' => [
'John',
'Mary',
'Jane',
'Peter'
]
};
Using Text::Table or Text::ANSITable, or something similar...
If I have a two-dimensional array (which represents a grid of data), where the first row can be the column headers, how can I apply that data and format it into a command line grid with columns.
Something like this: PERL : How to create table from an array?
Except that the number of rows and columns is variable depending on the array setup and needs to automatically output as such.
Thanks
You can use Text::Table to accomplish this, as it handles variable numbers of rows and columns. Although the documentation leaves a lot to be desired, you can usually look to the test files or examples to show you how the library should actually get used. I've adapted example.pl to illustrate this:
table.pl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Text::Table;
binmode STDOUT, ':utf8';
my ($rows, $cols) = #ARGV;
$rows ||= 5;
$cols ||= 7;
my #cols = map { "column " . $_} ( 1..$cols );
my $sep = \'│';
my $major_sep = \'║';
my $tb = Text::Table->new($sep, " Number ", $major_sep,
(map { +(" $_ ", $sep) } #cols)
);
my $num_cols = #cols;
for my $row (1..$rows) {
$tb->load([ "row $row", map { "r$row,c$_" } ( 1..$cols ) ]);
}
my $make_rule = sub {
my ($args) = #_;
my $left = $args->{left};
my $right = $args->{right};
my $main_left = $args->{main_left};
my $middle = $args->{middle};
return $tb->rule(
sub {
my ($index, $len) = #_;
return ('─' x $len);
},
sub {
my ($index, $len) = #_;
my $char =
( ($index == 0) ? $left
: ($index == 1) ? $main_left
: ($index == $num_cols+1) ? $right
: $middle
);
return $char x $len;
},
);
};
my $start_rule = $make_rule->(
{
left => '┌',
main_left => '╥',
right => '┐',
middle => '┬',
}
);
my $mid_rule = $make_rule->(
{
left => '├',
main_left => '╫',
right => '┤',
middle => '┼',
}
);
my $end_rule = $make_rule->(
{
left => '└',
main_left => '╨',
right => '┘',
middle => '┴',
}
);
print $start_rule, $tb->title,
(map { $mid_rule, $_, } $tb->body()), $end_rule;
output
perl table.pl 3 5
┌────────╥──────────┬──────────┬──────────┬──────────┬──────────┐
│ Number ║ column 1 │ column 2 │ column 3 │ column 4 │ column 5 │
├────────╫──────────┼──────────┼──────────┼──────────┼──────────┤
│row 1 ║r1,c1 │r1,c2 │r1,c3 │r1,c4 │r1,c5 │
├────────╫──────────┼──────────┼──────────┼──────────┼──────────┤
│row 2 ║r2,c1 │r2,c2 │r2,c3 │r2,c4 │r2,c5 │
├────────╫──────────┼──────────┼──────────┼──────────┼──────────┤
│row 3 ║r3,c1 │r3,c2 │r3,c3 │r3,c4 │r3,c5 │
└────────╨──────────┴──────────┴──────────┴──────────┴──────────┘
While modules offer easier control and features, if you only need to print it out as a grid
use warnings;
use strict;
use feature 'say';
my #ary = ([1..3], [10..12], [100..102]);
foreach my $row (#ary) {
printf "%7.3f ", $_ for #$row;
say ''
}
Prints
1.000 2.000 3.000
10.000 11.000 12.000
100.000 101.000 102.000
Choose your specifier (%7.3f above) accordingly to what data you have. See sprintf
If the first row is headers, shift it off of the array and print with the same width but using %s
my #ary = ([qw(one two three)], [1..3], [10..12], [100..102]);
printf "%7s ", $_ for #{shift #ary};
say '';
foreach my $row (#ary) {
printf "%7.3f ", $_ for #$row;
say ''
}
This prints the same as above but with (aligned) column names in the first row.
If "grid of data" means numeric data, then the code can discover whether there is a header line, with a reasonable assumption that the first line contains something non-numeric
use List::Util qw(any);
my $have_header = any { /[^0-9.+-]/ } #{$ary[0]};
what also assumes the absence of NaN and 1e02 or such on the first line (can be checked).
Better yet, use looks_like_number from Scalar::Util
use List::Util qw(any);
use Scalar::Util qw(looks_like_number);
my $have_header = any { not looks_like_number($_) } #{$ary[0]};
This uses Perl's internal sense of what a number is, and accounts for NaN and exponential notation, among other things.
If the program needs to find out the width of columns, or they differ a lot, there is more to do for a nice output since we need the maximum width of each column.
use warnings;
use strict;
use feature 'say';
use List::Util qw(max);
my $file = shift #ARGV || 'data.txt';
open my $fh, '<', $file or die "Can't open $file: $!";
my #ary = map { [ split ] } <$fh>;
my #maxw = (1) x #{$ary[0]};
for my $r (#ary) {
for (0..$#$r) {
my $len = length $r->[$_];
$maxw[$_] = $len if $len > $maxw[$_]
}
};
my $hdr = shift #ary;
printf "%$maxw[$_]s ", $hdr->[$_] for 0..$#$hdr;
say '';
for my $i (0..$#ary) {
printf "%-$maxw[$_].2f ", $ary[$i]->[$_] for 0..$#{$ary[$i]};
say '';
}
This expects numbers, except for the header. A few other reasonable assumptions are made.
With input data.txt file used to populate the array
one two three
1.12 1.1 12
1.00 10.00 102.00
the program prints
one two three
1.12 1.10 12.00
1.00 10.00 102.00
If there is a reason to "underline" the column names print this line right after headers
say join(" ", map { '-'x$maxw[$_] } 0..$#$hdr);
(However, tables are generally most readable when free of extra decorations.)
Note. If the numbers are computed in the program those scalars are used as numbers and the length of the string representing them in print should be queried as length sprintf "%s", $num. When they are read from a file they are taken as strings, what is used above for simplicity.
Note for another refinement. The above uses (fixed) 2 decimal places. That can be changed, if needed, so to first count the number of decimal places in input and print out accordingly. However, it is probably more sensible to decide on the uniform width, or at least fix the maximum width. (One doesn't want an accidental column with 16 decimal places, most useless!)
I have a major array #major that I want to divide in a given number of minors arrays #minor , by setting the number of slices I want (let's say 4 here, but I wish to be able to chose whatever number I want), like this pseudo-code:
(I know it's not correct, but it sort of gives you the idea)
#major = A,B,C,D,E,F,G,H,I,J
$slice = 4;
$arraySize = $#array + 1;
$slicesize = $arraySize / $slice;
#minor1 = #major[0..$slicesize]
#minor2 = #major[($slicesize+1)..(2*$slicesize)]
#minor3 = #major[((2*$slicesize)+1)..(3*$slicesize)]
#minor4 = #major[((3*$slicesize)+1)..(4*$slicesize)]
The goal here is that I want to be able to change this size of the initial array #major and/or the number of slices $slice and that all values which set the size of the differents minor arrays (($slicesize+1), (2*$slicesize), and so on).
I know this looks a bit complex but I don't know to expose it in another way.
Do you have any idea about how to achieve this ?
I am not pretty sure you meant like this, but here it is how I understood your point:
#!/usr/bin/perl
use strict; use warnings; use 5.010;
my #array = ( 'A' .. 'Z' );
my $length_of_slice = $ARGV[0] || 5 ;
while ( #array ) {
local $" = ', ';
my #minor = splice #array, 0, $length_of_slice;
say "#minor";
}
When you have a complex data structure requirement, my first thought is - use an object. Perl supports object oriented programming, which allows you to do all manner of insanely complicated things like what you're trying to do.
It'd go something like this:
#!/usr/bin/perl
use strict;
use warnings;
package TestObject;
my #major = qw(A B C D E F G H I J );
sub new {
my ( $class, $slice_count ) = #_;
my $self = {};
if ( defined $slice_count ) { $self->{slice_count} = $slice_count }
$self->{major} = #major;
bless( $self, $class );
return $self;
}
sub get_slices {
my ( $self, $count ) = #_;
my #tmp = #major;
my #array_of_slices;
for ( 1 .. $count ) {
my $tmp_arr_ref = ();
for ( 0 .. ( #major / $count ) ) {
if ( #tmp ) { push( #$tmp_arr_ref, shift(#tmp) ) };
}
push( #array_of_slices, $tmp_arr_ref );
}
return (#array_of_slices);
}
Called by:
#!/usr/bin/perl
use strict;
use warnings;
use TestObject;
use Data::Dumper;
my $first = TestObject->new();
my #array_of_refs = $first->get_slices(4);
print Dumper \#array_of_refs;
And giving result of:
$VAR1 = [
[
'A',
'B',
'C'
],
[
'D',
'E',
'F'
],
[
'G',
'H',
'I'
],
[
'J'
]
];
Something like that (you'll have to adjust it a bit to get precisely what you have in mind depending on how you want to handle edge cases/rounding).
I have the following data in a hash of hash of arrays. The numerical data in the array represents that last 4 quarters of financial information.
I'd like to be able to iterate over the array, and pull out the data by quarter, to get it ready to insert into a database. With my code below I can get all of the quarters, or one quarter only if I specifically call it. When I try and add another loop to iterate over the hash of hash of arrays to return only the array subset values, i get all the values and I don't know what I'm doing wrong. See code: Thanks for the help
my %comp_info = (
CompanyA => {
COGS => ["175.00", "155.00", "125.00", "125.00"],
Revenue => ["300.00", "200.00", "250.00", "225.00"],
},
)
# The following works, but I have to specifically push one array subset at a time,
# which makes passing values to db_insert subroutine ineffective.
# (Id have to have 4 or 5 calls depending on number of quarters of data in each record).
sub get_insert_arrays {
foreach $comp (keys %comp_info ) {
foreach $column ( keys %{$comp_info{$comp}} ) {
push (#insert_array, #{$comp_info{$sym}{$column}}[0] );
}
}
my $valuelist = join(", ", #insert_array);
&db_insert($valuelist);
undef #insert_array;
}
#try to loop through, fails, I get all of the data in #insert_array instead of only one quarter.
sub get_insert_arrays {
foreach $comp (keys %comp_info ) {
foreach $column ( keys %{$comp_info{$comp}} ) {
for my $i ( 0 .. $#{$comp_info{$comp}{$column}} ) {
push (#insert_array, #{$comp_info{$sym}{$column}}[$i] );
}
}
my $valuelist = join(", ", #insert_array);
&db_insert($valuelist);
undef #insert_array;
undef $valuelist;
}
}
I highly recommend to dereference with intermediate variables, and also using the -> syntax. Both of these help you figure out what's going on:
Here's your first subroutine using dereferencing:
sub get_insert_arrays {
my #insert_array;
foreach $comp (keys %comp_info ) { # References to a hash (Revenue, COGS)
%columns = %{ $comp_info{$comp} }; # Dereference
foreach $column ( keys %columns ) { # Reference to an Array (quarter data)
my #values = #{ $column } ; # Dereference
push (#insert_array, $column );
my $valuelist = join(", ", #insert_array);
&db_insert($valuelist);
}
}
}
Hmm... Looking at this, it's easy to see that I can simply do:
for my $comp (keys %comp_info ) { # References to a hash (Revenue, COGS)
%columns = %{ $comp_info{$comp} }; # Dereference
for my $column ( keys %columns ) { # Reference to an Array (quarter data)
my #values = #{ $column } ; # Dereference
db_insert(#values);
}
}
}
If you need to see a particular piece of data, use the -> syntax to simplify your structure:
${$comp_info{$sym}{$column}}[$i]; # You had "#". Should be "$".
vs.
$comp_info{$sym}->{$column}->[$i];
Much easier to read.
Also use the warnings and strict pragmas in your program. It'll catch a lot of errors that may include undefined variables, and misspelled variable names.
If you're pulling out data quarter-by-quarter, you probably want the COGS to be with the Revenue column:
#! /usr/bin/env perl
#
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (using undefined variables
use feature qw(say);
#
# Just initializing the data you had
#
my %comp_info = ( CompanyA => {
Revenue => [
300.00, 200.00, 250.00, 225.00
],
COGS => [
175.00, 155.00, 125.00, 125.00
],
},
);
#
# Go through each company
#
for my $company ( keys %comp_info ) {
my #revenues = #{ $comp_info{$company}->{Revenue} }; # Dereference
my #cogss = #{ $comp_info{$company}->{COGS} }; # Dereferenec
say "Company: $company";
#
# I know the keys are "Revenue" and "COGS", so I don't need a loop.
# I'll just go right to my quarters data. Note that dereferencing
# makes my program a lot easier to write and maintain
#
for my $quarter ( (0..3) ) {
my $revenue = $revenues[$quarter];
my $cogs = $cogss[$quarter];
my $profit = $revenue - $cogs;
say " Quarter: " . ($quarter - 1)
. " Revenue = $revenue COGS = $cogs Profit = $profit";
}
}
How you do your database inserts is up to you. But, you can see how doing a bit of dereferencing and using -> can clarify what you're looking at.
Addendum
how do I pull out just the quarter by quarter data without having to specify revenue, cogs, etc. in some cases there could be 30+ fields, so I don't want to have to specify each field in the program. I just want to grab all Q1 fields, insert, grab all Q2 field, insert, etc.
So two loops:
Like before, I have my outer loop go through for each company which just happens to be the key for that %comp_info hash.
Each value in the %comp_info hash is a reference to another hash that is keyed by the data type (COGS, Revenue, etc.). Again, I simply loop through the keys of that inner hash (after dereferencing to make it easier to understand).
Now that I have the company name (the key to that %comp_info hash, and a list of the keys in that inner hash, I can simply pull up the first quarter numbers for each company and each data type. Getting the quarter value is simple: $comp_info{$company}->{$type}->[$quarter]. Note that there is three levels of data, and I have three sections in my variable with each section separated by ->.
I can see that the outer most section is a simple hash that's keyed by the company name: ($comp_info{$company}).
This %comp_info hash points to a hash reference (->{type}) which is keyed by the data types (COGS, Revenue, etc.).
And that hash reference points to an array reference for each quarter (->[$quarter]). See how that works and why I like that -> syntax? It makes it very clear what I am working with.
This is just the first quarter results. If I wanted to go through each and every quarter, I could have an outer loop for my $quarter (0..3) {.
Here's what it looks like. This is a complete program, so you could cut this out, and try running it yourself and see if you can figure out what's going on.
use strict; # Lets you know when you misspell variable names
use warnings; # Warns of issues (using undefined variables
use feature qw(say);
my $quarter = 0; #First Quarter is 0. Last quarter is 3
my %comp_info = ( CompanyA => {
Revenue => [
300.00, 200.00, 250.00, 225.00
],
COGS => [
175.00, 155.00, 125.00, 125.00
],
},
);
for my $company ( keys %comp_info ) {
say "Company: $company";
%types = %{ $company_info{$company} };
for my $type ( keys %types ) { # COGS, Revenue, etc.
say " $type for quarter "
. ($quarter + 1) . ": "
. $comp_info{$company}->{$type}->[$quarter];
}
}
One more time for each quarter to insert the data into your database:
Using use strict and declaring variables with my means that variables are only valid for a limited scope. That my #type_data; declares a array that holds my type values for inserting into your database. However, since it's declared inside that for my $quarter loop, the array and its values disappears with each iteration of the loop. No need to have to remove the data or reinitialize the variable. It does it all by itself!
Look up how Lexically Scoped Variables work.
for my $quarter ( (0..3) ) {
my #type_values;
for my $company ( keys %comp_info ) {
my %types = %{ $comp_info{$company} };
for my $type ( keys %types ) { # COGS, Revenue, etc.
push #type_values, $comp_info->{$company}->{$type}->{quarter};
}
insert_data( #type_values ); # Database insert you wanted
}
Your recent addition - undef #insert_array; undef $valuelist; is a misuse of undef. Putting it before a variable like that forces a garbage collection cycle, which is something you don't want to do -- it is best to let Perl look after things itself.
Arrays should be emptied with #insert_array = () rather than using undef, and for scalars you should $valuelist = undef. But these variables are irrelevant outside the subroutine, so you should declare them inside, in which case there is no need to reinitialise them in the first place.
Please bear in mind what I said about calling prepare on an SQL statement with placeholders. Your code should look something like this
my $insert = $dbh->prepare('INSERT INTO table VALUES (?, ?)');
and later
my #insert_array = (175.00, 300.00);
$insert->execute(#insert_array);
However I have written this, which I think does what you want, to create a $valuelist string as your own code does. Since you don't need the hash keys, it is much tidier to iterate over the values instead. The db_insert subroutine is a dummy that just prints the values of the parameter passed to it.
use strict;
use warnings;
use 5.010;
my %comp_info = (
CompanyA => {
COGS => ["175.00", "155.00", "125.00", "125.00"],
Revenue => ["300.00", "200.00", "250.00", "225.00"],
},
);
my #values = map values %$_, values %comp_info;
for my $i (0 .. $#{$values[0]}) {
my #insert = map $_->[$i], #values;
db_insert(join ', ', #insert);
}
sub db_insert {
say "db_insert('#_')";
}
output
db_insert('175.00, 300.00')
db_insert('155.00, 200.00')
db_insert('125.00, 250.00')
db_insert('125.00, 225.00')
Update
To comply with the new specification:
use strict;
use warnings;
use 5.010;
my %comp_info = (
CompanyA => {
COGS => ["175.00", "155.00", "125.00", "125.00"],
Revenue => ["300.00", "200.00", "250.00", "225.00"],
},
CompanyB => {
COGS => ["175.00", "155.00", "125.00", "125.00"],
Revenue => ["300.00", "200.00", "250.00", "225.00"],
},
CompanyC => {
COGS => ["175.00", "155.00", "125.00", "125.00"],
Revenue => ["300.00", "200.00", "250.00", "225.00"],
},
);
my #columns = qw/ COGS Revenue /;
for my $comp (keys %comp_info) {
my $data = $comp_info{$comp};
for my $i (0 .. $#{(values %$data)[0]}) {
my #values = ( $comp, map $_->[$i], #{$data}{#columns} );
db_insert(join ', ', #values);
}
}
sub db_insert {
say "db_insert('#_')";
}
output
db_insert('CompanyC, 175.00, 300.00')
db_insert('CompanyC, 155.00, 200.00')
db_insert('CompanyC, 125.00, 250.00')
db_insert('CompanyC, 125.00, 225.00')
db_insert('CompanyA, 175.00, 300.00')
db_insert('CompanyA, 155.00, 200.00')
db_insert('CompanyA, 125.00, 250.00')
db_insert('CompanyA, 125.00, 225.00')
db_insert('CompanyB, 175.00, 300.00')
db_insert('CompanyB, 155.00, 200.00')
db_insert('CompanyB, 125.00, 250.00')
db_insert('CompanyB, 125.00, 225.00')
This is a loop and the array does not get overwritten, it just grows.
for ( $j=0; $j <=$#just_ecps ; $j++){
print "$just_ecps[$j]\n";
for ($x=0; $x<=$#folder_dates ; $x++){
my $archivo_histo = "/home/ha2_d11/data/ibprod/archive/$folder_dates[$x]/$just_ecps[$j]/ghistogram.gz";
next unless (-r $archivo_histo);
open(FILEHANDLE, "gunzip -c $archivo_histo |") or die ("could not open file $archivo_histo");
while (<FILEHANDLE>) {
if ($_ =~ /ave:\s+(\d+\.\d+)\s/){
push ( #ecp_average , $1);
sleep 1;
}
print "#ecp_average\n";
}
}
In every instance it is the last three values that are valid, everything before it is a duplicate. I need to get rid of the duplicates and just keep the last three values.
Eislnd1
0.00420743 0.00414601 0.0044511
Eislnd2
0.00420743 0.00414601 0.0044511 0.00303575 0.00309721 0.00302753
Eislnd3
0.00420743 0.00414601 0.0044511 0.00303575 0.00309721 0.00302753 0.0031753 0.00324729 0.00295381
Eislnd4
0.00420743 0.00414601 0.0044511 0.00303575 0.00309721 0.00302753 0.0031753 0.00324729 0.00295381 0.00324191 0.00344244 0.00311481
You need to clear the array for every file:
open(FILEHANDLE, "gunzip -c $archivo_histo |") or die ...
#ecp_average = ();
while (<FILEHANDLE>) {
...
}
At some point you'll want to read up on lexically-scoped variables (i.e. the my declaration)
but for now this should work.
Things to note that will help improve this code:
use strict; use warnings;
Consider using for my $just_ecp ( #just_ecps ) { ... } instead of the C-style construct
Defining an averaging subroutine ( sub avg { sum(#_) / #_ } ) and using it ( avg( #ecp ) ) is a more intuitive way to compute averages instead of mucking around with array lengths. All it takes one line!