Perl: Group input data by first field - arrays

I have a data file that contains an interface name and destination. I want to group all destinations by interface so that I can iterate through and store results. Here is an example of my output:
eth0,1.1.1.1
eth0,1.1.1.2
eth1,1.1.1.1
eth1,1.1.1.2
How do I put the unique interface values into a hash and build an array of destinations for each one?

my %ifs;
while ( my $line = <STDIN> ) {
    chomp $line;
    my ( $iface, $destination ) = split /,/, $line;
    push @{ $ifs{ $iface } }, $destination;
}
Should work.
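To see the grouping end to end, here is a minimal self-contained sketch. It assumes the sample lines from the question, read through an in-memory filehandle instead of STDIN so it runs on its own, and then iterates over each interface and its destinations.

```perl
use strict;
use warnings;

# Sample input from the question, read via an in-memory filehandle
# (a stand-in for STDIN so the sketch runs on its own).
my $input = "eth0,1.1.1.1\neth0,1.1.1.2\neth1,1.1.1.1\neth1,1.1.1.2\n";
open my $in, '<', \$input or die "Cannot open in-memory input: $!";

my %ifs;    # interface name => arrayref of destinations
while ( my $line = <$in> ) {
    chomp $line;
    next unless length $line;                  # skip blank lines
    my ( $iface, $destination ) = split /,/, $line;
    push @{ $ifs{$iface} }, $destination;      # autovivifies the arrayref
}
close $in;

# Walk the interfaces in sorted order, then each of their destinations.
for my $iface ( sort keys %ifs ) {
    for my $dest ( @{ $ifs{$iface} } ) {
        print "$iface -> $dest\n";
    }
}
```

The `push @{ ... }` form relies on autovivification: the first push for a new interface creates the array reference automatically.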

Related

Parse a CSV into Perl hash in a sub and return the value passed to that sub

I am trying to parse a CSV file into a hash in Perl inside a sub, then return the value of the key passed to the sub as an argument. The CSV file looks like:
host1, 12121
host2, 34324252345
host3, 45345
host4, 56363425
host5, 3.1
hostn, 435345
And my code is:
#!/usr/bin/perl -w
my $host_key = getValue('host1');
print $host_key;
sub getValue {
    my $host_file = 'test.csv';
    my $hostname = $_[0];
    chomp($hostname);
    my %hash;
    open my $fh, '<', $host_file or die "Cannot open: $!";
    while (my $line = <$fh>) {
        my @hostname_lines = split /,/, $line;
        my $hostname = shift @hostname_lines;
        $hash{$hostname} = \@hostname_lines;
    }
    close $fh;
    my $host_value = $hash{$hostname};
    return $host_value;
}
When I run this code, the return value is ARRAY(0x6b72d0), but I expect 12121.
Inside the while loop you build a hash whose values are array references (arrayrefs).
So you need to retrieve an element of that arrayref:
my %host;
...
my $host_value = $host{$name}->[0];
which can also be written as
my $host_value = $host{$name}[0];
This works because the original array was shifted, so the number is now the first element of the arrayref.
This can also be done as
while (my $line = <$fh>) {
    chomp $line;    # otherwise the last number keeps its trailing newline
    my ($name, @numbers) = split /\s*,\s*/, $line;
    $host{$name} = \@numbers;
}
since the first element in split's return list goes into the first scalar, and the rest into the array.
The data shows one number per host on each line. If that is always the case, there is no need for an arrayref:
my %host;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $val) = split /\s*,\s*/, $line;
    $host{$name} = $val;
}
where /\s*,\s*/ also trims the spaces around the comma, or even
my %host = map { chomp; split /\s*,\s*/ } <$fh>;
which has the weakness that it does not make it easy to check the expected data format.
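Putting the pieces together, a version of the lookup sub that returns the plain value might look like the sketch below. Two assumptions for the sake of a runnable example: the CSV content is held in a string and opened as an in-memory filehandle rather than reading test.csv, and the sub (named get_value here, a hypothetical variant of the question's getValue) takes that filehandle as its first argument.

```perl
use strict;
use warnings;

# Stand-in for test.csv; a real script would open the file by name instead.
my $csv = "host1, 12121\nhost2, 34324252345\nhost3, 45345\nhost5, 3.1\n";

# Hypothetical variant of getValue: build a name => value hash from $fh
# and return the value for $hostname (undef if the host is not present).
sub get_value {
    my ( $fh, $hostname ) = @_;
    my %host;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $name, $val ) = split /\s*,\s*/, $line, 2;
        $host{$name} = $val if defined $val;    # skip malformed lines
    }
    return $host{$hostname};
}

open my $fh, '<', \$csv or die "Cannot open: $!";
my $host_key = get_value( $fh, 'host1' );
close $fh;
print defined $host_key ? "$host_key\n" : "not found\n";
```

Because the hash stores a scalar rather than an arrayref, the caller gets 12121 directly instead of ARRAY(0x...).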

Separating CSV file into key and array

I am new to Perl, and I am trying to separate a CSV file (which has 10 comma-separated items per line) into a key (the first item) and an array (the other 9 items) to put in a hash. Eventually, I want to use an if statement to match another variable against the keys in the hash and print out the elements in the array.
Here's the code I have, which doesn't work right.
use strict;
use warnings;
my %hash;
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
    my ($key, @value) = split (/,/, $line, 2);
    %hash = (
        $key => @value
    );
}
foreach my $key (keys %hash)
{
    print "The key is $key and the array is $hash{$key}\n";
}
Thank you for any help!
Don't use 2 as the third argument to split: it will split the line into only two elements, so @value will hold just one element.
Also, by doing %hash =, you're overwriting the whole hash in each iteration of the loop. Just add a new key/value pair:
$hash{$key} = \@value;
Note the \ sign: you can't store an array directly as a hash value, you have to store a reference to it. When printing the value, you have to dereference it back:
#! /usr/bin/perl
use warnings;
use strict;
my %hash;
while (<DATA>) {
    my ($key, @values) = split /,/;
    $hash{$key} = \@values;
}
for my $key (keys %hash) {
    print "$key => @{ $hash{$key} }";
}
__DATA__
id0,1,2,a
id1,3,4,b
id2,5,6,c
If your CSV file contains quoted or escaped commas, you should use Text::CSV.
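The question's eventual goal was to match another variable against the keys and print the array. A minimal sketch of that step, assuming the same three sample rows (held in a string here so the example runs on its own) and a hypothetical $wanted variable holding the value to look up:

```perl
use strict;
use warnings;

# Same sample rows as above, read from a string instead of __DATA__.
my $data = "id0,1,2,a\nid1,3,4,b\nid2,5,6,c\n";
open my $in, '<', \$data or die "Cannot open: $!";

my %hash;
while ( my $line = <$in> ) {
    chomp $line;
    my ( $key, @values ) = split /,/, $line;
    $hash{$key} = \@values;    # a fresh @values each iteration, so each ref is distinct
}
close $in;

my $wanted = 'id1';    # hypothetical variable to match against the keys
if ( exists $hash{$wanted} ) {
    print "Found $wanted: @{ $hash{$wanted} }\n";
}
else {
    print "$wanted not found\n";
}
```

Using `exists` avoids a loop over `keys %hash`: the hash lookup does the matching directly.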
First of all, hash keys are unique, so when you have lines like these in your CSV file:
key1,val11,val12,val13,val14,val15,val16,val17,val18,val19
key1,val21,val22,val23,val24,val25,val26,val27,val28,val29
after adding both key/value pairs with the 'key1' key to the hash, only one pair will be saved in the hash: the one that was added later.
So to keep all records, the structure you probably need is an array of hashes, where the value of each hash is an array reference, like this:
@result = (
    { 'key1' => ['val11','val12','val13','val14','val15','val16','val17','val18','val19'] },
    { 'key1' => ['val21','val22','val23','val24','val25','val26','val27','val28','val29'] },
    { 'and' => ['so on'] },
);
To achieve that, your code could look like this:
use strict;
use warnings;
my @AoH; # array of hashes containing data from CSV
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
    chomp $line; # drop the newline so it doesn't end up in the last value
    my @string_bits = split (/,/, $line);
    my $key = $string_bits[0]; # first element - key
    my $value = [ @string_bits[1 .. $#string_bits] ]; # rest get into arr ref
    push @AoH, {$key => $value}; # array of hashes structure
}
close IN2;
foreach my $hash_ref (@AoH)
{
    my $key = (keys %$hash_ref)[0]; # get hash key
    my $value = join ', ', @{ $hash_ref->{$key} }; # join array into str
    print "The key is '$key' and the array is '$value'\n";
}
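If you still want to look records up by key while keeping duplicates, another option is a hash whose values are lists of rows: each key maps to an array of arrayrefs, one per line. A minimal sketch, assuming short made-up rows held in a string (only key1 comes from the example above):

```perl
use strict;
use warnings;

# Two records share 'key1'; 'key2' is made up for illustration.
my $data = "key1,val11,val12\nkey1,val21,val22\nkey2,val31,val32\n";
open my $in, '<', \$data or die "Cannot open: $!";

my %records;    # key => [ [values of row 1], [values of row 2], ... ]
while ( my $line = <$in> ) {
    chomp $line;
    my ( $key, @values ) = split /,/, $line;
    push @{ $records{$key} }, \@values;    # append; never overwrite
}
close $in;

for my $key ( sort keys %records ) {
    for my $row ( @{ $records{$key} } ) {
        print "$key: ", join( ', ', @$row ), "\n";
    }
}
```

Compared with the array of single-key hashes, this keeps direct lookup by key (`$records{key1}`) at the cost of one extra level of nesting.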

Auto increment numeric key values in a perl hash?

I have a Perl script in which I am reading files from a given directory and then placing those files into an array. I then want to move those array elements into a Perl hash, with the array elements being the hash values, and automatically assign numeric keys to each hash value.
Here's the code:
# Open the current user's directory and get all the builds. If you can't open the dir,
# then die.
opendir(D, "$userBuildLocation") || die "Can't opendir $userBuildLocation: $!\n";
# Put build files into an array.
my @builds = readdir(D);
closedir(D);
print join("\n", @builds, "\n");
This prints out:
test.dlp
test1.dlp
I want to take those value and insert them into a hash that looks just like this:
my %hash = (
    1 => 'test.dlp',
    2 => 'test1.dlp',
);
I want the numbered keys to be auto incrementing based on how many files I may find in a given directory.
I'm just not sure how to get the auto-incrementing keys to be set to unique numeric values for each item in the hash.
I am not sure I understand the need, but this should do it:
my $i = 0;
my %hash = map { ++$i => $_ } @builds;
Another way to do it:
my $i = 0;
for ( @builds ) {
    $hash{++$i} = $_;
}
The most straightforward and boring way:
my %hash;
for (my $i=0; $i<@builds; ++$i) {
$hash{$i+1} = $builds[$i];
}
or if you prefer:
foreach my $i (0 .. $#builds) {
$hash{$i+1} = $builds[$i];
}
I like this approach:
@hash{1 .. @builds} = @builds;
Another:
my %hash = map { $_+1, $builds[$_] } 0..$#builds;
or:
my %hash = map { $_, $builds[$_-1] } 1 .. @builds;
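A quick self-contained check that two of these approaches build the same hash. It assumes two hypothetical file names standing in for the readdir() result and lists the entries back in numeric key order, since hash keys come out in no particular order.

```perl
use strict;
use warnings;

# Hypothetical file names standing in for the readdir() result.
my @builds = ( 'test.dlp', 'test1.dlp' );

# Hash slice: keys 1..N (the array in the range gives its length).
my %hash;
@hash{ 1 .. @builds } = @builds;

# map over the indices gives the same result.
my %hash2 = map { $_ + 1, $builds[$_] } 0 .. $#builds;

# Hash keys are unordered; sort numerically to list them in order.
for my $n ( sort { $a <=> $b } keys %hash ) {
    print "$n => $hash{$n}\n";
}
```

If the keys are only ever consecutive integers, the array itself already provides that indexing (offset by one), so the hash is only worth building if something downstream specifically needs one.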

Trouble converting array to hash

I have an array where elements of the array have values that are separated by tabs.
For example:
client_name \t owner \t date \t port_number.
I need to convert that into a hash so it can be dumped into a MySQL database.
Something like:
my %foo = ();
$foo{date} = "111208";
$foo{port} = "2222";
$foo{owner} = "ownername";
$foo{name} = "clientname";
The problem I have is that there are duplicate client names but they exist on different port numbers. If I convert it directly to a hash using client_name as a key it will delete duplicate client names. The MySQL table is indexed based on {name} and {port}.
Is there any way I can convert this into a hash without losing duplicate client names?
You would go through your file, build up the hash like you've done, then push a reference to that hash onto an array. Something like:
foreach my $line ( @lines ) {
    my %foo; # declared inside the loop, so each iteration gets a fresh hash
    # Make your %foo hash.
    push @clients, \%foo;
}
Then afterwards, when you're inserting into your DB, you just iterate through the elements in @clients:
foreach my $client ( #clients ) {
$date = $client->{'date'};
...
}
Edit: If you want to turn this into a hash of hashes, then as you loop through the list of lines, you'd do something like:
foreach my $line ( @lines ) {
    my %foo; # declared inside the loop, so each iteration gets a fresh hash
    # Make your %foo hash.
    $clients{$foo{'port'}} = \%foo;
}
Then you'll have a hash of hashes using the port number as the key.
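Keying by port alone will still collapse two clients that happen to share a port. Since the table is indexed on both {name} and {port}, a nested hash keyed by name and then by port keeps every row. A minimal sketch, assuming a few made-up tab-separated lines in the question's field order (client_name, owner, date, port_number):

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines: client_name, owner, date, port_number.
my @lines = (
    "clientA\talice\t111208\t2222",
    "clientA\talice\t111208\t3333",
    "clientB\tbob\t111209\t2222",
);

my %clients;    # name => { port => { name, owner, date, port } }
for my $line (@lines) {
    my %foo;    # declared inside the loop: a new hash on every iteration
    @foo{qw(name owner date port)} = split /\t/, $line;
    $clients{ $foo{name} }{ $foo{port} } = \%foo;
}

for my $name ( sort keys %clients ) {
    for my $port ( sort { $a <=> $b } keys %{ $clients{$name} } ) {
        my $c = $clients{$name}{$port};
        print "$c->{name} $c->{port} $c->{owner} $c->{date}\n";
    }
}
```

Both clientA rows survive, and clientB on port 2222 does not overwrite clientA on the same port.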
Why not just store it in a list (array)?
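my @records = ();
while (my $line = <INFILE>) {
    chomp $line;
    my @fields = split /\t/, $line;
    push @records => { date  => $fields[2],
                       name  => $fields[0],
                       port  => $fields[3],
                       owner => $fields[1] };
}
for my $record (@records) {
    # Bind values must follow the statement's placeholder order; flattening
    # %$record would pass keys and values in arbitrary order. This assumes
    # an INSERT with placeholders for (name, owner, date, port).
    $insert_query->execute( @{$record}{qw(name owner date port)} );
}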
my @record_list;
while ( <$generic_input> ) {
    chomp;
    my $foo = {};
    @$foo{ qw<name owner date port> } = split /\t/;
    push @record_list, $foo;
}
As a "pipeline" you could do this:
use List::MoreUtils qw<pairwise>;
my @fields = qw<name owner date port>;
my @records
    = map { chomp; +{ pairwise { $a => $b } @fields, @{[ split /\t/ ]} } }
      <$input>
    ;
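If you would rather not depend on List::MoreUtils, the same pipeline can be written with a plain hash slice inside map. A minimal sketch, assuming the question's field order and two made-up input lines read from an in-memory filehandle:

```perl
use strict;
use warnings;

# Field order as described in the question: client_name, owner, date, port_number.
my @fields = qw(name owner date port);

# Hypothetical input; duplicate client names on different ports are kept.
my $input_text = "clientA\talice\t111208\t2222\nclientA\talice\t111208\t3333\n";
open my $input, '<', \$input_text or die "Cannot open: $!";

# Build one hashref per line with a hash slice; no CPAN module needed.
my @records = map {
    chomp( my $line = $_ );
    my %rec;
    @rec{@fields} = split /\t/, $line;
    \%rec;
} <$input>;
close $input;

printf "%s on port %s\n", @{$_}{qw(name port)} for @records;
```

Each record is a separate hashref in an array, so duplicate client names never overwrite each other.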

How to create hash of array in Perl

I have data like this:
Group AT1G01040-TAIR-G
LOC_Os03g02970 69%
Group AT1G01050-TAIR-G
LOC_Os10g26600 85%
LOC_Os10g26633 35%
Group AT1G01090-TAIR-G
LOC_Os04g02900 74%
How can I create a data structure that looks like this:
print Dumper \%big;
$VAR1 = { "Group AT1G01040-TAIR-G" => ['LOC_Os03g02970 69%'],
          "Group AT1G01050-TAIR-G" => ['LOC_Os10g26600 85%','LOC_Os10g26633 35%'],
          "Group AT1G01090-TAIR-G" => ['LOC_Os04g02900 74%']};
This is my attempt, but it fails:
my %big;
while ( <> ) {
    chomp;
    my $line = $_;
    my $head = "";
    my @temp;
    if ( $line =~ /^Group/ ) {
        $head = $line;
        $head =~ s/[\r\s]+//g;
        @temp = ();
    }
    elsif ($line =~ /^\t/){
        my $cont = $line;
        $cont =~ s/[\t\r]+//g;
        push @temp, $cont;
        push @{$big{$head}},@temp;
    };
}
Here's how I'd do it:
my %big;
my $currentGroup;
while (my $line = <> ) {
    chomp $line;
    if ( $line =~ /^Group/ ) {
        $big{$line} = $currentGroup = [];
    }
    elsif ($line =~ s/^\t+//) {
        push @$currentGroup, $line;
    }
}
You should probably add some additional error checking to this, e.g. an else clause to warn about lines that don't match either regex. Also, check to see if $currentGroup is undef before pushing (in case the first line begins with a tab instead of "Group").
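One way to add the checks suggested above is sketched below. It assumes a short in-memory sample in the question's layout (tab-indented member lines) instead of reading from <>, and it also strips a trailing \r in case the file has CRLF line endings; the warning texts are illustrative only.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample in the question's layout: "Group ..." headers, tab-indented members.
my $text = "Group AT1G01040-TAIR-G\n\tLOC_Os03g02970 69%\n"
         . "Group AT1G01050-TAIR-G\n\tLOC_Os10g26600 85%\n\tLOC_Os10g26633 35%\n";
open my $in, '<', \$text or die "Cannot open: $!";

my %big;
my $currentGroup;    # arrayref of the group being filled; persists across lines
while ( my $line = <$in> ) {
    chomp $line;
    $line =~ s/\r\z//;    # tolerate CRLF line endings
    if ( $line =~ /^Group/ ) {
        $big{$line} = $currentGroup = [];
    }
    elsif ( $line =~ s/^\t+// ) {
        if ( defined $currentGroup ) {
            push @$currentGroup, $line;
        }
        else {
            warn "Line $.: member line before any Group header: $line\n";
        }
    }
    elsif ( length $line ) {
        warn "Line $.: unrecognized line: $line\n";
    }
}
close $in;

$Data::Dumper::Sortkeys = 1;
print Dumper \%big;
```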
The biggest problem with your original code is that you're declaring and initializing $head and @temp inside the loop, which means they get reset on every line. Variables that need to persist across lines have to be declared outside the loop, as I've done with $currentGroup.
I'm not quite sure what you're intending to accomplish with the s/[\r\s]+//g; bit. \r is included in \s, so that means the same as s/\s+//g; (which would strip all whitespace), but your desired result hash includes whitespace in your keys. If you want to strip trailing whitespace, you need to include an anchor: s/\s+\z//.
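The difference between the two substitutions is easy to see on a header line with a trailing space and carriage return (a hypothetical input chosen to show both effects):

```perl
use strict;
use warnings;

my $header = "Group AT1G01040-TAIR-G \r";    # hypothetical: trailing space and CR

( my $all_ws_removed   = $header ) =~ s/\s+//g;     # strips every whitespace char
( my $trailing_removed = $header ) =~ s/\s+\z//;    # strips only trailing whitespace

print "[$all_ws_removed]\n";      # [GroupAT1G01040-TAIR-G]
print "[$trailing_removed]\n";    # [Group AT1G01040-TAIR-G]
```

Only the anchored version keeps the space between "Group" and the identifier, which is what the desired hash keys contain.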
Well, I don't want to give you an answer, so I'll just tell you to look at:
perlref
perlreftut
Well, there ya go :-).
You're pushing arrays onto your hash item. You should just be pushing the values. (You don't need @temp at all.)
push @{$big{$head}}, $cont;
Also, $head must be declared outside your loop; otherwise it loses its value after each iteration.
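Applying those two fixes to the original attempt (declaring $head outside the loop and pushing the value directly) might look like the sketch below. The in-memory sample stands in for <>, and trimming only trailing whitespace from $head follows the suggestion in the earlier answer rather than anything in this one.

```perl
use strict;
use warnings;

my $text = "Group AT1G01050-TAIR-G\n\tLOC_Os10g26600 85%\n\tLOC_Os10g26633 35%\n";
open my $in, '<', \$text or die "Cannot open: $!";

my %big;
my $head = "";    # declared outside the loop so it survives between lines
while ( my $line = <$in> ) {
    chomp $line;
    if ( $line =~ /^Group/ ) {
        $head = $line;
        $head =~ s/\s+\z//;    # trim only trailing whitespace (including \r)
    }
    elsif ( $line =~ /^\t/ ) {
        my $cont = $line;
        $cont =~ s/[\t\r]+//g;
        push @{ $big{$head} }, $cont;    # push the value itself; no @temp needed
    }
}
close $in;
```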
