Creating a hash from an array in Perl

I have an array that I want to convert into a hash table. Basically, I want @array[0] to be the keys of the hash, and @array[1] to be the values of the hash. Is there an easy way to do this in Perl? The code I have so far is as follows:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
unless( open(INFILE, "<", 'scratch/Drosophila/fb_synonym_fb_2014_05.tsv')) {
    die "Cannot open file for reading: ", $!;
}
while(<INFILE>) {
    my @values = split();
    #convert values[0] to keys, values[1] to values
}
the file is available for download here

@array[0] (an array slice, used to return multiple elements) is a bad way of writing $array[0] (an array lookup, used to return a single element). use warnings; would have told you this.
To set a hash element, one uses
$hash{$key} = $val;
So the code becomes
my %hash;
while (<>) {
    chomp;
    my @fields = split /\t/;
    $hash{ $fields[0] } = $fields[1];
}
Better yet,
my %hash;
while (<>) {
    chomp;
    my ($key, $val) = split /\t/;
    $hash{$key} = $val;
}
The name of the file implies the fields are tab-separated, not whitespace separated, so I switched
split ' '
to
split /\t/
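This required the addition of chomp, because split ' ' discards the trailing newline along with the other whitespace, while split /\t/ leaves it attached to the last field.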
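The loop above can be tried without the original file. The following is a minimal sketch that reads from an in-memory list instead of a filehandle; the sample lines (key1, value1, and so on) are made up for illustration and are not taken from the asker's file.

```perl
use strict;
use warnings;

# Hypothetical sample records: two tab-separated fields per line.
my @lines = ("key1\tvalue1\n", "key2\tvalue2\n");

my %hash;
for my $line (@lines) {
    chomp $line;                          # drop the trailing newline first
    my ($key, $val) = split /\t/, $line;  # first field is the key, second the value
    $hash{$key} = $val;
}

# Print the pairs in a stable (sorted) order.
print "$_ => $hash{$_}\n" for sort keys %hash;
```

If a key appears on more than one line, the later value silently replaces the earlier one.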

Related

Separating CSV file into key and array

I am new to perl, and I am trying to separate a csv file (has 10 comma-separated items per line) into a key (first item) and an array (9 items) to put in a hash. Eventually, I want to use an if function to match another variable to the key in the hash and print out the elements in the array.
Here's the code I have, which doesn't work right.
use strict;
use warnings;
my %hash;
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
    my ($key, @value) = split (/,/, $line, 2);
    %hash = (
        $key => @value
    );
}
foreach my $key (keys %hash)
{
    print "The key is $key and the array is $hash{$key}\n";
}
Thank you for any help!
Don't use 2 as the third argument to split: it limits the split to two fields, so @value will hold just one element (the whole rest of the line).
Also, by doing %hash =, you're overwriting the hash in each iteration of the loop. Just add a new key/value pair:
$hash{$key} = \@value;
Note the \ sign: you can't store an array directly as a hash value, you have to store a reference to it. When printing the value, you have to dereference it back:
#! /usr/bin/perl
use warnings;
use strict;
my %hash;
while (<DATA>) {
    my ($key, @values) = split /,/;
    $hash{$key} = \@values;
}
for my $key (keys %hash) {
    print "$key => @{ $hash{$key} }";
}
__DATA__
id0,1,2,a
id1,3,4,b
id2,5,6,c
If your CSV file contains quoted or escaped commas, you should use Text::CSV.
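The question also mentions matching another variable against the keys and printing the stored elements. The following is a minimal sketch of that lookup, assuming a hash of array references shaped like the one built above; $wanted and $found are hypothetical names introduced for illustration.

```perl
use strict;
use warnings;

# Each value is an array reference, as in the loop above.
my %hash = (
    id0 => [ 1, 2, 'a' ],
    id1 => [ 3, 4, 'b' ],
);

my $wanted = 'id1';    # hypothetical key to look up

my $found = '';
if (exists $hash{$wanted}) {
    # Dereference the stored array to get its elements back.
    $found = join ', ', @{ $hash{$wanted} };
    print "$wanted => $found\n";
}
else {
    print "$wanted not found\n";
}
```

Using exists rather than testing the value avoids creating the key by accident and still works when the stored array is empty.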
First of all, a hash can hold each key only once, so when you have lines like these in your CSV file:
key1,val11,val12,val13,val14,val15,val16,val17,val18,val19
key1,val21,val22,val23,val24,val25,val26,val27,val28,val29
after adding both key/value pairs with the key 'key1' to the hash, only one pair remains: the one that was added last.
So, to keep all records, you probably need an array of hashes, where the value in each hash is an array reference, like this:
@result = (
    { 'key1' => ['val11','val12','val13','val14','val15','val16','val17','val18','val19'] },
    { 'key1' => ['val21','val22','val23','val24','val25','val26','val27','val28','val29'] },
    { 'and' => ['so on'] },
);
To achieve that, your code should look like this:
use strict;
use warnings;
my @AoH; # array of hashes containing data from CSV
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
    chomp $line; # strip the newline so it does not end up in the last value
    my @string_bits = split (/,/, $line);
    my $key = $string_bits[0]; # first element - key
    my $value = [ @string_bits[1 .. $#string_bits] ]; # rest get into arr ref
    push @AoH, {$key => $value}; # array of hashes structure
}
foreach my $hash_ref (@AoH)
{
    my $key = (keys %$hash_ref)[0]; # get hash key
    my $value = join ', ', @{ $hash_ref->{$key} }; # join array into str
    print "The key is '$key' and the array is '$value'\n";
}

How to create multiple arrays at once in perl

I'm trying to create 23 arrays without typing out @array1, @array2, and so on, and load them each with the variables from the array @r if the $chrid matches the array number (if $chrid=1 it should be placed in @array1). How can I achieve this?
Here is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
my @chr;
my $input;
open ($input, "$ARGV[0]") || die;
while (<$input>) {
    my @r = split(/\t/);
    my $snps = $r[0];
    my $pval = $r[1];
    my $pmid = $r[2];
    my $chrpos = $r[3];
    my $chrid = $r[4];
    for ($chrid) {
        push (@chr, $chrid);
    }
}
close $input;
You can use an array of arrays, where each subarray is stored at a sequentially increasing index in the outer array. Here is what that could look like, though it is still unclear to me what data you want to store:
use warnings;
use strict;
my @chr;
open my $input_fh, '<', $ARGV[0]
    or die "Unable to open $ARGV[0] for reading: $!";
while (<$input_fh>) {
    chomp; # keep the newline out of $chrid, the last field
    # you can unpack your data in a single statement
    my ($snps, $pval, $pmid, $chrpos, $chrid) = split /\t/;
    # unclear what you actually want to store
    push @{ $chr[$chrid] }, ( $snps, $pval, $pmid, $chrpos, $chrid );
}
close $input_fh;
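Since it is unclear what should be stored, here is one possible variation as a sketch: each record is kept as its own array reference under its chromosome index, so the fields of different rows do not run together. The sample rows and their values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical rows: SNP id, p-value, PMID, position, chromosome id.
my @rows = (
    "rs1\t0.01\t111\t1000\t1\n",
    "rs2\t0.02\t222\t2000\t2\n",
    "rs3\t0.03\t333\t3000\t1\n",
);

my @chr;
for my $row (@rows) {
    chomp(my $line = $row);    # work on a copy without the newline
    my ($snps, $pval, $pmid, $chrpos, $chrid) = split /\t/, $line;
    # Store each record as its own array reference under its chromosome.
    push @{ $chr[$chrid] }, [ $snps, $pval, $pmid, $chrpos ];
}

for my $chrid (1 .. $#chr) {
    next unless $chr[$chrid];    # skip chromosome ids with no records
    print "chr $chrid: ", join(', ', map { $_->[0] } @{ $chr[$chrid] }), "\n";
}
```

Index 0 of @chr stays unused here because the chromosome ids are assumed to start at 1.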

Compare two hashes in perl and list which records are extra?

I have two text files that contain user records. I have to compare these two files and figure out which users are missing from File1. And delete these Orphans from file2.
#!/usr/local/bin/perl -w
use strict;
use warnings;
use autodie;
use Text::Diff;
use List::Compare;
use Data::Dumper;
my $Users1 = "Users1.txt";
my $Users2 ="Users2.txt";
my %hash1;
my %hash2;
my %new_hash;
my @sorted_1;
my @sorted_2;
my @list_keys1;
my @list_keys2;
open(my $fh1, '<:encoding(UTF-8)', $Users1) or die "Could not open the file!";
while(my $record1 = <$fh1>)
{
    chomp $record1;
    my @list1 = split( '/', $record1);
    foreach my $item(@list1)
    {
        $new_hash{$list1[1]} = $list1[0];
        $hash1{$list1[1]} = $list1[0];
    }
    while ( my ($key, $value) = each(%hash1) ) {
        push (@list_keys1, $key);
        @sorted_1 = sort @list_keys1;
    }
}
print "\t\tHash values for USERS1:\n";
print Dumper \%hash1;
open(my $fh2, '<:encoding(UTF-8)', $Users2) or die "Could not open the file!";
while(my $record2 = <$fh2>)
{
    chomp $record2;
    my @list2 = split( '/', $record2);
    foreach my $item(@list2)
    {
        $hash2{$list2[1]} = $list2[0];
    }
    while ( my ($key, $value) = each(%hash2) )
    {
        push (@list_keys2, $key);
        @sorted_2 = sort @list_keys2;
    }
}
print "\n\n\t\tHash values for Users2:\n";
print Dumper \%hash2;
@hash1{@list_keys1} = 1;
@hash2{@list_keys2} = 1;
foreach(keys %hash2)
{
    print "\nThis user does not exist(to be deleted): $_\n" unless exists $hash1{$_};
}
foreach (keys %hash1)
{
    print "\nNew User (to be added):$_\n" unless exists $hash2{$_};
}
close ($fh1);
close ($fh2);
Questions:
I am not able to sort the user IDs (strings) alphabetically (here, user IDs are random strings of length 7). Are there any limitations when it comes to sorting arrays/hashes in Perl?
I am not able to compare two hashes and get the differences. What would be the most efficient way to do that?
Are there any additional libraries that I need to install in order to handle this part of code?
Sample records from file:
File1:
ASIA/ASEDF46
INDIA/PSDfT5V
CHINA/FSDfT5V
INDIA/AA44TYB
USA/BBRTT67
File 2:
INDIA/PSDfT5V
CHINA/FSDfT5V
INDIA/AA44TYB
USA/BBRTT67
UK/ZK9EELO
use strict;
use warnings;
use autodie;
open my $in, '<', 'in.txt';
open my $in2, '<', 'in_2.txt';
my (%data1, %data2);
while(<$in>){
    chomp;
    my @split = split/\//;
    $data1{$split[0]} = $split[1];
}
while(<$in2>){
    chomp;
    my @split = split/\//;
    $data2{$split[0]} = $split[1];
}
foreach(sort keys %data1){
    print "User: $_ Value: $data1{$_}\n" if $data2{$_};
}
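To list the differences the question asks for, one option is to key both hashes by user ID (the regions repeat, so they make poor keys) and test each key with exists. The following is a minimal sketch, assuming the sample records from the question are held in two in-memory lists; @file1, @file2, %users1, %users2, @only_in_1 and @only_in_2 are names made up for illustration.

```perl
use strict;
use warnings;

# Sample records from the question, as "REGION/USERID" strings.
my @file1 = qw(ASIA/ASEDF46 INDIA/PSDfT5V CHINA/FSDfT5V INDIA/AA44TYB USA/BBRTT67);
my @file2 = qw(INDIA/PSDfT5V CHINA/FSDfT5V INDIA/AA44TYB USA/BBRTT67 UK/ZK9EELO);

# Key each hash by user ID (the part after the slash).
my %users1 = map { my ($region, $id) = split m{/}; ($id => $region) } @file1;
my %users2 = map { my ($region, $id) = split m{/}; ($id => $region) } @file2;

# IDs present in one hash but not in the other, sorted alphabetically.
my @only_in_2 = sort { $a cmp $b } grep { !exists $users1{$_} } keys %users2;
my @only_in_1 = sort { $a cmp $b } grep { !exists $users2{$_} } keys %users1;

print "To be deleted: @only_in_2\n";
print "To be added: @only_in_1\n";
```

This uses only core Perl. The built-in sort orders strings by character code by default, so uppercase letters sort before lowercase ones.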

Perl Hashes of Arrays and Some issues

I currently have a csv file that looks like this:
a,b
a,d
a,f
c,h
c,d
So I saved these into a hash such that the key "a" is an array with "b,d,f" and the key "c" is an array with "h,d"... this is what I used for that:
while(<$fh>)
{
    chomp;
    my @row = split /,/;
    my $cat = shift @row;
    $category = $cat if (!($cat eq $category)) ;
    push @{$hash{$category}}, @row;
}
close($fh);
Not sure about the efficiency but it seems to work when I do a Data Dump...
Now, the issue I'm having is this; I want to create a new file for each key, and in each of those files I want to print every element in the key, as such:
file "a" would look like this:
b
d
f
<end of file>
Any ideas? Everything I've tried isn't working, I'm not too familiar / experienced with hashes...
Thanks in advance :)
The output process is very simple using the each iterator, which provides the key and value pair for the next hash element in a single call:
use strict;
use warnings;
use autodie;
open my $fh, '<', 'myfile.csv';
my %data;
while (<$fh>) {
    chomp;
    my ($cat, $val) = split /,/;
    push @{ $data{$cat} }, $val;
}
while (my ($cat, $values) = each %data) {
    open my $out_fh, '>', $cat;
    print $out_fh "$_\n" for @$values;
}
#!/usr/bin/perl
use strict;
use warnings;
my %foos_by_cat;
{
    open(my $fh_in, '<', ...) or die $!;
    while (<$fh_in>) {
        chomp;
        my ($cat, $foo) = split /,/;
        push @{ $foos_by_cat{$cat} }, $foo;
    }
}
for my $cat (keys %foos_by_cat) {
    open(my $fh_out, '>', $cat) or die $!;
    for my $foo (@{ $foos_by_cat{$cat} }) {
        print($fh_out "$foo\n");
    }
}
I wrote the inner loop as I did to show the symmetry between reading and writing, but it can also be written as follows:
print($fh_out "$_\n") for @{ $foos_by_cat{$cat} };

Check words and synonyms

I have an array with some words, and another array with words and synonyms. I'd like to create a third array from the entries of the second array that contain a word from the first. I tried with grep, but I'm not able to write the code properly to get what I want.
The problem is that elements of array 1 can be found in array 2 at the beginning, but also at the end or in the middle.
Maybe it's easier with an example:
@array1 = qw(chose, abstraction);
@array2 = (
    "inspirer respirer",
    "incapable",
    "abstraction",
    "abaxial",
    "cause,chose,objet",
    "ventral",
    "chose,objet"
);
The result should be
@array3 = ("abstraction", "cause,chose,objet", "chose,objet");
Is it right to use "grep"?
I'm not able to write the right syntax to solve the problem.
Thank you
You can construct a regular expression from array1, then filter array2 using it:
#!/usr/bin/perl
use warnings;
use strict;
my @array1 = qw(chose abstraction); # no commas inside qw(), they would become part of the words
my @array2 = (
    "inspirer respirer",
    "incapable",
    "abstraction",
    "abaxial",
    "cause,chose,objet",
    "ventral",
    "chose,objet"
);
my $regex = join '|', map quotemeta($_), @array1; # quotemeta needed for special characters.
$regex = qr/$regex/;
my @array3 = grep /$regex/, @array2;
print "$_\n" for @array3;
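One caveat with a plain alternation is that it also matches inside longer words. A possible variant, assuming only whole-word matches are wanted, wraps the alternation in \b anchors. This is a sketch rather than part of the original answer, and the entry "chosen,picked" is a made-up example added to show the difference.

```perl
use strict;
use warnings;

my @array1 = qw(chose abstraction);
my @array2 = (
    "inspirer respirer",
    "abstraction",
    "cause,chose,objet",
    "chosen,picked",    # hypothetical entry: contains "chose" only as part of a longer word
);

# Escape each word, then require word boundaries around the alternation.
my $alternation = join '|', map { quotemeta($_) } @array1;
my $regex       = qr/\b(?:$alternation)\b/;

my @array3 = grep { /$regex/ } @array2;
print "$_\n" for @array3;
```

Note that \b is defined in terms of word characters, so words containing non-ASCII letters may need the input decoded as Unicode for the boundaries to fall where expected.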
I know you have an answer but here is a fun way I thought of.
So, I guess it is like an inverted index.
You take each set of synonyms and make them into an array. Then take each element of that array and put it into a hash as the keys with the value being a reference to the array.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @array1 = qw(chose abstraction);
my @array2 = ("inspirer respirer",
    "incapable",
    "abstraction",
    "abaxial",
    "cause,chose,objet",
    "ventral",
    "chose,objet"
);
my @array;
push @array, map { /,|\s/ ? [split(/,|\s/, $_)]:[$_] } @array2;
my %construct;
while(my $array_ref = shift(@array)){
    for(@{ $array_ref }){
        push @{ $construct{$_} }, $array_ref;
    }
}
my @array3 = map { s/,//; (@{ $construct{$_} }) } @array1;
print join(', ', @{ $_ }), "\n" for (@array3);
EDIT:
I missed a part of the answer before; this one should be complete.
