Is it possible to put elements of an array into a hash in Perl?

example of file content:
>random sequence 1 consisting of 500 residues.
VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
>random sequence 2 consisting of 500 residues.
KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN
KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI
YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV
HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;
>random sequence 3 consisting of 500 residues.
AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK
I need an algorithm to build a hash $hash{$key} = $value; where lines starting with > are the values and following lines are the keys.
What I have tried:
open (DATA, "seq-at.txt") or die "blabla";
@data = <DATA>;
%result = ();
$k = 0;
$i = 0;
while($k != @data) {
    $info = @data[$k]; # deletes the first element
    if(@data[$i] !=~ ">") {
        $key .= @data[$i]; $i++;
    } else {
        $k = $i;
    }
    $result{$key} = $value;
}
but it doesn't work.

You don't need to read the file into an array first; you can build the hash directly:
use strict;
use warnings;
# ^- always start your code like this to catch errors and ambiguous constructs
# declare your variables with "my" to give them a scope
my $filename = 'seq-at.txt';
# use the three-argument form of open so the file can't be overwritten by accident:
open my $fh, '<', $filename or die "unable to open '$filename': $!";
my %hash;
my $hkey = '';
my $hval = '';
while (<$fh>) {
    chomp; # remove the trailing newline
    if (/^>/) { # when the line starts with ">"
        # store the key/value pair in the hash if the key isn't empty
        # (the key is empty when the first ">" is encountered)
        $hash{$hkey} = $hval if $hkey;
        # store the line in $hval and clear $hkey
        ($hval, $hkey) = ($_, '');
    } elsif (/\S/) { # when the line isn't empty (or blank)
        # append the line to the key
        $hkey .= $_;
    }
}
# store the last key/value pair in the hash, if any
$hash{$hkey} = $hval if $hkey;
# display the hash
foreach (keys %hash) {
    print "key: $_\nvalue: $hash{$_}\n\n";
}

It is unclear exactly what you want; the array seems to be the lines that follow each "random sequence" header. If the contents of a file test.txt are:
Line 1:>random sequence 1 consisting of 500 residues.
Line 2:VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV
Line 3:HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;
You could try something like:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $contentFile = $ARGV[0];
my %testHash = ();
my $currentKey = "";
open(my $contentFH, "<", $contentFile) or die "Cannot open '$contentFile': $!";
while (my $contentLine = <$contentFH>) {
    chomp($contentLine);
    next if $contentLine eq ''; # Skip empty lines.
    if ($contentLine =~ /^>(.*)/) { # Header line: the text after ">" becomes the key.
        $currentKey = $1;
    } else {
        push(@{$testHash{$currentKey}}, $contentLine);
    }
}
print Dumper(\%testHash);
Which results in a structure like this:
seb@amon:[~]$ perl test.pl test.txt
$VAR1 = {
'random sequence 3 consisting of 500 residues.' => [
'AYCSCWRIHNVVFQKDVVLGYWGHCWMSWGSMNQPFHRQPYNKYFCMAPDWCNIGTYAWK'
],
'random sequence 1 consisting of 500 residues.' => [
'VILVWRISEMNPTHEIYPEVSYEDRQPFRCFDEGINMQMGQKSCRNCLIFTRNAFAYGIV',
'HFLEWGILLTHIIHCCHQIQGGCDCTRHPVRFYPQHRNDDVDKPCQTKSPMQVRYGDDSD;'
],
'random sequence 2 consisting of 500 residues.' => [
'KAAATKKPWADTIPYLLCTFMQTSGLEWLHTDYNNFSSVVCVRYFEQFWVQCQDHVFVKN',
'KNWHQVLWEEYAVIDSMNFAWPPLYQSVSSNLDSTERMMWWWVYYQFEDNIQIRMEWCNI',
'YSGFLSREKLELTHNKCEVCVDKFVRLVFKQTKWVRTMNNRRRVRFRGIYQQTAIQEYHV',
'HQKIIRYPCHVMQFHDPSAPCDMTRQGKRMNFCFIIFLYTLYEVKYWMHFLTYLNCLEHR;'
]
};
You would basically be using each hash value as an array; the @{ ... } dereference around $testHash{$currentKey} does the magic.
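As a minimal sketch of what @{ ... } does here (the hash name %seqs and the sample keys and values are made up for illustration): pushing onto @{ $seqs{$key} } creates an array reference the first time a key is seen, and the same syntax dereferences it again when reading.

```perl
use strict;
use warnings;

my %seqs;    # hypothetical hash: header => array ref of sequence lines

# push autovivifies $seqs{$key} as an array reference on first use
push @{ $seqs{'seq 1'} }, 'VILVW';
push @{ $seqs{'seq 1'} }, 'HFLEW';
push @{ $seqs{'seq 2'} }, 'KAAAT';

# @{ ... } dereferences the array reference stored as the hash value
my $count  = scalar @{ $seqs{'seq 1'} };
my $joined = join '', @{ $seqs{'seq 1'} };

print "seq 1 has $count lines: $joined\n";
```

Joining the pieces this way also gives one string per header, if a single sequence per key is what you are after.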

Related

ARRAY(0x7ff4bbb0c7b8) error: perl hash of arrays

Although my code runs without throwing a fatal error, the output is clearly erroneous. I first create a hash of arrays. Then I search sequences in a file against the keys in the hash. If the sequence exists as a key in the hash, I print the key and the associated values. This should be simple enough and I am creating the hash of arrays correctly. However, when I print the associated values I get "ARRAY(0x7ff4bbb0c7b8)" in its place.
The file "INFILE" is tab-delimited and looks like this, for example:
AAAAA AAAAA
BBBBB BBBBB BBBBB
Here is my code:
use strict;
use warnings;
open(INFILE, '<', '/path/to/file') or die $!;
my $count = 0;
my %hash = (
AAAAA => [ "QWERT", "YUIOP" ],
BBBBB => [ "ASDFG", "HJKL", "ZXCVB" ],
);
while (my $line = <INFILE>){
chomp $line;
my $hash;
my @elements = split "\t", $line;
my $number = grep exists $hash{$_}, @elements;
open my $out, '>', "/path/out/Cluster__Number$._$number.txt" or die $!;
foreach my $sequence(@elements){
if (exists ($hash{$sequence})){
print $out ">$sequence\n$hash{$sequence}\n";
}
else
{
$count++;
print "Key Doesn't Exist ", $count, "\n";
}
}
}
Current output looks like:
>AAAAA
ARRAY(0x7fc52a805ce8)
>AAAAA
ARRAY(0x7fc52a805ce8)
Expected output will look like:
>AAAAA
QWERT
>AAAAA
YUIOP
Thank you very much for your help.
In this line:
print $out ">$sequence\n$hash{$sequence}\n";
...$hash{$sequence} is a reference to an array. You have to dereference the referenced array before printing it. Here's an example of printing $sequence, then printing the elements of the $hash{$sequence} array on the following line, with the elements separated by a comma:
print $out ">$sequence\n";
print $out join ', ', @{ $hash{$sequence} };
The key here is to work with the arrayref held by the hash rather than just trying to print it. No matter what, you are going to want to remove the first item from the array, which you can do with the shift function. You can then either push the item back onto the end of the array, or delete the key from the hash when there are no more items, depending on what you want to happen once all items have been used. You could also choose a random element from the array with the rand function like this:
my $out_seq = $hash{$sequence}[rand @{ $hash{$sequence} }];
If you wanted the items to run out in the random case, you would need to remove the item from the array. The best way to do that is probably with splice (the generic form of shift, unshift, pop, and push):
my $out_seq = splice @{ $hash{$sequence} }, rand @{ $hash{$sequence} }, 1;
delete $hash{$sequence} unless @{ $hash{$sequence} };
Here is my version of your program:
#!/usr/bin/perl
use strict;
use warnings;
# open my $in, '<', '/path/to/file' or die $!;
my $in = \*DATA; # use the internal data section instead for testing
my $count = 0;
my %hash = (
    AAAAA => [ "QWERT", "YUIOP" ],
    BBBBB => [ "ASDFG", "HJKL", "ZXCVB" ],
);
while (<$in>) {
    chomp;
    my @elements = split "\t";
    my $number = grep exists $hash{$_}, @elements;
    # open my $out, '>', "/path/out/Cluster__Number$._$number.txt" or die $!;
    my $out = \*STDOUT; # likewise use STDOUT for testing
    for my $sequence (@elements) {
        if (exists $hash{$sequence}) {
            my $out_seq = shift @{ $hash{$sequence} };
            # if you want to repeat
            push @{ $hash{$sequence} }, $out_seq;
            # if you want to remove $sequence when they run out
            # delete $hash{$sequence} unless @{ $hash{$sequence} };
            print $out ">$sequence\n$out_seq\n";
        } else {
            warn "Key [$sequence] Doesn't Exist ", ++$count, "\n";
        }
    }
}
__DATA__
AAAAA AAAAA
CCCCC
BBBBB BBBBB BBBBB

Perl : matching the contents of a file with the contents of an array

I have an array @arr1 where each element is of the form #define A B.
I have another file, f1 with contents:
#define,x,y
#define,p,q
and so on. I need to check if the second value of every line (y, q etc) matches the first value in any element of the array. Example: say the array has an element #define abc 123 and the file has a line #define,hij,abc.
When such a match occurs, I need to add the line #define hij 123 to the array.
while(<$fhDef>) #Reading the file
{
chomp;
$_ =~ tr/\r//d;
if(/#define,(\w+),(\w+)/)
{
my $newLabel = $1;
my $oldLabel = $2;
push @oldLabels, $oldLabel;
push @newLabels, $newLabel;
}
}
foreach my $x(@tempX) #Reading the array
{
chomp $x;
if($x =~ /#define\h{1}\w+\h*0x(\w+)\h*/)
{
my $addr = $1;
unless(grep { $x =~ /$_/ } @oldLabels)
{
next;
}
my $index = grep { $oldLabels[$_] eq $_ } 0..$#oldLabels;
my $new1 = $newLabels[$index];
my $headerLabel1 = $headerLabel."X_".$new1;
chomp $headerLabel1;
my $headerLine = "#define ".$headerLabel1."0x".$addr;
push @tempX, $headerLine;
}
}
This just hangs. No doubt I'm missing something right in front of me, but what??
The canonical way is to use a hash. Hash the array, using the first argument as the key. Then walk the file and check for existence of the key in the hash. I used a HoA (hash of arrays) to handle multiple values for each key (see the last two lines).
#! /usr/bin/perl
use warnings;
use strict;
my @arr1 = ( '#define y x',
             '#define abc 123',
           );
my %hash;
for (@arr1) {
    my ($arg1, $arg2) = (split ' ')[1, 2];
    push @{ $hash{$arg1} }, $arg2;
}
while (<DATA>) {
    chomp;
    my ($arg1, $arg2) = (split /,/)[1, 2];
    if ($hash{$arg2}) {
        print "#define $arg1 $_\n" for @{ $hash{$arg2} };
    }
}
__DATA__
#define,x,y
#define,p,q
#define,hij,abc
#define,klm,abc
As the other answer said, it's better to use a hash. Also, keep in mind that you're doing a
foreach my $x(@tempX)
but you're also doing a
push @tempX, $headerLine;
which means that you're modifying the array while iterating over it. This is not just bad practice: each pushed line is itself visited by the loop later on, so you most likely end up with an infinite loop.

Separating CSV file into key and array

I am new to Perl, and I am trying to separate a CSV file (10 comma-separated items per line) into a key (the first item) and an array (the other 9 items) to put in a hash. Eventually, I want to use an if statement to match another variable against the keys in the hash and print out the elements of the matching array.
Here's the code I have, which doesn't work right.
use strict;
use warnings;
my %hash;
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
my ($key, @value) = split (/,/, $line, 2);
%hash = (
$key => @value
);
}
foreach my $key (keys %hash)
{
print "The key is $key and the array is $hash{$key}\n";
}
Thank you for any help!
Don't use 2 as the third argument to split: it limits the split to two fields, so @value will contain only one element.
Also, by assigning to %hash, you're overwriting the whole hash in each iteration of the loop. Just add a new key/value pair:
$hash{$key} = \@value;
Note the \ sign: you can't store an array directly as a hash value; you have to store a reference to it. When printing the value, you have to dereference it again:
#! /usr/bin/perl
use warnings;
use strict;
my %hash;
while (<DATA>) {
    my ($key, @values) = split /,/;
    $hash{$key} = \@values;
}
for my $key (keys %hash) {
    print "$key => @{ $hash{$key} }";
}
__DATA__
id0,1,2,a
id1,3,4,b
id2,5,6,c
If your CSV file contains quoted or escaped commas, you should use Text::CSV.
First of all, hash keys are unique, so when you have lines like these in your CSV file:
key1,val11,val12,val13,val14,val15,val16,val17,val18,val19
key1,val21,val22,val23,val24,val25,val26,val27,val28,val29
after adding both key/value pairs with the key 'key1' to the hash, only one pair remains in the hash: the one that was added last.
So, to keep all records, you probably need an array-of-hashes structure in which the value of each hash is an array reference, like this:
@result = (
{ 'key1' => ['val11','val12','val13','val14','val15','val16','val17','val18','val19'] },
{ 'key1' => ['val21','val22','val23','val24','val25','val26','val27','val28','val29'] },
{ 'and' => ['so on'] },
);
To achieve that, your code would become something like this:
use strict;
use warnings;
my @AoH; # array of hashes containing data from CSV
my $in2 = "metadata1.csv";
open IN2, "<$in2" or die "Cannot open the file: $!";
while (my $line = <IN2>) {
    chomp $line; # drop the newline so it doesn't end up in the last value
    my @string_bits = split (/,/, $line);
    my $key = $string_bits[0]; # first element - key
    my $value = [ @string_bits[1 .. $#string_bits] ]; # rest get into arr ref
    push @AoH, {$key => $value}; # array of hashes structure
}
foreach my $hash_ref (@AoH)
{
    my $key = (keys %$hash_ref)[0]; # get hash key
    my $value = join ', ', @{ $hash_ref->{$key} }; # join array into str
    print "The key is '$key' and the array is '$value'\n";
}
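If you still want to look records up by key, another option is a hash whose values are arrays of rows, so repeated keys accumulate instead of overwriting each other. This is only a sketch: the name %rows_for and the shortened sample lines are assumptions for illustration, not part of the original data.

```perl
use strict;
use warnings;

# shortened sample lines standing in for the CSV file
my @lines = ('key1,val11,val12', 'key1,val21,val22', 'key2,val31,val32');

my %rows_for;    # key => array ref of rows, each row an array ref of values

for my $line (@lines) {
    my ($key, @values) = split /,/, $line;
    push @{ $rows_for{$key} }, \@values;    # keep every row, even for repeated keys
}

for my $key (sort keys %rows_for) {
    for my $row (@{ $rows_for{$key} }) {
        print "$key => ", join(', ', @$row), "\n";
    }
}
```

Because @values is declared inside the loop, each row gets its own fresh array, so storing \@values is safe.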

I need to create an out files out of my hash keys and store a file list to the files based on my keys

I have two files. The first holds number ranges with a version name for each range; the second holds a list of numbers. From each line of the second file I take the 9 characters starting at offset 11, compare that number with the ranges in the first file (the "range file"), and then print to the screen the name of each version and how many matches it had.
My first file looks like this
imb,folded ,655575645,827544086
imb,selfmail ,827549192,827572977
My second file looks like this
0026110795165557564528452972062
0026110795165557648628452974959
0026110795182749420290503162401
0026110795182749566690703875348
0026110795182750564290503365856
0026110795182751155490713282618
0026110795182751819190503415474
0026110795182752054790503331977
0026110795182752888194578410931
0026110795182753115893308242647
0026110795182753522398248322033
0026110795182753601890723246006
0026110795182754156995403760702
0026110795182754174597213102232
0026110795182754408698248770395
0026110795182754919290713221614
0026110795182755128698248922635
0026110795182755566790713334451
0026110795182755669490713213633
0026110795182755806390507009696
0026110795182756204890713212248
0026110795182756217690713273839
0026110795182756259998248961157
0026110795182756309595403769515
0026110795182756708894578164887
0026110795182756829090713282238
0026110795182757082791367220156
0026110795182757130090713274108
0026110795182757297798248934527
0026110795182757370277063564556
My output now looks like this
folded IMB Count: 15
No Matched IMB Count: 1
selfmail IMB Count: 14
I need to create files named after the version names from my first file, and print to each file the original values that matched it. For instance, folded has 15 matches, so I need to print those original numbers from the file list to a file named folded.txt.
my code is
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
sub trimspaces {
my @argsarray = @_;
$argsarray[0] =~ s/^\s+//;
$argsarray[0] =~ s/\s+$//;
return $argsarray[0];
}
open(INPUT , "< D:\\Home\\emahou\\imbfilelist.txt") or die $!;
open(INPUT2 , "< D:\\Home\\emahou\\imbrange.txt") or die $!;
my $n;
my @fh;
my $value;
my @ranges;
my $isMatch;
my $printed;
my $fVersion;
my %versionHash=();
while (<INPUT2>) {
chomp;
my ($version, $from, $to) = (split /,/)[ 1, 2, 3 ];
push @ranges, [ $from, $to, trimspaces($version)];
if (!exists $versionHash{trimspaces($version)})
{
$versionHash{trimspaces($version)}=0;
}
}
$versionHash{"No Matched"}=0;
close INPUT2;
while (<INPUT>) {
$isMatch=0;
$n = substr($_,12-1,9);
for my $r (@ranges) {
if ( $n >= $r->[0] && $n <= $r->[1]) {
$fVersion=$r->[2];
if (exists $versionHash{$fVersion}) {
$versionHash{$fVersion}++;
}
$isMatch=1;
last;
}
}
if (!$isMatch) {
$versionHash{"No Matched"}++;
}
}
foreach my $key (keys %versionHash) {
print STDOUT "$key IMB Count: " . $versionHash{$key} . "\n";
}
close INPUT;
This seems to do as you ask.
It works by building a hash %filelist with keys from the second column of imbrange.txt and values from, to, fh (the output file handle) and count (the number of records that matched this range).
Then imbfilelist.txt is read a line at a time, the nine-digit code is extracted and compared with the from and to values of each element of the %filelist hash. If a match is found, the line is printed to the corresponding file handle and the counter is incremented. If the code from this line doesn't match any of the ranges, $none_matched is incremented for output in the summary.
use strict;
use warnings;
use 5.010;
use autodie;
chdir 'D:\Home\emahou';
# Build a hash of `version` strings with their `from` and `to` values
open my $fh, '<', 'imbrange.txt';
my %filelist;
while ( <$fh> ) {
    chomp;
    my ($version, $from, $to) = (split /\s*,\s*/)[1,2,3];
    $filelist{$version} = { from => $from, to => $to };
}
# Open an output file for each item and set the count to zero
while ( my ($version, $info) = each %filelist ) {
    open $info->{fh}, '>', "$version.txt";
    $info->{count} = 0;
}
# Filter the data in the file list, printing to the
# appropriate output file and keeping count
open $fh, '<', 'imbfilelist.txt';
my $none_matched = 0;
while ( my $line = <$fh> ) {
    next unless $line =~ /\S/;
    chomp $line;
    my $code = substr $line, 11, 9;
    my $matched = 0;
    while ( my ($version, $info) = each %filelist ) {
        next unless $code >= $info->{from} and $code <= $info->{to};
        print { $info->{fh} } $line, "\n";
        ++$info->{count};
        ++$matched;
    }
    ++$none_matched unless $matched;
}
close $_->{fh} for values %filelist;
# Print the summary
while ( my ($version, $info) = each %filelist ) {
    print "$version IMB Count: $info->{count}\n";
}
print "None matched IMB Count: $none_matched\n";
output
selfmail IMB Count: 14
folded IMB Count: 15
None matched IMB Count: 1
folded.txt
0026110795165557564528452972062
0026110795165557648628452974959
0026110795182749420290503162401
0026110795182749566690703875348
0026110795182750564290503365856
0026110795182751155490713282618
0026110795182751819190503415474
0026110795182752054790503331977
0026110795182752888194578410931
0026110795182753115893308242647
0026110795182753522398248322033
0026110795182753601890723246006
0026110795182754156995403760702
0026110795182754174597213102232
0026110795182754408698248770395
selfmail.txt
0026110795182754919290713221614
0026110795182755128698248922635
0026110795182755566790713334451
0026110795182755669490713213633
0026110795182755806390507009696
0026110795182756204890713212248
0026110795182756217690713273839
0026110795182756259998248961157
0026110795182756309595403769515
0026110795182756708894578164887
0026110795182756829090713282238
0026110795182757082791367220156
0026110795182757130090713274108
0026110795182757297798248934527

Read CSV file and save in 2 d array

I am trying to read a huge CSV file in 2 D array, there must be a better way to split the line and save it in the 2 D array in one step :s
Cheers
my $j = 0;
while (<IN>)
{
chomp ;
my @cols=();
@cols = split(/,/);
shift(@cols) ; #to remove the first number which is a line header
for(my $i=0; $i<11; $i++)
{
$array[$i][$j] = $cols[$i];
}
$j++;
}
CSV is not trivial. Don't parse it yourself. Use a module like Text::CSV, which will do it correctly and fast.
use strict;
use warnings;
use Text::CSV;
my @data; # 2D array for CSV data
my $file = 'something.csv';
my $csv = Text::CSV->new;
open my $fh, '<', $file or die "Could not open $file: $!";
while( my $row = $csv->getline( $fh ) ) {
    shift @$row; # throw away first value
    push @data, $row;
}
That will get all your rows nicely in @data, without worrying about parsing CSV yourself.
If you ever find yourself reaching for the C-style for loop, then there's a good chance that your program design can be improved.
while (<IN>) {
chomp;
    my @cols = split(/,/);
    shift(@cols); #to remove the first number which is a line header
    push @array, \@cols;
}
This assumes that you have a CSV file that can be processed with a simple split (i.e. the records contain no embedded commas).
Aside: You can simplify your code with:
my @cols = split /,/;
Your assignment to $array[$col][$row] uses an unusual subscript order; it complicates life.
With your column/row assignment order in the array, I don't think there's a simpler way to do it.
Alternative:
If you were to reverse the order of the subscripts in the array ($array[$row][$col]), you could think about using:
use strict;
use warnings;
my @array;
for (my $j = 0; <>; $j++) # For testing I used <> instead of <IN>
{
    chomp;
    $array[$j] = [ split /,/ ];
    shift @{$array[$j]}; # Remove the line label
}
for (my $i = 0; $i < scalar(@array); $i++)
{
    for (my $j = 0; $j < scalar(@{$array[$i]}); $j++)
    {
        print "array[$i,$j] = $array[$i][$j]\n";
    }
}
Sample Data
label1,1,2,3
label2,3,2,1
label3,2,3,1
Sample Output
array[0,0] = 1
array[0,1] = 2
array[0,2] = 3
array[1,0] = 3
array[1,1] = 2
array[1,2] = 1
array[2,0] = 2
array[2,1] = 3
array[2,2] = 1
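If the column-first layout from the question ($array[$col][$row]) is really needed, one option is to read the rows as above and transpose them afterwards. The following is a minimal sketch under that assumption; @rows holds the sample data with the labels already removed, and @cols is a hypothetical name for the transposed result.

```perl
use strict;
use warnings;

my @rows = ([1, 2, 3], [3, 2, 1], [2, 3, 1]);    # rows as read, labels removed
my @cols;                                        # transposed: $cols[$col][$row]

for my $r (0 .. $#rows) {
    for my $c (0 .. $#{ $rows[$r] }) {
        $cols[$c][$r] = $rows[$r][$c];
    }
}

print "column $_: @{ $cols[$_] }\n" for 0 .. $#cols;
```

This keeps the reading loop simple and confines the unusual subscript order to a single step.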
