How to count the number of keys that exist in a hash? - arrays

I am working with an input file that contains tab-delimited sequences. Groups of sequences are separated by line breaks. The file looks like:
TAGC TAGC TAGC HELP
TAGC TAGC TAGC
TAGC HELP
TAGC
Here is the code I have:
use strict;
use warnings;
open(INFILE, "<", "/path/to/infile.txt") or die $!;
my %hash = (
TAGC => 'THIS_EXISTS',
GCTA => 'THIS_DOESNT_EXIST',
);
while (my $line = <INFILE>){
chomp $line;
my $hash;
my @elements = split "\t", $line;
open my $out, '>', "/path/to/outfile.txt" or die $!;
foreach my $sequence(@elements){
if (exists $hash{$sequence}){
print $out ">$sequence\n$hash{$sequence}\n";
}
else
{
$count++;
print "Doesn't exist ", $count, "\n";
}
}
}
How can I tell how many sequences exist before I print? I need to put that information into the name of the output file.
Ideally, I would have a variable that I could include in the name of the file. Unfortunately, I can't just take the scalar of @elements because there are some sequences that won't get printed out. When I try to push the keys that exist into an array and then print the scalar of that array, I still don't get the results I need. Here is what I tried (all variables that need to be global are):
open my $out, '>', "/path/to/file.$number.txt" or die $!;
foreach my $sequence(@elements){
if (exists $hash{$sequence}){
push(@Array, $hash{$sequence}, "\n");
my $number = @Array;
print $out ">$sequence\n$hash{$sequence}\n";
#....
Thanks for the help. Really appreciate it.

my $sequences = grep exists $hash{$_}, @elements;
open my $out, '>', "/path/to/outfile_containing_$sequences.txt" or die $!;
In list context, grep filters a list by a criterion; in scalar context, it returns a count of elements that met the criterion.
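The difference between the two contexts can be seen in a minimal sketch. The hash and the element list below are made-up sample data modelled on the question, not the asker's real input:

```perl
use strict;
use warnings;

# Hypothetical sample data, modelled on the question.
my %hash     = ( TAGC => 'THIS_EXISTS' );
my @elements = qw(TAGC TAGC TAGC HELP);

# List context: grep returns the matching elements themselves.
my @found = grep { exists $hash{$_} } @elements;

# Scalar context: grep returns how many elements matched.
my $count = grep { exists $hash{$_} } @elements;

print "found: @found\n";   # found: TAGC TAGC TAGC
print "count: $count\n";   # count: 3
```

Because the count is known before any output is written, it can go straight into the file name passed to open.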

The easiest way would be to keep track of how many keys you are printing in a variable and, once your loop finishes, rename the file using the number you calculated. Perl's built-in rename function does this. The code would be something like this:
use strict;
use warnings;
open(INFILE, "<", "/path/to/infile.txt") or die $!;
my %hash = (
TAGC => 'THIS_EXISTS',
GCTA => 'THIS_DOESNT_EXIST',
);
my $ammt = 0;
while (my $line = <INFILE>){
chomp $line;
my @elements = split "\t", $line;
open my $out, '>', "/path/to/outfile.txt" or die $!;
foreach my $sequence(@elements){
if (exists $hash{$sequence}){
print $out ">$sequence\n$hash{$sequence}\n";
# count every sequence that is actually printed
$ammt++;
}
else
{
print "Doesn't exist\n";
}
}
}
rename "/path/to/outfile.txt", "/path/to/outfile${ammt}.txt" or die $!;
I removed the $count variable, since it's not declared in your code (strict would complain about that). See the official documentation for rename. Since it returns true on success and false on failure, you can check whether it succeeded.
By the way, be aware that:
push(@Array, $hash{$sequence}, "\n");
is storing two items ($hash{$sequence} and \n), so that count would be twice what it should be.
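A short sketch makes the doubling visible. The values A, B and C are hypothetical placeholders for the hash values in the question:

```perl
use strict;
use warnings;

my @with_newline;
my @values_only;

for my $value (qw(A B C)) {            # hypothetical values
    push @with_newline, $value, "\n";  # two elements per iteration
    push @values_only,  $value;        # one element per iteration
}

print scalar(@with_newline), "\n";     # 6
print scalar(@values_only),  "\n";     # 3
```

Pushing only the value and adding the newline at print time keeps the array size equal to the number of matches.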

Related

Convert string into array perl

I have a script which takes headers of a multi-fasta file and pushes them into an array. Then I want to loop through this array to find a specific pattern and perform some commands.
open(FH, '<', $ref_seq) or die $!;
while(<FH>){
$line = $_;
chomp $line;
if(m/^>([^\s]+)/){
$ref_header = $1;
print "$ref_header\n";
chomp $header;
if($1 eq $header){
$ref_header = $header;
#print "header is $ref_header\n";
}
}
}
This code prints headers like
chr1
chr2
chr3
How can I push these headers into an array?
I tried the following code, but it splits the string into individual letters instead of making $header_array[0] equal to chr1:
@header_array = split(/\n*/, $ref_header);
print ("Here's the first element $header_array[0]");
Any help will be appreciated.
Shorten the code as shown below, removing some extra statements, and use push. You can combine push and the pattern match:
#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
my $in_file = shift;
my @headers;
open my $in_fh, '<', $in_file or croak "cannot open $in_file: $!";
while ( <$in_fh> ) {
push @headers, />(\S+)/;
}
close $in_fh or croak "cannot close $in_file: $!";
print "@headers";
# Now, loop through headers and select the ones you need, for example:
for my $header ( @headers ) {
if ( $header =~ /foo/ ) {
# do something
}
}
A few suggestions for fixing your original code are below:
# Always use strict and use warnings.
# Remove extra parens and make the error message more informative:
open(FH, '<', $ref_seq) or die $!;
while(<FH>){
$line = $_;
chomp $line;
# [^\s] is simply \S:
if(m/^>([^\s]+)/){
$ref_header = $1;
print "$ref_header\n";
# where is $header coming from?
chomp $header;
# if the condition is satisfied, this assignment does not make sense:
# $ref_header is already the same as $header:
if($1 eq $header){
$ref_header = $header;
#print "header is $ref_header\n";
}
}
}
You can use push:
push @header_array, $ref_header;
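As a rough illustration of why push works where the split attempt did not, here is a sketch that runs on a small in-memory list instead of a file. The input lines are invented for the example:

```perl
use strict;
use warnings;

# Hypothetical FASTA-like input lines.
my @lines = ( ">chr1 description\n", "ACGT\n", ">chr2\n", "TTGA\n" );

my @header_array;
for my $line (@lines) {
    # Capture the non-space run after '>' on header lines only.
    push @header_array, $1 if $line =~ /^>(\S+)/;
}
print "First header: $header_array[0]\n";   # First header: chr1

# By contrast, /\n*/ can match the empty string, so split breaks
# the string between every character.
my @chars = split /\n*/, 'chr1';
print scalar(@chars), "\n";                 # 4
```

The pattern /\n*/ means "zero or more newlines", and a zero-length match is allowed at every position, which is why the original attempt produced single letters.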

How to get the data of each line from a file?

Here, I want to print the data in each line as 3 separate values with ":" as separator. The file BatmanFile.txt has the following details:
Bruce:Batman:bat@bat.com
Santosh:Bhaskar:santosh@santosh.com
And the output I expected was:
Bruce
Batman
bat@bat.com
Santosh
Bhaskar
santosh@santosh.com
The output after executing the script was:
Bruce
Batman
bat@bat.com
Bruce
Batman
bat@bat.com
Please explain what I am missing here:
use strict;
use warnings;
my $file = 'BatmanFile.txt';
open my $info, $file or die "Could not open $file: $!";
my @resultarray;
while( my $line = <$info>) {
#print $line;
chomp $line;
my @linearray = split(":", $line);
push(@resultarray, @linearray);
print join("\n",$resultarray[0]),"\n";
print join("\n",$resultarray[1]),"\n";
print join("\n",$resultarray[2]),"\n";
}
close $info;
You are looping through the file line by line and storing every line (after splitting) in one array. Once the loop finishes, all the data is in @resultarray, so print the whole array after the loop instead of printing only the first three indexes, as you are doing at the moment.
#!/usr/bin/perl
use strict;
use warnings;
my @resultarray;
while( my $line = <DATA>){
chomp $line;
my @linearray = split(":", $line);
push @resultarray, @linearray;
}
print "$_\n" foreach @resultarray;
__DATA__
Bruce:Batman:bat@bat.com
Santosh:Bhaskar:santosh@santosh.com
You can avoid all variables and do something like below
while(<DATA>){
chomp;
print "$_\n" foreach split(":");
}
One liner:
perl -e 'while(<>){chomp; push @result, split(":",$_);} print "$_\n" foreach @result' testdata.txt
When you do:
push(@resultarray, @linearray);
you're appending @linearray to the end of @resultarray, so indexes 0 through 2 still hold the items from the first time you pushed @linearray.
To overwrite @resultarray with the values from the second iteration, do:
@resultarray = @linearray;
instead.
Alternatively, use unshift to place @linearray at the start of @resultarray, as suggested by Sobrique:
unshift @resultarray, @linearray;
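The three options can be compared side by side in a small sketch. The two rows are shortened, hypothetical stand-ins for the lines of the input file:

```perl
use strict;
use warnings;

# Hypothetical rows, as if two lines had already been split on ":".
my @rows = ( [qw(Bruce Batman)], [qw(Santosh Bhaskar)] );

my (@pushed, @assigned, @unshifted);
for my $row (@rows) {
    push    @pushed,    @$row;   # appends: the first row stays at the front
    @assigned = @$row;           # overwrites: only the latest row remains
    unshift @unshifted, @$row;   # prepends: the latest row moves to the front
}

print "@pushed\n";      # Bruce Batman Santosh Bhaskar
print "@assigned\n";    # Santosh Bhaskar
print "@unshifted\n";   # Santosh Bhaskar Bruce Batman
```

Only with assignment or unshift do the lowest indexes refer to the line that was just read.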
So, you just want to transliterate : to \n?
$ perl -pe 'tr/:/\n/' data.txt
Output:
Bruce
Batman
bat@bat.com
Santosh
Bhaskar
santosh@santosh.com
use strict;
use warnings;
my $file = 'BatmanFile.txt';
open my $info, $file or die "Could not open $file: $!";
my @resultarray;
while( my $line = <$info>) {
#print $line;
chomp $line;
my @linearray = split(":", $line);
#push(@resultarray, @linearray);
print join("\n",$linearray[0]),"\n";
print join("\n",$linearray[1]),"\n";
print join("\n",$linearray[2]),"\n";
}
close $info;

Perl Hashes of Arrays and Some issues

I currently have a csv file that looks like this:
a,b
a,d
a,f
c,h
c,d
So I saved these into a hash such that the key "a" is an array with "b,d,f" and the key "c" is an array with "h,d"... this is what I used for that:
while(<$fh>)
{
chomp;
my @row = split /,/;
my $cat = shift @row;
$category = $cat if (!($cat eq $category)) ;
push @{$hash{$category}}, @row;
}
close($fh);
Not sure about the efficiency but it seems to work when I do a Data Dump...
Now, the issue I'm having is this; I want to create a new file for each key, and in each of those files I want to print every element in the key, as such:
file "a" would look like this:
b
d
f
<end of file>
Any ideas? Everything I've tried isn't working, I'm not too familiar / experienced with hashes...
Thanks in advance :)
The output process is very simple using the each iterator, which provides the key and value pair for the next hash element in a single call:
use strict;
use warnings;
use autodie;
open my $fh, '<', 'myfile.csv';
my %data;
while (<$fh>) {
chomp;
my ($cat, $val) = split /,/;
push @{ $data{$cat} }, $val;
}
while (my ($cat, $values) = each %data) {
open my $out_fh, '>', $cat;
print $out_fh "$_\n" for @$values;
}
#!/usr/bin/perl
use strict;
use warnings;
my %foos_by_cat;
{
open(my $fh_in, '<', ...) or die $!;
while (<$fh_in>) {
chomp;
my ($cat, $foo) = split /,/;
push @{ $foos_by_cat{$cat} }, $foo;
}
}
for my $cat (keys %foos_by_cat) {
open(my $fh_out, '>', $cat) or die $!;
for my $foo (@{ $foos_by_cat{$cat} }) {
print($fh_out "$foo\n");
}
}
I wrote the inner loop as I did to show the symmetry between reading and writing, but it can also be written as follows:
print($fh_out "$_\n") for @{ $foos_by_cat{$cat} };

Print specific words starting with a prefix in text and count them

I would like to find words that start with sid=, such as sid=word and sid=text, then print each one and count how often it occurs:
sid=word 2
sid=text 5
I have tried to write a script:
use warnings;
use strict;
my $input = 'input.txt';
my $output = 'output.txt';
open (FILE, "<", $input) or die "Can not open $input $!";
open my $out, '>', $output or die "Can not open $output $!";
while (<FILE>){
foreach my @arr = /(?: ^|\s )(sid=\S*) {
$count{$arr}++;
}
}
foreach my @arr (sort keys %count){
printf "%-31s %s\n", $str, $count{$arr};
}
but it shows the error "Missing $ on loop variable".
Can anyone help me find what I am missing?
Thanks.
This should write the desired output to output.txt, with the words in order of appearance:
use warnings;
use strict;
my $input = 'input.txt';
my $output = 'output.txt';
open (my $FILE, "<", $input) or die "Can not open $input $!";
open (my $out, ">", $output) or die "Can not open $output $!";
my (%count, @arr);
while (<$FILE>){
if ( /(?: ^|\s )(sid=\S*)/x ) {
push @arr, $1 if !$count{$1};
$count{$1}++;
}
}
foreach my $str (@arr) {
print $out sprintf("%-31s %s\n", $str, $count{$str});
}
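The if test above records at most one sid= token per line. If a line can hold several, one possible variation (an assumption about the input, since the question does not show it) is to loop over a /g match in list context. The sample lines here are invented:

```perl
use strict;
use warnings;

# Hypothetical input lines; the first contains two sid= tokens.
my @lines = ( "sid=word x sid=text\n", "sid=text\n", "no match here\n" );

my (%count, @order);
for my $line (@lines) {
    # In list context, a /g match returns every captured token on the line.
    for my $token ( $line =~ /(?:^|\s)(sid=\S*)/g ) {
        push @order, $token unless $count{$token};   # remember first appearance
        $count{$token}++;
    }
}

# Report in order of first appearance.
printf "%-31s %s\n", $_, $count{$_} for @order;
```

With this sample, sid=word is counted once and sid=text twice, and the report keeps the order of first appearance.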

problem with the code in perl

My problem is that my code stores each line of the file as one element of an array, instead of storing each whole record, from AD to SS, as one element. As you can see, every record in my file starts with AD and ends with SS. I want each array element to hold all the lines of one record (AD, BC, EG, FA and so on, through SS), not a single line. I tried it my way and just got the same file back. Could anyone check my code? Thanks in advance.
AD uuu23
BC jjj
EG iii
FA vvv
SS
AD hhh25
BC kkk
EG ppp
FA aaa
SS
AD ttt26
BC xxx
FA rrr
SS
#!/usr/bin/env perl
use strict;
use warnings;
my $ifh;
my $line = '';
my @data;
my $ifn = "fac.txt";
open ($ifh, "<$ifn") || die "can't open $ifn";
my $a = "AD ";
my $b = "SS ";
my $_ = " ";
while ($line = <$ifh>)
{
chomp
if ($line =~ m/$a/g); {
$line = $_;
push @data, $line;
while ($line = <$ifh>)
{
$line .= $_;
push @data, $line;
last if
($line =~ m/$b/g);
}
}
push @data, $line; }
print @data;
If I understand your problem correctly, the issue is the way you are reading the file:
while ($line = <$ifh>)
is inherently a line-by-line approach. It uses the input record separator variable ($/) to decide where each line ends. One easy way to change this behavior is to undefine $/:
my $oldTerminator = $/;
undef $/;
....... <your processing here>
$/ = $oldTerminator;
so your file would be read as just one line, but I am not sure how your code would behave then.
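A common way to scope such a change is local, which restores the previous value automatically when the enclosing block ends. The sketch below uses an in-memory filehandle and made-up data so that it runs without a file:

```perl
use strict;
use warnings;

# Hypothetical data read through an in-memory filehandle.
my $text = "AD uuu23\nBC jjj\nSS\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $content;
{
    local $/;            # $/ is undef inside this block only
    $content = <$fh>;    # a single read returns the whole file
}                        # the previous value of $/ is restored here
close $fh;

# Count the newlines to confirm that all three lines arrived in one read.
my $newlines = () = $content =~ /\n/g;
print "$newlines\n";     # 3
```

The same pattern works with any other separator value, for example a string that marks the end of a record.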
Another approach is the following (keeping in mind that you are reading the file line by line): instead of doing
push @data, $line;
at each iteration of your loop, just accumulate the lines you read in a variable
$line .= $_;
(like you already do), and do the push only at the end, just once. Actually, this second approach will be more easily applicable to your code (you only have to remove the two push statements you have and put one outside of the loop).
I believe part of your problem is here
chomp
if ($line =~ m/$a/g);
it should be
chomp $line;
if ($line =~ m/$a/g)
otherwise the block after the if is always executed. Please update your question if this has helped you advance.
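The effect of the misplaced semicolon can be checked with a short sketch. The two input lines are hypothetical, and chomp $line is used here so that the example does not depend on $_:

```perl
use strict;
use warnings;

my @lines = ( "AD uuu23\n", "BC jjj\n" );   # hypothetical lines
my @ran;

for my $line (@lines) {
    # Perl parses this as the statement "chomp $line if (...);"
    # followed by a bare block, so the block runs for every line.
    chomp $line
    if ($line =~ /^AD /); {
        push @ran, $line;
    }
}

print scalar(@ran), "\n";   # 2
```

Both lines reach the block even though only the first one starts with AD, which matches the behaviour described above.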
Here's a way to accomplish reading the records into an array, with newlines removed:
Code:
use strict;
use warnings;
use autodie;
my @data;
my $record;
my $file = "fac.txt";
open my $fh, '<', $file;
while (<$fh>) {
chomp;
if (/^AD /) { # new record starts
$record = $_;
while (<$fh>) {
chomp;
$record .= $_;
last if /^SS\s*/;
}
push @data, $record;
} else { die "Data outside record: $_" }
}
use Data::Dumper;
print Dumper \@data;
Output:
$VAR1 = [
'AD uuu23BC jjjEG iiiFA vvvSS',
'AD hhh25BC kkkEG pppFA aaaSS',
'AD ttt26BC xxxFA rrrSS'
];
This is another version, using the input record separator $/:
use strict;
use warnings;
use autodie;
my $file = "fac.txt";
open my $fh, '<', $file;
my @data;
$/ = "\nSS";
while (<$fh>) {
s/\n//g;
push @data, $_;
}
use Data::Dumper;
print Dumper \@data;
Produces the same output with this data. It does not care about the record start characters, only the end, which is SS at the beginning of a line.

Resources