Simple Perl: can't compare array elements to strings - arrays

I initially wrote a script to convert a wordlist (one word per line) into an array of words, @keywords (one element per line), using:
-------------------
open (FH, "< $keyword_file") or die "Can't open $keyword_file for read: $!";
my @keywords;
while (<FH>) {
push (@keywords, $_);
}
close FH or die "Cannot close $keyword_file: $!";
--------------------
I am now trying to use a regex to compare these keywords with other strings, but I keep getting false results for some reason.
-----------------------
FULL PROGRAM
-----------------------------------
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
print "\n[Keywords]";
my $keyword_file = "keywords.txt";
#read keywords
my @keywords;
open (FH, "$keyword_file") or die "Can't open $keyword_file for read: $!";
while (<FH>) {
chomp;
push (@keywords, $_);
}
close FH or die "Cannot close $keyword_file: $!";
#pattern match
foreach(@keywords)
{
if ("print" =~ m/$_/) {
print "match found\n";
}
}
----------------------------------
The condition above is supposed to be true, but it keeps returning false. What am I doing wrong? Is it because the array is storing the newlines (Enter) as well? (Sorry if I sound ignorant for thinking this :p)

Yes, you need to chomp your array to remove line endings:
chomp(@keywords);
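To see why the newline matters, here is a minimal sketch with a made-up keyword list (the helper count_matches is hypothetical, not part of the original script). A keyword that still carries its trailing newline becomes a pattern that requires a newline in the target, so it cannot match the bare string "print".

```perl
use strict;
use warnings;

# Hypothetical helper: count how many patterns match the target string.
sub count_matches {
    my ($target, @patterns) = @_;
    my $count = 0;
    for my $pattern (@patterns) {
        $count++ if $target =~ m/$pattern/;
    }
    return $count;
}

# Assumed sample keywords, as they would arrive from <FH> without chomp.
my @keywords = ("print\n", "exit\n");

# Before chomp: every pattern contains "\n", which "print" does not.
my $before = count_matches("print", @keywords);

# chomp on an array strips the trailing newline from every element.
chomp(@keywords);
my $after = count_matches("print", @keywords);

print "matches before chomp: $before\n";
print "matches after chomp: $after\n";
```

Calling chomp on the whole array after reading and calling it on each line inside the loop should give the same result; which one to use is mostly a matter of taste.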

You forgot to remove the newline from your keywords. Try this:
while (<FH>) {
chomp;
push (@keywords, $_);
}
Here is a test program:
#!/usr/bin/perl
use strict;
use warnings;
my @keyword;
while (<DATA>) {
chomp;
push @keyword, $_;
}
foreach (@keyword) {
if ("print exit write" =~ m/$_/) {
print "match found\n";
}
}
__DATA__
print
exit
write

Related

How to get the data of each line from a file?

Here, I want to print the data in each line as 3 separate values with ":" as separator. The file BatmanFile.txt has the following details:
Bruce:Batman:bat@bat.com
Santosh:Bhaskar:santosh@santosh.com
And the output I expected was:
Bruce
Batman
bat@bat.com
Santosh
Bhaskar
santosh@santosh.com
The output after executing the script was:
Bruce
Batman
bat@bat.com
Bruce
Batman
bat@bat.com
Please explain what I am missing here:
use strict;
use warnings;
my $file = 'BatmanFile.txt';
open my $info, $file or die "Could not open $file: $!";
my @resultarray;
while( my $line = <$info>) {
#print $line;
chomp $line;
my @linearray = split(":", $line);
push(@resultarray, @linearray);
print join("\n",$resultarray[0]),"\n";
print join("\n",$resultarray[1]),"\n";
print join("\n",$resultarray[2]),"\n";
}
close $info;
You are looping through the file line by line and storing every line (after splitting) in one array. Once the loop finishes, all the data is in @resultarray, so print the whole array after the loop instead of printing only the first three indexes inside it, which is what you are doing at the moment.
#!/usr/bin/perl
use strict;
use warnings;
my @resultarray;
while( my $line = <DATA>){
chomp $line;
my @linearray = split(":", $line);
push @resultarray, @linearray;
}
print "$_\n" foreach @resultarray;
__DATA__
Bruce:Batman:bat@bat.com
Santosh:Bhaskar:santosh@santosh.com
You can avoid the variables entirely and do something like this:
while(<DATA>){
chomp;
print "$_\n" foreach split(":");
}
One liner:
perl -e 'while(<>){chomp; push @result, split(":",$_);} print "$_\n" foreach @result' testdata.txt
When you do:
push(@resultarray, @linearray);
you're pushing @linearray onto the end of @resultarray, so indexes 0 through 2 still hold the items from the first time you pushed @linearray.
To overwrite @resultarray with the values from the second iteration, do:
@resultarray = @linearray;
instead.
Alternatively, use unshift to place @linearray at the start of @resultarray, as suggested by Sobrique:
unshift @resultarray, @linearray;
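A minimal sketch of the append behaviour described above, using the two sample lines from the question held in an in-memory list (an assumption made so the example runs without BatmanFile.txt):

```perl
use strict;
use warnings;

# Sample lines from the question; single quotes keep the "@" literal.
my @lines = (
    'Bruce:Batman:bat@bat.com',
    'Santosh:Bhaskar:santosh@santosh.com',
);

my @resultarray;
for my $line (@lines) {
    my @linearray = split /:/, $line;
    push @resultarray, @linearray;   # appends; earlier elements stay at the front
    print "element 0 is: $resultarray[0]\n";
}

print "total elements: ", scalar(@resultarray), "\n";
print "second record starts at index 3: $resultarray[3]\n";
```

Element 0 stays the same on both iterations because push only grows the array at the end, which is why printing indexes 0 through 2 inside the loop repeats the first record.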
So, you just want to transliterate : to \n?
$ perl -pe 'tr/:/\n/' data.txt
Output:
Bruce
Batman
bat@bat.com
Santosh
Bhaskar
santosh@santosh.com
use strict;
use warnings;
my $file = 'BatmanFile.txt';
open my $info, $file or die "Could not open $file: $!";
my @resultarray;
while( my $line = <$info>) {
#print $line;
chomp $line;
my @linearray = split(":", $line);
#push(@resultarray, @linearray);
print join("\n",$linearray[0]),"\n";
print join("\n",$linearray[1]),"\n";
print join("\n",$linearray[2]),"\n";
}
close $info;

How to count the number of keys that exist in a hash?

I am working with an input file that contains tab-delimited sequences. Groups of sequences are separated by line breaks. The file looks like:
TAGC TAGC TAGC HELP
TAGC TAGC TAGC
TAGC HELP
TAGC
Here is the code I have:
use strict;
use warnings;
open(INFILE, "<", "/path/to/infile.txt") or die $!;
my %hash = (
TAGC => 'THIS_EXISTS',
GCTA => 'THIS_DOESNT_EXIST',
);
while (my $line = <INFILE>){
chomp $line;
my $hash;
my @elements = split "\t", $line;
open my $out, '>', "/path/to/outfile.txt" or die $!;
foreach my $sequence(@elements){
if (exists $hash{$sequence}){
print $out ">$sequence\n$hash{$sequence}\n";
}
else
{
$count++;
print "Doesn't exist ", $count, "\n";
}
}
}
How can I tell how many sequences exist before I print? I need to put that information into the name of the output file.
Ideally, I would have a variable that I could include in the name of the file. Unfortunately, I can't just take the scalar of @elements, because some sequences won't get printed. When I push the keys that exist into an array and then print the scalar of that array, I still don't get the result I need. Here is what I tried (all variables that need to be global are):
open my $out, '>', "/path/to/file.$number.txt" or die $!;
foreach my $sequence(@elements){
if (exists $hash{$sequence}){
push(@Array, $hash{$sequence}, "\n");
my $number = @Array;
print $out ">$sequence\n$hash{$sequence}\n";
#....
Thanks for the help. Really appreciate it.
my $sequences = grep exists $hash{$_}, @elements;
open my $out, '>', "/path/to/outfile_containing_$sequences.txt" or die $!;
In list context, grep filters a list by a criterion; in scalar context, it returns a count of elements that met the criterion.
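The two contexts can be compared side by side in a small sketch. The hash is the one from the question, and the element list is the first line of the sample file written out by hand (an assumption, so no input file is needed):

```perl
use strict;
use warnings;

my %hash = (
    TAGC => 'THIS_EXISTS',
    GCTA => 'THIS_DOESNT_EXIST',
);

# First line of the sample input, already split on tabs.
my @elements = qw(TAGC TAGC TAGC HELP);

# List context: grep returns the elements that pass the test.
my @found = grep { exists $hash{$_} } @elements;

# Scalar context: grep returns how many elements passed.
my $count = grep { exists $hash{$_} } @elements;

print "found: @found\n";
print "count: $count\n";
```

Because the count is known before any output is written, it can be interpolated into the file name passed to open.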
The easiest way would be to keep track of how many keys you print in a variable and, once the loop finishes, rename the file using the number you calculated. Perl comes with a built-in function, rename, to do this. The code would be something like this:
use strict;
use warnings;
open(INFILE, "<", "/path/to/infile.txt") or die $!;
my %hash = (
TAGC => 'THIS_EXISTS',
GCTA => 'THIS_DOESNT_EXIST',
);
my $ammt = 0;
while (my $line = <INFILE>){
chomp $line;
my $hash;
my @elements = split "\t", $line;
open my $out, '>', "/path/to/outfile.txt" or die $!;
foreach my $sequence(@elements){
if (exists $hash{$sequence}){
print $out ">$sequence\n$hash{$sequence}\n";
$ammt++;
}
else
{
print "Doesn't exist\n";
}
}
}
rename "/path/to/outfile.txt", "/path/to/outfile${ammt}.txt" or die $!;
I removed the $count variable, since it's not declared in your code (strict would complain about that). Here's the official doc for rename. Since it returns True or False, you can check that it was successful or not.
By the way, be aware that:
push(#Array, $hash{$sequence}, "\n");
is storing two items ($hash{$sequence} and \n) per sequence, so that count would be twice what it should be.
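A short sketch of that doubling, with three made-up values standing in for the hash lookups (the array names @paired and @single are hypothetical):

```perl
use strict;
use warnings;

# Assumed stand-ins for three successful $hash{$sequence} lookups.
my @values = ('THIS_EXISTS', 'THIS_EXISTS', 'THIS_EXISTS');

my (@paired, @single);
for my $value (@values) {
    push @paired, $value, "\n";   # two elements added per sequence
    push @single, $value;         # one element added per sequence
}

# An array in scalar context yields its element count.
my $paired_count = @paired;
my $single_count = @single;

print "paired: $paired_count, single: $single_count\n";
```

Pushing only the value, and adding the newline when printing, keeps the element count equal to the number of sequences.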

Perl Hashes of Arrays and Some issues

I currently have a csv file that looks like this:
a,b
a,d
a,f
c,h
c,d
So I saved these into a hash such that the key "a" is an array with "b,d,f" and the key "c" is an array with "h,d"... this is what I used for that:
while(<$fh>)
{
chomp;
my @row = split /,/;
my $cat = shift @row;
$category = $cat if (!($cat eq $category)) ;
push @{$hash{$category}}, @row;
}
close($fh);
Not sure about the efficiency but it seems to work when I do a Data Dump...
Now, the issue I'm having is this; I want to create a new file for each key, and in each of those files I want to print every element in the key, as such:
file "a" would look like this:
b
d
f
<end of file>
Any ideas? Everything I've tried isn't working, I'm not too familiar / experienced with hashes...
Thanks in advance :)
The output step is simple with the each iterator, which returns the key and value of the next hash element in a single call:
use strict;
use warnings;
use autodie;
open my $fh, '<', 'myfile.csv';
my %data;
while (<$fh>) {
chomp;
my ($cat, $val) = split /,/;
push @{ $data{$cat} }, $val;
}
while (my ($cat, $values) = each %data) {
open my $out_fh, '>', $cat;
print $out_fh "$_\n" for @$values;
}
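For a quick check that does not touch the filesystem, the same hash of arrays can be built from an in-memory list and printed to the screen. This is only a sketch: the rows are the sample CSV lines from the question, and the keys are sorted because each and keys return hash entries in no guaranteed order.

```perl
use strict;
use warnings;

# Sample rows from the question, held in memory instead of a file.
my @rows = ('a,b', 'a,d', 'a,f', 'c,h', 'c,d');

my %data;
for my $row (@rows) {
    my ($cat, $val) = split /,/, $row;
    # The array reference is created automatically on first use.
    push @{ $data{$cat} }, $val;
}

# Sorted keys give repeatable output; hash order itself is unspecified.
for my $cat (sort keys %data) {
    print "$cat: ", join(',', @{ $data{$cat} }), "\n";
}
```

Replacing the print with an open and a write per key gives one file per category, as in the answer above.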
#!/usr/bin/perl
use strict;
use warnings;
my %foos_by_cat;
{
open(my $fh_in, '<', ...) or die $!;
while (<$fh_in>) {
chomp;
my ($cat, $foo) = split /,/;
push @{ $foos_by_cat{$cat} }, $foo;
}
}
for my $cat (keys %foos_by_cat) {
open(my $fh_out, '>', $cat) or die $!;
for my $foo (@{ $foos_by_cat{$cat} }) {
print($fh_out "$foo\n");
}
}
I wrote the inner loop as I did to show the symmetry between reading and writing, but it can also be written as follows:
print($fh_out "$_\n") for @{ $foos_by_cat{$cat} };

Print specific words starting with a given prefix in text and count them

I would like to find the words that start with sid=, such as sid=word and sid=text, and print each one along with how many times it occurs:
sid=word 2
sid=text 5
I tried to write a script:
use warnings;
use strict;
my $input = 'input.txt';
my $output = 'output.txt';
open (FILE, "<", $input) or die "Can not open $input $!";
open my $out, '>', $output or die "Can not open $output $!";
while (<FILE>){
foreach my @arr = /(?: ^|\s )(sid=\S*) {
$count{$arr}++;
}
}
foreach my @arr (sort keys %count){
printf "%-31s %s\n", $str, $count{$arr};
}
but it fails with the error "Missing $ on loop variable".
Can anyone help me figure out what I missed?
Thanks.
This should write the desired output to output.txt, with the words in order of first appearance:
use warnings;
use strict;
my $input = 'input.txt';
my $output = 'output.txt';
open (my $FILE, "<", $input) or die "Can not open $input $!";
open (my $out, ">", $output) or die "Can not open $output $!";
my (%count, @arr);
while (<$FILE>){
if ( /(?: ^|\s )(sid=\S*)/x ) {
push @arr, $1 if !$count{$1};
$count{$1}++;
}
}
foreach my $str (@arr) {
print $out sprintf("%-31s %s\n", $str, $count{$str});
}
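The if in the loop above records at most one sid= token per line. If a line can hold several tokens, one possible variation is to run the match with /g in list context, which returns every capture on the line. The sketch below uses made-up input lines, since the real input.txt is not shown, and prints to the screen instead of a file:

```perl
use strict;
use warnings;

# Hypothetical input lines; the actual file contents are an assumption.
my @lines = (
    'GET /a sid=word x',
    'sid=text sid=word',
    'nothing here',
);

my (%count, @order);
for my $line (@lines) {
    # In list context, a /g match returns all captured tokens on the line.
    for my $token ($line =~ /(?:^|\s)(sid=\S*)/g) {
        push @order, $token unless $count{$token};   # remember first appearance
        $count{$token}++;
    }
}

printf "%-31s %s\n", $_, $count{$_} for @order;
```

The @order array plays the same role as @arr in the answer: it keeps the tokens in order of first appearance, which a plain sort keys %count would not.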

Create CSV file from 2d array perl

I am trying to load a csv file, transpose it and write a new one. I have everything working correctly except writing a new file. I have looked around online without success.
use strict;
use warnings;
use Text::CSV;
use Data::Dump qw(dump);
use Array::Transpose;
my @data; # 2D array for CSV data
my $file = 'sample_array.csv';
my $csv = Text::CSV->new;
open my $fh, '<', $file or die "Could not open $file: $!";
while( my $row = $csv->getline( $fh ) ) {
shift @$row; # throw away first value
push @data, $row;
}
@data=transpose(\@data);
dump(@data);
The output here is the transposed array @data (["blah", 23, 22, 43], ["tk1", 1, 11, 15],["huh", 5, 55, 55]). I need that output to be written to a new CSV file.
CSV file:
text,blah,tkl,huh
14,23,1,5
12,22,11,55
23,42,15,55
Refer to the code after dump. This was derived from the Text::CSV SYNOPSIS:
use strict;
use warnings;
use Text::CSV;
use Data::Dump qw(dump);
use Array::Transpose;
my @data; # 2D array for CSV data
my $file = 'sample_array.csv';
my $csv = Text::CSV->new;
open my $fh, '<', $file or die "Could not open $file: $!";
while( my $row = $csv->getline( $fh ) ) {
shift @$row; # throw away first value
push @data, $row;
}
@data=transpose(\@data);
dump(@data);
open $fh, ">:encoding(utf8)", "new.csv" or die "new.csv: $!";
for (@data) {
$csv->print($fh, $_);
print $fh "\n";
}
close $fh or die "new.csv: $!";
Along with Toolic's addition I had to make some edits due to the specific type of data I was dealing with. This was an extremely large set with engineering symbols & units and negative numbers with long decimals. For reference, my final code is below.
use strict;
use warnings;
use Text::CSV;
use Data::Dump qw(dump);
use Array::Transpose;
my @data; # 2D array for CSV data
my $file = 'rawdata.csv';
my $csv = Text::CSV->new({ binary => 1, quote_null => 0 });
open my $fh, '<', $file or die "Could not open $file: $!";
while( my $row = $csv->getline( $fh ) ) {
#shift @$row; # throw away first value, I needed the first values.
push @data, $row;
}
@data=transpose(\@data);
open $fh, ">:encoding(utf8)", "rawdata_trans.csv" or die "rawdata_trans.csv: $!";
for (@data) {
$csv->print($fh, $_);
print $fh "\n";
}
close $fh or die "rawdata_trans.csv: $!";
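For readers without Array::Transpose installed, the transpose step itself can be sketched in core Perl. The rows below are the sample CSV from the question with the first column already dropped, typed in by hand as an assumption. The output uses a plain join, which is only safe when no field contains commas, quotes, or newlines; for real data, Text::CSV as used above is the better choice.

```perl
use strict;
use warnings;

# Sample rows from the question's CSV, first column already removed.
my @data = (
    [ 'blah', 'tkl', 'huh' ],
    [ 23, 1,  5  ],
    [ 22, 11, 55 ],
    [ 42, 15, 55 ],
);

# Element [r][c] of the input becomes element [c][r] of the output.
my @transposed;
for my $r (0 .. $#data) {
    for my $c (0 .. $#{ $data[$r] }) {
        $transposed[$c][$r] = $data[$r][$c];
    }
}

# Naive CSV output: assumes fields need no quoting or escaping.
print join(',', @$_), "\n" for @transposed;
```

Each row of @transposed is an array reference, so it could equally be passed to $csv->print in place of the join.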

Resources