Indexing the values of string elements from array perl - arrays

I have an array whose elements are DNA sequences stored as strings.
For example, print $array[0]; gives output like ACTAG (the first position of each sequence).
I have written this code, which lets me analyse the first position of each sequence.
#!/usr/bin/perl
$infile = $ARGV[0];
$ws= $ARGV[1];
$wsnumber= $ARGV[2];
open INFILE, $infile or die "Can't open $infile: $!"; # This opens file, but if file isn't there it mentions this will not open
my $sequence = (); # This sequence variable stores the sequences from the .fasta file
my $line; # This reads the input file one-line-at-a-time
while ($line = <INFILE>) {
chomp $line;
if ($ws ne "--ws=") {
print "no flag or invalid flag\n";
last;
}
else {
if($line =~ /^\s*$/) { # This finds lines with whitespaces from the beginning to the ending of the sequence. Removes blank line.
next;
} elsif($line =~ /^\s*#/) { # This finds lines with spaces before the hash character. Removes .fasta comment
next;
} elsif($line =~ /^>/) { # This finds lines with the '>' symbol at beginning of label. Removes .fasta label
next;
} else {
$sequence = $line;
$sequence =~ s/\s//g; # Whitespace characters are removed
@array = split //, $sequence;
$seqlength = length($sequence);
}
}
$count=0;
foreach ($array[0]){
if( $array[0] !~ m/A|T|C|G/ ){
next;
}
else {
$count += 1;
$suma += $count;
}
}
}
But I don't know how to modify $array[0] so that this code runs for every position. I have only managed to do it for one specific position (the first position in the example above).
Can someone help me?
Thank you!

Not sure I understand correctly, but is this what you want?
Assuming @array contains one sequence per element:
foreach my $seq (@array) {
my @chars = split //, $seq;
foreach my $char (@chars) {
if ($char =~ /[ATCG]/) {
# do stuff
}
}
}

You're only looking at one element in your array:
foreach ($array[0]){ ... }
To iterate over the whole array use:
my @array = qw(ATTTCFCGGCTTA);
foreach (@array) {
my @split = split //, $_;
foreach (@split) {
die "$_ is not a valid character\n" unless /[ATGC]/;
print "$_\n";
}
}
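If the goal is a count for every position rather than only the first, the two loops above can be combined into a per-position tally. This is only a minimal sketch under assumptions: the helper name count_valid_per_position and the sample sequences are made up for illustration, and it assumes each array element holds one whole sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: for each position (column), count how many
# sequences have a valid base (A, C, G or T) at that position.
sub count_valid_per_position {
    my @sequences = @_;
    my @counts;
    for my $seq (@sequences) {
        my @chars = split //, $seq;
        for my $pos (0 .. $#chars) {
            $counts[$pos] //= 0;    # make sure the slot exists (Perl 5.10+)
            $counts[$pos]++ if $chars[$pos] =~ /^[ACGT]$/;
        }
    }
    return @counts;
}

my @array  = qw(ACTAG ANTGG AC-A);    # made-up sample data
my @counts = count_valid_per_position(@array);
print "position $_: $counts[$_]\n" for 0 .. $#counts;
```

Sequences of different lengths are handled because a position is only counted for the sequences that actually reach it.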

Related

Error: readline on closed filehandle using Perl

#!/usr/bin/perl
use Cwd;
use warnings;
open($fh,'<', "clinical.txt");
$line = <$fh>;
while($line = <$fh>)
{
my @fields2 =split(" ",$line);
push(@id,$fields2[1]);
push(@status,$fields2[3]);
}
@flow = grep { -d } glob "*";
$arrSize = @flow;
for ($p = 0; $p < $arrSize; $p++)
{
@files=<$flow[$p]/*.maf>;
print $flow[$p],"\n";
foreach $file(@files)
{
print $file,"\n";
open(x,$file);
%hash={};
%tested={};
%Mut_Count={};
$hyper=0;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
while($line = <x>)
{
@temp=split("\t",$line);
$key=$temp[4]."_".$temp[5]."_".$temp[6]."_".$temp[10]."_".$temp[11]."_".$temp[12]."_".$temp[15];
push @{$hash{$key}}, "0";
push @{$Mut_Count{$temp[15]}}, "0";
}
@nm=split(/\./,$file);
open(x,$file);
open(Out1,">Results/".$nm[1]."_Hyper.txt");
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
$line = <x>;
@temp=split(" ",$line);
@temp2=split(",",$temp[1]);
print Out1 "Gene\tMutation\tType\tdbSNP\tStatus\tPolyphen\tSift";
for($j=0;$j<scalar(@id);$j++){
my @M=split('-', $id[$j]);
for($i=0;$i<scalar(@temp2);$i++)
{
my @N=split('-', $temp2[$i]);
if(scalar(@{$Mut_Count{$temp2[$i]}})>499 && $M[0] eq $N[0] && $M[1] eq $N[1] && $M[2] eq $N[2] && $status[$j] eq 'MSS')
{
print Out1 "\t",$temp2[$i];
$hyper++;
}
}
}
$line = <x>;
while($line = <x>)
{
$hy=0;
@temp=split("\t",$line);
$key=$temp[4]."_".$temp[5]."_".$temp[6]."_".$temp[10]."_".$temp[11]."_".$temp[12];
if(!exists $tested{$key})
{
push @{$tested{$key}}, "0";
print Out1 "\n",$temp[0],"\t",$key,"\t",$temp[8],"\t",$temp[13],"\t",$temp[25],"\t",$temp[72],"\t",$temp[73];
for($i=0;$i<scalar(@temp2);$i++)
{
$key=$temp[4]."_".$temp[5]."_".$temp[6]."_".$temp[10]."_".$temp[11]."_".$temp[12]."_".$temp2[$i];
my @L=split('-', $temp2[$i]);
for($j=0;$j<scalar(@id);$j++){
my @O=split('-', $id[$j]);
if(scalar(@{$Mut_Count{$temp2[$i]}})>499 && $L[0] eq $O[0] && $L[1] eq $O[1] && $L[2] eq $O[2] && $status[$j] eq 'MSS')
{
if(exists $hash{$key})
{
print Out1 "\t1";
$hy++;
}
else
{
print Out1 "\t0";
}
}
}
}
}
}
open(Out3,">Results/".$nm[1]."_Summary.txt");
print Out3 "Hypermutated\t$hyper\n";
}
}
$line = <$fh>; while($line = <$fh>)
These lines produce the warning "readline() on closed filehandle $fh".
%hash={}; %tested={}; %Mut_Count={};
These three lines produce "Reference found where even-sized list expected".
The .maf files are GDC downloads whose headers we modified slightly to carry unique TCGA IDs.
The clinical info file contains the TCGA IDs, their source, and an MSI_Status value that says whether a sample is MSI-L, MSI-H or MSS.
I'm reading multiple .maf files and comparing them with the clinical info file;
if the if condition is satisfied, I want the script to generate a master table (write a file).
I'm doing this on Windows. Please help me resolve this. Thanks in anticipation.
Answer to
%hash={}; %tested={}; %Mut_Count={}; on these three lines it says Reference found where even-sized list expected
You can get rid of the errors by getting rid of these three lines, though if you accept (please!) the use strict; recommendation you will need to replace them with my %hash; and so forth.
Under Perl, hashes (and arrays) do not need to be initialized. If you wish to initialize one, you assign it another array, or a list, or another hash. The length of the initializer must be even, since it is interpreted as key/value pairs. Your code supplied a hash reference, which will not be expanded by Perl to an empty hash. Hence the error.
If it makes you nervous not to initialize the hash, you can say my %hash = ();.
On the other hand, if you intend to use a scalar as a hash reference, you will need to initialize it. This is where you use the curly brackets, which (among other things) are a hash constructor which returns a reference to the constructed hash. So:
my $hash = {};
$hash->{$key} = $value;
...
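To make the distinction concrete, here is a small sketch of the two initialisation styles side by side. The key names and values (sample1, mutA, MSS) are invented for illustration and are not taken from the original data.

```perl
use strict;
use warnings;

my %hash = ();    # empty hash; a plain "my %hash;" is equivalent
my $href = {};    # scalar holding a reference to an empty anonymous hash

# Pushing through a missing key autovivifies an array reference,
# so the hash needs no further setup before the push.
push @{ $hash{sample1} }, 'mutA';
$href->{sample1} = 'MSS';

# %$href dereferences the reference back into a list of key/value
# pairs, which is what a hash assignment expects.
my %copy = %$href;

printf "%d key(s) in %%hash, %d key(s) in %%copy\n",
    scalar(keys %hash), scalar(keys %copy);
```

Writing %hash = {} instead assigns a single scalar (the reference) to the hash, which is why Perl complains about an odd-sized list.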

Adding a key value pair in hash, by assigning an array in the value => Can't use an undefined value as an ARRAY reference

I'm trying to store an array as the value of a key in a hash. After assigning it, I'm trying to dereference it and print the array values for that key to an output file, as you can see from the code below.
The array manipulation part of the code is not working. Can someone tell me what I'm doing wrong?
use strict;
use warnings;
use Data::Dumper;
# File input
my $in_file = 'log.txt';
# Output file
my $out_file_name = 'output.csv';
open(my $fout, '>', $out_file_name);
# print header csv
print $fout "Col1\,Col2\,Col3\,Col4\,Col5\n";
# Read the input file
open(FH, '<', $in_file) or die "Could not open file '$in_file' $!";
my @log_file = <FH>;
# print Dumper(@log_file),"\n";
close (FH);
# my @test_val;
my ($read, $ref, $val_county, $val_rec, $val_tar, $val_print, @test_values, $status);
foreach(@log_file) {
# print $_;
if ($_ =~ /\t+(?<county_name>(?!Total).+)\s+/i) {
$ref->{code} = $+{county_name};
$val_county = $ref->{code};
} elsif ($_ =~ /^Total\s+records\s+in\s+TAR\s+\(pr.+\)\:\s+(?<tar_records>.+)$/i) {
$ref->{code} = $val_county;
push(@test_values, $+{tar_records});
$ref->{tar_rec} = \@test_values;
# $val_rec = $ref->{tar_rec};
# $val_rec =~ s/\.//g;
}
&print_file($ref);
}
sub print_file {
my $ref = shift;
my $status = shift;
print $fout join(",", $ref->{code}, [@{$ref->{tar_rec}}]), "\n"; # Line 68
print Dumper($ref);
}
close $fout;
print "Done!","\n";
The code produces an error like:
"Can't use an undefined value as an ARRAY reference at test_array_val_hash.pl line 68."
Until the second regex in your forloop block is matched, the $ref->{tar_rec} key will not be assigned a value - and will be undefined. The following snippet - based on your own code - highlights the issue.
#!/usr/bin/perl -w
my @tar_records = (15,35,20);
my $ref = {
code => 'Cork',
tar_rec => \@tar_records,
};
sub print_info {
my $ref = shift;
print join(", ", $ref->{code}, (@{$ref->{tar_rec}})), $/;
}
print_info($ref);
# Once we 'undefine' the relevant key, we witness the afore-
# mentioned error.
undef $ref->{tar_rec};
print_info($ref);
To avoid this error, you could assign an anonymous array reference to $ref->{tar_rec} key before the for loop (since $ref->{tar_rec} is a cumulative value).
# Be sure not to declare $ref twice!
my ($read, $val_county, $val_rec, $val_tar, $val_print, @test_values, $status);
my $ref = {
code => '',
tar_rec => [],
};
P.S. Notice also that I used round brackets rather than square brackets in the join() function (although you actually don't need either).
The problem is that you're calling print_file in the wrong place.
Imagine that you're parsing the file a line at a time. Your code parses the first line and that populates $ref->{code}. But then you call print_file on a partially populated $ref so it doesn't work.
Your code is also not resetting any of the variables used, so as it progresses through the file, the contents of $ref are going to grow.
The code below fixes the first problem by explicitly setting an empty array in $ref->{tar_rec} and only printing out the record when it's starting a new one or when it's finished reading in the file. Since $ref->{tar_rec} is an array reference, it solves the other problem by allowing you to push directly into it rather than relying upon @test_values. Just for added safety it assigns an empty hash to $ref.
if(open(my $fh, '<', $in_file)) {
my $ref;
my $val_county;
foreach(<$fh>) {
# print $_;
if ($_ =~ /\t+(?<county_name>(?!Total).+)\s+/i) {
if(defined($val_county)) {
print_file($ref);
}
$ref={};
$val_county = $+{county_name};
$ref->{code} = $val_county;
$ref->{tar_rec} = [];
} elsif ($_ =~ /^Total\s+records\s+in\s+TAR\s+\(pr.+\)\:\s+(?<tar_records>.+)$/i) {
push @{$ref->{tar_rec}}, $+{tar_records};
}
}
if(defined($ref)) {
print_file($ref);
}
close($fh);
} else {
die "Could not open file '$in_file' $!";
}
You're also printing out the array incorrectly:
print $fout join(",", $ref->{code}, [@{$ref->{tar_rec}}]), "\n";
You don't need any brackets around @{$ref->{tar_rec}}; it will be treated as a list of values to pass to join as is.
print $fout join(",", $ref->{code}, @{$ref->{tar_rec}}), "\n";
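Another way to avoid the "undefined value as an ARRAY reference" error, regardless of where the print happens, is to fall back to an empty array reference at the point of dereferencing. The sketch below is an illustration only: csv_line is a hypothetical helper, and the county names and numbers are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: build one CSV line from a record hash ref,
# tolerating a record whose 'tar_rec' key has not been set yet.
sub csv_line {
    my ($ref) = @_;
    my @values = @{ $ref->{tar_rec} || [] };    # empty list when undefined
    return join(",", $ref->{code}, @values);
}

print csv_line({ code => 'Cork', tar_rec => [15, 35, 20] }), "\n";
print csv_line({ code => 'Kerry' }), "\n";
```

This only hides the symptom, so it is best combined with calling the print routine once per complete record, as described above.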

Perl - Scan directory for pattern in contained filenames and fill array

I've tried to write a simple function that takes two arguments, an input directory and a pattern, and returns an array of the files whose names contain the pattern(s).
my $dir = "/drives/D/Desktop/perlscripts";
sub getfiles
{
my ($dirName, @patterns) = @_;
opendir(my $dir, $dirName) or die "Can't open '$dirName': $!";
my @return;
for my $file (readdir($dir))
{
for my $pattern (@patterns)
{
if ($file =~ /$pattern/)
{
print "$file \n";
push @return, $file;
# This avoids having the file in the result twice
last;
}
}
}
return @return;
}
my @plscripts = getfiles($dir, "pl");
print "@plscripts \n";
I get nothing in STDOUT with the code as is, after some tests, i think the problem is with the.
Edited with changes: As you can see, I commented out last; but then it matches every file name on every iteration, so I end up with duplicate entries in the array.
A quick fix could be to add a label to your outer loop and then change the "last" to "next FILE":
my $dir = "/drives/D/Desktop/perlscripts";
sub getfiles
{
my ($dirName, @patterns) = @_;
opendir(my $dir, $dirName) or die "Can't open '$dirName': $!";
my @return;
FILE:
for my $file (readdir($dir))
{
for my $pattern (@patterns)
{
if ($file =~ /$pattern/)
{
print "$file \n";
push @return, $file;
# This avoids having the file in the result twice
# last
next FILE;
}
}
}
return @return;
}
my @plscripts = getfiles($dir, "pl");
print "@plscripts \n";
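An alternative that sidesteps the loop-control question is to let grep do the filtering, since grep visits each name exactly once. This is a rough sketch, not the original author's approach: match_names is a hypothetical helper and the file names are invented. It works on a plain list of names so it can be tried without a real directory; in practice the list would come from readdir.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names that match at least one pattern.
# The outer grep keeps a name when the inner grep finds any matching
# pattern, so a name can never be added twice.
sub match_names {
    my ($names, @patterns) = @_;
    return grep {
        my $name = $_;
        grep { $name =~ /$_/ } @patterns;    # count of matching patterns
    } @$names;
}

my @names = ('a.pl', 'b.txt', 'c.pl.bak', 'notes');    # made-up file names
my @hits  = match_names(\@names, 'pl', 'bak');
print "@hits\n";
```

Note that a bare pattern such as "pl" matches anywhere in the name; anchoring it (for example '\.pl$') would restrict the match to the extension.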

Perl script problems

The purpose of the script is to process all words from a file and output ALL words that occur the most. So if there are 3 words that each occur 10 times, the program should output all the words.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (i.e. the New Testament). I'm not sure if that is a fault of mine or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
#Make sure the argument is actually a file
if (-f $ARGV[0]){
%wordHash = (); #New hash to match words with word counts
$file=$ARGV[0]; #Stores value of argument
open(FILE, $file) or die "File not opened correctly.";
#Process through each line of the file
while (<FILE>){
chomp;
#Delimits on any non-alphanumeric
@words=split(/[^a-zA-Z0-9]/,$_);
$wordSize = @words;
#Put all words to lowercase, removes case sensitivity
for($x=0; $x<$wordSize; $x++){
$words[$x]=lc($words[$x]);
}
#Puts each occurrence of word into hash
foreach $word(@words){
$wordHash{$word}++;
}
}
close FILE;
#$wordHash{$b} <=> $wordHash{$a};
$wordList="";
$max=0;
while (($key, $value) = each(%wordHash)){
if($value>$max){
$max=$value;
}
}
while (($key, $value) = each(%wordHash)){
if($value==$max && $key ne "s"){
$wordList.=" " . $key;
}
}
#Print solution
print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
}
else {
print "Error. Your argument is not a file.\n";
}
}
else {
print "Error. Use exactly one argument.\n";
}
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array indices can only be numbers, and since your keys are words, that won't work. What happens here is that any string used as an index is coerced into a number, and any string that does not begin with a number becomes 0.
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
chomp;
my @words = grep $_, # disallow empty strings
map lc, # make everything lower case
split /[^a-zA-Z0-9]/; # your original split
foreach my $word (@words) {
$wordHash{$word}++;
}
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = @ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
chomp $line;
my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
$words{ $_ } += 1 for @tokens;
}
close $input;
my $max = max values %words;
my @argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (@argmax) {
printf "%s: %d\n", $word, $max;
}
Why not just get the keys from the hash sorted by their value and extract the first X?
This should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
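The sort-and-slice idea can be sketched as follows. The word counts and the cutoff $top_n are made up for illustration; in the real script the hash would be filled while reading the file.

```perl
use strict;
use warnings;

my %count = (the => 10, and => 10, of => 7, lord => 3);    # made-up counts
my $top_n = 2;                                             # hypothetical cutoff

# Sort by count (descending), then alphabetically so ties come out
# in a predictable order.
my @sorted = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
my @top    = @sorted[0 .. $top_n - 1];

print "$_: $count{$_}\n" for @top;
```

A fixed cutoff can split a group of tied words, so if every word with the maximum count must be reported, comparing each count against the maximum (as in the answer above) is the safer choice.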

How can I search for simple if statements in C source code?

I'd like to do a search for simple if statements in a collection of C source files.
These are statements of the form:
if (condition)
statement;
Any amount of white space or other sequences (e.g. "} else ") might appear on the same line before the if. Comments may appear between the "if (condition)" and "statement;".
I want to exclude compound statements of the form:
if (condition)
{
statement;
statement;
}
I've tried each of the following in awk:
awk '/if \(.*\)[^{]+;/ {print NR $0}' file.c # (A) No results
awk '/if \(.*\)[^{]+/ {print NR $0}' file.c # (B)
awk '/if \(.*\)/ {print NR $0}' file.c # (C)
(B) and (C) give different results. Both include items I'm looking for and items I want to exclude. Part of the problem, obviously, is how to deal with patterns that span multiple lines.
Edge cases (badly formed comments, odd indenting or curly braces in odd places, etc.) can be ignored.
How can I accomplish this?
Based on Al's answer, but with fixes for a couple of problems. I also decided to check for simple else clauses, and it prints the full if block:
#!/usr/bin/perl -w
my $line_number = 0;
my $in_if = 0;
my $if_line = "";
#ifdef NEW
my $block = "";
#endif /* NEW */
# Scan through each line
while(<>)
{
# Count the line number
$line_number += 1;
# If we're in an if block
if ($in_if)
{
$block = $block . $line_number . "+ " . $_;
# Check for open braces (and ignore the rest of the if block
# if there is one).
if (/{/)
{
$in_if = 0;
$block = "";
}
# Check for semi-colons and report if present
elsif (/;/)
{
print $if_line;
print $block;
$block = "";
$in_if = 0;
}
}
# If we're not in an if block, look for one and catch the end of the line
elsif (/(if \(.*\)|[^#]else)(.*)/)
{
# Store the line contents
$if_line = $line_number . ": " . $_;
# If the end of the line has a semicolon, report it
if ($2 =~ ';')
{
print $if_line;
}
# If the end of the line contains the opening brace, ignore this if
elsif ($2 =~ '{')
{
}
# Otherwise, read the following lines as they come in
else
{
$in_if = 1;
}
}
}
I'm not sure how you'd do this with a one liner (I'm sure you could by using sed's 'n' command to read the next line, but it would be very complicated), so you probably want to use a script for this. How about:
perl parse_if.pl file.c
Where parse_if.pl contains:
#!/usr/bin/perl -w
my $line_number = 0;
my $in_if = 0;
my $if_line = "";
# Scan through each line
while(<>)
{
# Count the line number
$line_number += 1;
# If we're in an if block
if ($in_if)
{
# Check for open braces (and ignore the rest of the if block
# if there is one).
if (/{/)
{
$in_if = 0;
}
# Check for semi-colons and report if present
elsif (/;/)
{
print $if_line_number . ": " . $if_line;
$in_if = 0;
}
}
# If we're not in an if block, look for one and catch the end of the line
elsif (/^[^#]*\b(?:if|else|while) \(.*\)(.*)/)
{
# Store the line contents
$if_line = $_;
$if_line_number = $line_number;
# If the end of the line has a semicolon, report it
if ($1 =~ ';')
{
print $if_line_number . ": " . $if_line;
}
# If the end of the line contains the opening brace, ignore this if
elsif ($1 =~ '{')
{
}
# Otherwise, read the following lines as they come in
else
{
$in_if = 1;
}
}
}
I'm sure you could do something fairly easily in any other language (including awk) if you wanted to; I just thought that I could do it quickest in perl by way of an example.
In awk, each line is treated as a record, with "\n" as the record separator. Because records are parsed line by line, you need to keep track of the line that follows the if. I don't know how to do this in awk.
In Perl, you can do this easily:
open(INFO,"<file.c");
$flag=0;
while($line = <INFO>)
{
if($line =~ m/if\s*\(/ )
{
print $line;
$flag = 1;
}
else
{
print $line if $flag; # print only the line that follows an if
$flag = 0 if($flag);
}
}
Using Awk you can do this by:
awk '
BEGIN { flag=0 }
{
if($0 ~ /if/) {
print $0;
flag=NR+1
}
if(flag==NR)
print $0
}' try.c
