#!/usr/bin/perl
use strict;
use warnings;
sub paragraph
{
open my $file, "<", "dict.txt" or die "$!";
my @words = <$file>;
close $file;
print "Number of lines:";
my $lines = <>;
print "Max words per line:";
my $range = <>;
for(my $i = 0; $i<$lines; $i++){
my $wordcount = int(rand($range));
for(my $s = 0; $s<=$wordcount; $s++){
my $range2 = scalar(@words);
my $word = int(rand($range2));
print $words[$word]." ";
if($s==$wordcount){
print "\n";}
}
}
}
paragraph;
I'm trying to learn programming, so I just wrote this simple script.
When running this code, I am getting "use of uninitialized value" errors. I can't figure out why, but I'm sure I am just overlooking something.
These two lines open the dict.txt file for writing and then try to read from it.
open FILE, ">dict.txt" or die $!;
my @words = <FILE>;
Since you can't read from a write-only file, it fails. If the file was writable, then it is empty now - sorry about your nice word list. Suggestion:
open my $file, "<", "dict.txt" or die "$!";
my @words = <$file>;
close $file;
Also, please learn to indent your braces in an orthodox fashion, such as:
sub go
{
    print "Number of lines:";
    my $lines = <>;
    print "Max words per line:";
    my $range = <>;
    for (my $i = 0; $i<$lines; $i++){
        my $wordcount = int(rand($range));
        for (my $s = 0; $s<$wordcount; $s++){
            my $range2 = 23496;
            my $word = int(rand($range2));
            my $chosen = @words[$word];
            print "$chosen ";
            if ($s=$wordcount){
                print "\n";
            }
        }
    }
}
Also leave a space between 'if' or 'for' and the open parenthesis.
Your assignment if ($s = $wordcount) probably isn't what you intended; however, the condition if ($s == $wordcount) will always be false since it is in the scope of a loop with the condition $s < $wordcount. You need to rethink that part of your logic.
In general, you should choose a better name for your function than go. Also, it is probably better to invoke it as go();.
When I test compile your script, Perl warns about:
Scalar value @words[$word] better written as $words[$word] at xx.pl line 19.
You should fix such errors before posting.
You have:
my $range2 = 23496;
my $word = int(rand($range2));
Unless you have more than 23,496 words in your dictionary, you will likely be accessing an uninitialized word. You should probably use:
my $range2 = scalar(@words);
That then just leaves you with some logic problems to resolve.
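As a rough sketch of that suggestion (random_word is a hypothetical helper name, not something from the original script), sizing the random range from the array itself keeps the index inside the list however many words the dictionary holds:

```perl
use strict;
use warnings;

# Hypothetical helper: return one random element of the list it is given.
sub random_word {
    my @words = @_;
    return undef unless @words;                   # nothing to choose from
    # rand(N) yields 0 <= x < N, so int() gives a valid index 0 .. N-1
    return $words[ int(rand(scalar @words)) ];
}

my @words = qw(word1 word2 word3 word4);
print random_word(@words), "\n";
```

With a hard-coded range such as 23496, any index past the end of a shorter list returns undef, which is where the "uninitialized value" warnings come from.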
Given 'dict.txt' containing:
word1
word2
word3
word4
nibelung
abyssinia
tirade
pearl
And 'xx.pl' containing:
#!/usr/bin/env perl
use strict;
use warnings;
open my $file, "<", "dict.txt" or die $!;
my @words = <$file>;
close $file;
sub go
{
    print "Number of lines: ";
    my $lines = <>;
    print "Max words per line: ";
    my $range = <>;
    my $range2 = scalar(@words);
    for (1..$lines)
    {
        for (1..$range)
        {
            my $index = int(rand($range2));
            my $chosen = $words[$index];
            chomp $chosen;
            print "$chosen ";
        }
        print "\n";
    }
}
go();
When I run it, I get:
$ perl xx.pl
Number of lines: 3
Max words per line: 4
word4 word3 word4 nibelung
abyssinia pearl word1 tirade
word3 word1 word3 word2
$
Some more bugs:
if($s=$wordcount){
You need == here.
I'm new to Stack Overflow and I would like to ask for some advice with regard to a minor problem I have with my Perl code.
In short, I have written a small programme that opens text files from a pre-defined array, then searches for certain strings in them and finally prints out the line containing the string.
my @S1A_SING_Files = (
'S1A-001_SING_annotated.txt',
'S1A-002_SING_annotated.txt',
'S1A-003_SING_annotated.txt',
'S1A-004_SING_annotated.txt',
'S1A-005_SING_annotated.txt'
);
foreach (@S1A_SING_Files) {
print ("\n");
print ("Search results for $_:\n\n");
open (F, $_) or die("Can't open file!\n");
while ($line = <F>) {
if ($line =~ /\$(voc)?[R|L]D|\$Rep|\/\//) {
print ($line);
}
}
}
close (F);
I was wondering whether it is possible to create an exception to the foreach loop, so that the line containing
print ("\n");
not be executed if the file is $S1A_SING_Files[0]. It should then be normally executed if the file is any of the following ones. Do you think this could be accomplished?
Thank you very much in advance!
Yes. Just add a check for the first file. Change:
print ("\n");
to:
print ("\n") if $_ ne $S1A_SING_Files[0];
If the array contains unique strings, you can use the following:
print("\n") if $_ ne $S1A_SING_Files[0]; # Different stringification than 1st element?
The following will work even if the array contains non-strings or duplicate values (and it's faster too):
print("\n") if \$_ != \$S1A_SING_Files[0]; # Different scalar than 1st element?
Both of the above could fail for magical arrays. The most reliable solution is to iterate over the indexes.
for my $i (0..$#S1A_SING_Files) {
my $file = $S1A_SING_Files[$i];
print("\n") if $i; # Different index than 1st element?
...
}
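To see why the reference comparison above copes with duplicate values, here is a small sketch (first_flags is a made-up helper name). It relies on foreach aliasing $_ to each element, so \$_ is the address of that element rather than of a copy:

```perl
use strict;
use warnings;

# Hypothetical helper: for each element, report whether it is the
# array's first scalar (1) or not (0), by comparing addresses.
sub first_flags {
    my @items = @_;
    my @flags;
    for (@items) {
        # $_ is an alias to the current element, so \$_ == \$items[0]
        # is true only for the element stored at index 0.
        push @flags, (\$_ == \$items[0]) ? 1 : 0;
    }
    return @flags;
}

print join(",", first_flags(qw(a.txt b.txt a.txt))), "\n";   # prints 1,0,0
```

The third element has the same string value as the first, so a string comparison with ne would treat it as the first file, while the address comparison does not.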
Your code can be written in the following form:
use strict;
use warnings;
my @S1A_SING_Files = (
    'S1A-001_SING_annotated.txt',
    'S1A-002_SING_annotated.txt',
    'S1A-003_SING_annotated.txt',
    'S1A-004_SING_annotated.txt',
    'S1A-005_SING_annotated.txt'
);
# A named loop variable keeps the inner while (<$fh>) from
# overwriting the array elements through the aliased $_.
foreach my $file (@S1A_SING_Files) {
    print "\n" unless $file eq $S1A_SING_Files[0];
    print "Search results for $file:\n\n";
    open my $fh, '<', $file or die("Can't open file!\n");
    m!\$(voc)?[R|L]D|\$Rep|//! && print while <$fh>;
    close $fh;
}
The user first enters a number of lines, n. The program then reads n lines of text from user input and prints these lines backwards, i.e., if n=5, it prints the 5th line first, then the 4th, …, and the 1st line last. I don't see anything wrong in my code, but when I start entering lines (each ending with a newline), it won't stop. My array seems to accept an unlimited number of lines, which causes the problem. How do I stop it?
use strict;
use warnings;
print "Enter number of n \n";
my $n = <STDIN>;
print "Enter couple lines of text \n";
my @array = (1..$n);
@array=<STDIN>;
do{
print "$array[$n]";
$n--;
}until($n=0);
Below are some tips that might help you:
Always include use strict; and use warnings at the top of your scripts to enforce good coding practices and help you find syntax errors sooner.
You must chomp your input from <STDIN> to remove the trailing newline.
Use the array function push to add elements to the end of an array. Other relevant array functions include pop, shift, unshift, and potentially splice.
Use reverse to return an array of reversed order.
Finally, the for my $element (@array) { construct can be helpful for iterating over the elements of an array.
These tips lead to the following code:
use strict;
use warnings;
print "Enter number of n \n";
chomp(my $n = <STDIN>);
die "n must be an integer" unless $n =~ /^\d+$/;
my @array;
print "Enter $n lines of text\n";
for (1..$n) {
    my $input = <STDIN>;
    # chomp $input; # Do you want line endings, or not?
    push @array, $input;
}
print "Here is your array: @array";
The end loop condition is do { … } until ($n = 0);, which is equivalent to do { … } while (!($n = 0));
The trouble is that the condition is an assignment, and zero is always false, so the loop is infinite.
Use do { … } until ($n == 0);.
Note that the input is read by the line @array = <STDIN>;, which reads all the input provided until EOF. It is not constrained by the value of $n in any way.
#!/usr/bin/env perl
use strict;
use warnings;
print "Enter number of n \n";
my $n = <STDIN>;
printf "Got %d\n", $n;
print "Enter couple lines of text \n";
my @array = (1..$n);
@array = <STDIN>;
do {
    print "$n = $array[$n]";
    $n--;
} until ($n == 0);
When run, that gives:
$ perl inf.pl
Enter number of n
3
Got 3
Enter couple lines of text
abc
def
ghi
jkl
^D
3 = jkl
2 = ghi
1 = def
$
I think your expectations are skew-whiff.
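If the goal is to stop after exactly n lines, one possible sketch is below. read_n_lines is a hypothetical helper, and an in-memory filehandle stands in for STDIN so the example is self-contained:

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $n lines from the handle $fh.
sub read_n_lines {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;            # scalar context: one line per read
        last unless defined $line;   # stop early at EOF
        push @lines, $line;
    }
    return @lines;
}

# In-memory handle used here in place of STDIN (an assumption for the demo).
my $input = "abc\ndef\nghi\njkl\n";
open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
print reverse read_n_lines($fh, 3);   # prints ghi, def, abc on separate lines
```

Reading in scalar context inside a counted loop is what bounds the input; reverse then takes care of the ordering without any index arithmetic.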
I have an array which contains some DNA sequences as strings stored in its elements.
Ex: print $array[0]; gives output like this: ACTAG (the first position in each sequence).
I have written this code that allows me to analyse the first position for each sequence.
#!/usr/bin/perl
$infile = @ARGV[0];
$ws= $ARGV[1];
$wsnumber= $ARGV[2];
open INFILE, $infile or die "Can't open $infile: $!"; # This opens file, but if file isn't there it mentions this will not open
my $sequence = (); # This sequence variable stores the sequences from the .fasta file
my $line; # This reads the input file one-line-at-a-time
while ($line = <INFILE>) {
chomp $line;
if ($ws ne "--ws=") {
print "no flag or invalid flag\n";
last;
}
else {
if($line =~ /^\s*$/) { # This finds lines with whitespaces from the beginning to the ending of the sequence. Removes blank line.
next;
} elsif($line =~ /^\s*#/) { # This finds lines with spaces before the hash character. Removes .fasta comment
next;
} elsif($line =~ /^>/) { # This finds lines with the '>' symbol at beginning of label. Removes .fasta label
next;
} else {
$sequence = $line;
$sequence =~ s/\s//g; # Whitespace characters are removed
@array = split //,$sequence;
$seqlength = length($sequence);}
}
$count=0;
foreach ($array[0]){
if( $array[0] !~ m/A|T|C|G/ ){
next;
}
else {
$count += 1;
$suma += $count;
}
}
}
But I don't know how to modify $array[0] to run this code for each position. I have only managed to do it for a specific position (in the above example, the first position).
Can someone help me?
Thank you!
Not sure I understand correctly, but is this what you want?
Assuming @array contains one sequence per element:
foreach my $seq (@array) {
    my @chars = split '', $seq;
    foreach my $char (@chars) {
        if ($char =~ /[ATCG]/) {
            # do stuff
        }
    }
}
You're only looking at one element in your array:
foreach ($array[0]){ ... }
To iterate over the whole array use:
my @array = qw(ATTTCFCGGCTTA);
foreach (@array){
    my @split = split('');
    foreach (@split){
        die "$_ is not a valid character\n" unless /[ATGC]/;
        print "$_\n";
    }
}
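If what is wanted is a tally per position across several sequences, a sketch along these lines might do. The helper name valid_counts_by_position and the idea of counting valid bases per column are assumptions about the goal, not something stated in the question:

```perl
use strict;
use warnings;

# Hypothetical helper: for each position, count how many sequences
# have a valid base (A, C, G or T) there.
sub valid_counts_by_position {
    my @sequences = @_;
    my @counts;
    for my $seq (@sequences) {
        my @chars = split //, $seq;
        for my $pos (0 .. $#chars) {
            $counts[$pos] //= 0;                             # start each column at zero
            $counts[$pos]++ if $chars[$pos] =~ /^[ACGT]$/;   # count valid bases only
        }
    }
    return @counts;
}

print join(" ", valid_counts_by_position(qw(ACTAG ANTA))), "\n";   # prints 2 1 2 2 1
```

The inner loop runs over indexes rather than characters, so the position is available for whatever per-position statistic is needed.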
I have a program that creates an array of hashes while parsing a FASTA file. Here is my code
use strict;
use warnings;
my $docName = "A_gen.txt";
my $alleleCount = 0;
my $flag = 1;
my $tempSequence;
my @tempHeader;
my @arrayOfHashes = ();
my $fastaDoc = open(my $FH, '<', $docName);
my @fileArray = <$FH>;
for (my $i = 0; $i <= $#fileArray; $i++) {
if ($fileArray[$i] =~ m/>/) { # creates a header for the hashes
$flag = 0;
$fileArray[$i] =~ s/>//;
$alleleCount++;
@tempHeader = split / /, $fileArray[$i];
pop(@tempHeader); # removes the pointless bp
for (my $j = 0; $j <= scalar(@tempHeader)-1; $j++) {
print $tempHeader[$j];
if ($j < scalar(@tempHeader)-1) {
print " : "};
if ($j == scalar(@tempHeader) - 1) {
print "\n";
};
}
}
# push(@arrayOfHashes, "$i");
if ($fileArray[$i++] =~ m/>/) { # goes to next line
push(@arrayOfHashes, {
id => $tempHeader[0],
hla => $tempHeader[1],
bpCount => $tempHeader[2],
sequence => $tempSequence
});
print $arrayOfHashes[0]{id};
@tempHeader = ();
$tempSequence = "";
}
$i--; # puts i back to the current line
if ($flag == 1) {
$tempSequence = $tempSequence.$fileArray[$i];
}
}
print $arrayOfHashes[0]{id};
print "\n";
print $alleleCount."\n";
print $#fileArray +1;
My problem is when the line
print $arrayOfHashes[0]{id};
is called, I get an error that says
Use of uninitialized value in print at fasta_tie.pl line 47, line 6670.
You will see in the above code I commented out a line that says
push(@arrayOfHashes, "$i");
because I wanted to make sure that the hash works. Also, the data prints correctly in the desired format, which looks like this:
HLA:HLA00127 : A*74:01 : 2918
Try adding
print "Array length:" . scalar(@arrayOfHashes) . "\n";
before
print $arrayOfHashes[0]{id};
That way you can see whether your variable has any content. You can also use the module Data::Dumper to see the content.
use Data::Dumper;
print Dumper(\@arrayOfHashes);
Note the '\' before the array!
Output would be something like:
$VAR1 = [
{
'sequence' => 'tempSequence',
'hla' => 'hla',
'bpCount' => 'bpCount',
'id' => 'id'
}
];
But if there's a module for FASTA, try using it. You don't have to reinvent the wheel each time ;)
First you do this:
$fileArray[$i] =~ s/>//;
Then later you try to match like this:
$fileArray[$i++] =~ m/>/
You step through the file array, removing the first "greater than" sign in the line on each line. And then you want to match the current line by that same character. That would be okay if you only want to push the line if it has a second "greater than", but you will never push anything into the array if you only expect 1, or there turns out to be only one.
Your comment "puts i back to the current line" shows what you were trying to do, but if you only use it once, why not use the expression $i + 1?
Also, because you're incrementing it post-fix and not using it for anything, your increment has no effect. If $i==0 before, then $fileArray[$i++] still accesses $fileArray[0], only $i==1 after the expression has been evaluated--and to no effect--until being later decremented.
If you want to peek ahead, then it is better to use the pre-fix increment:
if ($fileArray[++$i] =~ m/>/) ...
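The difference between the two forms, and the $i + 1 alternative mentioned earlier, can be checked with a small sketch. The three helper subs are hypothetical names used only for this illustration; each returns the element it read and the index value afterwards:

```perl
use strict;
use warnings;

# Hypothetical helpers, one per way of indexing.
sub read_post { my ($lines, $i) = @_; my $v = $lines->[$i++];   return ($v, $i); }
sub read_pre  { my ($lines, $i) = @_; my $v = $lines->[++$i];   return ($v, $i); }
sub read_peek { my ($lines, $i) = @_; my $v = $lines->[$i + 1]; return ($v, $i); }

my @lines = (">header", "ACGT");

# Post-increment reads index 0, then bumps the index to 1.
printf "post: value=%s index=%d\n", read_post(\@lines, 0);
# Pre-increment bumps the index to 1 first, then reads index 1.
printf "pre:  value=%s index=%d\n", read_pre(\@lines, 0);
# $i + 1 reads index 1 and leaves the index at 0.
printf "peek: value=%s index=%d\n", read_peek(\@lines, 0);
```

Only the $i + 1 form looks ahead without changing the loop counter, so it needs no matching decrement afterwards.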
The purpose of the script is to process all words from a file and output ALL words that occur the most. So if there are 3 words that each occur 10 times, the program should output all the words.
The script now runs, thanks to some tips I have gotten here. However, it does not handle large text files (i.e. the New Testament). I'm not sure if that is a fault of mine or just a limitation of the code. I am sure there are several other problems with the program, so any help would be greatly appreciated.
#!/usr/bin/perl -w
require 5.10.0;
print "Your file: " . $ARGV[0] . "\n";
#Make sure there is only one argument
if ($#ARGV == 0){
#Make sure the argument is actually a file
if (-f $ARGV[0]){
%wordHash = (); #New hash to match words with word counts
$file=$ARGV[0]; #Stores value of argument
open(FILE, $file) or die "File not opened correctly.";
#Process through each line of the file
while (<FILE>){
chomp;
#Delimits on any non-alphanumeric
@words=split(/[^a-zA-Z0-9]/,$_);
$wordSize = @words;
#Put all words to lowercase, removes case sensitivty
for($x=0; $x<$wordSize; $x++){
$words[$x]=lc($words[$x]);
}
#Puts each occurence of word into hash
foreach $word(@words){
$wordHash{$word}++;
}
}
close FILE;
#$wordHash{$b} <=> $wordHash{$a};
$wordList="";
$max=0;
while (($key, $value) = each(%wordHash)){
if($value>$max){
$max=$value;
}
}
while (($key, $value) = each(%wordHash)){
if($value==$max && $key ne "s"){
$wordList.=" " . $key;
}
}
#Print solution
print "The following words occur the most (" . $max . " times): " . $wordList . "\n";
}
else {
print "Error. Your argument is not a file.\n";
}
}
else {
print "Error. Use exactly one argument.\n";
}
Your problem lies in the two missing lines at the top of your script:
use strict;
use warnings;
If they had been there, they would have reported lots of lines like this:
Argument "make" isn't numeric in array element at ...
Which comes from this line:
$list[$_] = $wordHash{$_} for keys %wordHash;
Array indexes can only be numbers, and since your keys are words, that won't work. What happens here is that any string is coerced into a number, and for any string that does not begin with a number, that will be 0.
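A small sketch of that coercion (index_by_word is a hypothetical helper, and the two sample words are made up): every non-numeric key ends up as index 0, so each assignment overwrites the previous one.

```perl
use strict;
use warnings;

# Hypothetical helper: use the hash keys as array indexes, as the
# problematic line does, and return the resulting array.
sub index_by_word {
    my %wordHash = @_;
    my @list;
    {
        no warnings 'numeric';   # silence "isn't numeric" for this demo only
        # Each word numifies to 0, so every value lands in $list[0].
        $list[$_] = $wordHash{$_} for sort keys %wordHash;
    }
    return @list;
}

my @list = index_by_word(make => 3, model => 7);
print scalar(@list), " element(s), first is $list[0]\n";   # prints "1 element(s), first is 7"
```

The warning is suppressed here only to keep the output clean; in real code it is the hint that a hash, not an array, is the right container.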
Your code works fine reading the data in, although I would write it differently. It is only after that that your code becomes unwieldy.
As near as I can tell, you are trying to print out the most occurring words, in which case you should consider the following code:
use strict;
use warnings;
my %wordHash;
#Make sure there is only one argument
die "Only one argument allowed." unless @ARGV == 1;
while (<>) { # Use the diamond operator to implicitly open ARGV files
    chomp;
    my @words = grep $_,              # disallow empty strings
                map lc,               # make everything lower case
                split /[^a-zA-Z0-9]/; # your original split
    foreach my $word (@words) {
        $wordHash{$word}++;
    }
}
for my $word (sort { $wordHash{$b} <=> $wordHash{$a} } keys %wordHash) {
    printf "%-6s %s\n", $wordHash{$word}, $word;
}
As you'll note, you can sort based on hash values.
Here is an entirely different way of writing it (I could have also said "Perl is not C"):
#!/usr/bin/env perl
use 5.010;
use strict; use warnings;
use autodie;
use List::Util qw(max);
my ($input_file) = @ARGV;
die "Need an input file\n" unless defined $input_file;
say "Input file = '$input_file'";
open my $input, '<', $input_file;
my %words;
while (my $line = <$input>) {
    chomp $line;
    my @tokens = map lc, grep length, split /[^A-Za-z0-9]+/, $line;
    $words{ $_ } += 1 for @tokens;
}
close $input;
my $max = max values %words;
my @argmax = sort grep { $words{$_} == $max } keys %words;
for my $word (@argmax) {
    printf "%s: %d\n", $word, $max;
}
Why not just get the keys from the hash sorted by their value and extract the first X?
This should provide an example: http://www.devdaily.com/perl/edu/qanda/plqa00016
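A minimal sketch of that idea, assuming the counts are already in a hash (top_words and the sample counts are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: the $x most frequent words, ties broken alphabetically.
sub top_words {
    my ($counts, $x) = @_;
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
    $x = @sorted if $x > @sorted;   # never ask for more words than exist
    return @sorted[0 .. $x - 1];    # array slice: the first $x entries
}

my %counts = (the => 10, and => 10, of => 7, a => 3);
print join(" ", top_words(\%counts, 2)), "\n";   # prints "and the"
```

A fixed X can cut off words that tie with the maximum, which is why comparing each count against the maximum, as in the answers above, fits the original question more closely.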