perl to get user input and print into file - arrays

I am writing a Perl script that takes user input and adds it to a text file. Multiple fruits and their info are separated by commas. I want to meet the requirements below:
If a fruit given on the command line does not exist in the file, add it along with the info provided (jackfruit in the case below).
If a fruit given on the command line already exists in the file, add the info provided under it. Info 1 maps to fruit 1, and so on.
Example command:
<script> -fruit apple,orange,jackfruit -info <info1>,<info2>,<info3>
Example of input file:
apple
x
y
orange
a
b
Expected output:
apple
<info1>
x
y
orange
<info2>
a
b
jackfruit
<info3>
My code:
use strict;
use warnings;
use Getopt::Long;
use Tie::File;
my $fruit;
my $info;
if (! GetOptions(
"fruit=s" => \$fruit,
"info=s" => \$info,
)){
print "\nEntered Arguments are not enough!\nPlease Use Switch '-help' or '-h' For More Information.\n";
exit;
}
my $split_info;
my $split_fruit;
my @split_info = split(',', $info);
my @split_fruit = split(',', $fruit);
tie my @file, 'Tie::File', 'filenamee.txt' or die $!;
for (0 .. $#file) {
if ($file[$_] eq "$split_fruit[0]") {
splice @file, $_ + 1, 0, "\t$split_info[0]\n";
}
}

OK, so this works as far as it goes. But (as you've no doubt noticed) it only handles the first item in each of your inputs, and you want to handle all of them.
You could loop over @split_fruit for each line in your input and do clever stuff with @split_info when it matches, but I think it's easier to store both values in a hash (it also makes me slightly uncomfortable to have linked pieces of data stored in two separate data structures).
my %new_data;
@new_data{@split_fruit} = @split_info;
This is called a "hash slice" and it's an easy way to populate a hash when you have the keys and values in two separate arrays. It's really just a shortcut for:
for (0 .. $#split_fruit) {
$new_data{$split_fruit[$_]} = $split_info[$_]
}
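To see that the slice and the loop really build the same hash, here is a small self-contained sketch. The fruit and info values are made-up sample data, and %by_slice and %by_loop are names chosen only for this illustration.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the split command-line values
my @split_fruit = qw(apple orange jackfruit);
my @split_info  = qw(info1 info2 info3);

# Hash slice: assign all values to all keys in one statement
my %by_slice;
@by_slice{@split_fruit} = @split_info;

# The equivalent explicit loop over the indices
my %by_loop;
for (0 .. $#split_fruit) {
    $by_loop{ $split_fruit[$_] } = $split_info[$_];
}

# Sorted so the output order is repeatable
print "$_ => $by_slice{$_}\n" for sort keys %by_slice;
```

Both hashes end up with the same three key/value pairs, so the slice is purely a shorthand.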
We can now loop over our tied file array and easily check for the data we want to add.
for (0 .. $#file) {
  # We can skip lines that start with whitespace
  next if $file[$_] =~ /^\s/;
  # If our input is a key in the hash...
  if (exists $new_data{$file[$_]}) {
    # ... then add the value as a new line in the file
    splice @file, $_ + 1, 0, "\t$new_data{$file[$_]}\n";
  }
}
This gets us most of the way there. But it doesn't handle the case where we're adding new data to the file (your "jackfruit" example). So, here's what we're going to do:
As we process a record in the loop, we'll delete it from the hash
Once the loop has finished, if there are any keys left in the hash, then we know we have new data and we can run a new loop to add that.
So our code becomes something like this:
for (0 .. $#file) {
  next if $file[$_] =~ /^\s/;
  if (exists $new_data{$file[$_]}) {
    splice @file, $_ + 1, 0, "\t$new_data{$file[$_]}\n";
    # Delete our current key/value pair from the hash
    delete $new_data{$file[$_]};
  }
}
# For each key left in the hash...
for (keys %new_data) {
  # Push two new lines into our tied file array
  # (This adds new lines to the end of the file)
  push @file, "$_\n", "\t$new_data{$_}\n";
}
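If you want to try the whole idea without touching a real file, here is a minimal sketch that uses a plain array in place of the tied one. It assumes the info lines in the file are indented with a tab, and all the data is made up. One deliberate difference: it uses a C-style loop that re-checks the array length on every pass, because splice makes the array grow and a range such as 0 .. $#file is fixed when the loop starts, so a fruit near the end of the file could otherwise be missed.

```perl
use strict;
use warnings;

# In-memory stand-ins for the tied file and the two command-line options
my @file        = ("apple", "\tx", "\ty", "orange", "\ta", "\tb");
my @split_fruit = qw(apple orange jackfruit);
my @split_info  = qw(info1 info2 info3);

my %new_data;
@new_data{@split_fruit} = @split_info;

# Re-check the array length on every pass, because splice makes it grow
for (my $i = 0; $i < @file; $i++) {
    next if $file[$i] =~ /^\s/;                 # indented lines are info, not fruit
    next unless exists $new_data{ $file[$i] };
    my $info = delete $new_data{ $file[$i] };   # delete returns the removed value
    splice @file, $i + 1, 0, "\t$info";
}

# Whatever is left was not in the file: append it in a stable order
for my $fruit (sort keys %new_data) {
    push @file, $fruit, "\t$new_data{$fruit}";
}

print "$_\n" for @file;
```

With the tied array from Tie::File the same loop should write the changes straight back to the file.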

Related

Group similar element of array together to use in foreach at once in perl

I have an array whose elements are similar under a certain condition: if we delete the "n" and "p" from the element names, the similar elements can be recognised. I want to use each group of similar elements together in one iteration of a foreach statement. The array is shown below:
my @array = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
The order of the array elements need not always be like this.
I am editing this question again; sorry if I did not state it properly. I have to print a string multiple times to a file, using the variables in the above array. The code below is not right in any sense; I am only using it to explain my question.
open (FILE, ">" , "test.v");
foreach my $xy (@array){
print FILE "DUF A1 (.pin1($1), .pin2($2));" ; // $1 And $2 is just used to explain that
} // i just want to print abc_n and abc_p in one iteration of foreach loop and followed by other pairs in successive loops respectively
close (FILE);
The result i want to print is as follows:
DUF A1 ( .pin1(abc_n), .pin2(abc_p));
DUF A1 ( .pin1(gg_n), .pin2(gg_p));
DUF A1 ( .pin1(munday_n_xy), .pin2(munday_p_xy));
DUF A1 ( .pin1(soc_n), .pin2(soc_p));
The scripting language used is Perl. Your help is really appreciated.
Thank you!
Partitioning a data set depends entirely on how data are "similiar under certain conditions."
The condition given is that with removal of _n and _p the "similar" elements become equal (I assume that underscore; the OP says n and p). In such a case one can do
use warnings;
use strict;
use feature 'say';
my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
my %part;
for my $elem (@data) {
    push @{ $part{ $elem =~ s/_(?:n|p)//r } }, $elem;
}
say "$_ => @{$part{$_}}" for keys %part;
The grouped "similar" strings are printed as a demo since I don't understand the logic of the shown output. Please build your output strings as desired.
If this is it and there'll be no more input to process later in code, nor will there be a need to refer to those common factors, then you may want the groups in an array
my @groups = values %part;
If needed throw in a suitable sorting when writing the array, sort { ... } values %part.
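As one possible way to build the output strings from these groups, here is a rough sketch. It assumes every group holds exactly one _n element and one _p element, and that the output format is the "DUF A1" line from the question; @lines is a name introduced only for this example. The /r modifier needs Perl 5.14 or newer.

```perl
use strict;
use warnings;

my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);

# Group elements by the name that remains once the first _n or _p is removed
my %part;
for my $elem (@data) {
    push @{ $part{ $elem =~ s/_(?:n|p)//r } }, $elem;
}

# Build one line per group; sorting the keys makes the order repeatable
my @lines;
for my $key (sort keys %part) {
    # Within a group the "_n" name sorts before the "_p" name
    my ($n, $p) = sort @{ $part{$key} };
    push @lines, "DUF A1 ( .pin1($n), .pin2($p));";
}

print "$_\n" for @lines;
```

To write to test.v instead of the screen, print to a filehandle opened on that file.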
For more fluid and less determined "similarity" try "fuzzy matching;" here is one example.

Finding motifs and position of motif in FASTA file - Perl

Can someone help me with this Perl code? When I run it, nothing happens. No errors or anything which is weird to me. It reads in and opens the file just fine. I believe the problem is in the while loop or the foreach loop cause I honestly don't think I understand them. I'm very new at this with a pretty shit teacher.
Instructions: Declare a scalar variable called motif and make it AAA. Declare an array variable called locations, which is where the locations of the motif will be stored. Place the gene in a scalar variable. Now search for that motif in the amborella gene. The code should print the position of the motif and the motif found. You will need to write a while loop that searches for the motif and includes push, pos, and –length commands in order to save and report locations. Then you will need a foreach loop to print the locations and the motif. (If it only reports locations in the first line of the gene, remember that is because the gene is in a scalar variable that will only read the first line. That is acceptable.)
My code so far:
#!/usr/bin/perl
use warnings;
use strict;
#Declare a scalar variable called motif and make it AAA.
my$motif="AAA";
#Declare an array variable called locations, which is where the
#locations of the motif will be stored.
my@locations=();
my$foundMotif="";
my$position=();
#Place the gene in a scalar variable.
my$geneFileName = 'amborella.txt';
open(GENEFILE, $geneFileName) or die "Can't read file!";
my$gene = <GENEFILE>;
#Now search for that motif in the amborella gene.
#The code should print the position of the motif and the motif
#found. You will need to write a while loop that searches for the
#motif and includes push, pos, and –length commands in order to
#save and report locations.
while($foundMotif =~ m/AAA/g) {
$position=(pos($foundMotif)-3);
push (@locations, $position);
}
#Then you will need a foreach loop to print the locations and the motif.
foreach $position (@locations){
print "\n Found motif: ", $motif, "\n at position: ", $position;
}
#close the file
close GENEFILE;
exit;
Your program is fine, it's a simple mix-up.
You are matching against an empty string.
while($foundMotif =~ m/AAA/g) {
$position = (pos($foundMotif)-3);
push (@locations, $position);
}
You're looking for AAA in $foundMotif. But that's an empty string because you just declared it further up. Your gene string (disclaimer: I know nothing about bio informatics) is $gene. That's what you need to match.
Let's go through it step by step. I've simplified your code and put in an example string. I'm aware that isn't what genes look like, but that doesn't matter. This is already fixed.
use strict;
use warnings;
my $motif = "AAA";
my @locations = ();
# ... skip reading the file
my $gene = "ABAABAAABAAAAB\n";
while ($gene =~ m/$motif/g) { # 1, 2
    my $position = (pos($gene) - length($motif)); # 3, 4
    push(@locations, $position);
}
foreach my $position (@locations) {
    print "\n Found motif: ", $motif, "\n at position: ", $position;
}
If you run this, the code now produces meaningful output.
Found motif: AAA
at position: 5
Found motif: AAA
at position: 9
I've made four changes:
1. You need to search in $gene.
2. Your variable $motif is meaningless if you don't use it to search. Using it makes your program dynamic.
3. Again, you need to use pos() on $gene.
4. To make it dynamic, you shouldn't hard-code the length.
You don't need the $foundMotif variable at all. The $position can actually be lexical to the block it's in. That means, it will be a different variable each time the loop is run, which is simply good practice. In Perl, you want to always use the smallest scope possible for variables, and declare them only when you need them, not in advance.
Since this is a learning exercise, it makes sense to iterate the array separately. In a real life program, you could eliminate the foreach loop and the array and output the positions directly if you were not to use them later on.
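For illustration, a version without the array and the foreach loop could look roughly like this. report_motif is a made-up helper name, and the gene string is the same invented example as above. It takes the output filehandle as a parameter only so the sketch is easy to try and check.

```perl
use strict;
use warnings;

# report_motif is a hypothetical helper, not part of the original exercise.
# It prints every match position as soon as it is found.
sub report_motif {
    my ($gene, $motif, $out) = @_;
    while ($gene =~ m/$motif/g) {
        # pos() is the offset just after the match, so step back by the motif length
        my $position = pos($gene) - length($motif);
        print {$out} "Found motif: $motif at position: $position\n";
    }
}

report_motif("ABAABAAABAAAAB\n", "AAA", \*STDOUT);
```

For the example string this reports positions 5 and 9, the same as the version with the array.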

Perl: RegEx: Storing variable lines in array

I'm developing a Perl script and one of the script functions is to detect many lines of data between two terminals and store them in an array. I need to store all lines in an array but to be grouped separately as 1st line in $1 and 2nd in $2 and so on. The problem here is that number of these lines is variable and will change with each new run.
my @statistics_of_layers_var;
for( <ALL_FILE> ) {
@statistics_of_layers_var = ($all_file =~ /(Statistics\s+Of\s+Layers)
(?:(\n|.)*)(Summary)/gm );
print @statistics_of_layers_var;
The given data should be
Statistics Of Layers
Line#1
Line#2
Line#3
...
Summary
How could I achieve it?
You can achieve this without a complicated regular expression. Simply use the range operator (also called flip-flop operator) to find the lines you want.
use strict;
use warnings;
use Data::Printer;
my @statistics_of_layers_var;
while (<DATA>) {
    # use the range-operator to find lines with start and end flag
    if (/^Statistics Of Layers/ .. /^Summary/) {
        # but do not keep the start and the end
        next if m/^Statistics Of Layers/ || m/^Summary/;
        # add the line to the collection
        push @statistics_of_layers_var, $_ ;
    }
}
p @statistics_of_layers_var;
__DATA__
Some Data
Statistics Of Layers
Line#1
Line#2
Line#3
...
Summary
Some more data
It works by looking at the current line and flipping the block on and off. Once /^Statistics Of Layers/ matches a line, the block runs for that line and each following line until /^Summary/ matches a line. Because those start and end lines are included, we need to skip them when adding lines to the array.
This also works if your file contains multiple instances of this pattern. Then you'd get all of the lines in the array.
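Here is a small sketch of the multiple-instances case. The input is made up and held in an array instead of a file, and @input and @collected are names used only for this example.

```perl
use strict;
use warnings;

# Made-up input with two start/end blocks, held in memory instead of a file
my @input = (
    "Some Data\n",
    "Statistics Of Layers\n",
    "Line#1\n",
    "Line#2\n",
    "Summary\n",
    "Other data\n",
    "Statistics Of Layers\n",
    "Line#3\n",
    "Summary\n",
);

my @collected;
for (@input) {
    # True from a start line through the next end line, then false again
    if (/^Statistics Of Layers/ .. /^Summary/) {
        # Skip the start and end markers themselves
        next if /^Statistics Of Layers/ || /^Summary/;
        push @collected, $_;
    }
}

print @collected;
```

The lines from both blocks land in the same array, and "Other data" between the blocks is left out.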
Maybe you can try this:
push @statistics_of_layers_var ,[$a] = ($slurp =~ /(Statistics\s+Of\s+Layers)
(?:(\n|.)*)(Summary)/gm );

Perl, A hash of arrays: adding and removing keys, adding to an array, all in a while loop

I have a hash which should contain certain keys which are linked to their own arrays. To be more specific, the hash keys are quality values and the arrays are sequence names. If there already is an array for that quality, I'd want to add the sequence name to the array that is linked to the quality in question. If there isn't one, I want to create one and add the sequence name to it. All this is done in a while loop, going through all the sequences one by one.
I've tried to do things like in Perl How do I retrieve an array from a hash of arrays? but I can't seem to get it right.
I just get these error messages:
Scalar value @{hash{$q} better written as ${hash{$q} at asdasd.pl line 69.
Global symbol "@q" requires explicit package name asdasd.pl line 58.
And some others, too.
Here is an example of what I've tried:
my %hash;
while (reading the sequences) {
my $q = "the value the sequence has";
my $seq = "the name of the sequence";
if (exists $hash{$q}) {
push (@{$hash{$q}}, $seq);
} else {
$hash{$q} = \@q;
$hash{$q} = [$seq];
next;
}
}
This obviously shouldn't be a very complicated problem but I'm new to perl and this kind of a problem feels difficult. I've googled this from various places but there seems to be something I just don't realize, and it might be really obvious, too.
You can use what perl calls autovivification to make this quite easy. Your code doesn't need that central if-statement. You can boil it down to:
push @{ $hash{$q} }, $seq;
If the particular key doesn't yet exist in the hash, perl will autovivify it, since it can infer that you wanted an array reference here.
You can find further resources on autovivification by Googling it. It's a unique enough word that the vast majority of the hits seem relevant. :-)
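A minimal sketch of that one-liner in action, with made-up quality values and sequence names standing in for the real input (the @records list exists only for this example):

```perl
use strict;
use warnings;

my %hash;

# Made-up (quality, sequence name) pairs for illustration
my @records = ( [30, 'seq1'], [20, 'seq2'], [30, 'seq3'] );

for my $record (@records) {
    my ($q, $seq) = @$record;
    # No exists check needed: the array reference is created on the first push
    push @{ $hash{$q} }, $seq;
}

# Sort the keys numerically so the output order is repeatable
for my $q (sort { $a <=> $b } keys %hash) {
    print "$q : ", join(', ', @{ $hash{$q} }), "\n";
}
```

Quality 30 ends up holding both seq1 and seq3, even though nothing ever created its array explicitly.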
You are actually pretty close, a few notes though:
In your else block you assign a reference to @q into your hash, then immediately overwrite it with [$seq]; only the last assignment will hold.
You don't need next at the end of your loop, it will automatically go to the next iteration if there are no more statements to execute in the loop body.
Everything else seems to work fine, here are my revisions and the test data I used (since I don't know anything about DNA sequences I just used letters I remember from high school Biology)
Input file:
A 1
T 2
G 3
A 3
A 2
G 5
C 1
C 1
C 2
T 4
Code:
use strict;
use warnings FATAL => 'all';
# open file for reading
open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";
my %hash;
while ( my $line = <$fh> ) { # read a line
    # split the line read from a file into a sequence name and value
    my ($q, $seq) = split(/\s+/, $line);
    if( exists $hash{$q} ) {
        push @{ $hash{$q} }, $seq;
    }
    else {
        $hash{$q} = [$seq];
    }
}
# print the resulting hash
for my $k ( keys %hash ) {
    print "$k : ", join(', ', @{$hash{$k}}), "\n";
}
# prints (key order may vary)
# A : 1, 3, 2
# T : 2, 4
# C : 1, 1, 2
# G : 3, 5

How to get a single column of emails from a html textarea into array

I was thinking I could do this on my own but I need some help.
I need to paste a list of email addresses from a local band's mailing list into a textarea and process them with my Perl script.
The emails are all in a single column, delimited by newlines:
email1@email.com
email2@email.com
email3@email.com
email4@email.com
email5@email.com
I would like to obviously get rid of any whitespace:
$emailgoodintheory =~ s/\s//ig;
and I am running them through basic validation:
if (Email::Valid->address($emailgoodintheory)) { #yada
I have tried all kinds of ways to get the list into an array.
my $toarray = CGI::param('toarray');
my @toarraya = split /\r?\n/, $toarray;
foreach my $address(@toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Above is just to test to see if I was successful. I have no need to print them.
It just loops through, grabs the schedules .txt file and sends each member the band schedule. All that other stuff works but I cannot get the textarea into an array!
So, as you can see, I am pretty lost.
Thank you sir(s), may I have another quick lesson?
You seem a bit new to Perl, so I will give you a thorough explanation why your code is bad and how you can improve it:
1 Naming conventions:
I see that this seems to be symbolic code, but $emailgoodintheory is far less readable than $emailGoodInTheory or $email_good_in_theory. Pick any scheme and stick to it, just don't write all lowercase.
I suppose that $emailgoodintheory holds a single email address. Then applying the regex s/\s//g or the transliteration tr/ \t\r\n//d will be enough; the /i flag is unnecessary because whitespace characters have no case.
Using a module to validate addresses is a very good idea. :-)
2 Perl Data Types
Perl has three main types of variables:
Scalars can hold strings, numbers or references. They are denoted by the $ sigil.
Arrays can hold an ordered sequence of Scalars. They are denoted by the @ sigil.
Hashes can hold an unordered set of Scalars. Some people know them as dictionaries. All keys and all values must be Scalars. Hashes are denoted by the % sigil.
A word on context: When getting a value/element from a hash/array, you have to change the sigil to the data type you want. Usually, we only recover one value (which always is a scalar), so you write $array[$i] or $hash{$key}. This does not follow any references so
my $arrayref = [1, 2, 3];
my @array = ($arrayref);
print @array[0];
will not print 123, but ARRAY(0xABCDEF) and give you a warning.
3 Loops in Perl:
Your loop syntax is very weird! You can use C-style loops:
for (my $i = 0; $i < @array; $i++)
where @array gives the length of the array, because we have a scalar context. You could also give $i the range of all possible indices in your array:
for my $i (0 .. $#array)
where .. is the range operator (in list context) and $#array gives the highest available index of our array. We can also use a foreach-loop:
foreach my $element (@array)
Note that in Perl, the keywords for and foreach are interchangeable.
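To compare the three loop forms side by side, here is a short sketch that walks the same made-up array three times. The result arrays @c_style, @by_index and @by_element exist only for this illustration.

```perl
use strict;
use warnings;

my @array = qw(a b c);   # made-up sample data
my (@c_style, @by_index, @by_element);

# C-style loop: @array in the numeric comparison yields its length
for (my $i = 0; $i < @array; $i++) {
    push @c_style, $array[$i];
}

# Range of indices: $#array is the highest index
for my $i (0 .. $#array) {
    push @by_index, $array[$i];
}

# foreach over the elements themselves, no index needed
foreach my $element (@array) {
    push @by_element, $element;
}

print "@c_style | @by_index | @by_element\n";
```

All three loops visit the same elements in the same order; the last form is usually the easiest to read when you don't need the index.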
4 What your loop does:
foreach my $address(@toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Here you put each element of @toarraya into the scalar $address. Then you try to use it as an array (wrong!) and get the index $arrcnt out of it. This does not work; I hope your program died.
You can use every loop type given above (you don't need to count manually), but the standard foreach loop will suit you best:
foreach my $address (@toarraya){
print "$address<br/>\n";
}
A note on quoting syntax: while qq~ quoted ~ is absolutely legal, this is the most obfuscated code I have seen today. The standard quote " would suffice, and when using qq, try to use some sort of parenthesis (({[<|) as delimiter.
5 complete code:
I assume you wanted to write this:
my @addressList = split /\r?\n/, CGI::param('toarray');
foreach my $address (@addressList) {
    # eliminate white spaces
    $address =~ s/\s//g;
    # Test for validity
    unless (Email::Valid->address($address)) {
        # complain, die, you decide
        # I recommend:
        print "<strong>Invalid address »$address«</strong><br/>";
        next;
    }
    print "$address<br/>\n";
    # send that email
}
And never forget to use strict; use warnings; and possibly use utf8.
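If you want to test the splitting and cleaning without a web form, a rough standalone sketch might look like this. The $textarea string is a made-up stand-in for CGI::param('toarray'), and the regex is only a crude placeholder for Email::Valid->address, used here so the sketch runs with core Perl alone; it is not a real address validator.

```perl
use strict;
use warnings;

# Made-up stand-in for CGI::param('toarray'), with mixed line endings
my $textarea = "email1\@email.com\r\n email2\@email.com \nnot an address\n";

my @addressList = split /\r?\n/, $textarea;
my (@good, @bad);

foreach my $address (@addressList) {
    # eliminate white spaces
    $address =~ s/\s//g;
    # ignore lines that were empty or only whitespace
    next if $address eq '';
    # Crude placeholder check; the real code would call Email::Valid->address($address)
    if ($address =~ /^[^\@]+\@[^\@]+\.[^\@]+$/) {
        push @good, $address;
    }
    else {
        push @bad, $address;
    }
}

print "good: @good\n";
print "bad: @bad\n";
```

The two well-formed addresses end up in @good, and the third line, with its spaces removed, ends up in @bad.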

Resources