Perl: RegEx: Storing variable lines in array - arrays

I'm developing a Perl script, and one of its functions is to detect multiple lines of data between two terminals and store them in an array. I need to store all the lines in an array, but grouped separately, with the 1st line in $1, the 2nd in $2 and so on. The problem is that the number of these lines is variable and will change with each new run.
my @statistics_of_layers_var;
for( <ALL_FILE> ) {
@statistics_of_layers_var = ($all_file =~ /(Statistics\s+Of\s+Layers)
(?:(\n|.)*)(Summary)/gm );
print @statistics_of_layers_var;
The given data should be
Statistics Of Layers
Line#1
Line#2
Line#3
...
Summary
How I could achieve it?

You can achieve this without a complicated regular expression. Simply use the range operator (also called flip-flop operator) to find the lines you want.
use strict;
use warnings;
use Data::Printer;
my @statistics_of_layers_var;
while (<DATA>) {
    # use the range-operator to find lines with start and end flag
    if (/^Statistics Of Layers/ .. /^Summary/) {
        # but do not keep the start and the end
        next if m/^Statistics Of Layers/ || m/^Summary/;
        # add the line to the collection
        push @statistics_of_layers_var, $_ ;
    }
}
p @statistics_of_layers_var;
__DATA__
Some Data
Statistics Of Layers
Line#1
Line#2
Line#3
...
Summary
Some more data
It works by looking at the current line and flipping the block on and off. Once /^Statistics Of Layers/ matches a line, the block runs for that line and for each following line until /^Summary/ matches a line. Because the start and end lines are included, we need to skip them when adding lines to the array.
This also works if your file contains multiple instances of this pattern. In that case you'd get the lines from all of them in the array.
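If the lines of each block should stay grouped separately, as the question asks, the return value of the range operator can help. In scalar context it returns a sequence number that starts at 1 on the start line and has "E0" appended on the end line. The following is only a sketch under those assumptions; the sample input and the name @blocks are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input containing two blocks
my @input = (
    "Some Data\n",
    "Statistics Of Layers\n",
    "Line#1\n",
    "Line#2\n",
    "Summary\n",
    "Some more data\n",
    "Statistics Of Layers\n",
    "Line#3\n",
    "Summary\n",
);

my @blocks;    # array of array references, one per block
for my $line (@input) {
    # sequence number of the flip-flop: false outside a block,
    # 1 on the start line, "<n>E0" on the end line
    my $seq = ($line =~ /^Statistics Of Layers/) .. ($line =~ /^Summary/);
    next unless $seq;
    if ($seq == 1) {           # start line: open a new group
        push @blocks, [];
        next;
    }
    next if $seq =~ /E0$/;     # end line: skip it
    chomp(my $copy = $line);
    push @{ $blocks[-1] }, $copy;
}

print scalar(@blocks), " blocks\n";
print join(",", @$_), "\n" for @blocks;
```

Each element of @blocks then holds the lines of one block, so the number of lines per block can vary freely between runs.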

Maybe you can try this :
push @statistics_of_layers_var ,[$a] = ($slurp =~ /(Statistics\s+Of\s+Layers)
(?:(\n|.)*)(Summary)/gm );

Related

Group similar element of array together to use in foreach at once in perl

I have an array in which some elements are similar under a certain condition: if we delete the "n" and "p" from the array elements, the similar elements can be recognised. I want to use these similar elements together in one iteration of a foreach statement. The array is shown below:
my @array = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
The array elements will not always be in this order.
I am editing this question again; sorry if I did not explain it properly. I have to print a string to a file multiple times, filling in the variables from the array above. The code below is not correct in any sense; I am only using it to illustrate the question.
open (FILE, ">" , "test.v");
foreach my $xy (@array){
print FILE "DUF A1 (.pin1($1), .pin2($2));" ; # $1 and $2 are only placeholders to explain the idea
} # I just want to print abc_n and abc_p in one iteration of the foreach loop, followed by the other pairs in successive iterations
close (FILE);
The result i want to print is as follows:
DUF A1 ( .pin1(abc_n), .pin2(abc_p));
DUF A1 ( .pin1(gg_n), .pin2(gg_p));
DUF A1 ( .pin1(munday_n_xy), .pin2(munday_p_xy));
DUF A1 ( .pin1(soc_n), .pin2(soc_p));
The scripting language used is Perl. Your help is really appreciated.
Thank You.!!
Partitioning a data set depends entirely on how data are "similiar under certain conditions."
The condition given is that with removal of _n and _p the "similar" elements become equal (I assume the underscore is part of it; the OP says only "n and p"). In such a case one can do
use warnings;
use strict;
use feature 'say';
my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);
my %part;
for my $elem (@data) {
    push @{ $part{ $elem =~ s/_(?:n|p)//r } }, $elem;
}
say "$_ => @{$part{$_}}" for keys %part;
The grouped "similar" strings are printed as a demo since I don't understand the logic of the shown output. Please build your output strings as desired.
If this is it and there'll be no more input to process later in code, nor will there be a need to refer to those common factors, then you may want the groups in an array
my @groups = values %part;
If needed throw in a suitable sorting when writing the array, sort { ... } values %part.
For more fluid and less determined "similarity" try "fuzzy matching;" here is one example.
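For the output shown in the question, the grouped data could be turned into the DUF lines roughly as follows. This is a sketch that assumes every group has exactly one _n and one _p element, and that sorting the keys gives an acceptable order; the names @out and $key are illustrative only.

```perl
use strict;
use warnings;

my @data = qw(abc_n abc_p gg_n gg_p munday_n_xy munday_p_xy soc_n soc_p);

# Group elements by their name with the _n / _p marker removed
my %part;
for my $elem (@data) {
    (my $key = $elem) =~ s/_[np](?=_|$)//;
    push @{ $part{$key} }, $elem;
}

# Build one output line per group; plain string sort puts _n before _p
my @out;
for my $key (sort keys %part) {
    my ($n, $p) = sort @{ $part{$key} };
    push @out, "DUF A1 ( .pin1($n), .pin2($p));";
}

print "$_\n" for @out;
```

To write to test.v as in the question, print the lines to a file handle opened on that file instead of to standard output.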

Finding motifs and position of motif in FASTA file - Perl

Can someone help me with this Perl code? When I run it, nothing happens. No errors or anything which is weird to me. It reads in and opens the file just fine. I believe the problem is in the while loop or the foreach loop cause I honestly don't think I understand them. I'm very new at this with a pretty shit teacher.
Instructions: Declare a scalar variable called motif and make it AAA. Declare an array variable called locations, which is where the locations of the motif will be stored. Place the gene in a scalar variable. Now search for that motif in the amborella gene. The code should print the position of the motif and the motif found. You will need to write a while loop that searches for the motif and includes push, pos, and –length commands in order to save and report locations. Then you will need a foreach loop to print the locations and the motif. (If it only reports locations in the first line of the gene, remember that is because the gene is in a scalar variable that will only read the first line. That is acceptable.
My code so far:
#!/usr/bin/perl
use warnings;
use strict;
#Declare a scalar variable called motif and make it AAA.
my$motif="AAA";
#Declare an array variable called locations, which is where the
#locations of the motif will be stored.
my@locations=();
my$foundMotif="";
my$position=();
#Place the gene in a scalar variable.
my$geneFileName = 'amborella.txt';
open(GENEFILE, $geneFileName) or die "Can't read file!";
my$gene = <GENEFILE>;
#Now search for that motif in the amborella gene.
#The code should print the position of the motif and the motif
#found. You will need to write a while loop that searches for the
#motif and includes push, pos, and –length commands in order to
#save and report locations.
while($foundMotif =~ m/AAA/g) {
    $position=(pos($foundMotif)-3);
    push (@locations, $position);
}
#Then you will need a foreach loop to print the locations and the motif.
foreach $position (@locations){
    print "\n Found motif: ", $motif, "\n at position: ", $position;
}
#close the file
close GENEFILE;
exit;
Your program is fine, it's a simple mix-up.
You are matching against an empty string.
while($foundMotif =~ m/AAA/g) {
    $position = (pos($foundMotif)-3);
    push (@locations, $position);
}
You're looking for AAA in $foundMotif. But that's an empty string because you just declared it further up. Your gene string (disclaimer: I know nothing about bio informatics) is $gene. That's what you need to match.
Let's go through it step by step. I've simplified your code and put in an example string. I'm aware that isn't what genes look like, but that doesn't matter. This is already fixed.
use strict;
use warnings;
my $motif = "AAA";
my @locations = ();
# ... skip reading the file
my $gene = "ABAABAAABAAAAB\n";
while ($gene =~ m/$motif/g) { # 1, 2
    my $position = (pos($gene) - length($motif)); # 3, 4
    push(@locations, $position);
}
# $position must be declared here too, because the one above is lexical to the while block
foreach my $position (@locations) {
    print "\n Found motif: ", $motif, "\n at position: ", $position;
}
If you run this, the code now produces meaningful output.
Found motif: AAA
at position: 5
Found motif: AAA
at position: 9
I've made four changes:
1. You need to search in $gene.
2. Your variable $motif is meaningless if you don't use it to search. That way, your program becomes dynamic.
3. Again, you need to use the pos() in $gene.
4. To make it dynamic, you shouldn't hard-code the length.
You don't need the $foundMotif variable at all. The $position can actually be lexical to the block it's in. That means, it will be a different variable each time the loop is run, which is simply good practice. In Perl, you want to always use the smallest scope possible for variables, and declare them only when you need them, not in advance.
Since this is a learning exercise, it makes sense to iterate the array separately. In a real life program, you could eliminate the foreach loop and the array and output the positions directly if you were not to use them later on.
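A version without the array and the foreach loop might look like the sketch below. It uses $-[0], the start offset of the last successful match, instead of pos() minus the length; that is a swapped-in alternative, not what the assignment asks for. The \Q...\E quoting and the $count variable are additions for illustration. Like the loop above, it only finds non-overlapping matches.

```perl
use strict;
use warnings;

my $motif = "AAA";
my $gene  = "ABAABAAABAAAAB";    # example string from above, not real sequence data

my $count = 0;
my $last_start;
while ($gene =~ m/\Q$motif\E/g) {
    # $-[0] holds the offset where the last successful match started
    my $start = $-[0];
    print "Found motif: $motif at position: $start\n";
    $last_start = $start;
    $count++;
}
print "Total matches: $count\n";
```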

Adding regex to an array in Perl

I've just started using Perl for some scripting I'm doing, having never used it before, and I'm having some trouble getting some values into an array and calculating the total.
I have a log file that I want to parse, using a regex to pick up when certain values appear. I want these values added to an array, and then the total calculated at the end.
The file I'm trying to parse looks like
...completed_pop_count: 0
...uncompleted: 0
CALL NEXT
...completed_pop_count: 2
...uncompleted: 0
CALL NEXT
...completed_pop_count: 2
...uncompleted: 3
CALL NEXT
....and carries on
This is what i have so far:
open (my $file, 'test.log');
while (<$file>){
    my @array = /.*completed_pop_count: (.*)$/;
    print @array;
}
close($file);
The output to this is like
022.....
To me this looks like all the values are in a single element of the array. However, I need them to be in separate elements so that I can calculate the total sum.
If you want to add elements to an array, use push @arr, "element".
use List::Util qw(sum);
my @array;
while (<$file>){
    push @array, $1 if /.*completed_pop_count: (.*)$/;
}
print "@array\n";
print sum(@array), "\n";
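A self-contained variant that can be run without test.log is sketched below. It reads the sample lines from the question through an in-memory file handle, and it captures only digits with (\d+) rather than (.*); both choices are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sample log content copied from the question
my $log = <<'END';
...completed_pop_count: 0
...uncompleted: 0
CALL NEXT
...completed_pop_count: 2
...uncompleted: 0
CALL NEXT
...completed_pop_count: 2
...uncompleted: 3
CALL NEXT
END

# Open the string as if it were a file
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @counts;
while (my $line = <$fh>) {
    # one array element per matching line
    push @counts, $1 if $line =~ /completed_pop_count:\s*(\d+)/;
}
close $fh;

# sum() returns undef for an empty list, so fall back to 0
my $total = sum(@counts) // 0;
print "@counts\n";
print "$total\n";
```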

How to get a single column of emails from a html textarea into array

I was thinking I could do this on my own but I need some help.
I need to paste a list of email addresses from a local band's mailing list into a textarea and process them with my Perl script.
The emails are all in a single column; delimited by newlines:
email1@email.com
email2@email.com
email3@email.com
email4@email.com
email5@email.com
I would like to obviously get rid of any whitespace:
$emailgoodintheory =~ s/\s//ig;
and I am running them through basic validation:
if (Email::Valid->address($emailgoodintheory)) { #yada
I have tried all kinds of ways to get the list into an array.
my $toarray = CGI::param('toarray');
my @toarraya = split /\r?\n/, $toarray;
foreach my $address(@toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Above is just to test to see if I was successful. I have no need to print them.
It just loops through, grabs the schedules .txt file and sends each member the band schedule. All that other stuff works but I cannot get the textarea into an array!
So, as you can see, I am pretty lost.
Thank you sir(s), may I have another quick lesson?
You seem a bit new to Perl, so I will give you a thorough explanation why your code is bad and how you can improve it:
1 Naming conventions:
I see that this seems to be symbolic code, but $emailgoodintheory is far less readable than $emailGoodInTheory or $email_good_in_theory. Pick any scheme and stick to it, just don't write all lowercase.
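I suppose that $emailgoodintheory holds a single email address. Then applying the regex s/\s//g is enough; the /i flag is unnecessary because whitespace characters have no case. The transliteration tr/ \t\r\n//d also works, but note that tr does not understand \s and needs the /d flag to delete characters.
Using a module to validate addresses is a very good idea. :-)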
2 Perl Data Types
Perl has three main types of variables:
Scalars can hold strings, numbers or references. They are denoted by the $ sigil.
Arrays can hold an ordered sequence of Scalars. They are denoted by the @ sigil.
Hashes can hold an unordered set of Scalars. Some people know them as dictionaries. All keys and all values must be Scalars. Hashes are denoted by the % sigil.
A word on context: When getting a value/element from a hash/array, you have to change the sigil to the data type you want. Usually, we only recover one value (which always is a scalar), so you write $array[$i] or $hash{$key}. This does not follow any references so
my $arrayref = [1, 2, 3];
my @array = ($arrayref);
print @array[0];
will not print 123, but ARRAY(0xABCDEF) and give you a warning.
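To actually get at the referenced values, the reference has to be dereferenced explicitly. A small sketch, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $arrayref = [1, 2, 3];

my @wrapped = ($arrayref);     # one element: the reference itself
my @copy    = @{$arrayref};    # three elements: the referenced values

print scalar(@wrapped), "\n";  # number of elements in @wrapped
print scalar(@copy), "\n";     # number of elements in @copy

# Single elements: index the copy, or use the arrow on the reference
print "$copy[0] $arrayref->[1]\n";
```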
3 Loops in Perl:
Your loop syntax is very weird! You can use C-style loops:
for (my $i = 0; $i < @array; $i++)
where @array gives the length of the array, because we have a scalar context. You could also give $i the range of all possible indices in your array:
for my $i (0 .. $#array)
where .. is the range operator (in list context) and $#array gives the highest available index of our array. We can also use a foreach-loop:
foreach my $element (@array)
Note that in Perl, the keywords for and foreach are interchangeable.
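The three loop forms visit the same elements in the same order, which the following sketch demonstrates on a made-up three-element array:

```perl
use strict;
use warnings;

my @array = qw(a b c);
my (@c_style, @by_index, @by_element);

# C-style loop: @array in the comparison is evaluated in scalar context
for (my $i = 0; $i < @array; $i++) {
    push @c_style, $array[$i];
}

# Range of indices from 0 to the highest index
for my $i (0 .. $#array) {
    push @by_index, $array[$i];
}

# Iterate over the elements directly
foreach my $element (@array) {
    push @by_element, $element;
}

print "@c_style | @by_index | @by_element\n";
```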
4 What your loop does:
foreach my $address(@toarraya) {
print qq~ $address[$arrcnt]<br /> ~:
$arrcnt++;
}
Here you put each element of @toarraya into the scalar $address. Then you try to use it as an array (wrong!) and get the index $arrcnt out of it. This does not work; I hope your program died.
You can use every loop type given above (you don't need to count manually), but the standard foreach loop will suit you best:
foreach my $address (@toarraya){
    print "$address<br/>\n";
}
A note on quoting syntax: while qq~ quoted ~ is absolutely legal, this is the most obfuscated code I have seen today. The standard quote " would suffice, and when using qq, try to use some sort of parenthesis (({[<|) as delimiter.
5 complete code:
I assume you wanted to write this:
my @addressList = split /\r?\n/, CGI::param('toarray');
foreach my $address (@addressList) {
# eliminate white spaces
$address =~ s/\s//g;
# Test for validity
unless (Email::Valid->address($address)) {
# complain, die, you decide
# I recommend:
print "<strong>Invalid address »$address«</strong><br/>";
next;
}
print "$address<br/>\n";
# send that email
}
And never forget to use strict; use warnings; and possibly use utf8.
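The splitting and cleaning steps can be tried without a web server. The sketch below replaces CGI::param with a hard-coded string and replaces Email::Valid with a deliberately crude pattern check; both are stand-ins for the demonstration only, and the pattern is no substitute for real validation.

```perl
use strict;
use warnings;

# Hypothetical textarea content with mixed line endings and stray spaces
my $textarea = "email1\@email.com\r\n email2\@email.com \nnot-an-address\n\nemail3\@email.com\n";

my (@good, @bad);
for my $address (split /\r?\n/, $textarea) {
    $address =~ s/\s+//g;          # eliminate white space
    next if $address eq '';        # skip blank lines

    # Crude stand-in for Email::Valid->address($address)
    if ($address =~ /^[^\@\s]+\@[^\@\s]+\.[^\@\s]+$/) {
        push @good, $address;
    }
    else {
        push @bad, $address;
    }
}

print "good: @good\n";
print "bad: @bad\n";
```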

Reading and Writing text to a NEW file - Matlab

I have a file that contains a full set of values for some sentences which I have transcribed for a speech recognition program. I've been trying to write some MATLAB code to go through this file, extract the values for each sentence and write them to a new individual file. So instead of having them all in one 'mlf' file I want them in separate files for each sentence.
For example my 'mlf' file (contains all values for all sentences) looks like this:
#!MLF!#
"/N001.lab"
AH
SEE
I
GOT
THEM
MONTHS
AGO
.
"/N002.lab"
WELL
WORK
FOR
LIVE
WIRE
BUT
ERM
.
"/N003.lab"
IM
GOING
TO
SEE
JAMES
VINCENT
MCMORROW
.
etc
So each sentence is delimited by the 'Nxxx.lab' and the '.'. I need to create a new file for every Nxxx.lab; for example the file for N001 would just contain:
AH
SEE
I
GOT
THEM
MONTHS
AGO
I've been trying to use fgetline to specify the 'Nxxx.lab' and '.' boundaries, but it doesn't work as i don't know how to write the content into a new file separate from the 'mlf'.
If anyone can give me any guidance of what sort of approach to use would be greatly appreciated!
Cheers!
Try this code (input file test.mlf has to be in the working directory):
%# read the file
filename = 'test.mlf';
fid = fopen(filename,'r');
lines = textscan(fid,'%s','Delimiter','\n','HeaderLines',1);
lines = lines{1};
fclose(fid);
%# find start and stop indices
istart = find(cellfun(@(x) strcmp(x(1),'"'), lines));
istop = find(strcmp(lines, '.'));
assert(numel(istart)==numel(istop) && all(istop>istart),'Check the input file format.')
%# write lines to new files
for k = 1:numel(istart)
filenew = lines{istart(k)}(2:end-1);
fout = fopen(filenew,'wt');
for l = (istart(k)+1):(istop(k)-1)
fprintf(fout,'%s\n',lines{l});
end
fclose(fout);
end
The code assumes that the file names are in double quotes as in your example. If not, you can find the istart indices based on a pattern. Or just assume that the entries for a new file start from the 2nd line and follow each dot: istart = [1; istop(1:end-1)+1];
You could use a growing cell array to gather the information.
Read one line at a time from the file.
Grab the file name and put it into the first column if it's the first read for the sentence.
If the line read is a period, add it to the string and move the index to the next row in the array. Write the new file with the content.
This bit of code should help you in building the cell array while appending a string within it. I assume reading line by line is not a problem. You can also retain the carriage returns/new lines within the string ('\n').
%% Declare A
A = {}
%% Fill row 1
A(1,1) = {'file1'}
A(1,2) = {'Sentence 1'}
A(1,2) = { strcat(A{1,2}, ', has been appended')}
%% Fill row 2
A(2,1) = {'file2'}
A(2,2) = {'Sentence 2'}
While I'm sure you can do this with MATLAB, I would suggest you use Perl to split the original file and then process the individual files using MATLAB.
The following Perl script reads the entire file ("xxx.txt") and writes out the individual files according to the "NAME.lab" lines:
open(my $fh, "<", "xxx.txt");
# read the entire file into $contents
# This may not be a good idea if the file is huge.
my $contents = do { local $/; <$fh> };
# iterate over the $contents string and extract the individual
# files
while($contents =~ /"(.*)"\n((.*\n)*?)\./mg) {
# We arrive here with $1 holding the filename
# and $2 the content up to the "." ending the section/sentence.
open(my $fout, ">", $1);
print $fout $2;
close($fout);
}
close($fh);
The multiline regular expression is a bit difficult to read, but it does the job.
For this sort of text manipulation, Perl is fast and very useful. It is a good tool to learn if you process a lot of text.
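If the multiline regular expression is hard to follow, a line-by-line approach is an alternative. The sketch below collects the words per sentence in a hash instead of writing files, so that it can be run as-is; the shortened sample data, the stripping of the leading slash and the names %sentence and $current are assumptions for illustration.

```perl
use strict;
use warnings;

# Shortened sample in the format shown in the question
my $mlf = <<'END';
#!MLF!#
"/N001.lab"
AH
SEE
.
"/N002.lab"
WELL
WORK
.
END

my %sentence;    # file name => array reference of words
my $current;     # name of the section being read, if any

for my $line (split /\n/, $mlf) {
    if ($line =~ /^"\/?(.+)"$/) {      # quoted name starts a section
        $current = $1;
        $sentence{$current} = [];
    }
    elsif ($line eq '.') {             # a lone dot ends the section
        undef $current;
    }
    elsif (defined $current) {         # any other line inside a section
        push @{ $sentence{$current} }, $line;
    }
}

for my $name (sort keys %sentence) {
    print $name, ": ", join(" ", @{ $sentence{$name} }), "\n";
}
```

Writing each entry of %sentence to its own file is then one open, print and close per key.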
