Perl split array - arrays

I'm new in Perl, I want to write a simple program which reads an input-file and count the letters of this file, this is my code:
#!/usr/bin/perl
$textfile = "example.txt";
open(FILE, "< $textfile");
#array = split(//,<FILE>);
$counter = 0;
foreach(#array){
$counter = $counter + 1;
}
print "Letters: $counter";
this code shows me the number of letters, but only for the first paragraph of my Input-File, it doesn't work for more than one paragraph, can anyone help me, i don't know the problem =(
thank you

You only ever read one line.
You count bytes (for which you could use -s), not letters.
Fix:
my $count = 0;
while (<>) {
$count += () = /\pL/g;
}

You code is a rather over-complicated way of doing this:
#!/usr/bin/perl
# Always use these
use strict;
use warnings;
# Define variables with my
my $textfile = "example.txt";
# Lexical filehandle, three-argument open
# Check return from open, give sensible error
open(my $file, '<', $textfile) or die "Can't open $textfile: $!"
# No need for an array.
my $counter = length <$file>;
print "Letters: $counter";
But, as others have pointed out, you're counting bytes not characters. If your file is in ASCII or an 8-bit encoding, then you should be fine. Otherwise you should look at perluniintro.

Here's an alternative aproach using a module to do the work ..
# the following two lines enforce 'clean' code
use strict;
use warnings;
# load some help (read_file)
use File::Slurp;
# load the file into the variable $text
my $text = read_file('example.txt');
#get rid of multiple whitespace and linefeed chars # ****
# and replace them with a single space # ****
$text =~ s/\s+/ /; # ****
# length gives you the length of the 'string' / scalar variable
print length($text);
you might want to comment out the lines marked '****'
and play with the code...

Related

How to read a .txt file and store it into an array

I know this is a fairly simple question, but I cannot figure out how to store all of the values in my array the way I want to.
Here is a small portion what the .txt file looks like:
0 A R N D
A 2 -2 0 0
R -2 6 0 -1
N 0 0 2 2
D 0 -1 2 4
Each value is delimited by either two spaces - if the next value is positive - or a space and a '-' - if the next value is negative
Here is the code:
use strict;
use warnings;
open my $infile, '<', 'PAM250.txt' or die $!;
my $line;
my #array;
while($line = <$infile>)
{
$line =~ /^$/ and die "Blank line detected at $.\n";
$line =~ /^#/ and next; #skips the commented lines at the beginning
#array = $line;
print "#array"; #Prints the array after each line is read
};
print "\n\n#array"; #only prints the last line of the array ?
I understand that #array only holds the last line that was passed to it. Is there a way where I can get #array to hold all of the lines?
You are looking for push.
push #array, $line;
You undoubtedly want to precede this with chomp to snip any newlines, first.
If file is small as compared to available memory of your machine then you can simply use below method to read content of file in to an array
open my $infile, '<', 'PAM250.txt' or die $!;
my #array = <$infile>;
close $infile;
If you are going to read a very large file then it is better to read it line by line as you are doing but use PUSH to add each line at end of array.
push(#array,$line);
I will suggest you also read about some more array manipulating functions in perl
You're unclear to what you want to achieve.
Is every line an element of your array?
Is every line an array in your array and your "words" are the elements of this array?
Anyhow.
Here is how you can achieve both:
use strict;
use warnings;
use Data::Dumper;
# Read all lines into your array, after removing the \n
my #array= map { chomp; $_ } <>;
# show it
print Dumper \#array;
# Make each line an array so that you have an array of arrays
$_= [ split ] foreach #array;
# show it
print Dumper \#array;
try this...
sub room
{
my $result = "";
open(FILE, <$_[0]);
while (<FILE>) { $return .= $_; }
close(FILE);
return $result;
}
so you have a basic functionality without great words. the suggest before contains the risk to fail on large files. fastest safe way is that. call it as you like...
my #array = &room('/etc/passwd');
print room('/etc/passwd');
you can shorten, rename as your convinience believes.
to the kidding ducks nearby: by this way the the push was replaced by simplictiy. a text-file contains linebreaks. the traditional push removes the linebreak and pushing up just the line. the construction of an array is a simple string with linebreaks. now contain the steplength...

Assigning range to an array in Perl

I have some mini problem. How can I assign a range into an array, like this one:
input file: clktest.spf
*
.GLOBAL vcc! vss!
*
.SUBCKT eclk_l_25h brg2eclk<1> brg2eclk<0> brg_cs_sel brg_out brg_stop cdivx<1>
+ eclkout1<24> eclkout1<23> eclkout1<22> eclkout1<21> eclkout1<20> eclkout1<19>
+ mc1_brg_dyn mc1_brg_outen mc1_brg_stop mc1_div2<1> mc1_div2<0> mc1_div3p5<1>
+ mc1_div3p5<0> mc1_div_mux<3> mc1_div_mux<2> mc1_div_mux<1> mc1_div_mux<0>
+ mc1_gsrn_dis<0> pclkt6_0 pclkt6_1 pclkt7_0 pclkt7_1 slip<1> slip<0>
+ ulc_pclkgpll0<1> ulc_pclkgpll0<0> ulq_eclkcib<1> ulq_eclkcib<0>
*
*Net Section
*
*|GROUND_NET 0
*
*|NET eclkout3<48> 2.79056e-16
*|P (eclkout3<48> X 0 54.8100 -985.6950)
*|I (RXR0<16>#NEG RXR0<16> NEG X 0 54.2255 -985.6950)
C1 RXR0<16>#NEG 0 5.03477e-17
C2 eclkout3<48> 0 2.28708e-16
Rk_6_1 eclkout3<48> RXR0<16>#NEG 0.110947
output (this should be the saved value in the array)
.SUBCKT eclk_l_25h brg2eclk<1> brg2eclk<0> brg_cs_sel brg_out brg_stop cdivx<1>
+ eclkout1<24> eclkout1<23> eclkout1<22> eclkout1<21> eclkout1<20> eclkout1<19>
+ mc1_brg_dyn mc1_brg_outen mc1_brg_stop mc1_div2<1> mc1_div2<0> mc1_div3p5<1>
+ mc1_div3p5<0> mc1_div_mux<3> mc1_div_mux<2> mc1_div_mux<1> mc1_div_mux<0>
+ mc1_gsrn_dis<0> pclkt6_0 pclkt6_1 pclkt7_0 pclkt7_1 slip<1> slip<0>
+ ulc_pclkgpll0<1> ulc_pclkgpll0<0> ulq_eclkcib<1> ulq_eclkcib<0>
*
*Net Section
my simple code:
#!/usr/bin/perl
use strict;
use warnings;
my $input = "clktest.spf";
open INFILE, $input or die "Can't open $input" ;
my #allports;
while (<INFILE>){
#allports = /\.SUBCKT/ ... /\*Net Section/ ;
print #allports;
}
I am doing a correct job of assigning the selected range into an array? If not how can I modify this code?
Thanks for advance.
The while loop only gives you one line at a time so you can't assign all of the lines you want at once. Use push instead to grow the array line by line.
Also, you should be using lexical file handles like $in_fh (rather than global ones like INFILE) with the three-parameter form of open, and you should include the $! variable in the die string so that you know why the open failed.
This is how your program should look
#!/usr/bin/perl
use strict;
use warnings;
my $input = 'clktest.spf';
open my $in_fh, '<', $input or die "Can't open $input: $!" ;
my #allports;
while ( <$in_fh> ) {
push #allports, $_ if /\.SUBCKT/ ... /\*Net Section/;
}
print #allports;
Note that, if all you want to do is to print the selected lines from the file, you can forget about the array and replace push #allports, $_ with print
The <INFILE> inside a while will read the file line-by-line, so not a right place to apply a regex which need to cover more than one line. In order to get a substring, the simplest way is to first join all these lines. And only after that you apply your regex.
my $contents = "";
while ( <INFILE> ) {
$contents = $contents . $_;
}
$contents =~ s/.*(\.SUBCKT.*\*Net Section).*/$1/s; # remove unneeded part
Please note that there is /s modifier in the last part of substitution line. This is required because $contents contains newlines.
To get the substring into array, just use split my #allports = split("\n", $contents);

Perl text file grep

I would like to create an array in Perl of strings that I need to search/grep from a tab-deliminated text file. For example, I create the array:
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my #searchArray = ('10060\t', '10841\t', '11164\t');
I want to have a foreach loop to grep a text file with a format like this:
c18 10706 463029 K
c2 10841 91075 G
c36 11164 . B
c19 11257 41553 C
for each of the elements of the above array. In the end, I want to have a NEW text file that would look like this (continuing this example):
c2 10841 91075 G
c36 11164 . B
How do I go about doing this? Also, this needs to be able to work on a text file with ~5 million lines, so memory cannot be wasted (I do have 32GB of memory though).
Thanks for any help/advice in advanced! Cheers.
Using a perl one-liner. Just translate your list of numbers into a regex.
perl -ne 'print if /\b(?:10060|10841|11164)\b/' file.txt > newfile.txt
You can search for alternatives by using a regexp like /(10060\t|100841\t|11164\t)/. Since your array could be large, you could create this regexp, by something like
$searchRegex = '(' + join('|',#searchArray) + ')';
this is just a simple string, and so it would be better (faster) to compile it to a regexp by
$searchRegex = qr/$searchRegex/;
With only 5 million lines, you could actually pull the entire file into memory (less than a gigabyte if 100 chars/line), but otherwise, line by line you could search with this pattern as in
while (<>) {
print if $_ =~ $searchRegex
}
So I'm not the best coder but this should work.
#!/usr/bin/perl -w
use strict;
use warnings;
# array of search terms
my $searchfile = 'file.txt';
my $outfile = 'outfile.txt';
my #searchArray = ('10060', '10841', '11164');
my #findArray;
open(READ,'<',$searchfile) || die $!;
while (<READ>)
{
foreach my $searchArray (#searchArray) {
if (/$searchArray/) {
chomp ($_);
push (#findArray, $_) ;
}
}
}
close(READ);
### For Console Print
#foreach (#findArray){
# print $_."\n";
#}
open(WRITE,'>',$outfile) || die $!;
foreach (#findArray){
print WRITE $_."\n";
}
close(WRITE);

Popping keys of an array to calculate a total

I'm trying to simply pop off each numeric value and add them together to gain a total.
Input file:
Samsung 46
RIM 16
Apple 87
Microsoft 30
My code compiles, however, it only returns 0:
open (UNITS, 'units.txt') || die "Can't open it $!";
my #lines = <UNITS>;
my $total = 0;
while (<UNITS>) {
chomp;
my $line = pop #lines;
$line += $total;
}
print $total;
No need to slurp all lines into an array if you're just going to loop through them anyway with a while. Also, you need to split each line to get your numbers.
use warnings;
use strict;
open (UNITS, 'units.txt') || die "Can't open it $!";
my $total = 0;
while (<UNITS>) {
chomp;
my $num = (split)[1];
$total += $num;
}
print "$total\n";
__END__
179
There are three problems here
You are trying to add strings like 'Samsung 46' + 'RIM 16'
You read the entire file into #lines and then try to read more from the file in the while loop. That loop is never entered because you have already read to end of file
You are adding $total to the (undeclared) variable $line within the loop, instead of the other way around. So $total remains at zero and $line keeps having zero added to it
It is best to use while to read files unless you need something other than sequential access to the records, so removing #lines is a start.
It isn't completely clear which part of the records you want to accumulate. This program splits the lines on whitespace and adds together the last field of each line.
You must always use strict and use warnings at the start of every program. It is a measure that will make it far easier to locate bugs in your code. It is also best to use lexical file handles rather than the global one you used, and the three-parameter form of open.
use strict;
use warnings;
open my $units, '<', 'units.txt' or die "Can't open it: $!";
my $total;
while (<$units>) {
my #fields = split;
$total += $fields[-1];
}
print $total;
output
179
use strict;
use warnings;
open my $fh, "<", "units.txt" or die "well...";
my $total = 0;
while(<$fh>){
chomp;
my ($string, $num) = split(" ", $_);
$total += $num;
}
print $total;
This problem is a doddle with a one-liner:
$ perl -ane '$sum += $F[1] }{ print $sum' units.txt
Explanation
-a enables autosplit, each line is split and stored in #F
-n loops over the file line by line
-e tells perl that the next argument is to be treated as Perl code
the LHS of the Eskimo-kiss (that funny-looking }{ in the middle) is performed for every line in the input file, RHS performed only once
LHS accumulates the second column of every line in $sum
RHS prints the result of $sum once all lines have been processed

perl - cutting many strings with given array of numbers

dear my fellow perl masters in the world~!
I need your help.
I have a string file A and a number file B like this:
File A:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
...and so on till 200.
File B:
3, 6, 2, 5, 6, 1, ... 2
(total 200 numbers in an array)
then, with the numbers in file B, I would like to cut each string from the start position to the number of characters in File B.
E.g. as File B starts with 3, 6, 2 ...
File A will be
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
like this.
So. this is my code so far...
use strict;
if (#ARGV != 2) {
print "Invalid usage\n";
print "Usahe: perl program.pl [num_list] [string_file]\n";
exit(0);
}
my $numbers=$ARGV[0];
my $strings=$ARGV[1];
my $i;
open(LIST,$number);
open(DATA,$strings);
my #list = <LIST>;
my $list_size = scalar #sp_list;
for ($i=0;$i<=$list_size;$i++) {
print $i,"\n";
#while (my $line = <DATA>) {
}
close(LIST);
close(DATA);
As the strings and numbers are 200 I changed the array into a scalar value to work on every numbers of every strings.
I'm working on this. and I know I suppose to use a pos function but i do not know how to match each number with each string. is reading the string first by while? or using for to know how many time that I have to run this to achieve the result?
Your help will be much appreciated!
Thank you.
I will be working on it, too. Need your feedback.
It is good that you use strict, and you should also use warnings. Further things to note:
You should check the return value of open to make sure they did not fail. You should also use the three argument form of open and use a lexical file handle. Especially when handling command line arguments, which does pose a security risk.
open my $listfh, "<", $file or die $!;
You may wish to use a safety precaution
use ARGV::readonly;
You can easily make the list of numbers with a map statement. Assuming the numbers are in a comma separated list:
my #list = map split(/\s*,\s*/), <$listfh>;
This will split the input line(s) on comma and strip excess whitespace.
When reading your input file, you do not need to use a counter variable. You can simply do
open my $inputfh, "<", $file or die $!;
while (<$inputfh>) {
my $length = shift #list; # these are your numbers
chomp; # remove newline
my $string = substr($_, 0, -$length); # negative length on substr
print "$string\n";
}
The negative length on substr makes it leave that many characters off the end of the string.
Here is a one-liner in action that demonstrates these principles:
perl -lwe '$f = pop; # save file name for later
#nums = map split(/\s*,\s*/), <>; # process first file
push #ARGV, $f; # put back file name
while (<>) {
my $len = shift #nums;
chomp;
print substr($_,0,-$len);
}' fileb.txt filea.txt
Output:
AAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDDDDDDD
EEEEEEEEEEEEEEEEEEEEEEEEEE
Note the use of implicit open of file name arguments by manipulating #ARGV. Also handling newlines with -l switch.
Here is my suggestion. It does use autodie so that there is no need to explicitly check the status of open calls, and temporarily undefines $/ - the input record separator - so that all of the num_list file is read in one go. You aren't clear whether this file will always contain just single line, in which case you can omit local $/.
The numbers are extracted from the text using a regular expression /\d+/g returns all the strings of digits in the input as a list.
The second parameter to substr is the start position of the substring you want, and using a negative number counts from the end of the string instead of the beginning. The third parameter is the number of characters in the substring, and the fourth is a string to replace that substring in the target variable. So substr $data, -$n, $n, '' replaces the substring of length $n starting $n characters from the end with an empty string - i.e. it deletes it.
Note that if it is your intention to remove the given number of characters from the beginning of the string, then you would write substr $data, 0, $n, '' instead.
use strict;
use warnings;
use autodie;
unless (#ARGV == 2) {
print "Usage: perl program.pl [num_list] [string_file]\n";
exit;
}
my #numbers;
{
open my $listfh, '<', $ARGV[0];
local $/;
my $numbers = <$listfh>;
#numbers = $numbers =~ /\d+/g;
};
open my $datafh, '<', $ARGV[1];
for my $i (0 .. $#numbers) {
print "$i\n";
my $n = $numbers[$i];
my $data = <$datafh>;
chomp $data;
substr $data, -$n, $n, '';
print "$data\n";
}
Here is how I would do it. substr is the function to remove a part of a string. From your example, it is not clear whether you want to remove the characters at the beginning or at the end. Both alternatives are shown here:
#!/usr/bin/perl
use warnings;
use strict;
if (#ARGV != 2) {
die "Invalid usage\n"
. "Usage: perl program.pl [num_list] [string_file]\n";
}
my ($number_f, $string_f) = #ARGV;
open my $LIST, '<', $number_f or die "Cannot open $number_f: $!";
my #numbers = split /, */, <$LIST>;
close $LIST;
open my $DATA, '<', $string_f or die "Cannot open $string_f: $!";
while (my $string = <$DATA>) {
substr $string, 0, shift #numbers, q(); # Replace the first n characters with an empty string.
# To remove the trailing portion, replace the previous line with the following:
# my $n = shift #numbers;
# substr $string, -$n-1, $n, q();
print $string;
}
You were not checking the return value of open. Try to remember to always do that.
Do not declare variables far before you are going to use them ($i here).
Do not use C-style for loops if you do not have to. They are prone to fence post errors.
You can use substr():
use strict;
use warnings;
if (#ARGV != 2) {
print "Invalid usage\n";
print "Usage: perl program.pl [num_list] [string_file]\n";
exit(0);
}
my $numbers=$ARGV[0];
my $strings=$ARGV[1];
open my $list, '<', $numbers or die "Can't open $numbers: $!";
open my $data, '<', $strings or die "Can't open $strings: $!";
chomp(my $numlist = <$list>);
my #numbers = split /\s*,\s*/,$numlist;
for my $chop_length (#numbers)
{
my $data = <$data> // die "not enough data in $strings";
print substr($data,0,length($data)-$chop_length)."\n";
}
Your specs say you want "... to cut each string from the start position to the number of characters in File B." I agree with choroba that it's not perfectly clear whether characters from the start or the end of the string are to be cut. However, I tend to think that you want to remove characters from the beginning when you say, "... from the start position ...", but a string like ABCDEFGHIJKLMNOPQRSTUVWXYZ012345 would help clarify this issue.
This option is not as well self-documenting as the other solutions, but a discussion of it will follow:
use strict;
use warnings;
#ARGV == 2 or die "Usage: perl program.pl [num_list] [string_file]\n";
open my $fh, '<', pop or die "Cannot open string file: $!";
chomp( my #str = <$fh> );
local $/ = ', ';
while (<>) {
chomp;
print +( substr $str[ $. - 1 ], $_ ) . "\n";
}
Strings:
ABCDEFGHIJKLMNOPQRSTUVWXYZ012345
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
Numbers:
3, 6, 2, 5, 6
Output:
DEFGHIJKLMNOPQRSTUVWXYZ012345
BBBBBBBBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDDDDDDDDD
EEEEEEEEEEEEEEEEEEEEEEEEEE
The strings' file name is poped off #ARGV (since an explicit argument for pop is not used) and passed to open to read the strings into #str. The record separator is set to ', ' so chomp leaves only the number. The current line number in $. is used as part of the index to the corresponding #str element, and the remaining characters in the string from n on are printed.

Resources