How to split a string into an array using Perl?

I have a string:
$string = "123456-9876";
I need to split it into an array as follows:
$string = [12,34,56,98,76]
Splitting it with split('-', $string) does not serve the purpose.
How can I do that in Perl?

Extract pairs of digits (e.g. "1234-5678" ⇒ [12,34,56,78]):
$string = [ $string =~ /\d\d/g ];
Extract pairs of digits, even if separated by non-digits (e.g. "1234-567-8" ⇒ [12,34,56,78]); the s///r modifier requires Perl 5.14 or later:
$string = [ $string =~ s/\D//rg =~ /../sg ];
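Both one-liners rely on a match with /g in list context, which returns every match. Below is a small runnable sketch of the two variants, wrapped in subroutines so they can be checked. The names digit_pairs and digit_pairs_ignoring_separators are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pairs of adjacent digits; a non-digit breaks the pairing at that point.
sub digit_pairs {
    my ($s) = @_;
    return [ $s =~ /\d\d/g ];    # //g in list context returns every match
}

# Strip every non-digit first (s///r returns a modified copy), then pair up.
sub digit_pairs_ignoring_separators {
    my ($s) = @_;
    return [ $s =~ s/\D//rg =~ /../sg ];
}

my $string = "123456-9876";
print "@{ digit_pairs($string) }\n";
print "@{ digit_pairs_ignoring_separators('1234-567-8') }\n";
```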

Rather than splitting, you can capture all two-digit groups with a global match:
my $str = "123456-9876";
my @matches = $str =~ /\d{2}/g;
print "@matches\n";
Prints:
12 34 56 98 76
Another solution groups digits in twos wherever non-digits appear in the string, without modifying the original string:
my $string = "1dd23-dsd--456-9-876";
while ($string =~ /(\d).*?(\d)/g) {
print "$1$2 ";
}
Prints:
12 34 56 98 76
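The same while loop can collect the pairs into an array instead of printing them. In scalar context, //g resumes each match where the previous one ended. Here is a minimal sketch; the subroutine name pairs_any_gap is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect two-digit groups, skipping any non-digits between the digits.
# The caller's string is left unchanged (the sub works on a copy).
sub pairs_any_gap {
    my ($s) = @_;
    my @pairs;
    # Each match takes a digit, lazily skips anything, then takes the next digit.
    while ($s =~ /(\d).*?(\d)/g) {
        push @pairs, "$1$2";
    }
    return @pairs;
}

my $string = "1dd23-dsd--456-9-876";
my @pairs  = pairs_any_gap($string);
print join(' ', @pairs), "\n";
```

A trailing digit with no partner is simply dropped, because the pattern needs two digits to match.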

Related

Perl: output successive strings from an array

I have an array I can print out as "abcd", but I am trying to print it as "a>ab>abc>abcd". I can't figure out the nested loop I need within my foreach loop. What loop do I need inside it to print it this way?
my $str = "a>b>c>d";
my @words = split />/, $str;
foreach my $i (0 .. $#words) {
print $words[$i], "\n";
}
Thank you.
You had the right idea, but instead of printing the word at position i, you want to print all the words between positions 0 and i (inclusive). Also, your input can contain multiple strings, so loop over them.
use strict;
use warnings;
while (my $str = <>) { # read lines from stdin or named files
chomp($str); # remove any trailing line separator
my @words = split />/, $str; # break string into array of words
foreach my $i (0 .. $#words) {
print join '', @words[0 .. $i]; # build the term from the first $i + 1 words
print '>' if $i < $#words; # print separator between terms (but not at end)
}
print "\n";
}
There are many other ways to write it, but hopefully this way helps you understand what's happening and why. Good luck!
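Another way to write it skips the separator bookkeeping: build all the terms first, then join them once with '>'. This is a minimal sketch; the subroutine name cumulative_terms is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# For each index $i, join words 0..$i into one term; then join the terms with '>'.
sub cumulative_terms {
    my ($str) = @_;
    my @words = split />/, $str;
    my @terms = map { join('', @words[0 .. $_]) } 0 .. $#words;
    return join '>', @terms;
}

print cumulative_terms("a>b>c>d"), "\n";
```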
A one-liner:
perl -e '@a=qw(a b c d); for(@a) {$s.=($h.=$_).">"} $s=substr($s,0,-1);print $s'
I would do it like this:
#!/usr/bin/perl
use strict;
use warnings;
my $str = "a>b>c>d>e>f>g";
my @words = split />/, $str;
$" = '';
my @new_words;
push @new_words, "@words[0 .. $_]" for 0 .. $#words;
print join '>', @new_words;
A few things to explain.
Perl will expand array variables in a double-quoted string. So something like this:
@array = ('x', 'y', 'z');
print "@array";
will print x y z. Notice there are spaces between the elements. The string inserted between the elements is controlled by the $" variable, so by setting that variable to an empty string we can remove the spaces:
$" = '';
@array = ('x', 'y', 'z');
print "@array";
will print xyz.
The most complex line is:
push #new_words, "#words[0 .. $_]" for 0 .. $#words;
That's just a compact way to write:
for (0 .. $#words) {
my $new_word = "#words[0 .. $_]";
push #new_words, $new_word;
}
We iterate across the integers from zero to the last index in #words. Each time around the loop, we use an array slice to get a list of elements from the array, convert that to a string (by putting it in double-quotes) and then push that string onto #new_words.
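Assigning to $" directly changes it for the rest of the program, so every later array interpolation also loses its spaces. The sketch below keeps the same logic and only narrows the scope, using local inside a bare block.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = split />/, "a>b>c>d>e>f>g";
my @new_words;
{
    # local restores the default separator (a single space) when this
    # block exits, so interpolation elsewhere is unaffected.
    local $" = '';
    push @new_words, "@words[0 .. $_]" for 0 .. $#words;
}
my $result = join '>', @new_words;
print "$result\n";
print "@words\n";    # separator is back to a single space here
```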
This is what I ended up with. It's the only way I could understand that gets the output I was looking for.
use strict;
use warnings;
my $str = "a>b>c>d>e>f>g";
my @words = split />/, $str;
my $j = $#words;
my $i = 0;
my @new;
while($i <= $#words){
foreach my $k (0 .. $#words - $j){
push (@new, $words[$k]);
}
if($i < $#words){
push(@new, ">");
}
$j--;
$i++;
}
print @new;
This outputs "a>ab>abc>abcd>abcde>abcdef>abcdefg".

Loop through elements of an array to find a character in Perl

I have a perl array where I only want to loop through elements 2-8.
The elements are only meant to contain numbers, so if any of those elements contains a letter, I want to set an error flag to 1, as well as some other variables, as shown below.
The reason I have 2 error flag variables is due to scope rules within the loop.
@fields is an array I created by splitting each element of another array (not relevant here) on " ".
So, when I try to print error_line2 and error_fname2 from outside the loop, I get this:
Use of uninitialized value $error_flag2 in numeric eq (==)
I don't know why, because I've initialized the value within the loop and declared the variable outside the loop.
I'm not sure I'm even looping to find characters correctly, so maybe it's not setting error_flag2 = 1.
Example line:
bob hankerman 2039 3232 23 232 645 64x3 324
Since element 7 has the letter 'x', I want the flag to be set to 1.
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my $players_file = $ARGV[0];
my @players_array;
open (my $file, "<", "$players_file")
or die "Failed to open file: $!\n";
while(<$file>) {
chomp;
push @players_array, $_;
}
close $file;
#print join "\n", @players_array;
my $num_of_players = @players_array;
my $error_flag;
my $error_line;
my $error_fname;
my $error_lname;
my $error_flag2=1;
my $error_line2;
my $error_fname2;
my $error_lname2;
my $i;
foreach my $player(@players_array){
my @fields = split " ", $player;
my $size2 = #fields;
for($i=2; $i<9; $i++){
print "$fields[$i] \n";
if (grep $_ =~ /^[a-zA-Z]+$/){
my $errorflag2 = 1;
$error_flag2 = $errorflag2;
my $errorline2 = $player +1;
$error_line2 = $errorline2;
my $errorfname2 = $fields[0];
$error_fname2 = $errorfname2;
}
}
if ($size2 == "9" ) {
my $firstname = $fields[0];
my $lastname = $fields[1];
my $batting_average = ($fields[4]+$fields[5]+$fields[6]+$fields[7]) / $fields[3];
my $slugging = ($fields[4]+($fields[5]*2)+($fields[6]*3)+($fields[7]*4)) / $fields[3];
my $on_base_percent = ($fields[4]+$fields[5]+$fields[6]+$fields[7] +$fields[8]) / $fields[2];
print "$firstname ";
print "$lastname ";
print "$batting_average ";
print "$slugging ";
print "$on_base_percent\n ";
}
else {
my $errorflag = 1;
$error_flag = $errorflag;
my $errorline = $player +1;
$error_line = $errorline;
my $errorfname = $fields[0];
$error_fname = $errorfname;
my $errorlname = $fields[1];
$error_lname = $errorlname;
}
}
if ($error_flag == "1"){
print "\n Line $error_line : ";
print "$error_fname, ";
print "$error_lname :";
print "Line contains not enough data.\n";
}
if ($error_flag2 == "1"){
print "\n Line $error_line2 : ";
print "$error_fname2, ";
print "Line contains bad data.\n";
}
OK, so the problem you've got here is that you're thinking of grep in Unix terms, as a text-based tool. It doesn't work like that in Perl: it operates on a list.
Fortunately, this is pretty easy to handle in your case, because you can split your line into words.
Without your source data, this is hopefully a proof of concept:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
while ( <DATA> ) {
#split the current line on whitespace into an array.
#first two elements get assigned to firstname lastname, and then the rest
#goes into @values
my ( $firstname, $lastname, @values ) = split; #works on $_ implicitly.
#check every element in @values, and test the regex 'non-digit' against it.
my @errors = grep { /\D/ } @values;
#output any matches e.g. things that contained 'non-digits' anywhere.
print Dumper \@errors;
#an array in a scalar context evaluates as the number of elements.
#we need to use "scalar" here because print accepts list arguments.
print "There were ", scalar @errors, " errors\n";
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
Or, reducing the logic further:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
#note - we don't need to explicitly specify 'scalar' here,
#because assigning it to a scalar does that automatically.
#(split) splits the current line, and [2..8] skips the first two.
my $count_of_errors = grep { /\D/ } (split)[2..8];
print "$count_of_errors\n";
}
__DATA__
bob hankerman 2039 3232 23 232 645 64x3 324
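To tie this back to the error flag in the question, the grep can live in a small subroutine that returns the offending fields. This is a sketch under two assumptions: records are whitespace-separated as in the sample line, and the second input line is invented for illustration. The name bad_fields is hypothetical. Note that /\D/ would also reject decimals or negative numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the fields at positions 2..8 that contain any non-digit.
sub bad_fields {
    my ($line) = @_;
    my @fields = split ' ', $line;
    # A slice past the end of @fields yields undef, so skip those entries.
    return grep { defined $_ && /\D/ } @fields[2 .. 8];
}

# The first line is the sample from the question; the second is made up.
my @lines = (
    "bob hankerman 2039 3232 23 232 645 64x3 324",
    "ann smith 10 20 30 40 50 60 70",
);

my $error_flag = 0;
for my $n (0 .. $#lines) {
    my @bad = bad_fields($lines[$n]);
    next unless @bad;    # an empty array is false in boolean context
    $error_flag = 1;
    printf "Line %d: bad data (%s)\n", $n + 1, join(', ', @bad);
}
```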
First: you don't need grep. You can simply match the string with =~ and print the matched value with $&.
Second: use $_ only when no other loop variable is in play. Your loop already uses $i, so you can write it (for elements 2 to 8) as:
for my $i (2..8) {
print "$i\n";
}
or
foreach (2..8) {
print "$_\n";
}
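Putting both points together, here is a sketch that loops over indices 2 to 8 with a named loop variable and uses =~ directly. It uses a capture group ($1) instead of $& to report the offending letter. It assumes the question's sample line and stops at the first bad field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @fields = split ' ', "bob hankerman 2039 3232 23 232 645 64x3 324";

my $error_flag = 0;
my ($bad_index, $bad_char);
for my $i (2 .. 8) {
    # The capture group puts the matched letter in $1.
    if ($fields[$i] =~ /([a-zA-Z])/) {
        ($error_flag, $bad_index, $bad_char) = (1, $i, $1);
        last;    # one bad field is enough to flag the line
    }
}
print "Field $bad_index contains '$bad_char'\n" if $error_flag;
```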

Multiple standard input in array-like format

I need to get my code to produce the following output, but I can't get it to work.
Example program output
Please enter your 3 numbers: 12 45 78
Your numbers forward:
12
45
78
Your numbers reversed:
78
45
12
My Perl code
#!/usr/bin/perl
#use strict;
use warnings;
use 5.010;
print "Please enter your 3 numbers: \n";
my $n1 = <STDIN>;
my $n2 = <STDIN>;
my $n3 = <STDIN>;
print "Your numbers forward: \n";
print $n1;
print $n2;
print $n3;
#Possible Backup Idea
# my @names = (n1, n2, n3);
# foreach my $n (@names) {
# say $n;
# }
#2nd Possible Backup Idea
# print "$coins[0]\n"; #Prints the first element
# print "$coins[-1]\n"; #Prints the last element
# print "$coins[-2]"; #Prints 2nd to last element
print "Your numbers reversed: \n";
print $n3;
print $n2;
print $n1;
But when it runs, it doesn't take the input all on one line like I need; the numbers must be entered three times for it to work:
Please enter your 3 numbers:
12
23
34
Your numbers forward:
12
23
34
Your numbers reversed:
34
23
12
You can read the input from STDIN, but you'll want to split it using whitespace as the delimiter.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw(say);
say "Pick 3 numbers";
my $input = <STDIN>;
my @numbers = split(/\s+/, $input);
my @reverse_numbers = reverse(@numbers);
say "Your numbers forward:";
say join("\n", @numbers);
say "Your numbers backwards:";
say join("\n", @reverse_numbers);
my $n1 = <STDIN>; reads one full line at a time, so doing it three times reads three lines.
Instead, you want to read one line and split it up on whitespace into an array of numbers.
my $input = <STDIN>;
my @numbers = split /\s+/, $input;
/\s+/ is a regular expression that matches one or more whitespace characters. See the Perl Regex Tutorial for more.
Then you can work with the list of numbers using for loops.
print "Your numbers forward:\n";
for my $number (@numbers) {
print "$number\n";
}
print "Your numbers reversed: \n";
for my $number (reverse @numbers) {
print "$number\n";
}
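One caveat about split /\s+/: if the line starts with whitespace, the first field is an empty string. Passing a single-space string as the pattern (split ' ') is a special case that skips leading whitespace. A small sketch, using a hard-coded line with an assumed leading space in place of <STDIN>:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a line read from STDIN, with a leading space and the
# trailing newline that <STDIN> keeps.
my $input = " 12 45 78\n";

my @with_regex = split /\s+/, $input;   # leading space gives an empty first field
my @with_space = split ' ', $input;     # special case: leading whitespace skipped

print scalar(@with_regex), " fields vs ", scalar(@with_space), " fields\n";
print "Your numbers reversed:\n";
print "$_\n" for reverse @with_space;
```

In both forms the trailing newline is harmless, because split drops trailing empty fields.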

How to split the entire string into array in Perl

I'm trying to process an entire string but the way my code is written, part of it is not being processed. Here's a representation of my code:
#!/usr/bin/perl
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
$string =~ s/\s+//g; # remove white space from string
# split the string into fragments of 58 characters and store in array
my @array = $string =~ /[A-Z]{58}/g;
my $len = scalar @array;
print $len . "\n"; # this prints 3
# print the fragments
print $array[0] . "\n";
print $array[1] . "\n";
print $array[2] . "\n";
print $array[3] . "\n";
The code outputs the following:
3
MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD
PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF
VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL
<blank space>
Notice that the rest of the string, EDAVTKPELRPCPTP, is not stored in @array. When creating my array, how do I store EDAVTKPELRPCPTP? Perhaps in $array[3]?
You've almost got it. You need to change your regex to allow for 1 to 58 characters.
my @array = $string =~ /[A-Z]{1,58}/g;
In addition, you have an error in your script: you use @prot_seq instead of @array. You should always use strict to protect yourself against this sort of thing. Here's the script with strict, warnings, and 5.10 features (to get say).
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
# Strip whitespace.
$string =~ s/\s+//g;
# Split the string into fragments of 58 characters or less
my @fragments = $string =~ /[A-Z]{1,58}/g;
say "Num fragments: ".scalar @fragments;
say join "\n", @fragments;
What you're missing is the ability to capture fewer than 58 characters. And since you only want to do that at the end of the string, you can do this:
/[A-Z]{58}|[A-Z]{1,57}\z/
Which I would prefer to write like this:
/\p{Upper}{58}|\p{Upper}{1,57}\z/
However, since quantifiers are greedy by default, a simpler pattern will gather 58 characters whenever it can and only take fewer when it runs out of matching input:
/\p{Upper}{1,58}/
Or, for the reasons Schwern mentions (such as excluding non-ASCII letters):
/[A-Z]{1,58}/
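To see the greedy behaviour on a small input, here is a sketch that makes the width a parameter (4 instead of 58). Every chunk is full-width except the last. The subroutine name chunks_regex is hypothetical. Any character outside [A-Z] would end a chunk early, which is why the whitespace is stripped first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an uppercase string into chunks of at most $width characters.
# {1,$width} is greedy, so only the final chunk can be shorter.
sub chunks_regex {
    my ($str, $width) = @_;
    return $str =~ /[A-Z]{1,$width}/g;
}

my @parts = chunks_regex("ABCDEFGHIJ", 4);
print join(' ', @parts), "\n";
```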
You may prefer to use unpack, like this:
$string =~ s/\s+//g;
my @fragments = unpack '(A58)*', $string;
Or, if you would rather leave $string unchanged and have Perl v5.14 or later, you can write:
my @fragments = unpack '(A58)*', $string =~ s/\s+//gr;
If you don't actually need regex character classes, this is how I'd do it:
use strict;
use warnings;
use Data::Dump;
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
$string =~ s/\s+//g;
my @chunks;
while (length($string)) {
push(@chunks, substr($string, 0, 58, ''));
}
dd($string, \@chunks);
Output:
(
"",
[
"MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD",
"PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF",
"VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL",
"EDAVTKPELRPCPTP",
],
)
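The regex, unpack and four-argument substr approaches should produce the same chunks. The sketch below checks this on a shortened sample with a width of 8 rather than 58, so the shorter tail is easy to see. The unpack format A strips trailing spaces, which is harmless here because the whitespace has already been removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $seq   = "MAGRSHPGPLRPLLPLLVVAACVLPGAGG";   # shortened sample
my $width = 8;                                 # small width so the tail is visible

# 1. Regex: greedy {1,N} takes full chunks, then the shorter remainder.
my @by_regex = $seq =~ /[A-Z]{1,$width}/g;

# 2. unpack: "(A8)*" repeats an 8-character field until the input runs out.
my @by_unpack = unpack "(A$width)*", $seq;

# 3. Four-argument substr removes each chunk from the front of a copy.
my $copy = $seq;
my @by_substr;
push @by_substr, substr($copy, 0, $width, '') while length $copy;

print join("\n", @by_substr), "\n";
```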

Perl: add the results of 2 arrays from 2 different files together

I run a report between 2 CSV files. The last check I wish to do is to add together the matching elements of the 2 arrays (built up of unique values and occurrences), but I can't work out how to add together each name that appears in both arrays to get the output below.
INPUT:
jon 22
james 12
ken 22
jack 33
jim 11
harry 7
dave 9
grant 12
matt 74
malc 12
INPUT1:
jon 2
james 1
ken 8
jack 5
jim 1
harry 51
dave 22
Desired Output:
jon 24
james 13
ken 30
jack 38
jim 12
harry 58
dave 31
grant 12
matt 74
malc 12
The code I have so far to create output from INPUT and INPUT1:
my %seen;
seek INPUT, 0, 0;
while (<INPUT>)
{
chomp;
my $line = $_;
my #elements = split (",", $line);
my $col_name = $elements[1];
#print " $col_name \n" if !
$seen{$col_name}++;
}
while ( my ( $col_name, $times_seen ) = each %seen ) {
my $loc_total = $times_seen * $dd;
print "\n";
print " $col_name \t\t : = $loc_total";
printf OUTPUT "%-34s = %15s\n", $col_name , " $loc_total ";
}
############## ###################
my %seen2;
seek INPUT1, 0, 0;
while (<INPUT1>)
{
chomp;
my $line = $_;
my #elements1 = split (",", $line);
my $col_name = $elements1[1];
my $col_type = $elements1[5];
$seen2{$col_name}++ if $col_type eq "YES";
}
while ( my ( $col_name, $times_seen2 ) = each %seen2 ) {
my $loc_total = $times_seen2 ;
print "\n $col_name \t\t= $loc_total";
printf OUTPUT "%-34s = %15s\n", $col_name , $times_seen2 ;
}
close INPUT;
Instead of using %seen, store the running total in the hash directly:
#!/usr/bin/perl
use warnings;
use strict;
my %count;
for my $file ('INPUT', 'INPUT1') {
open my $IN, '<', $file or die "$file: $!";
while (<$IN>) {
my ($name, $num) = split;
$count{$name} += $num;
}
}
for my $name (sort { $count{$b} <=> $count{$a} } keys %count) {
print "$name\t$count{$name}\n";
}
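The same running-total idea can be tried without any files on disk by opening in-memory filehandles (a reference to a scalar passed to open). In the sketch below, those strings stand in for INPUT and INPUT1 and hold only a subset of the question's data. Ties in the descending sort are broken by name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory stand-ins for the two files (a subset of the question's data).
my %files = (
    INPUT  => "jon 22\njames 12\ngrant 12\n",
    INPUT1 => "jon 2\njames 1\n",
);

my %count;
for my $file ('INPUT', 'INPUT1') {
    # Opening a reference to a scalar reads from the string instead of disk.
    open my $IN, '<', \$files{$file} or die "$file: $!";
    while (<$IN>) {
        my ($name, $num) = split;    # whitespace split on $_
        $count{$name} += $num;       # a missing key starts as undef, treated as 0
    }
    close $IN;
}

for my $name (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$name\t$count{$name}\n";
}
```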
First, I'll assume that the input files are actual CSV files, whereas your examples are just whitespace-delimited. In other words:
jon,22
james,12
ken,22
jack,33
jim,11
harry,7
dave,9
grant,12
matt,74
malc,12
and
jon,2
james,1
ken,8
jack,5
jim,1
harry,51
dave,22
Assuming I'm correct, your while loops will do the trick, with a couple of tweaks:
The first element of your @elements arrays has index 0, not 1. So the "key" here is at $elements[0], and the "value" is at $elements[1]. So you'd have something like:
my $col_name = $elements[0];
my $col_value = $elements[1];
Instead of incrementing %seen, it seems more useful to add the value, like so:
$seen{ $col_name } += $col_value;
In your while loop that iterates over INPUT1, extract the data the same way as in the first loop; also, don't use %seen2. Instead, simply add to %seen as above:
my $col_name = $elements1[0];
my $col_value = $elements1[1];
$seen{$col_name} += $col_value;
Your totals will then be stored in %seen, so your final while loop is slightly modified:
while ( my ( $col_name, $times_seen2 ) = each %seen ) { # instead of %seen2
If your two processing loops are identical (and I see it's possible that they're not), then I'd suggest factoring them into a common subroutine. But that's a different matter.
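Here is one possible shape for that common subroutine. It is a sketch that assumes plain "name,value" lines with no quoted commas. The name add_csv_totals is hypothetical, in-memory filehandles stand in for INPUT and INPUT1, and it leaves out the second loop's $elements1[5] eq "YES" filter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical shared helper: add the second column of each "name,value"
# line to the running total for the name in the first column.
sub add_csv_totals {
    my ($fh, $totals) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        my ($col_name, $col_value) = split /,/, $line;
        next unless defined $col_value;    # skip blank or malformed lines
        $totals->{$col_name} += $col_value;
    }
}

my %seen;
my @inputs = ("jon,22\nken,22\n", "jon,2\nken,8\n");
for my $data (@inputs) {
    open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
    add_csv_totals($fh, \%seen);
    close $fh;
}
print "$_ $seen{$_}\n" for sort keys %seen;
```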
The following could easily be adapted to take file names from the command line instead.
It maintains the order in which the keys first appear in your files:
use strict;
use warnings;
use autodie;
my @names;
my %total;
local @ARGV = qw(INPUT INPUT1);
while (<>) {
my ($name, $val) = split;
push @names, $name if ! exists $total{$name};
$total{$name} += $val;
}
for (@names) {
print "$_ $total{$_}\n";
}
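Because a hash does not remember insertion order, the separate @names array is what keeps the output in first-seen order. Below is the same logic as a testable subroutine that takes lines directly instead of reading files. The name ordered_totals and the sample lines are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum "name value" lines, remembering the order in which each name first appears.
sub ordered_totals {
    my @lines = @_;
    my (@names, %total);
    for (@lines) {
        my ($name, $val) = split;    # whitespace split on $_
        next unless defined $val;
        push @names, $name unless exists $total{$name};
        $total{$name} += $val;
    }
    return map { "$_ $total{$_}" } @names;
}

my @output = ordered_totals("jon 22", "ken 22", "matt 74", "ken 8", "jon 2");
print "$_\n" for @output;
```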
