Perl: string to an array reference? - arrays

Say, there is a string "[1,2,3,4,5]", how could I change it to an array reference as [1,2,3,4,5]? Using split and recomposing the array is one way, but it looks like there should be a simpler way.

eval is the simplest way
$string = "[1,2,3,4,5]";
$ref = eval $string;
but this is insecure if you don't have control over the contents of $string.
Your input string is also valid JSON, though, so you could say
use JSON;
$ref = decode_json( $string );

You can use eval but this should definitely be avoided when the string in question comes from untrusted sources.
Otherwise you have to parse it yourself:
my #arr = split(/\s*,\s*/, substr($string, 1, -1));
my $ref = \#arr;

You really should avoid eval if you can. If the string comes from outside the program then untold damage can be done by simply applying eval to it.
If the contents of the array is just numbers then you can use a regex to extract the information you need.
Here's an example
use strict;
use warnings;
my $string = "[1,2,3,4,5]";
my $data = [ $string =~ /\d+/g ];
use Data::Dump;
dd $data;
output
[1 .. 5]

Related

Extract number from array in Perl

I have a array which have certain elements. Each element have two char "BC" followed by a number
e.g - "BC6"
I want to extract the number which is present and store in a different array.
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
my #band = ("BC1", "BC3");
foreach my $elem(#band)
{
my #chars = split("", $elem);
foreach my $ele (#chars) {
looks_like_number($ele) ? 'push #band_array, $ele' : '';
}
}
After execution #band_array should contain (1,3)
Can someone please tell what I'm doing wrong? I am new to perl and still learning
To do this with a regular expression, you need a very simple pattern. /BC(\d)/ should be enough. The BC is literal. The () are a capture group. They save the match inside into a variable. The first group relates to $1 in Perl. The \d is a character group for digits. That's 0-9 (and others, but that's not relevant here).
In your program, it would look like this.
use strict;
use warnings;
use Data::Dumper;
my #band = ('BC1', 'BC2');
my #numbers;
foreach my $elem (#band) {
if ($elem =~ m/BC(\d)/) {
push #numbers, $1;
}
}
print Dumper #numbers;
This program prints:
$VAR1 = '1';
$VAR2 = '2';
Note that your code had several syntax errors. The main one is that you were using #band = [ ... ], which gives you an array that contains one array reference. But your program assumed there were strings in that array.
Just incase your naming contains characters other than BC this will exctract all numeric values from your list.
use strict;
use warnings;
my #band = ("AB1", "BC2", "CD3");
foreach my $str(#band) {
$str =~ s/[^0-9]//g;
print $str;
}
First, your array is an anonymous array reference; use () for a regular array.
Then, i would use grep to filter out the values into a new array
use strict;
use warnings;
my #band = ("BC1", "BC3");
my #band_array = grep {s/BC(\d+)/$1/} #band;
$"=" , "; # make printing of array nicer
print "#band_array\n"; # print array
grep works by passing each element of an array in the code in { } , just like a sub routine. $_ for each value in the array is passed. If the code returns true then the value of $_ after the passing placed in the new array.
In this case the s/// regex returns true if a substitution is made e.g., the regex must match. Here is link for more info on grep

Perl split and throw away the first element in one line

I have some data that should be able to be easily split into a hash.
The following code is intended to split the string into its corresponding key/value pairs and store the output in a hash.
Code:
use Data::Dumper;
# create a test string
my $string = "thing1:data1thing2:data2thing3:data3";
# Doesn't split properly into a hash
my %hash = split m{(thing.):}, $string;
print Dumper(\%hash);
However upon inspecting the output it is clear that this code does not work as intended.
Output:
$VAR1 = {
'data3' => undef,
'' => 'thing1',
'data2' => 'thing3',
'data1' => 'thing2'
};
To further investigate the problem I split the output into an array instead and printed the results.
Code:
# There is an extra blank element at the start of the array
my #data = split m{(thing.):}, $string;
for my $line (#data) {
print "LINE: $line\n";
}
Output:
LINE:
LINE: thing1
LINE: data1
LINE: thing2
LINE: data2
LINE: thing3
LINE: data3
As you can see the problem is that split is returning an extra empty element at the start of the array.
Is there any way that I can throw away the first element from the split output and store it in a hash in one line?
I know I can store the output in an array and then just shift off the first value and store the array in a hash... but I'm just curious whether or not this can be done in one step.
my (undef, %hash) = split m{(thing.):}, $string; will throw away the first value.
I'd alternatively suggest - use regex not split:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $string = "thing1:data1thing2:data2thing3:data3";
my %results = $string =~ m/(thing\d+):([A-Z]+\d+)/ig;
print Dumper \%results;
Of course, this does make the assumption that you're matching 'word+digit' groups, as without that "numeric" separator it won't work as well.
I'm aiming to primarily illustrate the technique - grab 'paired' values out of a string, because then they assign straight to a hash.
You might have to be a bit more complicated with the regex, for example nongreedy quantifiers:
my %results = $string =~ m/(thing.):(\w+?)(?=thing|$)/ig;
This may devalue it in terms of clarity.

How do I find overlapping regex in a string?

I have this string:
my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*"
I want to find every substring starting with M and ending with *, without * within them. this means that the above string would give me 4 elements in my final array.
#ORF= (MZEFSRGGRMEAZFE*,MEAZFE*, MQZEFFMAEZF*,MAEZF*)
A simple regex will not do since it does not find overlapping substrings. Is there a simple way to do this?
Regular expression matching consumes the pattern as it matches - that's by design.
You can use a lookahead expression to avoid this happening PerlMonks:
Using Look-ahead and Look-behind
So something like this will work:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my #matches = $line =~ m/(?=(M[^*]+))/g;
print Dumper \#matches;
Which gives you:
$VAR1 = [
'MZEFSRGGRMEAZFE',
'MEAZFE',
'MQZEFFMAEZF',
'MAEZF'
];
You can also use a recursive approach instead of an advanced-feature regex to do that. The program below takes each match and reparses the match, but omitting the starting M so it won't match the whole thing again.
use strict;
use warnings;
use Data::Printer;
my $line = "MZEFSRGGRMEAZFE*MQZEFFMAEZF*";
my #matches;
sub parse {
my ( $string ) = #_;
while ($string =~ m/(M[^*]+\*)/g ) {
push #matches, $1;
parse(substr $1, 1);
}
}
parse($line);
p #matches;
Here's the output:
[
[0] "MZEFSRGGRMEAZFE*",
[1] "MEAZFE*",
[2] "MQZEFFMAEZF*",
[3] "MAEZF*"
]

How to split the entire string into array in Perl

I'm trying to process an entire string but the way my code is written, part of it is not being processed. Here's a representation of my code:
#!/usr/bin/perl
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
$string =~ s/\s+//g; # remove white space from string
# split the string into fragments of 58 characters and store in array
my #array = $string =~ /[A-Z]{58}/g;
my $len = scalar #array;
print $len . "\n"; # this prints 3
# print the fragments
print $array[0] . "\n";
print $array[1] . "\n";
print $array[2] . "\n";
print $array[3] . "\n";
The code outputs the following:
3
MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD
PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF
VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL
<blank space>
Notice that the rest of the string EDAVTKPELRPCPTP is not stored in #array. When I'm creating my array, how do I store EDAVTKPELRPCPTP? Perhaps I could store it in $array[3]?
You've almost got it. You need to change your regex to allow for 1 to 58 characters.
my #array = $string =~ /[A-Z]{1,58}/g;
In addition, you have an error in your script using #prot_seq instead of #array. You should always use strict to protect yourself against this sort of thing. Here's the script with strict, warnings, and 5.10 features (to get say).
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
# Strip whitespace.
$string =~ s/\s+//g;
# Split the string into fragments of 58 characters or less
my #fragments = $string =~ /[A-Z]{1,58}/g;
say "Num fragments: ".scalar #fragments;
say join "\n", #fragments;
What you're missing is the ability to capture less than 58 characters. And since you only want to do that if it's the end, you can do this:
/[A-Z]{58}|[A-Z]{1,57}\z/
Which I would prefer to write like this:
/\p{Upper}{58}|\p{Upper}{1,57}\z/
However, since this expression is greedy by default, it will prefer to gather 58 characters, and only default to less when it runs out of matching input.
/\p{Upper}{1,58}/
Or, for reasons as Schwern mentions (such as avoiding any foreign letters)
/[A-Z]{1,58}/
You may prefer to use unpack, like this
$string =~ s/\s+//g;
my #fragments = unpack '(A58)*', $string;
Or if you would rather leave $string unchanged and have v5.14 or better of Perl, then you can write
my #fragments = unpack '(A58)*', $string =~ s/\s+//gr;
If you don't actually need regex character classes, this is how I'd do it:
use strict;
use warnings;
use Data::Dump;
my $string = "MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEAN
VVLTGTVEEILNVDPVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLI
CDNQVSTGDTRIFFVNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTH
LRDVVVGRHPLHLLEDAVTKPELRPCPTP";
$string =~ s/\s+//g;
my #chunks;
while (length($string)) {
push(#chunks, substr($string, 0, 58, ''));
}
dd($string, \#chunks);
Output:
(
"",
[
"MAGRSHPGPLRPLLPLLVVAACVLPGAGGTCPERALERREEEANVVLTGTVEEILNVD",
"PVQHTYSCKVRVWRYLKGKDLVARESLLDGGNKVVISGFGDPLICDNQVSTGDTRIFF",
"VNPAPPYLWPAHKNELMLNSSLMRITLRNLEEVEFCVEDKPGTHLRDVVVGRHPLHLL",
"EDAVTKPELRPCPTP",
],
)

In Perl, how can I replace sequences of duplicates with one element in an array?

I have a string that I read in like:
a+c+c+b+v+f+d+d+d+c
I need to write the program so it splits at the + then deletes the duplicates so the output is:
acbvfdc
I've tried tr///cs; but I guess I'm not using it right?
#!/usr/bin/env perl
use strict; use warnings;
my #strings = qw(
a+c+c+b+v+f+d+d+d+c
alpha+bravo+bravo+bravo+charlie+delta+delta+delta+echo+delta
foo+food
bark+ark
);
for my $s (#strings) {
# Thanks #ikegami
$s =~ s/ (?<![^+]) ([^+]+) \K (?: [+] \1 )+ (?![^+]) //gx;
print "$s\n";
}
Output:
a+c+b+v+f+d+c
alpha+bravo+charlie+delta+echo+delta
foo+food
bark+ark
Now, you can split the string and have no sequences of duplicates using split /[+]/, $s because the first argument of split is a pattern.
Note to any who reads: this does not address the OP's question directly, though in my defense the question was worded ambiguously. :-) Still, it answers an interpretation of the question that others might have, so I'll leave it as-is.
Does order matter? If not, you can always try something like this:
use strict;
use warnings;
my $string = 'a+c+c+b+v+f+d+d+d+c';
# Extract unique 'words'
my #words = keys %{{map {$_ => 1} split /\+/, $string}};
print "$_\n" for #words;
Better yet, use List::MoreUtils from CPAN (which does preserve the order):
use strict;
use warnings;
use List::MoreUtils 'uniq';
my $string = 'a+c+c+b+v+f+d+d+d+c';
# Extract unique 'words'
my #words = uniq split /\+/, $string;
print "$_\n" for #words;
my $s="a+c+c+b+v+f+d+d+d+c";
$s =~ tr/+//d;
$s =~ tr/\0-\xff/\0-\xff/s;
print "$s\n"; # => acbvfdc

Resources