Can a perl regex use parens but skip backreferencing on them? - arrays

I have this regex:
@disks = $sysconfig =~ /(\d+)\.\d+:(\s+[\w.\/]+){5}\s+\((\w+)\)/ig
If there were only one line that matched, I'd get something like
x @array
35
' 520B/sect'
'KXG813JF'
It matches:
the first digits in the string
the fifth copy of the "spaces then alphanumeric-periods-and-slashes" and
the alphanumeric string at the end
I don't want to backreference #2 above and clutter my array with it, but I also don't want to write out that repeating pattern when what I've got is a more concise regex (to look at).
Is there a way to say "don't backreference this piece" or should I just deal with it when I parse the array out into something more usable by my program?

Yes, there is. Use the non-capturing group construct: (?: ... ).
You should even use it by default, unless you actually need capturing or backreferences.
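As a rough sketch of what that looks like with the regex from the question: the sample line in $sysconfig below is made up for illustration (the real sysconfig output may look different), and only the second group has been changed to (?: ... ).

```perl
use strict;
use warnings;

# Hypothetical sample line; the real sysconfig output may differ.
my $sysconfig = "35.16: NETAPP X276_FAL9E288F10 NA05 272.0GB 520B/sect (KXG813JF)\n";

# (?: ... ) groups the repeated piece without capturing it,
# so only the two remaining capture groups end up in @disks.
my @disks = $sysconfig =~ /(\d+)\.\d+:(?:\s+[\w.\/]+){5}\s+\((\w+)\)/ig;

print join(', ', @disks), "\n";   # prints: 35, KXG813JF
```

With the original capturing group, the same line would also return ' 520B/sect' as a middle element.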

Related

Why does print give different output with and without newline?

I am getting two different results when printing a regular expression match, depending on whether I append a newline character in the print statement. Why?
$string16="abfoo bcfooo defooo ghfooo ijfoo klfooo mnfooo";
@foo=$string16=~ m/foo/g;
print(@foo);
print("\n");
$string17="abfoo bcfooo defooo ghfooo ijfoo klfooo mnfooo";
@foo=$string17=~ m/foo/g;
print(@foo."\n");
result:
foofoofoofoofoofoofoo
7
Because you are using the concatenation operator ., which forces @foo into scalar context. Arrays return their number of elements in scalar context.
Use an argument list instead of the concatenation with your print to get the list and the newline. The array @foo will be expanded to the list it contains anyway, so the newline will just be another argument to print.
print @foo, "\n";
As an alternative, you can use say, which needs to be activated with use feature 'say'. It removes the need to append newlines when printing. say is available from Perl 5.10.
use feature 'say';
say @foo;
You can also turn it on with
use v5.10;
This has nothing to do with the regular expression. It's a core behavior of Perl, and how lists in Perl work.
Here's a good explanation of what context means in Perl.
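To make the context difference concrete, here is a small sketch using a shortened version of the string from the question. The variable names $as_list, $as_scalar and $explicit are only for illustration.

```perl
use strict;
use warnings;

my $string = "abfoo bcfooo defooo";
my @foo = $string =~ m/foo/g;

my $as_list   = join '', @foo;   # list context: the elements themselves
my $as_scalar = @foo . "";       # concatenation forces scalar context: the count
my $explicit  = scalar @foo;     # scalar() asks for the count explicitly

print "$as_list\n";     # foofoofoo
print "$as_scalar\n";   # 3
print "$explicit\n";    # 3
```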

Searching a string in perl for x number of occurrences of a character in a IF statement

I'm trying to use perl to find the number of periods on a line and if it contains the number I have defined then to group portions of a line together.
Here is what I have so far
if (($lines [0] =~ /./) ==3){
$lines [3] = $lines [0][3..-1]
if ($lines [2] =~ /'get'/);
print Output "qa;$lines[2];get\n"
I feel like I'm close but not close enough.
Here is a sample of the text file I am reading from, but what I have is not matching.
...State [ ]
.....County [ ]
.......City [ get set clear ]
.......ZIP [ get detail ]
The goal is for the output to look something like this when I'm done. I really only want get commands right now.
qa;State;County;City get
But I can't seem to get it to match the periods. Any help would be awesome.
You're not really that close, actually. Here are some pointers to get you in the right direction:
m// does not return how many times a match was found. In scalar context (like comparing it to a number as you are), it returns true or false, depending on whether a single match was found at all.
In list context, m// with the /g flag returns all the matches, but you would still need to assign that list to an array and then take the size of the array to get a count of the matches.
You need to escape the period in the pattern match, as it is a special character.
Strings are not character arrays. You cannot treat $lines[0] as though it were an array of characters by putting another set of [ ] subscripts on the end. There are functions that help you get pieces of strings; look into substr, for example. (And even if it were an array, [3..-1] would not be a valid subscript, even though -1 does represent the last index of an array.)
Get rid of the single quotes in the second pattern match. That's saying you literally want to look for the string: single quote, g, e, t, single quote.
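The pointers above might translate into something like the following sketch. It uses one line from the sample file; the variable names are illustrative, and it assumes the only periods on the line are the leading ones.

```perl
use strict;
use warnings;

my $line = ".......City [ get set clear ]";

# Assigning the match list to an empty list, then to a scalar, yields the count.
my $periods = () = $line =~ /\./g;

# tr/// with an empty replacement list counts characters without changing the string.
my $periods_tr = ($line =~ tr/.//);

# substr returns the part of the string from the given offset onward.
my $name = substr($line, $periods);

print "$periods\n";   # 7
print "$name\n";      # City [ get set clear ]
print "has get\n" if $line =~ /get/;   # no quotes inside the pattern
```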
Good luck to you!

How should I deal with escapes when splitting a delimited nvarchar(max) in SQL

I am in the process of rolling over a bunch of old stored procedures that take NVARCHAR(MAX) strings of comma and/or semicolon separated values (never mind about one value per variable etc.). The code is currently using the CHARINDEX approach described in this question, though in principle any of the approaches would work (I'm tempted to replace it with the XML one, because neatness).
The question, though, is what is the most efficient way of handling escaped delimiters? Obviously the lowest level approach is a character by character parser, but I can't shake the feeling that (1) that's going to be horrible when executed a million times in close succession and (2) it'll be overcomplicated for the situation.
Basically, I want to handle 3 possible escapes:
"\\", "\,", and "\;" somewhere in my string. What's the best way to do it? I should add that, ideally, I don't want to make any assumptions about what characters are included in the string.
Sample data would look something like the below.
Value1,Value\,2,ValueWithSlashAtTheEnd\\,ValueWithSlashAndCommaAtTheEnd\\\,
I'm actually splitting to rows rather than columns, but the principle is the same; I'd expect the below output typically:
SomeName
^^^^^^^^
Value1
Value,2
ValueWithSlashAtTheEnd\
ValueWithSlashAndCommaAtTheEnd\,
Needless to say, the escapes could occur anywhere in a value, and ideally I'd like to handle for semicolons as well, but I'll probably be able to infer that from the comma behaviour.
Just pass your function an edited string:
replace(replace(@yourstring, '\\', '^'), '\,', '#')
Then replace back:
replace(replace(@returnedstring, '#', ','), '^', '\')
Replace ^ and # with any characters that do not occur in the string.

Problems with Arrays in Perl

I am new to Perl and having some difficulty with arrays. Can somebody explain to me why I am not able to print the value of an array in the script below?
$sum=();
$min = 999;
$LogEntry = '';
foreach $item (1, 2, 3, 4, 5)
{
$min = $item if $min > $item;
if ($LogEntry eq '') {
push(@sum,"1"); }
print "debugging the IF condition\n";
}
print "Array is: $sum\n";
print "Min = $min\n";
The output I get is:
debugging the IF condition
debugging the IF condition
debugging the IF condition
debugging the IF condition
debugging the IF condition
Array is:
Min = 1
Shouldn't I get Array is: 1 1 1 1 1 (5 times).
Can somebody please help?
Thanks.
You need two things:
use strict;
use warnings;
at which point the bug in your code ($sum instead of @sum) should become obvious...
$sum is not the same variable as @sum.
In this case you would benefit from starting your script with:
use strict;
use warnings;
strict forces you to declare all variables, and warnings reports likely mistakes such as printing an uninitialized value.
In the meantime, change the first line to:
@sum = ();
and the second-to-last line to:
print "Array is: " . join (', ', @sum) . "\n";
See join.
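Putting those changes together, a corrected version of the script might look like this sketch. The my declarations are added here only so that it runs under strict.

```perl
use strict;
use warnings;

my @sum;              # the array, declared with the @ sigil
my $min      = 999;
my $LogEntry = '';

foreach my $item (1, 2, 3, 4, 5) {
    $min = $item if $min > $item;
    push @sum, "1" if $LogEntry eq '';
}

print "Array is: ", join(', ', @sum), "\n";   # Array is: 1, 1, 1, 1, 1
print "Min = $min\n";                          # Min = 1
```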
As others have noted, you need to understand the way Perl uses sigils ($, @, %) to denote data structures and the access of the data in them.
You are using a scalar sigil ($), which simply accesses a scalar variable named $sum. That variable has nothing to do with the completely distinct array variable named @sum, and you obviously want the latter.
What confuses you is likely the fact that, once the array variable @sum exists, you can access individual values in the array using the $sum[0] syntax; there, the sigil plus brackets ($[]) act as a "unified" syntactic construct.
The first thing you need to do (after using strict and warnings) is to read the following documentation on sigils in Perl (aside from a good Perl book):
https://stackoverflow.com/a/2732643/119280 - brian d. foy's excellent summary
The rest of the answers to the same question
This SO answer
The best summary I can give you on the syntax of accessing data structures in Perl is (quoting from my older comment)
the sigil represents the amount of data from the data structure that you are retrieving ($ for 1 element, @ for a list of elements, % for an entire hash)
whereas the brace style represent what your data structure is (square for array, curly for hash).
As a special case, when there are NO braces, the sigil will represent BOTH the amount of data, as well as what the data structure is.
Please note that in your specific case, it's the last bullet point that matters. Since you're referring to the array as a whole, you won't have braces, and therefore the sigil will represent the data structure type - since it's an array, you must use the @ sigil.
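The sigil and brace rules can be seen side by side in a short sketch. The variables @sum and %size and their contents are made up for illustration.

```perl
use strict;
use warnings;

my @sum  = (10, 20, 30);
my %size = (small => 1, large => 3);

my $first  = $sum[0];                 # $ with [ ]: one element of the array @sum
my @pair   = @sum[0, 1];              # @ with [ ]: a list of elements (array slice)
my $large  = $size{large};            # $ with { }: one value of the hash %size
my @values = @size{qw(small large)};  # @ with { }: a list of hash values (hash slice)
my $count  = @sum;                    # no braces: the whole array; in scalar context, its length

print "$first @pair $large @values $count\n";   # 10 10 20 3 1 3 3
```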
You push the values into the array @sum, then finish up by printing the scalar $sum. @sum and $sum are two completely independent variables. If you print "@sum\n" instead, you should get the output "1 1 1 1 1" (an array interpolated in a double-quoted string has its elements joined with spaces).
print "Array is: $sum\n";
will print a scalar variable called $sum that holds no value, not the array @sum and not the first item of the array.
If you 'use strict' it will flag the use of undeclared variables like this.
You should definitely add use strict; and use warnings; to your script. That would have complained about the print "Array is: $sum\n"; line (among others).
And you initialize an array with my @sum=(); not with my $sum=();
Like CFL_Jeff mentions, you can't just do a quick print. Instead, do something like:
print "Array is ".join(', ',@array);
I would still like to add some details to this picture.
See, Perl is well-known as a Very High Level Language. And this is not just because you can replace (1,2,3,4,5) with (1..5) and get the same result.
And not because you may leave your variables without (explicitly) assigning some initial values to them: my @arr is as good as my @arr = (), and my $scal (instead of my $scal = 'some filler value') may actually save you an hour or two one day. Perl is usually (with use warnings, yes) good at spotting undefined values in unusual places - but not so lucky with 'filler values'...
The true point of VHLL is that, in my opinion, you can express a solution in Perl code just like in any human language available (and some may be even less suitable for that case).
Don't believe me? Ok, check your code - or rather your set of tasks, for example.
Need to find the lowest element in an array? Or the sum of all values in an array? The List::Util module is at your command:
use strict;
use warnings;
use feature 'say';
use List::Util qw( min sum );
my @array_of_values = (1..10);
my $min_value = min( @array_of_values );
my $sum_of_values = sum( @array_of_values );
say "In the beginning was... @array_of_values";
say "With lowest point at $min_value";
say "Collected they give $sum_of_values";
Need to construct an array from another array, filtering out unneeded values? grep is here to save the day:
@filtered_array = grep { $filter_condition } @source_array;
See the pattern? Don't try to code your solution into some machine-codish mumbo-jumbo. Find a solution in your own language, then just find the means to translate THAT solution into Perl code instead. It's easier than you thought.
Disclaimer: I do understand that reinventing the wheel may be good for learning why wheels are so useful in the first place. But I do see how often wheels are reimplemented - becoming uglier and slower in the process - in production code, just because people got used to this mode of thinking.
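For a concrete instance of the grep pattern mentioned in this answer, here is a minimal sketch. The "keep the even numbers" condition is an arbitrary example, not something from the question.

```perl
use strict;
use warnings;

my @source_array = (1 .. 10);

# Keep only the elements for which the block returns true;
# inside the block, $_ is the element currently being tested.
my @even = grep { $_ % 2 == 0 } @source_array;

print "@even\n";   # 2 4 6 8 10
```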

Perl: Good way to test if a value is in an array?

If I have an array:
@int_array = (7,101,80,22,42);
How can I check if the integer value 80 is in the array without looping through every element?
You can't without looping. That's part of what it means to be an array. You can use an implicit loop using grep or smartmatch, but there's still a loop. If you want to avoid the loop, use a hash instead (or in addition).
# grep
if ( grep $_ == 80, @int_array ) ...
# smartmatch
use 5.010001;
if ( 80 ~~ @int_array ) ...
Before using smartmatch, note:
http://search.cpan.org/dist/perl-5.18.0/pod/perldelta.pod#The_smartmatch_family_of_features_are_now_experimental:
The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when. To disable these warnings, you can add this line to the appropriate scope:
no if $] >= 5.018, warnings => "experimental::smartmatch";
CPAN solution: use List::MoreUtils
use List::MoreUtils qw{any};
print "found!\n" if any { $_ == 7 } (7,101,80,22,42);
If you need to do MANY MANY lookups in the same array, a more efficient way is to store the array in a hash once and look up in the hash:
@int_array{@int_array} = 1;
foreach my $lookup_value (#lookup_values) {
print "found $lookup_value\n" if exists $int_array{$lookup_value}
}
Why use this solution over the alternatives?
You can't use smart match in Perl before 5.10. According to this SO post by brian d foy, smart match is short-circuiting, so it's as good as the "any" solution from 5.10 on.
The grep solution loops through the entire list even if the first element of a 1,000,000-element list matches. any will short-circuit and quit the moment the first match is found, so it is more efficient. The original poster explicitly said "without looping through every element".
If you need to do LOTS of lookups, the one-time sunk cost of hash creation makes the hash lookup method a LOT more efficient than any other. See this SO post for details.
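The two approaches compared above could be sketched as follows. This version takes any from List::Util rather than List::MoreUtils (List::Util has provided any since version 1.33, which is bundled with Perl 5.20 and later); the hash name %in_array and the lookup values are made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(any);   # assumes List::Util 1.33 or later

my @int_array = (7, 101, 80, 22, 42);

# One-off check: any() stops at the first match.
my $found_once = any { $_ == 80 } @int_array;

# Many checks: build the hash once, then each lookup avoids rescanning the array.
my %in_array = map { $_ => 1 } @int_array;
my @lookup_values = (80, 5, 42);   # hypothetical values to look up
my @hits = grep { exists $in_array{$_} } @lookup_values;

print "found 80\n" if $found_once;
print "hits: @hits\n";   # hits: 80 42
```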
Yet another way to check for a number in an array:
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util 'first';
my @int_array = qw( 7 101 80 22 42 );
my $number_to_check = 80;
if ( first { $_ == $number_to_check } @int_array ) {
print "$number_to_check exists in ", join ', ', @int_array;
}
See List::Util.
if ( grep /^80$/, @int_array ) {
...
}
If you are using Perl 5.10 or later, you can use the smart match operator ~~:
my $found = (80 ~~ @int_array);
