How to ignore any empty values in a perl grep? - arrays

I am using the following to count the number of occurrences of a pattern in a file:
my @lines = grep /$text/, <$fp>;
print ($#lines + 1);
But sometimes it prints one more than the actual value. I checked and it is because the last element of @lines is null, and that is also counted.
How can the last element of the grep result be empty sometimes? Also, how can this issue be resolved?

It depends a lot on your pattern, but one thing you could do is combine two matches, the first one disqualifying any line that contains only whitespace (or nothing). This example rejects any line that is empty, newline only, or whitespace only.
my @lines = grep { not /^\s*$/ and /$text/ } <$fp>;
Keep in mind that if the contents of $text happen to include regexp metacharacters, they either need to be intended as metacharacters or escaped with quotemeta().
My theories are that you might have a line terminated in \n which is somehow matching your $text regexp, or your $text regexp contains metacharacters in it that are affecting the match without you being aware. Either way, the snippet I provided will at least force rejection of "blank lines", where blank could mean completely empty (unlikely), newline terminated but otherwise empty (probable), or whitespace containing (possible) lines that appear blank when printed.
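As a minimal sketch of the filter described above, the following counts matching lines while ignoring blank ones. The sample data, the in-memory filehandle, and the value of $text are assumptions made for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for the real file.
my $data = "foo one\n\nbar\nfoo two\n   \n\n";
open my $fp, '<', \$data or die "Cannot open in-memory file: $!";

my $text = 'foo';    # assumed pattern; use \Q$text\E if it should be matched literally

# Keep lines that contain a non-whitespace character and match the pattern.
my @lines = grep { /\S/ and /$text/ } <$fp>;
close $fp;

chomp @lines;                  # drop the trailing newlines
my $count = scalar @lines;     # same value as $#lines + 1
print "$count matching lines\n";
```

Using scalar @lines instead of $#lines + 1 gives the same number and reads more directly as "how many elements".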

A regular expression that matches the empty string will match undef. Perl will warn about doing so, but casts undef to '' before trying to match against it, at which point grep will quite happily promote the undef to its results. If you don't want to pick up the empty string (or anything that will be matched as though it were the empty string), you need to rewrite your regular expression to not match it.

To accurately see what is in lines, do:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@lines;

Ok, since no more information about the contents of $text (the regex) is forthcoming, I guess I'll toss out some general information.
Consider the following example:
use Data::Dumper;
my @array = (' ', 1, 2, 'a', '');
print Dumper [ grep /\s*/, @array ];
We get:
$VAR1 = [
' ',
1,
2,
'a',
''
];
All the values match. Why? Because \s* can match zero characters, so it matches the empty string that is found in any value. To get what we want, we need \s or \s+. (For a simple match test there is no practical difference between the two.)
You may have such a problem.
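To make the difference concrete, here is a small sketch that runs the same array through both patterns and counts what survives. The variable names @star and @plus are made up for this illustration.

```perl
use strict;
use warnings;

my @array = (' ', 1, 2, 'a', '');

my @star = grep { /\s*/ } @array;   # \s* may match zero characters
my @plus = grep { /\s+/ } @array;   # \s+ needs at least one whitespace character

print scalar(@star), " elements match /\\s*/\n";
print scalar(@plus), " element matches /\\s+/\n";
```

Only the single-space element passes the \s+ test, while every element, including the empty string, passes the \s* test.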

Related

How do I modify elements in a Perl array inside a foreach loop?

My goal with this piece of code is to sanitize an array of elements (a list of URL's, some with special characters like %) so that I can eventually compare it to another file of URL's and output which ones match. The list of URL's is from a .csv file with the first field being the URL that I want (with some other entries that I skip over with a quick if() statement).
foreach my $var (@input_1) {
    #Skip anything that doesn't start with http:
    if ((/^[#U]/ ) || !(/^h/)) {
        next;
    }
    #Split the .csv into the relevant field:
    my @fields = split /\s?\|\s?/, $_;
    $var = uri_unescape($fields[0]);
}
My delimiter is a | in the csv. In its current setup, and also when I change the $_ to $var, it only returns blank lines. When I remove the $var declaration at the beginning of the loop and use $_, it will output the URL's in the correct format. But in that case, how can I assign the output to the same element in the array? Would this require a second array to output the value to?
I'm relatively new to perl, so I'm sure there is some stuff that I'm missing. I have no clue at this moment why removing the $var at the foreach declaration breaks the parsing of the @fields line, but removing it and using $_ doesn't. Reading the perlsyn documentation did not help as much as I would have liked. Any help appreciated!
/^h/ is not bound to anything, so the match happens against $_. If you want to match $var, you have to bind it:
if ($var =~ /^[#U]/ || $var !~ /^h/) {
Using || with two matches could probably be incorporated into a single regular expression with an alternative:
next if $var =~ /^(?: [#U] | [^h] | $ )/x;
That is, the line is skipped if it starts with #, U, or anything other than h, or if it is empty.
You can populate a new array with the results by using push:
push @results, $var;
Also note that if your data can contain | quoted or escaped (or newlines etc.), you should use Text::CSV instead of split.
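Putting the pieces of this answer together, a sketch of the corrected loop might look like the following. The sample lines are invented, and the substitution that decodes %XX escapes is only a stand-in for uri_unescape() from URI::Escape so that the example runs without extra modules.

```perl
use strict;
use warnings;

# Hypothetical input lines in the pipe-delimited layout described in the question.
my @input_1 = (
    '#comment line',
    'URL | other',
    'http://example.com/a%20b | x | y',
    'http://example.com/100%25 | z',
);

my @results;
foreach my $var (@input_1) {
    # Skip anything that does not start with "h" (covers "#", "U", and empty lines).
    next if $var !~ /^h/;

    my @fields = split /\s?\|\s?/, $var;

    # Stand-in for uri_unescape(): decode %XX escapes in the first field.
    (my $url = $fields[0]) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    $var = $url;            # $var aliases the array element, so this edits @input_1
    push @results, $url;    # or collect the cleaned URLs in a separate array
}

print "$_\n" for @results;
```

Because the loop variable is an alias, assigning to $var changes the element in place, so a second array is only needed if the skipped lines should be left out of the result.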

How to find index of string in array Perl without iterating

I need to find value in array without iterating through whole array.
I get array of strings from file, and I need to get index of some value in this array, I have tried this code, but it doesn't work.
my @array = <$file>;
my $search = "SomeValue";
my $index = first { $array[$_] eq $search } 0 .. $#array;
print "index of $search = $index\n";
Please suggest how can I get index of value, or it is better to get all indexes of line if there are more than one entry.
Thx in advance.
What does "it doesn't work" mean?
The code you have will work fine, except that an element in the array is going to be "SomeValue\n", not "SomeValue". You can remove the newlines with chomp(@array) or include a newline in your $search string.
Your initial question: "I need to find value in array without iterating through whole array."
You can't. It is impossible to check every element of an array, without checking every element of an array. The very best you can do is stop looking once you've found it - but you indicate in your question multiple matches.
There are various options that will do this for you - like List::Util and grep. But they are still doing a loop, they're just hiding it behind the scenes.
The reason first doesn't work for you is probably that you need to load it from List::Util first. Alternatively, you forgot to chomp, which means your list elements include line feeds where your search string doesn't.
Anyway - in the interests of actually giving something that'll do the job:
while ( my $line = <$file> ) {
    chomp ( $line );
    #could use regular expression based matching for e.g. substrings.
    if ( $line eq $search ) { print "Match on line $.\n"; last; }
}
If you want every match - omit the last;
Alternatively - you can match with:
if ( $line =~ m/\Q$search\E/ ) {
Which will substring match (Which in turn means the line feeds are irrelevant).
So you can do this instead:
while ( <$file> ) {
    print "Match on line $.\n" if m/\Q$search\E/;
}
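If the lines are already in an array, as in the question, the following sketch shows both the first index and all indexes once the newlines are chomped. The file contents are invented and an in-memory filehandle stands in for $file.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical file contents; an in-memory filehandle stands in for $file.
my $data = "alpha\nSomeValue\nbeta\nSomeValue\n";
open my $file, '<', \$data or die "Cannot open in-memory file: $!";
my @array = <$file>;
close $file;

chomp @array;    # without this, elements end in "\n" and never eq $search

my $search = 'SomeValue';

# Index of the first match (undef if there is none).
my $index = first { $array[$_] eq $search } 0 .. $#array;

# Indexes of every match.
my @indexes = grep { $array[$_] eq $search } 0 .. $#array;

print "first index of $search = ", $index // 'none', "\n";
print "all indexes of $search = @indexes\n";
```

Both first and grep still walk the array internally; first merely stops at the first hit.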

How to use chomp

Below I have a list of data I am trying to manipulate. I want to split the columns and rejoin them in a different arrangement.
I would like to switch the last element of the array with the third one but I'm running into a problem.
Since the last element of the array contains a newline character at the end, when I switch it to be the third, it kicks everything a line down.
CODE
while (<>) {
    my @flds = split /,/;
    DO STUFF HERE;
    ETC;
    print join ",", @flds[ 0, 1, 3, 2 ]; # switches 3rd element with last
}
SAMPLE DATA
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
MY RESULTS - Kicked down a line.
1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name,Hello
DESIRED RESULTS
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
I know it has something to do with chomp but when I chomp the first or last element, all \n are removed.
When I use chomp on anything in between, nothing happens. Can anyone help?
chomp removes the trailing newline from the argument. Since none of your four fields should actually contain a newline, this is probably something you want to do for the purposes of data processing. You can remove the newline with chomp before you even split the line into fields, and then add a newline after each record with your final print statement:
while (<>) {
    chomp; # Automatically operates on $_
    my @flds = split /,/;
    DO STUFF HERE;
    ETC;
    print join(",", @flds[0,1,3,2]) . "\n"; # switches 3rd element with last
}
while ( <> ) {
    chomp;
    my @flds = split /,/;
    # ... rest of your stuff
}
In the while loop, as each line is processed, $_ is set to the contents of the line. chomp by default, acts on $_ and removes trailing line feeds. split also defaults to using $_, so that works fine.
Technically, what happens without chomp is that the last element in @flds (e.g. $flds[3]) includes the trailing \n from the line.
The chomp() function will usually remove a newline from the end of a string. We say "usually" because it actually removes a trailing string equal to the current value of $/ (the input record separator), and $/ defaults to a newline.
Example 1. Chomping a string
Most often you will use chomp() when reading data from a file or from a user. When reading user input from the standard input stream (STDIN) for instance, you get a newline character with each line of data. chomp() is really useful in this case because you do not need to write a regular expression and you do not need to worry about it removing needed characters.
while (my $text = <STDIN>) {
    chomp($text);
    print "You entered '$text'\n";
    last if ($text eq '');
}
Example usage and output of this program is:
a word
You entered 'a word'
some text
You entered 'some text'
You entered ''
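The dependence of chomp() on $/ can be seen in a short sketch like this one. The strings and the choice of ";" as a temporary separator are assumptions picked only to show the behaviour.

```perl
use strict;
use warnings;

my $line = "1,josh,Hello,Company_name\n";
my $removed = chomp $line;    # returns the number of characters removed
my $again   = chomp $line;    # nothing left to remove, so this returns 0

my $record = "field1;field2;";
{
    local $/ = ';';           # temporarily change the input record separator
    chomp $record;            # now removes one trailing ";"
}

print "removed=$removed again=$again line='$line' record='$record'\n";
```

Calling chomp a second time is harmless, which is why it is safe to chomp a line whose last field may or may not carry a newline.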

Why can't I seem to remove undefined element from an array in Perl?

I have an array of strings that comes from data in a hash table. I am trying to remove any (apparently) empty elements, but for some reason there seems to be an obstinate element that refuses to go.
I am doing:
# Get list array from hash first, then
@list = grep { $_ ne ' ' } @list;
@list = uniq @list;
return sort @list;
At the grep line I get the Use of uninitialized value in string ne... message with the rest of the array printed correctly below.
I've tried doing it the 'long' way:
foreach (@list) {
    if ($_ ne ' ') {
        push @new_list, $_;
    }
}
But this produces exactly the same result. I tried using defined with the expected result (nothing).
I could sort the array beforehand and delete the first element, but that seems very risky as I cannot guarantee that the data set will always have blank elements. It also seems excessive to resort to regular expressions, but perhaps I'm wrong. I'm sure I'm missing something ridiculously simple, as usual.
Elements can't be empty. You're trying to remove undefined elements. But you're not checking whether the element is undefined, you're checking whether it is a string consisting of a single space. You want:
@list = grep defined, @list;
My answer assumes that you do not want strings that are either empty (meaning undefined or have a length of 0) or consist solely of spaces.
Your grep line only tests for strings that equal exactly one space. However, the warning implies that at least one array element is indeed undefined. Comparing an undefined value with eq will only yield true for an empty string, not for a single space.
So in order to remove all entries that are either undefined or consist only of spaces you could do something like this:
@list = grep { defined && m/[^\s]/ } @list;
Note that a string consisting only of spaces is still defined (and true) in Perl. Therefore a simple grep defined, @list will not throw out the entries that consist solely of spaces.
It looks like you want to filter all the elements that contain a non-space character. To do this as well as reject undefined elements you can write simply
@list = uniq grep { defined and /\S/ } @list;
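A self-contained sketch of this cleanup is shown below. The list contents are invented, and a %seen hash is used in place of uniq only so that the example does not depend on which module provides uniq in a given installation.

```perl
use strict;
use warnings;

# Hypothetical list with an undefined element, an empty string, and blanks.
my @list = ('pear', undef, '', ' ', 'apple', "\t", 'pear');

# Drop undef and whitespace-only strings.
my @kept = grep { defined($_) and /\S/ } @list;

# Remove duplicates, keeping the first copy of each value.
my %seen;
my @unique = grep { !$seen{$_}++ } @kept;

my @clean = sort @unique;
print join(',', @clean), "\n";
```

Checking defined before matching avoids the "Use of uninitialized value" warning, because the regex is never applied to an undefined element.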

Perl print shows leading spaces for every element

I load a file into an array (every line in array element).
I process the array elements and save to a new file.
I want to print out the new file:
print ("Array: @myArray");
But - it shows them with leading spaces in every line.
Is there a simple way to print out the array without the leading spaces?
Yes -- use join:
my $delimiter = ''; # empty string
my $string = join($delimiter, @myArray);
print "Array: $string";
Matt Fenwick is correct. When your array is in double quotes, Perl will put the value of $" (which defaults to a space; see the perlvar manpage) between the elements. You can just put it outside the quotes:
print ('Array: ', @myArray);
If you want the elements separated by for example a comma, change the output field separator:
use English '-no_match_vars';
$OUTPUT_FIELD_SEPARATOR = ','; # or "\n" etc.
print ('Array: ', @myArray);
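The effect of $" on interpolation can be compared directly with join in a sketch such as this. The array contents are made up to resemble lines read from a file, newlines included.

```perl
use strict;
use warnings;

my @myArray = ("one\n", "two\n", "three\n");

my $interpolated = "Array: @myArray";                 # elements joined with $" (a space)
my $joined       = 'Array: ' . join('', @myArray);    # no separator at all

my $custom;
{
    local $" = '';                 # list separator used when interpolating arrays
    $custom = "Array: @myArray";
}

print $joined;
```

Here $interpolated has a space at the start of the second and third lines, while $joined and $custom do not. Using local keeps the change to $" confined to the enclosing block.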
