How to use chomp - arrays

Below I have a list of data I am trying to manipulate. I want to split the columns and rejoin them in a different arrangement.
I would like to switch the last element of the array with the third one but I'm running into a problem.
Since the last element of the array contains a newline character at the end, when I move it into third position, it kicks everything after it down a line.
CODE
while (<>) {
my @flds = split /,/;
# DO STUFF HERE
# ETC
print join ",", @flds[ 0, 1, 3, 2 ]; # switches 3rd element with last
}
SAMPLE DATA
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
1,josh,Hello,Company_name
MY RESULTS - Kicked down a line.
1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name
,Hello1,josh,Company_name,Hello
DESIRED RESULTS
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
1,josh,Company_name,Hello
I know it has something to do with chomp but when I chomp the first or last element, all \n are removed.
When I use chomp on anything in between, nothing happens. Can anyone help?

chomp removes the trailing newline from the argument. Since none of your four fields should actually contain a newline, this is probably something you want to do for the purposes of data processing. You can remove the newline with chomp before you even split the line into fields, and then add a newline after each record with your final print statement:
while (<>) {
chomp; # Automatically operates on $_
my @flds = split /,/;
# DO STUFF HERE
# ETC
print join(",", @flds[0,1,3,2]) . "\n"; # switches 3rd element with last
}

while ( <> ) {
chomp;
my @flds = split /,/;
# ... rest of your stuff
}
In the while loop, as each line is processed, $_ is set to the contents of the line. By default, chomp acts on $_ and removes the trailing newline. split also defaults to using $_, so that works fine.
Technically, what happens without chomp is that the last element in @flds (e.g. $flds[3]) includes the trailing \n from the line.
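To make that concrete, here is a minimal sketch that works on a single hard-coded line instead of reading from <>. The variable names $line, @raw and $clean are only for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical input line, exactly as <> would return it (newline attached).
my $line = "1,josh,Hello,Company_name\n";

# Without chomp, the newline rides along inside the last field.
my @raw = split /,/, $line;
my $raw_last = $raw[3];                 # "Company_name\n"

# With chomp first, every field is clean; the newline is added back once.
chomp(my $clean = $line);
my @flds = split /,/, $clean;
my $out = join(",", @flds[0, 1, 3, 2]) . "\n";

print $out;                             # 1,josh,Company_name,Hello
```

Because the newline is removed before the split and re-added after the join, it no longer matters which field ends up last.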

The chomp() function will remove (usually) any newline character from the end of a string. The reason we say usually is that it actually removes a trailing string that matches the current value of $/ (the input record separator), and $/ defaults to a newline.
Example 1. Chomping a string
Most often you will use chomp() when reading data from a file or from a user. When reading user input from the standard input stream (STDIN) for instance, you get a newline character with each line of data. chomp() is really useful in this case because you do not need to write a regular expression and you do not need to worry about it removing needed characters.
while (my $text = <STDIN>) {
chomp($text);
print "You entered '$text'\n";
last if ($text eq '');
}
Example usage and output of this program is:
a word
You entered 'a word'
some text
You entered 'some text'
You entered ''
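The earlier point about $/ can be sketched as follows. The strings and the ";" separator are made up for illustration; the only things taken from the text are chomp and $/.

```perl
use strict;
use warnings;

# chomp removes at most ONE trailing separator and returns how many
# characters it removed.
my $text = "a word\n\n";
my $removed = chomp($text);      # $removed is 1, $text is now "a word\n"

# Hypothetical record that ends in ";" instead of a newline.
my $record = "a|30|40;";
{
    local $/ = ";";              # temporarily change the input record separator
    chomp($record);              # chomp now strips a trailing ";"
}

print "$record\n";               # a|30|40
```

Using local keeps the change to $/ confined to the block, so the rest of the program still reads newline-terminated lines.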

Related

perl split delimiter from file line by line

I have a text file named 'dataexample' with multiple line like this:
a|30|40
b|50|70
then I split the delimiter with this code:
open(FILE, 'dataexample') or die "File not exist";
while(<FILE>){
my @record = split(/\|/, $_);
print "$record[0]";
}
close FILE;
when I print "$record[0]" , this is what I got:
ab
what I expect :
a 30 40
so when I do print "$record[0][0]" I expect the output to be: a
Where did I go wrong?
Your loop while ( <FILE> ) { ... } reads a single line at a time from the file handle and puts it into $_
my @record = split(/\|/, $_) splits that line on pipe characters |, so since the first line is "a|30|40\n", @record will now be 'a', '30', "40\n". The newline read from the file remains, and you should use chomp to remove it if you don't want it there
So now $record[0] is a, which you print, and then go on to read the next line in the file, setting @record to 'b', '50', "70\n" this time. Now $record[0] is b, which you also print, showing ab on the console
You've now reached the end of the file, so the while loop terminates
It sounds like you're expecting a two-dimensional array. You can do that by pushing each array onto a main array each time you read a record, like this
use strict;
use warnings 'all';
open my $fh, '<', 'dataexample' or die qq{Unable to open "dataexample" for input: $!};
my @data;
while ( <$fh> ) {
chomp;
my @record = split /\|/;
push @data, \@record;
}
print "@{$data[0]}\n";
print "$data[0][0]\n";
output
a 30 40
a
Or, more concisely, like this, which produces exactly the same result but may be a little advanced for you
use strict;
use warnings 'all';
open my $fh, '<', 'dataexample' or die qq{Unable to open "dataexample" for input: $!};
my @data = map { chomp; [ split /\|/ ] } <$fh>;
print "@{$data[0]}\n";
print "$data[0][0]\n";
Some points to know about your own code
You must always use strict and use warnings 'all' at the top of every Perl program you write. It's a measure that will uncover many simple mistakes that you may not otherwise notice
You should use lexical filehandles together with the three-parameter form of open. And an open may fail for many other reasons than the file not existing, so you should include the built-in $! variable in your die string to say why it failed
Don't forget to chomp each record read from a file unless you want to keep the trailing newline or it doesn't matter to you
You will be able to write more concise code if you get used to using the default variable $_. For instance, the second parameter to split is $_ by default, so split(/\|/, $_) may be written as just split /\|/
You can use Data::Dumper to display the contents of your variables, which will help you to debug your code. Data::Dump is superior, but it isn't a core module so you will probably have to install it before you can use it in your code
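As a small illustration of the Data::Dumper point, the sketch below dumps a record that was split without chomp. The input string is hard-coded here as an assumption, standing in for a line read from the file; setting $Data::Dumper::Useqq makes the trailing newline visible as "\n".

```perl
use strict;
use warnings;
use Data::Dumper;

# Print strings in double quotes so control characters show as escapes.
$Data::Dumper::Useqq = 1;

# Hypothetical line, split without chomp, as in the original loop.
my @record = split /\|/, "a|30|40\n";

my $dump = Dumper(\@record);
print $dump;    # the last element is shown with its trailing \n escape
```

Seeing the escape in the dump is usually the quickest way to confirm that a stray newline is what breaks a later comparison or print.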
You have to use
print "$record[1]";
print "$record[2]";
As they are stored in consecutive index values.
or
If you want to print the entire thing you can just do
print "@record\n";
You are printing the value at the first index in the array each time through the loop, and without the new line. So you get the first value from each line, right next to each other on the same line, thus ab.
Print the whole array, in double quotes, followed by a newline. Here is your program, changed a bit:
use strict;
use warnings;
my $file = 'dataexample';
open my $fh, '<', $file or die "Error opening $file: $!";
while (<$fh>) {
chomp;
my @record = split(/\|/, $_);
print "@record\n";
}
close $fh;
With the quotes the elements are printed with spaces added between them so you get
a 30 40
b 50 70
If you print without quotes the elements get printed without extra spaces, so
this
print @record, "\n";
over the whole loop prints
a3040
b5070
If you don't have the new line "\n" either, it is all printed on one line so this
print @record;
altogether prints
a3040b5070
As for $record[0][0], this is not valid for the array you have. This would print from a two-dimensional array. Take, for example
my @data = ( [1.1, 2.2], [10, 20] );
This array @data has at its first index a reference to an array -- more precisely, an anonymous array [1.1, 2.2]. Its second element is an anonymous array [10, 20]. So $data[0][0] is: the first element of @data (so the first of the two anonymous arrays inside), and then the first element of that array, thus 1.1. Likewise $data[1][1] is 20.
Thanks to Sobrique for the comment.
But you don't have this in your program. When you split data into an array
while(<FILE>){
my @record = split(/\|/, $_);
# ...
}
it creates a new array named @record every time through the loop. So @record is a normal array, not two-dimensional. For such an array the syntax $record[0][0] doesn't mean much.
I think you're trying to create a 2d array, whereby each element contains all the pipe delimited items from each line of your input:
my @record;
while(<DATA>){
chomp;
my @split = split(/\|/);
push @record, [@split];
}
print "@{$record[0]}\n";
a 30 40
$record[0] has the contents of column 1: 'a' on the first iteration of the loop, 'b' on the second. $record[1] has column 2, and so on. You put the print statement, print "$record[0]", in the loop, so you get 'a' printed in the first iteration and 'b' in the second.
To get what you wanted, you need to replace your print statement with:
print join(" ", @record); # the last field still ends in "\n" because the line was never chomped

How to compare each element of an array using regex?

I am using the Lingua::EN::Tagger Perl module in order to tag parts of speech from a user's input. That portion of my code works perfectly. However, the problem is that I only want to keep the input that has the noun tags, which are "NN, NNS, NNP, NNPS", and store these words in a separate array @nounArray. The user will be inputting a question such as "what is your name?" Each element of the question will be tagged: What/WP is/is your/PN name/NN
my @UserInput = $readable_text;
my @nounArray;
foreach my $UserInput (@UserInput){
if ($UserInput =~ m/NN|NNS$|NNP$|NNPS$/){
$UserInput = @nounArray;
}
print @nounArray;
}
However, nothing occurs when I run the code. The goal is to have the nouns of the user's input be placed in a separate array after separating them from the original array. I do not want to print the array, but I did this in order to see if the code was working.
Since you want to iterate over the words in $readable_text, you can first split them into an array:
my $readable_text = "What/WP is/is your/PN name/NN";
my @UserInput = split ' ', $readable_text;
my @nounArray;
foreach my $UserInput (@UserInput) {
if ($UserInput =~ m/NN|NNS$|NNP$|NNPS$/) {
# print "$UserInput\n";
push @nounArray, $UserInput;
}
}
print @nounArray;
$ matches at the end of the string. I suppose your strings have at least a \n at the end, which would prevent them from matching.
But as you point out in your comment, it looks like you're trying to match word boundaries here, so just replace all $ in your expression with \b.
First, split your words by whitespace:
my @UserInput = split /\s+/, $UserInput;
Then grep for the nouns:
my @nouns = grep { m%/N% } @UserInput; # only noun tags include /N
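If you would rather match only the four tags named in the question, a stricter pattern is one option. This is a sketch that assumes the tagger output has the word/TAG shape shown in the question; the names @tokens and @words are made up for illustration.

```perl
use strict;
use warnings;

# Sample tagged text copied from the question.
my $readable_text = "What/WP is/is your/PN name/NN";
my @tokens = split ' ', $readable_text;

# Keep tokens whose tag is exactly NN, NNS, NNP or NNPS.
my @nouns = grep { m{/NN(?:PS|P|S)?$} } @tokens;

# Strip the tag so only the word itself is left.
my @words = map { (split m{/}, $_)[0] } @nouns;

print "@words\n";    # name
```

Anchoring the tag with a leading / and a trailing $ avoids accidental hits on words that merely contain the letters NN.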

Perl regex not matching as expected

I'm trying to compare each word in a list to a string to find matching words, but I can't seem to get this to work.
Here is some sample code
my $sent = "this is a test line";
foreach (@keywords) { # array of words (contains the word 'test')
if ($sent =~ /$_/) {
print "match found";
}
}
It seems to work if I manually enter /test/ instead of $_, but I can't enter words manually.
Your code works fine. I hope you have use strict and use warnings in place in the real program? Here is an example where I have populated @keywords with a few items including test.
use strict;
use warnings;
my $sent = "this is a test line";
my @keywords = qw/ a b test d e /;
foreach (@keywords) {
if ($sent =~ /$_/) {
print "match found\n";
}
}
output
match found
match found
match found
So your array doesn't contain what you think it does. I would bet that you've read the data from a file or from the keyboard and forgot to remove the newline from the end of each word with chomp.
You can do that by simply writing
chomp @keywords
which will remove a newline (if there is one) from the end of all elements of @keywords. To see the real contents of @keywords, you can add these lines to your program
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@keywords;
You will also see that the elements a and e produce a match as well as test, which I guess you don't want. You could add a word boundary metacharacter \b before and after the value of $_, like this
foreach (@keywords) {
if ( $sent =~ /\b$_\b/ ) {
print "match found\n";
}
}
but a regular expression's definition of a word is very restrictive and allows only alphanumeric characters or an underscore _, so Roger's, "essay", 99%, and nicely-formatted are not "words" in this sense. Depending on your actual data you may want something different.
Finally, I would write this loop more compactly using for instead of foreach (they are identical in every respect) and the postfixed statement modifier form of if, like this
for (@keywords) {
print "match found\n" if $sent =~ /\b$_\b/;
}
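Putting the pieces of this answer together, here is a sketch of the likely failure and its fix. The keyword list with attached newlines is an assumption standing in for data read from a file, and \Q...\E (quotemeta) is an extra precaution, not mentioned above, in case a keyword contains regex metacharacters.

```perl
use strict;
use warnings;

my $sent = "this is a test line";

# Hypothetical keywords as they would look when read from a file
# without chomp: each one still ends in a newline.
my @keywords = ("a\n", "b\n", "test\n");

# Nothing matches, because $sent contains no newline.
my @before = grep { $sent =~ /$_/ } @keywords;

# chomp works on every element of the array at once.
chomp @keywords;

# \b limits matches to whole words; \Q...\E escapes metacharacters.
my @after = grep { $sent =~ /\b\Q$_\E\b/ } @keywords;

print "@after\n";    # a test
```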

Perl print shows leading spaces for every element

I load a file into an array (every line in array element).
I process the array elements and save to a new file.
I want to print out the new file:
print ("Array: @myArray");
But - it shows them with leading spaces in every line.
Is there a simple way to print out the array without the leading spaces?
Yes -- use join:
my $delimiter = ''; # empty string
my $string = join($delimiter, @myArray);
print "Array: $string";
Matt Fenwick is correct. When your array is in double quotes, Perl will put the value of $" (which defaults to a space; see the perlvar manpage) between the elements. You can just put it outside the quotes:
print ('Array: ', @myArray);
If you want the elements separated by for example a comma, change the output field separator:
use English '-no_match_vars';
$OUTPUT_FIELD_SEPARATOR = ','; # or "\n" etc.
print ('Array: ', @myArray);
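The effect of $" can be seen side by side in a short sketch. The two-line array is invented for illustration and keeps its newlines, as lines loaded from a file without chomp would.

```perl
use strict;
use warnings;

# Hypothetical file lines, newlines still attached.
my @myArray = ("line one\n", "line two\n");

# Interpolation joins the elements with $" (a space by default), which
# is where the leading space on every line after the first comes from.
my $interpolated = "Array: @myArray";

# join with an empty string adds nothing between the elements.
my $joined = "Array: " . join('', @myArray);

# Alternatively, change $" locally so interpolation adds no separator.
my $custom;
{
    local $" = '';
    $custom = "Array: @myArray";
}

print $joined;
```

Both $joined and $custom come out without the extra spaces; which form to use is mostly a matter of taste.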

How to ignore any empty values in a perl grep?

I am using the following to count the number of occurrences of a pattern in a file:
my @lines = grep /$text/, <$fp>;
print ($#lines + 1);
But sometimes it prints one more than the actual value. I checked and it is because the last element of @lines is null, and that is also counted.
How can the last element of the grep result be empty sometimes? Also, how can this issue be resolved?
It really depends a lot on your pattern, but one thing you could do is join a couple of matches, the first one disqualifying any line that contains only space (or nothing). This example will reject any line that is either empty, newline only, or any amount of whitespace only.
my @lines = grep { not /^\s*$/ and /$text/ } <$fp>;
Keep in mind that if the contents of $text happen to include regexp special metacharacters they either need to be intended for their metacharacter purposes, or sterilized with quotemeta().
My theories are that you might have a line terminated in \n which is somehow matching your $text regexp, or your $text regexp contains metacharacters in it that are affecting the match without you being aware. Either way, the snippet I provided will at least force rejection of "blank lines", where blank could mean completely empty (unlikely), newline terminated but otherwise empty (probable), or whitespace containing (possible) lines that appear blank when printed.
A regular expression that matches the empty string will match undef. Perl will warn about doing so, but casts undef to '' before trying to match against it, at which point grep will quite happily promote the undef to its results. If you don't want to pick up the empty string (or anything that will be matched as though it were the empty string), you need to rewrite your regular expression to not match it.
To accurately see what is in lines, do:
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@lines;
Ok, since no more information about the contents of $text (the regex) is forthcoming, I guess I'll toss out some general information.
Consider the following example:
use Data::Dumper;
my @array = (' ', 1, 2, 'a', '');
print Dumper [ grep /\s*/, @array ];
We get:
$VAR1 = [
' ',
1,
2,
'a',
''
];
All the values match. Why? Because they also match the empty string. To get what we want, we need \s or \s+. (There will be no practical difference between the two)
You may have such a problem.
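A quick way to compare the patterns is to count matches directly, since grep returns the number of matching elements in scalar context. This sketch reuses the sample array from above; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @array = (' ', 1, 2, 'a', '');

# In scalar context grep returns the number of matching elements.
my $loose    = grep { /\s*/ } @array;   # matches everything, even ''
my $strict   = grep { /\s/  } @array;   # needs at least one whitespace char
my $nonblank = grep { /\S/  } @array;   # needs at least one non-whitespace char

print "$loose $strict $nonblank\n";     # 5 1 3
```

The same scalar-context trick (or scalar @lines) is a simpler way to get the count than $#lines + 1.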

Resources