I am using the Lingua::EN::Tagger Perl module to tag parts of speech in a user's input. That portion of my code works perfectly. However, I only want to keep the words that have one of the noun tags (NN, NNS, NNP, NNPS) and store them in a separate array, @nounArray. The user will be inputting a question such as "what is your name?" Each element of the question will be tagged: What/WP is/is your/PN name/NN
my @UserInput = $readable_text;
my @nounArray;
foreach my $UserInput (@UserInput){
if ($UserInput =~ m/NN|NNS$|NNP$|NNPS$/){
$UserInput = @nounArray;
}
print @nounArray;
}
However, nothing happens when I run the code. The goal is to place the nouns from the user's input in a separate array, apart from the original array. I do not actually want to print the array; I only did that to see whether the code was working.
Since you want to iterate over the words in $readable_text, split them into an array first, and use push to add each match to @nounArray:
my $readable_text = "What/WP is/is your/PN name/NN";
my @UserInput = split ' ', $readable_text;
my @nounArray;
foreach my $UserInput (@UserInput) {
if ($UserInput =~ m/NN|NNS$|NNP$|NNPS$/) {
# print "$UserInput\n";
push @nounArray, $UserInput;
}
}
print @nounArray;
$ matches at the end of the string, or just before a trailing newline. In your tagged text most tags are followed by a space and the next word, so the alternatives anchored with $ can only match the last token.
But as you point out in your comment, it looks like you're trying to match word boundaries here, so just replace all $ in your expression with \b.
First, split your words by whitespace:
my @UserInput = split /\s+/, $UserInput;
Then grep for the nouns:
my @nouns = grep { m%/N% } @UserInput; # only noun tags include /N
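A minimal sketch that combines the split and grep steps above, assuming the tagged text always has the word/TAG shape shown in the question. It matches the four noun tags exactly rather than anything containing /N, and the final step that strips the tags is an addition that the question did not ask for.

```perl
use strict;
use warnings;

# Sample tagged text copied from the question; real input would come from Lingua::EN::Tagger.
my $readable_text = "What/WP is/is your/PN name/NN";

# Keep only tokens whose tag is exactly NN, NNS, NNP or NNPS.
my @tagged_nouns = grep { m{/NN(?:PS|P|S)?$} } split ' ', $readable_text;

# Drop the "/TAG" suffix so that only the words remain (assumes no "/" inside a word).
my @nounArray = map { (split m{/}, $_)[0] } @tagged_nouns;

print "@nounArray\n";
```

Anchoring the tag with the leading / and the trailing $ keeps a tag such as /NNX, or a word that merely contains NN, from being counted as a noun.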
I am trying to replace the part of a string that sits between two fixed parts.
To describe: I have a text file in which one word appears with its middle part written in different, garbled characters, for example:
acc\E34rate
acc\?4rate
acc§54rate
.....
What I want the code to do is look for acc followed by rate and replace whatever is between them with u, because all of these strings have the first part and the last part in common.
I wonder how I can do it in Perl?
Thanks!
Update: including Code
What I have written so far is:
use strict;
use warnings;
my @stringArray = ('acc\E34rate', 'acc\?4rate');
my $find = '\E34';
my $replace = 'u';
my @newArray;
foreach my $str(@stringArray)
{
my $pos = index($str, $find);
while($pos > -1) {
substr($str, $pos, length($find), $replace);
$pos = index($str, $find, $pos + length($replace));
}
push @newArray, $str;
}
foreach(@newArray)
{
print "$_\r\n";
}
To simplify, I have used an array instead of a file. The problem is that this only works for the one exact string in $find rather than for every entry in the array or file.
I think this is what you want but the requirements are not clear. See perldoc perlre for more details.
#!/usr/bin/env perl
use strict;
use warnings;
my $begin = 'acc';
my $end = 'rate';
my $replace = 'u';
while( my $line = <DATA> ){
$line =~ s{ \Q$begin\E \S*? \Q$end\E }{$begin$replace$end}gmsx;
print $line;
}
__DATA__
acc\E34rate
acc\?4rate
acc§54rate
acc\E34rate acc\?4rate acc§54rate
accFOOacc
rateFOOrate
rateFOOrate accFOOacc
accFOOacc rateFOOrate
Try this:
$Text = "acc\E34rate
acc\?4rate
acc§54rate"; # This is the joined string (using enter key) after reading from the file
$Text =~ s/^acc.*?rate$/accutext/mg;
print $Text;
I've just tested it in my system and it is working fine.
Output:
accutext
accutext
accutext
The m modifier makes ^ and $ match at the start and end of each line in the string, rather than only at the start and end of the whole string.
The g modifier replaces every occurrence instead of only the first.
To get back as an array, split using \n.
Please note that the above code assumes that each line in the file begins with acc and ends with rate, and that there is no additional text before or after them on that line (that is, the file contains lines like "acc\?4rate" and not lines like "Driving acc\?4rate at 60kmph").
If the word can appear in the middle of a sentence, use this substitution in the above code instead:
$Text =~ s/acc.*?rate/accutext/g;
Incidentally, this unanchored version also works on the one-word-per-line input used in the code at the top.
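A short sketch of the same substitution applied to an array, as in the question's update. It writes acc, then u, then rate (that is, accurate), which is what the question asks for, and it uses \S*? so that a match cannot run across a space into a later word. The sample strings are illustrative.

```perl
use strict;
use warnings;

# Single quotes keep the backslashes literal.
my @words = ('acc\E34rate', 'acc\?4rate', 'Driving acc\?4rate at 60kmph');

# Replace the shortest run of non-space characters between "acc" and "rate" with "u".
my @fixed = map { my $copy = $_; $copy =~ s/acc\S*?rate/accurate/g; $copy } @words;

print "$_\n" for @fixed;
```

Working on a copy inside map leaves the original @words untouched.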
My goal with this piece of code is to sanitize an array of elements (a list of URLs, some with special characters like %) so that I can eventually compare it to another file of URLs and output which ones match. The list of URLs is from a .csv file with the first field being the URL that I want (with some other entries that I skip over with a quick if() statement).
foreach my $var(@input_1) {
#Skip anything that doesn't start with http:
if ((/^[#U]/ ) || !(/^h/)) {
next;
}
#Split the .csv into the relevant field:
my @fields = split /\s?\|\s?/, $_;
$var = uri_unescape($fields[0]);
}
My delimiter is a | in the csv. In its current setup, and also when I change the $_ to $var, it only returns blank lines. When I remove the $var declaration at the beginning of the loop and use $_, it will output the URLs in the correct format. But in that case, how can I assign the output to the same element in the array? Would this require a second array to output the value to?
I'm relatively new to Perl, so I'm sure there is some stuff that I'm missing. I have no clue at this moment why using $var in the foreach declaration breaks the parsing of the @fields line, but removing it and using $_ doesn't. Reading the perlsyn documentation did not help as much as I would have liked. Any help appreciated!
/^h/ is not bound to anything, so the match happens against $_. If you want to match $var, you have to bind it:
if ($var =~ /^[#U]/ || $var !~ /^h/) {
The two matches joined by || could be combined into a single regular expression using alternation:
next if $var =~ /^(?: [#U] | [^h] | $ )/x;
That is, the line is skipped if it starts with #, U, or anything other than h, or if it is empty.
You can populate a new array with the results by using push:
push @results, $var;
Also note that if your data can contain | quoted or escaped (or newlines etc.), you should use Text::CSV instead of split.
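Putting the pieces of this answer together, here is a minimal sketch that uses only core Perl. The sample lines are hypothetical, and the %XX substitution is a simple stand-in for uri_unescape from URI::Escape so that the example runs without extra modules; it is not a full replacement for it.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the real data would be read from the .csv file.
my @input_1 = (
    "# comment line",
    "URL | title",
    "http://example.com/a%20b | first",
    "",
    "http://example.com/c%25d|second",
);

my @results;
for my $var (@input_1) {
    # Skip anything that does not start with "h" (this also covers "#", "U" and empty lines).
    next if $var !~ /^h/;

    # Take the first |-delimited field.
    my ($url) = split /\s?\|\s?/, $var;

    # Stand-in for uri_unescape: decode %XX escape sequences.
    $url =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    push @results, $url;
}

print "$_\n" for @results;
```

Every pattern match here is bound to $var explicitly, so nothing depends on what $_ happens to contain.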
I need to find a value in an array without iterating through the whole array.
I read an array of strings from a file, and I need to get the index of a particular value in that array. I have tried this code, but it doesn't work.
my @array = <$file>;
my $search = "SomeValue";
my $index = first { $array[$_] eq $search } 0 .. $#array;
print "index of $search = $index\n";
Please suggest how I can get the index of the value, or better, the indexes of all matching lines if there is more than one.
Thx in advance.
What does "it doesn't work" mean?
The code you have will work fine, except that an element in the array is going to be "SomeValue\n", not "SomeValue". You can remove the newlines with chomp(@array) or include a newline in your $search string.
Your initial question: "I need to find value in array without iterating through whole array."
You can't. It is impossible to check every element of an array without checking every element of an array. The very best you can do is stop looking once you've found a match, but your question says you may want multiple matches.
There are various options that will do this for you, like first from List::Util, or grep. But they are still doing a loop; they're just hiding it behind the scenes.
The reason first doesn't work for you is probably that you need to load it from List::Util first. Alternatively, you forgot to chomp, which means the elements of your list end in line feeds while your search string doesn't.
Anyway - in the interests of actually giving something that'll do the job:
while ( my $line = <$file> ) {
chomp ( $line );
#could use regular expression based matching for e.g. substrings.
if ( $line eq $search ) { print "Match on line $.\n"; last; }
}
If you want every match, omit the last;
Alternatively - you can match with:
if ( $line =~ m/\Q$search\E/ ) {
This does a substring match, which in turn means the line feeds are irrelevant.
So you can do this instead:
while ( <$file> ) {
print "Match on line $.\n" if m/\Q$search\E/;
}
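For the array-based version from the question, a minimal sketch with first (from the core List::Util module) and grep might look like this. The in-memory list is a hypothetical stand-in for the lines read from the file.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical data standing in for my @array = <$file>;
my @array = ("alpha\n", "SomeValue\n", "beta\n", "SomeValue\n");
chomp @array;    # remove the trailing newline from every element

my $search = "SomeValue";

# Index of the first match (undef if there is none); stops at the first hit.
my $index = first { $array[$_] eq $search } 0 .. $#array;

# Indexes of every match; this always scans the whole array.
my @indexes = grep { $array[$_] eq $search } 0 .. $#array;

print "first index of $search = $index\n";
print "all indexes of $search = @indexes\n";
```

In real code, check defined($index) before using it, since first returns undef when nothing matches.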
I'm trying to compare each word in a list to a string to find matching words, but I can't seem to get this to work.
Here is some sample code
my $sent = "this is a test line";
foreach (@keywords) { # array of words (contains the word 'test')
if ($sent =~ /$_/) {
print "match found";
}
}
It seems to work if I manually enter /test/ instead of $_, but I can't enter words manually.
Your code works fine. I hope you have use strict and use warnings in place in the real program? Here is an example where I have populated @keywords with a few items including test.
use strict;
use warnings;
my $sent = "this is a test line";
my @keywords = qw/ a b test d e /;
foreach (@keywords) {
if ($sent =~ /$_/) {
print "match found\n";
}
}
output
match found
match found
match found
So your array doesn't contain what you think it does. I would bet that you've read the data from a file or from the keyboard and forgot to remove the newline from the end of each word with chomp.
You can do that by simply writing
chomp @keywords
which will remove a newline (if there is one) from the end of all elements of @keywords. To see the real contents of @keywords, you can add these lines to your program
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper \@keywords;
You will also see that the elements a and e produce a match as well as test, which I guess you don't want. You could add a word boundary metacharacter \b before and after the value of $_, like this
foreach (@keywords) {
if ( $sent =~ /\b$_\b/ ) {
print "match found\n";
}
}
but a regular expression's definition of a word is very restrictive and allows only alphanumeric characters or an underscore _, so Roger's, "essay", 99%, and nicely-formatted are not "words" in this sense. Depending on your actual data you may want something different.
Finally, I would write this loop more compactly using for instead of foreach (they are identical in every respect) and the postfixed statement modifier form of if, like this
for (@keywords) {
print "match found\n" if $sent =~ /\b$_\b/;
}
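If the keywords can contain punctuation or regex metacharacters, one possible variation (a sketch, not the only option) is to quote each keyword with \Q...\E and to require whitespace or the edge of the string on both sides instead of \b. The sentence and keyword list below are made up for illustration.

```perl
use strict;
use warnings;

my $sent     = "this is a test line (draft)";
my @keywords = ("a", "test\n", "(draft)");    # hypothetical list; one entry still has its newline
chomp @keywords;

my @found;
for my $word (@keywords) {
    # \Q...\E makes the keyword match literally; the lookarounds require
    # whitespace or the start/end of the string on each side.
    push @found, $word if $sent =~ /(?<!\S)\Q$word\E(?!\S)/;
}

print "match found: $_\n" for @found;
```

Without \Q...\E, the parentheses in "(draft)" would be treated as a capture group rather than as literal characters.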
I load a file into an array (one line per array element).
I process the array elements and save to a new file.
I want to print out the new file:
print ("Array: @myArray");
But - it shows them with leading spaces in every line.
Is there a simple way to print out the array without the leading spaces?
Yes -- use join:
my $delimiter = ''; # empty string
my $string = join($delimiter, @myArray);
print "Array: $string";
Matt Fenwick is correct. When your array is in double quotes, Perl will put the value of $" (which defaults to a space; see the perlvar manpage) between the elements. You can just put it outside the quotes:
print ('Array: ', @myArray);
If you want the elements separated by for example a comma, change the output field separator:
use English '-no_match_vars';
$OUTPUT_FIELD_SEPARATOR = ','; # or "\n" etc.
print ('Array: ', @myArray);
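Another option, sketched here with made-up data, is to keep the array inside the double quotes and set the list separator $" to the empty string for just that expression with local.

```perl
use strict;
use warnings;

# Hypothetical lines as they would be read from a file, newlines included.
my @myArray = ("first line\n", "second line\n", "third line\n");

# Interpolation joins the elements with $" (a space by default),
# which is where the leading space on every later line comes from.
my $with_spaces = "@myArray";

# Set $" to the empty string inside this block only.
my $without_spaces = do { local $" = ''; "@myArray" };

print "Array:\n", $without_spaces;
```

Because of local, the default separator is restored as soon as the do block ends, so other interpolations in the program are unaffected.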