I want to replace text in a file and overwrite the file.
use strict;
use warnings;
my ($str1, $str2, $i, $all_line);
$str1 = "AAA";
$str2 = "bbb";
open(RP, "+>", "testfile") ;
$all_line = $_;
$i = 0;
while (<RP>) {
    while (/$str1/) {
        $i++;
    }
    s/$str1/$str2/g;
    print RP $_;
}
close(RP);
A normal process is to read the file line by line and write each line, changed as/if needed, to a new file. Once that's all done, rename the new file, as atomically as possible, so as to overwrite the original.
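That read-edit-rename process can be sketched with core Perl alone; the file name and strings below are placeholders matching the question:

```perl
use strict;
use warnings;

my ($str1, $str2) = ("AAA", "bbb");
my $file = "testfile";      # placeholder name
my $tmp  = "$file.tmp";

# create a small demo input so this sketch runs standalone
open my $demo, '>', $file or die "Can't create '$file': $!";
print $demo "one AAA two\nno match here\n";
close $demo;

# read line by line, write the (possibly changed) line to a new file
open my $in,  '<', $file or die "Can't open '$file': $!";
open my $out, '>', $tmp  or die "Can't open '$tmp': $!";
while (my $line = <$in>) {
    $line =~ s/\Q$str1\E/$str2/g;
    print $out $line;
}
close $in;
close $out or die "Can't close '$tmp': $!";

# rename() is atomic when source and target are on the same filesystem
rename $tmp, $file or die "Can't rename '$tmp': $!";
```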
Here is an example of a library that does all that for us, Path::Tiny
use warnings;
use strict;
use feature 'say';
use Path::Tiny;
my $file = shift || die "Usage: $0 file\n";
say "File to process:";
say path($file)->slurp;
# NOTE: This CHANGES THE INPUT FILE
#
# Process each line: upper-case a letter after .
path($file)->edit_lines( sub { s/\.\s+\K([a-z])/\U$1/g } );
say "File now:";
say path($file)->slurp;
This upper-cases a letter following a dot (period), after some spaces, on each line where it is found and copies all other lines unchanged. (It's just an example, not a proper linguistic fix.)
Note: the input file is edited in-place so it will have been changed after this is done.
This capability was introduced in the module's version 0.077, of 2016-02-10. (For reference, Perl version 5.24 came in 2016-08. So with the system Perl 5.24 or newer, a Path::Tiny installed from an OS package or as a default CPAN version should have this method.)
Perl has a built-in in-place editing facility: The -i command line switch. You can access the functionality of this switch via $^I.
use strict;
use warnings;
my $str1 = "AAA";
my $str2 = "bbb";
local $^I = ''; # Same as `perl -i`. For `-i~`, assign `"~"`.
local @ARGV = ("testfile");
while (<>) {
    s/\Q$str1/$str2/g;
    print;
}
I was not going to leave an answer, but I discovered when leaving a comment that I did have some feedback for you, so here goes.
open(RP, "+>", "testfile") ;
The mode "+>" will truncate your file, deleting all its content. This is described in the documentation for open:
You can put a + in front of the > or < to indicate that you want both read and write access to the file; thus +< is almost always preferred for read/write updates--the +> mode would clobber the file first. You can't usually use either read-write mode for updating textfiles, since they have variable-length records. See the -i switch in perlrun for a better approach.
So naturally, you can't read from the file if you first delete its content. They mention here the -i switch, which is described like this in perl -h:
-i[extension] edit <> files in place (makes backup if extension supplied)
This is what ikegami's answer describes, only in his case it is done from within a program file rather than on the command line.
But, using the + mode for both reading and writing is not really a good way to do it. Basically it becomes difficult to print where you want to print. The recommended way is to instead read from one file, and then print to another. After the editing is done, you can rename and delete files as required. And this is exactly what the -i switch does for you. It is a predefined functionality of Perl. Read more about it in perldoc perlrun.
Next, you should use a lexical file handle. E.g. my $fh, instead of a global. And you should also check the return value from the open, to make sure there was not an error. Which gives us:
open my $fh, "<", "testfile" or die "Cannot open 'testfile': $!";
Usually if open fails, you want the program to die, and report the reason it failed. The error is in the $! variable.
Another thing to note is that you should not declare all your variables at the top of the file. It is good that you use use strict; use warnings (Perl code should never be written without them), but declaring everything up front is not the way to handle variables. Declare each variable as close to its first use as possible, and in the smallest scope possible. With a my-declared variable, that is the nearest enclosing block { ... }. This makes your variables easy to trace in bigger programs, and it encapsulates your code and protects your variables.
In your case, you would simply put the my before all the variables, like so:
my $str1 = "AAA";
my $str2 = "bbb";
my $all_line = $_;
my $i = 0;
Note that $_ will be empty/undefined there, so that assignment is kind of pointless. If your intent was to use $all_line as the loop variable, you would do:
while (my $all_line = <$fh>) {
Note that this variable is declared in the smallest scope possible, and we are using a lexical file handle $fh.
Another important note is that your replacement strings can contain regex meta characters. Sometimes you want them to be included, like for example:
my $str1 = "a+"; # match one or more 'a'
Sometimes you don't want that:
my $str1 = "google.com"; # '.' is meant as a period, not the "match anything" character
I will assume that you most often do not want that, in which case you should use the escape sequence \Q ... \E which disables regex meta characters inside it.
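A quick illustration of the difference, using the strings from above:

```perl
use strict;
use warnings;

my $str1 = "google.com";

# without \Q the '.' is a metacharacter and matches any character
print "matches without \\Q\n" if "googleXcom" =~ /$str1/;

# with \Q ... \E the pattern is taken literally, so only a real dot matches
print "no match with \\Q\n" unless "googleXcom" =~ /\Q$str1\E/;
print "literal match\n"     if "google.com"  =~ /\Q$str1\E/;
```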
So what do we get if we put all this together? Well, you might get something like this:
use strict;
use warnings;
my $str1 = "AAA";
my $str2 = "bbb";
my $filename = shift || "testfile"; # 'testfile', or whatever the program argument is

open my $fh_in,  "<", $filename       or die "Cannot open '$filename': $!";
open my $fh_out, ">", "$filename.out" or die "Cannot open '$filename.out': $!";

while (<$fh_in>) {           # read from input file...
    s/\Q$str1\E/$str2/g;     # perform substitution...
    print $fh_out $_;        # print to output file
}

close $fh_in;
close $fh_out;
After this finishes, you may choose to rename the files and delete one or the other. This is basically the same procedure as using the -i switch, except here we do it explicitly.
rename $filename, "$filename.bak"; # save backup of input file in .bak extension
rename "$filename.out", $filename; # clobber the input file
Renaming files is sometimes also facilitated by the File::Copy module, which is a core module.
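The same two renames can also be written with File::Copy's move(), which reports failures; the demo file names below are placeholders:

```perl
use strict;
use warnings;
use File::Copy qw(move);   # core module, no installation needed

my $filename = "testfile";   # placeholder name for this sketch

# set up demo files so the sketch runs standalone
open my $fh, '>', $filename or die $!;
print $fh "old\n";
close $fh;
open $fh, '>', "$filename.out" or die $!;
print $fh "new\n";
close $fh;

move($filename, "$filename.bak") or die "Backup failed: $!";    # save the original
move("$filename.out", $filename) or die "Replace failed: $!";   # clobber the input
```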
With all this said, you can replace all your code with this:
perl -i -pe's/AAA/bbb/g' testfile
And this is the power of the -i switch.
Related
I am writing a script to append to a text file by adding some text under a specific string in the file, after tab spacing. I need help adding a new line with tab spacing after the matched string "apple" in the case below.
Example File:
apple
<tab_spacing>original text1
orange
<tab_spacing>original text2
Expected output:
apple
<tab_spacing>testing
<tab_spacing>original text1
orange
<tab_spacing>original text2
What I have tried:
use strict;
use warnings;
my $config="filename.txt";
open (CONFIG,"+<$config") or die "Fail to open config file $config\n";
while (<CONFIG>) {
    chop;
    if (($_ =~ /^$apple$/)){
        print CONFIG "\n";
        print CONFIG "testing\n";
    }
}
close CONFIG;
We cannot simply "add" text to the middle of a file as attempted. A file is a sequence of bytes, and one cannot add or remove bytes (except at the end), only change them. So if we start writing to the middle of a file we overwrite the bytes there, clobbering what follows. Instead, we have to copy the rest of the text and write it back after the "addition," or copy the whole file, adding the text in the process.
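A sketch of that copy-and-insert approach in plain Perl, using the question's file layout (the file name is a placeholder):

```perl
use strict;
use warnings;

my $file = "filename.txt";   # placeholder name

# create demo input matching the question so the sketch runs standalone
open my $demo, '>', $file or die $!;
print $demo "apple\n\toriginal text1\norange\n\toriginal text2\n";
close $demo;

# read the whole file first, then write it back with the addition
open my $in, '<', $file or die "Can't open '$file': $!";
my @lines = <$in>;
close $in;

open my $out, '>', $file or die "Can't write '$file': $!";
for my $line (@lines) {
    print $out $line;
    print $out "\ttesting\n" if $line eq "apple\n";   # insert after the match
}
close $out;
```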
Yet another way is to read the whole file into a string and run a regex on it to change it, then write out the new string, assuming the file isn't too large for that:
perl -0777 -pe's{apple\n\K(\t)}{Added text\n$1}g' in.txt
The -0777 switch makes it read the whole file into a string ("slurp" it), available in $_, to which the regex is bound by default. That \K, which is a lookbehind, drops previous matches so they are not consumed out of the string and we don't have to (capture and) put them back. With the /g modifier it keeps going through the whole string, to find and change all occurrences of the pattern.
This prints the changed file to the screen, which can be saved in a new file by redirecting it
perl -0777 -pe'...' in.txt > out.txt
Or, one can change the input file "in place" with -i
perl -0777 -i.bak -pe'...' in.txt
The .bak makes it save the original with .bak extension. See switches in perlrun.
Another way is to use a lookahead for what follows (the tab) so that we don't have to capture and put it back
perl -0777 -pe's{apple\n\K(?=\t)}{Added text\n}g' in.txt
All of these produce the desired change.
Note on that tab ("tab_spacing")
The regex above assumes a tab character at the beginning of the line following the line with apple. When we say "tab" we mean one (tab) character.
But there are many reasons why there may in fact not be a tab character, even if it looks just like there is one. An example: all tabs may be automatically replaced by spaces by an editor.
So it may be safer to use \s+ (multiple spaces) instead of \t in the regex
s{apple\n\K(\s+)}{Added text\n$1}g
or
s{apple\n\K(?=\s+)}{Added text\n}g
If this is to be done inside of an existing larger Perl program (and not as a command-line program, "one-liner" as above), one way
use Path::Tiny; # path()
my $file_content = path($file)->slurp; # read the file into a string
# Now use a regex; all discussion above applies
$file_content =~ s{apple\n\K(?=\t)}{Added text\n}g;
# Print out $file_content, to be redirected etc. Or write to a file
path($new_file)->spew($file_content);
I use the library Path::Tiny to "slurp" the file into a string, and its spew method to write $file_content to a new file. It needs to be installed, as it is not a core module (it doesn't usually come with Perl). If that is a problem for some reason, here is an idiom of sorts for doing it without any libraries:
my $file_content = do {
    local $/;
    open my $fh, '<', $file or die "Can't open $file: $!";
    <$fh>;
};
or even
my $file_content = do { local (@ARGV, $/) = $file; <> };
(see this post for some explanation and references)
Some pretty weird stuff in your code, to be honest:
Reading from and writing to a file at the same time is hard. Your code, for example, would just write all over the existing data in the file
Using a bareword filehandle (CONFIG) instead of a lexical variable, and two-arg open() instead of the three-arg version (open my $config_fh, '+<', $config), makes me think you're using some pretty old Perl tutorials
Using chop() instead of chomp() makes me think you're using some ancient Perl tutorials
You seem to have an extra $ in your regex - ^$apple$ should probably be ^apple$
Also, Tie::File has been included with Perl's standard library for over twenty years and would make this task far easier.
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;

tie my @file, 'Tie::File', 'filename.txt' or die $!;

for (0 .. $#file) {
    if ($file[$_] eq 'apple') {
        splice @file, $_ + 1, 0, "\ttesting";  # Tie::File adds the record separator itself
    }
}
It's not entirely clear what you mean by "tab spacing", but you might be looking for:
perl -pE 'm/^(\t*)/; say "${1}testing" if $a; $a = /apple/' filename.txt
I suspect you actually want \s instead of \t, but YMMV. Basically, on each line of input, you match the leading whitespace and then print a line with that whitespace and the string 'testing' if the previous line matched.
To write it verbosely:
#!/usr/bin/env perl
use 5.12.0;
use strict;
use warnings;

my $n = 'filename.txt';
open my $f, '<', $n or die "$n: $!\n";
while (<$f>) {
    m/^(\t*)/;    # possibly \s is preferred over \t
    say "${1}testing" if $a;
    $a = /apple/;
    print;
}
Inspired by the following perl one-liner perl -0777 -ne 'while(m/YOUR_REGEX_HERE/g){print "$&\n";}' YOUR_FILE_HERE, I am searching for the simplest way to read an entire file into a string (a procedure also known as "slurping") in a complete perl script file (not just a perl command), as I need to do simple operations using two files. I came across this: https://www.perl.com/article/21/2013/4/21/Read-an-entire-file-into-a-string/. (I am looking at "Slurping files without modules.")
It suggests using
open my $fh, '<', 'text_document.txt' or die "Can't open file $!";
read $fh, my $file_content, -s $fh;
But how do I close the file after processing it? (1)
Also, how would that one-liner translate to a perl file, not just a perl command? (2)
Thank you!
You can change the line ending special variable to be nothing within a block to keep it local. It'll be restored to the local platform's record separator after the block ends:
use warnings;
use strict;
my $contents;
{
    local $/;
    open my $fh, '<', 'file.txt' or die $!;
    $contents = <$fh>;
    close $fh;
}
print $contents;
Note that the bare block in the above code could be substituted with a subroutine instead.
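For instance, the same idea as a small slurp() subroutine, where the local-ized $/ is restored when the subroutine returns:

```perl
use strict;
use warnings;

# the same idea as a subroutine; local $/ lasts only for this call
sub slurp {
    my ($name) = @_;
    local $/;   # unset the input record separator for this call only
    open my $fh, '<', $name or die "Can't open '$name': $!";
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# demo with a throwaway file; the name is just for illustration
open my $out, '>', 'file.txt' or die $!;
print $out "line 1\nline 2\n";
close $out;

print slurp('file.txt');
```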
If you want it compact, on one line:
my $file_content = do { local (@ARGV, $/) = $file_name; <> };
The <> "null" filehandle opens and reads all files with names in the @ARGV special variable. We add $file_name to @ARGV, but only after this global variable is local-ized, so that whatever may have been in the array is saved away and then restored once control leaves this block. Thus <> reads the whole file, and since it is the last statement in the do block, what it read is returned.
In order to read the whole file into a string ("slurp" it) we unset the $/ special variable ("input record separator"), also after it is local-ized of course.
My code below is very simple... all I'm doing is grepping a file with an IP address regex, and I want to store any IP addresses that match in @array2.
I know my regex works, because I've tested it on a Linux command line and it does exactly what I want, but when it's integrated into the script, it just returns blank. There should be 700 IPs stored in the array.
#!/usr/bin/perl
use warnings;
use strict;
my @array2 = `grep -Eo "\"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\"" test1.txt`;
print @array2;
Backticks `` behave like a double quoted string by default.
Therefore you need to escape your backslashes:
my @array2 = `grep -Eo "\\"\\b[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\"" test1.txt`;
Alternatively, you can use a single quoted version of qx to avoid any interpolation:
my @array2 = qx'grep -Eo "\"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\"" test1.txt';
However, the method I'd recommend is to not shell out at all, but instead do this logic in perl:
my @array2 = do {
    open my $fh, '<', 'test1.txt' or die "Can't open file: $!";
    grep /\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\b/, <$fh>;
};
I really wouldn't mix bash and perl. It's just asking for pain. Perl can do it all natively.
Something like:
open (my $input_fh, "<", "test.txt" ) or die $!;
my @results = grep ( /\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/, <$input_fh> );
This does however, require slurping the file into memory, which isn't optimal - I'd generally use a while loop, instead.
The text inside the backticks undergoes double-quotish substitution. You will need to double your backslashes.
Running grep from inside Perl is dubious, anyway; just slurp in the text file and use Perl to find the matches.
The easiest way to retrieve the output from an external command is to use open():
open(FH, 'grep -Eo \"\b[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\" test1.txt'."|")
    or die "Can't run grep: $!";
my @array2 = <FH>;
close (FH);
..though I think Sobrique's idea is the best answer here.
I'm trying to get back into Perl and having a bugger of a time with my code. I have a large source .DAT file (2GB). I have another .TXT file that contains the strings (almost 2000 of them) I want to search for in that .DAT file. I throw the values from that .TXT file into an array.
I want to efficiently do a search for each of those strings in the array, and output the matches. Can anyone help straighten me out? Thanks in advance!
my $source = "/KEYS.txt";
my $data = "/claims.dat";
my @array;
my $arraySize = scalar (@DESYarray);

open (DAT, $data) or die "Cannot open file!";
open (LOG, ">>/output.log");
open (TXT, $source);

while (my $searchValues = <TXT>) {
    push (@array, $searchValues);
}
close (TXT);

while (my $line = <DAT>) {
    for (my $x = 0; $x <= $arraySize; $x++) {
        if (my $line =~ /$array[$x]/) {
            print LOG $line;
        }
    }
}
close (DAT);
close (LOG);
You are re-declaring my $line in your inner loop, which means it will be equal to:
if (undef =~ /$array[$x]/) {
Which of course will always fail. If you had used use warnings, you would have gotten the error:
Use of uninitialized value in pattern match (m//) at ...
Which makes me suspect you are not using warnings, which is a very bad idea.
Also, keep in mind that when you read values into @array, you will get a newline at the end, so you are searching your DAT file for strings that end with \n, which may not be what you want. E.g. if you have foo\n, it will not match foo bar baz.
The solution to that is to chomp your data:
chomp(my @array = <TXT>);
Yes, you can chomp an array, and you can assign an entire file to an array this way.
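For example, chomp applied to an array strips the newline from every element:

```perl
use strict;
use warnings;

my @lines = ("foo\n", "bar\n", "baz");   # as if just read from a file
chomp @lines;                            # removes the trailing newline from every element
print join(",", @lines), "\n";
```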
You can and should improve your script a little. It is quite unnecessary to loop using array indexes, unless you in fact need to use the indexes for something.
use strict;
use warnings;   # ALWAYS use these!
use autodie;    # handles the open statements for convenience

my $source = "/KEYS.txt";
my $data = "/claims.dat";

open my $txt, '<', $source;
chomp(my @array = <$txt>);
close $txt;

open my $dat, '<', $data;       # use three argument open and lexical file handle
open my $log, '>>', "/output.log";

while (<$dat>) {                # using $_ for convenience
    for my $word (@array) {
        if (/\Q$word/i) {       # adding /i modifier to match case insensitively
            print $log $_;      # also adding \Q to match literal strings
        }
    }
}
Using \Q might be very important, depending on what your KEYS.txt file contains. Meta characters for regexes may cause subtle mismatches, if you are expecting them to match literally. E.g. if you have a word such as foo?, the regex /foo?/ will match foo, but it will also match for.
Also, you may wish to decide whether to allow partial matches. E.g. /foo/ will also match football. To overcome that, one way is to use the word boundary escape character:
/\b\Q$word\E\b/i
You need to place the \b anchors outside the \Q .. \E sequence, or they will be interpreted literally.
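A small demonstration of what the \b anchors change:

```perl
use strict;
use warnings;

my $word = "foo";

# without \b the match is partial: 'foo' is found inside 'football'
print "partial match\n" if "football" =~ /\Q$word\E/;

# with \b on both sides only the whole word matches
print "no whole-word match\n" unless "football" =~ /\b\Q$word\E\b/;
print "whole word\n"          if "kick foo hard" =~ /\b\Q$word\E\b/;
```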
ETA: As tchrist points out and Borodin suggests, building a regex with all the words will save you getting duplicate lines. E.g. if you have the words "foo", "bar" and "baz", and the line foo bar baz you would get this line printed three times, once for each matching word.
This may be fixed afterwards, by deduping your data in some suitable way. Only you know your data and whether this is a problem or not. I would hesitate to compile such a long regex, for performance reasons, but you can try it and see if it works for you.
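One common way to dedupe the collected lines afterwards is the %seen hash idiom; the sample data here is made up:

```perl
use strict;
use warnings;

# sample matched lines, as if collected while scanning the data file
my @matched = ("foo bar baz\n", "foo bar baz\n", "other\n", "foo bar baz\n");

# keep only the first occurrence of each line
my %seen;
my @unique = grep { !$seen{$_}++ } @matched;

print @unique;
```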
You should always start your program with use strict and use warnings, especially if you are asking for help with your code. They are an enormous help with debugging and will often find simple mistakes that are otherwise easily overlooked.
How long are the strings in KEYS.txt? It may be feasible to build a regex from them using join '|', @array. By the way, the code you have written is equivalent to @array = <TXT>, and don't forget to chomp the contents!
I suggest something like this
use strict;
use warnings;

my $source = "/KEYS.txt";
my $data = "/claims.dat";

open my $dat, '<', $data or die "Cannot open data file: $!";
open my $log, '>>', '/output.log' or die "Cannot open output file: $!";
open my $txt, '<', $source or die "Cannot open keys file: $!";

my @keys = <$txt>;
chomp @keys;
close $txt;

my $regex = join '|', map quotemeta, @keys;
$regex = qr/$regex/i;

while (my $line = <$dat>) {
    next unless $line =~ $regex;
    print $log $line;
}

close $log or die "Unable to close log file: $!";
I've used Regexp::Assemble in the past to take a list of tokens, create an optimized regexp, and use it to filter large amounts of text. Once we moved from a |-delimited regexp to Regexp::Assemble we saw a great performance boost.
Regexp::Assemble
I want to read in multiple files from a directory and store each value in a unique variable so I can print it out later with a descriptive header. The file names have a common prefix, but are unique.
I know how to open up one file, but is there an efficient way to open up many files? Or do I have to open unique file handles for each one? Thanks.
Filenames have a common prefix like (abc_*):
abc_foo_dir
abc_bar.dat1.20101208
abc_bar.dat2.20101209
Example opening up first file:
open FILE, "< /home/test/data/abc_foo_dir";
while (<FILE>) {
    my $line = $_;
    chomp($line);
    print "$line\n";
}
close FILE;
You say "store each value in a unique variable", but this is actually a task for a hash table.
my %file_contents;
foreach my $filename (qw(...filenames here... or use a glob to fetch them))
{
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    local $/;   # enable slurp mode

    # read in the entire contents of the file and store in the hash
    $file_contents{$filename} = <$fh>;

    # this would close automatically when going out of scope,
    # but it's nice to be explicit
    close $fh;
}
You can iterate through all the keys later with keys %file_contents, but if you aren't familiar with how to work with hashes, I urge you to read perldoc perldata and perldoc perlsyn.
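For example, once %file_contents is filled, printing each file with a descriptive header might look like this (the data here is made up):

```perl
use strict;
use warnings;

# made-up sample data standing in for the slurped files
my %file_contents = (
    'abc_foo_dir'           => "contents of the first file\n",
    'abc_bar.dat1.20101208' => "contents of the second file\n",
);

# iterate over the keys; sort them for a stable, predictable order
for my $filename (sort keys %file_contents) {
    print "=== $filename ===\n";
    print $file_contents{$filename};
}
```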
There's the ARGV / @ARGV hack. ARGV is a special filehandle that will read from all of the files named in the (also special) variable @ARGV. Normally @ARGV holds the command-line arguments that your Perl program was invoked with, but you are allowed to change it.
{
    # local @ARGV = ('abc_foo_dir', 'abc_bar.dat1.20101208', ...);
    local @ARGV = glob("$dir/abc_*");
    while (<ARGV>) {
        print "Read a line from $ARGV: $_";   # $ARGV holds name of current file
    }
}
I am not a Perl guru, but I will try something simple: why don't you temporarily concatenate all the files into a single file and then read that?
AFAIK, there is no way to read all the files from a single handle.